DATA VIRTUALIZATION – WHAT’S REAL?
Meenakshinathan Padmanabhan & Arvind Handuu
Unless you’ve been living under a rock for the past few years, or have
chosen not to look at a printed word, you’ve been told, often enough to be
physically fatigued by it, that data is what you produce and that data is
the new oil. The real cure for all that ails us. How, once we organize it
and let it gain more well-rounded experience and learning, the world will
be a better place. My cat is now afraid of Data: it’s Data’s world, not the
cat’s, and all of us, the cat included, just rent it.
To be fair though, the impact of Data (its availability, regeneration,
amount, age, lineage, utility, diversity, value, and reach), though at
times overstated, is significant enough that organizations are well served
by continually scanning the environment for opportunities that make value
creation possible.
In a 2018 Forbes survey, Data Virtualization was among the top three
highest-growth areas for 2018/2019.
This post is an attempt to present the concept of Data Virtualization in
simple terms. So let’s take it from the top. What is Data Virtualization?
What is a good use case for it? And what are the corner cases where this
approach fails?
Data Management, especially in applications that require after-the-fact
creation and analysis of institutional data assets, is an ever-moving
target. It appears that just as a seemingly effective governance model is
implemented and an initial set of questions is getting answered, new
questions arise, often requiring new information and data sources to be
incorporated into the answer base. This requires reworking the data
warehouse, introducing the new data source, and applying the same rigor to
ensure data hygiene. And so an expensive build, add, analyze cycle repeats.
Data Virtualization offers a near-term reprieve from this cycle by making
it easy to introduce new data sources quickly. It has the potential to be
THE solution in less complex data environments, and at least a resilient
intermediate solution in applications with higher data complexity.
Data virtualization is the process of offering data consumers a data
access interface that hides the technical aspects of stored data, such
as location, storage structure, API, access language, and storage
Data virtualization creates integrated views of data drawn from disparate
sources, locations, and formats, without replicating the data, and delivers
these views, in real time, to multiple applications and users. Data
virtualization is any approach to data management that allows an
application to retrieve and manipulate data without requiring technical
details about the data, such as how it is formatted at source, or where it
is physically located, and can provide a single customer view (or single
view of any other entity) of the overall data. Data virtualization can draw
from a wide variety of structured, semi-structured, and unstructured
sources, and can deliver to a wide variety of consumers. Because no
replication is involved, the data virtualization layer contains no source
data; it contains only the metadata required to access each of the
applicable sources, as well as any global instructions that the
organization may want to implement, such as security or governance
controls. This concept and software is a subset of data integration and is
commonly used within business intelligence, service-oriented architecture
data services, cloud computing, enterprise search, and master data
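The idea that the virtualization layer holds only metadata, and builds integrated views at request time, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `VirtualLayer` class, the `customers` and `orders` source names, and the sample rows are hypothetical, and an in-memory SQLite database and a CSV string stand in for real operational systems. It is not how any particular vendor implements the concept.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: an operational CRM database (in-memory SQLite here).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical source 2: an orders feed in CSV form (stand-in for a file or export).
ORDERS_CSV = "customer_id,amount\n1,120.50\n2,80.00\n1,30.00\n"


class VirtualLayer:
    """Holds only metadata: a name and a fetch function per source, no copied rows."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        self._sources[name] = fetch

    def rows(self, name):
        # Data is pulled from the source at request time, never stored here.
        return self._sources[name]()

    def customer_view(self):
        """Integrated 'single customer view', joined across both sources on demand."""
        totals = {}
        for order in self.rows("orders"):
            cid = int(order["customer_id"])
            totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
        return [
            {"id": cid, "name": name, "total_spend": totals.get(cid, 0.0)}
            for cid, name in self.rows("customers")
        ]


layer = VirtualLayer()
layer.register(
    "customers",
    lambda: crm.execute("SELECT id, name FROM customers ORDER BY id").fetchall(),
)
layer.register("orders", lambda: list(csv.DictReader(io.StringIO(ORDERS_CSV))))

view = layer.customer_view()
print(view)
```

Because the layer stores fetch functions rather than rows, a record added to the CRM source shows up the next time `customer_view()` is called, with no reload step. The trade-off is that every request hits the underlying sources.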
The concept was initially incorporated in various business intelligence
tools such as Qlik, Spotfire, and Tableau, to name a few. The obvious
limitation was the close coupling between the virtual data store and the
choice of analytical (at the time, mainly data visualization) tools. That
meant that the limitations of the analytical tools defined the extent to
which data could be utilized.
[Figure: the data virtualization approach of one of the leading solution
vendors. Image courtesy: Denodo]
Our teams have taken the position that for very small database volumes and
relatively clean data sources, data virtualization is an effective
solution that allows a federated data structure and a quick analytics
solution. However, as data complexity increases, organizations will need
more disciplined data governance practices, implemented in a
data-warehouse-led analytics platform. In such cases a virtualized
database solution would serve as a rapid proof of concept for testing
various source systems.
We find data virtualization highly effective in the following use cases:
‣ Generally structured data sources with easy to define relationships.
Referring to the promise stated earlier in this article, Data
Virtualization really does deliver on the data integration front. Whether
one needs data from a mobile application or from hundreds of domains and
other web technologies, Data Virtualization consolidates all of that into a
‣ Data virtualization supports the integration of structured and
semi-structured data, and is seamlessly supported by the likes of Hadoop
‣ Rapid analytics delivery or short-term proof of concept solutions. Unlike
some massive Data Management solutions, Data Virtualization can be
implemented at an unnervingly rapid rate. It can be added to existing
infrastructure in a matter of weeks or months. Some Data
Virtualization adopters have reported an ROI turnaround of less than six
‣ Direct exposure to the source applications. A key reason for adopting
data virtualization is the ability to incorporate operational data in real
time.
While the above might appear compelling, data virtualization falls short in
the following key application areas:
‣ Historical and lineage tracking applications, e.g. Slowly Changing
Dimension Type I / Type II problem areas. Organizations need to use data
warehouses when there exists a need to analyze data that is days, weeks or
even months old. Data warehouses are a better option for an organization in
‣ Data Virtualization often imposes a great deal of stress on the
organization’s operations, often requiring massive overhead. The changes
it brings need to be integrated and distributed to every user and
application within your entire infrastructure. This can be a huge
financial and logistical strain on your environment.
‣ Overall effectiveness. Data virtualization solutions can be deceptively
difficult, and their effectiveness in managing real-time data delivery can
be a little underwhelming. The expectation gap usually occurs when an
organization thinks that just because it is using a powerful Data
Virtualization solution, it no longer has to manage its own data.
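The first limitation above, history tracking, can be made concrete with a small Python sketch. It contrasts a virtualized read, which only ever sees the current state of the source, with a warehouse-style Slowly Changing Dimension Type II table that keeps one row per version. The function names (`virtual_lookup`, `apply_scd2`, `city_as_of`), the customer record, and the dates are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical operational source: holds only the current state of each customer.
source = {101: {"city": "Pittsburgh"}}


def virtual_lookup(customer_id):
    """A virtualized read: always reflects whatever the source holds right now."""
    return source[customer_id]["city"]


# Warehouse-style Type II dimension: one row per version, with validity dates.
dimension = [{"customer_id": 101, "city": "Pittsburgh",
              "valid_from": date(2018, 1, 1), "valid_to": None}]


def apply_scd2(customer_id, new_city, changed_on):
    """Close the current row and append a new version instead of overwriting."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["city"] == new_city:
                return  # nothing changed, keep the open row
            row["valid_to"] = changed_on
    dimension.append({"customer_id": customer_id, "city": new_city,
                      "valid_from": changed_on, "valid_to": None})


def city_as_of(customer_id, as_of):
    """Answer a historical question from the versioned rows."""
    for row in dimension:
        if (row["customer_id"] == customer_id and row["valid_from"] <= as_of
                and (row["valid_to"] is None or as_of < row["valid_to"])):
            return row["city"]
    return None


# The customer moves: the source is overwritten in place (Type I behavior),
source[101]["city"] = "Los Angeles"
# while the warehouse load records the change as a new version (Type II).
apply_scd2(101, "Los Angeles", date(2018, 9, 1))

current = virtual_lookup(101)
historical = city_as_of(101, date(2018, 3, 15))
print(current, historical)
```

Once the source is overwritten, the virtual read has no way to answer "where was this customer in March?". Only the versioned dimension can, which is why a persisted warehouse remains the better fit for such questions.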
In the Data Management space there are very few, if any, magic bullets.
Data Virtualization is an effective Swiss Army knife in a data architect’s
or solution strategist’s toolkit. While data virtualization is far from
perfect today, the overall market is evolving rapidly to provide access to
real-time, easily managed data. But as a sole mode of capturing,
interpreting, and managing BI data, the virtualized data warehouse is an
effective strategy to create business value and introduce additional data
sources in the analytics framework.
Meenakshinathan (Nathan) Padmanabhan is a Sr. Data Solutions
Architect at Visvero, Inc. He has been supporting various F2000
clients in deploying effective data management, Business
Intelligence and Analytics solutions for over 20 years. Nathan is
based out of Visvero, Pittsburgh.
Arvind Handuu is a Practice Manager for Business Intelligence &
Analytics at Visvero. Arvind is an analytics value purist. He
believes that a BI & Analytics platform should be a
self-contained and sovereign solution. “Its value drops to zero the
instant you are using a different data source to inform your
decision.” Arvind is based out of Visvero, Pittsburgh.