DATA VIRTUALIZATION – WHAT’S REAL?
Meenakshinathan Padmanabhan & Arvind Handuu
Unless you have been living under a rock for the past few years, or have chosen not to look at a printed word, you have been told, often enough to leave you physically fatigued, that data is what you produce and that data is the new oil: the real cure for all that ails us. You have been told how, once we organize it and let it gain more well-rounded experience and learning, the world will be a better place. My cat is now afraid of data. It is data's world, not the cat's; all of us, including the cat, just rent it.
To be fair, though, the impact of data (its availability, regeneration, amount, age, lineage, utility, diversity, value, and reach), while at times overstated, is significant enough that organizations are well served by continually scanning the environment for opportunities that make value creation possible.
In a 2018 Forbes survey, Data Virtualization was among the top three highest-growth areas for 2018/2019.
This post is an attempt to simplify the conceptual presentation of Data Virtualization. So let's take it from the top.
So, what is Data Virtualization? What is a good use case for it? And what are the corner cases where this approach fails?
Data management, especially in applications that require after-the-fact creation and analysis of institutional data assets, is an ever-moving target. Just as a seemingly effective governance model is implemented and an initial set of questions is getting answered, new questions arise, often requiring new information and data sources to be incorporated into the answer base. This means reworking the data warehouse, introducing the new data source, and applying the same rigor to ensure data hygiene. And so an expensive build, add, analyze cycle repeats. Data Virtualization offers a near-term reprieve from this cycle by making it easy to introduce new data sources quickly. It has the potential to be THE solution in less complex data environments, and at least a resilient intermediate solution in applications with higher data complexity.
Data virtualization is the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
Data virtualization creates integrated views of data drawn from disparate sources, locations, and formats, without replicating the data, and delivers these views, in real time, to multiple applications and users. Put another way, it is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about that data, such as how it is formatted at the source or where it is physically located, and that can provide a single customer view (or a single view of any other entity) of the overall data. Data virtualization can draw from a wide variety of structured, semi-structured, and unstructured sources, and can deliver to a wide variety of consumers. Because no replication is involved, the data virtualization layer contains no source data; it contains only the metadata required to access each of the applicable sources, as well as any global instructions that the organization may want to implement, such as security or governance controls. The concept and the software that implements it are a subset of data integration, and are commonly used within business intelligence, service-oriented architecture data services, cloud computing, enterprise search, and master data management.
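To make the "metadata only, no replicated data" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular vendor's product works: the `VirtualLayer` class, the two accessor functions, and the sample customers and orders are all hypothetical, and an in-memory SQLite database and a CSV string stand in for real source systems.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational CRM database (in-memory SQLite stands in for it).
crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical source 2: an order export in CSV form (a string stands in for a file).
ORDERS_CSV = "customer_id,amount\n1,250.0\n2,100.0\n1,75.5\n"


def fetch_customers():
    """Read customers from the relational source at query time."""
    rows = crm_db.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [{"customer_id": r[0], "name": r[1]} for r in rows]


def fetch_orders():
    """Read orders from the CSV source at query time."""
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    return [
        {"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
        for r in reader
    ]


class VirtualLayer:
    """Holds only metadata (source name -> accessor); no source rows are stored."""

    def __init__(self):
        self._sources = {}

    def register(self, name, accessor):
        """Introduce a new source by recording how to reach it."""
        self._sources[name] = accessor

    def customer_view(self):
        """Build a single customer view by joining the sources on demand."""
        totals = {}
        for order in self._sources["orders"]():
            key = order["customer_id"]
            totals[key] = totals.get(key, 0.0) + order["amount"]
        return [
            {**c, "total_spend": totals.get(c["customer_id"], 0.0)}
            for c in self._sources["customers"]()
        ]


layer = VirtualLayer()
layer.register("customers", fetch_customers)
layer.register("orders", fetch_orders)
view = layer.customer_view()
print(view)
```

Because the layer stores only accessors, a row added to a source shows up the next time `customer_view()` is called, with no reload step. The trade-off is that every query hits the source systems directly.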
The concept was initially incorporated in various business intelligence tools such as Qlik, Spotfire, and Tableau, to name a few. The obvious limitation was the close coupling between the virtual data store and the choice of analytical tool (at the time, mainly data visualization tools). That meant that the limitations of the analytical tools defined the extent to which data could be utilized. The graphic below represents the data virtualization approach of one of the leading solution vendors in this technology, Denodo.
Image Courtesy: Denodo
Our teams have taken the position that, for very small data volumes and relatively clean data sources, data virtualization is an effective solution that allows a federated data structure and a quick analytics solution. However, as data complexity increases, organizations will need more disciplined data governance practices, implemented in a data warehouse-led analytics platform. In such cases, a virtualized database solution would serve as a rapid proof-of-concept solution for testing various source systems.
We find data virtualization highly effective in the following use cases:
‣ Generally structured data sources with easy-to-define relationships. Returning to the promise stated earlier in this article, Data Virtualization really does deliver on the data integration front. Whether one needs data from a mobile application or from hundreds of domains and other web technologies, Data Virtualization consolidates all of that into a single solution.
‣ Integration of structured and semi-structured data. Data virtualization supports both, and works seamlessly with the likes of Hadoop and MapReduce.
‣ Rapid analytics delivery or short-term proof-of-concept solutions. Unlike some massive data management solutions, Data Virtualization can be implemented at an unnervingly rapid rate. It can be fitted into existing infrastructure in a matter of weeks or months. Some Data Virtualization adopters have reported an ROI turnaround of less than six months.
‣ Direct exposure to the source applications. A core reason for adopting data virtualization is the ability to incorporate operational data in real time.
While the above might appear compelling, data virtualization falls short in the following key application areas:
‣ Historical and lineage tracking applications, e.g. Slowly Changing Dimension Type I / Type II problem areas. When there is a need to analyze data that is days, weeks, or even months old, a data warehouse is the better option.
‣ Operational overhead. Data Virtualization often imposes a great deal of stress on the organization's operations, often requiring massive overhead. Changes need to be integrated and distributed to every user and application across the entire infrastructure, which can be a huge financial and logistical strain on the environment.
‣ Overall effectiveness. Data virtualization solutions can be deceptively difficult, and their effectiveness in managing real-time data delivery can be a little underwhelming. The expectation gap usually occurs when an organization assumes that, because it is using a powerful Data Virtualization solution, it no longer has to manage its own data.
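The first limitation above is easiest to see with a small example. The sketch below shows one common way a warehouse handles a Slowly Changing Dimension Type II change: it keeps every version of a record with validity dates. The function names, the field names, and the sample customer are hypothetical. A virtual layer that only reads the live source would see just the current value, because the history is not stored anywhere.

```python
from datetime import date

# Hypothetical dimension table kept in a warehouse: one row per version of a customer.
dimension = []


def apply_scd2(dim, customer_id, city, as_of):
    """Type II change: close the current row and append a new version."""
    current = next(
        (r for r in dim if r["customer_id"] == customer_id and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["city"] == city:
            return  # nothing changed, keep the current version
        current["valid_to"] = as_of
    dim.append(
        {"customer_id": customer_id, "city": city, "valid_from": as_of, "valid_to": None}
    )


def city_on(dim, customer_id, day):
    """Look up the version that was valid on a given day."""
    for r in dim:
        starts_before = r["valid_from"] <= day
        still_open = r["valid_to"] is None or day < r["valid_to"]
        if r["customer_id"] == customer_id and starts_before and still_open:
            return r["city"]
    return None


# The customer moves in mid-2018; the warehouse keeps both versions.
apply_scd2(dimension, 1, "Pittsburgh", date(2018, 1, 1))
apply_scd2(dimension, 1, "Cleveland", date(2018, 6, 1))

history_city = city_on(dimension, 1, date(2018, 3, 15))  # answered from the older row
current_city = city_on(dimension, 1, date(2018, 7, 1))   # answered from the open row
print(history_city, current_city)
```

Answering the March question requires the closed-out row, which exists only because the warehouse persisted it. This is why history-heavy analytics tend to stay in a warehouse even when a virtual layer serves the current view.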
In the data management space there are very few, if any, magic bullets. Data Virtualization is an effective Swiss Army knife in a data architect's or solution strategist's toolkit. While data virtualization is far from perfect today, the overall market is evolving rapidly to provide access to real-time, easily managed data. Even if it is not the sole mode of capturing, interpreting, and managing BI data, the virtualized data warehouse is an effective strategy for creating business value and introducing additional data sources into the analytics framework.
Meenakshinathan (Nathan) Padmanabhan is a Sr. Data Solutions Architect at Visvero, Inc. He has been supporting various F2000 clients in deploying effective data management, business intelligence, and analytics solutions for over 20 years. Nathan is based out of Visvero, Pittsburgh.
Arvind Handuu is a Practice Manager for Business Intelligence & Analytics at Visvero. Arvind is an analytics value purist. He believes that a BI & Analytics platform should be a self-contained and sovereign solution. “Its value drops to zero the instant you are using a different data source to inform your decision.” Arvind is based out of Visvero, Pittsburgh.