Debunking common data virtualisation myths

Fri, 14th Oct 2022

By Ravi Shankar, Senior Vice President, Denodo

For most organisations in Australia, data is a valuable and strategic enterprise asset required for supporting and driving many business initiatives. Significant investment has gone into innovation to modernise the technology landscape and embrace cloud platforms, with more recent focus on better leveraging data for business improvement, continuity and competitive advantage through the adoption of AI and ML technologies.

In the modern enterprise, data strategies and architectures require flexibility and agility to support the growing and changing needs of business. Data is key to helping an organisation solve business problems and be more efficient and more effective.

Gartner estimates that more than 56% of organisations have an average of four or more concurrent hyperautomation initiatives underway.

How can you achieve enterprise intelligence and make improvements more rapidly while mitigating the risks and reducing the associated costs of traditional data delivery methods? One way is to take a logical approach and create a virtual view of the highly distributed and complex data ecosystem.

Data virtualisation is a data integration technique that provides a single unified layer across the entire data landscape, making data accessible without having to move or replicate the data. Data delivery is accelerated, complexity is removed, and efficiencies are gained, all while protecting and governing the data regardless of its type and location.

As the adoption of data virtualisation grows, analysts and vendors discuss it more frequently. However, the technology is not always described accurately. It is often likened to the early data federation systems of the late 1990s and early 2000s, which are very different from modern data virtualisation platforms.

To provide a better understanding of data virtualisation, I will clarify some of the common myths about this technology.

Myth 1: Data Virtualisation is Equivalent to Data Federation

When data virtualisation was first introduced, data federation was one of its primary capabilities. Data federation is the ability to answer queries by combining and transforming data from two or more data sources in real time. Similarly, the data-access layer established by data virtualisation contains the metadata needed to access a variety of data sources and return the results in a fraction of a second.

However, data virtualisation has since grown well beyond federation. Its toolset now includes capabilities such as advanced query acceleration, which can improve the performance of slow data sources. Data virtualisation solutions also provide sophisticated data catalogues and can build rich semantic layers into the data-access layer so that different consumers can access the data in their chosen form.

Myth 2: Data Virtualisation Overwhelms the Network

The data sources used in analytics architectures typically contain exceptionally large volumes of data. This is especially true as data generation has been increasing exponentially.

One might think that data virtualisation platforms will always need to retrieve large data volumes through the network, especially when they are federating data from several data sources in the same query, which would heavily tax query performance.

The fact is, the query acceleration capabilities of data virtualisation platforms mentioned above also minimise the amount of data flowing through the network. These techniques offer the dual advantage of improving performance while reducing network impact, which frees up the system to accommodate a heavier query workload.

This is possible due to advances in the query execution engines of data virtualisation, which act like coordinators that delegate most of the work to the applicable data sources. If a given data source is capable of resolving a given query, then all of the work will be pushed down to that source.

However, if the query needs data from multiple source systems, the query execution engine will automatically rewrite the query so that each source performs the applicable calculations on its own data before channelling the results to the data virtualisation platform. As a result, far less data travels over the network than it did with the early incarnations of federation tools.
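The pushdown idea described above can be illustrated with a minimal sketch. It is not how any particular product implements its query engine; it only assumes two in-memory SQLite databases standing in for remote sources, and the names make_source, naive_totals and pushdown_totals are hypothetical. The count of rows fetched is used as a rough proxy for network traffic.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def naive_totals(sources):
    """Early-federation style: pull every row, then aggregate centrally."""
    totals, rows_moved = {}, 0
    for conn in sources:
        for region, amount in conn.execute("SELECT region, amount FROM sales"):
            rows_moved += 1
            totals[region] = totals.get(region, 0) + amount
    return totals, rows_moved


def pushdown_totals(sources):
    """Pushdown style: each source aggregates its own data first."""
    totals, rows_moved = {}, 0
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for conn in sources:
        for region, subtotal in conn.execute(query):
            rows_moved += 1
            # The coordinator only merges the per-source partial sums.
            totals[region] = totals.get(region, 0) + subtotal
    return totals, rows_moved


source_a = make_source([("APAC", 10), ("APAC", 20), ("EMEA", 5), ("EMEA", 15)])
source_b = make_source([("APAC", 30), ("EMEA", 25), ("EMEA", 35), ("AMER", 40)])

naive_result, naive_rows = naive_totals([source_a, source_b])
pushed_result, pushed_rows = pushdown_totals([source_a, source_b])
print(naive_result == pushed_result, naive_rows, pushed_rows)
```

Both functions return the same totals, but the pushdown version fetches one row per region per source rather than every underlying row. With realistic table sizes, that gap is what keeps the network load manageable.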

Myth 3: Data Virtualisation Means Retrieving All Data in Real Time

With data virtualisation, the default mode for query execution is to obtain the required data in real time directly from the data source. This will often perform well, and at Denodo, this is the most common execution strategy used by our customers.

However, advanced data virtualisation platforms also support additional execution methods to further improve performance and better accommodate slow data sources.

For instance, data virtualisation can replicate specific virtual datasets in full. This can be useful for specific cases, such as providing data scientists with a data copy they can modify and work with without affecting the original data.

Today, data scientists can choose from a range of options, from zero to partial to full replication. That decision is transparent to the data consumers, and it can be changed at any time without affecting the original data source.
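To make the range from zero to full replication concrete, here is a small sketch. It assumes a plain dictionary standing in for a slow data source, and the VirtualView class and its set_replication method are hypothetical names invented for illustration, not the API of any product. The point it shows is that consumers always call the same get method, whichever replication mode is active.

```python
class VirtualView:
    """Hypothetical virtual dataset with a switchable replication mode."""

    def __init__(self, source):
        self._source = source    # dict standing in for a slow data source
        self._replica = {}       # local copy held by the virtual layer
        self.source_reads = 0    # how often the source itself was touched

    def _read_source(self, key):
        self.source_reads += 1
        return self._source[key]

    def set_replication(self, mode, keys=()):
        """Rebuild the local replica: 'none', 'partial' (given keys) or 'full'."""
        if mode == "none":
            selected = []
        elif mode == "partial":
            selected = list(keys)
        elif mode == "full":
            selected = list(self._source)
        else:
            raise ValueError(f"unknown replication mode: {mode}")
        self._replica = {key: self._read_source(key) for key in selected}

    def get(self, key):
        """Consumer-facing read; identical in every replication mode."""
        if key in self._replica:
            return self._replica[key]
        return self._read_source(key)


view = VirtualView({"q1": 100, "q2": 200, "q3": 300})
view.set_replication("partial", keys=["q1"])
print(view.get("q1"), view.get("q2"), view.source_reads)
```

Switching modes only rebuilds the replica inside the virtual layer. The source dictionary is never modified, and callers of get do not need to change.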

Next-Generation Data Virtualisation

With the explosion of data, Australian businesses struggle to empower their personnel with the right data and analytics at the right time, in the right format, to make data-driven decisions. Gartner survey data shows that 65% of decisions made are more complex than they were just two years ago.

Data is fast becoming the lifeblood of modern-day businesses that operate in increasingly digitalised environments. Many businesses turn to data virtualisation as a foundational technology to drive business performance, operational agility, and overall resilience.

Data virtualisation has evolved over the years to offer both advanced performance and advanced support. It now incorporates emerging technologies, such as artificial intelligence (AI), to automate manual functions and speed up data analysis. These capabilities effectively free up IT teams so that they can focus on innovation and other business objectives.

Business and technology leaders need to understand the true benefits and possibilities of data virtualisation. Only then can they fully appreciate the potential that the technology brings to modern analytics.
