Data evolution: Past, present and future
Article by Denodo Technologies CEO and founder Angel Viña
As enterprises have shifted to rely on data as possibly the single biggest asset driving business outcomes, the data landscape has shifted as well. Tracing the evolution of data is instructive: it makes clear that our attitude to data, and how it serves us, has changed dramatically.
In the 1980s, when enterprise data typically lived in a single database, our concept of data and its uses was built around a central repository. From there, the single database evolved into the data warehouse, which allowed data to be ingested and stored according to its source and category; but in doing so data became siloed, and the model ultimately proved inefficient.
Enterprises later began to realise the power of analytics, and early-generation analytics programs were written to sit above the data layer and derive business value from it. The data warehouse was split into mini data warehouses to avoid the bottlenecks created by one mass repository, and then evolved again into the enterprise data warehouse, with integrations and layers of analytic software built in. By the mid-2000s, Hadoop changed things once more by bringing the processing to the data rather than the data to the processing, so that data became more fluid and easily accessible.
It was also in this period, with the rise of social media and mobile applications, that we witnessed a surge in the number of disparate data sources at an enterprise's disposal. This brought far more data into the business, much of it unstructured rather than structured.
However, companies required an unobstructed view of both data sets to maximise the benefits and effectiveness of the intelligence within their enterprise. Storing all data in one place was no longer an effective strategy, as it became hard to access and even harder to analyse effectively. Yet deriving true business value from both structured and unstructured data meant spreading it across multiple stores, which introduced a host of other inefficiencies.
One source of data for finance, for example, would need its own data warehouse, as would sales, as would marketing. Much of that data would be common across several departments, so the governance of data became much more complicated as well.
Throughout this brief history of big data storage, a common theme remains constant — the need to access data quickly and effectively. This has always been the central concern for an enterprise needing fast, efficient access to data for generating reports, predicting consumer behaviours, employing predictive maintenance on key assets, and enabling strategic decision-making.
The next phase of data access in the enterprise is to move to a scenario where data resides in source systems, but is accessed virtually from one single platform. Just as virtualisation was a highly effective model for moving information and apps into the cloud, data virtualisation will be crucial in enabling the next stage of the data evolution.
The logical approach to accessing different types of data across disparate sources and systems has been to treat it as one single pool rather than to consolidate it in one repository. Data virtualisation allows enterprises to connect to this pool wherever the data happens to reside, even across distributed systems, without having to physically move it into a central repository.
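The idea above can be sketched in a few lines. This is a hypothetical, minimal illustration, not any vendor's implementation: two in-memory lists stand in for real source systems (say, a CRM database and a web-analytics feed), and a virtual view joins them on demand without ever copying them into a warehouse.

```python
# Hypothetical stand-ins for two source systems that stay where they are.
crm_source = [  # structured source, e.g. a relational CRM table
    {"customer_id": 1, "name": "Acme Ltd", "region": "EMEA"},
    {"customer_id": 2, "name": "Globex", "region": "APAC"},
]

clickstream_source = [  # semi-structured source, e.g. web events
    {"customer_id": 1, "event": "viewed_pricing"},
    {"customer_id": 1, "event": "downloaded_whitepaper"},
    {"customer_id": 2, "event": "viewed_pricing"},
]

def virtual_customer_view():
    """Join both sources at query time; no data is copied into a central store."""
    events_by_customer = {}
    for event in clickstream_source:
        events_by_customer.setdefault(event["customer_id"], []).append(event["event"])
    for row in crm_source:
        yield {**row, "events": events_by_customer.get(row["customer_id"], [])}

for record in virtual_customer_view():
    print(record)
```

The point of the sketch is that the unified view is computed when it is queried: the sources remain the systems of record, and the virtual layer simply knows how to reach and combine them.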
With data divided into structured and unstructured forms and streaming in from across disparate systems, it would be almost impossible to process without automation. Currently, enterprises generally process data at great speed, even as it is collected.
However, more than raw data, people need information in their hands in real time, or at least as close to real time as possible. Getting from data to information means processing it and converting it into intelligence and business insights even as more data streams into the business.
Now that artificial intelligence and machine learning are becoming fundamental parts of business functionality, automating the value derived from data makes a lot of sense to speed up processes and remove inefficiencies.
Beyond this, data virtualisation creates intelligence through the metadata layer, which allows businesses to understand and respond to changes occurring in source systems, and to make that aggregated data available to end users. Virtualisation can also provide information on the lineage of data, indicating how it has been transformed over time, where it has come from, and how it has been combined with other data sources along the way.
This in turn can provide deep insights into core business functions such as sales and marketing: mapping and preventing customer churn, showing what products customers have purchased in the past, whether they hold warranties for those products, or whether they have upgraded services at any stage, to name a few use cases.
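The lineage capability described above can be sketched as a small metadata structure in which each derived dataset records its upstream sources and the transformation that produced it. The names and shape here are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Illustrative metadata record: a dataset, its sources, and how it was derived."""
    name: str
    sources: list = field(default_factory=list)  # upstream Dataset objects
    transformation: str = ""                     # how this dataset was produced

    def lineage(self):
        """Walk upstream and return every ancestor dataset name, nearest first."""
        ancestors = []
        for src in self.sources:
            ancestors.append(src.name)
            ancestors.extend(src.lineage())
        return ancestors

# Hypothetical example: a churn score derived from a joined customer view.
crm = Dataset("crm_customers")
web = Dataset("web_events")
joined = Dataset("customer_360", sources=[crm, web],
                 transformation="join on customer_id")
churn = Dataset("churn_scores", sources=[joined],
                transformation="ml scoring")

print(churn.lineage())  # ['customer_360', 'crm_customers', 'web_events']
```

Because every derived dataset carries this record, a user looking at the churn scores can trace exactly which source systems contributed to them and through which steps.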
Automated, virtualised data also catalogues this information across all systems, documenting changes so that the organisation does not have to reconcile them manually. This can give users across the business insight into data as events occur. Hot ticket items suddenly trending online? Retailers can set an alert, adjust prices, or order more stock, reacting in real time to what their business intelligence reveals.
The data landscape has changed, and it is clear that the virtualisation of data is the next stage of its evolutionary journey. We have moved beyond physically moving data into one large repository, and beyond the costs associated with that process. This translates into reduced overheads, coupled with the time savings afforded by a more efficient system.
Combine that with a better, more holistic view of data, and the ability to assimilate information in real time as it streams into the organisation, and the business benefits become obvious. Virtualisation of data is the next vital step towards running a faster, more efficient and cost-effective enterprise.