Article written by MapR senior vice president for data and applications Jack Norris
One of the greatest challenges faced by today’s Chief Data Officers (CDOs) is understanding the value of their data. According to the 2017 Gartner Chief Data Officer Survey, the top internal roadblock to the success of a CDO is “culture challenges to accept change”, cited by 40% of respondents, while “poor data literacy” was the second biggest challenge (35%). Together these suggest a general misunderstanding of the value of data and analytics, and of their link to business outcomes.
This inability to understand the value of data is often caused by a misplaced focus.
Many organisations treat data as their greatest asset and will go to great lengths to accumulate data in vast warehouses or lakes. These data stores are then queried to produce reports, with the intent of gaining historical insights that incite change and deliver business impact.
However, the first problem is that these reports don't give comprehensive insights across the entire organisation and are useful only to select users of the data. The second problem is that historical data shows only where the organisation or customer has been, whereas real-time data shows where the organisation is going right now.
It is therefore clear that the CDO needs a different perspective: data should be viewed not as an asset in and of itself, but as a raw material that can be leveraged. That raw material should be used not to provide further insight into how a business operated and performed, but to impact the business while it is happening. This redirected focus puts the emphasis on data flows rather than data stores, and on streams rather than lakes or warehouses.
A CDO today needs to focus on the following three keys to success:
As a 2016 Deloitte report on "Flow and the Big Shift in Business Models" suggests, businesses need to shift their focus from the size of the data being collected to how they might interact with that data as it is generated.
Furthermore, understanding the nature and power of data flows will be the primary task in computing over the next decade.
It wasn’t long ago that networks were separate and, as a result, network traffic wasn’t easily shared across a building, let alone across the world. The term sneakernet referred to the process of moving data manually, on physical media, from one computer to another.
In the days of sneakernet, the term data flows could easily have described the transit time between separate data stores. Moreover, it is still a common mistake to confuse data ingestion with the data flows that drive transformational value.
However, we are no longer talking about the transit time between separate data silos, the sneakernet equivalent for data. Data flows now refer to the ability to support a diversity of application processing and analytics on a common data infrastructure – a “Data Fabric”. Understanding this shift will enable CDOs to build a solid foundation from which they can make a greater impact within their organisation.
IT architectures of the past were largely dictated by the volume of their data. The more compact the data, the smaller the hardware platform that was required, and the lower the cost. From PCs to servers and mainframes, the costs of architecture would increase dramatically as they handled larger data stores.
Speed was also once a benefit of handling smaller sets of data. As a result, each application or "use" would come with its own specialised data structure, driving certain activities and requiring extraction and transformation processes from data sources.
A data fabric significantly simplifies this architecture, as well as its modernised forms. It includes a variety of data formats at scale, covering both data in motion and data at rest. It is not limited to a rack or a building; instead, a data fabric stretches from the edge to the cloud.
Instead of a sequence of overlapping data structures popping up across an organisation, a data fabric can provide a common data platform that serves the needs of a diverse set of applications. The fabric improves continuity and efficiency within an organisation’s IT infrastructure, allowing CDOs to understand and command the value of their data more easily.
Understanding the context and actions of customers, competitors and ecosystem partners helps ensure that the most appropriate changes can be made and the greatest competitive advantages can be reaped.
No longer should businesses be using batches of old, collated analytics. Data flows and real-time applications should be used to inject analytics into business functions as they happen, to deliver immediate impacts. With the support of a data fabric, a business can optimise revenue as a customer engages, minimise risk as threats occur, and improve efficiency and quality as operations continue.
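The difference between batch reporting and acting on a data flow can be illustrated with a minimal Python sketch. It is not tied to any particular data fabric product: the Event type, the batch_report and stream_alerts functions, and the spending threshold are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Event:
    """Hypothetical customer event, e.g. a purchase."""
    customer: str
    amount: float


def batch_report(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: collect everything first, then summarise what happened."""
    totals: Dict[str, float] = {}
    for event in events:
        totals[event.customer] = totals.get(event.customer, 0.0) + event.amount
    return totals


def stream_alerts(events: Iterable[Event], threshold: float) -> Iterator[Tuple[str, float]]:
    """Streaming style: update state per event and react as soon as a
    customer's running total reaches the (assumed) threshold."""
    totals: Dict[str, float] = {}
    for event in events:
        totals[event.customer] = totals.get(event.customer, 0.0) + event.amount
        if totals[event.customer] >= threshold:
            # Act while the customer is still engaged, then reset the counter.
            yield event.customer, totals[event.customer]
            totals[event.customer] = 0.0


# Made-up sample data, for illustration only.
SAMPLE_EVENTS = [
    Event("alice", 60.0),
    Event("bob", 10.0),
    Event("alice", 50.0),
    Event("bob", 5.0),
]

if __name__ == "__main__":
    print("Batch report:", batch_report(SAMPLE_EVENTS))
    for customer, total in stream_alerts(SAMPLE_EVENTS, threshold=100.0):
        print("Act now for", customer, "at running total", total)
```

Both functions perform the same aggregation. The batch version returns its answer only after every event has been read, while the streaming version yields an action at the moment the triggering event arrives, which is the behaviour the paragraph above argues for.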
This underlying data fabric provides an enterprise-grade persistence layer for broad data sources including files, tables, streams, videos, sensor data and more. The data fabric supports a converged processing layer for file operations, database functions, data exploration and stream processing.
It also supports automated processing as well as traditional SQL for existing analysis and reporting needs. Additionally, the platform provides open access to enable more sophisticated machine learning and AI as an organisation's needs and complexity evolve.