Historically, large-scale datasets (petabytes and beyond) have been concentrated in centralized corporate data centers, which can create data concentration risk.
However, the complexity of these environments, and the interactions among data, people, systems, applications, analytics and clouds that lie outside these data centers, can degrade performance and quality.
What’s going on with all this data?
First, data is growing exponentially - from gigabytes to terabytes to petabytes, and now approaching zettabytes. More data is now generated each year than already exists in storage.
Second, data is no longer centralized.
Its creation and processing have shifted to the edge, where distributed data growth and consumption occurs.
This increasing data gravity at the edge is attracting associated applications and services, such as analytics tools drawing real-time insights from multiple, distributed data sources.
The physics of latency, along with ever-increasing bandwidth costs, make backhauling all this data to a centralized data center, far from these applications and services, unsustainable.
Finally, businesses that were once product-driven are becoming increasingly data-driven, as data becomes more valuable to businesses, customers and competitors, and a more desirable target for bad actors.
As more and more data is created, accumulated, updated and accessed at the edge, businesses need tools that can manage data segmentation both logically and geographically, to prevent unmanageable data sprawl at the edge.
What’s impeding optimal data access and management at the edge?
A number of obstacles prevent digital businesses from fully leveraging their data capabilities, including:
There has to be a better solution
In our last article in this series, Data is Pouring Over the Edge, we described using a digital edge node, as part of an Interconnection Oriented Architecture (IOA) strategy, to act as an interconnection hub, tailored for local or shared data services at specific geographic locations.
An IOA framework leverages distributed digital edge nodes to create distributed data repositories that can be controlled, improving performance and security in multiple edge locations, while optimizing wide area networking.
Using distributed storage platforms that apply erasure coding, you can deploy a single-namespace data service (optimized for high availability and data protection) across all edge node locations. This makes your data effectively immutable, shielding it against human error and data corruption, and dramatically increases availability.
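To make the erasure-coding idea concrete, here is a minimal sketch using a single XOR parity shard, which lets the service rebuild any one lost shard from the survivors. This is a teaching simplification: production platforms use Reed-Solomon codes that tolerate multiple simultaneous failures, and the function names here are illustrative, not any vendor's API.

```python
def encode(data: bytes, k: int = 4) -> list:
    """Split data into k equal shards plus one XOR parity shard."""
    # Pad the data so it divides evenly into k shards.
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]

    # The parity shard is the byte-wise XOR of all data shards.
    parity = bytearray(shard_len)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    shards.append(bytes(parity))
    return shards

def reconstruct(shards: list) -> list:
    """Rebuild a single missing shard (marked None) by XOR-ing the rest."""
    missing = shards.index(None)
    shard_len = len(next(s for s in shards if s is not None))
    rebuilt = bytearray(shard_len)
    for shard in shards:
        if shard is None:
            continue
        for i, b in enumerate(shard):
            rebuilt[i] ^= b
    shards[missing] = bytes(rebuilt)
    return shards
```

Because any one shard can be lost - a device, a rack, or an entire edge location - without losing data, the service gets durability comparable to full replication at a fraction of the storage cost.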
The data service can use policy-based controls to address logical and geographical data segmentation, and can support cloud providers' large-scale, multi-tenant data services (e.g., private cloud storage, object storage).
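A policy-based placement control might look like the following sketch: a policy restricts data to a set of allowed regions (for geographic segmentation) and caps how many shards any one region may hold, so that losing a region never costs more shards than the erasure code can rebuild. The names (`PlacementPolicy`, `place_shards`, the region labels) are hypothetical, not a specific product's API.

```python
from itertools import cycle

class PlacementPolicy:
    """Geographic segmentation policy: where data may live, and how much."""
    def __init__(self, allowed_regions: list, max_shards_per_region: int):
        self.allowed_regions = allowed_regions
        self.max_shards_per_region = max_shards_per_region

def place_shards(shard_ids: list, policy: PlacementPolicy) -> dict:
    """Spread shards round-robin across allowed regions, honoring the cap."""
    placement = {region: [] for region in policy.allowed_regions}
    regions = cycle(policy.allowed_regions)
    for shard in shard_ids:
        # Try each region once; place the shard in the first with capacity.
        for _ in range(len(policy.allowed_regions)):
            region = next(regions)
            if len(placement[region]) < policy.max_shards_per_region:
                placement[region].append(shard)
                break
        else:
            raise ValueError("policy cannot accommodate all shards")
    return placement
```

For example, a policy allowing only EU regions would keep regulated data inside the EU while still spreading shards across locations for resilience.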
You can geographically place data nodes (Data Hubs) in each edge node, as well as in cloud environments.
Built-in algorithms interpret policies and store the actual data in a way that protects it from device, location or even regional failures and breaches, without losing data or access capabilities.
This strategy offers far more protection than a data “copy” approach and uses much less storage. Data services can also be optimized for integration, supporting multiple interfaces (e.g., web, APIs, file system) as part of the data abstraction layer across multiple underlying technologies realized with the IOA framework (see diagram below).
A Distributed Data Repository Infrastructure with Global Namespace
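The multi-interface idea in the data abstraction layer can be sketched as one shared namespace fronted by several access styles - here, an object-style put/get interface and a file-system-style path interface over the same in-memory store. All class and method names are illustrative assumptions; a production layer would front distributed Data Hubs rather than a dictionary.

```python
class Namespace:
    """A single shared namespace backing all interfaces.

    Stand-in for the distributed, erasure-coded store; here just a dict.
    """
    def __init__(self):
        self._objects = {}

class ObjectInterface:
    """Object-storage-style access: flat keys, put/get."""
    def __init__(self, namespace: Namespace):
        self.namespace = namespace

    def put(self, key: str, value: bytes):
        self.namespace._objects[key] = value

    def get(self, key: str) -> bytes:
        return self.namespace._objects[key]

class FileInterface:
    """File-system-style access: slash-prefixed paths over the same data."""
    def __init__(self, namespace: Namespace):
        self.namespace = namespace

    def write(self, path: str, data: bytes):
        self.namespace._objects[path.lstrip("/")] = data

    def read(self, path: str) -> bytes:
        return self.namespace._objects[path.lstrip("/")]
```

Because both interfaces resolve to the same namespace, an application writing through one (say, a file path) can be read by another through a different interface (an object key), without copying data between silos.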
To create a distributed data repository at the edge, follow these steps:
The benefits of a distributed data repository at the edge
A distributed data service with centralized management addresses the following constraints:
It also enables the following benefits:
Examples of a distributed data repository's uses include: a shared drive, package distribution for applications/containers, a logging repository, and a staging area for analytics.
Article by Olu Rowaiye, Equinix blog network