How IoT data collection & aggregation with local event processing work
Sat, 2nd Dec 2017

Trying to predict how many Internet of Things (IoT) devices will go online over the next decade is like trying to predict the growth rate of rabbits in the wild.

Suffice it to say that 2017 could be the year in which IoT devices exceed the total human population: Gartner forecasts 8.4 billion connected IoT devices, or one device for each of the world's 7.5 billion people, with just under a billion to spare.

As companies wrap their heads around the loads of data coming from billions of IoT devices, they are finding that collecting, processing and analysing IoT data is more complex than handling traditional data. Consider that every connected vehicle on the road is, in effect, a mobile IoT device.

For driverless cars alone, that could translate into roughly 47 megabytes of data generated per second, per car!
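
To put that rate in perspective, here is a quick back-of-the-envelope calculation (a minimal Python sketch, assuming the quoted 47 MB per second is sustained) showing why backhauling everything to a central cloud quickly becomes impractical:

```python
# Back-of-the-envelope: data volume from a single driverless car,
# assuming a sustained output of 47 MB per second (the quoted figure).
rate_mb_per_sec = 47

gb_per_hour = rate_mb_per_sec * 3600 / 1000    # ~169 GB every hour
tb_per_8h_day = gb_per_hour * 8 / 1000         # ~1.35 TB per 8-hour driving day

print(f"~{gb_per_hour:.0f} GB/hour, ~{tb_per_8h_day:.2f} TB per 8-hour day")
```

At well over a terabyte per car per day, shipping everything raw over long-haul links is a losing proposition.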

IoT traffic is intermittent and highly unstructured. It is diverse in frequency, function and bidirectionality. To complicate matters further, businesses typically need the insights it holds delivered in real time.

Real-time analytics becomes even harder to achieve when you take into account the slow transfer of larger, more complex data flows to the cloud over less flexible, long-haul networks.

Such networks can't scale to meet demand because of limited bandwidth and high-latency connectivity. Increasing interconnection to IoT devices at the edge and providing proximate, high-bandwidth, low-latency connections can address the issues this “data diversity” presents.

What's happening with this data deluge?

The distributed sources of all this data, which are remote from the majority of corporate data centres, are creating forces that every business must come to terms with. These include:

  • The growth of IoT workflow traffic and data at the edge is creating “data gravity,” where more data attracts more data, as well as applications and analytics that need to interact with that data. This is driving the need to reduce the distance between data, applications and analytics to gain more timely results.
  • Latency is the biggest problem most businesses face. Beyond a certain point, adding more costly bandwidth no longer increases data-flow velocity; the laws of physics set a floor on latency.
  • The ubiquitous nature of the IoT and related field use cases is driving greater simplicity out at the edge. It's easier to put automated mechanisms for system updates in 10 places for 10,000,000 things than in 10,000,000 places.
  • Not all actions required to process or analyse data are equal. Staging data processing allows a more balanced approach (e.g., Level 2 data does not need processing as heavy as Level 5 data).
  • The bidirectional nature of IoT workflows means aggregated data can be sent to an edge device instead of backhauling it to a centralised staging repository.

These would be pretty basic concepts to address if it weren't for the inherent constraints within legacy IT infrastructures when dealing with IoT data, including:

  • Data from the field isn't the same as prepared data—it tends to be messy, intermittent, unstructured and somewhat dynamic (especially if the source device is mobile).
  • The volume of devices, variety of sources, frequency of samples and the data itself are all growing and will not scale using traditional approaches to computing, storing and transporting the resulting data files.
  • In many use cases, latency matters. Delays between a data event and a reaction must be near real-time. Throughput degrades as data volumes grow, and delays will become unacceptable to users.
  • Traditional approaches of centralising all data and running analytics in the cloud are unsustainable for real-time use cases. Cloud-based data and analytics can also pose security challenges, as they sit physically outside the data centre's security perimeter.
  • As machine learning capabilities become more commonplace in devices out in the field, those devices become more complex, requiring greater CPU and memory and drawing more power. Increasing complexity slows processing down and leads to results being discarded: by the time results are gathered from a device, more recent data is already desired.

Companies need a way to combine data collection and aggregation with local event processing. This would optimise streaming data flows and edge analytics, resulting in faster time to insight and greater overall value. In addition, they need to participate in new and expanding IoT ecosystems by interconnecting with partners and exchanging data for optimal business results.

How IoT workflows and analytics work

The IoT is the new normal in enterprise data processing environments. Where software is fast, “things” are slow. An IoT solution must satisfy many requirements as a collective system; interoperability and openness, for example, are two key design elements. IoT workflows are separated into two general types of functions (sketched in code after the list):

  1. Device management, which includes device-centric requirements such as registering devices, updating operating shells and systems, and authenticating identity and access.
  2. Event processing, which involves data events or data polling events. This includes the actual delivery of targeted data points produced by the IoT devices to their ultimate destinations.
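
As a rough illustration of that split, consider the following sketch (hypothetical; the type and function names are illustrative, not drawn from any particular IoT platform), in which incoming traffic is dispatched to one of the two workflow functions:

```python
from dataclasses import dataclass
from enum import Enum, auto

class WorkflowType(Enum):
    DEVICE_MANAGEMENT = auto()  # registration, shell/system updates, identity and access
    EVENT_PROCESSING = auto()   # telemetry events and polled data points

@dataclass
class IotMessage:
    device_id: str
    kind: WorkflowType
    payload: bytes

def handle_device_management(msg: IotMessage) -> None:
    # Register the device, authenticate its identity, or queue an update.
    print(f"device-mgmt: {msg.device_id}")

def handle_event_processing(msg: IotMessage) -> None:
    # Deliver the data points the device produced to their ultimate destination.
    print(f"event: {msg.device_id} ({len(msg.payload)} bytes)")

def dispatch(msg: IotMessage) -> None:
    """Route each incoming message to the appropriate workflow function."""
    if msg.kind is WorkflowType.DEVICE_MANAGEMENT:
        handle_device_management(msg)
    else:
        handle_event_processing(msg)

dispatch(IotMessage("car-042", WorkflowType.EVENT_PROCESSING, b'{"speed_kph": 62}'))
```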

The IoT is also heavily dependent on interconnected partnerships and ecosystems that rely on direct, private and secure connections between counterparties.

In essence, the IoT offers a balance between the cloud and the edge with two distinctly different frequencies of data flows that have two distinct paths: Hot (i.e., real-time telemetry) and Cold (i.e., finding patterns within a data warehouse) workflows, as illustrated in the diagram below.

An IoT Edge-to-Cloud Workflow

IoT analytics combines insights obtained the traditional way, by mining a data warehouse for patterns, with real-time telemetry of data points provided by individual IoT devices in the field.
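
A minimal sketch of that hot/cold split follows (all names and the alert threshold are illustrative assumptions): every reading is appended to the cold path for later pattern mining, while hot-path readings trigger an immediate local reaction:

```python
import time

ALERT_THRESHOLD = 0.9        # assumed cut-off for readings needing an immediate reaction

cold_store: list[dict] = []  # stand-in for a data-warehouse sink

def react(reading: dict) -> None:
    # Hot path: act on real-time telemetry as close to the source as possible.
    print(f"{time.strftime('%X')} alert from {reading['device_id']}: {reading['value']}")

def ingest(reading: dict) -> None:
    """Send one telemetry reading down the cold and, if needed, hot paths."""
    cold_store.append(reading)               # cold path: retain everything for mining
    if reading["value"] > ALERT_THRESHOLD:   # hot path: react in real time
        react(reading)

ingest({"device_id": "car-042", "value": 0.95})
```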

Device management design patterns are also dependent on the frequency and source of interconnected components.

A by-product of the edge IoT device's bidirectional data flow is the opportunity to process analytics locally, in close proximity to the workload.

As a result, you can combine the aggregated insights produced by cloud-based analytics with the data from other devices and systems that sit close to, and interoperate with, the IoT devices creating the data points.

Enterprise-grade scaling can be achieved by adding digital edge node aggregation points (i.e., geographically distributed interconnection hubs) and placing them in regions where local cloud compute resources reside.

If leveraging cloud resources is not an option, an alternative is to add more distributed data processing units within a digital edge node, creating regionally focused, distributed analytics engines that bring compute power closer to the IoT workload.
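
In practice, scaling out this way amounts to publishing each device's traffic to its nearest regional aggregation point, as in this hypothetical sketch (the region names and node registry are assumptions for illustration):

```python
# Hypothetical registry of digital edge nodes, keyed by region.
EDGE_NODES = {
    "ap-southeast": "edge-sydney",
    "ap-northeast": "edge-tokyo",
    "eu-west": "edge-amsterdam",
}

def nearest_edge_node(device_region: str) -> str:
    """Return the regional aggregation point a device should publish to."""
    return EDGE_NODES.get(device_region, "edge-global-fallback")

print(nearest_edge_node("ap-southeast"))  # -> edge-sydney
```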

Optimising streaming data flows and edge analytics

As mentioned above, placing IoT event processing and device management (firmware and updates) in one or more digital edge nodes brings IoT data and analytics closer together.

Since the digital edge node is located at the intersection point of consolidated access (e.g., cellular, broadband, internet), it is as close as possible both to data sources in the field and to the clouds.

Placing device management and IoT analytical capabilities at the edge solves latency, bandwidth and device complexity constraints.

It also provides multi-destination control and choice in the types of networks/providers you want to use, as well as a selection of cloud analytic platforms.

Participating in IoT ecosystems by connecting with partners and exchanging data also provides fast, efficient access to a wide variety of data sources and types.

As the data comes in, it is validated, authenticated, inspected, pre-processed, stored in the global namespace, and then delivered to the next downstream processing step (see diagram below).

Data Collection and Aggregation, with Local Event Processing

Leveraging an Interconnection Oriented Architecture (IOA) strategy for data collection and aggregation with local event processing allows you to optimise streaming data flows and edge analytics. Follow these steps (a code sketch of the core flow appears after the list):

  1. Establish segmentation flows from field area networks. Messages from IoT gateways (in the field) are published on the message bus; otherwise, a local IoT gateway processes them first.
  2. Boundary control validates and authenticates the source and message.
  3. Valid messages pass through an inspection zone and policy enforcement.
  4. Messages are persisted to the data repository, then delivered via IoT event processing. Downstream messages are published to the cloud analytic platform(s) of choice, or your own cloud-agnostic repository.
  5. IoT gateway “device requests” follow the same flow but are subscribed to by the device management function. Appropriate payload(s) (firmware, etc.) are loaded from the data repository and published back to the IoT gateway (or to the device directly). IoT data aggregation points in the field can also perform batch processing of the data.
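
A compressed sketch of steps 1 to 4 follows (a minimal illustration with hypothetical names; a real deployment would use an actual message bus, identity service and policy engine rather than these in-memory stand-ins):

```python
import hashlib

data_repository: list[bytes] = []       # stand-in for the global-namespace store
cloud_topic: list[bytes] = []           # stand-in for the downstream analytics bus

TRUSTED_GATEWAYS = {"gw-melbourne-01"}  # assumed allow-list used by boundary control

def boundary_control(gateway_id: str, message: bytes, signature: str) -> bool:
    """Step 2: validate and authenticate the source and the message."""
    if gateway_id not in TRUSTED_GATEWAYS:
        return False
    expected = hashlib.sha256(message).hexdigest()  # illustrative integrity check
    return signature == expected

def inspect(message: bytes) -> bool:
    """Step 3: inspection zone and policy enforcement (a size policy as a stand-in)."""
    return 0 < len(message) <= 1_000_000

def on_message(gateway_id: str, message: bytes, signature: str) -> None:
    """Steps 1-4: handle a message published from a field gateway onto the bus."""
    if not boundary_control(gateway_id, message, signature):
        return                           # rejected at the boundary
    if not inspect(message):
        return                           # rejected by policy
    data_repository.append(message)      # step 4: persist to the data repository
    cloud_topic.append(message)          # then publish downstream to the cloud

msg = b'{"device_id": "car-042", "speed_kph": 62}'
on_message("gw-melbourne-01", msg, hashlib.sha256(msg).hexdigest())
print(f"persisted: {len(data_repository)}, published: {len(cloud_topic)}")
```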

Implementing this streaming data flows and edge analytics design pattern enables the following benefits:

  • IoT edge capabilities can scale to billions of devices, at each metro location globally.
  • Collected data can be validated, authenticated and inspected before being pre-processed.
  • All data can be stored in the global namespace — no need to discard.
  • Bandwidth is provided with the greatest efficiency and lowest latency for real-time event reactions and systematic scaling.
  • A choice of cloud IoT platforms provides as much processing as needed, localised at the edge.
  • Data can be monetised and its access sold on a data exchange, generating valuable insights and revenue.

Article by Ben Towner, Equinix Blog Network