Streaming services and lessons for unrivalled customer experience
"To produce attractive content in a future shaped by digitalisation, first-class technological capabilities are a necessity".
This was the succinct message from researchers at Deloitte when predicting the TV and video industry landscape of 2030. They issued a clarion call to established broadcasters and content producers to "constantly invest in their digital competence, because technology has become a core element of their business processes."
There is a common denominator, a rarely mentioned 'secret sauce', amongst streaming giants like Netflix, Disney+ and HBO Max. They are generally recognised as digital content pioneers, not "observability leaders". But a closer look under the hood reveals otherwise.
The entertainment giants have long realised that observability is pivotal to preventing downtime and outages while increasing viewers' loyalty and adoption. Their experience offers valuable lessons for established and upcoming players alike.
Using observability to create a best-in-class customer experience starts with a data-driven approach. The big streaming services take a very deliberate path in defining the measurable attributes of a great customer experience.
This involves determining the benchmarks to achieve and then creating linkages between these benchmarks and all of the systems and services required to deliver that experience.
As with most software investments, the best plan is to deliver incremental value over time, then optimise the investment using information gleaned from each increment. Observability is no different.
For many large streaming service providers, initial observability investments have focused on key customer metrics, including video start-time and buffering percentage.
By examining these metrics, and their relationship to the various systems and networks required to deliver a strong customer experience, from the outset, streaming services have honed their observability practice. In the process, they have created a wider organisational understanding of the value of observability to both the top and bottom lines.
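As a rough illustration of how these two metrics might be derived, here is a minimal sketch. The `PlaybackSession` fields and the exact definitions are assumptions made for this example; individual providers define start time and buffering in their own ways.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSession:
    """Hypothetical per-session timings, all in seconds."""
    play_requested_at: float  # viewer pressed play
    first_frame_at: float     # first video frame rendered
    ended_at: float           # session finished
    stall_seconds: float      # total time spent rebuffering after start


def video_start_time(session: PlaybackSession) -> float:
    """Seconds between pressing play and seeing the first frame."""
    return session.first_frame_at - session.play_requested_at


def buffering_percentage(session: PlaybackSession) -> float:
    """Share of the post-start session spent stalled, as a percentage."""
    watch_window = session.ended_at - session.first_frame_at
    if watch_window <= 0:
        return 0.0
    return 100.0 * session.stall_seconds / watch_window


# Example: a ten-minute session with 12 seconds of rebuffering.
session = PlaybackSession(
    play_requested_at=0.0,
    first_frame_at=1.5,
    ended_at=601.5,
    stall_seconds=12.0,
)
print(video_start_time(session))      # 1.5
print(buffering_percentage(session))  # 2.0
```

Values like these only become useful once they are tagged with the device, network and service involved, which is the linkage between benchmarks and systems described above.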
The importance of ubiquitous observability
Another important aspect is making observability ubiquitous in two distinct dimensions.
The organisation: tools, teams, processes and training are put in place to give every engineer access to observability tools. It is much easier to realise value if the entire ecosystem is instrumented, and much easier to instrument it if developers are given a set of easy-to-use, standardised tools that make observability a standard part of the Software Development Lifecycle (SDLC).
Data: the best approach is to initially ingest as much data as possible from existing systems, in order to learn which data is valuable and which can later be removed. Sampling from the start risks missing important relationships and can make complex issues impossible to isolate.
Logs, distributed tracing fundamental to success
One of the most fundamental rules of observability for many streaming providers is recording trace IDs in logs.
In a digital world where a single user experience is created by orchestrating a large number of cloud services, the ability to correlate the experience across services and drill down into logs to isolate and resolve issues is a significant advantage and a powerful productivity tool.
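The idea of recording trace IDs in logs can be sketched with Python's standard `logging` and `contextvars` modules. The names `current_trace_id`, `TraceIdFilter` and `handle_request`, and the fixed ID `abc123`, are invented for this example; in practice a tracing library would normally supply the ID.

```python
import contextvars
import io
import logging

# Holds the trace ID of the request currently being handled (hypothetical name).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Create a logger whose output lines always carry a trace_id field."""
    logger = logging.getLogger("playback")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, trace_id: str) -> None:
    """Set the trace ID for the duration of one request, then restore it."""
    token = current_trace_id.set(trace_id)
    try:
        logger.info("manifest requested")
        logger.info("first segment served")
    finally:
        current_trace_id.reset(token)


log_output = io.StringIO()
logger = build_logger(log_output)
handle_request(logger, trace_id="abc123")
print(log_output.getvalue())
```

Because every line carries the same `trace_id`, an engineer can start from a slow trace and pull up exactly the log lines written while that request was being served.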
For example, Netflix has created an internal tool called Edgar that allows for the service relationships to be displayed and highlights the service parts that are correlated with issues to allow faster problem resolution.
Other streaming giants use similar types of functionality in observability platforms, including aggregate views through Workloads, Galaxy and PathPoint, and automatic, zero-configuration dashboards such as Lookout that bring possible anomalies to the attention of engineers.
When combined with logs and distributed tracing, these tools allow teams to understand how different services and components impact the customer experience much more efficiently.
Putting in place a real-world-ready SDLC
Another important observability principle involves making the software development lifecycle as close to production and the real world as possible.
One sub-principle of this is that observability should be built into a technology platform from the beginning, starting with the developer's integrated development environment (IDE) and running through each of the environments leading up to production.
The second sub-principle is chaos engineering: a deliberate strategy of faithfully reproducing real-life situations during development by introducing all of the random ugliness that occurs in the real world.
Netflix started this off in earnest with the introduction of an internally developed tool called Chaos Monkey that randomly "took out" servers, thereby forcing a development mindset focused on resilience under pressure.
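The principle can be illustrated with a toy simulation. This is not how Chaos Monkey itself is implemented; the `Instance` class, the pool and the failover function are all hypothetical stand-ins that show the shape of the experiment: remove a random server, then check that requests are still served.

```python
import random


class Instance:
    """Hypothetical stand-in for a server in a pool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def serve(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def terminate_random_instance(pool: list[Instance], rng: random.Random) -> Instance:
    """Chaos step: mark one randomly chosen healthy instance as down."""
    victim = rng.choice([i for i in pool if i.healthy])
    victim.healthy = False
    return victim


def serve_with_failover(pool: list[Instance], request: str) -> str:
    """Resilience under test: try each instance until one answers."""
    for instance in pool:
        try:
            return instance.serve(request)
        except ConnectionError:
            continue
    raise RuntimeError("no healthy instances left")


rng = random.Random(7)  # seeded only to make the sketch repeatable
pool = [Instance("edge-1"), Instance("edge-2"), Instance("edge-3")]
victim = terminate_random_instance(pool, rng)
response = serve_with_failover(pool, "GET /manifest")
print(f"terminated {victim.name}; {response}")
```

If `serve_with_failover` raised here, the experiment would have exposed a resilience gap before a real outage did.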
In combination with full-stack observability frameworks, this ubiquitous, real-world approach results in better, more reliable systems and a much happier customer.
In the streaming world, a quality customer experience also relies on partners and systems outside an organisation. These include Content Delivery Network (CDN) providers, mobile and broadband networks, Digital Rights Management (DRM) providers and a host of other services.
Partnering with businesses that can integrate these external signals into a framework that encompasses a wider streaming ecosystem will enable the full competitive advantage of observability to be realised.