Tackling internal perceptions that SLAs aren’t performing
FYI, this story is more than a year old
Article by ThousandEyes A/NZ director Will Barrera.
Not all organisations have reached the maturity to recognise when the performance claims of public cloud don’t stack up.
Cloud is often sold on its performance credentials: that it can achieve higher levels of reliability, availability and resilience than one could achieve with on-premises infrastructure.
This has led many Australian organisations to adopt cloud-centric IT strategies, such as cloud-first, cloud-only and multi-cloud.
But, as Deloitte Australia notes, “Australia is still at the beginning of its cloud journey, both in terms of the share of businesses using cloud services, and the sophistication of their use. As these evolve, cloud-based innovation will have a growing impact for businesses across the Australian economy.”
Operating in cloud-centric environments remains a challenge.
Whole disciplines have emerged to keep various aspects of cloud operations in check. There’s FinOps to optimise cloud economics, and site reliability engineering (SRE) to bring a software mindset to infrastructure management and to aspects of meeting cloud service level agreements, such as service availability, uptime and response times.
Cloud performance is also fast becoming a discipline in its own right.
A recent survey - with one-third of respondents from A/NZ - found 57% of IT decision-makers perceive that their use of cloud results in more frequent outages.
That probably isn’t the reality, but it’s the perception, and it was most strongly held by executives, whose support is crucial not just to cloud but to the success of a range of IT initiatives.
Fuelling that perception - or at least making it difficult to refute - is the dearth of information about how different clouds perform.
An Australia-specific public cloud performance benchmark released for the first time last year measures and compares network performance between the top five cloud providers: AWS, GCP, Azure, IBM and Alibaba Cloud.
It clearly shows that not all cloud providers are equal, and that performance really depends on the unique needs and geographic regions a business cares about. Being “in the cloud” is no guarantee of a consistent experience. A year on, and against a backdrop of rising cloud use, the benchmark remains one of the few public-domain resources of its type.
All cloud providers, including Amazon, Google and Microsoft Azure, make decimal-point performance promises in the form of service level agreements (SLAs).
Availability is typically expressed in an ‘x nines’ format. A three nines SLA - 99.9% availability - allows up to about 8.8 hours of downtime a year, whereas five nines (99.999%) allows just over five minutes.
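For readers who want to check the arithmetic behind an ‘x nines’ figure, a minimal sketch in Python follows. It assumes a 365-day year and treats the SLA as a simple percentage of total time; real SLAs are often measured per month and carve out exclusions, so the function name and the yearly window here are illustrative assumptions rather than any provider's actual terms.

```python
# Illustrative only: converts an availability percentage into the
# downtime it permits over a period. Assumes a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime an availability target allows."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9),
                       ("four nines", 99.99),
                       ("five nines", 99.999)]:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label} ({pct}%): {minutes:.1f} minutes a year "
              f"({minutes / 60:.2f} hours)")
```

Passing a different `period_minutes` (for example, 30 days' worth of minutes) gives the monthly allowance, which is closer to how many SLAs are actually written.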
When cloud providers violate their SLAs, some offer service credits as compensation to users.
A 2017 survey found only a quarter of providers offered refunds or service credits for missing an SLA. Due to a lack of research since, it’s unclear how this has changed.
There is certainly awareness among Australian organisations about the ability to claim service credits on the odd occasions where public clouds have seriously fallen over. But there’s little data out there on the success of service credit claims.
Of course, large outages are easier to spot. It may be that services suffer a series of much shorter brownouts that go under-reported (or undisclosed) on official status pages. Therefore it could work in an organisation’s favour to have performance monitoring that does not rely on the official status notifications of their cloud providers.
Only organisations that take a disciplined approach to cloud performance management are likely to recognise the quantum of otherwise minor performance glitches.
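As a rough illustration of how independently collected measurements could surface short brownouts, the sketch below takes the results of a synthetic check run once a minute and reports both the measured availability and the windows of consecutive failures. Everything in it is an assumption made for illustration: the `Probe` record, the one-minute interval and the sample data are hypothetical, and a real monitoring platform would collect far richer data from multiple vantage points.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One hypothetical synthetic check result."""
    minute: int  # minutes since the start of the measurement window
    ok: bool     # True if the check succeeded


def measured_availability(probes):
    """Percentage of probes that succeeded."""
    if not probes:
        raise ValueError("no probe data")
    good = sum(1 for p in probes if p.ok)
    return 100 * good / len(probes)


def brownouts(probes):
    """Group consecutive failed probes into (first_minute, last_minute) windows."""
    windows = []
    start = None
    last_failed = None
    for p in sorted(probes, key=lambda p: p.minute):
        if not p.ok:
            if start is None:
                start = p.minute
            last_failed = p.minute
        elif start is not None:
            windows.append((start, last_failed))
            start = None
    if start is not None:
        windows.append((start, last_failed))
    return windows


# Made-up sample: one probe a minute for a day, with two short brownouts.
probes = [Probe(m, ok=not (100 <= m < 103 or 900 <= m < 902))
          for m in range(1440)]

if __name__ == "__main__":
    print(f"Measured availability: {measured_availability(probes):.3f}%")
    for first, last in brownouts(probes):
        print(f"Brownout from minute {first} to minute {last}")
```

In this made-up sample, five failed minutes in a single day are enough to pull measured availability below a three nines target for that day, even though neither incident would necessarily appear on a status page.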
Tackling Cloud SLAs head-on
One issue is that when it comes to enforcing cloud SLAs, most IT teams lack the necessary visibility into cloud provider performance metrics that impact the delivery of their services.
Without good visibility, it’s next to impossible to isolate performance issues between your organisation, cloud service providers and end users - and therefore to hold vendors accountable for meeting expected performance standards.
Organisations that rely on a hybrid cloud or multi-cloud strategy, in particular, can find it difficult to determine which service is causing an issue due to the complexity of their deployment. This results in increased costs, as staff waste precious resources trying to determine the source of each issue.
Digital experience monitoring solutions can help bridge that gap. They create a window into an organisation’s entire IT ecosystem and, when paired with procurement best practices, can help ensure infrastructure is performing, is cost effective, and is responsive to end user needs.
Cloud users are also advised to create a continuous lifecycle approach to monitoring their entire network, incorporating regular monitoring as part of the deployment of new applications and cloud components.
The ultimate goal is to develop a shared understanding of how each service provider impacts your organisation’s network operations and security posture. This makes it much easier to address any issues - and internal perceptions - as they occur.