Selecting a cloud is often based on its suitability to run certain types of workloads, or for commercial reasons such as introducing competitive tension, but neither is an area where cloud journeys typically come unstuck, writes Mike Hicks, Principal Solutions Analyst, Cisco ThousandEyes
Cloud adoption has reached a stage of maturity in many enterprises. While maturity comes with time and experience, it’s not uniformly visible across all cloud domains.
We see maturity in migration pathways and pipelines, and in some automated operations, for example, even if automated detection of, and recovery from, anomalous events does not always go to plan.
At the same time, there is relative immaturity in other areas, such as cost and performance management. While cloud cost containment dominated 2022 strategies, 2023 is shaping up as the year where a performance-based lens is applied to cloud-based architecture and configuration decisions.
Today, selecting a cloud is often based on its suitability to run certain types of workloads; for commercial reasons, such as creating competitive tension between multiple cloud providers to disincentivise sudden cost increases; or for sovereignty reasons, storing data within an established border and its associated laws.
The focus is very much on the cloud service provider and its immediate capabilities.
But to get a more complete cloud experience, the scope of this decision-making needs to expand to the infrastructure that supports cloud environments, in addition to the cloud architecture itself.
This aligns to the experiences of many cloud customers, who have seen three main consequences of their cloud decision-making materialise to date.
Consequence 1: Change is the only constant
In cloud environments, change is the only constant, and that is keeping application, operations and infrastructure teams on their toes.
No one expects distributed applications, or the infrastructure they run on, to be static and set-and-forget, but at the same time, teams may not have factored in the amount of change these applications are exposed to.
These aren’t changes made by the application owner. The modular, API-centric nature of modern applications and the broad adoption of business SaaS have created a vast web of interdependence, with cloud at its centre. Changes to APIs, services or third-party code libraries upon which these component services or applications rely can - and routinely do - break the applications.
This drives the realisation that there’s simply no steady state in the cloud. Teams need visibility and oversight to stay on top of the dynamic nature of the cloud if they are to continue to accrue benefits, and not just additional costs.
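One lightweight way to catch this kind of silent breakage is to snapshot the shape of a third-party API response and diff it against the live response on each poll. The sketch below is illustrative only - the payloads and field names are hypothetical, not taken from any particular provider:

```python
# Detect breaking changes in a third-party API response by comparing
# (field path, type) pairs against a previously recorded snapshot.
# Payloads here are hypothetical examples for illustration.

def shape(payload, prefix=""):
    """Flatten a JSON-like dict into a set of (path, type-name) pairs."""
    pairs = set()
    for key, value in payload.items():
        path = f"{prefix}{key}"
        pairs.add((path, type(value).__name__))
        if isinstance(value, dict):
            pairs |= shape(value, prefix=path + ".")
    return pairs

def breaking_changes(baseline, current):
    """Fields present in the baseline but now missing or re-typed."""
    return shape(baseline) - shape(current)

# Hypothetical: the provider quietly changed a numeric field to a string.
baseline = {"id": 1, "price": {"amount": 10.0, "currency": "USD"}}
current  = {"id": 1, "price": {"amount": "10.00", "currency": "USD"}}

print(sorted(breaking_changes(baseline, current)))
# -> [('price.amount', 'float')]
```

A check like this won’t catch every failure mode, but it turns an unannounced upstream change from a production incident into an alert.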
Consequence 2: Living with latency
Infrastructure and operations teams are accustomed to dealing with latency in relatively controlled setups. The conventional response to latency is to reduce the distance and number of hops required to move traffic from point A to point B - such as between an office and the entry point into cloud. A shorter route typically means less physical distance to cover, and that should improve performance.
In cloud-based operations it is a bit more complicated.
As we’ve established, change is constant. Public cloud connectivity architectures continuously evolve and can be subject to significant changes at the discretion of a provider. Decisions made by cloud providers - including how they advertise service endpoints, prioritise optimisations, obscure underlay paths, or leverage shared infrastructure for their backbone - can all add milliseconds to the round trip that data traffic takes, and influence over this sits outside the enterprise’s sphere of control.
The way to address latency is in the application architecture itself, and in setting (or resetting) user expectations.
The goal is not to reduce latency to the shortest time possible, but instead to make performance fast enough to satisfy the application’s and users’ requirements. Low latency may be achievable by hosting workloads in a single cloud region that is close to corporate users; however, this may not be desirable if the local region costs more for compute, or would cause potential resiliency issues due to single country dependency.
By comparison, a higher latency might be acceptable if it balanced cost and convenience, and if the application was architected in such a way that it could account for and work with that additional latency, without causing a degraded experience.
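That trade-off can be expressed as a simple selection rule: pick the cheapest region whose measured latency still fits the application’s budget, rather than the absolute lowest-latency region. A minimal sketch, with purely illustrative region names, latencies and cost figures:

```python
# Choose the cheapest candidate region whose measured round-trip time
# still fits the application's latency budget, rather than chasing the
# absolute lowest latency. All figures below are illustrative.

LATENCY_BUDGET_MS = 120  # what the application and its users can tolerate

# (region, measured p95 round-trip time in ms, relative compute cost)
candidates = [
    ("local-1",   35, 1.40),  # closest to users, but most expensive
    ("nearby-1",  80, 1.00),  # slightly slower, cheaper
    ("remote-1", 180, 0.70),  # cheapest, but blows the latency budget
]

# Filter to regions within budget, then take the cheapest of those.
viable = [c for c in candidates if c[1] <= LATENCY_BUDGET_MS]
best = min(viable, key=lambda c: c[2])

print(best)  # -> ('nearby-1', 80, 1.0)
```

Here the cheapest region overall is excluded because it exceeds the budget, and the lowest-latency region loses out on cost - the selected region is simply "fast enough".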
Consequence 3: Capacity curveballs
Enterprises often think about their data being in the cloud and the immediate potential bottlenecks to ingress and egress, but the reality is that capacity bottlenecks may lie further afield.
Cabling systems that carry the world’s traffic are increasingly a key concern for digital and cloud-first organisations.
While Google, Amazon, and Microsoft have all made significant investments in infrastructure projects, such as subsea cable systems, they still must leverage some shared physical links, whether subsea or terrestrial, to carry traffic between different parts of their networks.
The contracted or leased capacity on these cables varies between providers. The cables themselves also come ashore in dedicated landing zones; in parts of Asia, these may overlap with shipping lanes, or be in relatively shallow waters susceptible to tropical storm damage.
Network performance can also vary substantially over time and in different regions. Some cloud providers try to bring traffic into their networks very close to where it originates and regardless of its destination; others route traffic via the internet and only bring it on-net closer to their physical locations.
Enterprises need the ability to exercise oversight of their connectivity - whether traffic is inside or outside of a public cloud provider - taking into consideration regional performance conditions, route diversity, internet sovereignty, legal compliance, and organisational policy as appropriate.