Taming multi-cloud Kubernetes networking with topology-aware routing
Building a multi-cloud Kubernetes cluster is a fascinating challenge. The ideal outcome is a single, unified control plane spanning multiple cloud providers - AWS, Azure, and GCP - but the real hurdle is networking. How do you ensure pods running in different clouds communicate securely and efficiently?
I recently explored this when preparing a technical demonstration, and the journey turned out to be far more educational than expected. I began with a simple, lightweight architecture, but quickly ran into deeper networking issues, especially around Kubernetes Services. This is how I built the environment, what broke along the way, and how the final solution came together.
Initial Setup: Terraform, Ansible, K3s, and WireGuard
My approach was to keep everything intentionally minimal.
Kubernetes Distro:
I selected K3s for its lightweight footprint and straightforward setup. For this kind of distributed experiment, avoiding the complexity of a full upstream distribution made sense.
Infrastructure:
Using Terraform, I created three modules - one each for Azure, AWS, and GCP.
Each module provisioned two VMs with public IPs. These IPs were essential for establishing WireGuard tunnels between the clouds.
Configuration:
Terraform automatically generated an Ansible inventory. I then used a set of Ansible playbooks to:
Install and configure WireGuard on every node, forming a flat, secure overlay network across the three clouds.
Install K3s and join all nodes into a single cluster.
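The WireGuard step boils down to a handful of standard Ansible tasks. A minimal sketch (module names are the standard `ansible.builtin` ones; the template name `wg0.conf.j2` and file paths are assumptions for illustration):

```yaml
# Illustrative tasks from the WireGuard playbook.
- name: Install WireGuard
  ansible.builtin.apt:
    name: wireguard
    state: present

- name: Render this node's WireGuard config (keys and peer list templated per host)
  ansible.builtin.template:
    src: wg0.conf.j2          # hypothetical template: this node's private key plus all peers' public IPs
    dest: /etc/wireguard/wg0.conf
    mode: "0600"

- name: Bring up the tunnel now and on every boot
  ansible.builtin.systemd:
    name: wg-quick@wg0
    state: started
    enabled: true
```

Because every node peers with every other node over its public IP, the result is a full-mesh, flat overlay that K3s can treat as a single network.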
Once the cluster was live, I used k8s-netperf to benchmark the reliability and performance of inter-cloud communication.
The Problem: When Services Betray You
The early k8s-netperf results were mixed.
The Good: Direct node-to-node and pod-to-pod communication across the WireGuard tunnel worked surprisingly well. Even cross-cloud traffic delivered acceptable throughput.
The Bad: The moment I targeted a Kubernetes Service (such as netperf-server), performance deteriorated sharply. Traffic became unstable and often extremely slow. Even when a local pod was available, Kubernetes sometimes sent requests across clouds unnecessarily.
This revealed a clear insight: the WireGuard overlay was functioning correctly, but Kubernetes' service discovery and load balancing - managed through kube-proxy - was not respecting the underlying topology.
Troubleshooting the Routing Problem
My first instinct was to tune kube-proxy.
Attempt 1: Topology-Aware Hints
I tried enabling Kubernetes' built-in Topology-Aware Hints by patching the Service with the annotation:
service.kubernetes.io/topology-mode: auto
and, where supported, the newer spec field:
trafficDistribution: PreferClose
In theory, this should have encouraged kube-proxy to choose endpoints in the same zone (or cloud). In practice, nothing changed. Topology hints also depend on nodes carrying topology.kubernetes.io/zone labels and on the control plane actually writing hints into the EndpointSlices, so either kube-proxy ignored the hints or the feature simply wasn't compatible with this architecture.
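For reference, the patched Service amounted to something like the sketch below. The annotation lives in metadata, while trafficDistribution is a spec field (available from Kubernetes v1.31); the selector label is an assumption, and 12865 is netperf's standard control port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: netperf-server
  annotations:
    service.kubernetes.io/topology-mode: auto   # Topology-Aware Hints/Routing
spec:
  trafficDistribution: PreferClose              # newer alternative, Kubernetes v1.31+
  selector:
    app: netperf-server                         # assumed label on the server pods
  ports:
    - port: 12865                               # netperf control port
      protocol: TCP
```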
Attempt 2: Switching to IPVS
K3s defaults to iptables mode, so I tested IPVS in the hope of improving load-balancing decisions.
This failed immediately. IPVS conflicted with the WireGuard configuration and broke the tunnel entirely.
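For completeness, switching K3s's bundled kube-proxy to IPVS is a small configuration change rather than a reinstall. A sketch of the relevant entry (the node also needs the kernel IPVS modules and ipvsadm available):

```yaml
# /etc/rancher/k3s/config.yaml on every node
kube-proxy-arg:
  - "proxy-mode=ipvs"
```

In this environment the IPVS virtual-server rules interfered with routes over the wg0 interface, which is what took the tunnel down.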
Attempt 3: Replace kube-proxy (and Flannel)
At this point, it was clear kube-proxy was the bottleneck. Removing it, however, also meant removing Flannel - K3s's default CNI - which depends on kube-proxy.
So I needed a new CNI and a kube-proxy replacement. Two candidates stood out: Calico and Cilium.
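Whichever CNI won out, K3s first had to be told to ship without Flannel, its network policy controller, and kube-proxy. In K3s this is a server-side configuration change (a sketch; these are the documented K3s server options):

```yaml
# /etc/rancher/k3s/config.yaml on server nodes
flannel-backend: none
disable-network-policy: true
disable-kube-proxy: true
```

With these set, the cluster comes up with no CNI at all, and nodes stay NotReady until a replacement CNI is installed.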
Calico:
I first tested Calico in L3 mode. Although technically robust, it introduced difficult routing challenges. Without VXLAN encapsulation, Calico could not automatically route pod-to-pod traffic over WireGuard. This would have required configuring BGP or manually adding every pod CIDR into each node's WireGuard configuration. Clearly not scalable.
Cilium:
Cilium immediately looked more promising. It supports VXLAN overlays, integrates cleanly with WireGuard-based networks, and includes a mature kube-proxy replacement feature.
The Solution: Cilium to the Rescue
Step 1: Initial Cilium Installation
I removed Flannel and deployed Cilium with its kube-proxy replacement enabled. Instantly, the erratic service behavior disappeared. Traffic became stable and predictable - a major improvement.
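The install itself was done via Helm. A hedged sketch of the values I ended up with (the API server address is a placeholder, and exact value names can vary slightly between Cilium versions; VXLAN encapsulation is what keeps pod-to-pod traffic routable over the flat WireGuard network without per-pod-CIDR routes):

```yaml
# helm install cilium cilium/cilium -n kube-system -f values.yaml
kubeProxyReplacement: true   # Cilium's eBPF service handling replaces kube-proxy
routingMode: tunnel          # encapsulate pod traffic so node routes suffice
tunnelProtocol: vxlan
k8sServiceHost: 10.0.0.1     # placeholder: API server address reachable over WireGuard
k8sServicePort: 6443
```

Since kube-proxy is gone, Cilium must be pointed directly at the API server (k8sServiceHost/k8sServicePort) rather than at the kubernetes Service ClusterIP.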
However, the benchmarks still showed traffic routing to pods in other clouds even when local pods were available. The routing was fair, but not topology-aware.
Step 2: The Final Fix
After diving deeper into Cilium's documentation, I discovered the feature I was missing: Cilium's native service topology awareness.
When enabled, this produced exactly the behavior I wanted:
- Prefer a pod running on the same node
- If none exist, prefer a pod in the same zone (same cloud provider)
- Only fall back to cross-cloud traffic if no local endpoint exists
This fully aligned Kubernetes service routing with the actual network topology of the multi-cloud environment.
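In Helm terms this is essentially one extra value, plus zone labels on the nodes so Cilium knows which endpoints count as "close". A sketch (the label key is the standard well-known one; the zone value per cloud is a choice you make, and exact behavior depends on the Cilium version):

```yaml
# Additional Helm values for Cilium
loadBalancer:
  serviceTopology: true

# Each node also needs a zone label matching its cloud, e.g.:
#   kubectl label node <node-name> topology.kubernetes.io/zone=aws
```

With one zone per cloud provider, "same zone" and "same cloud" become the same thing, which is exactly the preference order described above.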
Final Takeaway
Creating a multi-cloud overlay with WireGuard is surprisingly straightforward. The real complexity lies in making Kubernetes Service routing behave intelligently across that topology.
In this experiment - and in real enterprise environments - kube-proxy can struggle to make optimal decisions. But Cilium's combination of kube-proxy replacement and service topology awareness provided an elegant and effective solution.
For organisations exploring multi-cloud architectures, especially those focusing on cloud-native reliability and performance, these lessons are invaluable. At Darumatic, we frequently see teams underestimate the networking layer when adopting multi-cloud Kubernetes. As this experiment shows, getting topology-aware routing right is essential for predictable performance, cost control, and a seamless user experience.