IT Brief Australia - Technology news for CIOs & IT decision-makers

AI workloads force Kubernetes shift to autonomous SRE

Sat, 10th Jan 2026

AI-heavy workloads are set to upend how large organisations run Kubernetes and cloud infrastructure by 2026, according to Komodor cofounder and CTO Itiel Shwartz. He expects a shift from AI model training towards large-scale inference, wider use of autonomous cloud operations, and rising pressure on site reliability engineering teams.

Shwartz said infrastructure built for traditional web and microservices applications now faces a different profile of demand. AI models in production require sustained and predictable access to GPU resources, higher throughput, and tighter cost control. This strain is beginning to appear in large Kubernetes clusters at cloud providers.

"As AI/ML use continues to increase, more workloads will move from training to inference. Even the new GKE experiments are showing signs of this, as the huge number of nodes that they scale up with contain a significant amount of inference workloads," said Shwartz.

He said the operational impact of that transition would fall first on SRE and platform engineering teams. Many enterprises already run large Kubernetes estates, and they are now layering AI and generative AI services onto existing clusters.

AI SRE emerges

Shwartz expects the combination of talent shortages and competitive pressure from generative AI adopters to push companies towards "AI SRE". This describes a model where small human teams work alongside automated agents and machine learning systems that manage routine operations.

"As more organizations deploy cloud native infrastructure, and GenAI cutting time to market for their competitors, platform teams will understand that to continue to innovate and lead, they need to scale up their SRE teams. With Kubernetes experts at a premium, AI SRE will prove to be the missing ingredient that allows them to adapt," said Shwartz.

He said this shift depends on standardised operational data and clear control points inside clusters. That includes consistent telemetry, shared event formats, and APIs that automated systems can call safely.
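The standardisation Shwartz describes can be pictured as a shared event schema that both human operators and automated agents consume. The sketch below is an assumption for illustration: the `ClusterEvent` fields and the `normalize` mapping are hypothetical, loosely modelled on the shape of Kubernetes event payloads, not any particular standard.

```python
from dataclasses import dataclass

# Hypothetical shared event record; field names are illustrative,
# not an established telemetry standard.
@dataclass(frozen=True)
class ClusterEvent:
    cluster: str
    namespace: str
    kind: str        # e.g. "Pod", "Node"
    name: str
    reason: str      # e.g. "FailedScheduling", "OOMKilled"
    severity: str    # "info" | "warning" | "critical"
    timestamp: str   # RFC 3339

def normalize(raw: dict) -> ClusterEvent:
    """Map a raw, tool-specific event payload onto the shared schema."""
    sev = {"Normal": "info", "Warning": "warning"}.get(raw.get("type", ""), "critical")
    obj = raw.get("involvedObject", {})
    return ClusterEvent(
        cluster=raw.get("cluster", "unknown"),
        namespace=obj.get("namespace", "default"),
        kind=obj.get("kind", "Unknown"),
        name=obj.get("name", ""),
        reason=raw.get("reason", ""),
        severity=sev,
        timestamp=raw.get("lastTimestamp", ""),
    )

raw = {"type": "Warning", "reason": "FailedScheduling",
       "involvedObject": {"kind": "Pod", "namespace": "ml", "name": "train-0"},
       "lastTimestamp": "2026-01-10T00:00:00Z", "cluster": "prod-eu"}
event = normalize(raw)
print(event.severity, event.reason)  # warning FailedScheduling
```

Once every tool emits the same record shape, an automated agent can act on events without per-vendor parsing logic, which is the "clear control points" property Shwartz points to.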

Towards autonomy

Shwartz expects a gradual move from human-in-the-loop automation towards more autonomy in cloud operations. He said AI-assisted tooling is gaining acceptance among enterprises that previously resisted automatic remediation or scaling decisions.

"As more and more AI powered tooling is adopted, and users trust it more, we will see a movement among traditionally conservative enterprises towards allowing some operations to be autonomously managed by AI," said Shwartz.

He suggested organisations wrap automated actions in policy-as-code and audit trails. That structure would let teams expand the scope of automated operations while maintaining governance.
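One way to read that suggestion is as a gate around every automated action: check a policy, write an audit record, and only then execute. The sketch below is a minimal illustration, not a real policy engine; production policy-as-code would typically live in a dedicated tool such as OPA or Kyverno, and the rule names and actions here are assumptions.

```python
import time

# Illustrative policy rules; a real deployment would express these in a
# policy engine (e.g. OPA/Rego or Kyverno), not a Python dict.
POLICY = {
    "restart_pod": {"allowed_namespaces": {"staging", "ml-inference"}},
    "scale_deployment": {"max_replicas": 20},
}
AUDIT_LOG: list[dict] = []

def guarded(action: str):
    """Wrap an automated action: check policy first, record an audit entry."""
    def wrap(fn):
        def inner(**kwargs):
            rule = POLICY.get(action)
            allowed = rule is not None
            if allowed and "allowed_namespaces" in rule:
                allowed = kwargs.get("namespace") in rule["allowed_namespaces"]
            if allowed and "max_replicas" in rule:
                allowed = kwargs.get("replicas", 0) <= rule["max_replicas"]
            AUDIT_LOG.append({"action": action, "args": kwargs,
                              "allowed": allowed, "ts": time.time()})
            if not allowed:
                return "denied"
            return fn(**kwargs)
        return inner
    return wrap

@guarded("restart_pod")
def restart_pod(namespace: str, pod: str) -> str:
    # Placeholder for a real Kubernetes API call.
    return f"restarted {namespace}/{pod}"

print(restart_pod(namespace="staging", pod="api-7f9"))  # restarted staging/api-7f9
print(restart_pod(namespace="payments", pod="db-0"))    # denied
```

The point of the pattern is that widening autonomy becomes a policy change rather than a code change, and every decision, allowed or denied, leaves an audit trail.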

New job systems

Shwartz highlighted a change in how complex workloads queue for compute resources. He expects cloud-native job queueing systems such as Kueue to see higher adoption as organisations compete in high-performance computing, AI and machine learning, and emerging quantum workloads.

Traditional job queues were built around smaller or less elastic environments. They often do not handle the bursty, GPU-centric, and multi-tenant realities of modern clusters. Shwartz said this limitation opens the door for new schedulers and queue managers that integrate more closely with Kubernetes.

"Cloud-native job queueing systems, like Kueue will see a major uptick in adoption, as the race for deploying HPC, AI/ML, and even quantum applications heats up. Since previous queue systems are not built for this scale, new tooling will quickly be implemented across the industry," said Shwartz.
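In practice, handing a workload to Kueue means submitting a suspended batch Job labelled with a queue name; Kueue unsuspends it once quota is available. The sketch below expresses such a manifest as a Python dict for illustration. The queue name, Job name, and image are assumptions; the `kueue.x-k8s.io/queue-name` label and `spec.suspend` field are the mechanism Kueue documents for taking over admission.

```python
# Minimal sketch of a batch Job routed through a Kueue LocalQueue.
# "gpu-queue", "train-llm", and the image are illustrative placeholders.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "train-llm",
        "labels": {"kueue.x-k8s.io/queue-name": "gpu-queue"},
    },
    "spec": {
        "suspend": True,  # Kueue flips this once quota admits the Job
        "parallelism": 4,
        "completions": 4,
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "trainer",
                    "image": "example.com/trainer:latest",  # placeholder
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }],
            },
        },
    },
}

print(job["metadata"]["labels"]["kueue.x-k8s.io/queue-name"])  # gpu-queue
```

Because the Job starts suspended, the cluster never launches partial capacity for it; admission becomes a queueing decision rather than a race for nodes.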

Scheduler overhaul

The Kubernetes scheduler itself faces significant change under these conditions, Shwartz said. Current designs emphasise pods as the primary scheduling unit. AI training and inference workloads often require groups of pods to start together and share GPU and network resources.

He pointed to ongoing community work on "gang scheduling", which treats a set of tasks as a single schedulable unit. The feature is tracked in Kubernetes Enhancement Proposal 4671 (KEP-4671) and aims for native support in future releases.

"With applications and workloads relying on more compute than ever before, Kubernetes scheduling will require a makeover. The current pod-centric approach will not be able to handle this increased scale, so a more workload specific approach for the scheduler will be required. The community is actively working on this through KEP-4671: Gang Scheduling, which will be managed natively in K8s," said Shwartz.
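The all-or-nothing idea behind gang scheduling can be shown with a toy placement check: a pod group is admitted only if every member fits at once, so no GPUs sit idle behind a partially started group. This is a simplified illustration of the principle, not the KEP-4671 algorithm; the node capacities and GPU requests are made-up inputs.

```python
def try_place_gang(free_gpus_per_node: dict[str, int], gang: list[int]):
    """Return a pod->node assignment if the whole gang fits, else None.

    gang: GPU request per pod in the group. Placement is all-or-nothing:
    if any member cannot fit, the entire group is rejected.
    """
    free = dict(free_gpus_per_node)  # work on a copy; reject leaves state untouched
    assignment = {}
    for i, need in enumerate(sorted(gang, reverse=True)):  # largest requests first
        node = next((n for n, f in sorted(free.items()) if f >= need), None)
        if node is None:
            return None  # one member cannot fit -> reject the whole gang
        free[node] -= need
        assignment[f"pod-{i}"] = node
    return assignment

nodes = {"node-a": 4, "node-b": 2}
print(try_place_gang(nodes, [2, 2, 2]))  # fits: an assignment dict
print(try_place_gang(nodes, [4, 4]))     # None: second 4-GPU pod cannot fit
```

A pod-centric scheduler would happily start the first 4-GPU pod in the second case and strand those GPUs while the sibling pod waits, which is exactly the failure mode gang scheduling removes.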

GPU pressure

The expansion of AI inference also focuses attention on GPU capacity and cost. Shwartz expects GPU overprovisioning to become a more visible operational problem as organisations seek higher utilisation rates across clusters.

"As the macro economic climate continues to push towards greater efficiency, organizations will have to find ways to optimize their GPU monitoring and usage," said Shwartz.

He recommended platform teams treat GPU efficiency as a reliability concern, not only as a spending problem. That involves setting service level objectives around GPU usage, tracking fragmentation and saturation, and feeding those metrics into autoscalers and admission controls.
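Treating GPU efficiency as an SLO can be reduced to a small calculation over per-node allocation data: fleet utilisation, "stranded" GPUs on partially used nodes, and a pass/fail signal an autoscaler or admission controller could consume. The 0.7 target, metric definitions, and fleet data below are illustrative assumptions, not figures from the article.

```python
def gpu_efficiency(nodes: list[dict], slo_target: float = 0.7) -> dict:
    """Aggregate per-node GPU allocation into SLO-style signals."""
    total = sum(n["gpus_total"] for n in nodes)
    used = sum(n["gpus_used"] for n in nodes)
    utilisation = used / total if total else 0.0
    # "Stranded" GPUs: free capacity on partially used nodes, where a
    # large multi-GPU pod may no longer fit (a simple fragmentation proxy).
    stranded = sum(n["gpus_total"] - n["gpus_used"]
                   for n in nodes if 0 < n["gpus_used"] < n["gpus_total"])
    return {
        "utilisation": round(utilisation, 2),
        "stranded_gpus": stranded,
        "slo_met": utilisation >= slo_target,
    }

fleet = [
    {"name": "node-a", "gpus_total": 8, "gpus_used": 6},
    {"name": "node-b", "gpus_total": 8, "gpus_used": 2},
    {"name": "node-c", "gpus_total": 8, "gpus_used": 0},
]
print(gpu_efficiency(fleet))
# {'utilisation': 0.33, 'stranded_gpus': 8, 'slo_met': False}
```

Framed this way, a missed GPU SLO can trigger the same alerting and remediation paths as a latency SLO, rather than surfacing weeks later on a cloud bill.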

Tool consolidation

Shwartz also predicted consolidation in cloud infrastructure tooling. He drew a parallel with cloud security, where buyers have moved away from multiple point products.

"FinOps tools will start to consolidate with other products in the cloud infrastructure stack. Similar to what is happening in cloud security, products will consolidate different capabilities, including observability, insights, tracing, cost optimization and troubleshooting, into a single platform. This will remove cognitive load from teams struggling to keep up with too many dashboards and products," said Shwartz.

He said platform leaders should start by reviewing their current toolchains and by identifying overlap across monitoring, tracing, cost analysis, and debugging. That assessment would prepare the ground for integrated systems that handle both operational health and financial efficiency.

Shwartz said platform teams that standardise telemetry, experiment with new scheduling approaches, and embed GPU efficiency into SRE practice will be better placed as AI workloads scale across Kubernetes estates by 2026.