AI code praised in review, but faults rise in production

Thu, 11th Jun 2026

New Relic has published its 2026 State of AI Coding report, which found that many technology leaders rate AI-generated code more highly than human-written code at the review stage.

That confidence often fades after deployment. According to the survey, 78% of respondents reported more incidents once AI-generated code went live, 86% said senior staff were spending more time fixing code, and 74% said at least a quarter of AI-generated code had needed significant rework over the past year.

The report is based on a survey of 200 U.S.-based technology decision-makers in IT and engineering roles at upper mid-market and enterprise companies that use generative and agentic AI in software engineering. All respondents were managers or above, including directors, vice presidents, and C-suite executives, and had software purchasing authority.

The data suggests a broad shift in how software is written inside larger organisations, not just startups. Some 67% of technology leaders said AI now generates or significantly refactors between 51% and 75% of their organisation's weekly code output.

That level of adoption appears to be matched by formal internal acceptance. The survey found that 88% of organisations have incorporated vibe coding into production policies, 5% restrict it to non-production environments, and none said their organisations ban the practice outright.

Production strain

The report suggests the biggest tension lies between code review and production performance. Some 94% of respondents said they viewed AI-generated code as higher quality than human-written code at the time of review, and only 2% said they saw it as lower quality. Those views did not translate into smoother outcomes once the software was running in live systems.

Some 82% of respondents said they had experienced at least one production failure linked to AI-generated code in the past six months. Only 19% said their organisations had not faced any AI-generated code challenges during that period.

The survey also indicates that teams are shipping AI-produced software with limited manual scrutiny. Nearly 62% of technology leaders said their engineering teams often trust AI-generated code enough to send it into production without line-by-line manual verification.

That may help explain why senior engineers are carrying more of the burden after launch. AI tools may speed up drafting and refactoring, but the operational cost appears to shift to debugging, incident response, and rework.

New Relic described this pattern as a growing engineering management problem tied to what it calls agent debt. The term refers to a build-up of software logic that may appear sound in review but has not been sufficiently tested or understood before release.

"AI coding agents are no longer just autocompleting lines of text, they are driving the majority of software development across the enterprise," said Nic Benders, Chief Technical Strategist, New Relic.

"However, our report brings to light a concerning trend: the rapid accumulation of what we're calling 'agent debt.' While leaders praise the velocity of agent-generated code during initial reviews, organisations are quietly inheriting a massive deficit of unvetted architectural logic that triggers production incidents down the line. Finding ways to mitigate agent debt is now a defining challenge for engineering organisations," Benders said.

Observability focus

The report also found strong support for observability practices as companies try to manage software increasingly produced by machines. Some 96% of technology leaders rated observability as very or extremely important when working with AI-generated code, and none rated it as only slightly important or not important.

That concern is increasingly shaping how engineers use AI tools during development. Nearly 78% of teams said they now routinely prompt AI systems to include telemetry such as logs, traces, and metrics directly in generated code.

This points to an effort to move monitoring earlier in the software lifecycle. Rather than adding instrumentation after release, teams are trying to ensure code is observable from the moment it is generated.
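As a rough illustration of what "observable from the moment it is generated" could look like, the Python sketch below wraps a function so that each call emits log lines, a span identifier, and simple metrics. It is not drawn from the report: the names `instrumented`, `METRICS`, `DURATIONS`, and `apply_discount` are hypothetical, and the in-memory dictionaries stand in for whatever observability backend a team actually uses.

```python
import functools
import logging
import time
import uuid
from collections import defaultdict

logger = logging.getLogger("telemetry_sketch")

# Hypothetical in-memory metrics store; a real system would export these
# values to an observability backend instead of keeping them in process.
METRICS = defaultdict(int)
DURATIONS = defaultdict(list)


def instrumented(func):
    """Wrap func so every call emits log lines, a span ID, and metrics."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span_id = uuid.uuid4().hex[:8]  # stand-in for a real trace/span ID
        start = time.perf_counter()
        METRICS[f"{name}.calls"] += 1
        logger.info("start %s span=%s", name, span_id)
        try:
            return func(*args, **kwargs)
        except Exception:
            # Count the failure and log the traceback, then re-raise.
            METRICS[f"{name}.errors"] += 1
            logger.exception("error in %s span=%s", name, span_id)
            raise
        finally:
            # Runs on success and failure, so every call records a duration.
            elapsed = time.perf_counter() - start
            DURATIONS[name].append(elapsed)
            logger.info("end %s span=%s seconds=%.6f", name, span_id, elapsed)

    return wrapper


@instrumented
def apply_discount(price, percent):
    """Hypothetical business function standing in for generated code."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100
```

Calling `apply_discount(200, 25)` returns the discounted price and increments the call counter, while an out-of-range percentage raises `ValueError` and is also counted as an error, so failures show up in the metrics rather than only in an incident report.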

The survey covered companies already using generative and agentic AI in software engineering, so the findings reflect organisations that have moved beyond experimentation. Within that group, the data suggests AI-assisted coding has become embedded in everyday development work even as concerns over production reliability rise.

The report adds to a wider debate in the technology industry over whether AI coding tools reduce total engineering effort or simply shift it to different parts of the workflow. In this case, the findings suggest many organisations see faster code creation and stronger review impressions upfront, but more incidents and more repair work after release.

For engineering leaders, that tension is likely to sharpen as AI takes on a larger share of software output. With 67% of respondents saying most of their weekly code is now generated or heavily refactored by AI, the question is no longer whether teams use these tools, but how much operational strain they are prepared to absorb once that code reaches production.


Image: Nic Benders