Study finds AI-generated code far buggier than human work
AI-generated code introduces significantly more defects than human-written code across key software quality categories, according to new research from CodeRabbit.
The San Francisco-based AI code review company analysed 470 open source pull requests on GitHub, comparing 320 pull requests labelled as AI-coauthored with 150 human-only submissions. The study measured issue rates in areas including logic, maintainability, security, performance and readability.
The report found that AI-generated pull requests contain about 1.7 times more issues on average than those written only by humans. It also identified higher rates of critical and major defects in code that involved AI tools.
More bugs and risks
CodeRabbit said AI-authored changes showed elevated problems across every major quality dimension. Logic and correctness issues rose by 75 per cent in AI-generated code. These included business logic errors, misconfigurations and unsafe control flow.
Security vulnerabilities appeared between 1.5 and 2 times more often in AI-generated pull requests. The most common security problems involved improper password handling and insecure object references.
Readability issues increased more than threefold in AI-assisted code. The report cited higher levels of naming inconsistencies and formatting problems, which can make code harder to understand and maintain over time.
Performance inefficiencies showed the largest relative jump. The study found that issues such as excessive input/output operations appeared nearly eight times more frequently in AI-generated submissions than in human-only pull requests.
One of the report's authors said the findings match concerns engineering teams have raised this year. "These findings reinforce what many engineering teams have sensed throughout 2025," said David Loker, Director of AI at CodeRabbit. "AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."
Rising AI adoption
The study comes as AI code generation tools spread quickly across the software industry. More than 90 per cent of developers now report using AI-assisted coding tools to increase productivity and handle routine tasks.
Companies report gains from these tools, including around 10 per cent faster engineering output and reductions in time spent on repetitive work. The new data suggests those productivity benefits sit alongside higher defect rates and new categories of risk.
CodeRabbit's analysis indicates that AI-generated code adds more than minor issues. The company said critical and major defects occur up to 1.7 times more often in AI-authored changes than in human-only pull requests.
Patterns of failure
The report highlights clear patterns in where AI-generated code falls short. Logic errors often stem from misunderstandings of business rules or incorrect assumptions about configuration and architecture. Misconfigurations and unsafe control flows raise the risk of incorrect behaviour in production systems.
Security issues arise frequently where AI-generated code handles sensitive information. The study notes repeated instances of improper password storage or transmission. It also flags insecure object references that could expose data or functions unintentionally.
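The insecure-object-reference pattern the study flags can be sketched as follows; the records, names and functions here are hypothetical illustrations, not examples taken from the report.

```python
# Hypothetical sketch of an insecure object reference versus a guarded lookup.
# All data and names below are invented for illustration.
RECORDS = {1: {"owner": "alice", "data": "a"}, 2: {"owner": "bob", "data": "b"}}

def fetch_insecure(record_id):
    # Anti-pattern: trusts a client-supplied id, exposing any user's record
    return RECORDS[record_id]

def fetch_checked(record_id, requesting_user):
    # Guarded version: verifies ownership before returning the record
    record = RECORDS[record_id]
    if record["owner"] != requesting_user:
        raise PermissionError("not the record owner")
    return record
```

The fix is not the lookup itself but the authorisation check that precedes it, which is exactly what an id-based fetch tends to omit.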
Readability and style problems appear widespread. Inconsistent naming, irregular formatting and unclear structure increase the workload for reviewers and maintainers. These issues can slow code reviews and make long-term maintenance more complex.
Performance defects linked to AI tools often involve inefficient resource usage. The report points to excessive I/O operations as a recurring pattern, which can degrade response times and increase infrastructure costs when deployed at scale.
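The excessive-I/O pattern described above can be illustrated by contrasting a version that re-reads a file on every lookup with a single-pass version; the function names and file are hypothetical.

```python
# Illustrative contrast for the excessive-I/O pattern (names are hypothetical).
def count_matches_per_call(path, words):
    # Anti-pattern: re-reads the entire file once per word (N full I/O passes)
    return {w: open(path).read().count(w) for w in words}

def count_matches_single_pass(path, words):
    # Better: one read, the in-memory text reused for every word
    text = open(path).read()
    return {w: text.count(w) for w in words}
```

Both return identical results; only the I/O cost differs, which is why such defects surface as latency and infrastructure cost rather than incorrect output.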
Mitigation strategies
Alongside the statistical findings, CodeRabbit sets out a series of measures that software teams can adopt when using AI-assisted development.
One focus is on providing richer project context in prompts. The company said AI models make more mistakes when they lack information about business rules, configuration standards or architectural constraints. It recommends the use of repository-specific prompt snippets, instruction capsules and configuration schemas that supply this context up front.
Another area involves stricter control of code style and formatting. The report notes that CI-enforced formatters, linters and style guides can remove many readability and formatting problems before human review. Policy expressed as code in these tools reduces the need for manual enforcement of standards.
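As a sketch of the policy-as-code idea, a CI quality gate might run a formatter in check-only mode and a linter before any human review; the specific tools below (black, ruff) are illustrative stand-ins, not ones the report names.

```shell
#!/bin/sh
# Hypothetical CI quality-gate step: fail fast on formatting or lint
# violations so reviewers never see them. Tool choices are illustrative.
set -e
black --check .    # formatter in check-only mode: fails if files need reformatting
ruff check .       # linter: fails on style and correctness findings
```

Because the step fails the build rather than leaving comments, the style policy is enforced mechanically instead of by reviewers.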
CodeRabbit also calls for stronger checks in continuous integration pipelines in response to the rise in logic and error-handling issues. It suggests requiring tests for any non-trivial control flow. It also suggests mandating explicit nullability and type assertions, standardising exception-handling approaches and explicitly prompting for guardrails in complex areas.
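The recommendation to test every non-trivial control flow and mandate explicit nullability and type assertions might look like the following sketch; the function and its tiers are invented for illustration.

```python
# Hypothetical function with non-trivial control flow, plus the explicit
# nullability/type assertions and per-branch tests the report recommends.
from typing import Optional

def apply_discount(price: float, tier: Optional[str]) -> float:
    assert isinstance(price, (int, float)) and price >= 0, "price must be non-negative"
    if tier is None:            # explicit nullability handling
        return price
    if tier == "gold":
        return price * 0.8
    if tier == "silver":
        return price * 0.9
    raise ValueError(f"unknown tier: {tier!r}")

# Tests covering every branch, including the error path
assert apply_discount(100.0, None) == 100.0
assert abs(apply_discount(100.0, "gold") - 80.0) < 1e-9
assert abs(apply_discount(100.0, "silver") - 90.0) < 1e-9
try:
    apply_discount(100.0, "bronze")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for unknown tier")
```

Requiring a test per branch, including the rejection path, is the CI-level guardrail the report describes for logic and error-handling defects.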
On security, the company recommends centralising credential handling and blocking ad hoc password usage in code. It also urges teams to run static application security testing and security linters automatically in their pipelines to catch vulnerabilities earlier in the process.
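A centralised credential helper of the kind recommended here could look like this sketch: one approved entry point for secrets, with key derivation and constant-time comparison instead of ad hoc password handling. All names and parameters are hypothetical.

```python
# Hypothetical centralised credential handling. Secrets come from the
# environment (or a vault behind this function), never from code literals.
import hashlib
import hmac
import os

def get_secret(name: str) -> str:
    """Single approved entry point for credentials."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"secret {name!r} not configured")
    return value

def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    # Key derivation plus constant-time comparison, not plain-text handling
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_hash)
```

With a helper like this in place, a lint rule or code review check can then flag any password logic that bypasses it.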
The report proposes AI-aware pull request checklists for reviewers. These include explicit questions about error-path coverage, correctness of concurrency primitives, validation of configuration values and adherence to approved password-handling helpers. The aim is to steer reviewers towards the areas where AI is most likely to introduce defects.
Methodology detail
The analysis draws on 470 open source GitHub pull requests. It uses CodeRabbit's structured review taxonomy to classify issues into logic, maintainability, security and performance categories.
Researchers compared AI-coauthored and human-only pull requests using normalised issue rates. They applied Poisson rate ratios with 95 per cent confidence intervals to assess the statistical strength of differences between the two groups.
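The Poisson rate-ratio comparison described here can be sketched directly: a ratio of normalised issue rates with a Wald confidence interval on the log scale. The counts below are illustrative, not the study's data.

```python
# Sketch of a Poisson rate-ratio comparison with a 95% Wald CI.
# The counts are made up for illustration; they are not the study's data.
import math

def poisson_rate_ratio(x1, n1, x2, n2, z=1.96):
    """Rate ratio of group 1 vs group 2, with a CI built on the log scale."""
    rr = (x1 / n1) / (x2 / n2)              # normalised issue rates
    se_log = math.sqrt(1 / x1 + 1 / x2)     # SE of log(RR) for Poisson counts
    lo = rr * math.exp(-z * se_log)
    hi = rr * math.exp(z * se_log)
    return rr, lo, hi

# Illustrative counts chosen to yield a 1.7x rate ratio
rr, lo, hi = poisson_rate_ratio(x1=544, n1=320, x2=150, n2=150)
```

If the interval (lo, hi) excludes 1.0, the difference in issue rates between the two groups is statistically significant at the 95 per cent level.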
CodeRabbit said it plans to continue tracking real-world AI-generated code. The company aims to expand the data set and refine guidance for development teams as adoption of AI coding tools increases.