Artificial intelligence (AI) code-generation tools like GitHub Copilot and ChatGPT have rapidly become indispensable for developers, promising enhanced productivity. However, a growing body of research reveals a troubling reality: the code produced by these Large Language Models (LLMs) is often riddled with significant security vulnerabilities.
A 2025 GenAI Code Security Report by Veracode, which analyzed more than 100 LLMs, found that AI-generated code introduces security vulnerabilities 45% of the time: when presented with a choice between a secure and an insecure coding approach, the models opted for the insecure one. The problem is especially pronounced in certain languages. AI-generated Java code had a 71.5% security failure rate, while other major languages such as Python, C#, and JavaScript still failed between 38% and 45% of the time.
The root cause of these vulnerabilities lies in how LLMs are trained: on vast amounts of publicly available code from the internet, much of which contains insecure coding practices and known flaws. The models then replicate those flawed patterns. Common vulnerabilities include entries from the OWASP Top 10, such as SQL injection, cross-site scripting (XSS), and command injection. The models performed especially poorly at defending against XSS and log injection, producing weaknesses in 86.47% and 87.97% of tasks, respectively.
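To make the pattern concrete, here is a minimal Python sketch of the kind of injection-prone code an LLM can reproduce from its training data, alongside the parameterized alternative. The `users` table and the `find_user_insecure`/`find_user_safe` helpers are hypothetical names for illustration only; they do not come from the Veracode report.

```python
import sqlite3

# Insecure pattern frequently seen in training data: building SQL by string
# interpolation, which lets attacker-controlled input alter the query.
def find_user_insecure(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE username = '{username}'"  # vulnerable
    return conn.execute(query).fetchall()

# Safer alternative: a parameterized query, where the driver treats the
# input strictly as data, not as SQL.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
    conn.execute("INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com')")

    payload = "' OR '1'='1"
    print(find_user_insecure(conn, payload))  # classic payload leaks every row
    print(find_user_safe(conn, payload))      # [] -- payload matches nothing
```

The two functions differ by only a few characters, which is exactly why such flaws slip past a quick glance at AI-generated output.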
An additional risk factor is the over-reliance and false sense of security these tools can instill in developers. A Stanford University study found that participants with access to an AI assistant wrote significantly less secure code than those without. Paradoxically, the same participants were more likely to believe the code they wrote was secure. This overconfidence can lead developers to skip critical code reviews, allowing flaws to be pushed into production environments.
Experts stress that while LLMs are powerful productivity enhancers, they are not a substitute for human expertise and rigorous security testing. Developers must treat AI-generated code with the same scrutiny as a snippet from an unknown developer. The ultimate responsibility for the security and integrity of software remains firmly with the human developer, who must review, test, and validate every line of code before deployment.
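As one small, hedged example of what that scrutiny can look like in practice, a regression test can exercise a query helper with a known injection payload before the code is accepted. The test below reuses the hypothetical schema from the earlier sketch and assumes a parameterized implementation; it is an illustration of the testing discipline, not a prescribed checklist.

```python
import sqlite3
import unittest

class InjectionRegressionTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical in-memory schema matching the earlier sketch.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
        self.conn.execute("INSERT INTO users (username) VALUES ('alice')")

    def test_injection_payload_returns_nothing(self):
        payload = "' OR '1'='1"
        rows = self.conn.execute(
            "SELECT id FROM users WHERE username = ?", (payload,)
        ).fetchall()
        # A parameterized query treats the payload as literal data, so no rows match.
        self.assertEqual(rows, [])

if __name__ == "__main__":
    unittest.main()
```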