
White-Basilisk: A New Hybrid Model for Detecting Code Vulnerabilities

Researchers have developed White-Basilisk, a compact and efficient AI model for detecting software vulnerabilities. With just 200 million parameters, the model outperforms far larger systems on vulnerability-detection benchmarks while processing long codebases quickly and with a small energy footprint, challenging the 'bigger is better' philosophy in AI.

A team of researchers has developed a new AI model, called White-Basilisk, that detects software vulnerabilities more efficiently than much larger systems. The model's release comes at a time when developers and security teams face mounting pressure to secure complex codebases, often without the resources to deploy large-scale AI tools.

A compact model with big results

Unlike large language models (LLMs), which can require billions of parameters and heavy computational power, White-Basilisk is compact, with just 200 million parameters. Yet it outperforms models more than 30 times its size on multiple public benchmarks for vulnerability detection. This challenges the idea that bigger models are always better, at least for specialized security tasks.

White-Basilisk's design focuses on long-range code analysis. Real-world vulnerabilities often span multiple files or functions, a pattern many existing models struggle with because of limits on how much context they can process at once. In contrast, White-Basilisk can analyze sequences up to 128,000 tokens long, enough to assess entire codebases in a single pass.
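As a rough illustration of what that budget means in practice, the sketch below walks a repository, counts characters in C/C++ source files, and converts the total to an approximate token count. The CHARS_PER_TOKEN heuristic and the file-extension filter are assumptions for illustration; an exact count would require the model's own tokenizer.

```python
from pathlib import Path

CONTEXT_LIMIT = 128_000  # White-Basilisk's reported context window, in tokens
CHARS_PER_TOKEN = 3      # rough heuristic for source code; assumed, not measured

def estimate_tokens(root: str) -> int:
    """Approximate the token count of all C/C++ sources under root."""
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in {".c", ".cc", ".cpp", ".h", ".hpp"}
    )
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_tokens(".")
    verdict = "fits" if tokens <= CONTEXT_LIMIT else "exceeds"
    print(f"~{tokens} tokens; {verdict} a single 128k-token pass")
```

By this estimate, a codebase of a few hundred thousand characters of C would fit comfortably in one forward pass, which is what allows the model to see cross-file dependencies without chunking.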

Built for efficiency and context

To overcome the limitations of traditional models, the team created a hybrid architecture built on three components. Mamba layers, chosen for their exceptional efficiency in capturing local dependencies in code sequences, form the backbone of White-Basilisk. A custom linear attention mechanism maintains global context, and a Mixture of Experts (MoE) system routes input to different parts of the model depending on the content. “The core challenge we tackled stems from a fundamental limitation in how AI models process code,” said Ioannis Lamprou, the lead researcher. “Our breakthrough was developing a hybrid architecture that achieves linear complexity. Computational requirements grow proportionally rather than exponentially with code length.”
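The paper's exact layer code is not reproduced here, but the general ideas behind two of the three components can be sketched in a few lines of PyTorch. The linear_attention function below shows why linear attention scales with sequence length: it summarizes keys and values into a small d-by-d matrix instead of materializing an N-by-N attention map. TinyMoE shows top-1 expert routing. Both are generic textbook constructions, not White-Basilisk's actual implementation.

```python
import torch

def linear_attention(q, k, v):
    """O(N) attention: summarize keys/values rather than build an N x N matrix."""
    # Positive feature map (elu + 1), as used in generic linear-attention work.
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("nd,ne->de", k, v)        # (d, d) summary, cost O(N * d^2)
    z = (q @ k.sum(dim=0)).clamp(min=1e-6)      # per-query normalizer, shape (N,)
    return torch.einsum("nd,de,n->ne", q, kv, 1.0 / z)

class TinyMoE(torch.nn.Module):
    """Top-1 Mixture of Experts: a router sends each token to one expert."""
    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.router = torch.nn.Linear(dim, n_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (N, dim)
        idx = self.router(x).argmax(dim=-1)      # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                out[mask] = expert(x[mask])      # only that expert's weights run
        return out

x = torch.randn(1024, 64)
print(linear_attention(x, x, x).shape)           # torch.Size([1024, 64])
print(TinyMoE(dim=64)(x).shape)                  # torch.Size([1024, 64])
```

The practical payoff is the one Lamprou describes: because no N-by-N attention matrix is ever formed, doubling the input length roughly doubles the compute rather than quadrupling it, which is what makes a 128k-token window feasible on modest hardware.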

Greener AI and real-world applications

White-Basilisk is also energy-efficient. The research team estimates that training produced just 85.5 kilograms of CO₂. This efficiency also applies at runtime, as White-Basilisk can analyze full-length codebases on a single high-end GPU without needing distributed infrastructure. This could make it more practical for small security teams and companies without large cloud budgets.

The researchers envision White-Basilisk fitting into current development and security workflows, such as a VSCode extension providing real-time suggestions or integration into CI/CD pipelines. Although the model was trained only on C and C++ code, the research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
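A CI/CD hook along those lines could be as simple as the sketch below: collect the C/C++ files changed in a pull request and fail the job if the scanner flags anything. The scan_for_vulnerabilities stub is hypothetical, since the article does not describe a public API for the model, and the git invocation assumes a standard checkout with an origin/main branch.

```python
import subprocess
import sys

def get_changed_c_files() -> list[str]:
    """C/C++ files changed on this branch vs. main (assumes a git checkout in CI)."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in diff.splitlines() if f.endswith((".c", ".cc", ".cpp", ".h"))]

def scan_for_vulnerabilities(path: str) -> list[str]:
    # Hypothetical stub: a real pipeline would run the published
    # White-Basilisk checkpoint here; no public API is assumed.
    return []

def main() -> int:
    findings = [
        f"{path}: {msg}"
        for path in get_changed_c_files()
        for msg in scan_for_vulnerabilities(path)
    ]
    if findings:
        print("\n".join(findings))
        return 1  # nonzero exit fails the CI job, blocking the merge
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Because the model runs on a single GPU, a gate like this could execute inside an ordinary CI runner rather than calling out to a distributed inference cluster.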
