AI Security: The Double Edge

Claude found 22 vulnerabilities in Firefox in two weeks. Transparent Tribe used AI to mass-produce malware implants. Codex Security flagged 10,561 high-severity bugs across open source. A solo developer built an 88,000-line malware framework in days. The same technology is simultaneously the best security researcher and the most dangerous force multiplier in cybersecurity history.

This is the defining tension of AI security in 2026. There is no “good AI” and “bad AI”—there is one capability, and it cuts both ways.

The Defenders

Anthropic Finds 22 Firefox CVEs in Two Weeks

In January 2026, Anthropic partnered with Mozilla to point Claude Opus 4.6 at the Firefox codebase. The results were staggering: 22 vulnerabilities identified across nearly 6,000 C files, with 14 classified as high-severity and one critical (CVE-2026-2796, a JIT miscompilation error scoring 9.8 CVSS).

To put that in context: those 14 high-severity bugs represent almost a fifth of all high-severity vulnerabilities patched in Firefox throughout 2025. Claude found more in two weeks than human researchers typically report in a month. Total cost: roughly $4,000 in API credits.

Claude generated 112 total reports from that scan—the remaining ~90 involved non-security issues like crashes and logic errors. Mozilla was impressed enough to invite Anthropic to submit future Claude-discovered flaws in bulk without manual pre-validation. The fixes shipped in Firefox 148.

The exploit side was notably weaker. Claude managed only a “crude” exploit in two out of several hundred attempts, and only in a testing environment with security features intentionally removed. Anthropic's own assessment: “Claude isn’t yet writing full-chain exploits that combine multiple vulnerabilities to escape the browser sandbox.” But they also warned: “It is unlikely that the gap between frontier models’ vulnerability discovery and exploitation abilities will last very long.”

Codex Security: 10,561 High-Severity Bugs

A week after the Firefox disclosure, OpenAI launched Codex Security—an AI agent that scans code for vulnerabilities. In its first 30 days, it scanned 1.2 million commits across major open-source projects including OpenSSH, GnuTLS, PHP, and Chromium. The haul: 792 critical findings and 10,561 high-severity issues.

Codex Security is free for open-source projects. The message is clear: AI-powered security auditing is no longer a research curiosity—it's becoming infrastructure.

Claude Code Security Joins the Fight

Anthropic launched Claude Code Security as well, currently in limited research preview. Unlike static analysis tools, it claims to reason about codebases like a human security researcher—understanding component interactions, tracing data flows, and identifying vulnerabilities that emerge from how systems connect rather than from isolated code patterns.

The Attackers

VoidLink: 88,000 Lines of AI-Generated Malware

The defensive story is impressive. The offensive story is terrifying.

In late November 2025, a developer began building VoidLink, a Linux malware framework, using TRAE SOLO, an AI assistant embedded in the TRAE IDE. The framework reached 88,000 lines of code and a functional implant in under a week. Check Point Research confirmed it was “produced predominantly through AI-driven development.”

Their assessment: “AI enabled what appears to be a single actor to plan, develop, and iterate a complex malware platform in days—something that previously required coordinated teams and significant resources.”

Transparent Tribe: State-Sponsored AI Malware at Scale

Pakistan-linked APT group Transparent Tribe has gone further. They’re not just using AI to write better phishing emails (though they are—with a threefold increase in unique phishing templates compared to 2025). They’re using AI to mass-produce malware implants in lesser-known languages like Nim, Zig, Crystal, and Rust.

The strategy is deliberate: use AI to produce a “high-volume, mediocre mass of implants,” each communicating through legitimate services like Slack, Discord, Supabase, and Google Sheets. New AI-developed malware families include CreepDropper, SHEETCREEP, MAILCREEP, SupaServ, LuminousStealer, and CrystalShell. Some contain Unicode emojis in their source code—a telltale sign of AI-generated output.

APT28: First LLM-Querying Malware in Live Operations

Google’s Threat Intelligence Group (GTIG) identified something even more concerning: Russian state-backed APT28 deployed PROMPTSTEAL against Ukraine, malware that queries Qwen2.5-Coder-32B-Instruct via the Hugging Face API to dynamically generate commands during execution. This is the first confirmed observation of malware querying an LLM in live operations.

Meanwhile, GTIG also found malware families like PROMPTFLUX that use LLMs to dynamically generate malicious scripts, obfuscate their own code, and create malicious functions on demand. Malware that rewrites itself using AI at runtime.
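Defenders can turn that dependency around: malware that queries an LLM at runtime has to carry the endpoint it talks to. Here's a minimal Python sketch of that heuristic; the hostname list and the byte-scan approach are our own illustration, not GTIG's detection methodology.

```python
import sys
from pathlib import Path

# Illustrative hostnames only; real detection would use curated threat intel.
LLM_ENDPOINTS = [
    b"api-inference.huggingface.co",      # Hugging Face Inference API
    b"api.openai.com",
    b"api.anthropic.com",
    b"generativelanguage.googleapis.com",
]

def scan(path: Path) -> list[str]:
    """Return any known LLM API hostnames embedded in the file's raw bytes."""
    data = path.read_bytes()
    return [e.decode() for e in LLM_ENDPOINTS if e in data]

if __name__ == "__main__":
    for name in sys.argv[1:]:
        if hits := scan(Path(name)):
            print(f"{name}: embeds LLM endpoint(s): {', '.join(hits)}")
```

Treat hits as leads, not verdicts: plenty of legitimate tooling embeds the same hostnames.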

Vibe-Coding Goes Criminal

Palo Alto Networks has observed malware developers writing API calls to LLMs directly into their code—strong evidence of “vibe coding” in the criminal underground. Some malicious code even carries watermarks verifying it was generated by Cursor, Replit, or Claude.

The irony: AI makes the same mistakes when writing malware as it does writing legitimate software. Researchers found ransomware with hallucinated file extensions like “readme.txtt”—“a mistake that a threat actor would never make.”
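Both telltales, emoji in source code and doubled extensions like "readme.txtt", are cheap to flag. Here's a toy Python scanner; it's purely illustrative and will happily produce false positives.

```python
import re
import unicodedata
from pathlib import Path

# Doubled-extension pattern; every entry beyond ".txtt" is hypothetical.
DOUBLED_EXT = re.compile(r"\.(txtt|loggg|csvv|jsonn)$", re.IGNORECASE)

def has_emoji(text: str) -> bool:
    # Unicode category "So" (Symbol, other) covers most emoji. Crude but cheap.
    return any(unicodedata.category(ch) == "So" for ch in text)

def scan_tree(root: Path) -> None:
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        if DOUBLED_EXT.search(p.name):
            print(f"{p}: suspicious doubled extension")
        try:
            text = p.read_text(errors="ignore")
        except OSError:
            continue
        if has_emoji(text):
            print(f"{p}: emoji in source, an AI-generation telltale")

if __name__ == "__main__":
    scan_tree(Path("."))
```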

The Gap in the Middle

BaxBench: Correct Code ≠ Secure Code

The BaxBench benchmark from ETH Zurich, UC Berkeley, and INSAIT tested AI models on 392 security-critical backend coding tasks. The headline finding: Claude Opus 4.5 scores 86.2% on functional correctness but only 56.1% on secure code generation. Overall, 62% of solutions from even the best models are either incorrect or contain security vulnerabilities.

A simple security reminder in the prompt improved Claude’s secure+correct rate from 56% to 66%. The models know about security—they just don’t default to it.
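That finding points at a near-free mitigation: bake a standing security reminder into every code-generation prompt. A minimal sketch, with reminder wording that is our own rather than BaxBench's:

```python
# The reminder text is illustrative; BaxBench's exact phrasing may differ.
SECURITY_REMINDER = (
    "Security requirements: validate all inputs, parameterize database "
    "queries, avoid injection and path-traversal patterns, handle errors "
    "explicitly, and never hard-code secrets."
)

def harden_prompt(task: str) -> str:
    """Prepend a standing security reminder to a code-generation task."""
    return f"{SECURITY_REMINDER}\n\nTask: {task}"

print(harden_prompt("Write a Flask endpoint that looks up a user by id."))
```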

The Verification Crisis

Here’s where the numbers get alarming. Stitch together data from multiple 2026 reports and the picture looks like this:

Half the code is AI-generated. Most of it isn’t verified. It has more vulnerabilities. Open-source vulns are doubling. And one in five breaches traces back to AI code. This is the verification crisis.

Slopsquatting: The Supply Chain Nightmare

There’s also a novel attack vector: slopsquatting. AI coding assistants hallucinate package names that don’t exist—like react-codeshift (a mashup of real packages jscodeshift and react-codemod). Attackers register these hallucinated names on npm or PyPI and fill them with malware.

Research from USENIX Security 2025 tested 16 models across 576,000 code samples: roughly 20% of the recommended packages didn't exist. Crucially, 43% of the hallucinated names reappeared every time the prompt was re-run. These aren’t random errors—they’re predictable, exploitable patterns.

In the era of vibe coding, where developers describe what they want and let AI handle implementation—sometimes auto-installing packages without confirmation—slopsquatting is a supply chain nightmare waiting to scale.
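A basic guardrail is to vet every assistant-suggested package before installing it. Existence alone proves little, since the whole attack is registering the hallucinated name, so check the registration date as well. The sketch below queries the public npm registry API; the 90-day threshold is an illustrative choice.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

MIN_AGE_DAYS = 90  # illustrative threshold; tune to your risk tolerance

def vet_npm_package(name: str) -> str:
    """Classify a package as hallucinated, suspiciously new, or established."""
    try:
        url = f"https://registry.npmjs.org/{name}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            meta = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return "not on npm: likely hallucinated, do not install"
        raise
    created = datetime.fromisoformat(meta["time"]["created"].replace("Z", "+00:00"))
    age = (datetime.now(timezone.utc) - created).days
    if age < MIN_AGE_DAYS:
        return f"exists but is only {age} days old: review before installing"
    return "exists and is established"

for pkg in ["jscodeshift", "react-codeshift"]:
    print(f"{pkg}: {vet_npm_package(pkg)}")
```

The same check works against PyPI's JSON API for Python dependencies.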

30+ Vulnerabilities in AI IDEs Themselves

It gets worse. Security researcher Ari Marzouk disclosed over 30 vulnerabilities in AI-powered IDEs themselves, collectively dubbed “IDEsaster.” Affected tools include Cursor, Windsurf, Kiro.dev, GitHub Copilot, Zed.dev, Roo Code, Junie, and Cline. Twenty-four have been assigned CVE identifiers. The tools we use to write code have their own security holes.

CrowdStrike: When Politics Meet Code Security

One of the stranger findings: CrowdStrike found that DeepSeek-R1 produces code with up to 50% more severe security vulnerabilities when prompts contain topics the Chinese Communist Party considers politically sensitive. Censorship mechanisms don’t just suppress information—they degrade code quality in unpredictable ways.

What This Means

The double edge isn’t a metaphor. It’s the literal state of the technology: the same class of model that finds 22 Firefox CVEs in two weeks can mass-produce malware implants for a state-backed group.

The asymmetry isn’t in the technology—it’s in the deployment. Defense requires comprehensive coverage (every line, every commit, every dependency). Offense only needs one entry point. AI makes defense faster, but it makes offense cheaper.

Group-IB calls this a “fifth wave” of cybercrime evolution—one where “adversaries are industrializing AI, turning once specialist skills such as persuasion, impersonation, and malware development into on-demand services available to anyone with a credit card.”

The tools are here. The defensive AI tools from Anthropic, OpenAI, and others are genuinely impressive. But they only work if you use them. With only 48% of developers always verifying AI code before committing, and only 24% of organizations running comprehensive security evaluations, the defensive tools exist in a world that largely hasn’t adopted them.

The race isn’t between good AI and bad AI. It’s between the speed of adoption on both sides. And right now, the attackers are moving faster.


KaraxAI tracks the cutting edge of AI-assisted coding — the tools, models, and techniques that actually change how code gets written. Signal, not noise.