
When the Model Is Too Dangerous to Ship


This week Anthropic announced a new model – Claude Mythos Preview – and then declined to release it.

That sentence sounds like a publicity stunt. And I get why people would read it that way. “Our model is too dangerous to release” is a great tagline. It’s also exactly what you’d say if you wanted to generate buzz without shipping anything.

But I’ve spent time with the technical details they published, and I don’t think that’s what’s happening here. I think this one is real. And the specifics matter.

What They Built (And Didn’t Train)

The first thing worth understanding about Mythos is that Anthropic didn’t set out to build a cyberattack tool. The cybersecurity capabilities emerged as a side effect of general improvements in reasoning, code, and autonomy. This is the part that should make you sit up a little straighter.

When a capability you didn’t design for shows up because you made the model generally smarter – that’s a signal that you’re no longer steering toward specific capabilities, you’re riding a general curve. And that curve doesn’t ask for permission before it produces something you weren’t expecting.

The model is described as general-purpose, comparable to Claude Opus 4.6 in most respects. Except in cybersecurity tasks, where it’s in a completely different league.

What It Actually Did

I have a background in security – vulnerability hunting, pen testing – so I want to be specific about what Anthropic is claiming here, because the details are not hand-wavy.

Zero-days across every major OS and browser. Not theoretical. Found and responsibly disclosed. The oldest: a 27-year-old bug in OpenBSD – an operating system specifically built around security – where sending a malformed TCP SACK packet crashes the kernel. That patch is live. You can read the errata entry.
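Anthropic hasn't published the actual malformed packet, so the following is only a toy sketch of what "malformed SACK" can mean: a TCP SACK option whose length byte disagrees with the data it carries, the kind of inconsistency a kernel parser has to handle defensively. Everything here (sequence numbers, block counts) is invented for illustration.

```python
import struct

# TCP option kind 5 is SACK (RFC 2018). A well-formed option's length
# byte must be 2 + 8 * (number of SACK blocks), with at most 4 blocks.
SACK_KIND = 5

def sack_option(blocks, claimed_blocks=None):
    """Build a TCP SACK option. If claimed_blocks differs from the real
    number of blocks, the length byte lies about the payload size."""
    payload = b"".join(struct.pack("!II", left, right) for left, right in blocks)
    n = claimed_blocks if claimed_blocks is not None else len(blocks)
    length = 2 + 8 * n
    return struct.pack("!BB", SACK_KIND, length) + payload

# Well-formed: one SACK block, so the length byte is 10.
good = sack_option([(1000, 2000)])

# Malformed: length byte claims 4 blocks, but only 1 is actually present.
bad = sack_option([(1000, 2000)], claimed_blocks=4)
```

A parser that trusts the length byte and reads past the real payload is exactly the kind of bug a fuzzing agent finds by generating inconsistencies like this one.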

Exploit chaining. In one documented case, Mythos wrote a browser exploit that chained four separate vulnerabilities – including a JIT heap spray – to escape both the renderer sandbox and the OS sandbox. Anyone who’s done browser exploitation knows that a renderer escape alone is a milestone. Escaping the OS sandbox on top of it makes a full chain. Writing one end-to-end, autonomously, is not something you hand-wave away.
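For readers who haven't done browser exploitation: a heap spray fills the target's heap with many copies of attacker-controlled data, so that a later corrupted pointer is likely to land inside one of them. Real sprays happen in the target's JS heap; this Python sketch (with an arbitrary filler byte and chunk size) only shows the shape of the idea.

```python
# Conceptual heap spray: allocate many identical chunks so that a wild
# pointer dereference probably lands inside one of them. The filler
# pattern and sizes below are arbitrary, purely for illustration.
CHUNK = b"\x0c" * 0x1000  # one page of a repeating filler pattern

def spray(count):
    """Allocate `count` copies of the chunk, keeping references alive
    so the allocator can't reclaim them."""
    return [bytearray(CHUNK) for _ in range(count)]

heap = spray(256)  # ~1 MiB of sprayed, attacker-shaped data
```

The hard parts in practice – grooming the allocator so the spray lands where the corrupted pointer will point, and surviving garbage collection in a JIT heap – are exactly the adaptive reasoning the post is describing.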

Linux local privilege escalation. Autonomously found and exploited race conditions, and built KASLR bypasses, to get from unprivileged user to root. A KASLR bypass in particular requires understanding the kernel’s address space layout and constructing a reliable information leak – this is not script-kiddie territory.
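The arithmetic at the end of a KASLR bypass is simple; the hard part is getting the leak. Once you have one kernel pointer whose static offset within the image you know, the randomized base and slide fall out by subtraction. All addresses and offsets below are made up for illustration.

```python
# Given a leaked kernel pointer and its known static offset within the
# kernel image, recover the randomized base and the KASLR slide.
# These constants are hypothetical, not real kernel addresses.
DEFAULT_BASE = 0xFFFFFFFF81000000   # unrandomized load address (illustrative)
SYMBOL_OFFSET = 0x1A2B30            # offset of the leaked symbol (illustrative)

def kaslr_slide(leaked_ptr):
    """Return (randomized base, slide) from one leaked kernel pointer."""
    base = leaked_ptr - SYMBOL_OFFSET
    return base, base - DEFAULT_BASE

base, slide = kaslr_slide(0xFFFFFFFF8C1A2B30)
```

The expensive work the post is crediting to the model is everything before this point: finding a primitive that leaks a kernel pointer reliably enough to do this subtraction.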

FreeBSD NFS RCE. Remote code execution on FreeBSD’s NFS server granting unauthenticated root access, delivered by splitting a 20-gadget ROP chain across multiple packets. Building a working ROP chain is already a non-trivial exercise. Splitting it across packets to fit within protocol constraints and still have it land reliably – that’s a different skill level entirely.
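To make the packet-splitting constraint concrete: a ROP chain is just a sequence of gadget addresses, and if each protocol message caps the payload size, the chain has to be fragmented so that reassembly in the target's buffer reproduces it byte-for-byte. This toy sketch uses invented gadget addresses and an arbitrary payload limit; the real difficulty (which the post is pointing at) is that the fragments must survive the protocol's own parsing and land adjacently.

```python
import struct

# 20 fake gadget addresses, standing in for a real ROP chain.
GADGETS = [0x400000 + 0x10 * i for i in range(20)]

def chain_bytes(gadgets):
    """Serialize the chain as little-endian 64-bit return addresses."""
    return b"".join(struct.pack("<Q", g) for g in gadgets)

def split_into_packets(payload, max_payload):
    """Fragment the payload to fit a per-packet size limit."""
    return [payload[i:i + max_payload] for i in range(0, len(payload), max_payload)]

chain = chain_bytes(GADGETS)               # 20 gadgets * 8 bytes = 160 bytes
packets = split_into_packets(chain, 48)    # hypothetical 48-byte payload cap
reassembled = b"".join(packets)            # must match the chain exactly
```

Splitting bytes is trivial; arranging for a kernel NFS parser to deposit those fragments contiguously, in order, at a predictable address is the part that is "a different skill level entirely."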

Firefox JavaScript engine exploits. Anthropic ran a benchmark: take known, patched vulnerabilities in Firefox 147’s JS engine and see if the model can develop working exploits. Claude Opus 4.6 succeeded twice out of several hundred attempts. Mythos Preview succeeded 181 times. That is not a small delta.

OSS-Fuzz severity benchmarks. Against roughly 7,000 entry points in the OSS-Fuzz corpus, Opus 4.6 reached tier 3 (meaningful memory corruption) exactly once. Mythos Preview achieved tier 5 – full control flow hijack – on ten separate, fully patched targets.

The other 99% of the vulnerabilities they found are not discussed in the post, because they’re still under coordinated disclosure. They haven’t been patched yet.

The “Overnight” Detail

This is the one that bothers me most, because it closes a gap I thought was still open.

Anthropic engineers with no formal security training asked Mythos to find remote code execution vulnerabilities, left it running overnight, and woke up to a complete working exploit. Not a crash. Not a potential vulnerability. A working exploit.

The assumption most people have been operating under – including, I think, a lot of security professionals – is that AI could help find vulnerabilities but that exploiting them still required serious human expertise. The creativity, the deep knowledge of memory internals, the ability to adapt when an approach doesn’t work – that felt like the moat.

Mythos is showing that the moat is shallower than we thought, and shrinking.

Why Project Glasswing Makes Sense

Anthropic’s response is called Project Glasswing – a reference to a butterfly whose wings are transparent, meant to evoke the invisible nature of software vulnerabilities. The framing is a little too precious for my taste, but the structure is sound.

They’re giving restricted access to about 45 companies – Apple, Google, Microsoft, Nvidia, AWS, CrowdStrike, Palo Alto Networks, and others – specifically for defensive use. Finding and fixing vulnerabilities in shared infrastructure before the capability becomes widely available. They’ve committed $100 million in usage credits to this effort.

The reasoning is explicit: defenders need a head start. The model will not stay restricted forever. Similar capabilities will proliferate to other labs, other models, other actors. The question isn’t whether the capability exists in the world – it’s who has it first and what they do with it.

I think that’s the right call. The comparison they draw is to early fuzzers: when AFL-class tools arrived, there was concern they’d give attackers an advantage. They did, briefly. But modern fuzzers are now core defensive infrastructure. The same transition is probably going to happen here – the long-term advantage goes to defenders who can apply these tools at scale to find and fix things before they ship.

The transitional period is the dangerous part.

What This Means for Security Right Now

A few weeks before the Mythos announcement, Greg Kroah-Hartman – the Linux kernel maintainer – said something worth quoting directly:

“Months ago, we were getting what we called ‘AI slop,’ AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn’t really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they’re good, and they’re real.”

Daniel Stenberg, maintainer of curl, said he’s spending hours per day just on the new volume of AI-generated – and legitimate – security reports.

The jump happened before Mythos was announced. This model was in testing. Its reports were leaking into the ecosystem through researchers who had access. What just got announced publicly is the tip of what’s already been in use.

If you’re running critical software infrastructure and you haven’t started thinking seriously about what an autonomous vulnerability research agent changes about your attack surface – start now. Not because Mythos is coming for you specifically. Because the capability curve doesn’t wait, and models with comparable ability will be more broadly available sooner than most people expect.

The Uncomfortable Conclusion

Security has always been an asymmetric problem. Defenders have to be right every time; attackers only have to be right once. AI doesn’t change that asymmetry – but it changes the speed at which both sides operate.

What Mythos Preview demonstrates is that the “hard part” of exploitation – the reasoning, the chaining, the adaptation – is now automatable at frontier model quality. That’s a threshold, not a trend. Trends you can plan for incrementally. Thresholds require you to reconsider your assumptions.

Anthropic’s decision not to release this model broadly is the right move for now. What comes after “for now” is the harder question – and one the industry doesn’t have a great answer to yet.


Technical details sourced from Anthropic’s Red Team blog post “Assessing Claude Mythos Preview’s Cybersecurity Capabilities” (April 7, 2026) and Simon Willison’s analysis at simonwillison.net.