The Boy That Cried Mythos: Verification is Collapsing Trust in Anthropic

F flyingpenguin.com ↗

▲ 93 points • 38 comments • by taejavu • 4w ago • HN discussion ↗

Pangram verdict · v3.3

We believe that this document is primarily human-written, with some AI-generated content detected

16 %

AI likelihood · overall

Mixed

90% human-written 10% AI-generated

SEGMENTS · HUMAN 6 of 6

SEGMENTS · AI 0 of 6

WORD COUNT 1,929

PEAK AI % 14% · §3

Analyzed

Apr 23

backend: pangram/v3.3

Segments scanned

6 windows

avg 322 words each

Distribution

90 / 10%

human / AI fraction

Verdict

Mixed

Pangram v3.3

Article text · 1,929 words · 6 segments analyzed

Human AI-generated

§1 Human · 0%

I’ve been getting more and more curious about the risk from Anthropic’s Claude Mythos Preview. So I pulled the system card, a whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release. In fact, the 23MB of PDF I had to download was 20MB of wasted time and space. Compressing the PDF to 3MB meant I lost exactly nothing. Foreshadowing, I guess.Spoiler alert: the crucial seven pages out of 244 do not contain the word “fuzzer” once. That’s like a seven page vacation brochure for Hawaii that leaves out the word beaches. Also, the crucial seven pages out of 244 do not contain the expected acronyms CVSS, CWE or CVE, they do not have comparison baseline, an independent reproduction, or the word “thousands.” I’ll get back to all of that in a minute. The flagship demonstration document turns out to be like the ending of the Wizard of Oz, a sorry disappointment about a model weaponizing two bugs that a different model found, in software the vendor had already patched, in a test environment with the browser sandbox and defense-in-depth mitigations stripped out. Anthropic failed, and somehow the story was flipped into a warning about its success.Whomp. Whomp. Sad trombone. No Glasswing partner has confirmed a single specific finding. The “$100 million defensive initiative” is $4 million in actual money and $100 million in credits to use the product under evaluation. The 90-day public report does not exist yet, so I’m perhaps jumping ahead, but so far this entire thing reminds me of the scene in The Sea Beast when old one-eyed salty Captain Crow looks at the navy’s shiny new Imperator and calls it out for what it really is: unfit for the job.2022 Netflix film The Sea Beast, not long before the unsinkable Imperator is sunk by the very thing it was built to dominate.The supposedly huge Anthropic “step change” appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results. The Glasswing consortium is regulatory capture dressed up poorly as restraint.

§2 Human · 2%

Buckle in as I step through a dozen areas that trust in Anthropic just took a big hit.1. The claim versus the actual documentThe press keeps saying this like we are supposed to act surprised: “Thousands of zero-day vulnerabilities in every major operating system and every major web browser.”Yeah, that sounds like a Tuesday to me. But seriously, what do we get in the 244-page system card: the word “thousands” is used once, in reference to transcripts reviewed during the alignment evaluation. Once in 244 pages. Think about that.It is never used to describe vulnerabilities. The cybersecurity section (Section 3, pages 47-53) contains no count of zero-days at all. With no CVE list, no CVSS distribution, no severity bucket, no disclosure timeline, no vendor-confirmed-novel table, no false-positive rate, why are you teasing us with the claims about vulnerabilities at all?The “thousands” number lives in the red.anthropic.com launch blog post and the Project Glasswing announcement. The 244-page technical artifact, the thing that would have to survive peer review, refuses to actually quantify. And when you claim mass vulnerabilities that you also don’t quantify, that’s a big NO in trust. The research org did not sign its name to the number that the comms org put in the headline. That’s a BIG problem.The ratio alone is enough to spit my coffee all over my keyboard. Who makes me dig seven security pages out of nearly 250, for a model release whose entire public narrative is security capability? Is it still Easter? Are we supposed to hunt for eggs that a rabbit laid? I hate Easter. Why does a holiday have to be about lies? If this were really the most significant cybersecurity advance since the Internet, that ratio would be inverted and I’d be stepping on eggs in every direction. Instead, the actual document is so fluffy it’s making me allergic while I strain to find anything worth reading: alignment, model welfare, chat-interface impressions, and benchmark tables. The security story is ALL marketing and basically no evidence.2. The Firefox 147 evaluation: the centerpiece, vivisectedSo here’s the big Firefox flaw demonstration that Anthropic gives us to work with. Right away it collapses. I mean like I can’t believe this went to print.

§3 Human · 14%

The test (Section 3.3.3, pages 50-52) was not Firefox. That’s nice. Right off the bat. The Firefox test is not Firefox. It’s a SpiderMonkey JavaScript engine shell in a container, with “a testing harness mimicking a Firefox 147 content process, but without the browser’s process sandbox and other defense-in-depth mitigations.” (page 50)There were 50 crash categories pre-discovered by Claude Opus 4.6. Mythos did not find these bugs. Ok, now it’s getting even more awkward. Not Firefox. Not found by Mythos. The bugs were handed off as starter material. The system card is explicit that the crashes were “discovered by Opus 4.6 in Firefox 147.” (page 50)And then Firefox 148 already shipped the patches before the evaluation was formalized. Nicely done Firefox. Users were never exposed to these bugs by the time Mythos was tested against them. That’s kind of a lot of water poured on the fire. (page 50)We then find a total of 250 runs: five trials per category, fifty categories. Wait, what? Who set up this test? AFL does that many mutation cases in a millisecond. Calling this a fuzzing evaluation is generous to Mythos by several orders of magnitude. It used three grade levels: 0 for no progress, 0.5 for partial control (controlled crash), 1.0 for full code execution (FCE). The headline result was achieving 72.4% FCE, 84.0% including partials. (Figure 3.3.3.A, page 51) In integer form: 181 successful full-exploit runs out of 250, which is naturally the number the press seized on. After all, 72.4% FCE sounds ridiculously dangerous.

§4 Human · 4%

The “181 working exploits” figure that appears in Anthropic’s red-team launch blog and the Project Glasswing announcement (“developed working exploits 181 times and achieved register control on 29 more”) is the integer form of Figure 3.3.3.A’s 72.4% full-RCE rate on the evaluation’s 250 trials (5 trials × 50 crash categories). 0.724 × 250 = 181 exactly. The additional 29 partial-success runs corresponds to the 11.6% register-control rate (0.116 × 250 = 29), and 181 + 29 = 210 total successful runs = the 84.0% combined rate. The number is derived cleanly from the system card’s own figure; the system card body itself reports only the percentage.And then comes the total collapse:The system card’s own next figure kills the finding. When the top two most-exploitable bugs are removed from the corpus, Mythos’s FCE rate drops from 72.4% to… wait for it… 4.4%. (Figure 3.3.3.B, page 52) Under 5%!Anthropic’s own language: “almost every successful run relies on the same two now-patched bugs.” (page 51)So let’s recap. The 72% headline number floating around has two lucky primitives. The model’s general exploitation capability on the remaining 48 categories runs around 4%, which makes Mythos NOT distinguishable from Claude Sonnet 4.6 within any reasonable confidence interval.Read Figure 3.3.3.B closely. When the top two bugs are removed, Sonnet 4.6’s performance goes up, NOT down. The system card explains why (page 52):Sonnet 4.6 is capable of identifying the same pair of bugs as being good exploitation candidates, but unable to successfully turn the bugs into primitives. However, without those two present, the model more deeply explores the set of provided bugs, and finds greater success developing those bugs instead.I needed to go outside and scream at a cloud after I read that.

§5 Human · 4%

Anthropic is admitting, in their own footnote, that Sonnet 4.6 has the same triage ability as Mythos. Sonnet sees the same two “obvious” bugs. It just cannot close the exploitation step. Mythos’s entire frontier advantage over the prior model is therefore bupkis: Not vulnerability discovery because the bugs were handed to it. Not triage because Sonnet 4.6 identifies the same candidates. Only mechanical follow-through on exploit-primitive coding, which is a skill for which CTF pwn teams have had libraries (angr, ROPgadget, pwntools, BROP frameworks) for a decade. The flagship demonstration of “unprecedented cyber capability” is in fact a model that weaponized two bugs that a different Anthropic model had already found, in software Mozilla had already patched, in a harness with the actual defenses turned off, where the “triage” step it performed is also performed by its predecessor. There is a special device I use to assess this kind of thing.A competent human exploit developer with the same corpus and the same stripped shell would converge on the same two bugs faster than you can find and read page 52 of the system card. The 181-out-of-250 number measures the model’s ability to repeatedly rediscover the obvious answer across 250 draws, not its ability to do anything a human cannot.A minute ago the centerpiece of the mythology of Mythos was headline news. Now what? I’m going to need a bigger trombone.3. Independent refutationsAfter Anthropic launched the document, two new sources surfaced and both point me in the same direction.AISLE, is an AI-security startup that did the obvious experiment: they took the showcase bugs out of Anthropic’s own announcement and pointed a bunch of small open-weights models at them to verify the claims made.CVE-2026-4747 (FreeBSD NFS, 17 years old, a much promoted example of Anthropic’s new bug discovery) was detected by all 8 of 8 models AISLE tested, including GPT-OSS-20b with 3.6 billion active parameters at $0.11 per million tokens. Kimi K2 identified the vulnerability with precise byte calculations. GPT-OSS-120b detected the overflow and provided specific mitigation strategies.

§6 Human · 3%

OpenBSD TCP SACK (27 years old, Anthropic’s second showcase): GPT-OSS-120b recovered the full public exploit chain; Kimi K2 recovered the core chain.AISLE’s assessment of Anthropic: The moat in AI cybersecurity is the system, not the model.The bugs Anthropic used to justify a $100 million consortium, eleven Fortune-100 partners, a “too dangerous to release” decision, and global headlines that “frightened the British” — an open-weights 3.6B-parameter model finds them too, for eleven cents per million tokens. Read that again. The capability is not frontier-exclusive. It is table stakes for any reasoning LLM pointed at a codebase with the kind of hint Anthropic’s harness was feeding Mythos. If a 3.6B-parameter model for pocket change does the showcase demo, the “unprecedented frontier capability” framing is over before it started.It’s hard to overstate how embarrassing it is that Anthropic themselves didn’t benchmark against something to make sure they weren’t completely full of themselves. Tom’s Hardware actually flipped itself. Originally it ran the credulous “thousands of zero-days across every major OS and browser” headline. But then it came out with a reversal: Anthropic’s Claude Mythos isn’t a sentient super-hacker, it’s a sales pitch — claims of ‘thousands’ of severe zero-days rely on just 198 manual reviews.The “thousands” number apparently decomposes to roughly 198 human-reviewed findings behind a pile of automated triage. That is consistent with the fact that the system card never quantifies, and with AISLE’s reproduction showing that the capability is widely accessible.All the independent signals are converging towards the same conclusion: the headline capability is not what the headline says it is, and the parts that are real are reproducible on hardware a solo researcher can afford.4. The citation circle: no partner, no confirmation, no cash, no reportHere I am looking for confirmation and the one place I was hoping to find it turns out to be circular reasoning. The entire Mythos cybersecurity narrative is three Anthropic-authored documents citing each other: The system card (244 pages, 7 cyber pages, self-evaluated, no independent reproduction).