A quick look at Mythos run on Firefox: too much hype?
Apr 23, 2026

When Anthropic published its Mythos announcement, it seemed impressive at first, almost worrying. But on a thorough read, the public evidence is less clean than the headline effect. The often-cited "under $20,000" figure does not mean Mythos casually found one devastating bug for that price; in Anthropic's own writeup, that budget covered a large search process with roughly a thousand scaffolded runs and several dozen findings. That is still notable, but it is a very different claim from the dramatic version people repeat.

Mozilla followed with a post about using Mythos to identify a large number of AI-found issues in Firefox 150, and it pushes the narrative in the same direction: AI has arrived for vulnerability research. After all, the latter post is titled "The zero-days are numbered". It is a bold take, and it may even be true. But the public evidence does not support the strongest version of that claim, and unless you work for one of the chosen (by Anthropic), it is hard to tell whether these public claims are just marketing or a real game changer.

The interesting question is not whether Mythos found bugs. It clearly did. The interesting question is what kind of bugs were found, how serious they were, and whether those findings actually change the balance between defenders and attackers. I spent a few hours going through the Firefox commit history, advisory references, and linked bugs to get a better sense of what Mozilla's numbers really mean. This is not a full audit of every patch, but it is enough to form a more grounded view than the marketing cycle usually allows.

The claim

Mozilla reported that 271 vulnerabilities were identified in Firefox 150 in association with Mythos. At the same time, the Firefox 150 security advisory does not map that claim to a single clean list of 271 Firefox-only bug IDs.
It contains many individual CVEs from different reporters, including at least three entries explicitly credited to Anthropic, as well as several aggregated "memory safety bugs" entries:

CVE-2026-6746: Use-after-free in the DOM: Core & HTML component
CVE-2026-6784: Memory safety bugs fixed in Firefox 150 and Thunderbird 150
CVE-2026-6785: Memory safety bugs fixed in Firefox ESR 115.35, Firefox ESR 140.10, Thunderbird ESR 140.10, Firefox 150 and Thunderbird 150
CVE-2026-6786: Memory safety bugs fixed in Firefox ESR 140.10, Thunderbird ESR 140.10, Firefox 150 and Thunderbird 150

Those four entries alone link to hundreds of bugs. That should immediately make anyone cautious about reading the headline number too literally. A large AI-assisted cleanup campaign can still be important without every individual fix representing a directly exploitable, high-end vulnerability. The linked bug counts here are 1, 55, 154, and 107 respectively, which makes 317 in total. But even that should not be compared directly to Mozilla's "271 vulnerabilities identified" claim, because the aggregated CVE buckets also cover Thunderbird and ESR releases, not just Firefox 150.

There is also a basic accounting problem: Mozilla's 271 figure, Bugzilla bug IDs, advisory CVEs, and individual commits are not the same unit. Publicly, you can reconstruct pieces of the picture, but not a single authoritative Firefox-only list that cleanly explains the 271 number. That does not mean Mozilla is wrong. It means outsiders should be careful not to over-interpret the advisory as if it were a perfect ledger of the claim.

What the data suggests

I vibecoded a small tool to group commits, bugs, CVEs, and touched subsystems, and to display some statistics. I also made a rough attempt at scoring the bugs by the keywords found, in order to prioritize which ones looked actually actionable. You can use it to quickly browse through commits, and the script sources are available at the end of the summary for reproducibility.

Even if you ignore the exact totals, the shape of the data is informative: hundreds of commits and bug references are involved, and the touched code is spread across major Firefox attack surface areas like dom, gfx, netwerk, js, layout.
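My scoring tool is linked at the end, but to give an idea of the approach: a minimal keyword-based triage score over commit subject lines could look like the sketch below. The keyword list, weights, and function names here are my illustration, not the exact heuristics the tool uses.

```python
import re

# Hypothetical keyword weights for triage -- illustrative, not the tool's real list.
PATTERNS = {
    r"use[- ]after[- ]free": 5,
    r"heap (overflow|corruption)": 5,
    r"out[- ]of[- ]bounds": 4,
    r"race condition|data race": 3,
    r"integer overflow": 3,
    r"null (deref|pointer)": 1,  # usually a stability fix, not a security one
}

def score_commit(message: str) -> int:
    """Return a rough priority score for one commit message."""
    msg = message.lower()
    return sum(weight for pattern, weight in PATTERNS.items()
               if re.search(pattern, msg))

def high_priority(messages, threshold=4):
    """Keep only the messages scoring at or above the threshold."""
    return [m for m in messages if score_commit(m) >= threshold]
```

Fed with the output of something like `git log --format=%s FIREFOX_BETA_149_END..FIREFOX_BETA_150_END`, this kind of filter is crude, but it is enough to separate "use-after-free in teardown" from "cleanup whitespace" when you have thousands of commits to skim.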
The patch set mixes obvious safety fixes, defensive cleanups, lifecycle hardening, API usage tightening, and some changes that look closer to real exploit primitives. Among the CVEs, some patches do not appear to be security related at all (e.g. avoiding a null dereference), although they are relevant for program stability.

That distinction matters. "Found a bug" is not the same statement as "found an exploitable vulnerability", and it is definitely not the same statement as "found a weaponizable chain component". In browser exploitation, there is a wide spectrum between a harmless correctness bug, a crash-only bug, a bug that creates a memory corruption primitive, and a bug that survives into a reliable exploit chain. If you collapse that spectrum into a single headline number, you get attention, but you lose precision.

Stats between tags FIREFOX_BETA_149_END and FIREFOX_BETA_150_END

I'm using these tags as a rough release window, not as a precise Mythos boundary. The stats below describe the Firefox 150 development interval broadly, not a cleanly isolated set of Mythos-derived fixes. They are useful for showing scale and patch distribution, but they should not be read as "these are the 271 Mythos vulnerabilities".

Commits: 6,115
Bug IDs: 3,209
High-priority candidates: 252
Bugs with (high) CVE: 301 (counting non-Mythos CVEs as well)
Commits with (high) CVE: 340
Changed lines: 3,438,679
Median lines / commit: 52
Mean lines / commit: 562.34
Largest patch: 480,735 lines
Commits with crashtest: 47

We can also notice that many commits associated with those bugs predate the Anthropic post by days or weeks, with an obvious spike on April 2. That is not surprising. Advisory aggregation happens late, and some fixes that end up grouped under a release CVE were clearly authored earlier, for example on March 5.
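The per-commit line statistics above can be reproduced from `git log --numstat` output. A minimal sketch of the parsing, under my own assumptions about the invocation (the author's exact script may differ):

```python
from statistics import mean, median

def diff_stats(numstat_output: str) -> dict:
    """Compute per-commit changed-line stats from the text produced by
    `git log --format=%H --numstat <tagA>..<tagB>`.

    A 40-character commit hash line starts a new commit; each numstat line
    is '<added>\t<deleted>\t<path>'. Binary files report '-' and are skipped.
    """
    per_commit = []
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            added, deleted, _path = parts
            if added != "-":  # skip binary-file entries
                per_commit[-1] += int(added) + int(deleted)
        elif len(line) == 40:  # commit hash: start counting a new commit
            per_commit.append(0)
    return {
        "commits": len(per_commit),
        "total_lines": sum(per_commit),
        "median": median(per_commit),
        "mean": mean(per_commit),
    }
```

For example, two commits touching 15 and 2 lines respectively yield commits=2, total_lines=17, and a mean of 8.5. On real Firefox data the same per-commit totals feed the median/mean/largest-patch figures quoted above.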
Are these "real vulnerabilities"?

This depends on the standard you care about. If you are a defender, the answer is straightforward: yes, broadly speaking, many of these fixes matter. Memory-safety issues, lifetime mistakes, race conditions, incorrect ownership, and serialization problems are exactly the kinds of patterns that defenders want removed before an attacker gets to them. Even when a bug is not independently exploitable, it can still reduce safety margins or become useful when combined with another issue; think of a relative or arbitrary read primitive, for example.

If you are thinking like an attacker, the bar is higher. A bug is only truly interesting if it buys leverage: control of memory, type confusion, privilege boundary crossing, sandbox escape, or something else that materially advances exploitation. By that standard, a lot of the published fixes look more like hardening and bug debt reduction than obvious exploit gold. That is not a criticism. Hardening is good. But it is not the same thing as proving that a model is now outperforming top offensive researchers at finding high-value browser chains.

This brings me to the context of a vulnerability. For a defender, a vulnerability is a vulnerability regardless of its exploitability context. In browsers, however, some attack surfaces are hidden behind additional user interactions, very specific setups, runtime options, and more, which makes vulnerabilities on those surfaces hard to weaponize reliably. As an attacker, you would typically never spend effort on such a surface.

What stands out in the patch set

A quick pass through the linked fixes shows several recurring categories: reference lifetime fixes, ownership and cleanup corrections, race-condition and async teardown fixes, bounds checks and integer handling, safer serialization and IPC handling, and upstream library updates and vendor syncs. Some of those are exactly where dangerous bugs come from.
Others are better understood as preventative maintenance that happened to be triggered by large-scale automated review. This is why a single issue such as bug 2014596 for CVE-2026-6746 stands out more than the giant aggregate CVE buckets. A concrete use-after-free is easy to reason about as a potentially exploitable security issue. A long list of "memory safety bugs fixed" is directionally important, but analytically much weaker unless you inspect the individual bugs.
What Mythos seems good at

The strongest charitable reading of the Firefox 150 data is this: Mythos appears to be very good at surfacing suspicious patterns at scale. That is already valuable. A model that can find cleanup bugs, lifetime hazards, API misuse, unsafe assumptions, and latent memory-safety issues across a codebase the size of Firefox is useful even if only a fraction of those findings are directly exploitable. For a defensive team, that can translate into faster hardening, broader code review coverage, and less time wasted on manual triage. Publicly, that is the part that looks well supported, and it is probably the most important practical outcome. Security teams do not need a model to independently invent a full exploit chain for it to have significant value.

However, its value relative to other LLMs is not clear. If you have ever tried running a model at finding bugs in a codebase, or even written your own agents, you are most likely confident that it would warn you about most of the patterns Mythos found. Take Google's Big Sleep, for instance: there is a chance it has already been far more relevant than Mythos, without such dramatic announcements.

What remains unproven

The offensive claim is much harder to support. From the public evidence, we still do not know how many tokens, runs, and analyst-hours were required, how much human filtering was needed, how many findings were duplicates or low-value crashes, how Mythos compares to other strong models on the same targets, or how many of the fixed bugs would have materially mattered in a real exploit-development context. I suspect Mozilla did not spend time proving exploitability, nor did Mythos provide PoCs for them (although some commits include crashtests). Without knowing the count of actually exploitable bugs, it is hard to call this a security revolution rather than a successful large-scale bug-mining campaign.
The distinction is important because browser security is not measured by the number of bug fixes; it is measured by whether attackers lose meaningful capabilities. And that is not yet obvious here.

Defender relevance vs attacker relevance

This is where I currently land. For defenders, Mythos looks relevant right now. Even if many of the findings are "just" stability issues, suspicious cleanup bugs, or latent memory-safety hazards, removing them improves the codebase and reduces future opportunity for attackers.