Thoughts on the Mythos system card with a specific eye on cyber capabilities

This is a crosspost of my X thread, which I am adding here because I think it deserves a blog-style presentation. Even though I was critical of Anthropic in that thread, and even more so in the earlier thread it quotes, it was liked by Logan Graham, the head of the Frontier Red Team at Anthropic. It is only a like, but that like, together with him following me, made me more optimistic about how Anthropic will handle and write these reports going forward, and it earned him some respect from me.

I believe the authors of the Mythos system card consider it an earnest product, in line with the RSP. But it is hard not to come away feeling like you have read marketing slop rather than substance. The quoted post makes this point. Without raw traces, even cherry-picked ones, and without proper descriptions of the eval suites, there is no way to seriously review whether the headline numbers mean what the charts imply.

I am not coping, and I am not 100% confident Mythos is not a real-world cybersecurity threat. It is very likely a strong model, and I would be disappointed if a serious cybersecurity effort at this scale with today’s frontier models found nothing. I would also expect that OpenAI, in a comparable push, has been capable of finding the exploits in question since GPT-5.2 Pro. My narrow point is about presentation, especially around browser exploitation. Looking at how the results are framed, which implications are drawn, and which conclusions are reached, it is hard not to get at least a little suspicious and call certain results into question.

Anthropic’s researchers are obviously smart, and it would be stupid to assume they are not aware of this themselves. My read is that they are, and that they got pushed by a marketing chapter to present the results as stronger than they themselves believe them to be. If that is not the case, I would be very concerned, or surprised that they got psyoped to some degree themselves. The more charitable hypothesis is also probably the more likely one: smart people in a bad institutional equilibrium.

Binary exploitation is not my main area of expertise, but I have played alongside the very best in many CTF competitions, so I know a fair bit. Zero-days are not that hard to find in some random C software. Name any random piece of software and I, or at least some people I know, will find something. There is just no real financial incentive for sufficiently skilled people to look. Bug bounties pay badly, and a day-to-day job at a cybersecurity company pays far better and is far more comfortable.

In my experience, these people do not have any good reason to turn to crime; see Crime and Punishment by Dostoevsky. Nobody who is good at making their own company’s software secure is really looking for zero-days in random C software. Finding zero-days is not as hard as it seems. Some of the same memetic misunderstanding happened when models got IMO gold: to this day, models, and I would guess even Mythos, still struggle with IMO P6.

A post with this much traction obviously attracts a lot of bad quote-tweets and comments, the kind saying “it is just brute-forcing” or “it is a stochastic parrot”, which are obviously wrong. These come from people who already held those premises and were searching for anything that looked like evidence against Mythos in general. On the other side, you get people who already held the premise “frontier models can now do end-to-end exploitation” and read the figures as confirmation without noticing that the substrate does not license it. There is a lot of cope in both directions.

It reminds me heavily of the IMO gold model release, where most people could not really put it in context. And how could they? They had not done olympiads. To be clear, I do not expect everyone to be good at these contests or to understand their problems. Most people never needed to, and so they could not evaluate whether the problems were hard in the way claimed. That is fine. It just means the burden of careful presentation falls harder on the lab, and that burden was not met here. Some exaggeration is fine when marketing your model, and everybody does it. It is not fine in a report that claims scientific accuracy.

AI is moving fast; I do not doubt that. But there will remain problems with vast exploration spaces, and my intuition is that we will need to rethink RL fairly fundamentally to get anywhere close to human-level novelty in zero-day discovery. These models likely converge on the same bug classes already represented in prior exploits, and if I remember correctly even the system card says that Mythos provides only minor benefit for novel research. As long as that holds, humans will keep finding a lot of the genuinely new bugs that AI will not, at least in the near to mid term.

This is good progress, but it will not replace the need for smart humans in the loop finding the really novel stuff, though those humans will likely be accelerated by working alongside a model like this. Most security domains have historically had a heavy attacker’s advantage. The beauty of cryptography is that it is inherently a place where the defense side wins when it is correctly implemented, and it is great that models are now capable enough to surface some flaws even there.

The playing field is changing, but not discontinuously. A lot of what Mythos is being credited with has very likely been happening already on today’s public models, with clever prompting and operators who know what they are doing. We will obviously see more flaws surfaced as serious, well-resourced efforts get pointed at the problem, and I applaud Anthropic for running one. I just wish we had gotten a scientific writeup of the charts instead of a marketing-chapter-shaped one.

One last thought. It can seem noble not to trust the general public as models get more capable. But historically, very centralized capability in two or three institutions is not a story with a good ending. You are basically trusting these companies and their associated entities not to do anything harmful with extremely powerful models, and asking them to evaluate honestly how dangerous those models are. That is exactly the regime in which a marketing chapter gets to reshape a figure, and exactly the regime in which nobody outside can call it.

As already stated, my best guess for now is that this lands closer to a marketing stunt than to a real-world cybersecurity risk. It will become a real risk eventually, once models get more capable, but I do not see that happening in the near to mid term. The one unambiguously good thing to come out of it is that more people are now thinking seriously about these risks and how to mitigate them, and that is worth something on its own. I have more thoughts, but enough Mythos posting for now. I wrote this tired on a plane, so forgive any odd phrasing.

References and acknowledgements

Acknowledgements

Thanks to two friends who prefer to stay anonymous for very helpful discussion.

References

Anthropic. System Card: Claude Mythos Preview.
gum. Original X thread.
gum. Earlier X thread on the cyber section.

Cite this post

@online{gum2026mythosredteam,
  author = {gum},
  title  = {Thoughts on the Mythos system card with a specific eye on cyber capabilities},
  year   = {2026},
  month  = {04},
  day    = {10},
  url    = {/post/mythos-red-team-system-card/},
}