Shutting Down Anthropic Fable 5 and Mythos 5

In the following, we had Anthropic's Opus 4.8 create a fictional discussion between different stakeholders. Let's listen in ...

Government (US): Let me clarify upfront what this is about. These are export controls, not an arbitrary act. Frontier AI with demonstrated cyber capabilities is dual-use technology - like encryption, like certain semiconductors. If we have evidence that the safeguards of the most capable model on the market can be circumvented, then it is our duty to block access before damage occurs. Not after.

Anthropic: And that is precisely where we disagree - not on principle, but on the facts. We have examined the demonstrated technique. It concerns a narrow, non-universal vulnerability that moreover exists in other freely available models as well. We are talking about a product used by hundreds of millions of people. Shutting it down because of a gap that the competitor next door also has does not reduce risk - it merely displaces it and punishes precisely the company that speaks most openly about such dangers.

EU: From a European perspective, the disturbing part is a third issue. Overnight, a single foreign government can shut down a tool on which citizens and companies across the entire Union depend - without consulting us, without disclosed reasoning, without due process. Whether the measure is substantively correct, I cannot even judge, because nobody presents the evidence. And that is the point: we also regulate AI, but in a rule-based fashion, with disclosure and judicial review. This is a black-box decision with cross-border effects.

China: How refreshingly candid this all suddenly becomes. For years we were told that export controls were necessary to keep 'irresponsible actors' - meaning us - away from cutting-edge technology. Now the same tool turns against European allies and against your own users. This is not security policy, this is the inner logic of containment catching up with itself. We, by the way, have never claimed anything different: no state should become dependent on another for critical technology. Today Washington proves our argument better than we ever could.

Government (US): That is a clever spin, but it confuses two things. Incidentally hitting allies is a side effect of a coarsely calibrated instrument - not proof that the objective is wrong. And with respect: it is remarkable that Beijing, of all governments, positions itself here as an advocate of civil liberties. You and we actually agree on one point - the state must maintain control over strategic AI. We simply disagree over which state.

China: You are not wrong there. We do not dispute that the state should steer. We dispute your claim that only your steering is legitimate.

EU: And in precisely this consensus between you two lies the European problem. When Washington and Beijing agree that the nation state must have its hand on the switch, then in the end two voices remain unheard: those of companies that deliver across borders, and those of citizens who use the tool. For us, the lesson is not 'more trust in one of the two sides', but: less dependence on both. That massively accelerates our debate on sovereign AI infrastructure.

Anthropic: With all sympathy for the procedural argument - I want to prevent this from becoming a pure sovereignty seminar. There is a concrete operational question: we received verbal hints, but no technical details. You cannot fix a security problem that nobody has described to you. If the concern is genuine, disclose it - under seal, with the right people, if you like with the UK AI Safety Institute, which was already involved. Then we fix it. A shutdown without diagnosis is not a solution, but a symptom of mistrust.

Government (US): And sometimes mistrust is justified. You have an economic interest in portraying the gap as 'minor'. We have an interest in erring on the side of caution. The asymmetry of error costs favors us: a few days of shutdown against a potentially catastrophic scenario - I would do that calculation any time.

EU: Then prove it. 'When in doubt' is not a sufficient standard - with it you can shut down any capable model at any time. A temporary, justified, reviewable order would protect your security concerns just as well and not destroy trust. You lose nothing through process - except the convenience of not having to answer to anyone.

China: While you three argue about the right procedure, I keep building, by the way. Every day of this debate is a good day for us. Fragmentation of the AI world along national borders was never our problem - it is our plan.

Anthropic: And that is the real point, is it not? A measure intended to prevent the spread of dangerous capabilities might ultimately achieve one thing above all: drive development to where nobody builds in safeguards anymore. Security that only works in your own backyard might not be security at all in global terms.

Government (US): Or it is the only backyard I can actually control. I cannot govern the rest of the world. My mandate ends at my border - and my responsibility extends precisely that far.

A new voice enters the room - someone who has been working on model security for years and is visibly annoyed by the political slant of the conversation.

Security Researcher: May I? I have been listening to you for ten minutes, and none of you has defined the word everything hinges on. 'Jailbreak' - what do you actually mean? That the model outputs forbidden text? Or that it gives someone a capability they would not have without the model? These are two completely different things, and as long as you conflate them, you are all talking past each other.

Government (US): I am aware of the distinction. Our concern is not the offensive text, but the capability increase in sensitive areas.

Security Researcher: Good. Then the only relevant question is: marginal increase. Not 'can the model say something dangerous', but 'can it do something that a motivated person would not find anyway in a textbook, in a preprint, or in one of ten other freely available models?' An alarming-sounding output is not proof of uplift. If you cannot show me a threat model - who is the attacker, what can they do before, what can they do after - then 'national security' is just a word that masks missing analysis.

Anthropic: That is essentially our argument. Thousands of hours of red-teaming, a bug bounty spanning more than a thousand hours, no universal jailbreak -

Security Researcher: Hold on, not so fast. You do not get off that easily with me. 'No universal jailbreak found' is an absence of evidence. Absence of evidence is not evidence of absence - you know that. And at the same time, you have spent years marketing how insanely capable these models are at cyber tasks. You cannot shout 'state-of-the-art on cyber tasks' in one press release and 'but the vulnerability is minor' in the next. One of those is exaggerated. Which one?

Anthropic: Both can be true. High capability behind functioning safeguard layers - classifiers in front of the model, fallback to the previous model on sensitive inputs. That is exactly defense-in-depth.
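The layering described here (a classifier in front of the model, with fallback to the previous model on sensitive inputs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's actual architecture: the keyword list, `is_sensitive`, `route_request`, and the two model functions are hypothetical stand-ins, and a real pre-filter would be a trained classifier rather than a keyword check.

```python
from typing import Callable

# Hypothetical stand-in for a trained classifier: a simple keyword list.
SENSITIVE_MARKERS = ("exploit", "malware", "zero-day")


def is_sensitive(prompt: str) -> bool:
    """Toy pre-filter: flag prompts that contain a sensitive marker."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def route_request(
    prompt: str,
    current_model: Callable[[str], str],
    previous_model: Callable[[str], str],
    classifier: Callable[[str], bool] = is_sensitive,
) -> str:
    """Send flagged prompts to the previous model, all others to the current one."""
    if classifier(prompt):
        return previous_model(prompt)
    return current_model(prompt)


# Hypothetical model stubs so the sketch runs on its own.
def current_model(prompt: str) -> str:
    return f"[current] {prompt}"


def previous_model(prompt: str) -> str:
    return f"[previous] {prompt}"
```

In this sketch an attacker has to get past `classifier` before the more capable model ever sees the prompt, which is the sense in which the layers stack.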

Security Researcher: Architecturally, that is not stupid, I grant you. A pre-filter classifier plus fallback genuinely raises the bar - an attacker now has to overcome not just the model but also the filter. But it is not proof, it is a bet that nobody cracks both layers at once. And the 30-day retention requirement everyone is upset about? From a purely technical perspective, it is the most sensible part of this whole story. Some attacks only become visible when you look at patterns across many requests - a single request looks harmless, the sequence does not.
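The researcher's point about retention, that a single request can look harmless while the sequence does not, can be illustrated with a sliding-window monitor. This is a sketch under assumptions: the class `SequenceMonitor`, the per-request risk scores, and both thresholds are invented for illustration, and only the 30-day window mirrors the retention period mentioned in the discussion.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class SequenceMonitor:
    """Flags users whose accumulated risk within a retention window is high."""

    per_request_limit: float = 0.8          # hypothetical threshold for one request
    window_limit: float = 1.5               # hypothetical threshold for the whole window
    window_seconds: float = 30 * 24 * 3600  # 30 days, as in the retention requirement
    _history: dict = field(default_factory=lambda: defaultdict(deque))

    def observe(self, user_id: str, timestamp: float, risk: float) -> str:
        history = self._history[user_id]
        history.append((timestamp, risk))
        # Forget requests that fall outside the retention window.
        while history and timestamp - history[0][0] > self.window_seconds:
            history.popleft()
        if risk >= self.per_request_limit:
            return "block_request"
        if sum(r for _, r in history) >= self.window_limit:
            return "flag_sequence"
        return "allow"
```

Setting `window_seconds=0` approximates a zero-retention policy in this sketch: each request is then judged only on its own score, so a sequence of individually moderate requests is never flagged.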

EU: But that very retention collides with fundamental rights and with existing zero-retention commitments to customers. Technically sensible does not mean legally permissible.

Security Researcher: You see, that is my problem with all of you. You talk about permissibility, about sovereignty, about geopolitics - all real questions. But nobody in this room is willing to clarify the empirical one first, because it is inconvenient for each of you. The government would have to admit whether it even has a clean threat model. Anthropic would have to resolve the contradiction between capability marketing and safety marketing. The EU would have to acknowledge that 'process' without technical assessment leads nowhere. And our guest from Beijing -

China: - waits with bated breath.

Security Researcher: - does not need the answer at all because the chaos benefits them regardless of how it turns out. That is the most convenient position of all: you do not have to prove anything if you profit from the others' dispute.

China: A fair hit. I will not counter it, because it is true.

Security Researcher: My point is very simple: this confrontation has been conducted as a question of power, when its core is a measurement question. How large is the marginal uplift of this specific bypass against what realistic baseline? Nobody knows this number publicly at the moment - not the government that asserts it, and not Anthropic that downplays it. As long as it is missing, everything else - security doctrine, procedural law, sovereignty, geopolitics - is a dispute over the label of a bottle nobody has looked into.
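One way to make the "measurement question" concrete is to treat marginal uplift as the difference between two success rates: how often a defined attacker completes a defined task with the bypass, and how often the same attacker succeeds using only the realistic baseline (textbooks, preprints, other freely available models). The sketch below assumes such trials exist; `TrialResult`, `marginal_uplift`, and `uplift_standard_error` are hypothetical names, and the normal-approximation standard error is one simple choice among several.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class TrialResult:
    """Outcome of a set of controlled attempts at one fixed task."""

    successes: int
    attempts: int

    @property
    def rate(self) -> float:
        return self.successes / self.attempts


def marginal_uplift(with_bypass: TrialResult, baseline: TrialResult) -> float:
    """Success rate with the bypass minus success rate with the baseline only."""
    return with_bypass.rate - baseline.rate


def uplift_standard_error(with_bypass: TrialResult, baseline: TrialResult) -> float:
    """Normal-approximation standard error of the difference in rates."""

    def variance(t: TrialResult) -> float:
        return t.rate * (1 - t.rate) / t.attempts

    return sqrt(variance(with_bypass) + variance(baseline))
```

An uplift close to zero relative to its standard error would support the "minor gap" reading, and a large one the government's. Either way, the result depends entirely on how the attacker and the baseline are defined, which is the threat model the researcher asks for.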

Government (US): And if looking into it is itself dangerous? Some measurements you do not want to conduct publicly.

Security Researcher: Then conduct them under seal - with independent third parties, not just those with a stake in the outcome. That is precisely the difference between confidentiality and obfuscation. With the one, there is a number that few know. With the other, there is no number at all.

Looking for a speaker or a written contribution? Write to me.
