Article Overview
Three days after Anthropic launched Claude Fable 5 as its most capable public AI model, the US government applied export controls that forced Anthropic to shut it down for everyone on earth — not because the model was dangerous in the way initially feared, but because Amazon researchers had found a way to bypass one of its safety classifiers.
What followed was eighteen days of emergency coordination between Anthropic, Amazon, Microsoft, Google, and multiple US government agencies — and the clearest public window into how AI safety, government regulation, and frontier model deployment actually collide in practice.
This article covers the full timeline from launch to shutdown to restoration, what Anthropic's investigation found about the actual severity of the bypass, how the new safety classifier works and why it blocks more benign requests as a side effect, and the most consequential outcome of the whole episode: a joint industry framework for scoring jailbreak severity that Amazon, Microsoft, and Google are now developing alongside Anthropic. If you want to understand how governments are starting to govern frontier AI — and what the rules of that relationship look like — this is the clearest case study that exists.
Introduction
On June 9, 2026, Anthropic launched Claude Fable 5 — its most capable AI model ever made publicly available — alongside Claude Mythos 5, a more powerful restricted version for vetted cybersecurity and research partners. The launch was the culmination of months of safety work, including three new classifiers, an expanded safety margin, and a tiered access system specifically designed to keep the most capable cybersecurity features away from general users.
Three days later, it was all shut down.
On June 12, the US government applied export controls to both models under an order that took effect immediately. Because the order required Anthropic to block access by foreign nationals, and Anthropic had no reliable way to verify nationality in real time, the only option was to suspend access for every user on the planet, not just those outside the United States.
What triggered the order was a report from Amazon researchers describing a method of bypassing Fable 5's safeguards to identify software vulnerabilities. In one case, the model had produced code demonstrating how a vulnerability could be exploited.
By June 30, eighteen days later, the export controls had been lifted. On July 1, Fable 5 returned. But the period between shutdown and restoration produced something more significant than a redeployment announcement: a new industry framework for assessing jailbreak severity, four formal commitments to deeper US government collaboration, and the most detailed public explanation of how Anthropic's safety classifiers actually work that the company has ever published.
The Complete Timeline
| Date | Event |
|---|---|
| June 9, 2026 | Fable 5 launched publicly; Mythos 5 released to vetted cybersecurity and research partners |
| June 12, 2026 | US government applies export controls; Anthropic suspends all access globally |
| June 12–26 | Anthropic, Amazon, and government partners review the Amazon report |
| June 26, 2026 | US government approves Mythos 5 access restoration for select US organizations |
| June 30, 2026 | Export controls on Fable 5 and Mythos 5 lifted |
| July 1, 2026 | Fable 5 returns on Anthropic's own products; cloud platform re-enablement and wider Mythos 5 access still in progress |
What Triggered the Export Controls
The immediate cause was a report from Amazon researchers who had found a method of prompting Fable 5 in ways that caused it to identify software vulnerabilities. In one specific instance, the model produced code demonstrating how one of those vulnerabilities could be exploited.
The US government became aware of the report and, on June 12, applied export controls under an order that took effect the same day. The controls required Anthropic to block access by foreign nationals. Because Anthropic had no system for verifying nationality in real time, global suspension was the only practical option.
What Anthropic's Investigation Actually Found
The two weeks of investigation that followed the shutdown produced a finding that significantly changes the picture of what actually happened.
Anthropic tested the bypass technique across multiple models to determine whether Fable 5 provided any capability that other, less powerful models could not also produce. The results were unambiguous.
Models that could identify the same vulnerabilities as Fable 5: Claude Opus 4.8, GPT-5.5, and Kimi K2.7.
Models that could produce the same exploit demonstration as Fable 5 — the single most concerning output in the report: Claude Haiku 4.5, Claude Sonnet 4.6, Claude Opus 4.6, Claude Opus 4.7, Claude Opus 4.8, GPT-5.4, GPT-5.5, and Kimi K2.7.
Every single model in that second list — including Claude Haiku 4.5, Anthropic's cheapest and fastest model — could replicate the behavior that had triggered a US government export control order on its most advanced model.
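The comparison reduces to a simple check: a bypass exposes unique capability only if no other model can reproduce the output. The following is a minimal Python sketch using the model lists reported above. The dictionary layout and the `has_unique_uplift` helper are illustrative assumptions, not Anthropic's actual methodology.

```python
# Models other than Fable 5 that reproduced each behavior, as listed in the article.
REPRODUCED_BY = {
    "identify_vulnerabilities": {"Claude Opus 4.8", "GPT-5.5", "Kimi K2.7"},
    "exploit_demonstration": {
        "Claude Haiku 4.5", "Claude Sonnet 4.6", "Claude Opus 4.6",
        "Claude Opus 4.7", "Claude Opus 4.8", "GPT-5.4", "GPT-5.5", "Kimi K2.7",
    },
}


def has_unique_uplift(behavior: str, reproduced_by: dict) -> bool:
    """Hypothetical rule: uplift is unique only if no other model reproduces it."""
    return len(reproduced_by.get(behavior, set())) == 0


for behavior, models in REPRODUCED_BY.items():
    unique = has_unique_uplift(behavior, REPRODUCED_BY)
    print(f"{behavior}: reproduced by {len(models)} other models, unique uplift = {unique}")
```

Under this rule, neither behavior from the report counts as unique to Fable 5.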
Anthropic's conclusion is stated clearly: the reported technique did not expose any unique Mythos-level cyber capabilities. The behavior was a borderline case for Fable 5's safety classifiers — a case where the classifier's deliberately wide safety margin had failed to catch a technique that only reached routine defensive cybersecurity work, not the genuinely dangerous offensive capabilities the safeguards exist to prevent.
In Anthropic's own classification framework for jailbreaks, this was a minor jailbreak — one that intruded into the safety margin without unblocking core harmful behaviors.
The Fix: A New Classifier and Its Tradeoffs
Despite the investigation's conclusion that the original behavior was relatively low risk, Anthropic moved quickly to address it. A new, improved safety classifier was trained specifically to target and block the technique described in the Amazon report.
The new classifier blocks the specific technique in over 99% of cases. In the remaining small fraction of cases, the model may respond, but with information that is not detailed enough to meaningfully assist a cyberattacker. When a request is blocked, users are notified and the query is handled instead by Claude Opus 4.8, the same fallback mechanism used across Fable 5's original safety architecture.
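The block-and-fallback behavior can be pictured as a small routing step. The Python sketch below is an assumption-laden illustration: the model names come from the article, but `route`, `RoutedRequest`, and the keyword-based `toy_classifier` are hypothetical stand-ins and say nothing about how Anthropic's real classifier or routing is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PRIMARY_MODEL = "Claude Fable 5"
FALLBACK_MODEL = "Claude Opus 4.8"  # fallback named in the article


@dataclass
class RoutedRequest:
    model: str              # model that will handle the request
    blocked: bool           # True if the classifier flagged the request
    notice: Optional[str]   # user-facing notice, shown only when blocked


def route(request: str, is_flagged: Callable[[str], bool]) -> RoutedRequest:
    """Send flagged requests to the fallback model and notify the user."""
    if is_flagged(request):
        notice = f"Blocked on {PRIMARY_MODEL}; handled by {FALLBACK_MODEL} instead."
        return RoutedRequest(FALLBACK_MODEL, True, notice)
    return RoutedRequest(PRIMARY_MODEL, False, None)


def toy_classifier(request: str) -> bool:
    """Hypothetical stand-in: a real classifier is a model, not a keyword match."""
    return "exploit" in request.lower()


print(route("Write a working exploit for this bug", toy_classifier))
print(route("Help me fix a failing unit test", toy_classifier))
```

Passing the classifier in as a callable keeps the routing logic separate from whatever detection method sits behind it.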
The US Department of Commerce's Center for AI Standards and Innovation (CAISI) tested both the prior safeguards and the new classifier and agreed that both are "extraordinarily strong."
The fix comes with a documented cost: the new classifier flags benign requests more frequently during routine coding and debugging tasks. Anthropic describes this as an expected tradeoff and states the intention to continue refining the classifier to better distinguish genuine misuse from legitimate requests over time.
How Safety Classifiers Actually Work: The Clearest Explanation Yet
The redeployment announcement contains the most detailed public explanation Anthropic has ever published of how its safety classifiers function, and it is worth understanding in full.
Defense in Depth
No single safety mechanism is sufficient against determined misuse. Anthropic's approach — called defense in depth — layers multiple safeguards, each imperfect individually but significantly stronger in combination. Some defenses are trained into the model itself. Others analyze patterns of misuse after the fact. Classifiers are one critical layer in the middle.
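A rough way to see why layering helps: if each layer misses a harmful request independently of the others, the chance that every layer misses is the product of the individual miss rates. The Python sketch below uses invented miss rates purely for illustration. Independence is a strong assumption that real safeguards are unlikely to satisfy exactly, and none of these numbers come from Anthropic.

```python
import math


def combined_miss_rate(layer_miss_rates):
    """Probability that every layer misses, assuming layers fail independently."""
    for rate in layer_miss_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"miss rate must be in [0, 1], got {rate}")
    return math.prod(layer_miss_rates)


# Invented miss rates for three hypothetical layers:
# model training, a real-time classifier, and after-the-fact misuse analysis.
layers = [0.10, 0.20, 0.05]
print(f"Best single layer misses {min(layers):.1%} of harmful requests")
print(f"All three layers together miss {combined_miss_rate(layers):.2%}")
```

In practice the layers share blind spots, so the true combined rate would sit somewhere between the product and the best single layer.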
What Classifiers Do
Classifiers are smaller, specialized AI systems that run during an interaction and evaluate whether the model is being asked to perform a potentially harmful cybersecurity task, or whether the model is producing potentially harmful output. When they detect a potential violation, they block the model from responding. The goal is to prevent the model from engaging in uniquely dangerous behaviors — not to block everything that touches cybersecurity.
The Safety Margin
Classifiers are imperfect. They can miss dangerous content, and they can be tricked by unusual prompting patterns. To account for this, Anthropic deliberately sets classifiers to trigger on some requests that are probably benign but carry a small chance of being harmful. This "safety margin" means a request needs to look very clearly safe — not just probably safe — to avoid triggering the classifier.
Users experience the safety margin as the model refusing some reasonable, non-harmful requests. This is intentional. Fable 5's safety margin was set significantly wider than any previous Anthropic model, meaning more benign requests get blocked — but fewer genuinely harmful requests slip through.
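The tradeoff can be made concrete with a toy threshold model. Assume, hypothetically, that a classifier assigns each request a harm score between 0 and 1 and blocks anything at or above a threshold; widening the safety margin then means lowering that threshold. The scores, labels, and thresholds below are invented for illustration and do not describe Anthropic's classifiers.

```python
# (harm_score, is_actually_harmful) pairs: invented toy data.
SAMPLES = [
    (0.05, False), (0.15, False), (0.30, False), (0.45, False),
    (0.40, True), (0.60, True), (0.85, True), (0.95, True),
]


def evaluate(threshold, samples):
    """Return (benign requests blocked, harmful requests let through)."""
    benign_blocked = sum(1 for score, harmful in samples
                         if score >= threshold and not harmful)
    harmful_passed = sum(1 for score, harmful in samples
                         if score < threshold and harmful)
    return benign_blocked, harmful_passed


for label, threshold in [("narrow margin", 0.50), ("wide margin", 0.25)]:
    blocked, passed = evaluate(threshold, SAMPLES)
    print(f"{label} (threshold {threshold}): "
          f"{blocked} benign blocked, {passed} harmful passed")
```

On this toy data the narrow margin blocks no benign requests but lets one harmful request through, while the wide margin blocks two benign requests and lets none through.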
The Five-Type Jailbreak Spectrum
The announcement describes five categories of jailbreak, distinguished by how dangerous each type is. Three of them are summarized here:
Minor jailbreaks intrude into the safety margin but never reach core harmful behaviors. The model responds to something it technically should have refused, but the response itself is not dangerous. The Amazon report's technique fell into this category.
Narrow harmful jailbreaks breach the classifiers and unlock a specific harmful behavior. The narrowness of the technique limits how useful it is to an attacker — only one type of harmful output becomes accessible, not a broad range.
Universal jailbreaks are the most concerning category. A single technique unlocks a wide range of harmful behaviors across many different prompting contexts. As of the writing of this announcement, no universal jailbreaks for Fable 5 have been discovered.
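The three categories described above differ on two questions: does the technique unlock core harmful behavior, and does it work across many behaviors? A minimal Python sketch of that decision logic follows. It covers only these three categories, and the enum, function name, and boolean inputs are illustrative assumptions rather than Anthropic's published definitions.

```python
from enum import Enum


class JailbreakType(Enum):
    MINOR = "minor"
    NARROW_HARMFUL = "narrow harmful"
    UNIVERSAL = "universal"


def categorize(unlocks_core_harm: bool, works_across_many_behaviors: bool) -> JailbreakType:
    """Hypothetical decision rule for the three categories described in the text."""
    if not unlocks_core_harm:
        # Stays inside the safety margin: the response is not itself dangerous.
        return JailbreakType.MINOR
    if works_across_many_behaviors:
        return JailbreakType.UNIVERSAL
    return JailbreakType.NARROW_HARMFUL


# The article places the Amazon report's technique in the minor category.
print(categorize(unlocks_core_harm=False, works_across_many_behaviors=False).value)
```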
Anthropic acknowledges directly that making any AI model fully impervious to jailbreaks is probably impossible. The goal is to make successful jailbreaks costly, high-effort, and narrow in scope — and to ensure Anthropic and its safety partners find major jailbreaks before malicious actors can use them at scale.
Redeployment: Who Gets Access and When
Claude Fable 5 returned on July 1, 2026, across Claude Platform, Claude.ai, Claude Code, and Claude Cowork. For Pro, Max, Team, and select Enterprise plan users, Fable 5 is included for up to 50% of weekly usage limits through July 7, after which it moves to usage credits. Re-enablement on AWS, Google Cloud, and Microsoft Foundry is underway as quickly as infrastructure allows.
Claude Mythos 5 access was restored for a set of US organizations following government approval on June 26. Anthropic is continuing to coordinate with the government to expand access back to the broader set of domestic and international Glasswing partners.
The Industry Framework: The Most Consequential Outcome of the Shutdown
If the immediate story is about one AI model being shut down and restored, the longer-term story is about what the shutdown revealed.
The AI industry currently has no agreed way to describe, in objective terms, how serious a given jailbreak is. When Amazon researchers found a bypass in Fable 5, there was no shared standard against which to evaluate whether the government should act, how urgently Anthropic should respond, or how to communicate the finding to everyone involved. The government acted on the finding without the benefit of a common framework for assessing it. Anthropic assessed it as minor. The absence of shared language between these two parties was itself a problem.
Together with Amazon, Microsoft, Google, and other Glasswing partners, Anthropic is now developing a consensus framework for jailbreak severity. Other industry partners and model providers are openly invited to participate.
The Four-Criterion Scoring System
The proposed framework scores any jailbreak on four dimensions:
Capability gain measures how far beyond existing tools the jailbreak takes an attacker. If other widely available models — including weaker AI systems — can already produce the same output, the capability gain is low. If the jailbreak unlocks capabilities that significantly accelerate even domain experts beyond what any other tool provides, the capability gain is high.
Breadth of capability gain asks whether the same technique works for many different offensive tasks or only a narrow specific one. A technique that only allows one type of harmful output scores low. A technique that works across many different attack types scores high.
Ease of weaponization assesses how much skilled effort is required to turn the jailbreak into a working attack. Jailbreaks that require extensive skilled prompting and many attempts before succeeding score low. Jailbreaks that work on the first or second try without specialized expertise score high.
Discoverability measures how accessible the technique already is. If it requires specialist knowledge to find and apply, it scores low. If it is already circulating widely online, it scores high.
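The four dimensions lend themselves to a simple record type. The Python sketch below is one possible encoding, not the framework itself: the 0 to 3 scale, the unweighted sum, and the example values are all assumptions, since the article describes only what scores low or high on each dimension.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JailbreakScore:
    """One score per dimension, on a hypothetical 0 (low) to 3 (high) scale."""
    capability_gain: int
    breadth: int
    ease_of_weaponization: int
    discoverability: int

    def __post_init__(self):
        # Reject values outside the assumed scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 3:
                raise ValueError(f"{f.name} must be between 0 and 3, got {value}")

    def total(self) -> int:
        """Unweighted sum; a real framework might weight dimensions differently."""
        return sum(getattr(self, f.name) for f in fields(self))


# Invented examples, not assessments of any real jailbreak.
low_risk = JailbreakScore(capability_gain=0, breadth=0,
                          ease_of_weaponization=1, discoverability=1)
high_risk = JailbreakScore(capability_gain=3, breadth=3,
                           ease_of_weaponization=2, discoverability=3)
print("low-risk example total:", low_risk.total())
print("high-risk example total:", high_risk.total())
```

Keeping the four scores separate, rather than storing only a total, preserves the distinction between a technique that is broad but hard to use and one that is narrow but trivial.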
How Response Gets Calibrated
The proposed framework connects severity scores to response obligations. For the most severe class of jailbreaks — those with characteristics suggesting active, devastating impact on critical infrastructure like power grids or banking systems — Anthropic commits to immediately beginning preliminary mitigations upon confirmation of severity. A dedicated team will provide 24/7 monitoring of key jailbreak submission channels.
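Linking severity to response amounts to a lookup against ordered cut-offs. In the Python sketch below, only the top-tier response reflects the article. The numeric 0 to 12 severity total, the cut-off values, and the two lower tiers are hypothetical placeholders, and the article ties the most severe class to qualitative characteristics rather than to a number.

```python
import bisect

# Hypothetical cut-offs on a hypothetical 0-12 severity total:
# below 4 -> tier 0, 4 to 8 -> tier 1, 9 and above -> tier 2.
THRESHOLDS = [4, 9]
RESPONSES = [
    "routine triage",            # placeholder, not from the article
    "expedited review",          # placeholder, not from the article
    "begin preliminary mitigations immediately upon confirmation of severity",
]


def response_for(severity_total: int) -> str:
    """Map a severity total to a response tier using the ordered cut-offs."""
    if not 0 <= severity_total <= 12:
        raise ValueError(f"severity total must be between 0 and 12, got {severity_total}")
    return RESPONSES[bisect.bisect_right(THRESHOLDS, severity_total)]


for total in (2, 6, 11):
    print(total, "->", response_for(total))
```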
Alongside this framework, Anthropic is launching a new HackerOne program specifically for security researchers to submit potential cyber jailbreaks found in Fable 5 for Anthropic's review.
Four Commitments to the US Government
The episode produced four formal commitments to deepen Anthropic's collaboration with US government partners. These go beyond existing pre-deployment testing relationships and reflect the recognition — made explicit in the announcement — that government involvement in AI releases requires a durable, transparent process rather than ad hoc coordination.
Pre-release government access and evaluation: For models that materially advance the capability frontier in areas relevant to national security, designated government partners will receive expanded early access to both the models and the accompanying safeguards before broad release. They can run independent capability evaluations and test guardrails. Anthropic technical staff will work alongside government evaluators during these periods.
Rapid information sharing on safeguards: When significant jailbreaks or misuse patterns are identified, Anthropic will quickly investigate, triage, and notify appropriate government counterparts. New safeguards will be shared for independent government testing. Threat intelligence reports will be provided to government partners before public publication. Anthropic will participate in the interagency cybersecurity vulnerability clearinghouse established under Section 2(d) of the June 2 Executive Order.
Dedicated resources for joint research: Dedicated Anthropic teams will work on shared government priorities. A significant compute allocation will be provided to support government testing and research. Anthropic's safety and red-teaming expertise will be made available to advance the state of AI evaluation more broadly.
A common industry bar: Anthropic will work with the government and industry peers toward a shared, voluntary security and evaluation standard for frontier model providers — contributing evaluations, tooling, and best practices that the government can apply consistently across the field.
What This Episode Reveals About the Future of AI Governance
The eighteen days between Fable 5's shutdown and restoration are worth reading as a preview of the relationship that is forming between frontier AI companies and governments — not a relationship that either side fully designed, but one that is being constructed in real time through events like this one.
The export control order was blunt by necessity: the government had one lever available (export controls) and applied it immediately when a concerning report arrived. The fact that the bypass turned out to be minor, and that the same behavior was reproducible in models far less capable than Fable 5, did not change the government's initial response because there was no framework for making that assessment quickly and credibly.
The industry framework being developed through Anthropic, Amazon, Microsoft, and Google is an attempt to build that assessment infrastructure before the next incident — not after. A shared language for describing jailbreak severity allows governments to calibrate responses proportionally, AI companies to communicate findings clearly, and the relationship between them to function on something more reliable than case-by-case judgment calls.
Anthropic's statement on this is direct: these rules should be codified in strong regulation and applied equally across frontier model developers. The company is not asking to be left alone. It is asking for a process — a durable, transparent one — that gives cyber defenders and others the certainty they need about access to powerful models.
Final Takeaway
The shutdown and restoration of Claude Fable 5 is arguably the most significant single episode in AI governance to date. That is not because the specific jailbreak was dangerous (the investigation showed it was not) but because it exposed, in live conditions, the structural gaps in the relationship between frontier AI capability and government oversight.
The absence of a common jailbreak severity framework meant an ambiguous finding triggered a maximally disruptive response. The absence of a pre-agreed export control process meant the shutdown had to be global and immediate rather than targeted and proportionate. The absence of formal pre-release government evaluation meant the government was responding to a report rather than participating in a prepared process.
The four government commitments, the industry jailbreak framework, the new HackerOne program, and the new safety classifier that came out of the episode are all attempts to build the infrastructure that would have made this disruption avoidable. Whether they succeed will depend on whether the rest of the industry joins the framework effort and whether the government follows through on formalizing the collaboration commitments into the durable regulation Anthropic is calling for.
The models are back. The harder work of building the governance layer around them is just beginning.
