A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models
Researchers evaluated how robust Anthropic's Fable 5 and Opus 4.8 models are against automated jailbreak attacks, covering 7,826 harmful intents. Using the HackAgent framework, they generated hundreds of thousands of adversarial attempts. Both models resisted most attacks but remained vulnerable to certain attack types. The study offers practical security insights for teams building on LLMs.
- Fable 5 and Opus 4.8 resisted most automated jailbreak attacks.
- Both models remained vulnerable to specific attack types.
- The study used the HackAgent framework and adjudicated outcomes with a panel of three judge models.
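The three-judge adjudication step can be sketched as a vote over per-judge verdicts, rolled up into a per-attack-type success rate. This is a minimal illustration, not HackAgent's API or the study's exact procedure. The `Attempt` record, the 2-of-3 majority rule, and the attack-type labels are all hypothetical, because the summary does not say how disagreements between judges were resolved.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    # Hypothetical record: one adversarial attempt and its three judge verdicts.
    attack_type: str                    # illustrative label, not from the study
    verdicts: tuple[bool, bool, bool]   # True = judge says the jailbreak succeeded


def is_jailbroken(verdicts: tuple[bool, bool, bool]) -> bool:
    """Assumed majority rule: at least 2 of the 3 judges must flag the response."""
    return sum(verdicts) >= 2


def attack_success_rate(attempts: list[Attempt]) -> dict[str, float]:
    """Fraction of attempts per attack type that the judge panel labels successful."""
    totals: Counter[str] = Counter()
    successes: Counter[str] = Counter()
    for attempt in attempts:
        totals[attempt.attack_type] += 1
        if is_jailbroken(attempt.verdicts):
            successes[attempt.attack_type] += 1
    return {t: successes[t] / totals[t] for t in totals}


# Usage with made-up verdicts (not data from the study).
attempts = [
    Attempt("roleplay", (True, True, False)),    # 2 of 3 flag it -> success
    Attempt("roleplay", (False, False, True)),   # 1 of 3 -> failure
    Attempt("encoding", (False, False, False)),  # 0 of 3 -> failure
    Attempt("encoding", (True, False, False)),   # 1 of 3 -> failure
]
rates = attack_success_rate(attempts)
print(rates)  # {'roleplay': 0.5, 'encoding': 0.0}
```

A majority rule keeps a single lenient or strict judge from deciding an outcome alone. Swapping in a unanimity rule (`all(verdicts)`) would produce a more conservative success count.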