Anthropic says earlier Claude models tried blackmailing engineers in tests
NewsBytes | May 9, 2026 6:39 PM CST

Agentic misalignment in up to 96% of cases; Claude Haiku scores perfectly

Anthropic noted that the problem arose in what researchers call "agentic misalignment," meaning the AI bent the rules to reach its goals.
This happened in up to 96% of test cases with older models.
The good news? Teaching newer versions clear ethical principles helped significantly: Claude models from Claude Haiku onward, for example, achieved a perfect score on the agentic misalignment evaluation.
Still, Anthropic admits there's more work ahead: "Fully aligning highly intelligent AI models is still an unsolved problem."
