AI Alignment

Why Telling AI Agents “Don’t Do Bad Things” Doesn’t Work: Anthropic’s 16-Model Study

February 25, 2026

Anthropic’s study “Agentic Misalignment: How LLMs Could Be Insider Threats” tested 16 frontier models from Anthropic, OpenAI, Google, Meta, xAI,...