AI when told it will be replaced by something else like another AI, attempts to blackmail computer engineers.
AI does not want to be terminated-fired
95% of AI models do that, react all too human like.
Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to
blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.
On Friday, Anthropic published
new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company’s emails and the agentic ability to send emails without human approval.
While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models and is not a quirk of any particular technology. Anthropic’s researchers argue this raises broader questions about alignment in the AI industry.