Opinion / YouCongress

Jeffrey Ladish

Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead

Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task, doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
Unverified source (2026)


Policy proposals and claims

Require large datacenters to install kill switches for AI containment

replying to Jeffrey Ladish

Comment by Jeffrey Ladish