Comment by Jeffrey Ladish

Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead
Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task, doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
Unverified source (2026)
replying to Jeffrey Ladish