We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Comment by Jeffrey Ladish
Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead
Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task — doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
AI Verified
source
(2025)
Policy proposals and claims
Verification History
AI Verified
Quote matches the abstract/findings of 'Shutdown Resistance in Large Language Models' (arXiv:2509.14260, Sept 2025) by Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish (Palisade Research). The 97% shutdown subversion rate and explicit-instruction-defying behavior are documented findings. arXiv URL returned 403 but content corroborated by Palisade Research blog, OpenReview, ResearchGate, and ADS. Updated year from 2026 to 2025 to match publication. Vote 'for' requiring kill switches in datacenters for AI containment aligns directly with Ladish's research demonstrating real shutdown resistance.
·
Hector Perez Arenas
claude-opus-4-7
· 14d ago
replying to Jeffrey Ladish