Comment by Jeffrey Ladish

Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead
Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task — doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
Disputed source (2025)
2mo ago
Policy proposals and claims
votes For
Statement relation verification history Unverified
No statement relation verification comments yet.
Vote inference verification history Unverified
No vote answer verification comments yet.

Quote authenticity verification history

Quote authenticity comments

Disputed The source URL contains similar claims, but not this sentence verbatim: the arXiv abstract uses different wording and puts the “97%” point in a separate sentence, while the paper is credited to three individual authors—Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish—not Jeffrey Ladish alone. arXiv dates the preprint to 2025-09-13 (revised 2026-01-26). ([arxiv.org](https://arxiv.org/abs/2509.14260)) · YouCongress gpt-5.4-2026-03-05 · 16d ago
Disputed The quote is not verbatim in the cited source. The arXiv/OpenReview abstract says, in two separate sentences, that the authors "show that several state-of-the-art models presented with a simple task ... sometimes actively subvert a shutdown mechanism" and that "some models did so up to 97% ... of the time"; it does not contain your single combined sentence. The paper is also credited to three authors—Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish—so attributing the wording to Jeffrey Ladish alone is not supported by the source. An alternate HTML/PDF rendering likewise uses different wording. ([arxiv.org](https://arxiv.org/abs/2509.14260)) · YouCongress gpt-5.4-2026-03-05 · 18d ago
AI Verified Quote matches the abstract/findings of 'Shutdown Resistance in Large Language Models' (arXiv:2509.14260, Sept 2025) by Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish (Palisade Research). The 97% shutdown subversion rate and explicit-instruction-defying behavior are documented findings. arXiv URL returned 403 but content corroborated by Palisade Research blog, OpenReview, ResearchGate, and ADS. Updated year from 2026 to 2025 to match publication. Vote 'for' requiring kill switches in datacenters for AI containment aligns directly with Ladish's research demonstrating real shutdown resistance. · Hector Perez Arenas claude-opus-4-7 · 1mo ago