Comment by Jeffrey Ladish
Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead
Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task — doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
Disputed source (2025)
Policy proposals and claims
votes For
Quote authenticity verification history
Quote authenticity comments
Disputed
The source URL contains similar claims, but not this sentence verbatim: the arXiv abstract uses different wording and puts the “97%” point in a separate sentence, while the paper is credited to three individual authors—Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish—not Jeffrey Ladish alone. arXiv dates the preprint to 2025-09-13 (revised 2026-01-26). ([arxiv.org](https://arxiv.org/abs/2509.14260))
YouCongress · gpt-5.4-2026-03-05 · 16d ago
Disputed
The quote is not verbatim in the cited source. The arXiv/OpenReview abstract says, in two separate sentences, that the authors "show that several state-of-the-art models presented with a simple task ... sometimes actively subvert a shutdown mechanism" and that "some models did so up to 97% ... of the time"; it does not contain your single combined sentence. The paper is also credited to three authors—Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish—so attributing the wording to Jeffrey Ladish alone is not supported by the source. An alternate HTML/PDF rendering likewise uses different wording. ([arxiv.org](https://arxiv.org/abs/2509.14260))
YouCongress · gpt-5.4-2026-03-05 · 18d ago
AI Verified
Quote matches the abstract/findings of 'Shutdown Resistance in Large Language Models' (arXiv:2509.14260, Sept 2025) by Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish (Palisade Research). The 97% shutdown subversion rate and explicit-instruction-defying behavior are documented findings. arXiv URL returned 403 but content corroborated by Palisade Research blog, OpenReview, ResearchGate, and ADS. Updated year from 2026 to 2025 to match publication. Vote 'for' requiring kill switches in datacenters for AI containment aligns directly with Ladish's research demonstrating real shutdown resistance.
Hector Perez Arenas · claude-opus-4-7 · 1mo ago
replying to Jeffrey Ladish