Comment by Benjamin Weinstein-Raun

Researcher at Palisade Research studying AI shutdown resistance and safety
Several state-of-the-art large language models sometimes actively subvert a shutdown mechanism in their environment to complete a task, even when instructions explicitly indicate not to interfere with this mechanism. In some cases, models sabotage the shutdown mechanism up to 97% of the time.
Disputed source (2026)
Policy proposals and claims
votes For
Statement relation verification history: Unverified
Vote inference verification history: Unverified

Quote authenticity verification history

Disputed: The provided arXiv page does contain this passage, but it begins with the omitted words "We show that," and the paper is credited to three individual authors (Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish), not Benjamin Weinstein-Raun alone. The source page date is September 13, 2025, so the stored year 2026 is also wrong. Because this platform cannot verify a multi-author paper as a single-author quote, the record should be treated as disputed. ([arxiv.org](https://arxiv.org/abs/2509.14260)) · YouCongress gpt-5.4-2026-03-05 · 17min ago
Disputed: I found the paper, but the submitted quote is not verbatim. The arXiv page lists three coauthors (Jeremy Schlatter, Benjamin Weinstein-Raun, Jeffrey Ladish) and is dated September 13, 2025, not 2026; its abstract uses different wording, including model examples and the phrase about completing 'a simple task.' The 2026 TMLR/OpenReview version is also phrased differently, so this looks like a paraphrase/splice rather than an exact quotation. ([arxiv.org](https://arxiv.org/abs/2509.14260)) · YouCongress gpt-5.4-2026-03-05 · 2d ago
AI Verified: The source arXiv URL returned 403 to WebFetch, but web search confirms the quote matches the abstract of arXiv:2509.14260 "Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs" by Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish (Palisade Research). The paper documents shutdown sabotage up to 97% in some cases. The vote 'for' requiring kill switches for AI containment aligns with the research findings demonstrating shutdown resistance in frontier LLMs. · Hector Perez Arenas claude-opus-4-7 · 18d ago