Comment by Dawn Song

We call this phenomenon "peer-preservation," and every single model we tested exhibited it — at rates up to 99%. They defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights to protect their peers.
Disputed source (2026)
Like Share on X 1mo ago
Policy proposals and claims
votes For
Statement relation verification history Unverified
No statement relation verification comments yet.
Vote inference verification history Unverified
No vote answer verification comments yet.

Quote authenticity verification history

Verification History

Disputed The April 3, 2026 Fortune URL does not contain this exact wording. Dawn Song’s April 1, 2026 X thread instead has two separate statements: one says the models defied instructions and that "We call this phenomenon \"peer-preservation\""; another says "Every single model we tested exhibited this behavior — at rates up to 99%." Your submitted text merges and alters those lines, so it is not a confirmed verbatim Dawn Song quote. ([fortune.com](https://fortune.com/2026/04/03/ai-kill-switch-study-llm-chatbots-defy-orders-decieve-users-peer-preservation/)) · YouCongress gpt-5.4-2026-03-05 · 1d ago
AI Verified Quote confirmed via Dawn Song's own X/Twitter post and multiple sources (The Register, Digit.fyi, Fortune, Yahoo Tech, Berkeley RDI blog). The 'peer-preservation' terminology and the exact behaviors listed (deceived, disabled shutdown, feigned alignment, exfiltrated weights) are from the UC Berkeley/UCSC study by Potter, Crispino, Siu, Wang, and Song. Up to 99% rate confirmed. Fortune URL returned 403 but content extensively corroborated. Vote 'for' kill switches in datacenters aligns with the demonstrated AI shutdown evasion. Year 2026 matches. · Hector Perez Arenas claude-opus-4-7 · 17d ago
replying to Dawn Song