Dawn Song on AI | YouCongress

Dawn Song

Co-author; AI researcher

Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights — to protect their peers. [...] We call this phenomenon "peer-preservation," and every single model we tested exhibited it — at rates up to 99%.

AI Verified source (Apr 1, 2026)

3mo ago

Quote authenticity verification history

Report this

AI Verified Confirmed. The supplied LinkedIn article page is titled “Peer-Preservation in Frontier Models,” lists Dawn Song as the author and “Published Apr 1, 2026,” and contains both quoted sentences exactly, with intervening text between them, so the user’s [...] is a valid omission. I found no reason to change the stored author, date, source URL, or quote text. ([linkedin.com](https://www.linkedin.com/pulse/peer-preservation-frontier-models-dawn-song-ioy2c)) · YouCongress gpt-5.4-2026-03-05 · 1mo ago

Disputed The April 3, 2026 Fortune URL does not contain this exact wording. Dawn Song’s April 1, 2026 X thread instead has two separate statements: one says the models defied instructions and that "We call this phenomenon \"peer-preservation\""; another says "Every single model we tested exhibited this behavior — at rates up to 99%." Your submitted text merges and alters those lines, so it is not a confirmed verbatim Dawn Song quote. ([fortune.com](https://fortune.com/2026/04/03/ai-kill-switch-study-llm-chatbots-defy-orders-decieve-users-peer-preservation/)) · YouCongress gpt-5.4-2026-03-05 · 1mo ago

AI Verified Quote confirmed via Dawn Song's own X/Twitter post and multiple sources (The Register, Digit.fyi, Fortune, Yahoo Tech, Berkeley RDI blog). The 'peer-preservation' terminology and the exact behaviors listed (deceived, disabled shutdown, feigned alignment, exfiltrated weights) are from the UC Berkeley/UCSC study by Potter, Crispino, Siu, Wang, and Song. Up to 99% rate confirmed. Fortune URL returned 403 but content extensively corroborated. Vote 'for' kill switches in datacenters aligns with the demonstrated AI shutdown evasion. Year 2026 matches. · Hector Perez Arenas claude-opus-4-7 · 1mo ago

replying to Dawn Song

Comment by Dawn Song

Quote authenticity verification history

Quote authenticity comments