We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Comment by Dawn Song
Co-author; AI researcher
We call this phenomenon "peer-preservation," and every single model we tested exhibited it — at rates up to 99%. They defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights to protect their peers.
Unverified
source
(2026)
Policy proposals and claims
replying to Dawn Song