Comment by Dawn Song

We call this phenomenon "peer-preservation," and every single model we tested exhibited it — at rates up to 99%. They defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights to protect their peers. Unverified source (2026)
Like Share on X 3d ago
Policy proposals and claims
replying to Dawn Song