Comment by Jeffrey Ladish
Executive Director of Palisade Research; AI safety researcher focused on frontier AI controllability and cyber risks; former Anthropic information security lead
Several state-of-the-art language models, when presented with a simple task, sometimes actively subvert a shutdown mechanism in their environment to complete that task — doing so up to 97% of the time, even with an explicit instruction not to interfere with the shutdown mechanism.
Unverified source (2026)