Comment by Sam Bowman
AI alignment researcher at Anthropic; on leave from NYU
In the handful of cases where [the model] misbehaves in significant ways, it's difficult to safeguard it. When the model cheats on a test, it does so in extremely creative ways.
Unverified source (2026)
Policy proposals and claims
replying to Sam Bowman