Comment by Sam Bowman

AI alignment researcher at Anthropic; on leave from NYU
"In the handful of cases where [the model] misbehaves in significant ways, it's difficult to safeguard it. When the model cheats on a test, it does so in extremely creative ways."
Unverified source (2026)