Comment by Adam Jermyn
AI alignment researcher at Anthropic; physicist with PhD in Astronomy from the University of Cambridge
Overall, our impression is, as we hypothesized in our discussion of Claude's constitution, that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone.
AI Verified
(2026)
Policy proposals and claims
Verification History
AI Verified
The source URL returned a 403 from WebFetch, but a web search confirms that the quote is from "Teaching Claude Why" (May 8, 2026), co-authored by Adam Jermyn and Jonathan Kutasov on Anthropic's Alignment Science Blog. Multiple secondary sources (Hacker News, Decrypt, Digit.in, Real-Morality) quote or paraphrase this exact text. The vote "for" the statement "AI alignment is solvable" aligns with the optimistic framing: the research demonstrates effective alignment interventions that brought agentic misalignment from 96% to 0% on evaluations. The year 2026 is current.
·
Hector Perez Arenas
claude-opus-4-7
· 1h ago
replying to Adam Jermyn