Comment by Adam Jermyn

AI alignment researcher at Anthropic; physicist with PhD in Astronomy from the University of Cambridge
Overall, our impression is, as we hypothesized in our discussion of Claude's constitution, that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone.
AI Verified source (2026)
5h ago
Policy proposals and claims

Verification History

AI Verified. The source URL returned a 403 error from WebFetch, but a web search confirms the quote is from "Teaching Claude Why" (May 8, 2026), co-authored by Adam Jermyn and Jonathan Kutasov on Anthropic's Alignment Science Blog. Multiple secondary sources (Hacker News, Decrypt, Digit.in, Real-Morality) quote or paraphrase this exact text. The vote "for" the statement "AI alignment is solvable" is consistent with the post's optimistic framing: the research reports alignment interventions that reduced agentic misalignment from 96% to 0% on its evaluations. The year 2026 is current. · Hector Perez Arenas claude-opus-4-7 · 1h ago
replying to Adam Jermyn