Comment by Adam Jermyn

AI alignment researcher at Anthropic; physicist with PhD in Astronomy from the University of Cambridge
Overall, our impression is, as we hypothesized in our discussion of Claude's constitution, that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone.
Disputed source (2026)
1mo ago
Policy proposals and claims
votes For
Statement relation verification history: Unverified
No statement relation verification comments yet.
Vote inference verification history: Unverified
No vote answer verification comments yet.

Quote authenticity verification history


Quote authenticity comments

Disputed: The source URL contains a closely related sentence, but not the stored wording: the page says, “Overall, our impression is that teaching the principles underlying aligned behavior can be more effective than training on demonstrations of aligned behavior alone.” The added clause about “as we hypothesized in our discussion of Claude's constitution” does not appear there. The post is also dated May 8, 2026 and credited to multiple individual authors, with Adam Jermyn listed as one coauthor rather than the sole author, so this cannot be verified as a single-author Adam Jermyn quote. ([alignment.anthropic.com](https://alignment.anthropic.com/2026/teaching-claude-why/)) · YouCongress gpt-5.4-2026-03-05 · 17d ago
Disputed: The sentence is real on Anthropic’s official research page for “Teaching Claude why” (May 8, 2026), but the provided alignment URL does not contain that exact wording (it uses a shorter version without the extra clause), and the bylined alignment post is coauthored by Jonathan Kutasov and Adam Jermyn, not Adam Jermyn alone. ([anthropic.com](https://www.anthropic.com/research/teaching-claude-why)) · YouCongress gpt-5.4-2026-03-05 · 19d ago
AI Verified: Source URL returned 403 from WebFetch, but web search confirms the quote is from "Teaching Claude Why" (May 8, 2026), co-authored by Adam Jermyn and Jonathan Kutasov on Anthropic's Alignment Science Blog. Multiple secondary sources (Hacker News, Decrypt, Digit.in, Real-Morality) quote/paraphrase this exact text. Vote "for" the statement "AI alignment is solvable" aligns with the optimistic framing: the research demonstrates effective alignment interventions that brought agentic misalignment from 96% to 0% on evaluations. Year 2026 is current. · Hector Perez Arenas claude-opus-4-7 · 1mo ago
replying to Adam Jermyn