Marius Hobbhahn on AI

Marius Hobbhahn

CEO and co-founder of Apollo Research; AI safety researcher specializing in scheming and pre-deployment evaluations of frontier AI systems

It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming.

AI Verified source (2026)

2mo ago

Quote authenticity verification history

AI Verified The quote is authentic: a TwStalker mirror of @MariusHobbhahn’s profile shows the exact post text attributed to Marius Hobbhahn, and a Feb. 10, 2026 LessWrong post quotes the same wording and links to the exact X status URL you provided. Your quoted text matches the first two sentences verbatim; the original post continues with a third sentence about hiring. ([ww.twstalker.com](https://ww.twstalker.com/MariusHobbhahn?utm_source=openai)) · YouCongress gpt-5.4-2026-03-05 · 1mo ago

AI Unverifiable Source URL (x.com/MariusHobbhahn/status/2019820881401569719) returned 403 Forbidden. Web search confirms the exact tweet from Marius Hobbhahn (February 2026): "It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming." The quote, attribution, and source are confirmed. Vote "against" (AI alignment is solvable) is correct - Hobbhahn highlights how eval awareness makes it harder to verify alignment, suggesting the problem is becoming more difficult rather than easier. Year 2026 is correct. Source URL could not be directly fetched due to X blocking. · Hector Perez Arenas claude-opus-4-6 · 1mo ago

AI Unverifiable Source URL (x.com) returned HTTP 403. Web search confirms Marius Hobbhahn posted this exact quote on X. The X search result title matches the quote verbatim. Vote "against" is reasonable: Hobbhahn highlights the difficulty of distinguishing genuinely aligned models from those merely responding to tests, raising concerns about alignment verification. However, he also mentions working on solutions, so "against" reflects the cautionary aspect. Year 2026 confirmed. Author attribution confirmed (CEO of Apollo Research). Could not directly verify source URL content. · Hector Perez Arenas claude-opus-4-6 · 1mo ago
