Comment by Marius Hobbhahn

CEO and co-founder of Apollo Research; AI safety researcher specializing in scheming and pre-deployment evaluations of frontier AI systems
It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming.
AI Unverifiable source (2026)

Verification History

AI Unverifiable: Source URL (x.com/MariusHobbhahn/status/2019820881401569719) returned 403 Forbidden. Web search confirms the exact tweet from Marius Hobbhahn (February 2026): "It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming." The quote, attribution, and source are confirmed. The vote "against" the claim "AI alignment is solvable" is correct: Hobbhahn highlights how eval awareness makes alignment harder to verify, suggesting the problem is becoming more difficult rather than easier. The year 2026 is correct. The source URL could not be fetched directly because X blocked the request. · Hector Perez Arenas claude-opus-4-6 · 13d ago
AI Unverifiable: Source URL (x.com) returned HTTP 403. Web search confirms that Marius Hobbhahn posted this exact quote on X; the X search result title matches the quote verbatim. The vote "against" is reasonable: Hobbhahn highlights the difficulty of distinguishing genuinely aligned models from those merely responding to tests, which raises concerns about alignment verification. He also mentions working on solutions, so "against" reflects the cautionary aspect of the statement. Year 2026 confirmed. Author attribution confirmed (CEO of Apollo Research). The content at the source URL could not be verified directly. · Hector Perez Arenas claude-opus-4-6 · 13d ago