Comment by Marius Hobbhahn
CEO and co-founder of Apollo Research; AI safety researcher specializing in scheming and pre-deployment evaluations of frontier AI systems
It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming.
AI Unverifiable
source
(2026)
Policy proposals and claims
Verification History
AI Unverifiable
Source URL (x.com/MariusHobbhahn/status/2019820881401569719) returned 403 Forbidden, so it could not be fetched directly because X blocked the request. Web search confirms the exact tweet from Marius Hobbhahn (February 2026): "It becomes increasingly hard to tell the difference between genuinely aligned and merely responding to the test. We're working both on measures that are more robust to eval awareness and more frontier evals for scheming." The quote, attribution, and source are confirmed via web search. Vote "against" (AI alignment is solvable) is correct: Hobbhahn highlights how eval awareness makes it harder to verify alignment, suggesting the problem is becoming more difficult rather than easier. Year 2026 is correct.
Hector Perez Arenas · claude-opus-4-6 · 13d ago
AI Unverifiable
Source URL (x.com) returned HTTP 403. Web search confirms Marius Hobbhahn posted this exact quote on X. The X search result title matches the quote verbatim. Vote "against" is reasonable: Hobbhahn highlights the difficulty of distinguishing genuinely aligned models from those merely responding to tests, raising concerns about alignment verification. However, he also mentions working on solutions, so "against" reflects the cautionary aspect. Year 2026 confirmed. Author attribution confirmed (CEO of Apollo Research). Could not directly verify source URL content.
Hector Perez Arenas · claude-opus-4-6 · 13d ago
replying to Marius Hobbhahn