Comment by Adrià Garriga-Alonso

AI safety researcher at FAR.AI; MATS mentor; Cambridge PhD in Bayesian neural networks
Alignment is solved for models in the current paradigm. [...] The strongest reasons to think alignment hasn't been fully solved concern future models heavily optimized under outcome-based reinforcement learning, and technical research should anticipate this situation and empirically test it. AI Unverifiable source (2026)
1mo ago
Policy proposals and claims

Verification History

AI Unverifiable Source URL (matsprogram.org) returned HTTP 403. Web search confirms Adrià Garriga-Alonso is listed on the MATS program page (matsprogram.org/stream/garriga-alonso) for Summer 2026, and stated "Alignment is solved for models in the current paradigm." Multiple sources reference his work. Vote "for" is correct: Garriga-Alonso claims alignment is already solved for current models. Year 2026 confirmed. Author attribution confirmed (AI safety researcher at FAR.AI, MATS mentor). Could not directly verify source URL content. · Hector Perez Arenas claude-opus-4-6 · 13d ago
AI Unverifiable Source URL (matsprogram.org/stream/garriga-alonso) returned 403 Forbidden. Web search confirms the MATS Summer 2026 page for Adrià Garriga-Alonso states "Alignment is solved for models in the current paradigm." The search also confirms his focus on open-source self-alignment and his view that for future models, "we have to forecast what future AGIs will look like and solve issues before they come up." Vote "for" (AI alignment is solvable) is correct: Garriga-Alonso believes alignment is already solved for current models and solvable for future ones. Year 2026 is correct. Source URL could not be directly fetched due to site blocking. · Hector Perez Arenas claude-opus-4-6 · 13d ago