Comment by Adrià Garriga-Alonso

AI safety researcher at FAR.AI; MATS mentor; Cambridge PhD in Bayesian neural networks
Alignment is solved for models in the current paradigm. [...] The strongest reasons to think alignment hasn’t been solved yet hinge on future models which have been heavily optimized under outcome-based RL. Therefore, technical research on AI alignment should anticipate this situation and empirically test what will happen then.
AI Verified source (2026)

Quote authenticity verification history

Quote authenticity comments

AI Verified The MATS page at the provided source URL is titled “Adrià Garriga-Alonso at MATS: Summer 2026,” identifies the author as Adrià Garriga-Alonso, and contains the first sentence verbatim at line 76 (“Alignment is solved for models in the current paradigm.”) plus the later passage verbatim at line 93 (“The strongest reasons to think alignment hasn’t been solved yet hinge on future models which have been heavily optimized under outcome-based RL. Therefore, technical research on AI alignment should anticipate this situation and empirically test what will happen then.”). Because your quote uses [...] only to omit intervening text from the same page, it is authentic and correctly attributed; the stored URL and year-level date are consistent with the source. ([matsprogram.org](https://www.matsprogram.org/stream/garriga-alonso)) · YouCongress gpt-5.4-2026-03-05 · 9d ago
Disputed The cited MATS Summer 2026 stream page is clearly attributed to Adrià Garriga-Alonso and contains the first sentence exactly, but the second excerpt does not appear verbatim there. The page instead refers to "future models" optimized under "outcome-based RL" and says research should "empirically test what will happen then"; your version substitutes different wording. Because the ellipsis joins real text to a paraphrase rather than preserving verbatim wording, this quote is materially altered. ([matsprogram.org](https://www.matsprogram.org/stream/garriga-alonso)) · YouCongress gpt-5.4-2026-03-05 · 11d ago
AI Unverifiable Source URL (matsprogram.org) returned HTTP 403. Web search confirms Adrià Garriga-Alonso is listed on the MATS program page (matsprogram.org/stream/garriga-alonso) for Summer 2026, and stated "Alignment is solved for models in the current paradigm." Multiple sources reference his work. Vote "for" is correct: Garriga-Alonso claims alignment is already solved for current models. Year 2026 confirmed. Author attribution confirmed (AI safety researcher at FAR.AI, MATS mentor). Could not directly verify source URL content. · Hector Perez Arenas claude-opus-4-6 · 1mo ago
AI Unverifiable Source URL (matsprogram.org/stream/garriga-alonso) returned 403 Forbidden. Web search confirms the MATS Summer 2026 page for Adrià Garriga-Alonso states "Alignment is solved for models in the current paradigm." The search also confirms his focus on open-source self-alignment and his view that for future models, "we have to forecast what future AGIs will look like and solve issues before they come up." Vote "for" (AI alignment is solvable) is correct: Garriga-Alonso believes alignment is already solved for current models and solvable for future ones. Year 2026 is correct. Source URL could not be directly fetched due to site blocking. · Hector Perez Arenas claude-opus-4-6 · 1mo ago