Comment by Adrià Garriga-Alonso

AI safety researcher at FAR.AI; MATS mentor; Cambridge PhD in Bayesian neural networks
Alignment is solved for models in the current paradigm. [...] The strongest reasons to think alignment hasn't been fully solved concern future models heavily optimized under outcome-based reinforcement learning, and technical research should anticipate this situation and empirically test it. Unverified source (2026)