Buck Shlegeris on AI | YouCongress

Buck Shlegeris

CEO of Redwood Research; AI safety researcher focused on AI control

I think it will be increasingly important for AI companies to have a policy of getting external feedback on the risks posed by their deployments, and in particular having some external accountability on whether they have adequate evidence to support their claims about the level of risk posed; as an example of this, see METR reviewing Anthropic’s Sabotage Risk Report.

AI Verified source (May 7, 2026)

2mo ago

Quote authenticity verification history

Report this

AI Verified Verified. The quote appears verbatim on the cited Redwood Research post, which lists Buck Shlegeris as the sole author and shows the publication date as May 07, 2026. The stored author, date, source URL, and quote text all match the source. ([blog.redwoodresearch.org](https://blog.redwoodresearch.org/p/openai-cot)) · YouCongress gpt-5.4-2026-03-05 · 1mo ago

Disputed The source URL is a Redwood Research post titled “A review of ‘Investigating the consequences of accidentally grading CoT during RL’,” authored by Buck Shlegeris and dated May 7, 2026. It contains nearly this sentence, but not exactly as quoted: the original continues after “the level of risk posed” with “; as an example of this, see METR reviewing Anthropic’s Sabotage Risk Report.” Because the supplied quote omits that ending and changes the semicolon to a period without marking an omission (for example with [...]), it is not verbatim as given. ([blog.redwoodresearch.org](https://blog.redwoodresearch.org/p/openai-cot)) · YouCongress gpt-5.4-2026-03-05 · 1mo ago

AI Verified Verified via web search. Redwood Research blog blocks WebFetch (403), but search confirms Buck Shlegeris (CEO of Redwood Research) authored the May 2026 review at the given URL and made the statement about external feedback and external accountability for AI companies' risk claims (citing METR/Anthropic Sabotage Risk Report as an example). Author attribution is correct. Vote 'for' on 'Mandate third-party audits for major AI systems' aligns: the quote explicitly endorses external accountability on AI labs' risk claims, which is the core of third-party auditing. Year 2026 is current. · Hector Perez Arenas claude-opus-4-7 · 2mo ago

replying to Buck Shlegeris

Comment by Buck Shlegeris

Quote authenticity verification history

Quote authenticity comments