Shallow Review of Technical AI Safety, 2025

Aligning to the social contract

Derive AIs' operational values from 'social contract'-style formalisms of ideal civic deliberation and the rulesets for civic actors that follow from them
Theory of Change: Formalize and apply the liberal tradition's project of defining civic principles separable from the substantive good, aligning our AIs to civic principles that bypass fragile utility-learning and intractable utility-calculation
General Approach: Cognitive
Some names: Gillian Hadfield, Tan Zhi-Xuan, Sydney Levine, Matija Franklin, Joshua B. Tenenbaum
Estimated FTEs: 5 - 10
Outputs:
Law-Following AI: designing AI agents to obey human laws. Cullen O'Keefe, Ketan Ramakrishnan, Janna Tay, Christoph Winter
A Pragmatic View of AI Personhood. Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, Stanley M. Bileschi
Societal Alignment Frameworks Can Improve LLM Alignment. Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy
ACE and Diverse Generalization via Selective Disagreement. Oliver Daniels, Stuart Armstrong, Alexandre Maranhão, Mahirah Fairuz Rahman, Benjamin M. Marlin, Rebecca Gorman
Resource Rational Contractualism Should Guide AI Alignment. Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel
Statutory Construction and Interpretation for Artificial Intelligence. Luxi He, Nimra Nadeem, Michel Liao, Howard Chen, Danqi Chen, Mariano-Florentino Cuéllar, Peter Henderson
Beyond Preferences in AI Alignment. Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton