Aligning to the social contract
Derive AIs' operational values from formalisms of 'social contract'-style ideal civic deliberation and from the rulesets for civic actors that follow from them
Theory of Change: Formalize and apply the liberal tradition's project of defining civic principles that are separable from any substantive conception of the good, then align AIs to those principles, bypassing fragile utility learning and intractable utility calculation
General Approach: Cognitive
See Also:
Some names: Gillian Hadfield, Tan Zhi-Xuan, Sydney Levine, Matija Franklin, Joshua B. Tenenbaum
Estimated FTEs: 5-10
Outputs:
Law-Following AI: designing AI agents to obey human laws— Cullen O'Keefe, Ketan Ramakrishnan, Janna Tay, Christoph Winter
A Pragmatic View of AI Personhood— Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, Stanley M. Bileschi
Societal Alignment Frameworks Can Improve LLM Alignment— Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy
ACE and Diverse Generalization via Selective Disagreement— Oliver Daniels, Stuart Armstrong, Alexandre Maranhão, Mahirah Fairuz Rahman, Benjamin M. Marlin, Rebecca Gorman
Resource Rational Contractualism Should Guide AI Alignment— Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel
Statutory Construction and Interpretation for Artificial Intelligence— Luxi He, Nimra Nadeem, Michel Liao, Howard Chen, Danqi Chen, Mariano-Florentino Cuéllar, Peter Henderson
Beyond Preferences in AI Alignment— Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
Promises Made, Promises Kept: Safe Pareto Improvements via Ex Post Verifiable Commitments— Nathaniel Sauerberg, Caspar Oesterheld