Aligning to the social contract

Derive AIs' operational values from 'social contract'-style formalisms of ideal civic deliberation and from the rulesets for civic actors that follow from them.

Theory of Change: Formalize and apply the liberal tradition's project of defining civic principles that are separable from the substantive good, then align our AIs to those civic principles, which bypass fragile utility-learning and intractable utility-calculation.

General Approach: Cognitive

Orthodox Problems:

1. Value is fragile and hard to specify

4. Goals misgeneralize out of distribution

5. Instrumental convergence

10. Humanlike minds/goals are not necessarily safe

13. Fair, sane pivotal processes

See Also:

Aligning to context, Aligning what?

Some names: Gillian Hadfield, Tan Zhi-Xuan, Sydney Levine, Matija Franklin, Joshua B. Tenenbaum

Estimated FTEs: 5-10

Outputs:

Law-Following AI: designing AI agents to obey human laws— Cullen O'Keefe, Ketan Ramakrishnan, Janna Tay, Christoph Winter

A Pragmatic View of AI Personhood— Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, Stanley M. Bileschi

Societal Alignment Frameworks Can Improve LLM Alignment— Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy

ACE and Diverse Generalization via Selective Disagreement— Oliver Daniels, Stuart Armstrong, Alexandre Maranhão, Mahirah Fairuz Rahman, Benjamin M. Marlin, Rebecca Gorman

Resource Rational Contractualism Should Guide AI Alignment— Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel

Statutory Construction and Interpretation for Artificial Intelligence— Luxi He, Nimra Nadeem, Michel Liao, Howard Chen, Danqi Chen, Mariano-Florentino Cuéllar, Peter Henderson

Beyond Preferences in AI Alignment— Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton

Promises Made, Promises Kept: Safe Pareto Improvements via Ex Post Verifiable Commitments— Nathaniel Sauerberg, Caspar Oesterheld