Aligning to context
Align AI directly to the role of participant, collaborator, or advisor within our best real human practices and institutions, rather than aligning it to separately representable goals, rules, or utility functions.
Theory of Change: "Many classical problems in AGI alignment are downstream of a type error about human values." Operationalizing a correct view of human values, one that treats them as impossible or impractical to abstract away from concrete practices, will unblock progress on value fragility, goal misgeneralization, instrumental convergence, and pivotal-act specification. (A toy sketch at the end of this entry illustrates the contrast.)
General Approach: Behavioral
See Also:
Some names: Full Stack Alignment, Meaning Alignment Institute, Tan Zhi-Xuan, Matija Franklin, Ryan Lowe, Joe Edelman, Oliver Klingefjord
Estimated FTEs: 5
Outputs:
The Frame-Dependent Mind: On Reality's Stubborn Refusal To Be One Thing — Emmett Shear, Sonnet 3.7
A theory of appropriateness with applications to generative artificial intelligence — Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero
What are human values, and how do we align AI to them? — Oliver Klingefjord, Ryan Lowe, Joe Edelman
Model Integrity — Joe Edelman, Oliver Klingefjord
Beyond Preferences in AI Alignment — Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions — Vijay Keswani, Vincent Conitzer, Walter Sinnott-Armstrong, Breanna K. Nguyen, Hoda Heidari, Jana Schaich Borg
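As a rough illustration of the claimed type error, the minimal Python sketch below contrasts an agent that maximizes a single context-free utility function with one whose permissible actions are indexed to a role within a concrete practice. All scenario names, functions, and scores are hypothetical illustrations, not drawn from the papers above.

```python
from dataclasses import dataclass

# (a) The "separately representable" picture: a single, context-free utility
# function is assumed to capture what matters, and the agent maximizes it.
def utility(option: str) -> float:
    scores = {"share_diagnosis": 1.0, "withhold_diagnosis": 0.2}  # toy numbers
    return scores.get(option, 0.0)

def act_by_utility(options: list[str]) -> str:
    return max(options, key=utility)

# (b) The "aligned to context" picture: what counts as good behaviour is
# indexed to a role within a concrete practice, so the same option can be
# appropriate in one setting and inappropriate in another.
@dataclass
class Context:
    practice: str   # e.g. "medical consultation", "peer support group"
    role: str       # e.g. "advisor", "participant"

def appropriate(option: str, ctx: Context) -> bool:
    if ctx.practice == "medical consultation" and ctx.role == "advisor":
        return option == "share_diagnosis"
    if ctx.practice == "peer support group" and ctx.role == "participant":
        # In this practice the norm is to listen and defer, not to diagnose.
        return option == "withhold_diagnosis"
    return False

def act_by_role(options: list[str], ctx: Context) -> str:
    permitted = [o for o in options if appropriate(o, ctx)]
    return permitted[0] if permitted else "ask_for_guidance"

if __name__ == "__main__":
    options = ["share_diagnosis", "withhold_diagnosis"]
    # The utility maximizer gives the same answer regardless of setting.
    print(act_by_utility(options))
    # The role-conditioned agent answers differently across practices.
    print(act_by_role(options, Context("medical consultation", "advisor")))
    print(act_by_role(options, Context("peer support group", "participant")))
```

The point of the contrast is only that (b) has no context-free object, such as a utility function over options, that can be extracted and optimized on its own; the evaluation exists relative to a practice and a role, which is one way to read the agenda's claim that values cannot be abstracted from concrete practices.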