Shallow Review of Technical AI Safety, 2025

Aligning to context

Align AI directly to the role of participant, collaborator, or advisor within our best real human practices and institutions, rather than to separately representable goals, rules, or utility functions.
Theory of Change:"Many classical problems in AGI alignment are downstream of a type error about human values." Operationalizing a correct view of human values - one that treats human values as impossible or impractical to abstract from concrete practices - will unblock value fragility, goal-misgeneralization, instrumental convergence, and pivotal-act specification.
General Approach: Behavioral
Some names: Full Stack Alignment, Meaning Alignment Institute, Tan Zhi-Xuan, Matija Franklin, Ryan Lowe, Joe Edelman, Oliver Klingefjord
Estimated FTEs: 5
Outputs:
A theory of appropriateness with applications to generative artificial intelligence (Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero)
What are human values, and how do we align AI to them? (Oliver Klingefjord, Ryan Lowe, Joe Edelman)
Model Integrity (Joe Edelman, Oliver Klingefjord)
Beyond Preferences in AI Alignment (Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton)
Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions (Vijay Keswani, Vincent Conitzer, Walter Sinnott-Armstrong, Breanna K. Nguyen, Hoda Heidari, Jana Schaich Borg)
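To make the "type error" concrete, here is a minimal, purely illustrative Python sketch. It is not any of the above groups' actual formalism; the function names, the role labels, and the toy "triage" practice are all hypothetical. It contrasts values represented as a context-free utility function to be maximized with values represented as role-relative appropriateness within a concrete practice, which is the representational shape this agenda argues for.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Caricature of the "type error": values as a context-free scalar
# objective, applied identically in every situation.
UtilityFn = Callable[[str], float]

def maximize(actions: List[str], utility: UtilityFn) -> str:
    """Objective-maximizing agent: one criterion, everywhere."""
    return max(actions, key=utility)

# The alternative shape: values bound to concrete practices, where an
# agent occupies a role (participant, collaborator, advisor) and selects
# actions that are appropriate *within* that practice.
@dataclass
class Practice:
    name: str
    # role -> actions appropriate for that role in this practice
    appropriate: Dict[str, List[str]] = field(default_factory=dict)

def act_in_role(practice: Practice, role: str, candidates: List[str]) -> List[str]:
    """Keep only the candidate actions appropriate for the agent's role.

    Appropriateness is relative to the practice and role; there is no
    practice-independent score to maximize.
    """
    allowed = set(practice.appropriate.get(role, []))
    return [a for a in candidates if a in allowed]

if __name__ == "__main__":
    # Hypothetical toy practice, for illustration only.
    triage = Practice(
        name="hospital triage",
        appropriate={
            "advisor": ["summarize guidelines", "flag uncertainty"],
            "participant": ["summarize guidelines", "order tests"],
        },
    )
    candidates = ["order tests", "flag uncertainty"]
    # The same candidate set filters differently depending on role:
    print(act_in_role(triage, "advisor", candidates))      # ['flag uncertainty']
    print(act_in_role(triage, "participant", candidates))  # ['order tests']
```

The point of the sketch is the signature change: act_in_role answers "what is appropriate, given this practice and this role?", whereas maximize scores outcomes on a practice-independent scale that the agenda claims human values do not actually have.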