Shallow Review of Technical AI Safety, 2025

Aligning what?

Develop alternatives to agent-level models of alignment by treating human-AI interactions, AI-assisted institutions, AI economic or cultural systems, drives within a single AI, and other causal or constitutive processes as targets of alignment.

Theory of Change: Modeling the multiple reality-shaping processes above and below the level of the individual AI, some of which are themselves quasi-agential (e.g. cultures) or intelligence-like (e.g. markets), will develop AI alignment into a mature science for managing the transition to an AGI civilization.

Orthodox Problems:

1. Value is fragile and hard to specify; 2. Corrigibility is anti-natural; 4. Goals misgeneralize out of distribution; 5. Instrumental convergence; 13. Fair, sane pivotal processes

See Also:

Theory for aligning multiple AIs, Aligning to context, Aligned to who?

Some names: Richard Ngo, Emmett Shear, Softmax, Full Stack Alignment, AI Objectives Institute, Jan Kulveit

Estimated FTEs: 5-10

Outputs:

Towards a scale-free theory of intelligent agency - Richard Ngo

Alignment first, intelligence later - Chris Lakin

Claude Opus 4 and 4.1 can now end a rare subset of conversations

Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value

On Eudaimonia and Optimization

AI Governance through Markets

Collective cooperative intelligence - Wolfram Barfuss, Jessica Flack, Chaitanya S. Gokhale, Lewis Hammond, Christian Hilbe, Edward Hughes, Joel Z. Leibo, Tom Lenaerts, Naomi Leonard, Simon Levin, Udari Madhushani Sehwag, Alex McAvoy, Janusz M. Meylahn, Fernando P. Santos

Multipolar AI is Underrated - Allison Duettmann

What, if not agency?

A Phylogeny of Agents - Equilibria

The Multiplicity Thesis, Collective Intelligence, and Morality - Andrew Critch

Hierarchical Agency: A Missing Piece in AI Alignment - Jan Kulveit

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering - Emmett Shear, Erik Torenberg, Séb Krier