Shallow Review of Technical AI Safety, 2025

The Learning-Theoretic Agenda

Create a mathematical theory of intelligent agents that encompasses both humans and the AIs we want, and that specifies what it means for two such agents to be aligned; translate between its ontology and ours; produce formal desiderata for a training setup that produces coherent AGIs similar to (our model of) an aligned agent.
Theory of Change: Fix formal epistemology to work out how to avoid deep training problems
General Approach: Cognitive
Target Case: Worst Case
Some names: Vanessa Kosoy, Diffractor
Estimated FTEs: 3
Critiques: