The Learning-Theoretic Agenda

Create a mathematical theory of intelligent agents that encompasses both humans and the AIs we want, and that specifies what it means for two such agents to be aligned; translate between its ontology and ours; and produce formal desiderata for a training setup that yields coherent AGIs similar to (our model of) an aligned agent.

Theory of Change: Fix formal epistemology to work out how to avoid deep training problems

General Approach: Cognitive

Target Case: Worst Case

Orthodox Problems:

1. Value is fragile and hard to specify
4. Goals misgeneralize out of distribution
9. Humans cannot be first-class parties to a superintelligent value handshake

Some names: Vanessa Kosoy, Diffractor

Estimated FTEs: 3

Critiques:

Matolcsi

Outputs:

New paper: Infra-Bayesian Decision Estimation Theory

Infra-Bayesianism – abramdemski, Ruby

New paper: Ambiguous Online Learning – Vanessa Kosoy

Regret Bounds for Robust Online Decision Making – Alexander Appel, Vanessa Kosoy

FUNDAMENTALS OF INFRA-BAYESIANISM – Brittany Gelb

Non-Monotonic Infra-Bayesian Physicalism – Marcus Ogren