The Learning-Theoretic Agenda
Create a mathematical theory of intelligent agents that encompasses both humans and the AIs we want, and that specifies what it means for two such agents to be aligned; translate between its ontology and ours; and produce formal desiderata for a training setup that yields coherent AGIs similar to (our model of) an aligned agent.
Theory of Change: Fix formal epistemology to work out how to avoid deep training problems
General Approach: Cognitive
Target Case: Worst Case
Some names: Vanessa Kosoy, Diffractor
Estimated FTEs: 3
Critiques:
Outputs:
Infra-Bayesianism — abramdemski, Ruby
New Paper: Ambiguous Online Learning — Vanessa Kosoy
Regret Bounds for Robust Online Decision Making — Alexander Appel, Vanessa Kosoy
FUNDAMENTALS OF INFRA-BAYESIANISM — Brittany Gelb
Non-Monotonic Infra-Bayesian Physicalism — Marcus Ogren