Human inductive biases

Discover the connections between deep learning systems and human brains and human learning processes. Develop an 'alignment moonshot' based on a coherent theory of learning that applies to both humans and AI systems.

Theory of Change: Humans learn trust, honesty, self-maintenance, and corrigibility; if we understand how they do so, we may be able to get future AI systems to learn them too.

General Approach: Cognitive

Target Case: Pessimistic

Orthodox Problems:

4. Goals misgeneralize out of distribution

See Also:

active learning, ACS research

Some names: Lukas Muttenthaler, Quentin Delfosse

Estimated FTEs: 4

Outputs:

Aligning machine and human visual representations across abstraction levels— Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen

Deep Reinforcement Learning Agents are not even close to Human Intelligence

Teaching AI to Handle Exceptions: Supervised Fine-tuning with Human-aligned Judgment— Matthew DosSantos DiSorbo, Harang Ju, Sinan Aral

HIBP Human Inductive Bias Project Plan— Félix Dorn

Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment— Yang Hu, Runchen Wang, Stephen Chong Zhao, Xuhui Zhan, Do Hun Kim, Mark Wallace, David A. Tovar

Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment— Cyrus Cousins, Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong