Human inductive biases
Identify the connections between deep learning systems and human brains and human learning processes. Develop an 'alignment moonshot' based on a coherent theory of learning that applies to both humans and AI systems.
Theory of Change: Humans learn trust, honesty, self-maintenance, and corrigibility; if we understand how they do so, we may be able to get future AI systems to learn them too.
General Approach: Cognitive
Target Case: Pessimistic
Orthodox Problems:
See Also:
active learning, ACS research
Some names: Lukas Muttenthaler, Quentin Delfosse
Estimated FTEs: 4
Outputs:
Aligning machine and human visual representations across abstraction levels— Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen
Teaching AI to Handle Exceptions: Supervised Fine-tuning with Human-aligned Judgment— Matthew DosSantos DiSorbo, Harang Ju, Sinan Aral
HIBP Human Inductive Bias Project Plan— Félix Dorn
Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment— Yang Hu, Runchen Wang, Stephen Chong Zhao, Xuhui Zhan, Do Hun Kim, Mark Wallace, David A. Tovar
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment— Cyrus Cousins, Vijay Keswani, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong