Theory for aligning multiple AIs
Use realistic game-theory variants (e.g. evolutionary game theory, computational game theory), or develop alternative game theories, to describe and predict the collective and individual behaviours of AI agents in multi-agent scenarios.
Theory of Change: While traditional AGI safety focuses on idealized decision theory and individual agents, it is plausible that strategic AI agents will first emerge (or are already emerging) in a complex, multi-AI strategic landscape. We need granular, realistic formal models of AIs' strategic interactions and collective dynamics to understand this future.
General Approach: Cognitive
Some names: Lewis Hammond, Emery Cooper, Alan Chan, Caspar Oesterheld, Vincent Conitzer, Jan Kulveit, Richard Ngo, Emmett Shear, Softmax, Full Stack Alignment, AI Objectives Institute
Estimated FTEs: 10
Outputs:
Multi-Agent Risks from Advanced AI— Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran, Igor Krawczuk, Max Lamparth, Niklas Lauffer, Alexander Meinke, Sumeet Motwani, Anka Reuel, Vincent Conitzer, Michael Dennis, Iason Gabriel, Adam Gleave, Gillian Hadfield, Nika Haghtalab, Atoosa Kasirzadeh, Sébastien Krier, Kate Larson, Joel Lehman, David C. Parkes, Georgios Piliouras, Iyad Rahwan
An Economy of AI Agents— Gillian K. Hadfield, Andrew Koh
Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences— Batu El, James Zou
AI Testing Should Account for Sophisticated Strategic Behaviour— Vojtěch Kovařík, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer
Emergent social conventions and collective bias in LLM populations— Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli
Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory— Kenneth Payne, Baptiste Alloui-Cros
Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches— Hachem Madmoun, Salem Lahlou
Higher-Order Belief in Incomplete Information MAIDs— Jack Foxabbott, Rohan Subramani, Francis Rhys Ward
Characterising Simulation-Based Program Equilibria— Emery Cooper, Caspar Oesterheld, Vincent Conitzer
Promises Made, Promises Kept: Safe Pareto Improvements via Ex Post Verifiable Commitments— Nathaniel Sauerberg, Caspar Oesterheld
The Pando Problem: Rethinking AI Individuality— Jan Kulveit