Theory for aligning multiple AIs

Use realistic variants of game theory (e.g. evolutionary or computational game theory), or develop alternative game theories, to describe and predict the individual and collective behaviours of AI agents in multi-agent scenarios.

Theory of Change: While traditional AGI safety focuses on idealized decision theory and individual agents, it's plausible that strategic AI agents will first emerge (or are already emerging) in a complex, multi-AI strategic landscape. Understanding that future requires granular, realistic formal models of AIs' strategic interactions and collective dynamics.

General Approach: Cognitive

Orthodox Problems:

4. Goals misgeneralize out of distribution

7. Superintelligence can fool human supervisors

8. Superintelligence can hack software supervisors

Some names: Lewis Hammond, Emery Cooper, Alan Chan, Caspar Oesterheld, Vincent Conitzer, Jan Kulveit, Richard Ngo, Emmett Shear, Softmax, Full Stack Alignment, AI Objectives Institute

Estimated FTEs: 10

Outputs:

Multi-Agent Risks from Advanced AI— Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran, Igor Krawczuk, Max Lamparth, Niklas Lauffer, Alexander Meinke, Sumeet Motwani, Anka Reuel, Vincent Conitzer, Michael Dennis, Iason Gabriel, Adam Gleave, Gillian Hadfield, Nika Haghtalab, Atoosa Kasirzadeh, Sébastien Krier, Kate Larson, Joel Lehman, David C. Parkes, Georgios Piliouras, Iyad Rahwan

An Economy of AI Agents— Gillian K. Hadfield, Andrew Koh

Moloch's Bargain: Emergent Misalignment When LLMs Compete for Audiences— Batu El, James Zou

AI Testing Should Account for Sophisticated Strategic Behaviour— Vojtech Kovarik, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer

Emergent social conventions and collective bias in LLM populations— Ariel Flint Ashery, Luca Maria Aiello, Andrea Baronchelli

Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory— Kenneth Payne, Baptiste Alloui-Cros

Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches— Hachem Madmoun, Salem Lahlou

Higher-Order Belief in Incomplete Information MAIDs— Jack Foxabbott, Rohan Subramani, Francis Rhys Ward

Characterising Simulation-Based Program Equilibria— Emery Cooper, Caspar Oesterheld, Vincent Conitzer

Safe (Pareto) Improvements in Binary Constraint Structures

Promises Made, Promises Kept: Safe Pareto Improvements via Ex Post Verifiable Commitments— Nathaniel Sauerberg, Caspar Oesterheld

The Pando Problem: Rethinking AI Individuality— Jan Kulveit