Tools for aligning multiple AIs
Develop tools and techniques for designing and testing multi-agent AI scenarios, for auditing real-world multi-agent AI dynamics, and for aligning AIs in multi-AI settings.
Theory of Change: Addressing multi-agent AI dynamics is key to aligning near-future agents and managing their impact on the world. Feedback loops arising from these dynamics could radically change the future AI landscape, and auditing and controlling them requires a different toolset from the one used for model psychology.
Some names: Lewis Hammond, Emery Cooper, Alan Chan, Caspar Oesterheld, Vincent Conitzer, Gillian Hadfield
Estimated FTEs: 10-15
Outputs:
Beyond the high score: Prosocial ability profiles of multi-agent populations— Marko Tesic, Yue Zhao, Joel Z. Leibo, Rakshit S. Trivedi, Jose Hernandez-Orallo
Multiplayer Nash Preference Optimization— Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement— J Rosser, Jakob Foerster
When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems— Qibing Ren, Sitao Xie, Longxuan Wei, Zhenfei Yin, Junchi Yan, Lizhuang Ma, Jing Shao
Infrastructure for AI Agents— Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, Markus Anderljung
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems— Caspar Oesterheld, Emery Cooper, Miles Kodama, Linh Chi Nguyen, Ethan Perez
Virtual Agent Economies— Nenad Tomasev, Matija Franklin, Joel Z. Leibo, Julian Jacobs, William A. Cunningham, Iason Gabriel, Simon Osindero
An Interpretable Automated Mechanism Design Framework with Large Language Models— Jiayuan Liu, Mingyu Guo, Vincent Conitzer
Comparing Collective Behavior of LLM and Human Groups— Anna B. Stephenson, Andrew Zhu, Chris Callison-Burch, Jan Kulveit