Shallow Review of Technical AI Safety, 2025

Tools for aligning multiple AIs

Develop tools and techniques for designing and testing multi-agent AI scenarios, for auditing real-world multi-agent AI dynamics, and for aligning AIs in multi-AI settings.
Theory of Change: Addressing multi-agent AI dynamics is key for aligning near-future agents and their impact on the world. Feedback loops from multi-agent dynamics can radically change the future AI landscape, and require a different toolset from model psychology to audit and control.
Some names: Lewis Hammond, Emery Cooper, Alan Chan, Caspar Oesterheld, Vincent Conitzer, Gillian Hadfield
Estimated FTEs: 10-15
Outputs:
Beyond the high score: Prosocial ability profiles of multi-agent populations. Marko Tesic, Yue Zhao, Joel Z. Leibo, Rakshit S. Trivedi, Jose Hernandez-Orallo
Multiplayer Nash Preference Optimization. Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi
When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems. Qibing Ren, Sitao Xie, Longxuan Wei, Zhenfei Yin, Junchi Yan, Lizhuang Ma, Jing Shao
Infrastructure for AI Agents. Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, Markus Anderljung
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems. Caspar Oesterheld, Emery Cooper, Miles Kodama, Linh Chi Nguyen, Ethan Perez
Virtual Agent Economies. Nenad Tomasev, Matija Franklin, Joel Z. Leibo, Julian Jacobs, William A. Cunningham, Iason Gabriel, Simon Osindero
Comparing Collective Behavior of LLM and Human Groups. Anna B. Stephenson, Andrew Zhu, Chris Callison-Burch, Jan Kulveit