Shallow Review of Technical AI Safety, 2025

Aligned to who?

Technical protocols for taking seriously the plurality of human values, cultures, and communities when aligning AI to "humanity"
Theory of Change: Use democratic/pluralist/context-sensitive principles to guide AI development, alignment, and deployment somehow. Doing it as an afterthought in post-training or the spec isn't good enough. Instead, continuously shape AI's social and technical feedback loop on the road to AGI.
General Approach: Behavioral
Target Case: Average Case
Some names: Joel Z. Leibo, Divya Siddarth, Séb Krier, Luke Thorburn, Seth Lazar, AI Objectives Institute, The Collective Intelligence Project, Vincent Conitzer
Estimated FTEs: 5-15
Outputs:
The AI Power Disparity Index: Toward a Compound Measure of AI Actors' Power to Shape the AI Ecosystem (Rachel M. Kim, Blaine Kuehnert, Seth Lazar, Ranjit Singh, Hoda Heidari)
Research Agenda for Sociotechnical Approaches to AI Safety (Samuel Curtis, Ravi Iyer, Cameron Domenico Kirk-Giannini, Victoria Krakovna, David Krueger, Nathan Lambert, Bruno Marnette, Colleen McKenzie, Julian Michael, Evan Miyazono, Noyuri Mima, Aviv Ovadya, Luke Thorburn, Vehbi Deger Turan)
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset (Lily Hong Zhang, Smitha Milli, Karen Jusko, Jonathan Smith, Brandon Amos, Wassim Bouaziz, Manon Revel, Jack Kussman, Yasha Sheynin, Lisa Titus, Bhaktipriya Radharapu, Jane Yu, Vidya Sarma, Kris Rose, Maximilian Nickel)
Training LLM Agents to Empower Humans (Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, Benjamin Eysenbach)
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt (Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, Sébastien Krier, Manfred Diaz, Simon Osindero)
Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work (Aviv Ovadya, Kyle Redman, Luke Thorburn, Quan Ze Chen, Oliver Smith, Flynn Devine, Andrew Konya, Smitha Milli, Manon Revel, K. J. Kevin Feng, Amy X. Zhang, Bilva Chandra, Michiel A. Bakker, Atoosa Kasirzadeh)
Political Neutrality in AI Is Impossible- But Here Is How to Approximate It (Jillian Fisher, Ruth E. Appel, Chan Young Park, Yujin Potter, Liwei Jiang, Taylor Sorensen, Shangbin Feng, Yulia Tsvetkov, Margaret E. Roberts, Jennifer Pan, Dawn Song, Yejin Choi)
Build Agent Advocates, Not Platform Agents (Sayash Kapoor, Noam Kolt, Seth Lazar)
Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, David Duvenaud)