Shallow Review of Technical AI Safety, 2025

Autonomy evals

Measure an AI's ability to act autonomously to complete long-horizon, complex tasks.
Theory of Change: By measuring how long and complex a task an AI can complete (its "time horizon"), we can track capability growth and identify when models gain dangerous autonomous capabilities (like R&D acceleration or replication).
General Approach: Behavioral
Target Case: Average Case
Estimated FTEs: 10-50
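
Illustration: the "time horizon" in the theory of change can be made concrete with a small sketch. It assumes the horizon is read off as the human task length at which a fitted success curve crosses 50%, with success modeled as a logistic function of log2(human minutes). The task data, function names, and fitting settings below are hypothetical and chosen for illustration only; this is not the evaluation code of any listed output.

```python
import math

# Hypothetical results: (minutes a human needs for the task, 1 if the agent succeeded).
RESULTS = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 1),
           (30, 0), (60, 1), (120, 0), (240, 0), (480, 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(results, lr=0.5, steps=5000):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by gradient ascent
    on the mean log-likelihood. Inputs are centered for stable steps."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [success for _, success in results]
    mean_x = sum(xs) / len(xs)
    a_c, b = 0.0, 0.0  # intercept (on centered x) and slope
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a_c + b * (x - mean_x))
            grad_a += err
            grad_b += err * (x - mean_x)
        a_c += lr * grad_a / len(xs)
        b += lr * grad_b / len(xs)
    # Undo the centering so the model is expressed in raw log2(minutes).
    return a_c - b * mean_x, b


def success_probability(minutes, a, b):
    return sigmoid(a + b * math.log2(minutes))


def time_horizon(a, b, p=0.5):
    """Task length (in human minutes) at which predicted success equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)


a, b = fit_logistic(RESULTS)
horizon = time_horizon(a, b)
print(f"slope per doubling of task length: {b:.2f}")
print(f"estimated 50% time horizon: {horizon:.0f} human-minutes")
```

A negative slope means success falls as tasks get longer. Tracking how the estimated horizon shifts across model releases is the kind of capability trend this agenda aims to measure.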
Outputs:
Measuring AI Ability to Complete Long Tasks. Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, Lawrence Chan
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts. Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, Neev Parikh, Thomas Broadley, Lawrence Chan, Michael Chen, Josh Clymer, Jai Dhyani, Elena Ericheva, Katharyn Garcia, Brian Goodrich, Nikola Jurkovic, Holden Karnofsky, Megan Kinniment, Aron Lajko, Seraphina Nix, Lucas Sato, William Saunders, Maksym Taran, Ben West, Elizabeth Barnes
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents. Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety. Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, Maarten Sap
PaperBench: Evaluating AI's Ability to Replicate AI Research. Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Chan Jun Shern, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Mia Glaese, Tejal Patwardhan
Forecasting Frontier Language Model Agent Capabilities. Govind Pimpale, Axel Højmark, Jérémy Scheurer, Marius Hobbhahn
GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments. Hanlin Zhu, Tianyu Guo, Song Mei, Stuart Russell, Nikhil Ghosh, Alberto Bietti, Jiantao Jiao