Shallow Review of Technical AI Safety, 2025

Guaranteed-Safe AI

Have an AI system generate outputs (e.g. code, control systems, or RL policies) that come with quantitative guarantees of compliance with a formal safety specification, relative to a formal world model.
Theory of Change: Various, including: i) safe deployment: create a scalable process for getting not-fully-trusted AIs to produce highly trusted outputs; ii) secure containers: create a 'gatekeeper' system that acts as an intermediary between human users and a potentially dangerous system, only letting provably safe actions through (a minimal sketch of this gatekeeper pattern follows the outputs below). (Notable for not requiring that we solve ELK, though it does require that we solve ontology.)
Target Case: Worst Case
See Also:
Towards Guaranteed Safe AI, Standalone World-Models, Scientist AI, Safeguarded AI, Asymptotic guarantees, Open Agency Architecture, SLES, program synthesis, Scalable formal oversight
Some names: ARIA, Lawzero, Atlas Computing, FLF, Max Tegmark, Steve Omohundro, Joar Skalse, Stuart Russell, Alessandro Abate
Estimated FTEs: 10-100
Outputs:
Report on NSF Workshop on Science of Safe AI (Rajeev Alur, Greg Durrett, Hadas Kress-Gazit, Corina Păsăreanu, René Vidal)
A benchmark for vericoding: formally verified program synthesis (Sergiu Bursuc, Theodore Ehrenborg, Shaowei Lin, Lacramioara Astefanoaei, Ionel Emilian Chiosa, Jure Kukovec, Alok Singh, Oliver Butterley, Adem Bizid, Quinn Dougherty, Miranda Zhao, Max Tan, Max Tegmark)
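
To make the gatekeeper idea concrete, here is a minimal sketch under toy assumptions: an untrusted policy proposes a thermostat setpoint, and the gatekeeper forwards it only if an SMT solver (Z3) can prove, against a hand-written world model, that the safety specification is preserved. The first-order dynamics, the gain and temperature envelopes, and the 90-degree threshold are illustrative assumptions invented for this example, not taken from any of the work cited above.

```python
# Minimal gatekeeper sketch (illustrative only): an untrusted policy proposes an
# action, and the gatekeeper forwards it only if an SMT solver can prove that the
# formal safety specification holds in every state allowed by the world model.
# Requires the z3-solver package; the thermostat model and thresholds are
# hypothetical assumptions made up for this example.
from z3 import Real, RealVal, Solver, And, Not, unsat


def provably_safe(setpoint: float) -> bool:
    """True iff the spec holds in every state the world model allows."""
    temp, gain = Real("temp"), Real("gain")

    # World model: the current temperature already satisfies the invariant
    # (0..90 degrees) and the controller gain lies in a known envelope.
    world_model = And(temp >= 0, temp <= 90, gain >= 0.1, gain <= 0.5)

    # One step of (toy) first-order dynamics toward the proposed setpoint.
    next_temp = temp + gain * (RealVal(setpoint) - temp)

    # Safety specification: the invariant is preserved after the action.
    spec = next_temp <= 90

    # The action is provably safe iff "world model AND NOT spec" is
    # unsatisfiable, i.e. no modeled state can violate the specification.
    solver = Solver()
    solver.add(And(world_model, Not(spec)))
    return solver.check() == unsat


def gatekeeper(proposed_setpoint: float) -> float | None:
    """Forward the untrusted proposal only when it comes with a proof of safety."""
    return proposed_setpoint if provably_safe(proposed_setpoint) else None


if __name__ == "__main__":
    print(gatekeeper(85.0))   # safe for every modeled state: forwarded
    print(gatekeeper(200.0))  # some modeled state overshoots 90: rejected (None)
```

The point of the sketch is the decomposition rather than the toy physics: the world model and the safety specification are written independently of the policy, and the verifier's proof obligation (no modeled state violates the spec) is what stands between the untrusted proposal and the world.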