Guaranteed-Safe AI
Have an AI system generate outputs (e.g. code, control systems, or RL policies) that come with quantitative guarantees of compliance with a formal safety specification, relative to a formal world model.
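A toy illustration of what such a check might look like (a hedged sketch, not drawn from any of the projects listed below): the Z3 SMT solver is asked for a counterexample to a one-step invariance spec under a simple world model. The dynamics, policy, and bounds here are all hypothetical placeholders.

```python
# Minimal sketch: verify that an AI-proposed one-step controller respects a
# formal safety spec under a toy world model, using the Z3 SMT solver.
# The world model, policy, and bounds below are hypothetical.
from z3 import Real, Solver, And, Not, unsat

x, u = Real("x"), Real("u")

# Hypothetical world model: next state after applying control u.
x_next = x + u

# Hypothetical AI-proposed policy: u = -0.5 * x.
policy = (u == -0.5 * x)

# Formal safety specification: if the current state lies in [-1, 1],
# the next state must also lie in [-1, 1] (one-step invariance).
in_bounds = And(x >= -1, x <= 1)
spec = And(x_next >= -1, x_next <= 1)

# Ask for a counterexample: a state and control consistent with the policy
# that violate the spec. 'unsat' means none exists, i.e. the policy is
# provably safe with respect to this spec and world model.
s = Solver()
s.add(in_bounds, policy, Not(spec))
if s.check() == unsat:
    print("Policy verified: no counterexample to the safety spec.")
else:
    print("Unsafe: counterexample found:", s.model())
```

A real guaranteed-safe pipeline would scale this pattern up: richer world models, probabilistic or quantitative specs, and proof certificates that can be checked independently of the system that produced them.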
Theory of Change: Various, including:
i) safe deployment: create a scalable process to get not-fully-trusted AIs to produce highly trusted outputs;
ii) secure containers: create a 'gatekeeper' system that acts as an intermediary between human users and a potentially dangerous system, only letting provably safe actions through (see the sketch below).
(Notable for not requiring that we solve ELK, though it does require that we solve the ontology problem.)
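A minimal sketch of the gatekeeper idea in (ii), assuming a fail-closed design: the `Gatekeeper` class, its verifier, and the fallback action are hypothetical placeholders; a real system would rely on a machine-checked certificate rather than a Python predicate.

```python
# Illustrative sketch of a 'gatekeeper': an intermediary that only forwards
# actions from an untrusted policy when an independent verifier certifies
# them against the safety spec, otherwise substituting a known-safe fallback.
# All names, action formats, and the bound below are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Gatekeeper:
    verify: Callable[[dict, dict], bool]  # (state, action) -> certified safe?
    fallback: dict                        # known-safe default action

    def filter(self, state: dict, proposed: dict) -> dict:
        # Only provably safe actions pass; anything unverifiable is replaced
        # by the conservative fallback (fail-closed behaviour).
        return proposed if self.verify(state, proposed) else self.fallback

# Hypothetical usage: the untrusted policy proposes a thrust command and the
# verifier enforces a hard bound required by the spec.
gk = Gatekeeper(
    verify=lambda state, action: abs(action["thrust"]) <= 1.0,
    fallback={"thrust": 0.0},
)
print(gk.filter({"alt": 10.0}, {"thrust": 3.0}))  # -> {'thrust': 0.0}
```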
Target Case: Worst Case
See Also:
Towards Guaranteed Safe AI, Standalone World-Models, Scientist AI, Safeguarded AI, Asymptotic guarantees, Open Agency Architecture, SLES, program synthesis, Scalable formal oversight
Some names: ARIA, Lawzero, Atlas Computing, FLF, Max Tegmark, Steve Omohundro, Joar Skalse, Stuart Russell, Alessandro Abate
Estimated FTEs: 10-100
Critiques:
Outputs:
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents — Agustín Martinez Suñé, Tan Zhi Xuan
Beliefs about formal methods and AI safety — Quinn Dougherty
Report on NSF Workshop on Science of Safe AI — Rajeev Alur, Greg Durrett, Hadas Kress-Gazit, Corina Păsăreanu, René Vidal
A benchmark for vericoding: formally verified program synthesis — Sergiu Bursuc, Theodore Ehrenborg, Shaowei Lin, Lacramioara Astefanoaei, Ionel Emilian Chiosa, Jure Kukovec, Alok Singh, Oliver Butterley, Adem Bizid, Quinn Dougherty, Miranda Zhao, Max Tan, Max Tegmark