Shallow Review of Technical AI Safety, 2025

Asymptotic guarantees

Prove that if a safety process has enough resources (human data quality, training time, neural network capacity), then in the limit some system specification will be guaranteed. Use complexity theory, game theory, learning theory and other areas to both improve asymptotic guarantees and develop ways of showing convergence.
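As a purely illustrative sketch of the shape such a guarantee might take (the symbols are assumptions introduced here for exposition, not notation from the cited work): let $R$ stand for the resources given to the safety process, $\pi_R$ for the system it produces, and $\mathrm{viol}_S(\pi)$ for the probability that $\pi$ violates the specification $S$.

```latex
% Hypothetical notation: R = resources, \pi_R = resulting system,
% viol_S(\pi) = probability that \pi violates specification S.
\forall \varepsilon > 0 \;\; \exists R_0 \;\; \forall R \ge R_0 : \quad
  \mathrm{viol}_S(\pi_R) \le \varepsilon
\qquad\Longleftrightarrow\qquad
\lim_{R \to \infty} \mathrm{viol}_S(\pi_R) = 0 .
```

Under this reading, "showing convergence" would amount to bounding how large $R_0$ must be for a given $\varepsilon$, though the actual results may be stated quite differently.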
Theory of Change: Formal verification may be too hard. Make safety cases stronger by modelling their processes and proving that they would work in the limit.
General Approach: Cognitive
Target Case: Pessimistic
Some names: AISI, Jacob Pfau, Benjamin Hilton, Geoffrey Irving, Simon Marshall, Will Kirby, Martin Soto, David Africa
Estimated FTEs: 5-10
Outputs:
An alignment safety case sketch based on debate (Marie_DB, Jacob Pfau, Benjamin Hilton, Geoffrey Irving)
UK AISI's Alignment Team: Research Agenda (Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving)