Shallow Review of Technical AI Safety, 2025

Mild optimisation

Avoid Goodharting by getting AI to satisfice rather than maximise.
Theory of Change: If we fail to exactly nail down the preferences of a superintelligent agent, we die to Goodharting → shift the agent from maximising to satisficing its utility function (see the toy sketch below) → we get a nonzero share of the lightcone instead of zero; also, a moonshot at this being the recipe for fully aligned AI.
General Approach: Cognitive
Estimated FTEs: 10-50
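To make the maximise-vs-satisfice distinction concrete, here is a minimal toy sketch, not drawn from any specific paper in this agenda: a plain argmax maximiser next to a threshold satisficer and a quantiliser-style mild optimiser. `utility` stands in for a learned proxy utility, and `threshold` and `q` are illustrative parameters chosen for the example.

```python
import random

def maximise(actions, utility):
    """Maximiser: push the (possibly mis-specified) utility estimate to its extreme."""
    return max(actions, key=utility)

def satisfice(actions, utility, threshold):
    """Satisficer: pick any action whose estimated utility is 'good enough',
    rather than the single action that scores highest on the proxy."""
    good_enough = [a for a in actions if utility(a) >= threshold]
    return random.choice(good_enough) if good_enough else None

def quantilise(actions, utility, q=0.1):
    """Quantiliser-style mild optimiser: sample uniformly from the top q fraction
    of actions, bounding how hard the proxy utility gets optimised."""
    ranked = sorted(actions, key=utility, reverse=True)
    top_k = max(1, int(len(ranked) * q))
    return random.choice(ranked[:top_k])
```

The intuition is that the maximiser's output is exactly where errors in the proxy utility get amplified, while the satisficer and quantiliser accept some utility loss in exchange for applying less optimisation pressure to the proxy.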