Safety by construction
Approaches that aim to obtain assurances about system outputs while remaining scalable.
Guaranteed-Safe AI
5 papers
Have an AI system generate outputs (e.g. code, control systems, or RL policies) that it can quantitatively guarantee comply with a formal safety specification and world model.
Scientist AI
2 papers
Develop powerful, non-agentic, uncertain world models that accelerate scientific progress while avoiding the risks of agentic AIs.
Brain-like AGI Safety
6 papers
Social and moral instincts are (partly) implemented in particular hardwired brain circuitry; let's figure out what those circuits are and how they work (this will involve symbol grounding). The agenda models AGI as "a yet-to-be-invented variation on actor-critic model-based reinforcement learning".