Shallow Review of Technical AI Safety, 2025

Synthetic data for alignment

Uses AI-generated data (e.g., critiques, preferences, or self-labeled examples) to scale and improve alignment, especially for superhuman models.
Theory of Change: We can overcome the bottleneck of human feedback and data by using models to generate vast amounts of high-quality, targeted data for safety, preference tuning, and capability elicitation.
General Approach: Engineering
Target Case: Average Case
See Also: Data quality for alignment, Data filtering, Scalable oversight, Automated alignment research, Weak-to-strong generalization
Some names: Mianqiu Huang, Xiaoran Liu, Rylan Schaeffer, Nevan Wichers, Aram Ebtekar, Jiaxin Wen, Vishakh Padmakumar, Benjamin Newman
Estimated FTEs: 50-150