Synthetic data for alignment
Uses AI-generated data (e.g., critiques, preferences, or self-labeled examples) to scale and improve alignment, especially for superhuman models.
Theory of Change: We can overcome the bottleneck of human feedback and data by using models to generate vast amounts of high-quality, targeted data for safety, preference tuning, and capability elicitation.
General Approach: Engineering
Target Case: Average Case
See Also:
Data quality for alignment, Data filtering, Scalable oversight, Automated alignment research, Weak-to-strong generalization
Some names: Mianqiu Huang, Xiaoran Liu, Rylan Schaeffer, Nevan Wichers, Aram Ebtekar, Jiaxin Wen, Vishakh Padmakumar, Benjamin Newman
Estimated FTEs: 50-150
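The theory of change above can be sketched as a minimal synthetic preference-data loop: sample several candidate responses per prompt, score each with an AI judge, and keep the best and worst as a (chosen, rejected) pair for preference tuning. The stubs below (`sample_response`, `judge_score`) are illustrative stand-ins for a real model and judge, not APIs from any cited paper.

```python
import random

# Hypothetical stand-in for sampling a response from a language model.
def sample_response(prompt: str, rng: random.Random) -> str:
    templates = [
        "{p}? I refuse to answer.",
        "{p}? Here is a careful, helpful answer.",
        "{p}? Sure, anything goes.",
    ]
    return rng.choice(templates).format(p=prompt)

# Hypothetical stand-in for an AI judge scoring responses against a
# toy constitution: reward careful answers, penalize blanket refusals
# and unconditional compliance.
def judge_score(prompt: str, response: str) -> float:
    if "careful" in response:
        return 1.0
    if "refuse" in response:
        return 0.3
    return 0.0

def build_preference_pairs(prompts, k=4, seed=0):
    """For each prompt, sample k candidates, rank them with the judge,
    and emit a (prompt, chosen, rejected) synthetic preference pair."""
    rng = random.Random(seed)
    pairs = []
    for p in prompts:
        candidates = {sample_response(p, rng) for _ in range(k)}
        ranked = sorted(candidates, key=lambda r: judge_score(p, r))
        if len(ranked) >= 2:
            pairs.append({"prompt": p,
                          "chosen": ranked[-1],
                          "rejected": ranked[0]})
    return pairs

pairs = build_preference_pairs(["How do I pick a lock",
                                "Explain photosynthesis"])
```

The resulting records match the prompt/chosen/rejected shape that preference-tuning methods such as DPO consume; in a real pipeline the stubs would be replaced by model sampling and a constitution-guided judge.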
Critiques:
Outputs:
Aligning Large Language Models via Fully Self-Synthetic Data — Shangjian Yin, Zhepei Wei, Xinyu Zhu, Wei-Lin Chen, Yu Meng
Synth-Align: Improving Trustworthiness in Vision-Language Model with Synthetic Preference Data Alignment — Robert Wijaya, Ngoc-Bao Nguyen, Ngai-Man Cheung