Data quality for alignment
Improves the quality, signal-to-noise ratio, and reliability of human-generated preference and alignment data.
Theory of Change: The quality of alignment depends heavily on the quality of the data it is trained on (e.g., human preference judgments); by improving the "signal" from annotators and reducing noise and bias, we should get more robustly aligned models (see the sketch after the outputs list for a minimal illustration).
General Approach: Engineering
Target Case: Average Case
See Also:
Synthetic data for alignment, scalable oversight, Assistance games, assistive agents, Model values / model preferences
Some names: Maarten Buyl, Kelsey Kraus, Margaret Kroll, Danqing Shi
Estimated FTEs: 20-50
Outputs:
AI Alignment at Your Discretion – Maarten Buyl, Hadi Khalaf, Claudio Mayrink Verdun, Lucas Monteiro Paes, Caio C. Vieira Machado, Flavio du Pin Calmon
Maximizing Signal in Human-Model Preference Alignment – Kelsey Kraus, Margaret Kroll
DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition – Danqing Shi, Furui Cheng, Tino Weinkauf, Antti Oulasvirta, Mennatallah El-Assady
Challenges and Future Directions of Data-Centric AI Alignment – Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li
You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation – Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet
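To make the theory of change concrete, here is a minimal sketch of one common noise-reduction step: aggregate several annotators' votes for each preference pair and set aside low-agreement pairs before training a reward model. The data schema, field names, and the 0.75 threshold are illustrative assumptions, not taken from any of the papers above.

```python
from collections import Counter

# Hypothetical preference records: each pair of responses (a, b) was
# judged by several annotators, one vote per annotator.
comparisons = [
    {"prompt": "Summarise the report.", "votes": ["a", "a", "a", "b"]},
    {"prompt": "Explain quantum tunnelling.", "votes": ["a", "b", "a", "b"]},
]

def filter_by_agreement(records, min_agreement=0.75):
    """Keep pairs whose majority vote clears an agreement threshold.

    High-agreement pairs carry strong preference signal; low-agreement
    pairs are ambiguous (noise) and are set aside rather than given a
    hard label for reward-model training.
    """
    kept, ambiguous = [], []
    for record in records:
        counts = Counter(record["votes"])
        label, n_majority = counts.most_common(1)[0]  # (winner, vote count)
        agreement = n_majority / len(record["votes"])
        if agreement >= min_agreement:
            kept.append({**record, "label": label, "agreement": agreement})
        else:
            ambiguous.append(record)
    return kept, ambiguous

kept, ambiguous = filter_by_agreement(comparisons)
print(f"kept {len(kept)} pairs, set aside {len(ambiguous)} ambiguous pairs")
```

Real pipelines go further (weighting annotators by estimated reliability, re-adjudicating ambiguous pairs, or restructuring the annotation task itself), but the core move is the same: raise the signal-to-noise ratio of the preference data before it reaches training.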