The "Neglected Approaches" Approach
An agenda-agnostic approach: identify good but overlooked empirical alignment ideas, work with theorists who could use engineering support, and prototype those ideas.
Theory of Change: Empirical search for "negative alignment taxes" (prioritizing methods that simultaneously enhance alignment and capabilities)
General Approach: Engineering
Target Case: Average Case
Orthodox Problems:
See Also: Iterative alignment, automated alignment research, Beijing Key Laboratory of Safe AI and Superalignment, Aligned AI
Some names: AE Studio, Gunnar Zarncke, Cameron Berg, Michael Vaiana, Judd Rosenblatt, Diogo Schwerz de Lucena
Estimated FTEs: 15
Outputs:
Towards Safe and Honest AI Agents with Neural Self-Other Overlap (Marc Carauleanu, Michael Vaiana, Judd Rosenblatt, Cameron Berg, Diogo Schwerz de Lucena)
Momentum Point-Perplexity Mechanics in Large Language Models (Lorenzo Tomaz, Judd Rosenblatt, Thomas Berry Jones, Diogo Schwerz de Lucena)
Large Language Models Report Subjective Experience Under Self-Referential Processing (Cameron Berg, Diogo de Lucena, Judd Rosenblatt)