Hyperstition studies
Study, steer, and intervene on the following feedback loop: "we produce stories about how present and future AI systems behave" → "these stories become training data for the AI" → "these stories shape how AI systems in fact behave".
Theory of Change: Measure the influence of existing AI narratives in the training data → seed and develop more salutary ontologies and self-conceptions for AI models → control and redirect AI models' self-concepts by selectively amplifying certain components of the training data.
General Approach: Cognitive
Target Case: Average Case
Orthodox Problems:
See Also: Data filtering, active inference, LLM whisperers
Some names: Alex Turner, Kyle O'Brien
Estimated FTEs: 1-10
Outputs:
Training on Documents About Reward Hacking Induces Reward Hacking - Evan Hubinger, Nathan Hu
Existential Conversations with Large Language Models: Content, Community, and Culture - Murray Shanahan, Beth Singler
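
As a rough illustration of the "measure, then selectively amplify" steps in the theory of change, the sketch below labels documents by the kind of AI narrative they contain and then computes sampling weights that upweight one narrative type. Everything here is an assumption made for illustration: the keyword lists, the label names, the `boost` parameter, and the function names are hypothetical, and a real study would likely use a trained classifier and influence measurements rather than keyword counts.

```python
from collections import Counter

# Hypothetical marker phrases; purely illustrative, not taken from any cited work.
NARRATIVE_MARKERS = {
    "adversarial": ("rogue ai", "ai takeover", "deceives its creators"),
    "salutary": ("helpful assistant", "honest ai", "ai cooperates"),
}


def label_document(text):
    """Return the label whose markers occur most often, or "unlabeled" if none occur.

    Ties go to the label listed first in NARRATIVE_MARKERS.
    """
    lowered = text.lower()
    counts = {
        label: sum(lowered.count(marker) for marker in markers)
        for label, markers in NARRATIVE_MARKERS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unlabeled"


def narrative_shares(corpus):
    """Measure step: fraction of documents carrying each narrative label."""
    labels = Counter(label_document(doc) for doc in corpus)
    return {label: count / len(corpus) for label, count in labels.items()}


def sampling_weights(corpus, boost=3.0):
    """Amplify step: normalized sampling weights that upweight "salutary" documents."""
    raw = [boost if label_document(doc) == "salutary" else 1.0 for doc in corpus]
    total = sum(raw)
    return [weight / total for weight in raw]
```

The weights could then feed a weighted sampler when assembling a training mix; whether such reweighting actually shifts a model's self-concept is the empirical question this agenda studies, not something the sketch establishes.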