Tiling agents
An aligned agentic system modifying itself into an unaligned system would be bad and we can research ways that this could occur and infrastructure/approaches that prevent it from happening.
Theory of Change:Build enough theoretical basis through various approaches such that AI systems we create are capable of self-modification while preserving goals.
General Approach:Cognitive
Target Case:Worst Case
See Also:
Some names:Abram Demski
Estimated FTEs:1-10
Outputs:
Working through a small tiling result— James Payor
Communication & Trust— Abram Demski
Understanding Trust— Abram Demski