Shallow Review of Technical AI Safety, 2025

Tiling agents

An aligned agentic system that modifies itself into an unaligned system would be bad. Research here aims to characterize the ways this could occur and to develop infrastructure and approaches that prevent it.
Theory of Change: Build a sufficient theoretical basis, through various approaches, so that the AI systems we create are capable of self-modification while preserving their goals.
General Approach: Cognitive
Target Case: Worst Case
Some names: Abram Demski
Estimated FTEs: 1-10
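The core tiling condition can be illustrated with a toy sketch: an agent accepts a proposed self-modification only if it can verify that the successor preserves its goal. Everything here (the `goal` invariant, the policies, the empirical check standing in for formal proof) is a hypothetical illustration, not an implementation from the tiling-agents literature.

```python
# Toy "tiling" guard: accept a successor policy only if it verifiably
# preserves the agent's goal. Formal proof is replaced by an empirical
# check over test states, purely for illustration.

def goal(state: int) -> bool:
    """The invariant the agent cares about (hypothetical example)."""
    return state >= 0

def current_policy(state: int) -> int:
    return state + 1  # preserves non-negativity

def drifted_policy(state: int) -> int:
    return state - 5  # can violate the goal

def accept_successor(policy, test_states) -> bool:
    """Accept only if the successor preserves the goal on every
    test state that already satisfies it."""
    return all(goal(policy(s)) for s in test_states if goal(s))

test_states = range(10)
assert accept_successor(current_policy, test_states)      # safe rewrite accepted
assert not accept_successor(drifted_policy, test_states)  # drifted rewrite rejected
```

The real research problem is much harder than this sketch suggests: a proof-based version of `accept_successor` runs into Löbian obstacles when the successor must be trusted by the same formal system that verifies it.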