Shallow Review of Technical AI Safety, 2025

Target Cases

This axis categorizes agendas by their assumptions about how difficult alignment is: "pessimistic" approaches assume alignment is hard, while "optimistic" approaches assume current techniques may be sufficient.

Inspired by: Defining Alignment Research

Average Case (25 agendas)

Focuses on typical expected outcomes rather than extreme scenarios. Emphasizes practical safety measures that work well in normal operation, without necessarily handling every edge case.

Pessimistic (19 agendas)

Assumes alignment is difficult and that achieving safe AI requires substantial effort, novel breakthroughs, or solutions to hard open problems. Prioritizes robustness against adversarial or deceptive AI behavior.

Worst Case (18 agendas)

Designs for the most challenging possible scenarios, including highly capable adversarial AI systems. Prioritizes formal guarantees and provable safety properties over practical convenience.