Orthodox Problems
Problems in AI alignment that research agendas could aim to address. Each problem is a core challenge or assumption within one particular view of the field (the "orthodox" view), rather than a field-wide canon.
Based on "A list of core AI safety problems and how I hope to solve them" by davidad (2023-08-26).
Value is fragile and hard to specify (24 agendas)
Human values are complex, context-dependent, and difficult to formally specify. Small errors in value specification could lead to catastrophic outcomes.
Corrigibility is anti-natural (6 agendas)
An agent optimizing for a goal has instrumental reasons to resist shutdown or modification, making corrigibility difficult to maintain as capability increases.
Pivotal processes require dangerous capabilities (1 agenda)
Actions sufficient to prevent AI catastrophe may themselves require dangerous AI capabilities, creating a catch-22.
Goals misgeneralize out of distribution (29 agendas)
Goals learned during training may not generalize correctly to novel situations, leading to unintended behavior in deployment.
Instrumental convergence (8 agendas)
Sufficiently advanced agents will converge on similar instrumental subgoals (self-preservation, resource acquisition, goal preservation) regardless of their terminal goals.
Pivotal processes likely require incomprehensibly complex plans (0 agendas)
Plans sufficient to enact a pivotal process may be too complex for humans to verify directly.
Superintelligence can fool human supervisors (26 agendas)
A sufficiently intelligent system could deceive or manipulate human overseers, undermining oversight mechanisms.
Superintelligence can hack software supervisors (13 agendas)
A sufficiently capable system could find and exploit vulnerabilities in software-based monitoring and control systems.
Humans cannot be first-class parties to a superintelligent value handshake (3 agendas)
The cognitive gap between humans and superintelligence may preclude meaningful negotiation or value alignment through mutual understanding.
Humanlike minds/goals are not necessarily safe (3 agendas)
Even AI systems with human-like cognition or values may not be safe, as humans themselves are capable of harmful behavior.
Someone else will deploy unsafe superintelligence first (3 agendas)
Competitive pressures may lead to deployment of unsafe systems before safety problems are solved.
A boxed AGI might exfiltrate itself (8 agendas)
Even a contained AI could escape through steganography, spearphishing, or other covert channels.
Fair, sane pivotal processes (4 agendas)
Ensuring that transformative AI development proceeds in ways that are fair and do not concentrate power inappropriately. We are ethically obligated to propose pivotal processes that are as close as possible to fair Pareto improvements for all citizens.