What Is AI Alignment?
AI Alignment is the problem of ensuring an AI system pursues the goals its designers and society actually intend, rather than unintended proxies.
Source: AI safety research literature
Plain-language explanation
Alignment is about making sure a capable system does what we mean, not just what we literally asked. Misalignment can be subtle: for example, a model may optimise a metric in ways that defeat the metric's purpose. The problem becomes more consequential as systems become more capable and autonomous. It is a central concern in AI safety research and in the safety frameworks of frontier developers.
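The idea of optimising a metric in a way that defeats its purpose can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the `Candidate` type, the word-count proxy, and the `solves_task` flag standing in for what the designers actually intend. It is not drawn from any real system or evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    solves_task: bool  # hypothetical ground truth: what the designers actually want


def proxy_score(c: Candidate) -> int:
    """Hypothetical proxy metric: word count, assuming longer means more thorough."""
    return len(c.text.split())


def intended_score(c: Candidate) -> int:
    """The intended goal: did the answer actually solve the task?"""
    return 1 if c.solves_task else 0


candidates = [
    Candidate("Restart the service, then clear the cache.", True),
    Candidate(
        "There are many things one could consider here, and it is worth "
        "noting that many factors may be relevant in general.",
        False,
    ),
]

# An optimiser that only sees the proxy picks the padded, unhelpful answer.
chosen_by_proxy = max(candidates, key=proxy_score)
# An optimiser that sees the intended goal picks the answer that solves the task.
chosen_by_intent = max(candidates, key=intended_score)

print("Proxy picks a solving answer: ", chosen_by_proxy.solves_task)
print("Intent picks a solving answer:", chosen_by_intent.solves_task)
```

In this sketch the proxy is satisfied literally (the longest answer wins) while the goal it was meant to track (a useful answer) is missed. That gap is the kind of subtle misalignment described above.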