Governance Concept

What Is AI Safety?

AI safety is a broad field concerned with ensuring AI systems behave reliably, in accordance with human values, and without causing unintended harm. It spans both the immediate, practical risks from today's AI tools (unreliable outputs, security vulnerabilities, privacy breaches) and longer-term questions about the governance of increasingly capable AI systems.

Definition

AI Safety: the field concerned with ensuring AI systems behave reliably and as intended, particularly as their capabilities approach or exceed human-level performance in defined domains.

AI safety as a technical field focuses on alignment (does the AI pursue the goals we actually want), robustness (does it behave reliably under unusual inputs or adversarial pressure), interpretability (can we understand what it is doing), and oversight (can we maintain control). Frontier AI safety is increasingly institutionalised through national AI Safety Institutes (US, UK, Japan, Singapore, Australia) and through voluntary Responsible Scaling Policies from frontier labs.

Source: UK AISI; US AISI; Frontier Model Forum

Near-term AI safety: the immediate enterprise concerns

For most organisations, AI safety in 2025–26 means managing specific, concrete risks from currently deployed AI tools. These risks are not hypothetical; they are occurring in Australian and global workplaces right now.

Harmful outputs

AI generating dangerous, discriminatory, or misleading content at scale, including in contexts where a human reviewer would catch the problem but the deployment model does not require review.

Reliability failures

AI systems failing in high-stakes contexts (medical diagnosis, safety-critical infrastructure, financial advice) where failure causes real harm.

Security vulnerabilities

AI systems being manipulated through adversarial inputs, prompt injection, or data poisoning so that they produce harmful outputs or take unintended actions.

Privacy violations

AI systems exposing personal information from training data, or processing personal information in ways that breach privacy law.

Concentration of power

AI enabling excessive surveillance, manipulation, or control, whether by employers over workers, governments over citizens, or platforms over users.

Australia's AI Safety Institute

The Australian Government announced the establishment of the AI Safety Institute (AISI) on 25 November 2025, with AUD $29.9 million in initial funding. The AISI focuses on testing and evaluation of advanced AI systems, coordinates with regulators including the OAIC, and joins the International Network of AI Safety Institutes alongside the US, UK, Canada, South Korea, and Japan.

The AISI complements the AI6 framework: AI6 provides governance standards for organisations deploying AI, while the AISI focuses on evaluating the safety of advanced models themselves. Together they reflect Australia's approach of relying on existing laws, voluntary guidance, and safety infrastructure rather than a standalone AI Act.

The EU AI Act as a safety framework

The EU AI Act is partly a safety framework: it bans certain AI practices outright (social scoring, manipulation, real-time biometric surveillance) and requires high-risk AI systems to be safe, accurate, and robust before market access. For organisations operating in the EU, AI safety is not just an aspiration; it is a conformity assessment requirement, with penalty exposure of up to 7% of global turnover.
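
To make the 7% figure concrete, the short sketch below works through the arithmetic for the Act's top penalty tier. It assumes that tier is capped at the higher of a fixed EUR 35 million amount or 7% of worldwide annual turnover. The fixed amount is not stated above and should be checked against the Act's text, and the function name and turnover figures are purely illustrative. Lower tiers apply to other breaches, so the result is an upper bound rather than a forecast.

```python
# Illustrative upper-bound arithmetic only; not legal advice.
# Assumption: the top tier is capped at the HIGHER of a fixed amount
# or 7% of global annual turnover.
FIXED_CAP_EUR = 35_000_000   # assumed fixed ceiling (not stated in the text above)
TURNOVER_PERCENT = 7         # 7% of global turnover (from the text above)


def max_penalty_exposure_eur(global_turnover_eur: int) -> int:
    """Return the assumed top-tier penalty ceiling, in whole euros."""
    if global_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    # Integer arithmetic avoids floating-point rounding on large amounts.
    turnover_based = global_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


# Hypothetical organisations, for illustration only.
large = max_penalty_exposure_eur(2_000_000_000)   # 7% of turnover dominates
small = max_penalty_exposure_eur(100_000_000)     # fixed ceiling dominates
print(large, small)
```

Under these assumptions, the turnover-based figure only exceeds the fixed ceiling once global turnover passes EUR 500 million; below that, the fixed amount sets the maximum.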
