
What Is a Large Language Model?

A large language model (LLM) is an AI system trained on vast amounts of text data to predict and generate human-like text. LLMs are the technology behind ChatGPT, Claude, Gemini, and Copilot. They are powerful and statistically sophisticated, and they work in a fundamentally different way from conventional software.

How LLMs actually work

LLMs are trained by processing enormous amounts of text (books, websites, code, scientific papers) and learning statistical patterns in language. They do not store facts the way a database does. They learn relationships between words and concepts, then use those patterns to predict what text should come next. This is why they can write fluent prose, answer questions, and summarise documents, and also why they can generate confident-sounding content that is factually wrong. The model is optimised to produce likely text, not true text, and on its own it has no built-in way to check its output against the facts.
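The idea of "learning statistical patterns, then predicting what comes next" can be illustrated with a toy word-pair model. This is a minimal sketch, not how production LLMs are built: real models use neural networks over sub-word tokens and far more context. The function names and the tiny corpus are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word(counts, word, rng):
    """Sample the next word in proportion to how often it followed `word`."""
    followers = counts[word]
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical training text. The model only ever sees word sequences.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)

rng = random.Random(0)
print(dict(model["the"]))            # follower counts: cat 2, mat 1, fish 1
print(next_word(model, "the", rng))  # one of: cat, mat, fish
```

Nothing in the model records whether a sentence is true. It only records which words tended to follow which, and that is the root of the fluency-without-verification problem described above.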

The key distinction from conventional software

Traditional software follows deterministic rules: given input X, it always produces output Y. LLMs are probabilistic: the same input can produce different outputs, and the model has no reliable mechanism to verify factual accuracy. This has fundamental governance implications. You cannot test an LLM the way you test a conventional system, because a single passing test says little about what the model will do on the next run. A control that works for a spreadsheet, such as checking a formula once and trusting it afterwards, does not work for a language model.
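The contrast can be sketched in a few lines. The first function below is a conventional deterministic rule. The second picks an output by sampling from a probability distribution, which is roughly what an LLM does for each token it generates. The scores, token names, and the `temperature` parameter are illustrative assumptions, not values from any real model.

```python
import math
import random

def conventional_rule(amount):
    """Deterministic: the same input always gives the same output."""
    return round(amount * 1.1, 2)

def sample_token(scores, temperature, rng):
    """Turn scores into probabilities (softmax), then sample one token."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exp.values())
    tokens = list(exp)
    probs = [exp[t] / total for t in tokens]
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate next tokens.
scores = {"Canberra": 2.0, "Sydney": 1.5, "Melbourne": 1.0}

rng = random.Random(42)
outputs = [sample_token(scores, 1.0, rng) for _ in range(200)]
print(sorted(set(outputs)))  # the distinct answers seen across 200 runs
print(conventional_rule(100) == conventional_rule(100))  # True
```

Lowering the temperature makes the top-scoring token dominate and raising it spreads the choices out. Many hosted models expose a similar setting, but a low setting should not be assumed to guarantee identical outputs.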

Governance implications for organisations

Hallucination risk
LLMs can generate plausible-sounding but factually wrong content. Output requires human review before any professional or consequential use.
Training data opacity
Many LLMs are trained on vast amounts of internet data, raising copyright, privacy, and bias concerns. Understanding what data a model was trained on matters for responsible procurement.
Non-determinism
The same input does not always produce the same output. LLMs cannot be tested like conventional software — governance must account for statistical, not binary, behaviour.
Privacy exposure
Data entered into an LLM may be retained by the provider and used to train future models, depending on the service's terms. Entering personal or confidential information into consumer LLMs creates Privacy Act risk for Australian organisations.
EU AI Act scope
General-purpose AI (GPAI) models trained with more than 10²⁵ FLOPs of compute are presumed to pose systemic risk under the EU AI Act and are subject to specific obligations from August 2025.
Accountability gap
When an LLM produces output that causes harm, liability will in most cases fall on the deploying organisation rather than the model vendor, subject to contract terms and jurisdiction. Professional responsibility does not transfer to the AI.
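The non-determinism point above implies testing by pass rate rather than by a single pass or fail result. The sketch below runs the same prompt many times and compares the fraction of acceptable answers with a threshold. `fake_llm` is a hypothetical stand-in for a real model call, and the 90% accuracy, the 500 trials, and the 0.85 threshold are arbitrary assumptions chosen for illustration.

```python
import random

def fake_llm(prompt, rng):
    """Hypothetical stand-in for a model call: correct about 90% of the time."""
    return "Canberra" if rng.random() < 0.9 else "Sydney"

def pass_rate(model, prompt, expected, trials, rng):
    """Run the same prompt repeatedly and return the fraction of correct answers."""
    passes = sum(model(prompt, rng) == expected for _ in range(trials))
    return passes / trials

THRESHOLD = 0.85  # assumed acceptance level, to be set by the organisation

rng = random.Random(7)
rate = pass_rate(fake_llm, "What is the capital of Australia?", "Canberra", 500, rng)
print(f"pass rate: {rate:.1%}, acceptable: {rate >= THRESHOLD}")
```

In practice the evaluation set would cover many prompts, and the acceptable rate would depend on how consequential an error is for the use case.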

LLMs and regulatory classification

Under the EU AI Act, the models behind most LLM services are classified as general-purpose AI (GPAI) models. Those trained with more than 10²⁵ FLOPs of compute are presumed to present systemic risk and are subject to additional obligations, including adversarial testing, incident reporting, and transparency. These obligations apply to providers, but deployers also have obligations around appropriate use and human oversight.
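To give a sense of scale for the 10²⁵ FLOPs threshold, training compute for a dense transformer model is often approximated as about 6 floating-point operations per parameter per training token. The sketch below applies that rule of thumb to two hypothetical models. It is a rough estimate for intuition only, not the method the Act prescribes for measuring compute, and the model sizes are invented.

```python
EU_SYSTEMIC_RISK_THRESHOLD = 1e25  # training FLOPs, as stated in the text above

def estimate_training_flops(parameters, training_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6 * parameters * training_tokens

def exceeds_threshold(parameters, training_tokens):
    """True if the estimated training compute is above the threshold."""
    return estimate_training_flops(parameters, training_tokens) > EU_SYSTEMIC_RISK_THRESHOLD

# Hypothetical models, not real products.
small = estimate_training_flops(7e9, 2e12)    # 7B parameters, 2T tokens: about 8.4e22
large = estimate_training_flops(5e11, 1e13)   # 500B parameters, 10T tokens: about 3e25

print(f"small: {small:.1e} FLOPs, above threshold: {exceeds_threshold(7e9, 2e12)}")
print(f"large: {large:.1e} FLOPs, above threshold: {exceeds_threshold(5e11, 1e13)}")
```

On this estimate the smaller model sits more than two orders of magnitude below the threshold, while the larger one exceeds it by about a factor of three.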

In Australia, LLMs are not specifically regulated, but the Privacy Act, the AI6 framework, and sector-specific requirements all create obligations that apply to how organisations deploy them. An organisation that uses an LLM to process personal information remains subject to the Australian Privacy Principles regardless of where the model is hosted.
