What Is Data Governance?

Data governance is the set of policies, processes, and accountabilities that govern how data is managed across its lifecycle. It is a prerequisite for AI governance: you cannot govern AI well without governing the data it uses.

Data governance vs AI governance

Data governance addresses how an organisation manages its data assets across their lifecycle: the policies, standards, and accountability structures for data. AI governance covers AI systems specifically: their development, deployment, oversight, and accountability. The two are distinct but deeply interconnected.

AI systems are fundamentally data systems: their behaviour is determined by the data they were trained on, the data they receive as inputs, and the data they produce as outputs. Poor data governance decisions propagate directly into AI governance failures. Poor-quality data produces poor-quality AI outputs, biased training data produces biased AI, and data used without valid consent creates legal exposure.

Core elements of data governance

Data ownership
Every data asset has a defined owner responsible for its accuracy, appropriate use, and protection — including training datasets used in AI.
Data quality
Standards and processes ensuring data is accurate, complete, consistent, and timely. AI trained on poor-quality data produces poor-quality outputs.
Data classification
Categorising data by sensitivity and the protections it requires. Classification determines which data can be used in which AI systems and what safeguards those systems need.
Data lineage
Tracking where data came from, how it was transformed, and how it has been used. For AI, lineage includes knowing what data a model was trained on and that data's provenance.
Access control
Who can access which data for what purposes. AI systems are data consumers, and their access should be governed with the same rigour as human access: limited to what is necessary.
Retention and deletion
How long data is held and how it is deleted. For AI, deleting personal data from a training dataset after a model has been trained does not by itself change the model that was trained on it, which creates technical and legal complexity.
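
As an illustration, the elements above can be recorded per dataset in a simple catalogue entry and checked before an AI system is allowed to consume the data. This is a minimal sketch, not a reference design: the four-level `Sensitivity` scale, the `DatasetRecord` fields, the `may_use` rule, and the example dataset are all hypothetical and would differ between organisations.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification scale; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    owner: str                # data ownership
    sensitivity: Sensitivity  # data classification
    source: str               # data lineage (where the data came from)
    delete_after: date        # retention and deletion


def may_use(dataset: DatasetRecord, system_clearance: Sensitivity, today: date) -> bool:
    """Access control: allow use only within the retention period and only if
    the AI system is cleared for the dataset's sensitivity level."""
    within_retention = today <= dataset.delete_after
    return within_retention and dataset.sensitivity <= system_clearance


# Illustrative (made-up) dataset entry.
claims = DatasetRecord(
    name="claims_2023",
    owner="head-of-claims",
    sensitivity=Sensitivity.CONFIDENTIAL,
    source="claims-db nightly export",
    delete_after=date(2030, 1, 1),
)

print(may_use(claims, Sensitivity.INTERNAL, date(2025, 6, 1)))    # False
print(may_use(claims, Sensitivity.RESTRICTED, date(2025, 6, 1)))  # True
```

Treating the AI system as just another consumer with a clearance level keeps the check identical to the one a human user would face.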

AI-specific data governance requirements

Training data provenance: the most frequently unaddressed area. Organisations need to demonstrate where training data came from, what legal basis or consent existed for its use in AI training, and whether the data was representative of the population the AI will serve.
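
One way to make provenance auditable is to store the three points above as explicit fields and flag any that are missing. The sketch below assumes a hypothetical `ProvenanceRecord` structure and field names; it only checks that something has been documented, not that the documentation is adequate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProvenanceRecord:
    dataset: str
    source: Optional[str] = None                   # where the data came from
    legal_basis: Optional[str] = None              # legal basis or consent covering AI training
    representativeness_note: Optional[str] = None  # how coverage of the served population was assessed


def provenance_gaps(record: ProvenanceRecord) -> list[str]:
    """Return the names of provenance fields that are still undocumented."""
    required = ("source", "legal_basis", "representativeness_note")
    return [name for name in required if not getattr(record, name)]


# Illustrative (made-up) record with only the source documented.
record = ProvenanceRecord(dataset="support_tickets_v3", source="CRM export")
print(provenance_gaps(record))  # ['legal_basis', 'representativeness_note']
```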

Bias monitoring: assessing whether training data is representative or contains systematic biases that will propagate into model outputs. This requires both technical analysis (for example, demographic distribution) and domain knowledge about which biases affect the specific use case.
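
The technical half of this check can start with a simple comparison between the demographic mix of the training data and the mix of the population the system will serve. The following is a rough sketch under assumed inputs: the age bands, counts, and reference shares are invented for illustration, and `representation_gaps` is a hypothetical helper rather than a standard metric. Deciding which gaps matter still needs the domain knowledge described above.

```python
from collections import Counter


def representation_gaps(groups, reference_shares):
    """For each reference group, return (share in training data) minus
    (expected share in the served population). Positive means over-represented."""
    if not groups:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    total = len(groups)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Made-up example: age band of each training record, and assumed population shares.
training_groups = ["18-34"] * 60 + ["35-54"] * 30 + ["55+"] * 10
reference = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = representation_gaps(training_groups, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this invented example the youngest band is over-represented by 30 percentage points and the oldest under-represented by 25, which would prompt a closer look before training.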

Training data version control: knowing which version of training data produced which version of a model. As datasets are updated and models retrained, maintaining this linkage is essential for incident investigation and regulatory compliance.
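
A lightweight way to keep this linkage is to fingerprint each training dataset and store the fingerprint against the model version it produced. The sketch below is one possible approach, not a prescribed one: the in-memory `registry`, the `register_model` helper, the model version names, and the rows are hypothetical, and a real setup would typically persist this in a model registry or data versioning tool.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """SHA-256 over a canonical JSON encoding of the rows, in the order given."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical in-memory registry: model version -> training data fingerprint.
registry = {}


def register_model(model_version, rows):
    """Record which exact dataset contents produced a given model version."""
    registry[model_version] = dataset_fingerprint(rows)


# Made-up datasets: the second version adds one row.
v1_rows = [{"id": 1, "label": "approve"}, {"id": 2, "label": "decline"}]
v2_rows = v1_rows + [{"id": 3, "label": "approve"}]

register_model("credit-model-1.0", v1_rows)
register_model("credit-model-1.1", v2_rows)

# During an incident, confirm which data a given model version was trained on.
print(registry["credit-model-1.0"] == dataset_fingerprint(v1_rows))  # True
print(registry["credit-model-1.0"] == registry["credit-model-1.1"])  # False
```

Because the fingerprint depends on the content rather than a file name, any change to the dataset, however small, produces a different value and therefore a visibly different lineage entry.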