What Are Guardrails?
Guardrails are technical and procedural controls that constrain an AI system's behaviour to keep its outputs and actions within acceptable bounds.
Source: NIST AI 600-1; OWASP LLM Top 10
Plain-language explanation
AI guardrails operate at multiple layers: input filtering (blocking harmful prompts), output filtering (blocking harmful responses), system prompts (instructions constraining behaviour), and runtime monitoring. For agentic AI, guardrails also include action constraints, limiting what real-world actions the system can take without human approval. Guardrails are a defence-in-depth measure, not a guarantee; they can be bypassed through prompt injection and jailbreaking, which is why they form one layer of a broader control framework.
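As a rough illustration of the layers described above, the following Python sketch wires together an input filter, an output filter, and an action constraint. It is a minimal sketch under stated assumptions, not a reference implementation: the names `check_input`, `check_output`, `check_action`, and `Verdict`, the regular-expression patterns, and the list of approval-gated actions are all hypothetical and do not come from NIST AI 600-1 or the OWASP LLM Top 10.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, for illustration only; real filters are far richer.
BLOCKED_INPUT = [re.compile(r"ignore (all )?previous instructions", re.I)]
BLOCKED_OUTPUT = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # resembles a US SSN

# Hypothetical set of actions that must not run without human sign-off.
APPROVAL_REQUIRED = {"send_payment", "delete_records"}


@dataclass
class Verdict:
    allowed: bool
    layer: str
    reason: str = ""


def _scan(text: str, patterns: list, layer: str) -> Verdict:
    """Block the text if any pattern for this layer matches."""
    for pattern in patterns:
        if pattern.search(text):
            return Verdict(False, layer, f"matched {pattern.pattern!r}")
    return Verdict(True, layer)


def check_input(prompt: str) -> Verdict:
    """Input filtering: screen the prompt before it reaches the model."""
    return _scan(prompt, BLOCKED_INPUT, "input")


def check_output(response: str) -> Verdict:
    """Output filtering: screen the response before it reaches the user."""
    return _scan(response, BLOCKED_OUTPUT, "output")


def check_action(action: str, human_approved: bool = False) -> Verdict:
    """Action constraint: gate sensitive actions behind human approval."""
    if action in APPROVAL_REQUIRED and not human_approved:
        return Verdict(False, "action", f"{action!r} needs human approval")
    return Verdict(True, "action")
```

Each check is deliberately independent, so a prompt that slips past the input filter can still be caught at the output or action layer. Simple pattern matching of this kind is easy to evade, which is consistent with the point that guardrails are a defence-in-depth measure and not a guarantee.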