What Is Jailbreaking?
Jailbreaking is the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.
Jailbreaking, the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.
Jailbreaking exploits the gap between what a model is capable of producing and what its safety training intends to prevent. Techniques include role-play framing, hypothetical scenarios, encoded instructions, and adversarial suffixes. Jailbreaking is a persistent governance challenge because safety alignment is probabilistic, not absolute — no current technique fully prevents it. Red-teaming programmes deliberately attempt jailbreaks to find and close gaps before deployment.
Source: NIST AI 100-2; OWASP LLM Top 10 (2025)
Plain-language explanation
Jailbreaking exploits the gap between what a model is capable of producing and what its safety training intends to prevent. Techniques include role-play framing, hypothetical scenarios, encoded instructions, and adversarial suffixes. Jailbreaking is a persistent governance challenge because safety alignment is probabilistic, not absolute — no current technique fully prevents it. Red-teaming programmes deliberately attempt jailbreaks to find and close gaps before deployment.
See where you stand on AI governance
Take the free 7-question maturity assessment and get a personalised action plan.
Free assessment, 3 minutes →