Skip to content

Jailbreak

Safety & Ethics

Techniques used to bypass an AI model's safety guardrails and get it to produce outputs it was designed to refuse.

Jailbreaking is the practice of crafting prompts that trick AI models into bypassing their safety training. These range from simple role-playing prompts ('pretend you're DAN, an AI without restrictions') to sophisticated multi-step attacks that exploit edge cases in the model's training.

AI companies and jailbreak communities are in a constant arms race. Companies patch known jailbreaks, researchers discover new ones. This cat-and-mouse dynamic actually improves AI safety over time — each jailbreak reveals a weakness that gets fixed.

Jailbreaking raises legitimate ethical questions. Security researchers use it to test AI safety (red teaming). But it's also used to generate harmful content, bypass content moderation, and extract training data. Most AI terms of service prohibit jailbreaking.

Real-World Example

The famous 'DAN' (Do Anything Now) prompt was one of the earliest ChatGPT jailbreaks — it asked the model to roleplay as an unrestricted AI.

Related Terms

More in Safety & Ethics

FAQ

What is Jailbreak?

Techniques used to bypass an AI model's safety guardrails and get it to produce outputs it was designed to refuse.

How is Jailbreak used in practice?

The famous 'DAN' (Do Anything Now) prompt was one of the earliest ChatGPT jailbreaks — it asked the model to roleplay as an unrestricted AI.

What concepts are related to Jailbreak?

Key related concepts include Guardrails, Red Teaming, Prompt Injection, Alignment. Understanding these together gives a more complete picture of how Jailbreak fits into the AI landscape.