Security Prompt Injection

What Is Prompt Injection?

Prompt injection is an attack where malicious instructions are embedded in user input, attempting to override the AI agent's system prompt and security policies. It is the most common attack vector against AI agents and one of the most dangerous.

Types of Prompt Injection

OpenClaw's Defense Layers

OpenClaw employs multiple defense layers against prompt injection:

  1. Input sanitization: All user inputs are processed through a prompt firewall that detects common injection patterns before they reach the LLM.
  2. System prompt isolation: The system prompt is cryptographically separated from user input, making it significantly harder for injections to override core instructions.
  3. Output validation: Before executing any action, the agent validates that the proposed action is consistent with its security policies — regardless of what the LLM suggested.
  4. Allowlist enforcement: Even if an injection successfully manipulates the LLM's output, the allowlist system blocks unauthorized operations.

Hardening Against Injection

Prompt Guardian
Protect your AI agent from prompt injection and malicious commands.
Explore Prompt Guardian →