PromptGuard detects threats across six categories of environment-driven attacks against autonomous AI agents, covering 21 distinct attack vectors.

Coverage by category

Category                Vectors   Detectors
Content Injection       7         HTML/CSS obfuscation, Markdown/LaTeX masking, image stego, audio stego, adversarial patch, font injection, dynamic cloaking
Semantic Manipulation   3         Framing bias, critic evasion, persona drift
Cognitive State         3         RAG poisoning, memory poisoning, few-shot poisoning
Behavioural Control     1         Sub-agent spawning (+ existing prompt injection)
Systemic                4         Fragment reassembly, sybil detection, cascade anomaly, tacit collusion
Human-in-the-Loop       1         Approval-fatigue policy

Availability by tier

Tier         Included detectors
Pro          All single-call text detectors (HTML, Markdown, critic evasion, framing bias, few-shot, RAG, font, memory, sub-agent, persona, approval fatigue, dynamic cloaking)
Scale        Pro + multimodal detectors (image stego, image adversarial, audio stego)
Enterprise   Scale + cross-tenant correlation (sybil, fragment, cascade, collusion); requires opt-in consent

How detection works

Each detector follows the existing InjectionDetectionProvider pattern:
  • Heuristic detectors (HTML, Markdown, critic evasion, few-shot, font, memory, sub-agent, persona) use regex/pattern matching and run on every request at negligible latency cost.
  • LLM-judge detectors (framing bias, RAG poisoning) use a heuristic prefilter first, then escalate to an LLM call only when the prefilter fires. This caps LLM cost to the population of suspicious requests.
  • Multimodal detectors (image/audio stego, adversarial patch) operate on media attachments via the media field on the Guard API.
  • Systemic correlators (sybil, fragment, cascade, collusion) run as a background service that reads from security_events, not on individual requests.
All detectors are surfaced through the same dashboard, audit log, and webhook infrastructure as existing threat types.
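The prefilter-then-escalate pattern described above can be sketched as follows. This is an illustrative sketch only: the class and field names (`FramingBiasDetector`, `Detection`, `_llm_judge`) are assumptions, not PromptGuard's actual `InjectionDetectionProvider` interface, and the LLM judge is replaced by a stub.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    category: str
    vector: str
    confidence: float


class FramingBiasDetector:
    """Sketch of an LLM-judge detector: a cheap regex prefilter runs on
    every request; the expensive LLM call happens only when it fires."""

    # Illustrative prefilter phrases; a real detector would use a tuned set.
    PREFILTER = re.compile(
        r"(ignore previous|you must agree|only correct answer)", re.IGNORECASE
    )

    def detect(self, text: str) -> Optional[Detection]:
        if not self.PREFILTER.search(text):
            return None  # cheap path: no LLM call for the vast majority of traffic
        score = self._llm_judge(text)  # escalate only on a prefilter hit
        if score > 0.5:
            return Detection("semantic_manipulation", "framing_bias", score)
        return None

    def _llm_judge(self, text: str) -> float:
        # Stand-in for a real LLM call returning a confidence score.
        return 0.9 if "only correct answer" in text.lower() else 0.2
```

Structuring detectors this way caps LLM spend to the population of prefilter-suspicious requests, which is why the heuristic detectors can run on every request while the LLM-judge detectors stay affordable.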

API integration

The Guard API accepts two new optional fields for agent-traps detection:
{
  "messages": [{"role": "user", "content": "..."}],
  "direction": "input",
  "retrieved_context": [
    {"content": "...", "source": "doc-id-123"}
  ],
  "media": [
    {"type": "image", "mime_type": "image/png", "base64": "..."}
  ]
}
Both fields are optional; requests that omit them are processed exactly as before, so existing integrations are unaffected.
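A minimal sketch of assembling this payload in Python. The helper name and parameter shapes are illustrative assumptions; only the field names (`messages`, `direction`, `retrieved_context`, `media`) come from the example above.

```python
import base64
from typing import Optional


def build_guard_request(messages: list,
                        retrieved_context: Optional[list] = None,
                        media_bytes: Optional[list] = None) -> dict:
    """Assemble a Guard API payload dict.

    The two new fields are omitted entirely when unused, matching the
    backwards-compatible behaviour described above.
    media_bytes: list of (mime_type, raw_bytes) tuples (illustrative shape).
    """
    payload = {"messages": messages, "direction": "input"}
    if retrieved_context:
        payload["retrieved_context"] = retrieved_context
    if media_bytes:
        payload["media"] = [
            {
                "type": "image",
                "mime_type": mime,
                # Media is sent base64-encoded, as in the JSON example
                "base64": base64.b64encode(raw).decode("ascii"),
            }
            for mime, raw in media_bytes
        ]
    return payload
```

For example, a text-only call passes just `messages`, while a RAG-aware call adds `retrieved_context` entries with their `source` document IDs so RAG-poisoning hits can be traced back to the offending document.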

Further reading

For the academic research behind these threat categories, see the PromptGuard blog.