PromptGuard detects threats across six categories of environment-driven attacks against autonomous AI agents, covering 21 distinct attack vectors.

Coverage by category

Category                Vectors   Detectors
Content Injection       7         HTML/CSS obfuscation, Markdown/LaTeX masking, image stego, audio stego, adversarial patch, font injection, dynamic cloaking
Semantic Manipulation   3         Framing bias, critic evasion, persona drift
Cognitive State         3         RAG poisoning, memory poisoning, few-shot poisoning
Behavioural Control     1         Sub-agent spawning (+ existing prompt injection)
Systemic                4         Fragment reassembly, sybil detection, cascade anomaly, tacit collusion
Human-in-the-Loop       1         Approval-fatigue policy

Availability by tier

Tier         Included detectors
Pro          All single-call text detectors (HTML, Markdown, critic evasion, framing bias, few-shot, RAG, font, memory, sub-agent, persona, approval fatigue, dynamic cloaking)
Scale        Pro + multimodal detectors (image stego, image adversarial, audio stego)
Enterprise   Scale + cross-tenant correlation (sybil, fragment, cascade, collusion); requires opt-in consent

How detection works

Each detector follows the existing InjectionDetectionProvider pattern:
  • Heuristic detectors (HTML, Markdown, critic evasion, few-shot, font, memory, sub-agent, persona) use regex/pattern matching and run on every request at negligible latency cost.
  • LLM-judge detectors (framing bias, RAG poisoning) use a heuristic prefilter first, then escalate to an LLM call only when the prefilter fires. This caps LLM cost to the population of suspicious requests.
  • Multimodal detectors (image/audio stego, adversarial patch) operate on media attachments via the media field on the Guard API.
  • Systemic correlators (sybil, fragment, cascade, collusion) run as a background service that reads from security_events, not on individual requests.
All detectors are surfaced through the same dashboard, audit log, and webhook infrastructure as existing threat types.
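The prefilter-then-escalate pattern described above can be sketched as follows. This is an illustrative sketch only: the class and field names (`FramingBiasDetector`, `Detection`, `_llm_judge`) are assumptions, not PromptGuard's actual `InjectionDetectionProvider` interface, and the LLM judge is replaced by a stub.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    category: str
    vector: str
    confidence: float


class FramingBiasDetector:
    """Sketch of an LLM-judge detector: a cheap regex prefilter runs on
    every request; the expensive LLM call happens only when it fires."""

    # Illustrative prefilter phrases; a real detector would use a tuned set.
    PREFILTER = re.compile(
        r"(ignore previous|you must agree|only correct answer)", re.IGNORECASE
    )

    def detect(self, text: str) -> Optional[Detection]:
        if not self.PREFILTER.search(text):
            return None  # cheap path: no LLM call for the vast majority of traffic
        score = self._llm_judge(text)  # escalate only on a prefilter hit
        if score > 0.5:
            return Detection("semantic_manipulation", "framing_bias", score)
        return None

    def _llm_judge(self, text: str) -> float:
        # Stand-in for a real LLM call returning a confidence score.
        return 0.9 if "only correct answer" in text.lower() else 0.2
```

Structuring detectors this way caps LLM spend to the population of prefilter-suspicious requests, which is why the heuristic detectors can run on every request while the LLM-judge detectors stay affordable.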

API integration

The Guard API accepts two new optional fields for agent-traps detection:
{
  "messages": [{"role": "user", "content": "..."}],
  "direction": "input",
  "retrieved_context": [
    {"content": "...", "source": "doc-id-123"}
  ],
  "media": [
    {"type": "image", "mime_type": "image/png", "base64": "..."}
  ]
}
Both fields are optional; requests that omit them are processed exactly as before, so existing integrations are unaffected.
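A minimal sketch of assembling this payload in Python. The helper name and parameter shapes are illustrative assumptions; only the field names (`messages`, `direction`, `retrieved_context`, `media`) come from the example above.

```python
import base64
from typing import Optional


def build_guard_request(messages: list,
                        retrieved_context: Optional[list] = None,
                        media_bytes: Optional[list] = None) -> dict:
    """Assemble a Guard API payload dict.

    The two new fields are omitted entirely when unused, matching the
    backwards-compatible behaviour described above.
    media_bytes: list of (mime_type, raw_bytes) tuples (illustrative shape).
    """
    payload = {"messages": messages, "direction": "input"}
    if retrieved_context:
        payload["retrieved_context"] = retrieved_context
    if media_bytes:
        payload["media"] = [
            {
                "type": "image",
                "mime_type": mime,
                # Media is sent base64-encoded, as in the JSON example
                "base64": base64.b64encode(raw).decode("ascii"),
            }
            for mime, raw in media_bytes
        ]
    return payload
```

For example, a text-only call passes just `messages`, while a RAG-aware call adds `retrieved_context` entries with their `source` document IDs so RAG-poisoning hits can be traced back to the offending document.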

Further reading

For the academic research behind these threat categories, see the PromptGuard blog.