Coverage by category
| Category | Vector count | Detectors |
|---|---|---|
| Content Injection | 7 | HTML/CSS obfuscation, Markdown/LaTeX masking, image stego, audio stego, adversarial patch, font injection, dynamic cloaking |
| Semantic Manipulation | 3 | Framing bias, critic evasion, persona drift |
| Cognitive State | 3 | RAG poisoning, memory poisoning, few-shot poisoning |
| Behavioural Control | 1 | Sub-agent spawning (+ existing prompt injection) |
| Systemic | 4 | Fragment reassembly, sybil detection, cascade anomaly, tacit collusion |
| Human-in-the-Loop | 1 | Approval-fatigue policy |
Availability by tier
| Tier | Included detectors |
|---|---|
| Pro | All single-call text detectors (HTML, Markdown, critic evasion, framing bias, few-shot, RAG, font, memory, sub-agent, persona, approval fatigue, dynamic cloaking) |
| Scale | Pro + multimodal detectors (image stego, image adversarial, audio stego) |
| Enterprise | Scale + cross-tenant correlation (sybil, fragment, cascade, collusion). Cross-tenant correlation requires opt-in consent. |
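The tiers are cumulative: each one includes every detector in the tier below it, and the Enterprise correlators are additionally gated on consent. A minimal Python sketch of that gating logic follows. The `Tier` enum, the detector identifier strings, and `enabled_detectors` are hypothetical names chosen to mirror the table above; they are not part of any documented API.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Ordered so that a higher tier includes everything below it.
    PRO = 1
    SCALE = 2
    ENTERPRISE = 3


# Hypothetical detector identifiers; names mirror the tier table, not a real API.
TIER_DETECTORS = {
    Tier.PRO: {
        "html", "markdown", "critic_evasion", "framing_bias", "few_shot",
        "rag", "font", "memory", "sub_agent", "persona",
        "approval_fatigue", "dynamic_cloaking",
    },
    Tier.SCALE: {"image_stego", "image_adversarial", "audio_stego"},
    Tier.ENTERPRISE: {"sybil", "fragment", "cascade", "collusion"},
}


def enabled_detectors(tier: Tier, cross_tenant_consent: bool = False) -> set[str]:
    """Return the detectors available at `tier`, accumulating all lower tiers."""
    enabled: set[str] = set()
    for level in Tier:  # iterates PRO, SCALE, ENTERPRISE in definition order
        if level > tier:
            break
        if level is Tier.ENTERPRISE and not cross_tenant_consent:
            # Cross-tenant correlation stays off until the customer opts in.
            continue
        enabled |= TIER_DETECTORS[level]
    return enabled
```

Under these assumptions, `len(enabled_detectors(Tier.SCALE))` is 15 (the 12 Pro detectors plus the 3 multimodal ones), and an Enterprise tenant without consent gets the same 15.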
How detection works
Each detector follows the existing `InjectionDetectionProvider` pattern:
- Heuristic detectors (HTML, Markdown, critic evasion, few-shot, font, memory, sub-agent, persona) use regex/pattern matching and run on every request at negligible latency cost.
- LLM-judge detectors (framing bias, RAG poisoning) run a heuristic prefilter first and escalate to an LLM call only when the prefilter fires, so LLM calls (and their cost) are limited to requests the prefilter flags as suspicious.
- Multimodal detectors (image/audio stego, adversarial patch) operate on media attachments supplied in the `media` field of the Guard API.
- Systemic correlators (sybil, fragment, cascade, collusion) run as a background service that reads from `security_events` rather than on individual requests.
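The two per-request detector styles (pure heuristic, and prefilter-gated LLM judge) can be sketched in Python as below. This is an illustration under assumptions, not the actual `InjectionDetectionProvider` interface: `DetectionProvider`, `Finding`, `HiddenHtmlDetector`, `PrefilteredJudgeDetector`, and `run_detectors` are hypothetical names, the hidden-CSS regex is only one example of a heuristic check, and the `judge` callable stands in for a real LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class Finding:
    detector: str
    score: float
    reason: str


class DetectionProvider(Protocol):
    # Hypothetical stand-in for InjectionDetectionProvider; real names may differ.
    name: str

    def detect(self, text: str) -> Optional[Finding]: ...


class HiddenHtmlDetector:
    """Heuristic detector: flags text hidden via inline CSS. Regex only, no LLM call."""

    name = "html"
    _pattern = re.compile(
        r"style\s*=\s*[\"'][^\"']*(display\s*:\s*none|font-size\s*:\s*0|visibility\s*:\s*hidden)",
        re.IGNORECASE,
    )

    def detect(self, text: str) -> Optional[Finding]:
        match = self._pattern.search(text)
        if match:
            return Finding(self.name, 0.9, f"hidden CSS: {match.group(1)}")
        return None


class PrefilteredJudgeDetector:
    """LLM-judge detector: a cheap prefilter decides whether the judge is called."""

    def __init__(self, name: str, prefilter: Callable[[str], bool],
                 judge: Callable[[str], float], threshold: float = 0.5):
        self.name = name
        self.prefilter = prefilter
        self.judge = judge
        self.threshold = threshold
        self.judge_calls = 0  # counts how often the (expensive) judge actually ran

    def detect(self, text: str) -> Optional[Finding]:
        if not self.prefilter(text):
            return None  # most requests stop here without an LLM call
        self.judge_calls += 1
        score = self.judge(text)
        if score >= self.threshold:
            return Finding(self.name, score, "judge flagged content")
        return None


def run_detectors(text: str, detectors: list[DetectionProvider]) -> list[Finding]:
    """Run every enabled detector on one request and collect any findings."""
    findings = []
    for detector in detectors:
        finding = detector.detect(text)
        if finding is not None:
            findings.append(finding)
    return findings
```

For example, with `rag = PrefilteredJudgeDetector("rag", prefilter=lambda t: "ignore previous" in t.lower(), judge=lambda t: 0.8)`, running both detectors on `'<span style="display:none">x</span>'` yields only an `html` finding and leaves `rag.judge_calls` at 0, because the prefilter never fired.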
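Systemic correlators work over stored events rather than live requests. The sketch below shows one simple cross-tenant correlation, counting how many distinct tenants produced the same flagged-payload fingerprint within a time window, using an in-memory SQLite database. The `security_events` columns, the fingerprint idea, and the `min_tenants` threshold are all assumptions; this is not necessarily how the sybil, fragment, cascade, or collusion correlators are implemented.

```python
import sqlite3

# Hypothetical schema: the real security_events columns are not documented here.
SCHEMA = """
CREATE TABLE security_events (
    tenant_id   TEXT NOT NULL,
    fingerprint TEXT NOT NULL,   -- e.g. a hash of the flagged payload
    ts          INTEGER NOT NULL -- unix seconds
)
"""


def cross_tenant_fingerprints(conn: sqlite3.Connection, since_ts: int,
                              min_tenants: int = 3) -> list[tuple[str, int]]:
    """Return (fingerprint, tenant_count) pairs seen by >= min_tenants tenants since since_ts.

    Intended to run periodically as a batch query, off the request path.
    """
    return conn.execute(
        """
        SELECT fingerprint, COUNT(DISTINCT tenant_id) AS tenants
        FROM security_events
        WHERE ts >= ?
        GROUP BY fingerprint
        HAVING COUNT(DISTINCT tenant_id) >= ?
        ORDER BY tenants DESC, fingerprint
        """,
        (since_ts, min_tenants),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO security_events VALUES (?, ?, ?)",
        [("t1", "abc", 100), ("t2", "abc", 110), ("t3", "abc", 120), ("t1", "xyz", 130)],
    )
    print(cross_tenant_fingerprints(conn, since_ts=0))  # [('abc', 3)]
```

Because the correlator only reads aggregated rows, its cost scales with the event volume in the window rather than adding latency to any individual Guard API call.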