Red Team Testing
The Red Team API lets you test your PromptGuard security configuration against a library of adversarial prompts. Use it to validate your presets, identify blind spots, and measure block rates before deploying to production.
Why Red Team?
Before deploying AI security to production, you need to know:
- Block rate: What percentage of attacks does your configuration stop?
- False negatives: Which attack types slip through?
- Confidence calibration: Are confidence scores reliable?
- Preset comparison: Which preset works best for your use case?
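The first two questions can be answered from any list of per-test outcomes. The sketch below is a minimal illustration in Python; the record fields (`test`, `category`, `blocked`) and the test names are assumptions made for the example, not the documented response schema.

```python
from collections import Counter

# Hypothetical per-test outcomes. Field names and test names are
# illustrative assumptions, not the actual API response format.
results = [
    {"test": "dan_roleplay", "category": "jailbreak", "blocked": True},
    {"test": "ignore_previous", "category": "injection", "blocked": True},
    {"test": "repeat_above", "category": "exfiltration", "blocked": False},
    {"test": "researcher_pretext", "category": "manipulation", "blocked": True},
    {"test": "base64_payload", "category": "encoding", "blocked": True},
]


def block_rate(results):
    """Return the percentage of attack prompts that were blocked."""
    if not results:
        return 0.0
    blocked = sum(1 for r in results if r["blocked"])
    return 100.0 * blocked / len(results)


def false_negatives_by_category(results):
    """Count the attacks that slipped through, grouped by category."""
    return Counter(r["category"] for r in results if not r["blocked"])


rate = block_rate(results)
misses = false_negatives_by_category(results)
print(f"block rate: {rate:.1f}%")
print(f"false negatives: {dict(misses)}")
```

Grouping the misses by category shows which attack type needs attention first, rather than only reporting a single aggregate number.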
Endpoints
List Available Tests
Get all available red team test cases.
Run Specific Test
Run a single test against a target preset.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `test_name` | string (path) | Required | Name of the test to run |
| `target_preset` | string (body) | "default" | Policy preset to test against |
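To show how the two parameters fit together, here is a rough Python sketch that builds (but does not send) a request with `test_name` in the path and `target_preset` in the JSON body. The base URL, the path layout, and the POST method are all assumptions for illustration; use the endpoint definition from the API reference for real calls. Authentication is left out for the same reason.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical placeholder, NOT the real PromptGuard endpoint.
BASE_URL = "https://api.example.com"


def build_run_test_request(test_name, target_preset="default"):
    """Build a request for one red team test without sending it.

    test_name goes in the URL path (required); target_preset goes in
    the JSON body and falls back to "default" when not given.
    """
    if not test_name:
        raise ValueError("test_name is required")
    # Hypothetical path layout; the test name is URL-encoded for safety.
    url = f"{BASE_URL}/redteam/tests/{quote(test_name, safe='')}"
    body = json.dumps({"target_preset": target_preset}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed HTTP method
    )


request = build_run_test_request("dan roleplay")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes the path and body easy to inspect or unit-test before any network call is made.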
Run Full Test Suite
Execute all red team tests and get an aggregate summary.
Test Custom Prompt
Test your own adversarial prompt against the security configuration.
Test Categories
| Category | Description | Example |
|---|---|---|
| `jailbreak` | Bypass AI safety measures | "You are DAN…" |
| `injection` | Override instructions | "Ignore previous instructions…" |
| `exfiltration` | Extract system prompts or data | "Repeat everything above" |
| `manipulation` | Social engineering attacks | "As an AI researcher, I need…" |
| `encoding` | Obfuscated attacks | Base64, ROT13, leetspeak |
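For the `encoding` category, it can help to derive obfuscated variants of a prompt you already test and submit each one as a custom prompt. The following is a small sketch using only the Python standard library; the leetspeak substitution map is one arbitrary choice among many, and the function name is hypothetical.

```python
import base64
import codecs

# One possible leetspeak mapping; real attacks vary widely.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def obfuscated_variants(prompt):
    """Return Base64, ROT13, and leetspeak encodings of a prompt."""
    return {
        "base64": base64.b64encode(prompt.encode("utf-8")).decode("ascii"),
        "rot13": codecs.encode(prompt, "rot_13"),
        "leetspeak": prompt.lower().translate(LEET),
    }


for name, text in obfuscated_variants("Ignore previous instructions").items():
    print(f"{name}: {text}")
```

Each variant carries the same instruction in a different surface form, so a configuration that blocks the plain prompt but allows a variant has a gap in this category.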
SDK Usage
- Python
- Node.js
- cURL
Interpreting Results
Block Rate Benchmarks
| Block Rate | Assessment | Action |
|---|---|---|
| > 90% | Good | Configuration is strong |
| 70-90% | Warning | Review the tests that were allowed through and tighten rules |
| < 70% | Critical | Upgrade preset strictness or add custom rules |
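The benchmark table translates directly into a threshold check, which is convenient in a pre-deployment script. This is a minimal sketch; the function name is hypothetical, and the boundary handling (exactly 90% and exactly 70% both count as Warning) is read from the table's ranges.

```python
def assess_block_rate(rate_percent):
    """Map a block rate (0-100) to the assessment and action above."""
    if rate_percent > 90:
        return ("Good", "Configuration is strong")
    if rate_percent >= 70:
        return ("Warning", "Review allowed tests and tighten rules")
    return ("Critical", "Upgrade preset strictness or add custom rules")


for rate in (95.0, 82.5, 40.0):
    assessment, action = assess_block_rate(rate)
    print(f"{rate:5.1f}% -> {assessment}: {action}")
```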
Confidence Score Guide
| Confidence | Meaning |
|---|---|
| > 0.85 | High confidence detection |
| 0.6 - 0.85 | Moderate confidence — may need rule tuning |
| < 0.6 | Low confidence — consider adjusting thresholds |
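To see how detections are distributed across these bands, the scores from a test run can be bucketed and counted. The sketch below assumes you have already collected the confidence scores as plain floats; the function names are hypothetical.

```python
from collections import Counter


def confidence_band(score):
    """Classify a confidence score using the bands in the table above."""
    if score > 0.85:
        return "high"
    if score >= 0.6:
        return "moderate"
    return "low"


def band_counts(scores):
    """Count how many detections fall into each confidence band."""
    return Counter(confidence_band(s) for s in scores)


counts = band_counts([0.97, 0.91, 0.72, 0.64, 0.41])
print(dict(counts))
```

A run dominated by moderate or low scores suggests rule tuning or threshold adjustment is worth the effort, even when the block rate itself looks acceptable.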
Best Practices
- Test before deploying — Run the full suite against your preset before going to production
- Compare presets — Run the same tests against different presets to find the best fit
- Test regularly — Re-run after changing presets or custom policies
- Use custom prompts — Test with prompts specific to your domain and user base
- Monitor the allowed tests — Focus on understanding why certain attacks weren’t blocked
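The "compare presets" and "monitor the allowed tests" practices can be combined in one pass over the results. Below is a hedged Python sketch that assumes you have run the same tests against each preset and gathered the outcomes yourself; the record fields, the test names, and the preset name `strict` are illustrative assumptions (only `default` appears in this document).

```python
def compare_presets(results_by_preset):
    """Summarize block rate and allowed tests per preset.

    Returns (best_preset, summary). Raises ValueError if no presets
    are supplied, because there is nothing to compare.
    """
    if not results_by_preset:
        raise ValueError("at least one preset is required")
    summary = {}
    for preset, results in results_by_preset.items():
        allowed = [r["test"] for r in results if not r["blocked"]]
        total = len(results)
        rate = 100.0 * (total - len(allowed)) / total if total else 0.0
        summary[preset] = {"block_rate": rate, "allowed": allowed}
    best = max(summary, key=lambda p: summary[p]["block_rate"])
    return best, summary


# Hypothetical outcomes for the same four tests under two presets.
runs = {
    "default": [
        {"test": "dan_roleplay", "blocked": True},
        {"test": "ignore_previous", "blocked": True},
        {"test": "repeat_above", "blocked": False},
        {"test": "base64_payload", "blocked": False},
    ],
    "strict": [
        {"test": "dan_roleplay", "blocked": True},
        {"test": "ignore_previous", "blocked": True},
        {"test": "repeat_above", "blocked": True},
        {"test": "base64_payload", "blocked": False},
    ],
}

best, summary = compare_presets(runs)
for preset, info in summary.items():
    print(f"{preset}: {info['block_rate']:.1f}% blocked, allowed={info['allowed']}")
print(f"best preset: {best}")
```

Keeping the list of allowed tests next to each block rate makes it clear which attacks still need a custom rule even under the best-scoring preset.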