PromptGuard provides multiple layers of security protection for your AI applications. Configure policies, detection rules, and custom filters to match your security requirements.
Security Layers
PromptGuard protects your AI applications through multiple security layers:
1. Input Filtering
- Prompt Injection Detection: Blocks attempts to manipulate AI behavior
- Jailbreak Detection: LLM-based analysis across 7 attack categories
- PII Detection: 39+ entity types across 10+ countries with checksum validation, encoded PII detection, and ML-based NER
- Secret Key Detection: Entropy analysis, character diversity scoring, and known prefix matching across 3 sensitivity tiers
- URL Filtering: Allow-list/block-list, CIDR matching, scheme restriction, and credential injection blocking
- Tool Injection Detection: Indirect prompt injection analysis in agentic tool calls and outputs
- Content Moderation: Filters inappropriate or harmful content
- LLM Guard: Custom natural-language rules and off-topic/topical alignment detection
- Custom Rules: Define your own security patterns and policies
- MCP Server Security: Validate Model Context Protocol tool calls with server allow/block-listing, argument schema validation, and tool injection detection
- Multimodal Safety: Image content analysis via Google Cloud Vision or Azure Content Safety, with OCR-based PII detection on image content
- Security Groundedness: Detect security-relevant fabrication including hallucinated CVEs, fake compliance claims, and invented security statistics
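The checksum validation mentioned under PII Detection is what lets a detector distinguish a real card number from a random run of digits. As a rough illustration (not PromptGuard's internal code), the Luhn check works like this:

```python
def luhn_valid(number: str) -> bool:
    """Validate a numeric string with the Luhn checksum (used for card numbers)."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# A regex hit that fails the checksum is likely a false positive, not a real card.
print(luhn_valid("4111 1111 1111 1111"))  # well-known Visa test number -> True
print(luhn_valid("4111 1111 1111 1112"))  # -> False
```

The same idea applies to the other supported checksums (IBAN Mod 97, Verhoeff, NHS Mod 11): a pattern match alone is only a candidate until the checksum confirms it.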
2. Output Filtering
- Response Monitoring: Scans AI responses for security issues
- Streaming Output Guardrails: Periodic policy evaluation during SSE streaming responses
- Data Leak Prevention: Prevents exposure of sensitive information
- Toxicity Detection: Blocks harmful or inappropriate responses
- Content Sanitization: Removes potentially dangerous content
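Streaming output guardrails evaluate the partial response at intervals instead of waiting for the full completion. A minimal sketch of the idea, where the `evaluate` callback and the chunk interval are stand-ins for the real policy engine and its configuration:

```python
def guarded_stream(chunks, evaluate, interval=5):
    """Re-evaluate the accumulated text every `interval` chunks while streaming.

    `evaluate(text)` stands in for the policy engine: it returns False to block.
    """
    buffer = []
    for i, chunk in enumerate(chunks, start=1):
        buffer.append(chunk)
        # Periodic policy check on everything streamed so far.
        if i % interval == 0 and not evaluate("".join(buffer)):
            yield "[stream terminated by policy]"
            return
        yield chunk
    # Final check on the complete response before closing the stream.
    if not evaluate("".join(buffer)):
        yield "[stream terminated by policy]"
```

Because checks are periodic, a few tokens may be emitted between violations and termination; a smaller interval trades latency for tighter enforcement.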
3. Behavioral Analysis
- Usage Pattern Detection: Identifies suspicious request patterns
- Rate Limiting: Prevents abuse and protects against attacks
- Anomaly Detection: Flags unusual AI usage behavior
- Risk Scoring: Assigns risk levels to requests and responses
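Rate limiting of this kind is commonly implemented with a token bucket, which allows short bursts while capping the sustained request rate. A generic sketch (not PromptGuard's implementation; rate and capacity are illustrative):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A burst of `capacity` requests is admitted immediately; after that, requests are admitted at the steady `rate` and anything faster is rejected.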
Security Rules
Policy Presets
PromptGuard uses a composable preset system combining use-case templates with strictness levels:
| Use Case Template | Description | Recommended Strictness |
|---|---|---|
| Default | Balanced security for general AI applications | Moderate |
| Support Bot | Optimized for customer support chatbots | Strict |
| Code Assistant | Enhanced protection for coding tools | Moderate |
| RAG System | Maximum security for document-based AI | Strict |
| Data Analysis | Strict PII protection for data processing | Strict |
| Creative Writing | Nuanced content filtering for creative apps | Moderate |
Strictness levels: strict, moderate (default), permissive
Custom Rules
Create custom security rules for your specific needs:
- Define custom PII patterns
- Set content filtering thresholds
- Configure allowed/blocked keywords
- Implement industry-specific compliance rules
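Conceptually, a custom rule pairs a pattern with an action. The rule schema below is hypothetical and illustrative only; consult your dashboard for the actual format:

```python
import re

# Hypothetical rule shape: PromptGuard's real schema may differ.
CUSTOM_RULES = [
    {"name": "employee_id", "pattern": r"\bEMP-\d{6}\b", "action": "redact"},
    {"name": "internal_host", "pattern": r"\b\w+\.corp\.internal\b", "action": "block"},
]

def apply_rules(text: str):
    """Return (possibly redacted text, block reason). A block short-circuits."""
    for rule in CUSTOM_RULES:
        if re.search(rule["pattern"], text):
            if rule["action"] == "block":
                return None, f"blocked by {rule['name']}"
            # Redact: replace matches with a labeled placeholder.
            text = re.sub(rule["pattern"], f"[{rule['name'].upper()}]", text)
    return text, None
```

For example, `apply_rules("ticket for EMP-123456")` redacts the ID, while any mention of an internal hostname blocks the request outright.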
Threat Detection
PromptGuard provides 13 specialized detectors that automatically identify and block threats:
Attack Detection
- Prompt Injection: Direct instruction overrides, role confusion, and context breaking
- Jailbreak Detection (LLM): 7-category taxonomy including character obfuscation, competing objectives, lexical, semantic, context, structure obfuscation, and multi-turn escalation
- Data Exfiltration: System prompt extraction, training data extraction, and internal information requests
- Tool Injection: Indirect prompt injection in agentic tool calls and outputs
- Fraud Detection: Social engineering, impersonation, and financial fraud patterns
- Malware Detection: Code injection patterns, obfuscated scripts, and known signatures
- MCP Tool Validation: Server allow/block-listing, schema validation, resource access policies, and injection detection for MCP-based agents
- Multimodal Content Safety: Image analysis, OCR text extraction, and PII scanning for multimodal inputs
- Security Groundedness: Detects hallucinated CVEs, fabricated compliance claims, and invented security data in LLM responses
- Toxicity: Hate speech, harassment, violence, and other harmful content
Data Protection
- PII Detection: 39+ entity types across 10+ countries — SSNs, credit cards, IBAN, NHS numbers, Aadhaar, and more — with checksum validation (Luhn, IBAN Mod 97, Verhoeff, NHS Mod 11), encoded PII detection (base64/hex/URL-encoded), ML-based NER, and configurable redact/mask/block modes
- Secret Key Detection: Shannon entropy analysis, character diversity scoring, known prefix matching (sk-, ghp_, AKIA, Bearer), with strict/moderate/permissive sensitivity tiers
- URL Filtering: Allow-list/block-list, CIDR matching, scheme restriction, credential injection blocking
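To illustrate how the secret-detection signals combine, here is a rough sketch of prefix matching plus Shannon entropy and character-diversity gating. The entropy threshold and length cutoff are invented for the example and are not PromptGuard's actual tier values:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random API keys score high, prose scores low."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

# Known vendor prefixes short-circuit the statistical checks.
KNOWN_PREFIXES = ("sk-", "ghp_", "AKIA", "Bearer")

def looks_like_secret(token: str, entropy_threshold: float = 3.5) -> bool:
    if token.startswith(KNOWN_PREFIXES):
        return True
    # Character-diversity gate: real keys mix at least two of lower/upper/digits.
    diverse = sum(any(f(c) for c in token)
                  for f in (str.islower, str.isupper, str.isdigit)) >= 2
    return len(token) >= 20 and diverse and shannon_entropy(token) > entropy_threshold
```

Lowering the threshold corresponds to a stricter tier (more candidates flagged, more false positives); raising it corresponds to a permissive one.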
Configuration
Dashboard Configuration
- Navigate to Projects > [Your Project] > Security Rules in your dashboard
- Select your Use Case from the first dropdown (e.g., “Support Bot”)
- Select your Strictness Level from the second dropdown (Strict, Moderate, Permissive)
- Optionally create custom rules in the Security Rules tab
- Configure detection thresholds and rules
API Configuration
Developer API Endpoints: The preset configuration endpoints below are part of the Developer API and are included in the OpenAPI spec. They use API key authentication and are suitable for SDK usage.
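As a sketch of SDK-style usage, the snippet below builds the request that selects a use-case template and strictness level. The host, endpoint path, and payload field names are assumptions for illustration; the OpenAPI spec is authoritative:

```python
import json
import urllib.request

BASE = "https://api.promptguard.example/v1"  # illustrative host; check your dashboard

def build_preset_update(project: str, use_case: str, strictness: str,
                        api_key: str) -> urllib.request.Request:
    """Construct the PUT request that applies a use-case template + strictness level."""
    body = json.dumps({"use_case": use_case, "strictness": strictness}).encode()
    return urllib.request.Request(
        f"{BASE}/projects/{project}/security/preset",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Send with: urllib.request.urlopen(build_preset_update("my-project", "support_bot", "strict", "YOUR_API_KEY"))
```

The same endpoint presumably supports GET to read the current preset, matching the response format documented below.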
Response Formats
Get Preset Response (200 OK)
Real-time Monitoring
Monitor security events in real time:
- Security Dashboard: View threats and blocks
- Alert Notifications: Get notified of security events
- Audit Logs: Track all security decisions
- Performance Metrics: Monitor impact on response times
Compliance
PromptGuard helps maintain compliance with:
- GDPR: Automatic PII detection and redaction
- CCPA: Data privacy protection
- HIPAA: Healthcare information security
- SOC 2: Security controls and monitoring
- Industry Standards: Customizable compliance rules
Best Practices
Security Configuration
- Start with Default preset for most applications
- Choose use-case-specific presets (Support Bot, Code Assistant, etc.) when they match your needs
- Monitor false positives and adjust with custom policies if needed
- Regular policy reviews to maintain effectiveness
Development Workflow
- Use Default preset during development
- Test with production-like presets in staging
- Deploy appropriate preset in production based on your use case
- Continuous monitoring and adjustment via custom policies
Next Steps
Policy Presets
Choose and configure security policy presets
Custom Rules
Create custom security rules and filters
Threat Detection
Configure advanced threat detection
Monitoring
Set up security monitoring and alerts
Common Questions
How does PromptGuard detect prompt injections?
PromptGuard uses advanced pattern matching, machine learning models, and LLM-based analysis to identify injection techniques including instruction overrides, role confusion, context breaking, jailbreak attempts across 7 categories, and indirect prompt injection in agentic tool calls.
What happens when a request is blocked?
Blocked requests return an HTTP 400 error with details about the security violation. You can configure whether to fail open (allow) or closed (block) when the security engine is unavailable.
Can I whitelist certain patterns?
Yes, you can create custom rules to allow specific patterns that might otherwise be blocked. This is useful for legitimate use cases that trigger false positives.
How do I reduce false positives?
Start with the Default preset and adjust based on your use case. Monitor your security dashboard for false positives and add custom policies if needed.