PromptGuard provides multiple layers of security protection for your AI applications. Configure policies, detection rules, and custom filters to match your security requirements.

Security Layers

PromptGuard protects your AI applications through multiple security layers:

1. Input Filtering

  • Prompt Injection Detection: Blocks attempts to manipulate AI behavior
  • Jailbreak Detection: LLM-based analysis across 7 attack categories
  • PII Detection: 39+ entity types across 10+ countries with checksum validation, encoded PII detection, and ML-based NER
  • Secret Key Detection: Entropy analysis, character diversity scoring, and known prefix matching across 3 sensitivity tiers
  • URL Filtering: Allow-list/block-list, CIDR matching, scheme restriction, and credential injection blocking
  • Tool Injection Detection: Indirect prompt injection analysis in agentic tool calls and outputs
  • Content Moderation: Filters inappropriate or harmful content
  • LLM Guard: Custom natural-language rules and off-topic/topical alignment detection
  • Custom Rules: Define your own security patterns and policies
  • MCP Server Security: Validate Model Context Protocol tool calls with server allow/block-listing, argument schema validation, and tool injection detection
  • Multimodal Safety: Image content analysis via Google Cloud Vision or Azure Content Safety, with OCR-based PII detection on image content
  • Security Groundedness: Detect security-relevant fabrication including hallucinated CVEs, fake compliance claims, and invented security statistics
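
Checksum validation is what lets a PII detector separate a real credit-card number from a random run of digits. As a standalone illustration of the idea (the standard Luhn algorithm, not PromptGuard's internal code):

```python
def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn checksum
    (the check digit scheme used by credit-card numbers)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Double every second digit from the right; fold two-digit results.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4539 1488 0343 6467"))  # True: checksum holds
print(luhn_valid("4539 1488 0343 6468"))  # False: last digit altered
```

A single-digit typo breaks the checksum, which is why checksum-backed detection produces far fewer false positives than a bare 16-digit regex.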

2. Output Filtering

  • Response Monitoring: Scans AI responses for security issues
  • Streaming Output Guardrails: Periodic policy evaluation during SSE streaming responses
  • Data Leak Prevention: Prevents exposure of sensitive information
  • Toxicity Detection: Blocks harmful or inappropriate responses
  • Content Sanitization: Removes potentially dangerous content
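
Streaming guardrails can only evaluate policy on the text produced so far, which is why evaluation runs periodically rather than per token. A conceptual sketch of the pattern (not PromptGuard's implementation):

```python
def stream_with_guardrails(chunks, check_policy, check_every=5):
    """Yield streamed chunks, re-running a policy check on the text
    accumulated so far every `check_every` chunks. If a check fails,
    emit a termination marker and stop the stream."""
    buffer = []
    for i, chunk in enumerate(chunks, start=1):
        buffer.append(chunk)
        if i % check_every == 0 and not check_policy("".join(buffer)):
            yield "[stream terminated by policy]"
            return
        yield chunk

chunks = ["hello ", "world ", "secret ", "x ", "y ", "z "]
out = list(stream_with_guardrails(chunks, lambda text: "secret" not in text))
```

Note the inherent tradeoff: chunks emitted before a check fires have already reached the client, so stricter deployments shorten the check interval at some latency cost.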

3. Behavioral Analysis

  • Usage Pattern Detection: Identifies suspicious request patterns
  • Rate Limiting: Prevents abuse and protects against attacks
  • Anomaly Detection: Flags unusual AI usage behavior
  • Risk Scoring: Assigns risk levels to requests and responses
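
Rate limiting is enforced server-side by PromptGuard, but the underlying idea is the classic token bucket: a burst allowance that refills at a steady rate. A minimal sketch for illustration only:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refills `rate` tokens per
    second up to `capacity`; each allowed request costs one token."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# rate=0 disables refill purely to make the demo deterministic:
# a burst of 5 is allowed, then everything is rejected.
bucket = TokenBucket(rate=0, capacity=5)
results = [bucket.allow() for _ in range(8)]
```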

Security Rules

Policy Presets

PromptGuard uses a composable preset system combining use-case templates with strictness levels:

| Use Case Template | Description | Recommended Strictness |
| --- | --- | --- |
| Default | Balanced security for general AI applications | Moderate |
| Support Bot | Optimized for customer support chatbots | Strict |
| Code Assistant | Enhanced protection for coding tools | Moderate |
| RAG System | Maximum security for document-based AI | Strict |
| Data Analysis | Strict PII protection for data processing | Strict |
| Creative Writing | Nuanced content filtering for creative apps | Moderate |

Strictness Levels: strict, moderate (default), permissive
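
Presets are addressed as strings of the form `use_case:strictness`, with strictness defaulting to moderate when omitted (the same format the preset API accepts). A small helper that mirrors this convention:

```python
def parse_preset(preset_name: str) -> tuple[str, str]:
    """Split 'use_case:strictness' into its parts; a bare 'use_case'
    defaults to the 'moderate' strictness level."""
    use_case, _, strictness = preset_name.partition(":")
    return use_case, strictness or "moderate"

parse_preset("support_bot:strict")  # ('support_bot', 'strict')
parse_preset("rag_system")          # ('rag_system', 'moderate')
```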

Custom Rules

Create custom security rules for your specific needs:
  • Define custom PII patterns
  • Set content filtering thresholds
  • Configure allowed/blocked keywords
  • Implement industry-specific compliance rules
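
The exact rule schema is set in the dashboard, but as a plain-Python illustration of what a custom PII pattern might cover, here is a hypothetical rule for internal employee IDs (the `EMP-######` format and the redaction token are invented for this example):

```python
import re

# Hypothetical company-internal identifier: "EMP-" plus exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

def redact_custom_pii(text: str) -> str:
    """Replace every match of the custom pattern with a redaction token."""
    return EMPLOYEE_ID.sub("[REDACTED:employee_id]", text)

redact_custom_pii("Ticket opened by EMP-004217 yesterday")
```

Word boundaries and a fixed digit count keep the pattern from firing on near-misses such as `EMP-12`.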

Threat Detection

PromptGuard includes 13 specialized detectors that automatically identify and block threats:

Attack Detection

  • Prompt Injection: Direct instruction overrides, role confusion, and context breaking
  • Jailbreak Detection (LLM): 7-category taxonomy including character obfuscation, competing objectives, lexical, semantic, context, structure obfuscation, and multi-turn escalation
  • Data Exfiltration: System prompt extraction, training data extraction, and internal information requests
  • Tool Injection: Indirect prompt injection in agentic tool calls and outputs
  • Fraud Detection: Social engineering, impersonation, and financial fraud patterns
  • Malware Detection: Code injection patterns, obfuscated scripts, and known signatures
  • MCP Tool Validation: Server allow/block-listing, schema validation, resource access policies, and injection detection for MCP-based agents
  • Multimodal Content Safety: Image analysis, OCR text extraction, and PII scanning for multimodal inputs
  • Security Groundedness: Detects hallucinated CVEs, fabricated compliance claims, and invented security data in LLM responses
  • Toxicity: Hate speech, harassment, violence, and other harmful content

Data Protection

  • PII Detection: 39+ entity types across 10+ countries — SSNs, credit cards, IBAN, NHS numbers, Aadhaar, and more — with checksum validation (Luhn, IBAN Mod 97, Verhoeff, NHS Mod 11), encoded PII detection (base64/hex/URL-encoded), ML-based NER, and configurable redact/mask/block modes
  • Secret Key Detection: Shannon entropy analysis, character diversity scoring, known prefix matching (sk-, ghp_, AKIA, Bearer), with strict/moderate/permissive sensitivity tiers
  • URL Filtering: Allow-list/block-list, CIDR matching, scheme restriction, credential injection blocking
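
To see why entropy analysis and prefix matching complement each other, here is a self-contained sketch (the 3.5-bit threshold is an arbitrary choice for illustration, not PromptGuard's tuning):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character: random API keys score high,
    ordinary English words score low."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Known secret prefixes from the detector description above.
KNOWN_PREFIXES = ("sk-", "ghp_", "AKIA", "Bearer ")

def looks_like_secret(token: str, entropy_threshold: float = 3.5) -> bool:
    """Flag a token if it carries a known secret prefix, or if its
    per-character entropy exceeds the (illustrative) threshold."""
    if any(token.startswith(p) for p in KNOWN_PREFIXES):
        return True
    return shannon_entropy(token) > entropy_threshold
```

Prefix matching catches low-entropy but unmistakable keys; entropy catches high-randomness strings with no recognizable prefix.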

Configuration

Dashboard Configuration

  1. Navigate to Projects > [Your Project] > Security Rules in your dashboard
  2. Select your Use Case from the first dropdown (e.g., “Support Bot”)
  3. Select your Strictness Level from the second dropdown (Strict, Moderate, Permissive)
  4. Optionally create custom rules in the Security Rules tab
  5. Configure detection thresholds and rules

API Configuration

Developer API Endpoints: The preset configuration endpoints below are part of the Developer API and are included in the OpenAPI spec. They use API key authentication and are suitable for SDK usage.
Configure security policies programmatically using the Developer API:
import requests
import os

api_key = os.environ.get("PROMPTGUARD_API_KEY")
base_url = "https://api.promptguard.co/api/v1/presets"

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json"
}

# Complete configuration workflow
def configure_security_preset(project_id, use_case, strictness_level="moderate"):
    """
    Configure security preset for a project.

    Args:
        project_id: Project ID to configure
        use_case: Use case template (e.g., 'support_bot', 'code_assistant')
        strictness_level: Strictness level ('strict', 'moderate', 'permissive')
    """
    try:
        # Step 1: List available use cases
        print("Fetching available use cases...")
        use_cases_response = requests.get(
            f"{base_url}/use-cases",
            headers=headers
        )

        if use_cases_response.status_code != 200:
            print(f"Error fetching use cases: HTTP {use_cases_response.status_code}")
            return False

        use_cases = use_cases_response.json()
        available_keys = [uc.get('key') for uc in use_cases.get('use_cases', [])]

        # Step 2: Validate use case
        if use_case not in available_keys:
            print(f"Error: Use case '{use_case}' not found")
            print(f"Available use cases: {', '.join(available_keys)}")
            return False

        # Step 3: Validate strictness level
        valid_strictness = ['strict', 'moderate', 'permissive']
        if strictness_level not in valid_strictness:
            print(f"Error: Invalid strictness level '{strictness_level}'")
            print(f"Valid levels: {', '.join(valid_strictness)}")
            return False

        # Step 4: Get current configuration
        print(f"\nGetting current configuration for project {project_id}...")
        current_response = requests.get(
            f"{base_url}/projects/{project_id}/preset",
            headers=headers
        )

        if current_response.status_code == 200:
            current = current_response.json()
            print(f"Current: {current.get('use_case')}:{current.get('strictness_level')}")
        elif current_response.status_code == 404:
            print(f"Error: Project {project_id} not found")
            return False

        # Step 5: Update preset
        preset_name = f"{use_case}:{strictness_level}"
        print(f"\nUpdating preset to: {preset_name}")

        update_response = requests.put(
            f"{base_url}/projects/{project_id}/preset",
            headers=headers,
            json={"preset_name": preset_name}
        )

        if update_response.status_code == 200:
            result = update_response.json()
            print(f"✅ Successfully updated to {preset_name}")

            # Step 6: Verify configuration
            print("\nVerifying configuration...")
            verify_response = requests.get(
                f"{base_url}/projects/{project_id}/preset",
                headers=headers
            )

            if verify_response.status_code == 200:
                verified = verify_response.json()
                if (verified.get('use_case') == use_case and
                    verified.get('strictness_level') == strictness_level):
                    print("✅ Configuration verified successfully!")
                    return True
                else:
                    print("⚠️  Warning: Configuration may not have updated correctly")
                    return False

            return True
        elif update_response.status_code == 400:
            error = update_response.json()
            print(f"Error: {error.get('detail', 'Invalid request')}")
        elif update_response.status_code == 404:
            print(f"Error: Project {project_id} not found")
        elif update_response.status_code == 401:
            print("Error: Invalid API key")
        else:
            print(f"Error: HTTP {update_response.status_code}")

        return False

    except requests.exceptions.RequestException as e:
        print(f"Network error: {e}")
        return False
    except Exception as e:
        print(f"Unexpected error: {e}")
        return False

# Quick preset update
def quick_update_preset(project_id, preset_name):
    """
    Quick update using preset name string.
    Format: 'use_case:strictness' or 'use_case' (defaults to moderate)
    """
    response = requests.put(
        f"{base_url}/projects/{project_id}/preset",
        headers=headers,
        json={"preset_name": preset_name}
    )

    if response.status_code == 200:
        print(f"✅ Preset updated to: {preset_name}")
        return response.json()
    else:
        error = response.json() if response.content else {}
        print(f"Error: {error.get('detail', f'HTTP {response.status_code}')}")
        return None

# Example usage
if __name__ == "__main__":
    project_id = "proj_abc123"

    # Complete workflow
    configure_security_preset(project_id, "support_bot", "strict")

    # Quick update
    quick_update_preset(project_id, "code_assistant:moderate")

    # Update with default strictness (moderate)
    quick_update_preset(project_id, "rag_system")

Response Formats

Get Preset Response (200 OK)
{
  "project_id": "proj_abc123",
  "use_case": "support_bot",
  "strictness_level": "strict",
  "preset_name": "support_bot:strict"
}
Update Preset Response (200 OK)
{
  "project_id": "proj_abc123",
  "use_case": "support_bot",
  "strictness_level": "strict",
  "preset_name": "support_bot:strict",
  "message": "Preset updated successfully"
}
Error Responses 400 Bad Request - Invalid preset format
{
  "detail": "Invalid preset format. Use 'use_case:strictness' or 'use_case'"
}
404 Not Found - Project doesn’t exist
{
  "detail": "Project not found"
}
401 Unauthorized - Invalid API key
{
  "detail": "Invalid API key"
}

Real-time Monitoring

Monitor security events in real-time:
  • Security Dashboard: View threats and blocks
  • Alert Notifications: Get notified of security events
  • Audit Logs: Track all security decisions
  • Performance Metrics: Monitor impact on response times

Compliance

PromptGuard helps maintain compliance with:
  • GDPR: Automatic PII detection and redaction
  • CCPA: Data privacy protection
  • HIPAA: Healthcare information security
  • SOC 2: Security controls and monitoring
  • Industry Standards: Customizable compliance rules

Best Practices

Security Configuration

  1. Start with Default preset for most applications
  2. Choose use-case-specific presets (Support Bot, Code Assistant, etc.) when they match your needs
  3. Monitor false positives and adjust with custom policies if needed
  4. Regular policy reviews to maintain effectiveness

Development Workflow

  1. Use Default preset during development
  2. Test with production-like presets in staging
  3. Deploy appropriate preset in production based on your use case
  4. Continuous monitoring and adjustment via custom policies

Next Steps

Policy Presets

Choose and configure security policy presets

Custom Rules

Create custom security rules and filters

Threat Detection

Configure advanced threat detection

Monitoring

Set up security monitoring and alerts

Common Questions

How does PromptGuard detect prompt injection?
PromptGuard uses advanced pattern matching, machine learning models, and LLM-based analysis to identify injection techniques, including instruction overrides, role confusion, context breaking, jailbreak attempts across 7 categories, and indirect prompt injection in agentic tool calls.

What happens when a request is blocked?
Blocked requests return an HTTP 400 error with details about the security violation. You can configure whether to fail open (allow) or fail closed (block) when the security engine is unavailable.

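
The fail-open/fail-closed decision can live in one small client-side helper. A sketch based on the status codes and error formats documented above (illustrative, not an official SDK):

```python
from typing import Optional

def handle_guard_result(status_code: Optional[int], fail_open: bool = False) -> bool:
    """Decide whether to let a request through.
    status_code is None when the security engine was unreachable."""
    if status_code is None:
        return fail_open          # engine down: fail open or fail closed
    if status_code == 400:
        return False              # explicit security block
    return status_code == 200     # any other non-success is rejected
```

Fail closed is the safer default; fail open trades security guarantees for availability and suits low-risk internal tools.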
Can I allow patterns that are blocked by default?
Yes. You can create custom rules to allow specific patterns that might otherwise be blocked. This is useful for legitimate use cases that trigger false positives.

Which preset should I start with?
Start with the Default preset and adjust based on your use case. Monitor your security dashboard for false positives and add custom policies as needed.
Need help configuring security? Contact our security team for personalized assistance.