Changelog

Stay up to date with the latest changes to PromptGuard, including new features, improvements, and bug fixes.

April 2026 - Pricing update

  • Pro is now $99/month (annual: $1,089), with the same 100K requests / 5 projects / 7-day retention.
  • Scale is now $199/month (annual: $2,189), with the same 1M soft-limit / unlimited projects / 30-day retention.
  • Free and Enterprise are unchanged.
  • Existing subscribers stay on their original price (Stripe does not migrate active subscriptions).

April 2026 - v3.3.0: ATR Integration, Agentic OWASP, Cisco Plugin

  • Community rule pack — Ingested 108 open-source rules / 714 regex patterns as a fast pre-filter layer covering agent-specific threats: MCP tool poisoning, cross-agent manipulation, skill supply chain attacks, privilege escalation, and excessive autonomy. PromptGuard now runs ~1,000+ detection patterns across built-in and community rule sets.
  • OWASP Agentic Top 10 mapping — Every security event now maps to both the OWASP LLM Top 10 (LLM01–LLM10) and the OWASP Agentic Top 10 (ASI01–ASI10). Full coverage for enterprise compliance reporting across both frameworks.
  • External benchmark eval framework — Added loaders for PINT (Invariant Labs, 850 adversarial samples) and Garak (NVIDIA, 666+ jailbreak probes) benchmarks for continuous validation using the existing eval runner.
  • Scanner integration — Thin API shim (PromptGuardAnalyzer) for contributing to open-source agent security scanners as an optional analysis backend. Zero detection logic shipped — all intelligence stays server-side.
  • Detection strategy bug fix — Fixed FAST_FIRST mode: non-detections from later providers no longer overwrite earlier detections, ensuring the first positive match is always preserved.
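
The FAST_FIRST fix above can be pictured with a minimal sketch. The `ProviderResult` type and `fast_first` function are hypothetical names used for illustration only, not PromptGuard's internal code; the point is simply that a later non-detection never replaces an earlier positive match.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProviderResult:
    provider: str      # hypothetical provider label, e.g. "regex" or "ml"
    detected: bool
    confidence: float


def fast_first(results: Iterable[ProviderResult]) -> Optional[ProviderResult]:
    """Return the first positive detection, or the last result if none fired."""
    last = None
    for result in results:
        if result.detected:
            # First positive match wins; later providers cannot overwrite it.
            return result
        last = result
    return last
```

Because the function returns as soon as a provider reports a detection, results from later providers are never consulted, which is one simple way to guarantee the behavior described in the fix.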

March 2026 - v3.0: SOTA Detection Upgrade

  • Content safety classification — LLM-based harmful intent detection via an open-weight safety classifier, catching requests that traditional toxicity models miss (100% detection, 0% false positives)
  • Multi-turn intent drift detection — DeepContext-inspired crescendo attack detection using semantic embedding drift analysis with LLM verification
  • Universal ML access — All detection layers (ML ensemble, content safety, multi-turn analysis) now available on all plan tiers; pricing differentiates on usage volume only
  • Six-layer detection architecture — Upgraded from four layers to six: normalization → regex → ML ensemble → content safety → multi-turn drift → policy evaluation
  • HIGHEST_CONFIDENCE strategy — Detection from any layer is sufficient to block; layers complement rather than gate each other
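
As a rough illustration of the HIGHEST_CONFIDENCE idea, the sketch below picks the most confident layer and blocks when that single score clears a threshold. The function name, the layer labels, and the `block_at` default of 0.8 are assumptions made for the example, not documented values.

```python
from typing import Dict, Tuple


def highest_confidence(
    layer_scores: Dict[str, float], block_at: float = 0.8
) -> Tuple[str, str, float]:
    """Return (decision, layer, score) for the most confident layer."""
    if not layer_scores:
        return ("allow", "", 0.0)
    # Any single layer can trigger a block; layers do not gate each other.
    layer, score = max(layer_scores.items(), key=lambda item: item[1])
    decision = "block" if score >= block_at else "allow"
    return (decision, layer, score)
```

A request scored `{"regex": 0.1, "ml_ensemble": 0.95}` would be blocked on the strength of the ML layer alone, even though the regex layer saw nothing.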

March 2026

Dashboard Overhaul - CISO & ML Features

  • Alerts Feed: Real-time alert feed with severity filtering, status tracking, and unread count badge in the navigation bar
  • Threat Intelligence: Cross-tenant anonymized attack patterns, mutation strategy trends, and evasion rate analysis (Scale+)
  • Audit Log: Dedicated filterable audit log page with JSON export for SOC 2/GDPR compliance (Scale+)
  • Webhook Delivery Monitoring: Track delivery status, retry failed deliveries, and diagnose integration issues per project
  • Detector Performance: Per-detector accuracy, false positive rates, and latency metrics to help CISOs tune detection
  • Token-Level Explainability: Interaction detail pages now highlight which parts of a prompt triggered detection with confidence breakdowns
  • Attack Drift Detection: Visualize how attack patterns shift over time on the Threat Intelligence page
  • Conversation-Level Threat View: Group multi-turn interactions to detect escalation patterns across turns
  • Security Cost Analysis: ROI visualization showing latency cost vs. threats prevented in project analytics
  • Feedback Impact: See how your false positive/negative reports improve detection accuracy
  • Compliance Reports: Interactive framework-specific reports (SOC 2, GDPR, HIPAA, OWASP) with coverage progress bars
  • Metrics Consistency: Unified “Threats Flagged” metric (block + redact) across all dashboard pages - no more mismatched numbers
  • Brand Refresh: New deep indigo color identity with cool-tinted neutrals, unified chart color system, and View Transitions API for smooth page navigation
  • URL State: Interaction filters, search queries, and tab states are now bookmarkable and shareable
  • Command Palette: Enhanced ⌘K menu with “Jump to” shortcuts for Alerts, Threat Intelligence, and Audit Log
  • Responsive Design: Dashboard settings, compliance, and all new pages fully responsive for mobile and tablet

OWASP LLM Top 10 Mapping

  • Every security event is automatically classified against the OWASP LLM Top 10 framework with owasp_id, cwe_id, and human-readable title
  • Dashboard shows OWASP badges on event detail pages and an aggregate OWASP Top 10 Coverage chart on the project overview
  • Supports all 10 OWASP categories: LLM01 (Prompt Injection) through LLM10 (Unbounded Consumption), mapped from PromptGuard’s native threat types

AI-Generated Remediation Suggestions

  • Blocked and redacted events automatically receive AI-generated security insights using an open-source HuggingFace model (Qwen/Qwen3-4B)
  • Each insight includes a summary, impact assessment, and actionable remediation steps
  • Runs asynchronously in a background thread - adds zero latency to the request path
  • Displayed in the dashboard event detail as an “AI-Generated Insight” card

Dashboard UX Enhancements

  • Security Posture Card: Project overview shows a data-driven circular gauge (0–100) reflecting guardrail coverage and event activity, with status labels (Excellent / Good / Needs Attention / Critical)
  • Active Guardrails Strip: Horizontal pill badges showing which guardrails are enabled/disabled at a glance, linking to the guardrails configuration page
  • Recent Threats Table: Main dashboard overview shows the last 5 blocked/flagged events across all projects with threat type badges, OWASP IDs, and click-through navigation
  • Enhanced Global Search (⌘K): Server-side event search with debounced API calls, threat type and OWASP ID search, and result counts per group
  • Sidebar Count Badges: Interactions nav item shows a live count of flagged events in the last 24 hours
  • StatsGrid Sparklines: Inline SVG sparklines on stat cards showing trends from timeseries data
  • Standardized Page Headers: Consistent PageHeader component across Interactions, Analytics, and Security Rules pages with contextual action buttons and keyboard shortcut hints
  • Keyboard Navigation: G then I/R/A/O/K/T/P shortcuts for rapid project page navigation

Custom Policy Engine

  • 7 policy types: input_filter, output_filter, topic_filter, llm_guard, entity_blocklist, rate_limit, and custom - all manageable via API and dashboard
  • Topic Filter: Define conversation scope in natural language; an LLM judge blocks off-topic queries
  • LLM Guard: Custom natural-language business rules evaluated by an LLM judge for constraints too nuanced for regex
  • Entity Blocklist: Protect specific names, terms, or identifiers from appearing in prompts or responses with pipe-delimited matching
  • contains_text_any condition: Match any of multiple pipe-separated terms in a single rule (e.g., "Acme|Globex|Initech")
  • Full dashboard UI for creating and managing all policy types, including system_prompt_details editor for topic filter and LLM guard
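
The pipe-separated matching used by contains_text_any and the entity blocklist can be approximated as follows. This is a sketch under assumptions: case-insensitive substring matching and whitespace trimming are choices made for the example, and the real matching rules may differ.

```python
def contains_text_any(text: str, terms: str) -> bool:
    """Return True if any pipe-separated term appears in the text."""
    # Split on "|" and drop empty entries such as a trailing separator.
    candidates = [term.strip().lower() for term in terms.split("|") if term.strip()]
    haystack = text.lower()
    return any(term in haystack for term in candidates)
```

With the terms string `"Acme|Globex|Initech"`, a prompt mentioning "globex" would match, while a prompt that names none of the three would not.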

Zero-Trust Response Verification

  • HMAC-SHA256 response signing (X-PromptGuard-Signature): Cryptographic proof that the response came from PromptGuard and was not tampered with
  • Content hashing (X-PromptGuard-Content-Hash): SHA-256 hash of the response body for independent integrity verification
  • Zero-retention header (X-PromptGuard-Zero-Retention): Explicit confirmation that prompt content was not stored when zero-retention mode is enabled
  • Replay protection: Timestamp-based signature validation with configurable max age (default 5 minutes)
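
A client-side check of these headers might look like the sketch below. The header names and the 5-minute default come from the notes above; the signed message layout (`"<timestamp>.<content hash>"`), the hex encoding, and the way the timestamp reaches the client are assumptions, so treat this as an outline of the technique rather than the actual wire format.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # default max age of 5 minutes


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over an assumed message layout."""
    content_hash = hashlib.sha256(body).hexdigest()
    message = f"{timestamp}.{content_hash}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,      # value of X-PromptGuard-Signature
    content_hash: str,   # value of X-PromptGuard-Content-Hash
    now: Optional[float] = None,
    max_age: int = MAX_AGE_SECONDS,
) -> bool:
    current = time.time() if now is None else now
    # Replay protection: reject signatures outside the allowed time window.
    if abs(current - timestamp) > max_age:
        return False
    # Integrity: the body must hash to the advertised content hash.
    if not hmac.compare_digest(hashlib.sha256(body).hexdigest(), content_hash):
        return False
    # Authenticity: constant-time comparison against the expected signature.
    return hmac.compare_digest(sign(secret, timestamp, body), signature)
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information during the comparison.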

Per-Project Token Limits

  • Set max_tokens_per_request per project to cap prompt size before it reaches the LLM provider
  • Requests exceeding the limit are rejected with HTTP 413, saving LLM costs
  • tiktoken integration: Accurate token counting using OpenAI’s tokenizer (falls back to chars/4 heuristic)
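
The limit check and its fallback can be sketched as below. The helper names are hypothetical, the `cl100k_base` encoding is an assumed choice, and rounding the chars/4 estimate up is also an assumption; only `max_tokens_per_request`, the 413 status, and the tiktoken fallback behavior come from the notes above.

```python
from typing import Callable


def heuristic_token_count(text: str) -> int:
    """Fallback estimate: about four characters per token, rounded up."""
    return -(-len(text) // 4)


def load_counter() -> Callable[[str], int]:
    """Prefer tiktoken when it is usable; otherwise use the heuristic."""
    try:
        import tiktoken  # optional third-party dependency

        encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding
        return lambda text: len(encoding.encode(text))
    except Exception:
        return heuristic_token_count


def check_limit(
    text: str,
    max_tokens_per_request: int,
    count: Callable[[str], int] = heuristic_token_count,
) -> int:
    """Return the HTTP status a gateway might send for this prompt."""
    return 413 if count(text) > max_tokens_per_request else 200
```

In practice the counter would be loaded once at startup, for example `check_limit(prompt, 4096, count=load_counter())`, so that the request is rejected before any provider call is made.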

Hallucination Detection with RAG Context

  • RAG context threading: Automatically extracts grounding context from system messages and tool results in conversation history
  • Source-grounded verification: Compares LLM responses against retrieved documents for higher-accuracy hallucination scoring
  • Configurable enforcement: metadata (default), flag (log for review), or block (reject above threshold)
  • Adjustable block_threshold (0.0–1.0) for tuning sensitivity per project
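
The three enforcement modes could behave roughly as in this sketch. The mode names and `block_threshold` range are from the notes above; the result keys, the 0.5 default, and the choice to apply the same threshold in flag mode are assumptions for illustration.

```python
from typing import Dict, Union

MODES = ("metadata", "flag", "block")


def enforce(
    score: float, mode: str = "metadata", block_threshold: float = 0.5
) -> Dict[str, Union[float, bool]]:
    """Map a hallucination score to an enforcement outcome."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0.0 <= block_threshold <= 1.0:
        raise ValueError("block_threshold must be between 0.0 and 1.0")
    over = score > block_threshold
    return {
        "hallucination_score": score,      # always attached as metadata
        "flagged": mode == "flag" and over,   # logged for review
        "blocked": mode == "block" and over,  # response rejected
    }
```

In metadata mode the score is reported but never acted on, which makes it a safe default while a project is still tuning its threshold.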

Hardened Container Security

  • 3-stage Dockerfile: Build → Compile → Hardened production image
  • Source code compiled to .pyc bytecode; original .py files stripped from production image
  • Shell binaries removed (/bin/sh, /bin/bash, curl, wget, apt-get) - kubectl exec has nothing to invoke
  • Docker Compose: read_only: true, cap_drop: ALL, no-new-privileges
  • Helm chart: readOnlyRootFilesystem, allowPrivilegeEscalation: false, capabilities.drop: ALL

Autonomous Red Team Agent

  • LLM-powered adversarial search discovers novel attack vectors through intelligent mutation
  • Budget-controlled iterations (1–1000) for configurable thoroughness
  • Generates graded security reports (A through F) with actionable recommendations
  • CLI support: promptguard redteam --autonomous --budget 200
  • SDK support: pg.redteam.run_autonomous() (Python) / pg.redteam.runAutonomous() (Node.js)

Attack Intelligence Database

  • Anonymized bypass pattern storage for organizational learning
  • Query statistics via GET /internal/redteam/intelligence/stats
  • Categories, severity breakdown, and recent discovery counts

CI/CD Security Gate

  • GitHub Action (promptguard/security-gate@v1) runs red team tests on every PR
  • Configurable minimum grade (A–F), regression detection, and PR comment reporting
  • Outputs: grade, score, bypasses found, and full JSON report

MCP Server Security

  • Validate Model Context Protocol (MCP) tool calls before execution
  • Server allow/block-listing, JSON Schema argument validation, and resource access policies
  • Tool injection detection for MCP-based agent architectures

Policy-as-Code (YAML)

  • Define guardrail configurations in YAML, version in git, apply via CLI
  • promptguard policy apply / diff / export commands
  • Validation, diffing, and idempotent application against live config
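
Diffing and idempotent application can be illustrated with plain dictionaries standing in for a parsed YAML policy and the live configuration. The function names and the flat key/value shape are assumptions; the actual policy schema is not shown here.

```python
from typing import Any, Dict, Tuple

Config = Dict[str, Any]
Changes = Dict[str, Tuple[Any, Any]]


def diff_config(live: Config, desired: Config) -> Changes:
    """Return {key: (before, after)} for every key that differs.

    Keys missing on one side are reported as None on that side.
    """
    changes: Changes = {}
    for key in sorted(set(live) | set(desired)):
        before, after = live.get(key), desired.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def apply_config(live: Config, desired: Config) -> Tuple[Config, Changes]:
    """Return the new live config plus the changes that were applied."""
    changes = diff_config(live, desired)
    return dict(desired), changes
```

Applying the same desired configuration a second time yields an empty change set, which is what makes the operation idempotent.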

Multimodal Guardrails

  • Image content safety via API delegation (Google Cloud Vision, Azure Content Safety)
  • OCR-based text extraction with PII detection on image content
  • Pluggable provider architecture for vision analysis

Security Groundedness Detection

  • Detects security-relevant fabrication in LLM responses
  • Identifies hallucinated CVEs, fake compliance claims, and invented security statistics
  • Pattern-based confidence scoring with configurable thresholds

Open Source AI Attack Dataset

  • Curated adversarial evaluation dataset with deterministic and LLM-powered mutations
  • 8 mutation categories: synonym substitution, character obfuscation, encoding, payload splitting, and more
  • HuggingFace-ready export for community benchmarking

Performance & Observability

  • PolicyEngine fast path with thread-safe LRU cache (TTL-based) for sub-50ms repeated evaluations
  • Per-detector profiling with timing instrumentation for performance analysis
  • OpenTelemetry metrics: counters for block/allow decisions, latency histograms, detector-level timing
  • Plugs into Datadog, Grafana, Honeycomb, and any OTEL-compatible backend
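
A thread-safe LRU cache with TTL expiry, of the kind mentioned for the PolicyEngine fast path, can be sketched as follows. The class name, default sizes, and injectable clock are choices made for this example and do not describe the actual implementation.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Small LRU cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        max_size: int = 1024,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self._lock = threading.Lock()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._data[key]  # expired: drop and report a miss
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = (self._clock() + self._ttl, value)
            self._data.move_to_end(key)
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict least recently used
```

A single lock around both operations keeps the ordering and expiry bookkeeping consistent when several request threads evaluate policies at once.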

SDK & CLI Updates

  • Python + Node.js SDKs: run_autonomous() and intelligence_stats() methods on RedTeam class
  • CLI: redteam --autonomous flag with --budget control
  • CLI: policy apply/diff/export subcommands for YAML-based config management

Late February 2026

Expanded Security Guardrails

  • Expanded from 7 to 10 security guardrails (since grown to 14 with v3.3.0): Prompt Injection, PII Detection, Data Exfiltration, Toxicity, Secret Key Detection, URL Filtering, Fraud Detection, Malware Detection, Jailbreak Detection (LLM), and Tool Injection
  • Jailbreak Detection (LLM): LLM-powered jailbreak detection catches sophisticated bypass attempts that evade traditional pattern matching, including multi-turn and encoded attacks
  • URL Filtering: Detect and block malicious, phishing, or unauthorized URLs in prompts and responses
  • Tool Injection Detection: Block attempts to inject malicious tool calls or manipulate agent tool usage through crafted prompts

Enhanced PII Detection

  • Expanded PII coverage from 14 to 39+ entity types across 10+ countries
  • Checksum validation for structured identifiers (credit cards, IBANs, tax IDs, national IDs)
  • Country-specific entity support including national health numbers, driving licenses, and passport formats
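
Checksum validation is what separates a real card number from an arbitrary 16-digit string. As one concrete example, the Luhn algorithm used for credit card numbers can be written as below; this is the standard public algorithm, shown as an illustration rather than as PromptGuard's detector code.

```python
def luhn_valid(number: str) -> bool:
    """Check a card-like number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Running a checksum after the pattern match reduces false positives, since most random digit sequences fail the check.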

Secret Key Detection with Entropy Analysis

  • Entropy-based analysis to detect high-randomness strings that are likely secrets
  • Provider-specific pattern matching for major API key formats (AWS, Stripe, GitHub, etc.)
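
Entropy analysis usually means computing the Shannon entropy of a candidate string and flagging long, high-randomness tokens. The sketch below shows the calculation; the `min_length` and `min_entropy` thresholds are illustrative assumptions, not PromptGuard's tuned values.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(
    token: str, min_length: int = 20, min_entropy: float = 4.0
) -> bool:
    """Flag long tokens whose characters look close to random."""
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy
```

Entropy alone cannot identify the provider, which is why it pairs well with the provider-specific patterns: the patterns catch known key formats, and entropy catches unfamiliar ones.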

Granular Guardrail Configuration Dashboard

  • Per-guardrail enable/disable and threshold configuration from the dashboard
  • Fine-tune sensitivity, actions (block/redact/log), and scope for each guardrail

Streaming Output Guardrails

  • Real-time guardrail enforcement on streaming responses from LLM providers
  • Scan and filter output tokens as they stream, blocking threats mid-response without breaking the stream
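
One way to scan a stream without waiting for the full response is to hold back a short tail of text so that a pattern split across two chunks is still caught. The sketch below assumes a regex-based check, a hypothetical `[blocked]` marker, and a `holdback` at least as long as the longest possible match; none of these details are taken from the product.

```python
import re
from typing import Iterable, Iterator, Pattern


def guard_stream(
    chunks: Iterable[str], pattern: Pattern[str], holdback: int = 32
) -> Iterator[str]:
    """Yield text as it streams, stopping if the pattern appears."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if pattern.search(buffer):
            # Threat found mid-response: emit a marker and end cleanly.
            yield "[blocked]"
            return
        if len(buffer) > holdback:
            # Release everything except the tail that could start a match.
            yield buffer[:-holdback]
            buffer = buffer[-holdback:]
    if buffer:
        yield buffer
```

The generator always terminates normally, so the consumer sees a well-formed stream whether or not a threat was found.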

SDK Improvements

  • Retry logic with configurable backoff for transient failures in Python and Node.js SDKs
  • Async Python client for high-throughput, non-blocking guardrail calls
  • Embeddings API support - guardrail protection for embedding model requests
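
Retry with configurable backoff generally follows the pattern below. This is a generic sketch, not the SDK's code: the function name, parameter names, defaults, and the set of retried exceptions are all assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call a function, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")
```

Only errors listed in `retry_on` are retried, so a non-transient failure such as a validation error surfaces immediately.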

Evaluation Framework

  • Benchmarking framework for measuring guardrail accuracy, latency, and false-positive rates
  • Pre-built test suites for prompt injection, PII detection, and jailbreak scenarios
  • Compare guardrail configurations side-by-side with detailed metrics

February 2026

GitHub Code Security Scanner

  • GitHub App integration for connecting repositories to PromptGuard
  • Automatic scanning of repositories for unprotected LLM SDK calls
  • AST-based detection for both Python (ast module) and JS/TS (tree-sitter) — zero false positives from comments, strings, or template literals
  • Auto-fix pull requests that add PromptGuard protection to detected LLM calls
  • CI checks on pull requests to flag new unprotected LLM usage
  • Consolidated UX: scan history, findings, and repository management all accessible from Settings > Integrations in a single expandable interface
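
The reason AST-based detection avoids matches in comments and strings is that those never become call nodes in the syntax tree. A minimal Python sketch with the standard `ast` module is shown below; the suffix list is a hypothetical stand-in for the scanner's real pattern set.

```python
import ast
from typing import List, Tuple

# Hypothetical patterns for illustration; the scanner's real list is not shown here.
LLM_CALL_SUFFIXES = ("chat.completions.create", "messages.create")


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as client.chat.completions.create."""
    parts: List[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def find_llm_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line, dotted name) for each call matching a known suffix."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name.endswith(LLM_CALL_SUFFIXES):
                findings.append((node.lineno, name))
    return sorted(findings)


SAMPLE = '''
# client.chat.completions.create(...) mentioned in a comment
note = "client.chat.completions.create()"
reply = client.chat.completions.create(model="m", messages=[])
'''
```

Calling `find_llm_calls(SAMPLE)` reports only the real call on the last line of the sample; the comment and the string literal are ignored.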

Organizations & Teams

  • Create team organizations with shared projects and billing
  • Role-based access control: Owner, Admin, Member, and Viewer roles
  • Invite members via email with configurable roles
  • Transfer ownership and manage invitations from Settings > Team
  • Full API support under /dashboard/organizations

Enterprise Tier

  • Self-hosted deployment — run PromptGuard on your own infrastructure
  • Zero-trust / air-gapped mode — fully offline operation with no external API calls
  • SSO (SAML / OIDC) support
  • Audit logs and IP allowlisting
  • Custom data retention and dedicated support with SLA
  • Enterprise comparison table on pricing page

SDK Auto-Instrumentation

  • Python SDK: promptguard.init() auto-patches OpenAI, Anthropic, Google, Cohere, and AWS Bedrock SDKs
  • Node.js SDK: init() auto-patches OpenAI, Anthropic, Google AI, Cohere, and AWS Bedrock SDKs
  • Works transparently with all frameworks (LangChain, CrewAI, LlamaIndex, Vercel AI SDK, AutoGen)
  • Enforce mode (block threats) and monitor mode (log only)
  • Fail-open by default with configurable fail-closed mode
  • Optional response scanning (scan_responses / scanResponses)

Guard API

  • New POST /api/v1/guard endpoint for standalone content scanning
  • Accepts messages array with direction (input/output), model, and context
  • Returns decision (allow/block/redact), confidence score, threat details, and optional redacted messages
  • Used internally by auto-instrumentation and available directly via GuardClient
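
For readers calling the endpoint without the SDK, a request could be assembled roughly as follows. The path and the `messages`, `direction`, `model`, and `context` fields are from the notes above; the exact JSON key names, the base URL, and the bearer-token header are assumptions, so check the Guard API reference before relying on them. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_guard_request(
    base_url: str,                      # assumed: your PromptGuard API base URL
    api_key: str,                       # assumed: sent as a bearer token
    messages: List[Dict[str, str]],
    direction: str = "input",
    model: Optional[str] = None,
    context: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/guard request."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    payload: Dict[str, Any] = {"messages": messages, "direction": direction}
    if model is not None:
        payload["model"] = model
    if context is not None:
        payload["context"] = context
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/guard",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`; per the notes above, the response carries a decision of allow, block, or redact along with a confidence score.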

Security Scan & Redact Endpoints

  • POST /api/v1/security/scan — analyze raw text for prompt injection and other threats
  • POST /api/v1/security/redact — strip PII from text with selective type filtering
  • Lightweight alternatives to the Guard API for pipelines and batch processing

Framework Integrations

  • LangChain.js callback handler (PromptGuardCallbackHandler)
  • Vercel AI SDK middleware (promptGuardMiddleware)
  • Python: Native support via auto-instrumentation for LangChain, CrewAI, LlamaIndex

Documentation Overhaul

  • Rewrote Python and Node.js SDK references to cover auto-instrumentation, GuardClient, and framework integrations
  • Added Enterprise pricing tier with feature comparison
  • Added Guard API, Security Scan, and Security Redact API reference pages
  • Added Organizations & Teams documentation
  • Regenerated OpenAPI spec (35 developer endpoints, 20 schemas)
  • Updated pricing to match current plans ($149/month Scale, Enterprise tier)

Code Quality

  • Tree-sitter AST parsing for JS/TS code scanning (replacing regex)
  • Shared detection manifest (sdk-patterns.json) as single source of truth for LLM SDK patterns
  • Removed 17 unused backend endpoint files and dead code

January 2026

Billing & Subscriptions

  • Plan change with proration support (upgrade/downgrade mid-cycle)
  • Usage-based billing alerts at 80% and 100% thresholds
  • Stripe integration with metered billing for Scale plan overage

Security Improvements

  • AI-powered threat detection with F1 = 0.887 and 99.1% precision
  • Enhanced PII detection patterns (SSN, credit card, API keys)
  • Red team test suite with 25+ adversarial test cases

December 2025

Initial Launch

  • PromptGuard API (OpenAI-compatible proxy)
  • Dashboard with project management and analytics
  • Free, Pro, and Scale subscription tiers
  • Regex-based threat detection
  • PII redaction (email, phone, SSN, credit card)
  • Rate limiting and usage tracking

For feature requests or bug reports, contact support@promptguard.co.