
Overview

PromptGuard implements two types of limits to ensure fair usage and system stability:
  1. Monthly Request Quotas - Based on your subscription plan
  2. Rate Limiting - Maximum requests per minute (anti-abuse)

Monthly Request Quotas

Your subscription plan determines how many requests you get per month:
| Plan | Monthly Limit | Over-Quota Behavior |
|---|---|---|
| Free | 10,000 | Hard block (429 error when exceeded) |
| Pro | 100,000 | Hard block (429 error when exceeded) |
| Scale | 1,000,000 | Soft limit (alerts only, never blocks) |
| Enterprise | Custom (per contract) | Soft limit (never blocks, custom alerts) |

Hard vs Soft Limits

Free and Pro plans use hard limits:
  • When you exceed your monthly quota, requests return 429 Too Many Requests
  • You must upgrade to continue using the service
  • Free (10K) → Upgrade to Pro (100K)
  • Pro (100K) → Upgrade to Scale (1M)
Scale plan uses soft limits:
  • When you exceed 1M requests/month, requests continue processing
  • You receive email alerts about overage
  • No blocking - your application keeps running
  • Overage is logged for analytics and billing
Example (Scale plan):
# You're on Scale plan (1M/month)
# Usage: 1,050,000 requests this month

# Request still works:
curl https://api.promptguard.co/api/v1/chat/completions \
  -H "X-API-Key: your_api_key" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "messages": [...]}'

# Returns: 200 OK (not 429)
# Logged as "over quota" for billing analytics
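The over-quota behavior in the plan table can be modeled roughly as follows. This is a minimal sketch for reasoning about your own client, not PromptGuard's server-side logic; the function name `quota_status` and the plan keys are hypothetical, and Enterprise is omitted because its limit is set per contract.

```python
# Monthly limits and over-quota behavior from the plan table.
# Hypothetical client-side model; PromptGuard enforces the real limits server-side.
PLAN_QUOTAS = {
    "free": (10_000, "hard"),
    "pro": (100_000, "hard"),
    "scale": (1_000_000, "soft"),
}


def quota_status(plan: str, requests_used: int) -> tuple[int, bool]:
    """Return (expected HTTP status, over_quota flag) for the next request.

    requests_used is the number of requests already made this period.
    """
    limit, mode = PLAN_QUOTAS[plan]
    over_quota = requests_used >= limit  # the next request would exceed the quota
    if over_quota and mode == "hard":
        return 429, True  # Free/Pro: request is blocked
    return 200, over_quota  # Scale: keeps working, flagged as overage
```

The Scale case reproduces the 1,050,000-request example above: the call still returns 200 and is flagged as over quota.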

Checking Your Usage

View current usage in the dashboard:
Dashboard → Usage → Current Period
- Requests Used: 105,234 / 100,000
- Status: Over Quota (5,234 overage)
- Next Reset: January 15, 2025
Or via API:
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"

{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}
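The dashboard's status line can be reproduced from the fields this endpoint returns. A minimal sketch, assuming the response shape shown above; `usage_status` is a hypothetical helper, and the wording of the within-quota case is an assumption rather than the dashboard's exact text.

```python
def usage_status(stats: dict) -> str:
    """Summarize a /api/v1/usage/stats response the way the dashboard does."""
    used = stats["requests_used"]
    limit = stats["requests_limit"]
    overage = max(0, used - limit)  # should agree with the API's "overage" field
    if overage:
        return f"Over Quota ({overage:,} overage)"
    return f"Within Quota ({used / limit:.0%} used)"
```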

Rate Limiting

PromptGuard enforces per-plan rate limits on all /api/v1/* endpoints:
| Plan | Rate Limit |
|---|---|
| Free | 60 requests/minute |
| Pro | 120 requests/minute |
| Scale | 300 requests/minute |
| Enterprise | Custom (configurable per organization) |
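Additionally, Cloud Armor enforces a global anti-abuse limit of 100 requests per minute per IP address. Both limits apply, so a single IP cannot exceed 100 requests per minute even on plans whose per-plan limit is higher.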
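Because the per-plan and per-IP limits both apply, one way to avoid 429s is to throttle on the client at the lower of the two. A minimal sketch of a sliding-window throttle, assuming a single-threaded client; `SlidingWindowLimiter` and `effective_limit` are hypothetical names, and the clock is injectable so the logic can be tested without sleeping.

```python
import collections
import time
from typing import Callable

CLOUD_ARMOR_PER_IP = 100  # requests/minute per IP


def effective_limit(plan_limit: int) -> int:
    """Per-IP ceiling: the lower of the plan limit and the Cloud Armor limit."""
    return min(plan_limit, CLOUD_ARMOR_PER_IP)


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = collections.deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0 if none)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self.sent.append(self.clock())
```

In use, call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after it. A rolling window never allows more than `limit` requests in any 60-second span, so it is at least as strict as a fixed per-minute window and errs on the side of staying under the limit.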

Rate Limit Headers

Every API response includes standard rate limit headers:
HTTP/1.1 200 OK
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 85
X-RateLimit-Reset: 1708128060
| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests per minute for your plan |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
Enterprise organizations can request custom rate limits by contacting sales.
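Since `X-RateLimit-Reset` is an absolute Unix timestamp (in seconds, as in the example above), the time to wait is its difference from the current time. A minimal sketch; `seconds_until_reset` and `should_pause` are hypothetical helpers, and lookups here are case-sensitive (HTTP clients such as `requests` expose case-insensitive header mappings).

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until the rate-limit window resets, based on X-RateLimit-Reset."""
    reset_at = int(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))  # never negative once past reset


def should_pause(headers: Mapping[str, str]) -> bool:
    """True when no requests remain in the current window."""
    return int(headers["X-RateLimit-Remaining"]) <= 0
```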

Handling Rate Limits

If you exceed your plan’s per-minute limit (or the 100 requests/minute per-IP limit), you’ll receive a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded",
    "code": "too_many_requests"
  }
}
Recommended handling:
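import os
import time

from openai import OpenAI, RateLimitError

# Point the OpenAI SDK (v1+) at PromptGuard; the base URL and X-API-Key header
# mirror the curl examples. Environment variable names are placeholders.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.promptguard.co/api/v1",
    default_headers={"X-API-Key": os.environ["PROMPTGUARD_API_KEY"]},
    max_retries=0,  # disable the SDK's built-in retries; the loop below owns them
)

def make_request_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5-nano",
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise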
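The loop above doubles a fixed delay on each attempt. One common refinement, offered as an assumption rather than documented PromptGuard guidance, is "full jitter": pick a random delay up to the exponential ceiling so clients throttled at the same moment do not retry in lockstep, and cap the ceiling. `backoff_delay` is a hypothetical helper; the `rng` parameter exists only to make it testable.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    if rng is None:
        rng = random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

To use it, replace `time.sleep(2 ** attempt)` with `time.sleep(backoff_delay(attempt))`.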

Idempotency Keys

For safe retries of POST/PUT/PATCH requests, include an Idempotency-Key header:
curl -X POST https://api.promptguard.co/api/v1/chat/completions \
  -H "X-API-Key: your_api_key" \
  -H "Idempotency-Key: unique-request-id-12345" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [...]}'
If you retry the same request with the same idempotency key within 24 hours, you’ll get back the cached response with an X-Idempotency-Replayed: true header. This prevents duplicate operations.
Idempotency keys are scoped to your API key and expire after 24 hours.
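An idempotency key only helps if every retry of the same logical request reuses it. A minimal sketch, assuming transport failures surface as the built-in `ConnectionError` or `TimeoutError`; `post_idempotently` is hypothetical, and `send` is a placeholder hook for your own HTTP call that receives the extra headers to include.

```python
import time
import uuid
from typing import Callable, Dict


def post_idempotently(send: Callable[[Dict[str, str]], dict],
                      max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry a POST on transport errors, reusing one Idempotency-Key."""
    if max_retries < 1:
        raise ValueError("max_retries must be >= 1")
    # Generate the key once per logical request, never per attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_retries):
        try:
            return send(headers)
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                raise
            sleep(2 ** attempt)  # back off, then resend with the SAME key
```

If `send` wraps `requests.post`, translate `requests.exceptions.ConnectionError` into the built-in `ConnectionError`, since the former does not subclass the latter. A response carrying `X-Idempotency-Replayed: true` means the server returned the cached result instead of repeating the operation.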

Best Practices

1. Implement Exponential Backoff

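import OpenAI from "openai";

// Route requests through PromptGuard (endpoint and X-API-Key header mirror the curl examples)
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.promptguard.co/api/v1",
  defaultHeaders: { "X-API-Key": process.env.PROMPTGUARD_API_KEY },
  maxRetries: 0, // disable the SDK's built-in retries; the loop below owns them
});

async function makeRequestWithBackoff(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.chat.completions.create({
        model: "gpt-5-nano",
        messages: [{ role: "user", content: prompt }],
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise((resolve) =>
          setTimeout(resolve, Math.pow(2, i) * 1000)
        );
      } else {
        throw error;
      }
    }
  }
}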

2. Monitor Usage Proactively

Set up monitoring to alert before you hit limits:
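import os
import requests
# Check usage before making requests (field names as in the usage response shown earlier)
usage = requests.get("https://api.promptguard.co/api/v1/usage/stats",
                     headers={"X-API-Key": os.environ["PROMPTGUARD_API_KEY"]},
                     timeout=10).json()
if usage["requests_used"] > usage["requests_limit"] * 0.9:
    send_alert("Approaching monthly quota limit")  # send_alert: your own alerting hook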
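The FAQ on this page mentions email alerts at 80%, 90%, and 100% of quota. To mirror those thresholds in your own monitoring and fire each alert only once, you could track which ones have been crossed. A minimal sketch; `crossed_thresholds` and `new_alerts` are hypothetical helpers, and integer percentages avoid floating-point edge cases.

```python
from typing import Iterable, List


def crossed_thresholds(requests_used: int, requests_limit: int,
                       percents: Iterable[int] = (80, 90, 100)) -> List[int]:
    """Return the usage thresholds (percent of the limit) already reached."""
    return [p for p in percents if requests_used * 100 >= p * requests_limit]


def new_alerts(previous: List[int], current: List[int]) -> List[int]:
    """Thresholds crossed since the last check, so each alert fires once."""
    return [p for p in current if p not in previous]
```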

3. Batch Requests When Possible

Instead of:
for prompt in prompts:
    response = client.chat.completions.create(...)  # one API call per prompt (100 prompts = 100 calls)
Use batch processing:
# Combine prompts where appropriate (client: an openai.OpenAI instance)
combined_prompt = "\n".join(prompts)
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": combined_prompt}],
)  # 1 API call
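Combining every prompt into a single call only works while the result fits the model's context and the answers stay distinguishable. A middle ground, offered as an assumption rather than a PromptGuard feature, is to group prompts into fixed-size, numbered batches; `batch_prompts` is a hypothetical helper and the instruction text is illustrative.

```python
from typing import List


def batch_prompts(prompts: List[str], batch_size: int) -> List[str]:
    """Group prompts into combined, numbered prompts of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batches = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        # Number each item so the model can answer them separately.
        lines = [f"{i}. {p}" for i, p in enumerate(chunk, start=1)]
        batches.append("Answer each item separately:\n" + "\n".join(lines))
    return batches
```

With 100 prompts and a batch size of 10, this makes 10 API calls (10 requests against your quota) instead of 100.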

4. Cache Responses

Cache frequently requested results:
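import hashlib
import json

import redis

cache = redis.Redis()

def get_cached_response(prompt):
    # Identical prompts hash to the same key and share one cache entry
    cache_key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(cache_key)

    if cached is not None:
        return json.loads(cached)

    # client: an openai.OpenAI instance configured for PromptGuard
    response = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.model_dump()  # plain dict, safe to serialize as JSON
    cache.setex(cache_key, 3600, json.dumps(result))  # 1 hour TTL
    return result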
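Hashing only the prompt text means requests for different models, or with different system messages, would share a cache entry. A sketch of a key derived from the full request instead, assuming the same `model`/`messages` shape used elsewhere on this page; `request_cache_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Dict, List


def request_cache_key(model: str, messages: List[Dict[str, str]]) -> str:
    """Stable cache key for a chat request: SHA-256 of its canonical JSON form."""
    canonical = json.dumps({"model": model, "messages": messages},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Cache hits never reach the API, so they consume neither monthly quota nor per-minute rate limit.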

Upgrading for Higher Limits

Need higher rate limits or custom quotas? Enterprise plans offer:
  • Custom rate limits per organization
  • Custom monthly request quotas
  • IP allowlisting for API access control
  • Idempotency keys for safe retries
  • Dedicated support and SLA guarantees
Contact us at sales@promptguard.co for Enterprise pricing.

Frequently Asked Questions

Why do different plans have different rate limits?

Rate limits scale with your plan tier (Free: 60/min, Pro: 120/min, Scale: 300/min, Enterprise: custom). The Cloud Armor limit of 100 req/min per IP is an additional anti-abuse layer.

What happens if I consistently go over my monthly quota?

For Free and Pro plans, requests over the monthly quota are blocked with 429 errors. For Scale and Enterprise plans, requests continue processing and are never blocked; overage is logged for analytics and billing. You’ll receive email alerts at 80%, 90%, and 100% usage thresholds.

Can I increase my rate limit?

Yes. Enterprise plans support custom rate limits configured per organization. Contact sales@promptguard.co.

Do retries count against my quota?

Yes. Every request to our API counts, including retries. Implement smart retry logic with exponential backoff to minimize wasted quota.

How is usage calculated?

One request = one API call to /api/v1/chat/completions or /api/v1/completions, regardless of:
  • Number of tokens
  • Response length
  • Model used

Monitoring Tools

Dashboard Analytics

Track usage in real time:
  • Current period usage
  • Daily/weekly/monthly trends
  • Over-quota events
  • Rate limit hits

Usage API

Programmatically monitor usage:
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"
Returns:
{
  "daily_usage": [
    {"date": "2025-10-11", "requests": 5234},
    {"date": "2025-10-10", "requests": 4892},
    ...
  ],
  "total": 35789,
  "limit": 100000,
  "remaining": 64211
}
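The daily breakdown makes it possible to estimate when the remaining quota will run out. A rough linear projection, assuming recent days are representative of the rest of the period; `days_until_exhausted` is a hypothetical helper, and the field names follow the response above.

```python
from typing import Optional


def days_until_exhausted(stats: dict, window: int = 7) -> Optional[float]:
    """Estimate days until `remaining` hits zero at the recent average daily rate.

    Averages the first `window` entries of daily_usage (newest first, as in
    the example response). Returns None if there is no usage to average.
    """
    recent = [day["requests"] for day in stats["daily_usage"][:window]]
    if not recent or sum(recent) == 0:
        return None
    avg_per_day = sum(recent) / len(recent)
    return stats["remaining"] / avg_per_day
```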

Need Help?