
Overview

PromptGuard implements two types of limits to ensure fair usage and system stability:
  1. Monthly Request Quotas - Based on your subscription plan
  2. Rate Limiting - Maximum requests per minute (anti-abuse)

Monthly Request Quotas

Your subscription plan determines how many “fast” requests you get per month:
Plan     Monthly Limit   Over-Quota Behavior
Free     10,000          Hard block (429 error when exceeded)
Pro      100,000         Hard block (429 error when exceeded)
Scale    1,000,000       Soft limit (alerts only, never blocks)

Hard vs Soft Limits

Free and Pro plans use hard limits:
  • When you exceed your monthly quota, requests return 429 Too Many Requests
  • You must upgrade to continue using the service
  • Free (10K) → Upgrade to Pro (100K)
  • Pro (100K) → Upgrade to Scale (1M)
Scale plan uses soft limits:
  • When you exceed 1M requests/month, requests continue processing
  • You receive email alerts about overage
  • No blocking - your application keeps running
  • Overage is logged for analytics and billing
Example (Scale plan):
# You're on Scale plan (1M/month)
# Usage: 1,050,000 requests this month

# Request still works:
curl https://api.promptguard.co/api/v1/chat/completions \
  -H "X-API-Key: your_api_key" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "messages": [...]}'

# Returns: 200 OK (not 429)
# Logged as "over quota" for billing analytics
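The difference between hard and soft limits can be summarized in a few lines of Python. This is only an illustrative sketch built from the plan table above: the `PLANS` dictionary and the `expected_status` helper are hypothetical names, and the exact boundary (whether the request that reaches the limit is itself blocked) is an assumption.

```python
# Plan limits and over-quota behavior, copied from the plan table.
# PLANS and expected_status are illustrative names, not part of any PromptGuard SDK.
PLANS = {
    "free": {"monthly_limit": 10_000, "hard_limit": True},
    "pro": {"monthly_limit": 100_000, "hard_limit": True},
    "scale": {"monthly_limit": 1_000_000, "hard_limit": False},
}


def expected_status(plan: str, requests_used: int) -> int:
    """Status to expect for the next request, ignoring the per-minute rate limit."""
    config = PLANS[plan.lower()]
    # Assumption: once requests_used has reached the limit, the next request exceeds it.
    over_quota = requests_used >= config["monthly_limit"]
    if over_quota and config["hard_limit"]:
        return 429  # hard block on Free and Pro
    return 200  # within quota, or over quota on Scale (soft limit)


print(expected_status("scale", 1_050_000))  # 200
print(expected_status("pro", 100_000))      # 429
```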

Checking Your Usage

View current usage in the dashboard:
Dashboard → Usage → Current Period
- Requests Used: 105,234 / 100,000
- Status: Over Quota (5,234 overage)
- Next Reset: January 15, 2025
Or via API:
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"

{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}
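A response like the one above can be reduced to the few numbers an alerting job usually needs. The sketch below assumes the field names shown in the sample (`requests_used`, `requests_limit`); `summarize_usage` is a hypothetical helper, not a PromptGuard function.

```python
import json


def summarize_usage(stats: dict) -> dict:
    """Derive percent used, remaining requests, and an over-quota flag."""
    used = stats["requests_used"]
    limit = stats["requests_limit"]
    return {
        "percent_used": round(100 * used / limit, 1),
        "remaining": max(limit - used, 0),  # never report a negative remainder
        "over_quota": used > limit,
    }


# Sample payload copied from the usage response shown above.
sample = json.loads("""
{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}
""")

summary = summarize_usage(sample)
print(summary)
```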

Rate Limiting (Anti-Abuse)

To prevent system abuse, we enforce a global rate limit of 100 requests per minute on all plans. This is an anti-abuse measure, not a pricing feature: every tier gets the same limit.
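One way to stay under the per-minute limit is to throttle on the client before sending. The following is a minimal sketch of a sliding-window limiter. The class name and the `clock` parameter are illustrative, and whether the server counts requests in a sliding or a fixed window is not stated here, so treat the 60-second sliding window as an assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests per window_seconds."""

    def __init__(self, max_requests=100, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of requests inside the current window

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())


limiter = SlidingWindowLimiter()
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()  # then send the request
```

Setting `max_requests` slightly below 100 leaves headroom for other processes that share the same API key.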

Rate Limit Headers

API responses may include rate limit information:
HTTP/1.1 200 OK
X-RateLimit-Remaining: 85
Rate limit headers are added by the bot detection middleware when applicable. The exact headers may vary.
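Because the header is not guaranteed to be present, client code should read it defensively. A small sketch, assuming only the `X-RateLimit-Remaining` header shown above (`remaining_requests` is a hypothetical helper):

```python
from typing import Mapping, Optional


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Return X-RateLimit-Remaining as an int, or None if absent or malformed."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except ValueError:
                return None
    return None


print(remaining_requests({"X-RateLimit-Remaining": "85"}))       # 85
print(remaining_requests({"Content-Type": "application/json"}))  # None
```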

Handling Rate Limits

If you exceed 100 req/min, you’ll receive a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded",
    "code": "too_many_requests"
  }
}
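Both the monthly quota block and the per-minute rate limit return status 429, but only the latter is worth retrying after a short wait. The sketch below checks for the `type` value shown in the error body above. The body of a quota-exceeded response is not documented here, so treating every other 429 as non-retryable is an assumption, and `is_rate_limit_error` is a hypothetical helper.

```python
import json


def is_rate_limit_error(status_code: int, body: str) -> bool:
    """True if the response looks like the per-minute rate-limit error."""
    if status_code != 429:
        return False
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False
    return isinstance(error, dict) and error.get("type") == "rate_limit_exceeded"


body = '{"error": {"message": "Rate limit exceeded. Please try again later.", "type": "rate_limit_exceeded", "code": "too_many_requests"}}'
print(is_rate_limit_error(429, body))  # True
```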
Recommended handling:
import time

import openai
from openai import OpenAI

# Assumes the OpenAI Python SDK v1 or later; configure the client for PromptGuard per your setup.
client = OpenAI()

def make_request_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5-nano",
                messages=[{"role": "user", "content": prompt}],
            )
        except openai.RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: wait 1s, then 2s, before retrying
                time.sleep(2 ** attempt)
            else:
                raise

Best Practices

1. Implement Exponential Backoff

async function makeRequestWithBackoff(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.chat.completions.create({
        model: "gpt-5-nano",
        messages: [{ role: "user", content: prompt }]
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        await new Promise(resolve =>
          setTimeout(resolve, Math.pow(2, i) * 1000)
        );
      } else {
        throw error;
      }
    }
  }
}
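Plain exponential backoff makes every client that was throttled at the same moment retry at the same moment. A common refinement, not something this page prescribes, is to cap the delay and add random jitter. A minimal sketch in Python; `backoff_delay`, the 30-second cap, and the "full jitter" strategy are all illustrative choices:

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random.random) -> float:
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Example schedule for four attempts (values vary from run to run).
for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 2))
```

The returned value can be passed to `time.sleep` in place of the fixed `2 ** attempt` wait.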

2. Monitor Usage Proactively

Set up monitoring to alert before you hit limits:
# Check usage before making request
usage = client.get_usage()
if usage['requests_used'] > usage['requests_limit'] * 0.9:
    send_alert("Approaching monthly quota limit")

3. Batch Requests When Possible

Instead of:
# client = OpenAI(), using the OpenAI Python SDK v1 or later
for prompt in prompts:
    response = client.chat.completions.create(...)  # 100 API calls
Use batch processing:
# Combine prompts where appropriate
combined_prompt = "\n".join(prompts)
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": combined_prompt}]
)  # 1 API call
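Joining every prompt into a single request can overflow the model's context window or make the answers hard to separate. A middle ground is to combine prompts in fixed-size groups. The sketch below is one way to do that: `chunk_prompts` and the batch size of 10 are illustrative, and numbering the inputs is simply a convention to help map answers back to prompts.

```python
from typing import Iterator, List


def chunk_prompts(prompts: List[str], batch_size: int) -> Iterator[str]:
    """Yield combined prompts, each joining up to batch_size numbered inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # Number each input globally so answers can be matched to prompts.
        yield "\n".join(f"{start + i + 1}. {p}" for i, p in enumerate(batch))


prompts = [f"Summarize document {n}" for n in range(1, 101)]
combined = list(chunk_prompts(prompts, batch_size=10))
print(len(combined))  # 10 requests instead of 100
```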

4. Cache Responses

Cache frequently requested results:
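import hashlib
import json

import redis
from openai import OpenAI

cache = redis.Redis()
# Assumes the OpenAI Python SDK v1 or later; configure the client for PromptGuard per your setup.
client = OpenAI()

def get_cached_response(prompt):
    cache_key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(cache_key)

    if cached:
        return json.loads(cached)

    response = client.chat.completions.create(...)
    data = response.model_dump()  # plain dict, so cached and fresh results have the same shape
    cache.setex(cache_key, 3600, json.dumps(data))  # 1 hour TTL
    return data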
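Hashing only the prompt text means two requests with the same prompt but different models share one cache entry. A possible refinement is to derive the key from everything that affects the response. `cache_key` and the `chat:` prefix below are illustrative choices, not part of any PromptGuard API.

```python
import hashlib
import json


def cache_key(model: str, messages: list) -> str:
    """Stable key derived from the model name and the full message list."""
    # sort_keys makes the serialization independent of dict key order.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return "chat:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Hello"}]
print(cache_key("gpt-5-nano", messages))
```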

Upgrading for Higher Limits

Need more than 100 requests/minute? Contact us at [email protected] for:
  • Enterprise rate limits (custom req/min)
  • Dedicated infrastructure
  • SLA guarantees

Frequently Asked Questions

Why do all plans get the same rate limit?

The 100 req/min limit is an anti-abuse measure to protect infrastructure, not a pricing feature. Monthly quotas (10K vs 100K vs 1M) are how plans differ.

What happens if I consistently go over my monthly quota?

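It depends on your plan. On Free and Pro, requests beyond the monthly quota are blocked with 429 Too Many Requests until the quota resets or you upgrade. On Scale, we never block your app. However:
  • Overage is logged for analytics
  • You may receive emails suggesting an upgrade
  • Enterprise plans can set up overage billing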

Can I increase my rate limit?

Yes. Contact [email protected] for custom rate limits on Enterprise plans.

Do retries count against my quota?

Yes. Every request to our API counts, including retries. Implement smart retry logic with exponential backoff to minimize wasted quota.

How is usage calculated?

One request = one API call to /api/v1/chat/completions or /api/v1/completions, regardless of:
  • Number of tokens
  • Response length
  • Model used

Monitoring Tools

Dashboard Analytics

Track usage in real-time:
  • Current period usage
  • Daily/weekly/monthly trends
  • Over-quota events
  • Rate limit hits

Usage API

Programmatically monitor usage:
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"
Returns:
{
  "daily_usage": [
    {"date": "2025-10-11", "requests": 5234},
    {"date": "2025-10-10", "requests": 4892},
    ...
  ],
  "total": 35789,
  "limit": 100000,
  "remaining": 64211
}
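The daily figures make it possible to estimate how long the remaining quota will last at the current pace. A rough sketch, assuming the `daily_usage` and `remaining` fields shown above. `days_until_exhausted` is a hypothetical helper, and the sample uses only the two days visible in the response, since the rest are elided.

```python
def days_until_exhausted(stats: dict) -> float:
    """Estimate the days of quota left at the average daily request rate."""
    daily = [day["requests"] for day in stats["daily_usage"]]
    if not daily or sum(daily) == 0:
        return float("inf")  # no recorded traffic, so no projected exhaustion
    average = sum(daily) / len(daily)
    return stats["remaining"] / average


sample = {
    "daily_usage": [
        {"date": "2025-10-11", "requests": 5234},
        {"date": "2025-10-10", "requests": 4892},
    ],
    "remaining": 64211,
}

print(round(days_until_exhausted(sample), 1))  # 12.7
```

Comparing this estimate with the number of days left before the reset date shows whether the quota is likely to run out early.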

Need Help?