Overview

PromptGuard implements two types of limits to ensure fair usage and system stability:
  1. Monthly Request Quotas - Based on your subscription plan
  2. Rate Limiting - Maximum requests per minute (anti-abuse)

Monthly Request Quotas

Your subscription plan determines how many “fast” requests you get per month:
Plan    | Fast Requests/Month | Over-Quota Behavior
Free    | 1,000               | Unlimited slow requests
Starter | 100,000             | Unlimited slow requests
Growth  | 1,000,000           | Unlimited slow requests

What “Unlimited Slow Requests” Means

We never block your application, even when you exceed your monthly quota.
  • Within quota: Requests processed normally
  • Over quota: Requests still processed, just logged as “over quota”
  • No artificial delays: only natural backpressure when the system is under load
  • Cursor-inspired model: We prioritize keeping your app running over strict enforcement
Example:
# You're on Starter plan (100K/month)
# Usage: 105,000 requests this month

# Request still works:
curl https://api.promptguard.co/v1/proxy/chat/completions \
  -H "X-API-Key: your_api_key" \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Returns: 200 OK (not 429)
# Logged as "over quota" for billing analytics

Checking Your Usage

View current usage in the dashboard:
Dashboard → Usage → Current Period
- Requests Used: 105,234 / 100,000
- Status: Over Quota (5,234 overage)
- Next Reset: January 15, 2025
Or via API:
curl https://api.promptguard.co/v1/usage \
  -H "X-API-Key: your_api_key"

{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}

Rate Limiting (Anti-Abuse)

To prevent abuse, we enforce a global rate limit of 100 requests per minute on all plans. This is an anti-abuse measure, not a pricing feature; every tier gets the same limit.
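If your traffic can burst past this, you can throttle on the client side before the proxy ever returns a 429. A minimal sketch (the 100 req/min window mirrors the limit above; the class and method names are illustrative, not part of any SDK):
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side limiter: at most max_requests per window seconds."""

    def __init__(self, max_requests=100, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.sent = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Evict timestamps that have aged out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Wait until the oldest request leaves the window
            time.sleep(self.window - (now - self.sent[0]))
        self.sent.append(time.monotonic())

limiter = SlidingWindowLimiter()
limiter.acquire()  # call before each request to the proxy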

Rate Limit Headers

Every API response includes rate limit info:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 85
X-RateLimit-Reset: 1673894400
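You can read these headers to throttle proactively instead of reacting to 429s. A minimal sketch using the requests library (header names as shown above; it assumes X-RateLimit-Reset is a Unix timestamp, as the sample value suggests):
import time
import requests

resp = requests.post(
    "https://api.promptguard.co/v1/proxy/chat/completions",
    headers={"X-API-Key": "your_api_key"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)

remaining = int(resp.headers["X-RateLimit-Remaining"])
reset_at = int(resp.headers["X-RateLimit-Reset"])  # assumed Unix timestamp

if remaining == 0:
    # Pause until the window resets instead of burning retries on 429s
    time.sleep(max(0, reset_at - time.time()))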

Handling Rate Limits

If you exceed 100 req/min, you’ll receive a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded",
    "code": "too_many_requests"
  }
}
Recommended handling:
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def make_request_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: wait 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise

Best Practices

1. Implement Exponential Backoff

async function makeRequestWithBackoff(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: prompt }]
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        await new Promise(resolve =>
          setTimeout(resolve, Math.pow(2, i) * 1000)
        );
      } else {
        throw error;
      }
    }
  }
}

2. Monitor Usage Proactively

Set up monitoring to alert before you hit limits:
import requests

# Check usage before making a request (endpoint documented above)
usage = requests.get("https://api.promptguard.co/v1/usage",
                     headers={"X-API-Key": "your_api_key"}).json()
if usage["requests_used"] > usage["requests_limit"] * 0.9:
    send_alert("Approaching monthly quota limit")

3. Batch Requests When Possible

Instead of:
for prompt in prompts:
    response = client.chat.completions.create(...)  # 100 API calls
Use batch processing:
# Combine prompts where appropriate
combined_prompt = "\n".join(prompts)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": combined_prompt}],
)  # 1 API call

4. Cache Responses

Cache frequently requested results:
import hashlib
import json

import redis
from openai import OpenAI

cache = redis.Redis()
client = OpenAI()

def get_cached_response(prompt):
    cache_key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(cache_key)

    if cached:
        return json.loads(cached)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    data = response.model_dump()
    cache.setex(cache_key, 3600, json.dumps(data))  # 1 hour TTL
    return data

Upgrading for Higher Limits

Need more than 100 requests/minute? Contact us at [email protected] for:
  • Enterprise rate limits (custom req/min)
  • Dedicated infrastructure
  • SLA guarantees

Frequently Asked Questions

Why do all plans get the same rate limit?

The 100 req/min limit is an anti-abuse measure to protect infrastructure, not a pricing feature. Monthly quotas (1K vs 100K vs 1M) are how plans differ.

What happens if I consistently go over my monthly quota?

Nothing! We never block your app. However:
  • Overage is logged for analytics
  • You may receive emails suggesting an upgrade
  • Enterprise plans can set up overage billing

Can I increase my rate limit?

Yes. Contact [email protected] for custom rate limits on Enterprise plans.

Do retries count against my quota?

Yes. Every request to our API counts, including retries. Implement smart retry logic with exponential backoff to minimize wasted quota.

How is usage calculated?

One request = one API call to /v1/proxy/chat/completions or /v1/proxy/completions, regardless of:
  • Number of tokens
  • Response length
  • Model used

Monitoring Tools

Dashboard Analytics

Track usage in real-time:
  • Current period usage
  • Daily/weekly/monthly trends
  • Over-quota events
  • Rate limit hits

Usage API

Programmatically monitor usage:
curl https://api.promptguard.co/v1/usage/history?days=7 \
  -H "X-API-Key: your_api_key"
Returns:
{
  "daily_usage": [
    {"date": "2025-10-11", "requests": 5234},
    {"date": "2025-10-10", "requests": 4892},
    ...
  ],
  "total": 35789,
  "limit": 100000,
  "remaining": 64211
}
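For example, you can project month-end volume from the trailing week and alert early. A sketch against the history endpoint above (the 30-day billing-period length is an assumption):
import requests

history = requests.get(
    "https://api.promptguard.co/v1/usage/history?days=7",
    headers={"X-API-Key": "your_api_key"},
).json()

# Average daily volume over the trailing week
daily_avg = sum(day["requests"] for day in history["daily_usage"]) / len(history["daily_usage"])

projected_month = daily_avg * 30  # assumes a 30-day billing period
if projected_month > history["limit"]:
    print(f"On pace for ~{projected_month:,.0f} requests vs limit {history['limit']:,}")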

Need Help?

Questions about quotas, rate limits, or custom plans? Contact [email protected].
This page reflects the current implementation. Future enhancements (per-plan rate limits, token-based limits) will be documented here when available.