Overview

PromptGuard implements two types of limits to ensure fair usage and system stability:
  1. Monthly Request Quotas - Based on your subscription plan
  2. Rate Limiting - Maximum requests per minute (anti-abuse)

Monthly Request Quotas

Your subscription plan determines how many “fast” requests you get per month:
Plan    | Fast Requests/Month | Over-Quota Behavior
Free    | 1,000               | Unlimited slow requests
Starter | 100,000             | Unlimited slow requests
Growth  | 1,000,000           | Unlimited slow requests

What “Unlimited Slow Requests” Means

We never block your application, even when you exceed your monthly quota.
  • Within quota: Requests processed normally
  • Over quota: Requests still processed, just logged as “over quota”
  • No artificial delays: only natural backpressure when the system is under load
  • Cursor-inspired model: We prioritize keeping your app running over strict enforcement
Example:
# You're on Starter plan (100K/month)
# Usage: 105,000 requests this month

# Request still works:
curl https://api.promptguard.co/v1/proxy/chat/completions \
  -H "X-API-Key: your_api_key" \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Returns: 200 OK (not 429)
# Logged as "over quota" for billing analytics

Checking Your Usage

View current usage in the dashboard:
Dashboard → Usage → Current Period
- Requests Used: 105,234 / 100,000
- Status: Over Quota (5,234 overage)
- Next Reset: January 15, 2025
Or via API:
curl https://api.promptguard.co/v1/usage \
  -H "X-API-Key: your_api_key"

{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}

Rate Limiting (Anti-Abuse)

To prevent abuse, we enforce a global rate limit of 100 requests per minute on all plans. This is an anti-abuse measure, not a pricing feature; every tier gets the same limit.
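If your traffic can burst past this, you can throttle on the client side before the proxy ever returns a 429. A minimal sketch (the 100 req/min window mirrors the limit above; the class and method names are illustrative, not part of any SDK):
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side limiter: at most max_requests per window seconds."""

    def __init__(self, max_requests=100, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self.sent = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Evict timestamps that have aged out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Wait until the oldest request leaves the window
            time.sleep(self.window - (now - self.sent[0]))
        self.sent.append(time.monotonic())

limiter = SlidingWindowLimiter()
limiter.acquire()  # call before each request to the proxy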

Rate Limit Headers

Every API response includes rate limit info:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 85
X-RateLimit-Reset: 1673894400
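You can read these headers to throttle proactively instead of reacting to 429s. A minimal sketch using the requests library (header names as shown above; it assumes X-RateLimit-Reset is a Unix timestamp, as the sample value suggests):
import time
import requests

resp = requests.post(
    "https://api.promptguard.co/v1/proxy/chat/completions",
    headers={"X-API-Key": "your_api_key"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)

remaining = int(resp.headers["X-RateLimit-Remaining"])
reset_at = int(resp.headers["X-RateLimit-Reset"])  # assumed Unix timestamp

if remaining == 0:
    # Pause until the window resets instead of burning retries on 429s
    time.sleep(max(0, reset_at - time.time()))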

Handling Rate Limits

If you exceed 100 req/min, you’ll receive a 429 Too Many Requests response:
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded",
    "code": "too_many_requests"
  }
}
Recommended handling:
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def make_request_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: wait 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise

Best Practices

1. Implement Exponential Backoff

async function makeRequestWithBackoff(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: prompt }]
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        await new Promise(resolve =>
          setTimeout(resolve, Math.pow(2, i) * 1000)
        );
      } else {
        throw error;
      }
    }
  }
}

2. Monitor Usage Proactively

Set up monitoring to alert before you hit limits:
import requests

# Check usage before making a request (endpoint documented above)
usage = requests.get("https://api.promptguard.co/v1/usage",
                     headers={"X-API-Key": "your_api_key"}).json()
if usage["requests_used"] > usage["requests_limit"] * 0.9:
    send_alert("Approaching monthly quota limit")

3. Batch Requests When Possible

Instead of:
for prompt in prompts:
    response = client.chat.completions.create(...)  # 100 API calls
Use batch processing:
# Combine prompts where appropriate
combined_prompt = "\n".join(prompts)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": combined_prompt}],
)  # 1 API call

4. Cache Responses

Cache frequently requested results:
import hashlib
import json

import redis
from openai import OpenAI

cache = redis.Redis()
client = OpenAI()

def get_cached_response(prompt):
    cache_key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(cache_key)

    if cached:
        return json.loads(cached)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    data = response.model_dump()
    cache.setex(cache_key, 3600, json.dumps(data))  # 1 hour TTL
    return data

Upgrading for Higher Limits

Need more than 100 requests/minute? Contact us at [email protected] for:
  • Enterprise rate limits (custom req/min)
  • Dedicated infrastructure
  • SLA guarantees

Frequently Asked Questions

Why do all plans get the same rate limit?

The 100 req/min limit is an anti-abuse measure to protect infrastructure, not a pricing feature. Monthly quotas (1K vs 100K vs 1M) are how plans differ.

What happens if I consistently go over my monthly quota?

Nothing! We never block your app. However:
  • Overage is logged for analytics
  • You may receive emails suggesting an upgrade
  • Enterprise plans can set up overage billing

Can I increase my rate limit?

Yes. Contact [email protected] for custom rate limits on Enterprise plans.

Do retries count against my quota?

Yes. Every request to our API counts, including retries. Implement smart retry logic with exponential backoff to minimize wasted quota.

How is usage calculated?

One request = one API call to /v1/proxy/chat/completions or /v1/proxy/completions, regardless of:
  • Number of tokens
  • Response length
  • Model used

Monitoring Tools

Dashboard Analytics

Track usage in real-time:
  • Current period usage
  • Daily/weekly/monthly trends
  • Over-quota events
  • Rate limit hits

Usage API

Programmatically monitor usage:
curl https://api.promptguard.co/v1/usage/history?days=7 \
  -H "X-API-Key: your_api_key"
Returns:
{
  "daily_usage": [
    {"date": "2025-10-11", "requests": 5234},
    {"date": "2025-10-10", "requests": 4892},
    ...
  ],
  "total": 35789,
  "limit": 100000,
  "remaining": 64211
}
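For example, you can project month-end volume from the trailing week and alert early. A sketch against the history endpoint above (the 30-day billing-period length is an assumption):
import requests

history = requests.get(
    "https://api.promptguard.co/v1/usage/history?days=7",
    headers={"X-API-Key": "your_api_key"},
).json()

# Average daily volume over the trailing week
daily_avg = sum(day["requests"] for day in history["daily_usage"]) / len(history["daily_usage"])

projected_month = daily_avg * 30  # assumes a 30-day billing period
if projected_month > history["limit"]:
    print(f"On pace for ~{projected_month:,.0f} requests vs limit {history['limit']:,}")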

Need Help?

Questions about quotas, rate limits, or custom plans? Contact [email protected].
This page reflects the current implementation. Future enhancements (per-plan rate limits, token-based limits) will be documented here when available.