Overview
PromptGuard implements two types of limits to ensure fair usage and system stability:
- Monthly Request Quotas - Based on your subscription plan
- Rate Limiting - Maximum requests per minute (anti-abuse)
Monthly Request Quotas
Your subscription plan determines how many requests you get per month:
| Plan | Monthly Limit | Over-Quota Behavior |
|---|---|---|
| Free | 10,000 | Hard block (429 error when exceeded) |
| Pro | 100,000 | Hard block (429 error when exceeded) |
| Scale | 1,000,000 | Soft limit (alerts only, never blocks) |
| Enterprise | Custom (per contract) | Soft limit (never blocks, custom alerts) |
Hard vs Soft Limits
Free and Pro plans use hard limits:
- When you exceed your monthly quota, requests return 429 Too Many Requests
- To keep using the service before your quota resets, you must upgrade
- Free (10K) → Upgrade to Pro (100K)
- Pro (100K) → Upgrade to Scale (1M)
Scale plan uses soft limits:
- When you exceed 1M requests/month, requests continue processing
- You receive email alerts about overage
- No blocking - your application keeps running
- Overage is logged for analytics and billing
Example (Scale plan):
```bash
# You're on the Scale plan (1M/month)
# Usage: 1,050,000 requests this month
# The request still succeeds:
curl https://api.promptguard.co/api/v1/chat/completions \
  -H "X-API-Key: your_api_key" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5-nano", "messages": [...]}'
# Returns: 200 OK (not 429)
# Logged as "over quota" for billing analytics
```
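The hard/soft distinction boils down to one decision per request. The sketch below is an illustrative client-side model of the behavior described above, not PromptGuard's server code. `check_quota` and `QuotaDecision` are hypothetical names. The sketch assumes `requests_used` is the count already consumed this period. Enterprise is left out because its limit is set per contract.

```python
from dataclasses import dataclass

# Monthly limits and over-quota behavior from the plan table above.
PLANS = {
    "free": (10_000, "hard"),
    "pro": (100_000, "hard"),
    "scale": (1_000_000, "soft"),
}


@dataclass
class QuotaDecision:
    status_code: int   # 200 if the request proceeds, 429 if it is blocked
    over_quota: bool   # True once the monthly limit has been reached


def check_quota(plan: str, requests_used: int) -> QuotaDecision:
    """Model the documented behavior for the next request on `plan`."""
    limit, mode = PLANS[plan]
    over_quota = requests_used >= limit
    if over_quota and mode == "hard":
        # Free/Pro: hard block
        return QuotaDecision(429, True)
    # Scale: keep serving, but flag the request as over quota
    return QuotaDecision(200, over_quota)
```

For example, `check_quota("scale", 1_050_000)` mirrors the curl example: status 200, flagged as over quota.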
Checking Your Usage
View current usage in the dashboard:
Dashboard → Usage → Current Period
- Requests Used: 105,234 / 100,000
- Status: Over Quota (5,234 overage)
- Next Reset: January 15, 2025
Or via API:
```bash
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"
```

```json
{
  "requests_used": 105234,
  "requests_limit": 100000,
  "overage": 5234,
  "reset_at": "2025-01-15T00:00:00Z"
}
```
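You can call the same endpoint from Python with only the standard library. This is a minimal sketch. `fetch_usage` and `summarize_usage` are hypothetical helpers, and the sketch assumes the response carries the `requests_used`, `requests_limit`, and `overage` fields shown in the example above.

```python
import json
import urllib.request

USAGE_URL = "https://api.promptguard.co/api/v1/usage/stats"


def fetch_usage(api_key: str) -> dict:
    """GET the usage stats endpoint, authenticating with the X-API-Key header."""
    req = urllib.request.Request(USAGE_URL, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_usage(usage: dict) -> str:
    """Format a one-line summary from the fields in the example response."""
    used = usage["requests_used"]
    limit = usage["requests_limit"]
    pct = 100 * used / limit
    overage = usage.get("overage", 0)
    return f"{used:,} / {limit:,} requests ({pct:.1f}%), overage {overage:,}"
```

Applied to the example response, `summarize_usage` produces `105,234 / 100,000 requests (105.2%), overage 5,234`.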
Rate Limiting
PromptGuard enforces per-plan rate limits on all /api/v1/* endpoints:
| Plan | Rate Limit |
|---|---|
| Free | 60 requests/minute |
| Pro | 120 requests/minute |
| Scale | 300 requests/minute |
| Enterprise | Custom (configurable per organization) |
Additionally, Cloud Armor provides a global anti-abuse limit of 100 requests per minute per IP.
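A client that fans out many requests can pace itself so it never reaches these limits. Below is a minimal sliding-window limiter sketch. `SlidingWindowLimiter` is a hypothetical helper, and the `limit` you pass (for example 120 on Pro) is assumed to come from the table above. The limiter only tracks requests made by its own process, so it does not coordinate across machines.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a request fits within `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests inside the current window
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have aged out of the window
                while self._sent and now - self._sent[0] >= self.window:
                    self._sent.popleft()
                if len(self._sent) < self.limit:
                    self._sent.append(now)
                    return
                # Wait until the oldest request leaves the window
                wait = self.window - (now - self._sent[0])
            self._sleep(wait)
```

Usage: create one `SlidingWindowLimiter(limit=120)` per process and call `limiter.acquire()` before each API request.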
Every API response includes standard rate limit headers:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 85
X-RateLimit-Reset: 1708128060
```

| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests per minute for your plan |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
Enterprise organizations can request custom rate limits by contacting sales.
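The rate limit headers are easy to read on the client side. The sketch below converts them into numbers a client can act on. `parse_rate_limit` is a hypothetical helper. The sketch assumes the reset value is a Unix timestamp in seconds, as in the example above, and matches header names case-insensitively because HTTP header names are case-insensitive.

```python
import time
from typing import Mapping, Optional


def parse_rate_limit(headers: Mapping[str, str], now: Optional[float] = None) -> dict:
    """Read the X-RateLimit-* headers into limit, remaining, and seconds until reset."""
    lower = {k.lower(): v for k, v in headers.items()}
    now = time.time() if now is None else now
    reset_at = int(lower["x-ratelimit-reset"])  # Unix timestamp (seconds)
    return {
        "limit": int(lower["x-ratelimit-limit"]),
        "remaining": int(lower["x-ratelimit-remaining"]),
        # Clamp at zero in case the window has already reset
        "reset_in": max(0.0, float(reset_at - now)),
    }
```

With the example headers and a current time of 1708128000, this returns a limit of 120, 85 remaining, and 60 seconds until the window resets.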
Handling Rate Limits
If you exceed your plan’s per-minute limit (or the 100 req/min per-IP Cloud Armor limit), you’ll receive a 429 Too Many Requests response:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_exceeded",
    "code": "too_many_requests"
  }
}
```
Recommended handling:
```python
import time

from openai import OpenAI, RateLimitError

# Assumes the client is configured (API key, base URL, headers) to route through PromptGuard
client = OpenAI()


def make_request_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5-nano",
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff: wait 1s, then 2s, ...
                time.sleep(2 ** attempt)
            else:
                raise
```
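When many clients back off on the same schedule, they tend to retry at the same moment. A common refinement is to wait until the reported window reset when it is known. Otherwise, add random "full jitter" to the exponential delay. This is a swapped-in technique, not something PromptGuard prescribes. `retry_delay` is a hypothetical helper, and the sketch assumes the 429 response carries `X-RateLimit-Reset` like other responses.

```python
import random
import time
from typing import Callable, Optional


def retry_delay(attempt: int, reset_at: Optional[int] = None,
                now: Optional[float] = None, base: float = 1.0,
                cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    If X-RateLimit-Reset is known, wait until that timestamp (capped at `cap`).
    Otherwise use full jitter: a random delay in [0, min(cap, base * 2**attempt)).
    """
    if reset_at is not None:
        now = time.time() if now is None else now
        return max(0.0, min(cap, float(reset_at - now)))
    return rng() * min(cap, base * 2 ** attempt)
```

Pass the result to `time.sleep()` in place of the fixed `2 ** attempt` in the retry loop above.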
Idempotency Keys
For safe retries of POST/PUT/PATCH requests, include an Idempotency-Key header:
```bash
curl -X POST https://api.promptguard.co/api/v1/chat/completions \
  -H "X-API-Key: your_api_key" \
  -H "Idempotency-Key: unique-request-id-12345" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [...]}'
```
If you retry the same request with the same idempotency key within 24 hours, you’ll get back the cached response with an X-Idempotency-Replayed: true header. This prevents duplicate operations.
Idempotency keys are scoped to your API key and expire after 24 hours.
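The key only protects you if every retry of the same logical operation sends the same value. The safe pattern is to generate the key once, outside the retry loop. Below is a minimal standard-library sketch. `build_request` is a hypothetical helper, and using a UUID4 as the key is an assumption; any unique string should work.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.promptguard.co/api/v1/chat/completions"


def build_request(payload: dict, api_key: str, idempotency_key: str) -> urllib.request.Request:
    """Build a POST that PromptGuard can replay if it is retried with the same key."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Generate the key once per logical operation, not once per attempt,
# so every retry of this operation carries the same key.
key = str(uuid.uuid4())
payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}
req = build_request(payload, "your_api_key", key)
```

To detect a replayed response, check whether `resp.headers.get("X-Idempotency-Replayed") == "true"` on the response from `urllib.request.urlopen(req)`.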
Best Practices
1. Implement Exponential Backoff
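```javascript
import OpenAI from "openai";

// Assumes the client is configured to route requests through PromptGuard
const openai = new OpenAI();

async function makeRequestWithBackoff(prompt, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await openai.chat.completions.create({
        model: "gpt-5-nano",
        messages: [{ role: "user", content: prompt }],
      });
    } catch (error) {
      // Retry only on 429, waiting 1s, then 2s, ... between attempts
      if (error.status === 429 && i < maxRetries - 1) {
        await new Promise((resolve) => setTimeout(resolve, Math.pow(2, i) * 1000));
      } else {
        throw error;
      }
    }
  }
}
```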
2. Monitor Usage Proactively
Set up monitoring to alert before you hit limits:
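```python
# Check usage before making a request.
# client.get_usage() stands in for a call to GET /api/v1/usage/stats;
# send_alert() is your own alerting hook.
usage = client.get_usage()
if usage["requests_used"] > usage["requests_limit"] * 0.9:
    send_alert("Approaching monthly quota limit")
```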
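A single 90% check fires on every request once you cross the line. The sketch below instead alerts once per threshold. It reuses the 80%, 90%, and 100% thresholds mentioned in the FAQ. `QuotaAlerter` and `highest_threshold_reached` are hypothetical names, and `send_alert` is whatever callable you supply. State is kept in memory, so reset it (or create a new instance) when the billing period resets.

```python
from typing import Callable, Optional, Sequence


def highest_threshold_reached(used: int, limit: int,
                              thresholds_pct: Sequence[int] = (80, 90, 100)) -> Optional[int]:
    """Highest percentage threshold reached, using integer math to avoid float rounding."""
    reached = [p for p in thresholds_pct if used * 100 >= limit * p]
    return max(reached, default=None)


class QuotaAlerter:
    def __init__(self, send_alert: Callable[[str], None],
                 thresholds_pct: Sequence[int] = (80, 90, 100)):
        self._send = send_alert
        self._thresholds = thresholds_pct
        self._last_alerted: Optional[int] = None  # highest threshold already alerted

    def check(self, used: int, limit: int) -> None:
        level = highest_threshold_reached(used, limit, self._thresholds)
        # Alert only when usage moves into a higher threshold than before
        if level is not None and (self._last_alerted is None or level > self._last_alerted):
            self._last_alerted = level
            self._send(f"Usage at {level}% of monthly quota ({used:,}/{limit:,})")
```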
3. Batch Requests When Possible
Instead of:
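```python
# client: an openai.OpenAI() instance routed through PromptGuard
for prompt in prompts:
    response = client.chat.completions.create(...)  # one API call per prompt (100 prompts = 100 calls)
```
Use batch processing:
```python
# Combine prompts where appropriate
combined_prompt = "\n".join(prompts)
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": combined_prompt}],
)  # 1 API call
```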
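Joining every prompt into one message can overflow the model's context window. It also makes the answers harder to separate. A middle ground is to batch in fixed-size groups with numbered items. For example, 100 prompts in batches of 10 cost 10 requests instead of 100. `make_batches` is a hypothetical helper, and the instruction text is an assumption. You still need your own parsing to split each response back into per-item answers.

```python
from typing import List


def make_batches(prompts: List[str], batch_size: int) -> List[str]:
    """Group prompts into numbered combined prompts of at most batch_size items each."""
    batches = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        # Number each item so the answer can be matched back to its prompt
        numbered = [f"{i}. {p}" for i, p in enumerate(chunk, start=1)]
        header = "Answer each item separately, keeping its number:\n"
        batches.append(header + "\n".join(numbered))
    return batches
```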
4. Cache Responses
Cache frequently requested results:
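```python
import hashlib
import json

import redis
from openai import OpenAI

cache = redis.Redis()
client = OpenAI()  # assumes the client routes requests through PromptGuard


def get_cached_response(prompt, model="gpt-5-nano"):
    # Key on model + prompt so different models don't share cache entries
    cache_key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    data = response.model_dump()  # plain dict, so it can be JSON-serialized
    cache.setex(cache_key, 3600, json.dumps(data))  # 1 hour TTL
    return data
```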
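If you run a single process or want caching in tests without Redis, an in-memory TTL cache gives the same effect. Each cache hit is one request that never reaches the API. `TTLCache` is a hypothetical helper. Expired entries are only overwritten when the same prompt is requested again, not evicted proactively, so memory grows with the number of distinct prompts.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, prompt: str, compute: Callable[[str], Any]) -> Any:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no API request
        value = compute(prompt)      # cache miss: one API request
        self._store[key] = (now + self.ttl, value)
        return value
```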
Upgrading for Higher Limits
Need higher rate limits or custom quotas?
Enterprise plans offer:
- Custom rate limits per organization
- Custom monthly request quotas
- IP allowlisting for API access control
- Idempotency keys for safe retries
- Dedicated support and SLA guarantees
Contact us at sales@promptguard.co for Enterprise pricing.
Frequently Asked Questions
Why do different plans have different rate limits?
Rate limits scale with your plan tier (Free: 60/min, Pro: 120/min, Scale: 300/min, Enterprise: custom). The Cloud Armor limit of 100 req/min per IP is an additional anti-abuse layer.
What happens if I consistently go over my monthly quota?
For Free and Pro plans, requests are blocked with 429 errors once the quota is exceeded. For Scale and Enterprise plans, requests continue processing and are never blocked. You’ll receive email alerts at 80%, 90%, and 100% usage thresholds.
Can I increase my rate limit?
Yes. Enterprise plans support custom rate limits configured per organization. Contact sales@promptguard.co.
Do retries count against my quota?
Yes. Every request to our API counts, including retries. Implement smart retry logic with exponential backoff to minimize wasted quota.
How is usage calculated?
One request = one API call to /api/v1/chat/completions or /api/v1/completions, regardless of:
- Number of tokens
- Response length
- Model used
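Retries are billed but tokens are not, so request volume is the number to budget. The rough estimate below assumes a 30-day period and an average retry rate measured from your own logs. `estimated_monthly_requests` is a hypothetical helper. For example, 3,000 calls a day with 5 retries per 100 calls comes to 94,500 requests, which fits under Pro's 100,000. At 20 retries per 100 calls, the same traffic comes to 108,000 and would hit the hard limit.

```python
def estimated_monthly_requests(calls_per_day: int, retries_per_100_calls: int,
                               days: int = 30) -> int:
    """Each logical call is billed once, and every retry is billed again."""
    per_day = calls_per_day + calls_per_day * retries_per_100_calls / 100
    return round(per_day * days)
```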
Dashboard Analytics
Track usage in real-time:
- Current period usage
- Daily/weekly/monthly trends
- Over-quota events
- Rate limit hits
Usage API
Programmatically monitor usage:
```bash
curl https://api.promptguard.co/api/v1/usage/stats \
  -H "X-API-Key: your_api_key"
```
Returns:
```json
{
  "daily_usage": [
    {"date": "2025-10-11", "requests": 5234},
    {"date": "2025-10-10", "requests": 4892},
    ...
  ],
  "total": 35789,
  "limit": 100000,
  "remaining": 64211
}
```
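The `daily_usage` array is enough for a simple end-of-period projection, which warns you before you reach the limit. The sketch below uses the `daily_usage` and `total` fields from the example response. `project_period_total` is a hypothetical helper. `days_remaining` must come from your own knowledge of the reset date, because the response shown does not include one. A plain average is easily skewed by spikes.

```python
from typing import Dict, List


def project_period_total(daily_usage: List[Dict], total: int, days_remaining: int) -> int:
    """Project end-of-period requests from the average of the reported daily counts."""
    if not daily_usage:
        return total
    avg = sum(day["requests"] for day in daily_usage) / len(daily_usage)
    return round(total + avg * days_remaining)
```

With the two days shown (5,234 and 4,892 requests), a total of 35,789, and 10 days left, the projection is 86,419, under the 100,000 limit.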
Need Help?