Rate Limiting

The CLE Engine API uses rate limiting to ensure fair usage and service stability. This guide explains how rate limits work and how to handle them.

Overview

Rate limiting is applied at two levels:

Monthly Quota: Total requests allowed per billing period
Per-Minute Rate: Maximum requests per minute to prevent abuse

Pricing Tiers

Tier	Monthly Quota	Rate Limit	Use Case
Free	100 requests	10/minute	Testing and evaluation
Starter	1,000 requests	60/minute	Small law firms, solo practitioners
Professional	10,000 requests	120/minute	Mid-size firms, CLE providers
Enterprise	Unlimited	300/minute	Large organizations, custom integrations
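
To get a feel for how the two limits interact, the sketch below computes how long each tier's monthly quota would last under sustained traffic at the full per-minute rate. The numbers come from the table above; the `TIERS` dictionary and the `minutes_to_exhaust` helper are illustrative names, not part of the API.

```python
# Tier values copied from the pricing table; the dict layout is illustrative.
TIERS = {
    "free": {"monthly_quota": 100, "per_minute": 10},
    "starter": {"monthly_quota": 1_000, "per_minute": 60},
    "professional": {"monthly_quota": 10_000, "per_minute": 120},
    "enterprise": {"monthly_quota": None, "per_minute": 300},  # None = unlimited
}


def minutes_to_exhaust(tier):
    """Minutes of sustained max-rate traffic before the monthly quota runs out."""
    limits = TIERS[tier]
    if limits["monthly_quota"] is None:
        return None  # An unlimited quota never runs out
    return limits["monthly_quota"] / limits["per_minute"]


for name in TIERS:
    print(name, minutes_to_exhaust(name))
```

On every metered tier the monthly quota, not the per-minute rate, is the tighter constraint for sustained workloads: a Free-tier key running at 10 requests per minute would use its whole month in 10 minutes.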

How It Works

Monthly Quota

Quota resets on the first day of each month at 00:00 UTC
Each API call that reaches our servers counts as one request toward your quota
This includes failed requests (4xx/5xx), not only successful ones
Unused quota does not roll over to the next month
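
The reset rule above (first day of each month at 00:00 UTC) can be computed locally, for example to schedule jobs right after the quota renews. This is a minimal sketch; `next_quota_reset` is a hypothetical helper name and it expects a timezone-aware datetime.

```python
from datetime import datetime, timezone


def next_quota_reset(now):
    """Return the next first-of-month 00:00 UTC after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1  # December rolls over to January
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


print(next_quota_reset(datetime.now(timezone.utc)).isoformat())
```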

Per-Minute Rate Limiting

Measured using a sliding window algorithm
Requests beyond the limit return 429 Too Many Requests
After hitting the rate limit, wait at least 1 minute before retrying
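
To avoid hitting the per-minute limit in the first place, a client can keep its own sliding window of recent request times. The sketch below is a client-side approximation only: the server's exact window implementation is not documented here, and `SlidingWindowLimiter` is a hypothetical class name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side sliding window: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # Timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Window is full: wait until the oldest request ages out
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call this right after sending a request."""
        self._sent.append(self.clock())
```

Before each call, sleep for `limiter.wait_time()` seconds, send the request, then call `limiter.record()`. Set `max_requests` to your tier's per-minute limit, or slightly below it to leave headroom.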

Response Headers

All API responses include headers that report your monthly quota status:

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 766
X-RateLimit-Reset: 1738368000

Header	Description
`X-RateLimit-Limit`	Your tier's monthly quota
`X-RateLimit-Remaining`	Requests remaining this month
`X-RateLimit-Reset`	Unix timestamp when quota resets
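
As an illustration, the three headers can be turned into typed values in one step. This sketch assumes the header values are integers, as in the sample response; `parse_rate_limit_headers` is a hypothetical helper, and the Enterprise tier's unlimited quota may be reported differently.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Convert the X-RateLimit-* headers into typed values."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # The reset header is a Unix timestamp in seconds
    reset_at = datetime.fromtimestamp(
        int(headers["X-RateLimit-Reset"]), tz=timezone.utc
    )
    return {
        "limit": limit,
        "remaining": remaining,
        "used": limit - remaining,
        "reset_at": reset_at,
    }


sample = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "766",
    "X-RateLimit-Reset": "1738368000",
}
info = parse_rate_limit_headers(sample)
print(info["used"], info["reset_at"].isoformat())  # 234 2025-02-01T00:00:00+00:00
```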

Quota Exceeded Response

When you exceed your monthly quota:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "detail": "quota exceeded"
}
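
Because per-minute throttling also returns 429, it can help to tell the two cases apart: retrying after a short wait helps with throttling but not with an exhausted quota. The sketch below checks for the `detail` value shown above. It assumes that per-minute 429 responses carry a different body, which this guide does not confirm; `is_quota_exceeded` is a hypothetical helper.

```python
def is_quota_exceeded(status_code, body):
    """True if a response looks like the monthly-quota 429 shown above.

    Assumption: per-minute rate limit responses use a different "detail".
    """
    return (
        status_code == 429
        and isinstance(body, dict)
        and body.get("detail") == "quota exceeded"
    )


print(is_quota_exceeded(429, {"detail": "quota exceeded"}))  # True
```

If this returns True, stop retrying until the quota resets or the plan is upgraded.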

Best Practices

1. Monitor Your Usage

Check the rate limit headers on each response to track remaining quota:

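import requests

# url, headers, and data are your own request parameters
response = requests.post(url, headers=headers, json=data)

# The headers report the monthly quota; warn when less than 10% is left
limit = int(response.headers.get("X-RateLimit-Limit", 0))
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
if limit and remaining < limit * 0.1:
    print(f"Warning: Only {remaining} requests remaining this month")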

2. Implement Exponential Backoff

When you receive a 429 response, wait before retrying:

import time

import requests

def make_request_with_backoff(url, headers, data, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            if attempt == max_retries - 1:
                break  # No retries left, so don't sleep again
            wait_time = min(2 ** attempt, 60)  # 1s, 2s, 4s, ... capped at 60 seconds
            print(f"Rate limited. Waiting {wait_time}s before retry...")
            time.sleep(wait_time)
        else:
            # Raises requests.HTTPError for other 4xx/5xx responses
            response.raise_for_status()

    raise RuntimeError("Max retries exceeded")

3. Batch Your Requests

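If you need to check multiple jurisdictions, space out your requests so that you stay under your tier's per-minute limit:

import time

RATE_LIMIT_PER_MINUTE = 60  # Set this to your tier's per-minute limit
DELAY = 60 / RATE_LIMIT_PER_MINUTE  # Seconds between requests

jurisdictions = ["CA", "NY", "TX", "FL", "IL"]

# api is your CLE Engine client; process_result is your own handler
for jurisdiction in jurisdictions:
    result = api.compute(jurisdiction=jurisdiction)
    process_result(result)
    time.sleep(DELAY)  # 1 second between requests at 60/minute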

4. Cache Results

CLE deadlines don't change frequently. Cache results to reduce API calls:

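import json
import os
from datetime import datetime, timedelta

CACHE_FILE = "cle_cache.json"
CACHE_TTL = timedelta(days=1)

def load_cache():
    # Return the cached entries, or an empty dict if no cache file exists yet
    if not os.path.exists(CACHE_FILE):
        return {}
    with open(CACHE_FILE) as f:
        return json.load(f)

def save_cache(cache):
    # Results must be JSON-serializable to be stored
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)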

def get_cle_deadline(jurisdiction, **kwargs):
    cache = load_cache()
    cache_key = f"{jurisdiction}:{json.dumps(kwargs, sort_keys=True)}"

    # Check cache
    if cache_key in cache:
        cached = cache[cache_key]
        if datetime.fromisoformat(cached["expires"]) > datetime.now():
            return cached["data"]

    # Make API request
    result = api.compute(jurisdiction=jurisdiction, **kwargs)

    # Cache result
    cache[cache_key] = {
        "data": result,
        "expires": (datetime.now() + CACHE_TTL).isoformat()
    }
    save_cache(cache)

    return result

5. Plan Bulk Operations

For large-scale operations, consider:

Spreading requests across multiple days
Using off-peak hours (nights/weekends)
Contacting support for bulk data exports
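
Spreading a large job across several days can be as simple as splitting the work list into near-equal daily batches. The sketch below shows one way to do that; `plan_daily_batches` is a hypothetical helper, and scheduling each batch is left to your own job runner.

```python
import math


def plan_daily_batches(items, days):
    """Split items into at most `days` consecutive, near-equal batches."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if not items:
        return []
    per_day = math.ceil(len(items) / days)
    return [items[i:i + per_day] for i in range(0, len(items), per_day)]


batches = plan_daily_batches(list(range(10)), days=3)
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Pick `days` so that each batch fits within both your remaining monthly quota and the time left before the quota resets.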

Monitoring Usage

Dashboard

View your current usage in the API Dashboard:

Current month's request count
Historical usage trends
Per-endpoint breakdown
Remaining quota

Usage API (Coming Soon)

GET /v1/usage
X-API-Key: cle_your_api_key_here

Response:

{
  "tier": "starter",
  "monthly_limit": 1000,
  "current_usage": 234,
  "remaining": 766,
  "reset_date": "2026-02-01T00:00:00Z",
  "usage_by_endpoint": {
    "/v1/compute": 200,
    "/v1/stats": 34
  }
}
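
Once the endpoint is available, a response of this shape could be condensed into a quick status summary. This is a sketch against the sample fields above, which may change before release; `summarize_usage` is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize_usage(usage, now=None):
    """Return percent of quota used and days until the quota resets."""
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11
    reset = datetime.fromisoformat(usage["reset_date"].replace("Z", "+00:00"))
    days_left = max((reset - now).total_seconds() / 86400, 0.0)
    percent_used = 100 * usage["current_usage"] / usage["monthly_limit"]
    return {"percent_used": percent_used, "days_until_reset": days_left}


sample = {
    "monthly_limit": 1000,
    "current_usage": 234,
    "reset_date": "2026-02-01T00:00:00Z",
}
print(summarize_usage(sample, now=datetime(2026, 1, 22, tzinfo=timezone.utc)))
```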

Upgrading Your Plan

If you consistently hit rate limits, consider upgrading:

Log in to cle-engine.com
Navigate to Settings > Billing
Select a higher tier
Your new quota takes effect immediately

Enterprise Options

For high-volume users, we offer:

Custom quotas: Tailored to your usage patterns
Dedicated rate limits: Higher per-minute limits
Priority support: Direct engineering access
SLA guarantees: Uptime and response time commitments
Bulk data exports: Download full jurisdiction datasets

Contact enterprise@cle-engine.com for details.

FAQs

Q: What happens when I exceed my quota?

Your API calls will return 429 Too Many Requests until the quota resets on the first of the next month. Upgrade your plan for immediate access.

Q: Do failed requests count toward my quota?

Yes, all requests that reach our servers count toward your quota, regardless of response status.

Q: Can I purchase additional quota for a single month?

Contact support for one-time quota increases. We recommend upgrading tiers for sustained higher usage.

Q: How do I know when my quota resets?

Check the X-RateLimit-Reset header for the Unix timestamp, or view the reset date in your dashboard.

Q: Is there a way to test without using my quota?

The /health endpoint doesn't require authentication and doesn't count toward quota. For /v1/compute testing, the Free tier provides 100 requests per month.

Q: Can multiple team members share one API key?

Yes, but all requests from a single key share the same quota. Enterprise plans offer multiple keys with independent tracking.