
Consumer AI Subscription: Tiers with Resetting Quotas

How to model a ChatGPT Plus- or Claude Pro-style consumer AI subscription with fixed monthly pricing and quotas that reset on a sub-monthly cadence.

Inspired by: ChatGPT Plus, Claude Pro, Perplexity Pro, Cursor Pro

Mental Model

Think of this as ChatGPT Plus or Claude Pro: flat monthly price, generous quota, but the quota resets every few hours or every day — you can't save it up. Each interval, old credits expire and a fresh allowance lands. No hoarding, no unbounded cost.

Quick Take
  • Prepaid subscription with a recurring credit grant that fires on the reset cadence (daily / weekly / custom)
  • expires_after_seconds matches the grant interval, so old credits die as new ones arrive
  • No rollover, no hoarding: the key anti-abuse mechanic
  • Optional wallet overflow (priority 0) for heavy users who want to top up real money when the quota runs out
[Figure: a prepaid subscription whose quota windows are independent of the billing cycle. In window N, a priority-10 quota block expires in 5 hours; the next quota block fires in 5 hours and expires in 10 hours, marking the reset. Usage events debit the active block per message or per request.]

Consumer AI Subscription

Pattern: prepaid subscription + resetting quotas + anti-hoarding.

The problem

You run a consumer AI product. Users pay a flat monthly fee for a tier (Free, Plus, Pro) and expect a generous quota — but not unlimited usage, because every request has marginal inference cost. Two competing constraints:

  • Users hate dead ends. A monthly quota that runs out on day 12 feels punitive.
  • Margins require anti-hoarding. If you let heavy users save their monthly quota for a binge on day 30, you get whales that blow up infrastructure costs in bursts.

The answer most consumer AI products converge on: a rolling window quota. ChatGPT Plus gives you weekly limits. Claude Pro gives you 5-hour windows. Cursor Pro gives you hourly bursts. Users get a fresh allowance every window, whether that is a few hours, a day, or a week; unused credits don’t accumulate; cost stays predictable.

This use case shows how to model that in QuotaStack.

Credit structure

A single recurring grant, with expiry equal to the grant interval. Each interval creates a fresh credit block; the previous interval’s credits expire exactly when the new grant lands. No rollover, no hoarding.

| Block type | Priority | Expiry | Source |
| --- | --- | --- | --- |
| Tier quota grant | 10 | End of grant interval | Plan variant’s credit_grants |
| Optional overflow wallet | 0 | None | Topup grant after pay-as-you-go purchase |

The overflow wallet is optional. It lets heavy users top up real money when they hit their tier limit — useful for prosumer pricing (Claude Pro users who want extra Opus usage, Cursor users hitting their daily burst, etc.). Wallet credits burn only after the tier quota is exhausted.

Plan catalog

Three tiers — the shape varies per product, but the mechanic is identical.

| Tier | Price/month | Quota | Reset cadence |
| --- | --- | --- | --- |
| Free | $0 | 10 messages | Daily |
| Plus | $20 | 200 messages | Daily |
| Pro | $80 | 50 messages | Every 4 hours |

“Messages” is the unit users understand. Under the hood it’s credits (1 message = 1,000 mc in these examples), metered via a chat_message billable metric. You can scale per-model cost by charging different unit counts for different models (Sonnet = 1 unit, Opus = 5 units) using QuotaStack’s per-unit metering.
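
The unit arithmetic above can be sketched in a few lines of Python. This is an illustration only: the constants mirror the rates used in this page's examples, and the names (MC_PER_UNIT, MODEL_UNITS, grant_credits, messages_affordable) are hypothetical helpers, not part of QuotaStack.

```python
# Illustrative arithmetic only; the names below are hypothetical, not a QuotaStack API.
MC_PER_UNIT = 1_000                      # 1 message unit = 1,000 mc (this page's example rate)
MODEL_UNITS = {"sonnet": 1, "opus": 5}   # units charged per message, per model


def grant_credits(messages: int) -> int:
    """Credits a grant needs so that it covers `messages` one-unit messages."""
    return messages * MC_PER_UNIT


def messages_affordable(balance_mc: int, model: str) -> int:
    """Whole messages of `model` that a balance of `balance_mc` can pay for."""
    return balance_mc // (MODEL_UNITS[model] * MC_PER_UNIT)
```

Under these rates, the Plus tier's 200 messages correspond to a 200,000-credit grant, and the same quota would cover 40 Opus messages if Opus is charged at 5 units.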

Plan variant configuration

The critical trick: expires_after_seconds matches the grant interval.

Daily reset (Plus tier):

curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: variant-plus-v1" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Plus",
    "billing_cycle": "monthly",
    "billing_mode": "prepaid",
    "price_cents": 2000,
    "currency": "USD"
  }'

Then add the recurring grant:

curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: grant-plus-daily-v1" \
  -H "Content-Type: application/json" \
  -d '{
    "credits": 200000,
    "grant_interval": "daily",
    "grant_type": "recurring",
    "expires_after_seconds": 86400,
    "rollover_percentage": 0,
    "priority": 10
  }'

What each field does:

| Field | Why it matters |
| --- | --- |
| grant_interval: daily | Fires a new grant every 24 hours from subscription activation |
| expires_after_seconds: 86400 | Matches the interval. Old credits expire exactly when the new grant lands |
| rollover_percentage: 0 | No hoarding: unused credits don’t carry forward |
| priority: 10 | Burns before any wallet credits (see burn order) |

The invariant: at any moment the customer has at most one active quota block plus any optional wallet credits. When they check their balance, they see “180 / 200 messages, resets in 3h 17m” — one number, one reset.

Custom cadences (Claude Pro, hourly resets, etc.). grant_interval also accepts ISO 8601 durations: PT5H (every 5 hours), PT30M (every 30 minutes), P3D (every 3 days), etc. The grant cadence is independent of the subscription’s billing cycle, so you can have monthly billing with a 5-hour quota reset.

curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: grant-pro-claude-v1" \
  -H "Content-Type: application/json" \
  -d '{
    "credits": 50000,
    "grant_interval": "PT5H",
    "grant_type": "recurring",
    "expires_after_seconds": 18000,
    "rollover_percentage": 0,
    "priority": 10
  }'

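This configures a Claude Pro–style plan: every 5 hours, a fresh 50,000-credit block (50 messages at 1,000 mc each) is issued; the previous block expires exactly when the new grant lands. For the catalog’s Pro tier, which resets every 4 hours, the same grant would use PT4H and expires_after_seconds: 14400.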

Supported interval formats

| Form | Example | Notes |
| --- | --- | --- |
| Keyword | "daily", "weekly", "monthly", "billing_cycle", "on_activation" | Best for standard cadences. |
| ISO 8601 duration | "PT5H", "PT30M", "P3D", "P1DT12H" | Best for custom cadences. Supported letters: D (days), H (hours), M (minutes), S (seconds). |

For year/month durations, use the yearly / monthly keywords — months and years have variable length, so ISO P1M / P1Y are intentionally not supported.

Minimum interval: 5 minutes (PT5M). Anything shorter returns 422.
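
To keep expires_after_seconds in lockstep with grant_interval, it can help to derive one from the other before creating a grant. The sketch below is a client-side helper written from the rules stated on this page (D/H/M/S only, 5-minute minimum); it is not QuotaStack's own validator, and the function names are hypothetical. The 604,800-second value for "weekly" is an assumption (7 × 86,400).

```python
import re

# Days, hours, minutes and seconds only: month and year designators are rejected.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)
_UNIT_SECONDS = {"days": 86_400, "hours": 3_600, "minutes": 60, "seconds": 1}
MIN_INTERVAL_SECONDS = 300  # PT5M, the documented minimum
KEYWORD_SECONDS = {"daily": 86_400, "weekly": 604_800}  # "weekly" value is an assumption


def interval_seconds(duration: str) -> int:
    """Convert a supported ISO 8601 duration such as "PT5H" into seconds."""
    match = _DURATION.fullmatch(duration)
    groups = match.groupdict() if match else {}
    parts = {unit: value for unit, value in groups.items() if value is not None}
    # Reject non-matches, empty durations ("P", "PT") and a dangling "T" ("P1DT").
    if not parts or duration.endswith("T"):
        raise ValueError(f"unsupported duration: {duration!r}")
    total = sum(int(value) * _UNIT_SECONDS[unit] for unit, value in parts.items())
    if total < MIN_INTERVAL_SECONDS:
        raise ValueError(f"interval shorter than 5 minutes: {duration!r}")
    return total


def expiry_matches_interval(grant: dict) -> bool:
    """True when the grant's expiry equals its interval (the anti-hoarding invariant)."""
    interval = grant["grant_interval"]
    seconds = KEYWORD_SECONDS.get(interval)
    if seconds is None:
        # Variable-length keywords such as "monthly" raise ValueError here.
        seconds = interval_seconds(interval)
    return grant["expires_after_seconds"] == seconds
```

Running expiry_matches_interval over each grant payload before sending it is a cheap way to catch a mismatched pair, for example PT5H paired with 86400.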

No drift: grants fire on the subscription’s anniversary schedule (e.g., every 5 hours from subscription.created_at), not relative to the previous fire. A delayed or missed grant doesn’t shift future ones forward.
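
The anchored schedule can be illustrated with a small calculation. This is a sketch of the semantics described above, not QuotaStack's scheduler: next_grant_at is a hypothetical name, and the anchor is assumed to be the subscription's activation time. It is meant for reasoning about the schedule, not for driving a UI timer.

```python
from datetime import datetime, timedelta, timezone


def next_grant_at(anchor: datetime, interval: timedelta, now: datetime) -> datetime:
    """First scheduled fire strictly after `now` on the grid anchor + k * interval."""
    if now < anchor:
        return anchor
    elapsed_intervals = (now - anchor) // interval  # whole intervals already passed
    return anchor + (elapsed_intervals + 1) * interval


# Example: a subscription anchored at 15:47 UTC with a 5-hour cadence.
anchor = datetime(2026, 4, 14, 15, 47, tzinfo=timezone.utc)
now = datetime(2026, 4, 15, 5, 43, tzinfo=timezone.utc)
upcoming = next_grant_at(anchor, timedelta(hours=5), now)  # 2026-04-15 06:47 UTC
```

Because the result depends only on the anchor and the interval, a grant that fired late has no effect on when the next one is due.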

No backfill on downtime: if QuotaStack has a brief outage that crosses a reset boundary, users get the current interval’s credits on recovery — previous windows are not backfilled (which would be unexpected free credits).

Subscription lifecycle

Standard prepaid flow, no different from SaaS Subscription:

# Customer subscribes (charge via your payment provider first, then tell QuotaStack)
curl -X POST https://api.quotastack.io/v1/subscriptions \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: sub-create:user_abc:pv_plus" \
  -H "Content-Type: application/json" \
  -d '{
    "external_customer_id": "user_abc",
    "plan_variant_id": "pv_plus"
  }'

On activation: the first quota grant fires immediately — customer has their daily quota available right away. The next grant fires 24 hours later. And so on every interval, for the life of the subscription.

Checking quota in the UI

Every user expects two things on their profile page:

  1. “How many messages do I have left?”
  2. “When does my quota reset?”

Both come from a single entitlement check:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements/chat_message" \
  -H "X-API-Key: qs_live_..."

Response:

{
  "allowed": true,
  "customer_id": "019d...",
  "external_customer_id": "user_abc",
  "billable_metric_key": "chat_message",
  "units": 1,
  "balance": 180000,
  "reserved_balance": 0,
  "effective_balance": 180000,
  "estimated_cost": 1000,
  "balance_after": 179000
}

180 messages remaining (180,000 mc / 1,000 mc per unit). To find the reset time, fetch the account’s active blocks:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/credits?include_blocks=true" \
  -H "X-API-Key: qs_live_..."

Response:

{
  "balance": 180000,
  "blocks": [
    {
      "id": "019d8a20-4ff5-7be0-81da-e1454b3d6f64",
      "remaining_amount": 180000,
      "priority": 10,
      "expires_at": "2026-04-15T09:00:00Z",
      "source": "plan_grant"
    }
  ]
}

The block’s expires_at is your reset time. Format it as “resets in 3h 17m” in your UI.
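
One way to turn that response into the “180 / 200 messages, resets in 3h 17m” line is sketched below. It assumes the response shape shown above, that the tier quota is the priority-10 block as configured on this page, and the example rate of 1,000 mc per message; reset_label and quota_summary are hypothetical helper names.

```python
from datetime import datetime, timezone

MC_PER_MESSAGE = 1_000  # example rate used on this page


def reset_label(expires_at: str, now: datetime) -> str:
    """Format the time left until `expires_at` as "resets in Xh Ym"."""
    # Normalise the trailing "Z" so fromisoformat accepts it on older Python versions.
    expires = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    remaining = max(int((expires - now).total_seconds()), 0)
    hours, leftover = divmod(remaining, 3600)
    return f"resets in {hours}h {leftover // 60}m"


def quota_summary(credits_response: dict, quota_messages: int, now: datetime) -> str:
    """Build the one-line quota display from a credits response with blocks included."""
    # Assumption: the tier quota is the priority-10 block; wallet blocks are ignored here.
    block = next((b for b in credits_response["blocks"] if b["priority"] == 10), None)
    if block is None:
        return f"0 / {quota_messages} messages"
    left = block["remaining_amount"] // MC_PER_MESSAGE
    return f"{left} / {quota_messages} messages, {reset_label(block['expires_at'], now)}"
```

Pass a timezone-aware `now` (for example `datetime.now(timezone.utc)`); mixing naive and aware datetimes raises a TypeError.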

Per-message consumption

Each message is a straightforward usage event:

curl -X POST https://api.quotastack.io/v1/usage \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: usage:{message_id}" \
  -H "Content-Type: application/json" \
  -d '{
    "external_customer_id": "user_abc",
    "billable_metric_key": "chat_message",
    "units": 1,
    "idempotency_key": "msg_k8x2m"
  }'

For per-model pricing (e.g., Sonnet = 1 unit, Opus = 5 units), just send different units values. The metering rule stays the same.
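
As a rough sketch, the request for a message can be built deterministically from your own message ID so that a retry reproduces the same idempotency key. The helper name usage_request and the MODEL_UNITS table are hypothetical; the field names and the "usage:" key prefix are taken from the curl example above. Sending the request and adding the X-API-Key header is left to the caller.

```python
MODEL_UNITS = {"sonnet": 1, "opus": 5}  # example per-model unit costs from this page


def usage_request(external_customer_id: str, message_id: str, model: str) -> tuple[dict, dict]:
    """Return (headers, body) for POST /v1/usage, derived only from stable inputs."""
    headers = {
        # Derived from the message ID, so a retry of the same message reuses the key.
        "Idempotency-Key": f"usage:{message_id}",
        "Content-Type": "application/json",
    }
    body = {
        "external_customer_id": external_customer_id,
        "billable_metric_key": "chat_message",
        "units": MODEL_UNITS[model],
        "idempotency_key": message_id,
    }
    return headers, body
```

Calling usage_request("user_abc", "msg_k8x2m", "opus") twice yields identical headers and body, which is what makes a retry after a network failure safe.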

Tier upgrades mid-window

A Plus user upgrades to Pro mid-day. What happens?

Recommended path: swap the subscription to the new variant. QuotaStack cancels the remaining Plus quota grant and issues the first Pro grant immediately.

# Upgrade: cancel the old subscription, create a new one on the Pro variant
curl -X POST https://api.quotastack.io/v1/subscriptions/{old_sub_id}/cancel \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: cancel:sub_plus_user_abc" \
  -H "Content-Type: application/json" \
  -d '{"cancel_immediately": true, "reason": "Upgrade to Pro"}'

curl -X POST https://api.quotastack.io/v1/subscriptions \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: sub-create:user_abc:pv_pro" \
  -H "Content-Type: application/json" \
  -d '{
    "external_customer_id": "user_abc",
    "plan_variant_id": "pv_pro"
  }'

What the user sees: their quota immediately resets to the Pro allowance (50 messages, 4-hour window). The remaining Plus credits expire on cancel. The reset timer is based on the new subscription’s activation.

If you want to credit back the unused Plus window pro rata, issue a POST /v1/customers/{id}/credits/adjust with a small promotional grant. This is a product decision; most consumer AI products just give the full Pro allowance immediately and call it “we upgraded you, enjoy.”

Overflow wallet (optional)

Add a pay-as-you-go top-up option for heavy users who hit their tier limit:

# After payment confirms, grant wallet credits
curl -X POST https://api.quotastack.io/v1/topup/grant \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: topup:pay_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "external_customer_id": "user_abc",
    "credits": 100000,
    "price_paid": 500,
    "currency": "USD",
    "external_payment_id": "pay_abc123",
    "priority": 0
  }'

Priority 0, no expiry, no rollover config — this is the wallet pattern from AI Chat App. Because the tier grant is priority 10 and the wallet is priority 0, the tier burns first. Wallet credits only dip when the customer exhausts their current window’s quota.
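
The burn order described here can be simulated locally to sanity-check what a customer's balance will look like after a debit. This is a simplified model, assuming only what this page states (the higher priority number burns first); the real engine may apply further rules such as expiry ordering, and burn is a hypothetical name.

```python
def burn(blocks: list[dict], amount: int) -> list[dict]:
    """Debit `amount` mc across blocks, highest priority first; returns updated copies."""
    updated = [dict(block) for block in blocks]  # leave the caller's blocks untouched
    for block in sorted(updated, key=lambda b: b["priority"], reverse=True):
        taken = min(block["remaining_amount"], amount)
        block["remaining_amount"] -= taken
        amount -= taken
        if amount == 0:
            break
    # Any shortfall beyond the total balance is ignored in this simplified model.
    return updated


wallet = {"name": "wallet", "priority": 0, "remaining_amount": 100_000}
tier = {"name": "tier", "priority": 10, "remaining_amount": 2_000}
after = burn([wallet, tier], 5_000)  # tier drains to 0, wallet covers the remaining 3,000
```

With 2,000 mc left in the tier block, a 5,000 mc debit empties the tier first and takes only the remaining 3,000 mc from the wallet.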

This mirrors the shape of offerings like Cursor’s “overages” and Claude’s “API access”: a subscription for the common case, a wallet or API billing for heavy users.

Per-feature quotas

Some products (Claude Pro) separate quotas per model or feature. Sonnet has its own 5-hour window, Opus has its own 5-hour window, and they don’t share.

Model it with separate billable metrics per feature, each with its own recurring grant:

# Sonnet quota — 5-hour reset
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: grant-pro-sonnet-v1" \
  -H "Content-Type: application/json" \
  -d '{
    "credits": 100000,
    "grant_interval": "PT5H",
    "grant_type": "recurring",
    "expires_after_seconds": 18000,
    "rollover_percentage": 0,
    "priority": 10,
    "metadata": { "feature": "sonnet" }
  }'

# Opus quota — separate grant, same plan variant, same 5-hour reset
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: grant-pro-opus-v1" \
  -H "Content-Type: application/json" \
  -d '{
    "credits": 20000,
    "grant_interval": "PT5H",
    "grant_type": "recurring",
    "expires_after_seconds": 18000,
    "rollover_percentage": 0,
    "priority": 10,
    "metadata": { "feature": "opus" }
  }'

Usage events then reference different billable metrics:

{ "external_customer_id": "user_abc", "billable_metric_key": "sonnet_message", "units": 1 }
{ "external_customer_id": "user_abc", "billable_metric_key": "opus_message", "units": 1 }

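Each metric has its own metering rule targeting the specific block via matching metadata.feature. Your UI surfaces one reset timer per feature (“Sonnet resets in 2h 14m · Opus resets in 2h 14m”), each sourced from the respective block’s expires_at. With the configuration above, both grants share the PT5H cadence and the same subscription anchor, so the two timers coincide; they diverge only if you give the grants different intervals.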

Pause and resume

If the customer pauses their subscription:

  • No future quota grants fire.
  • The existing block stays until it expires (they get to finish their current window).
  • On resume, grants restart on the next cadence boundary.

The subscription.paused and subscription.resumed webhooks let you reflect this in your UI.

Edge cases

  • What if the user is mid-message when their quota runs out? Standard metering behavior — the POST /v1/usage call for that message goes through (it was enqueued) but the debit leaves balance at 0 or just below. Next entitlement check returns allowed: false and your UI shows “you’ve used up this window’s messages, resets in Xh Ym.”
  • Time zones. Quota grants fire on the subscription’s anniversary schedule, not wall-clock midnight. A user who subscribed at 3:47pm UTC gets their daily resets at 3:47pm UTC. Display the reset time in the user’s local time zone in your UI.
  • Daylight saving time. Grant intervals are defined in seconds — DST has no effect. A “daily” grant is 86,400 seconds, full stop.
  • Plan upgrade changes quota mid-window. Cancel + re-create (see “Tier upgrades”). New quota available immediately.

Tips

  • Match expires_after_seconds to the grant interval exactly. This is the entire anti-hoarding mechanic. If the expiry is longer than the interval, blocks overlap and users can hoard; if it is shorter, users hit a gap with no quota until the next grant.
  • Display reset time from the block’s expires_at. Don’t compute it from the subscription’s created_at — it will drift. The block is the source of truth.
  • Keep billable metrics separate per feature if you want independent quotas. Don’t try to share one metric across features with clever metadata filtering; it gets brittle fast.
  • Upgrade = cancel + re-create. Don’t try to patch the subscription’s plan_variant_id in place. Cleaner audit trail, simpler state machine.
  • Idempotency keys for consumption should be derived from your message ID (or interaction ID), not a UUID per request. Retry-safe on flaky networks.
