Consumer AI Subscription: Tiers with Resetting Quotas
How to model a ChatGPT Plus- or Claude Pro-style consumer AI subscription with fixed monthly pricing and quotas that reset on a sub-monthly cadence.
Inspired by: ChatGPT Plus, Claude Pro, Perplexity Pro, Cursor Pro
Think of this as ChatGPT Plus or Claude Pro: flat monthly price, generous quota, but the quota resets every few hours or every day — you can't save it up. Each interval, old credits expire and a fresh allowance lands. No hoarding, no unbounded cost.
Pattern: prepaid subscription + resetting quotas + anti-hoarding.
The problem
You run a consumer AI product. Users pay a flat monthly fee for a tier (Free, Plus, Pro) and expect a generous quota — but not unlimited usage, because every request has marginal inference cost. Two competing constraints:
- Users hate dead ends. A monthly quota that runs out on day 12 feels punitive.
- Margins require anti-hoarding. If you let heavy users save their monthly quota for a binge on day 30, you get whales that blow up infrastructure costs in bursts.
The answer most consumer AI products converge on: a rolling window quota. ChatGPT Plus gives you weekly limits. Claude Pro gives you 5-hour windows. Cursor Pro gives you hourly bursts. Users get a fresh allowance every few hours; unused credits don’t accumulate; cost stays predictable.
This use case shows how to model that in QuotaStack.
Credit structure
A single recurring grant, with expiry equal to the grant interval. Each interval creates a fresh credit block; the previous interval’s credits expire exactly when the new grant lands. No rollover, no hoarding.
| Block type | Priority | Expiry | Source |
|---|---|---|---|
| Tier quota grant | 10 | End of grant interval | Plan variant’s credit_grants |
| Optional overflow wallet | 0 | None | Topup grant after pay-as-you-go purchase |
The overflow wallet is optional. It lets heavy users top up real money when they hit their tier limit — useful for prosumer pricing (Claude Pro users who want extra Opus usage, Cursor users hitting their daily burst, etc.). Wallet credits burn only after the tier quota is exhausted.
Plan catalog
Three tiers — the shape varies per product, but the mechanic is identical.
| Tier | Price/month | Quota | Reset cadence |
|---|---|---|---|
| Free | $0 | 10 messages | Daily |
| Plus | $20 | 200 messages | Daily |
| Pro | $80 | 50 messages | Every 4 hours |
“Messages” is the unit users understand. Under the hood it’s credits (1 message = 1,000 mc in these examples), metered via a chat_message billable metric. You can scale per-model cost by charging different unit counts for different models (Sonnet = 1 unit, Opus = 5 units) using QuotaStack’s per-unit metering.
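To make the unit arithmetic concrete, here is a minimal sketch of how a balance in mc translates into messages per model. The 1,000 mc per unit figure and the Sonnet/Opus weights come from the examples above; the function names and the `MODEL_UNITS` table are hypothetical, not part of QuotaStack.

```python
MC_PER_UNIT = 1_000  # from the text: 1 message unit = 1,000 mc

# Illustrative per-model unit weights (Sonnet = 1 unit, Opus = 5 units).
MODEL_UNITS = {"sonnet": 1, "opus": 5}


def cost_mc(model: str, messages: int = 1) -> int:
    """Credits (in mc) that `messages` messages on `model` would consume."""
    return MODEL_UNITS[model] * MC_PER_UNIT * messages


def messages_affordable(balance_mc: int, model: str) -> int:
    """Whole messages on `model` that a balance can still pay for."""
    return balance_mc // cost_mc(model)


# A Plus-style daily quota of 200,000 mc:
print(messages_affordable(200_000, "sonnet"))  # 200
print(messages_affordable(200_000, "opus"))    # 40
```

Under these assumptions the same quota block reads as "200 messages" for the cheaper model and "40 messages" for the heavier one, which is worth keeping in mind when you word the tier description.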
Plan variant configuration
The critical trick: expires_after_seconds matches the grant interval.
Daily reset (Plus tier):
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: variant-plus-v1" \
-H "Content-Type: application/json" \
-d '{
"name": "Plus",
"billing_cycle": "monthly",
"billing_mode": "prepaid",
"price_cents": 2000,
"currency": "USD"
}'
Then add the recurring grant:
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: grant-plus-daily-v1" \
-H "Content-Type: application/json" \
-d '{
"credits": 200000,
"grant_interval": "daily",
"grant_type": "recurring",
"expires_after_seconds": 86400,
"rollover_percentage": 0,
"priority": 10
}'
What each field does:
| Field | Why it matters |
|---|---|
| grant_interval: daily | Fires a new grant every 24 hours from subscription activation |
| expires_after_seconds: 86400 | Matches the interval. Old credits expire exactly when the new grant lands |
| rollover_percentage: 0 | No hoarding: unused credits don’t carry forward |
| priority: 10 | Burns before any wallet credits (see burn order) |
The invariant: at any moment the customer has at most one active quota block plus any optional wallet credits. When they check their balance, they see “180 / 200 messages, resets in 3h 17m” — one number, one reset.
Custom cadences (Claude Pro, hourly resets, etc.). grant_interval also accepts ISO 8601 durations — PT5H (every 5 hours), PT30M (every 30 minutes), P3D (every 3 days), etc. The grant cadence is independent of the subscription’s billing cycle, so you can have monthly billing with a 5-hour quota reset.
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: grant-pro-claude-v1" \
-H "Content-Type: application/json" \
-d '{
"credits": 50000,
"grant_interval": "PT5H",
"grant_type": "recurring",
"expires_after_seconds": 18000,
"rollover_percentage": 0,
"priority": 10
}'
This configures a Claude Pro–style plan: every 5 hours, a fresh 50,000-credit block is issued; the previous block expires exactly when the new grant lands.
Supported interval formats
| Form | Example | Notes |
|---|---|---|
| Keyword | "daily", "weekly", "monthly", "billing_cycle", "on_activation" | Best for standard cadences. |
| ISO 8601 duration | "PT5H", "PT30M", "P3D", "P1DT12H" | Best for custom cadences. Supported letters: D (days), H (hours), M (minutes), S (seconds). |
For year/month durations, use the yearly / monthly keywords — months and years have variable length, so ISO P1M / P1Y are intentionally not supported.
Minimum interval: 5 minutes (PT5M). Anything shorter returns 422.
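Before submitting a grant, it can help to check client-side that the interval is well formed and that `expires_after_seconds` matches it. The sketch below is an assumption about how you might do that in your own code; it is not QuotaStack's validator. It handles the `daily` keyword and ISO 8601 durations built from D, H, M, and S, and applies the 5-minute minimum described above.

```python
import re

# Only D/H/M/S are accepted, mirroring the supported letters listed above.
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?)?$"
)
KEYWORD_SECONDS = {"daily": 86_400}  # other keywords have no fixed length here
MIN_INTERVAL_SECONDS = 300           # PT5M


def interval_seconds(interval: str) -> int:
    """Convert a fixed-length grant interval to seconds, or raise ValueError."""
    if interval in KEYWORD_SECONDS:
        return KEYWORD_SECONDS[interval]
    match = _DURATION.match(interval)
    if not match or interval.endswith("T"):
        raise ValueError(f"unsupported interval: {interval!r}")
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    total = parts["d"] * 86_400 + parts["h"] * 3_600 + parts["m"] * 60 + parts["s"]
    if total < MIN_INTERVAL_SECONDS:
        raise ValueError(f"interval shorter than 5 minutes: {interval!r}")
    return total


def expiry_matches_interval(grant: dict) -> bool:
    """True when the grant's expiry equals its interval (the anti-hoarding rule)."""
    return interval_seconds(grant["grant_interval"]) == grant["expires_after_seconds"]


print(interval_seconds("PT5H"))     # 18000
print(interval_seconds("P1DT12H"))  # 129600
print(expiry_matches_interval({"grant_interval": "PT5H", "expires_after_seconds": 18000}))  # True
```

Running a check like this in CI against your plan definitions is a cheap way to catch a grant whose expiry and interval have drifted apart.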
No drift: grants fire on the subscription’s anniversary schedule (e.g., every 5 hours from subscription.created_at), not relative to the previous fire. A delayed or missed grant doesn’t shift future ones forward.
No backfill on downtime: if QuotaStack has a brief outage that crosses a reset boundary, users get the current interval’s credits on recovery — previous windows are not backfilled (which would be unexpected free credits).
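The no-drift and no-backfill rules can be illustrated with a small calculation: the current window depends only on the activation time and the interval, never on when the previous grant actually fired. This is a sketch of the behavior described above, with hypothetical names; it is not QuotaStack's scheduler.

```python
from datetime import datetime, timedelta, timezone


def current_window(activated_at: datetime, interval_s: int, now: datetime):
    """Return (start, end) of the grant window that contains `now`.

    Windows are anchored to the activation time, so a late or missed grant
    never shifts later boundaries, and exactly one window is ever current.
    """
    elapsed = (now - activated_at).total_seconds()
    if elapsed < 0:
        raise ValueError("now is before activation")
    index = int(elapsed // interval_s)
    start = activated_at + timedelta(seconds=index * interval_s)
    return start, start + timedelta(seconds=interval_s)


activated = datetime(2026, 4, 14, 9, 0, tzinfo=timezone.utc)
now = datetime(2026, 4, 15, 2, 30, tzinfo=timezone.utc)
start, end = current_window(activated, 18_000, now)  # PT5H
print(start.isoformat())  # 2026-04-15T00:00:00+00:00
print(end.isoformat())    # 2026-04-15T05:00:00+00:00
```

If an outage spanned several boundaries, the same function still yields a single current window, which matches the "no backfill" behavior. For display, keep reading the reset time from the block itself rather than recomputing it this way.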
Subscription lifecycle
Standard prepaid flow, no different from SaaS Subscription:
# Customer subscribes (charge via your payment provider first, then tell QuotaStack)
curl -X POST https://api.quotastack.io/v1/subscriptions \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: sub-create:user_abc:pv_plus" \
-H "Content-Type: application/json" \
-d '{
"external_customer_id": "user_abc",
"plan_variant_id": "pv_plus"
}'
On activation: the first quota grant fires immediately — customer has their daily quota available right away. The next grant fires 24 hours later. And so on every interval, for the life of the subscription.
Checking quota in the UI
Every user expects two things on their profile page:
- “How many messages do I have left?”
- “When does my quota reset?”
Both come from a single entitlement check:
curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements/chat_message" \
-H "X-API-Key: qs_live_..."
Response:
{
"allowed": true,
"customer_id": "019d...",
"external_customer_id": "user_abc",
"billable_metric_key": "chat_message",
"units": 1,
"balance": 180000,
"reserved_balance": 0,
"effective_balance": 180000,
"estimated_cost": 1000,
"balance_after": 179000
}
180 messages remaining (180,000 mc / 1,000 mc per unit). To find the reset time, fetch the account’s active blocks:
curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/credits?include_blocks=true" \
-H "X-API-Key: qs_live_..."
{
"balance": 180000,
"blocks": [
{
"id": "019d8a20-4ff5-7be0-81da-e1454b3d6f64",
"remaining_amount": 180000,
"priority": 10,
"expires_at": "2026-04-15T09:00:00Z",
"source": "plan_grant"
}
]
}
The block’s expires_at is your reset time. Format it as “resets in 3h 17m” in your UI.
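As a rough sketch of the UI step, the two responses above can be combined into the "180 / 200 messages, resets in 3h 17m" line. The field names come from the sample responses; the helper names and the `quota_messages` argument are assumptions for illustration.

```python
from datetime import datetime, timezone

MC_PER_MESSAGE = 1_000  # from the text: 1 message = 1,000 mc


def parse_utc(ts: str) -> datetime:
    # Swap the trailing "Z" so this also works on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def reset_label(expires_at: str, now: datetime) -> str:
    seconds = int((parse_utc(expires_at) - now).total_seconds())
    if seconds <= 0:
        return "resetting now"
    hours, rest = divmod(seconds, 3600)
    return f"resets in {hours}h {rest // 60}m"


def quota_line(entitlement: dict, credits: dict, quota_messages: int, now: datetime) -> str:
    # effective_balance would also include wallet credits, if the customer has any.
    left = entitlement["effective_balance"] // MC_PER_MESSAGE
    expiring = [b for b in credits["blocks"] if b.get("expires_at")]
    if not expiring:
        return f"{left} / {quota_messages} messages"
    soonest = min(expiring, key=lambda b: parse_utc(b["expires_at"]))
    return f"{left} / {quota_messages} messages, {reset_label(soonest['expires_at'], now)}"


entitlement = {"effective_balance": 180000}
credits = {"blocks": [{"remaining_amount": 180000, "expires_at": "2026-04-15T09:00:00Z"}]}
now = datetime(2026, 4, 15, 5, 43, tzinfo=timezone.utc)
print(quota_line(entitlement, credits, 200, now))  # 180 / 200 messages, resets in 3h 17m
```

Blocks without an `expires_at` (such as a non-expiring wallet) are skipped when picking the reset time, so the timer always reflects the tier quota block.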
Per-message consumption
Each message is a straightforward usage event:
curl -X POST https://api.quotastack.io/v1/usage \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: usage:{message_id}" \
-H "Content-Type: application/json" \
-d '{
"external_customer_id": "user_abc",
"billable_metric_key": "chat_message",
"units": 1,
"idempotency_key": "msg_k8x2m"
}'
For per-model pricing (e.g., Sonnet = 1 unit, Opus = 5 units), just send different units values. The metering rule stays the same.
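One way to keep retries safe is to build the request deterministically from your own message ID, so a resend produces the same idempotency key and the same body. The sketch below only constructs the headers and JSON body shown in the curl example; it sends nothing, and the `MODEL_UNITS` table and function name are hypothetical.

```python
import json

# Illustrative unit weights from the text (Sonnet = 1 unit, Opus = 5 units).
MODEL_UNITS = {"sonnet": 1, "opus": 5}


def build_usage_request(external_customer_id: str, message_id: str, model: str):
    """Return (headers, body) for POST /v1/usage, derived only from stable inputs."""
    headers = {
        "Idempotency-Key": f"usage:{message_id}",
        "Content-Type": "application/json",
    }
    body = {
        "external_customer_id": external_customer_id,
        "billable_metric_key": "chat_message",
        "units": MODEL_UNITS[model],
        "idempotency_key": message_id,
    }
    return headers, json.dumps(body, sort_keys=True)


first = build_usage_request("user_abc", "msg_k8x2m", "opus")
retry = build_usage_request("user_abc", "msg_k8x2m", "opus")
print(first == retry)  # True: a retry is byte-for-byte the same request
```

Because nothing in the request depends on the time of the call or a random value, a network retry cannot be mistaken for a second message.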
Tier upgrades mid-window
A Plus user upgrades to Pro mid-day. What happens?
Recommended path: move the customer to the new variant by cancelling the Plus subscription and creating a Pro one. The remaining Plus quota block expires on cancel, and the first Pro grant is issued as soon as the new subscription activates.
# Upgrade: cancel the old subscription, create a new one on the Pro variant
curl -X POST https://api.quotastack.io/v1/subscriptions/{old_sub_id}/cancel \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: cancel:sub_plus_user_abc" \
-H "Content-Type: application/json" \
-d '{"cancel_immediately": true, "reason": "Upgrade to Pro"}'
curl -X POST https://api.quotastack.io/v1/subscriptions \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: sub-create:user_abc:pv_pro" \
-H "Content-Type: application/json" \
-d '{
"external_customer_id": "user_abc",
"plan_variant_id": "pv_pro"
}'
What the user sees: their quota immediately resets to the Pro allowance (50 messages, 4-hour window). The remaining Plus credits expire on cancel. The reset timer is based on the new subscription’s activation.
If you want to credit back the unused Plus window prorata, issue a POST /v1/customers/{id}/credits/adjust with a small promotional grant. This is a product decision — most consumer AI products just give the full Pro allowance immediately and call it “we upgraded you, enjoy.”
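If you do decide to credit back the unused window, you need a rule for how much. The sketch below shows one possible policy, which is an assumption rather than anything QuotaStack prescribes: credit the time-weighted share of the quota, capped at what the customer actually had left.

```python
def prorated_credit_mc(quota_mc: int, remaining_mc: int, window_s: int, seconds_left: int) -> int:
    """Credits to hand back for the unused part of a cancelled window.

    Hypothetical policy: quota * (time left / window length), never more
    than the balance that was still unspent at cancel time.
    """
    if not 0 <= seconds_left <= window_s:
        raise ValueError("seconds_left must be within the window")
    time_share = quota_mc * seconds_left // window_s
    return min(remaining_mc, time_share)


# Plus: 200,000 mc daily quota, 180,000 mc unspent, 6 hours left in the window.
print(prorated_credit_mc(200_000, 180_000, 86_400, 21_600))  # 50000
```

The resulting amount is what you would pass to the adjustment call. Capping at the unspent balance avoids handing a customer more than they gave up.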
Overflow wallet (optional)
Add a pay-as-you-go top-up option for heavy users who hit their tier limit:
# After payment confirms, grant wallet credits
curl -X POST https://api.quotastack.io/v1/topup/grant \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: topup:pay_abc123" \
-H "Content-Type: application/json" \
-d '{
"external_customer_id": "user_abc",
"credits": 100000,
"price_paid": 500,
"currency": "USD",
"external_payment_id": "pay_abc123",
"priority": 0
}'
Priority 0, no expiry, no rollover config — this is the wallet pattern from AI Chat App. Because the tier grant is priority 10 and the wallet is priority 0, the tier burns first. Wallet credits only dip when the customer exhausts their current window’s quota.
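The burn order can be pictured as a loop over blocks sorted by priority, highest first. This is a toy model of the behavior described above, useful for reasoning about balances in tests; the `Block` type and `burn` function are hypothetical and say nothing about how QuotaStack implements debits internally.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    remaining: int  # mc
    priority: int


def burn(blocks: list, amount: int):
    """Debit `amount` mc, draining higher-priority blocks first.

    Returns (debits, shortfall): the per-block debits applied and any
    amount that could not be covered.
    """
    debits = []
    for block in sorted(blocks, key=lambda b: b.priority, reverse=True):
        if amount == 0:
            break
        take = min(block.remaining, amount)
        if take:
            block.remaining -= take
            debits.append((block.name, take))
            amount -= take
    return debits, amount


tier = Block("tier", remaining=2_000, priority=10)
wallet = Block("wallet", remaining=100_000, priority=0)
print(burn([wallet, tier], 5_000))  # ([('tier', 2000), ('wallet', 3000)], 0)
```

The wallet is only touched for the 3,000 mc the tier block could not cover, regardless of the order in which the blocks are listed.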
This is similar in spirit to Cursor’s “overages” and Claude’s “API access”: subscription for the common case, wallet/API for heavy users.
Per-feature quotas
Some products (Claude Pro) separate quotas per model or feature. Sonnet has its own 5-hour window, Opus has its own 5-hour window, and they don’t share.
Model it with separate billable metrics per feature, each with its own recurring grant:
# Sonnet quota — 5-hour reset
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: grant-pro-sonnet-v1" \
-H "Content-Type: application/json" \
-d '{
"credits": 100000,
"grant_interval": "PT5H",
"grant_type": "recurring",
"expires_after_seconds": 18000,
"rollover_percentage": 0,
"priority": 10,
"metadata": { "feature": "sonnet" }
}'
# Opus quota — separate grant, same plan variant, same 5-hour reset
curl -X POST https://api.quotastack.io/v1/plans/{plan_id}/variants/{variant_id}/grants \
-H "X-API-Key: qs_live_..." \
-H "Idempotency-Key: grant-pro-opus-v1" \
-H "Content-Type: application/json" \
-d '{
"credits": 20000,
"grant_interval": "PT5H",
"grant_type": "recurring",
"expires_after_seconds": 18000,
"rollover_percentage": 0,
"priority": 10,
"metadata": { "feature": "opus" }
}'
Usage events then reference different billable metrics:
{ "external_customer_id": "user_abc", "billable_metric_key": "sonnet_message", "units": 1 }
{ "external_customer_id": "user_abc", "billable_metric_key": "opus_message", "units": 1 }
Each metric has its own metering rule targeting the specific block via matching metadata.feature. Your UI surfaces two reset timers — “Sonnet resets in 2h 14m · Opus resets in 3h 47m” — both sourced from the respective block’s expires_at.
Pause and resume
If the customer pauses their subscription:
- No future quota grants fire.
- The existing block stays until it expires (they get to finish their current window).
- On resume, grants restart on the next cadence boundary.
The subscription.paused and subscription.resumed webhooks let you reflect this in your UI.
Edge cases
- What if the user is mid-message when their quota runs out? Standard metering behavior: the POST /v1/usage call for that message goes through (it was enqueued) but the debit leaves balance at 0 or just below. The next entitlement check returns allowed: false and your UI shows “you’ve used up this window’s messages, resets in Xh Ym.”
- Time zones. Quota grants fire on the subscription’s anniversary schedule, not wall-clock midnight. A user who subscribed at 3:47pm UTC gets their daily resets at 3:47pm UTC. Display the reset time in the user’s local time zone in your UI.
- Daylight saving time. Grant intervals are defined in seconds — DST has no effect. A “daily” grant is 86,400 seconds, full stop.
- Plan upgrade changes quota mid-window. Cancel + re-create (see “Tier upgrades”). New quota available immediately.
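For the time zone point, a minimal sketch of the display conversion: the block's UTC `expires_at` is shifted by the viewer's current UTC offset at render time. The function name and the offset-in-minutes argument are assumptions; in practice the offset would come from the client.

```python
from datetime import datetime, timedelta, timezone


def local_reset_time(expires_at: str, utc_offset_minutes: int) -> str:
    """Format a UTC reset timestamp as HH:MM in the viewer's local offset."""
    utc = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    local = utc.astimezone(timezone(timedelta(minutes=utc_offset_minutes)))
    return local.strftime("%H:%M")


# A reset at 15:47 UTC, seen from UTC-4 and from UTC+5:30.
print(local_reset_time("2026-04-15T15:47:00Z", -240))  # 11:47
print(local_reset_time("2026-04-15T15:47:00Z", 330))   # 21:17
```

Because the grant interval is a fixed number of seconds, the UTC reset time stays constant while the local wall-clock time moves by an hour when the viewer's offset changes for daylight saving. Reading the offset at render time keeps the display correct.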
Tips
- Match expires_after_seconds to the grant interval exactly. This is the entire anti-hoarding mechanic. If they drift apart you’ll get overlapping blocks and users will hoard.
- Display reset time from the block’s expires_at. Don’t compute it from the subscription’s created_at, because it will drift. The block is the source of truth.
- Keep billable metrics separate per feature if you want independent quotas. Don’t try to share one metric across features with clever metadata filtering; it gets brittle fast.
- Upgrade = cancel + re-create. Don’t try to patch the subscription’s plan_variant_id in place. Cleaner audit trail, simpler state machine.
- Idempotency keys for consumption should be derived from your message ID (or interaction ID), not a UUID per request. Retry-safe on flaky networks.