API Platform: Postpaid Metered Billing
How to model a usage-based API platform with postpaid billing, tiered pricing, usage summaries, and contract lifecycle management.
Inspired by: Stripe API, Twilio, OpenAI API, most usage-based developer infra
Think of this as a utility bill for an API: customers consume requests throughout the month, QuotaStack tracks usage at tiered rates, and at period close you receive a subscription.renewed webhook carrying a usage_summary to invoice from. No prepayment, and no spending caps unless you want them.
Pattern: postpaid metered API billing.
The problem
You run an API platform. Customers make API calls and are billed monthly based on actual usage, like Stripe’s API pricing or OpenAI’s usage-based billing. Customers get a credit budget at the start of each cycle, consume it through API calls, and receive an invoice at cycle end for what they used.
You need:
- Postpaid billing: usage first, invoice later.
- Tiered pricing: the first 10,000 API calls are billed at the base rate, the next 90,000 cost less per call, and calls beyond 100,000 cost less still.
- Automatic cycle advancement with usage summaries for invoicing.
- Contract lifecycle: annual contracts with auto-renewal and ending-soon alerts.
- Graceful handling of overdue invoices — do not cut off a paying customer over a delayed ACH.
Credit structure
In postpaid mode, credits function as a budget, not a prepayment. The customer does not pay for them upfront. QuotaStack grants them at cycle start to set the usage ceiling, and the tenant invoices based on actual consumption at cycle end.
| Block type | Priority | Expiry | Source |
|---|---|---|---|
| Monthly budget grant | 10 | End of billing cycle | Credit grant template on plan variant |
| Included credits (optional) | 0 | None | One-time grant on activation |
Budget vs. prepaid: The credit blocks work identically at the API level. The difference is business logic: in prepaid, credits are paid for before use; in postpaid, credits are a spending limit and the tenant invoices after the fact. QuotaStack does not care which model you use — it tracks the credits either way.
Billable metrics and tiered metering
Single metric: api_call
POST /v1/billable-metrics
Idempotency-Key: metric:api_call
{ "key": "api_call", "name": "API Call" }
Tiered metering rule
Graduated tiered pricing applies different credit costs at different usage volumes within a cycle.
POST /v1/metering-rules
Idempotency-Key: rule:api_call
{
"billable_metric_key": "api_call",
"cost_type": "tiered",
"tiers": [
{ "up_to": 10000, "credit_cost": 1000 },
{ "up_to": 100000, "credit_cost": 500 },
{ "up_to": null, "credit_cost": 100 }
],
"unit_cost": 1000
}
This defines graduated tiers:
| Tier | Range | Cost per call (mc) | Cost per call (credits) |
|---|---|---|---|
| 1 | First 10,000 calls | 1000mc | 1 credit |
| 2 | Next 90,000 calls | 500mc | 0.5 credits |
| 3 | Beyond 100,000 calls | 100mc | 0.1 credits |
Graduated mode: Each tier applies only to the units within its range. A customer making 15,000 calls in a cycle pays:
Tier 1: 10,000 calls x 1000mc = 10,000,000mc (10,000 credits)
Tier 2: 5,000 calls x 500mc = 2,500,000mc (2,500 credits)
Total: 12,500,000mc (12,500 credits)
The tiered computation happens inside QuotaStack’s metering rule engine. Your app just posts units: 1 per API call.
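The graduated arithmetic above can be reproduced locally, for example to sanity-check a usage summary. This is a minimal sketch, not QuotaStack's implementation: the function name and the tuple layout are illustrative, and it assumes `up_to` is a cumulative upper bound per cycle, as the tier table suggests.

```python
# Graduated tiers from the metering rule above:
# (cumulative upper bound in units, cost per unit in mc). None = open-ended.
TIERS = [(10_000, 1000), (100_000, 500), (None, 100)]


def graduated_cost_mc(units, tiers=TIERS):
    """Return the total cost in mc for `units` consumed within one cycle."""
    total = 0
    lower = 0  # units already priced by earlier tiers
    for up_to, cost_mc in tiers:
        if units <= lower:
            break
        upper = units if up_to is None else min(units, up_to)
        total += (upper - lower) * cost_mc  # only the slice inside this tier
        lower = upper
    return total


print(graduated_cost_mc(15_000))  # 12500000 mc, i.e. 12,500 credits
```

Each tier prices only the slice of usage that falls inside its range, which is what distinguishes graduated pricing from volume pricing (where a single rate would apply to all units).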
Plan catalog
Plan and variant
POST /v1/plans
Idempotency-Key: plan:api-platform
{ "name": "API Platform", "description": "Usage-based API access" }
POST /v1/plans/{plan_id}/variants
Idempotency-Key: variant:api-monthly-postpaid
{
"name": "Monthly Postpaid",
"billing_cycle": "monthly",
"billing_mode": "postpaid",
"allow_usage_while_overdue": true
}
Key difference from prepaid: billing_mode: "postpaid". QuotaStack auto-advances the cycle at current_period_end without waiting for a renew call. No renewal_due_days or grace_period_days — those are prepaid concepts.
Credit grant template (budget)
POST /v1/plans/{plan_id}/variants/{variant_id}/credit-grants
Idempotency-Key: grant:api-monthly-budget
{
"credits": 100000000,
"grant_interval": "billing_cycle",
"grant_type": "recurring",
"source": "plan_grant",
"expires_after_seconds": 2592000,
"rollover_percentage": 0,
"accumulation_cap": null
}
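100,000,000mc = 100,000 credits per cycle. This is the budget ceiling, not a prepayment. At a flat 1 credit per API call (the tier 1 rate), this would cover 100,000 calls before overage kicks in. Under the graduated tiers above, 100,000 calls consume only 55,000 credits (10,000 x 1 + 90,000 x 0.5), so the budget stretches further.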
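To size a budget against graduated tiers, it can help to invert the tier arithmetic and ask how many calls a given budget pays for. The sketch below is illustrative only; the helper name is hypothetical and it assumes the tiers from the metering rule defined earlier, with `up_to` as a cumulative bound.

```python
# (cumulative upper bound in units, cost per unit in mc). None = open-ended.
TIERS = [(10_000, 1000), (100_000, 500), (None, 100)]


def calls_covered(budget_mc, tiers=TIERS):
    """Return how many whole calls `budget_mc` pays for under graduated tiers."""
    calls = 0
    remaining = budget_mc
    lower = 0
    for up_to, cost_mc in tiers:
        affordable = remaining // cost_mc
        if up_to is None:
            calls += affordable  # last tier has no capacity limit
            break
        capacity = up_to - lower
        used = min(affordable, capacity)
        calls += used
        remaining -= used * cost_mc
        if used < capacity:
            break  # budget ran out inside this tier
        lower = up_to
    return calls


print(calls_covered(100_000_000))  # 550000
```

With the example tiers, the 100,000-credit budget is not exhausted until well past 100,000 calls, which is worth keeping in mind when choosing the grant size.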
For customers expected to exceed the budget, set overage_policy: "allow" on the customer so the balance can go negative.
Integration flow
1. Customer subscribes
After contract signing and payment setup:
POST /v1/customers
Idempotency-Key: signup:<customer_id>
{
"external_id": "acme_corp",
"overage_policy": "allow"
}
POST /v1/subscriptions
Idempotency-Key: sub:<contract_id>
{
"customer_id": "cus_...",
"plan_variant_id": "pvr_api_monthly_postpaid",
"contract_end": "2027-04-13T00:00:00Z",
"contract_ending_soon_days": 30,
"external_subscription_id": "your_contract_ref_123",
"metadata": {
"account_manager": "jane@yourcompany.com",
"annual_contract_value": "120000"
}
}
QuotaStack:
- Creates the subscription with status active.
- Sets the billing cycle: current_period_start = now, current_period_end = now + 1 month.
- Sets the contract window: contract_start = now, contract_end = 2027-04-13.
- Fires the credit grant template: creates a 100,000-credit budget block.
- Fires the subscription.created webhook.
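If you want to predict cycle boundaries on your side (for dashboards or invoice scheduling), the "now + 1 month" step can be approximated as below. This is a sketch under an assumption: the text does not say how QuotaStack handles month-end dates, so the clamping rule here (Jan 31 becomes Feb 28) is a guess, and `add_one_month` is a hypothetical helper.

```python
import calendar
from datetime import datetime, timezone


def add_one_month(ts):
    """Advance `ts` by one calendar month, clamping the day to the month's length."""
    year = ts.year + (ts.month // 12)  # roll over after December
    month = ts.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    # ASSUMPTION: days that do not exist in the target month clamp to its last day.
    return ts.replace(year=year, month=month, day=min(ts.day, last_day))


start = datetime(2026, 4, 13, tzinfo=timezone.utc)
print(add_one_month(start).isoformat())  # 2026-05-13T00:00:00+00:00
```

Treat the period dates in the subscription object and webhooks as authoritative; a local calculation like this is only for estimates.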
2. Recording API usage
Each time a customer makes an API call, record usage:
POST /v1/usage
Idempotency-Key: usage:<request_id>
{
"customer_id": "cus_...",
"billable_metric_key": "api_call",
"units": 1,
"metadata": {
"endpoint": "/v1/generate",
"request_id": "req_abc123",
"response_status": 200
}
}
Response: 202 Accepted. The usage event is enqueued and processed asynchronously. Your API responds to the customer without waiting for the credit debit to land.
Batch usage for high-throughput APIs:
POST /v1/usage/batch
Idempotency-Key: batch:<batch_id>
{
"events": [
{
"customer_id": "cus_...",
"billable_metric_key": "api_call",
"units": 1,
"idempotency_key": "usage:req_001",
"metadata": { "endpoint": "/v1/generate" }
},
{
"customer_id": "cus_...",
"billable_metric_key": "api_call",
"units": 1,
"idempotency_key": "usage:req_002",
"metadata": { "endpoint": "/v1/analyze" }
}
]
}
Up to 100 events per batch. Each event has its own idempotency key for deduplication.
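When replaying a backlog (for example after an outage), the 100-event limit means the events have to be split before posting. A minimal sketch of that split, with a hypothetical helper name and made-up event data:

```python
def chunk_events(events, batch_size=100):
    """Yield consecutive slices of at most `batch_size` events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


# Hypothetical backlog of 250 single-call events for one customer.
backlog = [
    {
        "customer_id": "cus_example",
        "billable_metric_key": "api_call",
        "units": 1,
        "idempotency_key": f"usage:req_{i:04d}",
    }
    for i in range(250)
]
batches = list(chunk_events(backlog))
print([len(b) for b in batches])  # [100, 100, 50]
```

Because every event keeps its own idempotency key, re-sending a batch whose outcome is unknown should not double-count usage.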
3. Cycle end — usage summary and invoicing
At current_period_end, QuotaStack automatically advances the subscription and fires subscription.renewed with a usage summary:
{
"event": "subscription.renewed",
"subscription_id": "sub_...",
"customer_id": "cus_...",
"billing_mode": "postpaid",
"prior_period": {
"start": "2026-04-13T00:00:00Z",
"end": "2026-05-13T00:00:00Z"
},
"new_period": {
"start": "2026-05-13T00:00:00Z",
"end": "2026-06-13T00:00:00Z"
},
"credits_granted": 100000000,
"usage_summary": {
"total_credits_consumed": 12500000,
"net_balance_at_cycle_start": 100000000,
"net_balance_at_cycle_end": 87500000,
"by_billable_metric": {
"api_call": {
"units": 15000,
"credits": 12500000
}
}
}
}
Your billing system reads this webhook and computes the invoice:
import stripe

# `quotastack` is your configured QuotaStack client, as used throughout this guide.

def handle_subscription_renewed(event):
    if event.billing_mode != "postpaid":
        return
    summary = event.usage_summary
    # Map usage back to fiat using your own pricing
    api_calls = summary.by_billable_metric["api_call"]["units"]
    # Your graduated fiat pricing (separate from QuotaStack's credit tiers)
    if api_calls <= 10000:
        invoice_amount = api_calls * 0.01  # $0.01 per call
    elif api_calls <= 100000:
        invoice_amount = 10000 * 0.01 + (api_calls - 10000) * 0.005
    else:
        invoice_amount = 10000 * 0.01 + 90000 * 0.005 + (api_calls - 100000) * 0.001
    # Round rather than truncate: int() can lose a cent to float error
    amount_cents = round(invoice_amount * 100)

    # Create the invoice via your PSP. Stripe invoices carry no amount of
    # their own: create a draft, attach an invoice item, then finalize.
    # Assumes the customer's external_id is their Stripe customer ID.
    stripe_customer = event.customer.external_id
    invoice = stripe.Invoice.create(customer=stripe_customer)
    stripe.InvoiceItem.create(
        customer=stripe_customer,
        invoice=invoice.id,
        amount=amount_cents,
        currency="usd",
        description=f"API usage: {api_calls} calls",
    )
    stripe.Invoice.finalize_invoice(invoice.id)

    # Optionally: zero the ledger for the prior period
    # (only if the balance went negative due to overage)
    if summary.net_balance_at_cycle_end < 0:
        quotastack.adjust_credits(
            customer_id=event.customer_id,
            delta=-summary.net_balance_at_cycle_end,  # positive adjustment
            idempotency_key=f"settle:{event.subscription_id}:{event.prior_period.end}",
            metadata={"source": "postpaid_settlement"},
        )
The new cycle’s budget is granted automatically. The customer keeps using the API without interruption.
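For the fiat side of the invoice, integer arithmetic avoids floating-point rounding surprises. The sketch below expresses the example prices ($0.01, $0.005, $0.001 per call) in mills, where 1 mill = $0.001, and converts to cents at the end. The helper name and the round-half-up rule are assumptions for illustration; use whatever rounding your finance team requires.

```python
# (cumulative upper bound in calls, price per call in mills). 1 mill = $0.001.
FIAT_TIERS = [(10_000, 10), (100_000, 5), (None, 1)]


def invoice_cents(api_calls, tiers=FIAT_TIERS):
    """Return the graduated invoice total in whole cents."""
    mills = 0
    lower = 0  # calls already priced by earlier tiers
    for up_to, price_mills in tiers:
        if api_calls <= lower:
            break
        upper = api_calls if up_to is None else min(api_calls, up_to)
        mills += (upper - lower) * price_mills
        lower = upper
    # ASSUMPTION: round half up to the nearest cent (10 mills = 1 cent).
    return (mills + 5) // 10


print(invoice_cents(15_000))  # 12500 cents = $125.00
```

Keeping the tiers in a table also means a price change is a data edit rather than a change to branching logic.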
4. Contract lifecycle
30 days before contract end: QuotaStack fires subscription.contract_ending_soon:
{
"event": "subscription.contract_ending_soon",
"subscription_id": "sub_...",
"customer_id": "cus_...",
"contract_end": "2027-04-13T00:00:00Z",
"days_remaining": 30
}
Your account management team uses this to start renewal negotiations.
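To schedule internal reminders alongside the webhook, the alert date follows directly from contract_end and contract_ending_soon_days. A small sketch with hypothetical helper names; exactly when QuotaStack evaluates the threshold is not specified here, so treat the computed time as approximate.

```python
from datetime import datetime, timedelta, timezone


def ending_soon_at(contract_end, ending_soon_days):
    """Return the moment the ending-soon window opens."""
    return contract_end - timedelta(days=ending_soon_days)


def days_remaining(contract_end, now):
    """Return whole days left on the contract, never negative."""
    return max((contract_end - now).days, 0)


contract_end = datetime(2027, 4, 13, tzinfo=timezone.utc)
alert_at = ending_soon_at(contract_end, 30)
print(alert_at.date())  # 2027-03-14
print(days_remaining(contract_end, alert_at))  # 30
```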
If the customer renews: Extend the contract:
POST /v1/subscriptions/sub_.../extend
Idempotency-Key: extend:<amendment_id>
{
"contract_end": "2028-04-13T00:00:00Z",
"reason": "annual renewal signed",
"metadata": {
"amendment_id": "AMD-2027-001",
"new_annual_value": "144000"
}
}
The subscription continues auto-advancing at each cycle boundary.
If the customer does not renew: At contract_end, QuotaStack transitions the subscription to contract_ended and fires subscription.contract_ended:
{
"event": "subscription.contract_ended",
"subscription_id": "sub_...",
"customer_id": "cus_...",
"contract_end": "2027-04-13T00:00:00Z"
}
No more auto-renewals. No more credit grants. Generate the final usage summary and invoice.
Re-activating a lapsed contract: If the customer comes back, extend the old subscription:
POST /v1/subscriptions/sub_.../extend
Idempotency-Key: extend:<new_amendment_id>
{
"contract_end": "2028-04-13T00:00:00Z",
"reason": "customer returned after lapse"
}
Extend works on contract_ended subscriptions — it flips them back to active and resumes auto-advancing.
Usage event pipeline
For high-throughput API platforms, the usage pipeline must be fast and resilient.
Async ingestion
POST /v1/usage returns 202 Accepted immediately. The event is queued and processed asynchronously, so your API response time stays unaffected by credit-debit latency.
In the background, QuotaStack:
- Looks up the metering rule for api_call.
- Computes the credit cost (applying tiered pricing if applicable).
- Debits the customer’s credit blocks in burn-down order.
- Writes a credit ledger entry.
- Optionally fires a credit.consumed webhook.
Idempotency
Every usage event must have a unique idempotency key. Use your internal request ID:
Idempotency-Key: usage:<your_request_id>
If the same event is posted twice (network retry, queue replay), the second is a no-op. This gives at-least-once delivery semantics with exactly-once processing.
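The contract described above (retries are safe because a repeated key is ignored) can be illustrated with a tiny in-memory model. This is not how QuotaStack stores keys; the class is hypothetical and exists only to show why a deterministic key turns at-least-once delivery into exactly-once accounting.

```python
class UsageProcessor:
    """In-memory model of idempotent usage processing."""

    def __init__(self):
        self.seen_keys = set()
        self.total_units = 0

    def process(self, idempotency_key, units):
        """Apply the event once; return False if the key was already seen."""
        if idempotency_key in self.seen_keys:
            return False  # duplicate delivery: no-op
        self.seen_keys.add(idempotency_key)
        self.total_units += units
        return True


processor = UsageProcessor()
processor.process("usage:req_abc123", 1)
processor.process("usage:req_abc123", 1)  # network retry of the same request
print(processor.total_units)  # 1
```

The key must be derived from the request itself (such as your request ID). A key generated fresh on each attempt would defeat the deduplication.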
Batching
For APIs handling thousands of requests per second, batch usage events rather than posting one at a time. Collect events in a local buffer (e.g., 100 events or 5 seconds, whichever comes first) and send them via POST /v1/usage/batch.
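# Sketch: usage event buffer that flushes on size or age, whichever comes first
import threading
import time
import uuid

class UsageBuffer:
    def __init__(self, max_size=100, flush_interval_seconds=5):
        self.buffer = []
        self.max_size = max_size  # the batch endpoint accepts up to 100 events
        self.flush_interval = flush_interval_seconds
        self.last_flush = time.monotonic()
        self.lock = threading.Lock()  # add() may be called from many request threads

    def add(self, customer_id, metric_key, units, request_id, metadata=None):
        with self.lock:
            self.buffer.append({
                "customer_id": customer_id,
                "billable_metric_key": metric_key,
                "units": units,
                "idempotency_key": f"usage:{request_id}",
                "metadata": metadata or {},
            })
            due = (len(self.buffer) >= self.max_size
                   or time.monotonic() - self.last_flush >= self.flush_interval)
        if due:
            self.flush()

    def flush(self):
        # Also call this from a periodic timer so an idle buffer still drains.
        with self.lock:
            events = self.buffer[:self.max_size]
            self.buffer = self.buffer[self.max_size:]
            self.last_flush = time.monotonic()
        if not events:
            return
        try:
            quotastack.post_usage_batch(
                events=events,
                idempotency_key=f"batch:{uuid.uuid4()}",
            )
        except Exception:
            # Re-queue on failure; per-event idempotency keys make the retry safe.
            with self.lock:
                self.buffer = events + self.buffer
            raise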
Entitlement gating for API calls
For prepaid-like behavior where you want to reject calls when the budget is exhausted, add an entitlement check to your API gateway:
# API gateway middleware
def check_api_entitlement(customer_id):
    ent = quotastack.get_entitlement(customer_id, "api_call", units=1)
    if not ent.allowed:
        return http_response(
            status=429,
            body={
                "error": "usage_limit_exceeded",
                "message": "Monthly API call budget exhausted",
                "balance": ent.balance,
                "affordable_units": ent.affordable_units
            }
        )
    return None  # proceed
The entitlement check responds in sub-1ms on the fast path — safe to put on the hot path. See Entitlements: latency and freshness for the full semantics.
For postpaid customers with overage_policy: "allow", skip the entitlement check or use it only for reporting — the balance can go negative and the customer will be invoiced.
Handling overdue invoices
When a postpaid customer has an unpaid invoice, you may want to throttle or block their API access. QuotaStack does not enforce this automatically — it is your business decision.
Option 1: Set overage_policy to "block" when the invoice is overdue:
PATCH /v1/customers/cus_...
Idempotency-Key: block:<invoice_id>
{
"overage_policy": "block"
}
Now the entitlement check returns allowed: false when the balance reaches 0. Restore to "allow" when the invoice is paid.
Option 2: Set allow_usage_while_overdue: false on the plan variant. This blocks usage when the subscription enters overdue status. However, postpaid subscriptions have no overdue state (it is prepaid-only), so this option only applies if you also use prepaid elements.
Option 3: Handle it entirely in your API gateway. Check your own invoice status and reject requests independently of QuotaStack. This is the most common approach for API platforms.
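A gateway-side check for Option 3 might look like the sketch below. Everything here is an assumption for illustration: the function name, the idea of passing in due dates from your own invoicing system, and the 7-day grace window (chosen so that a delayed ACH payment does not cut off a paying customer).

```python
from datetime import datetime, timedelta, timezone


def should_block(unpaid_due_dates, now, grace_days=7):
    """Return True if any unpaid invoice is overdue by more than the grace window."""
    cutoff = now - timedelta(days=grace_days)
    return any(due < cutoff for due in unpaid_due_dates)


now = datetime(2026, 6, 20, tzinfo=timezone.utc)
recently_due = [datetime(2026, 6, 15, tzinfo=timezone.utc)]  # 5 days late
long_overdue = [datetime(2026, 5, 15, tzinfo=timezone.utc)]  # 36 days late
print(should_block(recently_due, now))  # False
print(should_block(long_overdue, now))  # True
```

Keeping this decision in the gateway means QuotaStack continues to record usage accurately even while access is restricted.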
Example: full cycle with tiered usage
Day 1: Subscription created (postpaid, monthly, annual contract).
  100,000-credit budget granted.
Day 1-30: Customer makes 85,000 API calls.
  Tier 1: 10,000 calls x 1 credit = 10,000 credits
  Tier 2: 75,000 calls x 0.5 credits = 37,500 credits
  Total consumed: 47,500 credits.
  Balance: 100,000 - 47,500 = 52,500 credits.
Day 30: Cycle auto-advances.
  subscription.renewed webhook fires with usage_summary:
    total_credits_consumed: 47,500,000 mc (47,500 credits)
    by_billable_metric:
      api_call: { units: 85,000, credits: 47,500,000 }
  Your billing system computes:
    85,000 calls at your fiat pricing = $475
      (10,000 x $0.01 + 75,000 x $0.005)
    Generates Stripe invoice.
  New 100,000-credit budget granted for next cycle.
  Customer keeps calling the API without interruption.
Month 11: subscription.contract_ending_soon fires.
  Sales contacts customer for renewal.
Month 12: Customer signs renewal.
  POST /v1/subscriptions/{id}/extend
  Contract extends to year 2.
Tips
- Postpaid auto-advances. Unlike prepaid, where you must call /renew, postpaid subscriptions advance automatically at current_period_end. You do not need to call anything. The subscription.renewed webhook delivers the usage summary for invoicing.
- Usage summary is your invoice source. Do not re-query the credit ledger to compute invoices. The usage_summary in the subscription.renewed webhook payload has everything: total credits consumed, per-metric breakdown. Map credits back to fiat using your own pricing table.
- Credit tiers are not fiat tiers. QuotaStack’s tiered metering rules define credit costs per unit. Your fiat pricing tiers may differ. A common pattern: set QuotaStack’s credit cost to 1000mc per unit at all tiers (flat), and apply tiered fiat pricing in your invoicing code. Alternatively, mirror your fiat tiers in credit costs so the usage summary directly reflects tiered consumption.
- Overage with negative balance. When overage_policy: "allow", the balance can go negative. This is normal for postpaid. The negative balance represents unbilled usage. At cycle end, settle the ledger by posting a positive adjustment after invoicing.
- Contract lifecycle is separate from billing cycle. A 12-month contract with monthly billing has 12 cycles. Each cycle auto-advances. The contract boundary only matters for contract_ending_soon and contract_ended events. Billing continues as normal within the contract.
- Idempotency is critical at scale. At thousands of requests per second, network retries happen. Every usage event must carry a unique, deterministic idempotency key (your request ID is ideal). Duplicate events are silently dropped. This gives you exactly-once credit accounting with at-least-once delivery.
See also: Subscriptions, Billing Modes, Metering Rules, Usage Events.