Metering
How to define billable metrics, configure metering rules with flat/per-unit/tiered pricing, and record usage events that debit credits.
Metering is the bridge between "something happened" and "credits got spent". You define the price list (metering rules), report what happened (usage events), and QuotaStack does the math.
Metering is how QuotaStack maps real-world actions to credit costs. You define what you charge for (billable metrics), how much it costs (metering rules), and then report when it happens (usage events). The system handles the rest — looking up the rule, computing the cost, and debiting credits from the customer’s account.
Billable metrics
A billable metric is a named thing you charge for. It is identified by a unique key within your tenant.
Examples:
| Key | Description |
|---|---|
| chat_message | A single chat message sent. |
| look | An AI-generated outfit look. |
| api_call | One API request to your service. |
| storage_gb | One gigabyte-month of storage. |
| plan_purchase | Purchasing or renewing a subscription plan. |
Metric keys are strings. Use lowercase with underscores. The key is what you reference in entitlement checks, usage events, and metering rules.
Listing billable metrics
GET /v1/billable-metrics
curl https://api.quotastack.io/v1/billable-metrics \
-H "X-API-Key: qs_live_..."
This returns all billable metrics defined for your tenant — the named units you charge for (e.g. chat_message, look). Each metric’s pricing lives in a separate metering rule (see below). A billable metric without an active metering rule cannot be charged for.
Getting a single metric
GET /v1/billable-metrics/{key}
curl https://api.quotastack.io/v1/billable-metrics/look \
-H "X-API-Key: qs_live_..."
Metering rules
A metering rule maps a billable metric key to a credit cost. Each tenant has at most one active rule per metric key at any time.
Creating a metering rule
POST /v1/metering-rules
curl -X POST https://api.quotastack.io/v1/metering-rules \
-H "X-API-Key: qs_live_..." \
-H "Content-Type: application/json" \
-H "Idempotency-Key: rule-look-v1" \
-d '{
"billable_metric_key": "look",
"cost_type": "per_unit",
"unit_cost": 1000
}'
Rule fields
| Field | Type | Required | Description |
|---|---|---|---|
| billable_metric_key | string | yes | The metric this rule prices. |
| cost_type | string | yes | flat, per_unit, or tiered. |
| base_cost | int64 | for flat | Fixed millicredit cost (ignores unit count). |
| unit_cost | int64 | for per_unit | Millicredits per unit. |
| tier_config | object | for tiered | Tier definitions (see below). |
| metadata | object | no | Arbitrary key-value pairs. |
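A client-side check can catch a missing cost field before the request is sent. The sketch below encodes only the requirements listed in the table above; `validate_rule` and its error strings are hypothetical helpers, not part of the QuotaStack API, and the server may enforce additional rules.

```python
# Which cost field each cost_type requires, per the rule fields table.
REQUIRED_BY_COST_TYPE = {
    "flat": "base_cost",
    "per_unit": "unit_cost",
    "tiered": "tier_config",
}


def validate_rule(rule):
    """Return a list of problems found in a metering rule payload (empty if none)."""
    errors = []
    if not rule.get("billable_metric_key"):
        errors.append("billable_metric_key is required")
    cost_type = rule.get("cost_type")
    if cost_type not in REQUIRED_BY_COST_TYPE:
        errors.append("cost_type must be flat, per_unit, or tiered")
    elif REQUIRED_BY_COST_TYPE[cost_type] not in rule:
        errors.append(f"{REQUIRED_BY_COST_TYPE[cost_type]} is required for {cost_type}")
    return errors


print(validate_rule({"billable_metric_key": "look", "cost_type": "per_unit", "unit_cost": 1000}))
print(validate_rule({"billable_metric_key": "look", "cost_type": "flat"}))
```

The first call prints an empty list; the second reports that base_cost is missing for a flat rule.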
Listing metering rules
GET /v1/metering-rules
Supports optional query parameters:
| Parameter | Description |
|---|---|
| billable_metric_key | Filter by a specific metric key. |
| active_only=true | Only return currently active rules (no effective_until). |
Cost types
Flat
A fixed cost regardless of how many units are consumed. Useful for plan purchases or one-time fees.
{
"billable_metric_key": "plan_purchase",
"cost_type": "flat",
"base_cost": 99000
}
Whether an event reports 1 unit or 100 units, the cost is always 99,000 mc (millicredits).
Per-unit
Cost scales linearly with units. The most common type for usage-based billing.
{
"billable_metric_key": "chat_message",
"cost_type": "per_unit",
"unit_cost": 1000
}
| Units | Cost |
|---|---|
| 1 | 1,000 mc |
| 5 | 5,000 mc |
| 100 | 100,000 mc |
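The flat and per-unit arithmetic can be sketched in a few lines of Python. This only illustrates the calculations described above and is not QuotaStack's implementation; the function name `simple_cost` is hypothetical.

```python
def simple_cost(rule, units):
    """Cost in millicredits for a flat or per_unit rule (illustrative only)."""
    if units <= 0:
        raise ValueError("units must be positive")
    if rule["cost_type"] == "flat":
        # Flat rules ignore the unit count entirely.
        return rule["base_cost"]
    if rule["cost_type"] == "per_unit":
        # Per-unit rules scale linearly with the unit count.
        return rule["unit_cost"] * units
    raise ValueError(f"unsupported cost_type: {rule['cost_type']}")


flat_rule = {"billable_metric_key": "plan_purchase", "cost_type": "flat", "base_cost": 99000}
per_unit_rule = {"billable_metric_key": "chat_message", "cost_type": "per_unit", "unit_cost": 1000}

print(simple_cost(flat_rule, 100))     # 99000
print(simple_cost(per_unit_rule, 5))   # 5000
```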
Tiered
Tiered pricing supports two modes: graduated and volume.
Graduated
Each tier prices its own slice of units. The total cost is the sum across all tiers.
{
"billable_metric_key": "api_call",
"cost_type": "tiered",
"tier_config": {
"mode": "graduated",
"tiers": [
{ "up_to": 100, "unit_cost": 500, "flat_cost": 0 },
{ "up_to": 1000, "unit_cost": 300, "flat_cost": 0 },
{ "up_to": null, "unit_cost": 100, "flat_cost": 0 }
]
}
}
For 250 API calls:
| Tier | Units | Unit cost | Subtotal |
|---|---|---|---|
| 1-100 | 100 | 500 mc | 50,000 mc |
| 101-250 | 150 | 300 mc | 45,000 mc |
| Total | 250 | | 95,000 mc |
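The graduated calculation above can be reproduced with a short function. This is a sketch of the arithmetic, assuming each tier's up_to bound is inclusive (as the 1-100 row suggests) and that a tier's flat_cost is added once when any units land in it; `graduated_cost` is a hypothetical name, not a QuotaStack API.

```python
def graduated_cost(tiers, units):
    """Sum the cost of each tier's own slice of units (millicredits)."""
    total = 0
    lower = 0  # inclusive upper bound of the previous tier
    for tier in tiers:
        if units <= lower:
            break  # no units reach this tier
        upper = tier["up_to"]  # None stands in for JSON null (no upper bound)
        reached = units if upper is None else min(units, upper)
        in_tier = reached - lower
        total += tier["flat_cost"] + in_tier * tier["unit_cost"]
        if upper is None:
            break
        lower = upper
    return total


TIERS = [
    {"up_to": 100, "unit_cost": 500, "flat_cost": 0},
    {"up_to": 1000, "unit_cost": 300, "flat_cost": 0},
    {"up_to": None, "unit_cost": 100, "flat_cost": 0},
]

print(graduated_cost(TIERS, 250))  # 95000
```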
Volume
All units are priced at the rate of the tier they fall into. The tier is determined by the total unit count.
{
"billable_metric_key": "api_call",
"cost_type": "tiered",
"tier_config": {
"mode": "volume",
"tiers": [
{ "up_to": 100, "unit_cost": 500, "flat_cost": 0 },
{ "up_to": 1000, "unit_cost": 300, "flat_cost": 0 },
{ "up_to": null, "unit_cost": 100, "flat_cost": 0 }
]
}
}
For 250 API calls: all 250 fall into the second tier (up_to 1000), so total cost = 250 * 300 = 75,000 mc.
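Volume mode reduces to finding the single tier that contains the total and pricing every unit at that tier's rate. The sketch below assumes up_to is an inclusive bound; `volume_cost` is a hypothetical helper used only to illustrate the arithmetic.

```python
def volume_cost(tiers, units):
    """Price all units at the rate of the tier the total falls into (millicredits)."""
    for tier in tiers:
        upper = tier["up_to"]  # None stands in for JSON null (no upper bound)
        if upper is None or units <= upper:
            return tier["flat_cost"] + units * tier["unit_cost"]
    raise ValueError("no tier covers this unit count")


TIERS = [
    {"up_to": 100, "unit_cost": 500, "flat_cost": 0},
    {"up_to": 1000, "unit_cost": 300, "flat_cost": 0},
    {"up_to": None, "unit_cost": 100, "flat_cost": 0},
]

print(volume_cost(TIERS, 250))  # 75000
```

Note that under this assumption 100 units cost 50,000 mc while 101 units cost 30,300 mc, because crossing a boundary reprices every unit.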
Tier fields
| Field | Type | Description |
|---|---|---|
| up_to | int64 or null | Upper bound for this tier. null means infinity (must be the last tier). |
| unit_cost | int64 | Millicredits per unit within this tier. |
| flat_cost | int64 | Fixed millicredit fee added once per tier entered (graduated) or once per event (volume). See below. |
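A tier list can be sanity-checked locally before it is submitted. The sketch below enforces the documented rule that a null up_to may only appear on the last tier; the requirement that bounds strictly increase, and the `validate_tiers` helper itself, are assumptions made for illustration.

```python
def validate_tiers(tiers):
    """Return a list of problems found in a tier_config tiers array (empty if none)."""
    if not tiers:
        return ["at least one tier is required"]  # assumption, not documented
    errors = []
    previous = 0
    for i, tier in enumerate(tiers):
        up_to = tier["up_to"]  # None stands in for JSON null
        if up_to is None:
            if i != len(tiers) - 1:
                errors.append(f"tier {i}: up_to null is only allowed on the last tier")
            continue
        if up_to <= previous:
            # Assumed: each bound must be higher than the one before it.
            errors.append(f"tier {i}: up_to must be greater than the previous bound")
        previous = up_to
    return errors


good = [{"up_to": 100}, {"up_to": 1000}, {"up_to": None}]
bad = [{"up_to": None}, {"up_to": 100}]
print(validate_tiers(good))
print(validate_tiers(bad))
```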
flat_cost behavior
- Graduated: added once when the customer’s cumulative usage crosses into the tier.
- Volume: added once, alongside the per-unit cost at the single tier all units fall into.
Concrete example — two tiers:
- Tier 1: 0–100 units, unit_cost: 10, flat_cost: 100
- Tier 2: 101+ units, unit_cost: 5, flat_cost: 200
For 150 units:
| Mode | Calculation | Total |
|---|---|---|
| Graduated | 100 (tier-1 flat) + 100 × 10 + 200 (tier-2 flat) + 50 × 5 | 1,550 mc |
| Volume | 200 (tier-2 flat) + 150 × 5 | 950 mc |
Rule versioning
Creating a new metering rule for a billable_metric_key that already has an active rule automatically deactivates the old one. The old rule gets an effective_until timestamp set to now; the new rule becomes the active rule with effective_until = null.
This means you can update pricing without downtime:
- Create a new rule for the same metric key with the new cost.
- The old rule is deactivated immediately.
- All subsequent usage events and entitlement checks use the new rule.
Historical ledger entries retain the cost they were computed with at the time of the event. Changing a rule does not retroactively alter past charges.
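The versioning behavior can be modeled with an in-memory list of rules. This is a toy model of the semantics described above, not the service's storage layer; `create_rule` and `active_rule` are hypothetical names, and only per-unit rules are modeled to keep the sketch short.

```python
from datetime import datetime, timezone


def create_rule(rules, metric_key, unit_cost, now):
    """Add a new active rule, closing any active rule for the same metric key."""
    for rule in rules:
        if rule["billable_metric_key"] == metric_key and rule["effective_until"] is None:
            rule["effective_until"] = now  # the old rule is deactivated immediately
    new_rule = {
        "billable_metric_key": metric_key,
        "cost_type": "per_unit",
        "unit_cost": unit_cost,
        "effective_until": None,  # None marks the active rule
    }
    rules.append(new_rule)
    return new_rule


def active_rule(rules, metric_key):
    """Return the single active rule for a metric key, or None."""
    for rule in rules:
        if rule["billable_metric_key"] == metric_key and rule["effective_until"] is None:
            return rule
    return None


rules = []
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)
create_rule(rules, "look", 1000, t1)
create_rule(rules, "look", 1500, t2)
print(active_rule(rules, "look")["unit_cost"])  # 1500
```

The first rule stays in the list with effective_until set to t2, which mirrors how past charges keep the cost they were computed with.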
Usage events
Usage events are how you tell QuotaStack that something billable happened. Recording a usage event triggers the pipeline that debits credits.
Recording a usage event
POST /v1/usage
curl -X POST https://api.quotastack.io/v1/usage \
-H "X-API-Key: qs_live_..." \
-H "Content-Type: application/json" \
-H "Idempotency-Key: req-abc-123" \
-d '{
"external_customer_id": "user_abc",
"billable_metric_key": "look",
"units": 1,
"idempotency_key": "look-gen-xyz-789",
"metadata": {
"outfit_id": "outfit_456"
}
}'
| Field | Type | Required | Description |
|---|---|---|---|
| external_customer_id | string | yes* | Your own identifier for the customer (the ID used in your system). See Customer identification. |
| customer_id | string | yes* | Alternative: QuotaStack UUID. Exactly one of the two is required. |
| billable_metric_key | string | yes | The metric key being consumed. |
| units | int64 | yes | Number of units consumed (must be positive). |
| idempotency_key | string | yes | Unique key for this event. Prevents double-charging on retries. |
| occurred_at | timestamp | no | When the event happened. Defaults to now. Must not be in the future. |
| metadata | object | no | Arbitrary key-value pairs attached to the event. |
The response returns immediately with status 202 Accepted:
{
"event_id": "evt_01HXY...",
"idempotency_key": "look-gen-xyz-789",
"status": "accepted",
"estimated_cost": 1000,
"duplicate": false
}
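The same request can be assembled from Python with the standard library. The sketch below builds the request object and checks the documented field requirements first, but does not send anything; `build_usage_request` is a hypothetical helper, and the API key shown is the same placeholder used in the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.quotastack.io"


def build_usage_request(api_key, event, http_idempotency_key):
    """Validate a usage event locally and return an unsent POST /v1/usage request."""
    has_external = "external_customer_id" in event
    has_internal = "customer_id" in event
    if has_external == has_internal:
        raise ValueError("provide exactly one of external_customer_id or customer_id")
    for field in ("billable_metric_key", "idempotency_key"):
        if not event.get(field):
            raise ValueError(f"{field} is required")
    if event.get("units", 0) <= 0:
        raise ValueError("units must be positive")
    return urllib.request.Request(
        f"{API_BASE}/v1/usage",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
            "Idempotency-Key": http_idempotency_key,
        },
    )


request = build_usage_request(
    "qs_live_...",  # placeholder, as in the curl example
    {
        "external_customer_id": "user_abc",
        "billable_metric_key": "look",
        "units": 1,
        "idempotency_key": "look-gen-xyz-789",
    },
    "req-abc-123",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a 202 response means the event was accepted for processing.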
Two-layer idempotency
Usage events have two levels of idempotency protection:
- HTTP-level: The Idempotency-Key header prevents duplicate HTTP requests. If you retry the same POST, you get the same response.
- Event-level: The idempotency_key field in the body prevents duplicate business events. Even if two different HTTP requests carry the same event idempotency key, the event is processed at most once.
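The interaction between the two layers is easier to see in a toy model. The class below is an in-memory illustration, not the real service: `UsageEndpointSketch` is hypothetical, it ignores key scoping, and it assumes a repeated event key is answered with duplicate set to true.

```python
class UsageEndpointSketch:
    """Toy model of HTTP-level and event-level deduplication."""

    def __init__(self):
        self.responses_by_http_key = {}  # Idempotency-Key header -> stored response
        self.events_by_event_key = {}    # body idempotency_key -> recorded event

    def post(self, http_key, event):
        # Layer 1: a retried HTTP request replays the stored response.
        if http_key in self.responses_by_http_key:
            return self.responses_by_http_key[http_key]
        # Layer 2: a known event key is not recorded a second time.
        event_key = event["idempotency_key"]
        duplicate = event_key in self.events_by_event_key
        if not duplicate:
            self.events_by_event_key[event_key] = event
        response = {"idempotency_key": event_key, "status": "accepted", "duplicate": duplicate}
        self.responses_by_http_key[http_key] = response
        return response


api = UsageEndpointSketch()
event = {"idempotency_key": "look-gen-xyz-789", "units": 1}
first = api.post("req-abc-123", event)
retry = api.post("req-abc-123", event)   # same HTTP key: same response replayed
other = api.post("req-def-456", event)   # new HTTP key, same event key
print(first["duplicate"], retry["duplicate"], other["duplicate"])  # False False True
```

In all three calls only one event is recorded, which is the property that protects the customer from a double charge.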
Batch usage events
Record multiple events in a single request:
POST /v1/usage/batch
curl -X POST https://api.quotastack.io/v1/usage/batch \
-H "X-API-Key: qs_live_..." \
-H "Content-Type: application/json" \
-H "Idempotency-Key: batch-abc-001" \
-d '{
"events": [
{
"external_customer_id": "user_abc",
"billable_metric_key": "chat_message",
"units": 1,
"idempotency_key": "msg-001"
},
{
"external_customer_id": "user_abc",
"billable_metric_key": "chat_message",
"units": 1,
"idempotency_key": "msg-002"
}
]
}'
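Each event in the batch is validated independently. The response includes per-event status, with one entry per input event in the same order. The example below illustrates the response shape for a batch whose second event fails validation (it is not the response to the request above, in which both events are valid):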
{
"accepted": 1,
"rejected": 1,
"results": [
{
"event_id": "019d...",
"idempotency_key": "msg-001",
"status": "accepted",
"estimated_cost": 1000,
"duplicate": false,
"index": 0
},
{
"idempotency_key": "msg-002",
"status": "rejected",
"error": "billable_metric_key is required; units must be positive",
"index": 1
}
]
}
Rejected events omit event_id and estimated_cost and include an error string (multiple errors joined with ; ). The overall HTTP response is always 202 Accepted when the batch is well-formed — per-event rejections don’t fail the whole request.
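Because a batch can succeed at the HTTP level while individual events are rejected, callers need to inspect every result. The sketch below splits a response of the shape shown above into accepted and rejected entries; `split_batch_results` is a hypothetical helper.

```python
def split_batch_results(response):
    """Separate accepted results from rejected ones, splitting joined error strings."""
    accepted, rejected = [], []
    for result in response["results"]:
        if result["status"] == "accepted":
            accepted.append(result)
        else:
            # Multiple errors arrive joined with ';' in a single string.
            errors = [e.strip() for e in result.get("error", "").split(";") if e.strip()]
            rejected.append({
                "index": result["index"],
                "idempotency_key": result.get("idempotency_key"),
                "errors": errors,
            })
    return accepted, rejected


response = {
    "accepted": 1,
    "rejected": 1,
    "results": [
        {"event_id": "019d...", "idempotency_key": "msg-001", "status": "accepted",
         "estimated_cost": 1000, "duplicate": False, "index": 0},
        {"idempotency_key": "msg-002", "status": "rejected",
         "error": "billable_metric_key is required; units must be positive", "index": 1},
    ],
}

accepted, rejected = split_batch_results(response)
for item in rejected:
    print(item["index"], item["errors"])
```

The index field maps each rejection back to its position in the events array you sent, so the failed event can be fixed and resubmitted on its own.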
Event-level idempotency_key scope
The idempotency_key field inside the event body is scoped per-tenant + per-environment + per-credit-block. Two different customers within the same tenant can safely share the same idempotency_key value without collision. Still, in practice, prefer a globally unique key derived from your business event (message ID, request ID) so your own logs and reconciliation stay clean.
This is separate from the HTTP Idempotency-Key header, which is scoped per tenant and deduplicates the whole POST request. See Idempotency for details.
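One way to get a stable event key is to derive it from an identifier your system already assigns to the business event, so that every retry produces the same value. The format below (metric key, colon, business ID) is only an illustrative convention, and `event_idempotency_key` is a hypothetical helper.

```python
def event_idempotency_key(metric_key, business_event_id):
    """Build a deterministic event key from a metric key and your own event ID."""
    if not metric_key or not business_event_id:
        raise ValueError("both metric_key and business_event_id are required")
    return f"{metric_key}:{business_event_id}"


# The same business event always maps to the same key, even across retries.
print(event_idempotency_key("chat_message", "msg-001"))  # chat_message:msg-001
```

Avoid random values generated at send time: a retry would produce a new key and the event could be charged twice.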
The usage event pipeline
Usage events are processed asynchronously. The POST enqueues the event and returns 202 Accepted immediately; a background pipeline then looks up the active metering rule, computes the cost, and debits the customer’s credits.
Events for a given tenant are processed in order, ensuring that credits are debited exactly once per event (using the event-level idempotency key).
If processing a particular event fails with a transient error, it’s retried automatically. Events are only acknowledged after successful processing.
Typical latency
Pipeline processing is typically single-digit milliseconds to a few seconds end-to-end. There is no strict SLA — treat it as near-real-time, not guaranteed-real-time.
If your flow requires strict ordering or synchronous insufficient-credit errors, use reservations instead of usage events.
Insufficient balance during async processing
The 202 Accepted confirms the event is enqueued, not that the debit succeeded. At processing time, if the customer’s effective_balance is less than the computed cost, the debit fails silently — no negative balance is written, the customer is not charged, and no webhook is emitted today.
To avoid this, always:
- Pre-check via the single-metric entitlement endpoint before sending the event, or
- Use /v1/reserve for expensive operations where you need a synchronous 402 on insufficient credits.
Overage policy (allow, notify) does not automatically cause silent overdrafts. If you want to permit overage, you must check the entitlement’s allowed: true response (which reflects the policy) before posting the usage event; the backend pipeline itself will not write a negative balance even under allow.
Periodic reconciliation — comparing the customer balance from GET /v1/customers/{id}/credits against your own expected totals — is the safest way to detect skew.
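The sketch below shows how a silently skipped debit surfaces as skew during reconciliation. It is a simplified model: `apply_debits` mimics the documented behavior that a debit larger than the balance is dropped rather than overdrawn, and `balance_skew` is a hypothetical helper that takes the reported balance as a plain integer, since the response shape of the credits endpoint is not covered here.

```python
def apply_debits(balance_mc, costs_mc):
    """Apply debits in order, skipping any that exceed the remaining balance."""
    applied = []
    for cost in costs_mc:
        if cost <= balance_mc:
            balance_mc -= cost
            applied.append(cost)
        # Otherwise the debit is dropped: no negative balance, no charge.
    return balance_mc, applied


def balance_skew(starting_mc, expected_costs_mc, reported_balance_mc):
    """Reported balance minus the balance your own records predict."""
    expected = starting_mc - sum(expected_costs_mc)
    return reported_balance_mc - expected


costs = [1000, 1000, 1000]           # three events you sent
reported, applied = apply_debits(2500, costs)
print(reported)                           # 500
print(balance_skew(2500, costs, reported))  # 1000
```

A positive skew means the customer was charged less than your records expect, which is the signature of a debit that failed during asynchronous processing.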
Common Mistakes
The mistakes developers typically make with this concept — and what to do instead.
/debit endpoint — it doesn't exist