Metering

How to define billable metrics, configure metering rules with flat/per-unit/tiered pricing, and record usage events that debit credits.

Mental Model

Metering is the bridge between "something happened" and "credits got spent". You define the price list (metering rules), report what happened (usage events), and QuotaStack does the math.

Quick Take

Define billable metrics (what you charge for) and metering rules (how much it costs)

Metrics are typed: metered (credit-based), boolean (on/off), gauge (count-with-cap), or static (JSON config)

Three cost types: flat, per-unit, and tiered (graduated or volume)

Record usage events — the system looks up the rule, computes cost, and debits credits

Batch ingestion supported for high-throughput scenarios

Metering

Metering is how QuotaStack maps real-world actions to credit costs. You define what you charge for (billable metrics), how much it costs (metering rules), and then report when it happens (usage events). The system handles the rest — looking up the rule, computing the cost, and debiting credits from the customer’s account.

Billable metrics

A billable metric is a named thing you charge for. It is identified by a unique key within your tenant.

Examples:

Key	Description
`chat_message`	A single chat message sent.
`look`	An AI-generated outfit look.
`api_call`	One API request to your service.
`storage_gb`	One gigabyte-month of storage.
`plan_purchase`	Purchasing or renewing a subscription plan.

Metric keys are strings. Use lowercase with underscores. The key is what you reference in entitlement checks, usage events, and metering rules.
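As a small illustration of the naming convention above, a client-side check might look like the following sketch. The exact pattern is an assumption for illustration; the validation QuotaStack applies to keys may be looser or stricter.

```python
import re

# Assumed convention: lowercase letters/digits in groups separated by single
# underscores. This pattern is illustrative, not QuotaStack's own validation.
_KEY_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_conventional_metric_key(key: str) -> bool:
    """Return True if the key follows the lowercase_with_underscores style."""
    return _KEY_PATTERN.fullmatch(key) is not None
```

Running a check like this before creating a metric keeps keys consistent across entitlement checks, usage events, and metering rules.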

Metric types

Every billable metric has a type that determines what kind of entitlement it represents. The type is set at creation and cannot be changed afterward.

Type	Value shape	Use case
`metered`	`{}` (governed by credit grants + metering rules)	Credit-based usage — the default. Existing behavior, unchanged.
`boolean`	`{"enabled": true|false}`	Feature flags (SSO, custom branding, advanced analytics)
`gauge`	`{"cap": <integer>}`	Count-with-cap (max seats, max projects, max environments)
`static`	`{"config": {...}}`	Arbitrary JSON config (rate limits, allowed model lists, webhook quotas)

Metered metrics work exactly as described in the rest of this page — they’re priced by metering rules and debited via usage events. The other three types represent non-credit entitlements: they don’t cost credits but instead declare what a customer’s plan gives them access to. Their values are resolved from plan-variant entitlements.

Each metric also has a default_value — the value used when no plan-variant entitlement explicitly overrides it. For metered metrics this is {} (unused). For the others, it provides a sensible baseline (e.g. {"enabled": false} for a boolean feature flag).
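The fallback described above can be pictured with a short sketch. Resolution actually happens inside QuotaStack; the function name and the shape of the override argument here are hypothetical and only mirror the documented behavior (override wins, otherwise default_value).

```python
from typing import Any, Optional


def resolve_metric_value(
    metric: dict[str, Any],
    plan_override: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Return the plan-variant override if one exists, else the metric default.

    Hypothetical helper: models the documented fallback, not the real resolver.
    """
    if plan_override is not None:
        return plan_override
    return metric["default_value"]


sso_metric = {"key": "sso", "type": "boolean", "default_value": {"enabled": False}}

# No plan-variant entitlement: the default applies.
value_without_plan = resolve_metric_value(sso_metric, None)
# A plan-variant entitlement overrides the default.
value_with_plan = resolve_metric_value(sso_metric, {"enabled": True})
```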

Listing billable metrics

GET /v1/billable-metrics

curl https://api.quotastack.io/v1/billable-metrics \
  -H "X-API-Key: qs_live_..."

This returns all billable metrics defined for your tenant — the named units you charge for (e.g. chat_message, look). Each metric’s pricing lives in a separate metering rule (see below). A billable metric without an active metering rule cannot be charged for.

Getting a single metric

GET /v1/billable-metrics/{key}

curl https://api.quotastack.io/v1/billable-metrics/look \
  -H "X-API-Key: qs_live_..."

Response:

{
  "key": "look",
  "name": "Outfit Look",
  "description": "An AI-generated outfit look.",
  "type": "metered",
  "default_value": {},
  "status": "active"
}

Creating a billable metric

POST /v1/billable-metrics

The type defaults to metered if omitted. Existing integrations that create metrics without a type continue to work as before.

Metered (default):

curl -X POST https://api.quotastack.io/v1/billable-metrics \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: metric-api-call" \
  -d '{
    "key": "api_call",
    "name": "API Call",
    "description": "One API request."
  }'

Boolean — feature flag:

curl -X POST https://api.quotastack.io/v1/billable-metrics \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: metric-sso" \
  -d '{
    "key": "sso",
    "name": "SSO Access",
    "type": "boolean",
    "default_value": {"enabled": false}
  }'

Gauge — count with cap:

curl -X POST https://api.quotastack.io/v1/billable-metrics \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: metric-max-seats" \
  -d '{
    "key": "max_seats",
    "name": "Team Seats",
    "type": "gauge",
    "default_value": {"cap": 5}
  }'

Static — arbitrary config:

curl -X POST https://api.quotastack.io/v1/billable-metrics \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: metric-rate-limit" \
  -d '{
    "key": "rate_limit",
    "name": "Rate Limit Config",
    "type": "static",
    "default_value": {"config": {"rpm": 100}}
  }'

Updating a billable metric

PATCH /v1/billable-metrics/{key}

You can update a metric’s name, description, status, and default_value. The type field is immutable — attempting to change it returns a 422. If you need a different type, create a new metric with a new key.

curl -X PATCH https://api.quotastack.io/v1/billable-metrics/max_seats \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "default_value": {"cap": 10}
  }'

Metering rules

A metering rule maps a billable metric key to a credit cost. Each tenant has at most one active rule per metric key at any time.

Creating a metering rule

POST /v1/metering-rules

curl -X POST https://api.quotastack.io/v1/metering-rules \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: rule-look-v1" \
  -d '{
    "billable_metric_key": "look",
    "cost_type": "per_unit",
    "unit_cost": 1000
  }'

Rule fields

Field	Type	Required	Description
`billable_metric_key`	string	yes	The metric this rule prices.
`cost_type`	string	yes	`flat`, `per_unit`, or `tiered`.
`base_cost`	int64	for `flat`	Fixed millicredit cost (ignores unit count).
`unit_cost`	int64	for `per_unit`	Millicredits per unit.
`tier_config`	object	for `tiered`	Tier definitions (see below).
`metadata`	object	no	Arbitrary key-value pairs.

Listing metering rules

GET /v1/metering-rules

Supports optional query parameters:

Parameter	Description
`billable_metric_key`	Filter by a specific metric key.
`active_only=true`	Only return currently active rules (no `effective_until`).

Cost types

Flat

A fixed cost regardless of how many units are consumed. Useful for plan purchases or one-time fees.

{
  "billable_metric_key": "plan_purchase",
  "cost_type": "flat",
  "base_cost": 99000
}

Whether an event reports 1 unit or 100 units, the cost is always 99,000 mc (millicredits).

Per-unit

Cost scales linearly with units. The most common type for usage-based billing.

{
  "billable_metric_key": "chat_message",
  "cost_type": "per_unit",
  "unit_cost": 1000
}

Units	Cost
1	1,000 mc
5	5,000 mc
100	100,000 mc
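The flat and per-unit calculations above reduce to two one-line functions. This is a sketch of the arithmetic only; the function names are illustrative and the actual computation is performed by QuotaStack.

```python
def flat_cost(base_cost: int, units: int) -> int:
    """Flat rules ignore the unit count and always charge base_cost."""
    return base_cost


def per_unit_cost(unit_cost: int, units: int) -> int:
    """Per-unit rules scale linearly: millicredits per unit times units."""
    return unit_cost * units
```

With the example rules above, `flat_cost(99000, 100)` stays at 99,000 mc, while `per_unit_cost(1000, 5)` gives 5,000 mc.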

Tiered

Tiered pricing supports two modes: graduated and volume.

Graduated

Each tier prices its own slice of units. The total cost is the sum across all tiers.

{
  "billable_metric_key": "api_call",
  "cost_type": "tiered",
  "tier_config": {
    "mode": "graduated",
    "tiers": [
      { "up_to": 100,  "unit_cost": 500, "flat_cost": 0 },
      { "up_to": 1000, "unit_cost": 300, "flat_cost": 0 },
      { "up_to": null, "unit_cost": 100, "flat_cost": 0 }
    ]
  }
}

For 250 API calls:

Tier	Units	Unit cost	Subtotal
1-100	100	500 mc	50,000 mc
101-250	150	300 mc	45,000 mc
Total	250		95,000 mc

Volume

All units are priced at the rate of the tier they fall into. The tier is determined by the total unit count.

{
  "billable_metric_key": "api_call",
  "cost_type": "tiered",
  "tier_config": {
    "mode": "volume",
    "tiers": [
      { "up_to": 100,  "unit_cost": 500, "flat_cost": 0 },
      { "up_to": 1000, "unit_cost": 300, "flat_cost": 0 },
      { "up_to": null, "unit_cost": 100, "flat_cost": 0 }
    ]
  }
}

For 250 API calls: all 250 fall into the second tier (up_to 1000), so total cost = 250 * 300 = 75,000 mc.

Tier fields

Field	Type	Description
`up_to`	int64 or null	Upper bound for this tier. `null` means infinity (must be the last tier).
`unit_cost`	int64	Millicredits per unit within this tier.
`flat_cost`	int64	Fixed millicredit fee added once per tier entered (graduated) or once per event (volume). See below.

`flat_cost` behavior

Graduated: added once when the customer’s cumulative usage crosses into the tier.
Volume: added once, alongside the per-unit cost at the single tier all units fall into.

Concrete example — two tiers:

Tier 1: 0–100 units, unit_cost: 10, flat_cost: 100
Tier 2: 101+ units, unit_cost: 5, flat_cost: 200

For 150 units:

Mode	Calculation	Total
Graduated	`100 (tier-1 flat) + 100 × 10 + 200 (tier-2 flat) + 50 × 5`	`1,550 mc`
Volume	`200 (tier-2 flat) + 150 × 5`	`950 mc`
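The graduated and volume calculations, including flat_cost, can be sketched as follows. This is a minimal model under two assumptions: up_to is treated as an inclusive upper bound (matching the "1-100" and "101-250" rows above), and the whole unit count arrives in a single event. The function names are illustrative, not part of the API.

```python
from typing import Optional, TypedDict


class Tier(TypedDict):
    up_to: Optional[int]  # None means no upper bound (last tier)
    unit_cost: int        # millicredits per unit in this tier
    flat_cost: int        # fixed millicredit fee for this tier


def graduated_cost(tiers: list[Tier], units: int) -> int:
    """Price each slice of units at its own tier and sum the slices."""
    total = 0
    lower = 0  # highest unit count already priced by earlier tiers
    for tier in tiers:
        if units <= lower:
            break  # no units reach this tier, so its flat_cost is not added
        upper = tier["up_to"]
        reached = units if upper is None else min(units, upper)
        total += (reached - lower) * tier["unit_cost"] + tier["flat_cost"]
        if upper is None:
            break
        lower = upper
    return total


def volume_cost(tiers: list[Tier], units: int) -> int:
    """Price all units at the single tier the total unit count falls into."""
    for tier in tiers:
        upper = tier["up_to"]
        if upper is None or units <= upper:
            return tier["flat_cost"] + units * tier["unit_cost"]
    raise ValueError("units exceed the last tier; it should have up_to = None")


api_call_tiers: list[Tier] = [
    {"up_to": 100, "unit_cost": 500, "flat_cost": 0},
    {"up_to": 1000, "unit_cost": 300, "flat_cost": 0},
    {"up_to": None, "unit_cost": 100, "flat_cost": 0},
]
flat_example_tiers: list[Tier] = [
    {"up_to": 100, "unit_cost": 10, "flat_cost": 100},
    {"up_to": None, "unit_cost": 5, "flat_cost": 200},
]
```

With these inputs the sketch reproduces the documented totals: 95,000 mc (graduated) and 75,000 mc (volume) for 250 API calls, and 1,550 mc versus 950 mc for the 150-unit flat_cost example.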

Rule versioning

Creating a new metering rule for a billable_metric_key that already has an active rule automatically deactivates the old one. The old rule gets an effective_until timestamp set to now; the new rule becomes the active rule with effective_until = null.

This means you can update pricing without downtime:

Create a new rule for the same metric key with the new cost.
The old rule is deactivated immediately.
All subsequent usage events and entitlement checks use the new rule.

Historical ledger entries retain the cost they were computed with at the time of the event. Changing a rule does not retroactively alter past charges.
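To make the versioning behavior concrete, here is an in-memory model of it. The RuleStore class is hypothetical and exists only to illustrate the "one active rule per metric key" invariant; it is not how QuotaStack stores rules.

```python
from datetime import datetime, timezone
from typing import Any, Optional


class RuleStore:
    """Toy model: creating a rule deactivates the previous active one."""

    def __init__(self) -> None:
        self._rules: list[dict[str, Any]] = []

    def create_rule(
        self, rule: dict[str, Any], now: Optional[datetime] = None
    ) -> dict[str, Any]:
        now = now or datetime.now(timezone.utc)
        key = rule["billable_metric_key"]
        # Deactivate the currently active rule for this key, if any.
        for existing in self._rules:
            if (
                existing["billable_metric_key"] == key
                and existing["effective_until"] is None
            ):
                existing["effective_until"] = now
        stored = {**rule, "effective_until": None}
        self._rules.append(stored)
        return stored

    def active_rule(self, key: str) -> Optional[dict[str, Any]]:
        """Return the rule with effective_until = None for this key."""
        for rule in self._rules:
            if rule["billable_metric_key"] == key and rule["effective_until"] is None:
                return rule
        return None
```

Old rules stay in the list with their effective_until timestamp, which mirrors why past ledger entries keep the cost they were computed with.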

Usage events

Usage events are how you tell QuotaStack that something billable happened. Recording a usage event triggers the pipeline that debits credits.

Recording a usage event

POST /v1/usage

curl -X POST https://api.quotastack.io/v1/usage \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: req-abc-123" \
  -d '{
    "external_customer_id": "user_abc",
    "billable_metric_key": "look",
    "units": 1,
    "idempotency_key": "look-gen-xyz-789",
    "metadata": {
      "outfit_id": "outfit_456"
    }
  }'

Field	Type	Required	Description
`external_customer_id`	string	yes*	Your own ID for the customer (the identifier from your system). See Customer identification.
`customer_id`	string	yes*	Alternative: QuotaStack UUID. Exactly one of the two is required.
`billable_metric_key`	string	yes	The metric key being consumed.
`units`	int64	yes	Number of units consumed (must be positive).
`idempotency_key`	string	yes	Unique key for this event. Prevents double-charging on retries.
`occurred_at`	timestamp	no	When the event happened. Defaults to now. Must not be in the future.
`metadata`	object	no	Arbitrary key-value pairs attached to the event.
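The constraints in the table above can be checked on the client before sending, which avoids a round trip for obviously malformed events. The following is a sketch under assumptions: the function name and error strings are illustrative, occurred_at is assumed to be a timezone-aware datetime at this point, and the server remains the source of truth for validation.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def validate_usage_event(
    event: dict[str, Any], now: Optional[datetime] = None
) -> list[str]:
    """Return a list of problems; an empty list means the event looks valid."""
    errors: list[str] = []

    # Exactly one of the two customer identifiers must be present.
    has_external = bool(event.get("external_customer_id"))
    has_internal = bool(event.get("customer_id"))
    if has_external == has_internal:
        errors.append("exactly one of external_customer_id or customer_id is required")

    if not event.get("billable_metric_key"):
        errors.append("billable_metric_key is required")

    units = event.get("units")
    if not isinstance(units, int) or isinstance(units, bool) or units <= 0:
        errors.append("units must be positive")

    if not event.get("idempotency_key"):
        errors.append("idempotency_key is required")

    occurred_at = event.get("occurred_at")
    if occurred_at is not None:
        now = now or datetime.now(timezone.utc)
        if occurred_at > now:
            errors.append("occurred_at must not be in the future")

    return errors
```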

The response returns immediately with status 202 Accepted:

{
  "event_id": "evt_01HXY...",
  "idempotency_key": "look-gen-xyz-789",
  "status": "accepted",
  "estimated_cost": 1000,
  "duplicate": false
}

Two-layer idempotency

Usage events have two levels of idempotency protection:

HTTP-level: The Idempotency-Key header prevents duplicate HTTP requests. If you retry the same POST, you get the same response.
Event-level: The idempotency_key field in the body prevents duplicate business events. Even if two different HTTP requests carry the same event idempotency key, the event is processed at most once.
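The two layers map to two different places in the request, which the sketch below makes explicit using only the Python standard library. The helper name is hypothetical; the URL, header names, and body fields are taken from the curl example above. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any


def build_usage_request(
    api_key: str, event: dict[str, Any], http_idempotency_key: str
) -> urllib.request.Request:
    """Build a POST /v1/usage request carrying both idempotency layers."""
    return urllib.request.Request(
        "https://api.quotastack.io/v1/usage",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
            # HTTP-level: deduplicates retries of this exact request.
            "Idempotency-Key": http_idempotency_key,
        },
    )


event = {
    "external_customer_id": "user_abc",
    "billable_metric_key": "look",
    "units": 1,
    # Event-level: deduplicates the business event itself.
    "idempotency_key": "look-gen-xyz-789",
}
request = build_usage_request("qs_live_...", event, "req-abc-123")
```

On a retry, resend the same request object (or rebuild it with the same two keys) rather than generating fresh values, otherwise neither layer can recognize the duplicate.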

Batch usage events

Record multiple events in a single request:

POST /v1/usage/batch

curl -X POST https://api.quotastack.io/v1/usage/batch \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: batch-abc-001" \
  -d '{
    "events": [
      {
        "external_customer_id": "user_abc",
        "billable_metric_key": "chat_message",
        "units": 1,
        "idempotency_key": "msg-001"
      },
      {
        "external_customer_id": "user_abc",
        "billable_metric_key": "chat_message",
        "units": 1,
        "idempotency_key": "msg-002"
      }
    ]
  }'

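Each event in the batch is validated independently. The response includes per-event status, with one entry per input event in the same order. The example below illustrates a mixed outcome in which one event is accepted and one is rejected: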

{
  "accepted": 1,
  "rejected": 1,
  "results": [
    {
      "event_id": "019d...",
      "idempotency_key": "msg-001",
      "status": "accepted",
      "estimated_cost": 1000,
      "duplicate": false,
      "index": 0
    },
    {
      "idempotency_key": "msg-002",
      "status": "rejected",
      "error": "billable_metric_key is required; units must be positive",
      "index": 1
    }
  ]
}

Rejected events omit event_id and estimated_cost and include an error string (multiple errors are joined with "; "). The overall HTTP response is always 202 Accepted when the batch is well-formed; per-event rejections don't fail the whole request.
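Because a 202 does not mean every event was accepted, callers need to inspect the results array. A minimal sketch of that handling follows; the helper name is hypothetical, and it relies only on the index, status, and error fields shown in the response above.

```python
from typing import Any


def split_batch_results(
    events: list[dict[str, Any]], response: dict[str, Any]
) -> tuple[list[dict[str, Any]], list[tuple[dict[str, Any], list[str]]]]:
    """Pair each result with its input event and separate the rejections.

    Returns (accepted_events, [(rejected_event, [error, ...]), ...]).
    """
    accepted: list[dict[str, Any]] = []
    rejected: list[tuple[dict[str, Any], list[str]]] = []
    for result in response["results"]:
        original = events[result["index"]]
        if result["status"] == "accepted":
            accepted.append(original)
        else:
            # Multiple validation errors arrive joined with "; ".
            rejected.append((original, result["error"].split("; ")))
    return accepted, rejected
```

Rejected events are validation failures, so they generally need to be fixed rather than blindly retried.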

Event-level `idempotency_key` scope

The idempotency_key field inside the event body is scoped per-tenant + per-environment + per-credit-block. Two different customers within the same tenant can safely share the same idempotency_key value without collision. Still, in practice, prefer a globally unique key derived from your business event (message ID, request ID) so your own logs and reconciliation stay clean.

This is separate from the HTTP Idempotency-Key header, which is per-tenant-scoped and deduplicates the whole POST request. See Idempotency for details.

The usage event pipeline

Usage events are processed asynchronously. The POST enqueues the event and returns immediately; a background pipeline then applies the debit in order:

POST /v1/usage

↓

Enqueued (per-tenant, ordered)

↓

Lookup active metering rule for billable_metric_key

↓

Compute credit cost using the rule

↓

Debit credits from customer's account (burn-down order)

↓

Refresh entitlement summary

The POST returns 202 Accepted immediately after enqueue. Events for a given tenant are processed in order, and the event-level idempotency key ensures that credits are debited at most once per event.

If processing a particular event fails with a transient error, it’s retried automatically. Events are only acknowledged after successful processing.
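The pipeline stages can be modeled in a few lines to show how enqueue, rule lookup, cost computation, and debit relate. This is a toy simulation, not QuotaStack's implementation: the class is hypothetical, only per_unit rules are modeled, deduplication is keyed by customer plus idempotency key as a simplification, and an event is skipped when the balance is too low.

```python
from collections import deque
from typing import Any


class PipelineSketch:
    """In-memory stand-in for the asynchronous usage pipeline."""

    def __init__(self, unit_costs: dict[str, int], balances: dict[str, int]) -> None:
        self.unit_costs = unit_costs  # metric key -> millicredits per unit
        self.balances = balances      # customer id -> millicredit balance
        self.queue: deque[dict[str, Any]] = deque()
        self.seen: set[tuple[str, str]] = set()

    def post_usage(self, event: dict[str, Any]) -> dict[str, str]:
        """Enqueue only; the debit happens later in drain()."""
        self.queue.append(event)
        return {"status": "accepted"}

    def drain(self) -> None:
        """Process queued events in order."""
        while self.queue:
            event = self.queue.popleft()
            customer = event["external_customer_id"]
            dedupe_key = (customer, event["idempotency_key"])
            if dedupe_key in self.seen:
                continue  # duplicate business event: processed at most once
            self.seen.add(dedupe_key)
            # Look up the active rule and compute the cost.
            cost = self.unit_costs[event["billable_metric_key"]] * event["units"]
            # Debit only if the balance covers the cost; never go negative.
            if self.balances[customer] >= cost:
                self.balances[customer] -= cost
```

Note that post_usage reports "accepted" for all three kinds of event (fresh, duplicate, unaffordable), which is why a 202 alone says nothing about the final balance.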

Typical latency

Pipeline processing is typically single-digit milliseconds to a few seconds end-to-end. There is no strict SLA — treat it as near-real-time, not guaranteed-real-time.

If your flow requires strict ordering or synchronous insufficient-credit errors, use reservations instead of usage events.

Insufficient balance during async processing

The 202 Accepted confirms the event is enqueued, not that the debit succeeded. At processing time, if the customer’s effective_balance is less than the computed cost, the debit fails silently — no negative balance is written, the customer is not charged, and no webhook is emitted today.

To avoid this, always:

Pre-check via the single-metric entitlement endpoint before sending the event, or
Use /v1/reservations for expensive operations where you need a synchronous 402 on insufficient credits.

Overage policy (allow, notify) does not automatically cause silent overdrafts. If you want to permit overage, you must check the entitlement’s allowed: true response (which reflects the policy) before posting the usage event; the backend pipeline itself will not write a negative balance even under allow.
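The pre-check pattern can be sketched as a small wrapper. Both callables are placeholders you would implement against the entitlement endpoint and POST /v1/usage; their names and signatures are assumptions, and the only response field relied on is the allowed flag mentioned above.

```python
from typing import Any, Callable, Optional


def record_if_allowed(
    check_entitlement: Callable[[str, str, int], dict[str, Any]],
    post_usage: Callable[[dict[str, Any]], dict[str, Any]],
    event: dict[str, Any],
) -> Optional[dict[str, Any]]:
    """Post the usage event only when the entitlement check allows it.

    Returns the usage response, or None when the check says not allowed.
    """
    entitlement = check_entitlement(
        event["external_customer_id"],
        event["billable_metric_key"],
        event["units"],
    )
    if not entitlement.get("allowed", False):
        return None
    return post_usage(event)
```

A check followed by a post is not atomic: another event can consume the balance in between. When that window matters, reservations are the better fit.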

Periodic reconciliation — comparing the customer balance from GET /v1/customers/{id}/credits against your own expected totals — is the safest way to detect skew.
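A reconciliation pass boils down to comparing two maps of balances. The sketch below assumes you have already extracted a millicredit balance per customer from the credits endpoint (its response shape is not shown on this page) and that you track expected balances yourself; the function name is illustrative.

```python
def find_balance_skew(
    expected: dict[str, int],
    reported: dict[str, int],
    tolerance: int = 0,
) -> dict[str, int]:
    """Return reported-minus-expected for customers whose balances differ.

    A positive value means QuotaStack reports more credits than expected,
    for example because a debit was skipped for insufficient balance.
    """
    skew: dict[str, int] = {}
    for customer_id in sorted(expected.keys() | reported.keys()):
        diff = reported.get(customer_id, 0) - expected.get(customer_id, 0)
        if abs(diff) > tolerance:
            skew[customer_id] = diff
    return skew
```

Since processing is asynchronous, recently posted events may not be reflected yet, so a small tolerance or a delay before comparing can reduce false alarms.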

Common Mistakes

The mistakes developers typically make with this concept — and what to do instead.

Don't look for a /debit endpoint — it doesn't exist

Why

Debiting is intentionally not directly exposed. You record a usage event and the async pipeline handles rule lookup, cost calculation, and debit atomically.

Don't change a billable_metric_key mid-flight

Why

Metering rules are keyed by the metric key. Renaming it orphans historical rules and breaks in-flight events. Create a new metric instead.

Don't try to change a metric's type after creation

Why

Type is immutable. If you need a different type, create a new metric with a new key.
