Entitlements

How to check whether a customer can perform an action, how entitlement results are computed, and how caching keeps checks fast on the hot path.

Mental Model

An entitlement check is the bouncer at the door. Before your app starts an expensive operation, it asks QuotaStack "can this customer afford this?" and gets an answer back in milliseconds.

Quick Take

One endpoint answers: "Can this customer do this right now?"

Returns allowed, estimated cost, and remaining balance

Cost computed from metering rules: flat, per-unit, or tiered

Non-metered entitlements (boolean, gauge, static) are resolved from plan-variant attachments, not credit balance

Checks are fast enough to gate every action: typically 5-15 ms for a live single-metric check, under 1 ms for a cached bulk check

Entitlements

An entitlement check answers one question: “Can this customer perform this action right now?”

You call it before starting work — before generating an image, sending a message, making an API call. The response tells you whether to proceed, how much it will cost, and how many more units the customer can afford.

The check endpoint

Two URL forms — pick the one that matches the ID you have (see Customer identification):

GET /v1/customers/{customer_id}/entitlements/{billable_metric_key}?units=N
GET /v1/customer-by-external-id/{external_id}/entitlements/{billable_metric_key}?units=N

Parameter	Location	Required	Default	Description
`customer_id` / `external_id`	path	yes	—	The customer to check, in either ID form.
`billable_metric_key`	path	yes	—	The metric key (e.g. `chat_message`, `look`, `api_call`).
`units`	query	no	1	How many units the customer wants to consume.
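As a rough sketch of how the two URL forms might be assembled in client code, the helper below builds the check URL from whichever ID is available. The function name `build_check_url` and the `BASE_URL` constant are illustrative assumptions, not part of any QuotaStack SDK; the host is the one used in the curl examples on this page.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Host as it appears in this page's curl examples.
BASE_URL = "https://api.quotastack.io"


def build_check_url(
    metric_key: str,
    units: int = 1,
    *,
    customer_id: Optional[str] = None,
    external_id: Optional[str] = None,
) -> str:
    """Return the single-metric check URL for exactly one of the two ID forms."""
    if (customer_id is None) == (external_id is None):
        raise ValueError("pass exactly one of customer_id or external_id")
    if customer_id is not None:
        prefix = f"/v1/customers/{quote(customer_id, safe='')}"
    else:
        prefix = f"/v1/customer-by-external-id/{quote(external_id, safe='')}"
    # Percent-encode path segments so unusual IDs cannot break the URL.
    path = f"{prefix}/entitlements/{quote(metric_key, safe='')}"
    return f"{BASE_URL}{path}?{urlencode({'units': units})}"
```

Requiring exactly one ID keeps the caller from accidentally mixing the two forms in a single request.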

Example request

Check whether a customer can generate 1 outfit look, which costs 1,000 millicredits (mc) per the metering rule. Using the external-id form:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."

Example response

{
  "allowed": true,
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "billable_metric_key": "look",
  "units": 1,
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "estimated_cost": 1000,
  "balance_after": 139000,
  "subscription_status": "active",
  "overage_policy": "block"
}

Response fields

Field	Type	Description
`allowed`	boolean	Whether the customer can perform the action.
`balance`	int64	Total millicredits in the account.
`reserved_balance`	int64	Millicredits held by active reservations.
`effective_balance`	int64	`balance - reserved_balance`. The usable amount.
`estimated_cost`	int64	Millicredits this operation would cost, computed from the active metering rule.
`balance_after`	int64	`effective_balance - estimated_cost`. Can be negative if the overage policy allows it.
`subscription_status`	string or null	The customer’s subscription status (`active`, `trialing`, `overdue`, etc.), or null if no subscription.
`overage_policy`	string	Tenant-level policy: `block` (deny when insufficient), `allow` (permit overage), or `notify` (permit but flag).
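The two derived fields follow directly from the formulas in the table. A minimal sketch, using the numbers from the example response (the function name `derive_balances` is an illustrative assumption):

```python
from typing import Tuple


def derive_balances(balance: int, reserved_balance: int, estimated_cost: int) -> Tuple[int, int]:
    """Return (effective_balance, balance_after) in millicredits."""
    effective_balance = balance - reserved_balance
    balance_after = effective_balance - estimated_cost
    return effective_balance, balance_after


# Values from the example response: balance 150000, reserved 10000, cost 1000.
effective, after = derive_balances(150_000, 10_000, 1_000)
print(effective, after)
```

Recomputing these on the client is only useful as a sanity check; the response already carries both values.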

When is `allowed` true?

If effective_balance >= estimated_cost, the customer has enough credits. allowed = true.
If the overage policy is allow or notify, allowed = true even when effective_balance < estimated_cost. In that case the balance_after field is negative, signaling overage.
If the customer’s subscription is overdue and the plan variant has allow_usage_while_overdue = false, allowed = false even if balance is sufficient.
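The three rules above can be sketched as a single function. This is an illustration, not QuotaStack's implementation: the function name and parameter layout are hypothetical, and the ordering (the overdue rule is evaluated first and wins over an allow/notify overage policy) is an assumption, since the page only states that an overdue subscription denies the check even when the balance is sufficient.

```python
from typing import Optional


def is_allowed(
    effective_balance: int,
    estimated_cost: int,
    overage_policy: str,
    subscription_status: Optional[str] = None,
    allow_usage_while_overdue: bool = True,
) -> bool:
    """Approximate the `allowed` decision from the documented rules."""
    # Assumption: the overdue rule takes precedence over everything else.
    if subscription_status == "overdue" and not allow_usage_while_overdue:
        return False
    # Enough usable credits to cover the operation.
    if effective_balance >= estimated_cost:
        return True
    # Not enough credits: only permissive overage policies let it through.
    return overage_policy in ("allow", "notify")
```

Treat the server's `allowed` field as the source of truth; a local function like this is mainly useful for reasoning about responses in tests.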

Configuring overage policy

overage_policy is set at the tenant level and can be overridden per-customer:

# Tenant default
curl -X PATCH https://api.quotastack.io/v1/tenants/{tenant_id}/config \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: config-overage:{tenant_id}" \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "block" }'

# Per-customer override
curl -X PATCH https://api.quotastack.io/v1/customer-by-external-id/user_abc \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "allow" }'

When a customer’s override is null, the tenant default applies. Overage policy applies to consumption (usage events) and entitlement checks — it does not apply to manual grant or adjust operations, which are always strict.

Customers without any active subscription still have an overage_policy; it’s a tenant/customer property independent of subscriptions.
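The fallback from customer override to tenant default can be pictured as follows. This is a sketch of the documented behavior only; the function name is hypothetical, and Python's `None` stands in for a JSON `null` override.

```python
from typing import Optional

# The three policy values listed in the response-field table.
VALID_POLICIES = {"block", "allow", "notify"}


def effective_overage_policy(tenant_default: str, customer_override: Optional[str]) -> str:
    """Use the customer's override when set, otherwise the tenant default."""
    policy = tenant_default if customer_override is None else customer_override
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown overage policy: {policy!r}")
    return policy
```

For instance, a tenant default of `block` with a per-customer override of `allow` resolves to `allow`, while the same tenant with no override resolves to `block`.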

How cost is computed

The entitlement check looks up the active metering rule for the given billable_metric_key and computes cost based on the rule’s cost_type:

Cost type	Formula	Example
`flat`	`base_cost` (fixed, ignores units)	Plan purchase: `base_cost = 99000` mc. Checking 1 unit costs 99,000 mc.
`per_unit`	`unit_cost * units`	Chat message: `unit_cost = 1000` mc. Checking 5 units costs 5,000 mc.
`tiered`	Graduated or volume pricing across tiers	See metering rules for details.

If no active metering rule exists for the metric key, the check returns a 404.
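To make the `flat` and `per_unit` formulas concrete, here is a small sketch that reproduces the examples from the table. The function name and signature are illustrative assumptions; `tiered` is deliberately left unimplemented because its formula is defined in the metering rules documentation rather than on this page.

```python
def estimate_cost(cost_type: str, units: int, base_cost: int = 0, unit_cost: int = 0) -> int:
    """Return the estimated cost in millicredits for a flat or per_unit rule."""
    if units < 0:
        raise ValueError("units must be non-negative")
    if cost_type == "flat":
        # Fixed cost; the number of units is ignored.
        return base_cost
    if cost_type == "per_unit":
        return unit_cost * units
    if cost_type == "tiered":
        # Graduated/volume pricing is not specified here; see metering rules.
        raise NotImplementedError("tiered pricing is outside this sketch")
    raise ValueError(f"unknown cost_type: {cost_type!r}")


print(estimate_cost("flat", 1, base_cost=99_000))      # plan purchase example
print(estimate_cost("per_unit", 5, unit_cost=1_000))   # chat message example
```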

Bulk entitlement check

Retrieve entitlements for all active metrics at once. Two URL forms:

GET /v1/customers/{customer_id}/entitlements
GET /v1/customer-by-external-id/{external_id}/entitlements

curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..."

Response:

{
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "environment": "live",
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "subscription_status": "active",
  "plan_name": "Pro",
  "entitlements": {
    "look": {
      "billable_metric_key": "look",
      "allowed": true,
      "estimated_cost_per_unit": 1000,
      "affordable_units": 140,
      "cost_type": "per_unit"
    },
    "chat_message": {
      "billable_metric_key": "chat_message",
      "allowed": true,
      "estimated_cost_per_unit": 500,
      "affordable_units": 280,
      "cost_type": "per_unit"
    }
  },
  "cached_at": "2025-01-15T10:30:00Z"
}

Each entry in the entitlements map includes:

Field	Type	Description
`allowed`	boolean	Whether the customer can perform at least 1 unit.
`estimated_cost_per_unit`	int64	Millicredits for a single unit of this metric. For `tiered` rules, this is the cost of the next unit at the customer’s current tier position (see caveat below).
`affordable_units`	int64	How many units the customer can afford at the current balance, computed as `effective_balance / estimated_cost_per_unit`. For `flat` rules: either 1 or 0. For a free metric (cost 0): `int64` max.
`cost_type`	string	The metering rule type: `flat`, `per_unit`, or `tiered`.

Caveat for tiered rules: affordable_units assumes the per-unit cost stays constant. Crossing a tier boundary during consumption changes the rate, so the actual number of affordable units may be higher (cheaper upper tier) or lower (more expensive upper tier). Treat it as a guide, not a guarantee.
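The `affordable_units` rules in the table can be approximated like this. Two points are assumptions rather than documented behavior: the division is treated as integer (floor) division, and a zero or negative effective balance is treated as 0 affordable units. The function name is hypothetical.

```python
INT64_MAX = 2**63 - 1  # value reported for free metrics, per the table


def affordable_units(effective_balance: int, cost_per_unit: int, cost_type: str) -> int:
    """Estimate how many units the customer can afford at the current balance."""
    if cost_per_unit == 0:
        return INT64_MAX  # free metric
    if effective_balance <= 0:
        return 0  # assumption: nothing is affordable without usable credits
    units = effective_balance // cost_per_unit  # assumption: floor division
    if cost_type == "flat":
        # Flat rules ignore units, so the answer is only "can afford" or not.
        return 1 if units >= 1 else 0
    return units
```

With the bulk response shown earlier (effective balance of 140,000 mc), this gives 140 for `look` at 1,000 mc per unit and 280 for `chat_message` at 500 mc per unit, matching the example.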

Latency and freshness

Entitlement checks are designed for the hot path. The two endpoints trade staleness for speed differently:

Endpoint	Freshness	Typical latency
Single-metric (`/entitlements/{metric}`)	Always live — reflects the balance at request time	5-15ms
Bulk (`/entitlements`)	Up to 30 seconds stale	sub-1ms on the fast path, 5-15ms otherwise

Use the single-metric endpoint on the hot path — usage-gating, reserve→check→commit flows, anywhere a few-second-stale answer would be wrong. Use the bulk endpoint for dashboards, profile screens, and other places where 30-second staleness is acceptable.

Freshness after balance changes

Any credit mutation — a usage event, topup, grant, reservation, block expiry, or adjustment — immediately refreshes the customer’s bulk-endpoint result. Your next check after a balance change sees the up-to-date numbers without waiting for the staleness window to elapse.

Forcing a fresh result

To force the bulk endpoint to skip its staleness window, send the Cache-Control: no-cache header:

curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..." \
  -H "Cache-Control: no-cache"

The single-metric endpoint always computes live — no header needed.
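For clients written in Python, the same request could be prepared with the standard library. The sketch below only builds the request object and does not send it; the function name and the `force_fresh` flag are illustrative assumptions.

```python
from urllib.parse import quote
from urllib.request import Request


def bulk_entitlements_request(external_id: str, api_key: str, force_fresh: bool = False) -> Request:
    """Build (but do not send) a bulk entitlements request."""
    url = (
        "https://api.quotastack.io/v1/customer-by-external-id/"
        f"{quote(external_id, safe='')}/entitlements"
    )
    headers = {"X-API-Key": api_key}
    if force_fresh:
        # Ask the bulk endpoint to skip its staleness window.
        headers["Cache-Control"] = "no-cache"
    return Request(url, headers=headers, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Keeping `force_fresh` off by default preserves the fast path for dashboards and similar screens.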

Using entitlements on the hot path

Common patterns for gating actions with entitlement checks:

Gate UI elements. Before rendering a “Generate” button, check if the user is entitled. If allowed is false, show a disabled button with an upgrade prompt.

Pre-check before expensive operations. Before kicking off an AI generation that will cost compute resources, verify the user has credits. This avoids wasting infrastructure on work you cannot charge for.

Display affordable units. Use affordable_units to show the user how many actions they have remaining: “You have 140 looks left this month.”

Determine upsell moments. When affordable_units drops below a threshold, prompt the user to purchase more credits or upgrade their plan.

Example: checking if a user can generate an outfit

A fashion SaaS charges 1,000 mc (1 credit) per outfit look. Before starting the AI pipeline:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_xyz/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."

If allowed is true, proceed with generation. After generation completes, record the usage event to debit the credits. If you need to hold credits during the generation, use a reservation instead.

If allowed is false, return an error to the user and suggest they purchase a credit pack or upgrade.
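The check, generate, record sequence described above might be wired together like this. All three callables are hypothetical placeholders for your own code: `check` would call the entitlement endpoint and return the parsed JSON, `generate` would run the AI pipeline, and `record_usage` would send the usage event.

```python
from typing import Any, Callable, Dict


def generate_if_entitled(
    check: Callable[[], Dict[str, Any]],
    generate: Callable[[], Any],
    record_usage: Callable[[], None],
) -> Dict[str, Any]:
    """Run the expensive operation only when the entitlement check allows it."""
    result = check()
    if not result.get("allowed", False):
        # Surface a denial so the UI can suggest a credit pack or an upgrade.
        return {"status": "denied", "reason": "not entitled"}
    output = generate()
    # Debit credits only after the work has actually completed.
    record_usage()
    return {"status": "ok", "output": output}
```

Injecting the three steps as callables keeps the flow testable without network access. Note that this sketch does not hold credits between the check and the usage event; for that, a reservation is the documented option.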

Non-metered entitlements

Not every entitlement is about credit balance. Billable metrics with types boolean, gauge, or static represent feature access, limits, and configuration that customers inherit from their plan variant — no credits involved.

Boolean — feature flags

“Does this customer have SSO?” The answer comes from the plan variant’s entitlement attachment on the sso metric. If the variant attaches {"enabled": true}, the customer has SSO. Otherwise, the metric’s default_value applies (typically {"enabled": false}).

Gauge — count-with-cap

“How many seats does this customer get?” The answer is the {"cap": N} value attached to the max_seats metric on their plan variant. A Free plan might attach {"cap": 5}, while Pro attaches {"cap": 50}.

Static — arbitrary configuration

“What rate limits apply to this customer?” The answer is a JSON config blob — {"config": {"rpm": 1000, "models": ["gpt-4", "claude-sonnet"]}} — attached to the plan variant. Your app reads it and applies the constraints.

How non-metered entitlements resolve

A billable metric is created with a type and default_value.
A plan-variant entitlement attaches that metric to a specific plan variant with a value override.
When a customer subscribes, they inherit entitlements from their subscription’s plan variant.

If no plan-variant entitlement exists for a metric, the metric’s default_value is the fallback. This means you can define sensible defaults (SSO off, 5 seats) and only override them on higher-tier variants.
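The resolution order above amounts to "variant attachment if present, otherwise the metric default". A minimal sketch, using plain dictionaries as hypothetical stand-ins for the metric catalog and a plan variant's attachments:

```python
from typing import Any, Dict

# Hypothetical metric catalog: metric key -> default_value.
METRIC_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "sso": {"enabled": False},
    "max_seats": {"cap": 5},
}

# Hypothetical attachments on a Pro plan variant.
PRO_ATTACHMENTS: Dict[str, Dict[str, Any]] = {
    "sso": {"enabled": True},
    "max_seats": {"cap": 50},
}


def resolve_entitlement(
    metric_key: str,
    defaults: Dict[str, Dict[str, Any]],
    variant_attachments: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the variant's attached value, falling back to the metric default."""
    if metric_key in variant_attachments:
        return variant_attachments[metric_key]
    return defaults[metric_key]
```

Resolving `sso` against `PRO_ATTACHMENTS` yields `{"enabled": True}`, while resolving it against an empty attachment map yields the default `{"enabled": False}`.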

The metered type continues to work exactly as described above — all existing credit-balance entitlement checking behavior is unchanged. Metered entitlements don’t need explicit plan-variant attachments; they’re governed by credit grants and metering rules.

Common Mistakes

The mistakes developers typically make with this concept — and what to do instead.

Don't skip entitlement checks "for performance"

Why

Checks are cheap: cached bulk results return in under a millisecond, and live single-metric checks typically take 5-15 ms. Skipping them means your customer can run up a negative balance, which is usually worse than the latency you saved.

Don't treat allowed: true as a binding promise

Why

Between the check and the actual usage, another request could drain the balance. For long-running operations, reserve credits instead.

Don't assume all entitlement checks hit the credit balance

Why

Boolean, gauge, and static entitlements are resolved from plan-variant attachments — no credits involved.
