Entitlements

How to check whether a customer can perform an action, how entitlement results are computed, and how caching keeps checks fast on the hot path.

Mental Model

An entitlement check is the bouncer at the door. Before your app starts an expensive operation, it asks QuotaStack "can this customer afford this?" and gets an answer back in milliseconds.

Quick Take
  • One endpoint answers: "Can this customer do this right now?"
  • Returns allowed, estimated cost, and remaining balance
  • Cost is computed from metering rules: flat, per-unit, or tiered
  • Checks are fast enough to gate every action: 5-15 ms for a live single-metric check, sub-millisecond for a cached bulk result
[Diagram: decision flow for a check request. Decision nodes: "Has feature?" (Yes/No) and "Balance > 0?" (Yes/No). Outcomes: allowed: true or allowed: false.]

An entitlement check answers one question: “Can this customer perform this action right now?”

You call it before starting work: before generating an image, sending a message, or making an API call. The response tells you whether to proceed, how much the action will cost, and how much balance the customer would have left afterward. The bulk endpoint also reports how many units the customer can afford.

The check endpoint

Two URL forms — pick the one that matches the ID you have (see Customer identification):

GET /v1/customers/{customer_id}/entitlements/{billable_metric_key}?units=N
GET /v1/customer-by-external-id/{external_id}/entitlements/{billable_metric_key}?units=N

| Parameter | Location | Required | Default | Description |
| --- | --- | --- | --- | --- |
| customer_id / external_id | path | yes | | The customer to check, in either ID form. |
| billable_metric_key | path | yes | | The metric key (e.g. chat_message, look, api_call). |
| units | query | no | 1 | How many units the customer wants to consume. |
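
As a minimal sketch of the two URL forms, the helper below builds the check URL from whichever ID you hold. The function name entitlement_url and its keyword arguments are illustrative assumptions, not part of any QuotaStack SDK; the host and paths are the ones shown on this page.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.quotastack.io"  # host taken from the curl examples


def entitlement_url(metric_key, *, customer_id=None, external_id=None, units=1):
    """Build the single-metric check URL (hypothetical helper, not an SDK call)."""
    # Exactly one of the two ID forms must be supplied.
    if (customer_id is None) == (external_id is None):
        raise ValueError("pass exactly one of customer_id or external_id")
    if customer_id is not None:
        prefix = f"/v1/customers/{quote(customer_id, safe='')}"
    else:
        prefix = f"/v1/customer-by-external-id/{quote(external_id, safe='')}"
    path = f"{prefix}/entitlements/{quote(metric_key, safe='')}"
    return f"{BASE_URL}{path}?{urlencode({'units': units})}"


print(entitlement_url("look", external_id="user_abc", units=1))
```

Percent-encoding the path segments keeps an external ID that contains a slash or space from changing the route.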

Example request

Check if a customer can generate 1 outfit look (which costs 1,000 mc per the metering rule). Using the external-id form:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."

Example response

{
  "allowed": true,
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "billable_metric_key": "look",
  "units": 1,
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "estimated_cost": 1000,
  "balance_after": 139000,
  "subscription_status": "active",
  "overage_policy": "block"
}

Response fields

| Field | Type | Description |
| --- | --- | --- |
| allowed | boolean | Whether the customer can perform the action. |
| balance | int64 | Total millicredits in the account. |
| reserved_balance | int64 | Millicredits held by active reservations. |
| effective_balance | int64 | balance - reserved_balance. The usable amount. |
| estimated_cost | int64 | Millicredits this operation would cost, computed from the active metering rule. |
| balance_after | int64 | effective_balance - estimated_cost. Can be negative if the overage policy allows it. |
| subscription_status | string or null | The customer’s subscription status (active, trialing, overdue, etc.), or null if no subscription. |
| overage_policy | string | Tenant-level policy: block (deny when insufficient), allow (permit overage), or notify (permit but flag). |
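
The two derived fields can be checked by hand against the example response. The sketch below recomputes them from the documented formulas; the helper names are illustrative, and the credit conversion assumes 1 credit = 1,000 mc, as in the outfit example on this page.

```python
def derived_fields_consistent(resp):
    """Recompute effective_balance and balance_after from their documented formulas."""
    effective = resp["balance"] - resp["reserved_balance"]
    balance_after = effective - resp["estimated_cost"]
    return (resp["effective_balance"] == effective
            and resp["balance_after"] == balance_after)


def to_credits(millicredits):
    # Assumption: 1 credit = 1,000 millicredits (mc).
    return millicredits / 1000


# Values copied from the example response above.
example = {
    "balance": 150000,
    "reserved_balance": 10000,
    "effective_balance": 140000,
    "estimated_cost": 1000,
    "balance_after": 139000,
}

print(derived_fields_consistent(example))   # True
print(to_credits(example["balance_after"]))  # 139.0
```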

When is allowed true?

  • If effective_balance >= estimated_cost, the customer has enough credits. allowed = true.
  • If the overage policy is allow or notify, allowed = true even when effective_balance is less than estimated_cost. In that case balance_after is negative, signaling overage.
  • If the customer’s subscription is overdue and the plan variant has allow_usage_while_overdue = false, allowed = false even if balance is sufficient.
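
The three rules above can be sketched as a single function. This is an illustration rather than the service's actual implementation: the function name and parameters are hypothetical, and the order of evaluation (the overdue rule wins over the overage policy) is an assumption the text does not confirm.

```python
def is_allowed(effective_balance, estimated_cost, overage_policy,
               subscription_status=None, allow_usage_while_overdue=True):
    """Illustrative decision logic for the `allowed` field."""
    # Assumption: an overdue subscription that forbids usage is checked first
    # and overrides both the balance and the overage policy.
    if subscription_status == "overdue" and not allow_usage_while_overdue:
        return False
    # allow / notify permit overage regardless of balance.
    if overage_policy in ("allow", "notify"):
        return True
    # block: the customer needs enough usable credits.
    return effective_balance >= estimated_cost


print(is_allowed(140000, 1000, "block"))           # True
print(is_allowed(500, 1000, "block"))              # False
print(is_allowed(500, 1000, "notify"))             # True
print(is_allowed(140000, 1000, "block",
                 subscription_status="overdue",
                 allow_usage_while_overdue=False))  # False
```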

Configuring overage policy

overage_policy is set at the tenant level and can be overridden per-customer:

# Tenant default
curl -X PATCH https://api.quotastack.io/v1/tenants/{tenant_id}/config \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: config-overage:{tenant_id}" \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "block" }'

# Per-customer override
curl -X PATCH https://api.quotastack.io/v1/customer-by-external-id/user_abc \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "allow" }'

When a customer’s override is null, the tenant default applies. Overage policy applies to consumption (usage events) and entitlement checks — it does not apply to manual grant or adjust operations, which are always strict.

Customers without any active subscription still have an overage_policy; it’s a tenant/customer property independent of subscriptions.
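
The fallback from customer override to tenant default amounts to a null check. A minimal sketch, with a hypothetical function name and Python's None standing in for a null override:

```python
VALID_POLICIES = {"block", "allow", "notify"}


def effective_overage_policy(tenant_default, customer_override=None):
    """Pick the policy that applies to a customer (illustrative only)."""
    # A null (None) override means the tenant default applies.
    policy = tenant_default if customer_override is None else customer_override
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown overage policy: {policy!r}")
    return policy


print(effective_overage_policy("block"))           # block
print(effective_overage_policy("block", "allow"))  # allow
```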

How cost is computed

The entitlement check looks up the active metering rule for the given billable_metric_key and computes cost based on the rule’s cost_type:

| Cost type | Formula | Example |
| --- | --- | --- |
| flat | base_cost (fixed, ignores units) | Plan purchase: base_cost = 99000 mc. Checking 1 unit costs 99,000 mc. |
| per_unit | unit_cost * units | Chat message: unit_cost = 1000 mc. Checking 5 units costs 5,000 mc. |
| tiered | Graduated or volume pricing across tiers | See metering rules for details. |

If no active metering rule exists for the metric key, the check returns a 404.
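
The flat and per_unit formulas can be sketched as follows. Representing a rule as a dict is an assumption made for illustration; the keys cost_type, base_cost, and unit_cost come from the table above. Tiered rules are left out because their tier structure is described on the metering rules page, not here.

```python
def estimated_cost(rule, units=1):
    """Compute cost in millicredits for flat and per_unit rules (illustrative)."""
    if rule is None:
        # Mirrors the 404 returned when no active rule exists for the metric key.
        raise LookupError("no active metering rule for this metric key")
    cost_type = rule["cost_type"]
    if cost_type == "flat":
        return rule["base_cost"]          # fixed, ignores units
    if cost_type == "per_unit":
        return rule["unit_cost"] * units
    raise NotImplementedError(f"cost_type {cost_type!r} is not covered by this sketch")


print(estimated_cost({"cost_type": "flat", "base_cost": 99000}, units=1))     # 99000
print(estimated_cost({"cost_type": "per_unit", "unit_cost": 1000}, units=5))  # 5000
```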

Bulk entitlement check

Retrieve entitlements for all active metrics at once. Two URL forms:

GET /v1/customers/{customer_id}/entitlements
GET /v1/customer-by-external-id/{external_id}/entitlements

Example request:

curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..."

Response:

{
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "environment": "live",
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "subscription_status": "active",
  "plan_name": "Pro",
  "entitlements": {
    "look": {
      "billable_metric_key": "look",
      "allowed": true,
      "estimated_cost_per_unit": 1000,
      "affordable_units": 140,
      "cost_type": "per_unit"
    },
    "chat_message": {
      "billable_metric_key": "chat_message",
      "allowed": true,
      "estimated_cost_per_unit": 500,
      "affordable_units": 280,
      "cost_type": "per_unit"
    }
  },
  "cached_at": "2025-01-15T10:30:00Z"
}

Each entry in the entitlements map includes:

| Field | Type | Description |
| --- | --- | --- |
| allowed | boolean | Whether the customer can perform at least 1 unit. |
| estimated_cost_per_unit | int64 | Millicredits for a single unit of this metric. For tiered rules, this is the cost of the next unit at the customer’s current tier position (see caveat below). |
| affordable_units | int64 | How many units the customer can afford at the current balance, computed as effective_balance / estimated_cost_per_unit. For flat rules: either 1 or 0. For a free metric (cost 0): int64 max. |
| cost_type | string | The metering rule type: flat, per_unit, or tiered. |

Caveat for tiered rules: affordable_units assumes the per-unit cost stays constant. Crossing a tier boundary during consumption changes the rate, so the actual number of affordable units may be higher (cheaper upper tier) or lower (more expensive upper tier). Treat it as a guide, not a guarantee.
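
The affordable_units rules can be reproduced with integer arithmetic. This sketch assumes the division rounds down, that a negative effective balance yields 0, and that a flat rule's estimated_cost_per_unit is its base cost; none of these details is stated on this page. The function name is illustrative.

```python
INT64_MAX = 2**63 - 1


def affordable_units(effective_balance, estimated_cost_per_unit, cost_type):
    """Estimate how many units a balance covers (illustrative, constant per-unit cost)."""
    if estimated_cost_per_unit == 0:
        return INT64_MAX  # free metric
    if cost_type == "flat":
        return 1 if effective_balance >= estimated_cost_per_unit else 0
    # Assumption: floor division, clamped at zero for negative balances.
    return max(effective_balance // estimated_cost_per_unit, 0)


# Values from the bulk response example above.
print(affordable_units(140000, 1000, "per_unit"))  # 140
print(affordable_units(140000, 500, "per_unit"))   # 280
```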

Latency and freshness

Entitlement checks are designed for the hot path. The two endpoints trade staleness for speed differently:

| Endpoint | Freshness | Typical latency |
| --- | --- | --- |
| Single-metric (/entitlements/{metric}) | Always live; reflects the balance at request time | 5-15 ms |
| Bulk (/entitlements) | Up to 30 seconds stale | sub-1 ms on the fast path, 5-15 ms otherwise |

Use the single-metric endpoint on the hot path — usage-gating, reserve→check→commit flows, anywhere a few-second-stale answer would be wrong. Use the bulk endpoint for dashboards, profile screens, and other places where 30-second staleness is acceptable.

Freshness after balance changes

Any credit mutation — a usage event, topup, grant, reservation, block expiry, or adjustment — immediately refreshes the customer’s bulk-endpoint result. Your next check after a balance change sees the up-to-date numbers without waiting for the staleness window to elapse.

Forcing a fresh result

To force the bulk endpoint to skip its staleness window, send the Cache-Control: no-cache header:

curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..." \
  -H "Cache-Control: no-cache"

The single-metric endpoint always computes live — no header needed.
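
From application code, the same request might look like the sketch below, which uses only Python's standard library. The helper names and the fresh flag are illustrative assumptions; the URL and headers are the ones from the curl example, and YOUR_API_KEY is a placeholder.

```python
import json
import urllib.request
from urllib.parse import quote


def bulk_entitlements_request(external_id, api_key, *, fresh=False):
    """Build (but do not send) a bulk entitlements request."""
    url = ("https://api.quotastack.io/v1/customer-by-external-id/"
           f"{quote(external_id, safe='')}/entitlements")
    headers = {"X-API-Key": api_key}
    if fresh:
        # Ask the bulk endpoint to skip its staleness window.
        headers["Cache-Control"] = "no-cache"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=5.0):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


req = bulk_entitlements_request("user_abc", "YOUR_API_KEY", fresh=True)
print(req.full_url)
# result = fetch_json(req)  # performs the network call
```

Keeping request construction separate from sending makes the header logic easy to unit-test without network access.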

Using entitlements on the hot path

Common patterns for gating on entitlement checks:

Gate UI elements. Before rendering a “Generate” button, check if the user is entitled. If allowed is false, show a disabled button with an upgrade prompt.

Pre-check before expensive operations. Before kicking off an AI generation that will cost compute resources, verify the user has credits. This avoids wasting infrastructure on work you cannot charge for.

Display affordable units. Use affordable_units from the bulk endpoint to show the user how many actions they have remaining: “You have 140 looks left this month.”

Determine upsell moments. When affordable_units drops below a threshold, prompt the user to purchase more credits or upgrade their plan.
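
The UI patterns above can be driven from one entry of the bulk response. The sketch below is one possible mapping: the ui_state function, its output keys, and the threshold value are hypothetical choices, not part of the API.

```python
def ui_state(entry, upsell_threshold):
    """Derive button and prompt state from one entry of the entitlements map."""
    return {
        "button_enabled": entry["allowed"],
        "remaining": entry["affordable_units"],
        # Prompt when the customer is blocked or running low.
        "show_upsell": (not entry["allowed"])
                       or entry["affordable_units"] < upsell_threshold,
    }


# Entries copied from the bulk response example on this page.
entitlements = {
    "look": {"allowed": True, "affordable_units": 140},
    "chat_message": {"allowed": True, "affordable_units": 280},
}

for key, entry in entitlements.items():
    print(key, ui_state(entry, upsell_threshold=200))
```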

Example: checking if a user can generate an outfit

A fashion SaaS charges 1,000 mc (1 credit) per outfit look. Before starting the AI pipeline:

curl "https://api.quotastack.io/v1/customer-by-external-id/user_xyz/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."

If allowed is true, proceed with generation. After generation completes, record the usage event to debit the credits. If you need to hold credits during the generation, use a reservation instead.

If allowed is false, return an error to the user and suggest they purchase a credit pack or upgrade.
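
The check-then-generate flow can be sketched as below. All three callables (check_entitlement, run_generation, record_usage) are hypothetical stand-ins that you would implement against the check endpoint, your own AI pipeline, and the usage event API; the error code is likewise an illustrative choice.

```python
def generate_if_entitled(check_entitlement, run_generation, record_usage,
                         external_id, metric_key="look", units=1):
    """Gate an expensive operation on an entitlement check (illustrative flow)."""
    result = check_entitlement(external_id, metric_key, units)
    if not result["allowed"]:
        # Nothing is generated or debited; the caller can suggest a credit pack.
        return {"ok": False, "error": "insufficient_credits"}
    output = run_generation()
    # Debit only after the work succeeds.
    record_usage(external_id, metric_key, units)
    return {"ok": True, "output": output}


# Usage with in-memory fakes in place of real API calls.
recorded = []
outcome = generate_if_entitled(
    check_entitlement=lambda cid, key, n: {"allowed": True},
    run_generation=lambda: "outfit.png",
    record_usage=lambda cid, key, n: recorded.append((cid, key, n)),
    external_id="user_xyz",
)
print(outcome)
```

Because the balance can change between the check and the usage event, this pattern suits short operations; for long-running work a reservation is the safer choice.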

Common Mistakes

The mistakes developers typically make with this concept — and what to do instead.

Don't skip entitlement checks "for performance"
Why: A live single-metric check takes 5-15 ms and a cached bulk check is sub-millisecond. Skipping them means your customer can run up a negative balance, which is usually worse than the latency you saved.

Don't treat allowed: true as a binding promise
Why: Between the check and the actual usage, another request could drain the balance. For long-running operations, reserve credits instead.