---
title: Entitlements
description: How to check whether a customer can perform an action, how entitlement results are computed, and how caching keeps checks fast on the hot path.
order: 2
---

# Entitlements

An entitlement check answers one question: **"Can this customer perform this action right now?"**

You call it before starting work -- before generating an image, sending a message, making an API call. The response tells you whether to proceed, how much the action will cost, and what balance would remain afterward.

> **Mental Model:** An entitlement check is the **bouncer at the door**. Before your app starts an expensive operation, it asks QuotaStack "can this customer afford this?" and gets an answer back in milliseconds.

## Quick Take

- One endpoint answers: **"Can this customer do this right now?"**
- Returns **allowed**, **estimated cost**, and **remaining balance**
- Cost computed from metering rules: flat, per-unit, or tiered
- Single-metric checks are always live and typically return in **5-15ms**; cached bulk checks can return in **under 1ms**. Both are fast enough to gate every action

## Diagram

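A simplified view of the decision: a check request first asks "Has feature?". If yes, it asks "Balance > 0?": yes returns allowed: true, no returns allowed: false. If the feature check fails, it returns allowed: false directly. The exact balance rule is described under "When is `allowed` true?".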

```mermaid
flowchart TD
    A[Check Request] --> B{Has feature?}
    B -->|Yes| C{Balance > 0?}
    B -->|No| D[allowed: false]
    C -->|Yes| E[allowed: true]
    C -->|No| F[allowed: false]
```

## The check endpoint

Two URL forms — pick the one that matches the ID you have (see [Customer identification](/docs/concepts/customer-identification)):

```
GET /v1/customers/{customer_id}/entitlements/{billable_metric_key}?units=N
GET /v1/customer-by-external-id/{external_id}/entitlements/{billable_metric_key}?units=N
```

| Parameter | Location | Required | Default | Description |
|---|---|---|---|---|
| `customer_id` / `external_id` | path | yes | -- | The customer to check, in either ID form. |
| `billable_metric_key` | path | yes | -- | The metric key (e.g. `chat_message`, `look`, `api_call`). |
| `units` | query | no | 1 | How many units the customer wants to consume. |

### Example request

Check if a customer can generate 1 outfit look, which costs 1,000 millicredits (mc) per the metering rule. Using the external-id form:

```bash
curl "https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."
```
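
The same request can be issued from application code. The sketch below uses only the Python standard library. The function names `entitlement_url` and `check_entitlement` are illustrative (they are not part of any QuotaStack SDK), and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.quotastack.io"  # host taken from the curl example


def entitlement_url(external_id: str, metric_key: str, units: int = 1) -> str:
    """Build the external-id form of the single-metric check URL."""
    path = "/v1/customer-by-external-id/{}/entitlements/{}".format(
        urllib.parse.quote(external_id, safe=""),
        urllib.parse.quote(metric_key, safe=""),
    )
    query = urllib.parse.urlencode({"units": units})
    return f"{BASE_URL}{path}?{query}"


def check_entitlement(api_key: str, external_id: str, metric_key: str, units: int = 1) -> dict:
    """Send the check (a network call) and return the decoded JSON body."""
    request = urllib.request.Request(
        entitlement_url(external_id, metric_key, units),
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the path logic easy to unit-test without hitting the API.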

### Example response

```json
{
  "allowed": true,
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "billable_metric_key": "look",
  "units": 1,
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "estimated_cost": 1000,
  "balance_after": 139000,
  "subscription_status": "active",
  "overage_policy": "block"
}
```

### Response fields

| Field | Type | Description |
|---|---|---|
| `allowed` | boolean | Whether the customer can perform the action. |
| `balance` | int64 | Total millicredits in the account. |
| `reserved_balance` | int64 | Millicredits held by active reservations. |
| `effective_balance` | int64 | `balance - reserved_balance`. The usable amount. |
| `estimated_cost` | int64 | Millicredits this operation would cost, computed from the active metering rule. |
| `balance_after` | int64 | `effective_balance - estimated_cost`. Can be negative if the overage policy allows it. |
| `subscription_status` | string or null | The customer's subscription status (`active`, `trialing`, `overdue`, etc.), or null if no subscription. |
| `overage_policy` | string | The policy in effect for this customer (the tenant default unless overridden per customer): `block` (deny when insufficient), `allow` (permit overage), or `notify` (permit but flag). |
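
The two derived fields follow directly from the formulas in the table. As a quick arithmetic check, this sketch recomputes them from the example response; `derived_balances` is a hypothetical helper name used only for illustration.

```python
def derived_balances(balance: int, reserved_balance: int, estimated_cost: int) -> dict:
    """Recompute the derived fields, all in millicredits."""
    effective_balance = balance - reserved_balance
    return {
        "effective_balance": effective_balance,
        "balance_after": effective_balance - estimated_cost,
    }


# Values from the example response: balance 150000, reserved 10000, cost 1000.
example = derived_balances(150000, 10000, 1000)
assert example == {"effective_balance": 140000, "balance_after": 139000}
```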

### When is `allowed` true?

- If `effective_balance >= estimated_cost`, the customer has enough credits. `allowed = true`.
- If the overage policy is `allow` or `notify`, `allowed = true` regardless of balance. When the balance is insufficient, `balance_after` is negative, signaling overage.
- If the customer's subscription is `overdue` and the plan variant has `allow_usage_while_overdue = false`, `allowed = false` even if balance is sufficient.
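
These rules can be read as a small decision function. The sketch below is one possible reading, not the service's actual implementation. In particular, it assumes the overdue rule takes precedence over the overage policy, which the rules above do not state explicitly; `is_allowed` is a hypothetical name.

```python
from typing import Optional


def is_allowed(
    effective_balance: int,
    estimated_cost: int,
    overage_policy: str,
    subscription_status: Optional[str] = None,
    allow_usage_while_overdue: bool = True,
) -> bool:
    """Mirror the three rules for `allowed` (amounts in millicredits)."""
    # Assumption: an overdue subscription blocks usage before any balance
    # or overage-policy consideration.
    if subscription_status == "overdue" and not allow_usage_while_overdue:
        return False
    # `allow` and `notify` permit the action regardless of balance.
    if overage_policy in ("allow", "notify"):
        return True
    # `block`: the customer needs enough usable credits.
    return effective_balance >= estimated_cost
```

For the example response (`effective_balance = 140000`, `estimated_cost = 1000`, policy `block`), this returns `True`.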

### Configuring overage policy

`overage_policy` is set at the tenant level and can be overridden per-customer:

```bash
# Tenant default
curl -X PATCH https://api.quotastack.io/v1/tenants/{tenant_id}/config \
  -H "X-API-Key: qs_live_..." \
  -H "Idempotency-Key: config-overage:{tenant_id}" \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "block" }'

# Per-customer override
curl -X PATCH https://api.quotastack.io/v1/customer-by-external-id/user_abc \
  -H "X-API-Key: qs_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "overage_policy": "allow" }'
```

When a customer's override is `null`, the tenant default applies. Overage policy applies to consumption (usage events) and entitlement checks — it does **not** apply to manual `grant` or `adjust` operations, which are always strict.

Customers without any active subscription still have an `overage_policy`; it's a tenant/customer property independent of subscriptions.
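
The override rule amounts to a null-coalescing lookup. A minimal sketch, assuming the per-customer value is represented as `None` when unset; `effective_overage_policy` is an illustrative name.

```python
from typing import Optional

VALID_POLICIES = {"block", "allow", "notify"}


def effective_overage_policy(tenant_default: str, customer_override: Optional[str]) -> str:
    """Return the customer's override if set, otherwise the tenant default."""
    policy = tenant_default if customer_override is None else customer_override
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown overage policy: {policy!r}")
    return policy
```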

## How cost is computed

The entitlement check looks up the active [metering rule](/docs/concepts/metering) for the given `billable_metric_key` and computes cost based on the rule's `cost_type`:

| Cost type | Formula | Example |
|---|---|---|
| `flat` | `base_cost` (fixed, ignores units) | Plan purchase: `base_cost = 99000` mc. Checking 1 unit costs 99,000 mc. |
| `per_unit` | `unit_cost * units` | Chat message: `unit_cost = 1000` mc. Checking 5 units costs 5,000 mc. |
| `tiered` | Graduated or volume pricing across tiers | See [metering rules](/docs/concepts/metering) for details. |

If no active metering rule exists for the metric key, the check returns a `404`.
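
The `flat` and `per_unit` formulas from the table can be sketched as follows. The function name and signature are illustrative. Tiered pricing is deliberately left out, since its formula depends on the tier definitions covered in the metering rules documentation.

```python
def estimated_cost(cost_type: str, units: int, base_cost: int = 0, unit_cost: int = 0) -> int:
    """Compute cost in millicredits for the two simple rule types."""
    if cost_type == "flat":
        # Fixed cost, independent of the number of units.
        return base_cost
    if cost_type == "per_unit":
        return unit_cost * units
    # Tiered rules need the tier table; not modelled in this sketch.
    raise ValueError(f"unsupported cost_type for this sketch: {cost_type!r}")


# The two examples from the table above.
assert estimated_cost("flat", units=1, base_cost=99000) == 99000
assert estimated_cost("per_unit", units=5, unit_cost=1000) == 5000
```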

## Bulk entitlement check

Retrieve entitlements for **all** active metrics at once. Two URL forms:

```
GET /v1/customers/{customer_id}/entitlements
GET /v1/customer-by-external-id/{external_id}/entitlements
```

```bash
curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..."
```

Response:

```json
{
  "customer_id": "019d6258-07ba-7418-83be-58f5fde53e4e",
  "external_customer_id": "user_abc",
  "environment": "live",
  "balance": 150000,
  "reserved_balance": 10000,
  "effective_balance": 140000,
  "subscription_status": "active",
  "plan_name": "Pro",
  "entitlements": {
    "look": {
      "billable_metric_key": "look",
      "allowed": true,
      "estimated_cost_per_unit": 1000,
      "affordable_units": 140,
      "cost_type": "per_unit"
    },
    "chat_message": {
      "billable_metric_key": "chat_message",
      "allowed": true,
      "estimated_cost_per_unit": 500,
      "affordable_units": 280,
      "cost_type": "per_unit"
    }
  },
  "cached_at": "2025-01-15T10:30:00Z"
}
```

Each entry in the `entitlements` map includes:

| Field | Type | Description |
|---|---|---|
| `allowed` | boolean | Whether the customer can perform at least 1 unit. |
| `estimated_cost_per_unit` | int64 | Millicredits for a single unit of this metric. For `tiered` rules, this is the cost of the **next** unit at the customer's current tier position (see caveat below). |
| `affordable_units` | int64 | How many units the customer can afford at the current balance, computed as `effective_balance / estimated_cost_per_unit`, rounded down to a whole unit. For `flat` rules: either 1 or 0. For a free metric (cost 0): `int64` max. |
| `cost_type` | string | The metering rule type: `flat`, `per_unit`, or `tiered`. |

**Caveat for tiered rules:** `affordable_units` assumes the per-unit cost stays constant. Crossing a tier boundary during consumption changes the rate, so the actual number of affordable units may be higher (cheaper upper tier) or lower (more expensive upper tier). Treat it as a guide, not a guarantee.
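
As a rough illustration of how `affordable_units` relates to the other fields, the sketch below reproduces the values in the example response. It assumes integer division that rounds down and treats a negative balance as zero affordable units; both are assumptions, not documented behavior, and the function name is hypothetical.

```python
INT64_MAX = 2**63 - 1


def affordable_units(effective_balance: int, cost_per_unit: int, cost_type: str) -> int:
    """Estimate how many units fit in the usable balance (millicredits)."""
    if cost_per_unit == 0:
        # Free metric: effectively unlimited.
        return INT64_MAX
    if cost_type == "flat":
        # Flat rules ignore units, so the answer is all-or-nothing.
        return 1 if effective_balance >= cost_per_unit else 0
    # Assumption: floor division, never below zero.
    return max(effective_balance, 0) // cost_per_unit


# Matches the bulk example: 140000 mc usable, 1000 mc per look, 500 mc per message.
assert affordable_units(140000, 1000, "per_unit") == 140
assert affordable_units(140000, 500, "per_unit") == 280
```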

## Latency and freshness

Entitlement checks are designed for the hot path. The two endpoints trade staleness for speed differently:

| Endpoint | Freshness | Typical latency |
|---|---|---|
| **Single-metric** (`/entitlements/{metric}`) | Always live — reflects the balance at request time | 5-15ms |
| **Bulk** (`/entitlements`) | Up to 30 seconds stale | sub-1ms on the fast path, 5-15ms otherwise |

Use the **single-metric** endpoint on the hot path — usage-gating, reserve→check→commit flows, anywhere a few-second-stale answer would be wrong. Use the **bulk** endpoint for dashboards, profile screens, and other places where 30-second staleness is acceptable.

### Freshness after balance changes

Any credit mutation — a usage event, topup, grant, reservation, block expiry, or adjustment — immediately refreshes the customer's bulk-endpoint result. Your next check after a balance change sees the up-to-date numbers without waiting for the staleness window to elapse.

### Forcing a fresh result

To force the bulk endpoint to skip its staleness window, send the `Cache-Control: no-cache` header:

```bash
curl https://api.quotastack.io/v1/customer-by-external-id/user_abc/entitlements \
  -H "X-API-Key: qs_live_..." \
  -H "Cache-Control: no-cache"
```

The single-metric endpoint always computes live — no header needed.
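
In application code, the header can be made an opt-in flag on the bulk request. This is a sketch using the Python standard library; `bulk_entitlements_request` and its `force_fresh` parameter are hypothetical names, and the function only builds the request object without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.quotastack.io"  # host taken from the curl example


def bulk_entitlements_request(
    api_key: str, external_id: str, force_fresh: bool = False
) -> urllib.request.Request:
    """Build a bulk entitlements request, optionally bypassing the staleness window."""
    headers = {"X-API-Key": api_key}
    if force_fresh:
        headers["Cache-Control"] = "no-cache"
    url = "{}/v1/customer-by-external-id/{}/entitlements".format(
        BASE_URL, urllib.parse.quote(external_id, safe="")
    )
    return urllib.request.Request(url, headers=headers)
```

Pass the returned object to `urllib.request.urlopen` to send it. Reserving `force_fresh=True` for the few call sites that need it keeps most bulk reads on the fast path.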

## Using entitlements on the hot path

Common patterns:

**Gate UI elements.** Before rendering a "Generate" button, check if the user is entitled. If `allowed` is false, show a disabled button with an upgrade prompt.

**Pre-check before expensive operations.** Before kicking off an AI generation that will cost compute resources, verify the user has credits. This avoids wasting infrastructure on work you cannot charge for.

**Display affordable units.** Use `affordable_units` from the bulk endpoint to show the user how many actions they have remaining: "You have 140 looks left this month."

**Determine upsell moments.** When `affordable_units` drops below a threshold, prompt the user to purchase more credits or upgrade their plan.
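
The UI patterns above can share one small helper that maps a bulk `entitlements` entry to a button state. This is a sketch: `button_state`, the message strings, and the default threshold of 10 units are all illustrative choices.

```python
def button_state(entry: dict, low_threshold: int = 10) -> dict:
    """Derive a UI state from one entry of the bulk `entitlements` map."""
    units = entry["affordable_units"]
    if not entry["allowed"]:
        # Gate the element and offer a way forward.
        return {"enabled": False, "message": "Out of credits. Upgrade or buy a credit pack."}
    if units < low_threshold:
        # Still usable, but a good moment to prompt a top-up.
        return {"enabled": True, "message": f"Only {units} left. Consider topping up."}
    return {"enabled": True, "message": f"You have {units} left."}


# The "look" entry from the bulk example response.
look = {"billable_metric_key": "look", "allowed": True, "affordable_units": 140}
assert button_state(look) == {"enabled": True, "message": "You have 140 left."}
```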

### Example: checking if a user can generate an outfit

A fashion SaaS charges 1,000 mc (1 credit) per outfit look. Before starting the AI pipeline:

```bash
curl "https://api.quotastack.io/v1/customer-by-external-id/user_xyz/entitlements/look?units=1" \
  -H "X-API-Key: qs_live_..."
```

If `allowed` is `true`, proceed with generation. After generation completes, [record the usage event](/docs/concepts/metering) to debit the credits. If you need to hold credits during the generation, use a [reservation](/docs/concepts/reservations) instead.

If `allowed` is `false`, return an error to the user and suggest they purchase a credit pack or upgrade.
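
The branch described above might look like this in a request handler. The check itself is injected as a callable so the sketch stays independent of any HTTP client; `gate_outfit_generation`, `CheckFn`, and the error text are hypothetical.

```python
from typing import Any, Callable, Dict

# (external_id, billable_metric_key, units) -> decoded check response
CheckFn = Callable[[str, str, int], Dict[str, Any]]


def gate_outfit_generation(check: CheckFn, external_id: str) -> Dict[str, Any]:
    """Decide whether to start the generation pipeline for one look."""
    result = check(external_id, "look", 1)
    if result["allowed"]:
        # Caller runs the pipeline, then records the usage event.
        return {"proceed": True, "estimated_cost": result["estimated_cost"]}
    return {
        "proceed": False,
        "error": "Not enough credits. Purchase a credit pack or upgrade your plan.",
    }
```

Injecting the check function also makes the gate easy to test with a stubbed response.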

## Common Mistakes

**✗ Don't skip entitlement checks "for performance"**

Single-metric checks typically take 5-15ms, and cached bulk checks are sub-millisecond. Skipping them means your customer can run up a negative balance, which is usually worse than the latency you saved.

**✗ Don't treat `allowed: true` as a binding promise**

Between the check and the actual usage, another request could drain the balance. For long-running operations, [reserve credits](/docs/concepts/reservations) instead.