AI Chat App: Wallet + Plans + Per-Message Metering

How to model an AI companion chat app with wallet-based credits, time-limited plans purchased from the wallet, and per-message metering.

Inspired by: Character.AI, Replika, OurVibe-style companion apps

Mental Model

Think of this as a chat app with a prepaid wallet: customers top up real money, optionally buy time-limited plan packs (1-hour or weekly bundles) from that wallet, and every message debits credits. Plan packs burn first; wallet is the safety net.

Quick Take

A persistent wallet funded by topups — never expires, so expiring packs burn before it

Time-limited plan packs (1hr, weekly) bought from the wallet, burn first

Per-message metering with variable cost by companion or model

Auto-buy logic: when plan expires mid-conversation, purchase the next from the wallet without breaking flow

The problem

You run an AI companion chat app. Users chat with AI companions, and each message costs credits. Users maintain a wallet topped up with real money. They can buy time-limited plans (1-hour, weekly) from that wallet. Time-limited plan credits burn before no-expiry wallet credits; when plan credits run out or expire, the wallet balance kicks in.

You need:

A persistent wallet balance that never expires.
Time-limited plan bundles that burn before the wallet.
Free conversation credits granted when a user starts a new chat.
Variable pricing per companion (some companions cost more than others).
Auto-buy logic: when a plan expires mid-session, automatically purchase a new one from the wallet.

Credit structure

QuotaStack models this with three types of credit blocks, all stacking on a single customer:

Block type	Expiry	Source
Wallet	None	Topup grant after payment
Plan credits (1hr, weekly)	Time-limited (1hr or 7 days)	Topup grant after wallet debit
Free conversation credits	None	Topup grant on new conversation

Burn-down order: all blocks use priority 0, so expiry decides the order. Time-limited plan credits burn before no-expiry credits. When plans expire or run out, free conversation credits and wallet balance share the same no-expiry tier: within it, free sources burn before paid topups, then oldest-created first.

This is the right behavior: time-limited credits should be consumed before permanent ones to minimize waste.
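The ordering described above can be expressed as a sort key. This is a minimal sketch, not QuotaStack's implementation: the `Block` class and its `is_free` flag are hypothetical, and ordering expiring blocks by soonest expiry is an assumption the text does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Block:
    name: str
    priority: int
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means the block never expires
    is_free: bool = False                  # hypothetical flag: granted without payment


def burn_order_key(block: Block):
    """Sort key for burn-down order: smaller tuples burn first."""
    never_expires = block.expires_at is None
    return (
        block.priority,    # lower priority number first
        never_expires,     # expiring blocks (False) before no-expiry blocks (True)
        float("inf") if never_expires else block.expires_at.timestamp(),  # assumed: soonest expiry first
        not block.is_free,             # within a tier, free before paid
        block.created_at.timestamp(),  # then oldest-created first
    )


t0 = datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc)
blocks = [
    Block("wallet", 0, created_at=t0),
    Block("free_conversation", 0, created_at=t0 + timedelta(minutes=5), is_free=True),
    Block("plan_1hr", 0, created_at=t0 + timedelta(minutes=10),
          expires_at=t0 + timedelta(hours=1)),
]
ordered = [b.name for b in sorted(blocks, key=burn_order_key)]
print(ordered)
```

The plan block sorts first even though it was created last, because expiry is compared before creation time.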

Billable metrics

Set up three billable metrics and their metering rules:

POST /v1/billable-metrics
{ "key": "chat_message", "name": "Chat Message" }

POST /v1/billable-metrics
{ "key": "plan_purchase_1hr", "name": "1-Hour Plan Purchase" }

POST /v1/billable-metrics
{ "key": "plan_purchase_weekly", "name": "Weekly Plan Purchase" }

Metering rules (all per_unit at 1000mc per unit):

POST /v1/metering-rules
{
  "billable_metric_key": "chat_message",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

POST /v1/metering-rules
{
  "billable_metric_key": "plan_purchase_1hr",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

POST /v1/metering-rules
{
  "billable_metric_key": "plan_purchase_weekly",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

Each unit = 1000mc = 1 credit. A message costs 1 unit (1 credit). A plan purchase costs N units, where N is the plan price in credits — passed dynamically via the units field.
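The unit arithmetic can be checked with a few lines of Python. The helper names below are illustrative only and are not part of any QuotaStack SDK.

```python
MC_PER_CREDIT = 1000  # 1 credit = 1000 millicredits (mc), as stated above


def credits_to_mc(credits: int) -> int:
    """Convert whole credits to millicredits."""
    return credits * MC_PER_CREDIT


def debit_mc(units: int, unit_cost_mc: int = 1000) -> int:
    """Millicredits debited by a per_unit rule for one usage event."""
    return units * unit_cost_mc


message_debit = debit_mc(1)    # one chat message
plan_debit = debit_mc(100)     # a plan priced at 100 credits
print(message_debit, plan_debit)
```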

Integration flow

1. Sign up and top up the wallet

Create the customer when they sign up:

POST /v1/customers
Idempotency-Key: signup:<your_user_id>

{
  "external_id": "user_12345"
}

When the user pays via your payment provider (Stripe, Razorpay, etc.), grant wallet credits:

POST /v1/topups/grant
Idempotency-Key: topup:<your_payment_id>

{
  "customer_id": "cus_...",
  "credits": 500000,
  "metadata": {
    "source": "wallet_recharge",
    "payment_id": "pay_abc123",
    "amount_paid": "499"
  }
}

No expiry, no priority specified (defaults to 0). This is the wallet. The metadata field stores your fiat payment reference for audit — QuotaStack never reads it.

2. New conversation — grant free credits

Each time a user starts a new conversation, grant 50 free message credits:

POST /v1/topups/grant
Idempotency-Key: convo-grant:<conversation_id>

{
  "customer_id": "cus_...",
  "credits": 50000,
  "metadata": {
    "source": "free_conversation",
    "conversation_id": "conv_789"
  }
}

50 credits = 50,000mc. No expiry, priority 0. These stack with the wallet in the same no-expiry tier, where free grants burn before paid wallet topups.

3. Buy a plan from the wallet

Purchasing a plan is a two-step operation: debit the plan cost from the wallet, then grant the plan credits with an expiry.

Step 1: Debit the plan cost.

POST /v1/usage
Idempotency-Key: usage:plan-buy-<purchase_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "plan_purchase_1hr",
  "units": 100,
  "metadata": {
    "companion_id": "companion_42",
    "plan_type": "1hr"
  }
}

Here units: 100 means 100 credits (100,000mc) — the price of a 1-hour plan for this companion. Different companions can have different prices; you pass the companion-specific price as the units value.

Step 2: Grant plan credits.

POST /v1/topups/grant
Idempotency-Key: topup:plan-grant-<purchase_id>

{
  "customer_id": "cus_...",
  "credits": 200000,
  "expires_at": "2026-04-13T11:00:00Z",
  "priority": 0,
  "metadata": {
    "source": "plan_1hr",
    "companion_id": "companion_42"
  }
}

200 credits (200,000mc) with a 1-hour expiry. These burn before no-expiry wallet credits at the same priority.

Variable pricing per companion: The plan cost (units on the debit) and the credits granted can both vary by companion. Your app stores companion pricing; QuotaStack just executes the credit math.
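The two requests of a plan purchase can be assembled by one function so that both idempotency keys always come from the same purchase ID. This is a sketch under assumptions: `build_plan_purchase` is a hypothetical helper, it only builds request descriptions and sends nothing, and `now` is expected to be a UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def build_plan_purchase(customer_id, companion_id, purchase_id,
                        price_credits, granted_credits, duration, now):
    """Return (debit_request, grant_request) for one 1-hour plan purchase."""
    debit = {
        "path": "/v1/usage",
        "idempotency_key": f"usage:plan-buy-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "billable_metric_key": "plan_purchase_1hr",
            "units": price_credits,  # companion-specific price, in credits
            "metadata": {"companion_id": companion_id, "plan_type": "1hr"},
        },
    }
    grant = {
        "path": "/v1/topups/grant",
        "idempotency_key": f"topup:plan-grant-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "credits": granted_credits * 1000,  # credits to millicredits
            # `now` must be UTC for the trailing "Z" to be accurate.
            "expires_at": (now + duration).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "metadata": {"source": "plan_1hr", "companion_id": companion_id},
        },
    }
    return debit, grant


debit, grant = build_plan_purchase(
    "cus_...", "companion_42", "purchase_001",
    price_credits=100, granted_credits=200,
    duration=timedelta(hours=1),
    now=datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc),
)
print(debit["idempotency_key"], grant["body"]["expires_at"])
```

Keeping both keys in one place makes it harder for a retry path to reuse the debit key while generating a fresh grant key.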

4. Send a message

Before sending a message to the AI, check entitlement and then record usage:

GET /v1/customers/cus_.../entitlements/chat_message?units=1

Response:

{
  "allowed": true,
  "balance": 245000,
  "effective_balance": 245000,
  "cost_per_unit": 1000,
  "cost_total": 1000,
  "affordable_units": 245
}

If allowed is true, proceed:

POST /v1/usage
Idempotency-Key: usage:msg-<message_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "chat_message",
  "units": 1,
  "metadata": {
    "conversation_id": "conv_789",
    "companion_id": "companion_42"
  }
}

The usage consumer debits 1000mc from the next block in burn-down order (the plan credits, if active). The request returns 202 because processing is async.
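The fields in the entitlement response shown earlier in this step are consistent with plain integer arithmetic on the balance. The function below is a sketch that reproduces those numbers; how the server actually derives them is an assumption.

```python
def check_entitlement(effective_balance_mc: int, cost_per_unit_mc: int, units: int) -> dict:
    """Recompute the entitlement fields from a balance and a per-unit cost."""
    cost_total = cost_per_unit_mc * units
    return {
        "allowed": cost_total <= effective_balance_mc,
        "cost_per_unit": cost_per_unit_mc,
        "cost_total": cost_total,
        # Whole units the balance can still cover.
        "affordable_units": effective_balance_mc // cost_per_unit_mc,
    }


result = check_entitlement(245_000, 1000, units=1)
print(result)
```

With a balance of 245,000mc this yields `allowed` true, a `cost_total` of 1000 and 245 affordable units, matching the sample response.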

5. Plan stacking

When a user buys a second plan while one is still active, use stack_after so the new plan starts after the current one expires — no overlap, no race conditions.

POST /v1/topups/grant
Idempotency-Key: topup:plan-grant-<purchase_id>

{
  "customer_id": "cus_...",
  "credits": 200000,
  "price_paid": 0,
  "currency": "mc",
  "duration_seconds": 3600,
  "stack_after": {
    "metadata_match": { "source": "plan_1hr" },
    "fallback": "now"
  },
  "metadata": { "source": "plan_1hr", "companion_id": "companion_42" }
}

The server atomically finds the latest active block matching {"source": "plan_1hr"}, sets the new block’s effective_at to that block’s expires_at, and computes expires_at = effective_at + 3600s. If no matching block exists, fallback: "now" makes it start immediately.
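The scheduling rule can be sketched client-side to see what dates a stacked grant should produce. This is an illustration only: the real server performs the lookup atomically, and choosing the "latest" block by its `expires_at` is an assumption. Blocks are plain dicts with parsed datetimes.

```python
from datetime import datetime, timedelta, timezone


def schedule_stacked_block(blocks, match, duration_seconds, now):
    """Return (effective_at, expires_at) for a block stacked after the
    latest unexpired block whose metadata contains every pair in `match`."""
    candidates = [
        b for b in blocks
        if b["expires_at"] > now
        and all(b["metadata"].get(k) == v for k, v in match.items())
    ]
    if candidates:
        effective_at = max(b["expires_at"] for b in candidates)
    else:
        effective_at = now  # behaves like fallback: "now"
    return effective_at, effective_at + timedelta(seconds=duration_seconds)


now = datetime(2026, 4, 13, 10, 30, tzinfo=timezone.utc)
existing = [{
    "expires_at": datetime(2026, 4, 13, 11, 0, tzinfo=timezone.utc),
    "metadata": {"source": "plan_1hr"},
}]
stacked = schedule_stacked_block(existing, {"source": "plan_1hr"}, 3600, now)
immediate = schedule_stacked_block([], {"source": "plan_1hr"}, 3600, now)
print(stacked, immediate)
```

In the first call the new plan runs from 11:00 to 12:00; in the second, with nothing to stack after, it runs from 10:30 to 11:30.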

Full message handler with auto-buy

Here is the complete pseudocode for handling a message send, including auto-buy logic:

import uuid


def handle_send_message(customer_id, companion_id, conversation_id, message):
    # 1. Check entitlement for one message
    ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)

    if not ent.allowed:
        # Try auto-buy if the wallet has funds for a plan. Deriving the
        # purchase ID from the message ID means a retried send reuses the
        # same idempotency keys instead of buying a second plan.
        bought = try_auto_buy_plan(
            customer_id, companion_id, purchase_id=f"auto-{message.id}"
        )
        if not bought:
            return error("Insufficient credits. Please recharge your wallet.")
        # Re-check after auto-buy
        ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)
        if not ent.allowed:
            return error("Insufficient credits after plan purchase.")

    # 2. Record usage (async, returns 202)
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="chat_message",
        units=1,
        idempotency_key=f"usage:msg-{message.id}",
        metadata={
            "conversation_id": conversation_id,
            "companion_id": companion_id,
        }
    )

    # 3. Call AI and stream response
    response = ai_service.chat(companion_id, message)
    return response


def try_auto_buy_plan(customer_id, companion_id, purchase_id=None):
    """Attempt to purchase the companion's default plan from wallet balance.

    Pass a stable purchase_id whenever the caller may retry, so both steps
    replay under the same idempotency keys. A random ID is used otherwise.
    """
    companion = get_companion(companion_id)
    plan_cost = companion.default_plan_cost      # e.g. 100 credits
    plan_credits = companion.default_plan_credits # e.g. 200 credits
    plan_duration = companion.default_plan_duration # e.g. timedelta(hours=1)

    # Check if wallet can afford the plan
    ent = quotastack.get_entitlement(
        customer_id,
        "plan_purchase_1hr",
        units=plan_cost
    )
    if not ent.allowed:
        return False

    purchase_id = purchase_id or str(uuid.uuid4())

    # Step 1: Debit plan cost from wallet
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="plan_purchase_1hr",
        units=plan_cost,
        idempotency_key=f"usage:plan-buy-{purchase_id}",
        metadata={
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )

    # Step 2: Grant plan credits with stack_after; the server sequences atomically
    quotastack.topup_grant(
        customer_id=customer_id,
        credits=plan_credits * 1000,  # credits to millicredits
        duration_seconds=int(plan_duration.total_seconds()),
        stack_after={
            "metadata_match": {"source": "plan_1hr"},
            "fallback": "now",
        },
        idempotency_key=f"topup:plan-grant-{purchase_id}",
        metadata={
            "source": "plan_1hr",
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )

    return True

Plan expiry and auto-buy trigger

QuotaStack fires a credit.expired webhook when plan blocks expire. Use this to trigger auto-buy:

def handle_webhook(event):
    if event.type == "credit.expired":
        block = event.credit_block
        # Only auto-buy for plan blocks
        if block.metadata.get("source") in ["plan_1hr", "plan_weekly"]:
            customer = get_customer(block.customer_id)
            if customer.settings.auto_buy_enabled:
                companion_id = block.metadata.get("companion_id")
                try_auto_buy_plan(block.customer_id, companion_id)

Showing balance in the UI

Use the balance endpoint to display remaining credits:

GET /v1/customers/cus_.../credits?include_blocks=true

From the response, compute:

Wallet balance: Sum remaining_amount of blocks with no expiry and source "wallet_recharge".
Plan credits remaining: Sum remaining_amount of blocks with plan metadata (plan_1hr or plan_weekly) where effective_at is null or in the past.
Queued plan credits: pending_balance from the balance response — credits from stacked plans that haven’t started yet.
Plan expires at: Latest expires_at among active plan blocks.
Free credits remaining: Sum remaining_amount of blocks with source "free_conversation".
Total messages available: balance / 1000 (since 1 message = 1000mc).
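The figures listed above can be derived in one pass over the blocks. This is a rough sketch under several assumptions: blocks arrive as dicts with `remaining_amount`, `expires_at`, `effective_at` and `metadata`, timestamps are already parsed into datetimes, expired blocks are not included in the response, and the total `balance` is passed in separately. The function name is hypothetical.

```python
from datetime import datetime, timezone


def summarize_blocks(blocks, balance_mc, now):
    """Compute the UI figures from a list of credit blocks."""
    def source(b):
        return b.get("metadata", {}).get("source")

    def started(b):
        return b.get("effective_at") is None or b["effective_at"] <= now

    plan_sources = {"plan_1hr", "plan_weekly"}
    active_plans = [b for b in blocks if source(b) in plan_sources and started(b)]
    return {
        "wallet_mc": sum(b["remaining_amount"] for b in blocks
                         if source(b) == "wallet_recharge" and b.get("expires_at") is None),
        "plan_mc": sum(b["remaining_amount"] for b in active_plans),
        "plan_expires_at": max((b["expires_at"] for b in active_plans), default=None),
        "free_mc": sum(b["remaining_amount"] for b in blocks
                       if source(b) == "free_conversation"),
        "messages_available": balance_mc // 1000,  # 1 message = 1000mc
    }


now = datetime(2026, 4, 13, 10, 30, tzinfo=timezone.utc)
eleven = datetime(2026, 4, 13, 11, 0, tzinfo=timezone.utc)
noon = datetime(2026, 4, 13, 12, 0, tzinfo=timezone.utc)
blocks = [
    {"remaining_amount": 100_000, "expires_at": None, "metadata": {"source": "wallet_recharge"}},
    {"remaining_amount": 45_000, "expires_at": eleven, "metadata": {"source": "plan_1hr"}},
    # Queued plan: not started yet, so it is excluded from plan_mc.
    {"remaining_amount": 200_000, "effective_at": eleven, "expires_at": noon,
     "metadata": {"source": "plan_1hr"}},
    {"remaining_amount": 20_000, "expires_at": None, "metadata": {"source": "free_conversation"}},
]
summary = summarize_blocks(blocks, balance_mc=165_000, now=now)
print(summary)
```

The queued plan is left out of the active figures here; its credits would come from `pending_balance` in the balance response.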

Tips

Idempotency keys matter. The two-step plan purchase (debit + grant) must use deterministic idempotency keys derived from a single purchase_id. If step 1 succeeds but step 2 fails, retrying with the same keys is safe — step 1 replays as a no-op, step 2 executes.
Usage is async. POST /v1/usage returns 202 immediately; the credit debit is applied in the background. For chat messages this is fine — the entitlement check already confirmed the user can afford it.
Expiry is the key lever. Plan credits have an expires_at, wallet credits do not. At the same priority, expiring credits burn before no-expiry wallet credits.
Variable companion pricing. QuotaStack’s metering rules are tenant-scoped, not companion-scoped. All companions share the same chat_message metric at 1000mc per unit. Price differences are expressed as different units values on the plan purchase debit, not as different metering rules.
Plan purchase is not a subscription. This pattern uses topup grants with expiry, not QuotaStack subscriptions. There is no renewal cycle, no grace period, no subscription state machine. The plan is a credit block with a clock on it. This keeps the model simple for consumer apps where users buy plans ad hoc.
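The idempotency tip can be demonstrated with a small in-memory stand-in. `FakeLedger` is purely illustrative and is not the QuotaStack client; it also folds wallet and plan credits into a single balance to keep the example short.

```python
class FakeLedger:
    """In-memory stand-in for an API that honors idempotency keys."""

    def __init__(self, balance_mc=0):
        self.seen = {}  # idempotency key -> stored result
        self.balance_mc = balance_mc

    def apply(self, key, delta_mc, fail=False):
        if key in self.seen:
            return self.seen[key]  # replay: no second effect
        if fail:
            raise ConnectionError("simulated network failure")
        self.balance_mc += delta_mc
        self.seen[key] = {"applied": delta_mc}
        return self.seen[key]


def purchase(ledger, purchase_id, price_mc, grant_mc, fail_grant=False):
    """Two-step purchase with both keys derived from one purchase_id."""
    ledger.apply(f"usage:plan-buy-{purchase_id}", -price_mc)
    ledger.apply(f"topup:plan-grant-{purchase_id}", grant_mc, fail=fail_grant)


ledger = FakeLedger(balance_mc=500_000)
try:
    purchase(ledger, "p1", 100_000, 200_000, fail_grant=True)  # step 2 fails
except ConnectionError:
    pass
balance_after_failure = ledger.balance_mc

purchase(ledger, "p1", 100_000, 200_000)  # retry with the same purchase_id
balance_after_retry = ledger.balance_mc
print(balance_after_failure, balance_after_retry)
```

After the failed attempt only the debit has landed. The retry replays the debit without charging again and then applies the grant, so the customer is debited once and credited once.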

See also: Credits, Burn-Down Order, Entitlements.

Concepts used in this pattern

Topups & Wallets, Credits, Metering, Entitlements