AI Chat App: Wallet + Plans + Per-Message Metering

How to model an AI companion chat app with wallet-based credits, time-limited plans purchased from the wallet, and per-message metering.

Inspired by: Character.AI, Replika, OurVibe-style companion apps

Mental Model

Think of this as a chat app with a prepaid wallet: customers top up real money, optionally buy time-limited plan packs (1-hour or weekly bundles) from that wallet, and every message debits credits. Plan packs burn first; wallet is the safety net.

Quick Take
A persistent wallet funded by topups — never expires, lowest priority
Time-limited plan packs (1hr, weekly) bought from the wallet, burn first
Per-message metering with variable cost by companion or model
Auto-buy logic: when plan expires mid-conversation, purchase the next from the wallet without breaking flow
[Diagram: the customer pays and a topup grant funds the Wallet Balance (P0, never expires). The wallet buys a Plan Pack (P10, time-limited, 1 hour or 7 days). Each sent message is metered against the plan pack first, with the wallet balance as the fallback.]

The problem

You run an AI companion chat app. Users chat with AI companions, and each message costs credits. Users maintain a wallet topped up with real money. They can buy time-limited plans (1-hour, weekly) from their wallet that give a bundle of message credits at higher priority. When plan credits run out or expire, the wallet balance kicks in.

You need:

  • A persistent wallet balance that never expires.
  • Time-limited plan bundles that burn before the wallet.
  • Free conversation credits granted when a user starts a new chat.
  • Variable pricing per companion (some companions cost more than others).
  • Auto-buy logic: when a plan expires mid-session, automatically purchase a new one from the wallet.

Credit structure

QuotaStack models this with three types of credit blocks, all stacking on a single customer:

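Block type                   | Priority | Expiry                       | Source
-----------------------------|----------|------------------------------|--------------------------------
Wallet                       | 0        | None                         | Topup grant after payment
Plan credits (1hr, weekly)   | 10       | Time-limited (1hr or 7 days) | Topup grant after wallet debit
Free conversation credits    | 0        | None                         | Topup grant on new conversation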

Burn-down order: Plan credits (priority 10) burn first. When they expire or run out, free conversation credits and the wallet balance (both priority 0) share the same tier. Within a tier, blocks burn soonest-expiring first, then oldest-created first. Since neither the wallet nor the free credits have an expiry, whichever block was created first burns first.

This is the right behavior: time-limited credits should be consumed before permanent ones to minimize waste.
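
The ordering described above can be sketched locally as a sort key. This is a minimal illustration, not QuotaStack's implementation: the `Block` class, its field names, and the `debit` helper are hypothetical and exist only to make the tie-breaking rules concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Block:
    # Hypothetical local model of a credit block; field names are assumptions.
    name: str
    priority: int
    expires_at: Optional[datetime]
    created_at: datetime
    remaining: int  # millicredits


def burn_order(blocks):
    """Highest priority first, then soonest expiry (no expiry last), then oldest."""
    far_future = datetime.max.replace(tzinfo=timezone.utc)
    return sorted(
        blocks,
        key=lambda b: (
            -b.priority,
            b.expires_at is None,        # False sorts before True
            b.expires_at or far_future,  # only compared among expiring blocks
            b.created_at,
        ),
    )


def debit(blocks, amount):
    """Drain `amount` millicredits in burn order; return any shortfall."""
    for block in burn_order(blocks):
        take = min(block.remaining, amount)
        block.remaining -= take
        amount -= take
        if amount == 0:
            break
    return amount
```

With a nearly empty plan block, a wallet block, and a newer free-conversation block, a 3-credit debit drains the plan first and takes the remainder from the wallet, because the wallet block is older than the free block.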

Billable metrics

Set up three billable metrics and their metering rules:

POST /v1/billable-metrics
{ "key": "chat_message", "name": "Chat Message" }

POST /v1/billable-metrics
{ "key": "plan_purchase_1hr", "name": "1-Hour Plan Purchase" }

POST /v1/billable-metrics
{ "key": "plan_purchase_weekly", "name": "Weekly Plan Purchase" }

Metering rules (all per_unit at 1000mc per unit):

POST /v1/metering-rules
{
  "billable_metric_key": "chat_message",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

POST /v1/metering-rules
{
  "billable_metric_key": "plan_purchase_1hr",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

POST /v1/metering-rules
{
  "billable_metric_key": "plan_purchase_weekly",
  "cost_type": "per_unit",
  "credit_cost": 1000,
  "unit_cost": 1000
}

Each unit = 1000mc = 1 credit. A message costs 1 unit (1 credit). A plan purchase costs N units, where N is the plan price in credits — passed dynamically via the units field.
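
The unit arithmetic is small enough to keep in two helpers. The sketch below assumes the 1000mc-per-unit rules defined above; the function names are illustrative and not part of any QuotaStack SDK.

```python
MC_PER_CREDIT = 1000  # 1 credit = 1000 millicredits (mc), per the rules above


def credits_to_mc(credits: int) -> int:
    """Convert whole credits to millicredits."""
    return credits * MC_PER_CREDIT


def usage_cost_mc(units: int, unit_cost_mc: int = 1000) -> int:
    """Cost of a per_unit usage event: units times the rule's per-unit cost."""
    return units * unit_cost_mc


print(usage_cost_mc(1))    # one chat message: 1000
print(usage_cost_mc(100))  # a plan priced at 100 credits: 100000
print(credits_to_mc(200))  # a 200-credit plan grant: 200000
```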

Integration flow

1. Customer signup and wallet recharge

Create the customer when they sign up:

POST /v1/customers
Idempotency-Key: signup:<your_user_id>

{
  "external_id": "user_12345"
}

When the user pays via your payment provider (Stripe, Razorpay, etc.), grant wallet credits:

POST /v1/topup/grant
Idempotency-Key: topup:<your_payment_id>

{
  "customer_id": "cus_...",
  "credits": 500000,
  "metadata": {
    "source": "wallet_recharge",
    "payment_id": "pay_abc123",
    "amount_paid": "499"
  }
}

No expiry, no priority specified (defaults to 0). This is the wallet. The metadata field stores your fiat payment reference for audit — QuotaStack never reads it.
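
A payment-provider callback may be delivered more than once, so it helps to derive the idempotency key from the payment ID rather than generating a fresh one per attempt. The sketch below only builds the request; `build_wallet_topup` is a hypothetical helper, and the field names are copied from the example above.

```python
def build_wallet_topup(customer_id, payment_id, credits, amount_paid):
    """Build headers and body for POST /v1/topup/grant after a confirmed payment."""
    # Same payment ID gives the same key, so a repeated callback replays safely.
    headers = {"Idempotency-Key": f"topup:{payment_id}"}
    body = {
        "customer_id": customer_id,
        "credits": credits * 1000,  # whole credits to millicredits
        # No expires_at and no priority: this block is the wallet.
        "metadata": {
            "source": "wallet_recharge",
            "payment_id": payment_id,
            "amount_paid": str(amount_paid),
        },
    }
    return headers, body
```

Calling `build_wallet_topup("cus_...", "pay_abc123", 500, 499)` reproduces the request shown above, with `credits` set to 500000.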

2. New conversation — grant free credits

Each time a user starts a new conversation, grant 50 free message credits:

POST /v1/topup/grant
Idempotency-Key: convo-grant:<conversation_id>

{
  "customer_id": "cus_...",
  "credits": 50000,
  "metadata": {
    "source": "free_conversation",
    "conversation_id": "conv_789"
  }
}

50 credits = 50,000mc. No expiry, priority 0. These stack with the wallet and burn in creation-order alongside wallet blocks.

3. Buy a plan from the wallet

Purchasing a plan is a two-step operation: debit the plan cost from the wallet, then grant the plan credits with an expiry.

Step 1: Debit the plan cost.

POST /v1/usage
Idempotency-Key: usage:plan-buy-<purchase_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "plan_purchase_1hr",
  "units": 100,
  "metadata": {
    "companion_id": "companion_42",
    "plan_type": "1hr"
  }
}

Here units: 100 means 100 credits (100,000mc) — the price of a 1-hour plan for this companion. Different companions can have different prices; you pass the companion-specific price as the units value.

Step 2: Grant plan credits.

POST /v1/topup/grant
Idempotency-Key: topup:plan-grant-<purchase_id>

{
  "customer_id": "cus_...",
  "credits": 200000,
  "expires_at": "2026-04-13T11:00:00Z",
  "priority": 10,
  "metadata": {
    "source": "plan_1hr",
    "companion_id": "companion_42"
  }
}

200 credits (200,000mc) with a 1-hour expiry and priority 10. These burn before wallet credits.

Variable pricing per companion: The plan cost (units on the debit) and the credits granted can both vary by companion. Your app stores companion pricing; QuotaStack just executes the credit math.
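
Both requests of a plan purchase can be built from one `purchase_id` so they are retried as a pair. The following is a sketch under assumptions: `build_plan_purchase` is a hypothetical helper, it does not send anything, and it assumes `priority` is a top-level field of the grant request (metadata is never read by QuotaStack).

```python
from datetime import datetime, timedelta, timezone


def build_plan_purchase(customer_id, purchase_id, companion_id,
                        price_credits, grant_credits, duration, now):
    """Return (debit, grant) request descriptions sharing one purchase_id."""
    debit = {
        "path": "/v1/usage",
        "idempotency_key": f"usage:plan-buy-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "billable_metric_key": "plan_purchase_1hr",
            "units": price_credits,  # 1 unit = 1 credit under the per_unit rule
            "metadata": {"companion_id": companion_id, "plan_type": "1hr"},
        },
    }
    grant = {
        "path": "/v1/topup/grant",
        "idempotency_key": f"topup:plan-grant-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "credits": grant_credits * 1000,  # grants are in millicredits
            "expires_at": (now + duration).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "priority": 10,  # assumption: top-level field on the grant
            "metadata": {"source": "plan_1hr", "companion_id": companion_id},
        },
    }
    return debit, grant


debit, grant = build_plan_purchase(
    "cus_...", "p1", "companion_42",
    price_credits=100, grant_credits=200,
    duration=timedelta(hours=1),
    now=datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc),
)
print(grant["body"]["expires_at"])  # 2026-04-13T11:00:00Z
```

Keeping the companion's price and grant size as plain arguments mirrors the split described above: the app owns the pricing table, and the requests only carry the resulting numbers.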

4. Send a message

Before sending a message to the AI, check entitlement and then record usage:

GET /v1/customers/cus_.../entitlements/chat_message?units=1

Response:

{
  "allowed": true,
  "balance": 245000,
  "effective_balance": 245000,
  "cost_per_unit": 1000,
  "cost_total": 1000,
  "affordable_units": 245
}

If allowed is true, proceed:

POST /v1/usage
Idempotency-Key: usage:msg-<message_id>

{
  "customer_id": "cus_...",
  "billable_metric_key": "chat_message",
  "units": 1,
  "metadata": {
    "conversation_id": "conv_789",
    "companion_id": "companion_42"
  }
}

The usage consumer debits 1000mc from the highest-priority block (the plan credits, if active). Returns 202 — processing is async.
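
The fields in the entitlement response above relate to each other by simple arithmetic. The sketch below reproduces that relationship locally; it is an illustration of how the numbers fit together, not a description of how QuotaStack computes them, and `evaluate_entitlement` is a hypothetical name.

```python
def evaluate_entitlement(balance_mc: int, cost_per_unit_mc: int, units: int) -> dict:
    """Mirror the response fields: total cost, affordability, and headroom."""
    cost_total = cost_per_unit_mc * units
    return {
        "allowed": balance_mc >= cost_total,
        "balance": balance_mc,
        "cost_per_unit": cost_per_unit_mc,
        "cost_total": cost_total,
        # Whole units the current balance can still pay for.
        "affordable_units": balance_mc // cost_per_unit_mc,
    }


result = evaluate_entitlement(balance_mc=245000, cost_per_unit_mc=1000, units=1)
print(result["allowed"], result["affordable_units"])  # True 245
```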

5. Plan stacking

When a user buys a second plan while one is still active, set expires_at relative to the existing plan’s expiry so the new plan extends the window rather than overlapping.

To compute the correct expiry:

GET /v1/customers/cus_.../credits?include_blocks=true

Find the latest expires_at among active plan blocks (blocks with priority 10 and a non-null expiry). Set the new plan’s expires_at to that value plus the plan duration.

# Pseudocode
blocks = get_credit_blocks(customer_id)
# Only plans that are still active: an already-expired block must not anchor the new window
plan_blocks = [b for b in blocks if b.priority == 10 and b.expires_at and b.expires_at > now()]

if plan_blocks:
    latest_expiry = max(b.expires_at for b in plan_blocks)
    new_expires_at = latest_expiry + plan_duration
else:
    new_expires_at = now() + plan_duration

This prevents wasted overlap. The user effectively extends their plan window.
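
A runnable version of the expiry calculation, using only the standard library. It is a sketch: `stacked_expiry` is a hypothetical helper that takes the plan blocks' expiry timestamps directly, and it ignores expiries already in the past in case the credits endpoint returns expired blocks (an assumption, not confirmed behavior).

```python
from datetime import datetime, timedelta, timezone


def stacked_expiry(plan_expiries, duration, now):
    """Extend from the latest active plan expiry, or from now if none is active."""
    active = [e for e in plan_expiries if e is not None and e > now]
    base = max(active) if active else now
    return base + duration


now = datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)

# One plan active until 11:00, so the new 1-hour plan runs until 12:00.
print(stacked_expiry([datetime(2026, 4, 13, 11, 0, tzinfo=timezone.utc)], hour, now))
# No active plan, so the new plan runs one hour from now.
print(stacked_expiry([], hour, now))
```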

Full message handler with auto-buy

Here is the complete pseudocode for handling a message send, including auto-buy logic:

def handle_send_message(customer_id, companion_id, conversation_id, message):
    # 1. Check entitlement
    ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)

    if not ent.allowed:
        # Try auto-buy if wallet has funds for a plan
        bought = try_auto_buy_plan(customer_id, companion_id)
        if not bought:
            return error("Insufficient credits. Please recharge your wallet.")
        # Re-check after auto-buy
        ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)
        if not ent.allowed:
            return error("Insufficient credits after plan purchase.")

    # 2. Record usage (async, returns 202)
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="chat_message",
        units=1,
        idempotency_key=f"usage:msg-{message.id}",
        metadata={
            "conversation_id": conversation_id,
            "companion_id": companion_id,
        }
    )

    # 3. Call AI and stream response
    response = ai_service.chat(companion_id, message)
    return response


def try_auto_buy_plan(customer_id, companion_id):
    """Attempt to purchase the user's preferred plan from wallet balance."""
    companion = get_companion(companion_id)
    plan_cost = companion.default_plan_cost      # e.g. 100 credits
    plan_credits = companion.default_plan_credits # e.g. 200 credits
    plan_duration = companion.default_plan_duration # e.g. 1 hour

    # Check if wallet can afford the plan
    ent = quotastack.get_entitlement(
        customer_id,
        "plan_purchase_1hr",
        units=plan_cost
    )
    if not ent.allowed:
        return False

    purchase_id = generate_uuid()

    # Compute stacked expiry (ignore plan blocks that have already expired)
    blocks = quotastack.get_credits(customer_id, include_blocks=True).blocks
    plan_blocks = [
        b for b in blocks
        if b.priority == 10 and b.expires_at and b.expires_at > now()
    ]
    if plan_blocks:
        base_time = max(b.expires_at for b in plan_blocks)
    else:
        base_time = now()
    expires_at = base_time + plan_duration

    # Step 1: Debit plan cost from wallet
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="plan_purchase_1hr",
        units=plan_cost,
        idempotency_key=f"usage:plan-buy-{purchase_id}",
        metadata={
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )

    # Step 2: Grant plan credits with expiry + high priority
    quotastack.topup_grant(
        customer_id=customer_id,
        credits=plan_credits * 1000,  # convert to millicredits
        expires_at=expires_at,
        priority=10,  # set on the grant itself; QuotaStack never reads metadata
        idempotency_key=f"topup:plan-grant-{purchase_id}",
        metadata={
            "source": "plan_1hr",
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )

    return True

Plan expiry and auto-buy trigger

QuotaStack fires a credit.expired webhook when plan blocks expire. Use this to trigger auto-buy:

def handle_webhook(event):
    if event.type == "credit.expired":
        block = event.credit_block
        # Only auto-buy for plan blocks
        if block.priority == 10 and block.metadata.get("source") in ["plan_1hr", "plan_weekly"]:
            customer = get_customer(block.customer_id)
            if customer.settings.auto_buy_enabled:
                companion_id = block.metadata.get("companion_id")
                try_auto_buy_plan(block.customer_id, companion_id)
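
The filtering part of the handler above can be isolated into a pure function, which makes it easy to test without a live webhook. This is a sketch only: the event is modeled as a plain dict whose keys mirror the attributes used in the handler (`type`, `credit_block`, `priority`, `metadata`), and that payload shape is an assumption rather than a documented schema.

```python
from typing import Optional

PLAN_SOURCES = {"plan_1hr", "plan_weekly"}


def companion_to_rebuy(event: dict, auto_buy_enabled: bool) -> Optional[str]:
    """Return the companion_id to auto-buy for, or None if no purchase is due."""
    if event.get("type") != "credit.expired" or not auto_buy_enabled:
        return None
    block = event.get("credit_block") or {}
    metadata = block.get("metadata") or {}
    # Only expired plan blocks trigger a purchase; wallet and free blocks do not.
    if block.get("priority") != 10 or metadata.get("source") not in PLAN_SOURCES:
        return None
    return metadata.get("companion_id")
```

The caller would then pass the returned companion ID to `try_auto_buy_plan`, and skip the purchase when the result is `None`.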

Showing balance in the UI

Use the balance endpoint to display remaining credits:

GET /v1/customers/cus_.../credits?include_blocks=true

From the response, compute:

  • Wallet balance: Sum remaining_amount of blocks with no expiry and metadata source "wallet_recharge".
  • Plan credits remaining: Sum remaining_amount of blocks with priority 10.
  • Plan expires at: Latest expires_at among priority-10 blocks.
  • Free credits remaining: Sum remaining_amount of blocks with metadata source "free_conversation".
  • Total messages available: balance / 1000, rounded down (since 1 message = 1000mc).
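
The rules in the list above can be sketched as a single function over the block list. The block shape used here (dicts with `remaining_amount`, `priority`, `expires_at`, and `metadata`) is an assumption about the response, and `summarize_balance` is a hypothetical helper. Expiries are compared as strings, which is only safe if they all use the same UTC format, as in the examples on this page.

```python
def summarize_balance(blocks):
    """Derive the UI figures from a list of credit blocks (amounts in mc)."""
    def source(block):
        return (block.get("metadata") or {}).get("source")

    plan = [b for b in blocks if b.get("priority") == 10]
    expiries = [b["expires_at"] for b in plan if b.get("expires_at")]
    total_mc = sum(b["remaining_amount"] for b in blocks)
    return {
        "wallet_mc": sum(
            b["remaining_amount"] for b in blocks
            if b.get("expires_at") is None and source(b) == "wallet_recharge"
        ),
        "plan_mc": sum(b["remaining_amount"] for b in plan),
        "plan_expires_at": max(expiries) if expiries else None,
        "free_mc": sum(
            b["remaining_amount"] for b in blocks
            if source(b) == "free_conversation"
        ),
        "messages_available": total_mc // 1000,  # 1 message = 1000mc
    }
```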

Tips

  • Idempotency keys matter. The two-step plan purchase (debit + grant) must use deterministic idempotency keys derived from a single purchase_id. If step 1 succeeds but step 2 fails, retrying with the same keys is safe — step 1 replays as a no-op, step 2 executes.

  • Usage is async. POST /v1/usage returns 202 immediately; the credit debit is applied in the background. For chat messages this is fine — the entitlement check already confirmed the user can afford it.

  • Priority is the key lever. Setting plan credits to priority 10 and wallet credits to priority 0 ensures plans burn first. This is the entire mechanism for “plans are consumed before wallet.”

  • Variable companion pricing. QuotaStack’s metering rules are tenant-scoped, not companion-scoped. All companions share the same chat_message metric at 1000mc per unit. Price differences are expressed as different units values on the plan purchase debit, not as different metering rules.

  • Plan purchase is not a subscription. This pattern uses topup grants with expiry, not QuotaStack subscriptions. There is no renewal cycle, no grace period, no subscription state machine. The plan is a credit block with a clock on it. This keeps the model simple for consumer apps where users buy plans ad hoc.
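
On the idempotency tip above: the retry guarantee only holds if a retry reuses the same purchase_id. One way to get that, sketched here as an assumption rather than a QuotaStack feature, is to derive the ID from whatever triggered the purchase (for example the ID of the expired block, or a client-side purchase request ID) using a name-based UUID. `PURCHASE_NAMESPACE` and both helper names are hypothetical.

```python
import uuid

# Hypothetical app-specific namespace; any fixed UUID works if it never changes.
PURCHASE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def purchase_id_for(customer_id: str, trigger: str) -> str:
    """Same customer and trigger always yield the same purchase_id."""
    return str(uuid.uuid5(PURCHASE_NAMESPACE, f"{customer_id}:{trigger}"))


def plan_purchase_keys(purchase_id: str) -> dict:
    """Idempotency keys for the debit and the grant of one plan purchase."""
    return {
        "debit": f"usage:plan-buy-{purchase_id}",
        "grant": f"topup:plan-grant-{purchase_id}",
    }
```

A random UUID generated inside the purchase function would change on every retry, so a retried purchase could debit the wallet twice. Deriving the ID from the trigger avoids that.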

See also: Credits, Burn-Down Order, Entitlements.
