AI Chat App: Wallet + Plans + Per-Message Metering
How to model an AI companion chat app with wallet-based credits, time-limited plans purchased from the wallet, and per-message metering.
Inspired by: Character.AI, Replika, OurVibe-style companion apps
Think of this as a chat app with a prepaid wallet: customers top up real money, optionally buy time-limited plan packs (1-hour or weekly bundles) from that wallet, and every message debits credits. Plan packs burn first; wallet is the safety net.
The problem
You run an AI companion chat app. Users chat with AI companions, and each message costs credits. Users maintain a wallet topped up with real money. They can buy time-limited plans (1-hour, weekly) from their wallet that give a bundle of message credits at higher priority. When plan credits run out or expire, the wallet balance kicks in.
You need:
- A persistent wallet balance that never expires.
- Time-limited plan bundles that burn before the wallet.
- Free conversation credits granted when a user starts a new chat.
- Variable pricing per companion (some companions cost more than others).
- Auto-buy logic: when a plan expires mid-session, automatically purchase a new one from the wallet.
Credit structure
QuotaStack models this with three types of credit blocks, all stacking on a single customer:
| Block type | Priority | Expiry | Source |
|---|---|---|---|
| Wallet | 0 | None | Topup grant after payment |
| Plan credits (1hr, weekly) | 10 | Time-limited (1hr or 7 days) | Topup grant after wallet debit |
| Free conversation credits | 0 | None | Topup grant on new conversation |
Burn-down order: Plan credits (priority 10) burn first. When they expire or run out, free conversation credits and wallet balance (both priority 0) share the same tier, which burns soonest-expiring first, then oldest-created first. Since neither has an expiry, the oldest block burns first.
This is the right behavior: time-limited credits should be consumed before permanent ones to minimize waste.
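The ordering rule can be sketched as a sort key. This is a minimal local model, not QuotaStack's implementation: the `Block` class, the `burn_order` function, and the `ts` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Block:
    name: str
    priority: int
    created_at: datetime
    expires_at: Optional[datetime] = None


def burn_order(blocks):
    """Highest priority first, then soonest expiry (no expiry sorts last),
    then oldest created."""
    far_future = datetime.max.replace(tzinfo=timezone.utc)
    return sorted(
        blocks,
        key=lambda b: (-b.priority, b.expires_at or far_future, b.created_at),
    )


def ts(hour):
    # Illustrative timestamps on a single day, in UTC.
    return datetime(2026, 4, 13, hour, tzinfo=timezone.utc)


blocks = [
    Block("wallet", 0, ts(8)),
    Block("free_conversation", 0, ts(9)),
    Block("plan_1hr", 10, ts(10), expires_at=ts(11)),
]
print([b.name for b in burn_order(blocks)])
# ['plan_1hr', 'wallet', 'free_conversation']
```

The plan block burns first despite being the newest, and the two priority-0 blocks fall back to creation order because neither expires.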
Billable metrics
Set up three billable metrics and their metering rules:
POST /v1/billable-metrics
{ "key": "chat_message", "name": "Chat Message" }
POST /v1/billable-metrics
{ "key": "plan_purchase_1hr", "name": "1-Hour Plan Purchase" }
POST /v1/billable-metrics
{ "key": "plan_purchase_weekly", "name": "Weekly Plan Purchase" }
Metering rules (all per_unit at 1000mc per unit):
POST /v1/metering-rules
{
"billable_metric_key": "chat_message",
"cost_type": "per_unit",
"credit_cost": 1000,
"unit_cost": 1000
}
POST /v1/metering-rules
{
"billable_metric_key": "plan_purchase_1hr",
"cost_type": "per_unit",
"credit_cost": 1000,
"unit_cost": 1000
}
POST /v1/metering-rules
{
"billable_metric_key": "plan_purchase_weekly",
"cost_type": "per_unit",
"credit_cost": 1000,
"unit_cost": 1000
}
Each unit costs 1000mc (millicredits), which equals 1 credit. A message costs 1 unit (1 credit). A plan purchase costs N units, where N is the plan price in credits, passed dynamically via the units field.
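The unit arithmetic can be sketched as two small helpers. `MC_PER_CREDIT`, `credits_to_mc`, and `usage_cost_mc` are hypothetical names, and the cost formula assumes a per_unit rule simply multiplies units by credit_cost.

```python
MC_PER_CREDIT = 1000  # 1 credit = 1000mc, per the metering rules above


def credits_to_mc(credits: int) -> int:
    """Convert whole credits to millicredits."""
    return credits * MC_PER_CREDIT


def usage_cost_mc(units: int, credit_cost: int = 1000) -> int:
    """Cost of one per_unit usage event, assuming cost = units x credit_cost."""
    return units * credit_cost


print(usage_cost_mc(1))    # one chat message -> 1000
print(usage_cost_mc(100))  # a plan priced at 100 credits -> 100000
print(credits_to_mc(50))   # 50 free conversation credits -> 50000
```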
Integration flow
1. Customer signup and wallet recharge
Create the customer when they sign up:
POST /v1/customers
Idempotency-Key: signup:<your_user_id>
{
"external_id": "user_12345"
}
When the user pays via your payment provider (Stripe, Razorpay, etc.), grant wallet credits:
POST /v1/topup/grant
Idempotency-Key: topup:<your_payment_id>
{
"customer_id": "cus_...",
"credits": 500000,
"metadata": {
"source": "wallet_recharge",
"payment_id": "pay_abc123",
"amount_paid": "499"
}
}
No expiry, no priority specified (defaults to 0). This is the wallet. The metadata field stores your fiat payment reference for audit — QuotaStack never reads it.
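A payment-webhook handler might assemble this request as follows. This is a sketch only: `wallet_recharge_request` is a hypothetical helper, `cus_123` is a made-up customer ID, and the function builds the headers and body without sending them.

```python
MC_PER_CREDIT = 1000  # 1 credit = 1000mc


def wallet_recharge_request(customer_id, payment_id, credits, amount_paid):
    """Build (headers, body) for POST /v1/topup/grant. Hypothetical helper."""
    # Keying on the payment ID makes a retried payment webhook safe to replay.
    headers = {"Idempotency-Key": f"topup:{payment_id}"}
    body = {
        "customer_id": customer_id,
        "credits": credits * MC_PER_CREDIT,
        # No expires_at and no priority: this block is the wallet.
        "metadata": {
            "source": "wallet_recharge",
            "payment_id": payment_id,
            "amount_paid": str(amount_paid),
        },
    }
    return headers, body


headers, body = wallet_recharge_request("cus_123", "pay_abc123", 500, 499)
print(headers["Idempotency-Key"])  # topup:pay_abc123
print(body["credits"])             # 500000
```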
2. New conversation — grant free credits
Each time a user starts a new conversation, grant 50 free message credits:
POST /v1/topup/grant
Idempotency-Key: convo-grant:<conversation_id>
{
"customer_id": "cus_...",
"credits": 50000,
"metadata": {
"source": "free_conversation",
"conversation_id": "conv_789"
}
}
50 credits = 50,000mc. No expiry, priority 0. These stack with the wallet and burn in creation-order alongside wallet blocks.
3. Buy a plan from the wallet
Purchasing a plan is a two-step operation: debit the plan cost from the wallet, then grant the plan credits with an expiry.
Step 1: Debit the plan cost.
POST /v1/usage
Idempotency-Key: usage:plan-buy-<purchase_id>
{
"customer_id": "cus_...",
"billable_metric_key": "plan_purchase_1hr",
"units": 100,
"metadata": {
"companion_id": "companion_42",
"plan_type": "1hr"
}
}
Here units: 100 means 100 credits (100,000mc) — the price of a 1-hour plan for this companion. Different companions can have different prices; you pass the companion-specific price as the units value.
Step 2: Grant plan credits.
POST /v1/topup/grant
Idempotency-Key: topup:plan-grant-<purchase_id>
{
"customer_id": "cus_...",
"credits": 200000,
"expires_at": "2026-04-13T11:00:00Z",
"priority": 10,
"metadata": {
"source": "plan_1hr",
"companion_id": "companion_42"
}
}
200 credits (200,000mc) with a 1-hour expiry and priority 10. These burn before wallet credits.
Variable pricing per companion: The plan cost (units on the debit) and the credits granted can both vary by companion. Your app stores companion pricing; QuotaStack just executes the credit math.
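The two requests can be derived from a single purchase ID, so that both idempotency keys are deterministic. The sketch below only builds the payloads. `plan_purchase_requests` is a hypothetical helper, and it assumes priority is a top-level field of the grant request, which the text implies but does not state outright.

```python
from datetime import datetime, timedelta, timezone

MC_PER_CREDIT = 1000


def plan_purchase_requests(customer_id, purchase_id, companion_id,
                           plan_cost, plan_credits, duration, now):
    """Return (debit, grant) request descriptions for one plan purchase."""
    expires_at = (now + duration).strftime("%Y-%m-%dT%H:%M:%SZ")  # now must be UTC
    debit = {
        "path": "/v1/usage",
        "idempotency_key": f"usage:plan-buy-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "billable_metric_key": "plan_purchase_1hr",
            "units": plan_cost,  # companion-specific price, in credits
            "metadata": {"companion_id": companion_id, "plan_type": "1hr"},
        },
    }
    grant = {
        "path": "/v1/topup/grant",
        "idempotency_key": f"topup:plan-grant-{purchase_id}",
        "body": {
            "customer_id": customer_id,
            "credits": plan_credits * MC_PER_CREDIT,
            "expires_at": expires_at,
            "priority": 10,  # assumption: top-level field, not metadata
            "metadata": {"source": "plan_1hr", "companion_id": companion_id},
        },
    }
    return debit, grant


now = datetime(2026, 4, 13, 10, 0, tzinfo=timezone.utc)
debit, grant = plan_purchase_requests(
    "cus_123", "pur_001", "companion_42", 100, 200, timedelta(hours=1), now
)
print(debit["idempotency_key"], grant["body"]["expires_at"])
```

Storing the purchase ID before sending either request lets a retry rebuild exactly the same pair of keys.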
4. Send a message
Before sending a message to the AI, check entitlement and then record usage:
GET /v1/customers/cus_.../entitlements/chat_message?units=1
Response:
{
"allowed": true,
"balance": 245000,
"effective_balance": 245000,
"cost_per_unit": 1000,
"cost_total": 1000,
"affordable_units": 245
}
If allowed is true, proceed:
POST /v1/usage
Idempotency-Key: usage:msg-<message_id>
{
"customer_id": "cus_...",
"billable_metric_key": "chat_message",
"units": 1,
"metadata": {
"conversation_id": "conv_789",
"companion_id": "companion_42"
}
}
The usage consumer debits 1000mc from the highest-priority block (the plan credits, if active). Returns 202 — processing is async.
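The fields in the entitlement response appear to relate to each other by simple arithmetic. The function below is a local model inferred from the sample response, not the service's actual logic. `check_entitlement` is a hypothetical name, and effective_balance is left out because the text does not define how it differs from balance.

```python
def check_entitlement(balance_mc: int, cost_per_unit_mc: int, units: int) -> dict:
    """Local model of the entitlement response fields (inferred, not official)."""
    cost_total = cost_per_unit_mc * units
    return {
        "allowed": balance_mc >= cost_total,
        "balance": balance_mc,
        "cost_per_unit": cost_per_unit_mc,
        "cost_total": cost_total,
        "affordable_units": balance_mc // cost_per_unit_mc,
    }


ent = check_entitlement(balance_mc=245000, cost_per_unit_mc=1000, units=1)
print(ent["allowed"], ent["affordable_units"])  # True 245
```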
5. Plan stacking
When a user buys a second plan while one is still active, set expires_at relative to the existing plan’s expiry so the new plan extends the window rather than overlapping.
To compute the correct expiry:
GET /v1/customers/cus_.../credits?include_blocks=true
Find the latest expires_at among active plan blocks (blocks with priority 10 and a non-null expiry). Set the new plan’s expires_at to that value plus the plan duration.
# Pseudocode
blocks = get_credit_blocks(customer_id)
plan_blocks = [b for b in blocks if b.priority == 10 and b.expires_at]
if plan_blocks:
    latest_expiry = max(b.expires_at for b in plan_blocks)
    new_expires_at = latest_expiry + plan_duration
else:
    new_expires_at = now() + plan_duration
This prevents wasted overlap. The user effectively extends their plan window.
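A runnable version of the expiry calculation might look like this. `stacked_expiry` is a hypothetical helper. It adds one guard that the pseudocode leaves implicit: expiries that are already in the past are ignored, on the assumption that the blocks endpoint may still return expired blocks.

```python
from datetime import datetime, timedelta, timezone


def stacked_expiry(plan_expiries, plan_duration, now):
    """Expiry for a new plan: extend the latest active plan, else start from now."""
    active = [e for e in plan_expiries if e > now]  # skip already-expired plans
    base = max(active) if active else now
    return base + plan_duration


now = datetime(2026, 4, 13, 10, 30, tzinfo=timezone.utc)
existing = [datetime(2026, 4, 13, 11, 0, tzinfo=timezone.utc)]

print(stacked_expiry(existing, timedelta(hours=1), now))  # 2026-04-13 12:00:00+00:00
print(stacked_expiry([], timedelta(hours=1), now))        # 2026-04-13 11:30:00+00:00
```

In the first call the user has 30 minutes left on the current plan, so the new plan ends at 12:00 rather than 11:30 and no paid time overlaps.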
Full message handler with auto-buy
Here is the complete pseudocode for handling a message send, including auto-buy logic:
def handle_send_message(customer_id, companion_id, conversation_id, message):
    # 1. Check entitlement
    ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)

    if not ent.allowed:
        # Try auto-buy if wallet has funds for a plan
        bought = try_auto_buy_plan(customer_id, companion_id)
        if not bought:
            return error("Insufficient credits. Please recharge your wallet.")

        # Re-check after auto-buy
        ent = quotastack.get_entitlement(customer_id, "chat_message", units=1)
        if not ent.allowed:
            return error("Insufficient credits after plan purchase.")

    # 2. Record usage (async, returns 202)
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="chat_message",
        units=1,
        idempotency_key=f"usage:msg-{message.id}",
        metadata={
            "conversation_id": conversation_id,
            "companion_id": companion_id,
        }
    )

    # 3. Call AI and stream response
    response = ai_service.chat(companion_id, message)
    return response
def try_auto_buy_plan(customer_id, companion_id):
    """Attempt to purchase the user's preferred plan from wallet balance."""
    companion = get_companion(companion_id)
    plan_cost = companion.default_plan_cost          # e.g. 100 credits
    plan_credits = companion.default_plan_credits    # e.g. 200 credits
    plan_duration = companion.default_plan_duration  # e.g. 1 hour

    # Check if wallet can afford the plan
    ent = quotastack.get_entitlement(
        customer_id,
        "plan_purchase_1hr",
        units=plan_cost
    )
    if not ent.allowed:
        return False

    # Persist purchase_id before calling QuotaStack so that a retry
    # reuses the same idempotency keys for both steps.
    purchase_id = generate_uuid()

    # Compute stacked expiry
    blocks = quotastack.get_credits(customer_id, include_blocks=True).blocks
    plan_blocks = [b for b in blocks if b.priority == 10 and b.expires_at]
    if plan_blocks:
        base_time = max(b.expires_at for b in plan_blocks)
    else:
        base_time = now()
    expires_at = base_time + plan_duration

    # Step 1: Debit plan cost from wallet
    quotastack.record_usage(
        customer_id=customer_id,
        billable_metric_key="plan_purchase_1hr",
        units=plan_cost,
        idempotency_key=f"usage:plan-buy-{purchase_id}",
        metadata={
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )

    # Step 2: Grant plan credits with expiry + high priority
    quotastack.topup_grant(
        customer_id=customer_id,
        credits=plan_credits * 1000,  # convert to millicredits
        expires_at=expires_at,
        priority=10,  # set on the grant itself; metadata is never read by QuotaStack
        idempotency_key=f"topup:plan-grant-{purchase_id}",
        metadata={
            "source": "plan_1hr",
            "companion_id": companion_id,
            "auto_buy": True,
        }
    )
    return True
Plan expiry and auto-buy trigger
QuotaStack fires a credit.expired webhook when plan blocks expire. Use this to trigger auto-buy:
def handle_webhook(event):
    if event.type == "credit.expired":
        block = event.credit_block
        # Only auto-buy for plan blocks
        if block.priority == 10 and block.metadata.get("source") in ["plan_1hr", "plan_weekly"]:
            customer = get_customer(block.customer_id)
            if customer.settings.auto_buy_enabled:
                companion_id = block.metadata.get("companion_id")
                try_auto_buy_plan(block.customer_id, companion_id)
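The filtering decision in the handler can be isolated into a pure function, which makes it easy to unit test. The sketch assumes the webhook payload arrives as a JSON object with type, credit_block.priority, and credit_block.metadata fields, mirroring the attribute access above. The exact payload shape is not confirmed by the text, and `should_auto_buy` is a hypothetical name.

```python
PLAN_SOURCES = {"plan_1hr", "plan_weekly"}


def should_auto_buy(event: dict, auto_buy_enabled: bool) -> bool:
    """True only for expired plan blocks of customers who opted in to auto-buy."""
    if event.get("type") != "credit.expired":
        return False
    block = event.get("credit_block") or {}
    metadata = block.get("metadata") or {}
    return (
        auto_buy_enabled
        and block.get("priority") == 10
        and metadata.get("source") in PLAN_SOURCES
    )


event = {
    "type": "credit.expired",
    "credit_block": {
        "customer_id": "cus_123",  # made-up ID for illustration
        "priority": 10,
        "metadata": {"source": "plan_1hr", "companion_id": "companion_42"},
    },
}
print(should_auto_buy(event, auto_buy_enabled=True))   # True
print(should_auto_buy(event, auto_buy_enabled=False))  # False
```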
Showing balance in the UI
Use the balance endpoint to display remaining credits:
GET /v1/customers/cus_.../credits?include_blocks=true
From the response, compute:
- Wallet balance: Sum remaining_amount of blocks with no expiry and source "wallet_recharge".
- Plan credits remaining: Sum remaining_amount of blocks with priority 10.
- Plan expires at: Latest expires_at among priority-10 blocks.
- Free credits remaining: Sum remaining_amount of blocks with source "free_conversation".
- Total messages available: balance / 1000 (since 1 message = 1000mc).
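These aggregations can be sketched over a list of block dictionaries. The field names follow the list above, but the exact response shape is an assumption, and `summarize_blocks` is a hypothetical helper. The total here is computed by summing blocks rather than reading the balance field, and expiries are compared as UTC timestamp strings in a single format, which sort chronologically.

```python
def summarize_blocks(blocks):
    """Compute the UI figures from credit blocks (amounts in millicredits)."""
    def source(b):
        return (b.get("metadata") or {}).get("source")

    def remaining(pred):
        return sum(b["remaining_amount"] for b in blocks if pred(b))

    plan_expiries = [
        b["expires_at"] for b in blocks if b["priority"] == 10 and b["expires_at"]
    ]
    total_mc = remaining(lambda b: True)
    return {
        "wallet_mc": remaining(
            lambda b: b["expires_at"] is None and source(b) == "wallet_recharge"
        ),
        "plan_mc": remaining(lambda b: b["priority"] == 10),
        "plan_expires_at": max(plan_expiries) if plan_expiries else None,
        "free_mc": remaining(lambda b: source(b) == "free_conversation"),
        "messages_available": total_mc // 1000,  # 1 message = 1000mc
    }


blocks = [
    {"priority": 0, "expires_at": None, "remaining_amount": 400000,
     "metadata": {"source": "wallet_recharge"}},
    {"priority": 0, "expires_at": None, "remaining_amount": 45000,
     "metadata": {"source": "free_conversation"}},
    {"priority": 10, "expires_at": "2026-04-13T11:00:00Z", "remaining_amount": 180000,
     "metadata": {"source": "plan_1hr"}},
]
summary = summarize_blocks(blocks)
print(summary["messages_available"])  # 625
```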
Tips
- Idempotency keys matter. The two-step plan purchase (debit + grant) must use deterministic idempotency keys derived from a single purchase_id. If step 1 succeeds but step 2 fails, retrying with the same keys is safe: step 1 replays as a no-op, step 2 executes.
- Usage is async. POST /v1/usage returns 202 immediately; the credit debit is applied in the background. For chat messages this is fine, because the entitlement check already confirmed the user can afford it.
- Priority is the key lever. Setting plan credits to priority 10 and wallet credits to priority 0 ensures plans burn first. This is the entire mechanism for "plans are consumed before wallet."
- Variable companion pricing. QuotaStack's metering rules are tenant-scoped, not companion-scoped. All companions share the same chat_message metric at 1000mc per unit. Price differences are expressed as different units values on the plan purchase debit, not as different metering rules.
- Plan purchase is not a subscription. This pattern uses topup grants with expiry, not QuotaStack subscriptions. There is no renewal cycle, no grace period, no subscription state machine. The plan is a credit block with a clock on it. This keeps the model simple for consumer apps where users buy plans ad hoc.
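The retry behavior from the first tip can be illustrated with an in-memory stand-in. `FakeLedger` and `buy_plan` are hypothetical and only model the idea that an idempotency key is applied at most once. They say nothing about how QuotaStack stores keys internally.

```python
class FakeLedger:
    """In-memory stand-in for an idempotent API: one effect per key."""

    def __init__(self, balance_mc=0):
        self.applied = {}  # idempotency key -> amount applied (mc)
        self.balance = balance_mc

    def apply(self, key, delta_mc, fail=False):
        if key in self.applied:
            return "replayed"  # same key seen before: no second effect
        if fail:
            raise ConnectionError("simulated network failure")
        self.applied[key] = delta_mc
        self.balance += delta_mc
        return "applied"


def buy_plan(ledger, purchase_id, cost_mc, grant_mc, fail_grant=False):
    # Both keys derive from the same purchase_id, so a retry is safe.
    ledger.apply(f"usage:plan-buy-{purchase_id}", -cost_mc)
    ledger.apply(f"topup:plan-grant-{purchase_id}", grant_mc, fail=fail_grant)


ledger = FakeLedger(balance_mc=500_000)
try:
    buy_plan(ledger, "p1", 100_000, 200_000, fail_grant=True)  # step 2 fails
except ConnectionError:
    buy_plan(ledger, "p1", 100_000, 200_000)  # retry with the same purchase_id

print(ledger.balance)  # 600000: debited once, granted once
```

Had the retry generated a fresh purchase ID, the debit would have been applied twice and the final balance would be 500000 instead.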
See also: Credits, Burn-Down Order, Entitlements.