AI Token Metering
Every AI pricing model rests on one thing: accurately measuring what your product consumes. Here is how to capture, attribute, and rate every token across any model provider, with sub-second latency and exactly-once accuracy, then send rated usage to the billing system of your choice.
The Short Answer
What is token metering?
Token metering is the process of capturing, attributing, and rating every token an AI product consumes. Input tokens (the prompt) and output tokens (the response) are metered separately — often at different rates, and across different models — then attributed to the correct customer and rated against your pricing rules in real time.
It is the measurement layer beneath every AI pricing model. Whether you sell credits, pay-as-you-go, or hybrid plans, none of it is accurate unless the underlying metering is. Get the metering right and the billing falls into place.
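To make the input/output split concrete, here is a minimal sketch of rating a single request. The model names, the prices, and the `rate_request` helper are illustrative assumptions, not Nalpeiron's API or any provider's real pricing.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelRate:
    """Prices per one million tokens, kept separate for input and output."""
    input_per_million: Decimal
    output_per_million: Decimal


# Hypothetical rate card: model names and prices are placeholders.
RATE_CARD = {
    "model-small": ModelRate(Decimal("0.50"), Decimal("1.50")),
    "model-large": ModelRate(Decimal("5.00"), Decimal("15.00")),
}

MILLION = Decimal(1_000_000)


def rate_request(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Rate one request, pricing input and output tokens at their own rates."""
    rate = RATE_CARD[model]
    input_cost = Decimal(input_tokens) * rate.input_per_million / MILLION
    output_cost = Decimal(output_tokens) * rate.output_per_million / MILLION
    return input_cost + output_cost


# One request on the larger model: 1,200 prompt tokens, 300 response tokens.
cost = rate_request("model-large", 1200, 300)
print(f"rated cost: {cost}")
```

The sketch uses `Decimal` rather than floats because per-token prices are tiny and binary floating-point rounding errors add up across millions of events.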
What You Can Meter
Tokens are just the start
Tokens are the highest-frequency unit, but the same engine meters everything your AI product consumes — at the application level, independent of which providers you use behind the scenes.
LLM tokens
Input and output tokens metered separately, across any model provider, at rates you control.
Compute & inference
GPU seconds, inference time, and processing duration for self-hosted or managed models.
Agent actions
Tool calls, reasoning steps, and autonomous decisions inside an agent run.
Retrieval & embeddings
Vector-DB queries, embeddings, and document processing for RAG pipelines.
API calls
External service invocations, webhooks, and integrations triggered by your product.
Custom events
Any measurable unit of value your product defines — metered the same way.
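One way to meter all of these units the same way is to give every unit a common event shape. The sketch below is a hypothetical schema: the `UsageEvent` fields, the unit names, and the `validate` helper are assumptions for illustration and do not describe a documented Nalpeiron payload.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical built-in unit names, mirroring the categories above.
KNOWN_UNITS = frozenset({
    "input_tokens", "output_tokens", "gpu_seconds",
    "tool_calls", "embeddings", "vector_queries", "api_calls",
})


@dataclass(frozen=True)
class UsageEvent:
    event_id: str        # unique per event, reusable as an idempotency key
    customer_id: str     # who the usage is attributed to
    unit: str            # what was consumed
    quantity: float      # how much was consumed
    timestamp: datetime  # when it happened (timezone-aware)
    dimensions: dict = field(default_factory=dict)  # e.g. model, feature


def validate(event: UsageEvent, custom_units: frozenset = frozenset()) -> list:
    """Return a list of problems; an empty list means the event is usable."""
    problems = []
    if not event.event_id:
        problems.append("missing event_id")
    if not event.customer_id:
        problems.append("missing customer_id")
    if event.unit not in KNOWN_UNITS | custom_units:
        problems.append(f"unknown unit: {event.unit}")
    if event.quantity < 0:
        problems.append("negative quantity")
    if event.timestamp.tzinfo is None:
        problems.append("timestamp must be timezone-aware")
    return problems


now = datetime.now(timezone.utc)
custom = frozenset({"pages_parsed"})  # a product-defined custom unit
events = [
    # One LLM call produces two events: input and output are metered separately.
    UsageEvent("req-1:in", "cust-42", "input_tokens", 1200, now, {"model": "model-large"}),
    UsageEvent("req-1:out", "cust-42", "output_tokens", 300, now, {"model": "model-large"}),
    UsageEvent("run-7:tool-3", "cust-42", "tool_calls", 1, now, {"feature": "research-agent"}),
    UsageEvent("job-9", "cust-42", "pages_parsed", 18, now),
]
for event in events:
    print(event.event_id, validate(event, custom_units=custom))
```

Because tokens, tool calls, and custom units share one shape, the downstream stages need no per-unit code paths.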
How It Works
How to meter tokens at scale
Four stages turn raw, high-frequency events into rated usage you can bill on — in milliseconds, not nightly batches.
Capture
Send usage events via SDK or API as they happen — tokens, compute, actions. High-throughput ingestion handles millions of events per hour.
Attribute
Every event is validated, deduplicated for exactly-once accuracy, and attributed to the right customer, feature, and model in milliseconds.
Rate
Your rate cards turn raw events into rated line items in real time — per-token, per-action, tiered, or credit-based — with no engineering release to change a price.
Hand off to billing
Rated usage flows to your billing platform of choice for invoicing, or accumulates for a use-then-invoice true-up. Metering is the source of truth.
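The four stages above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: the `Meter` class, the event fields, and the flat per-unit prices are hypothetical, and deduplication here relies on a client-supplied `event_id` as an idempotency key, which is one common way to get exactly-once results from at-least-once delivery.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical flat per-unit prices; a real rate card could be tiered or credit-based.
PRICES = {
    "input_tokens": Decimal("0.000005"),
    "output_tokens": Decimal("0.000015"),
    "tool_calls": Decimal("0.01"),
}


class Meter:
    def __init__(self, prices):
        self.prices = prices
        self.seen_ids = set()               # idempotency keys already processed
        self.totals = defaultdict(Decimal)  # (customer, feature, model, unit) -> quantity

    def ingest(self, event: dict) -> bool:
        """Capture one event. Returns False if it was a duplicate."""
        # Validate: reject unknown units and negative quantities.
        if event["unit"] not in self.prices or event["quantity"] < 0:
            raise ValueError(f"invalid event: {event['event_id']}")
        # Deduplicate: a retried event with the same id is counted once.
        if event["event_id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["event_id"])
        # Attribute: roll usage up by customer, feature, model, and unit.
        key = (event["customer_id"], event["feature"], event["model"], event["unit"])
        self.totals[key] += Decimal(str(event["quantity"]))
        return True

    def rated_line_items(self) -> list:
        """Rate: turn attributed quantities into priced line items for hand-off."""
        items = []
        for (customer, feature, model, unit), qty in sorted(self.totals.items()):
            items.append({
                "customer_id": customer, "feature": feature, "model": model,
                "unit": unit, "quantity": qty, "amount": qty * self.prices[unit],
            })
        return items


def make_event(event_id, unit, quantity):
    return {"event_id": event_id, "customer_id": "cust-42", "feature": "chat",
            "model": "model-large", "unit": unit, "quantity": quantity}


meter = Meter(PRICES)
results = [
    meter.ingest(make_event("req-1:in", "input_tokens", 1000)),
    meter.ingest(make_event("req-1:out", "output_tokens", 200)),
    meter.ingest(make_event("req-1:in", "input_tokens", 1000)),  # client retry
]
line_items = meter.rated_line_items()
for item in line_items:
    print(item["unit"], item["quantity"], item["amount"])
```

Prices live in data rather than code, so changing a rate means editing `PRICES`, not shipping a release. A production system would keep the seen-id set in a durable, shared store instead of process memory.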
From Metered Usage to Revenue
Meter here. Bill anywhere.
Nalpeiron is the metering and rating layer — not a billing system. Once usage is metered and rated, you have two well-trodden paths to revenue.
Your billing platform of choice
Rated line items flow to Stripe, Zuora, NetSuite, Chargebee, or any system you already use. Proration, tax, currency, and reconciliation are handled downstream — we stay billing-agnostic so you never have to rip and replace.
Use then invoice (true-up)
Prefer traditional invoicing? Let customers consume across the period, then issue a true-up invoice based on actual metered usage. A familiar, finance-friendly model for enterprise and committed-use deals.
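As a rough illustration of the true-up path, the sketch below sums a period's rated line items and subtracts whatever the customer has already prepaid. The `true_up_amount` helper, the line-item shape, and the rule that only usage above the commitment is invoiced are assumptions; actual contract terms vary.

```python
from decimal import Decimal, ROUND_HALF_UP


def true_up_amount(rated_line_items, prepaid_commitment=Decimal("0")) -> Decimal:
    """Amount to invoice at period end, rounded to cents.

    Assumes the commitment was already paid, so only usage above it is due.
    With no commitment, this is simply the period's total rated usage.
    """
    total = sum((item["amount"] for item in rated_line_items), Decimal("0"))
    due = max(total - prepaid_commitment, Decimal("0"))
    return due.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical rated usage for one customer over one billing period.
period_items = [
    {"unit": "input_tokens", "amount": Decimal("1200.504")},
    {"unit": "output_tokens", "amount": Decimal("310.25")},
]

print(true_up_amount(period_items))                   # invoiced purely in arrears
print(true_up_amount(period_items, Decimal("1000")))  # against a prepaid commitment
```

Rounding happens once, on the final amount, so sub-cent token charges are not lost line by line.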
Go Deeper
Metering powers every AI pricing model
Accurate metering is the foundation. Here is what you build on top of it.
Meter every token with confidence
See how Nalpeiron captures, attributes, and rates your AI usage in real time — and hands it to the billing system you already use.