Dimension · score weight 35%
Tokenizer Family Fingerprint
What this dimension detects
A deterministic slope test estimates the server-side token accounting tokenizer family. It regresses reported prompt_tokens against repeated probe-unit count k; the slope equals the exact token count of one probe unit under the server tokenizer. The result is compared with js-tiktoken exact o200k_base and cl100k_base counts. Note that o200k_base is the current OpenAI tokenizer for the entire GPT-4o to GPT-5.x flagship series, so a tokenizer-family match deterministically covers the most expensive OpenAI tier. For Claude and Gemini, whose tokenizers are closed-source and cannot be computed locally, differential mode compares the audited endpoint's prompt_tokens slope directly against the trusted reference endpoint's slope — the reference slope is the exact ground-truth tokenizer count for the real model, enabling deterministic detection of substitution without a local tokenizer.
Algorithm
Send repeated-unit prompts for CJK, emoji, rare Unicode, and long ASCII units at k = 0, 1, 2, 3. For each unit, fit prompt_tokens ~ k, require R² ≥ 0.98, and compare the rounded slope with exact o200k_base and cl100k_base counts. A consistent family across all units identifies or excludes the OpenAI tokenizer family; a consistent multiplicative ratio above 1.0 is surfaced as possible usage inflation. In differential mode, the audited endpoint's slope is also compared with the trusted reference endpoint's slope for the same units; a mismatch between the two slopes indicates the audited endpoint is using a different tokenizer than the real model, even when the tokenizer is closed-source (Claude/Gemini).
Thresholds
| Condition | Verdict contribution |
|---|---|
| All units fit one claimed exact family; R² ≥ 0.98 | Scored match for tokenizer family |
| Slope points to a different exact family or non-o200k/cl100k while the claimed model is exact-checkable | Scored mismatch, but only likely evidence unless another scored dimension corroborates |
| Missing units, non-linear counts, or inconsistent families | Unavailable; weight rebalances across scored dimensions |
Limitations
This measures the server-side token accounting tokenizer family, not a direct hidden-model identifier. It usually matches the served model's native tokenizer, but gateways may bill through a normalized tokenizer. A lone mismatch should be treated as likely evidence and needs capability or differential corroboration before a confirmed verdict. Image models (e.g., gpt-image-2) are outside this framework: they have no chat prompt_tokens tokenizer slope, no text capability probes, and MMD is a text-distribution test. Image-model substitution detection requires image statistics and latency fingerprinting, which are currently unsupported.
References
- OpenAI tiktoken encoding tables, github.com/openai/tiktoken
- js-tiktoken exact o200k_base and cl100k_base encoders
- TrueLLMs lib/identity-audit/tokenizer-fingerprint.ts