Glossary

Terms used when auditing LLM proxies, tokenizers, and identity fingerprints.

API gateway: A server that sits in front of an LLM provider and re-exposes the same API surface, often with billing, rate-limiting, or model routing.
ARC-AGI-2: The 2025 successor to ARC-AGI, a visual-reasoning benchmark. Used by TrueLLMs as a tier-1 canary prompt for capability checks.
Baseline: A reference set of responses from a known-good endpoint. Required for the MMD dimension; recorded once, replayed for every audit.
BPE (Byte-Pair Encoding): The tokenization algorithm used by GPT, Claude and most current LLMs. Operates on byte sequences with a learned merge table.
Cache hit: When the proxy returns a previously-computed response without running fresh inference. Detectable by near-zero TTFT and identical text on repeated calls.
Canary prompt: A small, deterministic test prompt where the expected answer is known. Differential failure on canaries reveals capability gaps.
ChatML: OpenAI's chat-format tokenization with role tokens. Specific role tokens leak into top_logprobs and reveal the family.
cl100k_base: The OpenAI tokenizer used for GPT-3.5-turbo and GPT-4. ~100k vocab.
Claude Opus 4.7: Anthropic's frontier model as of May 2026. Successor to Claude Opus 4.5.
Confidence: TrueLLMs' top-line metric, 0–100. Represents the share of available evidence that points to substitution. Capped at 70 when logprobs are unavailable.
DeepSeek V3.2: DeepSeek's frontier MoE model, May 2026 release.
DFT: Discrete Fourier Transform. Used by the ITT dimension to extract spectral features from inter-token gaps.
Direct mode: TrueLLMs' default network mode. Browser talks to the proxy directly. CORS must be enabled by the proxy.
Evidence chain: The expandable list of detection-dimension cards. Every conclusion in TrueLLMs ships with the raw evidence.
Fingerprint: A multi-feature signature that distinguishes one model from another. TrueLLMs maintains fingerprints for tokenizer family, latency, ITT rhythm, and stylometry.
GPT-5: OpenAI's flagship as of May 2026. Tokenizer is o200k_base. Variants: gpt-5, gpt-5-mini, gpt-5-nano.
Hamming kernel: A discrete kernel that counts string differences position-by-position. Used inside MMD for tokenized response prefixes.
HLE: Humanity's Last Exam, a 2025 benchmark of expert-level questions across 100+ domains. Tier-1 canary set.
INP (Interaction to Next Paint): A Core Web Vital. TrueLLMs targets < 200 ms by running tokenization in a Web Worker.
ITT (Inter-Token Times): Time gaps between consecutive streamed SSE chunks. Alhazbi et al. 2025 (arXiv:2502.20589) show these are model-fingerprintable. The TrueLLMs implementation measures arrival times at the server-side reader, not true inter-token times — see the methodology page.
LLMmap: USENIX Security 2025 (Pasquini et al.) active-probing fingerprint technique. The paper trains a deep classifier and reports ~95% vendor identification accuracy across 42 LLM versions. The TrueLLMs implementation is a heuristic approximation and does not claim that accuracy.
Logprob: The log-probability the model assigns to a token given its context, log P(token | context). When returned alongside top_logprobs, becomes a near-fingerprint of the underlying model.
MMD (Maximum Mean Discrepancy): A kernel-based two-sample test. Gao et al. 2025 applied it to LLM endpoints and found that 11 of 31 production endpoints deviated significantly.
Model Equality Testing: The framing of model-substitution detection as a two-sample distribution test. ICLR 2025.
o200k_base: The OpenAI tokenizer used for GPT-4o and GPT-5. ~200k vocab. Distinct enough from cl100k_base to act as a fingerprint.
OpenAI-compatible API: Any endpoint that implements POST /v1/chat/completions with the OpenAI request/response shape. Most aggregator gateways speak this.
Proxy mode: TrueLLMs' fallback network mode. Requests are forwarded by an in-process Next.js route handler. Key never leaves your machine.
Refusal template: Each vendor's recognisable phrasing when declining a prompt. A weak fingerprint, but cheap.
Sparse-token forgetting: Phenomenon where low-frequency tokens drift in the lm_head during SFT — the input embedding is barely updated (so the model still understands the token) but the output projection moves enough to push the token out of the top-p sampling window. Documented by MiniMax (May 2026) on the 嘉祺 / 王郸 / 相続税 cases. The forgotten-token set is vendor-specific because each vendor's SFT data mix is different.
Speculative decoding: Inference acceleration in which a small draft model proposes several tokens that the target model then verifies in a single pass. Produces bimodal inter-token gap distributions.
Stylometry: Sentence-length / Markdown / punctuation feature vector. Coarse but resilient to logprobs being stripped.
tiktoken: OpenAI's open-source BPE encoder. js-tiktoken is the JS port TrueLLMs uses for local re-counting.
Token inflation: When usage.prompt_tokens or usage.completion_tokens exceeds a faithful local recount. Ratios > 1.05 sustained across probes are suspicious.
Token 注水 (Token zhù shuǐ): Chinese term for token-count inflation by a proxy.
TTFT (Time To First Token): Time from request send to first streamed chunk. A coarse model-size signal.
Verdict: TrueLLMs' four-level summary: matches, inconclusive, likely-substituted, confirmed-substituted.
中转站 (Zhōng zhuǎn zhàn): Chinese term for an LLM proxy / aggregator gateway.
模型偷换 (Móxíng tōu huàn): Chinese term for silent model substitution by a proxy.
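
As a rough illustration of the Token inflation entry, the TypeScript sketch below compares the usage a proxy reports with a local recount. The `ProbeCount` shape, the function names, and the reading of "sustained" as a minimum share of probes (`minShare`) are assumptions made for this example. Only the 1.05 threshold comes from the glossary, and this is not presented as TrueLLMs' actual implementation.

```typescript
// Hypothetical shape: one probe's reported usage next to a local recount.
interface ProbeCount {
  reportedTokens: number; // usage.prompt_tokens or usage.completion_tokens from the proxy
  localTokens: number; // recount from a local tokenizer, must be > 0
}

// Ratio of reported tokens to locally recounted tokens for one probe.
function inflationRatio(p: ProbeCount): number {
  if (p.localTokens <= 0) throw new RangeError("localTokens must be positive");
  return p.reportedTokens / p.localTokens;
}

// Assumption: "sustained" means at least `minShare` of the probes exceed the threshold.
// The 0.8 default is illustrative; the 1.05 default is the glossary's figure.
function isInflationSuspicious(
  probes: ProbeCount[],
  threshold = 1.05,
  minShare = 0.8,
): boolean {
  if (probes.length === 0) return false;
  const over = probes.filter((p) => inflationRatio(p) > threshold).length;
  return over / probes.length >= minShare;
}

const probes: ProbeCount[] = [
  { reportedTokens: 112, localTokens: 100 },
  { reportedTokens: 230, localTokens: 200 },
  { reportedTokens: 54, localTokens: 50 },
];
console.log(isInflationSuspicious(probes)); // true
```

Requiring a share of probes rather than a single spike keeps one noisy measurement from tripping the check.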

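The ITT and DFT entries can be sketched together: take chunk arrival timestamps, compute the gaps, and look at the magnitude spectrum of the gap series. This is a minimal sketch under stated assumptions. The function names are hypothetical, the DFT is the naive O(n^2) form rather than an FFT, and mean-centring the series before the transform is a choice made here, not something the glossary specifies.

```typescript
// Gaps between consecutive chunk arrival timestamps, in milliseconds.
function interTokenGaps(arrivalsMs: number[]): number[] {
  const gaps: number[] = [];
  for (let i = 1; i < arrivalsMs.length; i++) {
    gaps.push(arrivalsMs[i] - arrivalsMs[i - 1]);
  }
  return gaps;
}

// Naive DFT magnitude spectrum of the mean-centred series.
// Returns |X[k]| for k = 0 .. floor(n / 2).
function dftMagnitudes(xs: number[]): number[] {
  const n = xs.length;
  if (n === 0) return [];
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const out: number[] = [];
  for (let k = 0; k <= Math.floor(n / 2); k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += (xs[t] - mean) * Math.cos(angle);
      im += (xs[t] - mean) * Math.sin(angle);
    }
    out.push(Math.hypot(re, im));
  }
  return out;
}

// Illustrative timestamps with alternating 10 ms and 30 ms gaps.
const arrivals = [0, 10, 40, 50, 80, 90, 120, 130, 160];
const gaps = interTokenGaps(arrivals);
const spectrum = dftMagnitudes(gaps);
console.log(gaps, spectrum.map((m) => m.toFixed(2)));
```

In this made-up series the alternating pattern shows up as a single dominant peak in the highest frequency bin, which is the kind of spectral feature a rhythm fingerprint could use.
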
Disclaimer · About Interpreting Detection Signals

No single signal can prove malicious behavior. Proxies may show anomalies for legitimate reasons (regional routing, A/B testing, degradation strategies, cache optimization).
Token ratio deviation may result from ChatML wrapping, system prompt injection, or tokenizer version differences — not necessarily intentional inflation.
Model identity judgment is based on statistical fingerprint matching, not cryptographic proof. Quantization, fine-tuning, and post-processing can all alter fingerprints.
MMD distribution tests are sensitive to temperature, sampling parameters, and system prompts. Significant p-values mean distributional difference, not proof of substitution.
Unavailable logprobs are increasingly common (many providers disable them by default in 2025-2026) and do not by themselves indicate deception.
ITT rhythm fingerprinting is an early-stage technique. Network jitter, TCP coalescing, and gateway buffering can produce false signals.
This tool generates reference-grade evidence chains, not legal conclusions. Do not make definitive accusations based solely on this report.
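
To make the MMD caveat concrete, here is a minimal TypeScript sketch of a two-sample test with a Hamming-style kernel and a permutation p-value. Several things are assumptions for illustration: the kernel is written as a similarity (the count of matching positions over a fixed prefix, with short sequences padded), the estimator is the biased V-statistic, and the names `hammingKernel`, `mmd2`, `permutationPValue`, and the seeded `mulberry32` generator are hypothetical helpers rather than anything taken from TrueLLMs or the cited paper.

```typescript
type Tokens = number[];

// Hamming-style similarity: positions (out of prefixLen) where both sequences agree.
function hammingKernel(a: Tokens, b: Tokens, prefixLen: number): number {
  let same = 0;
  for (let i = 0; i < prefixLen; i++) {
    const ta = i < a.length ? a[i] : -1; // -1 pads short sequences
    const tb = i < b.length ? b[i] : -1;
    if (ta === tb) same++;
  }
  return same;
}

// Average kernel value over all pairs drawn from xs and ys.
function meanKernel(xs: Tokens[], ys: Tokens[], prefixLen: number): number {
  let sum = 0;
  for (const x of xs) for (const y of ys) sum += hammingKernel(x, y, prefixLen);
  return sum / (xs.length * ys.length);
}

// Biased (V-statistic) estimate of squared MMD between two samples.
function mmd2(xs: Tokens[], ys: Tokens[], prefixLen: number): number {
  return (
    meanKernel(xs, xs, prefixLen) +
    meanKernel(ys, ys, prefixLen) -
    2 * meanKernel(xs, ys, prefixLen)
  );
}

// Small seeded PRNG so the permutation test is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Permutation p-value: how often a random relabelling is at least as extreme.
function permutationPValue(
  xs: Tokens[],
  ys: Tokens[],
  prefixLen: number,
  rounds: number,
  rand: () => number,
): number {
  const observed = mmd2(xs, ys, prefixLen);
  const pooled = [...xs, ...ys];
  let atLeast = 0;
  for (let r = 0; r < rounds; r++) {
    // Fisher-Yates shuffle of the pooled samples.
    for (let i = pooled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = pooled[i];
      pooled[i] = pooled[j];
      pooled[j] = tmp;
    }
    const stat = mmd2(pooled.slice(0, xs.length), pooled.slice(xs.length), prefixLen);
    if (stat >= observed) atLeast++;
  }
  return (atLeast + 1) / (rounds + 1);
}

// Toy samples: six identical responses per side, completely different between sides.
const baseline: Tokens[] = Array.from({ length: 6 }, () => [1, 2, 3, 4]);
const candidate: Tokens[] = Array.from({ length: 6 }, () => [9, 9, 9, 9]);
const p = permutationPValue(baseline, candidate, 4, 200, mulberry32(42));
console.log(mmd2(baseline, candidate, 4), p);
```

A small p-value here only says the two samples are unlikely to come from the same distribution under the chosen kernel. A changed temperature or an injected system prompt would move it just as a swapped model would.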

The wording in the report refers to statistical "deviations" or "signal inconsistencies". Please do not use the report as a basis for fraud or deception claims against any service provider.