Glossary

Terms used when auditing LLM proxies, tokenizers, and identity fingerprints.

API gateway
A server that sits in front of an LLM provider and re-exposes the same API surface, often with billing, rate-limiting, or model routing.
ARC-AGI-2
The 2025 successor to ARC-AGI, a visual-reasoning benchmark. Used by TrueLLMs as a tier-1 canary prompt for capability checks.
Baseline
A reference set of responses from a known-good endpoint. Required for the MMD dimension; recorded once, replayed for every audit.
BPE (Byte-Pair Encoding)
The tokenization algorithm used by GPT, Claude and most current LLMs. Operates on byte sequences with a learned merge table.
Cache hit
When the proxy returns a previously-computed response without running fresh inference. Detectable by near-zero TTFT and identical text on repeated calls.
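As a rough illustration of the two signals above, the sketch below flags a run of repeated calls as cached when every repeat returns the same text as the first call and arrives with a very small TTFT. The `CallSample` shape, the `looksCached` name, and the 50 ms threshold are assumptions made for this example, not TrueLLMs internals. Identical text alone can also come from deterministic sampling, so the sketch requires both signals.

```typescript
// Hypothetical record of one repeated call to the same endpoint.
interface CallSample {
  ttftMs: number; // time to first streamed chunk, in milliseconds
  text: string;   // full response text
}

// Returns true when every repeat matches the first response verbatim
// and arrives faster than the (assumed) near-zero TTFT threshold.
function looksCached(samples: CallSample[], ttftThresholdMs = 50): boolean {
  const first = samples[0];
  if (first === undefined || samples.length < 2) return false;
  return samples
    .slice(1)
    .every((s) => s.text === first.text && s.ttftMs <= ttftThresholdMs);
}
```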
Canary prompt
A small, deterministic test prompt where the expected answer is known. Differential failure on canaries reveals capability gaps.
ChatML
OpenAI's chat markup format, which wraps each message in special role-delimiter tokens. These tokens can leak into top_logprobs and reveal the model family.
cl100k_base
The OpenAI tokenizer used for GPT-3.5-turbo and GPT-4. ~100k vocab.
Claude Opus 4.7
Anthropic's frontier model as of May 2026. Successor to Claude Opus 4.5.
Confidence
TrueLLMs' top-line metric, 0–100. Represents the share of available evidence that points to substitution. Capped at 70 when logprobs are unavailable.
DeepSeek V3.2
DeepSeek's frontier MoE model, May 2026 release.
DFT
Discrete Fourier Transform. Used by the ITT dimension to extract spectral features from inter-token gaps.
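To make the idea of spectral features concrete, here is a minimal sketch of a naive O(n²) DFT that returns the magnitude of each frequency bin for a series of gaps. The function name `dftMagnitudes` is hypothetical, and which bins (if any) the ITT dimension actually uses is not specified here.

```typescript
// Naive discrete Fourier transform; returns |X[k]| for k = 0..n-1.
function dftMagnitudes(signal: number[]): number[] {
  const n = signal.length;
  const magnitudes: number[] = [];
  for (let k = 0; k < n; k++) {
    let re = 0;
    let im = 0;
    signal.forEach((value, t) => {
      const angle = (-2 * Math.PI * k * t) / n;
      re += value * Math.cos(angle);
      im += value * Math.sin(angle);
    });
    magnitudes.push(Math.hypot(re, im));
  }
  return magnitudes;
}
```

For a gap series that alternates between two values, such as `[10, 30, 10, 30]`, the energy lands in bin 0 (the sum of the gaps) and in the bin for the alternation frequency, while the other bins stay near zero.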
Direct mode
TrueLLMs' default network mode. Browser talks to the proxy directly. CORS must be enabled by the proxy.
Evidence chain
The expandable list of detection-dimension cards. Every conclusion in TrueLLMs ships with the raw evidence.
Fingerprint
A multi-feature signature that distinguishes one model from another. TrueLLMs maintains fingerprints for tokenizer family, latency, ITT rhythm, and stylometry.
GPT-5
OpenAI's flagship as of May 2026. Tokenizer is o200k_base. Variants: gpt-5, gpt-5-mini, gpt-5-nano.
Hamming kernel
A discrete kernel that counts string differences position-by-position. Used inside MMD for tokenized response prefixes.
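A minimal sketch of one common form of this kernel follows: it compares two token-id sequences position by position over a fixed prefix and returns the fraction of positions that agree. The name `hammingKernel`, the normalisation to the range 0 to 1, and the treatment of missing positions are assumptions made for illustration.

```typescript
// Fraction of the first `prefixLen` positions at which the two
// token-id sequences agree (1 = identical prefix, 0 = no agreement).
function hammingKernel(a: number[], b: number[], prefixLen: number): number {
  if (prefixLen <= 0) throw new RangeError("prefixLen must be positive");
  let mismatches = 0;
  for (let i = 0; i < prefixLen; i++) {
    // If only one sequence is too short, the position counts as a mismatch;
    // if both are too short, both sides are undefined and count as a match.
    if (a[i] !== b[i]) mismatches++;
  }
  return 1 - mismatches / prefixLen;
}
```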
HLE
Humanity's Last Exam, a 2025 benchmark of expert-level questions across 100+ domains. Tier-1 canary set.
INP (Interaction to Next Paint)
A Core Web Vital. TrueLLMs targets < 200 ms by running tokenization in a Web Worker.
ITT (Inter-Token Times)
Time gaps between consecutive streamed SSE chunks. Alhazbi et al. 2025 (arXiv:2502.20589) show these are model-fingerprintable. The TrueLLMs implementation measures arrival times at the server-side reader, not true inter-token times — see the methodology page.
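As a small illustration, the sketch below turns a list of chunk arrival timestamps into gaps and summarises them. The helper names `interChunkGaps` and `summarizeGaps` are hypothetical, and the timestamps are assumed to be in milliseconds and already sorted in arrival order.

```typescript
// Differences between consecutive chunk arrival times.
function interChunkGaps(arrivalsMs: number[]): number[] {
  const gaps: number[] = [];
  let previous: number | undefined;
  for (const t of arrivalsMs) {
    if (previous !== undefined) gaps.push(t - previous);
    previous = t;
  }
  return gaps;
}

// Mean and population standard deviation of the gaps.
function summarizeGaps(gaps: number[]): { mean: number; stdDev: number } {
  if (gaps.length === 0) return { mean: 0, stdDev: 0 };
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance =
    gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return { mean, stdDev: Math.sqrt(variance) };
}
```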
LLMmap
An active-probing fingerprinting technique from Pasquini et al. (USENIX Security 2025). The paper trains a deep classifier and reports ~95% vendor identification accuracy across 42 LLM versions. The TrueLLMs implementation is a heuristic approximation and does not claim that accuracy.
Logprob
The natural logarithm of the probability the model assigned to a token, log P(token). When returned alongside top_logprobs, it becomes a near-fingerprint of the underlying model.
MMD (Maximum Mean Discrepancy)
A kernel-based two-sample test. Gao et al. (2025) applied it to LLM endpoints and found that 11 of 31 production endpoints deviated significantly.
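For readers who want to see the statistic itself, here is a minimal sketch of the biased (V-statistic) estimate of squared MMD between two samples, written generically over any kernel. The names `mmdSquared`, `meanKernel`, and `exactMatch` are hypothetical, and the exact-match kernel is only a stand-in for whatever kernel a real audit would use. A full test would also need a significance threshold, for example from a permutation test, which is omitted here.

```typescript
type Kernel<T> = (x: T, y: T) => number;

// Mean of k(x, y) over all pairs drawn from xs and ys.
function meanKernel<T>(xs: T[], ys: T[], k: Kernel<T>): number {
  let sum = 0;
  for (const x of xs) for (const y of ys) sum += k(x, y);
  return sum / (xs.length * ys.length);
}

// Biased estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
function mmdSquared<T>(xs: T[], ys: T[], k: Kernel<T>): number {
  if (xs.length === 0 || ys.length === 0) {
    throw new RangeError("both samples must be non-empty");
  }
  return meanKernel(xs, xs, k) + meanKernel(ys, ys, k) - 2 * meanKernel(xs, ys, k);
}

// Illustrative kernel: 1 when the two responses are identical, else 0.
const exactMatch: Kernel<string> = (x, y) => (x === y ? 1 : 0);
```

With this kernel, two samples with the same empirical distribution give 0, and two samples with no responses in common give the largest value.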
Model Equality Testing
The framing of model-substitution detection as a two-sample distribution test. ICLR 2025.
o200k_base
The OpenAI tokenizer used for GPT-4o and GPT-5. ~200k vocab. Distinct enough from cl100k_base to act as a fingerprint.
OpenAI-compatible API
Any endpoint that implements POST /v1/chat/completions with the OpenAI request/response shape. Most aggregator gateways speak this.
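The request shape can be sketched as a small typed builder. The field names (`model`, `messages`, `stream`, `logprobs`, `top_logprobs`, `temperature`) follow the OpenAI chat completions API; the `buildProbeRequest` helper, the base URL, and the chosen values are assumptions for illustration, and a given gateway may reject or ignore some of these fields.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
  logprobs?: boolean;
  top_logprobs?: number;
  temperature?: number;
}

// Hypothetical helper: builds the URL and JSON body for one probe.
function buildProbeRequest(
  baseUrl: string,
  model: string,
  prompt: string,
): { url: string; body: ChatCompletionRequest } {
  // Strip trailing slashes so the path is not doubled.
  const url = baseUrl.replace(/\/+$/, "") + "/v1/chat/completions";
  return {
    url,
    body: {
      model,
      messages: [{ role: "user", content: prompt }],
      stream: true,
      logprobs: true,
      top_logprobs: 5,
      temperature: 0,
    },
  };
}
```

The body would be sent as JSON in a POST request with an `Authorization: Bearer` header carrying the API key.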
Proxy mode
TrueLLMs' fallback network mode. Requests are forwarded by an in-process Next.js route handler. Key never leaves your machine.
Refusal template
Each vendor's recognisable phrasing when declining a prompt. A weak fingerprint, but cheap.
Sparse-token forgetting
Phenomenon where low-frequency tokens drift in the lm_head during SFT — the input embedding is barely updated (so the model still understands the token) but the output projection moves enough to push the token out of the top-p sampling window. Documented by MiniMax (May 2026) on the 嘉祺 / 王郸 / 相続税 cases. The forgotten-token set is vendor-specific because each vendor's SFT data mix is different.
Speculative decoding
Inference acceleration in which a small draft model proposes several tokens ahead and the target model verifies them in a single parallel pass. Produces bimodal inter-token gap distributions.
Stylometry
Sentence-length / Markdown / punctuation feature vector. Coarse but resilient to logprobs being stripped.
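One possible shape for such a feature vector is sketched below, with one feature per family named above. The feature names, the regular expressions, and the `stylometry` function are assumptions chosen for this example; the actual TrueLLMs feature set may differ.

```typescript
interface StyleFeatures {
  meanSentenceWords: number; // average words per sentence
  markdownLineShare: number; // share of non-empty lines that start with a Markdown marker
  punctuationRate: number;   // punctuation characters per character of text
}

function stylometry(text: string): StyleFeatures {
  // Split on sentence-ending punctuation; drop empty fragments.
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const totalWords = sentences.reduce(
    (sum, s) => sum + s.split(/\s+/).filter((w) => w.length > 0).length,
    0,
  );

  // Headings, bullets, block quotes, and numbered list items.
  const lines = text.split("\n").filter((l) => l.trim().length > 0);
  const markdownLines = lines.filter((l) =>
    /^\s*(#{1,6}\s|[-*>]\s|\d+\.\s)/.test(l),
  ).length;

  const punctuation = (text.match(/[,;:!?.]/g) ?? []).length;

  return {
    meanSentenceWords: totalWords / Math.max(1, sentences.length),
    markdownLineShare: markdownLines / Math.max(1, lines.length),
    punctuationRate: punctuation / Math.max(1, text.length),
  };
}
```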
tiktoken
OpenAI's open-source BPE encoder. js-tiktoken is the JS port TrueLLMs uses for local re-counting.
Token inflation
When usage.prompt_tokens or usage.completion_tokens exceeds a faithful local recount. Ratios > 1.05 sustained across probes are suspicious.
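The ratio check can be sketched as follows. The `ProbeUsage` shape and function names are hypothetical, the local count is assumed to come from a separate recount (for example with js-tiktoken), and "sustained" is interpreted here as "every probe exceeds the threshold", which is one reading among several.

```typescript
// Hypothetical per-probe record: what the proxy billed vs. a local recount.
interface ProbeUsage {
  reportedTokens: number; // usage.prompt_tokens or usage.completion_tokens
  localTokens: number;    // count from a faithful local tokenizer
}

function inflationRatio(probe: ProbeUsage): number {
  if (probe.localTokens <= 0) {
    throw new RangeError("localTokens must be positive");
  }
  return probe.reportedTokens / probe.localTokens;
}

// Flags a run of probes only when every one exceeds the threshold.
function isInflationSuspicious(probes: ProbeUsage[], threshold = 1.05): boolean {
  return probes.length > 0 && probes.every((p) => inflationRatio(p) > threshold);
}
```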
Token 注水 (Token zhù shuǐ)
Chinese term for token-count inflation by a proxy.
TTFT (Time To First Token)
Time from request send to first streamed chunk. A coarse model-size signal.
Verdict
TrueLLMs' four-level summary: matches, inconclusive, likely-substituted, confirmed-substituted.
中转站 (Zhōng zhuǎn zhàn)
Chinese term for an LLM proxy / aggregator gateway.
模型偷换 (Móxíng tōu huàn)
Chinese term for silent model substitution by a proxy.