If a model claims a 200k context window but the proxy errors at 32k, something in the routing or the middleware differs from what is advertised. There are legitimate explanations (gateway cost caps, safety truncation, regional limits), so this dimension only flags the mismatch; it does not assert substitution.
Algorithm
Send a graduated context probe: 4k, 16k, 64k, and 200k tokens of stable filler, each with a needle (a short, unique fact the model is asked to repeat back) near the end. Confirm that the needle is recovered at each step.
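The probe ladder can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: `build_probe`, `run_ladder`, the `send` callable, and the filler word are all hypothetical names, and token counts are approximated as one token per filler word, which a real probe would replace with the target model's tokenizer.

```python
from typing import Callable, List, Sequence, Tuple

SIZES = (4_000, 16_000, 64_000, 200_000)  # probe ladder from the text
FILLER_WORD = "lorem"  # assumption: roughly one token per repeated word


def build_probe(n_tokens: int, needle: str, position: float = 0.95) -> str:
    """Return about n_tokens words of deterministic filler with the needle near the end."""
    question = "What is the secret code mentioned above? Reply with the code only."
    words = [FILLER_WORD] * n_tokens
    cut = int(n_tokens * position)  # needle lands 95% of the way through by default
    body = " ".join(words[:cut]) + "\n" + needle + "\n" + " ".join(words[cut:])
    return body + "\n\n" + question


def run_ladder(
    send: Callable[[str], str],
    secret: str,
    sizes: Sequence[int] = SIZES,
) -> List[Tuple[int, bool]]:
    """Send one probe per size and record whether the secret came back.

    `send` is a caller-supplied function (hypothetical) that submits a prompt
    to the proxy and returns the reply text; any exception counts as a loss.
    """
    needle = f"The secret code is {secret}."
    results = []
    for size in sizes:
        try:
            reply = send(build_probe(size, needle))
            recovered = secret in reply
        except Exception:
            recovered = False  # an error at this size means the needle was not recovered
        results.append((size, recovered))
    return results
```

Treating a transport or API error the same as a lost needle keeps the result a simple per-step boolean; a fuller implementation would probably record the error separately so that rate limits are not mistaken for a context cap.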
Thresholds
| Condition | Verdict contribution |
| --- | --- |
| Needle recovered at advertised window | Match |
| Needle lost before advertised window | Mismatch |
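One way to read the table as code is sketched below. The function name `context_verdict` and the input shape (a list of `(probe_size, needle_recovered)` pairs) are assumptions for illustration. The sketch takes the stricter interpretation that every step up to and including the advertised window must recover the needle for a Match, and it refuses to give a verdict if the advertised window was never probed, a case the table does not cover.

```python
from typing import List, Tuple


def context_verdict(results: List[Tuple[int, bool]], advertised: int) -> str:
    """Map per-step probe results to the table's verdict contribution."""
    # Only steps at or below the advertised window are relevant.
    relevant = [(size, ok) for size, ok in results if size <= advertised]
    if not any(size == advertised for size, _ in relevant):
        # Assumption: without a probe at the advertised size, no verdict is given.
        raise ValueError("no probe step at the advertised window")
    return "Match" if all(ok for _, ok in relevant) else "Mismatch"
```

For example, a ladder that succeeds at 4k and 16k but fails at 64k and 200k against an advertised 200k window yields a Mismatch, which is a flag for review rather than a finding of substitution.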
Limitations
The probe is expensive (the largest step alone consumes ≥ 200k input tokens), so it is off by default. Enable it for the Deep preset only.
References
Long-context evaluation in Liu et al., 'Lost in the Middle', 2023
Any single signal cannot prove malicious behavior. Proxies may show anomalies for legitimate reasons (regional routing, A/B testing, degradation strategies, cache optimization).
Token ratio deviation may result from ChatML wrapping, system prompt injection, or tokenizer version differences — not necessarily intentional inflation.
Model identity judgment is based on statistical fingerprint matching, not cryptographic proof. Quantization, fine-tuning, and post-processing can all alter fingerprints.
MMD distribution tests are sensitive to temperature, sampling parameters, and system prompts. A significant p-value indicates a distributional difference, not proof of substitution.
Logprobs unavailability is increasingly common (many providers disable it by default in 2025-2026) and does not by itself indicate deception.
ITT rhythm fingerprinting is an early-stage technique. Network jitter, TCP coalescing, and gateway buffering can produce false signals.
This tool generates reference-grade evidence chains, not legal conclusions. Do not make definitive accusations based solely on this report.
The wording in the report refers to statistical "deviations" or "signal inconsistencies". Please do not use this to make fraud or deception claims against any service provider.