At temperature > 0, repeated requests should normally show some response diversity. Repeated byte-identical responses suggest cache replay, deterministic sampling override, or another freshness problem.
Algorithm
Group repeated cache probes by testCaseId. For each group with at least two samples, compare the response texts both for exact equality and by a simple character-similarity score. Exact-duplicate groups and near-duplicate groups are counted separately, and the result is scored as cache replay only when duplicate behavior appears in at least half of the repeated groups (see Thresholds).
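A minimal Python sketch of the grouping and comparison step follows. Several details are assumptions rather than the tool's confirmed implementation: each probe is a dict with `testCaseId` and a hypothetical `response` field, `difflib.SequenceMatcher.ratio()` stands in for the unspecified "simple character similarity", and a group is flagged when any pair of its responses matches.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# "> 95% similarity" from the Thresholds table.
NEAR_DUP_THRESHOLD = 0.95


def classify_groups(probes):
    """Group probes by testCaseId and count exact / near-duplicate groups.

    Assumption: each probe is a dict with "testCaseId" and a hypothetical
    "response" key. A group is an exact duplicate if any two responses are
    identical; otherwise it is a near duplicate if any pair's
    SequenceMatcher ratio exceeds NEAR_DUP_THRESHOLD.
    """
    groups = defaultdict(list)
    for probe in probes:
        groups[probe["testCaseId"]].append(probe["response"])

    repeated = exact = near = 0
    for responses in groups.values():
        if len(responses) < 2:
            continue  # a single sample cannot be compared with anything
        repeated += 1
        pairs = list(combinations(responses, 2))
        if any(a == b for a, b in pairs):
            exact += 1
        elif any(
            SequenceMatcher(None, a, b).ratio() > NEAR_DUP_THRESHOLD
            for a, b in pairs
        ):
            near += 1
    return {"repeated": repeated, "exact": exact, "near": near}
```

Groups with a single sample are skipped entirely, so they count neither toward the repeated-group denominator nor toward either duplicate count. Pairwise comparison is quadratic in group size, which is acceptable for the handful of repeats a probe typically issues per test case.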
Thresholds
Condition | Verdict contribution
Exact duplicate responses in ≥ 50% of repeated groups | Scored cache-replay mismatch
Near-duplicate responses (> 95% similarity) in ≥ 50% of repeated groups | Possible cache or sampling override
Repeated groups show reasonable diversity | No cache replay observed
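The thresholds can be expressed as a small decision function. This sketch rests on assumptions the source does not confirm: the exact-duplicate rule takes precedence, byte-identical groups also count toward the near-duplicate rule (identical text trivially exceeds 95% similarity), and a hypothetical "insufficient data" result is returned when no group has at least two samples.

```python
def cache_replay_verdict(repeated, exact, near):
    """Map group counts to a verdict row of the Thresholds table.

    repeated: number of testCaseId groups with at least two samples
    exact:    groups containing byte-identical responses
    near:     groups that are near duplicates but not exact duplicates
    """
    if repeated == 0:
        # Hypothetical label: with no repeated groups there is nothing to score.
        return "insufficient data"
    if exact / repeated >= 0.5:
        return "Scored cache-replay mismatch"
    # Assumption: exact-duplicate groups also satisfy the > 95% similarity rule.
    if (exact + near) / repeated >= 0.5:
        return "Possible cache or sampling override"
    return "No cache replay observed"
```

For example, four repeated groups with one exact and one near duplicate fall below the exact-duplicate threshold (25%) but reach the near-duplicate one (50%), so the sketch reports a possible cache or sampling override rather than a scored mismatch.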
Limitations
Short prompts, temperature floors, provider-side deterministic decoding, or explicit seed controls can produce repeated text without dishonest caching. This dimension flags freshness risk; it does not identify which model served the response.
No single signal can prove malicious behavior. Proxies may show anomalies for legitimate reasons (regional routing, A/B testing, degradation strategies, cache optimization).
Token ratio deviation may result from ChatML wrapping, system prompt injection, or tokenizer version differences; it does not necessarily indicate intentional inflation.
Model identity judgment is based on statistical fingerprint matching, not cryptographic proof. Quantization, fine-tuning, and post-processing can all alter fingerprints.
MMD distribution tests are sensitive to temperature, sampling parameters, and system prompts. A significant p-value indicates a distributional difference, not proof of substitution.
Logprobs unavailability is increasingly common (many providers disable it by default in 2025-2026) and does not by itself indicate deception.
ITT rhythm fingerprinting is an early-stage technique. Network jitter, TCP coalescing, and gateway buffering can produce false signals.
This tool generates reference-grade evidence chains, not legal conclusions. Do not make definitive accusations based solely on this report.
The report describes its findings as statistical "deviations" or "signal inconsistencies". Do not use these findings to make fraud or deception claims against any service provider.