Dimension · score weight 15%

Cache Replay Detection

What this dimension detects

At temperature > 0, repeated requests should normally show some response diversity. Repeated byte-identical responses suggest cache replay, deterministic sampling override, or another freshness problem.

Algorithm

Group repeated cache probes by testCaseId. For each group with at least two samples, compare response text exactly and by simple character similarity. Exact duplicate groups and near-duplicate groups are counted separately; the result is scored as cache replay only when duplicate behavior is common across groups.

Thresholds

Condition	Verdict contribution
Exact duplicate responses in ≥ 50% of repeated groups	Scored cache-replay mismatch
Near-duplicate responses (> 95% similarity) in ≥ 50% of repeated groups	Possible cache or sampling override
Repeated groups show reasonable diversity	No cache replay observed

Limitations

Short prompts, temperature floors, provider-side deterministic decoding, or explicit seed controls can produce repeated text without dishonest caching. This dimension flags freshness risk; it does not identify which model served the response.

References

TrueLLMs lib/identity-audit/index.ts detectCacheHit

Back to the full methodology