Dimension · score weight 15%

Cache Replay Detection

What this dimension detects

At temperature > 0, repeated requests should normally show some response diversity. Repeated byte-identical responses suggest cache replay, deterministic sampling override, or another freshness problem.

Algorithm

Group repeated cache probes by testCaseId. For each group with at least two samples, compare the response texts both for exact equality and by simple character similarity. Exact-duplicate groups and near-duplicate groups are counted separately. The dimension scores a cache-replay mismatch only when exact duplicates appear in at least 50% of repeated groups; near-duplicates at the same rate are reported as a possible cache or sampling override.
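
The grouping and per-group comparison described above could be sketched roughly as follows. This is an illustrative sketch, not the code in detectCacheHit: the CacheProbe shape, the function names, the position-wise similarity metric, and the rule that a group is judged by its least similar pair are all assumptions made for the example. Only testCaseId, the two-sample minimum, and the 95% similarity cut-off come from this page.

```typescript
// Hypothetical probe shape; only testCaseId is named on this page.
interface CacheProbe {
  testCaseId: string;
  responseText: string;
}

type GroupKind = "exact" | "near" | "diverse";

// Assumed metric: share of positions holding the same character,
// measured against the longer of the two strings.
function charSimilarity(a: string, b: string): number {
  const longer = Math.max(a.length, b.length);
  if (longer === 0) return 1;
  const shorter = Math.min(a.length, b.length);
  let same = 0;
  for (let i = 0; i < shorter; i++) {
    if (a.charAt(i) === b.charAt(i)) same++;
  }
  return same / longer;
}

// Collect response texts per testCaseId.
function groupByTestCase(probes: CacheProbe[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  probes.forEach((p) => {
    const texts = groups.get(p.testCaseId) ?? [];
    texts.push(p.responseText);
    groups.set(p.testCaseId, texts);
  });
  return groups;
}

// Assumption: "exact" means every sample is byte-identical; "near" means
// the least similar pair still exceeds the threshold.
function classifyGroup(texts: string[], nearThreshold = 0.95): GroupKind {
  let allEqual = true;
  let lowest = 1;
  texts.forEach((a, i) => {
    texts.slice(i + 1).forEach((b) => {
      if (a !== b) allEqual = false;
      lowest = Math.min(lowest, charSimilarity(a, b));
    });
  });
  if (allEqual) return "exact";
  return lowest > nearThreshold ? "near" : "diverse";
}

// Classify every group that has at least two samples.
function classifyProbes(probes: CacheProbe[]): Map<string, GroupKind> {
  const result = new Map<string, GroupKind>();
  groupByTestCase(probes).forEach((texts, id) => {
    if (texts.length < 2) return; // single samples cannot show replay
    result.set(id, classifyGroup(texts));
  });
  return result;
}
```

A position-wise comparison is cheap but penalizes a single inserted character heavily; an edit-distance metric would be more forgiving if that matters for the probes in use.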

Thresholds

Condition | Verdict contribution
Exact duplicate responses in ≥ 50% of repeated groups | Scored cache-replay mismatch
Near-duplicate responses (> 95% similarity) in ≥ 50% of repeated groups | Possible cache or sampling override
Repeated groups show reasonable diversity | No cache replay observed
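
As a rough illustration of how these thresholds might map per-group results to a verdict, consider the sketch below. The type names, verdict labels, and scoreCacheReplay function are hypothetical. The sketch also assumes that the exact-duplicate rule is checked first, that near-duplicate groups are counted without including exact ones, and that an empty input yields no finding; the page does not state any of these explicitly.

```typescript
// Hypothetical labels for the per-group outcome and the final verdict.
type ReplayGroupKind = "exact" | "near" | "diverse";
type ReplayVerdict =
  | "cache-replay-mismatch"
  | "possible-override"
  | "no-replay-observed";

// Share of groups (0..1) that must be duplicates, taken from the table above.
const GROUP_SHARE_THRESHOLD = 0.5;

function scoreCacheReplay(groupKinds: ReplayGroupKind[]): ReplayVerdict {
  // Assumption: with no repeated groups there is nothing to flag.
  if (groupKinds.length === 0) return "no-replay-observed";

  const exactShare =
    groupKinds.filter((k) => k === "exact").length / groupKinds.length;
  const nearShare =
    groupKinds.filter((k) => k === "near").length / groupKinds.length;

  // Assumption: the stronger finding (exact duplicates) takes precedence.
  if (exactShare >= GROUP_SHARE_THRESHOLD) return "cache-replay-mismatch";
  if (nearShare >= GROUP_SHARE_THRESHOLD) return "possible-override";
  return "no-replay-observed";
}
```

For example, two exact-duplicate groups out of four sit exactly on the 50% boundary and would be scored as a mismatch under this reading, since the threshold is inclusive.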

Limitations

Short prompts, temperature floors, provider-side deterministic decoding, or explicit seed controls can produce repeated text without dishonest caching. This dimension flags freshness risk; it does not identify which model served the response.

References

  • TrueLLMs lib/identity-audit/index.ts detectCacheHit
