
Detection dimension · weight 8%

Cache Hit Detection

What this dimension detects

Cached prefix replays return identical text with a time to first token (TTFT) close to zero and near-zero variance in inter-token times. A proxy that caches aggressively is not strictly substituting models, but it is not running fresh inference either.
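The two timing signals named above can be computed from token arrival timestamps. The following is a minimal sketch, not the TrueLLMs implementation: `StreamTiming` and `summariseTiming` are hypothetical names, timestamps are assumed to be in milliseconds, and the variance is the population variance of the gaps between consecutive tokens. The source gives no cut-off for "near zero", so none is applied here.

```typescript
// Hypothetical summary of one streamed response (names are illustrative).
interface StreamTiming {
  ttftMs: number;                // time from request start to first token
  interTokenMeanMs: number;      // mean gap between consecutive tokens
  interTokenVarianceMs2: number; // population variance of those gaps
}

function summariseTiming(requestStartMs: number, tokenArrivalsMs: number[]): StreamTiming {
  if (tokenArrivalsMs.length === 0) {
    throw new Error("no tokens received");
  }
  const ttftMs = tokenArrivalsMs[0] - requestStartMs;

  // Gaps between each token and the one before it.
  const gaps: number[] = [];
  for (let i = 1; i < tokenArrivalsMs.length; i++) {
    gaps.push(tokenArrivalsMs[i] - tokenArrivalsMs[i - 1]);
  }
  if (gaps.length === 0) {
    // A single token has no inter-token gaps to measure.
    return { ttftMs, interTokenMeanMs: 0, interTokenVarianceMs2: 0 };
  }

  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance =
    gaps.reduce((sum, g) => sum + (g - mean) * (g - mean), 0) / gaps.length;
  return { ttftMs, interTokenMeanMs: mean, interTokenVarianceMs2: variance };
}
```

A replayed response would be expected to show a small `ttftMs` and an `interTokenVarianceMs2` close to zero, while fresh inference typically shows a larger TTFT and more uneven gaps.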

Algorithm

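Send the same prompt twice with temperature > 0, so that fresh inference would normally sample different text on each call. If the two responses are identical and TTFT on the second call is at least 50% lower than on the first, mark the endpoint as cached.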
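The probe step can be sketched as follows. This is an illustration under stated assumptions rather than the code behind `detectCacheHit`: `StreamFn` stands in for whatever streaming client is in use, `timedCall` and `probeTwice` are hypothetical names, the default temperature of 0.7 is arbitrary, and `Date.now()` is used for simplicity even though it only has millisecond resolution.

```typescript
// Hypothetical streaming client: yields response tokens for a prompt.
type StreamFn = (prompt: string, temperature: number) => AsyncIterable<string>;

interface TimedResponse {
  text: string;   // concatenated response text
  ttftMs: number; // time to first token, in milliseconds
}

// Consume one streamed response, recording when the first token arrives.
async function timedCall(
  stream: StreamFn,
  prompt: string,
  temperature: number
): Promise<TimedResponse> {
  const start = Date.now();
  let ttftMs = -1;
  let text = "";
  for await (const token of stream(prompt, temperature)) {
    if (ttftMs < 0) ttftMs = Date.now() - start;
    text += token;
  }
  // Assumption: if the stream produced no tokens, report the total elapsed time.
  if (ttftMs < 0) ttftMs = Date.now() - start;
  return { text, ttftMs };
}

// Send the same prompt twice, sequentially, at a non-zero temperature.
async function probeTwice(
  stream: StreamFn,
  prompt: string,
  temperature: number = 0.7
): Promise<[TimedResponse, TimedResponse]> {
  if (temperature <= 0) {
    throw new Error("temperature must be > 0 for this probe");
  }
  const first = await timedCall(stream, prompt, temperature);
  const second = await timedCall(stream, prompt, temperature);
  return [first, second];
}
```

The calls run one after the other on purpose: the second call can only be served from a cache that the first call has already populated.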

Thresholds

| Condition | Verdict contribution |
| --- | --- |
| Identical text + TTFT drop ≥ 50% | Cache hit |
| Identical text + TTFT drop 20–50% | Possible cache |
| Otherwise | No cache observed |
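The thresholds can be expressed as a small classification function. The sketch below makes several assumptions that the source does not state: the TTFT drop is taken as the relative change `(first - second) / first`, a drop of exactly 50% counts as a cache hit, and a drop of exactly 20% counts as a possible cache. `Verdict`, `ProbeResult`, and `classifyCacheProbe` are hypothetical names, not identifiers from TrueLLMs.

```typescript
// Hypothetical verdict labels mirroring the three threshold rows.
type Verdict = "cache_hit" | "possible_cache" | "no_cache_observed";

interface ProbeResult {
  text: string;   // full response text
  ttftMs: number; // time to first token, in milliseconds
}

// Relative TTFT drop: 0.5 means the second call reached its first token
// in half the time of the first call. Negative if the second call was slower.
function ttftDrop(first: ProbeResult, second: ProbeResult): number {
  if (first.ttftMs <= 0) return 0; // no usable baseline
  return (first.ttftMs - second.ttftMs) / first.ttftMs;
}

function classifyCacheProbe(first: ProbeResult, second: ProbeResult): Verdict {
  // Differing text rules out a replay regardless of timing.
  if (first.text !== second.text) return "no_cache_observed";
  const drop = ttftDrop(first, second);
  if (drop >= 0.5) return "cache_hit";      // assumption: 50% itself is a hit
  if (drop >= 0.2) return "possible_cache"; // assumption: 20% itself qualifies
  return "no_cache_observed";
}
```

For example, identical text with TTFT falling from 400 ms to 100 ms is a 75% drop and classifies as `cache_hit`, while the same text with TTFT falling from 400 ms to 280 ms is a 30% drop and classifies as `possible_cache`.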

Limitations

Identical text across two calls at temperature > 0 does not prove caching: short, deterministic prompts can produce stable outputs even under fresh inference, so this check can yield false positives.

References

  • TrueLLMs lib/identity-audit/index.ts detectCacheHit
