
Detection dimension · weight 2%

Latency Distribution

What this dimension detects

Time to first token (TTFT) and tokens-per-second (tok/s) distributions correlate with model size. A proxy claiming GPT-5 that consistently responds in 200 ms at 180 tok/s is not running GPT-5; timings that fast point to a smaller model.

Algorithm

Collect TTFT and tok/s across all probes, then compare them to the baseline ranges in lib/fingerprints/latency.
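The collection and comparison step could be sketched roughly as follows. This is a minimal illustration, not the code in lib/fingerprints/latency: the `ProbeTiming` and `Baseline` shapes, the use of the median as the summary statistic, and the reduction of each baseline range to a single reference value are all assumptions, and the numbers are made up.

```typescript
// Hypothetical shapes; the real types in lib/fingerprints/latency may differ.
interface ProbeTiming {
  ttftMs: number;       // time to first token, in milliseconds
  tokensPerSec: number; // decode throughput observed for the probe
}

interface Baseline {
  ttftMs: number;       // assumed single reference value per metric
  tokensPerSec: number;
}

// Median is an assumed choice: it is less sensitive to outlier probes than the mean.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Signed relative deviation: +0.25 means 25% above baseline.
function relativeDeviation(observed: number, baseline: number): number {
  return (observed - baseline) / baseline;
}

function compareToBaseline(probes: ProbeTiming[], baseline: Baseline) {
  const ttft = median(probes.map((p) => p.ttftMs));
  const tps = median(probes.map((p) => p.tokensPerSec));
  return {
    ttftDeviation: relativeDeviation(ttft, baseline.ttftMs),
    tpsDeviation: relativeDeviation(tps, baseline.tokensPerSec),
  };
}

// Illustrative numbers only, not real baselines.
const probes: ProbeTiming[] = [
  { ttftMs: 900, tokensPerSec: 60 },
  { ttftMs: 1100, tokensPerSec: 50 },
  { ttftMs: 1000, tokensPerSec: 55 },
];
const result = compareToBaseline(probes, { ttftMs: 800, tokensPerSec: 50 });
console.log(result); // TTFT is about 25% above baseline, tok/s about 10% above
```

Keeping the deviation signed preserves whether the endpoint is faster or slower than expected, which may matter when interpreting the result even though the thresholds only look at magnitude.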

Thresholds

Condition                  Verdict contribution
Within ±30% of baseline    Match
±30–60% off                Borderline
> ±60% off                 Mismatch
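As a rough sketch, the thresholds above map a relative deviation to a verdict like this. The function names are hypothetical, and the handling of values that land exactly on 30% or 60% is an assumption, since the table does not say which side those boundaries belong to.

```typescript
type Verdict = "match" | "borderline" | "mismatch";

// Signed relative deviation, e.g. -0.45 means 45% below baseline.
function deviationFrom(observed: number, baseline: number): number {
  return (observed - baseline) / baseline;
}

// Assumption: boundaries are inclusive on the more lenient side
// (exactly 30% counts as match, exactly 60% counts as borderline).
function classifyDeviation(relDeviation: number): Verdict {
  const magnitude = Math.abs(relDeviation);
  if (magnitude <= 0.3) return "match";
  if (magnitude <= 0.6) return "borderline";
  return "mismatch";
}

// Illustrative values only.
console.log(classifyDeviation(deviationFrom(60, 50)));    // "match" (20% above)
console.log(classifyDeviation(deviationFrom(75, 50)));    // "borderline" (50% above)
console.log(classifyDeviation(deviationFrom(200, 1000))); // "mismatch" (80% below)
```

Using the absolute value treats faster-than-baseline and slower-than-baseline symmetrically, which is how the ± notation in the table reads.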

Limitations

At small sample sizes (small N), network conditions and provider load dominate the measurements. Inter-token time (ITT) is a sharper signal when streaming is available.
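One way to see why the streaming signal can be sharper, assuming ITT here means the gap between consecutive streamed tokens: a constant network delay shifts every arrival time equally, so it cancels out of the gaps, whereas it adds directly to TTFT. The sketch below uses hypothetical function names and made-up timestamps, and assumes one token per streamed chunk.

```typescript
// Gaps between consecutive token arrival timestamps, in milliseconds.
function interTokenGapsMs(arrivalsMs: number[]): number[] {
  const gaps: number[] = [];
  for (let i = 1; i < arrivalsMs.length; i++) {
    gaps.push(arrivalsMs[i] - arrivalsMs[i - 1]);
  }
  return gaps;
}

// Median gap; requires at least two arrivals to produce one gap.
function medianGapMs(arrivalsMs: number[]): number {
  const gaps = interTokenGapsMs(arrivalsMs).sort((a, b) => a - b);
  if (gaps.length === 0) throw new Error("need at least two token arrivals");
  const mid = Math.floor(gaps.length / 2);
  return gaps.length % 2 === 1 ? gaps[mid] : (gaps[mid - 1] + gaps[mid]) / 2;
}

// Illustrative timestamps: the same stream seen with and without
// a fixed 300 ms network delay added to every arrival.
const local = [100, 120, 145, 160];
const delayed = local.map((t) => t + 300);

console.log(local[0], delayed[0]);                      // 100 400 (TTFT shifts)
console.log(medianGapMs(local), medianGapMs(delayed));  // 20 20 (gaps do not)
```

Variable network jitter and provider batching still affect the gaps, so this reduces the noise rather than removing it.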

References

  • TrueLLMs lib/fingerprints/latency.ts
