Detection dimension · weight 8%

Inter-Token Rhythm Fingerprint

What this dimension detects

When the API streams a response, the time between consecutive SSE chunks (as measured at the server-side reader) carries a signature of the inference stack: pure autoregressive decoding, speculative decoding, and cached replays each produce a distinguishable timing pattern. TrueLLMs collects chunk arrival timestamps automatically. In Direct mode, the client measures arrival times locally. In Proxy mode, it parses the audit.timing SSE event emitted by the proxy.
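A minimal sketch of the Direct-mode measurement, assuming the caller invokes a hook once per received chunk. The names createChunkTimer, mark, and gaps are hypothetical and not part of TrueLLMs; the injectable clock exists only so the behaviour can be checked deterministically.

```typescript
// Hypothetical helper (not a TrueLLMs API): records the arrival time of each
// streamed chunk and accumulates the gap to the previous chunk.
function createChunkTimer(now: () => number = Date.now) {
  let last: number | null = null;
  const gaps: number[] = [];
  return {
    // Call once for every SSE chunk as it arrives.
    mark(): void {
      const t = now();
      if (last !== null) gaps.push(t - last);
      last = t;
    },
    // Inter-chunk gaps in milliseconds, in arrival order.
    gaps(): number[] {
      return [...gaps];
    },
  };
}
```

With N chunks marked, gaps() returns N - 1 values; a single-chunk response yields an empty series and cannot be fingerprinted.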

Algorithm

Collect chunk arrival timestamps from streamed responses (Date.now(), millisecond resolution) and take the differences between consecutive timestamps to form the gap series. Over that series, compute the mean, variance, skew, and a 16-bin DFT. Classify the rhythm as one of {autoregressive, speculative-decoding, cached, steady-state}, then compare the result against the model's recorded rhythm fingerprint.
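The feature-extraction step could be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the text does not say how the 16 DFT bins are chosen, so the sketch takes the magnitudes of the first 16 non-DC frequencies of the mean-removed gap series, and it uses population (divide-by-n) moments. The names rhythmFeatures and RhythmFeatures are hypothetical.

```typescript
interface RhythmFeatures {
  mean: number;
  variance: number;
  skew: number;
  spectrum: number[]; // DFT magnitudes, one per bin
}

// Assumption: "16-bin DFT" is read here as the magnitudes of frequencies
// k = 1..16 of a naive DFT over the mean-removed gap series.
function rhythmFeatures(gaps: number[], bins = 16): RhythmFeatures {
  const n = gaps.length;
  if (n === 0) throw new Error("need at least one gap");

  const mean = gaps.reduce((s, g) => s + g, 0) / n;
  const centered = gaps.map((g) => g - mean);
  const variance = centered.reduce((s, d) => s + d * d, 0) / n;
  const std = Math.sqrt(variance);
  // Skew is undefined for a constant series; report 0 in that case.
  const skew =
    std === 0 ? 0 : centered.reduce((s, d) => s + d * d * d, 0) / n / (std * std * std);

  const spectrum: number[] = [];
  for (let k = 1; k <= bins; k++) {
    let re = 0;
    let im = 0;
    centered.forEach((d, t) => {
      const angle = (-2 * Math.PI * k * t) / n;
      re += d * Math.cos(angle);
      im += d * Math.sin(angle);
    });
    spectrum.push(Math.hypot(re, im) / n);
  }
  return { mean, variance, skew, spectrum };
}
```

Under this reading, a series shorter than 32 gaps has fewer than 16 independent frequencies, so the upper bins mirror the lower ones. A real implementation would need to fix a minimum series length or resample.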

Thresholds

Condition                                   Verdict contribution
Rhythm class matches and DFT cosine ≥ 0.85  Match
Class matches but DFT cosine < 0.85         Borderline
Class differs                               Mismatch
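The thresholds above map directly onto a small decision function. The sketch below is illustrative only: the function and type names are hypothetical, and it assumes the observed and recorded DFT vectors have the same length.

```typescript
type RhythmClass = "autoregressive" | "speculative-decoding" | "cached" | "steady-state";
type Verdict = "Match" | "Borderline" | "Mismatch";

// Cosine similarity between two equal-length spectra.
// Returns 0 when either vector has zero norm.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("spectra must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Applies the threshold table: the class check comes first, then the 0.85 cut-off.
function rhythmVerdict(
  observed: RhythmClass,
  recorded: RhythmClass,
  dftCosine: number,
): Verdict {
  if (observed !== recorded) return "Mismatch";
  return dftCosine >= 0.85 ? "Match" : "Borderline";
}
```

Note that a class mismatch decides the verdict on its own; the DFT cosine only separates Match from Borderline once the classes agree.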

Limitations

Requires streaming. What we actually measure is the inter-SSE-chunk arrival time at the server-side reader, not the true inter-token time inside the model: TCP coalescing, SSE flushing cadence, gateway buffering, and millisecond-only timestamps all add noise. Cached-replay detection (gap < 1 ms) is below the measurement resolution and is currently a placeholder. The per-model rhythm fingerprint library is seeded with developer estimates, not large-N measurements.
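To see why sub-millisecond gaps cannot be resolved, consider what integer-millisecond timestamps do to a fast stream. The sketch below is a toy illustration with a hypothetical function name; it models the clock as flooring the true arrival time to whole milliseconds, which is an assumption about how a Date.now()-style clock behaves.

```typescript
// Toy model: floor each true arrival time (in ms) to a whole millisecond,
// then compute the gaps a millisecond-resolution reader would record.
function quantizedGaps(trueArrivalsMs: number[]): number[] {
  const gaps: number[] = [];
  let previous: number | null = null;
  for (const t of trueArrivalsMs) {
    const stamp = Math.floor(t);
    if (previous !== null) gaps.push(stamp - previous);
    previous = stamp;
  }
  return gaps;
}
```

For chunks arriving every 0.4 ms at 100.0, 100.4, 100.8, 101.2, and 101.6, the recorded gaps are 0, 0, 1, 0. The true 0.4 ms cadence is not recoverable from that series, which is why a "gap < 1 ms" test cannot be evaluated reliably at this resolution.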

References

  • Alhazbi et al. LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis. 2025. arXiv:2502.20589
