Detection dimension · weight 8%
Inter-Token Rhythm Fingerprint
What this dimension detects
When the API streams a response, the time between consecutive SSE chunks, as measured at the server-side reader, carries a signature of the inference stack. Pure autoregressive decoding, speculative decoding, and cached replays each produce a distinct gap pattern. TrueLLMs collects chunk arrival timestamps automatically: in Direct mode the client measures arrival locally; in Proxy mode it parses the audit.timing SSE event emitted by the proxy.
Algorithm
Collect chunk arrival timestamps from streamed responses (Date.now(), millisecond resolution) and take the gaps between consecutive chunks. Over that gap series, compute the mean, variance, skew, and a 16-bin DFT. Classify the rhythm as one of {autoregressive, speculative-decoding, cached, steady-state}, then compare it against the model's recorded rhythm fingerprint.
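The feature-extraction step can be sketched as follows. This is a minimal illustration, not the TrueLLMs implementation: the names `gapsFromTimestamps`, `rhythmFeatures`, and `RhythmFeatures` are hypothetical, "16-bin DFT" is read here as the magnitudes of the first 16 non-DC frequency bins of the mean-removed gap series, and the variance is the population variance. The classification step is left out because its rules are not stated here.

```typescript
// Hypothetical feature record; field names are illustrative only.
interface RhythmFeatures {
  mean: number;     // mean gap in ms
  variance: number; // population variance of the gaps
  skew: number;     // third standardized moment
  dft: number[];    // magnitudes of DFT bins k = 1..bins
}

// Turn chunk arrival timestamps (ms) into the series of gaps between them.
function gapsFromTimestamps(timestampsMs: number[]): number[] {
  const gaps: number[] = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    gaps.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  return gaps;
}

// Compute mean, variance, skew, and a direct O(n * bins) DFT of the gap series.
// Assumption: bins k = 1..16 are used and the DC bin is skipped because the
// mean is removed first. For series shorter than 2 * bins the upper bins alias.
function rhythmFeatures(gaps: number[], bins = 16): RhythmFeatures {
  const n = gaps.length;
  if (n === 0) throw new Error("need at least one gap");

  const mean = gaps.reduce((s, g) => s + g, 0) / n;
  const centered = gaps.map((g) => g - mean);
  const variance = centered.reduce((s, c) => s + c * c, 0) / n;
  const m3 = centered.reduce((s, c) => s + c * c * c, 0) / n;
  const skew = variance > 0 ? m3 / Math.pow(variance, 1.5) : 0;

  const dft: number[] = [];
  for (let k = 1; k <= bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += centered[t] * Math.cos(angle);
      im += centered[t] * Math.sin(angle);
    }
    dft.push(Math.hypot(re, im) / n); // normalized magnitude of bin k
  }
  return { mean, variance, skew, dft };
}

// Usage: gaps alternating between 10 ms and 20 ms.
const features = rhythmFeatures(
  gapsFromTimestamps([0, 10, 30, 40, 60, 70, 90, 100, 120]),
);
console.log(features.mean, features.variance, features.skew);
```

For the alternating series in the usage example, the energy concentrates in the bin at half the series length, which is the kind of periodic structure the DFT comparison is meant to pick up.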
Thresholds
| Condition | Verdict contribution |
|---|---|
| Rhythm class matches and DFT cosine ≥ 0.85 | Match |
| Class matches but DFT cosine < 0.85 | Borderline |
| Class differs | Mismatch |
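The table maps directly onto a small decision function. The sketch below assumes the DFT comparison is a plain cosine similarity between two equal-length magnitude vectors; `cosineSimilarity` and `rhythmVerdict` are hypothetical names, not part of a documented TrueLLMs API.

```typescript
type RhythmClass =
  | "autoregressive"
  | "speculative-decoding"
  | "cached"
  | "steady-state";

type Verdict = "Match" | "Borderline" | "Mismatch";

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Apply the threshold table: the class check comes first, then the DFT cosine.
function rhythmVerdict(
  observedClass: RhythmClass,
  observedDft: number[],
  expectedClass: RhythmClass,
  expectedDft: number[],
  threshold = 0.85, // threshold taken from the table above
): Verdict {
  if (observedClass !== expectedClass) return "Mismatch";
  return cosineSimilarity(observedDft, expectedDft) >= threshold
    ? "Match"
    : "Borderline";
}

// Usage: same class, identical spectra.
console.log(rhythmVerdict("autoregressive", [1, 2, 3], "autoregressive", [1, 2, 3]));
```

Checking the class before the cosine means a class mismatch always yields Mismatch, regardless of how similar the spectra are.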
Limitations
Requires streaming. The measured quantity is inter-SSE-chunk arrival time at the server-side reader, not the true inter-token time inside the model: TCP coalescing, SSE flush cadence, gateway buffering, and millisecond-only timestamps all add noise. Cached-replay detection (gap < 1 ms) falls below the measurement resolution and is currently a placeholder. The per-model rhythm fingerprint library is seeded with developer estimates, not large-N measurements.
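To see why sub-millisecond gaps cannot be resolved individually, consider what an integer-millisecond clock reports for chunks that truly arrive 0.4 ms apart. The following is a small simulation under that assumed spacing; `observedGapsMs` is a hypothetical helper, and real arrival times would carry additional network jitter.

```typescript
// Simulate the gaps an integer-millisecond clock (like Date.now()) would
// report for chunks arriving at a fixed true spacing given in microseconds.
function observedGapsMs(trueGapMicros: number, chunks: number): number[] {
  const gaps: number[] = [];
  let prevMs = 0; // first chunk arrives at t = 0
  for (let i = 1; i < chunks; i++) {
    // Truncate the true arrival time to whole milliseconds.
    const nowMs = Math.floor((i * trueGapMicros) / 1000);
    gaps.push(nowMs - prevMs);
    prevMs = nowMs;
  }
  return gaps;
}

// 11 chunks, 0.4 ms apart.
console.log(observedGapsMs(400, 11)); // [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
```

Every observed gap is either 0 or 1 ms, so a single gap says nothing about the true spacing; only the average over many chunks (here 4 ms over 10 gaps) recovers it.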
References
- Alhazbi et al. LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis. 2025. arXiv:2502.20589