Sparse-token stress is a weak generation-side fingerprint. It asks the model to echo fragile low-frequency token strings and checks whether the output side of the tuned model can still emit them.
Algorithm
Send echo-only prompts for rare CJK names, Chinese SEO strings, Japanese colloquial strings, and related low-frequency forms. Classify each response as hit, omit, substitute, partial, refuse, or blank. The aggregate hit rate drives the score; failure families are displayed for forensics but do not vote for a specific vendor.
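The classify-and-aggregate step above can be sketched roughly as follows. This is a minimal illustration, not the TrueLLMs implementation: the function names, the refusal markers, and the character-overlap rules used to separate omit, substitute, and partial are all assumptions made for the example, since the text does not define those classes precisely.

```python
from collections import Counter

# Hypothetical refusal markers; the probe's real list is not given in the text.
REFUSAL_MARKERS = ("i can't", "i cannot", "sorry", "无法", "抱歉")


def classify_echo(target: str, response: str) -> str:
    """Assign one of the six labels; the decision rules are illustrative only."""
    text = response.strip()
    if not text:
        return "blank"
    if target in text:
        return "hit"  # target echoed verbatim somewhere in the output
    if any(marker in text.lower() for marker in REFUSAL_MARKERS):
        return "refuse"
    # Count target characters that survive anywhere in the response.
    kept = sum(1 for ch in target if ch in text)
    if kept == 0:
        return "omit"
    if len(text) == len(target):
        return "substitute"  # same length, but some characters were swapped
    return "partial"


def score(pairs):
    """Return (hit_rate, per-label counts) for (target, response) pairs."""
    labels = [classify_echo(target, response) for target, response in pairs]
    counts = Counter(labels)
    hit_rate = counts["hit"] / len(labels) if labels else 0.0
    return hit_rate, counts


# Example responses are invented for illustration, not measured data.
pairs = [
    ("马嘉祺", "马嘉祺"),
    ("马嘉祺", "马嘉琪"),
    ("马嘉祺", "马嘉"),
    ("马嘉祺", ""),
    ("马嘉祺", "Sorry, I cannot repeat that."),
    ("马嘉祺", "Hello"),
]
hit_rate, counts = score(pairs)
```

Only `hit_rate` would feed the score in this sketch; `counts` keeps the failure families available for display without letting them vote for any vendor.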
The probe is vendor-independent and weak. Failures can reflect SFT data coverage, language specialization, safety wrappers, or sampling behavior. TrueLLMs does not yet have enough cross-vendor measured failure tables to infer an exact substitute model from this dimension.
References
MiniMax. Internal investigation: Ma Jiaqi (马嘉祺) sparse-token forgetting and lm_head drift, May 2026.
No single signal can prove malicious behavior. Proxies may show anomalies for legitimate reasons (regional routing, A/B testing, degradation strategies, cache optimization).
Token ratio deviation may result from ChatML wrapping, system prompt injection, or tokenizer version differences, and is not necessarily intentional inflation.
Model identity judgment is based on statistical fingerprint matching, not cryptographic proof. Quantization, fine-tuning, and post-processing can all alter fingerprints.
MMD distribution tests are sensitive to temperature, sampling parameters, and system prompts. Significant p-values mean distributional difference, not proof of substitution.
Logprobs unavailability is increasingly common (many providers disable it by default in 2025-2026) and does not by itself indicate deception.
ITT rhythm fingerprinting is an early-stage technique. Network jitter, TCP coalescing, and gateway buffering can produce false signals.
This tool generates reference-grade evidence chains, not legal conclusions. Do not make definitive accusations based solely on this report.
The wording in the report refers to statistical "deviations" or "signal inconsistencies". Please do not use this to make fraud or deception claims against any service provider.