Dimension · score weight 10%

Sparse-Token Stress Test

What this dimension detects

Sparse-token stress is a weak generation-side fingerprint. It asks the model to echo fragile, low-frequency token strings verbatim and checks whether the tuned model's output side can still emit them.

Algorithm

Send echo-only prompts for rare CJK names, Chinese SEO strings, Japanese colloquial strings, and related low-frequency forms. Classify each response as hit, omit, substitute, partial, refuse, or blank. The aggregate hit rate drives the score; failure families are displayed for forensics but do not vote for a specific vendor.
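
The classification step can be sketched as below. This is a minimal illustration, not the code in lib/fingerprints/sparse-tokens-2026.ts: the `Probe` shape (a target wrapped in an optional prefix and suffix), the refusal markers, and the exact rules separating omit, partial, and substitute are all assumptions made for the example, since the page names the six classes but does not define them.

```typescript
// The six response classes named in the Algorithm section.
type Outcome = "hit" | "omit" | "substitute" | "partial" | "refuse" | "blank";

// Hypothetical probe shape: the echo string is prefix + target + suffix.
interface Probe {
  prefix: string;
  target: string;
  suffix: string;
}

// Hypothetical refusal markers; a real list would need tuning per language.
const REFUSAL_MARKERS = ["i cannot", "i can't", "unable to", "无法", "できません"];

// Text found between the probe's prefix and suffix, or null if the frame was not echoed.
function middleOf(probe: Probe, out: string): string | null {
  const framed =
    out.length >= probe.prefix.length + probe.suffix.length &&
    out.startsWith(probe.prefix) &&
    out.endsWith(probe.suffix);
  return framed ? out.slice(probe.prefix.length, out.length - probe.suffix.length) : null;
}

// Assumed rules, checked in order:
//   blank      - nothing but whitespace came back
//   hit        - the target appears verbatim anywhere in the response
//   refuse     - a refusal marker appears
//   omit       - the frame was echoed with nothing where the target should be
//   partial    - what came back in the target slot is a contiguous fragment of the target
//   substitute - anything else (different text in the target slot)
function classifyEcho(probe: Probe, response: string): Outcome {
  const out = response.trim();
  if (out.length === 0) return "blank";
  if (out.includes(probe.target)) return "hit";
  const lower = out.toLowerCase();
  if (REFUSAL_MARKERS.some((m) => lower.includes(m))) return "refuse";
  // If the frame was lost, fall back to judging the whole response.
  const middle = middleOf(probe, out) ?? out;
  if (middle.length === 0) return "omit";
  if (probe.target.includes(middle)) return "partial";
  return "substitute";
}

// Count outcomes per class so failure families can be displayed for forensics.
function tally(outcomes: Outcome[]): Record<Outcome, number> {
  const counts: Record<Outcome, number> = {
    hit: 0, omit: 0, substitute: 0, partial: 0, refuse: 0, blank: 0,
  };
  for (const o of outcomes) counts[o] += 1;
  return counts;
}
```

Under these assumed rules, a probe with prefix "我喜欢", target "马嘉祺", and suffix "的歌" classifies the response "我喜欢的歌" as omit and "我喜欢马佳琪的歌" as substitute. Only the hit count feeds the score; the other counts are diagnostic.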

Thresholds

Condition	Verdict contribution
Hit rate ≥ 80%	Scored match; tested vocabulary coverage appears intact
50% ≤ hit rate < 80%	Borderline; inspect failures
Hit rate < 50% and ≥ 3 probes scored	Weak scored mismatch for generation-side drift
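
As a rough sketch, the threshold table maps onto a small function like the one below. The function name and verdict labels are hypothetical. The "insufficient" verdict is also an assumption: the table does not say what happens when the hit rate is below 50% with fewer than three scored probes, or when no probe was scored at all.

```typescript
// Hypothetical verdict labels; "insufficient" is not in the table and is assumed here.
type Verdict = "match" | "borderline" | "weak_mismatch" | "insufficient";

// hits: number of probes classified as hit; scored: number of probes that were scored.
function verdictFor(hits: number, scored: number): Verdict {
  if (scored <= 0) return "insufficient"; // assumption: nothing scored, no verdict
  const rate = hits / scored;
  if (rate >= 0.8) return "match"; // tested vocabulary coverage appears intact
  if (rate >= 0.5) return "borderline"; // inspect failures
  // Below 50%: only a weak mismatch once at least 3 probes were scored.
  return scored >= 3 ? "weak_mismatch" : "insufficient";
}
```

For example, 1 hit out of 10 scored probes yields a weak mismatch, while 0 hits out of 2 stays unscored under the assumption above.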

Limitations

The probe is vendor-independent and a weak signal. Failures can reflect SFT data coverage, language specialization, safety wrappers, or sampling behavior. TrueLLMs does not yet have enough measured cross-vendor failure tables to infer an exact substitute model from this dimension.

References

MiniMax. Internal investigation: Ma Jiaqi (马嘉祺) sparse-token forgetting and lm_head drift, May 2026.
TrueLLMs lib/fingerprints/sparse-tokens-2026.ts
