Dimension · score weight 20%

MMD Distribution Equivalence Test

What this dimension detects

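Maximum Mean Discrepancy (MMD) is a kernel two-sample test statistic; Gao et al. (ICLR 2025) apply it to test whether an API serves the model it claims to serve. This dimension detects whether the audited endpoint's response distribution differs from that of a trusted reference endpoint. TrueLLMs runs it only in differential mode, which requires user-supplied samples from a trusted reference endpoint and enough stochastic (temperature > 0) samples to compare.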

Algorithm

Collect response samples from the audited endpoint and the trusted reference endpoint at temperature > 0, grouped by prompt. Build prompt-stratified sample pairs, take the first 100 raw characters of each response, compute MMD² with a Hamming kernel, and estimate a p-value by stratified permutations inside each prompt block.
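The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/identity-audit/mmd.ts: the names `PromptBlock`, `hammingKernel`, and `stratifiedMmdTest` are hypothetical, and several details are assumptions (the kernel is normalized to [0, 1], short responses are padded to 100 positions, the biased MMD² estimator is averaged across prompts, and "characters" are counted as UTF-16 code units).

```typescript
// Hypothetical types and names for illustration only.
type PromptBlock = { audited: string[]; reference: string[] };

const PREFIX_LEN = 100;

// Hamming kernel: fraction of the first PREFIX_LEN positions where both strings agree.
// Assumption: positions past the end of a response hold a shared padding symbol (-1).
function hammingKernel(a: string, b: string): number {
  let matches = 0;
  for (let i = 0; i < PREFIX_LEN; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : -1;
    const cb = i < b.length ? b.charCodeAt(i) : -1;
    if (ca === cb) matches++;
  }
  return matches / PREFIX_LEN;
}

// Biased (V-statistic) MMD² read from a precomputed Gram matrix.
function mmd2FromGram(gram: number[][], xs: number[], ys: number[]): number {
  const mean = (a: number[], b: number[]): number => {
    let sum = 0;
    for (const i of a) for (const j of b) sum += gram[i][j];
    return sum / (a.length * b.length);
  };
  return mean(xs, xs) + mean(ys, ys) - 2 * mean(xs, ys);
}

// Fisher-Yates shuffle that returns a new array.
function shuffle(items: number[], rand: () => number): number[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Assumes every block holds at least one audited and one reference sample.
function stratifiedMmdTest(
  blocks: PromptBlock[],
  permutations = 1000,
  rand: () => number = Math.random,
): { mmd2: number; pValue: number } {
  // One Gram matrix per prompt block, computed once over the pooled samples.
  const prepared = blocks.map((b) => {
    const pooled = [...b.audited, ...b.reference];
    const gram = pooled.map((s) => pooled.map((t) => hammingKernel(s, t)));
    return { gram, nAudited: b.audited.length, total: pooled.length };
  });
  // Statistic: per-prompt MMD² averaged over prompts.
  const statistic = (orders: number[][]): number => {
    let sum = 0;
    prepared.forEach((p, k) => {
      const order = orders[k];
      sum += mmd2FromGram(p.gram, order.slice(0, p.nAudited), order.slice(p.nAudited));
    });
    return sum / prepared.length;
  };
  const identity = prepared.map((p) => Array.from({ length: p.total }, (_, i) => i));
  const observed = statistic(identity);
  // Labels are shuffled only inside each prompt block, never across prompts.
  let atLeast = 0;
  for (let b = 0; b < permutations; b++) {
    const permuted = identity.map((idx) => shuffle(idx, rand));
    if (statistic(permuted) >= observed - 1e-12) atLeast++;
  }
  return { mmd2: observed, pValue: (1 + atLeast) / (1 + permutations) };
}
```

Shuffling inside each prompt block keeps prompt-to-prompt variation out of the null distribution, so the p-value reflects endpoint differences rather than prompt differences. The add-one correction in the p-value keeps it strictly above zero for a finite number of permutations.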

Thresholds

Condition	Verdict contribution
No trusted reference, temperature ≤ 0, < 5 prompt pairs, or < 40 total samples	Unavailable; no synthetic baseline is invented
p ≥ 0.05	No statistically significant distribution difference observed
p < 0.05	Scored distribution mismatch; cause still needs interpretation
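The table above can be read as a simple gate followed by a significance check. The sketch below encodes that reading; the `MmdInputs` shape, the `mmdVerdict` function, and the verdict labels are hypothetical names chosen for illustration, and how the verdict feeds the 20% score weight is not shown.

```typescript
// Hypothetical names for illustration; only the numeric thresholds come from the table.
type MmdVerdict = "unavailable" | "no_significant_difference" | "distribution_mismatch";

interface MmdInputs {
  hasTrustedReference: boolean;
  temperature: number;
  promptPairs: number;
  totalSamples: number;
  pValue?: number; // undefined when the test was not run
}

function mmdVerdict(input: MmdInputs): MmdVerdict {
  // Any failed precondition makes the dimension unavailable; no baseline is synthesized.
  const unavailable =
    !input.hasTrustedReference ||
    input.temperature <= 0 ||
    input.promptPairs < 5 ||
    input.totalSamples < 40;
  if (unavailable || input.pValue === undefined) return "unavailable";
  // p < 0.05 is scored as a mismatch; the cause still needs interpretation.
  return input.pValue < 0.05 ? "distribution_mismatch" : "no_significant_difference";
}
```

For example, a run with a trusted reference, temperature 0.7, 5 prompt pairs, 40 samples, and p = 0.2 would yield `"no_significant_difference"` under this sketch.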

Limitations

A rejected null means only that the two response distributions differ; it does not identify the cause. Quantization, fine-tuning, system prompts, regional routing, safety layers, and post-processing can all produce a difference. The test is most informative when the reference is an official endpoint, controlled by the user, for the same claimed model.

References

Gao et al. Model Equality Testing: Which Model is this API Serving? ICLR 2025. arXiv:2410.20247
TrueLLMs lib/identity-audit/mmd.ts
