Dimension · score weight 20%
MMD Distribution Equivalence Test
What this dimension detects
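Maximum Mean Discrepancy (MMD) is a kernel two-sample test; Gao et al. (ICLR 2025) apply it to check whether an API serves the model it claims to serve. This dimension detects whether the audited endpoint's response distribution differs from that of a trusted reference. TrueLLMs runs it only in differential mode, which requires samples from a user-supplied trusted reference endpoint and enough stochastic samples (temperature > 0) to support the test.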
Algorithm
Collect response samples from the audited endpoint and the trusted reference endpoint at temperature > 0, grouped by prompt. For each prompt, pair the audited samples with the reference samples to form a block. Truncate each response to its first 100 raw characters, compute MMD² with a Hamming kernel, and estimate a p-value with a stratified permutation test that shuffles endpoint labels only within each prompt block.
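The steps above can be sketched roughly as follows. This is an illustrative sketch, not the contents of lib/identity-audit/mmd.ts: every name here (`PromptBlock`, `hammingKernel`, `mmd2`, `permutationPValue`) is hypothetical, and several details are assumptions the text does not specify, namely that the kernel is normalized to [0, 1], that short responses are padded to 100 positions, that the unbiased MMD² estimator is used, and that per-prompt statistics are averaged.

```typescript
// Hypothetical shape: one block per prompt, holding samples from both endpoints.
type PromptBlock = { audited: string[]; reference: string[] };

const PREFIX_LEN = 100; // "first 100 raw characters" (UTF-16 code units here)

// Hamming kernel: fraction of the first `len` positions where both strings agree.
// Assumption: positions past the end of a string count as a shared padding symbol.
function hammingKernel(a: string, b: string, len: number = PREFIX_LEN): number {
  let matches = 0;
  for (let i = 0; i < len; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : -1;
    const cb = i < b.length ? b.charCodeAt(i) : -1;
    if (ca === cb) matches++;
  }
  return matches / len;
}

// Mean kernel value over all pairs; skipDiagonal drops i === j for within-sample terms.
function meanKernel(xs: string[], ys: string[], skipDiagonal: boolean): number {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < xs.length; i++) {
    for (let j = 0; j < ys.length; j++) {
      if (skipDiagonal && i === j) continue;
      sum += hammingKernel(xs[i], ys[j]);
      count++;
    }
  }
  return sum / count;
}

// Unbiased MMD² estimate for one prompt block (needs at least 2 samples per side).
function mmd2(x: string[], y: string[]): number {
  return meanKernel(x, x, true) + meanKernel(y, y, true) - 2 * meanKernel(x, y, false);
}

// Assumption: the overall statistic is the mean of the per-prompt MMD² values.
function stratifiedMmd2(blocks: PromptBlock[]): number {
  const total = blocks.reduce((acc, b) => acc + mmd2(b.audited, b.reference), 0);
  return total / blocks.length;
}

// Fisher-Yates shuffle on a copy, driven by an injectable random source.
function shuffle<T>(items: T[], rng: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Stratified permutation test: endpoint labels are shuffled only within each block.
function permutationPValue(
  blocks: PromptBlock[],
  permutations: number = 500,
  rng: () => number = Math.random
): { statistic: number; pValue: number } {
  const observed = stratifiedMmd2(blocks);
  let atLeastAsLarge = 0;
  for (let p = 0; p < permutations; p++) {
    const permuted = blocks.map((b) => {
      const pooled = shuffle([...b.audited, ...b.reference], rng);
      return {
        audited: pooled.slice(0, b.audited.length),
        reference: pooled.slice(b.audited.length),
      };
    });
    if (stratifiedMmd2(permuted) >= observed) atLeastAsLarge++;
  }
  // The +1 terms count the observed labelling itself, so the p-value is never 0.
  return { statistic: observed, pValue: (1 + atLeastAsLarge) / (1 + permutations) };
}
```

Shuffling within blocks keeps prompt-to-prompt variation out of the null distribution, so a small p-value reflects a difference between endpoints rather than a difference between prompts. Passing a seeded `rng` makes a run reproducible.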
Thresholds
| Condition | Verdict contribution |
|---|---|
| No trusted reference, temperature ≤ 0, < 5 prompt pairs, or < 40 total samples | Unavailable; no synthetic baseline is invented |
| p ≥ 0.05 | No statistically significant distribution difference observed |
| p < 0.05 | Scored distribution mismatch; cause still needs interpretation |
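As a rough illustration, the table above could be encoded as a single decision function. The type names, field names, and verdict labels below are hypothetical and are not taken from the TrueLLMs source; only the numeric thresholds come from the table.

```typescript
// Hypothetical input shape; pValue is absent when the test could not be run.
type MmdInputs = {
  hasTrustedReference: boolean;
  temperature: number;
  promptPairs: number;
  totalSamples: number;
  pValue?: number;
};

type MmdVerdict = "unavailable" | "no_significant_difference" | "distribution_mismatch";

// Thresholds as stated in the table.
const MIN_PROMPT_PAIRS = 5;
const MIN_TOTAL_SAMPLES = 40;
const ALPHA = 0.05;

function mmdVerdict(input: MmdInputs): MmdVerdict {
  // Any unmet precondition makes the dimension unavailable; nothing is synthesized.
  const preconditionsMet =
    input.hasTrustedReference &&
    input.temperature > 0 &&
    input.promptPairs >= MIN_PROMPT_PAIRS &&
    input.totalSamples >= MIN_TOTAL_SAMPLES;
  if (!preconditionsMet || input.pValue === undefined) return "unavailable";

  // p < 0.05 is scored as a mismatch; the cause is left for interpretation.
  return input.pValue < ALPHA ? "distribution_mismatch" : "no_significant_difference";
}
```

Checking the preconditions before the p-value keeps an underpowered or deterministic run from being reported as either a pass or a mismatch.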
Limitations
A rejected null means only that the two response distributions differ; it does not identify the cause. Quantization, fine-tuning, system prompts, regional routing, safety layers, and post-processing can all produce a difference. MMD is strongest when the reference endpoint is an official endpoint, controlled by the user, for the same claimed model.
References
- Gao et al. Model Equality Testing: Which Model is this API Serving? ICLR 2025. arXiv:2410.20247
- TrueLLMs lib/identity-audit/mmd.ts