Detection dimension · weight 6%

Context Window Probe

What this dimension detects

If a model advertises a 200k-token context window but requests through the proxy fail at 32k tokens, either the routing or the middleware is not what is advertised. There are legitimate explanations (gateway cost caps, safety truncation, regional limits), so this dimension only flags the mismatch; it does not assert substitution.

Algorithm

Send a graduated context probe at 4k, 16k, 64k, and 200k tokens. Each probe is stable filler text with a needle placed near the end. Confirm that the needle is recovered at each step.
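The ladder described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the filler sentence, the needle wording, the 4-characters-per-token estimate, and the `send` callable (a stand-in for whatever client talks to the proxy) are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence

# Step sizes taken from the algorithm description.
STEPS = (4_000, 16_000, 64_000, 200_000)

# Everything below is hypothetical and chosen only for illustration.
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE_CODE = "7391-ALPHA"
NEEDLE = f"The secret passphrase is {NEEDLE_CODE}. "
QUESTION = "What is the secret passphrase? Reply with the passphrase only."
CHARS_PER_TOKEN = 4  # rough estimate; a real probe would use the model's tokenizer


def build_probe(target_tokens: int, needle_position: float = 0.95) -> str:
    """Return roughly target_tokens of filler with the needle near the end."""
    repeats = (target_tokens * CHARS_PER_TOKEN) // len(FILLER)
    sentences = [FILLER] * repeats
    sentences.insert(int(repeats * needle_position), NEEDLE)
    return "".join(sentences) + "\n\n" + QUESTION


def run_ladder(
    send: Callable[[str], str], steps: Sequence[int] = STEPS
) -> Dict[int, bool]:
    """Send one probe per step and record whether the needle came back."""
    results: Dict[int, bool] = {}
    for step in steps:
        try:
            reply = send(build_probe(step))
            results[step] = NEEDLE_CODE in reply
        except Exception:
            # Any transport or API error counts as "needle not recovered" here;
            # a fuller version would keep the error for the report.
            results[step] = False
    return results
```

Passing `send` in as a parameter keeps the sketch independent of any particular API client, and makes it easy to exercise the ladder against a fake proxy before spending real tokens.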

Thresholds

Condition                               Verdict contribution
Needle recovered at advertised window   Match
Needle lost before advertised window    Mismatch
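As a hedged illustration of how the table could be applied, the sketch below maps per-step results to a verdict. The function name and the input shape (probe size in tokens mapped to whether the needle was recovered) are hypothetical. It also assumes that a loss at the advertised window itself counts as a Mismatch, which the table does not state explicitly.

```python
from typing import Dict


def context_window_verdict(recovered: Dict[int, bool], advertised_tokens: int) -> str:
    """Return "Match" or "Mismatch" for one model, following the table above.

    recovered maps probe size (tokens) to whether the needle was recovered.
    Steps larger than the advertised window are ignored.
    """
    relevant = {
        size: ok for size, ok in recovered.items() if size <= advertised_tokens
    }
    if not relevant:
        raise ValueError("no probe step at or below the advertised window")
    # Assumption: any loss at or below the advertised window is a Mismatch.
    return "Match" if all(relevant.values()) else "Mismatch"
```

For example, a model advertised at 200k tokens whose probes succeed at 4k and 16k but fail at 64k would yield a Mismatch under this reading.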

Limitations

The probe is expensive (≥ 200k input tokens), so it is off by default. Enable it for the Deep preset only.

References

  • Long-context evaluation in Liu et al., 'Lost in the Middle', 2023
