Dimension · score weight 0%

Context Window Probe

What this dimension detects

Advertised context length is a useful operational claim, but the context-window probe is currently diagnostic only: its results are reported and do not change the score.

Algorithm

When the probe is enabled, send long-context prompts of graduated length, each containing a recoverable needle, and record for each prompt whether the endpoint accepts the input and recovers the needle. When the probe is disabled, report it as not enabled.
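The probe loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the needle text, the filler sentence, the four-characters-per-token estimate, the `send` callable, and the `ProbeResult` type are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical needle, question, and filler; the real probe's wording is not specified.
NEEDLE = "The vault code is 7341."
QUESTION = "What is the vault code?"
FILLER = "The sky was grey and the road was long. "


@dataclass
class ProbeResult:
    approx_tokens: int  # rough prompt size, not an exact token count
    accepted: bool      # endpoint returned a reply instead of an error
    recovered: bool     # reply contained the needle's answer


def build_prompt(approx_tokens: int, depth: float = 0.5,
                 chars_per_token: int = 4) -> str:
    """Build filler of roughly approx_tokens with the needle at a relative depth.

    The chars_per_token ratio is a crude assumption; a real probe would
    count tokens with the model's own tokenizer.
    """
    total_chars = approx_tokens * chars_per_token
    repeats = total_chars // len(FILLER) + 1
    filler = (FILLER * repeats)[:total_chars]
    cut = int(len(filler) * depth)
    return filler[:cut] + " " + NEEDLE + " " + filler[cut:] + "\n\n" + QUESTION


def run_probe(send: Callable[[str], str], sizes: List[int],
              enabled: bool = True,
              answer: str = "7341") -> Optional[List[ProbeResult]]:
    """Send graduated prompts; return None when the probe is not enabled."""
    if not enabled:
        return None
    results = []
    for size in sizes:
        try:
            reply = send(build_prompt(size))
        except Exception:
            # Any endpoint error counts as "input not accepted" in this sketch.
            results.append(ProbeResult(size, accepted=False, recovered=False))
            continue
        results.append(ProbeResult(size, accepted=True, recovered=answer in reply))
    return results
```

Here `send` stands in for whatever client call reaches the endpoint; passing it as a parameter keeps the sketch independent of any particular API.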

Thresholds

Condition	Verdict contribution
Needle recovered at advertised window	Diagnostic match
Endpoint errors or loses the needle before the advertised window	Diagnostic anomaly
Any result	Score contribution remains 0
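One way to read the table as code is sketched below. The function name, the tuple layout, and the "inconclusive" fallback are assumptions made for illustration; the table does not say what to report when no probe reaches the advertised window, nor which row wins when a smaller prompt fails but the advertised size succeeds (this sketch reports the anomaly).

```python
from typing import Iterable, Optional, Tuple

SCORE_CONTRIBUTION = 0.0  # per the table, any result contributes 0 to the score


def diagnose(results: Optional[Iterable[Tuple[int, bool, bool]]],
             advertised_tokens: int) -> Tuple[str, float]:
    """Map probe results to a diagnostic label and a score contribution.

    Each result is (approx_tokens, accepted, recovered). None means the
    probe was not enabled.
    """
    if results is None:
        return "not enabled", SCORE_CONTRIBUTION
    results = list(results)
    # An error or a lost needle below the advertised window is an anomaly.
    for size, accepted, recovered in results:
        if size < advertised_tokens and not (accepted and recovered):
            return "diagnostic anomaly", SCORE_CONTRIBUTION
    # Needle recovered at (or beyond) the advertised window is a match.
    for size, accepted, recovered in results:
        if size >= advertised_tokens and accepted and recovered:
            return "diagnostic match", SCORE_CONTRIBUTION
    # Assumption: anything else is reported without a verdict.
    return "inconclusive", SCORE_CONTRIBUTION
```

For example, `diagnose([(2000, True, True), (8000, True, True)], 8000)` yields a diagnostic match, and the score contribution is 0 in every branch.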

Limitations

Large-context probes are expensive and often disabled. Gateway cost caps, truncation policy, regional limits, or request-size limits can all explain failures without implying model substitution.

References

Liu et al., 'Lost in the Middle: How Language Models Use Long Contexts', 2023
