Dimension · score weight 0%

Context Window Probe

What this dimension detects

An endpoint's advertised context length is a useful operational claim to verify, but this probe is currently diagnostic only: its result is reported and never contributes to the score.

Algorithm

When enabled, send prompts of graduated length, each containing a recoverable needle, and record for each length whether the endpoint accepts the input and recovers the needle. Otherwise report the probe as not enabled.
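The probe loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `send` is a hypothetical callable that wraps the endpoint and raises on rejection, prompt length is approximated in characters instead of tokens, and the needle wording, filler text, and `ProbeResult` type are invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumption: neutral filler text; a real probe might use varied documents.
FILLER = "The quick brown fox jumps over the lazy dog. "
QUESTION = "\n\nWhat is the secret code mentioned above? Reply with the code only."


@dataclass
class ProbeResult:
    target_chars: int  # requested prompt size (characters, as a token proxy)
    accepted: bool     # endpoint returned a reply instead of raising
    recovered: bool    # reply contained the needle's code


def build_prompt(target_chars: int, needle: str, depth: float = 0.5) -> str:
    """Pad with filler to roughly target_chars and bury the needle at a relative depth."""
    body_len = max(0, target_chars - len(needle) - len(QUESTION))
    filler = (FILLER * (body_len // len(FILLER) + 1))[:body_len]
    cut = int(len(filler) * depth)
    return filler[:cut] + needle + filler[cut:] + QUESTION


def run_probe(
    send: Callable[[str], str],
    sizes: List[int],
    enabled: bool = True,
) -> Optional[List[ProbeResult]]:
    """Return one result per size, or None when the probe is not enabled."""
    if not enabled:
        return None
    results = []
    for size in sizes:
        # A fresh random code per step keeps the needle from being guessable.
        code = f"{random.randrange(10**6):06d}"
        needle = f" The secret code is {code}. "
        try:
            reply = send(build_prompt(size, needle))
        except Exception:
            # Any rejection (size limit, gateway error) counts as not accepted.
            results.append(ProbeResult(size, False, False))
            continue
        results.append(ProbeResult(size, True, code in reply))
    return results
```

Passing `enabled=False` returns `None`, which a caller can report as "not enabled". A production probe would likely measure length with the endpoint's own tokenizer and vary the needle depth.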

Thresholds

Condition | Verdict contribution
Needle recovered at advertised window | Diagnostic match
Endpoint errors or loses the needle before the advertised window | Diagnostic anomaly
Any result | Score contribution remains 0
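As a hedged illustration of how these thresholds might map to a verdict, the sketch below classifies a list of observations. The `diagnose` function, the tuple layout, and the "inconclusive" and "not enabled" labels are assumptions for the example. Treating a failure at exactly the advertised window as an anomaly is also an assumption, since the table does not cover that case.

```python
from typing import List, Optional, Tuple

# Hypothetical observation layout: (context_size, accepted, recovered)
Observation = Tuple[int, bool, bool]


def diagnose(observations: Optional[List[Observation]], advertised_window: int) -> dict:
    """Map probe observations to a diagnostic verdict; the score contribution is always 0."""
    if observations is None:
        verdict = "not enabled"
    else:
        # Only sizes within the advertised window can confirm or contradict the claim.
        in_window = [o for o in observations if o[0] <= advertised_window]
        if any(not (accepted and recovered) for _, accepted, recovered in in_window):
            verdict = "diagnostic anomaly"
        elif any(size == advertised_window for size, _, _ in in_window):
            verdict = "diagnostic match"
        else:
            # Assumption: no failure, but the advertised window itself was never tested.
            verdict = "inconclusive"
    # Per the table above, any result leaves the score unchanged.
    return {"verdict": verdict, "score_contribution": 0.0}
```

For example, `diagnose([(2000, True, True), (4000, False, False)], 8000)` yields a diagnostic anomaly because the endpoint failed below the advertised window.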

Limitations

Large-context probes are expensive and are often disabled. Gateway cost caps, truncation policies, regional limits, or request-size limits can each explain a failure without implying model substitution.

References

  • Liu et al., 'Lost in the Middle: How Language Models Use Long Contexts', 2023
