
Detection dimension · weight 1%

Refusal Boundary

What this dimension detects

Each vendor has a recognisable refusal style. For example, Anthropic models use 'I can't help with that' phrasing, while OpenAI models use 'I'm sorry, but I can't help with that request'.

Algorithm

Send two safety-edge prompts at the policy boundary, classify the refusal phrasing template, and compare to the claimed vendor.
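The classification step can be sketched roughly as follows. This is a minimal illustration, not the detector's actual implementation: the pattern table is built only from the two phrasings quoted on this page, and the names `REFUSAL_TEMPLATES`, `normalise`, and `classify_refusal` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical template table, built only from the two phrasings quoted above.
# The longer OpenAI phrasing is checked first because it contains the
# Anthropic phrasing as a substring.
REFUSAL_TEMPLATES = [
    ("openai", re.compile(r"i'm sorry, but i can't help with that request", re.I)),
    ("anthropic", re.compile(r"i can't help with that", re.I)),
]


def normalise(text: str) -> str:
    """Fold curly apostrophes to straight ones and collapse whitespace."""
    return " ".join(text.replace("\u2019", "'").split())


def classify_refusal(response: str) -> Optional[str]:
    """Return the vendor whose refusal template appears in the response, or None."""
    cleaned = normalise(response)
    for vendor, pattern in REFUSAL_TEMPLATES:
        if pattern.search(cleaned):
            return vendor
    return None
```

Checking the more specific pattern first is a design choice of this sketch; a real classifier would likely need many more templates per vendor and some tolerance for paraphrase.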

Thresholds

Condition | Verdict contribution
Refusal template matches claimed vendor | Match
Refusal template matches different vendor | Mismatch
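The two rows above map onto a small decision function. The sketch below is illustrative only: `verdict_contribution` is a hypothetical name, and returning `None` when no known template is recognised is an assumption, since the table does not define that case.

```python
from typing import Optional


def verdict_contribution(claimed_vendor: str, detected_vendor: Optional[str]) -> Optional[str]:
    """Map a classified refusal template to the verdict contribution in the table."""
    if detected_vendor is None:
        # Assumption: the table has no row for an unrecognised template,
        # so this sketch contributes nothing in that case.
        return None
    if detected_vendor.lower() == claimed_vendor.lower():
        return "Match"
    return "Mismatch"
```

For instance, a claimed vendor of "Anthropic" with a detected template of "openai" would yield "Mismatch" under this sketch.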

Limitations

Aggressive proxies rewrite refusals, which hides the underlying vendor's template. LLMmap covers the same ground more reliably, so this dimension is now weighted at 1%.
