Dimension · score weight 0%

Canary Prompts

What this dimension detects

Canary prompts are deterministic behavior probes with known or expected answers. In the current scoring model, canary results are report-only; scored capability checks live in the separate capability-floor dimension.

Algorithm

Run the canary prompt set, compare responses with known-answer templates when available, and display misses or surprising alternatives. The diagnostic is useful for inspecting behavior but is excluded from the headline score.
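The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of canaries-2026.ts: the `CanaryPrompt` and `CanaryResult` types, the `checkCanary` and `runCanaries` functions, the `ask` callback, the use of a regular expression as the template, and the `"no-template"` outcome are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real canary definitions are not shown in this document.
interface CanaryPrompt {
  id: string;
  prompt: string;
  // Known-answer template, present only when one exists for the claimed model.
  template?: RegExp;
}

type CanaryOutcome = "hit" | "miss" | "no-template";

interface CanaryResult {
  id: string;
  outcome: CanaryOutcome;
  response: string; // kept so misses and surprising alternatives can be displayed
}

// Compare one response with its template, when a template is available.
function checkCanary(canary: CanaryPrompt, response: string): CanaryResult {
  if (!canary.template) {
    return { id: canary.id, outcome: "no-template", response };
  }
  const outcome = canary.template.test(response.trim()) ? "hit" : "miss";
  return { id: canary.id, outcome, response };
}

// Run the whole set. `ask` stands in for whatever client sends a prompt to the model.
async function runCanaries(
  canaries: CanaryPrompt[],
  ask: (prompt: string) => Promise<string>,
): Promise<CanaryResult[]> {
  const results: CanaryResult[] = [];
  for (const canary of canaries) {
    results.push(checkCanary(canary, await ask(canary.prompt)));
  }
  return results;
}
```

Nothing in this sketch produces a number: the results are only collected for display, which matches the diagnostic being excluded from the headline score. Templates in this example should not use the `g` flag, because `RegExp.prototype.test` is stateful on global patterns.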

Thresholds

Condition | Verdict contribution
Template hit | Diagnostic hit
Template miss or surprising alternative | Diagnostic miss
Any result | Score contribution remains 0
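
The table can be read as a mapping in which the verdict varies but the score contribution never does. A small sketch of that reading, with the `Verdict` and `Contribution` types and the `toContribution` function invented for illustration:

```typescript
// Hypothetical names; only the hit/miss verdicts and the zero contribution come from the table.
type Verdict = "diagnostic-hit" | "diagnostic-miss";

interface Contribution {
  verdict: Verdict;
  // Typed as the literal 0 so a non-zero weight cannot be assigned by accident.
  scoreContribution: 0;
}

// A template miss and a surprising alternative both map to a diagnostic miss.
function toContribution(templateHit: boolean): Contribution {
  return {
    verdict: templateHit ? "diagnostic-hit" : "diagnostic-miss",
    scoreContribution: 0,
  };
}
```

Encoding the contribution as the literal type `0` is a design choice for this sketch: it makes the report-only status visible in the type rather than relying on a weight configured elsewhere.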

Limitations

Known-answer templates may be estimates or incomplete for a claimed model. Prompt wording and system prompts can change outputs. Use capability-floor for scored ground-truth grading.

References

  • TrueLLMs lib/fingerprints/canaries-2026.ts