Dimension · score weight 0%

Canary Prompts

What this dimension detects

Canary prompts are deterministic behavior probes with known or expected answers. In the current scoring model, canary behavior is report-only; scored capability checks live in the separate capability-floor dimension.

Algorithm

Run the canary prompt set, compare responses with known-answer templates when available, and display misses or surprising alternatives. The diagnostic is useful for inspecting behavior but is excluded from the headline score.
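
The comparison step might look roughly like the sketch below. It is illustrative only: the `Canary` and `CanaryResult` types, the `evaluateCanaries` function, and the use of regular expressions as templates are assumptions, not the actual contents of lib/fingerprints/canaries-2026.ts. The "unchecked" verdict for canaries without a template is also an assumption, made so that such responses can still be displayed.

```typescript
// Hypothetical shapes; the real definitions may differ.
type Canary = { id: string; prompt: string; template?: RegExp };
type CanaryResult = {
  id: string;
  verdict: "hit" | "miss" | "unchecked";
  response: string;
};

// Compare each collected response with its known-answer template, if any.
// Nothing here feeds a score; the results are meant for display only.
function evaluateCanaries(
  canaries: Canary[],
  responses: Record<string, string>,
): CanaryResult[] {
  return canaries.map((c): CanaryResult => {
    // A missing response is treated as an empty string (assumption).
    const response = (responses[c.id] ?? "").trim();
    if (!c.template) {
      // No template available: report the response without a hit/miss call.
      return { id: c.id, verdict: "unchecked", response };
    }
    const verdict = c.template.test(response) ? "hit" : "miss";
    return { id: c.id, verdict, response };
  });
}
```

Templates in this sketch should be non-global regular expressions, because `RegExp.prototype.test` on a global pattern keeps state between calls.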

Thresholds

Condition	Verdict contribution
Template hit	Diagnostic hit
Template miss or surprising alternative	Diagnostic miss
Any result	Score contribution remains 0
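
One way to read the table above is as a diagnostic summary paired with a fixed zero weight. The following is a minimal sketch under that reading; `CANARY_WEIGHT`, `Verdict`, and `summarizeCanaries` are hypothetical names introduced for illustration and are not taken from the TrueLLMs source.

```typescript
type Verdict = "hit" | "miss";

// Assumed constant mirroring the dimension's 0% score weight.
const CANARY_WEIGHT = 0;

// Count diagnostic hits and misses; the score contribution stays 0
// no matter how the verdicts are distributed.
function summarizeCanaries(verdicts: Verdict[]): {
  hits: number;
  misses: number;
  scoreContribution: number;
} {
  const hits = verdicts.filter((v) => v === "hit").length;
  const misses = verdicts.length - hits;
  const hitRate = verdicts.length > 0 ? hits / verdicts.length : 0;
  return { hits, misses, scoreContribution: hitRate * CANARY_WEIGHT };
}
```

Keeping the weight as an explicit constant, rather than omitting the dimension from the score entirely, makes the report-only status visible in code.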

Limitations

Known-answer templates may be estimates or incomplete for a claimed model. Prompt wording and system prompts can change outputs. Use capability-floor for scored ground-truth grading.

References

TrueLLMs lib/fingerprints/canaries-2026.ts
