Current results for a research prototype with strict source-support gates.
MoodSpan reports strict unsupported claims, broad relevance flags, refusal tradeoffs, and corpus status separately. The current stored result clears the strict release target.
Strict Kira release gate
Passing
0.0% visible strict unsupported-claim rate against a 25.0% target.
Recall@5
92.0%
MRR
87.7%
NDCG@10
87.9%
Held-out queries
107
Strict contract unsupported
0.0%
- Count
- 0 / 107
- Generation-error sentinels
- 0 / 107 (0.0%)
- 95% interval
- n/a to n/a
- Release target
- 25.0% or lower
Decision: the strict release gate is passing on the stored artifact. The regression guard is separate and is not a release approval.
Statistical read: the stored artifact does not include enough interval data for this comparison. Generation-error sentinels are tracked separately from visible unsupported clinical claims.
Usefulness tradeoff
- Overall refusals
- 23 / 107 (21.5%)
- Clinical-depth refusals
- 4 / 41 (9.8%)
- Clinical-depth correctness
- 2.46 / 5
- Clinical-depth completeness
- 1.98 / 5
Passing groundedness means the system avoided unsupported claims in this artifact. It does not mean clinical-depth answers are useful enough yet.
Broad relevance flags
- Overall broad flags
- 0 / 107 (0.0%)
- Clinical-depth broad flags
- 0 / 41 (0.0%)
These rows are relevance and completeness flags from the broad judge, not the strict unsupported-claim release gate.
- Tracked article corpus
- 0 files
- Corpus path
- Hybrid quarantine: keep high-traffic and clinically cited articles, regenerate medium-quality pages, and retire low-traffic stubs.
- Public content label
- Use "curated set of N articles" after review, not a historical planning count.
- Review state
- Human groundedness and clinician review are not complete
Top failure types
No stored rows available.
Top failure categories
No stored rows available.
Clinical-depth modes
No stored rows available.
Next engineering target
Broad relevance rows are clear in the latest promoted artifact. Next work is held-out stress coverage, clinician review, and better usefulness on conservative refusals without relaxing source support.