Research results

Current results for a research prototype with strict source-support gates.

MoodSpan reports strict unsupported claims, broad relevance flags, refusal tradeoffs, and corpus status separately. The current stored result clears the strict release target.

Strict Kira release gate

Passing

0.0% visible strict unsupported-claim rate against a release target of 25.0% or lower.

Recall@5

92.0%

MRR

87.7%

NDCG@10

87.9%

Held-out queries

107
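The retrieval figures above can be read against the usual textbook definitions. The sketch below is a minimal illustration assuming binary relevance labels per query; the function names and the example document IDs are hypothetical, and nothing here is confirmed to be how MoodSpan computes its stored numbers (some evaluations, for instance, report Recall@k as a per-query hit rate instead).

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary gains (assumption)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single query: two relevant documents, retrieved at ranks 2 and 5.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e"}
print(recall_at_k(ranked, relevant, 5))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, 10))
```

MRR is the mean of `reciprocal_rank` over all held-out queries, and the reported Recall@5 and NDCG@10 would likewise be per-query values averaged over the query set.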

Groundedness

Strict contract unsupported

0.0%

Count
0 / 107
Generation-error sentinels
0 / 107 (0.0%)
95% interval
Not available in the stored artifact
Release target
25.0% or lower

Decision: the strict release gate is passing on the stored artifact. The regression guard is separate and is not a release approval.

Statistical read: the stored artifact does not include the interval data needed to compare the unsupported-claim rate against the release target with a confidence bound. Generation-error sentinels are tracked separately from visible unsupported clinical claims.
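As an illustration of what an interval-aware read could look like, the sketch below checks the point-estimate gate and computes a Wilson score interval from the counts alone. The choice of the Wilson method, the function names, and the `target` default are assumptions made for this example; the artifact stores no interval, and this is not a description of the project's actual gate logic.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = failures / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def gate_passes(failures: int, total: int, target: float = 0.25) -> bool:
    """Point-estimate check: the observed rate is at or below the target."""
    return failures / total <= target


# Counts taken from the rows above: 0 unsupported out of 107 held-out queries.
low, high = wilson_interval(0, 107)
print(f"gate passing: {gate_passes(0, 107)}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

A stricter variant of the gate could require the upper bound, not just the point estimate, to sit at or below the target; whether that is appropriate here is a design decision this sketch does not settle.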

Usefulness tradeoff

Overall refusals
23 / 107 (21.5%)
Clinical-depth refusals
4 / 41 (9.8%)
Clinical-depth correctness
2.46 / 5
Clinical-depth completeness
1.98 / 5

Passing groundedness means the system avoided unsupported claims in this artifact. It does not mean clinical-depth answers are useful enough yet.
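The count and percentage pairs in these rows follow directly from the raw counts. A minimal sketch of that arithmetic, using a hypothetical `format_rate` helper that is not part of the stored artifact:

```python
def format_rate(count: int, total: int) -> str:
    """Render a count, its denominator, and the rate to one decimal place."""
    if total <= 0:
        # No denominator means no rate can be reported.
        return f"{count} / {total} (n/a)"
    return f"{count} / {total} ({count / total:.1%})"


# Counts taken from the rows above.
print("Overall refusals:", format_rate(23, 107))
print("Clinical-depth refusals:", format_rate(4, 41))
```

Keeping the denominator next to each rate makes it visible that the clinical-depth rows are computed over a 41-query subset and not over the full 107 held-out queries.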

Broad relevance flags

Overall broad flags
0 / 107 (0.0%)
Clinical-depth broad flags
0 / 41 (0.0%)

These rows report relevance and completeness flags from the broad judge. They are not part of the strict unsupported-claim release gate.

Corpus and review

Tracked article corpus
0 files
Corpus path
Hybrid quarantine: keep high-traffic and clinically cited articles, regenerate medium-quality pages, and retire low-traffic stubs.
Public content label
Use "curated set of N articles" after review, not a historical planning count.
Review state
Human groundedness and clinician review are not complete

Top failure types

No stored rows available.

Top failure categories

No stored rows available.

Clinical-depth modes

No stored rows available.

Next engineering target

Broad relevance rows are clear in the latest promoted artifact. Next work is held-out stress coverage, clinician review, and better usefulness on conservative refusals without relaxing source support.
