Research results

Current results for a research prototype with strict source-support gates.

MoodSpan reports strict unsupported claims, broad relevance flags, refusal tradeoffs, and corpus status separately. The strict gate is waiting on a fresh credentialed result.

Strict Kira release gate

Passing

Visible strict unsupported-claim rate: n/a, against a target of 25.0% or lower.

Recall@5

n/a

MRR

n/a

NDCG@10

n/a

Held-out queries
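
For readers unfamiliar with the three retrieval metrics listed above, the following is a minimal sketch of their standard binary-relevance definitions. It is not taken from the MoodSpan evaluation code. The function names, the use of string document IDs, and the assumption that each ranked list contains no duplicates are all illustrative.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def mean(values: Sequence[float]) -> float:
    """Average per-query scores; MRR is the mean of reciprocal_rank over queries."""
    return sum(values) / len(values) if values else 0.0
```

Under these assumptions, each reported figure would be the mean of the per-query score over the held-out query set.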

Groundedness

Strict contract unsupported

n/a

Count: n/a / n/a
Generation-error sentinels: n/a / n/a (n/a)
95% interval: n/a to n/a
Release target: 25.0% or lower

Decision: new Kira behavior remains blocked until a fresh credentialed run clears the strict target. The regression guard is separate and is not a release approval.

Statistical read: the stored artifact does not include enough interval data for this comparison. Generation-error sentinels are tracked separately from visible unsupported clinical claims.
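
The sketch below shows one way the strict gate and its 95% interval could be computed from a count of unsupported claims. It is an illustration under stated assumptions, not the method behind the stored artifact. The page does not say which interval is used, so the Wilson score interval is assumed here. The function names are hypothetical, and so is the rule that missing counts keep the gate blocked, although that rule matches the decision stated above.

```python
import math
from typing import Optional, Tuple


def wilson_interval(count: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = count / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def strict_gate(
    unsupported: Optional[int],
    total: Optional[int],
    target: float = 0.25,  # release target from this page: 25.0% or lower
) -> str:
    """Return "pass" only when a result exists and its rate is at or below target."""
    if unsupported is None or total is None or total <= 0:
        # Assumption: an "n/a" result never clears the gate.
        return "blocked"
    rate = unsupported / total
    return "pass" if rate <= target else "blocked"
```

For example, `strict_gate(None, None)` returns `"blocked"`, which corresponds to the current n/a state. Whether the release decision should compare the point estimate or the upper interval bound to the target is a policy choice this sketch leaves open; it compares the point estimate only.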

Usefulness tradeoff

Overall refusals: n/a / n/a (n/a)
Clinical-depth refusals: n/a / n/a (n/a)
Clinical-depth correctness: n/a / 5
Clinical-depth completeness: n/a / 5

A passing groundedness result means the system avoided unsupported claims in this artifact. It does not show that clinical-depth answers are useful enough yet.

Broad relevance flags

Overall broad flags: n/a / n/a (n/a)
Clinical-depth broad flags: n/a / n/a (n/a)

These rows are relevance and completeness flags from the broad judge. They are not results from the strict unsupported-claim release gate.

Corpus and review

Tracked article corpus: 0 files
Corpus path: hybrid quarantine (keep high-traffic and clinically cited articles, regenerate medium-quality pages, and retire low-traffic stubs).
Public content label: Use "curated set of N articles" after review, not a historical planning count.
Review state: Human groundedness and clinician review are not complete

Top failure types

No stored rows available.

Top failure categories

No stored rows available.

Clinical-depth modes

No stored rows available.

Next engineering target

Next work should tighten off-target salvage and improve differential and detail completeness, without raising the strict unsupported-claim rate above target or the refusal rate above the current envelope.
