Reproducibility report
Research prototype with strict gates, published artifacts, and visible limits.
This page records the current evaluation artifact, corpus manifest, held-out stress-test status, and source-support method. It is evidence for reproducible engineering work, not clinical validation.
Build reference
02f6b6ae218a
Generated 2026-06-02
Strict unsupported: 0.0% (0 / 107 promoted rows)
Refusals: 21.5% (23 / 107 promoted rows)
Broad flags: 0.0% (0 / 107 promoted rows)
Stress strict: 0.0% (0 / 100 held-out rows)
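The headline percentages above follow directly from the row counts. As a minimal sketch (the function and dictionary names are illustrative, not taken from the project's code), the rates can be recomputed like this:

```python
def rate_percent(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * count / total:.1f}%"


# Counts copied from the summary figures in this report.
PROMOTED_ROWS = 107
STRESS_ROWS = 100

promoted_counts = {"strict_unsupported": 0, "refusals": 23, "broad_flags": 0}

# Hypothetical summary structure; the real artifact schema may differ.
summary = {name: rate_percent(n, PROMOTED_ROWS) for name, n in promoted_counts.items()}
stress_strict = rate_percent(0, STRESS_ROWS)
```

Recomputing from counts rather than copying percentages makes rounding explicit: 23 / 107 is about 21.495%, which is reported as 21.5%.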
Method
Source-support gate.
01. Retrieve source chunks before answer construction.
02. Draft a short educational answer or abstain when evidence is thin.
03. Check citation coverage, source-local support, generation-error sentinels, and visible unsupported clinical claims.
04. Report strict unsupported separately from broad relevance and refusal tradeoffs.
05. Keep corpus restoration under hybrid quarantine until provenance and review are complete.
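The first four steps can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the project's implementation: every name below (`Draft`, `retrieve`, `draft_answer`, `check`, `run_gate`, `SENTINELS`) is hypothetical, retrieval is plain word overlap, and "source-local support" is reduced to checking that each sentence's words appear in the chunks it cites. Step 05 (quarantine) is a process control and is not modeled.

```python
from dataclasses import dataclass

# Hypothetical marker strings standing in for generation-error sentinels.
SENTINELS = ("[generation error]",)


@dataclass
class Draft:
    sentences: list[str]        # answer sentences; empty when abstaining
    citations: list[list[int]]  # per-sentence indices into the retrieved chunks
    abstained: bool = False


def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Step 01 (toy): rank chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [chunk for _, chunk in scored[:k]]


def draft_answer(chunks: list[str]) -> Draft:
    """Step 02 (toy): abstain on no evidence, else quote the top chunk."""
    if not chunks:
        return Draft([], [], abstained=True)
    return Draft([chunks[0]], [[0]])


def check(draft: Draft, chunks: list[str]) -> dict[str, bool]:
    """Steps 03-04 (toy): flag refusals and strict-unsupported separately."""
    if draft.abstained:
        return {"refusal": True, "strict_unsupported": False}
    unsupported = False
    for sentence, cited in zip(draft.sentences, draft.citations):
        cited_words: set[str] = set()
        for i in cited:
            cited_words |= set(chunks[i].lower().split())
        has_citation = bool(cited)
        locally_supported = set(sentence.lower().split()) <= cited_words
        has_sentinel = any(s in sentence.lower() for s in SENTINELS)
        if not has_citation or not locally_supported or has_sentinel:
            unsupported = True
    return {"refusal": False, "strict_unsupported": unsupported}


def run_gate(question: str, corpus: list[str]) -> dict[str, bool]:
    chunks = retrieve(question, corpus)
    return check(draft_answer(chunks), chunks)


corpus = ["sleep hygiene can support mood regulation"]
answered = run_gate("how does sleep affect mood", corpus)
refused = run_gate("tax law", corpus)
```

Keeping `refusal` and `strict_unsupported` as separate flags mirrors step 04: an abstention is never counted as an unsupported claim, so the tradeoff between the two stays visible.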
Held-out stress test
Mental-health-domain coverage outside the core eval.
The stress set uses 100 curated questions across eating disorders, OCD, addiction, postpartum presentations, complex trauma, psychosis, medication caveats, neurodivergence, and social-risk contexts.
Strict unsupported: 0.0% (target: 2.0% or lower)
- Rows: 100
- Refusal: 25.0%
- Broad flags: 31.0%
The held-out stress set clears the strict unsupported target. This is a source-support result, not a clinical validation result.
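The target comparison is simple arithmetic. A minimal sketch, assuming the target is applied as "strict unsupported rows divided by total rows is at most 2.0%" (the function name and the use of exact fractions are illustrative choices, not the project's code):

```python
from fractions import Fraction

# "2.0% or lower", as stated in this report.
STRICT_TARGET = Fraction(2, 100)


def clears_target(strict_rows: int, total_rows: int,
                  target: Fraction = STRICT_TARGET) -> bool:
    """Return True when the strict unsupported rate is at or below the target."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return Fraction(strict_rows, total_rows) <= target


STRESS_ROWS = 100
stress_clears = clears_target(0, STRESS_ROWS)

# Largest strict-unsupported count that would still clear the target.
max_allowed = max(n for n in range(STRESS_ROWS + 1)
                  if clears_target(n, STRESS_ROWS))
```

With 100 held-out rows, at most 2 strict unsupported rows would still meet the target; the reported run has 0.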
Artifact: data/eval/results/response-quality-2026-06-02T20-29-42-579Z.json
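The artifact filename appears to embed a run timestamp. The sketch below assumes the name is an ISO-8601 UTC timestamp with `:` and `.` replaced by `-`, and that the trailing `Z` means UTC; both are inferences from the filename, and `artifact_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

ARTIFACT = "data/eval/results/response-quality-2026-06-02T20-29-42-579Z.json"


def artifact_timestamp(path: str) -> datetime:
    """Parse the timestamp embedded in a response-quality artifact filename."""
    stem = PurePosixPath(path).stem
    stamp = stem.removeprefix("response-quality-")  # requires Python 3.9+
    # Assumed layout: date, then hour-minute-second-milliseconds, then "Z".
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H-%M-%S-%fZ")
    return parsed.replace(tzinfo=timezone.utc)


ts = artifact_timestamp(ARTIFACT)
```

Under these assumptions the parsed date matches the "Generated 2026-06-02" line in the build reference.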
Boundary
MoodSpan remains an educational research prototype. The strongest claim is measurement discipline: strict gates, artifacts, and known limits are visible before stronger clinical claims are considered.