Reproducibility report

Research prototype with strict gates, published artifacts, and visible limits.

This page records the current evaluation artifact, corpus manifest, held-out stress-test status, and source-support method. It is evidence of reproducible engineering work, not of clinical validation.

Build reference

02f6b6ae218a

Generated 2026-06-02

Strict unsupported

0.0%

0 / 107 promoted rows

Refusals

21.5%

23 / 107 promoted rows

Broad flags

0.0%

0 / 107 promoted rows

Stress strict

0.0%

0 / 100 held-out rows
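
The percentages on the cards above follow directly from the row counts. A minimal sketch of that arithmetic, assuming each rate is flagged rows divided by total rows, rounded to one decimal place. The function name `rate_percent` and the dictionary keys are hypothetical and are not taken from the project's code.

```python
def rate_percent(flagged: int, total: int) -> float:
    """Return flagged / total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * flagged / total, 1)


# Counts as reported on this page; the key names are illustrative only.
cards = {
    "strict_unsupported": (0, 107),
    "refusals": (23, 107),
    "broad_flags": (0, 107),
    "stress_strict": (0, 100),
}

for name, (flagged, total) in cards.items():
    print(f"{name}: {rate_percent(flagged, total):.1f}% ({flagged} / {total})")
```

Publishing the raw counts next to each percentage lets a reader repeat this check without access to the artifact.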

Method

Source-support gate.

01. Retrieve source chunks before answer construction.
02. Draft a short educational answer or abstain when evidence is thin.
03. Check citation coverage, source-local support, generation-error sentinels, and visible unsupported clinical claims.
04. Report strict unsupported separately from broad relevance and refusal tradeoffs.
05. Keep corpus restoration under hybrid quarantine until provenance and review are complete.
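
Steps 01 to 03 above can be sketched as a small gate function. This is an illustrative sketch under stated assumptions, not MoodSpan's implementation: the names `Sentence`, `token_overlap`, and `gate`, the thresholds, the sentinel string, and the word-overlap heuristic standing in for "source-local support" are all hypothetical. Steps 04 and 05 (reporting and corpus quarantine) are not modelled.

```python
from dataclasses import dataclass, field

# Illustrative assumptions only; none of these values come from the project.
MIN_CHUNKS = 2                              # below this, evidence is "thin"
MIN_OVERLAP = 0.5                           # stand-in support threshold
ERROR_SENTINELS = ("[GENERATION_ERROR]",)   # hypothetical sentinel string


@dataclass
class Sentence:
    text: str
    cited_chunk_ids: list[str] = field(default_factory=list)


def token_overlap(sentence: str, chunk: str) -> float:
    """Fraction of the sentence's words that also appear in the chunk."""
    words = set(sentence.lower().split())
    if not words:
        return 0.0
    return len(words & set(chunk.lower().split())) / len(words)


def gate(sentences: list[Sentence], chunks: dict[str, str]) -> dict:
    """Refuse on thin evidence, otherwise check every drafted sentence."""
    if len(chunks) < MIN_CHUNKS:
        # Step 02: abstain when evidence is thin.
        return {"status": "refused", "unsupported": []}

    unsupported = []
    for s in sentences:
        # Step 03: sentinel check, citation coverage, source-local support.
        has_sentinel = any(tok in s.text for tok in ERROR_SENTINELS)
        cited = [chunks[c] for c in s.cited_chunk_ids if c in chunks]
        supported = any(token_overlap(s.text, c) >= MIN_OVERLAP for c in cited)
        if has_sentinel or not cited or not supported:
            unsupported.append(s.text)

    status = "pass" if not unsupported else "strict_unsupported"
    return {"status": status, "unsupported": unsupported}


# Toy inputs, made up for this example.
chunks = {
    "c1": "the stress set uses 100 curated questions",
    "c2": "the target is 2.0% or lower",
}
answer = [
    Sentence("The stress set uses 100 questions", ["c1"]),
    Sentence("This proves clinical safety", ["c1"]),
]
report = gate(answer, chunks)
print(report["status"], report["unsupported"])
```

In this toy run the second sentence shares no words with its cited chunk, so it is the only one flagged. A real support check would need something stronger than word overlap; the heuristic is here only to make the control flow concrete.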

Held-out stress test

Mental-health-domain coverage outside the core eval.

The stress set uses 100 curated questions across eating disorders, OCD, addiction, postpartum presentations, complex trauma, psychosis, medication caveats, neurodivergence, and social-risk contexts.

Strict unsupported target

0.0%

Target: 2.0% or lower

Clears target

Rows: 100
Refusal: 25.0%
Broad flags: 31.0%

The held-out stress set clears the strict unsupported target. This is a source-support result, not a clinical validation result.
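
The "clears target" verdict reduces to a single comparison. A minimal sketch, assuming the target is an inclusive ceiling ("2.0% or lower") on unsupported rows divided by total rows. The names `STRICT_TARGET_PERCENT` and `clears_target` are hypothetical.

```python
STRICT_TARGET_PERCENT = 2.0  # "Target: 2.0% or lower", as stated on this page


def clears_target(unsupported_rows: int, total_rows: int,
                  target_percent: float = STRICT_TARGET_PERCENT) -> bool:
    """True when the unsupported rate is at or below the target ceiling."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    # Cross-multiplied form of (unsupported / total) * 100 <= target.
    return unsupported_rows * 100 <= target_percent * total_rows


# Reported stress-set result: 0 strict unsupported rows out of 100.
print(clears_target(0, 100))
```

With 100 held-out rows, this ceiling tolerates at most two strict unsupported rows; a third would miss the target.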

Artifact: data/eval/results/response-quality-2026-06-02T20-29-42-579Z.json

Boundary

MoodSpan remains an educational research prototype. Its strongest claim is measurement discipline: strict gates, published artifacts, and known limits are made visible before any stronger clinical claim is considered.
