Reproducibility report

Research prototype with strict gates, published artifacts, and visible limits.

This page records the current evaluation artifact, corpus manifest, held-out stress-test status, and source-support method. It is evidence for reproducible engineering work, not clinical validation.

Build reference

8964b74af5e3

Generated 2026-07-19

Response-quality JSON | Corpus manifest | Stress-test JSON

Strict unsupported: n/a (n/a / n/a promoted rows)

Refusals: n/a (n/a / n/a promoted rows)

Broad flags: n/a (n/a / n/a promoted rows)

Stress strict: pending (waiting on held-out run)

Method

Source-support gate.

Retrieve source chunks before answer construction.

Draft a short educational answer or abstain when evidence is thin.

Check citation coverage, source-local support, generation-error sentinels, and visible unsupported clinical claims.

Report strict unsupported separately from broad relevance and refusal tradeoffs.

Keep corpus restoration under hybrid quarantine until provenance and review are complete.
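
The gate steps above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code: the names `Chunk`, `GateResult`, `retrieve`, `supported`, and `run_gate`, the thresholds, the sentinel strings, and the word-overlap checks are all assumptions chosen to keep the example self-contained. It shows the ordering (retrieve first, abstain on thin evidence, then check each drafted sentence) and keeps strict unsupported sentences separate from refusals and sentinel hits.

```python
import re
from dataclasses import dataclass, field

MIN_CHUNKS = 2                        # hypothetical cutoff for "evidence is thin"
MIN_OVERLAP = 0.5                     # hypothetical source-local support threshold
SENTINELS = ("[error]", "as an ai")   # hypothetical generation-error markers


@dataclass
class Chunk:
    source_id: str
    text: str


@dataclass
class GateResult:
    abstained: bool
    answer: str = ""
    strict_unsupported: list = field(default_factory=list)
    sentinel_hits: list = field(default_factory=list)


def words(text):
    """Lowercased alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, corpus):
    """Toy retrieval: keep chunks that share at least one word with the question."""
    return [c for c in corpus if words(question) & words(c.text)]


def supported(sentence, chunk):
    """Toy source-local check: enough of the sentence's words appear in the cited chunk."""
    sent = words(sentence)
    return bool(sent) and len(sent & words(chunk.text)) / len(sent) >= MIN_OVERLAP


def run_gate(question, corpus, draft_fn):
    chunks = retrieve(question, corpus)        # retrieve before drafting
    if len(chunks) < MIN_CHUNKS:               # abstain when evidence is thin
        return GateResult(abstained=True)
    drafted = draft_fn(question, chunks)       # [(sentence, cited source_id or None)]
    by_id = {c.source_id: c for c in chunks}
    result = GateResult(abstained=False, answer=" ".join(s for s, _ in drafted))
    for sentence, cited in drafted:
        if any(mark in sentence.lower() for mark in SENTINELS):
            result.sentinel_hits.append(sentence)
        # Strict failure: no usable citation (coverage) or the cited
        # chunk does not support the sentence (source-local support).
        if cited not in by_id or not supported(sentence, by_id[cited]):
            result.strict_unsupported.append(sentence)
    return result
```

Here `draft_fn` stands in for whatever drafts the short educational answer; it is expected to return sentences paired with the source they cite. Because abstentions and strict failures live in separate fields, refusal counts and strict unsupported counts can be reported independently.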

Held-out stress test

Mental-health-domain coverage outside the core eval.

The stress set uses 100 curated questions across eating disorders, OCD, addiction, postpartum presentations, complex trauma, psychosis, medication caveats, neurodivergence, and social-risk contexts.

The held-out stress set is defined, but the latest stress response-quality artifact has not been summarized yet.
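
A composition check for the stress set described above could look like the sketch below. It assumes each question is a record with a `category` field, which is a hypothetical schema; the published stress-test JSON may be structured differently. The check only verifies the stated total of 100 questions and that every listed area is represented. It says nothing about response quality.

```python
from collections import Counter

EXPECTED_TOTAL = 100
CATEGORIES = {
    "eating disorders", "OCD", "addiction", "postpartum presentations",
    "complex trauma", "psychosis", "medication caveats",
    "neurodivergence", "social-risk contexts",
}


def check_stress_set(questions):
    """Return a list of problems; an empty list means the set matches the description."""
    problems = []
    if len(questions) != EXPECTED_TOTAL:
        problems.append(f"expected {EXPECTED_TOTAL} questions, found {len(questions)}")
    seen = set(Counter(q["category"] for q in questions))
    for missing in sorted(CATEGORIES - seen):
        problems.append(f"no questions for category: {missing}")
    for extra in sorted(seen - CATEGORIES):
        problems.append(f"unexpected category: {extra}")
    return problems
```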

Boundary

MoodSpan remains an educational research prototype. The strongest claim is measurement discipline: strict gates, artifacts, and known limits are visible before stronger clinical claims are considered.

Current results