
PHQ-9 (Patient Health Questionnaire-9): Scoring, Interpretation, and Clinical Use

Learn how the PHQ-9 screens for depression: what it measures, how it's scored, clinical validity, limitations, and how clinicians use results in practice.

Last updated: 2025-12-18 | Reviewed by MoodSpan Clinical Team

Medical Disclaimer: This content is for informational and educational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of a qualified health provider with any questions you may have regarding a medical condition.

What Is the PHQ-9?

The PHQ-9 (Patient Health Questionnaire-9) is a brief, self-report screening instrument designed to assess the presence and severity of depressive symptoms. It consists of nine items, each corresponding directly to one of the nine symptom criteria for major depressive disorder (MDD) as defined in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5-TR).

Developed by Drs. Robert L. Spitzer, Janet B.W. Williams, and Kurt Kroenke with funding from Pfizer Inc., the PHQ-9 was introduced in 2001 as part of the broader Patient Health Questionnaire system. It was originally derived from the PRIME-MD (Primary Care Evaluation of Mental Disorders) diagnostic instrument, with the goal of creating a tool that was short enough for routine clinical use but rigorous enough for meaningful clinical interpretation.

The PHQ-9 is one of the most widely used depression screening instruments in the world. It is employed across primary care, specialty mental health settings, research contexts, and population-level health surveys. Its brevity — typically completed in under five minutes — and its strong psychometric properties have made it a standard tool in evidence-based clinical practice.

The PHQ-9 is a screening and severity-monitoring tool, not a diagnostic instrument. A PHQ-9 score alone does not establish a diagnosis of major depressive disorder. Diagnosis requires a comprehensive clinical evaluation by a qualified professional, including assessment of differential diagnoses, medical conditions, substance use, and the full clinical picture.

What Does the PHQ-9 Measure?

Each of the nine items on the PHQ-9 directly maps onto one of the nine symptom criteria for major depressive disorder in the DSM-5-TR. The respondent is asked to rate how often they have been bothered by each symptom over the past two weeks. The nine domains assessed are:

  • Anhedonia — Little interest or pleasure in doing things
  • Depressed mood — Feeling down, depressed, or hopeless
  • Sleep disturbance — Trouble falling or staying asleep, or sleeping too much
  • Fatigue — Feeling tired or having little energy
  • Appetite changes — Poor appetite or overeating
  • Negative self-evaluation — Feeling bad about yourself, or that you are a failure, or have let yourself or your family down
  • Concentration difficulties — Trouble concentrating on things such as reading or watching television
  • Psychomotor changes — Moving or speaking so slowly that other people could have noticed, or the opposite — being so fidgety or restless that you have been moving around more than usual
  • Suicidal ideation — Thoughts that you would be better off dead, or of hurting yourself in some way

Each item is scored on a four-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). This structure allows the PHQ-9 to function both as a categorical screener (identifying possible cases of depression) and as a continuous measure of symptom severity.

An additional tenth item — not scored as part of the total — asks respondents to rate how difficult the endorsed symptoms have made it to do work, take care of things at home, or get along with other people. This functional impairment question provides important clinical context, as the DSM-5-TR requires clinically significant distress or functional impairment for a diagnosis of major depressive disorder.

Who Is the PHQ-9 Designed For?

The PHQ-9 was originally developed and validated for use in adult primary care populations. It has since been validated across a wide range of settings and populations, including:

  • Primary care and general medical settings — where it is most commonly used as a routine screening tool
  • Specialty mental health clinics — for intake assessment and ongoing symptom monitoring
  • Obstetric and perinatal care — for screening for perinatal depression (though the Edinburgh Postnatal Depression Scale is also frequently used in this context)
  • Geriatric populations — validated in older adults, though clinicians should be attentive to somatic symptom overlap with medical conditions
  • Chronic disease populations — including patients with diabetes, cardiovascular disease, cancer, and chronic pain, where comorbid depression is common
  • Research settings — as an outcome measure in clinical trials and epidemiological studies

For adolescents, a modified version called the PHQ-A (PHQ-Adolescent) has been developed and validated for use in individuals aged 12–17. The standard PHQ-9 is generally considered appropriate for adults aged 18 and older.

The PHQ-9 has been translated into over 80 languages and validated across diverse cultural and ethnic populations. However, clinicians should remain mindful that cultural factors influence the expression and reporting of depressive symptoms, and normative data may vary across populations.

The instrument is not designed to screen for bipolar depression, psychotic features, or other psychiatric conditions that can present with depressive symptoms. A positive screen should always prompt further clinical evaluation to rule out these and other differential diagnoses.

How Is the PHQ-9 Administered?

The PHQ-9 is designed for self-administration, meaning patients complete it on their own, typically on paper or through an electronic platform. It can also be administered verbally by a clinician or trained interviewer, which is useful for patients with low literacy, visual impairment, or cognitive difficulties.

Administration takes approximately 2–5 minutes, and scoring can be completed in under one minute. This brevity is one of the tool's greatest clinical strengths — it imposes minimal burden on both patients and clinical workflow.

Common administration contexts include:

  • Waiting room screening — completed on paper or tablet before an appointment
  • Electronic health record (EHR) integration — many EHR systems include the PHQ-9 as a standardized intake or follow-up measure
  • Telehealth and digital platforms — administered through patient portals or digital health applications
  • Serial monitoring — repeated at regular intervals (e.g., every 2–4 weeks during treatment) to track symptom change over time

The time frame referenced in the PHQ-9 is the previous two weeks, aligning with the DSM-5-TR requirement that symptoms of a major depressive episode persist for at least two weeks. This standardized window provides a consistent snapshot of recent symptom burden.

When administered as part of a broader clinical assessment, the PHQ-9 is often paired with other instruments, such as the GAD-7 (Generalized Anxiety Disorder-7) for anxiety screening, to capture common comorbid symptom patterns.

Scoring and Interpretation

Scoring the PHQ-9 is straightforward. Each of the nine items is rated from 0 to 3, yielding a total score range of 0 to 27. The widely accepted severity thresholds, established in the original validation studies by Kroenke, Spitzer, and Williams (2001), are:

  • 0–4: Minimal or no depression
  • 5–9: Mild depression
  • 10–14: Moderate depression
  • 15–19: Moderately severe depression
  • 20–27: Severe depression
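The scoring rule and severity bands above can be sketched in a few lines of Python. This is a minimal illustration, not clinical software: the function names `phq9_total` and `phq9_severity` and the example responses are hypothetical, and only the 0 to 3 item range and the band thresholds come from the text.

```python
from typing import Sequence

# Severity bands as listed above (Kroenke, Spitzer, and Williams, 2001).
SEVERITY_BANDS = [
    (0, 4, "minimal or none"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_total(items: Sequence[int]) -> int:
    """Validate the nine item responses (each 0-3) and return their sum."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a total score (0-27) to its severity band label."""
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total must be between 0 and 27")


# Hypothetical responses to Items 1-9, for illustration only.
responses = [2, 2, 1, 2, 1, 1, 1, 0, 0]
total = phq9_total(responses)
print(total, phq9_severity(total))  # 10 moderate
```

The tenth (functional impairment) item is deliberately left out, since it is not part of the total score.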

A score of 10 or above is the most commonly used clinical cutoff for identifying possible major depression and triggering further evaluation. At this threshold, the PHQ-9 demonstrates a sensitivity of approximately 88% and a specificity of approximately 88% for major depressive disorder, based on the original validation study.

Beyond the total score, the PHQ-9 can also be used with a diagnostic algorithm approach. Under this method, a provisional diagnosis of major depression is suggested when:

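  • Five or more of the nine items are endorsed at a score of 2 ("more than half the days") or 3 ("nearly every day"), with Item 9 (suicidal ideation) counting if it is endorsed at any level, including 1 ("several days"), and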
  • At least one of the endorsed items is Item 1 (anhedonia) or Item 2 (depressed mood)

This algorithm mirrors the DSM-5-TR symptom count requirement for a major depressive episode. However, research has shown that the total severity score approach tends to have better sensitivity than the algorithm method and is more commonly used in clinical practice.
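The algorithm approach can be sketched as follows. The function name and the example responses are hypothetical. One detail is an explicit assumption exposed as a flag: the published PHQ-9 scoring instructions count Item 9 as present if it is endorsed at any level, so `item9_counts_if_any` defaults to `True`; setting it to `False` applies the same threshold of 2 to all nine items.

```python
from typing import Sequence


def phq9_algorithm_positive(
    items: Sequence[int], item9_counts_if_any: bool = True
) -> bool:
    """Return True if the responses meet the provisional algorithm criteria."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")

    # An item counts as endorsed at "more than half the days" (2) or higher.
    endorsed = [r >= 2 for r in items]

    # Assumption (see lead-in): Item 9 counts if present at all.
    if item9_counts_if_any:
        endorsed[8] = items[8] >= 1

    # At least one endorsed item must be Item 1 (anhedonia) or Item 2 (depressed mood).
    has_cardinal_symptom = endorsed[0] or endorsed[1]
    return sum(endorsed) >= 5 and has_cardinal_symptom


# Hypothetical responses: four items at 2 or above, plus Item 9 at 1.
responses = [2, 3, 2, 2, 0, 0, 0, 0, 1]
print(phq9_algorithm_positive(responses))                             # True
print(phq9_algorithm_positive(responses, item9_counts_if_any=False))  # False
```

A positive result here is only a prompt for clinical evaluation, in the same way as a total score at or above the cutoff.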

Item 9 — suicidal ideation — requires special clinical attention. Any positive response to this item (a score of 1, 2, or 3) should prompt immediate follow-up, including a more detailed suicide risk assessment. A positive response does not necessarily indicate imminent danger, but it signals that further evaluation is clinically necessary.
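Because Item 9 is handled separately from the total, an electronic scoring workflow would typically check it on its own. The sketch below is a hypothetical illustration (the function name and responses are invented) of why the check cannot be derived from the total score: a respondent in the minimal range can still have a positive Item 9.

```python
from typing import Sequence


def item9_positive(items: Sequence[int]) -> bool:
    """True if Item 9 (index 8) was endorsed at any level (1, 2, or 3)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    return items[8] >= 1


# Hypothetical responses: the total is only 3, yet Item 9 is endorsed.
responses = [1, 1, 0, 0, 0, 0, 0, 0, 1]
print(sum(responses), item9_positive(responses))  # 3 True
```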

For treatment monitoring, a change of 5 or more points on the PHQ-9 is generally considered clinically meaningful. A score reduction to below 5 is often used as a benchmark for remission in treatment outcome research.

Clinical Validity and Reliability

The PHQ-9 has one of the strongest evidence bases of any depression screening instrument. Its psychometric properties have been extensively studied across diverse populations and clinical settings.

Reliability:

  • Internal consistency is excellent, with Cronbach's alpha values typically ranging from 0.86 to 0.89 across validation studies. This indicates that the nine items reliably measure a coherent underlying construct.
  • Test-retest reliability is strong, with intraclass correlation coefficients generally exceeding 0.80 when the measure is administered within a short interval without intervening treatment changes.

Validity:

  • Criterion validity — When compared against structured clinical interviews (such as the SCID — Structured Clinical Interview for DSM Disorders), the PHQ-9 demonstrates strong performance. At a cutoff of 10, sensitivity and specificity for major depression are both approximately 88%, as established in the landmark Kroenke et al. (2001) validation study.
  • Construct validity — PHQ-9 scores correlate strongly with other validated depression measures, including the Beck Depression Inventory-II (BDI-II) and the Hamilton Rating Scale for Depression (HRSD). Scores also correlate with functional impairment, disability days, and healthcare utilization, supporting its construct validity.
  • Responsiveness to change — The PHQ-9 is sensitive to changes in depression severity over time, making it well-suited for tracking treatment response. This responsiveness has been demonstrated in pharmacotherapy trials, psychotherapy studies, and collaborative care interventions.

A large individual participant data meta-analysis published in BMJ (Levis et al., 2019), pooling 58 studies with more than 17,000 participants, confirmed that the PHQ-9 performs well across diverse settings, though it noted that diagnostic accuracy varies somewhat depending on the clinical context and reference standard used.

The U.S. Preventive Services Task Force (USPSTF) recommends screening for depression in the general adult population, and the PHQ-9 is one of the instruments it cites as appropriate for this purpose.

Limitations of the PHQ-9

Despite its strengths, the PHQ-9 has important limitations that clinicians and patients should understand:

  • It is a screening tool, not a diagnostic instrument. A high PHQ-9 score indicates the presence of depressive symptoms but does not establish a clinical diagnosis. Symptoms of depression overlap with medical conditions (e.g., hypothyroidism, anemia, sleep disorders), medication side effects, substance use, grief, adjustment reactions, and other psychiatric conditions such as bipolar disorder or PTSD. A comprehensive clinical evaluation is always necessary.
  • It does not assess all relevant clinical features. The PHQ-9 does not evaluate the duration of individual episodes, the pattern of recurrence, the presence of manic or hypomanic episodes, psychotic features, or contextual factors that are critical to accurate diagnosis and treatment planning.
  • Self-report bias. Like all self-report instruments, the PHQ-9 is subject to response biases, including social desirability, minimization, or exaggeration of symptoms. Patients experiencing severe depression may underreport due to hopelessness or cognitive impairment, while others may over-endorse symptoms to communicate distress.
  • Somatic symptom overlap. Several PHQ-9 items (fatigue, appetite changes, sleep disturbance, psychomotor changes) assess symptoms that are common in medical illness. In medically complex populations — such as patients with cancer, chronic pain, or heart failure — elevated PHQ-9 scores may partly reflect physical illness rather than depression, potentially reducing specificity.
  • Cultural considerations. While the PHQ-9 has been translated and validated in many languages, the expression and interpretation of depressive symptoms vary across cultures. Some populations may emphasize somatic rather than emotional symptoms, and certain items may carry different connotations across cultural contexts.
  • False positives. At the commonly used cutoff of 10, the PHQ-9 generates a meaningful rate of false positives — individuals who screen positive but do not meet criteria for major depression upon structured diagnostic interview. This underscores the importance of treating the PHQ-9 as a first-step screen rather than a definitive assessment.
  • Item 9 limitations. While the suicidal ideation item provides valuable clinical information, it is a single question and is not sufficient for a comprehensive suicide risk assessment. Any positive endorsement requires further evaluation with validated suicide-specific tools and clinical judgment.
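The false-positive limitation above can be made concrete with Bayes' rule. The sketch below uses the sensitivity and specificity of approximately 88% reported at a cutoff of 10; the 10% prevalence is a purely hypothetical value chosen for illustration, and the function name is invented.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability that a positive screen is a true case (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# 0.88 / 0.88 come from the text; prevalence = 0.10 is an assumed example value.
ppv = positive_predictive_value(0.88, 0.88, prevalence=0.10)
print(f"{ppv:.2f}")  # 0.45
```

Under these assumed inputs, fewer than half of positive screens would correspond to major depression on diagnostic interview, and the proportion shifts with the prevalence in the population being screened.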

How Results Are Used in Clinical Practice

Clinicians use PHQ-9 results in several interconnected ways:

1. Screening and Case Identification

In primary care, the PHQ-9 is frequently used as a universal or targeted screen. Patients scoring at or above the threshold of 10 are flagged for further evaluation. Many health systems incorporate the PHQ-9 into annual wellness visits, new patient intakes, or visits for populations at elevated risk (e.g., patients with chronic illness, postpartum individuals, older adults).

2. Severity Assessment and Treatment Planning

The severity categories help guide clinical decision-making. General practice patterns informed by the PHQ-9 include:

  • Scores of 5–9 (mild): Watchful waiting, psychoeducation, lifestyle interventions, and repeat screening in 2–4 weeks may be appropriate. Some patients may benefit from brief psychotherapeutic interventions.
  • Scores of 10–14 (moderate): Clinical evaluation for major depression; consideration of psychotherapy, pharmacotherapy, or both, depending on patient preference and clinical context.
  • Scores of 15–19 (moderately severe): Active treatment is typically indicated, and a combination of pharmacotherapy and psychotherapy is often recommended.
  • Scores of 20–27 (severe): Prompt treatment with combined approaches; consideration of specialty mental health referral; close monitoring, including assessment of safety.

3. Treatment Monitoring and Measurement-Based Care

One of the PHQ-9's most valuable applications is in measurement-based care (MBC) — the systematic use of standardized measures to track treatment response and guide clinical decisions. When administered at regular intervals (e.g., every 2–4 weeks), the PHQ-9 provides objective data about whether symptoms are improving, stable, or worsening. Research consistently demonstrates that measurement-based care improves treatment outcomes compared to clinical judgment alone.

A clinically significant improvement is typically defined as a reduction of 5 or more points. Treatment response is often defined as a 50% or greater reduction from baseline. Remission is generally benchmarked at a score of less than 5.
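The three benchmarks in the paragraph above can be expressed as a small comparison between a baseline score and a follow-up score. This is a minimal sketch; the function name, dictionary keys, and example scores are hypothetical.

```python
def classify_change(baseline: int, current: int) -> dict:
    """Compare two PHQ-9 totals against the benchmarks described above."""
    for score in (baseline, current):
        if not 0 <= score <= 27:
            raise ValueError("PHQ-9 totals must be between 0 and 27")

    points_dropped = baseline - current
    return {
        "points_dropped": points_dropped,
        # Reduction of 5 or more points.
        "significant_improvement": points_dropped >= 5,
        # Reduction of 50% or more from baseline (undefined for a baseline of 0).
        "response": baseline > 0 and 2 * points_dropped >= baseline,
        # Current score below 5.
        "remission": current < 5,
    }


# Hypothetical scores: baseline 18, follow-up 8.
print(classify_change(18, 8))
```

With a baseline of 18 and a follow-up of 8, the drop of 10 points meets the improvement and response benchmarks but not remission.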

4. Collaborative Care Models

The PHQ-9 is a foundational tool in the Collaborative Care Model (CoCM), an evidence-based approach to integrating behavioral health into primary care. In this model, care managers use PHQ-9 scores to track a patient registry, identify patients who are not responding to treatment, and facilitate psychiatric consultation for treatment adjustments.

5. Population Health and Quality Metrics

Many healthcare systems and payers use PHQ-9 scores as quality metrics. For example, the Healthcare Effectiveness Data and Information Set (HEDIS) includes depression screening and follow-up measures that rely on the PHQ-9. These metrics are used to evaluate the quality of depression care at the system level.

Where to Access the PHQ-9

The PHQ-9 is in the public domain. No permission is required to reproduce, translate, display, or distribute it. There are no licensing fees or copyright restrictions. This open accessibility is one of the reasons it has become so widely adopted globally.

Reliable sources for accessing the PHQ-9 include:

  • The original publisher — Pfizer, which funded the development, has made the instrument freely available. It can be found through the PHQ Screeners website (phqscreeners.com).
  • The American Psychological Association (APA) — Provides links to validated versions of the PHQ-9.
  • Medical education and clinical reference platforms — Sites such as MDCalc provide the PHQ-9 with built-in scoring calculators.
  • Electronic health record systems — Many EHR platforms (e.g., Epic, Cerner) include the PHQ-9 as a built-in clinical tool.
  • Translated versions — Validated translations in over 80 languages are available through the PHQ Screeners website and academic publications.

Related instruments in the PHQ family include:

  • PHQ-2 — An ultra-brief two-item screener using only the first two PHQ-9 items (anhedonia and depressed mood). Often used as a first-stage screen; patients who score 3 or higher are then given the full PHQ-9.
  • PHQ-A — The adolescent version validated for ages 12–17.
  • PHQ-15 — A somatic symptom severity scale.
  • PHQ-SADS — A combined instrument assessing depression, anxiety, and somatic symptoms.

When to Seek Professional Help

If you have completed the PHQ-9 and your responses suggest patterns consistent with moderate, moderately severe, or severe depressive symptoms — particularly if you endorsed Item 9 regarding thoughts of death or self-harm — it is important to seek evaluation from a qualified healthcare provider. This includes primary care physicians, psychiatrists, psychologists, licensed clinical social workers, and other licensed mental health professionals.

Depression is a highly treatable condition. Effective, evidence-based treatments include several forms of psychotherapy (such as cognitive behavioral therapy and interpersonal therapy), pharmacotherapy, and combined approaches. Early identification and intervention are associated with better outcomes.

Even if your symptoms appear mild, persistent low mood, loss of interest, or difficulty functioning in daily life warrant professional attention. A clinician can provide a thorough evaluation, consider differential diagnoses, and develop a personalized treatment plan.

If you are in crisis or having thoughts of suicide, contact the 988 Suicide and Crisis Lifeline by calling or texting 988 (in the United States), go to your nearest emergency department, or call emergency services.

Frequently Asked Questions

What is a good score on the PHQ-9?

A PHQ-9 score of 0–4 falls in the minimal depression range, suggesting few or no significant depressive symptoms. Scores of 5–9 indicate mild symptoms. Lower scores generally reflect better mental health, but any score should be interpreted within the full clinical context by a qualified professional.

Is the PHQ-9 enough to diagnose depression?

No. The PHQ-9 is a screening tool that identifies symptoms consistent with depression, but it cannot establish a clinical diagnosis on its own. A diagnosis of major depressive disorder requires a comprehensive evaluation by a qualified clinician, including assessment of symptom duration, differential diagnoses, and functional impairment.

How often should the PHQ-9 be administered?

In treatment settings, the PHQ-9 is commonly administered every 2–4 weeks to track symptom changes over time. For routine screening in primary care, it may be administered annually or at initial intake. The frequency depends on clinical context and the purpose of assessment.

Can I take the PHQ-9 online for free?

Yes. The PHQ-9 is in the public domain and available at no cost. It can be accessed through sites like phqscreeners.com, MDCalc, and many healthcare provider portals. However, completing it on your own does not substitute for professional evaluation — results should ideally be reviewed with a clinician.

What does it mean if I scored high on Item 9 about self-harm?

Any positive response to Item 9 — which asks about thoughts of being better off dead or hurting yourself — should be taken seriously and discussed with a healthcare provider promptly. A positive score on this item does not necessarily mean you are in immediate danger, but it signals a need for further safety assessment. If you are in crisis, contact the 988 Suicide and Crisis Lifeline.

What is the difference between the PHQ-2 and PHQ-9?

The PHQ-2 is an ultra-brief screener that uses only the first two items of the PHQ-9: anhedonia (loss of interest) and depressed mood. It is often used as a quick first-step screen. If a patient scores 3 or higher on the PHQ-2, the full PHQ-9 is typically administered for a more detailed assessment of depressive symptom severity.

Can the PHQ-9 detect bipolar disorder?

No. The PHQ-9 assesses depressive symptoms only and does not screen for manic or hypomanic episodes, which are required for a diagnosis of bipolar disorder. Using the PHQ-9 alone in someone with unrecognized bipolar disorder could lead to an incomplete clinical picture. A comprehensive diagnostic evaluation is necessary to distinguish unipolar depression from bipolar disorder.

Is a PHQ-9 score of 10 considered depressed?

A score of 10 is the most commonly used cutoff for identifying clinically significant depressive symptoms that warrant further evaluation. It falls at the lower boundary of the moderate depression range. However, a score of 10 does not automatically mean a person has major depressive disorder — it means a thorough clinical assessment is recommended.

Sources & References

  1. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine. 2001;16(9):606-613.
  2. Levis B, Benedetti A, Thombs BD, et al. Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis. BMJ. 2019;365:l1476.
  3. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR). Washington, DC: American Psychiatric Publishing; 2022.
  4. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. JAMA. 1999;282(18):1737-1744.
  5. US Preventive Services Task Force. Screening for Depression in Adults: US Preventive Services Task Force Recommendation Statement. JAMA. 2023;329(23):2057-2067.
  6. Kroenke K, Spitzer RL, Williams JBW, Löwe B. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. General Hospital Psychiatry. 2010;32(4):345-359.