Improving The Quality Of Quality Metrics

This article first appeared in Health Affairs. It is co-authored with Aditya Narayan and Nirav R. Shah.

The evolution of health care quality metrics over recent decades illustrates an enduring commitment to enhancing patient outcomes, refining the structure of health care delivery, and containing health care costs. These efforts are not without complications, however: the current landscape of quality metrics has produced an overabundance of instruments, measurement modalities, and competing priorities.

Consider the National Quality Measures Clearinghouse, which enumerated more than 2,500 performance measures—demonstrating the escalating focus on quantifiable indices of health care quality. Broadly, these measures range from outcome measures, such as the risk-adjusted rate of in-hospital hip fracture among patients older than age 65, to process measures, such as the percentage of patients with stable coronary artery disease who were provided lipid-lowering therapy. The economic ramifications are immense: as of 2016, US physician practices spent $15.4 billion annually on adherence to quality measures, with likely year-on-year increases (Note 1). Concerningly, health care quality measures often lag behind updates in the evidence justifying their implementation. Moreover, transient metrics driven by shifting reimbursement policies complicate the development of cumulative science and longitudinal evaluation. (These shifting metrics may, however, improve true quality by limiting the ability of health systems to fixate on “gaming” metrics at the margin to maximize financial endpoints.)

The landscape of health care quality measures is dynamic, reflecting efforts to enhance patient care, safety, and outcomes. These measures are developed and maintained by a variety of stakeholders, including government agencies such as the Centers for Medicare and Medicaid Services (CMS), health care organizations, and independent bodies such as the National Quality Forum and the Agency for Healthcare Research and Quality (AHRQ). The development process involves rigorous research, stakeholder engagement, and consensus-building to ensure measures are evidence-based, applicable across different health care settings, and meaningful to patient care. For example, in 2017, CMS introduced the Meaningful Measures Initiative, aimed at identifying high priorities for improving patient care through quality measurement and improvement efforts. This initiative focuses on streamlining the measure development process and making it more responsive to the needs of patients and health care providers—ultimately reducing the number of measures by 18 percent and saving an estimated $128 million.

Designing effective quality measures involves inherent trade-offs between validity and reliability. A highly reliable measure may not capture nuances of patient care (thus compromising validity), while a highly valid measure might be too specific or complex to yield consistent results across different settings (affecting reliability). For instance, consider the metric of 30-day hospital readmission rates, which is widely used as a quality measure to assess the effectiveness of hospital care and transitions to other care settings. This measure is highly reliable because it offers a clear, quantifiable outcome that can be consistently tracked across different hospitals. However, its validity can be questioned given that some readmissions are planned as part of optimal patient care or are unrelated to the initial admission, thus not accurately reflecting the quality of hospital care or discharge planning.
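The readmission example above can be made concrete with a small sketch. The snippet below is purely illustrative (the patient records, field names, and the simple 30-day rule are hypothetical assumptions, not any official measure specification); it shows how a naive readmission rate differs from one that excludes planned readmissions, which is the validity concern described above.

```python
from datetime import date

# Hypothetical admission records: (patient_id, admit_date, discharge_date, planned)
# All data here is invented for illustration.
admissions = [
    ("A", date(2024, 1, 1), date(2024, 1, 5), False),
    ("A", date(2024, 1, 20), date(2024, 1, 22), False),  # unplanned readmit, 15 days later
    ("B", date(2024, 2, 1), date(2024, 2, 3), False),
    ("B", date(2024, 2, 15), date(2024, 2, 16), True),   # planned readmit (e.g., staged procedure)
    ("C", date(2024, 3, 1), date(2024, 3, 4), False),
]

def readmission_rate(records, exclude_planned):
    """Share of discharges followed by a readmission within 30 days."""
    by_patient = {}
    for pid, admit, discharge, planned in records:
        by_patient.setdefault(pid, []).append((admit, discharge, planned))
    index_stays = 0
    readmitted = 0
    for stays in by_patient.values():
        stays.sort()  # chronological order by admit date
        for i, (_, discharge, _) in enumerate(stays):
            index_stays += 1
            if i + 1 >= len(stays):
                continue  # no subsequent stay for this patient
            next_admit, _, next_planned = stays[i + 1]
            within_30 = (next_admit - discharge).days <= 30
            if within_30 and not (exclude_planned and next_planned):
                readmitted += 1
    return readmitted / index_stays

# The naive rate counts patient B's planned readmission; the adjusted rate does not.
naive = readmission_rate(admissions, exclude_planned=False)      # 2 of 5 discharges
adjusted = readmission_rate(admissions, exclude_planned=True)    # 1 of 5 discharges
```

On this toy data, the naive rate (0.4) nearly doubles the adjusted rate (0.2), illustrating how a highly reliable, easily computed measure can still misrepresent care quality when the exclusion logic is left out.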

Additionally, each clinical situation demands a tailored approach to measurement. One recurring decision is whether to rely on intrinsic motivation or to implement an explicit measure that is imperfect and carries potential negative externalities. Opioid stewardship illustrates the trade-off: fostering intrinsic motivation among health care providers versus implementing explicit measures, such as tracking the rate of appropriate opioid prescriptions for chronic pain. While intrinsic motivation relies on education and professional judgment without direct metrics, explicit measurements offer clear accountability but may lead to unintended consequences, such as clinicians feeling pressured to reduce opioid prescriptions for patients in genuine need.

Amidst this crowded landscape, certain instruments such as the Consumer Assessment of Healthcare Providers and Systems (CAHPS), 36-Item Short Form Health Survey (SF-36), Patient-Reported Outcomes Measurement Information System (PROMIS), and Healthcare Effectiveness Data and Information Set (HEDIS) have achieved widespread adoption in measuring diverse health care quality domains, symbolizing strides toward standardized, reproducible outcome metrics. This Forefront article synthesizes lessons from widely adopted metrics to inform future quality metric design and development.

Best Practices In Health Care Quality Measurement

At the core of health care quality improvement are the leading instruments that have emerged due to their comprehensive approach to measuring patient care outcomes, experiences, and system performance. We have chosen CAHPS, SF-36, PROMIS, and HEDIS as exemplars because they collectively offer a broad perspective on the multifaceted nature of health care quality, ranging from patient satisfaction and health-related quality of life to the effectiveness of care and service delivery. Each instrument brings a unique lens through which health care quality is assessed, providing valuable insights for health care providers, patients, and policy makers alike. It is important to note that while these instruments offer overlapping question sets, they differ significantly in terms of their target groups, development processes, and the specific health care quality aspects they measure. Accordingly, we seek to describe lessons learned regarding the development and adoption of these tools.

In the context of seeking standardized metrics, the AHRQ created the CAHPS program in 1995 to enhance the understanding of patient experience in health care. CAHPS specifically centers on patient needs, including effective provider communication, service accessibility, and respect shown by providers. CAHPS has seen widespread adoption, with the 2023 CAHPS Health Plan Survey Database reporting results from more than 500 Medicaid and Children’s Health Insurance Program plans. This adoption is attributable to the actionable feedback CAHPS gives health care providers for quality improvement, along with its utility in public reporting and health care consumer choice. In particular, CMS uses CAHPS as an evaluative measure within the Medicare Advantage (MA) star ratings program, which in turn influences patient enrollment, pay-for-performance frameworks, public reporting efforts, and provisions of the Affordable Care Act (ACA), including the Hospital Value-Based Purchasing program, where CAHPS scores help calculate value-based incentive payments for hospitals. Public reporting of CAHPS results on platforms such as the CMS Care Compare website has created a competitive advantage for highly rated health care entities as patients seek higher-rated plans, and has been shown to support patient self-management and provider referrals. The successful implementation of CAHPS therefore underscores the importance of developing metrics that are actionable and aligned with economic incentives.

The SF-36 was conceived in 1992 by the RAND Corporation as part of the Medical Outcomes Study, evolving from a more extensive survey from the 1980s, with the aim of elucidating patient outcomes across a myriad of medical conditions and treatment settings. This instrument furnishes a comprehensive yet widely applicable measure of health-related quality of life, encompassing physical, mental, and social domains. The SF-36 has been extensively validated and used to measure health-related quality of life across 50 countries as part of the International Quality of Life Assessment (IQOLA) Project, which aimed to support the use of patient-reported outcome measures across different countries and cultures, ensuring that the SF-36 could be reliably applied in multinational clinical trials. Noteworthy adaptations by health systems, including the Veterans Affairs version (the Veterans RAND 36-Item Health Survey, VR-36), have further increased its adoption. The SF-36 epitomizes the significance of crafting versatile, easy-to-administer, and well-validated instruments that can gauge health outcomes across a wide spectrum of patient populations.

Another cornerstone in the landscape of quality metrics is HEDIS, which traces its inception to 1991. Initially designated the “HMO Employer Data and Information Set,” the nomenclature evolved to “Health Plan Employer Data and Information Set” in 1993’s Version 2.0 and transitioned to HEDIS by Version 3.0 in 1997. HEDIS allows health plans to assess performance on dimensions of care and service, thereby facilitating the appraisal of health care quality. Of note, it is used by more than 90 percent of US health plans, and this extensive adoption enables granular comparisons across plans. The National Committee for Quality Assurance (NCQA) orchestrates annual updates to HEDIS, ensuring its alignment with contemporary health care standards. Widespread adoption is attributable to the NCQA’s use of HEDIS for health plan accreditation.

Despite the wide array of extant instruments, capturing inherently subjective measures of health such as pain intensity or emotional well-being in care settings can pose a challenge. To this end, PROMIS emerged from the National Institutes of Health Roadmap initiative, which sought to fund innovative research tools advancing public health, aiming to create an efficient mechanism for evaluating self-reported health outcomes. The goal of PROMIS is to standardize patient outcome reporting, as many disparate approaches had been promulgated with little ability to compare results across studies or health care settings. PROMIS leverages modern technology and psychometrics to reliably capture patient-reported outcome measures spanning physical health, emotional health, and social well-being across various chronic diseases. Its widespread adoption—as evidenced by more than 400 publications involving PROMIS—can be attributed to its ability to measure core domains of health across diverse clinical and research settings.

Lessons Learned

Reflecting on these developments, several key lessons can be drawn to inform future quality metric development. First, metrics should be reliable, valid, and capable of capturing health care’s intrinsic complexity. For instance, “optimal start time” for dialysis illustrates how a simple measure can encapsulate the multifaceted interactions patients have within the health care system. “Optimal start time” aims to quantify the efficiency of coordinating multiple stakeholders (primary care physician, nephrologist, and surgeon) and steps (recognizing the need for dialysis, coordinating catheter placement, educating the patient) involved in initiating a patient on dialysis. Essentially, this measure integrates various time-bound actions and coordination among stakeholders, reflecting the health care system’s efficiency and illustrating the need for metrics to capture parallel workstreams.

Second, while financial incentives are significant—as in the case of CAHPS’s use in MA star ratings and ACA plans—they must be balanced with appropriate implementation studies. One of the authors (Shah) previously spearheaded public investments in patient-centered medical homes (PCMHs) in New York State, leading to increased uptake of the model. However, later evidence suggested that the PCMH model was not reaching desired clinical or public health endpoints. Accordingly, the timing and manner of introducing financial incentives for these measures warrant significant consideration. Should financial incentives be applied immediately to motivate swift changes in health care practices (for example, toward PCMH adoption and attaining a large enough sample size to see effects on public health), or is it more prudent to wait, observe, and analyze the potential impacts before taking action?

Third, technological involvement is crucial, not only in rethinking which quality metrics are measured but also in overhauling the labor-intensive measurement process itself. The evolution of PROMIS underscores this, as many patient-reported outcome measures are designed for in-office evaluations. With the advent of mobile phones, remote monitoring of biometrics, and digital health, we can instead rethink the way we gather patient-reported outcomes to better reflect the ways in which individuals operate in their daily lives. Finally, employing novel technologies such as large language models may significantly reduce the burden of the measurement process while also facilitating innovative approaches to metric design by integrating qualitative patient narratives into more valid and reliable metrics for specific clinical scenarios.

Reflecting on the advancements brought forth by CAHPS, SF-36, PROMIS, and HEDIS provides insight into the formulation of future quality metrics aimed at enhancing patient-centered care and health care outcomes. Collectively, these insights advocate for a multidimensional approach for future quality metric design and adoption with the adaptability of metrics, the incorporation of technology, and an inclusive understanding of the health care environment as its core principles.

Note 1

While specific data on year-on-year increases since then is not explicitly available, the expectation of rising costs is based on trends such as the growing number of quality measures and the escalating complexity of compliance.
