Relationship between FEV1 change and patient-reported outcomes in randomised trials of inhaled bronchodilators for stable COPD: a systematic review

Background: Interactions between spirometry and patient-reported outcomes in COPD are not well understood. This systematic review and study-level analysis investigated the relationship between changes in FEV1 and changes in health status with bronchodilator therapy.
Methods: Six databases (to October 2009) were searched to identify studies with long-acting bronchodilator therapy reporting FEV1 and health status, dyspnoea or exacerbations. Means and standard deviations of treatment effects were extracted for each arm of each study. Relationships between changes in trough FEV1 and outcomes were assessed using correlations and random-effects regression modelling. The primary outcome was St George's Respiratory Questionnaire (SGRQ) total score.
Results: Thirty-six studies (≥3 months) were included. Twenty-two studies (23,654 patients) with 49 treatment arms, each contributing one data point, provided SGRQ data. Change in trough FEV1 and change in SGRQ total score were negatively correlated (r = -0.46, p < 0.001); greater increases in FEV1 were associated with greater reductions (improvements) in SGRQ. The correlation strengthened with increasing study duration from 3 to 12 months. Regression modelling indicated that a 100 mL increase in FEV1 (the change at which patients are more likely to report improvement) was associated with a statistically significant reduction in SGRQ of 2.5 (95% CI 1.9, 3.1), while a clinically relevant SGRQ change (4.0) was associated with a 160.6 (95% CI 129.0, 211.6) mL increase in FEV1. The association between change in FEV1 and other patient-reported outcomes was generally weak.
Conclusions: Our analyses indicate, at a study level, that improvement in mean trough FEV1 is associated with proportional improvements in health status.


Introduction
Chronic obstructive pulmonary disease (COPD) is a complex, chronic condition, which is characterised by progressive airflow limitation that is not fully reversible. The major symptoms of COPD, such as dyspnoea, cough and sputum production, are disabling and have substantial impact on both patients' health status and the health care system [1,2]. Although treatment involves several approaches, bronchodilator medications are central to the management of COPD, improving both lung function and symptoms [1].
The complex nature of COPD means that it is important to assess treatment effectiveness in terms of patient-reported outcomes, including symptoms or health status scores [3]. Clinicians and policy makers have recognised the importance of measuring health status, in order to make informed patient management and policy decisions [4], and clinician-led guidelines recommend this approach for COPD [1,2]. However, regulatory authorities continue to emphasise airflow obstruction, measured by spirometry, as the primary outcome required for registration trials of new bronchodilators. It is therefore relevant to establish if and how changes in lung function may translate into patient-reported outcomes.
Although primary studies with bronchodilators frequently report both spirometry and patient-reported outcomes, the relationships between outcome measures are poorly understood. A study by Stahl et al., published in 2001, showed weak correlations between the St George's Respiratory Questionnaire (SGRQ) and cough, breathlessness, forced expiratory volume in 1 second (FEV1) and walking distance, but reported only limited supporting patient-level data [5]. Study-level meta-analysis is a meaningful and cost-effective approach to addressing a clinical research question, particularly where individual patient data are difficult to obtain [6]. We are unaware of any study-level analysis which has specifically addressed how lung function is related to outcomes.
The present study was a systematic review of randomised controlled trials (RCTs) of inhaled bronchodilators in adult patients with stable COPD, which reported change in trough FEV1, the primary physiological outcome in most studies of long-acting bronchodilators, alongside patient-reported outcomes. The primary objective was to assess at a study level the relationship between FEV1 change and health status change, as measured by the SGRQ, and to estimate the increase in mean FEV1 associated with a clinically important improvement in health status. As secondary objectives, we assessed the relationship between change in FEV1 and SGRQ domains, the influence of study duration, and the relationship between change in FEV1 and change in other patient-reported outcomes, such as dyspnoea, as measured by the Transition Dyspnea Index (TDI), and COPD exacerbations.

Search strategy
We sought all relevant trials regardless of language or publication status (published, unpublished, in press, and in progress). Six databases were searched (to October 2009). Search strategies with keywords were developed specifically for each database; the search strategy for MEDLINE is provided in Additional file 1. In addition, databases of completed and ongoing trials such as ClinicalTrials.gov, websites of licensing agencies, the Guidelines International Network and worldwide HTA were searched, and references in retrieved articles and systematic reviews were checked.

Selection criteria
Our selection criteria included published and unpublished parallel RCTs of ≥12 weeks' duration. Non-RCTs were excluded, given that RCTs represent the most robust level of efficacy evidence, especially for outcomes reported by patients. Studies had to include COPD patients (according to any definition) aged ≥35 years with stable disease (no exacerbations for at least 4 weeks prior to study entry or 'stable COPD' as an inclusion criterion), chronic bronchitis (excluding acute/spastic bronchitis), or emphysema. Trials which recruited mixed populations (e.g. asthma and COPD) were excluded, unless separate data were reported for COPD patients.
We included all studies that had intervention treatment arms using a long-acting inhaled bronchodilator treatment as monotherapy for stable COPD, e.g. long-acting β2-agonists (LABA), long-acting muscarinic antagonists (LAMA), LABA + LAMA combinations, methylxanthines and placebo, thus limiting the analysis to drugs with similar pharmacodynamic properties. The comparator treatment could include a placebo or any of the interventions listed above. Short-acting treatment arms were excluded. Studies had to report change in trough FEV1 from baseline and at least one patient-reported outcome (health status [SGRQ], exacerbations or dyspnoea [TDI]). Trough FEV1 was extracted as reported in the primary studies. Although there was some variation in details provided, this was usually defined as the measurement of FEV1 taken before the first morning dose. Both the SGRQ and TDI are disease-specific questionnaires. The SGRQ consists of three domains (Symptoms, Activity and Impacts) and a Total score which provides values between 0 and 100. Higher values correspond to greater impairment, with a 4 unit change in total score considered to be the minimal clinically important difference (MCID) [7]. The TDI represents a change from baseline and provides values between -9 and 9 with positive values indicating improvement and a 1 unit change representing the MCID [8].

Trial selection, data extraction and quality assessment
Two reviewers (MW and GW) independently inspected the abstract of each reference identified to determine potential relevance. For potentially relevant articles, or in cases of disagreement, the full article was obtained, independently inspected, and inclusion criteria applied. Any disagreement was resolved through discussion and checked by a third reviewer. Data for each study were extracted by one reviewer and checked for accuracy by a second reviewer, using a standardised data extraction sheet. Any disagreements were resolved by consensus. Baseline and endpoint data were extracted where available; otherwise, change from baseline data were extracted. Outcome data were extracted for all available time points. If studies did not report numerical data, values were estimated from graphs, and standard deviations were imputed using weighted averages from other studies which included the same drug comparison and time point, in line with recommended methodology [9].
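The imputation of missing standard deviations can be sketched as follows. This is a minimal illustration, not the review's actual procedure: it assumes the weights are the arm sample sizes (the text specifies only "weighted averages"), and the function name and the example values are hypothetical.

```python
def impute_sd(known_arms):
    """Impute a missing standard deviation as a weighted average.

    known_arms: list of (n_patients, sd) pairs taken from other studies with
    the same drug comparison and time point.
    ASSUMPTION: weights are the arm sample sizes; the review does not state
    which weights were used.
    """
    total_n = sum(n for n, _ in known_arms)
    if total_n <= 0:
        raise ValueError("at least one arm with a positive sample size is required")
    return sum(n * sd for n, sd in known_arms) / total_n


# Hypothetical SGRQ change-score SDs from two other studies (not real data).
imputed_sd = impute_sd([(100, 12.0), (300, 16.0)])
print(f"imputed SD: {imputed_sd:.1f}")
```

Weighting by sample size gives larger studies more influence on the imputed value; other weighting schemes would be equally compatible with the description above.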
Quality assessment was carried out by one reviewer, using the Cochrane Collaboration quality assessment checklist, and checked for accuracy by a second reviewer. Any disagreements were resolved by consensus. Results are summarised in Additional file 2.

Data analysis
The relationship between mean changes in FEV1 and mean changes in SGRQ scores for each treatment arm from each study was assessed visually using scatter plots. Plots were constructed for SGRQ total score and SGRQ domains (Symptoms, Activity and Impacts) at any time point measured; where studies reported multiple time points, only data for the 6 month time point (the most frequently measured time point across studies) were used for analyses that include all time points. For the relationship between changes in FEV1 and SGRQ total score, separate plots were constructed for the 3, 6 and 12 month time points. Pearson correlation coefficients were calculated and regression lines from a simple linear regression model were added to each plot. These were used to estimate the mean change in FEV1 corresponding to 3- and 4-unit changes in SGRQ, and the mean change in SGRQ score associated with a 100 mL increase in FEV1 (the magnitude of change in FEV1 at which patients are more likely to report improvement in an important clinical parameter such as health status) [10].
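The correlation and simple linear regression steps described above can be sketched as follows. The arm-level values are synthetic and chosen only for illustration; they are not data from the included studies, and the function names are hypothetical. Reading the FEV1 change for a given SGRQ change off the fitted line (intercept included) is one plausible interpretation of the approach.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fev1_change_for(sgrq_change, intercept, slope):
    """FEV1 change (mL) at which the fitted line predicts the given SGRQ change."""
    return (sgrq_change - intercept) / slope


# SYNTHETIC arm-level data: one point per treatment arm (illustration only).
delta_fev1_ml = [0.0, 50.0, 100.0, 150.0]   # mean change in trough FEV1 (mL)
delta_sgrq = [-2.0, -3.5, -3.5, -5.0]       # mean change in SGRQ total score

r = pearson_r(delta_fev1_ml, delta_sgrq)
intercept, slope = fit_line(delta_fev1_ml, delta_sgrq)
sgrq_per_100ml = 100.0 * slope               # SGRQ change per 100 mL FEV1 increase
fev1_for_4_units = fev1_change_for(-4.0, intercept, slope)

print(f"r = {r:.2f}, SGRQ change per 100 mL = {sgrq_per_100ml:.2f}")
print(f"FEV1 change for a 4-unit SGRQ reduction = {fev1_for_4_units:.1f} mL")
```

A negative slope and a negative r both indicate that larger FEV1 gains accompany larger SGRQ reductions, i.e. better health status.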
Random effects regression modelling was used to explore the effects of the change in FEV1 on the change in total SGRQ score. The model included time (3, 6 or 12 months) as a categorical variable and study was treated as a random effect to allow for correlation within each study, thus adjusting for possible confounders. This model allows an estimate of the strength of the relationship between FEV1 and SGRQ (the size and statistical significance of the model coefficient). Where sufficient data were available, similar methods were applied to investigate the relationship between changes in FEV1 and the outcomes TDI and percentage of patients experiencing at least one COPD exacerbation. All statistical analyses were performed in Stata 10.1.
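One possible way to write such a model is shown below. This is a sketch under stated assumptions: a random intercept for each study and time entered as indicator terms. The notation is illustrative and is not taken from the Stata specification, which is not reported.

```latex
% i indexes studies, j indexes treatment arms (and time points) within study i
\Delta\mathrm{SGRQ}_{ij}
  = \beta_0
  + \beta_1\,\Delta\mathrm{FEV}_{1,ij}
  + \gamma_{t(ij)}
  + u_i
  + \varepsilon_{ij},
\qquad
u_i \sim N(0, \tau^2),
\quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here \beta_1 is the coefficient of interest (change in SGRQ per unit change in FEV1), \gamma_{t(ij)} is the effect of the time category (3, 6 or 12 months), and u_i is the study-level random effect that allows arms from the same study to be correlated.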

FEV1 change and change in SGRQ total score
Using all treatment arms and all time points (n = 49), Figure 2 shows a moderate negative correlation between the mean change in trough FEV1 and change in SGRQ total score; greater increases in FEV1 were associated with greater reductions (i.e. improvements) in SGRQ. Zero change in FEV1 was associated with a significant reduction in SGRQ score of 2.5 (95% CI 1.8, 3.3). The additional reduction in SGRQ associated with a 100 mL increase in FEV1 was 1.6 (0.7, 2.5), making the total improvement in SGRQ 4.1 units. When excluding placebo arms, zero change in FEV1 was associated with a reduction in SGRQ total score of 4.1 (2.7, 5.6). However, the association between change in FEV1 and additional change in SGRQ total score was no longer statistically significant; for a 100 mL increase in FEV1 the reduction in SGRQ was 0.4 (-1.1, 1.9). Table 2 illustrates the increasing probability of reaching a clinically meaningful improvement in SGRQ with increasing levels of FEV1 improvement. For treatment arms where mean changes in FEV1 were ≥100 mL (using the largest ΔSGRQ values for studies with data for multiple time points) the probability of reaching a mean reduction in total SGRQ score of 4 units was 80%.
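Read as fitted lines, and assuming the relationship is linear over the range of observed FEV1 changes, the simple regression estimates above can be written as follows. The symbols are illustrative notation only; negative values denote improvement in SGRQ.

```latex
\begin{align*}
\text{All arms:}\quad
  \Delta\mathrm{SGRQ} &\approx -\left(2.5 + 1.6 \times \frac{\Delta\mathrm{FEV}_1}{100\ \mathrm{mL}}\right),
  &\Delta\mathrm{FEV}_1 = 100\ \mathrm{mL} &\;\Rightarrow\; \Delta\mathrm{SGRQ} \approx -(2.5 + 1.6) = -4.1 \\
\text{Excluding placebo:}\quad
  \Delta\mathrm{SGRQ} &\approx -\left(4.1 + 0.4 \times \frac{\Delta\mathrm{FEV}_1}{100\ \mathrm{mL}}\right),
  &\Delta\mathrm{FEV}_1 = 100\ \mathrm{mL} &\;\Rightarrow\; \Delta\mathrm{SGRQ} \approx -(4.1 + 0.4) = -4.5
\end{align*}
```

The first term in each bracket is the reduction in SGRQ at zero change in FEV1, and the second is the additional reduction per 100 mL increase in FEV1.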
Random effects modelling found that a 100 mL increase in FEV1 was associated with an estimated reduction in SGRQ total score of 2.5 (1.9, 3.1). This equates to a clinically meaningful reduction of 4 units in SGRQ being associated with an estimated improvement in FEV1 of 160.6 (129.0, 211.6) mL. When this analysis was repeated excluding the placebo arms, a 100 mL increase in FEV1 was associated with an estimated change in SGRQ score of 1.02 (0.0, 2.5), although the association between FEV1 and SGRQ score was no longer significant.
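The conversion from the model coefficient to the FEV1 change associated with a 4-unit SGRQ reduction can be sketched by inverting the slope. The function name is hypothetical, and the inputs are the rounded published values, so the results differ slightly from the reported 160.6 (129.0, 211.6) mL, which presumably come from unrounded coefficients.

```python
def fev1_change_for_sgrq_reduction(target_reduction, reduction_per_100ml):
    """FEV1 increase (mL) associated with a target SGRQ reduction.

    ASSUMPTION: simple inversion of the slope, with SGRQ reduction taken to be
    proportional to FEV1 increase (no intercept term).
    """
    return 100.0 * target_reduction / reduction_per_100ml


mcid_sgrq = 4.0  # minimal clinically important difference in SGRQ total score

point = fev1_change_for_sgrq_reduction(mcid_sgrq, 2.5)  # coefficient estimate
lower = fev1_change_for_sgrq_reduction(mcid_sgrq, 3.1)  # steeper slope, smaller FEV1 change
upper = fev1_change_for_sgrq_reduction(mcid_sgrq, 1.9)  # shallower slope, larger FEV1 change

print(f"{point:.1f} mL ({lower:.1f}, {upper:.1f})")
```

Note that the upper confidence limit of the coefficient maps to the lower limit of the FEV1 change, and vice versa, because the two quantities are inversely related.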
When data for all treatment and placebo arms, regardless of time, were stratified by SGRQ domains, there was a weak, non-significant negative correlation between change in trough FEV1 and change in SGRQ Symptoms score (Table 3). However, there was a weak, but statistically significant negative correlation with change in SGRQ Activity score and a moderate and statistically significant negative correlation with change in SGRQ Impacts score.

FEV1 change and other patient-reported outcomes
Table 4 presents the results for the relationship between change in FEV1 and both TDI and exacerbations. Considering all treatment arms and 3, 6 and 12-month time points (n = 15), there was a moderate positive correlation between change in TDI and change in FEV1. The improvement in TDI associated with a 100 mL increase in FEV1 was 0.5, although this was below the 1 unit MCID for TDI [8]. When placebo arms were excluded from the analysis there was no evidence of an association between change in FEV1 and change in TDI score.
Increasing FEV1 was associated with a reduction in the proportion of patients experiencing at least one exacerbation, although the correlation was weak (Table 4). An increase of 100 mL in trough FEV1 was associated with an estimated 6.0% reduction in the proportion of patients experiencing at least one exacerbation.

Discussion
Our study-level analysis demonstrated a relationship between improved lung function (as measured by FEV1) and improvements in health status (as measured by SGRQ) in adult patients with stable COPD who are treated with long-acting inhaled bronchodilators. Results of random-effects regression modelling indicated that a 100 mL increase in FEV1 was associated with a reduction in SGRQ total score of 2.5 units. This equates to a clinically meaningful reduction of 4 units in SGRQ being associated with an estimated improvement in FEV1 of 160.6 mL. These results were supported by correlation analyses which demonstrated a moderate negative correlation between change in total SGRQ score and change in trough FEV1 when all treatment arms were considered. When the placebo arms were excluded from the analyses the relationship was not significant. This may be due in part to the reduction in sample size, but principally reflects the fact that the placebo arms, which clustered around zero for change in FEV1 and change in SGRQ, widened the spread of the data and so allowed correlations to emerge. It should be emphasised that the principal objective of our review was to investigate the relationship between trough FEV1 and outcomes rather than to test differential effects of treatment, so the use of all treatment arms, including placebo arms, was appropriate. It is important to note that our analysis focussed on studies including long-acting bronchodilators. Relationships between FEV1 and outcomes may be different for anti-inflammatory treatments. Further, different results may have been obtained had we assessed the relationship between peak FEV1 and outcomes. However, we selected the trough measurement since it was the primary endpoint and therefore the best documented outcome in most studies.
Table 4 (excerpt). Additional change in outcome for 100 mL change in FEV1 (95% CI)**: TDI 0.5 (0.1, 0.9); patients with at least one exacerbation -6.0 (-0.04, -11.9)%. *Pearson correlation coefficient; **output from random effects regression modelling. FEV1: forced expiratory volume in 1 second; LAMA: long-acting muscarinic antagonists; LABA: long-acting β2-agonists.
Despite the discrepancy in outcome measures required to demonstrate clinical effectiveness between the regulatory authorities and reimbursement agencies, such as the National Institute for Health and Clinical Excellence in the UK and the Institute for Quality and Efficiency in Health Care in Germany, few studies have investigated the relationship between change in lung function and change in patient-reported outcomes. We are aware of no other analysis addressing this issue at a study level. However, our data are consistent with the results of patient-level analyses [5,48], although in these studies the strength of the relationship between change in SGRQ and FEV1 was too weak to allow health status gains to be inferred from spirometric changes [48]. This is not a limitation, but rather reflects how different individuals with the same physiological limitations may experience differing effects on their health status.
Our study indicated that the correlation between change in trough FEV1 and change in SGRQ total score appears to strengthen with increasing study duration from 3 to 6 to 12 months. Over an intermediate and longer term period, the impact of an improvement in lung function may have a greater effect on patient wellbeing, although in our analysis, the limited data reported in the included studies did not allow us to assess whether changes in FEV1 at 3 months correlated with longer term changes in outcomes. There was also a trend to increasing mean change in SGRQ, across all study arms, with longer study duration. When data were analysed by SGRQ domain, the association between change in FEV1 and change in SGRQ scores was still present for the Activity and Impacts domains. A weak correlation between the SGRQ Symptoms domain and FEV1 has been reported ever since the first validation of this instrument [3].
Another important issue to be addressed is the "meaning" of the 100 mL increase in FEV1 associated with a reduction in SGRQ total score of 2.5 units, and of the estimated improvement in FEV1 of 160 mL associated with a clinically meaningful reduction of 4 units in SGRQ. There is no universally accepted approach for determining the clinically important difference of a measurement. As a measure, SGRQ reflects aspects of COPD beyond lung function alone [48]. In our analysis, the corresponding increase in health status in treatment arms with larger improvement in FEV1 enhances the ability to interpret lung function changes at a study level, but not at a patient level.
Depending on the intervention under study, FEV1 may offer the perspective of an intermediate end point in assessing likely treatment effectiveness. However, treatment effectiveness cannot be based exclusively on spirometry, requiring assessment of other relevant clinical parameters such as patient-reported health status.
It is interesting to note that a zero change in FEV1 was still associated with a reduction in SGRQ score of 2.5. This effect has been noted in many clinical trials in COPD and appears to relate to a 'Hawthorne effect', whereby patients receive better care by participating in the trial [49]. It could relate to a number of different factors, including improved compliance with treatments which may not all have bronchodilator effects.
There was also some evidence of a positive relationship between change in FEV1 and other outcomes, i.e., improvements in TDI score and reduction in the proportion of patients experiencing at least one exacerbation. These associations were weaker than those observed with SGRQ. However, correlation data for TDI versus trough FEV1 were limited by the relatively small number of studies (n = 8) reporting both outcome measures. For data on exacerbations, longer study durations would have been required to fully assess the apparent negative correlation with change in FEV1.
Our review has limitations. We did not explicitly seek primary studies assessing the correlation between outcome measures, and the restriction of our search strategy to RCTs, intended to enhance the quality of the analysis, means that observational studies of this type would not have been identified. In addition, the objectives of included studies differed from those of the review: included studies were generally designed to measure the effects of treatment upon COPD outcomes, whereas we were interested in the relationships between outcome measures. Included studies tended to present full results for their primary outcome measure only; reporting of additional outcomes was poor and measures of variance were often absent. Thus, standard deviations had to be imputed for a high proportion of the data sets included in our analyses. In addition, many studies did not report numerical data and values were estimated from graphs, although such approaches are consistent with established systematic review methodology.
Although our review did not address treatment effect sizes, our objectives did include an assessment of the relationships between treatment effect sizes (data addressing this objective were sparse and not included in this article). For this reason only RCTs of long-acting bronchodilators which included a placebo arm or which compared different classes of bronchodilator were included.
Finally, the correlation analyses, which were used to assess the relationships between patient-reported outcomes and FEV1 where data were insufficient to support regression modelling, combined treatment arms from different studies. Thus the data were essentially treated as observational cohorts and the strengths of the RCT design were lost. Combining the data in this way does not take account of differences between studies, such as treatment and dose, and participant baseline characteristics, which may affect estimates of correlation. In theory, this limitation can be overcome using random effects regression modelling. However, even where such modelling was possible, the number of explanatory variables which could be included was constrained by both the reporting of these variables in the primary studies and the size of the data set; both poor reporting and small data sets were factors in this review.
The results of this review give important new insight into the relationship between FEV1, a key primary outcome required by regulatory authorities for COPD clinical trials, and patient-reported outcomes such as health status, dyspnoea and exacerbations, which are of greater interest to clinicians, patients and reimbursement agencies. Our analyses were limited by the size and quality of the available data set; the results are encouraging, but should be considered hypothesis generating and warrant further investigation.
This study-level analysis indicated that improvement in trough FEV1 with inhaled bronchodilators may be associated with improvement in health status and may also be associated with improvements in other patient-reported outcomes. Although the strength of the association was modest, improvements in both FEV1 and SGRQ, relative to changes likely to be clinically relevant, were of similar magnitude. FEV1 may offer the perspective of an intermediate endpoint in assessing treatment effectiveness at a study level.

Additional material
Additional file 1: Search strategy for the MEDLINE database.
Additional file 2: Quality assessment of studies selected for inclusion in the systematic review.