Correlating changes in lung function with patient outcomes in chronic obstructive pulmonary disease: a pooled analysis

Background: Relationships between improvements in lung function and other clinical outcomes in chronic obstructive pulmonary disease (COPD) are not extensively documented. We examined whether changes in trough forced expiratory volume in 1 second (FEV1) are correlated with changes in patient-reported outcomes.
Methods: Pooled data from three indacaterol studies (n = 3313) were analysed. Means and responder rates for outcomes including the Transition Dyspnoea Index (TDI), change from baseline in St George's Respiratory Questionnaire (SGRQ) score (at 12, 26 and 52 weeks), and COPD exacerbation frequency (rate/year) were tabulated across categories of ΔFEV1. Generalised linear modelling was also performed, adjusting for covariates such as baseline severity and inhaled corticosteroid use.
Results: With increasing positive ΔFEV1, TDI and ΔSGRQ improved at all timepoints, and the exacerbation rate over the study duration declined (P < 0.001). Individual-level correlations were 0.03-0.18, but cohort-level correlations were 0.79-0.95. At 26 weeks, a 100 ml increase in FEV1 was associated with improvements in TDI (0.46 units) and ΔSGRQ (1.3-1.9 points); over the study duration, it was associated with a 12% decrease in exacerbation rate. Overall, adjustment for baseline covariates had little impact on the relationship between ΔFEV1 and outcomes.
Conclusions: These results suggest that larger improvements in FEV1 are likely to be associated with larger patient-reported benefits across a range of clinical outcomes.
Trial registration: ClinicalTrials.gov NCT00393458, NCT00463567 and NCT00624286


Introduction
In the absence of other widely accepted and validated markers for chronic obstructive pulmonary disease (COPD), lung function measurement, specifically forced expiratory volume in 1 second (FEV1), has been used as a global marker of pathophysiological changes [1] and by regulators in the drug approval process. Consequently, clinical trials of new products in COPD are typically powered to demonstrate significant improvements in FEV1. However, healthcare professionals are more likely to be interested in improvements in patient-reported outcomes such as symptoms and health status, which may better reflect the impact of treatment on the patient. Decision-makers also require evidence to assess trends across large cohorts of patients.
Several studies have demonstrated a significant relationship between poor lung function and worsened health and economic outcomes in patients with COPD [2][3][4][5][6][7][8][9][10][11][12][13], but few have investigated whether changes in lung function associated with an intervention are correlated with changes in such endpoints [12][13][14][15]. There is good evidence that declining lung function leads to worsened patient outcomes, but a surprising lack of evidence that improvements in lung function are correlated with improvements in symptomatic outcomes.
Indacaterol is a novel, inhaled, ultra-long-acting β2-agonist. The initial Phase III trials included over 3000 patients, providing a large pooled dataset. We analysed this dataset to examine the relationships between change in FEV1 and outcomes including dyspnoea, health status, exacerbations and rescue medication use.

Study design and treatments
This investigation was a pooled analysis of patient-level data from three Phase III, randomised studies. Study 1 (INVOLVE [INdacaterol: Value in COPD: Longer term Validation of Efficacy and safety]) was a 52-week, double-blind comparison of indacaterol 300 μg or 600 μg once daily with formoterol 12 μg twice daily and placebo. Study 2 (INHANCE [INdacaterol versus tiotropium to Help Achieve New COPD treatment Excellence]) was a 26-week study comparing double-blind indacaterol 150 μg or 300 μg once daily with placebo and open-label tiotropium 18 μg once daily. Study 3 (INLIGHT 1 [INdacaterol: efficacy evaLuation usInG 150 μg doses witH COPD PatienTs]) was a 12-week study comparing double-blind indacaterol 150 μg once daily with placebo. Patients were permitted to continue inhaled corticosteroid (ICS) monotherapy if the dose and regimen had been stable for 1 month before screening and remained stable throughout the study; rescue salbutamol was permitted as needed. Full details have been reported elsewhere [16][17][18]. All studies were conducted in accordance with the Declaration of Helsinki (1989) and applicable local laws and regulations. Approval was obtained from the Institutional Review Board or Independent Ethics Committee of each participating study centre. All patients provided written informed consent before participating in each study included in the pooled analysis. All patient data were anonymised.

Patients
Patients were male or female, aged ≥ 40 years, with a smoking history of ≥ 20 pack-years and a diagnosis of moderate-to-severe COPD [19]. All patients with trough FEV1 measurements available at both baseline and Week 12 were included. Patients with extreme changes from baseline in trough FEV1 (> +500 ml or < -500 ml) were excluded, as these values were considered erroneous.

Endpoints
The primary endpoint in all three studies was trough FEV1 [20].
Secondary endpoints included health status (using the St George's Respiratory Questionnaire [SGRQ] [21]), and dyspnoea (using the Transition Dyspnoea Index [TDI] [22]; Studies 1 and 2 only). The SGRQ provides scores between 0 and 100, with higher values indicating greater impairment. The TDI is inherently a change from baseline and provides values between -9 and +9, with positive values indicating improvement. Rescue medication use (number of puffs of salbutamol) was recorded by patients in diaries. COPD exacerbations were defined as the onset or worsening of > 1 respiratory symptom for > 3 consecutive days, requiring intensified treatment (e.g. systemic steroids, antibiotics, oxygen) and/or hospitalisation or emergency room visit. Severe exacerbations were those requiring hospitalisation.

Statistical methods
The primary objective was to examine relationships between patient-reported outcomes and change from baseline in trough FEV1 (ΔFEV1), using both data summarisation and model-based analysis. The outcome variables for both approaches were TDI, change from baseline in SGRQ (ΔSGRQ), rescue medication use and exacerbation rate.
For TDI and ΔSGRQ, relationships were examined with the average of each patient's ΔFEV1 up to and including the corresponding week of observation. For rescue medication use and exacerbations, the average ΔFEV1 over the whole time on treatment was used.
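As a minimal sketch of the two ΔFEV1 summaries described above, assuming each patient's trough ΔFEV1 values are stored as (week, ml) pairs and that "average" means an unweighted mean over visits (the text does not say whether visits were time-weighted), the per-patient predictors could be computed as follows. The function names and the visit schedule in the example are hypothetical.

```python
from typing import List, Tuple

Visit = Tuple[int, float]  # (study week, trough ΔFEV1 in ml)


def mean_delta_fev1_through(visits: List[Visit], week: int) -> float:
    """Unweighted mean ΔFEV1 over visits up to and including `week` (TDI, ΔSGRQ)."""
    values = [delta for w, delta in visits if w <= week]
    if not values:
        raise ValueError(f"no ΔFEV1 observation on or before week {week}")
    return sum(values) / len(values)


def mean_delta_fev1_on_treatment(visits: List[Visit]) -> float:
    """Unweighted mean ΔFEV1 over all visits on treatment (rescue use, exacerbations)."""
    if not visits:
        raise ValueError("no ΔFEV1 observations")
    return sum(delta for _, delta in visits) / len(visits)


# Hypothetical patient with visits at weeks 2, 12, 26 and 52.
patient = [(2, 120.0), (12, 90.0), (26, 60.0), (52, 30.0)]
print(mean_delta_fev1_through(patient, 12))   # predictor for Week 12 TDI/ΔSGRQ
print(mean_delta_fev1_through(patient, 26))   # predictor for Week 26 TDI/ΔSGRQ
print(mean_delta_fev1_on_treatment(patient))  # predictor for counts over the study
```

Each outcome timepoint therefore gets its own predictor, whereas the count outcomes share a single whole-study value.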

Data summaries and related inferences
TDI and ΔSGRQ were analysed as outcome variables at 12, 24/26 and 52 weeks. Responders were patients who achieved at least the minimal clinically important difference (MCID) from baseline: one unit for TDI and four units for SGRQ, respectively [21,22]; because higher SGRQ scores indicate greater impairment, an SGRQ response is a decrease of at least four units. Daily rescue medication use was the number of puffs during treatment divided by the number of days on treatment. The exacerbation rate was the number of exacerbations on treatment, normalised to 1 year (365 × number of exacerbations on treatment / days on treatment).
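The responder rules and the two derived rates above are simple enough to state directly in code. The sketch below is illustrative only; the function and constant names are hypothetical, and it encodes the sign conventions given earlier (positive TDI and negative ΔSGRQ indicate improvement).

```python
TDI_MCID = 1.0   # units; positive TDI indicates improvement
SGRQ_MCID = 4.0  # units; a decrease in SGRQ indicates improvement


def tdi_responder(tdi: float) -> bool:
    """Responder if TDI reaches at least the MCID."""
    return tdi >= TDI_MCID


def sgrq_responder(delta_sgrq: float) -> bool:
    """Responder if SGRQ falls by at least the MCID from baseline."""
    return delta_sgrq <= -SGRQ_MCID


def daily_rescue_use(total_puffs: int, days_on_treatment: int) -> float:
    """Puffs of rescue medication per day on treatment."""
    return total_puffs / days_on_treatment


def annualised_exacerbation_rate(n_exacerbations: int, days_on_treatment: int) -> float:
    """365 x exacerbations on treatment / days on treatment."""
    return 365 * n_exacerbations / days_on_treatment


print(tdi_responder(1.2), sgrq_responder(-2.5))
print(annualised_exacerbation_rate(1, 73))  # one exacerbation in 73 days
```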
At each timepoint, mean outcomes and responder rates for ΔSGRQ and TDI were tabulated across five categories of ΔFEV1, chosen to distribute patients approximately equally across categories and bounded above and below by ± 500 ml. The hypothesis of equality across categories was tested with the Kruskal-Wallis test. Correlation coefficients were computed between the observed individual values of ΔFEV1 and the outcome, and between the category midpoint values of ΔFEV1 (-275 ml, 0 ml, 100 ml, 200 ml and 375 ml) and the category mean response of the outcome.
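The category boundaries are not stated explicitly, but the reported midpoints together with the ± 500 ml limits are consistent with edges at -500, -50, 50, 150, 250 and 500 ml. The sketch below uses those inferred edges (an assumption, as is the choice of which side of each edge is closed) and a hand-written Kruskal-Wallis H test with tie correction. Its p-value uses the closed-form chi-square survival function, which holds only for even degrees of freedom (four for five categories). In practice `scipy.stats.kruskal` performs the same test; the helper names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Edges (ml) inferred from the reported midpoints and the ±500 ml limits (assumption).
EDGES = [-500.0, -50.0, 50.0, 150.0, 250.0, 500.0]
MIDPOINTS = [(lo + hi) / 2 for lo, hi in zip(EDGES, EDGES[1:])]


def category(delta_fev1_ml: float) -> int:
    """Index 0-4 of the ΔFEV1 category, lower edge inclusive (assumption)."""
    if not EDGES[0] <= delta_fev1_ml <= EDGES[-1]:
        raise ValueError("|ΔFEV1| > 500 ml: excluded as erroneous")
    for i, upper in enumerate(EDGES[1:]):
        if delta_fev1_ml < upper:
            return i
    return len(MIDPOINTS) - 1  # exactly +500 ml


def group_by_category(deltas: Sequence[float], outcomes: Sequence[float]) -> List[List[float]]:
    groups: List[List[float]] = [[] for _ in MIDPOINTS]
    for d, y in zip(deltas, outcomes):
        groups[category(d)].append(y)
    return groups


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected H statistic and its chi-square p-value (even df only)."""
    groups = [list(g) for g in groups if len(g) > 0]
    df = len(groups) - 1
    if df < 1:
        raise ValueError("need at least two non-empty groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if tie_term == 0:
        raise ValueError("all observations are tied")
    h /= tie_term
    if df % 2:
        raise NotImplementedError("odd df: use a chi-square survival function from a stats library")
    half = h / 2.0
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return h, p


print(MIDPOINTS)
print(kruskal_wallis([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```

Recovering the reported midpoints from the inferred edges is a useful check that the reconstruction is at least internally consistent.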

Model-based analyses
In line with established statistical procedures, generalised linear modelling [23,24] was performed to examine the relationship between ΔFEV1 and each outcome variable. For TDI and ΔSGRQ, observations at all timepoints were modelled together using repeated-measures multiple regression, assuming constant variance and an unstructured correlation matrix. Time was included both as a main effect and in an interaction with ΔFEV1.
Rescue medication use and exacerbations were modelled as the number of puffs and the number of exacerbations, respectively, during time on treatment. Rescue medication use was modelled with the zero-inflated negative binomial distribution for likelihood-based model building; the final model was then refitted using quasi-likelihood to report parameter estimates. Exacerbations were modelled with the negative binomial distribution. For both outcomes, to ensure that the modelled mean response was positive, the logarithm of the mean was modelled as linear in the covariates, and the mean was obtained by taking antilogs.
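The count-model ingredients described above can be sketched as follows. This is not the authors' code. It assumes the common NB2 parameterisation (variance μ + μ²/k) and enters time on treatment as a log offset, which is one usual way of handling exposure; the text lists time at risk among the predictors without saying exactly how it entered. The parameter names `intercept`, `slope_per_ml`, `k` and `pi_zero` are hypothetical, and the quasi-likelihood refit used for rescue medication is not shown.

```python
import math


def expected_count(intercept: float, slope_per_ml: float,
                   delta_fev1_ml: float, days_on_treatment: float) -> float:
    """Log link: log(mu) = intercept + slope * ΔFEV1 + log(days), so mu > 0 always.

    The log(days) offset is an assumption made for this sketch.
    """
    eta = intercept + slope_per_ml * delta_fev1_ml + math.log(days_on_treatment)
    return math.exp(eta)


def nb_logpmf(y: int, mu: float, k: float) -> float:
    """NB2 log-probability with mean mu and variance mu + mu**2 / k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))


def zinb_logpmf(y: int, mu: float, k: float, pi_zero: float) -> float:
    """Zero-inflated NB: structural zero with probability pi_zero, else NB2(mu, k).

    Note that the overall mean is then (1 - pi_zero) * mu, not mu.
    """
    if y == 0:
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(nb_logpmf(0, mu, k)))
    return math.log(1.0 - pi_zero) + nb_logpmf(y, mu, k)


# Under the log link, +100 ml multiplies the expected count by exp(100 * slope),
# whatever the other covariates are.
mu0 = expected_count(-6.0, -0.00128, 0.0, 365.0)
mu100 = expected_count(-6.0, -0.00128, 100.0, 365.0)
print(mu100 / mu0, math.exp(100 * -0.00128))
```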
Other predictor variables were baseline trough FEV1 (continuous), age (continuous), gender (binary), ICS use (binary: yes or no), treatment (indacaterol, formoterol, tiotropium or placebo), screening FEV1 measured to assess reversibility before and after a short-acting β2-agonist and before and after a short-acting anticholinergic, world region (Western Europe and the USA, Eastern Europe and Turkey, Rest-of-World), and time at risk (for exacerbations and rescue medication use). Disease severity was included as a binary variable based on the Global initiative for chronic Obstructive Lung Disease (GOLD) stages [19], as determined by per cent predicted FEV1 at screening after a short-acting β2-agonist: predominantly GOLD 2 (moderate or less severe, including 91% moderate; referred to subsequently as GOLD 2) versus predominantly GOLD 3 (severe or more severe, including 98% severe; referred to as GOLD 3). The default condition for all models was: baseline FEV1 of 1.3 l, age 65 years, male gender, GOLD 2, no ICS use, indacaterol treatment, screening FEV1 before (and after) reversibility testing of 1.3 l (1.5 l), and Western Europe/USA region. All statistical comparisons were made relative to this combination of covariates and, unless otherwise stated, these values were used for model predictions.
Model-based inference steps were performed to test for interactions between ΔFEV1 and the covariates treatment, disease severity, ICS use and world region. For this purpose, disease severity was represented jointly by baseline FEV1, the binary severity indicator defined above, and the Baseline Dyspnoea Index (BDI) for TDI or baseline SGRQ for ΔSGRQ. To allow for the possibility of differing relationships for negative and positive values of ΔFEV1, a possible breakpoint at ΔFEV1 = 0 was tested in each model. The main effects of covariates were tested for significance using Wald P values in the final model, with P < 0.01 judged significant without adjustment for multiplicity.
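One way to parameterise a breakpoint at ΔFEV1 = 0 is a continuous hinge: ΔFEV1 is split into its negative part, min(ΔFEV1, 0), and its positive part, max(ΔFEV1, 0), each with its own slope, so testing the breakpoint amounts to testing whether the two slopes differ. The sketch below assumes this continuous form (the text does not give the exact parameterisation) and fits it by ordinary least squares on a single outcome. It is not the repeated-measures model with an unstructured correlation matrix that was actually used, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def hinge_basis(delta_fev1_ml: float) -> Tuple[float, float]:
    """Negative and positive parts of ΔFEV1 for a continuous breakpoint at 0."""
    return min(delta_fev1_ml, 0.0), max(delta_fev1_ml, 0.0)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_breakpoint_model(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """OLS for y = b0 + b_neg * min(x, 0) + b_pos * max(x, 0) via normal equations."""
    rows = [[1.0, *hinge_basis(x)] for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Noise-free illustration with made-up slopes: shallower for negative ΔFEV1.
xs = [float(x) for x in range(-400, 401, 50)]
true = (1.0, 0.002, 0.0046)
ys = [true[0] + true[1] * min(x, 0.0) + true[2] * max(x, 0.0) for x in xs]
b0, b_neg, b_pos = fit_breakpoint_model(xs, ys)
print(b0, b_neg, b_pos, b_pos - b_neg)  # the last term is what a breakpoint test examines
```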
For each outcome variable, the improvement in expected response for an increase in ΔFEV1 from 0 to 100 ml was also computed, using a model that excluded treatment effects so as to allow for variation in ΔFEV1 between, as well as within, treatments.
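The 0 to 100 ml contrast is read off differently for the two model types. For the identity-link models (TDI, ΔSGRQ) with a breakpoint at zero, only the positive-side slope matters and the change is 100 × slope. For the log-link count models, the change is multiplicative, exp(100 × slope), usually quoted as a percentage. The sketch below also inverts that relationship to show the log-scale slopes implied by the 10% and 12% reductions reported in the Results; the function names are hypothetical.

```python
import math


def linear_gain_0_to_100(positive_slope_per_ml: float) -> float:
    """Identity link with a breakpoint at 0: only the positive-side slope is used."""
    return 100.0 * positive_slope_per_ml


def log_link_pct_change_0_to_100(slope_per_ml: float) -> float:
    """Log link: rate ratio exp(100 * slope), expressed as a percentage change."""
    return 100.0 * (math.exp(100.0 * slope_per_ml) - 1.0)


def slope_from_pct_change(pct_change_per_100ml: float) -> float:
    """Inverse of log_link_pct_change_0_to_100."""
    return math.log(1.0 + pct_change_per_100ml / 100.0) / 100.0


# Log-scale slopes implied by the reported reductions per 100 ml.
for outcome, pct in [("rescue medication", -10.0), ("exacerbations", -12.0)]:
    b = slope_from_pct_change(pct)
    print(outcome, f"slope = {b:.6f} per ml", f"check = {log_link_pct_change_0_to_100(b):.1f}%")
```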

Results
In total, 3313 patients were included in the analysis. Patient demographic and clinical characteristics are presented in Table 1. Age, pre- and post-bronchodilator FEV1 and body mass index were well balanced across studies. Study 1 included more males and more patients taking ICS, and its patients had slightly lower per cent predicted FEV1 and reversibility.

Data summaries and related inferences
The distribution of average ΔFEV1 responses by timepoint is shown in Table 2, both as frequencies within ΔFEV1 categories and as percentiles. Median values ranged from 75 to 94 ml. Approximately 5% of observations were excluded at any timepoint because of extreme ΔFEV1 (beyond ± 500 ml); at Week 12, 0.7% of observations (24/3313) were below -500 ml and 4.1% (137/3313) were above +500 ml.
All relationships between ΔFEV1 and outcomes were statistically significant, except for severe exacerbations (Table 3). Individual-level correlations were weak (0.03-0.18), reflecting the large variability in outcomes; however, cohort-level correlations were stronger (0.79-0.95). When outcome means were plotted against ΔFEV1 category midpoints, there were clear trends towards greater improvement in outcomes with increasing ΔFEV1, particularly for positive ΔFEV1 (Figure 1). Responder rates for TDI and ΔSGRQ followed a similar pattern to the mean outcomes (Table 4).
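Why individual-level correlations can be weak while cohort-level correlations are strong can be illustrated with simulated data. This is not the study data. The sketch assumes a small true slope buried in large between-patient noise (all settings are made up) and uses category edges inferred from the reported midpoints (an assumption). Averaging within categories cancels most of the noise, so the category means line up closely with the midpoints even though individual points barely do.

```python
import math
import random
from typing import List, Sequence, Tuple

# Edges (ml) inferred from the reported midpoints and the ±500 ml limits (assumption).
EDGES = [-500.0, -50.0, 50.0, 150.0, 250.0, 500.0]


def _mean(values: Sequence[float]) -> float:
    return math.fsum(values) / len(values)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = _mean(xs), _mean(ys)
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n: int = 20000, slope_per_ml: float = 0.005,
             noise_sd: float = 10.0, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Hypothetical outcome = slope * ΔFEV1 + large individual noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-500.0, 500.0) for _ in range(n)]
    ys = [slope_per_ml * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def cohort_level(xs: Sequence[float], ys: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Category midpoints paired with category mean outcomes."""
    mids, means = [], []
    for lo, hi in zip(EDGES, EDGES[1:]):
        group = [y for x, y in zip(xs, ys) if lo <= x < hi or x == hi == EDGES[-1]]
        if group:
            mids.append((lo + hi) / 2)
            means.append(_mean(group))
    return mids, means


xs, ys = simulate()
r_individual = pearson(xs, ys)
r_cohort = pearson(*cohort_level(xs, ys))
print(f"individual-level r = {r_individual:.2f}, cohort-level r = {r_cohort:.2f}")
```

The same slope underlies both numbers; only the unit of analysis changes.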

Model-based results
Curves fitted by the model-based analysis for each outcome variable are plotted against ΔFEV1 in Figures 2a-d. For TDI and ΔSGRQ, the significant breakpoints at zero are evident as changes of slope in the fitted lines. For rescue medication and exacerbations, the breakpoints were not significant. The fitted curves for rescue medication and exacerbations are linear on the logarithmic scale and so appear nonlinear on the scales of these plots.
ΔFEV1 was significantly correlated with TDI (P < 0.0001). A significant breakpoint in the fitted lines is seen at zero: the slope was significantly shallower for negative than for positive ΔFEV1 (P = 0.003 for the difference between slopes). The slope of the relationship (which determines the magnitude of change in outcome for a given improvement in ΔFEV1) was not significantly affected by treatment, baseline severity, ICS use or world region. Hence, the model-predicted increase in TDI for a 100 ml increase in ΔFEV1 was the same for all combinations of covariates, estimated at 0.46 at Week 24/26. Although the slope was the same for all covariates, the intercept (the TDI corresponding to zero change in FEV1) was not. For a given ΔFEV1, patients with lower baseline FEV1, lower BDI, using ICS or on placebo had significantly lower TDI, whereas those from the Eastern Europe/Turkey and Rest-of-World regions had significantly higher TDI. When covariates representative of less severe patients (GOLD 2, no ICS and baseline FEV1 of 1.595 l) were entered into the model, the model-predicted TDI for a zero and a +100 ml change in FEV1 was 1.98 and 2.44, respectively. For more severe patients (GOLD 3, ICS and baseline FEV1 of 0.95 l), the model-predicted TDI for a zero and a +100 ml change in FEV1 was -0.20 and 0.26, respectively.
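Because the TDI slope does not depend on the covariates, the model produces parallel lines: covariates shift the intercept, and every profile gains the same 0.46 units per 100 ml on the positive side of the breakpoint. The short check below reproduces the reported predictions from the reported intercepts and slope. The dictionary labels and function name are hypothetical, and negative ΔFEV1 is rejected because the negative-side slope is not reported.

```python
# Reported model-predicted TDI gain per +100 ml ΔFEV1 at Week 24/26 (same for all covariates).
TDI_GAIN_PER_100ML = 0.46

# Reported model-predicted TDI at zero ΔFEV1 (covariate-specific intercepts).
TDI_AT_ZERO = {
    "less severe (GOLD 2, no ICS, baseline FEV1 1.595 l)": 1.98,
    "more severe (GOLD 3, ICS, baseline FEV1 0.95 l)": -0.20,
}


def predicted_tdi(tdi_at_zero: float, delta_fev1_ml: float) -> float:
    """Linear prediction on the positive side of the breakpoint only."""
    if delta_fev1_ml < 0:
        raise ValueError("the negative-side slope is not reported in the text")
    return tdi_at_zero + TDI_GAIN_PER_100ML * delta_fev1_ml / 100.0


for profile, intercept in TDI_AT_ZERO.items():
    print(profile, round(predicted_tdi(intercept, 0), 2), round(predicted_tdi(intercept, 100), 2))
```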
ΔFEV1 was also significantly correlated with ΔSGRQ. As for TDI, a breakpoint was seen at zero, with the slope significantly shallower for negative ΔFEV1 (P = 0.002 for the difference between slopes). The slope of the relationship with improvement in FEV1 was not significantly affected by treatment, ICS use or world region, but it was steeper for patients in GOLD 3 with baseline FEV1 of 0.95 l than for those in GOLD 2 with baseline FEV1 of 1.595 l (P = 0.004). For a 100 ml increase in ΔFEV1, the model predicted a change in SGRQ of -1.3 for GOLD 2 and -1.9 for GOLD 3 patients at Week 24/26. Patients with worse baseline FEV1 or worse baseline SGRQ, using ICS or on placebo had significantly higher (less favourable) ΔSGRQ, whereas patients from the Eastern Europe/Turkey and Rest-of-World regions had significantly lower ΔSGRQ at Week 24/26. For GOLD 2 patients not using ICS and with baseline SGRQ < 31, the model-predicted change in SGRQ at Week 24/26 for a zero and a +100 ml change in FEV1 was -1.6 and -2.9, respectively. Similarly, for GOLD 3 patients using ICS and with baseline SGRQ > 58, the model-predicted change in SGRQ at Week 24/26 was -0.9 and -2.8, respectively.
ΔFEV1 was significantly correlated with rescue medication use (P < 0.0001). None of treatment, baseline severity, ICS use or world region significantly affected the slope of the relationship, and the slope did not change significantly between negative and positive ΔFEV1. Hence, for all combinations of covariates, an increase of 100 ml in ΔFEV1 is predicted to yield the same 10% reduction in rescue medication use. Patients with lower baseline FEV1, male patients, those with higher baseline medication use or more severe disease, and those using ICS or on placebo or tiotropium had significantly higher rescue medication use. Younger patients (< 65 years) had higher use that narrowly missed significance (P = 0.012, against the predefined significance level of P < 0.01). For GOLD 2 patients not receiving ICS, the predicted daily number of puffs of rescue medication for a zero and a +100 ml ΔFEV1 was 0.89 and 0.80, respectively; for patients in GOLD 3 using ICS, it was 1.83 and 1.64.
ΔFEV1 was significantly correlated with exacerbations (P = 0.002). None of treatment, baseline severity, ICS use or world region significantly affected the slope of the relationship, and the slope did not change significantly between negative and positive ΔFEV1. Hence, for all combinations of covariates, an increase of 100 ml in ΔFEV1 is predicted to yield the same 12% decrease in exacerbations. Patients with lower baseline FEV1 and patients using ICS had significantly higher exacerbation rates, whereas patients from the Eastern Europe/Turkey region had significantly lower rates. The model-estimated annual exacerbation rates for a zero and a +100 ml ΔFEV1 were 0.29 and 0.25, respectively, for GOLD 2 patients not using ICS, and 1.28 and 1.12, respectively, for patients in GOLD 3 using ICS. As in the summary analysis, the rate of severe exacerbations requiring hospitalisation was not significantly correlated with ΔFEV1 (P = 0.3).
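With a log link, a common slope means a common ratio rather than a common absolute change: the same proportional reduction produces a larger absolute reduction in more severe patients. The sketch below back-calculates the ratios implied by the reported predictions. Because the reported values are rounded to two decimals, the implied ratios only approximate the stated 10% and 12% reductions (most noticeably for the small GOLD 2 exacerbation rates). The labels and function name are hypothetical.

```python
# Reported model estimates at ΔFEV1 = 0 and +100 ml (rounded to two decimals in the text).
REPORTED = {
    "rescue puffs/day, GOLD 2, no ICS": (0.89, 0.80),
    "rescue puffs/day, GOLD 3, ICS": (1.83, 1.64),
    "exacerbations/year, GOLD 2, no ICS": (0.29, 0.25),
    "exacerbations/year, GOLD 3, ICS": (1.28, 1.12),
}


def implied_ratio(at_zero: float, at_plus_100: float) -> float:
    """Back-calculated ratio; under a log link this estimates exp(100 * slope)."""
    return at_plus_100 / at_zero


for label, (zero, plus100) in REPORTED.items():
    ratio = implied_ratio(zero, plus100)
    print(f"{label}: ratio {ratio:.3f}, absolute change {plus100 - zero:+.2f}")
```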

Discussion
Our analyses show that improvement in FEV1 is significantly related to changes in the patient-reported outcomes TDI, SGRQ, exacerbation rate and rescue medication use over 12-52 weeks of treatment. These relationships were significant at both the individual and the population level, although correlations were much stronger in the population-based analyses. Few studies have examined the relationship between change in FEV1 and change in outcomes. However, our results are consistent with analyses of patients from the 3-year EUROSCOP (European Respiratory Society Study on Chronic Obstructive Pulmonary Disease) study, in which an improvement of 100 ml in FEV1 was associated with a 4% reduction in dyspnoea in males [13], and with a 16-week clinical study in which a significant but weak correlation between change in FEV1 and change in SGRQ score was demonstrated (r = 0.33, P = 0.001) [14]. Further, a recent systematic review of 22 studies found that a 100 ml increase in FEV1 was associated with a statistically significant reduction in SGRQ of 2.5 units [15]. However, to our knowledge, the current analysis is the largest and most comprehensive to investigate the correlation between change in FEV1 and change in outcomes using individual patient data from studies of similar design. This provides a relatively homogeneous population for analysis compared with study-level meta-analyses.
We demonstrated that a 100 ml increase in trough FEV1 (a magnitude of change with perceptible effects [25]) was associated with a 0.46-unit increase in TDI and a 1.3- to 1.9-unit improvement in SGRQ after 24/26 weeks of treatment and, over 12-52 weeks of treatment, with a 10% decrease in daily rescue medication use and a 12% decrease in the annual exacerbation rate. In general, treatment, baseline severity, concomitant ICS use and world region did not affect the slope of the relationship between outcome and change in FEV1. The exception was ΔSGRQ, for which more severe COPD, as characterised by lower FEV1 and higher SGRQ at baseline, was associated with a steeper slope than less severe COPD. This is consistent with results from the 3-year TORCH (TOwards a Revolution in COPD Health) study, in which trends towards greater improvement in SGRQ with worsening GOLD severity were noted with active treatments [26].
Although severe exacerbations showed a trend towards greater reductions with increasing ΔFEV1, the relationship was not statistically significant. While the observed 12% reduction in overall exacerbation rate for an improvement of 100 ml in FEV1 was comparable with previously published data [11], the studies included in our analysis were not powered to show an effect on exacerbations and did not specifically recruit patients at risk of exacerbations.
We found inconsistent effects of different treatments across individual outcomes, perhaps because patient numbers in subcategories were too low to demonstrate consistent differences for individual treatments across all outcomes. However, our analysis did demonstrate that the relationship between ΔFEV1 and outcome appeared to be the same regardless of treatment arm. Similarly, baseline severity, ICS use and world region were assessed as main effects, as well as for their potential influence on the effect of ΔFEV1. Although numbers of patients in GOLD 4 (as well as GOLD 1) were too small to support any inferences, patients predominantly in GOLD 3 at baseline, and those using ICS, consistently had significantly worse outcomes. Indeed, variability in baseline severity and ICS use is likely to have been a major contributor to the large variability in observed outcomes.
The relationships between outcomes and ΔFEV1 may differ between negative and positive ΔFEV1; for this reason, the models included a possible breakpoint at zero in the slope of the relationship. The breakpoint was significant for TDI and ΔSGRQ, suggesting that baseline severity and the other covariates included could not fully explain the observed behaviour. These results may have been influenced by differences in withdrawal rates between categories [27], since the highest withdrawal rate was among patients with a negative change in FEV1, although differences between groups were minimal. The breakpoint was not significant for rescue medication and exacerbations, even though Figure 1 suggested that it might be important, especially for exacerbations. The large variability and count nature of the rescue medication and exacerbation data may have caused type II statistical errors, i.e., failure to detect true breakpoints as significant.
We found that zero change in FEV1 was associated with significant improvements in TDI and SGRQ. Additionally, while a greater proportion of patients achieved the MCID for TDI and SGRQ as ΔFEV1 increased, as many as 50% of patients responded irrespective of ΔFEV1, possibly reflecting an effect of clinical trial participation that is seen consistently in the placebo arms of clinical trials [28][29][30].
We constructed the models in our analysis using ΔFEV1 as a predictor and the other outcome measures as response variables, based on the results of a carefully controlled series of clinical trials. However, ΔFEV1 was as much a response as the outcomes were, so it was not an 'independent' variable controlled as part of the experimental design. There may have been further confounders that simultaneously affected how both ΔFEV1 and the outcome responded to treatment. The fitted models therefore describe the observed relationships under the conditions of a clinical trial, but do not provide a definitive answer as to whether there is a causal relationship between ΔFEV1 and the outcomes. The studies included in our analysis were powered on the spirometric endpoint FEV1, which is required by regulatory agencies for the approval of bronchodilators and is included in the majority of treatment guidelines; for this reason, we made FEV1 the focus of our analysis. Other physiological parameters, such as inspiratory capacity, may correlate more strongly with dyspnoea [31]. However, data for these parameters were not available in our dataset, and further research is needed to investigate such correlations in large numbers of patients.

Conclusions
It is commonly stated that spirometry does not fully capture the impact of COPD on a patient's health [32]. Our analysis of a large cohort of patients has demonstrated that, in individual patients, change in FEV1 is a significant, albeit relatively weak, predictor of improvement in patient-reported outcomes. However, the current analysis also shows that, at a population level, improvements in FEV1 with long-acting bronchodilator therapy are strongly correlated with improvements in dyspnoea, health status and exacerbations. This suggests that interventions that significantly improve FEV1 are also likely to be associated with improved clinical and patient-reported outcomes.