Composite endpoints in COPD: clinically important deterioration in the UPLIFT trial

Background: Assessments of lung function, exacerbations and health status are common measures of chronic obstructive pulmonary disease (COPD) progression and treatment response in clinical trials. We hypothesised that a composite endpoint could more holistically assess clinically important deterioration (CID) in a COPD clinical trial setting.
Methods: A composite endpoint was tested in a post hoc analysis of 5652 patients with Global Initiative for Chronic Obstructive Lung Disease (GOLD) 2–4 COPD from the 4-year UPLIFT study. Patients received tiotropium 18 μg or placebo.
Results: The composite endpoint included time to first confirmed decrease in trough forced expiratory volume in 1 s (FEV1) ≥100 mL, confirmed increase in St. George's Respiratory Questionnaire (SGRQ) total score ≥4 units, or moderate/severe exacerbation. Most patients (>80%) experienced CID, with similar incidence among GOLD subgroups. Most confirmed trough FEV1 (74.6–81.6%) and SGRQ (72.3–78.1%) deteriorations were sustained across the study and in all GOLD subgroups. Patients with CID more frequently experienced subsequent exacerbation (hazard ratio [HR] 1.79; 95% confidence interval [CI] 1.67, 1.92) or death (HR 1.21; 95% CI 1.06, 1.39) by Month 6. CID was responsive to bronchodilator treatment.
Conclusions: Composite endpoints provide additional information on COPD progression and treatment effects in clinical trials.
Trial registration: ClinicalTrials.gov NCT00144339.

clinically important changes at the individual patient level. Furthermore, the focus on only one dimension of COPD may misrepresent real improvements that are meaningful to patients.
Measuring clinically important deterioration (CID) in terms of the most impactful events at the individual patient level might provide a significant benefit in studying the progression and effects of COPD in clinical trials. The three events included in this composite endpoint (trough FEV1, St. George's Respiratory Questionnaire [SGRQ] score and moderate/severe exacerbation) have been previously used by Singh et al. [3], Anzueto et al. [4] and Greulich et al. [5], and were selected because they are commonly used in clinical trials and are known to have an impact on patients with COPD.
To explore the composite endpoint of CID further, we used a post hoc analysis of the 4-year UPLIFT study. The objectives of this analysis were to test the validity of CID when only including FEV1 and SGRQ events that were confirmed at a subsequent visit, to assess whether CID predicts future outcomes, and to explore other elements of CID.

Methods
This post hoc analysis assessed time to first CID as time to the first occurrence of at least one of the following: decrease in trough FEV1 from baseline ≥100 mL, increase in SGRQ total score from baseline ≥4 units or moderate/severe exacerbation (the same components as suggested by Singh et al. [3]). Changes in FEV1 and SGRQ score were always calculated from baseline. Changes in FEV1 were assessed using pre-bronchodilation values, in line with previous studies assessing CID [6][7][8] and reflecting real-world clinical practice for FEV1 monitoring. A decrease in trough FEV1 ≥100 mL is considered to be the minimum clinically important change perceived by patients [9,10] and is within the defined range suggested by the American Thoracic Society/European Respiratory Society task force [11], whereas an increase in SGRQ total score ≥4 units is considered the minimum clinically important change in quality of life [12].
Unlike for the composite endpoint published by Singh et al. [3], we only included confirmed FEV1 and SGRQ deteriorations, i.e. events that were present during at least two consecutive assessments (5 or 6 months apart). This excluded short-term fluctuations in the disease, which could provide an unreliable indication of CID. If no further assessment was available, but the patient discontinued study medication or died, the event was also considered as confirmed. Confirmed events were not required for exacerbations of COPD.
We have used the term "sustained" to refer to deteriorations that were then maintained at almost every subsequent visit.
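The definition above can be illustrated with a minimal Python sketch. It is not the analysis code used for UPLIFT: the function names and data layout are hypothetical, and it assumes that a confirmed event is dated at the first of the two consecutive assessments, which the text does not specify.

```python
FEV1_DROP_ML = 100   # threshold from the text: decrease in trough FEV1 >= 100 mL
SGRQ_RISE_UNITS = 4  # threshold from the text: increase in SGRQ total score >= 4 units


def first_confirmed_day(visits, baseline, is_worse, stopped_or_died=False):
    """Day of the first deterioration that is also present at the next assessment.

    visits: list of (study_day, value) tuples sorted by study_day.
    is_worse(baseline, value) -> bool applies the threshold versus baseline.
    A deterioration at the last available assessment counts as confirmed only
    when the patient discontinued study medication or died.
    Assumption: the event is dated at the first of the two assessments.
    """
    for i, (day, value) in enumerate(visits):
        if not is_worse(baseline, value):
            continue
        if i + 1 < len(visits):
            if is_worse(baseline, visits[i + 1][1]):
                return day
        elif stopped_or_died:
            return day
    return None


def time_to_first_cid(fev1_visits, fev1_baseline, sgrq_visits, sgrq_baseline,
                      exacerbation_days, stopped_or_died=False):
    """Earliest day of any CID component, or None if no component occurred."""
    candidates = [
        first_confirmed_day(fev1_visits, fev1_baseline,
                            lambda b, v: b - v >= FEV1_DROP_ML, stopped_or_died),
        first_confirmed_day(sgrq_visits, sgrq_baseline,
                            lambda b, v: v - b >= SGRQ_RISE_UNITS, stopped_or_died),
        # Exacerbations do not require confirmation.
        min(exacerbation_days) if exacerbation_days else None,
    ]
    days = [d for d in candidates if d is not None]
    return min(days) if days else None


# Invented example patient: the FEV1 drop at day 180 is not confirmed at day 360,
# so the first confirmed FEV1 event is at day 540; SGRQ is confirmed at day 360.
fev1 = [(30, 1250), (180, 1190), (360, 1310), (540, 1180), (720, 1150)]
sgrq = [(180, 47.0), (360, 50.0), (540, 51.0)]
print(time_to_first_cid(fev1, 1300, sgrq, 45.0, exacerbation_days=[400]))
```

In this invented example the unconfirmed FEV1 drop at day 180 is ignored, which is the short-term fluctuation the confirmation rule is meant to exclude.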

Study design
Study design details have been previously reported [13] and are briefly summarised below. UPLIFT (ClinicalTrials.gov: NCT00144339) was a 4-year, randomised, double-blind, parallel-group study comparing tiotropium 18 μg, administered once daily via the HandiHaler®, with matching placebo [14]. The UPLIFT study was conducted in 37 countries [14]. Patients were aged ≥40 years, with a smoking history of ≥10 pack-years and moderate-to-very severe COPD (Global Initiative for Chronic Obstructive Lung Disease [GOLD] 2–4 [15]). For further details, see the Supplementary Methods. The protocol was approved by the ethics committee at each centre, and all patients provided written, informed consent.
Spirometric testing was performed at randomisation, at the Day 30 visit and at visits every 6 months up to Month 48. SGRQ was assessed at randomisation and every 6 months up to Month 48. Exacerbations and associated hospital admissions were recorded on case report forms at every visit. The two primary endpoints were pre- and post-bronchodilation yearly rate of decline in mean FEV1.

Statistical analysis
For time-to-event endpoints, hazard ratios (HRs), 95% confidence intervals (CIs) and P values were calculated using a Cox proportional hazards model. Patients without CID events were censored at the treatment stop date.
To assess the association of CID with future outcomes, patients experiencing a CID event within the first 6 months were compared with those not experiencing the event. For this analysis, the time to first moderate/severe exacerbation was calculated from Month 6 (180 days) to the first subsequent event or treatment discontinuation. Time to death was calculated from Month 6 (180 days) to the date of death or the end of the vital status follow-up (Day 1470).
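The Month 6 comparison described above is a landmark-style analysis. The following Python sketch shows how one patient's record might be prepared for such an analysis; the function name and return layout are hypothetical, and excluding patients who died or left follow-up on or before Day 180 is an assumption that the text does not state.

```python
LANDMARK_DAY = 180        # Month 6, as stated in the text
VITAL_STATUS_END = 1470   # end of vital status follow-up, as stated in the text


def landmark_record(first_cid_day, outcome_day, censor_day, landmark=LANDMARK_DAY):
    """Return (cid_by_landmark, time_from_landmark, event), or None if not at risk.

    first_cid_day: day of first CID, or None if no CID occurred.
    outcome_day:   day of the outcome of interest after the landmark
                   (e.g. death, or first subsequent exacerbation), or None.
    censor_day:    last day of follow-up for this outcome
                   (e.g. treatment discontinuation, or Day 1470 for death).
    """
    # Assumption: patients whose follow-up ended, or who had the terminal
    # outcome, on or before the landmark are not at risk afterwards.
    if censor_day <= landmark:
        return None
    if outcome_day is not None and outcome_day <= landmark:
        return None

    cid_by_landmark = first_cid_day is not None and first_cid_day <= landmark
    if outcome_day is not None and outcome_day <= censor_day:
        return cid_by_landmark, outcome_day - landmark, 1   # event observed
    return cid_by_landmark, censor_day - landmark, 0        # censored


# Invented examples: CID at day 120 and death at day 900; no death observed.
print(landmark_record(120, 900, VITAL_STATUS_END))
print(landmark_record(400, None, VITAL_STATUS_END))
```

The resulting group indicator, time and event flag are the inputs a Cox proportional hazards routine would need to estimate the HR for patients with versus without CID by Month 6.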

Incidence of CID
Most patients in the total population (83.9%) experienced at least one CID during the study (Table 1). Exacerbations were more frequent than FEV1 or SGRQ decline (Table 1).
The contribution of exacerbations to the composite endpoint became more pronounced whereas the contribution of FEV1 became less pronounced as COPD severity (GOLD stage) increased in the total population (Table 1).
Time to first event for each component is shown in e-Figure 1.
Overall, about half of patients experienced at least two of the three events qualifying as CID, whereas fewer patients experienced all three events (Fig. 1a). A similar proportion of patients in each GOLD group experienced at least two CID events (Fig. 1b–d). The incidence of all three CID events was also similar for GOLD 2 and 3 patients, whereas few GOLD 4 patients experienced all three CID events (Fig. 1b–d).
Overall, most confirmed events were sustained at subsequent visits. Confirmed trough FEV1 decline was sustained at 12–48 months after the initial event in 74.6–81.6% of patients (Table 2). Confirmed SGRQ deterioration was also sustained at 12–42 months after the initial event in 72.3–78.1% of patients (Table 2). This pattern was comparable with the GOLD subgroups (e-Table 1), although patient numbers were low for the GOLD 4 subgroup.
For unconfirmed events (reported at one timepoint), the proportion of patients whose FEV1 decline or SGRQ deterioration was sustained was lower: 51.6–71.9% of patients still had the FEV1 decline 6–48 months after first decline, and 52.5–65.5% still had SGRQ deterioration (e-Table 2).
In addition, in patients who had confirmed events, mean FEV1 remained at least 193 mL worse than baseline for the rest of the trial (Table 2). For unconfirmed events, mean FEV1 in patients with an event ranged from 95 mL worse than baseline at Month 6 to 142 mL worse than baseline at Month 24 and 213 mL at Month 48. In patients with SGRQ deterioration, mean increase was >10 units for the rest of the trial for confirmed events, but ranged from 4.7 to 8.3 units for unconfirmed events.

Relative timing of events
The pattern and timing of clinically relevant events was highly variable for individual patients. Of patients who experienced both confirmed FEV1 decline and SGRQ deterioration, it was unusual to experience both events at the same assessment (Table 3). The time from FEV1 decline to subsequent SGRQ deterioration was slightly longer than the time from SGRQ deterioration to subsequent FEV1 decline (Fig. 2).
Exacerbations demonstrated a greater contribution to the composite endpoint in more severe patients. Patients with more severe COPD were more likely to experience an exacerbation prior to experiencing FEV1 decline or SGRQ deterioration (Table 3).

Response to treatment
The time to first CID event, and time to first occurrence of the individual components, was sensitive to therapeutic intervention (Table 4). Time to first CID, two CID events and all three CID events was longer with tiotropium than with placebo (Table 4 and Fig. 1a). This trend was observed in GOLD 2 and 3 subgroups, but less so with GOLD 4 patients (Fig. 1b–d).
When the composite endpoint was broken down into its component events, the HRs for future exacerbations were smaller for FEV1 decline and SGRQ deterioration by Month 6 than for the composite endpoint in the overall population, and among GOLD 2 and GOLD 3 COPD patients (Table 5). Exacerbations within 6 months had higher HRs for any exacerbation and for severe exacerbations than the composite endpoint. For unconfirmed events, the HRs for long-term outcomes were lower than for the sustained events (Table 5 and e-Table 3).
Investigating future events by CID status at Month 12 showed similar results (e-Table 4).

Mortality analysis with CID
Additional analyses using time to composite event or time to one of the component events as a time-varying covariate were performed. The HR for death for patients with a CID event versus patients without an event was 1.69 (95% CI 1.42, 2.01) (e-Table 5).
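One common way to treat CID as a time-varying covariate is to split each patient's follow-up at the day of first CID, so that time before the event counts as unexposed and time after it as exposed. The Python sketch below illustrates this start/stop layout; it is an assumption about how such data could be arranged, not a description of the authors' implementation, and the function name is hypothetical.

```python
def split_follow_up(first_cid_day, end_day, event):
    """Split one patient's follow-up into (start, stop, cid, event) rows.

    first_cid_day: day of first CID, or None if no CID occurred.
    end_day:       day of death or of censoring.
    event:         1 if the patient died at end_day, 0 if censored.
    CID status is 0 until the first CID day and 1 afterwards.
    """
    # No CID during follow-up: a single unexposed interval.
    if first_cid_day is None or first_cid_day >= end_day:
        return [(0, end_day, 0, event)]
    # CID at or before the start of follow-up: a single exposed interval.
    if first_cid_day <= 0:
        return [(0, end_day, 1, event)]
    return [
        (0, first_cid_day, 0, 0),            # before CID: no death in this interval
        (first_cid_day, end_day, 1, event),  # after CID: death (if any) at end_day
    ]


# Invented example: first CID at day 200, death at day 1000.
for row in split_follow_up(200, 1000, 1):
    print(row)
```

Rows in this form can be passed to a Cox routine that accepts start/stop intervals, which avoids crediting the pre-CID survival time to the CID group.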
Using a stepwise Cox regression model to adjust for important baseline predictors of mortality had little effect on the predictive performance of the composite (e-Table 5). When all three components were included as separate predictors, all were associated with increased mortality risk (e-Table 5).
To validate these findings, the results in e-Tables 4, 6 and 7 are presented for the placebo and tiotropium arms separately. The HRs are slightly higher in the tiotropium arm, which may be related to the larger number of events in the placebo arm before Month 6. The results in e-Tables 6 and 7 are similar between arms and confirm the results in the total population.

Discussion
Composite endpoints have only recently been introduced in post hoc analyses of COPD clinical trials [3][4][5][6][16]. Here, we conducted a post hoc analysis of the UPLIFT study. This analysis demonstrated the importance of using confirmed events in CID analysis and that CID predicts future outcomes. It also confirmed that the components of this composite endpoint behaved differently based on the baseline FEV1 of the individual patient. These data suggest that sustained decline in trough FEV1, sustained deterioration in SGRQ score of ≥4 units and a moderate/severe exacerbation are appropriate components of a composite endpoint for the assessment of CID in […] [16,17]. In the current analysis we focus on a composite endpoint of validated clinically important criteria (FEV1, SGRQ and exacerbations) to provide a more complete assessment of the impact on patients.
The individual components of the composite endpoint comprise characteristics of COPD that impact patient well-being, are clinically relevant events for the patient and predict future outcomes [15]. Although there are other parameters that could be included in such an endpoint, the components included are relatively easy to include in clinical trials and have established minimum clinically important differences.
Most deteriorations in FEV1 and SGRQ that were confirmed at a second visit were maintained for the rest of the 4-year UPLIFT study. Some publications of composite endpoints in COPD have not required confirmation at a subsequent visit [3,16]. We believe that counting only confirmed FEV1 and SGRQ deteriorations improves the reliability of the composite endpoint, as it excludes short-term variation and inconsistent measurements. This is supported by the low proportion of patients with unconfirmed events whose FEV1 or SGRQ deterioration is sustained at subsequent timepoints, and by the lower HRs for long-term outcomes with unconfirmed events compared with confirmed events.
Our analysis demonstrated that the components of the composite endpoint rarely occur at the same time in an individual patient. Most patients experience decline of trough FEV1, deterioration of SGRQ score and moderate/severe exacerbations on an individualised time scale. This supports the value of individual components in a composite endpoint. The stepwise regression data also show that each component independently contributes to increased mortality risk. The composite endpoint is also sensitive to pharmacological treatment, which is consistent with the findings of Singh et al., who observed a reduction in first CID with umeclidinium/vilanterol versus placebo in a post hoc study of the same composite endpoint [3]. Other post hoc analyses have used slightly different composite endpoints: FEV1, SGRQ and Transition Dyspnea Index focal score [6]; FEV1 or Transition Dyspnea Index; an increase in SGRQ; and a moderate-to-severe COPD exacerbation [4].
Fig. 2 Kaplan-Meier estimates of time to first subsequent SGRQ deterioration or first subsequent FEV1 deterioration. Kaplan-Meier estimates of median time from FEV1 decline ≥100 mL to SGRQ score deterioration ≥4 units, and median time from SGRQ score deterioration ≥4 units to FEV1 decline ≥100 mL in the overall population. CI: confidence interval; FEV1: forced expiratory volume in 1 s; NE: not evaluable; SGRQ: St. George's Respiratory Questionnaire
In all the publications that included FEV1, the strongest driver of CID in each of the analysis populations was lung function [3][4][5][6]. In contrast to these previous results, the most commonly reported endpoint in our study was exacerbations, perhaps because the UPLIFT study was 4 years long compared with the shorter (maximum 26 weeks) duration of the previous studies [3]. Our analysis showed a high overall frequency of CID for both treatment arms, which is expected due to the long study duration.
Lastly, we have shown that patients considered to have a CID early in the UPLIFT study (within the first 6 months) had worse outcomes for the 42-month remainder of the study; this was also confirmed in an analysis using CID as a time-varying covariate. These outcomes support results from previous analyses of the shorter TORCH and ECLIPSE studies. The 4-year length of our study provided valuable information on sustained CID and the relationship between clinically important events that could not be ascertained in clinical trials of shorter duration.
The study had limitations. Relatively few patients with GOLD 4 lung function impairment were enrolled. Additionally, GOLD 4 patients have a lower baseline FEV1 than GOLD 2 or 3 patients, and as such, declines in FEV1 of ≥100 mL were less common, and would be expected to be more debilitating, in these patients. This should be considered in future studies, where percentage declines may be considered as an alternative clinically significant decline. The composite index considers the parameters SGRQ and moderate/severe exacerbations, which could be seen as subjective; therefore, it is possible that this could introduce some variability in the results. Also, this was a post hoc analysis, although the large population and long follow-up time allowed for a satisfactory number of events to be observed.

Conclusions
We believe these results indicate that a composite endpoint of CID is a promising endpoint to assess disease activity in COPD clinical trials and may be a useful outcome that helps clinicians interpret the implications of trial results for individual patient management. Development of prospective studies is required to determine whether patients who experience disease progression (i.e. those who experience CID) at an increased rate can be identified earlier. By stratifying patients based on time to CID in a clinical trial database, it may be possible to identify characteristics that are associated with longer-term poor outcomes that could be useful for identifying which patients require further treatment earlier. Moreover, the composite endpoint may also serve to reduce patient numbers in clinical trials, as large numbers of patients are required to generate enough statistical power to detect a single outcome within patients with moderate COPD [18]. The length of trials may also be reduced, thereby limiting challenges such as patient discontinuation and cost that are prohibitive in trials of increased duration. Prospective studies are needed on the use of this concept to understand the sensitivity and efficacy of current and potential therapies.