Comparative efficacy of long-acting bronchodilators for COPD - a network meta-analysis

Background: Clinicians are faced with an increasingly difficult choice regarding the optimal bronchodilator for patients with chronic obstructive pulmonary disease (COPD) given the number of new treatments. The objective of this study is to evaluate the comparative efficacy of indacaterol 75/150/300 μg once daily (OD), glycopyrronium bromide 50 μg OD, tiotropium bromide 18 μg/5 μg OD, salmeterol 50 μg twice daily (BID), formoterol 12 μg BID, and placebo for moderate to severe COPD.
Methods: Forty randomized controlled trials were combined in a Bayesian network meta-analysis. Outcomes of interest were trough and post-dose forced expiratory volume in 1 second (FEV1), St. George’s Respiratory Questionnaire (SGRQ) score and responders (≥4 points), and Transition Dyspnea Index (TDI) score and responders (≥1 point) at 6 months.
Results: Indacaterol was associated with a higher trough FEV1 than other active treatments (difference for indacaterol 150 μg and 300 μg versus placebo: 152 mL (95% credible interval (CrI): 126, 179); 160 mL (95% CrI: 133, 187)) and the greatest improvement in SGRQ score (difference for indacaterol 150 μg and 300 μg versus placebo: -3.9 (95% CrI: -5.2, -2.6); -3.6 (95% CrI: -4.8, -2.3)). Glycopyrronium and tiotropium 18 μg resulted in the next best estimates for both outcomes with minor differences (difference for glycopyrronium versus tiotropium for trough FEV1 and SGRQ: 18 mL (95% CrI: -16, 51); -0.55 (95% CrI: -2.04, 0.92)).
Conclusion: In terms of trough FEV1 and SGRQ score, indacaterol, glycopyrronium, and tiotropium are expected to be the most effective bronchodilators.


Background
Patients with chronic obstructive pulmonary disease (COPD) experience airway obstruction, involving reduced lung function and health-related quality of life due to symptoms such as breathlessness and exacerbations [1]. Since COPD is a progressive disease, the main objective of treatment is to improve lung function, prevent and control symptoms, and ultimately to improve health status. Bronchodilator medications are central to symptom management in COPD, with long-acting preparations preferred over short-acting ones [1].
Given the number of alternative long-acting treatments available for COPD, clinicians are faced with an increasingly challenging choice regarding the optimal treatment. Since there is no head-to-head randomized controlled trial (RCT) that evaluates all the different monotherapies available, and it is unlikely that such a trial will ever be performed (given the increasing number of options available), a comprehensive systematic review and network meta-analysis is of interest to synthesize the RCT evidence. The objective of the current analysis was to evaluate the comparative efficacy of long-acting bronchodilators in patients with moderate to severe COPD in terms of lung function, health status, and dyspnoea. Approved bronchodilators or those for which data were available at the time of the literature search were included: indacaterol 75/150/300 μg OD, salmeterol 50 μg BID, formoterol 12 μg BID, tiotropium bromide 18 μg or 5 μg OD, aclidinium bromide 200 μg OD and glycopyrronium bromide 50 μg OD. No evidence for the approved dose of aclidinium bromide was available at the time of the search; therefore, results for aclidinium bromide 200 μg OD were included in the analysis but are not presented, given that this dose has not been approved.

Methodology
Identification and selection of articles
A systematic literature search was performed to identify RCTs evaluating the efficacy of the long-acting monotherapies for COPD. MEDLINE® and EMBASE® [...]
Comparators: Any of the interventions evaluated as monotherapy or placebo; Outcomes: Trough forced expiratory volume in 1 second (FEV1), post-dose FEV1 (2 hours after dosing), St. George's Respiratory Questionnaire (SGRQ) total score and proportion of patients with an improvement of at least 4 units in SGRQ total score ("SGRQ Responders" [2]), Transition Dyspnoea Index (TDI) total score and proportion of patients with an improvement of at least 1 unit in TDI score ("TDI Responders" [3]), proportion of patients with an exacerbation and exacerbation rate; Study Design: RCTs.

Figure 1 Interpretation of the relative treatment effects resulting from the network meta-analysis for continuous and binary outcomes. The dotted line marks no difference between treatments: d=0 (a difference in CFB of zero) for continuous endpoints and OR=1 (an odds ratio of one) for binary endpoints.
More effective: Point estimate suggests the treatment is expected to be better (i.e. higher FEV1 and TDI and lower SGRQ CFB) than the comparator and the 95% credible interval does not include 0 or 1.
Likely to be favourable: The 95% credible interval includes 0 or 1 but point estimate is favourable and there is a ≥85% probability that treatment is better than the comparator.
Likely to be unfavourable: The 95% credible interval includes 0 or 1 but point estimate is unfavourable and there is a ≤15% probability that treatment is better than the comparator.
Less effective: Point estimate suggests the treatment is expected to be worse than the comparator (i.e. lower FEV1 and TDI and higher SGRQ CFB) and the 95% credible interval does not include 0 or 1.
CFB=Change from baseline; d=difference in CFB; FEV1=Forced expiratory volume in 1 second; OR=odds ratio; SGRQ=St. George's Respiratory Questionnaire; TDI=Transition Dyspnoea Index.

Figure 3 Patient characteristics of the randomized controlled trials used for the network meta-analysis. Recoverable panel labels: % FEV1 predicted; % severe or very severe COPD. Note: Zero values indicate not reported unless otherwise indicated; Figure B presents the mean duration of COPD in years (red) per study and the mean age per study (blue + red).
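The interpretation scheme of Figure 1 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of the posterior median as the point estimate, the percentile method, and the fallback label "no clear difference" (for comparisons that meet none of the four definitions) are all assumptions made here.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("no values supplied")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def classify(draws, null_value=0.0, higher_is_better=True):
    """Classify a comparison from posterior draws of its relative effect.

    null_value is 0 for differences in CFB and 1 for odds ratios.
    higher_is_better is True for FEV1, TDI and responder ORs, False for SGRQ CFB.
    """
    ordered = sorted(draws)
    lower = percentile(ordered, 2.5)
    upper = percentile(ordered, 97.5)
    median = percentile(ordered, 50)  # assumed point estimate
    if higher_is_better:
        p_better = sum(d > null_value for d in draws) / len(draws)
        favourable = median > null_value
    else:
        p_better = sum(d < null_value for d in draws) / len(draws)
        favourable = median < null_value
    if lower > null_value or upper < null_value:  # 95% CrI excludes the null
        return "more effective" if favourable else "less effective"
    if favourable and p_better >= 0.85:
        return "likely to be favourable"
    if not favourable and p_better <= 0.15:
        return "likely to be unfavourable"
    return "no clear difference"  # assumed label, not part of Figure 1


# Illustrative draws only (not study data): 100 values from -10 to 89.
print(classify(list(range(-10, 90))))  # -> likely to be favourable
```

For a binary outcome the same function would be called with `null_value=1.0` on draws of the odds ratio.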

Outcomes of interest
The outcomes of interest included trough FEV1, post-dose FEV1, SGRQ total score, SGRQ responders, TDI total score, and TDI responders. Change from baseline was evaluated for all continuous outcomes, with the exception of TDI, which was evaluated at follow-up. The current analysis focuses on results at 6 months (discussed in the following), although endpoints were also analyzed at 12 weeks (see online Additional file 1). Exacerbation outcomes will be evaluated in a separate manuscript in order to account for differences in definitions.

Data extraction
Information related to the study and patient characteristics was extracted for the included studies, which allowed for a comprehensive assessment of the similarities and differences across the trials. For each outcome, the mean results and the associated uncertainty (i.e. standard error) were extracted where sufficient information was available within a two-week range of each time point of interest (i.e. between 22 and 26 weeks for the 6-month time point). If necessary, the software DigitizIt version 1.5.8 was used to extract data from graphs presented in the publications.
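The time-window rule described above can be expressed as a small filter. The sketch below is illustrative only: the function names and the example records are hypothetical, and the 24-week target for the 6-month time point is inferred from the 22 to 26 week range stated in the text.

```python
def within_window(reported_week, target_week, tolerance_weeks=2):
    """True if a reported time point lies within +/- tolerance of the target."""
    return abs(reported_week - target_week) <= tolerance_weeks


def eligible_results(results, target_week, tolerance_weeks=2):
    """Keep (study, week, value) records whose week falls inside the window."""
    return [r for r in results if within_window(r[1], target_week, tolerance_weeks)]


# Hypothetical records: (study label, assessment week, reported value).
records = [("Study A", 24, 120.0), ("Study B", 26, 95.0), ("Study C", 28, 110.0)]
print(eligible_results(records, target_week=24))
# -> [('Study A', 24, 120.0), ('Study B', 26, 95.0)]
```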

Network meta-analysis
Bayesian network meta-analysis (NMA) models were used [15][16][17][18] to simultaneously synthesize the results of the included studies for each outcome of interest.
NMAs within the Bayesian framework involve data, a likelihood distribution, a model with parameters, and prior distributions [19]. The model relates the data from the individual studies to basic parameters reflecting the (pooled) treatment effect of each intervention relative to placebo as the overall reference treatment. Based on these basic parameters, the relative efficacy between each of the interventions was obtained. For the continuous outcomes a normal likelihood distribution was used and for the binary outcomes a binomial likelihood was used [16,17,20,21]. For each analysis, both fixed and random effects models were evaluated. With an NMA, randomization only holds within a trial and not across trials. Consequently, there is the risk that patients who were studied in different comparisons are not similar, which may lead to consistency violations. In order to minimize confounding bias, analyses with a constant treatment by covariate interaction were evaluated [22] or analyses were performed excluding specific trials to assess the impact of potential treatment effect modifiers. Potential treatment effect modifiers were identified a priori as concomitant treatments, COPD severity, smoking status, age, and sex. Separate analyses were performed to evaluate the potential treatment effect modifiers given the limited number of studies included in each analysis. Noninformative prior distributions for the model parameters were implemented to avoid influencing the results of the analysis based on prior beliefs.
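To illustrate how relative effects between active treatments follow from the basic parameters, the sketch below subtracts posterior draws of each treatment's effect versus placebo, iteration by iteration. It assumes the draws are aligned by MCMC iteration; the function name, the treatment labels, and the numbers are hypothetical and are not taken from the analysis.

```python
def pairwise_contrasts(basic_draws):
    """Derive draws of every treatment-versus-treatment effect.

    basic_draws maps a treatment name to its posterior draws versus placebo
    (all lists the same length, aligned by MCMC iteration). The returned key
    (a, b) holds draws of the effect of a versus b, i.e. d_a - d_b.
    """
    names = list(basic_draws)
    contrasts = {}
    for a in names:
        for b in names:
            if a != b:
                contrasts[(a, b)] = [
                    x - y for x, y in zip(basic_draws[a], basic_draws[b])
                ]
    return contrasts


# Toy draws of the difference in CFB versus placebo (mL); illustrative only.
toy = {"treatment A": [150.0, 160.0], "treatment B": [110.0, 115.0]}
print(pairwise_contrasts(toy)[("treatment A", "treatment B")])  # -> [40.0, 45.0]
```

Working on the draws rather than on summary statistics keeps the uncertainty of both basic parameters in the derived contrast.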
For each model with and without covariates, both fixed and random effects models were tested. The deviance information criterion was used to compare the models, which provides a measure of model fit that penalizes model complexity accordingly [23]. The random effects model was selected unless there was sufficient evidence to suggest the fit of the fixed effect model was better. The analyses were performed using WinBUGS 1.4.1 statistical software [24]. The results of the NMA are presented in terms of 'point estimates' for the relative treatment effects and the 95% credible intervals (95% CrI). The probability that each treatment is best is also presented, which is calculated based on the proportion of Markov chain Monte Carlo cycles in which a specific treatment ranks first out of the total (where the ranking is based on the treatment effect size) [25]. Figure 1 outlines the interpretation of the results for continuous and binary outcomes, which utilizes the probability that one treatment is better than another (i.e. proportion of cycles in which a specific treatment estimate is better than the comparator).
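The probability that each treatment is best can be computed from the Markov chain Monte Carlo output by counting the cycles in which it ranks first. The following is a minimal sketch of that calculation, not the WinBUGS code used in the analysis; the function name and the toy draws are hypothetical.

```python
def probability_best(draws, higher_is_better=True):
    """Proportion of MCMC cycles in which each treatment ranks first.

    draws maps a treatment name to its per-cycle effect estimates
    (lists of equal length, aligned by cycle).
    """
    names = list(draws)
    n_cycles = len(draws[names[0]])
    wins = {name: 0 for name in names}
    pick = max if higher_is_better else min
    for i in range(n_cycles):
        best = pick(names, key=lambda name: draws[name][i])
        wins[best] += 1
    return {name: wins[name] / n_cycles for name in names}


# Toy draws over four cycles; illustrative values only.
toy = {"A": [3, 1, 5, 2], "B": [2, 4, 1, 1], "placebo": [0, 0, 0, 0]}
print(probability_best(toy))
# -> {'A': 0.75, 'B': 0.25, 'placebo': 0.0}
```

For an outcome where lower values are better, such as SGRQ change from baseline, `higher_is_better=False` ranks the smallest value first.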
All studies were parallel placebo-controlled multi-center RCTs, with the exception of three RCTs that compared active treatments only [5,13,28]. All trials were double-blind, although three RCTs evaluated open-label tiotropium [7,11,51]. The RCTs were generally considered of good quality, but the method of randomization and concealment of treatment allocation were not always well reported.
The study designs were mostly similar with some differences in terms of the study location and background medications. The studies were predominantly European and North American, although two trials were based in Asia [4,12] and some trials included study centers in South America, Africa, and Asia [6,7,14]. Most RCTs allowed patients to receive a concomitant ICS, whereas some RCTs allowed the continued use of LABAs [26,35,37,44,45,48] or LAMAs [30,36].
The enrolled patients had a COPD diagnosis, were 40 years of age or older and were current or ex-smokers with a smoking history of at least 10 years. Selected RCTs included patients with a smoking history of at least 15 years [34] or 20 years [6,7,9,12,14,41,43]. Generally, patients were required to have an FEV1/FVC of less than or equal to 0.70 and an FEV1 percent predicted often between 30 and 80%, although this range varied across the studies (see Additional file 1: Table S1). Exacerbation history was reported in only five of the RCTs [8,11,31,36,42], and two studies specified inclusion criteria with respect to exacerbation history, requiring at least one exacerbation over the prior one to two years [35] or one exacerbation per year over the prior three years [30]. Figure 3 illustrates the variation in the RCTs in terms of the proportion of males (range: 52-99%), average age (range: 60-68 years), duration of COPD (range: 3.8-13.1 years), proportion of current smokers (range: 22-59%), proportion receiving ICS during the trial (range: 0-78%), and the proportion with severe or very severe COPD (range: 36-95%) as reported based on the GOLD criteria or calculated as a function of the FEV1 % predicted. Overall, differences were most apparent in terms of ICS use and severity.

Network meta-analysis
The RCTs were synthesized with a network meta-analysis. The individual study results are presented in an online supplement (see Additional file 1: Tables S2-7). In the base case analysis all RCTs were included without covariates. Scenario analyses were performed to explore the impact of differences identified in terms of concomitant ICS use, concomitant LABAs or LAMAs, COPD severity, and exacerbation history that were considered most likely to cause bias. These covariates were selected based on the extent of the variation across the RCTs and any evidence regarding treatment effect modifiers in the individual studies. Initially the results for the base case analysis at 6 months are presented by outcome. Figures 4, 5, and 6 present the base case and scenario analyses at 6 months, which illustrate the results of the NMA for each treatment versus placebo in terms of lung function, health status, and dyspnoea, respectively. The last section summarizes the impact of the scenario analyses across the outcomes.

Trough and post-dose forced expiratory volume in 1 second
In terms of change from baseline (CFB) in lung function, results from the base case analysis suggest that there is a 64% probability that indacaterol 300 μg provides the greatest improvement in trough FEV1 and an 83% probability of the greatest effect in post-dose FEV1. Indacaterol 150 μg (29%; 10%) and glycopyrronium (6%; 6%) are less likely to provide the greatest improvement in these outcomes out of all interventions compared. (The probability of each treatment being the best is presented in the online supplement (see Additional file 1: Table S8).) Tables 1 and 2 present the treatment effect estimates for each treatment comparison for trough and post-dose FEV1, respectively. In terms of trough FEV1 all treatments are expected to be more efficacious than placebo. The largest difference in trough FEV1 was between indacaterol and formoterol (range in point estimates for difference in CFB for indacaterol 150-300 μg: 93-100 mL), although indacaterol was also more efficacious than salmeterol 50 μg (72-79 mL), tiotropium 5 μg (41-48 mL) and tiotropium 18 μg (37-44 mL). Both glycopyrronium 50 μg and tiotropium 18 μg were more efficacious than formoterol 12 μg (73 mL and 55 mL, respectively) and salmeterol 50 μg (53 mL and 35 mL, respectively) in terms of trough FEV1. The availability of data for post-dose FEV1 was more limited; although the treatments were all more efficacious than placebo, no differences were detected between the active treatments.
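As a quick arithmetic check of the trough FEV1 point estimates quoted above, the difference between glycopyrronium 50 μg and tiotropium 18 μg can be recovered through either common comparator. The notation $d_{X,Y}$ (difference in CFB for X versus Y) is introduced here for illustration only, and the check assumes that point estimates combine approximately additively.

```latex
\begin{align*}
d_{\mathrm{GLY,TIO}} &= d_{\mathrm{GLY,FOR}} - d_{\mathrm{TIO,FOR}} = 73 - 55 = 18\ \mathrm{mL},\\
d_{\mathrm{GLY,TIO}} &= d_{\mathrm{GLY,SAL}} - d_{\mathrm{TIO,SAL}} = 53 - 35 = 18\ \mathrm{mL}.
\end{align*}
```

Both routes agree with the 18 mL (95% CrI: -16, 51) difference for glycopyrronium versus tiotropium reported in the abstract.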

St. George's Respiratory questionnaire
In terms of health status, base case results indicated there is a ~50% probability that indacaterol 150 μg is the most efficacious in terms of SGRQ total score and SGRQ responders, which was followed by indacaterol 300 μg (25%; 10%), formoterol 12 μg (6%; 13%), and glycopyrronium (15%; 12%). Tables 3 and 4 present the treatment effect estimates for each treatment comparison for SGRQ total score and responders, respectively. All active treatments are expected to be more efficacious than placebo for SGRQ total score. For SGRQ response only indacaterol 150 μg and 300 μg, tiotropium 18 μg and glycopyrronium 50 μg were more efficacious than placebo, whereas the credible intervals for the other treatment estimates versus placebo include 1. With respect to SGRQ total score, indacaterol was more efficacious than salmeterol 50 μg (indacaterol 150 μg/300 μg difference point estimates ranging from −2.26 to −2.56 points), as were glycopyrronium 50 μg (−1.87 points) and tiotropium 18 μg (−1.32 points), although an improved response was only observed for the comparison of indacaterol 150 μg versus salmeterol (odds ratio (OR) of 1.42).

Transition dyspnoea index
For TDI, base case results suggest there is an 86% probability that indacaterol 300 μg is the best treatment as measured with the total score. In a responder analysis using TDI, a 95% probability was obtained with indacaterol 300 μg, which was followed by formoterol 12 μg (4%), indacaterol 150 μg (1%), and glycopyrronium (1%). Tables 5 and 6 present the treatment effect estimates for each treatment comparison for TDI total score and responders, respectively. Indacaterol 300 μg and tiotropium were more efficacious than salmeterol 50 μg in terms of total score (difference of 0.58 and 0.32, respectively) and indacaterol 300 μg was also more efficacious than formoterol 12 μg [...]

Scenario analyses
An overview of the NMA results for the continuous outcomes at 6 months is presented in Figure 7 for the different scenarios using symbols to summarize the main conclusion for each comparison and analysis. Treatment effect estimates were most sensitive to adjustment for disease severity (i.e. degree of airflow limitation). Results were less sensitive to adjustment for concomitant ICS use or concomitant LABA or LAMA use. Minimal changes were observed by excluding studies that required an exacerbation history. Overall, the different meta-regression analyses resulted in minimal changes in the treatment effect estimates and did not alter the interpretation.

Discussion
The objective of this analysis was to compare the efficacy of individual bronchodilators for patients with moderate to severe COPD in terms of lung function, health status, and dyspnoea. Based on the results of the NMA at 6 months, indacaterol resulted in the best treatment at either the 150 or 300 μg dose, depending on the outcome assessed, although indacaterol was not always more efficacious than the alternative bronchodilators and differences versus other active treatments were small. Thresholds for clinically important differences have been established for active treatments versus placebo in terms of FEV1, SGRQ total score, and TDI total score. Although comparisons of alternative active treatments may not be expected to reach these thresholds, in the absence of any clear guidance for interpretation of active treatments these thresholds have been used to help identify whether differences between treatments are clinically relevant. FEV1 thresholds defined to evaluate whether an active treatment has demonstrated a clinically meaningful difference versus placebo (i.e. 100 mL-140 mL [64][65][66]) suggest that none of the differences in lung function between the active treatments were clinically meaningful in the analysis of all studies without covariates (base-case analysis). In terms of health status, improvements in SGRQ total score were identified for indacaterol 150 and 300 μg, glycopyrronium 50 μg, and tiotropium 18 μg in comparison to salmeterol, although differences were less than the 4 units for a clinically relevant difference. Only indacaterol 150 μg led to a clinically relevant response relative to salmeterol with respect to SGRQ.
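The threshold comparison described above can be written as a simple check. This is a hedged sketch: the function and dictionary names are hypothetical, the FEV1 value uses the lower end of the 100 mL-140 mL range cited in the text, and the TDI value of 1 unit is an assumption based on the TDI responder definition rather than a threshold stated in this section. The example differences are the point estimates versus salmeterol quoted in the results.

```python
# Thresholds are taken from the text where stated; the TDI value is an assumption.
THRESHOLDS = {
    "trough_fev1_ml": 100.0,  # lower end of the 100-140 mL range cited in the text
    "sgrq_total": 4.0,        # 4 units, as stated in the text
    "tdi_total": 1.0,         # assumed from the TDI responder definition (>= 1 point)
}


def is_clinically_relevant(outcome, difference, thresholds=THRESHOLDS):
    """Return True if the absolute difference reaches the outcome's threshold."""
    return abs(difference) >= thresholds[outcome]


# Point estimates versus salmeterol quoted in the results.
examples = [
    ("sgrq_total", -2.56),     # indacaterol, upper end of the reported range
    ("sgrq_total", -1.87),     # glycopyrronium 50 ug
    ("trough_fev1_ml", 79.0),  # indacaterol, upper end of the reported range
    ("tdi_total", 0.58),       # indacaterol 300 ug
]
for outcome, diff in examples:
    print(outcome, diff, is_clinically_relevant(outcome, diff))
```

Under these assumed thresholds every example prints `False`, in line with the observation that differences between active treatments were small.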
The estimated differences between treatments in terms of TDI total score were smaller than the threshold for [...]

Table 3 Results of base case NMA: Difference in intervention versus the comparator for CFB in SGRQ total score at 6 months, 95% credible intervals, and probability that the intervention is better than the comparator

The validity of the findings depends on the quality of the RCTs and the extent of any violations in the similarity and consistency assumptions across studies [22]. In a network meta-analysis of RCTs involving multiple treatment comparisons, the randomization holds only within the individual trials, and not across trials. If the different direct comparisons show systematic differences in study and patient characteristics, and these differences are treatment effect modifiers, then the estimates of any indirect comparison as obtained with the network meta-analysis will be biased. With a meta-regression analysis we aim to minimize this confounding bias by adjusting for inconsistencies in the evidence base.
The trials included in the network meta-analysis were generally of good quality. All trials were blinded with the exception of open-label tiotropium 18 μg in three RCTs, which has been shown to be comparable to blinded results for FEV1, although with some minimal bias introduced on more subjective measures [67]. However, some differences across trials were identified in terms of concomitant ICS use, concomitant LABA or LAMA use, the severity of COPD and the exacerbation history requirements. Individual scenario analyses were performed to evaluate these differences, either by using a meta-regression model or by excluding specific studies that differed in terms of the characteristics identified. Overall, the interpretation of findings obtained with analysis based on all studies without adjustment for covariates (base-case analysis) was the same as obtained with the scenario analyses in the majority of cases. Only a few scenarios suggest a slight difference in the strength of the comparative effects.

Table 4 Results of base case NMA: Difference in intervention versus the comparator for SGRQ responders at 6 months in terms of odds ratios (ORs), 95% credible intervals, and probability that the intervention is better than the comparator

Although we went to great lengths to assess whether the network meta-analysis was biased by systematic differences between studies, the meta-regression analysis was based on study level data which has limitations. First of all, it was not feasible to include all covariates of interest simultaneously due to the limited number of data points. Second, study and patient characteristics were not consistently reported. For example, limited information was available for the exacerbation history of patients and therefore it was only possible to exclude trials that clearly required an exacerbation history. Similarly, information regarding COPD comorbidities was not consistently reported across the RCTs, so these potential differences could not be explored. Third, it is well known that meta-regression analysis based on study level data can be prone to ecological bias, which means that association between study level patient characteristics and the treatment effects may not reflect the individual-level effect modification of that covariate. As such, it has to be accepted that there is the risk of residual confounding bias.
This study aimed to provide a comprehensive evidence base. However, it was not possible to capture all recent studies. The literature search did not capture the ACCORD [68] or ATTAIN [69] trials evaluating aclidinium 400 μg BID that were published after the date of the search, and an updated analysis including these studies is of interest. Moreover, data for new bronchodilators that may be of interest, such as vilanterol, olodaterol, and umeclidinium, were not available and may necessitate an updated analysis including these treatments in future. It should also be noted that in order to include the indacaterol and glycopyrronium trials that were not yet published at the time of the literature search, Novartis provided the corresponding clinical study reports. No attempt was made to obtain study reports from manufacturers of formoterol, salmeterol, or tiotropium. This may have induced a bias, but it is unlikely that key positive results were withheld from the primary papers.
The current paper focussed on the 6 month time point, but results at 12 weeks are available as well (online Additional file 1). Results at 6 months provide better insight regarding efficacy over a longer term than the 12 week results, particularly for patient reported outcomes, but data for the approved dose of indacaterol in the United States (75 μg) are only available at 12 weeks. There is also a need to evaluate whether there are sufficient data available to inform decision-makers regarding longer term comparative efficacy. One potential limitation of the current analysis is that some studies were excluded from the analysis if the outcomes reported deviated by more than 2 weeks from the specified time points of interest. For example, the analyses of SGRQ and trough FEV1 at 6 months excluded data from Stockley et al. 2006 (n = 634), and Chan et al. 2007 (n = 913), which may have influenced the results somewhat for salmeterol and tiotropium. However, given the large number of studies included for these treatments, the exclusion of these trials is not expected to have a large impact. Similarly, only post-dose FEV1 results at 2 hours after dosing were included, which reflected the most commonly reported time point.
Although several network meta-analyses have been published in the area of COPD, it is challenging to compare the current results to previous analyses. Earlier analyses did not include indacaterol [70,71], and more recent analyses have focussed on comparisons to fixed-dose combinations [72,73] or were not as comprehensive in terms of the data or outcomes evaluated. For example, the analysis by Cope et al. 2012 [74] was restricted to four trials from the indacaterol trial program and the review by Cope et al. 2012 [75] focussed on trough FEV1 and SGRQ total score at 12 weeks. Furthermore, these studies did not include evidence regarding treatments such as glycopyrronium 50 μg or tiotropium 5 μg. Recent meta-analyses restricted the evidence to RCTs that directly compared the active interventions of interest [76,77] or placebo controlled trials [78] without considering the full network of evidence. Also, in some cases alternative LABAs were pooled together (i.e. formoterol, salmeterol, and/or indacaterol), despite potential differences in these treatments, preventing a clear comparison to the current results. Therefore, to our knowledge, the current study generates new evidence regarding the efficacy of monotherapies for moderate to severe COPD.
The efficacy outcomes analyzed here provide insight into a broad range of clinically relevant outcomes. FEV1 is often a primary endpoint and reflects an important outcome from a clinical and regulatory perspective, providing a reproducible and objective measurement of airflow limitation [1]. SGRQ and TDI are based on validated instruments and provide unique insight into the patient perspective. Exacerbations reflect another key outcome given their impact on quality of life and resource utilization, although a separate publication will be developed in order to capture the complexity associated with these outcomes. It should be acknowledged that no safety outcomes were assessed, which is a critical aspect of decision-making that has been addressed by others [79][80][81][82][83].
In conclusion, based on the results of the NMA, indacaterol, glycopyrronium, and tiotropium are expected to be the most favourable bronchodilators in terms of lung function, health status, and dyspnoea at six months, although differences were only clinically meaningful for indacaterol 150 μg in comparison to salmeterol in terms of SGRQ and for indacaterol 300 μg in comparison to salmeterol, tiotropium and formoterol in terms of TDI response.

Additional file
Additional file 1: RCT study and patient characteristics, individual study results, flow diagram, and NMA results at 12 weeks. Table S1. Key study characteristics for RCTs included in the network meta-analysis. Table S2. Individual study results for trough FEV 1 at 12 weeks and 6 months (mL): difference in change from baseline (CFB) for treatment versus comparator. Table S3. Individual study results for post-dose FEV 1 at 12 weeks and 6 months (mL): difference in change from baseline (CFB) for treatment versus comparator. Table S4. Individual study results for SGRQ total score at 12 weeks and 6 months: difference in change from baseline (CFB) for treatment versus comparator. Table S5. Individual study results for SGRQ responders at 12 weeks and 6 months: n/N (proportion responders) for each treatment. Table S6. Individual study results for TDI total score at 12 weeks and 6 months: difference in change from baseline (CFB) for treatment versus comparator. Table S7. Individual study results for TDI responders at 12 weeks and 6 months: n/N (proportion responders) for each treatment. Table S8. Results of base case network meta-analysis: Probability of each treatment being the best in terms of trough and post-dose FEV 1 (mL), SGRQ total score and response, and TDI total score and response at 6 months. Figure S1. Flow diagram of study selection. Figure S2. Trough and post-dose FEV 1 network meta-analysis results at 12 weeks: Difference in change from baseline (CFB) versus placebo. Figure S3. SGRQ total score network meta-analysis results at 6 months: Difference in change from baseline (CFB) or odds ratio (OR) versus placebo. Figure S4. TDI total score network meta-analysis results at 6 months: Difference in change from baseline (CFB) or odds ratio (OR) versus placebo.