Improving the evaluation of COPD exacerbation treatment effects by accounting for early treatment discontinuations: a post-hoc analysis of randomized clinical trials

Background Chronic obstructive pulmonary disease (COPD) clinical trials aimed at evaluating treatment effects on exacerbations often suffer from early discontinuations of randomized treatment. Treatment discontinuations imply a loss of information and should ideally be considered in the statistical analysis of trial results, particularly if the discontinuations are related to the disease or treatment itself. Here, we explore this issue by investigating (1) whether there exists an association between the risks of exacerbation and treatment discontinuation in COPD clinical trials and (2) whether disregarding this association can cause bias in exacerbation treatment effect estimates. We focus on the hypothetical estimand, i.e. the treatment effect that would have been observed had all subjects completed the trial as planned. Methods The association between exacerbation and discontinuation risks was analysed by applying a joint frailty (random effect) model – allowing for the simultaneous analysis of multiple types of correlated events – to data from five Phase III-IV COPD clinical trials. Specifically, the impact of the association on exacerbation treatment effect estimates was assessed by comparing the treatment hazard ratios of the joint frailty model to the rate/hazard ratios of two related statistical models (the negative binomial and shared frailty models), which both assume discontinuations to be unrelated to the trial outcome. The models were also compared using simulated data. Results A statistically significant (p < 0.0001), positive association between exacerbation and discontinuation risks was found in all trials. Importantly, simulations confirmed that – with such an association – models disregarding the association risk producing biased results (> 5 percentage point difference in hazard/rate ratio). For some treatment comparisons in the clinical trials, the difference in treatment effect estimates between the joint frailty and the other models was as high as 10–15 percentage points. The difference was affected by the strength of the exacerbation-discontinuation association, the population heterogeneity in exacerbation risk, and the difference in discontinuation rates between treatment arms. Conclusions We have identified an association between the risks of exacerbation and treatment discontinuation in five COPD clinical trials. We recommend using the joint frailty model to account for this association when estimating exacerbation treatment effects, particularly when targeting the hypothetical estimand.


S2 Joint frailty model for recurrent episodes and a terminal event
For a patient i, i = 1, · · · , N we observe time of a terminal event (time to early treatment discontinuation), T i = min(T * i , C i ) which is the minimum of time to the true event from the beginning of the study, T * i , and the time of the right-censoring, C i which is assumed to be non-informative and corresponds to the case when, for a given patient, an early treatment discontinuation was not observed, i.e. the individual completed the study. We denote the observed starting times  NA -category as such not included in the study protocol a In datasets A-C, early treatment discontinuation from the study due to deterioration of COPD was recorded as "adverse event" b "Sponsor decision" applies only to dataset E and means that the sites were shut down Table S1: Reasons for early treatment discontinuation in the analyzed datasets. Percentages in the different discontinuation categories are based on the total number of discontinuations in each dataset. Among the individuals who discontinued treatment early in the trials, there were also patients who stopped treatment as they were not willing to continue and subsequently withdrew their consent. These patients are not presented here as their data was not available due to data re-use rules.
of the recurrent episodes T s ij = min(T s * ij , C i , T * i ), j = 1, · · · , n i , with T s * ij the true time of the start of the episode (exacerbation). Analogically, we denote the observed ending time of the recurrent episodes T e ij = min(T e * ij , C i , T * i ). We define the indicators of the events, δ i = I {T i =T * i } , for early treatment discontinuations, and δ ij = I {T s ij =T s * ij } , for exacerbations. We assume, for all ij ≤t} define counting processes that represent the number of observed event starts and endings before individual i was completed or withdrew from the study since the study entry. Denote by Y ex i (t−)} the at-risk process whether the previous exacerbation has terminated. The joint frailty model is expressed as the conditional hazard functions for recurrent and terminal events considering repeated episodes and is defined by [6]: where X ex,ij are covariates for the recurrent event process and X ed,i are covariates for the terminal event process. Parameters β ex and β ed are the related regression coefficients. Shared frailty terms u i are assumed to be independent of each other and identically distributed under a gamma distribution, u i ∼ Γ(1/θ, 1/θ). The variance θ reflects the population heterogeneity, the intensity of the association between the recurrent events and the terminal event as well as the correlation between the recurrent events of the same subject. Finally, the parameter of association α determines the intensity and the sign of influence of the recurrent events on the terminal event. In a joint model for recurrent events and a terminal event, for subject i at each rank j we observe {T s ij , T e ij , δ ij , T i , δ i }. Denote by ξ = (r 0 (·), λ 0 (·), β ex , β ed , θ, α) the vector of parameters to estimate. We assume that the processes of recurrent and terminal events are independent of each other given the frailty u i . Thus, the joint individual marginal likelihood can be expressed by: where functions f T ex |u i (·|·) and f T ed |u i (·|·) denote density functions of recurrent events and the terminal event, respectively, and f u i (·) is the density function of the frailty term.
The inference of the model uses the approach of penalized maximum likelihood estimation with cubic M-splines, polynomial functions of order 3 [7], for the approximation of the baseline risk functions. The penalization of the log-likelihood is performed for the smooth estimation of baseline hazard functions. The penalty terms include second derivatives of the baseline risk functions and smoothing parameters that control the trade-off between the smoothness and fit to the data. The smoothing parameters can be chosen using appropriate automatic procedures on reduced models [8].
The integral in the marginal likelihood has no analytical solution and we approximate it using the Gauss-Legendre quadrature. In order to provide accurate estimates we place the quadrature points in regions that can be problematic when the gamma distribution for the frailty has a high variance (θ > 1). We use 50 points in the region (0,0.001), 50 points in (0.001, 1), 20 points in (1,10) and 20 points in (10,50). The penalized log-likelihood is maximized using the Marquardt algorithm. The standard errors of the estimated parameters are obtained from the inverse Hessian matrix of the penalized log-likelihood.
The baseline hazards are estimated in the model using the approximation of cubic M-splines. In this approach, smooth hazard functions with low local variations are assumed and this is obtained by introducing penalized maximum likelihood estimation (MPLE). In the MPLE, the log-likelihood of the model is penalized by term that have large values for rough functions. The smoothing parameters that control the trade-off between the fit to the data and smoothness, can be chosen using appropriate automatic procedures on the reduced models [8].

S3 Negative binomial model
The negative binomial model can be parametrized in a number of different ways. With the mean parametrization as is used in the R implementation of the glm.nb function, the density function is defined as Here, k is the outcome (in our case the number of events), µ is the mean (number of events), and θ is a parameter related to the (excess) variance modeled compared to a simple Poisson distribution. If X has a negative binomial distribution under this parametrization we have that where we refer to δ as the dispersion parameter. This parameter is tabulated in table S3 for the negative binomial models studied.

S4 Software
The joint frailty and shared frailty models were all estimated using the R package frailtypack [11]. The negative binomial models were estimated with the function glm.nb from the R package MASS [12]. The LASSO regression was performed using the R package glmnet [13].

S5 Simulation study
The main purpose of the simulation study was to investigate the impact of differential early treatment discontinuation, either positively or negatively related to therapy and/or medications, on the estimation of the treatment effect on the risk of recurrent exacerbations.

S5.1 Data generation
We generated recurrent event times and terminal event times from a joint frailty model assuming the processes being positively correlated. The generated datasets were similar to the clinical ones; we assumed a study length of one year (fixed right-censoring) with two treatment groups and 800 individuals in total. Firstly, for an individual, we generated the frailty parameter from a gamma distribution, then we randomly assigned the individual to the treatment group and we used these values to generate the time to early treatment discontinuation using exponential baseline hazard functions. If the generated time to early treatment discontinuation was larger than one year, the individual was considered as "censored", i.e. he/she completed the study and the early treatment discontinuation was not observed. Then, considering the generated time to terminal event, we generated recurrent event times up to the time of the end of observation. We assumed the hazard ratio of treatment for recurrent exacerbations equal to 0.6.

S5.2 Scenarios and models
We investigated the bias of treatment effect on exacerbations assuming different effects of treatment on early discontinuations. In particular, we considered hazard ratios for discontinuation related treatment effects equal to 0.5, meaning that individuals tended to discontinue more in the control group, HR = 1, which means no differential early treatment discontinuation, and HR = 2 that corresponded to the case of more early withdrawals in the treatment group. For each scenario we also considered different rates of exacerbations and early treatment discontinuations. We controlled the rates of events by adjusting the exponential distribution parameter for the baseline hazards. Finally, we investigated the bias by considering different combinations of strong (α = 1.5) and weak association (α = 0.8) between the recurrent and terminal event processes and high (θ = 2) and moderate (θ = 0.8) variance of the frailty term. The simulations results for each scenario were obtained through 1000 replicates of 800 individuals. We estimated the shared frailty and joint frailty models with an assumed gamma distribution for the frailty term and applied splines with 4 knots distributed using equidistant knots. The penalty terms were found for each dataset using the automatic cross-validation procedure in the shared frailty model for exacerbations. In the joint model, we used the same penalty for recurrent episodes and for terminal events, we used the penalty obtained from a Cox model for early treatment discontinuations. In case of non-convergence, we changed the value of the penalty term (by multiplying by 10 values smaller than 100 and dividing by 10 values larger than 100) until convergence was reached.

S5.3 Results
In all scenarios, we observed biased estimates of the treatment effect on the risk of exacerbations using shared frailty models that ignore the process of early treatment discontinuations ( Figure  S1, the right bottom panel corresponds to the results presented in Figure 3 in the main text). When applying the joint frailty model to the same generated datasets, we found unbiased results ( Figure S2).
With the shared frailty model, we observed underestimation of the treatment effect on the risk of exacerbations when more patients withdraw from the study in the control group (HR = 0.5 for treatment effect on the risk of early discontinuations). On the other hand, the treatment effect on the risk of exacerbations was overestimated for HR on early treatment discontinuations equal to 2 (similar to trends observed in datasets D and E), when the treatment is associated with increased risk of early discontinuations. Finally, when there were no differential early treatment discontinuations, i.e. HR of the treatment effect on the risk of discontinuations equaled 1, we observed small bias in the estimates of the effect of treatment on the risks of exacerbations in the shared frailty model.
We observed a larger bias in scenarios with the strong association and high variance compared to the scenarios of the weak association and moderate variance. When considering the rates of events on the magnitude of the bias of the treatment effect, we found that the number of exacerbations observed per patient had less influence than the rate of observed early treatment discontinuations in the data. In the case, when there were many discontinuations (40%), we found larger average bias than in the scenario with smaller number of discontinuations (25%). Indeed, when we have many early treatment discontinuations in the data, the average time of follow up is shorter and we are less able to correctly estimate the parameters. On the other hand, the results showed that even if the rate of discontinuations is not larger than 25%, as it is the case in our application, we are still able to observe the bias of the effect of treatment on the risk of exacerbations if the process of early treatment discontinuations is ignored. Treatment effect on discontinuation (hazard ratio) Treatment effect on exacerbations (median hazard ratio) Figure S2: Bias of treatment effect on the risk of exacerbations using joint frailty model that includes the process of early treatment discontinuations in the simulation study in scenarios of different strength of association between the generated processes. Different points on the plot represent scenarios with different rates of exacerbations and early treatment discontinuations in the data.  For the COPD datasets we estimated models with only treatment effect as a covariate. For the shared frailty and joint frailty models, we used percentile splines (the intervals for splines were divided in a way to obtain the same number of events in each interval) for estimating the baseline hazard functions with the number of knots chosen to ensure the best goodnessof-fit to the data (dataset A: 13, B: 12, C: 12, D: 15, E: 14). Random effects were assumed to be distributed according to a gamma distribution so that the models would be similar to the rate analysis using negative binomial models. However, a sensitivity analysis of the frailty distribution showed that the estimations and goodness-of-fit (using an approximated likelihood cross-validation criterion, LCV [9]) were almost identical for both gamma and log-normal joint frailty models. The exacerbation hazard ratios in all the models as well as association parameters in the joint frailty model are presented in the main manuscript (Tables 3 and 4). Estimates of the frailty variance in the shared frailty model and dispersion parameters in the negative binomial model are presented in Table S2. Early treatment discontinuation hazard ratios in the joint frailty model are presented in Table S3.
For comparison with pooled data analysis with covariates, results of the joint frailty model for pooled dataset with no covariates (only treatment and study effect) are presented in Table  S4.

S6.2 Joint models with covariates applied to pooled COPD datasets
The analysis that included demographic and disease related prognostic factors as covariates in addition to treatment was performed on pooled datasets. The variables are listed in Table S5. We combined datasets with similar treatments and similar design; from datasets A, B and C we created pooled A-C and from datasets D and E we formed pooled D-E.
The covariates were selected for the analysis using a combination of an automatic procedure and clinical knowledge. Firstly, the LASSO (least absolute shrinkage and selection operator)   and impact (0-100 score) from the run-in period CAT score total score, i.e. sum of cough, mucus, tightness of chest, value at baseline (D-E trials) breathlessness, activities limitation, confidence to leave home, sleep, energy (0-40 score) Table S5: Baseline variables considered in the analysis.
Cox regression [10] was performed for simplified models: time to first exacerbation and time to early treatment discontinuation applied to separate datasets. Four methods for finding optimum penalty were applied: 10-fold cross-validation (CV) rule, one standard error 10-fold CV rule, BIC rule and bootstrap (B=1000) BIC rule. Then, we created a set of covariates (including treatment) for datasets A-C if the covariates were selected by at least two different methods in at least two datasets. For datasets D-E, the covariates were chosen if they were selected in at least two methods. The covariate sets were complemented by variables that, although not selected by the LASSO procedure, were found to be clinically meaningful. The final sets of covariates for each study was found using stepwise backward selection with joint frailty models on each pooled dataset. In this step we also added the time-varying seasonality covariate. To facilitate the model selection, all continuous variables were standardized. The respective steps of the covariate selection are presented in Figure S3 and the specifications of the final set of variables can be found in table 5 of the main text. In the final model for pooled A-C we did not use the laboratory variable (neutrophils, significant effect on exacerbation risk, p-value = 0.04) even though it was chosen in the backward selection procedure as there were missing values for an important number of patients (48 in dataset A, 38 in B and 36 in C) and imputing values would introduce additional uncertainty to the estimates. Summary of all the included covariates is presented in Table S6.
We applied splines for the approximation of baseline hazard functions using the approach of percentiles. Using graphical assessment of the estimated baseline hazard functions and fit to the data using LCV, we found that 13 knots were enough to well approximate the functions in both pooled A-C and D-E datasets. The plots of the estimated baseline hazard risks are shown in Figure S4. We observed hazard functions for exacerbations with quite regular periods of increased and decreased risks during the follow-up. The pattern of periods of increased risks was similar for exacerbations and early treatment discontinuations but the increase of the risk of discontinuation was smaller compared to exacerbations.  Figure S3: Flow chart of covariate selection process for the joint models analysis (Ex -exacerbation risk process, ETD -early treatment discontinuation risk process).     Figure S4: Estimated baseline hazard functions of the full joint frailty models for pooled datasets (output from the package frailtypack ). Red lines correspond to baseline hazard risks of recurrent moderate/severe exacerbations and green lines to early treatment discontinuations.