
Prediction of long-term mortality by using machine learning models in Chinese patients with connective tissue disease-associated interstitial lung disease



Accurate risk assessment is crucial for the management of patients with connective tissue disease-associated interstitial lung disease (CTD-ILD). In the present study, we developed a nomogram to predict 3- and 5-year mortality using a machine learning approach and tested the ILD-GAP model in Chinese CTD-ILD patients.


CTD-ILD patients who were diagnosed and treated at the First Affiliated Hospital of Zhengzhou University between February 2011 and July 2018 were enrolled according to predefined criteria. Cox regression with the least absolute shrinkage and selection operator (LASSO) was used to screen the predictors and generate a nomogram. Internal validation was performed using bootstrap resampling. The nomogram and the ILD-GAP model were then assessed via likelihood ratio testing, Harrell’s C index, integrated discrimination improvement (IDI), net reclassification improvement (NRI) and decision curve analysis.


A total of 675 consecutive CTD-ILD patients were enrolled in this study. During a median follow-up period of 50 (interquartile range, 38–65) months, 158 patients died (mortality rate 23.4%). After feature selection, 9 variables were identified: age, rheumatoid arthritis, lung diffusing capacity for carbon monoxide, right ventricular diameter, right atrial area, honeycombing, immunosuppressive agents, aspartate transaminase and albumin. A predictive nomogram integrating these variables provided better mortality estimates than the ILD-GAP model based on likelihood ratio testing, Harrell’s C index (0.767 and 0.652, respectively) and calibration plots. Compared with the ILD-GAP model, the nomogram yielded an improved IDI (3- and 5-year, 0.137 and 0.136, respectively) and NRI (3- and 5-year, 0.294 and 0.325, respectively). In addition, decision curve analysis showed that the nomogram was more clinically useful.


The results of our study suggest that the ILD-GAP model may not be applicable for predicting mortality risk in Chinese CTD-ILD patients. The nomogram we developed performed well in predicting the 3- and 5-year mortality risk of Chinese CTD-ILD patients, but further studies and external validation are required to determine its clinical usefulness.


Connective tissue disease (CTD), which involves multiple autoimmune mechanisms, is characterized by self-directed inflammation that often leads to collagen deposition, tissue damage and ultimately target organ failure [1]. CTD can involve multiple organs and systems, among which interstitial lung disease (ILD) remains a main cause of morbidity and mortality [2]. The median survival time for patients with CTD-associated ILD (CTD-ILD) has been reported to be around 6.5 years, and up to 12.4% of patients with CTD-ILD die of ILD [3, 4]. Thus, accurate risk assessment is crucial for the management of CTD-ILD patients.

Risk prediction in CTD-ILD remains challenging because of the heterogeneity in patient-specific and disease-specific variables. The ILD-gender-age-physiology (ILD-GAP) model is a multidimensional mortality risk prediction model composed of the ILD diagnosis, sex, age, the percent predicted values of forced vital capacity (FVC %Predicted) and the percent predicted values of diffusion capacity of lung for carbon monoxide (DLco %Predicted). Since the ILD-GAP model was first established by Christopher J. Ryerson et al. in a North American population, it has been widely used to predict mortality across all chronic ILD subtypes, including CTD-ILD [5]. However, the ILD-GAP model has not been validated in Chinese CTD-ILD patients. Therefore, more inclusive studies are needed to validate the existing assessment model and improve its prediction accuracy.

We performed this study to establish a comprehensive predictive nomogram using machine learning algorithms, incorporating demographic characteristics, clinical features, echocardiography, laboratory testing and imaging examination. Furthermore, we tested whether the combination of the nomogram and the ILD-GAP model could generate a superior prognostic performance.



CTD-ILD patients who were diagnosed and treated at the First Affiliated Hospital of Zhengzhou University between February 2011 and July 2018 were enrolled according to predefined criteria. Patients were included if they met all four of the following inclusion criteria: (1) a diagnosis of CTD-ILD according to the criteria recommended by the American Rheumatism Association and the American College of Rheumatology [6,7,8,9,10,11,12], including polymyositis/dermatomyositis (PM/DM), systemic lupus erythematosus (SLE), systemic sclerosis (SSc), ankylosing spondylitis (AS), Sjogren syndrome (SS), mixed connective tissue disease (MCTD), rheumatoid arthritis (RA), undifferentiated connective tissue disease (UCTD) and overlap syndromes (OCTD). UCTD patients also had to meet the diagnostic criteria for UCTD-ILD established in previous research [13]; (2) clinical symptoms (dyspnea or cough); (3) signs suggestive of ILD (end-inspiratory bibasilar crepitations); (4) radiographic signs of ILD (honeycombing, ground-glass opacities, nodular or reticulonodular opacities) confirmed by high-resolution computed tomography (HRCT). Patients were excluded if they met any of the following exclusion criteria: (1) age younger than 18 years; (2) pregnancy; (3) loss to follow-up; (4) incomplete clinical records. This study was approved by the Institutional Review Board of the First Affiliated Hospital of Zhengzhou University (2019-KY-116).

Data collection

Demographic and clinical variables were extracted by medical chart review, including age, sex, occupation, smoking history, days of symptoms, medication treatment history, chronic disease history (diabetes and hypertension), CTD types, pulmonary function tests (PFTs), echocardiography, laboratory data (routine inflammatory, hematological and biochemical parameters) and chest HRCT.

The collected PFT data included FVC %Predicted, the percent predicted values of forced expiratory volume in one second (FEV1%Predicted), FEV1/FVC and DLco %Predicted.

The collected echocardiography data included right ventricular diameter (RVD), right atrial area (RAA), left ventricular diameter, aortic annulus diameter, left atrial diameter, ascending aortic diameter, pulmonary artery diameter, pulmonary artery systolic pressure (PASP), aortic valve regurgitation peak velocity, tricuspid regurgitant peak velocity and left ventricular ejection fraction (LVEF).

The collected laboratory data included Krebs von den Lungen-6, procalcitonin, complement component C4, complement component C3, C-reactive protein (CRP), erythrocyte sedimentation rate, leukocyte count, platelet count, hemoglobin, erythrocyte count, hematocrit, blood urea nitrogen, B-type natriuretic peptide (BNP), uric acid, creatinine, fasting blood glucose, aspartate transaminase (AST), alanine aminotransferase, γ-glutamyltranspeptidase (GGT), alkaline phosphatase, total protein, albumin (ALB), globulin, triglyceride, prothrombin time, cholesterol, activated partial thromboplastin time, prothrombin time activity, thrombin time, international normalized ratio, fibrinogen and D‐dimer.

HRCT images were reviewed independently by 2 expert thoracic radiologists, who were blinded to the patients’ diagnoses. When their readings diverged, the images were re-evaluated until a consensus was reached. The collected HRCT characteristics included honeycombing, ground-glass opacities, nodules, fine reticular opacities, local pleural thickening, pulmonary bullae, hydrothorax and hydropericardium.

Follow‑up and study outcome

The endpoint was all-cause mortality during follow-up, which continued until July 2021. Follow-up was performed by contacting patients or their family members by mobile phone.

Statistical analyses

Analyses were performed with the R programming language (R Core Team, 2021; version 4.1.0). Continuous variables with a normal distribution are presented as mean ± standard deviation (SD), and those with a non-normal distribution as median (interquartile range, IQR). Student's t-test was used to compare normally distributed variables, and the Wilcoxon signed-rank test was used to compare non-normally distributed variables. The Chi-square test and Fisher's exact test were used to compare categorical data. First, multiple imputation by chained equations was performed to impute missing covariates using the “mice” package in R. Second, the least absolute shrinkage and selection operator (LASSO) was applied to avoid overfitting using the “glmnet” package, and lambda (λ) was tuned by tenfold cross-validation (CV) with the “cv.glmnet” function from the same package. Then, Cox regression analysis was used to assess the significance of the remaining predictors of mortality with the “coxph” function in the R package “survival”, and the prognostic nomogram was built from the multivariable Cox regression coefficients with the “rms” package. Finally, the calibration plot for internal validation was generated by a bootstrap method with 1000 resamples of the training set, using the “rms” R package with the parameters “method = “boot”, B = 1000”. The predictive performance of the nomogram and the ILD-GAP model was compared using Harrell’s C index (“survival” R package), likelihood ratio testing (“lrtest” function in the R package “lmtest”), a continuous version of the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) (R packages “survC1” and “survIDINRI”). Additionally, decision curve analysis (DCA) was performed using the source file “stdca.R”. P values less than 0.050 were considered statistically significant.
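To make the discrimination metric concrete, the following is a minimal Python sketch of Harrell's C index for right-censored data. It is an illustration only and not the authors' code (the study used the R package “survival”); the function name `harrell_c`, the toy data and the decision to skip tied follow-up times are assumptions made for this example.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's C index for right-censored survival data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an observed
        # death; pairs with tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, death indicator, predicted risk.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.5, 0.1]
print(harrell_c(toy_times, toy_events, toy_risk))
```

In the toy data, 8 pairs are comparable and 6 of them are concordant, so the index is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.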


Patient characteristics

The process of patient screening is illustrated in Fig. 1. After excluding patients younger than 18 years (n = 4), pregnant patients (n = 2), patients with substantial missing data (n = 5) and patients lost to follow-up (n = 43), a total of 675 patients entered the study. There were no significant differences between the enrolled patients and the patients who were lost to follow-up in age, gender, occupation, smoking history, days of symptoms, medication treatment history, chronic disease history, pulmonary function tests (PFTs) and HRCT (all P > 0.050). Therefore, excluding the patients lost to follow-up may not have affected the overall results of our study (Table 1).

Fig. 1

The flowchart of patient screening and selection for this study. CTD-ILD, connective tissue disease-associated interstitial lung disease

Table 1 Clinical characteristics of CTD-ILD patients

In this study, the mean age of the patients was 54 ± 12 years; 23.7% were male and 11.1% were ever smokers. The median follow-up period was 50 months (interquartile range, 38–65). The disease subtypes comprised polymyositis/dermatomyositis (29.8%), systemic lupus erythematosus (8.3%), systemic sclerosis (13.3%), ankylosing spondylitis (0.1%), Sjogren syndrome (14.4%), mixed connective tissue disease (5.9%), rheumatoid arthritis (RA) (7.4%), undifferentiated connective tissue disease (15.3%) and overlap syndromes (5.5%). A total of 158 patients died during the follow-up period; the 3- and 5-year mortality rates were 17.1% (95% confidence interval (CI) 14.2–19.8%) and 24.5% (95% CI 20.9–28.0%), respectively (Fig. 2). Compared with survivors, deceased patients were significantly more likely to be older, male, ever smokers or farmers, and to have been treated without immunosuppressive drugs (all P < 0.050). Patients with RA had the highest mortality among the CTD subtypes (P < 0.001). Deceased patients were also more likely to have a lower percent predicted diffusion capacity of the lung for carbon monoxide (DLco %Predicted) and left ventricular ejection fraction (LVEF), a larger right atrial area (RAA) and a higher pulmonary artery systolic pressure (PASP) (all P < 0.050). In addition, honeycombing, fine reticular opacities, local pleural thickening, pulmonary bullae, hydrothorax and hydropericardium on chest HRCT were each associated with a higher risk of death (all P < 0.050) (Table 1).

Fig. 2

All-cause mortality among 675 Chinese CTD-ILD patients. CTD-ILD, connective tissue disease-associated interstitial lung disease

Model derivation

A total of 74 prognostic indicators were included in this study. First, we reduced the dimensionality and selected the most informative prognostic indicators using LASSO-penalized Cox regression. A tenfold cross-validation of the LASSO model was performed to select the tuning parameter via the minimum criterion (Fig. 3A). The trajectory of each indicator's coefficient across the log-transformed lambda sequence is shown in the LASSO coefficient profiles (Fig. 3B).

Fig. 3

The Cox regression model with LASSO (least absolute shrinkage and selection operator) was adopted to reduce the redundancy of high-dimensional features and to select the most useful prognostic features. The black line marks the lambda within 1 standard error of the minimum criterion (the 1-SE criterion), and the red line marks the lambda at the minimum criterion. A λ value of 0.052, with log (λ) of − 2.950, was chosen (the minimum criterion) according to tenfold cross-validation (A). LASSO coefficient profiles of the 74 features, plotted against the log (λ) sequence. The red vertical line is drawn at the value selected by tenfold cross-validation, where the optimal λ resulted in 14 nonzero coefficients (B)

Finally, the optimal lambda value obtained with the LASSO algorithm was 0.052 (log (lambda) − 2.950), and 14 variables were selected as potential prognosis-related indicators: age, RA, DLco %Predicted, right ventricular diameter (RVD), RAA, PASP, LVEF, honeycombing, C-reactive protein (CRP), B-type natriuretic peptide (BNP), aspartate transaminase (AST), γ-glutamyltranspeptidase (GGT), albumin (ALB) and immunosuppressive agents. Univariable analysis showed that increased age (hazard ratio (HR) 1.041, 95% CI 1.027–1.055), RVD (HR 1.027, 95% CI 1.017–1.038), RAA (HR 1.122, 95% CI 1.093–1.151), PASP (HR 1.025, 95% CI 1.017–1.034), CRP (HR 1.005, 95% CI 1.002–1.008), BNP (HR 1.000, 95% CI 1.000–1.000), AST (HR 1.003, 95% CI 1.002–1.005) and GGT (HR 1.002, 95% CI 1.001–1.003), as well as lower DLco %Predicted (HR 0.982, 95% CI 0.975–0.990), LVEF (HR 0.949, 95% CI 0.926–0.973) and ALB levels (HR 0.936, 95% CI 0.914–0.959), correlated with increased mortality (all P < 0.001). Patients with RA (HR 2.292, 95% CI 1.539–3.413, P < 0.001) and honeycombing (HR 2.167, 95% CI 1.392–3.373, P = 0.001) also had higher mortality. In addition, mortality was lower in patients receiving immunosuppressive agents (HR 0.506, 95% CI 0.367–0.697, P < 0.001) (Table 2). Variables that were significant in the univariable analysis (P < 0.050) were entered into a multivariable Cox model, which showed that age, RA, DLco %Predicted, RVD, RAA, honeycombing, immunosuppressive agents, AST and ALB significantly affected overall mortality (all P < 0.050) (Table 2). Based on the multivariable Cox regression analysis, these 9 independent variables were included in the nomogram for prognostic assessment (Fig. 4).
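Hazard ratios close to 1 can look negligible when they are expressed per single unit of a continuous variable. The short Python sketch below rescales a per-unit HR to a larger increment. It assumes that the reported HRs are per one-unit increase in each variable's original measurement unit, which the text does not state explicitly; the function name `rescale_hr` and the chosen increments are hypothetical.

```python
import math


def rescale_hr(hr_per_unit, units):
    """Hazard ratio for an increase of `units`, given the per-unit HR.

    Cox coefficients are log hazard ratios, so the effect of a larger
    increment is obtained by multiplying the coefficient on the log scale.
    """
    beta = math.log(hr_per_unit)   # per-unit Cox coefficient
    return math.exp(beta * units)


# Assumption: HR 1.041 is per year of age, HR 0.936 per unit of albumin.
age_hr_10_units = rescale_hr(1.041, 10)
alb_hr_5_units = rescale_hr(0.936, 5)
print(round(age_hr_10_units, 2), round(alb_hr_5_units, 2))
```

Under that assumption, a 10-unit increase in age corresponds to an HR of roughly 1.49 and a 5-unit increase in albumin to an HR of roughly 0.72. The same reasoning explains why a variable measured on a wide numeric scale, such as BNP, can show an HR that rounds to 1.000 per unit while remaining statistically significant.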

Table 2 Risk factors for all-cause mortality in CTD-ILD
Fig. 4

Nomogram predicting CTD-ILD mortality at 3 and 5 years. The nomogram was developed in the primary cohort, with age, rheumatoid arthritis (RA), the percent predicted values of diffusion capacity of lung for carbon monoxide (DLco %Predicted), right ventricular diameter (RVD), right atrial area (RAA), honeycombing, aspartate transaminase (AST), albumin (ALB) and immunosuppressive agents incorporated. The predicted mortality at 3 and 5 years is then obtained from each scale by referring to the corresponding value

Model validation

In univariable Cox regression, higher ILD-GAP scores were associated with increasing mortality (HR 1.413, 95% CI 1.285–1.554, P < 0.001; Table 2). However, the ILD-GAP model did not perform well in predicting mortality (Harrell’s C index 0.652), and calibration plots showed that 3- and 5-year predicted survival rates were overestimated (Fig. 5A, B).

Fig. 5

Calibration plots of ILD-GAP model and nomogram showing predicted 3-year (A and C, respectively) and 5-year (B and D, respectively) survival by stage against actual survival

The nomogram exhibited better prognostic performance (Harrell’s C index 0.767) than the ILD-GAP model. The likelihood ratio test indicated a statistically significant improvement when the nomogram was added to the ILD-GAP model (P < 0.001), but no significant improvement when the ILD-GAP model was added to the nomogram (P = 0.455) (Table 3). Calibration plots for the nomogram-predicted 3- and 5-year overall survival showed good agreement with actual observations (Fig. 5C, D). The nomogram also improved the discrimination of 3-year (IDI 0.137 and NRI 0.294, both P < 0.001) and 5-year (IDI 0.136 and NRI 0.325, both P < 0.001) mortality compared with the ILD-GAP model (Table 4). To assess the clinical utility of both models, we performed decision curve analysis. For decision thresholds > 0%, the nomogram showed a greater net benefit than the ILD-GAP model for clinical intervention (Fig. 6A, B). In internal validation, the average Harrell’s C index for the prediction models developed in the bootstrap samples was 0.876, and the estimate of optimism was − 0.108.
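The IDI and the continuous NRI reported above can be illustrated with a simplified Python sketch. It treats death by a fixed time point as a binary outcome and ignores censoring, whereas the study used the R package “survIDINRI”, which accounts for censored follow-up. The function names and the toy predictions are hypothetical.

```python
def _split_by_outcome(died):
    """Return index lists for patients who died and patients who survived."""
    events = [i for i, d in enumerate(died) if d]
    nonevents = [i for i, d in enumerate(died) if not d]
    return events, nonevents


def idi(p_old, p_new, died):
    """Integrated discrimination improvement for a binary outcome.

    Mean gain in predicted risk among events minus the mean gain among
    non-events when moving from the old model to the new model.
    """
    events, nonevents = _split_by_outcome(died)
    gain_events = sum(p_new[i] - p_old[i] for i in events) / len(events)
    gain_nonevents = sum(p_new[i] - p_old[i] for i in nonevents) / len(nonevents)
    return gain_events - gain_nonevents


def continuous_nri(p_old, p_new, died):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    def net_up(indices):
        up = sum(p_new[i] > p_old[i] for i in indices)
        down = sum(p_new[i] < p_old[i] for i in indices)
        return (up - down) / len(indices)

    events, nonevents = _split_by_outcome(died)
    return net_up(events) - net_up(nonevents)


# Hypothetical predicted 3-year mortality from an old and a new model.
toy_died = [1, 1, 0, 0, 0]
toy_old = [0.30, 0.40, 0.30, 0.20, 0.25]
toy_new = [0.60, 0.35, 0.10, 0.25, 0.15]
print(idi(toy_old, toy_new, toy_died), continuous_nri(toy_old, toy_new, toy_died))
```

Positive values of either measure indicate that the new model assigns higher risk to patients who died and lower risk to patients who survived, which is the direction reported for the nomogram relative to the ILD-GAP model.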

Table 3 Comparison of nomogram and the ILD-GAP model
Table 4 Prediction improvement with nomogram compared to ILD-GAP model
Fig. 6

Decision curve analysis comparing the clinical performance of the nomogram and the ILD-GAP model. For the risk of 3-year (A) and 5-year (B) mortality, the nomogram showed the highest net benefit across all potential thresholds. The black dotted line represents the nomogram and the red dotted line represents the ILD-GAP model. The black solid line represents the assumption that no patients receive treatment and the blue solid line represents the assumption that all patients receive treatment
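The net benefit plotted in a decision curve can be sketched in Python as follows. This simplified version assumes a binary outcome without censoring, whereas the “stdca.R” script used in the study is designed for survival data; the function names and toy data are hypothetical.

```python
def net_benefit(pred_risk, died, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    true_pos = sum(1 for p, d in zip(pred_risk, died) if p >= threshold and d)
    false_pos = sum(1 for p, d in zip(pred_risk, died) if p >= threshold and not d)
    # The odds of the threshold express how many false positives are
    # considered an acceptable price for one true positive.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(died, threshold):
    """Reference strategy in which every patient is treated."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical cohort of 10 patients with predicted mortality risks.
toy_died = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
toy_risk = [0.8, 0.6, 0.5, 0.2, 0.1, 0.3, 0.4, 0.1, 0.2, 0.05]
print(net_benefit(toy_risk, toy_died, 0.3), net_benefit_treat_all(toy_died, 0.3))
```

The strategy of treating no one has a net benefit of zero by definition. A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is how the curves in the figure are read.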


The ILD-GAP model was derived and validated in a Western cohort but has not been validated in a Chinese population to date, and its ability to accurately define disease stage is debated [14,15,16]. To reduce potential racial bias from the ILD-GAP model, we developed a nomogram for predicting 3- and 5-year mortality of Chinese CTD-ILD patients using a machine learning approach and tested whether the combination of the nomogram and the ILD-GAP model could generate a superior prognostic performance.

Multivariable analysis demonstrated that older age, RA, honeycombing, lower DLco %Predicted and ALB, and increased RVD, RAA and AST were associated with higher mortality, whereas receiving immunosuppressive agents correlated with reduced mortality. These independent risk factors are supported by previous studies and theories. Age has been demonstrated to be an independent predictor of mortality in CTD-ILD, because older patients generally have more comorbidities and worse health status [16]. Among ILD patterns, usual interstitial pneumonia (UIP) on chest HRCT responds poorly to corticosteroids and has a worse prognosis than other subtypes [17, 18]. Honeycombing occurs in up to 90% of UIP cases, and it is the most specific finding of UIP on chest HRCT [19]. Therefore, honeycombing is correlated with the prognosis of CTD-ILD patients to some extent. Gas exchange impairment is a common pathophysiological change in the early stage of ILD and typically presents as a reduction of DLco [20]. Fu et al. reported that a percent predicted DLco < 45% is a risk factor for CTD-ILD prognosis [21]. Elevated serum AST and abnormal ALB can be caused by impaired heart, liver and kidney function due to CTD-ILD [22]. Long-term monitoring of serum AST and ALB can be an early warning signal before organ dysfunction occurs. Furthermore, an abnormal increase in AST and hypoalbuminemia have been shown to increase mortality in CTD-ILD patients [23,24,25]. Long-term hypoxia caused by gas exchange impairment may lead to an increase in pulmonary artery pressure and right ventricular afterload [26]. Right heart enlargement due to persistently increased afterload, which is characterized by an increase in RVD and RAA, is a common cause of mortality in patients with ILD [27]. In addition, glucocorticoid and immunosuppressive therapy are essential choices for CTD-ILD patients, and mortality can be reduced by the appropriate use of immunosuppressive agents [2, 28,29,30]. ILD can complicate RA and is associated with excess mortality [31]. Research has shown that nearly 10% of RA patient deaths were attributable to ILD. RA patients are more likely to die of ILD than other CTD patients [2, 32].

We developed a nomogram from these independent mortality risk factors based on the multivariable analysis. In building it, we assessed the association between predictor variables and time-to-event outcomes using the LASSO-Cox method. LASSO is a machine learning algorithm that uses regularization to improve estimation accuracy: it adds an L1 penalty term to the loss function, which shrinks coefficients towards zero and sets some of them exactly to zero. The LASSO-Cox method has become popular among researchers because it can minimize overfitting and select predictors for a nomogram [33].
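The way an L1 penalty removes weak predictors can be illustrated with the soft-thresholding operator, which is the closed-form LASSO solution for a single coefficient in the simplest least-squares setting with orthonormal predictors. The Python sketch below is an assumption-laden illustration: penalized Cox fitting typically applies this operator repeatedly inside an iterative coordinate-descent loop rather than once, the coefficient values and predictor names are invented, and the λ of 0.052 is borrowed from the study only as a convenient number.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients for four made-up predictors.
unpenalized = {"x1": 0.040, "x2": -0.300, "x3": 0.020, "x4": 0.750}
lam = 0.052  # illustrative penalty strength

shrunk = {name: soft_threshold(beta, lam) for name, beta in unpenalized.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
print(shrunk)
print(selected)
```

In this toy example the two small coefficients are set to zero and the two large ones are shrunk by λ, so only `x2` and `x4` remain. This is the mechanism by which LASSO performs variable selection and shrinkage at the same time.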

In our cohort, the nomogram for Chinese CTD-ILD patients showed better discriminative ability, calibration and clinical net benefit than the ILD-GAP model. Although the combination of the nomogram and the ILD-GAP model improved prognostic performance compared with the ILD-GAP model alone, it did not improve prognostic performance compared with the nomogram alone. Specifically, Harrell’s C index and the calibration curve of the nomogram showed good concordance between predicted and actual mortality risk. The nomogram also improved the discrimination of mortality compared with the ILD-GAP model, as confirmed by the integrated discrimination improvement and net reclassification improvement. For decision thresholds > 0%, the nomogram showed a higher net benefit than the ILD-GAP model for clinical intervention in decision curve analysis. Two reasons might explain why the ILD-GAP model is inferior to the nomogram in predicting the prognosis of Chinese CTD-ILD patients. First, the ILD-GAP model was derived and validated in a Western cohort that did not involve a Chinese population. Thus, the risk of bias arising from ethnic differences should be considered. Second, the GAP risk prediction model, from which the ILD-GAP model was derived, was specifically developed for prognosis prediction in patients with idiopathic pulmonary fibrosis (IPF). However, the median survival time of IPF is much shorter than that of CTD-ILD [5, 34]. The ILD-GAP model can undoubtedly provide valuable information for the management of CTD-ILD patients, but a more complex model seems necessary to achieve better predictions [35,36,37]. The clinical indicators included in this nomogram are routine and easily acquired in most hospitals, which makes the nomogram applicable for daily clinical use. We believe that the nomogram could be widely used as a clinical reference after cross-sectional and longitudinal validation and improvement.

Our study has some limitations. First, the nomogram was not subjected to external validation; therefore, caution is advised when employing it in clinical practice. To the best of our knowledge, this is the first predictive model developed for predicting all-cause mortality in the Chinese population with CTD-ILD, and we believe that an early report is needed to provide a basis for future studies. Second, the disease categories were included as predictors in the nomogram instead of the serologic autoantibodies, because of the risk of collinearity. Third, the median survival time for CTD-ILD patients has been reported to be around 6.5 years, but the median follow-up period in our cohort was 50 (interquartile range, 38–65) months. However, our study had a larger sample size and a longer follow-up period than most previous studies. Fourth, the nomogram and the ILD-GAP model were established from baseline characteristics, and longitudinal disease activity was not considered. Thus, the omission of risk-associated disease trajectories would likely have led both models to underestimate the true relation between CTD-ILD and mortality.


In conclusion, the ILD-GAP model performed poorly in predicting mortality in Chinese patients with CTD-ILD. We developed a nomogram for predicting 3- and 5-year mortality in Chinese CTD-ILD patients using a machine learning approach, and it performed well in predicting mortality risk.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.



Abbreviations

CTD: Connective tissue disease
ILD: Interstitial lung disease
CTD-ILD: Connective tissue disease-associated interstitial lung disease
PFT: Pulmonary function test
PM/DM: Polymyositis/dermatomyositis
SLE: Systemic lupus erythematosus
SSc: Systemic sclerosis
AS: Ankylosing spondylitis
SS: Sjogren syndrome
MCTD: Mixed connective tissue disease
RA: Rheumatoid arthritis
UCTD: Undifferentiated connective tissue disease
OCTD: Overlap syndromes
HRCT: High-resolution computed tomography
FVC %Predicted: Percent predicted values of forced vital capacity
FEV1%Predicted: Percent predicted values of forced expiratory volume in one second
DLco %Predicted: Percent predicted values of diffusion capacity of lung for carbon monoxide
RVD: Right ventricular diameter
RAA: Right atrial area
PASP: Pulmonary artery systolic pressure
LVEF: Left ventricular ejection fraction
BNP: B-type natriuretic peptide
AST: Aspartate transaminase
SD: Standard deviation
IQR: Interquartile range
LASSO: Least absolute shrinkage and selection operator
NRI: Net reclassification improvement
IDI: Integrated discrimination improvement
DCA: Decision curve analysis


References

1. Spagnolo P, Distler O, Ryerson CJ, Tzouvelekis A, Lee JS, Bonella F, et al. Mechanisms of progressive fibrosis in connective tissue disease (CTD)-associated interstitial lung diseases (ILDs). Ann Rheum Dis. 2021;80(2):143–50.
2. Mathai SC, Danoff SK. Management of interstitial lung disease associated with connective tissue disease. BMJ (Clinical research ed). 2016;352:h6819.
3. Suzuki A, Kondoh Y, Fischer A. Recent advances in connective tissue disease related interstitial lung disease. Expert Rev Respir Med. 2017;11(7):591–603.
4. Demoruelle MK, Mittoo S, Solomon JJ. Connective tissue disease-related interstitial lung disease. Best Pract Res Clin Rheumatol. 2016;30(1):39–52.
5. Ryerson CJ, Vittinghoff E, Ley B, Lee JS, Mooney JJ, Jones KD, et al. Predicting survival across chronic interstitial lung disease: the ILD-GAP model. Chest. 2014;145(4):723–8.
6. McVeigh CM, Cairns AP. Diagnosis and management of ankylosing spondylitis. BMJ (Clinical research ed). 2006;333(7568):581–5.
7. Sharp GC, Irvin WS, Tan EM, Gould RG, Holman HR. Mixed connective tissue disease–an apparently distinct rheumatic disease syndrome associated with a specific antibody to an extractable nuclear antigen (ENA). Am J Med. 1972;52(2):148–59.
8. Vitali C, Bombardieri S, Moutsopoulos HM, Coll J, Gerli R, Hatron PY, et al. Assessment of the European classification criteria for Sjogren’s syndrome in a series of clinically defined cases: results of a prospective multicentre study. The European Study Group on Diagnostic Criteria for Sjogren’s Syndrome. Ann Rheum Dis. 1996;55(2):116–21.
9. Wolf L, Sheahan M, McCormick J, Michel B, Moskowitz RW. Classification criteria for systemic lupus erythematosus. Frequency in normal patients. JAMA. 1976;236(13):1497–9.
10. Bohan A, Peter JB. Polymyositis and dermatomyositis (first of two parts). N Engl J Med. 1975;292(7):344–7.
11. Preliminary criteria for the classification of systemic sclerosis (scleroderma). Subcommittee for scleroderma criteria of the American Rheumatism Association Diagnostic and Therapeutic Criteria Committee. Arthritis Rheum. 1980;23(5):581–90.
12. Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO 3rd, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 2010;62(9):2569–81.
13. Hu Y, Wang LS, Wei YR, Du SS, Du YK, He X, et al. Clinical characteristics of connective tissue disease-associated interstitial lung disease in 1,044 Chinese patients. Chest. 2016;149(1):201–8.
14. Brusca RM, Pinal-Fernandez I, Psoter K, Paik JJ, Albayda J, Mecoli C, et al. The ILD-GAP risk prediction model performs poorly in myositis-associated interstitial lung disease. Respir Med. 2019;150:63–5.
15. Mango RL, Matteson EL, Crowson CS, Ryu JH, Makol A. Assessing mortality models in systemic sclerosis-related interstitial lung disease. Lung. 2018;196(4):409–16.
16. Kam MLW, Li HH, Tan YH, Low SY. Validation of the ILD-GAP model and a local nomogram in a singaporean cohort. Respiration. 2019;98(5):383–90.
17. Yunt ZX, Chung JH, Hobbs S, Fernandez-Perez ER, Olson AL, Huie TJ, et al. High resolution computed tomography pattern of usual interstitial pneumonia in rheumatoid arthritis-associated interstitial lung disease: relationship to survival. Respir Med. 2017;126:100–4.
18. Kim EJ, Elicker BM, Maldonado F, Webb WR, Ryu JH, Van Uden JH, et al. Usual interstitial pneumonia in rheumatoid arthritis-associated interstitial lung disease. Eur Respir J. 2010;35(6):1322–8.
19. Chung JH, Chawla A, Peljto AL, Cool CD, Groshong SD, Talbert JL, et al. CT scan findings of probable usual interstitial pneumonitis have a high predictive value for histologic usual interstitial pneumonitis. Chest. 2015;147(2):450–9.
20. Kelly CA, Saravanan V, Nisar M, Arthanari S, Woodhead FA, Price-Forbes AN, et al. Rheumatoid arthritis-related interstitial lung disease: associations, prognostic factors and physiological and radiological characteristics–a large multicentre UK study. Rheumatology (Oxford). 2014;53(9):1676–82.
21. Fu Q, Wang L, Li L, Li Y, Liu R, Zheng Y. Risk factors for progression and prognosis of rheumatoid arthritis-associated interstitial lung disease: single center study with a large sample of Chinese population. Clin Rheumatol. 2019;38(4):1109–16.
22. Hull RP, Goldsmith DJ. Nephrotic syndrome in adults. BMJ (Clinical research ed). 2008;336(7654):1185–9.
23. Lawrence YA, Steiner JM. Laboratory evaluation of the liver. Vet Clin North Am Small Anim Pract. 2017;47(3):539–53.
24. Li R, Zhu WJ, Wang F, Tang X, Luo F. AST/ALT ratio as a predictor of mortality and exacerbations of PM/DM-ILD in 1 year-a retrospective cohort study with 522 cases. Arthritis Res Ther. 2020;22(1):202.
25. Akirov A, Masri-Iraqi H, Atamna A, Shimon I. Low albumin levels are associated with mortality risk in hospitalized patients. Am J Med. 2017;130(12):1465.
26. Grimminger J, Ghofrani HA, Weissmann N, Klose H, Grimminger F. COPD-associated pulmonary hypertension: clinical implications and current methods for treatment. Expert Rev Respir Med. 2016;10(7):755–66.
27. Wang Z, Chesler NC. Pulmonary vascular mechanics: important contributors to the increased right ventricular afterload of pulmonary hypertension. Exp Physiol. 2013;98(8):1267–73.
28. Castelino FV, Varga J. Interstitial lung disease in connective tissue diseases: evolving concepts of pathogenesis and management. Arthritis Res Ther. 2010;12(4):213.
29. Vij R, Strek ME. Diagnosis and treatment of connective tissue disease-associated interstitial lung disease. Chest. 2013;143(3):814–24.
30. Witt LJ, Demchuk C, Curran JJ, Strek ME. Benefit of adjunctive tacrolimus in connective tissue disease-interstitial lung disease. Pulm Pharmacol Ther. 2016;36:46–52.
31. Young A, Koduri G, Batley M, Kulinskaya E, Gough A, Norton S, et al. Mortality in rheumatoid arthritis. Increased in the early course of disease, in ischaemic heart disease and in pulmonary fibrosis. Rheumatology. 2007;46(2):350–7.
32. Olson AL, Swigris JJ, Sprunger DB, Fischer A, Fernandez-Perez ER, Solomon J, et al. Rheumatoid arthritis-interstitial lung disease-associated mortality. Am J Respir Crit Care Med. 2011;183(3):372–8.
33. Tibshirani R. The lasso method for variable selection in the Cox model. Stat Med. 1997;16(4):385–95.

  34. 34.

    Ley B, Ryerson CJ, Vittinghoff E, Ryu JH, Tomassetti S, Lee JS, et al. A multidimensional index and staging system for idiopathic pulmonary fibrosis. Ann Intern Med. 2012;156(10):684–91.

    Article  Google Scholar 

  35. 35.

    Jones PW, Quirk FH, Baveystock CM. The St George’s respiratory questionnaire. Respir Med. 1991;85(Suppl B):25–31.

    Article  Google Scholar 

  36. 36.

    Schurink CAM, Nieuwenhoven CAV, Jacobs JA, Rozenberg-Arska M, Joore HCA, Buskens E, et al. Clinical pulmonary infection score for ventilator-associated pneumonia: accuracy and inter-observer variability. Intensive Care Med. 2004;30(2):217–24.

    Article  Google Scholar 

  37. 37.

    Valencia M, Badia JR, Cavalcanti M, Ferrer M, Agusti C, Angrill J, et al. Pneumonia severity index class v patients with community-acquired pneumonia: characteristics, outcomes, and value of severity scores. Chest. 2007;132(2):515–22.

    Article  Google Scholar 


Not applicable.


This study was supported by the National Natural Science Foundation of China (U1904142, 82000015), the Scientific and Technological Projects of the Science and Technology Department of Henan Province (182102410010), and the Key Scientific Research Project of Colleges and Universities in Henan Province (18A320056).

Author information




DS, YW, QL, TT-W, PF-L, TC-J, LL-D, LQ-J and WJ-Z selected the patients and acquired the data. DS analyzed and interpreted the data and wrote the manuscript. YW was substantially involved in revising the article. ZC had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhe Cheng.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for this study was obtained from the Institutional Review Board of the First Affiliated Hospital of Zhengzhou University (2019-KY-116).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, D., Wang, Y., Liu, Q. et al. Prediction of long-term mortality by using machine learning models in Chinese patients with connective tissue disease-associated interstitial lung disease. Respir Res 23, 4 (2022).



  • Interstitial lung disease
  • Connective tissue disease
  • Nomogram
  • Machine learning