Development and performance assessment of novel machine learning models to predict pneumonia after liver transplantation

Background: Pneumonia is the most frequently encountered postoperative pulmonary complication (PPC) after orthotopic liver transplantation (OLT) and causes high morbidity and mortality. We aimed to develop a model to predict postoperative pneumonia in OLT patients using machine learning (ML) methods.
Methods: Data of 786 adult patients who underwent OLT at the Third Affiliated Hospital of Sun Yat-sen University from January 2015 to September 2019 were retrospectively extracted from electronic medical records and randomly subdivided into a training set and a testing set. With the training set, six ML models including logistic regression (LR), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost) and gradient boosting machine (GBM) were developed. These models were assessed on the testing set by the area under the receiver operating characteristic curve (AUC). The related risk factors and outcomes of pneumonia were also examined based on the chosen model.
Results: A total of 591 OLT patients were eventually included, and 253 (42.81%) were diagnosed with postoperative pneumonia, which was associated with increased postoperative hospitalization and mortality (P < 0.05). Among the six ML models, the XGBoost model performed best. The AUC of the XGBoost model on the testing set was 0.794 (sensitivity: 52.6%; specificity: 77.5%). Pneumonia was notably associated with 14 features: INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, TBIL, anesthesia time, preoperative length of stay, total fluid transfusion and operation time.
Conclusion: Our study is the first to demonstrate that the XGBoost model with 14 common variables might predict postoperative pneumonia in OLT patients.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12931-021-01690-3.


Introduction
Postoperative pulmonary complications (PPC) adversely affect the clinical course of orthotopic liver transplantation (OLT) and play an important role in poor survival [1]. Postoperative pneumonia is the most common type of PPC, contributing to morbidity, length of hospital stay, and mortality [2]. Identifying patients at high risk of developing postoperative pneumonia is key to the early implementation of interventions that prevent its onset or of antibiotics that treat bacterial infection [3]. Conversely, unnecessary and excessive antibiotic use in patients at low risk of postoperative pneumonia can lead to antibiotic resistance and side effects. For instance, recent studies have shown that extensive use of antibiotics for antibacterial prophylaxis has induced multi-drug resistant bacteria in post-transplant patients [4,5]. Therefore, it is essential to establish a reliable model for predicting postoperative pneumonia, so that preventive interventions and treatments can be tailored to high-risk patients and unnecessary use of antibiotics can be avoided in low-risk patients.
In recent years, several scoring systems for predicting postoperative pneumonia have been reported to improve risk stratification [6], such as the Prestroke Independence, Sex, Age, National Institutes of Health Stroke Scale (ISAN) score in acute ischemic stroke patients [7], a pneumonia risk index for patients undergoing major noncardiac surgery [8], and a systemic inflammation score for patients after radical resection of gastric cancer [9,10]. However, these predictive models are not applicable to liver transplant recipients, mainly because of the preoperative pulmonary condition of patients with end-stage liver disease and the immunosuppressive status of allograft recipients [10]. Currently, no effective risk classification for postoperative pneumonia is available for liver transplant recipients.
Compared with traditional scoring systems, machine learning (ML) models have shown better performance in predicting various diseases or clinical conditions [11-13]. ML models are usually constructed from the high-volume data recorded in electronic patient record (EPR) systems. Their learning ability allows them to capture complex, nonlinear relationships, and even previously unknown correlations, in big data [14], and they show promising potential in clinical settings where large amounts of data are collected and integrated every day. Recently, Li and colleagues [15] developed a model using ML methods to predict stroke-associated pneumonia in Chinese patients with acute ischemic stroke. In addition, ML has been used to predict severe pneumonia during post-transplant hospitalization in kidney transplant recipients [16]. ML has also been applied in developing models for liver disease and transplantation to predict post-transplant survival and complications, including acute kidney injury (AKI) and diabetes [17]. To date, there has been no ML model for predicting postoperative pneumonia in liver transplant recipients [18].
In this study, we aimed to develop predictive models using ML methods, and to evaluate their performance in predicting postoperative pneumonia in OLT patients.
The findings of this study were expected to provide a novel ML algorithm for predicting postoperative pneumonia in patients after liver transplantation.

Human subjects and study design
In this retrospective study, data of 894 patients who underwent either living donor liver transplantation (LDLT) or deceased donor liver transplantation (DDLT) at the Third Affiliated Hospital of Sun Yat-sen University-Lingnan Hospital (Guangzhou, Guangdong, China) from January 2015 to September 2019 were retrieved from the EPR systems. All patients were registered as recipients of organ transplantation in the China Organ Transplant Response Systems (www.cot.org.cn). During retrospective enrollment, patients who were aged < 18 years, presented with preoperative pneumonia, or lacked sufficient postoperative data were excluded from this study.
In the EPR systems of our hospital, a database platform was established by extracting medical records from the hospital information system (HIS), laboratory information system (LIS), picture archiving and communication system (PACS), and Docare Anesthesia System (2005-2020 Medicalsystem Co., Ltd. Suzhou, China). This database platform enabled access to comprehensive data collected during hospital admission, inpatient stay, and post-hospital follow-up visits, including demographic characteristics, daily documentation, laboratory tests, imaging results, anesthesia records, and other clinical characteristics. This study was reported in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.

Primary outcome
The primary outcome was the incidence of postoperative pneumonia during the postoperative period before hospital discharge. Postoperative pneumonia was defined on the basis of the European Perioperative Clinical Outcome (EPCO) definitions, which require at least one definitive chest X-ray or CT finding (infiltrate, consolidation, or cavitation) together with at least one sign or symptom of infection (temperature > 38 °C or < 36 °C with no other cause, or white blood cell (WBC) count > 10 × 10^9/L or < 4 × 10^9/L) [6].
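The outcome definition above can be read as a simple rule: one imaging criterion and one infection criterion must both be met. The following is a minimal sketch of that rule only. The record type `PostoperativeFindings` and its field names are hypothetical and are not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class PostoperativeFindings:
    """Hypothetical record of one patient's postoperative findings."""
    infiltrate: bool = False
    consolidation: bool = False
    cavitation: bool = False
    temperature_c: float = 37.0          # body temperature in degrees Celsius
    temp_has_other_cause: bool = False   # abnormal temperature explained otherwise
    wbc_10e9_per_l: float = 7.0          # white blood cell count, x10^9/L


def meets_pneumonia_definition(f: PostoperativeFindings) -> bool:
    """Return True when both the imaging and the infection criteria are met."""
    # Imaging criterion: at least one definitive chest X-ray or CT finding.
    imaging = f.infiltrate or f.consolidation or f.cavitation
    # Infection criterion: abnormal temperature (with no other cause) or abnormal WBC.
    abnormal_temp = (
        (f.temperature_c > 38.0 or f.temperature_c < 36.0)
        and not f.temp_has_other_cause
    )
    abnormal_wbc = f.wbc_10e9_per_l > 10.0 or f.wbc_10e9_per_l < 4.0
    return imaging and (abnormal_temp or abnormal_wbc)


print(meets_pneumonia_definition(
    PostoperativeFindings(infiltrate=True, temperature_c=38.6)
))
```

Under this reading, an imaging finding alone, or an abnormal WBC count alone, does not count as postoperative pneumonia.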

Data selection
The data elements in the following categories were chosen from the database platform: (1) Demographics: age, gender, height, and weight; (2) Preoperative comorbidities: hypertension, coronary heart disease, myocardial infarction, diabetes mellitus, history of alcohol abuse, smoking, and past surgery; (3) Etiology: primary liver diseases contributing to the decision for LT, with the main focus on hepatitis B, hepatitis C, dual infection with any combination of the known hepatitis viruses A to E, hepatic malignancy (including hepatocellular carcinoma and cholangiocarcinoma), alcohol-related liver disease (ALD), drug-induced liver injury (DILI), and autoimmune liver disease; (4) Perioperative laboratory values: laboratory results concerning liver function, kidney function, electrolytes, and blood cell counts. The results of the latest tests prior to surgery were collected, and the laboratory Model for End-Stage Liver Disease (MELD) score prior to surgery was calculated; (5) Preoperative complications: complications and metrics indicating the severity of the patients' condition, mainly complications related to cirrhosis and portal hypertension, and documentation of treatment escalation, including length of stay in the ICU and the use of continuous blood purification (CBP) and mechanical ventilation; (6) Intraoperative incidents: incidents indicating hemodynamic instability, such as cardiac arrest, arrhythmia, lactic acidosis, acidosis, hypernatremia, hypokalemia, and hypotension; (7) Intraoperative medication: intraoperative use of vasoconstrictors (given either as a bolus or continuously) and blood coagulants, which reflected the extent of hemodynamic instability and hemorrhagic tendency. The data collected were the cumulative totals at the end of surgery; (8) Intraoperative fluid and transfusion: the totals of intraoperative fluid infusion and output, and the total of blood products transfused, were extracted separately. Red blood cell transfusion, plasma transfusion, total blood product transfusion, and total fluid transfusion were each classified into two categories based on specific criteria; (9) Postoperative medications: medications given within 7 days after surgery were traced, comprising colloids, vasoconstrictors, immunosuppressants, antifungal agents, and antibiotics; (10) Microorganism observation: microbiological tests during the preoperative and postoperative periods.

Variable selection
With 591 records and 148 features, overfitting could occur during training and undermine model performance. Therefore, we first applied univariate tests to filter out features that were not statistically significant. In total, 33 features were statistically significant (P < 0.05) and were passed to a recursive feature elimination (RFE) method wrapped around a random forest [19]. The RFE method was first trained on all of these features and then recursively removed the least important ones; the subset of features with the highest sensitivity score was selected.
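The two-stage selection described above can be sketched as follows. This is an illustration on synthetic data, not the authors' code. It assumes the Mann-Whitney U test as the univariate filter and uses scikit-learn's `RFECV` with `scoring="recall"` (recall is another name for sensitivity) as one possible way to keep the subset with the highest sensitivity. The source does not state which routine or which cross-validation scheme was actually used.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 300 patients, 12 candidate features.
n_patients, n_features = 300, 12
X = rng.normal(size=(n_patients, n_features))
# Only the first three features carry signal about the binary outcome.
latent = 1.2 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = (latent + rng.normal(size=n_patients) > 0).astype(int)

# Step 1: univariate filter, keeping features with P < 0.05.
p_values = np.array([
    mannwhitneyu(X[y == 1, j], X[y == 0, j], alternative="two-sided")[1]
    for j in range(n_features)
])
kept = np.flatnonzero(p_values < 0.05)

# Step 2: RFE around a random forest. RFECV removes the least important
# feature at each step and keeps the subset with the best cross-validated recall.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=5,
    scoring="recall",
)
selector.fit(X[:, kept], y)
selected = kept[selector.support_]

print("passed univariate filter:", kept.tolist())
print("selected by RFE:", selected.tolist())
```

Running the filter before RFE keeps the wrapper step cheap, because the random forest is refitted once per removed feature and per fold.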
The XGBoost model was constructed using the xgboost package (https://xgboost.readthedocs.io/en/latest/python/index.html). The remaining five models were established with the Scikit-learn package (https://github.com/scikit-learn/scikit-learn). Because machine learning models have multiple tuning parameters that are essential for model performance, a five-fold cross-validation grid search was used to select the best parameters, and AUCs on the testing set were measured (Additional file 1: Table S1). The complete data set of 591 adults was randomly separated into a 70% training set and a 30% testing set for validation. The bootstrap method was then used to sample 1000 different test sets in order to obtain the 95% confidence interval (CI) of the evaluation metrics of the best tuned models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
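The tuning and evaluation procedure above can be sketched as follows, under stated assumptions. The data are synthetic, the parameter grid is hypothetical, and logistic regression stands in for all six models to keep the example short. An `XGBClassifier` from the xgboost package follows the scikit-learn estimator interface and could be passed to `GridSearchCV` in the same way. The percentile method for the confidence interval is also an assumption, since the source does not name the interval method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data (hypothetical): 400 patients, 5 features.
X = rng.normal(size=(400, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

# Random 70% / 30% split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Five-fold cross-validated grid search, run on the training set only.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Point estimate of the AUC on the held-out testing set.
test_scores = best_model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_scores)

# Bootstrap: resample the testing set with replacement 1000 times.
boot_aucs = []
n_test = len(y_test)
for _ in range(1000):
    idx = rng.integers(0, n_test, size=n_test)
    if len(np.unique(y_test[idx])) < 2:
        continue  # AUC is undefined when a resample holds a single class
    boot_aucs.append(roc_auc_score(y_test[idx], test_scores[idx]))
ci_low, ci_high = np.percentile(boot_aucs, [2.5, 97.5])

print(f"best C: {search.best_params_['C']}")
print(f"test AUC: {auc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The same bootstrap loop can report intervals for accuracy, sensitivity, and specificity by replacing `roc_auc_score` with the corresponding metric on thresholded predictions.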

Statistical analysis
The Python (Anaconda Distribution, version 3.7) packages Numpy (version 1.16.5) and Pandas (version 0.25.1) were employed for data cleaning, and the Scipy package (version 1.3.1) was used to analyze the data. Continuous variables were presented as mean with standard deviation (SD) or as median with interquartile range. In univariate analyses, the independent-samples t-test was used for normally distributed data, and the Mann-Whitney U test was used for non-normally distributed data. Categorical variables were expressed as counts and percentages and compared using the Chi-square test or Fisher's exact test. Kaplan-Meier methods were applied to estimate long-term survival rates, and groups were compared using the Gehan-Breslow-Wilcoxon test and the log-rank test.
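The choice between these tests can be illustrated with Scipy. The sketch below rests on two assumptions that the source does not state: normality is checked with the Shapiro-Wilk test, and Fisher's exact test replaces the Chi-square test when any expected cell count of a 2 × 2 table is below 5. The helper names `compare_continuous` and `compare_categorical` are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Pick the t-test or the Mann-Whitney U test from a Shapiro-Wilk check."""
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    if p_norm_a > alpha and p_norm_b > alpha:
        _, p = stats.ttest_ind(a, b)
        return "t-test", p
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", p


def compare_categorical(table):
    """Use Fisher's exact test for sparse 2x2 tables, Chi-square otherwise."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
        return "Fisher's exact", p
    return "chi-square", p


rng = np.random.default_rng(1)
# Hypothetical skewed laboratory values for two groups of 80 patients each.
group_a = rng.exponential(scale=2.0, size=80)
group_b = rng.exponential(scale=3.0, size=80)

print(compare_continuous(group_a, group_b))
print(compare_categorical([[30, 70], [45, 55]]))  # all expected counts are large
print(compare_categorical([[2, 8], [7, 3]]))      # some expected counts below 5
```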
No variable had more than 1% missing values. Missing values were filled by mean imputation, in which each missing value was replaced with the mean of the corresponding feature. Before the machine learning models were fitted, continuous variables were normalized using the mean and SD of the training set. Categorical variables were encoded as binary variables, with 1 representing the presence of an incident and 0 its absence. Gender was also encoded, with 1 representing male and 0 representing female.
The whole dataset was split into a training set (70%) and a testing set (30%). The training set was used to develop the predictive models, while the testing set was used to validate the models' performance.
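The preprocessing steps described in this section (random 70/30 split, mean imputation, and normalization with training-set statistics) can be sketched with Numpy. The feature matrix below is hypothetical. Computing the imputation means from the training rows only is an assumption made here to avoid information leaking from the testing set. The source states only that each feature's mean was used.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical feature matrix: 10 patients, 3 continuous features,
# with NaN marking missing values.
X = rng.normal(loc=[40.0, 120.0, 3.5], scale=[5.0, 30.0, 0.5], size=(10, 3))
X[2, 0] = np.nan
X[8, 1] = np.nan

# Random 70% / 30% split by shuffling the row indices.
order = rng.permutation(len(X))
n_train = int(round(0.7 * len(X)))
train_idx, test_idx = order[:n_train], order[n_train:]
X_train, X_test = X[train_idx], X[test_idx]

# Mean imputation (assumption: means are taken from the training rows only).
train_means = np.nanmean(X_train, axis=0)
X_train = np.where(np.isnan(X_train), train_means, X_train)
X_test = np.where(np.isnan(X_test), train_means, X_test)

# Z-score normalization with the training-set mean and SD, applied to both sets.
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0)
X_train_z = (X_train - mu) / sd
X_test_z = (X_test - mu) / sd

print(X_train_z.shape, X_test_z.shape)
```

Because the testing set is scaled with the training-set mean and SD, its columns will generally not have exactly zero mean or unit variance, which is expected.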

Characteristics of the study subjects and preoperative factors associated with postoperative pneumonia
A total of 894 patients who underwent orthotopic liver transplantation in our hospital from January 2015 to September 2019 were assessed for eligibility. After exclusion of 65 pediatric patients, 226 patients with preoperative pneumonia, and 12 patients lacking sufficient postoperative data, 591 patients were finally enrolled and used for the development and performance evaluation of machine learning models to predict postoperative pneumonia. The flow diagram of the enrollment is presented in Fig. 1. Notably, pneumonia occurred in 253 patients, accounting for as many as 42.81% of the study subjects following liver transplantation, while 338 (57.19%) patients did not have postoperative pneumonia.
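The enrollment counts and percentages reported above can be checked with a few lines of arithmetic. All numbers are taken from the text. Only the variable names are introduced here.

```python
assessed = 894
excluded = {
    "pediatric (< 18 years)": 65,
    "preoperative pneumonia": 226,
    "insufficient postoperative data": 12,
}

# Patients remaining after all three exclusion criteria are applied.
enrolled = assessed - sum(excluded.values())

with_pneumonia = 253
without_pneumonia = enrolled - with_pneumonia

print(enrolled)
print(f"{100 * with_pneumonia / enrolled:.2f}%")
print(f"{100 * without_pneumonia / enrolled:.2f}%")
```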
The demographic characteristics, laboratory test results, and clinical features of the enrolled patients with or without postoperative pneumonia are summarized in Table 1. The demographic characteristics and preoperative comorbidities did not differ significantly between patients with and without postoperative pneumonia (P > 0.05). Notably, hepatic malignancy, hematocrit (HCT), alanine transaminase (ALT), total bilirubin (TBIL), albumin (ALB), coagulation function, MELD score, and hospital stay differed significantly between patients with and without postoperative pneumonia (P < 0.05). In particular, the patients without postoperative pneumonia had significantly better preoperative hepatic function, as reflected by preoperative liver function tests, than the patients who developed pneumonia after surgery (P < 0.05).

Analysis of intraoperative and postoperative factors related to postoperative pneumonia
The intraoperative factors, in three categories (intraoperative incidents, fluid management and transfusion, and medications), were compared between the study patients with and without postoperative pneumonia. As shown in Table 2, hypernatremia, longer operation time and anesthesia time, more red blood cell (RBC) transfusion and blood product transfusion, larger volume of infusion, and more blood loss were significantly associated with postoperative pneumonia (P < 0.05). Notably, the proportions of patients with RBC transfusion > 18 U, blood product transfusion > 5000 mL, total volume of infusion > 10 L, and blood loss > 2 L were significantly higher in the pneumonia group than in the non-pneumonia group (Table 2). In addition, higher doses of recombinant activated factor VII (0.343 ± 1.031 vs. 0.134 ± 0.615, P = 0.008) and prothrombin complex concentrate (602.367 ± 410.826 vs. 506.719 ± 359.224, P = 0.01) were administered to the patients without pneumonia than to those with pneumonia. In terms of postoperative medications (Table 3), the doses of terlipressin and dopamine in patients without pneumonia were significantly higher than in those with pneumonia (0.148 ± 0.414 vs. 0.079 ± 0.314 mg/day, P = 0.012; 47.544 ± 72.198 vs. 35.473 ± 63.069 mg/day, P = 0.013; respectively). There were no significant differences between the two groups in terms of norepinephrine, dopamine, epinephrine and tacrolimus (P > 0.05).

Feature selection using univariate and recursive feature elimination methods
As partially relevant or less important features may negatively affect the performance of machine learning models, we performed feature selection and ranked the features by importance. Feature selection was performed using univariate and recursive feature elimination (RFE) methods, which reduced the dimensionality from 148 to 14 features. These 14 features were: preoperative international normalized ratio (INR), HCT, platelets (PLT), ALB, ALT, fibrinogen (FIB), WBC, prothrombin time (PT), serum sodium (Na+), TBIL, anesthesia time, preoperative hospital stay, total fluid transfusion, and operation time. Further, a feature importance plot was created to rank the features using the fine-tuned eXtreme Gradient Boosting (XGBoost) model. As a result, preoperative length of hospital stay, PT, and WBC were ranked first, second, and third, respectively (Fig. 2).
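Ranking features by importance from a fitted boosting model can be sketched as below. This example uses scikit-learn's `GradientBoostingClassifier` as a stand-in so that it runs without the xgboost package. A fitted `XGBClassifier` exposes a `feature_importances_` attribute that can be sorted in the same way. The data and feature names are synthetic and hypothetical, not the study's 14 features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]  # hypothetical names

# Synthetic data: feat_a drives the outcome strongly, feat_b weakly,
# and the remaining two features carry no signal.
X = rng.normal(size=(400, 4))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=100, random_state=3).fit(X, y)

# Sort features from most to least important.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Importance scores of this kind describe how much the fitted model relies on a feature and do not by themselves indicate the direction or causal nature of an association.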

Performance assessment of the machine learning models for prediction of postoperative pneumonia
Six machine learning models, including LR, SVM, RF, multilayer perceptron (MLP), XGBoost, and GBM, were constructed, and their performance in predicting postoperative pneumonia was assessed. Additional file 1: Table S1 and Fig. 3 showed the best hyperparameter combination for each model and their AUCs in predicting postoperative pneumonia. XGBoost had the highest AUC value (0.794), and SVM had the lowest (0.674). The AUC values of LR, SVM, and MLP were relatively lower than those of the other three models (XGBoost, RF, and GBM). In addition to AUCs, accuracy, sensitivity, and specificity were used to evaluate the performance of the six machine learning models (Table 4).
Table note: Data were expressed as frequency (proportion). Continuous variables were presented as mean (standard deviation) or median (interquartile range). Bold emphasis indicates P < 0.05. Abbreviations: WBC, white blood cell; ALT, alanine transaminase; AST, aspartate aminotransferase; TBIL, total bilirubin; IBIL, indirect bilirubin; ALB, albumin; BUN, blood urea nitrogen; PT, prothrombin time; APTT, activated partial thromboplastin time; FIB, fibrinogen; INR, international normalized ratio.

Discussion
Early detection of postoperative pneumonia is critical for timely interventions to prevent the onset of the complication. Until now, the prediction of postoperative pneumonia has been challenging, and there is a need for a reliable and accurate predictive model for patients after liver transplantation. This study, based on a large volume of data and ML methods, has the following major novel findings: (1) The incidence of postoperative pneumonia was high in patients after OLT, and its occurrence was significantly associated with prolonged hospital stay and increased mortality after liver transplantation; (2) A total of 14 factors were identified as significantly correlated with postoperative pneumonia after OLT, including INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, TBIL, anesthesia time, preoperative length of hospital stay, total fluid transfusion, and operation time; (3) The XGBoost model exhibited the best overall performance in predicting postoperative pneumonia among the developed ML models, with an AUC of 0.794, a sensitivity of 52.6%, and a specificity of 77.5%; (4) Multiple lines of evidence support that the XGBoost model holds promise for future clinical application in predicting postoperative pneumonia in patients after liver transplantation.
The XGBoost model is recognized as an efficient and scalable tree boosting system [26], and it has performed well in ML competitions, notably for its simplicity of use and accuracy of prediction [27,28]. In the present study, we developed a total of six ML models; of these, the XGBoost model had the best overall performance, with a specificity of 77.5% and a sensitivity of 52.6% in predicting postoperative pneumonia in OLT patients. The AUC values of LR, SVM, and MLP were relatively lower than those of the three ensemble machine learning models (XGBoost, RF, and GBM), whose accuracy and robustness might be attributed to the fact that they integrate multiple base classifiers or learners. However, RF is a bagging ensemble, and it needs to train a large number of decision trees and aggregate them. As a result, it usually takes much more time, trading numerous randomized computations for high accuracy, than GBM and XGBoost, which both belong to the boosting family of ensemble methods. Moreover, compared with GBM, XGBoost leverages second-order derivatives and implements a sampling method in each iteration to alleviate overfitting and speed up computation.
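The role of the second-order derivative can be made explicit with the standard formulation of tree boosting used by XGBoost, which the text above does not write out. In the sketch below, $l$ is the loss, $\hat{y}_i^{(t-1)}$ is the prediction for patient $i$ after $t-1$ trees, $f_t$ is the tree added at iteration $t$, $T$ is its number of leaves, $w_j$ is the weight of leaf $j$, $I_j$ is the set of samples falling in leaf $j$, and $\gamma$ and $\lambda$ are regularization parameters.

```latex
% Second-order Taylor approximation of the objective at iteration t
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

% First- and second-order derivatives of the loss
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}

% Optimal weight of leaf j for a fixed tree structure
w_j^{*} = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

A classic gradient boosting machine fits each new tree to the first-order gradients $g_i$ only, whereas the expression for $w_j^{*}$ shows how the second-order terms $h_i$ and the penalty $\lambda$ shrink each leaf weight.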
Considering the high prevalence of multi-drug resistant bacteria in post-transplant patients induced by the excessive use of antibiotics [4], high specificity is especially necessary in clinical practice to avoid unnecessary use and overuse of antibiotics in low-risk patients. Moreover, all patients received perioperative antibiotic therapy for 72 h, which has posed a considerable challenge in predicting pneumonia at an early stage [29]. Therefore, the novel XGBoost model established in this study may assist clinicians in choosing optimal interventions and treatments, and eventually improve care for affected patients.
It has been reported that a number of risk factors, including recipient age, liver dysfunction score, indication for OLT, perioperative transfusions (especially of blood and fresh frozen plasma units), a restrictive pattern on preoperative pulmonary testing, and INR measured prior to OLT, are significantly associated with post-liver transplant pneumonia [3,30,31]. However, prediction based on these risk factors is limited by its underutilization of within-category information, which causes a loss of information [32]. For instance, all patients above or below the optimal cut-point value are considered equally in risk-factor prediction, yet their risk of post-transplant pneumonia may vary considerably. As risk-factor prediction neither combines all factors together nor weights the factors differently, it is not widely used in clinical practice. In addition, traditional scores rest on the assumption that all misclassification errors have equal costs. In fact, this assumption is indefensible when applied in real-world settings [33].
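The trade-off between sensitivity and specificity, and why equal misclassification costs are a strong assumption, can be illustrated with a small threshold sweep. The outcomes and predicted probabilities below are hypothetical and are not results from this study. The helper `sensitivity_specificity` is introduced only for this sketch.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) for predictions cut at `threshold`."""
    tp = fn = tn = fp = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if truth == 1 and predicted_positive:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = pneumonia) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the threshold favors specificity, which limits unnecessary antibiotic use in low-risk patients, at the price of missing more true cases. Which operating point is appropriate depends on the relative cost assigned to each type of error.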
In this study, we applied the RFE feature selection method to the 33 statistically significant features and selected the 14 features with the highest sensitivity score: the preoperative laboratory results of INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, and TBIL; anesthesia time; preoperative length of hospital stay; total fluid transfusion; and operation time. We found that most of these factors have been reported to be associated with pneumonia and PPCs, except for PLT and serum Na+ [18,30,31,34,35]. The risk factors reported in different studies differ considerably, which may be attributed to differences in populations and in the definitions of pneumonia and PPCs; we think this reflects the advantage of ML models in capturing previously unknown correlations in big data. Although the underlying mechanisms remain unclear, the high clinical relevance of these factors laid a solid foundation for the subsequent ML process and made the conclusions more practical and clinically valuable [36]. Moreover, the 14 features in the ML model are all routinely recorded and widely used, and none requires special instruments or equipment to obtain, indicating that our models are feasible and can be widely used in hospitals.
To date, ML models have shown outstanding performance in predicting diseases and clinical conditions, and can therefore be helpful in decision-making about the use of interventions and medications [33]. For example, ML models can generate an individualized probability for each patient. Additionally, implementing sophisticated computer algorithms at the bedside has become a reality with the popularity of EPR systems and the wide availability of structured patient data. In our study, the EPR systems included the HIS, LIS, PACS, and Docare Anesthesia System, which allowed us to integrate medical data generated during admission, covering demographic data, daily documentation, laboratory and imaging results, anesthesia records, and thorough records of medication and treatment. In addition, we split the patients 1000 times (70% training and 30% testing) into 1000 different pairs of training and testing sets, which could minimize random error and enhance the accuracy of the current ML models. This result showed that, in predicting post-transplant pneumonia, we should not rely on only one ML model.
In this study, we found that hepatic malignancy, better hepatic function before surgery, and longer hospital stay before surgery were significantly associated with a lower risk of developing postoperative pneumonia. We postulated that this could be attributed to better preoperative treatment and preparation, suggesting that interventions should be implemented to improve patients' overall preoperative condition. Consistent with previous reports [37,38], we identified a number of intraoperative factors, such as longer operation and anesthesia times, excessive blood product transfusion, and fluid transfusion, that were significantly related to postoperative pneumonia in patients following liver transplantation. By contrast, we found an association between the use of terlipressin and dopamine and a decreased incidence of postoperative pneumonia in patients after liver transplantation. These findings are clinically important for intraoperative anesthetic management and may help improve clinical outcomes.
The study has several limitations. First, the ML models were developed on the basis of a single-center cohort study, and future multi-center studies will be needed for external validation. Second, this study was performed retrospectively, so collection and entry bias, as well as possible residual confounding, may have occurred. Third, we were unable to incorporate donor metrics as training variables in our study because of the lack of donor information in the EPR systems of our hospital.

Summary
Our study has successfully established six novel ML models to predict postoperative pneumonia among OLT patients. Of these, the XGBoost model demonstrated the best overall performance and therefore holds promise for future clinical application in predicting post-transplant pneumonia in OLT patients. To the best of our knowledge, this is the first ML-based study to provide a novel ML algorithm for predicting postoperative pneumonia in patients after liver transplantation.