A comprehensive study on machine learning models combining with oversampling for bronchopulmonary dysplasia-associated pulmonary hypertension in very preterm infants

Wang, Dan; Huang, Shuwei; Cao, Jingke; Feng, Zhichun; Jiang, Qiannan; Zhang, Wanxian; Chen, Jia; Kutty, Shelby; Liu, Changgen; Liao, Wenyu; Zhang, Le; Zhu, Guli; Guo, Wenhao; Yang, Jie; Liu, Lin; Yang, Jingwei; Li, Qiuping

doi:10.1186/s12931-024-02797-z

Research
Open access
Published: 08 May 2024


Respiratory Research volume 25, Article number: 199 (2024)

Abstract

Background

Bronchopulmonary dysplasia-associated pulmonary hypertension (BPD-PH) remains a devastating clinical complication that seriously affects the outcomes of preterm infants. Early prevention and timely diagnosis before pathological changes occur are therefore key to reducing morbidity and improving prognosis. Our primary objective was to use machine learning techniques to build predictive models that accurately identify infants with BPD who are at risk of developing PH.

Methods

The data used in this study were collected from the neonatology departments of four tertiary-level hospitals in China. To address class imbalance, an oversampling algorithm, the synthetic minority over-sampling technique (SMOTE), was applied to improve the model.

Results

A total of 761 clinical records were collected. Following data pre-processing and feature selection, 5 of the 46 features were used to build models: duration of invasive respiratory support (days), severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH. Four machine learning models were trained and compared, and the logistic regression model was ultimately selected. The model achieved 93.8% sensitivity, 85.0% accuracy, and an AUC of 0.933. A logistic regression score greater than 0 was identified as a warning sign of BPD-PH.

Conclusions

We compared several machine learning models and obtained a well-performing predictive model that can support pediatric clinicians in making an early diagnosis and formulating a better treatment plan for patients with BPD-PH.

Introduction

With ongoing advances in perinatal care, the success rate of treating very and extremely preterm infants diagnosed with bronchopulmonary dysplasia (BPD) has increased remarkably [1]. However, some infants with BPD develop pulmonary hypertension (PH), which significantly increases their mortality. Studies have reported that the incidence of PH in preterm infants with BPD is as high as 26.8% in other countries [2], with mortality ranging from 14 to 38% [3, 4], or even 50% in some countries [5, 6]. Survivors still face short- and long-term complications. The pathogenesis of BPD-PH remains unclear, and there is currently no safe and effective treatment. Early prevention and timely diagnosis before pathological changes occur are the key to reducing morbidity and improving prognosis [7]. Identifying infants at risk for BPD-PH has therefore become a focus of research for pediatric clinicians.

So far, few studies have reported BPD-PH predictive models [8]. Those that did used regression coefficients to indicate the degree of correlation between risk factors and disease; the algorithms were relatively simple, and the results could not give specific probability values. With the development of medical and health big-data infrastructure, the form and quantity of medical data continue to improve, which gives us a chance to address these limitations with larger-scale training and more rigorous validation. In addition, to cope with the class imbalance commonly seen in real-world datasets, machine learning techniques such as oversampling can be tried. The synthetic minority over-sampling technique (SMOTE) [9] is an oversampling method that creates new synthetic minority-class samples by interpolating between an existing minority-class sample and its nearest minority-class neighbors in feature space. In this retrospective study, we intended to answer the following research questions using inpatient medical records from four major tertiary-level medical centers in China:

RS1 – Can machine learning methods support BPD-PH risk prediction in a more practical clinical setting across different organizations?
RS2 – Can oversampling techniques such as SMOTE help improve the prediction results?

Data collection

Patients

This study was approved by the research ethics board of the Seventh Medical Center of PLA General Hospital (No. 2022–02) and was conducted in accordance with the Declaration of Helsinki. Informed consent was waived. The study population included very preterm infants (VPIs) with BPD who were born at or admitted to the Seventh Medical Center of PLA General Hospital (Beijing, China), Qingdao Women and Children’s Hospital (Qingdao, China), Tianjin Central Hospital of Gynecology Obstetrics (Tianjin, China), or Guangdong Women and Children Hospital (Guangzhou, China) between August 1, 2015 and February 28, 2022. A total of 801 VPIs with BPD and a gestational age of less than 32 weeks were collected. Of these, 626 newborns were from the Seventh Medical Center of PLA General Hospital, 69 from Qingdao Women and Children’s Hospital, 59 from Tianjin Central Hospital of Gynecology Obstetrics, and 47 from Guangdong Women and Children Hospital. The following criteria were applied to construct the initial dataset:

Inclusion criteria were VPIs with a gestational age of less than 32 weeks who were diagnosed with BPD. The diagnosis and severity of BPD were assessed at 36 weeks' corrected gestational age, based on the definition proposed at the conference of the National Institute of Child Health and Human Development in June 2000 [10].

Exclusion criteria were 1) an admission age > 3 days or hospitalization that did not extend beyond 36 weeks’ corrected gestation (due to discharge or death); and 2) congenital lung diseases, other anatomical abnormalities (such as diaphragmatic hernia or thoracic malformation), or congenital heart disease leading to PH, except for patent ductus arteriosus (PDA), patent foramen ovale (PFO), atrial septal defect (ASD) and small ventricular septal defect (VSD) (defect diameter < 5 mm/m² of body surface area).

VPIs with an admission age > 3 days or those not hospitalized beyond 36 weeks’ corrected gestation were considered invalid subjects, because an admission age > 3 days may result in missing maternal and neonatal clinical data, while hospitalization ending before 36 weeks’ corrected gestation may affect the diagnosis of BPD severity and PH. In total, 40 infants with BPD were excluded and 761 were finally enrolled.

According to the results of Doppler echocardiography performed at 36 weeks’ corrected gestation or later, the patients were divided into a BPD-PH group and a BPD group.

Clinical features

A total of 46 clinical features (i.e., variables) were collected by reviewing the medical records during hospitalization, including maternal pregnancy factors, newborn clinical data, conditions of treatment and echocardiography-related information.

Maternal pregnancy factors included cesarean section, hypertension in pregnancy, gestational diabetes mellitus, intrauterine distress, preeclampsia, multiple births, natural conception, placental abnormality, placental abruption, premature rupture of the membrane (PROM), duration of PROM more than 18 h, and oligohydramnios.

Newborn clinical data included gestational age (week), birth weight (g), the severity of BPD, early-onset PH, 1-min Apgar score (A1), 5-min Apgar score (A5), meconium aspiration syndrome (MAS), amniotic fluid contamination, exclusive breastfeeding, neonatal respiratory distress syndrome (NRDS) more than II grades, ventilator-associated pneumonia [11], septicemia [12], necrotizing enterocolitis (NEC) requiring surgery, pulmonary hemorrhage [13], hemodynamically significant PDA [14], retinopathy of prematurity (ROP) more than II grades, small for gestational age (SGA), sex, age at initiated feeding (day), and time to achieve full enteral feeding (day).

Conditions of treatment included duration of invasive respiratory support (day), duration of nasal cannula oxygenation (day), duration of noninvasive respiratory support (day), corticosteroids used for treatment of BPD, usage of pulmonary surfactant (PS), a single dose, multiple doses, usage of vasoactive agents, usage of caffeine, and ligation of the PDA.

Echocardiography-related information included PFO, maximum diameter of PDA, VSD, ASD, the peak velocity of tricuspid regurgitant flow, velocity and direction of blood flow in PDA, shunt at the level of PFO, shunt at the level of VSD, and position of the ventricular septum.

These clinical features cover most of the factors that may affect the disease, allowing subsequent models to predict more accurately whether patients are at risk of developing it. The diagnostic criteria followed the relevant references [15, 16]. All diagnoses and treatments other than BPD-PH were recorded at or before 36 weeks' corrected gestational age. Early-onset PH was defined as PH occurring between 72 h and 14 days after birth [17]. BPD-PH was defined as PH occurring after 36 weeks' corrected gestation [18]. The echocardiographic diagnostic criteria for PH were: ① right ventricular systolic pressure (RVSP) > 35 mmHg (1 mmHg = 0.133 kPa), where RVSP = 4 × (tricuspid regurgitant flow velocity)² + right atrial pressure (usually 5 mmHg); ② RVSP/sBP ratio > 0.5; ③ any VSD or PDA with bidirectional or right-to-left shunt; ④ in the absence of tricuspid regurgitation or a shunt, at least 2 of the following 3 findings: (a) any degree of flattening of the ventricular septum, (b) right ventricular dilation, (c) right ventricular hypertrophy [19]. It is also noteworthy that the dataset was imbalanced: there were far fewer BPD-PH cases than cases without PH.
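The RVSP criterion above is simple arithmetic, and a minimal sketch may make it concrete. The function names are hypothetical, and the velocity unit (m/s) is an assumption based on the usual form of the simplified Bernoulli relation; the text does not state the unit.

```python
def rvsp_mmhg(tr_velocity_m_s: float, right_atrial_pressure_mmhg: float = 5.0) -> float:
    """Estimate RVSP as 4 * v^2 + right atrial pressure.

    Assumes v is the peak tricuspid regurgitant velocity in m/s and that
    pressures are in mmHg; the 5 mmHg default follows the text.
    """
    return 4.0 * tr_velocity_m_s ** 2 + right_atrial_pressure_mmhg


def meets_rvsp_criterion(tr_velocity_m_s: float,
                         right_atrial_pressure_mmhg: float = 5.0,
                         threshold_mmhg: float = 35.0) -> bool:
    """Return True when the estimated RVSP exceeds the 35 mmHg cut-off."""
    return rvsp_mmhg(tr_velocity_m_s, right_atrial_pressure_mmhg) > threshold_mmhg


# Illustrative values only (not patient data).
print(rvsp_mmhg(3.0))             # 4 * 9 + 5 = 41.0
print(meets_rvsp_criterion(3.0))  # True
print(meets_rvsp_criterion(2.0))  # 4 * 4 + 5 = 21 -> False
```

This covers only criterion ①; the remaining criteria depend on shunt direction and septal or ventricular morphology and are not modelled here.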

Methods

Data preprocessing

The dataset was divided into two parts: 80% for training and 20% for validation. In the preprocessing stage, several steps (categorical feature encoding, missing value imputation and feature selection) were performed to prepare the training dataset for the machine learning methods. First, the features were compared by BPD-PH status using single-factor analysis (Table 1). A p-value < 0.05 was considered statistically significant, and the 27 statistically significant indicators were carried forward to the next steps. Second, categorical features were encoded into numerical form according to the rules in Table 2. Third, missing values were imputed: missing values in continuous features were replaced with the mean of the respective feature, while missing values in categorical features were replaced with the most frequent category. Lastly, among the indicators that were statistically significant in single-factor analysis, SelectKBest was used to select the most relevant features, with the scoring function f_regression applied to assess the correlation between each feature and the target [20]. The 10 features with the highest scores (Table 3) were listed for analysis, constituting a better feature set for subsequent model training.
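The imputation and univariate ranking steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the study used scikit-learn's SelectKBest with f_regression, whereas the `f_score` helper here is a hand-written stand-in that computes the same kind of univariate F-statistic from the squared Pearson correlation. All function names, feature names and data values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace None in a continuous feature with the mean of observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None in a categorical feature with the most frequent category."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def f_score(x, y):
    """Univariate F-statistic: r^2 / (1 - r^2) * (n - 2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    r2 = sxy * sxy / (sxx * syy)
    return math.inf if r2 >= 1.0 else r2 / (1.0 - r2) * (n - 2)


def top_k(features, y, k):
    """Return the names of the k features with the highest F-scores."""
    scores = {name: f_score(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): 1 = BPD-PH, 0 = no PH.
y = [0, 0, 0, 0, 1, 1, 0, 1]
features = {
    "invasive_support_days": impute_mean([2, 5, None, 3, 30, 41, 7, 25]),
    "bpd_severity": impute_mode([1, 2, 1, None, 3, 3, 1, 2]),
    "sex": [0, 1, 0, 1, 1, 0, 1, 0],
}
print(top_k(features, y, k=2))
```

On this toy data the two clinically motivated columns outrank `sex`. The earlier p-value filter from the single-factor analysis is not reproduced in the sketch.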

Table 1 Clinical characteristics between PH group and non-PH group


Table 2 Value pairs before and after encoding


Table 3 Top 10 features with the highest regression scores


Oversampling

In our study population, the BPD-PH group accounted for 13% of cases, far fewer than the BPD group. With so few positive cases, the classifier could not adequately characterize the BPD-PH group, which made effective classification difficult. To reduce the impact of data imbalance and improve model performance, the rare class was oversampled: the number of samples in the BPD-PH group was increased using SMOTE, and models trained with and without oversampling were compared.
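The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in a few lines. This is a simplified illustration with a hypothetical function name, not the imbalanced-learn implementation cited in the paper, and the feature values are made up.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority samples: (invasive support days, BPD severity).
minority = [(30.0, 3.0), (41.0, 3.0), (25.0, 2.0), (18.0, 3.0)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Because the new points are interpolated, binary or ordinal features can take fractional values in the synthetic rows. Such synthetic rows belong in the training data only, so that the validation set still reflects the original class balance.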

Classification model and evaluation indicator

In our study, multiple machine learning methods were used and compared: multivariate logistic regression [21, 22], decision tree [23], random forest [24] and neural network [22]. Their performance was evaluated on the same dataset to determine which method predicted best. To improve the training results, hyperparameters were tuned separately for each method. The multivariate predictive models were evaluated using accuracy, sensitivity, specificity and negative predictive value (NPV).
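The four evaluation metrics follow directly from the confusion matrix, as the sketch below shows. The function name is hypothetical. The example uses the cohort's class counts (99 with PH, 662 without) together with a hypothetical classifier that labels every infant as negative, to show why accuracy alone can mislead on imbalanced data.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and NPV for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # detection rate of PH
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


# Hypothetical "always negative" classifier on the cohort's class balance.
y_true = [1] * 99 + [0] * 662
y_pred = [0] * 761
metrics = evaluate(y_true, y_pred)
print(round(metrics["accuracy"], 3), metrics["sensitivity"])
```

Such a classifier reaches roughly 87% accuracy while detecting no BPD-PH case at all, which is why sensitivity is reported alongside accuracy.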

Results

During the study period, 761 newborns (427 male and 334 female) with BPD were enrolled, of whom 99 (13%) were later confirmed as having PH and 662 (87%) as having no PH. The numbers of the 662 newborns with mild, moderate and severe BPD were 306, 84 and 272, respectively. The incidence of PH in newborns with mild, moderate and severe BPD was 1.96% (6/306), 5.95% (5/84), and 32.35% (88/272), respectively (Fig. 1). Gestational age at birth ranged from 24 to 32 weeks. The incidence of PH at 24–26, 26–28, 28–30 and more than 30 gestational weeks at birth was 41.07% (23/56), 12.95% (44/365), 7.66% (20/261) and 15.19% (12/79), respectively. Birth weight ranged from 500 to 2000 g. The incidence of PH at birth weights of 500–1000, 1000–1500 and 1500–2000 g was 17.30% (50/289), 10.76% (47/437) and 5.71% (2/35), respectively. The dataset was divided into two parts: 80% for training (608 cases) and 20% for validation (153 cases).
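A minimal sketch of an 80/20 split that reproduces the reported 608/153 counts is given below. The text does not say whether the split was shuffled or stratified by PH status, so the random shuffle, the seed and the rounding-down of the training size are all assumptions, and the function name is hypothetical.

```python
import random


def train_validation_split(n_cases, train_fraction=0.8, seed=42):
    """Shuffle case indices and split them into training and validation sets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = int(n_cases * train_fraction)  # rounds down: 761 * 0.8 -> 608
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = train_validation_split(761)
print(len(train_idx), len(valid_idx))  # 608 153
```

With only 13% positive cases, a stratified split would keep the PH proportion similar in both parts; whether the authors did this is not stated.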

Feature selection

Based on the scores and rankings (Table 3), 5 features were chosen for model training and functional evaluation including the duration of invasive respiratory support, the severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH.

Training and evaluation

Training was performed with four machine learning methods and two sampling settings (original data and SMOTE). Table 4 shows the results for the different groups. When the models were trained directly on the training dataset without oversampling, high accuracy and specificity could be achieved; accuracy exceeded 90% for both logistic regression and the neural network. However, because the data were imbalanced, the classifiers could not adequately characterize the BPD-PH group, so sensitivity, which indicates the detection rate of BPD-PH, tended to be low before oversampling. Accuracy for the decision tree and random forest was lower than for logistic regression and the neural network, but sensitivity was higher. With SMOTE, sensitivity improved markedly for logistic regression and the neural network, whereas the effect was relatively small for the random forest and decision tree.

Table 4 The evaluating indicators of the models


Logistic regression and the neural network exhibited the highest sensitivity (93.8%) and were also relatively high on the other indicators. The logistic regression model was chosen as the final model because it provides interpretable results: the coefficient associated with each independent feature can be interpreted as a log odds ratio, which shows the direction and magnitude of its effect. Neural networks, however, can capture complex patterns and relationships in large datasets. As the size and quality of the data increase, neural networks may become more effective, so they are worth trying in our future work.

Figure 2 shows the results of the selected model. The area under the ROC curve (AUC) of this predictive model was 0.933. From the regression analysis, the following predictive model was established.

$$\begin{aligned}
\text{logit}(P)=\ln\left(\frac{P}{1-P}\right)=&-4.850\\
&+1.095\times\text{Severity of BPD}\;(\text{mild}=1,\ \text{moderate}=2,\ \text{severe}=3)\\
&+1.198\times\text{Early-onset PH}\;(\text{True}=1,\ \text{False}=0)\\
&+0.020\times\text{Duration of invasive respiratory support (day)}\\
&+0.703\times\text{Ventilator-associated pneumonia}\;(\text{True}=1,\ \text{False}=0)\\
&+0.948\times\text{Pulmonary hemorrhage}\;(\text{True}=1,\ \text{False}=0)
\end{aligned}$$

If the value of logit(P) was greater than 0, the case was predicted as positive, and otherwise as negative. This threshold is equivalent to P > 0.5, because logit(P) = 0 corresponds to odds P/(1 − P) = 1, i.e. P = 0.5. A nomogram was developed from the predictive model to make it more convenient to use in clinical practice (Fig. 3).
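The published formula can be applied directly, as the sketch below shows. The coefficients are taken from the equation above; the dictionary keys, function names and the example infant are hypothetical and for illustration only.

```python
import math

INTERCEPT = -4.850
COEFFICIENTS = {
    "bpd_severity": 1.095,                     # mild=1, moderate=2, severe=3
    "early_onset_ph": 1.198,                   # True=1, False=0
    "invasive_support_days": 0.020,            # days
    "ventilator_associated_pneumonia": 0.703,  # True=1, False=0
    "pulmonary_hemorrhage": 0.948,             # True=1, False=0
}


def logit_score(features):
    """Linear predictor logit(P) for one infant."""
    return INTERCEPT + sum(beta * features[name] for name, beta in COEFFICIENTS.items())


def probability(score):
    """Convert logit(P) to P with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


def odds_ratios():
    """exp(coefficient): multiplicative change in odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


# Hypothetical infant: severe BPD, early-onset PH, 30 days of invasive support.
infant = {
    "bpd_severity": 3,
    "early_onset_ph": 1,
    "invasive_support_days": 30,
    "ventilator_associated_pneumonia": 0,
    "pulmonary_hemorrhage": 0,
}
score = logit_score(infant)  # -4.850 + 3.285 + 1.198 + 0.600 = 0.233
print(round(score, 3), round(probability(score), 3), score > 0)
```

For this hypothetical infant the score is just above 0, so the case would be flagged. Exponentiating a coefficient gives its odds ratio; for example, exp(1.095) is about 2.99, so each step up in BPD severity roughly triples the predicted odds, holding the other features fixed.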

Discussion

In this study, we established a well-performing model for predicting PH in VPIs with BPD, using clinical data from four tertiary-level hospitals in China and comparing several machine learning models to select the final indicators. Using these techniques, we identified important clinical factors associated with BPD-PH: duration of invasive respiratory support (days), severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH.

The ability to predict BPD-PH accurately and then intervene is clinically important for avoiding poor survival outcomes in infants with BPD. At present, there is no dependable tool that can accurately forecast BPD-PH in its early stages. In this study, by using an oversampling algorithm and multi-institutional clinical data, we achieved good overall performance. The indicators are readily available clinically and cause no harm to children, and neonatologists can use the score from the formula to assess the risk of BPD-PH in VPIs. A score greater than 0 should be considered a warning sign for developing BPD-PH. Compared with the study of Collaco et al. [25], our study applied the same patient criteria to all data and included a calibration test. Compared with the study of Trittmann et al. [26], the AUC in our study was clearly higher and our data were more accessible; in addition, we had a larger sample size and performed validation. Although the AUC in our study was lower than that reported by Wang et al. [27], our study was multi-institutional with a larger sample size. A multi-institutional study collects data from different regions and healthcare institutions, which increases the representativeness and generalizability of the sample. Multi-center data can also help verify the consistency and universality of research findings. Utilizing multi-center data for disease prediction therefore provides more reliable and comprehensive information. The sample size in our study was large enough to obtain good results, and we combined it with machine learning techniques. We hope that the model developed here will help neonatologists identify VPIs at risk of BPD-PH in time and help pediatric clinicians reduce the incidence and mortality of BPD-PH.

We found that the duration of invasive respiratory support (days) was clearly correlated with BPD-PH. Because the lungs of VPIs are immature, respiratory support is an important treatment for patients with BPD. During ventilator treatment, lung inflammation and capillary endothelial cell damage are more likely to occur in immature lung tissue, leading to vascular remodeling and increased pulmonary artery pressure [18]. Therefore, protective ventilation strategies should be adopted for preterm infants to minimize the injury caused by mechanical ventilation.

Ventilator-associated pneumonia, the severity of BPD and early-onset PH were also important factors in the occurrence of BPD-PH in our study. Ventilator-associated pneumonia is an independent risk factor for the development of BPD-PH, and is also a postnatal respiratory injury that disrupts the growth of pulmonary vessels and alveoli [28]. Lung damage caused by prolonged ventilator use increases the number of inflammatory cells and mediators in the systemic circulation [29], and repeated inflammatory infections of the lungs make weaning from the ventilator difficult, which prolongs mechanical ventilation and forms a vicious circle. Similarly, many studies have shown that moderate to severe BPD is associated with the development of BPD-PH [30,31,32], which is consistent with our study. Preterm infants with moderate to severe BPD often have severe lung developmental disorders, and long-term mechanical ventilation induces pulmonary inflammation and aggravates alveolar epithelial cell damage, eventually resulting in PH. A prospective study [17] also found that early-onset PH was significantly associated with BPD progression and BPD-PH in preterm infants. Early-onset PH is therefore an independent risk factor for the development of BPD-PH [33], which is also supported by the model developed in this study. Animal studies have shown that an increase in pulmonary vascular pressure may directly impair vascular growth and alveolarization during lung development. In a study using a chronic intrauterine pulmonary hypertension model in fetal sheep, Grover et al. [34] found that chronic PH resulted in thickening of the pulmonary arteriole walls, decreased pulmonary arterial density, and simplified alveoli. These studies suggest that early-onset PH may contribute to the progression of BPD-PH by impairing lung development early and affecting vascular and alveolar development. These clinical factors are closely related and interact with each other, ultimately leading to the occurrence of BPD-PH.

In addition, our study suggests that pulmonary hemorrhage is also an important factor in the development of BPD-PH. This is an interesting finding, as few previous studies have reported pulmonary hemorrhage as a risk factor for BPD-PH. In VPIs, pulmonary hemorrhage was mainly caused by other primary diseases such as infection; affected infants often had a lower gestational age and body weight, and their lungs were markedly immature. Previous studies [35] have shown that children with pulmonary hemorrhage have a high incidence of BPD. These children often need higher ventilator settings when pulmonary hemorrhage occurs, and the longer the ventilator is used, the more severe the damage to the airway, pulmonary blood vessels and lung interstitium, and the more difficult it is to wean from the ventilator, leading to an increased incidence of BPD-PH. Therefore, prevention and timely, effective treatment of pulmonary hemorrhage are beneficial in preventing BPD-PH.

This study has some strengths and weaknesses. We used an oversampling technique to address the imbalance between positive and negative cases and found that it markedly improved the results, in particular sensitivity. Oversampling artificially increases the number of minority-class samples, giving a more balanced representation of positive and negative cases in the dataset. SMOTE generated synthetic examples of the PH group, effectively increasing its presence in the training data. This helps the model learn and generalize better, improving its ability to classify both positive and negative instances. Our study used four machine learning methods for training and selected the optimal model through comparison. The advantage of using multiple methods is that the data can be analyzed from different perspectives: each method has its own underlying assumptions and algorithm, and can capture different patterns within the dataset. Comparing the results of the different methods gives a more comprehensive understanding of the data and identifies the best-performing model. We found that decision tree and random forest performed relatively well before oversampling, but logistic regression performed better after oversampling. Both logistic regression and the neural network performed well after oversampling, and the logistic regression model was chosen because it provided interpretable results. However, as the size and quality of the data increase, neural networks may become more effective, so they are worth trying in our future work.

This study also had some limitations. Firstly, the model is mainly suitable for patients at or after 36 weeks' corrected gestational age; for patients below 36 weeks' corrected gestational age, it can still be used after estimating the inputs from their condition, so that high-risk patients can receive early intervention or close follow-up after discharge. Secondly, our model involved four centers, but one center provided the bulk of the samples. This may limit generalizability beyond the centers that provided the dataset. We need to continue collecting positive cases from more centers for verification to improve the accuracy of the model, and to retry a neural network approach to build a better predictive model. Thirdly, because the data were imbalanced, an oversampling method was used. However, oversampling may promote overfitting of the model to minority-class samples, reducing generalizability beyond the dataset. We plan to carry out a prospective cohort study with follow-up to address this problem.

Conclusion

In this study, we established a predictive model of BPD-PH using the five most significant of 46 clinical features. The predictive model performed well, with an AUC of 0.933, and could help clinicians make an early diagnosis and formulate better treatment plans for VPIs with BPD-PH. Larger-sample studies using other machine learning techniques to develop further BPD-PH predictive models are required to verify the findings and conclusions of the present study.

Availability of data and materials

The dataset analyzed during the current study is available from the corresponding author on reasonable request.

Abbreviations

BPD: Bronchopulmonary dysplasia
PH: Pulmonary hypertension
SMOTE: Synthetic minority over-sampling technique
PDA: Patent ductus arteriosus
NRDS: Neonatal respiratory distress syndrome
NEC: Neonatal necrotizing enterocolitis
ROP: Retinopathy of prematurity
SGA: Small for gestational age
MAS: Meconium aspiration syndrome
PROM: Premature rupture of the membrane
PS: Pulmonary surfactant
VSD: Ventricular septal defect
ASD: Atrial septal defect
PFO: Patent foramen ovale
VPIs: Very preterm infants
NPV: Negative predictive value
AUC: Area under the ROC curve
SD: Standard deviation

References

Mohammadizadeh M, Ardestani AG, Sadeghnia AR. Early administration of surfactant via a thin intratracheal catheter in preterm infants with respiratory distress syndrome: Feasibility and outcome. J Res Pharm Pract. 2015;4(1):31–6.
Article PubMed PubMed Central Google Scholar
Sheth S, Goto L, Bhandari V, Abraham B, Mowes A. Factors associated with development of early and late pulmonary hypertension in preterm infants with bronchopulmonary dysplasia. J Perinatol. 2020;40(1):138–48.
Article CAS PubMed Google Scholar
Slaughter JL, Pakrashi T, Jones DE, South AP, Shah TA. Echocardiographic detection of pulmonary hypertension in extremely low birth weight infants with bronchopulmonary dysplasia requiring prolonged positive pressure ventilation. J Perinatol. 2011;31(10):635–40.
Article CAS PubMed Google Scholar
Hansmann G, Sallmon H, Roehr CC, Kourembanas S, Austin ED, Koestenberger M. Pulmonary hypertension in bronchopulmonary dysplasia. Pediatr Res. 2021;89(3):446–55.
Article PubMed Google Scholar
Seo YH, Choi HJ. Clinical utility of echocardiography for early and late pulmonary hypertension in preterm infants: relation with bronchopulmonary dysplasia. J Cardiovasc Ultrasound. 2017;25(4):124–30.
Article PubMed PubMed Central Google Scholar
Khemani E, McElhinney DB, Rhein L, Andrade O, Lacro RV, Thomas KC, Mullen MP. Pulmonary artery hypertension in formerly premature infants with bronchopulmonary dysplasia: clinical features and outcomes in the surfactant era. Pediatrics. 2007;120(6):1260–9.
Article PubMed Google Scholar
Hocq C, Vanhoutte L, Guilloteau A, Massolo AC, Van Grambezen B, Carkeek K, Piersigilli F, Danhaive O, from the European Society for Pediatric Research. Early diagnosis and targeted approaches to pulmonary vascular disease in bronchopulmonary dysplasia. Pediatr Res. 2022;91(4):804–15.
Article PubMed Google Scholar
Levy PT, Levin J, Leeman KT, Mullen MP, Hansmann G, Kourembanas S. Diagnosis and management of pulmonary hypertension in infants with bronchopulmonary dysplasia. Semin Fetal Neonatal Med. 2022;27(4):101351.
Article PubMed Google Scholar
Lemaître GNF, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. JMLR. 2017;18(1):559–63.
Google Scholar
Jobe AH, Bancalari E. Bronchopulmonary dysplasia. Am J Respir Crit Care Med. 2001;163(7):1723–9.
Article CAS PubMed Google Scholar
Alriyami A, Kiger JR, Hooven TA. Ventilator-associated pneumonia in the neonatal intensive care unit. NeoReviews. 2022;23(7):e448–61.
Article PubMed Google Scholar
Chinese Medical Association Pediatrics Branch Neonatology Group, Chinese Medical Doctor Association Neonatology Physicians Branch Infection Specialty Committee. Diagnosis and treatment of neonatal sepsis Chinese consensus (2019 version). Chin J Pediatr. 2019;57(4):252–7.
Google Scholar
Welde MA, Sanford CB, Mangum M, Paschal C, Jnah AJ. Pulmonary Hemorrhage in the Neonate. Neonatal Netw. 2021;40(5):295–304.
Article PubMed Google Scholar
Mitra S, Florez ID, Tamayo ME, Mbuagbaw L, Vanniyasingam T, Veroniki AA, Zea AM, Zhang Y, Sadeghirad B, Thabane L. Association of placebo, indomethacin, ibuprofen, and acetaminophen with closure of hemodynamically significant patent ductus arteriosus in preterm infants: a systematic review and meta-analysis. JAMA. 2018;319(12):1221–38.
Article CAS PubMed PubMed Central Google Scholar
Xu CJ, Hua KQ. Practical obstetrics and gynecology 4^th edition. Beijing: People’s Health Publishing House; 2018.
Google Scholar
Shao XM, Ye HM, Qiu XS. Practical neonatology. 5th ed. Beijing: People’s Medical Press; 2019.
Google Scholar
Mourani PM, Sontag MK, Younoszai A, Miller JI, Kinsella JP, Baker CD, Poindexter BB, Ingram DA, Abman SH. Early pulmonary vascular disease in preterm infants at risk for bronchopulmonary dysplasia. Am J Respir Crit Care Med. 2015;191(1):87–95.
Article PubMed PubMed Central Google Scholar
Mirza H, Ziegler J, Ford S, Padbury J, Tucker R, Laptook A. Pulmonary hypertension in preterm infants: prevalence and association with bronchopulmonary dysplasia. J Pediatr. 2014;165(5):909-14. e1.
Article PubMed Google Scholar
Chinese Medical Association Pediatric Branch Neonatology Group, Editorial Committee of Chinese Journal of Pediatrics. Expert consensus on diagnosis and treatment of neonatal pulmonary arterial hypertension. Chin J Pediatr. 2017;55(3):163–8.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
Nick TG, Campbell KM. Logistic regression. Topics in biostatistics. 2007:273–301.
Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002;35(5–6):352–9.
Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern. 1991;21(3):660–74.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Collaco JM, Dadlani GH, Nies MK, Leshko J, Everett AD, McGrath-Morrow SA. Risk factors and clinical outcomes in preterm infants with pulmonary hypertension. PLoS ONE. 2016;11(10):e0163904.
Trittmann JK, Bartenschlag A, Zmuda EJ, Frick J, Stewart WCL, Nelin LD. Using clinical and genetic data to predict pulmonary hypertension in bronchopulmonary dysplasia. Acta Paediatr. 2018;107(12):2158–64.
Wang C, Ma X, Xu Y, Chen Z, Shi L, Du L. A prediction model of pulmonary hypertension in preterm infants with bronchopulmonary dysplasia. Front Pediatr. 2022;10:925312.
Vayalthrikkovil S, Vorhies E, Stritzke A, Bashir RA, Mohammad K, Kamaluddeen M, Thomas S, Al Awad E, Murthy P, Soraisham A. Prospective study of pulmonary hypertension in preterm infants with bronchopulmonary dysplasia. Pediatr Pulmonol. 2019;54(2):171–8.
Carvalho CG, Silveira RC, Procianoy RS. Ventilator-induced lung injury in preterm infants. Rev Bras Ter Intensiva. 2013;25(4):319–26.
Weismann CG, Asnes JD, Bazzy-Asaad A, Tolomeo C, Ehrenkranz RA, Bizzarro MJ. Pulmonary hypertension in preterm infants: results of a prospective screening program. J Perinatol. 2017;37(5):572–7.
Check J, Gotteiner N, Liu X, Su E, Porta N, Steinhorn R, Mestan KK. Fetal growth restriction and pulmonary hypertension in premature infants with bronchopulmonary dysplasia. J Perinatol. 2013;33(7):553–7.
Stuart BD, Sekar P, Coulson JD, Choi SE, McGrath-Morrow SA, Collaco JM. Health-care utilization and respiratory morbidities in preterm infants with pulmonary hypertension. J Perinatol. 2013;33(7):543–7.
Mourani PM, Mandell EW, Meier M, Younoszai A, Brinton JT, Wagner BD, Arjaans S, Poindexter BB, Abman SH. Early pulmonary vascular disease in preterm infants is associated with late respiratory outcomes in childhood. Am J Respir Crit Care Med. 2019;199(8):1020–7.
Grover TR, Parker TA, Balasubramaniam V, Markham NE, Abman SH. Pulmonary hypertension impairs alveolarization and reduces lung growth in the ovine fetus. Am J Physiol Lung Cell Mol Physiol. 2005;288(4):L648–54.
Lou Y, Dai YH, Liu WD, Huang WM. Risk factors of bronchopulmonary dysplasia in preterm infants. Chin Pediatr Emerg Med. 2015;22(7):474–7.

Acknowledgements

We would like to thank the Seventh Medical Center of PLA General Hospital, Qingdao Women and Children’s Hospital, Tianjin Central Hospital of Gynecology Obstetrics, and Guangdong Women and Children Hospital for providing the data.

Funding

This study was supported by the National Key R&D Program of China (2021YFC2701702 to Q.L.), the Science and Technology Innovation Program of Hunan Province (2023RC4012), and the Hunan Province Natural Science Foundation Youth Project (2022JJ40202). The study sponsors had no role in the study design; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the manuscript for publication.

Author information

Dan Wang, Shuwei Huang and Jingke Cao contributed equally as co-first authors.
Qiuping Li, Jingwei Yang, Lin Liu and Jie Yang contributed equally as co-corresponding authors.

Authors and Affiliations

Newborn Intensive Care Unit, Faculty of Pediatrics, the Seventh Medical Center of PLA General Hospital, Beijing, China
Dan Wang, Jingke Cao, Zhichun Feng, Changgen Liu & Qiuping Li
The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
Dan Wang, Jingke Cao, Zhichun Feng, Changgen Liu & Qiuping Li
School of Software, Tsinghua University, Beijing, China
Shuwei Huang & Lin Liu
Department of Cardiology, Hunan Children’s Hospital, Changsha, China
Dan Wang
Department of Neonatology, Qingdao Women and Children’s Hospital, Qingdao, China
Qiannan Jiang
Department of Neonatology, Tianjin Central Hospital of Gynecology Obstetrics, Tianjin, China
Wanxian Zhang
Department of Neonatology, Guangdong Women and Children Hospital, Guangdong Neonatal ICU Medical Quality Control Center, Guangzhou, China
Jia Chen
Pediatric and Congenital Cardiology, Taussig Heart Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
Shelby Kutty
Department of Statistics and Data Science, BNU-HKBU United International College, Zhuhai, China
Wenyu Liao, Le Zhang, Guli Zhu, Wenhao Guo & Jingwei Yang
Department of Neonatology, Nanfang Hospital, Southern Medical University, Guangzhou, China
Jie Yang

Contributions

Dan Wang, Shuwei Huang, and Jingke Cao designed the study and drafted the manuscript. Jingke Cao, Changgen Liu, Qiannan Jiang, Wanxian Zhang, and Jia Chen acquired the data. Qiuping Li, Jingwei Yang, Lin Liu, Jie Yang, and Shelby Kutty analyzed the data and revised the manuscript. Shuwei Huang, Wenyu Liao, Le Zhang, Guli Zhu, Wenhao Guo, Jingwei Yang, and Lin Liu analyzed the data. Qiuping Li and Zhichun Feng supervised the study. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Jie Yang, Lin Liu, Jingwei Yang or Qiuping Li.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the research ethics board of the Seventh Medical Center of PLA General Hospital (No. 2022–02).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, D., Huang, S., Cao, J. et al. A comprehensive study on machine learning models combining with oversampling for bronchopulmonary dysplasia-associated pulmonary hypertension in very preterm infants. Respir Res 25, 199 (2024). https://doi.org/10.1186/s12931-024-02797-z

Received: 31 October 2023
Accepted: 31 March 2024
Published: 08 May 2024
DOI: https://doi.org/10.1186/s12931-024-02797-z