Identification of genetic factors underlying persistent pulmonary hypertension of newborns in a cohort of Chinese neonates

Background Persistent pulmonary hypertension of the newborn (PPHN) is a severe clinical problem among neonatal intensive care unit (NICU) patients. The genetic pathogenesis of PPHN is unclear, and only a few genetic polymorphisms have been identified in infants with PPHN. Our study aimed to investigate the potential genetic etiology of PPHN. Methods This study recruited PPHN patients admitted to the NICU of the Children’s Hospital of Fudan University from Jan 2016 to Dec 2017. Exome sequencing was performed for all patients. Variants in reported PPHN/pulmonary arterial hypertension (PAH)-related genes were assessed. Single nucleotide polymorphism (SNP) association and gene-level analyses were carried out in 74 PPHN cases and 115 non-PPHN controls with matched baseline characteristics. Results Among the patient cohort, 74 (64.3%) patients were late preterm and term infants (≥ 34 weeks gestation) and 41 (35.7%) were preterm infants (< 34 weeks gestation). Preterm infants with PPHN exhibited lower birth weight and higher frequencies of bronchopulmonary dysplasia, respiratory distress syndrome (RDS) and mortality. Nine patients (including only one preterm infant) were identified as harboring genetic variants: three with pathogenic/likely pathogenic variants in TBX4 and BMPR2 and six with variants of unknown significance in BMPR2, SMAD9, TGFB1, KCNA5 and TRPC6. Three SNPs (rs192759073, rs1047883 and rs2229589) in CPS1 and one SNP (rs1044008) in NOTCH3 were significantly associated with PPHN (p < 0.05). Gene-level analysis identified CPS1 and SMAD9 as risk genes for PPHN (p < 0.05). Conclusions In this study, we identified genetic variants in PPHN patients and report CPS1, NOTCH3 and SMAD9 as risk genes for late preterm and term PPHN in a single-center Chinese cohort. Our findings provide additional genetic evidence on the pathogenesis of PPHN and new insight into potential strategies for disease treatment.
Electronic supplementary material The online version of this article (10.1186/s12931-019-1148-1) contains supplementary material, which is available to authorized users.


Background
Persistent pulmonary hypertension of the newborn (PPHN) is caused by a failure of the normal circulatory transition at birth and is characterized by elevated pulmonary vascular resistance (PVR), which leads to right-to-left shunting and hypoxemia. The incidence of PPHN ranges from 2 to 6 per 1000 live births, and the mortality rate is 10-20% [1]. PPHN can be idiopathic or secondary to a range of conditions, including perinatal asphyxia, meconium aspiration syndrome (MAS), respiratory distress syndrome (RDS), pulmonary dysplasia and congenital diaphragmatic hernia [2].
To date, only a few genetic polymorphisms have been identified in infants with PPHN. Pearson et al. first found that the T1405N variant of carbamoyl phosphate synthetase 1 (CPS1) was distributed differently in infants with PPHN and in the general population [3]. A homozygous missense variant (L326R) in the ABCA3 gene was identified in a newborn with severe hypoxemic respiratory failure and refractory pulmonary hypertension [4]. Single nucleotide polymorphisms (SNPs) in the corticotropin-releasing hormone receptor 1 (CRHR1) and corticotropin-releasing hormone-binding protein (CRHBP) genes were significantly associated with PPHN [5]. Most recently, rs2070699 in endothelin 1 (EDN1) was found to increase the risk of PPHN with respiratory distress [6].
PPHN is a subgroup of pulmonary arterial hypertension (PAH), a complex disorder characterized by elevated PVR and right heart failure. PAH is a common complication of many clinical conditions. Under the WHO classification, pulmonary hypertension is divided into five groups according to its pathogenesis, and PPHN forms a specific subgroup [7]. Pathogenic variants in several genes have been reported in both adult-onset and childhood-onset PAH. Variants in bone morphogenetic protein receptor type 2 (BMPR2), a member of the transforming growth factor beta (TGF-β) superfamily, are associated with 70% of familial pulmonary arterial hypertension (FPAH)/heritable pulmonary arterial hypertension (HPAH) cases and 20% of idiopathic pulmonary arterial hypertension (IPAH) cases [8]. SMAD9 [9], CAV1 [10] and KCNK3 [11] are also known PAH genes listed in the Online Mendelian Inheritance in Man (OMIM) database. Variants in the hereditary hemorrhagic telangiectasia (HHT) genes encoding activin receptor-like kinase 1 (ACVRL1) [12] and endoglin (ENG) [13] have also been identified in PAH patients. To date, no causal genes for PPHN have been reported, and its genetic etiology remains unclear. We hypothesized that the genetic pathogenesis of PPHN may share some similarities with that of PAH in adults and children.
Therefore, in the present study, we applied clinical exome sequencing to investigate the genetic etiology of PPHN in 115 Chinese patients. We aimed to identify causal variants in reported PPHN/PAH-related genes and genetic risk polymorphisms for PPHN patients.

Study participants
In this study, neonates admitted to the neonatal intensive care unit (NICU) of the Children's Hospital of Fudan University from January 2016 to December 2017 were recruited. The inclusion criterion was hypoxemic respiratory failure with a clinical diagnosis of pulmonary hypertension within 28 days after birth. PPHN was diagnosed based on clinical and echocardiographic data, using criteria taken from previously published work by Alano et al. [14]: 1) the clinical criterion was a preductal/postductal oxygen saturation difference of > 10%, and 2) the echocardiographic criteria were a structurally normal heart and elevated pulmonary artery pressure (PAP). Elevated PAP was considered present if there was right-to-left or bidirectional flow across the patent ductus arteriosus or foramen ovale, or if the systolic pulmonary arterial pressure, estimated by Doppler measurement of the tricuspid regurgitation jet, was greater than or equal to the systemic blood pressure. Neonates with congenital anomalies or structural congenital heart disease other than patent ductus arteriosus and patent foramen ovale were excluded.
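The diagnostic criteria above combine into a simple predicate. The Python sketch below is illustrative only: it assumes that the clinical and echocardiographic criteria must both be met and that the saturation difference is expressed in percentage points, and the record type and field names (`NeonatalAssessment`, `ductal_or_atrial_shunt`, `tr_systolic_pap` and so on) are hypothetical. Hypoxemic respiratory failure and the exclusion criteria are not encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeonatalAssessment:
    """Hypothetical per-infant record; field names are illustrative only."""
    age_days: int                       # postnatal age at diagnosis
    preductal_spo2: float               # %
    postductal_spo2: float              # %
    structurally_normal_heart: bool     # PDA and PFO are still allowed
    ductal_or_atrial_shunt: str         # "right-to-left", "bidirectional", "left-to-right" or "none"
    tr_systolic_pap: Optional[float] = None       # mmHg, Doppler estimate from the TR jet
    systemic_systolic_bp: Optional[float] = None  # mmHg


def elevated_pap(a: NeonatalAssessment) -> bool:
    # Met by a right-to-left or bidirectional shunt across the PDA/PFO ...
    if a.ductal_or_atrial_shunt in ("right-to-left", "bidirectional"):
        return True
    # ... or by an estimated systolic PAP at or above systemic systolic pressure.
    if a.tr_systolic_pap is None or a.systemic_systolic_bp is None:
        return False
    return a.tr_systolic_pap >= a.systemic_systolic_bp


def meets_pphn_criteria(a: NeonatalAssessment) -> bool:
    within_window = a.age_days <= 28                              # diagnosed within 28 days
    clinical = (a.preductal_spo2 - a.postductal_spo2) > 10        # assumption: percentage points
    echo = a.structurally_normal_heart and elevated_pap(a)
    return within_window and clinical and echo                    # assumption: both criteria required
```

For example, `meets_pphn_criteria(NeonatalAssessment(2, 96, 82, True, "right-to-left"))` is true, whereas the same infant with a postductal saturation of 90% would not qualify on the clinical criterion.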
Infants without pulmonary hypertension were recruited as non-PPHN controls. Late preterm and term PPHN cases and control infants with matched baseline characteristics were included in the subsequent case-control analysis. Informed consent was obtained from the guardians of all subjects included in this study. This study was approved by the ethics committee of the Children's Hospital of Fudan University.

Clinical exome sequencing
Genomic DNA was extracted from whole blood using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). The DNA concentration was measured using a NanoDrop spectrophotometer (ND-1000, Thermo Fisher Scientific Inc., Waltham, MA, USA). The clinical exome panel used for sequencing, which covered 2742 genes causing inherited diseases, was generated using the Agilent ClearSeq Inherited Disease Kit (Agilent Technologies, Santa Clara, CA) and Illumina Cluster and SBS Kits (Illumina Inc., San Diego, CA, USA). Sequencing was performed on the Illumina HiSeq 2000/2500 platform (Illumina Inc., San Diego, CA, USA). Clean reads were aligned to the human reference genome (UCSC hg19) using the Burrows-Wheeler Aligner (BWA; v.0.5.9-r16). After quality control, variants were called using GATK.

Variant annotation and classification
Variants were annotated using ANNOVAR [15] and VEP [16] software, together with the Human Gene Mutation Database (HGMD, professional version) and the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/). Missense variants were evaluated with SIFT [17], PolyPhen-2 [18] and MutationTaster [19]. Common variants with a minor allele frequency (MAF) > 0.01 were excluded based on the Exome Aggregation Consortium (ExAC) database (http://exac.broadinstitute.org/), the 1000 Genomes database (http://www.internationalgenome.org/) and our in-house database. Synonymous and intronic variants located more than 15 bp from exon boundaries were also discarded. MAFs for all variants were collected from the ExAC database and the Genome Aggregation Database (GnomAD, http://gnomad.broadinstitute.org/). Causal variants were then sought in 25 reported PPHN/PAH-related genes (Additional file 1: Table S1). Only variants with a very low frequency (MAF < 0.005) in both the overall and East Asian populations that were also absent from infants without PPHN in our in-house database (24,336 samples) were considered further as potentially pathogenic. The pathogenicity of the variants was classified according to the American College of Medical Genetics and Genomics (ACMG) guidelines [20]. Specifically, a case was classified as molecularly diagnosed when the identified pathogenic or likely pathogenic (P/LP) variants were truncating variants or previously reported missense variants in a disease gene that sufficiently explained the phenotype of the individual. Variants were classified as variants of unknown significance (VUSs) when they were associated with one or more clinical phenotypes of the patient and were absent from, or present at low frequencies in, the GnomAD/ExAC databases.
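The frequency and consequence filters above can be sketched as a chain of predicates. This is a minimal illustration, not the study's pipeline: the record layout, the source keys (`exac`, `1000g`, `inhouse`, `gnomad_all`, `gnomad_eas`) and the gene set are hypothetical; the gene set lists only genes named in this article rather than the 25 genes of Table S1; a variant is assumed to be common if it exceeds MAF 0.01 in any source; and the intronic rule is read as dropping all synonymous variants plus intronic variants more than 15 bp from an exon. ACMG classification is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Illustrative subset only: genes named in this article, not the full Table S1 list.
CANDIDATE_GENES: Set[str] = {"BMPR2", "TBX4", "SMAD9", "TGFB1", "KCNA5", "TRPC6",
                             "CPS1", "NOTCH3", "CAV1", "KCNK3", "ACVRL1", "ENG"}


@dataclass
class AnnotatedVariant:
    """Hypothetical annotation record (field names are assumptions)."""
    gene: str
    consequence: str               # e.g. "stop_gained", "missense", "synonymous", "intronic"
    distance_to_exon: int = 0      # bp from the nearest exon boundary (0 if exonic)
    # MAF per source; a missing key means the variant is absent from that source.
    maf: Dict[str, float] = field(default_factory=dict)
    inhouse_non_pphn_carriers: int = 0


def is_common(v: AnnotatedVariant, cutoff: float = 0.01) -> bool:
    # Assumption: "common" if MAF exceeds the cutoff in any of these sources.
    return any(v.maf.get(src, 0.0) > cutoff for src in ("exac", "1000g", "inhouse"))


def passes_consequence_filter(v: AnnotatedVariant, window: int = 15) -> bool:
    # Assumption: all synonymous variants are dropped; intronic ones only beyond +/-15 bp.
    if v.consequence == "synonymous":
        return False
    return not (v.consequence == "intronic" and v.distance_to_exon > window)


def is_candidate(v: AnnotatedVariant, rare_cutoff: float = 0.005) -> bool:
    """True if the variant survives every filtering step sketched here."""
    if is_common(v) or not passes_consequence_filter(v):
        return False
    if v.gene not in CANDIDATE_GENES:
        return False
    # Very rare in both the overall and the East Asian population ...
    rare = all(v.maf.get(pop, 0.0) < rare_cutoff for pop in ("gnomad_all", "gnomad_eas"))
    # ... and never observed in infants without PPHN in the in-house database.
    return rare and v.inhouse_non_pphn_carriers == 0
```

Keeping each step as its own predicate makes it easy to report how many variants each filter removes, which is useful when documenting a workflow such as the one in Fig. 1.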

SNP calling and quality control
Only high-confidence SNPs (depth ≥ 10; call ratio ≥ 0.2 for heterozygous variants and ≥ 0.6 for homozygous variants) present in at least one individual were selected for further analysis. The following criteria were used to filter SNPs for the case-control analysis: 1) missing rate < 0.2; 2) MAF (PLINK) > 0.001; and 3) Hardy-Weinberg equilibrium (HWE) p value in controls (PLINK) > 0.001. SNPs within the sequenced coding regions of the 25 PPHN/PAH-related genes were included in further analysis. The workflow is shown in Fig. 1.
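The genotype-level and SNP-level filters can be sketched in plain Python rather than PLINK. Assumptions are flagged in the comments: "call ratio" is read as the fraction of reads supporting the alternate allele, missing or low-confidence calls are encoded as `None`, and the HWE filter uses the exact test described by Wigginton et al. (2005) and implemented in PLINK, although the study's exact PLINK options are not stated. All function names are hypothetical.

```python
from typing import Optional, Sequence


def confident_genotype(depth: int, alt_reads: int, genotype: int) -> bool:
    """genotype: 0 = hom-ref, 1 = het, 2 = hom-alt.
    Assumption: 'call ratio' = fraction of reads supporting the alternate allele."""
    if depth < 10:
        return False
    ratio = alt_reads / depth
    if genotype == 1:
        return ratio >= 0.2
    if genotype == 2:
        return ratio >= 0.6
    return True  # hom-ref: only the depth threshold is applied in this sketch


def hwe_exact(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact Hardy-Weinberg test p-value (Wigginton et al., 2005)."""
    n_homr, n_homc = min(n_hom1, n_hom2), max(n_hom1, n_hom2)
    n = n_het + n_homr + n_homc
    if n == 0:
        return 1.0
    rare = 2 * n_homr + n_het                  # copies of the rarer allele
    probs = [0.0] * (rare + 1)                 # unnormalized P(het count)
    mid = rare * (2 * n - rare) // (2 * n)     # start near the expected het count
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk downward from the midpoint (fewer heterozygotes).
    het, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1
    # Walk upward from the midpoint (more heterozygotes).
    het, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2) * (het + 1))
        het, homr, homc = het + 2, homr - 1, homc - 1
    total = sum(probs)
    # p = total probability of configurations no more likely than the observed one.
    return min(1.0, sum(x for x in probs if x <= probs[n_het]) / total)


def snp_passes_qc(genotypes: Sequence[Optional[int]], is_control: Sequence[bool]) -> bool:
    """genotypes: 0/1/2 alt-allele dosage per sample, None if missing or low confidence."""
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) >= 0.2:    # 1) missing rate < 0.2
        return False
    alt = sum(called)
    maf = min(alt, 2 * len(called) - alt) / (2 * len(called))
    if maf <= 0.001:                                             # 2) MAF > 0.001
        return False
    ctrl = [g for g, c in zip(genotypes, is_control) if c and g is not None]
    return hwe_exact(ctrl.count(1), ctrl.count(0), ctrl.count(2)) > 0.001  # 3) control HWE
```

For instance, a control genotype split of 25/50/25 is consistent with HWE (p close to 1), while 50 homozygotes of each allele and no heterozygotes is rejected.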

Statistical analysis
Differences in clinical characteristics between study groups were analyzed using the t-test for continuous variables and the chi-square test for categorical variables. A two-sided type I error of 0.05 was used to test for statistical significance. The statistical analyses were performed with SPSS version 16 (SPSS Inc., Chicago, IL, USA). SNP association analysis of the exome sequencing data was performed with the chi-square test in PLINK software (http://zzz.bwh.harvard.edu/plink/, version 1.07). Gene-level analysis of rare SNPs (MAF < 0.05) was carried out with the Sequence Kernel Association Test (SKAT) R package [21] using default settings.

Fig. 1 Flow diagram of the genetic testing and analysis strategy applied in the study. Exome sequencing was performed for all patients, and the sequencing data were used for the subsequent analyses. Disease-causing variants among 25 PPHN/PAH-related genes were analyzed in 115 PPHN cases. SNP association analysis and gene-level analysis were carried out in 74 PPHN cases and 115 non-PPHN controls.
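For a single SNP, a PLINK-style allelic association test reduces to a Pearson chi-square on a 2 × 2 table of allele counts. The sketch below assumes the basic 1-df allelic test without continuity correction and no missing genotypes; the function names are hypothetical.

```python
import math
from typing import Tuple


def allele_counts(n_hom_alt: int, n_het: int, n_samples: int) -> Tuple[int, int]:
    """(alt, ref) allele counts for diploid samples, assuming no missing calls."""
    alt = 2 * n_hom_alt + n_het
    return alt, 2 * n_samples - alt


def allelic_chisq(case_alt: int, case_ref: int,
                  ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square on the 2x2 allele table (1 df, no continuity correction)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:              # an empty row or column carries no information
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


cases = allele_counts(0, 3, 74)       # rs192759073: 3 heterozygous cases
controls = allele_counts(0, 0, 115)   # no carriers among controls
chi2, p = allelic_chisq(*cases, *controls)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 4.70, p = 0.030
```

Using the genotype counts reported for rs192759073 in the Discussion (3 heterozygous carriers among the 74 cases, none among the 115 controls), this gives χ² ≈ 4.70 and p ≈ 0.030, in line with the p = 0.03 reported in the Results. With only three minor alleles, the expected cell counts are well below 5, so the chi-square approximation is rough here; Fisher's exact test would be a more conservative alternative.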

Patient characteristics
A total of 115 infants diagnosed with PPHN were enrolled in this study. The average gestational age was 34.9 weeks, and the average birth weight was 2516.2 g. The majority of the patients were male (64, 55.7%). The age at diagnosis for all patients ranged from 1 day to 5 days after birth, and the majority (110 patients) were diagnosed within 3 days. RDS (53, 46.1%) and pneumonia (44, 38.3%) were two major primary diagnoses in PPHN cases. Among all cases, 17 were treated with inhaled nitric oxide (iNO), and 4 were treated with extracorporeal membrane oxygenation (ECMO). The patients' demographic and clinical characteristics are shown in Table 1 and Fig. 2a.
Most patients (64.3%, 74/115) were late preterm and term infants (≥ 34 weeks gestation), and 35.7% (41/115) were preterm infants (< 34 weeks gestation). Preterm infants had lower birth weight and a higher incidence of RDS, and were more likely to require longer ventilation. iNO was mainly used in late preterm and term infants, and ECMO was used only in late preterm and term infants. Specifically, 17 infants (16 at ≥ 34 weeks gestation and 1 at < 34 weeks gestation) were treated with iNO, and 4 infants (all ≥ 34 weeks gestation) were treated with ECMO. Mortality was higher in preterm patients than in late preterm and term infants (17.1% vs 9.5%).

Sequencing results
An average of 27.5 million effective reads were generated with an average sequencing depth of 230.98-fold per target in 115 PPHN patients and 115 non-PPHN controls. In total, 99.8% of the target region was covered, among which 99.6% was covered at least 10-fold, and 99.2% was covered at least 20-fold.
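Coverage figures such as these are typically derived from per-base read depths over the target region. The sketch below assumes that depths come from `samtools depth -a -b targets.bed` output (three tab-separated columns for a single BAM; `targets.bed` is a placeholder for the panel's target file) and treats a base as "covered" when its depth is at least 1; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def depths_from_samtools(lines: Iterable[str]) -> List[int]:
    """Parse 'chrom<TAB>pos<TAB>depth' lines into a list of per-base depths."""
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def coverage_summary(depths: Sequence[int],
                     thresholds: Tuple[int, ...] = (1, 10, 20)) -> Tuple[float, Dict[int, float]]:
    """Mean depth and the fraction of target bases covered at >= each threshold."""
    n = len(depths)
    mean_depth = sum(depths) / n
    fractions = {t: sum(1 for d in depths if d >= t) / n for t in thresholds}
    return mean_depth, fractions
```

With `-a`, zero-depth target bases are still reported, which matters here: without them, the fractions at each threshold would be computed only over bases that had at least one read and would overstate coverage.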

Genetic variant identification in PPHN/PAH-related genes
In total, 9 phenotype-related variants spanning 6 PAH-related genes were identified in 9 patients (7.8%). The variants comprised 3 P/LP variants and 6 VUSs; two were previously reported disease-causing variants, and seven were novel. The 3 P/LP variants, including one stop-gain variant (c.1633G > T, p.G545X) in TBX4 and two reported PAH- […] (Fig. 2b). This variant has been reported as a possibly pathogenic variant for PAH. Although it appears in GnomAD (16 heterozygous carriers) and ExAC (2 heterozygous carriers), it is absent from the East Asian population in both databases and from our large internal database of 24,336 non-PPHN samples. Therefore, this variant was considered LP for PPHN, although this needs to be confirmed in the patient's family. Patients P004 and P005, who carried the novel VUSs M189V and D199V in BMPR2, located between the transmembrane region and the kinase domain of the protein, were diagnosed with mild PPHN. Patients P006 and P008-P009, who harbored VUSs in SMAD9, KCNA5 and TRPC6, also displayed moderate to mild phenotypes. Patient P007, a boy carrying an E142D variant in TGFB1, exhibited severe PPHN with an MOI value of 23.1. He was treated with ventilation for 11 days and vasoactive agent therapy for 7 days. This residue is located in the α3-helix of the ARM domain of the latency-associated peptide (LAP) region (Fig. 2b). LAP is required for homodimer assembly and protein secretion, and it regulates the bioactivity of TGF-β [24]. Detailed information on the clinical phenotypes is shown in Additional file 2: Table S2.

Risk polymorphism identification
To exclude the influence of nongenetic risk factors, we included 74 PPHN cases and 115 non-PPHN controls with matched clinical characteristics (≥ 34 weeks gestational age) in the case-control analysis. All baseline characteristics were similar between the case and control groups (Table 1). After quality control and SNP filtering, 153 SNPs in the 25 PPHN/PAH-related genes remained. Three SNPs in CPS1 and one SNP in NOTCH3 were significantly associated with PPHN (Table 3). The most significant SNPs were rs192759073 in CPS1 and rs1044008 in NOTCH3 (p = 0.03). The other two CPS1 SNPs, rs1047883 and rs2229589, were in complete linkage disequilibrium (LD, r² = 1) and had a relatively high allele frequency in public databases (0.45). We therefore considered rs192759073 in CPS1 and rs1044008 in NOTCH3 to be better risk markers for PPHN. Gene-level analysis was performed for 128 rare SNPs (MAF < 0.05) spanning 15 genes. Among these genes, CPS1 was associated with PPHN (p = 0.006), consistent with the association between CPS1 SNPs and PPHN (Additional file 3: Table S3). SMAD9 was also associated with PPHN (p = 0.039).
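SKAT tests whether a set of rare variants is jointly associated with the phenotype through a weighted variance-component score statistic. The Python sketch below only illustrates the idea under simplifying assumptions: a binary phenotype, an intercept-only null model (no covariates and no small-sample adjustment), SKAT's default Beta(1, 25) MAF weights, and a permutation p-value swapped in for the analytic (Davies) p-value that the SKAT R package computes by default. Q is defined up to a constant scale factor, which does not affect a permutation p-value. Function names are hypothetical and are not the SKAT package API.

```python
import random
from math import gamma
from typing import List, Sequence


def beta_weight(maf: float, a: float = 1.0, b: float = 25.0) -> float:
    """Beta(a, b) density at the MAF; with a = 1, b = 25 this equals 25 * (1 - maf) ** 24."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * maf ** (a - 1) * (1 - maf) ** (b - 1)


def skat_q(genotypes: Sequence[Sequence[int]], phenotypes: Sequence[int]) -> float:
    """Q = sum_j (w_j * S_j)^2 with score S_j = sum_i g_ij * (y_i - mu).

    genotypes: one list of 0/1/2 allele dosages per variant, in sample order.
    phenotypes: 1 = case, 0 = control; intercept-only null model, so mu = case fraction.
    """
    mu = sum(phenotypes) / len(phenotypes)
    resid = [y - mu for y in phenotypes]
    q = 0.0
    for g in genotypes:
        freq = sum(g) / (2 * len(g))
        w = beta_weight(min(freq, 1 - freq))        # up-weights rarer variants
        score = sum(gi * ri for gi, ri in zip(g, resid))
        q += (w * score) ** 2
    return q


def skat_perm_pvalue(genotypes: Sequence[Sequence[int]], phenotypes: Sequence[int],
                     n_perm: int = 999, seed: int = 1) -> float:
    """Permutation p-value: shuffle case/control labels and recompute Q."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotypes)
    labels: List[int] = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if skat_q(genotypes, labels) >= observed - 1e-12:   # tolerance for float ties
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

In the setting described above, `genotypes` would hold the rare SNPs (MAF < 0.05) of one gene and `phenotypes` the labels of the 74 cases and 115 controls. Because Q squares each variant's score, variants enriched in cases and variants enriched in controls both raise the statistic, which is what lets SKAT detect mixed effect directions within a gene.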

Discussion
PPHN is a severe clinical problem and accounts for ~6% of our NICU patients. The role of genetics in the pathogenesis of PPHN remains elusive. The present study used exome sequencing to investigate the genetic contributions to the pathogenesis of PPHN in 115 Chinese PPHN patients. Among all cases, 41 were preterm infants and 74 were late preterm and term infants. We identified three patients with P/LP variants in TBX4 and BMPR2 and six patients with VUSs in BMPR2 and four other reported PAH-related genes. Case-control analysis identified CPS1, NOTCH3 and SMAD9 as important risk genes for late preterm and term PPHN.
PPHN has generally been recognized to occur in late preterm and term infants, but studies have reported an increasing rate of PPHN in preterm infants [25]. In this study, most of the infants with PPHN were late preterm and term infants (≥ 34 weeks gestation), although preterm infants still accounted for 35.7% of the patients. Among the 9 patients with genetic findings, only 1 patient, who carried c.596A > T (p.D199V) in BMPR2, was born before 34 weeks gestation (32 + 3 weeks). The rate of genetic findings thus differed between the two groups (8/74 in the ≥ 34-week group vs 1/41 in the < 34-week group). These findings suggest that preterm complications play a major role in preterm infants with PPHN, whereas genetic factors have a greater effect in late preterm and term infants.
Regarding the genetic background of PPHN, previous studies have not identified a disease-causing gene for PPHN, and only polymorphisms in 5 genes have been reported to be associated with the disease. PAH has been widely studied in both adults and children, and 20 genes have been associated with its development (Additional file 1: Table S1). The genetic etiology of PPHN in newborns is complex and unclear and may share some similarities with that of PAH in adults and children. In this study, we identified several variants in PAH-related genes, which supports the possibility that PAH and PPHN share a common genetic pathogenesis. We also found disease-causing variants in three other genes in three additional patients; however, these genes have not been associated with the development of pulmonary hypertension. Further studies are needed to investigate other potential disease-causing genes for PPHN. Among the genes identified in this study, several variants were in the BMP/TGF-β/SMAD pathways, including three P/LP variants in TBX4 and BMPR2 and one VUS in TGFB1, which were associated with severe clinical phenotypes in four patients. BMP/TGF-β/SMAD signaling (especially BMPR2) has been reported to be involved in regulating the proliferation and apoptosis of pulmonary arterial smooth muscle cells (PASMCs) and pulmonary arterial endothelial cells (PAECs) [26]. In a previous study of PAH, BMPR2 variants were more common in females than in males (3.6:1 in adult-onset PAH cases and 1.7:1 in pediatric-onset PAH cases) [27]. The sex ratio among the 4 BMPR2 variant-carrying PPHN patients in our study was similar (3:1, female:male). TBX4 is a member of the T-box gene family and is important for airway branching development and the regulation of lung fibrosis. TBX4 variants have been reported in patients with childhood-onset PAH [28] and might contribute to PAH by decreasing the activation of the BMP/TGF-β/SMAD signaling pathways [29].
TGFB1 (transforming growth factor β1) is a member of the TGF-β superfamily, whose members are important modulators of cell growth, inflammation and apoptosis. TGFB1 can suppress the proliferation and migration of endothelial and smooth muscle cells and thereby inhibit vascular remodeling. Variants in TGFB1 might affect its function and lead to pulmonary hypertension [30]. Both the TGF-β and BMP signaling pathways ultimately converge on SMADs. One rare SMAD9 variant, A196V, located in the linker domain of the protein, was identified in one patient (Fig. 2b). The linker region of SMAD9 is shorter than those of other SMADs, which suppresses its transcriptional activity and its ability to activate BMP signaling while facilitating interactions with other molecules [31]. In addition, variants in two ion channel genes that might also play important roles in PAH were identified in our patients. KCNA5 encodes Kv1.5, a pore-forming α-subunit that forms a voltage-gated K⁺ channel in PASMCs. Downregulation of KCNA5 causes membrane depolarization and increases the cytosolic Ca²⁺ concentration, resulting in pulmonary vasoconstriction and pulmonary vascular remodeling [32]. A novel D549Y variant in KCNA5 was identified in one girl; the residue is located in the C-linker region following transmembrane segment 6. Another novel variant, F443I in TRPC6, was identified in another patient. TRPC6 is an important member of the TRPC channels of the transient receptor potential (TRP) superfamily and is expressed in the lungs and PASMCs [33]. A SNP in the promoter region of TRPC6 has been shown to increase the risk of IPAH by recruiting NF-κB [34].
Furthermore, we performed SNP association and gene-level analyses of 25 PPHN/PAH-related genes in 74 late preterm and term PPHN cases and 115 controls with matched clinical characteristics to further investigate the genetic etiology of PPHN. We identified 3 SNPs in CPS1 and 1 SNP in NOTCH3 associated with PPHN. The CPS1 SNPs rs192759073 and rs2229589 are synonymous variants, and rs1047883 is a missense variant. The heterozygous rs192759073 T allele was identified in 3 female PPHN patients and in none of the controls. For rs1047883 and rs2229589, homozygous genotypes were found in 19 PPHN cases and 20 controls, and heterozygous genotypes in 41 PPHN cases and 58 controls. The synonymous SNP rs1044008 in NOTCH3 was detected in three PPHN patients (all heterozygous). This study is the first to report an association of these SNPs with PPHN. CPS1 (carbamoyl phosphate synthetase 1) encodes a key mitochondrial enzyme of the urea cycle. Functional deficiency of CPS1 can impair catalysis of the first step of the urea cycle and the generation of nitric oxide, which plays a critical role in regulating pulmonary vascular resistance [35]. Sixteen CPS1 polymorphisms, including three in coding regions (rs1047891, rs2287599 and rs41272667), have been reported to be associated with PPHN [3,36]. However, rs1047891 and rs2287599 were not significant in our cohort, and rs41272667 (close to the noncoding region) was not included in our study. This discrepancy may reflect differences in the genetic risk factors for PPHN among ethnic populations. NOTCH3 belongs to the Notch signaling pathway, which plays an important role in regulating cellular proliferation, differentiation and apoptosis. Heterozygous variants in NOTCH3 might affect cell proliferation and NOTCH3-HES5 signaling, resulting in PAH [37]. Gene-level analysis also identified CPS1 and SMAD9 as genetic risk factors for PPHN.
There are several limitations to our study. We used clinical exome sequencing for genetic testing, and only 16 of the PPHN/PAH-related genes were included in the panel; the other 9 genes need further study. Genetic risk polymorphisms are often located in noncoding regions, which cannot be detected using exome sequencing panels. However, exome sequencing provides more information on variants throughout genes than the candidate SNP genotyping used in previous studies. Additionally, we could not study the association between nitric oxide metabolites and PPHN because plasma concentrations of nitric oxide metabolites were not measured or recorded for all patients.