Genome-wide DNA methylation analysis of pulmonary function in middle and old-aged Chinese monozygotic twins

Background Previous studies have determined the epigenetic association between DNA methylation and pulmonary function among various ethnics, whereas this association is largely unknown in Chinese adults. Thus, we aimed to explore epigenetic relationships between genome-wide DNA methylation levels and pulmonary function among middle-aged Chinese monozygotic twins. Methods The monozygotic twin sample was drawn from the Qingdao Twin Registry. Pulmonary function was measured by three parameters including forced expiratory volume the first second (FEV1), forced vital capacity (FVC), and FEV1/FVC ratio. Linear mixed effect model was used to regress the methylation level of CpG sites on pulmonary function. After that, we applied Genomic Regions Enrichment of Annotations Tool (GREAT) to predict the genomic regions enrichment, and used comb-p python library to detect differentially methylated regions (DMRs). Gene expression analysis was conducted to validate the results of differentially methylated analyses. Results We identified 112 CpG sites with the level of P < 1 × 10–4 which were annotated to 40 genes. We identified 12 common enriched pathways of three pulmonary function parameters. We detected 39 DMRs located at 23 genes, of which PRDM1 was related to decreased pulmonary function, and MPL, LTB4R2, and EPHB3 were related to increased pulmonary function. The gene expression analyses validated DIP2C, ASB2, SLC6A5, and GAS6 related to decreased pulmonary function. Conclusion Our DNA methylation sequencing analysis on identical twins provides new references for the epigenetic regulation on pulmonary function. Several CpG sites, genes, biological pathways and DMRs are considered as possible crucial to pulmonary function. Supplementary Information The online version contains supplementary material available at 10.1186/s12931-021-01896-5.


Introduction
Pulmonary function is determined as an important predictor of cardiovascular health [1] and mortality [2], which declines with increasing age after the third decade of lifetime [3]. Accelerated decline in pulmonary function has immense impact on individual and social economy [4]. Pulmonary function can be influenced by a variety of factors. Traditional epidemiologic studies have widely investigated the relationship of environmental factors, such as cigarette smoking [5] and air pollution [6] with pulmonary function. Besides, family-based study [7] and genome-wide association study (GWAS) [8] have estimated the heritability of pulmonary function ranging from 0.42 to 0.71, indicating genetic contribution to the variation of pulmonary function.
Currently, an increasing number of GWASs have smoothed the way for discovering human genetic variants linked to pulmonary function which are quantified by forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and FEV1/FVC ratio [9]. Yet the reported nucleotide-level polymorphisms could explain a limited proportion of pulmonary function variation [10] (5.0% for FEV1, 3.4% for FVC, and 9.2% for FEV1/FVC) compared with the estimated heritability, suggesting that other gene-regulatory mechanisms such as epigenetics might also be at play. Epigenetics is the study of heritable phenotype alterations that do not involve changes in the DNA sequence [11], and the epigenetic changes include DNA methylation, histone modification and noncoding RNA. Previous epigenome-wide association studies (EWASs) have investigated the association between DNA methylation and pulmonary function among various ethnic population but only a limited amount of significant genomic sites have been revealed [4,[12][13][14]. Besides, expect one study based on monozygotic (MZ) twin design, most of previous studies were conducted based on general population, which could not control the genetic effect and early life milieu including intrauterine environment on epigenetic changes [15].
As the genetic makeup is perfectly matched within pair, the monozygotic twins serve as optimal and valuable samples for EWAS on complex diseases and phenotypes [16]. The genetic influences on epigenetic changes are cancelled out in the discordant MZ twins design, thus the differential DNA methylation triggered by environmental factors could be identified [17]. The Chinese population is different from the other ethnics of the world in terms of genetic background, environmental exposure and lifestyle. However, there is no EWAS of pulmonary function in the Chinese twins published present. Thereby, we performed an EWAS to identify the association between DNA methylation variants and pulmonary function among Chinese monozygotic twin pairs.

Samples and study procedures
The discordant identical twin pairs are sub-sample of twins derived from Qingdao Twin Registry [18] conducted by Qingdao Centers for Disease Control and Prevention. The details of sample recruitment have been described elsewhere [19]. A total of 68 twin pairs which were conducted DNA methylation sequencing using the reduced representation bisulfite sequencing (RRBS) were included in the sample. After excluded twin pairs with incomplete measurement of pulmonary function (n = 1) and participants with minimal absolute values of intra-pair difference in pulmonary function score (ΔFEV1 < 0.1, n = 7; ΔFVC < 0.1, n = 8, and ΔFEV1/FVC < 0.05, n = 23), complete monozygotic twin pairs who met the criteria were included in the study, including 60 twin pairs for FEV1(34 male and 26 female pairs), 59 twin pairs for FVC (34 male and 25 female pairs), and 44 twin pairs for FEV1/FVC (21 male and 23 female pairs). Informed written consents were obtained from all participants. Regional Ethics Committee of the Qingdao Centers for Disease Control and Prevention Institutional Review Boards has approved this study.
Pulmonary function including FEV1 and FVC (liters) was assessed by the electronic hand-held spirometer (Micro 0102). Trained investigators calibrated the spirometer before measurement every morning. Based on the standard procedure of spirometry, each participant performed two maneuvers in standing state twice, and best trial data were applied to further analysis. The ratio FEV1/FVC was calculated according to the above measurements.

DNA methylation analysis
The Cetyltrimethyl Ammonium Bromide was used to extract genomic DNA from whole blood. DNA methylation library was constructed using RRBS by Biomarker Technologies Corporation, Beijing, China (http:// www. bioma rker. com. cn/). Firstly, genomic DNA was digested with Mspl restriction enzyme. After digesting, the 5′ CG overhangs were repaired, and A-tails were added. Then the DNA was loaded on an agarose gel, and 230-380 bp long (including 100 bp adaptor) fragments were sort out for next bisulfite conversion using NEXTflex Bisulfite-Seq Kit (Bioo Scientific, Austin, TX, USA). After all, the bisulfite converted DNA was amplified with PCR. The reduced representation bisulfite sequencing was conducted using Illumina HiSeq X Ten (Illumina Inc., San Diego, CA, USA).

Data preprocessing
Our previous study has detailed the data preprocessing [20,21]. In brief, the raw data were first trimmed and mapped to Genome Reference Consortium Human Build 37 (hg19) by Bismark [22]. The mapping output from Bismark was then imported to BiSeq (R package) [23] to detect the methylation level. To reduce bias, the coverage was restricted to 90% quantile, and CpG sites with beyond ten missing observations or average methylation beta value < 0.01 were removed. We used logit transformation to transform the beta value to M-value for conducting further differential methylation analyses.

Cell-type composition
Because the DAN sample extracted from the whole blood including distinct cell types which might result in false discoveries. We applied ReFACTor [24] method to control the cellular heterogeneity impact on DNA methylation. ReFACTor is based on principal component analysis and calculates the linear transformations of cell-type composition as principal component analysis components. We selected the top five components as covariates to control cell-type heterogeneity for the subsequent analyses.

Epigenome-wide association analyses
For single CpG analysis, linear mixed effect models were applied to regress methylation level on pulmonary function adjusting for cell-type composition and other confounding factors (FEV1: diastolic pressure; FVC: none; FEV1/FVC: diastolic and systolic pressure) as fixed effects and twin pairing variable as a random effect, based on the co-twin design as proposed by Tan et al. [16]. The smoking status of in-pair twins were almost consistent in sample. The number of smoking twins was 22 for FEV1 and FVC and 15 for FEV1/FVC, the number of non-smoking twins was 32 for FEV1, 31 for FVC, and 25 for FEV1/FVC, and the number of inconsistent smoking status twins was 6 for FEV1 and FVC, 4 for FEV1/ FVC. We added the smoking status as fixed effects to control it. False discovery rate (FDR) [25] was calculated to solve multiplicity problem. We defined the significance of genome-wide as FDR < 0.05, and conducted these analyses by R software (version 4.1.0).

Genomic regions enrichment analysis
Genomic regions enrichment analysis was performed using Genomic Regions Enrichment of Annotations Tool (GREAT) to examine the enrichment of identified methylation sites (P < 0.05) in the functional significance of cis-regulatory regions [26]. GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. Annotation of GREAT is based on Genome Reference Consortium Human Build 37 (hg19).

Detecting differentially methylated regions (DMRs)
Based on bisulfite-sequencing data with P-values from EWAS result, the significant differentially methylated regions (DMRs) for pulmonary function were identified using comb-p python library proposed by Petersen et al. [27]. This method first combined adjacent P-values as weighted according to the calculated auto-correlation, then performed Benjamini-Hochberg false discovery adjustment to find regions of significant enrichment. The documentation and implementation of comb-p python library are available at website [28] https:// github. com/ brentp/ combi ned-pvalu es. The analyses of DMRs were conducted by Python software (version 3.8.8).

Gene expression analyses
Weighted gene co-expression network analyses (WGCNA) We used R software (version 4.1.0) to perform weighted correlation network analysis such as coexpression network analysis of gene expression data through WGCNA package [29][30][31]. In brief, we firstly constructed a gene co-expression network, and then used dynamic tree cut to identify modules. Next, we related modules to pulmonary function indices. Finally, we used DAVID [32,33] tool to conduct the enrichment analysis of genes clustered in specific modules. The significant enriched terms were defined as a modified fisher exact P-value < 0.05.

Correlational analysis
We applied Spearman's rank correlation analyses by R software (version 4.1.0) to evaluate the correlation between the gene expression levels of genes where the top CpG sites and DMRs annotated and pulmonary function indices. Statistically significant was defied as P-value < 0.05.

Results
Descriptive statistics of basic characteristics are shown in Additional file 1: Table S1. The number of monozygotic twin pairs involved in our study was 60 for FEV1(34 male pairs), 59 for FVC (34 male pairs), and 44 for FEV1/FVC ratio (21 male pairs). The median age of participants was above 50 years old. The mean (standard deviation, SD) of pulmonary function was 1.98 (0.72) for FEV1, 2.33(0.83) for FVC, and 0.86(0.14) for FEV1/FVC. Most clinical indicators had considerably significant correlation, indicating that our discordant MZ twin design could benefit. And the insignificant intra-pair confounders would be added as covariates in our subsequent association analyses. We drew scatter plots with regression line to illustrate the relationship between intra-differences of pulmonary function (ΔFEV1, ΔFVC, ΔFEV1/FVC) and intra-differences of methylated values of top significant CpG sites (P value < 10 −4 , Δ methylated values of CpG sites at corresponding location) in MZ twin pairs (Additional file 2: Table S2, Additional file 3: Fig. S1, Additional file 4: Fig. S2, and Additional file 5: Fig. S3). The Δ methylation value of four CpG sites (f, h, i, j) were positively correlated with ΔFEV1, and the Δ methylation value of seven CpG sites (a, b, c, d, e, g, k) were negatively correlated with ΔFEV1. The Δ methylation value of eleven CpG sites (a, b, c, g, h, i, j, k, m, o, q) were positively correlated with ΔFVC, and the Δ methylation value of six CpG sites (d,e,f,l,n,p) were negatively correlated with ΔFVC. The Δ methylation value of two CpG sites (c,i) were positively correlated with ΔFEV1/FVC ratio, and the Δ methylation value of ten CpG sites (a,b,d,e,f,g,h,j,k,l) were negatively correlated with ΔFEV1/FVC ratio.

Epigenome-wide association analysis
The results of EWAS for pulmonary function are shown in Table 1. In analysis of pulmonary function, 25 CpG sites with P value < 10 −4 were identified for FEV1, and the top 25 CpG sites were located at 8 genes, among which 4 (50%) genes WDR90, DIP2C, PANX2, NUBP2 were associated with pulmonary function. For intra-pair difference in FVC, 56 CpG sites with a P value < 10 −4 were found with 4 sites reaching a P value < 10 −5 . And the top CpG sites were located at 21 genes, among which 8 (38%) genes AP5B1, CYP26B1, GAS6, IL11, IRS1, IRS2, MAD1L1, NUAK1 were associated with pulmonary function. Intra-pair methylation difference of FEV1/FVC ratio identified 31 CpG sites with P value < 10 −4 . The CpG sites located at 11 genes and the most significant site was located at FENDRR and ENSG00000268388 (chr16: 86,528,639 bp, cor = − 1.93, P = 2.27 × 10 −6 ). The Manhattan plots of pulmonary function for the P-values of each CpG site against its chromosomal location are illustrated in Fig. 1.

Biological pathway analysis
The number of genomic cis-regulatory regions related to one or more genes was 13,821, 14,901, and 17,929 for FEV1, FVC, and FEV1/FVC, respectively (Additional file 6: Fig. S4). The absolute distance of genomic regions to transcription start site was displayed in Additional file 7: Fig. S5 and Additional file 8: Fig. S6.
The analysis found 12 common functional clusters of biological process with very high statistical significance (binomial p-value < 1.07E−13) ( Table 2), including negative regulation of phospholipid biosynthetic process, platelet-derived growth factor binding, potassium:chloride symporter activity, epithelial-mesenchymal cell signaling, decreased serum estradiol, low voltage-gated calcium channel activity, cAMP response element binding protein binding, activation of Cdc42 GTPase activity, ceramide signaling pathway, transcription regulation by bZIP transcription factor, mitogenactivated protein kinase p38 binding, and notch signaling pathway.
The MSiDB and PANTHER pathway, Human Phenotype, and Go enriched terms of FEV1, FVC, and FEV1/ FVC are shown in Additional file 9: Table S3, Additional file 10: Table S4, and Additional file 11: Table S5, respectively.

Region-based analysis
By using comb-p, region-based analyses identified 13, 14, and 12 DMRs (FDR < 0.05) associated with FEV1, FVC, and FEV1/FVC ratio, respectively (  Table 1. Of all DMRs, three DMRs (located at PRDM1, MPL, EPHB3) were related to more than one trait. Of the significant DMRs associated with pulmonary function, nine DMRs for FEV1 were annotated to PRDM1

Gene expression analysis
In the gene expression analyses, we included 12 twin pairs (7 male pairs) with median age of 53 years (ranging from 43 to 65), a median FEV1 of 2.05 (ranging from 1.04 to 3.81), a median FVC of 2.17 (ranging from 1.32 to 4.10), and a median FEV1/FVC of 0.97(ranging from 0.57 to 1.01).

Weighted gene co-expression network analysis (WGCNA)
As shown in Additional file 12: Fig. S7, the genes clustered in lightsteelblue1 module (including 492 genes) were both positively correlated with FEV1 (r = 0.58, P = 0.003) and FVC (r = 0.51, P = 0.01). The genes clustered in this module were significantly enriched in positive regulation of protein secretion, positive regulation of cell division, growth factor activity, calcium ion binding, motile cilium, platelet degranulation, and phospholipase A2 activity. (Additional file 13: Table S6). Moreover, the genes clustered in darkorange2 module (including 62 genes) were also both positively correlated with FEV1(r = 0.45, P = 0.03) and FVC (r = 0.53, P = 0.007). The genes clustered in this module were significantly enriched in extracellular region, negative regulation of exocytosis, and cell adhesion (Additional file 14: Table S7).
Additionally, the genes clustered in ivory module (including 76 genes) were negatively correlated with FEV1/FVC (r = − 0.63, P = 0.001). The genes clustered in this module were significantly enriched in cytokine activity, extracellular region, intermediate filament, and so on (Additional file 15: Table S8).

The common genes and enrichment terms between methylation analysis and WGCNA
We detected the common genes and enrichment terms between the methylation analyses and WGCNA. We found DIP2C gene which included in lightsteelblue1 modules linked to FEV1, and ASB2 which included in darkorange2 modules associated with FVC. The common enrichment terms "platelet alpha granule lumen" was identified.

Correlation analysis
Significant correlations between gene expression levels and pulmonary function indices were identified, including SLC6A5 related to FEV1 (r = 0.454, P = 0.026), and GAS6 related to FVC (r = 0.533, P = 0.007).   (Table 1) played important roles in pulmonary function. Most interestingly, DIP2C gene was not only identified to link to pulmonary function in our EWAS results, but further validated in the WGCNA. Moreover, DIP2C has been detected to related to pulmonary function in blood DNA in Koreans adults [34]. Mutations in DIP2C have been identified in lung cancer samples  [35]. This demonstrated that DIP2C gene indeed plays an important role at the pulmonary disease. WDR90 was identified as required gene for ciliogenesis [36]. The lung ciliary-related proteins keeping the airways clear of mucus and dirt play a role in human pulmonary function. PANX2 was expressed in human airway epithelial cells and alveolar macrophages, which might have an impact on pulmonary function [37]. NUBP2 was found to express in distal lung epithelium, which might function in lung development of mice [38]. AP5B1 was identified as susceptibility loci for the combined eczema plus asthma phenotype, which might affect pulmonary function [39]. Cyp26b1 was an essential regulator of distal airway epithelial differentiation during lung development [40]. GAS6 promoted Axl-mediated survival in pulmonary endothelial cells [41]. IL-11 was suggested that could cause lung inflammation and airway obstruction [42]. IRS1 and IRS2 were found to mediate IL-4-induced migration of human airway epithelial cells, which influence pulmonary function [43]. MAD1L1 was identified as a genome-wide significant signals with idiopathic pulmonary fibrosis by GWAS [44]. CAMTA1 was a regulator of nuclear factor of activated T cells signaling, which was linked to pulmonary arterial hypertension [45]. FENDRR was long noncoding RNA exhibiting antifibrotic activity in pulmonary fibrosis [46]. Decreased expression of MUC2 has been observed in patients with COPD [47]. Pathway enrichment analyses showed lots of common significant pathways of pulmonary function using GREAT. The significant enrichment pathway include negative regulation of phospholipid biosynthetic process [48], platelet-derived growth factor binding [49], potassium:chloride symporter activity [50], epithelialmesenchymal cell signaling [51], decreased serum estradiol [52], low voltage-gated calcium channel activity [53], cAMP response element binding protein binding [54], activation of Cdc42 GTPase activity [55], ceramide signaling pathway [56], transcription regulation by bZIP transcription factor [57], mitogen-activated protein kinase p38 binding [58], and notch signaling pathway [59].
The genomic region-based analyses found 39 DMRs locating at 23 genes (Table 3), of which PRDM1, MPL, LTB4R2, EPHB3 and SLC6A5 had certain biological function potentially linked to pulmonary function. Previous study found that NF-κB(p65) promotion of miR-99b could aggravate acute lung injury by PRDM1 down-regulation, and over-expressed PRDM1 inhibits acute lung injury in mice [60]. MPL was defined as an important gene in a novel VEGF-miR-1-Mpl-Pselectin effector pathway in lung Th2 inflammation and found as potential therapeutic targets for asthma [61]. LTB4R2, as one of pivotal leukotriene B4 receptors, was proposed as potential therapeutic targets in asthma [62]. EphB3 was expressed at human lung fibroblasts, which induce dephrin-B2 forward signal involved in several fibroblast functions [63]. SLC6A5, also named GLYT-2, encoded a sodium-and chloride-dependent glycine neurotransmitter transporter. The glycinergic   C, D, E, F, G, H) negatively associated with FEV1/FVC ratio inhibitory synaptic inputs played an important role in respiratory motoneurons, which could affect pulmonary function [64].
As additional validation, we integrated the methylation data with gene expression data. Genes clustered in light-steelblue1 and darkorange2 modules were positively correlated with FEV1 and FVC in WGCNA, and some genes were in common with EWAS findings, including DIP2C discussed above and ASB2 involved in pulmonary function remained to be studied further. Additionally, SLC6A5 and GAS6 discussed above were positively correlated to pulmonary function. Moreover, the common enrichment terms between methylation analysis and WGCNA was platelet alpha granule lumen, which involved in pulmonary function remained to be studied further.
There were several strengths in the present study. The identical twin design used in our study to detect the epigenetic variation of pulmonary function could perfectly control over the genetic background to provide credible results. Moreover, this was one of the few pulmonary function EWA studies in Asian and the first in Chinese. As the genetic background and environmental exposures differ from ethnic populations, our study elucidated the underlying physiological mechanism of pulmonary function changes in Chinese adults. However, our studies also have some limitations. First, compared with other general case-control design, the sample size of our study was relatively small due to the difficulty of recruiting and identifying qualified MZ twin pairs. However, previous study has determined that the sample sizes of monozygotic twins just require roughly 1/4 of sample sizes in the ordinary case-only design to provide the sufficient power [65]. Second, the DNA sample was extracted from blood rather than the lung tissue. Although we know methylation is the characteristic of tissue-specificity, it was difficult to obtain the lung tissue of sample. Moreover, the mounting evidences have supported disease-associated methylation loci could be identified from peripheral samples [66]. Third, the non-shared environment for the individual siblings of MZ twins, such as occupational environment [67], residential environment [68], and mode of transport [69], could expose themselves to different levels of environmental pollutants, including particulate matter, nitrogen dioxide; volatile organic compounds, polycyclic aromatic hydrocarbons, and so on, which might directly affect pulmonary function [70][71][72][73], and cause different levels of DNA methylation [74][75][76][77][78] thereby indirectly influencing pulmonary function. However, due to the complicated causes of DNA methylation and the difficulty of monitoring for the external environmental exposure, we have not further analyzed the causes of DNA methylation. We will seek practical method to solve it in the future research.
Although these results could not immediately be applied as clinical predictors of disease in individuals, they are important from an aetiological perspective. Epigenetic studies complement genetic association studies to identify pulmonary function related genes. The EWAS and gene expression analysis identified candidate genes and pathways related to pulmonary function, which could help understand underlying mechanisms of pulmonary function and explore new molecular biological pathway of pulmonary functional decline in clinical.

Conclusion
In conclusion, our DNA methylation sequencing analysis on identical twins provides new references for the epigenetic regulation on pulmonary function. Several CpG sites, genes, biological pathways and DMRs were considered as possible crucial to pulmonary function. All findings point important clues to further explore of pulmonary function.