DNA methylation is associated with lung function in never smokers

Background Active smoking is the main risk factor for COPD. Here, epigenetic mechanisms may play a role, since cigarette smoking is associated with differential DNA methylation in whole blood. So far, it is unclear whether epigenetics also plays a role in subjects with COPD who never smoked. Therefore, we aimed to identify differential DNA methylation associated with lung function in never smokers. Methods We determined epigenome-wide DNA methylation levels of 396,243 CpG-sites (Illumina 450 K) in blood of never smokers in four independent cohorts: LifeLines COPD&C (N = 903), LifeLines DEEP (N = 166), Rotterdam Study (RS)-III (N = 150) and RS-BIOS (N = 206). We meta-analyzed the cohort-specific methylation results to identify CpG-sites that were differentially methylated with FEV1/FVC. Expression Quantitative Trait Methylation (eQTM) analysis was performed in the Biobank-based Integrative Omics Studies (BIOS). Results A total of 36 CpG-sites were associated with FEV1/FVC in never smokers at a p-value < 0.0001, but the meta-analysis did not reveal any epigenome-wide significant CpG-sites. Of interest, 35 of these 36 CpG-sites have not been associated with lung function before in studies including subjects irrespective of smoking history. Among the top hits were cg10012512, cg02885771, annotated to the gene LTV1 Ribosome Biogenesis Factor (LTV1), and cg25105536, annotated to Kelch Like Family Member 32 (KLHL32). Moreover, a total of 11 eQTMs were identified. Conclusions With the identification of 35 CpG-sites that are unique for never smokers, our study shows that DNA methylation is also associated with FEV1/FVC in subjects who never smoked and is therefore not merely related to smoking.


Background
Chronic Obstructive Pulmonary Disease (COPD) is a progressive inflammatory lung disease characterized by persistent airway obstruction that causes severe respiratory symptoms and poor quality of life [1]. Although smoking is generally considered the main environmental risk factor, an estimated 25-45% of patients with COPD have never smoked [2]. Despite extensive research, the etiology of COPD remains incompletely understood. It is known that the development of this complex heterogeneous disease is influenced by both genetic and environmental factors, as well as their interactions [3][4][5][6]. As the interface between the inherited genome and environmental exposures, an important role has been postulated for the epigenome [7]. The epigenome includes multiple epigenetic mechanisms that affect gene expression without modifying the DNA sequence. These epigenetic mechanisms are highly dynamic and respond to environmental exposures, ageing and diseases [8]. One such epigenetic mechanism is DNA methylation, which involves the binding of a methyl group to a cytosine base located adjacent to a guanine base. Methylation of these so-called CpG-sites in regulatory regions of the DNA generally results in decreased expression of a particular gene [9].
So far, only a few studies have investigated the association between DNA methylation in peripheral blood and COPD or lung function using an epigenome-wide, hypothesis-free approach [10][11][12][13][14][15][16][17]. Although findings across the studies are not consistent, there is suggestive evidence that alterations in DNA methylation might play a role in the etiology of COPD. However, in previous studies, subjects were mainly included irrespective of smoking status, thus including current smokers, ex-smokers and never smokers. As a consequence, it is currently not known if there are differences in DNA methylation between healthy individuals and patients with COPD who have never smoked. Recently, we studied the association between epigenome-wide DNA methylation and COPD in both current smokers and never smokers [16]. Although we did not find any epigenome-wide significant association in either current smokers or never smokers, the associations between DNA methylation and COPD differed between the two groups. Hence, by further exploring the role of DNA methylation in a much larger set of never smokers together with a continuous measurement of lung function, we might be able to reveal important novel insights into the etiology of COPD. In this study, we aim to assess the association between DNA methylation and lung function in never smokers, meta-analyzing four independent population-based cohorts.

Study population
To study the association between epigenome-wide DNA methylation and lung function, defined as the ratio between the Forced Expiratory Volume in 1 s (FEV1) and Forced Vital Capacity (FVC), in never smokers, we performed a meta-analysis in four different cohorts. Two cohorts originated from the LifeLines population-based cohort study [18]: the LifeLines COPD & Controls DNA methylation study [16,19] (LL COPD&C, n = 903) and the LifeLines DEEP study [20] (LLDEEP, n = 166). The two other cohorts originated from the population-based Rotterdam study (RS) [21]: the first visit of the third RS cohort (RS-III-1, n = 150) and a cohort selected for the Biobank-based Integrative Omics Studies (BIOS) project (RS-BIOS, n = 206). Both population-based cohort studies were approved by the local university medical hospital ethical committees and all participants signed written informed consent. In all cohorts, never smoking was defined based on self-reported never smoking history and 0 pack years included in the standardized questionnaires.

Lung function
Within the LifeLines population-based cohort study, pre-bronchodilator spirometry was performed with a Welch Allyn Version 1.6.0.489, PC-based Spiroperfect with CA Workstation software according to ATS/ERS guidelines. Technical quality and results were evaluated by well-trained assistants, and difficult-to-interpret results were re-evaluated by a lung physician. Within the population-based Rotterdam study, pre-bronchodilator spirometry was performed during the research center visit using a SpiroPro portable spirometer (RS-III-1) or a Master Screen® PFT Pro (RS-BIOS) by trained paramedical staff according to the ERS/ATS guidelines. Spirometry results were analyzed by two researchers and verified by a specialist in pulmonary medicine.

DNA methylation
In all four cohorts, DNA methylation levels in whole blood were determined with the Illumina Infinium Methylation 450 K array. Data were presented as beta values (the ratio of the methylated probe intensity to the overall intensity), ranging from 0 to 1. Quality control was performed for all datasets separately as described before [19,22]. After quality control, data were available on 396,243 CpG-sites in all four datasets.
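As an illustration of the beta value defined above, the following minimal Python sketch computes it for a single probe. The function name, the optional `offset` argument and the intensities are hypothetical; the text does not state whether a stabilising offset was used, so it defaults to zero here.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 0.0) -> float:
    """Return the beta value of one probe: M / (M + U + offset).

    `offset` is an assumed, optional stabilising constant; with the
    default of 0 this is the plain ratio of the methylated intensity
    to the overall intensity.
    """
    total = methylated + unmethylated + offset
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return methylated / total


# Hypothetical probe intensities, for illustration only.
print(beta_value(3000.0, 1000.0))  # 0.75
```

A fully unmethylated probe gives 0 and a fully methylated probe gives 1, which matches the stated range.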

Statistical analysis
Epigenome-wide association study and meta-analysis
We performed an epigenome-wide association study (EWAS) on lung function, defined as FEV1/FVC, in all four cohorts separately using robust linear regression analysis in R. The analysis was adjusted for the potential confounders age and sex. To adjust for the cellular heterogeneity of the whole blood samples, we included proportional white blood cell counts of mononuclear cells, lymphocytes, neutrophils and eosinophils, obtained by standard laboratory techniques. For LL COPD&C, we adjusted for technical variation by performing a principal components analysis using the 220 control probes incorporated in the Illumina 450 K chip. The 7 principal components that explained > 1% of the technical variation were included in the analysis. For LLDEEP, data on technical variance were not accessible. For the two RS cohorts, we included the position on the array and array number to adjust for technical variation. Regression estimates from all four individual EWAS were combined in an inverse-variance-weighted random-effect meta-analysis, using the effect estimates and standard errors, with the "rmeta" package in R. CpG-sites with a p-value below 1.26 × 10^−7 (Bonferroni-corrected threshold for the number of CpG-sites, 0.05/396,243) were considered epigenome-wide significant. CpG-sites with a p-value below 0.0001 in the meta-analysis were defined as top associations in our study.
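The pooling step and the significance threshold can be sketched in a few lines of Python. This is not the R code used in the study: it assumes the DerSimonian-Laird estimator for the between-cohort variance, and the function name and the four example estimates are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool per-cohort regression estimates for one CpG-site.

    Assumes DerSimonian-Laird random effects with inverse-variance
    weights. Returns (pooled estimate, pooled SE, two-sided p-value).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # fixed-effect weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q measures heterogeneity between cohorts.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effect weights add the between-cohort variance tau2.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p


# Bonferroni threshold for 396,243 CpG-sites, as stated in the text.
threshold = 0.05 / 396243  # about 1.26e-07

# Hypothetical estimates and standard errors from four cohorts.
pooled, pooled_se, p = random_effects_meta(
    [-0.12, -0.08, -0.15, -0.10], [0.03, 0.06, 0.07, 0.05]
)
print(pooled, pooled_se, p, p < threshold)
```

When the cohort estimates agree closely, the between-cohort variance is estimated as zero and the result equals the fixed-effect inverse-variance estimate.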

Expression quantitative trait methylation (eQTM) analysis
To assess whether top associations were also associated with gene expression levels, we used the never smokers included in the Biobank-based Integrative Omics Studies (BIOS). For all cohorts separately, reads were normalized to counts per million. To adjust for technical variation for gene expression and DNA methylation, principal component analysis was conducted on the residual normalized counts and beta-values excluding the potential confounders age and gender. Principal components that explained more than 5% of the technical variation in gene expression or DNA methylation were included in the analysis. Subsequently, robust linear regression analysis was performed on the CpG-sites and the genes within 1 MB around the CpG-sites. The analyses were adjusted for the potential confounders age, sex and technical variation by principal components as stated before. The individual eQTM analyses were combined by a random-effect meta-analysis using the effect estimates and standard errors in the "rmeta" package. An eQTM was considered significant when the Bonferroni-adjusted p-value for the number of genes within 1 MB around the CpG-sites was below 0.05.
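Two of the steps above, the counts-per-million normalization and the Bonferroni adjustment, can be sketched in Python as follows. The function names and the example numbers are hypothetical, and applying the adjustment per CpG-site (over the genes in that site's window) is one reading of the text, not a confirmed detail of the analysis.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample so they sum to one million."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical read counts for three genes in one sample.
cpm = counts_per_million([1, 1, 2])

# Hypothetical eQTM p-values for the genes near one CpG-site.
adjusted = bonferroni_adjust([0.001, 0.2, 0.6])
significant = [p < 0.05 for p in adjusted]
print(cpm, adjusted, significant)
```

Under this reading, a CpG-site with many nearby genes needs a smaller raw p-value to yield a significant eQTM than a site with few nearby genes.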

Subject characteristics
An overview of the characteristics of the subjects included in the study is shown in Table 1. LL COPD&C was the largest cohort included in this meta-analysis. Notably, since this cohort is a non-random selection from the LifeLines cohort study with COPD (defined as FEV1/FVC < 0.70) as one of the selection criteria, the percentages of COPD cases should not be interpreted as prevalence.

Meta-analysis of the four epigenome-wide association studies
The meta-analysis of the four different cohorts did not reveal CpG-sites that were epigenome-wide significantly associated with FEV1/FVC. We identified 36 CpG-sites as our top associations (Table 2). The Manhattan plot of the meta-analysis is shown in Fig. 1a. Forest plots of the three most significant CpG-sites are shown in Fig. 1b-d: cg10012512, located in the intergenic region of chromosome 7q36.3 (p = 5.94 × 10^−7); cg02885771, annotated to LTV1 Ribosome Biogenesis Factor (LTV1) (p = 4.10 × 10^−6); and cg25105536, annotated to Kelch Like Family Member 32 (KLHL32) (p = 9.09 × 10^−6). An overview of all CpG-sites associated with FEV1/FVC at a nominal p-value of 0.05 can be found in Additional file 1: Table S1. The direction of the effect of the 36 top CpG-sites did not change in a sensitivity analysis in the LL COPD&C cohort excluding the subjects that were exposed to environmental tobacco smoke (ETS) (N = 659 subjects) (Additional file 2: Table S2).

Expression quantitative trait methylation (eQTM) analysis
In total, 803 genes were located within 2 MB of the 36 CpG-sites. The expression of 11 genes was significantly associated with DNA methylation levels at 9 different CpG-sites (Table 3). DNA methylation at cg25105536, annotated to KLHL32, was significantly associated with gene expression levels of KLHL32. DNA methylation levels at cg08065963, located in the intergenic region on chromosome 16 and not yet annotated to a gene, showed a significant association with gene expression levels of 4-Aminobutyrate Aminotransferase (ABAT). For the other 7 CpG-sites, DNA methylation levels were associated with gene expression levels of one or two genes other than the previously annotated genes. An overview of the association between DNA methylation and gene expression levels of all genes can be found in Additional file 3: Table S3.

Discussion
This study is the first large general population-based EWA study on lung function in never smokers. So far, virtually all EWA studies on the origin of COPD included subjects with a history of cigarette smoking. As a consequence, these studies mainly addressed the origins of COPD in response to smoking. It is unclear if the results of these studies help to explain the etiology of COPD or rather explain the contribution of cigarette smoke towards the disease. Therefore, our study importantly contributes to the current understanding of COPD in never smokers. We identified 36 CpG-sites that were associated with FEV1/FVC at a p-value below 0.0001. The top hit of our meta-analysis, cg10012512, is located in the intergenic region of chromosome 7q36.3. It is therefore not possible to speculate on the functional effect of differences in DNA methylation at this specific CpG-site and how these differences may affect FEV1/FVC. While associations found with an eQTM analysis may help to get more insight into the function of a CpG-site, our eQTM analysis did not reveal any nominally significant associations for cg10012512. However, this CpG-site was differentially methylated between never smokers and current smokers [23]. Presumably, this CpG-site also responds to other inhaled deleterious substances, which in turn affects lung function. The second top hit, cg02885771, located on chromosome 6q24.2, is annotated to LTV1. Previously, this CpG-site has been associated with asthma in airway epithelial cells [24] and LTV1 was shown to be expressed in lung tissue in the Genotype Tissue Expression (GTEx) project. Although studies in yeast describe LTV1 as a conserved 40S-associated biogenesis factor that functions in small subunit nuclear export, a specific role for LTV1 in respiratory diseases is not known [25].
The third top hit, cg25105536, is annotated to KLHL32 on chromosome 6q16.1, and we found a significant association between DNA methylation levels of cg25105536 and gene expression levels of KLHL32. The function of KLHL32 is poorly understood; however, four genetic variants in the KLHL32 gene have been associated with FEV1 and FEV1/FVC in African American subjects with COPD and a history of smoking [26]. Notwithstanding the fact that these associations were only identified in a specific group, this might suggest a role for KLHL32 in the respiratory system. Next to KLHL32, we found that gene expression levels of 10 additional genes were significantly associated with DNA methylation levels at one of the 36 CpG-sites. cg08065963, which was not yet annotated to a gene, was significantly associated with 4-Aminobutyrate Aminotransferase (ABAT). Interestingly, a role for ABAT in COPD has not been described before. The remaining nine genes were other genes than the annotated genes of the particular CpG-sites. This suggests that the CpG-sites may also regulate distant genes within a region of 2 MB, which complicates the functional assessment of differences in DNA methylation even further. To the best of our knowledge, there are eight studies in literature describing the association between DNA methylation and lung function (Table 4). Six of these studies included both subjects with and without a history of cigarette smoking and, except for the study by Qui et al., adjusted for smoking status in the statistical analysis. In addition, the recent study by Imboden et al. performed analyses with and without adjustment for smoking status and pack years. Altogether, these seven studies identified 462 unique CpG-sites. Interestingly, none of the 36 CpG-sites from our meta-analysis in never smokers were among these 462 previously identified CpG-sites (Table 5). Apparently, these 36 CpG-sites are only associated with lung function level in never smokers.
The fact that 17 CpG-sites (47%) were associated at a nominal p-value < 0.05 with COPD (dichotomously defined as the ratio of FEV1/FVC below 70%) in our previous EWAS stratified for never smoking further underscores this assumption [16]. There is, however, one exception, since cg22742965, annotated to Transmembrane Protein With EGF Like And Two Follistatin Like Domains 2 (TMEFF2), was also significantly associated with COPD in smokers. Most likely, this CpG-site shows a general response to inhaled deleterious substances such as cigarette smoke and other yet unknown substances.
Assuming that the observed differential DNA methylation at the majority of the CpG-sites in our study occurs without exposure to smoking, the question arises why this differential DNA methylation is observed. One possible explanation may be that other factors within the environment, such as air pollution and job-related exposures, are responsible for the observed differences in DNA methylation. Recently, we studied the epigenome-wide association between DNA methylation and exposure to air pollution and job-related exposures in a selection of the LifeLines population cohort including both never and current smokers [19,27]. While we did find significant associations, none of them were replicated in independent cohorts. Additional analyses in never smokers for this paper did not reveal novel associations between DNA methylation and environmental exposures (Additional file 4: Table S4 and Additional file 5: Figure S1). This might potentially be due to lack of power, since only a small percentage of the subjects that have never smoked in the LL COPD&C cohort have been exposed to environmental exposures. Moreover, exposure levels to air pollution in the LL COPD&C cohort are relatively low compared to the average Dutch levels determined within the 2012 Dutch national health survey as described by Strak et al. [28]. Next to environmental exposures, another explanation may be that a reduced lung function level precedes the differences in DNA methylation. However, with the cross-sectional design of this study, we cannot derive conclusions on the direction of the association and causality. Large longitudinal studies are required to investigate causality between DNA methylation and FEV1/FVC. Moreover, such studies will give the opportunity to investigate whether low levels of FEV1 and decline in FEV1 over the years are associated with DNA methylation in never smokers.