Expression profiling identifies genes involved in emphysema severity

Chronic obstructive pulmonary disease (COPD) is a major public health problem. The aim of this study was to identify genes involved in emphysema severity in COPD patients. Gene expression profiling was performed on total RNA extracted from non-tumor lung tissue from 30 smokers with emphysema. Class comparison analysis based on gas transfer measurement was performed to identify differentially expressed genes. Genes were then selected for technical validation by quantitative reverse transcriptase-PCR (qRT-PCR) if they were also represented on the microarray platforms used in previously published emphysema studies. Technically validated genes advanced to tests of biological replication by qRT-PCR in an independent test set of 62 lung samples. Class comparison identified 98 differentially expressed genes (p < 0.01). Fifty-one of these genes had previously been evaluated in studies comparing normal and severely emphysematous lung. qRT-PCR confirmed the direction of change in expression for 29 of the 51 genes, and 11 of these remained significant at p < 0.05. Biological replication in an independent cohort confirmed the altered expression of eight genes, seven of which were differentially expressed by more than 1.3-fold, identifying them as candidate determinants of emphysema severity. Gene expression profiling of lung tissue from emphysema patients identified seven candidate genes associated with emphysema severity: COL6A3, SERPINF1, ZNHIT6, NEDD4, CDKN2A, NRN1 and GSTM3.


Introduction
Chronic obstructive pulmonary disease (COPD) is a major health burden worldwide [1]. Smoking is the primary cause of COPD, with up to 50% of smokers developing the disease [2]. COPD is frequently under-diagnosed and under-treated [3] because its early stages are often asymptomatic. COPD patients are classified as mild, moderate or severe based on the degree of airflow limitation, which results from damage to the large airways (bronchitis), small airways (bronchiolitis) and/or alveoli (emphysema). Emphysema affects 40% of heavy smokers [4] and causes loss of elastic recoil, leading to abnormal gas exchange and breathlessness. Despite smoking cessation, some individuals continue to deteriorate, developing severe emphysema due to persistent inflammation and continued damage [5]. A recent meta-analysis by Godtfredson et al suggests that former smokers with mild to moderate COPD have better morbidity and mortality outcomes [6]. Hence, early identification of susceptible individuals would increase the opportunity for improved intervention, early treatment and prevention of progression. Pathobiological mechanisms in emphysema development include inflammation, protease-antiprotease imbalance and oxidative stress [7], but many pathways, both within and outside these mechanisms, remain to be explored. In this study we used microarrays to study multiple genes simultaneously, with the aim of identifying markers and/or pathways that would enable a greater understanding of the biology of emphysema progression in susceptible smokers and that could have potential as diagnostic tools or therapeutic targets.
High-throughput microarray technology has been used to profile gene expression patterns in order to identify important genes and pathways implicated in chronic lung disease. Susceptibility studies in COPD have used lung tissue and primary cells to profile gene expression. Four of these studies compared gene expression changes between Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages I-IV [8][9][10][11], but only two studies so far have profiled lungs from patients clinically stratified by emphysema (these are discussed in detail below) [12,13]. Spira et al [12] performed a case-control study comparing the gene expression profiles of 20 smokers with severely emphysematous lungs and 14 smokers with normal or mildly emphysematous lungs. Similarly, Golpon et al [13] compared lung expression profiles between controls and patients with either severe emphysema or α1-antitrypsin (α1AT) deficiency. These studies identified differential expression of particular genes, as well as a global reduction in gene expression in severe emphysema compared with normal lung, potentially explained by the relative acellularity of end-stage emphysema. Validating published expression differences and identifying additional genes responsible for the progression of emphysema would advance understanding of the pathobiology and improve clinical management.
We hypothesised that gene expression profiling would identify differentially expressed genes associated with the progression from mild to moderate emphysema. We chose these stages for two main reasons: (i) we considered this phase of progression (from mild to moderate) to be the most critical in the development of symptomatic, clinically significant emphysema, as well as more responsive to treatment than end-stage lung disease; and (ii) we wished to avoid the loss of sensitivity caused by the previously reported global gene downregulation in severe, acellular end-stage emphysema. The transcriptome profiles of mildly and moderately emphysematous lung were therefore compared to identify candidate genes for disease severity, which were then validated in an independent set of test patients.

Subjects and samples for The Prince Charles Hospital training set
Patients who had undergone curative resection for lung cancer and who agreed to donate resected lung to The Prince Charles Hospital (TPCH) lung tissue bank were selected for this study if they fulfilled the following inclusion criteria: 1) > 20 pack years of self-reported smoking history (where one pack-year was defined as the equivalent of 20 cigarettes per day for one year), 2) ceased smoking > 10 months prior to surgery (to avoid the effects of current smoking on gene expression) and 3) chronic airflow limitation with FEV1/VC ratio < 0.70. Exclusion criteria were the following: 1) current use of inhaled or oral steroids (to exclude the effects of steroids on gene expression), 2) pre-operative chest x-ray showing obstructive pneumonitis (to exclude the potential confounding effect of obstructive pneumonitis), 3) α1AT deficiency (S or Z alleles) ascertained by genotyping genomic DNA (to exclude the effects of α1AT-associated emphysema) [14] and 4) other lung pathology causing impaired gas transfer (interstitial lung disease, pulmonary embolism). Thirty cases met criteria for this study. The project was approved by the Human Research Ethics Committees of The University of Queensland and TPCH. All subjects gave written, informed consent prior to the surgery.
All subjects had pre-bronchodilator lung function testing before surgery. Spirometry and gas transfer were performed according to American Thoracic Society standards on the Jaeger Compactlab Transfer and Body Systems (Jaeger, Hoechberg, Germany) and results were compared to predicted values [15,16]. The single breath carbon monoxide diffusing capacity (DLCO) was divided by alveolar volume to estimate carbon monoxide diffusing capacity within the volume of lung accessed by the single breath (KCO). The 30 COPD patients were arbitrarily classed as mild emphysema with KCO ≥ 75% predicted (n = 10) and moderate emphysema with KCO < 75% predicted (n = 20).
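The KCO calculation and the 75% predicted cut-off described above can be written as a small sketch. The helper names (kco, percent_predicted, emphysema_group) and the numbers in the example are hypothetical; in practice the predicted value would come from the reference equations cited above [15,16], which are not reproduced here.

```python
def kco(dlco: float, alveolar_volume: float) -> float:
    """Transfer coefficient: DLCO divided by alveolar volume (VA)."""
    if alveolar_volume <= 0:
        raise ValueError("alveolar volume must be positive")
    return dlco / alveolar_volume


def percent_predicted(measured: float, predicted: float) -> float:
    """Express a measured value as a percentage of its predicted value."""
    return 100.0 * measured / predicted


def emphysema_group(kco_pct_predicted: float, cutoff: float = 75.0) -> str:
    """Training-set rule: mild if KCO >= 75% predicted, otherwise moderate."""
    return "mild" if kco_pct_predicted >= cutoff else "moderate"


# Made-up values for illustration only (not study data).
value = kco(dlco=18.0, alveolar_volume=6.0)
pct = percent_predicted(value, predicted=4.0)
print(value, pct, emphysema_group(pct))  # 3.0 75.0 mild
```

The boundary is inclusive on the mild side, matching the training-set definition (KCO ≥ 75% predicted); the test set described later uses >75% for mild and 40-74% for moderate, so the cut-off would need adjusting for that cohort.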

Microarray experiments
Immediately after surgery, non-tumor tissue from the peripheral lung was macroscopically dissected by a pathologist under aseptic conditions, snap-frozen in liquid nitrogen and stored at -80°C. Total RNA was extracted from these samples using Trizol (Invitrogen Corporation, Carlsbad, CA, USA), DNase treated (Qiagen, Hilden, Germany) and quality checked on an Agilent Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA) as previously described by our laboratory [17]. Lung RNA and universal reference RNA (Stratagene, La Jolla, CA, USA) were reverse transcribed, labeled with Cy5 and Cy3 (Amersham/GE Healthcare, Buckinghamshire, England), respectively, and co-hybridized onto a 22K Operon V2 microarray.

Microarray data preprocessing
Raw images were imported into Imagene V5.1 (BioDiscovery, Inc., El Segundo, CA, USA) for background correction, filtering of spots with poor morphology, and calculation and extraction of median intensity signals. Avadis V4.3 (Strand Genomics, Bangalore, India) was used to suppress 'bad' spots, defined as signals of fewer than 20 pixels or greater than 65,000 pixels. Data were centralized across all samples using Lowess normalization to account for non-linear dye bias. The Cy5/Cy3 ratio was then computed and log2-transformed. Genes whose log ratio variation had p > 0.05 were excluded, as their signal ratios showed no significant variance from the mean signal ratio of the samples.
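The spot filter, log2 ratio and variance filter described above can be sketched as follows. This is an illustration under stated assumptions, not the Imagene, Avadis or BRB ArrayTools implementation: Lowess normalization is omitted, and the variance filter assumes a chi-square test of each probe's log-ratio variance against the median variance across probes, which is one common form of such a filter. All function names are hypothetical.

```python
import math
import statistics


def keep_spot(pixel_count: int, low: int = 20, high: int = 65_000) -> bool:
    """Keep a spot unless it has fewer than `low` or more than `high` pixels."""
    return low <= pixel_count <= high


def log2_ratio(cy5: float, cy3: float) -> float:
    """Sample (Cy5) over universal reference (Cy3), log-transformed to base two."""
    return math.log2(cy5 / cy3)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability Q(df/2, x/2) (series or continued fraction)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the regularised lower incomplete gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        k = 1
        while term > total * 1e-15:
            term *= z / (a + k)
            total += term
            k += 1
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Modified Lentz continued fraction for Q(a, z) when z is large.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def variance_filter(log_ratios: dict, alpha: float = 0.05) -> list:
    """Keep probes whose log-ratio variance is significantly above the median variance."""
    variances = {probe: statistics.variance(v) for probe, v in log_ratios.items()}
    median_var = statistics.median(variances.values())
    kept = []
    for probe, values in log_ratios.items():
        df = len(values) - 1
        stat = df * variances[probe] / median_var  # assumed ~ chi-square(df)
        if chi2_sf(stat, df) <= alpha:  # probes with p > alpha are excluded
            kept.append(probe)
    return kept


# Hypothetical log2 ratios for three probes across ten arrays.
ratios = {
    "PROBE_A": [0.1, -0.1] * 5,
    "PROBE_B": [0.12, -0.12] * 5,
    "PROBE_C": [2.0, -2.0] * 5,
}
print(variance_filter(ratios))  # ['PROBE_C']
```

Only the probe whose variance is far above the median survives, which mirrors the intent of excluding probes whose ratios barely vary across samples.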

Genelist selection and external validation
Class comparison analysis, based on the supervising parameter KCO, was performed in BRB ArrayTools V3.5β1 (developed by Dr Richard Simon and Amy Peng Lam, freely accessible online http://linus.nci.nih.gov/BRB-ArrayTools.html) to identify genes differentially expressed between mild (≥ 75% predicted KCO) and moderate emphysema (<75% predicted KCO) groups categorized by gas transfer.
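As a rough illustration of a per-gene mild versus moderate comparison at p < 0.01, the sketch below swaps in a two-sided permutation test on the difference in mean log ratio. It is not the parametric test used by BRB ArrayTools, and the helper names, permutation count and seed are hypothetical choices.

```python
import random
import statistics


def permutation_p_value(mild, moderate, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(moderate) - statistics.fmean(mild))
    pooled = list(mild) + list(moderate)
    n_mild = len(mild)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[n_mild:]) - statistics.fmean(pooled[:n_mild]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def class_comparison(expression, labels, alpha=0.01):
    """Probes whose mild vs moderate difference in log ratio is significant at alpha."""
    hits = []
    for probe, values in expression.items():
        mild = [v for v, lab in zip(values, labels) if lab == "mild"]
        moderate = [v for v, lab in zip(values, labels) if lab == "moderate"]
        if permutation_p_value(mild, moderate) < alpha:
            hits.append(probe)
    return hits


# Hypothetical log ratios: 8 mild and 8 moderate samples.
labels = ["mild"] * 8 + ["moderate"] * 8
expression = {
    "PROBE_UP": [0.0] * 8 + [3.0] * 8,
    "PROBE_FLAT": [1.0] * 16,
}
print(class_comparison(expression, labels))  # ['PROBE_UP']
```

The smallest attainable p-value is 1/(n_perm + 1), so the permutation count must be large enough for the p < 0.01 threshold to be reachable.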
To prioritise significantly dysregulated genes for technical validation, we initially selected those represented on the gene expression microarray platforms used in two previously published studies of emphysematous tissue (Spira et al [12] and Golpon et al [13]), accessed from Gene Expression Omnibus (GEO). Spira et al (GEO series GSE1650) used the Affymetrix HG-U133A gene chip, which contains probes for ~22,500 human transcripts, and Golpon et al (GEO series GSE1122) used the Affymetrix HuGeneFL gene chip, which contains probes for ~6,086 transcripts. Chip Comparer (http://tenero.duhs.duke.edu/genearray/perl/chip/chipcomparer.pl) was used to find genes common to the Operon V2.1, Affymetrix HG-U133A and HuGeneFL platforms. We chose to validate by qRT-PCR only those genes represented both on the Operon array and on at least one of the other two platforms, to facilitate external validation and the identification of robust genes involved in the pathogenesis of emphysema.
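The platform-overlap rule above (keep a gene only if it is on the Operon array and on at least one of HG-U133A or HuGeneFL) reduces to set logic. In this sketch the gene identifiers are hypothetical placeholders; in practice the sets would come from a cross-platform mapping such as the Chip Comparer output.

```python
def genes_to_validate(significant, operon, hg_u133a, hugenefl):
    """Significant genes on the Operon array and on at least one public platform."""
    public = set(hg_u133a) | set(hugenefl)
    return sorted(set(significant) & set(operon) & public)


# Hypothetical gene identifiers, for illustration only.
significant = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
operon = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
hg_u133a = {"GENE_A", "GENE_B", "GENE_E"}
hugenefl = {"GENE_A", "GENE_C"}

print(genes_to_validate(significant, operon, hg_u133a, hugenefl))
# ['GENE_A', 'GENE_B', 'GENE_C']
```

GENE_D is excluded because, although significant and on the Operon array, it is absent from both public platforms.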

Technical validation of mRNA in the training set by quantitative reverse transcriptase PCR (qRT-PCR)
Total RNA prepared for the microarray experiments was reverse transcribed using Superscript III (Invitrogen Technologies, Carlsbad, California) according to the manufacturer's instructions, and 30 ng of cDNA was used for each qRT-PCR reaction. For each candidate gene, forward and reverse primers were designed using Primer Express v1.5 (PerkinElmer, Inc., Wellesley, MA, USA) to target a region close to the microarray probe, so as to amplify the same transcripts where possible. Primer sequences are listed in Additional file 1. SYBR® Green chemistry (Applied Biosystems, Foster City, California) [18] was used to measure the mRNA level of each gene of interest on a real-time rotary analyzer (Rotor-Gene 6000, Corbett Life Science, NSW, Australia) [19]. Target genes were normalized to the geometric mean of three housekeeping genes: 18S rRNA, alpha actinin 4 (ACTN4) and hepatocyte growth factor-regulated tyrosine kinase substrate (HGS) [20]. The primer sequences for the housekeepers were: 18S fwd 5'-cggctaccacatccaaggaa-3', rev 3'-gctggaattaccgcggct-5'; ACTN4 fwd 5'-agcgcaagaccttcacgg-3', rev 3'-tcatcaatgttctcgatctgtgtg-5'; and HGS fwd 5'-acctgctgaagagacaagtggag-3', rev 3'-ggtacaggatcttgttacggacgt-5'. The ratio of mean expression in cases with moderate emphysema to mean expression in cases with mild emphysema was compared between qRT-PCR and microarray signals. Genes whose signal ratios showed a consistent direction of change in both qRT-PCR and microarray were judged technically validated.
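The normalization and technical-validation rule described above can be sketched as below. It assumes each target and housekeeper value is already a relative quantity on a linear scale (for example from a standard curve); the conversion from cycle thresholds and the function names are assumptions, not details taken from the study.

```python
import math
import statistics


def normalise(target_qty: float, housekeeper_qtys: list) -> float:
    """Divide a target's relative quantity by the geometric mean of the housekeepers."""
    return target_qty / statistics.geometric_mean(housekeeper_qtys)


def moderate_vs_mild_ratio(moderate: list, mild: list) -> float:
    """Mean expression in moderate cases divided by mean expression in mild cases."""
    return statistics.fmean(moderate) / statistics.fmean(mild)


def technically_validated(qpcr_ratio: float, array_ratio: float) -> bool:
    """True when qRT-PCR and microarray ratios change in the same direction."""
    return math.log(qpcr_ratio) * math.log(array_ratio) > 0


# Hypothetical relative quantities for one gene: (target, [18S, ACTN4, HGS]).
hk = [1.0, 2.0, 4.0]
moderate = [normalise(t, hk) for t in (8.0, 6.0)]
mild = [normalise(t, hk) for t in (2.0, 4.0)]
ratio = moderate_vs_mild_ratio(moderate, mild)
print(round(ratio, 3), technically_validated(ratio, array_ratio=1.4))  # 2.333 True
```

Comparing the signs of the log ratios treats a ratio above 1 as upregulation and below 1 as downregulation, which is the "consistent direction of change" criterion; a ratio of exactly 1 is never counted as concordant.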

Biological replication of mRNA in test set
Technically validated candidate genes that were statistically significant (t-test, p < 0.05) were selected for biological replication in an independent test set of 62 lung samples from the TPCH lung tissue bank. The subjects in the test set were smokers with at least ten pack-years of smoking history and mild or moderate emphysema. The test set consisted of 21 patients with mild emphysema (>75% predicted KCO) and 41 patients with moderate emphysema (40-74% predicted KCO). These samples did not overlap with those used in the training set. Total RNA was isolated and reverse transcribed to cDNA as described above. Quantitative RT-PCR was performed and the mean expression ratio was calculated. Genes that showed a concordant direction of transcript expression in the test and training sets were judged biologically validated.
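The selection and replication rules described above combine a significance filter on the training set with a direction check on the test set. The data layout, function name and values below are hypothetical.

```python
import math


def biologically_replicated(training: dict, test_ratios: dict, alpha: float = 0.05) -> list:
    """Genes that passed technical validation, were significant in training (p < alpha),
    and changed in the same direction (moderate/mild ratio above or below 1) in the test set."""
    replicated = []
    for gene, (train_ratio, train_p, tech_ok) in training.items():
        if not tech_ok or train_p >= alpha or gene not in test_ratios:
            continue
        if math.log(train_ratio) * math.log(test_ratios[gene]) > 0:
            replicated.append(gene)
    return replicated


# Hypothetical values: (training moderate/mild ratio, training p, technically validated?)
training = {
    "GENE_A": (1.8, 0.01, True),
    "GENE_B": (0.6, 0.03, True),   # down in training, up in test: discordant
    "GENE_C": (1.5, 0.20, True),   # not significant, so not carried forward
    "GENE_D": (1.4, 0.02, False),  # failed technical validation
}
test_ratios = {"GENE_A": 1.4, "GENE_B": 1.2, "GENE_C": 1.6, "GENE_D": 1.3}
print(biologically_replicated(training, test_ratios))  # ['GENE_A']
```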

Demographics
The demographics of the 30 training set and 62 test set subjects are summarised in Table 1. All subjects in the training set were Caucasian former smokers with >20 pack year smoking history and there were more males than females. The subjects were classified as stage I (mild COPD) (9 subjects, 30%) and stage II (moderate COPD) (21 subjects, 70%) according to GOLD guidelines. For the supervised class comparison, emphysema severity in these COPD patients was classified physiologically by the KCO measurement into mild (n = 10, median 79, range 75-85% predicted) and moderate (n = 20, median 69, range 38-74% predicted) emphysema groups.

Microarray data analysis
The filtering of poor quality spots and normalisation resulted in a list of 20,274 probes comprising 13,178 known genes. Of these, 6,420 transcripts representing 4,159 known genes varied significantly (p < 0.05) from the median expression of all genes, and hence were chosen for gene selection analysis.

Genelist selection and external validation
Class comparison analysis identified 98 genes differentially expressed (p < 0.01) between mild and moderate emphysema (see Additional file 2). Fifty-one of the 98 genes were represented on the HG-U133A arrays used in the Spira et al study [12] to profile 34 lung tissue samples (20 severe emphysema, 14 mild emphysema/normal lung), and 27 probes were represented on the Affymetrix HuGeneFL arrays used in the Golpon et al study [13] to profile 10 lung tissue samples (5 severe emphysema, 5 normal lung). These 27 probes were also represented on the HG-U133A arrays used by Spira et al. A flow chart showing the prioritisation of gene lists and the analysis workflow is included in Figure 1. To test how accurately these genes classify or predict emphysema severity, leave-one-out class prediction was performed in BRB ArrayTools using the Nearest Centroid multivariate predictor, correcting for random variance. The shortlisted 51 genes were 100% accurate (100% sensitivity and 100% specificity) in classifying emphysema severity in the 30 training samples. The classification accuracies of the 51 and 27 probes on the Spira et al and Golpon et al datasets were 77% (83% sensitivity, 67% specificity) and 80% (80% sensitivity, 80% specificity), respectively, in predicting normal and severe emphysema (see Additional file 3). Hierarchical clustering of these 51 genes in the TPCH training set is shown in Additional file 4.
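The leave-one-out nearest centroid evaluation described above can be sketched with plain Euclidean distances. This is a simplified stand-in, not the BRB ArrayTools predictor: the random-variance correction is not reproduced, the gene set is held fixed rather than re-selected inside each leave-one-out loop, and treating "moderate" as the positive class for sensitivity is an assumption. Function names and data are hypothetical.

```python
import math


def centroid(rows):
    """Element-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        c = centroid([row for row, y in zip(train_x, train_y) if y == label])
        d = math.dist(c, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def loocv_metrics(x, y, positive):
    """Leave-one-out accuracy, sensitivity and specificity."""
    tp = tn = fp = fn = 0
    for i in range(len(x)):
        pred = nearest_centroid_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    n = tp + tn + fp + fn
    return (tp + tn) / n, tp / (tp + fn), tn / (tn + fp)


# Hypothetical two-gene profiles for well-separated groups.
x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
y = ["mild"] * 3 + ["moderate"] * 3
print(loocv_metrics(x, y, positive="moderate"))  # (1.0, 1.0, 1.0)
```

Because the genes here are fixed in advance, this sketch would overstate accuracy if the same samples had been used to choose them; re-running gene selection inside each fold avoids that bias.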

Technical validation of mRNA expression using qRT-PCR in the training set
The 51 shortlisted genes progressed to technical validation by qRT-PCR in the training set. For 29 genes, the direction of the mean expression ratio by qRT-PCR (up- or downregulation) was concordant with the corresponding microarray expression ratio. Eleven of the 29 genes demonstrated statistically significant differences between mild and moderate emphysema (t-test, p < 0.05). For information on the genes and their p values, see Additional file 2.

Biological replication of mRNA expression in the TPCH test set and in silico replication in public test sets
These 11 genes underwent biological replication in a test set of 62 lung samples from the TPCH lung tissue bank. Of the 11 genes selected from microarray analysis and technically validated by qRT-PCR, eight displayed concordant increases or decreases in expression. Seven of these genes displayed more than 1.3-fold changes in expression between moderate and mild emphysema lung samples in the TPCH test set. These seven candidate emphysema severity genes were 60% accurate (59% sensitivity, 62% specificity) in classifying mild and moderate emphysema patients in the independent TPCH test set, and 83% (83% sensitivity, 83% specificity) and 80% (80% sensitivity, 80% specificity) accurate in classifying normal and severe emphysema patients in the Spira and Golpon studies, respectively (see Additional file 5). The qRT-PCR expression results for the training and independent test sets are shown in Figures 2a and 2b. In silico comparison of the direction of gene expression across the three studies showed that five of the seven genes were concordant between the Spira and TPCH cohorts. Three of the five genes common to the HuGeneFL platform were concordant in direction of expression between the Golpon and TPCH cohorts (Figure 3).
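The 1.3-fold threshold described above can be applied to moderate/mild expression ratios as in the sketch below. It assumes that "greater than 1.3 fold" applies symmetrically to decreases (a ratio below 1/1.3); the function names and ratios are hypothetical.

```python
def signed_fold_change(ratio: float) -> float:
    """Moderate/mild ratio as a signed fold change (a ratio of 0.5 becomes -2.0)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1.0 else -1.0 / ratio


def exceeds_fold_threshold(ratio: float, threshold: float = 1.3) -> bool:
    """True if expression changes by more than `threshold`-fold in either direction."""
    return abs(signed_fold_change(ratio)) > threshold


# Hypothetical test-set ratios for genes already shown to be concordant.
test_ratios = {"GENE_A": 1.8, "GENE_B": 0.7, "GENE_C": 1.2, "GENE_D": 0.8}
print([g for g, r in test_ratios.items() if exceeds_fold_threshold(r)])
# ['GENE_A', 'GENE_B']
```

A ratio of 0.7 corresponds to roughly a 1.43-fold decrease and passes, whereas 0.8 (a 1.25-fold decrease) does not.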

Discussion
We used gene expression microarrays, with subsequent technical, biological and in silico validation, to identify genes differentially expressed between mild and moderate emphysema as defined by KCO. We believe that the rigour of this approach minimises the chance of identifying false-positive genes and helps ensure that the most robust candidate genes are selected for functional validation. To our knowledge, this study is the first to profile the genes involved in the progression of emphysema by comparing patients with mild and moderate emphysema. This stage of disease is more amenable to intervention and therapy, and studying it avoids the low signal-to-noise ratio caused by the known global downregulation of gene expression in severe end-stage emphysema.
The 98 genes differentially expressed between mild and moderate emphysema were prioritised for technical validation by first choosing the 51 genes represented on at least one of two public emphysema microarray platforms (Spira et al [12] and Golpon et al [13]). Using qRT-PCR, 29 of the 51 genes (57%) passed technical validation in our training set of 30 samples. In contrast to this study, Spira et al [12] and Golpon et al [13] randomly chose fewer candidate genes to validate by qRT-PCR (ten and three candidate genes, respectively) and found qRT-PCR expression to correlate strongly with microarray expression. We compared the genes differentially expressed between mild and moderate emphysema at p < 0.01 (n = 98) in our study with those reported by Spira et al [12]; this comparison identified one gene, TOMM20, that was differentially expressed in both studies but in opposite directions. Minimal or no gene overlap between the three studies is a common observation in array comparisons and is likely to be due to differences in the populations studied, biological variation, platforms, bioinformatics, statistical chance and technical factors [17,21]. A recent publication by Zeskind et al also emphasizes the low reproducibility of differentially expressed genes between cohorts [22].
Figure 1. Flowchart of the study design and outcome. To facilitate external validation we used previously published emphysema datasets (Spira et al [12] and Golpon et al [13]) to verify the expression of our candidate genes.
To our knowledge, this is the first study so far in emphysema to use an independent test cohort to verify the strength of candidate genes; biological validation in an independent test set has been uncommon in previous gene expression profiling studies of emphysema in COPD patients. Eight genes showed concordant changes in expression between the TPCH training and test sets, and seven of these had 1.3- to 4.8-fold changes in expression in moderate compared with mild emphysema in the TPCH test set, providing increased confidence in the validity of these genes as candidates. The seven genes also showed reasonably high accuracy in classifying normal/mild versus moderate/severe emphysema. The candidate genes (CDKN2A, GSTM3, COL6A3, SERPINF1, NRN1, NEDD4 and ZNHIT6) have ontologies relevant to emphysema progression, including cell cycle regulation (CDKN2A) [23], collagen (COL6A3) [24], antiangiogenesis (SERPINF1) [25] and oxidative stress (GSTM3) [26]. The expression changes of all genes were disease associated, except for GSTM3, which was upregulated in the moderate emphysema cases. A few studies have also found increased GSTM3 expression in smokers with mild/moderate COPD, which supports its role as a protective intracellular and extracellular lung mediator [27,28]. To evaluate direct and indirect gene networks, we used Ingenuity Pathway Analysis (IPA) (Ingenuity Systems, http://www.ingenuity.com/) to map biological pathways linking these genes (Figure 4). All eight genes were directly or indirectly linked within one network. For example, COL6A3 and the ZNHIT6 complex are indirectly regulated by the cytokine growth factor TGFβ1, which is linked directly to the CDKN2A complex and indirectly to the NFκB complex. The NFκB complex in turn indirectly regulates NEDD4, GSTM3 and SERPINF1. CDKN2A, a cell cycle regulator, affects NEDD4 and NRN1 through the PMEPA1 complex and the transcriptional regulator HIF1A, respectively.
Canonical pathway analysis showed other pathways through which these genes could act, such as cell cycle checkpoint, p53 signaling, IGF-1 signaling, NRF2-mediated oxidative stress, Wnt/β-catenin signaling and others (see Additional file 6). The genes were also significantly enriched in ontologies including development, differentiation and enzyme regulation, as determined with DAVID (Database for Annotation, Visualization and Integrated Discovery) (see Additional files 7a and 7b) [29]. To clarify the importance of these genes in emphysema progression, further functional characterisation is now required to measure the downstream effects of gene activation or inactivation in in vitro or in vivo disease models.
Figure 3. Comparison of mRNA expression in seven candidate genes between the TPCH training set (n = 30, microarray) and test set (n = 62, qRT-PCR) and two public microarray datasets of lung tissue samples (Spira et al [12], n = 34; Golpon et al [13], n = 10). Fold change represents the mean expression ratio of moderate versus mild emphysema (TPCH training set), severe/mild emphysema versus normal (Spira et al), or severe emphysema versus normal samples (Golpon et al). The absence of a bar indicates that the gene was not represented on the microarray platform.
A potential limitation of this study is the use of gas transfer measurements (KCO) to classify emphysema severity and the lack of histological verification of emphysema severity in the lung samples tested. This was a challenge because fresh tissue and formalin-fixed paraffin-embedded (FFPE) sections from the same site were not available for mRNA analysis and pathological quantification, respectively. Despite this, we were able to biologically replicate the expression of candidate genes in an independent set of lung tissues. Moreover, to develop biological markers of disease severity, it is important to correlate expression with clinical phenotypes such as KCO and FEV1.
By correlating gene expression profiles with DLCO and FEV1, Spira et al [12] and Golpon et al [13] identified genes significantly associated with emphysema, including genes involved in oxidative stress, immune response, inflammation and the extracellular matrix. Although the TPCH test set was randomly selected, the candidate genes still showed dysregulation similar to that in the TPCH training set when stratified by KCO, providing reassurance about the robustness of these genes as potential candidates for emphysema severity. Another potential drawback is that we prioritised our gene list differentiating mild versus moderate emphysema using published studies [12,13] that compared normal versus severely emphysematous lung samples. Although these represent different stages of emphysema, we considered this a valid approach to prioritising our gene list for further validation, because we reasoned that the pathways involved would be increasingly dysregulated along the continuum of normal, mild, moderate and severe emphysema.
Figure 4. Ingenuity Pathway Analysis (IPA) on the seven validated candidate genes.
In conclusion, we have used microarray technology to identify seven plausible candidate genes with potential involvement in the progression from mild to moderate emphysema, two of which, COL6A3 and SERPINF1, are concordantly increased in three different studies. Pathways rather than single genes are likely to be involved in the progression of emphysema, which calls for further investigation of the pathways in which these candidate genes participate. Future goals include measuring protein expression and characterising function by knocking down candidate gene expression in vitro and quantifying cellular endophenotypes relevant to emphysema. These candidates could then be used to develop therapeutic targets against emphysema progression and diagnostic biomarkers to identify those COPD patients with mild to moderate emphysema who are most susceptible to disease progression.

Conclusion
This study reports seven candidate genes that could be involved in emphysema severity. These genes were technically validated in the in-house training set and biologically replicated in an independent test set. In addition, the candidate genes classified normal and severe emphysema in the Spira et al and Golpon et al datasets with high accuracies of 83% and 80%, respectively. The use of these genes as therapeutic or diagnostic tools warrants further investigation.