Genetic polymorphisms in lung disease: bandwagon or breakthrough?

The study of genetic polymorphisms has touched every aspect of pulmonary and critical care medicine. We review recent progress made using genetic polymorphisms to define pathophysiology, to identify persons at risk for pulmonary disease and to predict treatment response. Several pitfalls are commonly encountered in studying genetic polymorphisms, and this article points out criteria that should be applied to design high-quality genetic polymorphism studies.


Introduction
Genetic polymorphisms are defined as variations in DNA that are observed in 1% or more of the population. Genetic polymorphisms may alter protein structure and function through a single nucleotide base substitution in a gene's coding region, and may increase or decrease gene expression either by affecting mRNA stability when occurring in a gene's 3′ untranslated region or by altering transcription factor binding when occurring in the 5′ promoter region. Alternatively, a polymorphism may have no discernible effect on the protein product and may lie within DNA regions that are not involved in gene transcription or translation. Polymorphisms that exist in these regions as variations in repeat sequences throughout the genome have served as the basis for genetic linkage analysis [1].
The study of genetic polymorphisms promises to help define pathophysiologic mechanisms, to identify individuals at risk for disease and to suggest novel targets for drug treatment. The methodology to study polymorphisms is simple, requiring only access to a polymerase chain reaction machine, funding for reagents, and DNA samples from cases and controls (Fig. 1 illustrates the methods used to detect polymorphisms). The seemingly unlimited potential of genetics to help predict who will get lung disease or who, once diagnosed with disease, will have an unfavorable prognosis has inspired many investigators to jump on the bandwagon of studying genetic polymorphisms. While progress in understanding and treating pulmonary diseases has occurred through investigating genetic polymorphisms, the limitations and potential pitfalls of this approach may be under-appreciated.

Approaches to genetic polymorphism analysis
To identify susceptibility loci, association studies involve typing a genetic polymorphism in unrelated affected individuals and in a group of healthy, ethnically matched controls. A given allele of a polymorphism is associated with the disease if it occurs at a significantly higher frequency among cases than among controls. When evaluating polymorphisms as disease progression factors, rather than comparing affected individuals with healthy controls, investigators compare individuals with extreme phenotypes. If a significant association emerges, three possibilities exist: the polymorphism itself is the locus of interest, the polymorphism is in linkage disequilibrium (LD) with the locus, or confounding factors are present.
LD exists when alleles at two separate genetic loci are found together in a population more often than would be expected based on their individual allelic frequencies. Possible causes of LD include recent mutation, founder effects, or selection. Another potential cause of LD is population admixture, where two populations that have been apart for a significant amount of time (and subsequently may have very different allelic frequencies at numerous genetic loci) combine to form a hybrid population. Depending on the nature of the admixture, the resulting LD can extend beyond the distances generally observed in more stable populations [3]. The best estimates of LD within an outbred population suggest that LD is unlikely to extend over distances of more than 1-2 cM, or about 1-2 million base pairs, although in an inbred population, such as the Hutterites, LD may exist over 10 times that distance [4].
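The definition of LD above can be made concrete with the standard pairwise measures D, D′ and r². The following is a minimal sketch, not taken from the studies cited; the function name and the example frequencies are hypothetical, and it assumes two biallelic loci with known haplotype and allele frequencies.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b

    # D' rescales D by its largest possible magnitude for these allele
    # frequencies, so that it lies between -1 and 1.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies: the A-B haplotype is seen in 12% of chromosomes,
# although 0.3 * 0.2 = 6% would be expected under independence.
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.12, p_a=0.3, p_b=0.2)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

A value of D close to zero corresponds to the two alleles occurring together about as often as their individual frequencies predict.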
Figure 1 (a) Several methods to detect specific nucleotide changes (polymorphisms) exist. One method relies on hybridization of oligonucleotides of known sequences to target DNA. The target DNA is generally obtained using the polymerase chain reaction and specific primers. Allele-specific oligonucleotides are then used to detect single base changes in the DNA samples. Typically, target DNA is immobilized on a solid support and denatured. Labeled (radioactive or fluorescent) oligonucleotides are then allowed to anneal. Complementary sequences bind while noncomplementary sequences do not. Sequences that match the oligonucleotide are detected by fluorescence or, when the oligonucleotide is radiolabeled, by exposure to X-ray film. (b) Another means of rapid screening for DNA variations relies on detecting conformational changes in secondary structure caused by the nucleotide sequence alteration. The change in structure can be detected in a number of ways including denaturing gradient electrophoresis and denaturing gradient high-performance liquid chromatography. SSCP, single-stranded conformational polymorphism. (c) Base mismatch methods begin with creating heteroduplexes between wild-type or normal DNA and target DNA. Heteroduplexes with mismatches are detected by enzymatic or chemical cleavage, with the cleavage products resolved by electrophoresis. (d) DNA sequencing can also be used to detect polymorphisms but is the most labor intensive. The method involves synthesis of DNA using DNA polymerase. Dideoxynucleotides are included in the synthesis mix to randomly terminate synthesis at each nucleotide in the sequence. Generally, each dideoxynucleotide is labeled with a fluorescent tag. Terminated strands are separated by denaturing gel or capillary electrophoresis and are detected using fluorescence.

Confounding factors must be considered particularly when polymorphisms identified in one study cannot be duplicated in a similar ethnic group. One confounding factor is population stratification. This may occur with an unbalanced ethnic admixture, such as a Caucasian admixture in the African-American gene pool [5]. Fortunately, this problem can be overcome by careful planning in either the analysis or study design phase [6]. Family-based association studies, which require the genotyping of affected individuals and their parents (and/or unaffected sibs), are specifically designed to control for the genetic background that can cause confounding. Two common test statistics for these types of studies are the transmission disequilibrium test (TDT) and the haplotype relative risk (HRR). The TDT compares the frequency with which each allele is transmitted from a heterozygous parent to an affected offspring [7,8]. The HRR is similar to the TDT, but compares transmission of genotypes (haplotype) rather than alleles [9,10].
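The TDT described above is usually computed as a McNemar-type chi-square on transmissions from heterozygous parents. The following is a minimal sketch under that assumption; the function name and the counts are hypothetical and are not data from any study cited here.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted:   number of heterozygous parents who passed the allele
                   of interest to an affected offspring
    untransmitted: number of heterozygous parents who passed the other allele
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    # Under no linkage or association, each allele is transmitted with
    # probability 1/2, so the two counts should be roughly equal.
    chi_square = (transmitted - untransmitted) ** 2 / total
    # For 1 degree of freedom, the chi-square tail probability equals
    # erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical example: 100 heterozygous parents, 60 of whom transmitted
# the candidate allele to their affected child.
chi_square, p_value = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Because only transmissions within families are compared, differences in allele frequency between subpopulations do not enter the statistic, which is why the design is robust to population stratification.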
A critical factor to consider in genetic polymorphism studies is the choice of what phenotype to investigate. In fact, many studies that have evaluated genetic polymorphisms have been hindered by the case sample containing multiple phenotypes. For example, some studies of asthma have included samples consisting of patients with intrinsic, extrinsic, adult onset, mild and severe asthma. Studies that seek to determine whether an association between a polymorphism and disease exists can be greatly improved by studying a more narrowly defined intermediate phenotype.
In the case of asthma, intermediate phenotypes include total IgE and bronchial hyperresponsiveness. Another consideration is that genes responsible for disease susceptibility may not be the same genes involved with disease progression. It may be beneficial to limit the sample to those with a particular stage or severity of disease. For example, to evaluate chronic obstructive pulmonary disease (COPD) candidate genes, investigators have focused on patients with severe early onset COPD [11,12].
Association studies are limited to evaluating DNA polymorphisms near or within candidate genes. To perform a genome screen to search for candidate genes, linkage analysis using families or affected siblings is required. Linkage analysis is comprehensive and locates genes that exert a major effect on disease susceptibility, but it has relatively low power and will fail to detect genes conferring only mild to moderate disease risk. For instance, if a disease susceptibility allele confers a twofold disease risk compared with the wild-type allele, several hundred to several thousand families need to be typed, a sample size that may not be achievable. Association studies have greater power, but associations are detected over much smaller genetic regions (thousands of base pairs) than those detected by linkage analysis (millions of base pairs). To perform a genome scan with association studies, tens of thousands of markers would be needed, which is not possible with current technology, although it is anticipated that this may soon be possible [13].
Most lung diseases require some type of environmental inciting agent to become manifest. For most complex diseases, where genetic susceptibility alone accounts for only a fraction of disease variation, not considering the environment can severely underpower gene-finding studies. Furthermore, genome screens or association studies performed on populations neither stratified nor selected on the basis of environmental exposure may only identify genes for which the relevant environmental exposure is ubiquitous in that population. For example, studying a random population of asthmatics selected from the Midwest region of the US would stand a reasonable chance of identifying genes important in determining house dust mite response, but would not be likely to identify genes important in isocyanate-induced asthma.
Gene-environment interaction may manifest in various ways, including differential exposure risk effects based on an individual's genotype, or differential gene risk effects based on an individual's exposure. Methods to study gene-environment interaction have been reviewed by Yang and Khoury [14]. Two main interactions exist: statistical and biologic. A statistical interaction of risk factors (gene and environment) involves the coefficient of the product term of the genetic and environmental risk factors with the interaction measured in terms of departure from a multiplicative model. This method is arbitrary, model dependent and can ignore interaction or synergy on the biologic level. In the biologic interaction model, interaction between two factors is defined as their co-participation in the same causal mechanism to disease development, and in some instances may only be detectable in terms of a departure from an additive model.
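The distinction between departure from a multiplicative model and departure from an additive model can be illustrated numerically. The following is a minimal sketch, not the method of Yang and Khoury; the function name and the relative risks are hypothetical. It uses two widely used summary measures: the ratio of the joint relative risk to the product of the single-factor relative risks, and the relative excess risk due to interaction (RERI).

```python
def interaction_measures(rr_gene, rr_env, rr_both):
    """Summarize gene-environment interaction on two scales.

    All relative risks are expressed against the reference group that has
    neither the susceptibility genotype nor the exposure.
    rr_gene: relative risk with the genotype only
    rr_env:  relative risk with the exposure only
    rr_both: relative risk with both genotype and exposure
    """
    # Multiplicative scale: a ratio of 1 means the joint effect is exactly
    # the product of the separate effects (no statistical interaction).
    multiplicative_ratio = rr_both / (rr_gene * rr_env)
    # Additive scale: a RERI of 0 means the joint excess risk is exactly the
    # sum of the separate excess risks.
    reri = rr_both - rr_gene - rr_env + 1
    return multiplicative_ratio, reri


# Hypothetical relative risks: genotype alone 2, exposure alone 3, both 6.
ratio, reri = interaction_measures(rr_gene=2.0, rr_env=3.0, rr_both=6.0)
print(f"multiplicative ratio = {ratio:.2f}")  # 1.00: no multiplicative departure
print(f"RERI = {reri:.2f}")                   # 2.00: departure from additivity
```

In this example the product term in a multiplicative model would show no interaction, yet the joint effect exceeds the sum of the separate effects, which is the situation in which an interaction is detectable only as a departure from an additive model.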

Disease gene associations
Having reviewed the general approach for evaluating polymorphisms, we turn to some recent examples. Table 1 displays examples of recently published reports of polymorphisms that were evaluated in a variety of lung diseases.

Chronic beryllium disease
Genetic polymorphisms in human leukocyte antigen (HLA) genes are associated with resistance or susceptibility to disease. HLA polymorphisms determine immune response variation to individual antigens, including autoantigens, and thus make HLA genes excellent candidates for a variety of immune-mediated disorders [15]. One striking example comes from the genetic analysis of patients with chronic beryllium disease. These studies revealed an association with alleles of HLA-DPB1 encoding a DP beta chain with glutamic acid at residue 69 [16]. Individuals with glutamic acid at position 69 have nearly a 10-fold increased disease risk [17]. Furthermore, functional studies have demonstrated that the presence of glutamic acid at residue 69 is essential for reactivity in T-cell clones generated from three patients with disease [18]. Few examples exist in pulmonary diseases where the HLA association with disease risk is as strong.
Given the clinical and pathohistologic similarities of chronic beryllium disease and sarcoidosis, we have evaluated this same polymorphism in sarcoidosis patients and controls, and found no association [19]. The lack of association of glutamic acid at residue 69 in sarcoidosis illustrates the pitfall of studying polymorphisms in candidate genes, however attractive, chosen based on a limited understanding of the pathophysiology of disease.

Table 1 Examples of recently published studies of polymorphisms in lung disease

Disease | Study | Reference | Summary
ARDS | Polymorphisms of human SP-A, SP-B and SP-D genes: association of SPB Thr131ILE with ARDS | [40] | Data presented suggest that SP-B or a linked gene contributes to susceptibility to ARDS
Asthma | Effect of polymorphism of the β-2-adrenergic receptor on response to regular use of albuterol in asthma | [41] | Arg/Arg subjects who used albuterol regularly had AM PEF lower than Arg/Arg patients who had used albuterol as needed only. Subjects homozygous for glycine at β-2-adrenergic receptor-16 showed no such decline
Asthma | Association of a promoter polymorphism of the CD14 gene and atopy | [42] | -159 C to T promoter polymorphism in the CD14 gene was found associated with expression of a more severe allergic phenotype
Asthma | The role of the C-C chemokines receptor-5 Delta32 polymorphism in asthma and in the production of regulated on activation, normal T cells expressed and secreted | [43] | Data indicate that the CCR5*D32 allele is not a genetic risk factor for the development of asthma and does not influence disease severity nor influence RANTES production
COPD | TNF-α gene promoter polymorphism in COPD | [44] | TNF gene promoter allele was not found to influence the risk of developing COPD in a Caucasian population of smokers and there was no association with severity of airflow obstruction
COPD | A polymorphism in the TNF-α gene promoter region may predispose to a poor prognosis in COPD | [45] | Homozygosity for adenine substitution polymorphism at position -308 was found associated with more severe airflow obstruction and a worse prognosis
COPD | Microsatellite polymorphism in the heme oxygenase-1 gene promoter is associated with susceptibility to emphysema | [46] | Findings suggest that the large size of a GT(n) repeat in the heme oxygenase-1 gene promoter may reduce the gene's inducibility by reactive oxygen species in cigarette smoke, thus resulting in emphysema
Cystic fibrosis | HLA class II polymorphism in cystic fibrosis. A possible modifier of pulmonary phenotype | [47] | DR7 allele was significantly associated with an increase in total IgE and Pseudomonas aeruginosa colonization in cystic fibrosis patients
Cystic fibrosis | An α1-antitrypsin enhancer polymorphism is a genetic modifier of pulmonary outcome in cystic fibrosis | [48] | An enhancer polymorphism in the AAT gene was found associated with better pulmonary prognosis in cystic fibrosis patients
Hypersensitivity pneumonitis | Major histocompatibility complex and TNF-α polymorphisms in pigeon breeder's disease | [49] | Results suggest that genetic factors located within the major histocompatibility complex region contribute to the development of pigeon breeder's disease
Hypersensitivity pneumonitis | TNF-α -308 promoter gene polymorphism and increased TNF serum bioactivity in farmer's lung patients | [50] | The frequency for the TNFA2 allele, a genotype associated with high TNF-α production in vitro, was significantly higher in farmer's lung patients
Idiopathic pulmonary fibrosis | Analysis of TNF-α, lymphotoxin alpha, TNF receptor II, and IL-6 polymorphisms in patients with idiopathic pulmonary fibrosis | [28] | This is the first paper to suggest that disease progression in idiopathic pulmonary fibrosis may be linked to a particular genetic marker or to functional polymorphisms
Sarcoidosis | HLA-Gm/κ interaction in sarcoidosis. Suggestions for a complex genetic structure | [51] | This study addresses the interplay between IgG heavy chain/κ light chain markers and major histocompatibility complex genes
Sarcoidosis | Lack of association with IL-1 receptor antagonist and IL-1β gene polymorphisms in sarcoidosis patients | [52] | No bias in the IL-1 receptor antagonist and IL-1β genotype was found in Japanese sarcoidosis patients
Sarcoidosis | CC chemokine receptor gene polymorphisms in Czech patients with pulmonary sarcoidosis | [53] | CCR5Delta32 and CCR2-64I were found associated with sarcoidosis
Silicosis | Polymorphisms of the IL-1 gene complex in coal miners with silicosis | [54] | This is the first report showing an association between the IL-1 receptor antagonist polymorphism and silicosis

PEF, peak expiratory flow; ARDS, acute respiratory distress syndrome; COPD, chronic obstructive pulmonary disease; HLA, human leukocyte antigen; IL, interleukin; TNF, tumor necrosis factor.

Chronic obstructive pulmonary disease
That only 10-20% of cigarette smokers develop symptomatic COPD suggests that genetic factors are likely to be important. In addition, several studies have shown an increased prevalence of COPD within families. COPD thus appears ripe for the investigation of genetic polymorphisms in disease susceptibility [20].
Studies have implicated oxidant-anti-oxidant interaction in the pathogenesis of COPD, which led Smith and Harrison to investigate genetic polymorphisms of the xenobiotic metabolizing enzyme, microsomal epoxide hydrolase (mEH) [21]. A mEH slow allele and a mEH fast allele exist. The homozygous state for the slow alleles results in very slow microsomal epoxide hydrolase activity. The presence of epoxides in the lung for longer periods following cigarette smoke exposure could lead to greater tissue damage and inflammation.

The study design and statistical analysis in the Smith and Harrison study are of a kind commonly encountered in reports of disease-associated genetic polymorphisms. The investigators studied blood donor controls (n = 203), patients with asthma (n = 57), patients with lung cancer (n = 50), patients with COPD (n = 68) and patients with emphysema (n = 94). The proportion of individuals with innate slow activity was significantly higher in both the COPD group and the emphysema group than in the control group: COPD 19% versus control 6%, and emphysema 22% versus control 6%. The odds ratios for homozygous slow activity versus all other phenotypes were 4.1 for COPD and 5.0 for emphysema. Koyama and Geddes have noted that caution is needed over the interpretation of these findings [22]. The groups are small and the emphysema group was unusual, being defined from the morbid anatomy of lung samples resected for cancer.
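Odds ratios such as those quoted above are derived from a 2 × 2 table of genotype by disease status, and with small groups their confidence intervals are wide. The following is a minimal sketch of a crude odds ratio with a Woolf (log-based) 95% confidence interval; the function name and the counts are hypothetical and are not the data of Smith and Harrison, whose published estimates may rest on a different analysis.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without):
    """Crude odds ratio and Woolf 95% confidence interval.

    case_with / case_without:       cases with / without the genotype
    control_with / control_without: controls with / without the genotype
    """
    counts = (case_with, case_without, control_with, control_without)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for this method")
    estimate = (case_with * control_without) / (case_without * control_with)
    # Standard error of the log odds ratio (Woolf's method).
    se_log = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the genotype.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these hypothetical counts the interval spans 1 even though the point estimate exceeds 2, which illustrates why findings from small groups call for cautious interpretation.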
Sakao et al. have argued that tumor necrosis factor (TNF)-α, a potent pro-inflammatory cytokine, may be involved in the development of COPD [23]. TNF-α has been reported to be elevated in bronchoalveolar lavage, bronchial biopsies and induced sputum of COPD patients. A polymorphism at position -308 of the TNF-α gene promoter is associated with alteration of TNF-α secretion in vitro [24]. The polymorphism consists of a guanine to adenine substitution. The guanine allele was denoted as 1 and the adenine allele as 2.
Sakao et al. compared TNF-α-308 1/2 allele frequencies in 106 Japanese patients with those in 110 asymptomatic smoker/ex-smoker control subjects matched for sex and age, and in 129 population control blood donors. The authors reported that TNF-α-308 1/2 alleles were significantly associated with the presence of smoking-related COPD. Allele frequencies were significantly different among the groups: in patients with COPD, the 1/2 allele frequencies were 0.835/0.165; in smoker/ex-smoker control subjects, 0.918/0.082; and in the population control subjects, 0.922/0.078. However, the TNF-α-308*2 allele was not found to be associated with COPD in a white population. Furthermore, studies have been inconsistent in demonstrating an association with the presence of the TNF-α-308*2 allele in a number of inflammatory diseases such as sarcoidosis and asthma [25-27].
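The allele frequency comparison reported above can be approximated from the published figures. The following is a minimal sketch that assumes each subject contributes two alleles and reconstructs allele counts by rounding the reported frequencies; it applies an uncorrected Pearson chi-square test, which is not necessarily the test the authors used, and the function names are hypothetical.

```python
import math


def allele_counts(n_subjects, freq_allele2):
    """Approximate (allele 1, allele 2) counts from a reported frequency."""
    total_alleles = 2 * n_subjects  # two alleles per subject
    allele2 = round(total_alleles * freq_allele2)
    return total_alleles - allele2, allele2


def chi_square_2x2(row1, row2):
    """Uncorrected Pearson chi-square (1 df) for a 2 x 2 table of counts."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi_square = n * (a * d - b * c) ** 2 / denom
    # Tail probability of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Reported figures: 106 COPD patients (allele 2 frequency 0.165) and
# 110 smoker/ex-smoker controls (allele 2 frequency 0.082).
copd = allele_counts(106, 0.165)
smokers = allele_counts(110, 0.082)
chi_square, p_value = chi_square_2x2(copd, smokers)
print(f"COPD alleles {copd}, control alleles {smokers}")
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Counting alleles rather than subjects treats the two alleles of each person as independent observations, an assumption that holds only approximately and is one reason genotype-based tests are sometimes preferred.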
Sandford et al. noted that widely divergent rates of decline in lung function in smokers would be a robust phenotype for detecting genes that contribute to COPD severity [11]. Their association study has enhanced features, including reducing phenotypic heterogeneity by focusing on the decline of lung function rather than COPD and by comparing extreme phenotypes. From 5887 male and female smokers recruited to the Lung Health Study (LHS) conducted by the National Heart, Lung and Blood Institute, Sandford et al. selected 283 smokers with the fastest rate of decline of forced expiratory volume in 1 s, and 308 smokers who had no decline. All subjects were white and continued to smoke during the 5-year period of the LHS. These subjects were genotyped for polymorphisms in the α1-antitrypsin, mEH, vitamin D binding protein, TNF-α and TNF-β genes. TNF-β is also known as lymphotoxin alpha. The authors found [11] that the α1-antitrypsin MZ genotype and the mEH His113/His139 slow haplotype were associated with increased rate of decline of lung function. Both of these associations were strong when the subject had a family history of COPD, suggesting an interaction with other familial risk factors. No association of the TNF haplotypes with rate of decline of lung function was found.
It would be helpful to re-evaluate these polymorphisms in a family-based association study using TDT or HRR. Unlike asthma or sarcoidosis, however, COPD generally occurs later in life. Those recruited to the LHS range in age from 35 to 60 years, making it more difficult to recruit their parents.

Idiopathic pulmonary fibrosis
A strong link exists between overexpression of pro-inflammatory mediators and idiopathic pulmonary fibrosis (IPF). Recently, Pantelidis et al. reported an evaluation of TNF, lymphotoxin alpha, TNF receptor II and IL-6 polymorphisms in patients with IPF [28]. These investigators, through their thorough analysis, raised several issues regarding genetic polymorphism studies. While allele frequencies did not differ between a normal, white, British control population and a sample with IPF, they did observe a significant increase in the frequency of a particular TNF haplotype in females with IPF compared with males. A similar gender association has been observed in the distribution of TNF-α haplotypes in ulcerative colitis [29]. This raises the possibility that polymorphisms act differently in women, and separate analysis of associations with polymorphisms in women may need to be considered. Pantelidis et al. [28] also noted an increased frequency of co-carriage of the IL-6 (intron 4G) allele located on chromosome 7p21-p14 and the TNF receptor II (1690C) allele located on chromosome 1p36.2. Since complex disorders, such as IPF, can be expected to involve several genes, combinations of alleles on different chromosomes may need to be considered.

Tuberculosis
Convincing evidence exists that genetic factors are important in tuberculosis susceptibility. Stead et al. found, among over 25,000 tuberculin-negative nursing home residents, that black subjects were twice as likely to become infected with tuberculosis as white subjects living in the same environment [30]. Twin studies in tuberculosis have consistently found much higher disease concordance among monozygotic than dizygotic twins [31].
Studies on murine models of susceptibility to mycobacterial infection led to the discovery of the natural resistance-associated macrophage protein gene and its human homolog NRAMP1 [32,33]. Bellamy et al. found a significant association between NRAMP1 polymorphisms and tuberculosis in a Gambian population (n = 800) [34]. Functional studies indicate that NRAMP1 is involved in the early stages of macrophage priming and activation, making NRAMP1 an attractive candidate gene for both tuberculosis and sarcoidosis [35].
We analyzed the same NRAMP1 gene polymorphisms found associated with tuberculosis in a case-control association study of 157 African-American sarcoidosis patients and 111 control subjects [36]. These polymorphisms included a microsatellite repeat in the 5′ region (5′-CA), a non-conservative single base substitution at codon 543 changing aspartic acid to asparagine (D543N), a TGTG deletion in the 3′ untranslated region and a single nucleotide change in intron 4. Our results, in contrast to those reported in tuberculosis patients, showed that the genotypes found associated with tuberculosis were underrepresented in the sarcoidosis patients, suggesting a potential protective effect in sarcoidosis. One could speculate on a mechanism whereby altered NRAMP1 expression could lead to susceptibility to tuberculosis and a protective effect in sarcoidosis, but it is more important to confirm these findings. We are therefore presently analyzing NRAMP1 polymorphisms in 240 sarcoidosis families using TDT.

Pharmacogenetics
An exciting development has been the evaluation of genetic polymorphisms in determining treatment response. The best data concern beta-2-adrenoceptor polymorphisms. It appears that individuals who are homozygous for the glycine 16 variant of the adrenoceptor, which downregulates to a greater extent than other forms of the receptor, show a reduced response following chronic beta-agonist use [37,38]. Treatment response to 5-lipoxygenase inhibitors also appears to be determined by polymorphisms in the promoter region of the 5-lipoxygenase gene [39].

Bandwagon or breakthrough?
Several reasons exist why the genetic evaluation of lung diseases is not proceeding as rapidly as one might expect. There is a need to apply several criteria (Table 2) in designing high-quality association studies and in the interpretation of their results.

Accurate narrow definition of the phenotype
Disease heterogeneity may obscure association between a disease subtype and a polymorphism. By narrowing the phenotype, the investigator improves chances of uncovering an association. An alternative strategy is to analyze an intermediate biologic or clinical phenotype (e.g. elevated IgE in patients with asthma).

Large sample size
Given the number of conflicting reports of disease-associated genetic polymorphisms, it is important to point out that most negative studies lack power. There is a strong inverse relationship between the frequency of an allele in a population and the sample size required to test its contribution to the phenotype. Furthermore, complex diseases are polygenic in nature, and we generally evaluate one genetic variant, or a small fraction of the variants in the many genes likely to contribute, which further increases the sample size required. Networks of investigators are probably needed when addressing complex diseases. Given the cost and time required to recruit such populations, investigators should be encouraged to enter into large-scale collaborations. For example, in the US, in the multicenter A Case Control Etiologic Sarcoidosis Study (ACCESS), 10 clinical centers recruited over 700 cases and 700 controls; and in the Sarcoidosis Genetic Analysis (SAGA) study, 11 centers are recruiting 350 affected sibling pairs and their family members for linkage analysis.
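The dependence of required sample size on allele frequency can be illustrated with the standard normal-approximation formula for comparing two proportions. The following is a minimal sketch under stated assumptions: a simple comparison of allele frequencies between cases and controls, a two-sided test, equal group sizes, and an effect expressed as an allelic odds ratio. The function names and the chosen frequencies are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def alleles_per_group(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of alleles (chromosomes) needed in each group.

    Divide by two for an approximate number of subjects per group.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Same allelic odds ratio (2.0), two hypothetical control allele frequencies.
for freq in (0.05, 0.30):
    print(f"control allele frequency {freq:.2f}: "
          f"{alleles_per_group(freq, 2.0)} alleles per group")
```

For the same odds ratio, the rarer allele requires a several-fold larger sample in this sketch, and the requirement grows further for smaller effects or for stricter significance thresholds when many polymorphisms are tested.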

Well-matched controls
Besides lack of power, most genetic polymorphism studies are confounded by lack of well-matched controls. In fact, rather than the recruitment of patients, the real challenge often lies in the recruitment of controls. Control groups are generally confounded by a selection bias that may influence the genetic makeup of that population.

Biological plausibility and functional significance of candidate genes
Most association studies evaluating candidate genes choose candidates based on their potential role in the pathophysiology of the disease. When evaluating polymorphisms, it is best that they have some plausible biologic role. However, when a candidate gene is chosen based on previous linkage with the disease or an intermediate phenotype, it is more likely that the candidate gene is in some way involved.

Independent replication in other populations
Polymorphism-disease associations from different populations are extremely difficult to interpret. The genetic variants contributing to a phenotype may differ between populations. We therefore cannot expect to generalize the findings in one ethnic group to another ethnic group, but we can insist that when an association is found using a case-control design, it is repeated using a family-based association study to minimize population stratification effects.

Conclusions
We have defined genetic polymorphisms, have pointed out why their study has become common, and have reviewed several pitfalls that need to be considered in designing genetic polymorphism studies and their interpretation. We have discussed association and linkage studies. Linkage studies can ensure that genes exerting large effects on disease susceptibility have not been missed, but will fail to identify genes exerting mild to moderate effects on disease risk. Association studies can detect genes that exert smaller effects, but cannot be used to screen the genome. Association studies look at known genes. TDTs greatly reduce the likelihood that any allele frequency differences between cases and controls might be due to unsuspected genetic differences among subgroups within the population.
A search for disease genes is best accomplished when both association and linkage strategies are used. We briefly cited some of the genetic polymorphism studies in COPD, sarcoidosis, IPF and tuberculosis. Criteria for a high-quality association study were listed and discussed. While many have jumped on the genetic polymorphism bandwagon, those investigators who address the potential pitfalls will use genetic polymorphisms to provide breakthroughs in understanding disease.