Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

Background: Evolutionarily conserved sequences likely have biological function.
Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls.
Results: Among six conserved non-coding elements (>100 bp, >70% identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P < 0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P < 0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P < 0.01).
Conclusion: These results indicate that there is little overall variation in the conserved non-coding elements on 5q31, but that variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.


Background
Comparison of human DNA sequences with those of other mammalian species is a powerful method for identifying functionally important sequence elements in the human genome because sequences with function tend to be evolutionarily conserved whereas those without function tend to accumulate variation over time. In fact, ~50% of the DNA sequences that are evolutionarily conserved between humans and mice lie outside of coding sequences of known genes [1]. Some of these conserved non-coding sequences have been shown to be long-range transcriptional regulatory elements participating in the temporal and tissue-specific expression patterns of genes [2,3].
Previous comparison of a 1 Mb region on human chromosome 5q31, which includes the cytokine genes encoding the T-helper 2 (Th2) cytokines, interleukin (IL)-4, IL-5, and IL-13, with the syntenic murine segment identified highly conserved non-coding sequences [4]. Examination of these conserved non-coding sequences in five additional mammalian species demonstrated that these elements are frequently conserved in all mammals. The longest conserved non-coding sequence, called CNS-1, is located between the IL4 and IL13 genes and showed a high degree of conservation across species [4]. Functional evaluation of CNS-1 in mutant mice revealed its role in the control of the global expression of IL4, IL5 and IL13.

Figure 1. VISTA plot [24] displaying evolutionarily conserved sequences identified by the comparison of ~48 kb of human 5q31 DNA encoding the IL4, IL13 and KIF3A genes with murine sequences (BAC clone AF276990). On the horizontal axis, conserved sequences are plotted in relation to their position in the human reference sequence; kb distances are shown under the horizontal bar. The height of the peaks on the vertical axis indicates the level of conservation in percent identity between the human reference sequence and the murine sequences. Conserved sequences (>100 bp and >70% identity) defined as coding exons (dark blue), untranslated exons (light blue) and non-coding (red) are shown. The exons in each of three genes are shown as rectangle boxes; only the 3' end (exons 9 through 16) of KIF3A is shown. Six conserved non-coding elements were examined in this study (CNE-A to CNE-F). The SNPs identified or genotyped in this study and their approximate locations are shown. CNE-B corresponds to CNS-1 and CNE-F corresponds to CNS-2 described by Loots et al. [4].

It is likely, therefore, that additional variation in this interval contributes to susceptibility to both asthma and atopic phenotypes. In the present study, we screened six noncoding elements on 5q31 that are evolutionarily conserved between the human and murine genomes and are thus possible regulatory elements. We studied 10 polymorphisms across this region, including two within and two flanking conserved non-coding elements, and evaluated their relationship to asthma and atopy in members of a large Hutterite pedigree and in well-defined African American and European American patient populations.

Sample composition
Conserved non-coding elements (Figure 1) were screened for SNPs in DNA from 10 African American and 10 European American unrelated controls, and from 10 individuals who are members of a founder population, the Hutterites. The 10 Hutterites were selected to represent distant branches of their pedigree but without regard to disease status.
Associations with asthma and atopy were evaluated in a large Hutterite pedigree [9] and in outbred individuals ascertained in Chicago. Six hundred thirty-eight Hutterites were evaluated for asthma and atopy, as previously described [9]; 71 had a diagnosis of asthma, 156 had bronchial hyperresponsiveness (BHR) to methacholine, and 311 were atopic. The Chicago samples included 205 African Americans and 126 European Americans with asthma and 388 control subjects with a negative personal and family history of asthma (183 African Americans and 205 European Americans). Subjects included in this study reported having had at least three grandparents who were of either African American or European ancestry. Given the allele frequencies observed in these samples (Table 4), we had 80% power to detect a relative risk of ≥ 1.7 in the African Americans and ≥ 2.2 in the European Americans [20].

Evaluation of phenotypes
The Hutterites were evaluated for asthma and atopy using previously described protocols [9]. Exposure to cigarette smoke among the Hutterites was rare. The 331 unrelated asthma cases were recruited in Chicago as part of the Collaborative Study on the Genetics of Asthma (CSGA) and met the same diagnostic criteria as those used for the Hutterites [21,22]. Subjects with a history of cigarette smoking (>3 pack-year equivalent) were excluded from these studies. Atopy was defined by skin prick test (SPT). No clinical testing was performed on the control subjects. These protocols were approved by The University of Chicago Institutional Review Board; written consent was obtained from all subjects.

Identification of conserved sequences
An ~40 kb interval on human 5q31 was compared to the syntenic region in the mouse using AVID alignment programs [23] and visualized as a VISTA plot [24]. Conserved non-coding sequences were defined as non-coding regions in which every contiguous 100 bp subsegment is ≥ 70% identical to its paired sequence. These regions differ slightly from those in the earlier study [4] because that study calculated conserved elements using PIPMaker, whereas here we used VISTA, which was developed after the Loots study.
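The window criterion above can be illustrated with a short Python sketch. It is only an approximation of the idea, not the VISTA implementation: it assumes the human and mouse sequences are already aligned to equal length with gaps written as "-", it counts gap columns as mismatches, and the function name `conserved_windows` is hypothetical.

```python
def conserved_windows(human, mouse, window=100, min_identity=0.70):
    """Return (start, end) alignment intervals (0-based, end exclusive) in which
    every contiguous `window`-column subsegment has identity >= min_identity.

    Assumption: `human` and `mouse` are rows of a pairwise alignment of equal
    length; gap columns ('-') are counted as mismatches.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    n = len(human)
    if n < window:
        return []

    # 1 for an identical, non-gap column, 0 otherwise
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(human.upper(), mouse.upper())]

    # Identity of each window, computed with a running sum
    running = sum(match[:window])
    passing = []
    for start in range(n - window + 1):
        if start > 0:
            running += match[start + window - 1] - match[start - 1]
        passing.append(running / window >= min_identity)

    # Merge runs of consecutive passing windows into maximal intervals
    intervals = []
    run_start = None
    for i, ok in enumerate(passing):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            intervals.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, len(passing) - 1 + window))
    return intervals


# Toy alignment: 150 identical columns followed by 100 mismatched columns
print(conserved_windows("A" * 150 + "C" * 100, "A" * 150 + "G" * 100))
```

Whether an interval found this way is coding or non-coding would still have to be decided from gene annotation, which this sketch does not attempt.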

Identification of polymorphisms
Amplified PCR products that included the conserved non-coding elements (Additional File, Table 1) were screened for polymorphisms by denaturing high performance liquid chromatography (DHPLC) [25], which detects nearly 100% of mutations in fragments of 600 bp or less [26-29]. PCR products with variant DHPLC patterns were sequenced; the complement of human BAC clone AC004039.1 was used as the reference sequence for identifying SNPs.

Genotyping
The genotyping methods used in this study are described in Additional File Table 2. In addition to four SNPs in or flanking conserved sequences, we genotyped six known SNPs in the IL4 and IL13 genes to evaluate LD patterns between these genes and the CNEs and to evaluate the relative magnitude of their effects. These SNPs were IL13_-1112C/T [15], IL13_+1923 [17], IL13_Arg130Gln (A/G) [16,17], IL4_-589C/T [12], IL4_+3017 [13], and IL4_+8374A/G (previously identified in our lab).

Statistical analysis
In the Hutterites, genotyping errors were detected using PEDCHECK [30] and deviations from Hardy-Weinberg equilibrium (HWE) were determined using an application modified to allow for related individuals [31]. To test for associations with SNPs and haplotypes, we used a case-control test developed for large pedigrees, as previously described [32]. Haplotypes comprised of 10 SNPs across the interval were constructed manually by the direct observation of alleles segregating in families. During haplotype construction, missing genotypes were filled in if they could be directly inferred from family data, but no inferences were made regarding the haplotype composition when there was more than one possible haplotype. Two-locus (pairwise) haplotypes were then generated from the larger 10-SNP haplotypes. We corrected for multiple comparisons using a Bonferroni correction for 4 SNPs and 6 pairwise haplotypes (see Results), and we considered significant P-values to be <0.0125 (0.05/4) and <0.00833 (0.05/6), respectively.
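The thresholds quoted above follow directly from dividing the nominal significance level by the number of tests. A minimal Python sketch of that arithmetic is given here; the function name `bonferroni_threshold` is hypothetical, and the observation that 6 equals the number of unordered pairs among 4 SNPs is an inference from the numbers rather than a statement from the analysis software.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


n_snps = 4
n_pairs = comb(n_snps, 2)  # unordered pairs of 4 SNPs

print(f"single SNPs ({n_snps} tests): P < {bonferroni_threshold(0.05, n_snps):.4f}")
print(f"pairwise haplotypes ({n_pairs} tests): P < {bonferroni_threshold(0.05, n_pairs):.5f}")
```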
Deviations from HWE and differences in allele and genotype frequencies between outbred cases and controls were examined using the program FINETTI [33]. Estimation of haplotype frequencies and testing for associations between cases and controls were conducted using the program FAMHAP [34]; 1,000 permutations were used to assess significance. If empiric P-values were <0.001, 10,000 permutations were performed. We used the Bonferroni correction for multiple comparisons (10 SNPs, P < 0.005; 45 pairwise comparisons, P < 0.0011). This is a conservative correction because these SNPs are not truly independent; some occur in the same gene and some are in LD. On the other hand, we did not correct for the number of phenotypes examined because these are also highly correlated. Within each ethnic group we compared the asthmatic and atopic cases to the non-asthmatic controls.
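The permutation scheme described above (1,000 permutations, extended to 10,000 when the empiric P-value falls below 0.001) can be sketched generically in Python. This is not FAMHAP's algorithm: the test statistic (absolute difference in minor-allele frequency between cases and controls), the (hits + 1)/(permutations + 1) estimator, and all function names are assumptions chosen to keep the example small.

```python
import random


def allele_freq_diff(genotypes, labels):
    """Absolute case-control difference in minor-allele frequency.

    genotypes: minor-allele counts (0, 1 or 2) per individual
    labels:    1 for cases, 0 for controls
    """
    case = [g for g, lab in zip(genotypes, labels) if lab == 1]
    ctrl = [g for g, lab in zip(genotypes, labels) if lab == 0]
    return abs(sum(case) / (2 * len(case)) - sum(ctrl) / (2 * len(ctrl)))


def permutation_p(genotypes, labels, n_perm=1000, seed=0):
    """Empiric P-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = allele_freq_diff(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if allele_freq_diff(genotypes, shuffled) >= observed:
            hits += 1
    # Assumed estimator: adding 1 avoids reporting an empiric P of exactly 0
    return (hits + 1) / (n_perm + 1)


def adaptive_permutation_p(genotypes, labels, seed=0):
    """Run 1,000 permutations; rerun with 10,000 if the empiric P is < 0.001."""
    p = permutation_p(genotypes, labels, n_perm=1000, seed=seed)
    if p < 0.001:
        p = permutation_p(genotypes, labels, n_perm=10000, seed=seed)
    return p


# Hypothetical toy data: 20 cases homozygous for the minor allele, 20 controls without it
toy_genotypes = [2] * 20 + [0] * 20
toy_labels = [1] * 20 + [0] * 20
print(adaptive_permutation_p(toy_genotypes, toy_labels))
```

For haplotype-based tests the statistic would be computed on estimated haplotype frequencies rather than single-SNP allele counts, but the label-shuffling logic is the same.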

Linkage disequilibrium
LD plots were generated in the Chicago samples using publicly available software [35].
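For reference, the pairwise r² statistic shown in such LD plots can be computed from two-locus haplotype counts as r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The Python sketch below assumes phased haplotype counts are available; with unphased genotypes the haplotype frequencies would first have to be estimated, which is not shown. The function name `ld_r2` is hypothetical and is not part of the software cited in [35].

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise r^2 between two biallelic loci from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab: counts of the four two-locus haplotypes,
    where A/a are the alleles at locus 1 and B/b the alleles at locus 2.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_AB - p_A * p_B  # coefficient of disequilibrium D
    return d * d / denom


# Hypothetical counts: only two haplotypes observed (perfect LD), then all four equally common
print(ld_r2(30, 0, 0, 70))
print(ld_r2(25, 25, 25, 25))
```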

SNP discovery
Six conserved non-coding sequences were identified in the interval between KIF3A and IL13 on human chromosome 5q31 (Figure 1). Of note is that none of the exons in either IL4 or IL13 are conserved between human and mouse, or between human and dog [36]. This is quite unusual (see for comparison, the pattern in KIF3A) and suggests possible divergence of function or accelerated rates of evolution of the human IL-4 and IL-13 proteins between humans and mice/dogs.
Eight SNPs, referred to as SNP1-SNP8, were identified within or flanking the six conserved elements (Table 1). One SNP (SNP2) in CNE-C was identical to a previously reported SNP, +33C/T [14], and one SNP (SNP8) was in CNE-F. Six additional SNPs were identified in the sequences flanking CNE-B (SNP1), CNE-D (SNP3, SNP4), and CNE-E (SNP5, SNP6, SNP7). No variation was detected in CNE-B, which corresponds to CNS-1 in the Loots study and was previously shown to coordinately regulate IL4, IL5 and IL13 [4,5,37]. CNE-F, which corresponds to CNS-2 in the Loots study, harbored one variant (SNP8). We note that SNP4 resides within a conserved element in the IL4 gene that was identified by Dubchak and colleagues using human-dog sequence comparisons [36]. Furthermore, other than one rare SNP in Chinese (rs17772853; minor allele frequency 0.01), there is no additional variation in these regions reported in dbSNP [38] or in two previous studies of this region [39,40], suggesting that we identified all common variation in these conserved elements.

Figure 2. Pairwise LD plots (r²) for cases (lower half) and controls (upper half) in a) African Americans and b) European Americans.

A description of the eight SNPs and their distribution among the 30 individuals in the screening sample is shown in Table 1. Three SNPs (SNP1, SNP3 and SNP5) were present only in the African American sample. The remaining five SNPs (SNP2, SNP4, SNP6, SNP7 and SNP8) were present in all three groups. SNP6 and SNP7 were the only variants that appeared to be in perfect LD in all three screening samples. Because so few SNPs were discovered in the conserved non-coding elements and because one of the SNPs (SNP7) fell within a conserved element defined using different criteria in another study [40], we genotyped SNP2, SNP4, SNP7 and SNP8, in addition to three known variants each in IL4 and IL13.

Patterns of linkage disequilibrium
Nine haplotypes, comprised of 10 SNPs, were present in the Hutterites (Table 2). Three groups of SNPs were in perfect LD in this founder population: +1923C/T and Arg130Gln in the IL13 gene; -589C/T, SNP2C/T, SNP4G/A, and +8374A/G in the IL4 gene; and +3017G/T in the IL4 gene with intergenic SNP7A/G flanking CNE-E and SNP8G/C in CNE-F. For the remaining analyses in the Hutterites, therefore, we used only one SNP from each of these three LD groups, selecting the one with the most complete genotype information (+1923C/T, SNP2C/T, and +3017G/T), and one SNP that was not in perfect LD with any other SNP (-1112C/T). Because there were few pairs of SNPs that showed perfect LD in the outbred samples and they differed between cases and controls, we analyzed all SNPs in the outbred samples.

SNP studies in the Hutterites
The minor allele frequencies of SNPs in the Hutterites were: IL13_-1112T, 0.208; IL13_+1923T, 0.173; SNP2-T, 0.156; and IL4_+3017T, 0.226. Genotype proportions were in HWE at all loci (P > 0.01). In the single SNP analyses, there were modest associations between IL13_-1112T and asthma (P = 0.025), BHR (P = 0.028), and allergic sensitization to cockroach (CR) allergens (P = 0.032); and between SNP2-T and allergic sensitization to molds (P = 0.034). None of these were significant after correcting for multiple comparisons. However, highly significant associations were present between variation in the IL13 gene and sensitization to mold allergens (IL13_-1112T, P = 0.00067; IL13_+1923T, P = 0.0074), which remained significant after correcting for multiple comparisons. Moreover, only SNPs in IL13 were associated with total serum IgE, with a highly significant association between high IgE and the IL13_+1923T allele (P = 0.00022) and a more modest association with the IL13_-1112T allele (P = 0.014). Adjusting for allergic sensitization to molds in the analysis reduced the significance of the IL13_+1923T allele, but did not eliminate the association (P = 0.0085).
To determine if susceptibility to asthma or atopic phenotypes is determined by combinations of SNPs across this interval or by specific haplotypes, we examined pairwise combinations of the four SNPs (Table 3). Highly significant associations (P < 0.001) with +SPT to mold allergens were observed with haplotypes comprised of either SNP in the IL13 gene (-1112C/T or +1923C/T) and SNP2 in the IL4 gene. Less significant associations were observed with these same pairwise combinations and allergic sensitization to cockroach allergen. However, in all of these analyses, the haplotypes carrying the common alleles at the IL13 SNPs (-1112C and +1923C) were underrepresented in the cases compared with controls, and the two haplotypes carrying the minor alleles at the IL13 SNPs (-1112T and +1923T) were overrepresented in the cases compared with controls, regardless of the allele at SNP2 (i.e., C or T). Therefore, the results of the haplotype analyses suggest that the IL13 SNPs are primarily associated with allergic sensitization to mold and cockroach allergens, and that variation in the IL4 gene is not contributing to this association in the Hutterites. None of the haplotypes were associated with asthma, BHR, or the other atopic phenotypes. Thus, in the Hutterites, variation in the IL13 gene is strongly associated with total serum IgE and allergic sensitization to mold allergens, and to a lesser extent to cockroach allergens, but not to any of the other phenotypes. None of the SNPs in or near conserved non-coding sequences contributed to susceptibility in the Hutterites.

Table 3. SNP haplotype analyses in the Hutterites. In this sample, IL13_+1923C/T is in perfect LD (r² = 1) with IL13_Arg130Gln (G→A); SNP2C/T is in perfect LD with IL4_-589C/T, SNP4G/A, and IL4_+8374A/G; IL4_+3017G/T is in perfect LD with SNP7A/G and SNP8C/G (Table 2). Only haplotypes and phenotypes with at least one P-value < 0.05 are shown. The number of cases in each analysis is shown in parentheses. P-values that were significant after adjusting for multiple comparisons are in bold font.

Studies in outbred case-control samples
Allele and genotype frequencies of the 10 SNPs in 337 subjects with asthma and 388 non-asthmatic controls are shown in Table 4 by ethnicity and phenotype. Genotypes were in HWE in the African American and European American control samples (P > 0.01). In the single SNP analyses, there was only a modest association between SNP2-T and allergic sensitization to mold allergens in European Americans (P = 0.04), which was not significant after adjusting for multiple comparisons.
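As a rough illustration of the HWE check reported above, the following Python sketch applies a Pearson chi-square test (1 degree of freedom) to genotype counts at one biallelic SNP. The analyses here were run with FINETTI, which may use a different (for example, exact) test; the function name `hwe_chi2` and the example counts are hypothetical.

```python
import math


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Pearson chi-square test for deviation from Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for genotype counts at one biallelic SNP.
    The P-value uses the chi-square survival function with 1 degree of
    freedom, which equals erfc(sqrt(chi2 / 2)).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical control genotype counts; P > 0.01 is the criterion used in the text
chi2, p_value = hwe_chi2(120, 70, 15)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}, in HWE at P > 0.01: {p_value > 0.01}")
```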
However, pairwise combinations of SNPs in the IL4 gene were significantly associated with asthma and allergic sensitization, primarily in the European American sample (Table 5). In that sample, nearly all of the haplotypes that were associated with asthma and the one most strongly associated with atopy included the IL4_-589T allele and other SNPs in the IL4 gene (SNP2-T, SNP4-A, IL4_+3017T, IL4_+8374G, SNP8-C). A haplotype comprised of the IL13_+1923T and IL13_130Gln alleles was also strongly associated with asthma in this sample. All but one of the seven associations remained significant after adjusting for multiple comparisons. In the African Americans, the frequencies of the IL13_-1112T/IL13_+1923T and IL13_-1112T/IL4_+3017T haplotypes were increased in cases with allergic sensitization to mold (P = 0.009 and 0.005, respectively), although this was not significant after correcting for multiple comparisons.
However, because some of the controls may have been SPT+ to mold allergens, this is a conservative test. Similar to the Hutterites, there were no associations with asthma or SPT to any allergen or with combinations of SNPs in the IL4 gene in the African Americans.

Discussion and conclusions
Cross-species comparisons are powerful tools for identifying potential functional elements in non-coding DNA [3,4,36,41-44]. However, it is unknown whether conserved non-coding elements in the human genome harbor variation that contributes to inter-individual differences in susceptibility to common diseases. To address this question, we surveyed variation in six conserved non-coding elements in the Th2 cytokine gene cluster on chromosome 5q31 to determine whether such variation, if it exists, is associated with susceptibility to asthma-related phenotypes.
Only one of these conserved non-coding elements, CNS-1 (CNE-B in our study), has been shown to have regulatory properties: the deletion of CNS-1 in transgenic mice results in the reduction of human IL-4, IL-5 and IL-13 producing cells [5,37]. Similar to our results, neither Noguchi et al. [39] nor Banerjee et al. [40] found sequence variation in CNS-1 in 48 individuals of Japanese origin [39] or in 17 individuals of African origin and 23 individuals of European origin [40]. These results combined with ours indicate that CNS-1 is highly conserved among humans and is under strong selective constraints, consistent with its role as a cis-acting regulatory element.
CNS-2 (CNE-F in our study) was also among the most conserved non-coding elements identified in a comparison of human 5q31 DNA with conserved syntenic mouse sequences, second only to CNS-1 [4]. We found one SNP in this element (SNP8), similar to the study of Banerjee et al. [40]. This variant was not associated with asthma or atopy in the Hutterites or in the outbred case-control samples, although we note that SNP8 in CNE-F (CNS-2) is in very strong LD with IL4_+3017, which was associated with IgE levels in a previous study in Caucasian subjects [13]. We did not find any variation in CNE-E, although one rare SNP (SNP5) and two common SNPs (SNP6 and SNP7) were identified just outside the boundaries of this element. SNP7 was in a conserved element defined by Banerjee et al. [40], but this SNP was also not associated with asthma or atopy in our study.
Only SNP2 (+33C/T) in CNE-C was associated with asthma and atopy, and only when considered in combination with other SNPs in the IL4 gene. This SNP was previously associated with IgE levels in Japanese (P < 0.05) [14]. However, our results indicate that either combinations of SNPs in and near the IL4 gene act synergistically to influence susceptibility, or other variation on a haplotype that includes the -589-T, SNP2-T, SNP4-G, +3017-T, +8374-G, and SNP8-C alleles influences susceptibility. In either case, the variation in IL4 that influences asthma and atopy resides in non-coding regions. Similarly, the -589-T and +3017-T alleles, which have been associated with asthma and/or atopy in other studies [12,13,45-50], do not by themselves or in combination with each other account for the associations observed in this study.
Lastly, we identified an association between variation in the IL13 gene and allergic sensitization to mold allergens in the Hutterites, which was also present, albeit to a lesser degree, in two outbred populations. Associations of other atopic phenotypes with two functional polymorphisms [15,51] in IL13 have been reported previously [15-17,52-55], but this is the first report of a specific association with +SPT to molds. Haplotypes comprised of SNPs in the IL13 gene were also associated with +SPT to mold allergens in the African American and European American samples, suggesting that either these SNPs interact to confer risk or additional variation in this gene also contributes. In addition, the +1923T and/or 130Gln alleles were also very strongly associated with total serum IgE (as a quantitative trait) in the Hutterites. The association with IgE was only partially accounted for by mold sensitization, indicating a role for this gene in IgE mediated immune responses, consistent with studies in other populations [16,17,54,55].
The fact that we identified associations between variation in the IL13 gene and atopy in all three populations (and with asthma in the European Americans), but between variation in the IL4 gene and asthma only in the European Americans, reflects the complexity of genetic susceptibility to asthma and atopy. It is notable that allele frequencies at SNPs across this interval differed considerably between the African American and European American samples (Table 4). For example, the minor allele in the European American sample was the more common (major) allele in the African American sample at five loci (IL13_+1923, IL4_-589, IL4_+3017, SNP7, CNE-F_SNP8). At nearly all other loci, the allele frequencies were more even (i.e., closer to 50%) in the African American than in the European American sample. Furthermore, although the overall pattern of LD was similar in the African American and European American control subjects (Figure 2), there was more LD between the -589C/T alleles and alleles at other IL4 SNPs in the European American cases compared with controls. The latter is the expected pattern at a disease locus [56], and is consistent with the highly significant associations that we observed between IL4 haplotypes and asthma in the European Americans. These differences in allele frequencies and LD patterns may have reduced our power to detect associations in the African American sample, particularly with respect to untyped SNPs that might be in LD with IL4_-589. Alternatively, the observation that no one SNP or combination of SNPs is penetrant in all populations may reflect the modifying effects of background genes or environmental exposures on risk [57,58]. This possibility is supported by a genome-wide linkage study of asthma in which different linkage signals were detected in Caucasian and African American families, despite the fact that both groups were evaluated using identical protocols and ascertained at the same centers [10,21].
These results highlight the challenges in elucidating the genetic architecture of complex diseases, which is likely to differ among individuals with different environmental exposures and different genetic backgrounds, some of which is captured by racial/ethnic ancestry.
In summary, these data indicate that the conserved noncoding elements on human chromosome 5q31 in the interval including the IL13 and IL4 genes do not contain variation that influences disease risk among individuals. SNP2 (+33C/T), in a conserved element (CNE-C) in the IL4 gene, may influence susceptibility in combination with other variation in IL4, or may merely be in LD with other variation in the gene that influences susceptibility to asthma and atopic phenotypes. Additional studies are required to differentiate between these alternatives, to fully characterize the functional variants in this region that influence disease risk, and to provide a model for understanding the role of non-coding variation on gene function and disease susceptibility.