X chromosome associations with chronic obstructive pulmonary disease and related phenotypes: an X chromosome-wide association study

Background The association between genetic variants on the X chromosome and risk of COPD has not been fully explored. We hypothesize that the X chromosome harbors variants important in determining risk of COPD-related phenotypes and that these variants may drive sex differences in COPD manifestations. Methods Using X chromosome data from three COPD-enriched cohorts of adult smokers, we performed X chromosome-specific quality control, imputation, and testing for association with COPD case–control status, lung function, and quantitative emphysema. Analyses were performed among all subjects, then stratified by sex, and subsequently combined in meta-analyses. Results Among 10,193 subjects of non-Hispanic white or European ancestry, a variant near TMSB4X, rs5979771, reached genome-wide significance for association with lung function measured by FEV1/FVC (β = 0.020, SE = 0.004, p = 4.97 × 10⁻⁸), with suggestive evidence of association with FEV1 (β = 0.092, SE = 0.018, p = 3.40 × 10⁻⁷). Sex-stratified analyses revealed X chromosome variants with differential trends by sex, including significantly different effect sizes or directions. Conclusions This investigation identified loci influencing lung function, COPD, and emphysema in a comprehensive genetic association meta-analysis of X chromosome genetic markers from multiple COPD-related datasets. 
Sex differences play an important role in the pathobiology of complex lung disease, including X chromosome variants that demonstrate differential effects by sex and variants that may be relevant through escape from X chromosome inactivation. Comprehensive interrogation of the X chromosome to better understand genetic control of COPD and lung function is important to further understanding of disease pathology. Trial registration Genetic Epidemiology of COPD Study (COPDGene) is registered at ClinicalTrials.gov, NCT00608764 (Active since January 28, 2008). Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints Study (ECLIPSE), GlaxoSmithKline study code SCO104960, is registered at ClinicalTrials.gov, NCT00292552 (Active since February 16, 2006). Genetics of COPD in Norway Study (GenKOLS) holds GlaxoSmithKline study code RES11080, Genetics of Chronic Obstructive Lung Disease. Supplementary Information The online version contains supplementary material available at 10.1186/s12931-023-02337-1.


Annotation
Annotation of variants was performed by assigning each variant to the closest gene by distance, using the National Center for Biotechnology Information databases dbSNP and Gene, the UCSC Genome Browser, Ensembl, and LDlink. Distance-based annotation does not imply function, and other genes in the region should be considered for functional investigations.
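Distance-based nearest-gene assignment, as described above, can be sketched as follows. This is a minimal illustration assuming genes are given as simple (name, start, end) intervals on one chromosome; the gene names and coordinates are hypothetical, and the sketch ignores strand, so it cannot label a gene as upstream or downstream of a variant.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """A gene as a closed interval on one chromosome (hypothetical toy model)."""
    name: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Base-pair distance from a variant position to a gene; 0 if inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def nearest_gene(pos: int, genes: list[Gene]) -> tuple[str, int]:
    """Return (name, distance) of the closest gene; ties go to the first listed."""
    best = min(genes, key=lambda g: distance_to_gene(pos, g))
    return best.name, distance_to_gene(pos, best)


# Hypothetical genes and variant position, for illustration only.
genes = [Gene("GENE_A", 1_000, 5_000), Gene("GENE_B", 9_000, 12_000)]
print(nearest_gene(7_500, genes))  # prints ('GENE_B', 1500)
```

Such a label is purely positional; as noted above, the nearest gene is not necessarily the functionally relevant one.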

Power to detect associations
The genotype relative risk for rare events, such as a rare single nucleotide polymorphism (SNP), is approximately equal to the odds ratio (OR). In this study, assuming a COPD prevalence of 10%, a genome-wide significance level of p < 5 × 10⁻⁸, and a disease allele frequency threshold equal to an effect allele frequency (EAF) of < 1%, the current meta-analysis with 5382 cases and 3501 controls (case-to-control ratio 1.537) has 90% power to detect a variant with an OR of 2.095, 10% power to detect an OR of 1.639, and < 0.1% power to detect an OR in the range of 1.000–1.320 (12, 13).
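The kind of power calculation summarized above can be sketched with a normal approximation to a 1-degree-of-freedom allelic case–control test. This is a minimal sketch under stated assumptions, not the calculator used in the cited references (12, 13): it assumes Hardy-Weinberg genotype frequencies, a multiplicative genotype relative risk (GRR) used as a stand-in for the OR, and two alleles per person as on an autosome. It therefore ignores male hemizygosity on the X chromosome and will not reproduce the exact percentages quoted. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_control_allele_freqs(eaf: float, grr: float, prevalence: float) -> tuple[float, float]:
    """Effect-allele frequency in cases and in controls.

    Assumes Hardy-Weinberg genotype frequencies and a multiplicative model in
    which carrying 0, 1 or 2 effect alleles multiplies baseline risk by
    1, grr, grr**2 (autosomal-style; X chromosome hemizygosity is ignored).
    """
    p, q = eaf, 1.0 - eaf
    geno = [q * q, 2 * p * q, p * p]              # P(0, 1, 2 effect alleles)
    rel = [1.0, grr, grr * grr]                   # relative penetrances
    f0 = prevalence / sum(g * r for g, r in zip(geno, rel))  # baseline risk
    pen = [f0 * r for r in rel]                   # absolute penetrances
    if max(pen) >= 1.0:
        raise ValueError("penetrance >= 1; model inconsistent with prevalence")
    # Genotype distribution conditional on case or control status (Bayes' rule).
    cases = [g * f / prevalence for g, f in zip(geno, pen)]
    ctrls = [g * (1.0 - f) / (1.0 - prevalence) for g, f in zip(geno, pen)]
    p_case = (cases[1] + 2 * cases[2]) / 2
    p_ctrl = (ctrls[1] + 2 * ctrls[2]) / 2
    return p_case, p_ctrl


def allelic_test_power(eaf, grr, prevalence, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p1, p0 = case_control_allele_freqs(eaf, grr, prevalence)
    a1, a0 = 2 * n_cases, 2 * n_controls               # allele counts per group
    pbar = (a1 * p1 + a0 * p0) / (a1 + a0)             # pooled allele frequency
    se0 = sqrt(pbar * (1 - pbar) * (1 / a1 + 1 / a0))  # SE under the null
    delta = abs(p1 - p0) / se0                         # expected |z| under H1
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)


# Settings quoted in the text: EAF 1%, prevalence 10%, 5382 cases, 3501 controls.
for grr in (1.32, 1.639, 2.095):
    print(grr, round(allelic_test_power(0.01, grr, 0.10, 5382, 3501), 3))
```

Under these simplifying assumptions power rises steeply with the OR, the same qualitative pattern as the figures above, but the numbers differ because the model omits X chromosome-specific features.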
Table 2 presents the top suggestive COPD associations from the meta-analysis, including top variants found in at least one population stratum. The EAF range is 0.01–0.39, the OR range is 0.65–3.83, and the p-value range is 2.65 × 10⁻⁶ to 0.897. For the top COPD variant among all subjects, rs138704174 (EAF 3%), there is 85.4% power to detect this association with an OR of 1.58 at p < 5 × 10⁻⁸. For the lowest-EAF COPD variant among all subjects, rs150086151 (EAF 1%), there is 3.5% power to detect this association with an OR of 1.55 at p < 5 × 10⁻⁸. For a theoretical variant in this COPD meta-analysis with EAF 1%, there is < 1% power to detect an association with an OR < 1.316 at p < 5 × 10⁻⁸.
To improve power at the same COPD prevalence (10%) and significance level (p < 5 × 10⁻⁸), an idealized study would have a case-to-control ratio of 1.00 and would require 100,000 cases and 100,000 controls to have 90% power to detect a variant with an EAF of 1% and an OR of 1.203.
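The sample-size figure above can be related to a standard normal-approximation formula for a balanced allelic test. This is a sketch under textbook autosomal assumptions (two alleles per person, Hardy-Weinberg equilibrium), not necessarily how the cited calculation was done, and it ignores male hemizygosity on the X chromosome. Here $p_1$ and $p_0$ are the effect-allele frequencies in cases and controls implied by the EAF, OR and prevalence, and $1-\gamma$ is the target power; these symbols are introduced for illustration.

$$
N_{\text{per group}} \approx \frac{\bar{p}\,(1-\bar{p})\,\left(z_{1-\alpha/2} + z_{1-\gamma}\right)^{2}}{(p_1 - p_0)^{2}},
\qquad \bar{p} = \frac{p_1 + p_0}{2}
$$

With $\alpha = 5 \times 10^{-8}$ and 90% power, $z_{1-\alpha/2} \approx 5.45$ and $z_{0.90} \approx 1.28$, so $(z_{1-\alpha/2} + z_{1-\gamma})^2 \approx 45.3$. For rare variants $p_1 - p_0$ shrinks roughly in proportion to EAF × (OR − 1), so the required N grows steeply as the OR approaches 1; at EAF 1% and OR 1.203 this lands on the order of $10^5$ per group.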

Annotation of rs142755000
The sex-stratified analysis by Zhao et al. found that rs142755000 reached genome-wide significance for FEV1. Zhao et al. reported the same direction of effect in males and females but a notably larger effect in males, which was not seen in our current study. We annotated rs142755000 to HMGN5, and a nearby top suggestive variant we identified in HMGN5, rs185387095, did show significant sex differences in this XWAS for FEV1, with a larger effect in males (β males −0.039, β females −0.029, sex-difference p = 3.30 × 10⁻²). The closest gene to rs142755000 is BRWD3, 149 kb upstream. In this study we annotated rs142755000 to HMGN5, which is 155 kb downstream (Table 2). This annotation was made because another top suggestive association in this study, rs185387095, lies in HMGN5, and rs142755000 is in linkage disequilibrium (LD) with it at R² = 0.52 (11). Additionally, other variants in LD lie in and near rs142755000 within the same recombination hotspot, which includes HMGN5 (supplement Figure 3).
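A common way to test whether a per-allele effect differs between males and females is a z-test on the difference between the sex-specific estimates, treating the male and female samples as independent. The text does not state which test produced the sex-difference p-value, so this is a minimal sketch of one standard approach, not necessarily the authors' method. The function name and the standard errors in the example are hypothetical, since only the effect sizes are reported above.

```python
from math import sqrt
from statistics import NormalDist


def sex_difference_test(beta_m: float, se_m: float,
                        beta_f: float, se_f: float) -> tuple[float, float]:
    """Two-sided z-test of beta_m == beta_f from independent male/female estimates."""
    z = (beta_m - beta_f) / sqrt(se_m ** 2 + se_f ** 2)
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


# Effect sizes from the text; the standard errors are hypothetical placeholders.
z, p = sex_difference_test(beta_m=-0.039, se_m=0.004, beta_f=-0.029, se_f=0.004)
print(f"z = {z:.2f}, p = {p:.3g}")
```

The resulting p-value depends entirely on the placeholder standard errors and is not a reproduction of the reported sex-difference p-value.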

Suggestive associations
In COPD there was a suggestive association in SH3KBP1, a gene that escapes X chromosome inactivation (XCI). SH3KBP1 encodes a protein that facilitates protein-protein interactions and has been implicated in cellular processes including cell adhesion, cytoskeletal rearrangement, apoptosis, and endocytosis; it has also been shown to play a role in maturation of alveolar epithelial cells and surfactant production in mice (7,14–16). SH3KBP1 is expressed in lung tissue and exhibits sex-biased gene expression in whole blood (17,18).
In lung function XWAS, variants in Xp11.21 were implicated, including the dense region in FOXR2 and near RRAGB and PAGE5. FOXR2 is a member of the FOX superfamily of genes and is known to interact with β-catenin and to play a role in epithelial-mesenchymal transition of cancer cells and in non-small cell lung cancer (NSCLC). FOXR2 expression inactivates the Wnt/β-catenin pathway (19,20). RRAGB, a gene expressed in lung tissue, is part of a large family of signal transducers, and high expression of RRAGB has been found to predict good survival in NSCLC (7,18,21). PAGE5 is part of a family of proteins expressed in some fetal tissues as well as in a variety of tumors, and it encodes a protein that may protect cells from programmed cell death (22).
Among the top suggestive associations in females were a number of genes that are expressed in lung tissue and have interesting implications for disease pathobiology (18). LINC0259 is a long noncoding RNA that mediates TGF-β signaling and has been implicated in lung cancer cell migration and invasion (18,23). TMEM47 encodes a highly conserved member of the PMP22/EMP/claudin family that is important in cell morphology and is involved in the localization of tight junction proteins and in actomyosin structure (7,24,25). ITM2A has been implicated in ankylosing spondylitis, where it is differentially expressed by CD4+ T cells and is involved in T cell activation (26,27). TAB3 encodes a protein that functions in the NF-κB signaling pathway, which mediates responses to the pro-inflammatory cytokines TNF and IL-1. TAB3 has been reported to be involved in signaling events in the pathogenesis or progression of idiopathic pulmonary fibrosis, where it is thought to alter immune response, tissue repair, and fibrosis (28).
DMD, a top suggestive association for COPD in males as well as for FEV1/FVC in females, encodes dystrophin and is the largest gene identified in humans; it has been associated with an extensive number of traits (7,15,16,29). DMD is known to escape X chromosome inactivation and to show female-biased gene expression in lung tissue as well as in numerous other tissues (15,17,30). Dystrophin is present in cardiac and skeletal muscle, and mutations in DMD lead to Duchenne's and Becker's muscular dystrophies, X-linked recessive disorders that manifest only in males and result from decreased dystrophin production (31). Patients with Duchenne's muscular dystrophy have early mortality related to respiratory muscle weakness, as decreased muscle strength leads to hypoventilation.

[Figure: regional plots with recombination rate (cM/Mb) axis]

[Figure: includes 18 female-biased edges (red) and 5 male-biased edges (blue). Yellow nodes represent target genes on the X chromosome and purple nodes genes on the autosomes.]