Differential transcriptomics in sarcoidosis lung and lymph node granulomas with comparisons to pathogen-specific granulomas

Rationale Despite the availability of multi-“omics” strategies, insights into the etiology and pathogenesis of sarcoidosis have been elusive. This is partly due to the lack of reliable preclinical models and a paucity of validated biomarkers. As granulomas are a key feature of sarcoidosis, we speculate that direct genomic interrogation of sarcoid tissues, may lead to identification of dysregulated gene pathways or biomarker signatures. Objective To facilitate the development sarcoidosis genomic biomarkers by gene expression profiling of sarcoidosis granulomas in lung and lymph node tissues (most commonly affected organs) and comparison to infectious granulomas (coccidiodomycosis and tuberculosis). Methods Transcriptomic profiles of immune-related gene from micro-dissected sarcoidosis granulomas within lung and mediastinal lymph node tissues and compared to infectious granulomas from paraffin-embedded blocks. Differentially-expressed genes (DEGs) were profiled, compared among the three granulomatous diseases and analyzed for functional enrichment pathways. Results Despite histologic similarities, DEGs and pathway enrichment markedly differed in sarcoidosis granulomas from lymph nodes and lung. Lymph nodes showed a clear immunological response, whereas a structural regenerative response was observed in lung. Sarcoidosis granuloma gene expression data corroborated previously reported genomic biomarkers (STAB1, HBEGF, and NOTCH4), excluded others and identified new genomic markers present in lung and lymph nodes, ADAMTS1, NPR1 and CXCL2. Comparisons between sarcoidosis and pathogen granulomas identified pathway divergences and commonalities at gene expression level. Conclusion These findings suggest the importance of tissue and disease-specificity evaluation when exploring sarcoidosis genomic markers. This relevant translational information in sarcoidosis and other two histopathological similar infections provides meaningful specific genomic-derived biomarkers for sarcoidosis diagnosis and prognosis.


Introduction
Sarcoidosis is a multisystemic disease of unknown etiology characterized by the formation of granulomatous lesions, especially in lung tissues and thoracic lymph nodes [1,2]. The clinical heterogeneity and unpredictable disease course in sarcoidosis represents a challenge in early diagnosis and prediction of progression. Despite significant progress in the understanding of genetic predisposition and role of immunity in sarcoidosis pathogenesis, predicting the clinical course of sarcoidosis in a given patient remains challenging. The majority of sarcoidosis patients experience spontaneous recovery, however, fully a third of subjects develop complicated sarcoidosis defined as multiorgan involvement, progressive fibrotic lung involvement and pulmonary function deterioration [3,4]. Validated sarcoidosis biomarkers are desperately needed to distinguish patients who will spontaneously recover from patients who will worsen and develop severe manifestations of the disease. Thus, personalized medicine approaches are tasked with the capacity to predict the patient phenotype likely to develop progressive disease or that may eventually require lung transplantation. While there has been progress in identifying genetic variants that contribute to complicated sarcoidosis [5,6], and we have previously utilized genomic expression profiling of peripheral mononuclear cells (PBMCs) to subphenotype patients with a variety of inflammatory disorders [7][8][9] including sarcoidosis [10], there remains an important and unmet need for molecular and specific genome-based biomarkers for diagnosis and prediction of disease severity in sarcoidosis.
The etiology of sarcoidosis is unknown, and the diagnosis remains one of exclusion [4]. Specific infections, including a variety of Mycobacterium species and Propionibacterium acnes [11][12][13][14][15][16][17][18][19][20], and environmental exposures [21] produce granulomatous inflammatory lesions that mimic sarcoidosis and have been posited as potential etiologic agents. Exposure to mycobacterium tuberculosis (TB) and endemic fungi such coccidioidomycosis (CM), also known as Valley Fever, are major causes of granulomatous lung disease [15] and must be clinically excluded when sarcoidosis is suspected. Coccidioidomycosis caused by Coccidioides immitis and C. posadasii, a soil-dwelling fungi disease, are endemic in the arid southwestern USA and northern Mexico.
Granulomas are conglomerates of epithelioid and multinucleated giant cells surrounded by CD4+ and CD8+ T lymphocytes that result from the complex immunopathogenesis between host genetic factors, environmental or infectious triggers [11]. Sarcoidosis arraybased transcriptomic studies in blood, lung and lymph node, lacrimal gland, orbital tissue and skin identified tissue-specific differential gene expression [22][23][24][25], highlighting the importance of assessing different compartments in a multisystemic disease such as sarcoidosis. The majority of these transcriptome studies were focused on identifying gene signatures that differentiate controls from progressive forms of sarcoidosis. Additional studies in peripheral blood compared the gene expression of sarcoidosis and TB, identifying a significant overlap in gene expression associated with Type I and II IFN pathways [24]. Approaches to quantify the expression of sarcoidosis granuloma-related genes using RT-PCR focused solely on cytokine-related genes associated with NFKB and STAT transcription factors [26].
As expression profiling of sarcoidosis granuloma using next generation sequencing expression has not been reported, one goal of the present study was to compare gene expression profiles in two sarcoidosis granulomatous tissues: lung and lymph node. The second goal of this study was to assess the ability of granuloma gene expression profiles to discriminate sarcoidosis from TB and CM. Our bioinformatic analyses included determination of unique pathways associated with sarcoidosis and overlapping pathways with CM and TB. Our results indicate that despite histologic similarities, sarcoidosis lung and lymph node granulomas exhibit distinct expression profiles. In addition, sarcoidosis pathway expression was significantly divergent compared to TB and CM. These findings suggest the importance of tissue-specific pathobiology considerations to be employed when exploring potential sarcoidosis genomic markers for potential diagnostic use and discriminate from other granulomatous diseases.

Sample selection and design
We used microdissected granulomatous tissue from lung [6] and mediastinal lymph node [12] from 18 subjects with sarcoidosis, three with CM, four with TB and control tissue from lungs [3] and lymph nodes [3] from 6 healthy individuals. Patient and sample characteristics are described in Table 1. Tissue specimens were collected from clinically-indicated biopsies for diagnostic purposes. Archived, de-identified specimens were acquired from the Tissue Acquisition Shared Resource at the University of Arizona. The study was approved by the human subjects protection program IRB # 1509097312A001.

Gene expression profiling of selected granulomas
Formalin-fixed, paraffin-embedded (FFPE) microdissections from lung and lymph node granulomatous tissue and healthy tissue (controls) were assayed using Next Generation Sequencing-based gene expression by HTG EdgeSeq Oncology-biomarker (HTG Molecular Diagnostics, Inc.). This system uses targeted capture sequencing to quantitate RNA expression levels of gene targets in FFPE tissues. The panel included 2535 probes and 15 housekeeper genes for quantitative analysis of targeted mRNAs. A list of the complete biomarker panel genes and the biological pathways assayed are in Additional file 1: Table S1. This panel covers multiple immuneoncologic-related pathways previously published and identified as relevant in sarcoidosis including such as interferon, MAP kinase, NFКB, and JAK/STAT pathways [18][19][20]. Representative tissue sections from granulomatous tissue were stained with hematoxylin-and eosin (H&E). Sites for granuloma microdissection were selected by an expert pathologist (Additional file 2). Sections of ~ 1.5 mm 2 thickness of a 5 μm FFPE tissue were obtained, 5-10 sections per subject were lysed followed by RNA extraction-free chemistry method using the HTG ® 's assay kit. Library preparation for the HTG Edge processor for nuclease protection steps was used, followed by PCR tagging, library amplification, quantitation and normalization. Library was then sequenced on an Illumina MiSeq using 150-cycle V3 kits [27].

Analysis of biomarker transcriptomes and canonical pathways
We generated the transcriptome index using the salmon package [28] and a fasta file containing the sequence of the HTG EdgeSeq probes. Quantification was performed for each sample using the salmon quant command [28], the index file and the corresponding fastq files. The raw counts were loaded into R using the tximport package [29]. The Differential Expression (DE) was calculated using Limma [30] and EdgeR [31] packages from the R v3.5.3 and Bioconductor v3.8 packages [32]. A list with statistically-significant genes for each comparison was submitted for pathway enrichment unbiased comparison of the DE gene sets to the human ConsensusPathDB [33] website using the over-representation gene set analysis against the pathway databases and the gene ontology categories using a minimum overlap with input list of 2 and a p-value cutoff of 0.01.

Transcriptome dataset validation
We validated our transcriptome results in three independent cohorts of microarray datasets publicly available in the Gene Expression Omnibus (GEO), GSE16538 and GSE63548 [34] and an unpublished microarray dataset, details in Additional file 3: Table S3. At the time this manuscript was written, no GEO data sets were available for humans infected with CM. The microarray datasets for TB and sarcoidosis were analyzed using R (v3.5.3) and Bioconductor (v3.8) [32]. Oligo package (v1.46.0) was used for the processing of the data in combination with specific platform design library and the corresponding annotation library. For the Affymetrix array analyses, Oligo's implementation of the Robust Multichip Average algorithm (RMA) for background correction, quantile normalization and summarization was used. For the Illumina arrays, the Illuminaio package [35] was used in combination with the corresponding bead Chip. The differential expression analysis was performed using the Limma package [36]. After normalization gene expression was averaged and each tissue was compared against heathy tissue from the same origin. The inverse log2 (FC) generated by Limma was used to determine differences between comparisons, with the direction of change indicated by the sign. Transcripts with a fold change (FC) > 2 and a p value < 0.01 were deemed differentially expressed.

Patient characteristics
Microdissected lung and lymph node granulomas from 18 sarcoidosis subjects, four TB subjects, three CM subjects and six healthy controls were assessed. There were no significant differences in age (p < 0.1) in the three granulomatous disease groups ( Table 1). The majority of sarcoidosis subjects were female and non-Hispanic whites while the demographic most frequent reported in the CM and TB groups was Hispanic. Clinical data related to organ involvement and radiographic Scadding stage was available for 70% of sarcoidosis subjects as detailed in Additional file 1: Table S1.
After excluding probes with multiple hits or partial hits, our gene expression data in granulomatous tissues included 2430 probes that uniquely mapped to single genes. We averaged gene expression levels after normalization and compared diseased groups versus healthy tissue controls from the same tissue origin, the fold change (FC) was obtained as the ratio between the averages. We identified 250 significantly dysregulated genes between all granulomatous-diseased tissues and healthy control tissues. Figure 1 represents the differentially expressed genes (DEGs) volcano plots, in each category; granulomas from CM exhibited the fewest number of differentiated transcripts, 34 in total (Fig. 1a), sarcoidosis granulomas from the lung (Fig. 1b) had more DEGs than those granulomas from lymph nodes (Fig. 1c), 88 vs 60 dysregulated transcripts respectively. Notably, in both sarcoidosis-affected tissues, the number of downregulated genes (FC < − 2) outpaced the number of upregulated genes (FC > 2), with downregulation more marked in lung granulomas (93% of the transcripts) than in lymph node granulomas (56% of the transcripts). Tuberculosis granulomas ( Fig. 1d), showed the highest number of the differentiated transcripts, 140 in total. The heat maps comparing the gene expression in each group against healthy tissues are presented in Fig. 2. In addition, we screened the total of 250 DEGs among those four sets performing a Venn diagram analysis [37]. Figure 2e shows the overlapping and unique transcripts in the different granulomas. Sarcoidosis granulomas exhibited 138 dysregulated transcripts in both lung (Fig. 2a) and lymph node (Fig. 2b), 87 transcripts were exclusive present in sarcoidosis, notoriously, 30% of the DEGs in lung granulomas were unique to sarcoidosis, and the overlapping transcripts dysregulated in both sarcoidosis tissues was only 4%. TB had 89 DEG that were exclusively dysregulated in TB granulomas. When compared TB to sarcoidosis 17% of the transcripts were dysregulated in both diseases, while only 4% of the transcripts were common between CM and lung sarcoidosis.

Comparisons of expression profiles in sarcoidosis granulomas
We identified that sarcoidosis granulomas from lung and lymph nodes share only ten DEGs. NR1H3 and CXCL9 were upregulated whereas the remainder (ADAMTS1, CXCL2, HSPB6, ITGA9, NPR1, NR4A1, CCL14, FABP4) showed downregulation in both lung and lymph nodes. Interrogation of the expression levels of these 10 sarcoidosis genes to TB and CM DEGs revealed CCL14, CXCL9, FABP4, NR1H3 were also significantly dysregulated in TB. Only six genes (ADAMTS1, HSPB6, NR4A1, CXCL2, NPR1, ITGA9) were exclusively dysregulated in both lung and lymph node in sarcoidosis granulomas but not in CM or TB (Fig. 3). These results are very consistent with the notion that granuloma gene expression is highly tissue-and disease-specific.

Comparison of DEGs in sarcoidosis vs CM and TB granulomas
Comparison of TB lymph node granulomas expression profiles to lymph node healthy controls identified 140 DEGs with a surprising number of DEGs common to sarcoidosis, 43 transcripts, 39 present in lymph granulomas. Unlike sarcoidosis where DEGs were predominantly downregulated, TB DEGs showed a balanced proportion of expression, 64 down-regulated and 76 upregulated. Notably, we observed the same direction in the transcript regulation, in all the 43-shared transcripts, except for one, OLR1, a low density lipoprotein receptor involved in Fas-induced apoptosis, this gene was upregulated in TB and down regulated in sarcoidosis, over-expression of OLR1 result in upregulation of NF-κB and its target prooncogenes [38]. Only three DEGs, SLAMF7, DC27, and CXCL13, were dysregulated in all three diseases (sarcoidosis, TB, CM). Expression of CXCL13, a chemokine and B-lymphocyte chemoattractant associated to calcium influx was downregulated in both sarcoidosis (lymph node) and TB granulomas but upregulated CM granulomas. A list of the top dysregulated genes by granuloma tissue origin is available in Additional file 5: Table S4.

Functional biologic pathway enrichment analysis
We first identified the pathways that were common to all three granulomatous diseases using the gene list with all the DEG (250), cytokine-cytokine receptor interaction, chemokine signaling, VEGFA-VEGFR2 and focal adhesion accounted for the most significant pathways with the most number of proportionally dysregulated genes involved, we then mapped the pathways associated to the genes that were exclusively present in sarcoidosis. Sarcoidosis pathways showed similar altered pathways to the rest of the granulomatous diseases with the exception of nuclear receptors meta-pathway (Table 2). Then we compared the differences in pathways between lung and lymph nodes in sarcoidosis, using the gene set of DEG for each tissue, differences were delineated by a clear immunological response, involving leucocyte migration and neutrophil chemotaxis in the lymph nodes while a structural regenerative response, characterized by cell migration and angiogenesis, was observed at the lung level ( Fig. 4a). At the disease level, we identified pathways that were disease specific for each gene set as seen in Fig. 4b.

Independent validation of disease-specific granuloma transcriptome results
We further validated our results using independent cohorts of microarray expression in lymph node and lung tissues from sarcoidosis and tuberculosis (Additional file 5: Fig. S1). Validation analysis in these microarrays data sets confirmed the dysregulation of 90 genes (FC > 2 FDR < 0.012) of the DEG identified in our panel.

Discussion
Sarcoidosis is a heterogeneous disease lacking validated clinical markers that allow differentiation from other granulomatous diseases or that predict the severity or level of organ involvement. In this study, we unveil the transcriptome profiles of cells comprising granulomas from sarcoidosis, CM, and TB. Using the HTG EdgeSeq oncology biomarker panel system to sequence FFPE biopsied tissues collected at the time of the diagnosis, we avoid limitations such as low RNA yields associated to FFPE extractions. We generated information to further define the molecular mechanisms that occur at the compartment level in sarcoidosis and compared to diseases that share a common histopathological hallmark. Sarcoidosis granulomas showed differences in gene pathway expression between compartments potentially associated to disease progression. Lymph node granulomas exhibited clear immunological responses, involving

Table 2 Pathway analysis
Input list were mapped in ConsensusPathDB. The total of genes identified in all granulomatous tissue in the three granulomatous diseases (250 DEG); then we mapped 87 DEG that were identified as exclusively is Sarcoidosis. We included pathways as defined by different pathway databases with a minimum overlap of 20 genes for the total number of dysregulated genes and a minimum overlap of 8 genes for the sarcoidosis gene set with a p value cut off 0.01

Pathway name Gene set size Genes contained p-value q-value Source
Granulomatous tissue A structural regenerative response was observed in lung granulomas, notoriously marked by cell migration and angiogenesis, that appeared to correspond to lung remodeling stages observed in complicated and chronic sarcoidosis. Only ten DEGs were common to both sarcoidosis lung granulomas and lymph nodes granulomas. ADAMTS1, CXCL2, HSPB6, ITGA9, NPR1, NR4A1, CCL14, CXCL9, FABP4 and NR1H3. Interestingly NR1H3, a regulator of macrophage function and inflammation, and CXCL9 a chemokine ligand involved in T-cell trafficking, were upregulated in both granulomas independently of the tissue origin (FC > 2.6). However, CXCL9 was also upregulated in the TB granulomas. Downregulated genes included FABP4, a fatty acid-binding protein associated with activation of proinflammatory macrophages in breast cancer, obesity and leukemia progression [39,40]. FABP4, was previously proposed as sarcoidosis candidate gene [22,41]. We identified that it was also significantly downregulated in TB. CCL14, a chemokine ligand associated with changes in intracellular calcium concentration in monocytes. Hypercalcemia associated to granulomatous diseases is commonly reported [42][43][44], the potential pathogenic role of CXCL13 and CXCL14, both DEG in sarcoidosis and TB needs further investigation. Despite similarities at the histological level, our analysis revealed sarcoidosis granulomas to be significantly divergent at the genomic level when compared to CM granulomas. TB and sarcoidosis granuloma profiles share stronger similarity at the transcriptional and pathway level. Interestingly, the number of DE genes in each comparison using the same thresholds varied significantly, indicating that the number of dysregulated genes does not correlate with the number of samples per comparison, or the size of the microdissection, suggesting that the granulomas are qualitatively distinct depending on the disease and tissue of origin. We identified only three shared dysregulated transcripts in CM, TB and sarcoidosis (lymph nodes); CXCL13, CD27 and SLAMF7, a finding that points to diverse pathogenic mechanisms in granulomatous diseases that result in a similar histopathological conglomeration of CD4+ T cells surrounded by CD8+ T cells, fibroblasts and B cells. CXCL13 is a chemokine with key role in B cell migration by regulation of Ca++ influx [45], while SLAMF7 is a regulator of T lymphocyte development and function such as lytic activity and a modulator of B cell activation and memory [46]. Furthermore, CD27 regulates B-cell activation and immunoglobulin synthesis through the activation of NF-kappaB and MAPK8/JNK. Our microarray validation data corroborated SLAMF7 DE but only in sarcoidosis lymph node granulomas.
The activation of Th1 cells capable of producing interferon (IFN)-γ is important in the immunological pathogenesis of the granuloma formation [47]. Interferon-inducible neutrophil-driven blood transcriptional signature was previously associated to TB and one of the two sarcoidosis profiles (weaker vs strong IFN-inducible profile), with a higher abundance and expression in tuberculosis [48]. Our panel included probes for IFN  related genes such as IFNGR1, IRF1, IFI27, IFIT2, IFNA2,  IFNAR1, IFNB1, IFNG and IFN regulatory related genes;  IRF2, IRF3, IRF4, IRF5, IRF7 and IRF8. Only IFI27 was significantly upregulated in TB (p 0.0000678 FC > 6) but didn't achieve significance in Sarcoidosis. IFI27 is involved in type-I interferon-induced apoptosis characterized by a rapid release of cytochrome C, related pathways include innate immune system and interferon gamma signaling. The IFN immune related pathways that contribute to sarcoidosis etiopathogenesis are reported to be exacerbated by immunotherapy with IFN alpha [49][50][51]. Further investigation is needed to explain the pathogenic effect of the observed differences at the gene expression level.
This study also served to demonstrate the lack of specificity of prior suggested potential sarcoidosis genomic markers due to common dysregulation in fungal or mycobacterium granulomas. For example, CXCL9, a chemoattractant for lymphocytes previously proposed as a marker of sarcoidosis severity [26], was also overexpressed in TB and is also dysregulated in Beryllium disease [27]. On the other hand, we confirmed downregulation of NOTCH4 expression to be exclusively present in lung sarcoidosis. NOTCH signaling is initiated after activation of toll-like receptors in macrophages and it has been previously reported that Notch receptor ligand Dll4 is expressed in cutaneous sarcoidosis granulomas [52]. Supporting this association, SNPs in NOTCH4 have been reported as a sarcoidosis-associated locus in genomewide associated study in African Americans [1].
We have previously generated a 20-sarcoidosis gene signature by microarray in peripheral blood from PBMCs [33]. We now confirm three genes (SERTAD1, HBEGF, KLRB1) as DEGs by oncopanel profiling. KLRB1 expression was also downregulated in TB and SERTAD1 was significantly downregulated in CM. Therefore, these two genes appear to be part of common process in granulomatous diseases. In contrast, HBEGF, a heparin-binding epidermal growth factor, was down-regulated in sarcoidosis lung granulomas (FC − 2.36, p = 0.0016) and was not dysregulated in CM or TB. HBEGF is involved in epithelialization, wound contraction and angiogenesis with an active pro-inflammatory role in skin and lung alveolar regeneration [24], pneumonitis and early stages of systemic sclerosis [25]. Importantly, within the sarcoidosis PBMC molecular signature, HBEGF expression was up-regulated in contrast to the observed down-regulation in granuloma tissues. This variation in gene expression across tissues may be explained by either heritability at cis loci, by trans regulation across tissues [53,54] or by epigenetic mechanisms. Our previously conducted epigenetic studies in sarcoidosis identified IL6ST and STAB1 as part of a miRNA-derived gene signature for complicated sarcoidosis [5]. In the current study, IL6ST was significantly dysregulated only in TB granulomas (FC − 1.42 and p 0.0008), while STAB1 gene was significantly downregulated in lymph nodes in sarcoidosis (FC − 1.64, p 0.0014). STAB1 codes for a transmembrane receptor that is expressed in endothelial cells and lymph nodes, with functions in angiogenesis, lymphocyte homing and cell adhesion which are key aspects in the chronicity of the granuloma.
Leveraging the power derived from publicly available microarray data sets from independent cohorts, we successfully validated ADAMST1, CXCL2 and FABP4, that were identified DEGs by the oncopanel. These three genomic markers were present exclusively in sarcoidosis in both lung and lymph node. NPR1, Natriuretic Peptide Receptor 1, is associated with cardiac hypertrophy and fibrosis [55] and has not been previously associated to sarcoidosis, was present in the lung and lymph node transcriptomic profiles and was also validated in microarray datasets but only in sarcoidosis granulomas from lymph node. ADAMTS1, gene encodes a disintegrin and metalloproteinase with thrombospondin motif associated with various inflammatory processes and a potentially important regulator of pathogenic granuloma formation in sarcoidosis in both compartments.
There are some limitations of the current study. One is related to RNA from FFPE tissues, despite the use of quantitative nuclease protection chemistry that enabled extraction-free quantitation of mRNA, RNA quantity and quality are limited in FFPE compared to cryopreserved tissues. Other limitations include the use of a sequencing panel and not whole genome sequencing, that limited number of gene probes and the expression levels of some genes was undetected. Our comparisons were against tissues and not immune cells, that and the difference in platforms (RNAseq vs microarrays) can explain the limited overlap to already published data sets. Last, the only gene expression data available in CM was microarray data in mice strains limiting the validation of our results in CM. Despite the observed differences at the pathway level in sarcoidosis granulomas at lung and lymph node that indicate disease progression, further investigations are needed to corroborate the expression of these genes in sarcoidosis heterogeneous subphenotypes since limited clinical data was available to establish these correlations.

Conclusions
Sarcoidosis is a complex systemic disease of unknown cause with clinical care hampered by the paucity of available tools for diagnosis and prognosis. Our study is the first to identify RNAseq-derived transcriptome differences in granulomas from lung and lymph nodes of sarcoidosis subjects compared to CM and TB. Our data suggest that immunological signaling related-pathways distinguishes sarcoidosis from other granulomatous diseases and fills a crucial gap in knowledge of gene expression within granulomas. This study corroborates previous genomic markers suggested for sarcoidosis such as STAB1, HBEGF, FABP4 and NOTCH4, with HBEGF and NOTCH4 only expressed in lung granulomas. We have identified new genomic markers for sarcoidosis, ADAMTS1, CXCL2, and NPR1. This relevant translational information in sarcoidosis and other two histopathological similar infections provides important information for the future development of meaningful specific genomic-derived biomarkers for sarcoidosis diagnosis and prognosis.