High throughput 16SrRNA gene sequencing reveals the correlation between Propionibacterium acnes and sarcoidosis

Objective This study aims to use high throughput 16SrRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. Methods A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 tuberculosis (TB group), and 11 non-infectious lung diseases (control group). The V4 region of the bacterial 16SrRNA gene in the specimens was amplified and sequenced using the high throughput sequencing platform MiSeq, and bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Results Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16SrRNA was exclusively found in all the 17 samples of the sarcoidosis group, whereas was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0. 11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0. 0044% in the sarcoidosis group. P. granulosum 16SrRNA was not detected in the other two groups. Conclusion High throughput 16SrRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development. Electronic supplementary material The online version of this article (doi:10.1186/s12931-017-0515-z) contains supplementary material, which is available to authorized users.


Background
Sarcoidosis is a systemic granulomatous disease and often involves the respiratory system. The disease frequently affects middle-aged adults, in particular women aged 40-50 years. The histopathology of sarcoidosis is characterized by non-caseous necrotizing granulomas, and the clinical presentation of the disease is very diverse [1]. The exact etiology and pathogenesis of sarcoidosis remain unclear although several factors, such as infection [2], genetic defects [3], immune dysfunction [4], and exposure to environmental pollutants [5], have been indicated to contribute to sarcoidosis development.
Previous reports studying the association of infection and sarcoidosis mainly focus on Mycobacterium tuberculosis (MTB) and propionibacteria, and show controversial results. Eishi et al. [6] used real-time quantitative RT-PCR to analyze bacterial DNA in the lymph node biopsy specimens of patients with sarcoidosis, TB, or other lung diseases and suggested that propionibacteria was more likely than mycobacterium to be an etiological factor of sarcoidosis. Contrarily, other reports indicate a correlation of MTB and sarcoidosis. Oswald et al. [7] found that the CD4+ and/or CD8+ T cells isolated from the bronchoalveolar lavage fluid of patients with sarcoidosis were more likely than the T cells from the control patients to be stimulated by mycobacterial ESAT-6 protein, whereas the response to P. acnes proteins was similar between the T cells of the sarcoidosis group and the T cells of the control group. Our previous studies used PCR to determine the copy number of mycobacteria and propionibacteria in the paraffinembedded lymph node biopsy specimens of patients with sarcoidosis, TB, or other lung diseases and indicated that MTB might not be associated with sarcoidosis [8] but propionibacteria could be [9].
Most of the previous studies investigated only a few bacterial species. A full bacterial profile of the tissue specimens of patients with sarcoidosis is still lacking. The current study is filling this knowledge gap. In the current study, we aim to conduct high throughput 16SrRNA gene sequencing to investigate the bacterial profile of mediastinal lymph node specimens. The 16SrRNA gene is present in the genome of bacteria, chlamydia, rickettsia, mycoplasma, spirochaeta, and actinomycetes, but absent in eukaryotic organisms, viruses, and fungus. The sequence of V4 region of the bacterial 16SrRNA gene, which varies greatly in different bacterial species, is often considered as a unique signature for a bacterial species and thus commonly used for bacterial taxonomic classification [10][11][12]. The V4 region is flanked by evolutionary conserved regions. In the current study, the conserved regions were used to design PCR primers. The V4 region was then amplified by PCR with the primers and then sequenced. The sequences were analyzed and compared to the 16SrRNA gene sequence database, GreenGene database, by bioinformatics approaches to identify the bacterial taxonomic annotation [11,13]. In the current study, we are to use bacterial 16SrRNA gene sequencing to characterize the bacterial profile of the tissue specimens of patients with sarcoidosis and identify the potential pathogenic bacteria that may correlate with the disease.

Tissue specimens
The protocol for collecting and handling patients' biopsy specimens has been approved by the Institutional Review Board of Shanghai Pulmonary Hospital and Tongji University School of Medicine (Approval No: 2011-FK-10). Biopsy specimens from the mediastinal lymph node of 36 patients, who were treated in Shanghai Pulmonary Hospital between August 2014 and August 2015, were collected and analyzed. All the 36 patients, including 17 cases of sarcoidosis, 8 cases of TB, and 11 cases of control, were inpatients of Shanghai Pulmonary Hospital. The 8 patients with TB were newly diagnosed based on positive MTB in their sputum. The 11 controls included 6 cases of nonspecific lymphadenitis and 5 cases of mediastinal lymph node metastasis of lung cancer. The clinical data of the patients are displayed in Table 1. All the participating patients signed the informed consent.

Diagnostic criteria
The diagnostic criteria for sarcoidosis followed the 1999 guidelines of ATS/ERS/WASOG [1]. The histopathology of lymph node biopsy of the patients with sarcoidosis was characterized by non-caseous necrotizing granulomas and showed negative TB, fungus and/or parasitic infection, tumor, and vasculitis. Additionally, to further confirm sarcoidosis, smear-negative TB was excluded based on a negative TB-polymerase chain reaction (PCR) [8] and the comprehensive differential diagnosis scoring system for sarcoidosis and atypical TB [14]. All the 17 patients with sarcoidosis were followed up for 1 year.
Patients with TB were diagnosed according to the 2013 tuberculosis diagnosis and treatment guidelines recommended by the China Medical Association Tuberculosis Society [15]. The diagnosis of TB was based on acid-fast bacilli smear-positive sputum, positive Mycobacterium tuberculosis (M. TB) in sputum culture, positive M. TB in bronchopulmonary lavage fluid, or positive staining for acid-fast bacilli in biopsy specimens. All the patients with TB underwent anti-TB therapy and cured. The lymph node biopsy samples of lymphadenitis and lung cancer metastasis were provided by thoracic surgeons of Shanghai Pulmonary Hospital, and the diagnosis was confirmed.

Gene sequencing of 16SrRNA
Because the sequence of V4 region of the bacterial 16SrRNA gene is a unique signature of a bacterial species, the sequence can be used to identify the specific bacterial taxonomic position [10]. The V4 region is flanked by evolutionary conserved regions, which were used to design PCR primers. The V4 region was then amplified by PCR with the primers and then sequenced. Genomic DNA was extracted from the tissue specimens using the TIANquick FFPE DNA Kit (TIANGEN, China) according to the instruction of the kit. The genomic DNA was then used as the template in the PCR to amplify the 16SrRNA gene V4 region (nucleotide position from 515 to 806). The primer sequences are: 5'-GTGCCAGCMGCCGCGGTA-3' (forward primer) and 5'-GGACTACHVGGGTWTCTAAT-3' (reverse primer). M stands for nucleotide A or C; H stands for nucleotide A, C, or T; V stands for nucleotide A, C, or G; W stands for nucleotide A or T [10]. Phusion® High-Fidelity PCR Master Mix (New England Biolabs, USA) was used for PCR reaction. The PCR reaction condition was: initial denaturation at 98°C for 1 min; 30 cycles of denaturation at 98°C for 10 s, annealing at 50°C for 10 s, extension at 72°C for 60 s; final extension at 72°C for 5 min. The PCR products were electrophoresed, collected from the gel using the GeneJET Gel Extraction Kit (Thermo Scientific, USA). The purified PCR products were used to construct a DNA sequencing library using the the NEB Next® Ultra™ DNA Library Prep Kit from Illumina (San Diego, CA, USA) according to the manufacturer's instruction. The quality of the DNA library was examined using the Quibit® 3.0 Fluorometer 2.0 (Thermo Fisher Scientific, USA) and Agilent Bioanalyzer 2100 system. The DNA library was then sequenced on the Illumina MiSeq platform (CA, USA). The collected raw sequencing data were processed according to the following steps: 1) The sequences of Barcode and primers were first removed from the raw sequencing data. 2) The sequencing data were then analyzed using the software FLASH version 1.2.7 (http://ccb.jhu.edu/ software/FLASH/) [16]. For each sample, the sequencing fragments were aligned and connected to form raw sequencing tags.
3) The raw sequencing tags with low quality sequencing data were removed. 4) The final sequencing tags were compared against the Genomes OnLine Database (Gold) database (http://drive5.com/ uchime/uchime_download.html) using the UCHIME Algorithm (http://www.drive5.com/usearch/manual/uchi-me_algo.html) [17] to identify chimeric sequences. The chimeric sequences were removed from the final sequencing tags [18]. The resulting sequences were the effective sequencing tags and used for bacterial taxonomic annotation.

Sequence analysis
The identified 16SrRNA sequences (the effective sequencing tags) were annotated using the Uparse software package v7.0.1001 (http://drive5.com/uparse/) [19]. The annotation procedure includes the following steps: 1) Sequences that share ≥ 97% identity were considered at the same taxonomic position (belong to the same bacterial species) and assigned to one operational taxonomic unit (OTU). 2) The Uparse software automatically selected representative sequences when constructing OTUs. 3) The representative sequences were used to annotate the OTUs. OTUs were annotated with taxonomic information using the 16SrRNA gene sequence database, Green-Gene database, (http://greengenes.lbl.gov/cgi-bin/nph-in dex.cgi) [20] and the RDP Classifier (Version 2.2, http:// sourceforge.net/projects/rdp-classifier/) [21]. The sequences were annotated with taxonomic Kingdom, Phylum, Class, Order, Family, Genus, and Species. The relative abundance of each bacterial species in a sample was also determined by the software.

Statistical analysis
To estimate if the identified 16SrRNA sequences could include all the bacteria in the samples, we conducted alpha diversity analysis. The alpha diversity was analyzed using the Quantitative Insights into Microbial Ecology (QIIME) software (Version 1.7.0) [22]. To evaluate alpha diversity, we calculated Chao index representing bacterial species abundance, number of observed species, and Shannon index representing bacterial community diversity. Rarefaction curves were plotted using the three parameters as vertical axes and the number of identified sequences as horizontal axes. Chao index, number of observed species, and Shannon index increase as the number of identified sequences increases in a sample, which indicates that unique bacterial species are increasingly identified in a sample. Saturation of the rarefaction curves indicate that the number of unique bacterial species does not increase as the number of identified sequences increases, which suggests that the identified 16SrRNA sequences may cover all the bacteria in the sample. Bacterial relative abundance was determined to identify the most predominant bacterial species, which represent the bacterial profile in the samples. The Quantitative Insights into Microbial Ecology (QIIME) software (Version 1.7.0) and Uparse software package v7.0.1001 (http:// drive5.com/uparse/) were used to analyze bacterial abundance. The Metastats software (http://metastats.cbcb.umd.edu/) was used to compare the difference in bacterial abundance between different groups. The Metastats software is specifically designed to compare bacterial abundance by quantitatively analyzing the 16SrRNA gene sequences [23].
Intra-group comparison was performed to evaluate the differences in the bacterial profile of samples in the same patient group. Inter-group comparison was performed to estimate the difference in the bacterial profile of the sarcoidosis, control, and TB patient groups. Because bacterial profile is expected to vary considerably in different individuals, evaluation of inter-and intra-group difference may reveal the unique characteristics of bacterial profile in sarcoidosis, control, and TB patient groups. For inter-and intra-group comparison of bacterial profile, R software (version 3.1.3) was used for nonparametric multi-response permutations procedure (MRPP) [24] and the analysis of similarity (Anosim) [25]. An A value (calculated from the MRPP) > 0 represented inter-group difference > intra-group difference; A < 0 represented inter-group difference < intra-group difference, P < 0.01 was considered statistically significant. An R value (calculated from the Anosim) > 0 represented inter-group difference > intra-group difference; R < 0 represented inter-group difference < intra-group difference; P < 0.01 was considered statistically significant. A P value was first calculated for inter-group difference in relative abundance. The P value was then adjusted to q value using an algorithm established by Storey and Tibshirani [26]. Based on both the P and q values, Taxonomic units showing significantly different relative abundance in the three groups were identified. When both P < 0.01 and q < 0.01, the differences were considered statistically significant. The software Graphpad prism 5 was used to prepare the curves of bacterial relative abundance. The software SPSS 20.0 was used for chi-square analysis of the inter-group difference in bacterial positive rate; P < 0.01 was considered statistically significant.

Patients' clinical data
Patients in the sarcoidosis, TB, and control groups showed similar age, body mass index, and erythrocyte sedimentation rate ( Table 1). The majority of patients (n = 10) with sarcoidosis were at chest X-ray stage I. The percentages of specific T cell populations, including CD3, CD4, and CD8, were similar in the sarcoidosis and TB groups. So were the levels of IgG, IgA, IgM, C3, and C4 (Table 1).

Sequence data and OTUs
Sequencing data are presented in Additional file 1: Table S1. A total of 8748 unique OTUs were constructed by the Uparse software, and an average of 1768 unique OTUs in each sample. The sequences of all the 8748 unique OTUs are displayed in Additional file 2: Table S2. The OTUs were annotated at phylum, class, order, family, genus, and species taxonomic levels using the RDP Classifier Version 2.2 (http://sourceforge.net/projects/rdp-classifier/) and the 16SrRNA sequence database, GreenGene database (http://greengenes.lbl.gov/cgi-bin/nph-index.cgi). The annotations of all the 8748 are presented in Additional file 3: Table S3. At the taxonomic genus level, a total of 545 unique bacterial genera were identified in all the samples. The relative abundances of the 545 bacterial genera in each sample are presented in Additional file 4: Table S4. At the taxonomic genus level, the OTUs were assigned to 545 genera based on the Greengenes database (Additional file 4: Table S4). In the sarcoidosis group, 478 unique bacterial genera were identified; the top 10 most abundant genera were: Shewanella, Pseudomonas, Acinetobacter, Lactobacillus, Prochlorococcus, Bifidobacterium, Escherichia, Halomonas, Pediococcus, and Rhodococcus ( Table 2). The control group showed 470 unique bacterial genera, including Bacteroides, J2-29, Prevotella, Methylobacterium, Oscillospira, [Prevotella], Roseburia, Ruminococcus, Clostridium, and Helicobacter as the top ten most abundant bacterial genera ( Table 2). The TB group contained 489 unique bacterial genera, and the most abundant ones were Bacteroides, J2-29, [Prevotella], Methylobacterium, Prevotella, Clostridium, Methyloversatilis, Succinivibrio, Sutterella, and Salinispora ( Table 2). All of the most abundant bacterial genera in the three patient groups belonged to the four bacterial Phylum: Proteobacteria, Firmicutes, Bacteroidetes, and Fusobacteria.

Alpha diversity of the sequencing data and the analyses of intra-and inter-group difference in bacterial profile
The rarefaction curves of number of observed species (Fig. 1a), Chao index (Fig. 1b), and Shannon index (Fig. 1c) reach a plateau, suggesting that the identified sequences may sufficiently cover the bacteria in the samples. The rank abundance curves (Fig. 1d) also become stable, indicating that species distribution is uniform. Both the Anosim and MRPP analyses revealed that the inter-group difference in bacterial profile was larger than the intra-group difference and the inter-group difference was statistically significant (All P < 0.01, Table 3).

Propionibacterium was specifically detected in patients with sarcoidosis
We compared the relative abundance of each genus in the three patient groups. A total of 67 bacterial genera showed significantly higher or lower relative abundance in the sarcoidosis group compared with the TB and control groups (P < 0.01, q < 0.01). Among the 67 genera, 38 showed significantly lower relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01, q < 0.01, Table 4), and 29 showed significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01, q < 0.01, Table 5).
Among the 29 bacterial genera with significantly higher relative abundance in the sarcoidosis group, The Propionibacterium Genus was found in all the 17 patients with sarcoidosis (100% positive rate) but was completely absent in all the 8 patients with TB (0% positive rate, χ2 = 25, P < 0.0001) and 11 control patients (0% positive rate, χ2 = 28, P < 0.0001). In contrast, the other 28 bacterial genera were detected in all the patients in the three groups, suggesting that these bacterial genera may not be specific for patients with sarcoidosis. The relative abundance of the Propionibacterium Genus in the sarcoidosis group was 0.16% ± 0. 11%, which was significantly higher than that in the control (0%, P = 0.001, q = 0.0055) and TB groups (0%, P = 0.0010, q = 0.0063).
The two predominant bacterial species in the Propionibacterium Genus are Propionibacterium acnes (P. acnes) and Propionibacterium granulosum (P. granulosum). P. acnes was detected in all the 17 patients with sarcoidosis, whereas P. granulosum was detected in only 5 of the 17 patients. Both Propionibacterium species were not found in the control and TB groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0. 11%) was significantly higher than that in the control (0%, P = 0.0010, q = 0.0038) and TB (0%, P = 0.0010, q = 0.0044) groups (Fig. 2a). In contrast, the relative abundance of P. granulosum in the sarcoidosis group (0.0022% ± 0. 0044%), which was extremely lower than the relative abundance of P. acnes, was not significantly different compared with that in the control (0%, P = 0.088, q = 0.18) and TB (0%, P = 0.19, q = 0.38) groups (Fig. 2b). These data indicate that P. acnes may be more likely to correlate with sarcoidosis than other bacteria. The process of identification of the unique bacterium in the sarcoidosis group is displayed in Fig. 3.

Discussion
The association of infection and sarcoidosis etiology has been studied extensively. Molecular biological approaches have been used to search bacterial genome in sarcoidosis lesions [33]. Previous studies have shown that possible pathogenic organisms for sarcoidosis include propionibacteria [8,33,34], mycobacterium [7,8,27,28], borrelia [29,30], mycoplasma [31],  and chlamydia [32]. Particularly, MTB and propionibacteria are the most frequently reported bacteria correlating with sarcoidosis [35,36]. In spite of the efforts, no consistent and definitive conclusion has been drawn. Piotrowski et al. reported that MTB infection triggered a sarcoid reaction, and thus they proposed that TB and sarcoidosis might be different clinical manifestations of MTB infection [28]. However, our previous study, which determined the copy number of MTB in lymph node specimens of patients with TB or sarcoidosis, suggests that MTB may not be the pathogenic bacterium of sarcoidosis [8].
Propionibacteria are gram positive anaerobic bacteria and belong to the normal flora of the skin. Since Abe et al. first isolated P. acnes from lymph node biopsy specimens of patients with sarcoidosis in 1984 [34], P. acnes has been suspected to be a pathogenic bacterium for sarcoidosis. Negi et al. used P. acnes-specific monoclonal antibodies that react with cell-membrane-bound lipoteichoic acid (PAB antibody) in immunohistochemistry to locate P. acnes in the lung tissue and lymph node specimens of patients with sarcoidosis, and they discovered that the PAB antibodies specifically bound to the sarcoidosis specimens but did not react with tissues from patients with non-sarcoidosis disease [33]. We previously performed a meta-analysis to systematically review the articles that focused on the association of P. acnes and sarcoidosis [37], and our results support that P. acnes is highly likely a pathogen of sarcoidosis.
On the other hand, other studies indicate that P. acnes may not be associated with sarcoidosis. Ishige et al. found that 56% -73% of specimens (lung tissues: 24/43, mediastinal lymph node specimens: 8/11) from patients with non-sarcoidosis diseases showed positive P. acnes culture [38]. Oswald et al. discovered that compared with the CD4+ and/or CD8+ T cells from the controls, the T cells from patients with sarcoidosis reacted to mycobacterial ESAT-6 protein more frequently but showed similar response to P. acnes proteins [7]. They also used matrix-assisted laser desorption ionization imaging mass spectrometry to locate the ESAT-6 protein and the P. acnes proteins in the tissue specimens and found that ESAT-6 signals specifically existed in sarcoidosis granulomas, whereas P. acnes signals were distributed non-specially in the tissues [7]. These studies suggest that P. acnes may be not involved in sarcoidosis etiology.
To identify potential pathogenic bacteria of sarcoidosis, we previously used quantitative PCR to quantify the copy number of 16SrRNA gene of P. acnes and P. granulosum in the lymph node specimens and found that propionibacteria existed in 80% of sarcoidosis specimens but only in 4.5% of TB specimens [9]. To further confirm our previous conclusions, in the current study, we used high throughput 16SrRNA gene sequencing to investigate the bacterial profile of lymph node specimens of sarcoidosis, TB, or other lung diseases. Our results showed that P. acnes were exclusively present in sarcoidosis samples, and both P. acnes and P. granulosum were not detected in the TB and control groups. These findings clearly support that P. acnes is associated with sarcoidosis.
The possible pathogenic mechanism of P. acnes in sarcoidosis development has been tested in animal models. Kouji Iio et al. used heat-inactivated P. acnes and Freund's complete adjuvant to induce pulmonary granuloma in mice, and found that the total number of lymphocytes and the ratio of CD4+ to CD8+ T cells in the bronchoalveolar lavage fluid of the mice with pulmonary granuloma were higher than those of the control mice [39]. Their study suggests that P. acnes may cause pulmonary allergic inflammation, which may consequently lead to granuloma. Eishi et al. also proposed that P. acnes could cause sarcoidosis by an allergic endogenous infection [40]. P. acnes or the bacteria-related antigens may interact with the pattern recognition receptors to stimulate macrophages and T cells to release pro-inflammatory factors, such as IFN-γ, IL-2, TNF-α, and IL-6, which ultimately disrupts the immune homeostasis in patients and induces sarcoidosis development [36]. Thus, P. acnes may cause sarcoidosis by interfering in the immune homeostasis.
Compared with quantitative PCR approach, high throughput16SrRNA gene sequencing is a more effective approach to investigate bacterial profile of biopsy specimens and identify the predominant bacterial species in the specimens. The sequencing approach also allows us  to compare the diversity and relative abundance of bacteria in specimens. Although quantitative PCR approach may be sensitive, low-cost, and fast, the sequencing approach can produce data for comprehensive analyses, such as bacterial taxonomy, bacterial diversity, and the correlation of bacterial profile evolution and sarcoidosis progression. The current study first used high throughput sequencing approach to investigate the bacterial profile in lymph node specimens of patients with sarcoidosis and to identify that propionibacteria were exclusively present in the sarcoidosis specimens. The advantages of the sequencing method include high throughput, high reproducibility, and high accuracy. The limitations of the sequencing method include technically complex and requiring special instrument and software for the analysis. In addition, the roles of those bacteria with significantly higher or lower abundance in the sarcoidosis group compared with the control and TB groups in the development of sarcoidosis remain unclear. Furthermore, the control groups in the current study included lymph node specimens of patients with nonspecific lymphadenitis and of patients with mediastinal lymph node metastasis of lung cancer, which may contribute to the highly various bacterial profiles in the control group. Another limitation of the current study is that the number of tissue specimens in the current study was relatively small. We are currently collecting more samples to further investigate the dynamic evolution of bacterial profiles in the sarcoidosis lesions at different disease stages.