Integrated plasma proteomics and lung transcriptomics reveal novel biomarkers in idiopathic pulmonary fibrosis

Background Idiopathic pulmonary fibrosis (IPF) is a fatal lung disease with a significant unmet medical need. Development of transformational therapies for IPF is challenging, in part due to the lack of robust predictive biomarkers of prognosis and treatment response. Importantly, circulating biomarkers of IPF are limited and none are in clinical use. Methods We previously reported dysregulated pathways and new disease biomarkers in advanced IPF through RNA sequencing of lung tissues from a cohort of transplant-stage IPF patients (n = 36) in comparison to normal healthy donors (n = 19) and patients with acute lung injury (n = 11). Here we performed proteomic profiling of matching plasma samples from these cohorts using the SomaLogic Somascan-1300 platform. Results Comparative analyses of lung transcriptomic and plasma proteomic signatures identified a set of 34 differentially expressed analytes (fold change (FC) ≥ ±1.5, false discovery rate (FDR) ≤ 0.1) in IPF samples compared to healthy controls. IPF samples showed strong enrichment of chemotaxis, tumor infiltration and mast cell migration pathways and downregulated extracellular matrix (ECM) degradation. Mucosal (CCL25 and CCL28) and Th2 (CCL17 and CCL22) chemokines were markedly upregulated in IPF and highly correlated within the subjects. The mast cell maturation chemokine, CXCL12, was also upregulated in IPF plasma (fold change 1.92, FDR 0.006) and significantly correlated (Pearson r = −0.38, p = 0.022) with lung function (% predicted FVC), with a concomitant increase in the mast cell tryptase, TPSB2. Markers of collagen III and VI degradation (C3M and C6M) were significantly downregulated (C3M p < 0.001 and C6M p < 0.0001, IPF vs control) and correlated (Pearson r = 0.77) in advanced IPF, consistent with altered ECM homeostasis. Conclusions Our study identifies a panel of tissue and circulating biomarkers with clinical utility in IPF that can be validated in future studies across larger cohorts.
Supplementary Information The online version contains supplementary material available at 10.1186/s12931-021-01860-3.

proteomic disease signatures generated from clinically relevant human samples including tissue and plasma, combined with robust "in silico" modeling can enable translational disease understanding, diagnosis and stratification of patients for effective drug treatments.
Past studies have profiled gene expression in lung tissues, peripheral blood and isolated cells through microarray and bulk/single cell RNA-sequencing analyses, identifying aberrant cell populations as well as molecular signatures of progressive IPF [7][8][9][10][11][12]. Recent IPF biomarker efforts have focused on identification of circulating biomarkers using plasma/serum or secreted biomarkers in matrices such as bronchoalveolar lavage, sputum and breath condensate obtained through minimally invasive procedures [13][14][15]. However, most studies have primarily used samples from progressive IPF patients, in whom tissue biopsy is not part of routine clinical practice. Given that IPF pathology is complex and involves interplay of tissue-resident and infiltrating cells resulting in progressive and extensive tissue remodeling and scarring [16], it is possible that the peripheral biomarker signature may not accurately capture tissue-level changes in advanced disease. For example, a crosslinked fibrotic extracellular matrix (ECM) could act as a barrier or trap preventing the detection of relevant disease biomarkers in circulation. Thus far, no studies have comprehensively examined tissue and plasma molecular signatures in unison. Using well annotated lung tissue samples from a cohort of transplant-stage IPF patients in comparison to acute lung injury and healthy controls, we previously reported a transcriptomic fingerprint of advanced IPF enriched in pathways of T-cell activation, immune response and ECM remodeling [17]. These studies also identified novel gene associations to lung function as well as unique isoform regulation in IPF lung.
We hypothesized that a combined analysis of lung and plasma gene/protein signatures will identify robust biomarkers with potential clinical utility. Here, we have performed unbiased proteomic analyses of matching plasma from the advanced IPF cohort through the Somascan-1300 aptamer platform and compared the plasma proteome signature to the previously reported lung transcriptome signature. Our data reveal a striking enrichment of pathways involved in chemotaxis/Th2 chemokine and T-cell signaling, Wnt signaling, mast cell migration and activation, and extracellular matrix degradation in both tissue and plasma of advanced IPF. Notably, the Th2 chemokines CCL17 and CCL22 as well as mucosal chemokines CCL25 and CCL28 were robustly upregulated in IPF and correlated within subjects. The mast cell maturation chemokine CXCL12 was also increased in IPF together with a concomitant increase in the mast cell protease, TPSB2. Neoepitopes of collagen type III and VI degradation (C3M and C6M) were strongly downregulated and highly correlated in advanced IPF subjects. Our data provide a comprehensive signature of IPF tissue and plasma that could be potentially validated and utilized for clinical assessment of advanced IPF.

Human subjects and sample acquisition
All human subject sample acquisitions and experiments were conducted with the appropriate approval from the Institutional Review Board (IRB 806468, IRB 813685). The clinical profile and demographics of IPF, ALI and control subjects used in this study have been previously described [17]. The IPF cohort consisted of 36 subjects with advanced IPF (mean % predicted forced vital capacity of 44) who underwent lung transplantation at the University of Pennsylvania. The ALI and control cohorts consisted of subjects whose donated lungs were deemed ineligible for lung transplantation. Explant samples were evaluated by an experienced thoracic pathologist who classified samples as 'ALI' based on the presence of diffuse alveolar damage or as 'control' if no abnormal pathology was present.

RNA sequencing in lung tissues
The RNA-sequencing method and analyses have been described in detail previously [17].

SomaLogic proteome assay
Plasma samples were collected in Citrate EDTA tubes in the operating room prior to explant, centrifuged at 4 ℃ at 3521 RPM for 10 min, and stored at −80 ℃. Plasma samples were analyzed on the SOMAscan V2 multiplex proteomic assay (SomaLogic, Boulder, CO), an aptamer-based quantitative proteomic biomarker discovery platform that measures 1033 analytes [18,19]. The assay covers a broad range of proteins associated with disease physiology and biological functions, including cytokines, kinases, growth factors, proteases and their inhibitors, receptors, hormones and structural proteins. Plasma samples were distributed randomly in 96-well microtiter plates and the assay operators were blinded to the identity of all samples. Assay results were reported in normalized relative fluorescence units (RFU).

Plasma data analyses
Each sample in the study was normalized by aligning the mean to a common reference. Inter-plate and inter-run calibration were achieved by applying a multiplicative scaling coefficient to each SOMAmer. These scaling factors were calculated using the eight reference calibrators on each plate. Sample data were first normalized to remove hybridization variation within a run, followed by median normalization across all samples to remove other assay biases within the run, and finally calibrated to remove assay differences between runs. Log-transformed RFU values were used to analyze differential expression of biomarkers across cohorts using the Limma linear modeling framework for differential expression [20,21]. Pathway analysis was performed with the Metacore Genego platform using differential protein signatures obtained with a cutoff of 1.5-fold change and 0.1 FDR (false discovery rate).
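The median-normalization and plate-calibration steps described above can be illustrated with a minimal Python sketch. This is not the vendor's pipeline: the function names, the data layout (nested dictionaries of RFU values) and the choice of the per-analyte median as the run reference are all assumptions made for illustration, and the hybridization-control step is omitted.

```python
import math
from statistics import median


def median_normalize(samples):
    """Scale each sample so that its median ratio to the run reference is 1.

    samples: dict mapping sample id -> dict mapping analyte -> RFU.
    The reference for each analyte is assumed to be its median across samples.
    """
    analytes = sorted(next(iter(samples.values())))
    reference = {a: median(s[a] for s in samples.values()) for a in analytes}
    normalized = {}
    for sample_id, values in samples.items():
        # One multiplicative factor per sample: median of reference/observed.
        scale = median(reference[a] / values[a] for a in analytes)
        normalized[sample_id] = {a: values[a] * scale for a in analytes}
    return normalized


def calibration_factors(plate_calibrators, global_reference):
    """One multiplicative factor per analyte for a given plate.

    plate_calibrators: list of dicts (one per calibrator replicate on the plate).
    global_reference: dict mapping analyte -> expected calibrator RFU.
    """
    return {
        a: global_reference[a] / median(rep[a] for rep in plate_calibrators)
        for a in global_reference
    }


def log2_rfu(sample):
    """Log2-transform the RFU values of one sample before linear modeling."""
    return {a: math.log2(v) for a, v in sample.items()}
```

Applying a single factor per sample (normalization) and a single factor per analyte and plate (calibration) keeps the two corrections separable, which is the design the text describes.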

Statistical analyses
Statistical analyses of differential gene expression data using the R package have been described previously [17]. SomaLogic bulk proteome data were analyzed as described in the "plasma data analyses" section. Comparison of transcriptome and proteome data was achieved by generating analyte lists with a similar cutoff of fold change ≥ ±1.5 and FDR ≤ 0.1. Correlations between analytes, as well as analyte-FVC correlations, were assessed using Pearson correlation analysis. Differential expression of individual analytes (gene and protein) as well as collagen neoepitope data were analyzed by one-way ANOVA followed by Tukey's post test, with differences considered significant at P < 0.05.
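As an illustration of the one-way ANOVA step, the sketch below computes the F statistic for several groups from first principles. It is a hedged example only: the function name and input layout are assumptions, it stops at the F statistic and its degrees of freedom, and it does not implement the p-value lookup or Tukey's post test, for which a statistics package would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and residual degrees of freedom")

    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With three cohorts (IPF, ALI and control), the call would receive three lists of analyte values, one per cohort.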

SomaLogic profiling of IPF plasma
Plasma samples from the IPF, ALI and healthy control cohorts were analyzed using the Somascan 1300plex aptamer platform. Normalized log-transformed data were used to generate protein expression intensities. Expression data were visualized using t-distributed stochastic neighbor embedding (t-SNE), which showed a robust separation of the IPF samples from the control and ALI samples (Fig. 1a). Further unbiased hierarchical clustering of the data revealed a strong clustering of the majority of IPF samples, thus revealing a proteomic fingerprint of advanced IPF (Fig. 1b). Notably, the ALI and control samples clustered together. These data suggested that the observed differences in protein intensities were primarily driven by the IPF disease state.

Advanced IPF plasma proteome shows a strong signature of chemokine signaling, mast cell activation, Wnt signaling and extracellular matrix homeostasis
Using cutoffs of 1.5 and 0.1 for fold change and FDR respectively, we identified 236 differentially regulated proteins between IPF and control cohorts and 235 between IPF and ALI cohorts. Only two differentially regulated proteins were identified in the ALI vs control contrast. Therefore, subsequent analyses focused primarily on the differences between IPF and control cohorts. Tables 1 and 2 list the upregulated and downregulated proteins, respectively, in the IPF vs healthy contrast (the complete list of protein changes in the IPF vs healthy contrast is provided in Additional file 1: Table S1). Metacore pathway analyses of differentially expressed proteins showed a striking modulation of chemotactic and immune pathways, mast cell migration and activity, as well as TGFβ signaling and ECM degradation and remodeling (Fig. 2a, b). As observed previously in our lung transcriptome signature, pathways involved in T-cell activation were distinctly upregulated in IPF plasma (Fig. 2a). Particularly interesting was the marked increase in a variety of chemokines involved in T-cell and other immune cell signaling. The eosinophilic chemokine CCL11, the mucosal chemokines CCL25 and CCL28, and the Th2 chemokines CCL17 and CCL22 were strongly upregulated in IPF plasma. The mast cell and lymphocyte chemoattractant SDF1/CXCL12, the mast cell-derived chemokine CCL21, and the mast cell tryptase, TPSB2, were all markedly increased in IPF plasma. The Wnt signaling enhancers SPON1 and RSPO3 were also significantly increased together with a concomitant increase in the Wnt receptor, Frizzled B (FRZB). Pathway analyses also revealed a marked regulation of ECM remodeling networks, with a reduction in ECM proteolysis regulators such as TIMP1 and SERPINs and an increase in profibrotic matrix molecules such as SPARC and Vitronectin (Fig. 2b). These changes were consistent with an advanced fibrotic disease state in our IPF cohort, where ECM synthesis is expected to significantly exceed degradation.
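The selection rule used here (fold change of at least 1.5 in either direction and FDR of at most 0.1) can be written as a short Python sketch. The function names and the input layout are assumptions for illustration; fold changes are assumed to be expressed on a log2 scale, so a 1.5-fold decrease corresponds to a log2 value of −log2(1.5). Apart from CXCL12 (FC 1.92, FDR 0.006, as reported in the text), the analyte names and values in the usage example are hypothetical.

```python
import math


def is_differential(log2_fc, fdr, fc_cutoff=1.5, fdr_cutoff=0.1):
    """True if the analyte passes both the fold-change and FDR thresholds."""
    return abs(log2_fc) >= math.log2(fc_cutoff) and fdr <= fdr_cutoff


def select_signature(results, fc_cutoff=1.5, fdr_cutoff=0.1):
    """Map each passing analyte to 'up' or 'down'.

    results: dict mapping analyte -> (log2 fold change, FDR).
    """
    signature = {}
    for analyte, (log2_fc, fdr) in results.items():
        if is_differential(log2_fc, fdr, fc_cutoff, fdr_cutoff):
            signature[analyte] = "up" if log2_fc > 0 else "down"
    return signature


if __name__ == "__main__":
    example = {
        "CXCL12": (math.log2(1.92), 0.006),       # values reported in the text
        "HYPOTHETICAL_1": (math.log2(1.2), 0.001),  # fails the fold-change cutoff
        "HYPOTHETICAL_2": (-1.0, 0.05),             # twofold decrease, passes
        "HYPOTHETICAL_3": (2.0, 0.5),               # fails the FDR cutoff
    }
    print(select_signature(example))
```

Working on the log2 scale makes the cutoff symmetric for up- and downregulated analytes, which is why a single absolute-value comparison suffices.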

Correlation of pathway markers within the IPF cohort
Intrigued by the strong enrichment of chemotactic and profibrotic signaling pathways in the IPF plasma, we sought to analyze the relationship between multiple components of the regulated pathways within the IPF cohort. These analyses showed that the expression of the mucosal chemokines CCL25 and CCL28 as well as that of the Th2 chemokines CCL17 and CCL22 were significantly correlated within the IPF subjects (Fig. 3a, b). Likewise, Wnt pathway molecules RSPO3, FRZB, and SPON1 were also strongly and significantly correlated within the IPF cohort (Fig. 3c, d). These data further support the finding that the indicated pathways were strongly dysregulated within the IPF cohort.
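For readers who want to reproduce this kind of within-cohort comparison, the sketch below computes a Pearson correlation coefficient and the associated t statistic (with n − 2 degrees of freedom) for two paired measurement vectors. The function names are assumptions for illustration, and the conversion of the t statistic to a p-value is left to a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Each input vector would hold one value per subject, for example the log-transformed RFU values of two chemokines measured in the same IPF plasma samples.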

Comparison of lung transcriptome and plasma proteome in IPF
Our IPF cohort offered the unique opportunity to perform an integrated analysis and comparison of the lung transcriptome signature and the plasma proteomic signature in unison. These analyses provided insight into pathways and markers commonly and divergently dysregulated in tissue and peripheral blood. Using similar fold change and FDR cut-offs (FC ≥ ±1.5, FDR ≤ 0.1), we identified 53 genes commonly regulated in lung and plasma, 34 of which changed in the same direction (Fig. 4). The intersection of these signatures revealed a striking modulation of pathways involved in chemotaxis, T-cell activation, mast cell migration and activation, TGF beta signaling, Wnt activation and ECM homeostasis (Fig. 5). Tables 3 and 4 list the genes commonly up- or downregulated in IPF lung and plasma. Most of these proteins were either chemokines or chemotactic factors or proteins involved in fibrotic signaling and ECM remodeling. Notably, the receptors for several of the upregulated chemokines in plasma were concomitantly increased in the lung transcriptome (previously published Additional file 1: Table S1 from [17]). These include CCR4 (receptor for CCL17 and CCL22), CCR7 (receptor for CCL21), CCR5 (receptor for CCL5), CCR10 (receptor for CCL28) and CXCR4 (receptor for CXCL12). The matricellular protein, SPARC (FC 1.58, FDR 0.005), and vitronectin (FC 1.52, FDR 0.074) were both increased in IPF plasma and lung, indicative of an active profibrotic state. A particularly interesting finding in this study was the marked dysregulation of mast cell activators and mediators such as CCL21, CXCL12, CCL5 and Tryptase beta 2, which are known to promote a profibrotic response. Figure 6 shows that CXCL12 expression was increased in both lung (FC 6.68, FDR < 0.00001, Fig. 6a) and plasma (FC 1.92, FDR 0.006, Fig. 6b) and that plasma expression was significantly correlated with % predicted FVC (r = −0.38, p = 0.022, Fig. 6c).
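The overlap analysis can be sketched in a few lines of Python: take the analytes present in both signatures, then split them by whether the direction of change agrees. The function name and data layout are assumptions, as is the premise that plasma proteins and lung transcripts are matched by a shared gene symbol. In the usage example, CXCL12 and SPARC are upregulated in both compartments as reported in the text; the remaining entries are hypothetical placeholders.

```python
def compare_signatures(lung, plasma):
    """Split shared analytes into concordant and discordant sets.

    lung, plasma: dicts mapping gene symbol -> 'up' or 'down'.
    Returns (shared, concordant, discordant) as sorted lists.
    """
    shared = sorted(set(lung) & set(plasma))
    concordant = [g for g in shared if lung[g] == plasma[g]]
    discordant = [g for g in shared if lung[g] != plasma[g]]
    return shared, concordant, discordant


if __name__ == "__main__":
    lung_signature = {
        "CXCL12": "up",
        "SPARC": "up",
        "HYPOTHETICAL_A": "down",
        "HYPOTHETICAL_B": "up",   # lung only
    }
    plasma_signature = {
        "CXCL12": "up",
        "SPARC": "up",
        "HYPOTHETICAL_A": "up",   # opposite direction in plasma
        "HYPOTHETICAL_C": "down",  # plasma only
    }
    shared, same, opposite = compare_signatures(lung_signature, plasma_signature)
    print(len(shared), same, opposite)
```

Keeping the discordant list rather than discarding it preserves the analytes that are divergently regulated between tissue and blood, which the text also treats as informative.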

Downregulated ECM degradation in IPF
Since pathway analyses of IPF plasma revealed a strong dysregulation of proteins involved in ECM homeostasis and remodeling, we measured C3M and C6M, neoepitopes of collagen III and collagen VI degradation, respectively. Interestingly, we found a marked decrease in the levels of both C3M and C6M in IPF compared to healthy controls (Fig. 7a, b). Additionally, we found a high degree of correlation between the levels of the two markers within the IPF cohort. Finally, we compared our plasma proteome signature to a recently published SomaLogic plasma signature from a cohort of IPF patients from the IPF-PRO registry [24] and found SPARC, CCL5, CCL17 and CCL22, OLR1 and PDGF-a/B as commonly regulated in similar directions in both IPF datasets (Fig. 8a-c).

Discussion
We describe here for the first time a simultaneous, comprehensive analysis of the plasma proteome and lung transcriptome of a unique cohort of advanced IPF patients in comparison to that of normal healthy donors. Our studies suggest a strong dysregulation of T-cell activation, chemokine signaling, mast cell maturation, Wnt signaling and ECM homeostasis pathways in lung tissue as well as peripheral blood of these patients and identify new biomarkers that could have clinical utility.
Although numerous profiling studies in the past have identified dysregulated genes, proteins and pathways in early and progressive IPF, there are currently no diagnostic or prognostic biomarkers in clinical practice [6,[24][25][26]. This particular cohort of transplant-stage IPF patients provided an opportunity to compare and correlate tissue and plasma signatures in unison.
Unbiased hierarchical clustering of protein expression across the cohorts shows that the IPF samples were clearly separated from control and ALI samples. Consistent with this separation, we did not find major differences between the ALI and control cohorts, although our previously published RNA-seq data identified significant differences between the cohorts at the gene expression level. In the ALI samples, the degree of diffuse alveolar damage varied with the majority having extensive areas of pathology, while some had more focal areas. This degree of sample heterogeneity is expected and may have potentially contributed to the similarities seen between the ALI and donor sample groups in our study.
Plasma data from the IPF cohort not only confirmed and extended our previous lung transcriptome findings in the same cohort [17], but also provided potential insights into the key pathways and markers that could be involved in IPF disease progression. We observed a strikingly enhanced chemokine signaling signature in our IPF cohort, spanning a diverse group of chemokine-receptor pairs that contribute to both inflammation and tissue remodeling. CCL17/TARC and CCL22 are thymic chemokines previously shown to be upregulated in IPF BAL fluid and correlated to CCR4-expressing alveolar macrophages [27]. CCL28 is a classic mucosal chemokine known to signal through the CCR10 receptor, and CCR10+ epithelial cells are known to drive IPF progression [28]. CCL21 signaling through CCR7 expressed on activated IPF fibroblasts enhances fibrogenesis, and neutralization of this pathway attenuates fibrosis [29,30]. Although the role of eosinophils in IPF is poorly understood, it is known that eosinophils promote fibrotic airway remodeling and collagen deposition in allergic inflammation [31]. Emerging evidence also indicates that pathogenic memory Th-2 cells can activate eosinophils to produce profibrotic factors such as osteopontin [32]. The identification of multiple chemokine subtypes in our study suggests that the interplay of chemokine signaling through the mucosal, epithelial and Th-2 axes could together potentiate several pathogenic mechanisms in IPF, including macrophage activation, T-lymphocyte homing, epithelial plasticity, and eosinophil influx. While shifting paradigms over the years have suggested dissociation of early inflammation from advanced fibrosis in IPF, our current findings suggest that inflammatory mechanisms remain active in advanced disease. Recently, artificial intelligence-based approaches have also identified mononuclear inflammation, alveolar macrophages and fibroblast foci as potential prognostic biomarkers of IPF [33].
In addition to the increased expression of chemokines, we also show positive correlation between multiple chemokines in several pathways within the IPF cohort, further emphasizing the potential role of these pathways in disease progression. Intriguingly, the Th2 and eosinophilic signature in our IPF cohort was closely similar to the hallmarks of allergic inflammation as seen in asthmatic airways [34], further corroborated by a dominance of asthma-related mechanisms in our pathway analyses.
Another key finding from our study was the upregulation of the mast cell chemokines, CCL5, CCL21 and CXCL12, and the mast cell protease tryptase-B2. Prior studies have shown increased infiltrating mast cell numbers and tryptase activity in human IPF [35,36], and therapeutic targeting of CXCL12/CXCR4 signaling attenuated bleomycin induced lung fibrosis in mice [37,38]. Additionally, blood levels of CXCL12 as well as CXCR4 + cells within the honeycomb cysts and distal epithelium in tissue are increased in IPF [39]. Recent data also suggest that the antifibrotic drug, Nintedanib, could work through inhibition of mast cell survival and activity [40]. Notably, in our study we show that CXCL12 is not only increased in both lung and plasma but also correlated with % predicted FVC, suggesting that CXCL12 could be a tractable disease biomarker of advanced IPF. Mast cells have been long recognized to promote allergic inflammation, fibroblast activation and subepithelial fibrosis in asthma [41,42]. Our collective findings in lung and plasma could imply that mast cell activation and degranulation could provide profibrotic mediators, growth factors and proteases that can potentially activate fibroblasts and impact ECM remodeling in advanced IPF.
Aberrant reactivation of developmental pathways including that of Wnt signaling is known to play a role in the pathogenesis of IPF [43]. In our study, we found a concomitant increase in several components of Wnt signaling including the Wnt activators R-spondin 3 (RSPO3) and SPON1 and the Wnt receptor, FRZB in both lung tissue and plasma. Additionally, the expression of these proteins was significantly correlated within the individual subjects in the IPF cohort. The increase in RSPO3 was particularly interesting in the light of a recent report that therapeutic targeting of RSPO3 attenuates fibrosis in multiple organs such as lung, liver and skin [44].
Consistent with an advanced fibrotic state, we found ECM remodeling and proteolysis pathways strongly dysregulated in the IPF cohort. The matricellular protein, SPARC, plays a key role in promoting collagen assembly into the ECM and has been implicated as a biomarker in previous studies [24,45]. SPARC gene and protein expression were strongly upregulated in our IPF cohort. Enrichment of ECM proteolysis pathways in IPF plasma is consistent with our previous findings on downregulated ECM degradation in the tissue. A surprising finding in our study was a strong downregulation of neoepitopes of collagen III and VI degradation, C3M and C6M, respectively. Baseline levels of C3M and C6M are known to be predictive of progressive fibrosis and are elevated in newly diagnosed IPF patients [46,47]. However, our IPF cohort represents a significantly advanced IPF population in which it is possible that extensive ECM turnover during the course of the disease would have resulted in a net increase in synthesis and reduction in degradation, leading to a potential decrease in these markers. It is also possible that advanced IPF lung tissue is highly crosslinked and less susceptible to degradation and turnover, or that the matrix could act as a barrier to the release and subsequent identification of these markers in circulation. Supporting this notion, our lung RNA-seq data clearly show marked upregulation of pro-fibrotic and synthetic ECM proteins and a downregulation of ECM degradation pathways. Furthermore, C3M and C6M were significantly correlated among the IPF subjects, suggesting an overall decrease in ECM degradation in advanced IPF. Our data could also imply that the dynamics of ECM turnover differ through the progression of IPF, resulting in potential temporal differences in the levels of neoepitopes of ECM synthesis and degradation.
Although RNA-seq and SomaLogic represent distinctly different platforms, which limits robust comparison, our analyses clearly confirmed that many analytes dysregulated at the gene level were differentially regulated in plasma as well. It is possible that many other markers of interest were missed due to the targeted 1300-plex analyses. Future studies with the currently available expanded Somascan platform (~ 7000 analytes) could help further identify and validate additional biomarkers of disease. Interestingly, emerging data indicate that the use of a multi-omic approach such as ours could be valuable in identifying molecular disease signatures and biomarkers of IPF [48]. Another limitation of our study was the inability to include early/progressive IPF controls or a validation cohort due to limited tissue availability and the unique end-stage pathology exemplified by this cohort. However, we compared the plasma proteome signature from our IPF cohort with a recently published signature generated on the SomaLogic platform from IPF-PRO registry samples, and identified SPARC, CCL5, CCL17 and CCL22, OLR1 and PDGF-a/B as common biomarkers in both IPF datasets. The IPF-PRO registry cohort had a mean predicted FVC of 69%. In contrast, the mean % FVC of patients in our study was 44%, with a majority of patients at ≤ 30%. It is therefore possible that chemokine and growth factor signaling, immune activation and ECM homeostasis pathways could be consistently dysregulated in early and late stage disease. Notably, our dataset also confirms biomarkers (such as CCL17, PDGF, SPARC) previously identified in early IPF, and demonstrates correlation of CXCL12 with lung function, suggesting that many biomarkers identified in our late stage IPF cohort may also be potential early diagnostic markers.

Conclusions
In summary, we have presented a unique comparative transcriptome-proteome signature of advanced IPF and identified key tissue and circulating biomarkers that could be predictive of progressive/worsening IPF. Further validation of these findings in larger cohorts will help develop a comprehensive panel of biomarkers with clinical utility to address the current unmet need in the diagnosis and management of IPF.