Local heterogeneity of normal lung parenchyma and small airways disease are associated with COPD severity and progression

Background Small airways disease (SAD) is a major cause of airflow obstruction in COPD patients and has been identified as a precursor to emphysema. Although the amount of SAD in the lungs can be quantified using our Parametric Response Mapping (PRM) approach, the full breadth of this readout as a measure of emphysema and COPD progression has yet to be explored. We evaluated topological features of PRM-derived normal parenchyma and SAD as surrogates of emphysema and predictors of spirometric decline. Methods PRM metrics of normal lung (PRMNorm) and functional SAD (PRMfSAD) were generated from CT scans collected as part of the COPDGene study (n = 8956). Volume density (V) and Euler-Poincaré Characteristic (χ) image maps, measures of the extent and coalescence of pocket formations (i.e., topologies), respectively, were determined for both PRMNorm and PRMfSAD. Association with COPD severity, emphysema, and spirometric measures were assessed via multivariable regression models. Readouts were evaluated as inputs for predicting FEV1 decline using a machine learning model. Results Multivariable cross-sectional analysis of COPD subjects showed that V and χ measures for PRMfSAD and PRMNorm were independently associated with the amount of emphysema. Readouts χfSAD (β of 0.106, p < 0.001) and VfSAD (β of 0.065, p = 0.004) were also independently associated with FEV1% predicted. The machine learning model using PRM topologies as inputs predicted FEV1 decline over five years with an AUC of 0.69. Conclusions We demonstrated that V and χ of fSAD and Norm have independent value when associated with lung function and emphysema. In addition, we demonstrated that these readouts are predictive of spirometric decline when used as inputs in a ML model. Our topological PRM approach using PRMfSAD and PRMNorm may show promise as an early indicator of emphysema onset and COPD progression. Supplementary Information The online version contains supplementary material available at 10.1186/s12931-024-02729-x.


Background
Chronic obstructive pulmonary disease (COPD) is a leading cause of death and healthcare burden in the United States and worldwide.Accounting for over 3 million deaths globally in 2015 [1], this disease is expected to rise in prevalence as the world population ages [2].COPD is understood to be a complex heterogeneous disease presenting clinically diverse phenotypes [3,4].Major causes of airflow obstruction are attributed to chronic bronchiolar obstruction, a.k.a small airways disease (SAD), and emphysema.Although SAD and emphysema are treated as separate COPD subtypes, studies have shown strong quantitative evidence that SAD exists as an intermediate state between healthy lung tissue and emphysema-i.e., irreversible lung damage-in COPD pathogenesis [5][6][7].At present, little has been done to better quantify the onset of SAD from healthy lung parenchyma.
The Parametric Response Map (PRM) is a CT-based voxel-wise computational technique that can identify and quantify functional small airways disease (fSAD; an indirect measure of SAD) even in the presence of emphysema [8].The percent volume of PRM-derived fSAD (PRM fSAD ), i.e., the amount of fSAD in the lungs, has improved COPD phenotyping and the prediction of spirometric decline in subjects at risk of COPD [9].To determine the value of spatial features from each PRM classification, we developed topological PRM (tPRM) as an extension of the PRM algorithm [10].These radiographic tPRM readouts were shown to improve upon commonly used whole-lung PRM measures with respect to COPD characterization and progression [11,12], and correlate to structural changes in lung tissue samples from lung transplant recipients diagnosed with bronchiolitis obliterans [13,14].
In this study, we evaluated the PRM topologies volume density (V), a measure of extent, and Euler-Poincaré Characteristic (χ), a measure of pocket formation, of normal lung and fSAD as independent readouts of COPD severity, pulmonary function, and extent of emphysema using the Phase 1 COPDGene cohort [15].We also investigated the potential of these topologic readouts as predictors of spirometric decline using a machine-learning model.This study demonstrates how tPRM readouts may be used as possible measures of early emphysema and COPD progression.

Study sample
Our study was a secondary analysis of data from COPD-Gene (ClinicalTrials.gov:NCT00608764), a large Health Insurance Portability and Accountability Act-compliant prospective multi-center observational study.In Phase 1 (2007)(2008)(2009)(2010)(2011)(2012) and Phase 2 (2013Phase 2 ( -2017)), 5-year followup, written and informed consent was obtained from all participants and the study was approved by local institutional review boards of all 21 centers.Ever-smokers with greater than or equal to 10 pack-year smoking history, with and without airflow obstruction, were enrolled between January 2008 and June 2011.Participants were non-Hispanic white or African American.Participants underwent volumetric inspiratory and expiratory CT using standardized protocol; images were transferred to a central lab for protocol verification and quality control (QC) [15].Exclusion criteria included a history of other lung disease (except asthma), prior surgical excision involving a lung lobe or greater, present cancer, metal in the chest, or history of chest radiation therapy.Participants were excluded from the present study due to inadequate CT for computing tPRM, such as missing an inspiration/expiration scan, or failing QC implemented specifically for the present study.Our QC protocol is described in Additional File 1 (Supplemental Methods 1).Data for participants evaluated here have been utilized in numerous previous studies and a list of COPDGene publications can be found at [16].Our study is the first to report tPRM analysis across the whole Phase 1 cohort and predict spirometric decline over 5 years in the Phase 2 subset of COPDGene participants.
Spirometry was performed in the COPDGene study before and after the administration of a bronchodilator, specifically 180 mcg of albuterol (Easy-One spirometer; NDD, Andover, MA).Post-bronchodilator values were used in our analyses.COPD was defined by a post-bronchodilator FEV 1 /FVC of less than 0.7 at the baseline visit, as specified in the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines [17].GOLD grades 1-4 were used to define disease severity.GOLD 0 classification, i.e., "at-risk, " was defined by a post-bronchodilator FEV 1 /FVC ≥ 0.7 at the baseline visit, alongside FEV 1 % predicted ≥ 80%.Participants with FEV 1 /FVC ≥ 0.7 with FEV 1 % predicted < 80% were classified as having preserved ratio impaired spirometry (PRISm) [18].Demographic and spirometric measures used in this study included age, sex, race, smoking history, scanner manufacturer, body mass index (BMI), FEV 1 % predicted, FEV 1 /FVC and forced mid-expiratory flow (FEF 25 − 75 ).

Computed tomography and Topological PRM Analysis
All computed tomography (CT) data were obtained from multiple sites associated with the COPDGene project at Phase 1. Whole-lung volumetric multidetector CT acquisition was performed at full inspiration and normal expiration at functional residual capacity using a standardized previously published protocol [15].Data reconstructed with the standard reconstruction kernel were used for quantitative analysis.All CT data were presented in Hounsfield units (HU), where stability of CT measurement for each scanner was monitored monthly using a custom COPDGene phantom [15].For reference, air and water attenuation values are − 1,000 and 0 HU, respectively.
PRM were determined from paired CT scans using Lung Density Analysis (LDA) software (Imbio, LLC, Minneapolis, MN).LDA segmented the lungs from the thoracic cavity with airways removed.Inspiratory CT scans were spatially aligned to the expiratory geometric frame using deformable image registration.Lung voxels were classified using pre-determined HU thresholds as: normal (PRM Norm , -950 < inspiration HU ≤ -810, and expiration HU ≥ -856), functional small airways disease (PRM fSAD , -950 < inspiration HU ≤ -810, expiration HU < -856), emphysema (PRM Emph , inspiration HU < -950, expiration HU < -856), or parenchymal disease (PRM PD , inspiration HU > -810) [19,20].Only voxels between − 1,000 HU and − 250 HU at both inspiration and expiration were used for PRM classification.Each PRM classification was quantified as the percent volume, which is defined as the sum of a PRM classification normalized to the total lung volume at expiration multiplied by 100.There were a few noisy voxels that were considered indeterminate by PRM (inspiration < -950 HU, expiration > -856 HU) that were excluded from our analysis as they did not form consolidated regions of interest within the parenchyma.
Topological analysis of PRM was performed using methods previously described [10].tPRM metrics were defined through application of Minkowski measures on 3D binary voxel distributions: volume density (V) and Euler-Poincaré Characteristic (χ) [21].Maps of V and χ were computed for each PRM class (Norm, fSAD, Emph, and PD) using a 3D moving window of size 21 × 21 × 21 voxels evaluated on a grid of every 5th voxel.V was normalized by the Minkowski estimate of the mask within the same local window volume (rather than a direct calculation of the mask volume in the window as previously described) and χ by the masked window voxel count.Linear interpolation was applied to determine V and χ values for all segmented voxels.
To indicate the PRM class associated with a Minkowski measure, the class is presented as a superscript (e.g., V fSAD is the volume density of PRM fSAD ).tPRM analysis was performed using open-source and in-house software developed in MATLAB R2019a (MATLAB, The Math-Works Inc., Natick, MA).A detailed overview and diagram, of computing tPRM from raw imaging data, was made by Hoff et al. [10].Because the focus of this study is the relationship between normal parenchyma and SAD, and its association with emphysema, all analyses were performed using V and χ for PRM classifications Norm and fSAD.For completeness, V and χ for PRM classifications Emph and PD are provided.

Phase 1 data and statistical analysis
Data in this study are presented as mean and standard deviation unless stated otherwise.Correlations between V and χ for PRM Norm and PRM fSAD were calculated using Spearman rank-order correlation coefficients (ρ ).The total Phase 1 cohort was separated into two subsets based on spirometry-confirmed COPD: non-COPD (FEV 1 /FVC ≥ 0.7) and COPD (FEV 1 /FVC < 0.7).Crosssectional multivariable regression analysis was performed on both subsets using a stepwise approach with V and χ for PRM classifications Norm and fSAD as independent variables and selected pulmonary function testing and clinical features as outcome variables, controlling for age, gender, race, BMI, smoking (pack years) and CT vendor.These control variables were included as compulsory independent variables in all regression models.Statistical work was conducted using IBM SPSS Statistics v27 (SPSS Software Products).In all tests, significance was defined by p < 0.05.

Predict spirometric decline
We evaluated baseline V and χ for PRM classifications Norm and fSAD as predictors of FEV 1 decline over 5 years using a machine learning (ML) model.A total of 4483 cases from the Phase 2 cohort of the COPDGene longitudinal trial, a subset of Phase 1, had FEV 1 measurements at baseline and 5-year follow up.Our ML model is a sparse dictionary learning algorithm [22][23][24][25][26] that classifies image patch features as "normal" or "abnormal".In our method, we used the tPRM maps V Norm , V fSAD , χ Norm , and χ fSAD of each case as inputs for training and testing the algorithm.For training our ML model, individual cases were stratified based on the change in FEV 1 over 5 years [= (FEV 1 at yr 5 -FEV 1 at yr 0)/5 years] as fast (ΔFEV 1 /yr ≤ -60 ml/yr; n = 1516) and slow progressors (ΔFEV 1 /yr > -60 ml/yr; n = 2967).We used 35% of the data for training and 65% for testing the model [27,28].Training was performed on a randomly selected subset of 1569 cases, with n = 531 fast progressors and n = 1038 slow progressors.The remaining 2914 cases, consisting of n = 985 fast progressors and n = 1929 slow progressors, were used for testing the algorithm.In brief, our ML model is designed to associate unique features from the input image patches with fast and slow progressors.This is achieved by randomly selecting image patches from within the lung and extracting the information from the inputs (tPRM maps V Norm , V fSAD , χ Norm , and χ fSAD given as inputs to the ML algorithm) at these image patch locations and comparing their underlying patch features with the compiled class dictionaries of features, which are determined during training.It is important to note that no previous knowledge about the case and lung tissue features, such as emphysema, are provided for the algorithm to delineate "normal" from "abnormal" lung tissue.Details on model design and methods for training and testing are provided in Additional File 1 (Supplemental Fig. 1 and Supplemental Methods 2).To determine the contribution of each feature to the model selection, we used the minimum redundancy maximum relevance feature selection algorithm [29] to rank the tPRM inputs used in the dictionary learning algorithm.The algorithm quantifies the redundancy and relevance using mutual information of variables [30,31].We also investigated the selection bias for each input in the ML model and obtained the prediction accuracy for 10 different choices of training image patches, considering each input separately in the model.The prediction accuracy for each training run is fit to a Gaussian probability density function [32,33].All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (MathWorks, Natick, MA).To determine the contribution of our ML model to account for spatial features in predicting FEV 1 decline, we determined if whole lung mean values of V Norm , V fSAD , χ Norm , and χ fSAD were predictive of FEV 1 decline using a logistic regression classifier.

Case Study: spatial analysis
To better understand the relationship between PRM fSAD and PRM Emph , we evaluated the spatial dependance of V and χ for these PRM classifications from a single subject.The case is a female subject, 48 years of age, diagnosed with GOLD 4 COPD.On a single axial slice, profiles of V and χ for PRM fSAD and PRM Emph were generated by selecting points from high emphysema (V Emph > 0.6) and low emphysema (V Emph < 0.2).A line plot (Additional File 1: Supplemental Fig. 4) was produced for V and χ vs. distance along each point of the profile.The distance, in units of centimeters, along the image profile was determined using the voxel dimensions of the CT scan.All processing and analyses were performed using in-house algorithms developed in MATLAB version 2020a (Math-Works, Natick, MA).

Population characteristics
The original COPDGene Phase 1 cohort consisted of 10,300 individuals.We excluded 1,344 participants for: inadequate CT data, such as missing an expiration or inspiration scan, to conduct tPRM analysis (n = 1,125); missing clinical data (n = 16); or failing to pass our CTbased QC testing (n = 203).Further details of CT QC are provided in Additional File 1 (Supplemental Methods 1).The resulting complete subset used for analyses thus consisted of 8,956 participants.Baseline demographics and lung function for all Phase 1 participants, grouped based on FEV 1 % predicted and FEV 1 /FVC-that is, by GOLD grade or PRISm as described in the Methods-are reported in Table 1.Due to the COPDGene recruitment strategy, the proportion of GOLD 0 (FEV 1 /FVC ≥ 0.7, FEV 1 % predicted ≥ 80%) participants [15] account for almost half of the study population (43%; 3,867 of 8,956 participants).Increasing percent volume of PRMderived fSAD (PRM fSAD ) and PRM-derived emphysema (PRM Emph ), with decreasing PRM Norm , was observed with higher GOLD grades.This is consistent with previously published work.PRM-derived parenchymal disease (PRM PD ) was found to be elevated in PRISm and GOLD 0 participants (35.8 ± 16.4% and 26.3 ± 12.8% of the total lung volume, respectively) as compared to the COPD subset.

Topological readouts of PRM
Presented in Fig. 1 is a case with elevated fSAD (PRM fSAD = 40%).Representative coronal slices of the expiration CT scan and PRM fSAD , overlaid on CT scan, are provided.To illustrate the dependence of χ on the arrangement of PRM fSAD , we have included V fSAD and χ fSAD maps indicating regions with similar values of V fSAD (blue and magenta boxes).As expected, V fSAD (Fig. 1C) is dependent on the amount of fSAD (yellow voxels in Fig. 1B).Averaged over the lungs, V fSAD is proportional to the percent volume of PRM fSAD by a factor of 100.However, χ fSAD > 0 (magenta box in Fig. 1D) corresponds to the formation of fSAD pockets (magenta box Fig. 1B), whereas χ fSAD < 0 (blue box in Fig. 1D) is the consolidation of these pockets into a mesh with holes (blue box in Fig. 1B).
The volume density of PRM Norm and PRM fSAD demonstrated an inverse relationship with increasing COPD severity (Fig. 2A), consistent with previous work.A similar inverse relationship was observed for χ of both normal lung and fSAD (χ Norm and χ fSAD ).Values of χ Norm and χ fSAD were found to flip about zero (e.g., χ fSAD changes from positive to negative values) from GOLD 2 to GOLD 4 (Fig. 2B).From Fig. 2B we observe that χ Norm and χ fSAD had means (standard deviations) of -0.0084 (0.0071) and 0.0047 (0.0074), respectively, for cases diagnosed as GOLD 2. For those with severe COPD, i.e., GOLD 4, χ Norm and χ fSAD are 0.0039 (0.0055) and − 0.0036 (0.0048), respectively.Mean values of χ Emph and χ PD were found to be positive and similar across GOLD.We did not consider mean breadth and surface area of PRM Norm and PRM fSAD in our analysis, as we did not see such a strong relationship between them (Additional File 1: Supplementary Fig. 2).
We further evaluated the relationship of PRM Norm and PRM fSAD with respect to V (Fig. 3A) and χ (Fig. 3B).Both V and χ demonstrated strong correlations between Norm and fSAD (ρ = -0.666,p < 0.001 and ρ = -0.745,p < 0.001, respectively) over the Phase 1 cohort.Here the GOLD stages are coded by color and the relative amount of emphysema, quantified by V Emph , by size of the marker.As observed in Fig. 3A, V Norm versus V fSAD had more scatter in the data compared to χ Norm versus χ fSAD (Fig. 3B).As expected, GOLD 4 cases with elevated emphysema (V Emph ) demonstrated a drop in V Norm and V fSAD values.In contrast, χ Norm consisted of primarily positive values, whereas positive and negative values were observed for χ fSAD (Fig. 3B).Although V fSAD was found to be strongly correlated to V Emph (ρ = 0.845, p < 0.001), only a weak correlation was observed between χ fSAD and χ Emph (ρ = -0.155,p < 0.001).

Multivariable regression analysis
Presented in Table 2 are results from multivariable regression analyses that demonstrate the contribution of V and χ to PRM Norm and PRM fSAD when modeling spirometric measures and the volume density of emphysema, controlling for age, sex, race, BMI, pack-years, Fig. 3 Scatter plots of all study sample participants for (A) V Norm versus V fSAD and (B) χ Norm versus χ fSAD .Individual points are color coded based on COPD classifications.The size of the points indicates the amount of emphysema as measured by the volume density of PRM Emph (V Emph ) Fig. 2 Boxplots for topological measures of PRM maps PRM Norm (green), PRM fSAD (yellow), PRM Emph (red) and PRM PD (magenta) across all GOLD stages, "at-risk" (GOLD 0), and PRISm.Plots of (A) volume density, describing class magnitude (relative amounts of voxels) and (B) Euler-Poincaré characteristic, describing class homology, determined by number and type of holes within class volumes.Box plots were computed following standard protocol for box and whiskers; box lines determined by lower quartile (Q1), middle quartile / median (Q2) and upper quartile (Q3), and whiskers are drawn out to Q1-1.5 x IQR and Q3 + 1.5 x IQR for lower and upper limits, respectively.IQR = Q3-Q1.Outliers are defined as points beyond the given upper and lower limits and illustrated as black points with a random bounded horizontal perturbation beyond box whiskers and CT vendor.Among those with spirometrically confirmed COPD, V Norm was found to be significantly associated with multiple clinical measures including FEV 1 % predicted, FEV 1 /FVC, FEF 25 − 75 and V Emph (see Table 2).V fSAD and χ fSAD were found to independently and significantly contribute to FEV 1 % predicted (β = 0.065, p = 0.004 and β = 0.106, p < 0.001).Only the Norm topological measures were found to contribute to FEV 1 /FVC (β = 0.668, p < 0.001 for V Norm and β = -0.120,p < 0.001 for χ Norm ), whereas V and χ for both Norm and fSAD were found to be significant parameters for FEF 25 − 75 .With respect to V Emph , extent of emphysema, V and χ for Norm and fSAD were highly significant but demonstrated similar trends irrespective of PRM classification.For completeness, the same analyses were performed on the non-COPD cohort (Additional File 1: Supplemental Table 1).As compared to the COPD cohort, statistical models generated from the non-COPD cohort demonstrated significant parameters but with weaker correlations (i.e., adjusted R 2 ).

Prediction model of spirometric decline
Representative axial slices of expiration CT scan, PRM, V fSAD , χ fSAD and corresponding patch probability maps from a fast progressor (with ΔFEV 1 /yr of -249 ml/yr) are provided in Fig. 4. Our ML model correctly classified this subject as a fast progressor.This case is a 63-year-old male, diagnosed at baseline with GOLD 2 COPD.Using V and χ from PRM fSAD and PRM Norm as inputs, the ML model was able to determine regions of emphysema, discernible from existing fSAD, observed in the right upper lung as "abnormal" (blue patches in the probability maps).In contrast, the dorsal lung regions were classified as "normal" (red patches in the probability maps) due to the absence of fSAD and emphysema.For completeness we have provided in Additional File 1 (Supplemental Fig. 3) representative axial slices of expiration CT scan, PRM, V fSAD , χ fSAD and corresponding patch probability maps from a slow progressor (with ΔFEV 1 /yr of 101 ml/ yr).Fig. 4 The dictionary learning results for a 63-year-old male diagnosed at baseline with GOLD 2 COPD and declared a fast progressor with ΔFEV 1 /yr of -249 ml/yr.Representative axial slice of an expiratory CT scan acquired at baseline, its associated PRM map, the tPRM maps V fSAD and χ fSAD of PRM fSAD , and their image patch probability maps from the dictionary learning model As seen in Fig. 5A and B, our ML model had an overall classification accuracy of 70.6% and Area Under the Curve (AUC) of 0.69 of the receiver operating characteristic (ROC) curve.We compared our ML model with a simple logistic regression model using whole lung mean values of V Norm , V fSAD , χ Norm , and χ fSAD .Figure 5B shows that the logistic regression model only achieved an AUC of 0.55.The contribution of each of the inputs to the model (V Norm , V fSAD , χ Norm , and χ fSAD ) are shown in Fig. 5C and D. V and χ of PRM fSAD are dominant inputs, followed by V and χ of PRM Norm (Fig. 5C).Using a feature rank analysis performed on our test set, we observed that V and χ of PRM fSAD are important to achieve higher prediction accuracy.In fact, χ fSAD was found to have the smallest spread/variance (Fig. 5D), indicating highly desirable robustness to the choice of training image patches and its usefulness as an input in the ML model.As reported in Table 3, "normal" patches, on average, consisted primarily of PRM Norm , elevated V Norm (abundant) and negative χ Norm (consolidated), with negligible  PRM fSAD , low V fSAD (depleted) and positive χ fSAD (sparse pockets).In "abnormal" patches, similar values of V and χ for PRM Norm and PRM fSAD were observed (Table 3).Positive and negative values in χ fSAD were found for "normal" and "abnormal" patches, respectively.This is consistent with the inverse relationship seen with increasing COPD severity shown in Fig. 2.

Dependence between topologies of PRM fSAD and PRM Emph
As the topologies of PRM were determined as averages over the whole lungs, we provide a case study illustrating the relationship between V and χ of PRM fSAD and PRM Emph at the local level.Presented in Additional File 1 (Supplemental Fig. 4) are the profiles of V and χ of PRM fSAD and PRM Emph from a region of the right lung with elevated and reduced V Emph (orange circle and star, respectively; Supplemental Fig. 4A and C).The case is a female subject, 48 years of age, diagnosed with GOLD 4 COPD.The subject was found to have on average high levels of V fSAD (0.37) with relatively elevated V Emph (0.1).Mean values for the whole lungs of χ were 0.008 and − 0.009 for PRM Emph and PRM fSAD , respectively.As seen in Additional File 1 (Supplemental Fig. 4C), V fSAD increased while V Emph decreased further from lung with the highest level of V Emph (~ 0.6 at orange circle in Additional File 1: Supplemental Fig. 4A and C).At approximately 1.8 cm, volume densities between PRM fSAD and PRM Emph transitioned.In addition, χ fSAD was found to increase with decreasing χ Emph with transition occurring at ~ 1.2 cm.

Discussion
The topological parametric response map is an extension of the well-established PRM method, a quantitative imaging marker of SAD [8].In this study, we have demonstrated that inclusion of topological features, in this case the Euler-Poincaré Characteristic (χ), improved characterization and interpretation of fSAD in COPD as a complimentary readout of volume density (V), which is equivalent to traditional percent volume of PRM classifications [10].This study also evaluated the role of PRM-defined normal parenchyma (PRM Norm ) and fSAD (PRM fSAD ) as lone indicators of COPD severity.We observed distinct patterns in topological metrics with respect to GOLD grades and identified a complete inversion in topology, characterized by Euler-Poincaré Characteristic χ, between normal lung and fSAD, in mid-to-late stages of COPD.We also found V and χ of PRM Norm and PRM fSAD to have statistically significant correlation with spirometric measures and emphysema and to be predictive of spirometric decline.
Our study builds on previous work by Hoff et al. [10] on tPRM characterization in COPD.This study used a much smaller population (n = 88) to demonstrate the trends of all four topological features (volume density, surface area, mean curvature and Euler-Poincaré Characteristic) with increasing COPD severity [10].Limited in statistical power, it instead focused on the surface area of fSAD.Access to a notably larger population (n = 8,956) in the current study allowed us to evaluate the volume density (V) and Euler-Poincaré Characteristic (χ) of PRM Norm and PRM fSAD and relate our findings to the field's current understanding of COPD progression, i.e., normal parenchyma transitions to emphysema through SAD.
A key finding of our study is the ability to quantify parenchymal lung health, based not only on the extent but also on the arrangement of local lung abnormalities, i.e., fSAD.This is rooted in the concept that the lungs are healthy (i.e., PRM Norm ) and COPD progresses through SAD (i.e., PRM fSAD ), an intermediate between normal and emphysematous lung tissue, to emphysema.The nature of this transition suggests χ may be capturing a fundamental mechanism in the emergence of fSAD.Based on our observation, fSAD appears to develop as distinct pockets, which are represented as positive values in χ fSAD within healthy lung tissue, as depicted in the blue box in Fig. 1B.With increasing COPD severity, fSAD pockets coalesce to a mesh, which is represented by negative values in χ fSAD (magenta box in Fig. 1B).On a whole lung level, this transition occurs on average from GOLD stages 2 to 4. By quantifying the amount and arrangement of normal and fSAD parenchyma, one can assess the severity of COPD.As fSAD is an intermediate between healthy lung and emphysema, increasing levels of emphysema have a direct effect on V and χ of fSAD.This is observed in Fig. 3 and Additional File 1 (Supplemental Fig. 4), where increasing values of V Emph resulted in a drop in V fSAD and increase in χ fSAD .These trends were reflected in our multivariable model for V Emph as well (Table 2).
In a seminal study, McDonough and colleagues [7] provided pathological evidence demonstrating the role of SAD in COPD progression.Using high resolution (~ 10 μm) microCT to analyze frozen lung samples from lung transplant recipients with end-stage COPD, they found that widespread narrowing and destruction of the smaller airways (i.e., SAD) occurred before emphysematous lesions became large enough to be visible on standard CT imaging.They concluded that SAD might serve as an emphysema precursor.Based on their observation, we postulated that the transition observed between χ Norm and χ fSAD (Figs. 2 and 3) should be observed for χ fSAD and χ Emph .Using mean values of χ over the lungs, χ Emph was found to be relatively stable, generating positive values across GOLD (Additional File 1: Supplemental Fig. 2), as well as demonstrating a weak correlation to χ fSAD (ρ = -0.155,p < 0.001).Nevertheless, evaluating χ fSAD and χ Emph at the local level, we observe a strong association between these two readouts (Additional File 1: Supplemental Fig. 4), which may be linked to the structural changes in the terminal airways observed using microCT of lung explants.
In a recent study, Bhatt and colleagues evaluated a CT readout, referred to as the mean Jacobian determinant of normal voxels, at varying distances from emphysematous tissue [34].When measured at 2 mm from CT voxels designated emphysema (i.e., voxel HU <-950 HU), this CT-based readout was found to be predictive of spirometric decline.Our spatial analysis of a single case clearly demonstrates a transition in topologies of PRM fSAD and PRM Emph , 1.8 cm and 1.2 cm for V and χ, respectively (Additional File 1: Supplemental Fig. 4).It is the association of topologies between PRM fSAD and PRM Emph at the local level that allows our machine learning model to predict spirometric decline, with an accuracy of 70%, in the absence of any emphysema readout as an input (Fig. 4).Although the readouts reported by Bhatt and colleagues lacked quantification of SAD, there is clear agreement that lung tissue along the periphery of emphysematous tissue provides potential insight into COPD progression.Using only topologies of PRM Norm and PRM fSAD , our patch-based ML model outperformed the whole-lung logistic regression model (Fig. 5B).This result highlights the importance of the spatial relationship of χ fSAD to χ Emph to predict spirometric decline (Figs. 4 and 5).
We acknowledge several notable limitations.COPD-Gene comprises over 20 study sites, making scanner variation and reconstruction kernel inconsistency inevitable.Sensitivity of PRM to scanner variability was addressed previously [35] and although effort was made to apply PRM only to soft kernels, variability in scanner type was unavoidable.However, we included scanner vendor in our multivariable regressions and found that it did not significantly confound models.Another limitation is variation in levels of inspiration and expiration during CT acquisition.Earlier work demonstrated that even small perturbations from functional residual capacity (FRC) have an observable effect on threshold-based techniques such as PRM [35].To limit this, we implemented QC that excluded participants based on erroneous volume changes or strong discordance with correlation between PRM Norm and FEV 1 % predicted.

Conclusions
In this paper, we have demonstrated that topological features, V and χ, are able to enhance the sensitivity of PRM classifications, notably Norm and fSAD, to extent of emphysema and COPD severity.These data support the concept that as pockets of small airways disease coalesce, surrounding normal tissue is lost.Pockets of fSAD are seen to correlate with increasing presence of emphysema, independent of the amount of fSAD present.We further demonstrated that local levels of χ fSAD and χ Emph correlate, which may be explained by bronchiolitis along the periphery of emphysematous tissue observed by McDonough and colleagues using microCT.In addition, we demonstrated that local values of V and χ for PRM Norm and PRM fSAD provide sufficient information to predict spirometric decline, even in the absence of any prior knowledge of emphysema.Our study provides a unique strategy to detect subtle changes in lung parenchyma that may progress to emphysema.This approach to monitoring extent and arrangement of Norm and fSAD offers insight into COPD phenotypes and provides improved prognostic information that has relevance in clinical care and future clinical trials.

Fig. 1
Fig. 1 Illustration of Volume Density (V) and Euler-Poincaré Characteristic (χ) for PRM fSAD .Presented are representative coronal slices for the (A) expiratory CT scan with associated (B) PRM fSAD overlay (yellow).Included are the (C) volume density and (D) Euler-Poincaré Characteristic of PRM fSAD .Blue and Magenta boxes indicate two lung regions with elevated V fSAD that have negative and positive χ fSAD , respectively.The subject is a GOLD 3 female aged 53 with FEV 1 % predicted of 32% and percent volume of PRM fSAD of 40%

Fig. 5
Fig. 5 Results and relevance of the different features (tPRM metrics as inputs) used in the dictionary learning method.(A) Confusion Matrix showing the sensitivity and specificity of the ML model classifications for both the fast progressor (n = 985) and the slow progressor (n = 1929) classes in the test set.Green colored and red colored fields in the matrix represent agreement and disagreement, respectively, of the ML model with the actual decision.(B) Receiver Operating Characteristic (ROC) curve for our ML model and the logistic regression classifier with the corresponding Area Under the Curve (AUC) statistics.(C) Bar plot showing the feature importance score and feature ranking using the minimum redundancy maximum relevance method.(D) Plot showing the distribution of the features and their prediction accuracy over ten different training runs

Table 1
Clinical characterization of the study population Notes Participant characteristics of the entire study population separated in subsets of those with (FEV 1 /FVC < 0.7) and without (FEV 1 /FVC ≥ 0.7) COPD.Values are displayed as mean (standard deviation).GOLD, Global Initiative for Chronic Obstructive Lung Disease; PRISm, preserved ratio impaired spirometry; GOLD 0, at-risk smokers with normal spirometry; BMI, body mass index; FEV 1 , forced expiratory volume in one second; FVC, forced vital capacity; FEF 25 − 75 , forced mid-expiratory flow; PRM, parametric response map; Norm, Normal; fSAD, functional small airways disease; Emph, emphysema; PD, parenchymal disease

Table 2
Multivariable regression for COPD subsetNotes Multivariable regression modelling using volume density (V) and Euler-Poincaré Characteristic (χ) for PRM-derived Normal and fSAD (introduced stepwise) to model pulmonary function testing measures in the COPD subset.Each column presents results for a different regression model.FEV 1 , forced expiratory volume in one second; FVC, forced vital capacity; FEF 25 − 75 , forced mid-expiratory flow; Emph, emphysema; SE, standard error of the estimate; BMI, body mass index; Norm, Normal; fSAD, functional small airways disease.Model performance is reported as adjusted R 2 and standard error of the estimate.Feature association is reported as standardized beta coefficients (β); cells for stepwise variables removed from final model.All regression models were controlled for age, sex, race, BMI, pack years and CT vendor.P values ≥ 0.01, < 0.01 and ≥ 0.001, and < 0.001 are presented as values in parentheses, *, and **, respectively

Table 3
Image patch topological PRM metrics in the ML model