Imaging-based clusters in former smokers of the COPD cohort associate with clinical characteristics: the SubPopulations and intermediate outcome measures in COPD study (SPIROMICS)



Quantitative computed tomographic (QCT) imaging-based metrics enable the quantification of smoking-induced disease alterations and the identification of imaging-based clusters for current smokers. We aimed to derive clinically meaningful sub-groups of former smokers using dimensional reduction and clustering methods to develop a new way of COPD phenotyping.


An imaging-based cluster analysis was performed for 406 former smokers with a comprehensive set of 75 imaging-based metrics. They consisted of structural and functional variables at 10 segmental and 5 lobar locations. The structural variables included lung shape, branching angle, airway circularity, airway wall thickness, and airway diameter; the functional variables included regional ventilation, emphysema percentage, functional small airway disease percentage, Jacobian (volume change), anisotropic deformation index (directional preference in volume change), and tissue fractions at inspiration and expiration.


We derived four distinct imaging-based clusters as possible phenotypes, with sizes of 100, 80, 141, and 85, respectively. Cluster 1 subjects were asymptomatic and showed relatively normal airway structure and lung function, except for airway wall thickening and moderate emphysema. Cluster 2, populated with obese females, showed an increase in tissue fraction at inspiration, minimal emphysema, and the lowest progression rate of emphysema. Cluster 3, populated with older males, showed small airway narrowing and a decreased tissue fraction at expiration, both indicating air-trapping. Cluster 4, populated with lean males, likely comprised severe COPD subjects and showed the highest progression rate of emphysema.


QCT imaging-based metrics for former smokers allow for the derivation of statistically stable clusters associated with unique clinical characteristics. This approach helps to better categorize COPD sub-populations and suggests possible quantitative structural and functional phenotypes.


Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the United States [1] and is identified by airflow limitation and/or obstruction. The severity of COPD is assessed by the post-bronchodilator forced expiratory volume in 1 s, expressed as a percentage of the predicted value (FEV1% predicted) [2]. The pulmonary function test (PFT)-based FEV1 and forced vital capacity (FVC) values are highly recommended to assess global alterations of the lung, but they do not correlate well with symptoms [3]. In addition, PFTs do not reveal local structural and functional alterations, which are essential in examining the heterogeneity of COPD phenotypes. Thus, the ability to quantify these alterations at multiple scales during COPD progression is necessary to characterize COPD phenotypes.

A multicenter study of COPD, the Subpopulations and Intermediate Outcomes in COPD Study (SPIROMICS) [2], acquired QCT scans at total lung capacity (TLC) and residual volume (RV) [4]. These scans are an integral part of the multicenter study's effort to find structural and functional phenotypes. Recent advances in quantitative medical imaging and data analysis techniques allow for the derivation of QCT imaging-based metrics, leading to the identification of statistically stable clusters/phenotypes. For instance, using only QCT imaging-based variables, Choi et al. [5] derived clinically meaningful asthmatic sub-groups that are potentially useful in developing cluster-specific treatments. Furthermore, Haghighi et al. [6] expanded the QCT imaging-based clustering approach to identify homogeneous clusters within current smokers from SPIROMICS. In this study, we hypothesize that QCT-based imaging metrics can be used to identify distinct COPD former-smoker sub-groups with clinically meaningful characteristics, adding insights to the previous study of current smokers [6]. Shaker et al. [7] and Zach et al. [8] reported that former smokers had significantly higher % low-attenuation areas (%LAAs) on inspiration and expiration CT scans (measures of emphysema and air trapping) than current smokers. This is possibly due to parenchymal inflammation in current smokers serving to mask CT-based indices relative to former smokers [6, 7]. Therefore, we divided the subjects into former and current smokers to independently assess phenotypes in these two groups, and we report on the former smokers in this work.

With the aid of machine learning techniques, QCT imaging-based metrics have been used to find homogeneous sub-groups of COPD subjects. As an example, Bodduluri et al. [9] employed image registration-based metrics to discriminate COPD subjects from non-COPD subjects. The study demonstrated the potential of registration-based variables to characterize COPD phenotypes, but it was limited to supervised learning. With regard to unsupervised learning methods, there have been several efforts to identify COPD sub-groups, but they employed either clinical data only or a mix of clinical and CT data [10,11,12], whereas we focus on imaging-only parameters to identify clusters. Although it would be possible to add clinical/physiological/biological measures to our cluster analysis, we used only imaging-based features in order to focus on features of airway structure and lung function. Once established, our clusters were evaluated against clinical, physiological, and biological measures. The associations between imaging-only clusters and the non-imaging phenotypes provide a validation of the ability of imaging metrics to characterize clinically meaningful phenotypes. Choi et al. [5] pioneered the use of unsupervised cluster analysis using CT image data acquired by the Severe Asthma Research Program (SARP) to identify four asthmatic clusters. Their approach accounted for inter-site and inter-subject variations, enabling an analysis of large data sets acquired by multiple centers. Furthermore, Choi et al. [13] successfully identified imaging-based structural and functional features that differentiate asthmatics and COPD patients with chronic functional alteration.

In this study, we adopted the approach by Choi et al. [5]. In addition to the existing imaging-based metrics developed for asthma, we introduced several new metrics to account for tissue alterations and emphysematous lung [5, 6]. A comprehensive set of imaging-based metrics were transformed to the principal component domain, and a cluster analysis was performed to explore possible COPD phenotypes of former smokers. The former smokers-clusters were then evaluated in association with severity, GOLD stages [14], sex, BMI and biomarkers, such as neutrophil counts, leukocyte (WBC) count and matrix metalloproteinase (MMP-3). We then compared the cluster membership of former smokers in this study with that of current smokers presented in our previous study [6].


Human subject data and QCT imaging

We analyzed a total of 758 SPIROMICS subjects with an extensive set of biomarkers. In our analysis, we hypothesized that smoking status may have effects on the CT measures of former and current smokers [7, 8]. The hypothesis was further supported by a combined analysis, which showed that a mix of both groups does not provide adequate cluster stability. Hence, we excluded current smokers, so that a total of 406 former smokers remained. The healthy never-smokers without COPD were considered as healthy controls and were not included in the clustering analysis. PFTs were performed for all subjects pre- and post-bronchodilator, and CT was performed post-bronchodilator. Table 1 shows the demographic and PFT measures based on each stratum. Former smokers with post-bronchodilator FEV1/FVC > 0.7 were grouped in stratum 2, and former smokers in strata 3 and 4 had post-bronchodilator FEV1/FVC < 0.7, with FEV1 > 50% in stratum 3 and FEV1 < 50% in stratum 4, respectively [2].

Table 1 Demography, baseline (Pre-bronchodilator) and maximal (Post-bronchodilator) pulmonary function tests for 105 Stratum 1 (healthy), 119 Stratum 2, 184 Stratum 3 and 103 Stratum 4 subjects

Two QCT scans at TLC and RV were acquired by multiple imaging centers in the NIH-funded SPIROMICS multicenter research study [4]. The CT imaging protocols were approved by each center’s institutional review board (IRB). All QCT scans were obtained post-bronchodilator. They were segmented with automated commercial airway/lung segmentation software (Apollo 2.0, VIDA Diagnostics), and registered with a non-rigid mass-preserving image registration technique [15, 16].

Derivation of QCT imaging-based metrics

A total of 75 multiscale imaging-based variables were extracted to derive principal components (Fig. 1). The segmental variables included bifurcation angle (θ), airway circularity (Cr), wall thickness (WT) and hydraulic diameter (Dh), where each variable indicated alteration of skeletal structure, alteration of airway shape, wall thickening and luminal narrowing, respectively. The sizes of WT and Dh were normalized by tracheal WT and average diameter (Dave) predicted from healthy subjects [5], being denoted by WT* and Dh*, to eliminate inter-subject variability due to age, sex, and height. The four segmental variables were extracted from ten local regions to reflect characteristics of regional alterations. A detailed derivation of the above structural variables can be found in reference [17].
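As a rough illustration of the two lumen-shape metrics named above, the sketch below computes a hydraulic diameter and a circularity from a cross-sectional area and perimeter. The formulas used here (Dh as four times the area over the perimeter, and Cr as the perimeter of the equal-area circle over the measured perimeter) are assumptions based on common definitions, and the function names and the normalisation helper are hypothetical; reference [17] gives the derivation actually used in the study.

```python
import math


def hydraulic_diameter(area_mm2, perimeter_mm):
    """Hydraulic diameter, assumed here to be 4 * area / perimeter."""
    return 4.0 * area_mm2 / perimeter_mm


def circularity(area_mm2, perimeter_mm):
    """Perimeter of the equal-area circle divided by the measured perimeter.

    Equals 1 for a perfect circle and falls below 1 as the lumen flattens.
    """
    equal_area_diameter = math.sqrt(4.0 * area_mm2 / math.pi)
    return math.pi * equal_area_diameter / perimeter_mm


def normalise(value_mm, reference_mm):
    """Hypothetical helper: divide by a subject-specific reference length,
    standing in for the tracheal WT / predicted Dave normalisation."""
    return value_mm / reference_mm


# A circular lumen of radius 3 mm
radius = 3.0
circle_area = math.pi * radius ** 2
circle_perimeter = 2.0 * math.pi * radius
dh_circle = hydraulic_diameter(circle_area, circle_perimeter)
cr_circle = circularity(circle_area, circle_perimeter)

# A flattened (2 mm x 8 mm rectangular) lumen for comparison
cr_flat = circularity(16.0, 20.0)
```

For the circular lumen the hydraulic diameter equals the geometric diameter and the circularity is 1, while the flattened lumen gives a circularity below 1.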

Fig. 1 An expanded set of imaging-based metrics including emphysema percentage and tissue fraction at TLC and RV. a Inspiration image-based local structures: θ, Cr, WT*, and Dh*. b Expiration image-based global and lobar function: AirT%. c Inspiration image-based global and lobar function: Emph%. d Global structure. e Registration-based global and lobar functions.

We further derived both strain-based and density-based functional metrics with the aid of image registration that matched the two QCT images at TLC and RV. The strain-based variables included fractional air volume change (ΔVairF), the determinant of the Jacobian matrix (Jacobian), and the anisotropic deformation index (ADI). These are estimates of regional ventilation, local volume change, and preferential local lung deformation, respectively [18, 19]. Next, the density-based functional metrics included functional small airway disease percentage (fSAD%) and emphysema percentage (Emph%) to characterize the portions of small airway narrowing/closure and emphysematous lung, respectively. This approach, previously proposed by Galban et al. [20], was devised to dissociate emphysematous regions from air-trapping regions. In order to eliminate inter-site variation, we employed fraction-based fSAD% and Emph% using air fractions of 90% and 98.5% as the thresholds, instead of the density thresholds of −856 and −950 HU, respectively [21]. We further added two more imaging-based metrics that measure tissue fraction [13, 22] at TLC and RV (βtissueTLC and βtissueRV). The tissue fractions measure the portion of tissue volume in each voxel. These are supplementary metrics for Emph% and fSAD%, because βtissueTLC decreases if tissue destruction is captured and βtissueRV decreases if the air fraction increases due to air-trapping.
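The fraction-based thresholds can be illustrated with a minimal sketch. It assumes a linear two-compartment (air/tissue) model for converting CT density to tissue fraction, with placeholder calibration values of −1000 HU for air and 55 HU for tissue, and it assumes that a voxel counts as emphysema when its air fraction at TLC exceeds 98.5% and as fSAD when it is not emphysematous but its air fraction in the registered RV image exceeds 90%. These constants, the voxel rule, and all function names are illustrative assumptions, not the study's implementation.

```python
HU_AIR = -1000.0   # assumed air density; in practice calibrated per scan
HU_TISSUE = 55.0   # assumed tissue density; in practice calibrated per scan


def tissue_fraction(hu, hu_air=HU_AIR, hu_tissue=HU_TISSUE):
    """Fraction of a voxel occupied by tissue under a linear mixing model."""
    beta = (hu - hu_air) / (hu_tissue - hu_air)
    return min(1.0, max(0.0, beta))


def air_fraction(hu, hu_air=HU_AIR, hu_tissue=HU_TISSUE):
    """Complement of the tissue fraction."""
    return 1.0 - tissue_fraction(hu, hu_air, hu_tissue)


def emph_and_fsad_percent(hu_tlc, hu_rv_registered,
                          emph_threshold=0.985, fsad_threshold=0.90):
    """Return (Emph%, fSAD%) for voxels paired by image registration."""
    n_voxels = len(hu_tlc)
    n_emph = n_fsad = 0
    for hu_in, hu_ex in zip(hu_tlc, hu_rv_registered):
        if air_fraction(hu_in) > emph_threshold:
            n_emph += 1          # emphysema-like at inspiration
        elif air_fraction(hu_ex) > fsad_threshold:
            n_fsad += 1          # air trapped at expiration, not emphysema
    return 100.0 * n_emph / n_voxels, 100.0 * n_fsad / n_voxels


# Four toy voxels (HU at TLC, and HU at RV after registration)
toy_tlc = [-990.0, -990.0, -900.0, -850.0]
toy_rv = [-980.0, -700.0, -920.0, -600.0]
emph_pct, fsad_pct = emph_and_fsad_percent(toy_tlc, toy_rv)
```

In this toy example the first two voxels are classified as emphysema, the third as fSAD, and the fourth as neither, so the two categories never overlap.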

In addition, we included global imaging-based metrics such as the ratio of apical-basal distance over ventral-dorsal distance at TLC (lung shape), the ratio of air-volume changes in upper lobes to those in middle and lower lobes between TLC and RV (U/(M + L)|v), and fSAD%, Emph%, βtissueTLC, βtissueRV, Jacobian and ADI in the whole lung. Overall, there were 32 local/segmental structural variables, 35 lobar functional variables and 8 global variables.

Cluster and statistical analysis

Raw imaging data were standardized to zero mean and unit variance (standard scaling), and a principal component analysis was performed to derive linearly uncorrelated variables, the so-called principal components (PCs). To obtain an optimal number of PCs, a parallel analysis [23] with random uncorrelated data was adopted. The analysis identified 7 as the optimal number of PCs (Additional file 1: Figure S1).
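The standardization, principal component analysis, and parallel analysis described above can be sketched as follows. This is a minimal NumPy illustration rather than the analysis code used in the study: the use of the 95th percentile of random-data eigenvalues as the retention threshold, the number of random replicates, and all function names are assumptions.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix, largest first."""
    Z = standardize(X)
    corr = Z.T @ Z / (Z.shape[0] - 1)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(X, n_iter=100, quantile=0.95, seed=0):
    """Count leading PCs whose eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = pca_eigenvalues(X)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        # Uncorrelated Gaussian data with the same shape as X
        random_eigs[i] = pca_eigenvalues(rng.standard_normal((n, p)))
    threshold = np.quantile(random_eigs, quantile, axis=0)
    keep = observed > threshold
    # Index of the first component that fails the comparison
    n_keep = p if keep.all() else int(np.argmin(keep))
    return n_keep, observed, threshold


# Synthetic data: 10 variables driven by 2 independent latent factors
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
X_demo = np.repeat(factors, 5, axis=1) + 0.3 * rng.standard_normal((300, 10))
n_pcs, observed_eigs, random_threshold = parallel_analysis(X_demo)
```

On the synthetic data, which are built from two latent factors, the procedure retains two components; applied to the 75 imaging variables, the same comparison is what yields the number of PCs to keep.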

Using the 7 derived PCs, we then sought the optimal clustering method and number of clusters by assessing internal validation measures, namely connectivity, average Silhouette width and the Dunn index [24], for three different clustering methods, i.e., hierarchical, K-means, and Gaussian finite mixture model-based methods. Connectivity accumulates a penalty, the inverse of the neighbor rank, for each nearest neighbor of an observation that is not assigned to the same cluster; the average Silhouette width measures how tightly grouped the points in each cluster are; and the Dunn index is the ratio of the minimal inter-cluster distance to the maximal intra-cluster distance. Thus, smaller connectivity and larger Silhouette width and Dunn index indicate better clustering properties. First, the K-means method was found to be a good clustering method for the current data based on connectivity and average Silhouette width (Additional file 2: Figure S2a). The Dunn index then suggested that 4 is the optimal number of clusters when using K-means. To further test the stability of the cluster membership, a nonparametric bootstrap analysis was performed with 200 bootstrapped data sets. The mean of the Jaccard similarity coefficients, defined as the size of the intersection divided by the size of the union between clusters [25], was computed to find the optimal cluster number and clustering approach (Additional file 2: Figure S2b).
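Three of the quantities used above can be written down compactly. The following sketch, using only the Python standard library, computes the Dunn index, the average Silhouette width, and the Jaccard similarity for small toy inputs; the function names are illustrative, Euclidean distance is an assumption, and the code is not the validation package used in the study.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Minimal inter-cluster distance over maximal intra-cluster distance."""
    pairs = list(combinations(zip(points, labels), 2))
    inter = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    intra = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return inter / intra


def average_silhouette_width(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        total += (b - a) / max(a, b)
    return total / len(points)


def jaccard(members_a, members_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(members_a), set(members_b)
    return len(a & b) / len(a | b)


toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_dunn = dunn_index(toy_points, toy_labels)
toy_silhouette = average_silhouette_width(toy_points, toy_labels)
toy_jaccard = jaccard([1, 2, 3], [2, 3, 4])
```

In a bootstrap stability check, `jaccard` would be applied to the subject IDs of an original cluster and of its best-matching cluster in each resampled data set, and the coefficients averaged over the resamples.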

Kruskal-Wallis and chi-square tests were performed to compare differences in continuous and categorical variables, respectively. A reported P value was considered significant if at least one group was statistically different from one or more of the other groups. We then performed association tests of the imaging-based clusters with demographic and clinical variables to investigate the clinical relevance of the current clusters.
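For readers unfamiliar with the Kruskal-Wallis test, the following standard-library sketch computes its H statistic from pooled ranks. It is illustrative only: ties receive average ranks but no tie correction is applied to H, the function names are hypothetical, and the statistic would in practice be compared with a chi-square distribution with (number of groups − 1) degrees of freedom by a statistics package.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a list of samples."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3.0 * (n_total + 1)


# Three fully separated toy groups
toy_h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

A large H relative to the chi-square reference indicates that at least one group differs from the others, which matches the significance criterion stated above.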


Structural and functional features of imaging-based clusters

Cluster analysis identified four stable [6] imaging-based clusters with sizes of 100, 80, 141 and 85, respectively (Table 2 and Fig. 2). Five major variables with higher Wilks’ λ values, which best describe the four clusters, were selected with a stepwise forward variable selection technique using the Wilks’ λ criterion [26]. Note that the clusters were differentiated predominantly by whole lung (total) parenchymal metrics including βtissue at RV and TLC, Jacobian, Emph% and fSAD%. Overall, whole lung Emph% and fSAD% increased with increasing cluster number. It was noted that Emph% and fSAD% in Cluster 2 fell within a range similar to that of healthy subjects (Fig. 3).

Table 2 Major imaging-based features selected by Wilks’ λ value of a stepwise forward variable selection method in four imaging-based clusters and healthy subjects (stratum 1)
Fig. 2 A summary of imaging and clinical characteristics between clusters

Fig. 3 a Percentage of emphysema (Emph%) for the four clusters and the healthy control group (green). † P > 0.05 between Clusters 1, 2, 3 and the healthy group; P < 0.05 between Cluster 4 and the other groups for all pairwise comparisons. b Percentage of functional small airway disease (fSAD%) for the four clusters and the healthy control group (green). ‡ P < 0.05 for all pairwise comparisons between Clusters 2, 3, 4 (red) and the healthy group; P > 0.05 between Cluster 1 and the healthy group

Structural alterations in segmental airways were also captured between clusters (Table 3). Tracheal bifurcation angle (θ) and circularity (Cr) measured in the sLUL were significantly reduced in Cluster 4. Cluster 1 was characterized by airway wall thickening (WT*↑), whereas Clusters 3 and 4 were characterized by airway wall thinning (WT*↓) and airway narrowing (Dh*↓). As summarized in Fig. 2, the clusters were characterized by airway wall thickening-dominance (Cluster 1), increased tissue fraction at TLC with marginally increased emphysema (Cluster 2), proximal and peripheral airway narrowing (Cluster 3), and severe alterations of tracheal bifurcation angle (θ) and airway shape (Cr) in proximal airways as well as peripheral alterations (Cluster 4).

Table 3 Segmental airway features at specific regions

Associations of imaging-based clusters with clinical features

Clusters 1 and 2 were mostly populated in GOLD 0, 1 and 2 along with a lower BODE index, while Cluster 4 was mostly populated with GOLD 3 and 4 (stratum 4) with the highest BODE index (Table 4). Cluster 2 showed the highest BMI (obese) among all clusters. Clusters 1 and 2 demonstrated similar post-bronchodilator FEV1/FVC values, but Cluster 2 had lower FEV1%predicted and FVC %predicted values compared with Cluster 1. Cluster 3 had significantly lower FEV1%predicted value and FEV1/FVC, along with preserved FVC value, whereas Cluster 4 had the lowest FEV1 and FVC % predicted values, along with the lowest FEV1/FVC.

Table 4 Demography, baseline (pre-bronchodilator) and maximal (post-bronchodilator) PFTs, in four imaging-based clusters

The smoking pack-years were significantly greater in Clusters 3 and 4 than in Clusters 1 and 2 (Table 5). Among all clusters, Cluster 4 showed the strongest associations with pulmonary/vascular conditions and with chronic bronchitis, emphysema, and COPD diagnosed at baseline. Shortness of breath during sleep was increased in Clusters 2 and 4. Fathers and mothers of subjects in Cluster 4 were likely to have COPD. The WBC counts were increased in Clusters 2–4, with increased neutrophils (Table 6). Lymphocytes were reduced in Cluster 4. The proteolytic enzymes matrix metalloproteinases (MMPs) were reduced, especially in Cluster 2. Based on the lowest CAT score and exacerbation, Cluster 1 subjects were likely asymptomatic (CAT < 10) former smokers with the lowest exacerbation among all clusters. In contrast, Cluster 4 showed the highest CAT score with the lowest 6-min walk distance along with severe oxygen desaturation.

Table 5 Associations of symptoms and disease histories with cluster membership
Table 6 Characteristics of biomarkers in four imaging-based clusters

We further associated the clusters with visual diagnostic assessments, made by an experienced thoracic radiologist at the University of Iowa, including emphysema subtypes (CLE: centrilobular; PSE: paraseptal; PLE: panlobular emphysema) as well as interstitial lung disease (ILD) (Table 7), because these subtypes might be associated with airway abnormalities [27]. Cluster 4 was less likely related to ILD and had a significant increase of PLE. Subjects with PLE were not observed in Clusters 1 and 2. We analyzed longitudinal data of 169 available subjects among the current cohort of former smokers to quantify the change of Emph%, i.e., the emphysema progression index (ΔEmph%), between baseline and one-year follow-up. Here, Emph% is computed as the percentage of voxels within the lung below −950 HU and assesses the extent of emphysema; subjects with ΔEmph% ≥ 1% and with ΔEmph% within ± 0.5% are considered rapid-progressors and non-progressors, respectively [28]. ΔEmph% was marginal in Cluster 2 (Table 7), whereas it was significantly higher in Cluster 4.
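The progression index and its two categories can be expressed in a few lines. In the sketch below, ΔEmph% is taken as follow-up minus baseline Emph%, a change of at least 1 percentage point is labelled a rapid-progressor, and an absolute change of at most 0.5 is labelled a non-progressor. Reading "± 0.5%" as an absolute-value band and the "intermediate" label for all remaining values are assumptions made for illustration, as are the function names.

```python
def delta_emph(emph_baseline_pct, emph_followup_pct):
    """Change in Emph% between baseline and follow-up (percentage points)."""
    return emph_followup_pct - emph_baseline_pct


def progression_category(delta_pct):
    """Classify a subject from ΔEmph% using the thresholds quoted above."""
    if delta_pct >= 1.0:
        return "rapid-progressor"
    if abs(delta_pct) <= 0.5:
        return "non-progressor"
    return "intermediate"  # hypothetical label; not defined in the text


# Toy values, not study data
category_a = progression_category(delta_emph(10.0, 11.5))
category_b = progression_category(delta_emph(10.0, 10.2))
category_c = progression_category(delta_emph(10.0, 10.75))
```

The three toy subjects fall into the rapid-progressor, non-progressor, and intermediate categories, respectively.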

Table 7 Associations of visual diagnostics (VD) and of emphysema subtypes with cluster membership

Furthermore, we compared the two cluster groupings derived from current smokers [6] and former smokers, respectively (Table 8). Overall, the CAT scores and exacerbation histories of current smokers were greater than those of former smokers. WBC counts did not differentiate the current smoker-derived clusters because all of those clusters showed high WBC counts. On the other hand, the WBC count was lowest in former smoker-derived Cluster 1 and increased with increasing cluster number among former smokers. In contrast to the WBC findings, former smokers demonstrated greater Emph% and fSAD% than current smokers, based on kernel density estimation (KDE) plots (Fig. 4). The dispersed density distribution of current smokers may indicate a masking effect on CT-based measures of emphysema and small airway disease, compared to former smokers [7]. The Emph% and fSAD% of former smokers (Table 2) were especially increased in Clusters 3 and 4, as compared with their counterparts in current smokers [6].

Table 8 Comparison of major clinical and biomarkers between current and former smokers
Fig. 4 Kernel density estimation (KDE) plots with contour labels based on Emph% and fSAD% for current and former smokers

Decision tree analysis

We performed a decision tree analysis to construct a simple predictive model (Additional file 3: Figure S3) to classify former smokers. The data set was shuffled randomly into training (n = 324) and test sets (n = 82) and the accuracy was assessed on the test set. The model comprising 5 discriminant variables resulted in accuracy of 81%. These variables were βtissueRV (Total), Jacobian (Total), βtissueTLC (Total), Dh* (RMB) and ADI (Total).

We further evaluated the association between current and former smoker clusters by assessing the membership of former smokers in the decision tree of current smokers [6] and vice versa. The classification accuracy for both cases was about 0.62 based on the confusion matrices (Additional file 4: Table S1). This provides an assessment of the possible overlap between the clusters of these two cohorts.
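The evaluation steps in this section, a random split into training and test sets followed by an accuracy computed from a confusion matrix, can be sketched with the standard library alone. The split sizes follow the text (324 training and 82 test subjects out of 406); the random seed, the function names, and the toy labels are assumptions, and no decision tree is fitted here.

```python
import random


def shuffle_split(items, n_test, seed=0):
    """Shuffle a copy of the items and return (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows index the true cluster, columns the predicted cluster."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of subjects on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Split 406 subject indices as in the text
train_ids, test_ids = shuffle_split(range(406), n_test=82)

# Toy cluster labels (0-3), not study results
toy_true = [0, 0, 1, 1, 2, 2, 3, 3]
toy_pred = [0, 1, 1, 1, 2, 3, 3, 3]
toy_matrix = confusion_matrix(toy_true, toy_pred, n_classes=4)
toy_accuracy = accuracy(toy_matrix)
```

The same accuracy function applies to the cross-cohort comparison, where the rows would hold the clusters assigned by one cohort's analysis and the columns the clusters predicted by the other cohort's decision tree.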


In this study, we applied an unsupervised clustering method with an expanded set of imaging-based variables to former COPD smokers collected in the multicenter study of SPIROMICS. Four homogeneous clusters were derived within a former-smoker population, exhibiting distinct phenotypic characteristics and strong associations with clinically relevant COPD biomarkers. The imaging-based clusters can provide more information than the conventional PFT-based classification of COPD, such as stratum and GOLD criteria, because they explain structural and functional alterations at lobar and segmental levels. We also included parenchymal metrics including Emph%, fSAD%, tissue fractions at TLC and RV as well as segmental-level structural metrics including wall thickness and diameter of airway branches. The imaging and clinical phenotypes based on the clusters could be explained as follows.

Features of respective clusters

The cluster memberships suggest possible phenotypes with distinct characteristics that correlate with relevant clinical/biomarker measures for former smokers with COPD.

Cluster 1: asymptomatic resistant smokers with preserved pulmonary function

Cluster 1 showed preserved pulmonary function (FEV1/FVC = 0.72) post-bronchodilator and was mostly populated in GOLD stages 0 and 1. This cluster had relatively low Emph% and fSAD%, with structural and functional characteristics close to those of healthy controls. The BODE index, exacerbation histories and WBC count of this cluster were relatively lower compared with the other clusters. These characteristics, along with CAT < 10 and the lowest exacerbation among all clusters, suggest that Cluster 1 belongs to asymptomatic resistant smokers. Cluster 1 imaging metrics were very close to those of healthy subjects. Airway wall thickening was the only abnormality in this cluster. A large population study, the Multi-Ethnic Study of Atherosclerosis (MESA) [29], reported that long-term smoking may contribute to airway wall thickening prior to the development of more severe imaging features of COPD.

Cluster 2: obese female individuals with preserved lung function and marginal emphysema

Cluster 2, with the highest BMI and an over-representation of women, indicated clinical and epidemiological importance, as reported by Castaldi et al. [10] and Martinez et al. [30]. Castaldi et al. [10] derived four clusters with 10,192 subjects from COPDGene using several imaging-based metrics, e.g., Emph%, upper/lower ratio of Emph%, gas trapping, and PFT results, chosen by a feature selection method. Note that our Cluster 2 is aligned with Cluster 2 of Castaldi et al. [10] in high BMI, African-American and women-dominance. Cluster 2 showed preserved pulmonary function (FEV1/FVC = 0.71) close to that of Cluster 1, but the CAT score and exacerbation of this cluster were greater than those of Cluster 1. This group showed a noticeable increase of tissue fraction at TLC, and a decrease of emphysema index among clusters. This cluster included more CLE-only type while showing the lowest ΔEmph% among clusters. This finding is of interest because most studies showed that the development of CLE is associated with severe abnormalities of the small airways, e.g., wall thickening. Thus, CLE may be more related to air-borne risk factors that cause airway inflammatory processes [27]. Cluster 2 also showed the lowest value of MMPs among clusters. Ostridge et al. [31] investigated the association between specific pulmonary MMPs and emphysema, as these enzymes degrade the extracellular matrix and have been identified as potentially important in the development of emphysema.

Cluster 3: older male individuals with increasing fSAD and emphysema

Unlike Clusters 1 and 2, Cluster 3 demonstrated a significant decrease of FEV1/FVC and FEV1% predicted values, but its FVC % predicted value remained in the normal range. This cluster was mostly populated in GOLD stages 2 and 3 with a significant increase in BODE index. In this cluster, Emph% and fSAD% in parenchymal regions were significantly increased, similar to those of Cluster 4. Thus, this cluster, showing airway narrowing without airway wall thinning, and normal circularity and skeletal structure (airway geometry), would be categorized as an intermediate cluster between the less severe stage (Cluster 1) and the more severe stage of COPD (Cluster 4).

Cluster 4: severe emphysema and fSAD individuals with severe structural alterations

This cluster showed the highest Emph%, fSAD%, BODE index, WBC count and CAT score along with the lowest FEV1/FVC among all clusters. These characteristics, along with the structural and functional variables, indicated that Cluster 4 belongs to severe symptomatic COPD subjects. The pattern of decreasing Dh* with increasing fSAD% (non-emphysematous air trapping) indicates a severely narrowed status of both proximal and distal airways. In addition to airway narrowing, this group contains most of the significant structural and functional alterations. It is especially noted that prominent airway wall thinning and alteration of airway geometry were only observed in this cluster. Assuming that this cluster is the most severe COPD group, alterations of airway features including airway wall thinning (WT*), elliptic airway shape (Cr), and change of airway geometry (θ) may occur at the end stage of COPD.

The dominance of PLE with diffuse destruction in Cluster 4, along with its highest progression index among all clusters, might be related to a blood-borne mechanism rather than the possible air-borne mechanism in Cluster 2. These findings show the possibility of two different pathogenetic mechanisms among subjects. In addition, Koo et al. [32] studied WBC count as a biomarker and its association with the severity of the disease. WBC count in former smokers has an increasing pattern from Cluster 1 to Cluster 4 (Table 6) along with increasing CAT score and decreasing FEV1/FVC.

The comparison of important clinical and biomarker measures between former smokers and the previously analyzed current smokers [6] is shown in Table 8. Overall, exacerbation shows an increasing pattern across the clusters of former smokers, with Cluster 1 the lowest and Cluster 4 the highest. Cluster 2 for both current and former smokers has increased exacerbation compared with Clusters 1 and 3, which might be related to the highest tissue fraction and possible inflammation in Cluster 2.

WBC count was lower in former smokers, possibly due to the effect of smoking on WBC count [6], and it was significantly elevated with increasing cluster number. This result indicates that WBC count can serve as an important indicator of risk factors such as inflammation, especially in former smokers. Furthermore, the CAT score and exacerbation histories were significantly higher in current smokers than in former smokers. An increase in inflammatory markers in current smokers relative to former smokers was contradictory to imaging-based features such as Emph% and fSAD% (Fig. 4). Smoking status could affect parenchymal inflammation, leading to an increase of CT density [6, 7]. Thus, Emph% and fSAD% could be underestimated if patients are currently smoking. This confounding effect prevented us from applying a single clustering algorithm to former and current smokers combined, owing to the low Jaccard index (< 0.7).

To assess a possible overlap between current and former smokers, we used the decision tree trained on current smokers to classify former smokers and vice versa; the classification accuracy for both cases was about 0.62 (the confusion matrices are reported in Additional file 4: Table S1). This result indicates that the two separate clustering analyses of former and current smokers can be further used to investigate the differences in phenotypic characteristics of these cohorts. The impact of smoking status on cluster membership requires further investigation with larger cohorts as well as with longitudinal data to inspect disease progression and membership transition over time.


We performed a cross-sectional study to derive four unique imaging-based clusters in former smokers with COPD. The current cluster analysis can be used in conjunction with our previously reported cluster analyses in current smokers with COPD to assess the differences in smoking status (former vs current) in the COPD population and explore possible different phenotypes between these two groups.

Availability of data and materials

Not applicable.



ADI: Anisotropic deformation index
BMI: Body mass index
Right intermediate bronchus
Cr: Airway luminal circularity
Dh: Hydraulic luminal diameter
Emph%: Emphysema percentage
FEV1: Forced expiratory volume in one second
fSAD%: Functional small airway disease percentage
Jacobian: Determinant of Jacobian matrix
Airway luminal area
Left lower lobe
Left main bronchus
Left upper lobe
Lung shape: Apical-basal distance over ventral-dorsal distance at TLC
Multiscale imaging-based cluster analysis
Principal component analysis
PFT: Pulmonary function test
QCT: Quantitative computed tomography
Right lower lobe
RMB: Right main bronchus
Right middle lobe
Right upper lobe
RV: Residual volume
Sub-grouped left lower lobe with branches of LB6, and LB8 to LB10
sLUL: Sub-grouped left upper lobe with branches of LB1 to LB5
SPIROMICS: Subpopulations and intermediate outcome measures in COPD study
Sub-grouped right lower lobe with branches of RB6 to RB10
Sub-grouped right middle lobe with branches of RB4 to RB5
Sub-grouped right upper lobe with branches of RB1 to RB3
TLC: Total lung capacity
Trifurcation of left lower lobe
U/(M + L)|v: The ratio of air volume changes in upper lobes to those in middle and lower lobes
Airway wall area percentage, i.e., the ratio of wall area to total area
WBC: White blood cell
WT: Airway wall thickness
βtissue, RV: Tissue fraction at RV
βtissue, TLC: Tissue fraction at TLC
ΔVairF: Lobar fraction of air volume change between TLC and RV
θ: Bifurcation angle between two daughter branches


  1. Miniño AM, Murphy SL, Xu J, Kochanek KD. Deaths: final data for 2008. Natl Vital Stat Rep Cent Dis Control Prev Natl Cent Health Stat Natl Vital Stat Syst. 2011;59:1–126.

  2. Couper D, LaVange LM, Han M, Barr RG, Bleecker E, Hoffman EA, Kanner R, Kleerup E, Martinez FJ, Woodruff PG, Rennard S, SPIROMICS Research Group. Design of the Subpopulations and Intermediate Outcomes in COPD study (SPIROMICS). Thorax. 2014;69:491–4.

  3. Agusti A, Calverley PMA, Celli B, Coxson HO, Edwards LD, Lomas DA, MacNee W, Miller BE, Rennard S, Silverman EK, Tal-Singer R, Wouters E, Yates JC, Vestbo J. Evaluation of COPD longitudinally to identify predictive surrogate endpoints (ECLIPSE) investigators. Characterisation of COPD heterogeneity in the ECLIPSE cohort. Respir Res. 2010;11:122.

  4. Sieren JP, Newell JD, Barr RG, Bleecker ER, Burnette N, Carretta EE, Couper D, Goldin J, Guo J, Han MK, Hansel NN, Kanner RE, Kazerooni EA, Martinez FJ, Rennard S, Woodruff PG, Hoffman EA. SPIROMICS protocol for multicenter quantitative computed tomography to phenotype the lungs. Am J Respir Crit Care Med. 2016;194:794–806.

  5. Choi S, Hoffman EA, Wenzel SE, Castro M, Fain S, Jarjour N, Schiebler ML, Chen K, Lin C-L, National Heart, Lung and Blood Institute’s Severe Asthma Research Program. Quantitative computed tomographic imaging-based clustering differentiates asthmatic subgroups with distinctive clinical phenotypes. J Allergy Clin Immunol. 2017;140:690–700.e8.

  6. Haghighi B, Choi S, Choi J, Hoffman EA, Comellas AP, Newell JD, Graham Barr R, Bleecker E, Cooper CB, Couper D, Han MK, Hansel NN, Kanner RE, Kazerooni EA, Kleerup EAC, Martinez FJ, O’Neal W, Rennard SI, Woodruff PG, Lin C-L. Imaging-based clusters in current smokers of the COPD cohort associate with clinical characteristics: the SubPopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS). Respir Res. 2018;19:178.

  7. Shaker SB, Stavngaard T, Laursen LC, Stoel BC, Dirksen A. Rapid fall in lung density following smoking cessation in COPD. COPD. 2011;8:2–7.

  8. Zach JA, Williams A, Jou S-S, Yagihashi K, Everett D, Hokanson JE, Stinson D, Lynch DA, COPDGene Investigators. Current smoking status is associated with lower quantitative CT measures of emphysema and gas trapping. J Thorac Imaging. 2016;31:29–36.

  9. Bodduluri S, Newell JD, Hoffman EA, Reinhardt JM. Registration-based lung mechanical analysis of chronic obstructive pulmonary disease (COPD) using a supervised machine learning framework. Acad Radiol. 2013;20:527–36.

  10. Castaldi PJ, Dy J, Ross J, Chang Y, Washko GR, Curran-Everett D, Williams A, Lynch DA, Make BJ, Crapo JD, Bowler RP, Regan EA, Hokanson JE, Kinney GL, Han MK, Soler X, Ramsdell JW, Barr RG, Foreman M, van Beek E, Casaburi R, Criner GJ, Lutz SM, Rennard SI, Santorico S, Sciurba FC, DeMeo DL, Hersh CP, Silverman EK, et al. Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema. Thorax. 2014;69:415–22.

  11. Burgel P-R, Roche N, Paillasseur J-L, Tillie-Leblond I, Chanez P, Escamilla R, Court-Fortune I, Perez T, Carré P, Caillaud D. Clinical COPD phenotypes identified by cluster analysis: validation with mortality. Eur Respir J. 2012;40:495–6.

  12. Castaldi PJ, Benet M, Petersen H, Rafaels N, Finigan J, Paoletti M, Marike Boezen H, Vonk JM, Bowler R, Pistolesi M, Puhan MA, Anto J, Wauters E, Lambrechts D, Janssens W, Bigazzi F, Camiciottoli G, Cho MH, Hersh CP, Barnes K, Rennard S, Boorgula MP, Dy J, Hansel NN, Crapo JD, Tesfaigzi Y, Agusti A, Silverman EK, Garcia-Aymerich J. Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts. Thorax. 2017;72:998–1006.

  13. Choi S, Haghighi B, Choi J, Hoffman EA, Comellas AP, Newell JD, Wenzel SE, Castro M, Fain SB, Jarjour NN, Schiebler ML, Barr RG, Han MK, Bleecker ER, Cooper CB, Couper D, Hansel N, Kanner RE, Kazerooni EA, Kleerup EAC, Martinez FJ, O’Neal WK, Woodruff PG, Lin C-L. Differentiation of quantitative CT imaging phenotypes in asthma versus COPD. BMJ Open Respir Res. 2017;4:e000252.

  14. Pauwels RA, Buist AS, Calverley PMA, Jenkins CR, Hurd SS. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2001;163:1256–76.

  15. Haghighi B, Ellingwood ND, Yin Y, Hoffman EA, Lin C-L. A GPU-based symmetric non-rigid image registration method in human lung. Med Biol Eng Comput. 2018;56:355–71.

  16. Yin Y, Hoffman EA, Lin C-L. Mass preserving nonrigid registration of CT lung images using cubic B-spline. Med Phys. 2009;36:4213–22.

  17. Choi S, Hoffman EA, Wenzel SE, Castro M, Fain SB, Jarjour NN, Schiebler ML, Chen K, Lin C-L. Quantitative assessment of multiscale structural and functional alterations in asthmatic populations. J Appl Physiol Bethesda Md. 2015;118:1286–98.

  18. Jahani N, Choi S, Choi J, Haghighi B, Hoffman EA, Comellas AP, Kline JN, Lin C-L. A four-dimensional computed tomography comparison of healthy and asthmatic human lungs. J Biomech. 2017;56:102–10.

  19. Choi S, Hoffman EA, Wenzel SE, Tawhai MH, Yin Y, Castro M, Lin C-L. Registration-based assessment of regional lung function via volumetric CT images of normal subjects vs. severe asthmatics. J Appl Physiol Bethesda Md. 2013;115:730–42.

  20. Galbán CJ, Han MK, Boes JL, Chughtai KA, Meyer CR, Johnson TD, Galbán S, Rehemtulla A, Kazerooni EA, Martinez FJ, Ross BD. Computed tomography-based biomarker provides unique signature for diagnosis of COPD phenotypes and disease progression. Nat Med. 2012;18:1711–5.

  21. Choi S, Hoffman EA, Wenzel SE, Castro M, Lin C-L. Improved CT-based estimate of pulmonary gas trapping accounting for scanner and lung-volume variations in a multicenter asthmatic study. J Appl Physiol. 2014;117:593–603.

  22. Hoffman EA. Effect of body orientation on regional lung expansion: a computed tomographic approach. J Appl Physiol Bethesda Md. 1985;59:468–80.

  23. Ledesma RD. Determining the number of factors to retain in EFA: an easy-to-use computer program for carrying out parallel analysis; 2007.

  24. Brock G, Pihur V, Datta S, Datta S. clValid: an R package for cluster validation. J Stat Softw. 2008;25:1–22.

  25. Hennig C. Cluster-wise assessment of cluster stability. Comput Stat Data Anal. 2007;52:258–71.

  26. Baier D, Decker R, Schmidt-Thieme L. Data Analysis and Decision Support. Berlin Heidelberg: Springer-Verlag; 2005.

  27. Cosio Piqueras MG, Cosio MG. Disease of the airways in chronic obstructive pulmonary disease. Eur Respir J Suppl. 2001;34:41s–9s.

  28. Dougherty T. Quantitative computed tomography based measures of vascular dysfunction for identifying COPD phenotypes and subphenotypes. Theses Diss. 2016.

  29. Donohue KM, Hoffman EA, Baumhauer H, Guo J, Budoff M, Austin JHM, Kalhan R, Kawut S, Tracy R, Barr RG. Cigarette smoking and airway wall thickness on CT scan in a multi-ethnic cohort: the MESA lung study. Respir Med. 2012;106:1655–64.

  30. Martinez FJ, Curtis JL, Sciurba F, Mumford J, Giardino ND, Weinmann G, Kazerooni E, Murray S, Criner GJ, Sin DD, Hogg J, Ries AL, Han M, Fishman AP, Make B, Hoffman EA, Mohsenifar Z, Wise R, National Emphysema Treatment Trial Research Group. Sex differences in severe pulmonary emphysema. Am J Respir Crit Care Med. 2007;176:243–52.

  31. Ostridge K, Williams N, Kim V, Harden S, Bourne S, Coombs NA, Elkington PT, Estepar RSJ, Washko G, Staples KJ, Wilkinson TMA. Distinct emphysema subtypes defined by quantitative CT analysis are associated with specific pulmonary matrix metalloproteinases. Respir Res. 2016;17:92.

  32. Koo H-K, Kang HK, Song P, Park HK, Lee S-S, Jung H. Systemic white blood cell count as a biomarker associated with severity of chronic obstructive lung disease. Tuberc Respir Dis. 2017;80:304–10.


The authors thank the SPIROMICS participants and participating physicians, investigators and staff for making this research possible. More information about the study and how to access SPIROMICS data is available on the SPIROMICS website. We would like to acknowledge the following current and former investigators of the SPIROMICS sites and reading centers: Neil E Alexis, PhD; Wayne H Anderson, PhD; Igor Barjaktarevic, MD, PhD; R Graham Barr, MD, DrPH; Eugene R Bleecker, MD; Richard C Boucher, MD; Russell P Bowler, MD, PhD; Elizabeth E Carretta, MPH; Stephanie A Christenson, MD; Alejandro P Comellas, MD; Christopher B Cooper, MD, PhD; David J Couper, PhD; Gerard J Criner, MD; Ronald G Crystal, MD; Jeffrey L Curtis, MD; Claire M Doerschuk, MD; Mark T Dransfield, MD; Christine M Freeman, PhD; MeiLan K Han, MD, MS; Nadia N Hansel, MD, MPH; Annette T Hastie, PhD; Eric A Hoffman, PhD; Robert J Kaner, MD; Richard E Kanner, MD; Eric C Kleerup, MD; Jerry A Krishnan, MD, PhD; Lisa M LaVange, PhD; Stephen C Lazarus, MD; Fernando J Martinez, MD, MS; Deborah A Meyers, PhD; Wendy C Moore, MD; John D Newell Jr., MD; Laura Paulin, MD, MHS; Stephen Peters, MD, PhD; Cheryl Pirozzi, MD; Elizabeth C Oelsner, MD, MPH; Wanda K O’Neal, PhD; Victor E Ortega, MD, PhD; Robert Paine, III, MD; Nirupama Putcha, MD, MHS; Sanjeev Raman, MBBS, MD; Stephen I Rennard, MD; Donald P Tashkin, MD; J Michael Wells, MD; Robert A Wise, MD; and Prescott G Woodruff, MD, MPH. The project officers from the Lung Division of the National Heart, Lung, and Blood Institute were Lisa Postow, PhD, and Thomas Croxton, PhD, MD.


Support for this study was provided, in part, by NIH grants U01-HL114494, R01-HL112986 and S10-RR022421, by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A1B03034157), and by the Korea Ministry of Environment (MOE) as the Environmental Health Action Program (RE201806039). SPIROMICS was supported by contracts from the NIH/NHLBI (HHSN268200900013C, HHSN268200900014C, HHSN268200900015C, HHSN268200900016C, HHSN268200900017C, HHSN268200900018C, HHSN268200900019C, HHSN268200900020C), and supplemented by contributions made through the Foundation for the NIH and the COPD Foundation from AstraZeneca/MedImmune; Bayer; Bellerophon Therapeutics; Boehringer-Ingelheim Pharmaceuticals, Inc.; Chiesi Farmaceutici S.p.A.; Forest Research Institute, Inc.; GlaxoSmithKline; Grifols Therapeutics, Inc.; Ikaria, Inc.; Nycomed GmbH; Takeda Pharmaceutical Company; Novartis Pharmaceuticals Corporation; ProterixBio; Regeneron Pharmaceuticals, Inc.; Sanofi; and Sunovion.

Author information

BH and SC are co-first authors with equal authorship. Conception and design: BH, SC; acquisition of data: BH, SC, EAH, APC, JDN, RGB, EB, CBC, DC, MH, NNH, REK, EAK, ECK, FJM, WO, SIR, BMS, PGW, CLL; analysis and interpretation of data: all authors; drafting the article or revising it critically for important intellectual content: BH, SC, JC, EAH, CLL; final approval of the version to be published: all authors.

Corresponding author

Correspondence to Ching-Long Lin.

Ethics declarations

Ethics approval and consent to participate

Ethics and consent were approved by the SPIROMICS committee.

Consent for publication

The paper was approved by the SPIROMICS publications and presentation committee.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Figure S1. A scree plot: eigenvalues (magnitude of variances) according to the number of principal components for determining the optimal number of components. (DOCX 65 kb)

Additional file 2:

Figure S2. (a) Internal properties of different clustering methods, used to find the best clustering approach and the optimal number of clusters; (b) Bootstrapping stability analysis comparing K-means and hierarchical clustering with 4 or 5 clusters. (DOCX 58 kb)

Additional file 3:

Figure S3. Predicting imaging-based clusters using only 5 important variables: βtissueRV (Total), Jacobian (Total), βtissueTLC (Total), Dh* (RMB) and ADI (Total), with 81% accuracy. (DOCX 59 kb)

Additional file 4:

Table S1. The confusion matrices to assess the possible overlap between current and former smoker clusters. Values are presented as the number of subjects (%). (DOCX 15 kb)

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Haghighi, B., Choi, S., Choi, J. et al. Imaging-based clusters in former smokers of the COPD cohort associate with clinical characteristics: the SubPopulations and intermediate outcome measures in COPD study (SPIROMICS). Respir Res 20, 153 (2019).


Keywords

  • COPD
  • Emphysema
  • Functional small airway disease
  • Former smokers
  • Imaging-based cluster analysis