Derivation and validation of clinical phenotypes for COPD: a systematic review

Background: The traditional classification of COPD, which relies solely on spirometry, fails to account for the complexity and heterogeneity of the disease. Phenotyping is a method that attempts to derive a single or combination of disease attributes that are associated with clinically meaningful outcomes. Deriving phenotypes entails the use of cluster analyses, and helps individualize patient management by identifying groups of individuals with similar characteristics. We aimed to systematically review the literature for studies that had derived such phenotypes using unsupervised methods.
Methods: Two independent reviewers systematically searched multiple databases for studies that performed validated statistical analyses, free of definitive pre-determined hypotheses, to derive phenotypes among patients with COPD. Data were extracted independently.
Results: 9156 citations were retrieved, of which 8 studies were included. The number of subjects ranged from 213 to 1543. Most studies appeared to be biased: patients were more likely to be male, to have severe disease, and to have been recruited in tertiary care settings. Statistical methods used to derive phenotypes varied by study. The number of phenotypes identified ranged from 2 to 5. Two phenotypes with poor longitudinal health outcomes were common across multiple studies: a phenotype of young patients with severe respiratory disease, few cardiovascular co-morbidities, poor nutritional status and poor health status, and a phenotype of older patients with moderate respiratory disease, obesity, and cardiovascular and metabolic co-morbidities.
Conclusions: The recognition that two phenotypes of COPD were often reported may have clinical implications for altering the course of the disease. This review also provided important information on limitations of phenotype studies in COPD and the need for improvement in future studies.
Electronic supplementary material The online version of this article (doi:10.1186/s12931-015-0208-4) contains supplementary material, which is available to authorized users.


Background
Chronic Obstructive Pulmonary Disease (COPD) is the 4th leading cause of mortality worldwide, and causes significant morbidity [1,2]. Criteria have been developed to diagnose and grade the severity of COPD based on the post-bronchodilator FEV1 [3]. However, these criteria fail to account for the complexity and heterogeneity of the disease, including its variable symptomatic manifestations, progression and prognosis. With progress in the understanding of the disease, and developments in the fields of radiology, genetics, and statistical analysis, identifying phenotypes of the disease with the aim of individualizing treatment has gained significance.
A COPD phenotype has been defined as "A single or combination of disease attributes that describe differences between individuals with COPD as they relate to clinically meaningful outcomes (symptoms, exacerbations, response to treatment, speed of progression of the disease or death)" [4]. Identifying such phenotypes entails use of statistical techniques such as cluster analysis [5]. The goal of cluster analysis is to assign subjects to groups, where subjects in the same cluster are more similar to each other than they are to subjects in other groups.
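To make the goal of cluster analysis concrete, the following is a minimal sketch of one common algorithm, k-means, written in plain Python. It is an illustration only and is not the method of any particular included study. The six subjects, their standardized (z-score) values for age and FEV1, and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each subject joins the cluster with the closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical standardized values: (age z-score, FEV1 z-score) per subject.
subjects = [(-1.2, -1.0), (-0.9, -1.3), (-1.1, -0.8),
            (1.0, 0.9), (1.3, 0.7), (0.8, 1.1)]
labels, centroids = kmeans(subjects, [subjects[0], subjects[3]])
print(labels)
```

In this toy data the first three subjects (younger, lower FEV1) end up in one cluster and the last three in the other. With real data, the choice of variables, their scaling, the number of clusters and the starting centroids can all change the result.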
This review systematically searches the available literature to identify studies that have derived clinical phenotypes in COPD using validated statistical methods, free of definitive pre-determined hypotheses. We aimed to summarize such studies, and the robustness of findings, by identifying the combination of disease attributes in patients with COPD, over and above the traditional spirometric classification of disease, that describe differences between individuals as they relate to clinically meaningful outcomes. We also aimed to identify the limitations of the studies and what needed to be done to improve this field of research.

Methods
A detailed protocol was written prior to starting the review, which identified the key question, and the criteria for the systematic review. We used the Preferred Reporting Items for Systematic Reviews and Meta-Analysis statement [6] as the template for reporting the review.
We searched the literature for studies that enrolled patients with COPD, as defined by GOLD criteria [3], in which unsupervised (hypothesis-free) cluster analyses were used to derive clinical phenotypes as they related to clinically meaningful outcomes.

Search strategy
The following databases were searched for relevant studies: MEDLINE; Web of Science (via ThomsonReuters, 1996 to 22/Apr/2013); Scopus (via Elsevier, 1996 to 22/Apr/2013); and CENTRAL (via Cochrane Library). The search strategy used text words and relevant indexing to answer the following question: What are the different phenotypes of COPD that have been derived, based on subject characteristics? The full MEDLINE strategy (Appendix 1) was applied to all databases, with modifications to search terms as necessary. Further studies were identified in Web of Science and Scopus (02/Oct/2013) by carrying out citation searches for studies citing the included studies, as well as by examining their reference lists. The MEDLINE strategy was rerun prior to submission (two relevant studies were found, one of which was included).

Selection of studies
We selected studies that included at least 50 patients who were 18 years and above, and in which a statistical method was used to identify clusters of subjects. We excluded abstracts, reviews and commentaries. Studies that exclusively enrolled subjects with alpha-1 antitrypsin deficiency or those that tested single risk factors, including genetic polymorphisms, for association with outcomes in COPD were excluded. Studies that tested empirically defined phenotypes without an analytical justification of these phenotypes, and those that did not analyze the association of the derived clusters with clinically meaningful outcomes were also excluded.

Data extraction and consensus
Two review authors (LMP and MA) independently scrutinized titles and abstracts for eligibility. Citations deemed relevant by either reviewer were selected and the papers retrieved for full-text review. Each eligible article was independently assessed by the same two reviewers. Disagreements were resolved by discussion between the reviewers. Study quality was assessed using the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) checklist [7].
The two review authors (LMP and MA) independently extracted data from each study using a data extraction form. Disagreements were resolved by discussion. Data were extracted for various characteristics, including the following: author, publication year, study design, number of participants, criteria for inclusion and exclusion, statistical methods for cluster analysis and outcomes assessed.

Statistical analysis
Descriptive statistics for the clinical features most commonly used in the routine care of COPD patients were collected and tabulated from the included studies. Due to significant heterogeneity in selection of patients, cluster analyses and derived phenotypes, pooling of results was not deemed appropriate.

Studies selected for the systematic review
We identified 9156 citations, of which 4068 unique published articles were identified after exclusion of duplicate articles. After screening titles and abstracts, 284 studies were found to satisfy the criteria for further review and their full-texts were retrieved. After full-text review, 276 studies were excluded for various reasons, and 8 studies (7 observational studies, one study with data pooled from two randomized controlled trials) were included in the systematic review. Figure 1 summarizes the selection criteria for the included studies.
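The selection flow above can be checked with simple arithmetic. The sketch below uses the four counts reported in the text; the variable names are illustrative, and the two derived counts (duplicates removed, records excluded at title and abstract screening) are computed here rather than quoted from the article.

```python
# Counts reported in the text.
identified = 9156          # citations retrieved
unique = 4068              # after exclusion of duplicates
full_text = 284            # retrieved for full-text review
excluded_full_text = 276   # excluded after full-text review

# Counts implied by the reported figures (computed, not quoted).
duplicates_removed = identified - unique
excluded_on_screening = unique - full_text
included = full_text - excluded_full_text

print(duplicates_removed, excluded_on_screening, included)
```

The final figure agrees with the 8 studies included in the review.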
In the 8 studies identified [8][9][10][11][12][13][14][15], all conducted in Europe and USA, the median sample size included in the analyses was 332 subjects (range 213-1543). Of the 7 studies that described the setting, only one study had a subset of its enrolled subjects recruited from a community-based lung cancer screening study [10]; all the other studies were conducted in a university-based tertiary care setting. A summary of the study characteristics of the included studies can be found in Table 1; a more detailed description can be found in the online supplement, Additional file 1: Table S1.

Studies reporting derivation of phenotypes for COPD that were excluded
We excluded 8 studies that involved the derivation of phenotypes for COPD. Five studies were excluded because none of them analyzed the association of the derived phenotypes with any clinically meaningful outcome. The two studies by Paoletti et al. [16] and Pistolesi et al. [17] derived a set of 9 variables to predict airway-obstructive or parenchymal-destructive phenotypes, based on cluster analysis of computerized tomography (CT) parameters. The study by Camiciottoli et al. used principal component analysis (PCA) to derive 2 variables that represented the phenotype (airway obstruction versus parenchymal destruction) and the severity of the disease using CT scans, and then used multivariable regression to derive a set of variables to predict the two PCA-transformed variables [18]. Roy et al. [19] performed principal component analysis to derive four components to explain the variability in a dataset of 127 COPD patients. A recent study by Fens et al. [20] used factor analysis followed by k-means cluster analysis to derive four phenotypes for COPD using clinical, CT scan and breathomics-derived variables. We also excluded 3 studies [21][22][23] as they included subjects with both asthma and COPD and, in addition, did not report an association with clinical outcomes.

COPD phenotyping in the selected studies
There was heterogeneity in the patient characteristics across studies. One study only included patients with a post-bronchodilator FEV1 ≤ 45% predicted [11], one study included patients with a pre-bronchodilator FEV1 ≤ 45% predicted [12] and one study only included subjects with GOLD 2-4 [15]. GOLD 1 subjects were underrepresented in the studies that reported the classification of subjects by GOLD [8][9][10]. Women were underrepresented across studies too; the proportion of women ranged from 7% [13] to 46% [12]. Table 2 summarizes the characteristics of the subjects in the selected studies for the systematic review; a more detailed description can be found in the online supplement, Additional file 1: Table S2.
There was significant heterogeneity in the methods used for cluster analysis; these are summarized in Table 3, and a detailed summary of the variables included in each analysis is reported in the online supplement, Additional file 1: Table S3. None of the studies validated the derived phenotypes in an external cohort for clinically meaningful outcomes. Three studies prospectively validated the phenotypes with mortality data of the cohort from which the phenotypes were derived [9,10,13], and one of these also analyzed hospital admissions [13]. One of the studies [12] pooled the data from two randomized controlled trials [24,25] to analyze the change in the frequency of exacerbations in individuals who were randomized to receive salmeterol/fluticasone propionate (SFC) compared to those who received salmeterol alone (SAL), and whether this response varied based on the phenotypes that were derived. The other studies validated the derived phenotypes with predicted mortality scores (Table 3).
Table 4 provides a summary of the derived phenotypes and their associations with studied outcomes; a more detailed description can be found in the online supplement, Additional file 1: Table S4. The number of phenotypes ranged from 2 to 5. Five of the studies described a phenotype of younger individuals with severe respiratory disease, a low prevalence of cardiovascular co-morbidities, and a high prevalence of poor nutritional status and poor health status [8][9][10][11][15]. In two of these studies, women were significantly over-represented in this phenotype [10,15]. A similar phenotype was reported in 2 other studies [13,14], but neither of these studies reported the patients to be younger than the rest of the cohort. Subjects with this phenotype had poor outcomes across the studies.
All studies reported a phenotype of older individuals with moderate respiratory disease and a high prevalence of obesity. In addition, 6 studies reported subjects with this phenotype to have an increased prevalence of cardiovascular and metabolic co-morbidities and inflammatory markers [8][9][10][12][13][15]. Subjects with this phenotype were reported to have a worse prognosis than individuals of comparable age and respiratory disease status and, in the study by DiSantostefano et al. [12], they were found to have a lower frequency of exacerbations when treated with SFC rather than SAL alone.

Assessment of quality of studies
The 8 studies were in compliance with most of the items on the STROBE checklist. There were methodological limitations with some of the studies. Only one study reported a rationale for the sample size [13]. Two studies neither reported the numbers of individuals at each stage of the study nor used a flow diagram [14,15]. Only two studies conducted sensitivity analyses: one tested the robustness of the results to using the lower limit of normal (LLN) instead of the fixed ratio for the diagnosis of COPD [8], and the other repeated the cluster analysis using an alternative method, and also analyzed the results in subsets of included participants [13]. Only two studies imputed missing values for variables: one used the median for continuous variables and the most frequent value for categorical variables [12], and the other used multiple imputation by chained equations [13]. One study reported the validation of the phenotypes with contingency tables, but failed to report any specific details [14].
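The simpler of the two imputation approaches mentioned above (median for continuous variables, most frequent value for categorical variables) can be sketched as follows. This is an illustration of the general rule, not the code of the cited study; the function name, variable names and data are hypothetical. Multiple imputation by chained equations is more involved and is not shown.

```python
from statistics import median, mode

def impute(values, kind):
    """Fill missing entries (None) with the median or the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical data with missing entries marked as None.
bmi = [22.5, None, 31.0, 27.4, None]
smoking_status = ["former", "current", None, "former"]

bmi_filled = impute(bmi, "continuous")
smoking_filled = impute(smoking_status, "categorical")
print(bmi_filled)
print(smoking_filled)
```

Single-value imputation of this kind keeps every subject in the analysis, but it understates the variability of the imputed variables, which is one reason multiple imputation is often preferred.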

Discussion
This systematic review of the literature for COPD phenotype studies using unsupervised (free of definitive pre-determined hypotheses) statistical analyses to derive clusters associated with clinically relevant outcomes yielded 8 studies with significant heterogeneity in the selection of subjects, statistical methods used and outcomes validated. Despite these differences, two clinical phenotypes were consistently found across most studies, and may have clinical implications.
One of the phenotypes, describing younger individuals with severe respiratory disease, a low prevalence of cardiovascular co-morbidities, a high prevalence of poor nutritional status, and poor health status with poor longitudinal health outcomes, may be important for two reasons. Firstly, although the derivation of phenotypes was cross-sectional, it is likely that such individuals experience a rapid decline in lung function; therefore, recognizing this phenotype at a younger age and treating the disease aggressively, along with measures to support smoking cessation, could have important prognostic implications. Secondly, given the low prevalence of cardiovascular co-morbidities, it is likely that such individuals would be good candidates for lung transplantation. Longitudinal cohort studies that include younger patients early in the course of their disease, and follow them closely to understand the differential progression of disease, would be vital to a better understanding of this phenotype. Two of the studies found a high proportion of women in the sub-group of individuals with this phenotype, and a stronger pattern might emerge if more women are included in such studies. This phenotype was also associated with a lower height in one study [13], and the authors hypothesized that impaired in-utero and childhood lung growth may have been a contributing factor to the severity of disease. This association needs to be further validated.
The other phenotype, of older individuals with moderate respiratory disease, a high prevalence of obesity, and an increased prevalence of cardiovascular and metabolic co-morbidities and inflammatory markers, is important as such individuals appear to have worse health outcomes than individuals with comparable respiratory disease and fewer co-morbidities. Contrary to the hypothesis of a "chronic systemic inflammatory syndrome" [26] that proposes a systemic inflammation attributable to COPD, Garcia-Aymerich et al. [13] found that bronchial inflammatory markers were not associated with systemic inflammatory markers. This suggests that the worse health outcomes might be caused by the co-morbidities and, consequently, that screening for and optimally treating the co-morbidities might be associated with better health outcomes. The study by DiSantostefano et al. also found that patients with this phenotype had a decreased frequency of moderate/severe exacerbations when treated with SFC, when compared to those with the same phenotype randomized to treatment with SAL alone [12].
The study by Vanfleteren et al. [15] identified a phenotype with a very high prevalence of anxiety and depression. A third of such patients had a myocardial infarction, a finding that, as suggested by the authors, is consistent with reports of anxiety being the strongest predictor of mortality in COPD [27]. The study by DiSantostefano et al. also found that a high proportion of individuals in the group with cardiovascular co-morbidities were on psycholeptic drugs [12,14]. This is important as it is well established that depression and anxiety adversely affect prognosis in COPD, conferring an increased risk of exacerbation and possibly death [28]. Conversely, COPD also increases the risk of developing depression. Future studies should therefore screen patients with COPD for psychiatric disorders for a better understanding of this phenotype.
The present GOLD classification of COPD acknowledges the complexity of the disease, and has included measures of risk, in addition to classification based on severity of the spirometric abnormality [3]. A comparison of the predictive ability of phenotypes and that of the present GOLD classification with regard to clinically relevant outcomes would help in our understanding of the disease process, and needs to be explored by future studies. Several methodological limitations were observed in the studies, and the elaboration of these limitations will hopefully serve to guide future studies. In a biased sample, the derived phenotypes merely reflect the bias in selection, and not necessarily the heterogeneity of disease. Of the 7 studies that specified the setting, all [8,9,11,13,14,15] except a subset of one study [10] were conducted in university-based referral centers, and patients with mild, early disease were under-represented. In most developed countries, the prevalence of COPD in women has been reported to be comparable to that of men in recent years [29] and it therefore appears that women were under-represented in most included studies. Four studies [8][9][10][11] reported missing data, and only subjects with complete data were analyzed. This led to a significant proportion of patients being excluded; the excluded patients were found to be statistically significantly different from those who were included on several important parameters. Imputation strategies, including multiple imputation, as employed by two studies [12,13], may be of use in the design of future studies.
[Table fragment] Outcome analyzed, Framingham 10-year risk, %: 8.6 (6.6), 11.5 (6.6), 7.6 (6), 11.9 (7.3), 6.6 (4.5). *BOD score: score calculated using body mass index (BMI), obstruction (FEV1 % pred) and dyspnoea evaluated on the modified Medical Research Council (MMRC) scale. § The study was a longitudinal analysis of outcomes for the same cohort enrolled in the study by Burgel
Studies used different statistical methodologies for the derivation of phenotypes. Three studies [8][9][10] used principal component analysis for reducing the dimensionality of the data; the clinical meaning of principal components can be difficult to interpret. Cho et al. [11] used factor analysis, followed by several clustering algorithms, a method that is considered robust. However, the study had a sample that was biased toward very severe disease, and claims about the external validity of the results need to be tempered. DiSantostefano et al. [12] used tree-based supervised cluster analysis using modified recursive partitioning to derive clusters. Self-organizing maps, used in one study [15], are a newer method that needs further validation. A summary of the strengths and weaknesses of the various clustering techniques can be found in a recent review by the authors of this study [30]. Only one study internally validated the clusters by splitting the dataset for derivation and validation [12], but none of the studies validated the derived phenotypes in an external population, limiting the validity of these derived phenotypes. Few studies investigated the robustness of the derived phenotypes to (i) the statistical methods used to perform the clustering and (ii) the variables used to define the clusters. The use of prognostic indicators such as the BODE index and contingency tables, employed by three of the studies [8,14,15], is far less robust than the use of clinical endpoints to validate the phenotypes, and this limits the validity of the results of these studies. Two of the studies included subjects enrolled in randomized controlled trials (RCTs) [11,12], and it is likely that the populations selected were less heterogeneous than those based in the community, and that the studies suffered from the selection biases inherent to RCTs.
Lastly, as all of the studies derived phenotypes from cross-sectional data, the stability of these phenotypes over time, and the effect of medications and interventions on them, remain unknown and need to be examined in prospective studies.
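One way to quantify the robustness of derived clusters to the choice of statistical method is to compare the partitions that two methods produce for the same subjects. The sketch below uses the Rand index, a standard agreement measure that is swapped in here for illustration; it is not reported as having been used by the included studies, and the cluster labels are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Proportion of subject pairs on which two partitions agree.

    A pair counts as agreement when both partitions put the two subjects
    in the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for the same six subjects from two methods.
method_1 = [0, 0, 0, 1, 1, 1]
method_2 = [1, 1, 0, 0, 0, 0]
ri = rand_index(method_1, method_2)

# Cluster numbering is arbitrary: a pure relabeling gives perfect agreement.
ri_relabeled = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
print(round(ri, 3), ri_relabeled)
```

A value of 1 means the two partitions are identical up to relabeling. Chance-corrected variants such as the adjusted Rand index are generally preferable when the number or size of clusters differs between methods.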
The strengths of this systematic review include an a priori protocol, a detailed literature search with no language restrictions, performed by a librarian (TL), independent review by two reviewers at every stage of the review, and exhaustive data extraction from included studies. The involvement of biostatisticians (AB, TZ) with expertise in this research field significantly helped our understanding of the literature.
A major limitation of the study is the exclusion of studies that included patients with asthma. This possibly explains the absence of the asthma-COPD overlap phenotype [31] in the included studies. However, the exclusion was per protocol, with the aim of keeping the systematic review focused on answering a specific question. We also excluded studies that tested specific hypothesis-driven or empiric phenotypes without a derivation study, reflecting the specific research question that the review aimed to answer.

Conclusion
This systematic review of the literature identified studies in which two phenotypes of COPD were reported often, representing different aspects of the disease spectrum, and recognizing these phenotypes and treating them optimally may have implications for altering the course of the disease. However, the selection of specific subsets of patients, evident in the existing studies, limits the generalizability of the results. This systematic review has provided important information on limitations of the phenotype studies in COPD and the need for improvement in future research. There is a need for sampling different populations in the COPD disease spectrum, to mirror the population of COPD patients at large, including women, never smokers, and those with mild disease. There is also a need for longitudinal studies of patients with COPD to validate in the same cohort and to explore sources of variability in phenotypes and their temporal nature as well as to allow an iterative validation process in which candidate phenotypes are identified before their relevance to clinical outcome is determined.
OvidSP 1969 to 2013 Week 20); Web of Science (via ThomsonReuters 1996 to 22/Apr/2013); Scopus (via Elsevier 1996 to 22/Apr/2013); CENTRAL (via Cochrane Library). The search strategy used text words and relevant indexing to answer the following question: What are the different phenotypes of COPD, based on subject characteristics? The full MEDLINE strategy, shown below, was applied to all databases, with modifications to search terms as necessary.