Predicting survival of patients with idiopathic pulmonary fibrosis using GAP score: a nationwide cohort study

Background: The clinical course of idiopathic pulmonary fibrosis (IPF) varies widely. Although the GAP model is useful for predicting mortality, survival has not yet been validated for each GAP score. We aimed to elucidate how prognosis is related to GAP score and GAP stage in IPF patients. Methods: The Korean Interstitial Lung Disease Study Group conducted a national survey to evaluate various characteristics of IPF patients from 2003 to 2007. Patients were diagnosed according to the 2002 ATS/ERS criteria. We enrolled 1,685 patients with IPF; 1,262 had undergone DLCO measurement. Patients were stratified by GAP score (0–7): Group 0 (n = 26), Group 1 (n = 150), Group 2 (n = 208), Group 3 (n = 376), Group 4 (n = 317), Group 5 (n = 138), Group 6 (n = 39), and Group 7 (n = 8). Results: Higher GAP score and higher GAP stage were each associated with a poorer prognosis (p < 0.001 for both). Survival time in Group 3 was shorter than in Groups 1 and 2 (p = 0.043 and p = 0.039, respectively) and longer than in Groups 4, 5, and 6 (p = 0.043, p = 0.032, and p = 0.003, respectively). Gender, age, and DLCO (%) differed significantly between Groups 2 and 3. All four variables in the GAP model differed significantly between Groups 3 and 4. Conclusion: The GAP system showed significant predictive ability for mortality in IPF patients. However, prognosis in IPF patients with a GAP score of 3 was significantly different from that in the other stage I groups and in the stage II groups in this cohort of Asian patients. Electronic supplementary material: The online version of this article (doi:10.1186/s12931-016-0454-0) contains supplementary material, which is available to authorized users.


Background
Idiopathic pulmonary fibrosis (IPF) is a specific form of diffuse interstitial lung disease (DILD) that mainly occurs in adults over the age of 50 [1]. It is a chronic, progressive, irreversible, fibrosing interstitial pneumonia that is limited to the lungs [2]. While the etiology of IPF is unknown, it is related to a histological and/or radiological "usual interstitial pneumonia" (UIP) pattern [1]. Morbidity and mortality are high in IPF (the median survival time is only 2.5 to 3.5 years), and the clinical course and prognosis vary widely among individual patients [3]. This high variability makes predicting prognosis difficult, which in turn causes problems with treatment planning. Therefore, physicians must be better equipped to predict the clinical course of IPF if they are to provide precise prognoses and adequate treatment to patients.
Previous studies have shown that age, gender, lung function change, radiological pattern, histological variability, dyspnoea, cough, pulmonary artery hypertension, amount of elastic fiber, and some molecular biomarkers are associated with prognosis [4][5][6][7][8][9][10]. Some investigators have attempted to predict clinical course using these prognostic factors [11]. However, none of these predictive models has been widely adopted, as they are difficult to use or lack external validation. In 2012, Ley et al. suggested a novel system for staging IPF that is similar to those used in asthma, chronic obstructive pulmonary disease (COPD), and lung cancer [12]. The so-called GAP index and staging system uses four variables: gender (G), age (A), and two pulmonary physiological parameters (P), namely percentage predicted forced vital capacity (FVC [%]) and percentage predicted diffusion capacity of the lungs for carbon monoxide (DLCO [%]). These four variables are commonly measured at the initial visit and are easily followed up. This system has helped clinicians to predict prognosis and decide on management strategies. Although the GAP model is simple to use for predicting mortality, prognoses have not yet been evaluated for each GAP score. The purpose of our study was to validate, using national survey data, how prognosis is related to GAP score and GAP stage in patients with IPF.

Patient selection
The study involved patients who had been diagnosed with idiopathic interstitial pneumonia (IIP) at 54 university and teaching hospitals between January 1, 2003 and December 31, 2007. At each hospital, pulmonary specialists (pulmonologists, chest radiologists, and pathologists) had confirmed the diagnoses, and data were reviewed by the Scientific Committee at the Korean Academy of Tuberculosis and Respiratory Diseases. IPF was diagnosed on the basis of the 2002 criteria of the American Thoracic Society/European Respiratory Society (ATS/ERS) [13]. Initially, we excluded patients who had a history of connective tissue disease, pneumoconiosis, or ingestion of either a cytotoxic agent or amiodarone, all of which are well known to cause interstitial lung disease. Additionally, we excluded patients with suspected chronic hypersensitivity pneumonitis; such decisions were made on the basis of history, laboratory data, and committee conference.
In total, 2,186 patients with idiopathic interstitial pneumonia (IIP) were registered; of these, patients with forms of ILD other than IPF (n = 501) were excluded from the study, as were patients who had not undergone pulmonary function testing (PFT) that included DLCO measurement (n = 423). Ultimately, 1,262 patients were included in the study: 760 at GAP stage I, 455 at stage II, and 47 at stage III (Fig. 1). We reviewed the clinical, radiological, and physiological data of all the included patients. With regard to physiological data, we investigated FVC, FVC (%), forced expiratory volume in one second (FEV1), percentage predicted FEV1 (FEV1 [%]), total lung capacity (TLC), percentage predicted TLC (TLC [%]), DLCO, and percentage predicted DLCO (DLCO [%]). In addition, we evaluated patients' C-reactive protein (CRP) levels and tested their blood for antinuclear antibody (ANA) and rheumatoid factor (RF) positivity. The composite physiologic index (CPI), which is a predictive model for IPF prognosis, was calculated as reported by Wells et al. [14]. All hospital data were entered into the ILD web-based registry (http://www.ild.or.kr/).

GAP model
Total GAP score was calculated using the method suggested by Ley et al. [12] (Table 1). All four clinical variables were examined: gender (woman: 0 points, man: 1 point), age (0-2 points), FVC (%) (0-2 points), and DLCO (%) (0-3 points). We then divided the patients on the basis of GAP score (Groups 0-7): Group 0 (n = 26), Group 1 (n = 150), Group 2 (n = 208), Group 3 (n = 376), Group 4 (n = 317), Group 5 (n = 138), Group 6 (n = 39), and Group 7 (n = 8). In the physiological category, the "cannot perform" classification (3 points) of DLCO measurement had not been recorded in the data used. For this reason, the total GAP score of 8 was not investigated in the current study. Additionally, we excluded patients with total GAP scores of 0 (n = 26) and 7 (n = 8), as these two groups contained far fewer patients than the other groups. The characteristics in Group 0, which contained only women who had never smoked, were significantly different from those in the other groups.

Fig. 1 caption: A total of 1262 IPF patients were analysed in this study, excluding 501 with other interstitial lung disease and 423 who had not undergone pulmonary function testing that had included DLCO. Note: Groups with a total GAP score of 0 and 7 were excluded because they contained too few patients and because the baseline characteristics of patients with GAP score 0 were significantly different (all women, never smokers). No patients with a GAP score of 8 were included, because the "unable to perform" category in DLCO was not checked in this study. Definition of abbreviations: IIP, idiopathic interstitial pneumonia; ILD, interstitial lung disease; AIP, acute interstitial pneumonia; BOOP, bronchiolitis obliterans organizing pneumonia; DIP, desquamative interstitial pneumonia; LIP, lymphocytic interstitial pneumonia; NSIP, non-specific interstitial pneumonia; RB-ILD, respiratory bronchiolitis-associated interstitial lung disease.
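The scoring and staging described in this section can be sketched in a few lines of Python. The point ranges per variable and the stage boundaries (I: 0-3, II: 4-5, III: 6-8) are taken from the text; the individual cut-off values inside each variable are not stated here and are assumed from the published GAP index of Ley et al. [12], so they should be checked against Table 1 before any use. The function names are hypothetical.

```python
from typing import Optional


def gap_score(is_male: bool, age_years: float, fvc_pct: float,
              dlco_pct: Optional[float]) -> int:
    """Total GAP score (0-8).

    dlco_pct=None stands for the "cannot perform" DLCO category (3 points),
    which was not recorded in this cohort.
    Cut-offs are ASSUMED from Ley et al. [12]; verify against Table 1.
    """
    points = 1 if is_male else 0          # G: woman 0, man 1

    if age_years > 65:                    # A: 0-2 points
        points += 2
    elif age_years > 60:
        points += 1

    if fvc_pct < 50:                      # P (FVC % predicted): 0-2 points
        points += 2
    elif fvc_pct <= 75:
        points += 1

    if dlco_pct is None:                  # P (DLCO % predicted): 0-3 points
        points += 3
    elif dlco_pct <= 35:
        points += 2
    elif dlco_pct <= 55:
        points += 1

    return points


def gap_stage(score: int) -> str:
    """Map a total GAP score to stage I (0-3), II (4-5) or III (6-8)."""
    if not 0 <= score <= 8:
        raise ValueError("GAP score must be between 0 and 8")
    if score <= 3:
        return "I"
    if score <= 5:
        return "II"
    return "III"
```

For example, under these assumed cut-offs a 62-year-old man with FVC 60 % and DLCO 50 % of predicted would score 1 + 1 + 1 + 1 = 4 points, which falls in stage II.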

Statistical analysis
Information was obtained from web-based questionnaires and medical records; it was stored and analysed using the Excel™ computer program. Analysis of variance (ANOVA) was used to compare continuous variables, and Bonferroni's correction was used for post-hoc analysis. Pearson's chi-squared test or Fisher's exact test was used to compare categorical variables. Continuous variables were presented as mean ± standard deviation, and categorical variables as proportions (percentages) within each group.
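As a minimal illustration of the Bonferroni correction mentioned above, the sketch below multiplies each raw p-value by the number of comparisons and caps the result at 1. This is the textbook form of the adjustment, not a reproduction of the software output used in the study; the function name and the example p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1.0,
    where m is the number of comparisons performed."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three made-up pairwise comparisons (illustrative values only).
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
significant = [p < 0.05 for p in adjusted]
```

With three comparisons, a raw p-value of 0.04 no longer falls below the 0.05 threshold after adjustment, which is why a difference can appear in an unadjusted test and disappear in the post-hoc analysis.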
To compare the GAP score groups in terms of survival times, Kaplan-Meier survival analysis and the log-rank test were used. In addition, multivariate analysis was conducted with a Cox proportional hazards model. The C-statistic was also calculated for the GAP model at 1, 2, and 3 years. In the survival analysis, patients were censored if they (1) were still alive at the last visit (censored at the last visit date), (2) were lost to follow-up, or (3) had undergone lung transplantation (censored at the surgery date). Statistics were analysed using SPSS™ Version 20 (SPSS, Chicago, IL, USA). An adjusted p-value less than 0.05 was regarded as statistically significant.
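To make the censoring rules above concrete, the following is a minimal Kaplan-Meier sketch in plain Python. It is not the SPSS procedure used in the study; the function name and the toy follow-up data are hypothetical. A patient who is alive at the last visit, lost to follow-up, or transplanted is coded as event = 0 (censored); a death is coded as event = 1.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient (e.g. months)
    events:    1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct death time.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)      # everyone leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Deaths at time t are counted before censorings at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data: five patients, two of them censored (illustrative only).
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0])
```

In this toy example the estimate steps down at months 5, 8, and 12; the censored patients at months 8 and 20 leave the risk set without producing a step, which is how censoring contributes information without being counted as a death.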

Demographic characteristics
There were 1,228 patients with a GAP score from 1 to 6. The baseline characteristics of these patients are summarized in Table 2. The mean age of the study population was 67.5 ± 9.3 years and was lowest in Group 1. The highest proportion of men occurred in Group 6 (p < 0.001). Although the patients in Group 1 had experienced the longest duration of respiratory symptoms at diagnosis, and those in Group 6 the shortest, this difference was not statistically significant (p = 0.133). With regard to smoking status, 83.3 % of patients in Group 6 were ever-smokers; the equivalent values in Groups 1 and 2 were 58.7 and 50.5 %, respectively. Furthermore, smoking duration and amount were higher in Group 6 than in the other score groups (p < 0.001 and p = 0.024, respectively). Patients with a higher GAP score tended to have been diagnosed clinically rather than by surgical lung biopsy. Specifically, the proportion of clinically diagnosed patients was 87.2 % in Group 6, whereas it was 22.0 % in Group 1. The percentages of ANA and RF positivity did not differ significantly among the groups (p = 0.580 and p = 0.177, respectively). Increased CRP level was significantly associated with higher GAP score (p < 0.001). CPI also tended to increase as GAP score increased (p < 0.001). The mean value of CPI was significantly different between Group 3 and Group 4, although there was no significant difference between Group 2 and Group 3 after Bonferroni's correction. The mean follow-up duration of the study population was 19.0 ± 16.0 months.

Physiological and radiological parameters
We investigated pulmonary function, arterial blood gas analysis (ABGA) results, and high-resolution computed tomography (HRCT) findings in IPF patients (Table 3). In Group 1, FVC (%) and DLCO (%) were 85.6 and 75.8 %, respectively, while in Group 6 the values were 55.5 and 31.9 %. ABGA results also differed significantly among groups. Resting arterial oxygen tension (PaO2) was highest in Group 1, and a higher GAP score was significantly associated with lower PaO2 (p < 0.001). In terms of radiological findings, the groups did not differ in any parameter other than reticular pattern.

Comorbidities and initial respiratory symptoms
Co-morbidities and initial presenting respiratory symptoms are shown in Additional file 1: Tables S1 and S2. The most common co-morbidities were past history of tuberculosis, diabetes mellitus, and hypertension; specifically, past history of tuberculosis was in 147 patients (12.  had lung cancer. These co-morbidities were not significantly different among groups. Fourteen patients (1.1 %) had a family history of IPF (data not shown). Cough, sputum, and hemoptysis were significantly more frequent at higher GAP scores (p = 0.004, p < 0.001, and p = 0.021, respectively). Although the proportion of patients who suffered dyspnoea on exertion increased as GAP score increased, this association was not statistically significant.

Survival analysis on the basis of GAP score
All GAP variables except gender (G) showed a significant association with prognosis (Table 4; Additional file 1: Table S3).  Table 5. Respiratory failure (42.3 %) and infection (34.2 %) were the most common causes of death in the study population.

Sub-analysis by GAP score
Table 6 shows the distribution of GAP points in each group in terms of predictive variables. Higher GAP scores were significantly associated with male predominance, older age, and poorer lung function, consistent with the original definition of the GAP model. Furthermore, gender, age, and DLCO (%) differed significantly between Groups 2 and 3, and all four variables in the GAP model differed significantly between Groups 3 and 4.

Note: Values in parentheses are percentages. CPI = 91.0 − (0.65 × percent predicted DLCO) − (0.53 × percent predicted FVC) + (0.34 × percent predicted FEV1). GAP: gender, age, and 2 lung physiology variables (FVC and DLCO); ANA: antinuclear antibody; RF: rheumatoid factor; CPI: composite physiologic index. (a) The following post hoc comparisons were significant at the p = 0.05 level; all other comparisons were non-significant: Score 1 group versus Score 2, 3, 4, 5, 6 groups, Score 2 group versus Score 3, 4, 5, 6 groups, and Score 3 group versus Score 4, 5 groups (age); Score 1 group versus Score 3, 4, 5, 6 groups and Score 2 group versus Score 3, 4, 5, 6 groups (smoking duration); Score 1 group versus Score 6 group, Score 2 group versus Score 6 group, and Score 3 group versus Score 6 group (CRP); Score 1 group versus Score 2, 3, 4, 5, 6 groups, Score 2 group versus Score 4, 5, 6 groups, Score 3 group versus Score 4, 5, 6 groups, Score 4 group versus Score 5, 6 groups, and Score 5 group versus Score 6 group (CPI).
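The CPI formula quoted in the table note above can be written as a one-line function. This sketch assumes that the operator lost in extraction between each coefficient and its variable is a multiplication sign, which matches the index as described by Wells et al. [14]; the function name and the example inputs are hypothetical.

```python
def composite_physiologic_index(dlco_pct: float, fvc_pct: float,
                                fev1_pct: float) -> float:
    """Composite physiologic index (CPI) from percent-predicted values.

    CPI = 91.0 - 0.65 * DLCO% - 0.53 * FVC% + 0.34 * FEV1%
    Higher values indicate more severe physiological impairment.
    """
    return 91.0 - 0.65 * dlco_pct - 0.53 * fvc_pct + 0.34 * fev1_pct


# Illustrative inputs only (not patient data from this study).
cpi = composite_physiologic_index(dlco_pct=50.0, fvc_pct=70.0, fev1_pct=80.0)
```

For these illustrative inputs the terms are 91.0 − 32.5 − 37.1 + 27.2, giving a CPI of about 48.6. Because the index uses only pulmonary function results, it carries no information about age or gender, which is relevant to the comparison with GAP score made in the Discussion.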

Discussion
The GAP model is simple to use in planning treatment or providing prognostic information to IPF patients. However, prognosis in relation to individual score groups has not been studied until now. This study attempted to undertake external validation of the GAP model in a relatively large cohort of IPF patients. Herein, we found that GAP score groups differed in terms of survival: in particular, survival in Group 3 patients differed from that in the other stage I groups, as well as the stage II groups. For a long time, clinicians who care for IPF patients have struggled to make accurate prognoses, because IPF is a heterogeneous disease that lacks a validated predictive model [11,15]. Many previous researchers have aimed to find an ideal model for predicting clinical outcome in IPF patients [14,16–22].
In 2001, for instance, King et al. [16] created an upgraded version of a previously existing clinical, radiological, and physiological scoring system, known as the "CRP system" [17], to predict survival in IPF patients. This model took into account age, smoking status, clubbing of the fingertips, HRCT score, HRCT score for pulmonary hypertension, TLC (%), and PaO2 at maximal exercise. However, it did not establish that gender was significantly associated with mortality. Furthermore, it was too complex to use in a clinical setting, and cardiopulmonary exercise testing was essential to calculating the score. Wells et al. [14] then proposed the composite physiological index (CPI), which combines three factors to make a prediction: FVC (%), FEV1 (%), and DLCO (%); these factors are determined using pulmonary function testing (PFT). Physicians could calculate CPI using PFT results only, rendering CT findings unnecessary in predicting prognosis. Besides these models, du Bois et al. [21] developed a predictive system that was based on IPF diagnostic criteria, and Richards et al. [22] used biomarkers to create another predictive model. However, these models have also been criticized because they are complicated to use or lack external validation.
Ley et al. developed the GAP model in 2012. Its straightforward nature has allowed the GAP index to be widely studied [23][24][25][26][27][28], and it has been validated in the United States, Italy, and South Korea [12,23]. In fact, the system showed robust predictive power in patients with chronic ILD (ILD-GAP model) and IPF related to occupational dust exposure [26,28]. Furthermore, the model is more powerful and accurate when follow-up PFT results are taken into account [26,27], and it has been found that DLCO can be replaced by HRCT fibrosis score in the GAP model (CT-GAP model) [25]. Interestingly, the duration of respiratory symptoms at diagnosis was longest in Group 1 and shortest in Group 6, although this was not a significant difference. This may be due to variations in individual perception of respiratory symptoms [29]. Hiwatari et al. [30] reported that IPF patients with mucous hypersecretion had a significantly poorer prognosis. In our study, patients in the high-score groups had sputum production significantly more often than those in Groups 1 or 2. This could mean that patients with a higher GAP score are more vulnerable to respiratory infection, which could be a cause of death. In our study, patients with a score over 3 showed a higher rate of death due to infection than those in Groups 1 or 2. Variables related to smoking were significantly related to GAP score in this study; the proportion of ever-smokers, as well as smoking amount, was highest in Group 6. In other studies, however, results have conflicted regarding the association between smoking and prognosis in IPF. Such results are easily influenced by gender, as well as the "healthy smoker effect" [16,31]. In our study, smoking was not significantly associated with mortality in either univariate or multivariate analyses (Additional file 1: Table S3). Some investigations have shown that elevated CRP levels are related to poor prognosis [2,32]. In the present study, CRP levels were highest in Group 6, and GAP score was significantly associated with CRP level (p < 0.001).

Note: The following post hoc comparisons were significant at the p = 0.05 level; all other comparisons were non-significant: Score 1 group versus Score 4, 5, 6 groups, Score 2 group versus Score 4, 5, 6 groups, Score 3 group versus Score 4, 5, 6 groups, and Score 4 group versus Score 5, 6 groups (FVC (%)); Score 1 group versus Score 4, 5, 6 groups, Score 2 group versus Score 4, 5, 6 groups, Score 3 group versus Score 4, 5, 6 groups, and Score 4 group versus Score 5, 6 groups (FEV1 (%)); Score 1 group versus Score 4, 5, 6 groups, Score 2 group versus Score 5, 6 groups, Score 3 group versus Score 4, 5, 6 groups, and Score 4 group versus Score 5, 6 groups (TLC (%)); Score 1 group versus Score 2, 3, 4, 5, 6 groups, Score 2 group versus Score 4, 5, 6 groups, Score 3 group versus Score 4, 5, 6 groups, and Score 4 group versus Score 5, 6 groups (DLCO (%)); Score 1 group versus Score 3, 4, 5, 6 groups, Score 2 group versus Score 4, 5, 6 groups, and Score 3 group versus Score 5, 6 groups (resting PaO2); Score 1 group versus Score 4, 5 groups and Score 2 group versus Score 4, 5 groups (resting PaCO2).
The most common cause of death in IPF patients is respiratory failure, which results from the progression of lung fibrosis rather than from comorbidities [3]. Furthermore, our study revealed no significant differences among the groups in terms of comorbidities. This suggests that mortality in IPF can be predicted, because the majority of deaths are caused by IPF itself.
In the present study, prognosis in Group 3 differed significantly from that in the other score groups, as shown using Kaplan-Meier analysis. This result suggests that a GAP score of 3 could be separated from the other stage I scores, thus creating a more refined prognostic system. Although the GAP model is simple to use and has proven effective in other chronic ILDs, the staging system amounts basically to a rough grouping of the GAP scores (stage I: 0-3 points, stage II: 4-5 points, and stage III: 6-8 points); GAP stages I, II, and III were designed to correspond to the lowest 40 %, middle 40 %, and highest 20 % of risk, respectively. In our study, Group 3 differed significantly from the other stage I groups, and from the stage II groups, in terms of all four predictive variables that contribute to GAP score; the only exception was FVC (%), which did not differ between Groups 2 and 3. Although mean lung function results were similar, age and gender composition differed significantly between Groups 2 and 3. Ley et al. [12] mentioned that one of the limitations of the GAP model is its overestimation of risk in lower-risk groups, and this may be the reason for the lack of a significant difference in FVC (%) between Groups 2 and 3. Although the mean value of CPI increased significantly as GAP score increased, the difference in CPI between Groups 2 and 3 was not significant in our study, unlike the difference in GAP score. This might be explained by a difference in design between the GAP model and CPI: GAP uses more clinical data in its model, such as age and gender, while CPI was created using only PFT results [12,14]. Our study did have some limitations. Firstly, patients were diagnosed using the 2002 ATS/ERS guidelines, which place more importance on surgical lung biopsy results than do the 2011 updated guidelines. Also, in this study, the HRCT findings were not quantified as scores or classified according to the updated guidelines.
In addition, in radiologic findings, traction bronchiectasis was not investigated. However, Ley et al. [12] created the GAP model using a derivation cohort and validation cohort that had been diagnosed between 2000 and 2010. Additionally, Kim et al. [23] demonstrated that the GAP model was effective (except in predicting the 3-year risk of death) in Korean IPF patients who had been diagnosed between 2005 and 2009. Another limitation is that Groups 0 and 7 were excluded from the study because they contained far fewer patients than the other score groups. In fact, patients in Group 0 (all women, never smokers) differed significantly from the other score groups in terms of baseline characteristics. Furthermore, no patients were enrolled who had a GAP score of 8, which requires the inclusion of an "unable to perform" category in DLCO measurement. We also excluded patients who had not undergone PFT that included DLCO. This considerable number of excluded patients may have led to selection bias. Finally, the Korean ILD group did not investigate the radiologic scoring of fibrosis, dyspnea scale, and pulmonary artery hypertension, which could have provided more information on prognosis in IPF patients.

Note: Values in parentheses are percentages. "Cannot perform" in DLCO was not recorded in this study. The total GAP score 3 group was compared with each of group 2 and group 4 by Bonferroni adjustment. The following post hoc comparisons were significant at the adjusted p value = 0.05: Score 3 group versus Score 2 group (gender, age, and DLCO, % predicted); Score 3 group versus Score 4 group (gender, age, FVC, % predicted, and DLCO, % predicted). GAP: gender, age, and 2 lung physiology variables (FVC and DLCO).

Conclusion
In summary, this study was designed as a national validation study to evaluate GAP scores in relation to the prognosis of patients with IPF. On the basis of our study results, we suggest that Group 3 could be separated from other GAP stage I patients and that reporting this score separately would improve mortality prediction.