Can spirometric norms be set using pre- or post-bronchodilator test results in older people?

Background Chronic Obstructive Pulmonary Disease (COPD) is defined by post-bronchodilator spirometry, but data on "normal values" come predominantly from pre-bronchodilator spirometry. The effects of this on diagnosis are unknown. Methods Lower limits of normal (LLN) were estimated from "normal" participants in the Burden of Obstructive Lung Disease (BOLD) programme. Values derived separately from pre- and post-bronchodilator spirometry were compared. The sensitivity and specificity of criteria derived from pre-bronchodilator spirometry, and from pre-bronchodilator spirometry adjusted by a constant, were assessed in the remaining population. The "gold standard" was the LLN derived from post-bronchodilator spirometry in the "normal" population. For FEV1/FVC, sensitivity and specificity were also assessed using a fixed cut-off of < 70% rather than the LLN. Results Of 6,600 participants with full data, 1,354 were defined as "normal". Mean differences between pre- and post-bronchodilator measurements were small, and the Bland-Altman plots showed no association between difference and mean value. Compared with the gold standard, however, criteria based on pre-bronchodilator spirometry had a sensitivity and specificity of 78.4% and 100% for detecting a low FEV1, 99.8% and 99.1% for a low FVC, and 65% and 100% for a low FEV1/FVC ratio. Adjusting these criteria by a constant raised the sensitivity for FEV1 and FEV1/FVC without substantially altering specificity (FEV1: 99%, 99.8%; FVC: 97.4%, 99.9%; FEV1/FVC: 98.7%, 99.5%). Conclusions Using pre-bronchodilator spirometry to derive norms for lung function reduces sensitivity compared with a post-bronchodilator gold standard. Adjusting these values by a constant can improve the validity of the test.

Background
GOLD (the Global Initiative for Chronic Obstructive Lung Disease) defines chronic obstructive pulmonary disease (COPD) in terms of post-bronchodilator lung function [1], but the data from which "normal values" are derived are generally based on pre-bronchodilator lung function measures [2,3]. Although some authors have provided local equations based on post-bronchodilator values or assessed differences when using bronchodilators [4-6], there is no general account of the systematic difference between the two measures. As "normal" values are derived from "normal" people without overt pathology, the differences might be expected to be small and possibly not clinically important.
The Burden of Obstructive Lung Disease (BOLD) Initiative was designed to develop methodology for estimating the prevalence and economic burden of COPD. BOLD is a survey of COPD among non-institutionalised adults aged 40 years and over [7]. Study participants complete a questionnaire covering respiratory symptoms, health status, activity limitation and exposure to potential risk factors, and perform pre- and post-bronchodilator spirometry.
This analysis uses BOLD data to estimate the differences between pre- and post-bronchodilator measures, and between the lower limits of normal (LLN) derived from either pre- or post-bronchodilator lung function measures in the same subjects. We also assessed the extent to which any differences varied by age, body mass index and height, for males and females separately, and the extent of misclassification when using criteria based on pre-bronchodilator spirometry. For FEV1/FVC we also examined the extent of misclassification when using criteria based on pre-bronchodilator spirometry and defining airway obstruction as FEV1/FVC < 70% rather than < LLN.

Methods
We used BOLD data from the following European centres: Maastricht (Netherlands), Lisbon (Portugal), Salzburg (Austria), Bergen (Norway), Krakow (Poland), Hannover (Germany), Uppsala (Sweden), Reykjavik (Iceland), Tartu (Estonia) and London (United Kingdom). Centres were selected to ensure that the subjects used in the analyses were from a population of predominantly European origin. Spirometry was performed before and 15-60 minutes after inhalation of 200 μg salbutamol through a spacer. For each participant, forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC) and forced expiratory volume in 6 seconds (FEV6) were measured before and after bronchodilator use, and the corresponding FEV1/FVC ratio was calculated.
We selected spirometry data from subjects in good respiratory health (asymptomatic, lifelong non-smokers) using information from a questionnaire. These were subjects who had no respiratory symptoms (wheezing, phlegm, cough); had no medical diagnosis of asthma, chronic bronchitis, COPD or emphysema; and denied ever having had tuberculosis or lung cancer or having undergone lung resection. This reference population is hereafter referred to as the "normal" population. To generate predicted and lower limit of normal values, each of the variables (FEV1, FVC, FEV6, FEV1/FVC) was entered into a multiple regression model with height, age and body mass index as predictors. In the regression models, age was centred by subtracting 40, body mass index by subtracting 23, and height by subtracting the average height (165 cm for women and 175 cm for men). We fitted separate regression equations for men and women for pre- and post-bronchodilator FEV1, FVC, FEV6 and FEV1/FVC. Non-linear relationships were investigated by adding the squares of height and age as predictors, and likelihood ratio tests comparing these nested models were used to assess whether the quadratic terms improved model fit.
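The modelling step described above can be sketched in a few lines. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' Stata code: it centres the predictors as the text describes (age − 40, BMI − 23, height minus 165 cm for women or 175 cm for men), fits ordinary least squares, and uses a likelihood ratio test to ask whether adding an age² term improves fit. The function names and the simulated data in the usage lines are hypothetical.

```python
import math
import random

def centre_predictors(age, height_cm, bmi, sex):
    """Centre predictors as in the text: age - 40, BMI - 23, height minus sex-specific mean."""
    mean_height = 165.0 if sex == "F" else 175.0
    return age - 40.0, height_cm - mean_height, bmi - 23.0

def ols_fit(X, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination).

    Returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, rss

def lr_test_one_df(rss_reduced, rss_full, n):
    """Likelihood ratio test for nested Gaussian linear models differing by one term."""
    lr = n * math.log(rss_reduced / rss_full)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-squared with 1 df
    return lr, p_value

# Usage with simulated (hypothetical) data for men
rng = random.Random(1)
X_lin, X_quad, y = [], [], []
for _ in range(200):
    a, h, b = centre_predictors(rng.uniform(40, 80), rng.gauss(175, 7), rng.gauss(26, 3), "M")
    X_lin.append([1.0, a, h, b])
    X_quad.append([1.0, a, h, b, a * a])
    y.append(4.0 - 0.03 * a + 0.05 * h - 0.0005 * a * a + rng.gauss(0, 0.1))
_, rss_lin = ols_fit(X_lin, y)
beta_quad, rss_quad = ols_fit(X_quad, y)
lr, p = lr_test_one_df(rss_lin, rss_quad, len(y))
print(f"LR = {lr:.1f}, p = {p:.2g}")
```

For Gaussian linear models the likelihood ratio statistic reduces to n·log(RSS_reduced / RSS_full), which is why only the residual sums of squares are needed; a statistics package would normally be used in practice.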
Predicted values and lower limits of normal (LLN) were estimated for each lung function variable from these models. The LLN for each variable was estimated as LLN = predicted value − 1.645 × S, where S is an estimate of the standard deviation of the residuals, a residual being the difference between observed and predicted lung function. Bland-Altman plots were used to examine the agreement between measured pre- and post-bronchodilator ventilatory function variables, and were also drawn for the pre- and post-bronchodilator LLN values [8].
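The LLN formula can be written out directly. This minimal sketch assumes S is the root mean squared residual with n − k degrees of freedom (the usual regression estimate; the text only says "standard deviation of the residuals"). The function names and the four example values are hypothetical.

```python
import math

Z_ONE_SIDED_95 = 1.645  # multiplier used in the text for the 5th percentile

def residual_sd(observed, predicted, n_coefficients):
    """Root mean squared residual with n - k degrees of freedom (assumed estimator of S)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    dof = len(residuals) - n_coefficients
    return math.sqrt(sum(r * r for r in residuals) / dof)

def lower_limit_of_normal(predicted_value, s):
    """LLN = predicted value - 1.645 * S."""
    return predicted_value - Z_ONE_SIDED_95 * s

def below_lln(observed_value, lln):
    """A measurement is classed as abnormal (low) if it falls below the LLN."""
    return observed_value < lln

# Hypothetical FEV1 values (litres) with an intercept-only model (k = 1)
s = residual_sd([3.1, 2.9, 3.2, 2.8], [3.0, 3.0, 3.0, 3.0], n_coefficients=1)
print(round(s, 4), round(lower_limit_of_normal(3.0, s), 3))
```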
For each lung function measurement, the differences between pre- and post-bronchodilator measures were entered into regression models with height, body mass index and age as predictors, to investigate whether the difference was associated with these predictors. Mean differences between pre- and post-bronchodilator ventilatory function, together with their 95% limits of agreement, were calculated for the observed values, the predicted values and the lower limits of normal.
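A Bland-Altman comparison reduces to the mean of the paired differences and limits of agreement at the mean ± 1.96 standard deviations of those differences. The sketch below is illustrative only; it assumes the difference is taken as post − pre, which is consistent with the positive FEV1 and negative FVC differences reported in the Results. The function name and example values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(pre, post):
    """Mean difference and 95% limits of agreement for paired measurements.

    The difference is taken as post - pre (an assumed sign convention)."""
    diffs = [b - a for a, b in zip(pre, post)]
    averages = [(a + b) / 2 for a, b in zip(pre, post)]  # x-axis of a Bland-Altman plot
    d_bar = mean(diffs)
    sd = stdev(diffs)
    return d_bar, (d_bar - 1.96 * sd, d_bar + 1.96 * sd), averages, diffs

# Hypothetical pre- and post-bronchodilator FEV1 values in litres
d_bar, (lower, upper), _, _ = bland_altman([3.0, 3.2, 2.8, 3.1], [3.1, 3.25, 2.9, 3.2])
print(f"mean difference {d_bar:.4f} L, limits of agreement {lower:.4f} to {upper:.4f} L")
```

Plotting `diffs` against `averages` gives the Bland-Altman plot itself; a trend in that plot would indicate that the difference depends on the magnitude of the measurement.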
In addition, for each lung function measurement in those who had been excluded because of a smoking history, a diagnosis or a history of current respiratory symptoms, the differences between pre- and post-bronchodilator measures were entered into regression models with GOLD class (mild, moderate, severe or very severe) as a predictor, to investigate whether the difference was associated with GOLD class.
Finally, we used the observed post-bronchodilator ventilatory function of those who had been excluded because of a smoking history, a diagnosis or a history of current respiratory symptoms to estimate the sensitivity and specificity of two criteria: the pre-bronchodilator LLN, and the pre-bronchodilator LLN corrected by a constant equal to the mean difference between the pre- and post-bronchodilator lower limits of normal. These were compared with a gold standard of the LLN estimated directly from the post-bronchodilator ventilatory function. For FEV1/FVC we also estimated sensitivity and specificity using the fixed value of 70% rather than the LLN. In this case we considered the pre-bronchodilator FEV1/FVC, and the pre-bronchodilator FEV1/FVC corrected by a constant equal to the mean difference between the observed pre- and post-bronchodilator FEV1/FVC, and compared them with a gold standard based on the post-bronchodilator FEV1/FVC. In addition to sensitivity and specificity we report Youden's index (sensitivity + specificity − 1), a summary validity score with useful properties in some circumstances [9]. All statistical analyses were performed using Stata 12 (Stata Corporation, College Station, TX, USA).
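The validity assessment above can be sketched as a simple cross-tabulation. In this minimal sketch, assumed rather than taken from the authors' code, each participant's observed post-bronchodilator value is classed as low if it falls below the test LLN (pre-bronchodilator LLN, optionally plus a constant `k`), and the gold standard is whether it falls below the post-bronchodilator LLN. The function name and the four example participants are hypothetical.

```python
import math

def accuracy_vs_gold(post_observed, lln_test, lln_gold, k=0.0):
    """Compare 'post-bronchodilator value < test LLN + k' with the gold standard
    'post-bronchodilator value < post-bronchodilator LLN'.

    Returns (sensitivity, specificity, Youden's index)."""
    tp = fp = tn = fn = 0
    for value, test_lln, gold_lln in zip(post_observed, lln_test, lln_gold):
        test_positive = value < test_lln + k
        gold_positive = value < gold_lln
        if gold_positive and test_positive:
            tp += 1
        elif gold_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec, sens + spec - 1

# Hypothetical FEV1 values (litres) for four participants
post = [2.0, 2.55, 2.9, 3.3]
pre_lln = [2.5] * 4
post_lln = [2.6] * 4
print(accuracy_vs_gold(post, pre_lln, post_lln))         # pre-bronchodilator LLN
print(accuracy_vs_gold(post, pre_lln, post_lln, k=0.1))  # pre-bronchodilator LLN + K
```

The second participant shows how a pre-bronchodilator LLN that sits slightly too low produces a false negative, which is the mechanism behind the reduced sensitivity reported in the paper.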
Ethical approval for the study was given at each site before data collection began, and all participants signed a consent form after receiving details of the purpose and content of the study.
Table 1 shows the number (%) of subjects excluded for different reasons. Of 7430 participants, 830 were omitted from all analyses because of inadequate spirometry (746) or missing data (84). Of the remaining 6600, 5246 were excluded from the calculation of normal values because they had a relevant diagnosis, smoked or reported respiratory symptoms; these participants were subsequently used to analyse the sensitivity and specificity of the different criteria. The other 1354 were used to calculate normal values. Using the spirometric classification of COPD based on post-bronchodilator FEV1, the 5246 participants excluded from the calculation of normal values were distributed by GOLD class as follows: 4053 had no COPD (FEV1/FVC ≥ 70%), 633 had mild COPD, 461 moderate COPD, 88 severe COPD and 11 very severe COPD. Table 2 shows the number of participants included from each BOLD site. Table 3 describes the study population, including mean values of pre- and post-bronchodilator FEV1, FVC, FEV6 and FEV1/FVC. As expected, the "normal" population had slightly higher ventilatory function and was slightly younger, and the normal male population was slightly taller than the other male participants. Table 4 shows the coefficients and the explained variance (R²) of the prediction equations for the "normal" population. The coefficients for age, age² (where relevant) and height are the same whether estimating the mean or the lower limit of normal. Values are given for each of the four measures of ventilatory function, pre- and post-bronchodilator, and for each sex separately.
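The GOLD classes used above can be expressed as a small function. The cut-offs below are the standard GOLD spirometric grades (post-bronchodilator FEV1/FVC < 0.70, then FEV1 ≥ 80%, 50-79%, 30-49% and < 30% of predicted); the paper does not restate them, so treat them as an assumption in this sketch. The final lines only re-check the participant counts reported in the text.

```python
def gold_grade(fev1_fvc, fev1_pct_predicted):
    """Spirometric GOLD grade from post-bronchodilator values (standard GOLD cut-offs, assumed)."""
    if fev1_fvc >= 0.70:
        return "no COPD"
    if fev1_pct_predicted >= 80:
        return "mild"
    if fev1_pct_predicted >= 50:
        return "moderate"
    if fev1_pct_predicted >= 30:
        return "severe"
    return "very severe"

# Participant flow as reported in the text
total, inadequate_spirometry, missing_data = 7430, 746, 84
analysed = total - (inadequate_spirometry + missing_data)
non_normal = {"no COPD": 4053, "mild": 633, "moderate": 461, "severe": 88, "very severe": 11}
n_non_normal = sum(non_normal.values())
n_normal = analysed - n_non_normal
print(analysed, n_non_normal, n_normal)
```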
The pre- and post-bronchodilator values of the intercepts are all within 100 mL of each other, except for the lower limit of normal for FEV1 in men, where the difference is approximately 125 mL. The pre- and post-bronchodilator values are often very close, and for FEV6 the differences are negligible. Differences in the coefficients for age and height are also negligible, except for the FEV1/FVC ratio. The prediction equations for the FEV1/FVC ratio are relatively poor, as shown by the much lower R², but the differences between pre- and post-bronchodilator values are still less than 3%. Table 5 gives the regression coefficients for the difference between observed pre- and post-bronchodilator […] showed no evidence of between-centre heterogeneity, suggesting that the results are similar in each of the centres included in the study. In the "non-normal" population, the differences between pre- and post-bronchodilator values by GOLD class were as follows: for FEV1, the difference in women with mild disease was 59 mL (95% CI: 9, 109) greater than in those with severe disease; for FVC, the difference in men with severe disease was 95 mL (95% CI: 10, 177) greater, and in men with very severe disease 275 mL (95% CI: 60, 490) greater, than in those with mild disease; for FEV6, the difference in men with very severe disease was 176 mL (95% CI: 20, 332) greater than in those with mild disease; and for FEV1/FVC, the difference in men with moderate disease was 0.9% (95% CI: 0.29%, 1.52%) greater than in those with mild disease. Table 6 shows the mean differences and 95% limits of agreement for the observed values, the predicted values and the lower limits of normal. The mean observed and predicted differences are equal by definition but, as expected, the limits of agreement are much narrower for the predicted values.
Table 7 gives the sensitivity and specificity of the pre-bronchodilator values, and of the pre-bronchodilator values adjusted by a fixed constant, compared with the gold standard of the lower limits of normal calculated from the post-bronchodilator values. The data used to estimate these come from the participants excluded from the "normal" population, and therefore do not include the individuals used to estimate the original norms. Compared with this gold standard, the pre-bronchodilator LLN criterion has a substantially lower sensitivity for detecting an abnormal FEV1 (78.4%) or FEV1/FVC ratio (65%), though specificity remains high. Adding a constant to the pre-bronchodilator LLN improves the characteristics of the tests overall, and Youden's index is above 0.95 for all measures. Table 8 gives the corresponding sensitivity and specificity when the fixed cut-off of 70% is used as the criterion for a low FEV1/FVC, with the post-bronchodilator values as the gold standard. In this case adding a constant lowers Youden's index and sensitivity but increases the specificity of the test.
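Youden's index for the LLN criteria can be recomputed from the sensitivities and specificities quoted in the abstract. This is only arithmetic on those reported percentages, shown to make the "above 0.95" statement checkable; the variable and function names are illustrative.

```python
# Sensitivity and specificity (%) as reported in the abstract
reported = {
    # measure: ((sens, spec) with pre-bronchodilator LLN, (sens, spec) with pre LLN + K)
    "FEV1": ((78.4, 100.0), (99.0, 99.8)),
    "FVC": ((99.8, 99.1), (97.4, 99.9)),
    "FEV1/FVC": ((65.0, 100.0), (98.7, 99.5)),
}

def youden(sens_pct, spec_pct):
    """Youden's index J = sensitivity + specificity - 1, with inputs in percent."""
    return (sens_pct + spec_pct) / 100.0 - 1.0

for measure, (unadjusted, adjusted) in reported.items():
    print(f"{measure}: J = {youden(*unadjusted):.3f} (pre LLN), {youden(*adjusted):.3f} (pre LLN + K)")
```

With the constant added, J is about 0.988 (FEV1), 0.973 (FVC) and 0.982 (FEV1/FVC). For FVC the unadjusted criterion already had J of about 0.989, so the gain from adjustment comes from FEV1 and FEV1/FVC.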

Results
Bland-Altman plots for the two sexes show no obvious relation between the difference and the magnitude of FEV1, FEV6, FVC or FEV1/FVC (%) (Figure 1). The mean differences between pre- and post-bronchodilator values for measured FEV1, FVC, FEV6 and FEV1/FVC are 76 mL (95% CI 69, 83), −38 mL (95% CI −48, −27), 1.6 mL (95% CI −6.6, 9.8) and 2.66% (95% CI 2.47, 2.85), respectively. Figure 2 gives similar plots for the lower limits of normal. The mean differences between pre- and post-bronchodilator values for the LLN of FEV1, FVC, FEV6 and FEV1/FVC are 92 mL (95% CI 91, 94), −27 mL (95% CI −28, −26), 9.54 mL (95% CI 8.44, 10.64) and 2.91% (95% CI 2.88, 2.94), respectively. All of the above differences are adjusted for age, height and sex. In the plots for the lower limits of normal, the distribution of the difference is clearly irregular with respect to the mean value, though the nature of the irregularity differs by sex and by measurement. Plots of the residuals (Figures 3 and 4) for pre- and post-bronchodilator FEV1, FVC, FEV6 and FEV1/FVC against the corresponding predicted values show residuals concentrated around the centre of the plot at each predicted value and tailing off symmetrically from it, and the band enclosing the residuals is of approximately constant width across the range of predicted values. This indicates that the assumptions of normality and homoscedasticity are reasonable, and hence that the fifth percentile of lung function (LLN) for each subject can reasonably be estimated by subtracting 1.645 × S from the subject's predicted value.
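The link between the residual checks and the 1.645 multiplier can be illustrated with simulated data. In this sketch, which is an illustration rather than the authors' diagnostic procedure, 1.645 is recovered as the one-sided 95th percentile of the standard normal distribution; a crude homoscedasticity check compares the residual SD across equal-sized bins of predicted value; and the share of residuals below −1.645 × S should then be close to 5%. The residuals are simulated and hypothetical.

```python
import random
from statistics import NormalDist, stdev

z = NormalDist().inv_cdf(0.95)  # one-sided 95th percentile of N(0, 1), about 1.6449

def residual_sd_by_bin(predicted, residuals, n_bins=3):
    """Residual SD within equal-sized bins of predicted value (a rough homoscedasticity check)."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_bins
    sds = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        sds.append(stdev([residuals[i] for i in idx]))
    return sds

# Simulated normal, homoscedastic residuals (hypothetical FEV1 scale, litres)
rng = random.Random(0)
predicted = [rng.uniform(2.0, 4.5) for _ in range(20000)]
residuals = [rng.gauss(0.0, 0.4) for _ in predicted]
s = stdev(residuals)
share_below_lln = sum(r < -z * s for r in residuals) / len(residuals)
print(round(z, 4), [round(x, 3) for x in residual_sd_by_bin(predicted, residuals)], round(share_below_lln, 3))
```

If the bin SDs diverged markedly, a single S would misplace the LLN at some predicted values, which is why the roughly constant band width in the residual plots matters.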

Discussion
As expected, the observed value of lung function in a "normal" population changes little with the use of a bronchodilator, and differences between centres were not significant. The case for the predicted values and for the lower limit of normal is more complicated, although the mean difference is the same for the predicted as for the observed values, by definition, and similar in magnitude for the lower limit of normal. This analysis has shown that, although the pre-bronchodilator lower limit of normal is a reasonable approximation to the post-bronchodilator lower limit of normal, using pre-bronchodilator norms substantially reduces the sensitivity of spirometry in identifying cases of chronic airflow limitation. Adding a fixed amount gives a closer approximation to the true post-bronchodilator "normal" values, raising the sensitivity for FEV1 and FEV1/FVC to about 99%.
Although it is inconsistent to use pre-bronchodilator "normal values" to assess post-bronchodilator measurements, the similarity of results using either method is not surprising. Resting tone in the normal airway is low, so the effect of a bronchodilator is likely to be similarly small. These data come from a study with a very high level of quality assurance and a strong programme of technician training before data collection began.
Estimating predicted values produced a reasonable fit and similar parameter estimates with either the pre- or the post-bronchodilator results. Comparing the observed results from the two methods showed small average differences between pre- and post-bronchodilator results and no discernible variation in the difference with respect to the average of the pre- and post-bronchodilator values.
The changes in lung function observed in the "normal" population following bronchodilator are, as expected, small. The small fall in FVC was not predicted, but it is small in size and may be due to participants tiring. If so, it is part of the usual post-bronchodilator test, which is almost always performed after an initial test without bronchodilator. The Bland-Altman plots for the predicted values (not shown), and hence also those for the lower limits of normal, are less satisfactory and show variable associations between mean difference and average value for the different measurements. This partly reflects the observation that the difference is not constant but also varies with age and height (Table 5). Nevertheless, the limits of agreement are narrow: for FEV1, ±3.5 mL (men) and ±31 mL (women); for FEV6, ±53 mL (men) and ±20 mL (women); for FVC, ±40 mL (men) and ±32 mL (women); and for FEV1/FVC, ±0.35% (men) and ±1.32% (women). The data come from a large multicentre study of general population samples over the age of 40 years, and the sites chosen were inhabited by people of predominantly European origin.
Table 7. Sensitivity and specificity of the pre-bronchodilator lower limit of normal, and of the pre-bronchodilator lower limit of normal with an added constant (K), compared with the lower limit of normal derived from the post-bronchodilator values. Columns: <LLN using pre-bronchodilator; <LLN using pre-bronchodilator + K. Results are for 5246 subjects in the "non-normal" survey population: participants with adequate spirometry who were excluded from the estimation of normal values because of a history of smoking or of respiratory diagnosis or symptoms. Values of K: FEV1, 121 mL (men) and 69 mL (women); FEV6, 18 mL (men) and 3 mL (women); FVC, −29 mL (men) and −27 mL (women); FEV1/FVC, 3.22% (men) and 2.68% (women).
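Applying the constants from the Table 7 footnote is a sex- and measure-specific lookup. This minimal sketch assumes, as the column header "<LLN using pre-bronchodilator + K" indicates, that K is added to the pre-bronchodilator LLN to approximate the post-bronchodilator LLN. The function name and the example LLN inputs are hypothetical, and the K values are specific to the BOLD data.

```python
# Values of K from the Table 7 footnote: mL for volumes, percentage points for FEV1/FVC
K = {
    "FEV1": {"M": 121, "F": 69},
    "FEV6": {"M": 18, "F": 3},
    "FVC": {"M": -29, "F": -27},
    "FEV1/FVC": {"M": 3.22, "F": 2.68},
}

def adjusted_lln(pre_bd_lln, measure, sex):
    """Approximate the post-bronchodilator LLN as pre-bronchodilator LLN + K."""
    return pre_bd_lln + K[measure][sex]

print(adjusted_lln(2500, "FEV1", "M"))      # hypothetical pre-bronchodilator LLN of 2500 mL
print(adjusted_lln(3400, "FVC", "F"))       # K is negative for FVC, so the LLN falls
print(adjusted_lln(68.0, "FEV1/FVC", "F"))  # hypothetical LLN of 68% for the ratio
```

Because K is a single number per sex and measure, it cannot reproduce the small age and height dependence of the pre/post difference noted above; that residual distortion is the trade-off accepted by this approach.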
Younger populations have a greater tendency to reversible airway obstruction and, although the extent of this in people with neither respiratory symptoms nor respiratory diagnoses is likely to be more limited, we cannot extrapolate our findings to younger age groups. Nor can we extrapolate the findings to other ethnic groups, though again the findings are likely to be similar, as large variations in resting tone in normal airways between ethnic groups are unlikely. Within this population the findings were similar in each of the centres included.
In the BOLD study, bronchodilation is achieved by administering 200 μg salbutamol via a spacer, whereas the GOLD convention is to give a 400 μg dose. The BOLD decision was based on evidence that 200 μg achieves almost as great an effect as 400 μg [10] and has a more acceptable side-effect profile for a population survey of volunteer participants, who are much less likely to have used a bronchodilator before. We believe that an additional 200 μg of salbutamol would have achieved very little additional change in this group of participants.
Table 8 note: Results for 5246 subjects in the "non-normal" survey population. The participants included were those with adequate spirometry but excluded from the estimation of normal values because of a history of smoking or of respiratory diagnosis or symptoms. Values of K: 2.76 (men); 2.59 (women).
In epidemiological studies it is not always necessary, or even desirable, to divide a population into "diseased" and "healthy", and ventilatory function can be treated as a continuous variable. However, where a binary variable is required, as it is when making a diagnosis and hence when estimating a prevalence, a cut-off point between normal and abnormal must be chosen. In many clinical situations this is done by estimating the 95% tolerance limits in the "normal" population. This criterion is, however, arbitrary and agreed by convention; there is no clinical reason not to use a slightly stricter or more relaxed criterion if this is more convenient. A great deal of information on "normal" ventilatory function based on this method has been collected from around the world [11]. The principles underlying this collection have been similar for many years, though the operational definitions of "normal" participants have varied. Nevertheless, almost all studies have been conducted without a bronchodilator, which is essential when the test is used to define COPD according to international guidelines. Given the arbitrary nature of the conventions for defining "normal" values, including the choice of the 95% tolerance limits and the decision on whom to include as "normal" or exclude as potentially abnormal, the differences introduced by using pre- rather than post-bronchodilator ventilatory function, though not negligible, are comparatively small.
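The arbitrariness of the cut-off can be made concrete: under the same normal-residual model, moving the LLN from the 5th to the 2.5th or 10th percentile only changes the multiplier applied to S. This sketch assumes normally distributed residuals; the predicted value and S used are hypothetical.

```python
from statistics import NormalDist

def lln_at_percentile(predicted_value, s, percentile):
    """LLN placed at a chosen lower percentile of a normal residual distribution."""
    z = NormalDist().inv_cdf(percentile / 100.0)  # negative for percentiles below 50
    return predicted_value + z * s

# Hypothetical predicted FEV1 of 3.0 L and residual SD of 0.4 L
for pct in (2.5, 5.0, 10.0):
    print(pct, round(lln_at_percentile(3.0, 0.4, pct), 3))
```

Changing the percentile shifts the LLN by a few tenths of a litre in this example, a larger change than the pre/post-bronchodilator differences reported above, which is the sense in which those differences are comparatively small.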
Provided that they are used consistently, norms based on pre-bronchodilator spirometry are probably acceptable for most purposes. By contrast, it is very important that the test itself is administered with a bronchodilator, and this is still not usual practice in prevalence surveys.
In epidemiological studies we recommend using the lower limit of normal as the criterion for a low FEV1/FVC ratio, as this is the most convenient way to adjust prevalence for age. Nevertheless, some arguments are still made for the fixed ratio of 70%, and this remains the recommendation of GOLD. Mannino has argued that, because the lower limit of normal is less sensitive than the fixed ratio (which is the case at least over the age of about 45 years), its use leads to "under-diagnosis" of true cases [12]. This may be a consideration where overdiagnosis is not a concern. A Dutch study in primary care has also reported that the fixed ratio gives a more accurate assessment of disease, judged against clinical opinion, in diagnosing clinical COPD [13]. As in other reports, however, the fixed ratio was more sensitive and less specific than the lower limit of normal, and in a lower-prevalence setting the study would have reached the opposite conclusion. As the fixed ratio is still used, we have provided results for this criterion also.
In the absence of a large set of observations on post-bronchodilator ventilatory function, there are effectively three options for defining "abnormal" values. First, and easiest, the pre-bronchodilator norms can be used. As bronchodilators have little effect in the absence of smoking, diagnoses and symptoms, this is not a bad approximation and, as the choice of a 95% cut-off for the "normal value" is in any case arbitrary, it is adequate at least where only internal comparisons are made, or where comparisons are made with studies that have made similar assumptions. Second, the estimates can be shifted by a fixed amount to take account of the small average difference induced by the bronchodilator in normal people. As the bronchodilator has slightly different effects on normal subjects according to their age, sex and height, this will not provide a perfect solution, and there will be a small distortion in the estimate that varies with these characteristics. The effect is, however, small and not of clinical significance when using the lower limit of normal. The small correction used here is based on the BOLD data and needs to be tested in other populations, but it gives some idea of the size of the correction and of its ability to adjust adequately for the use of pre-bronchodilator standards. Notably, adding this constant did not improve the results for the test based on the fixed ratio, as judged by Youden's index. This is likely to be due to the greater distortion introduced by adding a constant when age has not been adjusted for. Third, normal values can be estimated internally from the post-bronchodilator measurements of "normal" participants in the study itself. This last option is clearly not available to clinical studies, and in epidemiological studies it has the disadvantage that the estimates will almost always be based on relatively small samples, which makes them inherently unreliable.
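Why small internal samples make the third option unreliable can be shown with a rough calculation. This sketch approximates the standard error of an LLN estimated as mean − z·s from a normal sample of size n, using Var(mean) = σ²/n and the large-sample approximation Var(s) ≈ σ²/(2(n − 1)). It is a simplification that ignores the extra uncertainty from regression coefficients for age, height and BMI; the residual SD of 0.4 L is hypothetical.

```python
import math

def approx_se_of_lln(n, sigma, z=1.645):
    """Approximate SE of (mean - z * s) for a normal sample of size n.

    Uses Var(mean) = sigma^2 / n and Var(s) ~ sigma^2 / (2 (n - 1)), which are
    independent for normal data; regression-coefficient uncertainty is ignored."""
    return sigma * math.sqrt(1.0 / n + z * z / (2.0 * (n - 1)))

# Hypothetical residual SD of 0.4 L; 1354 is the size of the "normal" sample in this study
for n in (50, 200, 1354):
    print(n, round(approx_se_of_lln(n, 0.4), 3))
```

The uncertainty in the LLN shrinks roughly with 1/√n, so a survey with only a few dozen "normal" participants per sex would estimate its own cut-off far less precisely than published reference equations.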

Conclusions
Using predictors of normal lung function based on pre- or post-bronchodilator spirometric values gives slightly different results, but these differences may not be of great clinical significance provided the same predictors are used consistently in comparative studies. In a clinical setting, spirometry should be used in combination with other information to guide management, and these small differences are unlikely to be important when setting criteria for a positive screening test based on a pre-bronchodilator assessment of spirometry. We have provided prediction equations for post-bronchodilator lung function in people over the age of 40 years living in Europe. We have also provided approximate mean differences between pre- and post-bronchodilator values, and estimates of the effect of the different methods on the sensitivity and specificity of a test for COPD.