Weighted Epworth sleepiness scale predicted the apnea-hypopnea index better

Background The relationship between the Epworth sleepiness scale (ESS) and the apnea-hypopnea index (AHI) is uncertain and even poor. A major problem with the ESS in clinical practice might be that its items are not weighted when used for prediction. Would awarding different item-scores to the four scales of ESS items, to develop a weighted ESS scoring system, improve the accuracy of AHI prediction? This hypothesis warrants exploration.
Methods Seven hundred fifty-six adult patients with suspected obstructive sleep apnoea syndrome (OSAS) were prospectively recruited to a derivation cohort. The resulting scoring system was tested in a prospective validation cohort of 810 adult patients with suspected OSAS. Each ESS item's increased odds ratio for the corresponding AHI was calculated using univariate logistic regression. Receiver operating characteristic curves were created and the areas under the curves (AUCs) were calculated to illustrate and compare the accuracy of the indices.
Results The higher the ESS item-score, the closer the relationship with the corresponding AHI. The odds ratios decreased as the AHI increased. The ESS items were of unequal weight in predicting the corresponding AHI, and a weighted ESS was developed. The coincidence rates with the corresponding AHI, body mass indices, and neck circumferences rose as the scores increased, whereas nocturnal nadir oxygen saturations decreased, and the weighted ESS was more strongly associated with these indices than the ESS. The capability in predicting patients without OSAS or with severe OSAS was strong, especially the latter, and the weighted ESS yielded a marked improvement in screening patients with simple snoring. The patterns of sensitivities, specificities, and Youden's indices of the four ranks of weighted ESS for predicting the corresponding AHI were better than those of the ESS, and the AUCs of the weighted ESS were greater than the corresponding AUCs of the ESS in the two cohorts.
Conclusions The weighted ESS achieved a significant improvement in predicting the AHI, and its capability in predicting patients without OSAS or with severe OSAS was strong, which might have implications for clinical triage decisions to prioritize patients for polysomnography.


Introduction
Obstructive sleep apnoea (OSA) is a major challenge for physicians and healthcare systems throughout the world [1]. OSA is characterised by repeated interruption of breathing during sleep due to episodic collapse of the pharyngeal airway, with nocturnal hypoxaemia and sleep fragmentation. This sleep disruption commonly causes excessive daytime sleepiness (EDS) [2]. Nocturnal hypoxaemia can be a major determinant of EDS in patients with obstructive sleep apnoea syndrome (OSAS) [3]. The apnea-hypopnea index (AHI) is an objective, sensitive and specific measure of the severity of OSA [4]. The extensively validated Epworth sleepiness scale (ESS) is the most frequently used instrument for assessing subjective daytime sleepiness or sleep propensity in adults [5,6]. It is a brief self-administered questionnaire that asks subjects to rate, on a scale of 0-3, the chance that they would doze in each of eight situations commonly met in daily life. The external criterion validity of the ESS has been tested by examining the relationship between ESS scores and the AHI, but unfortunately the relationship is uncertain and even poor [1,3,5,7-9].
It is self-evident that we are more likely to fall asleep when lying down than when standing up. Sleep propensity must also be distinguished from the state and process of fatigue [10]. The ESS tries to overcome the fact that people have different daily routines, some facilitating and others inhibiting daytime sleep [5]. Hence, the ESS was designed to measure daytime sleepiness over the whole range, from very high to low levels. The items were therefore chosen to represent situations of widely differing soporific nature, some known to be very soporific and others less so [6]. Johns stated that a corollary of his model of sleep and wakefulness is that some postures, activities and environmental situations are more conducive to sleep-onset than others. He coined the term somnificity for the capacity of a posture, activity or environmental situation to facilitate sleep-onset in a majority of subjects, to replace the phrase "soporific nature of a situation" [10]. However, a major reason for the uncertain and even poor prediction of the AHI by the ESS in clinical practice might be that its items are not weighted. Would awarding different item-scores to the four scales of ESS items, to develop a weighted ESS scoring system, improve the accuracy of AHI prediction? This hypothesis warrants exploration.
Two prospective cohort studies were conducted to derive and validate a weighted ESS.

Design and setting
A prospective derivation cohort study of 756 adult patients with suspected OSAS was conducted at the Department of Pulmonary and Critical Care Medicine in a 1600-bed tertiary care university hospital from June 2018 through November 2018. We then performed a prospective validation cohort study of 810 adult patients with suspected OSAS who presented to our hospital between December 2018 and June 2019.

Criteria for enrollment
The International Classification of Sleep Disorders (ICSD) diagnostic criteria for OSA were used as the reference [11,12].
Clinical suspicion of OSAS was based on complaints of (1) loud snoring or witnessed apneas, (2) EDS, or (3) overweight/obesity, as reported by the patient or relatives. The exclusion criteria were heart failure, dementia, major psychiatric disorder, or another condition that made polysomnography unsuitable.

Polysomnography
In accordance with standard techniques [13,14], a computer data acquisition and analysis system recorded the following signals: electroencephalogram, bilateral electrooculogram, electrocardiogram, submental and bilateral anterior tibialis electromyogram, thoracic and abdominal excursion, oral and nasal airflow by thermistor and breath sounds, body position, and oxygen saturation by pulse oximeter.

Outcome
The main outcome measures were the ESS score, the weighted ESS score and the AHI. Secondary outcomes included body mass index (BMI), neck circumference (NC), and lowest oxygen saturation (LOS).

Sample size calculation
The unit-level design prevalence, cluster-level design prevalence, test sensitivity, target cluster sensitivity, and target system sensitivity were 20%, 1%, 0.9, 0.5, and 0.95, respectively. The total number of clusters to be sampled was 598, and the maximum number of samples was 2392.

Data collection
Seven hundred fifty-six patients were enrolled into the derivation cohort after 9 cases were excluded under the exclusion criteria, and 810 were enrolled into the validation cohort after 11 cases were excluded. Overnight polysomnography was arranged for all patients with suspected OSAS. Clinical and diagnostic data were collected. The ESS scores, weighted ESS scores and BMI were calculated. The statistician was blinded to the study.

Statistical analysis
All statistical analyses were performed with Statistical Package for the Social Sciences for Windows version 16.0 (SPSS, Chicago, IL, USA) and MedCalc version 19.1 (Mariakerke, Belgium). Categorical variables were reported as percentages and continuous variables as the mean ± standard deviation (SD). The Chi-Square test, unpaired Student's t-test, one-way ANOVA, univariate logistic regression, and Spearman rank correlation were employed. Two groups were compared by unpaired Student's t-test or Chi-Square test, and multiple groups were compared using one-way ANOVA or Chi-Square test, depending on the characteristics of the variables. Each ESS item's increased odds ratio (OR) for the corresponding AHI was calculated using univariate logistic regression. Receiver operating characteristic (ROC) curves were created and the areas under the ROC curves (AUCs) were calculated to illustrate and compare the accuracies of the indices. The sensitivities, specificities, positive likelihood ratios (PLRs), negative likelihood ratios (NLRs), positive predictive values (PPVs), negative predictive values (NPVs), and Youden's indices were also calculated. A p value of < 0.05 was considered statistically significant.
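For readers who wish to reproduce the accuracy indices named above, the following is a minimal sketch of their standard definitions from a 2x2 table of test result against AHI status. It is not the SPSS or MedCalc procedure used in the study; the function name `diagnostic_metrics`, the class name and the example counts are hypothetical, and all denominators are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    plr: float      # positive likelihood ratio
    nlr: float      # negative likelihood ratio
    ppv: float      # positive predictive value
    npv: float      # negative predictive value
    youden: float   # Youden's index


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Standard accuracy indices from true/false positives and negatives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return DiagnosticMetrics(
        sensitivity=sensitivity,
        specificity=specificity,
        plr=sensitivity / (1 - specificity),
        nlr=(1 - sensitivity) / specificity,
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        youden=sensitivity + specificity - 1,
    )


# Hypothetical counts, for illustration only (not study data).
example = diagnostic_metrics(tp=80, fp=30, fn=20, tn=70)
```

Youden's index combines sensitivity and specificity into one number, which is why it is convenient for comparing cut-off values across two scoring systems.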

Baseline characteristics of study cohorts
The demographic and clinical characteristics of the recruited patients with suspected OSAS are listed in Table 1. The numbers of patients with simple snoring were 116 and 88 in the derivation and validation cohorts, respectively. Compared with those in the derivation cohort, the participants recruited to the validation cohort were younger and presented a higher AHI (especially AHI > 15 and AHI > 30) and a lower nocturnal oxygen saturation nadir.

Association of the predictive rule of ESS with the AHI in the derivation cohort
In general, the proportions of patients fulfilling the corresponding AHI rose as the ESS item-scores increased. Therefore, the higher the ESS item-score, the closer the relationship with the corresponding AHI (Table 2). However, as the AHI increased, the relationship between the ESS item-score and the corresponding AHI became less close. Each item had a significantly increased OR for the corresponding AHI, except for AHI > 30 in four items. The ORs decreased as the AHI increased. The top three ORs for AHI ≥ 5 were derived from the items "In a car, while stopped for a few minutes in traffic", "Sitting and talking to someone", and "Sitting inactive in a public place (e.g., a theater or a meeting)", respectively.
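As an illustration of how an OR is obtained from univariate logistic regression, the following minimal sketch fits a one-predictor model by Newton-Raphson and exponentiates the slope. It is not the SPSS routine used in the study. The function name and the data are hypothetical, and how the item-score was coded in the original analysis (categorical or per-point) is not stated, so the example uses a binary predictor.

```python
import math
from typing import Sequence, Tuple


def univariate_logistic_or(
    x: Sequence[float], y: Sequence[int], max_iter: int = 50, tol: float = 1e-10
) -> Tuple[float, float]:
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; return (b1, exp(b1))."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) for the intercept
            g1 += (yi - p) * xi     # score (gradient) for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b1, math.exp(b1)


# Hypothetical data: 40 patients with x = 0 (10 meet the AHI criterion)
# and 40 patients with x = 1 (20 meet the AHI criterion).
x = [0] * 40 + [1] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
slope, odds_ratio = univariate_logistic_or(x, y)
```

For a binary predictor the fitted OR equals the cross-product ratio of the 2x2 table, here (20/20) / (10/30) = 3, which gives a simple check on the fit.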

Derivation of the weighted ESS
The higher the ESS score, the higher the person's average sleep propensity in daily life, and high ESS scores are indicative of EDS in patients with OSAS. On the basis of the weight of each ESS item in predicting the AHI, the eight ESS items were divided into five ranks, and different item-scores were assigned to different ranks to develop a weighted ESS scoring system. Item-scores of 0-5-6-7 were assigned to the four scales of the ESS items with the top three ORs, except for the item "Sitting inactive in a public place (e.g., a theater or a meeting)", which received lower item-scores because of its lower OR for AHI > 15 compared with the other two items. The other item-scores (0-4-5-6 for one item, 0-3-4-5 for two items, 0-2-3-4 for one item, and 0-1-2-3 for two items) are shown in Table 3. All item-scores were intended to be integers. If a respondent could not decide on one number and reported a half-value, that score was taken at face value; if the total score then included a half-value, it was rounded up to the next whole number. The ESS score (the sum of eight item-scores, each 0-3) ranged from 0 to 24, and the weighted ESS score (the sum of eight item-scores, with maxima ranging from 3 to 7) ranged from 0 to 40.
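The scoring rule can be sketched as a lookup from each item's response (0-3) to its weighted item-score. This is a minimal illustration, not the authors' software. The weights follow Table 3 and the description above; those for "Sitting quietly after a lunch without alcohol" (0-1-2-3) and "In a car, while stopped for a few minutes in traffic" (0-5-6-7) are inferred from the text rather than read from the table, and half-value responses are not handled because their mapping to weighted scores is not specified. All function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# Weighted item-scores for responses 0, 1, 2, 3 of each ESS item.
WEIGHTED_ITEM_SCORES: Dict[str, Tuple[int, int, int, int]] = {
    "Sitting and reading": (0, 3, 4, 5),
    "Watching TV": (0, 3, 4, 5),
    "Sitting inactive in a public place": (0, 4, 5, 6),
    "As a passenger in a car for an hour without a break": (0, 2, 3, 4),
    "Lying down to rest in the afternoon when circumstances permit": (0, 1, 2, 3),
    "Sitting and talking to someone": (0, 5, 6, 7),
    # The two entries below are inferred from the text (assumption).
    "Sitting quietly after a lunch without alcohol": (0, 1, 2, 3),
    "In a car, while stopped for a few minutes in traffic": (0, 5, 6, 7),
}


def _check(responses: Dict[str, int]) -> None:
    """Require one integer response in 0..3 for each of the eight items."""
    if set(responses) != set(WEIGHTED_ITEM_SCORES):
        raise ValueError("responses must cover exactly the eight ESS items")
    if any(level not in (0, 1, 2, 3) for level in responses.values()):
        raise ValueError("each response must be an integer from 0 to 3")


def ess_score(responses: Dict[str, int]) -> int:
    """Conventional ESS: the plain sum of the eight responses (0-24)."""
    _check(responses)
    return sum(responses.values())


def weighted_ess_score(responses: Dict[str, int]) -> int:
    """Weighted ESS: the sum of the weighted item-scores (0-40)."""
    _check(responses)
    return sum(WEIGHTED_ITEM_SCORES[item][level] for item, level in responses.items())


all_high = {item: 3 for item in WEIGHTED_ITEM_SCORES}
all_slight = {item: 1 for item in WEIGHTED_ITEM_SCORES}
```

With every item answered 3 the two totals are 24 and 40, matching the stated ranges; with every item answered 1 the conventional score is 8 whereas the weighted score is 24, showing how any positive answer to the highly weighted items dominates the total.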

Associations of AHI, BMI, NC and LOS with the ESS and weighted ESS scores
In general, ESS scores can be interpreted as follows: 0-10 indicates normal daytime sleepiness (NDS), 11-12 mild EDS, 13-15 moderate EDS, and 16-24 severe EDS. Similarly, a weighted ESS score of 0-14 was defined as NDS, and the other ranks are described in Table 4. The four ranks of ESS scores were regarded as corresponding to AHI < 5, AHI ≥ 5, AHI > 15, and AHI > 30, respectively, as were the four ranks of weighted ESS scores. In general, the coincidence rates with the corresponding AHI rose sharply as the cut-off values of scores increased in both scoring systems and both cohorts, and the weighted ESS was more strongly associated with the corresponding AHI than the ESS, especially in the rank for NDS. BMI and NC increased significantly with increasing rank in both scoring systems and both cohorts, whereas LOS decreased (Table 5). The weighted ESS was more strongly associated with these indices than the ESS in both cohorts.
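The rank boundaries for the conventional ESS can be expressed as a small classifier. This is a minimal sketch with hypothetical function names; for the weighted ESS only the NDS boundary (0-14) is given in the text, so the remaining weighted cut-offs from Table 4 are deliberately not guessed here.

```python
def ess_rank(score: int) -> str:
    """Map a conventional ESS total (0-24) to its sleepiness rank."""
    if not 0 <= score <= 24:
        raise ValueError("ESS score must be between 0 and 24")
    if score <= 10:
        return "NDS"
    if score <= 12:
        return "mild EDS"
    if score <= 15:
        return "moderate EDS"
    return "severe EDS"


def weighted_ess_is_nds(score: int) -> bool:
    """True when a weighted ESS total (0-40) falls in the NDS rank (0-14)."""
    if not 0 <= score <= 40:
        raise ValueError("weighted ESS score must be between 0 and 40")
    return score <= 14
```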

Comparisons of the scoring systems for predicting the AHI
The sensitivities, specificities, and predictive values of the different cut-off values in the two scoring systems for predicting the corresponding AHI are shown in Table 6. In both cohorts, the two scoring systems demonstrated the highest sensitivities, NPVs and Youden's indices, and the lowest NLRs, in the lowest rank, and the highest specificities, PLRs and PPVs, together with higher Youden's indices, in the highest rank. The patterns of sensitivities, specificities, and Youden's indices of the ranks for NDS and severe EDS for predicting the corresponding AHI were better than those of the two intermediate ranks in both cohorts, except for the rank for mild EDS in the weighted ESS in the validation cohort. This indicates that the capability in predicting patients without OSAS or with severe OSAS was strong, especially the latter, and that the weighted ESS yielded a marked improvement in screening patients with simple snoring. The patterns of sensitivities, specificities, and Youden's indices of the four ranks of weighted ESS for predicting the corresponding AHI were better than those of the ESS in both cohorts, indicating that the weighted ESS achieved a significant improvement in predictive power.

Table 3 The ESS and weighted ESS scoring systems
Variable                                                            ESS              Weighted ESS
                                                                    I  II  III  IV   I  II  III  IV
Sitting and reading                                                 0  1   2    3    0  3   4    5
Watching TV                                                         0  1   2    3    0  3   4    5
Sitting inactive in a public place (e.g., a theater or a meeting)   0  1   2    3    0  4   5    6
As a passenger in a car for an hour without a break                 0  1   2    3    0  2   3    4
Lying down to rest in the afternoon when circumstances permit       0  1   2    3    0  1   2    3
Sitting and talking to someone                                      0  1   2    3    0  5   6    7
Sitting quietly after a lunch without alcohol                       0  1   2    3    0  1   2    3
In a car, while stopped for a few minutes in traffic                0  1   2    3    0  5   6    7

The ROC curves for the two scoring systems in the two study populations illustrated the differences in accuracy of the AHI prediction (Table 7 and Fig. 1). In general, AUC values decreased as the AHI increased. The weighted ESS performed better than the ESS in both cohorts.
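The AUC compared here can be understood as the probability that a randomly chosen patient who meets the AHI criterion has a higher score than a randomly chosen patient who does not, with ties counted as one half. The following minimal sketch computes it in that pairwise (Mann-Whitney) form; it is not the MedCalc procedure used in the study, and the function name and scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive case scores higher, 0.5 on a tie.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores, for illustration only (not study data).
auc = auc_mann_whitney(pos_scores=[3, 4, 5], neg_scores=[1, 2, 3])
```

In this example 8 of the 9 pairs are ranked correctly and one is tied, so the AUC is 8.5/9. Because ESS and weighted ESS totals are integers with many ties, the half-credit for ties matters in practice.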

Discussion
The main findings of the current study are as follows. The higher the ESS item-score, the closer the relationship with the corresponding AHI. As the AHI increased, the relationship between the ESS item-score and the corresponding AHI became less close, and the ORs decreased. The ESS items were of unequal weight in predicting the corresponding AHI, and a weighted ESS was developed. The coincidence rates with the corresponding AHI, BMIs and NCs rose as the cut-off values of scores increased in the two cohorts, whereas LOS decreased, and the weighted ESS was more strongly associated with these indices than the ESS. The capability in predicting patients without OSAS or with severe OSAS was strong, especially the latter, and the weighted ESS yielded a marked improvement in screening patients with simple snoring. The patterns of sensitivities, specificities, and Youden's indices of the four ranks of weighted ESS for predicting the corresponding AHI were better than those of the ESS, and the AUCs of the weighted ESS were greater than the corresponding AUCs of the ESS in the two cohorts.
Four additional points were added to each non-zero response for the top two ESS items, while a response of "0" was kept unchanged, and so on for the other ranks. In other words, any positive answer to the top two items is clinically more relevant to the AHI than any positive answer to the other items. Onen and coworkers adopted this weighted scoring strategy to develop a simple three-item instrument for measuring an older patient's daytime sleepiness duration and general level of sleepiness in daily activities, which can also include information obtained from a proxy [15].
Item "Lying down to rest in the afternoon when circumstances permit" is the only one that clearly involves lying down. All other items involve variations of the sitting posture, except the item "Watching TV", in which the posture is not specified [10]. The situation in the lying-down item was the most soporific. By contrast, the situations in the items "Sitting and talking to someone" and "In a car, while stopped for a few minutes in traffic" were the least soporific. The situations in the other items were intermediate in their soporific nature [6]. Furthermore, there were significant overall differences in item-ranks according to their relative somnificities among the eight ESS items, and the items with the top five ranks were "Lying down to rest in the afternoon when circumstances permit", "Watching TV", "Sitting and reading", "As a passenger in a car for an hour without a break", and "Sitting quietly after a lunch without alcohol", respectively [10]; these items demonstrated the five lowest ORs for the AHI in the current study. Sleep propensity was most evident when lying down, and accordingly the item "Lying down to rest in the afternoon when circumstances permit" demonstrated the lowest OR. Conversely, sleep propensity was decreased in a car while stopped for a few minutes in traffic, which was the least somniferous item and showed the highest OR. As a result, the somnificities were not paradoxical but concordant across the above-mentioned studies, including the current one. Therefore, the current findings are clinically relevant. Doubt should not be cast on this scale of somnificities, which may be widely applicable.
The finding that ESS scores can distinguish patients who simply snore from those with even mild OSAS is evidence for the sensitivity of the ESS, and the questionnaire should be useful in elucidating the epidemiology of snoring and OSAS [5]. The capability in screening patients with simple snoring or severe OSAS was strong, especially the latter, and the weighted ESS did better than the ESS in the current study. Possible explanations are that subjective reports of item-scores are relatively more accurate when the respondent would never doze or has a high chance of dozing, and that weighting the items for prediction might embody the true and natural features of the ESS items, which might avoid underestimation of some variables. Nocturnal polysomnography is the gold standard for diagnosing OSAS, but the diagnostic procedures are expensive and time-consuming. Because of the high prevalence of snoring and OSAS, many sleep laboratories have large numbers of snorers waiting to be tested. The weighted ESS could more accurately detect patients with simple snoring or severe OSAS, especially the former, owing to higher Youden's indices, which might have implications for clinical triage decisions to prioritize patients for polysomnography. Patients with severe EDS would have priority for polysomnography, whereas those with NDS would not. Moreover, the weighted ESS predicted the AHI better and was more strongly linked to BMI, NC and LOS than the ESS, indicating that the weighted ESS achieved a significant improvement in predicting severity in patients with OSAS. These results give credence to future recommendations for decision making in clinical practice, although much ... Practicability is another aspect that requires assessment when developing a new scoring system. The weighted ESS is a little more difficult to implement than the ESS, but the benefit far outweighs the difficulty. Hence, its practicability is likely to be acceptable.
The relationship with the corresponding AHI became closer as the item-score increased. The higher the ESS item-score, the higher the somnificity, which might explain this relationship. The relative inaccuracy of subjective reports of item-scores might also be more obvious when the chance of dozing is low than when it is high, which might contribute to the relationship as well. As the AHI increased, the ESS items demonstrated lower ORs. The mechanisms underlying this phenomenon remain to be clarified by further research.

Limitations
Several limitations of this study must be acknowledged. First, the relative inaccuracy of subjective reports in the ESS and weighted ESS was probably inevitable. Daytime sleepiness could be either underestimated or overestimated. Second, the samples were relatively small. Had the numbers been larger, the results might have been more robust. Third, although there was a validation cohort, this was not a multicenter study and, most importantly, there was no external validation. Fourth, the patients were young on average (mean age 39-43 years) and relatively lean (BMI 26.3). This is not the classical phenotype seen in sleep laboratories around the world in patients with clinical suspicion of OSA, except in China. Moreover, our samples were limited to Chinese patients. Therefore, future research with other ethnic groups is warranted to assess the generalisability of the current findings. Finally, residual confounding by several factors, including habitual sleep duration, disorders not documented in the study, medications, and genetic and socioeconomic factors, cannot be excluded [16].

Conclusions
The weighted ESS achieved a significant improvement in predicting the AHI, and its capability in predicting patients without OSAS or with severe OSAS was strong, which might have implications for clinical triage decisions to prioritize patients for polysomnography.