Development and validation of a prediction model for tuberculous pleural effusion: a large cohort study and external validation

Background Distinguishing tuberculous pleural effusion (TPE) from non-tuberculosis (TB) benign pleural effusion (BPE) remains a challenge in clinical practice. The aim of the present study was to develop and validate a novel nomogram for diagnosing TPE. Methods In this retrospective analysis, a total of 909 consecutive patients with TPE or non-TB BPE from Ningbo First Hospital were divided into a training set and an internal validation set at a ratio of 7:3. Clinical and laboratory features were collected and analyzed by logistic regression. A diagnostic model incorporating the selected variables was developed and externally validated in a cohort of 110 patients from another hospital. Results Six variables, namely age, effusion lymphocyte, effusion adenosine deaminase (ADA), effusion lactate dehydrogenase (LDH), effusion LDH/effusion ADA, and serum white blood cell (WBC), were identified as valuable parameters for developing a nomogram. The nomogram showed good diagnostic performance in the training set. A novel scoring system was then established based on the nomogram to distinguish TPE from non-TB BPE. The scoring system showed good diagnostic performance in the training set [area under the curve (AUC) (95% confidence interval (CI)), 0.937 (0.917–0.957); sensitivity, 89.0%; specificity, 89.5%], the internal validation set [AUC (95% CI), 0.934 (0.902–0.966); sensitivity, 88.7%; specificity, 90.3%], and the external validation set [AUC (95% CI), 0.941 (0.891–0.991); sensitivity, 93.6%; specificity, 87.5%]. Conclusions The study developed and validated a novel scoring system based on a nomogram derived from six clinical parameters. The scoring system showed good diagnostic performance in distinguishing TPE from non-TB BPE and can be conveniently used in clinical settings.
Supplementary Information The online version contains supplementary material available at 10.1186/s12931-022-02051-4.


Background
Tuberculosis (TB) remained the most common cause of death from a single infectious pathogen worldwide in 2019 [1]. An estimated 10 million people developed TB disease and 1.4 million TB patients died in 2019 [1]. Tuberculous pleural effusion (TPE) is a common clinical manifestation of extra-pulmonary TB, which accounts for 25–30% of total TB cases in TB-endemic regions, including China [2][3][4]. Early and accurate diagnosis of TPE is critical for the management of the disease. Currently, the gold standard for TPE diagnosis is based on the detection of acid-fast bacilli (AFB) in sputum, pleural fluid, or pleural biopsy tissue through Mycobacterium tuberculosis (M. tuberculosis) culture or thoracoscopy [4,5]. However, the limited sensitivity, low accuracy, and invasiveness of these diagnostic tools compromise their value in clinical practice [6][7][8]. Alternative diagnostic methods, including the tuberculin skin test (TST), adenosine deaminase (ADA), and interferon-gamma release assays (IGRAs), have improved the speed of TPE diagnosis in recent years [4,[9][10][11]. However, the sensitivity and/or specificity of these methods is still insufficient for separating TPE from other types of pleural effusion (PE), such as malignant pleural effusion (MPE) and parapneumonic pleural effusion (PPE) [9][10][11].
*Correspondence: wuah910602@126.com
Therefore, there is an urgent need for a highly sensitive, accurate, and less invasive diagnostic marker or method for patients with TPE. The aim of this study was to construct a scoring system based on a nomogram to distinguish TPE from non-TB BPE. We also retrospectively validated the diagnostic performance of the scoring system in an internal validation set from our hospital and an external validation set from another hospital.

Patients and study design
This was a retrospective study of individuals older than 18 years who were admitted to Ningbo First Hospital with newly diagnosed PE between January 2014 and March 2021. A flow diagram of patient selection is presented in Fig. 1. We retrospectively reviewed all consecutive patients retrieved with the keywords 'PE (J94.804 and J90. × 00)' and 'tuberculous pleurisy (A16.500)' from the clinical electronic record system of Ningbo First Hospital. All patients were first admitted to our hospital because of pleural effusion. PE samples and concomitant blood samples were taken from all patients and tested for counts and biochemical parameters. Only the data from the first PE and blood samples obtained from each patient were considered for analysis. The related demographic, laboratory, and clinical information for each patient was extracted from the clinical electronic record system. Finally, a total of 909 patients with BPE were enrolled in this study. Patients were randomly assigned to the training set (n = 651) or the internal validation set (n = 258) at a 7:3 ratio. A cohort of 110 patients with PE admitted to the Affiliated People Hospital of Ningbo University from August 2020 to November 2021 was used as the external validation set. Among the 909 patients, BPE was caused by tuberculous pleurisy (TBP) in 414 patients and by parapneumonic effusion (PPE), chronic heart failure (CHF), empyema, parasitic infection, or other conditions in 495 patients. Patients who met all of the following criteria were included: (i) PE was diagnosed by ultrasonography, chest CT, or X-ray; (ii) the cause of PE was diagnosed by cytology, thoracentesis, or pleural biopsy, with follow-up of at least 6 months. The exclusion criteria were as follows: (i) patients diagnosed with MPE; (ii) age < 18 years; (iii) pregnant women; (iv) patients with incomplete clinical data; (v) unknown etiology of PE.
The primary aim of the present study was to develop a scoring system that accurately differentiates TPE from non-TPE. The training

Diagnostic criteria for BPE and TPE
BPE was diagnosed based on the following criteria: (a) no tumor cells were found in PE; (b) the PE had a known etiology, such as TPE or parapneumonic PE, and resolved after optimal treatment; (c) no signs of malignant disease developed during follow-up. Patients with TPE who were first diagnosed and treated in our hospital were included in our study. TPE was diagnosed based on any of the following criteria: (a) M. tuberculosis was positive in culture of the pleural effusion or pleural tissue; (b) granulomatous inflammation was present in the pleural biopsy on histologic examination and M. tuberculosis was isolated from other sites; or (c) granulomatous inflammation was present in the pleural biopsy on histologic examination and the patient showed a clinical response to anti-TB treatment [12][13][14].

Statistical analysis
Continuous variables were presented as median and interquartile range (IQR, 25th–75th percentiles) and were compared using either a t-test or the Mann–Whitney U test, as appropriate. Categorical variables were presented as number and percentage (n, %) and were compared using the Chi-square (χ²) test or Fisher's exact test. Univariate logistic regression analysis was used to screen candidate factors in the training set, and all variables with an area under the curve (AUC) > 0.6 were selected for multivariate logistic analysis. Stepwise selection using the Akaike information criterion (AIC) in the multivariable logistic regression models was then used to determine the statistically significant variables. Odds ratios (ORs) were estimated and presented with 95% confidence intervals (CIs). The selected variables were incorporated into the nomogram to construct the scoring system using the rms package of R. Calibration curves and decision curve analysis (DCA) were also performed. Receiver operating characteristic (ROC) curves and the corresponding AUCs were calculated to determine the discrimination capacity of the models in distinguishing TPE from non-TB BPE. In addition, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated to assess the diagnostic accuracy of the nomogram in the training set and validation sets. All statistical analyses were performed using R (version 4.0.5; http://www.r-project.org; packages rms, MASS, OptimalCutpoints, pROC, and rmda) and SPSS 22.0 (SPSS Inc., Chicago, IL, USA). A two-sided P < 0.05 was considered significant.
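The accuracy indices listed above all follow from a 2×2 table of test result against final diagnosis. The following is a minimal Python sketch of those standard definitions, not the authors' R/SPSS workflow; the function name `diagnostic_metrics`, the class `DiagnosticMetrics`, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    plr: float
    nlr: float


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute accuracy indices from a 2x2 table.

    Assumes all four cells are > 0, so no division by zero occurs.
    """
    sensitivity = tp / (tp + fn)  # true positives among diseased
    specificity = tn / (tn + fp)  # true negatives among non-diseased
    return DiagnosticMetrics(
        sensitivity=sensitivity,
        specificity=specificity,
        ppv=tp / (tp + fp),  # diseased among test-positives
        npv=tn / (tn + fn),  # non-diseased among test-negatives
        plr=sensitivity / (1 - specificity),
        nlr=(1 - sensitivity) / specificity,
    )


# Hypothetical counts for illustration only (not data from this study).
example = diagnostic_metrics(tp=90, fp=10, fn=10, tn=90)
print(example)
```

As a plausibility check of the same formulas, a sensitivity of 89.0% and a specificity of 89.5% give PLR = 0.890 / 0.105 ≈ 8.5 and NLR = 0.110 / 0.895 ≈ 0.12.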

Baseline characteristics
A total of 909 patients with PE from Ningbo First Hospital were included in the present study and were randomly divided into the training set (n = 651) and the internal validation set (n = 258). In addition, 110 patients from the Affiliated People Hospital of Ningbo University were included in the external validation set. The demographic, clinical, and laboratory characteristics of the patients in the three groups are presented in Table 1.

Univariate and multivariate logistic regression analyses in patients with TPE and non-TB BPE
Additional file 1: Table S1 compares the demographic, clinical, and laboratory variables between TPE and non-TB BPE in the training set. The cutoff values of those variables were calculated using the Youden index. As shown in Additional file 2: Table S2, most of the included variables differed significantly between patients with TPE and those with non-TB BPE. The results of the univariate logistic analysis are also shown in Additional file 2: Table S2. All 24 variables showed statistical significance. To establish an accurate prediction model, the 16 variables with an AUC > 0.6 were entered into the multivariate regression analysis. Stepwise selection

Table 1 The clinical characteristics of the training set, internal validation set, and external validation set
TB tuberculous, WBC white blood cell, ADA adenosine deaminase, LDH lactate dehydrogenase, CA125 carbohydrate antigen 125, CA19-9 carbohydrate antigen 19-9, hsCRP high-sensitivity C-reactive protein, ESR erythrocyte sedimentation rate. Continuous variables are presented as median and interquartile range (IQR, 25th–75th percentiles). Categorical variables are presented as number and percentage (n, %)
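The Youden-index cutoff selection mentioned in this section can be illustrated with a short Python sketch. It is a generic illustration rather than the authors' procedure (which used R packages such as OptimalCutpoints and pROC); the function name `youden_cutoff` and the toy values are hypothetical. The sketch assumes that higher values indicate disease; for a marker where lower values indicate disease, the comparison would be reversed.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Decision rule assumed here: value >= cutoff is called positive.
    `labels` uses 1 for diseased and 0 for non-diseased; both classes
    must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    # Every observed value is tried as a candidate cutoff.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy marker values and diagnoses, invented for illustration only.
toy_values = [10, 15, 20, 25, 30, 35, 40, 45]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, j = youden_cutoff(toy_values, toy_labels)
print(cutoff, j)
```

With these toy data the rule "value ≥ 25" classifies all four diseased cases and three of the four non-diseased cases correctly, so the selected cutoff is 25 with J = 0.75.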

Development and validation of the nomogram prediction model
A nomogram based on the above six variables was developed and is presented in Fig. 2A. The calibration curve of the nomogram showed that the predicted line overlapped well with the reference line, indicating good performance of the diagnostic nomogram in the training set (Fig. 2B). In addition, DCA was applied to assess the net benefit of the diagnostic nomogram in order to verify the clinical utility of the model. The results showed that using the nomogram provided a greater net benefit than the "treat-all" or "treat-none" strategy when the threshold probability was > 0.4 (Fig. 2C).
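The net benefit compared in a decision curve is conventionally defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The Python sketch below shows that standard definition only; it does not reproduce the authors' rmda analysis, and the function names and cohort counts are hypothetical.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1 - threshold)  # odds of the threshold probability
    return tp / n - (fp / n) * weight


def treat_all_net_benefit(n_pos: int, n_neg: int, threshold: float) -> float:
    """'Treat-all' strategy: every patient is classified as positive."""
    return net_benefit(tp=n_pos, fp=n_neg, n=n_pos + n_neg, threshold=threshold)


# Hypothetical cohort for illustration: 45 diseased, 55 non-diseased.
# At a threshold of 0.4 the hypothetical model flags 40 true and 6 false positives.
model_nb = net_benefit(tp=40, fp=6, n=100, threshold=0.4)
treat_all_nb = treat_all_net_benefit(n_pos=45, n_neg=55, threshold=0.4)
treat_none_nb = 0.0  # nobody is treated, so no benefit and no harm
print(model_nb, treat_all_nb, treat_none_nb)
```

In this invented example the model's net benefit (about 0.36) exceeds both the treat-all value (about 0.08) and the treat-none value (0), which is the pattern a decision curve displays across a range of thresholds.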

Diagnostic performance of the scoring system in the training set and validation sets
In the training set, effusion LDH/ADA had the largest impact on the discrimination of TPE from non-TB BPE in the model, with 10 points (Fig. 2A). The other five variables were then converted to integer points: age (5 points), effusion lymphocyte (5 points), effusion ADA (8 points), effusion LDH (7 points), and serum WBC (6 points) (Table 3). The optimal cutoff value for the total score was calculated using ROC analysis. At a cutoff value of 23 points, the scoring system showed good discriminative performance in distinguishing TPE from non-TB BPE, with an AUC of 0.937 (95% CI, 0.917–0.957; Fig. 3A and Table 4). The corresponding sensitivity, specificity, PLR, NLR, PPV, and NPV values were 89.0%, 89.5%, 8.5, 0.12, 87.2%, and 91.2%, respectively (Table 4).
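The point assignments and the 23-point cutoff described above can be summarized in a short Python sketch. Several details here are assumptions rather than statements from the text: each variable is treated as all-or-nothing (full points when its Table 3 criterion is met, otherwise zero), a total of at least 23 points is taken to suggest TPE, and the per-variable criteria themselves are supplied by the caller because Table 3 is not reproduced here. The names `POINTS`, `total_score`, and `suggests_tpe` are hypothetical.

```python
# Points per variable as reported in the text (Fig. 2A, Table 3).
POINTS = {
    "effusion_ldh_ada_ratio": 10,
    "effusion_ada": 8,
    "effusion_ldh": 7,
    "serum_wbc": 6,
    "age": 5,
    "effusion_lymphocyte": 5,
}
CUTOFF = 23  # total-score cutoff reported in the text


def total_score(criteria_met: dict) -> int:
    """Sum the points of every variable whose criterion is met.

    `criteria_met` maps variable names to booleans; the thresholds that
    decide each boolean come from Table 3 and are not encoded here.
    """
    unknown = set(criteria_met) - set(POINTS)
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    return sum(POINTS[name] for name, met in criteria_met.items() if met)


def suggests_tpe(criteria_met: dict) -> bool:
    """Assumption: a total score >= CUTOFF is read as suggestive of TPE."""
    return total_score(criteria_met) >= CUTOFF


# Hypothetical patient: LDH/ADA, ADA, and age criteria met; the rest not met.
patient = {
    "effusion_ldh_ada_ratio": True,
    "effusion_ada": True,
    "age": True,
    "effusion_ldh": False,
    "serum_wbc": False,
    "effusion_lymphocyte": False,
}
print(total_score(patient), suggests_tpe(patient))
```

Under these assumptions the maximum possible total is 41 points, and the hypothetical patient above scores 10 + 8 + 5 = 23 points, which just reaches the cutoff.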

Discussion
Early diagnosis and prompt therapy for patients with TPE are critical to prevent severe complications (such as pleural thickening, empyema, and calcification) and mortality. Despite the availability of various diagnostic methods, the early differential diagnosis of TPE from MPE and other non-TB BPE remains challenging in clinical practice. In addition, the paucibacillary nature of the disease, inappropriate or inadequate test samples, ineffective conventional microbiological techniques, and a lack of thoracoscopy equipment all contribute to the difficulty of diagnosing TPE.
The conventional gold standard for diagnosing TPE is the presence of M. tuberculosis on culture or pleural pathology showing caseating granuloma; however, these diagnostic tests are time-consuming and have a low positive rate [8,11]. The tuberculin skin test (TST) and interferon-gamma release assays (IGRAs) are two common methods for diagnosing TPE, but inaccuracy, inconsistent sensitivity, and time to diagnosis have limited their efficacy [11,15,16]. Under these circumstances, thoracoscopy seems to provide higher sensitivity (93–100%) and accuracy for diagnosing TPE; however, it is an invasive and expensive diagnostic method with a reported complication rate of 2–6% [8,17,18]. Common complications include bleeding, fever, empyema, pneumonia, and prolonged air leak [18]. In addition, some patients with progression of underlying disease and some elderly patients cannot tolerate the examination.
In recent years, Xpert MTB/RIF (Xpert) and the next-generation Xpert MTB/RIF Ultra (Xpert Ultra), two nucleic acid detection methods endorsed by the World Health Organization (WHO), have been increasingly used to diagnose pulmonary TB, rifampicin (RIF) resistance, and extra-pulmonary TB in various types of clinical specimens [19,20]. However, a meta-analysis indicated that the pooled sensitivity of Xpert in diagnosing TPE was only 51.4% [21]. This low sensitivity compromises its diagnostic capacity for TPE and might be attributed to the number of mycobacteria and the performance of amplification techniques. Therefore, an effective and noninvasive diagnostic method is urgently needed for the diagnosis and management of TPE.
A nomogram is a graphical representation of a complex mathematical formula and is widely used in medicine to estimate diagnosis and prognosis for a variety of diseases by integrating clinical, biologic, and/or genetic variables [22]. Previously, we and other investigators reported the application of nomograms in differentiating MPE from BPE [23,24]. In the present study, we developed a scoring system based on a nomogram to distinguish TPE from non-TB BPE. We initially integrated 26 variables, including not only primary clinical and laboratory variables but also calculated ratios. We selected the six most significant variables (age, effusion lymphocyte, effusion ADA, effusion LDH, effusion LDH/ADA, and serum WBC) identified by multivariate regression analysis to construct a predictive model. Our model showed good diagnostic performance in distinguishing TPE from non-TB BPE in the derivation and validation sets. The six integrated indexes are inexpensive, routinely tested, and readily available in most hospitals; therefore, our model is convenient to apply in clinical practice. Effusion ADA has long been used to diagnose TPE in numerous studies [11,15]. Michot et al. indicated that effusion ADA at an optimal value of 41.5 U/L might be a useful biomarker to differentiate TPE from non-TPE, with a sensitivity of 97.1% and a specificity of 92.9% [25]. A study conducted by Garcia-Zamalloa et al. showed a similar cutoff value of effusion ADA of 40 U/L [26]. However, a recent study from China showed that the best cutoff value of effusion ADA for TBP was 27 U/L, with a sensitivity of 81% and a specificity of 78% [27]. A similar cutoff value of effusion ADA was also found in our study (22.75 U/L). Therefore, the optimal cutoff value is still controversial, owing to differences in disease prevalence, sample sizes, test methods, or HIV co-infection [11].
In addition, a similar or even higher level of effusion ADA has been reported in PPE, especially in patients with empyema [28,29]. Effusion LDH has been recommended to assist in the classification of patients with complicated parapneumonic effusion (CPPE) [30]. However, effusion LDH is elevated in TPE, PPE, and MPE alike, and its low sensitivity and specificity in differentiating TPE from PPE limit its utility in clinical practice [30].
The effusion LDH/ADA ratio has also been assessed in differentiating TPE from PPE. Wang et al. indicated that the effusion LDH/ADA ratio might be a useful biomarker in diagnosing TPE at a cutoff level of 16.20, with a sensitivity of 93.62% and a specificity of 93.06% [31]. Another study from New Zealand also showed that the effusion LDH/ADA ratio at a cutoff value of 15 demonstrated high sensitivity and specificity in distinguishing TPE from non-TB effusion [32]. Similarly, our study showed a cutoff value of 17.07 for effusion LDH/ADA. Further prospective investigations are needed to validate these results.
To our knowledge, this was the first study to evaluate a scoring system based on a nomogram for distinguishing TPE from non-TB BPE. The developed scoring system appeared reliable and accurate in distinguishing TPE from non-TB BPE, as assessed by sensitivity, specificity, PLR, NLR, PPV, and NPV in the training and validation sets. Our study incorporated the most common and valuable indexes into the predictive model to differentiate TPE from non-TB BPE, and the model performed better than any single variable alone. The six variables are easily accessible, inexpensive, and routinely tested in most hospitals. Therefore, our diagnostic model for differentiating TPE from non-TB BPE could be easily used in clinical practice in most hospitals, especially in primary hospitals.
Our study had some limitations. First, the present study had a retrospective design, and only routine biomarkers in serum and PE were included. Several newer potential biomarkers, such as interleukin 27 (IL-27) and tumor necrosis factor-α (TNF-α), might provide better diagnostic accuracy. Second, the external validation was performed in a single center with a small sample size. Third, our nomogram did not incorporate imaging data into the scoring system, which might be useful. In addition, because the data were unavailable, we did not compare the diagnostic accuracy of our scoring system with that of other diagnostic tests, such as IGRAs and Xpert Ultra. Finally, this study was conducted in Chinese patients. Since the incidence of TB differs from country to country, the results of this study may not apply to patients in other countries. Further multicenter, prospective investigations with comprehensive data are needed to validate our results.

Conclusions
Taken together, the present study developed a novel scoring system based on a nomogram with six clinical and laboratory variables to aid the differential diagnosis of TPE and non-TB BPE. The scoring system showed good diagnostic performance and calibration in distinguishing TPE from non-TB BPE in the training set and the validation sets. Further multicenter, prospective investigations are needed to validate this accessible and non-invasive nomogram.