Predictive model for early functional outcomes following acute care after traumatic brain injuries: A machine learning-based development and validation study
Department of Medical Records, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China; National Center for Quality Control of Medical Records, Beijing 100730, China
Department of Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
• A random forest model for early functional outcomes following acute care after traumatic brain injury (TBI) was developed and validated.
• The BI score at admission, age, use of nonsurgical treatment, neurosurgery status, and mCCI score were the top 5 prognostic predictors.
• The predictive model is generalisable by using hospital discharge abstract data.
• The model can inform decision-making regarding TBI patient management and facilitate health care quality assessment and resource allocation.
Abstract
Introduction
Few studies on early functional outcomes following acute care after traumatic brain injury (TBI) are available. The aim of this study was to develop and validate a predictive model for functional outcomes at discharge for TBI patients using machine learning methods.
Patients and methods
In this retrospective study, data from 5281 TBI patients admitted for acute care who were identified in the Beijing hospital discharge abstract database were analysed. Data from 4181 patients in 52 tertiary hospitals were used for model derivation and internal validation. Data from 1100 patients in 21 secondary hospitals were used for external validation. A poor outcome was defined as a Barthel Index (BI) score ≤ 60 at discharge. Logistic regression, XGBoost, random forest, decision tree, and back propagation neural network models were used to fit classification models. Performance was evaluated by the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AP), calibration plots, sensitivity/recall, specificity, positive predictive value (PPV)/precision, negative predictive value (NPV) and F1-score.
Results
Compared to the other models, the random forest model demonstrated superior performance in internal validation (AUC of 0.856, AP of 0.786, and F1-score of 0.724) and external validation (AUC of 0.779, AP of 0.630, and F1-score of 0.604). The sensitivity/recall, specificity, PPV/precision, and NPV of the model were 71.8%, 69.2%, 52.2%, and 84.0%, respectively, in external validation. The BI score at admission, age, use of nonsurgical treatment, neurosurgery status, and modified Charlson Comorbidity Index were identified as the top 5 predictors for functional outcome at discharge.
Conclusions
We established a random forest model that performed well in predicting early functional outcomes following acute care after TBI. The model has utility for informing decision-making regarding patient management and discharge planning and for facilitating health care quality assessment and resource allocation for TBI treatment.
Traumatic brain injury (TBI) causes temporary or permanent cognitive, behavioural, emotional, and physical impairments, which affect a patient's activities, participation in society and quality of life [GBD 2016 Traumatic Brain Injury and Spinal Cord Injury Collaborators. Global, regional, and national burden of traumatic brain injury and spinal cord injury, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016].
Early prognosis after TBI is important for clinical decision-making, conducting research, assessing the quality of health care, and allocating medical resources, and it would be helpful in preparing patients and their relatives for expected outcomes. The functional status of patients at discharge from acute hospital care is associated with subsequent rehabilitation arrangements and long-term physical, psychological, and employment outcomes [Diagnostic accuracy of the Barthel index for measuring activities of daily living outcome after ischemic hemispheric stroke: does early poststroke timing of assessment matter?]. Functional outcomes after TBI are of concern to multiple stakeholders, including patients and their relatives, health care providers, and policy-makers. A model for predicting early functional outcomes for TBI patients is useful for informing decision-making regarding patient management and discharge planning and for facilitating health care quality assessments and resource allocation for TBI treatment. Few studies are available on functional outcomes following acute care after TBI. De Guise et al. established an ICD-10-based disability predictive index for patients admitted to hospitals with trauma. However, there have been no studies on predicting physical function following acute care after TBI.
Hospital administrative data could serve as a cost-effective data source with good availability for health care research and quality improvement. This study aimed to develop and validate a predictive model for early functional outcomes in TBI survivors by using the Beijing municipal hospital discharge abstract database (DAD).
This study was approved by the Institutional Review Board of Peking Union Medical College Hospital at the Chinese Academy of Medical Sciences and Peking Union Medical College. Patient consent was waived for this retrospective study because it used unidentified patient data.
Patients and methods
Eligibility criteria
TBI patients were defined as those with a principal diagnosis of intracranial injury (World Health Organization (WHO) International Classification of Diseases (ICD)-10 codes: S06.0-S06.9). TBI patients who had a Barthel index (BI) score ≤ 60 at admission were included in this study. Patients who were under 14 years old, admitted for rehabilitation, readmitted for previous injuries, or died during hospitalization were excluded.
Data source
The Beijing hospital DAD is an administrative database that routinely collects hospital discharge abstract data from 107 (74.3%) secondary and 87 (75.0%) tertiary hospitals in Beijing. Information on eligible TBI patients who were admitted to secondary and tertiary hospitals from 1 January to 31 December 2017 was accessed from the DAD. Data from 4181 TBI patients from 52 tertiary hospitals and 1100 patients from 21 secondary hospitals were analysed in this study.
The following data were extracted from the database: demographic information; ICD-10 codes for primary diagnoses; the first 10 secondary diagnoses; ICD-9 Clinical Modification Volume 3 (ICD-9-CM-3) codes for surgeries and procedures; functional status at admission and at discharge, as assessed by the Barthel index (BI); duration of loss of consciousness (LOC) after the injury; ICU admission; mechanical ventilation use; duration of ventilation; and length of hospital stay (LOS). Individual scores for 10 activities, including feeding, bathing, grooming, dressing, bowel control, bladder control, toileting, transferring from a bed to a chair, walking and stair climbing, were evaluated and documented by a trained nurse on the first day of admission and on the day of discharge. The BI score was calculated by summing the 10 individual scores. Duration of LOC after the injury refers to the total time during which the patient lacked normal awareness of self and the surrounding environment after the injury; it was calculated by summing the duration of LOC before hospital admission and that during hospitalization.
A total of 4716 patients were missing data on the duration of ventilation. However, all of these patients were missing the data because they did not receive mechanical ventilation during hospitalization. Thus, the missing values were recoded to 0 in the preprocessing of the data.
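The recoding rule above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code; the function name and the example records are hypothetical. It maps a missing duration to 0 only when the patient was not ventilated, so that a truly missing value for a ventilated patient is not silently turned into a zero.

```python
from typing import List, Optional, Tuple


def recode_ventilation_hours(used_ventilation: bool, hours: Optional[float]) -> float:
    """Return the ventilation duration in hours.

    A missing value is mapped to 0 only for patients who were never ventilated.
    """
    if hours is not None:
        return float(hours)
    if not used_ventilation:
        return 0.0
    # A missing duration for a ventilated patient is a real gap, not a zero.
    raise ValueError("ventilation duration is missing for a ventilated patient")


# Hypothetical records: (mechanical ventilation used, recorded duration in hours)
records: List[Tuple[bool, Optional[float]]] = [(False, None), (True, 36.0), (False, None)]
cleaned = [recode_ventilation_hours(used, hours) for used, hours in records]
print(cleaned)
```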
Outcome measures
A poor outcome in this study was defined as a BI score ≤ 60 at discharge, which was used to describe severe physical disability in previous studies. The BI assessment was performed for all patients by trained nurses at discharge.
Candidate predictor variables
A number of candidate predictor variables were considered in this study, including age, sex, comorbidities, injury nature, injury severity, multiple injuries, BI score at admission, duration of LOC, type of treatment, ICU admission, mechanical ventilation use, duration of ventilation, and LOS.
A modified Charlson Comorbidity Index (mCCI) was used to measure comorbidities by weighting up to 10 secondary diagnoses in the study population. The mCCI score was generated by summing the weighted values for each comorbid condition as described by Bouamra et al. The mCCI score was input as a continuous variable for model derivation.
The nature of the TBI was divided into 10 groups: concussion (ICD-10 code: S06.0), traumatic cerebral oedema (ICD-10 code: S06.1), diffuse brain injury (ICD-10 code: S06.2), focal brain injury (ICD-10 code: S06.3), epidural haemorrhage (ICD-10 code: S06.4), traumatic subdural haemorrhage (ICD-10 code: S06.5), traumatic subarachnoid haemorrhage (ICD-10 code: S06.6), intracranial injury with prolonged coma (ICD-10 code: S06.7), other intracranial injuries (ICD-10 code: S06.8), and unspecified intracranial injury (ICD-10 code: S06.9).
The International Classification of Diseases-based Injury Severity Score (ICISS) was used to estimate the severity of injuries. Survival risk ratios (SRRs) were assigned to each ICD-10 code at the three- or four-digit level by using the reference dataset of previously reported diagnosis-specific SRRs [An analysis of the effectiveness of a state trauma system: treatment at designated trauma centers is associated with an increased probability of survival. J Trauma Acute Care Surg. 2015; 78: 706-712; discussion 712-714].
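The ICISS is conventionally computed as the product of the SRRs of all of a patient's injury diagnoses. The sketch below illustrates that calculation under stated assumptions: the SRR values are hypothetical placeholders rather than values from the reference dataset, and codes missing from the table are treated as having an SRR of 1.0. The banding function uses the 0.85 cut-off shown in Table 1.

```python
from typing import Dict, Iterable

# Hypothetical SRRs for illustration only; real values come from a reference dataset.
HYPOTHETICAL_SRR: Dict[str, float] = {"S06.5": 0.90, "S27.0": 0.95, "S72.0": 0.98}


def iciss(icd10_codes: Iterable[str], srr_table: Dict[str, float]) -> float:
    """Multiply the survival risk ratios of all injury diagnoses."""
    score = 1.0
    for code in icd10_codes:
        # Assumption: a code absent from the table does not change the score.
        score *= srr_table.get(code, 1.0)
    return score


def iciss_band(score: float) -> str:
    """Band the score as in Table 1 (0.85–1 versus < 0.85)."""
    return "0.85–1" if score >= 0.85 else "< 0.85"


two_injuries = iciss(["S06.5", "S27.0"], HYPOTHETICAL_SRR)
three_injuries = iciss(["S06.5", "S27.0", "S72.0"], HYPOTHETICAL_SRR)
print(two_injuries, iciss_band(two_injuries))
print(three_injuries, iciss_band(three_injuries))
```

Because every SRR is at most 1, each additional injury can only lower the score, so a lower ICISS indicates a more severe overall injury burden.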
Multiple injuries were defined as the concurrence of TBI and injuries to the thorax, abdomen/lower back/pelvis, upper extremities, lower extremities, or spine/spinal cord.
The type of treatment was divided into 4 groups: neurosurgery (ICD-9-CM-3 codes: 01–05), orthopaedic surgery (ICD-9-CM-3 codes: 76–84), other surgery (ICD-9-CM-3 codes excluding 01–05 and 76–84) and nonsurgical treatment (without ICD-9-CM-3 codes).
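The grouping rule can be sketched as a small function over a patient's ICD-9-CM-3 procedure codes. The function name is hypothetical, and the precedence used when a patient has procedures from more than one group (neurosurgery first, then orthopaedic surgery, then other surgery) is an assumption made for this illustration; the text does not state how such cases were handled.

```python
from typing import Iterable


def treatment_type(procedure_codes: Iterable[str]) -> str:
    """Classify treatment from ICD-9-CM-3 codes such as '01.24'."""
    # The first two digits of an ICD-9-CM-3 code identify its category.
    categories = {int(code[:2]) for code in procedure_codes}
    if not categories:
        return "nonsurgical treatment"
    # Assumed precedence: neurosurgery > orthopaedic surgery > other surgery.
    if any(1 <= category <= 5 for category in categories):
        return "neurosurgery"
    if any(76 <= category <= 84 for category in categories):
        return "orthopaedic surgery"
    return "other surgery"


print(treatment_type([]))
print(treatment_type(["79.35", "01.24"]))
```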
The other candidate predictors, including age, sex, BI score at admission, duration of LOC, ICU admission, mechanical ventilation use, duration of ventilation and LOS, were taken directly from the DAD.
Model derivation and validation
The cohort from the tertiary hospitals was randomly split by a ratio of 8:2 into derivation (79.9%, 3342) and internal validation (20.1%, 839) datasets. The cohort from the secondary hospitals was used for external validation. The derivation dataset was used to determine the optimized parameters of the predictive model. The internal and external validation datasets were used to evaluate the performance of the trained models. One-hot encoding was used to encode categorical variables. Age, mCCI score, BI score at admission, duration of LOC, duration of ventilation, and LOS were input as continuous variables.
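A random 8:2 split and one-hot encoding can be sketched with the standard library alone. The function names, the seed, and the ordering of category levels are assumptions for illustration. The exact cohort sizes reported here depend on the authors' own splitting procedure, which this sketch does not attempt to reproduce.

```python
import random
from typing import List, Sequence, Tuple


def split_derivation_validation(
    patient_ids: Sequence[int], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle patient IDs and split them into derivation and validation sets."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


TREATMENT_LEVELS = ["nonsurgical treatment", "neurosurgery", "orthopaedic surgery", "other surgery"]


def one_hot(value: str, levels: Sequence[str]) -> List[int]:
    """Encode a categorical value as one binary indicator per level."""
    if value not in levels:
        raise ValueError(f"unknown category: {value}")
    return [int(value == level) for level in levels]


derivation_ids, validation_ids = split_derivation_validation(range(4181))
print(len(derivation_ids), len(validation_ids))
print(one_hot("neurosurgery", TREATMENT_LEVELS))
```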
Logistic regression, XGBoost, random forest, decision tree, and back propagation (BP) neural network classification models were used. As information on LOS is available only at discharge, we input it for the derivation of the model to assess its effect on the outcome but excluded it in the final model. For each model, a grid search method was used to adjust the parameters and improve the generalization performance of the models. The optimal parameters for each model are shown in Supplemental Digital Content Table S1. In the random forest model, Gini importance (mean decrease in impurity) was used to calculate each feature importance.
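Grid search followed by a Gini importance ranking can be sketched with scikit-learn as below. This is not the authors' configuration: the data are synthetic, and the parameter grid, scoring metric, and number of folds are assumptions (the tuned parameters used in the study are listed in Supplemental Digital Content Table S1).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the derivation dataset (6 features, binary outcome).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)

# Assumed, deliberately small grid; real searches usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # select parameters by cross-validated AUC
    cv=3,
)
search.fit(X, y)

# feature_importances_ holds the mean decrease in impurity (Gini importance).
importances = search.best_estimator_.feature_importances_
ranking = sorted(range(X.shape[1]), key=lambda i: importances[i], reverse=True)
print(search.best_params_)
print(ranking[:5])
```

Gini importances are normalised to sum to 1, so they describe each feature's relative contribution to the fitted forest rather than an effect size or direction.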
Statistical analysis
Normality was tested using the Kolmogorov–Smirnov method for continuous variables. Continuous variables are mainly described as medians and interquartile ranges (IQRs) and were analysed using the Kruskal–Wallis test. Most patients did not experience LOC or mechanical ventilation use and had an mCCI score of 0. Therefore, for presenting baseline characteristics, the duration of LOC was categorized into < 0.5 h, 0.5–24 h, and > 24 h; the duration of ventilation was categorized into 0 h, 1–96 h, and > 96 h; and the mCCI score was categorized into 0, 1–5 and > 6. Frequencies and percentages are used to describe categorical variables. The statistical significance of differences in the proportions amongst different groups was determined using χ2 tests. Two-tailed tests with P < 0.05 were considered significant.
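The descriptive categories for LOC and ventilation duration can be sketched as follows. The function names are hypothetical, and the handling of boundary values (24 h and 96 h fall in the middle category) is an assumption read from the category labels. These categories were used only for presenting baseline characteristics; the models received the durations as continuous variables.

```python
def loc_category(hours: float) -> str:
    """Band the duration of loss of consciousness (hours)."""
    if hours < 0.5:
        return "< 0.5 h"
    if hours <= 24:
        return "0.5–24 h"
    return "> 24 h"


def ventilation_category(hours: float) -> str:
    """Band the duration of mechanical ventilation (hours)."""
    if hours == 0:
        return "0 h"
    if hours <= 96:
        return "1–96 h"
    return "> 96 h"


print(loc_category(0.0), loc_category(2.0), loc_category(30.0))
print(ventilation_category(0), ventilation_category(48), ventilation_category(120))
```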
We calculated the following performance metrics for each model on the internal validation, external validation, and the combined internal and external validation datasets: areas under the receiver operating characteristic curve (AUCs), areas under the precision-recall curve (APs), F1-scores, sensitivity/recall, specificity, positive predictive values (PPVs)/precision, and negative predictive values (NPVs). The 95% confidence intervals (CIs) for each value were calculated. A higher AUC value indicates better discrimination of the model. An AUC ≥ 0.9 was considered outstanding discrimination, 0.8 ≤ AUC < 0.9 was considered excellent, 0.7 ≤ AUC < 0.8 was considered acceptable, and 0.5 ≤ AUC < 0.7 was considered poor or no discrimination. Calibration plots were examined to assess the reliability of the candidate models.
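The threshold-based metrics listed above all follow from the four cells of a confusion matrix, and the AUC bands translate directly into a lookup. The sketch below shows both; the function names and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall: poor outcomes correctly flagged
    specificity = tn / (tn + fp)  # good outcomes correctly cleared
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}


def auc_band(auc: float) -> str:
    """Label discrimination using the bands defined in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "poor or no discrimination"


metrics = classification_metrics(tp=80, fp=30, tn=70, fn=20)  # hypothetical counts
print(metrics)
print(auc_band(0.856), auc_band(0.779))
```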
Statistical analyses and the derivation and validation of the models were carried out with SPSS Version 23.0 (Chicago, USA) and Python Version 3.7.0. Scikit-learn library Version 0.22.2 and Keras library Version 2.2.4 were used. XGBoost, random forest, decision tree, and logistic regression models were generated using Scikit-learn, and BP neural network was constructed using Keras.
Sensitivity analyses
Two sensitivity analyses were performed to assess the robustness of the model. The combined cohort from tertiary and secondary hospitals and a cohort of TBI patients without multiple injuries were used for model training and validation.
Results
Baseline characteristics
Of the 4181 patients from tertiary hospitals, 3342 (79.9%) patients were used for model derivation, and 839 (20.1%) patients were used for internal validation of the models. All 1100 patients from the secondary hospitals were used for external validation of the models. Table 1 shows the baseline characteristics of the patients in the derivation, internal validation and external validation cohorts. The incidence rates of poor functional outcomes of TBI patients at discharge were 37.1%, 37.2% and 31.9% in the derivation, internal validation and external validation cohorts, respectively. Significant differences between the patients in the derivation cohort and those in the internal validation cohort were found for age (P = 0.022) and LOS (P = 0.032). The median ages of the patients were 51.0 (IQR: 36.0–64.0) and 52.0 (IQR: 40.0–66.0) years in the derivation and internal validation cohorts, respectively, while the median LOSs were 11 (IQR: 6–18) and 12 (IQR: 6–20) days in the two cohorts. Significant differences between the patients in the derivation cohort and those in the external validation cohort were found for age (P < 0.001), injury nature (P < 0.001), multiple injuries (P = 0.005), type of treatment (P < 0.001), mechanical ventilation use (P = 0.016), duration of ventilation (P = 0.019), LOS (P < 0.001), and outcome (P = 0.002).
Table 1. Baseline characteristics of the study cohorts.

| Characteristics | Tertiary Hospitals: Derivation Cohort (n = 3342) | Tertiary Hospitals: Internal Validation Cohort (n = 839) | Secondary Hospitals: External Validation Cohort (n = 1100) |
|---|---|---|---|
| Age, year | 51.0 (36.0–64.0) | 52.0 (40.0–66.0) | 55.0 (45.0–66.8) |
| Sex: Female | 1036 (31.0%) | 269 (32.1%) | 348 (31.6%) |
| Sex: Male | 2306 (69.0%) | 570 (67.9%) | 752 (68.8%) |
| mCCI score: 0 | 2689 (80.5%) | 651 (77.6%) | 893 (81.2%) |
| mCCI score: 1–5 | 516 (15.4%) | 148 (17.6%) | 175 (15.9%) |
| mCCI score: > 6 | 137 (4.1%) | 40 (4.8%) | 32 (2.9%) |
| Injury nature (ICD-10 code): S06.0 | 143 (4.3%) | 35 (4.2%) | 126 (11.5%) |
| Injury nature (ICD-10 code): S06.1 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Injury nature (ICD-10 code): S06.2 | 248 (7.4%) | 62 (7.4%) | 159 (14.5%) |
| Injury nature (ICD-10 code): S06.3 | 358 (10.7%) | 112 (13.3%) | 159 (14.5%) |
| Injury nature (ICD-10 code): S06.4 | 234 (7.0%) | 53 (6.3%) | 58 (5.3%) |
| Injury nature (ICD-10 code): S06.5 | 522 (15.6%) | 134 (16.0%) | 140 (12.7%) |
| Injury nature (ICD-10 code): S06.6 | 287 (8.6%) | 64 (7.6%) | 142 (12.9%) |
| Injury nature (ICD-10 code): S06.7 | 674 (20.2%) | 174 (20.7%) | 299 (27.2%) |
| Injury nature (ICD-10 code): S06.8 | 94 (2.8%) | 20 (2.4%) | 28 (2.5%) |
| Injury nature (ICD-10 code): S06.9 | 782 (23.4%) | 185 (22.1%) | 38 (3.5%) |
| ICISS: 0.85–1 | 2725 (81.5%) | 677 (80.7%) | 878 (79.8%) |
| ICISS: < 0.85 | 617 (18.5%) | 162 (19.3%) | 222 (20.2%) |
| Multiple injuries (yes) | 1696 (50.7%) | 426 (50.8%) | 505 (45.9%) |
| BI score at admission | 40.0 (20.0–50.0) | 40.0 (20.0–50.0) | 35.0 (20.0–50.0) |
| LOC, hour: < 0.5 | 3030 (90.7%) | 753 (89.7%) | 1021 (92.8%) |
| LOC, hour: 0.5–24 | 89 (2.7%) | 28 (3.3%) | 33 (3.0%) |
| LOC, hour: > 24 | 223 (6.7%) | 58 (6.9%) | 46 (4.2%) |
| Type of treatment: Nonsurgical treatment | 2651 (79.3%) | 665 (79.3%) | 815 (74.1%) |
| Type of treatment: Neurosurgery | 637 (19.1%) | 156 (18.6%) | 192 (17.5%) |
| Type of treatment: Orthopaedic surgery | 35 (1.0%) | 12 (1.4%) | 18 (1.6%) |
| Type of treatment: Other surgeries | 19 (0.6%) | 6 (0.7%) | 77 (6.4%) |
| ICU admission (yes) | 694 (20.8%) | 189 (22.5%) | 229 (20.8%) |
| Mechanical ventilation use (yes) | 371 (11.1%) | 100 (11.9%) | 94 (8.5%) |
| Duration of ventilation (hours): 0 | 2971 (88.9%) | 739 (88.1%) | 1006 (91.5%) |
| Duration of ventilation (hours): 1–96 | 170 (5.1%) | 56 (6.7%) | 64 (5.8%) |
| Duration of ventilation (hours): > 96 | 201 (6.0%) | 44 (5.2%) | 30 (2.7%) |
| LOS (days) | 11 (6–18) | 12 (6–20) | 14 (8–22) |
| Outcome (BI score ≤ 60 at discharge) | 1239 (37.1%) | 312 (37.2%) | 351 (31.9%) |

BI, Barthel Index; ICISS, International Classification of Diseases-based Injury Severity Score; mCCI, modified Charlson Comorbidity Index; LOC, loss of consciousness; LOS, length of hospital stay.
Significant differences were observed for all the candidate predictor variables between the better outcome group (BI score > 60 at discharge) and the poorer outcome group (BI score ≤ 60 at discharge), except for sex. A total of 24 variables were identified as predictors in our study.
Candidate models were fitted by using logistic regression, XGBoost, random forest, decision tree, and BP neural network models. The receiver operating characteristic (ROC) curves and the precision-recall (PR) curves of the internal and external validations for the fitted models that included LOS are shown in Supplemental Digital Content Figures S1-S3. The ROC curves and PR curves for the fitted models that excluded LOS are shown in Fig. 1, Fig. 2, Fig. 3. Changes in the AUC values by < 0.030 were observed when LOS was excluded from the models. The performance metrics of the models excluding LOS are summarized in Table 2. The AUC values of the random forest model were slightly greater than those of the other models. The AUC of the random forest model with the internal validation cohort was 0.856 (95% CI: 0.830–0.883). The model maintained acceptable discrimination with the external validation cohort, with an AUC of 0.779 (95% CI: 0.750–0.808). In the validation with the combined internal and external validation cohorts, the random forest model had an AUC of 0.813 (95% CI: 0.793–0.833). The AP value and the F1-score of the random forest model with the internal validation cohort were 0.786 (95% CI: 0.740–0.832) and 0.724 (95% CI: 0.694–0.754). The AP value (0.630) and the F1-score (0.604) of random forest model were greater than those of the other models in the external validation. The sensitivity/recall, specificity, PPV/precision and NPV of the random forest model were 73.2%, 74.1%, 59.5% and 84.2% in the combined validation cohort, respectively (Table 2). The confusion matrices at the optimal operating point of the models are shown in Supplemental Digital Content Table S2. The calibration plots for each model are shown in Supplemental Digital Content Figure S4. The random forest model was selected as our final model because it consistently showed the best discrimination and calibration in both the internal and external validation cohorts.
Fig. 1. Receiver operating characteristic curves (a) and precision-recall curves (b) of the candidate models for internal validation. AUC, area under the receiver operating characteristic curve. AP, area under the precision-recall curve.
Fig. 2. Receiver operating characteristic curves (a) and precision-recall curves (b) of the candidate models for external validation. AUC, area under the receiver operating characteristic curve. AP, area under the precision-recall curve.
Fig. 3. Receiver operating characteristic curves (a) and precision-recall curves (b) of the candidate models for combined internal and external validation. AUC, area under the receiver operating characteristic curve. AP, area under the precision-recall curve.
Table 2. Performance metrics of the candidate models in the validation cohorts.

| Model | Metric | Internal Validation Cohort (n = 839) | External Validation Cohort (n = 1100) | Internal & External Validation Cohort (n = 1939) |
|---|---|---|---|---|
| XGBoost | AUC | 0.855 (0.828–0.882) | 0.770 (0.741–0.800) | 0.808 (0.787–0.828) |
| XGBoost | AP | 0.795 (0.750–0.840) | 0.618 (0.567–0.669) | 0.695 (0.660–0.730) |
| XGBoost | Sensitivity/Recall | 0.667 (0.615–0.719) | 0.615 (0.564–0.666) | 0.640 (0.603–0.677) |
| XGBoost | Specificity | 0.875 (0.847–0.903) | 0.786 (0.757–0.815) | 0.823 (0.802–0.844) |
| XGBoost | PPV/Precision | 0.759 (0.708–0.810) | 0.574 (0.524–0.624) | 0.652 (0.615–0.689) |
| XGBoost | NPV | 0.816 (0.784–0.848) | 0.814 (0.786–0.842) | 0.815 (0.794–0.836) |
| XGBoost | F1-score | 0.710 (0.679–0.741) | 0.594 (0.565–0.623) | 0.646 (0.625–0.667) |
| Random Forest | AUC | 0.856 (0.830–0.883) | 0.779 (0.750–0.808) | 0.813 (0.793–0.833) |
| Random Forest | AP | 0.786 (0.740–0.832) | 0.630 (0.579–0.681) | 0.695 (0.660–0.730) |
| Random Forest | Sensitivity/Recall | 0.747 (0.699–0.795) | 0.718 (0.671–0.765) | 0.732 (0.698–0.766) |
| Random Forest | Specificity | 0.812 (0.779–0.845) | 0.692 (0.659–0.725) | 0.741 (0.717–0.765) |
| Random Forest | PPV/Precision | 0.702 (0.653–0.751) | 0.522 (0.477–0.567) | 0.595 (0.561–0.629) |
| Random Forest | NPV | 0.844 (0.812–0.876) | 0.840 (0.811–0.869) | 0.842 (0.821–0.863) |
| Random Forest | F1-score | 0.724 (0.694–0.754) | 0.604 (0.575–0.633) | 0.656 (0.635–0.677) |
| Decision Tree | AUC | 0.722 (0.685–0.759) | 0.652 (0.615–0.687) | 0.683 (0.657–0.709) |
| Decision Tree | AP | 0.557 (0.502–0.612) | 0.423 (0.371–0.475) | 0.480 (0.442–0.518) |
| Decision Tree | Sensitivity/Recall | 0.657 (0.604–0.710) | 0.581 (0.529–0.633) | 0.617 (0.580–0.654) |
| Decision Tree | Specificity | 0.778 (0.743–0.813) | 0.718 (0.686–0.750) | 0.743 (0.719–0.767) |
| Decision Tree | PPV/Precision | 0.637 (0.584–0.690) | 0.492 (0.444–0.540) | 0.555 (0.519–0.591) |
| Decision Tree | NPV | 0.793 (0.758–0.828) | 0.785 (0.754–0.816) | 0.789 (0.766–0.812) |
| Decision Tree | F1-score | 0.647 (0.615–0.679) | 0.533 (0.504–0.562) | 0.584 (0.562–0.606) |
| BP Neural Network | AUC | 0.847 (0.819–0.874) | 0.764 (0.734–0.796) | 0.800 (0.779–0.820) |
| BP Neural Network | AP | 0.776 (0.730–0.822) | 0.593 (0.542–0.644) | 0.670 (0.634–0.706) |
| BP Neural Network | Sensitivity/Recall | 0.686 (0.635–0.737) | 0.618 (0.567–0.669) | 0.650 (0.614–0.686) |
| BP Neural Network | Specificity | 0.880 (0.852–0.908) | 0.780 (0.750–0.810) | 0.821 (0.800–0.842) |
| BP Neural Network | PPV/Precision | 0.773 (0.724–0.822) | 0.568 (0.518–0.618) | 0.654 (0.618–0.690) |
| BP Neural Network | NPV | 0.826 (0.795–0.857) | 0.813 (0.784–0.842) | 0.819 (0.798–0.840) |
| BP Neural Network | F1-score | 0.727 (0.697–0.757) | 0.592 (0.563–0.621) | 0.652 (0.631–0.673) |
| Logistic Regression | AUC | 0.842 (0.814–0.870) | 0.777 (0.748–0.806) | 0.805 (0.785–0.825) |
| Logistic Regression | AP | 0.771 (0.724–0.818) | 0.627 (0.576–0.678) | 0.689 (0.654–0.724) |
| Logistic Regression | Sensitivity/Recall | 0.779 (0.733–0.825) | 0.752 (0.707–0.797) | 0.765 (0.733–0.797) |
| Logistic Regression | Specificity | 0.755 (0.718–0.792) | 0.649 (0.615–0.683) | 0.693 (0.668–0.718) |
| Logistic Regression | PPV/Precision | 0.653 (0.605–0.701) | 0.501 (0.458–0.544) | 0.564 (0.532–0.596) |
| Logistic Regression | NPV | 0.852 (0.820–0.884) | 0.848 (0.819–0.877) | 0.850 (0.828–0.872) |
| Logistic Regression | F1-score | 0.711 (0.680–0.742) | 0.601 (0.572–0.630) | 0.649 (0.628–0.670) |

AUC, area under the receiver operating characteristic curve; AP, area under the precision-recall curve; PPV, positive predictive value; NPV, negative predictive value.
Importance scores for each feature in the selected random forest model are shown in the Supplemental Digital Content, Table S3. The top 5 predictors for the selected random forest model were the BI score at admission, age, use of nonsurgical treatment, neurosurgery status, and mCCI score. Compared with the patients in the better outcome group, those in the poorer outcome group had a lower BI score at admission (median: 40 (IQR: 30–55) vs. 20 (IQR: 0–40), P < 0.001), an older age (median: 47 (IQR: 33–58) vs. 60 (IQR: 45–75), P < 0.001), a lower proportion of nonsurgical treatment (90.4% vs. 60.5%, P < 0.001), a higher proportion of neurosurgery (8.7% vs. 36.7%, P < 0.001), and a higher mCCI score (median: 0 (IQR: 0–0) vs. 0 (IQR: 0–1), P < 0.001); in each comparison, the value for the better outcome group is given first.
Sensitivity analyses
Consistent results were obtained in the sensitivity analyses, in which the combined dataset from the tertiary and secondary hospitals (Supplemental Digital Content, Figure S5) and the dataset of TBI patients without multiple injuries (Supplemental Digital Content, Figure S6) were used for the training and validation of the model. The random forest model achieved AUCs of 0.828 (95% CI: 0.802–0.854) and 0.819 (95% CI: 0.783–0.855) in the two sensitivity analyses, respectively, showing the robustness of the model.
Discussion
In this study, we developed a machine learning model to predict poor functional outcomes at discharge for TBI patients using a hospital DAD with predictors that are easily available during acute care.
Since the BI score was proposed as an indicator for functional status, it has been widely used in the evaluation and prediction of therapeutic effects, LOS and prognosis because of its simplicity, reliability and validity, and high correlations with other measures of physical disability. Functional evaluation has been required for all inpatients at admission and at discharge by trained nurses using the BI in all hospitals in Beijing since 2012. BI scores have also been included in the municipal hospital administrative database in the study area since 2012.
The target population in this study was TBI patients who were admitted for acute care with a BI score ≤ 60 at admission. TBI patients with a BI score > 60 at admission were excluded from this study because they had a lower risk of poor functional outcomes; including them in the study would lead to potential model overfitting. The outcome parameter is evaluated at hospital discharge and varies with LOS. Thus, models were fitted initially with LOS included and then with LOS excluded to assess the effects of LOS on the outcome. The AUC values of the candidate models slightly decreased when LOS was excluded from the models, implying that LOS might play a less important role than the other predictors in the outcome prediction. In addition, LOS is unavailable for early prediction, so it was excluded from the final models.
The BI score at admission, age, use of nonsurgical treatment, neurosurgery status, and mCCI score were identified as the top 5 predictors. The patients with lower BI scores at admission, older age and higher mCCI scores and those who underwent neurosurgery were more likely to experience difficulties in performing daily activities at discharge, which is consistent with clinical intuition. These predictors are important for risk adjustment in functional prognostic studies of TBI patients. The BI score at admission was identified as the most important predictor for functional outcomes at discharge in TBI patients. It was previously reported to be a simple, reliable and strong predictor of the BI score at discharge in stroke studies. The BI is an easy and useful tool to evaluate the performance of activities of daily living at admission and discharge for hospitalized TBI patients.
The cohort from tertiary hospitals in Beijing was considered a homogeneous study population, which was used for model derivation and internal validation, while that from secondary hospitals was considered a heterogeneous study population and was used for external validation. The patients in the derivation and external validation cohorts differed in age, injury nature, presence of multiple injuries, type of treatment, use of mechanical ventilation, duration of ventilation, LOS, and functional outcome, which supported the heterogeneity of the two cohorts. Regarding internal validation, the random forest and XGBoost models showed similar discrimination with AUCs over 0.850, followed by the BP neural network, logistic regression, and decision tree models. Regarding external validation, the random forest, logistic regression and XGBoost models maintained acceptable discrimination with AUCs of at least 0.770. The AP value and the F1-score of the random forest model were greater than those of the other models in the external validation. We selected the random forest model as the final model, which showed the best discrimination and acceptable calibration in both internal and external validation amongst all the candidate models. The selected model developed using the cohort from tertiary hospitals showed acceptable performance in the validation using the heterogeneous cohort from secondary hospitals, indicating the generalizability of the model.
Logistic regression is a conventional statistical method for model derivation. In this study, multiple machine learning methods, including XGBoost, random forest, BP neural network and decision tree models, along with logistic regression, were used for model derivation. The results showed that the machine learning methods did not considerably outperform the conventional method in our study. The types of data available for analysis might have limited the performance of the machine learning methods. Machine learning methods benefit more from unprocessed natural-language data and a large number of input variables.
The ICD-10 has been widely used in morbidity coding. A few ICD-10 clinical modifications have been developed to address country-specific needs, such as the ICD-10-AM (Australia), ICD-10-CA (Canada), ICD-10-CM (USA), ICD-10-GM (Germany), ICD-10-KM (Korea), ICD-10-TM (Thailand), and ICD-10 Chinese Modification. In general, the national modifications were developed by extending the WHO ICD-10 with more granularity below the basic three- or four-digit categories. In this study, the ICD-10-based predictors, including the nature of the injury, ICISS, and mCCI score, were derived from the basic ICD-10 three- or four-digit codes of diagnoses rather than the extended codes. Therefore, the model could be applied in any country that codes diagnoses with the ICD-10 or one of its national modifications.
From a patient care perspective, the prognosis of an individual patient is important for both in-hospital care and subsequent care arrangements after discharge. Predictions during acute care may help physicians stratify patients by risk of potential functional impairment, prioritize care accordingly, and inform discharge planning. The model could also be helpful in preparing patients and caregivers by indicating the need for nursing assistance after acute care and in providing guidance in the arrangement of discharge destinations. The PPV/precision of the selected model was 70.2% in the internal validation cohort from tertiary hospitals; however, it decreased to 52.0% in the external validation cohort from secondary hospitals. This model should be used with caution in secondary hospitals because a positive prediction there has a higher probability of being a false positive. The NPVs of the selected model were over 84% in the internal, external, and combined internal and external validation cohorts, which indicates that a negative prediction from the model is helpful for ruling out a poor functional outcome following acute care after TBI in secondary and tertiary hospitals. It should be noted that TBI patients who died during hospitalization were not included in the modelling. Thus, if the model were to be used in clinical practice, physicians and caregivers would need to be told that the prediction would only be valid if the patient survived.
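The meaning of these predictive values can be illustrated with a small sketch that derives PPV and NPV from confusion-matrix counts. The counts below are hypothetical, chosen only so that the resulting PPV is close to the external validation figure quoted above; they are not the study's data, and the function name `ppv_npv` is likewise an assumption.

```python
from typing import Tuple


def ppv_npv(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of positive predictions that are correct.
    NPV = TN / (TN + FN): share of negative predictions that are correct.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative prediction")
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical counts, invented for illustration only.
ppv, npv = ppv_npv(tp=52, fp=48, tn=170, fn=30)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.520, NPV = 0.850
```

With a PPV of about 0.52, roughly one in two positive predictions is a false positive, whereas an NPV of 0.85 means that most negative predictions are correct.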
From a systemic and administrative perspective, the predictions using this model could be applied for performance assessments of the quality of care in a given hospital or region. It is known that poor hospital care is one of the explanations for deaths and disabilities caused by trauma, in addition to a lack of preventive programs, poor prehospital care and the unavailability of rehabilitation medical services [
]. TBI poses a great challenge to low- and middle-income countries due to their suboptimal health care systems. Strategies to assess the quality of acute care are needed to improve the outcomes of patients with TBI. According to the calibration plots, the probability of a poor outcome was overestimated to some extent for all the candidate models, which supports the utility of the model to screen patients who might receive suboptimal health care for further review and assessment. The quality of health care delivery could be indicated by comparing the observed outcomes with the expected outcomes [
Diagnostic accuracy of the Barthel index for measuring activities of daily living outcome after ischemic hemispheric stroke: does early poststroke timing of assessment matter?.
]. Because all the predictors for functional outcomes identified in our study are easily retrievable variables from a hospital DAD, the prognostic model has great potential for the quality assessment of health care delivery following nonfatal TBI in a specific hospital or in a specific area.
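One simple way to express the comparison of observed and expected outcomes is an observed-to-expected (O/E) ratio, where the expected number of poor outcomes is the sum of the model's predicted probabilities for the patients of a hospital or region. The sketch below is an assumption about how such a comparison might be set up, not the procedure used in this study; the function name and the numbers are hypothetical.

```python
from typing import Sequence


def observed_expected_ratio(outcomes: Sequence[int],
                            predicted: Sequence[float]) -> float:
    """O/E ratio: observed poor outcomes over the sum of predicted risks."""
    if len(outcomes) != len(predicted):
        raise ValueError("outcomes and predicted must have the same length")
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected number of poor outcomes must be positive")
    return sum(outcomes) / expected


# Hypothetical hospital: 1 = poor functional outcome at discharge.
outcomes = [1, 0, 0, 1]
predicted = [0.5, 0.25, 0.25, 0.5]  # hypothetical model outputs
print(round(observed_expected_ratio(outcomes, predicted), 3))  # 1.333
```

A ratio above 1 would flag more poor outcomes than the model expects. Because the calibration plots suggested some overestimation of risk, a ratio below 1 should be interpreted with care rather than taken as evidence of better-than-expected care.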
This study had some limitations. First, this was a retrospective study, so the impact on outcomes of variables that were not documented in the hospital DAD could not be evaluated. The value of Glasgow Coma Scale (GCS) scores in the prediction of functional outcomes has not yet been shown to be consistent. In this study, GCS scores were not included in the analysis because they were not available in the data. Instead, we included the BI score at admission, duration of LOC and ICISS in the models to reflect injury severity. The covariations between GCS and BI scores at admission in TBI patients are of interest for future studies. Second, the BI score was the only outcome measure in this study. TBI patients could have physical, cognitive and behavioural impairments. The BI reflects daily activity performance, which is one dimension of physical functioning. There are several well-recognized multidimensional instruments for assessing functional status in acute care following TBI, including the Functional Independence Measure (FIM), the Functional Status Examination (FSE), the Glasgow Outcome Scale-Extended (GOSE), and the Disability Rating Scale (DRS) [
A multidimensional Rasch analysis of the Functional Independence Measure based on the National Institute on Disability, Independent Living, and Rehabilitation Research Traumatic Brain Injury Model Systems National Database.
]. Further studies are required to predict functional outcomes assessed by multidimensional instruments following acute care after TBI. Third, the study population was from a single area (Beijing). Although considerable generalizability of the model was shown in this study, external validation in cohorts from different areas is warranted.
Conclusions
By using machine learning methods and the hospital DAD, we developed a model for predicting poor functional outcomes at discharge for TBI patients. The model could be applied in early prognosis to inform decisions during acute hospital care and for subsequent rehabilitation arrangements after discharge, to inform patients and relatives about the needs for nursing assistance and to facilitate health care quality assessments and resource allocation for TBI treatment.
Data availability
The data that support the findings of this study are available upon reasonable request with the permission of the Beijing Municipal Health Big Data and Policy Research Center.
Funding
This study was supported by the Fundamental Research Funds for the Central Universities (No. 3332021004) and the National High Level Hospital Clinical Research Funding (No. 2022-PUMCH-A-223). The funding sources had no involvement in the study design, data collection, analysis and interpretation, preparation of the manuscript, or decision to submit the article for publication.
CRediT authorship contribution statement
Meng Zhang: Conceptualization, Methodology, Resources, Data curation, Formal analysis, Software, Validation, Writing – original draft. Moning Guo: Resources, Data curation, Formal analysis, Writing – original draft. Zihao Wang: Writing – original draft, Writing – review & editing. Haimin Liu: Software, Validation, Writing – original draft. Xue Bai: Resources, Data curation, Formal analysis, Writing – original draft. Shengnan Cui: Resources, Data curation, Formal analysis, Writing – original draft. Xiaopeng Guo: Writing – original draft, Writing – review & editing. Lu Gao: Writing – original draft, Writing – review & editing. Lingling Gao: Software, Validation, Writing – original draft. Aimin Liao: Resources, Data curation, Formal analysis, Writing – original draft. Bing Xing: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Yi Wang: Conceptualization, Methodology, Writing – original draft.
Declarations of Competing Interest
None.
Acknowledgments
We thank Bingqing Yang, Fengxiang Chang and Yafei Shang for their help with data arrangements.
References
Traumatic Brain Injury and Spinal Cord Injury Collaborators. Global, regional, and national burden of traumatic brain injury and spinal cord injury, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016.
Diagnostic accuracy of the Barthel index for measuring activities of daily living outcome after ischemic hemispheric stroke: does early poststroke timing of assessment matter?
An analysis of the effectiveness of a state trauma system: treatment at designated trauma centers is associated with an increased probability of survival. J Trauma Acute Care Surg. 2015; 78: 706-712; discussion 712-714.
A multidimensional Rasch analysis of the Functional Independence Measure based on the National Institute on Disability, Independent Living, and Rehabilitation Research Traumatic Brain Injury Model Systems National Database.