Hospital-onset COVID-19 infections (HOCIs) have been reported to account for 12·0–15·0% of all COVID-19 cases in health-care settings and up to 16·2% at the peaks of the pandemic.
Although their effect is yet to be fully quantified, HOCIs amplify the pandemic by seeding further outbreaks.
Predicting which patients are at risk of health-care-associated infection (HCAI) can help prevent onward transmission to patients and staff while also reducing workload during outbreaks. Traditionally, HCAI prediction has relied on identifying risk factors from combinations of patient clinical variables (eg, age, gender identity, and comorbidities) and hospital contextual variables (eg, colonisation pressure and patients’ length of stay).
Although these approaches alone can perform reasonably well in identifying predictive risk factors of HCAIs, they overlook the fact that nosocomial spread of infection depends largely on the patient’s contacts.
Isolating and grouping patients who are infected, or suspected to be infected, to one area prevents onward spreading by interrupting transmission chains.
and secondary cases and has played a pivotal role in national COVID-19 responses.
However, exploiting the entire contact network, rather than only the direct contacts of individuals with known infection, provides more information with which to characterise transmission.
In health-care settings, the overall number of direct contacts of a patient is predictive of HCAI.
fail to use the full dynamic information of contacts.
Throughout the COVID-19 pandemic, health-care facilities have had considerable numbers of hospital-onset COVID-19 infections (HOCIs). Despite substantially higher rates of COVID-19 morbidity and mortality among hospitalised patients, predictive models of HOCI are yet to be fully used in health-care settings. To address this gap, we have designed a machine-learning framework that integrates dynamic patient contact-networks with traditional patient clinical risk factors and contextual hospital variables. Patient contact networks are a natural approach to model the contact-mediated transmission of COVID-19 and other infectious diseases. Our study investigates the use of contact-network variables in predicting HOCIs at the patient level and their generalisability to various hospital settings.
Evidence before this study
We performed two searches on PubMed (Sept 22, 2021) for English-language articles. Search one was on prediction of HOCIs, using the search terms “hospital-onset COVID-19 infections”, “nosocomial COVID-19”, “prediction”, and “forecasting”; search two was on the use of contact-networks for prediction of infections acquired in health-care settings, based on the search terms “healthcare-acquired infections”, “nosocomial infections”, “prediction”, “forecasting”, “contact networks”, and “dynamic contact networks”. Search one identified no studies performing a comprehensive investigation into risk factors of HOCI at the patient level. Although several works examined HOCI epidemiology and characterised contacts, these studies were performed at single hospital sites, with few patients, and did not include a risk-factor analysis. Other studies examined risk factors for predicting patient risk of COVID-19 on hospital admission; however, by definition, these studies target only community-onset COVID-19 infections and not HOCIs, and thus do not capture in-hospital sources of exposure risk.
Search two identified studies that used the total number of patient contacts or the total number of contacts with infectious cases. However, no studies of infections acquired in health-care settings incorporated contact connectivity beyond a patient’s immediate contacts to predict infection risk. Furthermore, the studies found in our searches did not use sophisticated network-theoretical measures or modelling techniques to predict individual patient risk, nor did they account for the time-varying nature of the contacts.
Added value of this study
To our knowledge, this is the first study to forecast HOCIs at the patient level by constructing contact-networks from routinely collected hospital bed records. To investigate the predictive use of patient contact-networks, we used a large multinational hospital dataset collected throughout extended periods of the COVID-19 pandemic in two hospital groups: one in London, UK, and one in Geneva, Switzerland. Using these datasets, we constructed models to predict HOCIs at the patient level, both with and without measures of patient centrality calculated from the dynamic patient contact-networks, and assessed their generalisability. Our results show that variables extracted from patient contact-networks are strong predictors of HOCI in both testing and validation. Such network measures improve prediction over standard risk-factor models based on patient clinical data or hospital contextual variables. Most network-derived variables were significantly elevated in HOCIs, emphasising their importance as risk factors.
Implications of all the available evidence
This study shows that dynamic contact-networks provide novel sources of predictive power for respiratory infections acquired in health-care settings, improving the performance of traditional risk-factor prediction models for HOCIs. Contact-network-derived risk factors have the potential to enhance individualised infection prevention and early diagnosis. We designed a machine-learning framework to extract contact risk factors from routinely available bed administrative data and showed that it adds novel and generalisable predictive power. The framework can be used in real time to generate daily risk predictions as part of a suite of surveillance tools in modern, data-driven infection prevention and control strategies.
In this study, we combine dynamic networks of patient contacts (based on bed allocation records) with clinical attributes and hospital contextual data in a novel forecasting framework that predicts patient risk of HOCI acquisition, to target preventive interventions. As a proof of principle, we perform a retrospective cohort study to assess the predictive power of risk factors extracted from patient-contact networks constructed from routinely collected hospital data. We train and test models on a large London hospital dataset spanning the first two major UK surges of COVID-19 (ie, March 23–May 30, 2020 and Sept 7, 2020–April 24, 2021). We then validate the predictive gain from contact-network risk factors by applying the framework to an external dataset from a university-affiliated geriatric hospital in Geneva during surge one (ie, March 1–May 31, 2020) and to data from the same London hospital group after surge two in the UK (ie, April 2–Aug 13, 2021), when COVID-19 had become endemic.
Results
A total of 51 157 patients were admitted to the London hospital group during the study’s training and testing period (April 1, 2020–April 1, 2021). Of these patients, 3439 (6·7%) tested positive for SARS-CoV-2, including 2950 (5·8%) community-onset COVID-19 infections (COCIs) and 489 (1·0%) HOCIs (appendix p 9). In total, 21 576 (42·2%) patients stayed at least 3 days in hospital and were included in the forecasting data (489 HOCIs and 21 087 non-HOCIs).
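The cohort definition can be sketched as follows, assuming a simplified admission record (the `Admission` class and its fields are hypothetical, not the study's data schema). The sketch uses the 3-day threshold given in the table 1 notes, counts the admission day as day 0 (an assumption), and keeps only never-positive patients as controls, following the table 1 control definition. The study's actual data pipeline is described in its appendix and may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# From the table 1 notes: a HOCI is a sample positive for SARS-CoV-2 at
# least 3 days after admission. Counting admission as day 0 is an assumption.
HOCI_MIN_DAYS = 3


@dataclass
class Admission:  # hypothetical record, not the study's schema
    patient_id: str
    admitted: date
    discharged: date
    first_positive: Optional[date] = None  # None if never tested positive


def classify(adm: Admission) -> str:
    """Label an admission as 'HOCI', 'COCI', or 'negative'."""
    if adm.first_positive is None:
        return "negative"
    days_since_admission = (adm.first_positive - adm.admitted).days
    return "HOCI" if days_since_admission >= HOCI_MIN_DAYS else "COCI"


def forecasting_cohort(admissions: List[Admission]) -> List[Admission]:
    """Keep stays of at least 3 days that are HOCIs or never-positive controls."""
    cohort = []
    for adm in admissions:
        length_of_stay = (adm.discharged - adm.admitted).days
        if length_of_stay >= HOCI_MIN_DAYS and classify(adm) in ("HOCI", "negative"):
            cohort.append(adm)
    return cohort


# Usage with made-up admissions
stays = [
    Admission("p1", date(2020, 4, 1), date(2020, 4, 10), date(2020, 4, 5)),  # HOCI
    Admission("p2", date(2020, 4, 1), date(2020, 4, 10), date(2020, 4, 2)),  # COCI
    Admission("p3", date(2020, 4, 1), date(2020, 4, 2)),                     # stay too short
    Admission("p4", date(2020, 4, 1), date(2020, 4, 8)),                     # control
]
print([a.patient_id for a in forecasting_cohort(stays)])  # ['p1', 'p4']
```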
The prevalence of in-hospital COVID-19 cases had two surges that mirrored national UK case numbers (figure 1). Surge one peaked on March 30, 2020 (ie, one day before the study period), at 59 new daily positive hospital cases (50 COCIs and nine HOCIs); surge two peaked on Jan 6, 2021, at 64 new daily cases (50 COCIs and 14 HOCIs). The two surges differed in their time series (appendix p 10): the proportion of HOCIs was higher during surge two (17·8% HOCIs [406 of 2276 infections]) than during surge one (15·1% HOCIs [167 of 1107 infections]), and the correlation between HOCIs and COCIs was higher during surge two (R=0·79; appendix p 10).
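The surge comparison above reports a correlation coefficient R between HOCIs and COCIs. The sketch below assumes R is a Pearson correlation computed on daily case counts, which this section does not state explicitly; `pearson_r` is a hypothetical helper and the counts are made up.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length daily count series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily counts for illustration only
daily_hoci = [1, 2, 3]
daily_coci = [2, 4, 6]
print(pearson_r(daily_hoci, daily_coci))  # 1.0
```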
Figure 1 Background hospital infections and contact structure across the study period
The daily number of new patients who tested positive for COVID-19 within the hospital (COCI and HOCI) varied substantially across the study period. A peak of 59 cases was reached on March 30, 2020, and a peak of 64 cases on Jan 6, 2021, with zero new daily cases on some days during July, August, September, and October. The patient-contact network also varied across the study period, with differences in connectivity and in the size of patient-contact clusters between the two infection surges and during the summer period. COCI=community-onset COVID-19 infection. HOCI=hospital-onset COVID-19 infection.
The patient-contact network structure also varied throughout the pandemic (figure 1). The median number of contacts (degree) across the time-varying networks was four in rooms (ie, the number of people sharing a room), 22 in wards (ie, the number of people sharing a ward), and 67 in buildings (ie, the number of people in the same building at the same time), with an increasing trend over time (appendix p 12). Surge one had lower median degrees (three in rooms, 18 in wards, and 57 in buildings) than surge two (four in rooms, 23 in wards, and 70 in buildings). Other network measures also varied over the study period (appendix p 12), reflecting a denser contact network in surge two than in surge one (figure 1).
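A minimal sketch of how daily room, ward, and building contact networks could be derived from bed allocation records is shown below. It assumes a hypothetical `BedStay` record with inclusive start and end days, and treats two patients as in contact at a given level if they occupy the same location on the same day; the study's exact contact definition and time resolution may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class BedStay:  # hypothetical bed-allocation record
    patient_id: str
    start: date  # first day in this bed (inclusive)
    end: date    # last day in this bed (inclusive; an assumption)
    building: str
    ward: str
    room: str


def location_key(stay: BedStay, level: str) -> Tuple[str, ...]:
    """Hierarchical key, so room 'R1' on two different wards is not merged."""
    if level == "building":
        return (stay.building,)
    if level == "ward":
        return (stay.building, stay.ward)
    if level == "room":
        return (stay.building, stay.ward, stay.room)
    raise ValueError(f"unknown level: {level}")


def daily_contacts(stays: List[BedStay], day: date, level: str) -> Dict[str, Set[str]]:
    """Adjacency sets for one daily snapshot: patients co-located at `level`."""
    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for s in stays:
        if s.start <= day <= s.end:
            groups[location_key(s, level)].add(s.patient_id)
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for members in groups.values():
        for p in members:
            adjacency[p] |= members - {p}  # every co-located patient is a contact
    return dict(adjacency)


def degrees(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct contacts (degree) of each patient in a snapshot."""
    return {p: len(nbrs) for p, nbrs in adjacency.items()}


# Made-up bed records for four patients in one building
stays = [
    BedStay("A", date(2020, 4, 1), date(2020, 4, 5), "B1", "W1", "R1"),
    BedStay("B", date(2020, 4, 3), date(2020, 4, 6), "B1", "W1", "R1"),
    BedStay("C", date(2020, 4, 1), date(2020, 4, 9), "B1", "W1", "R2"),
    BedStay("D", date(2020, 4, 1), date(2020, 4, 9), "B1", "W2", "R1"),
]
for level in ("room", "ward", "building"):
    print(level, degrees(daily_contacts(stays, date(2020, 4, 4), level)))
```

Repeating this for every day of the study period yields the sequence of snapshots that makes up a dynamic contact network.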
Univariate analysis identified ten clinical variables that were differentially represented in patients with HOCI versus controls (table 1). Both age and gender identity differed significantly between patients with HOCI and controls, with HOCIs over-represented in older patients and in those who identified as male. By speciality, a higher proportion of patients with HOCI than of controls were in elderly care, general medicine, renal, and surgery, and significantly lower proportions were in cardiology, gynaecology, obstetrics, and paediatrics.
Table 1 Univariate analysis of variable sets for control versus HOCI data
Data are median (IQR) or n (%). Network, hospital contextual, and clinical variables were investigated for discriminatory power for HOCI (sample positive for SARS-CoV-2 at least 3 days after admission) versus control (sample not positive for SARS-CoV-2). Because of the sliding window, each patient can be represented by multiple datapoints on different days of their hospital stay. To account for this, patient variables are aggregated and averaged across time (appendix p 5). The significance test results show how the differing temporal profiles of patients could be used to classify HOCI versus control. Statistical analyses were performed using the Mann-Whitney U test or the χ² test. Clinical and contextual variables are reported to one decimal place, and network centralities to two significant figures. HOCI=hospital-onset COVID-19 infection.
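The per-patient aggregation and the rank-based test described in the table 1 notes can be sketched in plain Python. This is an illustration, not the study's code: `aggregate_by_patient`, `average_ranks`, and `mann_whitney_u` are hypothetical helpers, and the p value uses a normal approximation without tie or continuity correction. An established implementation such as `scipy.stats.mannwhitneyu` would be preferable in practice.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_by_patient(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average one variable over each patient's sliding-window datapoints."""
    per_patient: Dict[str, List[float]] = defaultdict(list)
    for patient_id, value in rows:
        per_patient[patient_id].append(value)
    return {p: mean(values) for p, values in per_patient.items()}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """U statistic for sample x and a two-sided normal-approximation p value.

    No tie or continuity correction is applied, so p is only approximate.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Made-up daily values for two HOCI patients and two controls
hoci = aggregate_by_patient([("h1", 5.0), ("h1", 7.0), ("h2", 8.0)])
ctrl = aggregate_by_patient([("c1", 1.0), ("c2", 2.0), ("c2", 4.0)])
print(mann_whitney_u(list(hoci.values()), list(ctrl.values())))
```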
Six of ten hospital contextual variables differed significantly between the HOCI and control groups (table 1). Relative to controls, patients with HOCI had a longer length of stay before testing positive and were in hospital during times of higher hospital-bed occupancy and during periods of increased background incidence of COVID-19. No significant difference between the HOCI and control groups was observed for variables related to movement rates (between beds, rooms, wards, and sites).
For network variables, 24 of 30 centrality measures were significantly higher in HOCI patients (eight of ten from each room-contact, ward-contact, and building-contact network; table 1). Network variables that were significantly higher in the HOCI group than in the control group across the three contact networks included measures accounting for infectious COVID-19 cases (ie, infected degree, infected degree centrality, and infected closeness centrality) and general network connectivity (ie, degree, closeness centrality, clustering coefficient, and K-core number).
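The centrality measures named above can be computed on a single daily snapshot as sketched below. Degree, degree centrality, closeness centrality, the local clustering coefficient, and the core number follow their standard graph-theoretic definitions; the "infected" variants (number of infectious neighbours, and closeness restricted to infectious nodes) are assumed definitions for illustration, since the exact formulas are given in the study's appendix rather than here. The closeness shown applies no normalisation for disconnected graphs.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # symmetric adjacency sets for one daily snapshot


def bfs_distances(g: Graph, source: str) -> Dict[str, int]:
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(g: Graph, node: str) -> float:
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(g)
    return len(g[node]) / (n - 1) if n > 1 else 0.0


def closeness_centrality(g: Graph, node: str) -> float:
    """Reachable nodes divided by the sum of distances to them; 0 if isolated."""
    d = bfs_distances(g, node)
    total = sum(d.values())
    return (len(d) - 1) / total if total > 0 else 0.0


def infected_degree(g: Graph, node: str, infectious: Set[str]) -> int:
    """Number of direct contacts who are infectious."""
    return len(g[node] & infectious)


def infected_closeness(g: Graph, node: str, infectious: Set[str]) -> float:
    """Assumed definition: closeness computed over infectious targets only."""
    d = bfs_distances(g, node)
    targets = [d[v] for v in infectious if v in d and v != node]
    return len(targets) / sum(targets) if targets else 0.0


def clustering(g: Graph, node: str) -> float:
    """Fraction of pairs of neighbours that are themselves connected."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in g[nbrs[i]])
    return 2 * links / (k * (k - 1))


def core_numbers(g: Graph) -> Dict[str, int]:
    """K-core number of every node, by repeatedly peeling a minimum-degree node."""
    deg = {u: len(vs) for u, vs in g.items()}
    remaining = set(g)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        u = min(remaining, key=lambda x: deg[x])
        k = max(k, deg[u])
        core[u] = k
        remaining.remove(u)
        for v in g[u]:
            if v in remaining:
                deg[v] -= 1
    return core


# Made-up snapshot: a triangle A-B-C with D attached to C; D is infectious
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(infected_degree(g, "C", {"D"}), infected_closeness(g, "A", {"D"}), core_numbers(g))
```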
We trained different models on our London data using sets of variables of different types (panel). All models had predictive power above chance (table 2; figure 2A, B). In particular, the model based solely on contact-network variables (AUC-ROC 0·88 [95% CI 0·86–0·90]) performed similarly to the model based on all variables (0·89 [0·88–0·90]) and outperformed models using solely hospital contextual variables (0·82 [0·80–0·84]) or clinical variables (0·64 [0·62–0·66]). To ascertain the predictive power of different types of contact, separate models were trained on variables from each of the three contact networks (ie, room, ward, and building). The model based on ward-contact network variables had the highest predictive power (0·87 [0·85–0·89]), although the building-contact (0·85 [0·83–0·87]) and room-contact (0·82 [0·80–0·84]) network models also performed well.
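AUC-ROC values such as those above can be read as the probability that a randomly chosen HOCI receives a higher predicted risk than a randomly chosen control. A minimal pairwise sketch follows, with a percentile bootstrap for a 95% CI; the bootstrap is an assumption, since this section does not state how the CIs were obtained, and `auc_roc` and `bootstrap_ci` are hypothetical helpers.

```python
import random
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random case scores above random control); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:        # O(n_pos * n_neg): fine for a sketch; a rank-based
        for n in neg:    # formula scales better on large datasets
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUC-ROC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auc_roc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up labels (1 = HOCI, 0 = control) and predicted risks
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.3, 0.1, 0.6, 0.5]
print(auc_roc(labels, scores), bootstrap_ci(labels, scores, n_boot=200))
```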
Table 2 Summary of test and validation set performance across variable groups
Performance is measured using AUC-ROC, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, the positive likelihood ratio, and the negative likelihood ratio, which are computed on a collapsed confusion matrix to reduce bias (appendix p 7). AUC-ROC=area under the receiver operating characteristic curve.
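The threshold-based metrics listed above follow directly from the four cells of a confusion matrix. The sketch below shows the standard formulas only; the collapsing step described in appendix p 7 is not reproduced, `confusion_metrics` is a hypothetical helper, and the counts are made up.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard threshold metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # LR+ = sensitivity / (1 - specificity); infinite with no false positives
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else math.inf,
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, not taken from the study
print(confusion_metrics(tp=40, fp=10, tn=90, fn=10))
```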
(A) AUC-ROC test set performance of models built on each of the major variable groups (ie, full, clinical, hospital contextual, and network). (B) Further decomposition of network variables computed from all, room, ward, and building patient-contact networks. (C) Test set performance of the hospital contextual risk-factor model, the ward-contact network risk-factor model, and a combined model using both the contextual and ward-contact network risk factors identified in table 2. AUC-ROC=area under the receiver operating characteristic curve.
We then investigated models with fewer variables by using only risk factors (ie, variables identified as significant in table 1) among the hospital contextual and ward-contact network variables. Clinical, room-contact network, and building-contact network variables were excluded because of their comparatively lower performance. Models based only on risk factors performed as well as models including all variables (table 2; figure 2). Furthermore, the combined risk-factor model had the highest positive predictive value (0·87) and positive likelihood ratio (9·67) of all the variable-set models (table 2), in addition to good calibration (appendix p 14).
Using a stepwise variable elimination approach (appendix p 17), we ranked the combined set of risk factors (ie, hospital contextual plus ward-contact network). The hospital contextual variable “background hospital COVID-19 prevalence” was most predictive, followed by two ward-contact network variables: the infected contact network, which measures the network distance to all infectious cases, and the infected degree and degree centrality, which measure direct contacts with infectious cases. A parsimonious model based on these three variables alone achieved an AUC-ROC of 0·85 (95% CI 0·82–0·88), amounting to 95·5% of the combined model's performance (appendix p 17). The same top three variables were also identified when stepwise variable elimination was applied to the entire variable set and to all risk factors (table 2 and appendix p 17).
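A greedy backward elimination ranking can be sketched as follows. The scoring callback stands in for refitting a model on a feature subset and measuring, for instance, validation AUC-ROC; the procedure actually used in the study is described in appendix p 17 and may differ. The feature names and weights in the usage example are made up and do not correspond to study variables.

```python
from typing import Callable, FrozenSet, List, Sequence

# Scorer: performance (eg, validation AUC-ROC) of a model refitted on a subset
Scorer = Callable[[FrozenSet[str]], float]


def backward_elimination_ranking(features: Sequence[str], score: Scorer) -> List[str]:
    """Rank features by greedy backward elimination, most important first.

    At each step, the feature whose removal reduces the score least is
    dropped; features dropped later are ranked higher.
    """
    remaining = set(features)
    dropped: List[str] = []
    while len(remaining) > 1:
        # sorted() makes tie-breaking deterministic (max keeps the first maximum)
        least_useful = max(sorted(remaining),
                           key=lambda f: score(frozenset(remaining - {f})))
        remaining.remove(least_useful)
        dropped.append(least_useful)
    dropped.extend(remaining)
    return dropped[::-1]


# Toy additive scorer with made-up feature weights (not study results)
weights = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(weights[f] for f in subset)


print(backward_elimination_ranking(list(weights), toy_score))  # ['a', 'b', 'c', 'd']
```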
To validate the predictive power of contact-network variables, we applied our risk-factor models (without recalibrating the hyperparameters) to data from a Geneva-based geriatric hospital group during its first surge in cases (March 1–May 31, 2020). Over that period, 281 COVID-19 cases (138 COCIs and 143 HOCIs) were reported. Cases peaked on March 26, 2020, with 15 newly identified cases (nine HOCIs and six COCIs), reflecting the height of the early epidemic in Switzerland (figure 3A). Ward-level and building-level data were unavailable in this dataset, so we constructed room-contact networks only. The model based only on hospital contextual risk factors achieved high predictive performance, and the inclusion of room-contact risk factors further increased performance (table 2).
Figure 3 Epidemiology curves of study validation data
Newly identified COVID-19 cases are reported across time and are broken down by HOCI and COCI case types. (A) Non-UK (ie, Geneva) hospital caseload during an epidemic surge of cases. (B) UK hospital group after pandemic surges 1 and 2, when COVID-19 became endemic and non-surging. COCI=community-onset COVID-19 infection. HOCI=hospital-onset COVID-19 infection.
For further validation, we used additional data from the same London hospital group collected during an endemic period following surge two in the UK (April 2–Aug 10, 2021). During this time, an average of 1·4 daily cases were reported, with no surging behaviour (figure 3B). Compared with UK surges one and two, HOCIs constituted a lower percentage of all cases (186 [12·9%] of 1446 COVID-19 cases were HOCIs, compared with 167 [15·1%] of 1107 in UK surge one and 406 [17·8%] of 2276 in UK surge two; appendix p 10). In this endemic setting, the hospital contextual risk-factor model performed poorly, with low sensitivity and specificity (table 2). The ward-contact network risk-factor model performed substantially better than the hospital contextual model. Combining both sets of risk factors improved performance marginally further: the combined risk-factor model achieved higher AUC-ROC, sensitivity, and specificity than either of the previous two models (table 2).
Discussion
We used network analysis in combination with machine learning to predict patient-level HOCI using routinely captured hospital data. To our knowledge, this is the first study to forecast individual patient HOCIs by extracting patient contact networks from bed records. Together with hospital contextual variables, we report patient contact-network centrality as a significant HOCI risk factor, able to increase predictive performance across all datasets analysed.
Transmission of SARS-CoV-2 in health-care settings has been associated with features such as limited isolation capacity, suboptimal individual infection prevention practices,
In our training and testing data, patients managed in elderly care, general medicine, renal, and surgical units were significantly over-represented in the HOCI group (table 1). Staffing levels and stress in critical care, complex pathways and excess movements resulting in many contacts among surgical patients, and strong community links in renal wards might have exacerbated transmission. The significant over-representation of older patients and of those identifying as male among HOCIs reflects known features of the wider pandemic.
our results show that such fixed variables are least predictive overall. Modern IPC might therefore improve management of outbreaks by including contextual and dynamic risk factors.
Behavioural factors, contact density, and ventilation between locations are known to affect risk of COVID-19 acquisition.
These factors are consistent with the hospital contextual risk factors identified in our work. We found that background COVID-19 prevalence within the hospital group was the most predictive variable in our training and test data collected during pandemic surges. Although high case numbers increase the number of transmission sources, background prevalence can also be a proxy for staffing stress and changes in patient density, which are potential exacerbators. Similarly, the high HOCI risk associated with increased hospital-bed occupancy could be due to high patient loads, increased density, and staffing pressures, which make IPC challenging. As for other HCAIs, length of stay was significantly longer for HOCIs (table 1).
That both length of stay and consecutive length of stay were significantly longer in HOCIs than in controls is also consistent with genomic analysis suggesting that COVID-19 acquisition can be linked to previous admissions.
Increased movement rates (ie, bed, room, ward, and site moves) have been reported locally as a risk factor for HCAI, yet movement rates did not differ significantly between HOCIs and controls in our data (table 1). The risk from movement rates alone is likely too general to be specific to HOCI and is better captured by measures of contact-network centrality. Altogether, models based on hospital contextual variables showed strong predictive performance across epidemic surges. However, including network variables increased performance, most notably in the endemic validation data (table 2).
Most contact-network variables (24 of 30 investigated, eight from each contact definition) were significantly higher in HOCIs (table 1), and the model based only on contact-network variables was as predictive as the model containing all variables (table 2; figure 2). The underlying network structure might, therefore, hold features exploitable for HOCI prediction with network mining tools.
HOCIs were significantly more central in contact networks. Few studies have used contact data to investigate HCAI, and most have considered only direct contacts (ie, network degree).
Consistent with these studies, our results identify direct contacts as a strong risk factor for infection. However, the infected contact network (ward), which measures network connectedness to all known infections, was more predictive than direct infectious contacts (ie, infected degree), suggesting the presence of longer, indirect transmission chains that contact tracing of direct contacts alone could miss. Alternatively, disrupting the underlying network connectivity by targeting patients with high centrality, together with screening and isolation based on risk factors, could be effective in reducing onward transmission.
To show generalisability, we applied our framework to data gathered from a hospital group that differed in both type (ie, geriatric vs long-term care) and country (ie, Switzerland vs the UK). Despite limited contact data (ie, only room-level data were available), the framework was still highly predictive and, importantly, performance increased with the inclusion of contact-network risk factors. To further test its generalisability, we analysed data from the same London hospital group at a later date under different epidemiological (ie, endemic) conditions (appendix p 10), changing IPC measures, newly emerging variants, and increasing vaccination rates. Although our framework achieved weaker performance on the endemic validation dataset, including patient contact-network risk factors at the ward level substantially increased performance compared with hospital contextual risk factors alone, which had no predictive capability (table 2).
The emergence of large databases with granular detail has allowed the construction and application of contact networks that can be integrated into routine IPC and public health policy. For instance, recorded movements within hospital (as studied here) or Bluetooth interactions between mobile users (eg, Corona-Warn-App in Germany) provide informative datasets that serve as proxies for underlying human interaction. The availability of such data for constructing contact networks is only likely to grow, with select hospitals introducing radiofrequency-identification tracking.
Aimed at exploiting these emerging sources of data, our dynamic disease forecasting framework is designed to be portable to a range of settings and variables. The framework offers individual-level predictions of the risk of infection acquisition and is thus amenable to real-time risk stratification, which can guide dynamic allocation of IPC resources for rapid screening, isolation, and grouping of patients at high risk of infection acquisition. By incorporating complex multimodal data sources into a single measure of predicted risk, our framework produces relevant and actionable outputs that can help prevent disease acquisition.
Major challenges to effective IPC activity are low bed capacity and inadequate and overwhelmed isolation capacity, in addition to insufficient staffing and microbiological testing resources. These challenges were vastly exacerbated by the COVID-19 pandemic. We envisage the proposed framework being used within a modern, data-driven IPC patient management system to assist optimal decisions in real-world scenarios. Clinicians can use the predicted risk score for each patient to rank and prioritise patients (eg, identifying patients at high risk of infection for isolation or grouping, followed by targeted enhanced testing). In this way, HOCIs could be identified at the earliest opportunity, which in turn could optimise IPC measures and treatment. Patients at low risk of infection acquisition could also potentially be returned to regular patient management sooner, saving resources that are in demand. However, further work is needed to evaluate the direct clinical and economic implications of identifying patients at high risk of infection. In addition to providing actionable clinical points, a key aspect of this framework is its dynamism and its ability to generate insight on demand. By aggregating complex data sources into single interpretable risk scores, the framework makes a range of risk sources and their interactions accessible to hospital teams. Such data-driven insights, always integrated within human decision making, can enable hospital teams to become more flexible and responsive to complex, rapidly emerging disease threats.
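The ranking and prioritisation step described above might look like the sketch below, which assumes per-patient daily risk scores, a hypothetical risk threshold, and a fixed number of free isolation beds. Flagged patients who do not fit into isolation are returned separately as candidates for grouping and enhanced testing. `prioritise` is a hypothetical helper; thresholds and capacities would need local calibration and are not given in the study.

```python
from typing import Dict, List, Tuple


def prioritise(risk: Dict[str, float], isolation_beds: int,
               threshold: float) -> Tuple[List[str], List[str], List[str]]:
    """Split patients by predicted risk into three action lists.

    Returns (isolate, group_and_test, routine): flagged patients (risk at or
    above `threshold`) fill the free isolation beds in descending order of
    risk; flagged patients without a bed are candidates for grouping and
    enhanced testing; everyone else stays in routine management.
    """
    ranked = sorted(risk, key=lambda p: (-risk[p], p))  # highest risk first
    flagged = [p for p in ranked if risk[p] >= threshold]
    isolate = flagged[:isolation_beds]
    group_and_test = flagged[isolation_beds:]
    routine = [p for p in ranked if risk[p] < threshold]
    return isolate, group_and_test, routine


# Hypothetical daily risk scores, bed count, and threshold
print(prioritise({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.75}, isolation_beds=2, threshold=0.5))
# (['a', 'd'], ['c'], ['b'])
```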
Our study has several limitations. First, our contact definitions might not fully capture transmission (eg, connections via health-care workers; indirect transmission via surfaces; contacts outside shared rooms, wards, or buildings; or interactions with visitors). However, routinely collected patient bed allocations have been shown to capture non-patient interactions implicitly, as these align with organisational and speciality hospital structures. Staff and visitor contact data were not available in our data due to privacy restrictions, but such data should be investigated in future, with appropriate privacy safeguards. Second, since our training and testing period occurred largely before the UK’s vaccination rollout, we were unable to include vaccination status as a patient variable. With increasing levels of natural and vaccine-induced immunity, including vaccination and recovery status might improve predictions; however, emerging variants and incomplete vaccine coverage make levels of susceptibility uncertain. Third, patient ethnicity was not available in our study. Given its contextual complexities and its previous identification as a risk factor, ethnicity warrants specific investigation in the future. Fourth, our data did not include ventilation or specific information about room arrangements (appendix p 2), both of which contribute to COVID-19 transmission. Nevertheless, our models were highly predictive without accounting for ventilation. Finally, various aspects of hospital organisation changed over the course of the pandemic, including screening practice, personal protective equipment, and bed placement, and these changes were not encoded as variables here.
Overall, our study emphasises that dynamic networks of patient contacts can aid personalised predictions of infection. Our study applies to respiratory virus transmission in hospital, using widely available patient bed records. Further work is needed to extend this framework to other infectious diseases, assessing the types of contact required for transmission, evaluating the implications of identifying a patient at high risk of infection acquisition, and understanding how it could be integrated into IPC more generally.
AM, JRP, RLP, SM, and MB contributed to study concept and design. JRP, MA, SM, NZ, and FR contributed to data acquisition. AM, JRP, MA, and SM contributed to data analysis and accessed and verified the underlying data. AM, JRP, RLP, MA, AH, and MB contributed to the initial manuscript drafting. All authors contributed to data interpretation and final revisions of the manuscript. AH and MB contributed to study supervision. AM, JRP, RLP, MA, SM, SH, AH, and MB contributed to the discussion of the results and reviewed the data. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.