An accurate assessment of the pandemic was critical to the response to COVID-19. This necessitated data, which had to be created, collected, managed, appropriately accessed, shared and linked prior to analysis using a range of methodologies, and synthesised for assessment. Data visualisation was central to this and provided a usable way for decision-makers to see trends, outliers and geographical or other groupings.
In the initial months data were sparse and there were considerable challenges gaining access to even the most basic data to understand the situation. Sharing and linking data across organisational and sectoral boundaries were among the hardest and most often recurring challenges of the pandemic response, and are covered in more detail later in this chapter.
Alongside this, there were issues collecting sufficient and appropriate data initially – for example, with limited testing early in the pandemic. As testing expanded, allowing for much richer data from multiple sources, and as data sharing was set up and processes automated, an effective suite of charts, maps and other visualisations was available to underpin decisions.
Enabling integration of data from different parts of the health and public health system and different UK nations and regions is an important learning and legacy of the pandemic – and was not always easy. We anticipate future pandemics will have the same challenges of initially sparse data, and probably of data linkage and automation.
To enable full assessment, data streams from clinical testing, health and care and community settings, genomics, death records and non-health sources were needed. Serological data was also helpful in assessing early cumulative attack rates, and for a range of studies to understand immunity and reinfection, severe disease and transmission dynamics (see Chapter 1: understanding the pathogen), as well as tracking seroprevalence at a population level (for more on serological testing see Chapter 6: testing).
Tables 1 to 5 summarise these data streams, giving a brief description of each type and setting out their strengths and limitations. These data streams included core data on cases, people admitted to hospital, and deaths, as well as demographic data about people in each of these groups (age, sex, ethnicity, occupation, deprivation) and their location (geographically and by setting).
Data on underlying health needs were also key, though this remained challenging throughout the pandemic. Underlying all of these was an accurate test, properly recorded, to say whether a person did or did not have COVID-19; until that was available, the data streams had limited usable information. Data on outbreaks in specific settings, such as care homes or prisons, and in hospitals, including by different levels of care up to intensive care, were important.
The effects of non-pharmaceutical interventions (NPIs) were also important to assess, requiring data on mobility, contact patterns and behaviour. Over the course of the pandemic, as new SARS-CoV-2 variants emerged and population immunity developed (both natural and vaccine-induced) the ability to link core data to disease outcome, vaccination status, past infection and SARS-CoV-2 variant became essential. This continued to be challenging to do properly – for example, linking to past infection required an individual to have been tested and provide identical details for linkage.
Each data set has its own story in terms of what had to be done to get what was needed to those who needed it.[footnote 1] Some data streams were well established, such as data on cause of death from death certificates, held by the Office for National Statistics (ONS). Some data sets existed but were not accessible, shareable or linked. Some data sets had to be created in response to the pandemic.
A range of organisations, therefore, created and/or held relevant data – for example, each of the national public health organisations (Public Health England (PHE, latterly the UK Health Security Agency, UKHSA), Public Health Wales, Public Health Scotland and the Public Health Agency in Northern Ireland) as well as the National Health Services for each of the UK nations (both hospital data and general practitioner (GP) records). Alongside this were other government agencies, consortia such as the COVID-19 Genomics UK consortium (COG-UK), private companies, and academic organisations across the UK nations undertaking relevant studies – for example, Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) in Scotland.[footnote 2], [footnote 3] All these organisations had their own platforms for data, adding to operational challenges of sharing data even where data sharing agreements were in place.
The users of data were also varied: as well as data use across the UK government, data were needed by the National Health Services and public health organisations of the 4 nations as well as academia – for example, the academic groups providing expert advice within the Scientific Pandemic Influenza Group on Modelling Operations (SPI-M-O).[footnote 4] In addition, there was high public interest in relevant data and, over time, data were made increasingly publicly available – for example, through the COVID-19 dashboard or, for genomic sequence data, through the Global Initiative on Sharing Avian Influenza Data (GISAID) platform.[footnote 5]
There have been important lessons about the data and information needed at different stages of the pandemic. The processes required to bring these data together, analyse and assess them have evolved through the pandemic. This section covers:
- what data were needed, what data we used and where data were sourced
- important processes for data and analysis with a case study on the UK COVID-19 Dashboard
- how analyses were assessed to inform policy, with case studies on Prime Minister and other senior ministers’ briefings and bronze, silver and gold situation reports
- reflections on data, analysis and assessment
What data were needed, and where data were sourced
Testing data from clinical pathways and surveillance studies
From the outset of the pandemic, there was a requirement for estimates of incidence and prevalence of SARS-CoV-2 at a national and regional level, along with details of the case composition and demographics.
Data on cases initially came from early studies using the First Few Hundred (FF100) protocol to investigate the clinical and epidemiological characteristics of at least the first few hundred confirmed COVID-19 cases. This provided important data to inform case definitions and early situational awareness,[footnote 6] and was essential for quantifying the delays from infection to clinical outcome, and therefore the lag time between any policy interventions and observable impact on the healthcare system. The rapid growth of the outbreak led to many variables being incomplete, delaying many analyses.
Hospital admissions (from clinical testing) provided an early signal for increases in incidence. Due to limitations in testing capacity, tests were initially prioritised towards clinical presentations of COVID-19 within hospitals. Once diagnostic testing was available at scale, detailed description of case rates by demographics and at lower-level geographies helped to inform policy decision-making. Routine asymptomatic testing across institutions and sections of the population (such as school children and healthcare workers) gave more complete data from late 2020 onwards. This helped to highlight socio-economic disparities in the burden of disease (see Chapter 2: disparities). However, diagnostic testing data were always going to be biased to some degree by testing capacity and access and variation in uptake across socio-economic groups.
In addition to testing data, 4 primary surveillance approaches were also utilised to help better understand population-level incidence:

- sentinel surveillance
- syndromic surveillance
- prevalence studies
- wastewater testing

All had their pros and cons – triangulation was key.
Sentinel data were provided through the repurposing of influenza surveillance infrastructure. This included the COVID-19 Hospitalisation in England Surveillance System (CHESS), adapted from the UK Severe Influenza Surveillance System and severe acute respiratory infection (SARI) data. These data were primarily used by external academic partners.
The main sources of syndromic surveillance data included:
- the ZOE COVID symptom study
- the NHS COVID-19 app
- NHS Pathways data (111)
- online COVID-19 symptom searching behaviour
The ZOE COVID symptom study was initiated in March 2020 and provided data from those who joined through an online app and self-reported the presence or absence of symptoms, a subset of which were confirmed through diagnostic tests.[footnote 7] The ZOE study is an interesting example of crowdsourced data from this pandemic that shows both its strengths (such as the speed of signals) and limitations (such as selection biases and poor comparability of data over time). The NHS COVID-19 app was launched across England and Wales on 24 September 2020 and allowed us to identify contacts of those who had tested positive for COVID-19. NHS Pathways data (111) and online COVID-19 symptom searching behaviour both provided early indicators for potential increases in symptomatic prevalence, including at smaller geographies. All of these studies can give an early indication of the epidemic trajectory and change points. However, they are likely to be limited in their ability to estimate true population prevalence or incidence.
To understand prevalence in the population the ONS Coronavirus (COVID-19) Infection Survey (ONS CIS) and the REal-time Assessment of Community Transmission (REACT) study (in England, table 2) were established.[footnote 8], [footnote 9] The CIS was a new endeavour for the ONS, which engaged an external supplier to support the rapid setting up of this large-scale survey. These studies were developed to sample the population and provide more representative data on infections in the community.
The ONS CIS was a UK-wide prospective cohort study, initiated in April 2020. The initial sample was created through an amalgamation of pre-existing surveys (around 4,000 participants per fortnightly round in April 2020) and was scaled up by autumn 2020, including through the use of financial incentives (around 116,000 participants per fortnightly round by October 2020).
The REACT study was a separate population surveillance study undertaken in England to examine prevalence from May 2020. It was helpful to have 2 similar studies to triangulate results. Additional studies were used to investigate infections in key settings, such as the SARS-CoV-2 immunity and reinfection evaluation study (SIREN) studying case rates and reinfection in healthcare workers across the UK, and Vivaldi studying case rates in care homes in England.[footnote 10], [footnote 11]
These types of studies are considered the ‘gold standard’ but take time and considerable resource to set up and obtain sufficiently large and representative samples. Their weekly data summaries were central to analysis of the epidemiology, but were not available as quickly as mass testing data (which was produced daily) and provided a lagged estimate of population prevalence. Epidemiological analysis therefore continually triangulated the more representative (but lagged) surveillance studies with the more timely (but often biased) testing case data. It is important to ensure line-level data are available from separate studies to support comparison of analyses across studies.
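The triangulation described above can be sketched as a simple scaling calculation: estimate an ascertainment ratio over the weeks where the representative survey and case data overlap, then apply it to the most recent weeks for which only case data exist. This is an illustrative sketch only – the function names and all figures are invented, and real analyses used far more sophisticated methods.

```python
# Illustrative sketch (not the analysts' actual pipeline): triangulating a
# lagged but representative prevalence survey with timely but biased case
# counts. All figures are invented for the example.

def ascertainment_ratio(survey_infections, reported_cases):
    """Ratio of survey-estimated infections to reported cases over the
    same overlapping period; a ratio above 1 means reported cases
    undercount true infections."""
    return sum(survey_infections) / sum(reported_cases)

def nowcast_infections(recent_cases, ratio):
    """Scale timely case counts by the ratio estimated on overlap weeks."""
    return [c * ratio for c in recent_cases]

# Overlap period: survey estimates roughly 3 infections per reported case
survey = [30_000, 33_000, 36_000]   # weekly survey-estimated infections
cases = [10_000, 11_000, 12_000]    # weekly reported cases, same weeks
ratio = ascertainment_ratio(survey, cases)  # 3.0

# Most recent week: survey result not yet available, only case data
latest = nowcast_infections([13_000], ratio)  # about 39,000 infections
```

The same logic explains why neither source alone was sufficient: the survey anchors the level, while the case data supply the timeliness.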
Wastewater testing was used to measure SARS-CoV-2 viral ribonucleic acid (RNA) concentrations at various sites and geographic levels (institution, community, city or town, regional and national) across the UK. In England, the Environmental Monitoring for Health Protection wastewater monitoring programme started in June 2020 at 44 sewage treatment works and was scaled up to cover 74% of the population at its peak by early 2022.[footnote 12]
Generally, wastewater monitoring can provide an indication of presence or absence of detectable pathogens shed into wastewater systems (such as SARS-CoV-2) and is helpful within closed institutional settings such as prisons to give early indication of an outbreak within the monitored population. In this pandemic it also signalled circulation of SARS-CoV-2 variants of concern and supported tracking lineages of SARS-CoV-2.
However, in England it has not been possible to consistently standardise comparable samples between and within locations and so wastewater monitoring was not relied upon for prevalence estimates. This was in part due to differing biases across sites and over time such as:
- flow of wastewater
- sampling consistency
- cross contamination
- obstructions in the system
- efficiency of the sequencing methods
- time of year
- time of day
It is also not possible to link wastewater analysis with infection timelines for individual cases and therefore monitor incidence. This is because polymerase chain reaction (PCR) testing conducted on samples can detect viral fragments from long-resolved infections, and therefore it is difficult to judge whether samples reflect active or past infections. It has, however, been reviewed alongside testing and surveillance data to triangulate signals. Although wastewater monitoring has not typically been a leading indicator for prevalence or incidence, it can help corroborate other indicators and in particular provide early signals on new variant presence in a particular area. In Scotland, for example, wastewater monitoring was used to corroborate findings from testing data.
Case data and genomic information
Internationally, case data were generally accessible but cross-country comparisons were unreliable because of biases such as differences in testing capacity, access, uptake and technologies deployed impacting data. On the other hand, in some cases close sharing of data and international comparison was helpful in understanding the rapidly changing epidemiology – for example, between Northern Ireland and the Republic of Ireland, where the epidemiological picture often looked similar. Case data were complemented by contact tracing data, including data from mobile apps informing individuals of exposure to confirmed COVID-19 cases (for more detail on these apps across the UK’s 4 nations see Chapter 7: contact tracing and isolation).[footnote 13]
As new variants emerged and established, there was a need to bring detailed genomic information alongside case data in order to understand the evolving epidemiology of the pandemic. In doing this, whole genome sequencing (WGS) was key in confirming variants and enabling more detailed virological analyses. WGS processes used samples from surveillance studies, case data and wastewater samples, though genomic surveillance of wastewater samples would have benefited from standardised methods and analysis to support comparison of data across the UK nations.
WGS was also used to track imported variant cases from mandatory testing of international passengers from February 2021 to March 2022. This was important not only to inform interventions for variant cases once in the UK, but also to give some information on likely circulation of variants in other countries where their own WGS capacity was limited. The UK joined many countries worldwide in sharing WGS data on open platforms such as GISAID, making a substantial contribution to global genomic data.
WGS was key in tracking the course of genetic evolution of the virus and tracking variants, but its (sometimes multi-week) lag to results meant it was not ideal to enable timely analysis or inform rapid interventions. It could, however, be triangulated with case data as a retrospective tool to spot the establishment of variants with a growth rate advantage, and besides this it was helpful to get genomic surveillance data from other countries experiencing variant establishment ahead of the UK in order to pre-empt possible response needs should a similar establishment be seen here.
Other, more timely, methods were therefore used. As noted in Chapter 1: understanding the pathogen, by chance some variants did or did not carry one of the genetic targets of PCR testing, the S gene – and therefore many PCR testing labs were able to signal potential variants by tracking ‘S gene dropout’ during testing. These diagnostic test (S gene) data were much timelier (and more readily linked to other data sets) than the data subsequently available from WGS, though not all labs used the same gene targets and so population coverage of this marker was incomplete. Later in the pandemic, genotyping for specific variants provided timelier data than WGS and more specific data than the use of the S gene as a proxy.
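The ‘S gene dropout’ signal described above amounts to a simple proportion: among positives from laboratories using the TaqPath assay, a sample positive on the other gene targets but negative on the S gene flagged a likely variant case. The sketch below is hypothetical – the record layout and field names are invented for illustration.

```python
# Hypothetical sketch of the 'S gene dropout' (S-gene target failure, SGTF)
# signal from TaqPath laboratories. Record fields are invented.

def sgtf_proportion(tests):
    """Share of positive TaqPath tests showing S-gene target failure:
    positive on ORF1ab and N targets, but negative on the S target."""
    positives = [t for t in tests if t["orf1ab"] and t["n_gene"]]
    sgtf = [t for t in positives if not t["s_gene"]]
    return len(sgtf) / len(positives) if positives else 0.0

week = [
    {"orf1ab": True, "n_gene": True, "s_gene": True},    # S gene detected
    {"orf1ab": True, "n_gene": True, "s_gene": False},   # SGTF
    {"orf1ab": True, "n_gene": True, "s_gene": False},   # SGTF
    {"orf1ab": False, "n_gene": False, "s_gene": False},  # negative, excluded
]
print(sgtf_proportion(week))  # 2 of the 3 positives show dropout
```

Tracked week on week, a rising proportion gave a much earlier variant signal than whole genome sequencing, at the cost of incomplete population coverage.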
Healthcare data were needed to understand disease severity across different demographic groups and also pressure on the healthcare system.
General acute hospital admissions and admissions to intensive care for COVID-19 were important in understanding rates of severe disease from the outset. Early in the pandemic in England, the first data set that provided insight into hospitalisations was CHESS (later, renamed SARI). This was an aggregate and line list data set, providing detail on general admissions and high dependency unit (HDU) or intensive care unit (ICU) admissions. It was sourced from sentinel sites and other participating trusts.[footnote 14] The sentinel trusts were not a representative sample of hospital admissions within England and therefore inferences that were drawn had limitations. These data were biased towards critical care admissions, which made it unrepresentative of clinical pathways and severity. However, it was a valuable tool for modelling patient length of stay and the required bed days for patients with COVID-19.
To better understand pressure on the healthcare system, COVID-19 situational reports were set up to collect key management information across the 4 nations. These situational reports provided aggregate data on COVID-19 hospital admissions and bed occupancy, and these data became available in near real-time across the 4 nations.[footnote 15] In the early days there were data consistency issues across NHS trusts. These were smoothed out as the pandemic progressed, but no retrospective corrections were made to the historical data. As the pandemic evolved, the range of management information collected was expanded to include:
- beds occupied by adults with COVID-19
- beds occupied by adults without COVID-19
- available beds
This was done to better reflect capacity, as hospital bed types were not interchangeable. Challenges remain regarding the sharing of these data between the 4 nations. Staff absences from COVID-19 and CRITCON data (NHS trust declared assessments of ICU capacity) also became helpful metrics for measuring healthcare pressure.
It was also important to link data streams – for example, linking testing data or vaccine status with hospital admission data (see section below on data linkage). In England, the Secondary Users Service is a comprehensive repository of healthcare data, including Admitted Patient Care and Emergency Care Data Set. These data sets allow for linkage at an individual level to vaccination or infection history, variant, clinical characteristics and demographics.[footnote 16], [footnote 17] Clinical characteristics included a flag for ‘clinically extremely vulnerable’ status or ‘COVID-19 at risk’. However, this did not allow us to differentiate between underlying health conditions.
In addition, data fields on diagnosis were only completed at patient discharge and were also combined with reporting delays of up to 30 days post-discharge. As a result these data lagged admissions by weeks or even months, depending on length of stay. This problem was mitigated by using data on individual admissions to hospital through emergency departments, a subset of individual hospital-level data available for national linkage and analysis, which was only subject to reporting delays rather than length of stay. These data were then linked to information on variants and vaccine status, supporting studies on the severity of disease associated with new variants of concern such as Delta in June 2021.[footnote 18]
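The kind of individual-level linkage described in the preceding paragraphs can be sketched as a left join of admissions to vaccination history on a common patient identifier. This is a minimal illustration with invented field names and records, not the structure of the real data sets, which used pseudonymised identifiers and far richer variables.

```python
# Minimal, invented sketch of individual-level linkage: joining hospital
# admissions to vaccination records on a shared patient identifier.

admissions = [
    {"patient_id": "A1", "admitted": "2021-06-01"},
    {"patient_id": "B2", "admitted": "2021-06-03"},
]
vaccinations = {
    "A1": {"doses": 2, "last_dose": "2021-04-20"},
    # B2 has no vaccination record
}

def link(admissions, vaccinations):
    """Left-join admissions to vaccination status; an unmatched patient
    is recorded as having 0 doses rather than being dropped."""
    linked = []
    for row in admissions:
        vax = vaccinations.get(row["patient_id"],
                               {"doses": 0, "last_dose": None})
        linked.append({**row, **vax})
    return linked
```

As the chapter notes, the hard part in practice was not the join itself but obtaining consistent identifiers across organisations: a person who tested under slightly different details simply failed to link.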
With vaccination rollout from December 2020, quantitative and qualitative data streams on vaccine uptake and attitudes towards vaccines were set up to understand the extent of vaccine uptake across different communities and demographic groups, guide vaccination campaigns and support subsequent studies on vaccine effectiveness.[footnote 19], [footnote 20]
However, analysis of vaccine uptake was challenging because the size of the denominator was uncertain. In England, the National Immunisation Management Service (NIMS) used a denominator based on NHS England’s Primary Care Registration Management service database, as for many other vaccine programmes. This register relied on registration with primary care and so could underestimate some populations not routinely engaged with primary care, and overestimate others where people had moved and not de-registered from their GP.
The alternative was using ONS mid-year population estimates, updated annually from the last UK national census in 2011, but these were similarly uncertain and in some calculations underestimated population size to such an extent that vaccine coverage exceeded 100%. Therefore ONS population estimates were not used in the analysis of vaccine uptake. It was also particularly important to ensure that government departments and analytical teams used the same denominator in vaccine analysis and presented consistent figures to seniors and ministers to avoid confusion.
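The denominator problem can be made concrete with a worked example: the same count of vaccinated people yields different ‘uptake’ figures depending on the population denominator chosen, and a denominator that undercounts the population can push apparent coverage above 100%. All numbers below are invented for illustration.

```python
# Worked illustration of the vaccine uptake denominator problem.
# All figures are invented.

def uptake(vaccinated, denominator):
    """Vaccine uptake as a percentage of the chosen denominator."""
    return 100 * vaccinated / denominator

vaccinated = 52_000
nims_register = 60_000   # GP-registration-based; may overcount people
                         # who moved away without de-registering
ons_estimate = 50_000    # census-based estimate; may undercount some groups

print(round(uptake(vaccinated, nims_register), 1))  # 86.7
print(round(uptake(vaccinated, ons_estimate), 1))   # 104.0 – over 100%
```

The gap between the two figures is not a measurement error in the numerator; it is entirely an artefact of the denominator, which is why a single agreed denominator mattered so much for consistent reporting.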
Data on deaths from COVID-19 were the subject of intense scrutiny globally from the outset of the pandemic, and were important in situational awareness, particularly where testing was more limited, and in understanding the severity of disease in different groups. The definition of a death from COVID-19 is multifaceted and evolved across the pandemic.
Early in the pandemic there was a need for consistency in public reporting of deaths. ONS produced weekly summaries of deaths with COVID-19 mentioned on the death certificate, but these data were lagged. Initially, daily figures for hospital deaths were published. In April 2020 this was updated to include deaths of those with lab-confirmed COVID-19 whatever the setting, including those in the community and care homes.[footnote 21]
In August 2020 it was agreed that deaths within 28 days of a positive COVID-19 test would be reported through official channels.[footnote 22] However, this definition still had limitations and other definitions included:
- death within 60 days of a positive test (for example, in the PHE and Cambridge real-time model in late March 2020)
- COVID-19 as the primary cause on the death certificate
- COVID-19 mentioned on the death certificate[footnote 23]
As treatment and management of patients with COVID-19 improved and the virus evolved, the average time from infection to death increased and it became more difficult to determine how many of these deaths were with COVID-19 rather than from COVID-19. Therefore, while each definition had limitations, alternative definitions for a COVID-19 death were used together to inform a more complete picture of the burden of disease.
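The way these competing definitions can count the same linked records differently is easy to demonstrate. The sketch below applies the 28-day, 60-day and death-certificate definitions to 3 invented linked records; dates and field names are illustrative only.

```python
# Sketch of how competing COVID-19 death definitions count the same
# linked test-and-death records differently. All records are invented.
from datetime import date

deaths = [
    {"positive_test": date(2021, 1, 1), "died": date(2021, 1, 20),
     "cert_mentions_covid": True},   # 19 days after positive test
    {"positive_test": date(2021, 1, 1), "died": date(2021, 3, 15),
     "cert_mentions_covid": True},   # 73 days after positive test
    {"positive_test": date(2021, 1, 1), "died": date(2021, 2, 10),
     "cert_mentions_covid": False},  # 40 days; COVID-19 not on certificate
]

def within_days(record, days):
    """True if death occurred within `days` of the positive test."""
    return (record["died"] - record["positive_test"]).days <= days

count_28_day = sum(within_days(d, 28) for d in deaths)
count_60_day = sum(within_days(d, 60) for d in deaths)
count_certificate = sum(d["cert_mentions_covid"] for d in deaths)

print(count_28_day, count_60_day, count_certificate)  # 1 2 2
```

Each definition captures a different (overlapping) set of deaths, which is why no single figure was sufficient on its own.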
As the pandemic progressed it was important to track changes in mortality rates overall as a result of the pandemic – not just directly from COVID-19 but also due to healthcare disruption, the impact of interventions to limit transmission and the wider social and economic impacts.[footnote 24] For this, all-cause excess mortality aided our understanding. This analysis was produced by academic institutions and the ONS from 2020.[footnote 25] The importance of excess mortality in national data is that it captured the indirect impacts of COVID-19, including the effects of highly stretched healthcare, changed healthcare-seeking behaviour, the impact of lockdown and other indirect effects.
Further to this, PHE (latterly, UKHSA) provided excess mortality estimates, and later the World Health Organization (WHO) produced global excess mortality estimates for 2020 and 2021.[footnote 26], [footnote 27] Such analyses enabled both an understanding of the full impacts of the pandemic, and also enabled more international comparisons which until that point had been difficult due to different methods of recording and reporting COVID-19 deaths globally. Given the very different ways nations detected and recorded COVID-19 cases, age-adjusted all-cause excess mortality was in the view of the CMOs the most appropriate way to compare international data. Even this, however, is not easy as the ‘expected’ mortality can be calculated in many ways.
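At its simplest, excess mortality is observed deaths minus an ‘expected’ baseline. The sketch below uses one naive baseline – the mean of the same week across 5 preceding years – purely to illustrate the calculation; as the text notes, the expected figure can be computed in many ways (trend-adjusted, age-standardised), and all figures here are invented.

```python
# Minimal sketch of all-cause excess mortality with a naive baseline.
# All figures are invented; real baselines are more sophisticated.

def expected_deaths(history):
    """Naive baseline: mean of the same week across previous years."""
    return sum(history) / len(history)

def excess(observed, history):
    """Observed deaths minus the expected baseline for that week."""
    return observed - expected_deaths(history)

week_15_history = [10_200, 10_050, 9_900, 10_150, 9_700]  # 2015 to 2019
week_15_observed = 18_500                                  # pandemic week

print(excess(week_15_observed, week_15_history))  # 8500.0 excess deaths
```

The choice of baseline is exactly where international comparisons become difficult: two countries with identical observed deaths can report different excess figures if their ‘expected’ series are constructed differently.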
As with other data sets outlined above, it remained important throughout to link deaths data with, for example, data sets on clinically extremely vulnerable or COVID-19 at-risk status, variants, vaccination status and demographic variables. This enabled us to understand which groups COVID-19 was impacting most severely as the virus evolved and new medical countermeasures became available. In due course we managed to link deaths data to key variables of interest, which facilitated modelling of vaccine effectiveness and waning immunity. However, it remained the case that the data lacked the granularity to be able to analyse in detail the clinical impact of different comorbidities.
Other non-health data
Non-health data were also important in understanding the trajectory of the epidemic and responses to interventions. Transport operators, educational establishments, search engines and telecommunications operators provided anonymised, aggregate data. This provided insight into mobility, behaviour and social interactions, to facilitate assessment of the impact of non-pharmaceutical interventions. Types of data used varied across the UK nations due to differences in data collection, storage, reporting and access.[footnote 28], [footnote 29]
Behavioural and attitudinal data – for example, from surveys and/or polling – helped interpret quantitative data and understand interpretations of and adherence to NPIs. In Wales for example, the Public Health Wales ‘How are we doing in Wales’ survey provided updates on public attitudes to, interpretations of and adherence to NPIs. Studies of contact patterns in population samples – for example, the UK-wide CoMix study and the COVID-19 Scottish Contact Survey – also highlighted changing behaviours throughout the pandemic. This was important in informing policies and communications.[footnote 30], [footnote 31]
Figure 1: timeline of daily deaths across the UK with COVID-19 on the death certificate
Description text for figure 1:
Timeline is from February 2020 to October 2022 by date of death.[footnote 32]
Until 22 February 2020 there were no deaths. In late February the number of deaths rose steeply to an initial peak of 1,461 on 8 April 2020.
For much of July, August and September 2020 deaths returned to lower levels of up to 50 per day.
In October 2020 deaths began to rise again but at a slower rate than in March and April 2020.
In late November and early December 2020 deaths were around 400 to 500 per day.
In early December 2020 deaths began to fall slightly before rising sharply to a second peak of 1,490 on 19 January 2021.
Deaths then fell at a similar rate to May and June 2020. For most of April, May, June and early July 2021 deaths were around 0 to 50 per day.
From July 2021 to October 2022 deaths then rose and fell between around 50 and 250 deaths per day.
Key data and analyses for each stage of the pandemic
February to May 2020: first wave
Key data: data sources were more limited, as testing had not yet been scaled up. Focus on case rate data, hospital admissions and deaths.
Analysis: decisions on restrictions were informed by healthcare and to a lesser degree (due to lags) death data. Mobility data were used to demonstrate the impact of national restrictions after they were introduced.
September 2020 to January 2021: second wave increasing and the emergence of Alpha
Key data: case rates and test positivity at lower tier local authority (in England), hospital admissions and occupancy and surveillance studies (ONS Coronavirus (COVID-19) Infection Survey, REACT, SIREN, Vivaldi, ZOE COVID symptom study).
Analysis: case ascertainment increased considerably and was a key data source for low level geographies to inform localised interventions (tiering) in England. Surveillance studies designed to be representative provided a useful comparator to case data and were mostly aligned.
February 2021 to September 2021: second wave reducing and the emergence of Delta
Key data: vaccine uptake, effectiveness, vaccine hesitancy, hospitalisations by age (introduction of vaccination by the Joint Committee on Vaccination and Immunisation prioritisation).
Analysis: vaccine uptake, effectiveness and understanding hesitancy (to inform interventions) was essential.
September 2021 to February 2022: third wave increasing and the emergence of Omicron
Key data: genome sequenced case data, antibody prevalence (surveillance), medium-term modelling projections.
Analysis: variant-specific analyses – descriptive and analytical – for example, growth rates by variant, across regions, ethnicities and ages. During the emergence of Omicron, mobility data highlighted an increased level of public caution even before government restrictions – for example, the move to Plan B in England.
Development of new data streams during the pandemic
- March 2020: ZOE COVID-19 symptom study and COVID dashboard go live
- April 2020: ONS COVID-19 Infection Survey (CIS) goes live
- May 2020: REACT community transmission study begins
- June 2020: SIREN and Vivaldi studies begin. UK-wide R and growth rates routinely published
- September 2020: NHS COVID-19 mobile phone application launches
- January 2021: Mandatory COVID-19 testing at the border introduced
Data and analysis informing 3 selected policies
May 2020 to January 2021: travel corridors for international travel
Key data: case rates, positivity in countries collected through open source research and data sharing between countries.
Analysis: this informed guidance on which countries passengers had to self-isolate from when travelling to the UK.
May 2021 to October 2021: traffic light system for international travel
Key data: case rate, positivity in countries, testing volumes, variant proportions in country, growth rate of variants in countries, travel volumes and links between countries, positivity of travellers and variant sequences imported.
Analysis: assessment made of imported infection risk to the UK from overseas travel. Data from countries was variable – some sequenced large proportion of positive cases while some had no testing or sequencing. Border testing enabled assessment of variants and positivity in travellers from countries with no testing or sequencing. Border testing compliance often low (less than 50%) and sequencing lagged (up to 10 to 15 days after positive test).
October 2021 to March 2022: red list
Key data: international epidemiology: case rates, testing rates, positivity, death rates, hospitalisations, intensive care unit admissions, contact tracing data, intelligence and open-source reporting, variant or sequencing reports, mandatory testing of international arrivals and genome sequencing.
Analysis: looked to monitor spread of key variants (Beta, Gamma, Mu, Lambda, Delta, Omicron), inform variant assessments and risk assessments or border measures.
Table 1: testing data types, sub-types, strengths and limitations for COVID-19 in the UK (2020 to 2022)
|Type of testing data and source||Description or subsets||Strengths||Limitations|
|Clinical nose or throat swab testing[footnote 33]: UKHSA and NHS laboratories||Data sets on clinical testing providing data on cases were developed across the 4 UK nations, with data updated daily (at the height of the pandemic).
For example, in England, ‘pillar 1’ clinical swab testing in UKHSA laboratories and NHS hospitals was undertaken for those with a clinical need.
|Small data lag; included those with the most severe illness.||Included a mixture of key workers as well as those admitted to hospital.|
|Clinical nose or throat swab testing: NHS Test and Trace||Data sets on clinical testing providing data on cases identified in the community were developed across the 4 UK nations, with data updated daily (at the height of the pandemic).
For example, in England ‘pillar 2’ clinical swab testing was initially for key workers, then widened to testing in the community for those symptomatic or identified as a contact. Later lateral flow device (LFD) tests were provided for home use.
|Increasingly, data at scale, giving high power for analysis, including by age, ethnicity and lower-level geographies.
Allowed more detailed analyses of variants through PCR target data and the presence or absence of the S gene where testing was in specific laboratories (see below).
|Case ascertainment affected by:
– behaviours and attitudes to testing
– policy changes in testing eligibility
– sensitivity and specificity of tests used (PCR vs LFD).[footnote 34]
|PCR gene-target data (presence or absence of S gene)||Data sets provided details of the cycle thresholds for detection of specific gene targets in positive clinical tests (and surveillance studies) for SARS-CoV-2. Data were updated daily (at the height of the pandemic).
Four main ‘Lighthouse laboratories’ used a TaqPath assay including S, ORF and N gene targets for a subset of community testing, which provided data on the S gene, which was important for identifying changes in variant.[footnote 35]
|The gene targets allowed detection of the presence or absence of the S gene, which aided differentiation between variants, as its presence alternated across wild type, Alpha, Delta and the first wave of Omicron.||The alternating presence and absence of the S gene across successive dominant variants (wild type, Alpha, Delta, Omicron) was fortuitous.|
|Genotyping||Reflex assays on positive tests to assess genotypes in a subset of positive clinical tests, following initial PCR testing.[footnote 36]||Rapid assessment of known variants, available later in the epidemic.||Only specific (known) variants can be tested for with these assays.|
|Whole genome sequencing (WGS) data: COG-UK and Test and Trace[footnote 37]||Data set of the genetic code for SARS-CoV-2 viruses detected through testing.
WGS was undertaken on a subset of SARS-CoV-2 viruses detected in the 4 nations through clinical testing, and from surveillance studies – ONS CIS and REACT (REACT in England only), and for investigation, in wastewater.
|WGS data provided detailed data on strains of viruses circulating and supported identification of variants under investigation or of concern.||WGS data are lagged, and while undertaken at scale the proportion of those sequenced decreased at times of high prevalence.|
|Contact tracing data: NHS Test and Trace and NHS COVID-19 app[footnote 38]||Data sets of the contacts of cases identified through active case follow-up, including with setting of exposure – for example, household contact.
Data sets of contacts per case from digital app.
|Indicates how COVID-19 is spreading between contacts and allowed analysis of, for example, secondary attack rates in households.
Automatically generated data from digital app on exposure.
|Only contacts identified by cases were captured.
For app data, this only included those who had installed the app and self-reported a case. It did not provide detail on the location of cases and was influenced by individuals’ willingness to report, given the consequences for self-isolation.
Table 2: surveillance data types, sub-types, strengths and limitations for COVID-19 in the UK (2020 to 2022)
|Type of surveillance data and source||Description or subsets||Strengths||Limitations|
|COVID-19 Infection Survey[footnote 39]: ONS and Oxford University (also, a National Core Study, see below)||Longitudinal household cohort study provided data sets on positivity from a well-described sample of individuals across the 4 nations, allowing headline estimates and multiple sub studies – for example, on reinfections and waning of immunity. Included data on PCR gene targets and WGS to analyse variants.||Consistent positivity ascertainment with longitudinal design.
More representative of community infections than clinical testing data, although still subject to some recruitment bias.
Assessment of immunity through antibody tests.[footnote 40]
Detailed studies of reinfection by variant as well as cycle threshold (Ct), symptoms, contact analyses and predictors of positivity.
|Sample size limited precision and power for analyses – for example, those at smaller geographies, particularly at times of low positivity and/or with the first emergence of new variants of concern.
Study does not include people living in institutional settings such as prisons and care homes.
|REal-time Assessment of Community Transmission (REACT)[footnote 41]: Imperial College London||Repeated cross-sectional study provided data sets with positivity on random samples of individuals included in England. Detailed data on participants allows multiple sub studies – for example, on risk factors for infection.||Consistent positivity ascertainment with cross-sectional design.
More representative of community infections than clinical testing data, although still subject to some recruitment bias.
Sharing of data supported timely ad hoc analyses to inform policy, such as modelling prevalence at small spatial scales.
Detailed sub-studies – for example, on socio-economic risk factors for infection.
|Sample size limited precision and power for analyses – for example, those at smaller geographies, particularly at times of low positivity and/or with the first emergence of new variants of concern.
The repeat cross-sectional rounds, rather than continuous sampling, meant there were gaps in data availability.
|ZOE COVID Symptom study[footnote 42]: Kings College London||Data sets include participants who downloaded a digital app and used it to self-report symptoms (over 4 million users during the pandemic) and other relevant data – for example, the results of any testing undertaken.||Prevalence estimated from self-reported symptomatic people.
High participation and app enabled flexibility to ask new questions for policy insights.
Trends tracked ONS CIS and REACT at the height of the pandemic.
Rapid data, not reliant on testing and the cost associated with this.
|Reliant on individuals using the app and self-reporting symptoms. Participants were less representative of the community than other studies (though outputs were modelled).
Did not detect asymptomatic or pre-symptomatic individuals who would test positive for SARS-CoV-2.
|Wastewater testing data for the Environmental Monitoring for Health Protection programme[footnote 43]: NHS Test and Trace||Data set on the quantity of viral fragments that entered sewage systems, with testing from sewage flowing into wastewater treatment plants and key locations across the sewer network.[footnote 44]||Can provide data when other data streams (for example, clinical test data) are not routinely in use in the community.
Unobtrusive data collection.
Used, with other data sources, in modelling the pandemic.[footnote 45]
|Difficult to determine accurate location of infections.
Without combined tracking of faecal shedding, surveillance was limited to detection and identification of known and cryptic lineages circulating.
Table 3: healthcare data types, sub-types, strengths and limitations for COVID-19 in the UK (2020 to 2022)
|Type of healthcare data and source||Description or subsets||Strengths||Limitations|
|Aggregated COVID-19 hospital admissions and bed occupancy: NHS in each nation (for example, England)[footnote 46]||Data sets on hospital bed occupancy for COVID-19 (general and acute) available for each of the 4 UK nations, updated daily (at the height of the pandemic). Later data streams specific to bed type were used to describe COVID-19 occupancy, non-COVID-19 occupancy, and available beds.
Data sets on hospital mechanical ventilation bed occupancy for COVID-19 available for each of the 4 UK nations, updated daily (at the height of the pandemic). Data streams specific to bed type were used to describe COVID-19 occupancy in HDU or ICU, non-COVID-19 occupancy in HDU or ICU, and available beds in HDU or ICU.
|Direct measures of healthcare pressure from COVID-19 in the most seriously ill.
Healthcare pressures further illustrated when data streams specific to bed type were used to describe COVID-19 occupancy, non-COVID-19 occupancy and available beds.
|Hospital occupancy with COVID-19 was influenced by length of stay, with some changes, for example, due to age of those admitted, over the course of the pandemic.
The data must be interpreted in the operational context – for example, beds not able to be used due to isolation requirements should be ‘void’ rather than ‘unoccupied’.
Hospital data do not reflect pressures in the primary care system, including pressures on transport (ambulance) services.
|Aggregated staff absence in hospitals||Data sets on overall hospital staff absence (later COVID-19-related absence specifically as well as overall absence).||Reflects ill-health in the population.
Reflects healthcare pressure directly, but also healthcare pressures contributing to ill health in staff.
|Measure of healthcare pressure in hospitals, may not be specialist specific and thus key pressures (for example, limited respiratory care or ICU specialised staff) may not be identified.|
|Healthcare – individual level: NHS Digital||Data set with individual level data on hospital admissions – for example, in England, the Secondary Users Service provides a comprehensive repository for healthcare data. Updated weekly to monthly.[footnote 47]||Data were linkable to testing data types (including variants) and vaccination status.
Information on co-morbidities was available, to assess risk factors.
Linkage was done earlier in some UK nations, notably Scotland, providing important information for the 4 nations.
|Data on admissions were lagged as they were completed at discharge. The emergency care data set (from emergency department admission) covered a proportion of admissions and was timelier.
Data sharing took time, and data were only linked in real time late in 2020.
|Healthcare demand: NHS||Data set with NHS 111 calls and online COVID-19 search activity, updated weekly.||Provided early markers of healthcare demand and allowed triangulation with other healthcare metrics, as well as use in more complex modelling.||Impacted by overall government communications and strategy.|
|Vaccination administration[footnote 48]: NHS||Data set on the number of vaccinations administered by age and location, updated daily.||Provided an indication of vaccine coverage by the major risk factor – age – which was used to prioritise vaccination rollout.[footnote 49]||Choice of population denominator (NIMS or ONS) was difficult as both had limitations, with NIMS using data from primary care registries and ONS estimating based on the last census.[footnote 50], [footnote 51]|
|Primary care health data||Data set of a sample of primary care (GP) health records in England.||Open-source software platform (for instance, OpenSafely) for analysis of electronic health records data.[footnote 52]
Allows for more in-depth analysis of the comparative impact of comorbidities.
Enables analysis of community administered anti-viral drugs and neutralising monoclonal antibodies.
|Sample of GP practices in England with geographical variation in coverage.
Data reporting lag can be considerable for real-time assessment.
Challenge linking to other data sets such as hospital admissions and infection history.
|SIREN immunity study[footnote 53]||Data set on results of testing for immunity in healthcare workers following vaccination and/or (re)infection over 2 years, across the UK.||Provided data on immunity following SARS-CoV-2 infection and vaccination in healthcare workers, allowing analysis of vaccine effectiveness.||Not representative of the population.|
Table 4: deaths data types, sub-types, strengths and limitations for COVID-19 in the UK (2020 to 2022)
|Type of deaths data and source||Description or subsets||Strengths||Limitations|
|Mortality – individual level: ONS||Data set of deaths, with causes of death as recorded on death certificate.||Assessment of specific mortality contribution of COVID-19.||Lagged data, and deaths within 28 days of positive test were used as a timelier indicator.|
|Mortality – individual level: NHS||Data set of deaths within 28 days of a positive test.[footnote 54]||Timely assessment of deaths with COVID-19.||May have some incompleteness in comparison to diagnoses on death certifications.
In high-prevalence, low-severity settings (later in the epidemic) deaths became more apparent as being ‘with’ COVID-19 as opposed to ‘from’ COVID-19.
|Excess mortality[footnote 55]: ONS||Data set of excess mortality.||Includes indirect deaths due to the pandemic as well as direct deaths, and deaths due to a changing context (for example, healthcare pressure).||Data were too lagged to be interpreted during the pandemic.|
Table 5: other data types, sub-types, strengths and limitations for COVID-19 in the UK (2020 to 2022)
|Type of data and source||Description or subsets||Strengths||Limitations|
|COVID-19 national core studies[footnote 56], [footnote 57]||Data sets from a number of studies across epidemiology and surveillance (such as ONS CIS), transmission, clinical trials infrastructure, immunity (such as SIREN), longitudinal health and wellbeing, data and connectivity.||Bespoke studies providing data to answer key areas where the UK needed to increase its research scale or infrastructure to respond to key near-term strategic, policy and operational questions regarding COVID-19.||Needed to be initiated early in the pandemic due to the time required to set up the studies.|
|Mobility: Google and telecom providers||Data sets on mobility by sector.
Data sets of mobile phone network logs of cell connections.
To note: these data were not linked to health or patient data, and were only used in aggregate form to signal population-level changes in activity types.
|Non-health data source adding context – for example, adherence to NPIs (reflected in reduced mobility).||Require careful baseline comparison.
Aggregated at source to ensure privacy.
|Social contact studies: London School of Hygiene and Tropical Medicine, and Scottish Government||Social contact studies – for example, the CoMix social contact study in England and Scottish Contact Survey – provided data on the number of contacts people had over the course of the pandemic.[footnote 58],[footnote 59]||Important context to understand mixing and interpret cases, incidence, positivity across different demographics and inform modelling and public health interventions.||Participants may not fully reflect the population.|
|Behavioural science: YouGov (polling) and public health organisations||Data on attitudes and other influences on behaviour – for example, in relation to interventions, both NPIs and vaccines – were collected regularly through YouGov polling.[footnote 60]
In addition, specific behavioural science studies from academia and public health organisations across the 4 nations provided data – for example, the Public Health Wales ‘How are we doing in Wales’ survey.[footnote 61]
|Important to understand challenges to NPIs, adherence and vaccine uptake.||Participants may not fully reflect the population.
Studies at scale (for example, through polling) may lack nuance compared with methodologies using interviews, but these are not feasible at scale.
Important processes for data and analysis
It was helpful to have a central body bringing together, linking and analysing data with the right skills to get analytical outputs at speed for decision-makers in an easy-to-interpret format. In England, the Joint Biosecurity Centre (JBC) was established in May 2020, bringing together data science, intelligence assessment, academia and public health expertise to provide insight on the status of the COVID-19 epidemic in the UK.[footnote 62] It was important to have a wide range of expertise (for example, geospatial, coding, modelling and data visualisation) working in a single team and with access to a range of data at speed.
The following processes were key for effective data analysis and assessment.
Data acquisition and sharing
Data acquisition and sharing between different organisations was essential to understand a range of data available across the health and social care systems.
Early in the pandemic, however, there was a proliferation of separate data summaries from different organisations, shared in different formats – for example, through slides – rather than sharing data sets that could easily be analysed alongside one another.
Data acquisition at speed was extremely challenging, due to:
- a lack of understanding about exactly what data sat where across multiple organisations
- a lack of routine relationships across some organisations
- a lack of formal agreements and data governance processes in place at the outset of the pandemic
- a need for an appropriate platform and sufficient data engineering capacity to onboard data swiftly
In response, the JBC set up a dedicated team for data acquisition to map what data sat where, form relationships with organisations to agree access and unblock barriers to access as they arose. Over time, understanding of data available, relationships across organisations and relevant formal agreements improved – for example, on 17 March 2020, a Control of Patient Information (COPI) notice was served to NHS Digital requesting that it securely share patient confidential data (with appropriate safeguards in place) to support situational analysis and assessment for pandemic response.[footnote 63]
However, this was slow and hampered speedy understanding of the situation that was key to the response. In some cases analytical teams used direct agreements for data sharing with a selection of NHS trusts in order to get more timely signals. Other organisations also made efforts to support swifter data sharing – for example, the Secure Research Service within the ONS offered a secure environment for the analysis of ONS data. This was fundamental to understand prevalence and severity, though the platform was originally designed for academic research and not operational response.
These efforts went a long way to facilitating swift data sharing, but they had to be done while responding to the pandemic. In the future, this risk can be mitigated by:
- mapping data locations so analytical teams know what data sits where
- forming strong working relationships across data product owners and analytical teams across organisations likely to be involved in emergency response
- preparing formal data sharing agreements and governance processes in advance
- having access to the right skills at speed, including:
- data engineers to onboard data
- legal teams to amend formal agreements as needed
- link teams to involve end users throughout
- dedicated data acquisition teams to unblock barriers
Data sharing improved over the course of the pandemic, particularly between national health services and public health organisations, but work is ongoing on this.
Linkage of data was also critically important and was similarly problematic in the early stages of the pandemic. Data linkage platforms and agreements were not in place and there needed to be expedited processes to review and enable data linkage at speed in an emergency.
Data linkage requires line list data and a secure research environment where multiple data sets can be linked securely. This can be facilitated through pseudo-identifiers for wider dissemination allowing for greater academic engagement.
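The pseudo-identifier approach described above can be sketched minimally. The field names, the hashing scheme and the hard-coded salt below are hypothetical illustrations; real schemes use governed key management by a trusted party, not a salt embedded in code:

```python
# Minimal sketch of linking line lists via pseudo-identifiers.
# Field names and the salt are hypothetical; real schemes hold the salt/key
# securely with a trusted third party.
import hashlib

SALT = "example-secret-salt"  # illustrative only

def pseudo_id(identifier: str) -> str:
    """Derive a stable pseudonym so records can be linked without exposing the identifier."""
    return hashlib.sha256((SALT + identifier).encode()).hexdigest()[:16]

def link(datasets):
    """Join records from multiple line lists on the shared pseudonym."""
    linked = {}
    for name, records in datasets.items():
        for rec in records:
            key = pseudo_id(rec["nhs_number"])
            # keep everything except the raw identifier
            linked.setdefault(key, {})[name] = {
                k: v for k, v in rec.items() if k != "nhs_number"
            }
    return linked

testing = [{"nhs_number": "123", "result": "positive"}]
vaccination = [{"nhs_number": "123", "doses": 2}]
records = link({"testing": testing, "vaccination": vaccination})
```

Because the pseudonym is stable across data sets, an analyst in a secure research environment can study, for example, outcomes by vaccination status without ever seeing the underlying identifiers.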
Linkage across some data sets was possible in 2020 but the process of bringing all the necessary data sets together (including vaccination data) was not complete until late 2021. However, once established, data linkage enabled a number of important analyses such as on vaccine effectiveness and hospital admissions by variant and vaccination status.[footnote 64]
In the future, this process could be speeded up through:
- routine cooperation between organisations holding and analysing data
- creation of suitable environments for sharing data
- having data engineers in receiving organisations to onboard the data swiftly
- having legal agreements in place for sharing data
- a broader visibility of data sources and what types of data are stored where across relevant health and public health agencies likely to be involved in emergency response
Automated production of analytics, which was often based on open-source analytical software such as R and Python, enabled rapid analysis.
At the outset of the pandemic, some teams and organisations had labour-intensive manual compilation of data in place, but this was rapidly adapted to automated processes.
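The shift from manual compilation to automation can be illustrated with a small sketch: combining daily case counts from several source files into a single table, a step that was initially done by hand. The file layout and column names are hypothetical:

```python
# Illustrative sketch of automating a previously manual compilation step:
# aggregate daily case counts from several source CSVs into one table.
# Columns ("date", "cases") and the data are hypothetical.
import csv
import io
from collections import defaultdict

def compile_daily_cases(sources):
    """Sum cases per date across all source CSVs (each with columns: date, cases)."""
    totals = defaultdict(int)
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["date"]] += int(row["cases"])
    return dict(sorted(totals.items()))

nation_a = "date,cases\n2020-11-01,120\n2020-11-02,150\n"
nation_b = "date,cases\n2020-11-01,80\n2020-11-02,95\n"
print(compile_daily_cases([nation_a, nation_b]))
# → {'2020-11-01': 200, '2020-11-02': 245}
```

Once a step like this is scripted, it runs identically every day, removing both the labour and the transcription errors of manual assembly.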
Transparency in terms of data was supported through tools such as public dashboards, which are explored in more detail below (case study 1).
Transparency for analysis and interpretation was supported through publication of the Scientific Advisory Group for Emergencies (SAGE) papers and other advisory bodies, such as SPI-M-O and the Scientific Pandemic Insights Group on Behaviours (SPI-B).
There was a strong emphasis on explanation of the limitations of the data and analysis, alongside any internally produced products or published outputs. The 4 UK nations had their own advisory structures for seeking and adapting advice specific to their circumstances. The Scottish Government’s COVID-19 Advisory Group, for example, reported particular benefit in the reciprocity agreement it had with SAGE. There was an ongoing challenge where discrepancies between operational outputs and public health surveillance outputs could be misinterpreted by the public.
Embedding people across partner organisations and throughout the 4 nations supported close joint working across a number of disciplines.
For example, personnel from organisations across the 4 UK nations were embedded within UKHSA and had access to its data and analysis, supporting data sharing and analytical collaboration across the UK.
Case study 1: the UK COVID-19 Dashboard
The Coronavirus (COVID-19) in the UK Dashboard supported transparency through provision of near real-time data to the public and the research community.
By providing timely, open data, the dashboard supported not only formal research initiatives but also ‘citizen science’. Many amateur analysts, or analysts from other fields (such as actuaries), conducted analyses with important insights – these of course needed rapid review by experts to ensure findings were accurate, and complemented the larger research initiatives that were more regularly used in the response.
Individual UK nations had additional dashboards to focus on relevant data for their nation, such as Northern Ireland’s COVID-19 Dashboard, Scotland’s COVID-19 Dashboard, and Public Health Wales’s COVID-19 dashboard.[footnote 67], [footnote 68], [footnote 69]
The UK dashboard supported strategic decision-making, informed the pandemic response and updated the public and the media, reporting near real-time data on testing, cases, deaths, vaccinations and healthcare.[footnote 70]
In addition, metadata gave context to data sets. There was guidance for developers to set up automated data feeds and a customisable downloads page. There were multiple application programming interfaces (APIs) to make the data as open and reusable as possible.
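A request to a dashboard-style API of the kind described above can be sketched as follows. The endpoint and the `filters`/`structure` parameters follow the pattern of the public UK dashboard API, but treat the details as illustrative rather than authoritative:

```python
# Sketch of constructing a query for a dashboard-style API as described above.
# Endpoint and parameter names follow the public UK dashboard API's documented
# pattern; treat them as illustrative.
import json
from urllib.parse import urlencode

BASE = "https://api.coronavirus.data.gov.uk/v1/data"

def build_query(area_type, area_name, metrics):
    """Build a URL asking for the named metrics for one area; the response is JSON."""
    filters = f"areaType={area_type};areaName={area_name}"
    structure = json.dumps({m: m for m in metrics})
    return f"{BASE}?{urlencode({'filters': filters, 'structure': structure})}"

url = build_query("nation", "england", ["date", "newCasesBySpecimenDate"])
# The URL can then be fetched with any HTTP client.
```

Machine-readable access of this kind is what allowed researchers and developers to consume the same figures as the public pages, automatically and reproducibly.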
How the dashboard developed
The dashboard was set up on a platform supplied by the NHS in England and managed by a small, multidisciplinary team of data scientists, information specialists, user researchers and development staff based in UKHSA. It was overseen by a multi-agency steering group, and work focused across 3 equally important areas:
- the digital user journey
Data were collated from numerous sources across all 4 nations of the UK at national and neighbourhood level, and the 4 nations worked jointly to improve cross-UK data available on the dashboard throughout.
As the pandemic progressed and evolved, so too did the dashboard. In its first iteration, the dashboard simply presented a map and a limited number of charts reporting key metrics on cases and deaths. In response to user research, the following updates were made:
- accessibility and user experience improvements, including different visualisations – such as graphs of different time frames, waffle charts and heatmaps, data tables, simple summary documents and interactive maps
- a postcode search facility, to allow people to view their local information and tell them what local alert level they were in – this allowed users to understand more clearly the epidemiological data informing some of the decisions on tiering
- addition of the vaccination topic page, including data on uptake by demographics and interactive map to allow comparison of percentage of uptake by dose
- a new metrics documentation page that lists all current and historic metrics searchable by name, category, type or availability by area type (by May 2022 the dashboard presented over 200 metrics)
- What’s new pages detailing the latest updates, changes and any data issues
- one of the bigger changes in early 2022 was the move to a new episode-based definition, with metrics showing first episodes and possible reinfections by specimen date.[footnote 71]
Most data were updated daily throughout much of the pandemic – for example, cases presented by specimen date and deaths reported by date of death. However, by early 2022, as mortality fell, these no longer needed to be updated with such frequency. Weekend reporting in England ended, and front-page charts were changed to show 7 days of data rather than daily changes, in line with the government’s Living with COVID-19 strategy.[footnote 72] Reporting cadence reduced to weekly from early July 2022, with contingency plans in place should a return to increased reporting frequency be required.
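The move from daily changes to 7 days of data amounts to presenting a rolling window over the daily series. A minimal sketch, with illustrative figures:

```python
# Simple sketch of the 7-day presentation described above: converting daily
# counts into rolling 7-day sums. Figures are illustrative.
def rolling_sum(daily_counts, window=7):
    """Return the rolling-window sum, starting once a full window is available."""
    return [sum(daily_counts[i - window + 1:i + 1])
            for i in range(window - 1, len(daily_counts))]

daily = [10, 12, 9, 14, 11, 13, 15, 16]
print(rolling_sum(daily))  # → [84, 90] (days 1–7, then days 2–8)
```

Rolling windows also smooth out day-of-week reporting artefacts, which is part of why 7-day figures were preferred once daily volatility was no longer the main story.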
The dashboard has been a prominent public resource, both through media reports and through direct access by the public. At its peak, there were around one million unique users per day and up to 70 million daily hits. Public use of the dashboard further increased when local data were added and provided more personally relevant data to individuals.
Reflections on the public-facing dashboard
Challenges developing the dashboard included:
- data volume: data came from over 26 separate sources, providing in excess of 700 million raw figures to handle each day
- daily surge in demand: at 4pm each day, demand surged for updated data, with dashboard usage reaching 250,000 to 300,000 per minute on data release – this required constant monitoring and activity to prevent service failure. Actions included increasing database capacity, optimising code, and implementing multiple layers of caching
- creating UK data: the 4 nations collaborated to provide a single UK figure for as many metrics as possible – this brought challenges with different nations working to different timescales and collecting data in different formats
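One of the mitigations listed above, layered caching, can be sketched minimally as a time-to-live (TTL) cache: when thousands of identical requests arrive at 4pm, the expensive database query runs once and subsequent requests are served from memory. The TTL, key and query below are illustrative, not the dashboard's actual implementation:

```python
# Minimal sketch of one caching layer of the kind mentioned above: serve a
# computed response from memory for a fixed time so a surge of identical
# requests hits the database only once. TTL and keys are illustrative.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_time, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]          # still fresh: serve cached value
        value = compute()            # expired or missing: recompute once
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0
def expensive_query():
    """Stand-in for a costly database query."""
    global calls
    calls += 1
    return {"cases": 42}

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("daily-summary", expensive_query)
second = cache.get_or_compute("daily-summary", expensive_query)
# expensive_query ran once; both callers got the same result
```

In production such layers are typically stacked (CDN, application cache, database cache), each absorbing load before it reaches the layer below.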
Lessons learned from developing the COVID-19 Dashboard
1. The value of feedback: feedback was received from user surveys, user testing and emails and informed improvements to user research session design, standard operating procedures, quality assurance processes and overall design and ‘user experience’.
2. Open-format data: this allowed broad access, built trust and enabled rapid identification of errors. Downsides included:
- room for misinterpretation: for example, media reporting incorrect information requiring urgent correction
- pressure to publish: daily publishing to such high demand and over a prolonged period was difficult to sustain for a small team
- no room for delays: once expectation was set, it was hard to change. People relied on the information – for example, in planning activities
3. The need for reproducible analytical pipelines (RAPs): RAPs were essential for handling large volumes of data rapidly. The data pipeline began on NHS Foundry and iteratively expanded over time to several hundred transforms covering billions of data points, from numerous different disparate sources.
Some key lessons:
- consider changes carefully: once a flow was set up, altering one part could have unintended consequences later
- timescales and planning: RAPs can both decrease and increase turnaround times for changes to outputs – incorporating fundamental changes or new reporting requests takes time, so planning was essential
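A reproducible analytical pipeline of the kind described can be thought of as a fixed chain of transforms, so that the ordering is explicit, each step is testable in isolation, and the consequences of altering one step are visible. The steps below are hypothetical simplifications of the several hundred transforms mentioned above:

```python
# Illustrative sketch of a RAP as a declared chain of transforms.
# The individual steps are hypothetical simplifications.
def parse(rows):
    """Turn raw (date, count) string pairs into typed records."""
    return [{"date": d, "cases": int(c)} for d, c in rows]

def drop_incomplete(records):
    """Remove records flagged as incomplete (negative counts, illustratively)."""
    return [r for r in records if r["cases"] >= 0]

def total_cases(records):
    return sum(r["cases"] for r in records)

def run_pipeline(raw, steps):
    """Apply each transform in order; the ordering is part of the pipeline's definition."""
    data = raw
    for step in steps:
        data = step(data)
    return data

raw = [("2021-01-01", "100"), ("2021-01-02", "-1"), ("2021-01-03", "120")]
result = run_pipeline(raw, [parse, drop_incomplete, total_cases])
print(result)  # → 220
```

Declaring the pipeline as data (the list of steps) is what makes the lessons above concrete: a change to one transform is a visible edit to the chain, and its downstream consequences can be tested before release.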
How analyses were assessed to inform policy
It was important to have clear processes to collate various data streams and analyses to assess the current situation throughout the pandemic – including how and who should communicate data and insights to decision-makers.
A technical board with representation from all 4 UK nations oversaw an overall assessment of the risk that COVID-19 presented at any time. This board oversaw and agreed the methodology for the UK COVID-19 alert level, which provided public communications on risk across the 4 nations by using 5 levels to describe the epidemic.[footnote 73] The technical board also agreed a consistent framework for monitoring COVID-19 internationally, with analysis of a range of indicators for each country, territory, or island group, to inform risk assessment and the need for intervention.[footnote 74]
Each of the UK nations also set up its own assessments to support decision-making.
In Wales, for example, an internal dashboard within the Welsh Government was developed and used to populate reports, such as the COVID-19 situational report.[footnote 75]
In England, a cadence of bronze, silver and gold local action committee meetings was established and undertaken each week to assess latest data alongside input from local directors of public health and regional teams (see case study 3 below). The bronze meeting used early warning indicators to identify areas and key issues of concern, ensuring local insight and professional judgement from public health leads was considered alongside quantitative data (for example, on cases and admissions to hospital). Key situational awareness updates and associated policy recommendations were then escalated up through the silver meeting (chaired by the CMO for England, with input from public health regional directors) and the gold meeting (chaired by the Secretary of State for Health and Social Care).
Data, analysis and assessment from these meetings for England were shared across government, including through the Cabinet Office Dashboard, with frequent meetings including the Prime Minister (see case study 2, below). At key times the data, analysis and assessment were brought to national decision-making committees, together with assessment from other agencies to inform decision-making. COVID-O, a ministerial committee convened to handle the COVID-19 emergency, was the decision-making body in England.
Alongside this, other forums conducted assessments of specific or technical questions, such as the Variant Technical Group which brought together interdisciplinary technical expertise to risk-assess new variants, or the Data Debrief Group which compared data from different surveillance studies across the 4 nations.[footnote 76] Finally, daily situational awareness calls were used to share information across public health communities.
The outputs of such assessments were important to government departments, operational agencies and SAGE and its sub-groups.
Case study 2: the Prime Minister’s and other senior ministers’ daily data brief
Over the course of the pandemic, and particularly in the run-up to major decisions, the Prime Minister held regular data briefings alongside discussion and review with the CMO for England and the GCSA.
These briefings were supported by presentation of data visualisations and analysis, prepared by the COVID-19 Taskforce at the Cabinet Office and generally known as ‘the Cabinet Office Dashboard’.
The frequency of briefings varied over time, up to daily. Separate data briefings were also given to the First Ministers and leaders of the 4 UK nations, ministers, and senior officials. The Cabinet Office Dashboard presented a broad range of data from different departments, much of which was manually assembled overnight each day, providing an overview of the pandemic and its impacts on society and the economy.
The main challenge in assembling the Cabinet Office Dashboard was inconsistent data formatting. Most government departments did not have the data engineering expertise required to set up APIs to facilitate data exchange, so data sets often had to be assembled by hand, which was time-consuming and a potential source of error.
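The harmonisation work described here can be sketched in a few lines: mapping each department's export onto a common schema with consistent column names and date formats. The department names, column headings and schema below are hypothetical, purely to illustrate the kind of normalisation that manual assembly had to perform.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from each department's export headers to a common schema.
COLUMN_MAPS = {
    "dept_a": {"Date": "date", "Confirmed Cases": "cases"},
    "dept_b": {"report_date": "date", "n_cases": "cases"},
}

# Date formats seen across the illustrative exports.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(raw):
    """Try each known date format, returning an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

def normalise(dept, csv_text):
    """Map one department's CSV export onto the common schema."""
    mapping = COLUMN_MAPS[dept]
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        common = {mapping[k]: v for k, v in row.items() if k in mapping}
        common["date"] = parse_date(common["date"])
        common["cases"] = int(common["cases"])
        out.append(common)
    return out
```

Once every source passes through a step like this, downstream analysis no longer depends on who exported the file, which is the property an API-based exchange provides by construction.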
By autumn 2020, key testing and health data sets were available via API from PHE (latterly UKHSA) and the NHS, but other data continued to be shared by other mechanisms (for example, email) throughout the pandemic.
Machine-readable data and the development of reproducible analytical pipelines (RAPs) were critical to the data briefings, as noted above for the UK COVID-19 Dashboard. The RAPs allowed millions of individual data points to be ingested and transformed and a suite of several hundred charts and visualisations to be generated in a timely and robust fashion. The RAPs allowed more analytical resource to be devoted to refining the end product to ensure that it met the needs of the Prime Minister and other decision-makers.
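The core idea of a reproducible analytical pipeline, as described above, is a deterministic transform from raw records to chart-ready series, so the same inputs always yield the same outputs. The sketch below is illustrative only: the function names and data shapes are assumptions, not the actual Cabinet Office pipeline.

```python
from collections import defaultdict
from datetime import date, timedelta

def ingest(records):
    """Aggregate raw daily records into a {date: cases} series per area."""
    series = defaultdict(dict)
    for r in records:
        d = date.fromisoformat(r["date"])
        series[r["area"]][d] = series[r["area"]].get(d, 0) + r["cases"]
    return series

def rolling_7day(series):
    """7-day trailing mean for each date that has a full window of data."""
    out = {}
    days = sorted(series)
    for i in range(6, len(days)):
        window = days[i - 6:i + 1]
        # Skip windows with reporting gaps rather than interpolating.
        if all((window[j + 1] - window[j]).days == 1 for j in range(6)):
            out[days[i]] = sum(series[d] for d in window) / 7
    return out

def build_chart_data(records):
    """One pipeline run: raw records in, chart-ready smoothed series out."""
    return {area: rolling_7day(s) for area, s in ingest(records).items()}
```

Because every chart is regenerated from raw data on each run, analysts can spend their time refining the output rather than re-checking hand-assembled numbers, which is the benefit described above.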
Case study 3: bronze, silver and gold situation reports
In England, the bronze, silver and gold local action committees were informed by comprehensive national and regional situation reports which were developed using the latest data visualisations and analysis.
How the situation reports developed
The content of situation reports evolved to reflect the changing landscape of the pandemic and to support decision-makers with relevant data to inform upcoming policy decisions. In October 2020, decision-making was focused on the implementation of local COVID-19 restrictions (tiering and local COVID-19 alert levels). A range of epidemiological data were presented alongside healthcare metrics (pressure on the NHS – people admitted to hospital and occupancy of hospital beds) regionally and locally to inform interventions.[footnote 77]
In March 2021 when the government was preparing to lift national NPIs, the reports were updated and re-structured to give senior decision-makers an update on progress made against the 4 key tests for exiting lockdown, with key data on variants or vaccine uptake.[footnote 78]
The creation of these reports involved a considerable resource initially, with staff manually adding charts from a range of data sources and other products (including outputs from PHE, latterly UKHSA, Department of Health and Social Care (DHSC) and the NHS). As data were increasingly shared directly across organisations it was possible to automate this.
The situation reports also increasingly incorporated relevant data from Northern Ireland, Scotland and Wales to understand the progression of the pandemic across the UK, as well as relevant international comparators which were helpful for understanding emerging variants in spite of variations in case ascertainment and genomic surveillance.
Finally, the situation reports were refined weekly in response to continual feedback – for example, on how data were visualised to aid interpretation. Heatmaps (see Figure 2) were useful for visualising large and complex data, while further detail was provided in reports.[footnote 79]
The reports brought together a range of health and non-health data, as well as local insights on this data, and provided an assessment of the important messages arising from both the data and local intelligence for decision-makers. This supported decision-makers as well as those involved in the pandemic response, including those who did not have a public health background.
Cross-departmental collaboration in the production of the reports helped ensure data consistency and avoided decision-makers being presented with apparently conflicting data due to presentational differences.
The reports provided the basis for a range of other situational awareness products and briefings which used their data visualisations and analysis, but this brought the risk that nuances were lost in the process. Abridged versions were used for briefing MPs, the WHO, the Prime Minister and senior leaders across government, as well as in COVID-O meetings, international liaison discussions and media communications.
The automation of the reports required support and collaboration from across government but was important in saving resource and allowing teams to work on more complex analyses.
The initial reports were very large (at times around 400 slides), providing a range of graphs and data visualisations across different data types, often broken down by demographics and geographies. These provided a comprehensive assessment and were especially important when different measures were in force in different geographical locations, but they were challenging to produce, quality-assure, distribute and navigate in meetings.
The shift to shorter, more focused presentations enabled clearer narratives but required more iteration. It was essential that key stakeholders saw situation reports in advance of local action committee meetings.
Figure 2: heatmap of COVID-19 case rates, by age group and region for England in 2021[footnote 80]
Description text for Figure 2:
This heatmap is an example of the type of data visualisation used during the pandemic.
Six tables, with cells shaded as heat maps, show case rates in the 6 English regions. Each column on each table represents a day, running 27 August to 19 September 2021, and each row on each table represents an age group (starting from the bottom: 0 to 12 years, 13 to 17, 18 to 29, 30 to 39, 40 to 49, 50 to 59, 60 plus, and a summary row at the top for all ages).
Each cell is shaded on a colour gradient from light yellow (representing no confirmed cases per 100,000 on a 7-day rolling rate) through shades of red and purple to black (500 or more confirmed cases per 100,000 on a 7-day rolling rate).
A red dashed line on the heatmap marks the last 4 days, where data may be incomplete due to reporting delays and the removal of lateral flow device testing data without accompanying negative PCR results.
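The gradient described for Figure 2 amounts to mapping a 7-day rolling case rate per 100,000 population onto a colour band. A minimal sketch of that mapping is below; only the two endpoints (light yellow at 0, black at 500 or more) come from the figure description – the intermediate cut-points and colour names are assumptions for illustration.

```python
def rate_per_100k(cases_7day_sum, population):
    """7-day rolling case rate per 100,000 population."""
    return cases_7day_sum * 100_000 / population

# Illustrative colour bands: the endpoints match the figure description,
# the intermediate thresholds are assumed.
BANDS = [(0, "light yellow"), (50, "red"), (200, "purple"), (500, "black")]

def colour_for(rate):
    """Return the colour band whose threshold the rate meets or exceeds."""
    colour = BANDS[0][1]
    for threshold, name in BANDS:
        if rate >= threshold:
            colour = name
    return colour
```

Banding a continuous rate like this is what lets a single glance at a heatmap cell convey whether an age group or region is of concern, without reading any numbers.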
Reflections and advice for a future CMO or GCSA
Good data are essential for an effective pandemic response – otherwise decision-makers, service providers and researchers are flying blind.
A lack of even basic data was particularly acute in the early stages of the pandemic, but difficulties with accessing, sharing and linking data persisted for much longer, although the situation improved significantly thanks to the efforts of those involved.
Data sharing and linkage is essential from the outset.
In any health emergency, data from hospitals, primary care, health protection agencies and academic research will need to be shared rapidly between a range of government departments, public sector organisations and academic researchers. This requires data governance processes and interoperable data platforms to support data sharing and interorganisational collaboration.
The following 4 areas are important to understand:
which data are required, with consideration of who ‘owns’ the data and how data will be accessed
which disparate data sets need to be linked to enable necessary analyses, and how will this be done
who will analyse the data to provide insight and inform assessment
which data sets will need to be newly created
Data curation and analysis required considerable resource.
This was only fully effective once automation allowed multiple data streams to be integrated very rapidly.
Surveillance studies, in particular the ONS CIS and REACT, were important to provide consistent, representative data on positivity in the community and in particular settings, and to include those who were asymptomatic.
Analyses had to be continually adapted to understand the evolving epidemic.
For example, later in the epidemic, high levels of immunity, a less severe variant of concern (Omicron) and a high prevalence of infection (from January 2022) meant it was increasingly apparent that people were being admitted to hospital ‘with’ COVID-19 rather than ‘for’ COVID-19, based on symptoms and reported diagnoses. This was important for risk assessment, and the distinction needs to be adequately captured in data.
Data lags limited analyses.
Some lags are unavoidable (for example, the natural lag between infection and hospitalisation). Others reflected operational processes – for example, individual data on diagnoses were only completed at discharge, which delayed the linkage of individual-level hospital data to case data, and so delayed analysis of hospital admissions for specific variants.
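The linkage problem described here can be illustrated as a left join of hospital admissions to case records on a shared pseudonymised identifier: until an admission links to a case carrying a variant call, it cannot contribute to a variant-specific count. Field names and record shapes below are illustrative, not the real schemas.

```python
def link_admissions_to_cases(admissions, cases):
    """Left-join hospital admissions to case records on a shared
    pseudonymised ID, carrying over the variant call where one exists."""
    by_id = {c["id"]: c for c in cases}
    linked = []
    for adm in admissions:
        case = by_id.get(adm["id"], {})
        linked.append({**adm, "variant": case.get("variant")})
    return linked

def admissions_by_variant(linked):
    """Count admissions per variant; unlinked admissions are flagged,
    not dropped, so the scale of the lag stays visible."""
    counts = {}
    for row in linked:
        key = row["variant"] or "unlinked/unknown"
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Keeping an explicit "unlinked/unknown" category, rather than silently dropping unmatched records, makes the size of the reporting lag part of the analysis itself.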
Transparency of data helped engage the public with public health interventions.
The COVID-19 dashboard was central to this. Data visualisations are important for the public, but they also help tell the story for decision-makers.
Rapid collation of data, analysis and assessment of the situation required multidisciplinary working.
This included epidemiologists, clinicians, analysts, statisticians and data scientists (including data visualisation experts). Cross-organisational working, including across geographies and within and beyond government (for example, with academia) was also key.