Study selection
The search yielded 2,005 articles from the databases PubMed, Scopus, EMBASE, IEEE Xplore, ACM Digital Library and Google Scholar (see Fig. 3 for the PRISMA flow diagram). After duplicates were removed, 1,862 articles remained for title/abstract screening, and 44 articles were identified for full-text review. Six studies met our inclusion criteria and were included in the present review: four were identified through the screening process and two were identified during full-text review by snowballing the reference lists of screened articles.
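The flow counts reported above reconcile with simple arithmetic. The tally below is a minimal sketch that uses only the figures stated in this section. It assumes that all 44 full-text articles came from the 1,862 screened records, because the text does not say whether the two snowballed studies are counted among the 44.

```python
# Counts as reported in the text (see the PRISMA flow diagram, Fig. 3).
identified = 2005        # records retrieved across the six databases
after_dedup = 1862       # records remaining for title/abstract screening
full_text = 44           # articles identified for full-text review
from_screening = 4       # included studies found through screening
from_snowballing = 2     # included studies found by snowballing reference lists

duplicates_removed = identified - after_dedup
# Assumption: every full-text article came from the screened set.
excluded_at_title_abstract = after_dedup - full_text
included = from_screening + from_snowballing

print(f"Duplicates removed: {duplicates_removed}")                  # 143
print(f"Excluded at title/abstract: {excluded_at_title_abstract}")  # 1818
print(f"Studies included: {included}")                              # 6
```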

Study characteristics
Table 4 presents the characteristics of the studies included in this review. Five studies were conducted in high-income countries, three in the USA [30,31,32] and two in Canada [33, 34], and one was conducted in China, an upper-middle-income country [35]. Broadly, the included studies aimed to describe the development of a population health surveillance platform. Specifically, studies aimed to (a) describe population surveillance methodology for people with CHD [30]; (b) develop an EHR-based public health information exchange [31]; (c) create a socially generated health data model for disease surveillance (via Social InfoButtons) [32]; (d) track the NCD epidemic in China (NCDCMS) [35]; (e) describe the development of a distributed model (CCDSS) for chronic disease surveillance in Canada [33]; and (f) develop PopHR, a big data platform for population health surveillance in Canada [34].
Three of the six studies were conducted in a partnered academic (university) and healthcare (health centres, department of health) setting [30, 31, 34]. Two studies were conducted in government settings (public health agencies) [33, 35], and one was conducted exclusively in academia (university) [32].
Target population
The target populations of the included studies were large and varied. Three studies drew their samples from large geographical areas (Ningbo City [35], all Canadian provinces and territories [33], Montreal [34]) with populations ranging from 1 to 6 million persons. Two studies drew their samples from clinical health services (via EHRs), ranging from 73,000 to 5 million persons [30, 31]. One study drew its sample from social media and from government and health websites, and quantified the sample in terms of patients, posts and reviews (~160,000 in total) [32].
Five studies described the digital aggregation of real-world data and traditional data to target 28 NCDs (e.g. asthma, type 2 diabetes mellitus, arthritis, cancer, chronic kidney failure, chronic obstructive pulmonary disease, mental illness, obesity). One study focused exclusively on congenital heart disease (CHD) [30].
Characteristics of real-world and traditional data
Table 5 presents the characteristics, aggregation and application of real-world data and traditional data to support precision public health for NCDs. Four included studies leveraged ‘Clinical, Medication & Family Hx’ as the primary real-world data type [30, 31, 34, 35]. The sources generating these real-world data were organic and included an electronic health or medical record (EHR/EMR) [30, 31, 35] or a public health insurance provider [34]. These data were then aggregated with other real-world data types, namely claims/billing [30, 34] and environmental (retail transactions) data [34], and with traditional data from the country-specific Census (lifestyle factors, environmental characteristics) [31, 34], a disease registry (CHD) [30], an electronic clinical history database [35] and administrative hospital discharge abstracts [33]. In Social InfoButtons, data from the social media platform ‘Twitter’ were aggregated with traditional health data from government and health websites and bibliographic databases (Centers for Disease Control, PubMed, WebMD, MedHelp) [32].
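The simplest form of aggregating real-world and traditional data pairs a real-world numerator with a traditional denominator, for example EHR case counts with Census population counts to estimate area-level prevalence. The sketch below assumes hypothetical area codes and counts and a shared area identifier in both sources; it illustrates the general idea and is not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaEstimate:
    area: str
    cases: int        # numerator from real-world data (e.g. EHR case counts)
    population: int   # denominator from traditional data (e.g. Census counts)

    @property
    def prevalence_per_1000(self) -> float:
        return 1000 * self.cases / self.population


def link_cases_to_census(case_counts: dict[str, int],
                         census_pop: dict[str, int]) -> list[AreaEstimate]:
    """Join case counts to Census population by area code.

    Areas present in only one source are dropped rather than imputed.
    """
    shared = sorted(case_counts.keys() & census_pop.keys())
    return [AreaEstimate(a, case_counts[a], census_pop[a]) for a in shared]


# Hypothetical area codes and counts, for illustration only.
ehr_cases = {"A01": 420, "A02": 130, "A03": 75}
census = {"A01": 12000, "A02": 5200, "A04": 8000}

for est in link_cases_to_census(ehr_cases, census):
    print(est.area, round(est.prevalence_per_1000, 1))  # A01 35.0, then A02 25.0
```

The result is a crude prevalence: it assumes the EHR captures every case within each area's Census population, which seldom holds in practice, and the mismatch between an EHR catchment and a Census area is a common source of bias.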
Four of the six included studies performed a static, cross-sectional extraction of real-world data and traditional data to build their population health surveillance platform [30,31,32,33]. These studies used EHRs, claims databases, health information systems, registries, the Census, social media, government and health websites, and administrative hospital datasets. The NCDCMS, a public health data exchange platform to track NCDs in China, was updated in real time (1-day refresh frequency) with data extracted from an EHR and an electronic clinical history database [35]. PopHR performed near real-time updates whose refresh frequency (2 weeks to 1 year) varied with the type of source: real-world data (insurance provider, retail transactions) or traditional data (Census) [34].
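Near real-time updating with source-specific refresh frequencies can be expressed as a simple refresh schedule. The sketch below assumes hypothetical source names and intervals: only the overall 2-week to 1-year range comes from the PopHR description, and the assignment of intervals to individual sources is invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical refresh intervals per source. Only the overall range
# (2 weeks to 1 year) is reported; these assignments are illustrative.
REFRESH_INTERVALS = {
    "insurance_claims": timedelta(weeks=2),
    "retail_transactions": timedelta(days=90),
    "census": timedelta(days=365),
}


def sources_due(last_refreshed: dict[str, date], today: date) -> list[str]:
    """Return the sources whose refresh interval has elapsed since their last load."""
    return sorted(
        name
        for name, interval in REFRESH_INTERVALS.items()
        if today - last_refreshed[name] >= interval
    )


last = {
    "insurance_claims": date(2024, 1, 1),
    "retail_transactions": date(2023, 12, 1),
    "census": date(2023, 6, 1),
}
print(sources_due(last, date(2024, 1, 20)))  # ['insurance_claims']
```

In this framing, a static cross-sectional extract is a source that is never refreshed, and a daily update such as the one reported for the NCDCMS corresponds to an interval of one day.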
Digital aggregation of real-world and traditional data
Methods for the digital aggregation of real-world data and traditional data were heterogeneous across studies. Static extracts of real-world data were aggregated via an external, independent vendor (Fine-Grained Records and Linkage Tool) and hosted in Microsoft Access [30]; using a public health information network (PHIN) data system [31]; via semantic web technology in a digital platform (Social InfoButtons) [32]; or through data reconciliation of pre-aggregated data (Canadian CCDSS) [33]. In the NCDCMS, real-world data were aggregated in real time via public health data exchange with uniform data standards [35]. In PopHR, real-world data and traditional data were aggregated in near real time using a server-client architecture to execute data processing, integration and semantics [34].
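Linking static extracts from two sources usually depends on a shared matching key. The sketch below shows generic deterministic linkage on salted, hashed identifiers, with deduplication of the EHR extract before matching. The field names, salt and matching rule are assumptions; it does not reproduce the Fine-Grained Records and Linkage Tool or any other method used by the included studies.

```python
import hashlib

Record = dict[str, str]


def link_key(first: str, last: str, dob: str, salt: str) -> str:
    """Salted SHA-256 of normalised identifiers, so sites can match records
    without exchanging names or dates of birth in clear text."""
    normalised = "|".join(s.strip().lower() for s in (first, last, dob))
    return hashlib.sha256((salt + normalised).encode("utf-8")).hexdigest()


def deterministic_link(ehr_rows: list[Record], registry_rows: list[Record],
                       salt: str) -> tuple[list[tuple[str, str]], int]:
    """Exact-match linkage on the hashed key.

    Returns (ehr_id, registry_id) pairs and the number of duplicate EHR
    records collapsed before matching (the first record seen is kept).
    """
    ehr_by_key: dict[str, Record] = {}
    duplicates = 0
    for row in ehr_rows:
        key = link_key(row["first"], row["last"], row["dob"], salt)
        if key in ehr_by_key:
            duplicates += 1
        else:
            ehr_by_key[key] = row

    linked = []
    for reg in registry_rows:
        key = link_key(reg["first"], reg["last"], reg["dob"], salt)
        if key in ehr_by_key:
            linked.append((ehr_by_key[key]["ehr_id"], reg["registry_id"]))
    return linked, duplicates


# Hypothetical records, for illustration only.
ehr = [
    {"ehr_id": "E1", "first": "Ana", "last": "Lee", "dob": "2001-03-04"},
    {"ehr_id": "E2", "first": "ana ", "last": "LEE", "dob": "2001-03-04"},  # same person
    {"ehr_id": "E3", "first": "Ben", "last": "Ng", "dob": "1999-11-30"},
]
registry = [
    {"registry_id": "R9", "first": "Ana", "last": "Lee", "dob": "2001-03-04"},
    {"registry_id": "R7", "first": "Cal", "last": "Ortiz", "dob": "2010-01-01"},
]
linked, dups = deterministic_link(ehr, registry, salt="demo-salt")
print(linked, dups)  # [('E1', 'R9')] 1
```

Exact matching misses records with typographical errors or changed surnames; probabilistic approaches such as Fellegi-Sunter weighting are the usual alternative when identifiers are noisy.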
Application of aggregated data
The primary end-users of the platforms developed by the included studies were public health professionals and government. PopHR was implemented with test users in public health and health service agencies to (a) verify the software, (b) generate rapid feedback and (c) conduct usability testing [34]. Social InfoButtons described government as an end-user, to assist in disease surveillance and increase awareness of social health trends [32]. The CCDSS was implemented by the Public Health Agency of Canada and produces data reports, publications and interactive web-based open data resources [33]. The NCDCMS has been implemented in five cities in Zhejiang Province to improve public health surveillance, although the exact mechanism by which it does so was unclear [35]. In the study by Guilbert et al. [31], static summary reports were produced via the EHR-based health information exchange (HIE) to describe demographic, clinic and community disease characteristics. One study transferred its de-identified and deduplicated CHD surveillance dataset to the Centers for Disease Control for review [30].
Five of the six included studies applied population health analytics to their aggregated real-world data and traditional data [31,32,33,34,35]. Geospatial analytics were implemented to geocode populations by disease characteristics [31, 32, 34, 35] and to conduct spatial regression that geographically contextualised disease characteristics [31]. Descriptive analytics were applied across four platforms to describe disease prevalence under multi-level stratification conditions (via bar charts, data tables and scatter plots) [31,32,33,34]. Temporal analytics were applied via time series to indicate population or geographical changes in prevalence over time [34], or to compare treatment efficacy in populations over time [32]. Guilbert et al. [31] performed inferential analytics in their HIE to support multivariate queries and mine relevant data characteristics. One study did not report the use of analytics [30]. No studies incorporated predictive or prescriptive analytics.
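Temporal analytics of the kind described for PopHR summarise how prevalence changes over time within each area. A minimal sketch, assuming hypothetical annual prevalence series for two regions, fits an ordinary least-squares slope to each series; the included studies may have used other time-series methods.

```python
def linear_trend(years: list[int], values: list[float]) -> float:
    """Ordinary least-squares slope of values regressed on year (units per year)."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


years = [2016, 2017, 2018, 2019, 2020]
# Hypothetical prevalence per 1,000 persons, for illustration only.
series = {
    "Region A": [30.0, 31.0, 32.5, 33.0, 34.5],
    "Region B": [25.0, 25.0, 24.5, 24.0, 24.0],
}
for region, values in series.items():
    slope = linear_trend(years, values)
    print(f"{region}: {slope:+.2f} per 1,000 per year")
```

A single slope per region is the most compact temporal summary; it ignores seasonality and autocorrelation, which dedicated time-series models would handle.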
Digital health transformation towards precision public health
Three horizons (see Fig. 4) for digital health transformation towards precision public health for NCDs have been proposed as a strategic roadmap to guide digital public health investment, decision-making and policy [13]. One study aligned with Horizon 1 ‘Digital public health workflows’: Glidewell et al. [30] linked real-world data and traditional data and hosted the aggregated data in Microsoft Access to provide the digital foundation for improved health surveillance of people with CHD.

Results mapping: Three horizons towards precision public health for noncommunicable diseases
Five studies aligned with Horizon 2 ‘Population health data and analytics’. Shaban-Nejad et al. [34] integrated geospatial, descriptive, temporal and comparative analytics into chronic disease surveillance and tested PopHR in a real-world public health and health service agency setting. The social analytic tool Social InfoButtons provided government with enhanced disease surveillance capability through geocoding and descriptive analytics of online discussions about social and disease topics [32].
No studies aligned with Horizon 3 ‘Precision public health’.