February 21, 2024

Health Benefit

Healthy is Rich, Today's Best Investment

Harnessing the power of synthetic data in healthcare: innovation, application, and privacy

8 min read
  • Assefa, S. Generating Synthetic Data in Finance: Opportunities, Challenges and Pitfalls. Available at SSRN: (2020).

  • Gonzales, A., Guruswamy, G. & Smith, S. R. Synthetic data in health care: A narrative review. PLOS Digital Health 2, e0000082 (2023).

  • McDuff, D., Curran, T. & Kadambi, A. Synthetic Data in Healthcare. arXiv preprint arXiv:2304.03243 (2023).

  • Gotz, D. & Borland, D. Data-driven healthcare: challenges and opportunities for interactive visualization. IEEE Comput. Graph. Appl. 36, 90–96 (2016).

  • Jordon, J. et al. Synthetic Data – what, why and how? arXiv preprint arXiv:2205.03257 (2022).

  • Philpott, D. A Guide to Federal Terms and Acronyms. Bernan Press (2017).

  • Metropolis, N. & Ulam, S. The Monte Carlo method. J. Am. Stat. Assoc. 44, 335–341 (1949).

  • Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

  • Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).

  • Bonabeau, E. Agent-based modeling: Methods and techniques for simulating human systems. Proc. Natl Acad. Sci. 99, 7280–7287 (2002).

  • Carmona, R. & Delarue, F. Probabilistic Theory of Mean Field Games with Applications, volume 84. Springer (2018).

  • Walonoski, J. et al. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J. Am. Med. Inform. Assoc. Epub 2017/10/13. PMID: 29025144 (2017).

  • MDClone Launches New Phase of Collaboration with Washington University in St. Louis. [cited 31 October 2019]. In: MDClone News [Internet]. Available from: (2019).

  • Reiter, J. Inference for partially synthetic, public use microdata sets. Surv. Methodol. 29, 181–188 (2003).

  • Loong, B., Zaslavsky, A. M., He, Y. & Harrington, D. P. Disclosure control using partially synthetic data for large-scale health surveys, with applications to CanCORS. Stat. Med 32, 4139–4161 (2013).

  • Raghunathan, T., Reiter, J. & Rubin, D. Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19, 1–16 (2003).

  • Reiner Benaim, A. et al. Analyzing medical research results based on synthetic data and their relation to real data results: systematic comparison from five observational studies. JMIR Med Inf. 8, e16492 (2020).

  • Ngufor, C., Van Houten, H., Caffo, B. S., Shah, N. D. & McCoy, R. G. Mixed effect machine learning: a framework for predicting longitudinal change in hemoglobin A1c. J. Biomed. Inf. 89, 56–67 (2019).

  • Enanoria, W. T. et al. The effect of contact investigations and public health interventions in the control and prevention of measles transmission: a simulation study. PLoS ONE 11, e0167160 (2016).

  • Laderas, T. et al. Teaching data science fundamentals through realistic synthetic clinical cardiovascular data. bioRxiv 232611 (2017).

  • Harron, K., Gilbert, R., Cromwell, D. & Van Der Meulen, J. Linking data for mothers and babies in de-identified electronic health data. PLoS ONE 11 (2016).

  • Ringel, J. S., Eibner, C., Girosi, F., Cordova, A. & McGlynn, E. A. Modeling health care policy alternatives. Health Serv. Res 45, 1541–1558 (2010).

  • Aljaaf, A. J. et al. Partially synthesised dataset to improve prediction accuracy. In: Huang, D. S., Bevilacqua, V. & Premaratne, P. (eds) Intelligent Computing Theories and Application. Springer, Cham, pp. 855–866 (2016).

  • Amoon, A. T., Arah, O. A. & Kheifets, L. The sensitivity of reported effects of EMF on childhood leukemia to uncontrolled confounding by residential mobility: a hybrid simulation study and an empirical analysis using CAPS data. Cancer Causes Control 30, 901–908 (2019).

  • Symonds, P. et al. MicroEnv: a microsimulation model for quantifying the impacts of environmental policies on population health and health inequalities. Sci. Total Environ. 697, 134105 (2019).

  • Hennessy, D. Creating a synthetic database for use in microsimulation models to investigate alternative health care financing strategies in Canada. Int J. Microsimul 8, 41–74 (2015).

  • Sun, Z., Wang, F. & Hu, J. LINKAGE: An approach for comprehensive risk prediction for care management. In: Cao, L. & Zhang, C. (eds) Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia. Association for Computing Machinery, New York, pp. 1145–1154 (2015).

  • Davis, P., Lay-Yee, R. & Pearson, J. Using micro-simulation to create a synthesised data set and test policy options: the case of health service effects under demographic ageing. Health Policy 97, 267–274 (2010).

  • Ive, J. et al. Generation and evaluation of artificial mental health records for natural language processing. NPJ Digit. Med. 3, 1–9 (2020).

  • Jiang, Y., Chen, H., Loew, M. & Ko, H. COVID-19 CT Image Synthesis with a Conditional Generative Adversarial Network. arXiv preprint arXiv:2007.14638 (2020).

  • Das, H. P. et al. Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data. arXiv preprint arXiv:2109.06486.

  • Cheng, W., Lian, W. & Tian, J. Building the hospital intelligent twins for all-scenario intelligence health care. Digit. Health 8 (2022).

  • Karakra, A., Fontanili, F., Lamine, E. & Lamothe, J. HospiT’Win: A Predictive Simulation-Based Digital Twin for Patients Pathways in Hospital. 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Chicago, IL, USA, pp. 1–4 (2019).

  • Cockrell, C., Schobel-McHugh, S., Lisboa, F., Vodovotz, Y., An, G. Generating synthetic data with a mechanism-based Critical Illness Digital Twin: Demonstration for Post Traumatic Acute Respiratory Distress Syndrome. bioRxiv 2022.11.22.517524.

  • Filippo, M. D. et al. Single-Cell Digital Twins for Cancer Preclinical Investigation. Methods Mol. Biol. (Clifton NJ) 2088, 331–343 (2020).

  • Zhang, J., Qian, H. & Zhou, H. Application and Research of Digital Twin Technology in Safety and Health Monitoring of the Elderly in Community. Zhongguo Yi Liao Qi Xie Za Zhi Chin. J. Med Instrum. 43, 410–413 (2019).

  • Hose, D. R. et al. Cardiovascular Models for Personalised Medicine: Where Now and Where Next? Med Eng. Phys. 72, 38–48 (2019).

  • Pencina, M. J., Goldstein, B. A. & D’Agostino, R. B. N. Engl. J. Med. 382, 1583 (2020).

  • Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D. & Tzovara, A. Addressing bias in big data and AI for health care: A call for open science. Patterns (N. Y) 2(Oct), 100347 (2021).

  • Naeem, M. F., Oh, S. J., Uh, Y., Choi, Y. & Yoo, J. In International Conference on Machine Learning, 7176–7185 (PMLR, 2020).

  • Sajjadi, M. S., Bachem, O., Lucic, M., Bousquet, O. & Gelly, S. In Advances in Neural Information Processing Systems (2018).

  • Alaa, A. M., van Breugel, B., Saveliev, E. & van der Schaar, M. In International Conference on Machine Learning (2021).

  • Möller, F. et al. Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, pp. 46–55 (2021).

  • Chen, G. et al. Learning Open Set Network with Discriminative Reciprocal Points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J. M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12348. Springer, Cham. (2020).

  • Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. [Independently Published] (2022).

  • Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M. & Mongelli, M. Characterization of Synthetic Health Data Using Rule-Based Artificial Intelligence Models. IEEE Journal of Biomedical and Health Informatics. https://doi.org/10.1109/JBHI.2023.3236722.

  • Ghaffar Nia, N., Kaplanoglu, E. & Nasab, A. Evaluation of artificial intelligence techniques in disease diagnosis and prediction. Discov. Artif. Intell. 3, 5 (2023).

  • Celino, I. Who is this Explanation for? Human Intelligence and Knowledge Graphs for eXplainable AI. arXiv: 2005.13275 (2020).

  • Hatherley, J., Sparrow, R. & Howard, M. The Virtues of Interpretable Medical Artificial Intelligence. Camb. Q. Healthc. Ethics 1–10. https://doi.org/10.1017/S0963180122000305 (2022).

  • Courtois, M., Filiot, A. & Ficheur, G. Distribution-Based Similarity Measures Applied to Laboratory Results Matching. In Applying the FAIR Principles to Accelerate Health Research in Europe in the Post COVID-19 Era, pp. 94–98. IOS Press (2021).

  • Xia, Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Prog. Mol. Biol. Transl. Sci. 171, 309–491 (2020).

  • Reddy, G. T. et al. Analysis of dimensionality reduction techniques on big data. IEEE Access 8, 54776–54788 (2020).

  • Alur, R. et al. Auditing for Human Expertise. arXiv: 2306.01646 (2023).

  • Lai, V. et al. Human-AI Collaboration via Conditional Delegation: A Case Study of Content Moderation. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 54, 1–18 (2022).

  • Tewari, A. mHealth Systems Need a Privacy-by-Design Approach: Commentary on “Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review”. J. Med. Internet Res. 25, e46700 (2023).

  • Arora, A. & Arora, A. Synthetic patient data in health care: a widening legal loophole. Lancet 399(Apr), 1601–1602 (2022).

  • Appenzeller, A., Leitner, M., Philipp, P., Krempel, E. & Beyerer, J. Privacy and Utility of Private Synthetic Data for Medical Data Analyses. Appl. Sci. 12, 12320 (2022).

  • Mendelevitch, O. & Lesh, M. D. Fidelity and privacy of synthetic medical data. arXiv preprint arXiv:2101.08658 (2021).

  • Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(Oct.), 557–570 (2002).

  • Henriksen-Bulmer, J. & Jeary, S. Re-Identification Attacks—A Systematic Literature Review. Int. J. Inf. Manag. 36(Dec.), 1184–1192 (2016).

  • Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. K. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5(Jun), 493–497 (2021).

  • US Food and Drug Administration. (n.d.). Artificial intelligence and machine learning in software as a medical device. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device.

  • Brauneck, A. et al. Federated machine learning, privacy-enhancing technologies, and data protection laws in medical research: scoping review. J. Med. Internet Res. 25, e41588 (2023).

  • Dwork, C. Differential Privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds) Automata, Languages and Programming. ICALP 2006. Lecture Notes in Computer Science, vol 4052. Springer, Berlin, Heidelberg. (2006).

  • Varma, G., Chauhan, R. & Singh, D. Sarve: synthetic data and local differential privacy for private frequency estimation. Cybersecurity 5, 26 (2022).

  • Bao, E., Xiao, X., Zhao, J., Zhang, D. & Ding, B. Synthetic data generation with differential privacy via Bayesian networks. J. Priv. Confid. 11 (2021).

  • Rosenblatt, L. et al. Differentially Private Synthetic Data: Applied Evaluations and Enhancements. arXiv preprint arXiv:2011.05537.

  • Dwork, C., Kohli, N. & Mulligan, D. Differential privacy in practice: expose your epsilons. J. Priv. Confid. 9 (2019).

  • Ficek, J., Wang, W., Chen, H., Dagne, G. & Daley, E. Differential privacy in health research: a scoping review. J. Am. Med. Inf. Assoc. 28, 2269–2276 (2021).

  • Jordon, J., Yoon, J. & van der Schaar, M. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations (2019).

  • Xie, L., Lin, K., Wang, S., Wang, F. & Zhou, J. Differentially Private Generative Adversarial Network. arXiv preprint arXiv:1802.06739.

  • Patel, J. & Bhatt, N. Review of digital image forgery detection. Int. J. Recent Innov. Trends Comput. Commun. 5, 152–155 (2017).

  • Sadiku, M., Shadare, A. & Musa, S. Digital chain of custody. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 7, 117–118 (2017).

  • Hamid, A. & Naaz, R. Forensic-chain: Blockchain based digital forensics chain of custody with POC in hyperledger composer. Int. J. Digit. Investig. 28, 44–55 (2019).

  • Wang, S., Yang, M., Ge, T., Luo, Y. & Fu, X. BBS: A Blockchain Big-Data Sharing System. ICC 2022 – IEEE International Conference on Communications, Seoul, Republic of Korea, pp. 4205–4210 (2022).
