Cofounder and CTO of Integral.
The healthcare industry is fraught with inefficiencies, especially when it comes to interoperability between systems. To make a confident diagnosis, providers need access to patient data across health systems. While regulations have encouraged standard formats to make data sharing easier, a significant gap remains between data at rest and data that providers can actually access and understand.
Interoperability is not a new problem, and Obama-era legislation, the 21st Century Cures Act of 2016, has attempted to encourage easier data sharing. When data flows freely, things become much simpler. When clinics and providers can source data spanning a patient’s entire history, they can make more confident diagnoses and treatment decisions. Insurers can develop more accurate risk models, which have the potential to lower premiums. Clinical research becomes better informed, and access to more data can lead to more successful drug development.
These use cases alone have been enough to push our government to incentivize standard data formats through legislation and even propose fines of up to $1 million for information blocking. However, challenges remain.
Legacy systems that still file insurance claims by fax face a large financial burden to modernize. Legacy software with outdated coding procedures will require upgrades, as well as organizational changes to fit new formats. Further, we can expect standards to keep being refined as we learn more. For example, diagnostic codes may change to group new diseases or conditions together based on new research, and that learning will not stop anytime soon. This implies that converting to an agreed-upon format solves the problem only for today.
Artificial intelligence (AI), and more specifically large language models (LLMs), has a unique opportunity to smooth the rough edges of converting legacy systems and adopting a standard format.
LLMs have demonstrated enormous prowess in interpreting and contextualizing human language. ChatGPT reached 100 million users within two months of its release, making massive waves. Before that, the gold standard for product adoption was TikTok, which took nine months to reach 100 million users. The technology appears to be a breakthrough, and it and similar models have the potential to influence many industries, including healthcare. AI has already been introduced into diagnostics, and image models have begun to identify sarcomas in patients.
Language models interpret human language by placing words in a semantic space, where a word’s position represents its meaning and words with similar meanings sit close together. Unlike existing rule-based logic, which might map the word “doctor” to “provider” through a fixed lookup in a simplified format conversion, language models can parse information semantically.
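A toy sketch can make the idea of semantic space concrete. Below, each term is a vector, and closeness between terms is measured with cosine similarity. The three-dimensional vectors are invented for illustration; real models learn vectors with hundreds or thousands of dimensions, and nothing here calls an actual model.

```python
import math

# Hypothetical, hand-picked vectors for illustration only; a real language
# model learns much higher-dimensional embeddings from data.
EMBEDDINGS = {
    "doctor":   [0.90, 0.80, 0.10],
    "dr.":      [0.88, 0.79, 0.12],
    "provider": [0.70, 0.90, 0.20],
    "fax":      [0.05, 0.10, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(term, vocabulary=EMBEDDINGS):
    """Return the other vocabulary term whose vector is closest to `term`'s."""
    query = vocabulary[term]
    candidates = (t for t in vocabulary if t != term)
    return max(candidates, key=lambda t: cosine_similarity(query, vocabulary[t]))

print(nearest("dr."))  # → doctor
```

With these made-up vectors, “dr.” lands next to “doctor” rather than “fax”, which is the property a lookup table cannot provide for inputs it has never seen.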
To continue the example: if a person enters “Dr.” instead of “Doctor” on an insurance claim form that software converts to a standard format, existing logic breaks. Today’s language models can infer that “Dr.” and “Doctor” represent the same concept, fundamentally widening the margin for error in format conversion and, in turn, in interoperability. Converting legacy data systems carries a non-trivial cost, and LLMs are well suited to the task.
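One way to picture where an LLM fits is a hybrid converter: deterministic rules handle the inputs they recognize, and only unmatched values are sent to a model. In this sketch, `ROLE_MAP`, its target values and `toy_fallback` are all hypothetical; `toy_fallback` is a placeholder where a real system would call a language model.

```python
from typing import Callable, Optional

# Hypothetical mapping from free-text role labels to standardized values.
# The target values are illustrative, not taken from any real standard.
ROLE_MAP = {
    "Doctor": "physician",
    "Nurse": "nurse",
}

def rule_based_role(raw: str) -> Optional[str]:
    """Exact-match lookup: the kind of brittle logic that breaks on 'Dr.'."""
    return ROLE_MAP.get(raw)

def convert_role(raw: str, semantic_fallback: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try deterministic rules first; send only unmatched inputs to a model.

    `semantic_fallback` stands in for a language model call; it is an
    assumption, not any specific product's API.
    """
    result = rule_based_role(raw)
    if result is not None:
        return result
    return semantic_fallback(raw)

def toy_fallback(raw: str) -> Optional[str]:
    # Placeholder for an LLM call; this stub only knows a few abbreviations.
    abbreviations = {"dr.": "physician", "dr": "physician", "rn": "nurse"}
    return abbreviations.get(raw.strip().lower())

print(rule_based_role("Dr."))             # None: the exact match fails
print(convert_role("Dr.", toy_fallback))  # physician
```

Keeping the rules in front means known inputs behave exactly as before, while the model only sees the long tail of variations such as “Dr.”.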
In addition to efficiently converting data into a standard format, semantic inference has great potential to organize and contextualize raw data. In healthcare, raw data is everywhere, but it is commonly found in doctors’ notes in electronic health records. In my compliance experience, all raw data fields are dropped because the cost of reading and contextualizing every doctor’s note to ensure it contains no HIPAA-violating information is too great.
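To see why screening notes is so costly, consider a naive pattern-based pre-screen. The patterns below are illustrative assumptions that cover only a few identifier formats; HIPAA’s de-identification rules list many more kinds of identifiers, and this sketch is not a compliance tool.

```python
import re

# A deliberately naive pre-screen for a few identifier formats. These
# patterns are illustrative assumptions and miss most real-world identifiers.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def flag_identifiers(note: str) -> dict:
    """Return every pattern match found in a free-text note, keyed by type."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(note)
        if matches:
            hits[label] = matches
    return hits

note = "Pt John Smith called from 555-867-5309 re: knee pain. SSN 123-45-6789 on file."
print(flag_identifiers(note))
# → {'ssn': ['123-45-6789'], 'phone': ['555-867-5309']}
```

The patient’s name slips through because it has no fixed format. Cases like that force manual review, and they are where a model that reads for meaning could potentially help flag sensitive content.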
LLMs can read doctors’ notes in moments and organize the data for easier analysis. This applies not only to doctors’ notes but to any raw data, such as lab results, clinical notes or other free-text fields, which represents huge untapped potential. This data has remained dormant because of compliance concerns as well as the financial and time burden of sorting through it.
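A minimal sketch of the organizing step: prompt a model for a fixed set of fields as JSON, then check the reply before trusting it. The schema, the prompt wording and `fake_llm` are assumptions for illustration; the `llm` parameter stands in for whichever model API is actually used.

```python
import json
from typing import Callable

# Hypothetical target schema: field name -> allowed Python type(s).
REQUIRED_FIELDS = {
    "symptoms": list,
    "medications": list,
    "follow_up_days": (int, type(None)),
}

PROMPT_TEMPLATE = (
    "Extract the following fields from the clinical note as JSON with keys "
    "symptoms (list of strings), medications (list of strings), and "
    "follow_up_days (integer or null). Note:\n{note}"
)

def structure_note(note: str, llm: Callable[[str], str]) -> dict:
    """Ask a model for JSON, then check the reply against the expected schema."""
    raw = llm(PROMPT_TEMPLATE.format(note=note))
    record = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return record

def fake_llm(prompt: str) -> str:
    # Canned reply used in place of a real model call.
    return '{"symptoms": ["knee pain"], "medications": ["ibuprofen"], "follow_up_days": 14}'

print(structure_note("Knee pain x2 wks. Start ibuprofen. RTC 2 wks.", fake_llm))
```

Validating the shape of the reply does not prove its contents are right, but it turns malformed model output into an explicit error instead of silently bad data.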
LLMs are already being used in the wild, with tools like Curai and Decoded Health aiming to streamline patient triage and index clinical documentation, respectively. However, AI in healthcare isn’t without risks. LLMs can hallucinate facts seemingly out of thin air, and they lack the certainty of successful execution that traditional software programs have inherently.
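Some hallucinated details can be caught with a cheap grounding check that flags any extracted value never appearing in the source text. This is a heuristic sketch with a hypothetical function name, not an established safeguard; a legitimate paraphrase gets flagged too, so flagged items are best routed to human review.

```python
def ungrounded_values(note: str, extracted: dict) -> list:
    """List (field, value) pairs whose string value never appears in the note.

    A missing value is not proof of a hallucination (the model may have
    paraphrased), but it is a cheap signal for human review.
    """
    text = note.lower()
    suspicious = []
    for field, value in extracted.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, str) and item.lower() not in text:
                suspicious.append((field, item))
    return suspicious

note = "Knee pain for two weeks. Start ibuprofen."
extracted = {"symptoms": ["knee pain"], "medications": ["ibuprofen", "metformin"]}
print(ungrounded_values(note, extracted))  # → [('medications', 'metformin')]
```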
A traditional snippet of code is deterministic, can be proven correct for a use case, and can be verified with tests. For LLMs, however, testing methodologies are in their infancy, and it may be very difficult to guarantee with 100% certainty that a model has completed the job it was assigned. It’s unclear whether this testing limitation is a fundamental property of LLMs or simply a reflection of how briefly we have worked with the technology so far.
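The contrast can be sketched in code. A deterministic function can be pinned down with an exact assertion, while a non-deterministic model can only be measured by its pass rate over labeled examples and repeated runs. `flaky_model` is a hypothetical stand-in that returns the right answer about 90% of the time; it is not an actual LLM.

```python
import random
from typing import Callable, List, Tuple

def normalize_title(raw: str) -> str:
    """Deterministic: the same input always yields the same output."""
    return "Doctor" if raw.strip().lower() in {"dr", "dr.", "doctor"} else raw.strip()

# A conventional unit test can pin this behavior down exactly.
assert normalize_title("Dr.") == "Doctor"

def evaluate(model: Callable[[str], str], labeled: List[Tuple[str, str]], trials: int = 5) -> float:
    """Estimate how often a model matches the expected output.

    The result is a pass rate over the sampled runs, not a proof of correctness.
    """
    passes = total = 0
    for raw, expected in labeled:
        for _ in range(trials):
            total += 1
            passes += model(raw) == expected
    return passes / total

_rng = random.Random(0)  # seeded so the sketch is reproducible

def flaky_model(raw: str) -> str:
    # Hypothetical stand-in for an LLM: correct most of the time, not always.
    return normalize_title(raw) if _rng.random() < 0.9 else raw

labeled = [("Dr.", "Doctor"), ("dr", "Doctor"), ("DOCTOR", "Doctor")]
print(f"pass rate: {evaluate(flaky_model, labeled, trials=20):.2f}")
```

A high pass rate raises confidence but never reaches the guarantee the assertion above provides, which is the gap the paragraph describes.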
While kinks are still being worked out, it’s clear that the superpower of LLMs is their ability to contextualize human language. Unleashing LLMs to contextualize our raw data and help us convert it to standard formats for easier interoperability points to a very promising future for healthcare. With richer data come more successful clinical trials, more accurate diagnostics, more equitable insurance pricing and improved patient visibility.
Embracing the semantic inference power of LLMs represents a step-function improvement over existing interoperability strategies. Interoperable data systems represent a large improvement for patients, providers, insurers and researchers.