Unlocking the Power of EMR Data for Earlier Gastric Cancer Diagnosis and Risk Prediction

By Léon Van Wouwe, Clinical Innovation Director, Volv Global

Gastric (stomach) cancer remains a formidable global health challenge, ranking as the fifth most common cancer and a leading cause of cancer-related mortality. [1] Alarmingly, its incidence is rising in patients under 50 years old, alongside other gastrointestinal malignancies. In 2022 alone, nearly one million new cases were diagnosed and approximately 660,000 people died of the disease worldwide. One particularly aggressive form, gastroesophageal junction (GEJ) cancer, arises at the critical connection between the esophagus and the stomach, further complicating detection and treatment.

As with all cancers, early diagnosis is critical for optimizing treatment outcomes. Yet gastric cancer often remains undetected until later stages, limiting therapeutic options and survival rates. Could primary care electronic medical records (EMR) data hold the key to shifting this paradigm?

The Power of EMR Data in Earlier Cancer Detection

Our recent research on gastroenteropancreatic neuroendocrine tumours (GEP-NETs), a rare type of gut cancer, demonstrates how EMR-driven analytics can uncover hidden diagnostic delays. Leveraging UK primary care data, we identified that undiagnosed patients are, on average, 5 to 7 years younger than those already diagnosed, highlighting a significant opportunity to intervene earlier. [2]

Addressing Gastric Cancer Recurrence and Outcome Prediction

For gastric cancer, the stakes are even higher. Despite curative surgery and neoadjuvant/adjuvant chemotherapy, recurrence is common. One in four patients experiences disease recurrence within a year post-surgery, and survival beyond two years remains a challenge. The five-year survival rate remains dismally low, with fewer than half of patients alive at this milestone. [3, 4, 5]

Beyond early detection, advanced risk prediction models leveraging EMR data could refine patient stratification and enhance personalized treatment decisions.
By integrating EMR-driven insights, we can better predict recurrence risk and tailor therapeutic regimens accordingly. Notably, Imfinzi-based regimens have already demonstrated statistically significant and clinically meaningful improvements in event-free survival for resectable early-stage gastric and GEJ cancers, underscoring the potential of precision medicine approaches. [6]

A Call to Action: Innovating in Gastric Cancer Drug Development

The integration of EMR data into drug development and commercialization strategies presents an immense opportunity to revolutionize gastric cancer management. Pharmaceutical innovators and executives, are you ready to explore how real-world data can drive earlier diagnosis, improve risk modelling, and ultimately enhance patient outcomes?

Let’s connect to discuss how advanced analytics and AI-driven EMR insights can shape the future of gastric cancer therapeutics. Looking forward to your thoughts in the comments or via direct conversation.

This article was originally published on LinkedIn: Unlocking the Power of EMR Data for Earlier Gastric Cancer Diagnosis and Risk Prediction

About the author

Léon van Wouwe has 20+ years’ global experience in clinical development and operations, uniting data science with pharma and research. He drives cross-functional collaboration to advance innovative treatments.

References

[1] World Health Organization. International Agency for Research on Cancer. Stomach Fact Sheet. Available at: https://gco.iarc.who.int/media/globocan/factsheets/cancers/7-stomach-fact-sheet.pdf. Accessed March 2025.
[2] Volv Global SA, project results. For information contact www.volv.global
[3] Li Y, et al. Postoperative recurrence of gastric cancer depends on whether the chemotherapy cycle was more than 9 cycles. Medicine. 2022;101(5):e28620.
[4] Ilic M, Ilic I. Epidemiology of stomach cancer. World J Gastroenterol. 2022;28(12):1187-1203.
[5] Al-Batran SE, et al. Perioperative chemotherapy with fluorouracil plus leucovorin, oxaliplatin, and docetaxel versus fluorouracil or capecitabine plus cisplatin and epirubicin for locally advanced, resectable gastric or gastro-oesophageal junction adenocarcinoma (FLOT4): a randomised, phase 2/3 trial. Lancet. 2019;393(10184):1948-1957.
[6] AstraZeneca: Imfinzi-based regimen demonstrated statistically significant and clinically meaningful improvement in event-free survival in resectable early-stage gastric and gastroesophageal junction cancers.
Poster: Alpha-1 Antitrypsin Deficiency: Why millions remain undiagnosed

Alpha-1 antitrypsin deficiency (AATD), a rare genetic condition, can cause lung disease in adults, with symptoms similar to those of chronic obstructive pulmonary disease (COPD). AATD is largely underdiagnosed: an estimated 100,000 individuals in the United States (US) have the condition, yet fewer than 10,000 have been diagnosed. Previously, AATD was thought to affect only White individuals of European descent. Recent studies have shown that people of other races and ethnicities also carry genotypes consistent with moderate-to-severe AATD-related lung disease. We developed a prediction model that uses claims data from a large US database to identify symptomatic patients of different races and ethnicities who are likely to be at risk of AATD. This poster was developed together with Takeda and presented at the American Thoracic Society International Conference 2024.
Case Study: Detecting signs of Fabry and Pompe disease in UK clinical data

Volv, supported by Sanofi, working with Optimum Patient Care and collaborating with a specialist consultant clinician, is performing research in the UK to build algorithms that better identify people living with Fabry or Pompe disease. This novel methodology, inTrigue, is highlighting ways in which we can detect people living with either disease more precisely and much earlier.

Are you a Fabry or Pompe specialist in the UK and want to know more, or collaborate? Please contact us.

inTrigue: helping people living with disease get better outcomes

In the sections below you will find an overview of how we create models to help predict which people might be at risk of disease, some of the current performance metrics, and some background information on both Fabry and Pompe disease.

By using the inTrigue methodology in collaboration with Optimum Patient Care (OPC) in the UK and the OPC Research Database, and supported by Sanofi, we are learning novel patterns of disease. We do this because published medical criteria do not help find the patients who remain undiagnosed and, in fact, flag many more patients who do not have the disease (false positives). The inTrigue approach looks for people who cannot be found using those methods. It is designed to help clinicians detect people who are living with a rare or difficult-to-diagnose disease and who are otherwise unlikely to get a diagnosis.
Importantly, this is a research project that:

- focusses on a limited population at first
- works with a population of clinicians that have signed up for the OPC quality improvement (QI) programme to improve the quality of care for patients in general practice
- aims to use the feedback from clinicians to improve the approach

This is a completely different level of performance that promises to reduce the time to a diagnosis and, importantly, to uncover the undiagnosed patients.

OPC quality improvement (QI) programme: https://www.primescholars.com/articles/strategies-that-promote-sustainability-in-quality-improvement-activities-for-chronic-disease-management-in-healthcare-se-100520.html

Volv, Sanofi and OPC: collaborating for people living with disease

Volv, supported by Sanofi, and leveraging the data from OPC in the UK, is creating a unique collaboration that does not stop here.

Introduction

The first phase of this project was to collaborate to build new types of models for two rare diseases: Fabry and Pompe. To do this, we focussed on primary health care records, i.e. the records that general practitioners use. Both diseases are difficult for primary care clinicians to diagnose and, as a result, remain underdiagnosed. For Pompe disease in the UK, it is estimated that 50% of people with the disease are currently undiagnosed, which lengthens the delay before they eventually receive a diagnosis.

The records are managed by Optimum Patient Care, which provides de-identified data from around 8.5 million patient records for research purposes. Data security and protection are paramount. This means that the data remains anonymous and secure during the disease model development process.
The following data protection measures apply:

- GDPR/DPA 2018 compliant
- Secured EHR data extraction
- Data is de-identified (no PID)
- Data is pseudonymised (SHA256)
- Secure data encryption (AES256)
- Secure data transfer via HSCN
- NHS DSP Toolkit (ref: 8HR5)
- Non-identifiable data is contributed to OPCRD for ethically approved research
- NHS IHRA REC (ref: 20/EM/0148)

Phase 1: Learn an algorithm/model for the diseases and validate with expert clinicians

The first phase of the inTrigue methodology involved an iterative process of determining what makes patients with Fabry and Pompe disease stand out from all other patients. We used a combination of data science (or AI) approaches to arrive at a list of patients who plausibly have one of the diseases. Crucially, and as a differentiator, this phase also required us to validate the approach by checking the inTrigue results with an expert clinician. We did this with a consultant in a specialist Fabry and Pompe department in a UK teaching hospital. The results of this evaluation can be seen in the results section. Once the clinician’s validation was complete, we took those inputs and optimised the algorithm, which again boosts performance. Once this is done, we are ready to move to Phase 2.

Phase 2: Clinical follow-up on plausible patients, more accurately and earlier

In this second phase, the algorithm is applied to the data, and clinicians are asked if they want to participate in the model deployment programme. The clinicians need to give their consent to be part of this quality improvement programme. Several QI programmes are already in place. Clinicians who agree can then check whether any of the patients in their practice are at risk of these diseases. This is done through the remote installation of reports in the GP system. We can then monitor whether the quality of clinical care improves.
More results on this aspect of the deployment of the models will be published at a later stage, but the optimisation steps carried out after clinician validation show a significant improvement on the results presented here.

Later phases

After this programme, consideration is being given to deploying the models more widely by embedding them into GP systems nationwide.

Initial metrics on model performance

Model performance: Fabry disease (FD) in UK

Task: Use model learned via Algorithm SLSL to find undiagnosed FD patients in OPCRD EHR database GP-EHR-DB-UK (18M patients).
Evaluation procedure: Request that FD specialist practicing in UK review EHRs of top 50 candidate patients (candidates have predicted probabilities exceeding the FD threshold).
Evaluation outcome: Results are very promising. Using the precision@k metric, precision is 88% for the top 25 of the 50 patients and remains high at 76% when all 50 patients are considered.

Model performance: Pompe disease (PD) in UK

Task: Use model learned via Algorithm SLSL to find undiagnosed PD patients in OPCRD EHR database GP-EHR-DB-UK (18M patients).
Evaluation procedure: Request that PD specialist practicing in UK review
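
For readers unfamiliar with precision@k, the Fabry disease figures quoted above can be illustrated with a minimal sketch. The per-patient review outcomes are not published in this case study, so the list below is hypothetical: it is constructed only so that 22 of the top 25 and 38 of the top 50 ranked candidates are judged plausible by the reviewer, which is what 88% and 76% imply. The function name `precision_at_k` is likewise an illustrative choice, not part of inTrigue.

```python
from typing import Sequence


def precision_at_k(reviewed: Sequence[bool], k: int) -> float:
    """Fraction of the k highest-ranked candidates judged plausible by the reviewer."""
    if not 0 < k <= len(reviewed):
        raise ValueError("k must be between 1 and the number of reviewed candidates")
    return sum(reviewed[:k]) / k


# Hypothetical review outcomes, ordered by descending predicted probability.
# Only the totals (22 of the top 25, 38 of the top 50) follow from the reported
# percentages; the position of each individual outcome is an assumption.
top_25 = [True] * 22 + [False] * 3
next_25 = [True] * 16 + [False] * 9
reviewed = top_25 + next_25

print(f"precision@25 = {precision_at_k(reviewed, 25):.0%}")
print(f"precision@50 = {precision_at_k(reviewed, 50):.0%}")
```

Because precision@k only looks at the top of the ranked list, it suits a setting like this one, where a specialist can realistically review only a few dozen records and the undiagnosed cases elsewhere in the database are unknown.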
White Paper: The Path to Rare Disease Clinical Trial Innovation

By Volv Global SA and WODC EU contributors

Executive Summary

For decades, the pharmaceutical industry has faced the same recurring problems with clinical development: the struggle to fully recruit and retain enough patients, meet target timelines, and have trials conclude on time. Certainly, the industry does overestimate its ability to recruit, but a bigger issue is that study designs and protocol development seemingly fail to truly reflect patients’ lives, or account for the reality in the clinic. In fact, data shows the probability of success for any clinical development effort is 6.2% for orphan drug trials, compared with 13.8% overall, which translates to a 93.8% failure rate for orphan drug development efforts.

Given the often progressive and irreversible nature of rare diseases, there is a need to increase efforts to find those undiagnosed patients, diagnose them earlier, and bring them into the frame when developing new treatment options. To achieve this, collectively as an industry, we must do more research into the rare disease patient population to characterise and better understand both the already diagnosed and the undiagnosed. We need this deeper understanding before deciding on the best clinical development strategy, finalising clinical trial design, and starting the enrolment of the patient population in a clinical study.

To do that, clinical researchers and drug developers need to include much more knowledge and understanding of those people who are unknowingly living with the disease in the design of clinical development plans and study protocols. To find those people, there is a need to consult more extensively on the design of protocols, not just with the key opinion leaders, but also with physicians that are typically seeing and treating larger numbers of patients.

One crucial factor with rare diseases is that the diagnostic journey is arduous and lengthy, often with many patients not being correctly diagnosed.
As an example, a study found that 58% of Ehlers-Danlos syndrome (EDS) patients consulted more than five doctors, and 20% consulted more than 20[i]. So, when designing and recruiting for clinical trials, drug developers must first learn where the “as yet undiagnosed patients” are “hidden” – in other words, where they may be in the healthcare system, and which specialists they are seeing. It is those specialisms that need to be brought along in the diagnostic journey, so they can learn to identify rare disease patients within their practice. This is very well illustrated in the case of acute hepatic porphyria (AHP), where the view is that patients reside in the gastroenterology world, but, in fact, an even larger group is residing in other specialties. Another example is cited in Chapter 2. With novel approaches, such as the use of Machine Learning (ML), we can now highlight people who are not yet diagnosed as patients but are likely to be living with a disease, for their clinicians’ attention. Subtle indicators are derived from health care records by using ML, which would be difficult or nigh impossible for a doctor to recognise amidst the wealth of data already in front of them. Conducting thorough natural history studies of patients living with disease, but also including those wider populations of people suspected of living with disease but currently undiagnosed, can help to uncover sentinel events or detectable physiologic changes that are key predictors of disease progression or that are clinically important. These can provide an understanding of which subgroups of people living with the disease might benefit from a drug in development and should therefore be targeted for inclusion in the clinical trial. And, importantly, clinical researchers need to scrutinise the data and adopt insights gained by using ML models which will enable better clinical development strategy, design, and patient stratification. 
First, though, we need to understand the barriers and misconceptions about the art of the possible and address those directly. This paper explores the changing expectations of the regulators, the challenges the health industry continues to face, and the ways in which we can rethink the entire clinical development process (from development strategy, to protocol design, to patient identification and recruitment) to achieve real breakthroughs in rare disease research and development.

Chapter 2: Misconceptions and industry challenges

The path to rare disease innovation begins with a better understanding of the complexity of each disease, a point well understood by the health authorities. As the US Food and Drug Administration (FDA) has identified in its guidance on natural history studies, rare diseases can have substantial genotypic and/or phenotypic heterogeneity. As such, the natural history of each subtype, if it exists at all, may be poorly understood or inadequately characterised. Above all, a typical natural history study certainly does not include those people living with the disease who, as is common in rare diseases, remain undiagnosed.

There are two levels of undiagnosed patients: those who have had no diagnosis at all and have therefore not been matched with a disease, and those who have had a partial diagnosis but whose symptoms are not well characterised and who therefore do not belong in a defined subgroup.

As researchers learn more about rare diseases, they are starting to understand that different phenotypes may present with the involvement of different organ systems, with varying degrees of severity or rate of deterioration. As noted earlier, ML can help to elicit subtle indicators from electronic health records or claims data.
However, during panel debates at recent orphan drug conferences, there seemed to be a strong bias towards the use of registries for research and patient characterisation, and there were clear misconceptions from both industry and regulators about the usability of primary care electronic medical records (or electronic claims data) for the purpose of early disease detection, whether in a traditional manner or ML-assisted.

The limitations of registries

While disease registries have a clear purpose, they are constrained by the fact that they tend only to contain data on patients that are known to have a given disease. By focusing only on rare disease data that already exists in patient registries, research