By Léon Van Wouwe, Clinical Innovation Director, Volv Global
From Signal to Stratification – Reimagining Early CLL Care through Algorithmic Insight
Introduction: the silent burden of early CLL
Chronic lymphocytic leukaemia (CLL) is heterogeneous and often clinically silent until late. Roughly 70–80% of patients are asymptomatic at diagnosis; one-third may never require therapy. [1, 2] Yet in the U.S., over 200,000 people live with CLL, and the disease causes ~4,410 deaths annually. [1] The median age at diagnosis is ~70 years. Because of its indolent nature and low prevalence, U.S. guidelines do not endorse population screening, leaving diagnosis largely “accidental” or delayed. [3]
Delayed or missed detection means that many patients present with an established disease burden, by which point subtle early signals have already gone unnoticed. For pharmaceutical developers, this latency is both a challenge and an opportunity: how might we shift detection earlier, stratify risk more precisely, and find most patients while they still have early-stage disease, so that first-line therapeutic options can be matched to them in time?
The status quo: diagnostic inertia and inefficiencies
Today, most CLL diagnoses arise from incidental lymphocytosis on a complete blood count (CBC) or differential, followed by haematology referral and confirmatory flow cytometry (≥5 × 10⁹/L clonal B cells, sustained) with immunophenotyping. [1] Patients who present with nonspecific symptoms (fatigue, night sweats, low-grade fevers, recurring infections, lymphadenopathy) may pass through multiple outpatient encounters before evaluation. Because lymphocytosis has many benign causes, clinicians may not act until a trend is evident.
This reactive workflow causes delays (weeks to months) and uneven referral patterns, especially when absolute lymphocyte count (ALC) is only modestly elevated or fluctuating. In non-academic settings, molecular and cytogenetic testing may not be readily available or may have long turnaround times, adding friction to early decision-making.
Algorithmic triage: making screening viable in a low-prevalence disease setting
The classic objection to screening in CLL is the low base rate: even a small false-positive rate can overwhelm downstream resources needed to confirm suspicion. But what if we reframed screening as smart triage using an AI-assisted flagging mechanism?
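The base-rate objection can be made concrete with Bayes' rule. The sketch below is illustrative only: the prevalence, sensitivity, and specificity values are assumptions chosen for the arithmetic, not figures from any cited study.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Bayes' rule: probability that a flagged patient truly has the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, chosen only to show the effect of a low base rate.
ASSUMED_PREVALENCE = 0.001   # 1 in 1,000 of the screened population (assumption)
ASSUMED_SENSITIVITY = 0.90   # assumption

for specificity in (0.95, 0.99, 0.999):
    ppv = positive_predictive_value(ASSUMED_PREVALENCE, ASSUMED_SENSITIVITY, specificity)
    false_per_true = (1.0 - ppv) / ppv
    print(f"specificity={specificity:.3f}  PPV={ppv:.3f}  "
          f"false positives per true positive={false_per_true:.1f}")
```

Under these assumed inputs, fewer than 2 in 100 flags are true cases at 95% specificity, while at 99.9% specificity nearly half are. This is why a flagging mechanism has to be highly specific before triage becomes practical in a low-prevalence setting.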
Recent work demonstrates promise. A 12-variable random forest model, built from routine demographic and lab data (age, sex, ALC, WBC, platelet metrics), predicted development of abnormal lymphocytosis associated with CLL/MBL up to five years ahead, achieving AUC ≈ 0.92 (cross-validated AUC ≈ 0.935) and good sensitivity and specificity. [4] While not diagnostic, the model illustrates that latent signals may exist in standard laboratory test series.
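For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen patient who later develops abnormal lymphocytosis receives a higher model score than a randomly chosen patient who does not. A minimal sketch of that definition follows; the scores are invented for illustration and are unrelated to the cited model.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up risk scores for two small hypothetical groups.
later_lymphocytosis = [0.91, 0.78, 0.66, 0.40]
no_lymphocytosis = [0.55, 0.35, 0.30, 0.12, 0.08]

print(auc(later_lymphocytosis, no_lymphocytosis))
```

Note that AUC describes ranking quality only. It says nothing about how many flagged patients are true cases at a given threshold, which depends on prevalence.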
Multiple reviews of machine learning (ML) in CLL (20 studies between 2014 and 2023) show applications in diagnosis, classification, and treatment guidance, with reported accuracies from ~83% to near 100%. Still, most remain proof-of-concept, centre-specific, and not broadly integrated. [5, 6]
If deployed at scale (e.g., embedded in laboratory pipelines or electronic health record (EHR) decision support), such models could flag a limited subset of patients for confirmatory flow cytometry, keeping the number needed to test manageable. For pharma, this unlocks earlier patient capture, better natural history studies, and enriched trial recruitment.
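One way to keep the number needed to test manageable is to cap flags at the confirmatory capacity a laboratory can absorb, then track how many flow cytometry tests are run per confirmed case. The sketch below assumes a hypothetical capacity-based rule and a toy cohort; the function names, scores, and outcomes are illustrative, not drawn from any deployed system.

```python
import math


def flag_top_k(scores, capacity):
    """Return indices of the `capacity` highest-scoring patients."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:capacity]


def number_needed_to_test(flagged, confirmed):
    """Confirmatory tests performed per confirmed case among flagged patients."""
    hits = sum(1 for i in flagged if confirmed[i])
    return math.inf if hits == 0 else len(flagged) / hits


# Toy cohort of ten patients with invented scores and outcomes.
scores = [0.05, 0.92, 0.10, 0.33, 0.81, 0.02, 0.47, 0.15, 0.08, 0.60]
confirmed = [False, True, False, False, False, False, False, False, False, True]

flagged = flag_top_k(scores, capacity=3)
print(flagged, number_needed_to_test(flagged, confirmed))
```

In this toy cohort, three confirmatory tests identify two cases, giving a number needed to test of 1.5. Real cohorts would be far larger and the ratio far less favourable, but the same bookkeeping applies.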
Risk stratification at diagnosis: the current burden
Once a patient is confirmed to have CLL, clinicians order a battery of molecular, cytogenetic, and immunogenetic assays: FISH (del13q, del11q, del17p, trisomy 12), TP53 sequencing, IGHV mutation status, β2-microglobulin, possibly broader NGS panels. These guide prognosis, time-to-first-treatment (TTFT), therapy selection, and in some cases trial eligibility. [1]
But this testing is expensive, logistically burdensome, and not uniformly available across U.S. practice settings or in many other major healthcare systems internationally. In community centres, access to high-quality molecular diagnostics or fast turnaround may be limited, delaying therapeutic decisions or forcing empirical choices. The redundancy and cost add nontrivial friction to real-world precision care.
How algorithmic risk scoring can lighten the load
Algorithmic risk models (built on routinely collected data) can help in two complementary ways:
- Selective escalation – models can triage which patients merit full molecular workup, sparing low-risk individuals from expensive blanket testing.
- Augmented prognostic scoring – models can provide a probabilistic estimate of TTFT or of the need for therapy (e.g., within two years) even before full biomarker data are available. For example, one explainable ML model used only demographics and standard laboratory test results to predict treatment requirement within two years. [7]
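The two uses above can be combined in a simple rule: a predicted probability of needing treatment within two years determines whether the full molecular workup is ordered now or deferred. The sketch below is a minimal illustration under stated assumptions. The thresholds, tier names, and actions are hypothetical placeholders, not clinically validated values and not the method of any cited study.

```python
from dataclasses import dataclass


@dataclass
class TriageDecision:
    tier: str
    order_full_molecular_workup: bool


def triage(p_treatment_2y: float, low: float = 0.10, high: float = 0.40) -> TriageDecision:
    """Map a predicted 2-year treatment probability to a testing decision.

    `low` and `high` are hypothetical cut-offs used purely for illustration.
    """
    if not 0.0 <= p_treatment_2y <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_treatment_2y >= high:
        return TriageDecision("high", True)
    if p_treatment_2y >= low:
        return TriageDecision("intermediate", True)
    # Low predicted risk: defer expensive testing and re-score at follow-up.
    return TriageDecision("low", False)


for p in (0.05, 0.25, 0.55):
    print(p, triage(p))
```

In practice any such cut-offs would need prospective validation, and the decision would stay with the treating clinician; the point here is only that the escalation logic itself can be small and transparent.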
In CLL, unsupervised ML clustering of immunophenotype/genetic profiles has also been used to refine risk groups beyond classical staging. One study clustering 2,243 Rai 0–II patients generated novel continuous prognostic relationships missed by standard hierarchical models. [8]
Because new first-line targeted therapies differ in efficacy and toxicity, better upfront stratification becomes increasingly valuable. For example, BTK- and BCL2-based regimens now supplant chemoimmunotherapy as first-line therapy, and the presence of unmutated IGHV or TP53 aberrations influences the choice of regimen. [9]
Vision: a prediction-first paradigm in early CLL
Imagine a diagnostic continuum:
- Routine lab data + patient metadata → AI flagging
- Confirmatory phenotyping only for flagged patients
- Algorithmic risk score (even pre-biomarkers) to guide further testing
- Integrated decision paths: which biomarkers to order, intensity of clinical surveillance, first-line therapy recommendation
Such a paradigm could reduce diagnostic delay, rationalise molecular testing, and enrich pharma pipelines with earlier-stage, better-stratified patients. Volv Global (or similar methodology platforms) could host modular risk and triage engines, embed explainability and uncertainty quantification, and integrate into EHR and laboratory systems as decision support.
In concluding this first part, the imperative is clear: closing the diagnostic gap and intelligently stratifying risk at the outset is not only clinically logical – it is foundational to next-generation precision haematology.
About the author
Léon van Wouwe has 20+ years’ global experience in clinical development and operations, uniting data science with pharma and research. He drives cross-functional collaboration to advance innovative treatments.
References
- Shadman, M., 2023. Diagnosis and treatment of chronic lymphocytic leukemia: a review. JAMA, 329(11), pp.918-932. doi:10.1001/jama.2023.1946.
- Yang, X., Zanardo, E., Lejeune, D., De Nigris, E., Sarpong, E., Farooqui, M. & Laliberté, F., 2024. Treatment patterns, healthcare resource utilization, and costs of patients with chronic lymphocytic leukemia or small lymphocytic lymphoma in the US. The Oncologist, 29(3), pp.e360-e371. doi:10.1093/oncolo/oyad324.
- Eichhorst, B., Robak, T., Montserrat, E., Ghia, P., Niemann, C.U., Kater, A.P., Gregor, M., Cymbalista, F., Buske, C. & Hallek, M., on behalf of the ESMO Guidelines Committee, 2021. Chronic lymphocytic leukaemia: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Annals of Oncology, 32(1), pp.23-33. doi:10.1016/j.annonc.2020.09.019.
- Aoki, J. et al., 2025. Machine learning model predicts abnormal lymphocytosis associated with chronic lymphocytic leukemia. JCO Clinical Cancer Informatics, 9, e2400197. doi:10.1200/CCI-24-00197.
- Al-Agil, M., Patten, P.E.M. & Alhaq, A., 2025. Systematic review of machine learning applications in the early prediction and management of chronic lymphocytic leukaemia. Health Informatics Journal, 31(3), pp.1-25. doi:10.1177/14604582251342178.
- Elhadary, M., Mattar, M., Al Farsi, K., Alshemmari, S., ElSayed, B., Metwalli, O., Elshoeibi, A., Abdelrehim Badr, A., Alshurafa, A. & Yassin, M.A., 2023. Machine learning in CLL. Blood, 142(Suppl 1), p.7185. doi:10.1182/blood-2023-179388.
- Meiseles, A., Paley, D., Ziv, M., Hadid, Y., Rokach, L. & Tadmor, T., 2022. Explainable machine learning for chronic lymphocytic leukemia treatment prediction using only inexpensive tests. Computers in Biology & Medicine, 145, 105490.
- Pozzo, F., Cuturello, F., Villegas Garcia, E., Rossi, F., Degan, M., Nanni, P., Cattarossi, I., Zaina, E., Varaschin, P., Braida, A. et al., 2023. An unsupervised machine learning method stratifies chronic lymphocytic leukemia patients into novel categories with different risk of early treatment. Biochimica et Biophysica Acta – Molecular Cell Research, 1878(12).
- Stilgenbauer, S., Eichhorst, B. et al., 2024. Risk-stratification in frontline CLL therapy: Standard of care. Haematology (ASH Publications+), 1, pp.457-462.
Photo by skynesher on iStock.









