Across the life sciences industry, machine learning is a powerful tool for analyzing patient data and improving the precision and efficiency of evidence generation. By addressing the limitations of real-world data (RWD), machine learning can extend what RWD can do: identify diseases earlier, trigger timely interventions, and recommend screenings and physician referrals. Our blog series, How Life Sciences is Leveraging Machine Learning: 5 Use Cases in 5 Minutes, explores five recent studies that used machine learning to analyze RWD.
This edition of 5 in 5 includes recent examples of identifying novel subtypes of patients with chronic kidney disease, classifying colorectal cancer stage in claims data, creating a framework for handling missing data, developing an algorithm for the early identification of pulmonary arterial hypertension, and predicting the likelihood of complications from mechanical ventilation. Each use case shows how combining machine learning with RWD can uncover and explain patterns in the data that help improve patient outcomes.
- AstraZeneca & University College London used unsupervised machine learning to identify novel subtypes of patients with incident and prevalent chronic kidney disease. They identified five distinct subtypes relevant to etiology, treatment, and risk of all-cause mortality or hospital admission. Because the new subtypes draw on a broader set of factors than existing classification systems, they could improve outcome prediction and inform effective interventions.
- Weill Cornell Medicine & University of Pennsylvania developed machine learning models that effectively classified colorectal cancer stage, a data point commonly missing from real-world data sources, using only variables available in administrative claims data (demographics, diagnosis codes, and treatment utilization). The models were trained on oncologic stage information from the SEER-Medicare registry and could improve claims-based health economics and outcomes research (HEOR) analyses by enabling risk adjustment or outcome stratification by stage.
- Flatiron, Roche & Genentech presented a framework that uses machine learning-based diagnostics to handle missing data, a common limitation of RWD. The systematic approach uses random forest classifiers to determine whether model-based imputation is appropriate for a given dataset. The authors applied the framework to two real-world oncology datasets, providing a methodological guide for other researchers to follow.
- Janssen used claims data to develop a machine learning algorithm for the early identification of pulmonary arterial hypertension (PAH). The analysis explored the factors that distinguish PAH patients, six months before diagnosis, from non-PAH controls. The study illustrates how routine population-level claims data may be used to identify patients who could benefit from PAH screening and/or early specialist referral.
- Johns Hopkins University created a statistical model based on electronic health record and physiologic vital-sign data to predict the likelihood of complications from mechanical ventilation. The machine learning-based algorithm could be used to score patients in the ICU, triggering timely interventions and triage.
As these studies show, machine learning has many effective use cases in healthcare analytics. Panalgo's IHD Data Science module is a machine learning analytics tool, built on our self-service, point-and-click platform, that allows users to uncover new insights and produce more accurate prediction and segmentation models, similar to the analyses outlined above. If you would like to learn more about how you can leverage machine learning analytics with the IHD Data Science module, contact us today.