Application of AI and data science to extract value from Real-World Data (RWD)

An introduction to the application of AI technology in the generation of insights from real-world health data sources

Medicines Discovery Catapult

In medicine, Real-World Evidence (RWE) is the evidence regarding the clinical use and potential benefits or risks of a medical product, and it is derived from Real-World Data (RWD). Essentially, RWD can mean any data that is collected outside of a clinical trial and that relates to the health of the patient and/or the delivery of health care. RWD can be collected in a number of different ways: for example, from medical claims databases, disease registries or Electronic Healthcare Records (EHRs), or it can be generated directly by the patient, for example via mobile devices.

The volume of digital healthcare data being generated is increasing rapidly, particularly as devices become more complex and the mechanisms for gathering data become more accessible and sophisticated. Furthermore, advances in computing power and storage mean researchers are less constrained in what they can achieve. As a result, now is an opportune time for powerful Artificial Intelligence (AI) approaches to be applied to these data, with the ultimate aim of providing valuable insights for patient benefit.

The application of AI to RWD, and the subsequent generation of RWE, has huge potential in the context of drug development: for example, insights into disease risk, identification of optimal patient treatment pathways, adherence monitoring and understanding patient behaviour. All of these could have a substantial impact on clinical trials, drug development and, importantly, patients.

Two elements of AI stand out as being particularly beneficial for handling RWD: Natural Language Processing (NLP) and Machine Learning (ML). A large portion of RWD is unstructured, for example in the form of clinician notes, patient diary entries or even social media posts, and as a result is especially challenging to work with. NLP approaches enable effective retrieval of information from these sources, which is useful for a number of different applications, including the preparation of data for an ML algorithm to predict outcomes or signals.

ML refers to computer algorithms that build mathematical models from a set of training data in order to make predictions on unseen data (test data) without being explicitly programmed to do so. Depending on the application and the data available, a supervised approach (where the desired output is known) or an unsupervised approach (where the desired output is not known) may be used. Within these categories there are different types of ML models, ranging from simple to complex. The category and model employed in an ML approach depend upon the problem, the data and the constraints.
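To make the distinction between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. The measurements, labels and function names are invented purely for illustration and are not taken from any real dataset or from the work described in this article. The supervised part learns one average value per known label and uses it to classify unseen values; the unsupervised part splits the same values into two groups without ever seeing the labels.

```python
from statistics import mean

# Toy 1-D measurements (invented for illustration, not real patient data).
train_x = [1.0, 1.2, 0.8, 4.0, 4.2, 3.8]
train_y = ["low", "low", "low", "high", "high", "high"]  # known outputs


def fit_centroids(xs, ys):
    """Supervised: learn one mean value (centroid) per known label."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without using any labels."""
    c0, c1 = min(xs), max(xs)  # start the two centres at the extremes
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)  # move each centre to its group mean
    return sorted(g0), sorted(g1)


centroids = fit_centroids(train_x, train_y)
test_predictions = [predict(centroids, x) for x in [0.9, 4.1]]  # unseen data
clusters = two_means(train_x)
```

In practice the models used on RWD are usually far richer than this, but the same split applies: a supervised model needs labelled training examples, whereas an unsupervised model looks for structure in the data alone.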

As an example, within PHASTAR we applied NLP and ML to a large critical care dataset, MIMIC-III (Medical Information Mart for Intensive Care). This is a large, single-centre database which contains rich information relating to patients who were admitted to critical care units at a large tertiary hospital. In this example, we looked at whether we could predict readmission to the critical care unit using the patient’s discharge notes, which consist of unstructured blocks of text. Through the application of simple NLP methods, key information was extracted from the clinician notes and used as input to an ML algorithm. This approach generated predictive models of hospital readmission and provided initial insights into the aspects of the clinician notes that were most important to that prediction.
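The general shape of such a pipeline can be sketched in a few lines of plain Python. This is not the method used in the PHASTAR work, whose NLP steps and ML algorithm are not specified here; it is an assumed stand-in that uses a bag-of-words representation with a small stop-word list and a naive Bayes classifier. The notes, labels, stop words and function names below are all invented for illustration and do not come from MIMIC-III or any real patient record.

```python
import math
import re
from collections import Counter

# Invented example notes with invented labels (1 = readmitted, 0 = not).
# These are NOT taken from MIMIC-III or any real patient record.
notes = [
    ("patient stable discharged home with oral antibiotics", 0),
    ("stable vitals tolerating diet discharged home", 0),
    ("persistent fever and shortness of breath on discharge", 1),
    ("shortness of breath persistent cough requires follow up", 1),
]

STOP_WORDS = {"and", "of", "on", "with", "up"}  # illustrative, not exhaustive


def tokenize(text):
    """Simple NLP step: lowercase, split into words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def train(examples):
    """Count words per class for a naive Bayes classifier."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        word_counts[label].update(tokenize(text))
        class_counts[label] += 1
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def log_likelihood(word, label, word_counts, vocab):
    """Laplace-smoothed log P(word | label)."""
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))


def predict(text, model):
    """Return the class (0 or 1) with the higher naive Bayes score."""
    word_counts, class_counts, vocab = model
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += log_likelihood(word, label, word_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)


def top_words(model, n=3):
    """Words whose presence most favours the readmission class."""
    word_counts, _, vocab = model

    def log_odds(w):
        return (log_likelihood(w, 1, word_counts, vocab)
                - log_likelihood(w, 0, word_counts, vocab))

    # Sort by descending log-odds, breaking ties alphabetically.
    return sorted(vocab, key=lambda w: (-log_odds(w), w))[:n]


model = train(notes)
prediction = predict("ongoing shortness of breath and fever", model)
important = top_words(model)
```

The `top_words` helper mirrors the idea of inspecting which parts of a note drive a prediction: it ranks words by how much more likely they are in the readmitted class than in the other. With only four made-up notes the ranking is of no clinical value; it simply shows where such insight would come from.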

There are potentially huge benefits for drug development and healthcare from the utilisation of RWD and RWE, and at PHASTAR we would welcome the opportunity to work with you to understand how RWD and RWE may benefit your business.

About the author

Jennifer Bradford, PhD, is Head of Data Science for the global CRO PHASTAR. She previously worked for the Advanced Analytics Group at AstraZeneca, leading the development of the REACT clinical trial monitoring tool, which she later customised and delivered to other sponsors as part of Cancer Research UK (CRUK). Within CRUK, and in close collaboration with the Christie hospital, she worked on electronic data capture, app development and wearables data analytics in the context of clinical trials. She has a degree in Biomedical Sciences from Keele University and a bioinformatics Masters and PhD from Leeds University.