Automating Access to Real-World Evidence

Background

Real-world evidence is important in regulatory and funding decisions, but manual data extraction from electronic health records (EHR) is time-consuming. Automated extraction using natural language processing and artificial intelligence may facilitate this process. We compared manual and automated data collection from the EHR of patients with advanced lung cancer.

Methods

Previously, we used an automated platform to extract data from the unstructured EHR of ∼1200 patients with advanced lung cancer diagnosed between January 2015 and May 2018 at a major cancer centre. For comparison, 100 of the 333 patients who received systemic therapy were randomly selected, and two trained abstractors manually extracted their clinical data (patient characteristics, disease characteristics and treatment) using the same variable definitions. An expert adjudicator re-reviewed all cases. We report the accuracy of each method and the concordance between the automated and manual methods.
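The abstract does not spell out how accuracy and concordance were calculated. The minimal Python sketch below assumes that accuracy is the percentage of patients whose extracted value matches the adjudicator's re-reviewed value, and that concordance is simple percent agreement between the automated and manual values for the same patient. The seed, the patient IDs 1 to 333, and the five ECOG values are hypothetical placeholders, not study data.

```python
import random
from typing import Sequence


def percent_agreement(x: Sequence[str], y: Sequence[str]) -> float:
    """Percentage of patients for whom two sequences of extracted values are identical."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(x, y))
    return 100.0 * matches / len(x)


# Hypothetical: draw 100 of the 333 systemic-therapy patients with a fixed seed
# so the selection can be reproduced. Patient IDs 1..333 are placeholders.
rng = random.Random(2018)
selected_ids = rng.sample(range(1, 334), k=100)

# Toy ECOG values for five patients (illustrative only, not study data).
reference = ["0", "1", "1", "2", "0"]  # assumed: adjudicated values act as the reference
automated = ["0", "1", "2", "2", "0"]
manual = ["0", "1", "1", "1", "1"]

# Accuracy (assumed definition): each method compared with the adjudicated reference.
automated_accuracy = percent_agreement(automated, reference)
manual_accuracy = percent_agreement(manual, reference)
# Concordance (assumed definition): automated compared with manual, ignoring the reference.
concordance = percent_agreement(automated, manual)

print(f"automated accuracy {automated_accuracy:.0f}%")
print(f"manual accuracy {manual_accuracy:.0f}%")
print(f"concordance {concordance:.0f}%")
```

With these toy values the script prints an automated accuracy of 80%, a manual accuracy of 60% and a concordance of 40%. This shows that two methods can each be fairly accurate yet agree less often than either agrees with the reference, because they err on different patients. Simple percent agreement does not correct for chance; if a chance-corrected statistic such as Cohen's kappa was intended instead, the concordance figures would differ.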

Results

Automated extraction required far less time (<72 hours) than manual extraction (225 person-hours). Both methods collected demographic data (age, sex, diagnosis) with high accuracy and concordance (96-100%). Accuracy and concordance were lower for data elements recorded in unstructured EHR text, such as ECOG performance status, date of stage IV diagnosis and smoking status (automated accuracy 94%, 93% and 88%, respectively; manual accuracy 83%, 78% and 94%). Detection of biomarker testing was highly accurate and concordant (96-98%), although detection of the final biomarker results was more variable (accuracy 84-100%, concordance 84-99%). Automated extraction identified metastatic sites more accurately than manual extraction (concordance 70-99%), with the exception of lymph node metastases (accuracy: automated 66%, manual 92%; concordance 58%). This discrepancy was attributed to analogous terms in radiology reports that were not included in the gold standard definition. Concurrent medications (86-100%) and comorbid conditions (96-100%) were reported with high accuracy and concordance. Both methods also captured treatment details accurately (84-100%) and with high concordance (83-99%).

Conclusions

Automated data abstraction from unstructured EHR was highly accurate and much faster than manual abstraction. Key challenges include poorly structured EHR and the use of analogous terms not covered by the gold standard definition.