Real-world data on diffuse large B-cell lymphoma (DLBCL) has remained incomplete. In Finland, electronic health record (EHR) data on patients treated in specialised (secondary/tertiary) health care is accessible via data lake technology in a few hospital districts. Data lake technology enables automatic assessment of large EHR data sets in a protected data environment, thus facilitating the secondary use of EHR data. However, the content and coverage of the DLBCL-related EHR data available in Finnish data lakes are not fully known.
The aim of the study was to evaluate current recording practices in DLBCL and the usability of the corresponding data lake data of the Hospital District of Southwest Finland (HDSF) for the automatic assessment of the characteristics, immunochemotherapy (ICT) treatments, and outcomes of DLBCL patients. The study was carried out in collaboration with Medaffcon, Roche Oy, and Auria Clinical Informatics and included 587 adult patients diagnosed from January 1, 2010, through March 31, 2019.
Analysis of patient characteristics was based on both structurally available and text-mined data. The analysis remained partly incomplete because of limited data content, availability, and coverage. For example, data on stage, International Prognostic Index (IPI), and cell of origin were available for 63.0%, 68.3%, and 28.4% of patients, respectively. The coverage of the data was moderate throughout the study period. Aetiology and genetic aberrations were difficult to determine because the data was insufficient, not structurally available, or impossible to extract without a manual chart review.
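As an illustration of how such coverage figures can be derived, the following minimal sketch computes the percentage of patients with a recorded value for a given characteristic. The field names and the toy records are hypothetical and are not study data; the study's actual data model is not described here.

```python
def coverage(records: list[dict], field: str) -> float:
    """Return the percentage of records with a non-missing value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for record in records if record.get(field) is not None)
    return round(100.0 * present / len(records), 1)


# Hypothetical toy records (not study data); None marks a missing value.
patients = [
    {"stage": "III", "ipi": 3, "cell_of_origin": None},
    {"stage": None, "ipi": 2, "cell_of_origin": "GCB"},
    {"stage": "II", "ipi": None, "cell_of_origin": None},
    {"stage": "IV", "ipi": 4, "cell_of_origin": None},
]

for name in ("stage", "ipi", "cell_of_origin"):
    print(name, coverage(patients, name))
```

Applied per calendar year of diagnosis, the same calculation would show whether coverage changes over a study period.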
Algorithmic creation of the first four ICT treatment lines and assessment of the associated outcomes were feasible. In total, 454 (77.3%) patients had records on ICT drugs, and the annual proportion of patients with ICT records increased during the study period. Patients without ICT records also had less favourable survival (5.6 months vs 106.6 months for patients with ICT records) and more often had missing data on clinical characteristics. Moreover, the overall survival of DLBCL patients from the beginning of the last detected ICT line decreased across successive treatment lines. These data highlight that a substantial fraction of DLBCL patients does not benefit from standard treatments, yet it is not possible to characterise all the patients with inferior survival.
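The rules used to construct treatment lines are not specified here. As a rough sketch of the general idea only, the example below groups dated drug administration records into lines under an assumed rule: a new line starts when the regimen changes or when the gap between administrations exceeds a threshold. The function name, the 90-day threshold, and the example records are hypothetical.

```python
from datetime import date


def build_treatment_lines(administrations, max_gap_days=90, max_lines=4):
    """Group (date, regimen) records into consecutive treatment lines.

    Assumed rule (not the study's): a new line starts when the regimen
    changes or when more than `max_gap_days` pass between administrations.
    """
    lines = []
    for day, regimen in sorted(administrations):
        current = lines[-1] if lines else None
        starts_new = (
            current is None
            or regimen != current["regimen"]
            or (day - current["end"]).days > max_gap_days
        )
        if starts_new:
            if len(lines) == max_lines:
                break  # keep only the first `max_lines` lines
            lines.append({"regimen": regimen, "start": day, "end": day})
        else:
            current["end"] = day  # extend the ongoing line
    return lines


# Hypothetical records for one patient (not study data).
records = [
    (date(2015, 1, 5), "R-CHOP"),
    (date(2015, 1, 26), "R-CHOP"),
    (date(2015, 2, 16), "R-CHOP"),
    (date(2016, 3, 1), "R-DHAP"),
    (date(2016, 3, 22), "R-DHAP"),
]

for number, line in enumerate(build_treatment_lines(records), start=1):
    print(number, line["regimen"], line["start"], line["end"])
```

The start date of the last detected line could then serve as the index date for a survival analysis of the kind summarised above.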
The study confirmed that assessing Finnish data lake data with algorithmic/automatic approaches is an efficient way to analyse large DLBCL data sets. However, recording practices for patient characteristics, treatments, and, for example, response rates in routine clinical care should be developed in Finland towards structural availability and more uniform statement practices to facilitate the secondary use of real-world evidence (RWE) through data lake platforms.