Text mining helped analyse a million entries on smoking
Text mining reduced the workload of clinical experts and accelerated the research process.
Using real-world evidence, Medaffcon wanted to find out how smoking affects postoperative surgical complications. The question was studied in academic cooperation by examining the clinical notes of patients after surgery. Physicians enter a variety of information in the clinical notes, including whether the patient smokes.
However, smoking data is not recorded in the clinical notes as structured data: there is no separate field in the health record where the physician can enter whether or not the patient smokes. Instead, physicians record the patient’s smoking in free-form text within the wider patient record. There is no uniform way to record it, so the same information appears in a huge number of different formulations.
A million sentences on smoking
When the material was reviewed, a total of a million different smoking-related sentences were found. How can a million sentences be reviewed and analysed? There are different possibilities. One is to recruit a huge number of clinical experts to analyse the data. Another is to limit the material drastically in order to reduce the workload. Medaffcon used neither. Instead, it developed a machine-learning classifier to help analyse the material. To train it, clinical experts classified a total of 20,000 smoking-related sentences. Two clinical experts completed this work in one day, supported by Medaffcon’s pre-processing and helper tools. After this, the remaining million sentences were analysed and categorised by the machine-learning algorithm.
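The workflow described above (label a small sample by hand, train a classifier, then let it categorise the rest) can be illustrated with a minimal sketch. The article does not say which algorithm or which categories Medaffcon used, so everything here is an assumption made for illustration: a simple multinomial naive Bayes model stands in for the real classifier, the labels `smoker`, `non-smoker` and `ex-smoker` are hypothetical, and the English sentences are invented rather than taken from any patient record.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, sentences, labels):
        # Count how often each label occurs and which words appear under it.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, sentence):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(sentence) if t in self.vocabulary]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the label
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical expert-labelled sample (the study used 20,000 such sentences).
labelled = [
    ("patient smokes a pack a day", "smoker"),
    ("smokes about ten cigarettes daily", "smoker"),
    ("current smoker for twenty years", "smoker"),
    ("has never smoked", "non-smoker"),
    ("does not smoke", "non-smoker"),
    ("non smoker no tobacco use", "non-smoker"),
    ("quit smoking five years ago", "ex-smoker"),
    ("stopped smoking in 2010", "ex-smoker"),
    ("former smoker quit after surgery", "ex-smoker"),
]
sentences, labels = zip(*labelled)
clf = NaiveBayesSentenceClassifier().fit(sentences, labels)

# The remaining, unlabelled sentences are then categorised automatically.
unlabelled = [
    "patient smokes cigarettes daily",
    "quit smoking last year",
    "has never smoked tobacco",
]
predictions = [clf.predict(s) for s in unlabelled]
for sentence, label in zip(unlabelled, predictions):
    print(f"{label}: {sentence}")
```

The point of the sketch is the division of labour: experts supply labels only for the small training sample, and the trained model is applied to everything else. A production classifier for clinical text would also need careful handling of negation, misspellings and abbreviations, which this toy example ignores.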
– Without the algorithm, this sort of analysis would have been impossible to conduct. The number of patients would be completely different. Previously, similar studies could include thousands of patients; now they can include hundreds of thousands, says Medaffcon’s Data Scientist Juhani Aakko.
The quality of entries is essential
This sort of text mining is already everyday practice in the processing of unstructured data, and it makes it possible to use extensive material. Juhani Aakko estimates that the use of different machine-learning algorithms will increase in the analysis of healthcare data.
– The use of machine learning methods is limited by the fact that they require an enormous amount of data for training. Healthcare has the data, but clinicians would have to review large amounts of it in order to train the algorithm.
No matter how sophisticated data analysis methods become, one old fundamental remains: the quality of the entries.
– We can only assess data that has been entered. It would be good to pay attention to the quality of entries and to harmonised principles for entering data. Hopefully, healthcare will gain new ways to make data entry easier, capturing some of the data in a structured format at the moment it is entered, says Aakko.
Iiro joined Medaffcon in March 2017 as a Biostatistician. For the preceding four years, he had worked as a research assistant in an academic study group, analyzing clinical and genetic patient data. Iiro holds a Master of Science degree in Technology in Bioinformation Technology.
Iiro’s strengths include his strong expertise in statistics and data analysis, hands-on experience in working with sensitive patient data, and strong interdisciplinary communication skills with experts from various fields. In the field, he is particularly interested in the large amounts of data made available by the revolution in technology and in how the information obtained from such data can be used to draw concrete conclusions, both to understand the nature of diseases and to advance the goals of the pharmaceutical industry and patient treatment.
“Machine learning and AI-based solutions will have a major impact on the healthcare sector now and in the future. However, effectively utilizing the health data that has already been collected and is available will be even more important for improving healthcare.”
Juhani joined Medaffcon in October 2020 as a data scientist. Prior to joining Medaffcon, Juhani worked as a data scientist in a global IT company and as a scientist at the University of Turku in the Medical Bioinformatics Centre (MBC) and Functional Foods Forum (FFF). Juhani holds a Doctor of Science in Technology degree (2017), and the topic of his thesis was the development of human gut microbiota in early infancy.
Juhani has experience in applying statistical and machine learning methods in medicine and, thanks to his multidisciplinary background, can easily communicate with people with varied expertise, ranging from clinicians to IT professionals. “Knowledge management and business intelligence have become hot topics also in the social and healthcare sectors. It is very interesting to be involved in turning the vast amounts of data available in the systems into usable information to support decision making. Both traditional statistics and advanced analytics and artificial intelligence will play a key role in this job.”