COVID-19 Subgroup Discovery and Exploration Tool

A new artificial intelligence system facilitates a reliable prognosis of patients at the time of admission

A team of researchers from the Biomedical Data Science Lab (BDSLab) of the ITACA Institute of the Universitat Politècnica de València (UPV), together with members of the INCLIVA Health Research Institute of the Hospital Clínico Universitario de Valencia and the i+12 Research Institute of the Hospital Universitario 12 de Octubre de Madrid, is developing a clinical decision support system that will offer a robust prognosis for each patient with COVID-19 at the time of admission.

The new tool is based on artificial intelligence (AI) and machine learning techniques. By combining information on symptoms, comorbidities and laboratory tests, the system produces a personalized prognosis for each patient and classifies the case by the level of severity it could reach. For example, it can indicate whether the patient may suffer acute respiratory failure after several days, a circumstance in which early treatment would be essential.

Data quality

One of the main challenges for machine learning applied to COVID-19 is achieving high data quality, a challenge that this tool will help to address.

According to Juan Miguel García-Gómez, coordinator of the UPV’s BDSLab-ITACA, machine learning has the potential to help in this task by applying unsupervised and supervised learning techniques to hospitals’ electronic health records (EHRs).

These techniques extract the most significant patterns from the patient’s comorbidity history, symptoms and laboratory tests at the time of admission, as well as from their latest intensive care unit (ICU) data, facilitating early stratification of patients and prediction of the possible severity of their condition.

However, there is strong evidence that the real-world data (RWD) contained in hospital EHRs is far from perfect, which limits the knowledge that both medical professionals and machines can extract to support the patient’s diagnosis. Furthermore, the inherent variability of clinical practice and data coding between hospitals, or even among their target populations, can skew any results extracted from the data.
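As an illustration of the variability between hospitals described above, the following minimal sketch compares how one coded variable is distributed at two sites using the Jensen-Shannon divergence. This is one common way to quantify differences between data sources and is not necessarily the project's own methodology; the hospital names, categories and values are invented for the example.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0 for identical distributions and 1 for fully disjoint ones.
    """
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical coded severity values recorded at two hospitals (invented data).
hospital_a = ["mild", "mild", "severe", "mild", "moderate", "mild"]
hospital_b = ["severe", "severe", "moderate", "severe", "mild", "severe"]
categories = ["mild", "moderate", "severe"]

p = distribution(hospital_a, categories)
q = distribution(hospital_b, categories)
divergence = js_divergence(p, q)
print(f"JS divergence between sources: {divergence:.3f}")
```

A value close to 0 suggests the two sources record the variable similarly, while a larger value signals a difference in populations, clinical practice or coding that is worth examining before pooling the data.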

“Therefore,” says Carlos Sáez, postdoctoral researcher at the BDSLab-ITACA of the UPV, “machine learning and AI methods require an evaluation and explanation of data quality (DQ), associated with both learning and new predictions, to guarantee correct and pragmatic solutions. This is what the methodology we have devised contributes to, and it will be used for the first time in this tool”.

SUBCOVERWD-19 project

The UPV team, together with experts from INCLIVA and i+12, is working on this development within the framework of the SUBCOVERWD-19 project.

Rafael Badenes, a physician in the INCLIVA Anesthesia Research Group, says that, from a clinical point of view, AI tools capable of predicting in the early stages how the disease will progress are a crucial element in the fight against it. “In those cases in which greater severity is expected, we could establish treatments earlier, with the ultimate goal of reducing mortality and ICU admissions”, adds Dr. Badenes, head of the Anesthesia section at the Hospital Clínico Universitario de Valencia and professor at the University of Valencia (UVEG).

Advanced and highly sophisticated techniques to identify unknown patterns

“The heterogeneity and complexity of COVID-19”, adds Dr. Agustín Gómez de la Cámara, head of the Research and Scientific Support Unit of the Hospital 12 de Octubre, “make it essential to use highly advanced and sophisticated analysis techniques in order to identify the clinical and epidemiological patterns of this disease, which are still largely unknown. We believe that this project can contribute to achieving this objective”.

This project, coordinated by UPV researcher Carlos Sáez, was one of those selected in the call for the SUPERA COVID-19 FUND, promoted by Crue Universidades Españolas, Banco Santander (through Santander Universidades) and the Spanish National Research Council (CSIC).

The cases of China and the Philippines

“The quality of the data is critical,” says Sáez, the project coordinator. “Especially in multi-site data sharing environments, variability between data sources is a potential source of unexpected biases in model learning and subsequent use.”

In this regard, while discovering and classifying COVID-19 severity subgroups in the nCov2019 data set, recently published in the journal Scientific Data, the UPV’s BDSLab-ITACA team found that the two countries with the highest prevalence in the data (China and the Philippines) were divided into separate subgroups with different manifestations of severity.
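The kind of check that reveals such a split can be sketched as follows: once each case has been assigned to a subgroup, the share of each data source within every subgroup is computed, and subgroups dominated by a single source are flagged. This is only an illustrative sketch, not the team's actual pipeline; the function names, the 0.9 threshold and the toy labels are assumptions, and the values are invented rather than taken from the nCov2019 data set.

```python
from collections import Counter, defaultdict


def source_share_by_subgroup(subgroups, sources):
    """For each subgroup label, return the share of its cases from each source."""
    counts = defaultdict(Counter)
    for subgroup, source in zip(subgroups, sources):
        counts[subgroup][source] += 1
    return {
        subgroup: {src: n / sum(c.values()) for src, n in c.items()}
        for subgroup, c in counts.items()
    }


def source_dominated(shares, threshold=0.9):
    """Map each subgroup where one source supplies >= `threshold` of cases to that source."""
    return {
        subgroup: max(by_source, key=by_source.get)
        for subgroup, by_source in shares.items()
        if max(by_source.values()) >= threshold
    }


# Hypothetical subgroup assignments from a clustering step (invented data).
subgroups = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
sources = (
    ["China"] * 4
    + ["Philippines"] * 4
    + ["China", "Philippines", "Japan", "Italy"]
)

shares = source_share_by_subgroup(subgroups, sources)
flagged = source_dominated(shares)
print(flagged)
```

A subgroup flagged in this way may reflect how a source collects or codes its data rather than a genuine clinical pattern, which is why it deserves review before the subgroup is interpreted as a severity profile.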

The work and its results, in the new tool

“The variability of data sources can lead to potential biases in machine learning for COVID-19, as well as in the generalization of its results to new patients and locations. It is crucial to take the variability and quality of data into account to achieve robust and reliable AI”, concludes Sáez.

This work and its results have been brought together in the new COVID-19 Subgroup Discovery and Exploration Tool.

Source: UPV’s Information Office