DeepCurate

14.04.2020

A Framework for Semi-Automatic Bio-Curation for the SABIO-RK Database

DeepCurate aims to develop a framework that supports human experts in curating biomedical databases, i.e. in transferring life science research results from the specialist literature into a structured, machine-readable form. The starting point is the state-of-the-art workflow developed in the SABIO-RK project. While maintaining or improving the quality of the curated data, the system should reduce the cognitive load on the experts and automate trivial subtasks (such as search), thereby increasing the effectiveness and efficiency of manual curation. DeepCurate uses deep learning methods that, for the first time, integrate text, image and eye movement data.

Experimental data on biochemical reactions and their reaction kinetic properties are of great importance for research and development in biotechnology, medical treatment and diagnostics. Most of these data are published in conventional specialist literature, where they are optimized for human reading and are unstructured or only weakly structured (e.g. in tabular form). Existing natural language processing (NLP) methods do not yet have the robustness, coverage and effectiveness needed to extract these data in the required quality. To make the data available in biomedical databases, the current practice in the Bio-DB community is therefore fully manual extraction and curation by human experts. SABIO-RK is a curated database of biochemical reactions and their reaction kinetic properties.

DeepCurate is a collaborative project between the NLP and SDBV groups at HITS as part of the HITS Lab initiative. The project is funded by the BMBF for three years, starting in January 2020.