High-quality translation of clinical Dutch to English
This technical report outlines the training procedure of our Transformer-based Dutch-to-English translation model. The model was trained exclusively on freely available datasets, yet it outperforms the current state of the art in the domain.
We show this by comparing the precision and recall of MetaMap's medical concept extraction on a set of Dutch clinical notes after translation to English, as well as with more conventional quality metrics for translation models (BLEU…).
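To make the concept-level evaluation concrete, here is a minimal sketch of how precision and recall could be computed once concepts have been extracted. It assumes the extractor output for each note is reduced to a set of concept identifiers and that a reference set exists for the same note; how the reference was built in the study is not described here, and the function name and the identifiers below are made-up placeholders.

```python
from typing import Set, Tuple


def concept_precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Compare two sets of concept identifiers for one note.

    predicted: concepts found in the machine-translated note (assumed input).
    reference: concepts considered correct for that note (assumed input).
    """
    true_positives = len(predicted & reference)
    # Guard against empty sets so a note without concepts does not raise.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder identifiers, not real extraction results.
predicted_concepts = {"C0000001", "C0000002", "C0000003"}
reference_concepts = {"C0000001", "C0000002", "C0000004", "C0000005"}

precision, recall = concept_precision_recall(predicted_concepts, reference_concepts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A translation that drops or distorts a medical term lowers recall, while one that introduces spurious terms lowers precision, which is why both are reported.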
Clinical notes remain by far the best source of information available to physicians when making decisions for their patients, so it is no surprise that harnessing them is an active area of development in the NLP world.
However, most state-of-the-art work in the domain focuses on a select few languages, among which English is prominent. To make use of these recent models, translation is often the preferred approach; but as recent studies have found, even state-of-the-art translation models perform unsatisfactorily on medical text, where precision is of extreme importance.
Our model is trained with HuggingFace Transformers, starting from the Helsinki-NLP/opus-mt-nl-en model and fine-tuned on a dataset created specifically for this purpose. At the time of the study, it achieves both a better MetaMap entity detection recall and a better BLEU score than Google Translate, Microsoft Translator and DeepL Translate.
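For readers who want to try the starting point, the sketch below loads the public base checkpoint (published on the Hugging Face Hub as Helsinki-NLP/opus-mt-nl-en) and translates a few Dutch sentences. This is the base model only, not the fine-tuned model described in this report; the helper names, the batch size and the example sentence are illustrative assumptions, and running `translate` requires the `transformers`, `torch` and `sentencepiece` packages plus a one-time model download.

```python
from typing import List

# Public base checkpoint; the fine-tuned clinical model is not loaded here.
BASE_MODEL = "Helsinki-NLP/opus-mt-nl-en"


def make_batches(sentences: List[str], batch_size: int) -> List[List[str]]:
    """Split the input into consecutive batches of at most batch_size sentences."""
    return [sentences[i:i + batch_size] for i in range(0, len(sentences), batch_size)]


def translate(sentences: List[str], model_name: str = BASE_MODEL, batch_size: int = 8) -> List[str]:
    """Translate Dutch sentences to English with a Marian checkpoint."""
    # Imported lazily so the helper above can be used without the library installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    translations: List[str] = []
    for batch in make_batches(sentences, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations


if __name__ == "__main__":
    # Illustrative sentence, not taken from the study's data.
    print(translate(["De patiënt heeft koorts en hoofdpijn."]))
```

Fine-tuning would continue training from this checkpoint on parallel Dutch-English medical text; that step is left out here because the dataset and hyperparameters are not given in this summary.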
More details are available in the slides I presented today at CLIN31.