NLP workflow for on-line definition extraction from English and Slovene text corpora

Senja Pollak, Anže Vavpetič, Janez Kranjc, Nada Lavrač, Špela Vintar; Proceedings of KONVENS 2012 (Main track: oral presentations), pp. 53-60, September 2012.


Definition extraction is an emerging field of NLP research. This paper presents an innovative information extraction workflow designed to extract definition candidates from domain-specific corpora, using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. The workflow, implemented in ClowdFlows, a novel service-oriented workflow environment, was applied to the task of definition extraction from two corpora of academic papers in the domain of Computational Linguistics, one in Slovene and one in English. The definition extraction workflow is available on-line, so it can be reused for definition extraction from other corpora. It is also easily adaptable to other languages, provided that the required language-specific workflow components are accessible as public services on the web.
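
The idea of combining a definitional pattern with a term filter can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not use ClowdFlows: the term list, the regular expression and the function names are hypothetical. A plain lexical "X is a Y" pattern stands in for the morphosyntactic patterns, and a hand-written list stands in for automatic terminology recognition; wordnet tagging is left out.

```python
import re

# Hypothetical toy term list; in the described workflow, terms would come
# from automatic terminology recognition rather than being hand-written.
TERMS = ["definition extraction", "wordnet"]

# Simplified lexical stand-in for an "X is a Y" definitional pattern.
# A real morphosyntactic pattern would operate on POS-tagged, lemmatised text.
PATTERN = re.compile(
    r"^(?P<term>.+?)\s+(?:is|are)\s+(?:a|an|the)\s+(?P<genus>.+)$",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def definition_candidates(text, terms=TERMS):
    """Return sentences that match the pattern and whose subject contains a known term."""
    candidates = []
    for sentence in split_sentences(text):
        match = PATTERN.match(sentence)
        if not match:
            continue
        subject = match.group("term").lower()
        if any(term in subject for term in terms):
            candidates.append(sentence)
    return candidates


sample = (
    "Definition extraction is an emerging field of NLP research. "
    "We ran experiments on two corpora. "
    "The corpus is a collection of papers."
)
print(definition_candidates(sample))
```

In this sketch the third sample sentence matches the pattern but is rejected because its subject contains no listed term, which shows why the pattern step and the term step are combined rather than used alone.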

