Data-driven knowledge extraction for the food domain

Michael Wiegand, Benjamin Roth, Dietrich Klakow; Proceedings of KONVENS 2012 (Main track: oral presentations), pp. 21-29, September 2012.


In this paper, we examine methods to automatically extract domain-specific knowledge for the food domain from unlabeled natural language text. We employ different extraction methods, ranging from surface patterns to co-occurrence measures applied to different parts of a document. We show that the effectiveness of a particular method depends heavily on the relation type considered and that no single method works equally well for every relation type. We also examine a combination of extraction methods and consider relationships between different relation types. The extraction methods are applied both to a domain-specific corpus and to the domain-independent factual knowledge base Wikipedia. Moreover, we examine the suitability of an open-domain lexical ontology.
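
To give a concrete sense of what a co-occurrence measure over unlabeled text can look like, here is a minimal sketch that scores pairs of food terms by pointwise mutual information (PMI) at the sentence level. PMI, the sentence-level window, the function name `pmi_scores`, and the toy sentences and vocabulary are all assumptions chosen for illustration; the abstract does not say which measures or document parts the paper actually uses.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences, vocabulary):
    """Score pairs of vocabulary terms by PMI, counting sentence-level co-occurrence.

    Hypothetical helper for illustration only, not the method from the paper.
    """
    n = len(sentences)
    item_counts = Counter()  # number of sentences containing each term
    pair_counts = Counter()  # number of sentences containing both terms of a pair
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        present = sorted(tokens & vocabulary)  # sorted so pair keys are canonical
        item_counts.update(present)
        pair_counts.update(combinations(present, 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with probabilities estimated
        # as sentence frequencies divided by n.
        scores[(a, b)] = math.log2(joint * n / (item_counts[a] * item_counts[b]))
    return scores


# Toy data (made up for this sketch).
sentences = [
    "serve the pasta with tomato sauce",
    "pasta and tomato go well together",
    "rice with curry is a classic",
    "add curry to the rice",
]
vocabulary = {"pasta", "tomato", "rice", "curry"}
scores = pmi_scores(sentences, vocabulary)
```

In this toy setting, pairs that never share a sentence receive no score at all, while pairs that always appear together score highest. A surface-pattern method would instead match explicit textual templates, which is one reason the two families of methods can behave differently across relation types.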
