From semi-automatic to automatic affix extraction in Middle English corpora: Building a sustainable database for analyzing derivational morphology over time

Hagen Peukert; Proceedings of KONVENS 2012 (LThist 2012 workshop), pp. 415-423, September 2012.


The annotation of large corpora is usually restricted to syntactic structure and word class. Purely lexical information and information on the internal structure of words are stored in specialized dictionaries (Baayen et al., 1995). The two data structures, dictionary and text corpus, can be matched to obtain, for example, the distribution of certain (restricted) lexical information in a text. This procedure works well for synchronic corpora. What is missing for diachronic work, however, is either a special mark-up in texts that links each item to a certain time or a diachronic lexical database that allows items to be matched over time. In what follows, we take the latter approach and present a tool set (MoreXtractor, Morphilizer, Mor-Query), a database (Morphilo-DB) and the architecture of a platform (Morphorm) for the sustainable use of diachronic linguistic data for Middle English, Early Modern English and Modern English.
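
To make the matching idea concrete, the following is a minimal sketch of how a time-indexed lexicon might be matched against a dated text to obtain an affix distribution. It is an illustration under stated assumptions, not the implementation of Morphilo-DB or of any tool named above: the lexicon layout, the function name `affix_distribution`, the word forms and the attestation years are all invented for the example.

```python
from collections import Counter

# Hypothetical diachronic lexicon (invented for illustration):
# word form -> (affix, first attested year, last attested year).
# The years are placeholders, not real attestation data.
LEXICON = {
    "kingdom": ("-dom", 1100, 2000),
    "freedom": ("-dom", 1100, 2000),
    "goodness": ("-ness", 1100, 2000),
    "knyghthode": ("-hode", 1150, 1500),
}


def affix_distribution(tokens, year, lexicon=LEXICON):
    """Count affixes in `tokens`, using only lexicon entries valid in `year`."""
    counts = Counter()
    for token in tokens:
        entry = lexicon.get(token.lower())
        if entry is None:
            continue  # word form is not in the lexicon
        affix, first_year, last_year = entry
        if first_year <= year <= last_year:
            counts[affix] += 1
    return counts


if __name__ == "__main__":
    text = ["Kingdom", "of", "freedom", "and", "knyghthode"]
    print(affix_distribution(text, year=1400))
    print(affix_distribution(text, year=1900))
```

The point of the sketch is the time restriction on each lexicon entry: the same token list yields different distributions depending on the period assigned to the text, which a purely synchronic dictionary lookup cannot express.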

