Building an Old Occitan Corpus via Cross-Language Transfer

Olga Scrivner, Sandra Kübler; Proceedings of KONVENS 2012 (LThist 2012 workshop), pp. 392-400, September 2012.


This paper describes how we implemented a resource-light approach, cross-language transfer, to build and annotate a historical corpus of Old Occitan. Our approach transfers morpho-syntactic and syntactic annotation from resource-rich source languages, Old French and Catalan, to a genetically related target language, Old Occitan. The present corpus consists of three sub-corpora in XML format: 1) raw text; 2) part-of-speech-tagged text; and 3) syntactically annotated text.
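The abstract names the technique but does not detail it. As a rough illustration of the general idea of cross-language transfer at the part-of-speech level, the Python sketch below learns word and suffix tag counts from a tagged source-language corpus and applies them to target-language tokens. This is not the authors' pipeline. The diacritic-stripping normalization, the suffix backoff, the function names `normalize`, `train`, and `transfer_tag`, and the toy tokens are all hypothetical choices made for this example.

```python
import unicodedata
from collections import Counter, defaultdict


def normalize(word):
    """Lowercase and strip combining diacritics. This is an assumed, simplistic
    spelling normalization so that cognates in related languages can match."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def train(tagged_sentences, suffix_len=3):
    """Count tags per normalized word and per word suffix in
    POS-tagged source-language sentences (hypothetical model)."""
    word_tags = defaultdict(Counter)
    suffix_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            w = normalize(word)
            word_tags[w][tag] += 1
            suffix_tags[w[-suffix_len:]][tag] += 1
            all_tags[tag] += 1
    return word_tags, suffix_tags, all_tags, suffix_len


def transfer_tag(tokens, model):
    """Tag target-language tokens with the source-language model, backing off
    from an exact word match, to a suffix match, to the overall majority tag."""
    word_tags, suffix_tags, all_tags, suffix_len = model
    tagged = []
    for token in tokens:
        w = normalize(token)
        if w in word_tags:
            counts = word_tags[w]
        elif w[-suffix_len:] in suffix_tags:
            counts = suffix_tags[w[-suffix_len:]]
        else:
            counts = all_tags
        tagged.append((token, counts.most_common(1)[0][0]))
    return tagged


# Invented toy data for illustration only, not real corpus material.
source = [
    [("lo", "DET"), ("rei", "NOUN"), ("parla", "VERB")],
    [("la", "DET"), ("dona", "NOUN"), ("canta", "VERB")],
    [("rei", "NOUN")],
]
model = train(source)
print(transfer_tag(["Lo", "cantà", "monta", "xyz"], model))
# [('Lo', 'DET'), ('cantà', 'VERB'), ('monta', 'VERB'), ('xyz', 'NOUN')]
```

Here "cantà" matches "canta" only after the accent is stripped, "monta" is tagged through the shared suffix "nta", and "xyz" falls back to the most frequent source tag. A real transfer setup between related languages would need far richer spelling mappings and a trained statistical tagger; the backoff chain is only meant to make the idea of reusing source-language annotation concrete.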
