Evaluating a post-editing approach for handwriting transcription

Verónica Romero, Joan Andreu Sánchez, Nicolás Serrano, Enrique Vidal; Proceedings of KONVENS 2012 (LThist 2012 workshop), pp. 357-364, September 2012.


Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. These books, which were handwritten until the beginning of the 20th century, contain information that is useful for demographic studies and genealogical research. This information is usually collected by expert demographers who devote a lot of time to transcribing the books manually. As the accuracy of automatic handwritten text recognizers improves, post-editing the output of these recognizers can be foreseen as a possible alternative. Unfortunately, most handwriting recognition techniques require large amounts of annotated images to train the recognition engine. In this paper we study how the accuracy of a handwriting recognition system improves with the amount of training data, and how human efficiency increases during the transcription of a marriage license book.

