Comparison of named entity recognition tools for raw OCR text

Kepa Joseba Rodriquez, Mike Bryant, Tobias Blanke, Magdalena Luszczynska; Proceedings of KONVENS 2012 (LThist 2012 workshop), pp. 410-414, September 2012.


This short paper reports an experiment comparing how well several Named Entity Recognition (NER) tools extract entities directly from the output of an optical character recognition (OCR) workflow. The authors first describe how they built a test set consisting of raw and corrected OCR output, manually annotated with people, locations, and organizations. They then ran each NER tool against both the raw and the corrected OCR output and measured precision, recall, and F1 score against the manual annotations.
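The evaluation described above can be illustrated with a minimal sketch. It assumes that an entity counts as correct only when both its text and its type exactly match a manual annotation; the summary does not state the matching criterion the authors used, so this is an assumption. The function name `score` and the sample entities are hypothetical and are not taken from the paper's data.

```python
from typing import Set, Tuple

# An entity is represented as (surface text, entity type).
Entity = Tuple[str, str]


def score(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted entities against gold ones.

    Assumption: exact match on both text and type counts as a true positive.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical manual annotations (people, locations, organizations).
gold = {
    ("Jane Doe", "PER"),
    ("Amsterdam", "LOC"),
    ("Red Cross", "ORG"),
}

# Hypothetical tool output on raw OCR text: one wrong type, one OCR misspelling.
predicted = {
    ("Jane Doe", "PER"),
    ("Amsterdam", "ORG"),
    ("Red Cross", "ORG"),
    ("Arnsterdam", "LOC"),
}

precision, recall, f1 = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function once on a tool's output for raw OCR and once for corrected OCR gives the pair of scores that such a comparison rests on. A more lenient criterion, such as partial overlap of entity spans, would yield different numbers.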

