Named entity recognition: Exploring features

Maksim Tkachenko, Andrey Simanovsky; Proceedings of KONVENS 2012 (Main track: oral presentations), pp. 118-127, September 2012.


We study a comprehensive set of features used in supervised named entity recognition. We explore various combinations of features and compare their impact on recognition performance. We build a conditional random field based system that achieves 91.02% F1-measure on the CoNLL 2003 dataset and 81.4% F1-measure on the OntoNotes version 4 CNN dataset, which, to our knowledge, are the best results reported for these benchmarks. We demonstrate that the improvement over the previous top-performing system is statistically significant. We also obtain 74.27% F1-measure on the NLPBA 2004 dataset.
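For readers unfamiliar with how the F1-measure figures above are typically computed, here is a minimal sketch of entity-level F1 over BIO-tagged sequences. It assumes the exact-match convention commonly used for CoNLL-style evaluation (an entity counts as correct only if both its span and its type match). The function names `bio_to_spans` and `entity_f1` are illustrative and are not taken from the paper, whose own scoring setup is not shown here.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence of BIO tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at sentence end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Assumption: a stray I- tag without a preceding B- opens a new entity.
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]
    # One of two predicted entities matches one of two gold entities.
    print(entity_f1(gold, pred))
```

In this toy case precision and recall are both 1/2, so the score is 0.5. A reported figure such as 91.02% corresponds to the same ratio computed over an entire test set.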
