Aggregating skip bigrams into key phrase-based vector space model for web person disambiguation
Jian Xu, Qin Lu, Zhengzhong Liu; Proceedings of KONVENS 2012 (Main track: oral presentations), pp. 108-117, September 2012.
Web Person Disambiguation (WPD) is often done through clustering of web documents to identify the different namesakes for a given name. This paper presents a clustering algorithm using key phrases as the basic feature. Key phrases are used in two different forms: to represent the document as a whole, and to represent the context information surrounding the name mentions in a document. In the vector space model, key phrases extracted from a document serve as the document representation. Context information of name mentions is represented by skip bigrams of the key phrase sequences surrounding the name mentions. The two components are then aggregated into the vector space model for clustering. Experiments on the WePS2 datasets show that the proposed approach achieves results comparable to those of the top-ranked system. This indicates that key phrases can be a very effective feature for WPD both at the document level and at the sentential level near the name mentions.
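
The aggregation of the two feature types can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the key phrase extractor, the skip distance, the term weighting, or the clustering algorithm, so the function names `skip_bigrams`, `document_vector`, and `cosine`, the `max_skip` parameter, and the use of raw counts as weights are all assumptions made for illustration. Key phrases are assumed to be already extracted and passed in as lists of strings.

```python
from collections import Counter
from math import sqrt


def skip_bigrams(phrases, max_skip=2):
    """Ordered pairs (a, b) where b follows a with at most max_skip phrases between them."""
    pairs = []
    for i, left in enumerate(phrases):
        # The last admissible partner is max_skip + 1 positions to the right.
        for j in range(i + 1, min(i + 2 + max_skip, len(phrases))):
            pairs.append((left, phrases[j]))
    return pairs


def document_vector(doc_phrases, context_windows, max_skip=2):
    """Combine document-level key phrases and context skip bigrams in one sparse vector.

    doc_phrases: key phrases extracted from the whole document.
    context_windows: one key phrase sequence per name mention (hypothetical input format).
    Features are tagged "kp" or "sb" so the two feature spaces never collide.
    Raw counts are used as weights (an assumption, not the paper's weighting).
    """
    vector = Counter(("kp", phrase) for phrase in doc_phrases)
    for window in context_windows:
        vector.update(("sb", pair) for pair in skip_bigrams(window, max_skip))
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```

A pairwise similarity such as `cosine(document_vector(...), document_vector(...))` could then be fed to any standard document clustering routine; which routine the paper uses is not stated in the abstract.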