Implicit ambiguity resolution using incremental clustering in cross-language information retrieval

This paper presents a method to implicitly resolve ambiguities using dynamic incremental clustering in cross-language information retrieval (CLIR), such as Korean-to-English and Japanese-to-English CLIR. The main objective of this paper is to show that document clusters can effectively resolve the ambiguities that increase tremendously in translated queries, and can also take into account the context of all the terms in a document. In the framework we propose, a query in Korean or Japanese is first translated into English by looking up bilingual dictionaries; documents are then retrieved for the translated query terms using either the vector space retrieval model or the probabilistic retrieval model. For the top-ranked retrieved documents, query-oriented document clusters are created incrementally, and the weight of each retrieved document is recalculated using the clusters. In experiments on the TREC CLIR test collection, our method achieved improvements of 39.41% and 36.79% over translated queries without ambiguity resolution in Korean-to-English CLIR, and improvements of 17.89% and 30.46% in Japanese-to-English CLIR, for vector space retrieval and probabilistic retrieval, respectively. Compared with blind feedback for probabilistic retrieval in Korean-to-English CLIR, our method achieved a 12.30% improvement for all translation queries. These results indicate that cluster analysis helps to resolve ambiguity. © 2003 Elsevier Ltd. All rights reserved.
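The pipeline described in the abstract (dictionary translation, initial retrieval, incremental clustering of the top-ranked documents, and re-weighting) can be illustrated with a minimal Python sketch. The abstract does not give the clustering criterion or the re-weighting formula, so everything specific here is an assumption made for illustration: raw term-frequency vectors, cosine similarity, single-pass clustering with a fixed `threshold`, and a linear mix controlled by `alpha`. The function names and the toy dictionary are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def translate(query_terms, dictionary):
    """Keep every dictionary translation of every term, so the
    translated query is deliberately left ambiguous."""
    translated = []
    for term in query_terms:
        translated.extend(dictionary.get(term, []))
    return translated


def incremental_clusters(doc_vectors, threshold):
    """Single-pass clustering: each document joins the most similar
    existing cluster if similarity >= threshold, else starts a new one.
    The centroid is kept as a summed vector (cosine ignores scale)."""
    clusters = []
    for i, vec in enumerate(doc_vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)
            best["members"].append(i)
    return clusters


def rerank(docs, query_terms, top_k=10, threshold=0.2, alpha=0.5):
    """Retrieve by cosine score, cluster the top_k documents, then mix
    each document's own score with its cluster's query similarity.
    The linear mix is an assumed stand-in for the paper's re-weighting."""
    query = Counter(query_terms)
    vectors = [tf_vector(d) for d in docs]
    initial = sorted(
        ((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True
    )[:top_k]
    top_ids = [i for _, i in initial]
    clusters = incremental_clusters([vectors[i] for i in top_ids], threshold)
    scores = {}
    for cluster in clusters:
        cluster_sim = cosine(query, cluster["centroid"])
        for member in cluster["members"]:
            doc_id = top_ids[member]
            own = cosine(query, vectors[doc_id])
            scores[doc_id] = alpha * own + (1 - alpha) * cluster_sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A call such as `rerank(docs, translate(["은행"], {"은행": ["bank", "ginkgo"]}))` would return `(document index, score)` pairs. The intent of the mix is that a document sharing a cluster with other query-relevant documents gains weight, while a document matched only through an unintended translation tends to sit in a cluster that is less similar to the query as a whole.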
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Issue Date
2004-01
Language
ENG
Description

Received 4 April 2002; accepted 3 April 2003; available online 16 October 2003.

Citation

INFORMATION PROCESSING & MANAGEMENT, v.40, pp.145 - 159

ISSN
0306-4573
DOI
10.1016/S0306-4573(03)00028-1
URI
http://hdl.handle.net/10203/3580
Appears in Collection
CS-Journal Papers
