A comparison of collocation-based similarity measures in query expansion

Cited 45 time in webofscience Cited 0 time in scopus
  • Hit : 417
  • Download : 0
In this paper, we present a comparison of collocation-based similarity measures: Jaccard, Dice and Cosine similarity measures for the proper selection of additional search terms in query expansion. In addition, we consider two more similarity measures. average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean value of two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information, All these similarity measures are the functions of any two terms' frequencies and the collocation frequency, but are different in the methods of measurement. The selected measure changes the order of additional search terms and their weights, hence has a strong influence on the retrieval performance. In our experiments of query expansion using these five similarity measures, the additional search terms of Jaccard, Dice and Cosine similarity measures include more frequent terms with lower similarity values than ACP or NMI. In overall assessments of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas, NMI and ACP are better in terms of execution efficiency. (C) 1999 Elsevier Science Ltd. All rights reserved.
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Issue Date
1999-01
Language
English
Article Type
Article
Keywords

INFORMATION-RETRIEVAL; DOCUMENT

Citation

INFORMATION PROCESSING MANAGEMENT, v.35, no.1, pp.19 - 30

ISSN
0306-4573
URI
http://hdl.handle.net/10203/77471
Appears in Collection
CS-Journal Papers(저널논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 45 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0