ABSTRACT
We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms in order to generate a set of disjunction-free query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating the full compound forms and rephrasing the original query, e.g. "rice im- and export" is transformed into "rice import and export". The query variants are then processed separately, and their retrieval results are merged using a standard data fusion technique. We evaluate the approach on standard German IR benchmarking data. The results show that: i) our proposed approach to generating compounds from hyphen coordinations produces the correct results for all test topics; ii) our proposed heuristics for identifying coordinations and generating query variants, which are based on shallow natural language processing (NLP) techniques, are highly accurate on the topics and do not rely on parsing or part-of-speech tagging; iii) using query variants to produce multiple retrieval results and merging those results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can yield a significant improvement over the standard BRF baseline that uses the original queries.
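The two core steps described above, expanding a coordination into disjunction-free variants and fusing the per-variant result lists, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the function names `query_variants` and `comb_sum` are hypothetical, only single-word conjuncts are handled, English "and"/"or" stand in for the German conjunctions, and CombSUM with min-max score normalization is used as one common example of a standard data fusion technique (the abstract does not name the one actually used). Hyphen coordinations are assumed to have been resolved beforehand.

```python
from collections import defaultdict
from itertools import product

# Assumption: English stand-ins for the German conjunctions "und"/"oder".
CONJUNCTIONS = {"and", "or"}


def query_variants(query):
    """Expand single-word coordinations ("A and B") into disjunction-free variants."""
    tokens = query.split()
    slots = []  # each slot holds the alternative terms for one query position
    i = 0
    while i < len(tokens):
        alternatives = [tokens[i]]
        # Absorb every "<conjunction> <term>" pair that follows the current term.
        while i + 2 < len(tokens) and tokens[i + 1].lower() in CONJUNCTIONS:
            alternatives.append(tokens[i + 2])
            i += 2
        slots.append(alternatives)
        i += 1
    # One variant per combination of alternatives across all slots.
    return [" ".join(choice) for choice in product(*slots)]


def comb_sum(runs, k=10):
    """Fuse result lists (dicts of doc id -> score) by summing min-max normalized scores."""
    fused = defaultdict(float)
    for run in runs:
        if not run:
            continue
        lowest, highest = min(run.values()), max(run.values())
        span = highest - lowest
        for doc, score in run.items():
            # A run whose scores are all equal contributes 1.0 per document.
            fused[doc] += (score - lowest) / span if span else 1.0
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:k]


variants = query_variants("rice import and export")
# Hypothetical retrieval scores for the two variants, for illustration only.
runs = [
    {"d1": 2.0, "d2": 1.0, "d3": 0.0},  # results for the first variant
    {"d2": 3.0, "d3": 1.0},             # results for the second variant
]
ranking = comb_sum(runs)
```

In this sketch each variant would be submitted to the retrieval system on its own, and only the resulting ranked lists are combined. A document retrieved by several variants accumulates score, which is the usual motivation for sum-based fusion.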
Index Terms
- Interpretation of coordinations, compound generation, and result fusion for query variants