ABSTRACT
Diversified retrieval is an important problem on many e-commerce sites, e.g. eBay and Amazon. IR approaches that do not optimize for diversity return a clutter of redundant items that belong to the same products. Existing product taxonomies are often too noisy, with overlapping structures and non-uniform granularity, to be used directly in diversified retrieval. To address this problem, we propose a Latent Dirichlet Allocation (LDA) based diversified retrieval approach that selects diverse items based on hidden user intents. Our approach first discovers the hidden user intents of a query using the LDA model, and then ranks the user intents by trading off their relevance against their information novelty. Finally, it chooses the most representative item for each user intent to display. To evaluate the diversity of search results on e-commerce sites, we propose a new metric, average satisfaction, which measures user satisfaction with the search results. Through an empirical study on eBay, we show that the LDA model discovers meaningful user intents and that the LDA-based approach provides significantly higher user satisfaction than the eBay production ranker and three other diversified retrieval approaches.
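The ranking and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's actual formulation: the abstract does not give the scoring function, so the sketch assumes a greedy, MMR-style trade-off in which novelty is measured as low cosine similarity to already selected intents. The function names (`rank_intents`, `pick_representatives`), the `trade_off` parameter, and the input layout are hypothetical, and the per-intent relevance scores and distributions are assumed to come from an LDA model fitted elsewhere.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_intents(intent_relevance, intent_word_dists, trade_off=0.5):
    """Greedily order intents by relevance minus redundancy.

    Assumption: redundancy is the maximum cosine similarity between an
    intent's word distribution and those of intents already selected.
    """
    remaining = set(range(len(intent_relevance)))
    ordered = []
    while remaining:
        def score(k):
            redundancy = max(
                (cosine(intent_word_dists[k], intent_word_dists[j])
                 for j in ordered),
                default=0.0,
            )
            return (trade_off * intent_relevance[k]
                    - (1 - trade_off) * redundancy)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        ordered.append(best)
        remaining.remove(best)
    return ordered


def pick_representatives(ordered_intents, item_intent_probs):
    """For each intent, in rank order, pick the not-yet-chosen item with
    the highest probability under that intent."""
    chosen, used = [], set()
    for k in ordered_intents:
        candidates = [i for i in range(len(item_intent_probs))
                      if i not in used]
        if not candidates:
            break
        best = max(candidates, key=lambda i: item_intent_probs[i][k])
        chosen.append(best)
        used.add(best)
    return chosen


# Toy inputs (made up for illustration, not taken from the paper).
relevance = [0.6, 0.3, 0.1]                 # relevance of each intent
word_dists = [[0.7, 0.2, 0.1],              # intent 0
              [0.6, 0.3, 0.1],              # intent 1, close to intent 0
              [0.1, 0.1, 0.8]]              # intent 2, distinct
item_probs = [[0.90, 0.05, 0.05],           # per-item intent probabilities
              [0.80, 0.10, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.10, 0.80]]

order = rank_intents(relevance, word_dists)
items = pick_representatives(order, item_probs)
```

In this toy example, intent 1 is more relevant than intent 2 but nearly duplicates intent 0, so the novelty term pushes it to the end of the ordering. Setting `trade_off=1.0` falls back to ordering by relevance alone.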
Index Terms
- Latent dirichlet allocation based diversified retrieval for e-commerce search