research-article

Latent Dirichlet allocation based diversified retrieval for e-commerce search

Authors:
Jun Yu, Oregon State University, Corvallis, OR, USA
Sunil Mohan, eBay Inc., San Jose, CA, USA
Duangmanee (Pew) Putthividhya, Google, Mountain View, CA, USA
Weng-Keen Wong, Oregon State University, Corvallis, OR, USA

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, pages 463–472. https://doi.org/10.1145/2556195.2556215

Published: 24 February 2014

ABSTRACT

Diversified retrieval is an important problem on many e-commerce sites, e.g., eBay and Amazon. Standard IR approaches that do not optimize for diversity return a clutter of redundant items belonging to the same product. Existing product taxonomies are often too noisy, with overlapping structures and non-uniform granularity, to be used directly in diversified retrieval. To address this problem, we propose a Latent Dirichlet Allocation (LDA) based diversified retrieval approach that selects diverse items based on hidden user intents. Our approach first discovers the hidden user intents of a query using the LDA model, then ranks these intents by trading off their relevance against their information novelty, and finally chooses the most representative item for each user intent to display. To evaluate the diversity of search results on e-commerce sites, we propose a new metric, average satisfaction, which measures user satisfaction with the search results. Through an empirical study on eBay, we show that the LDA model discovers meaningful user intents and that the LDA-based approach provides significantly higher user satisfaction than the eBay production ranker and three other diversified retrieval approaches.
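The last two steps of the approach (ranking intents by trading relevance against novelty, then picking one representative item per intent) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact formulation. It assumes an LDA model has already been fitted on item text and has produced per-item intent proportions `theta` and per-intent word distributions `phi`. It approximates an intent's relevance as its average share across the retrieved items, and it swaps in a greedy Maximal Marginal Relevance (MMR) style rule, using cosine similarity between intents as the novelty penalty. The function names, the trade-off weight `lam`, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical inputs, assumed to come from an LDA model already fitted on item text:
#   theta[item] -> intent proportions for that item (one entry per intent)
#   phi[k]      -> word distribution of intent k over a shared vocabulary


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intent_relevance(theta: Dict[str, List[float]]) -> List[float]:
    """Assumed relevance score: each intent's average share across the retrieved items."""
    n_intents = len(next(iter(theta.values())))
    return [sum(t[k] for t in theta.values()) / len(theta) for k in range(n_intents)]


def rank_intents_mmr(rel: List[float], phi: List[List[float]], lam: float = 0.7) -> List[int]:
    """Greedy MMR-style ordering: reward relevance, penalize similarity to intents already picked."""
    remaining = list(range(len(rel)))
    order: List[int] = []
    while remaining:
        def mmr_score(k: int) -> float:
            redundancy = max((cosine(phi[k], phi[j]) for j in order), default=0.0)
            return lam * rel[k] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)  # ties go to the lowest intent index
        order.append(best)
        remaining.remove(best)
    return order


def diversified_results(theta: Dict[str, List[float]], phi: List[List[float]],
                        top_n: int, lam: float = 0.7) -> List[str]:
    """Walk intents in MMR order; show the unused item with the highest share of each intent."""
    if not theta:
        return []
    results: List[str] = []
    for k in rank_intents_mmr(intent_relevance(theta), phi, lam):
        candidates = [d for d in theta if d not in results]
        if len(results) == top_n or not candidates:
            break
        results.append(max(candidates, key=lambda d: theta[d][k]))
    return results


# Toy data (made up for illustration): intents 0 and 1 use similar words, intent 2 is distinct.
phi = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
theta = {
    "item_a": [0.8, 0.1, 0.1],
    "item_b": [0.1, 0.8, 0.1],
    "item_c": [0.5, 0.4, 0.1],
    "item_d": [0.1, 0.1, 0.8],
}
print(diversified_results(theta, phi, top_n=3))  # ['item_a', 'item_d', 'item_b']
```

With `lam=1.0` the novelty penalty vanishes and intents are ordered purely by the assumed relevance (`[0, 1, 2]`), so the two near-duplicate intents fill the first two slots. The default trade-off promotes the distinct intent 2 to second place instead, which is the kind of redundancy reduction the abstract describes.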

Index Terms

1. Information systems
  1. Information retrieval

Recommendations

On Application of Learning to Rank for E-Commerce Search
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

E-Commerce (E-Com) search is an emerging important new application of information retrieval. Learning to Rank (LETOR) is a general effective strategy for optimizing search engines, and is thus also a key technology for E-Com search. While the use of ...

Latent dirichlet allocation based multi-document summarization
AND '08: Proceedings of the second workshop on Analytics for noisy unstructured text data

Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. In this article we use Latent Dirichlet Allocation to capture the events being ...

The sensitivity of latent Dirichlet allocation for information retrieval
ECMLPKDD'09: Proceedings of the 2009th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II

It has been shown that the use of topic models for Information retrieval provides an increase in precision when used in the appropriate form. Latent Dirichlet Allocation (LDA) is a generative topic model that allows us to model documents using a ...

Published in
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195
General Chairs: Ben Carterette (University of Delaware, USA), Fernando Diaz (Microsoft Research, USA)
Program Chairs: Carlos Castillo (Qatar Computing Research Institute, Qatar), Donald Metzler (Google, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
diversified retrieval
e-commerce search
latent dirichlet allocation
Acceptance Rates
WSDM '14 paper acceptance rate: 64 of 355 submissions, 18%
Overall acceptance rate: 498 of 2,863 submissions, 17%
Article Metrics
- Total Citations: 29
- Total Downloads: 347
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 2