ABSTRACT
Recent studies [4, 8] have found that, because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to become even more popular, while unpopular pages are ignored by the average user. This "rich-get-richer" phenomenon is particularly problematic for new and high-quality pages because they may never get a chance to attract users' attention, decreasing the overall quality of search results in the long run. In this paper, we propose a new ranking function, called page quality, that can alleviate the problem of popularity-based ranking. We first present a formal framework for studying search engine bias by discussing what an "ideal" way to measure the intrinsic quality of a page would be. We then examine how PageRank, the current ranking metric used by major search engines, differs from this ideal quality metric. This framework helps us investigate search engine bias in more concrete terms and provides a clear understanding of why PageRank is effective in many cases and exactly when it is problematic. We then propose a practical way to estimate intrinsic page quality that avoids the inherent bias of PageRank. We derive our proposed quality estimator through a careful analysis of a reasonable web user model, and we present experimental results that show the potential of our proposed estimator. We believe that our quality estimator has the potential to alleviate the rich-get-richer phenomenon and help new and high-quality pages get the attention that they deserve.
- S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance computation. In Proceedings of the International World-Wide Web Conference, May 2003.
- D. Achlioptas, A. Fiat, A. R. Karlin, and F. McSherry. Web search via hub synthesis. In IEEE Symposium on Foundations of Computer Science, pages 500--509, 2001.
- R. Albert, A.-L. Barabasi, and H. Jeong. Diameter of the World Wide Web. Nature, 401(6749):130--131, September 1999.
- R. Baeza-Yates, F. Saint-Jean, and C. Castillo. Web dynamics, age and page quality. In Proceedings of SPIRE 2002, September 2002.
- A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: authority-based keyword search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
- A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.
- A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the International World-Wide Web Conference, May 2000.
- J. Cho and S. Roy. Impact of search engines on page popularity. In Proceedings of the International World-Wide Web Conference, May 2004.
- J. Cho, S. Roy, and R. E. Adams. Page quality: In search of an unbiased web ranking. Technical report, UCLA Computer Science, 2005.
- N. Fuhr. Probabilistic models in information retrieval. The Computer Journal, 35(3):243--255, 1992.
- F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
- R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 26--37, 1998.
- S. P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47(1):37--49, December 1996.
- T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the International World-Wide Web Conference, May 2002.
- S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of pagerank. In Proceedings of International Conference on the Numerical Solution of Markov Chains, September 2003.
- S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Extrapolation methods for accelerating pagerank computations. In Proceedings of the International World-Wide Web Conference, May 2003.
- J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, September 1999.
- S. Mizzaro. Measuring the agreement among relevance judges. In Proceedings of MIRA Conference, April 1999.
- Nielsen NetRatings. http://www.nielsen-netratings.com/.
- NPD search and portal site study. Available at http://www.npd.com/press/releases/press_000919.htm.
- S. Olsen. Does search engine's power threaten web's independence? Available at http://news.com.com/2009--1023-963618.html, October 2002.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University Database Group, 1998. Available at http://dbpubs.stanford.edu:8090/pub/1999--66.
- D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99(8):5207--5211, 2002.
- S. E. Robertson and K. Sparck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129--146, 1975.
- G. Salton. The SMART Retrieval System -- Experiments in Automatic Document Processing. Prentice Hall Inc., 1971.
- G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
- J. A. Tomlin. A new paradigm for ranking pages on the world wide web. In Proceedings of the International World-Wide Web Conference, May 2003.
- TREC: Text retrieval conference. http://trec.nist.gov.
- A. C. Tsoi, G. Morini, F. Scarselli, M. Hagenbuchner, and M. Maggini. Adaptive ranking of web pages. In Proceedings of the International World-Wide Web Conference, May 2003.
- F. Verhulst. Nonlinear Differential Equations and Dynamical Systems. Springer Verlag, 2nd edition, 1997.
- Y. Wang and D. DeWitt. Computing pagerank in a distributed internet search system. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
- S. Wartick. Boolean operations. In Information Retrieval: Data Structures and Algorithms, pages 264--292, 1992.
- Page quality: in search of an unbiased web ranking