DOI: 10.1145/1066157.1066220
Article

Page quality: in search of an unbiased web ranking

Published: 14 June 2005

ABSTRACT

In a number of recent studies [4, 8], researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to get even more popular, while unpopular pages get ignored by the average user. This "rich-get-richer" phenomenon is particularly problematic for new and high-quality pages because they may never get a chance to get users' attention, decreasing the overall quality of search results in the long run. In this paper, we propose a new ranking function, called page quality, that can alleviate the problem of popularity-based ranking. We first present a formal framework to study the search engine bias by discussing what an "ideal" way to measure the intrinsic quality of a page would be. We then compare how PageRank, the current ranking metric used by major search engines, differs from this ideal quality metric. This framework will help us investigate the search engine bias in more concrete terms and provide a clear understanding of why PageRank is effective in many cases and exactly when it is problematic. We then propose a practical way to estimate the intrinsic page quality to avoid the inherent bias of PageRank. We derive our proposed quality estimator through a careful analysis of a reasonable web user model, and we present experimental results that show the potential of our proposed estimator. We believe that our quality estimator has the potential to alleviate the rich-get-richer phenomenon and help new and high-quality pages get the attention that they deserve.

References

  1. S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance computation. In Proceedings of the International World-Wide Web Conference, May 2003.
  2. D. Achlioptas, A. Fiat, A. R. Karlin, and F. McSherry. Web search via hub synthesis. In IEEE Symposium on Foundations of Computer Science, pages 500--509, 2001.
  3. R. Albert, A.-L. Barabasi, and H. Jeong. Diameter of the World Wide Web. Nature, 401(6749):130--131, September 1999.
  4. R. Baeza-Yates, F. Saint-Jean, and C. Castillo. Web dynamics, age and page quality. In Proceedings of SPIRE 2002, September 2002.
  5. A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: authority-based keyword search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
  6. A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, October 1999.
  7. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the International World-Wide Web Conference, May 2000.
  8. J. Cho and S. Roy. Impact of search engines on page popularity. In Proceedings of the International World-Wide Web Conference, May 2004.
  9. J. Cho, S. Roy, and R. E. Adams. Page quality: In search of an unbiased web ranking. Technical report, UCLA Computer Science, 2005.
  10. N. Fuhr. Probabilistic models in information retrieval. The Computer Journal, 35(3):243--255, 1992.
  11. F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
  12. R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 26--37, 1998.
  13. S. P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47(1):37--49, December 1996.
  14. T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the International World-Wide Web Conference, May 2002.
  15. S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of pagerank. In Proceedings of International Conference on the Numerical Solution of Markov Chains, September 2003.
  16. S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Extrapolation methods for accelerating pagerank computations. In Proceedings of the International World-Wide Web Conference, May 2003.
  17. J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, September 1999.
  18. S. Mizzaro. Measuring the agreement among relevance judges. In Proceedings of MIRA Conference, April 1999.
  19. Nielsen NetRatings. http://www.nielsen-netratings.com/.
  20. NPD search and portal site study. Available at http://www.npd.com/press/releases/press_000919.htm.
  21. S. Olsen. Does search engine's power threaten web's independence? Available at http://news.com.com/2009--1023-963618.html, October 2002.
  22. L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University Database Group, 1998. Available at http://dbpubs.stanford.edu:8090/pub/1999--66.
  23. D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99(8):5207--5211, 2002.
  24. S. E. Robertson and K. Sparck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129--146, 1975.
  25. G. Salton. The SMART Retrieval System -- Experiments in Automatic Document Processing. Prentice Hall Inc., 1971.
  26. G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
  27. J. A. Tomlin. A new paradigm for ranking pages on the world wide web. In Proceedings of the International World-Wide Web Conference, May 2003.
  28. TREC: Text retrieval conference. http://trec.nist.gov.
  29. A. C. Tsoi, G. Morini, F. Scarselli, M. Hagenbuchner, and M. Maggini. Adaptive ranking of web pages. In Proceedings of the International World-Wide Web Conference, May 2003.
  30. F. Verhulst. Nonlinear Differential Equations and Dynamical Systems. Springer Verlag, 2nd edition, 1997.
  31. Y. Wang and D. DeWitt. Computing pagerank in a distributed internet search system. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
  32. S. Wartick. Boolean operations. Information Retrieval: Data Structures and Algorithms, pages 264--292, 1992.
    Published in

      SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
      June 2005
      990 pages
      ISBN: 1595930604
      DOI: 10.1145/1066157
      Conference Chair: Fatma Ozcan

      Copyright © 2005 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
