Article

Page quality: in search of an unbiased web ranking

Authors:
Junghoo Cho, Sourashis Roy, and Robert E. Adams
UCLA Computer Science

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 2005, Pages 551–562
https://doi.org/10.1145/1066157.1066220

Published: 14 June 2005

ABSTRACT

In a number of recent studies [4, 8], researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend to get even more popular, while unpopular pages are ignored by the average user. This "rich-get-richer" phenomenon is particularly problematic for new, high-quality pages because they may never get a chance to attract users' attention, which decreases the overall quality of search results in the long run. In this paper, we propose a new ranking function, called page quality, that can alleviate this problem of popularity-based ranking. We first present a formal framework for studying search engine bias by discussing what an "ideal" way to measure the intrinsic quality of a page would be. We then compare how PageRank, the current ranking metric used by major search engines, differs from this ideal quality metric. This framework helps us investigate search engine bias in more concrete terms and provides a clear understanding of why PageRank is effective in many cases and exactly when it is problematic. We then propose a practical way to estimate intrinsic page quality that avoids the inherent bias of PageRank. We derive our proposed quality estimator through a careful analysis of a reasonable web user model, and we present experimental results that show the potential of the estimator. We believe that our quality estimator has the potential to alleviate the rich-get-richer phenomenon and help new, high-quality pages get the attention that they deserve.

References

[1] S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance computation. In Proceedings of the International World-Wide Web Conference, May 2003.
[2] D. Achlioptas, A. Fiat, A. R. Karlin, and F. McSherry. Web search via hub synthesis. In IEEE Symposium on Foundations of Computer Science, pages 500–509, 2001.
[3] R. Albert, A.-L. Barabasi, and H. Jeong. Diameter of the World Wide Web. Nature, 401(6749):130–131, September 1999.
[4] R. Baeza-Yates, F. Saint-Jean, and C. Castillo. Web dynamics, age and page quality. In Proceedings of SPIRE 2002, September 2002.
[5] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: authority-based keyword search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[6] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.
[7] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the International World-Wide Web Conference, May 2000.
[8] J. Cho and S. Roy. Impact of search engines on page popularity. In Proceedings of the International World-Wide Web Conference, May 2004.
[9] J. Cho, S. Roy, and R. E. Adams. Page quality: In search of an unbiased web ranking. Technical report, UCLA Computer Science, 2005.
[10] N. Fuhr. Probabilistic models in information retrieval. The Computer Journal, 35(3):243–255, 1992.
[11] F. Geerts, H. Mannila, and E. Terzi. Relational link-based ranking. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[12] R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 26–37, 1998.
[13] S. P. Harter. Variations in relevance assessments and the measurement of retrieval effectiveness. Journal of the American Society for Information Science, 47(1):37–49, December 1996.
[14] T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the International World-Wide Web Conference, May 2002.
[15] S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of pagerank. In Proceedings of International Conference on the Numerical Solution of Markov Chains, September 2003.
[16] S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Extrapolation methods for accelerating pagerank computations. In Proceedings of the International World-Wide Web Conference, May 2003.
[17] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, September 1999.
[18] S. Mizzaro. Measuring the agreement among relevance judges. In Proceedings of MIRA Conference, April 1999.
[19] Nielsen NetRatings. http://www.nielsen-netratings.com/.
[20] NPD search and portal site study. Available at http://www.npd.com/press/releases/press_000919.htm.
[21] S. Olsen. Does search engine's power threaten web's independence? Available at http://news.com.com/2009-1023-963618.html, October 2002.
[22] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University Database Group, 1998. Available at http://dbpubs.stanford.edu:8090/pub/1999-66.
[23] D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences, 99(8):5207–5211, 2002.
[24] S. E. Robertson and K. Sparck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3):129–146, 1975.
[25] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., 1971.
[26] G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
[27] J. A. Tomlin. A new paradigm for ranking pages on the world wide web. In Proceedings of the International World-Wide Web Conference, May 2003.
[28] TREC: Text retrieval conference. http://trec.nist.gov.
[29] A. C. Tsoi, G. Morini, F. Scarselli, M. Hagenbuchner, and M. Maggini. Adaptive ranking of web pages. In Proceedings of the International World-Wide Web Conference, May 2003.
[30] F. Verhulst. Nonlinear Differential Equations and Dynamical Systems. Springer Verlag, 2nd edition, 1997.
[31] Y. Wang and D. DeWitt. Computing pagerank in a distributed internet search system. In Proceedings of the International Conference on Very Large Databases (VLDB), August 2004.
[32] S. Wartick. Boolean operations. Information Retrieval: Data Structures and Algorithms, pages 264–292, 1992.


Published in
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN: 1595930604
DOI: 10.1145/1066157
Conference Chair: Fatma Ozcan, IBM Almaden Research Center
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%