ABSTRACT
Measuring the quality of web content, at either the page level or the website level, is at the heart of several key challenges on the Web. The main one is undoubtedly web search, where quality is needed to rank results. Other important problems include web reputation or trust, and web spam detection and filtering. Measuring intrinsic web quality is nevertheless a hard problem, because our (automatic) understanding of text semantics is limited, and it is even more limited for other media. Hence, just as humans assess trust using past actions, facial expressions, body language, and so on, on the Web we need to use indirect signals that serve as surrogates for web quality. In this keynote we attempt to present the most important signals, as well as new signals that are or can be used to measure quality on the Web. We divide them using the traditional web content, structure, and usage trilogy. We also characterize them according to how easy it is to measure these signals, who can measure them, and how well they scale to the whole Web.