ABSTRACT
We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing"). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild. This measurement is made possible by our open-source web privacy measurement tool, OpenWPM, which uses an automated version of a full-fledged consumer browser. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation. We demonstrate our platform's strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking behaviors.
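As an illustration of what a cookie-syncing measurement looks for, the following is a minimal sketch and not the method used by OpenWPM or by this study. It assumes a simplified heuristic: a cookie value that looks like an identifier and is owned by one host, then appears in the URL of a request to a different host, is flagged as a possible sync. The function names, the `MIN_ID_LENGTH` threshold, and the `.example` hosts are all hypothetical. Comparing full hostnames rather than registrable domains is a deliberate simplification.

```python
from urllib.parse import urlsplit

# Assumption: values shorter than this are unlikely to be user identifiers.
MIN_ID_LENGTH = 8


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def find_cookie_syncs(cookies, request_urls):
    """Flag requests whose URL carries an ID-like cookie value owned by another host.

    cookies: iterable of (host, name, value) tuples.
    request_urls: iterable of URL strings observed during a crawl.
    Returns a sorted list of (cookie_host, receiving_host, value) tuples.
    """
    # Keep only values long enough to plausibly be identifiers.
    ids = {
        (host.lower(), value)
        for host, _name, value in cookies
        if len(value) >= MIN_ID_LENGTH
    }
    syncs = set()
    for url in request_urls:
        receiver = host_of(url)
        for owner, value in ids:
            # A value showing up in a request to a different host suggests sharing.
            if receiver and receiver != owner and value in url:
                syncs.add((owner, receiver, value))
    return sorted(syncs)


if __name__ == "__main__":
    demo_cookies = [
        ("tracker-a.example", "uid", "a1b2c3d4e5f6"),
        ("tracker-a.example", "lang", "en"),
    ]
    demo_urls = [
        "https://tracker-b.example/sync?partner_uid=a1b2c3d4e5f6",
        "https://tracker-a.example/pixel?uid=a1b2c3d4e5f6",
    ]
    for owner, receiver, value in find_cookie_syncs(demo_cookies, demo_urls):
        print(f"{owner} -> {receiver}: {value}")
```

A real measurement would also need to handle encoded or hashed identifiers and to separate per-user identifiers from values shared by all users, which this sketch does not attempt.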
Index Terms
- Online Tracking: A 1-million-site Measurement and Analysis