Who Knows Where I’ve Been?

March 27, 2011

Figure: Fraction of top 100,000 webpages which contain elements from each network.

Just how much of our behavior online is being tracked, collated, and data-mined has been a subject of some public debate recently. The Department of Commerce, at the urging of consumer privacy advocates like the Electronic Frontier Foundation, has been discussing a requirement that advertisers honor consumer opt-outs. Meanwhile, Google and a host of other large advertising networks have pushed a cookie-based opt-out mechanism (which advertising networks would voluntarily comply with), Internet Explorer 9 has implemented a more aggressive third-party resource filter, and Firefox has announced a plan for a different, also-voluntary opt-out mechanism that is distinct from the cookie-based approach supported by Google and others. The industry seems likely to accept some voluntary limits. Yet little empirical data exists quantifying the extent of behavioral tracking.

In July of last year, The Wall Street Journal published What They Know, a survey of the prevalence of behavioral tracking networks among the top fifty websites. (A subsequent piece explored the privacy implications of popular smartphone apps.) The Journal’s manual approach yielded deep insight into the data gathered, including a detailed view of the privacy and data-retention policies of the most prevalent networks, but it also limited the data gathered to a very small subset of the entire World Wide Web.

I have attempted to gather data in a more scalable way, in order to answer the question of just how much of all online behavior is visible to a handful of advertising networks.

