International Data Privacy Day 2015

Last Tuesday (2015-01-27) was International Data Privacy Day. I'm not sure if that is an annual thing or if it is something that only started this year, but I saw it on the Mozilla Blog and it is certainly something I support.

Such days exist to raise awareness for a cause, and I decided it would be interesting to do a simple experiment on the topic. Privacy and anonymity are things I have written about before, but I wanted to know just how much tracking is being done (or at least could be), how effective simple methods could be at protecting online privacy, and how much of an inconvenience it would be to go to great lengths to ensure privacy.

The setup


Privacy Settings in Firefox

I used Firefox 37 Developer Edition on Ubuntu 14.04, but this will work on any fairly current version of Firefox on any operating system.

Settings:

  • Enable the DNT header
  • Allow third-party cookies only from visited sites
  • Disable Flash, etc.
  • Set the default search engine to DuckDuckGo

Add-ons:

The experiment:

The settings above are actually my normal settings. I consider them to grant the most privacy without sacrificing convenience and without requiring any complicated setup.

Tracking online tracking is difficult. Lightbeam does a good job of visualizing which sites are aware of your activity across the Internet, but it does not necessarily distinguish between tracking sites and CDNs. It does, however, at least show visited vs. third-party sites as well as cookies, which does give a decent idea of magnitude and method. Visited sites appear as circles, third-party as triangles. Content loaded from other domains shows as a white line, and third-party cookies show as purple lines.

DNT is mostly ineffective because it is simply a way of requesting that sites do not track you, but tracking is not clearly defined and there is no obligation for any site to honor your request. It's a nice idea, having been modelled after the Do Not Call list for phones, but without any legislation or even agreement on the definition of tracking, it only makes a difference on sites that probably already limit their tracking to begin with.
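To make the voluntary nature of DNT concrete, here is a minimal sketch of how a cooperating site might check for it. The function name `visitor_opted_out` and the plain dictionary of headers are assumptions made up for illustration; the only real detail is that the browser sends `DNT: 1` along with each request.

```python
def visitor_opted_out(headers):
    """Return True if the request carries a `DNT: 1` header.

    `headers` is a plain dict of request headers (hypothetical input).
    HTTP header names are case-insensitive, so normalize before lookup.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


# A site that chooses to honor the request would skip its tracking code
# when this returns True. Nothing forces a site to run such a check at all.
request_headers = {"User-Agent": "Mozilla/5.0", "DNT": "1"}
print(visitor_opted_out(request_headers))
```

The check is trivial, which is the point: whether anything changes depends entirely on what the site decides to do with the answer.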

Most sites which display ads, Facebook Like/share buttons, or any other kind of content from another domain also set cookies in the process. When you see a Like button on http://example.com/page-1, your browser is loading a number of resources from Facebook itself even though you are on a completely different site. Facebook sets a cookie for itself to remember you and is aware of your User Agent (a small amount of text your browser uses to identify itself which contains browser name, version, and operating system), your external IP address, and a Referrer header which identifies the page you are currently visiting. Using this data, Facebook, ad networks, and a wide variety of other third-party services are able to build profiles of almost every user of every site across the entire Internet. Allowing third-party cookies only from sites you visit makes you less personally identifiable and limits the lifetime of any profile to however long you keep a particular IP address (forever if you have a static IP address). These sites might not even track users by IP address if they're setting cookies, since cookies eliminate the need to record IP addresses.
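As a rough illustration of why the cookie matters, the sketch below groups logged requests into profiles the way a third party could. The log format, the `build_profiles` name, and the sample addresses are all hypothetical; real services are certainly more elaborate. Note that the header is spelled `Referer` on the wire.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_profiles(requests_seen):
    """Group the sites each visitor was seen on, as a third party could.

    Each item in `requests_seen` is a dict describing one request
    (hypothetical log format). Visitors are keyed by cookie when one is
    present, otherwise by IP address plus User-Agent.
    """
    profiles = defaultdict(set)
    for request in requests_seen:
        cookie = request.get("Cookie")
        if cookie:
            visitor = ("cookie", cookie)
        else:
            visitor = ("ip+ua", request["ip"], request["User-Agent"])
        # The Referer header names the page that embedded the resource.
        site = urlsplit(request["Referer"]).hostname
        profiles[visitor].add(site)
    return dict(profiles)


# The same browser seen from two different IP addresses.
log = [
    {"ip": "192.0.2.1", "User-Agent": "Firefox", "Cookie": "id=abc",
     "Referer": "http://example.com/page-1"},
    {"ip": "198.51.100.7", "User-Agent": "Firefox", "Cookie": "id=abc",
     "Referer": "http://example.org/"},
]
without_cookies = [
    {key: value for key, value in request.items() if key != "Cookie"}
    for request in log
]

print(len(build_profiles(log)))              # one profile, tied to the cookie
print(len(build_profiles(without_cookies)))  # one profile per IP address
```

With the cookie, both visits land in a single profile even though the IP address changed. Without it, the profile breaks as soon as the address does.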

Privacy Badger takes this a step further by not even allowing any content to be loaded from known tracking sites unless it is absolutely necessary for the page to function, in which case it will allow the content but block the cookie. I find it to be more effective and easier to use than AdBlock Plus, and also somewhat more trustworthy since it is by the Electronic Frontier Foundation, who have an extraordinary reputation for fighting for privacy and other online rights.
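The three outcomes described above can be sketched as a small decision function. This is not Privacy Badger's actual code or API; the function name `decide`, its parameters, and the sample domains are assumptions used only to restate the behaviour in this paragraph.

```python
def decide(domain, known_trackers, needed_for_page):
    """Pick an action for a third-party domain (illustrative only).

    known_trackers:  set of domains considered trackers (hypothetical input)
    needed_for_page: set of domains the page cannot function without
    """
    if domain not in known_trackers:
        return "allow"
    if domain in needed_for_page:
        # Load the content, but do not send or store cookies.
        return "allow_without_cookies"
    return "block"


trackers = {"tracker.example", "cdn-with-cookies.example"}
required = {"cdn-with-cookies.example"}

for third_party in ("fonts.example", "cdn-with-cookies.example", "tracker.example"):
    print(third_party, decide(third_party, trackers, required))
```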

DuckDuckGo is somewhat famous for being The search engine that doesn't track you. It does not log your IP address or search queries, and the only cookie which it might set is data for your settings (which is not personally identifiable). It also does not reveal your query to any site you visit.

I wanted to know just how effective all of these settings and add-ons were, so I spent two days on normal browsing. On the first day I had all privacy settings enabled, and on the second I attempted to re-enact my browsing from the first day with privacy settings disabled.

Day 1


47 sites visited, 118 third-party sites, no third-party cookies


These results are pretty decent, yet could also be seen as very concerning. Even though they were unable to set cookies, there are still a number of sites which would be able to track me through my IP address and UA string across quite a number of sites. Unfortunately, even private browsing would be useless against tracking of this kind, and the only way to prevent it would be to use Tor in order to disconnect me from my IP address. Of course, most of these are CDNs, and it is impossible to distinguish between the tracking which might be happening and the tracking which is taking place. Just because I'm loading a font from Google does not mean that my IP address is being logged in the process.

Day 2


46 sites visited, 290 third-party sites, multiple tracking cookies

For the second day, I disabled all privacy settings and add-ons and attempted to re-create the browsing history of the first day. Unlike the results from the first day, the second day did result in third-party cookies being set, which is almost a guarantee of tracking taking place. This is also in addition to any tracking through IP address, UA string, and Referrer header.

You will notice that the graph is much more filled in, third-party sites are much larger (meaning more sites connecting to them), and there are almost no sites at all which are not connected to something else. The number of third-party sites is almost 2.5 times what it was when I had strict privacy settings, and I have suspicions that some of my privacy settings could not be entirely undone (some things were still being blocked).
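For reference, the figures quoted for the two days work out as follows. The numbers come straight from the two result captions; the variable names are just for illustration.

```python
day1 = {"visited": 47, "third_party": 118}  # privacy settings enabled
day2 = {"visited": 46, "third_party": 290}  # privacy settings disabled

# How many times more third-party sites appeared on day 2.
increase = day2["third_party"] / day1["third_party"]

# Average number of third-party sites per visited site on each day.
per_site_day1 = day1["third_party"] / day1["visited"]
per_site_day2 = day2["third_party"] / day2["visited"]

print(round(increase, 2))       # 2.46
print(round(per_site_day1, 1))  # 2.5
print(round(per_site_day2, 1))  # 6.3
```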

The verdict

It seems as though a few fairly simple settings and a small number of add-ons can be very effective at stopping online tracking. It is still entirely possible that certain CDNs could be using your IP address, UA string, and Referrer headers to track you, but this is entirely speculation and the only way to stop that would be to use Tor or disable all images, JavaScript, and even stylesheets. Even then, there could be tracking done server-side by alerting third parties of your IP address, UA string, and current URL. I'm not saying that you should use Tor all of the time because, unless there is a third-party cookie set, any tracking is invisible to the user and something which only might be taking place.

I've attempted using Tor full-time, but found it far too slow and inconvenient. There are too many captchas, quite a few sites block traffic entirely, the impact on speed is too great, disabling JavaScript (NoScript) breaks too many sites, and any single login forfeits anonymity. Aside from all of that, extended periods on Tor reduce its effectiveness because, even though it does not reveal your real IP address, your browsing starts to build more and more data about you. Still, I'm experimenting now with using Tor without the browser bundle and setting it as a system-wide proxy, allowing me to use my regular version of Firefox as well as an email client and instant messengers. If nothing else, at least it makes Bitcoin more anonymous.

Whatever you take from this is really up to you. I've collected some data and it is up to you to determine what to do with it. Some might interpret all of this as proof that no amount of effort is effective enough to make a difference in the losing battle for online privacy, while others will see it as proof that a few simple changes make all of the difference in the world. It all depends on how much trust you have for a few CDNs and how much privacy you think is the minimum amount you need.