Launching the Mozilla Plugin Privacy Test Database

Today, I launched the Plugin Privacy Test Database for Mozilla plugins. The tests attempt to determine whether plugins passively gather data about users' browsing habits. In this introductory blog post, I will outline the test methods used and some limitations of the results.

For a step-by-step walk-through of a manual plugin test, see my previous post on privacy testing a browser plugin.

If you just want to see the results, head on over to the plugin privacy test database page.

Testing Goals

I use a variety of browser plugins - but I am wary of installing something that has full access to all my browsing history and web activity. I routinely test plugins that I plan to install, but not everyone has the technical ability or time to do so. I therefore wanted to set up a project anyone could use, one that automatically gives some insight into whether a particular plugin tracks its users.

In particular, I wanted to know which plugins passively track all page views (or more) without any user interaction after the initial install, as I see these as the most dangerous plugins from a privacy perspective.

Scraping Plugin Data

First, I needed a list of all plugins. I used a custom Scrapy crawler to crawl the Mozilla plugin pages. I limit data collection to plugins with more than 1,000 installs (a little over 1,300 plugins at the time of this writing), collecting the location of the plugin install file, the number of users, the author, the version, and the date of the last plugin update.

The scraper itself is not very interesting. One important detail: the User-Agent string must match a recent version of Firefox just to view the page, since the plugin list is filtered by browser type and version. If you are running this yourself, the User-Agent should match the browser Selenium will test with, to avoid compatibility errors during testing.
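
As an illustration, the relevant part of the spider can be as small as a custom_settings override. This is a minimal sketch; the User-Agent value and CSS selectors below are illustrative placeholders, not the project's exact crawler:

import scrapy

class AddonSpider(scrapy.Spider):
    name = "addon_spider"
    # addons.mozilla.org filters the listing by browser, so the
    # User-Agent must look like a recent Firefox (illustrative version)
    custom_settings = {
        "USER_AGENT": ("Mozilla/5.0 (X11; Linux x86_64; rv:68.0) "
                       "Gecko/20100101 Firefox/68.0"),
        "DOWNLOAD_DELAY": 2,  # be polite to Mozilla's servers
    }
    start_urls = ["https://addons.mozilla.org/en-US/firefox/extensions/"]

    def parse(self, response):
        # Hypothetical selectors - the real listing markup differs
        for addon in response.css("div.addon-listing-item"):
            yield {
                "name": addon.css("a.addon-name::text").get(),
                "install_url": addon.css("a.install-link::attr(href)").get(),
            }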

From here, I ingest the resulting output file into a database for further processing.
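
The ingestion step is a straightforward bulk insert. Here is a minimal sketch using psycopg2; the database name, table schema, and file name are my own placeholders, not necessarily the project's:

import csv
import psycopg2

conn = psycopg2.connect("dbname=plugin_privacy")  # assumed database name
with conn, conn.cursor() as cur:
    with open("plugins.csv") as f:  # output file from the scraper
        for row in csv.DictReader(f):
            cur.execute(
                """INSERT INTO plugins
                       (name, install_url, users, author, version, updated)
                   VALUES (%s, %s, %s, %s, %s, %s)""",
                (row["name"], row["install_url"], row["users"],
                 row["author"], row["version"], row["updated"]),
            )
conn.close()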

Testing Methodology

Once the plugin data has been ingested, I set up a browser using Selenium WebDriver, download and install a given plugin, and start a Firefox session. Many plugins open a welcome or instructions page on first install. I didn't want that traffic to cloud the results, so I attempt to remove it by pausing execution while the page loads, then closing any tabs the plugin opened on initial install.

I then proxy the browser through a ZAP daemon (an intercepting proxy) and browse to a few sites that I know make only first-party requests (no calls out to places like Google Analytics or CDNs, which would make it unclear whether the plugin or the visited page initiated a request). For the first iteration, these are Wikipedia and DuckDuckGo.
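
For reference, here is roughly how the proxy is wired up. This is a minimal sketch assuming ZAP is listening on its default address of 127.0.0.1:8080 and the zapv2 Python client is used; the exact configuration in the project may differ:

from selenium import webdriver
from zapv2 import ZAPv2

ZAP_ADDRESS = "127.0.0.1:8080"  # assumed ZAP daemon address (the default)

# Point Firefox at the ZAP daemon for both HTTP and HTTPS traffic
desired_capabilities = webdriver.DesiredCapabilities.FIREFOX.copy()
desired_capabilities["proxy"] = {
    "proxyType": "manual",
    "httpProxy": ZAP_ADDRESS,
    "sslProxy": ZAP_ADDRESS,
}

# Client for the ZAP API - `core` is the object used in the test code below
zap = ZAPv2(proxies={"http": "http://" + ZAP_ADDRESS,
                     "https": "http://" + ZAP_ADDRESS})
core = zap.core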

After this, I collect the data from the proxy, filter out requests to Mozilla, Wikipedia, or DuckDuckGo, and store the results in the database.

The core test code looks like this:

import time
from selenium import webdriver

def run_test(binary, extension_path, test_site_list, desired_capabilities, core):
    # desired_capabilities includes the ZAP proxy configuration (see above)
    driver = webdriver.Firefox(firefox_binary=binary,
                               executable_path='./geckodriver',
                               capabilities=desired_capabilities)
    try:
        # Install the plugin (downloaded in a previous step)
        ext_id = driver.install_addon(extension_path, temporary=True)

        # Plugin may load a "welcome" page, which we don't want to include here.
        time.sleep(5)  # Give any welcome page time to load
        if len(driver.window_handles) > 1:
            for handle in driver.window_handles[1:]:
                driver.switch_to.window(handle)
                driver.close()
        driver.switch_to.window(driver.window_handles[0])

        # Now that we have closed any possible welcome pages,
        # start tracking new URLs
        core.new_session(name="plugin_test", overwrite=True)

        for test in test_site_list:
            if "url" in test:
                driver.get(test["url"])
    finally:
        driver.quit()  # Close this browser session

    return core.sites  # Get all the sites seen by the ZAP proxy
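
The filtering step mentioned earlier amounts to a hostname allowlist over the sites the proxy recorded. A minimal sketch - the suffix list and helper function here are my own illustration, not necessarily the project's:

from urllib.parse import urlparse

# Hosts we expect to see during a normal run; anything else is treated
# as a third-party request initiated by the plugin
EXPECTED_SUFFIXES = ("mozilla.org", "mozilla.net", "mozilla.com",
                     "wikipedia.org", "duckduckgo.com")

def third_party_sites(sites):
    """Return only the sites that are not on the expected list."""
    suspicious = []
    for site in sites:  # ZAP reports sites like "https://example.com"
        host = urlparse(site).hostname or ""
        if not host.endswith(EXPECTED_SUFFIXES):
            suspicious.append(site)
    return suspicious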

Testing Limitations

The above methodology leaves out a few potentially important things. Some plugins don't do anything until they are set up or interacted with; in those cases, my testing would miss data collection. In the current implementation, the plugin is never clicked or interacted with at all - only installed and left alone.

Additionally, the testing says nothing about whether the traffic is acceptable. Some plugins require regular data retrieval, such as fetching the latest adblock lists; these requests are perfectly acceptable for an adblocker. Other plugins, however, send data on every site visited, even when there is no need to do so to perform their stated purpose.

Finally, the current tests only run against a couple of sites. A plugin could be configured to collect only certain data, such as Google searches, which these tests would not catch.

The results are interesting from a passive collection perspective - plugins that only collect data when clicked are significantly safer than plugins that track all web activity all the time.

Each person will have to decide for themselves whether the traffic the tests reveal is acceptable, based on their privacy goals and the plugin's purpose.

Results Summary

Most people will be interested in checking the plugins they personally use, but looking at the whole data set also reveals some interesting things.

  • Over 91% of plugins send no third-party requests during my test. Good news!
  • 69 plugins (5%) send more than a single request.
  • Most of those that send a lot of data depend on doing so to work - checking a site's SEO, or fetching comments from other users of the plugin. Users will have to decide whether this behavior is acceptable on every request, or ask the authors to require user interaction before browsing data is sent.
  • I was surprised to see the Shodan plugin, a security service I use (though not with the plugin), so high on the list. I would prefer that a plugin like this only send data on request, not all the time while browsing.
  • So-called security plugins make up a lot of the worst offenders for data collection: Comodo, Avast, Norton, Avira, and more. I normally recommend users avoid these products altogether, and they do seem to collect every site browsed (as opposed to using a regularly updated whitelist/blacklist approach).
  • Eyeballing the data shows one interesting plugin, "Search Secure", listed multiple times: it appears under several plugin IDs, each with a different "firefox user XXX" author and different URLs in the POST data. Seems suspicious!

Running The Tests Independently

I released the code under the MIT License and would welcome any enhancements to the testing methodology. You can find the project on my GitHub. I strongly recommend running the tester in, at a minimum, a virtual machine and behind a VPN. This is due to the nature of the tests: we are installing untrusted code on our machines, some of which will send information about the browser and machine to unscrupulous third parties. Better to mask your IP and machine information if possible.

Testing generates traffic to Mozilla and to the sites used in the tests. Be considerate: do not lower the built-in delays beyond a reasonable amount, or you may get blocked.

Testing the full list of roughly 1,300 plugins currently takes about 10 hours. This includes the built-in delays between tests, which limit the traffic sent to any given web host, and the tests themselves, each of which loads several pages.

You will need a local PostgreSQL database to store results between steps. Otherwise, follow the instructions in the project README.