Tor may not be as anonymous as you think

There are a few so-called darkwebs or darknets in existence. The normal internet that we’re all familiar with goes by several names, such as clear net or clear web, and we use standard web browsers to access it. To access darknet sites, computers need to be reconfigured to resolve unfamiliar web addresses and use special proxies. This software also hides the real IP address and identity of the user, but how well does it do that? It turns out that there are a number of ways a darkweb user can be de-anonymized and identified.

Tor is the most well-known darkweb but there are others. The Invisible Internet Project (I2P) is another darkweb, albeit a much smaller one, and the Freenet project has supported a darkweb mode since 2008. The truth is that these darkwebs run on the same old internet as everything else. They run on the same types of servers and use the same protocols. Because of that, it’s very easy for darkweb operators to make simple mistakes that expose their users. Darkweb users are placing their anonymity in the hands of these darkweb operators, many of whom do not have the necessary skills to cloak their users properly.

A recent study shows that up to 35 percent of Tor sites are constructed in such a way that the actual server that they’re hosted on can be identified. Once a darkweb server is identified, the owner can be identified and, from there, legal warrants or illegal hacks can be employed to unmask individual users.

How do darkwebs work?

First, some terminology. The term darknet refers to an entire darknetwork. Within that network can be any service that exists on the clear web such as web servers and FTP servers. Websites that exist on the darknet form what is called the darkweb. Both terms are used in this article, but they are not entirely interchangeable. The same applies to the terms clear net and clear web, where clear means the normal, everyday internet we all use. Lastly, in general, entry points into a darknet are referred to as entry nodes, and exit points to the clear net are referred to as exit nodes.

Figure: How Tor Works. By Electronic Frontier Foundation, https://www.torproject.org/about/overview.html.en, CC BY 3.0

Darknet servers exist on the same internet as clear net servers. The darknets overlay the existing internet by utilizing the internet for routing, but not for identification or name resolution. Darknet servers have IP addresses just like any other internet connected server. However, darknet services cannot be accessed by their darknet domain names without using software for that purpose. In addition, darknets encrypt all traffic and also employ a multiple hop system to break the direct link between a user and the darknet server they visit. With these three things in place, traditional investigative measures to identify visitors don’t work well.

There are two different purposes to using the darkweb: to gain access to a darkweb site which is not available on the clear web, or to use the darkweb as an anonymizing network to access regular websites on the clear web. Both Tor and I2P support both uses. Other anonymity darknets, such as Freenet, only provide access to their own internal services.

To access clear web sites we simply type in the domain name of the site we wish to visit and it downloads to our browsers. That process uses the public Domain Name System (DNS) to map the domain name provided to the actual IP address of the server hosting that domain. Darkweb sites do not use the clear net DNS, so typing a darkweb address, such as Facebook’s Tor site at https://facebookcorewwwi.onion/, into an ordinary browser will not work. To access darkweb sites you must be running that darkweb’s software. Tor, for example, does not use DNS at all to find .onion sites. Rather, it maintains a distributed hash table (DHT) containing information about each .onion site that allows a user and a hidden service to build a 6-hop circuit utilizing a rendezvous point between them. None of the hops in the circuit has enough data to correlate the user and the hidden service.
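The split between the two naming systems can be illustrated with a small sketch. The helper name and its return strings are hypothetical and not part of Tor. It assumes only that a .onion label is a base32 string of 16 characters (older version 2 services, like the Facebook address above) or 56 characters (version 3 services).

```python
import re

# Assumption: v2 onion labels are 16 base32 characters, v3 labels are 56.
ONION_LABEL = re.compile(r"^(?:[a-z2-7]{16}|[a-z2-7]{56})$")


def resolution_path(hostname: str) -> str:
    """Say which naming system a hostname belongs to (illustrative only)."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] != "onion":
        return "public DNS"
    # The label immediately left of ".onion" identifies the hidden service.
    if len(labels) >= 2 and ONION_LABEL.match(labels[-2]):
        return "Tor hidden-service lookup"
    return "invalid .onion name"


print(resolution_path("facebookcorewwwi.onion"))
print(resolution_path("comparitech.com"))
```

A browser without Tor would hand both names to DNS, which is why the first one fails there.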

I2P uses a similar concept whereby shared and trusted address books are used for hidden service resolution. That means the resolution is always local. I2P’s equivalent of a Tor circuit is called a tunnel, and a full duplex connection in I2P requires two separate tunnels. Outbound traffic and inbound traffic use separate tunnels to better resist traffic analysis attacks.

Since it is not practical for most adversaries to track a user’s activities through a darkweb circuit, it is usually easier to use other means to identify users.

The next hurdle in de-anonymizing darkweb users is the transitory nature of the darkweb. Darkweb circuits and tunnels are short lived. Users visiting darkweb sites can have their circuits changed frequently which presents an even larger challenge to traffic analysis. In addition, many darkweb services are hosted on hidden computers in people’s houses rather than in formal data centres. That means the services themselves can be transitory and go up and down, or even change physical location constantly.

Darkweb vulnerabilities

Traffic analysis

There are two main types of traffic analysis that can be done at the darkweb nodes: timing analysis and tracing data manipulation.

Timing analysis refers to an adversary’s ability to collect data from both darknet entry and exit nodes. If the darknet is being used as a clearnet proxy it will necessarily have to use both types of nodes. The entry node information is valuable because it knows the IP address of the person using the darknet. Exit nodes are valuable because they know the final destination of the user. In theory, collecting enough data from these two types of nodes would allow for the correlation of a request coming into an entry node (the user’s IP) and a request leaving an exit node (the user’s final destination). If the adversary has control over both those nodes, then the destination and the IP address of the requester can be associated. Only Tor and I2P provide exit nodes. I2P exit nodes are named outproxies and are rare. Freenet seems to have no such functionality so it can only be used to access Freenet services.
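As a rough illustration of the correlation idea, the sketch below pairs events seen at an entry node with events seen at an exit node when the exit event follows shortly after. The class names, the 0.5 second window, and the sample addresses are all assumptions made for the example. A real attack would rely on statistical analysis of traffic volume and patterns rather than a simple time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryEvent:
    user_ip: str      # what the entry node can see
    timestamp: float  # seconds


@dataclass(frozen=True)
class ExitEvent:
    destination: str  # what the exit node can see
    timestamp: float  # seconds


def correlate(entries, exits, max_delay=0.5):
    """Pair each entry event with exit events that follow it within max_delay seconds."""
    matches = []
    for entry in entries:
        for exit_event in exits:
            delay = exit_event.timestamp - entry.timestamp
            if 0 <= delay <= max_delay:
                matches.append((entry.user_ip, exit_event.destination))
    return matches


# Hypothetical observations from two nodes controlled by the same adversary.
entries = [EntryEvent("203.0.113.7", 10.0), EntryEvent("198.51.100.4", 12.0)]
exits = [ExitEvent("example.org", 10.3), ExitEvent("example.net", 12.2)]
print(correlate(entries, exits))
```

With busy nodes, a window this naive would produce many false pairings, which is why an adversary needs a lot of data from both ends.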

Tracing data manipulation is a technique in which data is deliberately manipulated at the end web server to create a fingerprint, such as by setting a unique header value in an HTTP response. It’s possible to observe that same fingerprint at the client, which completely breaks the darkweb’s anonymity.
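A minimal sketch of the header technique follows, using plain dictionaries in place of real HTTP responses. The header name X-Trace-Id and both function names are invented for the example.

```python
import secrets

HEADER = "X-Trace-Id"  # hypothetical header name


def tag_response(headers: dict, issued: dict, request_id: str) -> dict:
    """Return a copy of the headers with a unique marker and remember who received it."""
    marker = secrets.token_hex(8)
    issued[marker] = request_id
    return {**headers, HEADER: marker}


def identify(observed_headers: dict, issued: dict):
    """Look up a marker observed near the client; None if it is absent or unknown."""
    return issued.get(observed_headers.get(HEADER))


issued = {}
tagged = tag_response({"Content-Type": "text/html"}, issued, "request-42")
print(identify(tagged, issued))
```

Because each marker is issued once, seeing it again anywhere ties that observation to a single request at the server.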

There’s not much you can do to prevent analysis of your traffic. The best you can do is to provide as little information as possible to be analyzed by using tools like the Tor browser. For an added layer of protection, connecting to a VPN prior to using Tor can not only hide the fact that you’re using Tor from your ISP, but also hide the IP address your ISP assigned to you from the darknet entry node.

Links to clearnet resources

Although the internet does not resolve darknet domain names, the reverse is not true: internet domain names can be resolved from within the Tor and I2P darknets. It’s therefore possible, and sadly common within Tor, for darkweb sites to include links to content from the clear web.

This means that owners of the clear web servers will record your browser’s requests for that content. In most cases they will only see the IP address of the darknet exit node rather than your own, but it is still another trail that leads to your browser.

Javascript, Flash and WebRTC

These three technologies are very invasive. Web browsing without these technologies enabled limits the amount of data that can be sent to the website. In general, your browser type, IP address, and request are sent to the website, where they can be logged. Javascript, Flash and WebRTC run in your browser and therefore have access to much more information about your system that can be sent off to the web server or other logging facility.

WebRTC is a set of protocols that allows real-time peer-to-peer communication such as that used in web conferencing. Part of the information it exchanges is a user’s real IP address, even if the user is hidden behind a VPN or an anonymizing proxy like Tor. Most modern browsers, mobile and desktop, support WebRTC by default, so a darkweb site can make a WebRTC query to a user’s browser to obtain the real IP address. There are plugins for Firefox and Chrome that disable WebRTC in those browsers.

Adobe Flash®️ has long been a massive security risk. Second only to Javascript, Flash is a very rich source of information that can be used for browser fingerprinting. Flash is falling in popularity due to the rise of HTML 5 which replaces most of the functionality Flash is generally used for. It’s best to disable Flash entirely and only enable it when you need it. There are plugins for Firefox and Chrome that will disable Flash.

Javascript is usually the single richest source of browser and operating system information which can be used to fingerprint and track users. Javascript can reveal very specific things about your computer such as the plugins you have installed, your time zone, the fonts on your system, the computer operating system and even details about the hardware on your system such as graphic cards.
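To show why these details matter in combination, here is a sketch of how a collection of attributes can be reduced to a single identifier. The attribute names and values are invented, and real fingerprinting scripts gather far more signals, but the principle of hashing a canonical form is the same.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sorting the keys makes the hash independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values of the kind a script could collect.
browser = {
    "plugins": ["pdf-viewer"],
    "timezone": "UTC-05:00",
    "fonts": ["Arial", "Courier New", "Georgia"],
    "os": "Windows 10",
    "gpu": "ExampleVendor 1234",
}
print(fingerprint(browser))
```

No single attribute identifies a user, but the more attributes that go into the hash, the fewer browsers share the result.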

Unlike Flash, however, Javascript is still very much in use on the Internet so it is less convenient to disable it. But, as the saying goes: security or convenience, pick one. There are plugins for both Firefox and Chrome that will block Javascript by default and allow you to enable it on a case-by-case basis, or whitelist specific trusted domains which will be allowed to run Javascript by default.

DNS Leaks

The Domain Name System is the internet’s phone book. It reconciles the IP address of internet servers with human readable domain names that we can easily remember. Whenever you visit a site, for example comparitech.com, your computer has to find out the IP address of comparitech.com before your browser can request the website. To get that information, your computer performs a DNS query. If your computer has been to the site recently, then it likely has the IP address cached locally and will use that. But, if the cached answer has expired or if you have not been to the site before, then your computer will query a DNS server on the internet. These queries can be logged and provide information on sites you’ve visited. If your IP address shows as performing a DNS query for comparitech.com, it’s a pretty safe bet that you then visited comparitech.com.
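The cache-then-query behaviour described above can be sketched as follows. The class is a simplified stand-in for an operating system resolver: the fixed lifetime, the injected resolver function and the 192.0.2.10 address are assumptions for the example, whereas real lifetimes come from the DNS records themselves.

```python
import time


class DnsCache:
    """Toy resolver cache that records every query sent upstream."""

    def __init__(self, resolver, ttl_seconds=300, clock=time.monotonic):
        self._resolver = resolver    # function: name -> IP address
        self._ttl = ttl_seconds      # assumption: one lifetime for all names
        self._clock = clock
        self._entries = {}           # name -> (ip, expiry time)
        self.upstream_queries = []   # what a DNS server could log

    def lookup(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached and cached[1] > now:
            return cached[0]         # answered locally, nothing to log
        self.upstream_queries.append(name)
        ip = self._resolver(name)
        self._entries[name] = (ip, now + self._ttl)
        return ip


cache = DnsCache(lambda name: "192.0.2.10")  # fake resolver, no network needed
cache.lookup("comparitech.com")
cache.lookup("comparitech.com")
print(cache.upstream_queries)
```

Only the first lookup reaches the upstream server, and that single logged query is enough to suggest a visit.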

Even when using a VPN to route your actual traffic through the internet, your computer may still use your ISP’s DNS servers for its queries. This will leave logs behind that you visited certain sites, even if the content of those visits is hidden by your VPN.

This type of vulnerability is called a DNS leak. To avoid DNS leaks, you should ensure that your DNS queries are going through your VPN along with all your other traffic. Virtually all VPN clients offer DNS leak protection, although you may have to enable it. If you’re not sure your VPN is providing DNS leak protection, you can visit the Comparitech DNS Leak Test page to find out.
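The check itself is simple to express. The sketch below assumes you already have a list of the resolver addresses your system is using and the address ranges that belong to your VPN. Both inputs are hypothetical here. An online leak test instead observes which servers actually receive your queries.

```python
import ipaddress


def leaking_resolvers(resolvers, vpn_networks):
    """Return the resolver addresses that fall outside every VPN network."""
    networks = [ipaddress.ip_network(net) for net in vpn_networks]
    return [
        addr
        for addr in resolvers
        if not any(ipaddress.ip_address(addr) in net for net in networks)
    ]


# Hypothetical configuration: one resolver inside the VPN, one at the ISP.
print(leaking_resolvers(["10.8.0.1", "192.0.2.53"], ["10.8.0.0/24"]))
```

Any address returned is a resolver that would see your queries outside the VPN tunnel.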

See also: Best VPNs for Tor

Tracking or analysis scripts

A recent study of 1.5 million darkweb pages (not sites) showed that an astounding 27 percent (over 400,000 pages) contained tracking scripts. In some cases, these were analytics scripts designed to provide traffic metrics to the site owner, but in other cases they could be performing browser fingerprinting to identify individual user trails through the darkweb.

It’s also important to consider that there are many popular traffic analysis scripts in existence already. Therefore, darkweb site owners are more likely to use an existing analysis script than to roll their own. This means that the same tracking script can be in use on both the darkweb and the clear web. If a darkweb user uses the same browser on both the darkweb and the clear web, it would be possible to correlate the fingerprint of their browser on the darkweb, seen with a Tor exit node IP address, with the same browser on the clear web, seen with an identifiable ISP IP address.
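The correlation amounts to a join on the fingerprint. In this sketch each log is a list of (fingerprint, IP address) pairs. The log contents and the function name are made up for illustration.

```python
def correlate_visits(darkweb_log, clearweb_log):
    """Map each fingerprint seen in both logs to the clear web IPs that share it."""
    darkweb_prints = {fp for fp, _exit_ip in darkweb_log}
    matches = {}
    for fp, ip in clearweb_log:
        if fp in darkweb_prints:
            matches.setdefault(fp, set()).add(ip)
    return matches


# Hypothetical logs collected by the same tracking script.
darkweb_log = [("fp-a", "exit-node-1"), ("fp-b", "exit-node-2")]
clearweb_log = [("fp-a", "203.0.113.7"), ("fp-c", "198.51.100.4")]
print(correlate_visits(darkweb_log, clearweb_log))
```

The exit node address on the darkweb side is useless on its own. It is the shared fingerprint that links the session to an ISP address.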

Since tracking scripts must run inside a web browser, they are usually written in Javascript. Running a script-blocking plugin is the safest way to avoid being tracked by this type of script.

Doxing

Doxing is the practice of researching and collecting information on a person or organization. Generally, public information is used which is available in large supply on the internet. Starting with some very small pieces of information, it is sometimes possible to build up a very thick dossier on an internet user and eventually identify the actual person.

Starting with limited information, such as an IP address from a web server log, it would seem impossible to trace that back to a real person. However, if that IP makes enough visits over time, it can be possible to determine the person’s likely time zone. Depending on the subject matter of the site, it may be possible to determine the gender of the user. For example, the purchase of after-market Viagra on a darkweb pharmacy site would usually indicate a middle-aged male. If the browser is fingerprinted well enough, it may be possible to see that same browser used in the same time zone on other darkweb sites, which would provide more information. For example, posts on a darkweb weapons forum might show that this user has a certain type of handgun. Now we have some tenuous threads to start to pull together: a middle-aged male in a rough time zone with certain weapons.

The next step is to take that information to clear web social media sites and forums and compile the same type of data from publicly available information. It’s obvious that doxing can take a lot of time, but it has been done successfully, and unsuccessfully, many times. The richest sources of doxing information are sites such as forums or chat sites where users interact and write a lot, for two reasons:

  • it’s very hard for a person to change their writing style, which leaves them vulnerable to stylometric analysis between attributable clear web posts and anonymous darkweb posts.
  • every sentence that is written potentially carries more clues about the author. For example, a forum post complaining about snow dated in August indicates someone south of the equator. That information, coupled with the traffic patterns that indicate time zone, can narrow down a country fairly well.
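To give a feel for how stylometric comparison can work, the sketch below compares how often two texts use a handful of common function words. This is a deliberately crude stand-in: the word list is an arbitrary assumption, and real stylometric tools use many more features and far more text.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked list of function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "but", "however", "which")


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """1.0 means identical direction; 0.0 means nothing in common."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


clear_post = "However, the price of the part is high, but that is to be expected."
dark_post = "However, the seller that I used was slow, but the item is good."
print(cosine_similarity(style_vector(clear_post), style_vector(dark_post)))
```

A high score between an attributable post and an anonymous one is only a hint, not proof, which is one reason doxing so often goes wrong.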

Technology cannot do much to protect against doxing. There are some tools that will rewrite prose to foil stylometric analysis. But, the best protection against doxing is to go to great lengths to ensure that none of the information or activities done on the dark web can be correlated back to information on the clear web.

Poisoned darknet nodes

Anyone can set up a darkweb node. Therefore, it stands to reason that not every node owner has noble intent. Some nodes may be run by law enforcement to assist in traffic analysis and some may actually be run by criminals.

After a large takedown of Tor hidden services in 2014, Tor director Andrew Lewman speculated on the possible ways in which over 400 hidden services could have been identified by law enforcement. In the months leading up to the takedown, the Tor project identified some rogue relays that were not behaving correctly. These Tor relays were modifying traffic headers in an attempt to de-anonymize users. Lewman speculated that these relays were operated by law enforcement agencies.

It’s not always about law enforcement, though. Criminals also live in the darkweb and will try to steal your login credentials there as well.

A security researcher was curious if Tor exit nodes can be trusted. She set up a honeypot Bitcoin site on the clearweb. The next step was to obtain a list of Tor exit nodes and then create a list of unique passwords. She then logged in to the honeypot site exactly once for each exit node, utilizing the unique password assigned to that exit node. The methodology here is that the only way anyone could obtain the password for the site was by snooping traffic on the Tor exit node. Recall that only the Tor exit node knows the final destination of the request, so the exit node would be the only place where both the honeypot site URL, and the username and password combination would exist unencrypted.

Some notes about the study:

  • The logins were done over HTTP in order to allow the Tor exit node to be able to read the traffic.

  • The passwords were never stored online to eliminate the possibility of them being obtained in some other way.

Over the course of a month, almost 30 logins were attempted from 15 Tor exit nodes. The only reasonable conclusion is that those Tor exit node operators deliberately stole the credentials and used them to attempt to log in to the honeypot site.
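The bookkeeping behind this methodology can be sketched in a few lines. This is not the researcher’s actual code. The function names and node labels are invented, and the sketch only shows how a one-password-per-node mapping lets a reused password point back to the node that saw it.

```python
import secrets


def assign_passwords(exit_nodes):
    """Give every exit node its own random password, to be used for exactly one login."""
    return {secrets.token_urlsafe(12): node for node in exit_nodes}


def nodes_that_leaked(assignments, observed_passwords):
    """Return the exit nodes whose unique password later reappeared at the honeypot."""
    return sorted({assignments[p] for p in observed_passwords if p in assignments})


# Hypothetical exit node labels.
assignments = assign_passwords(["exit-a", "exit-b", "exit-c"])
stolen = [pw for pw, node in assignments.items() if node == "exit-b"]
print(nodes_that_leaked(assignments, stolen))
```

Keeping the mapping offline, as the study did with its passwords, matters: if the table itself leaked, a reused password would no longer implicate the exit node.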

Poisoned darkweb nodes would likely be completely undetectable by users. Tor attempts to identify reliable nodes and assigns them the Guard (‘G’) flag through community consensus. But two of the nodes that stole passwords for the honeypot Bitcoin site mentioned earlier were Guard nodes, so that flag doesn’t seem to be a reliable sign of trustworthiness.

You may be walking away from this article thinking there is simply no way to remain anonymous on the darkweb. That’s probably true, but that doesn’t mean you can’t benefit from its anonymity and privacy. One truth is that a well-funded and skilled adversary will almost certainly be able to de-anonymize any targeted person. The other truth is that most of us are not the targets of a well-funded and skilled adversary. Knowledge is power, and knowing the limits of the anonymity the darkweb can provide will allow you to prepare your tools, and your behaviour, to make the most of it.