Virtual Private Networks (VPNs) solve a lot of privacy problems. Since a VPN usually encrypts your traffic between your computer and the VPN provider, it makes it very difficult for an observer to view your traffic to see what you’re up to. However, many people want to hide the fact that they’re using a VPN at all, such as people in countries that ban VPNs, or in other situations where VPN usage is disallowed or blocked through technical means. In this article, we focus on the type of data an observer can collect from network packet captures and how that data can be used to detect VPN use.
- 1 Background on the problem
- 2 Testing methodology
- 3 Non-technical sources of VPN indicators
- 4 Tell-tale signs from packet metadata
- 5 Inconsistencies in operating system and packet fingerprint data
- 6 Insufficient obfuscation techniques from VPN providers
- 7 In summary
Background on the problem
The burning question is “why”? Who cares if someone discovers you’re running a VPN? If the traffic is heavily encrypted anyhow, what’s the problem?
It’s true that in many situations and in many countries, it doesn’t matter at all if an observer detects the use of a VPN. However, there are many countries that ban the use of VPNs and it’s therefore important for VPN users in those countries to know how they can be discovered.
In order to determine whether a VPN is in use, an observer has to have access to a router through which the target traffic is passing. In the case of a targeted victim, an attacker may expend great resources to find a way to take over a router that particular victim uses. In the case of nation-state surveillance, effective detection would require the control of a lot of routers. When you combine those two things (an organisation that cares whether you’re using a VPN and that also has the ability to control a large number of routers), that usually indicates a nation-level threat actor.
Keep in mind that this article deals with ways in which VPN usage can be discovered by observers. It doesn’t necessarily mean that the data encrypted within the VPN tunnel is easier to exploit.
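Testing methodology
Without access to state-level resources, my testing platform and methodology are a little smaller in scale. I created a small internal network using three virtual machines (VMs) in VirtualBox. The topology is simple: an OpenWRT router VM sits in front of the two client VMs, so all client traffic passes through the router.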
I installed packet sniffing software on the OpenWRT router VM and then tested various VPN configurations on the other two virtual machines. The packet sniffing software, tcpdump, allowed me to capture the VMs’ network traffic for analysis. In a more realistic setup, the packet capturing software would probably be installed in routers on the internet, or at least within the ISP’s network. The strategic placement of analysis software would require some knowledge of the convergence points on the internet where the target traffic is likely to be flowing. In my testing network, I know with 100% certainty that all the traffic to and from my virtual machines is going to pass through that OpenWRT router. It’s therefore the best place for me to put my collection tools.
Non-technical sources of VPN indicators
Not all sources of data that indicate VPN usage are technical. While some are very technical, such as packet analysis, some are very non-technical, such as human error and daily routine.
Unintended network traffic
Most VPN users have client software that must be launched in order for the VPN to be established. It’s very difficult to ensure that no traffic passes over the internet before the VPN is established when a computer boots up. Even those VPNs with kill switches may not be able to do anything about traffic that passes during system boot up.
To test this, I set the auto-connect and kill switch options of VyprVPN in the Windows virtual machine. I then shut down the Windows machine, started a packet capture on the OpenWRT router, and started the Windows machine again. That generated a lot of packets, and two sequences are of particular interest.
First, we can see a lot of pings to a similar range of IPs. I did not purposely group these packets – this is how they were sent organically:
This suggests that something is trying to enumerate servers. A very common cause of this type of traffic in a VPN scenario is a VPN client attempting to determine the fastest server. One method to do this is to send an ICMP packet (known as a ping) to a set of servers to see which ones come back the fastest.
We can see from the first screenshot that 18.104.22.168 returned the fastest in 99 milliseconds. Further down in the packet capture, we suddenly see that most of the traffic from that point on is encrypted and is destined for 22.214.171.124.
The next piece of the puzzle is to find out what is at those IPs. Using IP WHOIS, which lists the registered owner of an IP, we can see that all but one of these IPs belong to the YHC Corporation and resolve to servers in the Data Foundry data center:
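The server-enumeration pattern described above can be looked for mechanically. The following is only a minimal sketch under stated assumptions: the ICMP echo requests have already been extracted from the capture as (timestamp in seconds, destination IP) pairs, and the function name `ping_sweep_targets`, the 5-second window and the 10-target threshold are hypothetical values chosen for illustration, not figures from my capture.

```python
def ping_sweep_targets(echo_requests, window=5.0, min_targets=10):
    """Return the largest set of distinct hosts pinged within `window` seconds.

    `echo_requests` is an iterable of (timestamp_seconds, destination_ip)
    pairs. An empty set is returned when fewer than `min_targets` distinct
    hosts were pinged in any single window (both thresholds are assumptions).
    """
    events = sorted(echo_requests)
    best = set()
    start = 0
    for end in range(len(events)):
        # Slide the left edge forward until the window spans <= `window` seconds.
        while events[end][0] - events[start][0] > window:
            start += 1
        targets = {dst for _, dst in events[start:end + 1]}
        if len(targets) > len(best):
            best = targets
    return best if len(best) >= min_targets else set()
```

A burst of pings to a dozen different addresses inside a second or two would be returned as a set, while a single host being pinged repeatedly, or a handful of unrelated pings spread over minutes, would not.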
126.96.36.199 OrgName: YHC Corporation OrgTechEmail: email@example.com
188.8.131.52 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
184.108.40.206 OrgName: YHC Corporation OrgTechEmail: email@example.com 209-99-115-97
220.127.116.11 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
18.104.22.168 OrgName: YHC Corporation OrgTechEmail: email@example.com OrgTechEmail: firstname.lastname@example.org
22.214.171.124 OrgName: YHC Corporation OrgTechEmail: email@example.com
126.96.36.199 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
188.8.131.52 OrgName: YHC Corporation OrgTechEmail: email@example.com
184.108.40.206 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org OrgName: Powerhouse Management, Inc. OrgTechEmail: email@example.com
220.127.116.11 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
18.104.22.168 OrgName: YHC Corporation OrgTechEmail: email@example.com
22.214.171.124 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
126.96.36.199 OrgName: YHC Corporation OrgTechEmail: email@example.com
188.8.131.52 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
184.108.40.206 OrgName: YHC Corporation OrgTechEmail: email@example.com
220.127.116.11 OrgName: YHC Corporation OrgTechEmail: firstname.lastname@example.org
18.104.22.168 OrgName: YHC Corporation OrgTechEmail: email@example.com
A logical next step would be to scan those IPs to see what services they are running. I won’t supply details on how to do that, but my testing shows that the default connection banners that most servers display have been removed from the VyprVPN servers so there’s no obvious tell-tale that these IPs are running a VPN server.
There isn’t much you can do about how your computer behaves while it is booting up. Therefore, if you want to obfuscate this type of setup sequence, you’ll need to run a VPN “in front” of your computer. Running the VPN client on your router instead of on your computer is one way to do this. You will still run into the same startup sequence when the router restarts, but that usually happens less often than a computer restart.
No unencrypted packets
As I mentioned above, once the pings were complete, the packet capture shows encrypted traffic to the fastest IP. If an observer sees only encrypted packets and not a single unencrypted packet, that can be a sign there is a VPN in use. While the world is moving quickly towards encrypting as much data as possible on the web, there are still some requests which are typically not encrypted. Among these are DNS lookup queries, NTP (time server) queries and a smattering of other protocol requests such as FTP and Telnet, which are still used by some applications but do not support encryption at all.
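As a rough illustration of this check, the sketch below looks for the cleartext protocols named above by destination port. It assumes the destination ports have already been pulled out of the capture, and the function names and the 500-packet minimum are hypothetical. Port numbers are only a heuristic: a service can run on any port, and DNS in particular is increasingly sent over encrypted channels.

```python
# Well-known ports for the cleartext protocols mentioned in the text.
CLEARTEXT_PORTS = {53: "DNS", 123: "NTP", 21: "FTP", 23: "Telnet"}


def cleartext_protocols_seen(dst_ports):
    """Return the names of cleartext protocols whose ports appear."""
    return {CLEARTEXT_PORTS[p] for p in dst_ports if p in CLEARTEXT_PORTS}


def only_encrypted(dst_ports, min_packets=500):
    """True when a large enough sample contains no cleartext protocol at all.

    `min_packets` is an arbitrary illustrative threshold: a handful of
    packets with no DNS lookup in them proves nothing.
    """
    ports = list(dst_ports)
    return len(ports) >= min_packets and not cleartext_protocols_seen(ports)
```

A long capture in which `only_encrypted` returns True is not proof of a VPN, but it is one more indicator to weigh alongside the others.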
Leaks from sloppy human operational security (OpSec)
A great deal of meaningful data can be obtained about a target from seemingly trivial information. Many people spend a lot of time and effort mitigating what they perceive as the “important” stuff, only to be identified by trivial information they did not think of. Examples include the long memory of the internet, which revealed that Hillary Clinton’s email administrator was most likely a man named Paul Combetta, and the case of Dread Pirate Roberts, AKA Ross Ulbricht, the alleged mastermind of the illegal Silk Road internet marketplace, who was prosecuted largely on data from his laptop, which was physically taken from him while he was distracted at a public library.
Less dramatically, observers can frequently use things like activity cycles to pin down a target’s timezone or the presence of special characters in a message to identify a language layout corresponding to a target’s country. There is no complete list of things to take into account when considering operational security because coming up with new ways to cross-reference data is mostly an exercise in imagination and resources.
However, there are some specific things that pertain to packet capturing which can identify VPN use.
Tell-tale signs from packet metadata
PFS re-keys are predictable
Since VPN traffic is usually encrypted, it’s generally hidden from prying eyes. Encryption works because it is very hard to “brute force” encrypted data to expose its clear text content. In fact, breaking encryption is so hard that large scale surveillance projects sometimes just collect all the data they can in the hopes that they will be able to break the encryption at some future date when computer power increases, or they are able to obtain the keys that were used to encrypt the data. Perfect Forward Secrecy (PFS) is a method that can be used to prevent the latter scenario.
Perfect Forward Secrecy re-generates the encryption keys used to encrypt the VPN traffic periodically. When a new key pair is generated, the previous pair is destroyed. This means that any collected encrypted packets cannot be decrypted at a later date because the key used to encrypt them no longer exists.
OpenVPN supports PFS. While capturing data for this article, I dropped the key cycling rate down to 10 seconds in order to capture that process taking place. I found that when the key regeneration took place, the following sequence of packets was generated:
09:01:48.461276 IP 192.168.1.204.openvpn > 22.214.171.124.openvpn: UDP, length 94
09:01:54.749114 IP 192.168.1.204.openvpn > 126.96.36.199.openvpn: UDP, length 65
09:01:58.895381 IP 192.168.1.204.openvpn > 188.8.131.52.openvpn: UDP, length 86
09:01:58.951091 IP 192.168.1.204.openvpn > 184.108.40.206.openvpn: UDP, length 94
09:01:58.951614 IP 192.168.1.204.openvpn > 220.127.116.11.openvpn: UDP, length 259
09:01:59.007916 IP 192.168.1.204.openvpn > 18.104.22.168.openvpn: UDP, length 94
09:01:59.008027 IP 192.168.1.204.openvpn > 22.214.171.124.openvpn: UDP, length 94
09:01:59.008265 IP 192.168.1.204.openvpn > 126.96.36.199.openvpn: UDP, length 94
09:01:59.008300 IP 192.168.1.204.openvpn > 188.8.131.52.openvpn: UDP, length 94
09:01:59.062927 IP 192.168.1.204.openvpn > 184.108.40.206.openvpn: UDP, length 256
09:01:59.106521 IP 192.168.1.204.openvpn > 220.127.116.11.openvpn: UDP, length 575
The notable thing about this sequence is that the packet sizes are identical each time the key regeneration took place. Therefore, whenever I saw a sequence of packets with these sizes in my packet capture, I knew key cycling was taking place:
94 65 86 94 259 94 94 94 94 256 575
Arguably, any repeating process would theoretically generate a repeated sequence of packets like this, but it can still be used as an indicator that PFS may be in play. Coupled with other data, this information could be enough to confirm a VPN connection.
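Searching a capture for that size sequence is easy to automate. The sketch below is illustrative only: it assumes tcpdump’s default text output, in which each line ends with “length N”, and it assumes the packets of interest have already been filtered to one direction of one flow, as in the capture above. The function names are hypothetical, and the signature is simply the list of sizes I observed, not a property of OpenVPN in general.

```python
import re

# Packet-size sequence observed during one key regeneration in my capture.
REKEY_SIGNATURE = [94, 65, 86, 94, 259, 94, 94, 94, 94, 256, 575]

# tcpdump's default output ends each line with "length N".
LENGTH_RE = re.compile(r"length (\d+)\s*$")


def packet_lengths(tcpdump_lines):
    """Extract the trailing 'length N' value from each tcpdump text line."""
    lengths = []
    for line in tcpdump_lines:
        match = LENGTH_RE.search(line)
        if match:
            lengths.append(int(match.group(1)))
    return lengths


def find_signature(lengths, signature=REKEY_SIGNATURE):
    """Return every index at which `signature` occurs as a contiguous run."""
    n = len(signature)
    return [i for i in range(len(lengths) - n + 1)
            if lengths[i:i + n] == signature]
```

If the matches returned by `find_signature` are evenly spaced in time, that regularity is itself a further hint that a periodic re-key is taking place. In a busy tunnel, data packets would interleave with the handshake and break an exact contiguous match, so a real detector would need to be more tolerant than this one.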
All packets destined to the same IP
During the normal course of internet use, people and computers request data from many different sites. Each of those sites has a different IP address. When using a VPN, every single packet is destined to the VPN server. The VPN server peels the VPN encryption layer off each packet to reveal the real packet and then sends it on its way to its actual destination. The VPN server does the same with responses. It receives response packets, wraps them in an encryption layer, and then sends the packet to the user’s computer.
A packet capture that shows a computer sending 100% of its traffic to a single IP is a good indicator that a VPN or proxy is in use.
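A simple way to quantify this is to measure what share of a host’s outbound packets go to its single most common destination. This is a minimal sketch: it assumes the destination IPs have already been extracted from the capture, and the function names, the 95% threshold and the 100-packet minimum are hypothetical values picked for illustration.

```python
from collections import Counter


def destination_share(destinations):
    """Return (most common destination, fraction of packets sent to it)."""
    counts = Counter(destinations)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    top, hits = counts.most_common(1)[0]
    return top, hits / total


def looks_tunnelled(destinations, threshold=0.95, min_packets=100):
    """True when nearly all traffic in a large enough sample has one destination."""
    destinations = list(destinations)
    if len(destinations) < min_packets:
        return False  # too little data to say anything
    _, share = destination_share(destinations)
    return share >= threshold
```

The threshold is deliberately below 100% because, as noted earlier, a few packets often escape before the tunnel is established.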
Psiphon is an internet censorship circumvention tool with an interesting function that can combat this to some degree: a split tunnel mode, which only uses the Psiphon tunnel for traffic that leaves your own country.
To see how this looks at the packet level, I launched Psiphon and tested two sites. I am in Canada and here’s a sample of traffic that is destined to our own .CA domain registrar. In this case, my destination is clearly visible in the packet capture.
08:30:14.213668 IP 192.168.1.210.58787 > www.cira.ca.https: Flags [.], ack 1026833, win 64240, length 0
08:30:14.229178 IP www.cira.ca.https > 192.168.1.210.58787: Flags [.], seq 1026833:1028293, ack 715, win 5094, length 1460
08:30:14.229427 IP www.cira.ca.https > 192.168.1.210.58787: Flags [.], seq 1028293:1031213, ack 715, win 5094, length 2920
08:30:14.229781 IP 192.168.1.210.58787 > www.cira.ca.https: Flags [.], ack 1031213, win 64240, length 0
I then visited the Comparitech website which is hosted in the United States:
08:29:48.028789 IP li832-56.members.linode.com.ssh > 192.168.1.210.58659: Flags [P.], seq 107809:108277, ack 19080, win 1392, length 468
08:29:48.029101 IP 192.168.1.210.58659 > li832-56.members.linode.com.ssh: Flags [.], ack 108277, win 856, length 0
08:29:48.029306 IP 192.168.1.210.58659 > li832-56.members.linode.com.ssh: Flags [P.], seq 19080:19132, ack 108277, win 856, length 52
08:29:48.108658 IP li832-56.members.linode.com.ssh > 192.168.1.210.58659: Flags [.], ack 19132, win 1392, length 0
Note how the traffic destined for the US is sent to a Linode server instead of to comparitech.com. Linode is a very large server company and it’s not unusual at all to see traffic destined for a Linode server. Psiphon further obfuscates that traffic by using an SSH tunnel to hide any trace of a VPN. As well, the reverse DNS (rDNS) for the Psiphon server at Linode does not betray its association to Psiphon; the rDNS just shows Linode owns the IP, which is expected. There is more on rDNS in the obfuscation section later on in this article.
Inconsistencies in operating system and packet fingerprint data
Although TCP/IP networking is operating system agnostic, different operating systems create packets with some different values. For example, the default packet Time-To-Live (TTL) value varies in packets created on different systems. Most Windows systems will set the packet TTL to 128 by default, whereas most Linux systems will set it to 64. Since the TTL is a visible part of a captured packet, it’s possible to determine which OS most likely created that packet. There are also other tell-tale signs in packet construction, such as length and Maximum Segment Size (MSS), which also vary from operating system to operating system.
The snippet below is part of a packet capture from a Windows system. Note the ttl 127 value on the last line. The TTL is expressed in number of “hops”: every time a packet traverses a device such as a router, its TTL is decremented by one. In this case, the TTL started at 128, but since I captured it on the router, after one hop, it is now 127. I can still tell that it was never 64, so this is likely a packet created on a Windows system.
08:08:51.657495 IP (tos 0x0, ttl 64, id 32150, offset 0, flags [DF], proto UDP (17), length 177)
    google-public-dns-a.google.com.domain > 192.168.2.139.59414: 40501 3/0/0 cdn-3.convertexperiments.com. CNAME cdn-3.convertexperiments.com.edgekey.net., cdn-3.convertexperiments.com.edgekey.net. CNAME e5289.g.akamaiedge.net., e5289.g.akamaiedge.net. A 18.104.22.168 (149)
08:08:51.659278 IP (tos 0x0, ttl 127, id 3890, offset 0, flags [DF], proto TCP (6), length 52)
A packet captured from a Linux machine has a TTL of 63 after its first hop. This is because most Linux machines set the initial value of the packet TTL to 64.
08:15:55.913493 IP (tos 0x0, ttl 63, id 41443, offset 0, flags [DF], proto UDP (17), length 56) 192.168.2.139.48635 > resolver1.ihgip.net.domain: 47200+ A? google.com. (28)
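The reasoning above amounts to rounding the observed TTL up to the nearest common default. The sketch below is illustrative only: it parses the “ttl N” field from tcpdump’s verbose output and uses just the two defaults mentioned in this article (64 and 128). The function names are hypothetical, and the guess is only as good as the assumption that the sender has not changed its default TTL.

```python
import re

# Default initial TTLs discussed in this article; other systems use others.
INITIAL_TTL_TO_OS = {64: "Linux/Unix-like", 128: "Windows"}

# Matches the "ttl N" field in tcpdump -v output.
TTL_RE = re.compile(r"\bttl (\d+)")


def guess_initial_ttl(observed_ttl):
    """Round the observed TTL up to the nearest known default, if any."""
    for initial in sorted(INITIAL_TTL_TO_OS):
        if observed_ttl <= initial:
            return initial
    return None  # above every default we know about


def guess_os(tcpdump_line):
    """Return a small report for one verbose tcpdump line, or None."""
    match = TTL_RE.search(tcpdump_line)
    if not match:
        return None
    ttl = int(match.group(1))
    initial = guess_initial_ttl(ttl)
    if initial is None:
        return None
    return {
        "observed_ttl": ttl,
        "initial_ttl": initial,
        "hops": initial - ttl,
        "likely_os": INITIAL_TTL_TO_OS[initial],
    }
```

Applied to the two captures above, a TTL of 127 is reported as a Windows packet one hop away and a TTL of 63 as a Linux packet one hop away. A Windows packet that had crossed 64 or more routers would be misread as Linux, but paths that long are rare.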
But, so what? Why can it be important to know what operating system created a packet?
If an observer has specialized knowledge of a target, it can matter a lot. If the target is known to use Windows, perhaps as a member of a large organization that uses Windows throughout, but packets captured from that target show that they were likely created on a Linux machine, that is a good indicator that a VPN or proxy of some kind is in use. It’s worth noting that virtually all VPN servers run on Linux or other Unix-like systems.
It’s possible to adjust the packet parameters on most systems but very few people go to this length.
Insufficient obfuscation techniques from VPN providers
There’s more to network analysis than just collecting packets. Ancillary processes such as DNS can play a role. Many VPN users are aware of DNS because sending DNS queries in the clear is one way for an observer to determine where you’re visiting or about to visit. However, fewer users are aware of Reverse DNS (rDNS). Much like DNS associates a domain name with an IP address, rDNS associates an IP address with a hostname, and the hostname usually identifies the owner of the IP. In addition, most programming libraries and operating systems come with some version of the standard gethostby*() functions, such as gethostbyname() and gethostbyaddr(), which extend a system’s ability to associate IPs and hostnames.
Reverse DNS is not as critical as “normal” DNS because rDNS plays no part in the routing of traffic. Rather, it is used primarily as a means to identify IP ownership. Only the owner of an IP address can associate an rDNS record to it. Therefore, checking the rDNS record of an IP address provides a reasonable assurance of who owns it, or at least, who the owner wants you to think owns it. Note that rDNS is not required and many IP addresses do not have rDNS entries at all.
Let’s look at the example of the domain facebook.com. The DNS A record provided by a standard DNS query shows this IP address:
$ dig +short facebook.com
22.214.171.124
Now let’s use a reverse DNS query or the gethostbyaddr() function to see who owns that IP:
$ host -n 126.96.36.199
188.8.131.52.in-addr.arpa domain name pointer edge-star-mini-shv-01-mia3.facebook.com
We can see from this that Facebook actually owns that IP address. However, most sites do not own their own IPs; they are leased and belong to arbitrary organizations or perhaps owned by less obvious entities. Amazon is an example of a large computing provider that is used by many companies. An rDNS query for the IP address of many internet services simply shows that Amazon owns the IP and therefore the information is of little use in determining who operates the IP. Another example is Google. Google is a little more subtle in its rDNS entries, but it still maintains ownership information. Here’s how the reverse DNS looks for a Google IP:
$ dig +short google.com
184.108.40.206
$ host -n 220.127.116.11
18.104.22.168.in-addr.arpa domain name pointer fra16s24-in-f14.1e100.net.
Google owns the 1e100.net domain, so we can see that this IP does in fact belong to Google.
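The same reverse lookups can be scripted. The sketch below uses Python’s standard `socket.gethostbyaddr()`, which returns a (hostname, aliases, addresses) tuple and raises `socket.herror` when no record exists. The wrapper names `reverse_lookup` and `registered_domain_guess` are hypothetical, and the resolver is passed in as a parameter only so the function can be exercised without network access.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the rDNS hostname for `ip`, or None if it has no record."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


def registered_domain_guess(hostname):
    """Naively keep the last two labels, e.g. 'x.1e100.net.' -> '1e100.net'.

    This is an assumption that breaks for suffixes such as 'co.uk'; a real
    tool would consult the Public Suffix List instead.
    """
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else hostname
```

Calling `reverse_lookup()` on an address and passing the result to `registered_domain_guess()` gives the same ownership hint as the `host` commands above, with None standing in for the many addresses that have no rDNS entry.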
In the world of VPNs, address resolution tools can potentially be used to see if the IP your traffic is destined for belongs to a VPN. For example, a default tcpdump command on the OpenWRT router will attempt to resolve the IPs that it sees in the packets it captures. It seems to primarily use gethostbyaddr() to do this, and it’s therefore sometimes possible to see where packets are destined. A default tcpdump capture of an IPVanish session illustrates this:
08:23:14.485768 IP 216-151-184-30.ipvanish.com.3074 > 192.168.1.210.51061: UDP, length 1441
08:23:14.485847 IP 216-151-184-30.ipvanish.com.3074 > 192.168.1.210.51061: UDP, length 1441
08:23:14.486144 IP 216-151-184-30.ipvanish.com.3074 > 192.168.1.210.51061: UDP, length 1441
08:23:14.486186 IP 216-151-184-30.ipvanish.com.3074 > 192.168.1.210.51061: UDP, length 385
The IPVanish client for Windows provides three configurations: a standard OpenVPN connection, an OpenVPN connection using HTTPS, and an obfuscated connection.
The packets above were captured during a session using the obfuscated OpenVPN connection setting, yet the capture still reveals the provider’s hostname through reverse DNS.
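An observer could flag this kind of giveaway automatically by comparing the hostnames tcpdump resolves against a list of known VPN provider domains. The sketch below is a minimal illustration: the one-entry `VPN_DOMAINS` list and the function names are hypothetical, and the parsing assumes tcpdump’s default “host.port > host.port:” line format.

```python
import re

# Hypothetical watch list; a real observer would maintain a much longer one.
VPN_DOMAINS = ("ipvanish.com",)

# tcpdump prints "<host>.<port> > <host>.<port>:"; capture both endpoints.
ENDPOINTS_RE = re.compile(r" IP (\S+) > (\S+?):")


def strip_port(endpoint):
    """Drop the trailing '.port' (numeric or a service name such as 'https')."""
    host, _, _port = endpoint.rpartition(".")
    return host


def vpn_hosts(tcpdump_lines, domains=VPN_DOMAINS):
    """Return the resolved hostnames that fall under any watched domain."""
    hits = set()
    for line in tcpdump_lines:
        match = ENDPOINTS_RE.search(line)
        if not match:
            continue
        for endpoint in match.groups():
            host = strip_port(endpoint).lower()
            if any(host == d or host.endswith("." + d) for d in domains):
                hits.add(host)
    return hits
```

Run over the four packets above, this returns the single hostname 216-151-184-30.ipvanish.com. A provider can defeat the check by not publishing revealing rDNS records, which is what the Psiphon server at Linode does.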
In summary
When determining VPN usage, there are very few “silver bullets”. It usually takes a number of techniques or observations to compile enough indicators to conclude that a VPN is in use, and even then it can be hard to be 100% sure. Companies that have a vested interest in disallowing VPN usage, such as Netflix and other streaming services, have full-time teams dedicated to just this problem. In other cases, many Eastern European and Middle Eastern countries ban VPN usage and have similar teams to ferret out VPN users.