What is Threat Intelligence?
Threat intelligence (TI), or proactive harvesting of data on cybersecurity threats, is an essential instrument for identifying and responding to security incidents. Some of the common TI sources include free IOC (indicators of compromise) feed subscriptions, vulnerability bulletins released by hardware and software vendors, security researchers’ reports covering threat analysis, as well as commercial TI subscriptions.
However, the information obtained via the above-mentioned channels may be incomplete and not relevant enough. In order to increase TI efficiency and ensure a higher quality of threat data, organizations can additionally employ OSINT (open-source intelligence) and offensive techniques that will be dissected in this article.
Not Instead, but Together
It’s worth emphasizing from the get-go that the approach to be described isn’t about ignoring or replacing threat intelligence information retrieved from the conventional free or paid sources. Instead, it’s about complementing and enriching this data.
The use of additional TI techniques can speed up incident response and help solve quite a few important tasks. For instance, in a scenario of another IoT epidemic that harnesses security flaws of specific firmware versions, how fast can you get the feed of potentially vulnerable devices’ IP addresses so that you can efficiently detect such activity at the network perimeter? Similarly, when an indicator of compromise (IP address or domain name), which was flagged malicious a year or more ago, triggers a monitoring system alert – how can you know whether it’s still malicious, given that a new checkup of this IOC “right now” is impossible?
In other words, enrichment is particularly useful for the IOCs in the middle tiers of David Bianco’s “Pyramid of Pain” (IP addresses, domain names, network/host artifacts), which have a moderate lifespan. That said, with the appropriate analytics you can sometimes derive new, longer-lasting IOCs from higher up the “Pyramid”.
Internet Scanning
Internet scanning is one of the most informative threat intelligence techniques. What kind of data is typically harvested this way? The most valuable information comes from analyzing so-called “banners”, that is, the responses that the inspected system returns to the scanner’s queries. These banners contain a wealth of details that identify the software running “on the other end” and its various properties. Furthermore, collecting them doesn’t take much time.
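To make the idea of a banner concrete, here is a minimal Python sketch. It assumes a service that speaks first after the TCP connection is opened (SSH is used as the example); the function names `grab_banner` and `parse_ssh_banner` are illustrative and not part of any scanner mentioned in this article.

```python
import socket


def grab_banner(host, port, timeout=3.0, max_bytes=1024):
    """Connect to host:port and return whatever the service sends first."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(max_bytes)
        except socket.timeout:
            data = b""  # some services wait for the client to speak first
    return data.decode("latin-1").strip()


def parse_ssh_banner(banner):
    """Split an SSH identification string into protocol, software and comment."""
    if not banner.startswith("SSH-"):
        return None
    first_line = banner.splitlines()[0]
    # Format: SSH-<protocol version>-<software version> [comment]
    ident, _, comment = first_line.partition(" ")
    parts = ident.split("-", 2)
    if len(parts) != 3:
        return None
    return {"protocol": parts[1], "software": parts[2], "comment": comment}
```

For example, `parse_ssh_banner(grab_banner("192.0.2.10", 22))` would return the software string of a host you are authorized to probe (the address here is a documentation placeholder).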
If you seek to expand the scope of your threat intelligence, the whole Internet is subject to scanning. As you scan the entire public IP address space (about 3.7 billion IPv4 addresses, excluding reserved ones), you can obtain the following useful information:
- Which nodes are affected by vulnerabilities that are heavily used in current malicious campaigns and are therefore potential sources of harmful impact.
- Which nodes are Command & Control servers of botnets, so that connections to them can reveal a compromised endpoint within the protected perimeter.
- Which nodes belong to the non-public part of distributed anonymous networks that can be used to get beyond the protected perimeter in an inconspicuous, uncontrolled way.
- Comprehensive information on the nodes that were listed in warning reports of the monitoring system.
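The figure of roughly 3.7 billion scannable addresses can be reproduced with a few lines of Python. The list of excluded blocks below is an assumption (commonly cited special-purpose ranges); the exact exclusion set differs between scanners and operators.

```python
from ipaddress import ip_network

# Assumed set of special-purpose IPv4 blocks excluded from a public scan.
RESERVED_BLOCKS = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.88.99.0/24", "192.168.0.0/16", "198.18.0.0/15",
    "198.51.100.0/24", "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4",
]


def public_ipv4_count(reserved_blocks=RESERVED_BLOCKS):
    """Return the number of IPv4 addresses left after removing reserved blocks."""
    reserved = sum(ip_network(block).num_addresses for block in reserved_blocks)
    return 2 ** 32 - reserved


if __name__ == "__main__":
    print(public_ipv4_count())  # 3702258432, i.e. about 3.7 billion
```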
Tool Selection
Many network scanning tools have been created as networks have evolved. The following scanners are worth a separate mention: Nmap, ZMap, and Masscan.
Let’s briefly look at the pros and cons of these tools in the context of TI enrichment.
Nmap
Nmap, first released in 1997, is perhaps the most popular network scanner. Thanks to its support for custom scripts (via the Nmap Scripting Engine, NSE), it is an incredibly flexible tool whose capabilities go well beyond harvesting application banners. A large number of NSE scripts are available at this point, many of which are free to use. Nmap tracks the state of every connection, that is, it operates synchronously, so its scan speed is too low for monitoring networks as large as the Internet. Given the power of NSE, however, it’s a good idea to use this instrument for scanning a small range of addresses obtained with faster tools.
ZMap
ZMap is a well-known asynchronous network scanner, although not the first of its kind, that surfaced in 2013. According to the developers’ paper presented at the 22nd USENIX Security Symposium, this tool can scan the whole range of public addresses (on one port) in less than 45 minutes when launched on an average computer with a 1 Gbps Internet connection. Its benefits include the following:
- High performance. With the PF_RING framework and a 10 Gbps connection, the theoretical time it takes to scan the public IPv4 address range is 5 minutes (on one port).
- Supports random order of scanning network addresses.
- Supports modules for TCP SYN scanning, ICMP, DNS queries, UPnP, BACnet, and UDP probing.
- Packages are available in the repositories of common GNU/Linux distributions. BSD and macOS are supported as well.
- A number of related projects have been launched that provide utilities complementing the scanner’s functionality.
As for the cons of ZMap, perhaps the biggest one is that it can scan only one port at a time.
The following associated projects, which enhance the power of ZMap in the context of addressing specific tasks, are worth singling out:
- ZGrab – an application-layer protocol scanner supporting the HTTP, HTTPS, SSH, Telnet, FTP, SMTP, POP3, IMAP, Modbus, BACnet, and Siemens S7 protocols, as well as Tridium Fox (a banner grabber with extended functionality).
- ZDNS – a utility for quick DNS queries.
- ZTag – a utility for tagging scan results returned by ZGrab.
- ZBrowse – a utility based on the Headless Chrome tool that monitors website content changes.
Masscan
Masscan is an asynchronous scanner released the same year as ZMap; its command syntax resembles that of Nmap.
According to its creator, Robert David Graham, the tool outperformed every other asynchronous scanner existing at that point, including ZMap, thanks to its custom TCP/IP stack.
Its benefits include:
- High performance, theoretically reaching 10 million packets per second, which allows for scanning the whole IPv4 range in a few minutes (on one port) over a 10 Gbps connection. This is comparable to ZMap with the PF_RING framework.
- Supports the TCP, SCTP, and UDP protocols (the latter relies on UDP payloads borrowed from Nmap).
- Supports random order of scanning network addresses.
- Supports the option to select both the IP range and port range for scanning.
- Supports different formats of presenting the scan results (XML, grepable, JSON, binary, or regular list output).
With that said, this scanner has a number of shortcomings:
- Given that a custom TCP/IP stack is used, it’s recommended to run the scan from a dedicated IP address in order to avoid conflicts with the OS stack.
- Limited capabilities of the banner grabber, compared to ZGrab.
Comparative Table of Scanners
The table below reflects a comparison of the above-mentioned scanners’ features.
“++” – implemented very well
“+” – implemented
“+/-” – implemented with restrictions
“-” – not implemented

| Feature | Nmap | ZMap | Masscan |
|---|---|---|---|
| High scan speed | - | ++ | ++ |
| Support of random address scan order | + | + | + |
| Support of defining ignore items | + | + | + |
| Flexibility of configuring scan targets (addresses, ports) | + | - | + |
| Application protocols coverage | ++ | + | +/- |
| The use of scripts for scanning objects | ++ | - | - |
| Support of different types of results output (list, table, structured, grepable) | ++ | + | ++ |
| OS coverage by available binary kits | ++ | + | +/- |
Selection of Targets and Scanning Methods
After the right tool has been selected, it’s time to define the object and goal of the scan, that is, understand what you are going to scan and why. Whereas the former is usually a trivial choice (it’s the public IPv4 address range with a few exceptions), the latter entirely depends on the required result. For example:
- To discover nodes susceptible to a certain vulnerability, it makes sense to scan the target range with a fast tool first (ZMap, Masscan). Doing so will allow you to spot nodes with open ports used by known-vulnerable services. Once the range of addresses has been narrowed down, you can run Nmap with an appropriate NSE script, if one exists. Otherwise, you’ll need to create a custom script based on the vulnerability information published by security researchers. Sometimes, simply running grep over the harvested application banners will suffice, because many services include their version details in the banner.
- To discover nodes of covert infrastructure – both anonymous networks and C2 nodes (that is, Command & Control servers of botnets) – you will need to create custom payloads/probes.
- To solve a broad scope of tasks in terms of discovering compromised nodes, segments of anonymous networks’ infrastructure and the like, it can be quite informative to harvest and parse certificates during TLS handshake.
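The grep-style filtering of banners mentioned in the first item above can be sketched in Python as follows. The input format (one `ip<TAB>banner` pair per line), the OpenSSH pattern, and the version threshold are all hypothetical and chosen purely for illustration; they do not refer to any specific vulnerability or scanner output format.

```python
import re

# Hypothetical pattern: extract major.minor from an OpenSSH banner.
VERSION_RE = re.compile(r"OpenSSH_(\d+)\.(\d+)")


def older_than(banner, threshold):
    """Return True if the banner advertises a version below the threshold."""
    match = VERSION_RE.search(banner)
    if match is None:
        return False
    version = (int(match.group(1)), int(match.group(2)))
    return version < threshold


def select_candidates(lines, threshold=(7, 7)):
    """Pick the IPs whose banner suggests an outdated version."""
    candidates = []
    for line in lines:
        ip, _, banner = line.rstrip("\n").partition("\t")
        if banner and older_than(banner, threshold):
            candidates.append(ip)
    return candidates
```

The resulting short list of addresses is what you would then hand over to a slower, more precise check, such as an NSE script.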
The Choice of Infrastructure
The fact that modern fast scanners can deliver impressive results even on modest hardware doesn’t mean that the load on the network infrastructure will be insignificant. Scan traffic consists of many small packets, so the packets-per-second rate is fairly high. In practice, one core of an average CPU (manufactured in the past few years, running in a virtual environment, under any OS, and without any fine-tuning of the TCP/IP stack) can generate a stream of about 100-150 thousand packets per second with bandwidth close to 50 Mbps. That’s a sizeable load for software routers. Network hardware may also struggle once the ASICs reach their performance limit. Moreover, at a scanning speed of 100-150 Kpps (thousand packets per second), covering the public IPv4 range on a single port takes roughly 7 to 10 hours.
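The time estimate above follows from simple division, which the following sketch reproduces. It assumes one probe packet per address and the approximate figure of 3.7 billion public IPv4 addresses; `scan_hours` is an illustrative helper, not part of any scanner.

```python
PUBLIC_IPV4_ADDRESSES = 3_700_000_000  # approximate figure used in the text


def scan_hours(packets_per_second, ports=1, probes_per_target=1):
    """Estimate how many hours a full IPv4 sweep takes at a given packet rate."""
    total_probes = PUBLIC_IPV4_ADDRESSES * ports * probes_per_target
    return total_probes / packets_per_second / 3600


if __name__ == "__main__":
    print(round(scan_hours(100_000), 1))          # 10.3 hours at 100 Kpps
    print(round(scan_hours(150_000), 1))          # 6.9 hours at 150 Kpps
    print(round(scan_hours(10_000_000) * 60, 1))  # 6.2 minutes at 10 Mpps
```

The same function shows why scanning several ports or retransmitting probes quickly multiplies the duration.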
To speed up the scan process, it makes sense to employ a distributed network of scanning nodes that will break the scan pool down into parts. Configuring a random scan sequence is another important measure to reduce bandwidth congestion and the packet load on the ISP’s “last mile” equipment.
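One way to combine both measures is to walk the address space in a pseudo-random order and hand out every n-th position to a different scanning node. The sketch below does this for a toy /24 range by iterating over the multiplicative group modulo the prime 257 with generator 3, similar in spirit to the address generation described in the ZMap paper. The range, the constants, and the function names are assumptions made for the sake of a small, checkable example.

```python
from ipaddress import IPv4Network

PRIME = 257    # smallest prime above 256, the size of a /24
GENERATOR = 3  # a primitive root modulo 257


def permuted_offsets(start=1):
    """Yield every offset 0..255 exactly once, in pseudo-random order."""
    value = start
    for _ in range(PRIME - 1):
        yield value - 1  # group elements are 1..256, offsets are 0..255
        value = (value * GENERATOR) % PRIME


def shard_targets(network, shard, total_shards):
    """Yield the addresses of a /24 assigned to one scanning node."""
    base = IPv4Network(network)
    if base.num_addresses != 256:
        raise ValueError("this toy example only handles /24 networks")
    for position, offset in enumerate(permuted_offsets()):
        if position % total_shards == shard:
            yield str(base[offset])
```

With four nodes, `shard_targets("192.0.2.0/24", i, 4)` for `i` in 0..3 gives four disjoint lists that together cover the whole range, and no node walks the addresses sequentially.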
Difficulties
Real-world experience shows that mass network scans may be accompanied by a number of technological, organizational, and legal hurdles.
Performance and Availability Issues
As mentioned above, mass network scanning generates traffic with high PPS rate that causes tangible load on both the local infrastructure and the network of the ISP that operates the scanning nodes. Before performing any scan activities for research purposes, it’s definitely a good idea to coordinate them with local admins and ISP representatives. On the other hand, performance issues may also occur at the level of the scanning endpoints if you rent virtual servers with internet hosting providers.
Not all VPSs are equally useful. With servers rented from non-mainstream hosting services, you may run into a situation where, even on high-performance VMs (with a few vCPUs and several gigabytes of memory), the packet rates of ZMap and Masscan don’t get higher than a few dozen Kpps. The usual cause is a combination of old hardware and a poorly configured virtualization layer. In practice, performance tends to be at a decent level when you work with major, industry-leading providers.
Furthermore, when vulnerable nodes are spotted, keep in mind that NSE checks may cause denial of service on the nodes being scanned. Not only is such a scenario unethical, but it’s also illegal and might lead to adverse consequences, all the way to criminal prosecution.
What Are We Looking for?
In order to find something, you need to know how to look for it. The threats are constantly evolving, and it takes some fresh, ongoing analytical insights to discover them. Sometimes the analysis provided in whitepapers gets out of date, encouraging companies to conduct research of their own.
One of the well-known examples is the research on identifying Tor nodes, which was done by the authors of ZMap in 2013. By examining a chain of certificates and spotting peculiarly generated subject name attributes in self-signed certificates, the analysts were able to identify about 67,000 Tor nodes on port tcp/443, and about 2,900 on port tcp/9001. Nowadays, a much smaller number of Tor nodes can be identified this way due to a greater variety of ports, the use of obfs transports, and the migration to Let’s Encrypt certificates. This trend encourages researchers to leverage other analytical techniques for solving such a task.
Abuse Complaints and Legislation
In the course of Internet-wide scanning, there is a nearly 100% chance of facing a flood of abuse complaints. Automatically generated complaints about a few dozen SYN packets are particularly irritating. Moreover, recurrent scans may trigger numerous manually filed complaints.
Who is complaining the most? The main sources of abuse complaints (mostly automatic ones) are educational institutions in the *.edu domain zone. Also, the reports by the ZMap and Masscan authors include several interesting examples of disproportionate reactions to network scanning.
It’s recommended to take these complaints seriously for the following reasons:
- Having received an abuse complaint, the hosting provider will most likely block the traffic to or from rented virtual machines, or may even suspend their operation temporarily.
- The ISP may disable uplink to eliminate the risks of the autonomous system being blocked.
Some of the good practices of minimizing complaints boil down to the following:
- Familiarize yourself with the Terms of Service of your ISP / hosting provider.
- Create an information page on the scan source addresses that will explain the goals of the scan. Additionally, add the appropriate commentary to the DNS TXT records.
- Clarify the essence of the scanning if you receive abuse complaints.
- Implement a list of scan exclusions, add the subnetworks of complaining parties to this list upon first request, and complement the information page with a clarification regarding your practices of adding exclusions.
- Don’t run scans longer or more frequently than required to solve a specific task.
- Distribute the scan traffic by source, destination addresses, and time where possible.
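The exclusion list from the practices above can be kept as a plain text file of addresses and subnets. The following sketch shows one possible way to load such a list and check targets against it; the file format (one entry per line, `#` for comments) and the function names are assumptions, and the scanners discussed earlier have their own mechanisms for ignore lists.

```python
from ipaddress import ip_address, ip_network


def load_exclusions(lines):
    """Parse exclusion entries, skipping blank lines and '#' comments."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries with host bits set, e.g. 10.1.2.3/8
            networks.append(ip_network(entry, strict=False))
    return networks


def is_excluded(address, exclusions):
    """Return True if the address falls inside any excluded network."""
    addr = ip_address(address)
    return any(addr in network for network in exclusions)
```

Keeping the reason for each exclusion in a trailing comment makes it easier to answer later questions from the party that asked to be excluded.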
Network Scan Outsourcing
Performing mass scans on your own should be a well-thought-out effort (the risks are highlighted above). Keep in mind, too, that the final outcome may turn out unsatisfactory if the analytical processing of the scan results isn’t thorough enough. You can make things easier, at least from a technical perspective, by using existing commercial products, such as Shodan, Censys, and ZoomEye.
Shodan
Shodan is the best-known search engine for Internet-accessible services. It was originally designed as an IoT search system and was launched by John Matherly in 2009. It currently offers several levels of access to the information it holds. Without an account, the options are quite basic. If you register with the service and pay the membership fee, the following functionality becomes available:
- Search with basic filters.
- Detailed “raw data” on nodes.
- Basic API allowing for integration with popular utilities, such as Maltego, Metasploit, Nmap, etc.
- Exporting the search results (payable with “Credits”, the service’s internal currency).
- On-demand scanning of specific addresses and address ranges (payable with “Credits”).
The options listed above should suffice for basic OSINT, but enterprise subscription can unlock the complete functionality of the search engine, which includes:
- On-demand scanning of the whole public IPv4 range of the Internet, including checkup of an assigned port range and indication of protocol-specific banner grabbers.
- Real-time notifications about new scan results.
- Obtaining “raw” scan results for further analysis (no restrictions, with the entire database being available).
- The option of creating custom grabbers fine-tuned for specific tasks (such as uncommon protocols).
- Access to older scan reports.
A number of related initiatives were launched within the Shodan project, such as Malware Hunter, Honeypot Or Not, and Exploits, which enrich scan results.
Censys
Censys is an IoT search engine featuring extended functionality. It was created by ZMap author Zakir Durumeric and presented in 2015. The engine uses the technology of ZMap and a few associated projects (ZGrab and ZTag). Unlike Shodan, it allows running five queries from one IP address per day without registration. A sign-up expands the scope of available features with an API and more search results returned for queries (up to 1,000). However, the most complete access, which includes historical data, is the prerogative of customers who purchase the Enterprise plan.
This engine’s advantages over Shodan are as follows:
- No search filter limitations for basic subscription plans.
- More frequent updates of automatic scan results (daily, according to the developers; Shodan updates much less often).
The cons of Censys include:
- Fewer ports that can be scanned, compared to Shodan.
- No on-demand scanning (either by IP ranges or by ports).
- Shallower enrichment of scan results with out-of-the-box instruments (unsurprising, given the greater number of tools Shodan offers via related projects).
If the results obtained on the range of Censys-supported ports are enough to reach a particular goal, then the lack of on-demand scanning shouldn’t be a big obstacle, considering the high speed of updating the scan results.
ZoomEye
ZoomEye is an IoT search engine created by the Chinese security company Knownsec Inc. in 2013. It is backed by the vendor’s proprietary tools, Xmap (port scanner) and Wmap (web crawler). The engine collects information on a vast range of ports and supports plenty of search criteria (ports, services, OS, applications), providing details for each host (application banner contents). The scan report lists potential vulnerabilities in the detected software, drawn from the associated Seebug project (without checking whether they actually apply). Registered accounts can use an API and a web search feature with a complete set of filters, although the number of displayed results is limited. Lifting the restrictions requires the Enterprise plan. The pros of this engine include:
- Large number of ports that can be scanned (comparable to Shodan’s).
- Extended web support.
- Identification of application software and OS.
The main cons are as follows:
- Long interval of scan results updates.
- No on-demand scanning option.
- Scan results enrichment is restricted to identifying services, application software, OS, and potential vulnerabilities.
Comparative Table of Services
The table below presents a comparison of the above-mentioned services in terms of their features:
“++” – implemented very well
“+” – implemented
“+/-” – implemented with restrictions
“-” – not implemented

| Feature | Shodan | Censys | ZoomEye |
|---|---|---|---|
| Identifiable application protocols coverage | ++ | +/- | ++ |
| Speed of updating scan results database | + | ++ | +/- |
| Generating “raw” results on scanned nodes | + | + | - |
| Availability of related projects enriching scan results | ++ | +/- | + |
| Availability of API for integration | + | + | + |
Let’s Scan Ourselves
Your own infrastructure can also be a useful TI source: it can reveal new hosts and services that have appeared in the network and show whether they are vulnerable or perhaps even malicious. Whereas you can use third-party services to scan your perimeter (for instance, Shodan officially supports this use case), you have to perform all the actions inside the perimeter on your own. The set of applicable network monitoring and analysis tools can be quite large and includes both passive network security monitors, such as Bro (now Zeek), Argus, Nfdump, and p0f, and active scanners like Nmap, ZMap, and Masscan, as well as their commercial competitors.
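Spotting new hosts and services boils down to comparing two scan snapshots. The sketch below assumes each snapshot has already been reduced to a dictionary mapping `(ip, port)` to a banner string; this data model and the function name are hypothetical and independent of any particular scanner’s output format.

```python
def diff_snapshots(previous, current):
    """Compare two scans, each a dict mapping (ip, port) to a banner string."""
    prev_keys, cur_keys = set(previous), set(current)
    return {
        # Services seen now but absent from the earlier scan.
        "new": sorted(cur_keys - prev_keys),
        # Services that disappeared since the earlier scan.
        "gone": sorted(prev_keys - cur_keys),
        # Services still present whose banner has changed.
        "changed": sorted(
            key for key in prev_keys & cur_keys if previous[key] != current[key]
        ),
    }
```

Entries under `new` and `changed` are natural candidates for a closer look, for example with a targeted Nmap run.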
The IVRE framework can help interpret the collected results, allowing you to get a Shodan / Censys-like instrument of your own. This framework was created by a group of Internet security researchers. One of them, Pierre Lalet, is a co-author of the Scapy utility. The capabilities of this framework include:
- The use of visual analytics tools to discover patterns and anomalies.
- Advanced search engine boasting detailed parsing of the scan results.
- Integration with third-party utilities via API and Python support.
IVRE is also a great match for analyzing Internet-wide scans.
Conclusion
Scanning and active network reconnaissance are excellent techniques that security researchers have used for a long time. However, many old-school security experts have yet to get comfortable with these methods. The use of OSINT and offensive techniques, combined with the classic defensive mechanisms, can considerably strengthen protection and make it proactive.