A database containing more than 267 million Facebook user IDs, phone numbers, and names was left exposed on the web for anyone to access without a password or any other authentication.

Comparitech partnered with security researcher Bob Diachenko to uncover the Elasticsearch cluster. Based on the evidence, Diachenko believes the trove of data is most likely the result of an illegal scraping operation or Facebook API abuse by criminals in Vietnam.

The information contained in the database could be used to conduct large-scale SMS spam and phishing campaigns, among other threats to end users.

Diachenko immediately notified the internet service provider managing the IP address of the server so that access could be removed. However, Diachenko says the data was also posted to a hacker forum as a download.

Update on March 6, 2020: A second server was exposed by what appears to be the same criminal group. It held the same data as the first server, plus an additional 42 million records. We’ve updated this article accordingly.

Timeline of the exposure

The database was exposed for roughly two weeks before access was removed. Here’s what we know:

  • December 4, 2019 – The database was first indexed by search engines.
  • December 12, 2019 – The data was posted as a download on a hacker forum.
  • December 14, 2019 – Diachenko discovered the database and immediately sent an abuse report to the ISP managing the IP address of the server.
  • December 19, 2019 – Access to the database was removed.
  • March 2, 2020 – A second server containing identical records plus an additional 42 million was indexed by search engine BinaryEdge.
  • March 4, 2020 – Diachenko discovered the second server and alerted the hosting provider.
  • March 4, 2020 – The server was attacked and destroyed by unknown actors.
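The intervals implied by the timeline above can be worked out with a short calculation. This is a minimal sketch that assumes the search engine indexing date marks the start of each exposure; the servers may have been reachable earlier. The variable names are illustrative only.

```python
from datetime import date

# Dates taken from the timeline above (first server).
first_indexed = date(2019, 12, 4)
posted_to_forum = date(2019, 12, 12)
discovered = date(2019, 12, 14)
access_removed = date(2019, 12, 19)

# Assumption: the indexing date is treated as the start of the exposure.
exposure_days = (access_removed - first_indexed).days
days_until_discovery = (discovered - first_indexed).days
days_to_takedown = (access_removed - discovered).days

# Dates for the second server.
second_indexed = date(2020, 3, 2)
second_discovered = date(2020, 3, 4)
second_days_until_discovery = (second_discovered - second_indexed).days

print(f"First server exposed for {exposure_days} days")
print(f"Discovered after {days_until_discovery} days, removed {days_to_takedown} days later")
print(f"Second server discovered after {second_days_until_discovery} days")
```

Measured this way, most of the first exposure passed before the database was discovered, and the forum posting predates the discovery by two days.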

Typically, when we find exposed personal data like this, we take steps to notify the owner of the database. But because we believe this data belongs to a criminal organization, Diachenko went straight to the parties hosting the servers and relevant IP addresses.

Shortly after Diachenko discovered the second server, it was attacked by an unknown party. The databases of personal info were replaced with dummy data and database names that read, “please_secure_your_servers”.

[Image: Exposed database prior to unknown attack.]

[Image: Exposed database after unknown attack.]

What data was exposed

Initially, 267,140,436 records were exposed, and Diachenko says all of them seem to be valid. Most of the affected users were from the United States. Each record contained:

  • A unique Facebook ID
  • A phone number
  • A full name
  • A timestamp

The server included a landing page with a login dashboard and welcome note.

Facebook IDs are unique, public numbers associated with specific accounts, which can be used to discern an account’s username and other profile info.

The second server exposed in March 2020 contained the same 267 million records as the previous one, plus an additional 42 million records. It was hosted on a US Elasticsearch server. 25 million of those records contained similar information: Facebook IDs, phone numbers, and usernames.

16.8 million of the new records contained even more info, including:

  • Facebook ID
  • Phone number
  • Profile details
  • Email addresses
  • Some other personal details
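As a quick sanity check on the figures above, the two groups of new records can be added together and combined with the original count. This is only a rough tally: the 25 million and 16.8 million figures are rounded, so the totals are approximate, and the variable names are illustrative.

```python
# Exact count reported for the first server.
first_server_records = 267_140_436

# Rounded figures reported for the records added on the second server.
new_basic_records = 25_000_000     # Facebook IDs, phone numbers, usernames
new_detailed_records = 16_800_000  # also profile details, email addresses, etc.

new_records = new_basic_records + new_detailed_records
second_server_total = first_server_records + new_records

print(f"New records on second server: about {new_records:,}")
print(f"Approximate total on second server: {second_server_total:,}")
```

The two groups sum to roughly 41.8 million, which matches the "additional 42 million" figure after rounding and puts the second server at about 309 million records.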

How did criminals get Facebook data?

How criminals obtained the user IDs and phone numbers isn’t entirely clear. One possibility is that the data was stolen from Facebook’s developer API before the company restricted third-party developers’ access to phone numbers in 2018. Facebook’s API is used by app developers to add social context to their applications by accessing users’ profiles, friends lists, groups, photos, and event data.

Diachenko says Facebook’s API could also have a security hole that would allow criminals to access user IDs and phone numbers even after access was restricted.

Another possibility is that the data was stolen without using the Facebook API at all, and instead scraped from publicly visible profile pages.

“Scraping” is a term used to describe a process in which automated bots quickly sift through large numbers of web pages, copying data from each one into a database. It’s difficult for Facebook and other social media sites to prevent scraping because they often cannot tell the difference between a legitimate user and a bot. Scraping is against the terms of service of Facebook and most other social networks.

Many people have their Facebook profile visibility settings set to public, which makes scraping them trivial.

This isn’t the first time such a database has been exposed. In September 2019, 419 million records across several databases were exposed. These also included phone numbers and Facebook IDs.

Dangers of exposed data

A database this big is likely to be used for phishing and spam, particularly via SMS. Facebook users should be on the lookout for suspicious text messages. Even if the sender knows your name or some basic information about you, be skeptical of any unsolicited messages.

Facebook users can minimize the chances of their profiles being scraped by strangers by adjusting their account privacy settings:

  1. Open Facebook and go to **Settings**
  2. Click **Privacy**
  3. Set all relevant fields to **Friends** or **Only me**
  4. Set **“Do you want search engines outside of Facebook to link to your profile?”** to **No**

This will reduce the chances of your profile being scraped by third parties, but the only way to ensure it never happens again is to completely deactivate or delete your Facebook account.

How and why we discovered this data

Comparitech works with Bob Diachenko to uncover unsecured databases and report them to the public. Our aim is to limit access to and abuse of personal data by malicious parties, and to raise awareness among those affected about the potential risks.

Upon discovering exposed data, Diachenko immediately notifies those responsible so the database can be shut down or secured. We then analyze the leak to identify victims, the duration of the exposure, and any potential threats victims might face.

Previous reports

Comparitech and Diachenko regularly team up to uncover exposed data.

DeHashed.com, a breach notification, prevention, and consultancy service, also discovered the second data exposure and contacted us to confirm evidence indicating the same criminal group was responsible.