We discovered a database containing nearly 188 million records of personal data exposed on the web and accessible to anyone with an internet connection.

Comparitech, in conjunction with security researcher Bob Diachenko, uncovered the exposed MongoDB database on June 18, 2019.

Some of the records appear to come from Pipl.com and LexisNexis, a people search website and a legal search website, respectively. The records sourced from Pipl.com, which make up the bulk of the data, contained some or all of the following information:

  • First and last name
  • Aliases and past names
  • Email address
  • Physical address
  • Date of birth
  • Court and bankruptcy notes
  • Phone number
  • Social media profile links
  • Political affiliations
  • Race
  • Religion
  • Skills
  • Gender
  • Employers past and present
  • Automobiles and property


About 800,000 of the records appear to originate from LexisNexis, a legal search engine. Those records included names, past names, addresses, gender, parental status, a short biography, family members, redacted emails, and information about the person’s neighbors, including their full names, dates of birth, reputation scores, and addresses.

The database was first indexed by search engines on June 17. We traced it back to a GitHub repo for a people search API called thedatarepo. We notified the database owner as soon as we could determine to whom it belonged. The owner then shut down access on July 3, 2019.

This screenshot was taken at the beginning of our investigation; the database was subsequently updated, and the number of records increased to 188 million.

We do not know if anyone else gained unauthorized access to the database.

Whose data was exposed?

Thedatarepo has its own web domain, but the website is down as of the time of writing. Judging by the ‘dataSource’ fields in the database, it appears the creators of the API either scraped or purchased the data from Pipl and LexisNexis; it does not seem likely that either company was actually breached. Much of the personal information found through these search tools is publicly available, though normal users can only view one record at a time.

The GitHub repo shows how the API could have been used, for example to look up people by name or by the car they own. It was last updated on June 18, 2019. It lists an email address that users can contact to request “bulk data purchases and/or access to more data/requests.”

Data brokers like Pipl obtain personal information from a variety of public and proprietary sources. In doing so, they don’t ask for consent and don’t notify the people whose records end up in their databases. If you live in the US, chances are you can be found on data broker and people search websites like Pipl, ZabaSearch, WhitePages.com, Wink, and PeekYou.


Unfortunately, Pipl does not make removing your personal information easy. Absolving itself of responsibility, Pipl states it only aggregates information from third-party sources. If you want some piece of information removed from Pipl, you have to go to the original source and remove it from there. But because Pipl is now a paid service (it no longer offers a free people search tool), victims are paywalled off from viewing their own information and where it came from.

Exposed databases are a huge risk

Databases of personal information exposed to the web are a huge risk, says Diachenko, who collaborates with Comparitech on security research. Not only is personal data at risk of being stolen, but the database itself can be hijacked:

“I have previously reported that the lack of authentication allows the installation of malware or ransomware on the MongoDB servers. The public configuration allows the possibility of cybercriminals to manage the whole system with full administrative privileges. Once the malware is in place, criminals could remotely access the server resources and even launch a code execution to steal or completely destroy any saved data the server contains.”

The people whose information was leaked could be at risk of targeted phishing and identity fraud. We recommend learning how to spot phishing emails to stay safe.

LexisNexis has been breached twice in the past, once in 2005 and once in 2013. The first leaked 310,000 records containing personal information including names, addresses, Social Security numbers, and driver’s license numbers. LexisNexis did not disclose how many records were breached in 2013, but that data reportedly contained SSNs, background reports, and other details on millions of Americans.

Comparitech will update this article if we discover more details to report.