The 2.8GB torrent was compiled by hacker Ron Bowes of Skull Security, who created a web crawler program that harvested data on users contained in Facebook’s open access directory, which lists all users who haven’t bothered to change their privacy settings to make their pages unavailable to search engines.
Bowes’ directory contains 171 million entries, relating to more than 100 million individual users – more than one in five of Facebook’s recently trumpeted half billion user base.
The file contains user account names and a URL for each user’s profile page, from which details such as addresses, dates of birth or phone numbers can be accessed. Accessing a user’s page from the list will also enable you to click through to friends’ profiles – even if those friends have made themselves non-searchable.
Read more on THINQ.
As of this posting, Skull Security’s site is timing out, probably because the story was Slashdotted. The original post, available in Google’s cache, reads in part:
I wrote a quick Ruby script (which has since become a more involved Nmap Script that I haven’t used for harvesting yet) that I used to download the full directory. I should warn you that it isn’t exactly the most user friendly interface — I wrote it for myself, primarily, I’m only linking to it for reference. I don’t really suggest you try to recreate my spidering. It’s a waste of several hundred gigs of bandwidth.
The results were spectacular. 171 million names (100 million unique).
[…]
But it occurred to me that this is public information that Facebook puts out, I’m assuming for search engines or whatever, and that it wouldn’t be right for me to keep it private. Why waste Facebook’s bandwidth and make everybody scrape it, right?
So, I present you with: a torrent! If you haven’t downloaded it, download it now! And seed it for as long as you can.
This torrent contains:
- The URL of every searchable Facebook user’s profile
- The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc)
- Processed lists, including first names with count, last names with count, potential usernames with count, etc
- The programs I used to generate everything
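For a sense of what the "processed lists" in that torrent involve, here is a minimal Python sketch of tallying first names, last names and candidate usernames from a list of full names. This is not Bowes’s code: the function name, the sample names and the first-initial-plus-last-name username scheme are all assumptions made for illustration.

```python
from collections import Counter


def name_counts(full_names):
    """Tally first names, last names and guessed usernames (all lowercased)."""
    first, last, usernames = Counter(), Counter(), Counter()
    for name in full_names:
        parts = name.lower().split()
        if not parts:
            continue  # skip blank entries
        first[parts[0]] += 1
        if len(parts) > 1:
            last[parts[-1]] += 1
            # Assumed scheme: first initial + last name, e.g. "asmith".
            usernames[parts[0][0] + parts[-1]] += 1
    return first, last, usernames


# Made-up sample data, not taken from the torrent.
sample = ["Alice Smith", "Bob Smith", "alice Jones", "Cher"]
first, last, usernames = name_counts(sample)
for label, counter in (("first", first), ("last", last), ("username", usernames)):
    print(label, counter.most_common(3))
```

Sorting each tally by count, as `most_common` does, gives the kind of "names with count" list the post describes.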
Update: The link to THINQ was removed after the site was bought by an unrelated firm that now redirects it to advertising.