A Field Guide to Internet Piracy
Types of pirate sites and how they operate
Lydia Crowe, University of Iowa Press
AAUP, June 21, 2013
1. Indexing Sites
Do not host content
Link to content from around the web
2. Locker sites
Legal uses: online storage of large files; transfer of files too large for e-mail.
Examples: megaupload.com (RIP),
Right: Kim Dotcom, Megaupload mogul.
3. Document hosting sites
Both store and index files.
Legal uses: document sharing for teachers, companies, authors, etc.
4. Torrent search engines
Index of torrent files.
Torrents do not contain book content – only metadata. Users, not torrent sites, are liable for infringement.
Content is hosted on individuals’ computers and can be downloaded in parts from multiple users (seeders) at the same time. Any given torrent can be hosted and accessed in multiple countries and jurisdictions.
Examples: ISOHunt.com, ThePirateBay.org
5. Usenet search engines
Usenet: a discussion system that can also be used to exchange ASCII-encoded binary files
Source: Wikimedia Commons. Usenet Binaries Upload process.PNG. Author: Dale Mahalko.
Users pay for subscriptions to the
Usenet has no central system but consists of a dynamic array of servers. When the servers connect, they pass files back and forth.
Once a book is posted to Usenet, removing it is nearly impossible.
Source: Wikimedia Commons. Hercules and the Lernaean Hydra. National Archaeological Museum of Spain. Photo: Luis García.
Some sites claim to have certain e-books, but ask the user to pay a fee for access to the file.
Often, paying this fee gives the user access to a torrent network that he/she could have accessed for free and that may or may not have the file he/she wants.
7. E-book lending sites
Some e-books are lendable. If the publisher has allowed it, they can be loaned to another user’s e-reader for 14 days.
Users sign up to e-book lending sites and list the books they own. Then they can look at other people’s collections and ask to borrow books.
NOT piracy – but your authors might see it
Examples: lendle.me, ecolibris.net,
Anti-piracy at the University of Iowa
Targets: indexing sites, locker sites, and document hosting sites
1. Locate infringing files using Google
Use site-specific searches. For example, <“university of iowa press” site:ebook3000.com> will find everything on that site containing the phrase “university of iowa press”.
Google Alerts for the press name or for specific titles can also turn up infringing sites.
2. Record titles and URLs on a spreadsheet
3. Send mass takedown notices
4. Follow up for compliance
1. Introduce self: experience (4 yrs; Rights & Permissions for 3 yrs).
2. Introduce presentation: a little about what you will see when you start searching for book piracy; what it's practical to take action against and how to do it; and lastly, a little about the anti-piracy efforts the University of Iowa Press has taken and the results we've seen. I hope this presentation isn't too basic; you may already be familiar with a lot of this information. But I find that a thorough understanding of how these sites operate helps you decide where to spend time and energy policing them, and helps you explain to authors whether an apparent link to their book is something to be concerned about. So, without further ado…
3. The most common thing you'll see is indexing sites. Sometimes they're blogs built on a free platform like Blogger or WordPress; sometimes they're more complex sites, with search tools and books organized by category. They don't actually host any files and have a kind of symbiotic relationship with…
4. Locker sites. These servers are where your books actually live. They're not directly associated with the indexing sites; an indexing site's page for a book may link to many different locker sites. But locker sites are generally not searchable, and the file names are often strings of random characters, so indexing sites are what help users find the files they want on locker sites. Locker sites have legal uses too, such as online file storage or transferring files too big to send as e-mail attachments, but they're largely used for piracy, and some are worse offenders than others. This leaves the question: do you go after the indexing site, the locker site, or both? In general, I target only the locker site, since that is where the file is actually stored. There's also a theory that leaving indexing sites full of dead links frustrates would-be downloaders and has a deterrent effect. Whether that's true I don't know, but in my experience, links on indexing sites are not updated regularly: once a locker site has taken the book file down, the dead link tends to stay up, and that renders the indexing site's listing useless.
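The indexing/locker relationship, and why a dead link matters, can be sketched as a tiny data model. This is a hypothetical illustration, not a tool the Press uses: `IndexListing`, `live_links`, and `is_dead_listing` are invented names, and the URLs are placeholders. The one rule it encodes comes from the note above: because an indexing page for a book may link to several locker sites, the listing only becomes useless once every locker copy has been taken down.

```python
from dataclasses import dataclass, field


@dataclass
class IndexListing:
    """One book's page on an indexing site (hypothetical model)."""
    title: str
    locker_links: list[str] = field(default_factory=list)


def live_links(listing: IndexListing, removed: set[str]) -> list[str]:
    """Locker links on this listing whose files have not been taken down."""
    return [url for url in listing.locker_links if url not in removed]


def is_dead_listing(listing: IndexListing, removed: set[str]) -> bool:
    """A listing stops working only once every one of its locker links is dead."""
    return not live_links(listing, removed)


# Placeholder URLs: one book listed with copies on two different locker sites.
listing = IndexListing(
    "Example Title",
    ["https://locker-a.example/f/x7Qp2", "https://locker-b.example/d/9kLm0"],
)
removed = {"https://locker-a.example/f/x7Qp2"}
print(is_dead_listing(listing, removed))  # False: locker-b still serves the file
removed.add("https://locker-b.example/d/9kLm0")
print(is_dead_listing(listing, removed))  # True: every link is now dead
```

In other words, a takedown sent to only one of several locker sites leaves the indexing page working.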
5. Another type of site that both hosts and indexes books is the document hosting site, the most popular being Scribd. These sites have numerous legal uses, but they are also used to share books illegally. In my experience these sites are quite compliant. Scribd, in particular, is very good about blacklisting specific e-book files so that they can never be uploaded again.
6. Now for some of the trickier kinds of sites that are more difficult to police: BitTorrent and Usenet search engines. First, BitTorrent sites. These are indexes of torrent files; ThePirateBay.org is probably the most notorious example. Torrent files contain no book content. A torrent file consists of metadata, such as the size and structure of the file, the title, and a list of trackers (servers that help BitTorrent users find each other). What makes torrent sites tricky is that, because they host no content, they are generally not directly liable for infringement. Users of torrent sites exchange files directly from their personal computers, and this makes the users liable. So while individuals could be sued or prosecuted for piracy, and have been, few presses have the resources to do this: it would require getting records of download activity from individuals' ISPs and combing through them for instances of your books being pirated. And since a torrent can be hosted simultaneously in multiple countries and jurisdictions, many users have no legal obligation to comply with a takedown request. It is simply impractical for us to go after them, so I ignore them.
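To make "metadata, not content" concrete, here is a minimal sketch of what a single-file torrent holds, written in the bencode format described in the BitTorrent specification (BEP 3). The helper names, file name, tracker URL, and 16 KiB piece size are hypothetical choices for illustration. Besides the name, size, and tracker mentioned above, the metadata carries a SHA-1 hash of each piece so downloaders can verify what they receive; the book's text itself never appears in it.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode an int, str/bytes, list, or dict in BitTorrent's bencode format."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, tracker: str, piece_length: int = 16384) -> dict:
    """Build single-file torrent metadata: name, size, tracker, and piece hashes."""
    # One 20-byte SHA-1 digest per piece, concatenated. These let downloaders
    # verify pieces; they cannot be turned back into the file's contents.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker,
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,
        },
    }


book = b"x" * 1_000_000  # stand-in for a roughly 1 MB e-book file
torrent = bencode(make_metainfo("example.epub", book, "http://tracker.example/announce"))
print(len(book), len(torrent))  # the .torrent is a tiny fraction of the file's size
```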
7. The good news is that, according to a recent survey by a Princeton student, books make up 1% or less of the content exchanged through BitTorrent. BitTorrent is much more popular for larger files like movies, TV shows, and software. Another thing to keep in mind is that for the torrent system to work, at least one person with the complete file (or several people who together hold all of its pieces) has to be online and sharing it. Sharing a complete file this way is called "seeding." A book title may be listed on a torrent site, but unless users are online with the book, there's no way to download it. So it's much harder to obtain a scholarly book with a small audience on BitTorrent than a commercial bestseller. This is good news for us.
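The seeding point can be stated precisely: a torrent is downloadable only while the pieces held by peers who are online at that moment, taken together, cover the whole file. A minimal sketch, with a hypothetical `downloadable` helper and made-up peer data:

```python
def downloadable(num_pieces: int, online_peers: dict[str, set[int]]) -> bool:
    """True if the pieces held by peers online right now cover the whole file."""
    available = set().union(*online_peers.values())
    return available >= set(range(num_pieces))


# Popular title: a complete seeder is online alongside a partial downloader.
print(downloadable(4, {"seeder": {0, 1, 2, 3}, "peer": {0, 1}}))  # True
# Niche title: the only seeder has logged off; the rest hold pieces 0-2 only.
print(downloadable(4, {"peer_a": {0, 1}, "peer_b": {1, 2}}))  # False
```

A scholarly title with few readers is simply more likely to land in the second case.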
8. Now a little about Usenet. It was originally a text-based system, used to distribute news articles and foster discussion, but it is also used to exchange binary files that have been encoded into ASCII text. Here's a diagram showing how a file, in this case a DVD, is uploaded to a Usenet newsgroup. First the whole DVD is turned into a single binary file; then it's split and compressed. Next, parity files are created, which let a downloader rebuild parts that go missing or arrive damaged. Then all the files are encoded into characters that can be sent over Usenet. Finally, someone downloads the files and reverses the whole process to turn all the characters back into a DVD.
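The upload steps in the diagram can be mimicked in a few lines. This is a simplified sketch, not the real Usenet toolchain: posters typically use RAR archives, PAR2 parity files (Reed-Solomon codes that can repair several missing pieces), and yEnc encoding, whereas here `zlib` compression, a single XOR parity block, and Base64 stand in for them, and all helper names are hypothetical. It shows why the parity file matters: if one posted part never arrives, the downloader can still rebuild the file.

```python
import base64
import zlib
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal-size parts, zero-padding the last one."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(n)]


def make_parity(parts: list[bytes]) -> bytes:
    """A single XOR parity block: enough to rebuild any ONE missing part."""
    return reduce(xor_bytes, parts)


def encode(block: bytes) -> str:
    """Turn binary data into plain ASCII text that a text-only system can carry."""
    return base64.b64encode(block).decode("ascii")


def rebuild(parts: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Recover at most one missing part (None) by XOR-ing parity with the rest."""
    missing = [i for i, p in enumerate(parts) if p is None]
    if len(missing) > 1:
        raise ValueError("one XOR parity block can repair only one missing part")
    fixed = list(parts)
    if missing:
        present = [p for p in parts if p is not None]
        fixed[missing[0]] = reduce(xor_bytes, present, parity)
    return fixed


# Uploader: compress, split, add parity, encode everything as text "posts".
original = b"Chapter 1. It was a dark and stormy night. " * 200  # stand-in file
compressed = zlib.compress(original)
parts = split(compressed, 4)
posts = [encode(p) for p in parts] + [encode(make_parity(parts))]

# Downloader: decode every post, but suppose part 2 never arrived.
received = [base64.b64decode(p) for p in posts]
data_parts, parity = received[:4], received[4]
data_parts[2] = None
restored = b"".join(rebuild(data_parts, parity))[:len(compressed)]
print(zlib.decompress(restored) == original)  # True
```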
8. Usenet is a subscription service. What's particularly clever about it is that there is no central server: it's a dynamic array of servers that constantly pass files back and forth. Once a book is posted to Usenet, it's almost instantaneously propagated to multiple servers all over the world, so removing it is nearly impossible. If one server removes it, it's still on all the others, and it will soon be "re-propagated." (This is a literary audience, so you can see why I chose a picture of Hercules and the Hydra instead of a technical diagram to illustrate this!) As with BitTorrent, servers are located all over the world, many in countries where such piracy is legal. And unlike with BitTorrent, prosecuting users is almost impossible, because many Usenet providers keep their users' IP addresses private and maintain no logs of user activity. This makes going after Usenet piracy not just impractical but nearly impossible! So let's change the subject before you all get too depressed...
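The Hydra problem comes from how Usenet servers exchange posts: each server passes new articles to the servers it peers with, which pass them on in turn. A minimal sketch, assuming a small made-up network (the server names and the `propagate` helper are hypothetical, and real NNTP peering follows more rules than this plain flood):

```python
from collections import deque


def propagate(peers: dict[str, list[str]], origin: str) -> set[str]:
    """Flood an article from origin to every server reachable via peering links."""
    have = {origin}
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for neighbour in peers.get(server, []):
            if neighbour not in have:
                have.add(neighbour)
                queue.append(neighbour)
    return have


# Hypothetical peering links between servers in several countries.
peers = {
    "us-1": ["us-2", "nl-1"],
    "us-2": ["us-1", "de-1"],
    "nl-1": ["us-1", "de-1", "se-1"],
    "de-1": ["us-2", "nl-1"],
    "se-1": ["nl-1"],
}
carrying = propagate(peers, "us-1")
carrying.discard("us-1")  # one server honours a takedown
print(sorted(carrying))   # ['de-1', 'nl-1', 'se-1', 'us-2']
```

Honouring a takedown on one server removes only that server's copy; readers subscribed to any of the others can still retrieve the book.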
9. It's not surprising that all these online marketplaces of content have given rise to some practices that prey on users, too. It's common to come across sites that ask the user to pay for a subscription to access thousands of downloadable e-books. Often these are just fronts for BitTorrent search engines, offering the same content that, say, ThePirateBay provides for free. And as we've seen, there is no guarantee that the user can actually find an active torrent of the book he wants. The one I come across most often is downloadprovider.me, so if you see a book that appears to be on downloadprovider.me, keep this in mind! When I come across a site I haven't seen before, I find it useful to click through and see how difficult it is to reach the e-book file. If clicking a download link redirects you to a paywall, take the site's claims with a grain of salt.
10. The last thing I want to mention before I move on is e-book lending, a legal practice that often looks suspicious but, as far as I can tell, is harmless. Some e-books are lending-enabled: they can be transferred from one e-reader to another a limited number of times, for a limited period. E-book lending sites keep lists of users and the books they own, and users can browse each other's collections and ask to borrow books. These sites don't host any book content; they just help users find each other, and then the users go to, say, Amazon.com and use the lending function there. University of Iowa Press books are not lending-enabled, so even though one of our books shows up in a user's list from time to time, that user couldn't actually lend it out. A year ago an author contacted me, quite distraught about his book being listed on one of these sites, and it was hard to convince him that this was a legitimate practice! But it is, and it's worth familiarizing yourself with how these sites look and whether your titles are lending-enabled.
11. I'll close by speaking briefly about how we've been addressing the piracy problem at the University of Iowa Press, which, I'll be the first to admit, has been pretty simple and unsophisticated. I use Google searches and Google Alerts to find infringing files. I target only indexing/locker sites and document hosting sites, for all the reasons I've discussed. My usual process is to find a specific site where a lot of books are being offered and then run a site-specific search. For example, "university of iowa press" (in quotation marks) followed by site:ebook3000.com will find everything on that site containing the phrase University of Iowa Press. The only problem is that some sites omit the name of the press, and some even scramble the title or author name! In that case, you need to search for individual books by title or author. This is obviously more time-consuming, and if you have a large list, you may only be able to target bestselling or new titles. Fortunately for us, our list is relatively small. I then record the titles and URLs for that site on a spreadsheet, and once I've combed through the site, I send a single takedown notice covering all the URLs. It's also useful to follow up in a few weeks to see whether the file has actually been taken down or whether the broken links have been updated. Sometimes it takes a few whacks with the whac-a-mole mallet, so to speak, to get the file to stay down.
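The search-and-batch routine is simple enough to script in part. A minimal sketch, assuming the spreadsheet can be exported as (title, URL) rows; `site_query` and `group_by_site` are hypothetical helpers, and the search itself is still run by hand in Google. It builds the site-restricted query described above and groups the recorded URLs by host, so each site receives one notice covering all of its links.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_query(phrase: str, site: str) -> str:
    """Build a site-restricted search string: the quoted phrase plus site:<domain>."""
    return f'"{phrase}" site:{site}'


def group_by_site(rows: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (title, url) spreadsheet rows by host so each site gets one notice."""
    batches: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for title, url in rows:
        batches[urlsplit(url).hostname].append((title, url))
    return dict(batches)


print(site_query("university of iowa press", "ebook3000.com"))
# "university of iowa press" site:ebook3000.com

rows = [  # hypothetical spreadsheet rows
    ("Title A", "https://locker-a.example/f/123"),
    ("Title B", "https://locker-b.example/d/456"),
    ("Title C", "https://locker-a.example/f/789"),
]
for site, batch in group_by_site(rows).items():
    print(site, len(batch))  # locker-a.example 2, then locker-b.example 1
```

Grouping by exact host name treats `www.example.com` and `example.com` as different sites; merging those is left out for brevity.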
12. Here's a shot of the spreadsheet I use, as an example. I also use the far-left column to record the number of times I have found the file; this seems like information that could come in handy someday. Now, have my efforts been effective? Without a more sophisticated way of measuring pirate activity, I can't say for sure. But what I can tell you is that the number of active links I find has gone down drastically since I started this work a couple of years ago. I used to spend an hour or more a day finding links and sending takedown notices; now it's an hour a month on average, if that. I'm not sure whether this means I've been successful or that book downloaders have become better at concealing their activities. But my small amount of work on this does seem to have made a difference, and that's the good news I really want everyone to take away. It can seem overwhelming when you first venture out there and see how much piracy there is, but sending takedown notices is quick and easy when you send them en masse, and the vast majority of sites I've interacted with comply with these requests. I hope this information has been helpful. Thank you very much for your attention. There will be time for questions in a few minutes, but for now I'm going to turn the floor over to Mike Schwartz of Princeton University Press...
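The far-left "times found" column can be kept as a running tally across search sweeps, which also flags files that have come back after a takedown (the whac-a-mole case). A minimal sketch with a hypothetical `record_sweep` helper; keying the tally by URL is an assumption, and keying by title instead would also catch re-uploads at new addresses.

```python
from collections import Counter


def record_sweep(tally: Counter, found_urls: list[str]) -> list[str]:
    """Add one sweep's findings to the running count; return URLs seen before."""
    repeats = [url for url in found_urls if tally[url] > 0]
    tally.update(found_urls)
    return repeats


tally = Counter()
record_sweep(tally, ["http://locker.example/a", "http://locker.example/b"])
back_again = record_sweep(tally, ["http://locker.example/a"])
print(back_again)                        # ['http://locker.example/a']
print(tally["http://locker.example/a"])  # 2
```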