Welcome to this session on piracy and library research practices. I’d first like to thank everyone for coming to hear us speak about this topic, and I’d especially like to thank Marydee for setting up this session and inviting me to present today. The abstract for this session implies that I’m a copyfighter, which is probably a stronger sentiment than is true. I am responsible for site license administration and online journal content at Caltech, and in that role I often come across issues in content security. It was in that capacity that I began to think that the issues surrounding online content security weren’t well understood by either libraries or publishers, so I organized a panel session at the Charleston Conference for Acquisitions and Collection Development to talk about it. Marydee asked me to construct a similar presentation for today’s session.
We all know what a pirate looks like. He could look like Blackbeard here, who, I notice, appears to have the original six-pack abs. CLICK Or he could look like Johnny Depp here as Jack XXX the pirate. Or maybe pirates nowadays don’t look anything like these guys, but instead CLICK
They probably look more like these kids. They could look relatively normal, could have pencils stuck up their noses, could be dunked into garbage cans of who knows what liquid, or could scare you with their hair color. These are all pictures of Caltech students, and these are the individuals who are the potential pirates we’re talking about today. But of course, these students are anything but pirates… CLICK
Or are they? This is a picture of the Millikan Memorial Library at Caltech, and that really is a Jolly Roger that the students raised onto the building. Again, this was a prank, but it makes us wonder whether the library facilitates piracy. So, delving into these issues, I’d like to review some of the development of online content licensing in libraries. CLICK
Back in 1999, I attended the very popular and very important ARL Workshop on Licensing Electronic Information Resources. During that workshop, as we were told about the importance of negotiating certain aspects of licenses, I began to wonder if it was all that necessary. Surely no publisher would sue a library, or vice versa. I even raised my hand and asked, “Has any publisher sued any library over failure to comply with negotiated license terms?” The answer was a resounding “Not yet!”, but everyone was sure it was coming. I wasn’t so sure, though, and I remain unconvinced. The time since then has brought many instances of Internet security-related litigation, especially copyright infringement litigation, and, thanks to the RIAA and Napster, lawsuits levied against providers, middlemen, and even end users. But nothing yet in libraries, and that’s a good thing. As a community, we usually tackle and resolve our issues before the need to litigate even develops. And that’s what this panel discussion is about. How do we continue to work like this in an increasingly distributed digital environment? How do we make sure the concerns of information providers are met realistically and consistently, yet ensure that libraries can continue to legitimately serve the needs of their users? What processes can we develop that allow information producers, providers, vendors, and libraries to work together effectively and enforce the licenses that we’ve negotiated?

The security issues for licensed online content stem from licenses adapted from database and software vendors, whose models didn’t really fit academic research materials and the mission of research libraries. Over the years, as an industry, we’ve come to some basic understanding on most license clauses, including Who, What, When, Where, How…and mysteriously absent is the Why…as in, why do we need licenses for online content?
Well, the why is implied through the Restrictions on Use clauses in our licenses. Of course, information providers wanted to protect their copyright and make sure that providing information in a new format would not have a negative impact on their businesses. So most licenses included clauses outlining prohibited users and prohibited types of usage and, in some licenses, clauses outlining the consequences of prohibited usage.
Most prohibited uses outlined in our licenses seem logical and based on common sense: things like altering, recompiling, reselling, publishing or republishing, making persistent local copies, altering copyrights or changing publishers’ or authors’ names, etc. Most of them are either so unusual that they’re unlikely ever to occur, too difficult for the average or even above-average user to accomplish, or unlikely to happen because the potential users would lack a clear motivation to do such a thing. Everyone loves some type of music, music is expensive to acquire, and sharing it is easy, so there’s a clear motivation to do just that. But not everyone really cares about that article on copper oxides or the contribution of backyard grills to air pollution. Still, we’ve all seen some violations of prohibited uses, and to me the major ones that come up fall into about three categories: systematic copying or downloading, downloading by volume, and allowing unauthorized users to access content. These things do occur, and I’ll outline some examples from Caltech along these lines. What I’m really interested in is working out a process to stop these common breaches from occurring and getting libraries and publishers on the same page when they need to communicate about these instances. Let’s take a quick look at a few license examples, some recent violations of prohibited uses that have come up, and what we need in order to rectify these things.
Now, like I said before, what is actually written and what happens might be two totally different things, and the next few real-life examples bear this out. Each of these actually happened, and each brings to light aspects of online journal security that could be points to discuss.

1. Open Proxy. The first example I’d like to talk about is access through an open proxy. Interestingly enough, long after the initial hullabaloo about the issue, JSTOR did identify an open proxy at Caltech and notified us about it. The identification was made before anyone used it to access JSTOR’s products from our site, but it was helpful to know about the issue, and to learn that even at a place that prides itself on its secure systems, an individual researcher could fail to configure a system correctly and affect the whole institute and our publishing partners. In essence, JSTOR just wanted to educate us about the issue, tell us that we were unwittingly contributing to it, and ask us to do something about it. There were no consequences if we didn’t and no follow-up if we did.

2. Data Harvesting. We were contacted by a dictionary vendor and told that they had detected a systematic program downloading content from their web servers. Why would someone do that?

3. Sequential Downloads. The publisher’s notice read, in part:

However, recent usage made of this service from your institution exceeds what is regarded as normal and reasonable. This activity was isolated to two hosts identified at IP address 131.215.***.*** and 131.215.***.*** on December 18th. Many of the requests were sequential and systematic--that is, 1,083 requests, in “Journal of Exceptional Downloads” were downloaded consecutively and within short intervals. Access from the IP ranges 131.215.x.x and 131.215.226.x have been temporarily suspended. Note that systematic and programmatic downloading are two of the Prohibited Uses listed in the Institutional User Agreement that you signed (refer to Section 5, Prohibitions on Certain Uses).

We would appreciate it if you would investigate the situation and report back your findings to Publisher. Please note that we would like a reply by January 10th, 2003; if no reply is received and/or this systematic downloading continues, access may be suspended from the entire IP range for your institution. We also require an assurance from you that such systematic downloading will not take place again.

What’s there: the IPs it came from, the date it happened, a count of downloads, and at least one journal affected. What’s not there: the time it happened, the exact material affected, and what was downloaded (abstracts, full text, etc.). They also asked for a reply within about 20 days. And what constitutes ‘assurance’, and what makes that ‘assurance’ enforceable?

4. Systematic Downloads.
Using these same case studies, we can look at what really happened and say that each of the occurrences was a case of Accidental Piracy.

JSTOR open proxy: a misconfigured server, and a virus. The second instance was probably a case where a user downloaded something from the Web that was infected, and it opened a proxy for a nefarious purpose, almost definitely not to acquire academic research materials, but probably movies and music.

Data Harvesting.

Excessive Downloads.

Automated Downloads.
These case studies illustrate a number of problems for libraries. First, most responses to security breaches in libraries are reactive, not proactive. Publishers and libraries wait until a breach occurs before investigating and stopping it. Usually, as I’ve pointed out, it’s not really a case of Intentional Piracy but of use of the content in a way that makes sense to the user, that the system allows, and that is not done maliciously or to repurpose the content. Remember, this is academic research information, not movies or music, so the desire to acquire it is much less. Libraries also have problems because the information communicated to the library is usually incomplete, or inconsistent from one publisher to the next. We usually don’t know what the trigger events are or what we’re supposed to do to fix the breach. In most instances, doing nothing seems to work just fine for many publishers, but I don’t see that as a scalable solution.
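To make “trigger event” concrete, here is a minimal sketch of the kind of rule a publisher could write down and share with its subscribers. Everything in it is an assumption for illustration: the function name, the shape of the log records, and the thresholds (more than 100 requests for one journal from one IP within 10 minutes) are hypothetical and are not taken from any license or from any publisher’s actual monitoring.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
MAX_REQUESTS = 100
WINDOW = timedelta(minutes=10)


def find_trigger_events(requests, max_requests=MAX_REQUESTS, window=WINDOW):
    """Return (ip, journal, timestamp) the first time each IP/journal pair
    exceeds `max_requests` requests inside a sliding `window`.

    `requests` is an iterable of (timestamp, ip, journal) tuples,
    already sorted by timestamp.
    """
    recent = {}      # (ip, journal) -> timestamps still inside the window
    flagged = set()  # pairs already reported, so each is reported once
    events = []
    for ts, ip, journal in requests:
        key = (ip, journal)
        stamps = recent.setdefault(key, deque())
        stamps.append(ts)
        # Drop requests that have aged out of the window.
        while ts - stamps[0] > window:
            stamps.popleft()
        if len(stamps) > max_requests and key not in flagged:
            flagged.add(key)
            events.append((ip, journal, ts))
    return events
```

A rule like this, stated up front, would tell the library exactly which event led to a suspension, and the returned IP, journal, and timestamp are the sort of details a notice would need to carry for the library to trace the user.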
These examples bring to mind a number of issues about online journal security. As a librarian, I see most of them from the viewpoint of a staff member who is responsible for negotiating license terms and, when those terms are perceived to be violated, for enforcing the terms or rectifying the actions with the provider. Clearly we need to improve the processes that we have as an industry on the following topics:
Initial (pro-active) enforcement of license terms (notification / education)
Technical systems at the library to ensure compliance
The ability of technical and social systems to react to enforcement
Social systems that enforce or educate for compliance (i.e. signage, popups, clickthroughs, notes on screen)
And as librarians, why do we care about these issues? First and foremost, we want to provide information to our users and not violate our licenses. We want to negotiate licenses that are clear about what we are required to do, and not be hit by surprises during the life of the contract. We don’t want one user to affect the potential use by others. We want to provide seamless access to information with a minimum of intermediation. We want to ensure that our usage metrics are accurate representations of usage. That’s what I think is important on this topic, but let’s hear from a number of publishers and another librarian about their perspectives. First up is…
Why should you really care? There are content providers in this audience whose companies essentially don’t care: it’s more expensive and laborious to protect their content than to let everyone use it at any time. But you do need to protect your content, since it’s usually all you have to sell. Paying attention to the use (and misuse) of the product provides valuable insight into the content, the interface, and potential new types of content production and the associated revenue. It’s also important to understand your new consumers and how they will potentially be using the content you provide in the future. Usage enhances your content, so you want to promote usage while still protecting that content from misuse.
You have to ask yourself if our new consumers, the Internet Natives, really care about your content and the integrity of your property. First, they might value that content in a different way, or for a different purpose, than you value it or intend it to be used. They’ve grown up in an environment where liberal re-use of material for educational and academic purposes is commonplace. In addition, your content might be important to them at the time, but if they can’t get it or do what they want with it, they will go somewhere else. And they need it immediately and will use it immediately and often ephemerally, so the stakes are different in acquiring that content.
Here is a picture of some Pasadena area speed bumps. Notice that they are USED…
And here is what they did with them…they lined an entire dorm with them.
Piracy in the Library: When Internet Natives Go Bad
<ul><li>John McDonald </li></ul><ul><li>California Institute of Technology </li></ul><ul><li>March 27, 2006 </li></ul>
Security of licensed content <ul><li>Online publishing led to licensing of academic research materials </li></ul><ul><ul><li>Licenses adapted from database & software models </li></ul></ul><ul><li>Clauses focused on explicit definitions of users and usage </li></ul><ul><ul><li>Who (authorized users) </li></ul></ul><ul><ul><li>What (licensed content) </li></ul></ul><ul><ul><li>When (term and renewal) </li></ul></ul><ul><ul><li>Where (jurisdiction) </li></ul></ul><ul><ul><li>How (technical aspects) </li></ul></ul><ul><li>And Why…(as in)… Restrictions on Use </li></ul><ul><ul><li>Prohibited users </li></ul></ul><ul><ul><li>Prohibited use </li></ul></ul>
Prohibited Uses <ul><li>Usual prohibited uses (…or duh!) </li></ul><ul><ul><li>altering, recompiling, reselling, publishing or republishing, making persistent local copies, altering copyrights or changing publishers’ or authors’ names, etc. </li></ul></ul><ul><li>Common breaches (…or what seems logical to the publisher but not to our users) </li></ul><ul><ul><li>Systematic or programmatic copying or downloading. </li></ul></ul><ul><ul><li>Downloading by volume (too much or too much from the same issue) </li></ul></ul><ul><ul><li>Allowing unauthorized users to access content </li></ul></ul>
Case Studies: Intentional Piracy <ul><li>Open Proxy </li></ul><ul><ul><li>Are ne’er-do-wells accessing licensed content? </li></ul></ul><ul><li>Data Harvesting </li></ul><ul><ul><li>Why screen-scrape an online dictionary? </li></ul></ul><ul><li>Sequential/Excessive Downloads </li></ul><ul><ul><li>Who needs 1,083 articles from one journal? </li></ul></ul><ul><li>Systematic/Automated Downloads </li></ul><ul><ul><li>Why would someone use a program to download content? </li></ul></ul>
Case Studies: Accidental Piracy <ul><li>JSTOR Open Proxy </li></ul><ul><ul><li>1st instance: misconfigured server </li></ul></ul><ul><ul><li>2nd instance: virus </li></ul></ul><ul><li>Data Harvesting </li></ul><ul><ul><li>Acquiring “Data as data” </li></ul></ul><ul><li>Sequential/Excessive Downloads </li></ul><ul><ul><li>Traveling / Sabbatical / Graduation </li></ul></ul><ul><li>Systematic/Automated Downloads </li></ul><ul><ul><li>Crossword puzzle assistance </li></ul></ul><ul><ul><li>Mozilla plugin (PDF capture) </li></ul></ul>
Problems for libraries <ul><li>Most security issues are reactive in nature </li></ul><ul><li>Incomplete information communicated to library </li></ul><ul><li>Inconsistent instructions among publishers </li></ul><ul><li>Unknown trigger events and breach cure procedures </li></ul><ul><li>Rare / inconsistent follow-up </li></ul>
Improving Content Security <ul><li>Libraries </li></ul><ul><ul><li>Pro-active enforcement of license terms </li></ul></ul><ul><ul><li>Improve technical infrastructure for compliance </li></ul></ul><ul><ul><li>Reactive enforcement process </li></ul></ul><ul><ul><li>Identify users & breaches when notified </li></ul></ul><ul><ul><li>Communicate with publishers </li></ul></ul><ul><li>Publishers </li></ul><ul><ul><li>Improve technical infrastructure </li></ul></ul><ul><ul><li>Define trigger events </li></ul></ul><ul><ul><li>Communicate to subscribers </li></ul></ul><ul><ul><li>Investigate – may lead to new views on content </li></ul></ul>
Why should we care? <ul><li>Provide seamless access to information with a minimum of intermediation </li></ul><ul><li>Negotiate clear and explicit licenses </li></ul><ul><li>Provide information according to license terms </li></ul><ul><li>Reduce impact of misuse by one on the potential use by others </li></ul><ul><li>Ensure that our usage metrics are accurate representations of usage. </li></ul>
Why should you care? <ul><li>Protect content </li></ul><ul><li>Develop new content & revenue streams </li></ul><ul><li>Understand new information consumers </li></ul><ul><li>Usage enhances your content </li></ul><ul><li>If you don’t, then they are smart enough to do whatever they want... </li></ul>
Do they care? <ul><li>They might value your content for a reason you don’t </li></ul><ul><li>Creative commons and educational use is a de-facto standard in academia </li></ul><ul><li>Interchangeable nature of content </li></ul><ul><li>Immediacy trumps everything </li></ul>