Library Science Talk: Tensions between copyright and knowledge discovery
1. Tensions between Copyright and
Knowledge Discovery
Susan Reilly
23-24 March 2015
Library Science Talks, Geneva & Bern
2. Text & Data Mining is the future
“Text and data mining (TDM) is the process of deriving
information from machine-read material. It works by
copying large quantities of material, extracting the data,
and recombining it to identify patterns.” JISC
3. Why do we call this Knowledge
Discovery?
• Ultimate goal is to extract high level knowledge
from low level data
• Allows analysis across disciplines
• “Undiscovered public knowledge” (Swanson)
• Identifies patterns in the data to produce new
knowledge
• It’s not a new thing; digital information just
makes it a whole lot more powerful and relevant!
4. Alternative to literature review
• Over 50 million articles online
• 1.5 million articles published annually
• Advanced discovery and visualisation
• A more efficient way to discover what is already
out there
5. Malhotra A, Younesi E, Gurulingappa H, Hofmann-Apitius M (2013) ‘HypothesisFinder:’ A Strategy for the
Detection of Speculative Statements in Scientific Text. PLoS Comput Biol 9(7): e1003117.
doi:10.1371/journal.pcbi.1003117
6. “TDM saves lives”
http://arxiv.org/abs/1407.7094
• Tools in the armoury of every biologist and
biotechnician
• Discover new treatments for diseases e.g. fish
oil for Raynaud’s Syndrome
• Controlling malaria outbreaks
• Links between gene mutation and cancers
12. Economics & Competitiveness (Europe)
• TDM potentially worth 5.3 billion euro a year to European
research budget (2%)
• Knock-on effect would be a minimum of 32.5 billion euro
increase in GDP
• US responsible for over half
the articles and patents on TDM
- 1100 US patents compared to 39
EU by 2013
• Non-English-speaking countries
falling behind
13. Copyright v TDM
• Because it involves the copying of content in
order to convert into machine readable format
TDM may infringe copyright
• European Database Directive
prohibits copying of substantial
parts of databases
• In US TDM is covered
by fair use, other parts of the
world have a specific exception
e.g. Japan, UK
https://www.flickr.com/photos/apelad/304195427/
14. The debate in Europe
• Licences for Europe, Feb 2013
– “The Commission's objective is to promote the efficient use of text and data
mining (TDM) for scientific research purposes. ……The Group should explore
solutions such as standard licensing models as well as technology platforms to
facilitate TDM access.”
• No discussion of copyright e.g. does TDM
infringe copyright law?
• Engaging the wrong stakeholders
• An attempt to systematise a problem/not a
solution
15. The problem with licences
• Permission culture: Why relicence? Can’t licence
everything!
• Not scalable or cost effective
• Will licence reflect how the researcher actually
performs TDM?
"ME 442 Permission" by Nina Paley - http://mimiandeunice.com/2011/08/30/permission-2/. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:ME_442_Permission.png#mediaviewer/File:ME_442_Permission.png
16. Elsevier TDM Policy
• Access through API only
• Text only: no images or tables
• Researcher must register details
• Click-through licence
• Terms can change any time
• Reproducibility of results
17. The debate in Europe continued…..
• Copyright consultation in March 2014
• Commission to present a proposal for reform in
September
• JURI rapporteur, Julia Reda, draft report on
InfoSoc Directive to be voted on in May
18. The Perfect Swell: ideal conditions
for growth of TDM in Europe
• Stakeholder workshop (60 attendees)
• Views from industry, researchers, infrastructure, OA
publishers, legal experts
• Main findings:
– Licensing not scalable
– Need to address lack of legal clarity (does TDM
infringe copyright?)
– Need for harmonisation of copyright law
– Lack of awareness amongst researchers
– Publisher infrastructure not threatened by TDM
http://blogs.plos.org/opens/2014/03/09/best-practice-enabling-content-mining/
19. So, what do we want?
• Legal clarity
– A specific exception in EU law to allow TDM
– A reinterpretation of EU law
• Legal interoperability
– A solution at WIPO
• Open licences
– CC-by and CC0
20. What do we not want?
• Licences for subscriptions which explicitly forbid
machine crawling
• A licence with every single publisher for every
single research project
• Publishers placing conditions on how TDM
results are disseminated
• Click-through licences
• “Open access” licences that are
NOT interoperable (STM model licences)
21. Spreading the Message
• Global and multistakeholder
• Take a holistic approach
• Articulation of the value of TDM
• Case studies
• Practitioners
• Common vision
• Actions
23. Key Principles
• Common vision
• Copyright not intended to govern access to
facts, ideas and data, nor should it
• Need to move beyond the tipping point of open
access
• Protect academic freedom
• Actions
Thank you.
As introduced, my name is Susan Reilly and I am Advocacy and Projects Manager for LIBER, the Association of European Research Libraries. LIBER represents over 400 research libraries (that is, national, university, and other dedicated research libraries) in over 40 countries. Our mission is to create an information infrastructure to enable research in LIBER institutions to be world class.
In pursuit of this mission, since February 2013, LIBER has been advocating for copyright reform, both in Europe and internationally, in order to ensure legal clarity around the act of text and data mining and therefore increase the practice of it. We are advocating for reform because we believe that text and data mining will become integral to the research process: it will bring new efficiencies to research, increase analytic capability and provide new insights that would not be possible without machine technology. In short, we do not believe that, in today’s world of advanced analytics and increasing machine power in research, an information infrastructure can enable world class research outputs without a copyright framework that supports the way researchers work in the digital age.
I’m going to outline our trajectory in terms of how and why we started advocating for copyright reform, but first, in order to explain our position (and maybe bring a few more of you on-board), I’d like to take a look at the definition of text and data mining, what it involves, how this relates to copyright and what the current situation is in Europe.
Text and data mining (TDM) is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns. TDM is essentially another method of reading, done by the computer rather than the human eye. It is a natural next step for the research process, as more and more content is electronic. For libraries, what this means is that researchers are able to extract more value from our vast collections, both born digital and digitised. This is important because we’re getting to the point where nearly 70% of our collections are digital and up to 90% of our collections budget goes on digital content. I’d like to show you some examples of the added value of TDM.
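The copy, extract and recombine steps in the definition above can be sketched in a few lines of Python. This is only a toy illustration: the three sample sentences, the stop-word list and the function names are invented for the example, and real TDM pipelines work on far larger corpora with proper linguistic tooling.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus standing in for the "large quantities of material".
DOCUMENTS = [
    "Fish oil reduces blood viscosity in patients.",
    "High blood viscosity is observed in Raynaud patients.",
    "Fish oil intake and blood viscosity were measured.",
]

# Illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"in", "is", "and", "were", "the", "a", "of"}


def extract_terms(text):
    """Extract step: lower-case the text and keep the non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def cooccurrence_counts(documents):
    """Recombine step: count how often two terms share a document."""
    counts = Counter()
    for doc in documents:
        terms = sorted(extract_terms(doc))
        counts.update(combinations(terms, 2))
    return counts


if __name__ == "__main__":
    # Print the most frequent term pairs, i.e. the "patterns".
    for pair, n in cooccurrence_counts(DOCUMENTS).most_common(3):
        print(pair, n)
```

Note that even this toy version has to hold a copy of each document in memory before it can extract anything, which is exactly the step that raises the copyright question.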
TDM is also known as knowledge discovery. Back in 1986 an information scientist called Don Swanson came up with the concept of undiscovered public knowledge: the idea that by making connections across the vast amount of knowledge that we have already produced and that is in the public domain, we can uncover new knowledge. Swanson did this manually by mining outputs from different disciplines. The ultimate goal of knowledge discovery is to extract knowledge from data, and that knowledge is new knowledge.
So how can knowledge discovery methods be applied in research libraries? TDM can act as an alternative to a literature review. In 2013 the figure for online articles was over 50 million and the rate of publication was projected to be 1.5 million a year. The large volume of research publications can make it impossible to conduct a comprehensive literature review, but using TDM for advanced discovery can help narrow the field or uncover patterns in the literature. In fact, even library discovery services use TDM, e.g. to visualise search results.
Don Swanson, an information scientist, first wrote about undiscovered public knowledge in 1986. He believed such knowledge could be uncovered by bringing diverse literatures together. In this way he made a connection between dietary fish oil and Raynaud’s disease, a circulatory disorder.
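Swanson's approach of bringing two separate literatures together can be illustrated with a very small Python sketch. The term sets below are hand-picked, hypothetical stand-ins for what would be extracted from each body of literature, and `bridging_terms` is a name invented for this example; Swanson himself worked manually.

```python
# Hypothetical terms extracted from two literatures that do not cite each other.
FISH_OIL_LITERATURE = {"blood viscosity", "platelet aggregation", "triglycerides"}
RAYNAUD_LITERATURE = {"blood viscosity", "platelet aggregation", "vasoconstriction"}


def bridging_terms(literature_a, literature_c):
    """Return the intermediate terms that appear in both literatures.

    If A (fish oil) is linked to B, and B is linked to C (Raynaud's),
    then B is a candidate bridge suggesting an A-C connection.
    """
    return sorted(literature_a & literature_c)


if __name__ == "__main__":
    print(bridging_terms(FISH_OIL_LITERATURE, RAYNAUD_LITERATURE))
    # prints ['blood viscosity', 'platelet aggregation']
```

The shared terms are only candidate links; a researcher still has to judge whether the connection is meaningful.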
A nice example of how TDM can replace or supplement a literature review is this hypothesis finder. A hypothesis is a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation. With such a vast amount of content being published, these starting points or hypotheses can become lost in the noise.
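To give a feel for what detecting speculative statements involves, here is a minimal Python sketch that flags sentences containing hedging words. It is not the method used by HypothesisFinder: the cue list, the naive sentence splitter and the function names are all assumptions made for this illustration.

```python
import re

# Illustrative hedging cues (an assumption, not HypothesisFinder's cue list).
SPECULATION_CUES = (
    "may", "might", "could", "possibly",
    "suggest", "suggests", "hypothesise", "hypothesize",
)

# Match any cue as a whole word, ignoring case.
CUE_PATTERN = re.compile(r"\b(" + "|".join(SPECULATION_CUES) + r")\b", re.IGNORECASE)


def is_speculative(sentence):
    """Return True if the sentence contains at least one hedging cue."""
    return CUE_PATTERN.search(sentence) is not None


def speculative_sentences(text):
    """Split text on sentence-ending punctuation and keep the hedged sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if is_speculative(s)]


if __name__ == "__main__":
    abstract = "Protein X binds receptor Y. This suggests that X may regulate Z."
    print(speculative_sentences(abstract))
    # prints ['This suggests that X may regulate Z.']
```

A keyword match like this will both miss hypotheses and raise false alarms, which is why dedicated tools go well beyond a fixed word list.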
It’s now widely agreed that text and data mining tools should be in common use by biologists and biotechnicians. It has been shown that mining the scientific literature can provide new insight into the treatment of diseases: it can identify where medicines that are already on the market can be used to treat different diseases, dramatically reducing trial times. Data can also be mined to identify the spread of outbreaks, and to predict and hopefully prevent violent events.
TDM is not just about scholarly articles and databases. This image shows the spread of Ebola outbreaks, tracked by mining news pieces, blogs, social media and openly available medical reports. This can be the fastest way to track the spread of diseases and therefore control them.
TDM can provide us with new cultural insights. Take this recent visualisation of the geographic dispersal of Spanish dialects, which was created by mining geolocated tweets. The research has uncovered super dialects and where they are being used, whether the areas are rural or urban. Although the research itself was carried out by European researchers, the datasets were analysed by a company in the US, probably because of the lack of clarity around performing the same activity in Europe.
TDM is not just for developers. We can all use TDM tools, and they will eventually lead to more government transparency, for example. Just to prove that TDM is for everyone, here’s an example. This is a tool from the National Centre for Text Mining (NaCTeM) in the UK. It’s in beta, but is freely available and can be used to analyse the sentiment of a text. In this example I have taken a letter that the IFLA CLM Committee recently sent to the European Commission. The letter was sent on foot of a setback at WIPO in relation to the work on copyright exceptions for libraries. This setback was caused by the EU. This letter demonstrates what pros the people at IFLA are when it comes to letter writing. Note, the only reason I could perform this analysis and not break European copyright law is because the content on the IFLA website is available under a CC BY licence.
Aaron Swartz was able to discover, by downloading a trove of medical research papers and then data-mining them, hitherto-undetected links between pharmaceutical firms and the authors of articles in prestigious journals. He did the same by mining case law and legal reports to discover a link between the law firms who produced the reports and …
Some could argue that the copying involved in TDM is transient and therefore not an infringement of copyright law, but this activity still comes up against the sui generis right in the Database Directive. Certainly from a research perspective, it is preferable to be able to store a copy of the data to ensure reproducibility of research results. The bottom line is that there is a lack of clarity when it comes to the legality of TDM, especially when put in the international context. In the US, because of its transformative nature, TDM falls under fair use. In Japan there is a specific exception for TDM. In the UK, because it had to work within the framework of the current Information Society Directive, an exception for TDM has been implemented, but only for non-commercial purposes. We would argue that because TDM is about the extraction of facts and data to produce new knowledge, it should not fall under the scope of copyright.
In Europe, TDM has been clearly identified as an issue that needs to be addressed. The EC commissioned not one but two expert reports on the subject and, back in 2013, began a stakeholder dialogue focused on the issues. The dialogue was extremely problematic. Why start with the idea of licensing an activity when we haven’t clearly established that it falls under the scope of copyright?
BL estimates 16 months to negotiate a new licence
Publisher expectation that each researcher will provide details of research project unrealistic
Additional licence for TDM unnecessary
Storing the data
Sharing the data
Even though we are of the opinion that TDM should not fall within the scope of copyright, being pragmatic we think that the easiest way to provide clarity is to implement an exception for TDM that will make this clear in practice. Over the longer term we advocate for a reinterpretation of EU law, using the three-step test to establish that acts that do not have an economic impact on the rightsholder are not infringements. As an organisation representing research institutions, LIBER also recognises that the best research is international and collaborative, hence we would like to establish legal interoperability for research data, via a solution at WIPO and through international collaboration. We also see open access as having created fertile ground for knowledge discovery, and its continued growth and consistent implementation are key.