Researcher KnowHow - Copyright
and text and data mining
Judith Carr – Research Data Manager
Gordon Sandison – Licensing and Copyright Manager
Learning Outcomes
This session will raise awareness of:
• Copyright law and how it relates to performing TDM analysis.
• How researchers can take advantage of permitted acts in
copyright law to legitimately use TDM in their research.
• The tools publishers make available to enable TDM analysis.
Disclaimer
The following slides are intended to give an overview of the key
concepts of UK copyright legislation for those in higher
education institutions.
They are not comprehensive, nor do they provide full details of
the provisions within the relevant legislation (most notably the
Copyright, Designs and Patents Act).
The slides are for information purposes only and do not
constitute formal legal advice.
But first, … a copyright quiz.
Does copyright protect ideas?
No.
There are two tests a work must pass for copyright to exist in it.
Firstly, it must be ‘original’ and secondly, it must be recorded or ‘fixed’ i.e. be
something tangible.
So, copyright does not protect ideas which remain solely as ideas. Rather
copyright protects the way these ideas are expressed.
Copyright covers different types of content (text, images, sound, moving
images etc.)
Do copyright works need to be
registered to be protected?
No.
Copyright protection is automatic as soon as the work is ‘fixed’ or
recorded in some format.
Do works need the copyright symbol “©”
to be protected?
No.
Copyright works don’t need a “©” to be protected, but it helps
indicate the work is protected.
You always need permission to use
copyright works.
A. Depends on what you’re using it for.
Permission is not required if the work is out of copyright, is under a
Creative Commons licence, or if you are using the work for reasons
permitted under a copyright exception.
In the UK there are copyright exceptions which permit the use of
copyright material under certain circumstances. Usually educational
institutions also pay for specific licences which enable their lecturers
and students to use copyright material.
There is a specific amount of someone else’s work
that you can use without asking permission and
without infringing their copyright.
A. False
Though you may use a copyright protected work under a
copyright exception, there is no legal amount specified.
The courts define ‘substantial part’ on a case-by-case basis,
usually focusing on the quality of the parts taken rather than the
amount.
What is Intellectual Property?
Intellectual property (IP) refers to unique, creative works which can be
treated as an asset or physical property i.e.
• ‘Intellectual’ because it is creative output of the mind, and
• ‘Property’ because it is viewed as a tradable commodity.
Intellectual property is something original which is subsequently ‘fixed’ in
some format, such as written or drawn on paper, in an audio recording, on
film, or recorded electronically.
An idea alone is not intellectual property. For example, an idea for a book
doesn’t qualify, but the words you’ve written do.
As such, IP is, essentially, the tangible expression of ideas.
Intellectual Property Rights (IPRs)
Intellectual property is protected in law by Intellectual Property Rights or
IPRs.
Intellectual Property Rights:
• Are specific legal rights which exist to protect the owners of IP;
• Give the owners of IP specific exclusive rights in regard to the use of their
work;
• Prohibit unauthorised use of protected works;
• Make it easier for the owners of IP to take legal action against anyone who
uses or copies their work illegally;
• Enable people to earn recognition or financial benefit from what they
invent or create;
Intellectual Property Rights (IPRs)
Intellectual Property Rights fall, principally, into four main areas:
• Trademarks;
• Designs;
• Patents;
• Copyright;
Copyright
Copyright isn’t a single right as such, but a set of exclusive rights
which originators/copyright owners of cultural, creative and artistic
works have over the use of their work.
This set of rights legally gives the copyright holder the exclusive right
to determine:
• Who can use or make copies of their works;
• Under what circumstances;
• In what media;
• For what charge;
Essentially, owning copyright is owning the ‘right to copy’.
Copyright
Copyright does not protect ideas, rather the way these ideas are
expressed. For copyright to exist in a work, the work has to be
both:
• Original and
• Fixed, i.e. tangible: recorded in a fixed format, e.g. written down,
recorded on tape, filmed etc.
Works gain copyright protection automatically once they are
recorded in a fixed format. Creators don’t have to register their
works, and works do not need a © in order to be protected.
Copyright Law – Restricted Acts
In the UK, the Copyright, Designs and Patents Act 1988 (as
amended 2014) is the legislation which governs copyright.
This law sets out the types of work protected by copyright, and
the uses of those works which are the exclusive right of the
copyright holder.
The uses of the work, which are the exclusive right of the rights
holder, are called ‘Restricted Acts’ i.e. acts/uses restricted
solely to the copyright holder.
Uses Protected by Copyright – Restricted
Acts
• Copying
• Issuing copies to the public
• Rental or Lending
• Public Performance
• Communication to the public
• Adaptation
So, what about TDM?
Text and data mining usually requires copying of the work to be
analysed.
Researchers using text and data mining in their research risked
infringing copyright unless they had specific permission from the
copyright owner.
However, copyright was never meant to restrict the use of the facts
and information that exist in a work.
In 2014, the law was changed.
Permitted Acts/Copyright Exceptions
Though copyright restricts how others may use works, the
legislation also includes ‘Acts Permitted in relation to Copyright Works’.
These ‘permitted acts’ allow limited use of copyrighted material
without having to gain permission and without infringing
copyright law.
These are often referred to as ‘Copyright Exceptions’ i.e.
exceptions to copyright law.
29A. Copies for text and data analysis
for non-commercial research
• Allows researchers to make copies of any copyright material for the purpose of
computational analysis if they already have the right to read the work (that is,
work that they have “lawful access” to).
• They will be able to do this without having to obtain additional permission to
make these copies from the rights holder.
• This exception only permits the making of copies for the purpose of text and data
mining for non-commercial research.
29A. Copies for text and data analysis
for non-commercial research
• Publishers and content providers are able to apply reasonable measures to maintain
their network security or stability, so long as these measures do not prevent or
unreasonably restrict a researcher’s ability to make the copies they need to make for
their text and data mining.
• Contract terms that stop researchers making copies of works to which they have lawful
access in order to carry out a text and data mining analysis will be unenforceable.
Database Rights
Other legal or technical restrictions may limit the access to collections
of works, such as databases of scientific publishers. Examples of such
databases are JSTOR, ScienceDirect and LexisNexis.
In the UK and in the EU, any collection of data, information or works
which required substantial investment in obtaining, verifying or
presenting its contents, is protected by a ‘database right’.
Database Rights
A database right is comparable to, but distinct from, copyright. It
exists to recognise the investment that is made in compiling a
database, even when this does not involve the ‘creative’ and
original aspect that is reflected by copyright.
The database right is an exclusive right that prevents substantial
extraction or re-utilisation of the content of the database, as well as
systematic insubstantial extraction of the said content (where what is
‘substantial’ and ‘systematic’ depends on the context).
Database Rights
Moreover, the use of a database can also be regulated by
contract. In some cases, access to a database may require
acceptance of ‘terms and conditions’ that restrict certain
activities, including text and data analysis. But, as with the
copyright exception discussed above, engaging in permissible
activities on a database for the purpose of text and data analysis
cannot be ruled out by contract.
Database Rights
Databases are also usually sheltered by technological measures
which impede systematic access to their contents and ‘bulk’ copying.
So, researchers may need not only permission, but also technical
support from the database owner before engaging in large-scale
computational analysis of the contents of a database.
For this reason, despite the fact that researchers can rely on the
exception for text and data analysis, collaboration between database
owners and researchers remains a fundamental component of text
and data mining research.
What can researchers do with the copies
they make as part of their research?
The copies can only be used by those who have lawful access to
the original material for text and data mining for non-commercial
purposes. They can’t be shared, sold, or made publicly available
in any way and anyone doing so could be sued for copyright
infringement.
Do researchers have to acknowledge
every work they analyse in this way?
The law requires that there is sufficient acknowledgment of
copied works, but recognises that it may be impractical to
acknowledge every work in a large-scale analysis. A researcher
could, for example, refer to the databases from which the works
were obtained.
Can a researcher doing contract research for
an outside company text and data mine
copyright material?
It is unlikely that the research falls within the definition of non-
commercial. You should check before carrying out the analysis,
but the likelihood is that you will have to agree with the copyright
owner that you can make copies for your research.
My research is part-funded by a company. I
choose my own research topics and am free
to publish my work without interference from
the company. Can I text and data mine?
This is likely to be fine, so long as the purpose of your research is
non-commercial, but you should check.
Are the results of my text and data
mining analysis covered by copyright?
Copyright covers the artistic expression of an original idea or fact,
not the fact or idea itself. So, if your results are simply facts they
are not covered by copyright.
Is this compatible with Open Access?
Absolutely. You can text and data mine any work that has been
made available under an open access route. You may publish
your research in an open access journal. You should
acknowledge the works you have mined, unless this is
impossible for reasons of practicality.
Can the results of my non-commercial
research be used for commercial purposes?
There are no restrictions on how or where outputs of text and data
mining can be published, including journals published for profit by
academic publishers and under licences that permit commercial
reuse, such as CC-BY. Other commercialisation of the research
outputs is not restricted either. But it is important to be scrupulous in
assessing whether the original purpose of carrying out the text and
data mining analysis is solely non-commercial; if it isn’t, then
researchers are very likely to be infringing copyright.
Key messages
• If you have legal access to a resource, then you may make a
copy for TDM analysis.
• Be aware of database rights which may restrict copying.
• Look to see if the owner of the material offers ‘in-house’ TDM
solutions.
RESOURCES
• The Library – subscribes to databases
• The Library – ask your Liaison Librarian
• English Dept UoL – video of demo English Corpus, SketchEngine, Wmatrix
https://stream.liv.ac.uk/e3677y55
• OpenMinTeD - ‘an open, service-oriented e-Infrastructure for Text and Data
Mining (TDM) of scientific and scholarly content. Researchers can collaboratively
create, discover, share and re-use Knowledge from a wide range of text-based
scientific related sources in a seamless way.’ http://openminted.eu/
• YouTube - Text and Data Mining in the Humanities and Social Sciences:
Strategies and Tools https://www.youtube.com/watch?v=vrX7cM1FC_A
• YouTube - Text Mining for Social Scientists
https://www.youtube.com/watch?v=71FqpwsPNpU&t=2052s
Photo by Sharon McCutcheon on Unsplash
Where on the Library webpages?
Gale Digital Scholar Lab
Build: ‘Search your institution's Gale Primary Sources, find
relevant texts, and add them to a content set.’
Clean: ‘Prepare documents for analysis by stripping the
text of unnecessary words, punctuation, and other
characters.’
Analyse: ‘Use analysis tools to explore your content set in
new ways with visualizations to help create new insights
into your texts.’
https://liverpool.idm.oclc.org/login?url=https://infotrac.gale.com/itweb/livuni?db=DSLAB
Getting started guide -
https://www.lib.cam.ac.uk/files/getting_started_gdsl.pdf
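To make the ‘Clean’ step more concrete, the following is a minimal sketch of what stripping unnecessary words and punctuation can look like. It is an illustration only and not how Gale Digital Scholar Lab itself works; the function name `clean_text` and the short `STOP_WORDS` list are assumptions made for this example.

```python
import string

# Hypothetical stop-word list for illustration; real tools use much longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def clean_text(text: str) -> list[str]:
    """Lower-case the text, drop ASCII punctuation, and remove stop words."""
    lowered = text.lower()
    # str.translate with a deletion table removes every punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only the words not in the stop-word list.
    return [word for word in no_punct.split() if word not in STOP_WORDS]


print(clean_text("The right to copy is, essentially, the core of copyright."))
```

The cleaned word list, rather than the original text, is what an analysis tool would then count or visualise.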
Check policies: both publishers and databases
A tale of 3 databases
IBM Micromedex
You may only use a crawler to crawl this Web site as permitted by this Web site’s robots.txt
protocol, and IBM may block any crawlers in its sole discretion.
The use authorized under this agreement is non-commercial in nature.
NICE
The Open Licence (UK) referred to on their web page does not contain any express prohibition.
The site has a short questionnaire you need to submit, which will advise what consent/use you have.
Medscape
DO NOT attempt to access or search any Medscape Network properties or any content
contained therein through the use of any engine, software, tool, agent, device or mechanism
(including scripts, bots, spiders, scraper, crawlers, data mining tools or the like) other than
through software generally available through web browsers
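Where a provider's terms tie crawling to its robots.txt file, as in the IBM Micromedex example, a researcher can check a URL against those rules before fetching it. The sketch below uses Python's standard `urllib.robotparser` module. The `ROBOTS_TXT` content, the `research-bot` user agent and the `example.org` URLs are hypothetical, and passing this check does not replace reading the site's terms and conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would read the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt file as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "research-bot", "https://example.org/articles/1"))
print(may_crawl(ROBOTS_TXT, "research-bot", "https://example.org/private/data"))
```

For a live site, `RobotFileParser` can also load the file itself via `set_url()` followed by `read()`.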
Further Information
• Library Open Research Team https://www.liverpool.ac.uk/open-research/
• Library Licensing and Copyright Manager
https://libguides.liverpool.ac.uk/copyright
• Intellectual Property Office
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/375954/Research.pdf
