SlideShare a Scribd company logo
thomas.margoni@stir.ac.uk
Open Mining Infrastructure for Text
and Data
OpenMinTeD
11th
International Conference on
Open Repositories
Dr. Thomas Margoni
Lecturer in Law – University of Stirling & CREATe – University of Glasgow, UK
OpenMinTeD coordinator of Legal Interoperability WG
WG3 Legal Interoperability
& Open Repositories
thomas.margoni@stir.ac.uk
● Goal
● What legal barriers to TDM
● Exceptions to legal barriers (and limits)
● Licences
● Legislative reform
● Policy choices
● Deliverables: multi-layer compatibility matrix
(based on theoretical analysis)
WG3 Legal Interoperability
thomas.margoni@stir.ac.uk
Goal:
The goal WG3 is to study and identify copyright and
related rights (e.g. sui generis database right) restrictions
and exceptions to the use and reuse of sources (both
textual sources and text-mining services) in TDM
activities. On this basis the WG will also identify
contractual tools and schemes (e.g. licences) that can
best serve the needs of TDM services.
What legal barriers to
unrestricted TDM
thomas.margoni@stir.ac.uk
● Copyright and rights related to copyright (e.g. Sui generis database right (SGDR))
– These rights usually restrict the reproduction (copy) and distribution of protected works or databases with
substantial investment
– Problem: reproduction is defined very broadly by EU law (any temporary or permanent copy of the whole or
part of a work, etc); SGDR restricts copies of substantial parts and repeated copies of insubstantial parts
– Therefore any TDM (or any other act) which requires any temporary copy of the original work or DB or part
thereof infringes protected works and DB
– Privacy/data protection
● Protects personal data (e.g. databases containing names, addresses, age, sex, etc).
● One of the most important elements is the concept of consent: data subject can give consent for treatment of his/her data (e.g. in a DB). But
such consent needs to be specific for a purpose. Consent cannot be given for any type of use (like e.g. copyright licences). Therefore, all
data subjects may have to give their consent for every new use, something difficult to foresee in a OA environment
– PSI
● Public Sector Information legislation is based on a different paradigm than other approaches (e.g. U.S. where works of Federal Government
are not protected in the U.S.). PSI 2013 has a “open by default” approach but copyright and other similar rights and privacy are object of
specific exclusion and therefore PB are under no obligation to make them accessible and/or reusable
– Contracts/terms of use
● Even when no rights exist on a specific BD (because there is no originality, no substantial investment, no personal data, etc) terms of use of
data provider may restrict use and redistribution of DB. This limitation is based on a contractual relationship but is still an enforceable
obligation (although there are differences)
Exceptions to legal barriers:
thomas.margoni@stir.ac.uk
● Copyright and rights related to copyright
– Exception and limitations to copyright (ELC), fair dealing, fair use. ELC are limited (in EU 1 mandatory
plus 20 at discretion of MS)
● For TDM only possible one is exception for research and teaching. Problem is that it is not uniformly implemented in all
MS and that it is usually limited to partial copies. It is also limited to non commercial activities and only for illustration for
teaching and research. Fair dealing (e.g UK) is a broader standard but not as much as fair use (US).
● Recently, UK introduced a limitation to copyright and related rights for acts of TDM for non commercial purposes and
for DB legally accessed.
● Privacy/data protection
– Anonymisation of data (removal of personal data) but this is time/money consuming and may reduce the usefulness of DB
● PSI
– PSI legislation does not affect FoA (Freedom of Access) legislation which is MS power. But if MS empower FoA legislation then PSI
“reusable by default” rule applies. However, limitation regarding copyright&C. and privacy still applies
● Contracts/terms of use
– These are private agreements so there are no real exceptions. However, certain regulations (antitrust, abusive clauses, consumer
protection) could under certain circumstances invalidate specific terms. This is however a case per case issue and does not seem to
constitute a sound course of action.
Licences and licence
compatibility
thomas.margoni@stir.ac.uk
● Licences are permissions/authorisations (contract or otherwise based) that allow
one or more parties to perform certain activities.
● Licences (so called esp. in the field of copyright) may be directed to a plurality of
subjects and be drafted in standard forms. These are usually called public
licences (e.g. CCPL = Creative Commons Public Licence, GPL = General Public
Licence, etc)
● In the field of OA, Open Content Licences (e.g. CCPL) are used to grant a
permission to perform acts (copy, redistribute, modify, etc) in relation to a work of
authorship or other subject matter (e.g. a DB)
● Different type of licences in the OC field. A possible problem is “licence
proliferation”, i.e. too many (and possibly incompatible) licences. Therefore, in
the “open environment” there is a general consensus that new licences should
not be created unless really necessary. One of the main goals of the Legal
Interoperability WG in OpenMinTeD is to prepare a licence compatibility matrix.
Inner limits of licences
thomas.margoni@stir.ac.uk
● Licences are a powerful instrument but not perfect…
● Examples of problems with licences:
– “Private ordering tool” i.e. can we entrust a private law tool with a
function that should be a matter of public interest (wider access to
knowledge)?
– Licences are a voluntary tool, i.e. only of the owner of Work/DB is
willing to grant you access, licences work. If data owner says no,
there is no remedy based on contracts that can force him/her to deal
with you.
– Even if DB is willing to employ OA licences, very often there are
problems of correct labeling (legal code, metadata, etc) of resources.
This is a very serious issue faced in many projects in the OA field.
Policy recommendations and
licences
thomas.margoni@stir.ac.uk
● Through proper policy choices some of other disadvantages can be fixed.
– Recommending 1 or a very limited no. of licences which are compatible
(fixing problem of licence incompatibility)
– Crucial importance that data providers, funding agencies, scientific and
public institutions require use of correct licences and subject grant of
funding to the correct implementation of those licences (fixing problems
of “voluntarity” and “labeling”)
– Influence public debate so that legislative intervention in the field is
appropriate (e.g. definition of right of reproduction, limited amount of ELC,
need of a fair use exception, limit of non commercial exception such as in
UK).
– Assessment: only OA publications can be used for assessment
Priority points
thomas.margoni@stir.ac.uk
● There is no limit to creation of Licences and ToS (licence proliferation). This is
generally bad as it raises transactive costs
– We prioritize OA licences
– We need a working definition of OA, but don’t try to create yet another
definition
● Major issue is absence of licences or the use of generic terms such as “this
DB is in OA”
● Define the licence in legal terms but also in machine readable (e.g. metata)
– CC is a good example for OC (Work with WG1)
● How far does SGDR reach, i.e. reuse of SGDR DB especially if you are only
referring to statistical results (e.g. how many times a certain word is repeated)
and limits of certain clause/conditions (e.g. non commercial).
Deliverables
thomas.margoni@stir.ac.uk
● Multi-layer compatibility matrix:
– Static matrix
– “Calculators”
– Ongoing study on best graphic representation of information
● Based on theoretical analysis:
– So far 3 papers in progress
● WG1&3: “Legal Interoperability Issues in the Framework of the
OpenMinTeD Project: a Methodological Overview”
● WG3: “Why we need a TDM exception (but it is not enough)”
● WG3&FutureTDM: “Implications of the ‘non commercial’ limit in
Text and Data Mining”

More Related Content

Viewers also liked

Personal Information Collection: A Trade-Off Analysis
Personal Information Collection: A Trade-Off AnalysisPersonal Information Collection: A Trade-Off Analysis
Personal Information Collection: A Trade-Off Analysis
Shannon Szabo-Pickering
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
Karan Deep Singh
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Chris Shillum
 
Merit Event - Understanding and Managing Data Protection
Merit Event - Understanding and Managing Data ProtectionMerit Event - Understanding and Managing Data Protection
Merit Event - Understanding and Managing Data Protection
meritnorthwest
 
A business driven approach to security policy management a technical perspec...
A business driven approach to security policy management  a technical perspec...A business driven approach to security policy management  a technical perspec...
A business driven approach to security policy management a technical perspec...
AlgoSec
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issues
Krish_ver2
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Slideshare
 
Data security in local network using distributed firewall ppt
Data security in local network using distributed firewall ppt Data security in local network using distributed firewall ppt
Data security in local network using distributed firewall ppt
Sabreen Irfana
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
Saif Ullah
 

Viewers also liked (9)

Personal Information Collection: A Trade-Off Analysis
Personal Information Collection: A Trade-Off AnalysisPersonal Information Collection: A Trade-Off Analysis
Personal Information Collection: A Trade-Off Analysis
 
Big Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
 
Merit Event - Understanding and Managing Data Protection
Merit Event - Understanding and Managing Data ProtectionMerit Event - Understanding and Managing Data Protection
Merit Event - Understanding and Managing Data Protection
 
A business driven approach to security policy management a technical perspec...
A business driven approach to security policy management  a technical perspec...A business driven approach to security policy management  a technical perspec...
A business driven approach to security policy management a technical perspec...
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issues
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Data security in local network using distributed firewall ppt
Data security in local network using distributed firewall ppt Data security in local network using distributed firewall ppt
Data security in local network using distributed firewall ppt
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Similar to Legal issues Text and Data Mining

OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
OpenAIRE
 
20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science
OpenAIRE
 
Linked Heritage - Legal Interoperability
Linked Heritage - Legal InteroperabilityLinked Heritage - Legal Interoperability
Linked Heritage - Legal Interoperability
Federico Morando
 
LAPSI: legal interoperability updated
LAPSI: legal interoperability updatedLAPSI: legal interoperability updated
LAPSI: legal interoperability updated
Federico Morando
 
Misa cloud computing workshop lhm final
Misa cloud computing workshop   lhm finalMisa cloud computing workshop   lhm final
Misa cloud computing workshop lhm final
Lou Milrad
 
Legal Challenges in Contracting for Cloud Services
Legal Challenges in Contracting for Cloud ServicesLegal Challenges in Contracting for Cloud Services
Legal Challenges in Contracting for Cloud Services
Lou Milrad
 
20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science
OpenAIRE
 
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
Jan Lindberg
 
EC cloudconsult OASIS 20110831
EC cloudconsult OASIS 20110831EC cloudconsult OASIS 20110831
EC cloudconsult OASIS 20110831
Jamie Clark
 
Clouds and Chains
Clouds and ChainsClouds and Chains
Clouds and Chains
Tim Swanson
 
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
Scott Kveton
 
Data Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for StandardsData Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for Standards
Cloud Standards Customer Council
 
Text Data Mining & Publishing
Text Data Mining & PublishingText Data Mining & Publishing
Comparability analysis using royalty rates
Comparability analysis using royalty ratesComparability analysis using royalty rates
Comparability analysis using royalty rates
RoyaltyStat
 
Ownership rights in map products - an Intellectual Property perspective.
Ownership rights in map products - an Intellectual Property perspective.Ownership rights in map products - an Intellectual Property perspective.
Ownership rights in map products - an Intellectual Property perspective.
Lou Milrad
 
GDPR considerations for blockchain solution architects.
GDPR considerations for blockchain solution architects.GDPR considerations for blockchain solution architects.
GDPR considerations for blockchain solution architects.
Salman Baset
 
Legal Framework for Cloud Computing Cebit May 31 2011 Sydney
Legal Framework for Cloud Computing Cebit May 31 2011 SydneyLegal Framework for Cloud Computing Cebit May 31 2011 Sydney
Legal Framework for Cloud Computing Cebit May 31 2011 Sydney
anthonywong
 
How to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policyHow to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policy
UC Berkeley Office of Scholarly Communication Services
 
Open Data in a Day - Licensing, Law and Best Practice
Open Data in a Day - Licensing, Law and Best PracticeOpen Data in a Day - Licensing, Law and Best Practice
Open Data in a Day - Licensing, Law and Best Practice
The Open Data Institute of North Carolina
 
Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...
Antoine Isaac
 

Similar to Legal issues Text and Data Mining (20)

OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
OpenAIRE webinars during OA week 2017: Legal aspects of Open Science (Thomas ...
 
20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science
 
Linked Heritage - Legal Interoperability
Linked Heritage - Legal InteroperabilityLinked Heritage - Legal Interoperability
Linked Heritage - Legal Interoperability
 
LAPSI: legal interoperability updated
LAPSI: legal interoperability updatedLAPSI: legal interoperability updated
LAPSI: legal interoperability updated
 
Misa cloud computing workshop lhm final
Misa cloud computing workshop   lhm finalMisa cloud computing workshop   lhm final
Misa cloud computing workshop lhm final
 
Legal Challenges in Contracting for Cloud Services
Legal Challenges in Contracting for Cloud ServicesLegal Challenges in Contracting for Cloud Services
Legal Challenges in Contracting for Cloud Services
 
20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science
 
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
TRUST. IP and Technology Update - IT Audit Toolkit for CIOs and General Couns...
 
EC cloudconsult OASIS 20110831
EC cloudconsult OASIS 20110831EC cloudconsult OASIS 20110831
EC cloudconsult OASIS 20110831
 
Clouds and Chains
Clouds and ChainsClouds and Chains
Clouds and Chains
 
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023CaseMark Weekly Webinar: AI-in-Legal Q3'2023
CaseMark Weekly Webinar: AI-in-Legal Q3'2023
 
Data Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for StandardsData Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for Standards
 
Text Data Mining & Publishing
Text Data Mining & PublishingText Data Mining & Publishing
Text Data Mining & Publishing
 
Comparability analysis using royalty rates
Comparability analysis using royalty ratesComparability analysis using royalty rates
Comparability analysis using royalty rates
 
Ownership rights in map products - an Intellectual Property perspective.
Ownership rights in map products - an Intellectual Property perspective.Ownership rights in map products - an Intellectual Property perspective.
Ownership rights in map products - an Intellectual Property perspective.
 
GDPR considerations for blockchain solution architects.
GDPR considerations for blockchain solution architects.GDPR considerations for blockchain solution architects.
GDPR considerations for blockchain solution architects.
 
Legal Framework for Cloud Computing Cebit May 31 2011 Sydney
Legal Framework for Cloud Computing Cebit May 31 2011 SydneyLegal Framework for Cloud Computing Cebit May 31 2011 Sydney
Legal Framework for Cloud Computing Cebit May 31 2011 Sydney
 
How to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policyHow to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policy
 
Open Data in a Day - Licensing, Law and Best Practice
Open Data in a Day - Licensing, Law and Best PracticeOpen Data in a Day - Licensing, Law and Best Practice
Open Data in a Day - Licensing, Law and Best Practice
 
Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...
 

More from openminted_eu

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
openminted_eu
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
openminted_eu
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
openminted_eu
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
openminted_eu
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
openminted_eu
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
openminted_eu
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
openminted_eu
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
openminted_eu
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
openminted_eu
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
openminted_eu
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
openminted_eu
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
openminted_eu
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
openminted_eu
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
openminted_eu
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
openminted_eu
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
openminted_eu
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
openminted_eu
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
openminted_eu
 

More from openminted_eu (18)

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
 

Recently uploaded

一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 

Recently uploaded (20)

一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 

Legal issues Text and Data Mining

  • 1. thomas.margoni@stir.ac.uk Open Mining Infrastructure for Text and Data OpenMinTeD 11th International Conference on Open Repositories Dr. Thomas Margoni Lecturer in Law – University of Stirling & CREATe – University of Glasgow, UK OpenMinTeD coordinator of Legal Interoperability WG
  • 2. WG3 Legal Interoperability & Open Repositories thomas.margoni@stir.ac.uk ● Goal ● What legal barriers to TDM ● Exceptions to legal barriers (and limits) ● Licences ● Legislative reform ● Policy choices ● Deliverables: multi-layer compatibility matrix (based on theoretical analysis)
  • 3. WG3 Legal Interoperability thomas.margoni@stir.ac.uk Goal: The goal WG3 is to study and identify copyright and related rights (e.g. sui generis database right) restrictions and exceptions to the use and reuse of sources (both textual sources and text-mining services) in TDM activities. On this basis the WG will also identify contractual tools and schemes (e.g. licences) that can best serve the needs of TDM services.
  • 4. What legal barriers to unrestricted TDM thomas.margoni@stir.ac.uk ● Copyright and rights related to copyright (e.g. Sui generis database right (SGDR)) – These rights usually restrict the reproduction (copy) and distribution of protected works or databases with substantial investment – Problem: reproduction is defined very broadly by EU law (any temporary or permanent copy of the whole or part of a work, etc); SGDR restricts copies of substantial parts and repeated copies of insubstantial parts – Therefore any TDM (or any other act) which requires any temporary copy of the original work or DB or part thereof infringes protected works and DB – Privacy/data protection ● Protects personal data (e.g. databases containing names, addresses, age, sex, etc). ● One of the most important elements is the concept of consent: data subject can give consent for treatment of his/her data (e.g. in a DB). But such consent needs to be specific for a purpose. Consent cannot be given for any type of use (like e.g. copyright licences). Therefore, all data subjects may have to give their consent for every new use, something difficult to foresee in a OA environment – PSI ● Public Sector Information legislation is based on a different paradigm than other approaches (e.g. U.S. where works of Federal Government are not protected in the U.S.). PSI 2013 has a “open by default” approach but copyright and other similar rights and privacy are object of specific exclusion and therefore PB are under no obligation to make them accessible and/or reusable – Contracts/terms of use ● Even when no rights exist on a specific BD (because there is no originality, no substantial investment, no personal data, etc) terms of use of data provider may restrict use and redistribution of DB. This limitation is based on a contractual relationship but is still an enforceable obligation (although there are differences)
  • 5. Exceptions to legal barriers: thomas.margoni@stir.ac.uk ● Copyright and rights related to copyright – Exception and limitations to copyright (ELC), fair dealing, fair use. ELC are limited (in EU 1 mandatory plus 20 at discretion of MS) ● For TDM only possible one is exception for research and teaching. Problem is that it is not uniformly implemented in all MS and that it is usually limited to partial copies. It is also limited to non commercial activities and only for illustration for teaching and research. Fair dealing (e.g UK) is a broader standard but not as much as fair use (US). ● Recently, UK introduced a limitation to copyright and related rights for acts of TDM for non commercial purposes and for DB legally accessed. ● Privacy/data protection – Anonymisation of data (removal of personal data) but this is time/money consuming and may reduce the usefulness of DB ● PSI – PSI legislation does not affect FoA (Freedom of Access) legislation which is MS power. But if MS empower FoA legislation then PSI “reusable by default” rule applies. However, limitation regarding copyright&C. and privacy still applies ● Contracts/terms of use – These are private agreements so there are no real exceptions. However, certain regulations (antitrust, abusive clauses, consumer protection) could under certain circumstances invalidate specific terms. This is however a case per case issue and does not seem to constitute a sound course of action.
  • 6. Licences and licence compatibility thomas.margoni@stir.ac.uk ● Licences are permissions/authorisations (contract or otherwise based) that allow one or more parties to perform certain activities. ● Licences (so called esp. in the field of copyright) may be directed to a plurality of subjects and be drafted in standard forms. These are usually called public licences (e.g. CCPL = Creative Commons Public Licence, GPL = General Public Licence, etc) ● In the field of OA, Open Content Licences (e.g. CCPL) are used to grant a permission to perform acts (copy, redistribute, modify, etc) in relation to a work of authorship or other subject matter (e.g. a DB) ● Different type of licences in the OC field. A possible problem is “licence proliferation”, i.e. too many (and possibly incompatible) licences. Therefore, in the “open environment” there is a general consensus that new licences should not be created unless really necessary. One of the main goals of the Legal Interoperability WG in OpenMinTeD is to prepare a licence compatibility matrix.
  • 7. Inner limits of licences thomas.margoni@stir.ac.uk ● Licences are a powerful instrument but not perfect… ● Examples of problems with licences: – “Private ordering tool” i.e. can we entrust a private law tool with a function that should be a matter of public interest (wider access to knowledge)? – Licences are a voluntary tool, i.e. only of the owner of Work/DB is willing to grant you access, licences work. If data owner says no, there is no remedy based on contracts that can force him/her to deal with you. – Even if DB is willing to employ OA licences, very often there are problems of correct labeling (legal code, metadata, etc) of resources. This is a very serious issue faced in many projects in the OA field.
  • 8. Policy recommendations and licences thomas.margoni@stir.ac.uk ● Through proper policy choices some of other disadvantages can be fixed. – Recommending 1 or a very limited no. of licences which are compatible (fixing problem of licence incompatibility) – Crucial importance that data providers, funding agencies, scientific and public institutions require use of correct licences and subject grant of funding to the correct implementation of those licences (fixing problems of “voluntarity” and “labeling”) – Influence public debate so that legislative intervention in the field is appropriate (e.g. definition of right of reproduction, limited amount of ELC, need of a fair use exception, limit of non commercial exception such as in UK). – Assessment: only OA publications can be used for assessment
  • 9. Priority points thomas.margoni@stir.ac.uk ● There is no limit to creation of Licences and ToS (licence proliferation). This is generally bad as it raises transactive costs – We prioritize OA licences – We need a working definition of OA, but don’t try to create yet another definition ● Major issue is absence of licences or the use of generic terms such as “this DB is in OA” ● Define the licence in legal terms but also in machine readable (e.g. metata) – CC is a good example for OC (Work with WG1) ● How far does SGDR reach, i.e. reuse of SGDR DB especially if you are only referring to statistical results (e.g. how many times a certain word is repeated) and limits of certain clause/conditions (e.g. non commercial).
  • 10. Deliverables thomas.margoni@stir.ac.uk ● Multi-layer compatibility matrix: – Static matrix – “Calculators” – Ongoing study on best graphic representation of information ● Based on theoretical analysis: – So far 3 papers in progress ● WG1&3: “Legal Interoperability Issues in the Framework of the OpenMinTeD Project: a Methodological Overview” ● WG3: “Why we need a TDM exception (but it is not enough)” ● WG3&FutureTDM: “Implications of the ‘non commercial’ limit in Text and Data Mining”