SlideShare a Scribd company logo
1 of 22
EULAide: Interpretation of End-User License
Agreements using Ontology-Based
Information Extraction
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani
SEMANTiCS16 - 12th International Conference on Semantic Systems
Leipzig, September 12 - 15. 2016
DAAD
Deutscher Akademischer Austauschdienst
German Academic Exchange Service
Outline
 Motivation
 Related Work
 Approach: EULAide
 Evaluation & Results
 Conclusions & Future Works
2 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Motivation
 Online research commissioned by Skandia1
 7% read online EULAs when signing up for
products & services
 21% suffered as a result of ticking EULA box
without reading them
 10% locked into a longer term contract than they
expected
 5% lost money by not being able to cancel or
amend hotels or holidays
3 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
Problem Statement
 Given an EULA (End-User License Agreement), we want to
extract permissions, prohibitions & duties from it
 Violation of EULA : legal punishments, including federal fines
 Specifications
 Complexity
 Emerge of new regulations
 Possible change of regulations
 Long texts
4 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Involved Communities
 Permission & Obligations Expression Working
Group1
 A World Wide Web Consortium (W3C) group
 Mission: defining a semantic data model for
expressing permissions and obligations statements
5 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
1 https://www.w3.org/2016/poe/charter
 Spare Us the small print2
 A campaign run by Fairer Finance
 Mission: getting rid of lengthy terms & conditions
2 http://www.fairerfinance.com/campaigns/spare-us-the-small-print
Related Work
 Manual
 Online service (tldrlegal.com)
 Semi-automatic
 NLL2RDF (Cabrio, et al. ,ESWC 2014)
 First attempt to generate RDF expressions of EULAs
 Exploit CC REL & ODRL vocabularies
 Supervised machine learning
 Limitation: few number of rights
6 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Vocabularies & Ontologies for EULAs
7
Name Domain Coverage Last Release
CC REL Linked data 2013/11
ODRL Open digital content 2015/03
LDR (derived from ODRL) Linked data resources 2014/09
LiMo Open data 2013/05
L4LOD Web of data 2013/05
ODRS Open data 2013/07
MPEG-21 Rights Data Dictionary Contains the terms as standardized in ISO/IEC
21000-6
2005/07
IPROnto Intellectual property rights (focus: ecommerce) 2003/12
Copyright ontology Digital rights management 2014/01
Semantic Copyright - Basic Works in digital formats 2009/10
Semantic Copyright - Registry Works in digital formats 2009/10
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Approach: OBIE
 Why OBIE?
 Relying on domain expert
knowledge
 Clear structure & terminologies
of EULAs
 Our chosen ontology: ODRL
 Most recently updated
 Broad enough
 Highest community endorsement
8
Policy
(e.g, Request)
Asset
Constraint
Rule
(e.g., Permission)
Action
(e.g., sell)
Party
(e.g., Individual)
permission
prohibition
duty
constraint
function
(e.g., assignee)
action
target output
 Leveraging OBIE to classify EULAs into predefined categories
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Black Box Architecture
9 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
EULAide
EULA
Ontology
Annotation Types:
Permission,
Prohibition, Duty
Architecture of EULAide
Sentence Splitter
Morphological Analyser
POS Tagger
Linguistic Pre-Processing
Ontology-based-
Gazetteer
EULA OBIE
Transducer
Annotation Types:
Permission,
Prohibition, Duty
ODRL Enhancement
Gazetteer
User Interface
EULA
Ontology
enhanced
ODRL
Ontology
Pre-processed
EULA
Annotated
concepts
GATE EULA OBIE Pipeline
Tokeniser
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction10
EULA OBIE Transducer
11
 JAPE transducer based on ODRL community specification documentations
 Example of Permission Rules
phase rule example
Annotate
Classes
PermissionAction Copy, Reproduce, Delete
PermissionWords May, grant, allow
Extract
Permissions
[Subj][PermissionWords][Permission
Action]+[Asset]
[You][May][copy, share and reproduce][the
product]
[License][PermissionWords][object]
[PermissionAction]+[Asset]
[This license] [grants] [you] [to copy, share
and reproduce] [the product]
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
EULA OBIE
Transducer
Example of Permission
12
Sentence: This license grants you to copy, share and reproduce the product.
Text-file Gazetteer This license grants you to copy, share and reproduce the product
(License) (Asset)
Ontology-based
Gazetteer
This license grants you to copy, share and reproduce the product
(Lookups)
annotateClasses
phase
This license grants you to copy, share and reproduce the product
(PermWords) (PermissionActions)
extract
Permissions
This license grants you to copy, share and reproduce the product
(License) (PermWords) (obj) (PermissionActions)+ (Asset)
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Implementation of EULAide using GATE
 GATE: General Architecture for Text Engineering
 Open source software
 University of Sheffield
 Written in Java
 Initial release: 1995
 We chose GATE for:
 Its ANNIE IE system & its support for JAPE grammar rules
 Excellent support for OBIE approaches
 Support for evaluation tools
 Embedded API
13 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Gold Standard Creation
14
 20 popular licenses including 193 permissions, 185 prohibitions & 168 duties
 Average words: 3,206
 Average characters without space: 16,815
 Using GATE IAA plugin
Precision Recall F-measure
Permission 0.94 0.9 0.92
Prohibition 0.79 0.94 0.86
Duty 0.86 0.96 0.91
Summary 0.87 0.93 0.9
IAA (Inter Annotator Agreement) for two Annotators
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Evaluation: GATE Corpus Quality Assurance
15
 Apply EULAide to the gold standard
 F-measure calculation for two
conditions
Evaluation of EULAide without Ontology Enhancement
Precision Recall F0.5 F1 F2
Permission 0.74 0.75 0.74 0.74 0.75
Prohibition 0.89 0.63 0.82 0.74 0.66
Duty 0.66 0.67 0.67 0.67 0.67
Summary 0.75 0.68 0.74 0.72 0.7
Evaluation of EULAide with Ontology Enhancement
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
 Without ontology
enhancement transducer
Precision Recall F0.5 F1 F2
Permission 0.75 0.56 0.71 0.64 0.59
Prohibition 0.89 0.47 0.75 0.61 0.52
Duty 0.73 0.43 0.64 0.54 0.46
Summary 0.79 0.49 0.7 0.6 0.53
 With ontology enhancement
transducer
Evaluation – Example of Failure
16
 Permission:
 False positive: “nothing other than this License grants you permission to
propagate or modify any covered work.”
 False negative: “The number of permitted participants on a group video call
varies from 3 to a maximum of 10, subject to system requirements”
Example: Facebook EULA
17 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Conclusions
Sentence Splitter
Morphological Analyser
POS Tagger
Linguistic Pre-Processing
Ontology-
based-
Gazetteer
EULA OBIE
Transducer
Annotation
Types:
Permission,
Prohibition, Duty
ODRL Enhancement
Gazetteer
User Interface
EULA
Ontology
enhanced ODRL
Ontology
Pre-processed
EULA
Annotated
concepts
GATE EULA OBIE Pipeline
Tokeniser
 Identification of ontologies and vocabularies in EULA domain
 EULAide: semi-automatic ontology-based annotation of EULAs
 Ontology enhancement by adding additional concepts
 IAA of 90%
 F-measure of more than 70%
 Applying the pipeline to different kind of
front ends (using GATE API)
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction18
Future Works
 Extract more policies and rights (e.g., agreements, constraints, etc.)
 Compilation of a more comprehensive ontology based on ODRL
 Combining different IE methods
 Implementation of a RESTful application in Java
 Planning to design a mobile app
 Integrating GATE with Stanford NLP to have a more accurate extraction (e.g., using coreference)
19 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Email: nejad@cs.uni-bonn.de
References
20 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
• All images are from Pixabay which are released under Creative Commons
CC0 into the public domain.
Backup Slides
21 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
• The plugin offers two types of IAA measurement: Fmeasure and agreement
based on the kappa statistic. The latter has been criticized and has a
number of well-known limitations. Kappa is suitable when annotators have
the same number of instances but with different class labels. It is not
recommended for text mark-up tasks, such as named entity recognition
and information extraction [9]. When the annotators themselves determine
which text spans they can annotate, the F-measure should be used. The F-
measure has been less controversial and is also indicated as the most
appropriate IAA measure in the GATE manual itself, given the nature of our
annotation task [5].
Evaluation – Example of Failure
22
 Permission:
 False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
 Prohibition:
 False positive: “Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws.”
 Duty:
 False positive: “Each Recipient is solely responsible for determining the appropriateness of using and distributing the
Program”
 False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject
to system requirements”
 False negative: “No one other than Sun has the right to modify the terms applicable to Covered Code created under
this License”
 False negative: “The GNU GPL requires that modified versions be marked as changed, so that their problems will not be
attributed erroneously to authors of previous versions”

More Related Content

Viewers also liked

Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
semanticsconference
 

Viewers also liked (19)

Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
 
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
 
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
 
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
 
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for EnterpriseChalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
 
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
 
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
 
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
 
Victor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of ThingsVictor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of Things
 
Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
 
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
 
Thomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old DataThomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old Data
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
 
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINEFelix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
 

Similar to Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction

Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
mfrancis
 
Presentation- on OIM
Presentation- on OIMPresentation- on OIM
Presentation- on OIM
Tamim Khan
 

Similar to Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction (20)

Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...
 
License DSL translation in COMPAS framework
License DSL translation in COMPAS frameworkLicense DSL translation in COMPAS framework
License DSL translation in COMPAS framework
 
Shibboleth Federations and Secure SDI
Shibboleth Federations and Secure SDIShibboleth Federations and Secure SDI
Shibboleth Federations and Secure SDI
 
OGC Web Service Shibboleth Interoperability Experiment
OGC Web Service Shibboleth Interoperability ExperimentOGC Web Service Shibboleth Interoperability Experiment
OGC Web Service Shibboleth Interoperability Experiment
 
Access Control in ESDIN: Shibboleth
Access Control in ESDIN: ShibbolethAccess Control in ESDIN: Shibboleth
Access Control in ESDIN: Shibboleth
 
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...
 
ISC Cloud13 Sill - Crossing organizational boundaries in cloud computing
ISC Cloud13 Sill - Crossing organizational boundaries in cloud computingISC Cloud13 Sill - Crossing organizational boundaries in cloud computing
ISC Cloud13 Sill - Crossing organizational boundaries in cloud computing
 
OAuth Base Camp
OAuth Base CampOAuth Base Camp
OAuth Base Camp
 
ECI Driving Standards from Code -ECI Work with ONOS
ECI Driving Standards from Code -ECI Work with ONOSECI Driving Standards from Code -ECI Work with ONOS
ECI Driving Standards from Code -ECI Work with ONOS
 
Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...
 
Open Source ETL
Open Source ETLOpen Source ETL
Open Source ETL
 
openETCS ITEA2 2013 Review Overview
openETCS ITEA2 2013 Review OverviewopenETCS ITEA2 2013 Review Overview
openETCS ITEA2 2013 Review Overview
 
ITAM UK 2017 Open source alternatives_John Springall
ITAM UK 2017 Open source alternatives_John Springall ITAM UK 2017 Open source alternatives_John Springall
ITAM UK 2017 Open source alternatives_John Springall
 
5041
50415041
5041
 
Presentation- on OIM
Presentation- on OIMPresentation- on OIM
Presentation- on OIM
 
50120140503011
5012014050301150120140503011
50120140503011
 
Cut your costs: Deactivate inactive users & reduce sap license fees. [Webinar]
Cut your costs: Deactivate inactive users & reduce sap license fees. [Webinar]Cut your costs: Deactivate inactive users & reduce sap license fees. [Webinar]
Cut your costs: Deactivate inactive users & reduce sap license fees. [Webinar]
 
OPC UA Inside Out, Part 1 - Introduction and Playing Field
OPC UA Inside Out, Part 1 - Introduction and Playing FieldOPC UA Inside Out, Part 1 - Introduction and Playing Field
OPC UA Inside Out, Part 1 - Introduction and Playing Field
 
[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected Vehicles[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected Vehicles
 
Smarter Apps for Smarter phones - see me at bit.ly/1ezHj0c
Smarter Apps for Smarter phones - see me at bit.ly/1ezHj0cSmarter Apps for Smarter phones - see me at bit.ly/1ezHj0c
Smarter Apps for Smarter phones - see me at bit.ly/1ezHj0c
 

More from semanticsconference

More from semanticsconference (20)

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction

  • 1. EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani SEMANTiCS16 - 12th International Conference on Semantic Systems Leipzig, September 12 - 15. 2016 DAAD Deutscher Akademischer Austauschdienst German Academic Exchange Service
  • 2. Outline  Motivation  Related Work  Approach: EULAide  Evaluation & Results  Conclusions & Future Works 2 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 3. Motivation  Online research commissioned by Skandia1  7% read online EULAs when signing up for products & services  21% suffered as a result of ticking EULA box without reading them  10% locked into a longer term contract than they expected  5% lost money by not being able to cancel or amend hotels or holidays 3 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction 1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
  • 4. Problem Statement  Given an EULA (End-User License Agreement), we want to extract permissions, prohibitions & duties from it  Violation of EULA : legal punishments, including federal fines  Specifications  Complexity  Emerge of new regulations  Possible change of regulations  Long texts 4 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 5. Involved Communities  Permission & Obligations Expression Working Group1  A World Wide Web Consortium (W3C) group  Mission: defining a semantic data model for expressing permissions and obligations statements 5 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction 1 https://www.w3.org/2016/poe/charter  Spare Us the small print2  A campaign run by Fairer Finance  Mission: getting rid of lengthy terms & conditions 2 http://www.fairerfinance.com/campaigns/spare-us-the-small-print
  • 6. Related Work  Manual  Online service (tldrlegal.com)  Semi-automatic  NLL2RDF (Cabrio, et al. ,ESWC 2014)  First attempt to generate RDF expressions of EULAs  Exploit CC REL & ODRL vocabularies  Supervised machine learning  Limitation: few number of rights 6 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 7. Vocabularies & Ontologies for EULAs 7 Name Domain Coverage Last Release CC REL Linked data 2013/11 ODRL Open digital content 2015/03 LDR (derived from ODRL) Linked data resources 2014/09 LiMo Open data 2013/05 L4LOD Web of data 2013/05 ODRS Open data 2013/07 MPEG-21 Rights Data Dictionary Contains the terms as standardized in ISO/IEC 21000-6 2005/07 IPROnto Intellectual property rights (focus: ecommerce) 2003/12 Copyright ontology Digital rights management 2014/01 Semantic Copyright - Basic Works in digital formats 2009/10 Semantic Copyright - Registry Works in digital formats 2009/10 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 8. Approach: OBIE  Why OBIE?  Relying on domain expert knowledge  Clear structure & terminologies of EULAs  Our chosen ontology: ODRL  Most recently updated  Broad enough  Highest community endorsement 8 Policy (e.g, Request) Asset Constraint Rule (e.g., Permission) Action (e.g., sell) Party (e.g., Individual) permission prohibition duty constraint function (e.g., assignee) action target output  Leveraging OBIE to classify EULAs into predefined categories Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 9. Black Box Architecture 9 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction EULAide EULA Ontology Annotation Types: Permission, Prohibition, Duty
  • 10. Architecture of EULAide Sentence Splitter Morphological Analyser POS Tagger Linguistic Pre-Processing Ontology-based- Gazetteer EULA OBIE Transducer Annotation Types: Permission, Prohibition, Duty ODRL Enhancement Gazetteer User Interface EULA Ontology enhanced ODRL Ontology Pre-processed EULA Annotated concepts GATE EULA OBIE Pipeline Tokeniser Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction10
  • 11. EULA OBIE Transducer 11  JAPE transducer based on ODRL community specification documentations  Example of Permission Rules phase rule example Annotate Classes PermissionAction Copy, Reproduce, Delete PermissionWords May, grant, allow Extract Permissions [Subj][PermissionWords][Permission Action]+[Asset] [You][May][copy, share and reproduce][the product] [License][PermissionWords][object] [PermissionAction]+[Asset] [This license] [grants] [you] [to copy, share and reproduce] [the product] Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction EULA OBIE Transducer
  • 12. Example of Permission 12 Sentence: This license grants you to copy, share and reproduce the product. Text-file Gazetteer This license grants you to copy, share and reproduce the product (License) (Asset) Ontology-based Gazetteer This license grants you to copy, share and reproduce the product (Lookups) annotateClasses phase This license grants you to copy, share and reproduce the product (PermWords) (PermissionActions) extract Permissions This license grants you to copy, share and reproduce the product (License) (PermWords) (obj) (PermissionActions)+ (Asset) Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 13. Implementation of EULAide using GATE  GATE: General Architecture for Text Engineering  Open source software  University of Sheffield  Written in Java  Initial release: 1995  We chose GATE for:  Its ANNIE IE system & its support for JAPE grammar rules  Excellent support for OBIE approaches  Support for evaluation tools  Embedded API 13 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 14. Gold Standard Creation 14  20 popular licenses including 193 permissions, 185 prohibitions & 168 duties  Average words: 3,206  Average characters without space: 16,815  Using GATE IAA plugin Precision Recall F-measure Permission 0.94 0.9 0.92 Prohibition 0.79 0.94 0.86 Duty 0.86 0.96 0.91 Summary 0.87 0.93 0.9 IAA (Inter Annotator Agreement) for two Annotators Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 15. Evaluation: GATE Corpus Quality Assurance 15  Apply EULAide to the gold standard  F-measure calculation for two conditions Evaluation of EULAide without Ontology Enhancement Precision Recall F0.5 F1 F2 Permission 0.74 0.75 0.74 0.74 0.75 Prohibition 0.89 0.63 0.82 0.74 0.66 Duty 0.66 0.67 0.67 0.67 0.67 Summary 0.75 0.68 0.74 0.72 0.7 Evaluation of EULAide with Ontology Enhancement Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction  Without ontology enhancement transducer Precision Recall F0.5 F1 F2 Permission 0.75 0.56 0.71 0.64 0.59 Prohibition 0.89 0.47 0.75 0.61 0.52 Duty 0.73 0.43 0.64 0.54 0.46 Summary 0.79 0.49 0.7 0.6 0.53  With ontology enhancement transducer
  • 16. Evaluation – Example of Failure 16  Permission:  False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”  False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”
  • 17. Example: Facebook EULA 17 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
  • 18. Conclusions Sentence Splitter Morphological Analyser POS Tagger Linguistic Pre-Processing Ontology- based- Gazetteer EULA OBIE Transducer Annotation Types: Permission, Prohibition, Duty ODRL Enhancement Gazetteer User Interface EULA Ontology enhanced ODRL Ontology Pre-processed EULA Annotated concepts GATE EULA OBIE Pipeline Tokeniser  Identification of ontologies and vocabularies in EULA domain  EULAide: semi-automatic ontology-based annotation of EULAs  Ontology enhancement by adding additional concepts  IAA of 90%  F-measure of more than 70%  Applying the pipeline to different kind of front ends (using GATE API) Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction18
  • 19. Future Works  Extract more policies and rights (e.g., agreements, constraints, etc.)  Compilation of a more comprehensive ontology based on ODRL  Combining different IE methods  Implementation of a RESTful application in Java  Planning to design a mobile app  Integrating GATE with Stanford NLP to have a more accurate extraction (e.g., using coreference) 19 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction Email: nejad@cs.uni-bonn.de
  • 20. References 20 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction • All images are from Pixabay which are released under Creative Commons CC0 into the public domain.
  • 21. Backup Slides 21 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction • The plugin offers two types of IAA measurement: Fmeasure and agreement based on the kappa statistic. The latter has been criticized and has a number of well-known limitations. Kappa is suitable when annotators have the same number of instances but with different class labels. It is not recommended for text mark-up tasks, such as named entity recognition and information extraction [9]. When the annotators themselves determine which text spans they can annotate, the F-measure should be used. The F- measure has been less controversial and is also indicated as the most appropriate IAA measure in the GATE manual itself, given the nature of our annotation task [5].
  • 22. Evaluation – Example of Failure 22  Permission:  False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”  Prohibition:  False positive: “Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws.”  Duty:  False positive: “Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program”  False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”  False negative: “No one other than Sun has the right to modify the terms applicable to Covered Code created under this License”  False negative: “The GNU GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions”

Editor's Notes

  1.  free and open source software (FOSS)  FOSS software licenses both rights to the customer and therefore bundles the modifiable source code with the software ("open-source"), while proprietary software typically does not license these rights and therefore keeps the source code hidden ("closed source"). Protective: right to sublicense : NO
  2. End-User License Agreement (EULA): a legal contract between application providers and their end users
  3. POE group: mention that we are in touch with them The underlying model should support the business models of open, educational, government, and commercial communities through profiles that align with their specific requirements whilst retaining a common semantic layer for wider interoperability Mission: defining a semantic data model for expressing permissions and obligations statements for digital content, and to define the technical elements to make it deployable across browsers and content systems. Mission: getting rid of lengthy terms & conditions and legal jargon that no one understands.
  4. An online service, which uses a manual, crowdsourced way to help people understand the most commonly-used licenses. It is supported by users and everyone can create an account and suggest a short summary of a chosen license and then upload the EULA to the website. The Semantic Copyright project and the copyright ontology respectively, provide rich representation for the attributes of digital work and even support basic reasoning over the uses allowed, limited by only capturing basic rights [1]. Unfortunately, there is little documentation and no additional information about this effort and the ontology. Permissions (derive, reproduce, modify, copy, sell), Prohibitions (commercialise) and Duties (shareAlike, attachPolicy, attribute). The framework’s limitation is that it only covers a limited number of rights and conditions. Furthermore, notwithstanding that their dataset covered 37 licenses, the class with the highest frequency only scored 28 occurrences. This low number might be related to their training data, since after going through their publicly available dataset17 we noticed a scarce number of annotations in the 4-5 page licenses.
  5. As can be observed, the domain coverage ranges from very specific (MPEG-21, e-commerce applications) to very broad (digital rights, open data, linked data). ODRL stands out not only in terms of being the most current (most recently updated vocabulary), but also in terms of comprehensiveness (ranking 2nd from the domain-independent vocabularies). ODRL has also demonstrated the highest community endorsement. Examples included the RDF licenses database5, which is a first attempt at developing a knowledge base of licenses, and which combines the Creative Commons Rights Expression Language (CC REL) and ODRL to express EULAs as RDF. Furthermore, supplementary efforts have identified ODRL 2.0 as the best candidate for defining policies in linked data licenses [14]. In [11] a formal conceptual model based on CC REL, XrML and ODRL was integrated into a platform. The license picker ontology also exploits ODRL [7]
  6. POE starting point RDF licenses database Best candidate for defining policies in linked data licenses License picker ontology We introduce a novel technique that exploits knowledge encoded in an ontology for annotating, extracting and classifying EULA content into predefined categories, leveraging Ontology-based Information Extraction (OBIE). OBIE guides Information Extraction (IE) pipelines to process unstructured or semi-structured natural language text by exploiting ontologies to extract pre-defined structured information and annotating the text using ontology terms Primarily, the reliance on a vocabulary engineered by domain experts grounds our work in existing standards and broadens its application. For example, our work can benefit initiatives undertaken by the Permissions & Obligations Expression Working Group2. Secondly, as a form of legal document license agreements tend to have clear structures and terminologies, sometimes even containing identical clauses and phrases. This facilitates the ‘mapping’ of natural-language text to machine-readable conceptualisations in the ontology for our use-case.
  7. annotateClasses: separates Lookup annotations:
  8. Each individual gazetteer list is a plain text file
  9. JAPE is a Java Annotation Patterns Engine. JAPE provides finite state transduction over annotations based on regular expressions
  10. 20 popular license including Apple Website, Facebook, Google, SoundCloud We removed 10% disagreement from the goldstandard
  11. FP (false positive),: annotation is present in the Response set but not the Key (false negative), “missing” (IE/IR): annotation is present in the Key but missing in the Response
  12. FP (false positive),: annotation is present in the Response set but not the Key (false negative), “missing” (IE/IR): annotation is present in the Key but missing in the Response