Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Similar to Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Similar to Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction (20)
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
1. EULAide: Interpretation of End-User License
Agreements using Ontology-Based
Information Extraction
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani
SEMANTiCS16 - 12th International Conference on Semantic Systems
Leipzig, September 12 - 15. 2016
DAAD
Deutscher Akademischer Austauschdienst
German Academic Exchange Service
2. Outline
Motivation
Related Work
Approach: EULAide
Evaluation & Results
Conclusions & Future Works
2 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
3. Motivation
Online research commissioned by Skandia1
7% read online EULAs when signing up for
products & services
21% suffered as a result of ticking EULA box
without reading them
10% locked into a longer term contract than they
expected
5% lost money by not being able to cancel or
amend hotels or holidays
3 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
4. Problem Statement
Given an EULA (End-User License Agreement), we want to
extract permissions, prohibitions & duties from it
Violation of EULA : legal punishments, including federal fines
Specifications
Complexity
Emerge of new regulations
Possible change of regulations
Long texts
4 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
5. Involved Communities
Permission & Obligations Expression Working
Group1
A World Wide Web Consortium (W3C) group
Mission: defining a semantic data model for
expressing permissions and obligations statements
5 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
1 https://www.w3.org/2016/poe/charter
Spare Us the small print2
A campaign run by Fairer Finance
Mission: getting rid of lengthy terms & conditions
2 http://www.fairerfinance.com/campaigns/spare-us-the-small-print
6. Related Work
Manual
Online service (tldrlegal.com)
Semi-automatic
NLL2RDF (Cabrio, et al. ,ESWC 2014)
First attempt to generate RDF expressions of EULAs
Exploit CC REL & ODRL vocabularies
Supervised machine learning
Limitation: few number of rights
6 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
7. Vocabularies & Ontologies for EULAs
7
Name Domain Coverage Last Release
CC REL Linked data 2013/11
ODRL Open digital content 2015/03
LDR (derived from ODRL) Linked data resources 2014/09
LiMo Open data 2013/05
L4LOD Web of data 2013/05
ODRS Open data 2013/07
MPEG-21 Rights Data Dictionary Contains the terms as standardized in ISO/IEC
21000-6
2005/07
IPROnto Intellectual property rights (focus: ecommerce) 2003/12
Copyright ontology Digital rights management 2014/01
Semantic Copyright - Basic Works in digital formats 2009/10
Semantic Copyright - Registry Works in digital formats 2009/10
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
8. Approach: OBIE
Why OBIE?
Relying on domain expert
knowledge
Clear structure & terminologies
of EULAs
Our chosen ontology: ODRL
Most recently updated
Broad enough
Highest community endorsement
8
Policy
(e.g, Request)
Asset
Constraint
Rule
(e.g., Permission)
Action
(e.g., sell)
Party
(e.g., Individual)
permission
prohibition
duty
constraint
function
(e.g., assignee)
action
target output
Leveraging OBIE to classify EULAs into predefined categories
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
9. Black Box Architecture
9 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
EULAide
EULA
Ontology
Annotation Types:
Permission,
Prohibition, Duty
10. Architecture of EULAide
Sentence Splitter
Morphological Analyser
POS Tagger
Linguistic Pre-Processing
Ontology-based-
Gazetteer
EULA OBIE
Transducer
Annotation Types:
Permission,
Prohibition, Duty
ODRL Enhancement
Gazetteer
User Interface
EULA
Ontology
enhanced
ODRL
Ontology
Pre-processed
EULA
Annotated
concepts
GATE EULA OBIE Pipeline
Tokeniser
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction10
11. EULA OBIE Transducer
11
JAPE transducer based on ODRL community specification documentations
Example of Permission Rules
phase rule example
Annotate
Classes
PermissionAction Copy, Reproduce, Delete
PermissionWords May, grant, allow
Extract
Permissions
[Subj][PermissionWords][Permission
Action]+[Asset]
[You][May][copy, share and reproduce][the
product]
[License][PermissionWords][object]
[PermissionAction]+[Asset]
[This license] [grants] [you] [to copy, share
and reproduce] [the product]
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
EULA OBIE
Transducer
12. Example of Permission
12
Sentence: This license grants you to copy, share and reproduce the product.
Text-file Gazetteer This license grants you to copy, share and reproduce the product
(License) (Asset)
Ontology-based
Gazetteer
This license grants you to copy, share and reproduce the product
(Lookups)
annotateClasses
phase
This license grants you to copy, share and reproduce the product
(PermWords) (PermissionActions)
extract
Permissions
This license grants you to copy, share and reproduce the product
(License) (PermWords) (obj) (PermissionActions)+ (Asset)
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
13. Implementation of EULAide using GATE
GATE: General Architecture for Text Engineering
Open source software
University of Sheffield
Written in Java
Initial release: 1995
We chose GATE for:
Its ANNIE IE system & its support for JAPE grammar rules
Excellent support for OBIE approaches
Support for evaluation tools
Embedded API
13 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
14. Gold Standard Creation
14
20 popular licenses including 193 permissions, 185 prohibitions & 168 duties
Average words: 3,206
Average characters without space: 16,815
Using GATE IAA plugin
Precision Recall F-measure
Permission 0.94 0.9 0.92
Prohibition 0.79 0.94 0.86
Duty 0.86 0.96 0.91
Summary 0.87 0.93 0.9
IAA (Inter Annotator Agreement) for two Annotators
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
15. Evaluation: GATE Corpus Quality Assurance
15
Apply EULAide to the gold standard
F-measure calculation for two
conditions
Evaluation of EULAide without Ontology Enhancement
Precision Recall F0.5 F1 F2
Permission 0.74 0.75 0.74 0.74 0.75
Prohibition 0.89 0.63 0.82 0.74 0.66
Duty 0.66 0.67 0.67 0.67 0.67
Summary 0.75 0.68 0.74 0.72 0.7
Evaluation of EULAide with Ontology Enhancement
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Without ontology
enhancement transducer
Precision Recall F0.5 F1 F2
Permission 0.75 0.56 0.71 0.64 0.59
Prohibition 0.89 0.47 0.75 0.61 0.52
Duty 0.73 0.43 0.64 0.54 0.46
Summary 0.79 0.49 0.7 0.6 0.53
With ontology enhancement
transducer
16. Evaluation – Example of Failure
16
Permission:
False positive: “nothing other than this License grants you permission to
propagate or modify any covered work.”
False negative: “The number of permitted participants on a group video call
varies from 3 to a maximum of 10, subject to system requirements”
17. Example: Facebook EULA
17 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
18. Conclusions
Sentence Splitter
Morphological Analyser
POS Tagger
Linguistic Pre-Processing
Ontology-
based-
Gazetteer
EULA OBIE
Transducer
Annotation
Types:
Permission,
Prohibition, Duty
ODRL Enhancement
Gazetteer
User Interface
EULA
Ontology
enhanced ODRL
Ontology
Pre-processed
EULA
Annotated
concepts
GATE EULA OBIE Pipeline
Tokeniser
Identification of ontologies and vocabularies in EULA domain
EULAide: semi-automatic ontology-based annotation of EULAs
Ontology enhancement by adding additional concepts
IAA of 90%
F-measure of more than 70%
Applying the pipeline to different kind of
front ends (using GATE API)
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction18
19. Future Works
Extract more policies and rights (e.g., agreements, constraints, etc.)
Compilation of a more comprehensive ontology based on ODRL
Combining different IE methods
Implementation of a RESTful application in Java
Planning to design a mobile app
Integrating GATE with Stanford NLP to have a more accurate extraction (e.g., using coreference)
19 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Email: nejad@cs.uni-bonn.de
20. References
20 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
• All images are from Pixabay which are released under Creative Commons
CC0 into the public domain.
21. Backup Slides
21 Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
• The plugin offers two types of IAA measurement: Fmeasure and agreement
based on the kappa statistic. The latter has been criticized and has a
number of well-known limitations. Kappa is suitable when annotators have
the same number of instances but with different class labels. It is not
recommended for text mark-up tasks, such as named entity recognition
and information extraction [9]. When the annotators themselves determine
which text spans they can annotate, the F-measure should be used. The F-
measure has been less controversial and is also indicated as the most
appropriate IAA measure in the GATE manual itself, given the nature of our
annotation task [5].
22. Evaluation – Example of Failure
22
Permission:
False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
Prohibition:
False positive: “Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws.”
Duty:
False positive: “Each Recipient is solely responsible for determining the appropriateness of using and distributing the
Program”
False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject
to system requirements”
False negative: “No one other than Sun has the right to modify the terms applicable to Covered Code created under
this License”
False negative: “The GNU GPL requires that modified versions be marked as changed, so that their problems will not be
attributed erroneously to authors of previous versions”
Editor's Notes
free and open source software (FOSS)
FOSS software licenses both rights to the customer and therefore bundles the modifiable source code with the software ("open-source"), while proprietary software typically does not license these rights and therefore keeps the source code hidden ("closed source").
Protective: right to sublicense : NO
End-User License Agreement (EULA): a legal contract between application providers and their end users
POE group: mention that we are in touch with them
The underlying model should support the business models of open, educational, government, and commercial communities through profiles that align with their specific requirements whilst retaining a common semantic layer for wider interoperability
Mission: defining a semantic data model for expressing permissions and obligations statements for digital content, and to define the technical elements to make it deployable across browsers and content systems.
Mission: getting rid of lengthy terms & conditions and legal jargon that no one understands.
An online service, which uses a manual, crowdsourced way to help people understand the most commonly-used licenses. It is supported by users and everyone can create an account and suggest a short summary of a chosen license and then upload the EULA to the website.
The Semantic Copyright project and the copyright ontology respectively, provide rich representation for the attributes of digital work and even support basic reasoning over the uses allowed, limited by only capturing basic rights [1]. Unfortunately, there is little documentation and no additional information about this effort and the ontology.
Permissions (derive, reproduce, modify, copy, sell), Prohibitions (commercialise) and Duties (shareAlike, attachPolicy, attribute).
The framework’s limitation is that it only covers a limited number of rights and conditions. Furthermore, notwithstanding that their dataset covered 37 licenses, the class with the highest frequency only scored 28 occurrences. This low number might be related to their training data, since after going through their publicly available dataset17 we noticed a scarce number of annotations in the 4-5 page licenses.
As can be observed, the domain coverage ranges from very specific (MPEG-21, e-commerce applications) to very broad (digital rights, open data, linked data). ODRL stands out not only in terms of being the most current (most recently updated vocabulary), but also in terms of comprehensiveness (ranking 2nd from the domain-independent vocabularies). ODRL has also demonstrated the highest community endorsement. Examples included the RDF licenses database5, which is a first attempt at developing a knowledge base of licenses, and which combines the Creative Commons Rights Expression Language (CC REL) and ODRL to express EULAs as RDF. Furthermore, supplementary efforts have identified ODRL 2.0 as the best candidate for defining policies in linked data licenses [14]. In [11] a formal conceptual model based on CC REL, XrML and ODRL was integrated into a platform. The license picker ontology also exploits ODRL [7]
POE starting point
RDF licenses database
Best candidate for defining policies in linked data licenses
License picker ontology
We introduce a novel technique that exploits knowledge encoded in an ontology for annotating, extracting and classifying EULA content into predefined categories, leveraging Ontology-based Information Extraction (OBIE). OBIE guides Information Extraction (IE) pipelines to process unstructured or semi-structured natural language text by exploiting ontologies to extract pre-defined structured information and annotating the text using ontology terms
Primarily, the reliance on a vocabulary engineered by domain experts grounds our work in existingstandards and broadens its application. For example, ourwork can benefit initiatives undertaken by the Permissions& Obligations Expression Working Group2. Secondly, asa form of legal document license agreements tend to haveclear structures and terminologies, sometimes even containing identical clauses and phrases. This facilitates the ‘mapping’ of natural-language text to machine-readable conceptualisations in the ontology for our use-case.
annotateClasses: separates Lookup annotations:
Each individual gazetteer list is a plain text file
JAPE is a Java Annotation Patterns Engine. JAPE provides finite state transduction over annotations based on regular expressions
20 popular license including Apple Website, Facebook, Google, SoundCloud
We removed 10% disagreement from the goldstandard
FP (false positive),: annotation is present in the Response set but not the Key
(false negative), “missing” (IE/IR): annotation is present in the Key but missing in the Response
FP (false positive),: annotation is present in the Response set but not the Key
(false negative), “missing” (IE/IR): annotation is present in the Key but missing in the Response