Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction


http://2016.semantics.cc/najmeh-mousavi-nejad


  1. EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
     Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani
     SEMANTiCS16 - 12th International Conference on Semantic Systems, Leipzig, September 12-15, 2016
     DAAD: Deutscher Akademischer Austauschdienst (German Academic Exchange Service)
  2. Outline
     • Motivation
     • Related Work
     • Approach: EULAide
     • Evaluation & Results
     • Conclusions & Future Work
  3. Motivation
     • Online research commissioned by Skandia [1]:
         • 7% read online EULAs when signing up for products and services
         • 21% suffered as a result of ticking the EULA box without reading the agreement
         • 10% were locked into a longer-term contract than they expected
         • 5% lost money because they could not cancel or amend hotel or holiday bookings
     [1] http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
  4. Problem Statement
     • Given a EULA (End-User License Agreement), we want to extract the permissions, prohibitions and duties it contains
     • Violating a EULA can lead to legal penalties, including federal fines
     • Characteristics of EULAs:
         • Complexity
         • Emergence of new regulations
         • Possible changes to existing regulations
         • Long texts
  5. Involved Communities
     • Permissions and Obligations Expression Working Group [1]
         • A World Wide Web Consortium (W3C) group
         • Mission: defining a semantic data model for expressing permission and obligation statements
     • Spare Us the Small Print [2]
         • A campaign run by Fairer Finance
         • Mission: getting rid of lengthy terms and conditions
     [1] https://www.w3.org/2016/poe/charter
     [2] http://www.fairerfinance.com/campaigns/spare-us-the-small-print
  6. Related Work
     • Manual
         • Online service (tldrlegal.com)
     • Semi-automatic
         • NLL2RDF (Cabrio et al., ESWC 2014)
         • First attempt to generate RDF expressions of EULAs
         • Exploits the CC REL and ODRL vocabularies
         • Supervised machine learning
         • Limitation: covers only a small number of rights
  7. Vocabularies & Ontologies for EULAs
     Name | Domain coverage | Last release
     CC REL | Linked data | 2013/11
     ODRL | Open digital content | 2015/03
     LDR (derived from ODRL) | Linked data resources | 2014/09
     LiMo | Open data | 2013/05
     L4LOD | Web of data | 2013/05
     ODRS | Open data | 2013/07
     MPEG-21 Rights Data Dictionary | Contains the terms as standardized in ISO/IEC 21000-6 | 2005/07
     IPROnto | Intellectual property rights (focus: e-commerce) | 2003/12
     Copyright Ontology | Digital rights management | 2014/01
     Semantic Copyright - Basic | Works in digital formats | 2009/10
     Semantic Copyright - Registry | Works in digital formats | 2009/10
  8. Approach: OBIE
     • Why OBIE (ontology-based information extraction)?
         • Relies on domain expert knowledge
         • EULAs have a clear structure and terminology
     • Our chosen ontology: ODRL
         • Most recently updated
         • Broad enough
         • Highest community endorsement
     [Figure: ODRL core model with the classes Policy (e.g., Request), Rule (e.g., Permission), Asset, Action (e.g., sell), Party (e.g., Individual) and Constraint, linked by the relations permission, prohibition, duty, constraint, function (e.g., assignee), action, target and output]
     • Leveraging OBIE to classify EULAs into predefined categories
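To make the target of the extraction concrete, the sketch below models the slide's ODRL diagram as plain Python data classes. It is a simplified assumption, not the ODRL vocabulary itself: the names `Policy`, `Rule`, `kind`, `target` and `assignee` are hypothetical, duties are modelled as a third rule kind as in the diagram, and constraints and outputs are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """One extracted rule (hypothetical simplification of an ODRL Rule)."""
    kind: str                       # "permission", "prohibition" or "duty"
    action: str                     # e.g. "copy", "sell"
    target: Optional[str] = None    # the Asset the action applies to
    assignee: Optional[str] = None  # the Party the rule applies to


@dataclass
class Policy:
    """Container mirroring the Policy/Rule part of the slide's diagram."""
    uid: str
    rules: List[Rule] = field(default_factory=list)

    def by_kind(self, kind: str) -> List[Rule]:
        """Return all rules of one kind, in insertion order."""
        return [r for r in self.rules if r.kind == kind]


# Hypothetical policy built from two extracted statements.
policy = Policy(uid="example-eula")
policy.rules.append(Rule("permission", "copy", target="the product", assignee="you"))
policy.rules.append(Rule("prohibition", "sell", target="the product", assignee="you"))
print([r.action for r in policy.by_kind("permission")])  # ['copy']
```

Keeping permissions, prohibitions and duties in one list with a `kind` field keeps the three annotation types symmetric, which matches how EULAide reports them.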
  9. Black Box Architecture
     [Figure: a EULA and an ontology are the inputs to EULAide, which outputs annotations of the types Permission, Prohibition and Duty]
  10. Architecture of EULAide
     [Figure: the GATE EULA OBIE pipeline. Components: Linguistic Pre-Processing (Tokeniser, Sentence Splitter, POS Tagger, Morphological Analyser), Gazetteer, ODRL Enhancement, Ontology-based Gazetteer, EULA OBIE Transducer and User Interface. Data: EULA, pre-processed EULA, ODRL ontology, enhanced ODRL ontology, annotated concepts, and the output annotation types Permission, Prohibition and Duty]
  11. EULA OBIE Transducer
     • JAPE transducer based on the ODRL community specification documents
     • Example of permission rules:
         Phase | Rule | Example
         Annotate Classes | PermissionAction | copy, reproduce, delete
         Annotate Classes | PermissionWords | may, grant, allow
         Extract Permissions | [Subj][PermissionWords][PermissionAction]+[Asset] | [You] [may] [copy, share and reproduce] [the product]
         Extract Permissions | [License][PermissionWords][Object][PermissionAction]+[Asset] | [This license] [grants] [you] [to copy, share and reproduce] [the product]
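JAPE rules match sequences of annotations rather than raw characters. As an illustration only, the two "Extract Permissions" patterns in the table can be emulated with Python's `re` module by encoding each annotation type as one character. This is not JAPE and not EULAide's grammar: the letter codes and the `is_permission` helper are hypothetical, and real JAPE rules also bind the matched spans and create new annotations on the right-hand side.

```python
import re
from typing import List

# Hypothetical one-letter codes for the annotation types used in the rules.
CODES = {
    "Subj": "S",
    "License": "L",
    "PermissionWords": "W",
    "Object": "O",
    "PermissionAction": "A",
    "Asset": "T",
}

# The two "Extract Permissions" patterns, rewritten as regexes over the codes.
PERMISSION_RULES = [
    re.compile(r"SWA+T"),   # [Subj][PermissionWords][PermissionAction]+[Asset]
    re.compile(r"LWOA+T"),  # [License][PermissionWords][Object][PermissionAction]+[Asset]
]


def is_permission(labels: List[str]) -> bool:
    """True if the annotation sequence matches one of the permission patterns.

    Labels without a code (punctuation, "and", other tokens) are skipped,
    loosely mimicking how a JAPE phase only sees its declared input types.
    """
    encoded = "".join(CODES[label] for label in labels if label in CODES)
    return any(rule.search(encoded) for rule in PERMISSION_RULES)


# "This license grants you to copy, share and reproduce the product."
labels = ["License", "PermissionWords", "Object",
          "PermissionAction", "PermissionAction", "PermissionAction", "Asset"]
print(is_permission(labels))  # True
```

The `+` on the action code is what lets a single rule cover "copy, share and reproduce" as three consecutive actions.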
  12. Example of Permission
     Sentence: "This license grants you to copy, share and reproduce the product."
     • Text-file Gazetteer: "This license" → (License), "the product" → (Asset)
     • Ontology-based Gazetteer: adds (Lookup) annotations
     • annotateClasses phase: "grants" → (PermWords), "copy, share and reproduce" → (PermissionActions)
     • extractPermissions phase: (License) (PermWords) (obj) (PermissionActions)+ (Asset) → Permission
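The gazetteer step in this walkthrough can be sketched as greedy longest-match lookup of single- and multi-word entries over a token list. The entry list below is hypothetical (it only covers the words of the example sentence) and uses the label names from the rule table; "you" is left unannotated because the slide does not attribute the object annotation to a gazetteer.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased token tuples mapped to annotation types.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("this", "license"): "License",
    ("the", "product"): "Asset",
    ("grants",): "PermissionWords",
    ("copy",): "PermissionAction",
    ("share",): "PermissionAction",
    ("reproduce",): "PermissionAction",
}
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def annotate(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup; returns (start, end, type) token spans."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label is not None:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return spans


tokens = "This license grants you to copy , share and reproduce the product .".split()
print(annotate(tokens))
```

Longest-match-first ensures that "the product" is tagged as one Asset span instead of being split into separate tokens.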
  13. Implementation of EULAide using GATE
     • GATE: General Architecture for Text Engineering
         • Open-source software
         • Developed at the University of Sheffield
         • Written in Java
         • Initial release: 1995
     • We chose GATE for:
         • Its ANNIE IE system and its support for JAPE grammar rules
         • Excellent support for OBIE approaches
         • Support for evaluation tools
         • Its embedded API
  14. Gold Standard Creation
     • 20 popular licenses containing 193 permissions, 185 prohibitions and 168 duties
     • Average length: 3,206 words; 16,815 characters excluding spaces
     • Inter-annotator agreement (IAA) for two annotators, measured with the GATE IAA plugin:
         Type | Precision | Recall | F-measure
         Permission | 0.94 | 0.90 | 0.92
         Prohibition | 0.79 | 0.94 | 0.86
         Duty | 0.86 | 0.96 | 0.91
         Summary | 0.87 | 0.93 | 0.90
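With span annotations there is no fixed set of items to compare, so agreement is measured by treating one annotator as the key and the other as the response. The sketch below computes strict-match precision, recall and F1 = 2PR/(P+R), which reproduces the table's F-measure column from its P and R columns (e.g., 2 × 0.94 × 0.90 / 1.84 ≈ 0.92). It is an illustrative assumption, not the GATE IAA plugin's code; GATE can additionally report lenient scores that credit partially overlapping spans. The offsets in the usage example are invented.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, annotation type)


def span_agreement(key: Set[Span], response: Set[Span]) -> Tuple[float, float, float]:
    """Strict-match precision, recall and F1 of `response` against `key`.

    A span counts as matched only if start, end and type are all identical.
    """
    matched = len(key & response)
    precision = matched / len(response) if response else 0.0
    recall = matched / len(key) if key else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical annotations of the same document by two annotators.
annotator_a = {(0, 40, "Permission"), (41, 90, "Duty"), (91, 130, "Prohibition")}
annotator_b = {(0, 40, "Permission"), (41, 90, "Duty"),
               (131, 170, "Duty"), (171, 200, "Permission")}
print(span_agreement(annotator_a, annotator_b))
```

Swapping key and response swaps precision and recall but leaves F1 unchanged, which is why F1 works as a symmetric agreement score.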
  15. Evaluation: GATE Corpus Quality Assurance
     • EULAide applied to the gold standard
     • F-measure calculated for two conditions
     Evaluation of EULAide with ontology enhancement:
         Type | Precision | Recall | F0.5 | F1 | F2
         Permission | 0.74 | 0.75 | 0.74 | 0.74 | 0.75
         Prohibition | 0.89 | 0.63 | 0.82 | 0.74 | 0.66
         Duty | 0.66 | 0.67 | 0.67 | 0.67 | 0.67
         Summary | 0.75 | 0.68 | 0.74 | 0.72 | 0.70
     Evaluation of EULAide without ontology enhancement:
         Type | Precision | Recall | F0.5 | F1 | F2
         Permission | 0.75 | 0.56 | 0.71 | 0.64 | 0.59
         Prohibition | 0.89 | 0.47 | 0.75 | 0.61 | 0.52
         Duty | 0.73 | 0.43 | 0.64 | 0.54 | 0.46
         Summary | 0.79 | 0.49 | 0.70 | 0.60 | 0.53
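The F0.5, F1 and F2 columns are instances of the weighted F-measure F_β = (1 + β²)·P·R / (β²·P + R), where β < 1 weights precision more heavily and β > 1 weights recall. A minimal Python version follows, checked against the Permission row with P = 0.74 and R = 0.75. Recomputing from the rounded P and R can differ from the table in the second decimal (the Prohibition row with P = 0.89, R = 0.63 gives F2 ≈ 0.67 rather than 0.66), presumably because the reported scores were computed from unrounded values.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta = 1 gives F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was matched
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Permission row with P = 0.74 and R = 0.75.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(0.74, 0.75, beta), 2))
```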
  16. Evaluation: Examples of Failure
     • Permission:
         • False positive: "nothing other than this License grants you permission to propagate or modify any covered work."
         • False negative: "The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements"
  17. Example: Facebook EULA
  18. Conclusions
     [Figure: EULAide architecture, repeated from slide 10]
     • Identification of ontologies and vocabularies in the EULA domain
     • EULAide: semi-automatic ontology-based annotation of EULAs
     • Ontology enhancement by adding further concepts
     • IAA of 90%
     • F-measure of more than 70%
     • The pipeline can be applied to different kinds of front ends (using the GATE API)
  19. Future Work
     • Extract more policies and rights (e.g., agreements, constraints)
     • Compile a more comprehensive ontology based on ODRL
     • Combine different IE methods
     • Implement a RESTful application in Java
     • Design a mobile app
     • Integrate GATE with Stanford NLP for more accurate extraction (e.g., using coreference resolution)
     Email: nejad@cs.uni-bonn.de
  20. References
     • All images are from Pixabay and are released under Creative Commons CC0 into the public domain.
  21. Backup Slides
     • The plugin offers two types of IAA measurement: F-measure and agreement based on the kappa statistic. The latter has been criticized and has a number of well-known limitations. Kappa is suitable when both annotators label the same set of instances and differ only in the class labels they assign. It is not recommended for text mark-up tasks such as named entity recognition and information extraction [9]. When the annotators themselves determine which text spans to annotate, the F-measure should be used. The F-measure has been less controversial and, given the nature of our annotation task, is also indicated as the most appropriate IAA measure in the GATE manual itself [5].
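For contrast, the sketch below computes Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. It only makes sense when both annotators label exactly the same list of instances, which is the precondition described above; free span annotation provides no such list, hence the preference for the F-measure. The labels in the usage example are made up.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same instances."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty list of instances")
    n = len(a)
    # Observed agreement: fraction of instances given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # kappa is undefined here; this sketch returns 1.0 by convention
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned by two annotators to the same four sentences.
annotator_1 = ["Permission", "Duty", "Duty", "Prohibition"]
annotator_2 = ["Permission", "Duty", "Prohibition", "Prohibition"]
print(cohen_kappa(annotator_1, annotator_2))
```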
  22. Evaluation: Examples of Failure
     • Permission:
         • False positive: "nothing other than this License grants you permission to propagate or modify any covered work."
         • False negative: "The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements"
     • Prohibition:
         • False positive: "Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws."
         • False negative: "No one other than Sun has the right to modify the terms applicable to Covered Code created under this License"
     • Duty:
         • False positive: "Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program"
         • False negative: "The GNU GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions"
