SlideShare a Scribd company logo
Semantic Similarity based Clustering of License
Excerpts for Improved End-User Interpretation
Najmeh Mousavi Nejad, Simon Scerri & Sören Auer
SEMANTiCS17 - 13th International Conference on Semantic Systems
Amsterdam, September 11 - 14. 2017
2 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Motivation
 Online research commissioned by Skandia1
 Only 7% read online End-User License Agreements
(EULAs) when signing up for products & services
 21% suffered as a result of ticking EULA box without
reading them
1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
 10% locked into a longer term contract than they
expected
 5% lost money by not being able to cancel or amend
hotels or holidays
3 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Problem Statement
 Given an EULA (End-User License Agreement) in the natural language, we want to provide a
user-friendly summary of permissions, prohibitions & duties.
 Focus of this Work: clustering similar extracted excerpts of EULAs for the benefit of end-user
You may reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense,
and distribute the Work and such Derivative Works in Source or Object form.
You may reproduce and distribute copies of the Work or Derivative Works with or without
modifications and You may add Your own attribution notices within Derivative Works
4 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Related Work
 Manual: online service (tldrlegal.com)
 Semi-automatic
 NLL2RDF (Cabrio, et al., ESWC 2014)
 First attempt to generate RDF expressions of EULAs
 Exploit CC REL & ODRL vocabularies & supervised machine learning
 Limitation: few number of rights
 EULAide (Mousavi Nejad, et al., SEMANTiCS 2016)
 Exploits ontology-based information extraction to extract permissions, prohibitions & duties from EULAs
 Taken as a basis in this work
 Limitation: a lot of extracted excerpts for long EULAs
5 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Approach
 Building a similarity matrix for each class considering different features of extracted segments
 Features for each class (e.g., permission, prohibitions & duties) are extracted with JAPE rules
 Action, Condition, PolicyType
 Exploiting a distributional semantic approach for computing short text similarity
 DISCO: DIstributionally related words using CO-occurrences
 Has a word space builder: creates the word embedding database over a text corpus
 Computes semantic similarity based on the word vectors
6 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Approach – Continued
 Using a hierarchical clustering algorithm to cluster each class (e.g., permission,
prohibitions & duties)
a
c , d e , f
a , c , d b e , f g
b c d e f g
7 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Architecture
User
Front-end
Web
Interface
HTML,
CSS,
Angular
JS
EULA File
Permission,
Duty &
Prohibition
clusters
Back-end
FormData/
String
XML/JSON
data
Application
Server
Ontology
File
Permission, Duty,
Prohibition
DISCO
API
3 Similarity
Matrixes for
the 3 classes
Clustering
Algorithm
3 classes, each clustered based on their similarities
Customized
GATE OBIE
Pipeline
(feature
extraction
phase added)
Stanford
Dependency
Parser
Annotations
with features
Annotations with
extracted objects
Or URL
8 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Feature Extraction
 Three different features are extracted with JAPE rules
 Sequence of actions: copy, share, distribute
 Condition on which a specific action is granted or forbidden or obliged
 PolicyType: copyright, patent or intellectual property right
If you join a Dropbox for business account, you must use it in compliance with your employer’s terms & policies.
Each contributor grants you a patent license to make, use, sell, import and transfer the work.
condition action
policy type action
9 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Sketch of Semantic Clustering Algorithm
Input: permissions, prohibitions, duties with features
1: for all three classes do
2: for all segment pairs in each class do
3: A = similarity between actions
4: B = similarity between conditions
5: C = similarity between policy types
6: D = similarity between the remainders of segments
7: finalSim = A + B + C + D
8: add finalSim to the corresponding matrix cell
9: end for
10: do HAC clustering for the matrix with a threshold
11: end for
Output: clustered permissions, prohibitions & duties
10 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
EULAide Platform Web Interface
11 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Experiments
 The Clustering Approach Evaluation
 Does the semantic based clustering compress information?
 Does the feature extraction phase improve the result?
 Usability experiments
 Does using EULAide need less time and effort for EULA comprehension?
 Does semi-automatic IE lead to information loss?
 How easy is exploiting EULAide regarding human perception?
12 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
 Result of clustering for 4 EULAs
 Clusters-M (Baseline): without features
 Clusters-Mf (our algorithm): with feature extraction phase
#Instances #Baseline Clusters #Our Approach Clusters
Permission 30 18 20
Duty 27 24 20
Prohibition 40 32 35
 Does the semantic based clustering compress information? YES
13 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
 Compiling a gold standard for EULA clustering is hard
 Solution: measuring the inter-annotator agreement
approximately
 Five subject were asked to devise their own clustering
criteria as they best deemed fit
 Computing the agreement between them:
𝑅𝑎𝑛𝑑 𝐼𝑛𝑑𝑒𝑥 =
𝑇𝑃 + 𝑇𝑁
𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 + 𝑇𝑁
 Does the feature extraction phase improve the result?
h1 h2 h3 h4 h5 M Mf
h1 (1) 0.71 0.89 0.91 0.71 0.93 0.97
h2 * (1) 0.61 0.63 0.95 0.7 0.68
h3 * * (1) 0.92 0.63 0.86 0.86
h4 * * * (1) 0.65 0.85 0.88
h5 * * * * (1) 0.7 0.67
RI of clustering Result for Duties
14 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
 Accumulated deviation of Mf from Human = 9%
 Accumulated deviation of M from Human = 10%
 Does the feature extraction phase improve the result?
Human M Mf
Permission 0.79 0.83 0.81
Duty 0.76 0.81 0.81
Prohibition 0.83 0.84 0.85
 Feature-based approach:
 Generates more fined-grained clusters
 Are more attuned to human intuition and perception
average rand index
YES
15 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Evaluation
 Does using EULAide need less time and effort for EULA comprehension?
 Does semi-automatic IE lead to information loss?
 Experimental setup for the 4 EULAs
 A legal expert designed 5 multiple choices questions for each EULA
 6 students read EULAs in 2 modes: natural text & EULAide
h1 h2 h3 h4 h5 h6
1-full × × ×
1-EULAide × × ×
2-full × × ×
2-EULAide × × ×
3-full × × ×
3-EULAide × × ×
4-full × × ×
4-EULAide × × ×
 They answered the questions in 2 phases
 Phase1: using memory without looking
 Phase2: using search tools for finding the answers
16 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Evaluation
Correct Incorrect Unanswered in phase1
Phase2
correct
Phase2
incorrect
Phase2
Unanswered
EULA-full 67 8 18.5 5 1.5
EULAide 62 15 6.5 4.5 12
Reading Answering
phase1
Answering
phase2
EULA-full 1185 75 152
EULAide 315 72 77
 Does using EULAide need less time and effort
for EULA comprehension? YES
 Does semi-automatic IE lead to information
loss? YES
average time in seconds
average percentage of questions results (%)
17 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Test
 How easy is using EULAide regarding human perception?
 USE questionnaire: usefulness, satisfaction, ease of learning and ease of use1
 Contains 30 questions
 1 = strongly dissatisfied, 7 = strongly satisfied
 With 6 participants
1 http://garyperlman.com/quest/quest.cgi?form=USE
Usefulness Ease of
use
Ease of
learning
satisfaction
6.14 6.11 6.75 6.0
 Recommendations by participants:
 Extending the idea of summary in the header of each accordion
 Including other aspects of EULAs: what is the agreement between us and the service provider?
18 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Conclusions & Future Works
 EULAide is the first comprehensive approach for EULA interpretation
 Clustering is effective and reduces the number of relevant terms for users to focus on initially
 EULAide is visual & simple to digest EULAs
 It saves around 75% of the time
 But it has a marginal price of 10.5% loss of valuable information
 Future works
 Improving the feature extraction phase
 Extending the summarization technique of each cluster for the benefit of end users
nejad@cs.uni-bonn.de
THANK YOU !
nejad@cs.uni-bonn.de
20 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
References
 All images are from Pixabay which are released under Creative Commons CC0 into the public domain.
21 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Backup Slide
 Agglomerative hierarchical clustering (HAC) is an established, well-known technique which has been shown to be a successful
method for text and document clustering
 Furthermore, among different HAC methods, the average linkage has been proved to be the most suitable one for text categorization
[1, 24]. Once the proper clustering technique is identifed, we can pass similarity matrices to the clustering component. The HAC
process continues until it reaches a pre-defned threshold.
 In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point
in one cluster to every point in the other cluster
22 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Backup Slide
 DISCO computation method
𝑆𝑖𝑚 𝑇1, 𝑇2 =
𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 + 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚(𝑇2, 𝑇1)
2
𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 = 𝑖=1
𝑛
[ 𝑤𝑒𝑖𝑔ℎ𝑡(𝑤𝑖1)* max
1≤𝑗≤𝑛
𝑊𝑜𝑟𝑑𝑆𝑖𝑚 𝑤𝑖1, 𝑤𝑗2 ]
 5 participants could reveal about 80% of all usability problems that exist in a product (Nielsen,
1993).

More Related Content

Similar to Session 2.5 semantic similarity based clustering of license excerpts for improved end-user interpretation

Ui Design And Usability For Everybody
Ui Design And Usability For EverybodyUi Design And Usability For Everybody
Ui Design And Usability For EverybodyEmpatika
 
OTS - Everything you wanted to know but didn't ask
OTS - Everything you wanted to know but didn't askOTS - Everything you wanted to know but didn't ask
OTS - Everything you wanted to know but didn't ask
Jeff Hackney
 
Usability Primer - for Alberta Municipal Webmasters Working Group
Usability Primer - for Alberta Municipal Webmasters Working GroupUsability Primer - for Alberta Municipal Webmasters Working Group
Usability Primer - for Alberta Municipal Webmasters Working Group
NormanMendoza
 
Analysis of students behavior in the process of operating system experiments
Analysis of students behavior in the process of operating system experimentsAnalysis of students behavior in the process of operating system experiments
Analysis of students behavior in the process of operating system experiments
Farhanullah khan
 
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit BookingSmas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
Jagannadham Thunuguntla
 
Pallavi_Chauhan_TestEngineer
Pallavi_Chauhan_TestEngineerPallavi_Chauhan_TestEngineer
Pallavi_Chauhan_TestEngineer
Pallavi Chauhan
 
Stepin evening presented
Stepin evening presentedStepin evening presented
Stepin evening presentedVijayan Reddy
 
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
professional fuzzy type-ahead rummage around in xml  type-ahead search techni...professional fuzzy type-ahead rummage around in xml  type-ahead search techni...
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
Kumar Goud
 
Monitoring_and_Managing_Peopelsoft_Resources-Namrata Zalewski
Monitoring_and_Managing_Peopelsoft_Resources-Namrata ZalewskiMonitoring_and_Managing_Peopelsoft_Resources-Namrata Zalewski
Monitoring_and_Managing_Peopelsoft_Resources-Namrata ZalewskiNamrata Zalewski
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search Engine
Masud Rahman
 
Query processing
Query processingQuery processing
Query processing
University of Potsdam
 
Aspect Oriented Programming
Aspect Oriented ProgrammingAspect Oriented Programming
Aspect Oriented Programming
Rodger Oates
 
Software Testing Principles and  Techniques
Software Testing Principles and  Techniques Software Testing Principles and  Techniques
Software Testing Principles and  Techniques
suresh ramanujam
 
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONSTATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
ijseajournal
 
A new model for the selection of web development frameworks: application to P...
A new model for the selection of web development frameworks: application to P...A new model for the selection of web development frameworks: application to P...
A new model for the selection of web development frameworks: application to P...
IJECEIAES
 
Innovate2011 DevOps TSRM RTC
Innovate2011 DevOps TSRM RTCInnovate2011 DevOps TSRM RTC
Innovate2011 DevOps TSRM RTC
Steve Speicher
 
408372362-Student-Result-management-System-project-report-docx.docx
408372362-Student-Result-management-System-project-report-docx.docx408372362-Student-Result-management-System-project-report-docx.docx
408372362-Student-Result-management-System-project-report-docx.docx
santhoshyadav23
 
JAD Workshops
JAD WorkshopsJAD Workshops
JAD Workshops
hapy
 
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERINGEVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
IJwest
 

Similar to Session 2.5 semantic similarity based clustering of license excerpts for improved end-user interpretation (20)

Ui Design And Usability For Everybody
Ui Design And Usability For EverybodyUi Design And Usability For Everybody
Ui Design And Usability For Everybody
 
OTS - Everything you wanted to know but didn't ask
OTS - Everything you wanted to know but didn't askOTS - Everything you wanted to know but didn't ask
OTS - Everything you wanted to know but didn't ask
 
Usability Primer - for Alberta Municipal Webmasters Working Group
Usability Primer - for Alberta Municipal Webmasters Working GroupUsability Primer - for Alberta Municipal Webmasters Working Group
Usability Primer - for Alberta Municipal Webmasters Working Group
 
Analysis of students behavior in the process of operating system experiments
Analysis of students behavior in the process of operating system experimentsAnalysis of students behavior in the process of operating system experiments
Analysis of students behavior in the process of operating system experiments
 
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit BookingSmas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
Smas Hits May 11, 2009 Sensex Down 193 Points On Profit Booking
 
Pallavi_Chauhan_TestEngineer
Pallavi_Chauhan_TestEngineerPallavi_Chauhan_TestEngineer
Pallavi_Chauhan_TestEngineer
 
Stepin evening presented
Stepin evening presentedStepin evening presented
Stepin evening presented
 
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
professional fuzzy type-ahead rummage around in xml  type-ahead search techni...professional fuzzy type-ahead rummage around in xml  type-ahead search techni...
professional fuzzy type-ahead rummage around in xml type-ahead search techni...
 
Monitoring_and_Managing_Peopelsoft_Resources-Namrata Zalewski
Monitoring_and_Managing_Peopelsoft_Resources-Namrata ZalewskiMonitoring_and_Managing_Peopelsoft_Resources-Namrata Zalewski
Monitoring_and_Managing_Peopelsoft_Resources-Namrata Zalewski
 
An IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search EngineAn IDE-Based Context-Aware Meta Search Engine
An IDE-Based Context-Aware Meta Search Engine
 
Query processing
Query processingQuery processing
Query processing
 
Aspect Oriented Programming
Aspect Oriented ProgrammingAspect Oriented Programming
Aspect Oriented Programming
 
Software Testing Principles and  Techniques
Software Testing Principles and  Techniques Software Testing Principles and  Techniques
Software Testing Principles and  Techniques
 
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONSTATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
 
A new model for the selection of web development frameworks: application to P...
A new model for the selection of web development frameworks: application to P...A new model for the selection of web development frameworks: application to P...
A new model for the selection of web development frameworks: application to P...
 
Innovate2011 DevOps TSRM RTC
Innovate2011 DevOps TSRM RTCInnovate2011 DevOps TSRM RTC
Innovate2011 DevOps TSRM RTC
 
408372362-Student-Result-management-System-project-report-docx.docx
408372362-Student-Result-management-System-project-report-docx.docx408372362-Student-Result-management-System-project-report-docx.docx
408372362-Student-Result-management-System-project-report-docx.docx
 
BA Resume
BA  ResumeBA  Resume
BA Resume
 
JAD Workshops
JAD WorkshopsJAD Workshops
JAD Workshops
 
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERINGEVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERING
 

More from semanticsconference

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
semanticsconference
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
semanticsconference
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
semanticsconference
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
semanticsconference
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
semanticsconference
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
semanticsconference
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
semanticsconference
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
semanticsconference
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
semanticsconference
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
semanticsconference
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
semanticsconference
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
semanticsconference
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
semanticsconference
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
semanticsconference
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
semanticsconference
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
semanticsconference
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
semanticsconference
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
semanticsconference
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
semanticsconference
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
semanticsconference
 

More from semanticsconference (20)

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.3   energy, smart homes & smart grids: towards interoperability...Session 1.3   energy, smart homes & smart grids: towards interoperability...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Session 2.3 semantics for safeguarding & security – a police story
Session 2.3   semantics for safeguarding & security – a police storySession 2.3   semantics for safeguarding & security – a police story
Session 2.3 semantics for safeguarding & security – a police story
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...Session 2.2   ontology-guided job market demand analysis: a cross-sectional s...
Session 2.2 ontology-guided job market demand analysis: a cross-sectional s...
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Session 2.5 semantic similarity based clustering of license excerpts for improved end-user interpretation

  • 1. Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Najmeh Mousavi Nejad, Simon Scerri & Sören Auer SEMANTiCS17 - 13th International Conference on Semantic Systems Amsterdam, September 11 - 14. 2017
  • 2. 2 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Motivation  Online research commissioned by Skandia1  Only 7% read online End-User License Agreements (EULAs) when signing up for products & services  21% suffered as a result of ticking EULA box without reading them 1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html  10% locked into a longer term contract than they expected  5% lost money by not being able to cancel or amend hotels or holidays
  • 3. 3 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Problem Statement  Given an EULA (End-User License Agreement) in the natural language, we want to provide a user-friendly summary of permissions, prohibitions & duties.  Focus of this Work: clustering similar extracted excerpts of EULAs for the benefit of end-user You may reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. You may reproduce and distribute copies of the Work or Derivative Works with or without modifications and You may add Your own attribution notices within Derivative Works
  • 4. 4 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Related Work  Manual: online service (tldrlegal.com)  Semi-automatic  NLL2RDF (Cabrio, et al., ESWC 2014)  First attempt to generate RDF expressions of EULAs  Exploit CC REL & ODRL vocabularies & supervised machine learning  Limitation: few number of rights  EULAide (Mousavi Nejad, et al., SEMANTiCS 2016)  Exploits ontology-based information extraction to extract permissions, prohibitions & duties from EULAs  Taken as a basis in this work  Limitation: a lot of extracted excerpts for long EULAs
  • 5. 5 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Approach  Building a similarity matrix for each class considering different features of extracted segments  Features for each class (e.g., permission, prohibitions & duties) are extracted with JAPE rules  Action, Condition, PolicyType  Exploiting a distributional semantic approach for computing short text similarity  DISCO: DIstributionally related words using CO-occurrences  Has a word space builder: creates the word embedding database over a text corpus  Computes semantic similarity based on the word vectors
  • 6. 6 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Approach – Continued  Using a hierarchical clustering algorithm to cluster each class (e.g., permission, prohibitions & duties) a c , d e , f a , c , d b e , f g b c d e f g
  • 7. 7 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Architecture User Front-end Web Interface HTML, CSS, Angular JS EULA File Permission, Duty & Prohibition clusters Back-end FormData/ String XML/JSON data Application Server Ontology File Permission, Duty, Prohibition DISCO API 3 Similarity Matrixes for the 3 classes Clustering Algorithm 3 classes, each clustered based on their similarities Customized GATE OBIE Pipeline (feature extraction phase added) Stanford Dependency Parser Annotations with features Annotations with extracted objects Or URL
  • 8. 8 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Feature Extraction  Three different features are extracted with JAPE rules  Sequence of actions: copy, share, distribute  Condition on which a specific action is granted or forbidden or obliged  PolicyType: copyright, patent or intellectual property right If you join a Dropbox for business account, you must use it in compliance with your employer’s terms & policies. Each contributor grants you a patent license to make, use, sell, import and transfer the work. condition action policy type action
  • 9. 9 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Sketch of Semantic Clustering Algorithm Input: permissions, prohibitions, duties with features 1: for all three classes do 2: for all segment pairs in each class do 3: A = similarity between actions 4: B = similarity between conditions 5: C = similarity between policy types 6: D = similarity between the remainders of segments 7: finalSim = A + B + C + D 8: add finalSim to the corresponding matrix cell 9: end for 10: do HAC clustering for the matrix with a threshold 11: end for Output: clustered permissions, prohibitions & duties
  • 10. 10 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation EULAide Platform Web Interface
  • 11. 11 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Experiments  The Clustering Approach Evaluation  Does the semantic based clustering compress information?  Does the feature extraction phase improve the result?  Usability experiments  Does using EULAide need less time and effort for EULA comprehension?  Does semi-automatic IE lead to information loss?  How easy is exploiting EULAide regarding human perception?
  • 12. 12 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Clustering Approach Evaluation  Result of clustering for 4 EULAs  Clusters-M (Baseline): without features  Clusters-Mf (our algorithm): with feature extraction phase #Instances #Baseline Clusters #Our Approach Clusters Permission 30 18 20 Duty 27 24 20 Prohibition 40 32 35  Does the semantic based clustering compress information? YES
  • 13. 13 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Clustering Approach Evaluation  Compiling a gold standard for EULA clustering is hard  Solution: measuring the inter-annotator agreement approximately  Five subject were asked to devise their own clustering criteria as they best deemed fit  Computing the agreement between them: 𝑅𝑎𝑛𝑑 𝐼𝑛𝑑𝑒𝑥 = 𝑇𝑃 + 𝑇𝑁 𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 + 𝑇𝑁  Does the feature extraction phase improve the result? h1 h2 h3 h4 h5 M Mf h1 (1) 0.71 0.89 0.91 0.71 0.93 0.97 h2 * (1) 0.61 0.63 0.95 0.7 0.68 h3 * * (1) 0.92 0.63 0.86 0.86 h4 * * * (1) 0.65 0.85 0.88 h5 * * * * (1) 0.7 0.67 RI of clustering Result for Duties
  • 14. 14 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Clustering Approach Evaluation  Accumulated deviation of Mf from Human = 9%  Accumulated deviation of M from Human = 10%  Does the feature extraction phase improve the result? Human M Mf Permission 0.79 0.83 0.81 Duty 0.76 0.81 0.81 Prohibition 0.83 0.84 0.85  Feature-based approach:  Generates more fined-grained clusters  Are more attuned to human intuition and perception average rand index YES
  • 15. 15 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Usability Evaluation  Does using EULAide need less time and effort for EULA comprehension?  Does semi-automatic IE lead to information loss?  Experimental setup for the 4 EULAs  A legal expert designed 5 multiple choices questions for each EULA  6 students read EULAs in 2 modes: natural text & EULAide h1 h2 h3 h4 h5 h6 1-full × × × 1-EULAide × × × 2-full × × × 2-EULAide × × × 3-full × × × 3-EULAide × × × 4-full × × × 4-EULAide × × ×  They answered the questions in 2 phases  Phase1: using memory without looking  Phase2: using search tools for finding the answers
  • 16. 16 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Usability Evaluation Correct Incorrect Unanswered in phase1 Phase2 correct Phase2 incorrect Phase2 Unanswered EULA-full 67 8 18.5 5 1.5 EULAide 62 15 6.5 4.5 12 Reading Answering phase1 Answering phase2 EULA-full 1185 75 152 EULAide 315 72 77  Does using EULAide need less time and effort for EULA comprehension? YES  Does semi-automatic IE lead to information loss? YES average time in seconds average percentage of questions results (%)
  • 17. 17 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Usability Test  How easy is using EULAide regarding human perception?  USE questionnaire: usefulness, satisfaction, ease of learning and ease of use1  Contains 30 questions  1 = strongly dissatisfied, 7 = strongly satisfied  With 6 participants 1 http://garyperlman.com/quest/quest.cgi?form=USE Usefulness Ease of use Ease of learning satisfaction 6.14 6.11 6.75 6.0  Recommendations by participants:  Extending the idea of summary in the header of each accordion  Including other aspects of EULAs: what is the agreement between us and the service provider?
  • 18. 18 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Conclusions & Future Works  EULAide is the first comprehensive approach for EULA interpretation  Clustering is effective and reduces the number of relevant terms for users to focus on initially  EULAide is visual & simple to digest EULAs  It saves around 75% of the time  But it has a marginal price of 10.5% loss of valuable information  Future works  Improving the feature extraction phase  Extending the summarization technique of each cluster for the benefit of end users nejad@cs.uni-bonn.de
  • 20. 20 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation References  All images are from Pixabay which are released under Creative Commons CC0 into the public domain.
  • 21. 21 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Backup Slide  Agglomerative hierarchical clustering (HAC) is an established, well-known technique which has been shown to be a successful method for text and document clustering  Furthermore, among different HAC methods, the average linkage has been proved to be the most suitable one for text categorization [1, 24]. Once the proper clustering technique is identifed, we can pass similarity matrices to the clustering component. The HAC process continues until it reaches a pre-defned threshold.  In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point in one cluster to every point in the other cluster
  • 22. 22 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation Backup Slide  DISCO computation method 𝑆𝑖𝑚 𝑇1, 𝑇2 = 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 + 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚(𝑇2, 𝑇1) 2 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 = 𝑖=1 𝑛 [ 𝑤𝑒𝑖𝑔ℎ𝑡(𝑤𝑖1)* max 1≤𝑗≤𝑛 𝑊𝑜𝑟𝑑𝑆𝑖𝑚 𝑤𝑖1, 𝑤𝑗2 ]  5 participants could reveal about 80% of all usability problems that exist in a product (Nielsen, 1993).

Editor's Notes

  1. An online service, which uses a manual, crowdsourced way to help people understand the most commonly-used licenses. It is supported by users and everyone can create an account and suggest a short summary of a chosen license and then upload the EULA to the website.
  2. When available, the quality of clustering methods is best measured by comparing machine-generated clusters with a reliable gold standard. However, considering the complexity and broadness of EULAs, it is very difcult to obtain or compile a suitable gold standard that is agreed and accepted by a majority. Therefore, as an alternative method we designed an experiment that can compare clustering preferences between human evaluators (to approximate inter-annotator agreement), and compare them with those generated computationally. To determine the usefulness of feature extraction, a second machine-generated clustering was included for a secondary comparison
  3. When available, the quality of clustering methods is best measured by comparing machine-generated clusters with a reliable gold standard. However, considering the complexity and broadness of EULAs, it is very difcult to obtain or compile a suitable gold standard that is agreed and accepted by a majority. Therefore, as an alternative method we designed an experiment that can compare clustering preferences between human evaluators (to approximate inter-annotator agreement), and compare them with those generated computationally. To determine the usefulness of feature extraction, a second machine-generated clustering was included for a secondary comparison
  4. Each student read two EULA in natural text & the other two in EULAide Each EULA in natural text was read by 3 different students Each EULA with EULAide was studied by 3 different students The rational behind this setting was measuring how good users can remember policies and also how fast they can search for information in the EULA. The primary purpose of EULAide is to get an overview of an EULA before accepting it. In practice, when one is agreeing with terms and conditions, they should try to remember the important parts of license agreement in order to avoid infringing the regulations.
  5. Distributional semantic approaches beneft from the observation that words in the similar contexts tend to have similar meanings