The document discusses an approach to clustering similar excerpts extracted from end-user license agreements (EULAs) to provide a more user-friendly summary. It extracts permissions, prohibitions, and duties from EULAs using an ontology-based information extraction system. It then computes semantic similarity between excerpts and clusters them using hierarchical clustering. An evaluation found the clustering compressed information while preserving essential meanings, and users could understand EULAs faster and with less effort using the clustered summaries compared to original texts.
Usability Primer - for Alberta Municipal Webmasters Working GroupNormanMendoza
Presentation provided on December 1, 2006. References:
“A Practical Guide to Usability Testing” by Joseph S. Dumas and Janice C. Redish
The Elements of User Experience, diagram by Jesse James Garrett
“Markets are certainly looking at election results with some apprehension, but what is also true is that they are in for a correction. Elections might act as the trigger for such a correction,” said Jagannadham Thunuguntla, equity head at SMC Capitals.
professional fuzzy type-ahead rummage around in xml type-ahead search techni...Kumar Goud
Abstract – It is a research venture on the new information-access standard called type-ahead search, in which systems discover responds to a keyword query on-the-fly as users type in the uncertainty. In this paper we learn how to support fuzzy type-ahead search in XML. Underneath fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which have been used by many people on a daily basis. The systems received overwhelmingly positive feedbacks from users due to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems, and demonstrate several such systems. We show that our efficient techniques can indeed allow this search paradigm to scale on large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONijseajournal
Performance responsiveness and scalability is a make-or-break quality for software. Nearly everyone runs into performance problems at one time or another. This paper discusses about performance issues faced during Pre Examination Process Automation System (PEPAS) implemented in java technology. The challenges faced during the life cycle of the project and the mitigation actions performed. It compares 3 java technologies and shows how improvements are made through statistical analysis in response time of the application. The paper concludes with result analysis.
A new model for the selection of web development frameworks: application to P...IJECEIAES
The use of a framework is often essential for medium and large scale developments, but is also of interest for small developments. PHP has evolved as the scripting language the most chosen by developers, which has generated an explosion of PHP frameworks. There is a big debate about what the best PHP frameworks are, because the simple fact is that not all frameworks are built for everyone. Indeed, not all frameworks meet the same needs, and several frameworks can be used together in certain situations. Choosing the right framework, however, can sometimes be difficult. In order to make the selection process easier, we propose a pragmatic and complete model to compare and evaluate the main PHP frameworks. This model is based on a set of comparison criteria based on the Intrinsic durability, industrialized solution, technical adaptability, strategy, technical architecture and Speed criteria. Results show that the values of these criteria allow developers to easily and properly choose the framwork that best meets their needs
Joint Application Design (JAD) was developed by IBM in the late 1970s. It is a requirements determination method that brings together business and IT professionals in a structured workshop to determine and discuss system requirements
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERINGIJwest
Machine Reading Comprehension (MRC), particularly extractive close-domain question-answering, is a prominent field in Natural Language Processing (NLP). Given a question and a passage or set of passages, a machine must be able to extract the appropriate answer from the passage(s). However, the majority of these existing questions have only one answer, and more substantial testing on questions with multiple answers, or multi-span questions, has not yet been applied. Thus, we introduce a newly compiled dataset consisting of questions with multiple answers that originate from previously existing datasets. In addition, we run BERT-based models pre-trained for question-answering on our constructed dataset to evaluate their reading comprehension abilities. Runtime of base models on the entire dataset is approximately one day while the runtime for all models on a third of the dataset is a little over two days. Among the three of BERT-based models we ran, RoBERTa exhibits the highest consistent performance, regardless of size. We find that all our models perform similarly on this new, multi-span dataset compared to the single-span source datasets. While the models tested on the source datasets were slightly fine-tuned in order to return multiple answers, performance is similar enough to judge that task formulation does not drastically affect question-answering abilities. Our evaluations indicate that these models are indeed capable of adjusting to answer questions that require multiple answers. We hope that our findings will assist future development in question-answering and improve existing question-answering products and methods.
Usability Primer - for Alberta Municipal Webmasters Working GroupNormanMendoza
Presentation provided on December 1, 2006. References:
“A Practical Guide to Usability Testing” by Joseph S. Dumas and Janice C. Redish
The Elements of User Experience, diagram by Jesse James Garrett
“Markets are certainly looking at election results with some apprehension, but what is also true is that they are in for a correction. Elections might act as the trigger for such a correction,” said Jagannadham Thunuguntla, equity head at SMC Capitals.
professional fuzzy type-ahead rummage around in xml type-ahead search techni...Kumar Goud
Abstract – It is a research venture on the new information-access standard called type-ahead search, in which systems discover responds to a keyword query on-the-fly as users type in the uncertainty. In this paper we learn how to support fuzzy type-ahead search in XML. Underneath fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which have been used by many people on a daily basis. The systems received overwhelmingly positive feedbacks from users due to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems, and demonstrate several such systems. We show that our efficient techniques can indeed allow this search paradigm to scale on large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONijseajournal
Performance responsiveness and scalability is a make-or-break quality for software. Nearly everyone runs into performance problems at one time or another. This paper discusses about performance issues faced during Pre Examination Process Automation System (PEPAS) implemented in java technology. The challenges faced during the life cycle of the project and the mitigation actions performed. It compares 3 java technologies and shows how improvements are made through statistical analysis in response time of the application. The paper concludes with result analysis.
A new model for the selection of web development frameworks: application to P...IJECEIAES
The use of a framework is often essential for medium and large scale developments, but is also of interest for small developments. PHP has evolved as the scripting language the most chosen by developers, which has generated an explosion of PHP frameworks. There is a big debate about what the best PHP frameworks are, because the simple fact is that not all frameworks are built for everyone. Indeed, not all frameworks meet the same needs, and several frameworks can be used together in certain situations. Choosing the right framework, however, can sometimes be difficult. In order to make the selection process easier, we propose a pragmatic and complete model to compare and evaluate the main PHP frameworks. This model is based on a set of comparison criteria based on the Intrinsic durability, industrialized solution, technical adaptability, strategy, technical architecture and Speed criteria. Results show that the values of these criteria allow developers to easily and properly choose the framwork that best meets their needs
Joint Application Design (JAD) was developed by IBM in the late 1970s. It is a requirements determination method that brings together business and IT professionals in a structured workshop to determine and discuss system requirements
EVALUATION OF SINGLE-SPAN MODELS ON EXTRACTIVE MULTI-SPAN QUESTION-ANSWERINGIJwest
Machine Reading Comprehension (MRC), particularly extractive close-domain question-answering, is a prominent field in Natural Language Processing (NLP). Given a question and a passage or set of passages, a machine must be able to extract the appropriate answer from the passage(s). However, the majority of these existing questions have only one answer, and more substantial testing on questions with multiple answers, or multi-span questions, has not yet been applied. Thus, we introduce a newly compiled dataset consisting of questions with multiple answers that originate from previously existing datasets. In addition, we run BERT-based models pre-trained for question-answering on our constructed dataset to evaluate their reading comprehension abilities. Runtime of base models on the entire dataset is approximately one day while the runtime for all models on a third of the dataset is a little over two days. Among the three of BERT-based models we ran, RoBERTa exhibits the highest consistent performance, regardless of size. We find that all our models perform similarly on this new, multi-span dataset compared to the single-span source datasets. While the models tested on the source datasets were slightly fine-tuned in order to return multiple answers, performance is similar enough to judge that task formulation does not drastically affect question-answering abilities. Our evaluations indicate that these models are indeed capable of adjusting to answer questions that require multiple answers. We hope that our findings will assist future development in question-answering and improve existing question-answering products and methods.
Similar to Session 2.5 semantic similarity based clustering of license excerpts for improved end-user interpretation (20)
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Session 2.5 semantic similarity based clustering of license excerpts for improved end-user interpretation
1. Semantic Similarity based Clustering of License
Excerpts for Improved End-User Interpretation
Najmeh Mousavi Nejad, Simon Scerri & Sören Auer
SEMANTiCS17 - 13th International Conference on Semantic Systems
Amsterdam, September 11 - 14. 2017
2. 2 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Motivation
Online research commissioned by Skandia1
Only 7% read online End-User License Agreements
(EULAs) when signing up for products & services
21% suffered as a result of ticking EULA box without
reading them
1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
10% locked into a longer term contract than they
expected
5% lost money by not being able to cancel or amend
hotels or holidays
3. 3 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Problem Statement
Given an EULA (End-User License Agreement) in the natural language, we want to provide a
user-friendly summary of permissions, prohibitions & duties.
Focus of this Work: clustering similar extracted excerpts of EULAs for the benefit of end-user
You may reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense,
and distribute the Work and such Derivative Works in Source or Object form.
You may reproduce and distribute copies of the Work or Derivative Works with or without
modifications and You may add Your own attribution notices within Derivative Works
4. 4 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Related Work
Manual: online service (tldrlegal.com)
Semi-automatic
NLL2RDF (Cabrio, et al., ESWC 2014)
First attempt to generate RDF expressions of EULAs
Exploit CC REL & ODRL vocabularies & supervised machine learning
Limitation: few number of rights
EULAide (Mousavi Nejad, et al., SEMANTiCS 2016)
Exploits ontology-based information extraction to extract permissions, prohibitions & duties from EULAs
Taken as a basis in this work
Limitation: a lot of extracted excerpts for long EULAs
5. 5 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Approach
Building a similarity matrix for each class considering different features of extracted segments
Features for each class (e.g., permission, prohibitions & duties) are extracted with JAPE rules
Action, Condition, PolicyType
Exploiting a distributional semantic approach for computing short text similarity
DISCO: DIstributionally related words using CO-occurrences
Has a word space builder: creates the word embedding database over a text corpus
Computes semantic similarity based on the word vectors
6. 6 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Approach – Continued
Using a hierarchical clustering algorithm to cluster each class (e.g., permission,
prohibitions & duties)
a
c , d e , f
a , c , d b e , f g
b c d e f g
7. 7 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Architecture
User
Front-end
Web
Interface
HTML,
CSS,
Angular
JS
EULA File
Permission,
Duty &
Prohibition
clusters
Back-end
FormData/
String
XML/JSON
data
Application
Server
Ontology
File
Permission, Duty,
Prohibition
DISCO
API
3 Similarity
Matrixes for
the 3 classes
Clustering
Algorithm
3 classes, each clustered based on their similarities
Customized
GATE OBIE
Pipeline
(feature
extraction
phase added)
Stanford
Dependency
Parser
Annotations
with features
Annotations with
extracted objects
Or URL
8. 8 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Feature Extraction
Three different features are extracted with JAPE rules
Sequence of actions: copy, share, distribute
Condition on which a specific action is granted or forbidden or obliged
PolicyType: copyright, patent or intellectual property right
If you join a Dropbox for business account, you must use it in compliance with your employer’s terms & policies.
Each contributor grants you a patent license to make, use, sell, import and transfer the work.
condition action
policy type action
9. 9 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Sketch of Semantic Clustering Algorithm
Input: permissions, prohibitions, duties with features
1: for all three classes do
2: for all segment pairs in each class do
3: A = similarity between actions
4: B = similarity between conditions
5: C = similarity between policy types
6: D = similarity between the remainders of segments
7: finalSim = A + B + C + D
8: add finalSim to the corresponding matrix cell
9: end for
10: do HAC clustering for the matrix with a threshold
11: end for
Output: clustered permissions, prohibitions & duties
10. 10 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
EULAide Platform Web Interface
11. 11 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Experiments
The Clustering Approach Evaluation
Does the semantic based clustering compress information?
Does the feature extraction phase improve the result?
Usability experiments
Does using EULAide need less time and effort for EULA comprehension?
Does semi-automatic IE lead to information loss?
How easy is exploiting EULAide regarding human perception?
12. 12 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
Result of clustering for 4 EULAs
Clusters-M (Baseline): without features
Clusters-Mf (our algorithm): with feature extraction phase
#Instances #Baseline Clusters #Our Approach Clusters
Permission 30 18 20
Duty 27 24 20
Prohibition 40 32 35
Does the semantic based clustering compress information? YES
13. 13 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
Compiling a gold standard for EULA clustering is hard
Solution: measuring the inter-annotator agreement
approximately
Five subject were asked to devise their own clustering
criteria as they best deemed fit
Computing the agreement between them:
𝑅𝑎𝑛𝑑 𝐼𝑛𝑑𝑒𝑥 =
𝑇𝑃 + 𝑇𝑁
𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 + 𝑇𝑁
Does the feature extraction phase improve the result?
h1 h2 h3 h4 h5 M Mf
h1 (1) 0.71 0.89 0.91 0.71 0.93 0.97
h2 * (1) 0.61 0.63 0.95 0.7 0.68
h3 * * (1) 0.92 0.63 0.86 0.86
h4 * * * (1) 0.65 0.85 0.88
h5 * * * * (1) 0.7 0.67
RI of clustering Result for Duties
14. 14 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Clustering Approach Evaluation
Accumulated deviation of Mf from Human = 9%
Accumulated deviation of M from Human = 10%
Does the feature extraction phase improve the result?
Human M Mf
Permission 0.79 0.83 0.81
Duty 0.76 0.81 0.81
Prohibition 0.83 0.84 0.85
Feature-based approach:
Generates more fined-grained clusters
Are more attuned to human intuition and perception
average rand index
YES
15. 15 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Evaluation
Does using EULAide need less time and effort for EULA comprehension?
Does semi-automatic IE lead to information loss?
Experimental setup for the 4 EULAs
A legal expert designed 5 multiple choices questions for each EULA
6 students read EULAs in 2 modes: natural text & EULAide
h1 h2 h3 h4 h5 h6
1-full × × ×
1-EULAide × × ×
2-full × × ×
2-EULAide × × ×
3-full × × ×
3-EULAide × × ×
4-full × × ×
4-EULAide × × ×
They answered the questions in 2 phases
Phase1: using memory without looking
Phase2: using search tools for finding the answers
16. 16 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Evaluation
Correct Incorrect Unanswered in phase1
Phase2
correct
Phase2
incorrect
Phase2
Unanswered
EULA-full 67 8 18.5 5 1.5
EULAide 62 15 6.5 4.5 12
Reading Answering
phase1
Answering
phase2
EULA-full 1185 75 152
EULAide 315 72 77
Does using EULAide need less time and effort
for EULA comprehension? YES
Does semi-automatic IE lead to information
loss? YES
average time in seconds
average percentage of questions results (%)
17. 17 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Usability Test
How easy is using EULAide regarding human perception?
USE questionnaire: usefulness, satisfaction, ease of learning and ease of use1
Contains 30 questions
1 = strongly dissatisfied, 7 = strongly satisfied
With 6 participants
1 http://garyperlman.com/quest/quest.cgi?form=USE
Usefulness Ease of
use
Ease of
learning
satisfaction
6.14 6.11 6.75 6.0
Recommendations by participants:
Extending the idea of summary in the header of each accordion
Including other aspects of EULAs: what is the agreement between us and the service provider?
18. 18 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Conclusions & Future Works
EULAide is the first comprehensive approach for EULA interpretation
Clustering is effective and reduces the number of relevant terms for users to focus on initially
EULAide is visual & simple to digest EULAs
It saves around 75% of the time
But it has a marginal price of 10.5% loss of valuable information
Future works
Improving the feature extraction phase
Extending the summarization technique of each cluster for the benefit of end users
nejad@cs.uni-bonn.de
20. 20 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
References
All images are from Pixabay which are released under Creative Commons CC0 into the public domain.
21. 21 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Backup Slide
Agglomerative hierarchical clustering (HAC) is an established, well-known technique which has been shown to be a successful
method for text and document clustering
Furthermore, among different HAC methods, the average linkage has been proved to be the most suitable one for text categorization
[1, 24]. Once the proper clustering technique is identifed, we can pass similarity matrices to the clustering component. The HAC
process continues until it reaches a pre-defned threshold.
In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point
in one cluster to every point in the other cluster
22. 22 Najmeh Mousavi Nejad, Semantic Similarity based Clustering of License Excerpts for Improved End-User Interpretation
Backup Slide
DISCO computation method
𝑆𝑖𝑚 𝑇1, 𝑇2 =
𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 + 𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚(𝑇2, 𝑇1)
2
𝑑𝑖𝑟𝑒𝑐𝑡𝑒𝑑𝑆𝑖𝑚 𝑇1, 𝑇2 = 𝑖=1
𝑛
[ 𝑤𝑒𝑖𝑔ℎ𝑡(𝑤𝑖1)* max
1≤𝑗≤𝑛
𝑊𝑜𝑟𝑑𝑆𝑖𝑚 𝑤𝑖1, 𝑤𝑗2 ]
5 participants could reveal about 80% of all usability problems that exist in a product (Nielsen,
1993).
Editor's Notes
An online service, which uses a manual, crowdsourced way to help people understand the most commonly-used licenses. It is supported by users and everyone can create an account and suggest a short summary of a chosen license and then upload the EULA to the website.
When available, the quality of clustering methods is best measured by comparing machine-generated clusters with a reliablegold standard. However, considering the complexity and broadnessof EULAs, it is very difcult to obtain or compile a suitable goldstandard that is agreed and accepted by a majority. Therefore, as analternative method we designed an experiment that can compareclustering preferences between human evaluators (to approximateinter-annotator agreement), and compare them with those generated computationally. To determine the usefulness of featureextraction, a second machine-generated clustering was includedfor a secondary comparison
When available, the quality of clustering methods is best measured by comparing machine-generated clusters with a reliablegold standard. However, considering the complexity and broadnessof EULAs, it is very difcult to obtain or compile a suitable goldstandard that is agreed and accepted by a majority. Therefore, as analternative method we designed an experiment that can compareclustering preferences between human evaluators (to approximateinter-annotator agreement), and compare them with those generated computationally. To determine the usefulness of featureextraction, a second machine-generated clustering was includedfor a secondary comparison
Each student read two EULA in natural text & the other two in EULAide
Each EULA in natural text was read by 3 different students
Each EULA with EULAide was studied by 3 different students
The rational behind this setting was measuring howgood users can remember policies and also how fast they can searchfor information in the EULA. The primary purpose of EULAide isto get an overview of an EULA before accepting it. In practice,when one is agreeing with terms and conditions, they should tryto remember the important parts of license agreement in order toavoid infringing the regulations.
Distributional semantic approaches beneft from the observation that words in the similar contexts tend to have similar meanings