Keynote delivered at the 2013 Workshop on
Using Predictive Coding in E-Discovery (DESI V), about minimizing the cost of human review following an automated classification pass
Utility Theory, Minimum Effort, and Predictive Coding
1. Utility Theory, Minimum Effort,
and Predictive Coding
Fabrizio Sebastiani
Istituto di Scienza e Tecnologie dell’Informazione
Consiglio Nazionale delle Ricerche
56124 Pisa, Italy
DESI V – Roma, IT, 14 June 2013
2. What I’ll be talking about
A talk about text classification (“predictive coding”), about humans in the
loop, and about how to best support their work
I will be looking at scenarios in which
1. text classification technology is used for identifying documents belonging to a given class / relevant to a given query ...
2. ... but the level of accuracy that can be obtained from the classifier is not considered sufficient ...
3. ... with the consequence that one or more human assessors are asked to inspect (and correct where appropriate) a portion of the classification decisions, with the goal of increasing overall accuracy.
How can we support / optimize the work of the human assessors?
Fabrizio Sebastiani (ISTI-CNR Pisa (Italy)) Utility Theory, Minimum Effort, and Predictive Coding DESI V – Roma, IT, 14 June 2013 2 / 36
5. A worked out example
            predicted Y   predicted N
true Y      TP = 4        FN = 4
true N      FP = 3        TN = 9

F1 = 2TP / (2TP + FP + FN) = 0.53
6. A worked out example (cont’d)
            predicted Y   predicted N
true Y      TP = 4        FN = 4
true N      FP = 3        TN = 9

F1 = 2TP / (2TP + FP + FN) = 0.53
7. A worked out example (cont’d)
            predicted Y   predicted N
true Y      TP = 5        FN = 3
true N      FP = 3        TN = 9

F1 = 2TP / (2TP + FP + FN) = 0.63
8. A worked out example (cont’d)
            predicted Y   predicted N
true Y      TP = 5        FN = 3
true N      FP = 2        TN = 10

F1 = 2TP / (2TP + FP + FN) = 0.67
9. A worked out example (cont’d)
            predicted Y   predicted N
true Y      TP = 6        FN = 2
true N      FP = 2        TN = 10

F1 = 2TP / (2TP + FP + FN) = 0.75
10. A worked out example (cont’d)
            predicted Y   predicted N
true Y      TP = 6        FN = 2
true N      FP = 1        TN = 11

F1 = 2TP / (2TP + FP + FN) = 0.80
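The progression in the worked example can be reproduced in a few lines. A minimal sketch (Python is my choice here, not the talk's) that recomputes F1 after each correction, where fixing a false negative turns an FN into a TP and fixing a false positive turns an FP into a TN:

```python
# F1 recomputed as an annotator corrects one misclassified document at a
# time, following the slides' worked example (TN does not enter F1).

def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# (TP, FP, FN) after each correction, taken from the slides.
steps = [
    (4, 3, 4),  # initial automatic classification
    (5, 3, 3),  # a false negative corrected (FN -> TP)
    (5, 2, 3),  # a false positive corrected (FP -> TN)
    (6, 2, 2),  # another FN corrected
    (6, 1, 2),  # another FP corrected
]
for tp, fp, fn in steps:
    print(f"TP={tp} FP={fp} FN={fn}  F1={f1(tp, fp, fn):.3f}")
```

Up to rounding, this yields the slides' sequence 0.53, 0.63, 0.67, 0.75, 0.80: each inspected-and-corrected document raises the accuracy of the classified set.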
11. What I’ll be talking about (cont’d)
We need methods that
given a desired level of accuracy, minimize the assessors’ effort necessary to
achieve it; alternatively,
given an available amount of human assessors’ effort, maximize the accuracy
that can be obtained through it
This can be achieved by ranking the automatically classified documents in
such a way that, by starting the inspection from the top of the ranking, the
cost-effectiveness of the annotators’ work is maximized
We call the task of generating such a ranking Semi-Automatic Text
Classification (SATC)
13. What I’ll be talking about (cont’d)
Previous work has addressed SATC via techniques developed for “active
learning”
In both cases, the automatically classified documents are ranked with the
goal of having the human annotator start inspecting/correcting from the top;
however
in active learning the goal is providing new training examples
in SATC the goal is increasing the overall accuracy of the classified set
We claim that a ranking generated “à la active learning” is suboptimal for
SATC [1]
[1] G. Berardi, A. Esuli, F. Sebastiani. A Utility-Theoretic Ranking Method for Semi-Automated Text Classification. Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), Portland, US, 2012.
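For concreteness, a minimal sketch (with hypothetical scores; this is the baseline the talk criticizes, not the paper's utility-theoretic method) of the "à la active learning" ordering: rank the automatically classified documents by ascending confidence, so that the decisions most likely to be wrong are inspected first:

```python
# Baseline "active-learning-style" inspection ranking: least confident
# decisions first. The talk argues this ordering is suboptimal for SATC.

docs = [                  # (doc id, predicted label, confidence score)
    ("d1", "Y", 0.95),
    ("d2", "N", 0.40),
    ("d3", "Y", 0.55),
    ("d4", "N", 0.85),
]

inspection_order = sorted(docs, key=lambda d: d[2])  # ascending confidence
print([doc_id for doc_id, _, _ in inspection_order])
```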
15. Outline of this talk
1 We discuss how to measure “error reduction” (i.e., increase in accuracy)
2 We discuss a method for maximizing the expected error reduction for a fixed
amount of annotation effort
3 We show some promising experimental results
16. Error Reduction, and How to Measure it
Outline
1 Error Reduction, and How to Measure it
2 Error Reduction, and How to Maximize it
3 Some Experimental Results
17. Error Reduction, and How to Measure it
Error Reduction, and how to measure it
Assume we have
1. a class (or “query”) c;
2. a classifier h for c;
3. a set of unlabeled documents D that we have automatically classified by means of h, so that every document in D is associated with a binary decision (Y or N) and with a confidence score (a positive real number);
4. a measure of accuracy A, ranging over [0,1]
18. Error Reduction, and How to Measure it
Error Reduction, and how to Measure it
(cont’d)
We will assume that A is

    F1 = (2 · Precision · Recall) / (Precision + Recall) = 2 · TP / ((2 · TP) + FP + FN)

but any “set-based” measure of accuracy (i.e., based on a contingency table) may be used
An amount of error, measured as E = (1 − A), is present in the automatically
classified set D
Human annotators inspect-and-correct a portion of D with the goal of
reducing the error present in D
20. Error Reduction, and How to Measure it
Error Reduction, and how to Measure it
(cont’d)
We define error at rank n (noted as E(n)) as the error still present in D after
the annotator has inspected the documents at the first n rank positions
E(0) is the initial error generated by the automated classifier
E(|D|) is 0
We define error reduction at rank n (noted as ER(n)) to be

    ER(n) = (E(0) − E(n)) / E(0)

the error reduction obtained by the annotator who inspects the docs at the first n rank positions
ER(n) ∈ [0, 1]
ER(n) = 0 indicates no reduction
ER(n) = 1 indicates total elimination of error
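These definitions are easy to operationalize. A minimal sketch (with made-up labels and a fixed inspection ranking, all hypothetical) that computes E(n) = 1 − F1 on the partially corrected set and ER(n) = (E(0) − E(n)) / E(0):

```python
# Error E(n) and error reduction ER(n) as an annotator inspects-and-corrects
# the top-n documents of a ranking. Error is measured as E = 1 - F1.

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def error(pred, true):
    tp = sum(p and t for p, t in zip(pred, true))
    fp = sum(p and not t for p, t in zip(pred, true))
    fn = sum(t and not p for p, t in zip(pred, true))
    return 1.0 - f1(tp, fp, fn)

# Hypothetical classified set, already sorted in the inspection ranking.
pred = [True, False, True, False, True]   # classifier decisions (Y/N)
true = [False, True, True, True, True]    # gold labels

e0 = error(pred, true)                    # E(0): initial error
for n in range(len(pred) + 1):
    corrected = true[:n] + pred[n:]       # top-n decisions fixed by the annotator
    er = (e0 - error(corrected, true)) / e0
    print(f"ER({n}) = {er:.2f}")
```

As in the definitions above, ER(0) comes out as 0 (no reduction) and ER(|D|) as 1 (total elimination of error).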
22. Error Reduction, and How to Measure it
Error Reduction, and how to Measure it
(cont’d)
[Figure: Error Reduction (ER) as a function of Inspection Length; both axes span 0.0–1.0]
23. Error Reduction, and How to Maximize it
Outline
1 Error Reduction, and How to Measure it
2 Error Reduction, and How to Maximize it
3 Some Experimental Results
24. Error Reduction, and How to Maximize it
Error Reduction, and how to Maximize it
Problem
How should we rank the documents in D so as to maximize the expected error
reduction?
25. Error Reduction, and How to Maximize it
A worked out example
         predicted
         Y        N
true Y   TP = 4   FN = 4
     N   FP = 3   TN = 9

F1 = 2TP / (2TP + FP + FN) = 0.53
27. Error Reduction, and How to Maximize it
A worked out example (cont’d)
         predicted
         Y        N
true Y   TP = 5   FN = 3
     N   FP = 3   TN = 9

F1 = 2TP / (2TP + FP + FN) = 0.63
28. Error Reduction, and How to Maximize it
A worked out example (cont’d)
         predicted
         Y        N
true Y   TP = 5   FN = 3
     N   FP = 2   TN = 10

F1 = 2TP / (2TP + FP + FN) = 0.67
29. Error Reduction, and How to Maximize it
A worked out example (cont’d)
         predicted
         Y        N
true Y   TP = 6   FN = 2
     N   FP = 2   TN = 10

F1 = 2TP / (2TP + FP + FN) = 0.75
30. Error Reduction, and How to Maximize it
A worked out example (cont’d)
         predicted
         Y        N
true Y   TP = 6   FN = 2
     N   FP = 1   TN = 11

F1 = 2TP / (2TP + FP + FN) = 0.80
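The build-up above can be replayed programmatically; a short sketch (illustrative, not from the talk) that recomputes F1 after each correction, alternating FN and FP corrections as in the slides, reproducing the progression 0.53 → 0.63 → 0.67 → 0.75 → 0.80 up to rounding:

```python
# Sketch: replaying the worked example's inspect-and-correct sequence and
# checking F1 after each correction.

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

tp, fp, fn, tn = 4, 3, 4, 9          # initial contingency table
sequence = ["FN", "FP", "FN", "FP"]  # order in which errors are corrected

scores = [f1(tp, fp, fn)]
for err in sequence:
    if err == "FN":                  # a corrected FN becomes a TP
        fn, tp = fn - 1, tp + 1
    else:                            # a corrected FP becomes a TN
        fp, tn = fp - 1, tn + 1
    scores.append(f1(tp, fp, fn))

print(scores)
```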
31. Error Reduction, and How to Maximize it
Error Reduction, and how to Maximize it
Problem: how should we rank the documents in D so as to maximize the
expected error reduction?
Intuition 1: Documents that have a higher probability of being misclassified
should be ranked higher
Intuition 2: Documents that, if corrected, bring about a higher gain (i.e., a
bigger impact on A) should be ranked higher
Here, consider that a false positive and a false negative may have different
impacts on A (e.g., when A ≡ Fβ, for any value of β)
Bottom line
Documents that have a higher utility (= probability × gain) should be ranked
higher
33. Error Reduction, and How to Maximize it
Error Reduction, and how to Maximize it
(cont’d)
Given a set Ω of mutually disjoint events, a utility function is defined as
U(Ω) = Σ_{ω ∈ Ω} P(ω) · G(ω)
where
P(ω) is the probability of occurrence of event ω
G(ω) is the gain obtained if event ω occurs
We can thus estimate the utility, for the aims of increasing A, of manually
inspecting a document d as
U(TP, TN, FP, FN) = P(FP) · G(FP) + P(FN) · G(FN)
provided we can estimate
If d is labelled with class c: P(FP) and G(FP)
If d is not labelled with class c: P(FN) and G(FN)
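Ranking by utility can be sketched as follows; this is an illustration (not the talk's implementation), where the per-document records and the gain values are assumed numbers, with G(FN) > G(FP) as might happen when A is an Fβ with β > 1:

```python
# Sketch: ranking documents for inspection by estimated utility
# U = P(misclassification) * G(correcting that misclassification).

# Hypothetical records: (doc_id, predicted label for class c, P(misclassified)).
docs = [
    ("d1", "Y", 0.40),  # predicted positive -> an error here would be a FP
    ("d2", "N", 0.70),  # predicted negative -> an error here would be a FN
    ("d3", "Y", 0.10),
    ("d4", "N", 0.30),
]

# Assumed differential gains; in general G(FP) and G(FN) differ.
G_FP, G_FN = 0.05, 0.12

def utility(doc):
    _, label, p_err = doc
    gain = G_FP if label == "Y" else G_FN
    return p_err * gain

# Inspect highest-utility documents first.
ranking = sorted(docs, key=utility, reverse=True)
print([d[0] for d in ranking])
```

Here d2 outranks d1 even though a baseline that ignores gains would also rank it first; with gains closer together the two rankings can diverge, which is exactly what the utility-theoretic method exploits.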
35. Error Reduction, and How to Maximize it
Error Reduction, and how to Maximize it
(cont’d)
Estimating P(FP) and P(FN) (the probability of misclassification) can be
done by converting the confidence score returned by the classifier into a
probability of correct classification
Tricky: requires probability “calibration” via a generalized sigmoid function to
be optimized via k-fold cross-validation
Gains G(FP) and G(FN) can be defined “differentially”; i.e.,
The gain obtained by correcting a FN is (A_{FN→TP} − A)
The gain obtained by correcting a FP is (A_{FP→TN} − A)
Gains need to be estimated by estimating the contingency table on the
training set via k-fold cross-validation
Key observation: in general, G(FP) ≠ G(FN)
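The score-to-probability conversion can be sketched with a Platt-style sigmoid; the parameters a and b below are assumed values for illustration, whereas in practice they would be fit on held-out folds via k-fold cross-validation:

```python
# Sketch: calibrating a classifier's confidence score s into a probability
# via a sigmoid, in the spirit of Platt scaling. Parameters a, b are assumed
# here; they would really be optimized on held-out data.
import math

def calibrated_prob(score, a=-1.5, b=0.0):
    """P(positive | score) = 1 / (1 + exp(a*score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# A confidently positive score maps near 1; a score at the decision
# boundary maps to 0.5; a confidently negative score maps near 0.
for s in (2.0, 0.0, -2.0):
    print(f"score {s:+.1f} -> P = {calibrated_prob(s):.3f}")

# For a document predicted positive, P(FP) is then 1 - calibrated_prob(s).
```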
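The differential definition of the gains can be made concrete; a sketch under assumed counts (in practice the contingency table would be estimated on the training set via k-fold cross-validation), with A taken to be F1:

```python
# Sketch: "differential" gains G(FN) and G(FP), with A = F1.

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

# Assumed estimated contingency table (TN does not affect F1).
tp, fp, fn = 40, 30, 40

a = f1(tp, fp, fn)
g_fn = f1(tp + 1, fp, fn - 1) - a  # G(FN) = A_{FN->TP} - A
g_fp = f1(tp, fp - 1, fn) - a      # G(FP) = A_{FP->TN} - A
print(g_fn, g_fp)
```

Note that even for F1, which weighs FP and FN symmetrically in its denominator, G(FN) exceeds G(FP) here: correcting a FN also adds a TP, so the two gains differ in general.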
38. Some Experimental Results
Outline
1 Error Reduction, and How to Measure it
2 Error Reduction, and How to Maximize it
3 Some Experimental Results
39. Some Experimental Results
Some Experimental Results
Learning algorithms: MP-Boost, SVMs
Datasets:
                # Cats   # Training   # Test   F1^M (MP-Boost)   F1^M (SVMs)
Reuters-21578     115       9603       3299        0.608            0.527
OHSUMED-S          97      12358       3652        0.479            0.478

(F1^M denotes macroaveraged F1)
Baseline: ranking by probability of misclassification, equivalent to applying
our ranking method with G(FP) = G(FN) = 1
44. Some Experimental Results
A few side notes
This approach allows the human annotator to know, at any stage of the
inspection process, what the estimated accuracy is at that stage
Estimate accuracy at the beginning of the process, via k-fold cross-validation
Update after each correction is made
This approach lends itself to having more than one assessor working in
parallel on the same inspection task
Recent research I have not discussed today:
A “dynamic” SATC method in which gains are updated after each correction
is performed
Microaveraging- and macroaveraging-oriented methods
45. Some Experimental Results
Concluding Remarks
Take-away message: Semi-automatic text classification needs to be addressed
as a task in its own right
Active learning typically makes use of probabilities of misclassification but does
not make use of gains ⇒ ranking “à la active learning” is suboptimal for SATC
The use of utility theory means that the ranking algorithm is optimized for a
specific accuracy measure ⇒ choose the accuracy measure that best mirrors
your application needs (e.g., Fβ with β > 1), and choose it well!
SATC is important, since in more and more application contexts the accuracy
obtainable via completely automatic text classification is not sufficient; more
and more frequently humans will need to enter the loop
46. Some Experimental Results
Thank you!