Explainable insights on algorithm performance
Sevvandi Kandanaarachchi
Work with Kate Smith-Miles
To explain or to predict? – Galit Shmueli
● Paper by Shmueli in 2010
● Discusses these two topics: explanation and prediction
● Argues that explanation and prediction are different
● Two modelling paths – for predicting and explaining
● Social sciences have built explanatory models for a long time
What is an explanation?
• To explain an event is to provide some information about its causal history. – Lewis, 1986 (Causal Explanation)
• A statement or an account that makes something clear – Google
• It is important to note that the solution to explainable AI is not just ‘more AI’ - Miller, 2019
• Miller (2019) argues for Social Science + Computer Science in XAI
In the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics.
Message: Bring in the social scientists to the party!
Integrate their methods!
What is this talk about?
● Using a method from the social sciences to do two ML-type tasks
● Evaluate algorithms
  ● It gives us more meaningful metrics about algorithms
  ● Has some causal interpretations
  ● Visually inspect where algorithms perform well (and poorly)
● Ensembles
  ● Anomaly detection ensembles
What is algorithm evaluation?
• Performance of many algorithms on many problems
• How do you explain the algorithm performance?
• Standard statistical analysis misses many things
            Algo 1  Algo 2  Algo 3  Algo 4
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
We want to evaluate algorithm performance . . .
. . . in a way that helps us understand algorithms and problems better!
Hello Item Response Theory
Item Response Theory (IRT)
• Models used in social sciences/psychometrics
• Unobservable characteristics and observed outcomes
• Verbal or mathematical ability
• Racial prejudice or stress proneness
• Political inclinations
• Intrinsic “quality” that cannot be measured directly
IRT in education
• Finds the discrimination and difficulty of test questions
• And the ability of the test participants
• By fitting an IRT model
• In education – questions that can discriminate between students of different ability are preferred to “very difficult” questions.
How it works
Matrix of scores (students × questions):
Students  Q 1   Q 2   Q 3   Q 4
Stu 1     0.95  0.87  0.67  0.84
Stu 2     0.57  0.49  0.78  0.77
Stu n     0.75  0.86  0.57  0.45
Matrix → IRT Model →
• Discrimination of questions $\alpha_j$
• Difficulty of questions $\beta_j$
• Ability of students (latent trait) $\theta_i$
What does IRT give us?
• Q1 - discrimination $\alpha_1$, difficulty $\beta_1$
• Q2 - discrimination $\alpha_2$, difficulty $\beta_2$
• Q3 - discrimination $\alpha_3$, difficulty $\beta_3$
• Q4 - discrimination $\alpha_4$, difficulty $\beta_4$
• Student 1 ability $\theta_1$
︙
• Student n ability $\theta_n$
        Q 1   Q 2   Q 3   Q 4
Stu 1   0.95  0.87  0.67  0.84
Stu 2   0.57  0.49  0.78  0.77
Stu n   0.75  0.86  0.57  0.45
The causal understanding
• $\alpha_j$ - Discrimination of Q j
• $\beta_j$ - Difficulty of Q j
• $\theta_i$ - Ability of student i
• $x_{ij}$ - Marks of student i for Question j
Student ($\theta_i$), Question ($\alpha_j$, $\beta_j$) → Marks ($x_{ij}$)
Dichotomous IRT
● Multiple choice
● True or false
● $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \dfrac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$
● $x_{ij}$ - outcome/score of examinee $i$ for item $j$
● $\theta_i$ - examinee’s ($i$) ability
● $\gamma_j$ - guessing parameter for item $j$
● $d_j$ - difficulty parameter
● $\alpha_j$ - discrimination
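The dichotomous model above can be written as a short Python function. This is only an illustrative sketch: the function name `prob_correct` and the example item parameters are assumptions made here and do not come from the talk or from any IRT package.

```python
import math


def prob_correct(theta, alpha, d, gamma):
    """Probability that x_ij = 1 under the dichotomous model above.

    theta: examinee ability, alpha: item discrimination,
    d: item difficulty, gamma: guessing parameter (lower asymptote).
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Hypothetical item: guessing level 0.25, difficulty 0, discrimination 1.5.
for theta in (-2.0, 0.0, 2.0):
    p = prob_correct(theta, alpha=1.5, d=0.0, gamma=0.25)
    print(f"ability {theta:+.1f} -> P(correct) = {p:.3f}")
```

When the ability equals the item difficulty, the probability sits halfway between the guessing level and 1. A larger discrimination makes the curve steeper around that point.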
Continuous IRT
• Grades out of 100
• A 2D surface of probabilities
$f(z_j \mid \theta) = \dfrac{\alpha_j \gamma_j}{\sqrt{2\pi}} \exp\left( -\dfrac{\alpha_j^2}{2} (\theta - \beta_j - \gamma_j z_j)^2 \right)$
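As a rough check of the continuous model, the density can be coded directly and integrated numerically over the score. This is a sketch under assumptions: `crm_density` and `integrate_over_z` are hypothetical helper names, the score $z_j$ is taken to be a real-valued (already transformed) score, and the parameter values are made up for illustration.

```python
import math


def crm_density(z, theta, alpha, beta, gamma):
    """Density f(z_j | theta) of the continuous response model above."""
    resid = theta - beta - gamma * z
    scale = alpha * gamma / math.sqrt(2.0 * math.pi)
    return scale * math.exp(-0.5 * alpha**2 * resid**2)


def integrate_over_z(theta, alpha, beta, gamma, lo=-20.0, hi=20.0, step=0.001):
    """Riemann-sum approximation of the integral of the density over z."""
    n_steps = int(round((hi - lo) / step))
    total = sum(
        crm_density(lo + k * step, theta, alpha, beta, gamma)
        for k in range(n_steps)
    )
    return total * step


# Illustrative parameters (assumed): the density peaks at z = (theta - beta) / gamma.
print(integrate_over_z(theta=1.0, alpha=2.0, beta=0.3, gamma=0.5))
```

Viewed as a function of $z_j$, the expression is a normal density with mean $(\theta - \beta_j)/\gamma_j$ and standard deviation $1/(\alpha_j \gamma_j)$, so the numerical integral should come out very close to 1.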
Mapping algorithm evaluation to IRT
IRT Model
• Students → Problems/datasets
• Test questions → Algorithms
The new mapping
• $\alpha_j$ - Discrimination of Algo j
• $\beta_j$ - Difficulty of Algo j
• $\theta_i$ - Ability of Dataset i
• $x_{ij}$ - Marks of dataset i for algorithm j
Dataset ($\theta_i$), Algorithm ($\alpha_j$, $\beta_j$) → Performance ($x_{ij}$)
Fitting the IRT model
• Maximising the expectation
$E_{\theta \mid \Lambda^{(t)}, Z}\left[ \ln p(\Lambda \mid \theta, Z) \right] = N \sum_{j=1}^{n} (\ln \alpha_j + \ln \gamma_j) - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{n} \alpha_j^2 \left( (\beta_j + \gamma_j z_{ij} - \mu_i^{(t)})^2 + \sigma^{(t)2} \right) + \ln p(\Lambda) + \text{const}$
• $\alpha_j$ - discrimination parameter of algorithm
• $\gamma_j$ - scaling parameter for the algorithm
• $\beta_j$ - difficulty parameter for the algorithm
• $z_{ij}$ - score of the algorithm on the dataset/problem
• $p(\Lambda)$ - prior probabilities
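The data-dependent part of this expectation is straightforward to evaluate for given parameters. The sketch below is an assumption-laden illustration: the function name `expected_log_posterior` is hypothetical, the prior term $\ln p(\Lambda)$ and the constant are left out, and $\alpha_j$ and $\gamma_j$ are assumed positive so the logarithms exist.

```python
import math


def expected_log_posterior(z, alpha, beta, gamma, mu, sigma2):
    """Expectation from the slide, without ln p(Lambda) and the constant.

    z[i][j]: score of algorithm j on dataset i (N datasets, n algorithms)
    alpha, beta, gamma: per-algorithm parameters, each of length n
    mu[i]: current posterior mean of theta_i
    sigma2: current posterior variance
    """
    n_datasets = len(z)
    log_term = n_datasets * sum(
        math.log(a) + math.log(g) for a, g in zip(alpha, gamma)
    )
    quad_term = 0.0
    for i, row in enumerate(z):
        for j, z_ij in enumerate(row):
            resid = beta[j] + gamma[j] * z_ij - mu[i]
            quad_term += alpha[j] ** 2 * (resid**2 + sigma2)
    return log_term - 0.5 * quad_term


# Tiny made-up example: two datasets, one algorithm.
value = expected_log_posterior(
    z=[[0.0], [1.0]], alpha=[1.0], beta=[0.0], gamma=[1.0],
    mu=[0.0, 1.0], sigma2=0.5,
)
print(value)
```

In an EM-style fit this quantity would be maximised over $\alpha_j$, $\beta_j$ and $\gamma_j$ at each iteration while $\mu_i^{(t)}$ and $\sigma^{(t)2}$ are held fixed.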
New meaning of IRT parameters?
● Discrimination -> anomalousness, algorithm consistency
● Difficulty -> algorithm difficulty limit
● Student ability -> Dataset difficulty spectrum
What can we say . . .
• Consistent algorithms give similar performance for easy or hard datasets
• Algorithms with higher difficulty limits can handle harder problems
• Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
Dataset difficulty
● $\text{dataset } i \text{ difficulty} = -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$
● Is a function of discrimination, difficulty and scores
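The dataset difficulty formula is a weighted average, with weights $\hat{\alpha}_j^2$, taken with a minus sign. A minimal Python sketch follows; the function name `dataset_difficulty` and the fitted parameter values are hypothetical and only serve to show the arithmetic.

```python
def dataset_difficulty(z_row, alpha, beta, gamma):
    """Difficulty of one dataset from its scores and fitted parameters.

    z_row[j]: score of algorithm j on this dataset
    alpha, beta, gamma: fitted per-algorithm parameters
    """
    numerator = sum(
        a * a * (b + g * z) for a, b, g, z in zip(alpha, beta, gamma, z_row)
    )
    denominator = sum(a * a for a in alpha)
    return -numerator / denominator


# Made-up fitted parameters for two algorithms.
alpha_hat = [1.0, 2.0]
beta_hat = [0.0, 1.0]
gamma_hat = [1.0, 1.0]

print(dataset_difficulty([0.5, 0.0], alpha_hat, beta_hat, gamma_hat))
print(dataset_difficulty([1.0, 1.0], alpha_hat, beta_hat, gamma_hat))
```

With positive scaling parameters, higher scores give a lower difficulty value, and algorithms with larger $|\hat{\alpha}_j|$ carry more weight in the average.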
Example – fitting IRT model
• Anomaly detection algorithms
• 8 anomaly detection algorithms
• 3142 datasets
• Our performance metric is AUROC (looking at the performance vs actual)
Anomaly detection algorithms
Algorithm         Consistency  Difficulty Limit  Anomalous
Ensemble          55.08        -66.55            FALSE
LOF               4.50         5.10              FALSE
KNN               1.72         2.30              FALSE
FAST_ABOD         9.39         10.23             FALSE
Isolation Forest  2.35         3.05              FALSE
KDEOS_1           0.86         -0.31             TRUE
KDEOS-2           1.16         -0.51             TRUE
LDF               2.01         2.08              FALSE
Dataset difficulty spectrum
IRT Student Ability -> Dataset Difficulty
Focusing on algorithm performance
• We have algorithm performance (y axis) and problem difficulty (x axis)
• We can fit a model and find how each algorithm performs
• We use smoothing splines
• Can visualize them
• No parameters to specify
Strengths and weaknesses
Strengths and weaknesses of algorithms
• If the curve is on top – the algorithm is strong in that region
• If the curve is at the bottom – the algorithm is weak in that region
AIRT
IRT
Question and student characteristics
Algorithm and dataset characteristics
Visualise strengths and weaknesses
Portfolio construction
AIRT performs well, when . . .
• The set of algorithms is diverse.
• Ties back to IRT basics
• IRT in education – If all the questions are equally discriminative and difficult, IRT doesn’t add much
• IRT is useful when we have a diverse set of questions and we want to know
  • Which questions are more discriminative
  • Which questions are difficult
AIRT
• Understand more about algorithms
• Anomalousness, consistency, difficulty limit
• Visualise the strengths and weaknesses of algorithms
• Select a portfolio of good algorithms
• Includes diagnostics
• R package airt (on CRAN)
• https://sevvandi.github.io/airt/
A different application of IRT
IRT to build an ensemble
● The previous work used performance values (marks)
● What about using original responses?
● Like survey questions
● Rosenberg's Self-Esteem Scale - I feel I am a person of worth (Strongly agree/Agree/Neutral/Disagree/Strongly disagree)
● No right or wrong answer
● Latent trait gives the person’s self esteem
● Latent trait uncovers the “hidden quality”
Unsupervised algorithms
● Instead of performance values, what if you have original responses?
        Q 1   Q 2   Q 3   Q 4
Stu 1   0.95  0.87  0.67  0.84
Stu 2   0.57  0.49  0.78  0.77
Stu n   0.75  0.86  0.57  0.45
$\text{latent value}(i) = -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$
What is an anomaly detection ensemble?
Dataset → Unsupervised AD methods → AD ensemble → Ensemble score
• The AD methods are heterogeneous methods
• Ensembles – use existing methods to come up with better anomaly detection/scores
Anomaly detection ensembles
● Latent trait = anomalousness of the observations = the ensemble score
● The ensemble score is a weighted score of the original responses
● $\text{ensemble score of observation}(i) = -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$
IRT Ensemble
$Y_{N \times n}$ → IRT ensemble → Ensemble score
IRT for anomaly detection ensembles
• R package – outlierensembles – on CRAN
IRT and algorithms
● Algorithm evaluation - AIRT
● Anomaly detection ensembles
● IRT used in ML more
New AIRT parameters
● $\gamma_j^{(t+1)} = \dfrac{V(\mu_i^{(t)}) + \sigma^{(t)2}}{C_j(z_{ij}, \mu_i^{(t)})}$
● $\beta_j^{(t+1)} = M(\mu_i^{(t)}) - \gamma_j^{(t+1)} M_j(z_{ij})$
● $\alpha_j^{(t+1)} = \left( \gamma_j^{(t+1)2} V_j(z_{ij}) - V(\mu_i^{(t)}) - \sigma^{(t)2} \right)^{-1/2}$
● Where M, V and C denote mean, variance, and covariance terms
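Explaining notation
● $M_j(z_{ij}) = \dfrac{\sum_i z_{ij}}{N}$
● $M(\mu_i^{(t)}) = \dfrac{\sum_i \mu_i^{(t)}}{N}$
● $V_j(z_{ij}) = \dfrac{\sum_i z_{ij}^2}{N} - M_j(z_{ij})^2$
● $V(\mu_i^{(t)}) = \dfrac{\sum_i \mu_i^{(t)2}}{N} - M(\mu_i^{(t)})^2$
● $C_j(z_{ij}, \mu_i^{(t)}) = \dfrac{\sum_i z_{ij} \mu_i^{(t)}}{N} - M_j(z_{ij}) M(\mu_i^{(t)})$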
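The parameter updates and the mean, variance and covariance terms above translate directly into code. The following Python sketch is illustrative only: `m_step` is a hypothetical name, the posterior means $\mu_i^{(t)}$ and variance $\sigma^{(t)2}$ are taken as given inputs, and the explicit error for a non-positive term under the root is an added safeguard, not something stated in the talk.

```python
def mean(values):
    return sum(values) / len(values)


def m_step(z, mu, sigma2):
    """Update (alpha_j, beta_j, gamma_j) for every algorithm j.

    z[i][j]: score of algorithm j on dataset i (N datasets)
    mu[i]: posterior mean of theta_i at iteration t
    sigma2: posterior variance at iteration t
    """
    m_mu = mean(mu)                                  # M(mu)
    v_mu = mean([m * m for m in mu]) - m_mu**2       # V(mu)
    params = []
    for j in range(len(z[0])):
        col = [row[j] for row in z]
        m_z = mean(col)                              # M_j(z)
        v_z = mean([x * x for x in col]) - m_z**2    # V_j(z)
        c_zmu = mean([x * m for x, m in zip(col, mu)]) - m_z * m_mu  # C_j
        gamma = (v_mu + sigma2) / c_zmu
        beta = m_mu - gamma * m_z
        inner = gamma**2 * v_z - v_mu - sigma2
        if inner <= 0:
            # Added guard (assumption): the update has no real solution here.
            raise ValueError("alpha update undefined: non-positive term")
        alpha = inner ** -0.5
        params.append((alpha, beta, gamma))
    return params


# Made-up example: three datasets, one algorithm.
print(m_step(z=[[0.0], [1.0], [2.0]], mu=[0.0, 1.0, 2.0], sigma2=1.0 / 3.0))
```

For this toy input the update gives $\gamma = 1.5$, $\beta = -0.5$ and $\alpha = \sqrt{2}$, which can be checked by hand from the formulas.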
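More notation
● $p(\theta_i \mid \Lambda^{(t)}, z_i) = \mathcal{N}(\theta_i \mid \mu_i^{(t)}, \sigma^{(t)2})$
● $\Lambda^{(t)} = (\lambda_1^{(t)}, \ldots, \lambda_n^{(t)})$, $\lambda_j^{(t)} = (\alpha_j^{(t)}, \beta_j^{(t)}, \gamma_j^{(t)})^T$
● $t$ is the $t$th iteration
● $\sigma^{(t)2} = \left( \sum_j \alpha_j^{(t)2} + \sigma^{-2} \right)^{-1}$
● $\mu_i^{(t)} = \sigma^{(t)2} \left( \sum_j \alpha_j^{(t)2} (\beta_j^{(t)} + \gamma_j^{(t)} z_{ij}) + \mu \right)$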
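The posterior mean and variance of the latent trait can be computed from the current parameters as follows. This is a sketch under a stated assumption: the prior on $\theta_i$ is taken to be standard normal, so $\sigma^{-2} = 1$ and the prior mean term $\mu$ drops out. The function name `e_step` is hypothetical.

```python
def e_step(z, alpha, beta, gamma):
    """Posterior mean mu_i and variance sigma2 of theta_i.

    Assumes a N(0, 1) prior on theta_i (an assumption of this sketch).
    z[i][j]: score of algorithm j on dataset i
    alpha, beta, gamma: current per-algorithm parameters
    """
    precision = sum(a * a for a in alpha) + 1.0  # + prior precision of 1
    sigma2 = 1.0 / precision
    mu = [
        sigma2 * sum(
            a * a * (b + g * z_ij)
            for a, b, g, z_ij in zip(alpha, beta, gamma, row)
        )
        for row in z
    ]
    return mu, sigma2


# Made-up example: three datasets, one algorithm.
mu, sigma2 = e_step(z=[[0.0], [1.0], [2.0]], alpha=[1.0], beta=[0.0], gamma=[1.0])
print(mu, sigma2)
```

The variance is the same for every dataset because it depends only on the discrimination parameters, while the mean shifts with each dataset's scores.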
Algorithm portfolio selection
• Can use algorithm strengths to select a good portfolio of algorithms
• We call this portfolio the airt portfolio
• airt – Algorithm IRT (old Scottish word – to guide)
AIRT framework
Performance data
IRT Model
Algorithm and dataset metrics
Fitting smoothing splines
Algorithm strengths and weaknesses
Airt portfolio
Evaluate portfolios
Legend: General block, AIRT output
What happens to the IRT parameters?
• IRT - $\theta_i$ is the ability of student $i$
• As $\theta_i$ increases, the probability of a higher score increases
• What is $\theta_i$ in terms of a dataset?
• Easiness of the dataset
• Dataset difficulty score: $-\theta_i$
Discrimination parameter
• $\alpha_j$ - Discrimination of item
• As $\alpha_j$ increases, the slope of the curve increases
• What is $\alpha_j$ in terms of an algorithm?
• $\alpha_j$ - lack of stability/robustness of algo
• $\dfrac{1}{|\alpha_j|}$ - Consistency of algo
Consistent algorithms
• Education – such a question doesn’t give any information
• Algorithms – these algorithms are really stable or consistent
• $\text{Consistency} = \dfrac{1}{|\alpha_j|}$
Anomalous algorithms
• Algorithms that perform poorly on easy datasets and well on difficult datasets
• Negative discrimination
• In education – such items are discarded or revised
• If an algorithm is anomalous, it is interesting
• $\text{Anomalousness} = \text{sign}(\alpha_j)$
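The consistency and anomalousness definitions can be computed directly from a fitted discrimination parameter. The snippet below is a minimal sketch: the function names and the discrimination values are hypothetical and are not taken from the results table in the talk.

```python
def consistency(alpha_j):
    """Consistency of an algorithm: 1 / |alpha_j|."""
    return 1.0 / abs(alpha_j)


def is_anomalous(alpha_j):
    """An algorithm is flagged anomalous when its discrimination is negative."""
    return alpha_j < 0


# Made-up discrimination values for three hypothetical algorithms.
for name, alpha_j in [("algo_a", 0.5), ("algo_b", 4.0), ("algo_c", -2.0)]:
    print(name, consistency(alpha_j), is_anomalous(alpha_j))
```

A small $|\alpha_j|$ means a flat response curve, so performance barely changes with dataset difficulty and the consistency value is large.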

Explainable insights on algorithm performance

  • 1. Explainable insights on algorithm performance Sevvandi Kandanaarachchi Work with Kate Smith-Miles 1
  • 2. To explain or to predict? – Galit Shmueli ● Paper by Shmueli in 2010 ● Talks about these two topics ● Argues that explanation and prediction are different ● Two modelling paths – for predicting and explaining ● Social sciences have done explanatory models for a long time
  • 3. What is an explanation? • To explain an event is to provide some information about its causal history. – Lewis, 1986 (Causal Explanation) • A statement or an account that makes something clear – Google • It is important to note that the solution to explainable AI is not just ‘more AI’ - Miller, 2019 • Miller (2019) argues for Social Science + Computer Science in XAI In the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics. 3
  • 4. Message: Bring in the social scientists to the party! Integrate their methods! 4
  • 5. What is this talk about? ● Using a method in social sciences to do two ML type tasks ● Evaluate algorithms ● It gives us more meaningful metrics about algorithms ● Has some causal interpretations ● Visually inspect where algorithms perform well (and poorly) ● Ensembles ● Anomaly detection ensembles
  • 6. What is algorithm evaluation? • Performance of many algorithms to many problems • How do you explain the algorithm performance? • Standard statistical analysis misses many things Algo 1 Algo 2 Algo 3 Algo 4 Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 6
  • 7. We want to evaluate algorithms performance . . . . . . in a way that we understand algorithms and problems better! 7
  • 9. Item Response Theory (IRT) • Modelsusedinsocialsciences/psychometrics • Unobservablecharacteristicsandobserved outcomes • Verbalormathematicalability • Racialprejudiceor stressproneness • Politicalinclinations • Intrinsic“quality”thatcannotbemeasured directly 9 This Photo by Unknown Author is licensed under CC BY-SA
  • 10. IRT in education • Finds the discrimination and difficulty of test questions • And the ability of the test participants • By fitting an IRT model • In education – questions that can discriminate between students with different ability is preferred to “very difficult” questions. 10
  • 11. How it works 11 Questions Students Q 1 Q 2 Q 3 Q 4 Stu 1 0.95 0.87 0.67 0.84 Stu 2 0.57 0.49 0.78 0.77 Stu n 0.75 0.86 0.57 0.45 IRT Model Discrimination of questions Difficulty of questions Ability of students (latent trait) Matrix βj αj θi
  • 12. What does IRT give us? • Q1 - discrimination , difficulty • Q2 - discrimination , difficulty • Q3 - discrimination , difficulty • Q4 - discrimination , difficulty • Student 1 ability ︙ • Student n ability α1 β1 α2 β2 α3 β3 α4 β4 θ1 θn Q 1 Q 2 Q 3 Q 4 Stu 1 0.95 0.87 0.67 0.84 Stu 2 0.57 0.49 0.78 0.77 Stu n 0.75 0.86 0.57 0.45 12
  • 13. The causal understanding 𝛼 𝑗 𝛽 𝑗 𝜃 𝑖 𝑥 𝑖 𝑗 Discrimination of Q j Difficulty of Q j Ability of student i Marks of student i for Question j Student Question Marks 13
  • 14. Dichotomous IRT ● Multiple choice ● True or false ● ● - outcome/score of examinee for item ● - examinee’s ability ● - guessing parameter for item ● - difficulty parameter ● - discrimination 𝜙 ( 𝑥 𝑖 𝑗 = 1 𝜃 𝑖 , 𝛼 𝑗 , 𝑑 𝑗 , 𝛾 𝑗 ) = 𝛾 𝑗 + 1 − 𝛾 𝑗 1 + exp( − 𝛼 𝑗 ( 𝜃 𝑖 − 𝑑 𝑗 )) 𝑥 𝑖 𝑗 𝑖𝑗𝜃 𝑖 ( 𝑖 ) 𝛾 𝑗 𝑗𝑑 𝑗 𝛼 𝑗 This Photo by Unknown Author is licensed under CC BY-NC 14
  • 15. Continuous IRT • Grades out of 100 • A 2D surface of probabilities f (zj |θ) = αjγj 2π exp ( − α2 j 2 (θ − βj − γjzj) 2 ) 15
  • 16. Mapping algorithm evaluation to IRT IRT Model Students Test questions 16 Problems/datasets Algorithms
  • 17. The new mapping 𝛼 𝑗 𝛽 𝑗 𝜃 𝑖 𝑥 𝑖 𝑗 Discrimination of Algo j Difficulty of Algo j Ability of Dataset i Marks of dataset i for algorithm j Dataset Algorithm Performance 17
  • 18. Fitting the IRT model • Maximising the expectation • - discrimination parameter of algorithm • - scaling parameter for the algorithm • - difficulty parameter for the algorithm • - score of the algorithm on the dataset/problem • - prior probabilities Eθ|Λ(t),Z [ln p (Λ|θ, Z)] = N n ∑ j=1 (ln αj + ln γj) − 1 2 N ∑ i=1 n ∑ j=1 α2 j ((βj + γjzij − μ(t) i ) 2 + σ(t)2 ) + ln p (Λ) + const αj γj βj zij Λ
  • 19. New meaning of IRT parameters? ● Discrimination -> anomalousness, algorithm consistency ● Difficulty -> algorithm difficulty limit ● Student ability -> Dataset difficulty spectrum
  • 20. What can we say . . . • Consistent algorithms give similar performance for easy or hard datasets • Algorithms with higher difficulty limits can handle harder problems • Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
  • 21. Dataset difficulty ● Dataset $i$ difficulty $= -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$ ● Is a function of discrimination, difficulty and scores
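The dataset difficulty formula is a negated weighted average, with weights $\hat{\alpha}_j^2$. A minimal sketch of that arithmetic follows; the function name and all numbers are hypothetical, not fitted values from the talk.

```python
def dataset_difficulty(scores, alpha, beta, gamma):
    """Difficulty of one dataset from its scores z_ij and fitted algorithm parameters.

    scores[j] is the score of algorithm j on this dataset;
    alpha[j], beta[j], gamma[j] are the estimates for algorithm j.
    """
    weighted = sum(a * a * (b + g * z) for a, b, g, z in zip(alpha, beta, gamma, scores))
    total_weight = sum(a * a for a in alpha)
    return -weighted / total_weight

# Two hypothetical algorithms with made-up parameter estimates.
alpha_hat = [1.0, 2.0]
beta_hat = [0.0, 0.5]
gamma_hat = [1.0, 1.0]

# A dataset where both algorithms score well, and one where both score poorly.
easy = dataset_difficulty([0.9, 0.7], alpha_hat, beta_hat, gamma_hat)
hard = dataset_difficulty([0.3, 0.2], alpha_hat, beta_hat, gamma_hat)
```

With positive scaling parameters, lower scores give a larger difficulty value, and algorithms with larger $\hat{\alpha}_j^2$ have more say in the result.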
  • 22. Example – fitting IRT model • Anomaly detection algorithms • 8 anomaly detection algorithms • 3142 datasets • Our performance metric is AUROC (looking at the performance vs actual)
  • 23. Anomaly detection algorithms
      Algorithm          Consistency   Difficulty Limit   Anomalous
      Ensemble           55.08         -66.55             FALSE
      LOF                4.50          5.10               FALSE
      KNN                1.72          2.30               FALSE
      FAST_ABOD          9.39          10.23              FALSE
      Isolation Forest   2.35          3.05               FALSE
      KDEOS_1            0.86          -0.31              TRUE
      KDEOS-2            1.16          -0.51              TRUE
      LDF                2.01          2.08               FALSE
  • 24. Dataset difficulty spectrum • IRT student ability -> dataset difficulty
  • 25. Focusing on algorithm performance • We have algorithm performance (y axis) and problem difficulty (x axis) • We can fit a model and find how each algorithm performs • We use smoothing splines • Can visualize them • No parameters to specify
  • 27. Strengths and weaknesses of algorithms • If the curve is on top – the algorithm is strong in that region • If the curve is at the bottom – it is weak in that region
  • 29. AIRT performs well, when . . . • The set of algorithms is diverse. • Ties back to IRT basics • IRT in education – If all the questions are equally discriminative and difficult, IRT doesn’t add much • IRT is useful when we have a diverse set of questions and we want to know • Which questions are more discriminative • Which questions are difficult
  • 30. AIRT • Understand more about algorithms • Anomalousness, consistency, difficulty limit • Visualise the strengths and weaknesses of algorithms • Select a portfolio of good algorithms • Includes diagnostics • R package airt (on CRAN) • https://sevvandi.github.io/airt/
  • 32. IRT to build an ensemble ● The previous work used performance values (marks) ● What about using original responses? ● Like survey questions ● Rosenberg's Self-Esteem Scale - I feel I am a person of worth (Strongly agree/Agree/Neutral/Disagree/Strongly disagree) ● No right or wrong answer ● Latent trait gives the person’s self-esteem ● Latent trait uncovers the “hidden quality”
  • 33. Unsupervised algorithms ● Instead of performance values, what if you have original responses?
      Student   Q1     Q2     Q3     Q4
      Stu 1     0.95   0.87   0.67   0.84
      Stu 2     0.57   0.49   0.78   0.77
      Stu n     0.75   0.86   0.57   0.45
    ● Latent value $(i) = -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$
  • 34. What is an anomaly detection ensemble? • The AD methods are heterogeneous methods • Ensembles – use existing methods to come up with better anomaly detection/scores • Diagram: Dataset -> Unsupervised AD methods -> AD ensemble -> Ensemble score
  • 35. Anomaly detection ensembles ● Latent trait = anomalousness of the observations = the ensemble score ● Ensemble score of observation $(i) = -\dfrac{\sum_j \hat{\alpha}_j^2 (\hat{\beta}_j + \hat{\gamma}_j z_{ij})}{\sum_j \hat{\alpha}_j^2}$, a weighted score of original responses
  • 37. IRT for anomaly detection ensembles • R package – outlierensembles – on CRAN
  • 38. IRT and algorithms ● Algorithm evaluation - AIRT ● Anomaly detection ensembles ● IRT used in ML more
  • 40. New AIRT parameters ● $\gamma_j^{(t+1)} = \dfrac{V(\mu_i^{(t)}) + \sigma^{(t)2}}{C_j(z_{ij}, \mu_i^{(t)})}$ ● $\beta_j^{(t+1)} = M(\mu_i^{(t)}) - \gamma_j^{(t+1)} M_j(z_{ij})$ ● $\alpha_j^{(t+1)} = \left( \gamma_j^{(t+1)2} V_j(z_{ij}) - V(\mu_i^{(t)}) - \sigma^{(t)2} \right)^{-1/2}$ ● Where M, V and C denote mean, variance, and covariance terms
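  • 41. Explaining notation ● $M_j(z_{ij}) = \dfrac{\sum_i z_{ij}}{N}$ ● $M(\mu_i^{(t)}) = \dfrac{\sum_i \mu_i^{(t)}}{N}$ ● $V_j(z_{ij}) = \dfrac{\sum_i z_{ij}^2}{N} - M_j(z_{ij})^2$ ● $V(\mu_i^{(t)}) = \dfrac{\sum_i \mu_i^{(t)2}}{N} - M(\mu_i^{(t)})^2$ ● $C_j(z_{ij}, \mu_i^{(t)}) = \dfrac{\sum_i z_{ij} \mu_i^{(t)}}{N} - M_j(z_{ij}) M(\mu_i^{(t)})$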
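To make the parameter updates concrete, here is a minimal sketch of one update for a single algorithm $j$, built from the mean, variance and covariance terms above. It ignores the prior term $\ln p(\Lambda)$, and the function names and input values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def update_parameters(z_j, mu, sigma2):
    """One update of (alpha_j, beta_j, gamma_j) for a single algorithm j.

    z_j[i]: score of algorithm j on dataset i
    mu[i]: current posterior mean mu_i^(t) for dataset i
    sigma2: current posterior variance sigma^(t)2 (shared by all datasets)
    """
    m_z, m_mu = mean(z_j), mean(mu)
    v_z = mean([z * z for z in z_j]) - m_z**2                   # V_j(z_ij)
    v_mu = mean([m * m for m in mu]) - m_mu**2                  # V(mu_i)
    c = mean([z * m for z, m in zip(z_j, mu)]) - m_z * m_mu     # C_j(z_ij, mu_i)

    gamma = (v_mu + sigma2) / c
    beta = m_mu - gamma * m_z
    alpha = (gamma**2 * v_z - v_mu - sigma2) ** -0.5
    return alpha, beta, gamma

# Made-up scores and posterior means for four datasets.
z_j = [0.1, 0.4, 0.6, 0.9]
mu = [-1.0, -0.2, 0.3, 1.1]
sigma2 = 0.1
alpha_new, beta_new, gamma_new = update_parameters(z_j, mu, sigma2)

# At the update, 1 / alpha^2 equals the average expected squared residual.
avg_sq_resid = mean(
    [(beta_new + gamma_new * z - m) ** 2 + sigma2 for z, m in zip(z_j, mu)]
)
```

The sketch assumes the covariance term is non-zero. When it is negative, $\gamma_j$ comes out negative, meaning scores fall as the latent trait rises.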
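  • 42. More notation ● $p(\theta_i \mid \Lambda^{(t)}, z_i) = \mathcal{N}(\theta_i \mid \mu_i^{(t)}, \sigma^{(t)2})$ ● $\Lambda^{(t)} = (\lambda_1^{(t)}, \ldots, \lambda_n^{(t)})$, $\lambda_j^{(t)} = (\alpha_j^{(t)}, \beta_j^{(t)}, \gamma_j^{(t)})^T$ ● $t$ is the $t$th iteration ● $\sigma^{(t)2} = \left( \sum_j \alpha_j^{(t)2} + \sigma^{-2} \right)^{-1}$ ● $\mu_i^{(t)} = \sigma^{(t)2} \left( \sum_j \alpha_j^{(t)2} (\beta_j^{(t)} + \gamma_j^{(t)} z_{ij}) + \mu \sigma^{-2} \right)$, where $\mu$ and $\sigma^2$ are the prior mean and variance of $\theta_i$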
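The posterior mean and variance of the latent trait can be computed directly from the current parameters. The sketch below assumes a Gaussian prior on $\theta_i$ whose mean and variance are passed as the hypothetical arguments `prior_mean` and `prior_var`, with the prior entering the mean as `prior_mean / prior_var`. That prior term is an assumption made for this example, as are all names and values.

```python
def posterior_theta(Z, alpha, beta, gamma, prior_mean=0.0, prior_var=1.0):
    """Posterior mean mu_i and shared variance sigma2 of theta_i for every dataset i.

    Z[i][j] is the score of algorithm j on dataset i.
    Assumes a Gaussian prior N(prior_mean, prior_var) on each theta_i.
    """
    precision = sum(a * a for a in alpha) + 1.0 / prior_var
    sigma2 = 1.0 / precision
    mu = [
        sigma2 * (
            sum(a * a * (b + g * z) for a, b, g, z in zip(alpha, beta, gamma, row))
            + prior_mean / prior_var
        )
        for row in Z
    ]
    return mu, sigma2

# Two hypothetical datasets scored by two hypothetical algorithms.
Z = [[0.9, 0.7], [0.3, 0.2]]
mu_post, sigma2_post = posterior_theta(Z, alpha=[1.0, 2.0], beta=[0.0, 0.5], gamma=[1.0, 1.0])
```

The variance does not depend on the scores, so it is the same for every dataset. As the prior variance grows, the posterior mean approaches the weighted average $\sum_j \alpha_j^2 (\beta_j + \gamma_j z_{ij}) / \sum_j \alpha_j^2$.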
  • 43. Algorithm portfolio selection • Can use algorithm strengths to select a good portfolio of algorithms • We call this portfolio airt portfolio • airt – Algorithm IRT (old Scottish word – to guide)
  • 44. AIRT framework • Performance data -> IRT model -> Algorithm and dataset metrics -> Fitting smoothing splines -> Algorithm strengths and weaknesses -> Airt portfolio -> Evaluate portfolios • Legend: general block, AIRT output
  • 45. What happens to the IRT parameters? • IRT: $\theta_i$ - ability of student • As $\theta_i$ increases, the probability of a higher score increases • What is $\theta_i$ in terms of a dataset? • $\theta_i$ - easiness of the dataset • Dataset difficulty score: $-\theta_i$
  • 46. Discrimination parameter • $\alpha_j$ - discrimination of item • As $\alpha_j$ increases, the slope of the curve increases • What is $\alpha_j$ in terms of an algorithm? • $\alpha_j$ - lack of stability/robustness of algo • $\dfrac{1}{|\alpha_j|}$ - consistency of algo
  • 47. Consistent algorithms • Education – such a question doesn’t give any information • Algorithms – these algorithms are really stable or consistent • Consistency $= \dfrac{1}{|\alpha_j|}$
  • 48. Anomalous algorithms • Algorithms that perform poorly on easy datasets and well on difficult datasets • Negative discrimination • In education – such items are discarded or revised • If an algorithm is anomalous, it is interesting • Anomalousness $= \text{sign}(\alpha_j)$
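The two algorithm metrics derived from the discrimination parameter are simple to compute. In this sketch, consistency is $1/|\alpha_j|$ and an algorithm is flagged as anomalous when $\alpha_j$ is negative; the function name and the $\alpha_j$ values are hypothetical.

```python
def algorithm_traits(alpha):
    """Consistency and anomalous flag for one algorithm, from its discrimination alpha_j."""
    return {
        "consistency": 1.0 / abs(alpha),  # small |alpha| means a very consistent algorithm
        "anomalous": alpha < 0,           # negative discrimination
    }

# Hypothetical discrimination values for three algorithms.
steady = algorithm_traits(0.02)      # nearly flat curve: performs similarly everywhere
typical = algorithm_traits(0.5)
contrarian = algorithm_traits(-2.0)  # does better on hard datasets than on easy ones
```

A discrimination of exactly zero would make the consistency undefined, so a real implementation would need to guard against that case.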