Using educational models to explain algorithm evaluation
Sevvandi Kandanaarachchi
Work with Kate Smith-Miles
What is an explanation?
• “To explain an event is to provide some information about its causal history.” - Lewis, 1986 (Causal Explanation)
• “A statement or an account that makes something clear” - Google
• “It is important to note that the solution to explainable AI is not just ‘more AI’” - Miller, 2019
Explanation in the social sciences
• Miller (2019) argues for Social Science + Computer Science in XAI

“In the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics. For millennia, philosophers have asked the questions about what constitutes an explanation, what is the function of explanations, and what are their structure. For over 50 years, cognitive and social psychologists have analysed how people attribute and evaluate the social behaviour of others in physical environments. For over two decades, cognitive psychologists and scientists have investigated how people generate explanations and how they evaluate their quality.
I argue here that there is considerable scope to infuse this valuable body of research into explainable AI.”
Message: Bring in the social scientists to the party!
What is algorithm evaluation?
• Performance of many algorithms on many problems
• How do you explain the algorithm performance?
• Standard statistical analysis misses many things

            Algo 1   Algo 2   Algo 3   Algo 4
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
We want to evaluate algorithm performance in a way that helps us understand algorithms and problems better!
Item Response Theory (IRT)
• Models used in social sciences/psychometrics
• Unobservable characteristics and observed outcomes
  • Verbal or mathematical ability
  • Racial prejudice or stress proneness
  • Political inclinations
• Intrinsic “quality” that cannot be measured directly
IRT in education
• Finds the discrimination and difficulty of test questions
• And the ability of the test participants
• By fitting an IRT model
• In education, questions that can discriminate between students of different ability are preferred to “very difficult” questions.
How it works
Input: matrix $Y_{N \times n}$ of marks, students by questions

Students   Q 1    Q 2    Q 3    Q 4
Stu 1      0.95   0.87   0.67   0.84
Stu 2      0.57   0.49   0.78   0.77
Stu n      0.75   0.86   0.57   0.45

The IRT model gives:
• Discrimination of questions $\alpha_j$
• Difficulty of questions $\beta_j$
• Ability of students $\theta_i$ (latent trait)
What does IRT give us?
• Q1 - discrimination $\alpha_1$, difficulty $\beta_1$
• Q2 - discrimination $\alpha_2$, difficulty $\beta_2$
• Q3 - discrimination $\alpha_3$, difficulty $\beta_3$
• Q4 - discrimination $\alpha_4$, difficulty $\beta_4$
• Student 1 ability $\theta_1$
  ⋮
• Student n ability $\theta_n$
The causal understanding
• Question: discrimination $\alpha_j$ and difficulty $\beta_j$ of question $j$
• Student: ability $\theta_i$ of student $i$
• Marks: $x_{ij}$, the marks of student $i$ for question $j$
IRT in Data Science/Machine Learning
• Relatively new area of research
• Seminal paper
  • 2019 - Item response theory in AI: Analysing machine learning classifiers at the instance level - F. Martínez-Plumed et al.
Dichotomous IRT
• Multiple choice
• True or false
• $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \dfrac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$
• $x_{ij}$ - outcome/score of examinee $i$ for item $j$
• $\theta_i$ - ability of examinee $i$
• $\gamma_j$ - guessing parameter for item $j$
• $d_j$ - difficulty parameter
• $\alpha_j$ - discrimination
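The dichotomous model above can be evaluated directly. The following is a minimal sketch, not part of the original slides; the function name `prob_correct` and the item parameters are hypothetical values chosen only for illustration.

```python
import math


def prob_correct(theta, alpha, d, gamma):
    """Probability that an examinee with ability theta answers the item correctly.

    Implements gamma + (1 - gamma) / (1 + exp(-alpha * (theta - d))).
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Hypothetical item: discrimination 1.5, difficulty 0, guessing parameter 0.25.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(prob_correct(theta, alpha=1.5, d=0.0, gamma=0.25), 3))
```

When the ability equals the difficulty, the probability sits halfway between the guessing level and 1; for very low ability it approaches the guessing parameter.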
Continuous IRT
• Grades out of 100
• A 2D surface of probabilities $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$
[Figure: surface of $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$]
Continuous IRT
• $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$ at a fixed $\theta$
[Figure: $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$ at $\theta = 2$ and $\theta = -3$; the x axis is the normalized score]
Mapping algorithm evaluation to IRT
• IRT model: students and test questions
• Algorithm evaluation: problems/datasets and algorithms
New mapping
• Algorithms take the role of test questions
• Datasets/problems take the role of students
The new mapping
• Algorithm: discrimination $\alpha_j$ and difficulty $\beta_j$ of algorithm $j$
• Dataset: ability $\theta_i$ of dataset $i$
• Performance: $x_{ij}$, the marks of dataset $i$ for algorithm $j$
What happens to the IRT parameters?
• IRT: $\theta_i$ is the ability of student $i$
• As $\theta$ increases, the probability of a higher score increases
• What is $\theta_i$ in terms of a dataset?
• $\theta_i$: easiness of the dataset
• $\delta_i = -\theta_i$
• $\delta_i$: dataset difficulty score
Discrimination parameter
• Discrimination of item $j$ = $\alpha_j$
• As $\alpha_j$ increases, the slope of the curve increases
• What is $\alpha_j$ in terms of an algorithm?
• $\alpha_j$: lack of stability/robustness of the algorithm
• $1/|\alpha_j|$: consistency of the algorithm
Consistent algorithms
• Education: a question with very low discrimination doesn't give any information
• Algorithms: these algorithms are really stable or consistent
• Consistency = $1/|\alpha_j|$
Anomalous algorithms
• Algorithms that perform poorly on easy datasets and well on difficult datasets
• Negative discrimination
• In education, such items are discarded or revised
• If an algorithm is anomalous, it is interesting
• Anomalousness = sign($\alpha_j$)
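The parameter interpretations on the last few slides can be collected in a few lines. This is a minimal sketch under stated assumptions: the helper names and the fitted values are hypothetical, and "anomalous" is read here as a negative sign of $\alpha_j$, following the "negative discrimination" bullet.

```python
def algorithm_metrics(alpha):
    """Map a discrimination parameter alpha_j to the algorithm metrics.

    Assumes alpha is non-zero; consistency is undefined for alpha == 0.
    """
    return {
        "consistency": 1.0 / abs(alpha),  # 1/|alpha_j|
        "anomalous": alpha < 0,           # negative discrimination
    }


def dataset_difficulty(theta):
    """Dataset difficulty score delta_i = -theta_i."""
    return -theta


# Hypothetical fitted parameters, for illustration only.
alphas = {"algo_a": 0.5, "algo_b": -2.0}
thetas = [1.2, -0.4]

metrics = {name: algorithm_metrics(a) for name, a in alphas.items()}
difficulties = [dataset_difficulty(t) for t in thetas]
print(metrics)
print(difficulties)
```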
Fitting the IRT model
• Maximising the expectation
• $E = N \sum_j \left( \ln \alpha_j + \ln |\gamma_j| \right) - \frac{1}{2} \sum_i \sum_j \alpha_j^2 \left( \beta_j + \gamma_j z_{ij} - \mu_i^{t} \right)^2 + \dots$
After fitting the model, we get . . .
• Algorithm metrics
  • Consistency, difficulty limit, anomalousness indicator
• Dataset metrics
  • Difficulty scores
What can we say . . .
• Consistent algorithms give similar performance for easy or hard datasets
• Algorithms with higher difficulty limits can handle harder problems
• Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
Example - fitting IRT model
• Graph colouring algorithms (Smith-Miles et al., 2014)
• 8 graph colouring algorithms
  • RandomGreedy, DSATUR, Bktr, HillClimber, HEA, PartialCol, TabuCol, AntCol
• Performance data: how many colours each algorithm used to colour 6712 graphs
Graph colouring algorithms

Algorithm       Consistency   Difficulty Limit   Anomalous
Random Greedy   1.73          1.96               FALSE
DSATUR          0.57          1.79               FALSE
Bktr            0.32          1.97               FALSE
Hill Climber    0.68          3.18               FALSE
HEA             7.76          87.53              FALSE
PartialCol      2.25          19.21              FALSE
TabuCol         4.68          65.87              FALSE
AntCol          3.09          11.61              FALSE
Dataset difficulty spectrum
[Figure: difficulty scores of the graph colouring problems]
Focusing on algorithm performance
• We have algorithm performance (y axis) and problem difficulty (x axis)
• We can fit a model and find how each algorithm performs
• We use smoothing splines
  • Can visualize them
  • No parameters to specify
[Figure: the smoothing splines]
Strengths and weaknesses of algorithms
• If the curve is on top, the algorithm is strong in that region
• If the curve is at the bottom, the algorithm is weak in that region
• $h_{j^*}(\delta) = \max_j h_j(\delta)$ - this is the best performance for a given difficulty
• $j^*$ is the best algorithm for that difficulty
• $\text{strengths}(j, \epsilon) = \{ \delta : |h_j(\delta) - h_{j^*}(\delta)| \le \epsilon \}$
• We give $\epsilon$ leeway
• Weaknesses are similar
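The strengths definition can be sketched on a grid of difficulty values. This is an illustrative approximation, not the airt implementation: the function `strengths`, the grid, and the two linear curves are hypothetical stand-ins for the fitted smoothing splines $h_j(\delta)$.

```python
def strengths(curves, grid, eps):
    """For each algorithm, list the grid points where it is within eps of the best curve."""
    result = {name: [] for name in curves}
    for delta in grid:
        values = {name: h(delta) for name, h in curves.items()}
        best = max(values.values())  # h_{j*}(delta)
        for name, value in values.items():
            if abs(value - best) <= eps:
                result[name].append(delta)
    return result


# Hypothetical performance curves; in AIRT these would be smoothing splines.
curves = {
    "algo_a": lambda delta: 1.0 - 0.5 * delta,  # better on easy problems
    "algo_b": lambda delta: 0.5 + 0.5 * delta,  # better on hard problems
}
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
regions = strengths(curves, grid, eps=0.1)
print(regions)
```

Weaknesses could be sketched the same way by comparing each curve with the minimum over algorithms instead of the maximum.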
[Figure: strengths and weaknesses]
IRT
• Question and student characteristics
AIRT
• Algorithm and dataset characteristics
• Visualize strengths and weaknesses
• Portfolio construction
AIRT performs well, when . . .
• The set of algorithms is diverse.
• Ties back to IRT basics
• IRT in education: if all the questions are equally discriminative and difficult, IRT doesn't add much
• IRT is useful when we have a diverse set of questions and we want to know
  • Which questions are more discriminative
  • Which questions are difficult
Summary
• Understand more about algorithms
  • Anomalousness, consistency, difficulty limit
• Visualise the strengths and weaknesses of algorithms
• Select a portfolio of good algorithms
• Includes diagnostics
• R package airt (on CRAN)
  • https://sevvandi.github.io/airt/
• Pre-print: http://bit.ly/algorithmirt
  • Comprehensive Algorithm Portfolio Evaluation using Item Response Theory
  • More applications included
Algorithm portfolio selection
• Can use algorithm strengths to select a good portfolio of algorithms
• We call this portfolio the airt portfolio
• airt - Algorithm IRT (old Scottish word - to guide)
• airt portfolio($\epsilon$) = set of algorithms with strengths($\epsilon$)
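One way to read the portfolio definition is "keep every algorithm whose strengths region is non-empty for the chosen $\epsilon$". The sketch below takes that reading as an assumption; the function name and the strength regions are hypothetical.

```python
def airt_portfolio(strength_regions):
    """Keep the algorithms whose strength region is non-empty."""
    return sorted(name for name, region in strength_regions.items() if region)


# Hypothetical strength regions: difficulty values where each algorithm is near-best.
strength_regions = {
    "algo_a": [0.0, 0.25, 0.5],
    "algo_b": [0.5, 0.75, 1.0],
    "algo_c": [],  # never within epsilon of the best curve
}
portfolio = airt_portfolio(strength_regions)
print(portfolio)
```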
Evaluating this portfolio
• Let $i$ denote a problem, $P$ a portfolio of algorithms, $F$ the full set of algorithms
• Performance deterioration$(i, P)$ = best performance$(i, F)$ - best performance$(i, P)$
[Figure: graph colouring example]
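The deterioration measure is a difference of two maxima. A minimal sketch follows, assuming higher performance values are better; the function name and the numbers are hypothetical.

```python
def performance_deterioration(performance, portfolio):
    """Best performance over the full set F minus best performance over the portfolio P."""
    best_full = max(performance.values())
    best_portfolio = max(performance[name] for name in portfolio)
    return best_full - best_portfolio


# Hypothetical performance of each algorithm on a single problem i.
performance_i = {"algo_a": 0.75, "algo_b": 0.5, "algo_c": 0.875}
loss = performance_deterioration(performance_i, ["algo_a", "algo_b"])
print(loss)
```

A deterioration of zero means the portfolio contains an algorithm that matches the best of the full set on that problem.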
AIRT framework
• Performance data
• IRT model
• Algorithm and dataset metrics
• Fitting smoothing splines
• Algorithm strengths and weaknesses
• Airt portfolio
• Evaluate portfolios
Legend: general block, AIRT output
An example from the ASlib data repository
• SAT11-individual example
• Example where airt doesn't do so well
• The curves are quite similar to each other
• Reason to believe SAT11 has pre-selected algorithms

Explainable algorithm evaluation.pptx

  • 1. Using educational models to explain algorithm evaluation Sevvandi Kandanaarachchi Work with Kate Smith-Miles 1
  • 2. What is an explanation? โ€ข โ€œTo explain an event is to provide some information about its causal history.โ€ โ€“ Lewis , 1986 (Causal Explanation) โ€ข โ€œA statement or an account that makes something clearโ€ โ€“ Google โ€ข โ€œIt is important to note that the solution to explainable AI is not just โ€˜more AIโ€™ โ€œ - Miller, 2019 2
  • 3. Explanation in the social sciences โ€ข Miller (2019) argues for Social Science + Computer Science in XAI โ€œIn the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics. For millennia, philosophers have asked the questions about what constitutes an explanation, what is the function of explanations, and what are their structure. For over 50 years, cognitive and social psychologists have analysed how people attribute and evaluate the social behaviour of others in physical environments. For over two decades, cognitive psychologists and scientists have investigated how people generate explanations and how they evaluate their quality. I argue here that there is considerable scope to infuse this valuable body of research into explainable AI.โ€ 3
  • 4. Message: Bring in the social scientists to the party! 4
  • 5. What is algorithm evaluation? โ€ข Performance of many algorithms to many problems โ€ข How do you explain the algorithm performance? โ€ข Standard statistical analysis misses many things 5 Algo 1 Algo 2 Algo 3 Algo 4 Problem 1 Problem 2 Problem 3 Problem 4 Problem 5
  • 6. We want to evaluate algorithms performance in a way that we understand algorithms and problems better! 6
  • 8. Item Response Theory (IRT) โ€ข Models used in social sciences/psychometrics โ€ข Unobservable characteristics and observed outcomes โ€ข Verbal or mathematical ability โ€ข Racial prejudice or stress proneness โ€ข Political inclinations โ€ข Intrinsic โ€œqualityโ€ that cannot be measured directly This Photo by Unknown Author is licensed under CC BY-SA
  • 9. IRT in education โ€ข Finds the discrimination and difficulty of test questions โ€ข And the ability of the test participants โ€ข By fitting an IRT model โ€ข In education โ€“ questions that can discriminate between students with different ability is preferred to โ€œvery difficultโ€ questions. 9
  • 10. How it works • Input: matrix $Y_{N \times n}$ of marks, students by questions:
             Q 1    Q 2    Q 3    Q 4
      Stu 1  0.95   0.87   0.67   0.84
      Stu 2  0.57   0.49   0.78   0.77
      Stu n  0.75   0.86   0.57   0.45
    • IRT model • Output: discrimination of questions $\alpha_j$, difficulty of questions $\beta_j$, ability of students $\theta_i$ (latent trait)
  • 11. What does IRT give us? • From the marks matrix of slide 10: • Q1: discrimination $\alpha_1$, difficulty $\beta_1$ • Q2: discrimination $\alpha_2$, difficulty $\beta_2$ • Q3: discrimination $\alpha_3$, difficulty $\beta_3$ • Q4: discrimination $\alpha_4$, difficulty $\beta_4$ • Student 1: ability $\theta_1$ • ⋮ • Student n: ability $\theta_n$
  • 12. The causal understanding • Question: discrimination of Q j ($\alpha_j$) and difficulty of Q j ($\beta_j$) • Student: ability of student i ($\theta_i$) • Marks: marks of student i for question j ($x_{ij}$)
  • 13. IRT in Data Science/Machine Learning • Relatively new area of research • Seminal paper: F. Martínez-Plumed et al. (2019), Item response theory in AI: Analysing machine learning classifiers at the instance level
  • 14. Dichotomous IRT • Multiple choice • True or false • $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \frac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$ • $x_{ij}$: outcome/score of examinee $i$ for item $j$ • $\theta_i$: ability of examinee $i$ • $\gamma_j$: guessing parameter for item $j$ • $d_j$: difficulty parameter • $\alpha_j$: discrimination
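The dichotomous model on this slide can be evaluated directly. The following is a minimal sketch, not code from the talk or from any package; the function name `three_pl` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def three_pl(theta, alpha, d, gamma):
    """Probability of a correct response for ability theta on an item with
    discrimination alpha, difficulty d and guessing parameter gamma."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Hypothetical item: alpha = 1.5, d = 0.0, gamma = 0.2.
# When theta equals d the logistic term is 0.5, so p = gamma + (1 - gamma) / 2.
p_mid = three_pl(theta=0.0, alpha=1.5, d=0.0, gamma=0.2)
p_low = three_pl(theta=-10.0, alpha=1.5, d=0.0, gamma=0.2)   # close to gamma
p_high = three_pl(theta=10.0, alpha=1.5, d=0.0, gamma=0.2)   # close to 1
```

For a very weak examinee the probability approaches the guessing parameter, and for a very strong one it approaches 1.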
  • 15. Continuous IRT • Grades out of 100 • A 2D surface of probabilities $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$
  • 16. Continuous IRT • $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$ at a fixed $\theta$ • (Figure: curves of $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$ for $\theta = 2$ and $\theta = -3$; the x axis is the normalized score)
  • 17. Mapping algorithm evaluation to IRT • IRT model: students, test questions • Algorithm evaluation: problems/datasets, algorithms
  • 19. The new mapping • Algorithm: discrimination of Algo j ($\alpha_j$) and difficulty of Algo j ($\beta_j$) • Dataset: ability of dataset i ($\theta_i$) • Performance: marks of dataset i for algorithm j ($x_{ij}$)
  • 20. What happens to the IRT parameters? • In IRT, $\theta_i$ is the ability of student $i$ • As $\theta$ increases, the probability of a higher score increases • What is $\theta_i$ in terms of a dataset? • $\theta_i$: easiness of the dataset • $\delta_i = -\theta_i$: dataset difficulty score
  • 21. Discrimination parameter • Discrimination of item $j$ = $\alpha_j$ • As $\alpha_j$ increases, the slope of the curve increases • What is $\alpha_j$ in terms of an algorithm? • $\alpha_j$: lack of stability/robustness of the algorithm • $1/|\alpha_j|$: consistency of the algorithm
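The link between $\alpha_j$ and the slope can be made explicit. As a worked illustration, assuming the dichotomous model of slide 14 (the continuous model used later is not derived here), write $\sigma(u) = 1/(1 + e^{-u})$ and differentiate with respect to $\theta_i$:

```latex
\phi(\theta_i) = \gamma_j + (1 - \gamma_j)\,\sigma\big(\alpha_j(\theta_i - d_j)\big),
\qquad
\frac{\partial \phi}{\partial \theta_i}
  = (1 - \gamma_j)\,\alpha_j\,\sigma(u)\,\big(1 - \sigma(u)\big),
\quad u = \alpha_j(\theta_i - d_j).

\text{At } \theta_i = d_j:\quad \sigma(0) = \tfrac{1}{2}
\;\Rightarrow\;
\left.\frac{\partial \phi}{\partial \theta_i}\right|_{\theta_i = d_j}
  = \frac{\alpha_j (1 - \gamma_j)}{4}.
```

So under this model the slope at the item's difficulty grows linearly with $\alpha_j$, and a value of $\alpha_j$ near zero gives an almost flat curve.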
  • 22. Consistent algorithms • Education – such a question doesn’t give any information • Algorithms – these algorithms are really stable or consistent • Consistency = $1/|\alpha_j|$
  • 23. Anomalous algorithms • Algorithms that perform poorly on easy datasets and well on difficult datasets • Negative discrimination • In education – such items are discarded or revised • If an algorithm is anomalous, it is interesting • Anomalousness = $\mathrm{sign}(\alpha_j)$
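The reinterpreted parameters from slides 20 to 23 reduce to simple transformations of fitted values. The sketch below assumes the IRT parameters have already been estimated; the function names and the example values are hypothetical and are not the interface of the airt package. Anomalousness is expressed here as a boolean flag for negative discrimination, which is one way of reading the sign of $\alpha_j$.

```python
import math


def dataset_difficulty(theta):
    """Dataset difficulty score: delta_i = -theta_i."""
    return -theta


def consistency(alpha):
    """Consistency of an algorithm: 1 / |alpha_j| (infinite for a flat curve)."""
    return math.inf if alpha == 0 else 1.0 / abs(alpha)


def is_anomalous(alpha):
    """Flag negative discrimination (assumed reading of sign(alpha_j))."""
    return alpha < 0


# Hypothetical fitted discrimination values, for illustration only.
alphas = {"algo_a": 2.0, "algo_b": 0.25, "algo_c": -0.5}
metrics = {
    name: {"consistency": consistency(a), "anomalous": is_anomalous(a)}
    for name, a in alphas.items()
}
difficulty = dataset_difficulty(1.3)  # an easy dataset gets a negative difficulty
```

Here `algo_b` has the smallest $|\alpha_j|$ and therefore the highest consistency, while `algo_c` is the only one flagged as anomalous.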
  • 24. Fitting the IRT model • Maximising the expectation • $E = N \sum_j (\ln \alpha_j + \ln |\gamma_j|) - \frac{1}{2} \sum_i \sum_j \alpha_j^2 (\beta_j + \gamma_j z_{ij} - \mu_i^{(t)})^2 + \dots$
  • 25. After fitting the model, we get . . . • Algorithm metrics: consistency, difficulty limit, anomalousness indicator • Dataset metrics: difficulty scores
  • 26. What can we say . . . • Consistent algorithms give similar performance for easy or hard datasets • Algorithms with higher difficulty limits can handle harder problems • Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
  • 27. Example – fitting IRT model • Graph colouring algorithms (Smith-Miles et al. 2014) • 8 graph colouring algorithms: RandomGreedy, DSATUR, Bktr, HillClimber, HEA, PartialCol, TabuCol, AntCol • Performance measure: how many colours each algorithm used to colour 6712 graphs
  • 28. Graph colouring algorithms
      Algorithm       Consistency   Difficulty Limit   Anomalous
      Random Greedy   1.73          1.96               FALSE
      DSATUR          0.57          1.79               FALSE
      Bktr            0.32          1.97               FALSE
      Hill Climber    0.68          3.18               FALSE
      HEA             7.76          87.53              FALSE
      PartialCol      2.25          19.21              FALSE
      TabuCol         4.68          65.87              FALSE
      AntCol          3.09          11.61              FALSE
  • 30. Difficulty scores of the graph colouring problems
  • 31. Focusing on algorithm performance • We have algorithm performance (y axis) and problem difficulty (x axis) • We can fit a model and find how each algorithm performs • We use smoothing splines • Can visualize them • No parameters to specify
  • 33. Strengths and weaknesses of algorithms • If the curve is on top, the algorithm is strong in that region • If the curve is at the bottom, it is weak in that region • $h_{j^*}(\delta) = \max_j h_j(\delta)$ is the best performance for a given difficulty • $j^*$ is the best algorithm for that difficulty • $\mathrm{strengths}(j, \epsilon) = \{\delta : |h_j(\delta) - h_{j^*}(\delta)| \le \epsilon\}$ • We give $\epsilon$ leeway • Weaknesses are similar
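The strengths definition can be sketched on a discrete difficulty grid. This is only an illustration under stated assumptions: the curves $h_j$ are taken as already fitted and evaluated on a shared grid, higher values mean better performance, and the function name `strengths`, the algorithm names and the numbers are all hypothetical.

```python
def strengths(curves, eps):
    """For each algorithm, return the grid indices where its curve is within
    eps of the best curve at that difficulty."""
    names = list(curves)
    n_points = len(next(iter(curves.values())))
    result = {name: [] for name in names}
    for k in range(n_points):
        best = max(curves[name][k] for name in names)  # h_{j*}(delta_k)
        for name in names:
            if abs(curves[name][k] - best) <= eps:
                result[name].append(k)
    return result


# Hypothetical curves h_j evaluated on a shared difficulty grid.
delta_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = {
    "algo_a": [0.90, 0.85, 0.70, 0.40, 0.20],
    "algo_b": [0.60, 0.70, 0.72, 0.65, 0.50],
}
strong = {
    name: [delta_grid[k] for k in idx]
    for name, idx in strengths(curves, eps=0.05).items()
}
```

With these numbers `algo_a` is strong on the easier part of the grid and `algo_b` on the harder part, and both count as strong at $\delta = 0$ because they are within the leeway $\epsilon$ of each other there.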
  • 36. AIRT performs well when . . . • The set of algorithms is diverse • Ties back to IRT basics • IRT in education – if all the questions are equally discriminative and difficult, IRT doesn’t add much • IRT is useful when we have a diverse set of questions and we want to know which questions are more discriminative and which questions are difficult
  • 37. Summary • Understand more about algorithms: anomalousness, consistency, difficulty limit • Visualise the strengths and weaknesses of algorithms • Select a portfolio of good algorithms • Includes diagnostics • R package airt (on CRAN): https://sevvandi.github.io/airt/ • Pre-print: http://bit.ly/algorithmirt (Comprehensive Algorithm Portfolio Evaluation using Item Response Theory) • More applications included
  • 39. Algorithm portfolio selection • Can use algorithm strengths to select a good portfolio of algorithms • We call this portfolio the airt portfolio • airt – Algorithm IRT (old Scottish word – to guide) • $\text{airt portfolio}(\epsilon)$ = set of algorithms with $\mathrm{strengths}(\epsilon)$
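Read literally, the portfolio is the set of algorithms whose strength region is non-empty for the chosen $\epsilon$. The sketch below assumes the strength regions have already been computed; the function name `airt_portfolio` and the example regions are hypothetical and do not reproduce the airt package interface.

```python
def airt_portfolio(strength_regions):
    """Return the algorithms that have a non-empty strength region."""
    return sorted(name for name, region in strength_regions.items() if region)


# Hypothetical strength regions (difficulty values) for a fixed epsilon.
regions = {
    "algo_a": [-2.0, -1.0, 0.0],
    "algo_b": [0.0, 1.0, 2.0],
    "algo_c": [],  # never within epsilon of the best curve
}
portfolio = airt_portfolio(regions)
```

A larger $\epsilon$ widens every strength region, so it can only keep or enlarge the portfolio.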
  • 40. Evaluating this portfolio • Let $i$ denote a problem, $P$ a portfolio of algorithms, $F$ the full set of algorithms • $\text{Performance deterioration}(i, P) = \text{best performance}(i, F) - \text{best performance}(i, P)$ • (Figure: graph colouring example)
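The deterioration measure can be computed per problem from a table of performance values. This is a minimal sketch assuming higher values mean better performance; the function name, algorithm names and numbers are hypothetical.

```python
def performance_deterioration(perf_row, portfolio):
    """Best performance over the full set F minus best over the portfolio P,
    for one problem. perf_row maps algorithm name to performance."""
    best_full = max(perf_row.values())
    best_portfolio = max(perf_row[name] for name in portfolio)
    return best_full - best_portfolio


# Hypothetical performance values for two problems and three algorithms.
perf = {
    "problem_1": {"algo_a": 0.9, "algo_b": 0.6, "algo_c": 0.7},
    "problem_2": {"algo_a": 0.4, "algo_b": 0.8, "algo_c": 0.5},
}
portfolio = {"algo_a", "algo_c"}
deficits = {
    problem: performance_deterioration(row, portfolio)
    for problem, row in perf.items()
}
```

The deterioration is zero whenever the portfolio contains the best algorithm for that problem, as for `problem_1`, and positive otherwise, as for `problem_2` where `algo_b` was left out.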
  • 41. AIRT framework • Performance data → IRT model → Algorithm and dataset metrics → Fitting smoothing splines → Algorithm strengths and weaknesses → Airt portfolio → Evaluate portfolios • (Legend: general block, AIRT output)
  • 42. An example from ASlib data repository
  • 43. SAT11-individual example • Example where airt doesn’t do so well • The curves are quite similar to each other • Reason to believe SAT11 has pre-selected algorithms