Explainable algorithm evaluation from lessons in education
Sevvandi Kandanaarachchi
Work with Kate Smith-Miles
MODSIM 2023
What is an explanation?
• “To explain an event is to provide some information about its causal history.” – Lewis, 1986 (Causal Explanation)
• “A statement or an account that makes something clear” – Google
• “It is important to note that the solution to explainable AI is not just ‘more AI’.” – Miller, 2019
Explanation in the social sciences
• Miller (2019) argues for Social Science + Computer Science in XAI
“In the fields of philosophy, cognitive psychology/science, and social
psychology, there is a vast and mature body of work that studies these
exact topics. For millennia, philosophers have asked the questions
about what constitutes an explanation, what is the function of
explanations, and what are their structure. For over 50 years, cognitive
and social psychologists have analysed how people attribute and
evaluate the social behaviour of others in physical environments. For
over two decades, cognitive psychologists and scientists have
investigated how people generate explanations and how they evaluate
their quality.
I argue here that there is considerable scope to infuse this valuable
body of research into explainable AI.”
Our contribution: bring methods from the social sciences to algorithm evaluation
Evaluate algorithm performance in a way that helps us understand algorithms and problems better!
What is algorithm evaluation?
• Performance of many algorithms on many problems
• How do you explain the algorithm performance?
• Standard statistical analysis misses many things
[Table: performance matrix with problems (Problem 1 to Problem 5) as rows and algorithms (Algo 1 to Algo 4) as columns]
Item Response Theory (IRT)
• Models used in social sciences/psychometrics
• Unobservable characteristics and observed outcomes
  • Verbal or mathematical ability
  • Racial prejudice or stress proneness
  • Political inclinations
• Intrinsic “quality” that cannot be measured directly
IRT in education
• Finds the discrimination and difficulty of test questions
• And the ability of the test participants
• By fitting an IRT model
• In education – questions that can discriminate between students of different ability are preferred to “very difficult” questions.
How it works
Input: matrix $Y_{N \times n}$ of marks (students × questions)

Students  Q 1   Q 2   Q 3   Q 4
Stu 1     0.95  0.87  0.67  0.84
Stu 2     0.57  0.49  0.78  0.77
Stu n     0.75  0.86  0.57  0.45

Fitting the IRT model to this matrix finds:
• Discrimination of questions $\alpha_j$
• Difficulty of questions $\beta_j$
• Ability of students $\theta_i$ (latent trait)
The causal understanding in traditional IRT
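[Diagram: the question parameters $\alpha_j$ (discrimination of question j) and $\beta_j$ (difficulty of question j), together with $\theta_i$ (ability of student i), determine $x_{ij}$ (marks of student i for question j). Student + Question -> Marks]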
The mapping to algorithm evaluation
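[Diagram: the algorithm parameters $\alpha_j$ (discrimination of algorithm j) and $\beta_j$ (difficulty of algorithm j), together with $\theta_i$ (ability of dataset i), determine $x_{ij}$ (marks of dataset i for algorithm j). Dataset + Algorithm -> Performance]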
After fitting the model, we get . . .
• Algorithm metrics
  • Discrimination parameter -> Consistency, anomalousness indicator
  • Difficulty parameter -> Difficulty limit of algorithm
• Dataset metrics
  • Ability scores -> Difficulty of datasets
• AIRT – Algorithmic IRT
  • Also a Scottish word meaning “to guide”
What can we say . . .
• Consistent algorithms give similar performance for easy or hard datasets
• Algorithms with higher difficulty limits can handle harder problems
• Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
• Algorithms perform worse on difficult datasets than on easier datasets
• General algorithm evaluation only identifies the best algorithm, or the best 5 algorithms
Anomaly detection algorithms

Algorithm         Consistency  Difficulty Limit  Anomalous
Ensemble          55.08        -66.55            FALSE
LOF               4.50         5.10              FALSE
KNN               1.72         2.30              FALSE
FAST_ABOD         9.39         10.23             FALSE
Isolation Forest  2.35         3.05              FALSE
KDEOS_1           0.86         -0.31             TRUE
KDEOS-2           1.16         -0.51             TRUE
LDF               2.01         2.08              FALSE
Dataset difficulty spectrum
IRT student ability -> dataset difficulty
Focusing on algorithm performance
• We have algorithm performance (y axis) and problem difficulty (x axis)
• We can fit a model and find how each algorithm performs
• We use smoothing splines
• Can visualize them
• No parameters to specify
Strengths and weaknesses
AIRT framework
[Flow diagram, blocks as listed on the slide: Performance data; IRT model; Algorithm and dataset metrics; Fitting smoothing splines; Algorithm strengths and weaknesses; airt portfolio; Evaluate portfolios. Legend: general block, AIRT output.]
[Diagram: IRT gives question and student characteristics. AIRT gives algorithm and dataset characteristics, and adds visualization of strengths and weaknesses and portfolio construction.]
AIRT performs well, when . . .
• The set of algorithms is diverse.
• Ties back to IRT basics
  • IRT in education – if all the questions are equally discriminative and difficult, IRT doesn’t add much
  • IRT is useful when we have a diverse set of questions and we want to know
    • Which questions are more discriminative
    • Which questions are difficult
Summary
• Understand more about algorithms
  • Anomalousness, consistency, difficulty limit
• Visualise the strengths and weaknesses of algorithms
• Select a portfolio of good algorithms
• Includes diagnostics
• R package airt on CRAN (uses EstCRM)
  • https://sevvandi.github.io/airt/
• Pre-print: http://bit.ly/algorithmirt
  • Comprehensive Algorithm Portfolio Evaluation using Item Response Theory
  • More applications included
We are hiring! – FairML Research
• CSIRO Postdoctoral Fellowship in Fairness Research in Machine Learning
• Salary Range: AU$92,624 to AU$101,459 pa
• plus up to 15.4% superannuation
• 3-year contract
• https://jobs.csiro.au/go/CERC-Postdoctoral-and-Engineering-Fellowships/7829300/
• Job will be advertised in August
IRT in Data Science/Machine Learning
• Relatively new area of research
• Seminal paper
  • 2019 - Item response theory in AI: Analysing machine learning classifiers at the instance level – F. Martínez-Plumed et al.
Dichotomous IRT
• Multiple choice
• True or false
• $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \dfrac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$
• $x_{ij}$ - outcome/score of examinee $i$ for item $j$
• $\theta_i$ - ability of examinee $i$
• $\gamma_j$ - guessing parameter for item $j$
• $d_j$ - difficulty parameter
• $\alpha_j$ - discrimination
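The dichotomous model above can be evaluated directly. The following is a minimal Python sketch of that formula only; the function name and the example item parameters are hypothetical and are not taken from the talk or from any IRT package.

```python
import math


def three_pl_probability(theta: float, alpha: float, d: float, gamma: float) -> float:
    """P(x_ij = 1) for ability theta, discrimination alpha,
    difficulty d and guessing parameter gamma."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Hypothetical item: discrimination 1.5, difficulty 0.0, guessing 0.25.
for theta in (-3.0, 0.0, 2.0):
    p = three_pl_probability(theta, alpha=1.5, d=0.0, gamma=0.25)
    print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

When the ability equals the difficulty, the probability sits halfway between the guessing level and 1. For very low ability it approaches the guessing level.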
Continuous IRT
• Grades out of 100
• A 2D surface of probabilities
• $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$
Continuous IRT
• $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$
• At a fixed $\theta$
[Figure: curves of $P(x_{ij} \mid \theta_i, d_j, \alpha_j)$ for $\theta = 2$ and $\theta = -3$; the x axis is the normalized score]
Fitting the IRT model
• Maximising the expectation
• $E = N \sum_j \left(\ln \alpha_j + \ln |\gamma_j|\right) - \frac{1}{2} \sum_i \sum_j \alpha_j^2 \left(\beta_j + \gamma_j z_{ij} - \mu_i^{(t)}\right)^2 + \cdots$
Mapping algorithm evaluation to IRT
[Diagram: in the IRT model, students correspond to problems/datasets and test questions correspond to algorithms]
What happens to the IRT parameters?
• IRT: $\theta_i$ is the ability of student $i$
• As $\theta$ increases, the probability of a higher score increases
• What is $\theta_i$ in terms of a dataset?
• $\theta_i$ is the easiness of the dataset
• $\delta_i = -\theta_i$
• $\delta_i$ is the dataset difficulty score
Discrimination parameter
• Discrimination of item $j$ = $\alpha_j$
• As $\alpha_j$ increases, the slope of the curve increases
• What is $\alpha_j$ in terms of an algorithm?
• $\alpha_j$ measures the lack of stability/robustness of the algorithm
• $1/|\alpha_j|$ is the consistency of the algorithm
Consistent algorithms
• Education – such a question (discrimination close to zero) doesn’t give any information
• Algorithms – these algorithms are really stable or consistent
• Consistency = $1/|\alpha_j|$
Anomalous algorithms
• Algorithms that perform poorly on easy datasets and well on difficult datasets
• Negative discrimination
• In education – such items are discarded or revised
• If an algorithm is anomalous, it is interesting
• Anomalousness = $\mathrm{sign}(\alpha_j)$
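The mappings from fitted IRT parameters to dataset difficulty, consistency and anomalousness can be sketched in a few lines of Python. This is an illustration of the formulas on these slides, not the airt package's code; the function names and the parameter values are hypothetical, and returning infinity for a zero discrimination is an assumption made here.

```python
import math


def dataset_difficulty(theta: float) -> float:
    """Dataset difficulty score: delta_i = -theta_i."""
    return -theta


def consistency(alpha: float) -> float:
    """Consistency = 1 / |alpha_j| (assumption: infinite when alpha_j is 0)."""
    return math.inf if alpha == 0 else 1.0 / abs(alpha)


def is_anomalous(alpha: float) -> bool:
    """An algorithm is flagged as anomalous when its discrimination is negative."""
    return alpha < 0


# Hypothetical fitted discrimination parameters for two algorithms.
alphas = {"algo_a": 0.5, "algo_b": -1.2}
for name, alpha in alphas.items():
    print(name, round(consistency(alpha), 3), is_anomalous(alpha))

# Hypothetical fitted ability (easiness) of one dataset.
print("difficulty:", dataset_difficulty(1.5))
```

A discrimination close to zero gives a large consistency value, and a negative discrimination sets the anomalous flag.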
Example – fitting IRT model
• Graph colouring algorithms (Smith-Miles et al., 2014)
• 8 graph colouring algorithms
  • RandomGreedy, DSATUR, Bktr, HillClimber, HEA, PartialCol, TabuCol, AntCol
• How many colours did each algorithm use to colour 6712 graphs?
Graph colouring algorithms

Algorithm      Consistency  Difficulty Limit  Anomalous
Random Greedy  1.73         1.96              FALSE
DSATUR         0.57         1.79              FALSE
Bktr           0.32         1.97              FALSE
Hill Climber   0.68         3.18              FALSE
HEA            7.76         87.53             FALSE
PartialCol     2.25         19.21             FALSE
TabuCol        4.68         65.87             FALSE
AntCol         3.09         11.61             FALSE
Dataset difficulty spectrum
Focusing on algorithm performance
• We have algorithm performance (y axis) and problem difficulty (x axis)
• We can fit a model and find how each algorithm performs
• We use smoothing splines
• Can visualize them
• No parameters to specify
The smoothing splines
Strengths and weaknesses of algorithms
• If the curve is on top – the algorithm is strong in that region
• If the curve is at the bottom – the algorithm is weak in that region
• $h_{j^*}(\delta) = \max_j h_j(\delta)$ - this is the best performance for a given difficulty, where $h_j$ is the curve of algorithm $j$
• $j^*$ is the best algorithm for that difficulty
• $\mathrm{strengths}(j, \epsilon) = \{\delta : |h_j(\delta) - h_{j^*}(\delta)| \le \epsilon\}$
• We give $\epsilon$ leeway
• Weaknesses are similar
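The strengths definition can be illustrated on a discrete grid of difficulty values. The Python sketch below assumes each curve $h_j$ has already been evaluated on a common grid; the grid, the curve values and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def best_curve(curves: Dict[str, List[float]]) -> List[float]:
    """h_{j*}(delta): the pointwise maximum over all algorithm curves."""
    return [max(values) for values in zip(*curves.values())]


def strengths(
    curves: Dict[str, List[float]], grid: List[float], epsilon: float
) -> Dict[str, List[float]]:
    """Grid points where each curve is within epsilon of the best curve."""
    best = best_curve(curves)
    return {
        name: [
            delta
            for delta, h, h_star in zip(grid, values, best)
            if abs(h - h_star) <= epsilon
        ]
        for name, values in curves.items()
    }


# Hypothetical difficulty grid and curve values for two algorithms.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = {
    "algo_a": [0.90, 0.80, 0.60, 0.40, 0.20],
    "algo_b": [0.70, 0.70, 0.62, 0.60, 0.50],
}
print(strengths(curves, grid, epsilon=0.05))
```

In this made-up example the first algorithm is strong on the easier end of the grid and the second on the harder end, with an overlap where the two curves are within the leeway of each other.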
Strengths and weaknesses
Algorithm portfolio selection
• Can use algorithm strengths to select a good portfolio of algorithms
• We call this portfolio the airt portfolio
• airt – Algorithm IRT (old Scottish word – to guide)
• $\text{airt portfolio}(\epsilon)$ = set of algorithms with $\mathrm{strengths}(\epsilon)$
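One way to read the portfolio definition is "keep every algorithm that has a strength somewhere on the difficulty spectrum". The Python sketch below follows that reading, which is an assumption rather than a statement of how the airt package implements it; curve values and names are hypothetical.

```python
from typing import Dict, List


def airt_portfolio(curves: Dict[str, List[float]], epsilon: float) -> List[str]:
    """Algorithms whose curve comes within epsilon of the best curve
    at one or more grid points (assumed reading of the slide)."""
    best = [max(values) for values in zip(*curves.values())]
    return sorted(
        name
        for name, values in curves.items()
        if any(abs(h - h_star) <= epsilon for h, h_star in zip(values, best))
    )


# Hypothetical curves on a shared difficulty grid; algo_c is never near the top.
curves = {
    "algo_a": [0.90, 0.80, 0.60, 0.40, 0.20],
    "algo_b": [0.70, 0.70, 0.62, 0.60, 0.50],
    "algo_c": [0.50, 0.50, 0.40, 0.30, 0.20],
}
print(airt_portfolio(curves, epsilon=0.05))
```

An algorithm that is dominated everywhere by more than the leeway, like the third one here, is left out of the portfolio.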
Evaluating this portfolio
• Let $i$ denote a problem, $P$ a portfolio of algorithms, $F$ the full set of algorithms
• $\text{performance deterioration}(i, P) = \text{best performance}(i, F) - \text{best performance}(i, P)$
Graph colouring example
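The performance deterioration of a portfolio on a single problem can be computed as in the Python sketch below. It assumes that higher performance values are better and that the full set of algorithms is every algorithm with a recorded value for the problem; the names and numbers are hypothetical.

```python
from typing import Dict, Iterable


def best_performance(row: Dict[str, float], algorithms: Iterable[str]) -> float:
    """Best (highest) performance on one problem among the given algorithms."""
    return max(row[name] for name in algorithms)


def performance_deterioration(row: Dict[str, float], portfolio: Iterable[str]) -> float:
    """best performance(i, F) - best performance(i, P) for one problem i."""
    return best_performance(row, row.keys()) - best_performance(row, portfolio)


# Hypothetical performance values of three algorithms on one problem.
row = {"algo_a": 0.9, "algo_b": 0.7, "algo_c": 0.5}
print(performance_deterioration(row, ["algo_b", "algo_c"]))
print(performance_deterioration(row, ["algo_a", "algo_c"]))
```

The deterioration is zero whenever the portfolio contains the best algorithm for that problem, and positive otherwise.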
An example from ASlib data repository
• SAT11-individual example
• Example where airt doesn’t do so well.
• The curves are quite similar to each other
• Reason to believe SAT11 has pre-selected algorithms

Explainable algorithm evaluation from lessons in education

  • 1. Explainable algorithm evaluation from lessons in education Sevvandi Kandanaarachchi Work with Kate Smith-Miles MODSIM 2023 1
  • 2. What is an explanation? • “To explain an event is to provide some information about its causal history.” – Lewis , 1986 (Causal Explanation) • “A statement or an account that makes something clear” – Google • “It is important to note that the solution to explainable AI is not just ‘more AI’ “ - Miller, 2019 2
  • 3. Explanation in the social sciences • Miller (2019) argues for Social Science + Computer Science in XAI “In the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics. For millennia, philosophers have asked the questions about what constitutes an explanation, what is the function of explanations, and what are their structure. For over 50 years, cognitive and social psychologists have analysed how people attribute and evaluate the social behaviour of others in physical environments. For over two decades, cognitive psychologists and scientists have investigated how people generate explanations and how they evaluate their quality. I argue here that there is considerable scope to infuse this valuable body of research into explainable AI.” 3
  • 4. Our contribution: bring methods from social sciences for algorithm evaluation Evaluate algorithm performance in a way that we understand algorithms and problems better! 4
  • 5. What is algorithm evaluation? • Performance of many algorithms to many problems • How do you explain the algorithm performance? • Standard statistical analysis misses many things 5 Algo 1 Algo 2 Algo 3 Algo 4 Problem 1 Problem 2 Problem 3 Problem 4 Problem 5
  • 6. Item Response Theory (IRT) • Models used in social sciences/psychometrics • Unobservable characteristics and observed outcomes • Verbal or mathematical ability • Racial prejudice or stress proneness • Political inclinations • Intrinsic “quality” that cannot be measured directly 6 This Photo by Unknown Author is licensed under CC BY-SA
  • 7. IRT in education • Finds the discrimination and difficulty of test questions • And the ability of the test participants • By fitting an IRT model • In education – questions that can discriminate between students with different ability is preferred to “very difficult” questions. 7
  • 8. How it works 8 Questions Students Q 1 Q 2 Q 3 Q 4 Stu 1 0.95 0.87 0.67 0.84 Stu 2 0.57 0.49 0.78 0.77 Stu n 0.75 0.86 0.57 0.45 IRT Model Discrimination of questions 𝛼𝑗 Difficulty of questions 𝛽𝑗 Ability of students 𝜃𝑖 (latent trait) Matrix 𝑌𝑁×𝑛 Finds the discrimination and difficulty of test questions
  • 9. The causal understanding in traditional IRT 9 𝛼𝑗 𝛽𝑗 𝜃𝑖 𝑥𝑖𝑗 Discrimination of Q j Difficulty of Q j Ability of student i Marks of student i for Question j Student Question Marks
  • 10. The mapping to algorithm evaluation 10 𝛼𝑗 𝛽𝑗 𝜃𝑖 𝑥𝑖𝑗 Discrimination of Algo j Difficulty of Algo j Ability of Dataset i Marks of dataset i for algorithm j Dataset Algorithm Performance
  • 11. After fitting the model, we get . . . • Algorithm metrics • Discrimination parameter -> Consistency, anomalousness indicator • Difficulty parameter -> Difficulty limit of algorithm • Dataset metrics • Ability scores -> Difficulty of datasets • AIRT – Algorithmic IRT • Scottish word – to guide 11
  • 12. What can we say . . . • Consistent algorithms give similar performance for easy or hard datasets • Algorithms with higher difficulty limits can handle harder problems • Anomalous algorithms give bad performance for easy problems and good performance for difficult problems • Difficult datasets give poorer performances to algorithms compared to easier datasets • General algorithm evaluation has best algorithm, or best 5 algorithms 12
  • 13. Anomaly detection algorithms • Algorithm | Consistency | Difficulty Limit | Anomalous • Ensemble | 55.08 | -66.55 | FALSE • LOF | 4.50 | 5.10 | FALSE • KNN | 1.72 | 2.30 | FALSE • FAST_ABOD | 9.39 | 10.23 | FALSE • Isolation Forest | 2.35 | 3.05 | FALSE • KDEOS_1 | 0.86 | -0.31 | TRUE • KDEOS-2 | 1.16 | -0.51 | TRUE • LDF | 2.01 | 2.08 | FALSE
  • 14. Dataset difficulty spectrum • IRT student ability -> dataset difficulty
  • 15. Focusing on algorithm performance • We have algorithm performance (y axis) and problem difficulty (x axis) • We can fit a model and find how each algorithm performs • We use smoothing splines • Can visualize them • No parameters to specify
  • 17. AIRT framework • Performance data • IRT model • Algorithm and dataset metrics • Fitting smoothing splines • Algorithm strengths and weaknesses • airt portfolio • Evaluate portfolios • Legend: general block, AIRT output
  • 19. AIRT performs well when . . . • The set of algorithms is diverse • This ties back to IRT basics • IRT in education – if all the questions are equally discriminative and difficult, IRT doesn’t add much • IRT is useful when we have a diverse set of questions and we want to know • Which questions are more discriminative • Which questions are difficult
  • 20. Summary • Understand more about algorithms • Anomalousness, consistency, difficulty limit • Visualise the strengths and weaknesses of algorithms • Select a portfolio of good algorithms • Includes diagnostics • R package airt on CRAN (uses EstCRM) • https://sevvandi.github.io/airt/ • Pre-print: http://bit.ly/algorithmirt • Comprehensive Algorithm Portfolio Evaluation using Item Response Theory • More applications included
  • 21. We are hiring! – FairML Research • CSIRO Postdoctoral Fellowship in Fairness Research in Machine Learning • Salary Range: AU$92,624 to AU$101,459 pa • plus up to 15.4% superannuation • 3-year contract • https://jobs.csiro.au/go/CERC-Postdoctoral-and-Engineering-Fellowships/7829300/ • Job will be advertised in August
  • 23. IRT in Data Science/Machine Learning • Relatively new area of research • Seminal paper • 2019 - Item response theory in AI: Analysing machine learning classifiers at the instance level – F. Martínez-Plumed et al.
  • 24. Dichotomous IRT • Multiple choice • True or false • $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \frac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$ • $x_{ij}$ - outcome/score of examinee $i$ for item $j$ • $\theta_i$ - ability of examinee $i$ • $\gamma_j$ - guessing parameter for item $j$ • $d_j$ - difficulty parameter • $\alpha_j$ - discrimination
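The dichotomous model on this slide can be evaluated directly. The following is a minimal sketch in Python, not code from the airt package; the function name `three_pl` and the parameter values are illustrative assumptions.

```python
import math


def three_pl(theta: float, alpha: float, d: float, gamma: float) -> float:
    """Probability that an examinee with ability theta answers an item correctly.

    alpha: discrimination, d: difficulty, gamma: guessing parameter.
    (Hypothetical helper name; the formula is the one on the slide.)
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# When ability equals difficulty, the logistic term is 1/2,
# so the probability is gamma + (1 - gamma) / 2.
p_mid = three_pl(theta=0.0, alpha=1.5, d=0.0, gamma=0.2)

# With positive discrimination, higher ability gives a higher probability.
p_low = three_pl(theta=-2.0, alpha=1.5, d=0.0, gamma=0.2)
p_high = three_pl(theta=2.0, alpha=1.5, d=0.0, gamma=0.2)
print(p_low, p_mid, p_high)
```

As ability decreases, the probability approaches the guessing parameter rather than zero, which is why the curve has a floor at $\gamma_j$.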
  • 25. Continuous IRT • Grades out of 100 • A 2D surface of probabilities • 𝑃(𝑥𝑖𝑗|𝜃𝑖, 𝑑𝑗, 𝛼𝑗)
  • 26. Continuous IRT • 𝑃(𝑥𝑖𝑗|𝜃𝑖, 𝑑𝑗, 𝛼𝑗) at a fixed 𝜃 • Curves shown for 𝜃 = 2 and 𝜃 = −3; the x axis is the normalized score
  • 27. Fitting the IRT model • Maximising the expectation • $E = N \sum_j (\ln \alpha_j + \ln |\gamma_j|) - \frac{1}{2} \sum_i \sum_j \alpha_j^2 (\beta_j + \gamma_j z_{ij} - \mu_i^{t})^2 + \dots$
  • 28. Mapping algorithm evaluation to IRT • Students -> problems/datasets • Test questions -> algorithms
  • 29. What happens to the IRT parameters? • IRT - 𝜃𝑖 - ability of student 𝑖 • As 𝜃 increases, the probability of a higher score increases • What is 𝜃𝑖, in terms of a dataset? • 𝜃𝑖 - easiness of the dataset • 𝛿𝑖 = −𝜃𝑖 • 𝛿𝑖 - dataset difficulty score
  • 30. Discrimination parameter • Discrimination of item 𝑗 = 𝛼𝑗 • 𝛼𝑗 increases → slope of curve increases • What is 𝛼𝑗, in terms of an algorithm? • 𝛼𝑗 - lack of stability/robustness of algo • 1/|𝛼𝑗| - consistency of algo
  • 31. Consistent algorithms • Discrimination 𝛼𝑗 close to zero gives a nearly flat curve • Education – such a question doesn’t give any information • Algorithms – these algorithms are really stable or consistent • Consistency = 1/|𝛼𝑗|
  • 32. Anomalous algorithms • Algorithms that perform poorly on easy datasets and well on difficult datasets • Negative discrimination • In education – such items are discarded or revised • If an algorithm is anomalous, it is interesting • Anomalousness = sign(𝛼𝑗)
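The reinterpreted parameters on the last few slides reduce to simple transformations of the fitted values. The sketch below assumes the IRT parameters have already been estimated elsewhere. The names `AlgoMetrics`, `algo_metrics` and `dataset_difficulty` are hypothetical and are not the airt package API, and reading "Anomalousness = sign(𝛼𝑗)" as "anomalous when 𝛼𝑗 is negative" follows the "negative discrimination" bullet.

```python
from dataclasses import dataclass


@dataclass
class AlgoMetrics:
    consistency: float  # 1 / |alpha_j|
    anomalous: bool     # True when the discrimination is negative


def algo_metrics(alpha: float) -> AlgoMetrics:
    """Turn a fitted discrimination parameter into the algorithm metrics."""
    if alpha == 0:
        # Consistency 1/|alpha| is undefined at exactly zero.
        raise ValueError("discrimination must be non-zero")
    return AlgoMetrics(consistency=1.0 / abs(alpha), anomalous=alpha < 0)


def dataset_difficulty(theta: float) -> float:
    """Dataset difficulty score: delta_i = -theta_i."""
    return -theta


# Illustrative (made-up) parameter values, not results from the talk.
stable = algo_metrics(alpha=0.1)   # small |alpha| -> high consistency
odd = algo_metrics(alpha=-2.0)     # negative alpha -> anomalous
print(stable, odd, dataset_difficulty(1.5))
```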
  • 33. Example – fitting IRT model • Graph Colouring algorithms (Smith-Miles et al 2014) • 8 graph colouring algorithms • RandomGreedy, DSATUR, Bktr, HillClimber, HEA, PartialCol, TabuCol, AntCol • How many colours did each algorithm use to colour 6712 graphs
  • 34. Graph colouring algorithms • Algorithm | Consistency | Difficulty Limit | Anomalous • Random Greedy | 1.73 | 1.96 | FALSE • DSATUR | 0.57 | 1.79 | FALSE • Bktr | 0.32 | 1.97 | FALSE • Hill Climber | 0.68 | 3.18 | FALSE • HEA | 7.76 | 87.53 | FALSE • PartialCol | 2.25 | 19.21 | FALSE • TabuCol | 4.68 | 65.87 | FALSE • AntCol | 3.09 | 11.61 | FALSE
  • 36. Focusing on algorithm performance • We have algorithm performance (y axis) and problem difficulty (x axis) • We can fit a model and find how each algorithm performs • We use smoothing splines • Can visualize them • No parameters to specify
  • 38. Strengths and weaknesses of algorithms • If the curve is on top – it is strong in that region • If the curve is at the bottom – it is weak in that region • $h_{j^*}(\delta) = \max_j h_j(\delta)$ - this is the best performance for a given difficulty • $j^*$ is the best algorithm for that difficulty • $\text{strengths}(j, \epsilon) = \{\delta : |h_j(\delta) - h_{j^*}(\delta)| \le \epsilon\}$ • We give $\epsilon$ leeway • Weaknesses are similar
  • 40. Algorithm portfolio selection • Can use algorithm strengths to select a good portfolio of algorithms • We call this portfolio the airt portfolio • airt – Algorithm IRT (old Scottish word – to guide) • $\text{airt portfolio}(\epsilon)$ = set of algorithms with $\text{strengths}(\epsilon)$
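The strengths and portfolio definitions can be sketched on a discrete grid of difficulty values. This is an illustration only: it assumes the smoothing-spline curves have already been evaluated on the grid, that larger values mean better performance, and that an algorithm enters the portfolio when its strength region is non-empty. The function names and the numbers are made up and are not the airt package API.

```python
def strengths(curves: dict[str, list[float]], grid: list[float],
              eps: float) -> dict[str, list[float]]:
    """For each algorithm, the grid points where it is within eps of the best curve."""
    result: dict[str, list[float]] = {name: [] for name in curves}
    for k, delta in enumerate(grid):
        # h_{j*}(delta): best performance over all algorithms at this difficulty.
        best = max(values[k] for values in curves.values())
        for name, values in curves.items():
            if abs(values[k] - best) <= eps:
                result[name].append(delta)
    return result


def airt_portfolio(curves: dict[str, list[float]], grid: list[float],
                   eps: float) -> set[str]:
    """Algorithms with a non-empty strength region for the given eps."""
    regions = strengths(curves, grid, eps)
    return {name for name, region in regions.items() if region}


# Made-up curve values on three difficulty levels (easy, medium, hard).
grid = [-1.0, 0.0, 1.0]
curves = {
    "A": [0.9, 0.6, 0.3],    # strong on easy problems
    "B": [0.7, 0.65, 0.5],   # strong on hard problems
    "C": [0.5, 0.4, 0.2],    # never close to the best
}
print(strengths(curves, grid, eps=0.1))
print(airt_portfolio(curves, grid, eps=0.1))
```

A larger leeway widens every strength region, so the portfolio can only grow as the leeway increases.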
  • 41. Evaluating this portfolio • Let $i$ denote a problem, $P$ a portfolio of algorithms, $F$ the full set of algorithms • $\text{performance.deterioration}(i, P) = \text{best.performance}(i, F) - \text{best.performance}(i, P)$ • Graph colouring example
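The performance deterioration for one problem can be computed directly from a row of the performance matrix. The sketch below assumes that larger values mean better performance. The function names mirror the slide's notation but are hypothetical, and the numbers are invented.

```python
from collections.abc import Iterable


def best_performance(perf_row: dict[str, float], algos: Iterable[str]) -> float:
    """Best performance on one problem among the given algorithms."""
    return max(perf_row[a] for a in algos)


def performance_deterioration(perf_row: dict[str, float],
                              portfolio: Iterable[str]) -> float:
    """best.performance(i, F) - best.performance(i, P) for one problem i.

    perf_row maps every algorithm in the full set F to its performance on i.
    """
    return best_performance(perf_row, perf_row.keys()) - best_performance(perf_row, portfolio)


# Invented performance of three algorithms on a single problem.
row = {"A": 0.9, "B": 0.7, "C": 0.5}
print(performance_deterioration(row, {"A", "B"}))  # portfolio contains the best algorithm
print(performance_deterioration(row, {"B", "C"}))  # portfolio misses the best algorithm
```

The deterioration is zero whenever the portfolio contains the best algorithm for that problem, and it is never negative because the portfolio is a subset of the full set.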
  • 42. An example from ASlib data repository
  • 43. SAT11-individual example • An example where airt doesn’t do so well • The curves are quite similar to each other • There is reason to believe SAT11 has pre-selected algorithms