Machine Learning (ML) and Artificial Intelligence (AI) have made great strides in this decade. We have a plethora of ML algorithms that can perform a given task, be it face recognition, image classification or natural language processing. However, the explainability of ML/AI algorithms remains a big problem. Explainable AI (XAI) is a branch of ML devoted to unravelling the black-box nature of AI so that we understand the reasons behind its decisions and output. However, there are concerns that XAI sometimes produces “tools for computer scientists to explain things to other computer scientists”, which defeats its purpose. To this end, a growing number of researchers have called for integration with the social sciences to make truly explainable and trustworthy AI, because philosophy and the social sciences have debated the meaning and function of an explanation for millennia and have deeper insights [1]. In this talk, we present such an integration [2]. Our problem domain is algorithm evaluation, which considers a portfolio of algorithms and their performance on a set of problems. For example, it can be a portfolio of regression algorithms. The goal is to derive meaningful, explainable insights about the algorithms from the performance results. As the social science linkage, we use Item Response Theory (IRT), a methodology from educational psychometrics. IRT is traditionally used to evaluate the difficulty and discrimination of test questions and the ability of students, and it has causal interpretations. Using IRT we obtain explainable insights about algorithms relating to their stable/consistent nature, the difficulty level of problems they can handle and their behaviour. In addition, we visualise the problem spectrum and find regions on the spectrum where algorithms exhibit strengths. The causal interpretations of IRT transfer to the algorithm evaluation domain, giving us a deeper understanding of algorithms.

References
1. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif Intell 267, 1–38 (2019).
2. Kandanaarachchi, S. & Smith-Miles, K. Comprehensive Algorithm Portfolio Evaluation using Item Response Theory. Journal of Machine Learning Research 24, 1–52 (2023).

- 1. Explainable insights on algorithm performance. Sevvandi Kandanaarachchi. Work with Kate Smith-Miles.
- 2. To explain or to predict? – Galit Shmueli ● Paper by Shmueli in 2010 ● Talks about these two topics ● Argues that explanation and prediction are different ● Two modelling paths – for predicting and explaining ● Social sciences have done explanatory models for a long time
- 3. What is an explanation? • To explain an event is to provide some information about its causal history. – Lewis, 1986 (Causal Explanation) • A statement or an account that makes something clear – Google • It is important to note that the solution to explainable AI is not just ‘more AI’ – Miller, 2019 • Miller (2019) argues for Social Science + Computer Science in XAI: in the fields of philosophy, cognitive psychology/science, and social psychology, there is a vast and mature body of work that studies these exact topics.
- 4. Message: Bring in the social scientists to the party! Integrate their methods!
- 5. What is this talk about? ● Using a method in social sciences to do two ML type tasks ● Evaluate algorithms ● It gives us more meaningful metrics about algorithms ● Has some causal interpretations ● Visually inspect where algorithms perform well (and poorly) ● Ensembles ● Anomaly detection ensembles
- 6. What is algorithm evaluation? • Performance of many algorithms on many problems • How do you explain the algorithm performance? • Standard statistical analysis misses many things • [Table: performance matrix with Problem 1 to Problem 5 as rows and Algo 1 to Algo 4 as columns]
- 7. We want to evaluate algorithm performance ... in a way that helps us understand algorithms and problems better!
- 9. Item Response Theory (IRT) • Models used in social sciences/psychometrics • Unobservable characteristics and observed outcomes • Verbal or mathematical ability • Racial prejudice or stress proneness • Political inclinations • Intrinsic “quality” that cannot be measured directly
- 10. IRT in education • Finds the discrimination and difficulty of test questions • And the ability of the test participants • By fitting an IRT model • In education – questions that can discriminate between students with different abilities are preferred to “very difficult” questions.
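- 11. How it works • A matrix of scores (students as rows, questions as columns) is given to the IRT model:

  | Students | Q 1 | Q 2 | Q 3 | Q 4 |
  |---|---|---|---|---|
  | Stu 1 | 0.95 | 0.87 | 0.67 | 0.84 |
  | Stu 2 | 0.57 | 0.49 | 0.78 | 0.77 |
  | Stu n | 0.75 | 0.86 | 0.57 | 0.45 |

  • The IRT model returns the discrimination of questions ($\alpha_j$), the difficulty of questions ($\beta_j$) and the ability of students (latent trait, $\theta_i$)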
- 12. What does IRT give us? For the score matrix of slide 11: • Q1 - discrimination $\alpha_1$, difficulty $\beta_1$ • Q2 - discrimination $\alpha_2$, difficulty $\beta_2$ • Q3 - discrimination $\alpha_3$, difficulty $\beta_3$ • Q4 - discrimination $\alpha_4$, difficulty $\beta_4$ • Student 1 ability $\theta_1$ ⋮ • Student n ability $\theta_n$
- 13. The causal understanding • $\alpha_j$ - discrimination of Q $j$ • $\beta_j$ - difficulty of Q $j$ • $\theta_i$ - ability of student $i$ • $x_{ij}$ - marks of student $i$ for question $j$ • [Diagram: Student and Question both point to Marks]
- 14. Dichotomous IRT ● Multiple choice ● True or false ●
  $$\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \frac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$$
  ● $x_{ij}$ - outcome/score of examinee $i$ for item $j$ ● $\theta_i$ - examinee's ($i$) ability ● $\gamma_j$ - guessing parameter for item $j$ ● $d_j$ - difficulty parameter ● $\alpha_j$ - discrimination
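The dichotomous model on slide 14 can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `prob_correct` and the item parameters in the example are hypothetical and are not taken from the talk.

```python
import math


def prob_correct(theta, alpha, d, gamma):
    """Probability of a correct response under the dichotomous IRT model.

    theta: examinee ability, alpha: discrimination,
    d: difficulty, gamma: guessing parameter.
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Hypothetical item: discrimination 1.5, difficulty 0.0, guessing 0.25.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(prob_correct(ability, 1.5, 0.0, 0.25), 3))
```

When the ability equals the difficulty, the probability sits halfway between the guessing level and 1, and a larger discrimination makes the curve steeper around that point.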
- 15. Continuous IRT • Grades out of 100 • A 2D surface of probabilities
  $$f(z_j \mid \theta) = \frac{\alpha_j \gamma_j}{\sqrt{2\pi}} \exp\left(-\frac{\alpha_j^2}{2}\left(\theta - \beta_j - \gamma_j z_j\right)^2\right)$$
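As a rough illustration of the continuous model on slide 15, the sketch below evaluates the density for one item. It assumes positive $\alpha_j$ and $\gamma_j$, in which case the expression is a normal density in $z_j$ with mean $(\theta - \beta_j)/\gamma_j$ and standard deviation $1/(\alpha_j \gamma_j)$. The function name and the parameter values are hypothetical.

```python
import math


def continuous_irt_density(z, theta, alpha, beta, gamma):
    """Density of score z given ability theta (sketch, assumes alpha, gamma > 0)."""
    scale = alpha * gamma / math.sqrt(2.0 * math.pi)
    return scale * math.exp(-0.5 * alpha ** 2 * (theta - beta - gamma * z) ** 2)


# Hypothetical parameters: the density peaks at z = (theta - beta) / gamma = 0.5.
theta, alpha, beta, gamma = 0.5, 2.0, 0.0, 1.0
peak = continuous_irt_density(0.5, theta, alpha, beta, gamma)

# Crude Riemann sum over z to check that the density integrates to about 1.
step = 0.001
total = sum(
    continuous_irt_density(-5.0 + k * step, theta, alpha, beta, gamma) * step
    for k in range(11000)
)
print(round(peak, 4), round(total, 3))
```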
- 16. Mapping algorithm evaluation to IRT • Students -> Problems/datasets • Test questions -> Algorithms • The same IRT model is fitted to the new data
- 17. The new mapping • $\alpha_j$ - discrimination of Algo $j$ • $\beta_j$ - difficulty of Algo $j$ • $\theta_i$ - ability of Dataset $i$ • $x_{ij}$ - marks of dataset $i$ for algorithm $j$ • [Diagram: Dataset and Algorithm both point to Performance]
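- 18. Fitting the IRT model • Maximising the expectation
  $$E_{\theta \mid \Lambda^{(t)}, Z}\left[\ln p(\Lambda \mid \theta, Z)\right] = N \sum_{j=1}^{n} \left(\ln \alpha_j + \ln \gamma_j\right) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{n} \alpha_j^2 \left( \left(\beta_j + \gamma_j z_{ij} - \mu_i^{(t)}\right)^2 + \sigma^{(t)2} \right) + \ln p(\Lambda) + \text{const}$$
  • $\alpha_j$ - discrimination parameter of algorithm $j$ • $\gamma_j$ - scaling parameter for the algorithm • $\beta_j$ - difficulty parameter for the algorithm • $z_{ij}$ - score of the algorithm on the dataset/problem • $p(\Lambda)$ - prior probabilities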
- 19. New meaning of IRT parameters? ● Discrimination -> anomalousness, algorithm consistency ● Difficulty -> algorithm difficulty limit ● Student ability -> Dataset difficulty spectrum
- 20. What can we say . . . • Consistent algorithms give similar performance for easy or hard datasets • Algorithms with higher difficulty limits can handle harder problems • Anomalous algorithms give bad performance for easy problems and good performance for difficult problems
- 21. Dataset difficulty ● Is a function of discrimination, difficulty and scores ●
  $$\text{dataset } i \text{ difficulty} = -\frac{\sum_j \hat{\alpha}_j^2 \left(\hat{\beta}_j + \hat{\gamma}_j z_{ij}\right)}{\sum_j \hat{\alpha}_j^2}$$
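The dataset difficulty on slide 21 is a weighted average of the transformed scores with its sign flipped. A minimal sketch follows, with hypothetical fitted parameters for two algorithms; the values are invented for illustration and are not results from the talk.

```python
def dataset_difficulty(scores, alpha, beta, gamma):
    """Difficulty of one dataset from its scores z_ij across algorithms j.

    alpha, beta, gamma hold the fitted discrimination, difficulty and
    scaling parameters of each algorithm, in the same order as scores.
    """
    numerator = sum(a * a * (b + g * z) for a, b, g, z in zip(alpha, beta, gamma, scores))
    denominator = sum(a * a for a in alpha)
    return -numerator / denominator


# Hypothetical fitted parameters for two algorithms.
alpha_hat, beta_hat, gamma_hat = [1.0, 2.0], [0.0, 0.0], [1.0, 1.0]

easy = dataset_difficulty([0.9, 0.8], alpha_hat, beta_hat, gamma_hat)
hard = dataset_difficulty([0.2, 0.1], alpha_hat, beta_hat, gamma_hat)
print(easy, hard)
```

With these parameters, the dataset on which both algorithms score highly receives the lower difficulty value, and the algorithm with the larger squared discrimination carries more weight.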
- 22. Example – fitting IRT model • Anomaly detection algorithms • 8 anomaly detection algorithms • 3142 datasets • Our performance metric is AUROC (looking at the performance vs actual)
- 23. Anomaly detection algorithms

  | Algorithm | Consistency | Difficulty Limit | Anomalous |
  |---|---|---|---|
  | Ensemble | 55.08 | -66.55 | FALSE |
  | LOF | 4.50 | 5.10 | FALSE |
  | KNN | 1.72 | 2.30 | FALSE |
  | FAST_ABOD | 9.39 | 10.23 | FALSE |
  | Isolation Forest | 2.35 | 3.05 | FALSE |
  | KDEOS_1 | 0.86 | -0.31 | TRUE |
  | KDEOS-2 | 1.16 | -0.51 | TRUE |
  | LDF | 2.01 | 2.08 | FALSE |
- 24. Dataset difficulty spectrum • IRT student ability -> dataset difficulty
- 25. Focusing on algorithm performance • We have algorithm performance (y axis) and problem difficulty (x axis) • We can fit a model and find how each algorithm performs • We use smoothing splines • Can visualize them • No parameters to specify
- 27. Strengths and weaknesses of algorithms • If the curve is on top – the algorithm is strong in that region • If the curve is at the bottom – the algorithm is weak in that region
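The "curve on top" idea from slide 27 can be sketched as follows, assuming the fitted curves have already been evaluated on a shared grid of dataset difficulty values. The spline fitting itself is not shown, and the algorithm names and curve values are hypothetical.

```python
def strongest_algorithm(curves):
    """Return the name of the top curve at each point of a shared grid.

    curves: dict mapping an algorithm name to its fitted performance
    values, one value per grid point of dataset difficulty.
    """
    names = list(curves)
    n_points = len(curves[names[0]])
    return [max(names, key=lambda name: curves[name][k]) for k in range(n_points)]


# Hypothetical fitted curves on a grid running from easy to hard datasets.
curves = {
    "algo_A": [0.90, 0.80, 0.50, 0.30],
    "algo_B": [0.70, 0.75, 0.60, 0.50],
}
print(strongest_algorithm(curves))
```

In this made-up example `algo_A` is strong on the easier half of the spectrum and `algo_B` on the harder half. Swapping `max` for `min` gives the corresponding weak regions.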
- 28. AIRT • IRT: question and student characteristics • AIRT: algorithm and dataset characteristics • Visualise strengths and weaknesses • Portfolio construction
- 29. AIRT performs well, when . . . • The set of algorithms is diverse. • Ties back to IRT basics • IRT in education – If all the questions are equally discriminative and difficult, IRT doesn’t add much • IRT is useful when we have a diverse set of questions and we want to know • Which questions are more discriminative • Which questions are difficult
- 30. AIRT • Understand more about algorithms • Anomalousness, consistency, difficulty limit • Visualise the strengths and weaknesses of algorithms • Select a portfolio of good algorithms • Includes diagnostics • R package airt (on CRAN) • https://sevvandi.github.io/airt/
- 31. A different application of IRT
- 32. IRT to build an ensemble ● Previous work was using performance - marks ● What about using original responses? ● Like survey questions ● Rosenberg's Self-Esteem Scale - I feel I am a person of worth (Strongly agree/Agree/Neutral/Disagree/Strongly disagree) ● No right or wrong answer ● Latent trait gives the person’s self esteem ● Latent trait uncovers the “hidden quality”
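- 33. Unsupervised algorithms ● Instead of performance values what if you have original responses?

  | | Q 1 | Q 2 | Q 3 | Q 4 |
  |---|---|---|---|---|
  | Stu 1 | 0.95 | 0.87 | 0.67 | 0.84 |
  | Stu 2 | 0.57 | 0.49 | 0.78 | 0.77 |
  | Stu n | 0.75 | 0.86 | 0.57 | 0.45 |

  $$\text{latent value}(i) = -\frac{\sum_j \hat{\alpha}_j^2 \left(\hat{\beta}_j + \hat{\gamma}_j z_{ij}\right)}{\sum_j \hat{\alpha}_j^2}$$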
- 34. What is an anomaly detection ensemble? • The AD methods are heterogeneous methods • Ensembles – use existing methods to come up with better anomaly detection scores • [Diagram: Dataset -> Unsupervised AD methods -> AD ensemble -> Ensemble score]
- 35. Anomaly detection ensembles ● Latent trait = anomalousness of the observations = the ensemble score ● The ensemble score is a weighted score of original responses ●
  $$\text{ensemble score of observation}(i) = -\frac{\sum_j \hat{\alpha}_j^2 \left(\hat{\beta}_j + \hat{\gamma}_j z_{ij}\right)}{\sum_j \hat{\alpha}_j^2}$$
- 37. IRT for anomaly detection ensembles • R package – outlierensembles – on CRAN
- 38. IRT and algorithms ● Algorithm evaluation - AIRT ● Anomaly detection ensembles ● IRT used in ML more
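- 40. New AIRT parameters ●
  $$\gamma_j^{(t+1)} = \frac{V\left(\mu_i^{(t)}\right) + \sigma^{(t)2}}{C_j\left(z_{ij}, \mu_i^{(t)}\right)}$$
  ●
  $$\beta_j^{(t+1)} = M\left(\mu_i^{(t)}\right) - \gamma_j^{(t+1)} M_j(z_{ij})$$
  ●
  $$\alpha_j^{(t+1)} = \left(\gamma_j^{(t+1)2}\, V_j(z_{ij}) - V\left(\mu_i^{(t)}\right) - \sigma^{(t)2}\right)^{-1/2}$$
  ● Where M, V and C denote mean, variance, and covariance terms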
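- 41. Explaining notation ● $M_j(z_{ij}) = \frac{\sum_i z_{ij}}{N}$ ● $M\left(\mu_i^{(t)}\right) = \frac{\sum_i \mu_i^{(t)}}{N}$ ● $V_j(z_{ij}) = \frac{\sum_i z_{ij}^2}{N} - M_j(z_{ij})^2$ ● $V\left(\mu_i^{(t)}\right) = \frac{\sum_i \mu_i^{(t)2}}{N} - M\left(\mu_i^{(t)}\right)^2$ ● $C_j\left(z_{ij}, \mu_i^{(t)}\right) = \frac{\sum_i z_{ij}\,\mu_i^{(t)}}{N} - M_j(z_{ij})\, M\left(\mu_i^{(t)}\right)$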
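- 42. More notation ● $p\left(\theta_i \mid \Lambda^{(t)}, z_i\right) = \mathcal{N}\left(\theta_i \mid \mu_i^{(t)}, \sigma^{(t)2}\right)$, where $\Lambda^{(t)} = \left(\lambda_1^{(t)}, \ldots, \lambda_n^{(t)}\right)$ and $\lambda_j^{(t)} = \left(\alpha_j^{(t)}, \beta_j^{(t)}, \gamma_j^{(t)}\right)^T$ ● $t$ is the $t$th iteration ● $\sigma^{(t)2} = \left(\sum_j \alpha_j^{(t)2} + \sigma^{-2}\right)^{-1}$ ● $\mu_i^{(t)} = \sigma^{(t)2} \left(\sum_j \alpha_j^{(t)2} \left(\beta_j^{(t)} + \gamma_j^{(t)} z_{ij}\right) + \mu\right)$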
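Slides 40 to 42 together describe one iteration of the fitting procedure: compute the posterior mean and variance of each $\theta_i$, then update $\gamma_j$, $\beta_j$ and $\alpha_j$. The sketch below strings those formulas together in plain Python. It is an illustration rather than the airt implementation, and it assumes a standard normal prior on $\theta_i$, so that $\sigma^{-2} = 1$ and the prior mean term vanishes. The function names and the score matrix are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def em_step(z, alpha, beta, gamma):
    """One iteration of the updates; z[i][j] is the score of algorithm j on dataset i."""
    n_data, n_algo = len(z), len(alpha)

    # Posterior of theta_i is N(mu_i, sigma2); a N(0, 1) prior is ASSUMED here.
    sigma2 = 1.0 / (sum(a * a for a in alpha) + 1.0)
    mu = [
        sigma2 * sum(alpha[j] ** 2 * (beta[j] + gamma[j] * z[i][j]) for j in range(n_algo))
        for i in range(n_data)
    ]

    # Mean and variance of the posterior means across datasets.
    m_mu = mean(mu)
    v_mu = mean([m * m for m in mu]) - m_mu ** 2

    new_alpha, new_beta, new_gamma = [], [], []
    for j in range(n_algo):
        zj = [row[j] for row in z]
        m_z = mean(zj)
        v_z = mean([x * x for x in zj]) - m_z ** 2
        cov = mean([x * m for x, m in zip(zj, mu)]) - m_z * m_mu
        g = (v_mu + sigma2) / cov  # raises ZeroDivisionError if the covariance is zero
        new_gamma.append(g)
        new_beta.append(m_mu - g * m_z)
        new_alpha.append((g * g * v_z - v_mu - sigma2) ** -0.5)
    return new_alpha, new_beta, new_gamma, mu, sigma2


# Hypothetical scores: 4 datasets (rows) by 2 algorithms (columns).
z = [[0.9, 0.8], [0.6, 0.7], [0.3, 0.2], [0.5, 0.4]]
alpha1, beta1, gamma1, mu, sigma2 = em_step(z, [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
print(alpha1, beta1, gamma1)
```

In practice the step would be repeated until the parameters stop changing. Note that the $\alpha_j$ update as written always returns a positive value, so any sign handling for anomalous algorithms would have to happen elsewhere.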
- 43. Algorithm portfolio selection • Can use algorithm strengths to select a good portfolio of algorithms • We call this the airt portfolio • airt – Algorithm IRT (old Scottish word – to guide)
- 44. AIRT framework • [Diagram: Performance data -> IRT model -> Algorithm and dataset metrics -> Fitting smoothing splines -> Algorithm strengths and weaknesses -> airt portfolio -> Evaluate portfolios; legend: General block, AIRT output]
- 45. What happens to the IRT parameters? • IRT - ability of student $\theta_i$ • As $\theta_i$ increases, the probability of a higher score increases • What is $\theta_i$ in terms of a dataset? • $\theta_i$ - easiness of the dataset • Dataset difficulty score $-\theta_i$
- 46. Discrimination parameter • Discrimination of item $\alpha_j$ • As $\alpha_j$ increases, the slope of the curve increases • What is $\alpha_j$ in terms of an algorithm? • $\alpha_j$ - lack of stability/robustness of algo • Consistency of algo $\frac{1}{|\alpha_j|}$
- 47. Consistent algorithms • Education – such a question doesn’t give any information • Algorithms – these algorithms are really stable or consistent • Consistency = $\frac{1}{|\alpha_j|}$
- 48. Anomalous algorithms • Algorithms that perform poorly on easy datasets and well on difficult datasets • Negative discrimination • In education – such items are discarded or revised • If an algorithm is anomalous, it is interesting • Anomalousness = sign($\alpha_j$)
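Slides 46 to 48 turn the discrimination parameter into two algorithm metrics. A small sketch follows, assuming (as the slides suggest) that consistency is $1/|\alpha_j|$ and that an algorithm is flagged anomalous when $\alpha_j$ is negative. The function name and the discrimination values are hypothetical.

```python
def describe_algorithm(alpha):
    """Derive consistency and the anomalous flag from a discrimination value."""
    return {
        "consistency": 1.0 / abs(alpha),  # small |alpha| means a very consistent algorithm
        "anomalous": alpha < 0,           # negative discrimination
    }


# Hypothetical discrimination values for three algorithms.
for name, alpha in {"steady": 0.02, "typical": 0.5, "odd": -2.0}.items():
    print(name, describe_algorithm(alpha))
```

An algorithm with discrimination close to zero gets a very large consistency value, which matches the reading on slide 47 that such an algorithm performs similarly on easy and hard datasets.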