SlideShare a Scribd company logo
1 of 35
Active Learning (ML)
Akhilesh Ravi
Indian Institute of Technology Gandhinagar
What is active learning?
● Let us say that you have some data.
● You want to apply a machine learning technique to classify
the data.
● No labelled samples or a very small number of labelled
samples, and a large amount of data.
● Labelling each sample is expensive.
● What will you do?
What is active learning?
Image Source: https://www.datacamp.com/community/tutorials/active-learning
What is active learning?
What is active learning?
“The key idea behind active learning is that a machine learning algorithm
can perform better with less training if it is allowed to choose the data from
which it learns.”[1]
● It gives the samples to be labelled in such a way that with less labelled
samples, the machine learning model performs well - it chooses the
optimal set of samples to be labelled for good performance
● We could say that the model actively learns and sees what to learn next
so that it can perform better.
References:
1. Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020,
https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Steps in Active Learning
● Train the model on the labelled data.
● Evaluate the model on all the unlabelled samples.
● Based on the evaluation, choose the sample/list of samples
to be labelled.
● Label these samples and add them to the labelled data.
● Repeat the above steps till a certain condition (stopping
criterion)
An Important Component
Oracle - person or a model that knows the correct
answer/classification/prediction to all questions/queries.[1][2]
Practically, an expert in the corresponding field would be
considered as an oracle.
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Active learning - Scenarios
Active Learning - Scenarios
● Membership Query Synthesis
● Pool-based Sampling
● Stream-based Selective Sampling
Membership Query Synthesis
● The learner has a distribution made from the original data.
● The learner generates a sample from this distribution.
● The oracle gives the prediction for the sample
● This is added to the dataset that the learner uses for learning
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Membership Query Synthesis
Image Source: https://www.statisticsfromatoz.com/uploads/7/3/2/1/73216723/discrete-and-cont-distributions_orig.png
Pool-based Sampling
● A data pool of unlabelled samples
● Informativeness score - assigned to all the samples in the pool
or a subset of pool if the pool is very large.
● The most informative sample(s) is(are) selected.
● Depending on the configuration ,one sample can be chosen
each time or a few samples can be chosen each time.
● These are labelled and added to the dataset that the learner
uses for learning.
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Pool-based Sampling
Image Source: https://www.datacamp.com/community/tutorials/active-learning
Stream-based Selective Sampling
Stream-based Selective Sampling
● Assumption: Getting an unlabelled sample is free
● The samples are taken one by one and examined
● Based on an informativeness score for each sample, decide
whether an instance has to be labelled in this iteration or not.
● Many iterations
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Query Strategies
Query Strategies
There are many query strategies.[1][2] Here are three common
strategies:
● Least Confidence
● Margin Sampling
● Entropy Sampling
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Query Strategies - Least Confidence
The sample which has the least probability for its most likely
label is chosen.
Eg - Let the samples in a dataset be in three classes - A, B, C.
Sample S1 probabilities: A - 0.5, B - 0.25, C - 0.25 Most likely label: A (0.5)
Sample S2 probabilities: A - 0.1, B - 0.8, C - 0.1 Most likely label: B (0.8)
Here, S1 will be chosen according to the above query strategy.
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Query Strategies - Margin Sampling
The problem with LC is that is takes only the most likely label into
account. Thus, this query strategy takes the sample which has least
difference between its most likely label and second most likely label
probabilities.
Eg - Let the samples in a dataset be in three classes - A, B, C.
Sample S1 probabilities: A - 0.5, B - 0.45, C - 0.05 0.5 - 0.45 = 0.05
Sample S2 probabilities: A - 0.3, B - 0.4, C - 0.3 0.4 - 0.3 = 0.1
Here, S1 will be chosen according to the margin sample.
According to LC, S2 will be chosen.
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
LC takes only most likely label into account and margin sampling
sampling takes the top two likely labels into account. Entropy sampling
uses the probability of all possible labels. This is done using the metric
called entropy. The sample with the largest entropy is selected.
Query Strategies - Entropy Sampling
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Looking back: Steps in Active Learning
● Train the model on the labelled data.
● Evaluate the model on all the unlabelled samples.
● Based on the evaluation, choose the sample/list of samples to be labelled.
● Label these samples and add them to the labelled data.
● Repeat the above steps till a certain condition (stopping criterion)
Image Source: https://www.datacamp.com/community/tutorials/active-learning
Advantages of Active Learning
● Eases the problem of lack of labelled data; only a
fraction of the data has to be labelled
● Can be applied for online learning scenarios - many
practical scenarios in industries involve online learning
Application Areas of Active Learning
● Natural Language Processing[1]
○ Lots of data to label
● Reinforcement Learning
● Online Learning
○ Lot of data coming in continuously
○ Spam filters, ranking of search results, job listings, etc.[2]
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. https://www.oreilly.com/content/real-world-active-learning/
Active learning - Visualizations
Active Learning - Visualizations
Iris Dataset
Versicolor - red
Virginica -
cyan/light blue
Source: https://www.kaggle.com/uciml/iris
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Active Learning - Visualizations
Tutorial on Active Learning
Active Learning Tutorial - bit.ly/medium-active-learning
Images
1. Towards Data Science Logo - https://miro.medium.com/max/1200/1*F0LADxTtsKOgmPa-_7iUEQ.jpeg
2. Medium.com Logo - https://miro.medium.com/max/8978/1*s986xIGqhfsN8U--09_AdA.png
References
Datacamp.org -
https://www.datacamp.com/community/tutorials/active-learning
Wikipedia.org -
https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
Thank you

More Related Content

What's hot

Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningﺁﺻﻒ ﻋﻠﯽ ﻣﯿﺮ
 
Active learning
Active learningActive learning
Active learningAli Abbasi
 
Active learning: Scenarios and techniques
Active learning: Scenarios and techniquesActive learning: Scenarios and techniques
Active learning: Scenarios and techniquesweb2webs
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 
Feature Selection in Machine Learning
Feature Selection in Machine LearningFeature Selection in Machine Learning
Feature Selection in Machine LearningUpekha Vandebona
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMustafa Yagmur
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsHyeongmin Lee
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Edureka!
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General AudiencesSangwoo Mo
 
PRML 8.2 条件付き独立性
PRML 8.2 条件付き独立性PRML 8.2 条件付き独立性
PRML 8.2 条件付き独立性sleepy_yoshi
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxKarimdabbabi
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learningRidge-i, Inc.
 
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch NetworkHiroshi Fukui
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 

What's hot (20)

Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
Active learning
Active learningActive learning
Active learning
 
Active learning: Scenarios and techniques
Active learning: Scenarios and techniquesActive learning: Scenarios and techniques
Active learning: Scenarios and techniques
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 
Feature Selection in Machine Learning
Feature Selection in Machine LearningFeature Selection in Machine Learning
Feature Selection in Machine Learning
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
PRML 8.2 条件付き独立性
PRML 8.2 条件付き独立性PRML 8.2 条件付き独立性
PRML 8.2 条件付き独立性
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptx
 
Introduction to Few shot learning
Introduction to Few shot learningIntroduction to Few shot learning
Introduction to Few shot learning
 
実装レベルで学ぶVQVAE
実装レベルで学ぶVQVAE実装レベルで学ぶVQVAE
実装レベルで学ぶVQVAE
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
 
XGBoost & LightGBM
XGBoost & LightGBMXGBoost & LightGBM
XGBoost & LightGBM
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 

Similar to ML Active Learning Guide

Data-Driven Learning Strategy
Data-Driven Learning StrategyData-Driven Learning Strategy
Data-Driven Learning StrategyJessie Chuang
 
training_presentation
training_presentationtraining_presentation
training_presentationYudi512144
 
Moodle and analytics present and future tl forum
Moodle and analytics   present and future tl forumMoodle and analytics   present and future tl forum
Moodle and analytics present and future tl forumNetSpot Pty Ltd
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve TeachingRafael Scapin, Ph.D.
 
How AI will change the way you help students succeed - SchooLinks
How AI will change the way you help students succeed - SchooLinksHow AI will change the way you help students succeed - SchooLinks
How AI will change the way you help students succeed - SchooLinksKatie Fang
 
Aligning Learning Analytics with Classroom Practices & Needs
Aligning Learning Analytics with Classroom Practices & NeedsAligning Learning Analytics with Classroom Practices & Needs
Aligning Learning Analytics with Classroom Practices & NeedsSimon Knight
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...ssuser610732
 
Learning Analytics in Higher Education
Learning Analytics in Higher EducationLearning Analytics in Higher Education
Learning Analytics in Higher EducationJose Antonio Omedes
 
Learning Analytics for Adaptive Learning And Standardization
Learning Analytics for Adaptive Learning And StandardizationLearning Analytics for Adaptive Learning And Standardization
Learning Analytics for Adaptive Learning And StandardizationOpen Cyber University of Korea
 
Moodle and analytics - present and future
Moodle and analytics - present and futureMoodle and analytics - present and future
Moodle and analytics - present and futureNetSpot Pty Ltd
 
Learning Analytics: Realizing their Promise in the California State University
Learning Analytics:  Realizing their Promise in the California State UniversityLearning Analytics:  Realizing their Promise in the California State University
Learning Analytics: Realizing their Promise in the California State UniversityJohn Whitmer, Ed.D.
 
JISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJames Ballard
 
Introduction to Learning Analytics - Framework and Implementation Concerns
Introduction to Learning Analytics - Framework and Implementation ConcernsIntroduction to Learning Analytics - Framework and Implementation Concerns
Introduction to Learning Analytics - Framework and Implementation ConcernsTore Hoel
 
Mootnz13 Moodle Analytics
Mootnz13 Moodle AnalyticsMootnz13 Moodle Analytics
Mootnz13 Moodle AnalyticsNetSpot Pty Ltd
 
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationLearning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationTore Hoel
 
Jisc learning analytics service oct 2016
Jisc learning analytics service oct 2016Jisc learning analytics service oct 2016
Jisc learning analytics service oct 2016Paul Bailey
 
Introduction to Jisc's Learning Analytics project - Sept 2015
Introduction to Jisc's Learning Analytics project  - Sept 2015Introduction to Jisc's Learning Analytics project  - Sept 2015
Introduction to Jisc's Learning Analytics project - Sept 2015mwebbjisc
 
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...eMadrid network
 

Similar to ML Active Learning Guide (20)

Data-Driven Learning Strategy
Data-Driven Learning StrategyData-Driven Learning Strategy
Data-Driven Learning Strategy
 
training_presentation
training_presentationtraining_presentation
training_presentation
 
Moodle and analytics present and future tl forum
Moodle and analytics   present and future tl forumMoodle and analytics   present and future tl forum
Moodle and analytics present and future tl forum
 
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Learning Analytics in Education:  Using Student’s Big Data to Improve TeachingLearning Analytics in Education:  Using Student’s Big Data to Improve Teaching
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
 
kobotraining
kobotraining kobotraining
kobotraining
 
kobo_training.pdf
kobo_training.pdfkobo_training.pdf
kobo_training.pdf
 
How AI will change the way you help students succeed - SchooLinks
How AI will change the way you help students succeed - SchooLinksHow AI will change the way you help students succeed - SchooLinks
How AI will change the way you help students succeed - SchooLinks
 
Aligning Learning Analytics with Classroom Practices & Needs
Aligning Learning Analytics with Classroom Practices & NeedsAligning Learning Analytics with Classroom Practices & Needs
Aligning Learning Analytics with Classroom Practices & Needs
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
 
Learning Analytics in Higher Education
Learning Analytics in Higher EducationLearning Analytics in Higher Education
Learning Analytics in Higher Education
 
Learning Analytics for Adaptive Learning And Standardization
Learning Analytics for Adaptive Learning And StandardizationLearning Analytics for Adaptive Learning And Standardization
Learning Analytics for Adaptive Learning And Standardization
 
Moodle and analytics - present and future
Moodle and analytics - present and futureMoodle and analytics - present and future
Moodle and analytics - present and future
 
Learning Analytics: Realizing their Promise in the California State University
Learning Analytics:  Realizing their Promise in the California State UniversityLearning Analytics:  Realizing their Promise in the California State University
Learning Analytics: Realizing their Promise in the California State University
 
JISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analytics
 
Introduction to Learning Analytics - Framework and Implementation Concerns
Introduction to Learning Analytics - Framework and Implementation ConcernsIntroduction to Learning Analytics - Framework and Implementation Concerns
Introduction to Learning Analytics - Framework and Implementation Concerns
 
Mootnz13 Moodle Analytics
Mootnz13 Moodle AnalyticsMootnz13 Moodle Analytics
Mootnz13 Moodle Analytics
 
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisationLearning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
Learning Analytics – Opportunities for ISO/IEC JTC 1/SC36 standardisation
 
Jisc learning analytics service oct 2016
Jisc learning analytics service oct 2016Jisc learning analytics service oct 2016
Jisc learning analytics service oct 2016
 
Introduction to Jisc's Learning Analytics project - Sept 2015
Introduction to Jisc's Learning Analytics project  - Sept 2015Introduction to Jisc's Learning Analytics project  - Sept 2015
Introduction to Jisc's Learning Analytics project - Sept 2015
 
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...
V Jornadas eMadrid sobre "Educación Digital". Cristina Conati, University of ...
 

Recently uploaded

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

ML Active Learning Guide

  • 1. Active Learning (ML) Akhilesh Ravi Indian Institute of Technology Gandhinagar
  • 2. What is active learning? ● Let us say that you have some data. ● You want to apply a machine learning technique to classify the data. ● No labelled samples or a very small number of labelled samples, and a large amount of data. ● Labelling each sample is expensive. ● What will you do?
  • 3. What is active learning? Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 4. What is active learning?
  • 5. What is active learning? “The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns.”[1] ● It gives the samples to be labelled in such a way that with less labelled samples, the machine learning model performs well - it chooses the optimal set of samples to be labelled for good performance ● We could say that the model actively learns and sees what to learn next so that it can perform better. References: 1. Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 6. Steps in Active Learning ● Train the model on the labelled data. ● Evaluate the model on all the unlabelled samples. ● Based on the evaluation, choose the sample/list of samples to be labelled. ● Label these samples and add them to the labelled data. ● Repeat the above steps till a certain condition (stopping criterion)
  • 7. An Important Component Oracle - person or a model that knows the correct answer/classification/prediction to all questions/queries.[1][2] Practically, an expert in the corresponding field would be considered as an oracle. 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 8. Active learning - Scenarios
  • 9. Active Learning - Scenarios ● Membership Query Synthesis ● Pool-based Sampling ● Stream-based Selective Sampling
  • 10. Membership Query Synthesis ● The learner has a distribution made from the original data. ● The learner generates a sample from this distribution. ● The oracle gives the prediction for the sample ● This is added to the dataset that the learner uses for learning References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 11. Membership Query Synthesis Image Source: https://www.statisticsfromatoz.com/uploads/7/3/2/1/73216723/discrete-and-cont-distributions_orig.png
  • 12. Pool-based Sampling ● A data pool of unlabelled samples ● Informativeness score - assigned to all the samples in the pool or a subset of pool if the pool is very large. ● The most informative sample(s) is(are) selected. ● Depending on the configuration ,one sample can be chosen each time or a few samples can be chosen each time. ● These are labelled and added to the dataset that the learner uses for learning. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 13. Pool-based Sampling Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 14. Stream-based Selective Sampling Stream-based Selective Sampling ● Assumption: Getting an unlabelled sample is free ● The samples are taken one by one and examined ● Based on an informativeness score for each sample, decide whether an instance has to be labelled in this iteration or not. ● Many iterations References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 16. Query Strategies There are many query strategies.[1][2] Here are three common strategies: ● Least Confidence ● Margin Sampling ● Entropy Sampling 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 17. Query Strategies - Least Confidence The sample which has the least probability for its most likely label is chosen. Eg - Let the samples in a dataset be in three classes - A, B, C. Sample S1 probabilities: A - 0.5, B - 0.25, C - 0.25 Most likely label: A (0.5) Sample S2 probabilities: A - 0.1, B - 0.8, C - 0.1 Most likely label: B (0.8) Here, S1 will be chosen according to the above query strategy. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 18. Query Strategies - Margin Sampling The problem with LC is that is takes only the most likely label into account. Thus, this query strategy takes the sample which has least difference between its most likely label and second most likely label probabilities. Eg - Let the samples in a dataset be in three classes - A, B, C. Sample S1 probabilities: A - 0.5, B - 0.45, C - 0.05 0.5 - 0.45 = 0.05 Sample S2 probabilities: A - 0.3, B - 0.4, C - 0.3 0.4 - 0.3 = 0.1 Here, S1 will be chosen according to the margin sample. According to LC, S2 will be chosen. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 19. LC takes only most likely label into account and margin sampling sampling takes the top two likely labels into account. Entropy sampling uses the probability of all possible labels. This is done using the metric called entropy. The sample with the largest entropy is selected. Query Strategies - Entropy Sampling References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 20. Looking back: Steps in Active Learning ● Train the model on the labelled data. ● Evaluate the model on all the unlabelled samples. ● Based on the evaluation, choose the sample/list of samples to be labelled. ● Label these samples and add them to the labelled data. ● Repeat the above steps till a certain condition (stopping criterion) Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 21. Advantages of Active Learning ● Eases the problem of lack of labelled data; only a fraction of the data has to be labelled ● Can be applied for online learning scenarios - many practical scenarios in industries involve online learning
  • 22. Application Areas of Active Learning ● Natural Language Processing[1] ○ Lots of data to label ● Reinforcement Learning ● Online Learning ○ Lot of data coming in continuously ○ Spam filters, ranking of search results, job listings, etc.[2] References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DatacampCommunity, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. https://www.oreilly.com/content/real-world-active-learning/
  • 23. Active learning - Visualizations
  • 24. Active Learning - Visualizations Iris Dataset Versicolor - red Virginica - cyan/light blue Source: https://www.kaggle.com/uciml/iris
  • 25. Active Learning - Visualizations
  • 26. Active Learning - Visualizations
  • 27. Active Learning - Visualizations
  • 28. Active Learning - Visualizations
  • 29. Active Learning - Visualizations
  • 30. Active Learning - Visualizations
  • 31. Active Learning - Visualizations
  • 32. Active Learning - Visualizations
  • 33. Tutorial on Active Learning Active Learning Tutorial - bit.ly/medium-active-learning Images 1. Towards Data Science Logo - https://miro.medium.com/max/1200/1*F0LADxTtsKOgmPa-_7iUEQ.jpeg 2. Medium.com Logo - https://miro.medium.com/max/8978/1*s986xIGqhfsN8U--09_AdA.png