Active Learning (ML)
Akhilesh Ravi
Indian Institute of Technology Gandhinagar
What is active learning?
● Let us say that you have some data.
● You want to apply a machine learning technique to classify
the data.
● You have a large amount of data but no labelled samples, or only a very small number of them.
● Labelling each sample is expensive.
● What will you do?
What is active learning?
Image Source: https://www.datacamp.com/community/tutorials/active-learning
What is active learning?
“The key idea behind active learning is that a machine learning algorithm
can perform better with less training if it is allowed to choose the data from
which it learns.”[1]
● The learner chooses which samples should be labelled, so that the model performs well with fewer labelled samples; it aims to pick the samples that are most useful for improving performance.
● In this sense, the model learns actively: it decides what to learn next in order to perform better.
References:
1. Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Steps in Active Learning
● Train the model on the labelled data.
● Evaluate the model on all the unlabelled samples.
● Based on the evaluation, choose the sample/list of samples
to be labelled.
● Label these samples and add them to the labelled data.
● Repeat the above steps until a stopping criterion is met.
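The steps above can be sketched as a small pool-based loop. This is a minimal, hypothetical Python sketch, not the author's implementation: the toy nearest-centroid model (`train`, `predict_proba`), the least-confidence score, the simulated oracle and the fixed labelling budget used as the stopping criterion are all assumptions chosen for illustration.

```python
def train(labelled):
    """Hypothetical toy model: a nearest-centroid classifier on 1-D points.
    Returns {label: mean of the samples with that label}."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_proba(model, x):
    """Class probabilities from inverse distances to the centroids
    (a closer centroid gives a higher probability)."""
    weights = {y: 1.0 / (abs(x - c) + 1e-9) for y, c in model.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def active_learning_loop(labelled, pool, oracle, budget):
    labelled, pool = list(labelled), list(pool)
    for _ in range(budget):  # stopping criterion (assumed): fixed labelling budget
        if not pool:
            break
        model = train(labelled)  # 1. train on the labelled data
        # 2. evaluate every unlabelled sample (least-confidence uncertainty)
        scores = [1.0 - max(predict_proba(model, x).values()) for x in pool]
        # 3. choose the most uncertain sample
        best = max(range(len(pool)), key=lambda i: scores[i])
        x = pool.pop(best)
        labelled.append((x, oracle(x)))  # 4. oracle labels it; add to labelled data
    return train(labelled), labelled


# Usage: a simulated oracle labels points below 5 as "A" and the rest as "B".
model, labelled = active_learning_loop(
    labelled=[(0.0, "A"), (10.0, "B")],
    pool=[float(v) for v in range(1, 10)],
    oracle=lambda x: "A" if x < 5 else "B",
    budget=3,
)
print(labelled)
```

With only the two end points labelled, the first query is the point halfway between the centroids (5.0), which is exactly where the toy model is least confident.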
An Important Component
Oracle - a person or model that knows the correct answer (label, classification or prediction) for any query.[1][2]
In practice, an expert in the relevant field usually acts as the oracle.
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Active Learning - Scenarios
● Membership Query Synthesis
● Pool-based Sampling
● Stream-based Selective Sampling
Membership Query Synthesis
● The learner builds a distribution from the original data.
● The learner generates a new sample from this distribution.
● The oracle provides the label for the generated sample.
● The newly labelled sample is added to the dataset that the learner uses for learning.
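A minimal sketch of membership query synthesis, assuming the learner's distribution is a single 1-D Gaussian fitted to the original data and that the oracle is a Python function. Both are illustrative assumptions; `synthesize_queries` is a hypothetical name, and real query-synthesis methods can use much richer generative models.

```python
import random
import statistics


def synthesize_queries(data, oracle, n_queries, seed=0):
    """Generate synthetic samples from a distribution fitted to `data`
    and ask the oracle to label each one."""
    rng = random.Random(seed)
    # The learner's distribution (assumed): a Gaussian with the data's mean and std.
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    new_labelled = []
    for _ in range(n_queries):
        x = rng.gauss(mu, sigma)             # learner generates a sample
        new_labelled.append((x, oracle(x)))  # oracle gives its label
    return new_labelled


data = [1.0, 2.0, 3.0, 4.0, 5.0]
queries = synthesize_queries(data, oracle=lambda x: "high" if x > 3 else "low", n_queries=4)
print(queries)
```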
Membership Query Synthesis
Image Source: https://www.statisticsfromatoz.com/uploads/7/3/2/1/73216723/discrete-and-cont-distributions_orig.png
Pool-based Sampling
● A data pool of unlabelled samples
● Informativeness score - assigned to every sample in the pool, or to a subset of the pool if the pool is very large.
● The most informative sample or samples are selected.
● Depending on the configuration, either one sample or a small batch of samples can be chosen in each iteration.
● The selected samples are labelled and added to the dataset that the learner uses for learning.
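A minimal sketch of one pool-based selection step, assuming each pool entry is a (sample id, class probabilities) pair and that least confidence is the informativeness score. The function name `select_from_pool`, the `max_candidates` parameter and uniform random subsampling of a very large pool are hypothetical choices for illustration.

```python
import heapq
import random


def select_from_pool(pool, score, batch_size=1, max_candidates=None, seed=0):
    """Return the `batch_size` most informative entries of `pool`.

    If `max_candidates` is set and the pool is larger, only a uniformly
    random subset of that size is scored."""
    candidates = list(pool)
    if max_candidates is not None and len(candidates) > max_candidates:
        candidates = random.Random(seed).sample(candidates, max_candidates)
    return heapq.nlargest(batch_size, candidates, key=score)


def least_confidence(entry):
    """Informativeness = 1 - probability of the most likely label."""
    return 1.0 - max(entry[1])


pool = [
    ("s1", (0.5, 0.25, 0.25)),
    ("s2", (0.1, 0.8, 0.1)),
    ("s3", (0.4, 0.35, 0.25)),
]
batch = select_from_pool(pool, least_confidence, batch_size=2)
print([sample_id for sample_id, _ in batch])  # ['s3', 's1']
```

Setting `batch_size=1` gives the one-sample-per-iteration configuration; larger values give batch selection.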
Pool-based Sampling
Image Source: https://www.datacamp.com/community/tutorials/active-learning
Stream-based Selective Sampling
● Assumption: Getting an unlabelled sample is free
● Samples arrive one at a time, and each is examined as it arrives.
● Based on the sample's informativeness score, the learner decides whether to query its label in this iteration or to discard it.
● The process is repeated over many iterations, one sample per iteration.
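A minimal sketch of stream-based selective sampling, assuming a fixed threshold on the informativeness score (least confidence over precomputed class probabilities) and a generator standing in for the stream. `sample_stream` and the simulated oracle are hypothetical; in practice the model would usually also be updated as newly labelled samples arrive, which this sketch leaves out.

```python
def sample_stream():
    """Hypothetical stream: yields (sample id, class probabilities) one by one."""
    yield ("a", (0.9, 0.1))
    yield ("b", (0.55, 0.45))
    yield ("c", (0.7, 0.3))
    yield ("d", (0.5, 0.5))


def least_confidence(entry):
    return 1.0 - max(entry[1])


def stream_selective_sampling(stream, score, oracle, threshold):
    """Examine each sample once; query the oracle only if it is informative enough."""
    labelled = []
    for entry in stream:
        if score(entry) > threshold:
            labelled.append((entry, oracle(entry)))  # label this sample
        # otherwise the sample is discarded without being labelled
    return labelled


queried = stream_selective_sampling(
    sample_stream(),
    least_confidence,
    oracle=lambda entry: entry[0].upper(),  # simulated oracle
    threshold=0.4,
)
print([entry[0] for entry, _ in queried])  # ['b', 'd']
```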
Query Strategies
There are many query strategies.[1][2] Here are three common
strategies:
● Least Confidence
● Margin Sampling
● Entropy Sampling
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
Query Strategies - Least Confidence
The sample whose most likely label has the lowest probability is chosen.
E.g. Let the samples in a dataset belong to three classes: A, B, C.
Sample S1 probabilities: A - 0.5, B - 0.25, C - 0.25; most likely label: A (0.5)
Sample S2 probabilities: A - 0.1, B - 0.8, C - 0.1; most likely label: B (0.8)
Here, S1 is chosen according to this query strategy, because the probability of its most likely label (0.5) is lower than that of S2 (0.8).
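The least-confidence example above can be checked with a short Python sketch; storing each sample's predicted probabilities in a dictionary is an assumption made for illustration.

```python
def least_confidence(probs):
    """Uncertainty = 1 - probability of the most likely label."""
    return 1.0 - max(probs.values())


def select_least_confident(samples):
    """Return the name of the sample with the highest least-confidence score."""
    return max(samples, key=lambda name: least_confidence(samples[name]))


samples = {
    "S1": {"A": 0.5, "B": 0.25, "C": 0.25},
    "S2": {"A": 0.1, "B": 0.8, "C": 0.1},
}
print(select_least_confident(samples))  # S1
```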
Query Strategies - Margin Sampling
The problem with Least Confidence (LC) is that it takes only the most likely label into account. Margin sampling therefore chooses the sample with the smallest difference between the probabilities of its most likely and second most likely labels.
E.g. Let the samples in a dataset belong to three classes: A, B, C.
Sample S1 probabilities: A - 0.5, B - 0.45, C - 0.05; margin: 0.5 - 0.45 = 0.05
Sample S2 probabilities: A - 0.3, B - 0.4, C - 0.3; margin: 0.4 - 0.3 = 0.1
Here, S1 is chosen according to margin sampling, since it has the smaller margin.
According to LC, S2 would be chosen, since its most likely label has the lower probability (0.4 < 0.5).
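A short Python sketch contrasting the two strategies on the samples above; the dictionary representation and the function names are illustrative assumptions.

```python
def margin(probs):
    """Difference between the two largest label probabilities."""
    top1, top2 = sorted(probs.values(), reverse=True)[:2]
    return top1 - top2


def select_margin(samples):
    """Margin sampling: pick the sample with the smallest margin."""
    return min(samples, key=lambda name: margin(samples[name]))


def select_least_confident(samples):
    """LC: pick the sample whose most likely label has the lowest probability."""
    return min(samples, key=lambda name: max(samples[name].values()))


samples = {
    "S1": {"A": 0.5, "B": 0.45, "C": 0.05},
    "S2": {"A": 0.3, "B": 0.4, "C": 0.3},
}
print(select_margin(samples))           # S1
print(select_least_confident(samples))  # S2
```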
Query Strategies - Entropy Sampling
LC takes only the most likely label into account, and margin sampling takes the two most likely labels into account. Entropy sampling uses the probabilities of all possible labels, combined into a single metric called entropy: H(x) = -Σ_y P(y|x) log P(y|x). The sample with the largest entropy is selected.
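A minimal sketch of entropy sampling on the two samples from the margin-sampling example, assuming dictionary-valued probabilities and the natural logarithm (the base of the logarithm does not change which sample is selected). S1 has an entropy of about 0.856 and S2 about 1.089, so entropy sampling selects S2 here.

```python
import math


def entropy(probs):
    """H = -sum over labels of p * log(p); labels with p = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def select_max_entropy(samples):
    """Entropy sampling: pick the sample with the largest entropy."""
    return max(samples, key=lambda name: entropy(samples[name]))


samples = {
    "S1": {"A": 0.5, "B": 0.45, "C": 0.05},
    "S2": {"A": 0.3, "B": 0.4, "C": 0.3},
}
for name, probs in samples.items():
    print(name, round(entropy(probs), 3))
print(select_max_entropy(samples))  # S2
```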
Looking back: Steps in Active Learning
● Train the model on the labelled data.
● Evaluate the model on all the unlabelled samples.
● Based on the evaluation, choose the sample/list of samples to be labelled.
● Label these samples and add them to the labelled data.
● Repeat the above steps until a stopping criterion is met.
Image Source: https://www.datacamp.com/community/tutorials/active-learning
Advantages of Active Learning
● Eases the problem of lack of labelled data; only a
fraction of the data has to be labelled
● Can be applied in online learning scenarios, which are common in practical industrial settings
Application Areas of Active Learning
● Natural Language Processing[1]
○ Lots of data to label
● Reinforcement Learning
● Online Learning
○ Large amounts of data arriving continuously
○ Spam filters, ranking of search results, job listings, etc.[2]
References
1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020.
2. https://www.oreilly.com/content/real-world-active-learning/
Active Learning - Visualizations
Iris Dataset (Source: https://www.kaggle.com/uciml/iris)
Versicolor - red
Virginica - cyan/light blue
[Figures: active learning visualizations on the Iris dataset]
Tutorial on Active Learning
Active Learning Tutorial - bit.ly/medium-active-learning
Images
1. Towards Data Science Logo - https://miro.medium.com/max/1200/1*F0LADxTtsKOgmPa-_7iUEQ.jpeg
2. Medium.com Logo - https://miro.medium.com/max/8978/1*s986xIGqhfsN8U--09_AdA.png
References
DataCamp - https://www.datacamp.com/community/tutorials/active-learning
Wikipedia - https://en.wikipedia.org/wiki/Active_learning_(machine_learning)
Thank you

AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 

Active learning

  • 1. Active Learning (ML). Akhilesh Ravi, Indian Institute of Technology Gandhinagar
  • 2. What is active learning? ● Suppose you have some data. ● You want to apply a machine learning technique to classify it. ● You have no labelled samples (or only a very small number of them) but a large amount of unlabelled data. ● Labelling each sample is expensive. ● What will you do?
  • 3. What is active learning? Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 4. What is active learning?
  • 5. What is active learning? “The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns.”[1] ● The learner selects which samples should be labelled, so that the model performs well with fewer labelled samples: it aims to choose the set of samples whose labels are most useful for good performance. ● In this sense the model learns actively, deciding what to learn next so that it can perform better. References: 1. Settles, Burr. Active learning literature survey. University of Wisconsin-Madison Department of Computer Sciences, 2009. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 6. Steps in Active Learning ● Train the model on the labelled data. ● Evaluate the model on all the unlabelled samples. ● Based on this evaluation, choose the sample (or batch of samples) to be labelled. ● Label these samples and add them to the labelled data. ● Repeat the above steps until a stopping criterion is met (e.g. the labelling budget is used up).
  • 7. An Important Component ● Oracle: a person or model that knows the correct answer (label, classification or prediction) for every query.[1][2] ● In practice, an expert in the relevant field acts as the oracle. 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
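The steps above can be sketched as a small pool-based loop. This is a minimal, hypothetical illustration and not the author's code: the toy 1-D classifier (`train_threshold_model`), the least-confidence score used for step 3, the lambda standing in for the oracle, and the fixed `budget` used as the stopping criterion are all assumptions chosen to keep the example self-contained.

```python
import math
from typing import Callable, List, Tuple

Labelled = List[Tuple[float, int]]


def train_threshold_model(labelled: Labelled) -> Callable[[float], List[float]]:
    """Hypothetical toy 1-D binary classifier: the decision threshold is the
    midpoint between the two class means. Assumes both classes are present
    and that class 0 lies to the left of class 1."""
    xs0 = [x for x, y in labelled if y == 0]
    xs1 = [x for x, y in labelled if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

    def predict_proba(x: float) -> List[float]:
        p1 = 1.0 / (1.0 + math.exp(-(x - threshold)))  # logistic curve centred on the threshold
        return [1.0 - p1, p1]

    return predict_proba


def active_learning_loop(labelled: Labelled, pool: List[float],
                         oracle: Callable[[float], int], budget: int):
    labelled, pool = list(labelled), list(pool)
    for _ in range(budget):  # stopping criterion (assumed): a fixed labelling budget
        if not pool:
            break
        model = train_threshold_model(labelled)                # 1. train on the labelled data
        scores = [1.0 - max(model(x)) for x in pool]           # 2. evaluate every unlabelled sample
        best = max(range(len(pool)), key=scores.__getitem__)   # 3. choose the least confident one
        x = pool.pop(best)
        labelled.append((x, oracle(x)))                        # 4. label it and add it to the labelled data
    return labelled, train_threshold_model(labelled)


# Hypothetical usage: the "oracle" is a function with a hidden class boundary at 6.5.
labelled, model = active_learning_loop(
    [(0.0, 0), (10.0, 1)], [float(v) for v in range(1, 10)],
    oracle=lambda x: int(x >= 6.5), budget=3)
print([x for x, _ in labelled[2:]])  # [5.0, 6.0, 7.0]
```

Each query lands closer to the hidden boundary, because the model keeps asking about the points it is least sure of.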
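When active learning methods are evaluated experimentally, the human oracle is often simulated with a dataset whose labels are already known but hidden from the learner. The class below is a hypothetical sketch of that idea, not part of the original slides. It answers queries from a lookup table and counts how many labels were requested, since each query stands for one costly expert annotation.

```python
class SimulatedOracle:
    """Hypothetical stand-in for a human expert: answers queries from a
    table of known labels and counts how many labels were requested."""

    def __init__(self, ground_truth: dict):
        self._truth = dict(ground_truth)
        self.queries = 0

    def label(self, sample_id):
        self.queries += 1  # every query stands for one (expensive) expert annotation
        return self._truth[sample_id]


oracle = SimulatedOracle({"S1": "A", "S2": "B"})
print(oracle.label("S1"), oracle.queries)  # A 1
```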
  • 8. Active learning - Scenarios
  • 9. Active Learning - Scenarios ● Membership Query Synthesis ● Pool-based Sampling ● Stream-based Selective Sampling
  • 10. Membership Query Synthesis ● The learner builds a distribution from the original data. ● The learner generates a new sample from this distribution. ● The oracle provides the label for this sample. ● The labelled sample is added to the dataset that the learner uses for training. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
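A minimal sketch of membership query synthesis for 1-D data, under a loudly stated assumption: the learner's "distribution made from the original data" is taken to be a single Gaussian fitted with the sample mean and standard deviation. The function name `synthesize_queries` and the threshold oracle are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Tuple


def synthesize_queries(data: List[float], n_queries: int,
                       oracle: Callable[[float], int],
                       seed: int = 0) -> List[Tuple[float, int]]:
    # Assumption: the learner models the data with a single 1-D Gaussian.
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    rng = random.Random(seed)  # seeded so the synthesized queries are reproducible
    new_samples = []
    for _ in range(n_queries):
        x = rng.gauss(mu, sigma)            # generate a new instance from the distribution
        new_samples.append((x, oracle(x)))  # the oracle labels it
    return new_samples


# Hypothetical oracle: class 1 above 2.5.
queries = synthesize_queries([1.0, 2.0, 3.0, 4.0], n_queries=5,
                             oracle=lambda x: int(x > 2.5))
```

One known drawback of this scenario is that synthesized instances need not resemble real data, so a human oracle may find them hard to label.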
  • 11. Membership Query Synthesis Image Source: https://www.statisticsfromatoz.com/uploads/7/3/2/1/73216723/discrete-and-cont-distributions_orig.png
  • 12. Pool-based Sampling ● There is a pool of unlabelled samples. ● An informativeness score is assigned to every sample in the pool, or to a subset of the pool if the pool is very large. ● The most informative sample(s) are selected. ● Depending on the configuration, either one sample or a small batch of samples is chosen in each iteration. ● These are labelled and added to the dataset that the learner uses for training. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
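The selection step of pool-based sampling can be sketched as "score the pool (or a random subset of it), then take the top k". This is a hypothetical helper, not a specific library's API; the `informativeness` function below is a placeholder standing in for a real query strategy such as least confidence.

```python
import heapq
import random
from typing import Callable, List, Optional


def select_from_pool(pool: List, score: Callable[[object], float], k: int = 1,
                     subsample: Optional[int] = None, seed: int = 0) -> List:
    candidates = list(pool)
    # For a very large pool, score only a random subset of it.
    if subsample is not None and subsample < len(candidates):
        candidates = random.Random(seed).sample(candidates, subsample)
    # Return the k most informative candidates, highest score first.
    return heapq.nlargest(k, candidates, key=score)


def informativeness(x: float) -> float:
    # Placeholder score (assumption): samples near 4.2 are the most informative.
    return -abs(x - 4.2)


print(select_from_pool(list(range(10)), informativeness, k=2))  # [4, 5]
```

Setting `k=1` gives the one-sample-per-iteration configuration; a larger `k` gives batch selection.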
  • 13. Pool-based Sampling Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 14. Stream-based Selective Sampling ● Assumption: obtaining an unlabelled sample is free. ● Samples arrive one at a time and are examined individually. ● Based on an informativeness score for each sample, the learner decides whether that sample should be labelled or discarded. ● This is repeated over many iterations. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
  • 16. Query Strategies There are many query strategies.[1][2] Here are three common strategies: ● Least Confidence ● Margin Sampling ● Entropy Sampling 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
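A minimal sketch of stream-based selective sampling: each incoming sample is scored and either sent to the oracle or discarded. The least-confidence score, the fixed `threshold`, and the toy model `P(class 1 | x) = x` are assumptions for illustration; in practice the model would usually be retrained as new labels arrive, which this sketch omits.

```python
from typing import Callable, Iterable, List, Tuple


def stream_selective_sampling(stream: Iterable[float],
                              predict_proba: Callable[[float], List[float]],
                              oracle: Callable[[float], int],
                              threshold: float) -> List[Tuple[float, int]]:
    labelled = []
    for x in stream:                                   # samples arrive one by one
        informativeness = 1.0 - max(predict_proba(x))  # least-confidence score
        if informativeness >= threshold:
            labelled.append((x, oracle(x)))            # informative enough: query the oracle
        # otherwise the sample is discarded without being labelled
    return labelled


# Hypothetical model: P(class 1 | x) = x for x in [0, 1].
picked = stream_selective_sampling([0.1, 0.45, 0.9, 0.55],
                                   predict_proba=lambda x: [1 - x, x],
                                   oracle=lambda x: int(x > 0.5),
                                   threshold=0.4)
print(picked)  # [(0.45, 0), (0.55, 1)]
```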
  • 17. Query Strategies - Least Confidence (LC) The sample whose most likely label has the lowest probability is chosen. E.g., let the samples in a dataset belong to three classes: A, B, C. Sample S1 probabilities: A - 0.5, B - 0.25, C - 0.25; most likely label: A (0.5). Sample S2 probabilities: A - 0.1, B - 0.8, C - 0.1; most likely label: B (0.8). Here, S1 is chosen under this query strategy, since 0.5 < 0.8. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
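The least-confidence example on this slide can be reproduced in a few lines. The score `1 - P(most likely label)` is one common way to express LC so that "higher means more uncertain"; the function name is an assumption for illustration.

```python
def least_confidence(probs):
    """Uncertainty score: 1 - P(most likely label). Higher means less confident."""
    return 1.0 - max(probs)


# Class probabilities (A, B, C) for S1 and S2 from the slide's example.
samples = {"S1": [0.5, 0.25, 0.25], "S2": [0.1, 0.8, 0.1]}
chosen = max(samples, key=lambda s: least_confidence(samples[s]))
print(chosen)  # S1
```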
  • 18. Query Strategies - Margin Sampling The problem with Least Confidence (LC) is that it takes only the most likely label into account. Margin sampling instead chooses the sample with the smallest difference between the probabilities of its most likely and second most likely labels. E.g., let the samples in a dataset belong to three classes: A, B, C. Sample S1 probabilities: A - 0.5, B - 0.45, C - 0.05; margin = 0.5 - 0.45 = 0.05. Sample S2 probabilities: A - 0.3, B - 0.4, C - 0.3; margin = 0.4 - 0.3 = 0.1. Here, S1 is chosen by margin sampling, whereas LC would choose S2 (its most likely label has probability 0.4 < 0.5). References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
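A short sketch comparing margin sampling with least confidence on the two samples from this slide. The helper names are assumptions; the probabilities are the slide's own. Note that the smallest margin is selected, while LC selects the largest `1 - max(p)`.

```python
def margin(probs):
    """Difference between the two highest class probabilities (smaller = more uncertain)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def least_confidence(probs):
    return 1.0 - max(probs)


samples = {"S1": [0.5, 0.45, 0.05], "S2": [0.3, 0.4, 0.3]}
by_margin = min(samples, key=lambda s: margin(samples[s]))        # smallest margin wins
by_lc = max(samples, key=lambda s: least_confidence(samples[s]))  # least confident wins
print(by_margin, by_lc)  # S1 S2
```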
  • 19. Query Strategies - Entropy Sampling Least Confidence (LC) takes only the most likely label into account, and margin sampling takes the two most likely labels into account. Entropy sampling uses the probabilities of all possible labels, combined into a single metric called entropy. The sample with the largest entropy is selected. References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. "Active Learning (Machine Learning)". En.Wikipedia.Org, 2020, https://en.wikipedia.org/wiki/Active_learning_(machine_learning)#cite_note-settles-1. Accessed 31 Oct 2020.
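Entropy here is the Shannon entropy of the predicted class distribution, H(x) = -Σ_i P(y_i | x) log P(y_i | x). The sketch below applies it to the same two samples used in the margin-sampling example (an illustrative reuse, not from the slides). With the natural log, S1 scores about 0.856 and S2 about 1.089, so entropy sampling picks S2, agreeing with LC rather than margin sampling on this pair. The base of the logarithm rescales all scores equally, so it does not change which sample is chosen.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


samples = {"S1": [0.5, 0.45, 0.05], "S2": [0.3, 0.4, 0.3]}
scores = {s: entropy(p) for s, p in samples.items()}
chosen = max(scores, key=scores.get)  # the sample with the largest entropy is selected
print(chosen)  # S2
```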
  • 20. Looking back: Steps in Active Learning ● Train the model on the labelled data. ● Evaluate the model on all the unlabelled samples. ● Based on this evaluation, choose the sample (or batch of samples) to be labelled. ● Label these samples and add them to the labelled data. ● Repeat the above steps until a stopping criterion is met. Image Source: https://www.datacamp.com/community/tutorials/active-learning
  • 21. Advantages of Active Learning ● Eases the shortage of labelled data: only a fraction of the data has to be labelled. ● Can be applied in online learning scenarios; many practical industrial applications involve online learning.
  • 22. Application Areas of Active Learning ● Natural Language Processing[1] ○ Lots of data to label ● Reinforcement Learning ● Online Learning ○ Lots of data arriving continuously ○ Spam filters, ranking of search results, job listings, etc.[2] References 1. Hosein, Stefan. "A Beginner's Guide To Active Learning". DataCamp Community, 2020, https://www.datacamp.com/community/tutorials/active-learning. Accessed 31 Oct 2020. 2. https://www.oreilly.com/content/real-world-active-learning/
  • 23. Active learning - Visualizations
  • 24. Active Learning - Visualizations Iris Dataset (Versicolor: red; Virginica: cyan/light blue). Source: https://www.kaggle.com/uciml/iris
  • 25. Active Learning - Visualizations
  • 26. Active Learning - Visualizations
  • 27. Active Learning - Visualizations
  • 28. Active Learning - Visualizations
  • 29. Active Learning - Visualizations
  • 30. Active Learning - Visualizations
  • 31. Active Learning - Visualizations
  • 32. Active Learning - Visualizations
  • 33. Tutorial on Active Learning Active Learning Tutorial - bit.ly/medium-active-learning Images 1. Towards Data Science Logo - https://miro.medium.com/max/1200/1*F0LADxTtsKOgmPa-_7iUEQ.jpeg 2. Medium.com Logo - https://miro.medium.com/max/8978/1*s986xIGqhfsN8U--09_AdA.png