SMU Classification: Restricted
1. Yan, Y., Rosales, R., Fung, G., & Dy, J. G. (2011). Active learning from crowds. In ICML (Vol. 11, pp. 1161–1168).
2. Bi, W., Wang, L., Kwok, J. T., & Tu, Z. (2014). Learning to Predict from Crowdsourced Data. In UAI (pp. 82–91).
3. Rodrigues, F., Lourenco, M., Ribeiro, B., & Pereira, F. C. (2017). Learning Supervised Topic Models for
Classification and Regression from Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence,
39(12), 2409–2422.
• Most research on supervised learning techniques relies on an
often overlooked assumption: that a single domain expert
can provide the required supervision
• Crowdsourcing
- Quality: Mixture of experts and non-experts, annotators having
different expertise
- Inference: Truth inference from noisy labels
- Budget: How to collect enough useful labels before running
out of budget?
• Motivation behind Crowdsourcing
- It is difficult to collect a single golden ground-truth in some
problem domains
- It is often the case that an annotator does not have the
appropriate knowledge for annotating all the data, even for a
particular domain
- In many instances, collecting annotations from multiple non-
expert annotators can be less costly than collecting
annotations just from one expert
- Collaboration and knowledge sharing are becoming more
common, and thus technology for combining multiple opinions
will become necessary
• In many learning tasks the labeled data is limited in quantity
or expensive to obtain, but the amount of unlabeled data is
large or easy to obtain
• Try to learn the most at a given cost
- Identify the most useful data point to label given the
information obtained
- Identify the most useful annotator
• Active Learning from Crowds and Extensions
- Simple Ground Truth Inference
- Learn the prediction model at the same time
- Extend existing models to the active learning from crowds
scenario
• Sometimes the annotator may not have the knowledge to
label the data accurately
- The annotation may come from observation of the input
data, not from the underlying ground truth
• Goal
- Actively collect ground truth from the worker, and learn a
prediction model
• Probabilistic Multi-Labeler Model
- $N$ data points $\{x_1, x_2, \ldots, x_N\}$ from input space $\mathcal{X}$
- The label for the $i$-th data point by annotator $t$ is $y_i^{(t)}$ from label space $\mathcal{Y}$
- The unknown ground truth for the $i$-th data point is $z_i$ from output space $\mathcal{Z}$
- All $z$ and some of $y$ are unobservable
• Model Definition
- The classifier is trained by assuming a probabilistic model
over random variables $x$, $y$, and $z$,
where $T_i$ is the set of annotators for the $i$-th data point
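The formula on this slide did not survive extraction. A factorization consistent with the surrounding definitions (each true label depends on its input, and each annotation depends on the input and the true label) might look like the following. This is a sketch under the assumption that annotators label independently given $x_i$ and $z_i$; the symbols $x_i$, $y_i^{(t)}$, $z_i$ and $T_i$ follow the definitions in the text, and $N$ is the number of data points.

```latex
p(Y, Z \mid X) \;=\; \prod_{i=1}^{N} p(z_i \mid x_i) \prod_{t \in T_i} p\!\left(y_i^{(t)} \mid x_i, z_i\right)
```

Here $p(z_i \mid x_i)$ is the classifier and $p(y_i^{(t)} \mid x_i, z_i)$ is the per-annotator noise model described on the following slides.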
• Model Definition
- We could use a Gaussian model for the annotation,
where the variance depends on the input $x$ and is specific to
each annotator $t$
For binary classification, the variance is a logistic function of
the input and the annotator
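As a minimal sketch of the Gaussian annotator model described above: the annotation is centred on the true label, and the variance is a logistic function of the input with annotator-specific parameters. The parameter names `w_t` and `gamma_t` and the exact parameterization are assumptions for illustration; they are not recovered from the slide.

```python
import math

def sigmoid(a):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def annotator_variance(x, w_t, gamma_t):
    """Input-dependent variance for annotator t (hypothetical parameters w_t, gamma_t)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w_t, x)) + gamma_t)

def gaussian_label_density(y, z, x, w_t, gamma_t):
    """Density of annotation y given true label z under a Gaussian centred on z."""
    var = annotator_variance(x, w_t, gamma_t)
    return math.exp(-(y - z) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

x = [1.0, -2.0]
# Two hypothetical annotators: the first has low variance at x, the second high.
low_var = annotator_variance(x, w_t=[-1.0, 1.0], gamma_t=0.0)
high_var = annotator_variance(x, w_t=[1.0, -1.0], gamma_t=0.0)
```

An annotator with lower variance at `x` puts more density on the true label, which is what later makes "minimal variance" a usable criterion for choosing an annotator.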
• Model Definition
- We could use a Bernoulli model,
where $\eta_t(x)$ is also a logistic function of the input and the
labeler identity $t$
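The Bernoulli variant can be sketched in the same way for binary labels: with some input-dependent probability the annotator reports the true label, otherwise the opposite one. The parameter names `w_t` and `gamma_t` are again assumptions for illustration, as is the reading of the logistic function as the probability of being correct.

```python
import math

def sigmoid(a):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def prob_correct(x, w_t, gamma_t):
    """Probability that annotator t labels input x correctly (hypothetical parameters)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w_t, x)) + gamma_t)

def bernoulli_label_prob(y, z, x, w_t, gamma_t):
    """P(annotation = y | true label = z, input = x) for binary y and z."""
    eta = prob_correct(x, w_t, gamma_t)
    return eta if y == z else 1.0 - eta

x = [2.0, 0.5]
p_agree = bernoulli_label_prob(1, 1, x, w_t=[1.0, 0.0], gamma_t=0.0)
p_flip = bernoulli_label_prob(0, 1, x, w_t=[1.0, 0.0], gamma_t=0.0)
```

The two probabilities sum to one for any input, and an annotator with all-zero parameters is right half the time.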
• Model Definition
- The Gaussian model allows for assigning a lower variance to input
regions where the labeler is more consistently correct relative
to areas where there are inconsistencies
- The Bernoulli model assigns a higher probability of the labeler
being correct to certain input areas relative to other areas
• Model Definition
- Because the task is binary classification, the classifier
$p(z = 1 \mid x)$ is a logistic regression function
• Optimally Selecting New Training Points and Annotators
- Pick a new training point to be labeled
- Pick an appropriate labeler among all available labelers
• To find the least confident data point
- Look for candidate samples for which the probability $p(z = 1 \mid x)$
is close to $\frac{1}{2}$
• To find the most confident annotator given a data point
- Recall the aforementioned variance formula
- Find the annotator with minimal variance
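The two selection steps above can be sketched over a finite pool of candidates. The pool, the classifier parameters `alpha` and `beta`, and the annotator parameters `(w_t, gamma_t)` are all assumptions for illustration; the slides do not say how candidates are generated or how the search is carried out.

```python
import math

def sigmoid(a):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def least_confident_point(candidates, alpha, beta):
    """Index of the candidate whose predicted probability is closest to 1/2."""
    return min(
        range(len(candidates)),
        key=lambda i: abs(sigmoid(dot(alpha, candidates[i]) + beta) - 0.5),
    )

def most_confident_annotator(x, annotators):
    """Index of the annotator with minimal logistic variance at input x.

    annotators is a list of hypothetical (w_t, gamma_t) pairs.
    """
    return min(
        range(len(annotators)),
        key=lambda t: sigmoid(dot(annotators[t][0], x) + annotators[t][1]),
    )

candidates = [[3.0, 0.0], [0.1, 0.0], [-2.0, 1.0]]
alpha, beta = [1.0, 1.0], 0.0
i = least_confident_point(candidates, alpha, beta)

annotators = [([1.0, 0.0], 0.0), ([-1.0, 0.0], 0.0)]
t = most_confident_annotator(candidates[i], annotators)
```

In this toy pool the second candidate lies closest to the decision boundary, and the second annotator has the smaller variance there, so the query goes to that pair.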
• Document Classification Task
- Binary Classification
• Workers’ quality can vary drastically, leading to different
noise levels in their annotations
- The worker might not be an expert
- The worker’s default label judgement may be incorrect
- Different labeling tasks can have different difficulties
- The worker may not be dedicated to the task
• Worker’s decision process:
- If the worker is dedicated to the labeling task or considers
the sample easy, the label is generated
according to the worker’s underlying decision function
- Otherwise, the label is generated based on the worker’s default labeling
judgement
• The task is a binary classification problem with:
- $T$ workers
- $N$ query samples
- The $i$-th sample $x^{(i)}$ is annotated by the set of workers $S_i \subseteq \{1, 2, \ldots, T\}$
- The annotation by the $t$-th worker is $y_t^{(i)} \in \{0, 1\}$
- The ground truth $y^{*(i)} \in \{0, 1\}$ is generated by a logistic
regression model with parameters $w^*$
• Reasons that an annotator gives an incorrect label:
1. The annotator is dedicated to the task, but their expertise is not
strong enough
Worker $t$’s annotation follows a Bernoulli distribution,
where $w_t$ is $t$’s estimate of $w^*$
A small $\sigma_t$ suggests that $w_t$ is very similar to $w^*$
-> worker $t$ has high accuracy
• Reasons that an annotator gives an incorrect label:
2. The annotator is not dedicated to the task and annotates
randomly according to some default judgement
Worker $t$’s annotation follows a Bernoulli distribution,
where $b_t \in [0, 1]$
• Combining the two reasons:
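The combined formula did not survive extraction. One way to read the combination, as a sketch only, is a two-component mixture: with some probability the worker applies their own decision function, otherwise they fall back on their default judgement. The function and parameter names below (`w_t` for the worker's decision weights, `b_t` for the default probability of answering 1, `uses_decision_function` for the mixing probability) are assumptions for illustration.

```python
import math

def sigmoid(a):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def prob_label_one(x, w_t, b_t, uses_decision_function):
    """P(worker t answers 1 for input x) under a two-component mixture.

    uses_decision_function: probability in [0, 1] that the worker applies
    their own decision function instead of the default judgement b_t.
    """
    p_decision = sigmoid(sum(wi * xi for wi, xi in zip(w_t, x)))
    return uses_decision_function * p_decision + (1.0 - uses_decision_function) * b_t

x = [2.0, 0.0]
w_t = [1.0, 0.0]
dedicated = prob_label_one(x, w_t, b_t=0.3, uses_decision_function=1.0)
careless = prob_label_one(x, w_t, b_t=0.3, uses_decision_function=0.0)
mixed = prob_label_one(x, w_t, b_t=0.3, uses_decision_function=0.5)
```

A fully dedicated worker reduces to the logistic decision function, a fully careless one to the default judgement, and anything in between interpolates linearly.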
• A sample’s difficulty for an annotator affects label quality significantly:
- Each sample $x^{(i)}$ has a difficulty score specific to annotator $t$;
if $x^{(i)}$ is difficult for $t$, the score is close to 0
- A sample is difficult if it is close to the worker’s decision
boundary
- A small sensitivity parameter makes an easy sample (one with a large distance to the
boundary) seem difficult to the worker
- The score combines two quantities: the distance to the boundary and the worker’s
sensitivity to sample difficulty
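The exact functional form is not in the extracted slide. The sketch below uses an assumed form, `exp(-1 / (sensitivity * distance^2))`, chosen only because it has the qualitative properties described: the score goes to 0 near the decision boundary, and a small sensitivity pushes it toward 0 even for samples far from the boundary. The names `difficulty_score`, `sensitivity` and `w_t` are hypothetical.

```python
import math

def distance_to_boundary(x, w_t):
    """Euclidean distance from x to the hyperplane w_t . x = 0 (w_t must be non-zero)."""
    norm = math.sqrt(sum(wi * wi for wi in w_t))
    return abs(sum(wi * xi for wi, xi in zip(w_t, x))) / norm

def difficulty_score(x, w_t, sensitivity):
    """Score in [0, 1); close to 0 when x is difficult for the worker.

    Assumed form for illustration; sensitivity must be positive.
    """
    d = distance_to_boundary(x, w_t)
    if d == 0.0:
        return 0.0  # a sample exactly on the boundary is maximally difficult
    return math.exp(-1.0 / (sensitivity * d * d))

w_t = [1.0, 0.0]
near = difficulty_score([0.1, 0.0], w_t, sensitivity=1.0)
far = difficulty_score([5.0, 0.0], w_t, sensitivity=1.0)
far_low_sensitivity = difficulty_score([5.0, 0.0], w_t, sensitivity=0.01)
```

With these toy values the sample near the boundary scores almost 0, the distant sample scores close to 1, and shrinking the sensitivity drags the distant sample's score back toward 0.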
• Graphical model (figure), with annotations:
- Accuracy of the worker
- Whether the worker is dedicated to the task
- Sensitivity of the worker to the difficulty of the task
- Difficulty of the sample to the worker
- Ground truth is generated by a logistic regression with these parameters
- $w^*$ is drawn from this prior
- Worker’s estimate of $w^*$
• Baselines
- MTL: the prediction model is the average of all workers’ models
- RY: coin flipping decides whether an annotation comes from
bias or ground truth
- YAN: active learning from crowds
- GLAD: considers sample difficulty and workers’ expertise
- CUBAM: considers workers’ expertise and bias
- MV: majority vote
Algorithm learns a prediction model
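Of the baselines listed, majority vote (MV) is simple enough to sketch directly: each sample's label is whichever value most of its annotators chose. The tie-breaking rule (ties resolve to 0) is an assumption made here; the slides do not specify one.

```python
def majority_vote(labels_per_sample):
    """Aggregate binary annotations per sample by majority vote.

    labels_per_sample: list of non-empty lists of 0/1 annotations,
    one inner list per sample. Ties resolve to 0 (an assumption).
    """
    return [
        1 if 2 * sum(labels) > len(labels) else 0
        for labels in labels_per_sample
    ]

annotations = [
    [1, 1, 0],  # two of three workers say 1
    [0, 0, 1],  # two of three workers say 0
    [1, 0],     # tie
]
aggregated = majority_vote(annotations)
```

Majority vote treats every worker as equally reliable, which is the assumption the worker-quality models in this deck are designed to relax.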
• Topic model (figure), with annotations:
- Word
- Topic
- Topic-document distribution
- Topic-word distribution
- Number of documents
- Number of words
- Prior
• Annotation model (figure), with annotations:
- Latent class (truth)
- Label (annotation)
- Reliability of worker
- Number of workers
- Number of classes
• Other than binary classification
- Multi-Label Learning from Crowds
• Level of confidence
- Active Learning from Crowds with Unsure Option
- Active Learning with Confidence-based Answers for
Crowdsourcing Labelling Tasks
• More complicated models:
- Gaussian Process Classification and Active Learning with
Multiple Annotators
- Deep Learning from Crowds
• Crowdsourcing can be very helpful when performing out-of-sample
prediction
• Existing models can be extended to the
crowdsourcing scenario
Active learning from crowds

  • 1. SMU Classification: Restricted 1. Yan, Y., Rosales, R., Fung, G., & Dy, J. G. (2011). Active learning from crowds. In ICML (Vol. 11, pp. 1161–1168). 2. Bi, W., Wang, L., Kwok, J. T., & Tu, Z. (2014). Learning to Predict from Crowdsourced Data. In UAI (pp. 82–91). 3. Rodrigues, F., Lourenco, M., Ribeiro, B., & Pereira, F. C. (2017). Learning Supervised Topic Models for Classification and Regression from Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2409–2422.
  • 3. SMU Classification: Restricted 2 • Most research on supervised learning techniques rely on an often overlooked assumption that a single domain expert can provide the required supervision • Crowdsourcing - Quality: Mixture of experts and non-experts, annotators having different expertise - Inference: Truth inference from noisy labels - Budget: How to collect enough useful labels before running out of budget?
  • 4. SMU Classification: Restricted 3 • Motivation behind Crowdsourcing - It is difficult to collect a single golden ground-truth in some problem domains - It is often the case that an annotator does not have the appropriate knowledge for annotating all the data, even for a particular domain - In many instances, collecting annotations from multiple non- expert annotators can be less costly than collecting annotations just from one expert - Collaboration and knowledge sharing is becoming more common, and thus technology for combining multiple opinions will become necessary
  • 5. SMU Classification: Restricted 4 • In many learning tasks the labeled data is limited in quantity or expensive to obtain, but the amount of unlabeled data is large or easy to obtain • Try to learn the most at a given cost - Identify the most useful data point to label given the information obtained - Identify the most useful annotator
  • 6. SMU Classification: Restricted 5 • Active Learning from Crowds and Extensions - Simple Ground Truth Inference - Learn the prediction model at the same time - Extend some existed model to the active learning from crowds scenario
  • 7. SMU Classification: Restricted 6 • Sometimes the annotator may not have the knowledge to label the data accurately - The annotation may comes from the observation of the input data, not the underlying ground truth • Goal - Actively collect ground truth from the worker, and learn a prediction model
  • 8. SMU Classification: Restricted 7 • Probabilistic Multi-Labeler Model - ! data points {#$, #&, … , #!} from input space ) - The label for the *-th data point by annotator + is ,* (+) from label space / - The unknown ground truth for the *-th data point is 0* from output space 1 - All 0 and some of , are unobservable
  • 9. SMU Classification: Restricted 8 • Model Definition - The classifier is trained by assuming a probabilistic model over random variables !, ", and # where $% & is the set of annotators for %-th data point
  • 10. SMU Classification: Restricted 9 • Model Definition - - We could use a Gaussian model: where the variance depends on the input ! and is specific to each annotator " For binary classification, the variance is a logistic function of input and annotator
  • 11. SMU Classification: Restricted 10 • Model Definition - - We could use a Bernoulli model: where !"($) is also a logistic function of the input and the labeler identity "
  • 12. SMU Classification: Restricted 11 • Model Definition - - Gaussian model allows for assigning a lower variance to input regions where the labeler is more consistently correct relative to areas where there are inconsistencies - Bernoulli model assigns a higher probability of the labeler being correct to certain input areas relative to other areas
  • 13.
    • Model Definition
      - Because the task is binary classification, a logistic regression function is used for $p(z \mid x)$
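A standard logistic regression form for the prediction model is shown below as a sketch; the parameter names $\alpha$ (weight vector) and $\beta$ (bias) are assumptions and may not match the slide's notation.

```latex
p(z_i = 1 \mid x_i) = \frac{1}{1 + \exp\left(-\alpha^{\top} x_i - \beta\right)}
```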
  • 14.
    • Optimally Selecting New Training Points and Annotators
      - Pick a new training point to be labeled
      - Pick an appropriate labeler among all available labelers
    • To find the least confident data point
      - Choose the candidate samples for which $p(z = 1 \mid x)$ is close to $\frac{1}{2}$
    • To find the most confident annotator for a given data point
      - Recall the variance formula from the Gaussian model and find the annotator with minimal variance
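The two selection rules above can be sketched as follows. This is a simplified two-step illustration under stated assumptions: the prediction model is a logistic regression with hypothetical parameters `alpha` and `beta`, and each annotator's variance is a logistic function with hypothetical parameters `(w_t, gamma_t)`.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def dot(u, v) -> float:
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def select_point(candidates, alpha, beta):
    """Return the index of the candidate whose p(z = 1 | x) is closest to 1/2."""
    def gap(i):
        p = sigmoid(dot(alpha, candidates[i]) + beta)
        return abs(p - 0.5)
    return min(range(len(candidates)), key=gap)


def select_annotator(x, annotators):
    """Return the annotator with minimal variance at input x.

    annotators maps an annotator id to its (w_t, gamma_t) parameters.
    """
    def variance(t):
        w_t, gamma_t = annotators[t]
        return sigmoid(dot(w_t, x) + gamma_t)
    return min(annotators, key=variance)


candidates = [[3.0, 0.0], [0.1, 0.0], [-2.0, 0.0]]
chosen = select_point(candidates, alpha=[1.0, 0.0], beta=0.0)
annotators = {"a": ([1.0, 0.0], 0.0), "b": ([-1.0, 0.0], 0.0)}
labeler = select_annotator(candidates[chosen], annotators)
```

Here the second candidate is chosen because its predicted probability is nearest to 1/2, and annotator `"b"` is chosen because its modeled variance at that point is the smaller of the two.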
  • 15.
    • Document Classification Task
      - Binary classification
  • 16.
    • Workers' quality can vary drastically, leading to different noise levels in their annotations
      - The worker might not be an expert
      - The worker's default label judgement may be incorrect
      - Different labeling tasks can have different difficulties
      - The worker may not be dedicated to the task
    • Worker's decision process:
      - If the worker is dedicated to the labeling task, or considers the sample easy, the label is generated according to the worker's underlying decision function
      - Otherwise, the label is generated from the worker's default labeling judgement
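The decision process above can be simulated with a short sketch. The names `w_t` (the worker's own decision function), `default_prob` (the worker's default probability of answering 1), and `p_use_own_function` (the chance that the worker is dedicated or finds the sample easy) are assumptions introduced for illustration only.

```python
import math
import random


def sigmoid(a: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def dot(u, v) -> float:
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def worker_label(x, w_t, default_prob, p_use_own_function, rng):
    """Simulate one binary annotation from a single worker.

    With probability p_use_own_function the worker applies their own
    decision function sigmoid(w_t . x); otherwise the label is drawn
    from the worker's default judgement, Bernoulli(default_prob).
    """
    if rng.random() < p_use_own_function:
        p_one = sigmoid(dot(w_t, x))
    else:
        p_one = default_prob
    return 1 if rng.random() < p_one else 0
```

Setting `p_use_own_function` close to 0 reproduces a careless worker whose answers ignore the input entirely.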
  • 17.
    • The task is a binary classification problem with:
      - $T$ workers
      - $N$ query samples
      - The $i$-th sample $x^{(i)}$ is annotated by a subset of the workers $\{1, 2, \ldots, T\}$
      - The annotation by the $t$-th worker is $y_t^{(i)} \in \{0, 1\}$
      - The ground truth $y^{*(i)} \in \{0, 1\}$ is generated by a logistic regression model with parameters $w^*$
  • 18.
    • Reasons that an annotator gives an incorrect label:
      1. The annotator is dedicated to the task, but their expertise is not strong enough
         - Worker $t$'s annotation follows a Bernoulli distribution, where $w_t$ is $t$'s estimate of $w^*$
         - A small $\sigma_t$ suggests that $w_t$ is very similar to $w^*$, so worker $t$ has high accuracy
  • 19.
    • Reasons that an annotator gives an incorrect label:
      2. The annotator is not dedicated to the task and annotates randomly according to some default judgement
         - Worker $t$'s annotation follows a Bernoulli distribution, where $b_t \in [0, 1]$
    • The full model combines the two reasons
  • 20.
    • The difficulty of a sample to an annotator affects annotation quality significantly:
      - The difficulty of the $i$-th sample $x^{(i)}$ to annotator $t$ is captured by a per-worker, per-sample score; if $x^{(i)}$ is difficult for $t$, the score is close to 0
      - A sample is difficult if it is close to the worker's decision boundary
      - A small sensitivity parameter $\lambda_t$ makes even an easy sample (one with a large distance to the boundary) seem difficult to the worker
      - (Equation annotations: distance to the boundary; sensitivity to sample difficulty)
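The interaction between distance to the boundary and worker sensitivity can be illustrated with a small sketch. The functional form `1 - exp(-lam_t * distance)` is an assumption chosen only because it has the properties described above (near 0 for samples close to the boundary or for a small sensitivity); it is not taken from the slide, and `ease_score`, `w_t`, and `lam_t` are hypothetical names.

```python
import math


def distance_to_boundary(x, w_t) -> float:
    """Euclidean distance from x to the hyperplane w_t . x = 0."""
    norm = math.sqrt(sum(w * w for w in w_t))
    score = sum(wi * xi for wi, xi in zip(w_t, x))
    return abs(score) / norm


def ease_score(x, w_t, lam_t) -> float:
    """Illustrative score in [0, 1): close to 0 means the sample is
    difficult for the worker, close to 1 means it is easy."""
    return 1.0 - math.exp(-lam_t * distance_to_boundary(x, w_t))


far_sample = [3.0, 4.0]       # distance 3 from the boundary of w_t = [1, 0]
boundary_sample = [0.0, 5.0]  # lies exactly on the boundary
sensitive = ease_score(far_sample, [1.0, 0.0], lam_t=1.0)
insensitive = ease_score(far_sample, [1.0, 0.0], lam_t=0.01)
on_boundary = ease_score(boundary_sample, [1.0, 0.0], lam_t=1.0)
```

With a small `lam_t`, the far sample still receives a score near 0, which matches the statement that a small sensitivity makes an easy sample seem difficult.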
  • 21. [Figure: graphical model. Annotations: accuracy of the worker; whether the worker is dedicated to the task; sensitivity of the worker to the difficulty of the task; difficulty of the sample to the worker; ground truth is generated by a logistic regression with these parameters; w* is drawn from this prior; worker's estimate of w*]
  • 22.
    • Baselines
      - MTL: the prediction model is the average of all workers' models
      - RY: a coin flip decides whether an annotation comes from the worker's bias or from the ground truth
      - YAN: active learning from crowds
      - GLAD: considers sample difficulty and workers' expertise
      - CUBAM: considers workers' expertise and bias
      - MV: majority vote
    • Algorithm learns a prediction model
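Of the baselines listed, majority vote is the simplest to make concrete. The sketch below is a minimal illustration; the input layout (a dictionary from sample id to a list of labels) and the tie-breaking rule (the smallest label wins) are assumptions made for the example.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate crowd labels by taking the most frequent label per sample.

    annotations maps a sample id to the list of labels it received.
    Ties are broken in favor of the smallest label (an assumption).
    """
    result = {}
    for sample_id, labels in annotations.items():
        counts = Counter(labels)
        # sorted() puts smaller labels first; max() keeps the first maximum.
        result[sample_id] = max(sorted(counts), key=counts.get)
    return result


votes = {"d1": [1, 1, 0], "d2": [0, 0, 1], "d3": [0, 1]}
aggregated = majority_vote(votes)
```

Majority vote treats every worker as equally reliable, which is the assumption the other baselines relax.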
  • 23. [Figure: topic model plate diagram. Labels: word; topic; topic-document distribution; topic-word distribution; number of documents; number of words; prior]
  • 24. [Figure: annotation model plate diagram. Labels: latent class (truth); label (annotation); reliability of worker; number of workers; number of classes]
  • 25.
    • Beyond binary classification
      - Multi-Label Learning from Crowds
    • Level of confidence
      - Active Learning from Crowds with Unsure Option
      - Active Learning with Confidence-based Answers for Crowdsourcing Labelling Tasks
    • More complicated models
      - Gaussian Process Classification and Active Learning with Multiple Annotators
      - Deep Learning from Crowds
  • 26.
    • Crowdsourcing can be very helpful when performing out-of-sample prediction
    • Existing models can be extended to the crowdsourcing scenario