Active Content-Based
Crowdsourcing Task
Selection
Piyush Bansal, Carsten Eickhoff, Thomas Hofmann
ETH Zurich
1
Outline
● Past work
○ Exploiting Document content for vote aggregation
● Ongoing extensions
○ Crowdsourcing under extreme budget constraints
○ Information theoretic approaches
○ Experiments and results
○ Conclusion
2
State of the Art
● Crowdsourced relevance assessment is cheap and effective
● Quality control via redundancy yields strong performance
● Untapped source of information: document content
● Key idea: Locality of relevance
Davtyan et al. 2015: Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes
3
● Clustering Hypothesis for relevance assessment
4
Methods
● (informal) Problem statement: Given a set of relevance assessments,
how accurately can we infer the relevance of unjudged Web pages?
○ Solution ideas:
■ Assign same relevance assessment label as nearest neighbor.
■ Borrow relevance assessments from <n> nearest neighbors and
then assign the majority label.
■ Smooth expected relevance across similarity space (KDE, GPs)
○ Baseline:
■ Majority Voting for label aggregation, and coin toss for unjudged
Web pages.
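The neighbour-based solution ideas above can be sketched roughly as follows. This is a minimal illustration, not the implementation from Davtyan et al.: the function name `knn_infer`, the use of cosine similarity, the binary 0/1 labels, and the tie-breaking rule are all assumptions made for the example.

```python
import numpy as np

def knn_infer(X_judged, y_judged, X_unjudged, k=3):
    """Give each unjudged document the majority label of its k nearest judged
    neighbours. Cosine similarity is an assumed choice of similarity."""
    def normalise(M):
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.clip(norms, 1e-12, None)

    # Similarity of every unjudged document to every judged one.
    sims = normalise(X_unjudged) @ normalise(X_judged).T
    k = min(k, X_judged.shape[0])
    nearest = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    votes = y_judged[nearest]                    # borrowed labels, shape (m, k)
    # Majority label; ties (possible for even k) are resolved as "relevant".
    return (votes.mean(axis=1) >= 0.5).astype(int)
```

With `k=1` this reduces to the first idea (copy the label of the single nearest neighbour); larger `k` gives the majority-label variant.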
5
Davtyan et al. - Results
6
Motivation for our work
Consider the task of search relevance assessment
● Extremely budget-constrained scenario
● Can only ask humans to rate a few Web pages per query
● In previous figure: Number of votes < 1
7
A Generic Model of Crowdsourcing
[Figure: generic crowdsourcing model, built up step by step over slides 8-13]
Difallah et al. 2013: Pick-a-crowd
Nushi et al. 2015: Crowd Access Path Optimization
Kazai et al. 2011: Worker types and personality traits in crowdsourcing relevance labels
Davtyan et al. 2015: Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes
13
Preliminaries
● RequestVote
○ Sample a random vote from the crowd
● AggregateVotes
○ Gaussian Processes (GP) for inferring relevance labels for
unjudged documents.
○ Described by mean function (here: constant),
○ and covariance function (here: linear covariance).
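As a rough illustration of AggregateVotes, the sketch below computes a GP regression posterior with a constant mean and a linear covariance k(x, x') = x·x'. The function name `gp_posterior`, the prior mean of 0.5, the noise variance, and the choice to regress directly on vote fractions are assumptions for the example, not details given on the slide.

```python
import numpy as np

def gp_posterior(X_obs, y_obs, X_new, mean=0.5, noise=0.1):
    """GP regression with constant mean and linear covariance.

    X_obs: (n, d) embeddings of judged documents
    y_obs: (n,) observed relevance (e.g. fraction of positive votes)
    X_new: (m, d) embeddings of unjudged documents
    mean, noise: assumed prior mean and observation-noise variance
    """
    K = X_obs @ X_obs.T + noise * np.eye(len(X_obs))   # noisy Gram matrix
    K_s = X_new @ X_obs.T                              # cross-covariance (m, n)
    mu = mean + K_s @ np.linalg.solve(K, y_obs - mean)
    v = np.linalg.solve(K, K_s.T)                      # (n, m)
    prior_var = np.einsum("ij,ij->i", X_new, X_new)    # k(x, x) for each new point
    var = prior_var - np.einsum("ij,ji->i", K_s, v)
    return mu, var
```

Thresholding `mu` (for instance at 0.5) would turn the posterior mean into a relevance label; `var` is the quantity the sampling criteria later in the talk rely on.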
14
PickDocument
● What subset of documents to select for labeling?
○ Typical active learning problem
○ Focus on optimal data acquisition
○ Baseline: Random sampling
● Select the points that the classifier is most uncertain about
○ Uncertainty-based sampling
15
Solution
● Variance-based sampling:
○ Variance as a proxy for “uncertainty”: entropy measures uncertainty, and for a Gaussian it grows monotonically with the variance
○ Variance-based sampling approximates maximum-entropy sampling.
○ In Gaussian processes, the posterior variance does not depend on
the actual observed values of random variables.
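The last bullet can be made concrete with the standard GP regression posterior, written here as a sketch. The symbols are assumptions for illustration: A is the set of already-sampled documents, k the covariance function, K_AA its Gram matrix on A, and σ_n² an observation-noise variance that the slide does not mention.

```latex
\sigma^2_{x \mid A} = k(x, x) - k_{xA}\,\bigl(K_{AA} + \sigma_n^2 I\bigr)^{-1} k_{Ax},
\qquad
H(x \mid A) = \tfrac{1}{2}\,\log\!\bigl(2 \pi e\, \sigma^2_{x \mid A}\bigr)
```

No observed label appears on the right-hand side of either expression, so the documents to sample can be planned before any vote is requested, and ranking candidates by posterior variance gives the same order as ranking them by entropy.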
16
Solution
● Selecting the set of points that maximises variance is NP-complete [2]
● However, this criterion is “submodular”
○ Submodularity (informally): a set function has diminishing returns, i.e. the
incremental gain from adding a single element shrinks as the set it is added to grows.
○ By Nemhauser et al. (1978) [3], a greedy algorithm achieves an approximate
solution of at least (1 - 1/e) OPT for monotone submodular objectives.
[2] Krause et al. 2008: Near-optimal sensor placements in Gaussian processes
[3] Nemhauser et al. 1978: An analysis of approximations for maximizing submodular set functions
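For reference, the diminishing-returns property and the greedy guarantee can be stated formally as below. The notation is assumed for illustration: F is the set function being maximised, V the full set of documents, k the budget, and the guarantee is the classical one for monotone submodular F with F(∅) = 0.

```latex
% Submodularity: adding x helps a smaller set at least as much as a larger one
F(A \cup \{x\}) - F(A) \;\ge\; F(B \cup \{x\}) - F(B)
\qquad \text{for all } A \subseteq B \subseteq V,\; x \in V \setminus B

% Greedy guarantee for a budget of k elements
F(A_{\text{greedy}}) \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr) \max_{|A| \le k} F(A)
```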
17
Algorithm: Variance based sampling
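The algorithm figure for this slide is not available in the transcript, so the following is only a plausible sketch of greedy variance-based sampling under stated assumptions: a linear covariance over document embeddings, an assumed observation-noise variance `noise`, and a hypothetical function name `greedy_variance_selection`. It repeatedly picks the document with the largest posterior variance and conditions the covariance on it.

```python
import numpy as np

def greedy_variance_selection(X, budget, noise=0.1):
    """Greedily pick `budget` documents with the highest posterior variance.

    X: (n, d) document embeddings; linear covariance K = X X^T (assumed).
    Returns the selected indices in the order they were chosen.
    """
    cov = X @ X.T                       # prior covariance between documents
    n = cov.shape[0]
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(budget, n)):
        # Already-selected documents are excluded from the argmax.
        scores = np.where(chosen, -np.inf, np.diag(cov))
        best = int(np.argmax(scores))
        selected.append(best)
        chosen[best] = True
        # Rank-one update: condition on a noisy observation at `best`.
        # Only covariances are involved, never the observed vote itself.
        col = cov[:, best].copy()
        cov = cov - np.outer(col, col) / (cov[best, best] + noise)
    return selected
```

Because the update never touches a label, the whole selection can be computed up front and the votes requested afterwards.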
18
Mutual Information based sampling
● Variance-based sampling is only concerned with reducing
uncertainty at the sampled points.
● We care about system-wide uncertainty.
● Maximise the mutual information between the selected documents and
the rest of the space.
● Equivalent to maximally reducing the entropy of the rest of the
space (D \ A), where A is the set of selected documents.
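Written out, the criterion reads as follows. This is a sketch in assumed notation, with D the set of all documents for a query and A the selected subset; the second line is the greedy gain used by Krause et al. 2008 for Gaussian processes.

```latex
\mathrm{MI}(A) \;=\; I\bigl(A;\, D \setminus A\bigr)
\;=\; H\bigl(D \setminus A\bigr) - H\bigl(D \setminus A \mid A\bigr)

% Gain of adding y to A, with \bar{A} = D \setminus (A \cup \{y\})
\mathrm{MI}(A \cup \{y\}) - \mathrm{MI}(A) \;=\; H(y \mid A) - H(y \mid \bar{A})
```

A document is therefore attractive when it is still uncertain given what has been selected, yet well explained by (and so informative about) the documents left unselected.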
19
Algorithm: MI based sampling
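The algorithm figure is missing from the transcript, so the sketch below follows the greedy mutual-information rule from Krause et al. 2008 rather than reproducing the slide: at each step it picks the document y maximising the ratio of its variance given the selected set to its variance given all remaining documents. The linear covariance, the `noise` term added to the diagonal for numerical stability, and the function names are assumptions for illustration.

```python
import numpy as np

def conditional_variance(K, y, S):
    """Variance of point y after conditioning on the points in list S."""
    if len(S) == 0:
        return K[y, y]
    K_SS = K[np.ix_(S, S)]
    k_yS = K[y, S]
    return K[y, y] - k_yS @ np.linalg.solve(K_SS, k_yS)

def greedy_mi_selection(X, budget, noise=0.1):
    """Greedy mutual-information selection over document embeddings X (n, d)."""
    n = X.shape[0]
    K = X @ X.T + noise * np.eye(n)     # linear covariance plus noise (assumed)
    selected = []
    for _ in range(min(budget, n)):
        best, best_score = None, -np.inf
        for y in range(n):
            if y in selected:
                continue
            # Everything that is neither selected nor the candidate itself.
            rest = [i for i in range(n) if i != y and i not in selected]
            score = (conditional_variance(K, y, selected)
                     / conditional_variance(K, y, rest))
            if score > best_score:
                best, best_score = y, score
        selected.append(best)
    return selected
```

Unlike variance-based sampling, this rule tends to avoid isolated outliers: an outlier has high variance but tells us little about the rest of the space, so its ratio stays close to 1. This naive version re-solves a linear system for every candidate and is meant for small per-query document sets only.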
20
Experiments
● TREC Crowdsourcing Track 2011 data
● 30 (28) topics
● ~100 documents (ClueWeb’09) to be judged per topic
● ~15 historic votes per query-document pair
● Project documents in 100D doc2vec space
21
Results on the TREC 2011 Crowdsourcing Track dataset
22
Qualitative Analysis
23
Conclusions
● Active Learning for Crowdsourcing Vote Sampling
● Two information-theoretic criteria
○ Variance
○ Mutual information
● Saves up to 25% budget at constant quality
● Can be computed efficiently (greedy)
● Does not depend on sampled observations
● In the future: application to other modalities (images, videos)
24
Thank you!
Questions?
25
