Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB 2015)
Robert Leaman, Benjamin Good, Zhiyong Lu, Andrew Su
http://slideshare.net/andrewsu
 The aggregated decisions of a group
are often better than those of any
single member
 Requirements:
 Diversity
 Independence
 Decentralization
 Aggregation
2 [Surowiecki, 2004]
Sir Francis Galton
 An undefined group of people
 Typically ‘large’
 Diverse skills and abilities
 Typically no special skills assumed
3
[Estelles-Arolas, 2012]
 Computational power
 Distributed computing
 Content
 Web searches, social media
updates, blogs
 Observations
 Online surveys
 Personal data
4 [Good & Su, 2013]
 Cognitive power
 Visual reasoning, language
processing
 Creative effort
 Resource creation, algorithm
development
 Funding: $$$
5 [Good & Su, 2013]
 Crowd data
 Content
 Search logs
 Crowdsourcing
 Observations
 Cognitive power
 Creative effort
 Not a focus in this
tutorial
 Distributed
computing
 Crowdfunding
6
 Access
 To the data; to the crowd
▪ 1 in 5 people have a smartphone worldwide
 Engagement
 Getting contributors’ attention
 Incentive
 Quality control
7
 Information reflects health
 Disease status
 Disease associations
 Health related behaviors
 Information also drives health
 Knowledge and beliefs regarding prevention and
treatment
 Quality monitoring of health information
available to the public
8
“Infodemiology”
[Eysenbach, 2006]
 Key challenge: text
 Variability: tired, wiped, pooped → somnolence
 Ambiguity: numb → sensory or cognitive?
 Two levels
 Keyword: locate specific terms + synonyms (see the matching sketch after this slide)
 Concept: attempt to normalize mentions to specific entities
 Measurement
 Disproportionality analysis
 Separating signal from noise
9
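A minimal sketch of the keyword level described above, assuming a tiny hand-built synonym lexicon: colloquial surface forms are matched in text and normalized to concepts. The lexicon entries and example post are hypothetical illustrations, not the tutorial's actual resources (the ADR example later in the deck drew on the UMLS Metathesaurus, SIDER, and MedEffect).

```python
# Minimal sketch: keyword-level matching with a synonym lexicon.
# Lexicon entries and the example post are invented for illustration.
import re

LEXICON = {
    "somnolence": ["tired", "wiped", "pooped", "sleepy", "drowsy"],
    "weight gain": ["weight gain", "gained weight", "put on weight"],
}

# Invert the lexicon: surface form -> normalized concept
SURFACE_TO_CONCEPT = {
    term: concept for concept, terms in LEXICON.items() for term in terms
}

def find_concepts(text: str) -> set[str]:
    """Return the normalized concepts whose surface forms appear in the text."""
    found = set()
    lowered = text.lower()
    for term, concept in SURFACE_TO_CONCEPT.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found

print(find_concepts("Been totally wiped since starting the new med, and gained weight."))
# e.g. {'somnolence', 'weight gain'} (set order may vary)
```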
 Objective: predict flu
outbreaks from internet
search trends
 Access to search data via
direct access to logs or via
ad clicks
 High correlation between clicks one week and
cases the next (a lagged-correlation sketch follows this slide)
 Caveats!
 Many potential confounders
10
[Eysenbach, 2006]
[Eysenbach, 2009]
[Ginsberg et al., 2009]
[Chart: weekly search query volume vs. reported influenza cases, 2004-2007]
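A toy illustration of the lag analysis behind search-based flu surveillance, assuming weekly aggregates: correlate this week's search volume with next week's case counts. The two series below are made-up numbers, not Google Flu Trends data.

```python
# Sketch of the one-week-lag correlation check; toy data only.
from statistics import correlation  # Python 3.10+

searches = [120, 150, 200, 340, 500, 480, 300, 180]   # weekly query volume
cases    = [ 40,  55,  70, 110, 210, 320, 310, 190]   # reported cases, same weeks

# Shift by one week: searches in week t vs. cases in week t+1
lagged = correlation(searches[:-1], cases[1:])
same_week = correlation(searches, cases)

print(f"same-week r = {same_week:.2f}, one-week-lag r = {lagged:.2f}")
```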
 Objective: Mine social media
forums for ADR reports
 Lexicon based on UMLS
Metathesaurus, SIDER,
MedEffect, and a set of
colloquial phrases (“zonked”,
misspellings)
 Demonstrated viability of
text mining (73.9% f-
measure)
 Revealed known ADRs and
putatively novel ADRs
Olanzapine              Known incidence   Corpus frequency
Weight gain             65%               30.0%
Fatigue                 26%               15.9%
Increased cholesterol   22%               -
Increased appetite      -                 4.9%
Depression              -                 3.1%
Tremor                  -                 2.7%
Diabetes                2%                2.6%
Anxiety                 -                 1.4%
11
[Leaman et al., 2010]
 Objective: identify DDI from
internet search logs
 DDI reports difficult to find
 Focused on a DDI unknown at
time data collected
▪ Paroxetine + pravastatin → hyperglycemia
 Synonyms
 Web searches
 Disproportionality analysis (see the sketch after this slide)
 Results
 Significant association
 Classifying 31 TP & 31 TN pairs
▪ AUC = 0.82
12
[White et al., 2013]
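A sketch of the disproportionality analysis referenced above, using a reporting odds ratio over a 2x2 table of search sessions. The counts below are invented for illustration; the actual study worked from large-scale query logs.

```python
# Disproportionality analysis in miniature: reporting odds ratio (ROR)
# from a hypothetical 2x2 table of search sessions.
import math

# Sessions mentioning...            hyperglycemia terms   no hyperglycemia terms
both_drugs     = (60,      940)     # paroxetine AND pravastatin
not_both_drugs = (9_000, 990_000)   # all other sessions

a, b = both_drugs
c, d = not_both_drugs

ror = (a / b) / (c / d)
# Approximate 95% confidence interval on the log scale
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(ror) - 1.96 * se)
hi = math.exp(math.log(ror) + 1.96 * se)

print(f"ROR = {ror:.1f}  (95% CI {lo:.1f}-{hi:.1f})")
```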
 Outsourcing
 Tasks normally performed in-house
 To a large, diverse, external group
 Via an open call
13
[Estelles-Arolas, 2012]
EXPERT LABOR
 Must be found
 Expensive
 Often slow
 High quality
 Ambiguity OK
 Hard to use for
experiments
 Must be retained
CROWD LABOR
 Readily available
 Inexpensive
 Fast
 Quality variable
 Instructions must be clear
 Easy prototyping and
experimentation
 Retention less important
14
 Humans (even unskilled) are simply better than
computers at some tasks
 Allows workflows to include an “HPU” (human processing unit)
 Highly scalable
 Rapid turn-around
 High throughput
 Diverse solutions
 Low risk
 Low cost
15
[Quinn & Bederson, 2011]
 Microtask: low difficulty, large in number
 Observations or data processing
 Surveying, text or image annotation
 Validation: redundancy and aggregation
 Megatask: high difficulty, low in number
 Problem solving, creative effort
 Validation: manually, with metrics or rubric
16
[Good & Su, 2013]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
17
[Good & Su, 2013]
18
[Diagram: microtask market workflow (requester → tasks → Amazon Mechanical Turk → workers → aggregation function → back to requester)]
http://www.thesheepmarket.com/
 Automatically tag all genes (NCBI’s gene tagger), all
mutations (UMBC’s EMU)
 Highlight candidate gene-mutation pairs in context
 Frame task as simple yes/no questions
Slide courtesy: L. Hirschman [Burger et al., 2012]
20
21
[Mea, 2014]
Tagging cells for breast cancer based on stain
22
[Diagram repeated: microtask market workflow (requester → tasks → Amazon Mechanical Turk → workers → aggregation function)]
 Baseline: majority vote
 Can we do better? (see the weighted-vote sketch after this slide)
 Separate annotator bias and error
 Model annotator quality
▪ Measure with labeled data or reputation
 Model difficulty of each task
 Sometimes disagreement is informative
23
[Ipeirotis et al., 2010]
[Raykar et al., 2010]
[Aroyo & Welty, 2013]
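One way to go beyond majority vote, as suggested above: weight each worker's label by an estimated accuracy (taken from labeled gold data or reputation) and combine labels in log-odds space. Worker IDs, accuracies, and labels below are hypothetical.

```python
# Weighted voting sketch: each worker's label counts in proportion to the
# log-odds of their estimated accuracy. All values are made up.
from collections import defaultdict
import math

worker_accuracy = {"w1": 0.95, "w2": 0.60, "w3": 0.55}

def weighted_vote(labels: dict[str, str]) -> str:
    """labels maps worker id -> label for one task; returns the weighted winner."""
    scores = defaultdict(float)
    for worker, label in labels.items():
        acc = worker_accuracy.get(worker, 0.5)      # unknown workers count as chance
        acc = min(max(acc, 0.01), 0.99)             # keep the log-odds finite
        scores[label] += math.log(acc / (1 - acc))  # log-odds weighting
    return max(scores, key=scores.get)

# One accurate worker can outweigh two near-chance workers:
print(weighted_vote({"w1": "yes", "w2": "no", "w3": "no"}))   # -> "yes"
```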
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
24
[Good & Su, 2013]
 Volunteers label images of cell biopsies from
cancer patients
 Estimate presence and number of cancer cells
 Incentive
 Altruism, sense of mastery
 Quality
 Training, redundancy
 Analyzed 2.4 million images as of 11/2014
25
[cellslider.net]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
26
[Good & Su, 2013]
EXAMPLE: RECAPTCHA
 Workflow:
logging into a
website
 Sequestration:
performing
optical
character
recognition
27
EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION
 Workflow: prescribing medication
 Sequestration: entering reason for prescription
into ordering system
28
[McCoy, 2012]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
29
[Good & Su, 2013]
30
MalariaSpot [Luengo-Oroz, 2012]
MOLT [Mavandadi, 2012]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
31
[Good & Su, 2013]
 Bioinformatics students simultaneously learn
and perform metagenome annotation
 Incentive:
educational
 Quality:
aggregation,
instructor
evaluation
32 [Hingamp et al., 2008]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
33
[Good & Su, 2013]
OPEN PROFESSIONAL PLATFORMS ($$$)
 Innocentive
 TopCoder
 Kaggle
ACADEMIC (PUBLICATIONS..)
 DREAM (see invited opening talk at crowdsourcing session)
 CASP
34
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
35
[Good & Su, 2013]
 Players manipulate proteins to find the 3D
shape with the lowest calculated free energy
 Competitive and collaborative
 Incentive
 Altruism, fun, community
 Quality
 Automated scoring
 High performance; players solved a key
retroviral protease structure that had long resisted determination
36
[Khatib, et al., 2011]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
37
 Aims to provide a
Wikipedia page for
every notable human
gene
 Repository of
functional knowledge
 10K distinct genes
 50M views & 15K edits
per year
38
[Huss et al., 2008]
[Good et al., 2011]
 Crowdsourcing means many different things
 Fundamental points:
 Humans (even unskilled) are simply better than
computers at some tasks
 There are a lot of humans available
 There are many approaches for accessing their
talents
39
INTRINSIC
 Altruism
 Fun
 Education
 Sense of mastery
 Resource creation
EXTRINSIC
 Money
 Recognition
 Community
40
 Define problem & goal
 Decide platform
 Decompose problem into tasks
 Separate: expert, crowdsourced & automatable
 Refine crowdsourced tasks
 Simple, clear, self-contained, engaging
 Design: instructions and user interface (a sample task specification follows this slide)
41
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
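A sample task specification for the "refine crowdsourced tasks" step: one simple, self-contained question per item, explicit instructions, and a redundancy setting for later aggregation. The field names and values are illustrative and not tied to any particular platform's API.

```python
# Hypothetical task specification; adapt field names to your platform.
from dataclasses import dataclass

@dataclass
class MicrotaskSpec:
    title: str
    instructions: str
    question: str
    options: tuple[str, ...]
    reward_usd: float
    assignments_per_item: int   # redundancy for aggregation

spec = MicrotaskSpec(
    title="Is this drug/side-effect pair supported by the sentence?",
    instructions="Read the highlighted sentence. Answer based only on what it says.",
    question="Does the sentence state that DRUG causes SIDE_EFFECT?",
    options=("yes", "no", "cannot tell"),
    reward_usd=0.05,
    assignments_per_item=5,
)
print(spec.title, "-", spec.assignments_per_item, "judgments per item")
```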
 Iterate
 Test internally
 Calibrate with small crowdsourced sample
 Verify understanding, timing, pricing & quality (a cost-projection sketch follows this slide)
 Incorporate feedback
 Run production
 Scale on data before workers
 Validate results
42
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
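A back-of-the-envelope projection for the calibration step: extrapolate production cost, worker time, and effective hourly pay from a small pilot. All numbers below are invented placeholders.

```python
# Project production cost and time from pilot measurements (toy numbers).
pilot_median_seconds = 35        # observed time per judgment in the pilot
reward_per_judgment  = 0.05      # USD
redundancy           = 5         # judgments per item
platform_fee_rate    = 0.20      # assumed fee fraction; varies by platform

production_items = 20_000
judgments = production_items * redundancy
cost = judgments * reward_per_judgment * (1 + platform_fee_rate)
worker_hours = judgments * pilot_median_seconds / 3600
effective_hourly = reward_per_judgment * 3600 / pilot_median_seconds

print(f"~{judgments:,} judgments, ~${cost:,.0f}, ~{worker_hours:,.0f} worker-hours")
print(f"effective pay ~ ${effective_hourly:.2f}/hour")  # check against ethical pay goals
```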
 Automatic evaluation
 If possible
 Direct quality assessment
 Expensive
▪ Microtask: Include tasks with known answers (see the gold-question sketch after this slide)
▪ Megatask: Evaluate tasks after completion (rubric)
 Aggregate redundant responses
43
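A sketch of the "tasks with known answers" check: score each worker against embedded gold questions and flag those performing near chance for review. The responses and threshold below are hypothetical.

```python
# Gold-question quality control sketch; all data invented.
gold = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}

responses = {
    "w1": {"q1": "yes", "q2": "no",  "q3": "yes", "q4": "no"},
    "w2": {"q1": "yes", "q2": "yes", "q3": "yes", "q4": "yes"},  # answers "yes" to everything
}

def gold_accuracy(worker_answers: dict[str, str]) -> float:
    """Fraction of gold questions this worker answered correctly."""
    graded = [worker_answers[q] == a for q, a in gold.items() if q in worker_answers]
    return sum(graded) / len(graded) if graded else 0.0

for worker, answers in responses.items():
    acc = gold_accuracy(answers)
    status = "ok" if acc >= 0.75 else "review/exclude"
    print(f"{worker}: accuracy on gold = {acc:.2f} ({status})")
```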
PRO
 Reduced cost → more data
 Fast turn-around time
 High throughput
 “Real world”
environment
 Public participation &
awareness
CON
 Potentially poor quality
 Spammers
 Potentially low
retention
 Privacy concerns for
sensitive data
 Lax protections for
workers
44
 Potentially poor quality: discussed previously
 Low retention
 Complicates quality estimation due to sparsity
 Do workers build task-specific expertise?
 Privacy
 Sensitive data requires trusted workers
45
 Protection for workers
 Low pay, no protections, benefits, or career path
 Potential to cause harm
▪ E.g. exposure to anti-vaccine information
 Is IRB approval needed?
 Can be addressed
 Responsibility of the researcher
▪ “[opportunity to] deliberately value ethics above cost
savings”
46
[Graber & Graber, 2013]
[Fort, Adda and Cohen, 2011]
 Demographics:
 Shift from mostly US to US/India mix
 Average pay is <$2.00 / hour
 Over 30% rely on MTurk for basic income
 Workers not anonymous
 However:
 Tools can be used ethically or unethically
 Crowdsourcing ≠ AMT
47
[Ross et al., 2009]
[Lease et al., 2013]
 Improved predictability
 Pricing, quality, retention
 Improved infrastructure
 Data analysis, validation & aggregation
 Improved trust mechanisms
 Matching workers and tasks
 Relevant characteristics for matching each
 Increased mobility
48
 Crowdsourcing and learning from crowd data
offer distinct advantages
 Scalability
 Rapid turn-around
 Throughput
 Low cost
 Must be carefully planned and managed
49
 Wide variety of approaches and platforms
available
 Resources section lists several
 Many questions still open
 Science using crowdsourcing
 Science of crowdsourcing
50
 Thanks to the members of the crowd who make this
methodology possible
 Questions: robert.leaman@nih.gov,
bgood@scripps.edu, asu@scripps.edu
 Support:
 Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of National Library of Medicine, NIH
 Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820
and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH: UL1TR001114
51
 Distributed computing: BOINC
 Microtask markets: Amazon Mechanical Turk,
Clickworker, SamaSource, many others
 Meta services: Crowdflower, Crowdsource
 Educational: annotathon.org
 Innovation contest: Innocentive, TopCoder
 Crowdfunding: Rockethub, Petridish
52
 Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI 2011; Vancouver, BC, Canada. Citeseer.
 Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM-SIGIR 2011.
 Aroyo L, Welty C: CrowdTruth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci 2013, ACM; 2013.
 Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348: Springer Berlin Heidelberg; 2012: 83-91.
 Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
 Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(189).
 Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
 Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
53
 Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
 Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
 Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
 Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
 Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
 Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
 Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
 Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business. Crown Business; 2009.
54
 Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
 Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW 2011.
 Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
 Khatib F, DiMaio F, Foldit Contenders Group, Foldit Void Crushers Group, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
 Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
 Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. Social Science Research Network; 2013.
 Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
 Nielsen J: Usability Engineering. Academic Press; 1993.
55
 Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning. O'Reilly Media; 2012.
 Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
 Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing--harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
 Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
 Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Department of Informatics, UC Irvine, USA; 2009.
 Surowiecki J: The Wisdom of Crowds. Doubleday; 2004.
 Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
 Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
 White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
 Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.
56
Editor's Notes

  • #3 Vox populi = “one vote, one value”. 787 votes on ox weight; the median value was <1% off, and the mean was even closer. The four criteria:
    Diversity of opinion: Each person should have private information, even if it's just an eccentric interpretation of the known facts.
    Independence: People's opinions aren't determined by the opinions of those around them.
    Decentralization: People are able to specialize and draw on local knowledge.
    Aggregation: Some mechanism exists for turning private judgments into a collective decision.
  • #5 Drawn examples from biomedical research – many examples in other fields from astronomy to botany to ornithology
  • #7 Some links for distributed computing and crowdfunding on resources page
  • #8 Access to data can be hard
  • #11 Blurs the line between demand crowd data and observational crowdsourcing. Example confounders: changes in the search engine algorithm, seasonal searches, media reports, baseline search activity.
  • #12 Olanzapine is used to treat schizophrenia and bipolar depression. The most frequently mentioned ADR was always a known ADR. “We used the DailyStrength health-related social network as the source of user comments in this study. DailyStrength allows users to create profiles, maintain friends and join various disease-related support groups. It serves as a resource for patients to connect with others who have similar conditions, many of whom are friends solely online. As of 2007, DailyStrength had an average of 14,000 daily visitors, each spending 82 minutes on the site and viewing approximately 145 pages (comScore Media Metrix Canada, 2007).”
  • #13 DDI officially described in 2011, web search logs from 2010
  • #19 credit Aaron Koblin - integrate with previous
  • #20 Animate red box to emphasize Turkers don't see it
  • #21 Using NLP to tag diseases and conditions in drug labels. One disease at a time. Ask turkers to answer yes/no questions w.r.t. whether the highlighted disease is an indicated use of the highlighted drug.
  • #24 This is a jumping off point for the audience to consider.
  • #26 Note the differences between this and AMT. Incentives are different, tasks are the same, training same, aggregation same, Cost scales differently..
  • #33 CACAO Jim Hu.
  • #41 “Instrumental” ??
  • #46 Task-specific expertise is lost at end of experiment