Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB 2015)
Robert Leaman, Benjamin Good, Zhiyong Lu, Andrew Su
http://slideshare.net/andrewsu
 The aggregated decisions of a group
are often better than those of any
single member
 Requirements:
 Diversity
 Independence
 Decentralization
 Aggregation
2 [Surowiecki, 2004]
Sir Francis Galton
 An undefined group of people
 Typically ‘large’
 Diverse skills and abilities
 Typically no special skills assumed
3
[Estelles-Arolas, 2012]
 Computational power
 Distributed computing
 Content
 Web searches, social media
updates, blogs
 Observations
 Online surveys
 Personal data
4 [Good & Su, 2013]
 Cognitive power
 Visual reasoning, language
processing
 Creative effort
 Resource creation, algorithm
development
 Funding: $$$
5 [Good & Su, 2013]
 Crowd data
 Content
 Search logs
 Crowdsourcing
 Observations
 Cognitive power
 Creative effort
 Not a focus in this
tutorial
 Distributed
computing
 Crowdfunding
6
 Access
 To the data; to the crowd
▪ 1 in 5 people have a smartphone worldwide
 Engagement
 Getting contributors’ attention
 Incentive
 Quality control
7
 Information reflects health
 Disease status
 Disease associations
 Health related behaviors
 Information also drives health
 Knowledge and beliefs regarding prevention and
treatment
 Quality monitoring of health information
available to the public
8
“Infodemiology”
[Eysenbach, 2006]
 Key challenge: text
 Variability: tired, wiped, pooped → somnolence
 Ambiguity: numb → sensory or cognitive?
 Two levels
 Keyword: locate specific terms + synonyms (see the matching sketch after this slide)
 Concept: attempt to normalize mentions to specific entities
 Measurement
 Disproportionality analysis
 Separating signal from noise
9
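A minimal sketch of the keyword level described above, assuming a tiny hand-built synonym lexicon: colloquial surface forms are matched in text and normalized to concepts. The lexicon entries and example post are hypothetical illustrations, not the tutorial's actual resources (the ADR example later in the deck drew on the UMLS Metathesaurus, SIDER, and MedEffect).

```python
# Minimal sketch: keyword-level matching with a synonym lexicon.
# Lexicon entries and the example post are invented for illustration.
import re

LEXICON = {
    "somnolence": ["tired", "wiped", "pooped", "sleepy", "drowsy"],
    "weight gain": ["weight gain", "gained weight", "put on weight"],
}

# Invert the lexicon: surface form -> normalized concept
SURFACE_TO_CONCEPT = {
    term: concept for concept, terms in LEXICON.items() for term in terms
}

def find_concepts(text: str) -> set[str]:
    """Return the normalized concepts whose surface forms appear in the text."""
    found = set()
    lowered = text.lower()
    for term, concept in SURFACE_TO_CONCEPT.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found

print(find_concepts("Been totally wiped since starting the new med, and gained weight."))
# e.g. {'somnolence', 'weight gain'} (set order may vary)
```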
 Objective: predict flu
outbreaks from internet
search trends
 Access to search data via
direct access to logs or via
ad clicks
 High correlation between clicks one week and
cases the next (a lagged-correlation sketch follows this slide)
 Caveats!
 Many potential confounders
10
[Eysenbach, 2006]
[Eysenbach, 2009]
[Ginsberg et al., 2009]
[Chart: weekly search query volume vs. reported influenza cases, 2004-2007]
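A toy illustration of the lag analysis behind search-based flu surveillance, assuming weekly aggregates: correlate this week's search volume with next week's case counts. The two series below are made-up numbers, not Google Flu Trends data.

```python
# Sketch of the one-week-lag correlation check; toy data only.
from statistics import correlation  # Python 3.10+

searches = [120, 150, 200, 340, 500, 480, 300, 180]   # weekly query volume
cases    = [ 40,  55,  70, 110, 210, 320, 310, 190]   # reported cases, same weeks

# Shift by one week: searches in week t vs. cases in week t+1
lagged = correlation(searches[:-1], cases[1:])
same_week = correlation(searches, cases)

print(f"same-week r = {same_week:.2f}, one-week-lag r = {lagged:.2f}")
```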
 Objective: Mine social media
forums for ADR reports
 Lexicon based on UMLS
Metathesaurus, SIDER,
MedEffect, and a set of
colloquial phrases (“zonked”,
misspellings)
 Demonstrated viability of
text mining (73.9% f-
measure)
 Revealed known ADRs and
putatively novel ADRs
Olanzapine              Known incidence   Corpus frequency
Weight gain             65%               30.0%
Fatigue                 26%               15.9%
Increased cholesterol   22%               -
Increased appetite      -                 4.9%
Depression              -                 3.1%
Tremor                  -                 2.7%
Diabetes                2%                2.6%
Anxiety                 -                 1.4%
11
[Leaman et al., 2010]
 Objective: identify DDI from
internet search logs
 DDI reports difficult to find
 Focused on a DDI unknown at
time data collected
▪ Paroxetine + pravastatin → hyperglycemia
 Synonyms
 Web searches
 Disproportionality analysis (see the sketch after this slide)
 Results
 Significant association
 Classifying 31 TP & 31 TN pairs
▪ AUC = 0.82
12
[White et al., 2013]
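A sketch of the disproportionality analysis referenced above, using a reporting odds ratio over a 2x2 table of search sessions. The counts below are invented for illustration; the actual study worked from large-scale query logs.

```python
# Disproportionality analysis in miniature: reporting odds ratio (ROR)
# from a hypothetical 2x2 table of search sessions.
import math

# Sessions mentioning...            hyperglycemia terms   no hyperglycemia terms
both_drugs     = (60,      940)     # paroxetine AND pravastatin
not_both_drugs = (9_000, 990_000)   # all other sessions

a, b = both_drugs
c, d = not_both_drugs

ror = (a / b) / (c / d)
# Approximate 95% confidence interval on the log scale
se = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(ror) - 1.96 * se)
hi = math.exp(math.log(ror) + 1.96 * se)

print(f"ROR = {ror:.1f}  (95% CI {lo:.1f}-{hi:.1f})")
```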
 Outsourcing
 Tasks normally performed in-house
 To a large, diverse, external group
 Via an open call
13
[Estelles-Arolas, 2012]
EXPERT LABOR
 Must be found
 Expensive
 Often slow
 High quality
 Ambiguity OK
 Hard to use for
experiments
 Must be retained
CROWD LABOR
 Readily available
 Inexpensive
 Fast
 Quality variable
 Instructions must be clear
 Easy prototyping and
experimentation
 Retention less important
14
 Humans (even unskilled) are simply better than
computers at some tasks
 Allows workflows to include an “HPU” (human processing unit)
 Highly scalable
 Rapid turn-around
 High throughput
 Diverse solutions
 Low risk
 Low cost
15
[Quinn & Bederson, 2011]
 Microtask: low difficulty, large in number
 Observations or data processing
 Surveying, text or image annotation
 Validation: redundancy and aggregation
 Megatask: high difficulty, low in number
 Problem solving, creative effort
 Validation: manually, with metrics or rubric
16
[Good & Su, 2013]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
17
[Good & Su, 2013]
18
[Diagram: microtask market workflow (requester → tasks → Amazon Mechanical Turk → workers → aggregation function → back to requester)]
http://www.thesheepmarket.com/
 Automatically tag all genes (NCBI’s gene tagger), all
mutations (UMBC’s EMU)
 Highlight candidate gene-mutation pairs in context
 Frame task as simple yes/no questions
Slide courtesy: L. Hirschman [Burger et al., 2012]
20
21
[Mea, 2014]
Tagging cells for breast cancer based on stain
22
[Diagram repeated: microtask market workflow (requester → tasks → Amazon Mechanical Turk → workers → aggregation function)]
 Baseline: majority vote
 Can we do better? (see the weighted-vote sketch after this slide)
 Separate annotator bias and error
 Model annotator quality
▪ Measure with labeled data or reputation
 Model difficulty of each task
 Sometimes disagreement is informative
23
[Ipeirotis et al., 2010]
[Raykar et al., 2010]
[Aroyo & Welty, 2013]
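One way to go beyond majority vote, as suggested above: weight each worker's label by an estimated accuracy (taken from labeled gold data or reputation) and combine labels in log-odds space. Worker IDs, accuracies, and labels below are hypothetical.

```python
# Weighted voting sketch: each worker's label counts in proportion to the
# log-odds of their estimated accuracy. All values are made up.
from collections import defaultdict
import math

worker_accuracy = {"w1": 0.95, "w2": 0.60, "w3": 0.55}

def weighted_vote(labels: dict[str, str]) -> str:
    """labels maps worker id -> label for one task; returns the weighted winner."""
    scores = defaultdict(float)
    for worker, label in labels.items():
        acc = worker_accuracy.get(worker, 0.5)      # unknown workers count as chance
        acc = min(max(acc, 0.01), 0.99)             # keep the log-odds finite
        scores[label] += math.log(acc / (1 - acc))  # log-odds weighting
    return max(scores, key=scores.get)

# One accurate worker can outweigh two near-chance workers:
print(weighted_vote({"w1": "yes", "w2": "no", "w3": "no"}))   # -> "yes"
```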
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
24
[Good & Su, 2013]
 Volunteers label images of cell biopsies from
cancer patients
 Estimate presence and number of cancer cells
 Incentive
 Altruism, sense of mastery
 Quality
 Training, redundancy
 Analyzed 2.4 million images as of 11/2014
25
[cellslider.net]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
26
[Good & Su, 2013]
EXAMPLE: RECAPTCHA
 Workflow:
logging into a
website
 Sequestration:
performing
optical
character
recognition
27
EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION
 Workflow: prescribing medication
 Sequestration: entering reason for prescription
into ordering system
28
[McCoy, 2012]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
29
[Good & Su, 2013]
30
MalariaSpot [Luengo-Oroz, 2012]
MOLT [Mavandadi, 2012]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
31
[Good & Su, 2013]
 Bioinformatics students simultaneously learn
and perform metagenome annotation
 Incentive:
educational
 Quality:
aggregation,
instructor
evaluation
32 [Hingamp et al., 2008]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
33
[Good & Su, 2013]
OPEN PROFESSIONAL PLATFORMS ($$$)
 Innocentive
 TopCoder
 Kaggle
ACADEMIC (PUBLICATIONS..)
 DREAM (see invited opening talk at crowdsourcing session)
 CASP
34
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
35
[Good & Su, 2013]
 Players manipulate proteins to find the 3D
shape with the lowest calculated free energy
 Competitive and collaborative
 Incentive
 Altruism, fun, community
 Quality
 Automated scoring
 High performance; players solved a key
retroviral protease structure that had long resisted determination
36
[Khatib, et al., 2011]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
37
 Aims to provide a
Wikipedia page for
every notable human
gene
 Repository of
functional knowledge
 10K distinct genes
 50M views & 15K edits
per year
38
[Huss et al., 2008]
[Good et al., 2011]
 Crowdsourcing means many different things
 Fundamental points:
 Humans (even unskilled) are simply better than
computers at some tasks
 There are a lot of humans available
 There are many approaches for accessing their
talents
39
INTRINSIC
 Altruism
 Fun
 Education
 Sense of mastery
 Resource creation
EXTRINSIC
 Money
 Recognition
 Community
40
 Define problem & goal
 Decide platform
 Decompose problem into tasks
 Separate: expert, crowdsourced & automatable
 Refine crowdsourced tasks
 Simple, clear, self-contained, engaging
 Design: instructions and user interface (a sample task specification follows this slide)
41
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
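A sample task specification for the "refine crowdsourced tasks" step: one simple, self-contained question per item, explicit instructions, and a redundancy setting for later aggregation. The field names and values are illustrative and not tied to any particular platform's API.

```python
# Hypothetical task specification; adapt field names to your platform.
from dataclasses import dataclass

@dataclass
class MicrotaskSpec:
    title: str
    instructions: str
    question: str
    options: tuple[str, ...]
    reward_usd: float
    assignments_per_item: int   # redundancy for aggregation

spec = MicrotaskSpec(
    title="Is this drug/side-effect pair supported by the sentence?",
    instructions="Read the highlighted sentence. Answer based only on what it says.",
    question="Does the sentence state that DRUG causes SIDE_EFFECT?",
    options=("yes", "no", "cannot tell"),
    reward_usd=0.05,
    assignments_per_item=5,
)
print(spec.title, "-", spec.assignments_per_item, "judgments per item")
```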
 Iterate
 Test internally
 Calibrate with small crowdsourced sample
 Verify understanding, timing, pricing & quality (a cost-projection sketch follows this slide)
 Incorporate feedback
 Run production
 Scale on data before workers
 Validate results
42
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
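A back-of-the-envelope projection for the calibration step: extrapolate production cost, worker time, and effective hourly pay from a small pilot. All numbers below are invented placeholders.

```python
# Project production cost and time from pilot measurements (toy numbers).
pilot_median_seconds = 35        # observed time per judgment in the pilot
reward_per_judgment  = 0.05      # USD
redundancy           = 5         # judgments per item
platform_fee_rate    = 0.20      # assumed fee fraction; varies by platform

production_items = 20_000
judgments = production_items * redundancy
cost = judgments * reward_per_judgment * (1 + platform_fee_rate)
worker_hours = judgments * pilot_median_seconds / 3600
effective_hourly = reward_per_judgment * 3600 / pilot_median_seconds

print(f"~{judgments:,} judgments, ~${cost:,.0f}, ~{worker_hours:,.0f} worker-hours")
print(f"effective pay ~ ${effective_hourly:.2f}/hour")  # check against ethical pay goals
```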
 Automatic evaluation
 If possible
 Direct quality assessment
 Expensive
▪ Microtask: Include tasks with known answers (see the gold-question sketch after this slide)
▪ Megatask: Evaluate tasks after completion (rubric)
 Aggregate redundant responses
43
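A sketch of the "tasks with known answers" check: score each worker against embedded gold questions and flag those performing near chance for review. The responses and threshold below are hypothetical.

```python
# Gold-question quality control sketch; all data invented.
gold = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}

responses = {
    "w1": {"q1": "yes", "q2": "no",  "q3": "yes", "q4": "no"},
    "w2": {"q1": "yes", "q2": "yes", "q3": "yes", "q4": "yes"},  # answers "yes" to everything
}

def gold_accuracy(worker_answers: dict[str, str]) -> float:
    """Fraction of gold questions this worker answered correctly."""
    graded = [worker_answers[q] == a for q, a in gold.items() if q in worker_answers]
    return sum(graded) / len(graded) if graded else 0.0

for worker, answers in responses.items():
    acc = gold_accuracy(answers)
    status = "ok" if acc >= 0.75 else "review/exclude"
    print(f"{worker}: accuracy on gold = {acc:.2f} ({status})")
```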
PRO
 Reduced cost → more data
 Fast turn-around time
 High throughput
 “Real world”
environment
 Public participation &
awareness
CON
 Potentially poor quality
 Spammers
 Potentially low
retention
 Privacy concerns for
sensitive data
 Lax protections for
workers
44
 Potentially poor quality: discussed previously
 Low retention
 Complicates quality estimation due to sparsity
 Do workers build task-specific expertise?
 Privacy
 Sensitive data requires trusted workers
45
 Protection for workers
 Low pay, no protections, benefits, or career path
 Potential to cause harm
▪ E.g. exposure to anti-vaccine information
 Is IRB approval needed?
 Can be addressed
 Responsibility of the researcher
▪ “[opportunity to] deliberately value ethics above cost
savings”
46
[Graber & Graber, 2013]
[Fort, Adda and Cohen, 2011]
 Demographics:
 Shift from mostly US to US/India mix
 Average pay is <$2.00 / hour
 Over 30% rely on MTurk for basic income
 Workers not anonymous
 However:
 Tools can be used ethically or unethically
 Crowdsourcing ≠ AMT
47
[Ross et al., 2009]
[Lease et al., 2013]
 Improved predictability
 Pricing, quality, retention
 Improved infrastructure
 Data analysis, validation & aggregation
 Improved trust mechanisms
 Matching workers and tasks
 Relevant characteristics for matching each
 Increased mobility
48
 Crowdsourcing and learning from crowd data
offer distinct advantages
 Scalability
 Rapid turn-around
 Throughput
 Low cost
 Must be carefully planned and managed
49
 Wide variety of approaches and platforms
available
 Resources section lists several
 Many questions still open
 Science using crowdsourcing
 Science of crowdsourcing
50
 Thanks to the members of the crowd who make this
methodology possible
 Questions: robert.leaman@nih.gov,
bgood@scripps.edu, asu@scripps.edu
 Support:
 Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of National Library of Medicine, NIH
 Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820
and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH: UL1TR001114
51
 Distributed computing: BOINC
 Microtask markets: Amazon Mechanical Turk,
Clickworker, SamaSource, many others
 Meta services: Crowdflower, Crowdsource
 Educational: annotathon.org
 Innovation contest: Innocentive, TopCoder
 Crowdfunding: Rockethub, Petridish
52
 Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI 2011; Vancouver, BC, Canada. Citeseer.
 Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM-SIGIR 2011.
 Aroyo L, Welty C: CrowdTruth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci 2013, ACM; 2013.
 Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348: Springer Berlin Heidelberg; 2012: 83-91.
 Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
 Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(189).
 Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
 Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
53
 Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
 Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
 Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
 Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
 Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
 Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
 Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
 Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business. Crown Business; 2009.
54
 Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
 Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW 2011.
 Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
 Khatib F, DiMaio F, Foldit Contenders Group, Foldit Void Crushers Group, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
 Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
 Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. Social Science Research Network; 2013.
 Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
 Nielsen J: Usability Engineering. Academic Press; 1993.
55
 Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning. O'Reilly Media; 2012.
 Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
 Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing--harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
 Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
 Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Department of Informatics, UC Irvine, USA; 2009.
 Surowiecki J: The Wisdom of Crowds. Doubleday; 2004.
 Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
 Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
 White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
 Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.
56
Editor's Notes

  • #3 Vox populi = “one vote, one value”. 787 votes on ox weight; the median value was <1% off, and the mean was even closer. The four criteria:
    Diversity of opinion: Each person should have private information, even if it's just an eccentric interpretation of the known facts.
    Independence: People's opinions aren't determined by the opinions of those around them.
    Decentralization: People are able to specialize and draw on local knowledge.
    Aggregation: Some mechanism exists for turning private judgments into a collective decision.
  • #5 Drawn examples from biomedical research – many examples in other fields from astronomy to botany to ornithology
  • #7 Some links for distributed computing and crowdfunding on resources page
  • #8 Access to data can be hard
  • #11 Blurs the line between demand crowd data and observational crowdsourcing. Example confounders: changes in the search engine algorithm, seasonal searches, media reports, baseline search activity.
  • #12 Olanzapine is used to treat schizophrenia and bipolar depression. The most frequently mentioned ADR was always a known ADR. “We used the DailyStrength health-related social network as the source of user comments in this study. DailyStrength allows users to create profiles, maintain friends and join various disease-related support groups. It serves as a resource for patients to connect with others who have similar conditions, many of whom are friends solely online. As of 2007, DailyStrength had an average of 14,000 daily visitors, each spending 82 minutes on the site and viewing approximately 145 pages (comScore Media Metrix Canada, 2007).”
  • #13 DDI officially described in 2011, web search logs from 2010
  • #19 credit Aaron Koblin - integrate with previous
  • #20 Animate red box to emphasize Turkers don't see it
  • #21 Using NLP to tag diseases and conditions in drug labels. One disease at a time. Ask turkers to answer yes/no questions w.r.t. whether the highlighted disease is an indicated use of the highlighted drug.
  • #24 This is a jumping off point for the audience to consider.
  • #26 Note the differences between this and AMT. Incentives are different, tasks are the same, training same, aggregation same, Cost scales differently..
  • #33 CACAO Jim Hu.
  • #41 “Instrumental” ??
  • #46 Task-specific expertise is lost at end of experiment