2. The aggregated decisions of a group are often better than those of any single member
Requirements:
Diversity
Independence
Decentralization
Aggregation
[Surowiecki, 2004]
Sir Francis Galton
3. An undefined group of people
Typically ‘large’
Diverse skills and abilities
Typically no special skills assumed
[Estelles-Arolas, 2012]
4. Computational power
Distributed computing
Content
Web searches, social media updates, blogs
Observations
Online surveys
Personal data
[Good & Su, 2013]
5. Cognitive power
Visual reasoning, language processing
Creative effort
Resource creation, algorithm development
Funding: $$$
[Good & Su, 2013]
6. Crowd data
Content
Search logs
Crowdsourcing
Observations
Cognitive power
Creative effort
Not a focus in this tutorial:
Distributed computing
Crowdfunding
7. Access
To the data; to the crowd
▪ 1 in 5 people have a smartphone worldwide
Engagement
Getting contributors’ attention
Incentive
Quality control
8. Information reflects health
Disease status
Disease associations
Health related behaviors
Information also drives health
Knowledge and beliefs regarding prevention and treatment
Quality monitoring of health information available to the public
“Infodemiology”
[Eysenbach, 2006]
9. Key challenge: text
Variability: “tired”, “wiped”, “pooped” → somnolence
Ambiguity: “numb” → sensory or cognition?
Two levels
Keyword: locate specific terms + synonyms
Concept: attempt to normalize mentions to specific entities
Measurement
Disproportionality analysis (see the sketch below)
Separating signal from noise
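Disproportionality analysis can be made concrete with a small worked example. This is a minimal sketch of one standard statistic, the proportional reporting ratio (PRR), computed from a 2x2 contingency table of report counts; the counts below are hypothetical and the function is illustrative, not necessarily the specific measure used in the studies cited in this tutorial.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Proportional reporting ratio (PRR) from a 2x2 contingency table.

    a: reports mentioning the drug AND the event
    b: reports mentioning the drug but not the event
    c: reports mentioning the event but not the drug
    d: reports mentioning neither
    PRR = [a / (a + b)] / [c / (c + d)]
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts for one drug-event pair.
prr = proportional_reporting_ratio(a=40, b=960, c=200, d=98800)
print(f"PRR = {prr:.1f}")  # values well above 1 suggest a signal worth reviewing
```

Separating signal from noise then largely amounts to choosing thresholds on statistics like this, combined with minimum report counts.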
10. Objective: predict flu outbreaks from internet search trends
Access to search data via direct access to logs or via ad clicks
High correlation between clicks one week and cases the next (see the sketch below)
Caveats!
Many potential confounders
[Eysenbach, 2006]
[Eysenbach, 2009]
[Ginsberg et al., 2009]
[Figure: weekly search volume vs. reported flu cases, 2004-2007]
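A minimal sketch of the lagged-correlation idea behind this result: compare search volume in week t with reported cases in week t+1. The weekly series below are made-up placeholders, not the data from the cited studies.

```python
import numpy as np

# Hypothetical weekly series: flu-related search volume and reported cases.
searches = np.array([120, 150, 210, 340, 500, 620, 540, 400, 260, 180], float)
cases    = np.array([ 80,  95, 130, 220, 380, 560, 640, 520, 380, 240], float)

# Correlate searches in week t with cases in week t + 1 (searches lead by one week).
lead = 1
r = np.corrcoef(searches[:-lead], cases[lead:])[0, 1]
print(f"Pearson r with a {lead}-week lead: {r:.2f}")
```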
11. Objective: mine social media forums for ADR reports
Lexicon based on the UMLS Metathesaurus, SIDER, MedEffect, and a set of colloquial phrases (“zonked”, misspellings); see the lexicon-matching sketch after this slide
Demonstrated viability of text mining (73.9% F-measure)
Revealed known ADRs and putatively novel ADRs
ADR (olanzapine)        Known incidence    Corpus frequency
Weight gain             65%                30.0%
Fatigue                 26%                15.9%
Increased cholesterol   22%                -
Increased appetite      -                  4.9%
Depression              -                  3.1%
Tremor                  -                  2.7%
Diabetes                2%                 2.6%
Anxiety                 -                  1.4%
[Leaman et al., 2010]
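A minimal sketch of the lexicon-matching step. The published lexicon was built from the UMLS Metathesaurus, SIDER, MedEffect, and colloquial phrases; the handful of entries and the posts below are hypothetical stand-ins, and the matching here is plain phrase lookup rather than the published system's full pipeline.

```python
import re

# Tiny illustrative lexicon: colloquial phrase -> normalized ADR concept.
LEXICON = {
    "zonked": "somnolence",
    "wiped out": "fatigue",
    "put on weight": "weight gain",
    "no appetite": "decreased appetite",
}

def find_adr_mentions(post, lexicon=LEXICON):
    """Return (phrase, concept) pairs for lexicon phrases found in a post."""
    text = post.lower()
    hits = []
    for phrase, concept in lexicon.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            hits.append((phrase, concept))
    return hits

posts = [
    "Started olanzapine last month and I feel totally zonked all day.",
    "Put on weight fast, and I'm wiped out by noon.",
]
for post in posts:
    print(find_adr_mentions(post))
```

Matches like these can then be counted per drug to produce corpus frequencies such as those in the olanzapine table above.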
12. Objective: identify DDIs from internet search logs
DDI reports difficult to find
Focused on a DDI unknown at the time the data were collected
▪ Paroxetine + pravastatin → hyperglycemia
Synonyms
Web searches
Disproportionality analysis
Results
Significant association
Classifying 31 TP & 31 TN pairs (see the sketch below)
▪ AUC = 0.82
[White et al., 2013]
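The reported AUC summarizes how well an association score separates the 31 known-interaction (TP) pairs from the 31 control (TN) pairs. A minimal sketch of that evaluation using scikit-learn; the scores here are randomly generated placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical association scores for 31 true-positive and 31 true-negative
# drug-drug pairs (e.g., a disproportionality statistic per pair).
scores_tp = rng.normal(loc=2.0, scale=1.0, size=31)
scores_tn = rng.normal(loc=1.0, scale=1.0, size=31)

labels = np.concatenate([np.ones(31), np.zeros(31)])
scores = np.concatenate([scores_tp, scores_tn])

print(f"AUC = {roc_auc_score(labels, scores):.2f}")
```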
13. Outsourcing
Tasks normally performed in-house
To a large, diverse, external group
Via an open call
[Estelles-Arolas, 2012]
14. EXPERT LABOR
Must be found
Expensive
Often slow
High quality
Ambiguity OK
Hard to use for experiments
Must be retained
CROWD LABOR
Readily available
Inexpensive
Fast
Quality variable
Instructions must be clear
Easy prototyping and experimentation
Retention less important
15. Humans (even unskilled) are simply better than computers at some tasks
Allows workflows to include an “HPU”
Highly scalable
Rapid turn-around
High throughput
Diverse solutions
Low risk
Low cost
[Quinn & Bederson, 2011]
16. Microtask: low difficulty, large in number
Observations or data processing
Surveying, text or image annotation
Validation: redundancy and aggregation
Megatask: high difficulty, low in number
Problem solving, creative effort
Validation: manual, using metrics or a rubric
[Good & Su, 2013]
17. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
19. Automatically tag all genes (NCBI’s gene tagger) and all mutations (UMBC’s EMU)
Highlight candidate gene-mutation pairs in context
Frame the task as simple yes/no questions (see the sketch below)
Slide courtesy: L. Hirschman [Burger et al., 2012]
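A minimal sketch of the "frame as simple yes/no questions" step: pair each automatically tagged gene mention with each tagged mutation mention from the same sentence and emit one question per candidate pair. The sentence and the tagger output below are hypothetical; the actual pipeline used NCBI's gene tagger and UMBC's EMU.

```python
# Hypothetical output of automatic taggers for one abstract sentence.
sentence = "The V600E mutation in BRAF was also observed alongside wild-type KRAS."
gene_mentions = ["BRAF", "KRAS"]      # e.g., from a gene tagger
mutation_mentions = ["V600E"]         # e.g., from a mutation tagger

def make_yes_no_tasks(sentence, genes, mutations):
    """Emit one simple yes/no microtask per candidate gene-mutation pair."""
    tasks = []
    for gene in genes:
        for mutation in mutations:
            tasks.append({
                "context": sentence,
                "question": (
                    f"In the sentence above, is the mutation {mutation} "
                    f"reported as occurring in the gene {gene}? (yes/no)"
                ),
            })
    return tasks

for task in make_yes_no_tasks(sentence, gene_mentions, mutation_mentions):
    print(task["question"])
```

Redundant answers to each question can then be aggregated, as discussed later in the tutorial.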
23. Baseline: majority vote
Can we do better? (see the sketch below)
Separate annotator bias and error
Model annotator quality
▪ Measure with labeled data or reputation
Model the difficulty of each task
Sometimes disagreement is informative
[Ipeirotis et al., 2010]
[Raykar et al., 2010]
[Aroyo & Welty, 2013]
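A minimal sketch of one way to go beyond majority vote: jointly estimate a single accuracy per worker and a posterior label per item with a few EM-style iterations. This is a deliberately simplified, symmetric model in the spirit of (but not identical to) the approaches in [Ipeirotis et al., 2010] and [Raykar et al., 2010].

```python
from collections import defaultdict

def aggregate_em(votes, n_iter=20, prior=0.5):
    """Aggregate redundant binary labels with a one-accuracy-per-worker model.

    votes: list of (item_id, worker_id, label) with label in {0, 1}.
    Returns (posterior P(label == 1) per item, estimated accuracy per worker).
    """
    by_item = defaultdict(list)
    for item, worker, label in votes:
        by_item[item].append((worker, label))

    # Initialize item posteriors with the majority vote.
    posterior = {item: sum(l for _, l in wl) / len(wl) for item, wl in by_item.items()}

    for _ in range(n_iter):
        # M-step: a worker's accuracy is the expected fraction of their labels
        # that agree with the (latent) true label.
        agree, total = defaultdict(float), defaultdict(float)
        for item, wl in by_item.items():
            p = posterior[item]
            for worker, label in wl:
                agree[worker] += p if label == 1 else (1.0 - p)
                total[worker] += 1.0
        accuracy = {w: agree[w] / total[w] for w in total}

        # E-step: recompute each item's posterior from worker accuracies.
        for item, wl in by_item.items():
            p1, p0 = prior, 1.0 - prior
            for worker, label in wl:
                a = min(max(accuracy[worker], 0.01), 0.99)  # avoid degenerate 0/1
                p1 *= a if label == 1 else (1.0 - a)
                p0 *= (1.0 - a) if label == 1 else a
            posterior[item] = p1 / (p1 + p0)

    return posterior, accuracy

# Tiny illustrative run with three workers and two items.
votes = [("doc1", "w1", 1), ("doc1", "w2", 1), ("doc1", "w3", 0),
         ("doc2", "w1", 0), ("doc2", "w2", 0), ("doc2", "w3", 0)]
posterior, accuracy = aggregate_em(votes)
print(posterior, accuracy)
```

Modeling per-class error rates, task difficulty, or informative disagreement requires richer models than this sketch.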
24. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
25. Volunteers label images of cell biopsies from cancer patients
Estimate presence and number of cancer cells
Incentive
Altruism, sense of mastery
Quality
Training, redundancy
Analyzed 2.4 million images as of 11/2014
[cellslider.net]
26. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
31. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
32. Bioinformatics students simultaneously learn and perform metagenome annotation
Incentive: educational
Quality: aggregation, instructor evaluation
[Hingamp et al., 2008]
33. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
34. OPEN PROFESSIONAL PLATFORMS ($$$)
Innocentive
TopCoder
Kaggle
ACADEMIC (PUBLICATIONS..)
DREAM (see invited opening talk at crowdsourcing session)
CASP
35. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
[Good & Su, 2013]
36. Players manipulate proteins to find the 3D shape with the lowest calculated free energy
Competitive and collaborative
Incentive
Altruism, fun, community
Quality
Automated scoring
High performance: players found the structure of a difficult retroviral protease
[Khatib et al., 2011]
37. MICROTASK
Microtask market
Citizen science
Workflow sequestration
Casual game
Educational
MEGATASK
Innovation contest
Hard game
Collaborative content creation
38. Aims to provide a Wikipedia page for every notable human gene
Repository of functional knowledge
10K distinct genes
50M views & 15K edits per year
[Huss et al., 2008]
[Good et al., 2011]
39. Crowdsourcing means many different things
Fundamental points:
Humans (even unskilled) are simply better than computers at some tasks
There are a lot of humans available
There are many approaches for accessing their talents
40. INTRINSIC
Altruism
Fun
Education
Sense of mastery
Resource creation
EXTRINSIC
Money
Recognition
Community
41. Define problem & goal
Decide platform
Decompose problem into tasks
Separate: expert, crowdsourced & automatable
Refine crowdsourced tasks
Simple, clear, self-contained, engaging
Design: instructions and user interface
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & deVries, 2011]
42. Iterate
Test internally
Calibrate with small crowdsourced sample
Verify understanding, timing, pricing & quality
Incorporate feedback
Run production
Scale on data before workers
Validate results
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & deVries, 2011]
43. Automatic evaluation
If possible
Direct quality assessment
Expensive
▪ Microtask: include tasks with known answers
▪ Megatask: evaluate tasks after completion (rubric)
Aggregate redundant responses (see the sketch below)
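A minimal sketch combining two of the microtask strategies above: score each worker on embedded known-answer (gold) tasks, keep only workers above an accuracy threshold, then aggregate the remaining redundant responses by majority vote. The data layout and the 0.7 threshold are illustrative assumptions.

```python
from collections import Counter, defaultdict

def filter_and_aggregate(responses, gold, min_accuracy=0.7):
    """responses: list of (item_id, worker_id, label); gold: {item_id: label}.

    Returns (majority labels for non-gold items, per-worker accuracy on gold).
    """
    # Score every worker on the embedded known-answer tasks.
    correct, attempted = defaultdict(int), defaultdict(int)
    for item, worker, label in responses:
        if item in gold:
            attempted[worker] += 1
            correct[worker] += int(label == gold[item])
    accuracy = {w: correct[w] / attempted[w] for w in attempted}

    # Keep only workers who pass the threshold; unscored workers are dropped.
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}

    # Majority-vote the redundant responses from trusted workers.
    votes = defaultdict(list)
    for item, worker, label in responses:
        if item not in gold and worker in trusted:
            votes[item].append(label)
    majority = {item: Counter(lbls).most_common(1)[0][0] for item, lbls in votes.items()}
    return majority, accuracy

# Tiny illustrative run: "g1" is a gold item with known answer 1.
responses = [("g1", "alice", 1), ("g1", "bob", 0),
             ("t1", "alice", 1), ("t1", "bob", 1), ("t1", "carol", 0)]
labels, worker_accuracy = filter_and_aggregate(responses, gold={"g1": 1})
print(labels, worker_accuracy)  # {'t1': 1} {'alice': 1.0, 'bob': 0.0}
```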
44. PRO
Reduced cost → more data
Fast turn-around time
High throughput
“Real world” environment
Public participation & awareness
CON
Potentially poor quality
Spammers
Potentially low retention
Privacy concerns for sensitive data
Lax protections for workers
45. Potentially poor quality: discussed previously
Low retention
Complicates quality estimation due to sparsity
Do workers build task-specific expertise?
Privacy
Sensitive data requires trusted workers
46. Protection for workers
Low pay, no protections, benefits, or career path
Potential to cause harm
▪ E.g. exposure to anti-vaccine information
Is IRB approval needed?
Can be addressed
Responsibility of the researcher
▪ “[opportunity to] deliberately value ethics above cost savings”
[Graber & Graber, 2013]
[Fort, Adda and Cohen, 2011]
47. Demographics:
Shift from mostly US to US/India mix
Average pay is <$2.00 / hour
Over 30% rely on MTurk for basic income
Workers not anonymous
However:
Tools can be used ethically or unethically
Crowdsourcing ≠ AMT
[Ross et al., 2009]
[Lease et al., 2013]
48. Improved predictability
Pricing, quality, retention
Improved infrastructure
Data analysis, validation & aggregation
Improved trust mechanisms
Matching workers and tasks
Relevant characteristics for matching each
Increased mobility
49. Crowdsourcing and learning from crowd data
offer distinct advantages
Scalability
Rapid turn-around
Throughput
Low cost
Must be carefully planned and managed
50. Wide variety of approaches and platforms available
Resources section lists several
Many questions still open
Science using crowdsourcing
Science of crowdsourcing
51. Thanks to the members of the crowd who make this methodology possible
Questions: robert.leaman@nih.gov, bgood@scripps.edu, asu@scripps.edu
Support:
Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of National Library of Medicine, NIH
Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820 and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH: UL1TR001114
53. Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI 2011; Vancouver, BC, Canada. Citeseer.
Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM-SIGIR 2011.
Aroyo L, Welty C: CrowdTruth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci 2013. ACM; 2013.
Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348: Springer Berlin Heidelberg; 2012: 83-91.
Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(2):189-200.
Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
54. Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business: Crown Business; 2009.
55. Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW 2011.
Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
Khatib F, DiMaio F, Foldit Contenders Group, Foldit Void Crushers Group, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. Social Science Research Network; 2013.
Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
Nielsen J: Usability Engineering: Academic Press; 1993.
56. Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning: O'Reilly Media; 2012.
Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing--harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Department of Informatics, UC Irvine, USA; 2009.
Surowiecki J: The Wisdom of Crowds: Doubleday; 2004.
Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.
Editor's Notes
Vox populi = “one vote, one value”
787 votes on ox weight, the median value was <1% off, mean was even closer
Criteria and descriptions:
Diversity of opinion: Each person should have private information, even if it's just an eccentric interpretation of the known facts.
Independence: People's opinions aren't determined by the opinions of those around them.
Decentralization: People are able to specialize and draw on local knowledge.
Aggregation: Some mechanism exists for turning private judgments into a collective decision.
Examples are drawn from biomedical research; there are many examples in other fields, from astronomy to botany to ornithology.
Some links for distributed computing and crowdfunding on resources page
Access to data can be hard
Blurs the line between demand crowd data and observational crowdsourcing
Example confounders – changes in search engine algorithm, seasonal searches, media reports, baseline search activity
Olanzapine used to treat schizophrenia and bipolar depression
Most frequently mentioned ADR was always a known ADR
“We used the DailyStrength health-related social network as the source of user comments in this study. DailyStrength allows users to create profiles, maintain friends and join various disease-related support groups. It serves as a resource for patients to connect with others who have similar conditions, many of whom are friends solely online. As of 2007, DailyStrength had an average of 14,000 daily visitors, each spending 82 minutes on the site and viewing approximately 145 pages (comScore Media Metrix Canada, 2007).”
DDI officially described in 2011, web search logs from 2010
credit Aaron Koblin - integrate with previous
Animate red box to emphasize Turkers don't see it
Using NLP to tag diseases and conditions in drug labels. One disease at a time. Ask turkers to answer yes/no questions w.r.t. whether the highlighted disease is an indicated use of the highlighted drug.
This is a jumping off point for the audience to consider.
Note the differences between this and AMT: incentives differ; the tasks, training, and aggregation are the same.
Cost scales differently.
CACAO Jim Hu.
“Instrumental” ??
Task-specific expertise is lost at end of experiment