FCPCCS - Big Data and Crowdsourcing
Pattern-recognition and the crowd
What would you do with unlimited human analysts?
People · Data · Categories
Models
Unstructured data gets structured (bonus: a system that gets smarter over time)
[Diagram: Adaptive System: Human Annotation, Machine Learning, Optimization, and a Prediction Engine feed one another; the output is Structured Data, delivered as Reports or routed for Action]
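One common way to build that kind of adaptive loop is uncertainty sampling: predict on the unlabeled pool, escalate the least-confident items to human annotators, and retrain. A minimal sketch only, assuming scikit-learn and a made-up `ask_humans` stub (neither is named in the deck):

```python
# Illustrative uncertainty-sampling loop (not Idibon's actual system):
# predict on the unlabeled pool, escalate the least-confident items to
# human annotators, fold their labels back in, and retrain.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def ask_humans(texts):
    """Stand-in for the human-annotation step (a crowd or analyst task)."""
    # A real system would create annotation tasks here; this stub just
    # pretends every escalated item comes back labeled "positive".
    return ["positive" for _ in texts]

labeled_texts = ["great product", "terrible service"]     # tiny seed set
labels = ["positive", "negative"]
unlabeled = ["not bad at all", "worst ever", "love it", "meh", "refund please"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

for _ in range(3):                               # a few annotate/train cycles
    model.fit(labeled_texts, labels)
    probs = model.predict_proba(unlabeled)       # model confidence per item
    uncertainty = 1 - probs.max(axis=1)          # low max prob = unsure
    pick = np.argsort(uncertainty)[-2:]          # escalate the 2 hardest items
    batch = [unlabeled[i] for i in pick]
    labeled_texts += batch
    labels += ask_humans(batch)
    unlabeled = [t for i, t in enumerate(unlabeled) if i not in set(pick)]
```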
[Bar chart: Finding Relevant News Articles: % analyst time saved and % accuracy (compared to humans) for Health Sciences, Manufacturing, News Category 1, News Category 2, and News Category 4]
Efficiency of human time is a major benefit
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine
Wait a sec! Aren’t these ducks? (Can we agree to disagree?)
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine
• In our case, toxicity was defined as:
• ad hominem attacks (directed at specific people)
• bigoted comments (e.g., sexist, racist, or homophobic remarks)
• Set definitions
• Then see if people are consistent
• Run pilots
• Measure inter-annotator agreement
• Iterate
Inter-annotator agreement: is everyone measuring the same way?
Quick recommendation for inter-annotator agreement
• You can measure consistency; Krippendorff’s alpha is probably the best way
• Don’t use percentage agreement! Particularly when the data are skewed towards one category
• If 95% of the data fall under one category label, even random coding would have two people agree so often that % agreement would make you think you had a reliable study (even though you wouldn’t; a worked comparison follows below)
• And you can ALSO use models to check these things
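To make the percentage-agreement trap concrete, here is a small self-contained sketch (illustrative only, not any particular tool from the deck) that computes nominal Krippendorff’s alpha from scratch and compares it with raw agreement on a skewed two-label task:

```python
# Self-contained nominal Krippendorff's alpha, compared with raw percentage
# agreement on a skewed two-label task (both coders label ~95% "ok").
import random
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: list of per-item label lists, one label per coder."""
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue                      # single-coder items carry no pairs
        for c, k in permutations(codes, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _), count in coincidences.items():
        n_c[c] += count
    n = sum(n_c.values())
    # observed vs. expected disagreement (mismatching labels count as 1)
    d_o = sum(v for (c, k), v in coincidences.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e if d_e > 0 else 1.0

random.seed(0)
pairs = []
for _ in range(100):
    # two coders labeling essentially at random on a 95/5 split
    a = "toxic" if random.random() < 0.05 else "ok"
    b = "toxic" if random.random() < 0.05 else "ok"
    pairs.append([a, b])

print(sum(a == b for a, b in pairs) / len(pairs))   # ~0.9: looks "reliable"
print(krippendorff_alpha_nominal(pairs))            # near 0: it's just chance
```

On data this skewed, raw agreement looks reassuring while alpha correctly reports that the two coders share no real signal.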
Finding healthy communities (supportive)
And unhealthy ones (toxic)
Collect data and annotations—then interrogate them
Human annotations feed two questions:
• Which people/categories should we be wary of?
• Which annotations do we select to train a model with? (one possible approach is sketched below)
The result: a classifier that can predict unseen data
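The deck does not say how these selection decisions are made, but a common pattern is to score each annotator against the majority vote and train only on items where the trusted annotators agree. A hypothetical sketch (all names and thresholds invented):

```python
# Hypothetical selection pass: score annotators against the majority vote,
# keep the ones above a reliability threshold, and train only on items
# where the kept annotators are unanimous.
from collections import Counter, defaultdict

# annotations[item_id] = {annotator_id: label}
annotations = {
    "doc1": {"ann_a": "toxic", "ann_b": "toxic", "ann_c": "ok"},
    "doc2": {"ann_a": "ok",    "ann_b": "ok",    "ann_c": "ok"},
    "doc3": {"ann_a": "ok",    "ann_b": "toxic", "ann_c": "toxic"},
}

majority = {item: Counter(votes.values()).most_common(1)[0][0]
            for item, votes in annotations.items()}

agreement = defaultdict(list)
for item, votes in annotations.items():
    for annotator, label in votes.items():
        agreement[annotator].append(label == majority[item])

# "Which people should we be wary of?"
reliable = {a for a, hits in agreement.items() if sum(hits) / len(hits) >= 0.6}

# "Which annotations do we train with?"  Only unanimous items survive here.
training_labels = {}
for item, votes in annotations.items():
    kept = [label for a, label in votes.items() if a in reliable]
    if kept and len(set(kept)) == 1:
        training_labels[item] = kept[0]

print(reliable)         # trusted annotators
print(training_labels)  # only doc2 survives; doc1/doc3 stay contested
```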
Routing messages that matter
Processing millions of SMS messages in 12 African languages: each message gets four tags (a routing sketch follows the list)
• Intent of sender (e.g., report a problem, ask a question, or make a suggestion)
• Categorization (e.g., orphans and vulnerable children, violence against children, health, nutrition)
• Language detection (e.g., English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)
• Location (e.g., village names)
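The deck does not describe the implementation, but the four tags above can be sketched as independent per-task text classifiers whose combined output drives routing. Everything below (the training snippets, labels, and the `route_sms` helper) is hypothetical:

```python
# Hypothetical routing sketch: one small text classifier per task, then a
# route_sms helper that bundles the predictions. Data and labels are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_text_classifier(texts, labels):
    # character n-grams hold up better on short, multilingual SMS text
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    return model.fit(texts, labels)

lang_clf = train_text_classifier(
    ["there is no clean water here", "tuyambe abaana bato"],
    ["English", "Luganda"],
)
intent_clf = train_text_classifier(
    ["there is no clean water here", "when will the clinic reopen?"],
    ["report a problem", "ask a question"],
)
category_clf = train_text_classifier(
    ["there is no clean water here", "children are being beaten at school"],
    ["health", "violence against children"],
)

def route_sms(text):
    """Tag one message; downstream logic picks the analyst queue from the tags."""
    return {
        "language": lang_clf.predict([text])[0],
        "intent": intent_clf.predict([text])[0],
        "category": category_clf.predict([text])[0],
    }

print(route_sms("no clean water in our village"))
```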
Only 1.4% of messages are classified as “Environment”
[Chart: Top 3 categories in Nigeria: Employment, U-report support, and Health (9.69%, 17.68%, and 39.44% of messages)]
The Donald Rumsfeld Question
How do I find what I don’t know I don’t know?
Negative topics in Walmart employee reviews (counts)
• Management: 2,404
• Low Pay: 1,446
• Work/life balance: 1,241
• Hours/Benefits: 968
• Training & Expectation: 968
• Dealing With Customers: 658
• Company Values: 518
[Charts: Common Pros among Employees and Common Cons among Employees, with percentages broken out by Current vs. Former employees]
Structuring unstructured data lets you combine it with other metadata
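As a sketch of what that combination can look like in practice (the DataFrame below is invented, not the Walmart data), once each review carries a predicted topic you can cross it with any metadata column, such as current vs. former status:

```python
# Invented example: once each review has a predicted topic, that label is just
# another column to cross with metadata such as current vs. former status.
import pandas as pd

reviews = pd.DataFrame({
    "status": ["Current", "Former", "Former", "Current", "Former"],
    "predicted_topic": ["Management", "Low Pay", "Management",
                        "Work/life balance", "Low Pay"],
})

# share of each predicted topic within each employee group
breakdown = pd.crosstab(reviews["predicted_topic"], reviews["status"],
                        normalize="columns")
print(breakdown.round(2))
```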
Question: What improves models the most?
Instead of worrying about the algorithms in the machine
It’s almost always better to just get more pandas
How else do you verify?
• We assess model accuracy using cross-validation (a minimal sketch follows below)
• Instead of using all annotated data to train a model, you hold out a random 10% and build the model with the rest
• Then you predict against that held-out 10%; you do this 10 times and average the accuracy
• Precision measures “if we automatically label something as X, how often are we right?”
• Recall measures “how much of the stuff that SHOULD have label X is actually given label X?”
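A minimal sketch of the 10-fold check described above, assuming scikit-learn and a toy annotated dataset (neither is specified in the deck):

```python
# Toy 10-fold cross-validation reporting accuracy, precision, and recall.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

texts = (["great community, very supportive", "thanks for the helpful answer"] * 15
         + ["you are an idiot", "get out of here"] * 15)
labels = ["ok"] * 30 + ["toxic"] * 30

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# hold out a random 10% ten times, averaging each metric across the folds
scores = cross_validate(
    model, texts, labels, cv=10,
    scoring={
        "accuracy": "accuracy",
        "precision": "precision_macro",  # when we label something X, how often are we right?
        "recall": "recall_macro",        # how much of the real X did we actually catch?
    },
)
for name in ("accuracy", "precision", "recall"):
    print(name, round(scores[f"test_{name}"].mean(), 3))
```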
The system gets smarter
• Here’s what happens across the first 2,543 annotations on one REALLY low-signal classification task
• By 9,744 annotations, our accuracy is 97%
Other tasks are more straightforward
[Chart: F-scores go up with more annotations: F-score (0.0 to 1.0) vs. number of paragraphs annotated (50 to 200) for the extracted fields Disease, Country, Reported_deaths, Reported_cases, Date, Issue, Location, People affected, # of deaths, and Event date]
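One hedged sketch of how a curve like this is produced: train on progressively larger slices of the annotated paragraphs and score each model against held-out data. The corpus, labels, and scikit-learn usage below are assumptions for illustration:

```python
# Illustrative learning curve: retrain on growing slices of annotated
# paragraphs and report the F-score on a fixed held-out set. With this toy,
# perfectly separable corpus the score saturates immediately; with real,
# noisy annotations the curve climbs gradually, as in the chart above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

train_x = ["an outbreak of cholera was reported",
           "the ministry announced a new budget"] * 30
train_y = ["Disease", "Other"] * 30
test_x = ["cholera cases were reported in the district",
          "a new budget was announced today"] * 5
test_y = ["Disease", "Other"] * 5

for n in (10, 20, 40, 60):                    # "number of paragraphs annotated"
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_x[:n], train_y[:n])
    score = f1_score(test_y, model.predict(test_x), pos_label="Disease")
    print(f"{n:>3} annotations -> F-score {score:.2f}")
```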
Project workflow
• Phase 1 (Data): data capture, normalization, and loading
• Phase 2 (Discovery): topic discovery, category creation, expert data annotation, category verification
• Phase 3 (Training): guideline creation, annotator validation, model training
• Phase 4 (Optimization): model evaluation, category refinement
• Phase 5 (Model Deployment): full system integration, model performance, metrics reporting
email: tyler@idibon.com
twitter: @idibon
web: idibon.com
THANK YOU!


Editor's Notes

  • #4 http://nypost.com/2015/02/07/meet-the-bird-brains-batty-enough-to-go-bird-watching-in-winter/
  • #9 This is the basic stuff you want. (It’s a little self-serving because Idibon’s adaptive system is what makes us special, but we really do believe that optimizing training on relevant data with meaningful categories is THE way to deliver business value.) By using computers to create an initial understanding of data and elevate specific cases for Human Annotation, we use computers to make human decisions smarter, and humans to make computer decisions smarter. Our system optimizes work by using cutting-edge Machine Learning that improves accuracy and learns iteratively. Our Prediction Engine provides initial conclusions for further evaluation by human analysts and is also what allows us to scale to tens of millions of messages a day. Our Optimization process teaches our algorithm what results to select for, essentially refining its accuracy. The key takeaway here is that we optimize for human analysts’ time; we can cluster data initially and automatically, then we can escalate specific cases to human annotation. Much of the learning is unsupervised and therefore faster, cheaper, and actually more accurate. After iterations in our adaptive system, previously unstructured data is now structured. This structured data can be delivered in different outputs, including CSV file exports for your analysts to build reports or direct routing to customer service agents to take action.
  • #10 As you can see—different categories have different results. News category 1 is awesome—you really don’t have to show human analysts much data to get all the Relevant stuff (you show them 10% of the data and still get 99% of what the client cares about). Manufacturing is less awesome. You can reduce your workload by 73%…but you have to accept that you’ll only get 83% of the stuff you care about (you’ll miss 17%). If you want to get more like 90% accuracy, you need to review more documents. You “only” get a workload reduction of ~56%. Ideally, you want a system that gets better over time.
  • #11 First case study! http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
  • #12 Lately, Reddit has gotten a lot of press for having terrible, awful communities
  • #13 See also http://cswww.essex.ac.uk/Research/nle/arrau/icagr.pdf
  • #14 http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
  • #15 The important thing is having definitions people will agree with and can be consistent with…and which actually answer organizational objectives. Do you care about whether duck decoys and/or rubber duckies are ducks or not? WHY? http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
  • #16 The trickiest thing about ad hominem attacks as a definition is: what to do with trash talk in sports/gaming. Tricky!
  • #18 The trickiest thing about ad hominem attacks as a definition is: what to do with trash talk in sports/gaming. Tricky!
  • #19 This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/ The DIY (do it yourself) group is the one that is most supportive and least toxic. This data ties to actual upvote/downvote behavior. Meaning that you’re not actually a supportive community if everyone down votes the supportive comments, nor are you a toxic community if everyone downvotes the toxic comments.
  • #20 This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/ It’s only when everyone upvotes toxic comments that you are a toxic community by our definition here.
  • #21 We also specifically looked at bigotry. Indeed, /r/TheRedPill, is seen as the most bigoted. It’s a subreddit dedicated to proud male chauvinism. This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
  • #23 Case study three: http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/ Photo: http://unicefaids.tumblr.com/post/37835112363/photo-young-people-in-kitwe-zambia-explore-the
  • #24 The United Nations Children’s Fund (UNICEF) is a United Nations branch that provides long-term humanitarian and developmental assistance to children and mothers in developing countries. Idibon provides scalable natural language processing and analytics to UNICEF’s multinational U-report applications, enabling UNICEF to process text messages sent from citizens in Uganda and Nigeria “to better understand and empower marginalized communities that are often excluded due to language barriers.” (Evan Wheeler, CTO of UNICEF’s Global Innovation Centre) UNICEF U-report only has six dedicated analysts to process and respond to millions of messages a month and Idibon’s technology enables the organization to operate efficiently and at scale. Specifically, Idibon processes each SMS in four ways: Intent of sender – to prioritize support/services (UNICEF receives more than a million messages a month and can only respond to about a thousand) Categorization – to prioritize support/services and to route to appropriate analyst Language detection – to route to appropriate analyst Location – to identify where to send support/services Press release: http://unicefstories.org/2015/02/09/idibon-supports-unicef-to-provide-natural-language-processing-to-sms-based-social-monitoring-systems-in-africa/
  • #25 Environment is an important issue. But it looks to be about 1.4% of the data…which means you do have to get enough data to build a model. Note that different countries/languages talk about the environment differently (Uganda=droughts, cows; Nigeria: oil). So you may have more or less heterogeneity in your rarer categories. Image from http://www.theatlantic.com/photo/2011/06/nigeria-the-cost-of-oil/100082/ For more recent news: http://www.theguardian.com/environment/2015/jan/07/niger-delta-communities-to-sue-shell-in-london-for-oil-spill-compensation
  • #26 “Environment” is clearly an important issue in Nigeria but only 1.4% of the messages are classified that way. (One other thing: high/low percentages don’t necessarily correspond to personal or societal importance.)
  • #27 Each needle found makes the next one easier to find, buuuuuuut some things you want to find are just too rare. You can’t model things that aren’t in the data.
  • #28 At UNICEF, different people care about different categories—the people who respond to rumors of ebola outbreaks or cures are different than the people trying to keep track of economic issues. Most actionable is, of course, finding people who specifically require support about participating in the community.
  • #32 Pay and Opportunities are much less of a pro once employees have left Walmart, and become more of a con. Management is highly criticised among both current and former employees.
  • #36 9,744 annotations total: 951 for engageable, 8,793 for irrelevant