How Public Sector is using Mechanical Turk

AWS Worldwide Public Sector Symposium, session 2: how Mechanical Turk is being used to transform business processes in the public sector.

  • We knew it should be less expensive, and certainly less capital intensive. We knew it should be more scalable. What we didn’t know was how it would compare on the other fronts.
  • The current technology solution produced limitations that impacted our ability to support the business needs.
  • How Public Sector is using Mechanical Turk

    1. AWS Government, Education, and Nonprofits Symposium, Washington, DC | June 24, 2014 - June 26, 2014. Transformational Impact of Cloud Labor. John Hoskins & Daniel Gray, jhoskins@amazon.com, djgray@amazon.com
    2. How is Mechanical Turk impacting business?
    3. Forestry Service wants to provide real-time online campsite booking
        • 350,000 individual campsites; exact location is unknown
        • Thousands of campgrounds with little or no POI data (bathroom? shower? boat ramp?)
        • No concierge for a double booking
    4. US Copyright Office would like to provide internet access to CR data
        • Current data is contained exclusively on cards and microfilm
        • Scanning project is underway
        • No taxonomy for discovery
       “What would the internet be without a search engine?”
    5. Business Need: The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.
    6. Why:
        • Over 2 million serious adverse drug reactions (ADRs) yearly
        • 100,000 deaths yearly
        • ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents and automobile deaths
    7. Business Problem: Reports of interactions arrive at random, and the current process for extracting data from thousands of forms causes a significant lag in its availability.
    8. Challenge:
        • Data can be received in multiple formats: forms (handwritten and typed), email, electronic . . .
        • Data is subject to HIPAA privacy regulations.
        • Accuracy and response time are critical; the budget constraint is obvious.
    9. Solution:
        • Technology can shred the form to field level or below
        • OCR makes a pass at recognizing the data
        • Workers correct the OCR output
        • Data from workers is reconstructed into digital input for the database
        • Data is made available through the openFDA API
    10. Business Need: A government defense contractor needs to update its natural language processing (NLP) system to accommodate “internet speak”.
    11. Business Problem: Comments from the internet in the form of posts and tweets more closely resemble spoken language, while NLP is predicated on written language.
    12. Challenge:
        • NLP is involved in a mission-critical defense system and is missing significant data due to inaccuracies.
        • Cross-referencing spoken language to written language in Arabic is uniquely complex
        • Training requires millions of data points of ground truth
    13. Solution:
        • Internet crawler scrapes posts with interesting key words and phrases.
        • Phrases are translated by 5 unique native Arabic speakers (5 dialects) with English as their second language
        • Each of the 5 phrases is corrected by English grammar experts
        • The five corrected phrases are voted on by a panel of 5 additional workers
        • The best phrase (highest score with fewest corrections) is sent to 5 native English speakers with Arabic as their second language for translation
        • Each result is corrected by Arabic grammar experts and then voted on
        • Best result is fed into NLP with the original phrase for learning
    14. Business Need: Army Research Labs needed to annotate verbs across many permutations against actual human actions to train robots to recognize
    15. Business Problem: The volume of data required placed significant delays on the project, yet accuracy was paramount to the results.
    16. Challenge:
        • Sample consisted of 100 different samples of 10 permutations of 35 verbs: 350,000 videos
        • At 20 seconds each, that’s almost 2000 hours, a person-year.
        • Project needed completion within 60 days
    17. Solution:
        • Workers were given 50 videos per task and asked if the video represented a given verb permutation
        • Gold standard videos were included in each batch of 50
        • Vote consisted of 2 workers with 100% Gold standard accuracy agreeing
    18. Thank You. http://www.mturk.com. John Hoskins, Amazon Mechanical Turk, hoskins@amazon.com
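
The workload estimate on slide 16 and the acceptance rule on slide 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: the function names, the `(worker_id, answers)` data shape, and the rule that a video is left unresolved when trusted workers split are all hypothetical. The viewing-time figure uses the 350,000-video total stated on slide 16, which matches the "almost 2000 hours" claim (the listed factors 100 x 10 x 35 multiply to 35,000, so one of the factors on the slide is likely understated).

```python
from collections import defaultdict


def review_hours(video_count, seconds_per_video):
    """Hours of viewing time for a single pass over every video."""
    return video_count * seconds_per_video / 3600


def is_trusted(answers, gold):
    """True if the batch contains gold videos and every one was answered correctly."""
    checked = [vid for vid in answers if vid in gold]
    return bool(checked) and all(answers[vid] == gold[vid] for vid in checked)


def accepted_labels(submissions, gold, votes_needed=2):
    """Accept a label once `votes_needed` distinct trusted workers agree on it.

    submissions: iterable of (worker_id, {video_id: bool}) pairs (assumed shape).
    gold: {video_id: bool} with the known answers for the gold standard videos.
    """
    voters = defaultdict(lambda: {True: set(), False: set()})
    for worker_id, answers in submissions:
        if not is_trusted(answers, gold):
            continue  # anything below 100% gold accuracy is ignored
        for vid, answer in answers.items():
            if vid not in gold:
                voters[vid][answer].add(worker_id)

    accepted = {}
    for vid, by_answer in voters.items():
        winners = [a for a, workers in by_answer.items() if len(workers) >= votes_needed]
        if len(winners) == 1:  # assumption: a split decision stays unresolved
            accepted[vid] = winners[0]
    return accepted


# Hypothetical example data: two gold videos (g1, g2) and two real videos (v1, v2).
gold = {"g1": True, "g2": False}
submissions = [
    ("w1", {"g1": True, "g2": False, "v1": True, "v2": False}),
    ("w2", {"g1": True, "g2": False, "v1": True, "v2": True}),
    ("w3", {"g1": False, "g2": False, "v1": False, "v2": True}),  # fails a gold check
]

print(f"{review_hours(350_000, 20):.0f} hours")  # prints "1944 hours"
print(accepted_labels(submissions, gold))        # prints {'v1': True}
```

In this example `v2` stays unresolved because the two trusted workers disagree, so in practice it would need to be sent out to additional workers until two trusted answers match.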
