How Can Crowdsourcing and Machine Learning Improve Speech Technology?

Can crowdsourcing be a reliable source of data for speech technology?

By DefinedCrowd. Presented at Crowdsourcing Week Global 2016. Learn more and join the next event: www.crowdsourcingweek.com

  1. How can crowdsourcing and machine learning improve speech technology? Joao Freitas, Daniela Braga. CSW Global, London, April 14th 2016
  2. How many of you have tried speech recognition?
  3. Speech technology is everywhere
  4. And it is starting to understand you…
  5. What it takes to get there: large amounts of data and deep learning
     • 3,000+ hours of speech recordings with transcriptions
     • 200+ words with pronunciations
     • 0.5M natural-language variants with semantic annotation
     All of it is language- and product-dependent!
  6. DefinedCrowd landscape: We serve the data needs of the AI and ML landscape. We're a SaaS company that collects and enriches training data for AI, combining crowdsourcing and ML.
  7. The world before DefinedCrowd
     • User: Louis, speech scientist
     • Goal: test whether the Chinese acoustic model works for Mandarin speakers in Singapore
     • What does he do? He hires a few vendors, 1 PM, 1 dev, and 1 in-house Chinese LE.
     • What does he get? 50 hours of raw speech with poor quality (~20% garbage), from unknown sources, after a long wait.
  8. The world after DefinedCrowd
     • User: Andy, speech scientist
     • Goal: test whether the Chinese acoustic model works for Mandarin speakers in Singapore
     • What does he do? He subscribes to our platform.
     • How does he do it? He picks a template, adjusts the settings and picks the crowd, launches the job, and collects the data.
     • What does he get? 50 hours of pure speech with high quality, 100% transparency, and 50% faster throughput.
  9. Our platform – enterprise side
  10. Unique crowd model: 30+ countries, 100+ dialects, 3,000 crowd members
     • Crowd members by country: US 200+, Brazil 200+, Taiwan 100+, Russia 200+, Japan 100+, Korea 100+, Ukraine 100+, Spain 100+, Portugal 100+, France 100+, Germany 100+, Denmark 50+, Sweden 50+, Finland 50+, Netherlands 50+, Italy 100+, Greece 100+, Czech Republic 100+, Poland 100+, Turkey 100+, Belgium 50+, Australia 100+, New Zealand 50+, Mexico 100+, Puerto Rico 100+, Canada 100+, China 200+, Vietnam 50+, Thailand 50+, Malaysia 50+, Singapore 50+, India 100+
  11. We know a lot about our crowd: languages and dialects, user activity, job performance, school and courses, profile info, other jobs
  12. Why is machine learning relevant for crowdsourcing?
  13. How we use machine learning: we learn from metadata to provide recommendations to customers and crowd members
  14. How we detect spam
     • Raw data: logging system, behavior measures
     • Data processing: clean data, transform data
     • Feature extraction: task-related measures (e.g. average duration), session duration, execution peaks, consensus score, real-time audits
     • Classification and analysis: detect outliers/anomalies, predict task/job duration
  15. Example of results I
  16. Same results, different perspective
  17. Another dimension
  18. Quality in our platform
     1. Combined score of qualification tests
     2. Real-time audits and reviews
     3. Majority vote
     4. Overall majority
     5. Worker expertise
     6. Task subjectiveness
     7. …
  19. Other predictions using machine learning (trading off quality, time, and cost)
     • Best quality/budget tradeoff
     • Best match between job and crowd member
     • Expected quality
     • When a job will finish (even before it starts)
  20. Intelligent data for AI. Contacts: joao@definedcrowd.com, daniela@definedcrowd.com, mail@definedcrowd.com
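
Slide 14 names the stages of the spam-detection pipeline but does not say which outlier detector is used. The sketch below assumes one common choice: the modified z-score, based on the median absolute deviation (MAD). It is applied to a single per-worker feature such as average task duration or consensus score. The feature names, the sample values, and the 3.5 cut-off are illustrative assumptions, not DefinedCrowd's actual method.

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so there is no spread
        # to compare against; this sketch then reports no outliers at all.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(workers, feature, threshold=3.5):
    """Return the ids of workers whose value for `feature` looks anomalous.

    `workers` maps a worker id to a dict of extracted features. This is a
    hypothetical structure for illustration only, and the 3.5 threshold is
    an assumed default.
    """
    ids = list(workers)
    scores = modified_z_scores([workers[w][feature] for w in ids])
    return [w for w, z in zip(ids, scores) if abs(z) > threshold]


if __name__ == "__main__":
    # Made-up feature values: "w5" answers suspiciously fast and rarely
    # agrees with the other workers.
    workers = {
        "w1": {"avg_task_seconds": 41.0, "consensus_score": 0.92},
        "w2": {"avg_task_seconds": 38.0, "consensus_score": 0.88},
        "w3": {"avg_task_seconds": 45.0, "consensus_score": 0.95},
        "w4": {"avg_task_seconds": 40.0, "consensus_score": 0.90},
        "w5": {"avg_task_seconds": 3.0, "consensus_score": 0.35},
    }
    for feature in ("avg_task_seconds", "consensus_score"):
        print(feature, flag_outliers(workers, feature))
```

A median-based score is used here because a few spammers can inflate the mean and standard deviation enough to hide under a plain z-score. The median and MAD are much less sensitive to them.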

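Slide 18 lists majority vote and worker expertise among the platform's quality signals. Here is a minimal sketch of how the two could be combined. It assumes each judgment is a (worker id, label) pair and that expertise is a per-worker numeric weight; both representations are hypothetical. With no weights it reduces to a plain majority vote, and with weights it becomes an expertise-weighted vote. The returned agreement share is one simple, assumed way to express consensus.

```python
from collections import defaultdict


def weighted_majority(judgments, expertise=None, default_weight=1.0):
    """Aggregate the labels that several workers gave to one item.

    judgments: list of (worker_id, label) pairs.
    expertise: optional dict worker_id -> weight (hypothetical input);
        workers missing from it count with default_weight, so passing
        expertise=None gives a plain majority vote.
    Returns (winning_label, agreement), where agreement is the winner's
    share of the total vote weight. Ties go to the label seen first.
    """
    if not judgments:
        raise ValueError("no judgments to aggregate")
    expertise = expertise or {}
    totals = defaultdict(float)
    for worker, label in judgments:
        totals[label] += expertise.get(worker, default_weight)
    total_weight = sum(totals.values())
    if total_weight == 0:
        raise ValueError("all vote weights are zero")
    # max() returns the first maximal key, i.e. the label seen first on ties.
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight


if __name__ == "__main__":
    # Three workers transcribe the same short clip (made-up example).
    votes = [("a", "turn left"), ("b", "turn left"), ("c", "turn lift")]
    print(weighted_majority(votes))               # plain majority vote
    print(weighted_majority(votes, {"c": 3.0}))   # heavily weighted worker wins
```

The weights still have to come from somewhere. The qualification-test scores and audits listed on the same slide are plausible sources, but that estimation step is not shown here.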
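
Slide 19 mentions predicting the best quality/budget tradeoff and the expected quality of a job. One textbook way to reason about this rests on three simplifying assumptions: binary labels, independent workers, and a single per-worker accuracy p. These are not necessarily DefinedCrowd's model. Under them, the probability that a majority of n workers is right is the sum over k > n/2 of C(n, k) p^k (1 - p)^(n - k). The cheapest odd n that reaches a target accuracy then follows directly. The per-judgment cost and the `max_n` cap below are illustrative parameters.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent workers, each correct
    with probability p on a binary task, gives the right label (n odd)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def cheapest_redundancy(p, target, cost_per_judgment, max_n=15):
    """Smallest odd number of judgments per item whose expected majority-vote
    accuracy reaches `target`, returned as (n, accuracy, cost_per_item).
    Returns None if no n up to max_n (an assumed cap) is enough."""
    for n in range(1, max_n + 1, 2):
        acc = majority_accuracy(p, n)
        if acc >= target:
            return n, acc, n * cost_per_judgment
    return None


if __name__ == "__main__":
    # Assumed inputs: workers are right 80% of the time; a judgment costs 0.05.
    for target in (0.90, 0.95):
        print(target, cheapest_redundancy(0.8, target, 0.05))
```

With p = 0.8, three judgments give 0.896 and five give about 0.942. Raising the target from 0.90 to 0.95 therefore moves the answer from 5 to 7 judgments per item, so each extra point of quality costs proportionally more.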