Open Data Science Conference 2015

Lukas Biewald's presentation at the Open Data Science Conference, "We Need Open Data to Become the Next Open Source."

Published in: Data & Analytics

  1. Why We Need More Data
  2. Lots of Data
  3. The Effect of Better Algorithms. Chart: classifier error rate (axis 0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM. Footer: CrowdFlower, Inc. – Proprietary and Confidential
  4. Real World Data: Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)
  5. The Effect of Better Features. Chart: classifier error rate (axis 0% to 30%) for Unigrams, Bigrams, and Unigrams+Bigrams
  6. Real World Data
  7. The Effect of More Data. Chart: classifier error rate (axis 0% to 14%) for N, 2N, and 4N
  8. Real World Data: Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)
  9. The Effect of Cleaner Data. Chart: classifier error rate (axis 0% to 14%) for 90%, 95%, and 100% accurate data
  10. Where Do Data Scientists Spend Their Time
  11. The Power of Open Data
  12. CrowdFlower Data Enrichment Platform
  13. Color Data
  20. Fleshmap
  22. Drug Side Effects
  25. Apple Watch
  26. Apple Watch
  27. Apple Watch
  28. Apple Watch
  29. Data for Everyone
  30. Collecting the Same Data Over and Over
  31. Open Data
  32. Make Your Data Public Setting
  33. Data for Everyone
  34. Data for Everyone Library
  35. Data for Everyone
  36. Data for Everyone
  37. Categorize URLs
  38. URL Categorization
  39. Open Data API
  40. Record Data
  41. Extracting Names and Titles
  42. Summarization
  43. Is an Image Funny?
  44. Classifying Medical Images
  45. Attributes of People
  47. 396 Scripts
  48. Thank You. Lukas Biewald, lukas@crowdflower.com, @L2K
