
Lessons from 2MM machine learning models

Kaggle is a community of almost 400K data scientists who have built almost 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons we have learned from the Kaggle community.

Published in: Data & Analytics

  1. Kaggle: the home of data science
  2. Example competitions:
     • GE Flight Quest 2: optimize flight routes based on weather & traffic ($250,000, 122 teams)
     • Hewlett Foundation, Automated Essay Scoring: develop an automated scoring algorithm for student-written essays ($100,000, 155 teams)
     • Allstate Purchase Prediction Challenge: predict a purchased policy based on transaction history ($50,000, 1,570 teams)
     • Merck Molecular Activity Challenge: help develop safe and effective medicines by predicting molecular activity ($40,000, 236 teams)
     • Higgs Boson Machine Learning Challenge: use the ATLAS experiment to identify the Higgs boson ($13,000, 1,302 teams)
  3. The Kaggle approach: models learn from labelled training data and are scored on test data whose labels are withheld.
     Training data (Age, Income, Default): (58, $95,824, True); (73, $20,708, False); (59, $82,152, False); (66, $25,334, True)
     Test data (Age, Income, Default withheld): (73, $53,445); (61, $36,679); (47, $90,422); (44, $79,040)
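The setup on this slide can be sketched in a few lines of Python: fit on the labelled training rows, then predict the withheld labels of the test rows. The single-cutoff "model" below is purely illustrative, not something from the talk.

```python
# Toy version of the Kaggle setup: labelled training data, unlabelled
# test data. The "model" is a single learned age cutoff, chosen only
# to keep the sketch self-contained.
train = [
    {"age": 58, "income": 95_824, "default": True},
    {"age": 73, "income": 20_708, "default": False},
    {"age": 59, "income": 82_152, "default": False},
    {"age": 66, "income": 25_334, "default": True},
]
test = [
    {"age": 73, "income": 53_445},
    {"age": 61, "income": 36_679},
    {"age": 47, "income": 90_422},
    {"age": 44, "income": 79_040},
]

def fit(rows):
    """Learn the age cutoff that best separates the training labels."""
    best_cut, best_correct = None, -1
    for cut in sorted({r["age"] for r in rows}):
        correct = sum((r["age"] >= cut) == r["default"] for r in rows)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

cutoff = fit(train)
predictions = [row["age"] >= cutoff for row in test]
```

In a real competition the predictions are submitted and scored against the hidden test labels, which is what drives the leaderboard shown later.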
  4. [Chart: Mapping Dark Matter competition progress; accuracy (lower is better), roughly .0170 improving toward .0150 from week 1 to the end. Highlighted entrant: Martin O’Leary, PhD student in glaciology, Cambridge University.]
  5. “In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms.” “The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe.”
  6. [Chart: Mapping Dark Matter competition progress; accuracy (lower is better), week 1 through the end. Highlighted entrants: Martin O’Leary (PhD student in glaciology, Cambridge U); Marius Cobzarenco (grad student in computer vision, UC London); Ali Haissaine & Eu Jin Loc (signature verification, Qatar U, and grad student at Deloitte); deepZot (David Kirkby & Daniel Margala, particle physicist & cosmologist); others.]
  7. We’ve worked with many of the world’s largest companies across healthcare & pharma, consumer internet, finance, industrial, consumer marketing, and oil & gas, including a $50b+ beverage company, a global bank, a top credit card issuer, and top 5 and top 20 E&P companies.
  8. That submit over 100K machine learning models per month. [Chart: monthly submissions to Kaggle competitions, May 2010 through May 2015, rising from near zero to well over 100,000.]
  9. There’s a cookbook for winning competitions on structured data. It starts with exploring the data.
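As a hypothetical first step of that cookbook, exploring the data might begin with simple per-column summary statistics; the numbers below are toy values, not Kaggle data.

```python
import statistics

# Exploratory look at one column: count, central tendency, spread,
# and range. Real exploration would also cover missing values,
# distributions, and relationships between columns.
incomes = [95_824, 20_708, 82_152, 25_334]  # made-up sample column

summary = {
    "count": len(incomes),
    "mean": statistics.mean(incomes),
    "stdev": statistics.stdev(incomes),
    "min": min(incomes),
    "max": max(incomes),
}
```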
  10. 2. Create and select features
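A minimal sketch of this step, under assumptions of my own: a derived ratio feature and a correlation-based filter, both invented for illustration rather than taken from the talk.

```python
# Feature engineering in two moves: (1) create new columns from raw
# ones, (2) select the features most associated with the target.
rows = [
    {"age": 58, "income": 95_824, "default": 1},
    {"age": 73, "income": 20_708, "default": 0},
    {"age": 59, "income": 82_152, "default": 0},
    {"age": 66, "income": 25_334, "default": 1},
]

# 1. Create: add a derived feature (hypothetical name and definition).
for r in rows:
    r["income_per_year_of_age"] = r["income"] / r["age"]

def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# 2. Select: rank features by |correlation| with the target.
target = [r["default"] for r in rows]
features = ["age", "income", "income_per_year_of_age"]
ranked = sorted(features,
                key=lambda f: abs(correlation([r[f] for r in rows], target)),
                reverse=True)
```

In practice winners use far richer feature constructions and model-based selection, but the create-then-select loop is the shape of the step.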
  11. 3. Parameter tuning and ensembling
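This step could look roughly like the toy sketch below: an exhaustive grid search over hyper-parameters on a validation split, then averaging the predictions of several models. The "models", grids, and labels are all made up for illustration.

```python
import itertools
import statistics

# Validation labels for a tiny hypothetical problem.
y_valid = [1, 0, 1, 1, 0]

def model(threshold, scale):
    """A stand-in 'model': fixed scores, scaled then thresholded."""
    scores = [0.9, 0.2, 0.6, 0.8, 0.4]
    return [int(s * scale >= threshold) for s in scores]

def accuracy(preds):
    return sum(p == y for p, y in zip(preds, y_valid)) / len(y_valid)

# 1. Parameter tuning: grid search on validation accuracy.
grid = itertools.product([0.3, 0.5, 0.7], [0.8, 1.0, 1.2])
best_params = max(grid, key=lambda p: accuracy(model(*p)))

# 2. Ensembling: average (majority-vote) predictions of several models.
members = [model(0.5, 1.0), model(0.7, 1.2), model(0.3, 0.8)]
ensemble = [int(statistics.mean(col) >= 0.5) for col in zip(*members)]
```

Winning entries typically combine many diverse tuned models this way, since averaging tends to cancel out individual models' errors.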
  12. A second cookbook is emerging on computer vision and speech problems. It involves using convolutional neural networks.
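The building block of that second cookbook is the convolution itself. A minimal pure-Python sketch, assuming no padding and stride 1 (real pipelines use a framework, learned filters, and many stacked layers):

```python
# One convolutional filter: slide a small kernel over the input and
# take dot products, producing a feature map.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 2x2 edge-detecting kernel applied to a tiny "image" with a
# vertical edge: the response peaks along the edge.
image = [[0, 0, 1, 1]] * 3
kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, kernel)
```

In a CNN thousands of such filters are learned from data rather than hand-chosen, which is where the training time discussed on the next slide goes.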
  13. When CNNs are applied, the vast majority of the time is spent training the models.
  14. Then there are the problems that land in the middle…
  15. Anthony Goldbloom, a@kaggle.com, 650 283 9781
