
Ben Hamner, CTO, Kaggle, at MLconf NYC 2017



Ben Hamner is Kaggle’s co-founder and CTO. At Kaggle, he is currently focused on creating tools that empower data scientists to frictionlessly collaborate on analytics and promote their results. He has worked with machine learning across many domains, including natural language processing, computer vision, web classification, and neuroscience. Prior to Kaggle, Ben applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated with a BSE in Biomedical Engineering, Electrical Engineering, and Math from Duke University.

Abstract Summary:

The Future of Kaggle: Where We Came From and Where We’re Going:
Kaggle started off running supervised machine learning competitions. This attracted a talented and diverse community that now has nearly one million members. It’s exposed us to hundreds of machine learning usecases, introduced hundreds of thousands to machine learning, and helped push the state of the art forward. We’ve expanded by launching an open data platform, Kaggle Datasets, along with a reproducible and collaborative machine learning platform, Kaggle Kernels. They have already achieved strong adoption by our community by making it simpler to get started with, share, and collaborate on data and code.

We’ve achieved less than 1% of what we’re capable of. Several weeks ago we announced an acquisition by Google. This enables us to move forward more rapidly and ambitiously. Working with analytics and machine learning is fraught with pain right now. It’s the software engineering equivalent of programming in assembly. It’s tough to access data. It’s tough to collaborate. It’s tough to reproduce results. We’ve seen these pain points over, and over, and over again. We’ve seen them in how our customers’ internal teams function. We’ve experienced them collaborating with our customers. We’ve seen them as people approach our competitions individually, and they become even more pronounced when our users team up. We want to solve this, and foster an era of intelligent services that improve your lives every single day.

In this talk, I’ll go into depth on the lessons we’ve learned from running Kaggle and the most frustrating pain points we’ve seen. I’ll discuss how you can ameliorate these by leveraging current open source tools and technologies, and wrap up by painting a picture of the future we’re building towards.



  1. The Future of Kaggle: Where we came from and where we’re going @benhamner
  2. Our mission is to help the world learn from data
  3. We got started running supervised learning competitions
  4. We’re now doing this at scale: since 2010, we’ve run ● 240 general competitions ● 1,610 university classroom competitions
  5. This has attracted a talented and diverse community
  6. We’ve taught machine learning to hundreds of thousands of people
  7. We’ve pushed the state of the art forward
  8. We’ve learned a tremendous amount along the way: ● What techniques work well ● How people win competitions ● Why our community participates ● What major pain points data scientists hit ● How we can help data scientists ameliorate these pain points
  9. Great data scientists optimize the entire ML workflow
  10. GBMs and deep neural networks are incredibly effective
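The slide’s claim about gradient-boosted models can be illustrated with a minimal sketch. The synthetic dataset, model settings, and scikit-learn usage here are illustrative assumptions, not material from the talk:

```python
# Minimal sketch: a gradient-boosted model as a strong out-of-the-box baseline.
# Synthetic data stands in for a competition dataset (an assumption).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit with mild regularization via a small learning rate and more trees.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, random_state=0)
gbm.fit(X_train, y_train)
print(f"held-out accuracy: {gbm.score(X_test, y_test):.3f}")
```

With essentially no feature engineering or tuning, a model like this typically lands near the top of what simpler baselines achieve, which is why GBMs are so common in winning competition entries.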
  11. Model ensembling almost always ekes out gains
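The simplest form of the ensembling the slide refers to is averaging the predicted probabilities of several diverse models. This sketch uses assumed models and synthetic data purely for illustration:

```python
# Sketch of basic ensembling: average class probabilities from diverse models.
# Model choices and data are illustrative assumptions, not Kaggle-specific.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

models = [
    RandomForestClassifier(random_state=1),
    GradientBoostingClassifier(random_state=1),
    LogisticRegression(max_iter=1000),
]
# Collect each model's probability of class 1 on the held-out set.
probs = [m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models]

# Average the probabilities and threshold at 0.5 for the final prediction.
ensemble_pred = (np.mean(probs, axis=0) > 0.5).astype(int)
print("ensemble accuracy:", accuracy_score(y_te, ensemble_pred))
```

Averaging works because the models make partly uncorrelated errors; more elaborate schemes (weighted averaging, stacking) follow the same principle.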
  12. Successful participants avoid overfitting
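A standard guard against the overfitting the slide warns about is k-fold cross-validation rather than trusting a single validation split. The data and model below are illustrative assumptions:

```python
# Sketch: k-fold cross-validation gives a more honest performance estimate
# than one train/validation split, reducing the risk of overfitting to it.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=2)
scores = cross_val_score(GradientBoostingClassifier(random_state=2), X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

In a competition setting, a large gap between cross-validation scores and leaderboard scores is itself a warning sign of overfitting to the public test set.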
  13. We’ve seen major pain points
  14. Today’s practices are like programming in assembly
  15. Next to software engineering tools, ML tools feel like they came from the Stone Age
  16. Accessing data is tough
  17. Getting high-quality data is even tougher
  18. Cleaning data is painful. Essay: “This essay got good marks, but as far as I can tell, it’s gibberish.” Human scores: 5/5, 4/5
  19. Data leakage is common and subtle
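One common, subtle form of the leakage the slide mentions is preprocessing fitted on all rows before the train/test split. This sketch contrasts a leaky pipeline with a correct one; the scaler-based example is an illustrative assumption, not one from the talk:

```python
# Sketch of subtle data leakage: fitting a scaler on ALL rows before the
# train/test split leaks test-set statistics (mean, variance) into training.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=3)

# Leaky: the scaler has already seen the test rows.
X_leaky = StandardScaler().fit_transform(X)
X_tr_leaky, X_te_leaky, _, _ = train_test_split(X_leaky, y, random_state=3)

# Correct: split first, fit the scaler on training data only,
# then apply the same transform to the test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
scaler = StandardScaler().fit(X_tr)
X_tr_scaled, X_te_scaled = scaler.transform(X_tr), scaler.transform(X_te)
```

The same split-before-fit discipline applies to imputation, target encoding, and feature selection, where leakage inflates validation scores far more dramatically than scaling does.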
  20. Going from research to production can be brutal
  21. Reproducing work takes days to months
  22. We can do better than this
  23. Accessing data should be seamless
  24. You should never need to repeat work others have done
  25. A single command should reproduce everything start-to-end: > make all
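The `make all` idea on the slide can be sketched as a Makefile in which each pipeline stage depends on the one before it, so one command rebuilds only what changed. All script and file names here are hypothetical:

```makefile
# Hypothetical pipeline Makefile: `make all` reproduces everything end-to-end.
all: report.html

data/raw.csv: download.py
	python download.py --out data/raw.csv

data/clean.csv: data/raw.csv clean.py
	python clean.py data/raw.csv data/clean.csv

model.pkl: data/clean.csv train.py
	python train.py data/clean.csv model.pkl

report.html: model.pkl report.py
	python report.py model.pkl report.html
```

Because make tracks timestamps, a one-line change to `report.py` reruns only the final stage, which is exactly the "one-line update should take seconds" property the next slide asks for.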
  26. Making a successful one-line update should take seconds
  27. Helpful metadata shouldn’t stay buried in minds or emails
  28. Best practices should be easy defaults, not complicated custom contraptions
  29. We’re changing this
  30. We’ve launched two new products: Kernels and Datasets
  31. We recently joined Google Cloud to accelerate our growth
  32. Datasets, Kernels, and Competitions have an exciting future
  33. The world’s data will be accessible with a common interface
  34. That captures the important code and metadata on top of it
  35. A central searchable hub for your organization’s data
  36. A kernel is an atom of reproducible data science
  37. Kernels will be your continuous integration server for data
  38. We’ve started running code competitions
  39. This will enable exciting new competition formats: ● Backtested time series ● Live data feeds ● Reinforcement learning ● Generative modeling ● Adversarial learning ● Machine learning under computational constraints ● Sensitive datasets