Successfully reported this slideshow.

Kaggle competitions, new friends, new skills and new opportunities



1 of 28
1 of 28

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Kaggle competitions, new friends, new skills and new opportunities

  1. 1. Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist @matlabulus
  2. 2. About Me • 2005 - 2015 • Water Engineer o Consultant for Utilities o EngD Research • 2015 - Present • Data Scientist o Virgin Media o Domino Data Lab o 2
  3. 3. About This Talk • What happened o Things I did since I started participating in Kaggle competitions. o New opportunities – results of new skills and friends. 3
  4. 4. First MOOC Experience • One of the first Massive Open Online Courses. o Met some new friends. o Decided to collaborate for fun. o “How about Kaggle?” o “What is Kaggle?” 4
  5. 5. First Kaggle Experience • First time in my life o Supervised learning • Random Forest • Support Vector Machine • Neural Networks o Train, Validate & Predict. o “Is it black magic?” 5
  6. 6. First Kaggle Experience • Problems o “Hey Joe, you are a nice guy but we can’t work together.” o “You love MATLAB so much. You even call yourself @matlabulous on twitter!” o “We prefer R/Python.” • Results o I kept using MATLAB o Lone wolf o No collaboration  6
  7. 7. Identifying Skills Gap • Obvious skills gap: o Open-source Programming langauges o Machine learning techniques o Collaboration • Kind of related o Data visualisation o Handling large datasets o Explaining results • That competition was a good wake up call. 7
  8. 8. From MATLAB to R/Python MATLAB Python R Neural Networks ✔️ ✔️ ✔️ Random Forest ✔️ ✔️ ✔️ SVM ✔️ ✔️ ✔️ Other Machine Learning Libraries Toolboxes (commercial + open source) Scikit-learn and many more CRAN, GitHub (A LOT!) Data Visualisation I wasn’t good at it anyway … Matplotlib (plus a lot more since then) ggplot2 (WOW!) (plus a lot more since then) 8
  9. 9. What can people do with R? 9 James Cheshire, UCL Link Paul Butler, Facebook Link
  10. 10. Filling the Skills Gap • More MOOC o Machine Learning • Andrew Ng (Coursera) o Data Analysis • Jeff Leek (Coursera) • R o Intro to Programming • Dave Evans (Udacity) • Python • Things I also picked up: o Linux (Ubuntu) o Git o Cloud computing o HTML / CSS 10
  11. 11. Learning from other Kagglers • Continuous learning o Kaggle’s forums and blogs. o New tools and tricks. o Many things you cannot learn from school. o I am standing on the shoulders of many Kagglers. 11
  12. 12. Side Project 1 – Crime Data Viz shiny::runGitHub("rApps", "woobe", subdir = "crimemap") 12
  13. 13. Other Side Projects • Colour Palette o woobe/rPlotter 13 • World Cup 2014 Correct Score Prediction o ML vs. my friends o 10 out of 64 (15.6%) o Friends’ avg. = 4 (6.3%) o woobe/wc2014
  14. 14. Open Up Myself • Before Kaggle/MOOC o I was drawing a circle around myself. o Fear of change. o Domain-specific problem solving. • After Kaggle/MOOC o Data-driven approach. o Not a subject matter expert? No worries  o Free to try new tools, to learn and to create. 14
  15. 15. New Opportunities • LondonR o First presentation outside water industry / academia. o Very positive feedback. o Led to other projects. o /londonr_crimemap 15
  16. 16. New Opportunities • useR! 2014 (UCLA) o Presented a poster. o Met new friends. o Life-changing event. o woobe/useR_2014 16
  17. 17. New Friends 17 Ramnath Vaidyanathan htmlwidgets DataRobot Nick @ DominoData Lab & John Chambers! rOpenSci RStudio Matt Dowle data.table (also at
  18. 18. About Domino and H2O 18
  19. 19. More Opportunities • First blog post about H2O o Things to try after useR! – Part1: Deep Learning with H2O 19
  20. 20. More Opportunities • Blog post about Domino and H2O o I did it for fun. I did not have any expectation. o It helped attract customers to both Domino and H2O. 20
  21. 21. Becoming a Data Scientist 21
  22. 22. London Kagglers Assemble • London Kaggle Meetup o Sep 2015 o I met my Kaggle buddy Mickael Le Gal o He is a product data scientist at Tictrac 22
  23. 23. London Kagglers Assemble • Rossmann Store Sales o We got stuck at top 10% for a long period. o Mickael had a breakthrough in feature engineering with 48 hours to go. o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab). o Top 2% finish (our best result so far). 23
  24. 24. Joining 24
  25. 25. Summary of Benefits • Direct o Identify data science skills gap. o Learn quickly from the community. o Expand your network. o Prepare yourself for real- life data challenges. • Indirect o You also learn non-ML skills along the way. o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight. 25
  26. 26. Strata Hadoop London Talks • “Model stacking and parameters tuning.” o Wednesday 1 June o 7pm o LondonR at Strata o Free event • “Generalised Low Rank Models.” o Thursday 2 June o 2pm o Strata Hadoop main conference event 26
  27. 27. Big Thank You • London Kaggle Meetup Organisers • Mickael • Marios • You (Kagglers) • Prof. Dragan Savic (supervisor at uni. of Exeter) • Mango Solutions • Virgin Media • Domino Data Lab • 27
  28. 28. Any Questions? • Contact o o @matlabulous o • Links (All Slides) o meetups • H2O in London o Coming soon! • Meetups • Office o We’re hiring! o 28

Editor's Notes

  • Erin LeDell
  • ×