Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Kaggle Competitions, New Friends, New
Skills and New Opportunities
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulus
About Me
• 2005 - 2015
• Water Engineer
o Consultant for Utilities
o EngD Research
• 2015 - Present
• Data Scientist
o Vir...
About This Talk
• What happened
o Things I did since I
started participating in
Kaggle competitions.
o New opportunities –...
First MOOC Experience
• One of the first Massive
Open Online Courses.
o Met some new friends.
o Decided to collaborate
for...
First Kaggle Experience
• First time in my life
o Supervised learning
• Random Forest
• Support Vector Machine
• Neural Ne...
First Kaggle Experience
• Problems
o “Hey Joe, you are a nice
guy but we can’t work
together.”
o “You love MATLAB so
much....
Identifying Skills Gap
• Obvious skills gap:
o Open-source
Programming langauges
o Machine learning
techniques
o Collabora...
From MATLAB to R/Python
MATLAB Python R
Neural Networks ✔️ ✔️ ✔️
Random Forest ✔️ ✔️ ✔️
SVM ✔️ ✔️ ✔️
Other Machine
Learnin...
What can people do with R?
9
James Cheshire, UCL
Link
Paul Butler, Facebook
Link
Filling the Skills Gap
• More MOOC
o Machine Learning
• Andrew Ng (Coursera)
o Data Analysis
• Jeff Leek (Coursera)
• R
o ...
Learning from other Kagglers
• Continuous learning
o Kaggle’s forums and blogs.
o New tools and tricks.
o Many things you ...
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe",
subdir = "crimemap")
12
Other Side Projects
• Colour Palette
o github.com/
woobe/rPlotter
13
• World Cup 2014 Correct
Score Prediction
o ML vs. my...
Open Up Myself
• Before Kaggle/MOOC
o I was drawing a circle
around myself.
o Fear of change.
o Domain-specific problem
so...
New Opportunities
• LondonR
o First presentation
outside water industry /
academia.
o Very positive feedback.
o Led to oth...
New Opportunities
• useR! 2014 (UCLA)
o Presented a poster.
o Met new friends.
o Life-changing event.
o github.com/
woobe/...
New Friends
17
Ramnath Vaidyanathan
htmlwidgets
DataRobot
Nick @
DominoData
Lab
H2O.ai &
John Chambers!
rOpenSci
RStudio
M...
About Domino and H2O
18
More Opportunities
• First blog post about
H2O
o Things to try after useR!
– Part1: Deep Learning
with H2O
19
More Opportunities
• Blog post about Domino
and H2O
o I did it for fun. I did not
have any expectation.
o It helped attrac...
Becoming a Data Scientist
21
London Kagglers Assemble
• London Kaggle Meetup
o Sep 2015
o I met my Kaggle buddy
Mickael Le Gal
o He is a product data
s...
London Kagglers Assemble
• Rossmann Store Sales
o We got stuck at top 10% for a long
period.
o Mickael had a breakthrough ...
Joining H2O.ai
24
Summary of Benefits
• Direct
o Identify data science
skills gap.
o Learn quickly from the
community.
o Expand your network...
Strata Hadoop London Talks
• “Model stacking and
parameters tuning.”
o Wednesday 1 June
o 7pm
o LondonR at Strata
o Free e...
Big Thank You
• London Kaggle Meetup
Organisers
• Mickael
• Marios
• You (Kagglers)
• Prof. Dragan Savic
(supervisor at un...
Any Questions?
• Contact
o joe@h2o.ai
o @matlabulous
o github.com/woobe
• Links (All Slides)
o github.com/h2oai/h2o-
meetu...
Upcoming SlideShare
Loading in …5
×

Kaggle competitions, new friends, new skills and new opportunities

548 views

Published on

London Kaggle Meetup

  • Be the first to comment

Kaggle competitions, new friends, new skills and new opportunities

  1. 1. Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist joe@h2o.ai @matlabulus
  2. 2. About Me • 2005 - 2015 • Water Engineer o Consultant for Utilities o EngD Research • 2015 - Present • Data Scientist o Virgin Media o Domino Data Lab o H2O.ai 2
  3. 3. About This Talk • What happened o Things I did since I started participating in Kaggle competitions. o New opportunities – results of new skills and friends. 3
  4. 4. First MOOC Experience • One of the first Massive Open Online Courses. o Met some new friends. o Decided to collaborate for fun. o “How about Kaggle?” o “What is Kaggle?” 4
  5. 5. First Kaggle Experience • First time in my life o Supervised learning • Random Forest • Support Vector Machine • Neural Networks o Train, Validate & Predict. o “Is it black magic?” 5
  6. 6. First Kaggle Experience • Problems o “Hey Joe, you are a nice guy but we can’t work together.” o “You love MATLAB so much. You even call yourself @matlabulous on twitter!” o “We prefer R/Python.” • Results o I kept using MATLAB o Lone wolf o No collaboration  6
  7. 7. Identifying Skills Gap • Obvious skills gap: o Open-source Programming langauges o Machine learning techniques o Collaboration • Kind of related o Data visualisation o Handling large datasets o Explaining results • That competition was a good wake up call. 7
  8. 8. From MATLAB to R/Python MATLAB Python R Neural Networks ✔️ ✔️ ✔️ Random Forest ✔️ ✔️ ✔️ SVM ✔️ ✔️ ✔️ Other Machine Learning Libraries Toolboxes (commercial + open source) Scikit-learn and many more CRAN, GitHub (A LOT!) Data Visualisation I wasn’t good at it anyway … Matplotlib (plus a lot more since then) ggplot2 (WOW!) (plus a lot more since then) 8
  9. 9. What can people do with R? 9 James Cheshire, UCL Link Paul Butler, Facebook Link
  10. 10. Filling the Skills Gap • More MOOC o Machine Learning • Andrew Ng (Coursera) o Data Analysis • Jeff Leek (Coursera) • R o Intro to Programming • Dave Evans (Udacity) • Python • Things I also picked up: o Linux (Ubuntu) o Git o Cloud computing o HTML / CSS 10
  11. 11. Learning from other Kagglers • Continuous learning o Kaggle’s forums and blogs. o New tools and tricks. o Many things you cannot learn from school. o I am standing on the shoulders of many Kagglers. 11
  12. 12. Side Project 1 – Crime Data Viz shiny::runGitHub("rApps", "woobe", subdir = "crimemap") 12
  13. 13. Other Side Projects • Colour Palette o github.com/ woobe/rPlotter 13 • World Cup 2014 Correct Score Prediction o ML vs. my friends o 10 out of 64 (15.6%) o Friends’ avg. = 4 (6.3%) o github.com/ woobe/wc2014
  14. 14. Open Up Myself • Before Kaggle/MOOC o I was drawing a circle around myself. o Fear of change. o Domain-specific problem solving. • After Kaggle/MOOC o Data-driven approach. o Not a subject matter expert? No worries  o Free to try new tools, to learn and to create. 14
  15. 15. New Opportunities • LondonR o First presentation outside water industry / academia. o Very positive feedback. o Led to other projects. o bit.ly /londonr_crimemap 15
  16. 16. New Opportunities • useR! 2014 (UCLA) o Presented a poster. o Met new friends. o Life-changing event. o github.com/ woobe/useR_2014 16
  17. 17. New Friends 17 Ramnath Vaidyanathan htmlwidgets DataRobot Nick @ DominoData Lab H2O.ai & John Chambers! rOpenSci RStudio Matt Dowle data.table (also at H2O.ai)
  18. 18. About Domino and H2O 18
  19. 19. More Opportunities • First blog post about H2O o Things to try after useR! – Part1: Deep Learning with H2O 19
  20. 20. More Opportunities • Blog post about Domino and H2O o I did it for fun. I did not have any expectation. o It helped attract customers to both Domino and H2O. 20
  21. 21. Becoming a Data Scientist 21
  22. 22. London Kagglers Assemble • London Kaggle Meetup o Sep 2015 o I met my Kaggle buddy Mickael Le Gal o He is a product data scientist at Tictrac 22
  23. 23. London Kagglers Assemble • Rossmann Store Sales o We got stuck at top 10% for a long period. o Mickael had a breakthrough in feature engineering with 48 hours to go. o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab). o Top 2% finish (our best result so far). 23
  24. 24. Joining H2O.ai 24
  25. 25. Summary of Benefits • Direct o Identify data science skills gap. o Learn quickly from the community. o Expand your network. o Prepare yourself for real- life data challenges. • Indirect o You also learn non-ML skills along the way. o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight. 25
  26. 26. Strata Hadoop London Talks • “Model stacking and parameters tuning.” o Wednesday 1 June o 7pm o LondonR at Strata o Free event • “Generalised Low Rank Models.” o Thursday 2 June o 2pm o Strata Hadoop main conference event 26
  27. 27. Big Thank You • London Kaggle Meetup Organisers • Mickael • Marios • You (Kagglers) • Prof. Dragan Savic (supervisor at uni. of Exeter) • Mango Solutions • Virgin Media • Domino Data Lab • H2O.ai 27
  28. 28. Any Questions? • Contact o joe@h2o.ai o @matlabulous o github.com/woobe • Links (All Slides) o github.com/h2oai/h2o- meetups • H2O in London o Coming soon! • Meetups • Office o We’re hiring! o www.h2o.ai/careers 28

×