Kaggle competitions, new friends, new skills and new opportunities
1.
Kaggle Competitions, New Friends, New
Skills and New Opportunities
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulous
2.
About Me
• 2005 - 2015
• Water Engineer
o Consultant for Utilities
o EngD Research
• 2015 - Present
• Data Scientist
o Virgin Media
o Domino Data Lab
o H2O.ai
3.
About This Talk
• What happened
o Things I did since I
started participating in
Kaggle competitions.
o New opportunities –
results of new skills and
friends.
4.
First MOOC Experience
• One of the first Massive
Open Online Courses.
o Met some new friends.
o Decided to collaborate
for fun.
o “How about Kaggle?”
o “What is Kaggle?”
5.
First Kaggle Experience
• First time in my life
o Supervised learning
• Random Forest
• Support Vector Machine
• Neural Networks
o Train, Validate & Predict.
o “Is it black magic?”
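The train, validate and predict loop named on this slide can be sketched in a few lines. This is a minimal illustration, not the code used in the competition: it assumes Python with scikit-learn, a synthetic dataset from `make_classification` standing in for real competition data, and a Random Forest as the model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a competition dataset (assumption: not the original data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Treat the last 100 rows as the unlabeled test set a competition would provide.
X_labeled, y_labeled = X[:400], y[:400]
X_test = X[400:]

# Train / validate: hold out 25% of the labeled rows to estimate performance.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_labeled, y_labeled, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Validate: score on rows the model has not seen during training.
valid_accuracy = accuracy_score(y_valid, model.predict(X_valid))

# Predict: these labels are what would go into a submission file.
test_predictions = model.predict(X_test)

print(f"validation accuracy: {valid_accuracy:.3f}")
```

The validation score is the only feedback available before submitting, which is why the held-out rows are never used for fitting.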
6.
First Kaggle Experience
• Problems
o “Hey Joe, you are a nice
guy but we can’t work
together.”
o “You love MATLAB so
much. You even call
yourself @matlabulous
on twitter!”
o “We prefer R/Python.”
• Results
o I kept using MATLAB
o Lone wolf
o No collaboration
7.
Identifying Skills Gap
• Obvious skills gap:
o Open-source programming languages
o Machine learning
techniques
o Collaboration
• Kind of related
o Data visualisation
o Handling large datasets
o Explaining results
• That competition was a good wake-up call.
8.
From MATLAB to R/Python
 | MATLAB | Python | R
Neural Networks | ✔️ | ✔️ | ✔️
Random Forest | ✔️ | ✔️ | ✔️
SVM | ✔️ | ✔️ | ✔️
Other Machine Learning Libraries | Toolboxes (commercial + open source) | Scikit-learn and many more | CRAN, GitHub (A LOT!)
Data Visualisation | I wasn’t good at it anyway … | Matplotlib (plus a lot more since then) | ggplot2 (WOW!) (plus a lot more since then)
9.
What can people do with R?
• James Cheshire, UCL
• Paul Butler, Facebook
10.
Filling the Skills Gap
• More MOOCs
o Machine Learning
• Andrew Ng (Coursera)
o Data Analysis
• Jeff Leek (Coursera)
• R
o Intro to Programming
• Dave Evans (Udacity)
• Python
• Things I also picked up:
o Linux (Ubuntu)
o Git
o Cloud computing
o HTML / CSS
11.
Learning from other Kagglers
• Continuous learning
o Kaggle’s forums and blogs.
o New tools and tricks.
o Many things you cannot learn at school.
o I am standing on the
shoulders of many
Kagglers.
12.
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe",
subdir = "crimemap")
13.
Other Side Projects
• Colour Palette
o github.com/woobe/rPlotter
• World Cup 2014 Correct
Score Prediction
o ML vs. my friends
o 10 out of 64 (15.6%)
o Friends’ avg. = 4 (6.3%)
o github.com/woobe/wc2014
14.
Opening Myself Up
• Before Kaggle/MOOC
o I was drawing a circle
around myself.
o Fear of change.
o Domain-specific problem
solving.
• After Kaggle/MOOC
o Data-driven approach.
o Not a subject matter expert? No worries.
o Free to try new tools, to
learn and to create.
15.
New Opportunities
• LondonR
o First presentation
outside water industry /
academia.
o Very positive feedback.
o Led to other projects.
o bit.ly/londonr_crimemap
16.
New Opportunities
• useR! 2014 (UCLA)
o Presented a poster.
o Met new friends.
o Life-changing event.
o github.com/woobe/useR_2014
17.
New Friends
• Ramnath Vaidyanathan (htmlwidgets)
• DataRobot
• Nick @ Domino Data Lab
• H2O.ai & John Chambers!
• rOpenSci
• RStudio
• Matt Dowle, data.table (also at H2O.ai)
19.
More Opportunities
• First blog post about
H2O
o Things to try after useR! – Part 1: Deep Learning with H2O
20.
More Opportunities
• Blog post about Domino
and H2O
o I did it for fun. I had no expectations.
o It helped attract
customers to both
Domino and H2O.
22.
London Kagglers Assemble
• London Kaggle Meetup
o Sep 2015
o I met my Kaggle buddy
Mickael Le Gal
o He is a product data
scientist at Tictrac
23.
London Kagglers Assemble
• Rossmann Store Sales
o We were stuck in the top 10% for a long period.
o Mickael had a breakthrough in feature
engineering with 48 hours to go.
o I re-trained all models and completed
model stacking just a few hours before
the deadline (thanks to Domino Data
Lab).
o Top 2% finish (our best result so far).
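For readers unfamiliar with model stacking, the idea is to train several base models and then fit a second-level model on their out-of-fold predictions. The sketch below is illustrative only and is not the Rossmann solution: it assumes Python with scikit-learn, synthetic regression data, and an arbitrary choice of base models (a random forest and gradient boosting) with a linear meta-learner.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption: stands in for store sales data).
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 0: base models whose predictions become features for the meta-learner.
base_models = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
]

# Level 1: a linear model fitted on 5-fold out-of-fold base predictions,
# so the meta-learner never sees predictions made on a model's own training rows.
stack = StackingRegressor(
    estimators=base_models, final_estimator=LinearRegression(), cv=5
)
stack.fit(X_train, y_train)

# R^2 of the stacked ensemble on held-out data.
stack_r2 = stack.score(X_valid, y_valid)
print(f"stacked R^2 on validation data: {stack_r2:.3f}")
```

Because every base model is refitted inside the cross-validation folds, any change to the features means retraining the whole stack, which is consistent with the last-minute re-training described above.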
25.
Summary of Benefits
• Direct
o Identify data science
skills gap.
o Learn quickly from the
community.
o Expand your network.
o Prepare yourself for real-life data challenges.
• Indirect
o You also learn non-ML
skills along the way.
o You learn to build small
data products (e.g.
graph, web app, REST
API) and help others gain
insight.
26.
Strata Hadoop London Talks
• “Model stacking and
parameters tuning.”
o Wednesday 1 June
o 7pm
o LondonR at Strata
o Free event
• “Generalised Low Rank
Models.”
o Thursday 2 June
o 2pm
o Strata Hadoop main
conference event
27.
Big Thank You
• London Kaggle Meetup
Organisers
• Mickael
• Marios
• You (Kagglers)
• Prof. Dragan Savic (supervisor at the University of Exeter)
• Mango Solutions
• Virgin Media
• Domino Data Lab
• H2O.ai
28.
Any Questions?
• Contact
o joe@h2o.ai
o @matlabulous
o github.com/woobe
• Links (All Slides)
o github.com/h2oai/h2o-meetups
• H2O in London
o Coming soon!
• Meetups
• Office
o We’re hiring!
o www.h2o.ai/careers