Kaggle competitions, new friends, new skills and new opportunities

Jo-fai Chow
Jo-fai ChowData Scientist at H2O.ai - Customer Success, Smart Apps, Machine Learning & Data Visualization
Kaggle Competitions, New Friends, New
Skills and New Opportunities
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulus
About Me
• 2005 - 2015
• Water Engineer
o Consultant for Utilities
o EngD Research
• 2015 - Present
• Data Scientist
o Virgin Media
o Domino Data Lab
o H2O.ai
2
About This Talk
• What happened
o Things I did since I
started participating in
Kaggle competitions.
o New opportunities –
results of new skills and
friends.
3
First MOOC Experience
• One of the first Massive
Open Online Courses.
o Met some new friends.
o Decided to collaborate
for fun.
o “How about Kaggle?”
o “What is Kaggle?”
4
First Kaggle Experience
• First time in my life
o Supervised learning
• Random Forest
• Support Vector Machine
• Neural Networks
o Train, Validate & Predict.
o “Is it black magic?”
5
First Kaggle Experience
• Problems
o “Hey Joe, you are a nice
guy but we can’t work
together.”
o “You love MATLAB so
much. You even call
yourself @matlabulous
on twitter!”
o “We prefer R/Python.”
• Results
o I kept using MATLAB
o Lone wolf
o No collaboration 
6
Identifying Skills Gap
• Obvious skills gap:
o Open-source
Programming langauges
o Machine learning
techniques
o Collaboration
• Kind of related
o Data visualisation
o Handling large datasets
o Explaining results
• That competition was a good wake up call.
7
From MATLAB to R/Python
MATLAB Python R
Neural Networks ✔️ ✔️ ✔️
Random Forest ✔️ ✔️ ✔️
SVM ✔️ ✔️ ✔️
Other Machine
Learning Libraries
Toolboxes
(commercial + open
source)
Scikit-learn and
many more
CRAN, GitHub
(A LOT!)
Data Visualisation I wasn’t good at it
anyway …
Matplotlib
(plus a lot more
since then)
ggplot2 (WOW!)
(plus a lot more
since then)
8
What can people do with R?
9
James Cheshire, UCL
Link
Paul Butler, Facebook
Link
Filling the Skills Gap
• More MOOC
o Machine Learning
• Andrew Ng (Coursera)
o Data Analysis
• Jeff Leek (Coursera)
• R
o Intro to Programming
• Dave Evans (Udacity)
• Python
• Things I also picked up:
o Linux (Ubuntu)
o Git
o Cloud computing
o HTML / CSS
10
Learning from other Kagglers
• Continuous learning
o Kaggle’s forums and blogs.
o New tools and tricks.
o Many things you cannot
learn from school.
o I am standing on the
shoulders of many
Kagglers.
11
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe",
subdir = "crimemap")
12
Other Side Projects
• Colour Palette
o github.com/
woobe/rPlotter
13
• World Cup 2014 Correct
Score Prediction
o ML vs. my friends
o 10 out of 64 (15.6%)
o Friends’ avg. = 4 (6.3%)
o github.com/
woobe/wc2014
Open Up Myself
• Before Kaggle/MOOC
o I was drawing a circle
around myself.
o Fear of change.
o Domain-specific problem
solving.
• After Kaggle/MOOC
o Data-driven approach.
o Not a subject matter
expert? No worries 
o Free to try new tools, to
learn and to create.
14
New Opportunities
• LondonR
o First presentation
outside water industry /
academia.
o Very positive feedback.
o Led to other projects.
o bit.ly
/londonr_crimemap
15
New Opportunities
• useR! 2014 (UCLA)
o Presented a poster.
o Met new friends.
o Life-changing event.
o github.com/
woobe/useR_2014
16
New Friends
17
Ramnath Vaidyanathan
htmlwidgets
DataRobot
Nick @
DominoData
Lab
H2O.ai &
John Chambers!
rOpenSci
RStudio
Matt Dowle
data.table (also at H2O.ai)
About Domino and H2O
18
More Opportunities
• First blog post about
H2O
o Things to try after useR!
– Part1: Deep Learning
with H2O
19
More Opportunities
• Blog post about Domino
and H2O
o I did it for fun. I did not
have any expectation.
o It helped attract
customers to both
Domino and H2O.
20
Becoming a Data Scientist
21
London Kagglers Assemble
• London Kaggle Meetup
o Sep 2015
o I met my Kaggle buddy
Mickael Le Gal
o He is a product data
scientist at Tictrac
22
London Kagglers Assemble
• Rossmann Store Sales
o We got stuck at top 10% for a long
period.
o Mickael had a breakthrough in feature
engineering with 48 hours to go.
o I re-trained all models and completed
model stacking just a few hours before
the deadline (thanks to Domino Data
Lab).
o Top 2% finish (our best result so far).
23
Joining H2O.ai
24
Summary of Benefits
• Direct
o Identify data science
skills gap.
o Learn quickly from the
community.
o Expand your network.
o Prepare yourself for real-
life data challenges.
• Indirect
o You also learn non-ML
skills along the way.
o You learn to build small
data products (e.g.
graph, web app, REST
API) and help others gain
insight.
25
Strata Hadoop London Talks
• “Model stacking and
parameters tuning.”
o Wednesday 1 June
o 7pm
o LondonR at Strata
o Free event
• “Generalised Low Rank
Models.”
o Thursday 2 June
o 2pm
o Strata Hadoop main
conference event
26
Big Thank You
• London Kaggle Meetup
Organisers
• Mickael
• Marios
• You (Kagglers)
• Prof. Dragan Savic
(supervisor at uni. of
Exeter)
• Mango Solutions
• Virgin Media
• Domino Data Lab
• H2O.ai
27
Any Questions?
• Contact
o joe@h2o.ai
o @matlabulous
o github.com/woobe
• Links (All Slides)
o github.com/h2oai/h2o-
meetups
• H2O in London
o Coming soon!
• Meetups
• Office
o We’re hiring!
o www.h2o.ai/careers
28
1 of 28

Recommended

Introduction to Generalised Low-Rank Model and Missing Values by
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesJo-fai Chow
2.3K views22 slides
Improving Model Predictions via Stacking and Hyper-parameters Tuning by
Improving Model Predictions via Stacking and Hyper-parameters TuningImproving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters TuningJo-fai Chow
2.1K views18 slides
Using H2O Random Grid Search for Hyper-parameters Optimization by
Using H2O Random Grid Search for Hyper-parameters OptimizationUsing H2O Random Grid Search for Hyper-parameters Optimization
Using H2O Random Grid Search for Hyper-parameters OptimizationJo-fai Chow
2.5K views21 slides
Kaggle Competitions, New Friends, New Skills and New Opportunities by
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesJo-fai Chow
363 views31 slides
Deploying your Predictive Models as a Service via Domino by
Deploying your Predictive Models as a Service via DominoDeploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via DominoJo-fai Chow
19.1K views33 slides
mlcourse.ai, introduction, course overview by
mlcourse.ai, introduction, course overviewmlcourse.ai, introduction, course overview
mlcourse.ai, introduction, course overviewYury Kashnitsky
4.5K views23 slides

More Related Content

What's hot

Big data week 2018 - Graph Analytics on Big Data by
Big data week 2018 - Graph Analytics on Big DataBig data week 2018 - Graph Analytics on Big Data
Big data week 2018 - Graph Analytics on Big DataChristos Hadjinikolis
145 views43 slides
A Comparison of Different Strategies for Automated Semantic Document Annotation by
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationAnsgar Scherp
1K views44 slides
How well does your Instance Matching system perform? Experimental evaluation ... by
How well does your Instance Matching system perform? Experimental evaluation ...How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...Holistic Benchmarking of Big Linked Data
648 views19 slides
Gephi, Graphx, and Giraph by
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
2.1K views15 slides
Apache Spark GraphX highlights. by
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Doug Needham
2K views28 slides
SF Python Meetup: TextRank in Python by
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
5.7K views24 slides

What's hot(7)

A Comparison of Different Strategies for Automated Semantic Document Annotation by Ansgar Scherp
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
Ansgar Scherp1K views
Gephi, Graphx, and Giraph by Doug Needham
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
Doug Needham2.1K views
Apache Spark GraphX highlights. by Doug Needham
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
Doug Needham2K views
SF Python Meetup: TextRank in Python by Paco Nathan
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
Paco Nathan5.7K views
Josh Wills, MLconf 2013 by MLconf
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
MLconf2.2K views

Similar to Kaggle competitions, new friends, new skills and new opportunities

From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek by
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekJo-fai Chow
1.8K views45 slides
How to start your data career by
How to start your data careerHow to start your data career
How to start your data careerAdwait Bhave
173 views29 slides
What you did last summer? by
What you did last summer?What you did last summer?
What you did last summer?DoThinger
389 views41 slides
Write code and find a job by
Write code and find a jobWrite code and find a job
Write code and find a jobYung-Yu Chen
331 views43 slides
Beginner android by
Beginner androidBeginner android
Beginner androidSmriti Das
549 views40 slides
Agile Data Science by
Agile Data ScienceAgile Data Science
Agile Data ScienceVolodymyr Kazantsev
5.9K views47 slides

Similar to Kaggle competitions, new friends, new skills and new opportunities(20)

From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek by Jo-fai Chow
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
Jo-fai Chow1.8K views
How to start your data career by Adwait Bhave
How to start your data careerHow to start your data career
How to start your data career
Adwait Bhave173 views
What you did last summer? by DoThinger
What you did last summer?What you did last summer?
What you did last summer?
DoThinger389 views
Write code and find a job by Yung-Yu Chen
Write code and find a jobWrite code and find a job
Write code and find a job
Yung-Yu Chen331 views
Beginner android by Smriti Das
Beginner androidBeginner android
Beginner android
Smriti Das549 views
Landing a career in data science by Parul Pandey
Landing a career in data scienceLanding a career in data science
Landing a career in data science
Parul Pandey435 views
Informal talk at pict by Mayank Jain
Informal talk at pictInformal talk at pict
Informal talk at pict
Mayank Jain2.2K views
Artur Suchwalko “What are common mistakes in Data Science projects and how to... by Lviv Startup Club
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Dr. You or, How I Learned to Stop Worry and Love the PhD by Olga Botvinnik
Dr. You or, How I Learned to Stop Worry and Love the PhDDr. You or, How I Learned to Stop Worry and Love the PhD
Dr. You or, How I Learned to Stop Worry and Love the PhD
Olga Botvinnik831 views
Anomaly detection made easy - Piotr Guzik Allegro by Evention
Anomaly detection made easy - Piotr Guzik AllegroAnomaly detection made easy - Piotr Guzik Allegro
Anomaly detection made easy - Piotr Guzik Allegro
Evention158 views
Anomaly detection made easy by Piotr Guzik
Anomaly detection made easyAnomaly detection made easy
Anomaly detection made easy
Piotr Guzik1.8K views
Clare Corthell: Learning Data Science Online by sfdatascience
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
sfdatascience2.3K views
A few questions about large scale machine learning by Theodoros Vasiloudis
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning

More from Jo-fai Chow

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny by
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyJo-fai Chow
702 views50 slides
Automatic and Interpretable Machine Learning in R with H2O and LIME by
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEJo-fai Chow
313 views92 slides
Automatic and Interpretable Machine Learning with H2O and LIME by
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEJo-fai Chow
3.7K views57 slides
H2O at Berlin R Meetup by
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R MeetupJo-fai Chow
447 views71 slides
H2O at BelgradeR Meetup by
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR MeetupJo-fai Chow
473 views100 slides
Introduction to H2O and Model Stacking Use Cases by
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesJo-fai Chow
1.1K views78 slides

More from Jo-fai Chow(20)

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny by Jo-fai Chow
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Jo-fai Chow702 views
Automatic and Interpretable Machine Learning in R with H2O and LIME by Jo-fai Chow
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
Jo-fai Chow313 views
Automatic and Interpretable Machine Learning with H2O and LIME by Jo-fai Chow
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow3.7K views
H2O at Berlin R Meetup by Jo-fai Chow
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
Jo-fai Chow447 views
H2O at BelgradeR Meetup by Jo-fai Chow
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR Meetup
Jo-fai Chow473 views
Introduction to H2O and Model Stacking Use Cases by Jo-fai Chow
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
Jo-fai Chow1.1K views
H2O at Poznan R Meetup by Jo-fai Chow
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
Jo-fai Chow753 views
Introduction to Machine Learning with H2O and Python by Jo-fai Chow
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow716 views
Introduction to Machine Learning with H2O and Python by Jo-fai Chow
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow842 views
H2O Deep Water - Making Deep Learning Accessible to Everyone by Jo-fai Chow
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow566 views
H2O Deep Water - Making Deep Learning Accessible to Everyone by Jo-fai Chow
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow1K views
Project "Deep Water" by Jo-fai Chow
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
Jo-fai Chow1.5K views
H2O Machine Learning Use Cases by Jo-fai Chow
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
Jo-fai Chow1.5K views
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain... by Jo-fai Chow
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
Jo-fai Chow1.2K views
Designing Sustainable Drainage Systems by Jo-fai Chow
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
Jo-fai Chow2.2K views
Developing a New Decision Support System for SuDS by Jo-fai Chow
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
Jo-fai Chow1.3K views
Udacity Statement (Introduction to Statistics, August 2012) by Jo-fai Chow
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
Jo-fai Chow365 views
Coursera Statement (Computational Investing, Part I, by Jo-fai Chow
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
Jo-fai Chow532 views
Coursera Statement (Computing for Data Analysis, Oct 2013) by Jo-fai Chow
Coursera Statement (Computing for Data Analysis, Oct 2013)Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)
Jo-fai Chow283 views
Coursera Statement (Data Analysis, Mar 2013) by Jo-fai Chow
Coursera Statement (Data Analysis, Mar 2013)Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)
Jo-fai Chow332 views

Recently uploaded

Managing Github via Terrafom.pdf by
Managing Github via Terrafom.pdfManaging Github via Terrafom.pdf
Managing Github via Terrafom.pdfmicharaeck
5 views47 slides
Yin Sun - Shell by
Yin Sun - ShellYin Sun - Shell
Yin Sun - ShellDutch Power
62 views17 slides
HITCON CISO Summit 2023 - Closing by
HITCON CISO Summit 2023 - ClosingHITCON CISO Summit 2023 - Closing
HITCON CISO Summit 2023 - ClosingHacks in Taiwan (HITCON)
178 views33 slides
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf by
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf NETWAYS
11 views92 slides
Helko van den Brom - VSL by
Helko van den Brom - VSLHelko van den Brom - VSL
Helko van den Brom - VSLDutch Power
63 views18 slides
Roozbeh Torkzadeh - TU Eindhoven by
Roozbeh Torkzadeh - TU EindhovenRoozbeh Torkzadeh - TU Eindhoven
Roozbeh Torkzadeh - TU EindhovenDutch Power
62 views14 slides

Recently uploaded(20)

Managing Github via Terrafom.pdf by micharaeck
Managing Github via Terrafom.pdfManaging Github via Terrafom.pdf
Managing Github via Terrafom.pdf
micharaeck5 views
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf by NETWAYS
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf
NETWAYS11 views
Helko van den Brom - VSL by Dutch Power
Helko van den Brom - VSLHelko van den Brom - VSL
Helko van den Brom - VSL
Dutch Power63 views
Roozbeh Torkzadeh - TU Eindhoven by Dutch Power
Roozbeh Torkzadeh - TU EindhovenRoozbeh Torkzadeh - TU Eindhoven
Roozbeh Torkzadeh - TU Eindhoven
Dutch Power62 views
Post-event report intro session-1.docx by RohitRathi59
Post-event report intro session-1.docxPost-event report intro session-1.docx
Post-event report intro session-1.docx
RohitRathi5912 views
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn by NETWAYS
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp KrennOSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn
NETWAYS22 views
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru... by NETWAYS
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
NETWAYS7 views
BLogSite (Web Programming) (1).pdf by Fiverr
BLogSite (Web Programming) (1).pdfBLogSite (Web Programming) (1).pdf
BLogSite (Web Programming) (1).pdf
Fiverr11 views
Christan van Dorst - Hyteps by Dutch Power
Christan van Dorst - HytepsChristan van Dorst - Hyteps
Christan van Dorst - Hyteps
Dutch Power65 views
Gym Members Community.pptx by nasserbf1987
Gym Members Community.pptxGym Members Community.pptx
Gym Members Community.pptx
nasserbf19876 views

Kaggle competitions, new friends, new skills and new opportunities

  • 1. Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist joe@h2o.ai @matlabulus
  • 2. About Me • 2005 - 2015 • Water Engineer o Consultant for Utilities o EngD Research • 2015 - Present • Data Scientist o Virgin Media o Domino Data Lab o H2O.ai 2
  • 3. About This Talk • What happened o Things I did since I started participating in Kaggle competitions. o New opportunities – results of new skills and friends. 3
  • 4. First MOOC Experience • One of the first Massive Open Online Courses. o Met some new friends. o Decided to collaborate for fun. o “How about Kaggle?” o “What is Kaggle?” 4
  • 5. First Kaggle Experience • First time in my life o Supervised learning • Random Forest • Support Vector Machine • Neural Networks o Train, Validate & Predict. o “Is it black magic?” 5
  • 6. First Kaggle Experience • Problems o “Hey Joe, you are a nice guy but we can’t work together.” o “You love MATLAB so much. You even call yourself @matlabulous on twitter!” o “We prefer R/Python.” • Results o I kept using MATLAB o Lone wolf o No collaboration  6
  • 7. Identifying Skills Gap • Obvious skills gap: o Open-source Programming langauges o Machine learning techniques o Collaboration • Kind of related o Data visualisation o Handling large datasets o Explaining results • That competition was a good wake up call. 7
  • 8. From MATLAB to R/Python MATLAB Python R Neural Networks ✔️ ✔️ ✔️ Random Forest ✔️ ✔️ ✔️ SVM ✔️ ✔️ ✔️ Other Machine Learning Libraries Toolboxes (commercial + open source) Scikit-learn and many more CRAN, GitHub (A LOT!) Data Visualisation I wasn’t good at it anyway … Matplotlib (plus a lot more since then) ggplot2 (WOW!) (plus a lot more since then) 8
  • 9. What can people do with R? 9 James Cheshire, UCL Link Paul Butler, Facebook Link
  • 10. Filling the Skills Gap • More MOOC o Machine Learning • Andrew Ng (Coursera) o Data Analysis • Jeff Leek (Coursera) • R o Intro to Programming • Dave Evans (Udacity) • Python • Things I also picked up: o Linux (Ubuntu) o Git o Cloud computing o HTML / CSS 10
  • 11. Learning from other Kagglers • Continuous learning o Kaggle’s forums and blogs. o New tools and tricks. o Many things you cannot learn from school. o I am standing on the shoulders of many Kagglers. 11
  • 12. Side Project 1 – Crime Data Viz shiny::runGitHub("rApps", "woobe", subdir = "crimemap") 12
  • 13. Other Side Projects • Colour Palette o github.com/ woobe/rPlotter 13 • World Cup 2014 Correct Score Prediction o ML vs. my friends o 10 out of 64 (15.6%) o Friends’ avg. = 4 (6.3%) o github.com/ woobe/wc2014
  • 14. Open Up Myself • Before Kaggle/MOOC o I was drawing a circle around myself. o Fear of change. o Domain-specific problem solving. • After Kaggle/MOOC o Data-driven approach. o Not a subject matter expert? No worries  o Free to try new tools, to learn and to create. 14
  • 15. New Opportunities • LondonR o First presentation outside water industry / academia. o Very positive feedback. o Led to other projects. o bit.ly /londonr_crimemap 15
  • 16. New Opportunities • useR! 2014 (UCLA) o Presented a poster. o Met new friends. o Life-changing event. o github.com/ woobe/useR_2014 16
  • 17. New Friends 17 Ramnath Vaidyanathan htmlwidgets DataRobot Nick @ DominoData Lab H2O.ai & John Chambers! rOpenSci RStudio Matt Dowle data.table (also at H2O.ai)
  • 19. More Opportunities • First blog post about H2O o Things to try after useR! – Part1: Deep Learning with H2O 19
  • 20. More Opportunities • Blog post about Domino and H2O o I did it for fun. I did not have any expectation. o It helped attract customers to both Domino and H2O. 20
  • 21. Becoming a Data Scientist 21
  • 22. London Kagglers Assemble • London Kaggle Meetup o Sep 2015 o I met my Kaggle buddy Mickael Le Gal o He is a product data scientist at Tictrac 22
  • 23. London Kagglers Assemble • Rossmann Store Sales o We got stuck at top 10% for a long period. o Mickael had a breakthrough in feature engineering with 48 hours to go. o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab). o Top 2% finish (our best result so far). 23
  • 25. Summary of Benefits • Direct o Identify data science skills gap. o Learn quickly from the community. o Expand your network. o Prepare yourself for real- life data challenges. • Indirect o You also learn non-ML skills along the way. o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight. 25
  • 26. Strata Hadoop London Talks • “Model stacking and parameters tuning.” o Wednesday 1 June o 7pm o LondonR at Strata o Free event • “Generalised Low Rank Models.” o Thursday 2 June o 2pm o Strata Hadoop main conference event 26
  • 27. Big Thank You • London Kaggle Meetup Organisers • Mickael • Marios • You (Kagglers) • Prof. Dragan Savic (supervisor at uni. of Exeter) • Mango Solutions • Virgin Media • Domino Data Lab • H2O.ai 27
  • 28. Any Questions? • Contact o joe@h2o.ai o @matlabulous o github.com/woobe • Links (All Slides) o github.com/h2oai/h2o- meetups • H2O in London o Coming soon! • Meetups • Office o We’re hiring! o www.h2o.ai/careers 28

Editor's Notes

  1. Erin LeDell