SlideShare a Scribd company logo
Kaggle Competitions, New Friends, New
Skills and New Opportunities
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulus
About Me
• 2005 - 2015
• Water Engineer
o Consultant for Utilities
o EngD Research
• 2015 - Present
• Data Scientist
o Virgin Media
o Domino Data Lab
o H2O.ai
2
About This Talk
• What happened
o Things I did since I
started participating in
Kaggle competitions.
o New opportunities –
results of new skills and
friends.
3
First MOOC Experience
• One of the first Massive
Open Online Courses.
o Met some new friends.
o Decided to collaborate
for fun.
o “How about Kaggle?”
o “What is Kaggle?”
4
First Kaggle Experience
• First time in my life
o Supervised learning
• Random Forest
• Support Vector Machine
• Neural Networks
o Train, Validate & Predict.
o “Is it black magic?”
5
First Kaggle Experience
• Problems
o “Hey Joe, you are a nice
guy but we can’t work
together.”
o “You love MATLAB so
much. You even call
yourself @matlabulous
on twitter!”
o “We prefer R/Python.”
• Results
o I kept using MATLAB
o Lone wolf
o No collaboration 
6
Identifying Skills Gap
• Obvious skills gap:
o Open-source
Programming langauges
o Machine learning
techniques
o Collaboration
• Kind of related
o Data visualisation
o Handling large datasets
o Explaining results
• That competition was a good wake up call.
7
From MATLAB to R/Python
MATLAB Python R
Neural Networks ✔️ ✔️ ✔️
Random Forest ✔️ ✔️ ✔️
SVM ✔️ ✔️ ✔️
Other Machine
Learning Libraries
Toolboxes
(commercial + open
source)
Scikit-learn and
many more
CRAN, GitHub
(A LOT!)
Data Visualisation I wasn’t good at it
anyway …
Matplotlib
(plus a lot more
since then)
ggplot2 (WOW!)
(plus a lot more
since then)
8
What can people do with R?
9
James Cheshire, UCL
Link
Paul Butler, Facebook
Link
Filling the Skills Gap
• More MOOC
o Machine Learning
• Andrew Ng (Coursera)
o Data Analysis
• Jeff Leek (Coursera)
• R
o Intro to Programming
• Dave Evans (Udacity)
• Python
• Things I also picked up:
o Linux (Ubuntu)
o Git
o Cloud computing
o HTML / CSS
10
Learning from other Kagglers
• Continuous learning
o Kaggle’s forums and blogs.
o New tools and tricks.
o Many things you cannot
learn from school.
o I am standing on the
shoulders of many
Kagglers.
11
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe",
subdir = "crimemap")
12
Other Side Projects
• Colour Palette
o github.com/
woobe/rPlotter
13
• World Cup 2014 Correct
Score Prediction
o ML vs. my friends
o 10 out of 64 (15.6%)
o Friends’ avg. = 4 (6.3%)
o github.com/
woobe/wc2014
Open Up Myself
• Before Kaggle/MOOC
o I was drawing a circle
around myself.
o Fear of change.
o Domain-specific problem
solving.
• After Kaggle/MOOC
o Data-driven approach.
o Not a subject matter
expert? No worries 
o Free to try new tools, to
learn and to create.
14
New Opportunities
• LondonR
o First presentation
outside water industry /
academia.
o Very positive feedback.
o Led to other projects.
o bit.ly
/londonr_crimemap
15
New Opportunities
• useR! 2014 (UCLA)
o Presented a poster.
o Met new friends.
o Life-changing event.
o github.com/
woobe/useR_2014
16
New Friends
17
Ramnath Vaidyanathan
htmlwidgets
DataRobot
Nick @
DominoData
Lab
H2O.ai &
John Chambers!
rOpenSci
RStudio
Matt Dowle
data.table (also at H2O.ai)
About Domino and H2O
18
More Opportunities
• First blog post about
H2O
o Things to try after useR!
– Part1: Deep Learning
with H2O
19
More Opportunities
• Blog post about Domino
and H2O
o I did it for fun. I did not
have any expectation.
o It helped attract
customers to both
Domino and H2O.
20
Becoming a Data Scientist
21
London Kagglers Assemble
• London Kaggle Meetup
o Sep 2015
o I met my Kaggle buddy
Mickael Le Gal
o He is a product data
scientist at Tictrac
22
London Kagglers Assemble
• Rossmann Store Sales
o We got stuck at top 10% for a long
period.
o Mickael had a breakthrough in feature
engineering with 48 hours to go.
o I re-trained all models and completed
model stacking just a few hours before
the deadline (thanks to Domino Data
Lab).
o Top 2% finish (our best result so far).
23
Joining H2O.ai
24
Summary of Benefits
• Direct
o Identify data science
skills gap.
o Learn quickly from the
community.
o Expand your network.
o Prepare yourself for real-
life data challenges.
• Indirect
o You also learn non-ML
skills along the way.
o You learn to build small
data products (e.g.
graph, web app, REST
API) and help others gain
insight.
25
Strata Hadoop London Talks
• “Model stacking and
parameters tuning.”
o Wednesday 1 June
o 7pm
o LondonR at Strata
o Free event
• “Generalised Low Rank
Models.”
o Thursday 2 June
o 2pm
o Strata Hadoop main
conference event
26
Big Thank You
• London Kaggle Meetup
Organisers
• Mickael
• Marios
• You (Kagglers)
• Prof. Dragan Savic
(supervisor at uni. of
Exeter)
• Mango Solutions
• Virgin Media
• Domino Data Lab
• H2O.ai
27
Any Questions?
• Contact
o joe@h2o.ai
o @matlabulous
o github.com/woobe
• Links (All Slides)
o github.com/h2oai/h2o-
meetups
• H2O in London
o Coming soon!
• Meetups
• Office
o We’re hiring!
o www.h2o.ai/careers
28

More Related Content

What's hot

Big data week 2018 - Graph Analytics on Big Data
Big data week 2018 - Graph Analytics on Big DataBig data week 2018 - Graph Analytics on Big Data
Big data week 2018 - Graph Analytics on Big Data
Christos Hadjinikolis
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
Ansgar Scherp
 
How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...
Holistic Benchmarking of Big Linked Data
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
Doug Needham
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
Doug Needham
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
Paco Nathan
 
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
MLconf
 

What's hot (7)

Big data week 2018 - Graph Analytics on Big Data
Big data week 2018 - Graph Analytics on Big DataBig data week 2018 - Graph Analytics on Big Data
Big data week 2018 - Graph Analytics on Big Data
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
 
How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...How well does your Instance Matching system perform? Experimental evaluation ...
How well does your Instance Matching system perform? Experimental evaluation ...
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
 

Similar to Kaggle competitions, new friends, new skills and new opportunities

From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
Jo-fai Chow
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
Adwait Bhave
 
What you did last summer?
What you did last summer?What you did last summer?
What you did last summer?
DoThinger
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
Yung-Yu Chen
 
Beginner android
Beginner androidBeginner android
Beginner android
Smriti Das
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
Volodymyr Kazantsev
 
Impact of Open Source
Impact of Open SourceImpact of Open Source
Impact of Open Source
Anne-Gaelle Colom
 
Landing a career in data science
Landing a career in data scienceLanding a career in data science
Landing a career in data science
Parul Pandey
 
Google summer of code
Google summer of codeGoogle summer of code
Google summer of code
Pradeeban Kathiravelu, Ph.D.
 
Informal talk at pict
Informal talk at pictInformal talk at pict
Informal talk at pictMayank Jain
 
Limits of Machine Learning
Limits of Machine LearningLimits of Machine Learning
Limits of Machine Learning
Alexey Grigorev
 
Ask your users
Ask your usersAsk your users
Ask your users
Marie Toler Raney
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Lviv Startup Club
 
Dr. You or, How I Learned to Stop Worry and Love the PhD
Dr. You or, How I Learned to Stop Worry and Love the PhDDr. You or, How I Learned to Stop Worry and Love the PhD
Dr. You or, How I Learned to Stop Worry and Love the PhD
Olga Botvinnik
 
Anomaly detection made easy
Anomaly detection made easyAnomaly detection made easy
Anomaly detection made easy
Piotr Guzik
 
Anomaly detection made easy - Piotr Guzik Allegro
Anomaly detection made easy - Piotr Guzik AllegroAnomaly detection made easy - Piotr Guzik Allegro
Anomaly detection made easy - Piotr Guzik Allegro
Evention
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
sfdatascience
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
Theodoros Vasiloudis
 
Digital Fabrication Studio v.0.2: Introduction
Digital Fabrication Studio v.0.2: IntroductionDigital Fabrication Studio v.0.2: Introduction
Digital Fabrication Studio v.0.2: Introduction
Massimo Menichinelli
 
Google summer of code 2012
Google summer of code 2012Google summer of code 2012
Google summer of code 2012
Pradeeban Kathiravelu, Ph.D.
 

Similar to Kaggle competitions, new friends, new skills and new opportunities (20)

From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
 
What you did last summer?
What you did last summer?What you did last summer?
What you did last summer?
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
Beginner android
Beginner androidBeginner android
Beginner android
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Impact of Open Source
Impact of Open SourceImpact of Open Source
Impact of Open Source
 
Landing a career in data science
Landing a career in data scienceLanding a career in data science
Landing a career in data science
 
Google summer of code
Google summer of codeGoogle summer of code
Google summer of code
 
Informal talk at pict
Informal talk at pictInformal talk at pict
Informal talk at pict
 
Limits of Machine Learning
Limits of Machine LearningLimits of Machine Learning
Limits of Machine Learning
 
Ask your users
Ask your usersAsk your users
Ask your users
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
 
Dr. You or, How I Learned to Stop Worry and Love the PhD
Dr. You or, How I Learned to Stop Worry and Love the PhDDr. You or, How I Learned to Stop Worry and Love the PhD
Dr. You or, How I Learned to Stop Worry and Love the PhD
 
Anomaly detection made easy
Anomaly detection made easyAnomaly detection made easy
Anomaly detection made easy
 
Anomaly detection made easy - Piotr Guzik Allegro
Anomaly detection made easy - Piotr Guzik AllegroAnomaly detection made easy - Piotr Guzik Allegro
Anomaly detection made easy - Piotr Guzik Allegro
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
 
Digital Fabrication Studio v.0.2: Introduction
Digital Fabrication Studio v.0.2: IntroductionDigital Fabrication Studio v.0.2: Introduction
Digital Fabrication Studio v.0.2: Introduction
 
Google summer of code 2012
Google summer of code 2012Google summer of code 2012
Google summer of code 2012
 

More from Jo-fai Chow

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Jo-fai Chow
 
Automatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
Jo-fai Chow
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
Jo-fai Chow
 
H2O at BelgradeR Meetup
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR Meetup
Jo-fai Chow
 
Introduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
Jo-fai Chow
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
Jo-fai Chow
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
Jo-fai Chow
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
Jo-fai Chow
 
Project "Deep Water"
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
Jo-fai Chow
 
H2O Machine Learning Use Cases
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
Jo-fai Chow
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
Jo-fai Chow
 
Designing Sustainable Drainage Systems
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
Jo-fai Chow
 
Developing a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
Jo-fai Chow
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Jo-fai Chow
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Jo-fai Chow
 
Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Jo-fai Chow
 
Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Jo-fai Chow
 

More from Jo-fai Chow (20)

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
 
Automatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 
H2O at BelgradeR Meetup
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR Meetup
 
Introduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Project "Deep Water"
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
 
H2O Machine Learning Use Cases
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
 
Designing Sustainable Drainage Systems
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
 
Developing a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
 
Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)
 
Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)
 

Recently uploaded

Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Sebastiano Panichella
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
Faculty of Medicine And Health Sciences
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
faizulhassanfaiz1670
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
Vladimir Samoylov
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
Access Innovations, Inc.
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
Howard Spence
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
Sebastiano Panichella
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Sebastiano Panichella
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Orkestra
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
eCommerce Institute
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
khadija278284
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
IP ServerOne
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
OWASP Beja
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Matjaž Lipuš
 

Recently uploaded (16)

Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
 

Kaggle competitions, new friends, new skills and new opportunities

  • 1. Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist joe@h2o.ai @matlabulus
  • 2. About Me • 2005 - 2015 • Water Engineer o Consultant for Utilities o EngD Research • 2015 - Present • Data Scientist o Virgin Media o Domino Data Lab o H2O.ai 2
  • 3. About This Talk • What happened o Things I did since I started participating in Kaggle competitions. o New opportunities – results of new skills and friends. 3
  • 4. First MOOC Experience • One of the first Massive Open Online Courses. o Met some new friends. o Decided to collaborate for fun. o “How about Kaggle?” o “What is Kaggle?” 4
  • 5. First Kaggle Experience • First time in my life o Supervised learning • Random Forest • Support Vector Machine • Neural Networks o Train, Validate & Predict. o “Is it black magic?” 5
  • 6. First Kaggle Experience • Problems o “Hey Joe, you are a nice guy but we can’t work together.” o “You love MATLAB so much. You even call yourself @matlabulous on twitter!” o “We prefer R/Python.” • Results o I kept using MATLAB o Lone wolf o No collaboration  6
  • 7. Identifying Skills Gap • Obvious skills gap: o Open-source Programming langauges o Machine learning techniques o Collaboration • Kind of related o Data visualisation o Handling large datasets o Explaining results • That competition was a good wake up call. 7
  • 8. From MATLAB to R/Python MATLAB Python R Neural Networks ✔️ ✔️ ✔️ Random Forest ✔️ ✔️ ✔️ SVM ✔️ ✔️ ✔️ Other Machine Learning Libraries Toolboxes (commercial + open source) Scikit-learn and many more CRAN, GitHub (A LOT!) Data Visualisation I wasn’t good at it anyway … Matplotlib (plus a lot more since then) ggplot2 (WOW!) (plus a lot more since then) 8
  • 9. What can people do with R? 9 James Cheshire, UCL Link Paul Butler, Facebook Link
  • 10. Filling the Skills Gap • More MOOC o Machine Learning • Andrew Ng (Coursera) o Data Analysis • Jeff Leek (Coursera) • R o Intro to Programming • Dave Evans (Udacity) • Python • Things I also picked up: o Linux (Ubuntu) o Git o Cloud computing o HTML / CSS 10
  • 11. Learning from other Kagglers • Continuous learning o Kaggle’s forums and blogs. o New tools and tricks. o Many things you cannot learn from school. o I am standing on the shoulders of many Kagglers. 11
  • 12. Side Project 1 – Crime Data Viz shiny::runGitHub("rApps", "woobe", subdir = "crimemap") 12
  • 13. Other Side Projects • Colour Palette o github.com/ woobe/rPlotter 13 • World Cup 2014 Correct Score Prediction o ML vs. my friends o 10 out of 64 (15.6%) o Friends’ avg. = 4 (6.3%) o github.com/ woobe/wc2014
  • 14. Open Up Myself • Before Kaggle/MOOC o I was drawing a circle around myself. o Fear of change. o Domain-specific problem solving. • After Kaggle/MOOC o Data-driven approach. o Not a subject matter expert? No worries  o Free to try new tools, to learn and to create. 14
  • 15. New Opportunities • LondonR o First presentation outside water industry / academia. o Very positive feedback. o Led to other projects. o bit.ly /londonr_crimemap 15
  • 16. New Opportunities • useR! 2014 (UCLA) o Presented a poster. o Met new friends. o Life-changing event. o github.com/ woobe/useR_2014 16
  • 17. New Friends 17 Ramnath Vaidyanathan htmlwidgets DataRobot Nick @ DominoData Lab H2O.ai & John Chambers! rOpenSci RStudio Matt Dowle data.table (also at H2O.ai)
  • 19. More Opportunities • First blog post about H2O o Things to try after useR! – Part1: Deep Learning with H2O 19
  • 20. More Opportunities • Blog post about Domino and H2O o I did it for fun. I did not have any expectation. o It helped attract customers to both Domino and H2O. 20
  • 21. Becoming a Data Scientist 21
  • 22. London Kagglers Assemble • London Kaggle Meetup o Sep 2015 o I met my Kaggle buddy Mickael Le Gal o He is a product data scientist at Tictrac 22
  • 23. London Kagglers Assemble • Rossmann Store Sales o We got stuck at top 10% for a long period. o Mickael had a breakthrough in feature engineering with 48 hours to go. o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab). o Top 2% finish (our best result so far). 23
  • 25. Summary of Benefits • Direct o Identify data science skills gap. o Learn quickly from the community. o Expand your network. o Prepare yourself for real- life data challenges. • Indirect o You also learn non-ML skills along the way. o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight. 25
  • 26. Strata Hadoop London Talks • “Model stacking and parameters tuning.” o Wednesday 1 June o 7pm o LondonR at Strata o Free event • “Generalised Low Rank Models.” o Thursday 2 June o 2pm o Strata Hadoop main conference event 26
  • 27. Big Thank You • London Kaggle Meetup Organisers • Mickael • Marios • You (Kagglers) • Prof. Dragan Savic (supervisor at uni. of Exeter) • Mango Solutions • Virgin Media • Domino Data Lab • H2O.ai 27
  • 28. Any Questions? • Contact o joe@h2o.ai o @matlabulous o github.com/woobe • Links (All Slides) o github.com/h2oai/h2o- meetups • H2O in London o Coming soon! • Meetups • Office o We’re hiring! o www.h2o.ai/careers 28

Editor's Notes

  1. Erin LeDell