SlideShare a Scribd company logo
1 of 30
Download to read offline
Intro to Data Science
Erin LeDell Ph.D.

Statistician & Machine
Learning Scientist

H2O.ai
H2O World 2015
Download our app, “H2O World 2015”
H2O World 2015
I have H2O
Installed
I have Python
installed
I have R
installed
I have the H2O
World data
sets
Pick up stickers or get install help at the
information booth
Intro to Data Science
• What is Data Science?
• The Data Scientist
• The Data Science Team
• Data Science Tools
• What is Machine Learning?
• What is Deep Learning?
• What is Ensemble Learning?
• Data Science Resources
What is Data Science?
One of the earliest uses of the
term "data science" occurred in
the title of the 1996 International
Federation of Classification
Societies conference in Kobe,
Japan.
What is Data Science?
• The term re-emerged and became popularized in 2001 by
William Cleveland, then at Bell Labs, when he published, 

"Data Science: An Action Plan for Expanding the Technical Areas
of the Field of Statistics”.
• This publication describes a plan to enlarge the major areas of
technical work of the field of statistics. Dr. Cleveland states,
"Since plan is ambitious and implies substantial change, the
altered field will be called Data Science."
What is Data Science?
What is Data Science?
• Clean, transform, filter, aggregate, impute
• Convert into X and Y
Problem
Formulation
Data
Processing
Machine
Learning
• Identify a data task or prediction problem
• Collect relevant data
• Train models
• Evaluate models
The Data Science Venn Diagram
Drew Conway (2010)
The Data Scientist
The Data Scientist “Unicorn”
Survey of Data Scientists on LinkedIn
The number of data scientists has doubled
over the last 4 years.
The top five skills listed by data scientists:
1. Data Analysis

2. R
3. Python
4. Data Mining

5. Machine Learning
From Data Unicorns to Data Teams
Data Science Teams
• Usually a background in computer science or
engineering
• Very good programming and DevOps skills
Data
Analysts
Data
Engineers
Data

Scientists
• Strong data skills and the ability to use existing data
analysis tools
• Able to communicate and tell a story using data
• Strong math/stats background in addition to
programming ability
• Understanding of machine learning algorithms
Data Science Teams
Data Science in the Enterprise
• Data Science teams develop “actionable insights” for business.
• They provide decision makers with information, guidance and
confidence in the decision making process.
• Competitive advantage
• Cost minimization
• Data-driven products
Data Science in the Enterprise
Don’t be a data dinosaur.
Embrace the data!
Data Science Tools
2013 was the year of the 

data science “language wars.”
Data Science Tools
In 2015, we have evolved beyond this…

We are too busy doing actual data science!
Data Science Tools
We are headed toward language agnostic data science, where
friendly APIs connect to powerful data processing engines.
What is Machine Learning?
Unlike rules-based systems which require a human
expert to hard-code domain knowledge directly into
the system, a machine learning algorithm learns how
to make decisions from the data alone.
”Field of study that gives computers the
ability to learn without being
explicitly programmed.” 

— Arthur Samuel, 1959
Machine Learning Tasks
• Multi-class or binary classification
• Ranking (e.g. Google Search results order)
• Evaluate with Classification Error or AUC
Regression
Classification
Clustering
• Unsupervised learning (no training labels)
• Partition the data; identify clusters or sub-populations
• Evaluate with AIC, BIC or Total Sum of Squares
• Predict a real-valued response (e.g. viral load, price)
• Gaussian, Gamma, Poisson, etc. distributed response
• Evaluate with MSE or R^2
Train, Validation and Test Set
• If you plan on doing any model tuning, you should split
your dataset into three parts: Train, Validation and Test
• There is no general rule for how you should partition the
data and it will depend on how strong the signal in your
data is, but an example could be: 

50% Train, 25% Validation and 25% Test
• The validation set is used strictly for model tuning (via
validation of models with different parameters) and the test
set is used to make a final estimate of the generalization
K-fold Cross-validation
• K-fold Cross-validation (CV) 

is used to evaluate the
performance of machine
learning algorithms.
• CV will give you the most
“mileage” on your training
data.
• Performance metrics are
averaged across k folds.
Machine Learning Workflow
Training and Prediction

in machine learning
What is Deep Learning?
• Deep neural networks have more than one hidden
layer in their architecture. That’s why they are
called “deep” neural networks.
• Very useful for complex input data such as images,
video, audio.
”A branch of machine learning based on a set of
algorithms that attempt to model high-level
abstractions in data by using model architectures,
composed of multiple non-linear transformations.” 



— Wikipedia (2015)
What is Deep Learning?
• Deep learning architectures,
specifically artificial neural
networks (ANNs) have
been around since 1980.
• However, there were
breakthroughs in training
techniques that lead to their
recent resurgence in the mid
2000’s.
• Combined with modern
computing power, they are
quite effective.
What is Ensemble Learning?
• Random Forests and Gradient Boosting Machines (GBM)
are both ensembles of decision trees.
• Stacking, or Super Learning, is technique for combining
various learners into a single, powerful learner using a
second-level metalearning algorithm.
“Ensemble methods use multiple learning
algorithms to obtain better predictive
performance that could be obtained from 

any of the constituent learning algorithms.” 

— Wikipedia (2015)
No Free Lunch
• No general purpose algorithm to solve all problems.
• No right answer on optimal data preparation.
• Some algorithms may have such strong biases that they
can only learn certain kinds of functions.
"Even after the observation of the frequent or
constant conjunction of objects, we have no reason
to draw any inference concerning any object beyond
those of which we have had experience.” 

— David Hume (1711-1776)
Where to Learn More?
• H2O Online Training (free): http://learn.h2o.ai
• H2O Slidedecks: http://www.slideshare.net/0xdata
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: http://h2o.ai/events
• Machine Learning & Data Science courses: http://coursebuffet.com

More Related Content

What's hot

Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino Data Lab
 
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseData Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseFormulatedby
 
Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack BigDataExpo
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...Thoughtworks
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015 Dataiku
 
H2O World - Learning How Humans and Non-Humans Interact with Digital Ads
H2O World - Learning How Humans and Non-Humans Interact with Digital AdsH2O World - Learning How Humans and Non-Humans Interact with Digital Ads
H2O World - Learning How Humans and Non-Humans Interact with Digital AdsSri Ambati
 
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthyH2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthySri Ambati
 
H2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral BajariaH2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral BajariaSri Ambati
 
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi RenH2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi RenSri Ambati
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectEugene Mandel
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceLivePerson
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Agile data science
Agile data scienceAgile data science
Agile data scienceJoel Horwitz
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsDomino Data Lab
 
Dataiku r users group v2
Dataiku   r users group v2Dataiku   r users group v2
Dataiku r users group v2Cdiscount
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchKlaas Bosteels
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at ScaleDomino Data Lab
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itDomino Data Lab
 

What's hot (20)

Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
 
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use CaseData Science Salon: Introduction to Machine Learning - Marketing Use Case
Data Science Salon: Introduction to Machine Learning - Marketing Use Case
 
Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack Big data expo - machine learning in the elastic stack
Big data expo - machine learning in the elastic stack
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo...
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015
 
H2O World - Learning How Humans and Non-Humans Interact with Digital Ads
H2O World - Learning How Humans and Non-Humans Interact with Digital AdsH2O World - Learning How Humans and Non-Humans Interact with Digital Ads
H2O World - Learning How Humans and Non-Humans Interact with Digital Ads
 
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthyH2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
 
H2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral BajariaH2O World - Data Science in Action @ 6sense - Viral Bajaria
H2O World - Data Science in Action @ 6sense - Viral Bajaria
 
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi RenH2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
 
The Other 99% of a Data Science Project
The Other 99% of a Data Science ProjectThe Other 99% of a Data Science Project
The Other 99% of a Data Science Project
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
Dataiku r users group v2
Dataiku   r users group v2Dataiku   r users group v2
Dataiku r users group v2
 
Back to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from ScratchBack to Square One: Building a Data Science Team from Scratch
Back to Square One: Building a Data Science Team from Scratch
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at Scale
 
Andreas weigend
Andreas weigendAndreas weigend
Andreas weigend
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
 

Viewers also liked

H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangSri Ambati
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big DataIndu Khemchandani
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelSri Ambati
 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)heba_ahmad
 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsChandan Rajah
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LASri Ambati
 
Webinar: Deep Learning with H2O
Webinar: Deep Learning with H2OWebinar: Deep Learning with H2O
Webinar: Deep Learning with H2OSri Ambati
 
H2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicH2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicSri Ambati
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Austin Ogilvie
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelSri Ambati
 
H2O Open Source Deep Learning, Arno Candel 03-20-14
H2O Open Source Deep Learning, Arno Candel 03-20-14H2O Open Source Deep Learning, Arno Candel 03-20-14
H2O Open Source Deep Learning, Arno Candel 03-20-14Sri Ambati
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 
Build Your Own Recommendation Engine
Build Your Own Recommendation EngineBuild Your Own Recommendation Engine
Build Your Own Recommendation EngineSri Ambati
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelSri Ambati
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Lucidworks
 
Demystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine LearningDemystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine LearningJulian Bright
 

Viewers also liked (20)

H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy Wang
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno Candel
 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)
 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and Benefits
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LA
 
Webinar: Deep Learning with H2O
Webinar: Deep Learning with H2OWebinar: Deep Learning with H2O
Webinar: Deep Learning with H2O
 
H2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom KraljevicH2O World - Building a Smarter Application - Tom Kraljevic
H2O World - Building a Smarter Application - Tom Kraljevic
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 
H2O Open Source Deep Learning, Arno Candel 03-20-14
H2O Open Source Deep Learning, Arno Candel 03-20-14H2O Open Source Deep Learning, Arno Candel 03-20-14
H2O Open Source Deep Learning, Arno Candel 03-20-14
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Build Your Own Recommendation Engine
Build Your Own Recommendation EngineBuild Your Own Recommendation Engine
Build Your Own Recommendation Engine
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Demystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine LearningDemystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine Learning
 

Similar to H2O World - Intro to Data Science with Erin Ledell

Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning CCG
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learningjaumebp
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyYannick Pouliot
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustMenchita Falcutila Dumlao
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabadsaitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training HyderabadNithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placementSaiprasadVella
 

Similar to H2O World - Intro to Data Science with Erin Ledell (20)

Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrust
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 

H2O World - Intro to Data Science with Erin Ledell

  • 1. Intro to Data Science Erin LeDell Ph.D.
 Statistician & Machine Learning Scientist
 H2O.ai
  • 2. H2O World 2015 Download our app, “H2O World 2015”
  • 3. H2O World 2015 I have H2O Installed I have Python installed I have R installed I have the H2O World data sets Pick up stickers or get install help at the information booth
  • 4. Intro to Data Science • What is Data Science? • The Data Scientist • The Data Science Team • Data Science Tools • What is Machine Learning? • What is Deep Learning? • What is Ensemble Learning? • Data Science Resources
  • 5. What is Data Science? One of the earliest uses of the term "data science" occurred in the title of the 1996 International Federation of Classification Societies conference in Kobe, Japan.
  • 6. What is Data Science? • The term re-emerged and became popularized in 2001 by William Cleveland, then at Bell Labs, when he published, 
 "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”. • This publication describes a plan to enlarge the major areas of technical work of the field of statistics. Dr. Cleveland states, "Since plan is ambitious and implies substantial change, the altered field will be called Data Science."
  • 7. What is Data Science?
  • 8. What is Data Science? • Clean, transform, filter, aggregate, impute • Convert into X and Y Problem Formulation Data Processing Machine Learning • Identify a data task or prediction problem • Collect relevant data • Train models • Evaluate models
  • 9. The Data Science Venn Diagram Drew Conway (2010)
  • 11. The Data Scientist “Unicorn”
  • 12. Survey of Data Scientists on LinkedIn The number of data scientists has doubled over the last 4 years. The top five skills listed by data scientists: 1. Data Analysis
 2. R 3. Python 4. Data Mining
 5. Machine Learning
  • 13. From Data Unicorns to Data Teams
  • 14. Data Science Teams • Usually a background in computer science or engineering • Very good programming and DevOps skills Data Analysts Data Engineers Data
 Scientists • Strong data skills and the ability to use existing data analysis tools • Able to communicate and tell a story using data • Strong math/stats background in addition to programming ability • Understanding of machine learning algorithms
  • 16. Data Science in the Enterprise • Data Science teams develop “actionable insights” for business. • They provide decision makers with information, guidance and confidence in the decision making process. • Competitive advantage • Cost minimization • Data-driven products
  • 17. Data Science in the Enterprise Don’t be a data dinosaur. Embrace the data!
  • 18. Data Science Tools 2013 was the year of the 
 data science “language wars.”
  • 19. Data Science Tools In 2015, we have evolved beyond this…
 We are too busy doing actual data science!
  • 20. Data Science Tools We are headed toward language agnostic data science, where friendly APIs connect to powerful data processing engines.
  • 21. What is Machine Learning? Unlike rules-based systems which require a human expert to hard-code domain knowledge directly into the system, a machine learning algorithm learns how to make decisions from the data alone. ”Field of study that gives computers the ability to learn without being explicitly programmed.” 
 — Arthur Samuel, 1959
  • 22. Machine Learning Tasks • Multi-class or binary classification • Ranking (e.g. Google Search results order) • Evaluate with Classification Error or AUC Regression Classification Clustering • Unsupervised learning (no training labels) • Partition the data; identify clusters or sub-populations • Evaluate with AIC, BIC or Total Sum of Squares • Predict a real-valued response (e.g. viral load, price) • Gaussian, Gamma, Poisson, etc. distributed response • Evaluate with MSE or R^2
  • 23. Train, Validation and Test Set • If you plan on doing any model tuning, you should split your dataset into three parts: Train, Validation and Test • There is no general rule for how you should partition the data and it will depend on how strong the signal in your data is, but an example could be: 
 50% Train, 25% Validation and 25% Test • The validation set is used strictly for model tuning (via validation of models with different parameters) and the test set is used to make a final estimate of the generalization
  • 24. K-fold Cross-validation • K-fold Cross-validation (CV) 
 is used to evaluate the performance of machine learning algorithms. • CV will give you the most “mileage” on your training data. • Performance metrics are averaged across k folds.
  • 25. Machine Learning Workflow Training and Prediction
 in machine learning
  • 26. What is Deep Learning? • Deep neural networks have more than one hidden layer in their architecture. That’s why they are called “deep” neural networks. • Very useful for complex input data such as images, video, audio. ”A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, composed of multiple non-linear transformations.” 
 
 — Wikipedia (2015)
  • 27. What is Deep Learning? • Deep learning architectures, specifically artificial neural networks (ANNs) have been around since 1980. • However, there were breakthroughs in training techniques that lead to their recent resurgence in the mid 2000’s. • Combined with modern computing power, they are quite effective.
  • 28. What is Ensemble Learning? • Random Forests and Gradient Boosting Machines (GBM) are both ensembles of decision trees. • Stacking, or Super Learning, is technique for combining various learners into a single, powerful learner using a second-level metalearning algorithm. “Ensemble methods use multiple learning algorithms to obtain better predictive performance that could be obtained from 
 any of the constituent learning algorithms.” 
 — Wikipedia (2015)
  • 29. No Free Lunch • No general purpose algorithm to solve all problems. • No right answer on optimal data preparation. • Some algorithms may have such strong biases that they can only learn certain kinds of functions. "Even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience.” 
 — David Hume (1711-1776)
  • 30. Where to Learn More? • H2O Online Training (free): http://learn.h2o.ai • H2O Slidedecks: http://www.slideshare.net/0xdata • H2O Video Presentations: https://www.youtube.com/user/0xdata • H2O Community Events & Meetups: http://h2o.ai/events • Machine Learning & Data Science courses: http://coursebuffet.com