SlideShare a Scribd company logo
H2O.ai

Machine Intelligence
Data Science for
Non-Data Scientists
Erin LeDell Ph.D.
Silicon Valley Big Data Science
August 2015
H2O.ai

Machine Intelligence
H2O.ai
H2O Company
H2O Software
• Team: 35. Founded in 2012, Mountain View, CA
• Stanford Math & Systems Engineers
• Open Source Software

• Ease of Use via Web Interface
• R, Python, Scala, Spark & Hadoop Interfaces
• Distributed Algorithms Scale to Big Data
H2O.ai

Machine Intelligence
Scientific Advisory Council
Dr. Trevor Hastie
Dr. Rob Tibshirani
Dr. Stephen Boyd
• John A. Overdeck Professor of Mathematics, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Co-author with John Chambers, Statistical Models in S
• Co-author, Generalized Additive Models
• 108,404 citations (via Google Scholar)
• Professor of Statistics and Health Research and Policy, Stanford University
• PhD in Statistics, Stanford University
• COPPS Presidents’ Award recipient
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Author, Regression Shrinkage and Selection via the Lasso
• Co-author, An Introduction to the Bootstrap
• Professor of Electrical Engineering and Computer Science, Stanford University
• PhD in Electrical Engineering and Computer Science, UC Berkeley
• Co-author, Convex Optimization
• Co-author, Linear Matrix Inequalities in System and Control Theory
• Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction
Method of Multipliers
H2O.ai

Machine Intelligence
What is Data Science?
Problem
Formulation
• Identify an outcome of interest and the type of task:
classification / regression / clustering
• Identify the potential predictor variables
• Identify the independent sampling units
• Conduct research experiment (e.g. Clinical Trial)
• Collect examples / randomly sample the population
• Transform, clean, impute, filter, aggregate data
• Prepare the data for machine learning — X, Y
• Modeling using a machine learning algorithm (training)
• Model evaluation and comparison
• Sensitivity & Cost Analysis
• Translate results into action items
• Feed results into research pipeline
Collect &
Process Data
Machine Learning
Insights & Action
H2O.ai

Machine Intelligence Source: marketingdistillery.com
H2O.ai

Machine Intelligence
What is Machine Learning?
What it is: ✤ “Field of study that gives computers the ability to learn
without being explicitly programmed.” (Samuel, 1959)
✤ “Machine learning and statistics are closely related
fields. The ideas of machine learning, from
methodological principles to theoretical tools, have
had a long pre-history in statistics.” (Jordan, 2014)
✤ M.I. Jordan also suggested the term data science as
a placeholder to call the overall field.
Unlike rules-based systems which require a human
expert to hard-code domain knowledge directly into
the system, a machine learning algorithm learns how
to make decisions from the data alone.
What it’s not:
H2O.ai

Machine Intelligence
Classification
Clustering
Machine Learning Overview
• Predict a real-valued response (viral load, weight)
• Gaussian, Gamma, Poisson and Tweedie
• MSE and R^2
• Multi-class or Binary classification
• Ranking
• Accuracy and AUC
• Unsupervised learning (no training labels)
• Partition the data / identify clusters
• AIC and BIC
Regression
H2O.ai

Machine Intelligence
Machine Learning Workflow
Source: NLTK
Example of a supervised machine learning workflow.
H2O.ai

Machine Intelligence
ML Model Performance
Test & Train
• Partition the original data (randomly) into a training set
and a test set. (e.g. 70/30)
• Train a model using the “training set” and evaluate
performance on the “test set” or “validation set.”
• Train & test K
models as shown.
• Average the model
performance over
the K test sets.
• Report cross-
validated metrics.
• Regression: R^2, MSE, RMSE
• Classification: Accuracy, F1, H-measure
• Ranking (Binary Outcome): AUC, Partial AUC
K-fold
Cross-validation
Performance
Metrics
H2O.ai

Machine Intelligence
What is Deep Learning?
What it is: ✤ “A branch of machine learning based on a set of
algorithms that attempt to model high-level
abstractions in data by using model architectures,
composed of multiple non-linear
transformations.” (Wikipedia, 2015)
✤ Deep neural networks have more than one hidden
layer in their architecture. That’s what’s “deep.”
✤ Very useful for complex input data such as images,
video, audio.
Deep learning architectures, specifically artificial
neural networks (ANNs) have been around since
1980, so they are not new. However, there were
breakthroughs in training techniques that lead to their
recent resurgence (mid 2000’s). Combined with
modern computing power, they are quite effective.
What it’s not:
H2O.ai

Machine Intelligence
Deep Learning Architecture
Example of a deep neural net architecture.
H2O.ai

Machine Intelligence
What is Ensemble Learning?
What it is: ✤ “Ensemble methods use multiple learning algorithms
to obtain better predictive performance that could be
obtained from any of the constituent learning
algorithms.” (Wikipedia, 2015)
✤ Random Forests and Gradient Boosting Machines
(GBM) are both ensembles of decision trees.
✤ Stacking, or Super Learning, is technique for
combining various learners into a single, powerful
learner using a second-level metalearning algorithm.
Ensembles typically achieve superior model
performance over singular methods. However, this
comes at a price — computation time.
What it’s not:
H2O.ai

Machine Intelligence
Where to learn more?
• H2O Online Training (free): http://learn.h2o.ai
• H2O Slidedecks: http://www.slideshare.net/0xdata
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: http://h2o.ai/events
• Machine Learning & Data Science courses: http://coursebuffet.com
Customers ! Community ! Evangelists
November 9, 10, 11
Computer History Museum

H 2 O W O R L D . H 2 O . A I

!
20% off registration
using code:

h2ocommunity
!
H2O.ai

Machine Intelligence
Questions?
@ledell on Twitter, GitHub
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell
Data Science for Non-Data
Scientists 



aka. How the Business Views Data
Science
Chen Huang
August 20, 2015
Agenda
•  Introduction
•  Data Science Primer
•  Working with Data Scientists
•  Decoding the Data Science Lingo
•  Q&A
Introduction
•  Who am I?
•  Why am I giving this talk?
Who am I?
•  Data Strategist
•  Career in Business Intelligence,
Analytics, and Big Data
•  Various roles
•  Consultant
•  Developer
•  Business and Data Analyst
•  Product Manager
•  Functional and Technical Trainer
•  Client Services
•  Worked in various industries
•  Health care, pharmaceutics,
communications and high tech,
consumer products, automotive,
finance, government contracting
August, 2015 – San Francisco, CA
Why am I giving this talk?
July, 2011 – Beijing, China
Data Science Primer
•  What can Data Science do for the Business?
•  Applications of Data Science
•  Data-Driven Decisions
•  What does a Data Scientist do?
•  Data Science Skills
What can Data Science do for the
Business?
A: Data science! Extracting useful
information and knowledge from large
volumes of data in order to improve
business decision-making or
providing the business insights to make
data-driven decisions
DataBusiness
What can Data do?
Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Applications of Data Science
Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Data-Driven Decisions
•  Practice of basing decisions on data, rather than purely
on intuition
•  There is evidence that data-driven decision making and
big data technologies substantially improve business
performance
The Art and Science of Data Science
•  Discover unknowns in data
•  Obtain predictive, actionable insights
•  Communicate business data stories
•  Build confidence in decision making
•  Create valuable Data Products that has business
impacts
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
What does a Data Scientist do?
•  Data curiosity. Explore data. Discover unknowns
•  Understand data relationships
•  Understand the business, has domain knowledge
•  Can tell relevant stories with data
•  Holistic view of the business
•  Knows machine learning, statistics, probability
•  Can hack and code
•  Define and test an hypothesis, run experiences
•  Asks good questions
http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Data Science Skills
Image: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Image: http://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
Image: http://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
Working with Data Scientists
•  Collaboration
•  Data Science Cycle
•  Organizational Models for Data Science Teams
Working with Data Scientists
Data
Science
Business
Data
Engineering
Data Science Cycle
Image: https://en.wikipedia.org/wiki/Data_science
Organizational Models for Data
Science Teams
Image: http://www.slideshare.net/emcacademics/building-data-science-teams-31057129
Decoding the Data Science Lingo
Machine Learning
•  A subfield of computer science
and artificial intelligence (AI) that
focuses on the design of
systems that can learn from and
make decisions and predictions
based on data.
•  Machine learning enables
computers to act and make
data-driven decisions rather than
being explicitly programmed to
carry out a certain task.
•  Machine Learning programs are
also designed to learn and
improve over time when
exposed to new data.
•  Everything!
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Unsupervised Learning
Data Science Definition:
•  Where a program, given a
dataset, can automatically find
patterns and relationships
within the dataset.
•  The business will decide how
deeply or many categories
there are.
•  Clustering or grouping of like
data.
•  Examples: k-means clustering,
hierarchical clustering
Business Application:
•  Customer segmentation
•  Understanding users and
behaviors
•  Classifying unknown and pre-
defined images into categories
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Supervised Learning
•  Where a program is “trained”
on a pre-defined dataset.
•  Based off its training data the
program can make accurate
decisions when given new
data.
•  Classifying Twitter sentiments
•  Recommender systems
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Score
•  Number of ways to evaluate
how well the model assigns the
correct class value to the test
instances.
•  Confidence gauge
Data Science Definition: Business Application:
Definition: https://mlcorner.wordpress.com/tag/scoring/
Score Cont.
•  True Positive (TP):    If the instance
is positive and it is classified as
positive False
•  Negative (FN): If the instance is
positive but it is classified as
negative True
•  Negative (TN):  If the instance is
negative and it is classified as
negative False
•  Positive (FP):   If the instance is
negative but it is classified as
positive
•  Classification problems:
•  Precision = the number of times you correctly classify = TP/(TP+FP)
•  Accuracy = proportion of correctly classified instances = (TP+TN)/(TP+TN
+FP+FN)
•  Recall or Sensitivity = the number of positive that you correctly classify out
of all the actual positives = TP/(TP+FN)
•  Specificity = classifier’s ability to identify negative results = TN/(TN+FP)
Classification
•  Sub-category of Supervised
Learning
•  Classification is the process of
taking some sort of input and
assign a label to it. The
predictions are discrete,
categories, or “yes or no”
nature.
•  Examples: Logistic
Regression, Random Forest
•  What customers should a
company target with its
marketing campaigns?
•  Is this Nigerian prince
committing fraud? (Spam
classification)
•  Is this actually Barack
Obama’s Facebook profile and
review on Amazon? (Fraud
detection)
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Regression
•  Sub-category of Supervised
Learning
•  Regression is a type of
algorithm that predicts a
continuous values.
•  How much would a user spend
on a mobile game like
CandyCrush?
•  How much would someone
spend on healthcare out of
pocket?
•  How many attendees will come
to this event based on past
registration?
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Decision Trees
•  Using a tree-like graph or model
of decisions and their possible
consequence.
•  Medical Testing (e.g. health
incidences, etc.)
•  Genealogy breakdowns (e.g.
eye color, blood type, etc.)
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Deep Learning
•  A category of machine learning
algorithms that often use
Artificial Neural Networks to
generate model.
•  Image classification
•  Language processing
•  Audio processing
•  Outlier and fraud detection
Data Science Definition: Business Application:
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Questions?

More Related Content

What's hot

Data analytics and powerbi intro
Data analytics and powerbi introData analytics and powerbi intro
Data analytics and powerbi intro
Berkovich Consulting
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
Seta Wicaksana
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
Srinath Perera
 
Big Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTBig Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTNikhil Atkuri
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Ravi Nayak
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
Ioannis Kourouklides
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AI
Peter Skomoroch
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
Data analytics
Data analyticsData analytics
Data analytics
davidfergarcia
 
Big Data, Business Intelligence and Data Analytics
Big Data, Business Intelligence and Data AnalyticsBig Data, Business Intelligence and Data Analytics
Big Data, Business Intelligence and Data Analytics
Systems Limited
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?
DATAVERSITY
 
Introduction to data science club
Introduction to data science clubIntroduction to data science club
Introduction to data science club
Data Science Club
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
Dr. Dipti Patil
 
Dissecting the dangers of deepfakes and their impact on reputation Generative...
Dissecting the dangers of deepfakes and their impact on reputation Generative...Dissecting the dangers of deepfakes and their impact on reputation Generative...
Dissecting the dangers of deepfakes and their impact on reputation Generative...
CSIRO National AI Centre
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Simplilearn
 
Data Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital TransformationData Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital Transformation
Jan Wiegelmann
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 
Data science - An Introduction
Data science - An IntroductionData science - An Introduction
Data science - An Introduction
Ravishankar Rajagopalan
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
Data Science Thailand
 

What's hot (20)

Data analytics and powerbi intro
Data analytics and powerbi introData analytics and powerbi intro
Data analytics and powerbi intro
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Big Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTBig Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPT
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AI
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Data analytics
Data analyticsData analytics
Data analytics
 
Big Data, Business Intelligence and Data Analytics
Big Data, Business Intelligence and Data AnalyticsBig Data, Business Intelligence and Data Analytics
Big Data, Business Intelligence and Data Analytics
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?
 
Introduction to data science club
Introduction to data science clubIntroduction to data science club
Introduction to data science club
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Dissecting the dangers of deepfakes and their impact on reputation Generative...
Dissecting the dangers of deepfakes and their impact on reputation Generative...Dissecting the dangers of deepfakes and their impact on reputation Generative...
Dissecting the dangers of deepfakes and their impact on reputation Generative...
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
 
Data Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital TransformationData Analytics and Artificial Intelligence in the era of Digital Transformation
Data Analytics and Artificial Intelligence in the era of Digital Transformation
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Data science - An Introduction
Data science - An IntroductionData science - An Introduction
Data science - An Introduction
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
 

Viewers also liked

Python for Data Science - TDC 2015
Python for Data Science - TDC 2015Python for Data Science - TDC 2015
Python for Data Science - TDC 2015
Gabriel Moreira
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware Detection
VMware Tanzu
 
[FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview [FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview
chanyoonkim
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
VMware Tanzu
 
Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science
VMware Tanzu
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
VMware Tanzu
 
저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다
eungjin cho
 
What Is the Future of Data Sharing?
What Is the Future of Data Sharing?What Is the Future of Data Sharing?
What Is the Future of Data Sharing?
Center on Global Brand Leadership
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data ScienceEdureka!
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with Cassandra
Matthias Broecheler
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
Daniel Tunkelang
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
Derek Kane
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
Derek Kane
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
Derek Kane
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
Derek Kane
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
Inside Analysis
 
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
APNIC
 
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
CA API Management
 
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
Shawn Tuma
 

Viewers also liked (20)

Python for Data Science - TDC 2015
Python for Data Science - TDC 2015Python for Data Science - TDC 2015
Python for Data Science - TDC 2015
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware Detection
 
[FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview [FAST CAMPUS] 1강 data science overview
[FAST CAMPUS] 1강 data science overview
 
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
Pivotal Digital Transformation Forum: Accelerate Time to Market with Business...
 
Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science Pivotal Digital Transformation Forum: Data Science
Pivotal Digital Transformation Forum: Data Science
 
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven EnterprisePivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
Pivotal Digital Transformation Forum: Becoming a Data Driven Enterprise
 
저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다저성장 시대 데이터 경제만이 살길이다
저성장 시대 데이터 경제만이 살길이다
 
What Is the Future of Data Sharing?
What Is the Future of Data Sharing?What Is the Future of Data Sharing?
What Is the Future of Data Sharing?
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Titan: Big Graph Data with Cassandra
Titan: Big Graph Data with CassandraTitan: Big Graph Data with Cassandra
Titan: Big Graph Data with Cassandra
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security To Serve and Protect: Making Sense of Hadoop Security
To Serve and Protect: Making Sense of Hadoop Security
 
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Dat...
 
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
Balancing Mobile UX & Security: An API Management Perspective Presentation fr...
 
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
Data Security and Privacy by Contract: Hacking Us All Into Business Associate...
 

Similar to Intro to Data Science for Non-Data Scientists

H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
Sri Ambati
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Dr. Sunil Kr. Pandey
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
jybufgofasfbkpoovh
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Sri Ambati
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
VamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
saitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
SaiprasadVella
 
online data science training
online data science trainingonline data science training
online data science training
DIGITALSAI1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
VamsiNihal
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
KumarNaik21
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
AkhilGGM
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
GautamPopli1
 

Similar to Intro to Data Science for Non-Data Scientists (20)

H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford MedMachine Learning in Modern Medicine with Erin LeDell at Stanford Med
Machine Learning in Modern Medicine with Erin LeDell at Stanford Med
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 
online data science training
online data science trainingonline data science training
online data science training
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
ayushiqss
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
XfilesPro
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 

Recently uploaded (20)

Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Why React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdfWhy React Native as a Strategic Advantage for Startup Innovation.pdf
Why React Native as a Strategic Advantage for Startup Innovation.pdf
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 

Intro to Data Science for Non-Data Scientists

  • 1. H2O.ai
 Machine Intelligence Data Science for Non-Data Scientists Erin LeDell Ph.D. Silicon Valley Big Data Science August 2015
  • 2. H2O.ai
 Machine Intelligence H2O.ai H2O Company H2O Software • Team: 35. Founded in 2012, Mountain View, CA • Stanford Math & Systems Engineers • Open Source Software
 • Ease of Use via Web Interface • R, Python, Scala, Spark & Hadoop Interfaces • Distributed Algorithms Scale to Big Data
  • 3. H2O.ai
 Machine Intelligence Scientific Advisory Council Dr. Trevor Hastie Dr. Rob Tibshirani Dr. Stephen Boyd • John A. Overdeck Professor of Mathematics, Stanford University • PhD in Statistics, Stanford University • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining • Co-author with John Chambers, Statistical Models in S • Co-author, Generalized Additive Models • 108,404 citations (via Google Scholar) • Professor of Statistics and Health Research and Policy, Stanford University • PhD in Statistics, Stanford University • COPPS Presidents’ Award recipient • Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining • Author, Regression Shrinkage and Selection via the Lasso • Co-author, An Introduction to the Bootstrap • Professor of Electrical Engineering and Computer Science, Stanford University • PhD in Electrical Engineering and Computer Science, UC Berkeley • Co-author, Convex Optimization • Co-author, Linear Matrix Inequalities in System and Control Theory • Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
  • 4. H2O.ai
 Machine Intelligence What is Data Science? Problem Formulation • Identify an outcome of interest and the type of task: classification / regression / clustering • Identify the potential predictor variables • Identify the independent sampling units • Conduct research experiment (e.g. Clinical Trial) • Collect examples / randomly sample the population • Transform, clean, impute, filter, aggregate data • Prepare the data for machine learning — X, Y • Modeling using a machine learning algorithm (training) • Model evaluation and comparison • Sensitivity & Cost Analysis • Translate results into action items • Feed results into research pipeline Collect & Process Data Machine Learning Insights & Action
  • 5. H2O.ai
 Machine Intelligence Source: marketingdistillery.com
  • 6. H2O.ai
 Machine Intelligence What is Machine Learning? What it is: ✤ “Field of study that gives computers the ability to learn without being explicitly programmed.” (Samuel, 1959) ✤ “Machine learning and statistics are closely related fields. The ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.” (Jordan, 2014) ✤ M.I. Jordan also suggested the term data science as a placeholder to call the overall field. Unlike rules-based systems which require a human expert to hard-code domain knowledge directly into the system, a machine learning algorithm learns how to make decisions from the data alone. What it’s not:
  • 7. H2O.ai
 Machine Intelligence Classification Clustering Machine Learning Overview • Predict a real-valued response (viral load, weight) • Gaussian, Gamma, Poisson and Tweedie • MSE and R^2 • Multi-class or Binary classification • Ranking • Accuracy and AUC • Unsupervised learning (no training labels) • Partition the data / identify clusters • AIC and BIC Regression
  • 8. H2O.ai
 Machine Intelligence Machine Learning Workflow Source: NLTK Example of a supervised machine learning workflow.
  • 9. H2O.ai
 Machine Intelligence ML Model Performance Test & Train • Partition the original data (randomly) into a training set and a test set. (e.g. 70/30) • Train a model using the “training set” and evaluate performance on the “test set” or “validation set.” • Train & test K models as shown. • Average the model performance over the K test sets. • Report cross- validated metrics. • Regression: R^2, MSE, RMSE • Classification: Accuracy, F1, H-measure • Ranking (Binary Outcome): AUC, Partial AUC K-fold Cross-validation Performance Metrics
  • 10. H2O.ai
 Machine Intelligence What is Deep Learning? What it is: ✤ “A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, composed of multiple non-linear transformations.” (Wikipedia, 2015) ✤ Deep neural networks have more than one hidden layer in their architecture. That’s what’s “deep.” ✤ Very useful for complex input data such as images, video, audio. Deep learning architectures, specifically artificial neural networks (ANNs) have been around since 1980, so they are not new. However, there were breakthroughs in training techniques that lead to their recent resurgence (mid 2000’s). Combined with modern computing power, they are quite effective. What it’s not:
  • 11. H2O.ai
 Machine Intelligence Deep Learning Architecture Example of a deep neural net architecture.
  • 12. H2O.ai
 Machine Intelligence What is Ensemble Learning? What it is: ✤ “Ensemble methods use multiple learning algorithms to obtain better predictive performance that could be obtained from any of the constituent learning algorithms.” (Wikipedia, 2015) ✤ Random Forests and Gradient Boosting Machines (GBM) are both ensembles of decision trees. ✤ Stacking, or Super Learning, is technique for combining various learners into a single, powerful learner using a second-level metalearning algorithm. Ensembles typically achieve superior model performance over singular methods. However, this comes at a price — computation time. What it’s not:
  • 13. H2O.ai
 Machine Intelligence Where to learn more? • H2O Online Training (free): http://learn.h2o.ai • H2O Slidedecks: http://www.slideshare.net/0xdata • H2O Video Presentations: https://www.youtube.com/user/0xdata • H2O Community Events & Meetups: http://h2o.ai/events • Machine Learning & Data Science courses: http://coursebuffet.com
  • 14. Customers ! Community ! Evangelists November 9, 10, 11 Computer History Museum H 2 O W O R L D . H 2 O . A I ! 20% off registration using code: h2ocommunity !
  • 15. H2O.ai
 Machine Intelligence Questions? @ledell on Twitter, GitHub erin@h2o.ai http://www.stat.berkeley.edu/~ledell
  • 16. Data Science for Non-Data Scientists 
 
 aka. How the Business Views Data Science Chen Huang August 20, 2015
  • 17.
  • 18. Agenda •  Introduction •  Data Science Primer •  Working with Data Scientists •  Decoding the Data Science Lingo •  Q&A
  • 19. Introduction •  Who am I? •  Why am I giving this talk?
  • 20. Who am I? •  Data Strategist •  Career in Business Intelligence, Analytics, and Big Data •  Various roles •  Consultant •  Developer •  Business and Data Analyst •  Product Manager •  Functional and Technical Trainer •  Client Services •  Worked in various industries •  Health care, pharmaceutics, communications and high tech, consumer products, automotive, finance, government contracting August, 2015 – San Francisco, CA
  • 21. Why am I giving this talk? July, 2011 – Beijing, China
  • 22. Data Science Primer •  What can Data Science do for the Business? •  Applications of Data Science •  Data-Driven Decisions •  What does a Data Scientist do? •  Data Science Skills
  • 23. What can Data Science do for the Business? A: Data science! Extracting useful information and knowledge from large volumes of data in order to improve business decision-making or providing the business insights to make data-driven decisions DataBusiness
  • 24. What can Data do? Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 25. Applications of Data Science Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 26. Data-Driven Decisions •  Practice of basing decisions on data, rather than purely on intuition •  There is evidence that data-driven decision making and big data technologies substantially improve business performance
  • 27. The Art and Science of Data Science •  Discover unknowns in data •  Obtain predictive, actionable insights •  Communicate business data stories •  Build confidence in decision making •  Create valuable Data Products that has business impacts http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
  • 28.
  • 29. What does a Data Scientist do? •  Data curiosity. Explore data. Discover unknowns •  Understand data relationships •  Understand the business, has domain knowledge •  Can tell relevant stories with data •  Holistic view of the business •  Knows machine learning, statistics, probability •  Can hack and code •  Define and test an hypothesis, run experiences •  Asks good questions http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 30. Data Science Skills Image: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 33. Working with Data Scientists •  Collaboration •  Data Science Cycle •  Organizational Models for Data Science Teams
  • 34.
  • 35. Working with Data Scientists Data Science Business Data Engineering
  • 36. Data Science Cycle Image: https://en.wikipedia.org/wiki/Data_science
  • 37. Organizational Models for Data Science Teams Image: http://www.slideshare.net/emcacademics/building-data-science-teams-31057129
  • 38. Decoding the Data Science Lingo
  • 39. Machine Learning •  A subfield of computer science and artificial intelligence (AI) that focuses on the design of systems that can learn from and make decisions and predictions based on data. •  Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task. •  Machine Learning programs are also designed to learn and improve over time when exposed to new data. •  Everything! Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 40. Unsupervised Learning Data Science Definition: •  Where a program, given a dataset, can automatically find patterns and relationships within the dataset. •  The business will decide how deeply or many categories there are. •  Clustering or grouping of like data. •  Examples: k-means clustering, hierarchical clustering Business Application: •  Customer segmentation •  Understanding users and behaviors •  Classifying unknown and pre- defined images into categories Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 41. Supervised Learning •  Where a program is “trained” on a pre-defined dataset. •  Based off its training data the program can make accurate decisions when given new data. •  Classifying Twitter sentiments •  Recommender systems Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 42. Score •  Number of ways to evaluate how well the model assigns the correct class value to the test instances. •  Confidence gauge Data Science Definition: Business Application: Definition: https://mlcorner.wordpress.com/tag/scoring/
  • 43. Score Cont. •  True Positive (TP):    If the instance is positive and it is classified as positive False •  Negative (FN): If the instance is positive but it is classified as negative True •  Negative (TN):  If the instance is negative and it is classified as negative False •  Positive (FP):   If the instance is negative but it is classified as positive •  Classification problems: •  Precision = the number of times you correctly classify = TP/(TP+FP) •  Accuracy = proportion of correctly classified instances = (TP+TN)/(TP+TN +FP+FN) •  Recall or Sensitivity = the number of positive that you correctly classify out of all the actual positives = TP/(TP+FN) •  Specificity = classifier’s ability to identify negative results = TN/(TN+FP)
  • 44. Classification •  Sub-category of Supervised Learning •  Classification is the process of taking some sort of input and assign a label to it. The predictions are discrete, categories, or “yes or no” nature. •  Examples: Logistic Regression, Random Forest •  What customers should a company target with its marketing campaigns? •  Is this Nigerian prince committing fraud? (Spam classification) •  Is this actually Barack Obama’s Facebook profile and review on Amazon? (Fraud detection) Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 45. Regression •  Sub-category of Supervised Learning •  Regression is a type of algorithm that predicts a continuous values. •  How much would a user spend on a mobile game like CandyCrush? •  How much would someone spend on healthcare out of pocket? •  How many attendees will come to this event based on past registration? Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 46. Decision Trees •  Using a tree-like graph or model of decisions and their possible consequence. •  Medical Testing (e.g. health incidences, etc.) •  Genealogy breakdowns (e.g. eye color, blood type, etc.) Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 47. Deep Learning •  A category of machine learning algorithms that often use Artificial Neural Networks to generate model. •  Image classification •  Language processing •  Audio processing •  Outlier and fraud detection Data Science Definition: Business Application: Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple