SlideShare a Scribd company logo
Machine Learning
what, why and how
me ...
Harshad
➔ Senior Data Scientist @ Sokrati
➔ Spent last 4 years trying to understand and
apply machine learning
Sokrati is a digital advertising startup based out of Pune
what are we going to do ?
get a 10000 feet view ...
then go to specifics ...
10000 feet view
what is ML ?
➔ Too many definitions!
➔ Too much debate
➔ Analytics vs ML vs data mining vs Big Data vs
Statistics vs next buzzword in the market
what is ML ?
Teaching machines to take decisions
with the help of data
practical man’s definition of machine learning!
bit of history ...
That’s too ancient!
That’s not
ML...
bit of (relevant) history ...
➔ Insurance, Banking industry
◆ Credit scoring
◆ mathematical models in finance
➔ Artificial Intelligence and other fancy ideas
◆ deep blue, Samuel’s checkers machine
◆ IBM watson computer
Find ‘f’ => endeavour to understand the world!
g( y ) = f( X )
the two cultures in ML
Stats Culture
Vs
AI / ML Culture
the two cultures in ML
Stats Culture
➔ Focus on ‘why this model’
➔ Goodness of fit, hypothesis
tests, residuals
➔ or MCMC methods, bayesian
modeling
➔ regression models, survival
analysis
AI / ML Culture
➔ Focus on ‘good predictions’
➔ Cross validations, ensemble of
models
➔ Focus on underfit vs overfit
analysis
➔ neural nets, tree based models
(random forests et. al.)
but let’s build bridges
Stats
➔ Focus on basics, sound theory
➔ Exploration, summaries
➔ Models
ML / AI
➔ Focus on predictions
➔ Model evaluation
➔ Feature selection
Computations
➔ Focus on application
➔ Achieving scale and usability
➔ Hadoop , Storm and friends..
Business Knowledge
➔ Focus on interpretation
➔ Visualizations
➔ Creating stories out of data
typical ML process
➔ Objective
➔ Source data
➔ Explore
➔ Model
➔ Evaluate
➔ Apply
➔ Validate
objective
in brief!
bottom line : not in terms of algorithm but outcome
probability of customer churn
group set of emails by topic
predict rainfall
recommend item to a consumer to
increase likelihood of click
sourcing data
to be covered at end!
explore
R you ready ?
introduction to R
➔ Starting R
➔ Data Structures
◆ Atomic Vectors, matrix
◆ Lists
◆ Data frames
➔ Data types
◆ usual suspects in numerics (int, double, character)
◆ Factors
◆ logical
data frame , the workhorse
➔ Load sample data frame
➔ Explore data frame (head, tail)
➔ Access elements by index
◆ access rows
◆ access columns
● single
● multiple (by name, by index, by -ve index)
➔ Find metadata
◆ names
◆ dimensions
➔ Explore using plot (pairs)
broadcasting / vectorization
➔ Very important concept
➔ Subsetting
◆ vectors
◆ data frames
➔ Applying operations
◆ operations on entire column
data exploration
➔ summarizing using mean
➔ quantiles, when mean is not enough
◆ outlier detection
➔ functional roots of R : sapply summary
➔ summary function
◆ on numerical
◆ on factors
➔ plots (basic)
➔ histograms
➔ correlations
models
simple & real world
basic types of models
➔ Supervised learning
➔ Unsupervised learning
➔ Semi-supervised learning
➔ Reinforcement learning
linear regression
➔ load sample dataset (cars)
➔ build linear regression model
➔ understand the output
◆ summary
◆ plots
➔ understanding train vs predict cycle
◆ most important idea!
demo on real world
dataset
data exploration, classification
hierarchical clustering
➔ basic idea of clustering
◆ distance as a proxy for similarity, group by distance
◆ group anything as long as distance can be
calculated
➔ load and explore eurodist data
➔ fit hierarchical cluster
➔ plot dendrogram
demo on real world data
and the most important idea!
concept of vector space model
➔ Words as axis
➔ Bag of words defines
vector space
➔ Document as a point in
space
➔ We can
◆ define distance
◆ measure similarity
(cosine similarity)
◆ group documents
➔ what can be a document ?
evaluation
evaluation metrics
➔ depends on type of model
◆ regression : MAPE, MSSE
◆ classification : accuracy, precision, recall, F score
◆ clustering : within vs between variance
➔ ML world (ref : two cultures) has much
better story
➔ Not enough to perform well on training set
brief intro regularization
➔ Bias vs Variance problem
➔ We want to be ‘just right’
➔ Concept of regularization
➔ Intro Cross validation
demo of evaluation
and fantastic Scikits API
sourcing and applying
and the great ML divide
the great ML divide
Lab Culture
➔ Theory
➔ Small Datasets
➔ In memory
➔ Not live
➔ R, Octave,
Python..
Source and Apply
➔ The practice
➔ Huge datasets
➔ Live in production
➔ Hadoop and
friends, Python ? ,
R ?
processing data at scale
➔ Data is not available in final form
➔ Non standard data
◆ click streams
◆ event logs
◆ free form text
➔ Process at scale
➔ Transform , clean, group in final form
5 min intro to Hadoop Ecosystem
➔ Write in assembly
◆ java
➔ DSLs
◆ Pig
◆ hive
◆ impala
➔ Functional Languages to rescue!
Introduction to Cascalog
processing data at scale
Recap
➔ Pragmatic ML
➔ Key phases
➔ Supervised learning
➔ Unsupervised learning
➔ Evaluation
➔ Application issues
➔ Processing data @ scale
Let’s discuss...

More Related Content

Viewers also liked

Energy harvesting at morgan
Energy harvesting at morganEnergy harvesting at morgan
Energy harvesting at morgan
Frédéric Pimparel CEng
 
The Seven Deadly Sins by Employees
The Seven Deadly Sins by EmployeesThe Seven Deadly Sins by Employees
The Seven Deadly Sins by Employees
Ankur Tandon
 
Google Analytics aplicado al Email Marketing
Google Analytics aplicado al Email MarketingGoogle Analytics aplicado al Email Marketing
Google Analytics aplicado al Email Marketing
FromDoppler
 
Social Media - Strategy Quickstart for CPAs
Social Media - Strategy Quickstart for CPAsSocial Media - Strategy Quickstart for CPAs
Social Media - Strategy Quickstart for CPAs
Tom Hood, CPA,CITP,CGMA
 
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיון
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיוןללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיון
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיוןPeterShtrom
 
Faire d’internet un véritable outil commercial dans le tourisme
Faire d’internet un véritable outil commercial dans le tourismeFaire d’internet un véritable outil commercial dans le tourisme
Faire d’internet un véritable outil commercial dans le tourisme
Philippe Fabry
 
The 7 Essential Secrets of the Tech Job Search
The 7 Essential Secrets of the Tech Job SearchThe 7 Essential Secrets of the Tech Job Search
The 7 Essential Secrets of the Tech Job Search
Jeremy Schifeling
 
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
Jyrki Kasvi
 
Actividades de reflexion incial la ofimatica
Actividades de reflexion incial la ofimaticaActividades de reflexion incial la ofimatica
Actividades de reflexion incial la ofimatica
Neiver Ramirez Perez
 
E book - Hiring tool kit for Smart Recruiters
E book - Hiring tool kit for Smart RecruitersE book - Hiring tool kit for Smart Recruiters
E book - Hiring tool kit for Smart Recruiters
Talview
 
The Numbers Magic (Amsterdam Node Meetup Presentation)
The Numbers Magic (Amsterdam Node Meetup Presentation)The Numbers Magic (Amsterdam Node Meetup Presentation)
The Numbers Magic (Amsterdam Node Meetup Presentation)
icemobile
 
Презентація УБА 2012
Презентація УБА 2012Презентація УБА 2012
Презентація УБА 2012
uba2010
 
Social CMI: Social data in context
Social CMI: Social data in contextSocial CMI: Social data in context
Social CMI: Social data in context
Brandwatch
 
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
Doyle Buehler
 
Leadership howard county 03 28 12 jenn lim delivering happiness
Leadership howard county 03 28 12 jenn lim delivering happinessLeadership howard county 03 28 12 jenn lim delivering happiness
Leadership howard county 03 28 12 jenn lim delivering happiness
Delivering Happiness
 
Guía de aplicación sistema nervioso
Guía  de  aplicación   sistema  nerviosoGuía  de  aplicación   sistema  nervioso
Guía de aplicación sistema nervioso
Giuliana Tinoco
 

Viewers also liked (16)

Energy harvesting at morgan
Energy harvesting at morganEnergy harvesting at morgan
Energy harvesting at morgan
 
The Seven Deadly Sins by Employees
The Seven Deadly Sins by EmployeesThe Seven Deadly Sins by Employees
The Seven Deadly Sins by Employees
 
Google Analytics aplicado al Email Marketing
Google Analytics aplicado al Email MarketingGoogle Analytics aplicado al Email Marketing
Google Analytics aplicado al Email Marketing
 
Social Media - Strategy Quickstart for CPAs
Social Media - Strategy Quickstart for CPAsSocial Media - Strategy Quickstart for CPAs
Social Media - Strategy Quickstart for CPAs
 
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיון
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיוןללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיון
ללמוד כיצד לרתום את הכח של מתודולוגיית אדיג\'ס- הזמנה ליום העיון
 
Faire d’internet un véritable outil commercial dans le tourisme
Faire d’internet un véritable outil commercial dans le tourismeFaire d’internet un véritable outil commercial dans le tourisme
Faire d’internet un véritable outil commercial dans le tourisme
 
The 7 Essential Secrets of the Tech Job Search
The 7 Essential Secrets of the Tech Job SearchThe 7 Essential Secrets of the Tech Job Search
The 7 Essential Secrets of the Tech Job Search
 
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
Aloittelevan kunnallispoliitikon tunnustukset vuodelta 1997
 
Actividades de reflexion incial la ofimatica
Actividades de reflexion incial la ofimaticaActividades de reflexion incial la ofimatica
Actividades de reflexion incial la ofimatica
 
E book - Hiring tool kit for Smart Recruiters
E book - Hiring tool kit for Smart RecruitersE book - Hiring tool kit for Smart Recruiters
E book - Hiring tool kit for Smart Recruiters
 
The Numbers Magic (Amsterdam Node Meetup Presentation)
The Numbers Magic (Amsterdam Node Meetup Presentation)The Numbers Magic (Amsterdam Node Meetup Presentation)
The Numbers Magic (Amsterdam Node Meetup Presentation)
 
Презентація УБА 2012
Презентація УБА 2012Презентація УБА 2012
Презентація УБА 2012
 
Social CMI: Social data in context
Social CMI: Social data in contextSocial CMI: Social data in context
Social CMI: Social data in context
 
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
Social Selling Social Media Conversions Webinar With Milk it Academy by Doyle...
 
Leadership howard county 03 28 12 jenn lim delivering happiness
Leadership howard county 03 28 12 jenn lim delivering happinessLeadership howard county 03 28 12 jenn lim delivering happiness
Leadership howard county 03 28 12 jenn lim delivering happiness
 
Guía de aplicación sistema nervioso
Guía  de  aplicación   sistema  nerviosoGuía  de  aplicación   sistema  nervioso
Guía de aplicación sistema nervioso
 

Similar to Workshop on Machine Learning

Tf gsds
Tf gsdsTf gsds
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
Mateusz Dymczyk
 
A General Overview of Machine Learning
A General Overview of Machine LearningA General Overview of Machine Learning
A General Overview of Machine Learning
Ashish Sharma
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
Paco Nathan
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
Hugo Bowne-Anderson
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
Paco Nathan
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Rahul Jain
 
DMDW Unit 1.pdf
DMDW Unit 1.pdfDMDW Unit 1.pdf
DMDW Unit 1.pdf
ASISHRANJANSAMAL1
 
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
UKSG: connecting the knowledge community
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
Krishna Sankar
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
Haptik
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentation
renjan131
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
Paco Nathan
 
Tf wiads
Tf wiadsTf wiads
Tf wdvds
Tf wdvdsTf wdvds
Demystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine LearningDemystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine Learning
Julian Bright
 
Decoding Data Science
Decoding Data ScienceDecoding Data Science
Decoding Data Science
Matt Fornito
 

Similar to Workshop on Machine Learning (20)

Tf gsds
Tf gsdsTf gsds
Tf gsds
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
A General Overview of Machine Learning
A General Overview of Machine LearningA General Overview of Machine Learning
A General Overview of Machine Learning
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
DMDW Unit 1.pdf
DMDW Unit 1.pdfDMDW Unit 1.pdf
DMDW Unit 1.pdf
 
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library co...
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentation
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Tf wiads
Tf wiadsTf wiads
Tf wiads
 
Tf wdvds
Tf wdvdsTf wdvds
Tf wdvds
 
Demystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine LearningDemystifying Data Science with an introduction to Machine Learning
Demystifying Data Science with an introduction to Machine Learning
 
Decoding Data Science
Decoding Data ScienceDecoding Data Science
Decoding Data Science
 

Recently uploaded

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 

Recently uploaded (20)

Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 

Workshop on Machine Learning

  • 2. me ... Harshad ➔ Senior Data Scientist @ Sokrati ➔ Spent last 4 years trying to understand and apply machine learning Sokrati is a digital advertising startup based out of Pune
  • 3. what are we going to do ?
  • 4. get a 10000 feet view ...
  • 5. then go to specifics ...
  • 7. what is ML ? ➔ Too many definitions! ➔ Too much debate ➔ Analytics vs ML vs data mining vs Big Data vs Statistics vs next buzzword in the market
  • 8. what is ML ? Teaching machines to take decisions with the help of data practical man’s definition of machine learning!
  • 11. bit of (relevant) history ... ➔ Insurance, Banking industry ◆ Credit scoring ◆ mathematical models in finance ➔ Artificial Intelligence and other fancy ideas ◆ deep blue, Samuel’s checkers machine ◆ IBM watson computer
  • 12. Find ‘f’ => endeavour to understand the world! g( y ) = f( X )
  • 13. the two cultures in ML Stats Culture Vs AI / ML Culture
  • 14. the two cultures in ML Stats Culture ➔ Focus on ‘why this model’ ➔ Goodness of fit, hypothesis tests, residuals ➔ or MCMC methods, bayesian modeling ➔ regression models, survival analysis AI / ML Culture ➔ Focus on ‘good predictions’ ➔ Cross validations, ensemble of models ➔ Focus on underfit vs overfit analysis ➔ neural nets, tree based models (random forests et. al.)
  • 15. but let’s build bridges Stats ➔ Focus on basics, sound theory ➔ Exploration, summaries ➔ Models ML / AI ➔ Focus on predictions ➔ Model evaluation ➔ Feature selection Computations ➔ Focus on application ➔ Achieving scale and usability ➔ Hadoop , Storm and friends.. Business Knowledge ➔ Focus on interpretation ➔ Visualizations ➔ Creating stories out of data
  • 16. typical ML process ➔ Objective ➔ Source data ➔ Explore ➔ Model ➔ Evaluate ➔ Apply ➔ Validate
  • 18. bottom line : not in terms of algorithm but outcome probability of customer churn group set of emails by topic predict rainfall recommend item to a consumer to increase likelihood of click
  • 19. sourcing data to be covered at end!
  • 21. introduction to R ➔ Starting R ➔ Data Structures ◆ Atomic Vectors, matrix ◆ Lists ◆ Data frames ➔ Data types ◆ usual suspects in numerics (int, double, character) ◆ Factors ◆ logical
  • 22. data frame , the workhorse ➔ Load sample data frame ➔ Explore data frame (head, tail) ➔ Access elements by index ◆ access rows ◆ access columns ● single ● multiple (by name, by index, by -ve index) ➔ Find metadata ◆ names ◆ dimensions ➔ Explore using plot (pairs)
  • 23. broadcasting / vectorization ➔ Very important concept ➔ Subsetting ◆ vectors ◆ data frames ➔ Applying operations ◆ operations on entire column
  • 24. data exploration ➔ summarizing using mean ➔ quantiles, when mean is not enough ◆ outlier detection ➔ functional roots of R : sapply summary ➔ summary function ◆ on numerical ◆ on factors ➔ plots (basic) ➔ histograms ➔ correlations
  • 26. basic types of models ➔ Supervised learning ➔ Unsupervised learning ➔ Semi-supervised learning ➔ Reinforcement learning
  • 27. linear regression ➔ load sample dataset (cars) ➔ build linear regression model ➔ understand the output ◆ summary ◆ plots ➔ understanding train vs predict cycle ◆ most important idea!
  • 28. demo on real world dataset data exploration, classification
  • 29. hierarchical clustering ➔ basic idea of clustering ◆ distance as a proxy for similarity, group by distance ◆ group anything as long as distance can be calculated ➔ load and explore eurodist data ➔ fit hierarchical cluster ➔ plot dendrogram
  • 30. demo on real world data and the most important idea!
  • 31. concept of vector space model ➔ Words as axis ➔ Bag of words defines vector space ➔ Document as a point in space ➔ We can ◆ define distance ◆ measure similarity (cosine similarity) ◆ group documents ➔ what can be a document ?
  • 33. evaluation metrics ➔ depends on type of model ◆ regression : MAPE, MSSE ◆ classification : accuracy, precision, recall, F score ◆ clustering : within vs between variance ➔ ML world (ref : two cultures) has much better story ➔ Not enough to perform well on training set
  • 34. brief intro regularization ➔ Bias vs Variance problem ➔ We want to be ‘just right’ ➔ Concept of regularization ➔ Intro Cross validation
  • 35. demo of evaluation and fantastic Scikits API
  • 36. sourcing and applying and the great ML divide
  • 37. the great ML divide Lab Culture ➔ Theory ➔ Small Datasets ➔ In memory ➔ Not live ➔ R, Octave, Python.. Source and Apply ➔ The practice ➔ Huge datasets ➔ Live in production ➔ Hadoop and friends, Python ? , R ?
  • 38. processing data at scale ➔ Data is not available in final form ➔ Non standard data ◆ click streams ◆ event logs ◆ free form text ➔ Process at scale ➔ Transform , clean, group in final form
  • 39. 5 min intro to Hadoop Ecosystem ➔ Write in assembly ◆ java ➔ DSLs ◆ Pig ◆ hive ◆ impala ➔ Functional Languages to rescue!
  • 41. Recap ➔ Pragmatic ML ➔ Key phases ➔ Supervised learning ➔ Unsupervised learning ➔ Evaluation ➔ Application issues ➔ Processing data @ scale