Unraveled
Machine Learning
ravel [rav-uh l]
verb, raveled, raveling.
1. to disentangle or unravel the threads or fibers
of (a woven or knitted fabric, rope, etc.).
2. to tangle or entangle.
Field of study that gives computers the ability to learn without being explicitly programmed.
— Arthur Samuel (1959)
A computer program is said to learn from experience E with respect to some task T and some performance
measure P, if its performance on T, as measured by P, improves with experience E.
— Tom Mitchell (1998)
ML Definitions
39 Years
A solved game is a game whose
outcome (win, lose, or draw) can
be correctly predicted from any
position, given that both players
play perfectly.
Heuristics
experience-based techniques for problem solving
rule of thumb
educated guess
intuitive judgment
stereotyping
common sense
Cat?
Dog!
[Diagram: Mathematics, Computer Science, Statistics, Machine Learning]
[...]al community has been committed to the almost exclusive use of [...]
— Leo Breiman, 2001
Kuhn, Max; Johnson, Kjell (2013-05-17). Applied Predictive Modeling
(Page 20). Springer. Kindle Edition.
Leo Breiman, "Statistical Modeling: The Two Cultures," Statistical Science, 2001, Vol. 16, No. 3, 199–231
This enterprise has at its heart the belief that a
statistician, by imagination and by looking at
the data, can invent a reasonably good
parametric class of models for a complex
mechanism devised by nature.
Leo Breiman, "Statistical Modeling: The Two Cultures," Statistical Science, 2001, Vol. 16, No. 3, 199–231
[Diagram, built up over several slides: Mathematics, Computer Science, Statistics, Machine Learning, Data Science]
[Diagram: Mathematics, Computer Science, Hacking, Data Science]
Big Data
Volume
Velocity
Variety
P(V__) = .649% = .00649
.00649 ^ 3 = 0.000000273359449
= 2.7 * 10^-7
Statistics
Vastly Simplified
Too much stuff to measure everything.
What to do?
Population = 1024
Confidence = 99%
N = 256
Red = 42%
CI = ±6.9%
35.1% - 48.9%
Population = 16384
Confidence = 99%
CI = 1%
N = 8256
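The numbers on these two slides are consistent with the usual margin-of-error and sample-size formulas for a proportion with a finite population correction. That is an inference from the figures, not something the slides state. A minimal base R sketch, assuming z = 2.58 for 99% confidence and a worst-case proportion of 0.5 for the sample-size case; the function names are hypothetical:

```r
# Assumption: z = 2.58 approximates the 99% two-sided normal quantile.
# The slides do not say which value was used.
z99 <- 2.58

# Margin of error for a sample proportion p from n draws out of a
# finite population of size N (with finite population correction).
margin_of_error <- function(p, n, N, z = z99) {
  se  <- sqrt(p * (1 - p) / n)       # standard error of the proportion
  fpc <- sqrt((N - n) / (N - 1))     # finite population correction
  z * se * fpc
}

# Sample size needed for a target margin e, using the worst case p = 0.5.
sample_size <- function(e, N, z = z99, p = 0.5) {
  n0 <- z^2 * p * (1 - p) / e^2      # size for an infinite population
  n0 / (1 + (n0 - 1) / N)            # shrink for a finite population
}

# Population = 1024, N = 256, Red = 42%
moe <- margin_of_error(p = 0.42, n = 256, N = 1024)
print(round(100 * moe, 1))                      # margin in percentage points
print(round(100 * (0.42 + c(-1, 1) * moe), 1))  # lower and upper bound

# Population = 16384, CI = 1%
print(round(sample_size(e = 0.01, N = 16384)))
```

Under these assumptions the sketch gives a margin of 6.9 points (35.1 to 48.9) and a required sample of 8256, in line with the slides.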
N = ALL
[Diagram: a training dataset of labeled examples (Cat, Dog) goes through training to produce a model; in recall, the model is applied to a test dataset to produce predictions (Cat or Dog)]
[Diagram: the same flow with a cross-validation dataset alongside the training dataset]
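The training and recall flow in the diagram can be sketched in a few lines of base R. Everything here is an illustrative assumption rather than the presenter's data or method: the synthetic `animals` data frame, the single `weight` feature, the 70/30 split, and the midpoint-threshold "model" are all hypothetical.

```r
set.seed(1)

# Hypothetical labeled examples: one numeric feature, two classes.
animals <- data.frame(
  weight = c(rnorm(50, mean = 4, sd = 1), rnorm(50, mean = 20, sd = 5)),
  label  = factor(rep(c("Cat", "Dog"), each = 50))
)

# Split the labeled examples into a training set and a held-out test set.
train_idx <- sample(nrow(animals), size = 70)
train <- animals[train_idx, ]
test  <- animals[-train_idx, ]

# Training: learn a threshold halfway between the two class means.
centers   <- tapply(train$weight, train$label, mean)
threshold <- mean(centers)

# Recall: apply the learned threshold to examples the model has not seen.
pred <- factor(ifelse(test$weight > threshold, "Dog", "Cat"),
               levels = levels(animals$label))

# Compare predictions with the held-out labels.
accuracy <- mean(pred == test$label)
print(table(pred, actual = test$label))
print(accuracy)
```

The point of the split is that `accuracy` is measured only on rows the threshold was not fitted to.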
Prediction
[...]e would be to think backwards . . . and that’s just [...]
— Sheldon on The Big Bang Theory
Accuracy
Predictive modeling:
The process of developing a
mathematical tool or model that
generates an accurate prediction
— Kuhn, Max; Johnson, Kjell (2013-05-17).
Applied Predictive Modeling. Springer.
Predictions do not have to be
accurate to score big value.
— Siegel, Eric. Predictive Analytics:
The Power to Predict Who Will Click, Buy, Lie, or Die. Wiley.
more
Predict
Not Explain
Splork?
Splork?
//demonstrations.wolfram.com/KNearestNeighborKNNClassifier/
> k1<-knn3(splorkData[samp,-c(3,4)],splorkData$splork[samp], k=3)
> k1
3-nearest neighbor classification model
Call:
knn3.data.frame(x = splorkData[samp, -c(3, 4)], y = splorkData$splork[samp], k = 3)
Training set class distribution:
 no yes
386 114
> pred<-predict(k1,newdata=splorkData[-samp,-c(3,4)],type="class")
> str(pred)
 Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
> table(pred,splorkData$splork[-samp])
pred   no yes
  no  332   5
  yes   0 117
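To read the table above as numbers, one can compute accuracy, sensitivity, and specificity directly from the four counts. This is a small base R sketch; the counts are copied from the slide, and the variable names are hypothetical.

```r
# Counts copied from the kNN table: rows are predictions, columns are actual labels.
# matrix() fills column by column, so the first column is actual "no".
cm <- matrix(c(332, 0, 5, 117), nrow = 2,
             dimnames = list(pred = c("no", "yes"), actual = c("no", "yes")))

accuracy    <- sum(diag(cm)) / sum(cm)              # correct / all = 449 / 454
sensitivity <- cm["yes", "yes"] / sum(cm[, "yes"])  # actual "yes" found = 117 / 122
specificity <- cm["no", "no"] / sum(cm[, "no"])     # actual "no" found = 332 / 332

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 3))
```

The five misses are all actual "yes" cases predicted as "no", which is why sensitivity is lower than specificity here.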
> rf <- randomForest(splork ~ ., data=splorkData[1:500,])
Call:
randomForest(formula = splork ~ ., data = splorkData[1:500, ])
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 1
OOB estimate of error rate: 0%
Confusion matrix:
     no yes class.error
no  462   0           0
yes   0  38           0
>
> table(pred,splorkData$splor
pred   no yes
  no  418   0
  yes   0  36
> getTree(rf)
  left daughter right daughter split var split point status prediction
1             2              3         7         0.5      1          0
2             4              5         6         0.5      1          0
3             0              0         0         0.0     -1          2
4             6              7         3         1.0      1          0
5             0              0         0         0.0     -1          1
6             0              0         0         0.0     -1          1
7             0              0         0         0.0     -1          1
[Tree diagram of nodes 1-7, with labels w=0 and w=1]
Title: SECOM Data Set
Abstract: Data from a semi-conductor manufacturing process
-----------------------------------------------------
Data Set Characteristics: Multivariate
Number of Instances: 1567
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 591
Date Donated: 2008-11-19
Associated Tasks: Classification, Causal-Discovery
Missing Values? Yes
#load the packages the calls below come from
>library(caret)          #nearZeroVar, preProcess
>library(randomForest)   #na.roughfix
#make training and test subsets
>train<-secom[1:1000,]
>test<-secom[1001:1567,]
#get rid of near-zero-variance variables: pick them on train, drop the same columns from both
>nzv<-nearZeroVar(train)
>if(length(nzv)>0){train<-train[,-nzv];test<-test[,-nzv]}
#impute missing values
>train<-na.roughfix(train)
>test<-na.roughfix(test)
#scale and center with the training set's means and standard deviations only
>tr1<-preProcess(train, method = c("center", "scale"))
>traincs<-predict(tr1,train)
>testcs<-predict(tr1,test)
> fit <- glm(secResp ~ .,data=data.frame(sec[train,],secResp=
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
>
For the background to warning messages about ‘fitted probabilities numerically 0 or 1 occurred’ for
binomial GLMs, see Venables & Ripley (2002, pp. 197–8).
There is one fairly common circumstance in which both convergence problems and the Hauck-
Donner phenomenon can occur. This is when the fitted probabilities are extremely close to zero or
one. Consider a medical diagnosis problem with thousands of cases and around 50 binary
explanatory variables (which may arise from coding fewer categorical variables); one of these
indicators is rarely true but always indicates that the disease is present.
>Call:
randomForest(formula = secResp ~ ., data = data.frame(sec
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 21
OOB estimate of error rate: 7.6%
Confusion matrix:
    0 1 class.error
0 924 0           0
1  76 0           1
> ab
Call:
ada(sec[train, subset], y = secResp[train], test.x = sec[test,
], test.y = secResp[test], type = "gentle", iter = 100)
Loss: exponential Method: gentle Iteration: 100
Final Confusion Matrix for Data:
          Final Prediction
True value   0   1
         0 924   0
         1  50  26
Train Error: 0.05
Out-Of-Bag Error: 0.061 iteration= 100
Additional Estimates of number of iterations:
train.err1 train.kap1 test.err2 test.kap2
        95         95         2         1
> pred<-predict(ab,sec[test,])
> table(pred,secResp[test])
pred   0   1
   0 539  28
   1   0   0
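The test-set table above is worth checking against a trivial baseline, because the two classes are very unbalanced. The base R sketch below uses the counts from the slide and compares the model's accuracy with what "always predict the majority class" would score; the variable names are hypothetical.

```r
# Counts copied from the test-set table: rows are predictions, columns are actual labels.
cm <- matrix(c(539, 0, 28, 0), nrow = 2,
             dimnames = list(pred = c("0", "1"), actual = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)          # 539 / 567
baseline    <- max(colSums(cm)) / sum(cm)       # always predict the majority class
sensitivity <- cm["1", "1"] / sum(cm[, "1"])    # actual class 1 cases found: 0 / 28

print(round(c(accuracy = accuracy,
              baseline = baseline,
              sensitivity = sensitivity), 3))
```

Accuracy comes out at about 95%, but it equals the baseline exactly and sensitivity is zero: on the test set the model never predicts class 1.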
Overfitting
People . . . operate with beliefs and biases.
[...]xtent you can eliminate both and replace them w[...]
you gain a clear advantage.
— Michael Lewis, Moneyball:
The Art of Winning an Unfair Game
  • 2. Unraveled Machine Learning ravel [rav-uh l] verb, raveled, raveling. 1. to disentangle or unravel the threads or fibers of (a woven or knitted fabric, rope, etc.). 2. to tangle or entangle.
  • 3.
  • 4. verb, raveled, raveling. 1. to disentangle or unravel the threads or fibers of (a woven or knitted fabric, rope, etc.). 2. to tangle or entangle. ravel[rav-uh l]
  • 5. Field of study that gives computers the ability to learn without being explicitly programmed. — Arthur Samuel (1959) A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. — Tom Mitchell (1998) ML Definitions 39 Years
  • 6. Field of study that gives computers the ability to learn without being explicitly programmed. — Arthur Samuel (1959)
  • 7.
  • 8.
  • 9. A solved game is a game whose outcome (win, lose, or draw) can be correctly predicted from any position, given that both players play perfectly.
  • 10. Heuristics experience-based techniques for problem solving rule of thumb educated guess intuitive judgment stereotyping common sense
  • 11. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. — Tom Mitchell (1998)
  • 14. al community has been committed to the almost exclusive use of — Leo Breiman, 2001
  • 15. Kuhn, Max; Johnson, Kjell (2013-05-17). Applied Predictive Modeling (Page 20). Springer. Kindle Edition.
  • 16. Statistical Science , 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures, Leo Breiman
  • 17. Statistical Science , 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures, Leo Breiman
  • 18. This enterprise has at its heart the belief that a statistician, by imagination and by looking at the data, can invent a reasonably good parametric class of models for a complex mechanism devised by nature. Statistical Science , 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures, Leo Breiman
  • 19.
  • 25.
  • 29. P(V__) = .649% = .00649 .00649 ^ 3 = 0.000000273359449 = 2.7 * 10^-7
  • 30.
  • 31.
  • 32.
  • 33.
  • 34. Statistics Vastly Simplified Too much stuff to measure everything. What to do?
  • 35. Population = 1024 Confidence = 99% N = 256 Red = 42% CI = 6.9 35.1 - 48.9
  • 36. Population = 16384 Confidence = 99% CI = 1% N = 8256
  • 37.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46. Cat Cat Cat Cat Cat Cat Cat Cat Dog Dog Dog Dog Dog Dog Dog Dog
  • 47. Cat Cat Cat Cat Cat Cat Cat Cat Dog Dog Dog Dog Dog Dog Dog Dog Training Model Recall Cat Cat Cat CatCat Cat Dog Dog Dog DogDog Cat Predictions Training Dataset (Labeled Examples) Test Dataset
  • 48. Training Model Recall Predictions Cat Cat Cat Dog Cat Cat Cat Cat Cat Dog Dog Dog Dog Dog Dog Dog Cross- Validation Dataset Cat Cat Cat Cat Cat Cat Cat Cat Dog Dog Dog Dog Dog Dog Dog Dog Training Dataset (Labeled Examples)
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. Prediction e would be to think backwards . . . and that’s just — Sheldon on The Big Bang Theory
  • 56.
  • 58. Predictive modeling: The process of developing a mathematical tool or model that generates an accurate prediction — Kuhn, Max; Johnson, Kjell (2013-05-17). Applied Predictive Modeling. Springer. Predictions do not have to be accurate to score big value. — Siegel, Eric. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Wiley. more
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 68.
  • 69. > k1<-knn3(splorkData[samp,-c(3,4)],splorkData$splork[samp], k=3)
        > k1
        3-nearest neighbor classification model
        Call: knn3.data.frame(x = splorkData[samp, -c(3, 4)], y = splorkData$splork[samp], k = 3)
        Training set class distribution:
         no yes
        386 114
        > pred<-predict(k1,newdata=splorkData[-samp,-c(3,4)],type="class")
        > str(pred)
        Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
        > table(pred,splorkData$splork[-samp])
        pred   no yes
          no  332   5
          yes   0 117
  • 70. > rf <- randomForest(splork ~ ., data=splorkData[1:500,])
        Call: randomForest(formula = splork ~ ., data = splorkData[1:500, ]
        Type of random forest: classification
        Number of trees: 500
        No. of variables tried at each split: 1
        OOB estimate of error rate: 0%
        Confusion matrix:
             no yes class.error
        no  462   0           0
        yes   0  38           0
        >
        > table(pred,splorkData$splor
        pred   no yes
          no  418   0
          yes   0  36
  • 71. 1 32 4 > getTree(rf) left daughter right daughter split var split point status prediction 0: 1 2 3 7 0.5 1 0 1 :2 4 5 6 0.5 1 0 2 :3 0 0 0 0.0 -1 2 3 :4 6 7 3 1.0 1 0 4 :5 0 0 0 0.0 -1 1 5 :6 0 0 0 0.0 -1 1 6 :7 0 0 0 0.0 -1 1 6 7 5 w=0 w=1
  • 72. Title: SECOM Data Set
        Abstract: Data from a semi-conductor manufacturing process
        -----------------------------------------------------
        Data Set Characteristics: Multivariate
        Number of Instances: 1567
        Area: Computer
        Attribute Characteristics: Real
        Number of Attributes: 591
        Date Donated: 2008-11-19
        Associated Tasks: Classification, Causal-Discovery
        Missing Values? Yes
  • 73.
  • 74. #make training and test subsets
        > train<-secom[1:1000,]
        > test<-secom[1001:1567,]
        #get rid of near-zero-variance variables
        > train<-train[,-nearZeroVar(train)]
        > test<-test[,-nearZeroVar(test)]
        #impute missing values
        > train<-na.roughfix(train)
        > test<-na.roughfix(test)
        #scale and center
        > tr1<-preProcess(train, method = c("center", "scale"))
        > tr2<-preProcess(test, method = c("center", "scale"))
        > traincs<-predict(tr1,train)
        > testcs<-predict(tr2,test)
  • 75.
  • 76. > fit <- glm(secResp ~ .,data=data.frame(sec[train,],secResp= Warning messages: 1: glm.fit: algorithm did not converge 2: glm.fit: fitted probabilities numerically 0 or 1 occurred > For the background to warning messages about ‘fitted probabilities numerically 0 or 1 occurred’ for binomial GLMs, see Venables & Ripley (2002, pp. 197–8). There is one fairly common circumstance in which both convergence problems and the Hauck- Donner phenomenon can occur. This is when the fitted probabilities are extremely close to zero or one. Consider a medical diagnosis problem with thousands of cases and around 50 binary explanatory variable (which may arise from coding fewer categorical variables); one of these indicators is rarely true but always indicates that the disease is present.
  • 77. >Call: randomForest(formula = secResp ~ ., data = data.frame(sec
        Type of random forest: classification
        Number of trees: 500
        No. of variables tried at each split: 21
        OOB estimate of error rate: 7.6%
        Confusion matrix:
            0 1 class.error
        0 924 0           0
        1  76 0           1
  • 78.
  • 79. > ab
        Call: ada(sec[train, subset], y = secResp[train], test.x = sec[test, ], test.y = secResp[test], type = "gentle", iter = 100)
        Loss: exponential Method: gentle Iteration: 100
        Final Confusion Matrix for Data:
                  Final Prediction
        True value   0  1
                 0 924  0
                 1  50 26
        Train Error: 0.05
        Out-Of-Bag Error: 0.061 iteration= 100
        Additional Estimates of number of iterations:
        train.err1 train.kap1 test.err2 test.kap2
                95         95         2         1
        > pred<-predict(ab,sec[test,])
        > table(pred,secResp[test])
        pred   0  1
           0 539 28
           1   0  0
  • 80.
  • 82. People . . . operate with beliefs and biases. To the extent you can eliminate both and replace them with data, you gain a clear advantage. — Michael Lewis, Moneyball: The Art of Winning an Unfair Game
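
The sampling figures on slides 35 and 36 can be reproduced with the usual margin-of-error formula for a proportion plus a finite population correction. The sketch below is only an illustration: the function names are hypothetical, and the critical value z = 2.58 is an assumption (a rounded 99% value) chosen because it matches the numbers on the slides; the slides do not say how they were computed.

```r
# Margin of error for a sample proportion p from n draws out of a population of N.
# z = 2.58 is an assumed, rounded 99% critical value (not stated on the slides).
margin_of_error <- function(p, n, N, z = 2.58) {
  se  <- sqrt(p * (1 - p) / n)       # standard error, infinite population
  fpc <- sqrt((N - n) / (N - 1))     # finite population correction
  z * se * fpc
}

# Sample size needed for a target margin e, assuming the worst case p = 0.5.
sample_size <- function(e, N, z = 2.58, p = 0.5) {
  n0 <- z^2 * p * (1 - p) / e^2      # size for an infinite population
  n0 / (1 + (n0 - 1) / N)            # shrink for a finite population
}

moe <- margin_of_error(p = 0.42, n = 256, N = 1024)
round(100 * moe, 1)                  # slide 35 quotes CI = 6.9
round(100 * (0.42 + c(-1, 1) * moe), 1)   # slide 35 quotes 35.1 - 48.9

n_needed <- sample_size(e = 0.01, N = 16384)
round(n_needed)                      # slide 36 quotes N = 8256
```

Under these assumptions, sampling a quarter of the 1024 items gives roughly plus or minus 7 points, while pinning the estimate to within 1 point takes about half of a population of 16384.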

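The confusion matrices on slides 69 and 79 can be compared against a majority-class baseline, which is the yardstick the speaker notes on accuracy refer to. This is a minimal sketch in base R: the helper `summarise_confusion` is a hypothetical name, the counts are copied from the slides, and the layout (rows are predictions, columns are true labels) is assumed from the `table(pred, actual)` calls shown there.

```r
# Summarise a 2x2 confusion matrix laid out like table(pred, actual):
# rows are predictions, columns are true labels.
summarise_confusion <- function(cm, positive) {
  total    <- sum(cm)
  accuracy <- sum(diag(cm)) / total
  baseline <- max(colSums(cm)) / total   # always guess the majority class
  recall   <- cm[positive, positive] / sum(cm[, positive])
  c(accuracy = accuracy, baseline = baseline, recall = recall)
}

# Counts copied from slide 69 (kNN on the splork data, held-out rows).
knn_cm <- matrix(c(332, 0, 5, 117), nrow = 2,
                 dimnames = list(pred = c("no", "yes"), actual = c("no", "yes")))

# Counts copied from slide 79 (boosting on the SECOM data, test rows).
ada_cm <- matrix(c(539, 0, 28, 0), nrow = 2,
                 dimnames = list(pred = c("0", "1"), actual = c("0", "1")))

knn_summary <- summarise_confusion(knn_cm, positive = "yes")
ada_summary <- summarise_confusion(ada_cm, positive = "1")
round(knn_summary, 3)
round(ada_summary, 3)
```

On these counts the kNN model scores about 0.989 accuracy against a baseline of about 0.731, so it has learned something. The boosted SECOM model scores about 0.951, exactly the same as always predicting class 0, and its recall on class 1 is 0.
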
Editor's Notes

  1. Hi, I’m Mark Fetherolf. I am a data scientist and president of Numinary Data Science. My goal today is to *unravel* machine learning
  2. I chose unraveled over explained, expounded, revealed, uncovered, elucidated, and of course *for dummies* I chose unraveled because
  3. I really like the word *ravel* It has …
  4. two definitions that are exact opposites, so no matter whether I enlighten or confuse you, I will still achieve the goal of unraveling. Machine Learning also has several definitions …
  5. They are not opposites but are separated by 39 years. The first is from Arthur Samuel, arguably the father of Machine Learning
  6. In 1959 checkers on IBM’s first, the IBM 701. Sensational; IBM's stock +15 overnight. World's Fair NY 1965; age 11; Selectric, Touchtone, tic/tac/toe; played 100 times; could get to a tie every time; Samuel used “rote learning”; scored a polynomial function that rated each position to score moves later (temporal scoring, alpha / beta pruning). Problem with getting “stuck” .. / nudge … 8 years later …
  7. in 1967, he wrote that getting the program to generate its own parameters without nudging seemed as far in the future as it had in 1959; a mere 40 years later …
  8. A mere 40 years later
  9. In the absence of a solution, in other words in the absence of exact rules, —> we rely on heuristics
  10. Games that aren’t solved can still be played, and they can be played according to very specific rules or algorithms. 1959: Arthur Samuel wrote computer programs that applied heuristics to checkers; and they were a very special kind of heuristics —> … ones that get better with experience!
  11. 39 years forward …. and that rhymes with T and that stands for _______________Tom Mitchell -> Professor at CMU -> more formal definition … 39 years after Arthur Samuel; T - Task; E - Experience; P - Performance; Sounds complicated, but really, is it …
  12. Little Amanda - unlike Amy who got the wrong on purpose; funny; siri - haircut. Task: Classify cats and dogs. Experience: 1) Guess; 2) Look at mom/dad for feedback. Performance: right / wrong. Machine learning has a lot in common with human learning and quite a lot NOT in common … SIMILARLY … As a species I have the feeling that we are unique in being defensive about our intellect. Comcast: ask people, of the TV shows you haven’t yet seen, which ones are you likely to watch next year
  13. Machine learning has a lot in common with Statistics and a lot that is not; How many of you are statisticians? Know a statistician? Know more than 1 statistician? Some statisticians are exceptionally methodical, perhaps to the point of being fussy about methodology and its proper application … we have these in the computer programming business too … difference between … ML people on the other hand are sometimes a bit less *regulated*
  14. Prominent ML pioneer Leo Breiman stirred things up a bunch in 2001 when he wrote that statistical methods led to irrelevant theory and questionable conclusions
  15. Linear regression IS a form of ML, but Breiman points out that every ML book and course starts out with linear regression, which we all learned in statistics, so what’s the issue …? Well, Breiman says ..
  16. input and an output connected by nature; x is rainfall, y is plant growth; x is force, y is acceleration; x is how much time college students spend playing minecraft and y is their GPAs
  17. Breiman says that statisticians assume that there must be a parametric model that describes the relationship
  18. algorithmic modelers on the other hand treat nature as entirely unknown and just go for results
  19. Of course some of this difference in perspective can be traced to the parent disciplines
  20. disciplines that have coexisted peacefully but warily; I remember as a maths student marveling and waxing poetic about pi, how such a simple natural thing could be so complex and subtle; my friend Ed, a comp sci major, said, “what’s the big deal, you’ve got a circle. Measure it.” In one of my three favorite public relations campaigns …
  21. In one of my three favorite public relations campaigns, we aimed to resolve the conflict by renaming the whole thing data science which encompasses —> (would you like to hear the other two)
  22. mathematics, statistics, computer science, machine learning; hacking is included to highlight practicality; data science is a practice not a purely academic discipline; Data scientists are described as people who are better at programming than most mathematicians and better at maths than most programmers; when I heard this I was, of course, thrilled .. THAT’s me
  23. Davenport and Patil’s HBR article was the icing on the cake for me: I’ve had lots of jobs; several in the 21st century; none until now sexy
  24. but of course we can’t talk about data science without mentioning the elephant in the room (who knows his name?) Hadoop!
  25. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
  26. I don’t know about you, but I am deeply suspicious of such mnemonics. V is the first letter of .649% of English words. Which means that the chance these are the best is 0.000000273359449 (2.7 * 10^-7)
  27. I don’t know about you, but I am deeply suspicious of such mnemonics. V is the first letter of .649% of English words. Which means that the chance these are the best is 0.000000273359449 (2.7 * 10^-7)
  28. Here is, I think, a clearer illustration of big data. Long ago, in the real world, one of my jobs, as a child, was to make a database to collect data from this incredibly clean oil refinery,
  29. actually looked more like this; with thousands of pressure, temperature and flow-sensing devices feeding data into a place like this …
  30. feeds from temperature, pressure, flow rates etc. all fed into the control room where
  31. Harold, the diligent young engineer, … unrealistic (lacks pocket protector) collects data; in 30 minutes he could record 800-100 measurements; so he could gather one sample for each data point every week or so; in reality he focuses his sampling on the project he’s working on that day; optimizing his thin-film reactor perhaps
  32. 1937 Census used statistical sampling to measure the extent of unemployment.
  33. Statistics are great when you are overwhelmed with data. Take a random sample use statistics to predict population parameters within known confidence limits …
  34. and it works great in the absence of sampling bias and we have ways of measuring these and we have stratification and other strategies that work until we encounter effects like …
  35. data points live in some kind of high-dimensional spatial landscape, and the probability of having some property is a function of one's neighbor having that same property?
  36. We replace harold …
  37. a computer system that collects 10000 data points per second; and it doesn’t care what for or whether they are used
  38. N = ALL is a big idea. So big I put it on my Facebook page. But I don’t spend much time on Facebook.
  39. When it comes to obsessive time-wasting on the internet, I much prefer Kaggle; 217K people competing to solve ML problems; Sentiment; recommendation; web optimization; the higgs boson; predict seizures;
  40. It took me around 500 hours to get up to 925th; which puts me in the top 1% of the sexiest job of the century! Whoa!
  41. Let’s look at cats and dogs. September 2013 - petfinder data - 4 months; good example to step through the ML process
  42. cats and dogs … EXAMPLES; LABELED Kaggle got data from pet-finder; 25K labeled pictures of cats and dogs for training and a bunch of others for testing;
  43. 4 months later, Pierre Sermanet achieved .98914 success rate PREDICTING … which are cats and dogs Interesting we use the word predicting … cat and dog are TRUTH / ground-truth
  44. and We call it predictive modeling, even when what we are predicting may already be in the past. We could call it guessing; when you flip a coin and cover it with your hand and I say heads; am I predicting or GUESSING; I think of predictive modeling as informed guessing …
  45. Jeane Dixon was one of the great guessers of all time and she did predict that Kennedy would be assassinated, sort-of … which brings up the issue of accuracy —->
  46. What does accurate mean? Mark’s weather prediction model. How accurate is it? I also have A psychological test. It’s a test for being a psychopath. I ask you if you are a psychopath and then classify you as a psychopath regardless of what you say. Like the stopped watch that is right twice daily …
  47. more accurate than mark’s weather forecast … concept of baseline
  48. However, if enough patients have taken the alternative therapy, then data could be collected on these patients related to their disease, treatment history, and demographics. Also, laboratory tests could be collected related to patients’ genetic background or other biological data (e.g., protein measurements). Given their outcome, a predictive model could be created to predict the response to the alternative therapy based on these data. The critical question for the doctor and patient is a prediction of how the patient will react to a change in therapy. Above all, this prediction needs to be accurate. Kuhn, Max; Johnson, Kjell (2013-05-17). Applied Predictive Modeling (Page 4). Springer. Kindle Edition.
  49. edges are numerical values; color is a label or classifier we will use a common trick and recode colors using dummy variables —-> dummy as in showroom dummy as opposed to quantum electrodynamics for dummies
  50. we will look at the distribution of each of the variables over the domain of splork-ness versus non-splork-ness