Introduction to Machine Learning
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
May, 2017
ML: Data Science vs Computer Science
(Diagram: ML sits at the overlap of Data Science and Computer Science.)
Machine Learning is a marriage of Statistics and Computer Science.
Traditional Front Ends
Categories of ML:
• Regression/Classification (Prediction)
• Reinforcement Learning (Robotics)
• Neural Networks (Recognition)
• Natural Language Processing (Speech/Language)
Solutions
ML draws on both Statistics and Computer Science.
Statistics:
• Classification, e.g., Apple vs Pear
• Prediction, e.g., Life Expectancy
• Forecasting, e.g., Weather
• Recommendation, e.g., Suggestions
• Association, e.g., Shelf Placement
Computer Science:
• Computer Vision, e.g., Road Signs
• Speech Recognition, e.g., Voice to Text
• Language, e.g., Translation
• Audio Recognition, e.g., Gun Shots
• Dynamic Adaptation, e.g., Automation
Examples
Ladder (Complexity == Salary), from least to most complex:
• Marketing / Sales
• Text Classification
• Human Interface
• Computer Vision
• Factory Automation
• 3D Printing
• Autonomous
• Space
Stages along the ladder: Mature, Growth, Emergent, Frontier
Pay along the ladder: Very Good $$, Exceptional $$, Stratosphere $$
It’s About Training
Machine Learning is about using data to train a model.
• Split the dataset into training data and test data.
• Use the training data to train the model, producing a model.
• Test the model with the test data to determine its accuracy.
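The split step can be sketched with the Python standard library alone. This is a minimal illustration, not the deck's method: `split_dataset` is a hypothetical helper (scikit-learn provides a comparable `train_test_split` in `sklearn.model_selection`), and the rows are shuffled with a fixed seed so the split is reproducible.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows` and split it into (training, test) lists."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset: 10 rows, represented here by their row numbers.
dataset = list(range(10))
training_data, test_data = split_dataset(dataset, 0.8)
print(len(training_data), len(test_data))  # 8 2
```

Shuffling before splitting matters when the rows are stored in some order (e.g., all apples first), which would otherwise leave one class out of the test set.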
It’s in the Label
Supervised versus Unsupervised Learning
Labeled data: features + label, where the label is what the item (row) is, e.g., apple.
• A human (or program) pre-labels the data.
• Learn how features map to labels.
Unlabeled data: features only, e.g., we do not know it’s an apple.
• Learn how features map to clusters.
• Learn how clusters map to labels.
Supervised Learning
Feature 1 | Feature 2 | Feature 3 | Feature 4 | Label
real-value | real-value | real-value | categorical-value | category/value
real-value | real-value | real-value | categorical-value | category/value
real-value | real-value | real-value | categorical-value | category/value
Features are the attributes of each sample; the label is what the sample is.
Example
Weight (oz) | Width (in) | Height (in) | Color | Label
6 | 3 | 3.5 | green | apple
2 | 1.5 | 8 | yellow | banana
7 | 4 | 4.7 | yellow | apple
Features range from little significance to greater significance in determining the label.
Decision Tree
Simple Training Model
(Diagram: a tree that tests one feature per level.)
• 1st feature: color (green / yellow)
• 2nd feature: weight, with a threshold learned separately on each branch (e.g., < 4 / >= 4 and < 3.5 / >= 3.5)
• 3rd feature: width
• 4th feature: height
• Leaves are the classification (banana / apple).
Training learns the thresholds at each split, e.g., yellow apples weigh more than green apples.
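The "learn thresholds" step can be sketched for a single feature: try each observed value as a cut point and keep the one that misclassifies the fewest training rows. This is a simplified sketch under stated assumptions: values at or above the threshold are predicted `apple` and values below it `banana`, and `best_threshold` is a hypothetical helper. Real decision-tree learners usually score candidate splits with impurity measures such as Gini or entropy and place thresholds between observed values.

```python
def best_threshold(values, labels, positive="apple", negative="banana"):
    """Try each observed value as a threshold (predict `positive` when value >= threshold).
    Return the (threshold, errors) pair with the fewest misclassifications."""
    best = None
    for t in sorted(set(values)):
        errors = sum(
            (positive if v >= t else negative) != y
            for v, y in zip(values, labels)
        )
        if best is None or errors < best[1]:  # keep the first threshold with the fewest errors
            best = (t, errors)
    return best

weights = [6, 2, 7]  # Weight (oz) column from the example table
labels = ["apple", "banana", "apple"]
print(best_threshold(weights, labels))  # (6, 0)
```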
Pruning
Weight (oz) | Width (in) | Height (in) | Color | Label
6 | 3 | 3.5 | green | apple
2 | 1.5 | 8 | yellow | banana
7 | 4 | 4.7 | yellow | apple
Assume Color does not contribute to the outcome, and remove that column:
Weight (oz) | Width (in) | Height (in) | Label
6 | 3 | 3.5 | apple
2 | 1.5 | 8 | banana
7 | 4 | 4.7 | apple
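Removing the Color column, as the pruning slide describes, amounts to dropping one key from every row. A minimal sketch, assuming each row is stored as a dictionary keyed by hypothetical lowercase column names; `drop_feature` is a hypothetical helper.

```python
rows = [
    {"weight": 6, "width": 3, "height": 3.5, "color": "green", "label": "apple"},
    {"weight": 2, "width": 1.5, "height": 8, "color": "yellow", "label": "banana"},
    {"weight": 7, "width": 4, "height": 4.7, "color": "yellow", "label": "apple"},
]

def drop_feature(rows, feature):
    """Return new rows with `feature` removed; the original rows are not modified."""
    return [{k: v for k, v in row.items() if k != feature} for row in rows]

pruned = drop_feature(rows, "color")
print(pruned[1])  # {'weight': 2, 'width': 1.5, 'height': 8, 'label': 'banana'}
```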
Decision Tree After Pruning
Simple Training Model
• 1st feature: weight (< 4 / >= 4)
• 2nd feature: width (< 2 / >= 2 on one branch, < 2.5 / >= 2.5 on the other)
• 3rd feature: height (> 3 / <= 3)
• Leaves are the classification (banana / apple).
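A trained tree can be stored as nested tuples and walked from the root down to a leaf. The exact wiring below is an assumption that loosely follows the pruned tree: the root splits on weight at 4, each branch then splits on width (at 2 and at 2.5), and the height split is left out to keep the sketch short. `TREE` and `predict` are hypothetical names.

```python
# A node is (feature, threshold, subtree_if_below, subtree_if_at_or_above);
# a leaf is just the label string.
TREE = ("weight", 4,
        ("width", 2, "banana", "apple"),
        ("width", 2.5, "banana", "apple"))

def predict(tree, sample):
    """Follow the branches until a leaf (a label) is reached."""
    while not isinstance(tree, str):
        feature, threshold, below, at_or_above = tree
        tree = below if sample[feature] < threshold else at_or_above
    return tree

samples = [
    {"weight": 6, "width": 3, "height": 3.5},
    {"weight": 2, "width": 1.5, "height": 8},
    {"weight": 7, "width": 4, "height": 4.7},
]
print([predict(TREE, s) for s in samples])  # ['apple', 'banana', 'apple']
```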
Ensemble – Decision Stumps
Decision Stumps – Weak Learners
Each stump tests a single feature:
• Weight: < 4 = banana, >= 4 = apple
• Width: < 2.5 = banana, >= 2.5 = apple
• Height: <= 4 = apple, > 4 = banana
MAJORITY VOTE
Weight: 4.2 = Apple
Width: 2.3 = Banana
Height: 5.5 = Banana
VOTE = Banana
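The majority vote can be sketched directly, using the three stump thresholds from the slide. Each stump is written as a small function, and `collections.Counter` tallies the votes. `STUMPS` and `majority_vote` are hypothetical names.

```python
from collections import Counter

# One weak learner per feature, using the thresholds from the slide.
STUMPS = [
    lambda s: "banana" if s["weight"] < 4 else "apple",
    lambda s: "banana" if s["width"] < 2.5 else "apple",
    lambda s: "apple" if s["height"] <= 4 else "banana",
]

def majority_vote(stumps, sample):
    """Ask every stump for a label and return the most common answer."""
    votes = Counter(stump(sample) for stump in stumps)
    return votes.most_common(1)[0][0]

sample = {"weight": 4.2, "width": 2.3, "height": 5.5}
print(majority_vote(STUMPS, sample))  # banana
```

With an odd number of stumps and two classes, the vote can never tie.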
(Simple) Linear Regression
It’s In The Line
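(Plot: Age (x), the feature, on the horizontal axis; Spend (y), the label, on the vertical axis. The data is plotted as a scatter, and we learn the best fitted line.)
Best fitted line: y = a + bx, where a is the intercept (the value of y at x = 0) and b is the slope.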
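For one feature, the best fitted line has a closed-form ordinary least-squares solution: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. This line minimizes the squared error between the actual and predicted values. The age/spend numbers below are made up so that they lie exactly on y = 10 + 2x. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

ages = [20, 30, 40, 50]    # hypothetical feature values (x)
spend = [50, 70, 90, 110]  # hypothetical labels (y), exactly 10 + 2x
a, b = fit_line(ages, spend)
print(a, b)  # 10.0 2.0
```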
Loss Function
Minimize Loss (Estimated Error) when Fitting a Line
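(Plot: actual values y1 to y6 scattered around the fitted line; the predicted values (yhat) lie on the line, and each error is the vertical distance y − yhat.)
Mean Square Error:
MSE = \frac{1}{n} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2
• Sum the square of the difference (y − yhat) for every sample.
• Divide by the number of samples.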
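The MSE formula translates almost term for term into code: square each difference (y − yhat), sum the squares, and divide by n. The values in the example are illustrative. `mse` is a hypothetical helper.

```python
def mse(actual, predicted):
    """Mean Square Error: the average of the squared differences (y - yhat)."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n

# Differences are 1, 0 and -2, so MSE = (1 + 0 + 4) / 3.
print(mse([3, 5, 7], [2, 5, 9]))
```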
Libraries Do the Work
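Python & R have libraries that do the math! e.g., NumPy, scikit-learn
Split the Dataset
    X_train, X_test = split( dataset, 0.80 )
    Inputs: the dataset and the training percentage (e.g., 80% train, 20% test). Outputs: the training and test data.
Train the Model
    model = train( X_train, 4 )
    Inputs: the training data and the column of the label. train is the training method; it returns the trained model.
Test the Model
    Y_test = model( X_test, 4 )
    Inputs: the test data and the column of the label. The trained model returns the predicted values.
Calculate Accuracy
    result = accuracy( X_test, 4, Y_test )
    Compares the actual values (the label column of X_test) with the predicted values (Y_test).
Pseudo Names: split, train, and accuracy are illustrative names, not real library functions.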
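The pseudo-named steps can be made runnable by defining hypothetical `split`, `train`, and `accuracy` functions with the same call shapes, where column 4 is the label. This is a sketch, not what any library does. `train` builds a nearest-centroid classifier, chosen only for brevity: it averages each class's numeric columns and predicts the class with the closest average. Color is ignored, and `split` does not shuffle. Only the first three rows come from the example table; the rest are made up.

```python
import math
from collections import defaultdict

NUMERIC_COLS = (0, 1, 2)  # weight, width, height; color (column 3) is ignored here

def split(dataset, train_fraction):
    """Hypothetical split(): first fraction of rows -> training, remainder -> test."""
    cut = int(len(dataset) * train_fraction)
    return dataset[:cut], dataset[cut:]

def train(rows, label_col):
    """Hypothetical train(): a nearest-centroid classifier on the numeric columns."""
    sums = defaultdict(lambda: [0.0] * len(NUMERIC_COLS))
    counts = defaultdict(int)
    for row in rows:
        label = row[label_col]
        counts[label] += 1
        for i, col in enumerate(NUMERIC_COLS):
            sums[label][i] += row[col]
    centroids = {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

    def model(X, label_col):
        """Predict the label whose centroid is closest (label_col mirrors the slide's call)."""
        predictions = []
        for row in X:
            point = [row[c] for c in NUMERIC_COLS]
            predictions.append(min(centroids, key=lambda lab: math.dist(point, centroids[lab])))
        return predictions

    return model

def accuracy(X_test, label_col, Y_test):
    """Fraction of test rows whose actual label equals the predicted label."""
    correct = sum(row[label_col] == pred for row, pred in zip(X_test, Y_test))
    return correct / len(X_test)

dataset = [
    [6, 3, 3.5, "green", "apple"],
    [2, 1.5, 8, "yellow", "banana"],
    [7, 4, 4.7, "yellow", "apple"],
    [5, 3, 3.2, "red", "apple"],         # made-up rows from here on
    [2.5, 1.4, 7.5, "yellow", "banana"],
    [6.5, 3.5, 4, "green", "apple"],
    [1.8, 1.2, 7, "yellow", "banana"],
    [3, 1.6, 8.5, "yellow", "banana"],
    [5.5, 3.2, 3.8, "red", "apple"],
    [2.2, 1.3, 7.8, "yellow", "banana"],
]

X_train, X_test = split(dataset, 0.80)
model = train(X_train, 4)
Y_test = model(X_test, 4)
result = accuracy(X_test, 4, Y_test)
print(Y_test, result)  # ['apple', 'banana'] 1.0
```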
Unsupervised Learning
Feature 1 | Feature 2 | Feature 3 | Feature 4
real-value | real-value | real-value | categorical-value
real-value | real-value | real-value | categorical-value
real-value | real-value | real-value | categorical-value
Features are the attributes of each sample.
Example
Weight (oz) | Width (in) | Height (in) | Color
6 | 3 | 3.5 | green
2 | 1.5 | 8 | yellow
7 | 4 | 4.7 | yellow
There is NO label: we don’t know what each sample is in the training set!
Clusters
It’s In The Cluster
(Plot: Height (x1) against Weight (x2), data plotted as a scatter, forming one cluster (e.g., Apple) and another cluster (e.g., Banana).)
Find a relationship between the data that separates them into clusters.
K-Means
(Plot: Height (x1) against Weight (x2), with one cluster centroid per cluster.)
• Pick the number of clusters (e.g., 2 for Apple and Banana).
• Place a point (cluster centroid) randomly for each cluster.
• Calculate the distance from each sample to each centroid, and assign each sample to the cluster with the closest centroid.
Recalculate Centroids
(Plot: the previous cluster centroids and the new centroids.)
• Calculate the centroid (center) of each cluster.
• Move the centroid to the newly calculated location.
• Recalculate distances, and assign each sample to the cluster with the closest (new) centroid.
REPEAT STEPS until the centroids do not move anymore.
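The K-Means loop can be sketched in plain Python. One assumption differs from the slide's "place a point randomly": this sketch starts from k randomly chosen samples, which is a common variant. It also accepts explicit starting centroids through a hypothetical `initial` argument so the run is reproducible. A cluster that ends up empty keeps its old centroid. The (height, weight) points are made up.

```python
import math
import random

def k_means(points, k, seed=0, initial=None, max_iter=100):
    """Cluster points with k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Starting centroids: k distinct random samples, unless given explicitly.
    centroids = list(initial) if initial is not None else rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each sample to the cluster with the closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Recalculate each centroid as the mean of its assigned samples.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave it in place
        if new_centroids == centroids:  # centroids did not move: stop
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical (height, weight) points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignments = k_means(points, 2, initial=[(1, 1), (8, 8)])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```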
Preparing a Dataset - Clean
Clean Dataset
• Fix/Remove unreadable entries
  • e.g., bad (funny) characters from different character codesets
• Fix/Remove misaligned entries
  • e.g., an incorrect number of fields for a row in a CSV file
• Replace blank fields (i.e., synthesize a value)
  • e.g., the mean of all non-blank values
  • e.g., use the rows that have values as a training set to learn the missing value
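Two of the cleaning steps can be sketched with the standard `csv` module. The first keeps only rows with the expected number of fields. The second replaces blank numeric fields with the mean of the non-blank values. The CSV text is made up, and `read_aligned_rows` and `fill_blanks_with_mean` are hypothetical helpers.

```python
import csv
import io

def read_aligned_rows(text, expected_fields):
    """Parse CSV text; keep rows with the expected field count, set the rest aside."""
    kept, rejected = [], []
    for row in csv.reader(io.StringIO(text)):
        (kept if len(row) == expected_fields else rejected).append(row)
    return kept, rejected

def fill_blanks_with_mean(column):
    """Replace blank ('') entries in a numeric column with the mean of the other values."""
    present = [float(v) for v in column if v != ""]
    mean = sum(present) / len(present)
    return [mean if v == "" else float(v) for v in column]

# Hypothetical CSV: the 2nd row is misaligned (4 fields); the 3rd has a blank weight.
text = "6,3,3.5,green,apple\n2,1.5,8,yellow\n,4,4.7,yellow,apple\n8,3.5,4,green,apple\n"
kept, rejected = read_aligned_rows(text, expected_fields=5)
weights = fill_blanks_with_mean([row[0] for row in kept])
print(len(kept), len(rejected), weights)  # 3 1 [6.0, 7.0, 8.0]
```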
Preparing a Dataset – Conversion
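Categorical Value Conversion (applied to the Clean Dataset)
• Change categorical values into real values.
• Cannot use enumeration (values imply importance!), e.g., encoding Apple = 1, Banana = 2, Pear = 3 implies that Pear > Banana > Apple.
• Expand into dummy variables, one per category, using 0 and 1 as values.
Fruit | Apple | Banana | Pear
Apple | 1 | 0 | 0
Banana | 0 | 1 | 0
Pear | 0 | 0 | 1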
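Dummy-variable expansion can be sketched in a few lines: collect the distinct categories, then emit one 0/1 column per category. `one_hot` is a hypothetical helper; in pandas, `pandas.get_dummies` does the same job for a DataFrame column.

```python
def one_hot(values):
    """Expand a categorical column into one 0/1 dummy column per category."""
    categories = sorted(set(values))  # fixed column order: alphabetical
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, rows = one_hot(["Apple", "Banana", "Pear"])
print(categories)  # ['Apple', 'Banana', 'Pear']
print(rows)        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```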
Preparing a Dataset – Feature Scaling
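Feature Scaling (applied after Categorical Value Conversion on the Clean Dataset)
• Scale values to be within the same proportional range.
• A column with a much larger range will over-influence learning compared with a column that has a smaller range.
• Typically, either scale the range to between 0 and 1 (normalization), or rescale to a mean of 0 and a standard deviation of 1 (standardization). Standardization centers values around 0 but does not bound them to -1 and 1.
Normalization:
X' = \frac{x - \min(x)}{\max(x) - \min(x)}
where x is the original value and X' is the new value.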
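Both scalings can be sketched with the standard `statistics` module. `normalize` implements the X' formula above. `standardize` uses the population standard deviation, which is one common choice. Both helpers are hypothetical, and `normalize` assumes the column is not constant, since max(x) − min(x) would then be 0.

```python
import statistics

def normalize(column):
    """Min-max normalization: X' = (x - min(x)) / (max(x) - min(x)), giving [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]  # assumes hi > lo

def standardize(column):
    """Z-score standardization: subtract the mean, divide by the population std dev."""
    mean = statistics.mean(column)
    stdev = statistics.pstdev(column)
    return [(x - mean) / stdev for x in column]

weights = [6, 2, 7]  # Weight (oz) column from the example table
print(normalize(weights))  # [0.8, 0.0, 1.0]
```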

Powerpoint exploring the locations used in television show Time Clash
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Introduction to Machine Learning

  • 1. Introduction to Machine Learning. Portland Data Science Group. Created by Andrew Ferlitsch, Community Outreach Officer. May 2017.
  • 2. ML: DS vs CS. [Diagram: ML where Data Science and Computer Science overlap; label: Traditional Front Ends.] Machine Learning is a marriage of Statistics and Computer Science.
  • 4. Solutions. ML draws on Statistics and Computer Science. Statistics: Classification, Prediction, Forecasting, Recommendation, Association (examples: Apple vs Pear, Life Expectancy, Weather, Shelf Placement, Suggestions). Computer Science: Computer Vision, Speech Recognition, Language, Dynamic Adaptation, Audio Recognition (examples: Road Signs, Voice to Text, Translation, Automation, Gun Shots).
  • 5. Ladder (Complexity == Salary). Application areas: Marketing / Sales, Text Classification, Human Interface, Computer Vision, Factory Automation, 3D Printing, Autonomous, Space. Stages: Mature, Growth, Emergent, Frontier. Salary: Very Good $$, Exceptional $$, Stratosphere $$.
  • 6. It’s About Training. Machine Learning is about using data to train a model. • Split the dataset into training data and test data. • Use the training data to train the model, producing a model. • Test the model on the test data to determine its accuracy.
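A minimal plain-Python sketch of the first step, splitting a dataset into a training set and a test set. The function name `train_test_split`, the 80/20 default and the fixed shuffle seed are assumptions for illustration (scikit-learn ships a richer function of the same name in `sklearn.model_selection`).

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then cut it into a training set and a test set."""
    shuffled = list(rows)
    # Shuffle first so the split isn't biased by row order (e.g., all apples first).
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for 10 dataset rows
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

The model is then trained only on `train`, and its accuracy is measured on `test`, which it never saw during training.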
  • 7. It’s in the Label. Labeled data has features plus a label, where the label is what the item (row) is, e.g., apple. Unlabeled data has features only, e.g., we do not know it’s an apple. Supervised versus Unsupervised Learning: • Supervised (labeled data): a human (or a program) pre-labels the data, and the model learns how features map to labels. • Unsupervised (unlabeled data): the model learns how features map to clusters, and then how clusters map to labels.
  • 8. Supervised Learning. Each row is one sample: Features 1 to 3 are real values and Feature 4 is a categorical value (the attributes of each sample); the Label is a category or value (what the sample is).
    Example:
      Weight (oz)  Width (in)  Height (in)  Color   Label
      6            3           3.5          green   apple
      2            1.5         8            yellow  banana
      7            4           4.7          yellow  apple
    Features differ in significance: some have little significance for the label, others greater significance.
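In code, a supervised table is usually held as two parallel structures: the feature rows (conventionally `X`) and the labels (`y`). A small sketch using the three example rows; the names `X` and `y` are a common convention, not something the slides define.

```python
# The example table: one tuple per sample, with the label in the last position.
table = [
    # weight_oz, width_in, height_in, color, label
    (6, 3, 3.5, "green", "apple"),
    (2, 1.5, 8, "yellow", "banana"),
    (7, 4, 4.7, "yellow", "apple"),
]

X = [row[:-1] for row in table]  # features: the attributes of each sample
y = [row[-1] for row in table]   # labels: what each sample is
print(y)  # ['apple', 'banana', 'apple']
```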
  • 9. Decision Tree: a simple training model. Each level of the tree splits on one feature: the 1st feature is color (green / yellow), the 2nd is weight, the 3rd is width and the 4th is height. Training learns the threshold for each split, and thresholds can differ by branch (e.g., weight < 4 / >= 4 on one branch and < 3.5 / >= 3.5 on the other, because yellow apples weigh more than green apples). The leaves are the classification (apple or banana).
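The "learn thresholds" step can be sketched for a single numeric feature: try the midpoint between each pair of neighboring values and keep the split that misclassifies the fewest training samples. This is a simplified, hypothetical learner (it counts errors, whereas most real tree learners score splits with Gini impurity or entropy) and it handles two classes only.

```python
def learn_threshold(values, labels):
    """Learn one threshold on a numeric feature for a two-class problem.

    Returns (threshold, label_below, label_at_or_above) with the fewest
    misclassified training samples.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    points = sorted(set(values))
    best = None  # (errors, threshold, label_below, label_above)
    # Candidate thresholds: midpoints between neighboring distinct values.
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        # Try both orientations: which class is predicted below the threshold.
        for below, above in (classes, classes[::-1]):
            errors = sum((below if v < t else above) != y for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


print(learn_threshold([6, 2, 7], ["apple", "banana", "apple"]))  # (4.0, 'banana', 'apple')
```

On the three weights from the example table (6, 2 and 7 oz), the best split is at 4.0: below it banana, at or above it apple, which matches the < 4 / >= 4 weight split on the slide.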
  • 10. Pruning. Assume Color does not contribute to the outcome, so its column is pruned.
    Before:
      Weight (oz)  Width (in)  Height (in)  Color   Label
      6            3           3.5          green   apple
      2            1.5         8            yellow  banana
      7            4           4.7          yellow  apple
    After:
      Weight (oz)  Width (in)  Height (in)  Label
      6            3           3.5          apple
      2            1.5         8            banana
      7            4           4.7          apple
  • 11. Decision Tree After Pruning: a simple training model. With color removed, the 1st feature is weight (< 4 / >= 4), the 2nd is width (thresholds such as < 2 / >= 2 and < 2.5 / >= 2.5), and the 3rd is height (> 3 / <= 3). The leaves are the classification (apple or banana).
  • 12. Ensemble – Decision Stumps. Decision stumps are weak learners: each splits on a single feature (weight < 4 / >= 4, width < 2.5 / >= 2.5, height <= 4 / > 4) and predicts apple or banana. The ensemble takes a MAJORITY VOTE. For a sample with Weight 4.2 (= Apple), Width 2.3 (= Banana) and Height 5.5 (= Banana), the VOTE = Banana.
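A sketch of the majority vote, with the three stumps hard-coded from the thresholds on this slide (in practice each stump's threshold would be learned from the training data). The function names and the dictionary-based sample are hypothetical.

```python
from collections import Counter


# Hypothetical stumps using the slide's thresholds; each looks at one feature only.
def weight_stump(sample):
    return "apple" if sample["weight"] >= 4 else "banana"


def width_stump(sample):
    return "apple" if sample["width"] >= 2.5 else "banana"


def height_stump(sample):
    return "apple" if sample["height"] <= 4 else "banana"


STUMPS = [weight_stump, width_stump, height_stump]


def majority_vote(sample, stumps=STUMPS):
    """Ask every weak learner for a label and return the most common one."""
    votes = Counter(stump(sample) for stump in stumps)
    label, _count = votes.most_common(1)[0]
    return label


sample = {"weight": 4.2, "width": 2.3, "height": 5.5}
print(majority_vote(sample))  # banana
```

With an odd number of stumps and two classes, the vote can never tie.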
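  • 13. (Simple) Linear Regression: It’s in the Line. [Scatter plot: Age (x), the feature (data), against Spend (y), the label to learn, with the best-fitted line.] Best-fitted line: $y = a + bx$, where $a$ is the intercept (the value of y at x = 0) and $b$ is the slope.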
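The best-fitted line is commonly found by ordinary least squares, which minimizes the squared error and has a closed form: $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. A minimal sketch; the age and spend numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y over the variance of x (the 1/n factors cancel).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b


ages = [20, 30, 40, 50]     # feature x (hypothetical)
spend = [48, 72, 88, 112]   # label y (hypothetical)
a, b = fit_line(ages, spend)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 7.20, b = 2.08
```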
  • 14. Loss Function. Minimize the loss (estimated error) when fitting a line. For actual values $y_1, \dots, y_n$ and predicted values $\hat{y}_1, \dots, \hat{y}_n$, the Mean Square Error (MSE) sums the squares of the differences $(y - \hat{y})$ and divides by the number of samples $n$: $$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2$$
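The MSE formula translates directly into code. A minimal sketch; the function name and the sample values are illustrative.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum over j of (y_j - yhat_j)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # Square each difference, sum them, then divide by the number of samples.
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mean_squared_error(actual, predicted))  # (1 + 0 + 4) / 3
```

Squaring keeps positive and negative errors from cancelling out and penalizes large misses more heavily than small ones.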
  • 15. Libraries Do the Work. Python and R have libraries that do the math, e.g., NumPy and scikit-learn. The steps below use pseudo names, not a real library API: • Split the dataset: X_train, X_test = split(dataset, 0.80), where 0.80 is the training percentage (e.g., 80% train, 20% test). • Train the model: model = train(X_train, 4), where X_train is the training data and 4 is the column of the label; the result is the trained model. • Test the model: Y_test = model(X_test, 4) applies the trained model to the test data and returns the predicted values. • Calculate accuracy: result = accuracy(X_test, 4, Y_test) compares the actual values in the label column of the test data with the predicted values.
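A plain-Python sketch of what the pseudo-named accuracy step computes for a classifier: the fraction of test samples whose predicted label matches the actual label. Unlike the slide's pseudo signature, this hypothetical version takes the actual label column directly; scikit-learn's `sklearn.metrics.accuracy_score` computes the same fraction.

```python
def accuracy(actual, predicted):
    """Fraction of test samples whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


actual = ["apple", "banana", "apple", "banana"]     # label column of the test data
predicted = ["apple", "banana", "banana", "banana"]  # model output
print(accuracy(actual, predicted))  # 0.75
```

Accuracy suits classification; for a regression model such as the fitted line, an error measure like MSE is used instead.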
  • 16. Unsupervised Learning. Each row is one sample: Features 1 to 3 are real values and Feature 4 is a categorical value (the attributes of each sample), but there is NO label: we don’t know what each sample in the training set is!
    Example:
      Weight (oz)  Width (in)  Height (in)  Color
      6            3           3.5          green
      2            1.5         8            yellow
      7            4           4.7          yellow
  • 17. Clusters: It’s in the Cluster. [Scatter plot: Height (x1) vs Weight (x2), with one cluster of points (e.g., Apple) and another (e.g., Banana).] Find a relationship between the data that separates the samples into clusters.
  • 18. K-Means. [Plot: Height (x1) vs Weight (x2), with a cluster centroid for each cluster.] • Pick the number of clusters (e.g., 2 for Apple and Banana). • Place a point (cluster centroid) randomly for each cluster. • Calculate the distance from each sample to each centroid, and assign the sample to the cluster with the closest centroid.
  • 19. Recalculate Centroids. [Plot: Height (x1) vs Weight (x2), showing the previous and new cluster centroids.] • Calculate the centroid (center) of each cluster. • Move each centroid to its newly calculated location. • Recalculate the distances and assign each sample to the cluster with the closest (new) centroid. REPEAT these steps until the centroids no longer move.
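The K-Means steps from the two slides above, as a minimal plain-Python sketch for 2-D points. Assumptions: centroids are initialized by picking k distinct samples at random (a common variant of "place a point randomly"), distance is Euclidean (`math.dist`, Python 3.8+), an empty cluster keeps its previous centroid, and the sample points are hypothetical (height, weight) pairs.

```python
import math
import random


def kmeans(points, k, seed=0, max_iters=100):
    """Cluster 2-D points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Place one centroid per cluster: here, k distinct samples chosen at random.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iters):
        # Assign each sample to the cluster whose centroid is closest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recalculate each centroid as the mean of its cluster's points;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


apples = [(3.5, 6.0), (4.7, 7.0), (3.8, 6.5)]   # hypothetical (height, weight) samples
bananas = [(8.0, 2.0), (7.5, 2.2), (8.2, 1.8)]
centroids, clusters = kmeans(apples + bananas, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

K-Means only finds the clusters; mapping each cluster to a label such as apple or banana is a separate step.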
  • 20. Preparing a Dataset – Clean. (Pipeline: Dataset → Clean.) • Fix or remove unreadable entries, e.g., bad ("funny") characters from different character code sets. • Fix or remove misaligned entries, e.g., a row with the wrong number of fields in a CSV file. • Replace blank fields by synthesizing a value, e.g., the mean of all non-blank values in the column, or use the rows that have values as a training set to learn the missing value.
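A sketch of two of the cleaning steps on a small CSV: dropping misaligned rows (wrong number of fields) and replacing blank fields with the column mean. The CSV text is made up; the sketch assumes every field is numeric and every column has at least one non-blank value, and it does not attempt the character-set repair.

```python
import csv
import io

RAW = """weight,width,height
6,3,3.5
2,1.5
7,,4.7
5,2.5,4.0
"""  # hypothetical data: row 2 is misaligned, row 3 has a blank width


def clean(text):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Remove misaligned entries: drop rows whose field count differs from the header's.
    rows = [r for r in reader if len(r) == len(header)]
    # Parse numbers; a blank field becomes None for now.
    table = [[float(v) if v.strip() else None for v in r] for r in rows]
    # Replace blank fields with the mean of the non-blank values in the same column.
    for col in range(len(header)):
        present = [row[col] for row in table if row[col] is not None]
        mean = sum(present) / len(present)  # assumes at least one non-blank value
        for row in table:
            if row[col] is None:
                row[col] = mean
    return header, table


header, table = clean(RAW)
print(table)  # [[6.0, 3.0, 3.5], [7.0, 2.75, 4.7], [5.0, 2.5, 4.0]]
```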
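  • 21. Preparing a Dataset – Conversion. (Pipeline: Dataset → Clean → Categorical Value Conversion.) • Change categorical values into real values. • Don’t use a simple enumeration (e.g., Apple = 1, Banana = 2, Pear = 3): the numbers imply an order and importance that the categories don’t have. • Instead, expand the column into dummy variables, one per category, with 0 and 1 as values. e.g., a Fruit column with the values Apple, Banana and Pear becomes three columns, Apple, Banana and Pear, where each row has a 1 in the column for its fruit and 0 elsewhere.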
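Expanding a categorical column into dummy (one-hot) variables, sketched in plain Python. Sorting the categories is an arbitrary choice that only makes the column order deterministic.

```python
def one_hot(values):
    """Expand a categorical column into one 0/1 dummy column per category."""
    categories = sorted(set(values))  # sorted only for a deterministic column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


categories, rows = one_hot(["Apple", "Banana", "Pear", "Banana"])
print(categories)  # ['Apple', 'Banana', 'Pear']
print(rows)        # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Each row has exactly one 1, so no category is numerically "larger" than another.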
  • 22. Preparing a Dataset – Feature Scaling. (Pipeline: Dataset → Clean → Categorical Value Conversion → Feature Scaling.) • Scale values to lie within the same proportional range. • A column with a much larger range will over-influence learning relative to a column with a smaller range. • Typically, scale each column to the range 0 to 1 (normalization), or rescale it to mean 0 and standard deviation 1 (standardization). Normalization maps each original value $x$ to a new value $x'$: $$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
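Both scalings, sketched for one column in plain Python. The guard for a constant column (max equal to min, or zero standard deviation, which would divide by zero) is an assumption, as is using the population standard deviation (`statistics.pstdev`) for standardization.

```python
import statistics


def normalize(column):
    """Min-max normalization: x' = (x - min) / (max - min), so values land in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def standardize(column):
    """Standardization: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


weights = [6, 2, 7]  # Weight (oz) column from the example table
print(normalize(weights))  # [0.8, 0.0, 1.0]
print(standardize(weights))
```

Standardized values are centered on 0 but are not bounded to a fixed range.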