SlideShare a Scribd company logo
HANDS-ON ONLINE TRAINING
ON DATA SCIENCE
KNOWLEDGE AND SKILLS FORUM
INTRODUCTION TO
MACHINE LEARNING
FAITHFUL ONWUEGBUCHE
OLUWASEUN ODEYEMI
AUGUSTINE OKOLIE
WHAT IS MACHINE
LEARNING
• Machine learning is a type of artificial
intelligence (AI) that enables computers to
learn and make decisions without being
explicitly programmed.
• Using algorithms that iteratively learn
from data, machine learning allows
computers to find hidden insights without
being explicitly programmed where to
look.
THINK AND LEARN LIKE A BABY
MACHINE
LEARNING VS.
STATISTICS
VS.
COMPUTER
SCIENCE
Machine Learning Statistics Computer Science
Objective
Focuses on learning from
data to make predictions or
decisions without being
explicitly programmed. It
prioritizes prediction
accuracy and
generalizability.
Aims to infer properties of
an underlying distribution
from a data sample. It
emphasizes understanding
and interpreting data and
probabilistic models.
Focuses on the
creation and
application of
algorithms to
manipulate, store, and
communicate digital
information.
Methodologie
s
Typically uses complex
models (like neural
networks) and large
amounts of data to train
models for prediction.
Utilizes both supervised
and unsupervised learning
methods.
Often employs simpler,
more interpretable
models. Focuses on
hypothesis testing,
experimental design,
estimation, and
mathematical analysis.
Involves algorithm
design, data
structures,
computation theory,
computer architecture,
software development,
and more.
Validation
Measures model
performance through
methods like cross-
validation and seeks to
improve generalization to
unseen data.
Validates models using
methods such as
confidence intervals, p-
values, and hypothesis
tests to quantify
uncertainty.
Uses formal methods
for verifying
correctness, analyzing
computational
complexity, and
proving algorithmic
bounds.
Primary
Concern
Creating models that can
learn from and make
decisions or predictions
based on data.
Drawing valid conclusions
and quantifying
uncertainty about
observed data and
underlying distributions.
Creating efficient
algorithms and data
structures to solve
computational
problems.
TYPES OF MACHINE LEARNING
Three Types of Problems
• Supervised
• Unsupervised
• Reinforcement
SUPERVISE
D
• Trained using labeled examples
• Desired output is known
• Methods include classification, regression, etc.
• Uses patterns to predict the values of the label on
additional unlabeled data
• Algorithms:
• Linear regression
• Logistic regression
• K-Nearest Neighbors (KNN)
• Decision Trees and Random Forests
• Support Vector Machines
• Naive Bayes
• Neural Networks
UNSUPERVIS
ED
• Used against data that has no historical labels
• Desired output is unknown
• Goal is to explore the data and find some
structure within the data
• Algorithms
• Anomaly detection
• K-means clustering
• Hierarchical clustering
• DBSCAN
• Principal Component Analysis (PCA)
• Neural Networks
REINFORCEMEN
T
• Algorithm discovers through trial and error
which actions yield the greatest rewards.
• Three primary components:
• the agent (the learner or decision maker),
• the environment (everything the agent
interacts with)
• actions (what the agent can do).
• Objective: the agent chooses actions that
maximize the expected reward over a given
amount of time.
• Algorithms
• Markov Decision Process
• Q-Learning
• Deep Q Network (DQN)
WHY USE IT?
• Machine learning based models can extract patterns from massive
amounts of data which humans cannot do because
• We cannot retain everything in memory or we cannot perform
obvious/redundant computations for hours and days to come up
with interesting patterns.
• “Humans can typically create one or two good models in a week;
machine learning can create thousands of models in a week” (Thomas
H. Davenport)
• Solve problems we simply could not before
USE CASES
• Email spam filter
• Recommendation systems
• Self driving car
• Finance
• Image Recognition
• Competitive machines
TYPICAL MACHINE LEARNING PROCESS
Source: INTRODUCING AZURE MACHINE
LEARNING, pg. 5
TO GIVE CREDIT, OR NOT TO GIVE CREDIT
• You are asked by your boss while working at Big Bank Inc. to
develop an automated decision maker on whether to give a
potential client credit or not.
WHAT IS THE QUESTION?
• What question are we trying to answer here? What problem are
we looking to solve?
SELECTING DATA
Feature
• An individual measurable property of a phenomenon being
observed
• Best found through industry experts
SELECTING DATA
Feature Extraction
• Feature extraction is a general term for methods of
constructing combinations of the variables to get around
certain problems while still describing the data with sufficient
accuracy.
• Analysis with a large number of variables generally requires a
large amount of memory and computation.
• Reducing the amount of resources required to describe a
large set of data
SELECTING DATA
PCA (Principal ComponentAnalysis)
• We have a huge list of different features
• Many of them will measure related properties and so will be
redundant
• Summarize with less features
PREPARING DATA
• Cleaning
• Units
• Missing Values
• Metadata
DEVELOPING MODEL
• What is the problem being solved?
• What is the goal of the model?
• Minimize error on the “training” data
• Training data is the data used to train the model (all of it but the part we
removed)
DEVELOPING MODEL
Linear Model
• Relationships are modeled using linear predictor functions
whose unknown model parameters are estimated from the data
• Complex way of saying the model draws a line between two
categories (classification) or to estimate a value (regression)
• Linear regression the most common form of linear model
DEVELOPING MODEL
Non-linear Model
• A nonlinear model describes nonlinear relationships in
experimental data
• The parameters can take the form of an exponential,
trigonometric, power, or any other nonlinear function
DEVELOPING
MODEL
Overfitting vs. Bias in Machine
Learning
Overfitting​ Bias​
Definition​
Overfitting occurs when a
model fits the data more
than is warranted. It
captures the noise along
with the underlying
pattern in the data.​
Bias is error from
erroneous assumptions in
the learning algorithm.
High bias can cause
an algorithm to miss
relevant relations between
features and
target outputs.​
Consequence​
Overfitting leads to a
smaller error on the
training data set but
a larger one on unseen
data, reducing the model's
ability to generalize.​
High bias often leads
to underfitting, where the
model oversimplifies the
data and doesn't capture
its complexity.​
Example​
Creating a very complex
decision tree that classifies
each
training instance perfectly,
but performs poorly on
unseen data.​
Fitting a quadratic dataset
using a linear model – the
model will consistently fail
to capture the
true relationship and make
errors.​
DEVELOPING MODEL
Overfitting vs. Bias in Machine
Learning
COMBINATIONS OF BIAS-VARIANCE
DEVELOPING MODEL
Rule Addition
• Minimize error on the “training” data
• AND make sure that error in the “unseen” data is close to error in
the “training” data
DEVELOPING A MODEL
• Keep it Simple
• Go for simpler models over more complicated models
• Generally, the fewer parameters that you have to tune the better
• Cross-Validation
• K-fold cross validation is a great way to estimate error on training data
• Regularization
• Can sometimes help penalize certain sources of overfitting.
• LASSO
• Forces the sum of the absolute value of coefficients to be less than a
fixed value
• Effectively choosing a simpler model
DEVELOPIN
G A MODEL
Data Snooping or Data Dredging
• It's a form of bias that arises when you make decisions based on
the same data you've used to train and test your model.
• “If a data set has affected any step in the learning process, its
ability to access the outcome has been compromised”
• Experimenting
• Reuse of the same data set to determine quality of model
• Once a data set has been used to test the performance of a
data set, it should be considered contaminated
Source: Learning From Data, pg. 1
INTERPRETING RESULTS
• Validation
• Cross validation
• Test set
• Once the test set has been used, you must find new data!

More Related Content

What's hot

supervised learning
supervised learningsupervised learning
supervised learning
Amar Tripathi
 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
CloudxLab
 
Machine learning
Machine learningMachine learning
Machine learning
Rajib Kumar De
 
Machine learning
Machine learningMachine learning
Machine learning
ADARSHMISHRA126
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
Samra Shahzadi
 
Machine learning
Machine learningMachine learning
Machine learning
Dr Geetha Mohan
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Simplilearn
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Kumar P
 
Machine learning
Machine learningMachine learning
Machine learning
Navdeep Asteya
 
Machine Learning and Applications
Machine Learning and ApplicationsMachine Learning and Applications
Machine Learning and Applications
Geeta Arora
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Sajitha Burvin
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Rabab Munawar
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
safa cimenli
 
MACHINE LEARNING PPT(ML) rohit.pptx
MACHINE LEARNING  PPT(ML) rohit.pptxMACHINE LEARNING  PPT(ML) rohit.pptx
MACHINE LEARNING PPT(ML) rohit.pptx
NikhilRanaCSELEET005
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
Tanjina Thakur
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
Aman Patel
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
Rahul Jaiman
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
Kishor Datta Gupta
 

What's hot (20)

supervised learning
supervised learningsupervised learning
supervised learning
 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning and Applications
Machine Learning and ApplicationsMachine Learning and Applications
Machine Learning and Applications
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
MACHINE LEARNING PPT(ML) rohit.pptx
MACHINE LEARNING  PPT(ML) rohit.pptxMACHINE LEARNING  PPT(ML) rohit.pptx
MACHINE LEARNING PPT(ML) rohit.pptx
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
 

Similar to Introduction to Machine Learning

Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
GibDevs
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MAHIRA
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
CCG
 
Artificial Intelligence Approaches
Artificial Intelligence  ApproachesArtificial Intelligence  Approaches
Artificial Intelligence Approaches
Jincy Nelson
 
Improving AI Development - Dave Litwiller - Jan 11 2022 - Public
Improving AI Development - Dave Litwiller - Jan 11 2022 - PublicImproving AI Development - Dave Litwiller - Jan 11 2022 - Public
Improving AI Development - Dave Litwiller - Jan 11 2022 - Public
Dave Litwiller
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new pptSalford Systems
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
Machine Learning_Unit 2_Full.ppt.pdf
Machine Learning_Unit 2_Full.ppt.pdfMachine Learning_Unit 2_Full.ppt.pdf
Machine Learning_Unit 2_Full.ppt.pdf
Dr.DHANALAKSHMI SENTHILKUMAR
 
ML_Module_1.pdf
ML_Module_1.pdfML_Module_1.pdf
ML_Module_1.pdf
JafarHussain48
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
Subrat Panda, PhD
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
Si Krishan
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
SwatiTripathi44
 
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
Lucas Jellema
 
Optimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptxOptimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptx
MurindanyiSudi1
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
Aun Akbar
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
SVasuKrishna1
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
Valerii Klymchuk
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
DurgaDevi310087
 

Similar to Introduction to Machine Learning (20)

Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
 
Artificial Intelligence Approaches
Artificial Intelligence  ApproachesArtificial Intelligence  Approaches
Artificial Intelligence Approaches
 
Improving AI Development - Dave Litwiller - Jan 11 2022 - Public
Improving AI Development - Dave Litwiller - Jan 11 2022 - PublicImproving AI Development - Dave Litwiller - Jan 11 2022 - Public
Improving AI Development - Dave Litwiller - Jan 11 2022 - Public
 
Informs presentation new ppt
Informs presentation new pptInforms presentation new ppt
Informs presentation new ppt
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Machine Learning_Unit 2_Full.ppt.pdf
Machine Learning_Unit 2_Full.ppt.pdfMachine Learning_Unit 2_Full.ppt.pdf
Machine Learning_Unit 2_Full.ppt.pdf
 
ML_Module_1.pdf
ML_Module_1.pdfML_Module_1.pdf
ML_Module_1.pdf
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
 
Optimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptxOptimal Model Complexity (1).pptx
Optimal Model Complexity (1).pptx
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
lec1.ppt
lec1.pptlec1.ppt
lec1.ppt
 
02 Related Concepts
02 Related Concepts02 Related Concepts
02 Related Concepts
 
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
MACHINE LEARNING INTRODUCTION DIFFERENCE BETWEEN SUOERVISED , UNSUPERVISED AN...
 

Recently uploaded

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Introduction to Machine Learning

  • 1. HANDS-ON ONLINE TRAINING ON DATA SCIENCE KNOWLEDGE AND SKILLS FORUM INTRODUCTION TO MACHINE LEARNING FAITHFUL ONWUEGBUCHE OLUWASEUN ODEYEMI AUGUSTINE OKOLIE
  • 2. WHAT IS MACHINE LEARNING • Machine learning is a type of artificial intelligence (AI) that enables computers to learn and make decisions without being explicitly programmed. • Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
  • 3. THINK AND LEARN LIKE A BABY
  • 4. MACHINE LEARNING VS. STATISTICS VS. COMPUTER SCIENCE Machine Learning Statistics Computer Science Objective Focuses on learning from data to make predictions or decisions without being explicitly programmed. It prioritizes prediction accuracy and generalizability. Aims to infer properties of an underlying distribution from a data sample. It emphasizes understanding and interpreting data and probabilistic models. Focuses on the creation and application of algorithms to manipulate, store, and communicate digital information. Methodologie s Typically uses complex models (like neural networks) and large amounts of data to train models for prediction. Utilizes both supervised and unsupervised learning methods. Often employs simpler, more interpretable models. Focuses on hypothesis testing, experimental design, estimation, and mathematical analysis. Involves algorithm design, data structures, computation theory, computer architecture, software development, and more. Validation Measures model performance through methods like cross- validation and seeks to improve generalization to unseen data. Validates models using methods such as confidence intervals, p- values, and hypothesis tests to quantify uncertainty. Uses formal methods for verifying correctness, analyzing computational complexity, and proving algorithmic bounds. Primary Concern Creating models that can learn from and make decisions or predictions based on data. Drawing valid conclusions and quantifying uncertainty about observed data and underlying distributions. Creating efficient algorithms and data structures to solve computational problems.
  • 5. TYPES OF MACHINE LEARNING Three Types of Problems • Supervised • Unsupervised • Reinforcement
  • 6. SUPERVISE D • Trained using labeled examples • Desired output is known • Methods include classification, regression, etc. • Uses patterns to predict the values of the label on additional unlabeled data • Algorithms: • Linear regression • Logistic regression • K-Nearest Neighbors (KNN) • Decision Trees and Random Forests • Support Vector Machines • Naive Bayes • Neural Networks
  • 7. UNSUPERVIS ED • Used against data that has no historical labels • Desired output is unknown • Goal is to explore the data and find some structure within the data • Algorithms • Anomaly detection • K-means clustering • Hierarchical clustering • DBSCAN • Principal Component Analysis (PCA) • Neural Networks
  • 8. REINFORCEMEN T • Algorithm discovers through trial and error which actions yield the greatest rewards. • Three primary components: • the agent (the learner or decision maker), • the environment (everything the agent interacts with) • actions (what the agent can do). • Objective: the agent chooses actions that maximize the expected reward over a given amount of time. • Algorithms • Markov Decision Process • Q-Learning • Deep Q Network (DQN)
  • 9. WHY USE IT? • Machine learning based models can extract patterns from massive amounts of data which humans cannot do because • We cannot retain everything in memory or we cannot perform obvious/redundant computations for hours and days to come up with interesting patterns. • “Humans can typically create one or two good models in a week; machine learning can create thousands of models in a week” (Thomas H. Davenport) • Solve problems we simply could not before
  • 10. USE CASES • Email spam filter • Recommendation systems • Self driving car • Finance • Image Recognition • Competitive machines
  • 11. TYPICAL MACHINE LEARNING PROCESS Source: INTRODUCING AZURE MACHINE LEARNING, pg. 5
  • 12. TO GIVE CREDIT, OR NOT TO GIVE CREDIT • You are asked by your boss while working at Big Bank Inc. to develop an automated decision maker on whether to give a potential client credit or not.
  • 13. WHAT IS THE QUESTION? • What question are we trying to answer here? What problem are we looking to solve?
  • 14. SELECTING DATA Feature • An individual measurable property of a phenomenon being observed • Best found through industry experts
  • 15. SELECTING DATA Feature Extraction • Feature extraction is a general term for methods of constructing combinations of the variables to get around certain problems while still describing the data with sufficient accuracy. • Analysis with a large number of variables generally requires a large amount of memory and computation. • Reducing the amount of resources required to describe a large set of data
  • 16. SELECTING DATA PCA (Principal ComponentAnalysis) • We have a huge list of different features • Many of them will measure related properties and so will be redundant • Summarize with less features
  • 17. PREPARING DATA • Cleaning • Units • Missing Values • Metadata
  • 18. DEVELOPING MODEL • What is the problem being solved? • What is the goal of the model? • Minimize error on the “training” data • Training data is the data used to train the model (all of it but the part we removed)
  • 19. DEVELOPING MODEL Linear Model • Relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data • Complex way of saying the model draws a line between two categories (classification) or to estimate a value (regression) • Linear regression the most common form of linear model
  • 20. DEVELOPING MODEL Non-linear Model • A nonlinear model describes nonlinear relationships in experimental data • The parameters can take the form of an exponential, trigonometric, power, or any other nonlinear function
  • 21. DEVELOPING MODEL Overfitting vs. Bias in Machine Learning Overfitting​ Bias​ Definition​ Overfitting occurs when a model fits the data more than is warranted. It captures the noise along with the underlying pattern in the data.​ Bias is error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss relevant relations between features and target outputs.​ Consequence​ Overfitting leads to a smaller error on the training data set but a larger one on unseen data, reducing the model's ability to generalize.​ High bias often leads to underfitting, where the model oversimplifies the data and doesn't capture its complexity.​ Example​ Creating a very complex decision tree that classifies each training instance perfectly, but performs poorly on unseen data.​ Fitting a quadratic dataset using a linear model – the model will consistently fail to capture the true relationship and make errors.​
  • 22. DEVELOPING MODEL Overfitting vs. Bias in Machine Learning
  • 24. DEVELOPING MODEL Rule Addition • Minimize error on the “training” data • AND make sure that error in the “unseen” data is close to error in the “training” data
  • 25. DEVELOPING A MODEL • Keep it Simple • Go for simpler models over more complicated models • Generally, the fewer parameters that you have to tune the better • Cross-Validation • K-fold cross validation is a great way to estimate error on training data • Regularization • Can sometimes help penalize certain sources of overfitting. • LASSO • Forces the sum of the absolute value of coefficients to be less than a fixed value • Effectively choosing a simpler model
  • 26. DEVELOPIN G A MODEL Data Snooping or Data Dredging • It's a form of bias that arises when you make decisions based on the same data you've used to train and test your model. • “If a data set has affected any step in the learning process, its ability to access the outcome has been compromised” • Experimenting • Reuse of the same data set to determine quality of model • Once a data set has been used to test the performance of a data set, it should be considered contaminated Source: Learning From Data, pg. 1
  • 27. INTERPRETING RESULTS • Validation • Cross validation • Test set • Once the test set has been used, you must find new data!

Editor's Notes

  1. Coin: that’s a quarter, that’s a dime Unsupervised: That’s a cluster, that’s another cluster Renforcement:
  2. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
  3. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
  4. In WWII, Allied bombers were key to strategic attacks, yet these lumbering giants were constantly shot down over enemy territory. The planes needed more armor, but armor is heavy. So extra plating could only go where the planes were being shot the most. A man named Abraham Wald, a Jewish mathematician who’d been locked out of university positions and ultimately fled the persecution in his own home country of Hungary, was brought in to oversee the operation. He started with a simple diagram—the outline of a plane—and he marked bullet holes corresponding to where each returning bomber had been shot. The result was the anatomy of common plane damage. The wings, nose, and tail were blackened with bullet holes, so these were the spots that needed more armor.
  5. Draw on board
  6. Draw example on board, why ever use linear then?
  7. Draw example on board
  8. More data