Machine learning
Outline
2
Machine Learning
Types of Machine Learning Problems
Steps to solve a MachineLearning Problem
What is aCat?
What is aCat?
Occlusion Diversity Deformation Lighting variations
4
What is MachineLearning?
Introduction to Machine Learning
The subfield of computer science that “gives computers the ability to learn
without being explicitly programmed”.
(Arthur Samuel, 1959)
Acomputer program is said to learn from experience Ewith respect to
some class of tasks Tand performance measure Pif its performance at
tasks in T,as measured by P
,improves with experience E.”
(TomMitchell, 1997)
Using data for answeringquestions
Training Predicting
The Big DataEra
Introduction toMachine Learning
Data alreadyavailable
everywhere
Lowstorage costs: everyone
has several GBs for“free”
Hardware more powerfuland
cheaper than everbefore
Everyone has a computer
fullypacked withsensors:
• GPS
• Cameras
• Microphones
Permanentlyconnected to
Internet
• YouTube
• Gmail
• Facebook
• Twitter
Services
CloudComputing:
• Onlinestorage
• Infrastructure as a
Service
User applications:
Devices
Data
Types of Machine LearningProblems
Introduction to Machine Learning
Supervised
Unsupervised
Reinforcement
Typesof Machine LearningProblems
Introduction to Machine Learning
Supervised
Is this a cat or a dog?
Are these emails spam or not?
Unsupervised
Predict the market value of houses, given the
square meters, number of rooms, neighborhood,
etc.
Reinforcement
Learn through examples of which we knowthe
desired output (what we want topredict).
Typesof Machine LearningProblems
Introduction to Machine Learning
Supervised
Unsupervised
Reinforcement
Output is a discrete
variable (e.g., cat/dog)
Classification
Regression
Output is continuous
(e.g., price, temperature)
Unsupervised
Typesof Machine LearningProblems
Introduction to Machine Learning
Supervised
There is no desired output. Learn somethingabout
the data. Latent relationships.
I want to find anomalies in the credit cardusage
patterns of my customers.
Reinforcement
I have photos and want to put them in 20
groups.
Unsupervised
Typesof Machine LearningProblems
Introduction to Machine Learning
Supervised
Reinforcement
Useful for learning structure in the data(clustering),
hidden correlations, reduce dimensionality,etc.
Environment gives feedback via a positiveor
negative reward signal.
Unsupervised
Reinforcement
Typesof Machine LearningProblems
Introduction to Machine Learning
Supervised An agent interacts with an environment andwatches
the result of the interaction.
IntroductiontoMachineLearning
StepstoSolve a MachineLearning Problem
Data
Gathering
Collect data from
various sources
Data
Preprocessing
Clean data tohave
homogeneity
Feature
Engineering
Making yourdata
more useful
Algorithm Selection
& Training
Selecting the right
machine learning model
Making
Predictions
Evaluate the
model
DataGathering
Might depend on humanwork
• Manual labeling for supervised learning.
• Domain knowledge. Maybe evenexperts.
May come for free, or “sortof”
• E.g., Machine Translation.
The more the better: Some algorithms need large amounts of data to be
useful (e.g., neural networks).
The quantity and quality of data dictate the modelaccuracy
Introduction to Machine Learning
DataPreprocessing
Is there anything wrong with thedata?
• Missing values
• Outliers
• Bad encoding (fortext)
• Wrongly-labeled examples
• Biased data
• Do I have many more samples of one class
than the rest?
Need to fix/remove data?
Introduction to Machine Learning
Introduction to Machine Learning
FeatureEngineering
What is a feature?
Afeature is an individual measurable
property of a phenomenon being observed
Our inputs are represented by a setof features.
T
oclassify spam email, features couldbe:
• Number of words that have beench4ng3d
like this.
• Language of the email (0=English,
1=Spanish)
• Number of emojis
Buy ch34p drugs
from the ph4rm4cy
now :) :) :)
(2, 0, 3)
Feature
engineering
Introduction to Machine Learning
FeatureEngineering
Extract more information from existing data, not adding “new”dataper-se
• Making it more useful
• With good features, most algorithms can learnfaster
It can be an art
• Requires thought and knowledge of thedata
T
wo steps:
• Variable transformation (e.g., dates into weekdays, normalizing)
• Feature creation (e.g., n-grams for texts, if word is capitalizedto
detect names, etc.)
Introduction to Machine Learning
Algorithm Selection &Training
Supervised
• Linear classifier
• Naive Bayes
• Support Vector Machines (SVM)
• Decision Tree
• Random Forests
• k-Nearest Neighbors
• Neural Networks (Deeplearning)
Unsupervised
• PCA
• t-SNE
• k-means
• DBSCAN
Reinforcement
• SARSA–λ
• Q-Learning
Goal of training: making the correct prediction as often as possible
• Incremental improvement:
• Use of metrics for evaluating performance and comparingsolutions
• Hyperparameter tuning: more an art than ascience
Introduction to Machine Learning
Algorithm Selection &Training
Predict Adjust
Introduction to Machine Learning
Making Predictions
Feature
extraction
Machine
Learning
model
Samples
Labels
Features
Feature
extraction
Input Features Trained
classifier
Label
Training Phase
Prediction Phase
Summary
• Machine Learning is intelligent use of data to answer questions
• Enabled by an exponential increase in computing power anddata
availability
• Three big types of problems: supervised, unsupervised,reinforcement
• 5 stepsto every machine learning solution:
1. Data Gathering
2. Data Preprocessing
3. Feature Engineering
4. Algorithm Selection & Training
5. Making Predictions
Introduction to Machine Learning

ML basics.pptx

  • 1.
  • 2.
    Outline 2 Machine Learning Types ofMachine Learning Problems Steps to solve a MachineLearning Problem
  • 3.
  • 4.
    What is aCat? OcclusionDiversity Deformation Lighting variations 4
  • 5.
    What is MachineLearning? Introductionto Machine Learning The subfield of computer science that “gives computers the ability to learn without being explicitly programmed”. (Arthur Samuel, 1959) Acomputer program is said to learn from experience Ewith respect to some class of tasks Tand performance measure Pif its performance at tasks in T,as measured by P ,improves with experience E.” (TomMitchell, 1997) Using data for answeringquestions Training Predicting
  • 6.
    The Big DataEra IntroductiontoMachine Learning Data alreadyavailable everywhere Lowstorage costs: everyone has several GBs for“free” Hardware more powerfuland cheaper than everbefore Everyone has a computer fullypacked withsensors: • GPS • Cameras • Microphones Permanentlyconnected to Internet • YouTube • Gmail • Facebook • Twitter Services CloudComputing: • Onlinestorage • Infrastructure as a Service User applications: Devices Data
  • 7.
    Types of MachineLearningProblems Introduction to Machine Learning Supervised Unsupervised Reinforcement
  • 8.
    Typesof Machine LearningProblems Introductionto Machine Learning Supervised Is this a cat or a dog? Are these emails spam or not? Unsupervised Predict the market value of houses, given the square meters, number of rooms, neighborhood, etc. Reinforcement Learn through examples of which we knowthe desired output (what we want topredict).
  • 9.
    Typesof Machine LearningProblems Introductionto Machine Learning Supervised Unsupervised Reinforcement Output is a discrete variable (e.g., cat/dog) Classification Regression Output is continuous (e.g., price, temperature)
  • 10.
    Unsupervised Typesof Machine LearningProblems Introductionto Machine Learning Supervised There is no desired output. Learn somethingabout the data. Latent relationships. I want to find anomalies in the credit cardusage patterns of my customers. Reinforcement I have photos and want to put them in 20 groups.
  • 11.
    Unsupervised Typesof Machine LearningProblems Introductionto Machine Learning Supervised Reinforcement Useful for learning structure in the data(clustering), hidden correlations, reduce dimensionality,etc.
  • 12.
    Environment gives feedbackvia a positiveor negative reward signal. Unsupervised Reinforcement Typesof Machine LearningProblems Introduction to Machine Learning Supervised An agent interacts with an environment andwatches the result of the interaction.
  • 13.
    IntroductiontoMachineLearning StepstoSolve a MachineLearningProblem Data Gathering Collect data from various sources Data Preprocessing Clean data tohave homogeneity Feature Engineering Making yourdata more useful Algorithm Selection & Training Selecting the right machine learning model Making Predictions Evaluate the model
  • 14.
    DataGathering Might depend onhumanwork • Manual labeling for supervised learning. • Domain knowledge. Maybe evenexperts. May come for free, or “sortof” • E.g., Machine Translation. The more the better: Some algorithms need large amounts of data to be useful (e.g., neural networks). The quantity and quality of data dictate the modelaccuracy Introduction to Machine Learning
  • 15.
    DataPreprocessing Is there anythingwrong with thedata? • Missing values • Outliers • Bad encoding (fortext) • Wrongly-labeled examples • Biased data • Do I have many more samples of one class than the rest? Need to fix/remove data? Introduction to Machine Learning
  • 16.
    Introduction to MachineLearning FeatureEngineering What is a feature? Afeature is an individual measurable property of a phenomenon being observed Our inputs are represented by a setof features. T oclassify spam email, features couldbe: • Number of words that have beench4ng3d like this. • Language of the email (0=English, 1=Spanish) • Number of emojis Buy ch34p drugs from the ph4rm4cy now :) :) :) (2, 0, 3) Feature engineering
  • 17.
    Introduction to MachineLearning FeatureEngineering Extract more information from existing data, not adding “new”dataper-se • Making it more useful • With good features, most algorithms can learnfaster It can be an art • Requires thought and knowledge of thedata T wo steps: • Variable transformation (e.g., dates into weekdays, normalizing) • Feature creation (e.g., n-grams for texts, if word is capitalizedto detect names, etc.)
  • 18.
    Introduction to MachineLearning Algorithm Selection &Training Supervised • Linear classifier • Naive Bayes • Support Vector Machines (SVM) • Decision Tree • Random Forests • k-Nearest Neighbors • Neural Networks (Deeplearning) Unsupervised • PCA • t-SNE • k-means • DBSCAN Reinforcement • SARSA–λ • Q-Learning
  • 19.
    Goal of training:making the correct prediction as often as possible • Incremental improvement: • Use of metrics for evaluating performance and comparingsolutions • Hyperparameter tuning: more an art than ascience Introduction to Machine Learning Algorithm Selection &Training Predict Adjust
  • 20.
    Introduction to MachineLearning Making Predictions Feature extraction Machine Learning model Samples Labels Features Feature extraction Input Features Trained classifier Label Training Phase Prediction Phase
  • 21.
    Summary • Machine Learningis intelligent use of data to answer questions • Enabled by an exponential increase in computing power anddata availability • Three big types of problems: supervised, unsupervised,reinforcement • 5 stepsto every machine learning solution: 1. Data Gathering 2. Data Preprocessing 3. Feature Engineering 4. Algorithm Selection & Training 5. Making Predictions Introduction to Machine Learning