Machine Learning and Research
Overview
AICTE SPONSORED 2 Weeks FDP on
Artificial Intelligence and Advanced Machine
Learning using Data Science
Venue: S A Engg College    Date: 26.11.2019
Dr. A. Kathirvel,
Professor and Head
Misrimal Navajee Munoth Jain Engineering College, Chennai
Outline & Content
 What is machine learning?
 Learning system model
 Training and testing
 Performance
 Algorithms
 Machine learning structure
 What are we seeking?
 Learning techniques
 Applications
 Conclusion
Why “Learn”?
 Machine learning is programming computers to
optimize a performance criterion using example
data or past experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech
recognition)
 Solution changes in time (routing on a computer
network)
 Solution needs to be adapted to particular cases (user
biometrics)
What & When We Talk About “Learning”
 Learning general models from data of particular
examples
 Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
 Example in retail: from customer transactions to
consumer behavior:
People who bought “The Da Vinci Code” also bought “The Five
People You Meet in Heaven” (www.amazon.com)
 Build a model that is a good and useful approximation
to the data.
Data Mining/KDD
Definition := “KDD is the non-trivial process of identifying valid,
novel, potentially useful, and ultimately understandable patterns in
data” (Fayyad)
Applications:
 Retail: Market basket analysis, Customer relationship
management (CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Optimization, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Quality of service optimization
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
What is Machine Learning?
 Machine Learning
 Study of algorithms that
 improve their performance
 at some task
 with experience
 Optimize a performance criterion using example
data or past experience.
 Role of Statistics: Inference from a sample
 Role of Computer science: Efficient algorithms to
 Solve the optimization problem
 Represent and evaluate the model for inference
What is machine learning?
 A branch of artificial intelligence, concerned with
the design and development of algorithms that allow
computers to evolve behaviors based on empirical
data.
 As intelligence requires knowledge, it is necessary
for computers to acquire knowledge.
What is Machine Learning?
 It is very hard to write programs that solve problems like
recognizing a face.
 We don’t know what program to write because we
don’t know how our brain does it.
 Even if we had a good idea about how to do it, the
program might be horrendously complicated.
 Instead of writing a program by hand, we collect lots of
examples that specify the correct output for a given input.
 A machine learning algorithm then takes these examples
and produces a program that does the job.
 The program produced by the learning algorithm may
look very different from a typical hand-written program.
It may contain millions of numbers.
 If we do it right, the program works for new cases as
well as the ones we trained it on.
Machine Learning is…
Machine learning, a branch of artificial intelligence,
concerns the construction and study of systems
that can learn from data.
Machine Learning is…
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
-- Ethem Alpaydin
The goal of machine learning is to develop methods that can
automatically detect patterns in data, and then to use the uncovered
patterns to predict future data or other outcomes of interest.
-- Kevin P. Murphy
The field of pattern recognition is concerned with the automatic
discovery of regularities in data through the use of computer
algorithms and with the use of these regularities to take actions.
-- Christopher M. Bishop
Machine Learning is…
Machine learning is about predicting the future
based on the past.
-- Hal Daume III
Machine Learning is…
Machine learning is about predicting the future based on
the past.
-- Hal Daume III
[Diagram: training data → model/predictor (learned from the past);
the same model/predictor is then applied to testing data (the future).]
Learning system model
[Diagram: input samples → learning method → system; the system is
first trained, then tested.]
Training and testing
[Diagram: a universal set (unobserved) is split into a training set
(observed), used during data acquisition, and a testing set
(unobserved), met in practical usage.]
Training and testing
 Training is the process of making the system able to learn.
 No free lunch rule:
 Training set and testing set come from the same distribution
 Need to make some assumptions or bias
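To make the training/testing split concrete, here is a minimal sketch (assuming scikit-learn and its bundled iris dataset; the 30% split is an arbitrary illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the observed data to stand in for the "unobserved" set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # training: learn from observed data
print(model.score(X_test, y_test))     # testing: accuracy on held-out data
```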
Performance
 There are several factors affecting the performance:
 Types of training provided
 The form and extent of any initial background
knowledge
 The type of feedback provided
 The learning algorithms used
 Two important factors:
 Modeling
 Optimization
Algorithms
 The success of a machine learning system also
depends on the algorithms.
 The algorithms control the search to find and
build the knowledge structures.
 The learning algorithms should extract useful
information from training examples.
Algorithms
 Supervised learning
 Prediction
 Classification (discrete labels), Regression (real values)
 Unsupervised learning
 Clustering
 Probability distribution estimation
 Finding association (in features)
 Dimension reduction
 Semi-supervised learning
 Reinforcement learning
 Decision making (robot, chess machine)
Algorithms
[Figure: labeled and unlabeled data examples illustrating supervised,
unsupervised, and semi-supervised learning.]
Supervised learning
Supervised learning: given labeled examples
[Figure: a pool of examples, each paired with a label (label1, label3,
label4, label5, ...).]
Supervised learning
Supervised learning: given labeled examples
[Figure: the labeled examples are fed into a model/predictor.]
Supervised learning
Supervised learning: learn to predict new example
[Figure: the trained model/predictor outputs a predicted label for a
new example.]
Supervised learning: classification
Supervised learning: given labeled examples
[Figure: example images labeled apple, apple, banana, banana.]
Classification: a finite set of labels
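A toy version of this apple/banana setup; the features and their values are invented purely for illustration (assumes scikit-learn):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up features per fruit: [weight in grams, elongation ratio].
X = [[150, 0.30], [170, 0.35], [120, 0.90], [130, 0.95]]
y = ["apple", "apple", "banana", "banana"]   # labels from a finite set

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[160, 0.40]]))            # predicted label for a new example
```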
Types of learning task
 Supervised learning
 Learn to predict output when given an input vector
 Who provides the correct answer?
 Reinforcement learning
 Learn action to maximize payoff
 Not much information in a payoff signal
 Payoff is often delayed
 Reinforcement learning is an important area that will not be
covered in this course.
 Unsupervised learning
 Create an internal representation of the input e.g. form clusters;
extract features
 How do we know if a representation is good?
 This is the new frontier of machine learning because most big
datasets do not come with labels.
Classification Example
Differentiate between low-risk and high-risk customers
from their income and savings
Classification Applications
Face recognition
Character recognition
Spam detection
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical
and/or behavioral characteristics: Face, iris, signature, etc.
Supervised learning: regression
Supervised learning: given labeled examples
[Figure: examples paired with real-valued labels (-4.5, 10.1, 3.2, 4.3).]
Regression: label is real-valued
Regression Example
Price of a used car:
x : car attributes (e.g. mileage)
y : price
y = wx + w0
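Fitting y = wx + w0 by least squares takes one call; the mileage/price pairs below are invented for illustration (assumes numpy):

```python
import numpy as np

x = np.array([20e3, 40e3, 60e3, 80e3, 100e3])  # mileage in km (made up)
y = np.array([18e3, 15e3, 12.5e3, 10e3, 8e3])  # price (made up)

w, w0 = np.polyfit(x, y, deg=1)   # least-squares line y = w*x + w0
print(w, w0)
print(w * 50e3 + w0)              # predicted price at 50,000 km
```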
Regression Applications
❑Economics/Finance: predict the value of a stock
❑Epidemiology
❑Car/plane navigation: angle of the steering wheel,
acceleration, …
❑Temporal trends: weather over time
❑…
Supervised learning: ranking
Supervised learning: given labeled examples
[Figure: examples paired with rank labels (1, 4, 2, 3).]
Ranking: label is a ranking
Ranking example
Given a query and a set of web pages, rank them according to relevance
Ranking Applications
❑User preference, e.g. Netflix “My List” -- movie
queue ranking
❑iTunes
❑flight search (search in general)
❑reranking N-best output lists
❑…
Unsupervised learning
Unsupervised learning: given data, i.e. examples, but no labels
Unsupervised learning applications
❑learn clusters/groups without any label
❑customer segmentation (i.e. grouping)
❑image compression
❑bioinformatics: learn motifs
❑…
Reinforcement learning
Example action sequences and their delayed rewards:
left, right, straight, left, left, left, straight → GOOD (reward 18.5)
left, straight, straight, left, right, straight, straight → BAD (reward -3)
Given a sequence of examples/states and a reward after
completing that sequence, learn to predict the action to take
for an individual example/state
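One naive way to push a delayed, sequence-level reward back onto individual actions is to let every action in a sequence share that sequence's reward and then average; a deliberately simplistic sketch of this credit-assignment idea, using the two sequences above:

```python
from collections import defaultdict

episodes = [
    (["left", "right", "straight", "left", "left", "left", "straight"], 18.5),
    (["left", "straight", "straight", "left", "right", "straight", "straight"], -3.0),
]

totals, counts = defaultdict(float), defaultdict(int)
for actions, reward in episodes:
    for a in actions:            # every action in the episode shares the reward
        totals[a] += reward
        counts[a] += 1

values = {a: totals[a] / counts[a] for a in totals}
print(values)                    # rough estimate of how good each action is
```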
Reinforcement learning example
Backgammon: some sequences of moves end in … WIN!, others in … LOSE!
Given sequences of moves and whether or not the
player won at the end, learn to make good moves
Reinforcement learning example
http://www.youtube.com/watch?v=VCdxqn0fcnE
Other learning variations
What data is available:
 Supervised, unsupervised, reinforcement learning
 semi-supervised, active learning, …
How are we getting the data:
 online vs. offline learning
Type of model:
 generative vs. discriminative
 parametric vs. non-parametric
Machine learning structure
 Supervised learning
[Diagram: supervised learning pipeline.]
 Unsupervised learning
[Diagram: unsupervised learning pipeline.]
What are we seeking?
 Supervised: Low E-out or maximize probabilistic terms
 Unsupervised: Minimum quantization error, Minimum distance,
MAP, MLE (maximum likelihood estimation)
E-in: error on the training set
E-out: error on the testing set
What are we seeking?
Under-fitting vs. over-fitting (fixed N)
[Figure: training and testing error versus model complexity
(model = hypothesis + loss functions).]
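The effect is easy to reproduce: at fixed N, increasing model complexity drives E-in down while E-out eventually climbs. A sketch on synthetic data (assumes numpy; the degrees and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)                        # fixed N = 15 training points
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(15)
x_test = np.linspace(0, 1, 100)                  # dense "testing" grid
y_test = np.sin(2 * np.pi * x_test)

for deg in (1, 3, 9):                            # under-fit, decent, over-fit
    coef = np.polyfit(x, y, deg)
    e_in = np.mean((np.polyval(coef, x) - y) ** 2)
    e_out = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(deg, round(e_in, 4), round(e_out, 4))
```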
Learning techniques
 Supervised learning categories and techniques
 Linear classifier (numerical functions)
 Parametric (Probabilistic functions)
 Naïve Bayes, Gaussian discriminant analysis (GDA), Hidden Markov
models (HMM), Probabilistic graphical models
 Non-parametric (Instance-based functions)
 K-nearest neighbors, Kernel regression, Kernel density estimation,
Local regression
 Non-metric (Symbolic functions)
 Classification and regression tree (CART), decision tree
 Aggregation
 Bagging (bootstrap + aggregation), Adaboost, Random forest
Learning techniques
• Linear classifier: f(x) = sign(w·x), where w is a d-dim vector (learned)
 Techniques:
 Perceptron
 Logistic regression
 Support vector machine (SVM)
 Ada-line
 Multi-layer perceptron (MLP)
Learning techniques
Using the perceptron learning algorithm (PLA):
training error rate 0.10, testing error rate 0.156
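For reference, a compact PLA itself on toy separable data (assumes numpy; this is not the dataset behind the error rates above):

```python
import numpy as np

def pla(X, y, epochs=100):
    """Perceptron learning: on each mistake, update w <- w + y_n * x_n."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias feature
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_n, y_n in zip(X, y):
            if np.sign(w @ x_n) != y_n:        # misclassified (or on boundary)
                w += y_n * x_n
                mistakes += 1
        if mistakes == 0:                      # converged on separable data
            break
    return w

X = np.array([[2.0, 3.0], [1.0, 4.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(pla(X, y))                               # learned [bias, w1, w2]
```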
Learning techniques
Using logistic regression:
training error rate 0.11, testing error rate 0.145
Learning techniques
• Non-linear case
 Support vector machine (SVM):
 Linear to nonlinear: Feature transform and kernel function
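The kernel idea in miniature, using scikit-learn's SVC on XOR-style data that no single line separates; the RBF kernel plays the role of the implicit feature transform (gamma is an arbitrary choice):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)   # kernel = implicit feature map
print(clf.predict(X))                          # expected: [0 0 1 1]
```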
Learning techniques
 Unsupervised learning categories and techniques
 Clustering
 K-means clustering (see the sketch below)
 Spectral clustering
 Density Estimation
 Gaussian mixture model (GMM)
 Graphical models
 Dimensionality reduction
 Principal component analysis (PCA)
 Factor analysis
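A from-scratch sketch of the K-means item above (assumes numpy, and that no cluster empties out during the iterations):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        # Move each center to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # two synthetic blobs
               rng.normal(1.0, 0.1, (20, 2))])
print(kmeans(X, k=2)[0])                        # two recovered centers
```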
Applications
 Face detection
 Object detection and recognition
 Image segmentation
 Multimedia event detection
 Economic and commercial usage
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Some more examples of tasks that are best solved by
using a learning algorithm
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences (demo)
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear power plant or
unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates
Some web-based examples of machine learning
 The web contains a lot of data. Tasks with very big datasets often
use machine learning
 especially if the data is noisy or non-stationary.
 Spam filtering, fraud detection:
 The enemy adapts so we must adapt too.
 Recommendation systems:
 Lots of noisy data. Million dollar prize!
 Information retrieval:
 Find documents or images with similar content.
 Data Visualization:
 Display a huge database in a revealing way (demo)
Displaying the structure of a set of documents
using Latent Semantic Analysis (a form of PCA)
Each document is converted to a vector of word counts. This
vector is then mapped to two coordinates and displayed as a
colored dot. The colors represent the hand-labeled classes.
When the documents are laid out in 2-D, the classes are not
used. So we can judge how good the algorithm is by seeing if the
classes are separated.
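The pipeline this slide describes, sketched with scikit-learn's TruncatedSVD (its LSA implementation) on four made-up documents; a real setup would use far more text and typically TF-IDF weighting:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

docs = ["stocks fall as markets wobble",
        "team wins the championship game",
        "markets rally on bank earnings",
        "player scores twice in the final game"]

counts = CountVectorizer().fit_transform(docs)   # word-count vector per document
coords = TruncatedSVD(n_components=2).fit_transform(counts)
print(coords)                                    # one 2-D point per document
```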
Displaying the structure of a set of documents
using a deep neural network
Machine Learning & Symbolic AI
 Knowledge Representation works with facts/assertions and
develops rules of logical inference. The rules can handle
quantifiers. Learning and uncertainty are usually ignored.
 Expert Systems used logical rules or conditional probabilities
provided by “experts” for specific domains.
 Graphical Models treat uncertainty properly and allow learning
(but they often ignore quantifiers and use a fixed set of variables)
 Set of logical assertions → values of a subset of the variables
and local models of the probabilistic interactions between
variables.
 Logical inference → probability distributions over subsets of
the unobserved variables (or individual ones)
 Learning = refining the local models of the interactions.
Machine Learning & Statistics
 A lot of machine learning is just a rediscovery of things that
statisticians already knew. This is often disguised by differences
in terminology:
 Ridge regression = weight-decay
 Fitting = learning
 Held-out data = test data
 But the emphasis is very different:
 A good piece of statistics: Clever proof that a relatively
simple estimation procedure is asymptotically unbiased.
 A good piece of machine learning: Demonstration that a
complicated algorithm produces impressive results on a
specific task.
 Data-mining: Using very simple machine learning techniques
on very large databases because computers are too slow to
do anything more interesting with ten billion examples.
A spectrum of machine learning tasks
Statistics ------------------------------------ Artificial Intelligence
At the statistics end:
 Low-dimensional data (e.g. less than 100 dimensions)
 Lots of noise in the data
 There is not much structure in the data, and what structure
there is can be represented by a fairly simple model.
 The main problem is distinguishing true structure from noise.
At the artificial intelligence end:
 High-dimensional data (e.g. more than 100 dimensions)
 The noise is not sufficient to obscure the structure in the data
if we process it right.
 There is a huge amount of structure in the data, but the
structure is too complicated to be represented by a simple model.
 The main problem is figuring out a way to represent the
complicated structure that allows it to be learned.
So What Is Machine Learning?
 Automating automation
 Getting computers to program themselves
 Writing software is the bottleneck
 Let the data do the work instead!
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
Magic?
No, more like gardening
 Seeds = Algorithms
 Nutrients = Data
 Gardener = You
 Plants = Programs
Sample Applications
 Web search
 Computational biology
 Finance
 E-commerce
 Space exploration
 Robotics
 Information extraction
 Social networks
 Debugging
 [Your favorite area]
Growth of Machine Learning
 Machine learning is the preferred approach to
 Speech recognition, Natural language processing
 Computer vision
 Medical outcomes analysis
 Robot control
 Computational biology
 This trend is accelerating
 Improved machine learning algorithms
 Improved data capture, networking, faster computers
 Software too complex to write by hand
 New sensors / IO devices
 Demand for self-customization to user, environment
 It turns out to be difficult to extract knowledge from human experts →
the failure of expert systems in the 1980’s.
Applications
 Association Analysis
 Supervised Learning
 Classification
 Regression/Prediction
 Unsupervised Learning
 Reinforcement Learning
Learning Associations
 Basket analysis:
P(Y | X): the probability that somebody who buys X
also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
Market-Basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
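Estimating such a conditional probability directly from the five transactions above takes a few lines (a sketch; real basket analysis would also check support and use an algorithm such as Apriori):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def conf(x, y, baskets):
    """Estimate P(Y | X) as count(X and Y together) / count(X)."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x)

print(conf("Beer", "Diaper", transactions))   # 3/3 = 1.0 on this tiny table
```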
Classification
 Example: Credit scoring
 Differentiating between low-risk and high-risk
customers from their income and savings
Model (discriminant): IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
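That discriminant reads directly as code; in this sketch the thresholds θ1 and θ2 are illustrative placeholders, not fitted values:

```python
THETA1, THETA2 = 30_000, 10_000   # placeholder thresholds; a learner would fit these

def risk(income, savings):
    """IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(risk(45_000, 12_000))   # low-risk
print(risk(45_000, 5_000))    # high-risk
```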
Classification: Applications
 Aka pattern recognition
 Face recognition: Pose, lighting, occlusion (glasses, beard),
make-up, hair style
 Character recognition: Different handwriting styles.
 Speech recognition: Temporal dependency.
 Use of a dictionary or the syntax of the language.
 Sensor fusion: Combine multiple modalities; e.g., visual
(lip image) and acoustic for speech
 Medical diagnosis: From symptoms to illnesses
 Web Advertising: Predict if a user clicks on an ad on the
Internet.
Face Recognition
Training examples of a person
Test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Prediction: Regression
 Example: Price of a used car
 x : car attributes
y : price
y = g(x | θ)
g( ): model, θ: parameters
y = wx + w0
Regression Applications
 Navigating a car: Angle of the steering wheel
(CMU NavLab)
 Kinematics of a robot arm: from a target position (x, y),
predict the joint angles α1 = g1(x, y) and α2 = g2(x, y)
Supervised Learning: Uses
 Prediction of future cases: Use the rule to
predict the output for future inputs
 Knowledge extraction: The rule is easy to
understand
 Compression: The rule is simpler than the
data it explains
 Outlier detection: Exceptions that are not
covered by the rule, e.g., fraud
Example: decision-tree tools create rules
Unsupervised Learning
 Learning “what normally happens”
 No output
 Clustering: Grouping similar instances
 Other applications: Summarization, Association
Analysis
 Example applications
 Customer segmentation in CRM
 Image compression: Color quantization
 Bioinformatics: Learning motifs
Reinforcement Learning
 Topics:
 Policies: what actions should an agent take in a particular
situation
 Utility estimation: how good is a state (→ used by policy)
 No supervised output but delayed reward
 Credit assignment problem (what was responsible for the
outcome)
 Applications:
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
Resources: Datasets
 UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
 UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
 Statlib: http://lib.stat.cmu.edu/
 Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
 Journal of Machine Learning Research
www.jmlr.org
 Machine Learning
 IEEE Transactions on Neural Networks
 IEEE Transactions on Pattern Analysis and Machine
Intelligence
 Annals of Statistics
 Journal of the American Statistical Association
 ...
Resources: Conferences
 International Conference on Machine Learning (ICML)
 European Conference on Machine Learning (ECML)
 Neural Information Processing Systems (NIPS)
 Conference on Learning Theory (COLT)
 International Joint Conference on Artificial Intelligence
(IJCAI)
 ACM SIGKDD Conference on Knowledge Discovery and
Data Mining (KDD)
 IEEE Int. Conf. on Data Mining (ICDM)
ML in a Nutshell
 Tens of thousands of machine learning
algorithms
 Hundreds new every year
 Every machine learning algorithm has three
components:
 Representation
 Evaluation
 Optimization
Representation
 Decision trees
 Sets of rules / Logic programs
 Instances
 Graphical models (Bayes/Markov nets)
 Neural networks
 Support vector machines
 Model ensembles
 Etc.
Evaluation
 Accuracy
 Precision and recall
 Squared error
 Likelihood
 Posterior probability
 Cost / Utility
 Margin
 Entropy
 K-L divergence
 Etc.
Optimization
 Combinatorial optimization
 E.g.: Greedy search
 Convex optimization
 E.g.: Gradient descent
 Constrained optimization
 E.g.: Linear programming
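As a concrete instance of the convex-optimization bullet, bare-bones gradient descent on a simple quadratic (a sketch with a made-up objective, not a production optimizer):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the objective."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x[0] - 3)^2 + (x[1] + 1)^2; its gradient is 2*(x - [3, -1]).
minimum = gradient_descent(lambda x: 2 * (x - np.array([3.0, -1.0])), [0.0, 0.0])
print(minimum)   # approximately [ 3. -1.]
```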
Conclusion
We have given a simple overview of some
techniques and algorithms in machine learning.
Furthermore, more and more applications are
adopting machine learning as a solution.
In the future, machine learning will play an
important role in our daily life.
