Machine Learning and Research
Overview
AICTE SPONSORED 2 Weeks FDP on
Artificial Intelligence and Advanced Machine
Learning using Data Science
Venue: S A Engg College    Date: 26.11.2019
Dr. A. Kathirvel,
Professor and Head
Misrimal Navajee Munoth Jain Engineering College, Chennai
Outline & Content
 What is machine learning?
 Learning system model
 Training and testing
 Performance
 Algorithms
 Machine learning structure
 What are we seeking?
 Learning techniques
 Applications
 Conclusion
Why “Learn”?
 Machine learning is programming computers to
optimize a performance criterion using example
data or past experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech
recognition)
 Solution changes in time (routing on a computer
network)
 Solution needs to be adapted to particular cases (user
biometrics)
What & When We Talk About “Learning”
 Learning general models from data of particular
examples
 Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
 Example in retail: from customer transactions to
consumer behavior:
People who bought “The Da Vinci Code” also bought “The Five
People You Meet in Heaven” (www.amazon.com)
 Build a model that is a good and useful approximation
to the data.
Data Mining/KDD
Definition := “KDD is the non-trivial process of identifying valid,
novel, potentially useful, and ultimately understandable patterns in
data” (Fayyad)
Applications:
 Retail: Market basket analysis, Customer relationship
management (CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Optimization, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Quality of service optimization
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
What is Machine Learning?
 Machine Learning
 Study of algorithms that
 improve their performance
 at some task
 with experience
 Optimize a performance criterion using example
data or past experience.
 Role of Statistics: Inference from a sample
 Role of Computer science: Efficient algorithms to
 Solve the optimization problem
 Represent and evaluate the model for inference
What is machine learning?
 A branch of artificial intelligence, concerned with
the design and development of algorithms that allow
computers to evolve behaviors based on empirical
data.
 As intelligence requires knowledge, it is necessary
for computers to acquire knowledge.
What is Machine Learning?
 It is very hard to write programs that solve problems like
recognizing a face.
 We don’t know what program to write because we
don’t know how our brain does it.
 Even if we had a good idea about how to do it, the
program might be horrendously complicated.
 Instead of writing a program by hand, we collect lots of
examples that specify the correct output for a given input.
 A machine learning algorithm then takes these examples
and produces a program that does the job.
 The program produced by the learning algorithm may
look very different from a typical hand-written program.
It may contain millions of numbers.
 If we do it right, the program works for new cases as
well as the ones we trained it on.
Machine Learning is…
Machine learning, a branch of artificial intelligence,
concerns the construction and study of systems
that can learn from data.
Machine Learning is…
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
-- Ethem Alpaydin
The goal of machine learning is to develop methods that can
automatically detect patterns in data, and then to use the uncovered
patterns to predict future data or other outcomes of interest.
-- Kevin P. Murphy
The field of pattern recognition is concerned with the automatic
discovery of regularities in data through the use of computer
algorithms and with the use of these regularities to take actions.
-- Christopher M. Bishop
Machine Learning is…
Machine learning is about predicting the future
based on the past.
-- Hal Daume III
Machine Learning is…
Machine learning is about predicting the future based on
the past.
-- Hal Daume III
[Diagram: training data → model/predictor (learned from the past);
the same model/predictor is then applied to testing data (the future).]
Learning system model
[Diagram: input samples → learning method → system; the system is
first trained, then tested.]
Training and testing
[Diagram: a universal set (unobserved) is split into a training set
(observed), used during data acquisition, and a testing set
(unobserved), met in practical usage.]
Training and testing
 Training is the process of making the system able to learn.
 No free lunch rule:
 Training set and testing set come from the same distribution
 Need to make some assumptions or bias
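To make the training/testing split concrete, here is a minimal sketch (assuming scikit-learn and its bundled iris dataset; the 30% split is an arbitrary illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the observed data to stand in for the "unobserved" set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # training: learn from observed data
print(model.score(X_test, y_test))     # testing: accuracy on held-out data
```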
Performance
 There are several factors affecting the performance:
 Types of training provided
 The form and extent of any initial background
knowledge
 The type of feedback provided
 The learning algorithms used
 Two important factors:
 Modeling
 Optimization
Algorithms
 The success of a machine learning system also
depends on the algorithms.
 The algorithms control the search to find and
build the knowledge structures.
 The learning algorithms should extract useful
information from training examples.
Algorithms
 Supervised learning
 Prediction
 Classification (discrete labels), Regression (real values)
 Unsupervised learning
 Clustering
 Probability distribution estimation
 Finding association (in features)
 Dimension reduction
 Semi-supervised learning
 Reinforcement learning
 Decision making (robot, chess machine)
Algorithms
[Figure: labeled and unlabeled data examples illustrating supervised,
unsupervised, and semi-supervised learning.]
Supervised learning
Supervised learning: given labeled examples
[Figure: a pool of examples, each paired with a label (label1, label3,
label4, label5, ...).]
Supervised learning
Supervised learning: given labeled examples
[Figure: the labeled examples are fed into a model/predictor.]
Supervised learning
Supervised learning: learn to predict new example
[Figure: the trained model/predictor outputs a predicted label for a
new example.]
Supervised learning: classification
Supervised learning: given labeled examples
[Figure: example images labeled apple, apple, banana, banana.]
Classification: a finite set of labels
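A toy version of this apple/banana setup; the features and their values are invented purely for illustration (assumes scikit-learn):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up features per fruit: [weight in grams, elongation ratio].
X = [[150, 0.30], [170, 0.35], [120, 0.90], [130, 0.95]]
y = ["apple", "apple", "banana", "banana"]   # labels from a finite set

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[160, 0.40]]))            # predicted label for a new example
```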
Types of learning task
 Supervised learning
 Learn to predict output when given an input vector
 Who provides the correct answer?
 Reinforcement learning
 Learn action to maximize payoff
 Not much information in a payoff signal
 Payoff is often delayed
 Reinforcement learning is an important area that will not be
covered in this course.
 Unsupervised learning
 Create an internal representation of the input e.g. form clusters;
extract features
 How do we know if a representation is good?
 This is the new frontier of machine learning because most big
datasets do not come with labels.
Classification Example
Differentiate between low-risk and high-risk customers
from their income and savings
Classification Applications
Face recognition
Character recognition
Spam detection
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical
and/or behavioral characteristics: Face, iris, signature, etc.
Supervised learning: regression
Supervised learning: given labeled examples
[Figure: examples paired with real-valued labels (-4.5, 10.1, 3.2, 4.3).]
Regression: label is real-valued
Regression Example
Price of a used car:
x : car attributes (e.g. mileage)
y : price
y = wx + w0
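Fitting y = wx + w0 by least squares takes one call; the mileage/price pairs below are invented for illustration (assumes numpy):

```python
import numpy as np

x = np.array([20e3, 40e3, 60e3, 80e3, 100e3])  # mileage in km (made up)
y = np.array([18e3, 15e3, 12.5e3, 10e3, 8e3])  # price (made up)

w, w0 = np.polyfit(x, y, deg=1)   # least-squares line y = w*x + w0
print(w, w0)
print(w * 50e3 + w0)              # predicted price at 50,000 km
```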
Regression Applications
❑Economics/Finance: predict the value of a stock
❑Epidemiology
❑Car/plane navigation: angle of the steering wheel,
acceleration, …
❑Temporal trends: weather over time
❑…
Supervised learning: ranking
Supervised learning: given labeled examples
[Figure: examples paired with rank labels (1, 4, 2, 3).]
Ranking: label is a ranking
Ranking example
Given a query and a set of web pages, rank them according to relevance
Ranking Applications
❑User preference, e.g. Netflix “My List” -- movie
queue ranking
❑iTunes
❑flight search (search in general)
❑reranking N-best output lists
❑…
Unsupervised learning
Unsupervised learning: given data, i.e. examples, but no labels
Unsupervised learning applications
❑learn clusters/groups without any label
❑customer segmentation (i.e. grouping)
❑image compression
❑bioinformatics: learn motifs
❑…
Reinforcement learning
Example action sequences and their delayed rewards:
left, right, straight, left, left, left, straight → GOOD (reward 18.5)
left, straight, straight, left, right, straight, straight → BAD (reward -3)
Given a sequence of examples/states and a reward after
completing that sequence, learn to predict the action to take
for an individual example/state
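One naive way to push a delayed, sequence-level reward back onto individual actions is to let every action in a sequence share that sequence's reward and then average; a deliberately simplistic sketch of this credit-assignment idea, using the two sequences above:

```python
from collections import defaultdict

episodes = [
    (["left", "right", "straight", "left", "left", "left", "straight"], 18.5),
    (["left", "straight", "straight", "left", "right", "straight", "straight"], -3.0),
]

totals, counts = defaultdict(float), defaultdict(int)
for actions, reward in episodes:
    for a in actions:            # every action in the episode shares the reward
        totals[a] += reward
        counts[a] += 1

values = {a: totals[a] / counts[a] for a in totals}
print(values)                    # rough estimate of how good each action is
```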
Reinforcement learning example
Backgammon: some sequences of moves end in … WIN!, others in … LOSE!
Given sequences of moves and whether or not the
player won at the end, learn to make good moves
Reinforcement learning example
http://www.youtube.com/watch?v=VCdxqn0fcnE
Other learning variations
What data is available:
 Supervised, unsupervised, reinforcement learning
 semi-supervised, active learning, …
How are we getting the data:
 online vs. offline learning
Type of model:
 generative vs. discriminative
 parametric vs. non-parametric
Machine learning structure
 Supervised learning
[Diagram: supervised learning pipeline.]
 Unsupervised learning
[Diagram: unsupervised learning pipeline.]
What are we seeking?
 Supervised: Low E-out or maximize probabilistic terms
 Unsupervised: Minimum quantization error, Minimum distance,
MAP, MLE (maximum likelihood estimation)
E-in: error on the training set
E-out: error on the testing set
What are we seeking?
Under-fitting vs. over-fitting (fixed N)
[Figure: training and testing error versus model complexity
(model = hypothesis + loss functions).]
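The effect is easy to reproduce: at fixed N, increasing model complexity drives E-in down while E-out eventually climbs. A sketch on synthetic data (assumes numpy; the degrees and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)                        # fixed N = 15 training points
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(15)
x_test = np.linspace(0, 1, 100)                  # dense "testing" grid
y_test = np.sin(2 * np.pi * x_test)

for deg in (1, 3, 9):                            # under-fit, decent, over-fit
    coef = np.polyfit(x, y, deg)
    e_in = np.mean((np.polyval(coef, x) - y) ** 2)
    e_out = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(deg, round(e_in, 4), round(e_out, 4))
```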
Learning techniques
 Supervised learning categories and techniques
 Linear classifier (numerical functions)
 Parametric (Probabilistic functions)
 Naïve Bayes, Gaussian discriminant analysis (GDA), Hidden Markov
models (HMM), Probabilistic graphical models
 Non-parametric (Instance-based functions)
 K-nearest neighbors, Kernel regression, Kernel density estimation,
Local regression
 Non-metric (Symbolic functions)
 Classification and regression tree (CART), decision tree
 Aggregation
 Bagging (bootstrap + aggregation), Adaboost, Random forest
Learning techniques
• Linear classifier: f(x) = sign(w·x), where w is a d-dim vector (learned)
 Techniques:
 Perceptron
 Logistic regression
 Support vector machine (SVM)
 Ada-line
 Multi-layer perceptron (MLP)
Learning techniques
Using the perceptron learning algorithm (PLA):
training error rate 0.10, testing error rate 0.156
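For reference, a compact PLA itself on toy separable data (assumes numpy; this is not the dataset behind the error rates above):

```python
import numpy as np

def pla(X, y, epochs=100):
    """Perceptron learning: on each mistake, update w <- w + y_n * x_n."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias feature
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_n, y_n in zip(X, y):
            if np.sign(w @ x_n) != y_n:        # misclassified (or on boundary)
                w += y_n * x_n
                mistakes += 1
        if mistakes == 0:                      # converged on separable data
            break
    return w

X = np.array([[2.0, 3.0], [1.0, 4.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(pla(X, y))                               # learned [bias, w1, w2]
```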
Learning techniques
Using logistic regression:
training error rate 0.11, testing error rate 0.145
Learning techniques
• Non-linear case
 Support vector machine (SVM):
 Linear to nonlinear: Feature transform and kernel function
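The kernel idea in miniature, using scikit-learn's SVC on XOR-style data that no single line separates; the RBF kernel plays the role of the implicit feature transform (gamma is an arbitrary choice):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable in the original 2-D space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)   # kernel = implicit feature map
print(clf.predict(X))                          # expected: [0 0 1 1]
```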
Learning techniques
 Unsupervised learning categories and techniques
 Clustering
 K-means clustering (see the sketch below)
 Spectral clustering
 Density Estimation
 Gaussian mixture model (GMM)
 Graphical models
 Dimensionality reduction
 Principal component analysis (PCA)
 Factor analysis
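A from-scratch sketch of the K-means item above (assumes numpy, and that no cluster empties out during the iterations):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        # Move each center to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),   # two synthetic blobs
               rng.normal(1.0, 0.1, (20, 2))])
print(kmeans(X, k=2)[0])                        # two recovered centers
```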
Applications
 Face detection
 Object detection and recognition
 Image segmentation
 Multimedia event detection
 Economic and commercial usage
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Some more examples of tasks that are best solved by
using a learning algorithm
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences (demo)
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear power plant or
unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates
Some web-based examples of machine learning
 The web contains a lot of data. Tasks with very big datasets often
use machine learning
 especially if the data is noisy or non-stationary.
 Spam filtering, fraud detection:
 The enemy adapts so we must adapt too.
 Recommendation systems:
 Lots of noisy data. Million dollar prize!
 Information retrieval:
 Find documents or images with similar content.
 Data Visualization:
 Display a huge database in a revealing way (demo)
Displaying the structure of a set of documents
using Latent Semantic Analysis (a form of PCA)
Each document is converted to a vector of word counts. This
vector is then mapped to two coordinates and displayed as a
colored dot. The colors represent the hand-labeled classes.
When the documents are laid out in 2-D, the classes are not
used. So we can judge how good the algorithm is by seeing if the
classes are separated.
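The pipeline this slide describes, sketched with scikit-learn's TruncatedSVD (its LSA implementation) on four made-up documents; a real setup would use far more text and typically TF-IDF weighting:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

docs = ["stocks fall as markets wobble",
        "team wins the championship game",
        "markets rally on bank earnings",
        "player scores twice in the final game"]

counts = CountVectorizer().fit_transform(docs)   # word-count vector per document
coords = TruncatedSVD(n_components=2).fit_transform(counts)
print(coords)                                    # one 2-D point per document
```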
Displaying the structure of a set of documents
using a deep neural network
Machine Learning & Symbolic AI
 Knowledge Representation works with facts/assertions and
develops rules of logical inference. The rules can handle
quantifiers. Learning and uncertainty are usually ignored.
 Expert Systems used logical rules or conditional probabilities
provided by “experts” for specific domains.
 Graphical Models treat uncertainty properly and allow learning
(but they often ignore quantifiers and use a fixed set of variables)
 Set of logical assertions → values of a subset of the variables
and local models of the probabilistic interactions between
variables.
 Logical inference → probability distributions over subsets of
the unobserved variables (or individual ones)
 Learning = refining the local models of the interactions.
Machine Learning & Statistics
 A lot of machine learning is just a rediscovery of things that
statisticians already knew. This is often disguised by differences
in terminology:
 Ridge regression = weight-decay
 Fitting = learning
 Held-out data = test data
 But the emphasis is very different:
 A good piece of statistics: Clever proof that a relatively
simple estimation procedure is asymptotically unbiased.
 A good piece of machine learning: Demonstration that a
complicated algorithm produces impressive results on a
specific task.
 Data-mining: Using very simple machine learning techniques
on very large databases because computers are too slow to
do anything more interesting with ten billion examples.
A spectrum of machine learning tasks
Statistics ------------------------------------ Artificial Intelligence
At the statistics end:
 Low-dimensional data (e.g. less than 100 dimensions)
 Lots of noise in the data
 There is not much structure in the data, and what structure
there is can be represented by a fairly simple model.
 The main problem is distinguishing true structure from noise.
At the artificial intelligence end:
 High-dimensional data (e.g. more than 100 dimensions)
 The noise is not sufficient to obscure the structure in the data
if we process it right.
 There is a huge amount of structure in the data, but the
structure is too complicated to be represented by a simple model.
 The main problem is figuring out a way to represent the
complicated structure that allows it to be learned.
So What Is Machine Learning?
 Automating automation
 Getting computers to program themselves
 Writing software is the bottleneck
 Let the data do the work instead!
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
Magic?
No, more like gardening
 Seeds = Algorithms
 Nutrients = Data
 Gardener = You
 Plants = Programs
Sample Applications
 Web search
 Computational biology
 Finance
 E-commerce
 Space exploration
 Robotics
 Information extraction
 Social networks
 Debugging
 [Your favorite area]
Growth of Machine Learning
 Machine learning is the preferred approach to
 Speech recognition, Natural language processing
 Computer vision
 Medical outcomes analysis
 Robot control
 Computational biology
 This trend is accelerating
 Improved machine learning algorithms
 Improved data capture, networking, faster computers
 Software too complex to write by hand
 New sensors / IO devices
 Demand for self-customization to user, environment
 It turns out to be difficult to extract knowledge from human experts →
the failure of expert systems in the 1980’s.
Applications
 Association Analysis
 Supervised Learning
 Classification
 Regression/Prediction
 Unsupervised Learning
 Reinforcement Learning
Learning Associations
 Basket analysis:
P(Y | X): the probability that somebody who buys X
also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
Market-Basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
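Estimating such a conditional probability directly from the five transactions above takes a few lines (a sketch; real basket analysis would also check support and use an algorithm such as Apriori):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def conf(x, y, baskets):
    """Estimate P(Y | X) as count(X and Y together) / count(X)."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x)

print(conf("Beer", "Diaper", transactions))   # 3/3 = 1.0 on this tiny table
```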
Classification
 Example: Credit scoring
 Differentiating between low-risk and high-risk
customers from their income and savings
Model (discriminant): IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
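That discriminant reads directly as code; in this sketch the thresholds θ1 and θ2 are illustrative placeholders, not fitted values:

```python
THETA1, THETA2 = 30_000, 10_000   # placeholder thresholds; a learner would fit these

def risk(income, savings):
    """IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(risk(45_000, 12_000))   # low-risk
print(risk(45_000, 5_000))    # high-risk
```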
Classification: Applications
 Aka pattern recognition
 Face recognition: Pose, lighting, occlusion (glasses, beard),
make-up, hair style
 Character recognition: Different handwriting styles.
 Speech recognition: Temporal dependency.
 Use of a dictionary or the syntax of the language.
 Sensor fusion: Combine multiple modalities; e.g., visual
(lip image) and acoustic for speech
 Medical diagnosis: From symptoms to illnesses
 Web Advertising: Predict if a user clicks on an ad on the
Internet.
Face Recognition
Training examples of a person
Test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Prediction: Regression
 Example: Price of a used car
 x : car attributes
y : price
y = g(x | θ)
g( ): model, θ: parameters
y = wx + w0
Regression Applications
 Navigating a car: Angle of the steering wheel
(CMU NavLab)
 Kinematics of a robot arm: from a target position (x, y),
predict the joint angles α1 = g1(x, y) and α2 = g2(x, y)
Supervised Learning: Uses
 Prediction of future cases: Use the rule to
predict the output for future inputs
 Knowledge extraction: The rule is easy to
understand
 Compression: The rule is simpler than the
data it explains
 Outlier detection: Exceptions that are not
covered by the rule, e.g., fraud
Example: decision-tree tools create rules
Unsupervised Learning
 Learning “what normally happens”
 No output
 Clustering: Grouping similar instances
 Other applications: Summarization, Association
Analysis
 Example applications
 Customer segmentation in CRM
 Image compression: Color quantization
 Bioinformatics: Learning motifs
Reinforcement Learning
 Topics:
 Policies: what actions should an agent take in a particular
situation
 Utility estimation: how good is a state (→ used by policy)
 No supervised output but delayed reward
 Credit assignment problem (what was responsible for the
outcome)
 Applications:
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
Resources: Datasets
 UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
 UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
 Statlib: http://lib.stat.cmu.edu/
 Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
 Journal of Machine Learning Research
www.jmlr.org
 Machine Learning
 IEEE Transactions on Neural Networks
 IEEE Transactions on Pattern Analysis and Machine
Intelligence
 Annals of Statistics
 Journal of the American Statistical Association
 ...
Resources: Conferences
 International Conference on Machine Learning (ICML)
 European Conference on Machine Learning (ECML)
 Neural Information Processing Systems (NIPS)
 Conference on Learning Theory (COLT)
 International Joint Conference on Artificial Intelligence
(IJCAI)
 ACM SIGKDD Conference on Knowledge Discovery and
Data Mining (KDD)
 IEEE Int. Conf. on Data Mining (ICDM)
ML in a Nutshell
 Tens of thousands of machine learning
algorithms
 Hundreds new every year
 Every machine learning algorithm has three
components:
 Representation
 Evaluation
 Optimization
Representation
 Decision trees
 Sets of rules / Logic programs
 Instances
 Graphical models (Bayes/Markov nets)
 Neural networks
 Support vector machines
 Model ensembles
 Etc.
Evaluation
 Accuracy
 Precision and recall
 Squared error
 Likelihood
 Posterior probability
 Cost / Utility
 Margin
 Entropy
 K-L divergence
 Etc.
Optimization
 Combinatorial optimization
 E.g.: Greedy search
 Convex optimization
 E.g.: Gradient descent
 Constrained optimization
 E.g.: Linear programming
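As a concrete instance of the convex-optimization bullet, bare-bones gradient descent on a simple quadratic (a sketch with a made-up objective, not a production optimizer):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the objective."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x[0] - 3)^2 + (x[1] + 1)^2; its gradient is 2*(x - [3, -1]).
minimum = gradient_descent(lambda x: 2 * (x - np.array([3.0, -1.0])), [0.0, 0.0])
print(minimum)   # approximately [ 3. -1.]
```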
Conclusion
We have given a simple overview of some
techniques and algorithms in machine learning.
Furthermore, more and more applications are
adopting machine learning as a solution.
In the future, machine learning will play an
important role in our daily life.
