SlideShare a Scribd company logo
1 of 56
Foundations of Machine
Learning (CS725)
Autumn 2011
Instructor: Prof. Ganesh Ramakrishnan
TAs: Ajay Nagesh, Amrita Saha,
Kedharnath Narahari
The grand goal
From the movie 2001: A Space Odyssey (1968)
Outline
• Introduction to Machine Learning
– What is machine learning?
– Why machine learning
– How machine learning relates to other fields
• Real world applications
• Machine Learning : Models and methods
– Supervised
– Unsupervised
– Semi-supervised
– Active learning
• Course Information
– Tools and software
– Pre-requisites
INTRODUCTION TO
MACHINE LEARNING
Intelligence
• Ability for abstract thought, understanding,
communication, reasoning, planning, emotional
intelligence, problem solving, learning
• The ability to learn and/or adapt is generally
considered a hallmark of intelligence
Learning and Machine Learning
• ``Learning denotes changes in the system that
are adaptive in the sense that they enable the
system to do the task(s) drawn from the same
population more efficiently and more
effectively the next time.''--Herbert Simon
• Machine Learning is concerned with the
development of algorithms and techniques
that allow computers to learn.
Machine Learning
• “Machine learning studies the process of
constructing abstractions (features, concepts,
functions, relations and ways of acting)
automatically from data.”
E.g.: Learning concepts and words
“tufa”
“tufa”
“tufa”
Can you pick out the tufas?
Source: Josh Tenenbaum
Why Machine Learning ?
• Human expertise does not exist (e.g. Martian
exploration)
• Humans cannot explain their expertise or reduce
it to a rule set, or their explanation is incomplete
and needs tuning (e.g. speech recognition)
• Situation changing in time (e.g. spam/junk email)
• Humans are expensive to train up (e.g. zipcode
recognition)
• There are large amounts of data (e.g. discover
astronomical objects)
APPLICATIONS OF
MACHINE LEARNING
Data Data Everywhere …
• Library of Congress text database of ~20 TB
• AT&T 323 TB, 1.9 trillion phone call records.
• World of Warcraft utilizes 1.3 PB of storage to
maintain its game.
• Avatar movie reported to have taken over 1 PB of
local storage at WetaDigital for the rendering of the
3D CGI effects.
• Google processes ~24 PB of data per day.
• YouTube: 24 hours of video uploaded every minute.
More video is uploaded in 60 days than all 3 major
US networks created in 60 years. According to Cisco,
internet video will generate over 18 EB of traffic per
month in 2013.
Information Overload
Machine Learning to the rescue
• Machine Learning is one of the front-line
technologies to handle Information Overload
• Business
– Mining correlations, trends, spatio-temporal predictions.
– Efficient supply chain management.
– Opinion mining and sentiment analysis.
– Recommender systems.
Fields related to Machine Learning
Fields related to Machine Learning
• Artificial Intelligence: computational intelligence
• Data Mining: searching through large volumes of
data
• Neural Networks: neural/brain inspired methods
• Signal Processing: signals, video, speech, image
• Pattern Recognition: labeling data
• Robotics: building autonomous robots
Application of Machine Learning
Deep Blue
and the chess
Challenge
RoboCup
Online Poker
Application of Machine Learning
• Computation Biology
(Structure learning)
• Animation and Control
• Tracking and activity
recognition
Application of Machine Learning
• Application in speech and
Natural Language processing
• Probabilistic Context Free
Grammars
• Graphical Models
• Social network graph
analysis, causality
analysis
Deep Q and A: IBM Watson
• Deep Question and Answering : Jeopardy challenge
• Watson emerged winner when pitted against all time
best rated players in the history of Jeopardy
Source: IBM Research
MACHINE LEARNING
MODELS AND METHODS
Machine Learning Process
 How to do the learning actually?
Learning (Formally)
 Task
 To apply some machine learning method to the data
obtained from a given domain (Training Data)
 The domain has some characteristics, which we are trying
to learn (Model)
 Objective
 To minimise the error in prediction
 Types of Learning
 Supervised Learning
 Unsupervised Learning
 Semi-Supervised Learning
 Active Learning
Supervised Learning
 Classification / Regression problem
 Where some samples of data (Training data) with the
correct class labels are provided.
 i.e. Some correspondence between input (X) & output (Y) given
 Using knowledge from training data, the classifier/ regressor
model is learnt
 i.e. Learn some function f : f(X) = Y
 f may be probabilistic/deterministic
 Learning the model ≡ Fitting the parameters of model to
minimise prediction error
 Model can then be tested on test-data
Regression
 Linear regression
 Uses
 Stock Prediction
 Outlier detection
Regression
Regression
 Non Linear regression
All models are not good
 Constrain the parameters
Classification
BearHead
DuckHead
LionHead
d1
d2
d3
f1 f2 f3 f4 Class label
???
Supervised Classification example
Source: LHI Animal Faces Dataset
Classification
 Example:
 Credit Scoring
 Goal:
 Differentiating between high-risk and low-
risk customers based on their income and
savings
 Discriminant:
 IF income > θ1 AND savings > θ2 THEN
low-risk ELSE high-risk
 Discriminant is called 'hypothesis'
 Input attribute space is called 'Feature Space'
 Here Input data is 2-dimensional and the
output is binary
Other applications
Building non-linear classifiers
 Curse of dimensionality
Application
What is the right hypothesis?
What is the right hypothesis for this
classification problem
What is the right hypothesis for this
regression problem
Which linear hypothesis is better
 Max – Margin
Classifier
Other considerations
 Feature extraction: which are the good features that
characterise the data
 Model selection: picking the right model using some
scoring/fitting function:
 It is important not only to provide a good predictor, but
also to assess accurately how “good” the model is on
unseen test data
 So a good performance estimator is needed to rank the
model
 Model averaging: Instead of picking a single model, it
might be better to do a weighted average over the best-
fit models
Which hypothesis is better?
 Unless you know something about the
distribution of problems your learning
algorithm will encounter, any hypothesis that
agrees with all your data is as good as any
other.
 You have to make assumptions about the
underlying features.
 Hence learning is inductive, not deductive.
Unsupervised Learning
 Labels may be too expensive to generate or
may be completely unknown
 There is lots of training data but with no class
labels assigned to it
???
Source: LHI Animal Faces Dataset
Unsupervised Learning
 For example clustering
 Clustering –
 grouping similar objects
 Similar in which way?
Clustering
Clustering Problems
 How to tell which type of clustering is
desirable?
Semi-Supervised Learning
 Supervised learning + Additional unlabeled data
 Unsupervised learning + Additional labeled data
 Learning Algorithm:
 Start from the labeled data to build an initial classifier
 Use the unlabeled data to enhance the model
 Some Techniques:
 Co-Training: two or more learners can be trained
using an independent set of different features
 Or to model joint probability distribution of the
features and labels
Example
 ideally...
Active Learning
 Unlabeled data is easy to obtain; but labels may be very
expensive
 For e.g. Speech recognizer
 Active Learning
 Initially all data labels are hidden
 There is some charge for revealing every label
 Active Learner will interactively query the user for labels
 By intelligent querying, a lot less number of labels will be
required than in usual supervised training
 But a bad algorithm might focus on unimportant or invalid
examples
 Ideally,
Active Learning: Example
Active Learning: Example
 Suppose data lies on a real line and the classifier discriminant
looks like
 H= {hw}: hw(x) = 1 if x > w, 0 otherwise
 Theoretically we can prove that if the actual data distribution P can
be classified using some hypothesis hw in H
 Then to get a classifier with error 'e', we just need O(1/e) random
labeled samples from P
 Now labels are sequences of 0s and 1s
 Goal is to discover the pt 'w' where transition occurs
 Find that using binary search
 So only log (1/e) samples queried
 Exponential improvement in terms of number of samples required
Active Learning and survelliance
Active Learning and sensor networks
How learning happens
Human Machine
Memorize k-Nearest Neighbours,
Case/Example-based learning
Observe someone else,
then repeat
Supervised Learning, Learning by
Demonstration
Keep trying until it works
(riding a bike)
Reinforcement Learning
20 Questions Active Learning
Pattern matching
(faces, voices, languages)
Pattern Recognition
Guess that current trend will
continue (stock market, real
estate prices)
Regression
COURSE INFORMATION
Tools and Resources
 Weka: http://www.cs.waikato.ac.nz/ml/weka
 Scilab: http://www.scilab.org/
 R-software: http://www.r-project.org/
 RapidMiner: http://rapid-i.com/content/view/181/190/
 Orange: http://orange.biolab.si/
 KNIME: http://www.knime.org/
 SVM Light: http://svmlight.joachims.org
 ShogunToolbox: http://www.shogun-toolbox.org/
 Elefant: http://elefant.developer.nicta.com.au
 Google prediction API: http://code.google.com/apis/predict/
Course Info
 Pre-requisites for course
 Probability & Statistics
 Basics of convex optimization
 Basics of linear algebra
 Online Materials
 Online class-notes : http://www.cse.iitb.ac.in/~cs725/notes/classNotes/
 Username: cs717
 Password: cs717_student
 Andrew Ng. Notes http://www.stanford.edu/class/cs229/materials.html and
video lecture series http://videolectures.net/andrew_ng/
 Main Text Book: Pattern Recognition and Machine Learning – Christopher
Bishop
 Reference: Hastie, Tibshirani, Friedman The elements of Statistical Learning
Springer Verlag

More Related Content

Similar to Machine_Learning.pptx

LearningAG.ppt
LearningAG.pptLearningAG.ppt
LearningAG.ppt
butest
 
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdfML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
AvijitChaudhuri3
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 

Similar to Machine_Learning.pptx (20)

Machine Learning an Research Overview
Machine Learning an Research OverviewMachine Learning an Research Overview
Machine Learning an Research Overview
 
LearningAG.ppt
LearningAG.pptLearningAG.ppt
LearningAG.ppt
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Machine Learning Landscape
Machine Learning LandscapeMachine Learning Landscape
Machine Learning Landscape
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
 
machine learning
machine learningmachine learning
machine learning
 
Machine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.pptMachine learning introduction to unit 1.ppt
Machine learning introduction to unit 1.ppt
 
ML crash course
ML crash courseML crash course
ML crash course
 
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
Lecture 2 - Introduction to Machine Learning, a lecture in subject module Sta...
 
module 6 (1).ppt
module 6 (1).pptmodule 6 (1).ppt
module 6 (1).ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdfML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
machine learning algorithm.pptx
machine learning algorithm.pptxmachine learning algorithm.pptx
machine learning algorithm.pptx
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Chapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics courseChapter 6 - Learning data and analytics course
Chapter 6 - Learning data and analytics course
 

Recently uploaded

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
HyderabadDolls
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
Oral Sex Call Girls Kashmiri Gate Delhi Just Call 👉👉 📞 8448380779 Top Class C...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts ServiceCall Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
Call Girls In GOA North Goa +91-8588052666 Direct Cash Escorts Service
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
Belur $ Female Escorts Service in Kolkata (Adult Only) 8005736733 Escort Serv...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 

Machine_Learning.pptx

  • 1. Foundations of Machine Learning (CS725) Autumn 2011 Instructor: Prof. Ganesh Ramakrishnan TAs: Ajay Nagesh, Amrita Saha, Kedharnath Narahari
  • 2. The grand goal From the movie 2001: A Space Odyssey (1968)
  • 3. Outline • Introduction to Machine Learning – What is machine learning? – Why machine learning – How machine learning relates to other fields • Real world applications • Machine Learning : Models and methods – Supervised – Unsupervised – Semi-supervised – Active learning • Course Information – Tools and software – Pre-requisites
  • 5. Intelligence • Ability for abstract thought, understanding, communication, reasoning, planning, emotional intelligence, problem solving, learning • The ability to learn and/or adapt is generally considered a hallmark of intelligence
  • 6. Learning and Machine Learning • ``Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task(s) drawn from the same population more efficiently and more effectively the next time.''--Herbert Simon • Machine Learning is concerned with the development of algorithms and techniques that allow computers to learn.
  • 7. Machine Learning • “Machine learning studies the process of constructing abstractions (features, concepts, functions, relations and ways of acting) automatically from data.”
  • 8. E.g.: Learning concepts and words “tufa” “tufa” “tufa” Can you pick out the tufas? Source: Josh Tenenbaum
  • 9. Why Machine Learning ? • Human expertise does not exist (e.g. Martian exploration) • Humans cannot explain their expertise or reduce it to a rule set, or their explanation is incomplete and needs tuning (e.g. speech recognition) • Situation changing in time (e.g. spam/junk email) • Humans are expensive to train up (e.g. zipcode recognition) • There are large amounts of data (e.g. discover astronomical objects)
  • 11. Data Data Everywhere … • Library of Congress text database of ~20 TB • AT&T 323 TB, 1.9 trillion phone call records. • World of Warcraft utilizes 1.3 PB of storage to maintain its game. • Avatar movie reported to have taken over 1 PB of local storage at WetaDigital for the rendering of the 3D CGI effects. • Google processes ~24 PB of data per day. • YouTube: 24 hours of video uploaded every minute. More video is uploaded in 60 days than all 3 major US networks created in 60 years. According to Cisco, internet video will generate over 18 EB of traffic per month in 2013.
  • 13. Machine Learning to the rescue • Machine Learning is one of the front-line technologies to handle Information Overload • Business – Mining correlations, trends, spatio-temporal predictions. – Efficient supply chain management. – Opinion mining and sentiment analysis. – Recommender systems.
  • 14. Fields related to Machine Learning
  • 15. Fields related to Machine Learning • Artificial Intelligence: computational intelligence • Data Mining: searching through large volumes of data • Neural Networks: neural/brain inspired methods • Signal Processing: signals, video, speech, image • Pattern Recognition: labeling data • Robotics: building autonomous robots
  • 16. Application of Machine Learning Deep Blue and the chess Challenge RoboCup Online Poker
  • 17. Application of Machine Learning • Computation Biology (Structure learning) • Animation and Control • Tracking and activity recognition
  • 18. Application of Machine Learning • Application in speech and Natural Language processing • Probabilistic Context Free Grammars • Graphical Models • Social network graph analysis, causality analysis
  • 19. Deep Q and A: IBM Watson • Deep Question and Answering : Jeopardy challenge • Watson emerged winner when pitted against all time best rated players in the history of Jeopardy Source: IBM Research
  • 21. Machine Learning Process  How to do the learning actually?
  • 22. Learning (Formally)  Task  To apply some machine learning method to the data obtained from a given domain (Training Data)  The domain has some characteristics, which we are trying to learn (Model)  Objective  To minimise the error in prediction  Types of Learning  Supervised Learning  Unsupervised Learning  Semi-Supervised Learning  Active Learning
  • 23. Supervised Learning  Classification / Regression problem  Where some samples of data (Training data) with the correct class labels are provided.  i.e. Some correspondence between input (X) & output (Y) given  Using knowledge from training data, the classifier/ regressor model is learnt  i.e. Learn some function f : f(X) = Y  f may be probabilistic/deterministic  Learning the model ≡ Fitting the parameters of model to minimise prediction error  Model can then be tested on test-data
  • 24. Regression  Linear regression  Uses  Stock Prediction  Outlier detection
  • 27. All models are not good  Constrain the parameters
  • 29. BearHead DuckHead LionHead d1 d2 d3 f1 f2 f3 f4 Class label ??? Supervised Classification example Source: LHI Animal Faces Dataset
  • 30. Classification  Example:  Credit Scoring  Goal:  Differentiating between high-risk and low- risk customers based on their income and savings  Discriminant:  IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk  Discriminant is called 'hypothesis'  Input attribute space is called 'Feature Space'  Here Input data is 2-dimensional and the output is binary
  • 32. Building non-linear classifiers  Curse of dimensionality
  • 34. What is the right hypothesis?
  • 35. What is the right hypothesis for this classification problem
  • 36. What is the right hypothesis for this regression problem
  • 37. Which linear hypothesis is better  Max – Margin Classifier
  • 38. Other considerations  Feature extraction: which are the good features that characterise the data  Model selection: picking the right model using some scoring/fitting function:  It is important not only to provide a good predictor, but also to assess accurately how “good” the model is on unseen test data  So a good performance estimator is needed to rank the model  Model averaging: Instead of picking a single model, it might be better to do a weighted average over the best- fit models
  • 39. Which hypothesis is better?  Unless you know something about the distribution of problems your learning algorithm will encounter, any hypothesis that agrees with all your data is as good as any other.  You have to make assumptions about the underlying features.  Hence learning is inductive, not deductive.
  • 40. Unsupervised Learning  Labels may be too expensive to generate or may be completely unknown  There is lots of training data but with no class labels assigned to it
  • 41. ??? Source: LHI Animal Faces Dataset
  • 42. Unsupervised Learning  For example clustering  Clustering –  grouping similar objects  Similar in which way?
  • 44.
  • 45. Clustering Problems  How to tell which type of clustering is desirable?
  • 46. Semi-Supervised Learning  Supervised learning + Additional unlabeled data  Unsupervised learning + Additional labeled data  Learning Algorithm:  Start from the labeled data to build an initial classifier  Use the unlabeled data to enhance the model  Some Techniques:  Co-Training: two or more learners can be trained using an independent set of different features  Or to model joint probability distribution of the features and labels
  • 48. Active Learning  Unlabeled data is easy to obtain; but labels may be very expensive  For e.g. Speech recognizer  Active Learning  Initially all data labels are hidden  There is some charge for revealing every label  Active Learner will interactively query the user for labels  By intelligent querying, a lot less number of labels will be required than in usual supervised training  But a bad algorithm might focus on unimportant or invalid examples
  • 50. Active Learning: Example  Suppose data lies on a real line and the classifier discriminant looks like  H= {hw}: hw(x) = 1 if x > w, 0 otherwise  Theoretically we can prove that if the actual data distribution P can be classified using some hypothesis hw in H  Then to get a classifier with error 'e', we just need O(1/e) random labeled samples from P  Now labels are sequences of 0s and 1s  Goal is to discover the pt 'w' where transition occurs  Find that using binary search  So only log (1/e) samples queried  Exponential improvement in terms of number of samples required
  • 51. Active Learning and survelliance
  • 52. Active Learning and sensor networks
  • 53. How learning happens Human Machine Memorize k-Nearest Neighbours, Case/Example-based learning Observe someone else, then repeat Supervised Learning, Learning by Demonstration Keep trying until it works (riding a bike) Reinforcement Learning 20 Questions Active Learning Pattern matching (faces, voices, languages) Pattern Recognition Guess that current trend will continue (stock market, real estate prices) Regression
  • 55. Tools and Resources  Weka: http://www.cs.waikato.ac.nz/ml/weka  Scilab: http://www.scilab.org/  R-software: http://www.r-project.org/  RapidMiner: http://rapid-i.com/content/view/181/190/  Orange: http://orange.biolab.si/  KNIME: http://www.knime.org/  SVM Light: http://svmlight.joachims.org  ShogunToolbox: http://www.shogun-toolbox.org/  Elefant: http://elefant.developer.nicta.com.au  Google prediction API: http://code.google.com/apis/predict/
  • 56. Course Info  Pre-requisites for course  Probability & Statistics  Basics of convex optimization  Basics of linear algebra  Online Materials  Online class-notes : http://www.cse.iitb.ac.in/~cs725/notes/classNotes/  Username: cs717  Password: cs717_student  Andrew Ng. Notes http://www.stanford.edu/class/cs229/materials.html and video lecture series http://videolectures.net/andrew_ng/  Main Text Book: Pattern Recognition and Machine Learning – Christopher Bishop  Reference: Hastie, Tibshirani, Friedman The elements of Statistical Learning Springer Verlag