Machine Learning, LIX004M5
Overview and Introduction

Jörg Tiedemann
tiedeman@let.rug.nl
Informatiekunde
Rijksuniversiteit Groningen
                                          Machine Learning, LIX004M5 – p.1/50

General Info
instructor: Jörg Tiedemann (j.tiedemann@rug.nl)
    Harmoniegebouw, room 1311-429
prerequisites: open to students in Computer Science, Artificial
    Intelligence and Information Science
    2nd year student or higher
    background: programming ability, elementary statistics
schedule: September 5 - October 21
    • lectures: Mondays 13-15
    • labs: Fridays 9-11 (4 times only!)

General Info (cont’d)
You need an account on ’hagen’ for the labs!
(But you may also work from home or somewhere else)
Go to A. Da Costa (Harmoniegebouw, room 336, building section 13.13,
phone 363 5801; daily 10:30-12:00 and 14:00-15:30, closed on Friday
afternoons)!
General Info (cont’d)
Website: http://www.let.rug.nl/˜tiedeman/ml06
Examination: lab assignments and exercises (50%), written exam (50%)
Exam: Friday, October 27, 9-12
Literature: Tom Mitchell, Machine Learning, New York: McGraw-Hill, 1997
    additional on-line literature (links available from the course website)

General Info (cont’d)
Purpose of this course
  • Introduction (!) to machine learning techniques
    (How much do you know already?)
  • Discussion of several machine learning approaches
  • Examples and applications in various fields
  • Practical assignments
      • using Weka - a machine learning package implemented in Java
      • a little bit of programming/scripting
      • some theoretical questions

General Info (cont’d)
  • Examination:
      • obligatory lab assignments (50%)
      • written exam (50%)
  • A minimum of 6 for both parts is required!
  • Exam is open book
Preliminary Program
 1. Organization, Introduction (Ch.1, Ch.5)
 2. Inductive learning (Ch.2), Decision Trees (Ch.3)
      • Lab 1 - Decision Trees
 3. Instance-Based Learning (Ch.8)
      • Lab 2 - Instance-based learning
 4. Bayesian Learning (Ch.6)
      • Lab 3 - Learner comparison/combination
 5. Sequential data & Markov Models (M&S Ch.9, Bilmes)
      • Lab 4 - Markov models
 6. Maximum Entropy models, Combining Learners
 7. Genetic Algorithms (Ch.9), Reinforcement Learning (Ch.13)

General comments
  • Read the book! (and other literature if necessary)
  • Ask questions! (and I’ll try to answer)
  • Tell me if you think that something’s wrong
  • Keep the deadlines!
    (1 week late → half the points, later → no points)

What is Machine learning?
Machine Learning is
  • the study of algorithms that
      • improve their performance
      • at some task
      • with experience
... just like a human being ... (?)
What is all the hype about ML?
"Every time I fire a linguist the performance of the recognizer goes up"
(probably) said by Fred Jelinek (IBM speech group) in the 80s, quoted by,
e.g., Jurafsky and Martin, Speech and Language Processing.

Why machine learning?
data mining: pattern recognition, knowledge discovery, use historical
    data to improve future decisions, prediction (classification,
    regression), data description (clustering, summarization,
    visualization)
complex applications: we cannot program them by hand,
    (efficient) processing of complex signals
self-customizing programs: automatic adjustments according to usage,
    dynamic systems

Typical Data Mining Task
Given:
  • 9714 patient records, each describing a pregnancy and birth
  • Each patient record contains 215 features
Learn to predict:
  • Classes of future patients at high risk for Emergency Cesarean Section
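A task like this can be phrased as supervised classification: each record is a vector of feature values plus a class label, and a learner induces a decision rule from the labelled records. Below is a minimal sketch with invented toy records and a deliberately crude one-feature threshold learner; the feature names, values, and the `learn_threshold` helper are all hypothetical, not taken from the actual patient data.

```python
# Toy supervised-classification setup: records = feature vectors + labels.
# All feature names and values are invented for illustration.
records = [
    {"age": 24, "prior_births": 1, "high_risk": False},
    {"age": 38, "prior_births": 0, "high_risk": True},
    {"age": 41, "prior_births": 2, "high_risk": True},
    {"age": 29, "prior_births": 1, "high_risk": False},
]

def learn_threshold(data, feature, label):
    """Pick the threshold on one feature that misclassifies the fewest
    training records (a deliberately crude one-feature learner)."""
    best = None
    for t in sorted(r[feature] for r in data):
        errors = sum((r[feature] >= t) != r[label] for r in data)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

threshold = learn_threshold(records, "age", "high_risk")

def predict(record):
    """Classify an unseen record with the learned decision rule."""
    return record["age"] >= threshold
```

Real learners (decision trees, Bayesian classifiers, instance-based methods) generalize this idea to many features at once.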




Pattern Recognition
Object detection

Complex applications
Operating robots:
ALVINN [Pomerleau] drives 70 mph on highways

Classification
Personal home page? Company website? Educational site?
Automatic customization

Machine learning is growing
many more applications:
  • speech recognition
  • robot control
  • spam filtering, data sorting
  • machine translation
  • financial data analysis and market predictions
  • handwriting recognition
  • data clustering and visualization
  • pattern recognition in genetics (e.g. DNA sequences)

Questions to ask
Learning = improve with experience at some task
  • What experience?
  • What exactly should be learned?
  • How shall it be represented?
  • What specific algorithm to learn it?
Goal: handle unseen data correctly according to the task
(use your knowledge inferred from experience!)
What experience?
  • What do we know already about the task and possible solutions?
    (prior knowledge)
  • What kind of data do we have available? (training examples)
    What are the discriminative features? How are they connected with
    each other (dependencies)?
  • Is a “teacher” available (→ supervised learning)
    or not (→ unsupervised learning)?
    How expensive is labeling?
  • How much data do we need and how clean does it have to be?

What exactly should be learned?
Outcome of the target function
  • boolean (→ concept learning)
  • discrete values (→ classification)
  • real values (→ regression)
many machine learning tasks are classification tasks ...

How shall it be represented?
Model selection
  • symbolic representation (e.g. rules)
  • subsymbolic representation (neural networks, SVMs)
Do we want to restrict the space of possible solutions?
(→ restriction bias ... we come back to this)
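The three outcome types of the target function can be made concrete with toy functions; all three examples below are invented for illustration:

```python
# Three toy target functions, one per outcome type (all invented).
def looks_like_spam(word_count, has_link):   # boolean -> concept learning
    return has_link and word_count < 20

def guess_language(text):                    # discrete values -> classification
    return "nl" if " de " in text else "en"

def house_price(area_m2):                    # real value -> regression
    return 1500.0 * area_m2
```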




What algorithm to learn it?
Learning means approximating the real (unknown) target function
according to our experience (e.g. observed training examples)
→ Learning = search for a “good” hypothesis/model
Do we want to prefer certain models?
(→ preference bias ... later more)

Learning Models
→ Learning = search for a “good” hypothesis/model
Which one is better?




What algorithm to learn it?
  • supervised learning (classified data available)
  • unsupervised learning (e.g. clustering)
  • inductive learning (from training data)
  • deductive learning (data + domain theory)
  • gradient descent, bayesian learning, reinforcement learning ...

The roots of ML
Artificial intelligence: use prior knowledge and training data to guide
    learning as a search problem
Bayesian methods: probabilistic classifiers, probabilistic reasoning
Computational complexity theory: trade-off between model (learning)
    complexity and performance
Control theory: control and optimisation of processes
Information theory: entropy, information content, code optimisation and
    the minimum description length principle
Philosophy: Occam’s razor (simple is best)
Psychology and neurobiology: response improvement with practice, ideas
    that led to artificial neural networks
Statistics: data description, estimation of probability distributions,
    evaluation, confidence
A walk-through example
from Duda et al: Pattern Classification
  • Task: automatically sort incoming fish on a conveyor belt into
    “sea bass” or “salmon”
  • Experience: sample images
We want a machine to learn this task.
The machine needs some “experience”.

A walk-through example
Procedure:
preprocessing: isolate fishes from one another and from the background
    of the images
feature selection: determine discriminative features to be extracted
    from the images (e.g. length, lightness, width, position of mouth,
    etc); feature selection = a kind of data reduction (focus on
    relevant information)
feature extraction: extract the selected features from the images and
    pass them to a classifier




A walk-through example
Select length for discrimination:

A walk-through example
Lightness is a better feature:




A walk-through example
  • devise a decision rule or move the decision boundary to minimize
    some classification cost (→ decision theory)
  • a single feature might not be enough to minimize costs
→ feature vector, e.g.:

        X = (X1, X2) = (width, lightness)

A walk-through example
Feature selection:
  • distinguishing (similar for objects in the same category and very
    different for different categories)
  • invariant (feature value doesn’t change when changing the context)
  • insensitive to noise
  • simple to extract

A walk-through example
We still need to:
  • select an appropriate type of model for classification
    (e.g. function class to define separation boundaries)
  • select the model that generalizes best (to be able to classify even
    unseen objects correctly)
  • consider computational complexity (trade-off between complexity and
    performance; scalability)
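A decision rule over such a feature vector can be sketched as a linear boundary w·x + b = 0 in the (width, lightness) plane; the weights and fish measurements below are invented toy values, not fitted from real images:

```python
# Feature vector x = (width, lightness); the linear decision boundary
# w[0]*x[0] + w[1]*x[1] + b = 0 separates the two classes.
w = (0.8, -1.0)   # hypothetical weights, not fitted from data
b = -2.0

def classify(x):
    """One side of the boundary -> 'sea bass', the other -> 'salmon'."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "sea bass" if score > 0 else "salmon"

# (width=6, lightness=1): score = 4.8 - 1.0 - 2.0 = 1.8  -> sea bass
# (width=3, lightness=4): score = 2.4 - 4.0 - 2.0 = -3.6 -> salmon
```

Moving the boundary (changing b) trades one kind of misclassification against the other, which is exactly the decision-theory question raised above.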
A walk-through example
Linear decision boundary

A walk-through example
Overly complex decision boundary
(What is the problem?)

A walk-through example
a good trade-off between performance on the training set and model
simplicity




The Design Cycle

Evaluation
We have ...
  • different feature sets
  • different models
  • different learning strategies
→ We need to evaluate!

Evaluation
Evaluation of classifiers based on
  • accuracy or error rate
    (percentage of classification errors)
  • risk (cost estimation for classification decisions)
Never ever evaluate on training data!
... Why not?
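The error-rate measure is simple to compute: the fraction of misclassified test items (accuracy is its complement). A minimal sketch, with a function name of my own choosing:

```python
def error_rate(predictions, gold):
    """Fraction of items where the predicted label differs from the
    true (gold) label; accuracy is simply 1 - error_rate."""
    assert len(predictions) == len(gold)
    wrong = sum(p != g for p, g in zip(predictions, gold))
    return wrong / len(gold)
```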




Evaluation
Distinguish:
sample error: error rate observed when classifying sample data
    (test data)
true error: probability of misclassifying a randomly selected object

Evaluation
How good is an estimate of the true error by means of sample errors?
  • confidence intervals
  • larger sample → greater confidence
How good is one model compared to another?
  • calculate sample errors
  • compute statistical significance (e.g. paired t-test)

Evaluation
Typical strategy in supervised learning:
Split the data into disjoint training data and test data
Problems:
  • we could be (too) lucky (the sample error on the test data is better
    than with other data splits)
  • the test data set is too small to be confident
  • training data is rare and expensive (we don’t want to waste too much
    when separating out test data)
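The disjoint split can be sketched as below; shuffling first guards against ordered data, and the 80/20 ratio is merely a common convention (the slides do not prescribe one):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle, then cut the data into two disjoint parts (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

train, test = train_test_split(range(100))
```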
Cross validation
  •   split D into k similar-sized sets (e.g. k = 10)
  •   use k − 1 sets for training and 1 for evaluation
  •   use each set once for evaluation and calculate the
      average of the errors
→ improved error estimates (higher confidence)
→ all data is tested
→ better use of (limited) training data
Note: we still don't evaluate on training data!
special case: leave-one-out cross validation, i.e. use each
training example once for testing and the others for
training
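The procedure above can be sketched in a few lines of Python (this is an illustrative sketch, not the Weka implementation used in the labs; the majority-class baseline is a hypothetical stand-in for a real classifier):

```python
import random

def k_fold_cross_validation(data, k, train_and_eval):
    """Use each of k similar-sized folds once for evaluation and the
    other k - 1 for training; return the average of the fold errors.
    Setting k = len(data) gives leave-one-out cross validation."""
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k similar-sized sets
    errors = []
    for i in range(k):
        test_set = folds[i]  # never evaluate on training data!
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_eval(train_set, test_set))
    return sum(errors) / k

# Toy learner: always predict the majority label of the training set.
def majority_baseline(train_set, test_set):
    labels = [y for _, y in train_set]
    majority = max(set(labels), key=labels.count)
    wrong = sum(1 for _, y in test_set if y != majority)
    return wrong / len(test_set)  # sample error on this fold

# Illustrative data: 20 "pos" and 10 "neg" examples.
data = [((i,), "pos" if i % 3 else "neg") for i in range(30)]
random.seed(0)
avg_error = k_fold_cross_validation(data, k=10, train_and_eval=majority_baseline)
print(f"10-fold average error: {avg_error:.2f}")
```

Every example is used for testing exactly once and for training k − 1 times, so no fold error is ever computed on data the learner was trained on.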




Conclusion
  •   we seem to be overwhelmed by the number,
      complexity and magnitude of sub-problems
  •   many of them can be solved (at least to a
      certain degree)
  •   many fascinating problems still remain
Enjoy working with learning systems!

What's next?
This week: Read ch. 1 & ch. 5 of Mitchell and look
      at the exercises
      No lab on Friday!
Next week: Inductive learning, Mitchell ch. 2 &
      Decision trees, ch. 3
      First lab about Decision Trees
