Machine Learning, LIX004M5
Overview and Introduction

Jörg Tiedemann
tiedeman@let.rug.nl
Informatiekunde
Rijksuniversiteit Groningen

Machine Learning, LIX004M5 – p.1/29


General Info

instructor: Jörg Tiedemann (j.tiedemann@rug.nl)
    Harmoniegebouw, room 1311-429
prerequisites: open to students in Computer Science, Artificial
    Intelligence and Information Science
    2nd year student or higher
background: programming ability, elementary statistics
schedule: September 3 - October 26
  • lectures: Mondays 13-15
  • labs: Fridays 9-11
5 ECTS


General Info (cont’d)

You need an account on ’hagen’ for the labs!
(But you may also work from home or somewhere else.)

Go to A. Da Costa.
(Harmoniegebouw, room 336, building section 13.13, telephone 363 5801)
(10.30-12.00 and 14.00-15.30, closed on Friday afternoon)




General Info (cont’d)

Website: http://www.let.rug.nl/˜tiedeman/ml07
Examination: 3 obligatory lab assignments
    present and report final project (50%)
    written exam (50%)
Exam: Friday, October 26, 9-12 (AZERN)
Literature: Tom Mitchell, Machine Learning, New York: McGraw-Hill, 1997
    additional on-line literature (links available from the course website)


General Info (cont’d)

Purpose of this course
  • Introduction to machine learning techniques
  • Discussion of several machine learning approaches
  • Examples and applications in various fields
  • Practical assignments
      • using Weka - a machine learning package implemented in Java
      • some theoretical questions
      • independent group work on final project


Preliminary Program - Lectures

We will only manage to look at a selection of the topics in the book:
  1. Organization, Introduction (Ch.1, Ch.2)
  2. Decision Trees (Ch.3)
  3. Instance-Based Learning (Ch.8)
  4. Bayesian Learning (Ch.6)
  5. Rule Induction & Reinforcement Learning (Ch.13)
  6. Genetic Algorithms (Ch.9)
  7. Presentations of Final Projects




Preliminary Program - Labs

6 lab sessions, 3 short lab reports, 1 final project
  • evaluation in ML (Ch.5), introduction to topics for the final
    project, getting started with WEKA (report 1)
  • select & start with final project (no supervision)
  • decision trees & instance-based learning (report 2)
  • work on final project (no supervision)
  • naive Bayes and rule induction (report 3)
  • work on final project


What is Machine Learning?

Machine Learning is
  • the study of algorithms that
  • improve their performance
  • at some task
  • with experience

... just like a human being ... (?)


What is all the hype about ML?

"Every time I fire a linguist the performance of the recognizer goes up"

(probably) said by Fred Jelinek (IBM speech group) in the 80s, quoted by,
e.g., Jurafsky and Martin, Speech and Language Processing.
Why machine learning?

data mining: pattern recognition, knowledge discovery, use historical
    data to improve future decisions, prediction (classification,
    regression), data description (clustering, summarization,
    visualization)
complex applications: we cannot program by hand,
    (efficient) processing of complex signals
self-customizing programs: automatic adjustments according to usage,
    dynamic systems


Typical Data Mining Task

Given:
  • 9714 patient records, each describing a pregnancy and birth
  • Each patient record contains 215 features
Learn to predict:
  • Classes of future patients at high risk for Emergency Cesarean Section


Pattern Recognition

Object detection




Classification

Personal home page? Company website? Educational site?


Complex applications

Robots playing football in RoboCup
colour classification (DT, NN), player positioning (RL), behaviors (RL,
GA), team strategy adaptation (mixture of experts), ball kicking (GA) ...
http://www.robocup.org/
http://sserver.sourceforge.net/SIG-learn/


Automatic customization




Machine learning is growing

many more applications:
  • speech recognition
  • spam filtering, sorting data
  • machine translation
  • robot control
  • financial data analysis and market predictions
  • handwriting recognition
  • data clustering and visualization
  • pattern recognition in genetics (e.g. DNA sequences)
  • ...


Questions to ask

Learning = improve with experience at some task
  • What experience?
  • What exactly should be learned?
  • How shall it be represented?
  • What specific algorithm to learn it?

Goal: handle unseen data correctly according to the task
(use your knowledge inferred from experience!)


What experience?

  • What do we know already about the task and possible solutions?
    (prior knowledge)
  • What kind of data do we have available?
  • How much data do we need and how clean does it have to be?
    (training examples)
  • What are the discriminative features? How are they connected with
    each other (dependencies)?
  • Is a “teacher” available (→ supervised learning) or not
    (→ unsupervised learning)? How expensive is labeling?
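The goal of handling unseen data by reusing observed examples can be made concrete with instance-based learning (Ch.8 on the lecture program). Below is a minimal 1-nearest-neighbour sketch in Python; the points and labels are an invented toy data set, and the course labs use Weka's implementations rather than code like this.

```python
# A minimal 1-nearest-neighbour classifier: classify an unseen query
# point by the label of the closest stored training example.
# The training points and labels below are an invented toy data set.

def nearest_neighbour(train, query):
    """train: list of (feature_vector, label) pairs; returns a label."""
    def sq_dist(a, b):
        # squared Euclidean distance; no sqrt needed for ranking
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], query))[1]

train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((5.0, 5.0), 'B')]
print(nearest_neighbour(train, (0.9, 1.1)))  # 'A'
print(nearest_neighbour(train, (4.5, 5.2)))  # 'B'
```

Note that no explicit model is built: the "experience" is simply memorized, and all the work happens when an unseen query arrives.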
What exactly should be learned?

Outcome of the target function
  • boolean (→ concept learning)
  • discrete values (→ classification)
  • real values (→ regression)

many machine learning tasks are classification tasks ...


How shall it be represented?

Model selection
  • symbolic representation (e.g. with rules, trees)
  • subsymbolic representation (neural networks, SVMs)

Do we want to restrict the space of possible solutions? (→ restriction bias)
Do we want to prefer certain models? (→ preference bias)


What algorithm to learn it?

Learning means approximating the real (unknown) target function
according to our experience (e.g. observed training examples)

→ Learning = search for a “good” hypothesis/model




Inductive learning as search

Inductive learning: infer a model from training data
example: concept learning
  • a set of instances X with attributes a1..an
  • Hypotheses H: set of functions h : X → {0, 1}
  • Representation: e.g. conjunction of constraints
  • Training examples D: a sequence of positive and negative examples
    of the unknown target function c : X → {0, 1}

what we want: a hypothesis ĥ such that ĥ(x) = c(x) for all x ∈ X
what we can observe: a hypothesis h such that h(x) = c(x) for all x ∈ D


Inductive bias

  • corresponds to prior knowledge about data and task
    (a priori assumptions)
  • depends on learning algorithm and model representation

Restriction bias:
  • hypothesis space is restricted (also: language bias)
Preference bias:
  • prefer certain hypotheses (usually more general ones)

Why do we need inductive bias?


Learning Models

Learning means approximating the real (unknown) target function
according to our experience (e.g. observed training examples)

→ Learning = search for a “good” hypothesis/model

Which one is better?
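Searching a hypothesis space of conjunctive constraints can be illustrated with the Find-S algorithm from Mitchell's ch. 2, which keeps the most specific hypothesis consistent with the positive examples seen so far. This is a sketch only: the attributes and examples are an invented toy in the spirit of the book's EnjoySport data, not course material.

```python
# Find-S (Mitchell, ch. 2): find the most specific conjunction of
# attribute constraints consistent with the positive training examples.
# '?' means "any value is acceptable" for that attribute.

def find_s(examples):
    """examples: list of (attribute_tuple, label) with boolean labels."""
    h = None  # most specific hypothesis: "no instance is positive"
    for x, positive in examples:
        if not positive:
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(x)  # first positive example: adopt its values
        else:
            # minimal generalization: relax mismatching constraints to '?'
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return h

examples = [
    (('sunny', 'warm', 'normal', 'strong'), True),
    (('sunny', 'warm', 'high', 'strong'), True),
    (('rainy', 'cold', 'high', 'strong'), False),
    (('sunny', 'warm', 'high', 'strong'), True),
]
print(find_s(examples))  # ['sunny', 'warm', '?', 'strong']
```

This also makes the role of restriction bias visible: because hypotheses are limited to conjunctions of constraints, a few examples suffice to pin down a generalization, at the price of ruling out target concepts the language cannot express.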




Learning Models

Learning means approximating the real (unknown) target function
according to our experience (e.g. observed training examples)

→ Learning = search for a “good” hypothesis/model

Which one is better?
What would be a model without inductive bias?


What algorithm to learn it?

Learning means approximating the real (unknown) target function
according to our experience (e.g. observed training examples)

→ Learning = search for a “good” hypothesis/model

decision trees & information gain
Bayesian techniques & maximum likelihood estimation
least mean square algorithm, gradient search
expectation maximization
maximum entropy
minimum description length
reinforcement learning
genetic algorithms, simulated annealing


The roots of ML

Artificial intelligence: use prior knowledge and training data to guide
    learning as a search problem
Bayesian methods: probabilistic classifiers, probabilistic reasoning
Statistics: data description, estimation of probability distributions,
    evaluation, confidence
Information theory: entropy, information content, code optimisation and
    the minimum description length principle
Computational complexity theory: trade-off between model (learning)
    complexity and performance
Control theory: optimisation of control processes
Psychology and neurobiology: response improvement with practice, ideas
    that led to artificial neural networks
Philosophy: Occam’s razor (simple is best)
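The first item on the algorithm list, decision trees & information gain, rests on two quantities from information theory that are easy to compute directly. The sketch below is a plain-Python illustration (Mitchell, ch. 3); the weather-style toy data is invented, and the labs use Weka's tree learners rather than hand-rolled code.

```python
# Entropy and information gain: the splitting criterion behind
# decision-tree induction (Mitchell, ch. 3). Toy data is invented.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected entropy reduction from splitting on attribute index attr."""
    n = len(labels)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

rows = [('sunny', 'hot'), ('sunny', 'mild'), ('rainy', 'mild'), ('rainy', 'hot')]
labels = ['no', 'no', 'yes', 'yes']
print(information_gain(rows, labels, 0))  # 1.0: attribute 0 separates perfectly
print(information_gain(rows, labels, 1))  # 0.0: attribute 1 is uninformative
```

A tree learner greedily picks the attribute with the highest gain at each node, a preference bias toward splits that purify the class distribution quickly.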
Take-home messages

  • ML = algorithms that learn from experience
  • generalize instead of memorize
  • different types of inductive bias
  • ML has many fascinating sub-tasks

Enjoy working with learning systems!


What’s next?

This week: Read ch. 1, ch. 2 & ch. 5 of Mitchell and look at the exercises
Lab on Friday: Evaluation in ML, introduction of final projects,
    small exercises
Next week: Decision trees, ch. 3, lab session on Decision Trees & IBL
