Transcript

  • 1. A Brief Survey of Machine Learning. CIS 732/830: Lecture 0. Friday, 12 January 2007. William H. Hsu, Department of Computing and Information Sciences, KSU. http://www.kddresearch.org http://www.cis.ksu.edu/~bhsu Readings: Chapter 1, Mitchell.
  • 2. Lecture Outline
    • Course Information: Format, Exams, Homework, Programming, Grading
    • Class Resources: Course Notes, Web/News/Mail, Reserve Texts
    • Overview
      • Topics covered
      • Applications
      • Why machine learning?
    • Brief Tour of Machine Learning
      • A case study
      • A taxonomy of learning
      • Intelligent systems engineering: specification of learning problems
    • Issues in Machine Learning
      • Design choices
      • The performance element: intelligent systems
    • Some Applications of Learning
      • Database mining, reasoning (inference/decision support), acting
      • Industrial usage of intelligent systems
  • 3. Course Information and Administrivia
    • Instructor: William H. Hsu
      • E-mail: [email_address]
      • Office phone: (785) 532-6350 x29; home phone: (785) 539-7180; ICQ 28651394
      • Office hours: Wed/Thu/Fri (213 Nichols); by appointment, Tue
    • Grading
      • Assignments (8): 32%, paper reviews (13): 10%, midterm: 20%, final: 25%
      • Class participation: 5%
      • Lowest homework score and lowest 3 paper summary scores dropped
    • Homework
      • Eight assignments: written (5) and programming (3) – drop lowest
      • Late policy: due on Fridays
      • Cheating: don’t do it; see class web page for policy
    • Project Option
      • CIS 690/890 project option for graduate students
      • Term paper or semester research project; sign up by 09 Feb 2007
  • 4. Class Resources
    • Web Page (Required)
      • http://www.kddresearch.org/Courses/Spring-2007/CIS732
      • Lecture notes (MS PowerPoint XP, PDF)
      • Homeworks (MS PowerPoint XP, PDF)
      • Exam and homework solutions (MS PowerPoint XP, PDF)
      • Class announcements (students' responsibility to follow) and grade postings
    • Course Notes at Copy Center (Required)
    • Course Web Group
      • Mirror of announcements from class web page
      • Discussions (instructor and other students)
      • Dated research announcements (seminars, conferences, calls for papers)
    • Mailing List (Automatic)
      • [email_address]
      • Sign-up sheet
      • Reminders, related research announcements
  • 5. Course Overview
    • Learning Algorithms and Models
      • Models: decision trees, linear threshold units (winnow, weighted majority), neural networks, Bayesian networks (polytrees, belief networks, influence diagrams, HMMs), genetic algorithms, instance-based (nearest-neighbor)
      • Algorithms (e.g., for decision trees): ID3, C4.5, CART, OC1
      • Methodologies: supervised, unsupervised, reinforcement; knowledge-guided
    • Theory of Learning
      • Computational learning theory (COLT): complexity, limitations of learning
      • Probably Approximately Correct (PAC) learning
      • Probabilistic, statistical, information theoretic results
    • Multistrategy Learning: Combining Techniques, Knowledge Sources
    • Data: Time Series, Very Large Databases (VLDB), Text Corpora
    • Applications
      • Performance element: classification, decision support, planning, control
      • Database mining and knowledge discovery in databases (KDD)
      • Computer inference: learning to reason
  • 6. Why Machine Learning?
    • New Computational Capability
      • Database mining: converting (technical) records into knowledge
      • Self-customizing programs: learning news filters, adaptive monitors
      • Learning to act: robot planning, control optimization, decision support
      • Applications that are hard to program: automated driving, speech recognition
    • Better Understanding of Human Learning and Teaching
      • Cognitive science: theories of knowledge acquisition (e.g., through practice)
      • Performance elements: reasoning (inference) and recommender systems
    • Time is Right
      • Recent progress in algorithms and theory
      • Rapidly growing volume of online data from various sources
      • Available computational power
      • Growth and interest of learning-based industries (e.g., data mining/KDD)
  • 7. Rule and Decision Tree Learning
    • Example: Rule Acquisition from Historical Data
    • Data
      • Patient 103 (time = 1): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: no, Previous-Premature-Birth: no, Ultrasound: unknown , Elective C-Section: unknown , Emergency-C-Section: unknown
      • Patient 103 (time = 2): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: yes , Previous-Premature-Birth: no, Ultrasound: abnormal , Elective C-Section: no, Emergency-C-Section: unknown
      • Patient 103 (time = n): Age 23, First-Pregnancy: no, Anemia: no, Diabetes: no, Previous-Premature-Birth: no, Ultrasound: unknown , Elective C-Section: no, Emergency-C-Section: YES
    • Learned Rule
      • IF no previous vaginal delivery , AND abnormal 2nd trimester ultrasound , AND malpresentation at admission , AND no elective C-Section THEN probability of emergency C-Section is 0.6
      • Training-set accuracy: 26/41 ≈ 0.634
      • Test-set accuracy: 12/20 = 0.600
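The learned IF-THEN rule above can be read directly as a predicate. A minimal sketch follows; the dictionary keys are illustrative stand-ins for the slide's clinical attributes, not a real schema:

```python
# The slide's learned rule as a predicate: if all four conditions hold,
# predict a 0.6 probability of emergency C-section; otherwise the rule
# does not fire. Field names below are illustrative placeholders.

def emergency_csection_probability(patient):
    """Return the rule's predicted probability, or None if the rule
    does not apply to this patient record."""
    if (not patient["previous_vaginal_delivery"]
            and patient["abnormal_2nd_trimester_ultrasound"]
            and patient["malpresentation_at_admission"]
            and not patient["elective_csection"]):
        return 0.6
    return None

p = {"previous_vaginal_delivery": False,
     "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True,
     "elective_csection": False}
print(emergency_csection_probability(p))  # 0.6
```

Encoding the rule this way makes the accuracy figures above easy to recompute: apply the predicate to each record and compare against the observed outcome.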
  • 8. Neural Network Learning
    • Autonomous Learning Vehicle In a Neural Net (ALVINN): Pomerleau et al.
      • http://www.ri.cmu.edu/projects/project_160.html
      • Drives at 70 mph on highways
  • 9. Relevant Disciplines
    • Artificial Intelligence
    • Bayesian Methods
    • Cognitive Science
    • Computational Complexity Theory
    • Control Theory
    • Information Theory
    • Neuroscience
    • Philosophy
    • Psychology
    • Statistics
    (Concept map relating machine learning to the fields above via: symbolic representation, planning/problem solving, knowledge-guided learning, Bayes's theorem, missing data estimators, PAC formalism, mistake bounds, language learning, learning to reason, optimization, learning predictors, meta-learning, entropy measures, MDL approaches, optimal codes, ANN models, modular learning, Occam's razor, inductive generalization, power law of practice, heuristic learning, bias/variance formalism, confidence intervals, hypothesis testing)
  • 10. Specifying A Learning Problem
    • Learning = Improving with Experience at Some Task
      • Improve over task T,
      • with respect to performance measure P ,
      • based on experience E .
    • Example: Learning to Play Checkers
      • T : play games of checkers
      • P : percent of games won in world tournament
      • E : opportunity to play against self
    • Refining the Problem Specification: Issues
      • What experience?
      • What exactly should be learned?
      • How shall it be represented ?
      • What specific algorithm to learn it?
    • Defining the Problem Milieu
      • Performance element: How shall the results of learning be applied?
      • How shall the performance element be evaluated? The learning system?
  • 11. Example: Learning to Play Checkers
    • Type of Training Experience
      • Direct or indirect?
      • Teacher or not?
      • Knowledge about the game (e.g., openings/endgames)?
    • Problem: Is Training Experience Representative (of Performance Goal)?
    • Software Design
      • Assumptions of the learning system: legal move generator exists
      • Software requirements: generator, evaluator(s), parametric target function
    • Choosing a Target Function
      • ChooseMove: Board → Move (action selection function, or policy)
      • V: Board → R (board evaluation function)
      • Ideal target V; approximated (learned) target V̂
      • Goal of learning process: operational description (approximation) of V
  • 12. A Target Function for Learning to Play Checkers
    • Possible Definition
      • If b is a final board state that is won, then V(b) = 100
      • If b is a final board state that is lost, then V(b) = -100
      • If b is a final board state that is drawn, then V(b) = 0
      • If b is not a final board state in the game, then V(b) = V(b’) where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game
      • Correct values, but not operational
    • Choosing a Representation for the Target Function
      • Collection of rules?
      • Neural network?
      • Polynomial function (e.g., linear, quadratic combination) of board features?
      • Other?
    • A Representation for Learned Function
      • Linear function of six board features (cf. Mitchell, Ch. 1): V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
      • bp/rp = number of black/red pieces; bk/rk = number of black/red kings; bt/rt = number of black/red pieces threatened (can be taken on next turn)
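The six-feature linear representation can be sketched directly in code. The feature values and weights below are illustrative placeholders, not learned parameters:

```python
# A linear board-evaluation function over the six features named above:
# bp/rp (pieces), bk/rk (kings), bt/rt (threatened pieces).

def v_hat(features, weights):
    """Linear evaluation: w0 plus a weighted sum of the six features."""
    w0 = weights[0]
    return w0 + sum(w * f for w, f in zip(weights[1:], features))

# Example: 12 black pieces, 12 red pieces, no kings, one black piece
# threatened. Weights are illustrative, not the result of training.
features = [12, 12, 0, 0, 1, 0]                    # (bp, rp, bk, rk, bt, rt)
weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]   # (w0, w1, ..., w6)
print(v_hat(features, weights))  # -0.5
```

With this representation, learning reduces to tuning the seven weights, which is what the training procedure on the next slide does.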
  • 13. A Training Procedure for Learning to Play Checkers
    • Obtaining Training Examples
      • V(b): the target function
      • V̂(b): the learned function
      • V_train(b): the training value
    • One Rule for Estimating Training Values: V_train(b) ← V̂(Successor(b))
    • Choose Weight Tuning Rule
      • Least Mean Square (LMS) weight update rule: REPEAT
          • Select a training example b at random
          • Compute the error(b) = V_train(b) − V̂(b) for this training example
          • For each board feature f_i, update weight w_i as follows: w_i ← w_i + c · f_i · error(b), where c is a small constant factor that adjusts the learning rate
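A minimal sketch of the LMS loop above, assuming a linear V̂ with a bias weight w0 over a constant feature of 1. The training data here is a toy linear target rather than checkers boards:

```python
import random

def lms_update(weights, examples, c=0.1, epochs=50):
    """LMS weight tuning as on the slide: repeatedly pick a training
    example b at random, compute error(b) = V_train(b) - V_hat(b),
    then update each weight w_i <- w_i + c * f_i * error(b).
    `examples` is a list of (features, v_train) pairs; the bias weight
    weights[0] pairs with a constant feature value of 1."""
    for _ in range(epochs):
        feats, v_train = random.choice(examples)
        x = [1.0] + list(feats)                         # prepend bias feature
        v_hat = sum(w * f for w, f in zip(weights, x))  # current prediction
        error = v_train - v_hat
        for i, f_i in enumerate(x):
            weights[i] += c * f_i * error
    return weights

# Sanity check on an illustrative linear target y = 2x (not checkers data).
random.seed(0)
examples = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
w = lms_update([0.0, 0.0], examples, c=0.05, epochs=3000)
print(w)  # approximately [0.0, 2.0]
```

The same loop applies to the checkers setting once each board b is mapped to its six-feature vector and V_train(b) is estimated from the successor position.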
  • 14. Design Choices for Learning to Play Checkers: Completed Design
    • Determine type of training experience: games against experts, games against self, or table of correct moves
    • Determine target function: Board → value, or Board → move
    • Determine representation of learned function: polynomial, linear function of six features, or artificial neural network
    • Determine learning algorithm: gradient descent, or linear programming
  • 15. Some Issues in Machine Learning
    • What Algorithms Can Approximate Functions Well? When?
    • How Do Learning System Design Factors Influence Accuracy?
      • Number of training examples
      • Complexity of hypothesis representation
    • How Do Learning Problem Characteristics Influence Accuracy?
      • Noisy data
      • Multiple data sources
    • What Are The Theoretical Limits of Learnability?
    • How Can Prior Knowledge of Learner Help?
    • What Clues Can We Get From Biological Learning Systems?
    • How Can Systems Alter Their Own Representation?
  • 16. Interesting Applications
    • Reasoning (inference, decision support): Cartia ThemeScapes (http://www.cartia.com), 6500 news stories from the WWW in 1997
    • Planning, control: DC-ARM (http://www-kbs.ai.uiuc.edu) (state diagram: Normal, Ignited, Engulfed, Destroyed, Extinguished; events: Fire Alarm, Flooding)
    • Database mining: NCSA D2K (http://alg.ncsa.uiuc.edu)
  • 17. Lecture Outline
    • Read: Chapter 2, Mitchell; Section 5.1.2, Buchanan and Wilkins
    • Suggested Exercises: 2.2, 2.3, 2.4, 2.6
    • Taxonomy of Learning Systems
    • Learning from Examples
      • (Supervised) concept learning framework
      • Simple approach: assumes no noise; illustrates key concepts
    • General-to-Specific Ordering over Hypotheses
      • Version space: partially-ordered set (poset) formalism
      • Candidate elimination algorithm
      • Inductive learning
    • Choosing New Examples
    • Next Week
      • The need for inductive bias: 2.7, Mitchell; 2.4.1-2.4.3, Shavlik and Dietterich
      • Computational learning theory (COLT): Chapter 7, Mitchell
      • PAC learning formalism: 7.2-7.4, Mitchell; 2.4.2, Shavlik and Dietterich
  • 18. What to Learn?
    • Classification Functions
      • Learning hidden functions: estimating (“fitting”) parameters
      • Concept learning (e.g., chair, face, game)
      • Diagnosis, prognosis: medical, risk assessment, fraud, mechanical systems
    • Models
      • Map (for navigation)
      • Distribution (query answering, aka QA)
      • Language model (e.g., automaton/grammar)
    • Skills
      • Playing games
      • Planning
      • Reasoning (acquiring representation to use in reasoning)
    • Cluster Definitions for Pattern Recognition
      • Shapes of objects
      • Functional or taxonomic definition
    • Many Problems Can Be Reduced to Classification
  • 19. How to Learn It?
    • Supervised
      • What is learned? Classification function; other models
      • Inputs and outputs? Learning: an approximation to f from labeled examples <x, f(x)>
      • How is it learned? Presentation of examples to learner (by teacher)
    • Unsupervised
      • Cluster definition, or vector quantization function ( codebook )
      • Learning: cluster structure inferred from unlabeled observations alone
      • Formation, segmentation, labeling of clusters based on observations, metric
    • Reinforcement
      • Control policy (function from states of the world to actions)
      • Learning: a policy from observed states, selected actions, and reward signals
      • (Delayed) feedback of reward values to agent based on actions selected; model updated based on reward, (partially) observable state
  • 20. (Supervised) Concept Learning
    • Given: Training Examples < x, f(x) > of Some Unknown Function f
    • Find: A Good Approximation to f
    • Examples (besides Concept Learning)
      • Disease diagnosis
        • x = properties of patient (medical history, symptoms, lab tests)
        • f = disease (or recommended therapy)
      • Risk assessment
        • x = properties of consumer, policyholder (demographics, accident history)
        • f = risk level (expected cost)
      • Automatic steering
        • x = bitmap picture of road surface in front of vehicle
        • f = degrees to turn the steering wheel
      • Part-of-speech tagging
      • Fraud/intrusion detection
      • Web log analysis
      • Multisensor integration and prediction
  • 21. A Learning Problem
    • (Block diagram: inputs x1, x2, x3, x4 feed an unknown function producing y = f(x1, x2, x3, x4))
    • xi: ti, y: t, f: (t1 × t2 × t3 × t4) → t
    • Our learning function: Vector(t1 × t2 × t3 × t4 × t) → ((t1 × t2 × t3 × t4) → t)
  • 22. Hypothesis Space: Unrestricted Case
    • | A  B | = | B | | A |
    • | H 4  H | = | {0,1}  {0,1}  {0,1}  {0,1}  {0,1} | = 2 2 4 = 65536 function values
    • Complete Ignorance: Is Learning Possible?
      • Need to see every possible input/output pair
      • After 7 examples, still have 2^9 = 512 possibilities (out of 65536) for f
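The counting argument above can be checked directly:

```python
# Counting Boolean functions over 4 binary inputs: there are 2**4 = 16
# distinct input vectors, and a hypothesis assigns 0 or 1 to each of
# them, giving 2**(2**4) possible functions. Seeing 7 labeled examples
# pins down 7 of the 16 truth-table entries, leaving 2**(16 - 7)
# hypotheses still consistent with the data.

n_inputs = 2 ** 4              # 16 distinct input vectors
n_functions = 2 ** n_inputs    # all Boolean functions on 4 inputs
remaining = 2 ** (n_inputs - 7)

print(n_functions, remaining)  # 65536 512
```

This is the "complete ignorance" point on the slide: without restricting the hypothesis space, learning cannot generalize beyond the examples actually seen.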
  • 23. Training Examples for Concept EnjoySport
    • Specification for Examples
      • Similar to a data type definition
      • 6 attributes: Sky, Temp, Humidity, Wind, Water, Forecast
      • Nominal-valued (symbolic) attributes - enumerative data type
    • Binary (Boolean-Valued or H-Valued) Concept
    • Supervised Learning Problem: Describe the General Concept
  • 24. Representing Hypotheses
    • Many Possible Representations
    • Hypothesis h : Conjunction of Constraints on Attributes
    • Constraint Values
      • Specific value (e.g., Water = Warm )
      • Don’t care (e.g., “Water = ?” )
      • No value allowed (e.g., “Water = Ø”)
    • Example Hypothesis for EnjoySport
      • <Sky, AirTemp, Humidity, Wind, Water, Forecast> = <Sunny, ?, ?, Strong, ?, Same>
      • Is this consistent with the training examples?
      • What are some hypotheses that are consistent with the examples?
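The constraint semantics above (specific value, "?" for don't care, "Ø" for no value allowed) suggest a simple consistency check. A sketch follows, with illustrative EnjoySport-style attribute tuples:

```python
# Consistency check for conjunctive hypotheses: a hypothesis matches an
# example iff every attribute constraint is satisfied. "?" matches any
# value; a "Ø" constraint equals no real attribute value, so it matches
# nothing, as intended. Attribute values below are illustrative.

def matches(hypothesis, example):
    return all(h == "?" or h == x for h, x in zip(hypothesis, example))

def consistent(hypothesis, labeled_examples):
    """True iff the hypothesis classifies every <x, label> pair correctly."""
    return all(matches(hypothesis, x) == label for x, label in labeled_examples)

h = ("Sunny", "?", "?", "Strong", "?", "Same")
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
]
print(consistent(h, examples))  # True
```

Checks like this are the building block for the candidate elimination algorithm previewed in the lecture outline, which searches the partially ordered space of such hypotheses.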
