Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...

Transcript

  • 1. Lehrstuhl für Maschinelles Lernen und Natürlich Sprachliche Systeme. Albrecht Zimmermann, Tayfun Gürel, Kristian Kersting, Prof. Dr. Luc De Raedt. Machine Learning in Games: Crash Course on Machine Learning
  • 2. Why Machine Learning?
    • Past
    • Computers (mostly) programmed by hand
    • Future
    • Computers (mostly) program themselves, by interaction with their environment
  • 3. Behavioural Cloning / Verhaltensimitation (behaviour imitation) [diagram: the user plays, play logs are recorded, and a user model is learned from them]
  • 4. Backgammon
    • More than 10^20 states (boards)
    • Best human players see only small fraction of all boards during lifetime
    • Searching is hard because of dice (branching factor > 100)
  • 5. TD-Gammon by Tesauro (1995)
  • 6. Recent Trends
    • Recent progress in algorithms and theory
    • Growing flood of online data
    • Computational power is available
    • Growing industry
  • 7. Three Niches for Machine Learning
    • Data mining: using historical data to improve decisions
      • Medical records → medical knowledge
    • Software applications we can’t program by hand
      • Autonomous driving
      • Speech recognition
    • Self customizing programs
      • Newsreader that learns user interests
  • 8. Typical Data Mining task
    • Given:
      • 9,714 patient records, each describing pregnancy and birth
      • Each patient record contains 215 features
    • Learn to predict:
      • Class of future patients at risk for Emergency Cesarean Section
  • 9. Data Mining Result
    • One of 18 learned rules:
      • If no previous vaginal delivery,
      • and abnormal 2nd trimester ultrasound,
      • and malpresentation at admission,
      • then probability of Emergency C-Section is 0.6
    • Accuracy over training data: 26/41 = .63
    • Accuracy over testing data: 12/20 = .60
  • 10. Credit Risk Analysis
    • Learned Rules:
      • If Other-Delinquent-Accounts > 2 and Number-Delinquent-Billing-Cycles > 1, Then Profitable-Customer? = no
      • If Other-Delinquent-Accounts = 0 and (Income > $30k OR Years-of-credit > 3), Then Profitable-Customer? = yes
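Read as code, the two learned rules become a three-way decision procedure: one rule may fire, or neither covers the customer. A minimal sketch, assuming the slide's attribute names and thresholds (the Python identifiers themselves are hypothetical):

```python
def profitable_customer(other_delinquent, delinquent_cycles, income, years_of_credit):
    """Apply the two learned credit-risk rules from the slide.

    Returns False/True when a rule fires, or None when neither rule
    covers the customer (the rule set is not exhaustive).
    """
    # Rule 1: Other-Delinquent-Accounts > 2 and delinquent billing cycles > 1
    if other_delinquent > 2 and delinquent_cycles > 1:
        return False
    # Rule 2: no delinquent accounts and (Income > $30k OR Years-of-credit > 3)
    if other_delinquent == 0 and (income > 30_000 or years_of_credit > 3):
        return True
    return None  # neither learned rule applies
```

Note that learned rule sets, unlike a decision tree, need not partition the input space, hence the explicit "no rule fired" case.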
  • 11. Other Prediction Problems: process optimization, customer purchase behavior, customer retention
  • 12. Problems Too Difficult to Program by Hand
    • ALVINN [Pomerleau] drives 70 mph on highways
  • 13. Problems Too Difficult to Program by Hand
    • ALVINN [Pomerleau] drives 70 mph on highways
  • 14. Software that Customizes to User
  • 15. Lehrstuhl für Maschinelles Lernen und Natürlich Sprachliche Systeme. Albrecht Zimmermann, Tayfun Gürel, Kristian Kersting, Prof. Dr. Luc De Raedt. Machine Learning in Games: Crash Course on Decision Tree Learning [example tree: Refund? Yes → NO; No → MarSt? Married → NO; Single, Divorced → TaxInc? < 80K → NO, > 80K → YES]
  • 16. Classification: Definition
    • Given a collection of records ( training set )
      • Each record contains a set of attributes , one of the attributes is the class .
    • Find a model for class attribute as a function of the values of other attributes.
    • Goal: previously unseen records should be assigned a class as accurately as possible.
      • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
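The train/test protocol just described can be sketched in a few lines. An illustrative helper (the function name and the 70/30 split fraction are assumptions, not from the slides):

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    # Shuffle a copy of the records, then carve off the last
    # test_fraction as the held-out test set.
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
```

The model is then built on `train` only, and its accuracy is measured on `test`, which the learner never saw.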
  • 17. Illustrating Classification Task
  • 18. Examples of Classification Task
    • Predicting tumor cells as benign or malignant
    • Classifying credit card transactions as legitimate or fraudulent
    • Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
    • Categorizing news stories as finance, weather, entertainment, sports, etc
  • 19. Classification Techniques
    • Decision Tree based Methods
    • Rule-based Methods
    • Instance-Based Learners
    • Neural Networks
    • Bayesian Networks
    • (Conditional) Random Fields
    • Support Vector Machines
    • Inductive Logic Programming
    • Statistical Relational Learning
  • 20. Decision Tree for PlayTennis [tree: Outlook? Sunny → Humidity? (High → No, Normal → Yes); Overcast → Yes; Rain → Wind? (Strong → No, Weak → Yes)]
  • 21. Decision Tree for PlayTennis [tree fragment: Outlook? Sunny → Humidity? (High → No, Normal → Yes)] Each internal node tests an attribute; each branch corresponds to an attribute value; each leaf node assigns a classification.
  • 22. Decision Tree for PlayTennis. Classifying the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Weak): the tree routes it Sunny → Humidity=High → No. [tree: Outlook? Sunny → Humidity? (High → No, Normal → Yes); Overcast → Yes; Rain → Wind? (Strong → No, Weak → Yes)]
  • 23. Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak [tree: Outlook? Sunny → Wind? (Strong → No, Weak → Yes); Overcast → No; Rain → No]
  • 24. Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak [tree: Outlook? Sunny → Yes; Overcast → Wind? (Strong → No, Weak → Yes); Rain → Wind? (Strong → No, Weak → Yes)]
  • 25. Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak [tree: Outlook? Sunny → Wind? (Strong → Yes, Weak → No); Overcast → Wind? (Strong → No, Weak → Yes); Rain → Wind? (Strong → No, Weak → Yes)]
  • 26. Decision Tree
    • Decision trees represent disjunctions of conjunctions:
    (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak) [tree: Outlook? Sunny → Humidity? (High → No, Normal → Yes); Overcast → Yes; Rain → Wind? (Strong → No, Weak → Yes)]
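That disjunction of conjunctions is directly executable; an illustrative sketch (attribute values taken from the tree, the function signature is an assumption):

```python
def play_tennis(outlook, humidity, wind):
    # Disjunction of conjunctions read off the tree's "Yes" leaves:
    # (Sunny AND Normal) OR Overcast OR (Rain AND Weak)
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```

Every root-to-"Yes"-leaf path contributes one conjunct to the disjunction.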
  • 27. When to consider Decision Trees
    • Instances describable by attribute-value pairs
    • Target function is discrete valued
    • Disjunctive hypothesis may be required
    • Possibly noisy training data
    • Missing attribute values
    • Examples:
      • Medical diagnosis
      • Credit risk analysis
      • RTS Games ?
  • 28. Decision Tree Induction
    • Many Algorithms:
      • Hunt’s Algorithm (one of the earliest)
      • CART
      • ID3, C4.5
  • 29. Top-Down Induction of Decision Trees ID3
    • 1. A ← the “best” decision attribute for the next node
    • 2. Assign A as decision attribute for the node
    • 3. For each value of A, create a new descendant
    • 4. Sort training examples to the leaf nodes according to the attribute value of the branch
    • 5. If all training examples are perfectly classified (same value of target attribute), stop; else iterate over the new leaf nodes.
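The five steps above can be sketched as a recursive procedure. A compact, illustrative Python version (not the course's reference implementation), representing the learned tree as nested dicts:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Impurity of a list of class labels: -sum_i p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    # Expected entropy reduction from splitting on attr
    n = len(labels)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [l for e, l in zip(examples, labels) if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attrs):
    # examples: list of {attribute: value} dicts
    if len(set(labels)) == 1:          # all examples perfectly classified
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(examples, labels, a))
    tree = {}
    for v in {e[best] for e in examples}:
        sub = [(e, l) for e, l in zip(examples, labels) if e[best] == v]
        sub_ex, sub_lab = zip(*sub)
        tree[v] = id3(list(sub_ex), list(sub_lab),
                      [a for a in attrs if a != best])
    return {best: tree}
```

The greedy choice of `best` with no backtracking is exactly what makes this a hill-climbing search, as discussed on slide 42.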
  • 30. Which Attribute is ”best”?
    • Example:
      • 2 Attributes, 1 class variable
      • 64 examples: 29+, 35-
    [splits of S = [29+, 35-]: A1? True → [21+, 5-], False → [8+, 30-]; A2? True → [18+, 33-], False → [11+, 2-]]
  • 31. Entropy
    • S is a sample of training examples
    • p+ is the proportion of positive examples
    • p- is the proportion of negative examples
    • Entropy measures the impurity of S
      • Entropy(S) = -p+ log2(p+) - p- log2(p-)
  • 32. Entropy
    • Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code)
    • Information theory: an optimal-length code assigns -log2(p) bits to messages having probability p.
    • So, the expected number of bits to encode the class (+ or -) of a random member of S is:
      • -p+ log2(p+) - p- log2(p-)
    • (with the convention 0 log2 0 = 0)
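The entropy value used in the next slides can be checked directly. A minimal sketch (the function name is illustrative), using the 0·log2(0) = 0 convention:

```python
from math import log2

def entropy(p_pos, p_neg):
    # -p+ log2(p+) - p- log2(p-), with 0 * log2(0) taken to be 0
    terms = [p * log2(p) for p in (p_pos, p_neg) if p > 0]
    return -sum(terms)

# the [29+, 35-] sample from slide 30
e = entropy(29 / 64, 35 / 64)  # rounds to the 0.99 on the slide
```

A 50/50 split gives the maximum of 1 bit; a pure sample gives 0.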
  • 33. Information Gain
    • Gain(S,A): expected reduction in entropy due to sorting S on attribute A
    Gain(S,A) = Entropy(S) - Σ_{v ∈ values(A)} (|S_v|/|S|) · Entropy(S_v)
    Entropy([29+,35-]) = -29/64 log2(29/64) - 35/64 log2(35/64) = 0.99
    [splits as on slide 30: A1? True → [21+, 5-], False → [8+, 30-]; A2? True → [18+, 33-], False → [11+, 2-]]
  • 34. Information Gain
    • Entropy([21+,5-]) = 0.71
    • Entropy([8+,30-]) = 0.74
    • Gain(S,A1) = Entropy(S) - 26/64·Entropy([21+,5-]) - 38/64·Entropy([8+,30-]) = 0.27
    • Entropy([18+,33-]) = 0.94
    • Entropy([11+,2-]) = 0.62
    • Gain(S,A2) = Entropy(S) - 51/64·Entropy([18+,33-]) - 13/64·Entropy([11+,2-]) = 0.12
    • Entropy(S) = Entropy([29+,35-]) = 0.99
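Both gain computations above can be reproduced numerically. A small sketch working directly on the (+, -) counts (function names are illustrative):

```python
from math import log2

def entropy(pos, neg):
    # Entropy of a (pos, neg) count pair; skip zero counts (0 log 0 = 0)
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, splits):
    # parent: (pos, neg) of S; splits: list of (pos, neg) per attribute value
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q)
                                  for p, q in splits)

g1 = gain((29, 35), [(21, 5), (8, 30)])   # A1: rounds to 0.27
g2 = gain((29, 35), [(18, 33), (11, 2)])  # A2: rounds to 0.12
```

Since g1 > g2, ID3 would pick A1 for this node.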
  • 35. Another Example
    • 14 training examples (9+, 5-): days for playing tennis
      • Wind: weak, strong
      • Humidity: high, normal
  • 36. Another Example
    • Splitting S = [9+,5-] (E = 0.940) on Humidity: High → [3+,4-] (E = 0.985), Normal → [6+,1-] (E = 0.592); Gain(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151
    • Splitting on Wind: Weak → [6+,2-] (E = 0.811), Strong → [3+,3-] (E = 1.0); Gain(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048
  • 37. Yet Another Example: Playing Tennis
    Day  Outlook   Temp.  Humidity  Wind    PlayTennis
    D1   Sunny     Hot    High      Weak    No
    D2   Sunny     Hot    High      Strong  No
    D3   Overcast  Hot    High      Weak    Yes
    D4   Rain      Mild   High      Weak    Yes
    D5   Rain      Cool   Normal    Weak    Yes
    D6   Rain      Cool   Normal    Strong  No
    D7   Overcast  Cool   Normal    Weak    Yes
    D8   Sunny     Mild   High      Weak    No
    D9   Sunny     Cool   Normal    Weak    Yes
    D10  Rain      Mild   Normal    Strong  Yes
    D11  Sunny     Mild   Normal    Strong  Yes
    D12  Overcast  Mild   High      Strong  Yes
    D13  Overcast  Hot    Normal    Weak    Yes
    D14  Rain      Mild   High      Strong  No
  • 38. PlayTennis - Selecting the Next Attribute
    • Splitting S = [9+,5-] (E = 0.940) on Outlook: Sunny → [2+,3-] (E = 0.971), Overcast → [4+,0-] (E = 0.0), Rain → [3+,2-] (E = 0.971)
    • Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247
    • Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.048; Gain(S, Temp) = 0.029
  • 39. PlayTennis - ID3 Algorithm
    • Root split on Outlook over [D1,D2,…,D14] = [9+,5-]: Overcast → [D3,D7,D12,D13] = [4+,0-] → Yes; Sunny → S_sunny = [D1,D2,D8,D9,D11] = [2+,3-] → ?; Rain → [D4,D5,D6,D10,D14] = [3+,2-] → ?
    • Gain(S_sunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
    • Gain(S_sunny, Temp.) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
    • Gain(S_sunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
  • 40. ID3 Algorithm [final tree: Outlook? Sunny → Humidity? (High → No [D1,D2], Normal → Yes [D8,D9,D11]); Overcast → Yes [D3,D7,D12,D13]; Rain → Wind? (Strong → No [D6,D14], Weak → Yes [D4,D5,D10])]
  • 41. Hypothesis Space Search ID3 [figure: greedy search through the space of decision trees, growing the tree one attribute test at a time]
  • 42. Hypothesis Space Search ID3
    • Hypothesis space is complete!
      • Target function surely in there…
    • Outputs a single hypothesis
    • No backtracking on selected attributes (greedy search)
      • Local minima (suboptimal splits)
    • Statistically-based search choices
      • Robust to noisy data
    • Inductive bias (search bias)
      • Prefer shorter trees over longer ones
      • Place high information gain attributes close to the root
  • 43. Converting a Tree to Rules
    R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
    R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
    R3: If (Outlook=Overcast) Then PlayTennis=Yes
    R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
    R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
    [tree: Outlook? Sunny → Humidity? (High → No, Normal → Yes); Overcast → Yes; Rain → Wind? (Strong → No, Weak → Yes)]
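The conversion is mechanical: each root-to-leaf path becomes one rule, with the path's attribute tests as the conjunctive condition. An illustrative sketch, assuming the tree is stored as nested dicts (this representation is an assumption, not from the slides):

```python
def tree_to_rules(tree, conditions=()):
    # Walk every root-to-leaf path; each path yields one rule
    # (list of (attribute, value) tests, classification).
    if not isinstance(tree, dict):       # leaf: the classification
        return [(list(conditions), tree)]
    (attr, branches), = tree.items()     # one attribute test per node
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules

play_tennis = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}
rules = tree_to_rules(play_tennis)  # five rules, matching R1-R5
```

Once extracted, the rules can be pruned independently of each other, which is the usual motivation for this conversion.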
  • 44. Conclusions
    • Decision tree learning provides a practical method for concept learning.
    • ID3-like algorithms search a complete hypothesis space.
    • The inductive bias of decision trees is a preference (search) bias.
    • Overfitting the training data (you will see it ;-)) is an important issue in decision tree learning.
    • A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc. (feel free to try them out).