MATH3346 Data Mining: A Machine Learning Framework for Data Mining
Presentation Transcript

  • MATH3346 Data Mining: A Machine Learning Framework for Data Mining
    Graham.Williams@togaware.com, August 2005
    © 2005 Graham.Williams@togaware.com, MATH3346 Data Mining: Machine Learning
  • Outline
    1 A Learning Framework
        The Learning Problem
        Playing Draughts
        Knowledge Representation
        Algorithm
    2 Concept Learning
        Example
        Hypotheses
  • Reference Book
    Machine Learning, Tom Mitchell, 1997, McGraw-Hill, ISBN: 0070428077.
  • What is the Learning Problem?
    Learning = improving with experience at some task.
    Improve over task T, with respect to performance measure P, based on experience E.
    E.g., learn to play draughts (checkers):
      T: Play draughts
      P: % of games won in world tournament
      E: Opportunity to play against self
  • A Framework For Learning
    From what data do we learn?
      Supervised versus unsupervised
    How to represent the knowledge discovered?
      Group means
      Regression formula
      Decision tree
      Neural network
    How to discover the sentence that best describes the data?
      Search through the representation space
  • Learning to Play Draughts
    T: Play draughts
    P: Percent of games won in world tournament
    What experience?
    What exactly should be learned?
    How shall it be represented?
    What specific algorithm to learn it?
  • Training Experience
    Direct training: current board → move
    Indirect training: moves → outcome
    Teacher: guides training as a supervisor
    No teacher: the learner proposes boards and measures performance
    A problem: is the training experience representative of the performance goal?
  • Choose the Target Function
    What is the best move, given the current layout?
      ChooseMove : Board → Move
      ChooseMove is difficult to learn.
    Evaluate the current board layout:
      V : Board → ℝ
    The aim is to learn an evaluation function.
  • Possible Definition for Target Function V
    If b is a final board state that is won, then V(b) = 100.
    If b is a final board state that is lost, then V(b) = −100.
    If b is a final board state that is drawn, then V(b) = 0.
    If b is not a final state in the game, then V(b) = V(b′), where b′ is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
    This gives correct values, but is not operational: how do we make use of it?
  • Choose Representation for Target Function
    Choice of representation is "everything", but which one?
      Collection of rules?
      Neural network?
      Decision tree?
      Numeric formula?
      Polynomial function of board features?
      ...
  • A Representation for Learned Function
    $\hat{V}(b) = w_0 + w_1 \cdot bp(b) + w_2 \cdot rp(b) + w_3 \cdot bk(b) + w_4 \cdot rk(b) + w_5 \cdot bt(b) + w_6 \cdot rt(b)$
      bp(b): number of black pieces on board b
      rp(b): number of red pieces on b
      bk(b): number of black kings on b
      rk(b): number of red kings on b
      bt(b): number of red pieces threatened by black (i.e., which can be taken on black's next turn)
      rt(b): number of black pieces threatened by red
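As a rough illustration of the linear representation above, the following Python sketch evaluates the learned function for one board. It assumes the six feature counts have already been extracted from the board; the function name `v_hat`, the example weights, and the example feature values are hypothetical and do not come from the slides.

```python
from typing import Sequence

# Feature order follows the slide: bp, rp, bk, rk, bt, rt.
FEATURE_NAMES = ("bp", "rp", "bk", "rk", "bt", "rt")


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Return w0 + sum_i w_i * f_i for one board's feature vector."""
    if len(weights) != len(features) + 1:
        raise ValueError("need one bias weight w0 plus one weight per feature")
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * f for w, f in zip(rest, features))


# Hypothetical board: 12 black pieces, 11 red pieces, no kings,
# one red piece threatened by black, no black pieces threatened.
example_features = (12, 11, 0, 0, 1, 0)
# Hypothetical weights (w0..w6), chosen only to make the arithmetic easy to follow.
example_weights = (0.0, 1.0, -1.0, 3.0, -3.0, 0.5, -0.5)
example_value = v_hat(example_weights, example_features)
```

With these illustrative numbers the board is valued at 12 − 11 + 0.5 = 1.5, slightly favourable to black. Learning then amounts to choosing the seven weights.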
  • Obtaining Training Examples
    All we know is the outcome of the game.
      V(b): the true target function
      $\hat{V}(b)$: the learned function
      $V_{train}(b)$: the training values (supplied)
    A simple and empirically useful rule for estimating training values:
      $V_{train}(b) \leftarrow \hat{V}(\mathrm{Successor}(b))$
  • Choose Weight Tuning Rule
    LMS weight update rule: minimise the squared error
      $E = \sum_{b \in \text{training}} (V_{train}(b) - \hat{V}(b))^2$
    Repeat:
      Select a training example b at random.
      1 Compute error(b): $error(b) = V_{train}(b) - \hat{V}(b)$
      2 For each board feature $f_i$ (e.g., bp), update weight $w_i$: $w_i \leftarrow w_i + c \cdot f_i \cdot error(b)$
    c is some small constant, say 0.1, to moderate the rate of learning.
    This is a stochastic gradient-descent search to minimise E.
  • Design Choices
    [Diagram: sequence of design decisions]
    Determine Type of Training Experience: games against experts; games against self; table of correct moves; ...
    Determine Target Function: Board → move; Board → value; ...
    Determine Representation of Learned Function: polynomial; linear function of six features; artificial neural network; ...
    Determine Learning Algorithm: gradient descent; linear programming; ...
    Completed Design
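The LMS weight update rule above can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: the names `v_hat`, `lms_update`, and `train` are hypothetical, the bias weight w0 is updated as if it had a constant feature of 1 (an assumption the slide does not state), and the two toy training examples are invented for illustration.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (feature vector, V_train value)


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + sum_i w_i * f_i."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def lms_update(weights: Sequence[float], features: Sequence[float],
               v_train: float, c: float = 0.1) -> List[float]:
    """Apply one LMS step: w_i <- w_i + c * f_i * error(b)."""
    error = v_train - v_hat(weights, features)
    # Assumption: w0 is treated as the weight of a constant feature f0 = 1.
    updated = [weights[0] + c * error]
    updated.extend(w + c * f * error for w, f in zip(weights[1:], features))
    return updated


def train(examples: Sequence[Example], n_features: int, c: float = 0.1,
          steps: int = 2000, seed: int = 0) -> List[float]:
    """Repeatedly pick a random training example and apply the LMS step."""
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)
    for _ in range(steps):
        features, v_train = rng.choice(examples)
        weights = lms_update(weights, features, v_train, c)
    return weights


# Invented toy data with two features, small enough to be fitted exactly.
toy_examples = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0)]
toy_weights = train(toy_examples, n_features=2)
```

The constant c plays the role of a learning rate: with large feature values a step of 0.1 can overshoot, which is one reason the slide describes it only as "some small constant".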
  • Learning a Concept from Examples: EnjoySport
    Concept learning: infer a boolean function from examples of input/output.
    Target concept: days when Aldo enjoys his water sport.
      Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
      Sunny  Warm  Normal  Strong  Warm   Same      Yes
      Sunny  Warm  High    Strong  Warm   Same      Yes
      Rainy  Cold  High    Strong  Warm   Change    No
      Rainy  Warm  High    Strong  Cool   Change    Yes
    What is the general concept?
  • Representing Hypotheses
    Many possible representations.
    Here, h is a conjunction of constraints on attributes.
    Each constraint can be:
      a specific value (e.g., Water = Warm)
      don't care (e.g., Water = ?)
      no value allowed (e.g., Water = ∅)
    For example:
      Sky    AirTemp  Humid  Wind    Water  Forecast
      Sunny  ?        ?      Strong  ?      Same
  • Prototypical Concept Learning Task
    Given:
      Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
      Hypotheses H: conjunctions of literals, e.g. ⟨?, Cold, High, ?, ?, ?⟩
      Target function (concept) c: EnjoySport : X → {0, 1}
      Training examples D: positive and negative examples of the target function, ⟨x1, c(x1)⟩, ..., ⟨xm, c(xm)⟩
    Determine:
      A hypothesis h in H such that h(x) = c(x) for all x in D.
  • Inductive Learning Hypothesis
    Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
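The conjunctive hypothesis representation and the condition h(x) = c(x) for all x in D, both described above, can be sketched in Python. This is only an illustration under stated assumptions: the string "?" stands for "don't care", `None` stands for the ∅ constraint, the names `matches` and `consistent` are hypothetical, and the four training rows are copied from the EnjoySport table as it appears in this transcript.

```python
from typing import Optional, Sequence, Tuple

ANY = "?"        # "don't care" constraint
NO_VALUE = None  # stands in for the "no value allowed" constraint

Hypothesis = Sequence[Optional[str]]
Instance = Sequence[str]


def matches(h: Hypothesis, x: Instance) -> bool:
    """h(x): True when every constraint in h is satisfied by instance x."""
    return all(c is not NO_VALUE and (c == ANY or c == v)
               for c, v in zip(h, x))


def consistent(h: Hypothesis, data: Sequence[Tuple[Instance, bool]]) -> bool:
    """True when h(x) equals the label c(x) for every training example."""
    return all(matches(h, x) == label for x, label in data)


# Attribute order: Sky, Temp, Humid, Wind, Water, Forecast.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Rainy", "Warm", "High", "Strong", "Cool", "Change"), True),
]

h_slide = ("Sunny", ANY, ANY, "Strong", ANY, "Same")  # example from the slide
h_other = (ANY, "Warm", ANY, "Strong", ANY, ANY)      # a hypothetical alternative

slide_is_consistent = consistent(h_slide, D)
other_is_consistent = consistent(h_other, D)
```

Here the slide's example hypothesis rejects the fourth day, whose forecast is Change, so it is not consistent with D, while the hypothetical `h_other` agrees with all four labels.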
  • Instances, Hypotheses, and More-General-Than partial order
    [Figure: instances X and hypotheses H, with hypotheses ordered from Specific to General]
      x1 = <Sunny, Warm, High, Strong, Cool, Same>
      x2 = <Sunny, Warm, High, Light, Warm, Same>
      h1 = <Sunny, ?, ?, Strong, ?, ?>
      h2 = <Sunny, ?, ?, ?, ?, ?>
      h3 = <Sunny, ?, ?, ?, Cool, ?>
  • The Learning Problem
    How do we search through this generally very large hypothesis space to find the best hypothesis for the task at hand?
  • Limits on Representational Languages
    Consider a 2-dimensional instance space, where instances are represented by (x, y).
    Choice of representational language affects how well we can learn.
    Consider the illustrations from Hastie, Tibshirani, Friedman, The Elements of Statistical Learning.
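For the conjunctive hypotheses h1, h2, and h3 above, the more-general-than relation can be checked constraint by constraint. The sketch below is one way to do that in Python, assuming "?" encodes "don't care" and `None` encodes the ∅ constraint; the function name `more_general_or_equal` is hypothetical.

```python
from typing import Optional, Sequence

ANY = "?"        # "don't care" constraint
NO_VALUE = None  # stands in for the "no value allowed" constraint

Hypothesis = Sequence[Optional[str]]


def more_general_or_equal(hj: Hypothesis, hk: Hypothesis) -> bool:
    """True when every instance matched by hk is also matched by hj."""
    # A hypothesis containing a "no value" constraint matches nothing,
    # so any hypothesis is at least as general as it.
    if any(c is NO_VALUE for c in hk):
        return True
    # Otherwise each constraint of hj must be "?" or equal to hk's constraint.
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


# Hypotheses from the slide (Sky, AirTemp, Humid, Wind, Water, Forecast).
h1 = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
h2 = ("Sunny", ANY, ANY, ANY, ANY, ANY)
h3 = ("Sunny", ANY, ANY, ANY, "Cool", ANY)

h2_covers_h1 = more_general_or_equal(h2, h1)
h2_covers_h3 = more_general_or_equal(h2, h3)
h1_covers_h3 = more_general_or_equal(h1, h3)
h3_covers_h1 = more_general_or_equal(h3, h1)
```

h2 is more general than both h1 and h3, while h1 and h3 are not comparable with each other, which is why the relation is only a partial order.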