Classification Learn a method for predicting the instance class from pre-labeled (classified) instances Many approaches: Regression, Decision Trees, Bayesian, Neural Networks, ... Given a set of points from classes what is the class of new point ?
A branch represents an outcome of the test, e.g., Color=red.
A leaf node represents a class label or class label distribution.
At each node, one attribute is chosen to split training examples into distinct classes as much as possible
A new instance is classified by following a matching path to a leaf node.
Weather Data: Play or not Play? Note: Outlook is the Forecast, no relation to Microsoft email program No true high mild rain Yes false normal hot overcast Yes true high mild overcast Yes true normal mild sunny Yes false normal mild rain Yes false normal cool sunny No false high mild sunny Yes true normal cool overcast No true normal cool rain Yes false normal cool rain Yes false high mild rain Yes false high hot overcast No true high hot sunny No false high hot sunny Play? Windy Humidity Temperature Outlook
Example Tree for “Play?” overcast high normal false true sunny rain No No Yes Yes Yes Outlook Humidity Windy
Accuracy on the entire dataset is not the right measure
develop a target model
score all prospects and rank them by decreasing score
select top P% of prospects for action
How do we decide what is the best subset of prospects ?
Model-Sorted List Use a model to assign score to each customer Sort customers by decreasing score Expect more targets (hits) near the top of the list 3 hits in top 5% of the list If there 15 targets overall, then top 5 has 3/15=20% of targets … 4897 N 0.92 5 2422 2734 … 3820 2478 1024 1746 CustID N 0.06 100 … N 0.11 99 … … … … … Age Y 0.93 4 … … Y 0.94 3 N 0.95 2 Y 0.97 1 Target Score No