Lecture1.ppt

Classification
• Introduction
• Statistical-Based Algorithms
• Distance-Based Algorithms
• Tree-Based Algorithms
• Rule-Based Algorithms
• Neural Network-Based Algorithms
• Combining Techniques

Introduction
• Classification maps input data to the appropriate classes.
• Def: Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples (items,
records) and a set of classes $C = \{C_1, \ldots, C_m\}$, the
classification problem is to define a mapping $f: D \rightarrow C$ where
each $t_i$ is assigned to one class. A class $C_j$ contains precisely
those tuples mapped to it; that is,
$C_j = \{t_i \mid f(t_i) = C_j,\ 1 \le i \le n,\ t_i \in D\}$.
• The problem is implemented in two phases:
1. Create a specific model by evaluating the training data.
2. Apply the model to classify tuples from the target database.
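The two phases, together with the mapping $f$ that partitions $D$ into classes, can be sketched in Python. This is a minimal sketch under stated assumptions: the slide does not fix an algorithm at this point, so a nearest-centroid rule (a simple distance-based method) stands in as the model, and the function names `train`, `classify`, `partition` and the height data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical training data: ((height_cm,), class_label) pairs.
training = [((150.0,), "short"), ((155.0,), "short"),
            ((180.0,), "tall"), ((185.0,), "tall")]

def train(training_data):
    """Phase 1 (assumed nearest-centroid model): mean feature vector per class."""
    grouped = defaultdict(list)
    for features, label in training_data:
        grouped[label].append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in grouped.items()}

def classify(model, features):
    """The mapping f: pick the class whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def partition(model, database):
    """Phase 2: apply f to every tuple; C_j collects exactly the tuples mapped to class j."""
    classes = defaultdict(list)
    for t in database:
        classes[classify(model, t)].append(t)
    return dict(classes)

model = train(training)
target_db = [(152.0,), (178.0,), (190.0,)]
print(model)                        # {'short': (152.5,), 'tall': (182.5,)}
print(partition(model, target_db))  # {'short': [(152.0,)], 'tall': [(178.0,), (190.0,)]}
```

Because `classify` returns exactly one label per tuple, every $t_i$ lands in exactly one $C_j$, which is what the definition requires.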

• Issues in classification:
1. Missing Data
2. Measuring Performance

Missing Data
There are several approaches to handling missing data:
• Ignore the missing data.
• Assume a value for the missing data.
• Assume a special value for the missing data.
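The three approaches above can be illustrated on a small table with a single attribute. This is only a sketch: the records, the use of `None` to mark a missing value, the choice of the mean of the known values as the "assumed value", and the marker string `"MISSING"` are all hypothetical choices, not prescribed by the lecture.

```python
from statistics import mean

MISSING_MARKER = "MISSING"  # hypothetical special value

# Hypothetical tuples; None marks a missing "age" value.
records = [{"id": 1, "age": 30}, {"id": 2, "age": None}, {"id": 3, "age": 50}]

def ignore_missing(rows, attr):
    """Approach 1: drop every tuple whose attribute value is missing."""
    return [r for r in rows if r[attr] is not None]

def assume_value(rows, attr):
    """Approach 2: fill a missing value with an assumed value (here, the mean of known values)."""
    fill = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def assume_special_value(rows, attr, marker=MISSING_MARKER):
    """Approach 3: fill a missing value with a special marker the classifier can treat as its own value."""
    return [{**r, attr: marker if r[attr] is None else r[attr]} for r in rows]

print(ignore_missing(records, "age"))
print(assume_value(records, "age"))
print(assume_special_value(records, "age"))
```

Each function returns new dictionaries rather than mutating `records`, so the three strategies can be compared on the same input.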

Measuring Performance and Accuracy
• Classification accuracy is usually calculated by determining the
percentage of tuples placed in the correct class.
• Given a specific class $C_j$ and a database tuple $t_i$, the tuple may or
may not be assigned to that class, and its actual membership may or may
not be in that class. This gives us four quadrants:
• True positive (TP): $t_i$ is predicted to be in $C_j$ and is actually in it.
• False positive (FP): $t_i$ is predicted to be in $C_j$ but is not actually in it.
• True negative (TN): $t_i$ is not predicted to be in $C_j$ and is not actually in it.
• False negative (FN): $t_i$ is not predicted to be in $C_j$ but is actually in it.
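Accuracy and the four quadrants can both be computed from paired lists of predicted and actual class labels. In this minimal sketch the labels `A`, `B`, `C` and the function names `accuracy` and `quadrants` are hypothetical; accuracy is returned as a fraction and printed as a percentage, matching "percentage of tuples placed in the correct class".

```python
def accuracy(predicted, actual):
    """Fraction of tuples whose predicted class equals their actual class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def quadrants(predicted, actual, cls):
    """Count TP, FP, TN, FN for one class C_j (= cls)."""
    if len(predicted) != len(actual):
        raise ValueError("need equal-length label lists")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == cls and a == cls:
            tp += 1          # predicted in C_j, actually in C_j
        elif p == cls:
            fp += 1          # predicted in C_j, actually not
        elif a == cls:
            fn += 1          # not predicted in C_j, actually in it
        else:
            tn += 1          # neither predicted nor actually in C_j
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical labels for six tuples.
predicted = ["A", "A", "B", "C", "B", "A"]
actual    = ["A", "B", "B", "C", "A", "A"]

print(f"{accuracy(predicted, actual):.1%}")  # 66.7%
print(quadrants(predicted, actual, "A"))     # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```

For any single class the four counts always sum to the number of tuples, since each tuple falls into exactly one quadrant.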

More from Minakshee Patil
