Unit 2
Classification
Classification
• Introduction
• Statistical Based Algorithm
• Distance Based Algorithm
• Tree Based Algorithm
• Rule Based Algorithm
• Neural Network Based Algorithm
• Combining Technique
Introduction
• Classification involves mapping of input data to appropriate
classes.
• Def: Given a database D = {t1, t2, ..., tn} of tuples (items,
records) and a set of classes C = {C1, ..., Cm}, the
classification problem is to define a mapping f: D → C where
each ti is assigned to one class. A class, Cj, contains precisely
those tuples mapped to it; that is, Cj = {ti | f(ti) = Cj, 1 ≤ i ≤ n, and
ti ∈ D}.
• The problem is solved in two phases:
1. Create a specific model by evaluating the training data.
2. Apply the model to classify tuples from the target database.
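The definition and the two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid rule, the function names (build_model, classify, partition) and the height data are hypothetical and not part of the lecture. Any classification algorithm could play the role of f.

```python
from collections import defaultdict

def build_model(training):
    """Phase 1: learn one centroid (mean value) per class from labeled tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in training:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(model, value):
    """The mapping f: D -> C, here 'assign to the class with the nearest centroid'."""
    return min(model, key=lambda label: abs(value - model[label]))

def partition(model, database):
    """Phase 2: apply f to every tuple and collect Cj = {ti | f(ti) = Cj}."""
    classes = {label: [] for label in model}
    for t in database:
        classes[classify(model, t)].append(t)
    return classes

# Hypothetical training data: (height in metres, class label)
training = [(1.5, "short"), (1.6, "short"), (1.9, "tall"), (2.0, "tall")]
model = build_model(training)

# Hypothetical target database of unlabeled tuples
classes = partition(model, [1.55, 1.7, 1.95])
print(classes)
```

Each tuple of the target database ends up in exactly one class, which is what the definition of f requires.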
Introduction
• Issues in Classification:
1. Missing Data
2. Measuring Performance
Missing Data
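There are several approaches to handling missing data:
• Ignore the missing data.
• Assume a value for the missing data, for example one predicted from the other data.
• Assume a special value for the missing data, so that "missing" is treated as a value of its own.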
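The three approaches above might look as follows in Python. This is a minimal sketch built on assumptions: the record layout, the use of None as the missing marker, the reading of "ignore" as dropping the whole tuple, the mean as the assumed value, and the "UNKNOWN" special value are all illustrative choices, not prescribed by the lecture.

```python
from statistics import mean

MISSING = None  # marker for an absent attribute value (an assumption of this sketch)

# Hypothetical records; the second one has no income value
records = [
    {"name": "a", "income": 40.0},
    {"name": "b", "income": MISSING},
    {"name": "c", "income": 60.0},
]

def ignore_missing(rows, attr):
    """Approach 1: ignore (here: drop) tuples whose attribute is missing."""
    return [r for r in rows if r[attr] is not MISSING]

def assume_value(rows, attr):
    """Approach 2: substitute an assumed value, here the mean of the known values."""
    fill = mean(r[attr] for r in rows if r[attr] is not MISSING)
    return [{**r, attr: fill if r[attr] is MISSING else r[attr]} for r in rows]

def assume_special(rows, attr, special="UNKNOWN"):
    """Approach 3: substitute a special value that stands for 'missing'."""
    return [{**r, attr: special if r[attr] is MISSING else r[attr]} for r in rows]

dropped = ignore_missing(records, "income")
filled = assume_value(records, "income")
flagged = assume_special(records, "income")
```

All three functions return new lists and leave the original records untouched, so the approaches can be compared on the same data.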
Measuring Performance and Accuracy
• Classification accuracy is usually calculated by determining the
percentage of tuples placed in the correct class.
• Given a specific class, a database tuple may or may not be
assigned to that class, while its actual membership may or may
not be in that class. This gives us four quadrants:
• True positive (TP): 𝑡𝑖 predicted to be in 𝐶𝑗 and is actually in it.
• False positive (FP): 𝑡𝑖 predicted to be in 𝐶𝑗 but is not actually in
it.
• True negative (TN): 𝑡𝑖 not predicted to be in 𝐶𝑗 and is not
actually in it.
• False negative (FN): 𝑡𝑖 not predicted to be in 𝐶𝑗 but is actually in
it.
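The four quadrants and the accuracy percentage described above can be computed as in the following sketch. The function names and the sample labels are hypothetical; they only illustrate the counting for one chosen class Cj (here "tall").

```python
def confusion_counts(predicted, actual, target):
    """Count TP, FP, TN, FN for one class `target` (the class Cj)."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        predicted_in = (p == target)
        actually_in = (a == target)
        if predicted_in and actually_in:
            tp += 1          # predicted in Cj and actually in it
        elif predicted_in:
            fp += 1          # predicted in Cj but not actually in it
        elif actually_in:
            fn += 1          # not predicted in Cj but actually in it
        else:
            tn += 1          # not predicted in Cj and not actually in it
    return tp, fp, tn, fn

def accuracy(predicted, actual):
    """Percentage of tuples placed in the correct class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical labels for five tuples
actual    = ["tall", "tall",  "short", "short", "medium"]
predicted = ["tall", "short", "short", "tall",  "medium"]

tp, fp, tn, fn = confusion_counts(predicted, actual, "tall")
print(tp, fp, tn, fn)                # 1 1 2 1
print(accuracy(predicted, actual))   # 60.0
```

The four counts always sum to the number of tuples, since every tuple falls into exactly one quadrant for a given class.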