SlideShare a Scribd company logo
Data Mining:
Classification
Classification and Prediction
• What is classification? What is prediction?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Classification:
– predicts categorical class labels
– classifies data (constructs a model) based on the
training set and the values (class labels) in a
classifying attribute and uses it in classifying new data
• Prediction:
– models continuous-valued functions, i.e., predicts
unknown or missing values
• Typical Applications
– credit approval
– target marketing
– medical diagnosis
– treatment effectiveness analysis
Classification vs. Prediction
Classification Learning: Definition
l Given a collection of records (training set)
– Each record contains a set of attributes, one of the
attributes is the class
l Find a model for the class attribute as a function
of the values of the other attributes
l Goal: previously unseen records should be
assigned a class as accurately as possible
– Use test set to estimate the accuracy of the model
– Often, the given data set is divided into training and test
sets, with training set used to build the model and test
set used to validate it.
Illustrating Classification Learning
Apply
Model
Induction
Deduction
Learn
Model
Model
Tid Attrib1 Attrib2 Attrib3 Class
1 Yes Large 125K No
2 No Medium 100K No
3 No Small 70K No
4 Yes Medium 120K No
5 No Large 95K Yes
6 No Medium 60K No
7 Yes Large 220K No
8 No Small 85K Yes
9 No Medium 75K No
10 No Small 90K Yes
10
Tid Attrib1 Attrib2 Attrib3 Class
11 No Small 55K ?
12 Yes Medium 80K ?
13 Yes Large 110K ?
14 No Small 95K ?
15 No Large 67K ?
10
Test Set
Learning
algorithm
Training Set
Examples of Classification Task
l Predicting tumor cells as benign or malignant
l Classifying credit card transactions
as legitimate or fraudulent
l Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
l Categorizing news stories as finance,
weather, entertainment, sports, etc.
Classification - A Two-Step Process
• Model construction: describing a set of predetermined classes
– Building the Classifier or Model
– Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
– The set of tuples used for model construction: training set
– The model is represented as classification rules, decision trees, or
mathematical formulae
• Model usage: for classifying future or unknown objects
– Using Classifier for Classification
– Estimate accuracy of the model
• The known label of test sample is compared with the classified result
from the model
• Accuracy rate is the percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set, otherwise over-fitting will
occur
Classification Process (1): Model Construction
Example: Loan application
The data classification process: Learning: Training data are analyzed by a
classification algorithm. Here, the class label attribute is loan decision, and the
learned model or classifier is represented in the form of classification rules.
Classification Process (2): Use the Model
in Prediction
Classification: Test data are used to estimate the accuracy of the classification
rules. If the accuracy is considered acceptable, the rules can be applied to the
classification of new data tuples.
Supervised vs. Unsupervised
Learning
• Supervised learning (classification)
– Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
– New data is classified based on the training set
• Unsupervised learning (clustering)
– The class labels of training data is unknown
– Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Issues (1): Data Preparation
• Data cleaning
– Preprocess data in order to reduce noise and handle
missing values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data
Issues (2): Evaluating Classification
Methods
• Predictive accuracy
• Speed and scalability
– time to construct the model
– time to use the model
• Robustness
– handling noise and missing values
• Scalability
– efficiency in disk-resident databases
• Interpretability:
– understanding and insight provded by the model
• Goodness of rules
– decision tree size
– compactness of classification rules
The problem
• Given a set of training cases/objects and their attribute
values, try to determine the target attribute value of new
examples.
– Classification
– Prediction
• Use a decision tree to predict categories for new events.
• Use training data to build the decision tree.
New
Events
Decision
Tree
Category
Training
Events and
Categories

More Related Content

What's hot

Data preparation
Data preparationData preparation
Data preparation
Tony Nguyen
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
thomasmary607
 
Statistics for data science
Statistics for data science Statistics for data science
Statistics for data science
zekeLabs Technologies
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
Fazle Rabbi Ador
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
Dr. Abdul Ahad Abro
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
lavanya marichamy
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
Hemant Chetwani
 
Data mining technique (decision tree)
Data mining technique (decision tree)Data mining technique (decision tree)
Data mining technique (decision tree)
Shweta Ghate
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
VijayasankariS
 
Statistics and data science
Statistics and data scienceStatistics and data science
Statistics and data science
Mohammad Azharuddin
 
Recovery techniques
Recovery techniquesRecovery techniques
Recovery techniques
Dr. C.V. Suresh Babu
 
Data mining
Data mining Data mining
Data mining
AthiraR23
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
DataminingTools Inc
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
DataminingTools Inc
 
Etl process in data warehouse
Etl process in data warehouseEtl process in data warehouse
Etl process in data warehouse
Komal Choudhary
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
DataminingTools Inc
 

What's hot (20)

Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Data preparation
Data preparationData preparation
Data preparation
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Statistics for data science
Statistics for data science Statistics for data science
Statistics for data science
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Data mining primitives
Data mining primitivesData mining primitives
Data mining primitives
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Data mining technique (decision tree)
Data mining technique (decision tree)Data mining technique (decision tree)
Data mining technique (decision tree)
 
Data PreProcessing
Data PreProcessingData PreProcessing
Data PreProcessing
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
Statistics and data science
Statistics and data scienceStatistics and data science
Statistics and data science
 
Recovery techniques
Recovery techniquesRecovery techniques
Recovery techniques
 
Decision tree
Decision treeDecision tree
Decision tree
 
Data mining
Data mining Data mining
Data mining
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Etl process in data warehouse
Etl process in data warehouseEtl process in data warehouse
Etl process in data warehouse
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 

Similar to Lect8 Classification & prediction

Data mining chapter04and5-best
Data mining chapter04and5-bestData mining chapter04and5-best
Data mining chapter04and5-best
ABDUmomo
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data mining
Ashikur Rahman
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
AschalewAyele2
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
SandeepAgrawal84
 
Classification
ClassificationClassification
Classification
thamizh arasi
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
AsrithaKorupolu
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
Krish_ver2
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
malathieswaran29
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
321106410027
 
Machine learning
Machine learningMachine learning
Machine learning
deepakbagam
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
NIMMYRAJU
 
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data AnalyticsSupply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
MujtabaAliKhan12
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
rajalakshmi5921
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
atul404633
 
++16516.pptx
++16516.pptx++16516.pptx
++16516.pptx
CoolDude357501
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptx
PrincePatel272012
 
Data Mining
Data MiningData Mining
Data Mining
IIIT ALLAHABAD
 

Similar to Lect8 Classification & prediction (20)

Data mining chapter04and5-best
Data mining chapter04and5-bestData mining chapter04and5-best
Data mining chapter04and5-best
 
Machine learning algorithms for data mining
Machine learning algorithms for data miningMachine learning algorithms for data mining
Machine learning algorithms for data mining
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
 
Classification
ClassificationClassification
Classification
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
 
Data mining techniques unit iv
Data mining techniques unit ivData mining techniques unit iv
Data mining techniques unit iv
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3CS 402 DATAMINING AND WAREHOUSING -MODULE 3
CS 402 DATAMINING AND WAREHOUSING -MODULE 3
 
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data AnalyticsSupply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
Supply Chain Analytics, Supply Chain Management, Supply Chain Data Analytics
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
++16516.pptx
++16516.pptx++16516.pptx
++16516.pptx
 
ai4.ppt
ai4.pptai4.ppt
ai4.ppt
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptx
 
Data Mining
Data MiningData Mining
Data Mining
 

More from hktripathy

Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data sampling
hktripathy
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
hktripathy
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
hktripathy
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
hktripathy
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
hktripathy
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
hktripathy
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
hktripathy
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
hktripathy
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
hktripathy
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-I
hktripathy
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
hktripathy
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
hktripathy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
hktripathy
 

More from hktripathy (18)

Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Lecture7.1 data sampling
Lecture7.1 data samplingLecture7.1 data sampling
Lecture7.1 data sampling
 
Lecture5 virtualization
Lecture5 virtualizationLecture5 virtualization
Lecture5 virtualization
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Lecture3 business intelligence
Lecture3 business intelligenceLecture3 business intelligence
Lecture3 business intelligence
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
 
Lect4 principal component analysis-I
Lect4 principal component analysis-ILect4 principal component analysis-I
Lect4 principal component analysis-I
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
 
Lect 2 getting to know your data
Lect 2 getting to know your dataLect 2 getting to know your data
Lect 2 getting to know your data
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 

Recently uploaded

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
Nguyen Thanh Tu Collection
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 

Recently uploaded (20)

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 

Lect8 Classification & prediction

  • 2. Classification and Prediction • What is classification? What is prediction? • Issues regarding classification and prediction • Classification by decision tree induction
  • 3. • Classification: – predicts categorical class labels – classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data • Prediction: – models continuous-valued functions, i.e., predicts unknown or missing values • Typical Applications – credit approval – target marketing – medical diagnosis – treatment effectiveness analysis Classification vs. Prediction
  • 4. Classification Learning: Definition l Given a collection of records (training set) – Each record contains a set of attributes, one of the attributes is the class l Find a model for the class attribute as a function of the values of the other attributes l Goal: previously unseen records should be assigned a class as accurately as possible – Use test set to estimate the accuracy of the model – Often, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
  • 5. Illustrating Classification Learning Apply Model Induction Deduction Learn Model Model Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium 120K No 5 No Large 95K Yes 6 No Medium 60K No 7 Yes Large 220K No 8 No Small 85K Yes 9 No Medium 75K No 10 No Small 90K Yes 10 Tid Attrib1 Attrib2 Attrib3 Class 11 No Small 55K ? 12 Yes Medium 80K ? 13 Yes Large 110K ? 14 No Small 95K ? 15 No Large 67K ? 10 Test Set Learning algorithm Training Set
  • 6. Examples of Classification Task l Predicting tumor cells as benign or malignant l Classifying credit card transactions as legitimate or fraudulent l Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil l Categorizing news stories as finance, weather, entertainment, sports, etc.
  • 7. Classification - A Two-Step Process • Model construction: describing a set of predetermined classes – Building the Classifier or Model – Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute – The set of tuples used for model construction: training set – The model is represented as classification rules, decision trees, or mathematical formulae • Model usage: for classifying future or unknown objects – Using Classifier for Classification – Estimate accuracy of the model • The known label of test sample is compared with the classified result from the model • Accuracy rate is the percentage of test set samples that are correctly classified by the model • Test set is independent of training set, otherwise over-fitting will occur
  • 8. Classification Process (1): Model Construction Example: Loan application The data classification process: Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan decision, and the learned model or classifier is represented in the form of classification rules.
  • 9. Classification Process (2): Use the Model in Prediction Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
  • 10. Supervised vs. Unsupervised Learning • Supervised learning (classification) – Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations – New data is classified based on the training set • Unsupervised learning (clustering) – The class labels of training data is unknown – Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data
  • 11. Issues (1): Data Preparation • Data cleaning – Preprocess data in order to reduce noise and handle missing values • Relevance analysis (feature selection) – Remove the irrelevant or redundant attributes • Data transformation – Generalize and/or normalize data
  • 12. Issues (2): Evaluating Classification Methods • Predictive accuracy • Speed and scalability – time to construct the model – time to use the model • Robustness – handling noise and missing values • Scalability – efficiency in disk-resident databases • Interpretability: – understanding and insight provded by the model • Goodness of rules – decision tree size – compactness of classification rules
  • 13. The problem • Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples. – Classification – Prediction • Use a decision tree to predict categories for new events. • Use training data to build the decision tree. New Events Decision Tree Category Training Events and Categories