SlideShare a Scribd company logo
Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Classification and Prediction
Chapter 8
Introduction
There are two forms of data analysis that can be used for extracting models
describing important classes or to predict future data trends.
These two forms are as follows:
 Classification
 Prediction
Introduction
 Classification models predict categorical class labels; and prediction models
predict continuous valued functions.
 For example, we can build a classification model to categorize bank loan
applications as either safe or risky
 Prediction model to predict the expenditures in dollars of potential customers
on computer equipment given their income and occupation.
What is classification?
 Following are the examples of cases where the data analysis task is
Classification:
 A bank loan officer wants to analyze the data in order to know which customer
(loan applicant) is risky or which are safe.
 A marketing manager at a company needs to analyze a customer with a given
profile, who will buy a new computer.
 In both of the above examples, a model or classifier is constructed to predict
the categorical labels. These labels are risky or safe for loan application data
and yes or no for marketing data.
What is prediction?
 Following are the examples of cases where the data analysis task is Prediction:
 Suppose the marketing manager needs to predict how much a given customer
will spend during a sale at his company.
 In this example we are bothered to predict a numeric value. Therefore the data
analysis task is an example of numeric prediction. In this case, a model or a
predictor will be constructed that predicts a continuous-valued-function or
ordered value.
 Regression analysis is a statistical methodology that is most often used for
numeric prediction.
How Does Classification Works?
With the help of the bank loan application that we have discussed above, let us
understand the working of classification. The Data Classification process includes
two steps:
 Building the Classifier or Model
 Using Classifier for Classification
Building the Classifier or Model
 This step is the learning step or the learning phase.
 In this step the classification algorithms build the classifier.
 The classifier is built from the training set made up of database tuples and their
associated class labels.
 Each tuple that constitutes the training set is referred to as a category or class.
These tuples can also be referred to as sample, object or data points.
Building the Classifier or Model
Using Classifier for Classification
 In this step, the classifier is used for classification.
 Here the test data is used to estimate the accuracy of classification rules.
 The classification rules can be applied to the new data tuples if the accuracy is
considered acceptable.
Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class labeled
training tuples.
Decision tree is a flowchart-like tree structure where internal nodes (non leaf
node) denotes a test on an attribute branches represent outcomes of tests Leaf
nodes (terminal nodes) hold class labels and Root node is the topmost node.
Classification by Decision Tree Induction
Classification by Decision Tree Induction
Example
RID age income student credit-rating Class
1 youth high no fair ?
Test on age: youth
Test of student: no
Reach leaf node
Class NO: the customer Is Unlikely to buy a computer
Algorithm for constructing Decision Tress
Constructing a Decision tree uses greedy algorithm. Tree is constructed in a top-down recursive divide-
and-conquer manner.
• At start, all the training tuples are at the root
• Tuples are partitioned recursively based on selected attributes
• If all samples for a given node belong to the same class
Label the class
• If There are no remaining attributes for further partitioning
Majority voting is employed for classifying the leaf
• There are no samples left
Label the class and terminate
• Else
Got to step 2
K-Mean Algorithms
1. Take mean value
2. Find nearest number of mean and put in cluster.
3. Repeat one and two until we get same mean
References
1. Sam Anahory, Dennis Murray, “Data warehousing In the Real World”, Pearson
Education.
2. Kimball, R. “The Data Warehouse Toolkit”, Wiley, 1996.
3. Teorey, T. J., “Database Modeling and Design: The Entity-Relationship Approach”,
Morgan Kaufmann Publishers, Inc., 1990.
4. “An Overview of Data Warehousing and OLAP Technology”, S. Chaudhuri,
Microsoft Research
5. “Data Warehousing with Oracle”, M. A. Shahzad
6. “Data Mining Concepts and Techniques”, Morgan Kaufmann J. Han, M Kamber
Second Edition ISBN : 978-1-55860-901-3
ANY QUESTIONS?

More Related Content

What's hot

Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
sumit621
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
sumit621
 
Chapter 13 data warehousing
Chapter 13   data warehousingChapter 13   data warehousing
Chapter 13 data warehousing
sumit621
 

What's hot (19)

Data Mining
Data MiningData Mining
Data Mining
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Data mining
Data miningData mining
Data mining
 
Lecture1
Lecture1Lecture1
Lecture1
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 
142230 633685297550892500
142230 633685297550892500142230 633685297550892500
142230 633685297550892500
 
Database
DatabaseDatabase
Database
 
Chapter 13 data warehousing
Chapter 13   data warehousingChapter 13   data warehousing
Chapter 13 data warehousing
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Part1
Part1Part1
Part1
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processing
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
 

Similar to Research trends in data warehousing and data mining

Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
iaeronlineexm
 

Similar to Research trends in data warehousing and data mining (20)

Classification
ClassificationClassification
Classification
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
3 classification
3  classification3  classification
3 classification
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptx
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Big Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptxBig Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptx
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
 

More from Er. Nawaraj Bhandari

More from Er. Nawaraj Bhandari (20)

Data mining approaches and methods
Data mining approaches and methodsData mining approaches and methods
Data mining approaches and methods
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
 
Data warehouse testing
Data warehouse testingData warehouse testing
Data warehouse testing
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
 
Chapter 3: Simplification of Boolean Function
Chapter 3: Simplification of Boolean FunctionChapter 3: Simplification of Boolean Function
Chapter 3: Simplification of Boolean Function
 
Chapter 6: Sequential Logic
Chapter 6: Sequential LogicChapter 6: Sequential Logic
Chapter 6: Sequential Logic
 
Chapter 5: Cominational Logic with MSI and LSI
Chapter 5: Cominational Logic with MSI and LSIChapter 5: Cominational Logic with MSI and LSI
Chapter 5: Cominational Logic with MSI and LSI
 
Chapter 4: Combinational Logic
Chapter 4: Combinational LogicChapter 4: Combinational Logic
Chapter 4: Combinational Logic
 
Chapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic GatesChapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic Gates
 
Chapter 1: Binary System
 Chapter 1: Binary System Chapter 1: Binary System
Chapter 1: Binary System
 
Introduction to Electronic Commerce
Introduction to Electronic CommerceIntroduction to Electronic Commerce
Introduction to Electronic Commerce
 
Evaluating software development
Evaluating software developmentEvaluating software development
Evaluating software development
 
Using macros in microsoft excel part 2
Using macros in microsoft excel   part 2Using macros in microsoft excel   part 2
Using macros in microsoft excel part 2
 
Using macros in microsoft excel part 1
Using macros in microsoft excel   part 1Using macros in microsoft excel   part 1
Using macros in microsoft excel part 1
 
Using macros in microsoft access
Using macros in microsoft accessUsing macros in microsoft access
Using macros in microsoft access
 
Testing software development
Testing software developmentTesting software development
Testing software development
 
Application software and business processes
Application software and business processesApplication software and business processes
Application software and business processes
 
An introduction to vba and macros
An introduction to vba and macrosAn introduction to vba and macros
An introduction to vba and macros
 
An introduction to end user software development
An introduction to end user software developmentAn introduction to end user software development
An introduction to end user software development
 

Recently uploaded

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Domenico Conte
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 

Recently uploaded (20)

一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 

Research trends in data warehousing and data mining

  • 1. Er. Nawaraj Bhandari Data Warehouse/Data Mining Classification and Prediction Chapter 8
  • 2. Introduction There are two forms of data analysis that can be used for extracting models describing important classes or to predict future data trends. These two forms are as follows:  Classification  Prediction
  • 3. Introduction  Classification models predict categorical class labels; and prediction models predict continuous valued functions.  For example, we can build a classification model to categorize bank loan applications as either safe or risky  Prediction model to predict the expenditures in dollars of potential customers on computer equipment given their income and occupation.
  • 4. What is classification?  Following are the examples of cases where the data analysis task is Classification:  A bank loan officer wants to analyze the data in order to know which customer (loan applicant) is risky or which are safe.  A marketing manager at a company needs to analyze a customer with a given profile, who will buy a new computer.  In both of the above examples, a model or classifier is constructed to predict the categorical labels. These labels are risky or safe for loan application data and yes or no for marketing data.
  • 5. What is prediction?  Following are the examples of cases where the data analysis task is Prediction:  Suppose the marketing manager needs to predict how much a given customer will spend during a sale at his company.  In this example we are bothered to predict a numeric value. Therefore the data analysis task is an example of numeric prediction. In this case, a model or a predictor will be constructed that predicts a continuous-valued-function or ordered value.  Regression analysis is a statistical methodology that is most often used for numeric prediction.
  • 6. How Does Classification Works? With the help of the bank loan application that we have discussed above, let us understand the working of classification. The Data Classification process includes two steps:  Building the Classifier or Model  Using Classifier for Classification
  • 7. Building the Classifier or Model  This step is the learning step or the learning phase.  In this step the classification algorithms build the classifier.  The classifier is built from the training set made up of database tuples and their associated class labels.  Each tuple that constitutes the training set is referred to as a category or class. These tuples can also be referred to as sample, object or data points.
  • 9. Using Classifier for Classification  In this step, the classifier is used for classification.  Here the test data is used to estimate the accuracy of classification rules.  The classification rules can be applied to the new data tuples if the accuracy is considered acceptable.
  • 10. Classification by Decision Tree Induction Decision tree induction is the learning of decision trees from class labeled training tuples. Decision tree is a flowchart-like tree structure where internal nodes (non leaf node) denotes a test on an attribute branches represent outcomes of tests Leaf nodes (terminal nodes) hold class labels and Root node is the topmost node.
  • 11. Classification by Decision Tree Induction
  • 12. Classification by Decision Tree Induction Example RID age income student credit-rating Class 1 youth high no fair ? Test on age: youth Test of student: no Reach leaf node Class NO: the customer Is Unlikely to buy a computer
  • 13. Algorithm for constructing Decision Tress Constructing a Decision tree uses greedy algorithm. Tree is constructed in a top-down recursive divide- and-conquer manner. • At start, all the training tuples are at the root • Tuples are partitioned recursively based on selected attributes • If all samples for a given node belong to the same class Label the class • If There are no remaining attributes for further partitioning Majority voting is employed for classifying the leaf • There are no samples left Label the class and terminate • Else Got to step 2
  • 14. K-Mean Algorithms 1. Take mean value 2. Find nearest number of mean and put in cluster. 3. Repeat one and two until we get same mean
  • 15. References 1. Sam Anahory, Dennis Murray, “Data warehousing In the Real World”, Pearson Education. 2. Kimball, R. “The Data Warehouse Toolkit”, Wiley, 1996. 3. Teorey, T. J., “Database Modeling and Design: The Entity-Relationship Approach”, Morgan Kaufmann Publishers, Inc., 1990. 4. “An Overview of Data Warehousing and OLAP Technology”, S. Chaudhuri, Microsoft Research 5. “Data Warehousing with Oracle”, M. A. Shahzad 6. “Data Mining Concepts and Techniques”, Morgan Kaufmann J. Han, M Kamber Second Edition ISBN : 978-1-55860-901-3

Editor's Notes

  1. Example is inside hardcopy please follow hardcopy
  2. Example is inside hardcopy please follow hardcopy