Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Chapter 5:
Data Mining Definition and Task
• There is a huge amount of data available in the information industry. This data
is of no use until it is converted into useful information, so it must be
analyzed to extract that information.
Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data.
• On the basis of the kind of data to be mined, data mining performs two types of
tasks:
• Descriptive
• Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database.
The descriptive functions are:
• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters
Classification and Prediction
Classification is the process of finding a model that describes the data classes or
concepts. The purpose is to be able to use this model to predict the class of
objects whose class label is unknown. This derived model is based on the
analysis of sets of training data. The derived model can be presented in the
following forms:
• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
Prediction, by contrast, is used to predict missing or unavailable numerical data
values rather than class labels.
Data Mining Techniques
Several major data mining techniques have been developed and used in data mining
projects in recent years, including association, classification, clustering,
prediction, sequential patterns and decision trees.
Association
• Association is one of the best-known data mining techniques. In association, a
pattern is discovered based on a relationship between items in the same
transaction.
• That is why the association technique is also known as the relation
technique. It is used in market basket analysis to identify sets of products
that customers frequently purchase together.
• Retailers use the association technique to research customers’ buying habits.
Based on historical sales data, a retailer might find that customers often buy
crisps when they buy beer, and can therefore put beer and crisps next to
each other to save customers time and increase sales.
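The beer-and-crisps relationship can be quantified with two standard measures, support and confidence. The sketch below computes both over a handful of transactions that are invented for illustration; it counts item pairs directly and is not a full association-rule miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration.
transactions = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "nappies"},
    {"bread", "milk"},
    {"beer", "crisps", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)


# Count how often each pair of items appears in the same transaction.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
most_common_pair, count = pair_counts.most_common(1)[0]

print(most_common_pair, count)                       # ('beer', 'crisps') 3
print(round(support({"beer", "crisps"}), 2))         # 0.6
print(round(confidence({"beer"}, {"crisps"}), 2))    # 0.75
```

In this toy data the rule "beer implies crisps" holds in three of the four beer transactions, which is why real rules are reported with a confidence value rather than as certainties.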
Classification
Classification is a classic data mining technique based on machine learning.
Classification assigns each item in a set of data to one of a predefined set of
classes or groups.
Classification methods make use of mathematical techniques such as decision
trees, linear programming, neural networks and statistics. In classification, we
develop software that can learn how to classify data items into groups. For
example, we can apply classification to the problem: “given all records of
employees who left the company, predict who will probably leave the company in a
future period.”
In this case, we divide the employee records into two groups named “leave” and
“stay”, and then ask our data mining software to classify the employees into
those groups.
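A minimal sketch of the leave/stay example follows. It uses a k-nearest-neighbour vote, which is one simple classifier chosen here for brevity and not one of the techniques the slide lists. The two features (years at the company and a satisfaction score) and all the records are assumptions made up for the example.

```python
import math

# Hypothetical labelled records: (years_at_company, satisfaction 0..1) -> class.
training = [
    ((1.0, 0.2), "leave"),
    ((2.0, 0.3), "leave"),
    ((1.5, 0.4), "leave"),
    ((6.0, 0.8), "stay"),
    ((8.0, 0.7), "stay"),
    ((5.0, 0.9), "stay"),
]


def classify(features, k=3):
    """Majority vote among the k training records closest to features."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], features))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


print(classify((1.2, 0.3)))  # leave
print(classify((7.0, 0.8)))  # stay
```

The features are left unscaled, so years dominate the distance; a real application would normalise them first.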
Clustering
Clustering is a data mining technique that automatically groups objects with
similar characteristics into meaningful or useful clusters. The clustering
technique defines the classes itself and puts objects in each class, whereas in
classification objects are assigned to predefined classes.
To make the concept clearer, take book management in a library as an example. A
library holds a wide range of books on various topics. The challenge is how to
arrange those books so that readers can pick up several books on a particular
topic without hassle.
By using clustering, we can keep books that have some kind of similarity in one
cluster, or on one shelf, and label it with a meaningful name. Readers who want
books on that topic only have to go to that shelf instead of searching the
entire library.
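To show how classes can emerge without predefined labels, here is a small k-means sketch. The slide does not name an algorithm, so k-means is an assumption, as are the two numeric book features and the naive choice of the first k points as starting centroids.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = points[:k]  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical book features: (share of maths terms, share of history terms).
books = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.85, 0.15), (0.15, 0.85)]
clusters, centroids = k_means(books, k=2)

for shelf, cluster in enumerate(clusters):
    print("shelf", shelf, sorted(cluster))
```

Each resulting cluster plays the role of a shelf; giving the shelf a meaningful name is still left to a person.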
Prediction
Prediction, as its name implies, is a data mining technique that discovers the
relationship between independent variables and the relationship between
dependent and independent variables.
For instance, prediction analysis can be used in sales to predict future profit:
if we treat sales as the independent variable, profit is the dependent variable.
Then, based on historical sales and profit data, we can fit a regression curve
and use it to predict profit.
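The sales-and-profit example can be sketched with ordinary least squares. A straight line is assumed here as the simplest possible regression curve, and the figures are invented for illustration.

```python
# Hypothetical historical data (sales and profit in the same currency unit).
sales = [100.0, 200.0, 300.0, 400.0, 500.0]
profit = [12.0, 25.0, 31.0, 45.0, 52.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sales, profit)


def predict_profit(future_sales):
    """Use the fitted line to predict the dependent variable (profit)."""
    return slope * future_sales + intercept


print(round(slope, 3), round(intercept, 3))  # 0.1 3.0
print(round(predict_profit(600.0), 2))       # 63.0
```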
Sequential Patterns
Often used over longer-term data, sequential pattern analysis is a useful method
for identifying trends, or regular occurrences of similar events. For example,
with customer data you can identify that customers buy a particular collection of
products together at different times of the year.
In a shopping basket application, you can use this information to automatically
suggest that certain items be added to a basket based on their frequency and past
purchasing history.
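One simple way to approximate this is to count how many customers bought item A at some point before item B, then suggest the items that frequently follow what is already in the basket. The purchase histories and the minimum support of 2 below are assumptions for illustration; production sequence miners handle longer patterns and time windows.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one time-ordered list per customer.
histories = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "stove"],
    ["boots", "tent", "sleeping bag"],
    ["stove", "tent"],
]


def ordered_pair_support(sequences):
    """For each ordered pair (a, b), count the sequences in which a precedes b."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps input order, so a always comes before b in seq.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(pairs)
    return counts


support = ordered_pair_support(histories)


def suggest(item, min_support=2):
    """Items that at least min_support customers bought after buying item."""
    return sorted(b for (a, b), n in support.items() if a == item and n >= min_support)


print(suggest("tent"))   # ['sleeping bag', 'stove']
print(suggest("stove"))  # []
```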
Decision Trees
The decision tree is one of the most widely used data mining techniques because
its model is easy for users to understand.
In the decision tree technique, the root of the tree is a simple question or
condition that has multiple answers.
Each answer then leads to a further set of questions or conditions that narrow
down the data until we can make a final decision.
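The root-question-then-follow-up structure maps naturally onto nested dictionaries. The tree below is built by hand with made-up questions and outcomes; a data mining tool would learn such a tree from training data.

```python
# A hand-built tree (hypothetical questions and outcomes, not learned from data).
# Inner nodes are dicts holding a question and its answers; leaves are plain strings.
tree = {
    "question": "outlook",
    "answers": {
        "sunny": {
            "question": "humidity",
            "answers": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "question": "windy",
            "answers": {"yes": "stay in", "no": "play"},
        },
    },
}


def decide(node, record):
    """Answer each question from the root down until a leaf decision is reached."""
    while isinstance(node, dict):
        answer = record[node["question"]]
        node = node["answers"][answer]
    return node


print(decide(tree, {"outlook": "sunny", "humidity": "high"}))  # stay in
print(decide(tree, {"outlook": "overcast"}))                   # play
print(decide(tree, {"outlook": "rainy", "windy": "no"}))       # play
```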
Data Mining Tools
Different types of data mining tools are available in the marketplace, each with
its own strengths and weaknesses.
These tools use artificial intelligence, machine learning and other techniques to
extract information from data.
Most data mining tools can be classified into one of three categories: traditional
data mining tools, dashboards, and text-mining tools.
Traditional Data Mining Tools
• Traditional data mining tools and techniques work with existing databases stored
on enterprise servers or even local hard drives. They interpret the data stored
there using predefined algorithms and queries written in a database-specific
programming language (macros) to reveal patterns in the data that would
otherwise be invisible.
• For example, a database of sales figures can easily display monthly sales trends
simply through the database’s built-in query and table system. A data mining
tool installed on the server can then analyze those broad numbers to identify
factors affecting monthly sales that are not immediately apparent and, most
importantly, render that analysis as an easily readable report that makes those
patterns explicit.
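The sales example can be imitated with Python's built-in sqlite3 module. The table layout and figures are assumptions for illustration: the first query is the kind of monthly trend a database shows on its own, and the second breaks the totals down by one extra attribute to expose a pattern the totals hide.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "north", 120.0),
        ("2024-01-20", "south", 80.0),
        ("2024-02-03", "north", 150.0),
        ("2024-02-17", "south", 60.0),
    ],
)

# Built-in query: total sales per month.
monthly = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()

# Drill-down: the same totals split by region.
by_region = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) "
    "FROM sales GROUP BY month, region ORDER BY month, region"
).fetchall()

print(monthly)    # [('2024-01', 200.0), ('2024-02', 210.0)]
for row in by_region:
    print(row)
```

Total sales rise from January to February, yet the breakdown shows the south region falling, which is the sort of non-obvious factor the slide refers to.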
Dashboards
A more recent innovation in the world of data mining tools and techniques is the
dashboard. A dashboard is a piece of software that sits on an end user’s desktop or
tablet and reports real-time fluctuations in data as it flows into the database and
is manipulated or sorted. Typically, historical data can also be accessed via the
dashboard, although the data mining of historic [...]
Dashboards are typically used by managers and other staff to track the effect
of events and other influences on data streams in real time.
One example is monitoring new picking policies in a warehouse as a company
tries to fine-tune its logistical management of stock: a dashboard lets
the company see the effect of new policies immediately, quickly analyzing just a
few hours of data to see whether they are getting the desired efficiency.
Text Analysis
• One of the newer innovations in data mining tools and techniques is the
text-mining application. These tools take disparate forms of textual data (word
processing documents, plain text files, ‘flat’ text formats like PDF files or
presentation files) and mine them for patterns in the text.
• This allows companies and users to apply data mining tools and techniques without
having to open each document in a separate application or perform cumbersome
(and error-introducing) conversions on documents.
• Text analysis has many possible techniques and applications. A popular one
is seeking out plagiarized or ‘copy-pasted’ content. Text analysis tools let
users quickly scan huge amounts of text in different formats to identify
identical strings and report the odds that a particular piece of text was
lifted from an existing text. Universities and colleges increasingly use such
tools to fight plagiarism in classrooms.
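A very reduced version of the identical-strings check compares the word trigrams of a candidate text with those of a known source. The sample sentences, the trigram size and the scoring are assumptions for illustration; real plagiarism tools add document parsing, indexing and fuzzier matching.

```python
import re


def shingles(text, n=3):
    """Set of overlapping word n-grams, lower-cased with punctuation removed."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate, source, n=3):
    """Fraction of the candidate's n-grams that also occur in the source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


# Hypothetical texts, invented for illustration.
source = "Data mining is the process of extracting useful information from large data sets."
copied = "Data mining is the process of extracting useful information from huge databases."
original = "Clustering groups similar objects without using predefined labels."

print(round(overlap(copied, source), 2))    # 0.8
print(round(overlap(original, source), 2))  # 0.0
```

A score near 1 flags a passage for human review; it is a signal of shared wording, not proof of copying.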
Data mining applications
• Future Healthcare
• Data mining holds great potential to improve health systems. It uses data and
analytics to identify best practices that improve care and reduce costs.
Researchers use data mining approaches such as multi-dimensional databases,
machine learning, soft computing, data visualization and statistics. Mining can be
used to predict the volume of patients in every category, and processes can be
developed to make sure that patients receive appropriate care at the right
place and at the right time. Data mining can also help healthcare insurers
detect fraud and abuse.
• Market Basket Analysis
• Market basket analysis is a modelling technique based on the theory that if you
buy a certain group of items, you are more likely to buy another group of items.
This technique may allow the retailer to understand the purchase behaviour of a
buyer. This information may help the retailer to know the buyer’s needs and
change the store’s layout accordingly. Differential analysis can be used to
compare results between different stores and between customers in different
demographic groups.
• Education
• An emerging field called Educational Data Mining (EDM) is concerned with
developing methods that discover knowledge from data originating in
educational environments. The goals of EDM are identified as predicting students’
future learning behavior, studying the effects of educational support, and
advancing scientific knowledge about learning. An institution can use data
mining to make better-informed decisions and to predict students’ results.
With those results, the institution can focus on what to teach and how to
teach it. Students’ learning patterns can be captured and used to develop
techniques for teaching them.
• Manufacturing Engineering
• Knowledge is the best asset a manufacturing enterprise can possess. Data
mining tools can be very useful for discovering patterns in complex manufacturing
processes. Data mining can be used in system-level design to extract the
relationships between product architecture, product portfolio, and customer
needs data. It can also be used to predict product development span time,
cost, and dependencies among other tasks.
• CRM
• Customer Relationship Management (CRM) is all about acquiring and retaining
customers, improving customer loyalty and implementing customer-focused
strategies. To maintain a proper relationship with its customers, a business
needs to collect data and analyse it. This is where data mining plays
its part: with data mining technologies, the collected data can be used for
analysis. Instead of being unsure where to focus in order to retain customers,
those seeking a solution get filtered results.
• Fraud Detection
• Billions of dollars have been lost to fraud. Traditional methods of
fraud detection are time-consuming and complex. Data mining helps by finding
meaningful patterns and turning data into information. Any information that is
valid and useful is knowledge. A perfect fraud detection system should protect
the information of all its users. A supervised method starts by collecting sample
records, which are labelled as fraudulent or non-fraudulent. A model is
built from this data, and the algorithm then identifies whether a new record is
fraudulent or not.
ANY QUESTIONS?

 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 

Classification and prediction in data mining

  • 1. Er. Nawaraj Bhandari Data Warehouse/Data Mining Chapter 5:
  • 2. Data Mining Definition and Task: The information industry holds a huge amount of data. This data is of little use until it is converted into useful information, so it must be analyzed to extract that information. Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data.
  • 3. Data Mining Definition and Task: On the basis of the kind of data to be mined, data mining performs two types of tasks: descriptive tasks, and classification and prediction.
  • 4. Descriptive Function: The descriptive function deals with the general properties of data in the database. The descriptive functions are: class/concept description, mining of frequent patterns, mining of associations, mining of correlations, and mining of clusters.
  • 5. Classification and Prediction: Classification is the process of finding a model that describes the data classes or concepts. The purpose is to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of sets of training data and can be presented in the following forms: classification (IF-THEN) rules, decision trees, mathematical formulae, and neural networks. Prediction is used to predict missing or unavailable numerical data values rather than class labels.
  • 6. Data Mining Techniques: Several major data mining techniques have been developed and used in data mining projects recently, including association, classification, clustering, prediction, sequential patterns, and decision trees.
  • 7. Association: Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction. That is why the association technique is also known as the relation technique. It is used in market basket analysis to identify sets of products that customers frequently purchase together. Retailers use the association technique to research customers’ buying habits. Based on historical sales data, retailers might find that customers always buy crisps when they buy beer, and can therefore put beer and crisps next to each other to save customers time and increase sales.
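To make the beer-and-crisps example concrete, the sketch below counts how often two items share a transaction (support) and how often crisps appear in baskets that contain beer (confidence). The transactions and the helper names `pair_support` and `confidence` are invented for illustration; this is a minimal sketch of the idea, not a full association-rule miner such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"bread", "milk"},
    {"beer", "crisps", "milk"},
    {"bread", "crisps"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Return the share of baskets with `antecedent` that also hold `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
beer_crisps_support = support[("beer", "crisps")]
beer_to_crisps = confidence(transactions, "beer", "crisps")
print(beer_crisps_support, beer_to_crisps)
```

In this toy data every basket with beer also contains crisps, which is the kind of rule a retailer might act on when arranging shelves.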
  • 9. Classification: Classification is a classic data mining technique based on machine learning. It assigns each item in a data set to one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks, and statistics. In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification to the task “given all records of employees who left the company, predict who will probably leave the company in a future period.” In this case, we divide the employee records into two groups named “leave” and “stay”, and then ask our data mining software to classify the employees into these groups.
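The leave/stay example can be sketched with a very small classifier. The slide does not name a specific algorithm, so the code below swaps in a nearest-centroid rule as an assumption: it averages the training records of each class and assigns a new employee to the closest average. The two features (years at the company, a satisfaction score) and all values are hypothetical.

```python
import math

# Hypothetical training records: (years_at_company, satisfaction 0-10) -> label.
training = [
    ((1.0, 3.0), "leave"),
    ((2.0, 2.0), "leave"),
    ((1.5, 4.0), "leave"),
    ((6.0, 8.0), "stay"),
    ((7.0, 7.0), "stay"),
    ((5.0, 9.0), "stay"),
]


def fit_centroids(records):
    """Average the feature vectors of each class into one centroid per label."""
    sums = {}
    for features, label in records:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {
        label: tuple(t / count for t in total)
        for label, (total, count) in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(training)
prediction_new_hire = classify(centroids, (1.0, 2.5))
prediction_veteran = classify(centroids, (6.5, 8.0))
print(prediction_new_hire, prediction_veteran)
```

The key point is that the classes ("leave", "stay") are fixed in advance by the labeled training data; the model only learns how to assign new records to them.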
  • 10. Clustering: Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects with similar characteristics. The clustering technique defines the classes and puts objects in each class, whereas in classification, objects are assigned to predefined classes. To make the concept clearer, take book management in a library as an example. A library holds a wide range of books on various topics. The challenge is to keep those books in a way that lets readers pick up several books on a particular topic without hassle. Using clustering, we can keep books that share some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on that topic only have to go to that shelf instead of searching the entire library.
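As a rough illustration of grouping similar books onto shelves, the sketch below runs a basic k-means loop. K-means is one common clustering method chosen here as an assumption; the slide does not prescribe it. Each book is reduced to two hypothetical numeric scores, and the starting centroids are picked by hand to keep the run deterministic.

```python
import math

# Hypothetical books, each described by two invented topic scores.
books = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),   # e.g. mostly one topic
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),   # e.g. mostly another topic
]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hand-picked starting centroids (an assumption made for reproducibility).
shelf_labels, shelf_centres = kmeans(books, [books[0], books[3]])
print(shelf_labels)
```

Unlike the classification case, no labels are supplied: the two "shelves" emerge from the data, and naming them is left to a person.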
  • 12. Prediction: Prediction, as its name implies, is a data mining technique that discovers the relationship between independent variables and dependent variables. For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can fit a regression curve that is used for profit prediction.
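The sales-to-profit example corresponds to simple linear regression. The sketch below fits a straight line by ordinary least squares, using slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope·x̄. The sales and profit figures are invented, and a straight line is an assumption; real data may call for a different curve.

```python
# Hypothetical historical data, invented for illustration.
sales = [10.0, 20.0, 30.0, 40.0]    # independent variable
profit = [3.0, 5.0, 7.0, 9.0]       # dependent variable


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 50.0 + intercept  # forecast for sales of 50
print(slope, intercept, predicted_profit)
```

Here the output is a numeric value rather than a class label, which is the distinction the deck draws between prediction and classification.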
  • 13. Sequential Patterns: Often used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences of similar events. For example, with customer data you can identify that customers buy a particular collection of products together at different times of the year. In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history.
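One simple way to look for sequential patterns is to count how many customers bought item A and then, at some later time, item B. The sketch below does this for ordered pairs only; the purchase histories and the function name `sequential_pair_support` are hypothetical, and dedicated algorithms handle longer sequences and time windows.

```python
from collections import Counter

# Hypothetical purchase histories, each list ordered from oldest to newest.
histories = {
    "c1": ["tent", "sleeping bag", "stove"],
    "c2": ["tent", "stove"],
    "c3": ["stove", "tent"],
    "c4": ["tent", "sleeping bag"],
}


def sequential_pair_support(purchases):
    """Return, for each ordered pair (first, later), the fraction of
    customers who bought `first` before `later`."""
    counts = Counter()
    for items in purchases.values():
        seen = set()  # count each pair at most once per customer
        for i, first in enumerate(items):
            for later in items[i + 1:]:
                if first != later:
                    seen.add((first, later))
        counts.update(seen)
    return {pair: count / len(purchases) for pair, count in counts.items()}


seq_support = sequential_pair_support(histories)
print(seq_support[("tent", "stove")], seq_support[("stove", "tent")])
```

Order matters here: "tent then stove" and "stove then tent" are counted separately, which is what distinguishes this from the same-transaction association counts.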
  • 14. Decision Trees: The decision tree is one of the most widely used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that help us narrow down the data, so that we can make the final decision based on it.
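The structure described above (a root question whose answers lead to further questions until a decision is reached) can be sketched as a nested dictionary that is walked from the root. The tree below is written by hand with invented questions and outcomes; learning such a tree from data is a separate step not shown here.

```python
# Hypothetical hand-built tree: inner nodes ask a question, leaves are decisions.
tree = {
    "question": "outlook",
    "answers": {
        "sunny": {
            "question": "humidity",
            "answers": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "question": "windy",
            "answers": {"yes": "stay in", "no": "play"},
        },
    },
}


def decide(node, record):
    """Follow the record's answers from the root until a leaf is reached."""
    while isinstance(node, dict):
        answer = record[node["question"]]
        node = node["answers"][answer]
    return node


decision_sunny = decide(tree, {"outlook": "sunny", "humidity": "high"})
decision_overcast = decide(tree, {"outlook": "overcast"})
print(decision_sunny, decision_overcast)
```

Because each step is a readable question and answer, the path to any decision can be explained directly, which is the ease of understanding the slide refers to.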
  • 15. Data Mining Tools: Different types of data mining tools are available in the marketplace, each with its own strengths and weaknesses. These tools use artificial intelligence, machine learning, and other techniques to extract data. Most data mining tools can be classified into one of three categories: traditional data mining tools, dashboards, and text-mining tools.
  • 16. Traditional Data Mining Tools: Traditional data mining tools and techniques work with existing databases stored on enterprise servers or even local hard drives. They interpret the data stored there using predefined algorithms and queries written in a database-specific programming language (macros) to reveal patterns in the data that would otherwise be invisible. For example, a database of sales figures can easily display monthly sales trends simply by accessing the database’s built-in query and table system. A data mining tool installed on the server can then analyze those broad numbers to identify aspects affecting monthly sales that are not immediately apparent and, most importantly, render that analysis into an easily readable report that makes those patterns explicit.
  • 17. Dashboards: A more recent innovation in the world of data mining tools and techniques is the dashboard. A dashboard is a piece of software that sits on an end user’s desktop or tablet and reports real-time fluctuations in data as it flows into the database and is manipulated or sorted. Typically, historical data can also be accessed via the dashboard. Dashboards are typically used by managers and other positions to track the effect of events and other influences on data streams in real time. One example is monitoring new picking policies in a warehouse as a company attempts to improve its logistical management of stock: a dashboard allows the company to see the effect of new policies immediately, quickly analyzing just a few hours of data to see whether they are getting the desired efficiency or not.
  • 18. Text Analysis: Text-mining applications are one of the newer innovations in data mining tools and techniques. These tools take disparate forms of textual data (word processing documents, plain text files, and ‘flat’ text formats such as PDF or presentation files) and mine them for patterns in the text. This allows companies and users to apply data mining tools and techniques without having to open each document in a separate application or perform cumbersome (and error-introducing) conversions on documents. Text analysis has many possible techniques and applications. One popular application is detecting plagiarized or ‘copy-pasted’ content. Text analysis tools allow users to quickly scan huge amounts of text in different formats to identify identical strings and report back the odds that a particular piece of text was lifted from an existing text. Universities and colleges increasingly use such tools to fight plagiarism in classrooms.
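The idea of scanning for identical strings can be sketched by breaking each text into overlapping word triples (often called shingles) and measuring how many of a submission's triples also occur in a known source. The texts, the function names, and the choice of three-word shingles are assumptions for illustration; the resulting score is a simple overlap ratio, not a calibrated probability of plagiarism.

```python
def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate, source, n=3):
    """Return the fraction of the candidate's shingles found in the source."""
    candidate_shingles = shingles(candidate, n)
    if not candidate_shingles:
        return 0.0
    shared = candidate_shingles & shingles(source, n)
    return len(shared) / len(candidate_shingles)


# Hypothetical texts, invented for illustration.
source_text = "Data mining is the process of extracting knowledge from data"
copied_text = "data mining is the process of extracting knowledge from data"
original_text = "the weather was pleasant all week long"

copied_score = overlap(copied_text, source_text)
original_score = overlap(original_text, source_text)
print(copied_score, original_score)
```

A high score flags a passage for human review; lightly reworded copies would need fuzzier matching than exact shingles.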
  • 19. Data Mining Applications, Future Healthcare: Data mining holds great potential to improve health systems. It uses data and analytics to identify best practices that improve care and reduce costs. Researchers use data mining approaches such as multi-dimensional databases, machine learning, soft computing, data visualization, and statistics. Mining can be used to predict the volume of patients in every category. Processes are developed to make sure that patients receive appropriate care at the right place and at the right time. Data mining can also help healthcare insurers detect fraud and abuse.
  • 20. Data Mining Applications, Market Basket Analysis: Market basket analysis is a modelling technique based on the theory that if you buy a certain group of items, you are more likely to buy another group of items. This technique may allow the retailer to understand the purchase behaviour of a buyer. This information may help the retailer to know the buyer’s needs and change the store’s layout accordingly. Differential analysis can be used to compare results between different stores or between customers in different demographic groups.
  • 21. Data Mining Applications, Education: An emerging field called Educational Data Mining (EDM) is concerned with developing methods that discover knowledge from data originating in educational environments. The goals of EDM are identified as predicting students’ future learning behavior, studying the effects of educational support, and advancing scientific knowledge about learning. An institution can use data mining to make accurate decisions and to predict students’ results. With these results, the institution can focus on what to teach and how to teach it. Students’ learning patterns can be captured and used to develop techniques for teaching them.
  • 22. Data Mining Applications, Manufacturing Engineering: Knowledge is the best asset a manufacturing enterprise can possess. Data mining tools can be very useful for discovering patterns in complex manufacturing processes. Data mining can be used in system-level design to extract the relationships between product architecture, product portfolio, and customer needs data. It can also be used to predict product development span time, cost, and dependencies among other tasks.
  • 23. Data Mining Applications, CRM: Customer Relationship Management is about acquiring and retaining customers, improving customer loyalty, and implementing customer-focused strategies. To maintain a proper relationship with a customer, a business needs to collect data and analyse the information. This is where data mining plays its part. With data mining technologies, the collected data can be used for analysis. Instead of being unsure where to focus in order to retain customers, those seeking a solution get filtered results.
  • 24. Data Mining Applications, Fraud Detection: Billions of dollars have been lost to fraud. Traditional methods of fraud detection are time-consuming and complex. Data mining helps by providing meaningful patterns and turning data into information. Any information that is valid and useful is knowledge. A perfect fraud detection system should protect the information of all users. A supervised method starts with a collection of sample records, each classified as fraudulent or non-fraudulent. A model is built from this data, and the resulting algorithm is used to identify whether a record is fraudulent or not.