Data Mining-PART II
By
M.Dhilsath Fathima
DATA MINING Tasks/Functions
• Classification
• Clustering
• Outlier analysis
• Association
• Prediction/Regression
CLASSIFICATION
• Classification is a data mining (machine
learning) technique used to predict the target
class for each case in the data.
• For example, you may wish to use classification
to predict whether the weather on a particular
day will be “sunny”, “rainy” or “cloudy”.
• As another example, a classification model could be
used to identify loan applicants as low,
medium, or high credit risks.
• Popular classification techniques include
decision trees and neural networks.
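Decision trees and neural networks take more than a few lines to build, so the sketch below uses a simpler supervised classifier, k-nearest neighbours (a swapped-in technique, not one named on the slide), to show the basic idea: learn from labelled cases, then predict the class of a new case. The loan-applicant data, the two features (income in thousands, debt-to-income ratio) and `k=3` are hypothetical choices for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the slides):
# (annual income in thousands, debt-to-income ratio) -> credit-risk class.
TRAINING = [
    ((85.0, 0.10), "low"),
    ((90.0, 0.15), "low"),
    ((55.0, 0.30), "medium"),
    ((60.0, 0.35), "medium"),
    ((25.0, 0.60), "high"),
    ((30.0, 0.55), "high"),
]


def classify(x, training=TRAINING, k=3):
    """Predict the class of x by majority vote among its k nearest labelled examples."""
    # Sort the labelled examples by Euclidean distance to x and keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((88.0, 0.12)))  # low
print(classify((57.0, 0.32)))  # medium
```

Because the features are not normalized here, income dominates the distance; in practice the normalization step listed under Data Transformation would usually be applied first.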
CLASSIFICATION
CLASSIFICATION-Example
Clustering
• Classification is supervised learning: the supervision comes from
labeling the instances with their class.
• Clustering is unsupervised learning: there are no predefined
class labels and no training set.
• So a clustering algorithm must assign each instance to a cluster
such that objects in the same cluster are more similar to each other
than to objects in other clusters.
Clustering
• Finding groups of objects such that the objects in a group will be similar
(or related) to one another and different from (or unrelated to) the
objects in other groups
• The goal is to find the most 'natural' groupings of the instances.
- Within a cluster: Maximize similarity between instances.
- Between clusters: Minimize similarity between instances.
Figure: intra-cluster distances are minimized; inter-cluster distances are maximized.
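One widely used algorithm that pursues these two goals is k-means (not named on the slides): it alternates between assigning each instance to its nearest cluster centre and moving each centre to the mean of its instances, which shrinks intra-cluster distances. This is a minimal sketch that assumes numeric points and, for reproducibility, uses the first k points as initial centres; real implementations normally use random or k-means++ initialisation.

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal k-means: returns (centroids, labels) for a list of numeric tuples."""
    # Simplifying assumption: deterministic start using the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centre of an empty cluster
        if new_centroids == centroids:  # converged: no centre moved
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```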
OUTLIER ANALYSIS
Figure: two clusters (Cluster 1 and Cluster 2) with outliers lying apart from both
What is an Outlier?
ASSOCIATION
• An association rule has two parts: an antecedent (if) and a
consequent (then). An antecedent (preceding in time or
order) is an item found in the data. A consequent (the
second part of a conditional proposition, i.e. the result) is an item
that is found in combination with the antecedent.
• Association rules are created by analyzing data for
frequent if/then patterns and using the
criteria support and confidence to identify the most
important relationships. Support indicates how
frequently the items appear together in the
database. Confidence indicates how often the
if/then statement has been found to be true, i.e. the fraction of
transactions containing the antecedent that also contain the consequent.
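The two criteria can be written as support(A ⇒ B) = fraction of transactions containing both A and B, and confidence(A ⇒ B) = support(A ∪ B) / support(A). A minimal sketch, assuming each transaction is a Python set of item names; the four shopping baskets are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / antecedent_support


# Hypothetical transactions.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "butter"},
]
print(support({"bread", "jam"}, transactions))       # 0.5
print(confidence({"bread"}, {"jam"}, transactions))  # 2/3, about 0.667
```

Here Bread ⇒ Jam holds in 2 of the 3 baskets that contain bread, so its confidence is 2/3 while its support is 2/4.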
ASSOCIATION(Cont..)
• In data mining, association rules are useful for
analyzing and predicting customer behavior.
They play an important part in shopping
basket data analysis, product clustering,
catalog design and store layout.
• Form: A ⇒ B (antecedent ⇒ consequent)
• Examples of association: Bread ⇒ Jam, Computer ⇒ Printer
Applications of Data Mining
Data Mining Applications in Sales/Marketing - Example
for Association
• Discover consumer groups based on their purchasing
habits, thus helping in planning and launching new
marketing campaigns in a prompt and cost-effective way.
• Data mining is used for market basket analysis to provide
information on which product combinations were
purchased together, when they were bought and in what
sequence.
Data Mining Applications in Banking - Example for
Classification
• Data mining is used to identify customer loyalty by analyzing
the data of customers' purchasing activities, such as the
frequency of purchases in a period of time, the total monetary
value of all purchases, and when the last purchase was made. After
analyzing those dimensions, a relative score is generated
for each customer: the higher the score, the more loyal
the customer is relative to others.
• To help the bank retain credit card customers, data mining is
applied. By analyzing past data, data mining can help banks
predict which customers are likely to change their credit card
affiliation, so they can plan and launch special offers to
retain those customers.
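The loyalty score in the first bullet combines three dimensions: recency of the last purchase, frequency of purchases and total monetary value. The slide does not say how they are combined, so this sketch assumes a simple scheme: rank customers on each dimension (1 = worst) and add the three ranks. The purchase log is hypothetical, and ties are broken arbitrarily by sort order.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount).
purchases = [
    ("C1", date(2024, 3, 1), 120.0),
    ("C1", date(2024, 3, 20), 80.0),
    ("C1", date(2024, 3, 28), 200.0),
    ("C2", date(2024, 1, 5), 50.0),
    ("C3", date(2024, 3, 10), 300.0),
    ("C3", date(2024, 2, 2), 40.0),
]


def rfm_scores(purchases):
    """Loyalty score per customer: sum of recency, frequency and monetary ranks (higher = more loyal)."""
    stats = {}  # customer -> (last purchase date, number of purchases, total spend)
    for cust, day, amount in purchases:
        last, freq, money = stats.get(cust, (date.min, 0, 0.0))
        stats[cust] = (max(last, day), freq + 1, money + amount)
    customers = sorted(stats)

    def ranks(key):
        # Rank 1 = worst, len(customers) = best on this dimension.
        ordered = sorted(customers, key=key)
        return {c: i + 1 for i, c in enumerate(ordered)}

    recency = ranks(lambda c: stats[c][0])    # later last purchase ranks higher
    frequency = ranks(lambda c: stats[c][1])  # more purchases ranks higher
    monetary = ranks(lambda c: stats[c][2])   # higher total spend ranks higher
    return {c: recency[c] + frequency[c] + monetary[c] for c in customers}


print(rfm_scores(purchases))  # {'C1': 9, 'C2': 3, 'C3': 6}
```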
Data Mining Applications - Example
for Clustering
• Given:
– A source of textual documents
– A similarity measure
• e.g., how many words are common in these documents
Figure: documents from the source, together with the similarity measure, are fed into a clustering system that groups related documents.
• Find:
– Several clusters of documents that are relevant to each other
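The setup above can be sketched with the slide's own similarity measure, the number of words two documents have in common. The grouping rule is an assumption: any two documents sharing at least `min_common` words (a hypothetical threshold) are put in the same cluster, which amounts to single-link grouping via connected components. The documents are made up, and no stop-word removal is done.

```python
def common_words(doc_a, doc_b):
    """Similarity measure from the slide: number of distinct words the two documents share."""
    return len(set(doc_a.lower().split()) & set(doc_b.lower().split()))


def cluster_documents(docs, min_common=2):
    """Link documents sharing >= min_common words; return the connected groups of indices."""
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if common_words(docs[i], docs[j]) >= min_common:
                parent[find(i)] = find(j)  # merge the two documents' clusters

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "loan interest rate for home loan",
    "home loan interest rate offers",
    "football match score today",
    "today football team wins match",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```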
Association Rules 
• A common application is market basket analysis, which determines
(1) which items are frequently sold together at a supermarket, and
(2) how to arrange items on shelves and which items should be promoted together.
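Point (1), finding items frequently sold together, can be sketched by counting how often each pair of items appears in the same basket and keeping the pairs whose support reaches a minimum threshold. This brute-force pair count corresponds to the second level of frequent-itemset mining; algorithms such as Apriori add candidate pruning, which is not shown. The baskets and the 0.5 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs that occur together in at least min_support of all baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "jam").
        counts.update(combinations(sorted(set(basket)), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [
    ["bread", "jam", "milk"],
    ["bread", "jam"],
    ["computer", "printer"],
    ["bread", "jam", "computer", "printer"],
]
print(frequent_pairs(baskets))  # {('bread', 'jam'): 0.75, ('computer', 'printer'): 0.5}
```

Pairs found this way are candidates for shelf placement or joint promotion, as in point (2).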
DATA PREPROCESSING
Define-Data Preprocessing
• Data preprocessing is a data mining technique
that involves transforming raw data into an
understandable format.
• Data preprocessing is an important step in
the data mining process.
• The product of data preprocessing is the
final training set.
Why Data Preprocessing?
• Data in the real world is dirty:
– Noisy: containing errors or outliers
– Incomplete: missing values, lacking attribute values
– Inconsistent data
• No quality data, no quality mining results!
– Quality decisions must be based on quality data
– Data warehouse needs consistent integration of quality 
data
Major Tasks in Data Preprocessing
• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers, 
and resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes, or files
• Data transformation
– Normalization and aggregation
• Data reduction
– Obtains a representation reduced in volume that produces the same or
similar analytical results
Forms of data preprocessing
Data Cleaning
• Data cleaning tasks
– Fill in missing values
– Identify outliers and smooth out noisy data
– Correct inconsistent data
What is Missing Data?
• Data is not always available
– E.g., many tuples have no recorded value for several attributes,
such as customer income in sales data
• Missing data may be due to
– equipment malfunction
– inconsistent with other recorded data and thus deleted
– data not entered due to misunderstanding
– certain data may not be considered important at the time of entry
– history or changes of the data not registered
How to Handle Missing Data?
• Ignore the tuple: usually done when the class label is missing
(applicable mainly to large data sets)
• Fill in the missing value manually: tedious and often infeasible for large
databases
• Use a global constant to fill in the missing value
• Use the attribute mean to fill in the missing value
• Use the most probable value to fill in the missing value:
inference-based, such as a Bayesian formula or a decision tree
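Two of these strategies, a global constant and the attribute mean, are simple enough to sketch directly. This assumes missing values are stored as `None` in a list of attribute values; the constant `"Unknown"` and the income figures are illustrative. The inference-based "most probable value" strategy needs a trained model and is not shown.

```python
from statistics import mean


def fill_missing(values, strategy="mean", constant="Unknown"):
    """Replace None entries using a global constant or the mean of the known values."""
    if strategy == "constant":
        return [constant if v is None else v for v in values]
    if strategy == "mean":
        known = [v for v in values if v is not None]
        attribute_mean = mean(known)
        return [attribute_mean if v is None else v for v in values]
    raise ValueError(f"unknown strategy: {strategy}")


income = [30000, None, 50000, 40000, None]
print(fill_missing(income))                       # [30000, 40000, 50000, 40000, 40000]
print(fill_missing(income, strategy="constant"))  # [30000, 'Unknown', 50000, 40000, 'Unknown']
```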
Noisy Data/Outlier
• Noise: random error or variance in a measured
variable
• Incorrect attribute values may be due to
– faulty data collection instruments
– data entry problems
– data transmission problems
– inconsistency in naming conventions
• Other data problems that require data cleaning
– duplicate records
– incomplete data
– inconsistent data
OUTLIER
• Data objects or observations that do not
comply with the general behavior or model of
the data. Such data objects, which are grossly
different from or inconsistent with the
remaining set of data, are called outliers.
• A data object that deviates significantly from
the normal objects as if it were generated by a
different mechanism.
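One common rule of thumb for flagging values that deviate significantly from the rest is Tukey's interquartile-range fences. This is a swapped-in technique, not one defined on the slides: a value is flagged if it lies more than `factor` × IQR below the first quartile or above the third quartile. The sensor readings are hypothetical, and `factor=1.5` is the conventional default rather than a fixed rule.

```python
from statistics import quantiles


def iqr_outliers(values, factor=1.5):
    """Flag values outside Tukey's fences [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
print(iqr_outliers(readings))  # [95]
```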
How to Handle Noisy Data?
(Not Now)
• Binning method
• Clustering
• Combined computer and human
inspection
• Regression
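The slide only names these methods. As a minimal sketch of the first one, equal-frequency binning with smoothing by bin means sorts the values, splits them into bins of equal size and replaces every value by the mean of its bin. The bin size and the price list are illustrative; other variants (bin medians, bin boundaries, equal-width bins) are not shown.

```python
def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: sort, split into bins of bin_size, replace each value by its bin mean."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_values = data[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed  # returned in sorted order, not the original order


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```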
Data integration and transformation
Data Integration
• Data integration:
– combines data from multiple sources into a coherent store
• Three problems involved in data integration:
– Schema integration
– Detecting and resolving data value conflicts
– Handling redundant data, which occurs often when integrating multiple
databases
Data Transformation
• Smoothing: remove noise from data
• Aggregation: summarization, data cube construction
• Generalization: concept hierarchy climbing
• Normalization: attribute values scaled to fall within a small, specified range
– min-max normalization
– z-score normalization
– normalization by decimal scaling
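The three normalization methods follow standard formulas: min-max maps v to (v − min)/(max − min) · (new_max − new_min) + new_min; z-score maps v to (v − mean)/σ; decimal scaling divides by 10^j, where j is the smallest integer making every |v′| < 1. A minimal sketch; it assumes the population standard deviation for z-score and does not guard against a constant attribute (max = min or σ = 0). The sample values are illustrative.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """v' = (v - mean) / standard deviation (population standard deviation here)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """v' = v / 10**j, with j the smallest integer such that max(|v'|) < 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [12000, 73600, 98000]
print(min_max(incomes))              # 73600 maps to about 0.716
print(z_score(incomes))
print(decimal_scaling([-986, 917]))  # [-0.986, 0.917]
```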
DATA REDUCTION
Data Reduction Strategies
• A data warehouse may store terabytes of data: complex data
analysis/mining may take a very long time to run on the
complete data set
• Data reduction
– Obtains a reduced representation of the data set that is much
smaller in volume yet produces the same (or almost the
same) analytical results
• Data reduction strategies
– Data cube aggregation (e.g., construction of a data cube)
– Numerosity reduction (e.g., generating histograms)
– Concept hierarchy generation
Data Cube Aggregation
• The lowest level of a data cube
– the aggregated data for an individual entity of interest
– e.g., a customer in a phone calling data warehouse.
• Multiple levels of aggregation in data cubes
– Further reduce the size of data to deal with
• Reference appropriate levels
– Use the smallest representation which is enough to solve the
task
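Moving up one level of aggregation can be sketched as a group-by-and-sum: quarterly sales are rolled up to yearly totals per branch, and then to yearly totals overall, so fewer rows have to be stored and analysed. The branches, figures and record layout are hypothetical; a real data cube precomputes such aggregates for many combinations of dimensions.

```python
from collections import defaultdict

# Hypothetical lowest-level records: (branch, year, quarter, sales).
sales = [
    ("A", 2023, "Q1", 224.0), ("A", 2023, "Q2", 408.0),
    ("A", 2023, "Q3", 350.0), ("A", 2023, "Q4", 586.0),
    ("B", 2023, "Q1", 100.0), ("B", 2023, "Q2", 150.0),
]


def roll_up(records, key):
    """Sum sales over every dimension that key() leaves out."""
    totals = defaultdict(float)
    for branch, year, quarter, amount in records:
        totals[key(branch, year, quarter)] += amount
    return dict(totals)


by_branch_year = roll_up(sales, lambda b, y, q: (b, y))  # quarter level -> year level
by_year = roll_up(sales, lambda b, y, q: y)              # branch and quarter summed out
print(by_branch_year)  # {('A', 2023): 1568.0, ('B', 2023): 250.0}
print(by_year)         # {2023: 1818.0}
```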
Numerosity reduction-Histograms
• A popular data reduction
technique
• Divide data into buckets
and store average (sum) for
each bucket
• Can be constructed
optimally in one dimension
using dynamic
programming
• Related to quantization
problems.
Figure: example histogram with buckets spanning values from 10000 to 90000 (vertical axis from 0 to 40)
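A histogram reduces the data by keeping only a few numbers per bucket instead of every value. This sketch assumes equal-width buckets and stores (lower bound, upper bound, count, average) for each one; it does not implement the optimal dynamic-programming construction mentioned above. The values and bucket count are illustrative.

```python
def equal_width_histogram(values, num_buckets):
    """Summarise values as equal-width buckets of (low, high, count, mean)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, bucket width would be zero")
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # The maximum value would land just past the last bucket, so clamp it in.
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].append(v)
    summary = []
    for i, members in enumerate(buckets):
        low = lo + i * width
        avg = sum(members) / len(members) if members else None
        summary.append((low, low + width, len(members), avg))
    return summary


values = [0, 2, 3, 5, 11, 12, 14, 25, 28, 30]
for bucket in equal_width_histogram(values, 3):
    print(bucket)  # first line: (0.0, 10.0, 4, 2.5)
```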
Numerosity reduction-Clustering
• Partition data set into clusters, and one can store cluster
representation only
• Can be very effective if data is clustered but not if data is
“smeared”
• Can have hierarchical clustering and be stored in multi-
dimensional index tree structures
• There are many choices of clustering definitions and
clustering algorithms, further detailed in Chapter 8
Sampling
• Allow a mining algorithm to run in complexity that is
potentially sub-linear to the size of the data
• Choose a representative subset of the data
– Simple random sampling may have very poor performance
in the presence of skew
• Develop adaptive sampling methods
– Stratified sampling:
• Approximate the percentage of each class (or
subpopulation of interest) in the overall database
• Used in conjunction with skewed data
Sampling
Figure: SRSWOR (simple random sample without replacement) and SRSWR (simple random sample with replacement) drawn from the raw data
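The three sampling schemes on these slides map directly onto Python's `random` module: `sample` draws without replacement (SRSWOR), `choices` draws with replacement (SRSWR), and stratified sampling applies SRSWOR separately within each class. A minimal sketch; the skewed customer data, the 10% fraction and the seed are hypothetical.

```python
import random
from collections import defaultdict


def srswor(data, n, rng):
    """Simple random sample WITHOUT replacement: each tuple can be drawn at most once."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample WITH replacement: a tuple may be drawn more than once."""
    return rng.choices(data, k=n)


def stratified_sample(data, stratum_of, fraction, rng):
    """Draw about `fraction` of each stratum so that skewed classes keep their share."""
    strata = defaultdict(list)
    for row in data:
        strata[stratum_of(row)].append(row)
    sample = []
    for rows in strata.values():
        k = max(1, round(fraction * len(rows)))  # keep at least one row from small strata
        sample.extend(rng.sample(rows, k))
    return sample


rng = random.Random(42)  # fixed seed for reproducibility
# Hypothetical skewed data: 90 "young" customers and 10 "senior" customers.
customers = [(i, "young") for i in range(90)] + [(i, "senior") for i in range(90, 100)]
s = stratified_sample(customers, lambda row: row[1], 0.1, rng)
print(len(s), sum(row[1] == "senior" for row in s))  # 10 1
```

A simple random sample of 10 from this data could easily contain no senior customers at all; the stratified version guarantees their share.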
Concept hierarchy
• Arrangement of concepts such as time and location into levels.
– Reduces the data by collecting and replacing low-level
concepts (such as numeric values for the
attribute age) by higher-level concepts (such as
young, middle-aged, or senior).
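Replacing numeric ages by the higher-level concepts from the slide is a single mapping step. The cut-off ages (30 and 60) are assumptions chosen for illustration; the slide does not define them, and real hierarchies are set by domain experts or generated from the data.

```python
def age_concept(age, young_below=30, senior_from=60):
    """Climb one level of a concept hierarchy: numeric age -> age group.
    The cut-offs are illustrative assumptions, not fixed definitions."""
    if age < young_below:
        return "young"
    if age < senior_from:
        return "middle-aged"
    return "senior"


ages = [23, 35, 47, 61, 29, 70]
print([age_concept(a) for a in ages])
# ['young', 'middle-aged', 'middle-aged', 'senior', 'young', 'senior']
```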
Data Warehouse Usage/Applications of Data Warehouse

Unit 3 part ii Data mining

  • 2. DATA MINING Task/Functions • Classification • Clustering • Outlier analysis • Association • Prediction/Regression
  • 3. CLASSIFICATION • Classification is a data mining (machine learning) technique used to predict the target class for each case in the data. • For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”. • For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks. • Popular classification techniques include decision trees and neural networks.
  • 6. Clustering • Classification is supervised learning the supervision comes from labeling the instances with the class. • Clustering is unsupervised learning -- there are no predefined class labels, no training set. • So our clustering algorithm needs to assign a cluster to each instance such that all objects with the same cluster are more similar than others.
  • 7. Clustering • Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups • The goal is to find the most 'natural' groupings of the instances. - Within a cluster: Maximize similarity between instances. - Between clusters: Minimize similarity between instances. Inter-cluster distances are maximizedIntra-cluster distances are minimized
  • 9. What is an Outlier?
  • 10. ASSOCIATION
    • An association rule has two parts: an antecedent (if) and a consequent (then). The antecedent (the part that comes first) is an item found in the data. The consequent (the second part of the conditional, i.e., the result) is an item that is found in combination with the antecedent.
    • Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support indicates how frequently the items appear in the database. Confidence indicates how often the if/then statement has been found to be true, i.e., the fraction of transactions containing the antecedent that also contain the consequent.
  • 11. ASSOCIATION (Cont.)
    • In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.
    • Form: A → B (antecedent → consequent)
    • Examples of association: {Bread} → {Jam}, {Computer} → {Printer}
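The support and confidence measures can be computed directly from a list of transactions. A minimal Python sketch, assuming each transaction is stored as a set of item names; the five baskets are made-up data for the {Bread} → {Jam} example.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions)


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / count_containing(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"Bread", "Jam", "Milk"},
    {"Bread", "Jam"},
    {"Bread", "Butter"},
    {"Milk", "Butter"},
    {"Bread", "Jam", "Butter"},
]

print(support(baskets, {"Bread", "Jam"}))       # 0.6
print(confidence(baskets, {"Bread"}, {"Jam"}))  # 0.75
```

Here {Bread, Jam} appears in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets containing Bread also contain Jam (confidence 0.75).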
  • 13. Data Mining Applications in Sales/Marketing – Ex. for Association
    • Discover consumer groups based on their purchasing habits, helping to plan and launch new marketing campaigns in a prompt and cost-effective way.
    • Data mining is used for market basket analysis to provide information on which product combinations were purchased together, when they were bought, and in what sequence.
  • 14. Data Mining Applications in Banking – Ex. for Classification
    • Data mining is used to identify customer loyalty by analyzing data on customers' purchasing activities, such as the frequency of purchases in a period of time, the total monetary value of all purchases, and the date of the last purchase. After analyzing those dimensions, a relative measure is generated for each customer: the higher the score, the more loyal the customer is relative to others.
    • Data mining is also applied to help banks retain credit card customers. By analyzing past data, data mining can help banks predict which customers are likely to change their credit card affiliation, so they can plan and launch special offers to retain those customers.
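The loyalty measure above combines recency, frequency and monetary value, but the slide does not give a formula. The following Python sketch assumes one simple, hypothetical scheme: rank customers on each dimension (1 = worst) and add the three ranks. Customer names and figures are made up.

```python
def loyalty_scores(customers):
    """Return {name: score}; a higher score means a relatively more loyal customer.

    `customers` maps a name to (days_since_last_purchase, purchases_in_period, total_spent).
    Hypothetical scheme: each dimension becomes a rank (1 = worst, n = best); ranks are summed.
    """
    names = list(customers)

    def ranks(key, higher_is_better):
        # Sort worst-to-best so that position + 1 is the rank.
        ordered = sorted(names, key=key, reverse=not higher_is_better)
        return {name: pos + 1 for pos, name in enumerate(ordered)}

    recency = ranks(lambda n: customers[n][0], higher_is_better=False)  # fewer days is better
    frequency = ranks(lambda n: customers[n][1], higher_is_better=True)
    monetary = ranks(lambda n: customers[n][2], higher_is_better=True)
    return {n: recency[n] + frequency[n] + monetary[n] for n in names}


customers = {
    "Asha": (5, 12, 2400.0),
    "Ben": (40, 3, 300.0),
    "Chen": (15, 7, 900.0),
}
print(loyalty_scores(customers))  # {'Asha': 9, 'Ben': 3, 'Chen': 6}
```

Ranks make the score relative, matching the slide's "relative measure"; a real system might weight the dimensions differently.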
  • 15. Data Mining Applications in Banking – Ex. for Clustering
    • Given:
      – A source of textual documents
      – A similarity measure, e.g., how many words are common to these documents
    • Find:
      – Several clusters of documents that are relevant to each other
    [Figure: documents from a source are fed, with the similarity measure, into a clustering system that groups them into clusters]
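A minimal Python sketch of the document-clustering setup above. It assumes the similarity measure is word overlap, computed as Jaccard similarity (shared words divided by all distinct words), and uses a simple single-pass "leader" clustering rule with a hypothetical threshold of 0.3. The documents are made-up examples.

```python
def word_overlap(doc_a, doc_b):
    """Jaccard similarity: shared words divided by all distinct words in either document."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_documents(docs, threshold=0.3):
    """Leader clustering: a document joins the first cluster whose seed it resembles enough."""
    clusters = []  # each cluster is a list of document indices; cluster[0] is its seed
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if word_overlap(docs[cluster[0]], doc) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


docs = [
    "loan interest rate increase",
    "interest rate on home loan",
    "credit card reward points",
    "reward points on credit card purchases",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

The result depends on document order and the threshold; this is a deliberate simplification of the general idea.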
  • 16. Association Rules
    • A common application is market basket analysis, which identifies (1) which items are frequently sold together at a supermarket and (2) how to arrange items on shelves and which items should be promoted together.
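Finding which items are frequently sold together starts with counting how often each pair of items appears in the same basket. A minimal Python sketch, assuming baskets are sets of item names and a hypothetical minimum support of 0.4 (40% of baskets). Full algorithms such as Apriori extend this idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found together in at least `min_support` of baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ("Bread", "Jam").
        counts.update(combinations(sorted(set(basket)), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical supermarket baskets.
baskets = [
    {"Bread", "Jam", "Milk"},
    {"Bread", "Jam"},
    {"Bread", "Butter"},
    {"Milk", "Butter"},
    {"Bread", "Jam", "Butter"},
]
print(frequent_pairs(baskets, min_support=0.4))  # {('Bread', 'Jam'): 0.6, ('Bread', 'Butter'): 0.4}
```

Pairs that pass the threshold are candidates for shelf placement or joint promotion.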
  • 18. Define: Data Preprocessing
    • Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.
    • Data preprocessing is an important step in the data mining process.
    • The product of data preprocessing is the final training set.
  • 19. Why Data Preprocessing?
    • Data in the real world is dirty:
      – Noisy: containing errors or outliers
      – Incomplete: missing values, lacking attribute values
      – Inconsistent data
    • No quality data, no quality mining results!
      – Quality decisions must be based on quality data.
      – A data warehouse needs consistent integration of quality data.
  • 20. Major Tasks in Data Preprocessing
    • Data cleaning
      – Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
    • Data integration
      – Integration of multiple databases, data cubes, or files
    • Data transformation
      – Normalization and aggregation
    • Data reduction
      – Obtains a reduced representation in volume but produces the same or similar analytical results
  • 21. Forms of data preprocessing
  • 22. Data Cleaning
    • Data cleaning tasks:
      – Fill in missing values
      – Identify outliers and smooth out noisy data
      – Correct inconsistent data
  • 23. What is Missing Data?
    • Data is not always available
      – E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
    • Missing data may be due to
      – equipment malfunction
      – values inconsistent with other recorded data and thus deleted
      – data not entered due to misunderstanding
      – certain data not being considered important at the time of entry
      – history or changes of the data not being registered
  • 24. How to Handle Missing Data?
    • Ignore the tuple: usually done when the class label is missing (applicable when the data set is large)
    • Fill in the missing value manually: tedious, and often infeasible for a large database
    • Use a global constant to fill in the missing value
    • Use the attribute mean to fill in the missing value
    • Use the most probable value to fill in the missing value: inference-based, such as a Bayesian formula or a decision tree
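Two of the strategies above, a global constant and the attribute mean, can be sketched in a few lines of Python. This assumes missing values are represented as `None`; the income figures are hypothetical.

```python
def fill_missing(values, strategy="mean", constant=None):
    """Fill None entries with the attribute mean or with a global constant."""
    if strategy == "mean":
        present = [v for v in values if v is not None]
        fill = sum(present) / len(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


income = [30000, None, 50000, 40000, None]
print(fill_missing(income))                                   # [30000, 40000.0, 50000, 40000, 40000.0]
print(fill_missing(income, strategy="constant", constant=0))  # [30000, 0, 50000, 40000, 0]
```

The mean is computed only from the recorded values, so the filled entries do not shift the attribute's average.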
  • 25. Noisy Data/Outlier
    • Noise: random error or variance in a measured variable
    • Incorrect attribute values may be due to
      – faulty data collection instruments
      – data entry problems
      – data transmission problems
      – inconsistency in naming convention
      – duplicate records
      – incomplete data
      – inconsistent data
  • 26. OUTLIER
    • Data objects or observations that do not comply with the general behavior or model of the data. Such data objects, which are grossly different from or inconsistent with the remaining set of data, are called outliers.
    • An outlier is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism.
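One simple statistical way to flag such objects, not prescribed by the slide, is the z-score: the number of standard deviations a value lies from the mean. A minimal Python sketch; the threshold of 2 and the readings are assumptions for illustration.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(readings))  # [95]
```

An extreme value also inflates the mean and standard deviation it is measured against, so in small samples this test can miss outliers; more robust methods exist.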
  • 27. How to Handle Noisy Data? (Not Now)
    • Binning method
    • Clustering
    • Combined computer and human inspection
    • Regression
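The binning method can be sketched as smoothing by bin means: sort the values, split them into equal-frequency bins, and replace each value by the mean of its bin. A minimal Python sketch; the price list and the bin size of 4 are hypothetical.

```python
def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: sort, cut into bins of `bin_size`, replace values by bin means."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34, 9, 26, 29]
print(smooth_by_bin_means(prices, bin_size=4))
# [9.0, 9.0, 9.0, 9.0, 22.75, 22.75, 22.75, 22.75, 29.25, 29.25, 29.25, 29.25]
```

Other variants replace values by the bin median or by the nearest bin boundary instead of the mean.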
  • 28. Data integration and transformation
  • 29. Data Integration
    • Data integration combines data from multiple sources into a coherent store.
    • Three problems are involved in data integration:
      – Schema integration
      – Detecting and resolving data value conflicts
      – Redundant data, which occur often when multiple databases are integrated
  • 30. Data Transformation
    • Smoothing: remove noise from data
    • Aggregation: summarization, data cube construction
    • Generalization: concept hierarchy climbing
    • Normalization: scaled to fall within a small, specified range
      – min-max normalization
      – z-score normalization
      – normalization by decimal scaling
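The three normalization methods listed above can be written from their usual definitions: min-max maps v to (v - min) / (max - min) * (new_max - new_min) + new_min; z-score maps v to (v - mean) / standard deviation; decimal scaling divides by 10^j for the smallest j that brings every |v| below 1. A minimal Python sketch with hypothetical income values; it assumes the values are not all equal.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Centre on the mean and divide by the (population) standard deviation."""
    mean, stdev = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [12000, 73600, 98000]
print([round(v, 3) for v in min_max(incomes)])  # [0.0, 0.716, 1.0]
print([round(v, 2) for v in z_score(incomes)])
print(decimal_scaling(incomes))                 # [0.12, 0.736, 0.98]
```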
  • 32. Data Reduction Strategies
    • A warehouse may store terabytes of data: complex data analysis/mining may take a very long time to run on the complete data set.
    • Data reduction
      – Obtains a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results
    • Data reduction strategies
      – Data cube aggregation (e.g., construction of a data cube)
      – Numerosity reduction (e.g., generating histograms)
      – Concept hierarchy generation
  • 33. Data Cube Aggregation
    • The lowest level of a data cube
      – the aggregated data for an individual entity of interest
      – e.g., a customer in a phone calling data warehouse
    • Multiple levels of aggregation in data cubes
      – further reduce the size of data to deal with
    • Reference appropriate levels
      – use the smallest representation which is enough to solve the task
  • 34. Numerosity reduction – Histograms
    • A popular data reduction technique
    • Divide data into buckets and store the average (or sum) for each bucket
    • Can be constructed optimally in one dimension using dynamic programming
    • Related to quantization problems
    [Figure: example histogram; x-axis values from 10000 to 90000, y-axis from 0 to 40]
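An equal-width histogram is one simple way to choose the buckets; the optimal dynamic-programming construction mentioned on the slide is not shown here. In this minimal Python sketch each bucket keeps only its range, count and average, which is the reduced representation. The values and the bucket count are hypothetical, and the values are assumed not to be all equal.

```python
def equal_width_histogram(values, num_buckets):
    """Summarise values as equal-width buckets: (low, high, count, average) per bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # The maximum value would fall just past the last bucket, so clamp it into the last one.
        index = min(int((v - lo) / width), num_buckets - 1)
        buckets[index].append(v)
    summary = []
    for i, members in enumerate(buckets):
        low = lo + i * width
        avg = sum(members) / len(members) if members else None
        summary.append((low, low + width, len(members), avg))
    return summary


values = [0, 3, 7, 9, 12, 15, 18, 21, 25, 30]
for bucket in equal_width_histogram(values, 3):
    print(bucket)
```

Ten values are reduced to three (range, count, average) summaries; the original values themselves are no longer stored.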
  • 35. Numerosity reduction – Clustering
    • Partition the data set into clusters, and store only the cluster representation
    • Can be very effective if data is clustered, but not if data is “smeared”
    • Can have hierarchical clustering and be stored in multidimensional index tree structures
    • There are many choices of clustering definitions and clustering algorithms, further detailed in Chapter 8
  • 36. Sampling
    • Allows a mining algorithm to run in complexity that is potentially sub-linear in the size of the data
    • Choose a representative subset of the data
      – Simple random sampling may have very poor performance in the presence of skew
    • Develop adaptive sampling methods
      – Stratified sampling:
        • Approximate the percentage of each class (or subpopulation of interest) in the overall database
        • Used in conjunction with skewed data
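Stratified sampling can be sketched in Python by grouping records by class and drawing the same fraction from each group, so a rare class keeps roughly its share of the sample. This is a minimal sketch; the 90/10 class split, the 10% fraction and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (group of records sharing key(record))."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        # Take at least one record per stratum so a rare class is not lost entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical skewed data: 90 "normal" records and 10 "fraud" records.
records = [("normal", i) for i in range(90)] + [("fraud", i) for i in range(10)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(Counter(label for label, _ in sample))  # Counter({'normal': 9, 'fraud': 1})
```

A simple random sample of 10 records could easily contain no fraud records at all, which is the skew problem the slide describes.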
  • 38. Concept hierarchy
    • An arrangement of concepts, such as time or location, into levels
      – Reduces the data by collecting and replacing low-level concepts (such as numeric values for the attribute age) by higher-level concepts (such as young, middle-aged, or senior)
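Replacing numeric ages by the higher-level concepts named above can be sketched as a simple mapping. The cut points (under 30 is young, 60 and over is senior) are hypothetical; the slide does not define them.

```python
def age_to_concept(age, young_below=30, senior_from=60):
    """Replace a numeric age by a higher-level concept (cut points are illustrative only)."""
    if age < young_below:
        return "young"
    if age < senior_from:
        return "middle-aged"
    return "senior"


ages = [23, 35, 41, 67, 29, 72, 55]
print([age_to_concept(a) for a in ages])
# ['young', 'middle-aged', 'middle-aged', 'senior', 'young', 'senior', 'middle-aged']
```

Seven distinct numeric values collapse to three concepts, which is the reduction the slide describes.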