Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S.A.Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Data Mining and Warehousing (CO314)
Unit –I: Introduction to Data Mining
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2
Content
 Kinds of pattern and technologies
 Issues in mining
 OLAP, knowledge representation, Information and Knowledge
Kinds of pattern and technologies
 We have observed various types of data and information repositories on which data mining can
be performed.
 Let us now examine the kinds of patterns that can be mined.
 There are a number of data mining functionalities.
 These include characterization and discrimination; the mining of frequent patterns, associations,
and correlations; classification and regression; clustering analysis; and outlier analysis.
 Data mining functionalities are used to specify the kinds of patterns to be found in data mining
tasks.
Kinds of pattern and technologies
 Pattern mining concentrates on identifying rules that describe specific patterns within the data.
 Market-basket analysis, which identifies items that typically occur together in purchase
transactions, was one of the first applications of data mining.
Kinds of pattern and technologies
 In general, such tasks can be classified into two categories:
 Descriptive:
 Descriptive mining tasks characterize properties of the data in a target data
set.
 Predictive:
 Predictive mining tasks perform induction on the current data in order to
make predictions.
Class/Concept Description
 Data entries can be associated with classes or concepts.
e.g. in the AllElectronics store, classes of items for sale include computers and printers, and
concepts of customers include bigSpenders and budgetSpenders.
 It can be useful to describe individual classes and concepts in summarized, concise, and yet
precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
 These descriptions can be derived using:
(1) data characterization, by summarizing the data of the class under study (often called the target
class) in general terms, or
(2) data discrimination, by comparison of the target class with one or a set of comparative classes
(often called the contrasting classes), or
(3) both data characterization and discrimination.
Data Characterization
 In data characterization, data entries are associated with classes or concepts.
 Data characterization is a summarization of the general characteristics or features of a target class of data.
 The data corresponding to the user-specified class are typically collected by a query.
e.g. to study the characteristics of software products with sales that increased by 10% in the previous year,
the data related to such products can be collected by executing an SQL query on the sales database.
 The data cube-based OLAP roll-up operation can be used to perform user-controlled data summarization
along a specified dimension.
 The output of data characterization can be presented in various forms. Examples include pie charts, bar
charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs.
 The resulting descriptions can also be presented as generalized relations or in rule form (called
characteristic rules).
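As a rough illustration of the steps above, the sketch below collects a target class with an SQL query and then summarizes it along one dimension. The `sales` table, its columns, and the sample rows are hypothetical, and reading "increased by 10%" as "at least 10%" is an assumption made for the example.

```python
import sqlite3

# Hypothetical sales table: one row per software product (assumed schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, category TEXT, growth_pct REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EditPro", "office", 12.0),
        ("CalcMax", "office", 15.0),
        ("PixelPaint", "graphics", 10.0),
        ("OldSuite", "office", -35.0),
    ],
)

# Step 1: collect the target class (sales grew by at least 10%) with a query.
# Step 2: summarize it per category, similar in spirit to a roll-up.
rows = conn.execute(
    "SELECT category, COUNT(*), AVG(growth_pct) "
    "FROM sales WHERE growth_pct >= 10 "
    "GROUP BY category ORDER BY category"
).fetchall()

for category, n_products, avg_growth in rows:
    print(f"{category}: {n_products} product(s), average growth {avg_growth:.1f}%")
```

Each output row is a small generalized description of the target class, which could equally be shown as a bar chart or a crosstab.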
Data Discrimination
 Data discrimination is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.
 The target and contrasting classes can be specified by a user, and the corresponding data objects
can be retrieved through database queries.
e.g. a user may want to compare the general features of software products with sales that
increased by 10% last year against those with sales that decreased by at least 30% during the same
period.
 The methods used for data discrimination are similar to those used for data characterization.
 The forms of output presentation are similar to those for characteristic descriptions, although
discrimination descriptions should include comparative measures that help to distinguish between
the target and contrasting classes.
 Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
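A minimal sketch of data discrimination, under stated assumptions: the product records and the "price band" feature are hypothetical, while the two class definitions follow the example above (growth of at least 10% versus a decline of at least 30%). The same feature is measured in both classes so the result is a comparative measure.

```python
# Hypothetical product records: (name, sales growth in %, price band).
products = [
    ("EditPro", 12.0, "low"),
    ("CalcMax", 15.0, "low"),
    ("PixelPaint", 10.0, "high"),
    ("OldSuite", -35.0, "high"),
    ("SlowDraw", -40.0, "high"),
    ("MidTool", 2.0, "low"),
]

# Target class: growth of at least 10%. Contrasting class: decline of at least 30%.
target = [p for p in products if p[1] >= 10]
contrast = [p for p in products if p[1] <= -30]


def share_low_price(group):
    """Fraction of products in the group that fall in the 'low' price band."""
    return sum(1 for p in group if p[2] == "low") / len(group)


# Comparative measure: the same feature measured in both classes.
target_share = share_low_price(target)
contrast_share = share_low_price(contrast)
print(f"low-priced: target {target_share:.0%}, contrasting {contrast_share:.0%}")
```

A gap between the two shares is the kind of evidence a discriminant rule would summarize.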
Mining Frequent Patterns, Associations, and Correlations
 Frequent patterns, as the name suggests, are patterns that occur frequently in data.
 There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences
(also known as sequential patterns), and frequent substructures.
 A frequent itemset typically refers to a set of items that often appear together in a transactional
data set— e.g. milk and bread, which are frequently bought together in grocery stores by many
customers.
 A frequently occurring subsequence, such as the pattern that customers tend to purchase first a
laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
 A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be
combined with itemsets or subsequences.
 If a substructure occurs frequently, it is called a (frequent) structured pattern.
 Mining frequent patterns leads to the discovery of interesting associations and correlations within
data.
Support and Confidence
 As we know, data mining refers to extracting or mining knowledge from large amounts of data.
 In other words, data mining is the science, art, and technology of exploring large and
complex bodies of data in order to discover useful patterns.
 Support
 In data mining, support refers to the relative frequency of an itemset in a dataset.
e.g. if an itemset occurs in 5% of the transactions in a dataset, it has a support of 5%.
Support is often used as a threshold for identifying frequent itemsets in a dataset,
which can be used to generate association rules.
e.g. if we set the support threshold to 5%, then any itemset that occurs in at least
5% of the transactions in the dataset will be considered a frequent itemset.
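The thresholding idea can be sketched by brute force: count every small itemset and keep those whose relative frequency reaches the threshold. The transactions and the 40% threshold below are made up for illustration, and this is plain enumeration rather than an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each is the set of items in one purchase.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]
min_support = 0.4  # assumed threshold: at least 40% of the transactions

# Count every 1-item and 2-item itemset that occurs in some transaction.
counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

# Keep the itemsets whose relative frequency reaches the threshold.
n = len(transactions)
frequent = {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

for itemset, s in sorted(frequent.items()):
    print(itemset, f"{s:.0%}")
```

Sorting each transaction before calling `combinations` gives every itemset a single canonical tuple, so `("bread", "milk")` and `("milk", "bread")` are never counted separately.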
Support and Confidence
 Support
 The support of an itemset is the number of transactions in which the itemset
appears, divided by the total number of transactions.
e.g. suppose we have a dataset of 1000 transactions, and the itemset {milk, bread}
appears in 100 of those transactions. The support of the itemset {milk, bread} would
be calculated as follows:
Support({milk, bread})
= Number of transactions containing {milk, bread} / Total number of transactions
= 100 / 1000 = 10%
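The same calculation as a small sketch. The function name `support` and the synthetic transaction list are illustrative; the list is built only so that its counts match the example (1000 transactions, 100 of them containing both milk and bread).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Synthetic data matching the example: 100 of 1000 transactions contain milk and bread.
transactions = [{"milk", "bread"}] * 100 + [{"eggs"}] * 900

milk_bread_support = support({"milk", "bread"}, transactions)
print(f"Support = {milk_bread_support:.0%}")
```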
Confidence
 Confidence
 In data mining, confidence is a measure of the reliability of a given
association rule. It is defined as the proportion of cases in which the association
rule holds true: among the transactions that contain the items in the
antecedent (the “if” part of the rule), the percentage that also contain the items
in the consequent (the “then” part of the rule).
 Confidence is a measure of the likelihood that an itemset will appear if another
itemset appears.
Confidence
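 E.g. continuing the example above, suppose {milk} appears in 200 of the 1000 transactions
and {milk, bread} appears in 100 of them.
Confidence("If a customer buys milk, they will also buy bread")
= Number of transactions containing {milk, bread} / Number of transactions containing {milk}
= 100 / 200 = 50%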
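A sketch of the confidence calculation. The helper names and the synthetic transactions are assumptions chosen so the counts match the example: milk appears in 200 transactions, and 100 of those also contain bread.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)


# Synthetic data: 200 transactions contain milk, 100 of them also contain bread.
transactions = [{"milk", "bread"}] * 100 + [{"milk"}] * 100 + [{"eggs"}] * 800

rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(f"Confidence(milk -> bread) = {rule_confidence:.0%}")
```

Confidence is not symmetric: the rule "bread -> milk" divides by the number of transactions containing bread instead, so it can give a different value on the same data.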
Introduction to Data
 We frequently hear the words Data, Information and Knowledge used as if
they are the same thing.
 Data is/are the facts of the World.
 For example, take yourself. You may be 5ft tall, have brown hair and blue
eyes. All of this is “data”. You have brown hair whether this is written
down somewhere or not.
Data
 In many ways, data can be thought of as a description of the World.
We can perceive this data with our senses, and then the brain can
process this.
Information
 Information allows us to expand our knowledge beyond the range of our senses. We
can capture data in information, then move it about so that other people can access it
at different times.
 If I take a picture of you, the photograph is information. But what you look like is data.
Knowledge
 Knowledge is what we know. Think of this
as the map of the World we build inside our
brains.
 Like a physical map, it helps us know
where things are – but it contains more
than that.
 It also contains our beliefs and
expectations. “If I do this, I will probably
get that.”
 Crucially, the brain links all these things
together into a giant network of ideas,
memories, predictions, beliefs, etc.
Data, Information and Knowledge
Online Analytical Processing (OLAP)
 OLAP, or online analytical processing, is technology for performing high-speed complex
queries or multidimensional analysis on large volumes of data in a data
warehouse, data lake or other data repository.
 OLAP is used in business intelligence (BI), decision support, and a variety of business
forecasting and reporting applications.
 The core of most OLAP systems, the OLAP cube, is an array-based multidimensional
database that makes it possible to process and analyze multiple data dimensions much
more quickly and efficiently than a traditional relational database.
 In theory, a cube can contain an infinite number of layers. (An OLAP cube representing
more than three dimensions is sometimes called a hypercube.) And smaller cubes can
exist within layers—for example, each store layer could contain cubes arranging sales
by salesperson and product. In practice, data analysts will create OLAP cubes
containing just the layers they need, for optimal analysis and performance.
Online Analytical Processing (OLAP) cont…
 Drill-down
 The drill-down operation converts less-detailed data into more-detailed data
through one of two methods—moving down in the concept hierarchy or adding
a new dimension to the cube. For example, if you view sales data for an
organization’s calendar or fiscal quarter, you can drill-down to see sales for each
month, moving down in the concept hierarchy of the “time” dimension.
 Roll up
 Roll up is the opposite of the drill-down function—it aggregates data on an OLAP
cube by moving up in the concept hierarchy or by reducing the number of
dimensions. For example, you could move up in the concept hierarchy of the
“location” dimension by viewing each country's data, rather than each city.
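Drill-down and roll-up can be sketched as re-aggregating the same facts at different levels of a concept hierarchy. The fact rows and the `aggregate` helper below are hypothetical; a real OLAP engine would typically precompute or index these aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, country, city, sales amount).
facts = [
    ("Q1", "Jan", "USA", "Boston", 100),
    ("Q1", "Feb", "USA", "Boston", 150),
    ("Q1", "Feb", "Canada", "Toronto", 80),
    ("Q2", "Apr", "USA", "Chicago", 120),
]


def aggregate(rows, key):
    """Sum the sales amount of `rows`, grouped by `key(row)`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Quarter-level view of the time dimension.
by_quarter = aggregate(facts, lambda r: r[0])
# Drill-down: move down the time hierarchy from quarter to month.
by_month = aggregate(facts, lambda r: (r[0], r[1]))
# Roll-up: move up the location hierarchy from city to country.
by_city = aggregate(facts, lambda r: (r[2], r[3]))
by_country = aggregate(facts, lambda r: r[2])

print(by_quarter, by_month, by_country, sep="\n")
```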
Online Analytical Processing (OLAP) cont…
 Slice and dice
 The slice operation creates a sub-cube by selecting a single value on one dimension of the
main OLAP cube. For example, you can perform a slice by highlighting all data for
the organization's first fiscal or calendar quarter (time dimension).
 The dice operation isolates a sub-cube by selecting values on several dimensions within
the main OLAP cube. For example, you could perform a dice operation by
highlighting all data by an organization’s calendar or fiscal quarters (time
dimension) and within the U.S. and Canada (location dimension).
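One way to picture slice and dice is as filters over cube cells. In this sketch the cube is a hypothetical dictionary keyed by (quarter, country, product); the keys and values are invented for illustration.

```python
# Hypothetical cube cells keyed by (quarter, country, product) -> sales.
cube = {
    ("Q1", "USA", "laptop"): 100,
    ("Q1", "Canada", "laptop"): 40,
    ("Q1", "India", "camera"): 30,
    ("Q2", "USA", "camera"): 70,
    ("Q2", "Canada", "laptop"): 50,
    ("Q3", "USA", "laptop"): 90,
}

# Slice: restrict one dimension only (time = Q1).
q1_slice = {k: v for k, v in cube.items() if k[0] == "Q1"}

# Dice: restrict several dimensions at once
# (time in {Q1, Q2} and location in {USA, Canada}).
diced = {
    k: v
    for k, v in cube.items()
    if k[0] in {"Q1", "Q2"} and k[1] in {"USA", "Canada"}
}

print(sum(q1_slice.values()), sum(diced.values()))
```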
Online Analytical Processing (OLAP) cont…
 Pivot
 The pivot function rotates the current cube view to display a new representation
of the data—enabling dynamic multidimensional views of data.
 The OLAP pivot function is comparable to the pivot table feature in spreadsheet
software, such as Microsoft Excel. Pivot tables in Excel can be challenging,
whereas OLAP pivots are relatively easy to use (less expertise is required)
and have a faster response time and query performance.
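A pivot can be sketched as swapping the axes of a two-dimensional view. The `view` table and the `pivot` helper are hypothetical; the point is only that the same cells are shown from a rotated perspective, with no values changed.

```python
# Hypothetical 2-D view of a cube: rows are quarters, columns are countries.
view = {
    "Q1": {"USA": 100, "Canada": 40},
    "Q2": {"USA": 70, "Canada": 50},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


# After the pivot, rows are countries and columns are quarters.
pivoted = pivot(view)
print(pivoted)
```

Applying `pivot` twice returns the original view, which is a quick check that no data is lost in the rotation.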
OLAP vs OLTP
Reference
 Han, Jiawei; Kamber, Micheline; and Pei, Jian, “Data Mining: Concepts and
Techniques”, Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
 https://www.ibm.com/topics/olap

More Related Content

Similar to Issues in data mining Patterns Online Analytical Processing

Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research ReportDrMAlagupriyasafiq
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data ProcessingDrMAlagupriyasafiq
 
Applications of machine learning
Applications of machine learningApplications of machine learning
Applications of machine learningbusiness Corporate
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And FootballAmanda Gray
 
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataEvaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataAM Publications
 
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataEvaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataAM Publications
 
Data Mining and Data Warehousing (MAKAUT)
Data Mining and Data Warehousing (MAKAUT)Data Mining and Data Warehousing (MAKAUT)
Data Mining and Data Warehousing (MAKAUT)Bikramjit Sarkar, Ph.D.
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniquesHatem Magdy
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningmaldonadojorge
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7Rohit Mittal
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining TechniquesSanzid Kawsar
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introductionSujaMaryD
 
DataMining Techniq
DataMining TechniqDataMining Techniq
DataMining TechniqRespa Peter
 

Similar to Issues in data mining Patterns Online Analytical Processing (20)

Data Mining
Data MiningData Mining
Data Mining
 
Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research Report
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data Processing
 
Applications of machine learning
Applications of machine learningApplications of machine learning
Applications of machine learning
 
Unit i
Unit iUnit i
Unit i
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And Football
 
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataEvaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
 
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific DataEvaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
Evaluation Mechanism for Similarity-Based Ranked Search Over Scientific Data
 
Data Mining and Data Warehousing (MAKAUT)
Data Mining and Data Warehousing (MAKAUT)Data Mining and Data Warehousing (MAKAUT)
Data Mining and Data Warehousing (MAKAUT)
 
Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7
 
Data .pptx
Data .pptxData .pptx
Data .pptx
 
Data Mining Techniques
Data Mining TechniquesData Mining Techniques
Data Mining Techniques
 
QQ Plot.pptx
QQ Plot.pptxQQ Plot.pptx
QQ Plot.pptx
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
Z36149154
Z36149154Z36149154
Z36149154
 
DataMining Techniq
DataMining TechniqDataMining Techniq
DataMining Techniq
 
Data mining
Data miningData mining
Data mining
 

More from ShivarkarSandip

Cluster Analysis: Measuring Similarity & Dissimilarity
Cluster Analysis: Measuring Similarity & DissimilarityCluster Analysis: Measuring Similarity & Dissimilarity
Cluster Analysis: Measuring Similarity & DissimilarityShivarkarSandip
 
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...ShivarkarSandip
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmShivarkarSandip
 
Data Warehouse and Architecture, OLAP Operation
Data Warehouse and Architecture, OLAP OperationData Warehouse and Architecture, OLAP Operation
Data Warehouse and Architecture, OLAP OperationShivarkarSandip
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningShivarkarSandip
 
Introduction to Data Mining, KDD Process, OLTP and OLAP
Introduction to Data Mining, KDD Process, OLTP and OLAPIntroduction to Data Mining, KDD Process, OLTP and OLAP
Introduction to Data Mining, KDD Process, OLTP and OLAPShivarkarSandip
 
Introduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPIntroduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPShivarkarSandip
 
Introduction to data mining which covers the basics
Introduction to data mining which covers the basicsIntroduction to data mining which covers the basics
Introduction to data mining which covers the basicsShivarkarSandip
 
Introduction to Data Communication.pdf
Introduction to Data Communication.pdfIntroduction to Data Communication.pdf
Introduction to Data Communication.pdfShivarkarSandip
 
Classification of Signal.pdf
Classification of Signal.pdfClassification of Signal.pdf
Classification of Signal.pdfShivarkarSandip
 
Sequential Circuit Design-2.pdf
Sequential Circuit Design-2.pdfSequential Circuit Design-2.pdf
Sequential Circuit Design-2.pdfShivarkarSandip
 
Boolean Algebra Terminologies.pdf
Boolean Algebra Terminologies.pdfBoolean Algebra Terminologies.pdf
Boolean Algebra Terminologies.pdfShivarkarSandip
 
Unit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfUnit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfShivarkarSandip
 
Unit II Decision Making Basics and Concepts.pdf
Unit II Decision Making Basics and Concepts.pdfUnit II Decision Making Basics and Concepts.pdf
Unit II Decision Making Basics and Concepts.pdfShivarkarSandip
 
Unit I Factors Responsible for Successful BI Project.pdf
Unit I Factors Responsible for Successful BI Project.pdfUnit I Factors Responsible for Successful BI Project.pdf
Unit I Factors Responsible for Successful BI Project.pdfShivarkarSandip
 
Unit I Operational data Informational data.pdf
Unit I Operational data  Informational data.pdfUnit I Operational data  Informational data.pdf
Unit I Operational data Informational data.pdfShivarkarSandip
 

More from ShivarkarSandip (20)

Cluster Analysis: Measuring Similarity & Dissimilarity
Cluster Analysis: Measuring Similarity & DissimilarityCluster Analysis: Measuring Similarity & Dissimilarity
Cluster Analysis: Measuring Similarity & Dissimilarity
 
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
 
Data Warehouse and Architecture, OLAP Operation
Data Warehouse and Architecture, OLAP OperationData Warehouse and Architecture, OLAP Operation
Data Warehouse and Architecture, OLAP Operation
 
Data Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data CleaningData Preparation and Preprocessing , Data Cleaning
Data Preparation and Preprocessing , Data Cleaning
 
Introduction to Data Mining, KDD Process, OLTP and OLAP
Introduction to Data Mining, KDD Process, OLTP and OLAPIntroduction to Data Mining, KDD Process, OLTP and OLAP
Introduction to Data Mining, KDD Process, OLTP and OLAP
 
Introduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPIntroduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAP
 
Introduction to data mining which covers the basics
Introduction to data mining which covers the basicsIntroduction to data mining which covers the basics
Introduction to data mining which covers the basics
 
Introduction to Data Communication.pdf
Introduction to Data Communication.pdfIntroduction to Data Communication.pdf
Introduction to Data Communication.pdf
 
Classification of Signal.pdf
Classification of Signal.pdfClassification of Signal.pdf
Classification of Signal.pdf
 
Sequential Circuit Design-2.pdf
Sequential Circuit Design-2.pdfSequential Circuit Design-2.pdf
Sequential Circuit Design-2.pdf
 
Sequential Ckt.pdf
Sequential Ckt.pdfSequential Ckt.pdf
Sequential Ckt.pdf
 
Flip Flop.pdf
Flip Flop.pdfFlip Flop.pdf
Flip Flop.pdf
 
Combinational Ckt.pdf
Combinational Ckt.pdfCombinational Ckt.pdf
Combinational Ckt.pdf
 
Boolean Algebra Terminologies.pdf
Boolean Algebra Terminologies.pdfBoolean Algebra Terminologies.pdf
Boolean Algebra Terminologies.pdf
 
Logic Minimization.pdf
Logic Minimization.pdfLogic Minimization.pdf
Logic Minimization.pdf
 
Unit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdfUnit III Introduction to DWH.pdf
Unit III Introduction to DWH.pdf
 
Unit II Decision Making Basics and Concepts.pdf
Unit II Decision Making Basics and Concepts.pdfUnit II Decision Making Basics and Concepts.pdf
Unit II Decision Making Basics and Concepts.pdf
 
Unit I Factors Responsible for Successful BI Project.pdf
Unit I Factors Responsible for Successful BI Project.pdfUnit I Factors Responsible for Successful BI Project.pdf
Unit I Factors Responsible for Successful BI Project.pdf
 
Unit I Operational data Informational data.pdf
Unit I Operational data  Informational data.pdfUnit I Operational data  Informational data.pdf
Unit I Operational data Informational data.pdf
 

Recently uploaded

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoordharasingh5698
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfrs7054576148
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 

Issues in data mining Patterns Online Analytical Processing

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423 603 (An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune). NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified. Department of Computer Engineering (NBA Accredited). Prof. S.A. Shivarkar, Assistant Professor. Contact No. 8275032712. Email: shivarkarsandipcomp@sanjivani.org.in. Subject: Data Mining and Warehousing (CO314). Unit I: Introduction to Data Mining
  • 2. Content: Kinds of patterns and technologies; Issues in mining; OLAP, knowledge representation, Information and Knowledge
  • 3. Kinds of pattern and technologies: We have observed various types of data and information repositories on which data mining can be performed. Let us now examine the kinds of patterns that can be mined. There are a number of data mining functionalities. These include characterization and discrimination; the mining of frequent patterns, associations, and correlations; classification and regression; clustering analysis; and outlier analysis. Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
  • 4. Kinds of pattern and technologies: Pattern mining concentrates on identifying rules that describe specific patterns within the data. Market-basket analysis, which identifies items that typically occur together in purchase transactions, was one of the first applications of data mining.
  • 5. Kinds of pattern and technologies: In general, such tasks can be classified into two categories. Descriptive: descriptive mining tasks characterize properties of the data in a target data set. Predictive: predictive mining tasks perform induction on the current data in order to make predictions.
  • 6. Class/Concept Description: Data entries can be associated with classes or concepts. For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders. It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived using: (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms; (2) data discrimination, by comparing the target class with one or a set of comparative classes (often called the contrasting classes); or (3) both data characterization and discrimination.
  • 7. Data Characterization: In data characterization, data entries are associated with classes or concepts. Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a query. For example, to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database. The data cube-based OLAP roll-up operation can be used to perform user-controlled data summarization along a specified dimension. The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs. The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).
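The characterization step described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product records, the field names (category, price_band, sales_growth) and the helper characterize are hypothetical, a list filter stands in for the SQL query, and "increased by 10%" is read as growth of at least 10%.

```python
from collections import Counter

# Hypothetical product records; field names and values are invented for illustration.
products = [
    {"name": "A", "category": "office", "price_band": "low", "sales_growth": 0.12},
    {"name": "B", "category": "games", "price_band": "high", "sales_growth": 0.25},
    {"name": "C", "category": "office", "price_band": "low", "sales_growth": -0.05},
    {"name": "D", "category": "office", "price_band": "mid", "sales_growth": 0.10},
]

def characterize(records, in_target, attributes):
    """Summarize each attribute of the target class as value -> share of the class."""
    target = [r for r in records if in_target(r)]  # plays the role of the SQL query
    if not target:
        return {}
    summary = {}
    for attr in attributes:
        counts = Counter(r[attr] for r in target)
        summary[attr] = {value: n / len(target) for value, n in counts.items()}
    return summary

# Assumption: "increased by 10%" is read as growth of at least 10%.
summary = characterize(products, lambda r: r["sales_growth"] >= 0.10,
                       ["category", "price_band"])
for attr, shares in summary.items():
    print(attr, shares)
```

The per-attribute shares are the kind of figures that would feed a pie chart or bar chart of the target class.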
  • 8. Data Discrimination: Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes. The target and contrasting classes can be specified by a user, and the corresponding data objects can be retrieved through database queries. For example, a user may want to compare the general features of software products with sales that increased by 10% last year against those with sales that decreased by at least 30% during the same period. The methods used for data discrimination are similar to those used for data characterization. The forms of output presentation are also similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes. Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
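One simple comparative measure for data discrimination is the share of each attribute value in the target class next to its share in the contrasting class. The sketch below assumes the two classes have already been retrieved by queries; the records and the helper names distribution and discriminate are hypothetical.

```python
from collections import Counter

def distribution(records, attr):
    """Share of each value of `attr` among the given records."""
    counts = Counter(r[attr] for r in records)
    return {value: n / len(records) for value, n in counts.items()}

def discriminate(target, contrast, attr):
    """Map each value to (share in target class, share in contrasting class)."""
    t = distribution(target, attr)
    c = distribution(contrast, attr)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}

# Hypothetical classes: products whose sales rose vs. products whose sales fell.
rising = [{"category": "office"}, {"category": "office"},
          {"category": "games"}, {"category": "office"}]
falling = [{"category": "games"}, {"category": "games"}]

comparison = discriminate(rising, falling, "category")
print(comparison)
```

A value whose two shares differ widely is a candidate for a discriminant rule.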
  • 9. Mining Frequent Patterns, Associations, and Correlations: Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences (also known as sequential patterns), and frequent substructures. A frequent itemset typically refers to a set of items that often appear together in a transactional data set, e.g. milk and bread, which are frequently bought together in grocery stores by many customers. A frequently occurring subsequence, such as the pattern that customers tend to purchase first a laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern. A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern. Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
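To make the notion of a sequential pattern concrete, the following sketch checks whether a purchase history contains the items of a pattern in the same order (other purchases may occur in between) and reports the fraction of histories that do. The purchase histories and function names are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)

def sequence_support(sequences, pattern):
    """Fraction of sequences that contain the pattern as a subsequence."""
    hits = sum(1 for s in sequences if contains_subsequence(s, pattern))
    return hits / len(sequences)

# Hypothetical purchase histories, one list per customer, in time order.
histories = [
    ["laptop", "mouse", "digital camera", "memory card"],
    ["digital camera", "laptop", "memory card"],
    ["laptop", "digital camera", "memory card"],
    ["laptop", "memory card"],
]

pattern = ["laptop", "digital camera", "memory card"]
share = sequence_support(histories, pattern)
print(share)
```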
  • 10. Support and Confidence: As we know, data mining refers to extracting or mining knowledge from large amounts of data. In other words, data mining is the science, art, and technology of exploring large and complex bodies of data in order to discover useful patterns. Support: In data mining, support refers to the relative frequency of an itemset in a dataset. For example, if an itemset occurs in 5% of the transactions in a dataset, it has a support of 5%. Support is often used as a threshold for identifying frequent itemsets in a dataset, which can be used to generate association rules. For example, if we set the support threshold to 5%, then any itemset that occurs in at least 5% of the transactions in the dataset will be considered a frequent itemset.
  • 11. Support and Confidence: Support: The support of an itemset is the number of transactions in which the itemset appears, divided by the total number of transactions. For example, suppose we have a dataset of 1000 transactions, and the itemset {milk, bread} appears in 100 of those transactions. The support of the itemset {milk, bread} would be calculated as follows: Support({milk, bread}) = Number of transactions containing {milk, bread} / Total number of transactions = 100 / 1000 = 10%
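The support formula above, together with the idea of a support threshold, can be sketched as follows. The toy transactions are hypothetical, the enumeration is brute force (not an optimized algorithm such as Apriori), and the threshold is applied as "at least min_support", which is one common convention.

```python
from itertools import combinations

# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration of itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

milk_bread = support({"milk", "bread"}, transactions)  # 3 of 5 transactions
frequent = frequent_itemsets(transactions, min_support=0.6)
print(milk_bread)
print(frequent)
```

Brute force is fine for a handful of items; the number of candidate itemsets grows quickly with the number of items, which is why practical miners prune candidates.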
  • 12. Confidence: In data mining, confidence is a measure of the reliability of a given association rule. It is defined as the proportion of cases in which the association rule holds true: of the transactions that contain the items in the antecedent (the “if” part of the rule), the percentage that also contain the items in the consequent (the “then” part of the rule). Confidence is thus a measure of the likelihood that an itemset will appear if another itemset appears.
  • 13. Confidence: For example, if {milk} appears in 200 transactions and {milk, bread} appears in 100 of them: Confidence("If a customer buys milk, they will also buy bread") = Number of transactions containing {milk, bread} / Number of transactions containing {milk} = 100 / 200 = 50%
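The confidence calculation can be reproduced with a small sketch. The synthetic dataset below is built only to match the counts quoted on the slides (1000 transactions, 200 containing milk, 100 containing both milk and bread), assuming those slides describe the same dataset; the filler item "tea" and the function name are hypothetical.

```python
# Synthetic transactions matching the slide counts (assumed to describe one dataset).
transactions = (
    [{"milk", "bread"}] * 100   # milk and bread together
    + [{"milk"}] * 100          # milk without bread
    + [{"tea"}] * 800           # hypothetical filler transactions
)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return n_both / n_antecedent

rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(rule_confidence)
```

Note that confidence is not symmetric: in this dataset every bread transaction also contains milk, so the reverse rule (bread implies milk) has a different confidence.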
  • 14. Introduction to Data: We frequently hear the words Data, Information and Knowledge used as if they are the same thing. Data is/are the facts of the World. For example, take yourself. You may be 5 ft tall, have brown hair and blue eyes. All of this is “data”. You have brown hair whether this is written down somewhere or not.
  • 15. Data: In many ways, data can be thought of as a description of the World. We can perceive this data with our senses, and then the brain can process this.
  • 16. Information: Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times. If I take a picture of you, the photograph is information. But what you look like is data.
  • 17. Knowledge: Knowledge is what we know. Think of this as the map of the World we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that. It also contains our beliefs and expectations. “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.
  • 19. Online Analytical Processing (OLAP): OLAP, or online analytical processing, is technology for performing high-speed complex queries or multidimensional analysis on large volumes of data in a data warehouse, data lake or other data repository. OLAP is used in business intelligence (BI), decision support, and a variety of business forecasting and reporting applications. The core of most OLAP systems, the OLAP cube, is an array-based multidimensional database that makes it possible to process and analyze multiple data dimensions much more quickly and efficiently than a traditional relational database. In theory, a cube can contain an infinite number of layers. (An OLAP cube representing more than three dimensions is sometimes called a hypercube.) Smaller cubes can also exist within layers: for example, each store layer could contain cubes arranging sales by salesperson and product. In practice, data analysts will create OLAP cubes containing just the layers they need, for optimal analysis and performance.
  • 20. Online Analytical Processing (OLAP) cont…: Drill-down: The drill-down operation converts less-detailed data into more-detailed data through one of two methods: moving down in the concept hierarchy or adding a new dimension to the cube. For example, if you view sales data for an organization’s calendar or fiscal quarter, you can drill down to see sales for each month, moving down in the concept hierarchy of the “time” dimension. Roll-up: Roll-up is the opposite of the drill-down function. It aggregates data on an OLAP cube by moving up in the concept hierarchy or by reducing the number of dimensions. For example, you could move up in the concept hierarchy of the “location” dimension by viewing each country's data, rather than each city's.
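Roll-up and drill-down can both be viewed as re-aggregating the same facts at a different level of detail. The sketch below is an assumption-laden illustration, not how an OLAP engine is implemented: the fact rows, the column layout and the helper aggregate are hypothetical, and the cube is just a dictionary of totals.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, quarter, month, sales).
facts = [
    ("USA", "Boston", "Q1", "Jan", 100),
    ("USA", "Boston", "Q1", "Feb", 120),
    ("USA", "Chicago", "Q1", "Jan", 80),
    ("Canada", "Toronto", "Q1", "Mar", 90),
    ("Canada", "Toronto", "Q2", "Apr", 70),
]
COLUMNS = ("country", "city", "quarter", "month")

def aggregate(facts, levels):
    """Total sales grouped by the given hierarchy levels (column names)."""
    positions = [COLUMNS.index(level) for level in levels]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # last field is the sales measure
    return dict(totals)

by_city_quarter = aggregate(facts, ("city", "quarter"))
# Roll-up on location: city -> country.
by_country_quarter = aggregate(facts, ("country", "quarter"))
# Roll-up by dropping the location dimension entirely.
by_quarter = aggregate(facts, ("quarter",))
# Drill-down on time: quarter -> month.
by_city_month = aggregate(facts, ("city", "quarter", "month"))

print(by_country_quarter)
print(by_city_month)
```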
  • 21. Online Analytical Processing (OLAP) cont…: Slice and dice: The slice operation creates a sub-cube by selecting a single value along one dimension of the main OLAP cube. For example, you can perform a slice by highlighting all data for the organization's first fiscal or calendar quarter (time dimension). The dice operation isolates a sub-cube by selecting values along several dimensions within the main OLAP cube. For example, you could perform a dice operation by highlighting all data by an organization’s calendar or fiscal quarters (time dimension) and within the U.S. and Canada (location dimension).
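Slice and dice can be sketched as filters over the cells of a cube. Here the cube is assumed to be a plain dictionary keyed by (time, location, product) coordinates; the cell values and helper names are hypothetical. For simplicity the slice keeps the fixed dimension in the cell keys, whereas a stricter definition would drop it from the result.

```python
DIMENSIONS = ("time", "location", "product")

# Hypothetical cube cells: (time, location, product) -> sales.
cube = {
    ("Q1", "USA", "laptop"): 100,
    ("Q1", "Canada", "laptop"): 40,
    ("Q1", "India", "printer"): 30,
    ("Q2", "USA", "printer"): 60,
    ("Q3", "Canada", "laptop"): 50,
}

def dice(cube, selection):
    """Keep cells whose coordinate on every selected dimension is an allowed value."""
    checks = [(DIMENSIONS.index(dim), set(values)) for dim, values in selection.items()]
    return {
        cell: value
        for cell, value in cube.items()
        if all(cell[i] in allowed for i, allowed in checks)
    }

def slice_cube(cube, dimension, value):
    """A slice is a dice that fixes a single value on one dimension."""
    return dice(cube, {dimension: [value]})

first_quarter = slice_cube(cube, "time", "Q1")
north_america_h1 = dice(cube, {"time": ["Q1", "Q2"], "location": ["USA", "Canada"]})

print(first_quarter)
print(north_america_h1)
```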
  • 22. Online Analytical Processing (OLAP) cont…: Pivot: The pivot function rotates the current cube view to display a new representation of the data, enabling dynamic multidimensional views of data. The OLAP pivot function is comparable to the pivot table feature in spreadsheet software such as Microsoft Excel. However, while pivot tables in Excel can be challenging, OLAP pivots are relatively easier to use (less expertise is required) and have a faster response time and query performance.
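For a two-dimensional view, a pivot amounts to swapping the row and column axes. The sketch below assumes the view is a nested dictionary view[row][column]; the sales figures and the function name pivot are hypothetical.

```python
def pivot(view):
    """Swap the row and column axes of a nested dict view[row][column] -> value."""
    rotated = {}
    for row, columns in view.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

# Hypothetical view: rows are quarters, columns are countries.
sales_by_quarter = {
    "Q1": {"USA": 100, "Canada": 40},
    "Q2": {"USA": 60},
}

sales_by_country = pivot(sales_by_quarter)
print(sales_by_country)
```

Applying pivot twice returns the original view, which is a quick sanity check that no cell is lost in the rotation.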
  • 24. Reference: Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and Techniques”, Elsevier Publishers, ISBN: 9780123814791, 9780123814807. https://www.ibm.com/topics/olap