Sanjivani Rural Education Society's
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC 'A' Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S.A.Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Data Mining and Warehousing (CO314)
Unit I: Introduction to Data Mining
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2
Content
• Kinds of patterns and technologies
• Issues in mining
• OLAP, knowledge representation, information and knowledge
Kinds of pattern and technologies
• We have observed various types of data and information repositories on which data mining can be performed.
• Let us now examine the kinds of patterns that can be mined.
• There are a number of data mining functionalities.
• These include characterization and discrimination; the mining of frequent patterns, associations, and correlations; classification and regression; clustering analysis; and outlier analysis.
• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
Kinds of pattern and technologies
• Pattern mining concentrates on identifying rules that describe specific patterns within the data.
• Market-basket analysis, which identifies items that typically occur together in purchase transactions, was one of the first applications of data mining.
Kinds of pattern and technologies
• In general, such tasks can be classified into two categories:
• Descriptive:
  ◦ Descriptive mining tasks characterize properties of the data in a target data set.
• Predictive:
  ◦ Predictive mining tasks perform induction on the current data in order to make predictions.
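The two categories can be contrasted with a minimal sketch. The monthly sales figures and variable names below are made up for illustration and do not come from the slides. The descriptive step only summarizes the data at hand; the predictive step induces a simple model (here a least-squares straight line, chosen as an assumption) and uses it to estimate an unseen month.

```python
from statistics import mean

# Hypothetical monthly sales (units sold) for months 1..5; not from the slides.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

# Descriptive task: characterize properties of the data we already have.
summary = {"min": min(sales), "max": max(sales), "mean": mean(sales)}

# Predictive task: induce a model from the current data (a least-squares
# straight line) and use it to predict a month that was not observed.
mx, my = mean(months), mean(sales)
slope = sum((x - mx) * (y - my) for x, y in zip(months, sales)) / sum(
    (x - mx) ** 2 for x in months
)
intercept = my - slope * mx
predicted_month_6 = slope * 6 + intercept

print(summary)
print(predicted_month_6)
```

Any other summary statistic or prediction model could stand in for the ones chosen here; the point is only the difference between describing existing data and inferring beyond it.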
Class/Concept Description
• Data entries can be associated with classes or concepts.
  e.g. in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.
• It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
• These descriptions can be derived using:
  (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms, or
  (2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or
  (3) both data characterization and discrimination.
Data Characterization
• In data characterization, data entries are associated with classes or concepts.
• Data characterization is a summarization of the general characteristics or features of a target class of data.
• The data corresponding to the user-specified class are typically collected by a query.
  e.g. to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database.
• The data cube-based OLAP roll-up operation can be used to perform user-controlled data summarization along a specified dimension.
• The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs.
• The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).
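The characterization steps above can be sketched in a few lines. Everything in this example is assumed for illustration: the records, the field layout, and the reading of "increased by 10%" as "grew by at least 10%". A list comprehension stands in for the SQL query that collects the target class, and a group-by on region stands in for a roll-up along one dimension.

```python
from collections import defaultdict

# Hypothetical sales records: (product, category, region, sales growth in %).
records = [
    ("EditorPro", "software", "Asia", 12.0),
    ("PhotoFix", "software", "Europe", 10.5),
    ("CalcLite", "software", "Asia", 3.0),
    ("LaserJetX", "printer", "Asia", 15.0),
    ("MailBox", "software", "Europe", 18.0),
]

# Step 1: collect the target class, as an SQL WHERE clause would:
# software products whose sales grew by at least 10%.
target = [r for r in records if r[1] == "software" and r[3] >= 10.0]

# Step 2: summarize the target class along the "region" dimension
# (a roll-up from individual products to regions).
by_region = defaultdict(list)
for product, _category, region, growth in target:
    by_region[region].append(growth)

characterization = {
    region: {"products": len(g), "avg_growth": sum(g) / len(g)}
    for region, g in by_region.items()
}
print(characterization)
```

The resulting dictionary plays the role of a small generalized relation: one summarized row per region instead of one row per product.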
Data Discrimination
• Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.
• The target and contrasting classes can be specified by a user, and the corresponding data objects can be retrieved through database queries.
  e.g. a user may want to compare the general features of software products with sales that increased by 10% last year against those with sales that decreased by at least 30% during the same period.
• The methods used for data discrimination are similar to those used for data characterization.
• The forms of output presentation are similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes.
• Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
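A minimal sketch of data discrimination, under assumed data: the product list, the price feature, and the 50-dollar limit are all hypothetical. The target class is products whose sales rose by at least 10%, the contrasting class is products whose sales fell by at least 30%, and the comparative measure is the share of low-priced products in each class.

```python
# Hypothetical products: (name, sales change in %, price in dollars).
products = [
    ("A", 12.0, 30), ("B", 15.0, 45), ("C", 11.0, 120),
    ("D", -35.0, 200), ("E", -40.0, 150), ("F", -31.0, 40),
]

target = [p for p in products if p[1] >= 10.0]        # sales up by >= 10%
contrasting = [p for p in products if p[1] <= -30.0]  # sales down by >= 30%

def share_low_price(group, limit=50):
    """Fraction of products in the group priced below `limit`."""
    return sum(1 for _, _, price in group if price < limit) / len(group)

# Comparative measure: the same feature evaluated on both classes.
comparison = {
    "target": share_low_price(target),
    "contrasting": share_low_price(contrasting),
}
print(comparison)
```

A gap between the two shares is the kind of evidence a discriminant rule would summarize; with real data, several features would be compared in the same way.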
Mining Frequent Patterns, Associations, and Correlations
• Frequent patterns, as the name suggests, are patterns that occur frequently in data.
• There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences (also known as sequential patterns), and frequent substructures.
• A frequent itemset typically refers to a set of items that often appear together in a transactional data set, e.g. milk and bread, which are frequently bought together in grocery stores by many customers.
• A frequently occurring subsequence, such as the pattern that customers tend to purchase first a laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
• A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences.
• If a substructure occurs frequently, it is called a (frequent) structured pattern.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
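To make the idea of a frequent itemset concrete, the sketch below counts how often each pair of items is bought together. The transactions and the `min_count` threshold are invented for illustration, and the brute-force pair counting is a simplification rather than a full frequent-pattern mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each is the set of items in one purchase.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]

# Count every 2-item combination that occurs within a transaction.
# Sorting first gives each pair one canonical (alphabetical) form.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Keep pairs that occur in at least `min_count` transactions.
min_count = 2
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
print(frequent_pairs)
```

Here only (bread, milk) and (bread, butter) pass the threshold, which mirrors the milk-and-bread example above.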
Support and Confidence
• As we know, data mining refers to extracting or mining knowledge from large amounts of data.
• In other words, data mining is the science, art, and technology of exploring large and complex bodies of data in order to discover useful patterns.
• Support
  ◦ In data mining, support refers to the relative frequency of an itemset in a dataset.
    e.g. if an itemset occurs in 5% of the transactions in a dataset, it has a support of 5%.
  ◦ Support is often used as a threshold for identifying frequent itemsets in a dataset, which can be used to generate association rules.
    e.g. if we set the support threshold to 5%, then any itemset that occurs in at least 5% of the transactions in the dataset will be considered a frequent itemset.
Support and Confidence
• Support
  ◦ The support of an itemset is the number of transactions in which the itemset appears, divided by the total number of transactions.
    e.g. suppose we have a dataset of 1000 transactions, and the itemset {milk, bread} appears in 100 of those transactions. The support of the itemset {milk, bread} would be calculated as follows:
    Support({milk, bread})
    = Number of transactions containing {milk, bread} / Total number of transactions
    = 100 / 1000 = 10%
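The support calculation can be reproduced with a short sketch. The `support` helper and the synthetic transaction list are assumptions built only to match the numbers in the example (100 of 1000 transactions contain both milk and bread); comparing against the threshold with `>=` is also an assumption about how the cut-off is applied.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# 1000 hypothetical transactions matching the example's numbers:
# 100 contain both milk and bread, the other 900 contain neither.
transactions = [{"milk", "bread"}] * 100 + [{"eggs"}] * 900

s = support({"milk", "bread"}, transactions)
print(f"{s:.0%}")

# Frequent-itemset check against a 5% minimum support threshold.
min_support = 0.05
is_frequent = s >= min_support
print(is_frequent)
```

The subset test `itemset <= t` is what makes a transaction count: every item of the itemset must be present, regardless of what else was bought.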
Confidence
• Confidence
  ◦ In data mining, confidence is a measure of the reliability of a given association rule. It is defined as the proportion of the transactions containing the antecedent (the "if" part of the rule) that also contain the consequent (the "then" part of the rule).
  ◦ Confidence is thus a measure of the likelihood that an itemset will appear if another itemset appears.
Confidence
• E.g., with {milk} appearing in 200 transactions, 100 of which also contain bread:
  Confidence("If a customer buys milk, they will also buy bread")
  = Number of transactions containing {milk, bread} / Number of transactions containing {milk}
  = 100 / 200 = 50%
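The same calculation as a small sketch. The `confidence` helper and the synthetic transactions are assumptions constructed to match the example: 200 transactions contain milk, and 100 of those also contain bread.

```python
def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)

# Hypothetical data matching the example: 200 transactions contain milk,
# and 100 of those also contain bread.
transactions = (
    [{"milk", "bread"}] * 100 + [{"milk"}] * 100 + [{"eggs"}] * 800
)
c = confidence({"milk"}, {"bread"}, transactions)
print(f"{c:.0%}")
```

Note that the denominator is the number of transactions containing the antecedent, not the total number of transactions; that is the difference between confidence and support.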
Introduction to Data
• We frequently hear the words Data, Information and Knowledge used as if they are the same thing.
• Data is/are the facts of the World.
• For example, take yourself. You may be 5 ft tall, have brown hair and blue eyes. All of this is "data". You have brown hair whether this is written down somewhere or not.
Data
• In many ways, data can be thought of as a description of the World. We can perceive this data with our senses, and then the brain can process this.
Information
• Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times.
• If I take a picture of you, the photograph is information. But what you look like is data.
Knowledge
• Knowledge is what we know. Think of this as the map of the World we build inside our brains.
• Like a physical map, it helps us know where things are, but it contains more than that.
• It also contains our beliefs and expectations. "If I do this, I will probably get that."
• Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.
Data, Information and Knowledge
Online Analytical Processing (OLAP)
• OLAP, or online analytical processing, is technology for performing high-speed complex queries or multidimensional analysis on large volumes of data in a data warehouse, data lake or other data repository.
• OLAP is used in business intelligence (BI), decision support, and a variety of business forecasting and reporting applications.
• The core of most OLAP systems, the OLAP cube, is an array-based multidimensional database that makes it possible to process and analyze multiple data dimensions much more quickly and efficiently than a traditional relational database.
• In theory, a cube can contain an infinite number of layers. (An OLAP cube representing more than three dimensions is sometimes called a hypercube.) Smaller cubes can also exist within layers; for example, each store layer could contain cubes arranging sales by salesperson and product. In practice, data analysts will create OLAP cubes containing just the layers they need, for optimal analysis and performance.
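The "array-based multidimensional" idea can be pictured with a tiny three-dimensional cube. The dimension values, the sales numbers, and the `cell` helper are all hypothetical; a real OLAP engine stores and indexes cubes far more efficiently than nested Python lists.

```python
# Hypothetical dimensions and measure (units sold); not from the slides.
quarters = ["Q1", "Q2"]
cities = ["Pune", "Mumbai"]
items = ["laptop", "camera"]

# cube[q][c][i] holds the sales measure for one cell of the cube.
cube = [
    [[5, 3], [7, 2]],   # Q1: rows are Pune, Mumbai; columns laptop, camera
    [[6, 4], [8, 1]],   # Q2: rows are Pune, Mumbai; columns laptop, camera
]

def cell(quarter, city, item):
    """Look up one cell by dimension values rather than positions."""
    return cube[quarters.index(quarter)][cities.index(city)][items.index(item)]

print(cell("Q2", "Mumbai", "laptop"))
```

Each dimension (time, location, product) is one axis of the array, so reaching any cell is a direct index lookup rather than a scan over rows.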
Online Analytical Processing (OLAP) cont.
• Drill-down
  ◦ The drill-down operation converts less-detailed data into more-detailed data through one of two methods: moving down in the concept hierarchy or adding a new dimension to the cube. For example, if you view sales data for an organization's calendar or fiscal quarter, you can drill down to see sales for each month, moving down in the concept hierarchy of the "time" dimension.
• Roll up
  ◦ Roll up is the opposite of the drill-down function: it aggregates data on an OLAP cube by moving up in the concept hierarchy or by reducing the number of dimensions. For example, you could move up in the concept hierarchy of the "location" dimension by viewing each country's data, rather than each city.
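A minimal sketch of roll-up and drill-down over two concept hierarchies. The month-to-quarter and city-to-country mappings, the sales figures, and the `roll_up` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchies (assumed for illustration).
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
city_to_country = {"Pune": "India", "Mumbai": "India", "Toronto": "Canada"}

# Most detailed level: sales by (month, city).
sales = {
    ("Jan", "Pune"): 10, ("Feb", "Pune"): 20, ("Mar", "Mumbai"): 5,
    ("Apr", "Toronto"): 7, ("Jan", "Toronto"): 3,
}

def roll_up(facts, time_map=None, location_map=None):
    """Aggregate facts after mapping each key one level up its hierarchy."""
    out = defaultdict(int)
    for (month, city), amount in facts.items():
        t = time_map[month] if time_map else month
        loc = location_map[city] if location_map else city
        out[(t, loc)] += amount
    return dict(out)

# Roll up the time dimension only, then both dimensions.
by_quarter_city = roll_up(sales, time_map=month_to_quarter)
by_quarter_country = roll_up(sales, month_to_quarter, city_to_country)

# Drill-down is the reverse view: from a quarter back to its months.
q1_months = {k: v for k, v in sales.items() if month_to_quarter[k[0]] == "Q1"}
print(by_quarter_country)
```

Rolling up loses detail by summing, so drill-down is only possible when the finer-grained data (here `sales`) is still available.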
Online Analytical Processing (OLAP) cont.
• Slice and dice
  ◦ The slice operation creates a sub-cube by fixing a single value on one dimension of the main OLAP cube. For example, you can perform a slice by highlighting all data for the organization's first fiscal or calendar quarter (time dimension).
  ◦ The dice operation isolates a sub-cube by selecting values on several dimensions within the main OLAP cube. For example, you could perform a dice operation by highlighting all data by an organization's calendar or fiscal quarters (time dimension) and within the U.S. and Canada (location dimension).
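Slice and dice can be sketched as two filters over a cube stored as a dictionary. The cube contents and the helper names `slice_cube` and `dice_cube` are hypothetical.

```python
# Hypothetical cube stored as {(quarter, country, product): sales}.
cube = {
    ("Q1", "U.S.", "laptop"): 9, ("Q1", "Canada", "laptop"): 4,
    ("Q1", "India", "camera"): 6, ("Q2", "U.S.", "camera"): 8,
    ("Q3", "Canada", "laptop"): 2, ("Q2", "India", "laptop"): 5,
}

def slice_cube(cube, quarter):
    """Slice: fix one value on the time dimension."""
    return {k: v for k, v in cube.items() if k[0] == quarter}

def dice_cube(cube, quarters, countries):
    """Dice: restrict two dimensions to chosen sets of values."""
    return {
        k: v for k, v in cube.items()
        if k[0] in quarters and k[1] in countries
    }

q1 = slice_cube(cube, "Q1")
north_america = dice_cube(cube, {"Q1", "Q2"}, {"U.S.", "Canada"})
print(len(q1), len(north_america))
```

The slice keeps every country and product for one quarter, while the dice keeps only the cells that satisfy the conditions on both the time and the location dimension.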
Online Analytical Processing (OLAP) cont.
• Pivot
  ◦ The pivot function rotates the current cube view to display a new representation of the data, enabling dynamic multidimensional views of data.
  ◦ The OLAP pivot function is comparable to the pivot table feature in spreadsheet software, such as Microsoft Excel, but while pivot tables in Excel can be challenging, OLAP pivots are relatively easier to use (less expertise is required) and have a faster response time and query performance.
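For a two-dimensional view, a pivot amounts to swapping the row and column axes. The labels, the numbers, and the `pivot` helper below are hypothetical, and this sketch covers only the 2-D case, not rotation of a full cube.

```python
# Hypothetical 2-D view of a cube: rows are quarters, columns are countries.
row_labels = ["Q1", "Q2"]
col_labels = ["U.S.", "Canada", "India"]
view = [
    [9, 4, 6],
    [8, 0, 5],
]

def pivot(rows, cols, table):
    """Swap the row and column axes of a 2-D view (a transpose)."""
    rotated = [list(column) for column in zip(*table)]
    return cols, rows, rotated

new_rows, new_cols, rotated = pivot(row_labels, col_labels, view)
print(new_rows)
print(rotated)
```

After the pivot, countries label the rows and quarters label the columns; the underlying numbers are unchanged, only the presentation rotates.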
OLAP vs OLTP
Reference
• Han, Jiawei; Kamber, Micheline; Pei, Jian. "Data Mining: Concepts and Techniques", Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
• https://www.ibm.com/topics/olap

Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar โ‰ผ๐Ÿ” Delhi door step de...
ย 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
ย 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
ย 

Introduction to Data Mining, KDD Process, OLTP and OLAP

  • 1. Sanjivani Rural Education Society's Sanjivani College of Engineering, Kopargaon-423 603 (An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune). NAAC 'A' Grade Accredited, ISO 9001:2015 Certified. Department of Computer Engineering (NBA Accredited). Prof. S.A. Shivarkar, Assistant Professor. Contact No. 8275032712. Email: shivarkarsandipcomp@sanjivani.org.in. Subject: Data Mining and Warehousing (CO314). Unit I: Introduction to Data Mining
  • 2. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon. Content
      • Kinds of pattern and technologies
      • Issues in mining
      • OLAP, knowledge representation, information and knowledge
  • 3. Kinds of pattern and technologies
      • We have observed various types of data and information repositories on which data mining can be performed.
      • Let us now examine the kinds of patterns that can be mined.
      • There are a number of data mining functionalities.
      • These include characterization and discrimination; the mining of frequent patterns, associations, and correlations; classification and regression; clustering analysis; and outlier analysis.
      • Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
  • 4. Kinds of pattern and technologies
      • Pattern mining concentrates on identifying rules that describe specific patterns within the data.
      • Market-basket analysis, which identifies items that typically occur together in purchase transactions, was one of the first applications of data mining.
  • 5. Kinds of pattern and technologies
      • In general, such tasks can be classified into two categories:
      • Descriptive: descriptive mining tasks characterize properties of the data in a target data set.
      • Predictive: predictive mining tasks perform induction on the current data in order to make predictions.
  • 6. Class/Concept Description
      • Data entries can be associated with classes or concepts. For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.
      • It can be useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions.
      • These descriptions can be derived using: (1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms; (2) data discrimination, by comparing the target class with one or a set of comparative classes (often called the contrasting classes); or (3) both data characterization and discrimination.
  • 7. Data Characterization
      • Data characterization is a summarization of the general characteristics or features of a target class of data.
      • The data corresponding to the user-specified class are typically collected by a query. For example, to study the characteristics of software products with sales that increased by 10% in the previous year, the data related to such products can be collected by executing an SQL query on the sales database.
      • The data cube-based OLAP roll-up operation can be used to perform user-controlled data summarization along a specified dimension.
      • The output of data characterization can be presented in various forms. Examples include pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs.
      • The resulting descriptions can also be presented as generalized relations or in rule form (called characteristic rules).
  • 8. Data Discrimination
      • Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.
      • The target and contrasting classes can be specified by a user, and the corresponding data objects can be retrieved through database queries. For example, a user may want to compare the general features of software products with sales that increased by 10% last year against those with sales that decreased by at least 30% during the same period.
      • The methods used for data discrimination are similar to those used for data characterization.
      • The forms of output presentation are similar to those for characteristic descriptions, although discrimination descriptions should include comparative measures that help to distinguish between the target and contrasting classes.
      • Discrimination descriptions expressed in the form of rules are referred to as discriminant rules.
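Data characterization and data discrimination can be sketched in a few lines of Python. This is only an illustration: the product records, the `characterize` helper, and the choice of "average price" as the summarized feature are assumptions, not part of the slides. The target class is taken as products whose sales rose by at least 10%, and the contrasting class as products whose sales fell by at least 30%.

```python
from statistics import mean

# Hypothetical product records; names and numbers are illustrative only.
products = [
    {"name": "A", "price": 40.0, "sales_change_pct": 12.0},
    {"name": "B", "price": 60.0, "sales_change_pct": 15.0},
    {"name": "C", "price": 200.0, "sales_change_pct": -35.0},
    {"name": "D", "price": 300.0, "sales_change_pct": -40.0},
    {"name": "E", "price": 100.0, "sales_change_pct": 2.0},
]


def characterize(rows):
    """Summarize a class in general terms: its size and average price."""
    return {"count": len(rows), "avg_price": mean(r["price"] for r in rows)}


# Target class: sales increased by at least 10% (assumed reading of the slide).
target = [r for r in products if r["sales_change_pct"] >= 10]
# Contrasting class: sales decreased by at least 30%.
contrast = [r for r in products if r["sales_change_pct"] <= -30]

# Data characterization: one summary per class.
target_summary = characterize(target)
contrast_summary = characterize(contrast)

# Data discrimination: a comparative measure between the two classes.
price_gap = contrast_summary["avg_price"] - target_summary["avg_price"]

print(target_summary, contrast_summary, price_gap)
```

In a real setting the two classes would be retrieved by database queries, as the slides describe; the list comprehensions stand in for those queries.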
  • 9. Mining Frequent Patterns, Associations, and Correlations
      • Frequent patterns, as the name suggests, are patterns that occur frequently in data.
      • There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences (also known as sequential patterns), and frequent substructures.
      • A frequent itemset typically refers to a set of items that often appear together in a transactional data set, for example milk and bread, which are frequently bought together in grocery stores by many customers.
      • A frequently occurring subsequence, such as the pattern that customers tend to purchase first a laptop, followed by a digital camera, and then a memory card, is a (frequent) sequential pattern.
      • A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences.
      • If a substructure occurs frequently, it is called a (frequent) structured pattern.
      • Mining frequent patterns leads to the discovery of interesting associations and correlations within data.
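To make the idea of a frequent itemset concrete, the sketch below counts how often each pair of items is bought together and keeps the pairs that reach a minimum count. The transactions and the `min_count` cut-off are made up for illustration, and the brute-force pair counting is not an optimized algorithm such as Apriori; it only shows what "items that often appear together" means.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery transactions, one set of items per purchase.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"eggs"},
]
min_count = 2  # assumed cut-off: a pair must occur in at least 2 transactions

pair_counts = Counter()
for items in transactions:
    # sorted() gives each pair one canonical order, e.g. ("bread", "milk").
    pair_counts.update(combinations(sorted(items), 2))

# Keep only the pairs that occur often enough.
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}

print(frequent_pairs)
```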
  • 10. Support and Confidence
      • As we know, data mining refers to extracting or mining knowledge from large amounts of data.
      • In other words, data mining is the science, art, and technology of exploring large and complex bodies of data in order to discover useful patterns.
      • Support: in data mining, support refers to the relative frequency of an itemset in a dataset. For example, if an itemset occurs in 5% of the transactions in a dataset, it has a support of 5%.
      • Support is often used as a threshold for identifying frequent itemsets in a dataset, which can be used to generate association rules. For example, if we set the support threshold to 5%, then any itemset that occurs in at least 5% of the transactions in the dataset is considered a frequent itemset.
  • 11. Support and Confidence
      • Support: the support of an itemset is the number of transactions in which the itemset appears, divided by the total number of transactions.
      • For example, suppose we have a dataset of 1000 transactions, and the itemset {milk, bread} appears in 100 of those transactions. The support of the itemset {milk, bread} is calculated as follows:
        Support({milk, bread}) = Number of transactions containing {milk, bread} / Total number of transactions = 100 / 1000 = 10%
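The support calculation can be checked with a short Python sketch. The `support` function and the synthetic dataset are illustrative assumptions: the dataset is built to match the example's 1000 transactions with {milk, bread} in 100 of them, and the remaining transactions (100 milk-only, 800 eggs-only) are arbitrary filler.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Synthetic data: 1000 transactions, 100 of which contain both milk and bread.
transactions = (
    [{"milk", "bread"}] * 100
    + [{"milk"}] * 100   # filler, assumed
    + [{"eggs"}] * 800   # filler, assumed
)

s = support({"milk", "bread"}, transactions)
is_frequent = s >= 0.05  # compare against a 5% support threshold

print(f"support = {s:.0%}, frequent = {is_frequent}")
```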
  • 12. Confidence
      • In data mining, confidence is a measure of the reliability of a given association rule. It is defined as the proportion of cases in which the rule holds true: of the transactions that contain the items in the antecedent (the "if" part of the rule), the percentage that also contain the items in the consequent (the "then" part of the rule).
      • Confidence is a measure of the likelihood that an itemset will appear if another itemset appears.
  • 13. Confidence
      • For example:
        Confidence("If a customer buys milk, they will also buy bread") = Number of transactions containing {milk, bread} / Number of transactions containing {milk} = 100 / 200 = 50%
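The confidence of the rule "milk implies bread" can be reproduced the same way. In this sketch the helper names `count_containing` and `confidence` are hypothetical, and the dataset is synthetic: it is constructed so that 200 transactions contain milk and 100 of those also contain bread, matching the numbers in the example.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = count_containing(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    n_both = count_containing(antecedent | consequent, transactions)
    return n_both / n_antecedent


# Synthetic data: 200 transactions with milk, 100 of them also with bread.
transactions = (
    [{"milk", "bread"}] * 100
    + [{"milk"}] * 100
    + [{"eggs"}] * 800   # filler, assumed
)

c = confidence({"milk"}, {"bread"}, transactions)
print(f"confidence(milk -> bread) = {c:.0%}")
```

Note that confidence is not symmetric: with this data every transaction containing bread also contains milk, so the reverse rule "bread implies milk" has a confidence of 100%.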
  • 14. Introduction to Data
      • We frequently hear the words data, information, and knowledge used as if they are the same thing.
      • Data are the facts of the world.
      • For example, take yourself. You may be 5 ft tall, have brown hair and blue eyes. All of this is "data". You have brown hair whether this is written down somewhere or not.
  • 15. Data
      • In many ways, data can be thought of as a description of the world. We can perceive this data with our senses, and then the brain can process it.
  • 16. Information
      • Information allows us to expand our knowledge beyond the range of our senses. We can capture data in information, then move it about so that other people can access it at different times.
      • If I take a picture of you, the photograph is information. But what you look like is data.
  • 17. Knowledge
      • Knowledge is what we know. Think of this as the map of the world we build inside our brains.
      • Like a physical map, it helps us know where things are, but it contains more than that.
      • It also contains our beliefs and expectations: "If I do this, I will probably get that."
      • Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc.
  • 19. Online Analytical Processing (OLAP)
      • OLAP, or online analytical processing, is technology for performing high-speed complex queries or multidimensional analysis on large volumes of data in a data warehouse, data lake, or other data repository.
      • OLAP is used in business intelligence (BI), decision support, and a variety of business forecasting and reporting applications.
      • The OLAP cube, the core of most OLAP systems, is an array-based multidimensional database that makes it possible to process and analyze multiple data dimensions much more quickly and efficiently than a traditional relational database.
      • In theory, a cube can contain an infinite number of layers. (An OLAP cube representing more than three dimensions is sometimes called a hypercube.) Smaller cubes can exist within layers; for example, each store layer could contain cubes arranging sales by salesperson and product. In practice, data analysts create OLAP cubes containing just the layers they need, for optimal analysis and performance.
  • 20. Online Analytical Processing (OLAP), continued
      • Drill-down: the drill-down operation converts less-detailed data into more-detailed data through one of two methods: moving down in the concept hierarchy or adding a new dimension to the cube. For example, if you view sales data for an organization's calendar or fiscal quarter, you can drill down to see sales for each month, moving down in the concept hierarchy of the "time" dimension.
      • Roll-up: roll-up is the opposite of the drill-down function. It aggregates data on an OLAP cube by moving up in the concept hierarchy or by reducing the number of dimensions. For example, you could move up in the concept hierarchy of the "location" dimension by viewing each country's data, rather than each city's.
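Drill-down and roll-up can be imitated on a tiny in-memory table by aggregating the same facts at different levels of a concept hierarchy. The fact rows, the `aggregate` helper, and the hierarchy levels (quarter over month, country over city) are assumptions made for illustration; a real OLAP engine would precompute and index such aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, country, city, sales)
facts = [
    ("Q1", "Jan", "US", "Boston", 10),
    ("Q1", "Feb", "US", "Boston", 20),
    ("Q1", "Feb", "US", "Austin", 5),
    ("Q1", "Mar", "Canada", "Toronto", 15),
    ("Q2", "Apr", "Canada", "Toronto", 30),
]


def aggregate(rows, key):
    """Sum the sales measure (last field) over groups defined by `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[-1]
    return dict(totals)


# Time dimension: start at quarter level, then drill down to months.
by_quarter = aggregate(facts, lambda r: r[0])
by_month = aggregate(facts, lambda r: (r[0], r[1]))

# Location dimension: start at city level, then roll up to countries.
by_city = aggregate(facts, lambda r: (r[2], r[3]))
by_country = aggregate(facts, lambda r: r[2])

print(by_quarter, by_month, by_city, by_country, sep="\n")
```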
  • 21. Online Analytical Processing (OLAP), continued
      • Slice: the slice operation creates a sub-cube by selecting a single value on one dimension of the main OLAP cube. For example, you can perform a slice by highlighting all data for the organization's first fiscal or calendar quarter (time dimension).
      • Dice: the dice operation isolates a sub-cube by selecting values on several dimensions within the main OLAP cube. For example, you could perform a dice operation by highlighting all data for selected calendar or fiscal quarters of an organization (time dimension) within the U.S. and Canada (location dimension).
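Slice and dice amount to filtering the cube's cells. The sketch below treats each cell as a dictionary; the data and the helper names `slice_cube` and `dice_cube` are hypothetical. A slice fixes one value on one dimension, while a dice restricts several dimensions at once.

```python
# Hypothetical cube cells, one dictionary per (quarter, country, product) cell.
facts = [
    {"quarter": "Q1", "country": "US", "product": "laptop", "sales": 10},
    {"quarter": "Q1", "country": "Canada", "product": "laptop", "sales": 7},
    {"quarter": "Q1", "country": "Mexico", "product": "camera", "sales": 3},
    {"quarter": "Q2", "country": "US", "product": "camera", "sales": 4},
    {"quarter": "Q3", "country": "Canada", "product": "laptop", "sales": 6},
]


def slice_cube(rows, dimension, value):
    """Slice: keep the cells where one dimension equals a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_cube(rows, selections):
    """Dice: keep the cells whose values fall in the allowed set per dimension."""
    return [
        r for r in rows
        if all(r[dim] in allowed for dim, allowed in selections.items())
    ]


# Slice on the time dimension: first quarter only.
q1 = slice_cube(facts, "quarter", "Q1")

# Dice on time and location: Q1 and Q2, within the U.S. and Canada.
sub_cube = dice_cube(facts, {"quarter": {"Q1", "Q2"}, "country": {"US", "Canada"}})

q1_total = sum(r["sales"] for r in q1)
sub_cube_total = sum(r["sales"] for r in sub_cube)
print(q1_total, sub_cube_total)
```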
  • 22. Online Analytical Processing (OLAP), continued
      • Pivot: the pivot function rotates the current cube view to display a new representation of the data, enabling dynamic multidimensional views of data.
      • The OLAP pivot function is comparable to the pivot table feature in spreadsheet software such as Microsoft Excel. However, while pivot tables in Excel can be challenging, OLAP pivots are easier to use (less expertise is required) and have a faster response time and query performance.
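A pivot can be pictured as swapping which dimension labels the rows and which labels the columns of a two-dimensional view. The following sketch is only an illustration with made-up data and a hypothetical `pivot` helper; it builds the same totals twice, once as quarter by country and once rotated to country by quarter.

```python
# Hypothetical fact rows for a two-dimensional view of the cube.
facts = [
    {"quarter": "Q1", "country": "US", "sales": 10},
    {"quarter": "Q1", "country": "Canada", "sales": 7},
    {"quarter": "Q2", "country": "US", "sales": 4},
    {"quarter": "Q2", "country": "US", "sales": 1},
]


def pivot(rows, row_dim, col_dim, measure="sales"):
    """Build a nested dict: table[row value][column value] = summed measure."""
    table = {}
    for r in rows:
        cells = table.setdefault(r[row_dim], {})
        cells[r[col_dim]] = cells.get(r[col_dim], 0) + r[measure]
    return table


# Original view: quarters as rows, countries as columns.
by_quarter = pivot(facts, "quarter", "country")

# Rotated view: countries as rows, quarters as columns.
by_country = pivot(facts, "country", "quarter")

print(by_quarter)
print(by_country)
```

Both views hold the same totals; only the orientation changes, which is what the rotation in an OLAP pivot does.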
  • 24. Reference
      • Jiawei Han, Micheline Kamber, and Jian Pei, "Data Mining: Concepts and Techniques", Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
      • https://www.ibm.com/topics/olap