SlideShare a Scribd company logo
1 of 38
Download to read offline
C S 5 6 1 - S P R I N G 2 0 1 2
W P I , M O H A M E D E LTA B A K H
OLAP &
DATA MINING
1
Online Analytic Processing
OLAP
2
OLAP
•  OLAP: Online Analytic Processing
•  OLAP queries are complex queries that
•  Touch large amounts of data
•  Discover patterns and trends in the data
•  Typically expensive queries that take long time
•  Also called decision-support queries
•  In contrast to OLAP:
•  OLTP: Online Transaction Processing
•  OLTP queries are simple queries, e.g., over banking or airline
systems
•  OLTP queries touch small amount of data for fast transactions
3
OLTP vs. OLAP
§  On-Line Transaction Processing (OLTP):
–  technology used to perform updates on operational or
transactional systems (e.g., point of sale systems)
§  On-Line Analytical Processing (OLAP):
–  technology used to perform complex analysis of the data
in a data warehouse
4
OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into data
through fast, consistent, interactive access to a wide variety
of possible views of information that has been transformed
from raw data to reflect the dimensionality of the enterprise
as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
OLAP AND DATA WAREHOUSE
5
Query and
Analysis
Component
Data
Integration
Component
Data
Warehouse
Operational
DBs
External
Sources
Internal
Sources
OLAP
Server
Meta
data
OLAP
Reports
Client
Tools
Data
Mining
OLAP AND DATA WAREHOUSE
•  Typically, OLAP queries are executed over a separate copy of
the working data
•  Over data warehouse
•  Data warehouse is periodically updated, e.g., overnight
•  OLAP queries tolerate such out-of-date gaps
•  Why run OLAP queries over data warehouse??
•  Warehouse collects and combines data from multiple sources
•  Warehouse may organize the data in certain formats to support OLAP
queries
•  OLAP queries are complex and touch large amounts of data
•  They may lock the database for long periods of time
•  Negatively affects all other OLTP transactions
6
OLAP ARCHITECTURE
7
EXAMPLE OLAP APPLICATIONS
•  Market Analysis
•  Find which items are frequently sold over the summer but
not over winter?
•  Credit Card Companies
•  Given a new applicant, does (s)he a credit-worthy?
•  Need to check other similar applicants (age, gender,
income, etc…) and observe how they perform, then do
prediction for new applicant
8
OLAP queries are also called “decision-
support” queries
MULTI-DIMENSIONAL VIEW
•  Data is typically viewed as points
in multi-dimensional space
9
10
47
30
12Milk 1%fat
3/1 3/2 3/3 3/4
Milk 2%fat
Orange
juice
bread
Time
Items
NY
MA
CA
Location
Raw data cubes
(raw level without
aggregation)
Typical OLAP applications
have many dimensions
ANOTHER EXAMPLE
10
!"# !$$%&#'()
"#'&#*
APPROACHES FOR OLAP
•  Relational OLAP (ROLAP)
•  Multi-dimensional OLAP (MOLAP)
•  Hybrid OLAP (HOLAP) = ROLAP + MOLAP
11
RELATIONAL OLAP: ROLAP
•  Data are stored in relational model (tables)
•  Special schema called Star Schema
•  One relation is the fact table, all the others are dimension tables
12
Facts
Week
Product
Product
Year
Region
Time
Channel
Revenue
Expenses
Units
Model
Type
Color
Channel
Region
Nation
District
Dealer
Time
Large table Small tables
CUBE vs. STAR SCHEMA
13
Facts
Week
Product
Product
Year
Region
Time
Channel
Revenue
Expenses
Units
Model
Type
Color
Channel
Region
Nation
District
Dealer
Time
Data inside the cube
are the fact records
Dimension tables
describe the dimensions
10
47
30
12Milk 1%fat
3/1 3/2 3/3 3/4
Milk 2%fat
Orange
juice
bread
Time
Items
NY
MA
CA
Location
ROLAP: EXTENSIONS TO DBMS
•  Schema design
•  Specialized scan, indexing and join techniques
•  Handling of aggregate views (querying and materialization)
•  Supporting query language extensions beyond SQL
•  Complex query processing and optimization
•  Data partitioning and parallelism
14
SLICING & DICING
•  Dicing
•  how each dimension in the cube
is divided
•  Different granularities
•  When building the data cube
•  Slicing
•  Selecting slices of the data cube
to answer the OLAP query
•  When answering a query
15
Dicing Time by day
10
47
30
12Milk 1%fat
3/1 3/2 3/3 3/4
Milk 2%fat
Orange
juice
bread
Time
Items
NY
MA
CA
Location
Dicing Location by state
SLICING & DICING: EXAMPLE 1
16
Dicing Slicing
Slicing operation in ROLAP is basically:
-- Selection conditions on some attributes (WHERE clause) +
-- Group by and aggregation
SLICING & DICING: EXAMPLE 2
17
SLICING & DICING: EXAMPLE 3
18
DRILL-DOWN & ROLL-UP
19
Region Sales variance
Africa 105%
Asia 57%
Europe 122%
North America 97%
Pacific 85%
South America 163%
Nation Sales variance
China 123%
Japan 52%
India 87%
Singapore 95%
Drill-down
(Group by Nation)
Roll-up
(group by Region)
ROLAP: DRILL-DOWN & ROLL-UP
20
Drill-down Roll-up
MOLAP
•  Unlike ROLAP, in MOLAP data are stored in special structures called
“Data Cubes” (Array-bases storage)
•  Data cubes pre-compute and aggregate the data
•  Possibly several data cubes with different granularities
•  Data cubes are aggregated materialized views over the data
•  As long as the data does not change frequently, the overhead of
data cubes is manageable
21
Sales 1996
Red
blob
Blue
blob
1997
Every day, every item, every city
Every week, every item
category, every city
MOLAP: CUBE OPERATOR
22
Raw-data (fact table)
Aggregation over the X axis
Aggregation over the Y axis
Aggregation over the Z axis
Aggregation over the X,Y
MOLAP & ROLAP
•  Commercial offerings of both types are available
•  In general, MOLAP is good for smaller warehouses and is
optimized for canned queries
•  In general, ROLAP is more flexible and leverages relational
technology
•  ROLAP May pay a performance penalty to realize flexibility
23
OLTP vs. OLAP
24
•  Clerk, IT Professional
•  Day to day operations
•  Application-oriented (E-R
based)
•  Current, Isolated
•  Detailed, Flat relational
•  Structured, Repetitive
•  Short, Simple transaction
•  Read/write
•  Index/hash on prim. Key
•  Tens
•  Thousands
•  100 MB-GB
•  Trans. throughput
•  Knowledge worker
•  Decision support
•  Subject-oriented (Star, snowflake)
•  Historical, Consolidated
•  Summarized, Multidimensional
•  Ad hoc
•  Complex query
•  Read Mostly
•  Lots of Scans
•  Millions
•  Hundreds
•  100GB-TB
•  Query throughput, response
User
Function
DB Design
Data
View
Usage
Unit of work
Access
Operations
# Records accessed
#Users
Db size
Metric
OLTP OLAP
Source: Datta, GT
OLAP: SUMMARY
•  OLAP stands for Online Analytic Processing and used in
decision support systems
•  Usually runs on data warehouse
•  In contrast to OLTP, OLAP queries are complex, touch large
amounts of data, try to discover patterns or trends in the data
•  OLAP Models
•  Relational (ROLAP): uses relational star schema
•  Multidimensional (MOLAP): uses data cubes
25
Overview on Data Mining
Techniques
26
DATA MINING vs. OLAP
27
•  OLAP - Online
Analytical Processing
–  Provides you with a very
good view of what is
happening, but can not
predict what will happen
in the future or why it is
happening
Data Mining is a combination of discovering
techniques + prediction techniques
DATA MINING TECHNIQUES
•  Clustering
•  Classification
•  Association Rules
•  Frequent Itemsets
•  Outlier Detection
•  ….
28
FREQUENT ITEMSET MINING
•  Very common problem in Market-Basket applications
•  Given a set of items I ={milk, bread, jelly, …}
•  Given a set of transactions where each transaction contains
subset of items
•  t1 = {milk, bread, water}
•  t2 = {milk, nuts, butter, rice}
29
What are the itemsets frequently sold together ??
% of transactions in which the itemset appears >= α
EXAMPLE
30
Assume α = 60%, what are the frequent itemsets
•  {Bread} à 80%
•  {PeanutButter} à 60%
•  {Bread, PeanutButter} à 60%
called “Support”
All frequent itemsets given α = 60%
HOW TO FIND FREQUENT ITEMSETS
•  Naïve Approach
•  Enumerate all possible itemsets and then count each one
31
All possible itemsets of size 1
All possible itemsets of size 2
All possible itemsets of size 3
All possible itemsets of size 4
CAN WE OPTIMIZE??
32
Assume α = 60%, what are the frequent itemsets
•  {Bread} à 80%
•  {PeanutButter} à 60%
•  {Bread, PeanutButter} à 60%
called “Support”
Property
For itemset S={X, Y, Z, …} of size n to be frequent, all its
subsets of size n-1 must be frequent as well
APRIORI ALGORITHM
•  Executes in scans, each scan has two phases
•  Given a list of candidate itemsets of size n, count their appearance
and find frequent ones
•  From the frequent ones generate candidates of size n+1 (previous
property must hold)
•  Start the algorithm where n =1, then repeat
33
Use the property reduce the number of
itemsets to check
APRIORI EXAMPLE
34
APRIORI EXAMPLE (CONT’D)
35
DATA MINING TECHNIQUES
•  Clustering
•  Classification
•  Association Rules
•  Frequent Itemsets
•  Outlier Detection
•  ….
36
ASSOCIATION RULES MINING
•  What is the probability when a customer buys bread in a
transaction, (s)he also buys milk in the same transaction?
37
Bread ----------------------> milk
Implies?
Frequent itemsets cannot answer this question….But Association rules can
General Form
Association rule: x1, x2, …, xn à y1, y2, …ym
Meaning: when the L.H.S appears (or occurs), the R.H.S also appears (or
occurs) with certain probability
Two measures for a given rule:
1- Support(L.H.S U R.H.S) > α
2- Confidence C = Support(L.H.S U R.H.S)/ Support(L.H.S)
EXAMPLE
38
Rule: Bread à PeanutButter
•  Support of rule = support(Bread, PeanutButter) = 60%
•  Confidence of rule = support(Bread, PeanutButter)/support(Bread) = 75%
Rule: Bread, Jelly à PeanutButter
•  Support of rule = support(Bread, Jelly, PeanutButter) = 20%
•  Confidence of rule = support(Bread, Jelly, PeanutButter) /support(Bread, Jelly) = 100%
Usually we search for rules:
Support > α
Confidence > β

More Related Content

What's hot

OLAP in Data Warehouse
OLAP in Data WarehouseOLAP in Data Warehouse
OLAP in Data WarehouseSOMASUNDARAM T
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptxmaha797959
 
Schemas for multidimensional databases
Schemas for multidimensional databasesSchemas for multidimensional databases
Schemas for multidimensional databasesyazad dumasia
 
Data Warehouse Fundamentals
Data Warehouse FundamentalsData Warehouse Fundamentals
Data Warehouse FundamentalsRashmi Bhat
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Multidimensional data models
Multidimensional data  modelsMultidimensional data  models
Multidimensional data models774474
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processingnayakslideshare
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouseKrish_ver2
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data modelmoni sindhu
 

What's hot (20)

OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
 
OLAP in Data Warehouse
OLAP in Data WarehouseOLAP in Data Warehouse
OLAP in Data Warehouse
 
OLTP vs OLAP
OLTP vs OLAPOLTP vs OLAP
OLTP vs OLAP
 
02 data
02 data02 data
02 data
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
 
Schemas for multidimensional databases
Schemas for multidimensional databasesSchemas for multidimensional databases
Schemas for multidimensional databases
 
Data Warehouse Fundamentals
Data Warehouse FundamentalsData Warehouse Fundamentals
Data Warehouse Fundamentals
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
Data mining
Data miningData mining
Data mining
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Multidimensional data models
Multidimensional data  modelsMultidimensional data  models
Multidimensional data models
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
OLAP
OLAPOLAP
OLAP
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processing
 
Programming in R
Programming in RProgramming in R
Programming in R
 
Data integration
Data integrationData integration
Data integration
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
multi dimensional data model
multi dimensional data modelmulti dimensional data model
multi dimensional data model
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 

Similar to OLAP IN DATA MINING

MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB plc
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB plc
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdfINyomanSwitrayana
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data miningSamrat Tayade
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxshumPanwar
 
OLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingOLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingWalid Elbadawy
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
Data warehouse 10 oltp vs datawarehouse
Data warehouse 10 oltp vs datawarehouseData warehouse 10 oltp vs datawarehouse
Data warehouse 10 oltp vs datawarehouseVaibhav Khanna
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptxhqlm1
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 
Online analytical processing (olap) tools
Online analytical processing (olap) toolsOnline analytical processing (olap) tools
Online analytical processing (olap) toolskulkarnivaibhav
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079ibankuk
 

Similar to OLAP IN DATA MINING (20)

OLAP
OLAPOLAP
OLAP
 
Olap queries
Olap queriesOlap queries
Olap queries
 
Lecture2 (1).ppt
Lecture2 (1).pptLecture2 (1).ppt
Lecture2 (1).ppt
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStore
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptx
 
OLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingOLAP OnLine Analytical Processing
OLAP OnLine Analytical Processing
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Data warehouse 10 oltp vs datawarehouse
Data warehouse 10 oltp vs datawarehouseData warehouse 10 oltp vs datawarehouse
Data warehouse 10 oltp vs datawarehouse
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Chap3.pptx
Chap3.pptxChap3.pptx
Chap3.pptx
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Online analytical processing (olap) tools
Online analytical processing (olap) toolsOnline analytical processing (olap) tools
Online analytical processing (olap) tools
 
IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079IBANK - Big data www.ibank.uk.com 07474222079
IBANK - Big data www.ibank.uk.com 07474222079
 

Recently uploaded

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

OLAP IN DATA MINING

  • 1. C S 5 6 1 - S P R I N G 2 0 1 2 W P I , M O H A M E D E LTA B A K H OLAP & DATA MINING 1
  • 3. OLAP •  OLAP: Online Analytic Processing •  OLAP queries are complex queries that •  Touch large amounts of data •  Discover patterns and trends in the data •  Typically expensive queries that take long time •  Also called decision-support queries •  In contrast to OLAP: •  OLTP: Online Transaction Processing •  OLTP queries are simple queries, e.g., over banking or airline systems •  OLTP queries touch small amount of data for fast transactions 3
  • 4. OLTP vs. OLAP §  On-Line Transaction Processing (OLTP): –  technology used to perform updates on operational or transactional systems (e.g., point of sale systems) §  On-Line Analytical Processing (OLAP): –  technology used to perform complex analysis of the data in a data warehouse 4 OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user. [source: OLAP Council: www.olapcouncil.org]
  • 5. OLAP AND DATA WAREHOUSE 5 Query and Analysis Component Data Integration Component Data Warehouse Operational DBs External Sources Internal Sources OLAP Server Meta data OLAP Reports Client Tools Data Mining
  • 6. OLAP AND DATA WAREHOUSE •  Typically, OLAP queries are executed over a separate copy of the working data •  Over data warehouse •  Data warehouse is periodically updated, e.g., overnight •  OLAP queries tolerate such out-of-date gaps •  Why run OLAP queries over data warehouse?? •  Warehouse collects and combines data from multiple sources •  Warehouse may organize the data in certain formats to support OLAP queries •  OLAP queries are complex and touch large amounts of data •  They may lock the database for long periods of time •  Negatively affects all other OLTP transactions 6
  • 8. EXAMPLE OLAP APPLICATIONS •  Market Analysis •  Find which items are frequently sold over the summer but not over winter? •  Credit Card Companies •  Given a new applicant, does (s)he a credit-worthy? •  Need to check other similar applicants (age, gender, income, etc…) and observe how they perform, then do prediction for new applicant 8 OLAP queries are also called “decision- support” queries
  • 9. MULTI-DIMENSIONAL VIEW •  Data is typically viewed as points in multi-dimensional space 9 10 47 30 12Milk 1%fat 3/1 3/2 3/3 3/4 Milk 2%fat Orange juice bread Time Items NY MA CA Location Raw data cubes (raw level without aggregation) Typical OLAP applications have many dimensions
  • 11. APPROACHES FOR OLAP •  Relational OLAP (ROLAP) •  Multi-dimensional OLAP (MOLAP) •  Hybrid OLAP (HOLAP) = ROLAP + MOLAP 11
  • 12. RELATIONAL OLAP: ROLAP •  Data are stored in relational model (tables) •  Special schema called Star Schema •  One relation is the fact table, all the others are dimension tables 12 Facts Week Product Product Year Region Time Channel Revenue Expenses Units Model Type Color Channel Region Nation District Dealer Time Large table Small tables
  • 13. CUBE vs. STAR SCHEMA 13 Facts Week Product Product Year Region Time Channel Revenue Expenses Units Model Type Color Channel Region Nation District Dealer Time Data inside the cube are the fact records Dimension tables describe the dimensions 10 47 30 12Milk 1%fat 3/1 3/2 3/3 3/4 Milk 2%fat Orange juice bread Time Items NY MA CA Location
  • 14. ROLAP: EXTENSIONS TO DBMS •  Schema design •  Specialized scan, indexing and join techniques •  Handling of aggregate views (querying and materialization) •  Supporting query language extensions beyond SQL •  Complex query processing and optimization •  Data partitioning and parallelism 14
  • 15. SLICING & DICING •  Dicing •  how each dimension in the cube is divided •  Different granularities •  When building the data cube •  Slicing •  Selecting slices of the data cube to answer the OLAP query •  When answering a query 15 Dicing Time by day 10 47 30 12Milk 1%fat 3/1 3/2 3/3 3/4 Milk 2%fat Orange juice bread Time Items NY MA CA Location Dicing Location by state
  • 16. SLICING & DICING: EXAMPLE 1 16 Dicing Slicing Slicing operation in ROLAP is basically: -- Selection conditions on some attributes (WHERE clause) + -- Group by and aggregation
  • 17. SLICING & DICING: EXAMPLE 2 17
  • 18. SLICING & DICING: EXAMPLE 3 18
  • 19. DRILL-DOWN & ROLL-UP 19 Region Sales variance Africa 105% Asia 57% Europe 122% North America 97% Pacific 85% South America 163% Nation Sales variance China 123% Japan 52% India 87% Singapore 95% Drill-down (Group by Nation) Roll-up (group by Region)
  • 20. ROLAP: DRILL-DOWN & ROLL-UP 20 Drill-down Roll-up
  • 21. MOLAP •  Unlike ROLAP, in MOLAP data are stored in special structures called “Data Cubes” (Array-bases storage) •  Data cubes pre-compute and aggregate the data •  Possibly several data cubes with different granularities •  Data cubes are aggregated materialized views over the data •  As long as the data does not change frequently, the overhead of data cubes is manageable 21 Sales 1996 Red blob Blue blob 1997 Every day, every item, every city Every week, every item category, every city
  • 22. MOLAP: CUBE OPERATOR 22 Raw-data (fact table) Aggregation over the X axis Aggregation over the Y axis Aggregation over the Z axis Aggregation over the X,Y
  • 23. MOLAP & ROLAP •  Commercial offerings of both types are available •  In general, MOLAP is good for smaller warehouses and is optimized for canned queries •  In general, ROLAP is more flexible and leverages relational technology •  ROLAP May pay a performance penalty to realize flexibility 23
  • 24. OLTP vs. OLAP 24 •  Clerk, IT Professional •  Day to day operations •  Application-oriented (E-R based) •  Current, Isolated •  Detailed, Flat relational •  Structured, Repetitive •  Short, Simple transaction •  Read/write •  Index/hash on prim. Key •  Tens •  Thousands •  100 MB-GB •  Trans. throughput •  Knowledge worker •  Decision support •  Subject-oriented (Star, snowflake) •  Historical, Consolidated •  Summarized, Multidimensional •  Ad hoc •  Complex query •  Read Mostly •  Lots of Scans •  Millions •  Hundreds •  100GB-TB •  Query throughput, response User Function DB Design Data View Usage Unit of work Access Operations # Records accessed #Users Db size Metric OLTP OLAP Source: Datta, GT
  • 25. OLAP: SUMMARY •  OLAP stands for Online Analytic Processing and used in decision support systems •  Usually runs on data warehouse •  In contrast to OLTP, OLAP queries are complex, touch large amounts of data, try to discover patterns or trends in the data •  OLAP Models •  Relational (ROLAP): uses relational star schema •  Multidimensional (MOLAP): uses data cubes 25
  • 26. Overview on Data Mining Techniques 26
  • 27. DATA MINING vs. OLAP 27 •  OLAP - Online Analytical Processing –  Provides you with a very good view of what is happening, but can not predict what will happen in the future or why it is happening Data Mining is a combination of discovering techniques + prediction techniques
  • 28. DATA MINING TECHNIQUES •  Clustering •  Classification •  Association Rules •  Frequent Itemsets •  Outlier Detection •  …. 28
  • 29. FREQUENT ITEMSET MINING •  Very common problem in Market-Basket applications •  Given a set of items I ={milk, bread, jelly, …} •  Given a set of transactions where each transaction contains subset of items •  t1 = {milk, bread, water} •  t2 = {milk, nuts, butter, rice} 29 What are the itemsets frequently sold together ?? % of transactions in which the itemset appears >= α
  • 30. EXAMPLE 30 Assume α = 60%, what are the frequent itemsets •  {Bread} à 80% •  {PeanutButter} à 60% •  {Bread, PeanutButter} à 60% called “Support” All frequent itemsets given α = 60%
  • 31. HOW TO FIND FREQUENT ITEMSETS •  Naïve Approach •  Enumerate all possible itemsets and then count each one 31 All possible itemsets of size 1 All possible itemsets of size 2 All possible itemsets of size 3 All possible itemsets of size 4
  • 32. CAN WE OPTIMIZE?? 32 Assume α = 60%, what are the frequent itemsets •  {Bread} à 80% •  {PeanutButter} à 60% •  {Bread, PeanutButter} à 60% called “Support” Property For itemset S={X, Y, Z, …} of size n to be frequent, all its subsets of size n-1 must be frequent as well
  • 33. APRIORI ALGORITHM •  Executes in scans, each scan has two phases •  Given a list of candidate itemsets of size n, count their appearance and find frequent ones •  From the frequent ones generate candidates of size n+1 (previous property must hold) •  Start the algorithm where n =1, then repeat 33 Use the property reduce the number of itemsets to check
  • 36. DATA MINING TECHNIQUES •  Clustering •  Classification •  Association Rules •  Frequent Itemsets •  Outlier Detection •  …. 36
  • 37. ASSOCIATION RULES MINING •  What is the probability when a customer buys bread in a transaction, (s)he also buys milk in the same transaction? 37 Bread ----------------------> milk Implies? Frequent itemsets cannot answer this question….But Association rules can General Form Association rule: x1, x2, …, xn à y1, y2, …ym Meaning: when the L.H.S appears (or occurs), the R.H.S also appears (or occurs) with certain probability Two measures for a given rule: 1- Support(L.H.S U R.H.S) > α 2- Confidence C = Support(L.H.S U R.H.S)/ Support(L.H.S)
  • 38. EXAMPLE 38 Rule: Bread à PeanutButter •  Support of rule = support(Bread, PeanutButter) = 60% •  Confidence of rule = support(Bread, PeanutButter)/support(Bread) = 75% Rule: Bread, Jelly à PeanutButter •  Support of rule = support(Bread, Jelly, PeanutButter) = 20% •  Confidence of rule = support(Bread, Jelly, PeanutButter) /support(Bread, Jelly) = 100% Usually we search for rules: Support > α Confidence > β