DATA WAREHOUSING AND DATA MINING
Gulab Chand Sharma, SIOM Matrix, Pune
09730495612
Course Overview
0. Introduction
A producer wants to know...
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
What is the most effective distribution channel?
Data, Data everywhere yet ...
What is a Data Warehouse?
What are the users saying...
What is Data Warehousing? (diagram labels: Data, Information)
Evolution
Warehouses are Very Large Databases
(Chart: share of respondents, 0% to 35%, by warehouse size: 5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB; series: Initial and Projected 2Q96. Source: META Group, Inc.)
Very Large Data Bases
Data Warehousing -- It is a process
Data Warehouse
Explorers, Farmers and Tourists
Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
Farmers: Harvest information from known access paths
Tourists: Browse information harvested by farmers
Data Warehouse Architecture
(Diagram labels. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Processing: Extraction, Cleansing, Optimized Loader. Core: Data Warehouse Engine, Metadata Repository. Access: Analyze, Query.)
Data Warehouse for Decision Support & OLAP
Decision Support
Data Mining works with Warehouse Data
We want to know ...
Data Mining helps extract such information
Application Areas
Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value added data
Utilities | Power usage analysis
Data Mining in Use
What makes data mining possible?
Why Separate Data Warehouse?
What are Operational Systems?
RDBMS used for OLTP
Operational Systems
Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium
So, what’s different?
Application-Orientation vs. Subject-Orientation
Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
To summarize ...
Why Now?
Myths surrounding OLAP Servers and Data Marts
Wal*Mart Case Study
Old Retail Paradigm
New (Just-In-Time) Retail Paradigm
Wal*Mart System
Course Overview
I. Data Warehouses: Architecture, Design & Construction
Data Warehouse Architecture
(Diagram labels. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Processing: Extraction, Cleansing, Optimized Loader. Core: Data Warehouse Engine, Metadata Repository. Access: Analyze, Query.)
Components of the Warehouse
Loading the Warehouse: Cleaning the data before it is loaded
Source Data (diagram labels: Sequential, Legacy, Relational, External; Operational/Source Data)
Data Quality - The Reality
Data Quality - The Reality
Data Integration Across Sources
Sources: Trust, Credit card, Savings, Loans
Problems: Same data, different name. Different data, same name. Data found here, nowhere else. Different keys, same data.
Data Transformation Example
field: appl A - balance; appl B - bal; appl C - currbal; appl D - balcurr
unit: appl A - pipeline - cm; appl B - pipeline - in; appl C - pipeline - feet; appl D - pipeline - yds
encoding: appl A - m,f; appl B - 1,0; appl C - x,y; appl D - male, female
All four applications feed one consistent representation in the Data Warehouse.
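The transformation example above can be sketched in a few lines of Python. This is only an illustration: the target conventions (field name `balance`, lengths in cm, gender as `m`/`f`), the function name `to_warehouse`, and the meaning assigned to the codes `1,0` and `x,y` are assumptions, since the slide does not state them.

```python
# Hypothetical target conventions (not stated on the slide):
# field name "balance", pipeline length in cm, gender encoded as "m"/"f".
FIELD_NAMES = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}

# Source units per application: cm, in, feet, yds -> factor to convert to cm.
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}

# Source encodings per application. Which code means "male" for
# applications B and C is an assumption made for this sketch.
GENDER = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"x": "m", "y": "f"},
    "D": {"male": "m", "female": "f"},
}


def to_warehouse(app, record):
    """Map one source record from application `app` to the warehouse layout."""
    return {
        "balance": record[FIELD_NAMES[app]],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "gender": GENDER[app][str(record["gender"]).lower()],
    }


row_b = to_warehouse("B", {"bal": 100, "pipeline": 10, "gender": 1})
row_d = to_warehouse("D", {"balcurr": 50, "pipeline": 2, "gender": "Female"})
```

Keeping the per-application rules in lookup tables means a new source system can be added by extending the tables rather than the code.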
Data Integrity Problems
Data Transformation Terms
Data Transformation Terms
Data Transformation Terms
Data Transformation Terms
Loads
Load Techniques
Load Taxonomy
Refresh
When to Refresh?
Refresh Techniques
How To Detect Changes
Data Extraction and Cleansing
Scrubbing Data
Scrubbing Tools
Structuring/Modeling Issues
Data -- Heart of the Data Warehouse
Data Warehouse Structure
Data Warehouse Structure (note: Time is part of the key of each table)
Data Granularity in Warehouse
Granularity in Warehouse
Granularity in Warehouse
Vertical Partitioning
Original table: Acct. No, Name, Balance, Date Opened, Interest Rate, Address
Frequently accessed: Acct. No, Balance (smaller table and so less I/O)
Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address
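A minimal sketch of the vertical split above, using Python's built-in `sqlite3` module. The table names `account_hot` and `account_cold` and the sample row are hypothetical; only the column split comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Split one wide account table into a narrow, frequently accessed table
# and a wider, rarely accessed table. Both share the key acct_no.
conn.executescript("""
CREATE TABLE account_hot (
    acct_no INTEGER PRIMARY KEY,
    balance REAL
);
CREATE TABLE account_cold (
    acct_no       INTEGER PRIMARY KEY,
    name          TEXT,
    date_opened   TEXT,
    interest_rate REAL,
    address       TEXT
);
""")
conn.execute("INSERT INTO account_hot VALUES (1, 2500.0)")
conn.execute(
    "INSERT INTO account_cold VALUES (1, 'A. Customer', '1997-06-01', 4.5, 'Pune')"
)

# The common query touches only the small table.
balance = conn.execute(
    "SELECT balance FROM account_hot WHERE acct_no = 1"
).fetchone()[0]

# The full row is still available through a join on the shared key.
full_row = conn.execute("""
    SELECT h.acct_no, c.name, h.balance, c.date_opened, c.interest_rate, c.address
    FROM account_hot h
    JOIN account_cold c ON c.acct_no = h.acct_no
    WHERE h.acct_no = 1
""").fetchone()
```

The trade-off is that any query needing both halves pays for a join, so the split is worthwhile only when access really is skewed toward a few columns.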
Derived Data
Schema Design
Dimension Tables
Fact Table
Star Schema
(Diagram: dimension tables time, prod, cust and city around a central fact table with columns date, custno, prodno, cityname, ...)
Snowflake schema
(Diagram: the same time, prod, cust and city dimensions around the fact table with columns date, custno, prodno, cityname, ...; the city dimension is further linked to a region table.)
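The two diagrams can be made concrete with a small `sqlite3` sketch. The fact columns `date`, `custno`, `prodno` and `cityname` and the `city` to `region` link follow the slides; the `amount` measure, the `regionname` column and all sample rows are assumptions added for illustration, and the `time` and `cust` dimension tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: city points to region instead of repeating region data.
CREATE TABLE region (regionname TEXT PRIMARY KEY);
CREATE TABLE city (
    cityname   TEXT PRIMARY KEY,
    regionname TEXT REFERENCES region(regionname)
);
CREATE TABLE prod (prodno INTEGER PRIMARY KEY, prodname TEXT);

-- Fact table: one row per sale, keyed by the dimensions.
CREATE TABLE fact (
    date     TEXT,
    custno   INTEGER,
    prodno   INTEGER REFERENCES prod(prodno),
    cityname TEXT REFERENCES city(cityname),
    amount   REAL  -- hypothetical measure
);

INSERT INTO region VALUES ('N'), ('W');
INSERT INTO city VALUES ('Pune', 'W'), ('Mumbai', 'W'), ('Delhi', 'N');
INSERT INTO prod VALUES (1, 'Cola'), (2, 'Juice');
INSERT INTO fact VALUES
    ('1997-06-01', 100, 1, 'Pune',   10.0),
    ('1997-06-01', 101, 2, 'Mumbai', 20.0),
    ('1997-06-02', 102, 1, 'Delhi',   5.0);
""")

# Aggregating by region needs one extra join in the snowflake layout
# (fact -> city -> region); a pure star would keep region inside city.
sales_by_region = conn.execute("""
    SELECT c.regionname, SUM(f.amount)
    FROM fact f
    JOIN city c ON c.cityname = f.cityname
    GROUP BY c.regionname
    ORDER BY c.regionname
""").fetchall()
```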
Fact Constellation (diagram labels: Hotels, Travel Agents, Promotion, Room Type, Customer, Booking, Checkout)
De-normalization
Creating Arrays
Selective Redundancy
Partitioning
Why Partition?
Criterion for Partitioning
Where to Partition?
Data Warehouse vs. Data Marts: What comes first?
From the Data Warehouse to Data Marts (diagram labels: Data Warehouse, Organizationally Structured; Departmentally Structured; Individually Structured; Less, More; History; Normalized; Detailed; Data; Information)
Data Warehouse and Data Marts (diagram labels: OLAP Data Mart, Lightly summarized, Departmentally structured; Organizationally structured, Atomic, Detailed Data Warehouse Data)
Characteristics of the Departmental Data Mart
Techniques for Creating Departmental Data Mart (diagram labels: Sales, Mktg., Finance)
Data Mart Centric (diagram labels: Data Sources, Data Marts, Data Warehouse)
Problems with Data Mart Centric Solution: If you end up creating multiple warehouses, integrating them is a problem
True Warehouse (diagram labels: Data Sources, Data Warehouse, Data Marts)
Query Processing
Indexing Techniques
Indexing Techniques
BitMap Indexes
Bitmap Index
Query: select * from customer where gender = 'F' and vote = 'Y'
(Diagram: Customer table with column gender = M, F, F, F, F, M and column vote = Y, Y, Y, N, N, N; the bitmap for gender (f) is ANDed with the bitmap for vote (y) to give the result bitmap.)
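The bitmap evaluation of this query can be sketched in Python with integers used as bit vectors. The column values are the ones visible on the slide; pairing them row by row in the listed order is an assumption, because the original table layout was lost, and `build_bitmap` is a hypothetical helper name.

```python
# Column values as listed on the slide; row-by-row pairing is assumed.
gender = ["M", "F", "F", "F", "F", "M"]
vote = ["Y", "Y", "Y", "N", "N", "N"]


def build_bitmap(column, value):
    """Return an int whose bit i is 1 when column[i] == value."""
    bits = 0
    for row, cell in enumerate(column):
        if cell == value:
            bits |= 1 << row
    return bits


gender_f = build_bitmap(gender, "F")  # one bitmap per distinct value
vote_y = build_bitmap(vote, "Y")

# The WHERE clause "gender = 'F' and vote = 'Y'" becomes a single bitwise AND.
result = gender_f & vote_y

# Decode the result bitmap back into row positions (0-based).
matching_rows = [row for row in range(len(gender)) if (result >> row) & 1]
```

Only the rows whose bit survives the AND need to be fetched from the base table, which is why bitmap indexes suit low-cardinality columns combined in one predicate.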
Bit Map Index (diagram: Base Table, Region Index, Rating Index; query: Customers where Region = W And Rating = M)
BitMap Indexes
Join Indexes
Join Indexes
Star Join Processing (diagram: Calls joined in turn with Time, Location and Plan, giving C+T, C+T+L, C+T+L+P)
Optimized Star Join Processing (diagram: Virtual Cross Product of T, L and P; Apply Selections; Time, Location, Plan; Calls)
Bitmapped Join Processing (diagram: bitmaps over Calls for Time, Location and Plan, combined with AND)
Intelligent Scan
Parallel Query Processing
Parallel Query Processing
Pre-computed Aggregates
Pre-computed Aggregates
SQL Extensions
SQL Extensions
Red Brick has Extended set of Aggregates
RISQL (Red Brick Systems) Extensions
Using SubQueries in Calculations
    select product, dollars as jun97_sales,
        (select sum(si.dollars)
         from market mi, product pi, period ti, sales si
         where pi.product = product.product
           and ti.year = period.year
           and mi.city = market.city) as total97_sales,
        100 * dollars /
        (select sum(si.dollars)
         from market mi, product pi, period ti, sales si
         where pi.product = product.product
           and ti.year = period.year
           and mi.city = market.city) as percent_of_yr
    from market, product, period, sales
    where year = 1997
      and month = 'June'
      and city like 'Ahmed%'
    order by product;
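The same idea, a correlated subquery that supplies the yearly total next to each monthly figure, can be tried out with Python's built-in `sqlite3` module. This is a simplified sketch, not the slide's Red Brick schema: it assumes a single hypothetical `sales` table with made-up rows instead of the separate market, product and period tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical single-table layout standing in for the slide's schema.
CREATE TABLE sales (product TEXT, year INTEGER, month TEXT, dollars REAL);
INSERT INTO sales VALUES
    ('Cola',  1997, 'May',  300),
    ('Cola',  1997, 'June', 100),
    ('Juice', 1997, 'June',  50),
    ('Juice', 1997, 'July', 150);
""")

# Each subquery is correlated with the outer row through product and year,
# so it returns that product's total for the whole year.
rows = conn.execute("""
    SELECT s.product,
           s.dollars AS jun97_sales,
           (SELECT SUM(y.dollars) FROM sales y
             WHERE y.product = s.product AND y.year = s.year) AS total97_sales,
           100.0 * s.dollars /
           (SELECT SUM(y.dollars) FROM sales y
             WHERE y.product = s.product AND y.year = s.year) AS percent_of_yr
    FROM sales s
    WHERE s.year = 1997 AND s.month = 'June'
    ORDER BY s.product
""").fetchall()
```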
Course Overview
II. On-Line Analytical Processing (OLAP): Making Decision Support Possible
Limitations of SQL
Typical OLAP Queries
What Is OLAP?
* Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html
The OLAP Market
Strengths of OLAP
OLAP Is FASMI
Nigel Pendse, Richard Creath - The OLAP Report
Multi-dimensional Data
Dimensions: Product, Region, Time
Hierarchical summarization paths:
Product: Industry, Category, Product
Region: Country, Region, City, Office
Time: Year, Quarter, Month, Week, Day
(Diagram: cube with axes Month = 1 to 7; Product = Toothpaste, Juice, Cola, Milk, Cream, Soap; Region = W, S, N)
Data Cube Lattice
Visualizing Neighbors is simpler
A Visual Operation: Pivot (Rotate) (diagram: cube with axes Product = Juice, Cola, Milk, Cream; Region = NY, LA, SF; Date = 3/1, 3/2, 3/3, 3/4, grouped by Month; sample cell values 10, 47, 30, 12)
"Slicing and Dicing" (diagram: cube with axes Product = Household, Telecomm, Video, Audio; Sales Channel = Retail, Direct, Special; Regions = India, Far East, Europe; highlighted: The Telecomm Slice)
Roll-up and Drill Down (diagram: Roll Up leads to a Higher Level of Aggregation; Drill-Down leads to Low-level Details)
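Roll-up and drill-down can be illustrated with a short Python sketch over a handful of cube cells. The products, cities and dates reuse labels from the pivot slide, but the cell values, the city-to-region mapping and the helper name `aggregate` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical hierarchy: City rolls up to Region.
CITY_TO_REGION = {"NY": "East", "LA": "West", "SF": "West"}

# A few detailed cells (product, city, date, units); values are made up.
cells = [
    {"product": "Juice", "city": "NY", "date": "3/1", "units": 10},
    {"product": "Cola", "city": "LA", "date": "3/1", "units": 47},
    {"product": "Milk", "city": "SF", "date": "3/2", "units": 30},
    {"product": "Cream", "city": "LA", "date": "3/2", "units": 12},
]


def aggregate(rows, key):
    """Sum units over rows, grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row["units"]
    return dict(totals)


# Roll-up: move from city level to the coarser region level.
by_region = aggregate(cells, lambda r: CITY_TO_REGION[r["city"]])

# Drill-down: open up one region and look at its cities again.
west_rows = [r for r in cells if CITY_TO_REGION[r["city"]] == "West"]
west_by_city = aggregate(west_rows, lambda r: r["city"])
```

Both operations are the same grouping step applied at different levels of a dimension hierarchy; only the key function changes.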
Nature of OLAP Analysis
Organizationally Structured Data (diagram labels: marketing, manufacturing, sales, finance)
Multidimensional Spreadsheets
OLAP - Data Cube
SQL Extensions
Relational OLAP: 3 Tier DSS
Tiers: Database Layer (Data Warehouse), Application Logic Layer (ROLAP Engine), Presentation Layer (Decision Support Client)
Store atomic data in industry standard RDBMS. Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality. Obtain multi-dimensional reports from the DSS Client.
MD-OLAP: 2 Tier DSS
Tiers: Database Layer and Application Logic Layer (MDDB Engine), Presentation Layer (Decision Support Client)
Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data. Obtain multi-dimensional reports from the DSS Client.
Typical OLAP Problems: Data Explosion
Data Explosion Syndrome (chart: Number of Aggregations against Number of Dimensions, with 4 levels in each dimension. Source: Microsoft TechEd'98)
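The growth behind the data explosion chart can be estimated with a small calculation. The sketch below assumes that an aggregation is defined by choosing one hierarchy level per dimension, so the count is the product of the level counts; the chart's exact counting convention is not recoverable from the slide, and the optional `include_all` flag (adding an extra "all" level per dimension) is likewise an assumption.

```python
from math import prod


def aggregation_count(levels, include_all=False):
    """Number of level combinations, one level chosen per dimension.

    levels: list with the number of hierarchy levels in each dimension.
    include_all: also count a fully summarized "all" level per dimension.
    """
    extra = 1 if include_all else 0
    return prod(n + extra for n in levels)


# 4 levels in each dimension, for 2 up to 8 dimensions.
growth = {dims: aggregation_count([4] * dims) for dims in range(2, 9)}
```

Under this assumption the count multiplies by 4 with every added dimension, which is why pre-computing every aggregate quickly becomes impractical.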
Metadata Repository
Metadata Repository .. 2
Recipe for a Successful Warehouse
For a Successful Warehouse
From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html
For a Successful Warehouse
For a Successful Warehouse
Data Warehouse Pitfalls
Data Warehouse Pitfalls
Data Warehouse Pitfalls
DW and OLAP Research Issues
DW and OLAP Research Issues .. 2
Products, References, Useful Links
Reporting Tools
OLAP and Executive Information Systems
Other Warehouse Related Products
Extraction and Transformation Tools
Scrubbing Tools
Warehouse Products
Warehouse Server Products
Warehouse Server Products
Other Warehouse Related Products
Other Warehouse Related Products
4GLs, GUI Builders, and PC Databases
Data Mining Products
Data Warehouse
Data Warehouse
OLAP and DSS
Data Mining
Other Tutorials
Useful URLs

Birla Corporation
 
Biocon
BioconBiocon
Biocon
 
Bhp Billiton
Bhp BillitonBhp Billiton
Bhp Billiton
 
Bharti Enterprises
Bharti EnterprisesBharti Enterprises
Bharti Enterprises
 
Bharat Heavy Electricals Limited Wikipedia, The Free Encyclopedia
Bharat Heavy Electricals Limited   Wikipedia, The Free EncyclopediaBharat Heavy Electricals Limited   Wikipedia, The Free Encyclopedia
Bharat Heavy Electricals Limited Wikipedia, The Free Encyclopedia
 
Bharti Airtel
Bharti AirtelBharti Airtel
Bharti Airtel
 
Bharat Sanchar Nigam Limited
Bharat Sanchar Nigam LimitedBharat Sanchar Nigam Limited
Bharat Sanchar Nigam Limited
 
Beverage Alcohol
Beverage AlcoholBeverage Alcohol
Beverage Alcohol
 
Banco Bilbao Vizcaya Argentaria Wikipedia, The Free Encyclopedia
Banco Bilbao Vizcaya Argentaria   Wikipedia, The Free EncyclopediaBanco Bilbao Vizcaya Argentaria   Wikipedia, The Free Encyclopedia
Banco Bilbao Vizcaya Argentaria Wikipedia, The Free Encyclopedia
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Gulabs Ppt On Data Warehousing And Mining

  • 1. DATA WAREHOUSING AND DATA MINING Gulab Chand Sharma SIOM Matrix Pune [email_address] 09730495612
  • 4. A producer wants to know: Which are our lowest/highest margin customers? Who are my customers and what products are they buying? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins? What product promotions have the biggest impact on revenue? What is the most effective distribution channel?
  • 10. Warehouses are Very Large Databases [Bar chart: percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB); series: Initial and Projected 2Q96. Source: META Group, Inc.]
  • 14. Explorers, Farmers and Tourists. Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data. Farmers: Harvest information from known access paths. Tourists: Browse information harvested by farmers.
  • 15. Data Warehouse Architecture [Diagram components: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems); Extraction and Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze and Query]
  • 20. Application Areas
      Industry | Application
      Finance | Credit Card Analysis
      Insurance | Claims, Fraud Analysis
      Telecommunication | Call record analysis
      Transport | Logistics management
      Consumer goods | Promotion analysis
      Data Service providers | Value added data
      Utilities | Power usage analysis
  • 27. Examples of Operational Data
      Data | Industry | Usage | Technology | Volumes
      Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
      Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
      Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
      Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
      Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium
  • 29. Application-Orientation vs. Subject-Orientation. Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings. Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity.
  • 43. Data Warehouse Architecture [Diagram components: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems); Extraction and Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze and Query]
  • 45. Loading the Warehouse: Cleaning the data before it is loaded
  • 49. Data Integration Across Sources. Sources: Trust, Credit card, Savings, Loans. Integration problems: same data, different name; different data, same name; data found here, nowhere else; different keys, same data.
  • 50. Data Transformation Example (source applications feeding the Data Warehouse)
      field: appl A - balance; appl B - bal; appl C - currbal; appl D - balcurr
      unit: appl A - pipeline - cm; appl B - pipeline - in; appl C - pipeline - feet; appl D - pipeline - yds
      encoding: appl A - m,f; appl B - 1,0; appl C - x,y; appl D - male, female
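The three kinds of mismatch on this slide (field name, unit, encoding) can be sketched as one normalization step. This is only an illustration: the function name `to_warehouse`, the target column names, the choice of centimetres as the warehouse unit, and the meaning assigned to the codes `1`/`0` and `x`/`y` are all assumptions, not something the slide specifies.

```python
# Per-application conventions taken from the slide's three rows.
FIELD_NAMES = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}

# Conversion factors to centimetres (cm, inches, feet, yards).
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}

# Assumption: which source code means male/female is illustrative only.
GENDER_CODES = {
    "A": {"m": "M", "f": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"x": "M", "y": "F"},
    "D": {"male": "M", "female": "F"},
}


def to_warehouse(app, record):
    """Map one source record to a single warehouse convention."""
    return {
        "balance": record[FIELD_NAMES[app]],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "gender": GENDER_CODES[app][str(record["gender"]).lower()],
    }


row_b = to_warehouse("B", {"bal": 120.5, "pipeline": 10, "gender": 1})
row_d = to_warehouse("D", {"balcurr": 80.0, "pipeline": 2, "gender": "Female"})
```

Keeping the mappings in lookup tables, rather than in branching code, means a new source application can be added by extending the three dictionaries.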
  • 73. Vertical Partitioning. Original table: Acct. No, Name, Balance, Date Opened, Interest Rate, Address. Frequently accessed: Acct. No, Balance (smaller table and so less I/O). Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address.
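A minimal sketch of the split on this slide, using Python's built-in `sqlite3` module. The table names `account`, `account_hot` and `account_cold` and the sample rows are hypothetical; the column split follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    acct_no INTEGER PRIMARY KEY, name TEXT, balance REAL,
    date_opened TEXT, interest_rate REAL, address TEXT
);
INSERT INTO account VALUES
    (1, 'Asha', 2500.0, '1995-04-01', 4.5, 'Pune'),
    (2, 'Ravi',  900.0, '1996-11-15', 4.0, 'Mumbai');

-- Frequently accessed columns go into a narrow table.
CREATE TABLE account_hot AS SELECT acct_no, balance FROM account;

-- Rarely accessed columns keep the key so rows can be rejoined.
CREATE TABLE account_cold AS
    SELECT acct_no, name, date_opened, interest_rate, address FROM account;
""")

# The common lookup touches only the narrow table.
hot = conn.execute(
    "SELECT balance FROM account_hot WHERE acct_no = ?", (1,)
).fetchone()

# The occasional full view rejoins both partitions on the shared key.
full = conn.execute("""
    SELECT h.acct_no, c.name, h.balance
    FROM account_hot h JOIN account_cold c ON c.acct_no = h.acct_no
    ORDER BY h.acct_no
""").fetchall()
```

The account number is repeated in both partitions, which is the price paid for reading fewer bytes per row on the frequent path.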
  • 88. Data Warehouse vs. Data Marts: What comes first?
  • 89. From the Data Warehouse to Data Marts [Diagram: Data Warehouse (Organizationally Structured), then Departmentally Structured, then Individually Structured; axis labels: Less, More, History, Normalized, Detailed, Data, Information]
  • 90. Data Warehouse and Data Marts. OLAP Data Mart: lightly summarized, departmentally structured. Data Warehouse: organizationally structured, atomic, detailed data.
  • 93. Data Mart Centric [Diagram: Data Sources, Data Marts, Data Warehouse]
  • 94. Problems with Data Mart Centric Solution: If you end up creating multiple warehouses, integrating them is a problem.
  • 95. True Warehouse [Diagram: Data Sources, Data Warehouse, Data Marts]
  • 100. Bitmap Index. Customer query: select * from customer where gender = 'F' and vote = 'Y'
      gender | vote | gender (f) | vote (y) | result
      M | Y | 0 | 1 | 0
      F | Y | 1 | 1 | 1
      F | Y | 1 | 1 | 1
      F | N | 1 | 0 | 0
      F | N | 1 | 0 | 0
      M | N | 0 | 0 | 0
  • 101. Bit Map Index [Diagram: Base Table with a Region Index and a Rating Index; query: customers where Region = W AND Rating = M]
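The lookup on the two bitmap slides can be sketched with Python integers standing in for bitmaps: one bitmap per (column, value) pair, combined with a bitwise AND. The helper names and the six sample rows are illustrative assumptions.

```python
def build_bitmap(column, value):
    """Return an int whose bit i is set when column[i] == value."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits


def matching_rows(bitmap, n_rows):
    """List the row positions whose bit is set."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


# Illustrative customer columns (row i of each list is the same customer).
gender = ["M", "F", "F", "F", "F", "M"]
vote = ["Y", "Y", "Y", "N", "N", "N"]

gender_f = build_bitmap(gender, "F")
vote_y = build_bitmap(vote, "Y")

# WHERE gender = 'F' AND vote = 'Y' becomes a single bitwise AND.
result = gender_f & vote_y
rows = matching_rows(result, len(gender))
```

The base table is only touched for the rows that survive the AND, which is why bitmap indexes suit low-cardinality columns such as gender or rating.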
  • 106. Optimized Star Join Processing [Diagram: dimension tables Time, Location and Plan; apply selections; virtual cross product of T, L and P; fact table Calls]
  • 107. Bitmapped Join Processing [Diagram: selections on Time, Location and Plan each yield a bitmap over the Calls fact table; the bitmaps are combined with AND; bit values shown: 1 0 1 0 0 1 1 1 0]
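One way to read the bitmapped join is sketched below: each dimension selection is turned into a bitmap over the fact rows, and the fact table is read only where all bitmaps agree. The `calls` rows, the column values and the function names are hypothetical, and a real engine would use prebuilt join indexes rather than scanning the fact rows to build each bitmap.

```python
from functools import reduce

# Hypothetical fact table: one row per call, keyed by three dimensions.
calls = [
    {"time": "Q1", "location": "Pune",   "plan": "Gold",  "minutes": 12},
    {"time": "Q1", "location": "Mumbai", "plan": "Gold",  "minutes": 7},
    {"time": "Q2", "location": "Pune",   "plan": "Gold",  "minutes": 30},
    {"time": "Q1", "location": "Pune",   "plan": "Basic", "minutes": 5},
    {"time": "Q1", "location": "Pune",   "plan": "Gold",  "minutes": 9},
]


def bitmap_for(fact_rows, column, wanted):
    """Bit i is set when fact row i matches the dimension selection."""
    bits = 0
    for i, row in enumerate(fact_rows):
        if row[column] in wanted:
            bits |= 1 << i
    return bits


def star_join(fact_rows, selections):
    """AND one bitmap per dimension, then fetch the surviving fact rows."""
    all_rows = (1 << len(fact_rows)) - 1
    combined = reduce(
        lambda acc, item: acc & bitmap_for(fact_rows, item[0], item[1]),
        selections.items(),
        all_rows,
    )
    return [row for i, row in enumerate(fact_rows) if (combined >> i) & 1]


selected = star_join(
    calls, {"time": {"Q1"}, "location": {"Pune"}, "plan": {"Gold"}}
)
minutes = [row["minutes"] for row in selected]
```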
  • 117. Using SubQueries in Calculations
      select product, dollars as jun97_sales,
        (select sum(si.dollars) from market mi, product pi, period ti, sales si
          where pi.product = product.product and ti.year = period.year and mi.city = market.city) as total97_sales,
        100 * dollars / (select sum(si.dollars) from market mi, product pi, period ti, sales si
          where pi.product = product.product and ti.year = period.year and mi.city = market.city) as percent_of_yr
      from market, product, period, sales
      where year = 1997 and month = 'June' and city like 'Ahmed%'
      order by product;
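The same calculation (June sales as a percentage of the year's sales for that product and city) can be tried out with Python's `sqlite3` module. The slide does not show the join keys between the dimension tables and the fact table, so this sketch assumes a single denormalized `sales` table with invented rows; only the correlated-subquery pattern is taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, city TEXT, year INTEGER,
                    month TEXT, dollars REAL);
INSERT INTO sales VALUES
    ('Cola', 'Ahmedabad', 1997, 'May',   300.0),
    ('Cola', 'Ahmedabad', 1997, 'June',  200.0),
    ('Cola', 'Ahmedabad', 1997, 'July',  500.0),
    ('Milk', 'Ahmedabad', 1997, 'March', 150.0),
    ('Milk', 'Ahmedabad', 1997, 'June',   50.0),
    ('Cola', 'Pune',      1997, 'June',  999.0);
""")

# Each subquery is correlated with the outer row on product, year and city.
query = """
SELECT s.product,
       s.dollars AS jun97_sales,
       (SELECT SUM(si.dollars) FROM sales si
         WHERE si.product = s.product AND si.year = s.year
           AND si.city = s.city) AS total97_sales,
       100.0 * s.dollars /
       (SELECT SUM(si.dollars) FROM sales si
         WHERE si.product = s.product AND si.year = s.year
           AND si.city = s.city) AS percent_of_yr
FROM sales s
WHERE s.year = 1997 AND s.month = 'June' AND s.city LIKE 'Ahmed%'
ORDER BY s.product
"""
report = conn.execute(query).fetchall()
```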
  • 119. II. On-Line Analytical Processing (OLAP) Making Decision Support Possible
  • 129. A Visual Operation: Pivot (Rotate) [Cube diagram: Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF), Date (3/1, 3/2, 3/3, 3/4) with a Month level; sample cell values 10, 47, 30, 12]
  • 130. "Slicing and Dicing" [Cube diagram: Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), Regions (India, Far East, Europe); highlighted: the Telecomm slice]
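Pivot and slice can be illustrated on a tiny cube held as a Python dictionary keyed by (product, region, date). The dimension members come from the pivot slide; which cell holds which value, and the helper names `slice_cube` and `pivot`, are assumptions made for the example.

```python
from collections import defaultdict

DIMS = ("product", "region", "date")

# Sparse cube: only cells with data are stored.
cube = {
    ("Juice", "NY", "3/1"): 10,
    ("Cola",  "NY", "3/1"): 47,
    ("Milk",  "NY", "3/1"): 30,
    ("Cream", "NY", "3/1"): 12,
    ("Juice", "LA", "3/1"): 5,
    ("Juice", "NY", "3/2"): 8,
}


def slice_cube(cells, dim, value):
    """Fix one dimension to a single member and drop it from the keys."""
    d = DIMS.index(dim)
    return {k[:d] + k[d + 1:]: v for k, v in cells.items() if k[d] == value}


def pivot(cells, row_dim, col_dim):
    """Sum the cube into a 2-D view keyed by (row member, column member)."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(int)
    for key, value in cells.items():
        table[(key[r], key[c])] += value
    return dict(table)


march_1 = slice_cube(cube, "date", "3/1")      # one "slice" of the cube
by_product = pivot(cube, "product", "region")  # products as rows
rotated = pivot(cube, "region", "product")     # same data, axes swapped
```

Rotating the view changes only which dimension is read as rows and which as columns; the underlying cells and totals are the same.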
  • 137. Relational OLAP: 3 Tier DSS. Database Layer (Data Warehouse): Store atomic data in industry standard RDBMS. Application Logic Layer (ROLAP Engine): Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality. Presentation Layer (Decision Support Client): Obtain multi-dimensional reports from the DSS Client.
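As a rough sketch of the middle tier, a ROLAP engine can be thought of as a translator from a dimensional request into SQL that the relational store executes. The function `rolap_sql`, the `sales` table and its rows are hypothetical; a real engine would also validate identifiers, since they are interpolated into the statement here.

```python
import sqlite3


def rolap_sql(fact_table, group_by, measure):
    """Translate 'measure by these dimensions' into a GROUP BY query.

    Identifiers are assumed to come from trusted metadata, not user input.
    """
    cols = ", ".join(group_by)
    return (
        f"SELECT {cols}, SUM({measure}) AS total "
        f"FROM {fact_table} GROUP BY {cols} ORDER BY {cols}"
    )


# Database layer: atomic data in a relational store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, region TEXT, dollars REAL);
INSERT INTO sales VALUES
    ('Cola', 'India', 10.0), ('Cola', 'Europe', 5.0),
    ('Milk', 'India',  3.0), ('Cola', 'India',  2.0);
""")

# Application logic layer: generate SQL; presentation layer reads the rows.
sql = rolap_sql("sales", ["product", "region"], "dollars")
report = conn.execute(sql).fetchall()
```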
  • 138. MD-OLAP: 2 Tier DSS. Database Layer and Application Logic Layer (MDDB Engine): Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data. Presentation Layer (Decision Support Client): Obtain multi-dimensional reports from the DSS Client.
  • 139. Typical OLAP Problems: Data Explosion [Chart: Data Explosion Syndrome; number of aggregations versus number of dimensions, with 4 levels in each dimension. Source: Microsoft TechEd'98]
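The growth behind this chart can be approximated with a one-line count. The sketch assumes that an aggregation is defined by choosing one level in each dimension, so d dimensions with 4 levels each give 4^d possible aggregations; the slide's exact counting rule is not recoverable from the transcript.

```python
def aggregation_count(num_dimensions, levels_per_dimension=4):
    """Possible aggregations when one level is chosen per dimension."""
    return levels_per_dimension ** num_dimensions


# Each added dimension multiplies the number of aggregations by 4.
growth = {d: aggregation_count(d) for d in range(2, 9)}
```

Because the count is exponential in the number of dimensions, pre-calculating every aggregation quickly becomes impractical, which is the "data explosion" the slide refers to.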
  • 142. Recipe for a Successful Warehouse