DATA WAREHOUSING AND DATA MINING
S. Sudarshan, Krithi Ramamritham, IIT Bombay
Course Overview
0. Introduction
A producer wants to know: Which are our lowest/highest margin customers? Who are my customers and what products are they buying? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins? What product promotions have the biggest impact on revenue? What is the most effective distribution channel?
Data, Data everywhere yet ...
What is a Data Warehouse?
What are the users saying...
What is Data Warehousing? [Diagram labels: Data, Information]
Evolution
Warehouses are Very Large Databases [Figure: bar chart of the share of respondents (0% to 35%) by warehouse size, initial vs. projected, 2Q96; size bins 5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB. Source: META Group, Inc.]
Very Large Data Bases
Data Warehousing -- It is a process
Data Warehouse
Explorers, Farmers and Tourists. Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data. Farmers: Harvest information from known access paths. Tourists: Browse information harvested by farmers.
Data Warehouse Architecture [Diagram labels: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems); Extraction, Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze, Query]
Data Warehouse for Decision Support & OLAP
Decision Support
Data Mining works with Warehouse Data
We want to know ... Data Mining helps extract such information
Application Areas
Industry: Application
Finance: Credit Card Analysis
Insurance: Claims, Fraud Analysis
Telecommunication: Call record analysis
Transport: Logistics management
Consumer goods: Promotion analysis
Data Service providers: Value added data
Utilities: Power usage analysis
Data Mining in Use
What makes data mining possible?
Why Separate Data Warehouse?
What are Operational Systems?
RDBMS used for OLTP
Operational Systems
Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium
So, what’s different?
Application-Orientation vs. Subject-Orientation. Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings. Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity.
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
OLTP vs. Data Warehouse
To summarize ...
Why Now?
Myths surrounding OLAP Servers and Data Marts
Wal*Mart Case Study
Old Retail Paradigm
New (Just-In-Time) Retail Paradigm
Wal*Mart System
Course Overview
I. Data Warehouses: Architecture, Design & Construction
Data Warehouse Architecture [Diagram labels: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems); Extraction, Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze, Query]
Components of the Warehouse
Loading the Warehouse Cleaning the data before it is loaded
Source Data [Diagram labels: Sequential, Legacy, Relational, External; Operational/Source Data]
Data Quality - The Reality
Data Quality - The Reality
Data Integration Across Sources [Diagram: sources Trust, Credit card, Savings, Loans; issues: same data with different names, different data with the same name, data found here and nowhere else, different keys for the same data]
Data Transformation Example (applications A to D feeding the Data Warehouse)
encoding: appl A - m,f; appl B - 1,0; appl C - x,y; appl D - male, female
unit: appl A - pipeline - cm; appl B - pipeline - in; appl C - pipeline - feet; appl D - pipeline - yds
field: appl A - balance; appl B - bal; appl C - currbal; appl D - balcurr
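The three kinds of mismatch on this slide (encoding, unit, field name) could be reconciled along the following lines. This is only a minimal sketch: the target conventions (gender as m/f, length in cm, a field called balance), the function name to_warehouse_row, and the reading of 1/x as male and 0/y as female are assumptions made for illustration, taken from the order in which the codes are listed.

```python
# Hypothetical per-application rules; the code-to-gender pairing is assumed
# from the listing order on the slide (m,f / 1,0 / x,y / male,female).
GENDER_CODES = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"x": "m", "y": "f"},
    "D": {"male": "m", "female": "f"},
}
# Conversion factors to centimetres: cm, inches, feet, yards.
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}
# Each application's name for the balance field.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}


def to_warehouse_row(app, record):
    """Map one source record to the (assumed) warehouse conventions."""
    return {
        "gender": GENDER_CODES[app][str(record["gender"]).lower()],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "balance": record[BALANCE_FIELD[app]],
    }


row_b = to_warehouse_row("B", {"gender": 1, "pipeline": 10, "bal": 250.0})
row_d = to_warehouse_row("D", {"gender": "female", "pipeline": 2, "balcurr": 99.5})
```

Keeping the rules in lookup tables rather than in branching code means a fifth source application only needs three new table entries.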
Data Integrity Problems
Data Transformation Terms
Data Transformation Terms
Data Transformation Terms
Data Transformation Terms
Loads
Load Techniques
Load Taxonomy
Refresh
When to Refresh?
Refresh Techniques
How To Detect Changes
Data Extraction and Cleansing
Scrubbing Data
Scrubbing Tools
Structuring/Modeling Issues
Data -- Heart of the Data Warehouse
Data Warehouse Structure
Data Warehouse Structure: Time is part of key of each table
Data Granularity in Warehouse
Granularity in Warehouse
Granularity in Warehouse
Vertical Partitioning
Original table: Acct. No, Name, Balance, Date Opened, Interest Rate, Address
Frequently accessed: Acct. No, Balance (smaller table and so less I/O)
Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address
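The split shown on this slide can be sketched as follows. The column names are lower-cased versions of the slide's headings, and split_vertically and the sample account are hypothetical; a real warehouse would do this at the table level rather than in application code.

```python
# Columns assumed to be frequently accessed, as on the slide.
FREQUENT = ("balance",)


def split_vertically(rows, key="acct_no", frequent=FREQUENT):
    """Split each row into a small 'hot' part and a wider 'cold' part.

    Both parts stay addressable by the same key, so the original row
    can be rebuilt when a query needs all columns.
    """
    hot, cold = {}, {}
    for row in rows:
        k = row[key]
        hot[k] = {c: row[c] for c in frequent}
        cold[k] = {c: v for c, v in row.items() if c != key and c not in frequent}
    return hot, cold


accounts = [
    {"acct_no": 1, "name": "A. Rao", "balance": 500.0,
     "date_opened": "1995-01-10", "interest_rate": 4.5, "address": "Mumbai"},
]
hot, cold = split_vertically(accounts)
# Rebuild the full row from the two partitions.
rebuilt = {"acct_no": 1, **hot[1], **cold[1]}
```

Queries that only read the balance touch the narrow partition, which is the "smaller table and so less I/O" point made on the slide.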
Derived Data
Schema Design
Dimension Tables
Fact Table
Star Schema [Diagram: dimension tables time, prod, cust, city around a fact table (date, custno, prodno, cityname, ...)]
Snowflake schema [Diagram: dimension tables time, prod, cust, city around a fact table (date, custno, prodno, cityname, ...), with an additional region table]
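A small in-memory sketch of the two layouts may help. The fact columns follow the slide (date, custno, prodno, cityname); the sales measure, the sample rows, and the assumption that the snowflaked region table hangs off the city dimension are illustrative only.

```python
from collections import defaultdict

# Snowflake layout (assumed): city points to a region key, region is its own table.
city = {"Mumbai": "W", "Pune": "W", "Chennai": "S"}
region = {"W": "West", "S": "South"}

# Fact rows reference dimensions by key; "sales" is a hypothetical measure.
fact = [
    {"date": "1997-06-01", "custno": 1, "prodno": 10, "cityname": "Mumbai", "sales": 100},
    {"date": "1997-06-01", "custno": 2, "prodno": 11, "cityname": "Pune", "sales": 40},
    {"date": "1997-06-02", "custno": 3, "prodno": 10, "cityname": "Chennai", "sales": 70},
]


def sales_by_region(fact_rows, city_dim, region_dim):
    """Join fact -> city -> region and sum the measure per region name."""
    totals = defaultdict(int)
    for row in fact_rows:
        region_key = city_dim[row["cityname"]]
        totals[region_dim[region_key]] += row["sales"]
    return dict(totals)


totals = sales_by_region(fact, city, region)
```

In a pure star schema the region name would be stored directly in the city dimension, saving the second lookup at the cost of some redundancy.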
Fact Constellation [Diagram labels: Hotels, Travel Agents, Promotion, Room Type, Customer, Booking, Checkout]
De-normalization
Creating Arrays
Selective Redundancy
Partitioning
Why Partition?
Criterion for Partitioning
Where to Partition?
Data Warehouse vs. Data Marts: What comes first?
From the Data Warehouse to Data Marts [Diagram labels: Data Warehouse (Organizationally Structured), Departmentally Structured, Individually Structured; Less, More; History, Normalized, Detailed; Data, Information]
Data Warehouse and Data Marts [Diagram labels: OLAP, Data Mart, Lightly summarized, Departmentally structured; Organizationally structured, Atomic, Detailed, Data Warehouse Data]
Characteristics of the Departmental Data Mart
Techniques for Creating Departmental Data Mart [Diagram labels: Sales, Mktg., Finance]
Data Mart Centric [Diagram labels: Data Sources, Data Marts, Data Warehouse]
Problems with Data Mart Centric Solution If you end up creating multiple warehouses, integrating them is a problem
True Warehouse [Diagram labels: Data Sources, Data Warehouse, Data Marts]
Query Processing
Indexing Techniques
Indexing Techniques
BitMap Indexes
Bitmap Index
Customer Query: select * from customer where gender = 'F' and vote = 'Y'
gender column: M F F F F M
vote column: Y Y Y N N N
gender (f) bitmap: 0 1 1 1 1 0
vote (y) bitmap: 1 1 1 0 0 0
result (AND): 0 1 1 0 0 0
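The query on this slide can be answered with two bitmaps and a bitwise AND. The sketch below stores each bitmap in a Python integer, with bit i standing for row i; the six-row customer table uses the gender and vote values from the slide, in the row order in which they are listed there, which is an assumption. The helper name build_bitmap is hypothetical.

```python
def build_bitmap(values, target):
    """Return an int whose bit i is 1 when values[i] equals target."""
    bits = 0
    for i, v in enumerate(values):
        if v == target:
            bits |= 1 << i
    return bits


gender = ["M", "F", "F", "F", "F", "M"]
vote = ["Y", "Y", "Y", "N", "N", "N"]

gender_f = build_bitmap(gender, "F")
vote_y = build_bitmap(vote, "Y")

# where gender = 'F' and vote = 'Y'  ->  bitwise AND of the two bitmaps
result = gender_f & vote_y
matching_rows = [i for i in range(len(gender)) if (result >> i) & 1]
# matching_rows is [1, 2] (0-based row positions)
```

Only the rows whose bit survives the AND need to be fetched from the base table.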
Bit Map Index [Figure: Base Table with Rating Index and Region Index; query: Customers where Region = W And Rating = M]
BitMap Indexes
Join Indexes
Join Indexes
Star Join Processing [Diagram labels: Calls; Time, Location, Plan; C+T, C+T+L, C+T+L+P]
Optimized Star Join Processing [Diagram labels: Time, Location, Plan; Apply Selections; Virtual Cross Product of T, L and P; Calls]
Bitmapped Join Processing [Diagram labels: Time, Location, Plan; Calls bitmaps 1 0 1, 0 0 1, 1 1 0; AND]
Intelligent Scan
Parallel Query Processing
Parallel Query Processing
Pre-computed Aggregates
Pre-computed Aggregates
SQL Extensions
SQL Extensions
Red Brick has Extended set of Aggregates
RISQL (Red Brick Systems) Extensions
Using SubQueries in Calculations
select product, dollars as jun97_sales,
  (select sum(si.dollars)
   from market mi, product pi, period ti, sales si
   where pi.product = product.product
     and ti.year = period.year
     and mi.city = market.city) as total97_sales,
  100 * dollars /
  (select sum(si.dollars)
   from market mi, product pi, period ti, sales si
   where pi.product = product.product
     and ti.year = period.year
     and mi.city = market.city) as percent_of_yr
from market, product, period, sales
where year = 1997 and month = 'June' and city like 'Ahmed%'
order by product;
Course Overview
II.  On-Line Analytical Processing (OLAP) Making Decision Support Possible
Limitations of SQL
Typical OLAP Queries
What Is OLAP? * Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html
The OLAP Market
Strengths of OLAP
OLAP Is FASMI (Fast Analysis of Shared Multidimensional Information). Nigel Pendse, Richard Creath - The OLAP Report
Multi-dimensional Data
Dimensions: Product, Region, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Region: Country > Region > City > Office
Time: Year > Quarter > Month or Week > Day
[Figure: data cube with axes Month (1 to 7), Product (Toothpaste, Juice, Cola, Milk, Cream, Soap), Region (W, S, N)]
Data Cube Lattice
Visualizing Neighbors is simpler
A Visual Operation: Pivot (Rotate) [Figure: cube with axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF), Date (3/1, 3/2, 3/3, 3/4) and Month; sample cell values 10, 47, 30, 12]
“Slicing and Dicing” [Figure: cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), Regions (India, Far East, Europe); the Telecomm slice is highlighted]
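As a rough illustration of the two operations, the cube can be held as a dictionary keyed by (product, channel, region). The dimension members come from the slide; the cell values and the helper names slice_cube and dice_cube are made up for this sketch.

```python
# Hypothetical cell values; keys are (product, sales channel, region).
cube = {
    ("Telecomm", "Retail", "India"): 10,
    ("Telecomm", "Direct", "Europe"): 7,
    ("Video", "Retail", "India"): 4,
    ("Audio", "Special", "Far East"): 3,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single member and drop that dimension."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}


def dice_cube(cube, allowed):
    """Keep cells whose members fall in the allowed sets (axis -> set of members)."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in members for axis, members in allowed.items())}


telecomm_slice = slice_cube(cube, axis=0, value="Telecomm")
retail_india = dice_cube(cube, {1: {"Retail"}, 2: {"India"}})
```

A slice reduces the number of dimensions by one, while a dice keeps all dimensions and returns a smaller sub-cube.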
Roll-up and Drill Down [Diagram: Roll Up moves toward a Higher Level of Aggregation; Drill-Down moves toward Low-level Details]
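Roll-up and drill-down along a time hierarchy might look like the sketch below. The day-to-month hierarchy, the sample figures, and the function names are assumptions chosen for illustration; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical daily sales, keyed by (product, ISO date).
daily = {
    ("Juice", "1997-03-01"): 10,
    ("Juice", "1997-03-02"): 47,
    ("Cola", "1997-03-01"): 30,
    ("Cola", "1997-04-01"): 12,
}


def month_of(day):
    """Parent of a day in the time hierarchy: 'YYYY-MM-DD' -> 'YYYY-MM'."""
    return day[:7]


def roll_up(cells, to_parent):
    """Aggregate detail cells to the next level of the time hierarchy."""
    totals = defaultdict(int)
    for (product, day), amount in cells.items():
        totals[(product, to_parent(day))] += amount
    return dict(totals)


def drill_down(cells, to_parent, product, parent):
    """Return the detail cells that make up one aggregated cell."""
    return {k: v for k, v in cells.items()
            if k[0] == product and to_parent(k[1]) == parent}


monthly = roll_up(daily, month_of)
juice_march_detail = drill_down(daily, month_of, "Juice", "1997-03")
```

Drill-down is only possible when the detail cells are still available, which is one reason the granularity chosen for the warehouse matters.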
Nature of OLAP Analysis
Organizationally Structured Data [Diagram labels: marketing, manufacturing, sales, finance]
Multidimensional Spreadsheets
OLAP - Data Cube
SQL Extensions
Relational OLAP: 3 Tier DSS. Layers: Database Layer (Data Warehouse), Application Logic Layer (ROLAP Engine), Presentation Layer (Decision Support Client). Store atomic data in industry standard RDBMS. Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality. Obtain multi-dimensional reports from the DSS Client.
MD-OLAP: 2 Tier DSS. Layers: Database Layer and Application Logic Layer (MDDB Engine), Presentation Layer (Decision Support Client). Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data. Obtain multi-dimensional reports from the DSS Client.
Typical OLAP Problems: Data Explosion [Figure: Data Explosion Syndrome, number of aggregations vs. number of dimensions, with 4 levels in each dimension. Source: Microsoft TechEd’98]
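The growth behind the data explosion chart can be approximated with a short calculation. The sketch assumes that an aggregate is defined by picking one level in each dimension independently, so that the count is levels raised to the number of dimensions; the plotted values on the original chart did not survive extraction, so this is an illustration of the trend, not a reproduction of the figure.

```python
def aggregate_count(dimensions, levels_per_dimension=4):
    """Number of possible aggregations when each dimension
    contributes one of its levels independently (an assumption)."""
    return levels_per_dimension ** dimensions


# Counts for 2 to 8 dimensions with 4 levels each.
counts = {d: aggregate_count(d) for d in range(2, 9)}
# counts[2] == 16, counts[8] == 65536
```

Each extra dimension multiplies the number of candidate aggregates by four, which is why pre-calculating every outcome quickly becomes impractical.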
Metadata Repository
Metadata Repository .. 2
Recipe for a Successful Warehouse
For a Successful Warehouse (From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html)
For a Successful Warehouse
For a Successful Warehouse
Data Warehouse Pitfalls
Data Warehouse Pitfalls
Data Warehouse Pitfalls
DW and OLAP Research Issues
DW and OLAP Research Issues .. 2
Products, References, Useful Links
Reporting Tools
OLAP and Executive Information Systems
Other Warehouse Related Products
Extraction and Transformation Tools
Scrubbing Tools
Warehouse Products
Warehouse Server Products
Warehouse Server Products
Other Warehouse Related Products
Other Warehouse Related Products
4GL's, GUI Builders, and PC Databases
Data Mining Products
Data Warehouse
Data Warehouse
OLAP and DSS
Data Mining
Other Tutorials
Useful URLs

13500892 data-warehousing-and-data-miningNgaire Taylor
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptDougSchoemaker
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data BSP Media Group
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Data warehouse-1 (1)
Data warehouse-1 (1)Data warehouse-1 (1)
Data warehouse-1 (1)vikram singh
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasirguest7c8e5f
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introductionShivmohan Purohit
 

Similar to Data Warehousing Datamining Concepts (20)

Gulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And MiningGulabs Ppt On Data Warehousing And Mining
Gulabs Ppt On Data Warehousing And Mining
 
IT Ready - DW: 1st Day
IT Ready - DW: 1st Day IT Ready - DW: 1st Day
IT Ready - DW: 1st Day
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data Mining
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
DWM
DWMDWM
DWM
 
13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-mining13500892 data-warehousing-and-data-mining
13500892 data-warehousing-and-data-mining
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
dw_concepts_2_day_course.ppt
dw_concepts_2_day_course.pptdw_concepts_2_day_course.ppt
dw_concepts_2_day_course.ppt
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Capturing big value in big data
Capturing big value in big data Capturing big value in big data
Capturing big value in big data
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data warehouse-1 (1)
Data warehouse-1 (1)Data warehouse-1 (1)
Data warehouse-1 (1)
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasir
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse & bi introduction
Datawarehouse & bi introductionDatawarehouse & bi introduction
Datawarehouse & bi introduction
 

Data Warehousing Datamining Concepts

  • 1. DATA WAREHOUSING AND DATA MINING. S. Sudarshan and Krithi Ramamritham, IIT Bombay. [email_address] [email_address]
  • 4. A producer wants to know...
      Which are our lowest/highest margin customers?
      Who are my customers and what products are they buying?
      Which customers are most likely to go to the competition?
      What impact will new products/services have on revenue and margins?
      What product promotions have the biggest impact on revenue?
      What is the most effective distribution channel?
  • 10. Warehouses are Very Large Databases. [Bar chart: percentage of respondents (0% to 35%) by warehouse size, in buckets from 5GB to 500GB-1TB, initial vs. projected, 2Q96. Source: META Group, Inc.]
  • 14. Explorers, Farmers and Tourists
      Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data.
      Farmers: Harvest information from known access paths.
      Tourists: Browse information harvested by farmers.
  • 15. Data Warehouse Architecture. [Diagram: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems) feed Extraction and Cleansing, then the Optimized Loader, into the Data Warehouse Engine, which serves Analyze and Query; a Metadata Repository sits alongside.]
  • 20. Application Areas
      Industry | Application
      Finance | Credit card analysis
      Insurance | Claims, fraud analysis
      Telecommunication | Call record analysis
      Transport | Logistics management
      Consumer goods | Promotion analysis
      Data service providers | Value added data
      Utilities | Power usage analysis
  • 27. Examples of Operational Data
      Data | Industry | Usage | Technology | Volumes
      Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
      Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
      Point-of-Sale data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
      Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
      Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium
  • 29. Application-Orientation vs. Subject-Orientation
      Application-orientation (operational database): Loans, Credit Card, Trust, Savings
      Subject-orientation (data warehouse): Customer, Vendor, Product, Activity
  • 43. Data Warehouse Architecture. [Diagram: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems) feed Extraction and Cleansing, then the Optimized Loader, into the Data Warehouse Engine, which serves Analyze and Query; a Metadata Repository sits alongside.]
  • 45. Loading the Warehouse: Cleaning the data before it is loaded
  • 49. Data Integration Across Sources
      Sources: Trust, Credit card, Savings, Loans
      Same data, different name
      Different data, same name
      Data found here, nowhere else
      Different keys, same data
  • 50. Data Transformation Example (source applications into the Data Warehouse)
      Source | field | unit | encoding
      appl A | balance | pipeline - cm | m,f
      appl B | bal | pipeline - in | 1,0
      appl C | currbal | pipeline - feet | x,y
      appl D | balcurr | pipeline - yds | male, female
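The transformation on slide 50 can be sketched in a few lines of Python. This is only an illustration: the function name `to_warehouse`, the choice of centimetres and `M`/`F` as the warehouse convention, and the reading of `1,0` and `x,y` as male-first codes are assumptions, not something the slide states.

```python
# Per-application conventions taken from the slide; target conventions are assumed.
FIELD_NAMES = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}
UNIT_TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}  # cm, in, feet, yds
GENDER_CODES = {  # assumption: the first code listed on the slide means male
    "A": {"m": "M", "f": "F"},
    "B": {1: "M", 0: "F"},
    "C": {"x": "M", "y": "F"},
    "D": {"male": "M", "female": "F"},
}


def to_warehouse(app, record):
    """Map one source record onto a single warehouse convention."""
    return {
        "balance": record[FIELD_NAMES[app]],                            # field name
        "pipeline_cm": round(record["pipeline"] * UNIT_TO_CM[app], 2),  # unit
        "gender": GENDER_CODES[app][record["gender"]],                  # encoding
    }


rows = [
    to_warehouse("A", {"balance": 100.0, "pipeline": 250, "gender": "f"}),
    to_warehouse("B", {"bal": 75.5, "pipeline": 10, "gender": 1}),
    to_warehouse("C", {"currbal": 20.0, "pipeline": 3, "gender": "y"}),
    to_warehouse("D", {"balcurr": 0.0, "pipeline": 2, "gender": "male"}),
]
for row in rows:
    print(row)
```

Keeping the three mappings as plain lookup tables means a fifth source application only needs three new entries rather than new code.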
  • 73. Vertical Partitioning
      Original table: Acct. No, Name, Balance, Date Opened, Interest Rate, Address
      Frequently accessed: Acct. No, Balance (smaller table and so less I/O)
      Rarely accessed: Acct. No, Name, Date Opened, Interest Rate, Address
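A minimal sketch of the vertical partitioning on slide 73, using Python's built-in `sqlite3` module. The table names `account_hot` and `account_cold` and the sample row are hypothetical; the column split follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently accessed columns: narrow rows, so fewer pages per scan.
    CREATE TABLE account_hot (acct_no INTEGER PRIMARY KEY, balance REAL);
    -- Rarely accessed columns, sharing the same key.
    CREATE TABLE account_cold (acct_no INTEGER PRIMARY KEY, name TEXT,
                               date_opened TEXT, interest_rate REAL, address TEXT);
""")
conn.execute("INSERT INTO account_hot VALUES (1, 500.0)")
conn.execute(
    "INSERT INTO account_cold VALUES (1, 'A. Customer', '1997-06-01', 4.5, 'Mumbai')"
)

# The common query touches only the narrow table.
balance = conn.execute(
    "SELECT balance FROM account_hot WHERE acct_no = ?", (1,)
).fetchone()[0]

# The rare query joins on the shared key to rebuild the full row.
full_row = conn.execute("""
    SELECT h.acct_no, c.name, h.balance, c.date_opened, c.interest_rate, c.address
    FROM account_hot h JOIN account_cold c ON c.acct_no = h.acct_no
    WHERE h.acct_no = ?""", (1,)).fetchone()

print(balance)
print(full_row)
```

The cost of the split is the join needed whenever both halves are wanted, which is why it pays off only when the access pattern is as skewed as the slide assumes.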
  • 88. Data Warehouse vs. Data Marts: What comes first?
  • 89. From the Data Warehouse to Data Marts. [Diagram: a spectrum from Data to Information. Data Warehouse: organizationally structured, normalized, detailed, more history. Data marts: departmentally structured, then individually structured, less history.]
  • 90. Data Warehouse and Data Marts
      OLAP Data Mart: lightly summarized, departmentally structured
      Data Warehouse: organizationally structured, atomic, detailed data
  • 93. Data Mart Centric. [Diagram: Data Sources feed the Data Marts, which in turn feed the Data Warehouse.]
  • 94. Problems with Data Mart Centric Solution: If you end up creating multiple warehouses, integrating them is a problem.
  • 95. True Warehouse. [Diagram: Data Sources feed the Data Warehouse, which in turn feeds the Data Marts.]
  • 100. Bitmap Index
      Customer query: select * from customer where gender = 'F' and vote = 'Y'
      gender | vote | gender (f) | vote (y) | result
      M | Y | 0 | 1 | 0
      F | Y | 1 | 1 | 1
      F | Y | 1 | 1 | 1
      F | N | 1 | 0 | 0
      F | N | 1 | 0 | 0
      M | N | 0 | 0 | 0
  • 101. Bit Map Index. [Figure: a base table with a Rating index and a Region index; customers where Region = W and Rating = M are found by ANDing the two bitmaps.]
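The bitmap lookup on slides 100 and 101 comes down to one bitwise AND. Below is a small sketch that stores each bitmap in a Python integer; the helper names `build_bitmap` and `matching_rows` are hypothetical, and the six sample rows follow the column values listed on slide 100 (taken in the order they appear).

```python
def build_bitmap(column, value):
    """Return an int whose bit i is set when column[i] == value."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits


def matching_rows(bitmap, n_rows):
    """List the row positions whose bit is set in the bitmap."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


gender = ["M", "F", "F", "F", "F", "M"]
vote = ["Y", "Y", "Y", "N", "N", "N"]

# where gender = 'F' and vote = 'Y'  ->  AND of the two bitmaps
result = build_bitmap(gender, "F") & build_bitmap(vote, "Y")
rows = matching_rows(result, len(gender))
print(rows)
```

Only the rows named by the final bitmap need to be fetched from the base table, which is where the saving over a full scan comes from.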
  • 106. Optimized Star Join Processing. [Diagram: apply selections to the dimension tables Time, Location and Plan, form the virtual cross product of T, L and P, and join it with the Calls fact table.]
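  • 107. Bitmapped Join Processing. [Diagram: each of Time, Location and Plan yields a bitmap over Calls (bits shown: 1 0 1, 0 0 1, 1 1 0); the bitmaps are combined with AND to pick the qualifying Calls rows.]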
  • 117. Using SubQueries in Calculations
      select product, dollars as jun97_sales,
        (select sum(s1.dollars)
         from market m1, product p1, period t1, sales s1
         where p1.product = product.product
           and t1.year = period.year
           and m1.city = market.city) as total97_sales,
        100 * dollars /
        (select sum(s1.dollars)
         from market m1, product p1, period t1, sales s1
         where p1.product = product.product
           and t1.year = period.year
           and m1.city = market.city) as percent_of_yr
      from market, product, period, sales
      where year = 1997 and month = 'June' and city like 'Ahmed%'
      order by product;
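To see the subquery pattern from slide 117 run end to end, here is a cut-down sketch using Python's `sqlite3` module. It is not the slide's schema: a single hypothetical `sales` table stands in for the market, product, period and sales tables, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, city TEXT, year INTEGER, month TEXT, dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Tea", "Ahmedabad", 1997, "May", 300.0),
        ("Tea", "Ahmedabad", 1997, "June", 100.0),
        ("Coffee", "Ahmedabad", 1997, "June", 50.0),
        ("Coffee", "Ahmedabad", 1997, "July", 150.0),
        ("Tea", "Pune", 1997, "June", 999.0),
    ],
)

# The correlated subquery totals the year for the same product and city,
# then the outer query expresses June as a percentage of that total.
rows = conn.execute("""
    SELECT s.product,
           s.dollars AS jun97_sales,
           (SELECT SUM(s1.dollars) FROM sales s1
             WHERE s1.product = s.product AND s1.year = s.year
               AND s1.city = s.city) AS total97_sales,
           100.0 * s.dollars /
           (SELECT SUM(s1.dollars) FROM sales s1
             WHERE s1.product = s.product AND s1.year = s.year
               AND s1.city = s.city) AS percent_of_yr
    FROM sales s
    WHERE s.year = 1997 AND s.month = 'June' AND s.city LIKE 'Ahmed%'
    ORDER BY s.product
""").fetchall()

for row in rows:
    print(row)
```

The subquery is written out twice, as on the slide, because a column alias defined in the select list cannot be reused within that same select list.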
  • 119. II. On-Line Analytical Processing (OLAP): Making Decision Support Possible
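  • 129. A Visual Operation: Pivot (Rotate). [Figure: a data cube with axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF) and Date (3/1, 3/2, 3/3, 3/4; also labelled Month), with sample cell values 10, 47, 30, 12; pivoting rotates the cube to swap the displayed axes.]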
  • 130. "Slicing and Dicing"
      Product: Household, Telecomm, Video, Audio
      Sales Channel: Retail, Direct, Special
      Regions: India, Far East, Europe
      Highlighted: the Telecomm slice
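Slicing and pivoting can be sketched on a tiny in-memory cube. The dimension names follow slide 130; the cell values, the dictionary layout and the helpers `slice_cube` and `pivot` are assumptions made for the example.

```python
DIMS = ("product", "channel", "region")

# Sparse cube: (product, channel, region) -> sales. Values are made up.
cube = {
    ("Telecomm", "Retail", "India"): 10,
    ("Telecomm", "Direct", "Europe"): 47,
    ("Video", "Retail", "India"): 30,
    ("Audio", "Special", "Far East"): 12,
}


def slice_cube(cube, dim, value):
    """Fix one dimension to a value and drop it from the keys."""
    k = DIMS.index(dim)
    return {
        key[:k] + key[k + 1:]: v
        for key, v in cube.items()
        if key[k] == value
    }


def pivot(table):
    """Swap the two axes of a 2-D table keyed by (row, column)."""
    return {(col, row): v for (row, col), v in table.items()}


telecomm = slice_cube(cube, "product", "Telecomm")  # the Telecomm slice
rotated = pivot(telecomm)                           # region by channel
print(telecomm)
print(rotated)
```

A slice lowers the dimensionality by one, while a pivot leaves the data untouched and only changes which dimension is shown on which axis.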
  • 137. Relational OLAP: 3 Tier DSS
      Database Layer (Data Warehouse): Store atomic data in industry standard RDBMS.
      Application Logic Layer (ROLAP Engine): Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.
      Presentation Layer (Decision Support Client): Obtain multi-dimensional reports from the DSS Client.
  • 138. MD-OLAP: 2 Tier DSS
      Database Layer and Application Logic Layer (MDDB Engine): Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data.
      Presentation Layer (Decision Support Client): Obtain multi-dimensional reports from the DSS Client.
  • 139. Typical OLAP Problems: Data Explosion. [Chart: "Data Explosion Syndrome", number of aggregations plotted against number of dimensions, with 4 levels in each dimension. Source: Microsoft TechEd'98]
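The growth behind the chart on slide 139 can be reproduced with a short calculation. The sketch assumes that an aggregation is one choice of hierarchy level per dimension and that every dimension offers exactly 4 choices; the slide does not say whether the base level or an "all" level is among the four, so treat the exact counts as illustrative. The function name `count_aggregations` is hypothetical.

```python
from itertools import product
from math import prod


def count_aggregations(levels_per_dimension):
    """Count the ways to pick one hierarchy level in every dimension."""
    return prod(levels_per_dimension)


# Cross-check the formula by brute-force enumeration for 3 dimensions.
enumerated = len(list(product(range(4), repeat=3)))

table = {d: count_aggregations([4] * d) for d in range(2, 9)}
for dims, count in table.items():
    print(dims, count)
```

Each extra dimension multiplies the count by the number of levels, so pre-computing every aggregate quickly becomes impractical as dimensions are added.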
  • 142. Recipe for a Successful Warehouse