SlideShare a Scribd company logo
1 of 27
Data Warehouse System
Faculty : Gaurav Garg
Table of Contents
• Data warehouse
• Data Warehouse vs. Operational DBMS
• OLTP vs. OLAP
• Why Need Separate Data Warehouse?
• Data Mart
• Data warehouse Architecture
• Data warehouse backend process
• Metadata Repository
• OLAP Server Architecture
• Schemas for multidimensional data
• Partitioning the Data warehouse
What is Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from the
organization’s operational database
– Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-making
process.”—W. H. Inmno
• Data warehousing:
– The process of constructing and using data warehouses
Gaurav Garg (AITM, Palwal)
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product, sales
• Focusing on the modeling and analysis of data for decision makers, not on
daily operations or transaction processing
• Provide a simple and concise view around particular subject issues by
excluding data that are not useful in the decision support process
Gaurav Garg (AITM, Palwal)
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding structures,
attribute measures, etc. among different data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
Gaurav Garg (AITM, Palwal)
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical perspective
(e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain “time element”
Gaurav Garg (AITM, Palwal)
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
Gaurav Garg (AITM, Palwal)
OLTP vs. OLAP
OLTP OLAP
users clerk, IT professional knowledge worker
function day to day operations decision support
DB design application-oriented subject-oriented
data current, up-to-date
detailed, flat relational
isolated
historical,
summarized, multidimensional
integrated, consolidated
usage repetitive ad-hoc
access read/write
index/hash on prim. key
lots of scans
unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands hundreds
DB size 100MB-GB 100GB-TB
metric transaction throughput query throughput, response
Gaurav Garg (AITM, Palwal)
Why Separate Data Warehouse?
• High performance for both systems
– DBMS— tuned for OLTP: access methods, indexing, concurrency control,
recovery
– Warehouse—tuned for OLAP: complex OLAP queries, multidimensional
view, consolidation
• Different functions and different data:
– missing data: Decision support requires historical data which
operational DBs do not typically maintain
– data consolidation: DS requires consolidation (aggregation,
summarization) of data from heterogeneous sources
– data quality: different sources typically use inconsistent data
representations, codes and formats which have to be reconciled
Gaurav Garg (AITM, Palwal)
Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results
using visualization tools
Gaurav Garg (AITM, Palwal)
Data Mart
• Data marts are partitions of the overall data warehouse.
• A data mart is a simple form of a data warehouse that is focused on a
single subject (or functional area), such as Sales, Finance, or Marketing
• Data marts may contain some overlapping data. E.g. a store sales data
mart, would also need some data from inventory and payroll.
Types of data marts
Dependent data marts draw data from a central data warehouse that has
already been created.
Independent data marts, are standalone systems built by drawing data
directly from operational or external sources of data, or both.
The main difference between these is how you get data out of the sources
and into the data mart. This step, called the Extraction-Transformation-
and Loading (ETL) process, involves moving data from operational systems,
filtering it, and loading it into the data mart.
Gaurav Garg (AITM, Palwal)
A 3-tier Data warehouse Architecture
Data
Warehouse
Extract
Transform
Load
Refresh
OLAP Engine
Analysis
Query
Reports
Data mining
Monitor
&
Integrator
Metadata
Data Sources Front-End Tools
Serve
Data Marts
Operational
DBs
Other
sources
Data Storage
OLAP Server
Gaurav Garg (AITM, Palwal)
Data warehouse backend process
Data extraction
It’s a process of extracting data for the warehouse from various sources.
Data cleaning
Its an essential operation for construction of quality data warehouse. Because
a large volume of data from heterogeneous sources are involved,so there
is high probability of errors in the data. The data cleaning process include
• Using transformation rules e.g. translating attribute name like ‘age’ to
‘DOB’
• Using domain specific knowledge
• Performing parsing and fuzzy matching e.g for multiple data sources, one
can designate a preferred source as a matching standard
• Auditing
it is difficult and costly to clean the data.
Gaurav Garg (AITM, Palwal)
Data Transformation
It’s a process of transforming heterogeneous data to an uniform structure so
that data can be combined and integrated .i.e convert data from legacy or
host format to warehouse format.
Data Loading
For loading correct, refined, processed data( huge in size) into data
warehouse, there are some data loading strategies.
Batch loading.
Sequential loading.
Incremental loading.
Refresh
The process of updating the data.
Gaurav Garg (AITM, Palwal)
Metadata Repository
• Meta data is the data defining warehouse objects. It stores:
• Description of the structure of the data warehouse
– schema, view, dimensions, hierarchies, derived data defn, data mart
locations and contents
• Operational meta-data
– data lineage (history of migrated data and transformation path), currency
of data (active, archived, or purged), monitoring information (warehouse
usage statistics, error reports, audit trails)
• The algorithms used for summarization
• The mapping from operational environment to the data warehouse
• Data related to system performance
– warehouse schema, view and derived data definitions
• Business data
– business terms and definitions, ownership of data, charging policies
Gaurav Garg (AITM, Palwal)
OLAP Server Architecture
• Relational OLAP (ROLAP)
– Use relational or extended-relational DBMS to store and manage
warehouse data and OLAP middle ware
– Include optimization of DBMS backend, implementation of aggregation
navigation logic, and additional tools and services
– Greater scalability
• Multidimensional OLAP (MOLAP)
– Sparse array-based multidimensional storage engine
– Fast indexing to pre-computed summarized data
• Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer)
– Flexibility, e.g., low level: relational, high-level: array
• Specialized SQL servers (e.g., Redbricks)
– Specialized support for SQL queries over star/snowflake schemas
Gaurav Garg (AITM, Palwal)
Multidimensional Data
• Sales volume as a function of product, month,
and region
Product
Month
Dimensions: Product, Location, Time
Hierarchical summarization paths
Industry Region Year
Category Country Quarter
Product City Month Week
Office Day
Gaurav Garg (AITM, Palwal)
Multidimensional Data Model
“what is a data cube?”
“what do you mean by dimensions?”
Data cube allows data to be modeled and viewed in multiple dimensions. It is
defined by dimensions and facts.
Dimensions are the perspectives or entities with respect to which an
organizations wants to keep records.
Each dimension may have a table associated with it, called a dimension table.
A dimension table for the item may contain the attributes item_name,
brand etc.
A multidimensional data model is typically organized around a central theme,
like sales. These theme is represented by fact table.
Facts are numerical measure. Examples of facts for a sales data warehouse
include dollars_sold,unit_sold,amount_budgeted.
Facts table contains the names of the facts, or measures, as well as keys to
each related dimension tables.
Gaurav Garg (AITM, Palwal)
Schemas for multidimensional data
These multidimensional model can exist in the form of a star schema, a
snowflake schema, or a fact constellation schema (galaxy schema).
Star Shcema
It is the most common schema type in which the data warehouse contains
a large central table (fact table) which contains the bulk of data, with no
redundancy, and a set of smaller attendant tables (dimension tables), one
for each dimension. The dimension tables displayed a radial pattern
around the central fact table.
Snowflake schema
It is a variant of the star schema model, where some dimension tables are
normalized, thereby further splitting the data into additional tables. The
resulting schema graph forms a shape similar to a snowflake.
Gaurav Garg (AITM, Palwal)
Example of Star Schema
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city
state_or_province
country
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_type
item
branch_key
branch_name
branch_type
branch
Gaurav Garg (AITM, Palwal)
Example of Snowflake Schema
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city_key
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_key
item
branch_key
branch_name
branch_type
branch
supplier_key
supplier_type
supplier
city_key
city
state_or_province
country
city
Gaurav Garg (AITM, Palwal)
Fact constellation (galaxy schema)
Some sophisticated applications may require multiple fact tables to share
dimension tables. This kind of schema can be viewed as a collection of
stars, and hence is called a galaxy schema or fact constellation.
Gaurav Garg (AITM, Palwal)
Example of Fact Constellation
time_key
day
day_of_the_week
month
quarter
year
time
location_key
street
city
province_or_state
country
location
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_type
item
branch_key
branch_name
branch_type
branch
Shipping Fact Table
time_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shipped
shipper_key
shipper_name
location_key
shipper_type
shipper
Gaurav Garg (AITM, Palwal)
Partitioning the Data warehouse
Data warehouses often contain large tables and require techniques both
for managing these large tables and for providing good query performance
across these large tables.
Partitioning have many advantages, some are as
Scalability
Partitioning helps scaling a data warehouse by dividing database objects
into smaller pieces, enabling access to smaller, more manageable objects.
Having direct access to smaller objects .
Performance
Good performance is a key to success for a data warehouse. Analyses run
against the database should return within a reasonable amount of time,
even if the queries access large amounts of data in tables that are
terabytes in size. Partitioning increases the performance of data
warehouse.
Note: partitions of database are generally known as clusters.
Gaurav Garg (AITM, Palwal)
Types of partitioning
• Basically it has two types
1. Horizontal partitioning
2. Vertical partitioning
There are two main categories of partitioning algorithms
1. k-means algorithm, where each cluster is represented by the center of
gravity of the cluster.
2. K-medoid algorithm, where each cluster is represented by one of the
objects of the cluster located near the center.
Gaurav Garg (AITM, Palwal)
Some general methods of partitioning
Range Partitioning
Range partitioning maps data to partitions based on ranges of partition
key values that you establish for each partition. It is the most common
type of partitioning and is often used with dates. For example, you might
want to partition sales data into monthly partitions.
Hash Partitioning
Hash partitioning maps data to partitions based on a hashing algorithm
The hashing algorithm evenly distributes rows among partitions, giving
partitions approximately the same size.
List Partitioning
List partitioning enables you to explicitly control how rows map to
partitions. This is different from range partitioning, where a range of
values is associated with a partition and with hash partitioning, where you
have no control of the row-to-partition mapping. The advantage of list
partitioning is that you can group and organize unordered and unrelated
sets of data in a natural way.
Gaurav Garg (AITM, Palwal)
Data warehouse system and its concepts

More Related Content

What's hot

Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architectureDeepak Chaurasia
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousingKavisha Uniyal
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouseAmin Choroomi
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingEyad Manna
 
Data Warehousing - in the real world
Data Warehousing - in the real worldData Warehousing - in the real world
Data Warehousing - in the real worldukc4
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousingukc4
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Dw Concepts
Dw ConceptsDw Concepts
Dw Conceptsdataware
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3ambujm
 

What's hot (20)

Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architecture
 
Seminar datawarehousing
Seminar datawarehousingSeminar datawarehousing
Seminar datawarehousing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Introduction Data warehouse
Introduction Data warehouseIntroduction Data warehouse
Introduction Data warehouse
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Data Warehousing - in the real world
Data Warehousing - in the real worldData Warehousing - in the real world
Data Warehousing - in the real world
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Real World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data WarehousingReal World Business Intelligence and Data Warehousing
Real World Business Intelligence and Data Warehousing
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing ppt
Data warehousing pptData warehousing ppt
Data warehousing ppt
 
Dw Concepts
Dw ConceptsDw Concepts
Dw Concepts
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
11666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect311666 Bitt I 2008 Lect3
11666 Bitt I 2008 Lect3
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Warehouse 101
Data Warehouse 101Data Warehouse 101
Data Warehouse 101
 

Similar to Data warehouse system and its concepts

Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introductionMurli Jha
 
data warehousing
data warehousingdata warehousing
data warehousing143sohil
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptxAnusuya123
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingSamraiz Tejani
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouseSwapnilSaurav7
 
Module 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxModule 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxnikshaikh786
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB plc
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB plc
 

Similar to Data warehouse system and its concepts (20)

Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Introduction to Data warehouse
Introduction to Data warehouseIntroduction to Data warehouse
Introduction to Data warehouse
 
Module 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptxModule 1_Data Warehousing Fundamentals.pptx
Module 1_Data Warehousing Fundamentals.pptx
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStore
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 

Recently uploaded

Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 

Recently uploaded (20)

Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 

Data warehouse system and its concepts

  • 2. Table of Contents • Data warehouse • Data Warehouse vs. Operational DBMS • OLTP vs. OLAP • Why Need Separate Data Warehouse? • Data Mart • Data warehouse Architecture • Data warehouse backend process • Metadata Repository • OLAP Server Architecture • Schemas for multidimensional data • Partitioning the Data warehouse
  • 3. What is Data Warehouse? • Defined in many different ways, but not rigorously. – A decision support database that is maintained separately from the organization’s operational database – Support information processing by providing a solid platform of consolidated, historical data for analysis. • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmno • Data warehousing: – The process of constructing and using data warehouses Gaurav Garg (AITM, Palwal)
  • 4. Data Warehouse—Subject-Oriented • Organized around major subjects, such as customer, product, sales • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process Gaurav Garg (AITM, Palwal)
  • 5. Data Warehouse—Integrated • Constructed by integrating multiple, heterogeneous data sources – relational databases, flat files, on-line transaction records • Data cleaning and data integration techniques are applied. – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources • E.g., Hotel price: currency, tax, breakfast covered, etc. – When data is moved to the warehouse, it is converted. Gaurav Garg (AITM, Palwal)
  • 6. Data Warehouse—Time Variant • The time horizon for the data warehouse is significantly longer than that of operational systems – Operational database: current value data – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) • Every key structure in the data warehouse – Contains an element of time, explicitly or implicitly – But the key of operational data may or may not contain “time element” Gaurav Garg (AITM, Palwal)
  • 7. Data Warehouse vs. Operational DBMS • OLTP (on-line transaction processing) – Major task of traditional relational DBMS – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. • OLAP (on-line analytical processing) – Major task of data warehouse system – Data analysis and decision making • Distinct features (OLTP vs. OLAP): – User and system orientation: customer vs. market – Data contents: current, detailed vs. historical, consolidated – Database design: ER + application vs. star + subject – View: current, local vs. evolutionary, integrated – Access patterns: update vs. read-only but complex queries Gaurav Garg (AITM, Palwal)
  • 8. OLTP vs. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated historical, summarized, multidimensional integrated, consolidated usage repetitive ad-hoc access read/write index/hash on prim. key lots of scans unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response Gaurav Garg (AITM, Palwal)
  • 9. Why Separate Data Warehouse? • High performance for both systems – DBMS— tuned for OLTP: access methods, indexing, concurrency control, recovery – Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view, consolidation • Different functions and different data: – missing data: Decision support requires historical data which operational DBs do not typically maintain – data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources – data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled Gaurav Garg (AITM, Palwal)
  • 10. Data Warehouse Usage • Three kinds of data warehouse applications – Information processing • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs – Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations, slice-dice, drilling, pivoting – Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools Gaurav Garg (AITM, Palwal)
  • 11. Data Mart • Data marts are partitions of the overall data warehouse. • A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance, or Marketing • Data marts may contain some overlapping data. E.g. a store sales data mart, would also need some data from inventory and payroll. Types of data marts Dependent data marts draw data from a central data warehouse that has already been created. Independent data marts, are standalone systems built by drawing data directly from operational or external sources of data, or both. The main difference between these is how you get data out of the sources and into the data mart. This step, called the Extraction-Transformation- and Loading (ETL) process, involves moving data from operational systems, filtering it, and loading it into the data mart. Gaurav Garg (AITM, Palwal)
  • 12. A 3-tier Data warehouse Architecture Data Warehouse Extract Transform Load Refresh OLAP Engine Analysis Query Reports Data mining Monitor & Integrator Metadata Data Sources Front-End Tools Serve Data Marts Operational DBs Other sources Data Storage OLAP Server Gaurav Garg (AITM, Palwal)
  • 13. Data warehouse backend process Data extraction It’s a process of extracting data for the warehouse from various sources. Data cleaning Its an essential operation for construction of quality data warehouse. Because a large volume of data from heterogeneous sources are involved,so there is high probability of errors in the data. The data cleaning process include • Using transformation rules e.g. translating attribute name like ‘age’ to ‘DOB’ • Using domain specific knowledge • Performing parsing and fuzzy matching e.g for multiple data sources, one can designate a preferred source as a matching standard • Auditing it is difficult and costly to clean the data. Gaurav Garg (AITM, Palwal)
  • 14. Data Transformation It’s a process of transforming heterogeneous data to an uniform structure so that data can be combined and integrated .i.e convert data from legacy or host format to warehouse format. Data Loading For loading correct, refined, processed data( huge in size) into data warehouse, there are some data loading strategies. Batch loading. Sequential loading. Incremental loading. Refresh The process of updating the data. Gaurav Garg (AITM, Palwal)
  • 15. Metadata Repository • Meta data is the data defining warehouse objects. It stores: • Description of the structure of the data warehouse – schema, view, dimensions, hierarchies, derived data defn, data mart locations and contents • Operational meta-data – data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails) • The algorithms used for summarization • The mapping from operational environment to the data warehouse • Data related to system performance – warehouse schema, view and derived data definitions • Business data – business terms and definitions, ownership of data, charging policies Gaurav Garg (AITM, Palwal)
  • 16. OLAP Server Architecture • Relational OLAP (ROLAP) – Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middle ware – Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services – Greater scalability • Multidimensional OLAP (MOLAP) – Sparse array-based multidimensional storage engine – Fast indexing to pre-computed summarized data • Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer) – Flexibility, e.g., low level: relational, high-level: array • Specialized SQL servers (e.g., Redbricks) – Specialized support for SQL queries over star/snowflake schemas Gaurav Garg (AITM, Palwal)
  • 17. Multidimensional Data • Sales volume as a function of product, month, and region Product Month Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day Gaurav Garg (AITM, Palwal)
  • 18. Multidimensional Data Model “what is a data cube?” “what do you mean by dimensions?” Data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts. Dimensions are the perspectives or entities with respect to which an organizations wants to keep records. Each dimension may have a table associated with it, called a dimension table. A dimension table for the item may contain the attributes item_name, brand etc. A multidimensional data model is typically organized around a central theme, like sales. These theme is represented by fact table. Facts are numerical measure. Examples of facts for a sales data warehouse include dollars_sold,unit_sold,amount_budgeted. Facts table contains the names of the facts, or measures, as well as keys to each related dimension tables. Gaurav Garg (AITM, Palwal)
  • 19. Schemas for multidimensional data These multidimensional model can exist in the form of a star schema, a snowflake schema, or a fact constellation schema (galaxy schema). Star Shcema It is the most common schema type in which the data warehouse contains a large central table (fact table) which contains the bulk of data, with no redundancy, and a set of smaller attendant tables (dimension tables), one for each dimension. The dimension tables displayed a radial pattern around the central fact table. Snowflake schema It is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake. Gaurav Garg (AITM, Palwal)
  • 20. Example of Star Schema time_key day day_of_the_week month quarter year time location_key street city state_or_province country location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch Gaurav Garg (AITM, Palwal)
  • 21. Example of Snowflake Schema time_key day day_of_the_week month quarter year time location_key street city_key location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_key item branch_key branch_name branch_type branch supplier_key supplier_type supplier city_key city state_or_province country city Gaurav Garg (AITM, Palwal)
  • 22. Fact constellation (galaxy schema) Some sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or fact constellation. Gaurav Garg (AITM, Palwal)
  • 23. Example of Fact Constellation time_key day day_of_the_week month quarter year time location_key street city province_or_state country location Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales Measures item_key item_name brand type supplier_type item branch_key branch_name branch_type branch Shipping Fact Table time_key item_key shipper_key from_location to_location dollars_cost units_shipped shipper_key shipper_name location_key shipper_type shipper Gaurav Garg (AITM, Palwal)
  • 24. Partitioning the Data warehouse Data warehouses often contain large tables and require techniques both for managing these large tables and for providing good query performance across these large tables. Partitioning have many advantages, some are as Scalability Partitioning helps scaling a data warehouse by dividing database objects into smaller pieces, enabling access to smaller, more manageable objects. Having direct access to smaller objects . Performance Good performance is a key to success for a data warehouse. Analyses run against the database should return within a reasonable amount of time, even if the queries access large amounts of data in tables that are terabytes in size. Partitioning increases the performance of data warehouse. Note: partitions of database are generally known as clusters. Gaurav Garg (AITM, Palwal)
  • 25. Types of partitioning • Basically it has two types 1. Horizontal partitioning 2. Vertical partitioning There are two main categories of partitioning algorithms 1. k-means algorithm, where each cluster is represented by the center of gravity of the cluster. 2. K-medoid algorithm, where each cluster is represented by one of the objects of the cluster located near the center. Gaurav Garg (AITM, Palwal)
  • 26. Some general methods of partitioning Range Partitioning Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. For example, you might want to partition sales data into monthly partitions. Hash Partitioning Hash partitioning maps data to partitions based on a hashing algorithm The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size. List Partitioning List partitioning enables you to explicitly control how rows map to partitions. This is different from range partitioning, where a range of values is associated with a partition and with hash partitioning, where you have no control of the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way. Gaurav Garg (AITM, Palwal)