Data Warehouse
A Practitioner’s Overview
Praveen Kumar B S
8 July 2019
1
2
This pack is put together for a 2-hour
session for final-year students on the
topic of Data Warehouse.
The contents are the author’s own view
as a practitioner with no claim to any
originality, accuracy or completeness.
The narrative is intended to help the
reader gain a brief insight into what
goes into the design of a modern data
platform along with considerations
and constraints.
As the author, I am keen to hear your
comments. Please send your
feedback to pravbs@gmail.com
3
Let us look at
a typical data
model for a
ticket
reservation
system
1
16
As a Traveler, I want to be able to reserve a ticket
4
3NF data
model for
ticket
reservation
1. Data model is friendly for
CREATE, READ, UPDATE
and DELETE
2. Most importantly, UPDATES
are limited to, say, one
group of tables
3. An event is described by
using JOINs to merge
data from two or more
tables
4. INDEXES are used on
tables to improve query
performance
5. Such systems are
bracketed as Transaction
Processing Systems (OLTP)
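The points above can be sketched with Python's built-in sqlite3 module. The slide's data model diagram is not reproduced in this text, so the table and column names (traveler, trip, reservation) are hypothetical; treat this as a minimal illustration rather than the author's schema.

```python
import sqlite3

# Hypothetical 3NF schema; names are assumptions, not taken from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traveler (traveler_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE trip (trip_id INTEGER PRIMARY KEY, origin TEXT,
                       destination TEXT, travel_date TEXT);
    CREATE TABLE reservation (
        reservation_id INTEGER PRIMARY KEY,
        traveler_id INTEGER NOT NULL REFERENCES traveler(traveler_id),
        trip_id INTEGER NOT NULL REFERENCES trip(trip_id),
        seats INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'BOOKED'
    );
    -- Point 4: an index on a frequently joined column.
    CREATE INDEX idx_reservation_trip ON reservation(trip_id);
""")

# CREATE: a booking touches one small group of tables.
conn.execute("INSERT INTO traveler VALUES (1, 'Asha')")
conn.execute("INSERT INTO trip VALUES (10, 'Bengaluru', 'Mysuru', '2019-07-12')")
conn.execute("INSERT INTO reservation (traveler_id, trip_id, seats) VALUES (1, 10, 2)")

# Point 2: a cancellation updates a single row in a single table.
conn.execute("UPDATE reservation SET status = 'CANCELLED' WHERE reservation_id = 1")

# Point 3: describing the event needs JOINs across the normalized tables.
booking = conn.execute("""
    SELECT t.name, tr.origin, tr.destination, r.seats, r.status
    FROM reservation r
    JOIN traveler t ON t.traveler_id = r.traveler_id
    JOIN trip tr ON tr.trip_id = r.trip_id
    WHERE r.reservation_id = 1
""").fetchone()
print(booking)
```

Each fact is stored once, which keeps writes cheap; the price is that every read of a complete booking has to reassemble it with JOINs.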
5
The business
is interested
in improving
performance,
by asking
questions
2
16
Answering Performance Queries
6
Which provider has
cancelled trips the most?
What is the average
number of seats booked
by customers for a trip?
What customer age
brackets prefer using my
platform?
What is the coverage of
cities in a given region?
What is the impact of
high peak day pricing on
booking cancellation?
1. 3NF data models are non-
performant for analysis
2. Aggregation queries are
long running and impact
performance of regular
queries negatively
3. It is common to separate
systems that are customer
facing vs business facing
4. Data duplication through
copying is a common form
of segregation
5. Choosing timing and
frequency of copying is a
well thought out decision
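As an illustration of the business questions above, the sketch below answers two of them with aggregation queries using Python's sqlite3 module. The schema, provider names and sample rows are all hypothetical, and "average seats" is interpreted here as the average per reservation, which is only one possible reading of the question.

```python
import sqlite3

# Hypothetical tables and data, used only to illustrate aggregation queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE provider (provider_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE trip (trip_id INTEGER PRIMARY KEY, provider_id INTEGER, status TEXT);
    CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY,
                              trip_id INTEGER, seats INTEGER);
""")
conn.executemany("INSERT INTO provider VALUES (?, ?)",
                 [(1, "RedLine"), (2, "BlueBus")])
conn.executemany("INSERT INTO trip VALUES (?, ?, ?)",
                 [(10, 1, "CANCELLED"), (11, 1, "CANCELLED"),
                  (12, 2, "COMPLETED"), (13, 2, "CANCELLED")])
conn.executemany("INSERT INTO reservation VALUES (?, ?, ?)",
                 [(100, 12, 2), (101, 12, 4), (102, 13, 3)])

# "Which provider has cancelled trips the most?"
top_canceller = conn.execute("""
    SELECT p.name, COUNT(*) AS cancelled
    FROM trip t
    JOIN provider p ON p.provider_id = t.provider_id
    WHERE t.status = 'CANCELLED'
    GROUP BY p.name
    ORDER BY cancelled DESC
    LIMIT 1
""").fetchone()

# "What is the average number of seats booked?" (per reservation, by assumption)
avg_seats = conn.execute("SELECT AVG(seats) FROM reservation").fetchone()[0]

print(top_canceller, avg_seats)
```

Both queries scan and group whole tables, which is why running them on the customer-facing database competes with regular bookings.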
7
Workloads
are
segregated
across
systems
3
16
Division of Responsibilities
8
Ticket
Reservation
System
customers
Business
Performance
Analysis
executives
Data Copy
1. Separating workloads
ensures optimal response
time and/or throughput
2. A variety of means is
adopted to copy data
between systems
3. However, a time delay is
introduced between an
event and its analysis
4. Data structures for a
performance analysis
system are de-normalized
5. Design of data copy
mechanism is a trade-off
between acceptable time
delay and analysis needs
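A minimal sketch of a batch data copy follows, assuming two separate SQLite databases stand in for the reservation system and the analysis system. The table names and the copy_batch helper are hypothetical. It shows both the de-normalized target table and the time delay between an event and its analysis.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")      # stands in for the reservation system
analysis = sqlite3.connect(":memory:")  # stands in for the analysis system

oltp.executescript("""
    CREATE TABLE trip (trip_id INTEGER PRIMARY KEY, city TEXT, travel_date TEXT);
    CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY,
                              trip_id INTEGER, seats INTEGER);
    INSERT INTO trip VALUES (10, 'Mysuru', '2019-07-12'), (11, 'Chennai', '2019-07-12');
    INSERT INTO reservation VALUES (100, 10, 2), (101, 10, 1), (102, 11, 4);
""")
# De-normalized target: trip attributes are repeated on every booking row.
analysis.execute("""CREATE TABLE booking_flat (
    reservation_id INTEGER PRIMARY KEY, city TEXT, travel_date TEXT, seats INTEGER)""")

def copy_batch(source, target):
    """Join in the source, then load the flattened rows into the target."""
    rows = source.execute("""
        SELECT r.reservation_id, t.city, t.travel_date, r.seats
        FROM reservation r JOIN trip t ON t.trip_id = r.trip_id
    """).fetchall()
    target.executemany("INSERT OR REPLACE INTO booking_flat VALUES (?, ?, ?, ?)", rows)
    target.commit()
    return len(rows)

def seats_for(city):
    """Analysis query: no JOIN needed on the flat table."""
    return analysis.execute(
        "SELECT SUM(seats) FROM booking_flat WHERE city = ?", (city,)).fetchone()[0]

copied = copy_batch(oltp, analysis)

# A booking made after the copy is invisible to analysis until the next batch.
oltp.execute("INSERT INTO reservation VALUES (103, 11, 2)")
stale_seats = seats_for("Chennai")

copy_batch(oltp, analysis)
fresh_seats = seats_for("Chennai")
print(copied, stale_seats, fresh_seats)
```

How often copy_batch runs is the trade-off named in point 5: more frequent copies shorten the delay but put more load on the source system.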
Online
Transaction
Processing
System
Do I have tickets for this
Friday?
Will an additional bus be
filled-up this Friday?
Expected
response
time in
milliseconds
Inference
possibly after
a day long
analysis
Decision
Support
Systems
9
Analysis is
always done
on one or
more
dimensions
4
16
Dimensions help gain Insights for Decision Making
10
1. Dimensions help pivot a
simple fact or a
computed fact / measure
2. Using multiple dimensions
allows comparison, for
example
3. Charts plotted using
dimensions help visualize
and gain better insights
4. Good insights in turn
allow business to make
better decisions
5. Success of decision
support systems depends
on the dimensions they support
for analysis
* All images are courtesy of their original owners. The intent is only to provide examples. The author does not claim any responsibility or credit
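To make points 1 and 2 concrete, here is a small sketch that pivots a computed measure (cancellation rate) on one dimension and then on two. The bookings, the dimension names age_bracket and peak_day, and the cancellation_rate helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical bookings; each row carries two dimensions and one raw fact.
bookings = [
    {"age_bracket": "18-30", "peak_day": True, "cancelled": True},
    {"age_bracket": "18-30", "peak_day": True, "cancelled": False},
    {"age_bracket": "18-30", "peak_day": False, "cancelled": False},
    {"age_bracket": "31-50", "peak_day": True, "cancelled": True},
    {"age_bracket": "31-50", "peak_day": False, "cancelled": False},
]

def cancellation_rate(rows, dimensions):
    """Compute cancelled / total for every combination of the given dimensions."""
    counts = defaultdict(lambda: [0, 0])  # key -> [cancelled, total]
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        counts[key][0] += int(row["cancelled"])
        counts[key][1] += 1
    return {key: cancelled / total for key, (cancelled, total) in counts.items()}

# One dimension: does peak-day pricing go with more cancellations?
by_peak = cancellation_rate(bookings, ["peak_day"])

# Two dimensions: compare age brackets within peak and non-peak days.
by_peak_and_age = cancellation_rate(bookings, ["peak_day", "age_bracket"])

print(by_peak)
print(by_peak_and_age)
```

The same measure answers different questions depending on which dimensions it is grouped by, which is why the set of supported dimensions limits what a decision support system can reveal.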
11
Decisions
might need
inputs from
multiple
sources
5
16
Decisions Impact Multiple Business Processes
12
1. Continuous optimization
requires continuous
analysis and decisions
2. Each department
potentially has its own
transaction system
3. Common functions across
departments might be
hosted on a core system,
e.g., ERP*
4. Integrated view of data is
necessary for analyzing
performance
5. A decision support system
can hence hold a copy of
data from all systems
* Enterprise Resource Planning
An Organization can be represented as a value chain as below
13
Drilling down
into making of
a decision
support
system
6
16
Components of a Decision Support System
14
Business Performance Analysis
Source
systems
Data
transfer
systems
File
transfer
Streaming
.
.
.
Extract + Load
Staging
Data
Warehouse
Data
Marts
Transform + Load
Transform + Load
1. Performance analysis and
decision support need
data with high fidelity*
2. Ecosystem of decision
support is an assembly of
data flows and data
storage / persistence
3. Large enterprises typically
use off-the-shelf products
to achieve objectives
4. Data warehouse is usually
mandated to store a single
version of truth
5. Path to decision making is
continuously evolving,
influencing the entire chain
* Data fidelity means that as data travels from the
point of origination to consumption, it retains its
granularity and meaning.
Decision Makers
Reports /
Business Intelligence /
Data Analytics
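The flow from source systems through staging, data warehouse and data marts can be sketched as three small functions. Everything here is a simplified assumption: real pipelines typically use off-the-shelf tools, as point 3 notes, and the field names and cleaning rules are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from a source system (all text, untidy).
source_rows = [
    {"booking_id": "1", "city": " mysuru", "seats": "2"},
    {"booking_id": "2", "city": "Mysuru ", "seats": "3"},
    {"booking_id": "3", "city": "chennai", "seats": "1"},
]

def extract_load(rows):
    """Extract + Load: land source rows in staging unchanged."""
    return [dict(row) for row in rows]

def transform_load_warehouse(staging):
    """Transform + Load: clean and type the rows, keeping one row per booking
    so that granularity (data fidelity) is preserved."""
    return [
        {
            "booking_id": int(row["booking_id"]),
            "city": row["city"].strip().title(),
            "seats": int(row["seats"]),
        }
        for row in staging
    ]

def transform_load_mart(warehouse):
    """Transform + Load: summarize for one specific analysis need."""
    seats_by_city = defaultdict(int)
    for row in warehouse:
        seats_by_city[row["city"]] += row["seats"]
    return dict(seats_by_city)

staging = extract_load(source_rows)
warehouse = transform_load_warehouse(staging)
mart = transform_load_mart(warehouse)
print(mart)
```

Only the mart is aggregated; the warehouse keeps booking-level detail so that a different mart can be derived later without going back to the source.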
15
Further drill
down into a
data
warehouse
7
16
Organizing a Data Warehouse
16
1. Data from processes and
events is captured and
published into the data
warehouse
2. Data from a business
process can span
multiple OLTP databases
3. Physical tables in a data
warehouse are organized
into subject areas
4. Inter-table relationships
are pre-established to
enforce data quality
5. For speed of analysis, key
facts are computed and
stored as well
Business Processes and Events
Subject Areas
Tables and Relationships
Facts and data about processes
Process examples
Customer on-boarding
Order management
Product fulfillment
Payment
Subject area examples
Customers
Orders
Products
Sales
17
The schema
that captures
information
8
16
Facts and Dimensions
18
1. Replicating OLTP database
schema is sufficient for
many query needs
2. Design of schema is a
function of READ needs
3. Example schema did not
need any aggregation or
inference
4. However, some
enrichment on the
dimension improves READ-
ability
5. Traceability of facts is an
important requirement
for data warehouse tables
Following is just one way of arranging facts and describing them with dimensions*
*This is the famous STAR SCHEMA. However, if Trip_Dimension is
linked with additional dimension tables, it soon becomes a
SNOWFLAKE SCHEMA
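Since the schema diagram is not reproduced in this text, the following is a hypothetical star schema built with Python's sqlite3 module. Only the name Trip_Dimension comes from the slide; the date dimension, the fact table and all columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trip_dimension (
        trip_key INTEGER PRIMARY KEY, origin TEXT, destination TEXT, provider TEXT);
    CREATE TABLE date_dimension (
        date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT,
        is_weekend INTEGER);  -- enrichment that the OLTP system does not store
    CREATE TABLE reservation_fact (
        reservation_id INTEGER PRIMARY KEY,  -- traces back to the OLTP row
        trip_key INTEGER REFERENCES trip_dimension(trip_key),
        date_key INTEGER REFERENCES date_dimension(date_key),
        seats INTEGER, amount REAL);

    INSERT INTO trip_dimension VALUES
        (1, 'Bengaluru', 'Mysuru', 'RedLine'),
        (2, 'Bengaluru', 'Chennai', 'BlueBus');
    INSERT INTO date_dimension VALUES
        (20190712, '2019-07-12', '2019-07', 0),
        (20190713, '2019-07-13', '2019-07', 1);
    INSERT INTO reservation_fact VALUES
        (1, 1, 20190712, 2, 400.0),
        (2, 1, 20190713, 3, 750.0),
        (3, 2, 20190713, 1, 300.0);
""")

# Every analysis query is one hop from the fact table to a dimension.
seats_by_weekend = conn.execute("""
    SELECT d.is_weekend, SUM(f.seats)
    FROM reservation_fact f
    JOIN date_dimension d ON d.date_key = f.date_key
    GROUP BY d.is_weekend
    ORDER BY d.is_weekend
""").fetchall()

revenue_by_provider = dict(conn.execute("""
    SELECT t.provider, SUM(f.amount)
    FROM reservation_fact f
    JOIN trip_dimension t ON t.trip_key = f.trip_key
    GROUP BY t.provider
""").fetchall())

print(seats_by_weekend, revenue_by_provider)
```

Moving provider out of trip_dimension into its own table, referenced by a provider key, would turn this star into a snowflake.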
19
Heading
North or
South with
aggregated
or detailed
facts
9
16
Analysts Need a way to ROLL-UP or DRILL-DOWN
20
1. Roll-up and drill-down
across dimensions are key
capabilities of DSS
2. The more dimensions
there are, the better for
analysis and insights
3. OLTP system supplies
facts. Some dimension
attributes are inferred
4. Aggregations across a
hierarchy are either
pre-computed and stored or
dynamically ascertained
5. Pre-computation improves
performance at the cost
of data currency
i6
i5 i4
i3
i1 i2
Hierarchy
Roll-up: Summarizing
while traversing up the
hierarchy
Drill-down: Getting into
details while moving
down the hierarchy
measures
Place Dimension Time Dimension
drilldown roll-up
Facts or measures
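Roll-up and drill-down along a hierarchy can be sketched as grouping the same facts at different depths. The place hierarchy (country, state, city), the sample figures and the aggregate helper are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical place hierarchy, from the top level down to the most detailed.
PLACE_HIERARCHY = ["country", "state", "city"]

facts = [
    {"country": "India", "state": "Karnataka", "city": "Mysuru", "seats": 5},
    {"country": "India", "state": "Karnataka", "city": "Bengaluru", "seats": 7},
    {"country": "India", "state": "Tamil Nadu", "city": "Chennai", "seats": 4},
]

def aggregate(rows, depth):
    """Sum the measure at the given depth of the hierarchy.
    depth=1 is the most summarized level; larger depths add detail."""
    levels = PLACE_HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["seats"]
    return dict(totals)

by_city = aggregate(facts, 3)     # most detailed view
by_state = aggregate(facts, 2)    # roll-up: city -> state
by_country = aggregate(facts, 1)  # roll-up: state -> country

print(by_country)
print(by_state)
print(by_city)
```

Going from by_country to by_state to by_city is a drill-down; the reverse order is a roll-up. The three results could equally be stored in advance, which is the pre-computation option in point 4.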
21
The
abbreviation
soup
10
16
Multidimensional Modeling – The Data Cube
22
Source: Conceptual Modeling Solutions for the Data Warehouse – Stefano Rizzi DEIS – University of Bologna, Italy
1. Decision making is
enabled by cubes,
dimensions and measures
2. Decision activity is called
online analytical
processing – OLAP
3. ROLAP systems store data
in relational form and
create cubes at runtime
4. MOLAP systems store
precomputed data in the
form of multi-dimensional
cubes
5. HOLAP is a hybrid of both
ROLAP and MOLAP – just
enough aggregation
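The difference between computing at runtime and precomputing can be shown with a toy example. This is a conceptual sketch only: real ROLAP and MOLAP engines are far more elaborate, and the function names and data are assumptions.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "month")

def aggregate(rows, dims):
    """ROLAP-style: group the relational rows at query time."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["seats"]
    return dict(totals)

def build_cube(rows):
    """MOLAP-style: precompute totals for every combination of dimensions."""
    return {
        dims: aggregate(rows, dims)
        for n in range(len(DIMENSIONS) + 1)
        for dims in combinations(DIMENSIONS, n)
    }

rows = [
    {"city": "Mysuru", "month": "2019-06", "seats": 2},
    {"city": "Mysuru", "month": "2019-07", "seats": 3},
    {"city": "Chennai", "month": "2019-07", "seats": 4},
]
cube = build_cube(rows)  # 2 dimensions -> 4 precomputed groupings

# A new fact arrives after the cube was built.
rows.append({"city": "Chennai", "month": "2019-07", "seats": 1})

molap_answer = cube[("city",)][("Chennai",)]             # fast lookup, but stale
rolap_answer = aggregate(rows, ("city",))[("Chennai",)]  # recomputed, so current
print(molap_answer, rolap_answer)
```

A hybrid approach would precompute only the groupings that are queried most often and fall back to runtime aggregation for the rest.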
23
OLAP space is
continuously
evolving
11
16
A Play on Storage and Memory Technologies
24
Accessed https://en.wikipedia.org/wiki/Comparison_of_OLAP_servers on 6 Jul 2019
Continuous innovation is bound to change the product capabilities over time
A snapshot of different products and their *OLAP support
1. Reporting use cases need
authentic lineage, which
favours precomputing
2. The trend has been to
reduce precomputation to
eliminate data staleness
3. For use cases with near
real time data needs,
dynamic cube is useful
4. Raw facts are held on disk
and are aggregated at
runtime in memory
5. Speed of computing is
enhanced with parallel
processing using clusters
a
b
c
d
e
g
h
MOLAP – a → b → c d
ROLAP – e → h
HOLAP – a → b → e, g h
storage
memory
Cube
processing
Multi-
Dimensional
Data
Relational
Data
25
Who needs
cheese?
12
16
Business Need creates a case for Technology
26
Re-imagined from Providing OLAP to User-Analysts: An IT Mandate,
accessed on 6 July 2019
1. Data warehouses are
designed to cater to
business users
2. All the processing and
storage helps in effective
operations, tactics and
strategy
3. Most business users prefer
data for analysis in
Microsoft Excel
4. Data scientists use tools
such as SAS and MATLAB to
conduct experiments
5. Choice of technology is
evolving to satisfy
changing business needs
Operational
Tactical
Strategic
Formulaic
Goal-seeking: How can I increase sales of housing loans
in Tier 2 cities?
Contemplative
What-if: What is the effect of decreasing interest
rates on sales of housing loans?
Exegetical
Slice and Dice: Understand the impact of house prices
on housing loan uptake
Categorical
Explanatory: How many people opted for housing
loan during summer vacations?
Categories of Business Analysis
SQL
Dimensional
files
Data
Platform
Sample technologies used at various levels
Indicative technologies only, varies based on needs
and organizations
APIs
27
Plumbing from
the sources
13
16
Integration – Data, Transport and Frequencies
28
1. Primary purpose of
integration is to get data
to business users
2. Traditional methods
sourced data through files
and loaded into databases
3. Need for agility – sense
and act – increases use of
data directly from sources
4. Traditional extract-
transform-load is giving
way to data virtualization
5. All integration - data and
application – is converging
– to reduce latencies as
much as possible
Traditional Data
Warehouse
Real – Time Data
Warehouse
Logical Data Warehouse
Context – Independent
Data Warehouse
Data warehouse Use Cases*
Data Sources
Flat files
Relational
Databases
Message
Topics
Frequencies
Continuously
Updated Log
Files
Change Data
Periodic (Batch)
e.g. once in a day
Near real time
e.g. once in 1 minute
Pre-processed
Data
External Data
Real time
e.g. on business event
Direct access to
source
Data moved into
intermediate location
Data enriched and
precomputed
Data Access / Transport
Data anonymized
before use
* Definitions accessed here on 6 July 2019
Data made available
in-memory
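One common way to implement a periodic (batch) load of change data is to remember a high-water mark and pull only rows modified since the last run. The sketch below assumes each source row carries an updated_at timestamp in ISO format; that column, the state dictionary and the function names are hypothetical.

```python
def extract_changes(source_rows, watermark):
    """Return rows changed after the watermark, plus the new watermark.
    ISO-8601 timestamps of the same format compare correctly as strings."""
    changed = [row for row in source_rows if row["updated_at"] > watermark]
    new_watermark = max((row["updated_at"] for row in changed), default=watermark)
    return changed, new_watermark

source = [
    {"booking_id": 1, "status": "BOOKED", "updated_at": "2019-07-06T09:00:00"},
    {"booking_id": 2, "status": "BOOKED", "updated_at": "2019-07-06T09:05:00"},
]
warehouse = {}
state = {"watermark": "1970-01-01T00:00:00"}

def run_batch():
    """One scheduled run: pull only the changes and apply them to the target."""
    changed, state["watermark"] = extract_changes(source, state["watermark"])
    for row in changed:
        warehouse[row["booking_id"]] = dict(row)
    return len(changed)

first_run = run_batch()   # initial load picks up both rows

# A cancellation happens in the source between two runs.
source[0].update(status="CANCELLED", updated_at="2019-07-06T10:00:00")
second_run = run_batch()  # only the changed row is moved

print(first_run, second_run, warehouse[1]["status"])
```

Running the same function once a day, once a minute or on every business event corresponds to the periodic, near real time and real time frequencies listed above; only the schedule changes.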
29
Core of
everything is
data
14
16
It is Important to Understand Data
30
1. The old adage of quality
of output depending on
quality of input still holds
2. A continuously updated
metadata engine ensures
authenticity of outcomes
3. Changing dimensions can
impact data access
performance and accuracy
4. Speed of availability can
be a trade-off with
accuracy
5. Data governance driven
by data architecture is
key to managing sanctity
of data warehouse
Uniqueness and Relationships: Primary Key, Business Key, Surrogate Key, Foreign Key
Changing Dimensions:
Type 1 – overwritten in place with no history kept, e.g. a corrected date of birth
Type 2 – changes infrequently, e.g. manager; each version is stored as a new row with an effective start date
Type 3 – similar to Type 2, but stores both old and new values together in the same row
Historical Facts + Time Series: Facts that indicate an event or are influenced by an event, e.g. open and close price of a stock, over a period of time
Computed Measures: Facts that are aggregated, inferred, derived etc., for making sense of events
Missing Values: Facts that genuinely indicate absence of any value, or that were erroneously not captured for an event
Partly Unstructured: Information in textual format requiring parsing, tokenizing and context setting to help understand and derive insights
Graph: Datasets having a high degree of relationships, making relationships first-class citizens
Spatial Data: Datasets with multiple layers describing locations
Unstructured Data: Audio, video and images; usually accompanied by metadata
Each needs its own way of handling
Data can be held in a variety of formats - relational, XML, JSON, key-value, geospatial, graph
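The Type 2 and Type 3 changing dimensions described above can be sketched as follows, using the slide's own example of an employee's manager. The row layout, the end_date column and the function names are assumptions for illustration.

```python
from datetime import date

def apply_type2(history, employee_id, new_manager, effective):
    """Type 2: close the current row and append a new row with a start date."""
    for row in history:
        if row["employee_id"] == employee_id and row["end_date"] is None:
            if row["manager"] == new_manager:
                return history  # nothing changed, keep the current row open
            row["end_date"] = effective
    history.append({
        "employee_id": employee_id,
        "manager": new_manager,
        "start_date": effective,
        "end_date": None,  # None marks the current version
    })
    return history

def apply_type3(row, new_manager):
    """Type 3: keep the old and new values side by side in the same row."""
    if row["current_manager"] != new_manager:
        row["previous_manager"] = row["current_manager"]
        row["current_manager"] = new_manager
    return row

# Type 2 keeps the full history as separate rows.
history = []
apply_type2(history, 7, "Meera", date(2018, 1, 1))
apply_type2(history, 7, "Ravi", date(2019, 7, 1))

# Type 3 keeps only one previous value.
row = {"employee_id": 7, "current_manager": "Meera", "previous_manager": None}
apply_type3(row, "Ravi")

print(history)
print(row)
```

Type 2 lets a fact be joined to the manager who was in place when the fact occurred, at the cost of extra rows; Type 3 is cheaper but forgets anything older than the previous value.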
31
Need and
innovation have
thrown up
alternatives
15
16
Alternative Data Platforms
32
1. Corporate information
factory uses central
integration database as a
single version of truth
2. Data lake brings together
all sorts of data from
within and outside
3. Data vault design ensures
availability of all data –
both clean and unclean
4. Data archive (not
depicted) stores old data
for reference purposes
5. Technology choice is
dependent on perceived
benefits and cost
Corporate Information Factory Data Lake Data Vault
Data ingested into a
central integration
database in 3NF
Data marts for specific purpose
External
source
Data ingested into a central
cluster in raw format
Data accessed on demand
using variety of means
Data ingested into a
central database as-is
with time factor
Historical view of all data
across operational data stores
Allow enhanceability while
guaranteeing data accuracy
Allow insight creation merging
data from multiple sources
Allow data to be viewed as it
arrived rather than as it should
have arrived
33
You need not
spend all the
money
upfront
16
16
Data Platforms are Available as-a-Service
34
1. On-premises data sources
are connected to a cloud
service provider
2. Oracle, SAP, Snowflake,
Microsoft, Amazon etc.
provide data warehouse
as a service (DWaaS)
3. Key value proposition is
reduced capital expense
in procuring and setting
up infrastructure
4. An enterprise can quickly
acquire a data warehouse
software capability
5. Running costs and security
need to be monitored
Copied from https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
35
Thoughts to
Ponder
36
37
Thank You
More Related Content

What's hot

Mis2013 chapter 12 business intelligence and knowledge management
Mis2013   chapter 12 business intelligence and knowledge managementMis2013   chapter 12 business intelligence and knowledge management
Mis2013 chapter 12 business intelligence and knowledge managementAndi Iswoyo
 
Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...ijdms
 
Different types of data processing
Different types of data processingDifferent types of data processing
Different types of data processingShyam Sunder Budhwar
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroangshuman2387
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisIRJET Journal
 
Computer based information system
Computer based information systemComputer based information system
Computer based information systemshoaibzaheer1
 
Chapter 5 data resource management
Chapter 5 data resource managementChapter 5 data resource management
Chapter 5 data resource managementAG RD
 
Benefits of data_archiving_in_data _warehouses
Benefits of data_archiving_in_data _warehousesBenefits of data_archiving_in_data _warehouses
Benefits of data_archiving_in_data _warehousesSurendar Bandi
 
Introduction to Information System
Introduction to Information SystemIntroduction to Information System
Introduction to Information Systemshaylor_swift
 
Data Processing and its Types
Data Processing and its TypesData Processing and its Types
Data Processing and its TypesMuhammad Zubair
 
Book 1 chapter-2
Book 1 chapter-2Book 1 chapter-2
Book 1 chapter-2GTU
 
Databases
DatabasesDatabases
DatabasesUMaine
 
01 bus infosysoverview
01  bus infosysoverview01  bus infosysoverview
01 bus infosysoverviewvicson_4
 
6. ijece guideforauthors 2012_2 eidt sat
6. ijece guideforauthors 2012_2 eidt sat6. ijece guideforauthors 2012_2 eidt sat
6. ijece guideforauthors 2012_2 eidt satIAESIJEECS
 
Information,Knowledge,Business intelligence
Information,Knowledge,Business intelligenceInformation,Knowledge,Business intelligence
Information,Knowledge,Business intelligenceHiren Selani
 

What's hot (20)

Mis2013 chapter 12 business intelligence and knowledge management
Mis2013   chapter 12 business intelligence and knowledge managementMis2013   chapter 12 business intelligence and knowledge management
Mis2013 chapter 12 business intelligence and knowledge management
 
Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...Designing a Framework to Standardize Data Warehouse Development Process for E...
Designing a Framework to Standardize Data Warehouse Development Process for E...
 
Different types of data processing
Different types of data processingDifferent types of data processing
Different types of data processing
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 vero
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data Analysis
 
Unit 5
Unit 5 Unit 5
Unit 5
 
Computer based information system
Computer based information systemComputer based information system
Computer based information system
 
Chapter 5 data resource management
Chapter 5 data resource managementChapter 5 data resource management
Chapter 5 data resource management
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Business Analytics Unit III: Developing analytical talent
Business Analytics Unit III: Developing analytical talentBusiness Analytics Unit III: Developing analytical talent
Business Analytics Unit III: Developing analytical talent
 
Benefits of data_archiving_in_data _warehouses
Benefits of data_archiving_in_data _warehousesBenefits of data_archiving_in_data _warehouses
Benefits of data_archiving_in_data _warehouses
 
Introduction to Information System
Introduction to Information SystemIntroduction to Information System
Introduction to Information System
 
Data Processing and its Types
Data Processing and its TypesData Processing and its Types
Data Processing and its Types
 
Book 1 chapter-2
Book 1 chapter-2Book 1 chapter-2
Book 1 chapter-2
 
Ijebea14 267
Ijebea14 267Ijebea14 267
Ijebea14 267
 
Databases
DatabasesDatabases
Databases
 
Mis
MisMis
Mis
 
01 bus infosysoverview
01  bus infosysoverview01  bus infosysoverview
01 bus infosysoverview
 
6. ijece guideforauthors 2012_2 eidt sat
6. ijece guideforauthors 2012_2 eidt sat6. ijece guideforauthors 2012_2 eidt sat
6. ijece guideforauthors 2012_2 eidt sat
 
Information,Knowledge,Business intelligence
Information,Knowledge,Business intelligenceInformation,Knowledge,Business intelligence
Information,Knowledge,Business intelligence
 

Similar to Data Warehouse - A Practitioner's Overview

ERP and related technology
ERP and related technology ERP and related technology
ERP and related technology Usman Tariq
 
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)AYESHA JAVED
 
An Integrated ERP With Web Portal
An Integrated ERP With Web PortalAn Integrated ERP With Web Portal
An Integrated ERP With Web PortalTracy Morgan
 
Modern trends in information systems
Modern trends in information systemsModern trends in information systems
Modern trends in information systemsPreeti Sontakke
 
An Integrated ERP with Web Portal
An Integrated ERP with Web Portal An Integrated ERP with Web Portal
An Integrated ERP with Web Portal acijjournal
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyAnkita Dubey
 
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdf
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdfIT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdf
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdfHemaSenthil5
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
Warehouse Planning and Implementation
Warehouse Planning and ImplementationWarehouse Planning and Implementation
Warehouse Planning and ImplementationSHIKHA GAUTAM
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET Journal
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningNandakumar P
 
On multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingOn multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingJaspreet Issaj
 

Similar to Data Warehouse - A Practitioner's Overview (20)

ERP and related technology
ERP and related technology ERP and related technology
ERP and related technology
 
Bi
BiBi
Bi
 
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter1 of datawarehouse cs614(solution of exercise)
 
An Integrated ERP With Web Portal
An Integrated ERP With Web PortalAn Integrated ERP With Web Portal
An Integrated ERP With Web Portal
 
Advanced Database System
Advanced Database SystemAdvanced Database System
Advanced Database System
 
Modern trends in information systems
Modern trends in information systemsModern trends in information systems
Modern trends in information systems
 
H1803014347
H1803014347H1803014347
H1803014347
 
An Integrated ERP with Web Portal
An Integrated ERP with Web Portal An Integrated ERP with Web Portal
An Integrated ERP with Web Portal
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
 
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdf
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdfIT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdf
IT6010-BUSINESS-INTELLIGENCE-Question-Bank_watermark.pdf
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Data Mining
Data MiningData Mining
Data Mining
 
Business inteligence
Business inteligenceBusiness inteligence
Business inteligence
 
Warehouse Planning and Implementation
Warehouse Planning and ImplementationWarehouse Planning and Implementation
Warehouse Planning and Implementation
 
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock ExchangeIRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
IRJET- Analysis for EnhancedForecastof Expense Movement in Stock Exchange
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data Mining
 
On multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and queryingOn multi dimensional cubes of census data: designing and querying
On multi dimensional cubes of census data: designing and querying
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
April 2024 - Crypto Market Report's Analysis

Data Warehouse - A Practitioner's Overview

  • 1. Data Warehouse: A Practitioner’s Overview. Praveen Kumar B S, 8 July 2019
  • 2. This pack was put together for a 2-hour session for final-year students on the topic of Data Warehouse. The contents are the author’s own view as a practitioner, with no claim to originality, accuracy or completeness. The narrative is intended to help the reader gain a brief insight into what goes into the design of a modern data platform, along with its considerations and constraints. As the author, I am keen to hear your comments. Please send your feedback to pravbs@gmail.com
  • 3. Let us look at a typical data model for a ticket reservation system (1 of 16)
  • 4. As a Traveler, I want to be able to reserve a ticket: 3NF data model for ticket reservation
      1. The data model is friendly for CREATE, READ, UPDATE and DELETE
      2. Most importantly, UPDATEs are limited to, say, one group of tables
      3. An event is described by using JOINs to merge data from two or more tables
      4. INDEXES are used on tables to improve query performance
      5. Such systems are classified as Online Transaction Processing (OLTP) systems
  • 5. The business is interested in improving performance by asking questions (2 of 16)
  • 6. Answering Performance Queries
      Example questions: Which provider has cancelled trips the most? What is the average number of seats booked by customers for a trip? What customer age brackets prefer using my platform? What is the coverage of cities in a given region? What is the impact of peak-day pricing on booking cancellations?
      1. 3NF data models perform poorly for analysis
      2. Aggregation queries are long-running and slow down regular queries
      3. It is common to separate customer-facing systems from business-facing systems
      4. Duplicating data by copying it is a common form of segregation
      5. The timing and frequency of copying must be a well-thought-out decision
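To make the first question on the slide above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `provider` and `trip` tables, their columns and the sample rows are all hypothetical, since the slides do not show the actual schema. The point is only that even a simple business question needs a JOIN plus an aggregation over normalized tables.

```python
import sqlite3

# Hypothetical, heavily simplified 3NF schema (not taken from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider (
    provider_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE trip (
    trip_id     INTEGER PRIMARY KEY,
    provider_id INTEGER NOT NULL REFERENCES provider(provider_id),
    status      TEXT NOT NULL          -- 'RUN' or 'CANCELLED'
);
INSERT INTO provider VALUES (1, 'RedLine'), (2, 'BlueBus');
INSERT INTO trip VALUES (10, 1, 'RUN'), (11, 1, 'CANCELLED'),
                        (12, 2, 'CANCELLED'), (13, 2, 'CANCELLED');
""")

# "Which provider has cancelled trips the most?"
cancellations = conn.execute("""
    SELECT p.name, COUNT(*) AS cancelled_trips
    FROM trip AS t
    JOIN provider AS p ON p.provider_id = t.provider_id
    WHERE t.status = 'CANCELLED'
    GROUP BY p.name
    ORDER BY cancelled_trips DESC, p.name
""").fetchall()

print(cancellations)
```

On a production OLTP database the same query would scan and join far larger tables while customers are booking tickets, which is the contention that points 2 and 3 describe.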
  • 8. Division of Responsibilities
      Diagram: customers use the Ticket Reservation System (an Online Transaction Processing System), e.g. "Do I have tickets for this Friday?", with an expected response time in milliseconds. Executives use Business Performance Analysis (Decision Support Systems), e.g. "Will an additional bus be filled up this Friday?", with an inference possibly after a day-long analysis. A Data Copy links the two systems.
      1. Separating workloads lets each system be tuned for response time and/or throughput
      2. A variety of means is adopted to copy data between systems
      3. However, a time delay is introduced between an event and its analysis
      4. Data structures for a performance analysis system are de-normalized
      5. The design of the data copy mechanism is a trade-off between acceptable time delay and analysis needs
  • 9. Analysis is always done on one or more dimensions (4 of 16)
  • 10. Dimensions help gain Insights for Decision Making
      1. Dimensions help pivot a simple fact or a computed fact / measure
      2. Using multiple dimensions allows comparison, for example
      3. Charts plotted using dimensions help visualize data and gain better insights
      4. Good insights in turn allow the business to make better decisions
      5. The success of a decision support system depends on the dimensions it supports for analysis
      * All images are courtesy of their original owners. The intent is only to provide examples. The author does not claim any responsibility or credit.
  • 12. Decisions Impact Multiple Business Processes
      An organization can be represented as a value chain (shown as a diagram on the slide).
      1. Continuous optimization requires continuous analysis and decisions
      2. Each department potentially has its own transaction system
      3. Common functions across departments might be hosted on a core system, e.g. ERP*
      4. An integrated view of data is necessary for analyzing performance
      5. A decision support system can hence hold a copy of data from all systems
      * Enterprise Resource Planning
  • 13. Drilling down into the making of a decision support system (6 of 16)
  • 14. Components of a Decision Support System
      Diagram (Business Performance Analysis): Source systems, then Data transfer systems (File transfer, Streaming, ...), then Extract + Load into Staging, then Transform + Load into the Data Warehouse, then Transform + Load into Data Marts, then Reports / Business Intelligence / Data Analytics for Decision Makers.
      1. Performance analysis and decision support need data with high fidelity*
      2. The decision support ecosystem is an assembly of data flows and data storage / persistence
      3. Large enterprises typically use off-the-shelf products to achieve these objectives
      4. The data warehouse is usually mandated to store a single version of the truth
      5. The path to decision making is continuously evolving, influencing the entire chain
      * Data fidelity means that as data travels from the point of origination to consumption, it retains its granularity and meaning.
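The flow on the slide above (Extract + Load into staging, then Transform + Load into the warehouse and the marts) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the CSV export, the column names and the three function names are hypothetical, and a real platform would use dedicated tooling rather than in-memory lists.

```python
import csv
import io

# Hypothetical export from a source system; note the untidy ' 4 ' value.
RAW_EXPORT = """booking_id,trip_id,seats,booked_on
100,10,2,2019-07-05
101,10, 4 ,2019-07-05
102,11,3,2019-07-06
"""

def extract_and_load(raw_text):
    """Extract + Load: copy source rows into staging unchanged."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform_and_load(staging_rows):
    """Transform + Load: clean and type each row, keeping one row per booking."""
    return [
        {
            "booking_id": int(row["booking_id"]),
            "trip_id": int(row["trip_id"]),
            "seats": int(row["seats"].strip()),
            "booked_on": row["booked_on"],
        }
        for row in staging_rows
    ]

def build_mart(warehouse_rows):
    """Transform + Load: summarize seats per trip for one reporting need."""
    seats_per_trip = {}
    for row in warehouse_rows:
        seats_per_trip[row["trip_id"]] = seats_per_trip.get(row["trip_id"], 0) + row["seats"]
    return seats_per_trip

staging = extract_and_load(RAW_EXPORT)
warehouse = transform_and_load(staging)
mart = build_mart(warehouse)
print(mart)
```

Staging keeps the rows exactly as they arrived and the warehouse keeps one row per booking, so granularity survives until the mart, which is the only layer that summarizes.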
  • 15. Further drill-down into a data warehouse (7 of 16)
  • 16. Organizing a Data Warehouse
      Diagram: Business Processes and Events, then Subject Areas, then Tables and Relationships, then Facts and data about processes. Process examples: Customer on-boarding, Order management, Product fulfillment, Payment. Subject area examples: Customers, Orders, Products, Sales.
      1. Data from processes and events is captured and published into the data warehouse
      2. Data from a business process can span multiple OLTP databases
      3. Physical tables in a data warehouse are organized into subject areas
      4. Inter-table relationships are pre-established to enforce data quality
      5. For speed of analysis, key facts are computed and stored as well
  • 18. Facts and Dimensions
      Following is just one way of arranging facts and describing them with dimensions*
      1. Replicating the OLTP database schema is sufficient for many query needs
      2. The design of the schema is a function of READ needs
      3. The example schema did not need any aggregation or inference
      4. However, some enrichment of the dimensions improves readability
      5. Traceability of facts is an important requirement for data warehouse tables
      * This is the famous STAR SCHEMA. However, if Trip_Dimension is linked with additional dimension tables, it soon becomes a SNOWFLAKE SCHEMA.
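Since the schema image is not part of this transcript, the following is only a sketch of what a star schema for the ticket example might look like, again using `sqlite3`. Apart from the name `Trip_Dimension`, which the slide mentions, every table, column and row below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the facts (hypothetical columns).
CREATE TABLE trip_dimension (
    trip_key    INTEGER PRIMARY KEY,
    origin_city TEXT,
    provider    TEXT
);
CREATE TABLE date_dimension (
    date_key    INTEGER PRIMARY KEY,   -- e.g. 20190705
    day_name    TEXT,                  -- enrichment that aids readability
    is_peak_day INTEGER
);
-- The fact table sits at the centre of the star and points at each dimension.
CREATE TABLE booking_fact (
    booking_id   INTEGER PRIMARY KEY,  -- traceable back to the OLTP booking
    trip_key     INTEGER REFERENCES trip_dimension(trip_key),
    date_key     INTEGER REFERENCES date_dimension(date_key),
    seats_booked INTEGER
);
INSERT INTO trip_dimension VALUES (1, 'Bengaluru', 'RedLine'), (2, 'Mysuru', 'BlueBus');
INSERT INTO date_dimension VALUES (20190705, 'Friday', 1), (20190708, 'Monday', 0);
INSERT INTO booking_fact VALUES (100, 1, 20190705, 2),
                                (101, 1, 20190708, 4),
                                (102, 2, 20190705, 3);
""")

# Seats booked, pivoted on two dimensions: day of week and origin city.
rows = conn.execute("""
    SELECT d.day_name, t.origin_city, SUM(f.seats_booked) AS seats
    FROM booking_fact AS f
    JOIN trip_dimension AS t ON t.trip_key = f.trip_key
    JOIN date_dimension AS d ON d.date_key = f.date_key
    GROUP BY d.day_name, t.origin_city
    ORDER BY d.day_name, t.origin_city
""").fetchall()

print(rows)
```

If `provider` were moved out of `trip_dimension` into its own table referenced by a key, the design would start to become the snowflake schema that the footnote describes.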
  • 20. Analysts Need a way to ROLL-UP or DRILL-DOWN
      Roll-up: summarizing while traversing up the hierarchy. Drill-down: getting into details while moving down the hierarchy.
      Diagram labels: Hierarchy (levels i1 to i6), Place Dimension, Time Dimension, Facts or measures, roll-up, drill-down.
      1. Roll-up and drill-down across dimensions are key capabilities of a DSS
      2. The more dimensions there are, the better for analysis and for gaining insights
      3. The OLTP system supplies facts. Some dimension attributes are inferred
      4. Aggregations across a hierarchy are either pre-computed and stored or dynamically ascertained
      5. Pre-computation improves performance at the cost of data currency
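A small sketch of roll-up and drill-down along a place hierarchy, in plain Python. The two-level hierarchy (city to region), the sample figures and the `aggregate` helper are assumptions made for illustration; they are not taken from the slide.

```python
from collections import defaultdict

# Hypothetical place hierarchy: each city rolls up to a region.
CITY_TO_REGION = {"Bengaluru": "South", "Mysuru": "South", "Delhi": "North"}

# Facts supplied by the OLTP system: (city, month, seats_booked).
facts = [
    ("Bengaluru", "2019-06", 120),
    ("Mysuru", "2019-06", 40),
    ("Delhi", "2019-06", 90),
    ("Bengaluru", "2019-07", 150),
]

def aggregate(facts, place_level):
    """Sum seats per (place, month) at the requested level of the hierarchy."""
    totals = defaultdict(int)
    for city, month, seats in facts:
        place = city if place_level == "city" else CITY_TO_REGION[city]
        totals[(place, month)] += seats
    return dict(totals)

by_region = aggregate(facts, "region")  # roll-up: summarize up the hierarchy
by_city = aggregate(facts, "city")      # drill-down: back to city-level detail

print(by_region)
print(by_city)
```

Here the region is an inferred dimension attribute in the sense of point 3: the source facts only carry the city.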
  • 22. Multidimensional Modeling: The Data Cube
      Source: Conceptual Modeling Solutions for the Data Warehouse, Stefano Rizzi, DEIS, University of Bologna, Italy
      1. Decision making is enabled by cubes, dimensions and measures
      2. The decision activity is called online analytical processing (OLAP)
      3. ROLAP systems store data in relational form and create the cube at runtime
      4. MOLAP systems store precomputed data in the form of multi-dimensional cubes
      5. HOLAP is a hybrid of ROLAP and MOLAP, with just enough aggregation
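The difference between computing at runtime and precomputing a cube can be illustrated with a toy example. This is a sketch only: the dimension names, the sample rows and both function names are hypothetical, and real OLAP servers use far more elaborate storage and indexing.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "month", "provider")

facts = [
    {"city": "Bengaluru", "month": "2019-06", "provider": "RedLine", "seats": 120},
    {"city": "Bengaluru", "month": "2019-07", "provider": "BlueBus", "seats": 150},
    {"city": "Delhi", "month": "2019-06", "provider": "RedLine", "seats": 90},
]

def aggregate_at_runtime(facts, group_by):
    """ROLAP-style: scan the relational rows when the question is asked."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[dim] for dim in group_by)] += row["seats"]
    return dict(totals)

def precompute_cube(facts):
    """MOLAP-style: store totals for every combination of dimensions up front."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for group_by in combinations(DIMENSIONS, size):
            cube[group_by] = aggregate_at_runtime(facts, group_by)
    return cube

cube = precompute_cube(facts)

# A lookup in the stored cube gives the same answer as a runtime scan.
print(cube[("city",)])
print(aggregate_at_runtime(facts, ("city",)))
```

With three dimensions the cube already holds 2^3 = 8 groupings. That growth is one reason a hybrid approach precomputes only some of them.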
  • 24. A Play on Storage and Memory Technologies
      A snapshot of different products and their *OLAP support (table shown on the slide). Accessed https://en.wikipedia.org/wiki/Comparison_of_OLAP_servers on 6 Jul 2019. Continuous innovation is bound to change the product capabilities over time.
      Diagram labels: storage, memory, Cube processing, Multi-Dimensional Data, Relational Data, with paths a to h between them. MOLAP uses paths a, b, c, d; ROLAP uses e, h; HOLAP uses a, b, e, g, h.
      1. Reporting use cases need authentic lineage, thereby insisting on precomputing
      2. The trend has been to reduce precomputation to eliminate data staleness
      3. For use cases with near-real-time data needs, a dynamic cube is useful
      4. Raw facts are held on disk and are aggregated at runtime in memory
      5. Speed of computing is enhanced with parallel processing using clusters
  • 26. Business Need creates a case for Technology
      Re-imagined from "Providing OLAP to User-Analysts: An IT Mandate", accessed on 6 July 2019
      Levels: Operational, Tactical, Strategic
      Categories of Business Analysis:
      Formulaic (goal-seeking): How can I increase sales of housing loans in Tier 2 cities?
      Contemplative (what-if): What is the effect of decreasing interest rates on sales of housing loans?
      Exegetical (slice and dice): Understand the impact of house prices on housing loan take-up
      Categorical (explanatory): How many people opted for a housing loan during summer vacations?
      Sample technologies used at various levels: SQL, Dimensional files, Data Platform, APIs. Indicative technologies only; these vary based on needs and organizations.
      1. Data warehouses are designed to cater to business users
      2. All the processing and storage helps in effective operations, tactics and strategy
      3. Most business users prefer data for analysis in Microsoft Excel
      4. Data scientists use tools such as SAS and MATLAB to conduct experiments
      5. The choice of technology is evolving to satisfy changing business needs
  • 28. Integration: Data, Transport and Frequencies
      Data warehouse use cases*: Traditional Data Warehouse, Real-Time Data Warehouse, Logical Data Warehouse, Context-Independent Data Warehouse
      Data Sources: Flat files, Relational Databases, Message Topics, Continuously Updated Log Files, Change Data, Pre-processed Data, External Data
      Frequencies: Periodic (batch), e.g. once a day; Near real time, e.g. once a minute; Real time, e.g. on a business event
      Data Access / Transport: Direct access to source; Data moved into an intermediate location; Data enriched and precomputed; Data anonymized before use; Data made available in-memory
      1. The primary purpose of integration is to get data to business users
      2. Traditional methods sourced data through files and loaded it into databases
      3. The need for agility (sense and act) increases the use of data directly from sources
      4. Traditional extract-transform-load is giving way to data virtualization
      5. All integration, of data and of applications, is converging to reduce latencies as much as possible
      * Definitions accessed here on 6 July 2019
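One common way to implement a periodic (batch) copy of change data is to remember a high-water mark and pull only rows modified since the previous run. The slide does not prescribe this technique; the sketch below is an assumption, as are the `updated_at` column and the `extract_changes` function.

```python
from datetime import datetime

def extract_changes(source_rows, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    changed = [row for row in source_rows if row["updated_at"] > last_watermark]
    new_watermark = max((row["updated_at"] for row in changed), default=last_watermark)
    return changed, new_watermark

# Hypothetical source table with a last-modified timestamp on every row.
source_rows = [
    {"booking_id": 100, "status": "BOOKED", "updated_at": datetime(2019, 7, 5, 9, 0)},
    {"booking_id": 101, "status": "CANCELLED", "updated_at": datetime(2019, 7, 6, 18, 30)},
    {"booking_id": 102, "status": "BOOKED", "updated_at": datetime(2019, 7, 7, 8, 15)},
]

# First run picks up everything changed since the stored watermark.
first_batch, watermark = extract_changes(source_rows, datetime(2019, 7, 6, 0, 0))
# A second run with no new changes returns nothing and keeps the watermark.
second_batch, watermark_after = extract_changes(source_rows, watermark)

print([row["booking_id"] for row in first_batch], watermark)
print(second_batch, watermark_after)
```

Running the same function once a day or once a minute changes only the latency, which is the trade-off the Frequencies row on the slide lays out.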
  • 30. It is Important to Understand Data
      Uniqueness and Relationships: Primary Key, Business Key, Surrogate Key, Foreign Key
      Changing Dimensions: Type 1: no change, e.g. date of birth. Type 2: changes infrequently, e.g. manager, stored with an effective start date. Type 3: similar to Type 2, but stores both old and new values together in the same row
      Historical Facts + Time Series: Facts that indicate an event or are influenced by an event, e.g. open and close price of a stock over a period of time
      Computed Measures: Facts that are aggregated, inferred, derived etc., for making sense of events
      Missing Values: Facts that genuinely indicate the absence of any value, or that were erroneously not captured for an event
      Partly Unstructured: Information in textual format requiring parsing, tokenizing and context setting to help understand and derive insights
      Graph: Datasets having a high degree of relationships, making relationships first-class citizens
      Spatial Data: Datasets with multiple layers describing locations
      Unstructured Data: Audio, video and images; usually accompanied by metadata
      Each needs its own way of handling. Data can be held in a variety of formats: relational, XML, JSON, key-value, geospatial, graph.
      1. The old adage that quality of output depends on quality of input still holds
      2. A continuously updated metadata engine ensures authenticity of outcomes
      3. Changing dimensions can impact data access performance and accuracy
      4. Speed of availability can be a trade-off with accuracy
      5. Data governance driven by data architecture is key to managing the sanctity of the data warehouse
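The Type 2 case on the slide above (a manager that changes infrequently, stored with an effective start date) can be sketched as follows. The row layout and both function names are hypothetical. Closing the previous row with an `effective_to` date is a common convention that is assumed here; the slide itself mentions only the start date.

```python
from datetime import date

def apply_type2_change(history, employee_id, new_manager, effective_from):
    """Close the current row, if any, and append a new row for the new value."""
    current = next(
        (row for row in history
         if row["employee_id"] == employee_id and row["effective_to"] is None),
        None,
    )
    if current is not None:
        if current["manager"] == new_manager:
            return history                     # nothing changed, keep history as-is
        current["effective_to"] = effective_from
    history.append({
        "employee_id": employee_id,
        "manager": new_manager,
        "effective_from": effective_from,
        "effective_to": None,                  # None marks the current row
    })
    return history

def manager_on(history, employee_id, as_of):
    """Look up the manager that was in effect on a given date."""
    for row in history:
        if (row["employee_id"] == employee_id
                and row["effective_from"] <= as_of
                and (row["effective_to"] is None or as_of < row["effective_to"])):
            return row["manager"]
    return None

history = [{"employee_id": 7, "manager": "Asha",
            "effective_from": date(2018, 1, 1), "effective_to": None}]
apply_type2_change(history, 7, "Ravi", date(2019, 7, 1))

print(manager_on(history, 7, date(2018, 6, 1)))
print(manager_on(history, 7, date(2019, 7, 8)))
```

Because old rows are never overwritten, a fact recorded in 2018 can still be reported against the manager who was in place at that time.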
  • 31. Need and innovation have thrown up alternatives (15 of 16)
  • 32. Alternative Data Platforms
      Corporate Information Factory: Data ingested into a central integration database in 3NF. Data marts for specific purposes. Allows enhanceability while guaranteeing data accuracy.
      Data Lake: Data, including external sources, ingested into a central cluster in raw format. Data accessed on demand using a variety of means. Allows insight creation by merging data from multiple sources.
      Data Vault: Data ingested into a central database as-is with a time factor. Historical view of all data across operational data stores. Allows data to be viewed as it arrived rather than as it should have arrived.
      1. A corporate information factory uses a central integration database as a single version of the truth
      2. A data lake brings together all sorts of data from within and outside
      3. A data vault design ensures availability of all data, both clean and unclean
      4. A data archive (not depicted) stores old data for reference purposes
      5. Technology choice depends on perceived benefits and cost
  • 33. You need not spend all the money upfront (16 of 16)
  • 34. Data Platforms are Available as-a-Service
      1. On-premise data sources are connected to a cloud service provider
      2. Oracle, SAP, Snowflake, Microsoft, Amazon etc. provide the data warehouse as a platform (DWaaS)
      3. The key value proposition is reduced capital expense in procuring and setting up infrastructure
      4. An enterprise can quickly acquire a data warehouse software capability
      5. Running costs and security need to be monitored
      Copied from https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/