Data Warehousing Concepts
• What is Data Warehousing?
• Dimensional Data Model
• Star Schema
• Snowflake Schema
• Slowly Changing Dimension
• Conceptual Data Model
• Logical Data Model
• Physical Data Model
• Conceptual, Logical, and Physical Data Model
• Data Integrity
• What is OLAP
• MOLAP, ROLAP, and HOLAP
What is Data Warehousing?
Different people have different definitions for a data warehouse. The most popular
definition came from Bill Inmon, who provided the following:
"A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process."
Data warehousing can also be described as a process of transforming data into
information and making it available to users in a timely enough manner to make a
difference.
To summarize ...
• OLTP systems are used to “run” a business
• The data warehouse helps to “optimize” the business
Corporate Data
It includes
• human resource data
• financial data
• facilities data
• sales data
• marketing expense data
• production planning cost
• manufacturing cost
• service delivery cost
• inventory management
• shipping and payment data
What is enterprise-wide corporate data?
How is Business Intelligence applied in retail banking or in the retail industry?
KPIs (Key Performance Indicators)
A KPI can be used as a performance measurement tool.
KPIs in Retail Banking:
• Total cash deposits held in a month
• Average annual deposit held
• Average number of deposits per retail bank growth
• Average withdrawals made by each depositor
• Ratio of active to dormant depositors
• Average number of default borrowers in a year
• Average number of credit cards issued by the retail bank
• Rate of borrowing risk
• Rate of default risk
• Average number of customers served in a day
• Average number of closed bank accounts
KPIs (Key Performance Indicators)
KPIs in the Retail Industry:
• Sales compared to Budget & Target
• Sales compared to last year (or any other period)
• Wage cost recovery
• Average sale per customer/transaction
• Units per customer/transaction
• Sales per hour
• Sales & Gross Margin
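As a rough illustration of how a few of the retail KPIs above could be computed, the sketch below works on a handful of made-up transactions. The `Transaction` fields, the sample figures, and the `retail_kpis` helper are all assumptions introduced for this example; they are not part of the original material.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float  # sale value of the transaction
    units: int     # number of units sold in the transaction


# Hypothetical sample data, for illustration only.
transactions = [
    Transaction(amount=20.0, units=2),
    Transaction(amount=35.0, units=3),
    Transaction(amount=45.0, units=5),
]


def retail_kpis(txns, hours_open):
    """Compute three of the listed KPIs from a list of transactions."""
    total_sales = sum(t.amount for t in txns)
    total_units = sum(t.units for t in txns)
    return {
        "average_sale_per_transaction": total_sales / len(txns),
        "units_per_transaction": total_units / len(txns),
        "sales_per_hour": total_sales / hours_open,
    }


kpis = retail_kpis(transactions, hours_open=2)
print(kpis)
```

In a real warehouse these figures would normally come from aggregate queries over a sales fact table rather than from an in-memory list.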
KPIs (Key Performance Indicators)
Examples of common departmental KPIs
Sales Growth
Analyze the pace at which your organization's sales revenue is growing and use
that information in strategic decision-making.
Marketing
Analyze the pace at which your organization's sales revenue is growing and use
that information in strategic decision-making.
Financial
Measures your organization's financial health by analyzing readily available
resources that could be used to meet any short-term obligations.
Data Warehousing
Data Warehousing Architecture
Data Warehousing Environment
Data Quality
• Duplicate data
• Inconsistent values
• Missing data
• Unexpected use of fields
• Impossible or wrong values
Validations for Data Cleansing
• Data-type constraints
• Range constraints
• Mandatory constraints
• Unique constraints
• Set-membership constraints
• Foreign-key constraints
• Regular expression patterns
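The validation types named in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the column names (`account_id`, `balance`, `status`, `customer_id`, `email`), the allowed values, and the `validate` helper are hypothetical, and a real project would typically enforce most of these rules in the database or the ETL tool.

```python
import re

# Hypothetical reference data for the sketch.
ALLOWED_STATUSES = {"OPEN", "CLOSED"}   # set-membership constraint
KNOWN_CUSTOMER_IDS = {1, 2, 3}          # stands in for a parent table (foreign key)
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(rows):
    """Return a list of (row_index, message) pairs, one per failed check."""
    errors = []
    seen_account_ids = set()
    for i, row in enumerate(rows):
        account_id = row.get("account_id")
        # Mandatory constraint: the key column must be present.
        if account_id is None:
            errors.append((i, "mandatory: account_id is missing"))
        # Unique constraint: no two rows may share an account_id.
        elif account_id in seen_account_ids:
            errors.append((i, "unique: duplicate account_id"))
        else:
            seen_account_ids.add(account_id)

        balance = row.get("balance")
        # Data-type constraint: balance must be numeric (bool is excluded).
        if isinstance(balance, bool) or not isinstance(balance, (int, float)):
            errors.append((i, "data-type: balance is not numeric"))
        # Range constraint: this sketch assumes balances cannot be negative.
        elif balance < 0:
            errors.append((i, "range: balance is negative"))

        # Set-membership constraint: status must be one of the allowed values.
        if row.get("status") not in ALLOWED_STATUSES:
            errors.append((i, "set-membership: unknown status"))
        # Foreign-key constraint: customer_id must exist in the parent set.
        if row.get("customer_id") not in KNOWN_CUSTOMER_IDS:
            errors.append((i, "foreign-key: unknown customer_id"))
        # Regular expression pattern: a deliberately loose email shape check.
        if not EMAIL_PATTERN.match(str(row.get("email", ""))):
            errors.append((i, "pattern: malformed email"))
    return errors


rows = [
    {"account_id": 10, "balance": 250.0, "status": "OPEN",
     "customer_id": 1, "email": "a@example.com"},
    {"account_id": 10, "balance": -5, "status": "FROZEN",
     "customer_id": 9, "email": "not-an-email"},
]
errors = validate(rows)
for index, message in errors:
    print(index, message)
```

The first row passes every check; the second row fails the unique, range, set-membership, foreign-key, and pattern checks.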
Views to build warehouse
• The top-down view
• The data source view
• The data warehouse view
• The business query view
Which approach is better for designing a data warehouse?
Top Down Approach
Bottom Up Approach
Data Warehousing Design
• Requirement Gathering
• Physical Environment Setup
• Data Modeling
• ETL
• OLAP Cube Design
• Front End Development
• Report Development
• Performance Tuning
• Query Optimization
• Quality Assurance
• Rolling out to Production
• Production Maintenance
• Incremental Enhancements
Why Data Warehousing?
• Need to see the daily, weekly, monthly, and quarterly profit of each store.
• Need to compare sales and profit across various time periods.
• Need to compare sales across various time bands of the day.
• Need to know which product is in higher demand at which location.
• Need to study the trend of sales by time period of the day over the week, month, and year.
• Need to know on which day sales are highest.
Phases of Data Warehousing Project
1. Identify and collect requirements
• Need to see the daily, weekly, monthly, and quarterly profit of each store.
• Need to compare sales and profit across various time periods.
• Need to compare sales across various time bands of the day.
• Need to know which product is in higher demand at which location.
• Need to study the trend of sales by time period of the day over the week, month, and year.
• Need to know on which day sales are highest.
Who collects the requirements?
Requirements are collected by business analysts and leads.
Phases of Data Warehousing Project
2. Design the dimensional model
Pharmacy_Claims_Fact (fact table)
  Keys: Drug_Id (FK), Org_Id (FK), Practitioner_Id (FK), Product_Id (FK), Time_ID (FK),
    Claim_status_Id (FK), Provider_Id (FK), Subscriber_id (FK), Demographic_key (FK),
    InsuranceType_Id (FK)
  Other columns: Incurred_Date, Claim_Date, Claim_Settled_Date, Days_Supply, Dispensing_Fee,
    Incentive_Savings_Amount, Incentive_Fee_Paid_Amount, Amount_Claimed, Amount_Paid,
    Amount_Pending, Amount_Adjusted, CoPayment_Amount, CoInsurance_Amount, Deductible,
    Refill_Indicator, Claim_Production_Key, Claim_Production_Txn_No, Status_Change_Date,
    Last_Record_Flag
Practitioner (dimension)
  Practitioner_Id, Practitioner_Name, Practitioner_Type, practioner_type_desc, Qualification,
    Specialisation, ssn, Medical_Assoc_Enroll_No
Organisation (dimension)
  Org_Id, Org_prod_id, Org_Name, Address, City, County, State, Zip, Industry_Classification
Subscriber (dimension)
  Subscriber_id, Subscriber_prod_key, Member_prod_key, Member_Name, Date_of_Birth,
    Subscriber_type, Address, City, County, State, Zip, Hobby1, Hobby2, Smoker_YN,
    Alcoholic_YN, Pre_Existing_Ailments
Demographics (dimension)
  Demographic_key, Age_group, Income_group, Race, Country_of_birth, Marital_status, Gender,
    Citizenship_status
Provider (dimension)
  Provider_Id, Provider_Name, Provider_Type, Address, City, County, State, Zip, Service_Area,
    Netwrok_Provider
Insurance_Type (dimension)
  InsuranceType_Id, InsuranceType_Name, InsuranceType_Desc
Product (dimension)
  Product_Id, Product_Name, Product_Category, LoB
Claim_Status (dimension)
  Claim_status_Id, Claim_Status_Reason, Claim_stat_catg
Time (dimension)
  Time_ID, Day, Week, Month, Quarter, Year, Season
Drugs (dimension)
  Drug_Id, Drug_Name_Generic, Drug_Name_Trade, National_Drug_Code, Drug_Description,
    Drug_Category, Formulary, Manufacturer
The data model is designed by data modelers.
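To make the shape of this model more concrete, the sketch below builds a heavily reduced version of it in an in-memory SQLite database and runs a typical star-join query. Only a few columns of `Pharmacy_Claims_Fact`, `Drugs`, and `Time` are kept, the column types are guesses, and every inserted row is made-up sample data; none of this is taken from the original slides beyond the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced subset of the model: one fact table and two of its dimensions.
conn.executescript("""
CREATE TABLE Drugs (
    Drug_Id           INTEGER PRIMARY KEY,
    Drug_Name_Generic TEXT,
    Drug_Category     TEXT
);
CREATE TABLE "Time" (
    Time_ID INTEGER PRIMARY KEY,
    "Month" INTEGER,
    "Year"  INTEGER
);
CREATE TABLE Pharmacy_Claims_Fact (
    Drug_Id        INTEGER REFERENCES Drugs (Drug_Id),
    Time_ID        INTEGER REFERENCES "Time" (Time_ID),
    Amount_Claimed REAL,
    Amount_Paid    REAL
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Drugs VALUES (?, ?, ?)",
    [(1, "ibuprofen", "Analgesic"), (2, "amoxicillin", "Antibiotic")],
)
conn.executemany(
    'INSERT INTO "Time" VALUES (?, ?, ?)',
    [(1, 1, 2012), (2, 2, 2012)],
)
conn.executemany(
    "INSERT INTO Pharmacy_Claims_Fact VALUES (?, ?, ?, ?)",
    [(1, 1, 50.0, 40.0), (1, 2, 30.0, 30.0), (2, 1, 80.0, 60.0)],
)

# Star join: the fact table is joined to each dimension through its foreign key,
# then the measure is aggregated by dimension attributes.
query = """
SELECT d.Drug_Category, t."Year", SUM(f.Amount_Paid) AS total_paid
FROM Pharmacy_Claims_Fact AS f
JOIN Drugs  AS d ON d.Drug_Id = f.Drug_Id
JOIN "Time" AS t ON t.Time_ID = f.Time_ID
GROUP BY d.Drug_Category, t."Year"
ORDER BY d.Drug_Category
"""
paid_by_category = conn.execute(query).fetchall()
print(paid_by_category)
```

The query pattern stays the same as more dimensions are added: one join per foreign key, with grouping on whichever dimension attributes the report needs.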
Phases of Data Warehousing Project
3. Create and Maintain the tables
The database is maintained by DBAs.
Phases of Data Warehousing Project
4. Loading the data into Data Warehouse and Data Marts
Handled by the ETL team
What is ETL?
ETL stands for Extract, Transform, Load. Informatica is an ETL application.
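The three ETL stages can be sketched in a few lines of plain Python. This is not how Informatica works internally; it is only a minimal, hand-written illustration of the extract, transform, and load steps. The CSV layout, the `sales` table, and the function names are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data: inconsistent store names and one bad amount.
SOURCE_CSV = """store,sale_date,amount
Store 1,2012-01-01,100.50
 store 2 ,2012-01-01,80
Store 1,2012-01-02,not-a-number
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize names, convert types, drop rows that fail."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((row["store"].strip().title(), row["sale_date"], amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT store, sale_date, amount FROM sales ORDER BY store"
).fetchall()
print(loaded)
```

A production pipeline would usually log or quarantine rejected rows instead of silently skipping them.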
Phases of Data Warehousing Project
5. Develop Reports / Dashboards
Handled by the reporting team
Phases of Data Warehousing Project
6. Testing ETL Mappings and Reports / Dashboards
Handled by the QA department
7. Deploying to Production and Maintenance by the Production Team
Handled by the production department
Phases of Data Warehousing Project
Where do we fit after completing this training?
We can work as:
1. ETL Developer
2. ETL Administrator
3. ETL Tester
Data Modeling
What is Data Modeling?
• A data model defines the relationships between data elements.
• The dimensional data model is the one most often used in data warehousing systems.
• Data modeling is the process of learning about the data.
Data models are designed by data modelers.
What is Dimensional Modeling?
• It helps us store the data
Goals and benefits of Dimensional Modeling
• Faster data retrieval
• Better understandability
• Extensibility
It has 2 distinct categories:
• Dimensions
• Measures
Scenarios of Dimensional Data Modeling
McDonald’s client:
"I want to store information on how many burgers and fries are sold per day from a
single McDonald’s outlet."
What is a dimension and what is a measure in this example?
Step 1: Identify the dimensions
1. Food (e.g. burgers and fries)
2. Store (McDonald’s)
3. Some specific day
Step 2: Identify the measures
The number of burgers/fries sold is a measure.
The fact table captures the data that measures the organization's business
operations.
Scenarios of Dimensional Data Modeling
Step 3: Identify the attributes or properties of dimensions
Food dimension:
KEY  NAME
1    Burger
2    Fries
Store dimension:
KEY  NAME
1    Store 1
2    Store 2
...  ...
Day dimension:
KEY  DAY
1    01 Jan 2012
2    02 Jan 2012
3    03 Jan 2012
...  ...
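The three dimension tables above, together with a fact table that references them by key, could be sketched like this. The dimension contents follow the tables shown; the `sales_fact` rows and their unit counts are hypothetical, as is the `describe` helper.

```python
# Dimension tables: surrogate key -> descriptive attribute.
food_dim = {1: "Burger", 2: "Fries"}
store_dim = {1: "Store 1", 2: "Store 2"}
day_dim = {1: "01 Jan 2012", 2: "02 Jan 2012", 3: "03 Jan 2012"}

# Fact table: one row per (food, store, day) holding the measure.
# Tuple layout: (food_key, store_key, day_key, units_sold); units are made up.
sales_fact = [
    (1, 1, 1, 120),
    (2, 1, 1, 200),
    (1, 2, 1, 90),
    (1, 1, 2, 130),
]


def describe(fact_row):
    """Resolve the keys of a fact row against the dimension tables."""
    food_key, store_key, day_key, units = fact_row
    return (
        f"{units} x {food_dim[food_key]} "
        f"at {store_dim[store_key]} on {day_dim[day_key]}"
    )


first = describe(sales_fact[0])
print(first)  # 120 x Burger at Store 1 on 01 Jan 2012
```

The fact table stores only keys and numbers, while the descriptive text lives once in each dimension.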
Scenarios of Dimensional Data Modeling
Step 4: Identify the granularity of the measures
What is meant by "Granularity"?
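Granularity refers to the lowest (or most granular) level of information
stored in any table. In the McDonald's example, the grain is the number of items
sold per food item, per store, per day.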
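One practical consequence of the chosen grain is that finer-grained data can always be rolled up to a coarser level, but not the other way round. The sketch below rolls hypothetical daily rows up to monthly totals; the sample rows and the `roll_up_to_month` helper are assumptions for illustration.

```python
from collections import defaultdict

# Daily grain: (store, ISO date, food, units_sold). Sample values are made up.
daily_sales = [
    ("Store 1", "2012-01-01", "Burger", 120),
    ("Store 1", "2012-01-02", "Burger", 130),
    ("Store 1", "2012-02-01", "Burger", 110),
    ("Store 1", "2012-01-01", "Fries", 200),
]


def roll_up_to_month(rows):
    """Aggregate daily rows to one total per (store, month, food)."""
    monthly = defaultdict(int)
    for store, day, food, units in rows:
        month = day[:7]  # "YYYY-MM" prefix of the ISO date
        monthly[(store, month, food)] += units
    return dict(monthly)


monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

Had only monthly totals been stored, the daily figures could not be recovered from them, which is why the grain is decided before the tables are built.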
Scenarios of Dimensional Data Modeling
Step 5: History Preservation (Optional)
If the attributes of a dimension change over time and the old values must be kept,
this can be solved by designing the dimension tables as "slowly changing
dimensions".
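The text does not say which slowly changing dimension technique is meant. As one possible illustration, the sketch below follows the commonly used "Type 2" approach, in which a changed attribute closes the current row and adds a new version. The row layout (`surrogate_key`, `natural_key`, `start_date`, `end_date`, `is_current`) and the `apply_scd2` helper are assumptions made for this example.

```python
from datetime import date


def apply_scd2(dimension, natural_key, new_attrs, change_date):
    """Type 2 update: close the current row if attributes changed, add a new one."""
    current = next(
        (r for r in dimension
         if r["natural_key"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return dimension  # nothing changed, keep the current version
        # Preserve history: mark the old version as closed instead of overwriting.
        current["end_date"] = change_date
        current["is_current"] = False
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "natural_key": natural_key,
        "attrs": dict(new_attrs),
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    return dimension


# Hypothetical store dimension whose city changes mid-year.
store_dim = []
apply_scd2(store_dim, "S1", {"city": "Austin"}, date(2012, 1, 1))
apply_scd2(store_dim, "S1", {"city": "Dallas"}, date(2012, 6, 1))
for row in store_dim:
    print(row)
```

Facts recorded before the change keep pointing at the old surrogate key, so historical reports still show the store in its original city.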
Entities:
Entities are the things about which you want to store information.
For example: EMPLOYEE
Scenarios of Dimensional Data Modeling
Cardinalities:
Cardinality shows how many instances on one side of a relationship correspond to
how many instances on the other side.
For example:
• How many customers belong to 1 sale?
• How many sales belong to 1 customer?
• How many sales take place in 1 shop?
Customers --> Sales: 1 customer can buy something several times
Sales --> Customers: 1 sale is always made by 1 customer at a time
Customers --> Products: 1 customer can buy multiple products
Products --> Customers: 1 product can be purchased by multiple customers
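These cardinalities can be checked against sample data. In the sketch below each sale record carries exactly one customer (one customer to many sales), while the customer/product relationship comes out as many-to-many. The sale records and names are hypothetical.

```python
from collections import defaultdict

# Each sale has exactly one customer and one or more products (made-up data).
sales = [
    {"sale_id": 1, "customer": "Alice", "products": ["Burger", "Fries"]},
    {"sale_id": 2, "customer": "Alice", "products": ["Burger"]},
    {"sale_id": 3, "customer": "Bob", "products": ["Burger"]},
]

sales_per_customer = defaultdict(list)     # Customers --> Sales (one-to-many)
products_per_customer = defaultdict(set)   # Customers --> Products (many)
customers_per_product = defaultdict(set)   # Products --> Customers (many)

for sale in sales:
    sales_per_customer[sale["customer"]].append(sale["sale_id"])
    for product in sale["products"]:
        products_per_customer[sale["customer"]].add(product)
        customers_per_product[product].add(sale["customer"])

print(dict(sales_per_customer))
print(dict(products_per_customer))
print(dict(customers_per_product))
```

In a relational design, the one-to-many side is usually a foreign key on the sales table, while the many-to-many side needs a separate linking table.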
Scenarios of Dimensional Data Modeling for Banking
Scenarios of Dimensional Data Modeling for Retail Banking
Event 1 - Set-up Banks and Branches
Event 2 - Create new Customer
Event 3 - Setup New Account
Event 4 - Issue Credit Card
Event 5 - Customer makes Deposit
Event 6 - Customer uses Card
Event 7 - Bank Issues Statement
Event 8 - Customer closes Account
Data Modeling
Types of OLAP Servers
We have four types of OLAP servers:
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
OLTP v/s OLAP
OLTP Data Model
OLTP  OLAP
Snowflake Schema
Star Schema
Informatica

INFORMATICA EASY LEARNING ONLINE TRAINING
