Data Warehouse Basics
Ram Kedem
1.
Data Warehouse Basics
Ram Kedem
2.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com
Data Warehouse Basics
• Data Usage Challenges
• OLAP vs. OLTP
• Understanding Normalization
• OLAP
• Star Schema Basics
• Snowflake Schema Basics
• Understanding Granularity
• Auditing
3.
Data Usage Challenges
• Databases are usually divided into two separate types
  – OLTP / OLAP
4.
OLAP vs. OLTP
| | OLTP system: Online Transaction Processing (operational system) | OLAP system: Online Analytical Processing (data warehouse) |
| Source of data | Operational data; OLTPs are the original source of the data | Consolidation data; OLAP data comes from the various OLTP databases |
| Purpose of data | To control and run fundamental business tasks | To help with planning, problem solving, and decision support |
| What the data reveals | A snapshot of ongoing business processes | Multi-dimensional views of various kinds of business activities |
| Inserts and updates | Short and fast inserts and updates initiated by end users | Periodic long-running batch jobs refresh the data |
| Queries | Relatively standardized and simple queries returning relatively few records | Often complex queries involving aggregations |
5.
OLAP vs. OLTP
| | OLTP system: Online Transaction Processing (operational system) | OLAP system: Online Analytical Processing (data warehouse) |
| Processing speed | Typically very fast | Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes |
| Space requirements | Can be relatively small if historical data is archived | Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP |
| Database design | Highly normalized with many tables | Typically de-normalized with fewer tables; use of star and/or snowflake schemas |
| Backup and recovery | Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability | Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method |
6.
Data Usage Challenges
• Databases start out as OLTP (99.99% of the time…)
• OLAP functionality becomes a need as data accumulates
• At some point two databases are required
  – The OLTP database captures and manages daily transactions
  – The OLAP database is periodically loaded with data from the OLTP database
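The two-database split described above can be sketched with two in-memory SQLite databases. This is only an illustration under assumptions: the table names (`orders`, `daily_sales`), the sample rows, and the `load_daily_sales` function are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical OLTP database: one row per order, written as business happens.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2014-01-01", 10.0), ("2014-01-01", 15.0), ("2014-01-02", 7.5)],
)

# Hypothetical OLAP database: refreshed periodically by a batch job.
olap = sqlite3.connect(":memory:")
olap.execute(
    "CREATE TABLE daily_sales ("
    "order_date TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)"
)


def load_daily_sales(source, target):
    """Batch job: summarize the OLTP orders and refresh the OLAP table."""
    rows = source.execute(
        "SELECT order_date, COUNT(*), SUM(amount) FROM orders GROUP BY order_date"
    ).fetchall()
    with target:  # commit the whole refresh as one transaction
        target.execute("DELETE FROM daily_sales")
        target.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    return len(rows)


loaded = load_daily_sales(oltp, olap)
summary = olap.execute("SELECT * FROM daily_sales ORDER BY order_date").fetchall()
print(loaded, summary)
```

A full refresh (delete, then reload) is the simplest possible loading strategy; real loads are often incremental, which this sketch does not attempt.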
7.
Understanding Normalization
• What is normalization?
  – The process of organizing the tables in a relational database
  – Eliminates data redundancy
  – Reduces record locking
  – Improves concurrency
  – Accomplished by dividing large tables into smaller tables
  – Tables have relationships defined
8.
Understanding Normalization
• Form zero
9.
Understanding Normalization
• First Form
  – Break each field down to the smallest meaningful value
  – Remove repeating groups of data and create a separate table for each set of related data
10.
Understanding Normalization
• Second Form
  – Create new tables for data that applies to more than one record in a table
  – Add a related field (foreign key) to the table
11.
Understanding Normalization
• Third Form
  – Remove fields that do not relate to, or provide a fact about, the primary key.
  – Move the Manager, Dept, and Sector fields to another table. In addition, add a field that establishes a relationship between the tables.
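The third-form step above (moving Manager, Dept, and Sector out and linking the tables with a key) can be sketched in plain Python. The sample rows and the `dept_id` key are assumptions made for illustration; the original slide's example table is not available in this transcript.

```python
# Hypothetical flat rows: department facts repeat on every employee row.
flat_rows = [
    {"employee": "Dana", "dept": "Sales", "manager": "Lee", "sector": "Retail"},
    {"employee": "Omar", "dept": "Sales", "manager": "Lee", "sector": "Retail"},
    {"employee": "Yael", "dept": "R&D", "manager": "Kim", "sector": "Tech"},
]

departments = {}  # dept name -> department row, stored only once
employees = []    # employee rows that reference a department by key

for row in flat_rows:
    if row["dept"] not in departments:
        departments[row["dept"]] = {
            "dept_id": len(departments) + 1,
            "dept": row["dept"],
            "manager": row["manager"],
            "sector": row["sector"],
        }
    # Keep only the facts about the employee, plus the foreign key.
    employees.append(
        {"employee": row["employee"], "dept_id": departments[row["dept"]]["dept_id"]}
    )

print(list(departments.values()))
print(employees)
```

After the split, a change of manager for Sales touches one department row instead of every employee row in that department.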
12.
Normalized Structure Challenges
• It is usually very inefficient for data extraction
  – Usually requires multiple table joins to reach all the data
  – Join queries can be challenging to write
  – Join queries can be challenging for the database engine
• It doesn’t store data in the form needed for data analysis
  – Data is stored in the most detailed form, without aggregation
  – Data may be stored in multiple, normalized databases
13.
Star Schema Basics
• What is a star schema?
  – The simplest form of database structure used in a DWH
  – Answers the basic questions:
    – What happened, who did it, when did they do it, etc.
  – Focuses on a single business area
• What advantages does a star schema offer?
  – Separates data into two main categories
    – Facts
    – Dimensions (descriptive information about the facts)
14.
Star Schema Basics
• Fact vs. dimensions
  – Fact (what happened)
    – Product sold
    – Customer who bought
    – Etc.
  – Dimensions (attributes that describe what happened)
    – When the product was sold
      – Day / date / year / quarter
    – Where the product was sold
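The fact/dimension split above can be sketched as a tiny star schema in SQLite. All table names, columns, and rows here (`fact_sales`, `dim_date`, `dim_store`) are hypothetical and chosen only to mirror the "when" and "where" dimensions on the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, quarter INTEGER
);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    quantity INTEGER,
    sales_amount REAL
);
INSERT INTO dim_date VALUES (1, '2013-09-15', 2013, 3), (2, '2013-11-17', 2013, 4);
INSERT INTO dim_store VALUES (1, 'Haifa'), (2, 'Tel Aviv');
INSERT INTO fact_sales VALUES (1, 1, 2, 20.0), (1, 2, 1, 10.0), (2, 1, 5, 50.0);
""")

# "What happened, and when": total sales per quarter, one join away from the facts.
by_quarter = db.execute("""
    SELECT d.year, d.quarter, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
print(by_quarter)
```

Every dimension sits exactly one join away from the fact table, which is what gives the schema its star shape.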
15.
Star Schema Basics
16.
Star Schema Basics
17.
Understanding Fact Tables
• A fact table is a collection of measurements
  – Note the word “measurements”
  – A measurement is usually a number, something we can measure about a specific business process
• A fact table contains one or more facts about a specific process (usually numeric)
  – Sales amount
  – Order quantity
  – Tax amount
  – Discount amount
18.
Understanding Fact Tables
• Fact tables may contain multiple measurements only if they are closely related.
• A data warehouse will have many fact tables
  – Each table stores data (measures) for a specific business area
  – Products sold fact table / shipment details fact table
• Since fact table design depends on both science and an understanding of the data, there are many ways in which fact tables can be designed.
19.
Understanding Dimensions
• Dimensions give context, or specific meaning, to measures (facts).
• The term “dimension” usually refers to a table of related dimensions.
20.
Understanding Dimensions
• Example:
  – A fact table contains the number of products sold
  – A date dimension table contains the following “dimensions” of dates pertaining to the number of products sold
    – Date and time (15.09.2013 09:25:32)
    – Quarter
    – Day of year (321)
    – Week (44)
    – Weekday (Thursday)
21.
Understanding Dimensions
• Each individual column in a dimension table is an attribute.
• Attributes usually compress or expand the level of data detail
• Data can be “discretized” into smaller, summarized groups
  – Days (365 values)
  – Weeks (52 values)
  – Months (12 values)
  – Quarters (4 values)
  – Hour / minute / second, etc.
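The date attributes discussed above can all be derived from a single timestamp. The function below is a minimal sketch using Python's standard `datetime` module; the function name and the dictionary keys are hypothetical, and the week number follows the ISO 8601 convention, which may differ from the convention a given warehouse uses.

```python
from datetime import datetime


def date_dimension_row(ts: datetime) -> dict:
    """Derive date-dimension attributes from one timestamp."""
    return {
        "date_time": ts.isoformat(sep=" "),
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,     # 1..4
        "month": ts.month,
        "week": ts.isocalendar()[1],            # ISO 8601 week number
        "day_of_year": ts.timetuple().tm_yday,  # 1..366
        "weekday": ts.strftime("%A"),           # name depends on the locale
    }


row = date_dimension_row(datetime(2013, 9, 15, 9, 25, 32))
print(row)
```

Each derived column is a coarser grouping of the same moment, which is what lets a query roll facts up by week, month, or quarter without recomputing anything at query time.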
22.
Understanding Dimensions
23.
Snowflake Schema Basics
• What is a snowflake schema?
  – A star schema with a little normalization added in
  – Dimension tables are normalized somewhat
• Why use a snowflake schema?
  – To satisfy the data gathering functionality of more advanced data warehousing / mining tools
  – To logically separate large dimension tables
  – To more naturally separate dimensional data
    – Known customers vs. anonymous customers
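As a rough illustration of "a star schema with a little normalization added in", the sketch below splits category attributes out of a product dimension. The tables (`dim_product`, `dim_category`, `fact_sales`) and their rows are hypothetical examples, not taken from the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Snowflaked dimension: category attributes move to their own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO dim_product VALUES (1, 'Tea', 1), (2, 'Coffee', 1), (3, 'Chips', 2);
INSERT INTO fact_sales VALUES (1, 4.0), (2, 6.0), (3, 2.5), (1, 4.0);
""")

# Reaching the category now takes two joins instead of one.
by_category = db.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(by_category)
```

The extra join is the price of the normalization: the category name is stored once, but every query that groups by category has to travel one table further from the facts.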
24.
Snowflake Schema Basics
• One main rule concerning the snowflake schema
  – Don’t use it unless you want to or need to.
25.
Snowflake Schema Basics
26.
Understanding Granularity
• What is meant by the term granularity in a DWH?
  – The level of detail available
• What determines granularity?
  – The level of data loaded into the fact table
    – For example, per-order numbers vs. daily numbers vs. weekly numbers, etc.
  – The number and detail level of dimensions
    – If we want to look into customer details but don’t have a customer dimension, this data won’t be available
27.
Understanding Granularity
• Granularity should be determined during database design.
• Granularity can also be changed after the database has been created, but doing so requires much more effort.
• Such a change may involve
  – Changing the fact table structure
  – Possible changes in dimension tables
  – Changes in data loading
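One way to see why the grain is best chosen up front: facts loaded at a fine grain can always be rolled up, but facts loaded at a coarse grain cannot be broken back down. The order rows below are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical order-level facts: (order_id, order_date, amount).
order_facts = [
    (1, "2014-03-01", 10.0),
    (2, "2014-03-01", 15.0),
    (3, "2014-03-02", 7.5),
]

# Rolling up to a coarser grain (one row per day) is always possible...
totals = defaultdict(float)
for _order_id, order_date, amount in order_facts:
    totals[order_date] += amount
daily_facts = dict(totals)

print(daily_facts)
# ...but the daily rows alone no longer say how many orders there were
# on a given day, or how much each individual order was worth.
```

If only `daily_facts` had been loaded into the warehouse, recovering per-order detail would mean going back to the source system and reloading, which is the extra effort the slide refers to.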
28.
Auditing
• Data warehouses do not store data as it is created.
• OLTP databases are populated as business occurs
  – The source and purpose of the data are generally self-explanatory
  – Data is added when a transaction occurs
• DWHs are populated from OLTP data
  – Based on various conditions
  – At various times
  – From various sources
29.
Auditing
• Data can be informative based on different aspects
  – The data itself
  – The source of the data
  – The volume of the data
• These characteristics usually change over time
• Auditing identifies these aspects
  – Audit data is usually stored in tables
  – It describes the source, duration of load, who performed the load, etc.
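An audit table of the kind described above might look like the following sketch. The `load_audit` table, its columns, the `audited_load` function, and the `etl_service` user name are all assumptions made for illustration; they are not part of any specific product.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL)")
db.execute("""
    CREATE TABLE load_audit (
        load_id INTEGER PRIMARY KEY,
        source_name TEXT,          -- where the data came from
        rows_loaded INTEGER,       -- volume of the load
        duration_seconds REAL,     -- how long the load took
        loaded_by TEXT,            -- who performed the load
        loaded_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")


def audited_load(conn, source_name, loaded_by, rows):
    """Load rows into fact_sales and record one audit row describing the load."""
    started = time.perf_counter()
    with conn:  # data and audit row are committed together
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
        conn.execute(
            "INSERT INTO load_audit "
            "(source_name, rows_loaded, duration_seconds, loaded_by) "
            "VALUES (?, ?, ?, ?)",
            (source_name, len(rows), time.perf_counter() - started, loaded_by),
        )


audited_load(db, "orders_oltp", "etl_service", [(1, 10.0), (2, 15.0)])
audit_rows = db.execute(
    "SELECT source_name, rows_loaded, loaded_by FROM load_audit"
).fetchall()
print(audit_rows)
```

Writing the audit row in the same transaction as the data means a failed load leaves neither partial data nor a misleading audit entry behind.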
30.
Auditing
• SQL Server Integration Services
  – Provides SSIS logging