SlideShare a Scribd company logo
1 of 30
Data Warehouse Basics 
Ram Kedem
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Data Warehouse Basics 
•Data Usage Challenges 
•OLAP vs. OLTP 
•Understanding Normalization 
•OLAP 
•Star Schema Basics 
•Snowflake Schema Basics 
•Understanding Granularity 
•Auditing
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Data Usage Challenges 
•Databases are usually divided into two separate types –OLTP / OLAP
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
OLAP vs. OLTP 
OLTP SystemOnline Transaction Processing(Operational System) 
OLAP SystemOnline Analytical Processing(Data Warehouse) 
Source of data 
Operational data; OLTPs are the original source of the data. 
Consolidation data; OLAP data comes from the various OLTP Databases 
Purpose of data 
To control and run fundamental business tasks 
To help with planning, problem solving, and decision support 
What the data 
Reveals a snapshot of ongoing business processes 
Multi-dimensional views of various kinds of business activities 
Inserts and Updates 
Short and fast inserts and updates initiated by end users 
Periodic long-running batch jobs refresh the data 
Queries 
Relatively standardized and simple queries Returning relatively few records 
Often complex queries involving aggregations
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
OLAP vs. OLTP 
OLTP SystemOnline Transaction Processing(Operational System) 
OLAP SystemOnline Analytical Processing(Data Warehouse) 
Processing Speed 
Typically very fast 
Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes 
Space Requirements 
Can be relatively small if historical data is archived 
Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP 
Database Design 
Highly normalized with many tables 
Typically de-normalized with fewer tables; use of star and/or snowflake schemas 
Backup and Recovery 
Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability 
Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Data Usage Challenges 
•Databases start out as OLTP (99.99 of times…) 
•OLAP functionality becomes a need as data accumulates 
•At some point two databases are required 
•The OLTP captures and manages daily transactions 
•The OLAP is periodically loaded with data from OLTP
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Normalization 
•What is Normalization ? 
•The process of organizing the tables in a relational Database 
•Eliminates data redundancy 
•Lowers record locking 
•Increases efficiency in concurrency 
•Accomplished by dividing large tables into smaller tables 
•Tables have relationships defined
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Normalization 
•Form zero
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Normalization 
•First Form 
•Break each field down to the smallest meaningful value 
•Remove repeating groups of data and Create a separate table for each set of related data
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Normalization 
•Second Form 
•Create new tables for data that applies to more than one record in a table 
•Add a related field (foreign key) to the table
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Normalization 
•Third Form 
•Remove fields that do not relate to, or provide a fact about, the primary key. 
•Take the Manager, Dept, and Sector fields and moved to another table. In addistiona field to establish a relationship between the tables should be added
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Normalized Structure Challenges 
•It is usually very inefficient for data extraction 
•Usually requires multiple table joins to reach all the data 
•Join queries can be a challenging to write 
•Join queries can be challenging for the Database Engine 
•It doesn’t store data in the form needed for data analysis 
•data is stored in the most detailed form, without aggregation 
•Data may be stored in multiple, normalized Databases
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Star Schema Basics 
•What is a Star Schema ? 
•The simplest form of database structure used in a DWH 
•Answers the basic question : 
•What happened, who did it, when did they do it.. Etc. 
•Focuses on one, single business area 
•What advantaged does a start schema offer ? 
•Separates data into two main categories 
•Fact 
•Dimensions ( Descriptive information about the facts)
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Star Schema Basics 
•Fact vs. Dimensions 
•Fact (what happened) 
•Product sold 
•Customer who bought 
•Etc. 
•Dimensions (Attributes that describe what happened) 
•When the product was sold 
•Day / Date / year / quarter 
•Where the product was sold
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Star Schema Basics
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Star Schema Basics
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Fact Tables 
•A fact table is a collection of measurements 
•Note the word Measurements 
•This is usually a number, something we can measure about a specificbusiness process. 
•Fact table contains a single / multiple facts about a specific process (usually numeric) 
•Sales amount 
•Order quantity 
•Tax amount 
•Discount amount
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Fact Tables 
•Fact tables may contain multiple measurements only if they are closely related. 
•A data warehouse will have many fact tables 
•Each table stores data (measure) for each specific business area) 
•Products sold Fact Table / shipment details Fact Table 
•Since fact tables design depends on science and data understanding, there are many ways by which fact tables can be designed.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Dimensions 
•Dimensions give context to measures (facts) 
•Dimensions give context, or specific meaning to facts. 
•The term “Dimension” usually refers to a table of related dimensions.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Dimensions 
•Example : 
•A facttable contains numbers of products sold 
•A date dimension table contains the following “dimensions” of dates pertaining the number of products sold 
•Date and time (15.09.2013 09:25:32) 
•Quarter 
•DayofYear(321) 
•Week (44) 
•Weekday (Thursday)
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Dimensions 
•Each individual column in a dimension table is an attribute. 
•Attribute usually compress or expand data detail 
•Data can be “discretized” into smaller, summarized groups 
•Days (365 values) 
•Weeks (52 Values) 
•Months (12 values) 
•Quarters (4 values) 
•Hour / Minute / Second ..
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
UnderstandingDimensions
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Snowflake Schema Basics 
•What is a Snowflake Schema ? 
•A Star Schema with a little normalization added in 
•Dimension tables are normalized somewhat 
•Why use snowflake schema ? 
•To satisfy data gathering functionality of more advanced data warehousing / mining tools 
•To logically separate large dimensions tables 
•To more naturally separate dimensional data 
•Known customers vs. anonymous customers
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Snowflake Schema Basics 
•One main rule concerning snowflake schema 
•Don’t use it, Unless you want to or need to.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Snowflake Schema Basics
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Granularity 
•What is meant by the tem granularity in a DWH ? 
•The level of detail available 
•What determines Granularity 
•The level of data loaded into the fact table 
•For example, per order numbers vs. daily numbers vs. weekly numbers etc. 
•The number and detail level of dimensions 
•If we want to look into customer details but we don’t have customer dimension –this data won’t be available
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Understanding Granularity 
•Granularity should be determined during database design. 
•This change can be made after database was created as well, but it will require much more effort. 
•This change may involve 
•Changing fact table structure 
•Possible changes in dimension tables 
•Changes in data loading
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Auditing 
•Data warehouses do not store data as it is created. 
•OLTP databases are populated as business occurs 
•Source and purpose of data is generally self explanatory 
•Data is added when transaction occurs 
•DWH are populated from OLTP data 
•Based on various conditions 
•At various times 
•From various sources
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Auditing 
•Data can be informative based on different aspects 
•The data itself 
•The source of the data 
•The volume of the data 
•These characteristics usually change over time 
•Auditing identify these aspects 
•Usually stored in tables 
•Describe source, duration of load, who performed the load, etc.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Auditing 
•SQL Server Integration Services 
•Provides SSIS logging

More Related Content

What's hot

Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaRadhika Kotecha
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?HEXANIKA
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data CatalogJean-Michel Franco
 

What's hot (20)

Different data models
Different data modelsDifferent data models
Different data models
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Database design
Database designDatabase design
Database design
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Operational Data Vault
Operational Data VaultOperational Data Vault
Operational Data Vault
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data Catalog
 

Viewers also liked

Data Warehouse Design Considerations
Data Warehouse Design ConsiderationsData Warehouse Design Considerations
Data Warehouse Design ConsiderationsRam Kedem
 
SSIS Basic Data Flow
SSIS Basic Data FlowSSIS Basic Data Flow
SSIS Basic Data FlowRam Kedem
 
SSIS Incremental ETL process
SSIS Incremental ETL processSSIS Incremental ETL process
SSIS Incremental ETL processRam Kedem
 
Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Syed Taimoor Hussain Shah
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesRam Kedem
 
SSIS Data Flow Tasks
SSIS Data Flow Tasks SSIS Data Flow Tasks
SSIS Data Flow Tasks Ram Kedem
 
Control Flow Using SSIS
Control Flow Using SSISControl Flow Using SSIS
Control Flow Using SSISRam Kedem
 

Viewers also liked (7)

Data Warehouse Design Considerations
Data Warehouse Design ConsiderationsData Warehouse Design Considerations
Data Warehouse Design Considerations
 
SSIS Basic Data Flow
SSIS Basic Data FlowSSIS Basic Data Flow
SSIS Basic Data Flow
 
SSIS Incremental ETL process
SSIS Incremental ETL processSSIS Incremental ETL process
SSIS Incremental ETL process
 
Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & Hierarchies
 
SSIS Data Flow Tasks
SSIS Data Flow Tasks SSIS Data Flow Tasks
SSIS Data Flow Tasks
 
Control Flow Using SSIS
Control Flow Using SSISControl Flow Using SSIS
Control Flow Using SSIS
 

Similar to Data Warehouse Basics

Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSASRam Kedem
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSASRam Kedem
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataHalo BI
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Mrunal Shridhar
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hivevshreepadma
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksAppDynamics
 
Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Derek Ashmore
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourceTerry Bunio
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
 
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Derek Ashmore
 
ETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingVibrant Event
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingVibrant Event
 
Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Derek Ashmore
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsBrian Anderson
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsEagle Technologies
 
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...DataWorks Summit
 

Similar to Data Warehouse Basics (20)

Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSAS
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSAS
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big Data
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hive
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
 
Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
ETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL Testing
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)
 
dwproblems.pptx
dwproblems.pptxdwproblems.pptx
dwproblems.pptx
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business Needs
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business Needs
 
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
 

More from Ram Kedem

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL WebinarRam Kedem
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database InstanceRam Kedem
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power ViewRam Kedem
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - OracleRam Kedem
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS AttributesRam Kedem
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)Ram Kedem
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)Ram Kedem
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Ram Kedem
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLRam Kedem
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to DatabasesRam Kedem
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Ram Kedem
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML dataRam Kedem
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic ParametersRam Kedem
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional FormattingRam Kedem
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated FieldsRam Kedem
 

More from Ram Kedem (20)

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL Webinar
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database Instance
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power View
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - Oracle
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS Attributes
 
SSRS Matrix
SSRS MatrixSSRS Matrix
SSRS Matrix
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to Databases
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML data
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic Parameters
 
SSRS Gauges
SSRS GaugesSSRS Gauges
SSRS Gauges
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional Formatting
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated Fields
 
SSRS Groups
SSRS GroupsSSRS Groups
SSRS Groups
 
Deploy SSIS
Deploy SSISDeploy SSIS
Deploy SSIS
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Data Warehouse Basics

  • 2. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Data Warehouse Basics •Data Usage Challenges •OLAP vs. OLTP •Understanding Normalization •OLAP •Star Schema Basics •Snowflake Schema Basics •Understanding Granularity •Auditing
  • 3. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Data Usage Challenges •Databases are usually divided into two separate types –OLTP / OLAP
  • 4. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com OLAP vs. OLTP OLTP SystemOnline Transaction Processing(Operational System) OLAP SystemOnline Analytical Processing(Data Warehouse) Source of data Operational data; OLTPs are the original source of the data. Consolidation data; OLAP data comes from the various OLTP Databases Purpose of data To control and run fundamental business tasks To help with planning, problem solving, and decision support What the data Reveals a snapshot of ongoing business processes Multi-dimensional views of various kinds of business activities Inserts and Updates Short and fast inserts and updates initiated by end users Periodic long-running batch jobs refresh the data Queries Relatively standardized and simple queries Returning relatively few records Often complex queries involving aggregations
  • 5. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com OLAP vs. OLTP OLTP SystemOnline Transaction Processing(Operational System) OLAP SystemOnline Analytical Processing(Data Warehouse) Processing Speed Typically very fast Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes Space Requirements Can be relatively small if historical data is archived Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP Database Design Highly normalized with many tables Typically de-normalized with fewer tables; use of star and/or snowflake schemas Backup and Recovery Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
  • 6. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Data Usage Challenges •Databases start out as OLTP (99.99 of times…) •OLAP functionality becomes a need as data accumulates •At some point two databases are required •The OLTP captures and manages daily transactions •The OLAP is periodically loaded with data from OLTP
  • 7. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Normalization •What is Normalization ? •The process of organizing the tables in a relational Database •Eliminates data redundancy •Lowers record locking •Increases efficiency in concurrency •Accomplished by dividing large tables into smaller tables •Tables have relationships defined
  • 8. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Normalization •Form zero
  • 9. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Normalization •First Form •Break each field down to the smallest meaningful value •Remove repeating groups of data and Create a separate table for each set of related data
  • 10. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Normalization •Second Form •Create new tables for data that applies to more than one record in a table •Add a related field (foreign key) to the table
  • 11. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Normalization •Third Form •Remove fields that do not relate to, or provide a fact about, the primary key. •Take the Manager, Dept, and Sector fields and moved to another table. In addistiona field to establish a relationship between the tables should be added
  • 12. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Normalized Structure Challenges •It is usually very inefficient for data extraction •Usually requires multiple table joins to reach all the data •Join queries can be a challenging to write •Join queries can be challenging for the Database Engine •It doesn’t store data in the form needed for data analysis •data is stored in the most detailed form, without aggregation •Data may be stored in multiple, normalized Databases
  • 13. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Star Schema Basics •What is a Star Schema ? •The simplest form of database structure used in a DWH •Answers the basic question : •What happened, who did it, when did they do it.. Etc. •Focuses on one, single business area •What advantaged does a start schema offer ? •Separates data into two main categories •Fact •Dimensions ( Descriptive information about the facts)
  • 14. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Star Schema Basics •Fact vs. Dimensions •Fact (what happened) •Product sold •Customer who bought •Etc. •Dimensions (Attributes that describe what happened) •When the product was sold •Day / Date / year / quarter •Where the product was sold
  • 15. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Star Schema Basics
  • 16. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Star Schema Basics
  • 17. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Fact Tables •A fact table is a collection of measurements •Note the word Measurements •This is usually a number, something we can measure about a specificbusiness process. •Fact table contains a single / multiple facts about a specific process (usually numeric) •Sales amount •Order quantity •Tax amount •Discount amount
  • 18. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Fact Tables •Fact tables may contain multiple measurements only if they are closely related. •A data warehouse will have many fact tables •Each table stores data (measure) for each specific business area) •Products sold Fact Table / shipment details Fact Table •Since fact tables design depends on science and data understanding, there are many ways by which fact tables can be designed.
  • 19. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Dimensions •Dimensions give context to measures (facts) •Dimensions give context, or specific meaning to facts. •The term “Dimension” usually refers to a table of related dimensions.
  • 20. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Dimensions •Example : •A facttable contains numbers of products sold •A date dimension table contains the following “dimensions” of dates pertaining the number of products sold •Date and time (15.09.2013 09:25:32) •Quarter •DayofYear(321) •Week (44) •Weekday (Thursday)
  • 21. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Dimensions •Each individual column in a dimension table is an attribute. •Attribute usually compress or expand data detail •Data can be “discretized” into smaller, summarized groups •Days (365 values) •Weeks (52 Values) •Months (12 values) •Quarters (4 values) •Hour / Minute / Second ..
  • 22. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com UnderstandingDimensions
  • 23. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Snowflake Schema Basics •What is a Snowflake Schema ? •A Star Schema with a little normalization added in •Dimension tables are normalized somewhat •Why use snowflake schema ? •To satisfy data gathering functionality of more advanced data warehousing / mining tools •To logically separate large dimensions tables •To more naturally separate dimensional data •Known customers vs. anonymous customers
  • 24. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Snowflake Schema Basics •One main rule concerning snowflake schema •Don’t use it, Unless you want to or need to.
  • 25. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Snowflake Schema Basics
  • 26. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Granularity •What is meant by the tem granularity in a DWH ? •The level of detail available •What determines Granularity •The level of data loaded into the fact table •For example, per order numbers vs. daily numbers vs. weekly numbers etc. •The number and detail level of dimensions •If we want to look into customer details but we don’t have customer dimension –this data won’t be available
  • 27. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Understanding Granularity •Granularity should be determined during database design. •This change can be made after database was created as well, but it will require much more effort. •This change may involve •Changing fact table structure •Possible changes in dimension tables •Changes in data loading
  • 28. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Auditing •Data warehouses do not store data as it is created. •OLTP databases are populated as business occurs •Source and purpose of data is generally self explanatory •Data is added when transaction occurs •DWH are populated from OLTP data •Based on various conditions •At various times •From various sources
  • 29. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Auditing •Data can be informative based on different aspects •The data itself •The source of the data •The volume of the data •These characteristics usually change over time •Auditing identify these aspects •Usually stored in tables •Describe source, duration of load, who performed the load, etc.
  • 30. Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com Auditing •SQL Server Integration Services •Provides SSIS logging