© 2015 Real-Time Technology Solutions, Inc.
New York  Philadelphia  Atlanta  www.rtts.com
What is a Data Warehouse and
How Do I Test It?
A primer for Testers on Data Warehouses, the ETL process and Business
Intelligence and how to test them
built by
QuerySurge™
About
FACTS
Founded: 1996 (consulting firm)
Locations: New York (HQ), Atlanta, Philadelphia, Phoenix
Strategic Partners: IBM, Microsoft, HP, Oracle, Teradata, Hortonworks, Cloudera, Amazon
Software: QuerySurge
RTTS is the leading provider of software & data quality
for critical business systems
Overview
 What is Big Data?
 What is a Data Warehouse?
o About the ETL Process
o The Data Warehouse marketplace
 What is Business Intelligence?
o The architecture
o The BI marketplace
 Testing the DW Architecture
o Entry points
o The Mapping document
o Functional test implementation
o Test Tools
 Testing BI
o Functional test implementation
o Performance Testing
 Data Warehouse Test Tool demo
 Q&A
ETL
Business Intelligence (BI) software
CxOs are using Business Intelligence & Analytics to make critical business decisions
– with the assumption that the underlying data is fine.
“The average organization loses
$8.2 million annually through
poor Data Quality.”
- Gartner
[Diagram: Data Architecture. The Executive Office and Critical Data, with potential problem areas marked]
What is Big Data?
Big data: data with too much volume, velocity and variability to work on normal database architectures.
What is Big Data?
“The market for big data is $70 billion and growing
by 15% a year.”
- EMC COO Pat Gelsinger
Size
Defined as 5 petabytes or more
1 petabyte = 1,000 terabytes
1,000 terabytes = 1,000,000 gigabytes
1,000,000 gigabytes = 1,000,000,000 megabytes
Big Data Impact
Walmart handles more than 1 million customer transactions every hour.
• data imported into databases that contain > 2.5 petabytes of data
• the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 40 billion photos from its user base.
Google processes 1 Terabyte per hour
Twitter processes 85 million tweets per day
eBay processes 80 Terabytes per day
others
Big Data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.
Technologies include:
• massively parallel processing (MPP) databases
• data warehouses
• data mining grids
• distributed file systems
• distributed databases
• cloud computing platforms
• the Internet, and
• scalable storage systems
Big Data Solutions
What is a Data Warehouse?
Data Warehouse
• typically a relational database that is designed for query and analysis rather than for transaction processing
• a place where historical data is stored for archival, analysis and security purposes
• contains either raw data or formatted data
• combines data from multiple sources:
o sales
o salaries
o operational data
o human resource data
o inventory data
o web logs
o social networks
o Internet text and docs
o other
[Diagram: source databases (Legacy DB, CRM/ERP DB, Finance DB)]
Data Warehouse: Business Case
Why build a Data Warehouse?
• Data stored in operational systems (OLTP) is not easily accessible
• OLTP systems are not designed for end-user analysis
• The data in OLTP is constantly changing
• OLTP systems may be deficient in historical data
• Diverse forms of data are stored on different platforms and/or in dissimilar formats
Data Warehouse: Business Case
The Data Warehouse Business Solution
• Collects data from different sources (other databases,
files, web services, etc)
• Integrates data into logical business areas
• Provides direct access to data with powerful reporting
tools (BI)
Data Warehouse: About the data
The Data Warehouse data
• Subject-oriented
• Integrated
• Non-volatile
• Time-variant
Data Warehouse: the ETL process
ETL = Extract, Transform, Load
Why ETL?
Need to load the data warehouse regularly (daily/weekly) so that it
can serve its purpose of facilitating business analysis.
Extract: data is read from one or more OLTP systems and copied into the warehouse
Transform: remove inconsistencies, assemble data into a common format, add missing fields, summarize detailed data and derive new fields to store calculated data
Load: map the data and load it into the DW
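The three steps above can be sketched in a few lines of Python. This is a minimal illustration only, using in-memory SQLite databases as stand-ins for the OLTP source and the warehouse; the table names (`oltp_orders`, `dw_orders`), the sample rows and the specific cleaning rules are assumptions made for the example, not part of the original deck.

```python
import sqlite3

# Hypothetical OLTP rows: (order_id, customer, amount_text, country)
SOURCE_ROWS = [
    (1, " alice ", "10.50", "us"),
    (2, "Bob", "5.25", "US"),
    (3, "alice", "4.00", None),  # missing country
]


def extract(conn):
    """Extract: read the raw rows from the (hypothetical) OLTP table."""
    return conn.execute(
        "SELECT order_id, customer, amount_text, country FROM oltp_orders"
    ).fetchall()


def transform(rows):
    """Transform: common format, fill missing fields, derive new fields."""
    out = []
    for order_id, customer, amount_text, country in rows:
        customer = customer.strip().title()             # remove inconsistencies
        country = (country or "UNKNOWN").upper()        # add missing field
        amount_cents = round(float(amount_text) * 100)  # derived field
        out.append((order_id, customer, country, amount_cents))
    return out


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE oltp_orders "
        "(order_id INTEGER, customer TEXT, amount_text TEXT, country TEXT)"
    )
    source.executemany("INSERT INTO oltp_orders VALUES (?, ?, ?, ?)", SOURCE_ROWS)

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE dw_orders "
        "(order_id INTEGER, customer TEXT, country TEXT, amount_cents INTEGER)"
    )
    load(warehouse, transform(extract(source)))
    return warehouse.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
```

Calling `run_etl()` returns the warehouse rows after one load. A real ETL job would add scheduling, error handling and logging around the same three functions.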
Data Warehouse: the ETL process
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) -> ETL Process (Extract, Transform, Load) -> Target Data Warehouse]
Data Warehouse: the Marketplace
“The data warehousing market will see a compound annual growth rate of
11.5% through 2013 to reach a total of $13.2 billion in revenue.”
- consulting specialist The 451 Group
Data Warehouse size
Small data warehouses: < 5 TB
Midsize data warehouses: 5 TB - 20 TB
Large data warehouses: >20 TB
- Analyst firm Gartner
Leaders in Data Warehouse Database Management Systems
- Analyst firm Gartner’s ‘Magic Quadrant for Data Warehouse Database Management Systems’
Data Warehouse: the Marketplace
Delivery Models
• stand-alone DBMS software
• Cloud offerings
• data warehouse appliances
Leading Appliance Makers
Business Intelligence (BI)
B.I. – What is it?
• Software applications used in spotting, digging out, and analyzing business data
• provides simple access to data that can be used in day-to-day operations, and integrates data into logical business areas
• provides historical, current and predictive views of business operations
• made up of several related activities, including data mining, online analytical processing, querying and reporting
Business Intelligence (BI): Who uses it?
Wal-Mart uses vast amounts of data and
category analysis to dominate the industry.
Amazon and Yahoo follow a "test and learn"
approach to business changes.
Hardee’s, Wendy’s, and T.G.I. Friday’s use
BI to make strategic decisions.
Business Intelligence (BI) & Data Marts
Data Mart
A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.
Typically holds aggregated data and some granular data.
It is a subset of the DW and makes Business Intelligence reporting more efficient.
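As a rough illustration of a data mart as an aggregated subset of the warehouse, the sketch below builds one summary table for a single division. The tables `dw_orders` and `mart_sales_by_region`, their columns and the sample rows are hypothetical, and SQLite stands in for the real warehouse platform.

```python
import sqlite3


def demo_warehouse():
    """Create a tiny in-memory warehouse table with rows from two divisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dw_orders (division TEXT, region TEXT, amount INTEGER);
        INSERT INTO dw_orders VALUES
            ('Sales', 'East', 100),
            ('Sales', 'East', 50),
            ('Sales', 'West', 70),
            ('HR',    'East', 999);
    """)
    return conn


def build_sales_mart(conn):
    """Build a mart holding aggregated data for the Sales division only."""
    conn.executescript("""
        DROP TABLE IF EXISTS mart_sales_by_region;
        CREATE TABLE mart_sales_by_region AS
        SELECT region, COUNT(*) AS orders, SUM(amount) AS total_amount
        FROM dw_orders
        WHERE division = 'Sales'
        GROUP BY region;
    """)
    return conn.execute(
        "SELECT region, orders, total_amount "
        "FROM mart_sales_by_region ORDER BY region"
    ).fetchall()
```

BI reports can then read the small pre-aggregated table instead of scanning the full warehouse, which is the efficiency gain the slide refers to.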
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) -> ETL Process -> Target DW -> ETL Process -> Data Mart]
Business Intelligence (BI)
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) -> ETL Process -> Target DW -> ETL Process -> Data Mart]
BI: the Marketplace
“Worldwide business intelligence (BI) platform, analytic applications and
performance management (PM) software revenue reached $10.5 billion in
2010, a 13.4 percent increase from 2009 revenue of $9.3 billion”
“The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share.”
- Analyst firm Gartner
Leaders in BI
- Analyst firm Forrester Research’s ‘Forrester Wave’
Testing a Data Warehouse (DWH)
The Challenge
Comprehensive testing of data at every point throughout the data process is becoming increasingly important as more data is being used in strategic decision-making. Yet current strategies are time-consuming, resource-intensive and inefficient.
What's Involved in Data Testing?
According to authors Doug Vucevic and Wayne Yaddow in the book "Testing the Data Warehouse Practicum:
Assuring Data Content, Data Structures and Quality", some of the main challenges of data testing are:
Data Completeness
Verifying that all data has been loaded from the sources to the target.
Data Transformation
Ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process.
Data Quality
Ensuring that the ETL process correctly rejects, substitutes default values for, corrects, or ignores and reports invalid data.
Regression Testing
Ensuring existing functionality remains intact each time a new release of code is completed.
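The first two checks can be expressed directly in SQL. The sketch below is one possible shape, assuming a hypothetical rule that last names are upper-cased on load; the tables `src_customer` and `dw_customer` and their contents are invented for the example, and SQLite's `EXCEPT` plays the role that `MINUS` plays in Oracle.

```python
import sqlite3


def build_demo_db():
    """Source and target tables with two deliberate load defects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_customer (id INTEGER, last_name TEXT);
        CREATE TABLE dw_customer  (id INTEGER, last_name TEXT);
        INSERT INTO src_customer VALUES (1, 'smith'), (2, 'jones'), (3, 'lee');
        -- Defects: id 3 was never loaded, id 2 was not upper-cased.
        INSERT INTO dw_customer  VALUES (1, 'SMITH'), (2, 'jones');
    """)
    return conn


def completeness_gap(conn):
    """Data completeness: source row count minus target row count."""
    src = conn.execute("SELECT COUNT(*) FROM src_customer").fetchone()[0]
    tgt = conn.execute("SELECT COUNT(*) FROM dw_customer").fetchone()[0]
    return src - tgt


def transformation_mismatches(conn):
    """Data transformation: expected rows (rule applied to the source)
    that do not appear in the target."""
    return conn.execute("""
        SELECT id, UPPER(last_name) FROM src_customer
        EXCEPT
        SELECT id, last_name FROM dw_customer
        ORDER BY 1
    """).fetchall()
```

A count comparison only shows that rows are missing; the `EXCEPT` query shows which rows are missing or wrongly transformed, which is usually what a defect report needs.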
Resources involved
• Business Analysts create requirements
• QA Testers develop and execute test plans and test cases.
***Skill Set required: Strong SQL!!!
• Architects set up test environments
• Developers perform unit tests
• DBAs test for performance and stress
• Business Users perform functional User Acceptance Tests
Testing the DWH: Resources Involved
For the purposes of this presentation, we will focus on a
strategy for Testers.
An effective data warehouse testing strategy focuses on the main
structures within the data warehouse architecture:
1) The Sources
2) The ETL layer
3) The data warehouse itself
4) The front-end (BI) data warehouse applications
Testing the Data Warehouse: the Strategy
Testing the Data Warehouse: Entry Points
Recommended functional test strategy: Test every entry point in the
system (feeds, databases, internal messaging, front-end transactions).
The goal: provide rapid localization of data issues between points
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) -> ETL Process -> Target DW -> ETL Process -> Data Mart -> Business Intelligence software, with test entry points marked along the flow]
Testing the Data Warehouse: Entry Points
[Diagram of a possible architecture: source data (Legacy DB, CRM/ERP DB, Finance DB, files), a staging DB, data marts and Business Intelligence software, connected by ETL processes, with test entry points marked between the stages]
Testing the DWH: the Mapping Document
a.k.a. Source to Target Map
It’s the critical element
required to efficiently plan the
ETL process.
Intention:
 capture business rules
 data flow mapping and
 data movement requirements.
Mapping Doc specifies:
 Source input definition
 Target/output details
 Business & data transformation
rules
 Data quality requirements
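One way to make a mapping document testable is to hold each mapping as structured data and generate the source and target queries from it. The sketch below is an assumption about how that could look, not a description of any particular tool; the class names, the `UPPER(lastName)` rule and the choice of `idUser` as the ordering key are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    source_expr: str    # source column or SQL expression
    target_column: str  # column name in the target table
    rule: str           # business / transformation rule in plain words


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    target_table: str
    key: str            # target column used to order both result sets
    columns: tuple      # tuple of ColumnMapping

    def query_pair(self):
        """Return (source_sql, target_sql) that should yield identical rows."""
        src_cols = ", ".join(
            f"{c.source_expr} AS {c.target_column}" for c in self.columns
        )
        tgt_cols = ", ".join(c.target_column for c in self.columns)
        source_sql = f"SELECT {src_cols} FROM {self.source_table} ORDER BY {self.key}"
        target_sql = f"SELECT {tgt_cols} FROM {self.target_table} ORDER BY {self.key}"
        return source_sql, target_sql


# Hypothetical mapping row: Sales.Customer -> dw.user_
CUSTOMER_MAP = TableMapping(
    source_table="Sales.Customer",
    target_table="dw.user_",
    key="idUser",
    columns=(
        ColumnMapping("idCustomer", "idUser", "copy as-is"),
        ColumnMapping("UPPER(lastName)", "lastName", "upper-case the last name"),
    ),
)
```

`CUSTOMER_MAP.query_pair()` yields one source query and one target query whose result sets should match row for row if the ETL applied the rule correctly.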
Testing the DWH: the Mapping Document
-- Source query: electronics orders placed 1-7 Sep 2010
SELECT c.idCustomer  "Customer ID",
       c.lastName    "Customer Last Name",
       c.firstName   "Customer First Name",
       o.idOrder     "Order Number",
       p.name        "Product Name",
       op.quantity   "Quantity Ordered",
       -- Derive the status text that the target stores directly
       CASE
         WHEN os.idOrderStatus = 5 AND o.refundDate IS NOT NULL THEN 'Returned'
         WHEN os.idOrderStatus IN (3, 4) AND o.shipDate IS NOT NULL THEN 'Delivered'
         ELSE 'Processing'
       END           "Order Status"
FROM   Sales.Orders o, Sales.OrderStatus os, Sales.OrderProduct op,
       Sales.Product p, Sales.Category cat, Sales.Customer c
WHERE  o.order_idOrderStatus = os.idOrderStatus
AND    op.orderProduct_idOrder = o.idOrder
AND    op.orderProduct_idProduct = p.idProduct
AND    p.product_idCategory = cat.idCategory
AND    cat.name = 'Electronics'
AND    o.order_idCustomer = c.idCustomer
-- The string literals rely on implicit date conversion (DD-MON-YY session format)
AND    o.orderDate BETWEEN '01-SEP-10' AND '07-SEP-10'
ORDER BY c.idCustomer, c.lastName, c.firstName, o.idOrder
-- Target query: the same orders as stored in the data warehouse
SELECT u.idUser      "Customer ID",
       u.lastName    "Customer Last Name",
       u.firstName   "Customer First Name",
       p.idPurchase  "Purchase Number",
       i.name        "Item Name",
       oi.quantity   "Quantity Ordered",
       ps.status     "Purchase Status"
FROM   dw.Purchase p, dw.PurchaseStatus ps, dw.OrderItem oi,
       dw.Item i, dw.user_ u, dw.category cat
WHERE  p.purchase_idPurchaseStatus = ps.idPurchaseStatus
AND    oi.orderItem_idPurchase = p.idPurchase
AND    oi.orderItem_idItem = i.idItem
AND    p.purchase_idUser = u.idUser
AND    i.item_idCategory = cat.idCategory
AND    cat.name = 'Electronics'
-- purchaseDate appears to be stored as text (MM-DD-YY), so filter on month-day, then year
AND    SUBSTR(p.purchaseDate, 1, 5) BETWEEN '09-01' AND '09-07'
AND    SUBSTR(p.purchaseDate, -2) = '10'
ORDER BY u.idUser, u.lastName, u.firstName, p.idPurchase
Testing the DWH: Implementation
Implementation of Functional Test
What is going on in the marketplace?
1. Manual execution
2. Automated execution with standard test tools
3. Bulk automation with a Data Warehouse Testing Tool (e.g. QuerySurge)
1. Review mapping docs
2. Write SQL in favorite editor
3. Run tests
4. Dump results to a file
5. Compare results manually or with a compare tool
6. Report defects and issues
Tools Tasks
Timeline
Testing the DWH: Manual Testing Flow
Manual ETL Testing Flow Comments
 Check points across each leg so that each transformation is checked.
 If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the DB is under no obligation to return rows in a specified order unless the SQL specifies one).
 This process can quickly result in hundreds or thousands of source and target query pairs.
 The process is labor-intensive. Even with multiple people, only a VERY small sampling can be performed.
Testing the DWH: Manual Testing Flow
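The row-ordering caveat can be sidestepped by comparing result sets as multisets rather than line by line. The helper below is a small sketch of that idea; the function name and the shape of the returned dictionary are choices made for this example.

```python
from collections import Counter


def compare_result_sets(source_rows, target_rows):
    """Compare two query results while ignoring row order.

    Rows are counted, so duplicates matter: a row that appears twice in the
    source must appear twice in the target. Sorting the differences assumes
    the values in a column are mutually comparable (no mix of NULL and text).
    """
    src = Counter(map(tuple, source_rows))
    tgt = Counter(map(tuple, target_rows))
    return {
        "missing_in_target": sorted((src - tgt).elements()),
        "unexpected_in_target": sorted((tgt - src).elements()),
    }
```

For example, `compare_result_sets([(1, "A"), (2, "B")], [(2, "B"), (1, "A")])` reports no differences even though the rows arrive in a different order. Adding `ORDER BY` to both queries remains the simpler fix when a plain file compare tool is used.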
Functional Automation ETL Testing flow
1. Similar to the manual flow: extract mappings from the mapping document
2. Write pairs of queries that test between any two points in the architecture.
3. Issue the queries via a Functional Automation tool
4. Have the functional scripts dump the query result sets to files
5. Compare the files, either by writing automation code or by using a file compare tool.
This process is dependent on the speed of the automation tool; normally, only a fraction of the data can be covered per ETL per build.
Functional Tester
Testing the DWH:
Typical Functional Automation Testing Flow
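Steps 2 to 5 of this flow can be scripted without a commercial tool. The sketch below is a bare-bones version under stated assumptions: a single SQLite connection stands in for both ends of the ETL leg, and the tables `src_item` and `dw_item`, the pair names and the CSV file layout are all invented for the example.

```python
import csv
import filecmp
import sqlite3
import tempfile
from pathlib import Path


def dump(conn, sql, path):
    """Run a query and dump its result set to a CSV file."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(conn.execute(sql))


def run_pairs(conn, pairs, workdir):
    """Execute each (name, source_sql, target_sql) pair and compare the dumps."""
    report = {}
    for name, source_sql, target_sql in pairs:
        src_file = Path(workdir) / f"{name}_source.csv"
        tgt_file = Path(workdir) / f"{name}_target.csv"
        dump(conn, source_sql, src_file)
        dump(conn, target_sql, tgt_file)
        same = filecmp.cmp(src_file, tgt_file, shallow=False)
        report[name] = "PASS" if same else "FAIL"
    return report


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_item (id INTEGER, name TEXT, qty INTEGER);
        CREATE TABLE dw_item  (id INTEGER, name TEXT, qty INTEGER);
        INSERT INTO src_item VALUES (1, 'tv', 2), (2, 'radio', 1);
        -- The quantity for id 1 was loaded incorrectly.
        INSERT INTO dw_item  VALUES (2, 'radio', 1), (1, 'tv', 3);
    """)
    pairs = [
        ("item_names",
         "SELECT id, name FROM src_item ORDER BY id",
         "SELECT id, name FROM dw_item ORDER BY id"),
        ("item_qty",
         "SELECT id, qty FROM src_item ORDER BY id",
         "SELECT id, qty FROM dw_item ORDER BY id"),
    ]
    with tempfile.TemporaryDirectory() as workdir:
        return run_pairs(conn, pairs, workdir)
```

Both queries in every pair carry an `ORDER BY`, because a byte-for-byte file comparison fails on any difference in row order.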
[Diagram: pairs of SQL queries (source, target) run against Legacy DB, CRM/ERP DB and Finance DB]
Testing the Data Warehouse:
Specialized Data Warehouse Test Tool
QuerySurge™
the collaborative Data Warehouse Testing solution that finds bad data & provides a holistic view of your data’s health
With QuerySurge™ you can:
• Reduce your costs & risks
• Improve your data quality
• Accelerate your testing cycles
• Share information with your team
The QuerySurge advantage
• Provides huge ROI (i.e. 1,300%)*
*based on client’s calculation of Return on Investment
Automate the entire testing cycle
 Automate kickoff, tests, comparison, auto-emailed results
Create Tests easily with no SQL programming
 ensures minimal time & effort to create tests / obtain results
Test across different platforms
 data warehouse, Hadoop, NoSQL, database, flat file, XML
Collaborate with team
 Data Health dashboard, shared tests & auto-emailed reports
Verify more data & do it quickly
 verifies up to 100% of all data up to 1,000 x faster
Integrate for Continuous Delivery
 Integrates with most Build, ETL & QA management software
QuerySurge™ Architecture
Web-based…
Installs on...
Linux
Connects to…
…or any other JDBC compliant data source
QuerySurge
Controller
QuerySurge
Server
QuerySurge
Agents
Flat Files
Collaboration
Testers
- functional testing
- regression testing
- result analysis
Developers / DBAs
- unit testing
- result analysis
Data Analysts
- review, analyze data
- verify mapping failures
Operations teams
- monitoring
- result analysis
Managers
- oversight
- result analysis
Share information on the
Strategy
• Execute business user reports and verify results from
report to Data Mart
» Logical Calculations
− Verify logical calculations to back-end Data Mart by
creating SQL queries that incorporate and return the
calculations from the Data Mart. Compare to report.
(Example: Total sales for the month of January)
» Data Validation
− Verify data validations to back-end Data Mart by creating
SQL queries that incorporate and return the equivalent data
from the Data Mart. Compare to report. (Example: List of all
customers that spent more than $100)
» Parameter Validation
− For reports that have parameters, create multiple tests that provide a reasonable amount of test coverage.
Testing the Data Warehouse:
Functional Test of Business Intelligence software
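The two examples in the strategy (total sales for January, customers that spent more than $100) can be checked with queries against the data mart and compared with the figures shown on the report. The sketch below assumes a hypothetical `mart_sales` table with amounts stored in cents, the year 2010, and report values typed in by hand; in practice the report values would come from the BI tool's export.

```python
import sqlite3


def build_mart():
    """Tiny stand-in for the back-end data mart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mart_sales (customer TEXT, sale_date TEXT, amount_cents INTEGER);
        INSERT INTO mart_sales VALUES
            ('Alice', '2010-01-05', 7000),
            ('Alice', '2010-01-20', 4500),
            ('Bob',   '2010-01-09', 9900),
            ('Bob',   '2010-02-02', 5000),
            ('Carol', '2010-01-11', 2500);
    """)
    return conn


def january_total(conn):
    """Logical calculation: total sales for January 2010, in cents."""
    return conn.execute("""
        SELECT COALESCE(SUM(amount_cents), 0) FROM mart_sales
        WHERE sale_date >= '2010-01-01' AND sale_date < '2010-02-01'
    """).fetchone()[0]


def customers_over(conn, threshold_cents):
    """Data validation: customers whose total spend exceeds the threshold."""
    rows = conn.execute("""
        SELECT customer FROM mart_sales
        GROUP BY customer
        HAVING SUM(amount_cents) > ?
        ORDER BY customer
    """, (threshold_cents,))
    return [r[0] for r in rows]


def verify_report(conn, report):
    """Return the names of report fields that disagree with the data mart."""
    mismatches = []
    if report["january_total_cents"] != january_total(conn):
        mismatches.append("january_total_cents")
    if report["customers_over_100"] != customers_over(conn, 10_000):
        mismatches.append("customers_over_100")
    return mismatches
```

An empty list from `verify_report` means the report agrees with the mart for both checks; each name in the list points at a figure to investigate.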
Testing the DWH: Functional Test of BI
Functional Testing of BI
1. BI Developer creates reports based on Business user
requirements
2. Testers verify reports by:
• Running reports using a range of parameter permutations
• Verifying that data is correct:
o Compare record counts on the report to the back-end data mart
o Verify field data elements
o Verify field lengths and field-level data
o Verify logical dependencies
Functional Tester
Automation tools can and should be used for regression purposes.
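A regression check over parameter permutations might look like the sketch below. It is illustrative only: `report_row_count` is a stand-in for actually running the BI report (here it simply filters rows in Python), and the `mart_orders` table, the region and year parameters and their values are hypothetical.

```python
import sqlite3
from itertools import product

REGIONS = ["East", "West"]   # hypothetical report parameter values
YEARS = [2009, 2010]


def build_mart():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mart_orders (region TEXT, year INTEGER);
        INSERT INTO mart_orders VALUES
            ('East', 2009), ('East', 2010), ('East', 2010), ('West', 2010);
    """)
    return conn


def report_row_count(conn, region, year):
    """Stand-in for running the BI report and counting its rows."""
    rows = conn.execute("SELECT region, year FROM mart_orders").fetchall()
    return sum(1 for r, y in rows if r == region and y == year)


def backend_row_count(conn, region, year):
    """Record count taken directly from the back-end data mart."""
    return conn.execute(
        "SELECT COUNT(*) FROM mart_orders WHERE region = ? AND year = ?",
        (region, year),
    ).fetchone()[0]


def check_permutations(conn):
    """Run every parameter combination; return (checked, failing combinations)."""
    failures = []
    checked = 0
    for region, year in product(REGIONS, YEARS):
        checked += 1
        if report_row_count(conn, region, year) != backend_row_count(conn, region, year):
            failures.append((region, year))
    return checked, failures
```

The number of combinations grows multiplicatively with each parameter, so a real suite would usually pick a representative subset rather than the full product.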
Common Challenges
• BI systems often have reports that require complex SQL queries across dozens of tables encompassing hundreds of thousands of records, from multiple databases.
• Challenge: determining the performance characteristics under differing conditions and workloads.
• Need to know the ability of the system to scale to the number of concurrent users.
• Must test how long it takes a user to receive a report after requesting it with the parameters he/she specifies.
Testing the Data Warehouse:
Performance Test of BI
Testing the DWH: Performance Test of BI
Strategy
• Determine a typical workload for the business intelligence system.
• Identify different user roles, what kinds of work they do on the system, and
how often they do this work.
• Determine how many users of each role there are.
• Choose a performance tool that can record the protocol activity of the
system and allow the performance tester to modify data parameters.
• Create scripts by recording the protocol traffic emitted by the BI system as the targeted reports are opened and refreshed.
• Prepare and execute a series of concurrent multi-user tests.
• Make sure each virtual user emulates the activity of real users accessing business intelligence reports based on separate concerns.
• Monitor response times, throughput, network activity, and system activity for issues.
• Review results and provide recommendations.
Using this approach, the workload activity of the entire population of business intelligence users can be reproduced in controlled conditions.
Performance Tester
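The core of a concurrent multi-user test can be sketched with a thread pool, one thread per virtual user. This is a toy model rather than a replacement for a protocol-level performance tool: `run_report` is a placeholder that only sleeps, and the roles, user counts and iteration count in `WORKLOAD` are made-up numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical workload model: number of virtual users per role
WORKLOAD = {"analyst": 3, "manager": 2}


def run_report(role, params):
    """Placeholder for requesting a BI report; swap in a real client call."""
    time.sleep(0.01)
    return f"{role}:{params}"


def virtual_user(role, iterations=2):
    """Emulate one user opening a report several times; record response times."""
    timings = []
    for i in range(iterations):
        start = time.perf_counter()
        run_report(role, {"month": i + 1})
        timings.append(time.perf_counter() - start)
    return role, timings


def run_load_test(workload):
    """Run all virtual users concurrently and summarize timings per role."""
    users = [role for role, count in workload.items() for _ in range(count)]
    with ThreadPoolExecutor(max_workers=len(users)) as pool:
        results = list(pool.map(virtual_user, users))

    by_role = {}
    for role, timings in results:
        by_role.setdefault(role, []).extend(timings)
    return {
        role: {"requests": len(t), "mean_s": mean(t), "max_s": max(t)}
        for role, t in by_role.items()
    }
```

`run_load_test(WORKLOAD)` returns request counts and response-time statistics per role, which is the kind of data the monitoring and review steps above would work from.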
Summary
What is a Data Warehouse and
How Do I Test It?
• Big Data is a growing technical concern and has grown into a $70 billion market.
• The Data Warehouse and Business Intelligence software
marketplace is a $22 billion market and growing.
• Functional testing of a data warehouse implementation
is a complex undertaking and requires strong SQL skills
by the Tester
• Manual testing and automated testing using standard
tools provide a very small % of coverage.
• Business Intelligence software must be properly tested
for both functionality and performance.
To see the video of this Webinar please visit:
http://www.querysurge.com/solutions/data-warehouse-testing
What is a Data Warehouse
and
How Do I Test It?

More Related Content

What's hot

Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql databasePARIKSHIT SAVJANI
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made SimpleJames Serra
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Rajesh Kumar
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDATAVERSITY
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guidethomasmary607
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingEyad Manna
 
Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsWayne Yaddow
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300Mark Kromer
 
Data warehousing testing strategies cognos
Data warehousing testing strategies cognosData warehousing testing strategies cognos
Data warehousing testing strategies cognosSandeep Mehta
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 

What's hot (20)

ETL Testing Overview
ETL Testing OverviewETL Testing Overview
ETL Testing Overview
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made Simple
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data Warehouse
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large Applications
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 
Data warehousing testing strategies cognos
Data warehousing testing strategies cognosData warehousing testing strategies cognos
Data warehousing testing strategies cognos
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 

Similar to What is a Data Warehouse and How Do I Test It?

the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Vineet Chaturvedi
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceEdureka!
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Singh
 
DWH_PROJECT [Compatibility Mode]
DWH_PROJECT [Compatibility Mode]DWH_PROJECT [Compatibility Mode]
DWH_PROJECT [Compatibility Mode]vasanth kumar C
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Miningcpjcollege
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Andrey Akulov
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big DataPaul Barsch
 
Oracle: Fundamental Of Dw
Oracle: Fundamental Of DwOracle: Fundamental Of Dw
Oracle: Fundamental Of Dworacle content
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewNagaraj Yerram
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 

Similar to What is a Data Warehouse and How Do I Test It? (20)

the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
DWH_Session_1.pptx
DWH_Session_1.pptxDWH_Session_1.pptx
DWH_Session_1.pptx
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand?
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligence
 
Kushal Data Warehousing PPT
Kushal Data Warehousing PPTKushal Data Warehousing PPT
Kushal Data Warehousing PPT
 
DWH_PROJECT [Compatibility Mode]
DWH_PROJECT [Compatibility Mode]DWH_PROJECT [Compatibility Mode]
DWH_PROJECT [Compatibility Mode]
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big Data
 
Oracle: Fundamental Of DW
Oracle: Fundamental Of DWOracle: Fundamental Of DW
Oracle: Fundamental Of DW
 
Oracle: Fundamental Of Dw
Oracle: Fundamental Of DwOracle: Fundamental Of Dw
Oracle: Fundamental Of Dw
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 

What is a Data Warehouse and How Do I Test It?

  • 1. What is a Data Warehouse and How Do I Test It? A primer for testers on data warehouses, the ETL process and Business Intelligence, and how to test them. © 2015 Real-Time Technology Solutions, Inc. New York · Philadelphia · Atlanta · www.rtts.com
  • 2. About RTTS: Facts
      - Founded: 1996 (consulting firm)
      - Locations: New York (HQ), Atlanta, Philadelphia, Phoenix
      - Strategic partners: IBM, Microsoft, HP, Oracle, Teradata, Hortonworks, Cloudera, Amazon
      - Software: QuerySurge
      - RTTS is the leading provider of software & data quality for critical business systems
  • 3. Overview
      - What is Big Data?
      - What is a Data Warehouse?
          - About the ETL process
          - The Data Warehouse marketplace
      - What is Business Intelligence?
          - The architecture
          - The BI marketplace
      - Testing the DW architecture
          - Entry points
          - The Mapping document
          - Functional test implementation
          - Test tools
      - Testing BI
          - Functional test implementation
          - Performance testing
      - Data Warehouse test tool demo
      - Q&A
  • 4. The Executive Office and Critical Data. CxOs are using Business Intelligence & Analytics to make critical business decisions, with the assumption that the underlying data is fine. "The average organization loses $8.2 million annually through poor Data Quality." (Gartner) [Diagram labels: Data Architecture; ETL; Business Intelligence (BI) software; potential problem areas]
  • 5. What is Big Data?
  • 6. What is Big Data? Big data is defined as too much volume, velocity and variability to work on normal database architectures. "The market for big data is $70 billion and growing by 15% a year." (EMC COO Pat Gelsinger)
      - Size: defined as 5 petabytes or more
      - 1 petabyte = 1,000 terabytes = 1,000,000 gigabytes = 1,000,000,000 megabytes
  • 7. Big Data Impact
      - Handles more than 1 million customer transactions every hour
          - data imported into databases that contain > 2.5 petabytes of data
          - the equivalent of 167 times the information contained in all the books in the US Library of Congress
      - Others:
          - Facebook handles 40 billion photos from its user base
          - Google processes 1 terabyte per hour
          - Twitter processes 85 million tweets per day
          - eBay processes 80 terabytes per day
  • 8. Big Data Solutions. Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Technologies include:
      - massively parallel processing (MPP) databases
      - data warehouses
      - data mining grids
      - distributed file systems
      - distributed databases
      - cloud computing platforms
      - the Internet, and
      - scalable storage systems
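The unit arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the `convert` helper and the `UNITS` list are illustrative names, and it assumes the decimal (factor of 1,000) units the slide uses.

```python
# Decimal storage units as used on the slide: each step is a factor of 1,000.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes"]


def convert(value, from_unit, to_unit):
    """Convert a size between decimal storage units (hypothetical helper)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


big_data_threshold_pb = 5  # the slide's "5 petabytes or more"

print(convert(1, "petabytes", "terabytes"))
print(convert(1, "petabytes", "megabytes"))
print(convert(big_data_threshold_pb, "petabytes", "gigabytes"))
```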
  • 9. What is a Data Warehouse?
  • 10. What is a Data Warehouse? A data warehouse:
      - is typically a relational database that is designed for query and analysis rather than for transaction processing
      - is a place where historical data is stored for archival, analysis and security purposes
      - contains either raw data or formatted data
      - combines data from multiple sources:
          - sales
          - salaries
          - operational data
          - human resource data
          - inventory data
          - web logs
          - social networks
          - Internet text and docs
          - other
      [Diagram labels: Legacy DB; CRM/ERP DB; Finance DB]
  • 11. Data Warehouse: Business Case. Why build a Data Warehouse?
      - Data stored in operational systems (OLTP) is not easily accessible
      - OLTP systems are not designed for end-user analysis
      - The data in OLTP is constantly changing
      - OLTP may be deficient in historical data
      - Diverse forms of data are stored on different platforms and/or in dissimilar formats
  • 12. Data Warehouse: Business Case. The Data Warehouse business solution:
      - Collects data from different sources (other databases, files, web services, etc.)
      - Integrates data into logical business areas
      - Provides direct access to data with powerful reporting tools (BI)
  • 13. Data Warehouse: About the data. The Data Warehouse data is:
      - Subject-oriented
      - Integrated
      - Non-volatile
      - Time-variant
  • 14. Data Warehouse: the ETL process. ETL = Extract, Transform, Load. Why ETL? The data warehouse needs to be loaded regularly (daily/weekly) so that it can serve its purpose of facilitating business analysis.
      - Extract: data is extracted from one or more OLTP systems and copied into the warehouse
      - Transform: removing inconsistencies, assembling data into a common format, adding missing fields, summarizing detailed data and deriving new fields to store calculated data
      - Load: map the data and load it into the DW
  • 15. Data Warehouse: the ETL process. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target Data Warehouse]
  • 16. Data Warehouse: the Marketplace. "The data warehousing market will see a compound annual growth rate of 11.5% through 2013 to reach a total of $13.2 billion in revenue." (consulting specialist The 451 Group)
      - Data warehouse size (analyst firm Gartner):
          - Small data warehouses: < 5 TB
          - Midsize data warehouses: 5 TB - 20 TB
          - Large data warehouses: > 20 TB
      - Leaders in Data Warehouse Data Management Systems, per analyst firm Gartner's 'Magic Quadrant for Data Warehouse Database Management Systems' (vendor logos not captured in the transcript)
  • 17. Data Warehouse: the Marketplace
      - Delivery models:
          - stand-alone DBMS software
          - cloud offerings
          - data warehouse appliances
      - Leading appliance makers (vendor logos not captured in the transcript)
  • 19. Business Intelligence (BI): What is it?
      - Software applications used in spotting, digging out and analyzing business data
      - Provides simple access to data that can be used in day-to-day operations, and integrates data into logical business areas
      - Provides historical, current and predictive views of business operations
      - Made up of several related activities, including data mining, online analytical processing, querying and reporting
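The three ETL stages can be sketched end to end with Python's built-in `sqlite3` module. Everything concrete here is an assumption made for illustration: the `orders` and `dw_orders` tables, their columns and the sample rows are hypothetical, and two in-memory databases stand in for an OLTP source and the warehouse.

```python
import sqlite3

# Hypothetical source (OLTP) and target (warehouse) databases, both in memory.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER, country TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, " alice ", 1250, "us"), (2, "BOB", 300, None), (3, "Carol", 9900, "de")],
)
target.execute(
    "CREATE TABLE dw_orders (id INTEGER, customer TEXT, amount REAL, country TEXT)"
)


def extract(conn):
    """Extract: read the rows from the source system."""
    return conn.execute(
        "SELECT id, customer, amount_cents, country FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: common format, default for a missing field, derived field."""
    out = []
    for order_id, customer, cents, country in rows:
        out.append((
            order_id,
            customer.strip().title(),        # remove inconsistent name formatting
            cents / 100,                     # derive the amount in currency units
            (country or "UNKNOWN").upper(),  # fill in a missing field
        ))
    return out


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


load(target, transform(extract(source)))
loaded = target.execute("SELECT * FROM dw_orders ORDER BY id").fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the diagram and gives a tester a natural place to check the data between each stage.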
  • 20. Business Intelligence (BI): Who uses it? Wal-Mart uses vast amounts of data and category analysis to dominate the industry. Amazon and Yahoo follow a "test and learn" approach to business changes. Hardee’s, Wendy’s, and T.G.I. Friday’s use BI to make strategic decisions.
  • 21. Business Intelligence (BI) & Data Marts. Data Mart: a database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise. It typically holds aggregated data and some granular data. It is a subset of the DW and makes Business Intelligence reporting more efficient. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart]
  • 22. Business Intelligence (BI). [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart]
  • 23. BI: the Marketplace
      - "Worldwide business intelligence (BI) platform, analytic applications and performance management (PM) software revenue reached $10.5 billion in 2010, a 13.4 percent increase from 2009 revenue of $9.3 billion." (analyst firm Gartner)
      - "The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share." (analyst firm Gartner)
      - Leaders in BI, per analyst firm Forrester Research's 'Forrester Wave' (vendor logos not captured in the transcript)
  • 24. Testing a Data Warehouse (DWH)
  • 25. Data Warehouse Testing
      - The challenge: comprehensive testing of data at every point throughout the data process is becoming increasingly important as more data is being used in strategic decision-making. Yet current strategies are time-consuming, resource-intensive and inefficient.
      - What's involved in data testing? According to authors Doug Vucevic and Wayne Yaddow in the book "Testing the Data Warehouse Practicum: Assuring Data Content, Data Structures and Quality", some of the main challenges of data testing are:
          - Data completeness: verifying that all data has been loaded from the sources to the target
          - Data transformation: ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process
          - Data quality: ensuring that the ETL process correctly rejects, substitutes default values, corrects or ignores and reports invalid data
          - Regression testing: ensuring existing functionality remains intact each time a new release of code is completed
  • 26. Testing the DWH: Resources Involved
      - Business Analysts create requirements
      - QA Testers develop and execute test plans and test cases. Skill set required: strong SQL!
      - Architects set up test environments
      - Developers perform unit tests
      - DBAs test for performance and stress
      - Business Users perform functional User Acceptance Tests
      For the purposes of this presentation, we will focus on a strategy for Testers.
  • 27. Testing the Data Warehouse: the Strategy. An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:
      1) The sources
      2) The ETL layer
      3) The data warehouse itself
      4) The front-end (BI) data warehouse applications
  • 28. Testing the Data Warehouse: Entry Points. Recommended functional test strategy: test every entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal: provide rapid localization of data issues between points. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart → Business Intelligence software, with test entry points marked along the flow]
  • 29. Testing the Data Warehouse: Entry Points. [Diagram of a possible architecture. Labels: Source Data (Legacy DB, CRM/ERP DB, Finance DB, files); Staging DB; ETL Process; Target DW; Data Marts; Business Intelligence software. Test entry points are marked at several places along the flow.]
  • 30. Testing the DWH: the Mapping Document, a.k.a. Source to Target Map. It is the critical element required to efficiently plan the ETL process.
      - Intention:
          - capture business rules
          - data flow mapping
          - data movement requirements
      - The mapping doc specifies:
          - Source input definition
          - Target/output details
          - Business & data transformation rules
          - Data quality requirements
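A data completeness check, the first challenge listed above, can be sketched as a row-count comparison plus a set difference between source and target. The table names `src_customer` and `dw_customer` and their contents are hypothetical, and SQLite stands in for the real source and warehouse platforms; `EXCEPT` is spelled `MINUS` on some other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT);
    CREATE TABLE dw_customer  (id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO dw_customer  VALUES (1, 'Ann'), (2, 'Ben');  -- row 3 was never loaded
""")


def row_count(table):
    # Table names are fixed strings in this sketch, never user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Completeness check 1: do the row counts match?
counts_match = row_count("src_customer") == row_count("dw_customer")

# Completeness check 2: which source rows are absent from the target?
missing = conn.execute("""
    SELECT id, name FROM src_customer
    EXCEPT
    SELECT id, name FROM dw_customer
""").fetchall()

print(counts_match, missing)
```

The count comparison is cheap but only says that something is wrong; the set difference identifies the rows that were dropped.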
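One way to make a mapping document usable by tests is to hold each mapped column as a small record and derive a source/target query pair from it. The sketch below is an assumption about how that might look, not a format prescribed by the deck: `MappingRule`, its fields and `query_pair` are hypothetical, and the two sample rows borrow table and column names from the deck's own example queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One row of a source-to-target map (hypothetical structure)."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str  # business / data transformation rule, in words
    quality_rule: str    # data quality requirement, in words


rules = [
    MappingRule("Sales.Customer", "lastName", "dw.user_", "lastName", "copy as-is", "not null"),
    MappingRule("Sales.Orders", "idOrder", "dw.Purchase", "idPurchase", "copy as-is", "unique"),
]


def query_pair(rule):
    """Build a (source, target) pair of queries for one mapped column."""
    src = f"SELECT {rule.source_column} FROM {rule.source_table} ORDER BY 1"
    tgt = f"SELECT {rule.target_column} FROM {rule.target_table} ORDER BY 1"
    return src, tgt


for rule in rules:
    print(query_pair(rule))
```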
  • 31. Testing the DWH: the Mapping Document
      Source:
        SELECT c.idCustomer "Customer ID",
               c.lastName "Customer Last Name",
               c.firstName "Customer First Name",
               o.idOrder "Order Number",
               p.name "Product Name",
               op.quantity "Quantity Ordered",
               CASE
                 WHEN os.idOrderStatus = 5 AND o.refundDate IS NOT NULL THEN 'Returned'
                 WHEN (os.idOrderStatus = 3 OR os.idOrderStatus = 4) AND o.shipDate IS NOT NULL THEN 'Delivered'
                 ELSE 'Processing'
               END "Order Status"
        FROM Sales.Orders o, Sales.OrderStatus os, Sales.OrderProduct op,
             Sales.Product p, Sales.Category cat, Sales.Customer c
        WHERE o.order_idOrderStatus = os.idorderstatus
          AND op.orderProduct_idOrder = o.idOrder
          AND op.orderProduct_idProduct = p.idProduct
          AND p.product_idCategory = cat.idCategory
          AND cat.name = 'Electronics'
          AND o.order_idCustomer = c.idCustomer
          AND o.orderDate BETWEEN '01-SEP-10' AND '07-SEP-10'
        ORDER BY c.idCustomer, c.lastName, c.firstName, o.idorder
      Target:
        SELECT u.idUser "Customer ID",
               u.lastName "Customer Last Name",
               u.firstName "Customer First Name",
               p.idPurchase "Purchase Number",
               i.name "Item Name",
               oi.quantity "Quantity Ordered",
               ps.status "Purchase Status"
        FROM dw.Purchase p, dw.PurchaseStatus ps, dw.OrderItem oi,
             dw.Item i, dw.user_ u, dw.category cat
        WHERE p.purchase_idPurchaseStatus = ps.idPurchaseStatus
          AND oi.orderItem_idPurchase = p.idPurchase
          AND oi.orderItem_idItem = i.idItem
          AND p.purchase_idUser = u.idUser
          AND i.item_idCategory = cat.idCategory
          AND cat.name = 'Electronics'
          AND SUBSTR(p.purchaseDate, 1, 5) BETWEEN '09-01' AND '09-07'
          AND SUBSTR(p.purchaseDate, -2) = '10'
        ORDER BY u.idUser, u.lastname, u.firstname, p.idpurchase
  • 32. Testing the DWH: Implementation. Implementation of the functional test: what is going on in the marketplace?
      1. Manual execution
      2. Automated execution with standard test tools
      3. Bulk automation with a Data Warehouse testing tool (e.g. QuerySurge)
  • 33. Testing the DWH: Manual Testing Flow. Tasks: Review mapping docs → Write SQL in favorite editor → Run tests → Dump results to a file → Compare results manually or with a compare tool → Report defects and issues. [Diagram rows: Tools; Tasks; Timeline]
  • 34. Testing the DWH: Manual Testing Flow. Comments on the manual ETL testing flow:
      - Place check points across each leg so that each transformation is checked.
      - If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the database is under no obligation to return rows in a specified order unless the SQL specifies one).
      - This process can quickly result in hundreds or thousands of source and target query pairs.
      - The process is labor intensive. Even with multiple people, only a VERY small sampling can be performed.
  • 35. Testing the DWH: Typical Functional Automation Testing Flow. Functional automation ETL testing flow:
      1. Similar to the previous flow: extract mappings from the mapping document
      2. Write pairs of queries that test between any two points in the architecture
      3. Issue the queries via a functional automation tool
      4. Have the functional scripts dump the query result sets to files
      5. Compare the files, either by writing automation code or by using a file compare tool
      This process is dependent on the speed of the automation tool. Normally, only a fraction of the data can be covered per ETL per build.
  • 36. Testing the Data Warehouse: Specialized Data Warehouse Test Tool: QuerySurge™. [Diagram labels: SQL (source) / SQL (target) query pairs; Legacy DB; CRM/ERP DB; Finance DB]
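The idea behind the query pair above is that the source query applies the transformation rule (the `CASE` expression) and the target query reads the already-transformed value, so the two result sets should match row for row. A reduced sketch of that comparison follows; the `src_orders` and `dw_purchase` tables and their rows are hypothetical simplifications of the deck's schema, and SQLite stands in for both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (idOrder INTEGER, idOrderStatus INTEGER, shipDate TEXT, refundDate TEXT);
    CREATE TABLE dw_purchase (idPurchase INTEGER, status TEXT);
    INSERT INTO src_orders VALUES
        (1, 5, '2010-09-02', '2010-09-05'),
        (2, 3, '2010-09-03', NULL),
        (3, 1, NULL, NULL);
    INSERT INTO dw_purchase VALUES (1, 'Returned'), (2, 'Delivered'), (3, 'Delivered');
""")

# Source side: apply the business rule from the mapping document.
SOURCE_SQL = """
    SELECT idOrder,
           CASE
               WHEN idOrderStatus = 5 AND refundDate IS NOT NULL THEN 'Returned'
               WHEN idOrderStatus IN (3, 4) AND shipDate IS NOT NULL THEN 'Delivered'
               ELSE 'Processing'
           END
    FROM src_orders
    ORDER BY idOrder
"""
# Target side: read the value the ETL process actually stored.
TARGET_SQL = "SELECT idPurchase, status FROM dw_purchase ORDER BY idPurchase"

source_rows = conn.execute(SOURCE_SQL).fetchall()
target_rows = conn.execute(TARGET_SQL).fetchall()

# Pairwise comparison; this assumes both queries return the same number of rows.
mismatches = [(s, t) for s, t in zip(source_rows, target_rows) if s != t]
print(mismatches)
```

Both queries carry an explicit `ORDER BY` so that the pairwise comparison lines up the same business key on each side.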
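The row-order caveat above can also be sidestepped by comparing result sets as multisets instead of line by line. The following is a minimal sketch of that alternative, not something the deck prescribes; `diff_result_sets` and the sample rows are hypothetical, and sorting the differences assumes the rows contain no NULL (`None`) values.

```python
from collections import Counter


def diff_result_sets(source_rows, target_rows):
    """Order-independent comparison that still respects duplicate rows."""
    src, tgt = Counter(source_rows), Counter(target_rows)
    return {
        "missing_in_target": sorted((src - tgt).elements()),
        "unexpected_in_target": sorted((tgt - src).elements()),
    }


# The target returns rows in a different order, drops one duplicate
# and contains a typo in one name.
source_rows = [(1, "Ann"), (2, "Ben"), (3, "Cy"), (3, "Cy")]
target_rows = [(3, "Cy"), (1, "Ann"), (2, "Benn")]

result = diff_result_sets(source_rows, target_rows)
print(result)
```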
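Steps 3 to 5 of this flow (issue the queries, dump the result sets to files, compare the files) can be sketched with the standard library alone. The `src` and `tgt` tables, the file names and the `dump` helper are hypothetical; a real functional automation tool would replace the query execution and the comparison.

```python
import csv
import difflib
import sqlite3
import tempfile
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO tgt VALUES (1, 'Ann'), (2, 'Bob');
""")


def dump(sql, path):
    """Run a query and dump its result set to a CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(conn.execute(sql))


workdir = Path(tempfile.mkdtemp())
src_file, tgt_file = workdir / "source.csv", workdir / "target.csv"

# Both queries use ORDER BY so the files are comparable line by line.
dump("SELECT id, name FROM src ORDER BY id", src_file)
dump("SELECT id, name FROM tgt ORDER BY id", tgt_file)

diff = list(difflib.unified_diff(
    src_file.read_text().splitlines(),
    tgt_file.read_text().splitlines(),
    "source", "target", lineterm="",
))
print("\n".join(diff))
```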
  • 37. QuerySurge™ the collaborative Data Warehouse Testing solution that finds bad data & provides a holistic view of your data’s health built by
  • 38. • Reduce your costs & risks • Improve your data quality • Accelerate your testing cycles • Share information with your team with QuerySurge™ you can: built by QuerySurge™ • Provides huge ROI (i.e. 1,300%)* *based on client’s calculation of Return on Investment
  • 39. the QuerySurge advantage built by QuerySurge™ Automate the entire testing cycle  Automate kickoff, tests, comparison, auto-emailed results Create Tests easily with no SQL programming  ensures minimal time & effort to create tests / obtain results Test across different platforms  data warehouse, Hadoop, NoSQL, database, flat file, XML Collaborate with team  Data Health dashboard, shared tests & auto-emailed reports Verify more data & do it quickly  verifies up to 100% of all data up to 1,000 x faster Integrate for Continuous Delivery  Integrates with most Build, ETL & QA management software
  • 40. QuerySurge™ Architecture Web-based… Installs on... Linux Connects to… …or any other JDBC compliant data source built by QuerySurge™ QuerySurge Controller QuerySurge Server QuerySurge Agents Flat Files
  • 41. Collaboration Testers - functional testing - regression testing - result analysis Developers / DBAs - unit testing - result analysis Data Analysts - review, analyze data - verify mapping failures Operations teams - monitoring - result analysis Managers - oversight - result analysis Share information on the built by QuerySurge™
  • 42. Strategy • Execute business user reports and verify results from report to Data Mart » Logical Calculations − Verify logical calculations to back-end Data Mart by creating SQL queries that incorporate and return the calculations from the Data Mart. Compare to report. (Example: Total sales for the month of January) » Data Validation − Verify data validations to back-end Data Mart by creating SQL queries that incorporate and return the equivalent data from the Data Mart. Compare to report. (Example: List of all customers that spent more than $100) » Parameter Validation − For reports that have parameters, create multiple tests that incorporate a reasonable amount of test coverage. Testing the Data Warehouse: Functional Test of Business Intelligence software
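The two examples on this slide (total sales for January, customers that spent more than $100) can be sketched as back-end queries whose results are compared with what the report shows. This is a minimal illustration: the `mart_sales` table, its data, and the "report" values are hypothetical, and amounts are stored in cents to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mart_sales (customer TEXT, sale_date TEXT, amount_cents INTEGER);
    INSERT INTO mart_sales VALUES
        ('Ann', '2015-01-05', 7500),
        ('Ann', '2015-01-20', 4000),
        ('Bob', '2015-01-11', 9000),
        ('Bob', '2015-02-02', 5000),
        ('Cy',  '2015-03-01', 3000);
""")

# Values as they would be read off the rendered report (hypothetical).
report_january_total_cents = 20500
report_customers_over_100 = ["Ann", "Bob"]

# Logical calculation: total sales for the month of January.
(backend_total,) = conn.execute(
    "SELECT SUM(amount_cents) FROM mart_sales "
    "WHERE sale_date >= '2015-01-01' AND sale_date < '2015-02-01'"
).fetchone()

# Data validation: all customers that spent more than $100 in total.
backend_customers = [row[0] for row in conn.execute(
    "SELECT customer FROM mart_sales GROUP BY customer "
    "HAVING SUM(amount_cents) > 10000 ORDER BY customer")]

checks = {
    "january_total": backend_total == report_january_total_cents,
    "customers_over_100": backend_customers == report_customers_over_100,
}
print(checks)
```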
  • 43. Testing the DWH: Functional Test of BI Functional Testing of BI 1. BI Developer creates reports based on Business user requirements 2. Testers verify reports by: • Running reports using a range of parameter permutations. • Verify that data is correct o Record counts on report to backend data mart o Verify field data elements o Verify field lengths and field level data o Verify logical dependencies Functional Tester Automation tools can and should be used for regression purposes.
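Running a report across parameter permutations and checking record counts and field lengths against the data mart could be sketched as follows. Everything named here is an assumption for illustration: `run_report` is a hypothetical stand-in for whatever interface the BI tool exposes, `mart_orders` is an invented table, and the canned report data contains one deliberate defect so the check has something to find.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mart_orders (region TEXT, quarter TEXT, customer TEXT);
    INSERT INTO mart_orders VALUES
        ('East', 'Q1', 'Ann'), ('East', 'Q1', 'Bob'),
        ('East', 'Q2', 'Cy'),  ('West', 'Q1', 'Dee');
""")

def run_report(region, quarter):
    """Hypothetical stand-in for the BI tool: rows shown on the report."""
    canned = {
        ("East", "Q1"): ["Ann", "Bob"],
        ("East", "Q2"): ["Cy"],
        ("West", "Q1"): ["Dee"],
        ("West", "Q2"): ["Eve"],  # deliberate defect: no such row in the mart
    }
    return canned[(region, quarter)]

MAX_CUSTOMER_LEN = 10  # assumed width of the customer field

count_failures = []
length_violations = []
for region, quarter in itertools.product(["East", "West"], ["Q1", "Q2"]):
    report_rows = run_report(region, quarter)
    (backend_count,) = conn.execute(
        "SELECT COUNT(*) FROM mart_orders WHERE region = ? AND quarter = ?",
        (region, quarter),
    ).fetchone()
    # Record count on the report vs. the back-end data mart.
    if len(report_rows) != backend_count:
        count_failures.append((region, quarter, len(report_rows), backend_count))
    # Field length check on every value the report displays.
    length_violations += [v for v in report_rows if len(v) > MAX_CUSTOMER_LEN]

print("count mismatches:", count_failures)
print("length violations:", length_violations)
```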
  • 44. Common Challenges • BI systems often have reports that require complex SQL queries across dozens of tables encompassing 100’s of 1,000’s of records, from multiple databases. • Challenge: Determining the performance characteristics under differing conditions and workloads. • Need to know the ability of the system to scale to the # of concurrent users. • Must test the length of time it takes a user to receive a report after requesting it with the parameters he/she specifies. Testing the Data Warehouse: Performance Test of BI
  • 45. Testing the DWH: Performance Test of BI Strategy • Determine a typical workload for the business intelligence system. • Identify different user roles, what kinds of work they do on the system, and how often they do this work. • Determine how many users of each role there are. • Choose a performance tool that can record the protocol activity of the system and allow the performance tester to modify data parameters. • Create scripts by recording the protocol traffic emitted by the BI system as the targeted reports were opened and refreshed. • Prepare and execute series of concurrent multi-user tests • Make sure each virtual user emulates the activity of real users accessing business intelligence reports based on separate concerns. • Monitor response times, throughput, network activity, and system activity for issues • Review results and provide recommendations. Using this approach, the workload activity of the entire population of business intelligence users can be reproduced in controlled conditions Performance Tester
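The idea of a role-based, concurrent multi-user test can be sketched in a few lines. This is not a substitute for a protocol-recording performance tool; it only illustrates the shape of the workload model. The roles, user counts, and report names in `WORKLOAD` are invented, and `request_report` is a hypothetical stand-in that sleeps instead of calling a real BI server.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed workload model: how many users of each role, and what they open.
WORKLOAD = {
    "analyst": {"users": 3, "report": "sales_by_region"},
    "manager": {"users": 2, "report": "monthly_summary"},
}

def request_report(report_name):
    """Hypothetical stand-in for a request to the BI server."""
    time.sleep(0.01)
    return "ok"

def virtual_user(role, report_name, iterations=2):
    """One virtual user: open the report repeatedly, timing each request."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        request_report(report_name)
        timings.append(time.perf_counter() - start)
    return role, timings

def run_load_test(workload):
    """Run all virtual users concurrently; return response times per role."""
    jobs = [(role, spec["report"])
            for role, spec in workload.items()
            for _ in range(spec["users"])]
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        outcomes = list(pool.map(lambda job: virtual_user(*job), jobs))
    by_role = {}
    for role, timings in outcomes:
        by_role.setdefault(role, []).extend(timings)
    return by_role

results = run_load_test(WORKLOAD)
for role, timings in results.items():
    print(role, len(timings), "requests, mean",
          round(statistics.mean(timings), 4), "s")
```

In a real test the per-role response times would be monitored alongside throughput, network activity, and system activity, as the slide describes.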
  • 46. Summary What is a Data Warehouse and How Do I Test It? • Big Data is a growing technical concern and has reached $70 billion in scope. • The Data Warehouse and Business Intelligence software marketplace is a $22 billion market and growing. • Functional testing of a data warehouse implementation is a complex undertaking and requires strong SQL skills by the Tester • Manual testing and automated testing using standard tools provide a very small % of coverage. • Business Intelligence software must be properly tested for both functionality and performance.
  • 47. To see the video of this Webinar please visit: http://www.querysurge.com/solutions/data-warehouse-testing What is a Data Warehouse and How Do I Test It?

Editor's Notes

  1. Volume -- of data is getting higher/bigger than ever. Velocity -- of data is increasing e.g. Complex Event Processing of real time data.  Variety -- of data is spiraling e.g. unstructured video and voice. Variability -- of data types is also increasing
  2. Corporate data in an organization is generated and stored in a variety of operational systems.  Operational systems are systems like order-entry and invoicing, that are tuned to handle day-to-day transactions. OLTP - On-line Transaction Processing systems.
  3. Access Data Directly With data warehousing, as a decision-maker, you do not need to rely on IS personnel to fulfill your querying needs.  You can access data directly, when and how you want. You can execute queries and build reports on your workstation, freeing the IS department to focus on tasks such as building applications.
  4. Subject-Oriented OLTP = application-oriented & current— designed to support application processing. DW = subject-oriented & historical— designed to aid decision-making. Integrated Data in a warehouse is integrated by consolidating data from different operational systems. Non-Volatile Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred. Where an operational system replaces existing data with new data, a data warehouse continually absorbs new data, integrating it with existing data. Time Variant In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to OLTP systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
  5. Designing and maintaining the ETL process is often considered one of the most difficult and resource-intensive portions of a data warehouse project. Many data warehousing projects use ETL tools to manage this process. Other data warehouse builders create their own ETL tools and processes, either inside or outside the database. Besides the support of extraction, transformation, and loading, there are some other tasks that are important for a successful ETL implementation as part of the daily operations of the data warehouse and its support for further enhancements.
  6. Informatica’s software is the premier software used for ETL, but it was not mentioned in Gartner’s report because Informatica doesn’t have DW software.
  7. Companies use BI to improve decision making, cut costs and identify new business opportunities. BI is more than just corporate reporting and more than a set of tools to coax data out of enterprise systems. CIOs use BI to identify inefficient business processes that are ripe for re-engineering. With today’s BI tools, business folks can jump in and start analyzing data themselves, rather than wait for IT to run complex reports.
  8. Restaurant chains such as Hardee’s, Wendy’s, Ruby Tuesday and T.G.I. Friday’s are heavy users of BI software. They use BI to make strategic decisions, such as what new products to add to their menus, which dishes to remove and which underperforming stores to close. They also use BI for tactical matters such as renegotiating contracts with food suppliers and identifying opportunities to improve inefficient processes. Because restaurant chains are so operations-driven, and because BI is so central to helping them run their businesses, they are among the elite group of companies across all industries that are actually getting real value from these systems.
  9. Each of these units must be treated separately and in combination, and since there may be multiple components in each (multiple feeds to ETL, multiple databases or data repositories that constitute the warehouse, and multiple front-end applications), each of these subsystems must be individually validated.
  10. 1. (comment) Usually, the points are across each ETL “Leg”, so that each transformation is checked stepwise. 4. If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the database is under no obligation to return rows in a specified order unless the SQL specifies one). Output from file compare tools is (usually) not fancy; reporting will be ad hoc using Excel or similar. This process can quickly result in 100’s or 1,000’s of pairs of queries, since if you write several testing queries for each mapping across each leg, the multipliers raise the numbers quickly (avg # of testing queries per mapping X # of mappings per leg X # of legs). Clearly, this process is labor intensive, and even with several people executing, only a tiny fraction of the data can be covered per ETL per build.
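As a worked instance of the multiplier in the note above, the sketch below plugs in purely illustrative numbers (3 queries per mapping, 200 mappings per leg, 4 legs); none of these figures come from the deck.

```python
def query_pair_count(queries_per_mapping, mappings_per_leg, legs):
    """avg # of testing queries per mapping X # of mappings per leg X # of legs."""
    return queries_per_mapping * mappings_per_leg * legs

# Illustrative values only.
total_pairs = query_pair_count(queries_per_mapping=3, mappings_per_leg=200, legs=4)
print(total_pairs)  # 2400
```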
  11. Functional Automation ETL Testing flow As above - Extract mappings from mapping document Write pairs of queries that test between any two points in the architecture. Usually, the points are across each ETL “Leg”, so that each transformation is checked stepwise. Issue the queries via a Functional Automation tool Have the functional Scripts dump the query result-sets to files Compare the files, either by writing automation code or by using a file compare tool. The tool logs serve as reporting output. Clearly, this process is dependent on the speed of the automation tool; even with several tool instances executing, typically only a fraction of the data can be covered per ETL per build.
  12. QuerySurge provides insight into the health of your data throughout your organization through BI dashboards and reporting at your fingertips. It is a collaborative tool that allows for distributed use throughout your organization and provides a sharable, holistic view of your data’s health and of the maturity of your organization’s data management.
  13. QuerySurge helps your team coordinate your data quality initiatives while speeding up your development and testing cycles and finding your bad data. Why risk having your team identify trends and develop strategic initiatives when the underlying data is incorrect? QuerySurge reduces this risk.
  14. Your distributed team from around the world can use any of these web browsers: Internet Explorer, Chrome, Firefox and Safari. Installs on operating systems: Windows & Linux. QS connects to any JDBC-compliant data source, even if it is not listed here.
  15. QuerySurge can be utilized by active practitioners such as testers & developers to create and launch tests, or by managers, analysts and operations to view data test results and the overall health of the data. QuerySurge facilitates this by providing 2 types of licenses: (1) full user & (2) participant user. (1) Full User – This type of user has unlimited access to create QueryPairs, Suites, and Scenarios. This user can also schedule and run tests, see results, run and export reports, and export data. Perfect for anyone creating and/or running data tests while performing analysis of results. (2) Participant User – This user cannot create or run tests, but has access to all other information - including viewing all query pairs, results, and reports, receiving email notifications, and exporting test results and reports. Perfect for managers, analysts, architects, DBAs, developers, and operations users who need to know the health of their data.
  16. Business Intelligence systems often have reports that require the use of complex SQL queries across dozens of tables encompassing hundreds of thousands of records, originating from several individual databases. Determining the performance characteristics of these systems under differing conditions and workloads has always been a challenge. In most cases, we wish to test how long it would take a user to receive a report after requesting it with the parameters he or she specifies. In order to begin, we must determine a typical workload for the business intelligence system. We determine how many user roles there are, what kinds of work they do on the system, and how often they do this work. We determine how many users of each role there are. It is important to choose a performance tool that can record the protocol activity of the system and that allows the performance tester to generalize the recording so it can be driven by different data parameters, emulating the variable data that business intelligence systems often require. We create scripts by recording the protocol traffic emitted by the business intelligence system as the targeted reports are opened and refreshed. These scripts are then modified to accept data parameters from a file or from prepared information determined by a subject matter expert. These parameters are put into data files, in a form suitable for use with the test scripts.
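Feeding script parameters from a data file, as described in the note above, might look like the following sketch. The CSV layout, the column names `region` and `quarter`, and the virtual user names are assumptions for illustration; an in-memory string stands in for the parameter file so the example is self-contained.

```python
import csv
import io
import itertools

# Stand-in for a parameter file prepared by a subject matter expert.
PARAMETER_FILE = io.StringIO(
    "region,quarter\n"
    "East,Q1\n"
    "East,Q2\n"
    "West,Q1\n"
)

parameter_sets = list(csv.DictReader(PARAMETER_FILE))

def assign_parameters(virtual_users, parameter_sets):
    """Hand out parameter rows to virtual users, wrapping around as needed."""
    rotation = itertools.cycle(parameter_sets)
    return {user: next(rotation) for user in virtual_users}

assignments = assign_parameters(["vu1", "vu2", "vu3", "vu4"], parameter_sets)
for user, params in assignments.items():
    print(user, dict(params))
```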