What is a Data Warehouse and How Do I Test It?

75,011 views

ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.

Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace.

Are you being tasked to test these environments or would you like to learn about them and be prepared for when you are asked to test them?

RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies.

You will learn the answer to the following questions:
• What is Big Data and what does it mean to me?
• What are the business reasons for building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?

This slide deck is geared towards:
• QA Testers
• Data Architects
• Business Analysts
• ETL Developers
• Operations Teams
• Project Managers

...and anyone else who (a) is new to the EDW space, (b) wants to be educated in the business and technical sides and (c) wants to understand how to test them.

Published in: Technology, Business

  1. 1. © 2015 Real-Time Technology Solutions, Inc. New York, Philadelphia, Atlanta. www.rtts.com. What is a Data Warehouse and How Do I Test It? A primer for Testers on Data Warehouses, the ETL process and Business Intelligence, and how to test them
  2. 2. About RTTS. Facts: Founded: 1996 (consulting firm). Locations: New York (HQ), Atlanta, Philadelphia, Phoenix. Strategic Partners: IBM, Microsoft, HP, Oracle, Teradata, HortonWorks, Cloudera, Amazon. Software: QuerySurge. RTTS is the leading provider of software & data quality for critical business systems.
  3. 3. Overview
     • What is Big Data?
     • What is a Data Warehouse?
       o About the ETL Process
       o The Data Warehouse marketplace
     • What is Business Intelligence?
       o The architecture
       o The BI marketplace
     • Testing the DW Architecture
       o Entry points
       o The Mapping document
       o Functional test implementation
       o Test Tools
     • Testing BI
       o Functional test implementation
       o Performance Testing
     • Data Warehouse Test Tool demo
     • Q&A
  4. 4. The Executive Office and Critical Data. CxOs are using Business Intelligence & Analytics to make critical business decisions, with the assumption that the underlying data is fine. “The average organization loses $8.2 million annually through poor Data Quality.” - Gartner [Diagram labels: Data Architecture; ETL; Business Intelligence (BI) software; potential problem areas]
  5. 5. What is Big Data?
  6. 6. What is Big Data? Big data: defined as too much volume, velocity and variability to work on normal database architectures. “The market for big data is $70 billion and growing by 15% a year.” - EMC COO Pat Gelsinger. Size: defined as 5 petabytes or more. 1 petabyte = 1,000 terabytes = 1,000,000 gigabytes = 1,000,000,000 megabytes.
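The unit arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the decimal (factor of 1,000) units used on the slide; the `convert` helper and the `UNITS` list are illustrative names, not part of the original deck.

```python
# Decimal units as used on the slide: each step up is a factor of 1,000.
UNITS = ["MB", "GB", "TB", "PB"]


def convert(value, from_unit, to_unit):
    """Convert between MB, GB, TB and PB using factors of 1,000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


if __name__ == "__main__":
    # The slide's "Big Data" threshold of 5 petabytes, expressed in terabytes.
    print(convert(5, "PB", "TB"))
    # One petabyte expressed in megabytes.
    print(convert(1, "PB", "MB"))
```

Note that storage vendors and operating systems sometimes use binary multiples (factors of 1,024) instead, so the exact figures depend on which convention is meant.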
  7. 7. Big Data Impact
     • (Company name not captured in the transcript) Handles more than 1 million customer transactions every hour.
       o data imported into databases that contain > 2.5 petabytes of data
       o the equivalent of 167 times the information contained in all the books in the US Library of Congress
     • Facebook handles 40 billion photos from its user base.
     • Google processes 1 Terabyte per hour.
     • Twitter processes 85 million tweets per day.
     • eBay processes 80 Terabytes per day.
     • others
  8. 8. Big Data Solutions. Requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Technologies include:
     • massively parallel processing (MPP) databases
     • data warehouses
     • data mining grids
     • distributed file systems
     • distributed databases
     • cloud computing platforms
     • the Internet, and
     • scalable storage systems
  9. 9. What is a Data Warehouse?
  10. 10. What is a Data Warehouse?
      Data Warehouse
      • typically a relational database that is designed for query and analysis rather than for transaction processing
      • a place where historical data is stored for archival, analysis and security purposes
      • contains either raw data or formatted data
      • combines data from multiple sources:
        o sales
        o salaries
        o operational data
        o human resource data
        o inventory data
        o web logs
        o social networks
        o Internet text and docs
        o other
      [Diagram labels: Legacy DB, CRM/ERP DB, Finance DB]
  11. 11. Data Warehouse: Business Case Why build a Data Warehouse? • Data stored in operational systems (OLTP) not easily accessible • OLTP systems are not designed for end-user analysis • The data in OLTP is constantly changing • May be deficient in historical data • Diverse forms of data stored in different platforms and/or dissimilar formats
  12. 12. Data Warehouse: Business Case The Data Warehouse Business Solution • Collects data from different sources (other databases, files, web services, etc) • Integrates data into logical business areas • Provides direct access to data with powerful reporting tools (BI)
  13. 13. Data Warehouse: About the data The Data Warehouse data • Subject-oriented • Integrated • Non-volatile • Time-variant
  14. 14. Data Warehouse: the ETL process. ETL = Extract, Transform, Load. Why ETL? The data warehouse needs to be loaded regularly (daily/weekly) so that it can serve its purpose of facilitating business analysis.
      • Extract: data is read from one or more OLTP systems and copied into the warehouse.
      • Transform: remove inconsistencies, assemble the data into a common format, add missing fields, summarize detailed data and derive new fields to store calculated data.
      • Load: map the data and load it into the DW.
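The three ETL steps can be illustrated with a small, self-contained Python sketch. It uses in-memory SQLite databases as stand-ins for the OLTP source and the warehouse; the table names (`orders`, `dw_orders`), the columns and the cleaning rules are all assumptions made for illustration, not taken from the deck.

```python
import sqlite3


def run_etl(source, target):
    """Copy rows from a hypothetical OLTP table into a hypothetical DW table."""
    # Extract: read raw rows from the source system.
    rows = source.execute(
        "SELECT id, first_name, last_name, amount FROM orders"
    ).fetchall()

    # Transform: normalize names to a common format, fill a missing amount
    # with a default, and derive a new field (the full customer name).
    transformed = []
    for order_id, first, last, amount in rows:
        first = (first or "").strip().title()
        last = (last or "").strip().title()
        full_name = f"{first} {last}".strip()
        amount = round(amount, 2) if amount is not None else 0.0
        transformed.append((order_id, full_name, amount))

    # Load: map the transformed rows onto the warehouse table.
    target.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders "
        "(id INTEGER PRIMARY KEY, customer_name TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", transformed)
    target.commit()
    return len(transformed)


def build_demo_source():
    """Create an in-memory source database with two messy rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders "
        "(id INTEGER, first_name TEXT, last_name TEXT, amount REAL)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, " ada ", "LOVELACE", 19.999), (2, "alan", "turing", None)],
    )
    return source


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(build_demo_source(), warehouse), "rows loaded")
```

Every rule in the transform step is a place where a defect can hide, which is why the later slides recommend testing each leg of the flow with its own source and target queries.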
  15. 15. Data Warehouse: the ETL process [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target Data Warehouse]
  16. 16. Data Warehouse: the Marketplace “The data warehousing market will see a compound annual growth rate of 11.5% through 2013 to reach a total of $13.2 billion in revenue.” - consulting specialist The 451 Group. Data Warehouse size: Small data warehouses: < 5 TB; Midsize data warehouses: 5 TB - 20 TB; Large data warehouses: > 20 TB - Analyst firm Gartner. Leaders in Data Warehouse Data Management Systems: [vendor logos not captured in transcript] - Analyst firm Gartner’s ‘Magic Quadrant for Data Warehouse Database Management Systems’
  17. 17. Data Warehouse: the Marketplace Delivery Models • stand-alone DBMS software • Cloud offerings • data warehouse appliances Leading Appliance Makers
  18. 18. Business Intelligence (BI)
  19. 19. Business Intelligence (BI) B.I. – What is it? • Software applications used in spotting, digging-out, and analyzing business data • provides simple access to data which can be used in day to day operations, integrates data into logical business areas • provides historical, current and predictive views of business operations • made up of several related activities, including data mining, online analytical processing, querying and reporting.
  20. 20. Business Intelligence (BI): Who uses it? Wal-Mart uses vast amounts of data and category analysis to dominate the industry. Amazon and Yahoo follow a "test and learn" approach to business changes. Hardee’s, Wendy’s, and T.G.I. Friday’s use BI to make strategic decisions.
  21. 21. Business Intelligence (BI) & Data Marts. Data Mart: A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise. It typically holds aggregated data and some granular data. It is a subset of the DW and makes Business Intelligence reporting more efficient. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart]
  22. 22. Business Intelligence (BI) Legacy DB CRM/ERP DB Finance DB ETL ETL Source Data ETL Process Target DW ETL Process Data Mart
  23. 23. BI: the Marketplace “Worldwide business intelligence (BI) platform, analytic applications and performance management (PM) software revenue reached $10.5 billion in 2010, a 13.4 percent increase from 2009 revenue of $9.3 billion.” “The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share.” - Analyst firm Gartner. Leaders in BI: [vendor logos not captured in transcript] - Analyst firm Forrester Research’s ‘Forrester Wave’
  24. 24. Testing a Data Warehouse (DWH)
  25. 25. Data Warehouse Testing. The Challenge: Comprehensive testing of data at every point throughout the data process is becoming increasingly important as more data is being used in strategic decision-making. Yet current strategies are time-consuming, resource-intensive and inefficient. What's Involved in Data Testing? According to authors Doug Vucevic and Wayne Yaddow in the book "Testing the Data Warehouse Practicum: Assuring Data Content, Data Structures and Quality", some of the main challenges of data testing are:
      • Data Completeness: Verifying that all data has been loaded from the sources to the target.
      • Data Transformation: Ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process.
      • Data Quality: Ensuring that the ETL process correctly rejects, substitutes default values for, corrects, or ignores and reports invalid data.
      • Regression Testing: Ensuring existing functionality remains intact each time a new release of code is completed.
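A data completeness check of the kind described above can be sketched as a comparison of key sets between source and target. The example below is a rough illustration using Python's built-in `sqlite3` module; the function name, the report fields, and the table and column names in the demo are hypothetical.

```python
import sqlite3


def completeness_report(source_conn, target_conn, source_sql, target_sql):
    """Compare the key column (first column) returned by two queries."""
    source_keys = {row[0] for row in source_conn.execute(source_sql)}
    target_keys = {row[0] for row in target_conn.execute(target_sql)}
    return {
        "source_count": len(source_keys),
        "target_count": len(target_keys),
        # Keys that were extracted but never arrived in the warehouse.
        "missing_in_target": sorted(source_keys - target_keys),
        # Keys in the warehouse that have no source record.
        "unexpected_in_target": sorted(target_keys - source_keys),
    }


def build_demo_connections():
    """Create a source with ids 1-3 and a target with ids 1, 2 and 4."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customer (id INTEGER)")
    source.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE dim_customer (id INTEGER)")
    target.executemany("INSERT INTO dim_customer VALUES (?)", [(1,), (2,), (4,)])
    return source, target


if __name__ == "__main__":
    src, tgt = build_demo_connections()
    print(completeness_report(
        src, tgt, "SELECT id FROM customer", "SELECT id FROM dim_customer"
    ))
```

Comparing keys rather than bare row counts matters here: in the demo both sides hold three rows, so a count-only check would pass even though one record is missing and another is unexpected.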
  26. 26. Testing the DWH: Resources Involved
      • Business Analysts create requirements
      • QA Testers develop and execute test plans and test cases. Skill set required: strong SQL!
      • Architects set up test environments
      • Developers perform unit tests
      • DBAs test for performance and stress
      • Business Users perform functional User Acceptance Tests
      For the purposes of this presentation, we will focus on a strategy for Testers.
  27. 27. Testing the Data Warehouse: the Strategy. An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture: 1) The Sources 2) The ETL layer 3) The data warehouse itself 4) The front-end (BI) data warehouse applications
  28. 28. Testing the Data Warehouse: Entry Points. Recommended functional test strategy: Test every entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal: provide rapid localization of data issues between points. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DW → ETL Process → Data Mart → Business Intelligence software, with test entry points marked along the flow]
  29. 29. Testing the Data Warehouse: Entry Points [Diagram of a possible architecture. Labels: Source Data (Legacy DB, CRM/ERP DB, Finance DB, Files), Staging DB, Target DW, Data Marts, Business Intelligence software, connected by ETL processes, with four groups of test entry points marked]
  30. 30. Testing the DWH: the Mapping Document, a.k.a. Source to Target Map. It’s the critical element required to efficiently plan the ETL process.
      Intention:
      • capture business rules
      • data flow mapping and
      • data movement requirements.
      Mapping Doc specifies:
      • Source input definition
      • Target/output details
      • Business & data transformation rules
      • Data quality requirements
  31. 31. Testing the DWH: the Mapping Document

      Source:

          SELECT c.idCustomer "Customer ID",
                 c.lastName "Customer Last Name",
                 c.firstName "Customer First Name",
                 o.idOrder "Order Number",
                 p.name "Product Name",
                 op.quantity "Quantity Ordered",
                 CASE
                   WHEN os.idOrderStatus = 5 AND o.refundDate IS NOT NULL
                     THEN 'Returned'
                   WHEN (os.idOrderStatus = 3 OR os.idOrderStatus = 4)
                        AND o.shipDate IS NOT NULL
                     THEN 'Delivered'
                   ELSE 'Processing'
                 END "Order Status"
          FROM Sales.Orders o,
               Sales.OrderStatus os,
               Sales.OrderProduct op,
               Sales.Product p,
               Sales.Category cat,
               Sales.Customer c
          WHERE o.order_idOrderStatus = os.idorderstatus
            AND op.orderProduct_idOrder = o.idOrder
            AND op.orderProduct_idProduct = p.idProduct
            AND p.product_idCategory = cat.idCategory
            AND cat.name = 'Electronics'
            AND o.order_idCustomer = c.idCustomer
            AND o.orderDate BETWEEN '01-SEP-10' AND '07-SEP-10'
          ORDER BY c.idCustomer, c.lastName, c.firstName, o.idorder

      Target:

          SELECT u.idUser "Customer ID",
                 u.lastName "Customer Last Name",
                 u.firstName "Customer First Name",
                 p.idPurchase "Purchase Number",
                 i.name "Item Name",
                 oi.quantity "Quantity Ordered",
                 ps.status "Purchase Status"
          FROM dw.Purchase p,
               dw.PurchaseStatus ps,
               dw.OrderItem oi,
               dw.Item i,
               dw.user_ u,
               dw.category cat
          WHERE p.purchase_idPurchaseStatus = ps.idPurchaseStatus
            AND oi.orderItem_idPurchase = p.idPurchase
            AND oi.orderItem_idItem = i.idItem
            AND p.purchase_idUser = u.idUser
            AND i.item_idCategory = cat.idCategory
            AND cat.name = 'Electronics'
            AND SUBSTR(p.purchaseDate, 1, 5) BETWEEN '09-01' AND '09-07'
            AND SUBSTR(p.purchaseDate, -2) = '10'
          ORDER BY u.idUser, u.lastname, u.firstname, p.idpurchase
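The transformation rule embedded in the source query's CASE expression (order status code plus dates mapped to 'Returned', 'Delivered' or 'Processing') can also be expressed as a small function and checked against the status values stored in the target. The sketch below mirrors that CASE logic in Python; the function names and the shape of the input rows are assumptions for illustration.

```python
from typing import Optional


def expected_status(id_order_status: int,
                    ship_date: Optional[str],
                    refund_date: Optional[str]) -> str:
    """Mirror the CASE expression from the source query on the slide."""
    if id_order_status == 5 and refund_date is not None:
        return "Returned"
    if id_order_status in (3, 4) and ship_date is not None:
        return "Delivered"
    return "Processing"


def find_status_mismatches(source_rows, target_status_by_order):
    """Return (order_id, expected, actual) for every order whose target
    status differs from what the mapping rule predicts.

    source_rows: iterable of (order_id, status_id, ship_date, refund_date)
    target_status_by_order: dict of order_id -> status text in the warehouse
    """
    mismatches = []
    for order_id, status_id, ship_date, refund_date in source_rows:
        expected = expected_status(status_id, ship_date, refund_date)
        actual = target_status_by_order.get(order_id)
        if actual != expected:
            mismatches.append((order_id, expected, actual))
    return mismatches


if __name__ == "__main__":
    source = [
        (101, 5, "2010-09-02", "2010-09-05"),  # refunded order
        (102, 3, "2010-09-03", None),          # shipped order
        (103, 4, None, None),                  # status 4 but no ship date
    ]
    target = {101: "Returned", 102: "Processing", 103: "Processing"}
    print(find_status_mismatches(source, target))
```

In practice the same comparison is usually done by running the source and target queries and comparing their result sets, but writing the rule out separately makes it easier to review against the mapping document.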
  32. 32. Testing the DWH: Implementation. Implementation of Functional Test: What is going on in the marketplace? 1. Manual execution 2. Automated execution with standard test tools 3. Bulk automation with a Data Warehouse Testing Tool (e.g., QuerySurge)
  33. 33. Testing the DWH: Manual Testing Flow. Tasks: Review Mapping Docs → Write SQL in favorite editor → Run tests → Dump results to a file → Compare results manually or w/compare tool → Report defects and issues. [Diagram rows: Tools, Tasks, Timeline]
  34. 34. Testing the DWH: Manual Testing Flow. Manual ETL Testing Flow Comments:
      • Check points across each leg so that each transformation is checked.
      • If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the database is under no obligation to return rows in a specified order unless the SQL specifies one).
      • This process can quickly result in hundreds or thousands of source and target query pairs.
      • The process is labor intensive. Even with multiple people, only a very small sampling can be performed.
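One way around the row-ordering problem noted above is to compare result sets as multisets instead of line by line. The following is a minimal sketch using `collections.Counter` from the Python standard library; `diff_result_sets` is an illustrative name, and the approach assumes both result sets fit in memory.

```python
from collections import Counter


def diff_result_sets(source_rows, target_rows):
    """Compare two query results without depending on row order.

    Duplicate rows are counted, so a row that appears twice in the source
    but once in the target is still reported as a difference.
    """
    source_counts = Counter(map(tuple, source_rows))
    target_counts = Counter(map(tuple, target_rows))
    only_in_source = source_counts - target_counts
    only_in_target = target_counts - source_counts
    # Sort by repr so that rows containing None or mixed types still sort.
    return (
        sorted(only_in_source.elements(), key=repr),
        sorted(only_in_target.elements(), key=repr),
    )


if __name__ == "__main__":
    source = [(1, "a"), (2, "b"), (2, "b")]
    target = [(2, "b"), (1, "a"), (3, "c")]
    print(diff_result_sets(source, target))
```

Adding an explicit ORDER BY to both queries, as the slide's example queries do, is the other common fix; the multiset comparison is useful when the two systems sort text differently.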
  35. 35. Testing the DWH: Typical Functional Automation Testing Flow. Functional Automation ETL Testing flow:
      1. Similar to previous: extract mappings from the mapping document.
      2. Write pairs of queries that test between any two points in the architecture.
      3. Issue the queries via a Functional Automation tool.
      4. Have the functional scripts dump the query result-sets to files.
      5. Compare the files, either by writing automation code or by using a file compare tool.
      This process is dependent on the speed of the automation tool; normally, only a fraction of the data can be covered per ETL per build.
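Steps 3 to 5 of this flow (issue a query pair, dump each result set to a file, compare the files) might look roughly like the sketch below. It uses only the Python standard library, with in-memory SQLite databases standing in for the real source and target; the function names, file names and demo tables are hypothetical, and a commercial functional automation tool would have its own API for this.

```python
import csv
import filecmp
import sqlite3
import tempfile
from pathlib import Path


def dump_query(conn, sql, path):
    """Run a query and write its result set to a CSV file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(conn.execute(sql))


def run_query_pair(source_conn, target_conn, source_sql, target_sql, work_dir):
    """Dump both sides of a query pair to files and compare them byte for byte.

    Both queries should carry an ORDER BY, otherwise identical data can
    still produce different files.
    """
    source_file = Path(work_dir) / "source.csv"
    target_file = Path(work_dir) / "target.csv"
    dump_query(source_conn, source_sql, source_file)
    dump_query(target_conn, target_sql, target_file)
    return filecmp.cmp(source_file, target_file, shallow=False)


def build_demo_db(table, rows):
    """Create an in-memory database with one two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    src = build_demo_db("customer", [(2, "Turing"), (1, "Lovelace")])
    tgt = build_demo_db("dim_customer", [(1, "Lovelace"), (2, "Turing")])
    with tempfile.TemporaryDirectory() as work_dir:
        passed = run_query_pair(
            src, tgt,
            "SELECT id, name FROM customer ORDER BY id",
            "SELECT id, name FROM dim_customer ORDER BY id",
            work_dir,
        )
    print("PASS" if passed else "FAIL")
```

Because every pair costs two queries, two file writes and a comparison, this approach scales with the speed of the tool driving it, which is the limitation the slide points out.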
  36. 36. Testing the Data Warehouse: Specialized Data Warehouse Test Tool: QuerySurge™ [Diagram: pairs of SQL (source) and SQL (target) queries run against Legacy DB, CRM/ERP DB and Finance DB]
  37. 37. QuerySurge™ the collaborative Data Warehouse Testing solution that finds bad data & provides a holistic view of your data’s health built by
  38. 38. With QuerySurge™ you can: • Reduce your costs & risks • Improve your data quality • Accelerate your testing cycles • Share information with your team. QuerySurge™ provides huge ROI (e.g., 1,300%)* *based on client’s calculation of Return on Investment
  39. 39. The QuerySurge advantage
      • Automate the entire testing cycle: automate kickoff, tests, comparison, auto-emailed results
      • Create tests easily with no SQL programming: ensures minimal time & effort to create tests / obtain results
      • Test across different platforms: data warehouse, Hadoop, NoSQL, database, flat file, XML
      • Collaborate with team: Data Health dashboard, shared tests & auto-emailed reports
      • Verify more data & do it quickly: verifies up to 100% of all data up to 1,000 x faster
      • Integrate for Continuous Delivery: integrates with most Build, ETL & QA management software
  40. 40. QuerySurge™ Architecture Web-based… Installs on... Linux Connects to… …or any other JDBC compliant data source built by QuerySurge™ QuerySurge Controller QuerySurge Server QuerySurge Agents Flat Files
  41. 41. Collaboration Testers - functional testing - regression testing - result analysis Developers / DBAs - unit testing - result analysis Data Analysts - review, analyze data - verify mapping failures Operations teams - monitoring - result analysis Managers - oversight - result analysis Share information on the built by QuerySurge™
  42. 42. Testing the Data Warehouse: Functional Test of Business Intelligence software. Strategy:
      • Execute business user reports and verify results from report to Data Mart
        » Logical Calculations: Verify logical calculations against the back-end Data Mart by creating SQL queries that incorporate and return the calculations from the Data Mart. Compare to report. (Example: Total sales for the month of January)
        » Data Validation: Verify data validations against the back-end Data Mart by creating SQL queries that incorporate and return the equivalent data from the Data Mart. Compare to report. (Example: List of all customers that spent more than $100)
        » Parameter Validation: For reports that have parameters, create multiple tests that provide a reasonable amount of test coverage.
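The "Logical Calculations" check (for example, total sales for the month of January) amounts to recomputing the figure from the Data Mart and comparing it with the value shown on the report. Here is a minimal sketch, assuming a hypothetical `sales` table with `sale_date` stored as ISO text and an `amount` column; the year, the tolerance and all names are illustrative.

```python
import sqlite3


def sales_total(mart_conn, start_date, end_date):
    """Recompute total sales for [start_date, end_date) from the data mart."""
    row = mart_conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE sale_date >= ? AND sale_date < ?",
        (start_date, end_date),
    ).fetchone()
    return row[0]


def report_matches_mart(report_value, mart_conn, start_date, end_date,
                        tolerance=0.005):
    """True when the report figure agrees with the mart within a tolerance."""
    return abs(report_value - sales_total(mart_conn, start_date, end_date)) <= tolerance


def build_demo_mart():
    """Create an in-memory mart with two January sales and one in February."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
    mart.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2015-01-05", 100.0), ("2015-01-20", 50.5), ("2015-02-01", 10.0)],
    )
    return mart


if __name__ == "__main__":
    mart = build_demo_mart()
    # 150.50 stands in for the figure read off the BI report.
    print(report_matches_mart(150.50, mart, "2015-01-01", "2015-02-01"))
```

The small tolerance allows for rounding applied by the report layer; whether any tolerance is acceptable is a business decision that belongs in the test plan.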
  43. 43. Testing the DWH: Functional Test of BI. Functional Testing of BI:
      1. BI Developer creates reports based on Business user requirements.
      2. Testers verify reports by:
         • Running reports using a range of parameter permutations.
         • Verifying that data is correct:
           o Record counts on report to back-end data mart
           o Verify field data elements
           o Verify field lengths and field level data
           o Verify logical dependencies
      Automation tools can and should be used for regression purposes.
  44. 44. Testing the Data Warehouse: Performance Test of BI. Common Challenges:
      • BI systems often have reports that require complex SQL queries across dozens of tables encompassing hundreds of thousands of records, from multiple databases.
      • Challenge: determining the performance characteristics under differing conditions and workloads.
      • Need to know the ability of the system to scale to the number of concurrent users.
      • Must test the length of time it takes for a user to receive a report after requesting it with the parameters they specify.
  45. 45. Testing the DWH: Performance Test of BI. Strategy:
      • Determine a typical workload for the business intelligence system.
      • Identify different user roles, what kinds of work they do on the system, and how often they do this work.
      • Determine how many users of each role there are.
      • Choose a performance tool that can record the protocol activity of the system and allow the performance tester to modify data parameters.
      • Create scripts by recording the protocol traffic emitted by the BI system as the targeted reports are opened and refreshed.
      • Prepare and execute a series of concurrent multi-user tests.
      • Make sure each virtual user emulates the activity of real users accessing business intelligence reports based on separate concerns.
      • Monitor response times, throughput, network activity, and system activity for issues.
      • Review results and provide recommendations.
      Using this approach, the workload activity of the entire population of business intelligence users can be reproduced in controlled conditions.
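The idea of concurrent virtual users with measured response times can be sketched with Python's standard thread pool. This is only an illustration of the concept, not a replacement for a protocol-level performance tool: `fetch_report` is a placeholder that sleeps instead of calling a real BI server, and all names and parameters are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fetch_report(parameters):
    """Placeholder for a real report request; sleeps to simulate latency."""
    time.sleep(0.01)
    return {"parameters": parameters}


def timed_request(parameters, fetch=fetch_report):
    """Return the elapsed seconds for one report request."""
    start = time.perf_counter()
    fetch(parameters)
    return time.perf_counter() - start


def run_load_test(workload, concurrent_users, fetch=fetch_report):
    """Replay a list of report requests with a fixed number of virtual users."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(lambda p: timed_request(p, fetch), workload))
    return {
        "requests": len(timings),
        "mean_seconds": mean(timings),
        "max_seconds": max(timings),
    }


if __name__ == "__main__":
    # A hypothetical workload: eight report requests with differing parameters.
    workload = [{"region": region, "month": month}
                for region in ("east", "west")
                for month in (1, 2, 3, 4)]
    print(run_load_test(workload, concurrent_users=4))
```

Swapping `fetch_report` for a function that issues real requests, and building the workload from the user roles and frequencies identified in the strategy, would turn this into a crude but usable smoke test.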
  46. 46. Summary What is a Data Warehouse and How Do I Test It? • Big Data is a growing technical concern and has reached $70 billion in scope. • The Data Warehouse and Business Intelligence software marketplace is a $22 billion market and growing. • Functional testing of a data warehouse implementation is a complex undertaking and requires strong SQL skills by the Tester • Manual testing and automated testing using standard tools provide a very small % of coverage. • Business Intelligence software must be properly tested for both functionality and performance.
  47. 47. What is a Data Warehouse and How Do I Test It? To see the video of this Webinar please visit: http://www.querysurge.com/solutions/data-warehouse-testing
