©LuxoftTraining2012
TEST Labs 2016
Data Warehouse (DWH) Testing
Yuriy Sliva
Luxoft
Course contents
1. Introduction.
2. Basic concepts and operating principles of DWH.
3. DWH testing. Where to start?
4. SQL (DDL, DML, DCL) and its use in testing.
5. Tips and tricks. Q&A.
Introduction
Relational database
A relational database is a collection of data
items organized as a set of formally-described
tables from which data can be accessed or
reassembled in many different ways without
having to reorganize the database tables.
The standard user and application
program interface to a relational
database is the structured query
language (SQL).
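A minimal sketch of the idea above, using Python's built-in sqlite3 module as a stand-in for a relational database. The customer/orders tables and their data are hypothetical. The same two tables are reassembled in two different ways purely by SQL queries, without reorganizing the tables themselves.

```python
import sqlite3

# In-memory database; table names, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL    NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# One view of the data: every order with its customer's name.
order_lines = conn.execute("""
    SELECT c.name, o.amount
    FROM orders o JOIN customer c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Another view of the same, unchanged tables: total amount per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(order_lines)  # [('Alice', 100.0), ('Alice', 50.0), ('Bob', 75.0)]
print(totals)       # [('Alice', 150.0), ('Bob', 75.0)]
```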
Basic concepts and operating principles of DWH
Why Is a Data Warehouse Separated from Operational Databases?
• An operational database is constructed for well-known tasks and workloads such as searching for particular records, indexing, etc.
• In contrast, data warehouse queries are often complex and work with data in a general, summarized form.
• Operational databases support concurrent processing of multiple transactions.
• Concurrency control and recovery mechanisms are required for operational databases to ensure the robustness and consistency of the database.
• Operational database queries both read and modify data, while an OLAP query needs only read-only access to the stored data.
• An operational database maintains current data. A data warehouse, on the other hand, maintains historical data.
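To illustrate the last two points, here is a minimal sketch in Python with an in-memory SQLite database; the table names, values and load dates are hypothetical. The operational table is modified in place and keeps only the current balance, while the warehouse table is only appended to by periodic loads and therefore keeps the history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per account, holding only the current state.
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    -- Warehouse table: one snapshot row per account per load date.
    CREATE TABLE account_history (
        id        INTEGER NOT NULL,
        balance   INTEGER NOT NULL,
        load_date TEXT    NOT NULL
    );
    INSERT INTO account (id, balance) VALUES (1, 100);
""")


def operational_update(account_id, balance):
    """Read-write transaction: the previous value is overwritten."""
    with conn:
        conn.execute("UPDATE account SET balance = ? WHERE id = ?", (balance, account_id))


def warehouse_load(load_date):
    """Periodic load: append a snapshot of the current state; old rows are never changed."""
    with conn:
        conn.execute(
            "INSERT INTO account_history (id, balance, load_date) "
            "SELECT id, balance, ? FROM account",
            (load_date,),
        )


warehouse_load("2016-01-31")
operational_update(1, 80)
warehouse_load("2016-02-29")

current = conn.execute("SELECT balance FROM account WHERE id = 1").fetchall()
history = conn.execute(
    "SELECT load_date, balance FROM account_history WHERE id = 1 ORDER BY load_date"
).fetchall()
print(current)  # [(80,)]
print(history)  # [('2016-01-31', 100), ('2016-02-29', 80)]
```

Real warehouses often use more elaborate history models (for example validity date ranges); the snapshot table here is only the simplest form of "keep every version".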
What Is a Data Warehouse?
• A data warehouse is a database that is kept separate from the organization's operational database.
• Data in a data warehouse is not updated frequently.
• It holds consolidated historical data, which helps the organization analyse its business.
• A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
• Data warehouse systems help integrate a diverse range of application systems.
• A data warehouse system supports the analysis of consolidated historical data.
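A minimal sketch of consolidation, in Python with SQLite. Both source systems, their record formats and the sales_fact table are hypothetical. Records from two applications that use different date and amount formats are converted to one common format, loaded into a single table, and then analysed over time.

```python
import sqlite3

# Hypothetical source application 1 (CRM): ISO dates, amounts as decimal strings.
crm_rows = [("2016-01-15", "EUR", "120.50"), ("2016-02-03", "EUR", "80.00")]
# Hypothetical source application 2 (billing): DD.MM.YYYY dates, amounts in cents.
billing_rows = [{"day": "15.01.2016", "amount_cents": 3000}]


def from_billing(row):
    """Convert a billing record to the common (ISO date, currency, amount) format."""
    day, month, year = row["day"].split(".")
    return (f"{year}-{month}-{day}", "EUR", row["amount_cents"] / 100)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, currency TEXT, amount REAL, source TEXT)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, 'CRM')",
    [(d, c, float(a)) for d, c, a in crm_rows],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, 'BILLING')",
    [from_billing(r) for r in billing_rows],
)

# Historical analysis across both sources: monthly totals.
monthly = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount)
    FROM sales_fact
    GROUP BY month
    ORDER BY month
""").fetchall()
print(monthly)  # [('2016-01', 150.5), ('2016-02', 80.0)]
```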
DWH testing. Where to start?
[Figure: DWH - high level. Source data (feeds 1-3, real-time feeds via web services, JMS, pipe-delimited data, static data, an Oracle DB, and XLS/CSV files delivered via SFTP or a shared folder) passes through ETL into a staging area, a transformation area and a local storage area with dimensions (Schema 1); transformed data is then delivered to application areas 1-3 and on to reporting.]
DWH Testing Process
Test Preparation
The following tasks should be done in the test preparation phase:
- Analyse requirements
- Create test plan
- Clarify open points
- Create test pack (test cases)
- Mitigate risks
Test Execution
• Testers are responsible for executing the test scripts and test cases; they record the test results in the bug tracking system.
• Testers record any defects identified during test execution in the defect management system, following the defect management process definition.
DWH – feeds testing
Legend:
  Parameter 1 - system parameter, i.e. the value is generated by the system
  <Attribute> - parametrized XML parameter, i.e. the tag value is derived from one of the system fields

Line # | XPath (open tag) | Input parameter | XPath (close tag) | R/O/C
1 | <?xml version="1.0" encoding="UTF-8"?> | | |
2 | <publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency" | | |
3 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fpmlVersion="5-3" | | |
4 | xsi:schemaLocation="http://www.fpml.org/FpML-5/transparency ../../xmls/SDR/transparency/fpml-main-5-3.xsd"> | | |
5 | <header> | | |
6 | <messageId messageIdScheme="Data prefix"> | | | Required
7 | | Internal TWH Message SID | </messageId> |
8 | <sentBy> | Data value | </sentBy> | Required
9 | <sendTo>DTCCGTR</sendTo> | | |
10 | <creationTimestamp> | Message Creation Date/Time | </creationTimestamp> |
11 | </header> | | |
12 | | | |
SELECT
    TO_CHAR(t2.EXECUTIONDATETIME2) AS execution_date_time
FROM SCHEMA_OWNER.MESSAGE_TABLE t1,  -- placeholder name: substitute the table that stores the messages
    XMLTABLE(
        XMLNAMESPACES('http://www.fpml.org/FpML-5/transparency' AS "ns"),
        '//ns:publicExecutionReport'
        PASSING XMLTYPE(t1.MESSAGE_C)  -- the XML message stored in column MESSAGE_C of t1
        COLUMNS  -- columns for parsed values
            EXECUTIONDATETIME2 VARCHAR2(200) PATH '//ns:termination/ns:executionDateTime/text()'
    ) t2
WHERE t1.id = 1;
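The same checks can also be run on a feed file before it reaches the database. Below is a minimal sketch using Python's standard xml.etree.ElementTree; the sample message and its values are hypothetical and deliberately incomplete (not schema-valid FpML). It extracts the value that the query above reads, a fixed value from the specification, and verifies that the fields marked "Required" are present.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping, equivalent to XMLNAMESPACES(... AS "ns") in the SQL query.
NS = {"ns": "http://www.fpml.org/FpML-5/transparency"}

# Hypothetical message: only the elements checked below are included.
MESSAGE = """<?xml version="1.0" encoding="UTF-8"?>
<publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency" fpmlVersion="5-3">
  <header>
    <messageId messageIdScheme="PREFIX">12345</messageId>
    <sentBy>SENDER</sentBy>
    <sendTo>DTCCGTR</sendTo>
    <creationTimestamp>2016-03-01T10:00:00Z</creationTimestamp>
  </header>
  <termination>
    <executionDateTime>2016-03-01T09:30:00Z</executionDateTime>
  </termination>
</publicExecutionReport>"""

# Parse from bytes so the parser handles the encoding declaration itself.
root = ET.fromstring(MESSAGE.encode("utf-8"))


def text_at(path):
    """Text of the first element matching the path (relative to the root), or None."""
    node = root.find(path, NS)
    return node.text if node is not None else None


# Fields marked "Required" in the mapping specification must be present and non-empty.
REQUIRED = ["ns:header/ns:messageId", "ns:header/ns:sentBy"]
missing = [path for path in REQUIRED if not text_at(path)]

# A fixed value from the specification and the value read by the SQL query.
send_to = text_at("ns:header/ns:sendTo")
execution_date_time = text_at(".//ns:termination/ns:executionDateTime")

print(missing)              # []
print(send_to)              # DTCCGTR
print(execution_date_time)  # 2016-03-01T09:30:00Z
```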
DWH – Staging Area
A staging area, or landing zone, is an intermediate storage area used for data
processing during the extract, transform and load (ETL) process. The data
staging area sits between the data source(s) and the data target(s), which are
often data warehouses, data marts, or other data repositories.[1]
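A minimal sketch of the role of a staging area, in Python with SQLite. The pipe-delimited feed, the table names and the reject rule are hypothetical. Source rows are first landed unchanged into an all-text staging table; only the next step casts them into the typed target table and routes rows that fail conversion to a reject table.

```python
import csv
import io
import sqlite3

# Hypothetical pipe-delimited feed; trade 2 has an invalid amount.
FEED = "trade_id|trade_date|amount\n1|2016-03-01|100.5\n2|2016-03-01|abc\n3|2016-03-02|7\n"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging: every column as text, loaded exactly as received.
    CREATE TABLE stg_trade (trade_id TEXT, trade_date TEXT, amount TEXT);
    -- Target: typed columns.
    CREATE TABLE trade (trade_id INTEGER PRIMARY KEY, trade_date TEXT, amount REAL);
    CREATE TABLE trade_reject (trade_id TEXT, reason TEXT);
""")

# Extract and land: no transformation, so the staging table mirrors the source.
reader = csv.DictReader(io.StringIO(FEED), delimiter="|")
conn.executemany("INSERT INTO stg_trade VALUES (:trade_id, :trade_date, :amount)", reader)

# Transform and load: cast types; rows that fail conversion go to the reject table.
for trade_id, trade_date, amount in conn.execute("SELECT * FROM stg_trade").fetchall():
    try:
        conn.execute(
            "INSERT INTO trade VALUES (?, ?, ?)",
            (int(trade_id), trade_date, float(amount)),
        )
    except ValueError as exc:
        conn.execute("INSERT INTO trade_reject VALUES (?, ?)", (trade_id, str(exc)))
conn.commit()
```

Keeping the raw copy in staging means a tester can always compare the target against exactly what was received.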
List of the most popular ETL tools
• Informatica - PowerCenter
• IBM - WebSphere DataStage (formerly known as Ascential DataStage)
• SAP - BusinessObjects Data Integrator
• IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
• Microsoft - SQL Server Integration Services
• Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
• SAS - Data Integration Studio
• Oracle - Warehouse Builder
• Ab Initio
• Information Builders - Data Migrator
• Pentaho - Pentaho Data Integration
• Embarcadero Technologies - DT/Studio
• IKAN - ETL4ALL
• IBM - DB2 Warehouse Edition
• Pervasive - Data Integrator
• ETL Solutions Ltd. - Transformation Manager
• Group 1 Software (Sagent) – DataFlow
• Sybase - Data Integrated Suite ETL
• Talend - Talend Open Studio
• Expressor Software - Expressor Semantic Data Integration System
• Elixir - Elixir Repertoire
• OpenSys - CloverETL
ETL Testing
Key points:
• Ensure that data is transformed correctly.
• Ensure that data is loaded into the data warehouse without any data loss or truncation.
• Ensure that the ETL application correctly rejects invalid data, replaces it with default values where required, and reports it.
• Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, to confirm scalability and performance.
• All methods should have appropriate unit tests, regardless of their visibility.
• Measure the effectiveness of unit tests with appropriate coverage techniques.
• Strive for one assertion per test case.
• Create unit tests that target exceptions.
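A minimal sketch of the last unit-testing points, using Python's standard unittest module. normalize_currency is a hypothetical transformation rule, not code from the original project. Each test makes one assertion, and two tests target the exceptions raised for invalid input.

```python
import unittest


def normalize_currency(code):
    """Hypothetical transformation rule: trim and upper-case a 3-letter currency code."""
    if code is None:
        raise ValueError("currency code is missing")
    value = code.strip().upper()
    if len(value) != 3 or not value.isalpha():
        raise ValueError(f"invalid currency code: {code!r}")
    return value


class NormalizeCurrencyTest(unittest.TestCase):
    # One assertion per test case, so a failure points at exactly one rule.
    def test_upper_cases_code(self):
        self.assertEqual(normalize_currency("eur"), "EUR")

    def test_trims_whitespace(self):
        self.assertEqual(normalize_currency("  usd "), "USD")

    # Tests that target exceptions.
    def test_rejects_missing_value(self):
        with self.assertRaises(ValueError):
            normalize_currency(None)

    def test_rejects_wrong_length(self):
        with self.assertRaises(ValueError):
            normalize_currency("EURO")


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the test report.
    unittest.main(argv=["normalize_currency_test"], exit=False)
```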
Testers' key responsibilities:
• Testing stage tables
• Verifying that the business transformation logic is applied
• Verifying that target tables are loaded from the stage file or table after the transformation is applied
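A minimal sketch of typical stage-to-target checks, in Python with SQLite. All table names and rows are hypothetical, and the ETL result is simulated by direct inserts (SQLite does not enforce VARCHAR lengths, so the truncated value is written explicitly). The three queries check completeness (row counts reconcile), correct loading without truncation (value comparison), and that every invalid row was reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id TEXT, name TEXT);
    CREATE TABLE dwh_customer (id INTEGER, name TEXT);
    CREATE TABLE dwh_customer_reject (id TEXT, reason TEXT);
    INSERT INTO stg_customer VALUES ('1', 'Alice'), ('2', 'Bartholomew Smith'), ('x', 'Bob');
    -- Simulated ETL result: row 2 was truncated to 10 characters, row 'x' was rejected.
    INSERT INTO dwh_customer VALUES (1, 'Alice'), (2, 'Bartholome');
    INSERT INTO dwh_customer_reject VALUES ('x', 'id is not numeric');
""")


def scalar(sql):
    return conn.execute(sql).fetchone()[0]


# 1. Completeness: every staged row is either loaded or rejected.
staged = scalar("SELECT COUNT(*) FROM stg_customer")
loaded = scalar("SELECT COUNT(*) FROM dwh_customer")
rejected = scalar("SELECT COUNT(*) FROM dwh_customer_reject")
counts_reconcile = staged == loaded + rejected

# 2. Correct load, no truncation: loaded values must equal staged values.
mismatches = conn.execute("""
    SELECT s.id, s.name, t.name
    FROM stg_customer s
    JOIN dwh_customer t ON t.id = CAST(s.id AS INTEGER)
    WHERE s.name <> t.name
""").fetchall()

# 3. Invalid data is reported: no staged row may be neither loaded nor rejected.
unaccounted = conn.execute("""
    SELECT s.id FROM stg_customer s
    WHERE NOT EXISTS (SELECT 1 FROM dwh_customer t WHERE t.id = CAST(s.id AS INTEGER))
      AND NOT EXISTS (SELECT 1 FROM dwh_customer_reject r WHERE r.id = s.id)
""").fetchall()

print(counts_reconcile)  # True
print(mismatches)        # [('2', 'Bartholomew Smith', 'Bartholome')]
print(unaccounted)       # []
```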
Mapping
Source -> Staging
Staging -> CSV file
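For the Source -> Staging step, a mapping document can be turned into an executable check. This is a minimal sketch in plain Python under stated assumptions: the MAPPING rules, column names and sample rows are hypothetical. Each staging column is derived from its source column by its rule, then compared with what was actually loaded, reporting both value differences and rows present on only one side.

```python
# Hypothetical mapping document: staging column -> (source column, transformation rule).
MAPPING = {
    "CUSTOMER_ID": ("cust_no", int),
    "FULL_NAME": ("name", str.strip),
    "COUNTRY": ("country_code", lambda value: value.strip().upper() or "N/A"),
}
KEY = "CUSTOMER_ID"


def expected_row(source_row):
    """Apply every mapping rule to one source row."""
    return {column: rule(source_row[src]) for column, (src, rule) in MAPPING.items()}


def compare(source_rows, staging_rows):
    """List (key, column, expected, actual) for every difference between mapping and staging."""
    expected = {row[KEY]: row for row in map(expected_row, source_rows)}
    actual = {row[KEY]: row for row in staging_rows}
    differences = []
    for key in sorted(expected.keys() | actual.keys()):
        exp, act = expected.get(key), actual.get(key)
        if exp is None or act is None:
            # Row exists on one side only (lost or unexpected row).
            differences.append((key, "<row>",
                                "present" if exp is not None else "missing",
                                "present" if act is not None else "missing"))
            continue
        for column in MAPPING:
            if exp[column] != act.get(column):
                differences.append((key, column, exp[column], act.get(column)))
    return differences


source = [
    {"cust_no": "1", "name": " Alice ", "country_code": "de"},
    {"cust_no": "2", "name": "Bob", "country_code": ""},
]
staging = [
    {"CUSTOMER_ID": 1, "FULL_NAME": "Alice", "COUNTRY": "DE"},
    {"CUSTOMER_ID": 2, "FULL_NAME": "Bob", "COUNTRY": ""},  # default value not applied
]
print(compare(source, staging))  # [(2, 'COUNTRY', 'N/A', '')]
```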
SQL (DDL, DML, DCL) and its use in testing
SQL (DDL, DML, DCL)
Data Definition Language (DDL) statements are used to define the database structure or schema. Examples:
CREATE - creates objects in the database
ALTER - alters the structure of the database
DROP - deletes objects from the database
TRUNCATE - removes all records from a table and releases the space allocated for them
COMMENT - adds comments to the data dictionary
RENAME - renames an object
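A minimal sketch of the DDL statements above, run on SQLite from Python. The table and column names are hypothetical. SQLite has no TRUNCATE or COMMENT statement and expresses RENAME as ALTER TABLE ... RENAME TO, so only CREATE, ALTER, RENAME and DROP are shown. The sqlite_master catalog and PRAGMA table_info make the structural changes observable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    return [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]


# CREATE: define a new table.
conn.execute("CREATE TABLE stg_trade (trade_id INTEGER, amount REAL)")
# ALTER: change the structure (SQLite supports only a subset of ALTER TABLE).
conn.execute("ALTER TABLE stg_trade ADD COLUMN currency TEXT")
# RENAME: in SQLite this is written as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE stg_trade RENAME TO stg_trade_v2")

columns = [row[1] for row in conn.execute("PRAGMA table_info(stg_trade_v2)")]
tables_before_drop = table_names()

# DROP: remove the object completely, structure and data.
conn.execute("DROP TABLE stg_trade_v2")
tables_after_drop = table_names()

print(columns)             # ['trade_id', 'amount', 'currency']
print(tables_before_drop)  # ['stg_trade_v2']
print(tables_after_drop)   # []
```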
Data Manipulation Language (DML) statements are used to manage data within schema objects. Examples:
SELECT - retrieves data from a database
INSERT - inserts data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all records if there is no WHERE clause); the space allocated for the records remains
MERGE - UPSERT operation (insert or update)
CALL - calls a PL/SQL or Java subprogram
EXPLAIN PLAN - explains the access path to data
LOCK TABLE - controls concurrency
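A minimal sketch of the main DML statements, run on SQLite from Python. The rate table and its values are hypothetical. SQLite has no MERGE statement; its INSERT ... ON CONFLICT DO UPDATE (available since SQLite 3.24) is used here as a swapped-in equivalent of the upsert that MERGE performs in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rate (currency TEXT PRIMARY KEY, rate REAL NOT NULL)")

# INSERT: add new rows.
conn.executemany("INSERT INTO rate VALUES (?, ?)", [("EUR", 1.0), ("USD", 1.1), ("GBP", 0.8)])
# UPDATE: change existing rows.
conn.execute("UPDATE rate SET rate = 1.12 WHERE currency = 'USD'")
# DELETE with a WHERE clause removes only the matching rows.
conn.execute("DELETE FROM rate WHERE currency = 'GBP'")
# Upsert: update USD if it exists, insert CHF because it does not (MERGE-like behaviour).
conn.executemany(
    "INSERT INTO rate VALUES (?, ?) "
    "ON CONFLICT(currency) DO UPDATE SET rate = excluded.rate",
    [("USD", 1.13), ("CHF", 1.08)],
)
conn.commit()

# SELECT: read the result.
rows = conn.execute("SELECT currency, rate FROM rate ORDER BY currency").fetchall()
print(rows)  # [('CHF', 1.08), ('EUR', 1.0), ('USD', 1.13)]
```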
Data Control Language (DCL) statements are used to manage privileges. Examples:
GRANT - gives users access privileges to the database
REVOKE - withdraws access privileges given with the GRANT command
Tips and tricks
SEQUENCE
CREATE SEQUENCE sequence_name [START WITH first_value] [INCREMENT BY increment_value];
PARTITION BY
SELECT col1, col2, SUM(col3) AS sum_col3
FROM table_name
GROUP BY col1, col2;

SELECT id, col1, col2, SUM(col3) OVER (PARTITION BY col1, col2) AS sum_col3
FROM table_name;
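The difference between the two queries above can be seen by running them on a tiny hypothetical table (SQLite via Python; window functions need SQLite 3.25 or later). GROUP BY collapses each (col1, col2) group into one row. SUM ... OVER (PARTITION BY ...) keeps every row and attaches the group total to it, which is handy in tests for comparing individual rows against their group aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 INTEGER);
    INSERT INTO t VALUES (1, 'A', 'x', 10), (2, 'A', 'x', 5), (3, 'B', 'y', 7);
""")

# Aggregate query: one row per group.
grouped = conn.execute("""
    SELECT col1, col2, SUM(col3) AS sum_col3
    FROM t
    GROUP BY col1, col2
    ORDER BY col1, col2
""").fetchall()

# Analytic query: every row is kept, with its group's total alongside.
windowed = conn.execute("""
    SELECT id, col1, col2, SUM(col3) OVER (PARTITION BY col1, col2) AS sum_col3
    FROM t
    ORDER BY id
""").fetchall()

print(grouped)   # [('A', 'x', 15), ('B', 'y', 7)]
print(windowed)  # [(1, 'A', 'x', 15), (2, 'A', 'x', 15), (3, 'B', 'y', 7)]
```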
ROW_NUMBER
RANK
SELECT w.*,
       ROW_NUMBER() OVER (ORDER BY type) AS num,
       RANK() OVER (ORDER BY type) AS rnk
FROM WORK_PRN w;

code | model | color | type   | price | num | rnk
1    | 1276  | n     | Laser  | 259   | 3   | 3
2    | 1433  | y     | Jet    | 302   | 1   | 1
3    | 1434  | y     | Jet    | 243   | 2   | 1
4    | 1401  | n     | Matrix | 139   | 5   | 5
5    | 1408  | n     | Matrix | 280   | 6   | 5
6    | 1288  | n     | Laser  | 402   | 4   | 3
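The WORK_PRN result above can be reproduced with SQLite from Python (window functions need SQLite 3.25 or later). One detail is worth knowing for tests: with ORDER BY type alone, the order of rows that share a type is not guaranteed, so ROW_NUMBER may number them differently from run to run. In this sketch, code is added as a tiebreaker, an assumption that happens to reproduce the numbering shown. RANK gives tied rows the same rank and then skips values (1, 1, 3, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WORK_PRN (code INTEGER, model TEXT, color TEXT, type TEXT, price INTEGER);
    INSERT INTO WORK_PRN VALUES
        (1, '1276', 'n', 'Laser', 259),  (2, '1433', 'y', 'Jet', 302),
        (3, '1434', 'y', 'Jet', 243),    (4, '1401', 'n', 'Matrix', 139),
        (5, '1408', 'n', 'Matrix', 280), (6, '1288', 'n', 'Laser', 402);
""")

rows = conn.execute("""
    SELECT w.code,
           ROW_NUMBER() OVER (ORDER BY w.type, w.code) AS num,  -- code breaks ties deterministically
           RANK()       OVER (ORDER BY w.type)         AS rnk   -- rows with equal type share a rank
    FROM WORK_PRN w
    ORDER BY w.code
""").fetchall()
print(rows)  # [(1, 3, 3), (2, 1, 1), (3, 2, 1), (4, 5, 5), (5, 6, 5), (6, 4, 3)]
```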
Questions
