Data Verification In QA Department Final

Data warehouse and ETL testing should be conducted according to a process and checklist. This presentation provides an overview of recommended methods.



  1. Database and ETL Testing Process: Methods, Issues, Recommendations. Jan. 19, 2010. W. Yaddow [email_address]. For internal use only, not for external distribution.
  2. Agenda
     - QA objectives for ETLs and data-loading projects
     - Samples of QA data defect discoveries
     - Data quality tools and techniques used by the QA team
     - ETL and data-loading verification checks
     - Lessons learned in data verification
     - Recommendations for continued early involvement by QA
  3. Data defects: a definition
     - Data defects: deviations from the correctness of data, generally errors occurring before data is processed for analytics or reporting.
     - Errors can result from the data model, low-level design, data mapping, or data loading prior to processing in an application.
     - Note: Data issues on displays or reports are not considered data defects when they result from service calls or computation errors within an application.
  4. QA objectives for ETL & data integration
     - Assure that all the records in source systems that should be migrated to a database are extracted: no more, no less.
     - Verify that all components of the ETL/load process complete with no defects.
     - Verify that all source data is correctly transformed into dimension, fact, and other tables.
     - Analyze ETL/load exception logs.
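The first objective above ("no more, no less") can be expressed as a scripted reconciliation check. The sketch below uses an in-memory SQLite database and hypothetical `src_orders` / `stg_orders` tables purely for illustration; the same queries would run against the real source and staging schemas via TOAD or a test harness.

```python
import sqlite3

# Minimal sketch of a source-to-target completeness check. The table
# names and sample rows are assumptions, not from the presentation.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO src_orders VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
cur.executemany("INSERT INTO stg_orders VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])

# "No more, no less": counts must match, and no key may exist on only
# one side of the load.
src_count = cur.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
missing = cur.execute("""
    SELECT COUNT(*) FROM src_orders s
    WHERE NOT EXISTS (SELECT 1 FROM stg_orders t
                      WHERE t.order_id = s.order_id)""").fetchone()[0]
extra = cur.execute("""
    SELECT COUNT(*) FROM stg_orders t
    WHERE NOT EXISTS (SELECT 1 FROM src_orders s
                      WHERE s.order_id = t.order_id)""").fetchone()[0]

print("rows:", src_count, "missing:", missing, "extra:", extra)
```

Comparing counts alone is not enough: the two anti-join queries catch the case where one row is dropped and a different row is duplicated, which leaves the counts equal.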
  5. QA role in data verification projects
     - QA develops verification methods to support data integration specific to each project.
     - QA executes tasks that demonstrate data verification is a critical link between the DS, DSO, application development, and analytics teams.
     - QA continues to demonstrate that early data testing is the most efficient means of identifying and correcting defects.
  6. Sample of data defect discoveries
     App    % Data Defects   % Data Defects High or Critical Severity
     App1   39%              48%
     App2   26%              70%
     App3   26%              33%
     App4   6%               59%
     App5   29%              68%
     Note: Data as of 10/23/2009
  7. Data integration & ETL error injection points
     Data track phases: 1) DB design & planning; 2) ETL and data load. Artifacts and QA tasks by phase:
     Phase: data design & requirements
     - Data and analysis requirements: QA reviews, comments
     - Source data planning & profiling: QA reviews, comments
     Phase: data flow and load design
     - Data model: QA reviews, comments
     - Logical & physical data flow diagrams: QA reviews, comments
     - Data movement low-level design (LLD): QA reviews, comments
     - Data mappings & transformations, source to target: QA reviews, test planning, test case development
     - ETL design & logic: QA reviews, comments
     - SQL and PL/SQL for data loads: QA reviews, comments
     - Data cleansing plan: QA reviews, comments
     - Data load and ETL developer test plan: QA reviews, comments
     Phase: data load / ETL execution
     - Extract, transform, load: QA reviews, verification, defect reports
     Phase: data load / ETL load inspection
     - Workflow logs, session logs, error log tables, reject tables: QA reviews, verification, defect reports
  8. Small sample: data verification
  9. ETL & data loading verification checks
     Basic ETL and PL/SQL verifications conducted by QA:
     - Verify mappings, source to target
     - Verify that all tables and specified fields were loaded from source to staging
     - Verify that keys were properly generated using the sequence generator
     - Verify that not-null fields are populated
     - Verify no data truncation in each field
     - Verify data types and formats are as specified in the design phase
     - Verify no duplicate records in target tables
     - Verify transformations based on the data low-level design (LLDs)
     - Verify that numeric fields are populated with the correct precision
     - Verify that every ETL session completed with only planned exceptions
     - Verify all cleansing, transformation, error, and exception handling
     - Verify PL/SQL calculations and data mappings
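Several of the checklist items above (not-null fields, duplicates, truncation) reduce to simple SQL probes that return zero when the check passes. The sketch below runs them against a hypothetical `stg_customer` staging table in SQLite; the table, columns, and two-character country-code rule are illustrative assumptions.

```python
import sqlite3

# Checklist items expressed as reusable SQL probes; each query should
# return a count of 0 when the check passes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE stg_customer (cust_id INTEGER, name TEXT, country TEXT)")
cur.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)",
                [(1, "Acme", "US"), (2, "Globex", "DE"), (3, "Initech", "US")])

checks = {
    # Not-null fields are populated
    "null_ids": "SELECT COUNT(*) FROM stg_customer WHERE cust_id IS NULL",
    # No duplicate records on the business key
    "dup_ids": """SELECT COUNT(*) FROM (
                    SELECT cust_id FROM stg_customer
                    GROUP BY cust_id HAVING COUNT(*) > 1)""",
    # No truncation: country codes assumed to be exactly 2 characters
    "bad_country":
        "SELECT COUNT(*) FROM stg_customer WHERE LENGTH(country) <> 2",
}
results = {}
for name, sql in checks.items():
    results[name] = cur.execute(sql).fetchone()[0]
    print(name, "OK" if results[name] == 0 else "FAIL (%d)" % results[name])
```

Keeping the probes in a dictionary makes the checklist easy to extend and to rerun after every ETL load as a lightweight regression suite.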
  10. Data verification training overview
     - Data quality overview
     - Testing: DQ categories / checks
     - Testing: DQ case study
     - DQ test management (planning, design, execution, tools)
     - DQ benefits & challenges
  11. QA steps: data integration verification (1)
     Data integration planning (data model, LLDs):
     - Gain an understanding of the data to be reported by the application, and of the tables upon which each report is based (orgs, ratings, countries, analysts, etc.)
     - Review and understand the data model: keys, and flows from source to target
     - Review and understand data LLDs and mappings: add/update sequences for all sources of each target table
     ETL planning and testing (source inputs & ETL design):
     - Participate in ETL design reviews
     - Gain in-depth knowledge of ETL sessions, the order of execution, constraints, and transformations
     - Participate in development ETL test case reviews
     - After ETLs are run, use checklists for QA assessments of rejects, session failures, and errors
  12. QA steps: data integration verification (2)
     Assess ETL logs (session, workflow, errors):
     - Review ETL workflow outputs and source-to-target counts
     - Verify source-to-target mapping documents against loaded tables using TOAD and other tools
     - After ETL runs or manual data loads, assess data in every table with a focus on key fields (dirty data, incorrect formats, duplicates, etc.), using TOAD and Excel tools (SQL queries, filtering, etc.)
     GUI and report validations:
     - Compare reports with target data
     - Verify that reporting meets user expectations
     Analytics test team data validation:
     - Test data as it is integrated into the application
     - Provide tools and tests for data validation
  13. From source to data warehouse: unit testing
     - Know the data transformation rules
     - Run test cases for each transformation rule; include positive and negative situations
     - Row counts: Source = DWH (destination) + Rejected
     - Verify the process correctly uses all required data, including metadata
     - Cross-reference DWH dimension and fact tables to source tables
     - Verify all business rule computations are correct
     - Verify database queries, expected vs. actual results
     - Verify rejects are correctly handled and conform to business rules
     - Verify slowly changing dimensions (e.g., address, marital status) are processed correctly
     - Verify correctness of surrogate keys (e.g., time zones, currencies) in fact tables
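The row-count reconciliation rule above says every source row must land either in the warehouse or in the reject table; nothing may silently disappear. A minimal sketch, with the three counts standing in for `SELECT COUNT(*)` results against the actual source, target, and reject tables:

```python
# Illustrative counts; in practice each would come from a COUNT(*) query
# against the source table, the warehouse target, and the reject table.
source_rows = 1000
dwh_rows = 987
rejected_rows = 13

# Reconciliation: source = destination + rejected. A mismatch means rows
# were lost (or duplicated) somewhere inside the ETL.
assert source_rows == dwh_rows + rejected_rows, "rows lost in the ETL"
print("reconciled:", dwh_rows, "loaded,", rejected_rows, "rejected")
```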
  14. Transforming data, source to target
  15. DQ tools / techniques used by QA team
     TOAD / SQL Navigator:
     - Data profiling for value-range & boundary analysis
     - Null field analysis
     - Row counting
     - Data type analysis
     - Referential integrity analysis (key analysis)
     - Distinct value analysis by field
     - Duplicate data analysis (fields and rows)
     - Cardinality analysis
     - PL/SQL stored procedure & package verification
     Excel:
     - Data filtering for profile analysis
     - Data value sampling
     - Data type analysis
     MS Access:
     - Table and data analysis across schemas
     QTP:
     - Automated testing of templates and application screens
     Analytics tools:
     - J: statistics, visualization, data manipulation
     - Perl: data manipulation, scripting
     - R: statistics
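The profiling techniques listed under TOAD (null analysis, distinct-value analysis, row counting) can also be scripted so they run unattended after each load. This sketch profiles a hypothetical `ratings` table in SQLite, column by column; the table and data are assumptions for illustration.

```python
import sqlite3

# Scripted equivalent of per-column profiling: row count, null count,
# and distinct-value count for each column of interest.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ratings (org TEXT, rating TEXT)")
cur.executemany("INSERT INTO ratings VALUES (?, ?)",
                [("A", "AAA"), ("B", None), ("C", "AAA"), ("A", "BB")])

profile = {}
for col in ("org", "rating"):
    total = cur.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
    nulls = cur.execute(
        f"SELECT COUNT(*) FROM ratings WHERE {col} IS NULL").fetchone()[0]
    distinct = cur.execute(
        f"SELECT COUNT(DISTINCT {col}) FROM ratings").fetchone()[0]
    profile[col] = (total, nulls, distinct)
    print(f"{col}: rows={total} nulls={nulls} distinct={distinct}")
```

Unexpected null percentages or distinct-value counts (e.g., a "country" column with 300 distinct values) are exactly the kind of dirty-data signal the interactive TOAD sessions described above are looking for.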
  16. Data defect findings by QA team
     Data defect types found on six projects:
     - Inadequate ETL and stored procedure design documents
     - Field values are null when specified as NOT NULL
     - Field constraints and SQL not coded correctly for Informatica ETL
     - Excessive ETL errors discovered after entry to QA
     - Source data does not meet table mapping specifications (e.g., dirty data)
     - Source-to-target mappings: 1) often not reviewed, 2) in error, and 3) not consistently maintained through the development lifecycle
     - Data models not adequately maintained during the development lifecycle
     - Target data does not meet mapping specifications
     - Duplicate field values when defined to be DISTINCT
     - ETL SQL / transformation errors leading to missing rows and invalid field values
     - Constraint violations in source
     - Target data incorrectly stored in nonstandard formats
     - Table keys incorrect for important relationship linkages
  17. Lessons learned
     - Formal QA data track verifications should continue early in the ETL design and data load process (independent of application development).
     - With access to the ETL dev environment, QA can prepare for formal testing and offer feedback to the dev team.
     - Offshore teams need adequate and more representative samples of data for data planning and design.
     - Data models, LLDs, ETL design, and data mapping documents need to be kept in sync until transition.
     - QA resourcing for projects must accommodate data track verifications.
  18. Recommendations for data verifications
     Detailed recommendations for Development, QA, and Data Services:
     - Analyze a) source data quality and b) data field profiles before input to Informatica and other data-build services.
     - QA should participate in all data model and data mapping reviews.
     - ETL teams should fully review ETL error logs and resolve errors before DB turn-over to QA.
     - Use QC early during ETL and stored procedure testing to target vulnerable process areas.
     - Substantially improve documentation of PL/SQL stored procedures.
     - QA needs a dev or separate environment for early data testing, with the ability to modify data in order to perform negative tests. (QA currently performs only positive tests because the application and database tests run in parallel in the same environment.)
     - Substantially enhance verification of target tables after each ETL load, before data turn-over to QA.
     - Mandate maintenance of data models and source-to-target mapping / transformation rule documents from elaboration until transition.
     - Invest in more Informatica and off-the-shelf data quality analysis tools for pre- and post-ETL analysis.
     - Invest in automated DB regression test tools and training to support frequent data loads.
  19. Important resource for DB testers