Pragmatic Approach to Data Warehouse Testing
White Paper | Shankar Mani & Bijoymon Soman
Introduction
In today’s dynamic and fast-evolving business landscape, organizations must continually sharpen their strategies to lead in competitive markets. Fostering a customer-centric approach plays a vital role in defining and delivering business services, and the key insights lie hidden in the organization’s own day-to-day transactional data. As business services and the supporting IT ecosystems have evolved over time, organizations now have to consolidate this data to derive a holistic view and analyze meaningful insights that enable intelligent decision making for strategic business transformation. Investing in and implementing a data warehouse system is essential for building business intelligence as a sixth sense that helps predict and improve customer loyalty, market planning, risk management, fraud detection, cost reduction, logistics management, and more. This foundation helps retain and build brand loyalty, improve revenue from existing customers, expand into new markets, and build a new consumer base. Organizations growing through strategic mergers and acquisitions can also benefit from a data warehouse system: large-scale system and data consolidation will eventually align data toward serving the common business objectives and needs of the merged organization.
Organizations usually source specialized IT skills to carry out meticulous planning, analysis, and understanding of the application landscape, and thereby channel the tiers of information housed in disparate locations into the consolidated data warehouse. Consideration must also be given to the implementation strategy and technology, the resources and tools to be used for the data integration process, and to building and establishing a data practice team and data governance processes. Testing plays a key role here, as the success of the organization’s decisions relies on these critical source data streams, and the consolidation has to build a high-quality Single Source of Truth. In this whitepaper we focus on how traditional data warehouse testing can be customized so that the testing team plays a key role in the data warehouse program implementation through a pragmatic shift-left strategy across the SDLC.
Key Challenges to Data Warehouse Testing
• Ambiguous requirements specifications that are not granular enough to be visualized by the data architecture and solution design teams.
• Decision making on requirements is time consuming, as comparatively more layers of stakeholders are involved.
• Data modeling and architecture issues that surface only in the test execution phase.
• Estimation is difficult for the following reasons:
  o Data quality differences across heterogeneous sources
  o Limited documentation on the business rules of the source systems and the corresponding data flavors
• 100% comprehensive data validation is not feasible manually in many cases, so an automation tool is crucial to meet the cost, quality, and time (CQT) targets of the project; a minimal sketch of such an automated check follows this list.
• Performance bottlenecks in the solution design that can effectively be identified only during the test execution phase.
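To make the automation point concrete, below is a minimal sketch, in Python with SQLite standing in for the real source and target connections, of an automated completeness check that compares row counts and a column checksum between a source extract and its target staging table. The database files, table names, and the txn_amount column are hypothetical placeholders rather than the authors' toolset; a real harness would be driven by the source-to-target mapping sheet.

```python
import sqlite3

def row_count(conn, table):
    # Total rows loaded: the most basic completeness measure.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def column_checksum(conn, table, column):
    # A cheap aggregate fingerprint; a mismatch flags truncation or
    # transformation errors without comparing row by row.
    return conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]

def reconcile(source_conn, target_conn, source_table, target_table, amount_col):
    checks = {
        "row_count": (row_count(source_conn, source_table),
                      row_count(target_conn, target_table)),
        "checksum": (column_checksum(source_conn, source_table, amount_col),
                     column_checksum(target_conn, target_table, amount_col)),
    }
    return {name: src == tgt for name, (src, tgt) in checks.items()}

if __name__ == "__main__":
    # sqlite3 stands in for the real source/target connections (hypothetical files).
    src = sqlite3.connect("source_extract.db")
    tgt = sqlite3.connect("warehouse_stage.db")
    print(reconcile(src, tgt, "sales_txn", "stg_sales_txn", "txn_amount"))
```

Scaled across all mapped tables and run per load, a check of this shape covers volumes no manual comparison could.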
Tackling Data Warehouse Implementation Approach for Optimized Results
The core principles of testing stay intact when it comes to testing a DWH/BI application or environment; however, the strategy adopted and the verification and validation methods employed are customized for data warehouse testing. All testing objectives are attuned to validating that the required data from the respective sources is consolidated to the desired aggregation level with adequate quality standards, and within the organization's existing IT operational windows. These are further validated and reinforced by the level of desirable outcomes made visible for consumption by the reporting layer.
Figure 1: Depiction of Stakeholder involvement in alignment with each phase of the Program Implementation
Requirements Engineering & Data Profiling Phase
Laying out an implementation roadmap for the subject areas of a data warehouse system is one of the critical success factors for the program. The business team and IT SMEs, in consultation with the implementation partner, should prioritize the migration of data sources based on which sources hold the majority of organizational data, their business criticality, and their place in the operational flow of the business. Data that represents a major share of the organization's revenue should obviously be prioritized above other categories. The rationale is that a large-scale data integration project has to go through an initial stabilization of requirements for the overall data warehouse design; this is needed for a scalable delivery model in which processes and standards solidify so that resources can be leveraged to handle the most critical data volumes.

Based on the prioritization, the SDLC process should be initiated for each of the subject areas, with BI users, business application SMEs, business analysts, data architects, solution designers, and test architects participating in workshops and deriving the detailed business use cases. A comprehensive study (commonly known as AS-IS data profiling) should be performed for all business process flows within each of the source systems identified for the subject areas; it defines the business rules, data flavors, and data quality check filters that will then flow into the newly developed data warehouse system as factual information. A minimal profiling sketch follows this paragraph. The data architect uses this information to model the dimensions and facts (star schema or snowflake schema) to be robust and scalable enough to handle data from a multitude of sources and finally represent the consolidated business facts for each subject area.
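As an illustration of AS-IS profiling, the sketch below reports, for each candidate source column, the null rate, distinct-value count, and value range, which is typically the raw material for the data profiling sheet. The source database file, table, and column names are hypothetical.

```python
import sqlite3

PROFILE_SQL = """
SELECT COUNT(*)                                        AS total_rows,
       SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END)  AS null_rows,
       COUNT(DISTINCT {col})                           AS distinct_values,
       MIN({col})                                      AS min_value,
       MAX({col})                                      AS max_value
FROM {table}
"""

def profile_column(conn, table, col):
    # One pass per column; results feed the data profiling sheet.
    total, nulls, distinct, lo, hi = conn.execute(
        PROFILE_SQL.format(table=table, col=col)).fetchone()
    return {
        "table": table,
        "column": col,
        "null_rate": nulls / total if total else None,
        "distinct_values": distinct,
        "range": (lo, hi),
    }

if __name__ == "__main__":
    conn = sqlite3.connect("legacy_source.db")  # hypothetical source extract
    for col in ("customer_id", "txn_amount", "txn_date"):
        print(profile_column(conn, "sales_txn", col))
```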
The expected outcomes at this stage are the following:

• Approved Low-Level Requirements – High-level requirement statements (BI reporting) need to be broken down into specific low-level requirements against the data model, with traceability established in the RTM. These specify the source systems for history and daily loads across multi-channel systems, the volume of history and daily data, the applicability of source fields, the corresponding business transformations and data aggregation considerations, and exception handling and data recycling considerations.
• Approved Data Profiling Sheet – Low-level requirements identifying all operational and historical data source fields that need to be mapped against the target dimensions and facts. This should also cover the flavors of business scenarios and data that will be generated in the daily and historical loads, with the expected volumes traced to the approved NFRs.
• Approved Physical Data Model – Caters to all identified types of data sources, considering the differing characteristics of data coming from multiple sources as well as the hierarchies of dimensions.
• The business analyst and BI team can use this information to create a visual representation of the requirements, depicting how the data will be consumed by business users in the reporting world. This should be signed off by the BI users across the various portfolios and by the Data Architecture and Integration Solution Practice, and leveraged to create the end-to-end test validation suite.
Effective Solution Design and Development Considerations for the EDW
With the information produced by the as-is assessment, the solution design team needs to analyze and plan the sourcing of the relevant data from history archives and daily transactional sources. The solution should ensure adequate processes are in place for data cleansing, enrichment, standardization, and deduplication, with the caveat that the functional requirements must also be satisfied to provide the desired level of granularity for roll-up and drill-down analysis on the facts. The history data load and daily data load are to be calibrated to the end objective of maintaining full referential integrity and seamless data. The solution design team must map the low-level requirement statements to the ETL framework designed for the data warehouse. The development team should provide a detailed low-level design stating the ETL logic for the source extraction, transformation, and target loading stages. It should also state all ETL job stages needed to perform the migration, the source systems and target system environments with their respective connection details, and the ETL framework layer logic for data error handling, data recycling, and performance enhancement. The testing team will leverage this to create test scenarios relevant to the technical dynamics of the ETL, based on the clarity available in the LLD. The LLD should also show traceability to the RTM wherever applicable.
The expected outcomes at this stage are the following:

• High-Level Design
  o Extraction logic from all identified sources, with details of the data hierarchy representation and the normalization/de-normalization mapping to the low-level requirements.
  o How the data life cycle of the source systems is applied in the data warehouse system (SCD types, deletes, etc.); a minimal SCD Type 2 validation sketch follows this list.
  o Dimension/fact batch load order in alignment with the data model.
  o Integration of the solution into the ETL framework.
• Requirement traceability matrix, updated by the solution design and development team and signed off by the BA, business SME, and program manager.
• Low-Level Design – ETL logic for the source extraction, transformation, mapping, and target loading stages. This should also state all ETL job stages needed to perform the data migration, along with the source and target system environments and connection details, and should provide the ETL layer logic for handling data errors, data recycling, and performance enhancement.
• Deployment process (code and target table structures) to be defined with key owners and executed through a configuration management process.
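Since slowly changing dimensions are called out in the HLD, the sketch below shows one way a test team might validate SCD Type 2 behavior: every business key should carry exactly one current row and no overlapping validity intervals. The dimension and column names (customer_key, valid_from, valid_to, is_current) are assumptions for illustration, not the authors' schema.

```python
import sqlite3

# Each query should return zero rows on a healthy SCD Type 2 dimension.
SCD2_CHECKS = {
    "multiple_current_rows": """
        SELECT customer_key, COUNT(*) AS current_rows
        FROM dim_customer
        WHERE is_current = 1
        GROUP BY customer_key
        HAVING COUNT(*) <> 1
    """,
    # Pairwise interval-overlap test; rowid (SQLite-specific) avoids
    # counting each offending pair twice.
    "overlapping_intervals": """
        SELECT a.customer_key
        FROM dim_customer a
        JOIN dim_customer b
          ON a.customer_key = b.customer_key
         AND a.rowid < b.rowid
         AND a.valid_from < b.valid_to
         AND b.valid_from < a.valid_to
    """,
}

def run_scd2_checks(conn):
    failures = {}
    for name, sql in SCD2_CHECKS.items():
        rows = conn.execute(sql).fetchall()
        if rows:
            failures[name] = rows[:10]  # sample of offending keys
    return failures

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # hypothetical target
    print(run_scd2_checks(conn) or "SCD Type 2 checks passed")
```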
Aligning Test Function Activities for a Pragmatic Project Delivery
The testing practice should get involved right from the requirement analysis phase to vet and baseline the requirements along with the business and project delivery teams. As the data warehouse is meant to be the single consolidated view of truth for data from all sources, the target data model should be able to accommodate the varying characteristics of the source data arriving through historical and daily data loads. Verification should commence from an approved data profiling sheet to validate the data flavors, and the RTM should then connect these onward to the high-level BI reporting requirements and the low-level data warehouse requirements.
As part of the project workshops, the project stakeholders and the test function should take part in the data model walkthrough and sign-off, where the one-to-one, one-to-many, and many-to-many relationships between the tables can be understood in relation to the dimensions and facts. Data model tests seek to validate the data relationships among the business entities (dimensions and facts) beyond the basic table-level constraints; a minimal relationship check of this kind is sketched after this paragraph. The testing team should also ensure a walkthrough of the high-level design and the low-level design by the solution design team and the development team respectively, and confirm that the traceability of the HLD logic to the data warehouse requirements is clearly identified. The extraction aggregates, filters, and field-level logic should justify the low-level requirements. This helps the testing team prepare test scenarios, relevant to the business cases, that test the data flavors and data hierarchies from the HLD logic mapped to the low-level data warehouse requirements.
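For example, a basic relationship test can assert that no fact row references a dimension member that does not exist, which matters beyond declared constraints because foreign keys are often disabled for bulk warehouse loads. The fact/dimension tables and key columns below are illustrative assumptions.

```python
import sqlite3

# Fact-to-dimension pairs under test; illustrative names only.
RELATIONSHIPS = [
    ("fact_sales", "customer_key", "dim_customer"),
    ("fact_sales", "product_key", "dim_product"),
]

ORPHAN_SQL = """
SELECT COUNT(*)
FROM {fact} f
LEFT JOIN {dim} d ON f.{key} = d.{key}
WHERE d.{key} IS NULL
"""

def orphan_counts(conn):
    # Bulk loads often run with FK constraints disabled, so orphaned
    # fact rows must be detected explicitly.
    return {
        (fact, key): conn.execute(
            ORPHAN_SQL.format(fact=fact, dim=dim, key=key)).fetchone()[0]
        for fact, key, dim in RELATIONSHIPS
    }

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # hypothetical target
    for pair, orphans in orphan_counts(conn).items():
        assert orphans == 0, f"{pair}: {orphans} orphaned fact rows"
    print("Relationship checks passed")
```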
At this point, the testing team should ensure that the project RTM is updated to capture the connectivity of the requirements to the next level of deliverables: low-level EDW requirements, physical data model, high-level design sections, low-level design sections, test scenarios, test cases, and defects. Only by strengthening the connectivity across all levels can we establish end-to-end traceability to validate that all details required to form the Single Consolidated View of Truth are properly identified and captured. The RTM is a key deliverable of the solution design and development team and should be signed off by the BA, the business team, and the program manager.
In addition to the conventional technical ETL testing scenarios, which are based purely on the solution design and the source-to-target mapping sheet, testing should step up its game with test scenarios and test cases that probe the business perspectives of the transactional data being brought into the data warehouse and the roll-up and drill-down capabilities the data must support. These critical business test scenarios could test for the following business relations:

a. Data hierarchies followed for the source entities
b. Data flavors of the various types of transactional business flows and any specific characteristics associated with them
c. Interrelations between multiple interconnected fields
d. Adaptability of each of these conditions when considered across various geographies or demographics
Data quality issues in the existing legacy and operational systems should also be considered. Any manual data-fix processing carried out in the source systems means we must take care to extract only the data that has been corrected through those means. We need to migrate only the cleansed data and, at the same time for daily loads, consider whether the same cleansing process needs to be built into the to-be system as well, if the previous system is going to be decommissioned; a sketch of such a gate follows this paragraph. At the end of the day, BI users should be working with cleansed, enriched, de-duplicated, and standardized data that meets the reporting standards.
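One hedged way to enforce "migrate only the cleansed data" is a pre-load gate that blocks the load if any staged record is still on the source system's known-defect list without a correction marker. The defect list table, the staging table, and the corrected_flag column are illustrative assumptions, not the authors' design.

```python
import sqlite3

UNCORRECTED_SQL = """
SELECT s.txn_id
FROM stg_sales_txn s
JOIN src_known_defects d ON s.txn_id = d.txn_id
WHERE s.corrected_flag = 0
"""

def gate_cleansed_data(conn):
    # Any staged record still on the defect list without a correction
    # marker should block the load rather than pollute the warehouse.
    uncorrected = [row[0] for row in conn.execute(UNCORRECTED_SQL)]
    if uncorrected:
        raise RuntimeError(
            f"{len(uncorrected)} uncorrected records staged, e.g. {uncorrected[:5]}")

if __name__ == "__main__":
    conn = sqlite3.connect("staging.db")  # hypothetical staging area
    gate_cleansed_data(conn)
    print("Cleansed-data gate passed")
```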
Test Planning and Estimation, Test Design, Test Environment & Data
• Testing effort for the dimension and fact components involved in a data model should be based on classifying them by complexity level. Complexity can be graded by the number of sources, the number of fields, the amount of transformation, the recycling factors, the data volume and, above all, the number of business use cases involved in achieving full test coverage; a simple grading sketch follows this list.
• The business SME, business analyst, and data architect should approve the end-to-end test scenarios covering the business functionalities and the data model.
• The solution design team should approve the test scenarios and test cases covering the ETL logic for the source extraction, transformation, and target loading stages, as well as data error handling, data recycling, and performance enhancement.
• Data mining is performed on the basis of the approved data profiling sheet and leveraged as test data for the system test cases. The data sourcing plan should account for testing both the historical and the daily data loads. In addition, the test conditions covering updates and deletes on data should be planned and monitored between the loads.
• A DevOps model of build integration can be implemented so that the test function can plan the test execution effort for the defect fixes and new features made available with each build. This reduces the turnaround time for the testing team and also reduces the time spent on regression.
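The weights and band thresholds below are purely illustrative assumptions: the point is that a transparent scoring function makes the estimation conversation concrete, and a team would calibrate the weights against its own delivery history.

```python
# Illustrative complexity grading for dimension/fact components.
# Weights and band thresholds are assumptions to be calibrated locally.
WEIGHTS = {
    "sources": 3,
    "fields": 1,
    "transformations": 2,
    "recycling_rules": 2,
    "business_use_cases": 4,
}

def complexity_score(component: dict) -> int:
    # Weighted sum over the estimation drivers named in the text.
    return sum(WEIGHTS[factor] * component.get(factor, 0) for factor in WEIGHTS)

def complexity_band(score: int) -> str:
    if score < 40:
        return "simple"
    if score < 100:
        return "medium"
    return "complex"

if __name__ == "__main__":
    fact_sales = {"sources": 4, "fields": 35, "transformations": 12,
                  "recycling_rules": 3, "business_use_cases": 6}
    score = complexity_score(fact_sales)
    print(score, complexity_band(score))  # 101 complex
```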
Accelerators for Testing Phase
An automation tool is a key lever for delivering any data warehouse testing project at optimized cost, with high-fidelity data validation, on the committed delivery timelines. After analyzing the requirements, design, and testing scope, look for a tool in the market that can satisfy the validation areas below, or else plan to develop an in-house tool that caters to them:

• Data completeness and data quality (across the various sources and the target database)
• Static validation of data structures in multiple environments
• Business transformation rule validation
• Volumetric test data outcome validation
• Duplicate data check
• Durable key check
• SCD validations (Types 1 to 4) and deletes
• Reporting capability (for the above features)

If implemented, the automation tool can be leveraged even by the development team during their unit and integration testing phases to flush out defects early in the project life cycle; a minimal harness shape is sketched below.
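As a minimal sketch of what the in-house route might look like, the harness below wires two of the listed checks (duplicate data and durable key) into plain test functions that can run per build. All table and column names are hypothetical, and "durable key" is interpreted here as a surrogate key that must never span two different natural keys.

```python
import sqlite3

def _count(conn, sql):
    return conn.execute(sql).fetchone()[0]

def test_duplicate_data(conn):
    # Duplicate data check: no natural key should appear twice in the fact.
    dupes = _count(conn, """
        SELECT COUNT(*) FROM (
            SELECT txn_id FROM fact_sales GROUP BY txn_id HAVING COUNT(*) > 1
        )""")
    assert dupes == 0, f"{dupes} duplicated natural keys in fact_sales"

def test_durable_key(conn):
    # Durable key check: a surrogate key must never be reused across
    # different natural keys.
    reused = _count(conn, """
        SELECT COUNT(*) FROM (
            SELECT customer_sk FROM dim_customer
            GROUP BY customer_sk HAVING COUNT(DISTINCT customer_id) > 1
        )""")
    assert reused == 0, f"{reused} surrogate keys mapped to multiple customers"

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # hypothetical target
    test_duplicate_data(conn)
    test_durable_key(conn)
    print("Accelerator checks passed")
```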
Effective Program Governance Practices to Be Ensured by the Customer
Even if the customer has handed the initiative over to a vendor or an internal IT team specialized in the data integration space, it is still the responsibility of the customer's business team to retain complete ownership and governance of the organization's data. They should be able to comprehend the data needs as defined in the reporting requirements and have an authoritative say on the business processes that generate and handle data in the online transaction processing (OLTP) systems. Overall, they need to strategically position and advise the partner teams on the business delivery priorities for the program.
The program manager should ensure the involvement of business system SMEs and BI users in the project workshops to detail and refine the requirements for use by the other stakeholders. The SMEs should explain the business processes and data representations in the OLTP systems, and the BI users should quantify the data needs and review the granularity levels available in both the history and daily data sources. The program manager should take accountability for maintaining the RTM, getting it updated by the various functions with traceability rooted in the granular requirements derived during the project workshops, as this gives the team the confidence to proceed into each stage. It also ensures that all the BI-level user stories are watertight, with the right data scoping brought about by considering the right set of data sources as well as data flavors. The customer should also track the project KPIs to progressively evaluate the adherence of the project deliveries to the expected success criteria, and fine-tune them during the course of the project to steer it toward achieving the organizational business goals on time and with quality.
Key Factors to Successful Implementation of a Data Warehouse System
In a nutshell, to successfully implement a data warehouse system, the following criteria should be considered:

• Project requirement workshops should strive to capture a holistic view of the enterprise's reporting needs, including requirements from the executive management team and the other end users of the reports
• A pragmatic plan should be in place to support the staged implementation, encompassing a phase for solution prototyping of all end-to-end scenarios
• The project plan should prioritize the design and implementation of the key business areas and data volumes where BI reporting is crucial for the client's executive board
• Implement effective communication methods with all stakeholders and implementation partners through meetings and workshops from inception to the project pilot phase
• The business team, system SMEs, business analyst, and BI team should be part of each stage-gate review of the SDLC for walkthroughs and approvals
• Data quality is the key criterion: processes should be defined to flush out or handle erroneous source data before loading it into the data warehouse system
• Identify the frequency and updates of data from all external and internal sources and their relevance to the business
• The design and data model should give enough consideration to the time and effort of the ETL jobs for history and incremental data loads, as this time must be factored into the business's operational job windows
About the Authors
• Shankar Mani is a Test Manager with over 11 years' experience, currently working with the Europe Delivery Team of UST Global. He has worked across the retail, financial services, and logistics sectors as a test manager and test consultant, playing key roles in delivering large programs for clients.
• Bijoymon Soman is a seasoned data warehouse test expert with over 9 years of testing experience spanning varied technology platforms for banking & financial services and retail clients, including over 5 years purely in BI/DW and ETL testing, with extensive knowledge of business processes.