Proven Testing Techniques in Large Data Warehousing Projects


A paper on industry-best testing practices to deliver zero defects and ensure requirement-output alignment.



Table of Contents

1 Executive Summary
2 Industry-Best Practices in DWH Testing
  2.1 Data Completeness and Quality Check
  2.2 BI Report Data Testing
  2.3 Performance Validation of ETL and Reports
3 Critical Success Factors for Testing
  3.1 Referential Integrity of Facts and Dimensions
  3.2 Risk-Based Testing
  3.3 Data Obfuscation
  3.4 Effective Defect Management
  3.5 Focus on Automation
4 Syntel's BI/DW and Analytics Offerings Solution
5 About Syntel

1. Executive Summary

Refining databases and streamlining data warehousing (DWH) are quickly becoming integral requirements in every business. Decision-makers now realize that, in order to stay competitive, they need to study their business, scrutinize their data, and optimize the available information to their advantage. Business information is available in many forms, but mostly in knowledge repositories of unstructured data. And while data warehousing projects are on the rise, testing plays a significant role in determining the success of each project, by evaluating the final product to ensure it meets the specified business needs and scope of work.

However, data warehousing projects face two key challenges: increased complexity and the significant volume of data. To ensure a methodical analysis of the end product, businesses should focus on the following areas:

• Data completeness and quality check
• Referential integrity of facts and dimensions
• Risk-based testing
• Data obfuscation
• Effective defect management
• Communication process
• Adherence to compliance standards

The aim of this whitepaper is to outline the key points of each testing aspect, along with a few critical success factors, to help you cover all your bases and deliver meticulous, zero-defect solutions and services.

©2012 SYNTEL, INC.
2. Best Practices in Data Warehouse Testing

The testing activities in data warehousing projects begin in the requirement-gathering phase and are carried out in an iterative manner. In data warehousing testing, every component of the project needs to be tested, both independently and when integrated. This ranges from testing the data model, ETL scripts and reporting scripts to the user interface reporting layer.

The important milestones involved in the data warehousing testing lifecycle are:

• Preparation: review BRD, prepare the testing strategy, write test cases, hold SME discussions, establish entry and exit criteria, set up the test environment and test data
• Smoke testing: jobs and reports are accessible; basic testing
• ETL data validation: cleanliness, completeness, quality, business transformations, consistency
• Report validation: report validity, relevance of data, thoroughness, availability of data, SME data validation
• Integration testing: ETL and BO integration, complete cycle validation
• Performance testing: check NFRs, scalability, performance SLAs, peak user and peak load testing
• User acceptance testing: end-user demo and validation
• Closure: execution, defect metrics review, performance statistics, lessons learnt, process improvement

2.1. DATA COMPLETENESS AND QUALITY CHECK

An integral part of DWH testing is verifying the quality and completeness of data. Data completeness testing ensures that all expected records from the source are loaded into the database, reconciling against error and reject records.
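The completeness check described above reduces to a count balance: every source record should land in exactly one of the target, reject or error sets. A minimal sketch of such a reconciliation (table names like src_orders are hypothetical, and an in-memory SQLite database stands in for the warehouse):

```python
# Record-reconciliation sketch: source count must equal
# target + reject + error counts, or records "leaked" during the load.
# Table names are illustrative, not from any specific warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Toy data standing in for a real source/target pair.
cur.executescript("""
    CREATE TABLE src_orders (id INTEGER);
    CREATE TABLE tgt_orders (id INTEGER);
    CREATE TABLE etl_rejects (id INTEGER);
    CREATE TABLE etl_errors (id INTEGER);
    INSERT INTO src_orders VALUES (1),(2),(3),(4),(5);
    INSERT INTO tgt_orders VALUES (1),(2),(3);
    INSERT INTO etl_rejects VALUES (4);
    INSERT INTO etl_errors VALUES (5);
""")

def count(table):
    return cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

leak = count("src_orders") - (count("tgt_orders")
                              + count("etl_rejects")
                              + count("etl_errors"))
print("record leakage:", leak)  # 0 means the load reconciles
```

In a real project the same balance would be checked per load window (historical and incremental), not just in total.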
A data quality check ascertains that proper and accurate data, per the recommended standard, is processed into the data warehouse; this includes data transformation testing.

The following activities are recommended to determine data completeness and quality:

• Data extraction process for both historical and incremental loads
• Data cleansing checks based on standards; here, testing the reject threshold is important
• 'Source to target' transformation validation for thoroughness and accuracy
• Historical and incremental transformation process validation
• 'Reject and error' record analysis and validation
• Scenario-based testing with specified transformation rules
• Record reconciliation testing by comparing source, error, reject and target records to prevent record leakage
• Data load process check for both historical and incremental loads
• Negative testing for all the above-mentioned cases

Data profiling is not about validating data quality; it concerns source data analysis and is usually conducted by SMEs and development teams. Testing teams should focus on target data scenarios.

2.2. BI REPORT DATA TESTING

Another important aspect of DWH testing is confirming the accuracy and completeness of business intelligence (BI) reports. Reports may vary in appearance, turnaround time, accuracy and usability, but testing them is of paramount importance, because they are reflected in the UI and are what end users will eventually see.
The following activities are key while testing BI reports:

• Restriction of users' access to reports, with multiple layers of security
• Validation of the accuracy and relevance of the data displayed in each report
• Ensuring sufficient information for analyzing graphical reports
• Relevance of the options in each report's drop-down lists
• Testing of pop-up reports and child reports, with proper data flow from parent reports
• Functionality of additional features such as report storage in PDF format and print options

2.3. PERFORMANCE VALIDATION OF ETL AND REPORTS

Loading and populating the data warehouse with relevant and complete data, and ensuring the relevance of reports, constitutes 50% of business
expectations. These tasks must also be completed within a given timeline and be scalable to support an ever-growing system. Testing the performance of ETL and reports for responsiveness and scalability is therefore critical to the success of the design.

Although there are many non-functional requirements (NFRs) surrounding ETL and report performance, the following guidelines are helpful:

• Execution with peak production volume, to check that the ETL process completes within the agreed window
• Analysis of ETL loading times with a smaller amount of data, to gauge scalability issues
• Verification of ETL processing times, component by component, to identify areas of improvement
• Testing the timing of the reject process, and developing processes for managing large volumes of rejected data
• Shutdown of the server during ETL execution, to test restartability
• Simulation of maximum concurrent-user load for all BI reports and for ad-hoc reports
• Ensuring access to BI reports during ETL loads

3. Critical Success Factors for Testing

3.1. REFERENTIAL INTEGRITY OF FACTS AND DIMENSIONS

Many data warehouses are modeled as dimensions and facts. In these scenarios, the important task is to test the integrity between the dimensions and facts carefully. Since there are multiple representations of dimensions, such as slowly changing dimensions (SCDs), testing must check references and the point in time of each reference. Table-level integrity constraints are not usually enforced in large data warehouses, so these checks have to be performed at the ETL layer rather than at the database layer.

3.2. RISK-BASED TESTING

Because the data present in a warehouse is huge, it is impossible to test every piece of data available. It is important to work with the business SMEs to identify risk-prone areas while finalizing test cases.
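Returning to the referential-integrity point in section 3.1: since the database does not enforce foreign keys, an orphan-key lookup is typically scripted at the ETL/test layer. A minimal sketch, with hypothetical table and column names and SQLite standing in for the warehouse:

```python
# Fact-to-dimension integrity check: count fact rows whose dimension
# key has no match in the dimension table. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER);
    CREATE TABLE fact_sales (sale_id INTEGER, customer_key INTEGER);
    INSERT INTO dim_customer VALUES (10),(11);
    INSERT INTO fact_sales VALUES (1,10),(2,11),(3,99);  -- 99 is an orphan
""")

orphans = cur.execute("""
    SELECT COUNT(*) FROM fact_sales f
    LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
    WHERE d.customer_key IS NULL
""").fetchone()[0]
print("orphan fact rows:", orphans)  # non-zero fails the check
```

For SCD Type 2 dimensions, the join would additionally constrain the fact's transaction date to the dimension row's effective-date range, to verify the point-in-time reference.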
Key risk-prone areas include the following:

• Items whose failure would cause the highest damage to the project carry the highest risk and should be tested thoroughly
• Items that are used frequently should also be considered for risk-based testing, as their probability of failure is high

After discussing report criticality with the end users, these items should be documented in the test plan.

3.3. DATA OBFUSCATION

In most DWH testing cases, a subset of production data is used for testing activities. If this data contains sensitive information, however, it poses a potential risk. In such cases, data obfuscation can be used to compile the test data in the test bed, but this is not an easy task. The process owners need to consider factors such as masking secure information, catering to specific data needs, and ensuring referential integrity and data readability. In large DWH testing projects involving secure information, it is advisable to use data masking or a test data generation tool.

3.4. EFFECTIVE DEFECT MANAGEMENT

In large projects, defects are assigned across streams, followed by careful coordination, analysis and improvements to close them. Defect tracking tools such as HP Quality Center and Test Director can be very helpful.

In data validation, scenario-based testing is predominant. Not every scenario will be listed as a test case, but the results of every scenario need to be captured effectively. A defect triage meeting, held as a recurring discussion with all stream members, provides a forum to discuss, understand and close cross-stream defects.

3.5. FOCUS ON AUTOMATION

The need for additional or repeated testing in large projects arises from factors such as requirement changes, defects, design changes or enhancements. Automating the testing process reduces the time taken and the manpower and effort invested.
In a data warehouse testing environment, the following items could be automated:

• Test data generation
• Regression testing suite
• Performance testing suite
• Data profiling tools (not directly a testing activity, but helpful)

Although tools are available for these automations, teams can choose to build a customized tool if the project's needs are specific.

Testing activities in a large data warehouse project are much more complex than normal software testing, and necessitate careful coordination and a proper understanding of the data. A capable IT partner will collaborate with you, understand your business and assess your project, while ensuring a no-defect environment with smooth and streamlined processes.
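An automated regression suite of the kind listed above is often just a table of query/expected-value pairs that is replayed after every ETL or design change. A minimal, hypothetical sketch (SQLite stands in for the target warehouse; cases and names are illustrative):

```python
# Data-driven regression sketch: each case pairs a validation query
# with its expected result; failures are collected and reported.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE tgt_orders (id INTEGER, amount REAL);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 15.5);
""")

REGRESSION_CASES = [
    ("row count stable",
     "SELECT COUNT(*) FROM tgt_orders", 2),
    ("no negative amounts",
     "SELECT COUNT(*) FROM tgt_orders WHERE amount < 0", 0),
]

failures = []
for name, sql, expected in REGRESSION_CASES:
    actual = cur.execute(sql).fetchone()[0]
    if actual != expected:
        failures.append((name, expected, actual))

print("failures:", failures)  # an empty list means the suite passed
```

Keeping the cases as data rather than code means business SMEs can review and extend them without touching the test harness.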
4. Syntel's BI/DW and Analytics Offerings Solution

Syntel has delivered more than 700 Business Intelligence-Data Warehouse projects worldwide, across various industries. Our dedicated BI Practice is geared to provide quality services across the BI-DW systems lifecycle, by leveraging the cost effectiveness of onsite-offsite delivery. Syntel's value proposition is driven by an experienced team, with mature methodologies to provide consultancy across domain and application areas.

With comprehensive knowledge of our clients' industries, including trends, competitive environments, customers and stakeholders, our BI-DW-based solutions support clients' overarching business strategy, while ensuring that the final output is aligned with their business needs. Our solutions include customized approaches, proven practices, innovative frameworks and adept techniques that streamline organizational activities, deliver applications with superior quality, and ensure a zero-defect environment.

Syntel's solutions allow us to guide organizations through a transformational journey by reducing risks, optimizing costs and providing business benefits.
DATA MANAGEMENT
• Data modeling and architecture
• Data integration
• Data quality and governance
• Master data management
• Metadata management
• Large size data warehouses
• Upgrade and platform migration services

BUSINESS INSIGHTS
• Analytical and operational reporting
• Intuitive dashboards and scorecards
• Report inventory rationalization
• Mobile-based BI delivery
• Upgrade and platform migration services
• Reporting services on Cloud
• Performance tuning

BUSINESS FORESIGHTS
• Data mining
• Statistical model development
• Big Data analytics
• Predictive modeling
• Text mining
• Forecasting and optimization

CONSULTING SERVICES - Assessment, Strategy and Roadmap

Some of Syntel's in-house accelerators, developed by the BI-DW team, are as follows:

Business challenge: Poor data quality; delayed time-to-market; high risk of implementing BI projects
Syntel's accelerators: SmartData, Syntel's data quality enrichment tool and data governance framework; delivers 40% functionality at a fraction of the cost of products

Business challenge: Increasing complexity due to new data sources, new data elements from existing sources, and increased effort with lack of documentation
Syntel's accelerators: Data Integration Framework to improve time-to-market; automated source-to-target mapping documentation using SmartMap, with 80% of analysis efforts

Business challenge: Fragmented reporting environment; high total cost of ownership; poor visibility into enterprise data
Syntel's accelerators: Accelerators for report migration with 50-60% automation (e.g. CoBo, Cognos to BO; ActJasper, Actuate to Jasper); Cognos to SSRS migration framework; report rationalization framework

Business challenge: Insurance KPI reporting
Syntel's accelerators: PerformINS, a proprietary KPI reporting solution; Plug & Play BI Solution, saving 30% of efforts and costs; 60+ Key Performance Indicators (KPIs) providing business insights with readily available dashboards and reports

Syntel can help you build defect-free applications, compliant with industry and regulatory requirements, accelerated by our innovative
BI-DW solutions. For more information on Syntel's capabilities and how we can leverage industry-best techniques to deliver seamless, error-free business output, log onto www.syntelinc.com.
5. About Syntel

Syntel (NASDAQ:SYNT) is a leading global provider of integrated information technology and Knowledge Process Outsourcing (KPO) solutions spanning the entire lifecycle of business and information systems and processes. The Company is driven by its mission to create new opportunities for clients by harnessing the passion, talent and innovation of Syntel employees worldwide. Syntel leverages dedicated Centers of Excellence, a flexible Global Delivery Model, and a strong track record of building collaborative client partnerships to create sustainable business advantage for Global 2000 organizations. Syntel is assessed at SEI CMMi Level 5, and is ISO 27001 and ISO 9001:2008 certified. As of June 30, 2012, Syntel employed more than 20,000 people worldwide. To learn more, visit Syntel's website at www.syntelinc.com.

SYNTEL
525 E. Big Beaver, Third Floor
Troy, MI 48083
Phone: 248.619.3503
info@syntelinc.com