Business Value Assurance / Advanced DWH (Testing)

Table of Contents
1. Challenges faced by the testing team in real-time scenarios
2. Challenges faced by the team in different phases of the STLC
3. What tools are available and used for testing a DWH at different stages
4. Any automation tool available for DWH
5. Any tool available and used to ensure data quality
6. How it is ensured that the data sample selected ensures completeness
7. How data reconciliation is done
8. How to test bulk data
9. Some information on performance tools and how the results are analyzed
Challenges faced by the testing team in real-time scenarios

Challenges Faced: Lack of skilled testers
Results: Testing was incomplete, insufficient, and inadequate, which led to a lot of effort being spent on finding and reporting bugs.

Challenges Faced: Lack of standard test data / datasets during testing
Results: Led to insufficient test coverage.

Challenges Faced: The team members had insufficient knowledge of the domain standards
Results: Resulted in inadequate testing.
Challenges Faced: Poor understanding of requirements, and miscommunication or no communication with the end users during testing/development cycles
Results: There were no specifics of what the application should or shouldn't do (the application's requirements), which led to poor-quality testing.

Challenges Faced: Not recording non-reproducible defects
Results: During random / exploratory testing, testers often came across bugs that appeared only on specific configurations and were non-reproducible. This made testing extremely tedious and time consuming, as the product would often hang at random.

Challenges Faced: Tedious manual verification and testing the complete application
Results: Even though this led developers to display a specific interpretation of results, the verification has to be done on a wide range of datasets and is repetitive work. Testing each and every combination was also challenging.

Challenges Faced: Interdependencies of components in the software
Results: Since the software was complex, with many different components, changes in one part often caused breaks in other parts. This created pressure to handle current functionality changes, checks of previously working functionality, and bug tracking all at once.
Challenges Faced: Testing always under time constraints
Results: Slippage in other phases of the project often reduced the time left for testing, because the end date was committed to the customer. Under that pressure, testers tended to focus on task completion rather than on test coverage and quality of work. Testing was taken up as the last activity in the project life cycle, so there was always pressure to squeeze it into a short time.

Challenges Faced: Inadequate test systems and lack of dedicated resources for the test team. Underestimating testing effort in project estimates.
Results: Testing time suffered because no dedicated test systems were given to the test team, testers were assigned to test multiple modules, and developers were eventually moved onto testing work. Test engineers were forced to work at odd hours and weekends, as the limited resources were controlled by the development team and test engineers were given lower priority during resource allocation. The testing team was not involved during the scoping phase, so its effort was typically underestimated. This led to lower-quality testing, as sufficient effort could not be put into it.

Challenges Faced: The test team is not involved in the entire life cycle
Results: Test engineers were involved late in the life cycle, which limited their contribution to black box testing. The project team did not use the services of the test team for the unit and integration testing phases. Because testers were involved only in the testing phase, they needed time to understand all the requirements of the product, were overloaded, and were finally forced to work many late hours.
Challenges Faced: Problems coping with attrition
Results: A few key employees left the company after very short tenures, and management struggled to cope with the attrition rate. New testers brought into the project required project training from the beginning, and because this is a complex project it was difficult for them to understand, which delayed the release date.

Challenges Faced: Hard or subtle bugs remained unnoticed
Results: Since there was a lack of skilled testers and domain expertise, some testers concentrated on finding easy bugs that did not require deep understanding.

Challenges Faced: Lack of relationship with the developers, and no documentation accompanying the releases provided to the test team
Results: This is a big challenge. No proper documentation accompanies the releases provided to the test team, so the test engineer is not aware of the known issues, the main features to be tested, and so on. A lot of effort is wasted as a result.

Challenges Faced: Problems coping with scope creep and changes to the functionality
Results: The implementation date was delayed because of a lot of rework. Dependencies among parts of the project, combined with the frequent changes that had to be incorporated, resulted in many bugs in the software.
Automated testing has many benefits, but it also has some associated challenges:
i. Selection of test tool
ii. Customization of tool
iii. Selection of automation level
iv. Development and verification of scripts
v. Implementation of test management system
Challenges faced by the team in different phases of STLC

Testing the complete application:
Is it possible? I think it is impossible. There are millions of test combinations, and it is not possible to test each and every one, whether in manual or in automated testing. If you try all these combinations, you will never release the product.

Misunderstanding of company processes:
Sometimes you just don't pay proper attention to what the company-defined processes are and what purposes they serve. There is a myth among testers that they should follow only the company processes, even when those processes are not applicable to their current testing scenario. This results in incomplete and inappropriate application testing.

Relationship with developers:
A big challenge. It takes a very skilled tester to handle this relationship positively while still completing the work the tester's way. There are hundreds of excuses developers or testers can make when they do not agree on some point. For this, a tester also needs good communication, troubleshooting, and analytical skills.

Regression testing:
As the project keeps expanding, the regression testing work becomes uncontrolled. There is pressure to handle current functionality changes, checks of previously working functionality, and bug tracking.

Testing always under time constraint:
"Hey tester, we want to ship this product by this weekend, are you ready for completion?" When this order comes from the boss, the tester simply focuses on task completion and not on test coverage and quality of work. There is a huge list of tasks to complete within the specified time, including writing, executing, automating, and reviewing the test cases.

Which tests to execute first?
How will you decide which test cases should be executed, and with what priority? Which tests are more important than others? This requires good experience of working under pressure.
Understanding the requirements:
Sometimes testers are responsible for communicating with customers to understand the requirements. What if the tester fails to understand them? Will the tester be able to test the application properly? Definitely not! Testers need good listening and comprehension skills.

Decision to stop the testing:
When to stop testing? A very difficult decision. It requires sound judgment of the testing processes and the importance of each process, as well as the ability to decide ‘on the fly’.

One test team under multiple projects:
It is challenging to keep track of each task, and there are communication challenges. This often results in the failure of one or both projects.

Reuse of test scripts:
Application development methods are changing rapidly, making it difficult to manage the test tools and test scripts. Test script migration or reuse is essential but difficult.

Testers focusing on finding easy bugs:
If the organization rewards testers based on the number of bugs (a very bad way to judge a tester's performance), then some testers concentrate only on finding easy bugs that don't require deep understanding and testing. A hard or subtle bug remains unnoticed under such an approach.

To cope with attrition:
Rising salaries and benefits are making many employees leave the company after very short tenures, and management faces hard problems coping with the attrition rate. Challenges: new testers require project training from the beginning, complex projects are difficult to understand, and the shipping date is delayed.
Different types of testing are required throughout the life cycle of a DWH implementation, so there are different challenges to face during the different phases of the STLC.

ETL (Business Functionality, Data Quality, Performance)
During the ETL phase of a DWH implementation, data quality testing is of utmost importance. Any defect slippage in this phase will be very costly to rectify later. Functional testing needs to be carried out to validate the transformation logic.

Data Load (Parameters, Settings, Validation)
During the setup of the data load functionality, specific testing is carried out on the load module. The parameters and settings for the data load are tested here.

Initial Data Load (Performance, Data Quality)
The initial data load is when the underlying databases are loaded for the first time. Performance testing is significant here. Data quality, once tested and signed off during the ETL testing phase, is re-tested here.

E2E Business Testing (UI & Interface Testing)
Once the initial data load is done, the data warehouse is ready for end-to-end functional validation. UI testing and interface testing are carried out during this phase.

Maintenance / Data Feeds (Regression)
Data from the operational database should be fed into the data warehouse periodically. During such periodic updates, regression testing should be executed to ensure that the new data updates have not broken any existing functionality. Periodic updates are required to ensure temporal consistency.
What tools are available and used for testing DWH at different stages?

ETL software can help you automate the process of loading data from the operational environment to the data warehouse environment.
Create pairs of SQL queries (QueryPairs) and reusable queries (Query Snippets) to embed in queries.

Execute Scenarios that compare source databases and/or files to target data warehouses.

Agents execute your queries and return the results to the QuerySurge server for reporting and analysis. Analyze and drill down into the results to identify bad data and data defects using the tool's reporting.
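The source-to-target comparison described above can be sketched in a few lines. This is only an illustration of the idea of a query pair, not QuerySurge's API: the function names, the tables `customers` and `dim_customer`, and the use of in-memory SQLite databases are all assumptions made for the example.

```python
import sqlite3


def compare_query_pair(source_conn, source_sql, target_conn, target_sql):
    """Run one query per side and report rows present on only one side."""
    # Sets ignore repeated rows, so duplicates need a separate check.
    source_rows = set(source_conn.execute(source_sql).fetchall())
    target_rows = set(target_conn.execute(target_sql).fetchall())
    return {
        "missing_in_target": sorted(source_rows - target_rows),
        "unexpected_in_target": sorted(target_rows - source_rows),
    }


def build_demo_databases():
    """Create two in-memory databases with hypothetical tables and rows."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    target.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER, customer_name TEXT)"
    )
    source.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")]
    )
    target.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    return source, target


if __name__ == "__main__":
    source, target = build_demo_databases()
    print(
        compare_query_pair(
            source,
            "SELECT id, name FROM customers",
            target,
            "SELECT customer_id, customer_name FROM dim_customer",
        )
    )
```

In practice the two queries would usually apply the documented transformation on the source side so that both result sets are directly comparable.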
Issue: Missing Data
Description: Data that does not make it into the target database
Possible Causes: An invalid or incorrect lookup table in the transformation logic; bad data from the source database (needs cleansing); invalid joins
Example(s): The lookup table should contain a field value of “High”, which maps to “Critical”. However, the source data field contains “Hig” (missing the final h) and fails the lookup, so the target data field contains null. If this occurs on a key field, a possible join would be missed and the entire row could fall out.
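A failed lookup like the “Hig” example can be detected before the load by listing source values that have no match in the lookup table. The sketch below assumes hypothetical tables `source_cases` and `priority_lookup` in an in-memory SQLite database; the names are illustrative only.

```python
import sqlite3


def find_failed_lookups(conn):
    """Return source rows whose priority has no match in the lookup table."""
    return conn.execute(
        """
        SELECT s.id, s.priority
        FROM source_cases AS s
        LEFT JOIN priority_lookup AS l ON l.source_value = s.priority
        WHERE l.source_value IS NULL
        ORDER BY s.id
        """
    ).fetchall()


def build_lookup_demo():
    """Create hypothetical source and lookup tables with one bad value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_cases (id INTEGER, priority TEXT)")
    conn.execute(
        "CREATE TABLE priority_lookup (source_value TEXT, target_value TEXT)"
    )
    conn.execute("INSERT INTO priority_lookup VALUES ('High', 'Critical')")
    conn.executemany(
        "INSERT INTO source_cases VALUES (?, ?)", [(1, "High"), (2, "Hig")]
    )
    return conn


if __name__ == "__main__":
    print(find_failed_lookups(build_lookup_demo()))
```

Rows returned by this check are the ones that would end up with a null target value, or drop out entirely if the field is used as a join key.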
Issue: Truncation of Data
Description: Data being lost by truncation of the data field
Possible Causes: Invalid field lengths on the target database; transformation logic not taking into account field lengths from the source
Example(s): The source field value “New Mexico City” is truncated to “New Mexico C” because the target data field did not have the correct length to capture the entire value.
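One simple way to catch truncation ahead of time is to compare source value lengths against the target field length. The helper below is a minimal sketch; the function name and the target length of 12 are assumptions chosen to match the example above.

```python
def find_truncation_risks(source_values, target_length):
    """Return (value, truncated_value) for values longer than the target field."""
    return [
        (value, value[:target_length])
        for value in source_values
        if value is not None and len(value) > target_length
    ]


if __name__ == "__main__":
    # 12 is a hypothetical target column length.
    print(find_truncation_risks(["New Mexico City", "Austin"], 12))
```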
Issue: Data Type Mismatch
Description: Data types not set up correctly on the target database
Possible Causes: Source data field not configured correctly
Example(s): The source data field was required to be a date; however, when initially configured, it was set up as a VarChar.
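When a date has been stored as text, a test can at least confirm that every value parses as a date. This is a minimal sketch; the function name and the `%Y-%m-%d` format are assumptions, and the real format would come from the mapping document.

```python
from datetime import datetime


def find_invalid_dates(values, date_format="%Y-%m-%d"):
    """Return the values that cannot be parsed with the expected date format."""
    invalid = []
    for value in values:
        try:
            datetime.strptime(value, date_format)
        except (TypeError, ValueError):
            # TypeError covers None, ValueError covers bad or impossible dates.
            invalid.append(value)
    return invalid


if __name__ == "__main__":
    print(find_invalid_dates(["2020-01-31", "31/01/2020", "2020-02-30"]))
```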
Issue: Null Translation
Description: Null source values not being transformed to the correct target values
Possible Causes: Development team did not include the null translation in the transformation logic
Example(s): A null in the source data field was supposed to be transformed to ‘None’ in the target data field. However, the logic was not implemented, so the target data field contained null values.
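The null translation rule can be checked row by row once source and target values are paired up by key. The sketch below assumes the rows have already been joined into `(key, source_value, target_value)` tuples; the function name and the tuple layout are illustrative.

```python
def find_null_translation_errors(rows, expected_target="None"):
    """Return keys where a null source value was not translated as expected.

    rows is an iterable of (key, source_value, target_value) tuples.
    """
    return [
        key
        for key, source_value, target_value in rows
        if source_value is None and target_value != expected_target
    ]


if __name__ == "__main__":
    sample = [(1, None, "None"), (2, None, None), (3, "Open", "Open")]
    print(find_null_translation_errors(sample))
```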
Issue: Wrong Translation
Description: The opposite of the Null Translation error. A field that should be null is populated with a non-null value, or a field that should be populated holds the wrong value.
Possible Causes: Development team incorrectly translated the source field for certain values
Example(s):
Ex. 1) The target field should be populated only when the source field contains certain values; otherwise it should be set to null.
Ex. 2) The target field should be “Odd” if the source value is an odd number, but the target field is “Even” (this is a very basic example).

Issue: Misplaced Data
Description: Source data fields not being transformed to the correct target data field
Possible Causes: Development team inadvertently mapped the source data field to the wrong target data field
Example(s): A source data field was supposed to be transformed to the target data field ‘Last_Name’. However, the development team inadvertently mapped it to ‘First_Name’.

Issue: Extra Records
Description: Records that should not be in the ETL are included in the ETL
Possible Causes: Development team did not include a filter in their code
Example(s): If a case has the deleted field populated, the case and any data related to it should not be in any ETL.

Issue: Not Enough Records
Description: Records that should be in the ETL are not included in the ETL
Possible Causes: Development team had a filter in their code that should not have been there
Example(s): If a case was in a certain state, it should be ETL'd over to the data warehouse but not to the data mart.

Issue: Transformation Logic Errors/Holes
Description: Testing can sometimes lead to finding “holes” in the transformation logic or to realizing that the logic is unclear
Possible Causes: Development team did not take special cases into account. For example, international cities whose names contain language-specific characters might need to be dealt with in the ETL code.
Example(s):
Ex. 1) Most cases may fall into a certain branch of logic for a transformation, but a small subset of cases (sometimes with unusual data) may not fall into any branch. The tester's code and the developer's code could handle these cases differently (and possibly both end up being wrong), and the logic is changed to accommodate the cases.
Ex. 2) The tester and the developer interpret the transformation logic differently, which results in different values. This leads to the logic being rewritten to make it clearer.

Issue: Simple/Small Errors
Description: Capitalization, spacing, and other small errors
Possible Causes: Development team did not add a space after a comma when populating the target field
Example(s): Product names on a case should be separated by a comma and then a space, but in the target field they are separated only by a comma.
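Small formatting rules like the comma-and-space separator are easy to check with a regular expression. This is a minimal sketch; the function name is hypothetical, and the pattern assumes the rule is "every comma is followed by exactly one space and then more text".

```python
import re

# Matches a comma that is NOT followed by one space and a non-space character.
BAD_SEPARATOR = re.compile(r",(?! \S)")


def has_clean_separators(product_list):
    """Return True when every comma is followed by exactly one space."""
    return BAD_SEPARATOR.search(product_list) is None


if __name__ == "__main__":
    print(has_clean_separators("Widget, Gadget"))
    print(has_clean_separators("Widget,Gadget"))
```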
Issue: Sequence Generator
Description: Ensuring that the sequence numbers of reports are in the correct order is very important when processing follow-up reports or responding to an audit
Possible Causes: Development team did not configure the sequence generator correctly, resulting in records with a duplicate sequence number
Example(s): Duplicate records in the sales report were doubling up several sales transactions, which skewed the report significantly.

Issue: Undocumented Requirements
Description: Finding requirements that are “understood” but are not actually documented anywhere
Possible Causes: Several members of the development team did not understand the “understood” undocumented requirements
Example(s): There was a restriction in the “where” clause that limited how certain reports were brought over. It was used in mappings that were understood to be necessary but were not actually in the requirements. Occasionally it turns out that the understood requirements are not what the business wanted.

Issue: Duplicate Records
Description: Duplicate records are two or more records that contain the same data
Possible Causes: Development team did not add the appropriate code to filter out duplicate records
Example(s): Duplicate records in the sales report were doubling up several sales transactions, which skewed the report significantly.
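Duplicate records can be found by counting how often each full row occurs. The sketch below assumes rows have been fetched as tuples; the function name and the sample transactions are made up for illustration.

```python
from collections import Counter


def find_duplicate_records(rows):
    """Return a dict of row -> occurrence count for rows seen more than once."""
    counts = Counter(rows)
    return {row: count for row, count in counts.items() if count > 1}


if __name__ == "__main__":
    sales = [("T1", 100), ("T2", 50), ("T1", 100)]
    print(find_duplicate_records(sales))
```

The same idea applies to sequence numbers: passing only the sequence column instead of the full row reports any sequence value that was issued twice.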
Issue: Numeric Field Precision
Description: Numbers that are not formatted to the correct number of decimal places or not rounded per specifications
Possible Causes: Development team rounded the numbers to the wrong decimal place
Example(s): The sales data did not contain the correct precision, and all sales were being rounded to the whole dollar.
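A precision check can recompute the expected rounded value from the source and compare it with the target. This is a sketch under assumptions: two decimal places and half-up rounding are placeholders for whatever the specification actually requires, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def find_precision_errors(pairs, places=2):
    """Return (source, target, expected) where target differs from the rounded source.

    pairs is an iterable of (source, target) values given as strings.
    The comparison is numeric, so "5" and "5.00" are treated as equal.
    """
    quantum = Decimal(10) ** -places  # for places=2 this is Decimal("0.01")
    errors = []
    for source, target in pairs:
        expected = Decimal(source).quantize(quantum, rounding=ROUND_HALF_UP)
        if Decimal(target) != expected:
            errors.append((source, target, str(expected)))
    return errors


if __name__ == "__main__":
    print(find_precision_errors([("19.99", "20"), ("5.00", "5.00")]))
```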
Issue: Rejected Rows
Description: Data rows that get rejected due to data issues
Possible Causes: Development team did not take into account data conditions that could break the ETL for a particular row
Example(s): Missing data rows on the sales table caused major issues with the end-of-year sales report.
Any tool available and used to ensure data quality

WizSoft - WizRule
Vality - Integrity
Prism Solutions, Inc. - Prism Quality Manager

Objective: Is your data complete and valid?
Tool: WizSoft - WizRule; Vality - Integrity
Features: Data examination: determines the quality of the data, patterns within it, and the number of different fields used.

Objective: Does your data comply with your business rules? (Do you have missing values, illegal values, inconsistent values, invalid relationships?)
Tool: Prism Solutions, Inc. - Prism Quality Manager; WizSoft - WizRule; Vality - Integrity
Features: Compares data to business rules and assesses it for consistency and completeness against those rules.

Objective: Are you using sources that comply with your business rules?
Tool: WizSoft - WizRule; Vality - Integrity
Features: Data reengineering: examining the data to determine what the business rules are.
Objective: Does your data need to be broken up between source and data warehouse?
Tool: Trillium Software - Parser; i.d. Centric - DataRight
Features: Data parsing (elementizing): determines the context and destination of each component of each field.

Objective: Does your data have abbreviations that should be changed to ensure consistency?
Tool: Trillium Software - Parser; i.d. Centric - DataRight
Features: Data standardizing: converting data elements to forms that are standard throughout the DW.

Objective: Is your data correct?
Tool: Trillium Software - Parser; Trillium Software - GeoCoder; i.d. Centric - ACE, Clear I.D. Library; Group 1 - NADIS
Features: Data correction and verification: matches data against known lists (addresses, product lists, customer lists).

Objective: Is there redundancy in your data?
Tool: Trillium Software - Matcher; Innovative Systems - Match; i.d. Centric - Match/Consolidation; Group 1 - Merge/Purge Plus
Features: Record matching: determines whether two records represent data on the same object.

Objective: Are there multiple versions of company names in your database?
Tool: Innovative Systems - Corp-Match
Features: Record matching: based on user-specified fields such as tax ID.
Objective: Is your data consistent prior to entering the data warehouse?
Tool: Vality - Integrity; i.d. Centric - Match/Consolidation
Features: Transform data: for example, “1” for male and “2” for female become “M” and “F”. This ensures consistent mapping between source systems and the data warehouse.
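A code mapping such as “1” to “M” can also be verified in a test, independently of the tool that performs it. The sketch below is illustrative only; the mapping dictionary mirrors the example above, and the function name and pair layout are assumptions.

```python
# Expected source-to-target mapping, taken from the example above.
GENDER_CODES = {"1": "M", "2": "F"}


def find_mapping_errors(source_to_target_pairs, mapping=GENDER_CODES):
    """Return (source, target) pairs whose target does not match the mapping."""
    return [
        (source, target)
        for source, target in source_to_target_pairs
        if mapping.get(source) != target
    ]


if __name__ == "__main__":
    print(find_mapping_errors([("1", "M"), ("2", "F"), ("2", "2")]))
```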
Objective: Do you have information in free-form fields that differs between databases?
Tool: Vality - Integrity
Features: Data reengineering: examining the data to determine what the business rules are.

Objective: Do you have multiple individuals in the same household that need to be grouped together?
Tool: i.d. Centric - Match/Consolidation; Trillium Software - Matcher
Features: Householding: combining individual records that have the same address.

Objective: Does your data contain atypical words, such as industry-specific words or ethnic or hyphenated names?
Tool: i.d. Centric - ACE, Clear I.D.
Features: Data parsing combined with data verification: comparison to industry-specific lists.

Enterprise/Integrator by Carleton
Semio - SemioMap

Objective: Do you have multiple formats to be accessed (relational databases, flat files, etc.)?
Tool: Enterprise/Integrator by Carleton
Features: Accesses the data, then maps it to the DW schema.

Objective: Do you have free-form text that needs to be indexed, classified, or otherwise processed?
Tool: Semio - SemioMap
Features: Text mining: extracts meaning and relevance from large amounts of information.

Objective: Have the rules established during the data cleansing steps been reflected in the metadata?
Tool: Vality - Integrity
Features: Documenting: documenting the results of the data cleansing steps in the metadata.

Objective: Is the data Y2K compliant?
Tool: Enterprise/Integrator by Carleton
Features: Data verification within a migration tool.
How it is ensured that the data sample selected ensures completeness

By data verification with the help of a migration tool.
How is data reconciliation done?

If the DDL that the data architect has produced somehow does not match the DDL that has already been defined to the DBMS, then there MUST BE a reconciliation before any other design and development ensues.

Many data warehouses are built on an n-tier architecture, with multiple data extraction and data insertion jobs between two consecutive tiers. The nature of the data changes as it passes from one tier to the next. Data reconciliation is the method of reconciling, or tying up, the data between any two consecutive tiers (layers).
Master Data Reconciliation
Master data reconciliation is the method of reconciling only the master data between source and target.
Common examples of master data reconciliation:
Total count of rows, for example: total customers in source and target, total number of products in source and target, etc.
Total count of rows based on a condition, for example: total number of active customers, total number of inactive customers, etc.
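The row-count checks listed above can be sketched as follows. This is a minimal illustration using in-memory SQLite databases; the `customer` table, its `active` column, and the function names are assumptions, not part of any particular warehouse.

```python
import sqlite3


def count_rows(conn, table, condition=None):
    """Count rows in a table, optionally restricted by a WHERE condition."""
    # Table and condition come from trusted test configuration, not user input.
    sql = f"SELECT COUNT(*) FROM {table}"
    if condition:
        sql += f" WHERE {condition}"
    return conn.execute(sql).fetchone()[0]


def reconcile_count(source_conn, target_conn, table, condition=None):
    """Compare the row count of the same table on source and target."""
    source_count = count_rows(source_conn, table, condition)
    target_count = count_rows(target_conn, table, condition)
    return {
        "source": source_count,
        "target": target_count,
        "match": source_count == target_count,
    }


def build_customer_db(rows):
    """Create a hypothetical customer table from (id, active) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, active INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    source = build_customer_db([(1, 1), (2, 1), (3, 0)])
    target = build_customer_db([(1, 1), (2, 1)])
    print(reconcile_count(source, target, "customer"))
    print(reconcile_count(source, target, "customer", "active = 1"))
```

In this made-up data the total counts differ while the active-customer counts match, which points at the inactive customer as the missing row.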
Transactional Data Reconciliation
Sales quantity, revenue, tax amount, service usage, etc. are examples of transactional data. Transactional data forms the very base of BI reports, so any mismatch in it directly affects the reliability of the reports and of the whole BI system. That is why a reconciliation mechanism must be in place to detect such a discrepancy beforehand (that is, before the data reaches the final business users).
Some example measures used for transactional data reconciliation:
Sum of total revenue calculated from source and target
Sum of total products sold calculated from source and target, etc.
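A sum-based check for transactional data might look like the sketch below. The `sales` table, the `revenue` column stored as text, and the function names are assumptions for the example; amounts are summed as `Decimal` values to avoid floating-point rounding in the comparison.

```python
import sqlite3
from decimal import Decimal


def total(conn, table, column):
    """Sum a column exactly by converting each stored text value to Decimal."""
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    return sum((Decimal(value) for (value,) in rows), Decimal("0"))


def reconcile_total(source_conn, target_conn, table, column):
    """Compare the column total on source and target and report the difference."""
    source_total = total(source_conn, table, column)
    target_total = total(target_conn, table, column)
    return {
        "source": source_total,
        "target": target_total,
        "difference": source_total - target_total,
    }


def build_sales_db(amounts):
    """Create a hypothetical sales table with revenue stored as text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (revenue TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(a,) for a in amounts])
    return conn


if __name__ == "__main__":
    source = build_sales_db(["10.10", "20.20"])
    target = build_sales_db(["10.10", "20.00"])
    print(reconcile_total(source, target, "sales", "revenue"))
```

A non-zero difference signals that the totals should be broken down further, for example by date or product, to locate the mismatching rows.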
Automated Data Reconciliation
For large warehouse systems, it is often convenient to automate the data reconciliation process by making it an integral part of data loading. This can be done by maintaining separate loading metadata tables and populating them with the results of reconciliation queries. The existing reporting architecture of the warehouse can then be used to generate and publish reconciliation reports at the end of the load. Such automated reconciliation keeps all stakeholders informed about the trustworthiness of the reports.
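The metadata-table approach can be sketched as below. The table name `load_reconciliation`, its columns, and the function names are hypothetical; a real warehouse would use its own schema and run the checks as the last step of each load job.

```python
import sqlite3


def create_reconciliation_table(conn):
    """Create the hypothetical metadata table that stores check results."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS load_reconciliation (
            load_id TEXT,
            check_name TEXT,
            source_value REAL,
            target_value REAL,
            status TEXT
        )
        """
    )


def record_check(conn, load_id, check_name, source_value, target_value):
    """Store one reconciliation result and return its PASS/FAIL status."""
    status = "PASS" if source_value == target_value else "FAIL"
    conn.execute(
        "INSERT INTO load_reconciliation VALUES (?, ?, ?, ?, ?)",
        (load_id, check_name, source_value, target_value, status),
    )
    return status


def failed_checks(conn, load_id):
    """List the names of failed checks for one load, for the report."""
    return conn.execute(
        "SELECT check_name FROM load_reconciliation "
        "WHERE load_id = ? AND status = 'FAIL' ORDER BY check_name",
        (load_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_reconciliation_table(conn)
    record_check(conn, "load-1", "customer_count", 100, 100)
    record_check(conn, "load-1", "total_revenue", 2500.0, 2400.0)
    print(failed_checks(conn, "load-1"))
```

Because the results live in an ordinary table, the warehouse's existing reporting layer can publish them like any other report.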
How to test bulk data?

Using automation tools.
Some information on performance tools and how the results are analyzed

Open source load testing tool: It is a Java platform application. It is mainly considered a performance testing tool, and it can also be integrated with the test plan. In addition to the load test plan, you can also create a functional test plan. This tool can put load on a server or network to check its performance and analyze how it works under different conditions. It is of great use in testing the functional performance of resources such as Servlets, Perl scripts, and Java objects.

Load and performance testing software: This is a tool used for measuring and analyzing the performance of a website. The performance and the end result can be evaluated with this tool, and any further steps can then be taken. This helps you improve and optimize the performance of your web application. The tool analyzes the performance of the web application by increasing the traffic to the website, so that its performance under heavy load can be determined. It is available in two languages, English and French.

One of the key attractive features of this testing tool is that it can create and handle thousands of users at the same time. It enables you to gather all the required information about performance and about the infrastructure. LoadRunner comprises different tools, namely Virtual User Generator, Controller, Load Generator, and Analysis.

Open source stress testing tool: This tool works effectively when it is integrated with the functional testing tool soapUI. It allows you to create, configure, and update your tests while the application is being tested. It also gives the user a visual aid with a drag-and-drop experience. This is not a static performance tool. The advanced analysis and report generation features allow you to examine the actual performance by pumping in new data even while the application is being tested. You need not restart LoadUI each time you modify or change the application; it is updated automatically in the interface.

Load testing and stress testing tool for web applications: To find the bottlenecks of a website, it is necessary to examine the pros and cons. There are many performance testing tools available for measuring the performance of a given web application. WebLoad is one such tool, used for load testing and stress testing. It can be used for load testing any internet application, such as Ajax, Adobe Flex, Oracle Forms, and much more. It is widely used in environments where there is a high demand for maximum load testing.
WAPT refers to the Web Application Performance Tool. These are scales or analysis tools for measuring the performance and output of any web service, web application, or other web interface. With this tool you have the advantage of testing web application performance in various environments and under different load conditions. WAPT provides its users with detailed information about the virtual users and their output during load testing. WAPT can test a web application for compatibility with browsers and operating systems. In certain cases it is also used for testing compatibility with Windows applications.

It is a desktop-based advanced HTTP load testing tool. The web browser can be used to record the scripts, which makes it easy to use. Using the GUI, you can modify the basic script with dynamic variables to validate responses. With control over network bandwidth, you can simulate a large virtual user base for your application stress tests. After the test is executed, an HTML report is generated for analysis.

It is a load testing tool that is mainly used for cloud-based services. It also helps in optimizing websites and improving the working of any web application. The tool generates traffic to the website by simulating users in order to find the stress and maximum load it can handle. LoadImpact comprises two main parts: the load testing tool and the page analyzer. The load testing can be divided into three types: Fixed, Ramp up, and Timeout. The page analyzer works like a browser and gives information about the working and statistics of the website. The credit for developing this load testing tool belongs to Gatorhole AB. It is a freemium service, which means that it can be obtained for free and is also available at a premium price.

It is an automated performance testing tool that can be used for a web application or a server-based application that involves a process of input and output. The tool creates a demo of the original transaction process between the user and the web service. At the end, all the statistical information is gathered and analyzed to increase efficiency. Any leakage in the website or the server can be identified and rectified immediately with the help of this tool. It can be the best option for building an effective and error-free cloud computing service.

It is an automated testing tool that can be employed for testing the performance of any website, web application, or other object. Many developers and testers use this tool to find bottlenecks in their web applications and rectify them accordingly. The tool comes with a built-in editor that allows users to edit the testing criteria according to their needs. The Testing Anywhere tool involves 5 simple steps to create a test: object recorder, advanced web recorder, SMART test recorder, image recognition, and editor with 385+ comments.
Thanks Prepared by Mr. Prashanth B S Software Testing – Corporate Trainer On behalf of ISQT InternationalISQT - Process & Consulting Services Private Limited 732, 1st Floor, 12th Main, 3rd Block, Rajajinagar, Bangalore - 560 010, INDIA Phone: + 91- 80 - 23012501-15 Fax: + 91 80 23142425 www.isqtinternational.com email: email@example.com