Topic: Ensure Accuracy in Data Transformation with Data Testing Framework (DTF)
Abstract:
Information stored in a data warehouse is critical to organizations for decision making and predictive analysis. The huge volume of data loaded into a data warehouse makes exhaustive manual comparison of data impractical. Existing quality tools are either manual or have other limitations, and do not cover all aspects of data warehouse testing. Therefore, a holistic solution is required to test high-volume applications that are built on Data Warehouse (DW) or Business Intelligence (BI) architecture.
Need for a Data Testing Framework
Testing plays a high-stakes role in helping businesses make insightful and intelligent decisions using available information. Given the growing complexity of the Data Warehousing and Business Intelligence space in the IT industry, L&T InfoTech has developed a cost-effective solution to address the following challenges faced by clients:
• Unavailability of comprehensive testing tools
• Varied skill sets required to understand various file formats
• Voluminous data from heterogeneous sources
• 100% data validation is not feasible
• Manual comparison of data is tedious and error-prone
Data Testing Framework (DTF) is a testing framework that adapts easily to users' needs for different types of data validation processes. It enables users to compare and validate data across various types of data sources and databases.
DTF Overview
DTF is an open source data validation and comparison framework that allows a user to perform data-centric testing. Its simple user interface (UI) enables users to easily configure the tool as per their testing needs. The framework also provides detailed results of the test cases, enabling faster analysis of test results.
What is DTF?
DTF has been developed by synthesizing years of experience in the database testing area. DTF can be used to compare data from two different data feeds after data migration or reconciliation. These source and target data feeds can be a database table, a database query, a flat file, a CSV or PSV (pipe-separated values) file, or an Excel file. DTF has a proven track record of comparing high volumes of data and supports leading databases in the market.
DTF can be configured to perform the following types of comparisons:
• File to File comparator
• File to Database table comparator
• Database to Database comparator
• Query output to File comparator
• Database Table comparator
• Database Table to Query output
• Database table to Fixed Length File Comparator
• Database table to XML comparator
• Database table to Stored Procedure output comparator
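The comparators above differ mainly in where the source and target rows come from. As a minimal sketch, assuming every feed can be exposed as a list of string columns, a hypothetical `DataFeed` interface with a delimited-file implementation might look like this. The names are illustrative and not part of DTF, and quoted CSV fields are not handled:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical abstraction: every source or target feed yields rows of string columns. */
interface DataFeed {
    List<String[]> rows();
}

/** Reads a delimited text feed, e.g. CSV (",") or PSV ("|"). Quoted fields are not handled. */
class DelimitedFeed implements DataFeed {
    private final Reader reader;
    private final String delimiter;
    private final boolean hasHeader;

    DelimitedFeed(Reader reader, String delimiter, boolean hasHeader) {
        this.reader = reader;
        this.delimiter = delimiter;
        this.hasHeader = hasHeader;
    }

    /** Reads all rows once; the underlying reader is closed afterwards. */
    @Override
    public List<String[]> rows() {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            boolean skipHeader = hasHeader;
            while ((line = in.readLine()) != null) {
                if (skipHeader) {          // drop the column-name line
                    skipHeader = false;
                    continue;
                }
                if (line.isEmpty()) continue;
                // Pattern.quote treats "|" literally; limit -1 keeps trailing empty columns
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

A database table or query feed could implement the same interface, for example by reading a JDBC `ResultSet`. Because `DataFeed` has a single method, an in-memory feed can also be written as a lambda, which keeps each comparator type independent of the comparison logic.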
DTF provides a user-friendly UI for testers from non-technical backgrounds and allows them to configure the tool to operate in different modes for different types of comparisons.
Execution Steps
Common test scenarios required for data conversion testing can be broadly classified into the following
categories:
• Table/schema validation (includes the verification of indexes, stored procedures and triggers)
• Count and data validation
• Data character set conversion
• File processing (in cases where the source is a file)
• Batch job and business rule validation
• Interface testing
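For the character set conversion scenario, one simple check is whether every source value can be represented in the target database's character set at all. The sketch below uses the standard `java.nio.charset.CharsetEncoder.canEncode` method; the class name `CharsetCheck` is hypothetical, and the check is only a pre-screen, not a substitute for comparing the converted data itself:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the "data character set conversion" scenario. */
class CharsetCheck {
    /** Returns the values that cannot be represented in the target character set. */
    static List<String> unencodable(List<String> values, String targetCharset) {
        CharsetEncoder encoder = Charset.forName(targetCharset).newEncoder();
        List<String> failures = new ArrayList<>();
        for (String value : values) {
            // canEncode resets the encoder, so it is safe to call once per value here
            if (!encoder.canEncode(value)) failures.add(value);
        }
        return failures;
    }
}
```

For example, checking the values `Mumbai`, `Zürich` and `東京` against `ISO-8859-1` flags only `東京`, because `ü` exists in Latin-1 but the CJK characters do not.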
Figure 1: DTF Process
The process for data testing using DTF is as follows:
1. Analyze – Study the data model of the source and the target databases to understand the
conversion process. If the source is a flat file, analyze the file’s structure and its mapping with
the target database.
2. Data Mapping – The mapping between the source and the target databases and tables needs to be configured in DTF. If there are no schema changes, mapping the source and target databases at the database level is enough. In some scenarios, however, the data of one source table is distributed across multiple target tables, or the data of multiple source tables is merged into one target table. In such cases, the mapping of the source and target tables must be configured in DTF at the column level.
3. Test Case Creation – Test cases for various data comparison and validation scenarios can be created in DTF using the configured data mappings. DTF also lets the user create test suites and execute multiple test cases in a single framework execution.
4. Execute & Report – A DTF test case or test suite can be executed with different run-time execution options, such as:
• Trim Data before Comparison
• Ignore case in Comparison
• Database Schema Comparison
• Full Database Comparison
Once the execution is complete, a detailed report is generated which gives the following details:
• Summary report
• Mismatched records
• Extra records in source
• Extra records in target
All reports are generated as spreadsheets, which are detailed and convenient to analyze.
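The comparison and report categories described in these steps can be illustrated with a small keyed comparison. This is a minimal sketch, assuming rows are already loaded into maps keyed by a primary key, with the remaining columns as a list. The class names are hypothetical, and the two flags stand in for the "Trim Data before Comparison" and "Ignore case in Comparison" options:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Hypothetical run-time options mirroring "Trim Data" and "Ignore case". */
class CompareOptions {
    final boolean trim;
    final boolean ignoreCase;

    CompareOptions(boolean trim, boolean ignoreCase) {
        this.trim = trim;
        this.ignoreCase = ignoreCase;
    }

    /** Applies the options to one column value before comparison; null stays null. */
    String normalize(String value) {
        if (value == null) return null;
        String v = trim ? value.trim() : value;
        return ignoreCase ? v.toUpperCase(Locale.ROOT) : v;
    }
}

/** Holds the keys for each report section: summary counts, mismatches and extras. */
class ComparisonResult {
    int matched;
    final List<String> mismatched = new ArrayList<>();
    final List<String> extraInSource = new ArrayList<>();
    final List<String> extraInTarget = new ArrayList<>();

    String summary() {
        return String.format("matched=%d, mismatched=%d, extraInSource=%d, extraInTarget=%d",
                matched, mismatched.size(), extraInSource.size(), extraInTarget.size());
    }
}

class KeyedComparator {
    /** Compares rows keyed by primary key; each value holds the non-key columns in order. */
    static ComparisonResult compare(Map<String, List<String>> source,
                                    Map<String, List<String>> target,
                                    CompareOptions options) {
        ComparisonResult result = new ComparisonResult();
        for (Map.Entry<String, List<String>> row : source.entrySet()) {
            String key = row.getKey();
            if (!target.containsKey(key)) {
                result.extraInSource.add(key);          // present only in source
            } else if (rowsEqual(row.getValue(), target.get(key), options)) {
                result.matched++;
            } else {
                result.mismatched.add(key);             // same key, different column values
            }
        }
        for (String key : target.keySet()) {
            if (!source.containsKey(key)) result.extraInTarget.add(key);  // present only in target
        }
        return result;
    }

    /** Column-by-column equality after applying the run-time options. */
    private static boolean rowsEqual(List<String> a, List<String> b, CompareOptions options) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!Objects.equals(options.normalize(a.get(i)), options.normalize(b.get(i)))) {
                return false;
            }
        }
        return true;
    }
}
```

Normalizing each column, rather than the whole row, means "Trim" also removes padding inside fixed-width values such as `"Asha "`. Each list in `ComparisonResult` maps directly to one report sheet.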
Building Blocks of DTF
Figure 2: DTF Building Blocks
DTF comprises the following three blocks:
• DTF Util Manager - DTF Util Manager is responsible for reading data from and writing data to files and databases, and for any data conversion required by internal DTF logic. It ensures that the source and target data are in the same format before the data goes to DTF Compare Engine. It implements the logic for all activities other than the actual data comparison and report generation.
• DTF Compare Engine - DTF Compare Engine is responsible for the actual comparison of source and target data. If the data volume is large, it divides the data into chunks of a predefined size and compares them chunk by chunk. Chunk formation and data comparison run in parallel to speed up the comparison. This engine passes the comparison execution results to DTF Report Manager.
• DTF Report Manager - DTF Report Manager is responsible for generating DTF reports from the comparison execution results received from DTF Compare Engine. It generates reports in Excel format, in two categories: summary reports and detailed reports. For every execution, it creates a folder named after the comparison execution time and stores that execution's reports in it.
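The chunked, parallel comparison and the per-execution report folder can be sketched together. Assuming the data is already in memory as maps keyed by primary key, the hypothetical `ChunkedCompareEngine` below splits the source keys into fixed-size chunks and checks each chunk on a thread pool, and the hypothetical `ReportWriter` creates a folder named after the execution time. It writes a CSV summary as a stand-in for DTF's Excel reports, and the folder-name pattern is an assumption (it avoids colons, which Windows does not allow in folder names):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedCompareEngine {
    /** Checks source keys chunk by chunk in parallel; returns keys that differ or are missing in target. */
    static List<String> findDifferences(Map<String, String> source, Map<String, String> target,
                                        int chunkSize, int threads) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be positive");
        List<String> keys = new ArrayList<>(source.keySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < keys.size(); start += chunkSize) {
                List<String> chunk = keys.subList(start, Math.min(start + chunkSize, keys.size()));
                futures.add(pool.submit(() -> {   // one Callable per chunk, read-only access to the maps
                    List<String> diffs = new ArrayList<>();
                    for (String key : chunk) {
                        if (!target.containsKey(key) || !Objects.equals(source.get(key), target.get(key))) {
                            diffs.add(key);
                        }
                    }
                    return diffs;
                }));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) all.addAll(f.get());  // keeps chunk order
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}

class ReportWriter {
    private static final DateTimeFormatter FOLDER_NAME = DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    /** Creates a folder named after the execution time and writes a one-line summary into it. */
    static Path writeSummary(Path reportRoot, LocalDateTime executionTime, List<String> differences) {
        Path folder = reportRoot.resolve(FOLDER_NAME.format(executionTime));
        try {
            Files.createDirectories(folder);
            Path summary = folder.resolve("summary.csv");
            String content = "differences," + differences.size() + System.lineSeparator();
            Files.write(summary, content.getBytes(StandardCharsets.UTF_8));
            return summary;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each chunk only reads the two maps, no locking is needed, and collecting the futures in submission order keeps the output in source-key order. Keys that exist only in the target are not covered by this pass and would need a separate scan of the target keys.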
In addition to the three primary blocks, DTF has the following building blocks, each of which represents a different data feed:
• Excel Files
• Flat Files
• Database Tables
• Database Query
The Excel Config File block represents the input configuration Excel files. Typically, a user lists the parameters for comparison between the source and the destination in these configuration files.
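DTF reads these parameters from Excel files; reading `.xlsx` in Java typically needs a third-party library such as Apache POI, so this sketch uses a `java.util.Properties` file as a stand-in. All key names (`source`, `target`, `key.column`, `option.*`, `map.*`) are hypothetical. The `map.*` entries show how the column-level mappings from the Data Mapping step could be listed alongside the other parameters:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical comparison parameters; the key names are illustrative, not DTF's own. */
class ComparisonConfig {
    final String source;
    final String target;
    final String keyColumn;
    final boolean trim;
    final boolean ignoreCase;
    /** Column-level mappings: source column -> "TARGET_TABLE.COLUMN". */
    final Map<String, String> columnMappings;

    private ComparisonConfig(Properties p) {
        source = required(p, "source");
        target = required(p, "target");
        keyColumn = required(p, "key.column");
        trim = Boolean.parseBoolean(p.getProperty("option.trim", "false").trim());
        ignoreCase = Boolean.parseBoolean(p.getProperty("option.ignoreCase", "false").trim());
        Map<String, String> mappings = new TreeMap<>();  // sorted for stable iteration
        for (String name : p.stringPropertyNames()) {
            if (name.startsWith("map.")) {
                mappings.put(name.substring("map.".length()), p.getProperty(name).trim());
            }
        }
        columnMappings = Collections.unmodifiableMap(mappings);
    }

    /** Parses key=value lines; fails fast if a required parameter is missing. */
    static ComparisonConfig load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ComparisonConfig(p);
    }

    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing parameter: " + key);
        }
        return value.trim();
    }
}
```

Validating required parameters when the configuration is loaded, rather than during comparison, surfaces setup mistakes before any data is read.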
The DTF Report block represents the DTF summary and detailed reports generated after comparison execution.
Software Requirements
• JRE 1.6
• Microsoft Office
• Windows Operating System
Hardware Requirements
• 1 GB RAM or greater
• 3 GHz CPU
Benefits offered by DTF
• DTF is a very cost-effective solution, as it is developed using open source tools.
• Detailed reports help in identifying problems.
• Reduction in test execution effort.
• Reusability of the framework across different Data Warehousing projects.
• Less maintenance because of the modular structure of the framework.
• Ability to work with different types of data feeds.
• Easier result analysis through Excel sheets.
Differentiators
• Simple test script creation and execution
• Tester productivity increased with improved quality of testing
• Cost savings of 30%
• Compressed testing cycle
Conclusion
DTF, the open source technology-based framework that supports all databases currently available in the market, creates detailed reports that help organizations identify defects and take corrective actions based on these inputs. Enterprises are thus able to achieve cost and effort savings with enhanced test coverage through automation. Accurate, real-time information is readily available to help in making informed decisions.