TEST DATA MANAGEMENT
The need for continuous testing and integration is well acknowledged across the
industry today as a prerequisite for fully embracing the agile methodology. This
requires a complete shift to a highly dynamic and flexible development and
testing process, and access to quality test data is key to its success.
Success factors also include comprehensive test coverage, leading to early detection of defects. A
strong test data strategy is needed to overcome challenges such as:
- Lack of specific data sets to test with.
- Not knowing where to look for the data, or not having appropriate access to it.
- Effort wasted in coordination, and operational inefficiencies.
Introduction
Managing test data across multiple (non-production) environments is essential to enhance the quality of
testing and optimize effort in the following ways:
Functional Testing:
An effective (positive/negative) functional test with appropriate test data helps in:
- Finding defects early.
- Focusing on functional and regression tests rather than on the steps required to reach the desired
test state.
Performance Testing:
For applications where big data is involved and performance is paramount, a robust, automated
test data strategy is required. For sustained performance tests, test data must cover:
- Stability
- Load
- Baseline
Services Virtualization:
- Realistic test data is required to simulate live service behavior in an integrated fashion.
- A subset of production data helps in emulating end-user behavior during beta releases/UAT.
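As an illustration of service virtualization, a minimal stub can return canned but realistic payloads so consumers can be tested before the live service exists. This is a sketch only; the endpoint name, account id, and payload shape below are invented for illustration:

```python
import json

# Hypothetical canned responses keyed by (endpoint, account id).
# These names and shapes are illustrative, not from a real system.
CANNED_RESPONSES = {
    ("/holdings", "ACC-001"): {
        "accountId": "ACC-001",
        "positions": [{"symbol": "AAPL", "quantity": 100, "marketValue": 19050.0}],
    },
}

def stub_service(endpoint, account_id):
    """Simulate live service behavior with realistic canned data.

    Returns a 200-style payload for known inputs and a 404-style
    error payload otherwise, mimicking the real service's contract."""
    payload = CANNED_RESPONSES.get((endpoint, account_id))
    if payload is None:
        return {"status": 404, "error": "unknown account"}
    return {"status": 200, "body": payload}

print(json.dumps(stub_service("/holdings", "ACC-001"), indent=2))
```

Because the stub honors the same request/response shape as the live service, integration consumers built against it need no changes when the real service arrives.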
Essential Steps for Streamlined Test Data Management
Data Requirement Analysis:
Test data is predominantly created based on the test requirements. However, a complete analysis of the
data must consider the following:
- Systems: The systems involved in all of the testing phases.
- Formats: The format of data needed by different systems (normalized, raw, JSON, XLS,
etc.) or by different testing requirements (negative, positive, boundary values, etc.).
- Rules: Different rules may apply to data at different stages of testing, or depending on the location or type of
data. For example, a service test may require data in raw format, whereas a system
test may require it in a normalized format.
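To make the Formats and Rules points concrete, here is a minimal sketch (the field names are invented for illustration) of reshaping one raw JSON payload into the normalized rows a system test might expect:

```python
import json

# Hypothetical raw service payload; field names are illustrative only.
raw = json.loads("""
{"account": "ACC-001",
 "transactions": [
   {"id": "T1", "amount": 250.0},
   {"id": "T2", "amount": -75.5}
 ]}
""")

def normalize(payload):
    """Flatten the nested raw payload into normalized rows,
    one row per transaction, as a system test might require."""
    return [
        {"account": payload["account"], "txn_id": t["id"], "amount": t["amount"]}
        for t in payload["transactions"]
    ]

rows = normalize(raw)
print(rows)
```

The same source data thus serves the service test (raw JSON) and the system test (normalized rows) without maintaining two separate data sets.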
Data Setup/Provisioning:
There are different approaches for creating realistic, referentially intact test data.
- Subset of production data: This kind of data set is the most accurate and can be created
without adding a lot of administrative cost or challenge. These data sets are small enough to
accommodate model changes but large enough to simulate production-like behavior. The only
drawback of this approach arises when sensitive data, such as personal customer information or encrypted
data, is involved.
- Automated data creation: In the absence of production data, automated data generation jobs
can be created that produce a large data set for both
functional and non-functional testing. This data set is deliberately created to force error and boundary
conditions.
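A minimal sketch of such an automated generation job, assuming a single numeric field with invented bounds, might deliberately mix boundary and error values into the bulk data:

```python
import random

def generate_test_amounts(n, low=0.01, high=1_000_000.0, seed=42):
    """Generate a data set that deliberately forces boundary and error
    conditions alongside n random in-range values.

    The bounds and seed are illustrative; a real job would derive them
    from the application's validation rules."""
    rng = random.Random(seed)          # seeded for reproducible test runs
    boundary = [low, high]             # exact boundary values
    errors = [-1.0, 0.0, high + 1]     # invalid / out-of-range inputs
    bulk = [round(rng.uniform(low, high), 2) for _ in range(n)]
    return boundary + errors + bulk

data = generate_test_amounts(5)
print(len(data), data[:5])
```

Seeding the generator keeps runs reproducible, so a defect found with one data set can be reproduced exactly in the next cycle.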
Data Restrictions:
Data restrictions can stem from regulations, compliance, or sensitive client/customer data. Capabilities
must be developed to mask such confidential data while preserving a realistic look and feel. For example, in a
cloud-based testing model, sensitive client information must not be shared.
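One common masking approach (a sketch only; the field names and hashing scheme are illustrative) replaces sensitive values with deterministic pseudonyms, so the data still looks realistic and the same real value always maps to the same masked value across tables:

```python
import hashlib

def mask_record(record, sensitive_fields=("name", "ssn", "email")):
    """Replace sensitive values with deterministic pseudonyms.

    Hashing (rather than random substitution) keeps the masking
    referentially consistent: the same input always yields the same
    pseudonym, so joins across masked tables still work."""
    masked = dict(record)
    for field in sensitive_fields:
        if field in masked:
            digest = hashlib.sha256(str(masked[field]).encode()).hexdigest()[:8]
            masked[field] = f"{field}_{digest}"
    return masked

client = {"id": 42, "name": "Jane Doe", "ssn": "123-45-6789", "balance": 1000.0}
print(mask_record(client))
```

Non-sensitive fields (ids, balances) pass through untouched, so tests exercising business logic behave exactly as they would on real data.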
Test Data Administration:
- Golden copy: Creating a new copy of test data for each phase of testing or release consumes a
lot of effort and may yield inconsistent results. Hence it is always a good idea to create
a golden copy of reference data and provision a copy or subset of it depending on the test
requirements.
- Maintenance: Data maintenance is a necessity at periodic intervals. It is required due to
application design changes, data model changes, or to plug gaps identified in earlier test
cycles.
- Data refresh: This is often required to reset the data source of a test environment for
multiple rounds of testing, since test data may be altered or exhausted during testing.
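The golden-copy and refresh ideas can be sketched with SQLite file copies (the schema, table, and environment names here are invented for illustration): each test cycle works on a throwaway copy of the golden data, and a refresh simply re-provisions that copy:

```python
import os
import shutil
import sqlite3
import tempfile

def build_golden_copy(path):
    """Create the golden (reference) copy of test data once."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
    con.commit()
    con.close()

def provision(golden_path, env_name):
    """Hand a test environment its own disposable copy of the golden data."""
    env_path = os.path.join(tempfile.gettempdir(), f"tdm_{env_name}.db")
    shutil.copyfile(golden_path, env_path)
    return env_path

golden = os.path.join(tempfile.gettempdir(), "tdm_golden.db")
if os.path.exists(golden):
    os.remove(golden)
build_golden_copy(golden)

env = provision(golden, "sit")
con = sqlite3.connect(env)
con.execute("UPDATE accounts SET balance = 0")  # a test cycle mutates its copy
con.commit()
con.close()

env = provision(golden, "sit")  # data refresh: re-copy from the golden copy
con = sqlite3.connect(env)
total = con.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
con.close()
print(total)  # 350.0
```

The golden copy itself is never mutated, so a refresh is cheap and every cycle starts from a known-good state.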
In a nutshell, effective test data management is critical for the successful validation of any
application. It is achieved via a well-defined process for data creation and usage, along with appropriate
use of tools for comprehensive test coverage.
Case Study:
Project:
Development of a strategic platform that caters to institutional clients of a major investment bank. The
requirements included providing clients a real-time view of Holding, Performance, Reference, Risk,
and Transaction data, with very specific visual requirements on how the data is to be shown to the end
user.
Challenges:
- Multiple data sources providing this data in different formats.
- A multi-tiered, service-oriented architecture.
- The data had both compliance and organizational restrictions, as it was sensitive real
client data.
- The data had to be transformed from its raw form to meet the visual requirements.
Objective:
Data integrity must be maintained at all costs. Response time of the application is expected to be below
3 seconds irrespective of the data being shown to the client.
Strategy and Solution:

Phase: Independent Services Testing
- Approach: Automated stubs for data creation, to verify the API signature in request and response.
- Pros: Early detection of issues with service responses.
- Cons: Limited data set availability did not allow comprehensive testing, leading to rework in later stages.

Phase: Integration Tests
- Approach: Automated jobs for production-quality data creation, plus automated tests to validate expected versus actual results vis-a-vis requirements.
- Pros: Not only were integration defects detected, but end-user behavior could also be simulated to test load on the application.
- Cons: None noted.

Phase: System Tests
- Approach: A subset of production data was taken to create the test bed.
- Pros: Tests with data variations yielded edge-scenario defects. Data issues at source were found and fixed in the source systems, ensuring a smooth UAT.
- Cons: Cost of testing was high, as resources were spent to ensure no data leak/breach. Coordination with source data teams and controllers was required.

Phase: UAT
- Approach: Testing with actual production data.
- Pros: Actual production data usage ensured data testing during UAT was successful, and simulated a beta release to production.
- Cons: None noted.
Conclusion:
- Automated data creation supported multiple rounds of regression tests, ensuring a robust
application was delivered to the next phase with minimal issues.
- Usage of production data helped simulate end-user behavior of the application and weed out
issues that could have caused high impact.
- Automated tests enabled large data set validations, covering thousands of data rows across hundreds
of accounts, quickly and repeatedly.
- Source data issues were found and fixed.
- With the application data being client-specific and sensitive, it was paramount to ensure data integrity;
usage of actual production data helped simulate a beta release to production in UAT itself.
- An effective test data management strategy ensured all compliance and organizational processes
were adhered to.
- Planning periodic data refreshes in different environments for robust data testing helped in on-time,
quality delivery.
About Rohit:
A thought leader, strategist, and quality professional based in India, Rohit is currently
working for Sapient Ltd as Manager, Quality.
Email: Rohit.aries@Gmail.com