2. Data Migration Testing: Agenda
The key to migration success
… test, test, test, and test again!
Where
… does it fit in the PDM v2 landscape?
When
… does the testing of data start during a migration project?
… and when does it end?
What
… is the scope of the data to be tested?
… is the grey area?
… do we need?
How
… do we test for completeness?
… do we test for accuracy?
Questions
3. Test, test, test, and test again!
(and repeat if necessary)
Data Migration Testing: The key to
Migration Success
4. Test, test, test, and test again!
Independent of the approach used in the migration (big bang or staggered, waterfall or agile),
sufficient test cycles should always be part of the overall project plan
Best practice (based on experience, but flexible based on budget and timelines):
At least 3 full test cycles should be scheduled, with partial cycles in between (if needed)
• Full cycles are used to test the end-to-end readiness of the target system(s)
• Partial cycles are used to (iteratively) fix problems found in the User Acceptance Testing
• Partial cycles can be as small as a single object, or can be as big as a 90% redo of the whole migration
• If more than 90% needs to be fixed after a UAT cycle, something is terribly wrong with the whole project,
and it might be better to start from scratch…
Use of actual target platform is recommended for all full test cycles
• There is no point in testing a full migration on scaled-down versions of the source or target platforms, as it will
completely distort any estimation of the cut-over window
A complete dry-run of the end-to-end migration should preferably be scheduled as the last full
test cycle
• Used as a dress-rehearsal for the final migration, and used to fine-tune the cut-over window
Every iteration should result in a more complete picture
Target will be evolving, based on implementation changes, and thus data migration will grow!
If in doubt of completeness or accuracy, beg for project extension to fit in another test cycle,
rather than ending up with a failed project!
5. Where does testing fit into the PDM v2 landscape?
Data Migration Testing: Where?
6. Where does testing fit into the PDM v2 landscape?
Traditionally, the testing of migrated data is limited to a subset of the checks inside the actual migration tool
But it should be much broader: testing should be part of every activity that actually touches data!
[Diagram: the PDM v2 landscape, split into business engagement and technical tracks. Modules: Landscape
analysis (LA), Gap analysis and mapping (GAM), Migration design and execution (MDE), Data quality rules
(DQR), Legacy decommissioning (LD), Key data stakeholder management (KDSM), System retirement plan
(SRP), Migration strategy and governance (MSG); supporting tooling: profiling tool, data quality tool,
migration controller, DMZ]
7. When
… does the testing of data start during a migration project?
… and when does it end?
Data Migration Testing: When?
8. When does the testing of data start during a migration project?
Simple answer: Right at the start!
Planning phase:
Starting with the planning phase of the project, allow for a parallel stream of testing
Discovery phase:
During discovery, get indicative counts from the business
• How many customers do you have?
• How many products do you manufacture?
Analysis phase:
During the initial data analysis, compare the indicative counts with what is found in the source
system(s), and report back
• Found 1,200 customer records, but you said you have only 1,000 active customers? What do we do with
the rest? Archive/discard/reactivate?
• Found 5,000 products, but you stated that you manufacture around 2,000? Does the product master we
have analysed contain sub-assemblies, and if so, how would we identify those?
Refine the counts based on the findings
Extract phase:
During the development of the ETL, test the volumes of extracted records against the refined
counts
• Are we missing records? Is this due to incorrect selection conditions, or missing data (caused by inner
joins instead of outer joins in combined extraction queries)?
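The inner-versus-outer-join pitfall above can be shown in a few lines. This is an illustrative sketch using SQLite with made-up table names (customers, addresses); the point is only that an inner join in a combined extraction query silently drops records that the count reconciliation would have caught.

```python
import sqlite3

# Toy source system: one customer has no address record.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO addresses VALUES (1, 'London'), (2, 'Paris');
""")

# Combined extraction with an inner join: drops the address-less customer.
inner = con.execute("""
    SELECT COUNT(*) FROM customers c
    JOIN addresses a ON a.customer_id = c.id
""").fetchone()[0]

# Same extraction with an outer join: every customer survives.
outer = con.execute("""
    SELECT COUNT(*) FROM customers c
    LEFT JOIN addresses a ON a.customer_id = c.id
""").fetchone()[0]

print(inner)  # 2 -- one record silently missing
print(outer)  # 3 -- matches the refined business count
```

Comparing both counts against the refined indicative count (3 customers) immediately flags the inner-join version as losing data.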
9. When does testing end?
Traditional answer: Right at the end!
Testing should be the final step before the target system(s) are handed over for User
Acceptance Testing
During the test cycles, all tests should be completed before handing over for (partial) UAT
During the final migration:
• Detailed comparison of all business-critical data, like product catalogues, financials and manufacturing
parameters needed for day-to-day operations, must be completed
• Detailed comparison of historical data, if it was migrated, can be deferred, as long as it is completed before the
first reporting runs that rely on this data
• Best practice suggests a 90% to 95% faultless data comparison is acceptable for sign-off
o Depends on the number of test cycles used during the project, and the success achieved during those
cycles
o Test cycles are primarily used to build confidence in the migration approach, and to (partially) test the
target system(s)
Correct answer: Testing should never stop!
In all migration projects, the test tools and the completeness of the test suite are a perfect starting
point for ongoing data management, especially in a multi-system environment
It would be bad business practice to ignore this fact and allow the test suite to be
dismantled with the rest of the migration project
10. What
… is the scope of the data to be tested?
… is the grey area?
… do we need?
Data Migration Testing: What?
11. What is the scope of the data to be tested?
Simple answer: Every bit of data is in scope!
Normally, the business will only mention the source system(s) that are to be replaced and
decommissioned
This is the obvious data that has to be tested, but…
Integrated systems: will the interfaces still work as expected?
• When enriching/transforming data during migration, does the interface still give the target the exact data it
needs to understand and handle the incoming messages?
• Are the data formats extracted from the new systems the same as what is expected? Maybe longer or
shorter string values?
• One format very often overlooked: dates! Are they in the same format? Are they expected to be in UTC? Or
local time zone?
Online reporting tools (BI): will the data from the new system(s) flow into the cubes without
problems?
• New conversion layer might be necessary, or transformation of the current BI platform to allow seamless
continuation of the existing cubes and reports
Offline reporting tools: will these still be fed in a correct manner?
• Very often Excel reports are linked to the underlying database using ODBC queries, e.g. PowerPivot or
simply Excel tables
• Data sources need to be changed, and very often the queries adapted to produce the same output
• In an ideal world, these spreadsheets will be replaced, but we don’t live in an ideal world!
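The date-format problem called out above is worth a concrete illustration. The sketch below (standard-library only, with an assumed source time zone of UTC+2) shows how the same instant serialised as a local `DD/MM/YYYY` string and as a UTC ISO 8601 string will never match byte-for-byte, which is exactly what an interface or comparison engine expecting one format will trip over.

```python
from datetime import datetime, timezone, timedelta

# Assumed source time zone for illustration (e.g. a UTC+2 locale).
local_tz = timezone(timedelta(hours=2))
local_ts = datetime(2024, 3, 1, 14, 30, tzinfo=local_tz)

# How a legacy source might export the timestamp (local time, day-first).
source_repr = local_ts.strftime("%d/%m/%Y %H:%M")

# How a modern target/interface might expect it (UTC, ISO 8601).
target_repr = local_ts.astimezone(timezone.utc).isoformat()

print(source_repr)  # 01/03/2024 14:30
print(target_repr)  # 2024-03-01T12:30:00+00:00

# Same instant in time, completely different bytes -- a naive
# byte-for-byte comparison (or a strict interface) will reject it.
assert source_repr != target_repr
```

The transformation layer therefore has to normalise every date to one agreed representation before comparison or hand-over.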
12. What is the grey area?
Historical data
If historical data is going to be archived as part of the migration, full-blown data
testing of it is very often considered out of scope
However, the following tests are still required:
• Rudimentary tests like record and object counts
• Simple verification that archived historical data is accessible and displays in a valid manner
• Accessibility of historical data by the user groups who need access
Offline reporting tools
Many migrations include the aim to get rid of the myriad of spreadsheets used in the business
to manipulate and generate data reporting
• A typical scenario, found at roughly 80% of businesses, is that Excel is the final reporting tool for execs and
the board
However, the following is still required:
• 100% certainty that these reports have something to replace them
• 100% certainty that none of these spreadsheets contain any data not present in the target platform
• 100% certainty that the business does not rely on any of these reports
If 100% certainty cannot be achieved, these offline reports immediately become part of the overall
migration scope, and need to be addressed, fixed and tested!
13. What do we need?
Assumptions of requirements:
Migration Aim 1: non-disruptive to normal business
• Normal business users should ideally not be (physically) aware that a migration is taking place, i.e. no dips
in performance, no interruptions, no extra work
• To achieve this, source system(s) should be cloned to (temporary) systems where migration activities can
be executed
Migration Aim 2: target(s) should be flawless after the migration
• All possible scenarios should be tested, including destructive testing and fail-over
• This is more a task for the systems team, so, to avoid interfering with migration/implementation/user
acceptance testing tasks, the target system(s) should be cloned
Migration Aim 3: target(s) must be functionally operable without workarounds
• User acceptance testing should include any and all possible operations, reports and interfacing between
systems
• Data integrity and quality should be of such a standard that there are no hiccups when going live
Migration Aim 4: data testing should ideally be independent from the migration builders
• Independent verification of data transformations, enrichments and manipulations
But… budget available will be the main decision factor!
So, what should the ideal landscape look like?
14. Building the ideal migration landscape
[Diagram: the ideal migration landscape — the source system(s) and target system(s) are each cloned;
the source clone(s) feed a staging area for data manipulation, with a separate testing environment
alongside it for independent verification]
15. How
… do we test data for completeness?
… do we test data for accuracy?
Data Migration Testing: How?
16. How do we test data for completeness?
Technical testing
Do we know where each record from the source has gone?
• End game is decommissioning, so each bit of data must be accounted for
• Whether migrating, archiving or truncating old data, each record must have a target, even if the target is
the bin!
Of the data that ends up in the target(s), do we have the same or an equivalent number of records as
those earmarked for migration in the source(s)?
• DQ rules could have combined records from the source(s) on the target(s)
• Enrichment could have caused extra data to be added to the target(s)
• Summarisation of e.g. historical sales orders could have caused one record in the target, and multiple in the
archiving system
• All these transformations have to be built into the reconciliation engine, and in the end, the count of the
source(s) must match the count of the target(s)
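The reconciliation logic described above reduces to simple arithmetic: every source record must land in the target, the archive, or the bin, after allowing for known transformations such as DQ de-duplication. The sketch below is illustrative only; the function name and all figures are invented for the example.

```python
def reconcile(source_count, target_count, archived, discarded, merged_away):
    """Return True if every source record is accounted for.

    merged_away: records removed because DQ rules combined duplicates,
    so they are legitimately absent from the target.
    """
    accounted = target_count + archived + discarded + merged_away
    return accounted == source_count

# Example: 1,200 source customers -- 1,000 migrated, 150 archived,
# 30 discarded, 20 merged into other records by DQ de-duplication.
ok = reconcile(source_count=1200, target_count=1000,
               archived=150, discarded=30, merged_away=20)
print(ok)  # True -- the counts balance; False would mean lost records
```

A real reconciliation engine applies this per object and per transformation rule, but the invariant is the same: the adjusted counts must balance exactly before sign-off.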
Functional testing (during UAT)
Can the users do their normal day-to-day work?
• No manual intervention or workarounds needed
Reports from source(s) and target(s) have the same values?
Are all interfaces working as expected? (probably the most important test!)
• Do all integrated systems respond in the expected manner?
• Are the results from these systems fit for purpose on the target system(s)?
17. How do we test data for accuracy?
Read data from source
Read data from target
Add transformations and/or enrichments
Apply the same actions to source data, or
Reverse apply the actions to target data
Compare data on byte-for-byte level:
(e.g. simplified data set with 3 resultant columns)
select c1, c2, c3, sum(chk) from (
select s.c1, s.c2, s.c3, -1 as chk from source s
union all
select t.c1, t.c2, t.c3, 1 as chk from target t
) u
group by c1, c2, c3
having sum(chk) <> 0
Any results? Error in migration!
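The checksum comparison above can be exercised end-to-end against toy tables. The sketch below runs the same union-all query in SQLite with invented data, where one `c3` value has been corrupted during the "migration"; each surviving row pinpoints a record present on only one side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source (c1 TEXT, c2 TEXT, c3 INTEGER);
    CREATE TABLE target (c1 TEXT, c2 TEXT, c3 INTEGER);
    INSERT INTO source VALUES ('A', 'x', 1), ('B', 'y', 2);
    INSERT INTO target VALUES ('A', 'x', 1), ('B', 'y', 99);
""")

# The +1/-1 checksum comparison from the slide: identical rows cancel
# out to 0 and vanish; any surviving row is a migration error.
diffs = con.execute("""
    SELECT c1, c2, c3, SUM(chk) FROM (
        SELECT c1, c2, c3, -1 AS chk FROM source
        UNION ALL
        SELECT c1, c2, c3, 1 AS chk FROM target
    ) u
    GROUP BY c1, c2, c3
    HAVING SUM(chk) <> 0
""").fetchall()

for row in sorted(diffs):
    print(row)
# ('B', 'y', 2, -1)   -> present in source only
# ('B', 'y', 99, 1)   -> present in target only
```

Note that duplicate rows on one side also surface here (a non-zero sum), which is exactly the behaviour wanted for a byte-for-byte reconciliation.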
[Diagram: source(s) and target(s) feeding the staging and testing environments where the comparison is executed]
A lot of work to build, extremely complex, and a lot of time is needed for the comparison, but…
accuracy is 100% guaranteed!
19. Thank you for your attention!
Data Migration Testing: The End