© 2016, Data Maven Limited
Data Migration Testing
The most important part of a migration project
Data Migration Testing: Agenda
The key to migration success
… test, test, test, and test again!
Where
… does it fit in the PDM v2 landscape?
When
… does the testing of data start during a migration project?
… and when does it end?
What
… is the scope of the data to be tested?
… is the grey area?
… do we need?
How
… do we test for completeness?
… do we test for accuracy?
Questions
Test, test, test, and test again!
(and repeat if necessary)
Data Migration Testing: The Key to Migration Success
Test, test, test, and test again!
 Independent of the approach used in the migration (big bang or staggered, waterfall or agile), sufficient test cycles should always be part of the overall project plan
 Best practice (based on experience, but flexible based on budget and timelines):
 At least 3 full test cycles should be scheduled, with partial cycles in between (if needed)
• Full cycles are used to test the end-to-end readiness of the target system(s)
• Partial cycles are used to (iteratively) fix problems found in the User Acceptance Testing
• Partial cycles can be as small as a single object, or can be as big as a 90% redo of the whole migration
• If more than 90% needs to be fixed after a UAT cycle, something is terribly wrong with the whole project,
and it might be better to start from scratch…
 Use of actual target platform is recommended for all full test cycles
• There is no point testing a full migration on scaled-down versions of the source or target platforms, as it will
completely distort any estimation of the cut-over window
 A complete dry-run of the end-to-end migration should preferably be scheduled as the last full
test cycle
• Used as a dress-rehearsal for the final migration, and used to fine-tune the cut-over window
 Every iteration should result in a more complete picture
 Target will be evolving, based on implementation changes, and thus data migration will grow!
 If in doubt about completeness or accuracy, beg for a project extension to fit in another test cycle,
rather than ending up with a failed project!
Where does testing fit into the PDM v2 landscape?
Data Migration Testing: Where?
Where does testing fit into the PDM v2 landscape?
 Traditionally, the testing of migrated data is limited to being a subset of the actual migration tooling
 But it should be much broader: it should be part of every portion of the landscape that actually touches data!
[Diagram: the PDM v2 landscape. Migration strategy and governance (MSG) spans the business engagement and technical streams: Landscape analysis (LA), Gap analysis and mapping (GAM), Migration design and execution (MDE), Data quality rules (DQR), Legacy decommissioning (LD), Key data stakeholder management (KDSM) and System retirement plan (SRP); supported by a profiling tool, a data quality tool and a migration controller within a DMZ]
When
… does the testing of data start during a migration project?
… and when does it end?
Data Migration Testing: When?
When does the testing of data start during a migration project?
 Simple answer: Right at the start!
 Planning phase:
 Starting with the planning phase of the project, allow for a parallel stream of testing
 Discovery phase:
 During discovery, get indicative counts from the business
• How many customers do you have?
• How many products do you manufacture?
 Analysis phase:
 During the initial data analysis, compare the indicative counts with what is found in the source
system(s), and report back
• Found 1,200 customer records, but you said you have only 1,000 active customers? What do we do with
the rest? Archive/discard/reactivate?
• Found 5,000 products, but you stated that you manufacture around 2,000? Does the product master we
have analysed contain sub-assemblies, and if so, how would we identify those?
 Refine the counts based on the findings
 Extract phase:
 During the development of the ETL, test the volumes of extracted records against the refined
counts
• Are we missing records? Is this due to incorrect selection conditions, or missing data (e.g. inner
joins instead of outer joins used in combined extraction queries)? See the sketch below
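A minimal sketch of these two count checks, assuming hypothetical src_customer and src_address tables and PostgreSQL-flavoured SQL (all table, column and status names are illustrative, not from the project):
-- 1. Analysis phase: compare the indicative business count with the source
select count(*) as total_customers,
       count(*) filter (where status = 'ACTIVE') as active_customers
from src_customer;  -- business said 1,000 active customers; report back any gap
-- 2. Extract phase: an inner join silently drops customers without an
--    address row; a left (outer) join keeps them so the gap stays visible
select count(*) as extracted_rows
from src_customer c
left join src_address a on a.customer_id = c.customer_id;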
When does testing end?
 Traditional answer: Right at the end!
 Testing should be the final step before the target system(s) are handed over for User
Acceptance Testing
 During the test cycles, all tests should be completed before handing over for (partial) UAT
 During the final migration:
• Detailed comparison of all business-critical data, like product catalogues, financials and manufacturing
parameters needed for day-to-day operations, must be completed
• Detailed comparison of historical data, if it was migrated, can lag behind, as long as it is completed before the
first reporting runs that rely on this data
• Best practice suggests a 90% to 95% faultless data comparison is acceptable for sign-off
o Depends on the number of test cycles used during the project, and the success achieved during those
cycles
o Test cycles are primarily used to build confidence in the migration approach, and to (partially) test the
target system(s)
 Correct answer: Testing should never stop!
 In all migration projects, the test tools and the completeness of the test suite are a perfect starting
point for ongoing data management, especially in a multi-system environment
 It would be bad business practice not to exploit this, and to allow the test suite to be
dismantled along with the rest of the migration project
What
… is the scope of the data to be tested?
… is the grey area?
… do we need?
Data Migration Testing: What?
What is the scope of the data to be tested?
 Simple answer: Every bit of data is in scope!
 Normally, the business will only mention the source system(s) that are to be replaced and
decommissioned
 This is the obvious data that has to be tested, but…
 Integrated systems: will the interfaces still work as expected?
• When enriching/transforming data during migration, does the interface still give the target the exact data it
needs to understand and handle the incoming messages?
• Are the data formats extracted from the new systems the same as what is expected? Maybe longer or
shorter string values?
• One format very often overlooked: dates! Are they in the same format? Are they expected to be in UTC, or
in the local time zone? (see the profiling sketch at the end of this slide)
 Online reporting tools (BI): will the data from the new system(s) flow into the cubes without
problems?
• New conversion layer might be necessary, or transformation of the current BI platform to allow seamless
continuation of the existing cubes and reports
 Offline reporting tools: will these still be fed in a correct manner?
• Very often Excel reports are linked to the underlying database using ODBC queries, e.g. PowerPivot or
simply Excel tables
• Data sources need to be changed, and very often the queries adapted to produce the same output
• In an ideal world, these spreadsheets will be replaced, but we don’t live in an ideal world!
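A quick profiling sketch for the format pitfalls above, assuming a hypothetical extract table ext_orders, a hypothetical target table tgt_orders, and PostgreSQL syntax (the column names, the regex and the 'Europe/London' zone are all illustrative assumptions):
-- Will the longest extracted value still fit the interface field width?
select max(length(customer_name)) as max_name_len from ext_orders;
-- Dates held as text: do they all match the expected ISO format?
select count(*) as bad_dates
from ext_orders
where order_date_txt !~ '^\d{4}-\d{2}-\d{2}$';
-- Source holds local wall-clock time, interface expects UTC:
-- render both as the same instant and count the mismatches
select count(*) as mismatched_timestamps
from ext_orders e
join tgt_orders t on t.order_id = e.order_id
where (e.order_ts at time zone 'Europe/London') at time zone 'UTC'
      <> t.order_ts_utc;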
What is the grey area?
 Historical data
 If historical data is going to be archived as part of the migration, full-blown testing of that data
is very often considered out of scope
 However, the following tests are still required:
• Rudimentary tests like record and object counts (see the count sketch at the end of this slide)
• Simple verification that archived historical data is accessible and displays in a valid manner
• Accessibility of historical data by the user groups who need access
 Offline reporting tools
 Many migrations include the aim to get rid of the myriad of spreadsheets used in the business
to manipulate and generate data reporting
• A typical scenario, found at roughly 80% of businesses, is that Excel is the final reporting tool for execs and
the board
 However, the following is still required:
• 100% certainty that these reports have something to replace them
• 100% certainty that none of these spreadsheets contain any data not present in the target platform
• 100% certainty that the business does not rely on any of these reports
 If 100% certainty cannot be achieved, these offline reports immediately become part of the overall
migration scope, and need to be addressed, fixed and tested!
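The rudimentary archive tests can be as simple as a per-object count comparison; a sketch assuming hypothetical src_history and archive_history tables (PostgreSQL-flavoured, names illustrative):
-- Record counts per object type must agree between source and archive;
-- the full outer join also catches object types missing on either side
select coalesce(s.object_type, a.object_type) as object_type,
       s.cnt as source_cnt, a.cnt as archive_cnt
from (select object_type, count(*) as cnt from src_history group by object_type) s
full outer join
     (select object_type, count(*) as cnt from archive_history group by object_type) a
  on a.object_type = s.object_type
where a.cnt is distinct from s.cnt;  -- any row returned is a discrepancy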
What do we need?
 Assumptions of requirements:
 Migration Aim 1: non-disruptive to normal business
• Normal business users should ideally not be (physically) aware that a migration is taking place, i.e. no dips
in performance, no interruptions, no extra work
• To achieve this, source system(s) should be cloned to (temporary) systems where migration activities can
be executed
 Migration Aim 2: target(s) should be flawless after the migration
• All possible scenarios should be tested, including destructive testing and fail-over
• This is more a task for the systems team, so, to avoid interference with migration/implementation/user acceptance
testing tasks, the target system(s) should be cloned
 Migration Aim 3: target(s) must be functionally operable without workarounds
• User acceptance testing should include any and all possible operations, reports and interfacing between
systems
• Data integrity and quality should be of such a standard that there are no hiccups when going live
 Migration Aim 4: data testing should ideally be independent from the migration builders
• Independent verification of data transformations, enrichments and manipulations
 But… the available budget will be the main deciding factor!
 So, what should the ideal landscape look like?
Building the ideal migration landscape
[Diagram: source(s) and target(s) with their clone(s), plus staging and testing environments]
 Aim: migrate from source(s) to target(s); data manipulation will probably need a staging area
 To minimise disruption of normal BAU, clone the source system(s) (QA systems?)
 To allow destructive tests on the new platform, also clone the target system(s) (new QA systems?)
 To allow for independent testing, the testing environment should be physically separated from the staging environment
 But, again, the budget will determine what can be achieved!
How
… do we test data for completeness?
… do we test data for accuracy?
Data Migration Testing: How?
How do we test data for completeness?
 Technical testing
 Do we know where each record from the source has gone?
• End game is decommissioning, so each bit of data must be accounted for
• Whether migrating, archiving or truncating old data, each record must have a target, even if the target is
the bin!
 Of the data that ends up in the target(s), do we have the same or an equivalent number of records as
those earmarked for migration in the source(s)?
• DQ rules could have combined records from the source(s) on the target(s)
• Enrichment could have caused extra data to be added to the target(s)
• Summarisation of e.g. historical sales orders could have caused one record in the target, and multiple in the
archiving system
• All these transformations have to be built into the reconciliation engine, and in the end, the count from the
source(s) must match the count in the target(s) (see the sketch at the end of this slide)
 Functional testing (during UAT)
 Can the users do their normal day-to-day work?
• No manual intervention or workarounds needed
 Do reports from source(s) and target(s) show the same values?
 Are all interfaces working as expected? (probably the most important test!)
• Do all integrated systems respond in the expected manner?
• Are the results from these systems fit for purpose on the target system(s)?
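A minimal sketch of such a reconciliation engine, assuming the migration controller writes one disposition per source record (MIGRATED, MERGED, ARCHIVED, BINNED) into a hypothetical recon_log table, with enrichment inserts logged separately; src_records, tgt_records and enrichment_log are illustrative names:
-- Every source record must be accounted for before decommissioning
select s.record_id
from src_records s
left join recon_log r on r.record_id = s.record_id
where r.record_id is null;  -- unaccounted records: stop and investigate
-- Transformation-aware count check: each MIGRATED record yields one
-- target row, enrichment adds rows; merged/archived/binned records
-- carry their own dispositions and are reconciled separately
select
  (select count(*) from recon_log where disposition = 'MIGRATED')
  + (select count(*) from enrichment_log) as expected_target_cnt,
  (select count(*) from tgt_records)      as actual_target_cnt;
-- the two counts must match for sign-off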
How do we test data for accuracy?
 Read data from source
 Read data from target
 Add transformations and/or enrichments
 Apply the same actions to source data, or
 Reverse apply the actions to target data
 Compare data on byte-for-byte level:
(e.g. simplified data set with 3 resultant columns)
select c1, c2, c3, sum(chk) from (
  -- every source row scores -1, every target row +1;
  -- identical rows on both sides cancel each other out
  select s.c1, s.c2, s.c3, -1 as chk from source s
  union all
  select t.c1, t.c2, t.c3, 1 as chk from target t
) u  -- the derived table needs an alias in most databases
group by c1, c2, c3
having sum(chk) <> 0
Any results? Error in migration!
 A lot of work to build, extremely complex, and the comparison takes a lot of time to run, but…
accuracy is 100% guaranteed! (a transformation-aware variant is sketched below)
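When the migration transforms or merges fields, those same transformations have to be applied inside the comparison; a hedged extension of the query above, assuming (purely for illustration) that the migration upper-cased names and concatenated two address columns:
select c1, c2, sum(chk) from (
  -- apply the migration's transformations to the source side...
  select upper(s.name) as c1,
         trim(s.addr1) || ' ' || trim(s.addr2) as c2,
         -1 as chk
  from source s
  union all
  -- ...so that a correctly migrated target row cancels it out exactly
  select t.full_name as c1, t.address as c2, 1 as chk
  from target t
) u
group by c1, c2
having sum(chk) <> 0;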
Any questions?
Data Migration Testing: Questions?
Thank you for your attention!
Data Migration Testing: The End