1. 26 Aug 2020
Data Migration in Life Sciences
Krishnaveni Rapuru
Manager, Engineering
2. ● Krishnaveni Rapuru
○ Engineering Manager at Medidata Solutions
○ 12 years of industry experience across Life Sciences,
Healthcare, Government & Public Sector
○ Around 6 years of leadership experience
○ Passion: technology, love of <coding> & working with
people, and continuous learning.
About Me
3. Agenda
❏ About Medidata
❏ What is CTMS & Clinical Data?
❏ Why Take Data Migration Seriously?
❏ Medidata Vision & Purpose on Data Migration
❏ Constraints & Risk
❏ Approach
❏ Challenges & Learnings
❏ Best Practices
4. About Medidata
Medidata is leading the digital transformation of life sciences, creating hope for
millions of patients.
Medidata helps generate the evidence and insights to help pharmaceutical, biotech,
medical device and diagnostics companies, and academic researchers accelerate
value, minimize risk, and optimize outcomes. More than one million registered users
across 1,400 customers and partners access the world's most-used platform for
clinical development, commercial, and real-world data.
Medidata, a Dassault Systèmes company (Euronext Paris: #13065, DSY.PA), is
headquartered in New York City and has offices around the world to meet the needs
of its customers.
7. Brief on CTMS & Clinical Data
In simple terms: a Clinical Trial Management System (CTMS) is a software system used by the
biotechnology and pharmaceutical industries to manage clinical trials in clinical research.
8. Why Take Data Migration Seriously?
Good System + Bad Data = Disaster
• Adoption Problems
• Customer Relationship issues
• Decrease in Revenue Generation
• Analytics problems
9. Vision & Purpose
● Provide a seamless data migration solution, in industry-standard
formats, for all new & existing customers who want to adopt
Medidata's new-generation platform.
● Migrate all existing customers from legacy products to the new
platform to reduce operational costs and to improve efficiency
for users with a single business process.
● Single view of data, streamlined process, intelligent monitoring &
reporting
10. Constraints and Risk
● Loss of Data: Data could be lost in the process of cleaning and/or migrating
between source and destination.
● Time Lag: There could be a gap between when the data becomes unavailable in the
legacy system and when it is available in the new system
● Time Overlap: Data could be available in two systems before the source or
legacy system is decommissioned
● Loss of Functionality: New CTMS might not have the same functionality as
all combined legacy systems and tools
13. ● Not contacting & communicating with key stakeholders early in the process
● Lack of Data governance
● Lack of planning
● Lack of Subject Matter Experts & Skill sets
● Waiting for “perfect” specs between source and target systems; instead, start
development with 60%-70% “knowns”
● Lack of Strategy & Execution around testing the migration. Choose to
start with both automated & manual testing.
● Not spending time on dry runs with “real” sample source data.
15. ● Understand the data
■ Source data to destination data
■ Assessment of Data Meaning & Quality
● Define Business Process & Project Governance
■ Clear business process with owners
■ Clear roles & responsibilities matrix
● Rollback & Dry Run
■ Define Rollback strategy to mitigate failures
■ Dry run to measure
● The Importance of the Data Mapping Specification Document
■ Source of truth
● Perform comprehensive validation testing
■ Source to Destination data integrity check (Automated & Manual)
■ Risk-based Sampling Strategy
16. Key Takeaways
● CTMS & Clinical Data
● Need for Data Migration Strategy
● Vision & Purpose for CTMS Data Migration
● Constraints & Risk
● Approach
● Challenges & Learnings
● Best Practices
Alright everyone, welcome and thank you so much for joining us today.
I am very excited to share some insights about data migration within the life sciences industry, and how Medidata does clinical data migration. I promise all of you will get some valuable takeaways from this talk.
First, let me introduce myself.
I am Krishna, and I have been working as an engineering manager at Medidata Solutions for the past 4 years. I started my career as an engineer 12 years ago and have worked in different roles across industries such as healthcare, government and the public sector. I have been leading and managing teams for the past 6 years.
My passion has always been technology and working with people. I love exploring and learning new things.
On the personal front, I am a mother of 2 crazy kids, and I really enjoy spending time with them.
Let's get started! I’ll start with a brief intro to Medidata, then cover what CTMS and clinical data mean. After that I will do a deep dive into clinical data migration, and conclude with best practices, along with Q&A.
Now let me talk about Medidata for a moment.
Medidata is a growing company and part of Dassault Systèmes. We are a globally distributed company, headquartered in New York with around 2500 employees, and Europe is the fastest-growing market within Medidata.
We are really having an impact by working together with clients, customers and partners to provide a better quality of life to so many human beings. Helping our customers provide new drugs that cure cancer and other serious diseases is something that has a positive impact on society, and that motivates me tremendously day by day.
Now a bit more on Medidata...
13 of the top 15 drugs sold in 2017 were developed using Medidata technology.
Every time I read this kind of news and see that one of our clients has launched a new drug, I’m obviously very proud and also very motivated, because all of the drugs that have been developed by our partners and clients were touched by Medidata’s platform.
Here you can see some of the major brands that are part of the Medidata client base.
Before going into the details of migration, let me give you a short brief on what CTMS is and what classifies as clinical data, which is a key pre-step for this session.
So what is CTMS? CTMS is a market-standard solution provided to CROs (clinical research organisations) and sponsors, so they can conduct intelligent trials, which enable them to roll out drugs to the market.
Now, going through the diagram from left to right:
On the left-hand side, we can see that there are 2 segments of data streams that come into Medidata Clinical Cloud. Medidata Clinical Cloud is a cloud-native application and data ecosystem.
The first segment is all the data that comes from electronic device integrations, otherwise called EDC (Electronic Data Capture), which are used at trial sites, hospitals and other pharmaceutical organizations. The second segment of data, which we call “Other resources”, comes from lab equipment, sensors, X-ray images and electronic agreements.
The second key integral part of the Medidata ecosystem is CTMS, which you can see here. CTMS consumes and publishes data in and out of Medidata Clinical Cloud. It is primarily used by CROs, sponsors and other specific users within the clinical industry to track and manage:
Which study is to be conducted, where and by whom? This is known as Core Study.
How many patients (whom we call Subjects) visited a site/location? A.k.a. Subject Visits.
How many of them are enrolled in the trial, and how do we track and measure this? This is known as Enrollment.
What issues are faced as part of the trial process? Who resolves them, and how? This is called Issues & Actions.
What are the status, timelines and progress of the trial? These are known as Milestones.
Lastly, there are regulatory and country compliance requirements and standards, which have to be met as part of this entire process.
By the way, for those who are interested, all this information is available on our website.
Let’s start with why :)
Why is it so important to have a clear data migration strategy? Because even if we build a near-perfect technology platform, once bad data drives the platform it can turn into a potential disaster and nightmare. This results in difficult customer adoption and retention issues, revenue impact and, lastly, incorrect and highly volatile behaviour, which drives inaccurate analytics.
Now let's talk about the vision and purpose behind data migration for Medidata, especially around CTMS.
As part of the vision, there are 3 key goals that we wanted to achieve:
How can we provide a seamless and unified data migration platform for our existing and new customers?
How can we provide an upgrade path for our existing customers, from legacy systems to the new Medidata clinical platform, to reduce cost and improve operational efficiency?
The last goal is providing a single, 360-degree view of migrated data to both our internal and external customers, through intelligent monitoring and reporting.
As part of the migration strategy, the key areas of risk and constraints that need to be considered are:
Potential data loss during migration, because of cleansing, transformation and various mapping activities.
Various time-bound factors, such as the timing and duration of the overall activity and the overlap and sync of old and migrated data, could create data inconsistencies between source and destination, which may lead to data integrity issues as well.
Lastly, the capability/functionality gap between source and destination systems. It's a “fact” that we can’t do a “like for like” match of all functionality between source and destination, so the migration can end up with functionality and data being lost or consolidated in the destination.
Now, let me talk through the approach we have taken to achieve the overall strategy and execution.
As you see here, there are 3 key stages:
Analysis
Development
Go live and support
During the Analysis phase, we set objectives, scope and success criteria, and identified and engaged with key stakeholders and partners, including customers. Then we performed a detailed data analysis, where we assessed the data models of source and destination, along with data quality. This generated:
a Data Specification Document,
an assessment of the quality of data in the source systems,
a mapping document, which captures data relationships between source and destination, and
a record of all customer journeys with their respective data flows.
The next key stage is Development, where we did Design, Build and Test.
As part of the Design phase:
We came up with a high-level enterprise architecture.
After that we went through the Buy/Build process, where we did technology/tool selection based on scale, complexity, ease of adoption, business value, cost and time to market.
Then we identified and documented all data mapping and translation rules.
During the Build phase, we developed capabilities to extract data from different sources in different formats and transform it into a unified format that the destinations can understand, while adhering to all mapping and business rules.
Along with transformation, we also built the capability to publish the data into destination systems in a defined sequence, so that data integrity was maintained.
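To make the sequencing point concrete, here is a minimal sketch of one way to derive a publish order. It is an illustration only, not the Medidata implementation: the entity names and the `DEPENDS_ON` table are hypothetical, and it assumes the rule "a record can only be loaded after the records it refers to". It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each entity maps to the entities
# that must already exist in the destination before it is loaded.
DEPENDS_ON = {
    "study": set(),
    "site": {"study"},
    "subject": {"site"},
    "subject_visit": {"subject"},
    "milestone": {"study"},
    "issue": {"site", "subject"},
}


def publish_order(depends_on):
    """Return entity names so that every entity comes after its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


order = publish_order(DEPENDS_ON)
print(order)
```

Several valid orders exist, so the printed list may vary between equally correct answers. If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful early warning that the load sequence cannot preserve referential integrity as specified.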
The most important part of developing this solution is the right level of instrumentation, so that we can proactively monitor and measure system availability and performance.
The key challenge is not just building the solution but having a proper test strategy that covers both sanity tests with smaller data sets and volumetric tests. This test strategy was executed through a combination of automation and manual spot tests.
Once we had developed the solution and established the test strategy, we ran several cycles of testing, issue identification and fixing until we met our quality acceptance goal.
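The slides also list a risk-based sampling strategy for validation. A minimal sketch of how records could be picked for manual spot tests is shown below. The risk tiers and the rates in `SAMPLE_RATE` are assumptions made up for this example, not figures from the project.

```python
import random

# Hypothetical risk tiers and sampling rates; real values would come
# from the agreed test strategy.
SAMPLE_RATE = {"high": 1.0, "medium": 0.2, "low": 0.05}


def pick_spot_checks(records, seed=0):
    """Select records for manual review, sampling riskier tiers more heavily."""
    rng = random.Random(seed)  # fixed seed so a test cycle can be repeated
    picked = []
    for tier, rate in SAMPLE_RATE.items():
        tier_records = [r for r in records if r["risk"] == tier]
        if not tier_records:
            continue
        # Always review at least one record from every tier that has data.
        count = max(1, round(len(tier_records) * rate))
        picked.extend(rng.sample(tier_records, count))
    return picked


records = (
    [{"id": f"enr-{i}", "risk": "high"} for i in range(10)]
    + [{"id": f"vis-{i}", "risk": "medium"} for i in range(50)]
    + [{"id": f"note-{i}", "risk": "low"} for i in range(200)]
)
sample = pick_spot_checks(records)
print(len(sample))
```

The idea is that automation covers every record, while manual effort is concentrated where an error would hurt most.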
In addition to system design and testing, we also built a reporting capability, where users can identify and trace the data, and follow and resolve issues in a timely manner. This in turn helps to maintain data integrity between source and destination, and a seamless customer experience.
The last stage, which is a critical one, is Go live and support. This is the stage where we deployed the solution into production in an automated fashion, integrated with a centralised monitoring system.
Before actual customer migration starts, we defined a business process flow in which every customer goes through a pilot run with real customer data, to check whether the E2E flow with all integrations works as expected.
Once the pilot run completes successfully, we cleanse the customer data and perform the actual migration. This is not a one-day job; it is a cycle of activities that happens over time and needs continuous care and support until we achieve our migration completion goals for each customer.
A while ago, I found this image by accident, when we were going through a challenging phase of the project. Believe it or not, this is so true!
No matter how big the migration is or how well we plan, there will always be some surprises and challenges. So I would like to share some of the challenges we went through and how we addressed them.
First one:
“Connect” with key stakeholders. No matter the size of the migration, validate your assumptions with key stakeholders, and explain the impact on them before you get going on the task. If you don’t, it will come back to bite you at some stage, and that will disrupt your timelines.
Second key one:
“Constant communication” with the business. Once you’ve explained the project to the stakeholders, be sure to keep them informed of your progress. It’s best to provide a status report on a weekly basis, especially if things get off track.
Next:
Data governance. Be sure you’re clear on who has the rights to create, approve, edit or remove data from the source system, and document that in writing as part of your project plan, with a clearly defined process.
If you ask me, this one is the key to success:
Planning. Do not underestimate the analysis between source and target systems, and understand at least 70% before planning. We went into development with a 40% understanding, which resulted in a good number of changes to the destination systems and their behaviour in a short time. This also generated rework during the development phase.
Another important one to consider:
Having the right skills and expertise. Although this looks like a straightforward task, there's a lot of complexity involved in moving data. Having an experienced professional with excellent knowledge of both source and destination helps the process go smoothly.
Next:
Don’t wait for a perfect spec. Make a start and go with 70% readiness. You have to iterate as you go along to reach the target state.
Last but one:
Having a clear strategy and execution around testing. We made a choice to go with full automation, then realised it carried too much upfront cost and slowed us down, so we went with a hybrid approach of automation and manual testing.
Finally:
Not spending a good amount of time on dry runs with “real” sample source data, especially data variations, during the development and release cycles.
Lastly, I would like to share a few best practices that we have been following. I think it is important to cover these and give people the opportunity to gain a little from our experience and understand what is working.
Tip 1 – Understanding the data
Before starting the data migration, you have to prepare your data by carrying out an assessment of what is present in the source system and understanding clearly which data needs to be migrated.
We can divide the assessment of the source system into two macro categories:
Assessment of the data meaning
Assessment of the data quality
Data meaning: every piece of data that you move is something that has to be validated, cleaned and transformed. In data migration projects, migrating only relevant data ensures efficiency and cost control. Understanding how the source data will be used in the target system is necessary for defining what to migrate.
The second macro area is the assessment of the quality of the data.
It is very important to define a process to measure data quality early in the project. The quality analysis typically leads to a data cleaning activity. Cleaning the source data is a key element of reducing data migration effort.
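As a small illustration of what measuring data quality early can look like, here is a sketch that counts duplicate keys and missing required values in a source extract. The `profile` function and the field names are hypothetical, and a real assessment would cover many more rules.

```python
from collections import Counter


def profile(rows, key_field, required_fields):
    """Count duplicate keys and missing required values in source rows."""
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicates = [key for key, n in key_counts.items() if n > 1]
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }
    return {"rows": len(rows), "duplicate_keys": duplicates, "missing": missing}


# Hypothetical source extract with one duplicate key and two gaps.
rows = [
    {"site_id": "S1", "country": "DE", "status": "active"},
    {"site_id": "S2", "country": "", "status": "active"},
    {"site_id": "S2", "country": "FR", "status": None},
]
report = profile(rows, "site_id", ["country", "status"])
print(report)
```

Numbers like these give the cleaning activity a measurable starting point, and re-running the same profile after cleaning shows whether the source is ready to migrate.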
Tip 2 – Project Governance
The best way to approach a data migration project is to clearly define roles and responsibilities and avoid overlapping accountability. This can be done in several steps:
Define the owner of the data in the target system.
Include the business users in decision-making. They understand the history, the structure and the meaning of the source data.
Based on our experience, what makes a difference is the presence of a business analyst. This is a person who acts as a bridge between the technical staff implementing the migration and the business people.
Tip 3 – Roll back & Dry Run
A rollback strategy has to be put in place in order to mitigate the risk of potential failures. Access to source data has to be in read-only mode. This prevents any kind of data modification and ensures its integrity.
Tip 4 – The Importance of the Data Mapping Specification Document
This document is the core of data migration. It ensures a complete field mapping, and it is used to collect all mapping rules and exceptions. This project phase is usually long and tiring, for a number of reasons:
Volume and amount of data details
Technical activity with technical documents
Little knowledge of the dynamics of the target database
Compromises that have to be made
Some tips I can share to help you do it in the most efficient way:
Clarify what has to be migrated and what shouldn’t be migrated
Clean source data – this will reduce the number of fields to migrate
Liaise with a business analyst who will translate technical requirements and help to explain how data will work in the target system
Rely on a data migration expert who has already performed similar data migrations in the past
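One way to keep the mapping specification as the source of truth is to express it as data that the migration code reads directly. The sketch below is only an illustration: the field names, `STATUS_MAP` and `apply_mapping` are invented for this example and do not come from the actual specification.

```python
# Hypothetical mapping specification: target field -> (source field, transform).
STATUS_MAP = {"A": "active", "C": "closed"}

MAPPING = {
    "site_number": ("SITE_NO", str.strip),
    "country_code": ("CNTRY", str.upper),
    "status": ("STAT", lambda value: STATUS_MAP[value]),
}


def apply_mapping(source_row, mapping=MAPPING):
    """Build a target record; collect fields that break a rule as exceptions."""
    target, exceptions = {}, []
    for target_field, (source_field, transform) in mapping.items():
        try:
            target[target_field] = transform(source_row[source_field])
        except (KeyError, AttributeError, TypeError) as err:
            # Missing source field, unmapped code or wrong value type.
            exceptions.append((target_field, repr(err)))
    return target, exceptions


record, problems = apply_mapping({"SITE_NO": " 0042 ", "CNTRY": "de", "STAT": "X"})
print(record)
print(problems)
```

Here the unknown status code `X` does not stop the run. It is recorded as an exception, which is the kind of case the specification document is meant to capture and resolve with the business.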
Lastly, Tip number 5 – Perform comprehensive validation testing
To ensure that the goals of the data migration strategy are achieved, a company needs to develop a solid migration verification process. The data migration verification strategy needs to include ways to prove that the migration was successfully completed and that data integrity was maintained.
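A minimal sketch of an automated source-to-destination integrity check follows. It assumes both sides can be exported as rows with a shared key and that source values have already been passed through the mapping rules, so equal records hash equally. The function names and sample data are hypothetical.

```python
import hashlib


def fingerprint(rows, key_field, fields):
    """Map each record key to a hash of the fields being compared."""
    result = {}
    for row in rows:
        payload = "|".join(str(row[f]) for f in fields)
        result[row[key_field]] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return result


def reconcile(source_rows, target_rows, key_field, fields):
    """Report records missing from, unexpected in, or changed in the target."""
    src = fingerprint(source_rows, key_field, fields)
    tgt = fingerprint(target_rows, key_field, fields)
    return {
        "missing": sorted(src.keys() - tgt.keys()),
        "unexpected": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


source_rows = [
    {"id": "S1", "name": "Berlin", "enrolled": 12},
    {"id": "S2", "name": "Paris", "enrolled": 7},
    {"id": "S3", "name": "Madrid", "enrolled": 3},
]
target_rows = [
    {"id": "S1", "name": "Berlin", "enrolled": 12},
    {"id": "S2", "name": "Paris", "enrolled": 70},
]
result = reconcile(source_rows, target_rows, "id", ["name", "enrolled"])
print(result)
```

An empty report for all three lists is one piece of evidence that the migration completed and integrity was maintained. Anything that shows up can feed the manual review and issue-tracking steps.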
So, to quickly recap:
We have covered what CTMS and clinical data are, and we also talked about strategy, vision and purpose,
the constraints and risks that come with this kind of project, and
the approach we have taken.
In the end we also talked about challenges, learnings and best practices.
As Steve Jobs says, the team comes first. So one pillar is strategy, but the most important factor for success is the “team”. I’m extremely happy to be part of such a great team at Medidata, which helped us to achieve this goal. This is just the beginning of a journey.
I hope this session has been informative for you all!
Thank you..