Presenter
Bill Hayduk
Founder / President
Presenter
Jeff Bocarsly, Ph.D.
Senior Architect
built by
QuerySurge™
The average organization loses $8.2 million annually
through poor Data Quality.
- Gartner
46% of companies cite Data Quality as a barrier
for adopting Business Intelligence products.
- InformationWeek
The cost per patient data of Phase 3 clinical studies of
new pharmaceuticals exceeds $26,000.
- Journal of Clinical Research Best Practices
Pharma’s 2 Largest Data Warehousing Concerns
Pharma’s Largest Data Warehouse Concerns
(1) Data Integrity (2) Compliance
(1) Data Integrity
high risk of defects that are not readily visible
Missing Data
Truncation of Data
Data Type Mismatch
Null Translation errors
Incorrect Type Translation
Misplaced Data
Extra Records
Transformation Logic Errors/Holes
Simple/Small Errors
Sequence Generator errors
Undocumented Requirements
Not Enough Records
Pharma’s Data Warehouse Concerns
(2) Compliance
Need to comply with Part 11 mandates
historical test information
test version history
test execution data: who, what & when
test cycle information
visibility of assets
archived test results
Why is this Important?
(1) Data Integrity (2) Compliance
• Periodic data reporting to FDA
• Periodic data reporting to int’l bodies
• Announced FDA audits
• Unannounced FDA audits
Consequences
Severe financial and business
Pharma’s Testing & Reporting Needs
Data Integrity needs
Need a testing solution that can…
• automate the manual testing of data
• compare millions of rows of data quickly
• flag mismatches and inconsistencies in data sets
• provide flexibility in scheduling test runs
• generate informative reports that can easily be shared with the team
• validate up to 100% of all data, mitigating the risk
Part 11 Reporting needs
Need a testing solution that can…
• track test history
• provide reporting on test version history
• record all test execution by testing owner’s name and date
• deliver auditable reports of test cycles
• store all test outcomes and test data
• offer a read-only user for reviewing test assets
• support archiving of results
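The who / what / when requirement above can be pictured as an append-only execution log. The sketch below is only an illustration of that idea, not QuerySurge's implementation; the names `ExecutionRecord` and `AuditLog` and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExecutionRecord:
    """One immutable entry: who ran what, when, and with which outcome."""
    test_name: str
    test_version: int
    executed_by: str
    executed_at: datetime
    outcome: str  # e.g. "Pass" or "Fail"


class AuditLog:
    """Hypothetical append-only store: entries are added and read, never edited."""

    def __init__(self):
        self._records = []

    def record(self, test_name, test_version, executed_by, outcome, when=None):
        entry = ExecutionRecord(
            test_name=test_name,
            test_version=test_version,
            executed_by=executed_by,
            executed_at=when or datetime.now(timezone.utc),
            outcome=outcome,
        )
        self._records.append(entry)
        return entry

    def history(self, test_name):
        """Read-only view of every execution of one test, oldest first."""
        return tuple(r for r in self._records if r.test_name == test_name)
```

Freezing the record and returning a tuple from `history` is one simple way to give reviewers a read-only view of past runs.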
The solution…
What is QuerySurge™?
the collaborative Data Testing solution that finds bad data & provides a holistic view of your data’s health
With QuerySurge™ you can:
• Reduce your costs & risks
• Improve your data quality
• Accelerate your testing cycles
• Share information with your team
• Realize a huge ROI (e.g. 1,300%)*
*based on client’s calculation of Return on Investment
Finding Bad Data
How?
• QS pulls data from data sources
• QS pulls data from target data store
• QS compares data quickly
• QS generates reports, audit trails
[Diagram: data sources and target data store queried with SQL / HQL; output: Reports, Data Health Dashboard]
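The pull-and-compare steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how QuerySurge itself works: two in-memory SQLite databases stand in for the source and the target, and the table names, sample rows, and the helpers `fetch_keyed` and `compare` are all hypothetical.

```python
import sqlite3


def fetch_keyed(conn, query):
    """Run a query and index each row by its first column (the key)."""
    return {row[0]: row[1:] for row in conn.execute(query)}


def compare(source_rows, target_rows):
    """Classify differences between source and target result sets."""
    missing = sorted(k for k in source_rows if k not in target_rows)
    extra = sorted(k for k in target_rows if k not in source_rows)
    mismatched = sorted(
        k for k in source_rows
        if k in target_rows and source_rows[k] != target_rows[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


def demo():
    # Stand-ins for a real source system and a real data warehouse.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
    target.execute("CREATE TABLE dw_patients (id INTEGER, name TEXT)")
    source.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(1, "Alexander"), (2, "Maria"), (3, "Chen")],
    )
    # Target has a truncated name (id 1), lacks id 3, and has an extra id 9.
    target.executemany(
        "INSERT INTO dw_patients VALUES (?, ?)",
        [(1, "Alexand"), (2, "Maria"), (9, "Ghost")],
    )
    src = fetch_keyed(source, "SELECT id, name FROM patients")
    tgt = fetch_keyed(target, "SELECT id, name FROM dw_patients")
    return compare(src, tgt)
```

Calling `demo()` reports id 3 as missing, id 9 as extra, and id 1 as mismatched, which correspond to the missing-data, extra-record, and truncation defect types listed earlier.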
QuerySurge™ Architecture
Web-based…
Installs on...
Linux
Connects to…
…or any other JDBC-compliant data source
QuerySurge Controller
QuerySurge Server
QuerySurge Agents
Flat Files
Collaboration
Testers
- functional testing
- regression testing
- result analysis
Developers / DBAs
- unit testing
- result analysis
Data Analysts
- review, analyze data
- verify mapping failures
Operations teams
- monitoring
- result analysis
Managers
- oversight
- result analysis
Share information on the
the QuerySurge advantage
Automate the entire testing cycle
• Automate kickoff, tests, comparison, auto-emailed results
Create Tests easily with no SQL programming
• ensures minimal time & effort to create tests / obtain results
Test across different platforms
• data warehouse, Hadoop, NoSQL, database, flat file, XML
Collaborate with team
• Data Health dashboard, shared tests & auto-emailed reports
Verify more data & do it quickly
• verifies up to 100% of all data up to 1,000x faster
Integrate for Continuous Delivery
• Integrates with most Build, ETL & QA management software
QuerySurge™ Modules
Design Tests
Scheduling
Deep-Dive Reporting
Run Dashboard
Query Wizards
Data Health Dashboard
Fast and Easy.
No programming needed.
Compare by Table, Column & Row
• Performs 80% of all data tests
• Automatically generates SQL code
• Opens up testing to novice & non-technical team members
• Speeds up testing for skilled SQL coders
• Provides a huge Return-On-Investment
3 Types of Data Comparison Wizards:
Column-Level Comparison:
This is great for Big Data stores and Data Warehouses where tables will have some columns containing transformations and some columns with no transformations. Many tables and columns can be compared simultaneously and quickly.
Table-Level Comparison:
This comparator is great for Data Migrations and Database Upgrades with no transformations at all. Many tables can be compared simultaneously and quickly.
Row Count Comparison:
Great for all - Big Data stores, Data Warehouses, Data Migrations and Database Upgrades. Many tables and rows can be compared simultaneously and quickly.
The wizards also provide you with automated features for:
o filtering (‘Where’ clause) and
o sorting (‘Order By’ clause)
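As a rough illustration of the simplest of the three, a row count comparison, here is a small Python sketch. It does not show the SQL the wizards actually generate; in-memory SQLite databases stand in for source and target, and the table names and the helpers `row_count` and `compare_row_counts` are hypothetical.

```python
import sqlite3


def row_count(conn, table):
    """Return the number of rows in one table.

    Table names cannot be bound as SQL parameters, so this sketch assumes
    `table` comes from a trusted, hard-coded mapping.
    """
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def compare_row_counts(source, target, table_pairs):
    """Compare counts for each (source_table, target_table) pair."""
    results = []
    for src_table, tgt_table in table_pairs:
        src_n = row_count(source, src_table)
        tgt_n = row_count(target, tgt_table)
        results.append((src_table, tgt_table, src_n, tgt_n, src_n == tgt_n))
    return results


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE visits (id INTEGER)")
    source.execute("CREATE TABLE sites (id INTEGER)")
    target.execute("CREATE TABLE dw_visits (id INTEGER)")
    target.execute("CREATE TABLE dw_sites (id INTEGER)")
    source.executemany("INSERT INTO visits VALUES (?)", [(1,), (2,), (3,)])
    target.executemany("INSERT INTO dw_visits VALUES (?)", [(1,), (2,)])  # one row lost
    source.executemany("INSERT INTO sites VALUES (?)", [(1,), (2,)])
    target.executemany("INSERT INTO dw_sites VALUES (?)", [(1,), (2,)])
    return compare_row_counts(
        source, target, [("visits", "dw_visits"), ("sites", "dw_sites")]
    )
```

A matching count does not prove the rows are identical; it is a cheap first check, and table-level or column-level comparison is still needed to catch changed values.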
Design Library
• Create custom Query Pairs (source & target SQLs)
• Great for team members skilled with SQL
Scheduling
• Build groups of Query Pairs
• Schedule Test Runs for:
o immediately
o at a specific date/time
o automatically after build or ETL process
Deep-Dive Reporting
• Examine and automatically email test results
Run Dashboard
• View real-time execution
• Analyze real-time results
• view data reliability & pass rate
• add, move, filter, zoom-in on any data widget & underlying data
• verify build success or failure
Test Management Connectors
• Drive QuerySurge execution from your Test Management Solution
• Outcome results (Pass/Fail/etc.) are returned from QuerySurge to your Test Management Solution
• Results are linked in your Test Management Solution so that you can click directly into detailed QuerySurge results
Integration with leading Test Management Solutions
• HP ALM (Quality Center)
• Microsoft Team Foundation Server
• IBM Rational Quality Manager
Case Study
Fortune 500 firm:
Clinical Trial Data
Case Study: Fortune 500 Pharma
Challenge
How can a Data Warehouse team assure data integrity over multiple builds when the cost per patient data of Phase 3 clinical studies exceeds $26,000 and volume of live case data is > 1 TB?
Strategy
Implement QuerySurge™ to dramatically increase coverage of data that is verified for each build.
Implementation
• 1,000 SQL queries written to compare case data from the source systems to the DWH after ETL.
• QuerySurge™ automated the scheduling, test runs, comparisons and reporting for each build.
Metrics
• 500 mappings
• 2.5 million data items
• 1.25 billion verifications
• Complete run finished in 7 days
• 45% of data was covered
• 14 builds were deployed
• 115 defects were discovered and remediated
Benefits
• 10-fold increase in the speed of testing
• Huge increase in coverage of data (from less than 0.1% to 45%)
• Production defects discovered that were missed in previous cycles
• Huge savings on clean records (115 defects x $26,000/record)
• Huge time savings (3.6 years x 10 people)
• Avoidance of lawsuits and FDA fines
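The products quoted in the benefits can be worked out directly. The short Python sketch below only multiplies figures taken from the slides; the variable names are illustrative. The slides do not say how the verification count was derived, so treating it as mappings times data items is an assumption, although that product does match the stated 1.25 billion.

```python
# Figures as stated on the case study slides.
MAPPINGS = 500
DATA_ITEMS = 2_500_000
DEFECTS = 115
COST_PER_RECORD_USD = 26_000
YEARS = 3.6
PEOPLE = 10

# Assumption: verifications = mappings x data items (matches 1.25 billion).
verifications = MAPPINGS * DATA_ITEMS

# "115 defects x $26,000/record"
savings_usd = DEFECTS * COST_PER_RECORD_USD

# "3.6 years x 10 people", rounded to avoid floating-point noise.
person_years = round(YEARS * PEOPLE, 1)

print(f"{verifications:,} verifications")
print(f"${savings_usd:,} on clean records")
print(f"{person_years} person-years")
```

That puts the clean-record figure at $2,990,000 and the time figure at 36 person-years.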
Free Trials
(1) a Trial in the Cloud of QuerySurge, including a self-learning tutorial that works with sample data, for 3 days, or
(2) a Downloaded Trial of QuerySurge, including a self-learning tutorial with sample data or your data, for 15 days, or
(3) a Proof of Concept of QuerySurge, including a kickoff & setup meeting and weekly meetings with our team of experts, for 30 days
For more information, go here: http://www.querysurge.com/compare-trial-options
For more on Pharma & QuerySurge, go to
www.querysurge.com/solutions/pharmaceutical-industry

Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Requirements

  • 1. 1 Presenter Bill Hayduk Founder / President Presenter Jeff Bocarsly, Ph.D. Senior Architect built bybuilt by QuerySurge™
  • 2. built by The average organization loses $8.2 million annually through poor Data Quality. - Gartner 46% of companies cite Data Quality as a barrier for adopting Business Intelligence products. - InformationWeek The cost per patient data of Phase 3 clinical studies of new pharmaceuticals exceeds $26,000. - Journal of Clinical Research Best Practices built by QuerySurge™
  • 3. Pharma’s 2 Largest Data Warehousing Concerns built by QuerySurge™
  • 4. Pharma’s Largest Data Warehouse Concerns (1) Data Integrity (2) Compliance built by QuerySurge™
  • 5. (1) Data Integrity high risk of defects that are not readily visible Missing Data Truncation of Data Data Type Mismatch Null Translation errors Incorrect Type Translation Misplaced Data Extra Records Transformation Logic Errors/Holes Simple/Small Errors Sequence Generator errors Undocumented Requirements Not Enough Records built by Pharma’s Data Warehouse Concerns QuerySurge™
  • 6. Pharma’s Data Warehouse Concerns (2) Compliance Need to comply with Part 11 mandates historical test information test version history test execution data: who, what & when test cycle information visibility of assets archived test results built by QuerySurge™
  • 7. Why is this Important?  Periodic data reporting to FDA  Periodic data reporting to int’l bodies (1) Data Integrity (2) Compliance  FDA announced audits  Unannounced FDA audits Consequences Severe financial and business built by QuerySurge™
  • 9.  automate the manual testing of data  compare millions of rows of data quickly  flag mismatches and inconsistencies in data sets  provide flexibility in scheduling test runs  generate informative reports that can easily be shared with the team  validate up to 100% of all of all data, mitigating the risk Data Integrity needs Need a testing solution that can… built by QuerySurge™
  • 10. Part 11 Reporting needs  track test history  provide reporting on test version history  record all test execution by testing owner’s name and date  deliver auditable reports of test cycles  store all test outcomes and test data  offer a read-only user for reviewing test assets  support archiving of results Need a testing solution that can… built by QuerySurge™
  • 11. built by The solution… built by QuerySurge™
  • 12. What is QuerySurge™? the collaborative Data Testing solution that finds bad data & provides a holistic view of your data’s health built by QuerySurge™
  • 13. • Reduce your costs & risks • Improve your data quality • Accelerate your testing cycles • Share information with your team with QuerySurge™ you can: built by QuerySurge™ • Provides huge ROI (i.e. 1,300%)* *based on client’s calculation of Return on Investment
  • 14. Finding Bad Data SQL HQL SQL HQL SQL SQL  QS pulls data from data sources  QS pulls data from target data store  QS compares data quickly  QS generates reports, audit trails How? Reports, Data Health Dashboard built by QuerySurge™
  • 15. QuerySurge™ Architecture Web-based… Installs on... Linux Connects to… …or any other JDBC compliant data source built by QuerySurge™ QuerySurge Controller QuerySurge Server QuerySurge Agents Flat Files
  • 16. Collaboration Testers - functional testing - regression testing - result analysis Developers / DBAs - unit testing - result analysis Data Analysts - review, analyze data - verify mapping failures Operations teams - monitoring - result analysis Managers - oversight - result analysis Share information on the built by QuerySurge™
  • 17. the QuerySurge advantage built by QuerySurge™ Automate the entire testing cycle  Automate kickoff, tests, comparison, auto-emailed results Create Tests easily with no SQL programming  ensures minimal time & effort to create tests / obtain results Test across different platforms  data warehouse, Hadoop, NoSQL, database, flat file, XML Collaborate with team  Data Health dashboard, shared tests & auto-emailed reports Verify more data & do it quickly  verifies up to 100% of all data up to 1,000 x faster Integrate for Continuous Delivery  Integrates with most Build, ETL & QA management software
  • 18. built by QuerySurge™ QuerySurge™ Modules Design Tests SchedulingDeep-Dive Reporting Run Dashboard Query WizardsData Health Dashboard
  • 19. Fast and Easy. No programming needed. built by QuerySurge™ QuerySurge™ Modules Compare by Table, Column & Row • Perform 80% of all data tests •Automatically generates SQL code • Opens up testing to novice & non- technical team members • Speeds up testing for skilled SQL coders • provides a huge Return-On-Investment
  • 20. built by QuerySurge™ QuerySurge™ Modules 3 Types of Data Comparison Wizards: The also provide you with automated features for: o filtering (‘Where’ clause) and o sorting (‘Order By’ clause) Column-Level Comparison: This is great for Big Data stores and Data Warehouses where tables will have some columns containing transformations and some columns with no transformations. Many tables and columns can be compared simultaneously and quickly. Table-Level Comparison: This comparator is great for Data Migrations and Database Upgrades with no transformations at all. Many tables can be compared simultaneously and quickly. Row Count Comparison: Great for all - Big Data stores, Data Warehouses, Data Migrations and Database Upgrades. Many tables and rows can be compared simultaneously and quickly.
  • 21. Design Library  Create custom Query Pairs (source & target SQLs)  Great for team members skilled with SQL QuerySurge™ Modules Scheduling  Build groups of Query Pairs  Schedule Test Runs for: • immediately • at a specific date/time • automatically after build or ETL process built by QuerySurge™
  • 22. Deep-Dive Reporting  Examine and automatically email test results Run Dashboard  View real-time execution  Analyze real-time results QuerySurge™ Modules built by QuerySurge™
  • 23. View data reliability & pass rate • Add, move, filter, zoom in on any data widget & underlying data • Verify build success or failure
  • 24. Test Management Connectors: integration with leading Test Management Solutions (HP ALM (Quality Center), Microsoft Team Foundation Server, IBM Rational Quality Manager). Drive QuerySurge execution from your Test Management Solution. Outcome results (Pass/Fail/etc.) are returned from QuerySurge to your Test Management Solution. Results are linked in your Test Management Solution so that you can click directly into detailed QuerySurge results.
  • 25. Case Study. Fortune 500 firm: Clinical Trial Data
  • 26. Case Study: Fortune 500 Pharma. Challenge: How can a Data Warehouse team assure data integrity over multiple builds when the cost per patient data of Phase 3 clinical studies exceeds $26,000 and the volume of live case data is > 1 TB? Strategy: Implement QuerySurge™ to dramatically increase coverage of data that is verified for each build. Implementation: • 1,000 SQL queries written to compare case data from the source systems to the DWH after ETL. • QuerySurge™ automated the scheduling, test runs, comparisons and reporting for each build.
  • 27. Case Study: Fortune 500 Pharma. Metrics: 500 mappings; 2.5 million data items; 1.25 billion verifications; complete run finished in 7 days; 45% of data was covered; 14 builds were deployed; 115 defects were discovered and remediated. Benefits: • 10-fold increase in the speed of testing. • Huge increase in coverage of data (from less than 1/10 of 1% to 45%) • Production defects discovered that were missed in previous cycles • Huge savings on clean records (115 defects x $26,000/record) • A huge time savings (3.6 years x 10 people) • Avoidance of lawsuits and FDA fines
  • 28. Free Trials: (1) a Trial in the Cloud of QuerySurge, including a self-learning tutorial that works with sample data, for 3 days; or (2) a Downloaded Trial of QuerySurge, including a self-learning tutorial with sample data or your data, for 15 days; or (3) a Proof of Concept of QuerySurge, including a kickoff & setup meeting and weekly meetings with our team of experts, for 30 days. For more information, go to http://www.querysurge.com/compare-trial-options
  • 29. For more on Pharma & QuerySurge, go to www.querysurge.com/solutions/pharmaceutical-industry
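
The Query Pair idea from slide 21 (a source SQL and a target SQL whose results are compared) and the row-count comparison from slide 20 can be illustrated with a minimal Python sketch. This is not QuerySurge's API or implementation: `QueryPair`, `run_pair`, the table names, and the in-memory SQLite databases are all assumptions made purely for illustration.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueryPair:
    """Hypothetical container: one source query and one target query."""
    name: str
    source_sql: str
    target_sql: str


def run_pair(pair, source_conn, target_conn):
    """Run both queries, then report row counts and mismatched rows."""
    # Counters act as multisets, so duplicate rows are compared correctly.
    source_rows = Counter(source_conn.execute(pair.source_sql).fetchall())
    target_rows = Counter(target_conn.execute(pair.target_sql).fetchall())
    missing = source_rows - target_rows  # rows in source but not in target
    extra = target_rows - source_rows    # rows in target but not in source
    return {
        "name": pair.name,
        "source_count": sum(source_rows.values()),
        "target_count": sum(target_rows.values()),
        "missing_in_target": sorted(missing.elements(), key=repr),
        "extra_in_target": sorted(extra.elements(), key=repr),
        "passed": not missing and not extra,
    }


def demo():
    # Two in-memory databases stand in for a source system and a warehouse.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE patients (id INTEGER, site TEXT)")
    target.execute("CREATE TABLE dwh_patients (id INTEGER, site TEXT)")
    source.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(1, "Boston"), (2, "Newark"), (3, "Austin")],
    )
    # Simulated ETL defects: row 3 is missing and row 2 was truncated.
    target.executemany(
        "INSERT INTO dwh_patients VALUES (?, ?)",
        [(1, "Boston"), (2, "Newa")],
    )
    pair = QueryPair(
        name="patients_vs_dwh",
        source_sql="SELECT id, site FROM patients",
        target_sql="SELECT id, site FROM dwh_patients",
    )
    return run_pair(pair, source, target)


if __name__ == "__main__":
    print(demo())
```

In this sketch the truncated value shows up twice, once as a source row missing from the target and once as an unexpected target row, while the dropped record shows up only as missing. A real tool working at the volumes cited in the case study would need to compare data without loading every row into memory.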

Editor's Notes

  1. Other Pharmaceutical Industry Complexities: industry consolidation causing massive integration of data; FDA 21 CFR Part 11 compliance; a broad variety of data types and sources that may be fed into a data warehouse, including general and Pharma-specific information exchange formats (e.g., HL7 feeds, CDISC feeds, other XML grammars) and multiple proprietary and internal data formats, which may have been acquired in the process of industry consolidation.
  2. QuerySurge can automate the comparison of all data from source files and databases through different legs of the ETL process to the target data warehouse. QuerySurge can be scheduled to run immediately, at a set time (for example, next Monday at 11:00 pm), or when an event occurs, such as the end of the current ETL process. QuerySurge will execute tests that automate the comparison of target data to source data very quickly, comparing millions of rows of data in minutes. On completion of the run, QuerySurge will produce informative summary and detailed reports that can be viewed immediately or shared with the team via the automated email scheduler. QuerySurge can validate up to 100% of your data, providing full coverage and mitigating risk while providing reports that highlight every data difference, down to the individual character.
  3. Tracks test history (user, date, each test version); provides reporting on test version history for convenient auditing; supports tracking of deviations from approved tests; records all test execution owners by name and date; delivers auditable results reporting of test cycles; stores all test outcomes and test data for post-facto review or audit; offers a read-only user type for reviewing test assets; supports off-database archiving of results (for future restore) for effective long-term results data management.
  4. QuerySurge provides insight into the health of your data throughout your organization through BI dashboards and reporting. It is a collaborative tool that supports distributed use across your organization and provides a shareable, holistic view of your data’s health and of your organization’s data management maturity.
  5. QuerySurge helps your team coordinate your data quality initiatives while speeding up your development and testing cycles and finding your bad data. Why risk having your team identify trends and develop strategic initiatives when the underlying data is incorrect? QuerySurge reduces this risk.
  6. QuerySurge finds bad data by natively connecting to any data source, whether a database, flat file or XML, and to any data target, whether a database, file, XML, data warehouse or Hadoop implementation. QuerySurge pulls data from the source and the target, compares them very quickly (typically in a few minutes), and then produces reports that show every data difference, even if there are millions of rows and hundreds of columns in the test. These reports can be automatically emailed to your team. You can pick from a multitude of reports or export the results so that you can build your own reports.
  7. Your distributed team from around the world can use any of these web browsers: Internet Explorer, Chrome, Firefox and Safari. QuerySurge installs on Windows and Linux. It connects to any JDBC-compliant data source, even one that is not listed here.
  8. QuerySurge can be used by active practitioners such as testers & developers to create and launch tests, or by managers, analysts and operations staff to view data test results and the overall health of the data. QuerySurge facilitates this by providing 2 types of licenses: (1) full user & (2) participant user. (1) Full User: This type of user has unlimited access to create QueryPairs, Suites, and Scenarios. This user can also schedule and run tests, see results, run and export reports, and export data. Perfect for anyone creating and/or running data tests while performing analysis of results. (2) Participant User: This user cannot create or run tests, but has access to all other information, including viewing all query pairs, results, and reports, receiving email notifications, and exporting test results and reports. Perfect for managers, analysts, architects, DBAs, developers, and operations users who need to know the health of their data.