Comparing Files without Proc Compare
Pharmasug 2008
Alejandro Jaramillo
Russ Lavery
Jaramillo & Lavery Pharmasug 208 1
• Purpose
• To present an efficient methodology for comparing and validating files
that are expected to have the same data structure and contents
• Business Case
• When data is migrated to a new system, business rules may inadvertently
change. An adequate method and process for independently and efficiently
flagging unexpected changes in the data is key to the project's success
• Parallel File Comparison
• Parallel File Comparison is defined as the process in which an independent
team recreates data files from raw data sources and compares them to the
files produced by the development team to feed an enterprise or
production system. The goal is to detect differences caused by differing
interpretation or application of business rules, or by human error.
Scenarios for Parallel Data Comparisons
• Forward: Data is compared and validated independently at every stage
before it goes into production to feed enterprise applications
• Backward: Data is fed to the enterprise application and results are
regenerated independently from raw data. Even if the enterprise results
match, the granular data feeding the enterprise application is still validated
[Figure: four data stages (Raw Data, Granular, Pre-summarization, Enterprise Results), each with a Validation step. Forward and Backward label the two directions of comparison.]
The Parallel File Comparison Method
• The method discussed in this presentation is based on using SAS for
comparison and validation. However, the method can be applied with
any other system
• SAS Proc Compare provides an excellent way to compare files when
they are expected to have no differences. However, when Proc
Compare shows differences, a more detailed methodology should be
used to trace the source of the differences.
• The method for comparing two files that are supposed to have the
same data and file structure, but show differences via Proc
Compare, has the following 6 steps:
1. Produce the files to be compared against development or production
data
2. Start comparing pairs of similar files using Proc Compare. If the comparison
fails, go to #3
3. Compare file structure via Proc Contents => if this fails, stop and get the files to
conform to the same structure
4. Define keys and data
5. On both files, run summaries on major keys (time, period, product code,
market code, etc.)
6. Compare both raw files at the record level with regard to keys and data
=> if 5 or 6 fails, an inquiry into the file differences using the raw data must
follow
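Step 5 (summaries on major keys) can be sketched outside SAS as follows. This is a minimal Python illustration, not the authors' code: the function names, the choice of Reg as the major key, and the use of a row count plus a grand total as the summary are all assumptions made for the example. The rows are the deck's sample left and right files.

```python
from collections import defaultdict

# Rows are (Store, Reg, Prod, Q1, Q2, Q3), copied from the deck's sample files.
LEFT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
RIGHT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]


def summarize(rows, key_index, value_indexes):
    """Return {key value: (row count, total of the value columns)}."""
    summary = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = summary[row[key_index]]
        entry[0] += 1
        entry[1] += sum(row[i] for i in value_indexes)
    return {key: tuple(entry) for key, entry in summary.items()}


def summary_diffs(left, right, key_index, value_indexes):
    """Return only the key values whose summaries differ between the files."""
    left_sum = summarize(left, key_index, value_indexes)
    right_sum = summarize(right, key_index, value_indexes)
    return {
        key: (left_sum.get(key), right_sum.get(key))
        for key in sorted(set(left_sum) | set(right_sum))
        if left_sum.get(key) != right_sum.get(key)
    }


# Summarize by Reg (column 1), an assumed choice of major key.
DIFFS = summary_diffs(LEFT, RIGHT, key_index=1, value_indexes=(3, 4, 5))
for key, (left_summary, right_summary) in DIFFS.items():
    print(key, left_summary, right_summary)
```

A key value that appears in the output marks a slice of the data worth drilling into at the record level; a value of None means the key is absent from that file.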
Early Diagnosis
If Proc Compare shows differences, a more detailed analysis is
required. Start with Proc Contents.
After confirming that the files have the same structure and number of observations, a
more detailed check on the raw data must be conducted.
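The structure check can be approximated without Proc Contents. The sketch below is an assumed Python stand-in that compares field names, field types, and row counts of two record lists. The records and the names describe and structure_diffs are hypothetical and exist only for this example.

```python
def describe(rows):
    """Return ({field name: type name}, row count) for a list of dict records."""
    fields = {}
    for row in rows:
        for name, value in row.items():
            # Record the first type seen for each field.
            fields.setdefault(name, type(value).__name__)
    return fields, len(rows)


def structure_diffs(left_rows, right_rows):
    """List (item, left, right) tuples wherever the two structures disagree."""
    left_fields, left_n = describe(left_rows)
    right_fields, right_n = describe(right_rows)
    problems = []
    for name in sorted(set(left_fields) | set(right_fields)):
        left_type, right_type = left_fields.get(name), right_fields.get(name)
        if left_type != right_type:
            problems.append((name, left_type, right_type))
    if left_n != right_n:
        problems.append(("row count", left_n, right_n))
    return problems


# Hypothetical records: Q1 was read as text on the right, which also has an extra row.
left = [{"Store": "AAA", "Q1": 12}]
right = [{"Store": "AAA", "Q1": "12"}, {"Store": "BBB", "Q1": "10"}]
PROBLEMS = structure_diffs(left, right)
print(PROBLEMS)
```

An empty list means the structures agree and the record-level comparison can proceed. Anything else should be resolved first, as step 3 of the method requires.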
Left File
Store Reg Prod   LQ1  LQ2  LQ3
AAA   A   0P1    12   10   8
BBB   A   0P1    10   11   7
FFF   A   0P1    17   11   8
CCC   A   0P1    12   10   8
DDD   B   0P2    10   15   2
EEE   B   0P3    10   15   2

Right File
Store Reg Prod   LQ1  LQ2  LQ3
AAA   A   0P1    12   10   8
BBB   A   0P1    10   11   7
FFF   A   0P1    19   10   6
EEE   B   0P3    10   15   4
NNN   c   1P1    19   15   11
CCZ   A   0P1    12   10   8

Merged file ("." marks a missing value)
                 ---Left File---   ---Right File---
Store Reg Prod   LQ1  LQ2  LQ3     RQ1  RQ2  RQ3     Logic
AAA   A   0P1    12   10   8       12   10   8       Match
BBB   A   0P1    10   11   7       10   11   7       Match
FFF   A   0P1    17   11   8       19   10   6       Data
CCC   A   0P1    12   10   8       .    .    .       Key
DDD   B   0P2    10   15   2       .    .    .       ODD
EEE   B   0P3    10   15   2       10   15   4       Data
NNN   c   1P1    .    .    .       19   15   11      ODD
CCZ   A   0P1    .    .    .       12   10   8       Key

• This method checks EVERY value. Match lines on key and use an array and a loop to
compare data values.
• Checking keys and data gives exact answers.
• Key error: rows CCC and CCZ carry the same data under different keys.
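The "match on key, then loop over the data values" logic might look like the sketch below. It is written in Python rather than SAS, with a dictionary lookup in place of a SAS merge and a list comprehension in place of the array loop. Treating the first three columns (Store, Reg, Prod) as the key is an assumption, as is the requirement that keys are unique within each file. All names are hypothetical.

```python
# Rows are (Store, Reg, Prod, Q1, Q2, Q3), copied from the sample files.
LEFT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
RIGHT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]
N_KEYS = 3  # assumption: Store, Reg and Prod together form the key


def compare_records(left_rows, right_rows, n_keys=N_KEYS):
    """Return {key: (status, positions of differing data values, 1-based)}."""
    left = {row[:n_keys]: row[n_keys:] for row in left_rows}
    right = {row[:n_keys]: row[n_keys:] for row in right_rows}
    result = {}
    for key in sorted(set(left) | set(right)):
        if key not in right:
            result[key] = ("left only", [])
        elif key not in left:
            result[key] = ("right only", [])
        else:
            # Check every data value, pair by pair.
            differing = [
                position
                for position, (l_val, r_val) in enumerate(zip(left[key], right[key]), start=1)
                if l_val != r_val
            ]
            result[key] = ("data" if differing else "match", differing)
    return result


RESULT = compare_records(LEFT, RIGHT)
for key, (status, differing) in RESULT.items():
    print(key[0], status, differing)
```

Recording which positions differ, and not only that a row differs, is what lets a reviewer see that FFF disagrees in all three quarters while EEE disagrees only in the third.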
Top view for comparing left and right files
Merge the Left File and the Right File by keys, generating matching variables. The merge produces three outputs:
• Both: observations in both files (good and bad matches)
• bad_left: observations only in the left file
• Badright: observations only in the right file
Run freqs on the matching variables.
List and compare a few raw records from the bad files to get an idea
of the source of mismatches.
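The three-way split can be sketched with set operations on the keys. This Python version is an illustration under the same assumptions as before (the first three columns are the key, and keys are unique within a file); the function name split_by_key is hypothetical.

```python
# Rows are (Store, Reg, Prod, Q1, Q2, Q3), copied from the sample files.
LEFT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
RIGHT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]


def split_by_key(left_rows, right_rows, n_keys=3):
    """Split two files into matched pairs, left-only rows and right-only rows."""
    left = {row[:n_keys]: row for row in left_rows}
    right = {row[:n_keys]: row for row in right_rows}
    both = [(left[k], right[k]) for k in sorted(left.keys() & right.keys())]
    bad_left = [left[k] for k in sorted(left.keys() - right.keys())]
    bad_right = [right[k] for k in sorted(right.keys() - left.keys())]
    return both, bad_left, bad_right


BOTH, BAD_LEFT, BAD_RIGHT = split_by_key(LEFT, RIGHT)

# List a few raw records from each "bad" output to look for a pattern.
for row in BAD_LEFT[:5]:
    print("left only: ", row)
for row in BAD_RIGHT[:5]:
    print("right only:", row)
```

Printing the two bad lists side by side is often enough to spot a pattern such as CCC versus CCZ, where identical data sits under keys that differ by one character.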
mismatch (rows) by Left_vs_Right (columns); each cell shows Frequency, then Percent

                  | 1 = Obs in | 10 = Obs in | 11 = Obs in both |
                  | Left only  | Right only  | Left and Right   |  Total
------------------+------------+-------------+------------------+-------
NO problems       |          0 |           0 |                2 |      2
with key or data  |       0.00 |        0.00 |            25.00 |  25.00
------------------+------------+-------------+------------------+-------
Yes: Problems     |          2 |           2 |                2 |      6
with key or data  |      25.00 |       25.00 |            25.00 |  75.00
------------------+------------+-------------+------------------+-------
Total             |          2 |           2 |                4 |      8
                  |      25.00 |       25.00 |            50.00 | 100.00

• Checking keys and data gives exact answers.
• We are comparing data with missing values.
• Data problem: keys match but data values differ (rows FFF and EEE).
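The frequency table can be reproduced from the sample files with a small counter. In this Python sketch the matching variable is built as 1 for "in left" plus 10 for "in right", which yields the codes 1, 10 and 11 shown in the table headings. That construction is an inference from the headings, not something the slides state, and the function name is hypothetical.

```python
from collections import Counter

# Rows are (Store, Reg, Prod, Q1, Q2, Q3), copied from the sample files.
LEFT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
RIGHT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]


def mismatch_crosstab(left_rows, right_rows, n_keys=3):
    """Count observations by (has a problem, presence code)."""
    left = {row[:n_keys]: row[n_keys:] for row in left_rows}
    right = {row[:n_keys]: row[n_keys:] for row in right_rows}
    table = Counter()
    for key in left.keys() | right.keys():
        # Assumed coding: 1 = left only, 10 = right only, 11 = both.
        code = (1 if key in left else 0) + (10 if key in right else 0)
        # A row has a problem if it is missing from one side or its data differ.
        problem = code != 11 or left[key] != right[key]
        table[(problem, code)] += 1
    return table


TABLE = mismatch_crosstab(LEFT, RIGHT)
TOTAL = sum(TABLE.values())
for (problem, code), count in sorted(TABLE.items()):
    print(f"problem={problem!s:5} code={code:2d} n={count} pct={100 * count / TOTAL:.2f}")
```

On the sample data this gives four non-empty cells of two observations each (25.00 percent apiece), matching the table.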
mismatch (rows) by Sand_vs_ODW (columns); each cell shows Frequency, then Percent

                  | 1 = Obs in | 10 = Obs in | 11 = Obs in both |
                  | Left only  | Right only  | Left and Right   |  Total
------------------+------------+-------------+------------------+-------
NO problems       |          0 |           0 |                2 |      2
with key or data  |       0.00 |        0.00 |            25.00 |  25.00
------------------+------------+-------------+------------------+-------
Yes: Problems     |          2 |           2 |                2 |      6
with key or data  |      25.00 |       25.00 |            25.00 |  75.00
------------------+------------+-------------+------------------+-------
Total             |          2 |           2 |                4 |      8
                  |      25.00 |       25.00 |            50.00 | 100.00

• Ideally, all obs should be in the cell "NO problems" under "11 = Obs in both Left and Right".
• The cell "Yes: Problems" under "11 = Obs in both Left and Right" means the keys match but there are problems with the data.
• Checking keys and data gives exact answers.

Merged file ("." marks a missing value)
Store Reg Prod   STrx1 STrx2 STrx3   OTrx1 OTrx2 OTrx3   Logic
AAA   A   0P1    12    10    8       12    10    8       Match
BBB   A   0P1    10    11    7       10    11    7       Match
FFF   A   0P1    17    11    8       19    10    6       Data
CCC   A   0P1    12    10    8       .     .     .       Key
DDD   B   0P2    10    15    2       .     .     .       ODD
EEE   B   0P3    10    15    2       10    15    4       Data
NNN   c   1P1    .     .     .       19    15    11      ODD
CCZ   A   0P1    .     .     .       12    10    8       Key
Timeline
Left File and Right File (the same checks run on each):
• Check for duplicates
• Check for bad codes
• Clean the file
• Contents: date & size
• Freq by Prod_code
Merged files:
• Merge and calculate diff by Prod_cd
• Merge and calculate high-level diffs
• Identify every row with a problem
• Problem Analysis – Row
• Key Analysis
• Problem Analysis – Rx
Outputs along the timeline are reports (Rpt) and electronic copies.
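The per-file checks at the start of the timeline can be sketched as below. This is a Python illustration with hypothetical function names. The list of valid Reg codes is invented for the example (the slides do not define one), and the first three columns are again assumed to be the key.

```python
from collections import Counter

# Rows are (Store, Reg, Prod, Q1, Q2, Q3), copied from the sample files.
LEFT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 17, 11, 8), ("CCC", "A", "0P1", 12, 10, 8),
    ("DDD", "B", "0P2", 10, 15, 2), ("EEE", "B", "0P3", 10, 15, 2),
]
RIGHT = [
    ("AAA", "A", "0P1", 12, 10, 8), ("BBB", "A", "0P1", 10, 11, 7),
    ("FFF", "A", "0P1", 19, 10, 6), ("EEE", "B", "0P3", 10, 15, 4),
    ("NNN", "c", "1P1", 19, 15, 11), ("CCZ", "A", "0P1", 12, 10, 8),
]
VALID_REG = {"A", "B"}  # hypothetical code list, for illustration only


def duplicate_keys(rows, n_keys=3):
    """Return the keys that occur more than once in a file."""
    counts = Counter(row[:n_keys] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def bad_codes(rows, column, valid_codes):
    """Return the values in a column that are not in the valid code list."""
    return sorted({row[column] for row in rows} - set(valid_codes))


def freq_diff(left_rows, right_rows, column):
    """Return {value: (left count, right count)} where the counts differ."""
    left_freq = Counter(row[column] for row in left_rows)
    right_freq = Counter(row[column] for row in right_rows)
    return {
        value: (left_freq[value], right_freq[value])
        for value in sorted(set(left_freq) | set(right_freq))
        if left_freq[value] != right_freq[value]
    }


print(duplicate_keys(RIGHT))            # duplicates on the key
print(bad_codes(RIGHT, 1, VALID_REG))   # Reg values outside the assumed list
print(freq_diff(LEFT, RIGHT, 2))        # frequency differences by Prod
```

Running these checks before the merge keeps duplicate keys and stray codes from showing up later as unexplained record-level mismatches.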
Timeline QC Process
QC Programming
1. Write programs, for a series of files, in anticipation of file delivery.
2. A batch of files to be compared is delivered.
3. Run QC programs on the batch files.
4. Assemble report on batch files (concurrent w/ run).
5. Review/annotate report (1 day).
6. Arrange meeting with Responsible Group (1 week).
7. Discuss report with Responsible Group and create action items (1 day).
Outcomes
• FAIL: Investigate/fix action items (1 week). Create new version of files (2 weeks).
• File is OK or "close": If files are close, the user runs reports with the new file and
compares results (1 week). This check ends in Pass (log as file done) or FAIL.
Conclusion & Recommendations
• When data sources and processes change, a systematic approach
such as the one outlined in this presentation, comparing data at the top and
record levels, provides an efficient mechanism to track progress and to
identify and resolve potential problems.
• Comparison and validation should be included in the project timeline.
• QC metrics should be established for the development team. However,
total validation must be conducted independently.
• Differences in data must be accounted for 100% of the time.
