Cdha hinf6210 data_cleaning_v1.2_sept20_2011
 

Presentation Transcript

Slide 1 of 14 (Outline: 1. Intro, 2. Background, 3. Key Indexes, 4. Cases, 5. Summary)
Data Cleaning Process in the project: Determination and visualization of family practice sizes and characteristics in Nova Scotia
Conrad Ng, Intern
Dr. Anatoliy Gruzd, Supervisor
Dr. Calvino Cheng, Site Supervisor

Slide 2 of 14
Project: Determination and visualization of family practice sizes and characteristics in Nova Scotia
Purpose: To help make better use of laboratory resources
Goals:
  1) Determine practice sizes in the outpatient setting
  2) Discover physician and patient migration patterns
  3) Benchmark problematic cases:
     a) Overuse
     b) Underuse

Slide 3 of 14
Descriptive Statistics (Before cleaning)

  Rows         | 1007974
  Columns      | 28
  Physicians   | 966
  Clinics      | 626
  Patients     | 288963
  No addresses | 23

  Patient count per  | Physician | Clinic
  Mean               | 392.61    | 585.17
  Standard Error     | 16.13     | 50.29
  Median             | 122       | 162
  Mode               | 1         | 1
  Standard Deviation | 501.11    | 1257.15
  Range              | 2429      | 18434
  Minimum            | 1         | 1
  Maximum            | 2430      | 18435
  Sum                | 378868    | 365731
  Count              | 965       | 625
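
The labels on this slide (Mean, Standard Error, Median, Mode, and so on) follow the layout of a spreadsheet descriptive-statistics report. Below is a minimal Python sketch of how the "patient count per physician" column could be reproduced. It assumes each order row carries a physician identifier and a patient identifier, and that a patient is counted once per physician. The function names and input shape are hypothetical. The standard error is the sample standard deviation divided by the square root of the count, which is consistent with the slide (501.11 / √965 ≈ 16.13).

```python
import math
import statistics
from collections import defaultdict


def patients_per_group(rows):
    """Count distinct patients per group (e.g. physician or clinic).

    rows: iterable of (group_id, patient_id) pairs, one per order row.
    Hypothetical input shape; a patient is counted once per group.
    """
    patients = defaultdict(set)
    for group_id, patient_id in rows:
        patients[group_id].add(patient_id)
    return [len(ids) for ids in patients.values()]


def describe(values):
    """Summary statistics in the slide's order (needs at least 2 values)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


if __name__ == "__main__":
    # Toy order rows: dr_a sees p1 twice and p2 once, dr_b sees p3.
    orders = [("dr_a", "p1"), ("dr_a", "p2"), ("dr_a", "p1"), ("dr_b", "p3")]
    for label, value in describe(patients_per_group(orders)).items():
        print(f"{label}: {value}")
```

The same two functions apply to the clinic column by grouping on a clinic identifier instead of a physician identifier.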

Slide 4 of 14
Key Indexes
  a) Physicians: PMB (problem)
  b) Clinics: Address (problem)
  c) Time active: Order Date √ (slight problem)
  d) Patients: Patient Millennium ID √
  e) Orders: Order ID √
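
The slide rates each candidate index as usable (√) or problematic. One way to make that judgement repeatable is to check two things: whether the key is present on every row, and whether each descriptive label maps to exactly one key value. Below is a minimal sketch, assuming records are dictionaries. The field names are hypothetical, and the toy records reuse the Ilsley/Illsley spelling variants from the address cases.

```python
from collections import defaultdict


def profile_key(records, key, label):
    """Report missing key values and labels that map to more than one key value.

    records: list of dicts; key: candidate key field (e.g. an address or PMB);
    label: descriptive field to compare against (e.g. a clinic or physician name).
    Field names are hypothetical.
    """
    missing = sum(1 for r in records if r.get(key) in (None, ""))
    keys_per_label = defaultdict(set)
    for r in records:
        if r.get(key) not in (None, ""):
            keys_per_label[r[label]].add(r[key])
    # A label with several distinct key values suggests the key is not clean.
    conflicts = {lab: sorted(ks) for lab, ks in keys_per_label.items() if len(ks) > 1}
    return {"missing": missing, "conflicts": conflicts}


if __name__ == "__main__":
    records = [
        {"CLINIC": "Medicine in Motion", "ADDR": "5-121 Illsley Avenue"},
        {"CLINIC": "Medicine in Motion", "ADDR": "5-121 Ilsley Avenue"},
        {"CLINIC": "Family Practice Clinic", "ADDR": "PO Box 94"},
        {"CLINIC": "Clinic X (hypothetical)", "ADDR": ""},
    ]
    print(profile_key(records, "ADDR", "CLINIC"))
```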

Slide 5 of 14 (Cases 1/4)
PMB: Physician Registration Number

  DR_NAME                                   | PMB   | PMB_i2
  Cameron , Marianne C (FF Spring Garden)   | 90802 | 14653
  Cameron , Marianne C (PRIM)               | 14653 | 14653
  Cameron , Marianne C (Tacoma)             | 92526 | 14653
  Cameron , Stewart Martin                  | 7748  | 7748
  Campbell , Donald MacLeod                 | 4055  | 4055
  Campbell , Genevieve Marie                | 6678  | 6678
  Campbell , Glenn R (FF Dartmouth)         | 90808 | 11889
  Campbell , Glenn R (FF Sackville)         | 90809 | 11889
  Campbell , Glenn R (FF Spring Garden)     | 90807 | 11889
  Campbell , Glenn R (PRIM)                 | 11889 | 11889
  Kinley , Cecil Edwin (Park West)          | 90926 | 90926
  Kinley , Cecil Edwin, Jr (Parkland)       | 7453  | 7453
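
The PMB_i2 column consolidates site-specific registration numbers onto one number per physician. The slide does not state the rule, so here is a minimal sketch of one rule that is consistent with the rows shown. It assumes the site appears as a parenthesized suffix in DR_NAME and that the "(PRIM)" row carries the primary PMB. The sketch strips the suffix to get a base name. When that base name has a PRIM row, all of its PMBs map to the PRIM number. Names that differ outside the parentheses, such as the "Jr" entry for Kinley, stay separate. The function names are hypothetical.

```python
import re

# Trailing "(Site)" in DR_NAME, e.g. "Cameron , Marianne C (Tacoma)".
SITE_SUFFIX = re.compile(r"\s*\(([^)]*)\)\s*$")


def split_name(dr_name):
    """Return (base_name, site); site is None when DR_NAME has no parenthesized suffix."""
    match = SITE_SUFFIX.search(dr_name)
    if match is None:
        return dr_name.strip(), None
    return dr_name[: match.start()].strip(), match.group(1)


def consolidate_pmb(rows):
    """Return (DR_NAME, PMB, PMB_i2) triples for (DR_NAME, PMB) input rows.

    Assumed rule: if a base name has a "(PRIM)" row, every PMB under that base
    name maps to the PRIM PMB; otherwise the row keeps its own PMB.
    """
    primary = {}
    for dr_name, pmb in rows:
        base, site = split_name(dr_name)
        if site == "PRIM":
            primary[base] = pmb
    return [(dr_name, pmb, primary.get(split_name(dr_name)[0], pmb)) for dr_name, pmb in rows]


if __name__ == "__main__":
    sample = [
        ("Cameron , Marianne C (FF Spring Garden)", 90802),
        ("Cameron , Marianne C (PRIM)", 14653),
        ("Cameron , Stewart Martin", 7748),
        ("Kinley , Cecil Edwin (Park West)", 90926),
        ("Kinley , Cecil Edwin, Jr (Parkland)", 7453),
    ]
    for dr_name, pmb, pmb_i2 in consolidate_pmb(sample):
        print(f"{dr_name:<40} {pmb:>6} {pmb_i2:>6}")
```

On the rows in the table, this rule gives the same PMB_i2 values as the slide. A name-based rule still needs manual review, because two physicians can share a name.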

Slide 6 of 14 (Cases 2/4)
Address

  STREET_ADDR                  | STREET_ADDR2                | STREET_ADDR3
  Rapha Medical Centre         | 50 Tacoma Dr                |
  Suite 8                      | 50 Tacoma Drive             |
  Medicine in Motion           | 5-121 Illsley Avenue        |
  Medicine in Motion           | 5-121 Ilsley Avenue         |
  Tantallon Family Practice    | 5150 St. Margarets Bay Road | Suite 202
  Scotia Square Medicine       | 5201 Duke Street            | Suite #0270
  Scotia Square Medical        | 5201 Duke Street            | Suite 0270
  Scotia Square Medical Clinic | 5201 Duke Street            | Suite 0270
  Sackville Medical Clinic     | PO Box 8-520 Sackville Dr   |
  Family Practice Clinic       | PO Box 94                   |

Slide 8 of 14 (Cases 2/4)
Address (3 columns → 5 columns)

  Name_ADDR1                   | Dept_ADDR2 | St._ADDR3                 | PO_ADDR4  | ADDR5
  Rapha Medical Centre         |            | 50 Tacoma Dr.             |           |
                               |            | 50 Tacoma Dr.             |           | Suite 8
  Medicine in Motion           |            | 121 Ilsley Ave.           |           | Suite 5
  Medicine in Motion           |            | 121 Ilsley Ave.           |           | Suite 5
  Tantallon Family Practice    |            | 5110 St Margarets Bay Rd. |           | Suite 202
  Scotia Square Medicine       |            | 5201 Duke St.             |           | Suite 270
  Scotia Square Medical        |            | 5201 Duke St.             |           | Suite 270
  Scotia Square Medical Clinic |            | 5201 Duke St.             |           | Suite 270
  Sackville Medical Clinic     |            | 520 Sackville Dr.         | PO Box 8  |
  Family Practice Clinic       |            |                           | PO Box 94 |
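
The cleaned layout splits free-text address lines into separate fields, abbreviates street types, and normalizes suite numbers. Below is a minimal sketch of those rules for a single street line. It assumes Canadian unit-civic notation, where "5-121" means suite 5 at civic number 121. The abbreviation table and the spelling table (for known misspellings such as "Illsley") are illustrative assumptions, not the project's own tables.

```python
import re

# Illustrative lookup tables (assumptions, not the project's actual tables).
STREET_TYPES = {"drive": "Dr.", "dr": "Dr.", "avenue": "Ave.", "ave": "Ave.",
                "street": "St.", "st": "St.", "road": "Rd.", "rd": "Rd."}
SPELLING_FIXES = {"Illsley": "Ilsley"}


def normalize_suite(text):
    """'Suite #0270' -> 'Suite 270' (drop '#' and leading zeros)."""
    m = re.fullmatch(r"\s*Suite\s*#?\s*0*(\w+)\s*", text, re.IGNORECASE)
    return f"Suite {m.group(1)}" if m else text.strip()


def split_street_line(line):
    """Split one street line into (street, po_box, suite)."""
    po_box, suite = "", ""
    # Leading PO box, optionally joined to the civic number: "PO Box 8-520 ...".
    m = re.match(r"\s*PO Box\s*(\d+)\s*-?\s*", line, re.IGNORECASE)
    if m:
        po_box = f"PO Box {m.group(1)}"
        line = line[m.end():]
    # Unit-civic form: "5-121 Ilsley Avenue" -> suite 5, civic number 121.
    m = re.match(r"\s*(\d+)-(\d+)\s+", line)
    if m:
        suite = f"Suite {m.group(1)}"
        line = f"{m.group(2)} " + line[m.end():]
    words = [SPELLING_FIXES.get(w, w) for w in line.split()]
    # Abbreviate the street type when it is the last word.
    if words and words[-1].lower().rstrip(".") in STREET_TYPES:
        words[-1] = STREET_TYPES[words[-1].lower().rstrip(".")]
    return " ".join(words), po_box, suite


if __name__ == "__main__":
    for raw in ["5-121 Illsley Avenue", "PO Box 8-520 Sackville Dr", "50 Tacoma Drive"]:
        print(raw, "->", split_street_line(raw))
```

Deciding which of the original three columns holds a clinic name rather than a street still needs its own rule or manual review. This sketch does not attempt that step.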

Slide 9 of 14 (Cases 3/4)
Address

  Name_ADDR1                  | St._ADDR3              | ADDR5      | Clinic_Type
                              | 5991 Spring Garden Rd. | Suite 820  | General
                              | 5991 Spring Garden Rd. | Suite 850  | General
  Dr. Kevin Carbyn            | 5991 Spring Garden Rd. | Suite 1040 | General
  Family Focus Medical Clinic | 5991 Spring Garden Rd. | Suite 101  | Family Focus
                              | 50 Tacoma Dr.          |            | General
                              | 50 Tacoma Dr.          | Suite 8    | General
  Tacoma Family Medicine      | 58 Tacoma Dr.          | Suite 101  | Walk-In
  Tacoma Family Medicine      | 70 Tacoma Dr.          | Suite 101  | Walk-In
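
Once addresses share one format, records can be grouped to find a practice that may have been entered more than once. Examples from the slides include one practice name at two civic numbers on Tacoma Dr., or several names at 5201 Duke St., Suite 270. Below is a minimal sketch that flags both kinds of candidates for manual review. It assumes records are dictionaries keyed by the slide's column names. The grouping rules are an illustration, not the project's documented method.

```python
from collections import defaultdict


def duplicate_candidates(records):
    """Flag names with several addresses, and addresses (street + suite) shared by several names."""
    addresses_by_name = defaultdict(set)
    names_by_address = defaultdict(set)
    for r in records:
        address = (r["St._ADDR3"], r.get("ADDR5", ""))
        name = r.get("Name_ADDR1", "")
        if name:
            addresses_by_name[name].add(address)
        names_by_address[address].add(name)
    same_name = {n: sorted(a) for n, a in addresses_by_name.items() if len(a) > 1}
    shared_address = {a: sorted(n) for a, n in names_by_address.items() if len(n) > 1}
    return same_name, shared_address


if __name__ == "__main__":
    records = [
        {"Name_ADDR1": "Tacoma Family Medicine", "St._ADDR3": "58 Tacoma Dr.", "ADDR5": "Suite 101"},
        {"Name_ADDR1": "Tacoma Family Medicine", "St._ADDR3": "70 Tacoma Dr.", "ADDR5": "Suite 101"},
        {"Name_ADDR1": "Scotia Square Medicine", "St._ADDR3": "5201 Duke St.", "ADDR5": "Suite 270"},
        {"Name_ADDR1": "Scotia Square Medical Clinic", "St._ADDR3": "5201 Duke St.", "ADDR5": "Suite 270"},
    ]
    same_name, shared_address = duplicate_candidates(records)
    print(same_name)
    print(shared_address)
```

The function only surfaces candidates. Whether two suites in one building are one clinic or several still needs a human decision.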

Slide 10 of 14 (Cases 4/4)
Postal Code (gpsvisualizer.com)
  B2V2JS → B2V2J5
  BOJ2L0 → B0J2L0
  BEH2Y9 → B3H2Y9
Sample ("map it!"): 1) B3L2T2  2) BOL2T2
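
The corrections on this slide follow from the fixed shape of a Canadian postal code. Letters and digits alternate (A1A 1A1), and the letters D, F, I, O, Q and U are never used. W and Z never appear in the first position. A letter in a digit position is therefore a detectable error. Below is a minimal sketch that validates a code and replaces look-alike letters in digit positions. The substitution table covers only the three confusions shown on the slide (O→0, S→5, E→3). That table is an assumption and can be extended.

```python
import re

# Canadian postal code: letter-digit-letter digit-letter-digit, optional space.
POSTAL_CODE = re.compile(r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d$")

# Look-alike letters seen in digit positions on the slide (assumed, extendable).
LETTER_TO_DIGIT = {"O": "0", "S": "5", "E": "3"}

DIGIT_POSITIONS = (1, 3, 5)  # 0-based positions once spaces are removed


def is_valid_postal_code(code):
    return bool(POSTAL_CODE.match(code.upper()))


def repair_postal_code(code):
    """Swap look-alike letters in digit positions; return the input unchanged if still invalid."""
    chars = list(code.replace(" ", "").upper())
    if len(chars) != 6:
        return code
    for i in DIGIT_POSITIONS:
        chars[i] = LETTER_TO_DIGIT.get(chars[i], chars[i])
    repaired = "".join(chars)
    return repaired if is_valid_postal_code(repaired) else code


if __name__ == "__main__":
    for raw in ["B2V2JS", "BOJ2L0", "BEH2Y9"]:
        print(raw, "->", repair_postal_code(raw))
```

A repaired code is only well-formed, not necessarily the right one. Geocoding it, as the slide does with gpsvisualizer.com, remains a useful second check.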

Slide 11 of 14
Descriptive Statistics (Before and After cleaning)

               | Before  | After
  Rows         | 1007974 | 925680
  Columns      | 28      | 37
  Physicians   | 966     | 426
  Clinics      | 626     | 198
  Patients     | 288963  | 278689
  No addresses | 23      | 0
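
Expressing the before/after counts as relative changes makes the scale of the consolidation easier to compare across indexes. For example, physicians fall from 966 to 426, a drop of about 55.9%. Below is a minimal sketch using the figures from this slide. The `percent_change` helper is hypothetical.

```python
# Before/after counts copied from the slide.
COUNTS = {
    "Rows": (1007974, 925680),
    "Columns": (28, 37),
    "Physicians": (966, 426),
    "Clinics": (626, 198),
    "Patients": (288963, 278689),
    "No addresses": (23, 0),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    for name, (before, after) in COUNTS.items():
        print(f"{name:<13} {before:>9} -> {after:>7}  {percent_change(before, after):+.1f}%")
```

On these figures it reports drops of roughly 8.2% in rows, 55.9% in physicians, 68.4% in clinics and 3.6% in patients, and a 32.1% increase in columns.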

Slide 12 of 14
Descriptive Statistics (Before and After cleaning)

  Patient count per  | Physician before | Physician after | Clinic before | Clinic after
  Mean               | 392.61           | 814.15          | 585.17        | 1605.196
  Standard Error     | 16.13            | 27.41           | 50.29         | 186.7024
  Median             | 122              | 797             | 162           | 773
  Mode               | 1                | 1               | 1             | 1
  Standard Deviation | 501.11           | 565.74          | 1257.15       | 2633.762
  Range              | 2429             | 2604            | 18434         | 19354
  Minimum            | 1                | 1               | 1             | 1
  Maximum            | 2430             | 2605            | 18435         | 19355
  Sum                | 378868           | 346830          | 365731        | 319434
  Count              | 965              | 426             | 625           | 199 (-2)**

  ** Recent adjustment to duplication
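
Several of these statistics are linked, so a reported summary can be checked for transcription errors. The mean should equal Sum / Count. The standard error should equal the standard deviation divided by √Count. The range should equal Maximum minus Minimum. Below is a minimal sketch of that check, with a tolerance that allows for the rounding on the slide. The function name and the tolerance value are assumptions.

```python
import math


def check_summary(s, tol=0.01):
    """Return the identities that a descriptive-statistics summary violates."""
    failures = []
    if abs(s["Sum"] / s["Count"] - s["Mean"]) > tol:
        failures.append("Mean = Sum / Count")
    if abs(s["Standard Deviation"] / math.sqrt(s["Count"]) - s["Standard Error"]) > tol:
        failures.append("Standard Error = Standard Deviation / sqrt(Count)")
    if s["Maximum"] - s["Minimum"] != s["Range"]:
        failures.append("Range = Maximum - Minimum")
    return failures


# "Physician after" column from the slide.
physician_after = {"Mean": 814.15, "Standard Error": 27.41, "Standard Deviation": 565.74,
                   "Range": 2604, "Minimum": 1, "Maximum": 2605, "Sum": 346830, "Count": 426}

if __name__ == "__main__":
    print(check_summary(physician_after))  # prints []
```

All four columns on the slide pass this check when the count of 199 is used for the clinic column after cleaning.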

Slide 13 of 14
How did the errors occur?
  • Data entry errors (100,000 new records daily): an average of 200 errors in health card numbers, 0.5% [according to the data integrity officer]
  • Changes in addresses and names
  • Amalgamation of 6 different systems in the past
  • Need for compatibility with other systems [presently 3]
Summary
  • Need for greater authority control at the input level (e.g., electronic forms, an index specialist)
  • Index! Or a relationship to indexed data!
  • Automation
  • Increase data usage/digestion so the data become comparable to other counts
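
The first summary point, authority control at the input level, amounts to rejecting a record before it is stored if its key fields do not match a controlled list or a fixed format. Below is a minimal sketch of such a check. It assumes three things: an authority table of primary PMBs, the clinic types seen in the cleaned data (General, Family Focus, Walk-In) as a controlled vocabulary, and the Canadian postal-code pattern. The record's field names and shape are hypothetical.

```python
import re

POSTAL_CODE = re.compile(r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d$")
CLINIC_TYPES = {"General", "Family Focus", "Walk-In"}  # values seen in the cleaned data


def validate_order(record, primary_pmbs):
    """Return a list of input errors for one order record (an empty list means accept)."""
    errors = []
    if record.get("PMB") not in primary_pmbs:
        errors.append(f"unknown or non-primary PMB: {record.get('PMB')!r}")
    if record.get("Clinic_Type") not in CLINIC_TYPES:
        errors.append(f"clinic type not in controlled list: {record.get('Clinic_Type')!r}")
    if not POSTAL_CODE.match(record.get("POSTAL_CODE", "").upper()):
        errors.append(f"malformed postal code: {record.get('POSTAL_CODE')!r}")
    return errors


if __name__ == "__main__":
    primary = {14653, 11889, 7748}  # hypothetical authority table of primary PMBs
    print(validate_order({"PMB": 14653, "Clinic_Type": "General", "POSTAL_CODE": "B3H2Y9"}, primary))
    print(validate_order({"PMB": 90802, "Clinic_Type": "Walk In", "POSTAL_CODE": "BEH2Y9"}, primary))
```

In an electronic form, the same checks would run before submission, so that errors like those in the cases above are caught at entry rather than during later cleaning.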

Slide 14 of 14
Questions? Comments?
Conrad Ng, c.ng@dal.ca