Data quality and coverage assessments
for the Secure Anonymous Information
Linkage databank
J. C. Demmler, C. Brooks, R. A. Lyons
College of Medicine, Swansea University
4th SHIP conference 2013, St. Andrews
29th of August 2013
Quality of routinely collected data
has historically been collected for administrative purposes
data coverage differs between datasets
and also changes over time and space
and between variables
Need for flexible data documentation
1 Dataset documentation
2 Project documentation
3 Reproducible publications
4 Support for a web-based enquiry system
Other considerations
usable by data analysts with minimal training
import and/or run code on the fly
option to display SQL and statistical code
reproducible
Platform considerations
LaTeX, LyX or OpenOffice
using the listings package
write or import syntax-highlighted code
hide code using the comment package
weave together code and text following reproducible research principles
Reproducible research (literate programming)
Package name    Language
Sweave          S-PLUS & R
knitr           R
StatWeave       Stata & SAS & R
SASweave        SAS & R
MATweave        Matlab/Octave
Pweave          Python
Noweb           C, C++, Perl, Java etc.
SPSS could be called through an R shell or Noweb
STEP 1: Prepare SQL code
LyX
include the SQL output in the documentation if wanted
STEP 2: SQL → R
connect to the SAIL database
connect to the SHIP example1.sql script
run the SHIP example1.sql script and save the results in a table called my.tables
manipulate and reformat the table
create some summary statistics for elements in the table
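The steps in STEP 2 can be sketched as follows. The slides describe doing this in R against the SAIL database; this sketch uses Python with an in-memory SQLite database as a stand-in so that it runs anywhere. The table name `admissions`, its columns, the query text and the sample rows are all hypothetical and are not taken from SAIL or from the SHIP example1.sql script.

```python
import sqlite3
import statistics

# Hypothetical stand-in for the contents of the SQL script (the real query is not shown).
EXAMPLE_SQL = """
SELECT admission_year, COUNT(*) AS n_records
FROM admissions
GROUP BY admission_year
ORDER BY admission_year;
"""


def make_demo_db():
    """Build an in-memory database with made-up rows (not SAIL data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (person_id INTEGER, admission_year INTEGER)")
    rows = [(1, 2001), (2, 2001), (3, 2002), (4, 2004), (5, 2004), (6, 2004)]
    conn.executemany("INSERT INTO admissions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_query(conn, sql):
    """Run the query and return the result as a list of dicts, one per row."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def summarise(table, column):
    """Compute a few summary statistics for one numeric column of the table."""
    values = [row[column] for row in table]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


conn = make_demo_db()
my_tables = run_query(conn, EXAMPLE_SQL)      # results saved in a table-like object
summary = summarise(my_tables, "n_records")   # summary statistics for one element
conn.close()

print(my_tables)
print(summary)
```

Keeping the query text separate from the code that runs it mirrors the idea of preparing the SQL first and importing it into the analysis step.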
STEP 3: R → documentation
Create formatted table straight from R object using xtable
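As a rough illustration of what a tool such as xtable does, the sketch below turns a small table into LaTeX `tabular` markup. It is written in Python rather than R and is not xtable's implementation or output format; the function names `latex_escape` and `to_latex_table` and the sample rows are assumptions made for this example.

```python
def latex_escape(value):
    """Escape characters that have a special meaning in LaTeX."""
    replacements = {
        "\\": r"\textbackslash{}",
        "&": r"\&",
        "%": r"\%",
        "_": r"\_",
        "#": r"\#",
        "$": r"\$",
        "{": r"\{",
        "}": r"\}",
    }
    # Replace character by character so escapes are never escaped twice.
    return "".join(replacements.get(ch, ch) for ch in str(value))


def to_latex_table(rows, columns, caption):
    """Render a list of dicts as a LaTeX table (first column left, rest right-aligned)."""
    align = "l" + "r" * (len(columns) - 1)
    lines = [
        r"\begin{table}[ht]",
        r"\centering",
        r"\begin{tabular}{" + align + "}",
        r"\hline",
        " & ".join(latex_escape(c) for c in columns) + r" \\",
        r"\hline",
    ]
    for row in rows:
        lines.append(" & ".join(latex_escape(row[c]) for c in columns) + r" \\")
    lines += [
        r"\hline",
        r"\end{tabular}",
        r"\caption{" + latex_escape(caption) + "}",
        r"\end{table}",
    ]
    return "\n".join(lines)


# Made-up rows for illustration only.
rows = [
    {"admission_year": 2001, "n_records": 2},
    {"admission_year": 2002, "n_records": 1},
    {"admission_year": 2004, "n_records": 3},
]
latex = to_latex_table(rows, ["admission_year", "n_records"], "Records per year (made-up data)")
print(latex)
```

Generating the table markup from the data object, rather than typing it by hand, is what keeps the documented numbers in step with the query results.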
Years covered by dataset
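One simple way to check which years a dataset covers is to count records per year and list the years in the range that have none. The sketch below is an assumption about how such a check could look, not the authors' method; the function name `years_covered` and the sample years are made up.

```python
from collections import Counter


def years_covered(record_years):
    """Summarise temporal coverage: first and last year, records per year, and gaps."""
    counts = Counter(record_years)
    if not counts:
        raise ValueError("no records supplied")
    first, last = min(counts), max(counts)
    # Years inside the covered range that have no records at all.
    missing = [year for year in range(first, last + 1) if year not in counts]
    return {
        "first": first,
        "last": last,
        "counts": dict(sorted(counts.items())),
        "missing": missing,
    }


# Made-up record years for illustration only.
coverage = years_covered([2001, 2001, 2002, 2004, 2004, 2004])
print(coverage)
```

The same counting approach could be applied per variable or per area to look at coverage between variables and over space.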
Spatial quality and coverage
Taking it further
Publish report online
export from LaTeX or LyX to HTML
create HTML straight from R Markdown
Query basic data information through web site
Interactive web applications using RStudio Shiny
HTML form as input variables for a LaTeX or R Markdown file
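Publishing a report online amounts to rendering the same results as HTML. The following is a minimal sketch in Python, assuming a hypothetical helper `to_html_report` and made-up rows; it does not reproduce what R Markdown or Shiny generate.

```python
from html import escape


def to_html_report(title, rows, columns):
    """Render a title and a table of results as a small standalone HTML page."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body><h1>{escape(title)}</h1>\n"
        f"<table>\n<tr>{header}</tr>\n{body}\n</table>\n"
        "</body></html>"
    )


# Made-up rows for illustration only.
rows = [
    {"admission_year": 2001, "n_records": 2},
    {"admission_year": 2002, "n_records": 1},
]
html_report = to_html_report("Coverage & quality (demo)", rows, ["admission_year", "n_records"])
print(html_report)
```

Escaping every value before it is written into the page matters once the inputs come from a web form rather than from the analyst.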
Thank you ...
Joanne Demmler
Email: j.demmler@swansea.ac.uk
SAIL databank
URL: www.SAILDatabank.com
Email: SAILDatabank@swansea.ac.uk
LaTeX: www.latex-project.org
LyX: www.lyx.org
R: www.r-project.org
RStudio: www.rstudio.com
