Report Calc for Quality Control

Report Calc for Quality Control (RCQC) is an interpreter for the RCQC scripting language, which text-mines log and data files to create reports and to control workflow within a workflow engine. It works as a Python command-line tool and also as a tool for the Galaxy bioinformatics platform. We're building a library of simple "recipe" scripts that extract quality control (QC) data from reports produced by tools such as FastQC, QUAST, CheckM and SPAdes into a common JSON data format. By placing RCQC in your workflow downstream from one of these tools, you can convert their textual or tabular output into a more standardized and software-friendly format.

Abstract
To provide a genomic narrative that can be trusted, microbiology
laboratories need quality control (QC) metrics to accompany their
genomic pipelines. QC metrics enable:
•  Implementing standards in routine lab sample processing
•  Performance comparison of pipeline optimizations or alternatives
•  Retrospective tracing of problems that arise
QC metrics are not easy to implement: they may need to be adjusted for
organism type, sample quality, sequencing technology and preparation,
and the mix of software components that are brought together in a
pipeline. Another challenge is to transform QC reporting from a manual
review of a pipeline’s disparate and often opaque application log files
into an automated system of reporting and decision making that can be
adjusted by researchers and system administrators who are not expert
programmers.
We have developed a general-purpose text-mining and reporting
application called Report Calc for Quality Control (RCQC) that works
directly within command-line scripts, or as a tool in Galaxy (an interactive
bioinformatics platform and workflow engine). An RCQC interpreter
follows instructions in an RCQC script to extract QC variables from various
application log and report files. It can implement rules that trigger
warning or failure statuses in an active pipeline. Various opportunities
arise for metrics along the stages of a genomic pipeline; our initial focus
is on basic assembly metrics as illustrated on this poster.
QC Ontology
Using the JSON-LD format’s metadata feature, RCQC can link particular
QC report terms to their standardized ontology counterparts. Creating a
controlled vocabulary for QC enables reports from disparate genomic
pipelines to be compared, which should eventually lead to a set of
pipeline metrics for accrediting commercial, government and open source
software. Within the context of the OBO Foundry of ontologies we are
introducing an ontology called GenEpiO (currently available at
https://github.com/Public-Health-Bioinformatics/irida_ontology), which
holds QC terms like “genome size ratio” and “contig count”. The
Protégé ontology editor makes it easy to browse the definitions of these terms.
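As a rough illustration of the JSON-LD linking described above, the following Python sketch attaches an "@context" block that maps report keys to ontology identifiers. The IRIs are example.org placeholders (the actual GenEpiO identifiers are not given here), the metric values are invented, and the function name to_json_ld is hypothetical; this is not RCQC's own output code.

```python
import json

# Placeholder IRIs: the real GenEpiO identifiers are NOT given in the text,
# so these example.org URLs are stand-ins for illustration only.
CONTEXT = {
    "genome_size_ratio": "http://example.org/genepio/genome_size_ratio",
    "contig_count": "http://example.org/genepio/contig_count",
}


def to_json_ld(metrics, context=CONTEXT):
    """Wrap a flat dict of QC metrics in a JSON-LD style document.

    Only terms that have an ontology mapping are listed in "@context";
    unmapped metrics are still reported, just without a link.
    """
    used = {term: iri for term, iri in context.items() if term in metrics}
    doc = {"@context": used}
    doc.update(metrics)
    return doc


# Invented metric values, purely for demonstration.
report = to_json_ld({"genome_size_ratio": 0.97, "contig_count": 212, "n50": 48210})
print(json.dumps(report, indent=2))
```

Because the mapping lives in "@context", two pipelines that use different local key names could still point at the same ontology term, which is what makes their reports comparable.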
Acknowledgements
IRIDA project funding is provided by Genome Canada, Genome BC, and
the Genomics R&D Initiative (GRDI) with additional support from Simon
Fraser University and Cystic Fibrosis Canada. We thank additional
project advisors for constructive comments.
RCQC Recipes
We have started a library of simple "recipe" scripts that extract quality
control (QC) data from various reports like FastQC, QUAST, CheckM and
SPAdes into the popular and software-friendly JSON format (an
auto-generated HTML version of the same content is also available). One can
override sections of an RCQC recipe with settings that test variations in a
pipeline job. The poster shows an example RCQC text-mining script with its
HTML and JSON report output, alongside typical report files from other
pipeline tools.
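To give a feel for what a text-mining recipe does, here is a minimal plain-Python sketch that pulls a few numbers out of a log excerpt and emits JSON. It does not use RCQC's recipe syntax, which is not reproduced in this text; the log excerpt is illustrative (a real flash.log may be worded differently), and the names RECIPE and extract are hypothetical.

```python
import json
import re

# Illustrative log excerpt, loosely modelled on a read-merging tool's summary.
# The exact wording of a real flash.log may differ.
SAMPLE_LOG = """\
[FLASH] Read combination statistics:
[FLASH]     Total pairs:      1000
[FLASH]     Combined pairs:   800
[FLASH]     Uncombined pairs: 200
[FLASH]     Percent combined: 80.00%
"""

# Each entry maps a report key to (regex with one capture group, type converter).
RECIPE = {
    "total_pairs": (r"Total pairs:\s+(\d+)", int),
    "combined_pairs": (r"Combined pairs:\s+(\d+)", int),
    "percent_combined": (r"Percent combined:\s+([\d.]+)%", float),
}


def extract(text, recipe):
    """Return a dict of values pulled out of `text`; unmatched keys map to None."""
    result = {}
    for key, (pattern, convert) in recipe.items():
        match = re.search(pattern, text)
        result[key] = convert(match.group(1)) if match else None
    return result


qc = extract(SAMPLE_LOG, RECIPE)
print(json.dumps(qc, indent=2))
```

Keeping the patterns in a data table rather than in code is one way to let a non-programmer adjust what gets extracted, in the spirit of overriding sections of a recipe.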
A Scripting Language For Standardized Evaluation Of Quality
Metrics In Galaxy And Command-line Driven Workflows
Damion M. Dooley (1); Aaron J. Petkau (2); Franklin Bristow (2);
Gary Van Domselaar (2); William W.L. Hsiao (3)
(1) Department of Pathology, University of British Columbia; (2) National Microbiology Laboratory, Public Health Agency of Canada; (3) Department of Pathology,
University of British Columbia & BC Public Health Microbiology and Reference Laboratory
Implementation
This work stemmed from the plan to enhance QC reporting on the
web-based Integrated Rapid Infectious Disease Analysis (www.IRIDA.ca)
project, which manages sequence libraries and pipelines for food-borne
pathogen assembly, annotation, SNP detection, and phylogenetic
analysis. RCQC has been developed to work as a command-line Python
app; in addition, since IRIDA uses Galaxy to execute its pipeline, we
have a Galaxy RCQC tool for “pro” users to develop recipes. We will be
offering a basic version of this tool that allows users without programming
skills to adjust key QC parameters only.
Recipes can include conditionals that trigger a halt to a pipeline by
sending the appropriate signal (exit code). More than one RCQC recipe
can be run in a pipeline, and their report output can be daisy-chained
to contribute to a single collective report. The QC metric conditionals
shown on the poster can either signal a possible error situation (the
“fail(qc…)” call) or call a halt to futile pipeline work (via “fail(job …)”).
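The distinction between a QC-level warning and a job-level halt can be sketched in plain Python as follows. This is an analogue under stated assumptions, not RCQC recipe syntax: the thresholds, metric names, the RULES table and the choice of exit code 1 for a job failure are all hypothetical.

```python
import sys

# Hypothetical thresholds and metric names, for illustration only.
RULES = [
    # (metric, predicate that is True when the rule is violated, severity, message)
    ("percent_combined", lambda v: v < 50.0, "qc", "low read-merge rate"),
    ("contig_count", lambda v: v > 1000, "job", "assembly too fragmented"),
]


def evaluate(metrics, rules=RULES):
    """Return (messages, exit_code).

    exit_code is 1 if any job-level rule is violated, otherwise 0;
    QC-level violations are reported but do not change the exit code.
    """
    messages = []
    exit_code = 0
    for metric, violated, severity, text in rules:
        value = metrics.get(metric)
        if value is not None and violated(value):
            messages.append(f"{severity} fail: {text} ({metric}={value})")
            if severity == "job":
                exit_code = 1
    return messages, exit_code


def main(metrics):
    """Entry point a wrapper script could call; exits with the computed code."""
    messages, code = evaluate(metrics)
    for line in messages:
        print(line, file=sys.stderr)
    sys.exit(code)  # a non-zero code lets the workflow engine stop the job


messages, code = evaluate({"percent_combined": 42.0, "contig_count": 1500})
print(messages, code)
```

Separating evaluate from main keeps the rule logic testable without terminating the interpreter; only the outer wrapper turns the result into a process exit code.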
[Figure panels from the Implementation section: Protégé ontology editor view of GenEpiO assembly quality control terms; RCQC recipe for text-mining flash.log; report files from FLASH, FastQC and CheckM; RCQC output as JSON-LD and HTML]
In developing a scripting language to do this work, we did not want to
reinvent the wheel (in fact, RCQC offers up for reuse all of Python’s
built-in math and operator functions). We did, however, need a flexible
mechanism for adjusting parameters and formulae for pipeline operation,
one that did not require recompilation after each user-driven change. As
a result, the RCQC system provides a more transparent rule set that
reduces the skill needed to make process adjustments. Standard assembly
pipeline QC metrics are introduced which provide a blueprint for the way
QC components could be shared amongst NGS pipelines.
Further information, including source code, is available at
https://github.com/Public-Health-Bioinformatics/rcqc.
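One way to picture reusing Python's math and operator functions for adjustable formulae is a small evaluator over rules stored as data. The sketch below is an assumption about how such a mechanism could look, not a description of RCQC's interpreter; the nested-list rule format, the FUNCTIONS whitelist and the variable names are all hypothetical.

```python
import math
import operator

# Whitelist of callable names a rule may use; all come from Python itself.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "truediv": operator.truediv,
    "lt": operator.lt,
    "gt": operator.gt,
    "abs": abs,
    "log10": math.log10,
}


def evaluate(expr, variables):
    """Evaluate a nested-list expression such as ["lt", ["truediv", "a", "b"], 0.9]."""
    if isinstance(expr, list):
        name, *args = expr
        return FUNCTIONS[name](*(evaluate(arg, variables) for arg in args))
    if isinstance(expr, str):
        return variables[expr]  # a bare string is a variable lookup
    return expr  # numbers pass through unchanged


# The rule is plain data, so it could live in a text file and be edited
# without changing or recompiling any code.
rule = ["lt", ["truediv", "assembly_size", "expected_size"], 0.9]
flagged = evaluate(rule, {"assembly_size": 4_200_000, "expected_size": 5_000_000})
print(flagged)  # True
```

Restricting rules to a whitelist of named functions keeps the rule set transparent and avoids executing arbitrary code from a user-edited file.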

