Bringing the Power of Synthetic (Sequence) Data to the Masses
FAIR Hackathon, BioIT World Apr 15-16, 2019
R for REUSE / REPRODUCIBILITY
What does reproducibility mean to you?
Reproduce ≠ Replicate
Reproduce: same DATA, same ANALYSIS, same RESULT. “Can we repeat the experiment?”
Replicate: OTHER DATA, OTHER ANALYSIS, same BIOLOGICAL FINDINGS. “Can we confirm that this is true?”
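The "Can we repeat the experiment?" question can be made concrete with a small check: run the same analysis on the same data twice and compare fingerprints of the results. The sketch below is only an illustration. `toy_analysis` and `fingerprint` are hypothetical names, and the seeded bootstrap average stands in for a real analysis step.

```python
import hashlib
import random


def toy_analysis(data, seed):
    """Stand-in for an analysis step: draw a seeded bootstrap sample and average it."""
    rng = random.Random(seed)  # a fixed seed makes the random draw repeatable
    sample = [rng.choice(data) for _ in data]
    return round(sum(sample) / len(sample), 6)


def fingerprint(result):
    """Hash a result so two runs can be compared without inspecting the values."""
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
first_run = fingerprint(toy_analysis(data, seed=42))
second_run = fingerprint(toy_analysis(data, seed=42))

# Same data, same analysis, same seed: the two fingerprints should be identical.
print("reproduced:", first_run == second_run)
```

Replication, by contrast, would mean running a different analysis on different data and asking whether the same finding holds, which a hash comparison cannot answer.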
The original project: broad.io/ASHG2018
Workflows and notebook, in order:
  • 1-Collect-1000G-participant
  • 2-Generate-synthetic-reads
  • 3-Mutate-reads-with-BAMSurgeon
  • 4-Call-single-sample-GVCF-GATK4
  • 5-Joint-call-and-hard-filter-GATK4
  • 6-Predict-variant-effects-GEMINI
  • ashg18-notebooks-cluster_analysis
Data products:
  • Synthetic exome data
  • Mutated exome data
  • Sample GVCFs
  • Variant calls
  • Multisample variant calls
  • Predicted effects
  • Final table of results
This hackathon project: broad.io/FAIRdatahack2019
Workflows:
  • 1-Collect-1000G-participant
  • 2-Generate-synthetic-reads
  • 3-Mutate-reads-with-BAMSurgeon
Data products:
  • Synthetic exome data
  • Mutated exome data
  • Variant calls
How do we turn this into a FAIR community resource to empower biomedical researchers to leverage the underlying tools more easily?
#1 - Data in demand
What kind of datasets would be useful to the community?
Identified top needs based on literature + feedback from hackathon participants
#2 - Diversifying Options
Enable generation of more data types and more variant types
Added parameters to the original workflows to enable adding indels; prototyped structural variants (SVs)
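As a rough illustration of what "adding indels" involves on the input side, the sketch below generates a reproducible list of random insertion and deletion requests for a region. It is not the team's workflow code. `MutationSpec`, `random_indels`, `to_tsv_line`, and the tab-separated column layout are all assumptions made for the example, and any real layout should be checked against the BAMSurgeon documentation before use.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationSpec:
    """One requested indel (hypothetical layout, not a documented tool format)."""
    chrom: str
    start: int
    end: int
    vaf: float          # target variant allele fraction
    kind: str           # "INS" or "DEL"
    inserted: str = ""  # inserted bases; left empty for deletions


def random_indels(chrom, region_start, region_end, count, max_len=5, seed=0):
    """Draw `count` random indels inside [region_start, region_end]."""
    rng = random.Random(seed)  # seeded so the same request list can be regenerated
    specs = []
    for _ in range(count):
        kind = rng.choice(["INS", "DEL"])
        length = rng.randint(1, max_len)
        # Leave room for the longest deletion so `end` stays inside the region.
        pos = rng.randint(region_start, region_end - max_len)
        if kind == "INS":
            seq = "".join(rng.choice("ACGT") for _ in range(length))
            specs.append(MutationSpec(chrom, pos, pos + 1, 0.5, kind, seq))
        else:
            specs.append(MutationSpec(chrom, pos, pos + length, 0.5, kind))
    return sorted(specs, key=lambda s: s.start)


def to_tsv_line(spec):
    """Serialise one spec as a tab-separated line (assumed column order)."""
    fields = [spec.chrom, str(spec.start), str(spec.end), f"{spec.vaf:.2f}", spec.kind]
    if spec.inserted:
        fields.append(spec.inserted)
    return "\t".join(fields)


for spec in random_indels("chr1", 1000, 2000, count=3, seed=1):
    print(to_tsv_line(spec))
```

Keeping the seed as an explicit parameter means the same set of mutations can be regenerated later, which fits the reuse and reproducibility theme of the deck.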
#3 - Method Optimization
Reduce cost and runtime of our workflows
Identified performance bottlenecks and started exploring use of multithreading options
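The slide does not say which multithreading options were explored. One common pattern for cutting runtime is scatter-gather: split a region into intervals, process the intervals concurrently, and collect the results in order. The sketch below shows that pattern with Python's standard `concurrent.futures`. `split_region`, `process_interval`, and `scatter_gather` are hypothetical names, and the per-interval work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_region(start, end, n_chunks):
    """Split [start, end) into n_chunks contiguous, near-equal intervals."""
    size, extra = divmod(end - start, n_chunks)
    intervals, cursor = [], start
    for i in range(n_chunks):
        stop = cursor + size + (1 if i < extra else 0)
        intervals.append((cursor, stop))
        cursor = stop
    return intervals


def process_interval(interval):
    """Placeholder for per-interval work, e.g. launching a tool on one shard."""
    start, stop = interval
    return stop - start  # here: just report how many positions were covered


def scatter_gather(start, end, n_chunks, max_workers=4):
    """Run process_interval over all intervals with a pool of worker threads."""
    intervals = split_region(start, end, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, which keeps the gather step simple.
        results = list(pool.map(process_interval, intervals))
    return intervals, results


intervals, results = scatter_gather(0, 1003, n_chunks=4)
print(intervals, sum(results))
```

Threads help most when each task waits on an external process or on I/O. For CPU-bound work written in pure Python, a process pool or the tool's own thread settings would likely be the better choice.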
#4 - Quality Control
Evaluate whether the synthetic data we generate is suitable
Created a metrics collection workflow and a QC report prototype (Jupyter notebook)
Collect QC metrics + display in notebook report
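One simple way to judge whether synthetic data is "suitable" is to compare its QC metrics against those of a real reference sample and flag large deviations. The sketch below assumes the metrics have already been collected into plain dictionaries. The metric names, the values, the 10% tolerance, and the function names are all made up for illustration and are not taken from the team's notebook.

```python
def relative_difference(observed, expected):
    """Absolute difference scaled by the expected value."""
    if expected == 0:
        return 0.0 if observed == 0 else float("inf")
    return abs(observed - expected) / abs(expected)


def qc_report(synthetic, reference, tolerance=0.10):
    """Compare each reference metric with its synthetic counterpart."""
    rows = []
    for name in sorted(reference):
        if name not in synthetic:
            rows.append((name, None, reference[name], None, "MISSING"))
            continue
        diff = relative_difference(synthetic[name], reference[name])
        status = "PASS" if diff <= tolerance else "FLAG"
        rows.append((name, synthetic[name], reference[name], diff, status))
    return rows


# Illustrative, made-up metric values.
reference = {"mean_coverage": 50.0, "pct_mapped": 99.0, "mean_insert_size": 300.0}
synthetic = {"mean_coverage": 48.0, "pct_mapped": 99.5, "mean_insert_size": 240.0}

for name, syn, ref, diff, status in qc_report(synthetic, reference):
    print(f"{name:18s} synthetic={syn} reference={ref} status={status}")
```

In a notebook, the returned rows could be rendered as a table so that flagged metrics stand out at a glance.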
The Workspace
The Next Steps
  • Off-the-shelf synthetic data catalog
  • User-friendly tools for generating custom synthetic (sequence) datasets
The Team
Broadies
Adelaide Rhodes
Allie Hajian
Anton Kovalsky
Ruchi Munshi
Tiffany Miller
Geraldine Van der Auwera
Guest stars
Ernesto Andrianantoandro
Dan Rozelle
Jay Moore
Rory Davidson
Roma Kurilov
Vrinda Pareek


BioIT19 NCBI FAIR Hackathon - Synthetic Data team
