• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen
NIH/NIAID – Malaria Functional Genomics Section
• Sebastian Gurevich
McGill University
Funding:
National Institutes of Health
Canadian Institutes of Health Research
• Philip Awadalla
University of Montreal
https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014
zmartine@gmail.com
ComPar: Genome Assembly, Variant Mapping, and
Validation Pipelines
Martine Zilversmit
http://www.slideshare.net/zmartine1/com-par-25jun14
ComPar: Genome Assembly, Variant Mapping, and
Validation Pipelines
https://github.com/parasite-genomics/Pipelines
• BASH-scripted pipelines
• Accurate variant prediction
– SNPs
– Small indels
– Large indels (>17bp)
– Focused regions of extreme divergence (35-70% amino acid identity)
• In silico variant validation
Finding Highly Divergent Regions – HDR Program
VCF File
False Positive Variants
True Positive Variants
Parameters:
- Quality Metric and Cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR
HDR File:
- Size of HDR
- Position of HDR
- Variants Contained
Python - Stand-alone interactive or pipelined
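The four parameters listed for the HDR program suggest a cluster-then-merge procedure. The sketch below is one way such a procedure could look. It is not the ComPar source code, and every name in it (`find_hdrs`, `HDR`, `min_quality`, `min_variants`, `max_gap`, `max_merge_gap`) is hypothetical. It also assumes the quality cutoff acts as a minimum, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    """One highly divergent region: its position and the variants it contains."""
    start: int
    end: int
    variants: List[int]

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def find_hdrs(
    variants: Iterable[Tuple[int, float]],
    min_quality: float = 30.0,   # assumed: cutoff is a minimum on the quality metric
    min_variants: int = 3,       # number of variants per cluster
    max_gap: int = 20,           # maximum distance between variants within a cluster
    max_merge_gap: int = 100,    # maximum distance between clusters to merge
) -> List[HDR]:
    """Group (position, quality) variants into clusters, then merge nearby clusters."""
    positions = sorted(pos for pos, qual in variants if qual >= min_quality)

    # Step 1: chain variants whose neighbours are at most max_gap apart.
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    # Step 2: keep only clusters with enough variants.
    dense = [c for c in clusters if len(c) >= min_variants]

    # Step 3: merge dense clusters separated by at most max_merge_gap.
    # Variants from discarded sparse clusters are not added back here.
    merged: List[List[int]] = []
    for cluster in dense:
        if merged and cluster[0] - merged[-1][-1] <= max_merge_gap:
            merged[-1] = merged[-1] + cluster
        else:
            merged.append(list(cluster))

    return [HDR(start=c[0], end=c[-1], variants=c) for c in merged]


if __name__ == "__main__":
    demo = [(100, 50), (105, 60), (110, 40), (150, 10),
            (180, 45), (185, 55), (190, 35), (1000, 90)]
    for hdr in find_hdrs(demo):
        print(hdr.start, hdr.end, hdr.size, len(hdr.variants))
```

Each returned record carries the three items the slide lists for the HDR file: size, position, and the variants contained.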
Dye-Terminator Sequenced Variation – 50 basepair Sliding window
Comparing 2 Plasmodium Genomes
[Figure: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600)]
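A per-window variant count like the one plotted on this slide could be computed along the following lines. This is a minimal sketch, not the code behind the figure. The function name `window_counts` and the `step` parameter are assumptions. The slide gives only the 50-basepair window size, so the step between windows is left configurable.

```python
import bisect
from typing import Iterable, List, Tuple


def window_counts(
    positions: Iterable[int],
    length: int,
    window: int = 50,   # window size from the slide
    step: int = 50,     # assumed; use a smaller step for overlapping windows
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs for 1-based positions."""
    ordered = sorted(positions)
    counts = []
    for start in range(1, length + 1, step):
        end = start + window - 1
        # Binary search for the slice of variants inside [start, end].
        lo = bisect.bisect_left(ordered, start)
        hi = bisect.bisect_right(ordered, end)
        counts.append((start, hi - lo))
    return counts


if __name__ == "__main__":
    for start, count in window_counts([10, 20, 60, 75, 76, 140], length=150):
        print(start, count)
```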
Predicted Variants – No Filtering Based on Quality Metrics
Comparing 2 Plasmodium Genomes
[Figure: two panels of Number of Variants (0 to 12) against Position on “Chromosome” (200 to 9600); series: True Variants, Unfiltered Results]
Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
Comparing 2 Plasmodium Genomes
[Figure: panels of Number of Variants (0 to 12) against Position on “Chromosome” (200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff]
Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
Comparing 2 Plasmodium Genomes
[Figure: two panels of Number of Variants (0 to 12) against Position on “Chromosome” (200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
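The quality-score and consensus-quality cutoffs compared on these slides can be applied to a VCF record roughly as follows. This sketch assumes a standard tab-separated VCF line with QUAL in the sixth column and an `FQ=<value>` entry in the INFO column. The helper names `parse_info` and `check_cutoffs` are hypothetical and are not part of the ComPar pipelines.

```python
from typing import Dict, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=20;FQ=-120' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = ""  # flag entries carry no value
    return fields


def check_cutoffs(
    vcf_line: str,
    min_qual: float = 30.0,   # quality score >= 30
    max_fq: float = -100.0,   # consensus quality (FQ) <= -100
) -> Tuple[bool, bool]:
    """Return (passes_quality_cutoff, passes_fq_cutoff) for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual_text, info_text = cols[5], cols[7]

    # A missing value ('.' or absent FQ tag) is treated as failing the cutoff.
    passes_qual = qual_text != "." and float(qual_text) >= min_qual
    fq_text = parse_info(info_text).get("FQ")
    passes_fq = fq_text is not None and float(fq_text) <= max_fq
    return passes_qual, passes_fq


if __name__ == "__main__":
    record = "chr1\t200\t.\tC\tT\t60\t.\tDP=8;FQ=-40"
    print(check_cutoffs(record))
```

Returning the two results separately makes it easy to pick out variants that pass one cutoff but not the other.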
Highly-Divergent Regions (HDRs)
Comparing 2 Plasmodium Genomes
[Figure: two panels of Number of Variants (0 to 12) against Position on “Chromosome” (200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
Quality ≥ 30 Variants without Consensus Quality ≥ -100
Highly-Divergent Regions (HDRs)
Comparing 2 Plasmodium Genomes
[Figure: two panels of Number of Variants (0 to 12) against Position on “Chromosome” (200 to 9600); first panel series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff; second panel series: True Variants, Quality 30, No HDRs]
Characteristics of Highly Divergent Regions
33X 44.4%
By265 55.6%
N67 66.7%
histone acetyltransferase GCN5, putative (GCN5)
RNA-binding protein NOB1, putative
Percent Identity
DNA repair protein, putative
33X 41.4%
By265 79.3%
N67 51.7%
Characteristics of Highly Divergent Regions
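Percent identity between two aligned sequences can be computed in several ways. The sketch below uses one common convention: identical residues divided by aligned columns, skipping columns where both sequences have a gap. The slides do not say which convention produced the values above, so this choice is an assumption. The function name `percent_identity` and the example sequences are illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned columns with identical residues.

    Both inputs must come from the same alignment and so have equal length.
    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up peptide fragments, not data from the slides.
    print(round(percent_identity("MKTAYIAKQ", "MKSAYLAKR"), 1))
```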

More Related Content

What's hot

Annotation capabilities
Annotation capabilitiesAnnotation capabilities
Annotation capabilitiesGolden Helix
 
An Exploration of Clinical Workflows in VarSeq
An Exploration of Clinical Workflows in VarSeqAn Exploration of Clinical Workflows in VarSeq
An Exploration of Clinical Workflows in VarSeqGolden Helix
 
Evaluating Oncogenicity in VSClinical
Evaluating Oncogenicity in VSClinicalEvaluating Oncogenicity in VSClinical
Evaluating Oncogenicity in VSClinicalGolden Helix
 
Open zika presentation
Open zika presentation Open zika presentation
Open zika presentation Sean Ekins
 
Next-Generation Sequencing Commercial Milestones Infographic
Next-Generation Sequencing Commercial Milestones InfographicNext-Generation Sequencing Commercial Milestones Infographic
Next-Generation Sequencing Commercial Milestones InfographicQIAGEN
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...ExternalEvents
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Nathan Olson
 
Resume_Bill_Martinez
Resume_Bill_MartinezResume_Bill_Martinez
Resume_Bill_MartinezBill Martinez
 
Genome Sequencing: FAO's relevant activities in Animal Health
Genome Sequencing: FAO's relevant activities in Animal HealthGenome Sequencing: FAO's relevant activities in Animal Health
Genome Sequencing: FAO's relevant activities in Animal HealthFAO
 
2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big databeiko
 
Pizza club - May 2016 - Shaman
Pizza club - May 2016 - ShamanPizza club - May 2016 - Shaman
Pizza club - May 2016 - ShamanRSG Luxembourg
 
Oncogenicity Scoring in VSClinical
Oncogenicity Scoring in VSClinicalOncogenicity Scoring in VSClinical
Oncogenicity Scoring in VSClinicalGolden Helix
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesSean Ekins
 
LAMPARAH - LJ_Manceras
LAMPARAH - LJ_MancerasLAMPARAH - LJ_Manceras
LAMPARAH - LJ_MancerasPerez Eric
 
academic / small company collaborations for rare and neglected diseasesv2
 academic / small company collaborations for rare and neglected diseasesv2 academic / small company collaborations for rare and neglected diseasesv2
academic / small company collaborations for rare and neglected diseasesv2Sean Ekins
 
2nd CRISPR Congress Boston, 23-25 February 2016
2nd CRISPR Congress Boston, 23-25 February 2016 2nd CRISPR Congress Boston, 23-25 February 2016
2nd CRISPR Congress Boston, 23-25 February 2016 Diane McKenna
 
Using Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS VariantsUsing Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS VariantsGolden Helix Inc
 
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...QIAGEN
 
Optimizing the Output of Your Molecular Pathology Laboratory
Optimizing the Output of Your Molecular Pathology LaboratoryOptimizing the Output of Your Molecular Pathology Laboratory
Optimizing the Output of Your Molecular Pathology LaboratoryJosh Forsythe
 

What's hot (20)

Annotation capabilities
Annotation capabilitiesAnnotation capabilities
Annotation capabilities
 
An Exploration of Clinical Workflows in VarSeq
An Exploration of Clinical Workflows in VarSeqAn Exploration of Clinical Workflows in VarSeq
An Exploration of Clinical Workflows in VarSeq
 
Evaluating Oncogenicity in VSClinical
Evaluating Oncogenicity in VSClinicalEvaluating Oncogenicity in VSClinical
Evaluating Oncogenicity in VSClinical
 
Open zika presentation
Open zika presentation Open zika presentation
Open zika presentation
 
Next-Generation Sequencing Commercial Milestones Infographic
Next-Generation Sequencing Commercial Milestones InfographicNext-Generation Sequencing Commercial Milestones Infographic
Next-Generation Sequencing Commercial Milestones Infographic
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Resume_Bill_Martinez
Resume_Bill_MartinezResume_Bill_Martinez
Resume_Bill_Martinez
 
Genome Sequencing: FAO's relevant activities in Animal Health
Genome Sequencing: FAO's relevant activities in Animal HealthGenome Sequencing: FAO's relevant activities in Animal Health
Genome Sequencing: FAO's relevant activities in Animal Health
 
2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data
 
Pizza club - May 2016 - Shaman
Pizza club - May 2016 - ShamanPizza club - May 2016 - Shaman
Pizza club - May 2016 - Shaman
 
Oncogenicity Scoring in VSClinical
Oncogenicity Scoring in VSClinicalOncogenicity Scoring in VSClinical
Oncogenicity Scoring in VSClinical
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
 
Ashg sedlazeck grc_share
Ashg sedlazeck grc_shareAshg sedlazeck grc_share
Ashg sedlazeck grc_share
 
LAMPARAH - LJ_Manceras
LAMPARAH - LJ_MancerasLAMPARAH - LJ_Manceras
LAMPARAH - LJ_Manceras
 
academic / small company collaborations for rare and neglected diseasesv2
 academic / small company collaborations for rare and neglected diseasesv2 academic / small company collaborations for rare and neglected diseasesv2
academic / small company collaborations for rare and neglected diseasesv2
 
2nd CRISPR Congress Boston, 23-25 February 2016
2nd CRISPR Congress Boston, 23-25 February 2016 2nd CRISPR Congress Boston, 23-25 February 2016
2nd CRISPR Congress Boston, 23-25 February 2016
 
Using Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS VariantsUsing Public Access Clinical Databases to Interpret NGS Variants
Using Public Access Clinical Databases to Interpret NGS Variants
 
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
 
Optimizing the Output of Your Molecular Pathology Laboratory
Optimizing the Output of Your Molecular Pathology LaboratoryOptimizing the Output of Your Molecular Pathology Laboratory
Optimizing the Output of Your Molecular Pathology Laboratory
 

Viewers also liked

Daniela. someone like you
Daniela. someone like youDaniela. someone like you
Daniela. someone like youdaanyeye
 
Sesión 2 actividad procedimental
Sesión 2 actividad procedimentalSesión 2 actividad procedimental
Sesión 2 actividad procedimentalFabiola Perez
 
Historia del arte
Historia del arteHistoria del arte
Historia del arteViviana
 
Grip_ RWJF Challenge
Grip_ RWJF ChallengeGrip_ RWJF Challenge
Grip_ RWJF Challengenicolegripapp
 
Simudyne rwjf aligning forces to generate data challenge
Simudyne rwjf   aligning forces to generate data challengeSimudyne rwjf   aligning forces to generate data challenge
Simudyne rwjf aligning forces to generate data challengeausdxw0
 
Listener do oracle database 02
Listener do oracle database 02Listener do oracle database 02
Listener do oracle database 02Ysmaylyka Macedo
 
What is Forensic Science? - Mocomi Kids
What is Forensic Science? - Mocomi KidsWhat is Forensic Science? - Mocomi Kids
What is Forensic Science? - Mocomi KidsMocomi Kids
 
Informe de guardia baumgertner
Informe de guardia baumgertnerInforme de guardia baumgertner
Informe de guardia baumgertnerAngelica Montigel
 
Detalles tipo division de baño -85 l
Detalles tipo division de baño  -85 lDetalles tipo division de baño  -85 l
Detalles tipo division de baño -85 lsurtialuminiossilva
 
KAROSERI BOX PENDINGIN
KAROSERI BOX PENDINGINKAROSERI BOX PENDINGIN
KAROSERI BOX PENDINGINKenzie Pratama
 
Judith A Mangan resume
Judith A  Mangan resumeJudith A  Mangan resume
Judith A Mangan resumeJudith Mangan
 
Tic En La EducacióN
Tic En La EducacióNTic En La EducacióN
Tic En La EducacióNcindyvilca
 

Viewers also liked (20)

Daniela. someone like you
Daniela. someone like youDaniela. someone like you
Daniela. someone like you
 
Guias del dinamometro
Guias del dinamometroGuias del dinamometro
Guias del dinamometro
 
Introduction
IntroductionIntroduction
Introduction
 
Sesión 2 actividad procedimental
Sesión 2 actividad procedimentalSesión 2 actividad procedimental
Sesión 2 actividad procedimental
 
Historia del arte
Historia del arteHistoria del arte
Historia del arte
 
Grip_ RWJF Challenge
Grip_ RWJF ChallengeGrip_ RWJF Challenge
Grip_ RWJF Challenge
 
Simudyne rwjf aligning forces to generate data challenge
Simudyne rwjf   aligning forces to generate data challengeSimudyne rwjf   aligning forces to generate data challenge
Simudyne rwjf aligning forces to generate data challenge
 
Listener do oracle database 02
Listener do oracle database 02Listener do oracle database 02
Listener do oracle database 02
 
What is Forensic Science? - Mocomi Kids
What is Forensic Science? - Mocomi KidsWhat is Forensic Science? - Mocomi Kids
What is Forensic Science? - Mocomi Kids
 
Informe de guardia baumgertner
Informe de guardia baumgertnerInforme de guardia baumgertner
Informe de guardia baumgertner
 
Detalles tipo division de baño -85 l
Detalles tipo division de baño  -85 lDetalles tipo division de baño  -85 l
Detalles tipo division de baño -85 l
 
Graphs
GraphsGraphs
Graphs
 
KAROSERI BOX PENDINGIN
KAROSERI BOX PENDINGINKAROSERI BOX PENDINGIN
KAROSERI BOX PENDINGIN
 
Lesson 03
Lesson 03Lesson 03
Lesson 03
 
What is you dynamics
What is you dynamicsWhat is you dynamics
What is you dynamics
 
Judith A Mangan resume
Judith A  Mangan resumeJudith A  Mangan resume
Judith A Mangan resume
 
OPEX real life challenge
OPEX real life challengeOPEX real life challenge
OPEX real life challenge
 
Tic En La EducacióN
Tic En La EducacióNTic En La EducacióN
Tic En La EducacióN
 
Uchiha
UchihaUchiha
Uchiha
 
HUET
HUETHUET
HUET
 

Similar to Com par 25jun14

16S MVRSION at Washington University
16S MVRSION at Washington University16S MVRSION at Washington University
16S MVRSION at Washington UniversitySeth Crosby
 
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisVarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisGolden Helix
 
Mohit Patel -- Disruptive Diner: Nano Possibilities
Mohit Patel -- Disruptive Diner: Nano PossibilitiesMohit Patel -- Disruptive Diner: Nano Possibilities
Mohit Patel -- Disruptive Diner: Nano PossibilitiesOpenly Disruptive
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseNathan Olson
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917GenomeInABottle
 
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...ICRISAT
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...VHIR Vall d’Hebron Institut de Recerca
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshopGenomeInABottle
 
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesUpdates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesGolden Helix
 
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesUpdates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesDelaina Hawkins
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
Artificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyArtificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyWookjin Choi
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...QIAGEN
 
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...Covance
 
Achieve Complete Coverage of the SARS-CoV-2 Genome
Achieve Complete Coverage of the SARS-CoV-2 GenomeAchieve Complete Coverage of the SARS-CoV-2 Genome
Achieve Complete Coverage of the SARS-CoV-2 GenomeCamille Cappello
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GenomeInABottle
 
Artificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyArtificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyWookjin Choi
 

Similar to Com par 25jun14 (20)

05 costa
05 costa05 costa
05 costa
 
16S MVRSION at Washington University
16S MVRSION at Washington University16S MVRSION at Washington University
16S MVRSION at Washington University
 
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic AnalysisVarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
VarSeq 2.6.0: Advancing Pharmacogenomics and Genomic Analysis
 
Mohit Patel -- Disruptive Diner: Nano Possibilities
Mohit Patel -- Disruptive Diner: Nano PossibilitiesMohit Patel -- Disruptive Diner: Nano Possibilities
Mohit Patel -- Disruptive Diner: Nano Possibilities
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...
Research Program Genetic Gains (RPGG) Review Meeting 2021: From Discovery to ...
 
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
Basic Aspects of Microarray Technology and Data Analysis (UEB-UAT Bioinformat...
 
160627 giab for festival sv workshop
160627 giab for festival sv workshop160627 giab for festival sv workshop
160627 giab for festival sv workshop
 
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesUpdates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
 
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation SourcesUpdates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
Updates to VSClinical ACMG Guidelines & a Tour of Cancer Annotation Sources
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Artificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyArtificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation Oncology
 
Oncogenomics 2013
Oncogenomics 2013Oncogenomics 2013
Oncogenomics 2013
 
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
Step by Step, from Liquid Biopsy to a Genomic Biomarker: Liquid Biopsy Series...
 
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...
Bioanalytical Capabilities -- Thought-Leading Science Armed with the Latest T...
 
Achieve Complete Coverage of the SARS-CoV-2 Genome
Achieve Complete Coverage of the SARS-CoV-2 GenomeAchieve Complete Coverage of the SARS-CoV-2 Genome
Achieve Complete Coverage of the SARS-CoV-2 Genome
 
GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015GIAB update for GRC GIAB workshop 191015
GIAB update for GRC GIAB workshop 191015
 
Artificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation OncologyArtificial Intelligence in Radiation Oncology
Artificial Intelligence in Radiation Oncology
 
2013 Cornell's Plant Breeding and Genetic Seminar Series
2013 Cornell's Plant Breeding and Genetic Seminar Series2013 Cornell's Plant Breeding and Genetic Seminar Series
2013 Cornell's Plant Breeding and Genetic Seminar Series
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Com par 25jun14

  • 1. ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines. Martine Zilversmit, zmartine@gmail.com. http://www.slideshare.net/zmartine1/com-par-25jun14. https://github.com/parasite-genomics/Pipelines (2.0 coming in July 2014). Xin-zhuan Su, Sittiporn Pattaradilokrat, Sethu Nair, Yanwei Qi, Gordon Bullen (NIH/NIAID, Malaria Functional Genomics Section); Sebastian Gurevich (McGill University); Philip Awadalla (University of Montreal). Funding: National Institutes of Health; Canadian Institutes of Health Research.
  • 2. ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines (https://github.com/parasite-genomics/Pipelines). BASH-scripted pipelines. Accurate variant prediction: SNPs; small indels; large indels (>17 bp); focused regions of extreme divergence (35-70% amino acid identity). In silico variant validation.
  • 3. Finding Highly Divergent Regions: HDR Program (Python; stand-alone interactive or pipelined). Parameters: quality metric and cutoff; number of variants per cluster; maximum distance between variants within a cluster; maximum distance between smaller clusters to merge into an HDR. Diagram labels: VCF File; False Positive Variants; True Positive Variants; HDR File (size of HDR, position of HDR, variants contained).
  • 4. Comparing 2 Plasmodium Genomes. [Figure: Dye-Terminator Sequenced Variation, 50 basepair sliding window. x-axis: Position on “Chromosome” (200 to 9600); y-axis: Number of Variants (0 to 12).]
  • 5. Comparing 2 Plasmodium Genomes. Predicted Variants: No Filtering Based on Quality Metrics. [Figure, two panels, same axes as slide 4. Series: True Variants; Unfiltered Results.]
  • 6. Comparing 2 Plasmodium Genomes. Predicted Variants: Filtering Based on Quality Score ≥ 30 Cutoff. [Figure, two panels, same axes. Panel 1 series: True Variants; Unfiltered Results. Panel 2 series: True Variants; Quality 30 Cutoff.]
  • 7. Comparing 2 Plasmodium Genomes. Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff. [Figure, two panels, same axes. Panel 1 series: True Variants; Unfiltered Results; Quality 30 Cutoff. Panel 2 series: True Variants; FQ −100 Cutoff.]
  • 8. Comparing 2 Plasmodium Genomes. Highly-Divergent Regions (HDRs). [Figure, two panels, same axes. Series in both panels: True Variants; Unfiltered Results; Quality 30 Cutoff; FQ −100 Cutoff.]
  • 9. Comparing 2 Plasmodium Genomes. Highly-Divergent Regions (HDRs). Quality ≥ 30 Variants without Consensus Quality ≥ -100. [Figure, two panels, same axes. Panel 1 series: True Variants; Unfiltered Results; Quality 30 Cutoff; FQ −100 Cutoff. Panel 2 series: True Variants; Quality 30, No HDRs.]
  • 10. Characteristics of Highly Divergent Regions. Genes labeled: histone acetyltransferase GCN5, putative (GCN5); RNA-binding protein NOB1, putative; DNA repair protein, putative. Percent Identity values shown: 33X 44.4%, By265 55.6%, N67 66.7%; 33X 41.4%, By265 79.3%, N67 51.7%.
  • 11. Characteristics of Highly Divergent Regions
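
The four parameters listed on slide 3 suggest a filter, cluster, and merge procedure. The following is a minimal Python sketch of that idea, not the ComPar HDR program itself: the function name `find_hdrs`, the `Region` class, the default values, and the `(position, quality)` input tuples are all assumptions made for illustration. It also assumes a quality metric where higher is better (as with the Quality ≥ 30 cutoff); a metric such as FQ, which the slides filter at ≤ -100, would need the comparison reversed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """A cluster or merged HDR (hypothetical structure, not ComPar's)."""
    start: int
    end: int
    positions: List[int]

    @property
    def size(self) -> int:
        # Inclusive span covered by the region, in base pairs.
        return self.end - self.start + 1


def find_hdrs(
    variants: List[Tuple[int, float]],
    min_qual: float = 30.0,    # quality cutoff (assumed default)
    min_variants: int = 3,     # number of variants per cluster (assumed)
    max_gap: int = 20,         # max distance between variants in a cluster
    max_merge_gap: int = 100,  # max distance between clusters to merge
) -> List[Region]:
    # Step 1: keep only variants that pass the quality cutoff.
    positions = sorted(pos for pos, qual in variants if qual >= min_qual)

    # Step 2: group neighbouring variants into clusters, discarding
    # groups with fewer than min_variants members.
    clusters: List[Region] = []
    current: List[int] = []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_variants:
                clusters.append(Region(current[0], current[-1], current))
            current = []
        current.append(pos)
    if len(current) >= min_variants:
        clusters.append(Region(current[0], current[-1], current))

    # Step 3: merge clusters that lie close together into one HDR.
    hdrs: List[Region] = []
    for cluster in clusters:
        if hdrs and cluster.start - hdrs[-1].end <= max_merge_gap:
            last = hdrs[-1]
            hdrs[-1] = Region(last.start, cluster.end,
                              last.positions + cluster.positions)
        else:
            hdrs.append(cluster)
    return hdrs


if __name__ == "__main__":
    # Made-up positions and qualities, for demonstration only.
    demo = [(100, 50), (110, 50), (120, 50), (125, 10),
            (200, 50), (210, 50), (215, 50), (1000, 50)]
    for hdr in find_hdrs(demo):
        print(hdr.start, hdr.end, hdr.size, hdr.positions)
```

Each returned region carries the three fields the slide lists for the HDR file: size, position, and the variants contained. In a real run the input tuples would come from parsing the VCF file rather than from a hard-coded list.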

Editor's Notes

  1. MAPPING, DEFINE! DE NOVO, DEFINE! SAY WHAT VARIANTS ARE
  2. Define highly divergent regions
  3. Define highly divergent regions
  4. Define highly divergent regions
  5. Define highly divergent regions
  6. Define highly divergent regions
  7. Define highly divergent regions