• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen
NIH/NIAID – Malaria Functional Genomics Section
• Sebastian Gurevich
McGill University
Funding:
National Institutes of Health
Canadian Institutes of Health Research
• Philip Awadalla
University of Montreal
https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014
zmartine@gmail.com
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
Martine Zilversmit
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
https://github.com/parasite-genomics/Pipelines
• BASH-scripted pipelines
• Accurate variant prediction
– SNPs
– Small indels
– Large indels (>17 bp)
– Focused regions of extreme divergence (35-70% amino acid identity)
• In silico variant validation
Finding Highly Divergent Regions – HDR Program
Python - Stand-alone interactive or pipelined
VCF File
Parameters:
- Quality Metric and Cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR
False Positive Variants
True Positive Variants
HDR File:
- Size of HDR
- Position of HDR
- Variants Contained
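A minimal sketch of how the four parameters listed above might drive the clustering. This is not the HDR program's actual code: the function name `find_hdrs`, the `HDR` record, the default cutoffs, and the example variants are all hypothetical, and the input is a plain list of (position, quality) pairs rather than a parsed VCF file.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    """Hypothetical record for one highly divergent region."""
    start: int
    end: int
    positions: List[int]

    @property
    def size(self) -> int:
        # Inclusive span from the first to the last variant in the region.
        return self.end - self.start + 1


def find_hdrs(
    variants: Iterable[Tuple[int, float]],
    min_qual: float = 30.0,      # quality metric cutoff (assumed default)
    min_variants: int = 3,       # number of variants per cluster (assumed)
    max_gap: int = 20,           # max distance between variants in a cluster
    max_merge_gap: int = 100,    # max distance between clusters to merge
) -> List[HDR]:
    # 1. Keep variants that pass the quality cutoff, ordered by position.
    kept = sorted(pos for pos, qual in variants if qual >= min_qual)

    # 2. Chain neighbouring variants into clusters.
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in kept:
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    # 3. Drop clusters that contain too few variants.
    clusters = [c for c in clusters if len(c) >= min_variants]

    # 4. Merge clusters that lie close together into a single HDR.
    hdrs: List[HDR] = []
    for c in clusters:
        if hdrs and c[0] - hdrs[-1].end <= max_merge_gap:
            hdrs[-1].positions.extend(c)
            hdrs[-1].end = c[-1]
        else:
            hdrs.append(HDR(start=c[0], end=c[-1], positions=list(c)))
    return hdrs


# Made-up (position, quality) pairs for illustration only.
example_variants = [
    (100, 50.0), (110, 45.0), (118, 60.0), (300, 10.0),
    (400, 40.0), (405, 35.0), (412, 50.0), (5000, 99.0),
]

for hdr in find_hdrs(example_variants):
    print(hdr.start, hdr.end, hdr.size, len(hdr.positions))
```

With the assumed defaults, the example yields two regions (100 to 118 and 400 to 412). Raising `max_merge_gap` to 500 merges them into one region spanning 100 to 412.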
Comparing 2 Plasmodium Genomes
Dye-Terminator Sequenced Variation – 50 basepair Sliding Window
[Plot: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600)]
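As a rough illustration of the 50 basepair sliding-window count plotted on this slide, the sketch below tallies variants per window. The function name `sliding_window_counts`, the step size, the 1-based coordinates, and the example positions are assumptions; the slide states only the window width.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def sliding_window_counts(
    positions: Sequence[int],
    region_end: int,
    window: int = 50,   # window width in basepairs, as on the slide
    step: int = 50,     # assumed step; use a smaller value for overlapping windows
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs over 1..region_end."""
    ordered = sorted(positions)
    counts = []
    for start in range(1, region_end + 1, step):
        stop = start + window  # exclusive upper bound of the window
        # Count positions p with start <= p < stop using binary search.
        n = bisect_left(ordered, stop) - bisect_left(ordered, start)
        counts.append((start, n))
    return counts


# Made-up variant positions for illustration only.
example_positions = [5, 10, 49, 50, 51, 120]
print(sliding_window_counts(example_positions, region_end=150))
```

Sorting once and using binary search keeps each window lookup logarithmic, which matters more when the step is much smaller than the window.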
Comparing 2 Plasmodium Genomes
Predicted Variants – No Filtering Based on Quality Metrics
[Plots: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results]
Comparing 2 Plasmodium Genomes
Predicted Variants - Filtering Based on Quality Score ≥ 30 Cutoff
[Plots: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff]
Comparing 2 Plasmodium Genomes
Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff
[Plots: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
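The two cutoffs named on these slides (quality score ≥ 30 and consensus quality FQ ≤ -100) could be applied to VCF records roughly as follows. This sketch assumes FQ is stored as a key in the INFO column, as in VCFs written by older samtools/bcftools releases; the function names, the chromosome label, and the example records are hypothetical and not taken from the ComPar pipelines.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value mapping (flags map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = ""
    return fields


def passes_filters(
    vcf_line: str,
    min_qual: float = 30.0,
    max_fq: Optional[float] = -100.0,  # None disables the FQ filter
) -> bool:
    """Return True if a VCF data line passes the QUAL and FQ cutoffs."""
    if vcf_line.startswith("#"):
        return False  # header lines are not variants
    cols = vcf_line.rstrip("\n").split("\t")
    qual = cols[5]  # column 6 of a VCF record is QUAL
    if qual == "." or float(qual) < min_qual:
        return False
    if max_fq is None:
        return True
    fq = parse_info(cols[7]).get("FQ")  # column 8 is INFO
    if not fq:
        return False  # treat a missing FQ value as failing (an assumption)
    return float(fq) <= max_fq


# Made-up records for illustration only.
example_records = [
    "chr_example\t120\t.\tA\tG\t45.0\t.\tDP=30;FQ=-120",
    "chr_example\t340\t.\tC\tT\t12.5\t.\tDP=8;FQ=-40",
    "chr_example\t560\t.\tG\tA\t60.0\t.\tDP=25;FQ=-55",
]

for record in example_records:
    print(record.split("\t")[1], passes_filters(record))
```

Passing `max_fq=None` reproduces the quality-only filter, so the same function can generate both filtered sets for comparison.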
Comparing 2 Plasmodium Genomes
Highly-Divergent Regions (HDRs)
[Plots: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff]
Comparing 2 Plasmodium Genomes
Quality ≥ 30 Variants without Consensus Quality ≥ -100
Highly-Divergent Regions (HDRs)
[Plots: Number of Variants (y-axis, 0 to 12) against Position on “Chromosome” (x-axis, 200 to 9600); series: True Variants, Unfiltered Results, Quality 30 Cutoff, FQ −100 Cutoff; final panel: True Variants, Quality 30, No HDRs]
Characteristics of Highly Divergent Regions
33X 44.4%
By265 55.6%
N67 66.7%
histone acetyltransferase GCN5, putative (GCN5)
RNA-binding protein NOB1, putative
Percent Identity
DNA repair protein, putative
33X 41.4%
By265 79.3%
N67 51.7%
Characteristics of Highly Divergent Regions
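For readers unfamiliar with the percent identity figures quoted for these regions, the sketch below shows one common way to compute the value from a pairwise alignment. The slides do not say which convention was used, so the choice of denominator (all alignment columns except those where both sequences have a gap) is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # both-gap columns were skipped, so this is a residue match
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned amino acid sequences for illustration only.
print(round(percent_identity("MKT-LV", "MRTALV"), 1))
```

Other conventions divide by the shorter sequence length or ignore all gapped columns, which can shift the result by several points for divergent regions.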

I evobio zilversmit_25jun14

  • 1. ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines. Martine Zilversmit (zmartine@gmail.com). https://github.com/parasite-genomics/Pipelines - 2.0 Coming in July 2014. Xin-zhuan Su, Sittiporn Pattaradilokrat, Sethu Nair, Yanwei Qi, Gordon Bullen (NIH/NIAID – Malaria Functional Genomics Section); Sebastian Gurevich (McGill University); Philip Awadalla (University of Montreal). Funding: National Institutes of Health; Canadian Institutes of Health Research.
  • 2. ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines (https://github.com/parasite-genomics/Pipelines). BASH-scripted pipelines. Accurate variant prediction: SNPs; small indels; large indels (>17 bp); focused regions of extreme divergence (35-70% amino acid identity). In silico variant validation.
  • 3. Finding Highly Divergent Regions – HDR Program (Python; stand-alone interactive or pipelined). VCF File: False Positive Variants and True Positive Variants. Parameters: Quality Metric and Cutoff; Number of variants per cluster; Maximum distance between variants within a cluster; Maximum distance between smaller clusters to merge into an HDR. HDR File: Size of HDR; Position of HDR; Variants Contained.
  • 4. Comparing 2 Plasmodium Genomes. [Figure: Dye-Terminator Sequenced Variation – 50 basepair Sliding Window. Number of Variants (0-12) vs. Position on “Chromosome” (200-9600).]
  • 5. Comparing 2 Plasmodium Genomes. [Figure: Predicted Variants – No Filtering Based on Quality Metrics. Number of Variants vs. Position on “Chromosome”. Legend: True Variants; Unfiltered Results.]
  • 6. Comparing 2 Plasmodium Genomes. [Figure: Predicted Variants – Filtering Based on Quality Score ≥ 30 Cutoff. Number of Variants vs. Position on “Chromosome”. Legend: True Variants; Unfiltered Results; Quality 30 Cutoff.]
  • 7. Comparing 2 Plasmodium Genomes. [Figure: Filtering Based on Consensus Quality (FQ) ≤ -100 Cutoff. Number of Variants vs. Position on “Chromosome”. Legend: True Variants; Unfiltered Results; Quality 30 Cutoff; FQ −100 Cutoff.]
  • 8. Comparing 2 Plasmodium Genomes. [Figure: Highly-Divergent Regions (HDRs). Number of Variants vs. Position on “Chromosome”. Legend: True Variants; Unfiltered Results; Quality 30 Cutoff; FQ −100 Cutoff.]
  • 9. Comparing 2 Plasmodium Genomes. [Figure: Highly-Divergent Regions (HDRs); Quality ≥ 30 Variants without Consensus Quality ≥ -100. Number of Variants vs. Position on “Chromosome”. Legend: True Variants; Unfiltered Results; Quality 30 Cutoff; FQ −100 Cutoff; Quality 30, No HDRs.]
  • 10. Characteristics of Highly Divergent Regions. Genes: histone acetyltransferase GCN5, putative (GCN5); RNA-binding protein NOB1, putative; DNA repair protein, putative. Percent Identity: 33X 44.4%, By265 55.6%, N67 66.7%; 33X 41.4%, By265 79.3%, N67 51.7%.
  • 11. Characteristics of Highly Divergent Regions
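
The quality cutoffs and the 50 basepair window counts shown on slides 4 to 7 can be illustrated with a short Python sketch. This is not the ComPar code: the function names, the 50 bp step between windows, and the choice to reject variants that carry no FQ tag are all assumptions made for illustration. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and on FQ being reported as a key=value entry in the INFO column.

```python
from typing import List, Optional, Tuple


def parse_vcf_line(line: str) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (POS, QUAL, FQ) from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    pos = int(fields[1])
    # QUAL may be "." (missing) in a VCF file
    qual = None if fields[5] == "." else float(fields[5])
    # INFO is a semicolon-separated list; flags without "=" are skipped
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    fq = float(info["FQ"]) if "FQ" in info else None
    return pos, qual, fq


def passes_filters(qual: Optional[float], fq: Optional[float],
                   min_qual: float = 30.0, max_fq: float = -100.0) -> bool:
    """Apply the two cutoffs named on the slides: QUAL >= 30 and FQ <= -100.

    Assumption: a variant with a missing QUAL or FQ value is rejected.
    """
    return (qual is not None and qual >= min_qual
            and fq is not None and fq <= max_fq)


def window_counts(positions: List[int], region_length: int,
                  window: int = 50, step: int = 50) -> List[Tuple[int, int]]:
    """Count variants per window; returns (window_start, count) pairs.

    Assumption: 1-based positions and a step equal to the window size.
    """
    counts = []
    for start in range(1, region_length - window + 2, step):
        end = start + window - 1
        counts.append((start, sum(start <= p <= end for p in positions)))
    return counts


record = parse_vcf_line("chr1\t120\t.\tA\tT\t45.0\t.\tDP=20;FQ=-120;INDEL")
keep = passes_filters(record[1], record[2])
bins = window_counts([1, 10, 50, 51, 120], region_length=150)
```

Setting `step` below `window` gives overlapping windows, which may be closer to what the plots show; the slides do not say which was used.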

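The HDR program parameters listed on slide 3 suggest a two-stage clustering of variant positions. The sketch below is one plausible reading of those parameters, not the program's actual algorithm: the names `find_hdrs`, `min_variants`, `max_gap`, and `max_merge_gap`, their default values, and the use of QUAL as the quality metric are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class HDR:
    start: int            # position of the first variant in the region
    end: int              # position of the last variant in the region
    variants: List[int]   # positions of the variants contained

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def find_hdrs(variants: Iterable[Tuple[int, float]],
              min_qual: float = 30.0,
              min_variants: int = 3,
              max_gap: int = 20,
              max_merge_gap: int = 100) -> List[HDR]:
    """Group (position, quality) pairs into highly divergent regions."""
    # Quality metric and cutoff: keep passing variants, sorted by position
    positions = sorted(pos for pos, qual in variants if qual >= min_qual)

    # Maximum distance between variants within a cluster
    clusters: List[List[int]] = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Number of variants per cluster: discard sparse clusters
    clusters = [c for c in clusters if len(c) >= min_variants]

    # Maximum distance between smaller clusters to merge into an HDR
    hdrs: List[HDR] = []
    for c in clusters:
        if hdrs and c[0] - hdrs[-1].end <= max_merge_gap:
            hdrs[-1].variants.extend(c)
            hdrs[-1].end = c[-1]
        else:
            hdrs.append(HDR(start=c[0], end=c[-1], variants=list(c)))
    return hdrs


calls = [(100, 50), (110, 50), (120, 50), (200, 50), (210, 50),
         (215, 50), (1000, 50), (5000, 10)]
regions = find_hdrs(calls)
```

Each returned `HDR` carries the three items the slide lists for the HDR file: the size of the region, its position, and the variants it contains.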
Editor's Notes

  1. Define mapping. Define de novo. Say what variants are.
  2. Define highly divergent regions
  3. Define highly divergent regions
  4. Define highly divergent regions
  5. Define highly divergent regions
  6. Define highly divergent regions
  7. Define highly divergent regions