DATA ANALYSIS SOFTWARE: METAPY
METAPY
• https://github.com/widdowquinn/THAPBI-pycits
Method: Tool
• DNA extraction / PCR: Nested PCR
• DNAseq: Illumina overlapping reads
• QC, trim, chimera detection: Fastqc, Trimmomatic, Vsearch
• Assemble reads: Flash / PEAR
• Error correction (???): BayesHammer
• Convert FQ to FA: Biopython
• Trim primers off sequences: Python
• Cluster: Swarm, CD-HIT, Vsearch, Bowtie, Blastclust
• Compare clustering: Python (sklearn)
• Graphics; summarise species: Python
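The "Convert FQ, FA" step is listed as Biopython, where `Bio.SeqIO.convert(in_file, "fastq", out_file, "fasta")` does it in one call. The conversion itself is simple, so here is a minimal standard-library sketch, assuming plain four-line FASTQ records (no wrapped sequence or quality lines). The function name is hypothetical; this is not Metapy's code.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records.

    Assumes unwrapped FASTQ (header, sequence, '+', quality); blank lines
    are skipped and quality strings are simply dropped.
    """
    lines = [line.rstrip("\n") for line in fastq_lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("input is not a whole number of 4-line FASTQ records")
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        yield ">" + header[1:]  # keep the read ID and any description
        yield seq


if __name__ == "__main__":
    records = ["@read1", "ACGT", "+", "IIII", "@read2 sample=A", "GGCA", "+", "IIII"]
    print("\n".join(fastq_to_fasta(records)))
```

For real runs the Biopython call is the safer choice, since it also copes with wrapped records and other FASTQ variants.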
Clustering
Swarm – Not biased by input order. Works on a d-difference method (sequences are linked if they differ by at most d nucleotides)
CD-HIT – Biased by input order. Works on % identity
Vsearch – Biased by input order. Works on % identity
BLASTCLUST – Unknown bias to input order. Works on % identity
Bowtie – PERFECT matches only!
[Figure: read / database]
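The difference between order-biased and order-independent clustering shows up even on a toy example. This sketch is a simplification. It compares equal-length sequences by Hamming distance, whereas the real tools use alignments, and CD-HIT and Vsearch use % identity rather than a difference count. `greedy_centroid` mimics the greedy centroid strategy: each sequence joins the first existing centroid within the threshold. `linked_components` mimics Swarm's idea: any two sequences within d differences are linked, and the clusters are the connected groups. Swarm's abundance-based refinements are ignored, and both function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def hamming(a: str, b: str) -> int:
    """Positional differences between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def greedy_centroid(seqs: Sequence[str], max_diff: int) -> List[List[str]]:
    """Each sequence joins the first centroid within max_diff, else founds a new one.

    Which sequences become centroids depends on input order, so the result can too.
    """
    clusters: Dict[str, List[str]] = {}  # centroid -> members, in insertion order
    for s in seqs:
        for centroid in clusters:
            if hamming(s, centroid) <= max_diff:
                clusters[centroid].append(s)
                break
        else:
            clusters[s] = [s]
    return sorted(sorted(members) for members in clusters.values())


def linked_components(seqs: Sequence[str], d: int) -> List[List[str]]:
    """Single linkage: sequences within d differences share a cluster.

    Clusters are connected components of the 'within d' graph, so input order is irrelevant.
    """
    parent = {s: s for s in seqs}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if hamming(a, b) <= d:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    reads = ["AAAA", "AAAT", "AATT"]
    reordered = ["AAAT", "AAAA", "AATT"]
    print(greedy_centroid(reads, 1), greedy_centroid(reordered, 1))
    print(linked_components(reads, 1), linked_components(reordered, 1))
```

Reordering the same three sequences changes the greedy result. When AAAA comes first, AATT ends up alone. The linked result stays the same.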
New approach to metabarcoding analysis – exact sequence variants
Jamie Orr
OTU – Operational Taxonomic Unit
OTUs vs ZOTUs (zero-radius OTUs, i.e. exact sequence variants)
• OTUs define groups of similar sequences; the variation within a group could be biological or technical (PCR/sequencing).
• ZOTUs explicitly try to correct PCR and sequencing errors.
Correcting - ZOTUs
• Two sequences (A and B)
• Skew = $\mathrm{abundance}_A / \mathrm{abundance}_B$
• $\beta(d) = \dfrac{1}{2^{\alpha d + 1}}$, where $d$ is the number of positional differences between the two sequences and $\alpha$ is set by the user
• If the skew is less than $\beta(d)$, A is assigned to B
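The rule above is the skew/β(d) test from Edgar's UNOISE denoising algorithm. Here is a minimal sketch of it. It makes three assumptions: sequences are equal length, so d can be counted position by position; sequences are processed from most to least abundant, so A is always compared with a more abundant B; and `alpha=2.0` is an illustrative value, not necessarily what this pipeline used. Real UNOISE also applies a minimum-abundance filter and chimera removal, which are omitted. The function names are hypothetical.

```python
from typing import Dict, List


def beta(d: int, alpha: float) -> float:
    """Skew threshold beta(d) = 1 / 2**(alpha*d + 1)."""
    return 1.0 / 2 ** (alpha * d + 1)


def hamming(a: str, b: str) -> int:
    """Positional differences d between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def denoise(abundances: Dict[str, int], alpha: float = 2.0) -> Dict[str, str]:
    """Map each sequence to the sequence it is merged into (itself if kept as a ZOTU)."""
    ordered = sorted(abundances, key=abundances.get, reverse=True)  # most abundant first
    centroids: List[str] = []
    assignment: Dict[str, str] = {}
    for seq in ordered:  # seq plays the role of A
        for cen in centroids:  # cen is a more abundant B
            skew = abundances[seq] / abundances[cen]
            if skew < beta(hamming(seq, cen), alpha):
                assignment[seq] = cen  # A is treated as an error copy of B
                break
        else:
            centroids.append(seq)  # no B explains A, so A is kept
            assignment[seq] = seq
    return assignment


if __name__ == "__main__":
    counts = {"ACGTACGT": 1000, "ACGTACGA": 10, "TTTTACGT": 500}
    print(denoise(counts))
```

A variant one difference away from a sequence 100 times more abundant falls under β(1) = 0.125 and is merged. A distant or comparably abundant sequence survives as its own ZOTU.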
Method: Tool
• DNA extraction / PCR: Nested PCR
• DNAseq: Illumina overlapping reads
• QC, trim, chimera detection: Fastqc, Trimmomatic, Vsearch
• Assemble reads: Flash / PEAR
• Convert FQ to FA: Biopython
• Trim primers off sequences: Python
• Cluster: Swarm, CD-HIT, Vsearch, Bowtie, Blastclust, DADA2 (ZOTU)
• Compare clustering: Python (sklearn)
• Graphics; summarise species: Python
Metapy checks that the database is OK:
INFO: QC passed on sequences: assembled_skew: normal skewtest assemb_lens = 0.718 pvalue = 0.4731
database_skew: normal skewtest db_lens = -2.703 pvalue = 0.0069 Mann_whitney U test: 0.000104940190514
INFO: db_mean= 196.958 db_stdev= 18.808 assem_mean = 189.681 , assem_stdev = 21.155
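The log suggests Metapy compares the length distribution of the assembled reads with that of the database entries: a skew test on each, then a Mann-Whitney U test between them. The wording points to SciPy's `scipy.stats.skewtest` and `scipy.stats.mannwhitneyu`, but that is an assumption. So is the pass/fail rule: the passing run above still has a Mann-Whitney p of about 1e-4, so a conventional 0.05 cutoff cannot be what decides PASS. This sketch uses only the standard library. It runs a normal-approximation U test without tie correction, omits the skew tests, and uses a hypothetical `min_p` threshold.

```python
import math
import statistics
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def length_report(assembled_lens: Sequence[int], db_lens: Sequence[int]) -> Dict[str, float]:
    """Summary statistics similar to those in the Metapy log lines."""
    return {
        "assem_mean": statistics.mean(assembled_lens),
        "assem_stdev": statistics.stdev(assembled_lens),
        "db_mean": statistics.mean(db_lens),
        "db_stdev": statistics.stdev(db_lens),
        "mann_whitney_p": mann_whitney_p(assembled_lens, db_lens),
    }


def database_length_ok(assembled_lens: Sequence[int], db_lens: Sequence[int],
                       min_p: float = 1e-10) -> bool:
    """Hypothetical rule: fail only when the two length distributions differ extremely."""
    return length_report(assembled_lens, db_lens)["mann_whitney_p"] >= min_p
```

With `min_p = 1e-10`, both logged runs would be classified the same way Metapy classified them (p ≈ 1e-4 passes, p ≈ 1e-85 fails). That only shows the threshold is consistent with these two examples; it is not Metapy's actual setting.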
Metapy checks that the database is OK to use:
FAILED (this database was used in a previous publication)
The assembled size of your reads is significantly different to your database. You need to adjust your DB sequences to that of the region you sequenced.
assembled_skew: normal skewtest assemb_lens = 0.718 pvalue = 0.4731
database_skew: normal skewtest db_lens = -8.199 pvalue = 0.0000 Mann_whitney U test: 1.3189757498e-85
db_mean= 711.194 db_stdev= 218.250 assem_mean = 189.681 , assem_stdev = 21.155
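The FAILED message asks for the database sequences to be cut down to the region that was sequenced. One way to do that is an in silico PCR: keep only the sequence between the forward primer and the reverse complement of the reverse primer, and drop entries where either primer is missing. The sketch assumes exact primer matches and uses hypothetical function names and primers. Real ITS1 primers may contain IUPAC ambiguity codes, which this does not handle.

```python
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_to_amplicon(seq: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the region between the primers (primers excluded), or None if not found."""
    seq, fwd, rev = seq.upper(), fwd_primer.upper(), rev_primer.upper()
    start = seq.find(fwd)
    if start == -1:
        return None
    start += len(fwd)
    # The reverse primer binds the other strand, so search for its reverse complement.
    end = seq.find(revcomp(rev), start)
    if end == -1:
        return None
    return seq[start:end]


def trim_database(db: Dict[str, str], fwd_primer: str, rev_primer: str) -> Dict[str, str]:
    """Trim every entry; entries lacking a primer (or with nothing between them) are dropped."""
    trimmed = {}
    for name, seq in db.items():
        region = trim_to_amplicon(seq, fwd_primer, rev_primer)
        if region:
            trimmed[name] = region
    return trimmed


if __name__ == "__main__":
    # Hypothetical primers and a toy "full-length" database entry.
    entry = "NNNNACGTTTAAACCCGGGTTGGCCNNNN"
    print(trim_database({"toy_species": entry, "no_primers": "AAAA"}, "ACGTTT", "GGCCAA"))
```

After trimming, the length comparison can be rerun to confirm that the database now matches the assembled reads.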
Database matters!!!
• If you are going to assign species using a database, its entries matter!
• Reference database quality critically determines classification accuracy!
• Compare 5 Phytophthora databases.
• 2 have been used in publications
Phytophthora_db_v0.001
• Tracked on GitHub
• Can be automatically updated and generated by scripts.
Compare databases
Out of a known 10-species "spiked" sample (DNAmix)
Databases (columns):
(1) 235_FULL_length_error_removed
(2) 235_trimmed_to_ITS1
(3) Santi_modified
(4) David's pre-database trimmed to ITS1
(5) Phytophthora DB version 0.01

Tool                      | Category       | (1) | (2) | (3) | (4) | (5)
Result found in all tools | true positives |   0 |   4 |   5 |   2 |   4
Result found in all tools | mis-cluster    |   0 |   3 |   1 |   3 |   1
Blastclust                | true positives |   3 |   7 |   9 |   8 |   9
Blastclust                | mis-cluster    |  23 |  34 |  37 |  29 |  21
Bowtie                    | true positives |   4 |   4 |   5 |   3 |   4
Bowtie                    | mis-cluster    |   5 |   3 |   1 |   3 |   1
cdhit                     | true positives |   7 |   7 |   8 |   7 |   8
cdhit                     | mis-cluster    |  39 |  23 |  17 |  26 |  19
Swarm                     | true positives |   0 |   7 |   8 |   7 |   8
Swarm                     | mis-cluster    |   0 |  11 |   8 |  20 |  19
Vsearch fastclust         | true positives |   4 |   7 |   8 |   4 |   7
Vsearch fastclust         | mis-cluster    |  26 |  15 |  12 |  11 |   8
Vsearch                   | true positives |   6 |   6 |   8 |   4 |   7
Vsearch                   | mis-cluster    |   7 |   7 |   7 |   8 |   5
DADA2                     | true positives |   0 |   3 |   3 |   2 |   3
DADA2                     | mis-cluster    |   0 |   2 |   1 |   2 |   0
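The trade-off in the database comparison is easier to see as two rates. The sketch assumes "true positives" counts the spiked species correctly identified (out of 10) and "mis-cluster" counts wrong assignments. On that reading, recall = TP / 10 and precision = TP / (TP + mis-cluster). These definitions are an interpretation of the table, not stated on the slides, and `scores` is a hypothetical helper. The counts are transcribed from the table.

```python
from typing import Optional, Tuple

DATABASES = [
    "235_FULL_length_error_removed",
    "235_trimmed_to_ITS1",
    "Santi_modified",
    "David's pre-database trimmed to ITS1",
    "Phytophthora DB version 0.01",
]
KNOWN_SPECIES = 10  # species spiked into the DNAmix sample

# tool -> (true positives per database, mis-clusters per database)
RESULTS = {
    "Blastclust": ([3, 7, 9, 8, 9], [23, 34, 37, 29, 21]),
    "Bowtie": ([4, 4, 5, 3, 4], [5, 3, 1, 3, 1]),
    "cdhit": ([7, 7, 8, 7, 8], [39, 23, 17, 26, 19]),
    "Swarm": ([0, 7, 8, 7, 8], [0, 11, 8, 20, 19]),
    "Vsearch fastclust": ([4, 7, 8, 4, 7], [26, 15, 12, 11, 8]),
    "Vsearch": ([6, 6, 8, 4, 7], [7, 7, 7, 8, 5]),
    "DADA2": ([0, 3, 3, 2, 3], [0, 2, 1, 2, 0]),
}


def scores(tool: str, db_index: int) -> Tuple[float, Optional[float]]:
    """Return (recall, precision) for one tool on one database.

    Precision is None when the tool reported nothing for that database.
    """
    tp = RESULTS[tool][0][db_index]
    mis = RESULTS[tool][1][db_index]
    recall = tp / KNOWN_SPECIES
    called = tp + mis
    precision = tp / called if called else None
    return recall, precision


if __name__ == "__main__":
    last = len(DATABASES) - 1  # Phytophthora DB version 0.01
    for tool in RESULTS:
        recall, precision = scores(tool, last)
        print(f"{tool:18s} recall={recall:.1f} precision={precision:.2f}")
```

For the Phytophthora DB version 0.01 column, this gives DADA2 a precision of 1.00 at recall 0.3 and Bowtie 0.80 at recall 0.4. Blastclust gets 0.30 at recall 0.9.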
Message 1:
• Database length matters.
• Including non-ITS1 regions has a negative impact (obvious, but such databases have been used in publications!)
Message 2:
• Bowtie and DADA2 reduced the mis-cluster (false positive) rate, but also the true positive rate
Message 2:
• Bowtie and DADA2 reduced
false positive rate
Message 3:
• Blastclust is the worst. We knew that already!!
• Blastclust does not produce reliable identifications with these ITS1 databases.
• Blastclust is also deprecated: do not use it!
Message 4:
• These results are helping us refine the DB. The mis-cluster rate is now decreasing.
Other software made for this project
Quantify gene copy number:
• The software estimates the copy number of a given gene of interest.
• $\mathrm{ITS}_{\text{theoretical}} = \sum \mathrm{ITS}_{\text{hits}} \cdot \dfrac{\bar{x}\,\mathrm{ITS}_{\text{coverage (assembled)}}}{\bar{x}\,\mathrm{gene}_{\text{coverage}}}$
https://github.com/widdowquinn/THAPBI/tree/master/Phyt_ITS_identifying_pipeline
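As a worked sketch of the copy-number formula, assume ∑ITS_hits is the number of ITS copies found in the assembly. Assume also that each coverage term is a list of read depths, averaged to give x̅. The function name and inputs are hypothetical, not taken from the linked repository.

```python
from statistics import mean
from typing import Sequence


def its_copy_number(its_hits: int, its_coverage: Sequence[float],
                    gene_coverage: Sequence[float]) -> float:
    """ITS(theoretical) = ITS hits * mean(ITS coverage) / mean(reference gene coverage)."""
    if not its_coverage or not gene_coverage:
        raise ValueError("coverage values are required for both ITS and the reference gene")
    gene_mean = mean(gene_coverage)
    if gene_mean == 0:
        raise ValueError("reference gene coverage must be non-zero")
    return its_hits * mean(its_coverage) / gene_mean


if __name__ == "__main__":
    # 2 ITS hits, mean ITS coverage 110, mean reference-gene coverage 11.
    print(its_copy_number(2, [100, 120], [10, 12]))
```

In the example, 2 × 110 / 11 gives an estimate of 20 ITS copies. The coverage ratio corrects for the ITS copies that the assembler collapses into a single hit.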
Sanger sequencing identification:
• No need for a "pointy and clicky" sequence editor followed by web BLAST
• Does it all for you! Sanger read ----> Species
https://github.com/peterthorpe5/public_scripts/tree/master/Sanger_read_metagenetics
Future directions
“Pipeline” needs to be verified with controls.
• Sequencing controls: known spikes and “fake” sequences to obtain error rates and identification limits
• TODO: Write a Bayesian-based clustering / probabilistic model
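For the "fake" spike-in controls, the simplest error-rate estimate is the fraction of mismatched bases between reads assigned to a synthetic sequence and that known sequence. The sketch assumes reads are already assigned to the spike and start at the same position with no indels. Real reads would need alignment first. The function name is hypothetical.

```python
from typing import Iterable


def spike_error_rate(reads: Iterable[str], reference: str) -> float:
    """Mismatches per compared base between spike-in reads and the known spike sequence.

    Assumes each read starts at position 0 of `reference` and contains no indels;
    only overlapping positions are compared.
    """
    mismatches = compared = 0
    for read in reads:
        pairs = list(zip(read, reference))  # zip stops at the shorter sequence
        mismatches += sum(r != t for r, t in pairs)
        compared += len(pairs)
    return mismatches / compared if compared else 0.0


if __name__ == "__main__":
    print(spike_error_rate(["ACGT", "ACGA"], "ACGT"))  # 1 mismatch in 8 bases
```

An error rate measured this way could then inform parameters such as the d threshold or α used when clustering or denoising the real samples.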
Thanks!
Plant health testing and natural ecosystem surveillance via in situ water sampling and metabarcoding of Phytophthora diversity
THE TEAM!
David Cooke
Leighton Pritchard
Eva Randall & Beatrix Clark

Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 

Recently uploaded (20)

Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 

DATA ANALYSIS SOFTWARE: METAPY

  • 3. Pipeline (Method: Tool): DNA extraction/PCR: Nested PCR; DNAseq: Illumina overlapping reads; QC, trim, chimera detection: FastQC, Trimmomatic, Vsearch; Assemble reads: FLASH / PEAR; Error correction ???: BayesHammer; Convert FQ to FA: Biopython; Trim primers off sequences: Python; Cluster: Swarm, CD-HIT, Vsearch, Bowtie, Blastclust; Compare clustering: Python (sklearn); Graphics and summarise species: Python
  • 4. Clustering • Swarm: not biased by input order; clusters using a d-difference threshold (d = maximum number of differences between amplicons) • CD-HIT: biased by input order; works on % identity • Vsearch: biased by input order; works on % identity • BLASTCLUST: unknown bias to input order; works on % identity • Bowtie: PERFECT matches only!
  • 5. New approach to metabarcoding analysis: exact sequence variants (Jamie Orr). OTU: Operational Taxonomic Unit
  • 6. OTUs vs ZOTUs (exact sequence variants) • OTUs define groups of similar sequences; the variation within a group could be biological or technical (PCR/sequencing). • ZOTUs explicitly try to correct PCR and sequencing errors.
  • 7. Correcting: ZOTUs • Two sequences, A (less abundant) and B • Skew = abundance(A) / abundance(B) • $\beta(d) = \dfrac{1}{2^{\alpha d + 1}}$, where $d$ is the number of positional differences between the two sequences and $\alpha$ is set by the user • If the skew is less than $\beta(d)$, A is assigned to B
  • 8. Pipeline (Method: Tool): DNA extraction/PCR: Nested PCR; DNAseq: Illumina overlapping reads; QC, trim, chimera detection: FastQC, Trimmomatic, Vsearch; Assemble reads: FLASH / PEAR; Convert FQ to FA: Biopython; Trim primers off sequences: Python; Cluster: Swarm, CD-HIT, Vsearch, Bowtie, Blastclust, DADA2 (ZOTU); Compare clustering: Python (sklearn); Graphics and summarise species: Python
  • 9. Metapy checks that the database is OK to use:
    INFO: QC passed on sequences:
    assembled_skew: normal skewtest assemb_lens = 0.718 pvalue = 0.4731
    database_skew: normal skewtest db_lens = -2.703 pvalue = 0.0069
    Mann_whitney U test: 0.000104940190514
    INFO: db_mean= 196.958 db_stdev= 18.808 assem_mean = 189.681 , assem_stdev = 21.155
  • 10. Metapy checks that the database is OK to use: FAILED (database used in a previous publication)
    The assembled size of your reads is significantly different to your database. You need to adjust your DB sequences to that of the region you sequenced.
    assembled_skew: normal skewtest assemb_lens = 0.718 pvalue = 0.4731
    database_skew: normal skewtest db_lens = -8.199 pvalue = 0.0000
    Mann_whitney U test: 1.3189757498e-85
    INFO: db_mean= 711.194 db_stdev= 218.250 assem_mean = 189.681 , assem_stdev = 21.155
  • 11. Database matters!!! • If you assign species using a reference database, its entries matter! • Reference database quality critically determines classification accuracy! • We compared 5 Phytophthora databases, 2 of which have been used for publications.
  • 12. Database matters!!! Phytophthora_db_v0.001 • Tracked on GitHub • Can be automatically updated and generated by scripts. • If you assign species using a reference database, its entries matter! • Reference database quality critically determines classification accuracy! • We compared 5 Phytophthora databases, 2 of which have been used for publications.
  • 13. Compare databases. Out of a known 10-species "spiked" sample (DNAmix):
    Tool | Category | 235_FULL length_error_removed | 235_trimmed_to_ITS1 | Santi_modified | David's pre-database trimmed to ITS1 | Phytophthora DB version 0.01
    Result found in all tools | true positives | 0 | 4 | 5 | 2 | 4
    Result found in all tools | mis-cluster | 0 | 3 | 1 | 3 | 1
    Blastclust | true positives | 3 | 7 | 9 | 8 | 9
    Blastclust | mis-cluster | 23 | 34 | 37 | 29 | 21
    Bowtie | true positives | 4 | 4 | 5 | 3 | 4
    Bowtie | mis-cluster | 5 | 3 | 1 | 3 | 1
    cdhit | true positives | 7 | 7 | 8 | 7 | 8
    cdhit | mis-cluster | 39 | 23 | 17 | 26 | 19
    Swarm | true positives | 0 | 7 | 8 | 7 | 8
    Swarm | mis-cluster | 0 | 11 | 8 | 20 | 19
    Vsearch fastclust | true positives | 4 | 7 | 8 | 4 | 7
    Vsearch fastclust | mis-cluster | 26 | 15 | 12 | 11 | 8
    Vsearch | true positives | 6 | 6 | 8 | 4 | 7
    Vsearch | mis-cluster | 7 | 7 | 7 | 8 | 5
    DADA2 | true positives | 0 | 3 | 3 | 2 | 3
    DADA2 | mis-cluster | 0 | 2 | 1 | 2 | 0
  • 14. Compare databases (same table as slide 13). Message 1: • Database length matters. • Including the non-ITS1 region has a negative impact (obvious, but used in publications!)
  • 15. Compare databases (same table as slide 13). Message 2: • Bowtie and DADA2 reduced the mis-cluster rate (and the true positive rate).
  • 16. Compare databases (same table as slide 13). Message 2: • Bowtie and DADA2 reduced the false positive rate. Message 3: • Blastclust is the worst. We knew that already!! • Blastclust does not produce reliable identifications with these ITS1 databases. • Blastclust is also deprecated: do not use it!
  • 17. Compare databases (same table as slide 13). Message 2: • Bowtie and DADA2 reduced the false positive rate. Message 3: • Blastclust is the worst. Message 4: • These results are helping us refine the DB; the mis-cluster rate is now falling.
  • 18. Other software made for this project
    Quantify gene copy number:
    • Software estimates the copy number of a given gene of interest.
    • $\mathrm{ITS}_{\mathrm{theoretical}} = \sum \mathrm{ITS}_{\mathrm{hits}} \cdot \dfrac{\bar{x}_{\mathrm{ITS\ coverage\ (assembled)}}}{\bar{x}_{\mathrm{gene\ coverage}}}$
    • https://github.com/widdowquinn/THAPBI/tree/master/Phyt_ITS_identifying_pipeline
    Sanger sequencing identification:
    • No need for a "pointy and clicky" sequence editor followed by web BLAST
    • Does it all for you! Sanger read → species
    • https://github.com/peterthorpe5/public_scripts/tree/master/Sanger_read_metagenetics
  • 19. Future directions “Pipeline” needs to be verified with controls.  Sequencing controls: known spikes, “fake” sequences to obtain error rates, identification limitations  TODO: Write Bayesian based clustering/ probabilistic model
  • 20. Thanks! Plant health testing and natural ecosystem surveillance via in situ water sampling and metabarcoding of Phytophthora diversity. THE TEAM! David Cooke, Leighton Pritchard, Eva Randall & Beatrix Clark
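The ZOTU correction rule from slide 7 can be illustrated with a short Python sketch. It is not metapy's, UNOISE's or DADA2's code. It assumes equal-length reads, so "positional differences" becomes a simple Hamming count, and it uses an illustrative default of α = 2.0. Unique sequences are visited from most to least abundant. A sequence A joins the first more-abundant centroid B for which abundance(A)/abundance(B) < β(d); otherwise A becomes a new centroid. The names `beta`, `positional_differences` and `assign_to_centroids` are hypothetical.

```python
"""Sketch of abundance-skew denoising in the style described on slide 7.

Assumptions (not from the slides): reads are equal length, "positional
differences" is a Hamming count, and alpha defaults to 2.0.
"""


def beta(d: int, alpha: float) -> float:
    """Skew threshold beta(d) = 1 / 2**(alpha * d + 1)."""
    return 1.0 / 2 ** (alpha * d + 1)


def positional_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_to_centroids(abundances: dict[str, int], alpha: float = 2.0) -> dict[str, str]:
    """Map each unique sequence to the centroid (ZOTU) it is assigned to.

    Sequences are visited from most to least abundant. A sequence A is
    assigned to the first existing centroid B with
    abundance(A) / abundance(B) < beta(d); otherwise A becomes a new centroid.
    """
    order = sorted(abundances, key=lambda s: (-abundances[s], s))
    centroids: list[str] = []
    assignment: dict[str, str] = {}
    for seq in order:
        parent = None
        for centroid in centroids:
            if len(centroid) != len(seq):
                continue  # the Hamming-based sketch cannot compare these
            d = positional_differences(seq, centroid)
            skew = abundances[seq] / abundances[centroid]
            if skew < beta(d, alpha):
                parent = centroid
                break
        if parent is None:
            centroids.append(seq)
            assignment[seq] = seq
        else:
            assignment[seq] = parent
    return assignment


if __name__ == "__main__":
    reads = {"ACGT": 1000, "TTTT": 500, "ACGC": 400, "ACGA": 10}
    print(assign_to_centroids(reads))
    # → {'ACGT': 'ACGT', 'TTTT': 'TTTT', 'ACGC': 'ACGC', 'ACGA': 'ACGT'}
```

In the toy run, "ACGA" differs from "ACGT" at one position. Its skew is 10/1000 = 0.01, below β(1) = 0.125, so it is treated as an error variant. "ACGC" is also one difference away, but its skew of 0.4 is far too high, so it stays a separate variant.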

Editor's Notes

  1. First: why, whither, and what does it mean? (“What is the likely future of”) This is to remind us that language also changes and evolves, as do species concepts. Terminology: "metabarcoding" is the better term.
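The copy-number formula on slide 18 is a simple ratio, sketched below in Python. The slide does not define its terms, so the reading used here is an assumption. The ITS hit counts are summed, then scaled by mean assembled ITS coverage divided by the mean coverage of the comparison gene. The function name `theoretical_its_copies` and its parameters are hypothetical, and this is not the code in the linked THAPBI repository.

```python
"""Sketch of the slide 18 copy-number estimate (interpretation assumed)."""
from statistics import mean


def theoretical_its_copies(its_hits, its_coverage, gene_coverage) -> float:
    """ITS(theoretical) = sum(ITS_hits) * mean(ITS coverage) / mean(gene coverage).

    its_hits:      ITS hit counts found in the assembly (assumed meaning)
    its_coverage:  read-coverage values for the assembled ITS
    gene_coverage: read-coverage values for the comparison gene
    """
    gene_mean = mean(gene_coverage)
    if gene_mean == 0:
        raise ValueError("mean gene coverage is zero; ratio is undefined")
    return sum(its_hits) * mean(its_coverage) / gene_mean


if __name__ == "__main__":
    # Toy numbers: 2 ITS hits whose coverage is ~10x that of the comparison gene.
    print(theoretical_its_copies([1, 1], [290, 310], [30, 30]))  # → 20.0
```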