(an example of)
Computing the
Microbial World
Rob Beiko
June 25, 2014
Siddique et al. (2014) Front Microbiol
Lawley et al., PLoS Genet (2012)
The Breakfast Organisms
"Bacon Fields" Author: Michael DeForge
240M “pieces”, each 150 nucleotides long
3.6 x 10^10 nucleotides
~40 GB
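The figures above can be checked with a short calculation. This is a minimal sketch that assumes one byte per nucleotide and ignores read names and quality scores; the result is of the same order as the ~40 GB quoted, and the slide does not say what accounts for the difference.

```python
# Back-of-the-envelope size of the data set described above.
num_reads = 240_000_000   # 240M "pieces"
read_length = 150         # nucleotides per piece

total_nt = num_reads * read_length

# Assumption: one byte per nucleotide, no read names or quality scores.
size_gb = total_nt / 1e9

print(f"{total_nt:.1e} nucleotides, about {size_gb:.0f} GB of bases")
```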
Hundreds of “species”
Genomes between 1.5M – 6M nucleotides
150 nt x 150 nt
We know this And this
But not this
who is doing what?
Marker genes WHO
Environmental “Shotgun” WHAT
The challenge of
METAGENOME CLASSIFICATION
Clues – Sequence similarity
(homology)
150 nt x 150 nt
Reference
genes
Take the WHOLE SEQUENCE
Best
Worst
Clues – composition
150 nt x 150 nt
Reference
genome
k-mer profiles
Genome #1:
20% each G & C
30% each A & T
Genome #2:
24% each G & C
26% each A & T
Best
Worst
Take a
K-MER FREQUENCY
DECOMPOSITION
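A k-mer frequency decomposition can be sketched in a few lines. The function name `kmer_profile` is hypothetical, and the sketch counts overlapping k-mers on one strand only, which is an assumption rather than something the slides specify.

```python
from collections import Counter

def kmer_profile(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

# With k = 1 the profile is just the base composition.
base_composition = kmer_profile("GGCTGGACCA", 1)
dimer_profile = kmer_profile("GGCTGGACCA", 2)
```

Larger values of k give a more specific profile but need more sequence to estimate reliably, which matters for 150 nt fragments.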
Homology >> Composition
* GGCTGGACCA
1 GACTGGACCA
2 GGCCGGACTA
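The labels 1 and 2 in the example above appear to match the number of mismatches against the starred sequence. A minimal sketch, assuming an ungapped position-by-position comparison (real homology searches use alignments that allow gaps); the helper name `mismatches` is hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

query = "GGCTGGACCA"
candidates = ["GACTGGACCA", "GGCCGGACTA"]

# Rank candidates from most to least similar to the query.
ranked = sorted(candidates, key=lambda s: mismatches(query, s))
```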
But homology evidence can
mislead or be absent
Homology + Composition >
Homology alone
Query: GGCTGGACCA
Subject: GCCTGGTCCA, GCCAGGTGCA, GCCTGTCCA, NNNNNNNNNN, ...
Exact string search? NO
BLAST? OK, but SLOW!
A compromise: UBLAST
• BLAST seeks out very similar “anchor points”
between a pair of sequences before doing a more
thorough search
• Typically, a query is compared against all candidate DB
sequences, but most will return no hits
UBLAST:
(1) Query (GGCTGGACCA), DB sequences (GCCTGTCCA, GCCAGGTGCA, GCCTGGTCCA, NNNNNNNNNN, ...)
(2) k-mer table
(3) Rank DB based on k-mer matching (GCCTGGTCCA, GCCAGGTGCA, GCCTGTCCA, NNNNNNNNNN, ...)
(4) Do detailed search until there is no more point
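The four steps can be sketched as follows. This is an illustration of the idea only, not the actual UBLAST implementation: the function names, the ungapped `identity` score standing in for a detailed alignment, and the `min_identity` and `max_rejects` thresholds are all assumptions made for the example.

```python
from collections import defaultdict

def kmers(seq, k):
    """Set of distinct overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_kmer_table(db, k):
    """Step (2): map each k-mer to the indices of DB sequences containing it."""
    table = defaultdict(set)
    for idx, seq in enumerate(db):
        for kmer in kmers(seq, k):
            table[kmer].add(idx)
    return table

def identity(a, b):
    """Hypothetical stand-in for a detailed alignment: ungapped identity."""
    longest = max(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest if longest else 0.0

def search(query, db, k=3, min_identity=0.7, max_rejects=2):
    table = build_kmer_table(db, k)

    # Step (3): rank DB sequences by the number of k-mers shared with the query.
    shared = defaultdict(int)
    for kmer in kmers(query, k):
        for idx in table.get(kmer, ()):
            shared[idx] += 1
    ranked = sorted(shared, key=lambda i: (-shared[i], i))

    # Step (4): detailed comparison in rank order, stopping after too many failures.
    hits, rejects = [], 0
    for idx in ranked:
        score = identity(query, db[idx])
        if score >= min_identity:
            hits.append((db[idx], score))
        else:
            rejects += 1
            if rejects >= max_rejects:
                break
    return hits

db = ["GCCTGGTCCA", "GCCAGGTGCA", "GCCTGTCCA", "TTTTTTTTTT"]
hits = search("GGCTGGACCA", db)
```

Sequences that share no k-mer with the query are never compared in detail, which is where the time saving over comparing against every DB sequence comes from.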
Compositional models
• Interpolated Markov models: adaptively generate
frequency models based on extending k-mers with
sufficiently high frequencies
• One model per genome
• Evaluate probability of each k-mer in query sequence,
given shorter k-mers in sequence
• Model construction can take a while
k = 4 k = 5 k = 6 k = 7
PhymmBL: Brady and Salzberg (2009) Nat Methods
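To show what scoring each base given the shorter k-mer before it looks like, here is a sketch of a plain fixed-order Markov model. It is a deliberately simpler stand-in: an interpolated Markov model combines several orders adaptively, which this code does not do, and it is not the PhymmBL implementation. The names `train_markov` and `log_prob` and the add-one pseudocount are assumptions.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(genome, order):
    """Count which base follows each context of length `order` in the genome."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(genome)):
        counts[genome[i - order:i]][genome[i]] += 1
    return counts

def log_prob(fragment, counts, order, pseudocount=1.0):
    """Log-probability of each base in the fragment given the `order` bases before it."""
    total = 0.0
    for i in range(order, len(fragment)):
        context, base = fragment[i - order:i], fragment[i]
        seen = counts.get(context, {})
        numerator = seen.get(base, 0) + pseudocount
        denominator = sum(seen.values()) + pseudocount * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

# One model per genome; the query is assigned to the best-scoring model.
model_a = train_markov("ACGT" * 50, order=2)
model_b = train_markov("AATT" * 50, order=2)
fragment = "ACGTACGTAC"
score_a = log_prob(fragment, model_a, order=2)
score_b = log_prob(fragment, model_b, order=2)
```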
An alternative: Naïve Bayes
• Just compute the frequency of each k-mer for a fixed
length k
• Build one frequency model for each genome
• FAST
• Assumes conditional independence – may not matter
Probability of a query fragment F originating from genome G_i:
$$P(F \mid G_i) = \prod_{j} P(w_j \mid G_i)$$
where the product runs over all k-mers w_j in the fragment, and P(w_j | G_i) is the frequency of that k-mer in G_i
Parks et al. (2011) BMC Bioinformatics
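A minimal sketch of a naïve Bayes k-mer classifier along these lines. The function names, the pseudocount for unseen k-mers, the use of log-probabilities to avoid underflow, and the equal prior across genomes are assumptions for illustration, not details taken from Parks et al.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train(genomes, k, pseudocount=1.0):
    """Build one k-mer count model per genome. `genomes` maps name -> sequence."""
    models = {}
    for name, seq in genomes.items():
        counts = kmer_counts(seq, k)
        total = sum(counts.values()) + pseudocount * 4 ** k
        models[name] = (counts, total)
    return models

def classify(fragment, models, k, pseudocount=1.0):
    """Return the best-scoring genome and the log-probability under every model."""
    fragment_kmers = kmer_counts(fragment, k)
    scores = {}
    for name, (counts, total) in models.items():
        # Sum of log frequencies = log of the product over all k-mers in the fragment.
        scores[name] = sum(
            n * math.log((counts[kmer] + pseudocount) / total)
            for kmer, n in fragment_kmers.items()
        )
    return max(scores, key=scores.get), scores

genomes = {
    "GC_rich": "GCGCGGCCGCGGCGCC" * 20,
    "AT_rich": "ATATTAATATTTAATA" * 20,
}
models = train(genomes, k=3)
best, scores = classify("GCGGCCGC", models, k=3)
```

Training is a single counting pass per genome, which is why this approach is fast compared with building a more elaborate model.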
RITA: Rapid Identification of
Taxonomic Assignments
UBLAST filter
MacDonald et al. (2012) Nucleic Acids Res
Evaluation set
• “Fake metagenome”: take sequences from known
genomes, randomly sample fragments of 50, 100,
200 and 1000 nt in different trials
• Build reference models from other genomes – can
leave close relatives out of reference model
• Leave out other strains within the same species – not so
hard
• Leave out other classes in the same phylum - HARD
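The evaluation set can be sketched as random fragment sampling plus a leave-out filter on the reference genomes. The function names and the `group_of` mapping (genome name to species, class, or other rank) are hypothetical, and uniform sampling of start positions is an assumption.

```python
import random

def sample_fragments(genome, length, n, rng):
    """Draw n random fragments of the given length from one genome sequence."""
    if length > len(genome):
        raise ValueError("fragment longer than genome")
    starts = (rng.randrange(len(genome) - length + 1) for _ in range(n))
    return [genome[s:s + length] for s in starts]

def leave_out(reference, held_out_groups, group_of):
    """Drop every reference genome whose group (e.g. species or class) is held out."""
    return {name: seq for name, seq in reference.items()
            if group_of[name] not in held_out_groups}

rng = random.Random(0)          # fixed seed so trials are repeatable
genome = "ACGT" * 500           # toy stand-in for a real genome
trials = {length: sample_fragments(genome, length, 5, rng)
          for length in (50, 100, 200, 1000)}

reference = {"g1": "ACGT", "g2": "GGCC", "g3": "ATAT"}
group_of = {"g1": "classA", "g2": "classA", "g3": "classB"}
reduced = leave_out(reference, {"classA"}, group_of)
```

Holding out a broader group removes more of the query's close relatives from the reference set, which is what makes the class-level test harder than the strain-level one.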
But does it work?
[Figure: Full RITA vs. best class (homology and composition agree), plotted against DNA sequence length. Panels: predicting genus from different species; predicting phylum from different class.]
Conclusions
• Careful attention needs to be paid to the choice of
approach – simple is better
• RITA illustrates two key points in (microbial)
bioinformatics:
1. Homology: How heuristic are you willing to go?
2. Naïve Bayes: Keep it simple until told otherwise
• Technological change means that many bioinformatics
algorithms will be irrelevant in 5 years
FIN

Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Beiko hpcs

  • 1. (an example of) Computing the Microbial World Rob Beiko June 25, 2014
  • 2.
  • 3. Siddique et al. (2014) Front Microbiol
  • 4. Lawley et al., PLoS Genet (2012)
  • 5.
  • 6. The Breakfast Organisms "Bacon Fields" Author: Michael DeForge
  • 7. 240M “pieces”, each 150 nucleotides long 3.6 x 10^10 nucleotides ~40 GB Hundreds of “species” Genomes between 1.5M – 6M nucleotides
  • 8. 150 nt x 150 nt We know this And this But not this
  • 9. who is doing what? Marker genes WHO Environmental “Shotgun” WHAT The challenge of METAGENOME CLASSIFICATION
  • 10. Clues – Sequence similarity (homology) 150 nt x 150 nt Reference genes Take the WHOLE SEQUENCE Best Worst
  • 11. Clues – composition 150 nt x 150 nt Reference genome k-mer profiles Genome #1: 20% G & C 30% A & T Genome #2: 24% G & C 26% A & T Best Worst Take a K-MER FREQUENCY DECOMPOSITION
  • 12. Homology >> Composition * GGCTGGACCA 1 GACTGGACCA 2 GGCCGGACTA But homology evidence can mislead or be absent Homology + Composition > Homology alone
  • 14. A compromise: UBLAST • BLAST seeks out very similar “anchor points” between a pair of sequences before doing a more thorough search • Typically, a query is compared against all candidate DB sequences, but most will return no hits UBLAST: (1) Query, DB sequences (2) k-mer table (3) Rank DB based on k-mer matching (4) Do detailed search until there is no more point [Figure: query GGCTGGACCA against a DB containing GCCTGGTCCA, GCCAGGTGCA and GCCTGTCCA among unrelated sequences]
  • 15. Compositional models • Interpolated Markov models: adaptively generate frequency models based on extending k-mers with sufficiently high frequencies • One model per genome • Evaluate probability of each k-mer in query sequence, given shorter k-mers in sequence • Model construction can take a while k = 4 k = 5 k = 6 k = 7 PhymmBL: Brady and Salzberg (2009) Nat Methods
  • 16. An alternative: Naïve Bayes • Just compute the frequency of each k-mer for a fixed length k • Build one frequency model for each genome • FAST • Assumes conditional independence – may not matter • Probability of a query fragment originating from genome Gi: for all k-mers in the fragment, take the product of the frequency of that k-mer in Gi Parks et al. (2011) BMC Bioinformatics
  • 17. RITA: Rapid Identification of Taxonomic Assignments UBLAST filter MacDonald et al. (2012) Nucleic Acids Res
  • 18. Evaluation set • “Fake metagenome”: take sequences from known genomes, randomly sample fragments of 50, 100, 200 and 1000 nt in different trials • Build reference models from other genomes – can leave close relatives out of reference model • Leave out other strains within the same species – not so hard • Leave out other classes in the same phylum - HARD
  • 19.
  • 20. But does it work? [Figure: results for full RITA and for the best class (homology and composition agree); x-axis: DNA sequence length; panels: Predicting genus from different species, Predicting phylum from different class]
  • 21. Conclusions • Careful attention needs to be paid to the choice of approach – simple is better • RITA illustrates two key points in (microbial) bioinformatics: 1. Homology: How heuristic are you willing to go? 2. Naïve Bayes: Keep it simple until told otherwise • Technological change means that many bioinformatics algorithms will be irrelevant in 5 years
  • 22. FIN
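
The Naïve Bayes idea on slide 16 (one fixed-length k-mer frequency model per genome, with a fragment scored by multiplying the frequencies of its k-mers) can be sketched roughly as follows. This is a minimal illustration, not the implementation from Parks et al. (2011): the toy genomes, the function names, the choice of k = 3 and the add-one pseudocount for unseen k-mers are all assumptions made here. Log-probabilities are summed instead of multiplying raw frequencies to avoid numerical underflow.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train_model(genome, k, pseudocount=1.0):
    """Build one log-frequency model for a genome.

    The pseudocount (an assumption of this sketch) keeps unseen
    k-mers from getting zero probability.
    """
    counts = kmer_counts(genome, k)
    total = sum(counts.values()) + pseudocount * 4 ** k
    log_freq = {kmer: math.log((c + pseudocount) / total)
                for kmer, c in counts.items()}
    unseen = math.log(pseudocount / total)
    return log_freq, unseen

def log_likelihood(fragment, model, k):
    """Sum of log-frequencies over all k-mers in the fragment.

    Summing logs is the same as multiplying the frequencies, which is
    where the conditional-independence assumption enters.
    """
    log_freq, unseen = model
    return sum(log_freq.get(fragment[i:i + k], unseen)
               for i in range(len(fragment) - k + 1))

def classify(fragment, models, k):
    """Return the name of the genome whose model scores the fragment highest."""
    return max(models, key=lambda name: log_likelihood(fragment, models[name], k))

# Toy reference genomes (hypothetical, for illustration only).
genomes = {
    "gc_rich": "GGCCGCGGCGCCGGGCCGCGCGGCCGGCGC" * 5,
    "at_rich": "AATTATATAAATTTATAATATTAAATATTA" * 5,
}
K = 3
models = {name: train_model(seq, K) for name, seq in genomes.items()}

print(classify("GCGGCCGCGG", models, K))
print(classify("ATTATAAATA", models, K))
```

Training is a single counting pass per genome, which is why the slide can call this approach fast compared with building interpolated Markov models.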
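
The four UBLAST steps on slide 14 (query and DB sequences, k-mer table, rank the DB by k-mer matching, detailed search until there is no more point) can be sketched in a few lines. This is only an illustration of the filtering idea, not the UBLAST algorithm itself: the k = 3 word size, the `min_identity` and `max_rejects` thresholds, the sequence ids and the use of `difflib.SequenceMatcher` as a stand-in for a real alignment are all assumptions of this sketch. The sequences are the ones shown on the slide, plus one made-up unrelated entry.

```python
from difflib import SequenceMatcher

def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_kmer_table(db, k):
    """Step 2: map each k-mer to the ids of the DB sequences containing it."""
    table = {}
    for seq_id, seq in db.items():
        for kmer in kmers(seq, k):
            table.setdefault(kmer, set()).add(seq_id)
    return table

def rank_candidates(query, table, k):
    """Step 3: rank DB sequences by the number of k-mers shared with the query.

    Sequences sharing no k-mer never appear, so they are never aligned.
    """
    shared = {}
    for kmer in kmers(query, k):
        for seq_id in table.get(kmer, ()):
            shared[seq_id] = shared.get(seq_id, 0) + 1
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))

def search(query, db, k=3, min_identity=0.6, max_rejects=2):
    """Step 4: compare candidates in ranked order, stopping after max_rejects failures."""
    table = build_kmer_table(db, k)
    hits, rejects = [], 0
    for seq_id, _ in rank_candidates(query, table, k):
        # SequenceMatcher is a crude stand-in for a real alignment score.
        identity = SequenceMatcher(None, query, db[seq_id]).ratio()
        if identity >= min_identity:
            hits.append((seq_id, identity))
        else:
            rejects += 1
            if rejects >= max_rejects:
                break  # "no more point": lower-ranked candidates are skipped
    return hits

# Step 1: query and DB sequences.
query = "GGCTGGACCA"
db = {
    "db1": "GCCTGGTCCA",
    "db2": "GCCAGGTGCA",
    "db3": "GCCTGTCCA",
    "unrelated": "TTTTTTTTTT",  # hypothetical filler standing in for the N rows
}

ranking = rank_candidates(query, build_kmer_table(db, 3), 3)
hits = search(query, db)
print(ranking)
print([seq_id for seq_id, _ in hits])
```

The saving comes from the ranking step: sequences with no shared k-mers are never compared in detail, and the early exit avoids working through weak candidates at the bottom of the list.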
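
The evaluation design on slide 18 (sample fragments of 50, 100, 200 and 1000 nt from a known genome, then build the reference from other genomes while leaving close relatives out) might be set up along these lines. Everything concrete here is hypothetical: the toy taxonomy, the strain names, the random 5000 nt genome and the helper names are placeholders for illustration and do not come from the RITA paper.

```python
import random

# Hypothetical toy taxonomy: genome name -> lineage labels.
TAXONOMY = {
    "strain_A1": {"phylum": "P1", "class": "C1", "species": "S1"},
    "strain_A2": {"phylum": "P1", "class": "C1", "species": "S1"},
    "strain_B": {"phylum": "P1", "class": "C1", "species": "S2"},
    "strain_C": {"phylum": "P1", "class": "C2", "species": "S3"},
    "strain_D": {"phylum": "P2", "class": "C3", "species": "S4"},
}

def sample_fragments(genome, length, n, rng):
    """Draw n fragments of the given length at uniformly random start positions."""
    last_start = len(genome) - length
    starts = [rng.randint(0, last_start) for _ in range(n)]
    return [genome[s:s + length] for s in starts]

def reference_genomes(source, rank, taxonomy=TAXONOMY):
    """Genomes left in the reference after removing every genome that
    shares the source genome's label at the given rank (source included)."""
    excluded = taxonomy[source][rank]
    return sorted(name for name, lineage in taxonomy.items()
                  if lineage[rank] != excluded)

rng = random.Random(0)  # fixed seed so the fake metagenome is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # stand-in genome

# One trial per fragment length, as on the slide.
fragments = {length: sample_fragments(genome, length, 10, rng)
             for length in (50, 100, 200, 1000)}

easy = reference_genomes("strain_A1", "species")  # same-species strains removed
hard = reference_genomes("strain_A1", "class")    # whole class removed
print(easy)
print(hard)
```

Raising the excluded rank from species to class removes more close relatives from the reference, which is what makes the second setting the hard case.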