Classifying biological information
the promise and perils of DNA sequences
Rob Beiko
September 19
DCSI 2013
Norm MacDonald Donovan Parks
1
From Francis Crick’s letter to his son Michael, 1953
2
Your Genome and You
23 chromosomes
20,000 genes
3.1 billion nucleotides
Mycobacterium
tuberculosis
1 chromosome
4,000 genes
4.4 million nucleotides
Tremblaya
princeps
1 chromosome
110 genes
138,931 nucleotides
Daphnia pulex
12 chromosomes
31,000 genes
200 million nucleotides
Paris japonica
?? chromosomes
??? genes
150 billion nucleotides
3
DNA Encodes the Business of the Cell
Chromosome
Chromosome region
Gene GGATCCTATGGATGCATGCCGCCGTAGTATAAT…
Protein
Protein functions
Copying the genome and the cell
Transport into and out of the cell
Energy production and storage
Cellular defense
etc…
4
Three key questions
(1) What genes in an organism’s genome are responsible for its
unique properties? For example:
- Ability to withstand environmental challenges
- Developmental “plan”
- Sources of nutrients
(2) How can we use properties of an organism’s genome as a
“fingerprint” to identify that organism?
(3) What mutations to an organism’s genome (including single base
changes) are responsible for altered properties of that organism?
5
Microbes: hot or not?
+ ++ +++ +++++++
Strain 121
MacDonald, NJ and Beiko, RG (2010). Efficient learning of microbial
genotype–phenotype association rules. Bioinformatics 26: 1834-1840.
6
Beating the heat
Proteins tend to stop working at temperatures above 37-40° C
Heat shock – “Things are getting uncomfortable here”
Extreme heat shock – “Make it stop make it stop make it stop!!!!”
What does an organism need to get by at higher temperatures?
(1) Specific proteins that help keep everything working
(2) Changes to all proteins that make them more heat tolerant
(3) Various other things
7
The “genotype-phenotype association” problem
Genotype: An organism’s DNA sequence, somehow defined
Phenotype: An organism’s physical properties
In this case, “genotype” will refer to the presence of genes that
are similar enough that they likely share the same function
8
The “genotype-phenotype association” problem
Gene 1 Gene 2 Gene 3 Gene 4 Gene 5
9
A suitable approach
Problem: a typical dataset will contain between 50 and 500 genomes,
and presence / absence data for >10,000 genes
We need an approach that can detect interactions among genes, so
the potential feature space is very large. Searching all 2^10,000 rule
combinations is obviously not going to happen.
ASSOCIATION RULE MINING (Agrawal et al 1993):
Discover associative rules between items, e.g. {Milk, Eggs} -> {Flour}
Classification Based on Predictive Association Rules (Yin and Han,
2003): iteratively generate rules to “cover” each subset of the data
10
11
F
F, Q
F, Z
A
None above
gain threshold
Rules discovered:
1. F, Q -> POSITIVE
2. F, Z -> POSITIVE
3. A -> POSITIVE
Covered samples get their weight reduced before the next iteration
Classification based on Predictive Association Rules
(CPAR)
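The covering loop pictured above can be sketched in a few lines of Python. This is a simplified illustration, not the published CPAR implementation: it learns rules for the positive class only, grows each rule greedily with FOIL gain, and down-weights covered positives before the next round. The function names, the toy data, and the parameter values (min_gain, decay, stop_fraction) are assumptions chosen for the example.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL gain when a rule narrows from (p0, n0) to (p1, n1) weighted positives/negatives."""
    if p0 <= 0 or p1 <= 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def weighted_counts(indices, labels, weights):
    """Total weight of positive and negative samples among the given indices."""
    pos = sum(weights[i] for i in indices if labels[i])
    neg = sum(weights[i] for i in indices if not labels[i])
    return pos, neg

def learn_positive_rules(samples, labels, min_gain=0.7, decay=0.5,
                         stop_fraction=0.05, max_rounds=100):
    """samples: list of gene sets (genes present); labels: list of bool (phenotype)."""
    genes = sorted(set().union(*samples))
    weights = [1.0] * len(samples)
    everyone = list(range(len(samples)))
    start_pos, _neg = weighted_counts(everyone, labels, weights)
    rules = []
    for _round in range(max_rounds):
        pos_now, _neg = weighted_counts(everyone, labels, weights)
        if pos_now <= stop_fraction * start_pos:
            break  # almost all positive weight has been covered
        rule, covered = [], everyone
        while True:
            p0, n0 = weighted_counts(covered, labels, weights)
            best_gene, best_gain, best_subset = None, min_gain, None
            for gene in genes:
                if gene in rule:
                    continue
                subset = [i for i in covered if gene in samples[i]]
                p1, n1 = weighted_counts(subset, labels, weights)
                gain = foil_gain(p0, n0, p1, n1)
                if gain > best_gain:
                    best_gene, best_gain, best_subset = gene, gain, subset
            if best_gene is None:
                break  # "None above gain threshold"
            rule.append(best_gene)
            covered = best_subset
            if weighted_counts(covered, labels, weights)[1] == 0:
                break  # rule no longer covers any negatives
        if not rule:
            break
        if frozenset(rule) not in rules:
            rules.append(frozenset(rule))
        for i in covered:
            if labels[i]:
                weights[i] *= decay  # covered samples count for less next round
    return rules

# Toy data: gene F is present in every positive genome and in no negative genome.
samples = [{"F", "A"}, {"F", "B"}, {"F"}, {"A"}, {"B"}, {"A", "B"}]
labels = [True, True, True, False, False, False]
rules = learn_positive_rules(samples, labels)
```

Reducing weights rather than deleting covered samples lets later rules still draw on them, which is why several overlapping rules can cover the same genomes.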
CPAR results
One example for now: THERMOPHILY – the ability of an organism to
grow at temperatures above 42° C
427 genomes in the dataset: 376 mesophiles (negative set), 51
thermophiles (positive set)
26,290 genes to consider
Use CPAR to learn rules, submit identified genes to SVM for
classification. 10x 5-fold cross-validation
CPAR accuracy: 84.3% (obtained in 10.6 seconds)
Best competitor (NETCAR): 79.3% (obtained in 1250.9 seconds)
12
CPAR Results
Aeropyrum_pernix_K1 YES 0
Archaeoglobus_fulgidus_DSM_4304 YES 0
Caldicellulosiruptor_saccharolyticus_DSM_8903 YES 0
Fervidobacterium_nodosum_Rt17-B1 YES 0
Hyperthermus_butylicus_DSM_5456 YES 0
Ignicoccus_hospitalis_KIN4/I YES 0
Metallosphaera_sedula_DSM_5348 YES 0
Pyrobaculum_arsenaticum_DSM_13514 YES 0
Pyrobaculum_calidifontis_JCM_11548 YES 0
Pyrobaculum_islandicum_DSM_4184 YES 0
Pyrococcus_abyssi_GE5 YES 0
Pyrococcus_furiosus_DSM_3638 YES 0
Pyrococcus_horikoshii_OT3 YES 0
Staphylothermus_marinus_F1 YES 0
Sulfolobus_acidocaldarius_DSM_639 YES 0
Sulfolobus_solfataricus_P2 YES 0
Thermoanaerobacter_tengcongensis_MB4 YES 0
Thermofilum_pendens_Hrk_5 YES 0
Thermotoga_maritima_MSB8 YES 0
Thermotoga_petrophila_RKU-1 YES 0
Thermus_thermophilus_HB27 YES 0
Thermus_thermophilus_HB8 YES 0
Roseiflexus_castenholzii_DSM_13941 YES 0
Thermosipho_melanesiensis_BI429 YES 0
Roseiflexus_sp._RS-1 YES 0
Moorella_thermoacetica_ATCC_39073 YES 0
Streptococcus_thermophilus_LMG_18311 YES 0
Thermoplasma_volcanium_GSS1 YES 1
Methanosaeta_thermophila_PT YES 1
Nanoarchaeum_equitans_Kin4-M YES 1
Thermoplasma_acidophilum_DSM_1728 YES 1
Picrophilus_torridus_DSM_9790 YES 1
Carboxydothermus_hydrogenoformans_Z-2901 YES 1
Streptococcus_thermophilus_CNRZ1066 YES 1
Aquifex_aeolicus_VF5 YES 2
Methanopyrus_kandleri_AV19 YES 2
Pelotomaculum_thermopropionicum_SI YES 2
Rubrobacter_xylanophilus_DSM_9941 YES 3
Geobacillus_kaustophilus_HTA426 YES 3
Nitratiruptor_sp._SB155-2 YES 4
Synechococcus_sp._JA-3-3Ab YES 6
Geobacillus_thermodenitrificans_NG80-2 YES 7
Methanocaldococcus_jannaschii_DSM_2661 YES 8
Acidothermus_cellulolyticus_11B YES 8
Deinococcus_geothermalis_DSM_11300 YES 9
Clostridium_thermocellum_ATCC_27405 YES 9
Thermosynechococcus_elongatus_BP-1 YES 9
Sulfurovum_sp._NBC37-1 YES 10
Thermobifida_fusca_YX YES 10
Chlorobium_tepidum_TLS YES 10
Symbiobacterium_thermophilum_IAM_14863 YES 10
(Last column: # of misclassifications in 10 replicate runs)
Sulfurovum_sp._NBC37-1 YES 10
The classifier is right; the database is wrong!!
13
A complication
Organisms are not independent observations!
They share common ancestry
Gene 1 Gene 2 Gene 3 Gene 4 Gene 5
14
What to do?
MUTUAL INFORMATION:
CONDITIONAL MUTUAL INFORMATION:
Weight CMI by total MI – CONDITIONAL WEIGHTED MUTUAL INFORMATION (CWMI)
Reweight CPAR rules to reflect MI or CWMI: what patterns emerge?
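The formulas on this slide did not survive extraction. As a hedged illustration, the standard textbook definitions of mutual information and conditional mutual information can be computed from discrete observations as below. The assumption here is that X is a gene's presence/absence, Y is the phenotype, and Z is a taxonomic group label; the CWMI weighting itself is not reproduced because its exact form is not given in the text.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits, estimated from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z): the mutual information within each value of Z, weighted by P(Z)."""
    n = len(x)
    total = 0.0
    for z_value, n_z in Counter(z).items():
        xs = [a for a, c in zip(x, z) if c == z_value]
        ys = [b for b, c in zip(y, z) if c == z_value]
        total += (n_z / n) * mutual_information(xs, ys)
    return total

# Toy case: the gene tracks the phenotype perfectly, but both also track the clade.
gene      = [1, 1, 0, 0]
phenotype = [1, 1, 0, 0]
clade     = ["a", "a", "b", "b"]
mi = mutual_information(gene, phenotype)
cmi = conditional_mutual_information(gene, phenotype, clade)
```

In the toy case the gene looks maximally informative on its own, yet carries no information once the clade is known, which is the shared-ancestry confound described on the previous slide.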
15
What genes are identified?
16
Highlighted boxes: genes identified in “A DNA repair system specific for
thermophilic Archaea and bacteria predicted by genomic context analysis”
(Makarova et al., Nucleic Acids Research, 2002, 30 (2) , 482-496)
Top CWMI / Top MI
Wrong, but in different ways
17
Misclassifications (10 replicates)
Organism                                  CPAR  MI  CWMI
Streptococcus thermophilus LMG 18311         0  10    10
Streptococcus thermophilus CNRZ1066          1  10    10
Carboxydothermus hydrogenoformans Z-2901     1   8     5
Geobacillus kaustophilus HTA426              3  10     9
Synechococcus sp. JA-3-3Ab                   6   8     2
Methanocaldococcus jannaschii DSM 2661       8   0     0
Acidothermus cellulolyticus 11B              8   9     6
Deinococcus geothermalis DSM 11300           9   8     5
Clostridium thermocellum ATCC 27405          9  10     4
Chlorobium tepidum TLS                      10  10     8
Summary
18
- CPAR is FAST and fairly accurate, but the problem is challenging:
no “magic” set of genes that automatically make you a thermophile
- But we can investigate what pops up in the rules to find out which
genes are most likely associated with heat tolerance
- The hardest organisms to classify are from weird groups, with few
or no close relatives that are also thermophilic
- Different weighting schemes, especially those that consider the
confounding effects of taxonomy, have different strengths and can
identify different candidate genes
What’s next?
19
- Much larger microbial datasets with much broader taxonomic
coverage are now available
- Will give us more precise models of what genes make a
thermophile, pathogen, etc.
- Consider other lines of evidence: variation WITHIN genes in
addition to gene presence/absence
- Apply to emerging pathogen data: classify outbreak isolates
based on antibiotic resistance, virulence and other properties
(SFU, BCCDC, National Microbiology Laboratory)
Jie (Jessie) Ning
METAGENOMICS:
Because one genome at a time is too easy
MacDonald NJ, Parks DH, and Beiko, RG (2012). Rapid identification of high-confidence
taxonomic assignments for metagenomic data. Nucleic Acids Research 40: e111.
Parks DH, MacDonald NJ, and Beiko, RG (2011). Classifying short genomic fragments from novel
lineages using composition and homology. BMC Bioinformatics 12: 328.
20
The microbial community problem
- Microbes almost never act alone;
samples will typically contain
dozens or hundreds of different
species
- How can we answer the following
questions:
- What microbes are present in
a given sample?
- What functions do they carry
out?
- How do they interact with one
another?
21
Metagenomics
Sample Extract DNA Sequence DNA Assign sequences
GATAA
? ?
??
22
The species assignment problem
GATAAATCTGG
? ?
??
- UNSUPERVISED (clustering-ish)
and SUPERVISED approaches
- For supervised classification, we
need a set of known genomes
- Two attributes provide key clues:
(i) Genomic composition of k-
mers (aka n-grams)
(ii) Comparison with known
gene sequences
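A minimal sketch of the first clue, k-mer composition: count every overlapping window of length k and divide by the number of windows. The function name is an assumption for illustration; the sequence is the example read shown on these slides.

```python
from collections import Counter

def kmer_frequencies(sequence, k):
    """Relative frequency of each overlapping k-mer in a DNA sequence."""
    windows = len(sequence) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(sequence[i:i + k] for i in range(windows))
    return {kmer: count / windows for kmer, count in counts.items()}

# An 11-nucleotide read contains 10 overlapping 2-mers.
frequencies = kmer_frequencies("GATAAATCTGG", 2)
```

With only 10 windows, each frequency moves in steps of 1/10, which hints at why composition estimates from short reads are unstable.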
23
The species assignment problem
GATAAATCTGG
24
Mystery sequence
Where did I come from?
COMPOSITION
(k-mers)
k-mer frequency
AA 2/10
AC 0/10
AG 0/10
AT 1/10
Genome models
SIMILARITY
GATAAATCTGG
GATAAGTCTGG
GACCAATCTGG
GATAAACTTAG
CAAGGATAAGC
Sequences from
reference genomes
Sequence from
metagenome
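As a rough stand-in for the similarity idea, the sketch below scores the slide's sequences by ungapped percent identity. Real tools such as BLAST use local alignment with gaps and score statistics, so this is only an illustration. It assumes the first sequence listed is the metagenome read and the other four come from reference genomes; the function name is hypothetical.

```python
def ungapped_identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

query = "GATAAATCTGG"      # assumed to be the metagenome read
references = [             # assumed to be the reference-genome sequences
    "GATAAGTCTGG",
    "GACCAATCTGG",
    "GATAAACTTAG",
    "CAAGGATAAGC",
]
scores = {ref: ungapped_identity(query, ref) for ref in references}
best_match = max(scores, key=scores.get)
```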
Metagenomes - the first few years
25
Cost of DNA sequencing
(note log scale)
Study                    Author, Year            # of nucleotides   Size of each “read”
Acid mine drainage       Tyson et al, 2004       7.62 x 10^7        737 nt
Obese / Lean twins       Turnbaugh et al, 2009   1.83 x 10^9        341 nt
Human gut “catalogue”    Qin et al, 2010         5.77 x 10^11       75 nt
Summary of challenges
26
- Datasets are already huge, and getting bigger and more numerous
- DNA sequences that we need to classify are SHORT: unstable
estimates of composition and similarity
- Our predictions depend on the coverage in our reference database
- We need to combine different lines of evidence into a coherent
prediction scheme
Two approaches
27
PhymmBL: Brady and Salzberg,
2010
- Similarity of sequences
assessed through the BLAST
algorithm
- Composition assessed using
interpolated context models
- Predictions are combined
using a formula
RITA: MacDonald, Parks and
Beiko, 2012
- Similarity of sequences
assessed using UBLAST and
BLAST
- Composition assessed using
naïve Bayes approach
- Look for agreement between
predictors; if no agreement,
decide based on best evidence
The naïve Bayes approach
28
- Build k-mer profiles for each reference genome
- The probability that a given DNA sequence fragment F originated from
a given genome Gi is:
$$P(F \mid G_i) = \prod_{j=1}^{M} P(w_j \mid G_i)$$
- (that is, the combined frequencies of all k-mers from F in genome Gi)
- Note that naïve Bayes assumes INDEPENDENCE, which is a bit funny
with overlapping k-mers (But We Did It Anyway)
AGGCTTGTCAA
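A compact sketch of the naïve Bayes classifier described on this slide: build a k-mer profile per reference genome, then score a fragment by summing the log-probabilities of its k-mers (the log of the product of per-k-mer probabilities). The add-one pseudocount, the uniform prior over genomes, the function names, and the toy genomes are all assumptions for illustration, not details taken from the talk.

```python
import math
from collections import Counter
from itertools import product

def kmers(sequence, k):
    """All overlapping k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def train_profile(genome, k, pseudocount=1.0):
    """Log-probability of every possible k-mer in one genome (smoothed)."""
    counts = Counter(kmers(genome, k))
    total = sum(counts.values()) + pseudocount * 4 ** k
    profile = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        profile[kmer] = math.log((counts[kmer] + pseudocount) / total)
    return profile

def log_likelihood(fragment, profile, k):
    """Sum of log P(w_j | G_i); k-mers with non-ACGT characters are skipped."""
    return sum(profile[w] for w in kmers(fragment, k) if w in profile)

def classify(fragment, profiles, k):
    """Name of the genome whose profile gives the fragment the highest score."""
    return max(profiles, key=lambda name: log_likelihood(fragment, profiles[name], k))

# Toy reference "genomes" with very different composition.
k = 2
profiles = {
    "at_rich": train_profile("ATATATTAATATAT" * 5, k),
    "gc_rich": train_profile("GCGCGGCCGCGC" * 5, k),
}
call_1 = classify("ATTATA", profiles, k)
call_2 = classify("GCCGCG", profiles, k)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once fragments are hundreds of nucleotides long.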
Naïve Bayes in action
29
Build fake metagenomes by chopping up real sequenced genomes into
pieces of length 200
Build a reference database that excludes the chopped up genomes AND
Their close relatives (leave-one-out)
How accurate is the classifier, for different values of k?
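The test setup can be sketched as follows, assuming non-overlapping 200-nt pieces and a reference collection keyed by genome name. The helper names and the toy genome names are hypothetical, and the slide does not say how close relatives were identified, so here they are simply passed in by name.

```python
def chop_genome(sequence, length=200):
    """Split a genome into consecutive non-overlapping fragments, dropping any short tail."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]

def leave_out_reference(genomes, excluded_names):
    """Reference set without the chopped-up genome and its listed close relatives."""
    return {name: seq for name, seq in genomes.items()
            if name not in excluded_names}

# Toy collection; real genomes are millions of nucleotides long.
genomes = {
    "genome_A": "ACGT" * 125,           # 500 nt
    "genome_A_relative": "ACGA" * 125,
    "genome_B": "GGCC" * 125,
}
fragments = chop_genome(genomes["genome_A"])
reference = leave_out_reference(genomes, {"genome_A", "genome_A_relative"})
```

Excluding close relatives as well as the source genome makes the test harder and more realistic, since reads in a real metagenome often come from lineages with no sequenced near neighbour.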
k
Average proportion
of sequences correctly
classified
Composition versus Similarity
30
Similarity approaches (three right-hand sets) are more accurate (and slower)
than the composition approaches NB and P
1000 nt
200 nt
RITA:
Rapid Identification of Taxonomic Assignments
31
Query DNA
sequence fragment
Run naïve Bayes
classifier
UBLAST filter
(fast, imprecise)
BLAST comparisons
(slower, better)
Is there a BLAST
match?
Is there a strong
naïve Bayes
preference?
Do BLAST and
naïve Bayes
agree?
Is there a strong
BLAST preference?
Group 2 Group 3
Group 1a
Group 1b
Yes!
No!
Performance on different sequence lengths
32
Running time
33
[Figure: running times in hours (log scale) on 116,244 sequences]
Application to human microbiome
data sets
34
Homology+Composition / Composition
Without HMP genomes:
Clostridium, Bacteroides and Eubacterium, but
lots of low-confidence calls too
With HMP reference genomes:
Add Ruminococcus, Faecalibacterium,
Lachnospiraceae
Good Less Good
Data from Turnbaugh et al., 2010
Application to bioremediation metagenome
35
Hug et al., 2012
Three sets of microbes, all can clean up
PCEs. Are there differences in the
composition of these sets?
Summary
36
- Naïve Bayes is FAST and performs as well as alternative, more
complicated approaches
- The combination of composition and similarity is superior to either
approach in isolation
- The accuracy on short reads is good, but a substantial minority of
reads are misclassified so the question of “who is doing what”
remains somewhat open
What’s next?
37
- Apply to emerging metagenomic data sets:
- Bioremediation
- Aging and frailty in mice and humans
- Refine the approach to include both
unsupervised and supervised components
Coda #1: mammalian fertility
38
Random mating
CONTROL (105)
Selective breeding
SELECTED (344)
Starting colony
30 years of….
Examine genetic variation at >8000 positions
within the genome.
Are there any genetic differences at one or
more sites that distinguish the populations
and individuals within the populations?
Alex Keddy
Katherine
Rutherford
Machine-learning results
39
Different ML approaches with feature selection
Observed vs Predicted reproductive rate
for RF regression model
What’s next?
40
Jeremy Koenig
- Expand the project: more data, and more types of data!
- Integrating lines of evidence from multiple sources will be a
significant challenge – each yields overlapping / different
predictions
- Map interesting results into the cow genome and test effectiveness
Developer to be
named later
Coda #2: data retrieval and GIS
41
20,304 samples
1.7 billion sequences
42
Conor Meehan
Objectives
43
- Automated classification of data from sources such as the EMP
- Retrieval of data from EMP via Web services under development
(some plugins already completed – come in October for the story)
What’s next?
44
My Dal Homecoming lecture on October 4
Classifying DNA: Adventures in
Multidisciplinarity
45
Genetics
Evolution
Statistics
Machine
Learning
Throw in the challenges of massive data sets,
data retrieval challenges,
emerging technologies,
and uncertain reliability of some data sets,
And there is a lot of work still to be done!!
Chris Whidden
Donovan Parks
Morgan Langille
Open Science
46
@rob_beiko
Github
Preprint servers
This presentation:
http://www.slideshare.net/beiko/beiko-dcsi2013
Fin
47
Image credits
Please follow links for copyright information
Slide 1: http://commons.wikimedia.org/wiki/File:DNA Overview2.png
Slide 2: s3.documentcloud.org/documents/706661/francis-crick-letter.pdf
Slide 3: http://www.nature.com/nature/journal/v393/n6685/full/393537a0.html
http://www.ncbi.nlm.nih.gov/sutils/static/GP IMAGE/Mycobacterium.jpg
http://commons.wikimedia.org/wiki/File:Shakespeare.jpg
http://commons.wikimedia.org/wiki/File:Wolllaus.jpg
http://phenomena.nationalgeographic.com/files/2013/06/Tremblaya Moranella.jpg (Ryuichi Koga, National Institute of Advanced Industrial
Science and Technology, Japan)
http://commons.wikimedia.org/wiki/File:Daphnia pulex.png
http://commons.wikimedia.org/wiki/File:Paris japonica Kinugasasou in Hakusan 2003 7 27.jpg
Slide 4: http://upload.wikimedia.org/wikipedia/commons/2/21/DNA human male chromosomes.gif
http://commons.wikimedia.org/wiki/File:LKB1 complex structure 2WTK.png
Slide 6: http://commons.wikimedia.org/wiki/File:Yogurt of the Bulgarija Pavilion of Expo 2005 Aichi Japan.jpg
http://en.wikipedia.org/wiki/File:Grand prismatic spring.jpg
http://www.nsf.gov/od/lpa/news/03/images/scsmoker2th.jpg
http://www.nsf.gov/od/lpa/news/03/images/strain121 thin th.jpg
Slide 21:
http://commons.wikimedia.org/wiki/File:EPA_TECHNICIAN_COLLECTS_WATER_SAMPLE_FROM_PAHRANAGAT_LAKE_ABOUT_10_MILES_SOUTH
_OF_ALAMO_-_NARA_-_549007.jpg
http://commons.wikimedia.org/wiki/File:DNA_orbit_animated_small.gif
http://commons.wikimedia.org/wiki/File:DNA_sequence.svg
Slide 24: Stein, L. Genome Biology 2010 11:207
Slide 36: http://commons.wikimedia.org/wiki/File:Mouse-19-Dec-2004.jpg
48

More Related Content

What's hot

Genetic engineering
Genetic engineering Genetic engineering
Genetic engineering Snehal Jadav
 
Genetics By Swati & Sheela
Genetics By Swati & SheelaGenetics By Swati & Sheela
Genetics By Swati & Sheelasubzero64
 
Genetic Engineering and the future of Evolutiom
Genetic Engineering and the future of EvolutiomGenetic Engineering and the future of Evolutiom
Genetic Engineering and the future of EvolutiomRicha Khatiwada
 
Genetic engineering
Genetic engineering Genetic engineering
Genetic engineering PurvenBhavsar
 
BioMinds Poster!!!!!!!!
BioMinds Poster!!!!!!!!BioMinds Poster!!!!!!!!
BioMinds Poster!!!!!!!!Zuleika86
 
Genetic Engineering
Genetic EngineeringGenetic Engineering
Genetic EngineeringSUNY Oswego
 
Biotechnology and1 genetic engineering
Biotechnology and1 genetic engineeringBiotechnology and1 genetic engineering
Biotechnology and1 genetic engineeringmandalina landy
 
genetic engineering
genetic engineeringgenetic engineering
genetic engineeringcbsua
 
Genetic Engineering
Genetic Engineering Genetic Engineering
Genetic Engineering Sultana Jamil
 
Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016Dobbs Ferry High School
 
Microbial Metagenomics and Human Health
Microbial Metagenomics and Human HealthMicrobial Metagenomics and Human Health
Microbial Metagenomics and Human HealthLarry Smarr
 
Genetic Engineering
Genetic EngineeringGenetic Engineering
Genetic EngineeringDamien512
 
Genetic engineering in animal
Genetic engineering in animalGenetic engineering in animal
Genetic engineering in animalTaikiat Kiat
 
Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.Nishant Upadhyay
 
Genetic engineering
Genetic engineeringGenetic engineering
Genetic engineeringSoz Najat
 

What's hot (20)

Cloning & Genetic Engineering
Cloning & Genetic EngineeringCloning & Genetic Engineering
Cloning & Genetic Engineering
 
Genetic engineering
Genetic engineering Genetic engineering
Genetic engineering
 
Genetic engineering
Genetic engineering Genetic engineering
Genetic engineering
 
Genetics By Swati & Sheela
Genetics By Swati & SheelaGenetics By Swati & Sheela
Genetics By Swati & Sheela
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Metagenomic
MetagenomicMetagenomic
Metagenomic
 
Genetic Engineering and the future of Evolutiom
Genetic Engineering and the future of EvolutiomGenetic Engineering and the future of Evolutiom
Genetic Engineering and the future of Evolutiom
 
Genetic engineering
Genetic engineering Genetic engineering
Genetic engineering
 
BioMinds Poster!!!!!!!!
BioMinds Poster!!!!!!!!BioMinds Poster!!!!!!!!
BioMinds Poster!!!!!!!!
 
Genetic Engineering
Genetic EngineeringGenetic Engineering
Genetic Engineering
 
Biotechnology and1 genetic engineering
Biotechnology and1 genetic engineeringBiotechnology and1 genetic engineering
Biotechnology and1 genetic engineering
 
genetic engineering
genetic engineeringgenetic engineering
genetic engineering
 
Genetic Engineering
Genetic Engineering Genetic Engineering
Genetic Engineering
 
Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016
 
Microbial Metagenomics and Human Health
Microbial Metagenomics and Human HealthMicrobial Metagenomics and Human Health
Microbial Metagenomics and Human Health
 
Genetic Engineering
Genetic EngineeringGenetic Engineering
Genetic Engineering
 
Genetic engineering in animal
Genetic engineering in animalGenetic engineering in animal
Genetic engineering in animal
 
Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.Biotechnology- Principles and processes investigatory project.
Biotechnology- Principles and processes investigatory project.
 
Genetic engineering
Genetic engineeringGenetic engineering
Genetic engineering
 
Genetic Engineering ppt
Genetic Engineering pptGenetic Engineering ppt
Genetic Engineering ppt
 

Viewers also liked

2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big databeiko
 
Beiko biogeography
Beiko biogeographyBeiko biogeography
Beiko biogeographybeiko
 
Beiko hpcs
Beiko hpcsBeiko hpcs
Beiko hpcsbeiko
 
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLaren
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLarenUsing SEO in Google Analytics | Analytics Pros Webinar by Mark McLaren
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLarenCaleb Whitmore
 
Goals and Goal Funnels | Presented by Justin Spencer of Analytics Pros
Goals and Goal Funnels | Presented by Justin Spencer of Analytics ProsGoals and Goal Funnels | Presented by Justin Spencer of Analytics Pros
Goals and Goal Funnels | Presented by Justin Spencer of Analytics ProsCaleb Whitmore
 
Amaia steps nova presentation
Amaia steps nova presentationAmaia steps nova presentation
Amaia steps nova presentationLilibeth Lucas
 
Testing & Optimization - A Deeper Look
Testing & Optimization - A Deeper LookTesting & Optimization - A Deeper Look
Testing & Optimization - A Deeper LookCaleb Whitmore
 
Ozbun VTSII
Ozbun VTSIIOzbun VTSII
Ozbun VTSIIeozbun
 
BEST Practices - Testing & Optimization | Bredan Rendan
BEST Practices - Testing & Optimization | Bredan RendanBEST Practices - Testing & Optimization | Bredan Rendan
BEST Practices - Testing & Optimization | Bredan RendanCaleb Whitmore
 
Pedoman Nasional Pengobatan Antiretroviral (ART)
Pedoman Nasional Pengobatan Antiretroviral (ART)Pedoman Nasional Pengobatan Antiretroviral (ART)
Pedoman Nasional Pengobatan Antiretroviral (ART)Ditya Permana Adi
 
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...University of Nebraska Think Tank: Google Analytics as your Digital Marketing...
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...Caleb Whitmore
 
20151223application of deep learning in basic bio
20151223application of deep learning in basic bio 20151223application of deep learning in basic bio
20151223application of deep learning in basic bio Charlene Hsuan-Lin Her
 
Travel ventures international express
Travel ventures international expressTravel ventures international express
Travel ventures international expressLilibeth Lucas
 
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data AgeSpark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Agebatchinsights
 
Deep Learning and its Applications - Computer Vision
Deep Learning and its Applications - Computer VisionDeep Learning and its Applications - Computer Vision
Deep Learning and its Applications - Computer VisionAdam Gibson
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep LearningAdam Gibson
 

Viewers also liked (20)

2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data2015 06-12-beiko-irida-big data
2015 06-12-beiko-irida-big data
 
Beiko biogeography
Beiko biogeographyBeiko biogeography
Beiko biogeography
 
Beiko hpcs
Beiko hpcsBeiko hpcs
Beiko hpcs
 
Ga+presentation
Ga+presentationGa+presentation
Ga+presentation
 
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLaren
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLarenUsing SEO in Google Analytics | Analytics Pros Webinar by Mark McLaren
Using SEO in Google Analytics | Analytics Pros Webinar by Mark McLaren
 
Goals and Goal Funnels | Presented by Justin Spencer of Analytics Pros
Goals and Goal Funnels | Presented by Justin Spencer of Analytics ProsGoals and Goal Funnels | Presented by Justin Spencer of Analytics Pros
Goals and Goal Funnels | Presented by Justin Spencer of Analytics Pros
 
Amaia steps nova presentation
Amaia steps nova presentationAmaia steps nova presentation
Amaia steps nova presentation
 
Testing & Optimization - A Deeper Look
Testing & Optimization - A Deeper LookTesting & Optimization - A Deeper Look
Testing & Optimization - A Deeper Look
 
Ozbun VTSII
Ozbun VTSIIOzbun VTSII
Ozbun VTSII
 
BEST Practices - Testing & Optimization | Bredan Rendan
BEST Practices - Testing & Optimization | Bredan RendanBEST Practices - Testing & Optimization | Bredan Rendan
BEST Practices - Testing & Optimization | Bredan Rendan
 
Setting vpn pptp client
Setting vpn pptp clientSetting vpn pptp client
Setting vpn pptp client
 
Buku saku faq bpjs
Buku saku faq bpjsBuku saku faq bpjs
Buku saku faq bpjs
 
Pedoman Nasional Pengobatan Antiretroviral (ART)
Pedoman Nasional Pengobatan Antiretroviral (ART)Pedoman Nasional Pengobatan Antiretroviral (ART)
Pedoman Nasional Pengobatan Antiretroviral (ART)
 
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...University of Nebraska Think Tank: Google Analytics as your Digital Marketing...
University of Nebraska Think Tank: Google Analytics as your Digital Marketing...
 
Hsbc
HsbcHsbc
Hsbc
 
20151223application of deep learning in basic bio
20151223application of deep learning in basic bio 20151223application of deep learning in basic bio
20151223application of deep learning in basic bio
 
Travel ventures international express
Travel ventures international expressTravel ventures international express
Travel ventures international express
 
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data AgeSpark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
Spark, Deep Learning and Life Sciences, Systems Biology in the Big Data Age
 
Deep Learning and its Applications - Computer Vision
Deep Learning and its Applications - Computer VisionDeep Learning and its Applications - Computer Vision
Deep Learning and its Applications - Computer Vision
 
Information Retrieval with Deep Learning
Information Retrieval with Deep LearningInformation Retrieval with Deep Learning
Information Retrieval with Deep Learning
 

Similar to Beiko dcsi2013

L14 human genome
L14 human genomeL14 human genome
L14 human genomeMUBOSScz
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple nadeem akhter
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGenomeInABottle
 
A novel phylum-level archaea characterized by combining single-cell and metag...
A novel phylum-level archaea characterized by combining single-cell and metag...A novel phylum-level archaea characterized by combining single-cell and metag...
A novel phylum-level archaea characterized by combining single-cell and metag...Guillaume Reboul
 
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.
Microbiome studies using 16S ribosomal DNA PCR: some cautionary tales.jennomics
 
Research report (alternative splicing, protein structure; retinitis pigmentosa)
Research report (alternative splicing, protein structure; retinitis pigmentosa)Research report (alternative splicing, protein structure; retinitis pigmentosa)
Research report (alternative splicing, protein structure; retinitis pigmentosa)avalgar
 
General-Biology-2___Recombinant-DNA.pptx
General-Biology-2___Recombinant-DNA.pptxGeneral-Biology-2___Recombinant-DNA.pptx
General-Biology-2___Recombinant-DNA.pptxetonblue
 
Bioinformatics for Computer Scientists.ppt
Bioinformatics for Computer Scientists.pptBioinformatics for Computer Scientists.ppt
Bioinformatics for Computer Scientists.pptAbdullah Yousafzai
 
Introduction-to-Bioinformatics-1.ppt
Introduction-to-Bioinformatics-1.pptIntroduction-to-Bioinformatics-1.ppt
Introduction-to-Bioinformatics-1.pptRichardEstradaC
 
Group 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & EnvtGroup 5 DNA Tech - Ecology & Envt
Group 5 DNA Tech - Ecology & EnvtJessica Kabigting
 
Bioinformatics final
Bioinformatics finalBioinformatics final
Bioinformatics finalRainu Rajeev
 
Seminario sobre la Aplicación "Expression2Kinases"
Seminario sobre la Aplicación "Expression2Kinases"Seminario sobre la Aplicación "Expression2Kinases"
Seminario sobre la Aplicación "Expression2Kinases"Rafael Diego Macho Reyes
 
6 26-2012
6 26-20126 26-2012
6 26-2012Sky Lar
 

Similar to Beiko dcsi2013 (20)

L14 human genome
L14 human genomeL14 human genome
L14 human genome
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple
 
George Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait DataGeorge Church: Standards & Open-Access Genome-Environment-Trait Data
George Church: Standards & Open-Access Genome-Environment-Trait Data
 
Beiko dcsi2013

  • 1. Classifying biological information: the promise and perils of DNA sequences. Rob Beiko, September 19, DCSI 2013. Norm MacDonald, Donovan Parks
  • 2. From Francis Crick’s letter to his son Michael, 1953
  • 3. Your Genome and You: 23 chromosomes, 20,000 genes, 3.1 billion nucleotides. Mycobacterium tuberculosis: 1 chromosome, 4,000 genes, 4.4 million nucleotides. Tremblaya princeps: 1 chromosome, 110 genes, 138,931 nucleotides. Daphnia pulex: 12 chromosomes, 31,000 genes, 200 million nucleotides. Paris japonica: ?? chromosomes, ??? genes, 150 billion nucleotides
  • 4. DNA Encodes the Business of the Cell. Chromosome, chromosome region, gene (GGATCCTATGGATGCATGCCGCCGTAGTATAAT…), protein. Protein functions: copying the genome and the cell; transport into and out of the cell; energy production and storage; cellular defense; etc…
  • 5. Three key questions. (1) What genes in an organism’s genome are responsible for its unique properties? For example: - Ability to withstand environmental challenges - Developmental “plan” - Sources of nutrients (2) How can we use properties of an organism’s genome as a “fingerprint” to identify that organism? (3) What mutations to an organism’s genome (including single base changes) are responsible for altered properties of that organism?
  • 6. Microbes: hot or not? + ++ +++ +++++++ Strain 121. MacDonald, NJ and Beiko, RG (2010). Efficient learning of microbial genotype–phenotype association rules. Bioinformatics 26: 1834-1840.
  • 7. Beating the heat. Proteins tend to stop working at temperatures above 37-40 °C. Heat shock: “Things are getting uncomfortable here”. Extreme heat shock: “Make it stop make it stop make it stop!!!!” What does an organism need to get by at higher temperatures? (1) Specific proteins that help keep everything working (2) Changes to all proteins that make them more heat tolerant (3) Various other things
  • 8. The “genotype-phenotype association” problem. Genotype: An organism’s DNA sequence, somehow defined. Phenotype: An organism’s physical properties. In this case, “genotype” will refer to the presence of genes that are similar enough that they likely share the same function.
  • 9. The “genotype-phenotype association” problem (figure with columns Gene 1, Gene 2, Gene 3, Gene 4, Gene 5)
  • 10. A suitable approach. Problem: a typical dataset will contain between 50 and 500 genomes, and presence / absence data for >10,000 genes. We need an approach that can detect interactions among genes, so the potential feature space is very large. Searching all 2^10,000 rule combinations is obviously not going to happen. ASSOCIATION RULE MINING (Agrawal et al 1993): Discover associative rules between items, e.g. {Milk, Eggs} -> {Flour}. Classification Based on Predictive Association Rules (Yin and Han, 2003): iteratively generate rules to “cover” each subset of the data.
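To make the association-rule vocabulary on this slide concrete, here is a minimal Python sketch of the two standard rule statistics, support and confidence, applied to the {Milk, Eggs} -> {Flour} example. The transactions are hypothetical toy data and the helper names are illustrative, not taken from the talk. The last line shows why exhaustive search is out of reach: it counts the decimal digits of 2^10,000.

```python
# Hypothetical toy transactions (market-basket style); not data from the talk.
transactions = [
    {"Milk", "Eggs", "Flour"},
    {"Milk", "Eggs", "Flour", "Butter"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Eggs", "Flour"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Statistics for the rule {Milk, Eggs} -> {Flour}.
rule_support = support({"Milk", "Eggs", "Flour"}, transactions)
rule_confidence = confidence({"Milk", "Eggs"}, {"Flour"}, transactions)

# Number of decimal digits in 2**10_000, the count of gene subsets
# over 10,000 presence/absence features.
n_subsets_digits = len(str(2 ** 10_000))
```

In the genotype-phenotype setting, each genome plays the role of a transaction and each present gene the role of an item.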
  • 11. Classification based on Predictive Association Rules (CPAR). Candidate rules explored (figure): F; F, Q; F, Z; A. Each search stops when no candidate is above the gain threshold. Rules discovered: 1. F, Q -> POSITIVE 2. F, Z -> POSITIVE 3. A -> POSITIVE. Covered samples get their weight reduced before the next iteration.
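The covering loop on this slide can be sketched in a few lines of Python. This is a simplified illustration and not the published CPAR algorithm or the authors' implementation. It grows one rule at a time by greedily adding the gene with the best FOIL gain, stops growing when no candidate exceeds a gain threshold, and then down-weights the positive samples the rule covers. The toy genomes, the parameter values (`min_gain`, `decay`, `stop_fraction`, `max_len`) and the function names are all assumptions chosen for the example.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL gain of a rule extension; p = weighted positives, n = negatives (before 0, after 1)."""
    if p0 == 0 or p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def learn_rules(samples, labels, min_gain=0.7, decay=2 / 3, stop_fraction=0.7, max_len=3):
    """Greedy weighted covering. All parameter values are illustrative assumptions."""
    genes = sorted(set().union(*samples))
    weights = [1.0] * len(samples)
    initial_pos = sum(labels)
    rules = []

    def counts(idx):
        p = sum(weights[i] for i in idx if labels[i])
        n = sum(1 for i in idx if not labels[i])
        return p, n

    # Keep learning rules while enough positive weight remains uncovered.
    while sum(w for w, y in zip(weights, labels) if y) > stop_fraction * initial_pos:
        rule, covered = [], list(range(len(samples)))
        while len(rule) < max_len:
            p0, n0 = counts(covered)
            best_gain, best_gene, best_cov = 0.0, None, None
            for g in genes:
                if g in rule:
                    continue
                cov = [i for i in covered if g in samples[i]]
                gain = foil_gain(p0, n0, *counts(cov))
                if gain > best_gain:
                    best_gain, best_gene, best_cov = gain, g, cov
            if best_gene is None or best_gain < min_gain:
                break  # none above gain threshold
            rule.append(best_gene)
            covered = best_cov
            if counts(covered)[1] == 0:
                break  # rule no longer covers any negative sample
        if not rule:
            break
        if tuple(rule) not in rules:
            rules.append(tuple(rule))
        # Covered positives get their weight reduced before the next iteration.
        for i in covered:
            if labels[i]:
                weights[i] *= decay
    return rules


# Hypothetical genomes as sets of present genes; True marks the positive phenotype.
samples = [{"F", "Q"}, {"F", "Q"}, {"F", "Z"}, {"F", "Z"}, {"A"}, {"A"},
           {"F"}, {"F"}, {"Q"}, {"Z"}, {"B"}, {"B"}]
labels = [True] * 6 + [False] * 6
rules = learn_rules(samples, labels)
```

The toy data was built so that this sketch finds the same three gene sets as the slide, {F, Q}, {F, Z} and {A}, although the discovery order differs. The `stop_fraction` value is set high so that the loop ends once every positive sample has been covered once; a real run would use a much lower value and let rules overlap.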
  • 12. CPAR results. One example for now: THERMOPHILY, the ability of an organism to grow at temperatures above 42 °C. 427 genomes in the dataset: 376 mesophiles (negative set), 51 thermophiles (positive set). 26,290 genes to consider. Use CPAR to learn rules, then submit the identified genes to an SVM for classification. 10x 5-fold cross-validation. CPAR accuracy: 84.3% (obtained in 10.6 seconds). Best competitor (NETCAR): 79.3% (obtained in 1250.9 seconds).
  • 13. CPAR Results: # of misclassifications in 10 replicate runs, per thermophile
      Organism | Thermophile | Misclassifications
      Aeropyrum_pernix_K1 | YES | 0
      Archaeoglobus_fulgidus_DSM_4304 | YES | 0
      Caldicellulosiruptor_saccharolyticus_DSM_8903 | YES | 0
      Fervidobacterium_nodosum_Rt17-B1 | YES | 0
      Hyperthermus_butylicus_DSM_5456 | YES | 0
      Ignicoccus_hospitalis_KIN4/I | YES | 0
      Metallosphaera_sedula_DSM_5348 | YES | 0
      Pyrobaculum_arsenaticum_DSM_13514 | YES | 0
      Pyrobaculum_calidifontis_JCM_11548 | YES | 0
      Pyrobaculum_islandicum_DSM_4184 | YES | 0
      Pyrococcus_abyssi_GE5 | YES | 0
      Pyrococcus_furiosus_DSM_3638 | YES | 0
      Pyrococcus_horikoshii_OT3 | YES | 0
      Staphylothermus_marinus_F1 | YES | 0
      Sulfolobus_acidocaldarius_DSM_639 | YES | 0
      Sulfolobus_solfataricus_P2 | YES | 0
      Thermoanaerobacter_tengcongensis_MB4 | YES | 0
      Thermofilum_pendens_Hrk_5 | YES | 0
      Thermotoga_maritima_MSB8 | YES | 0
      Thermotoga_petrophila_RKU-1 | YES | 0
      Thermus_thermophilus_HB27 | YES | 0
      Thermus_thermophilus_HB8 | YES | 0
      Roseiflexus_castenholzii_DSM_13941 | YES | 0
      Thermosipho_melanesiensis_BI429 | YES | 0
      Roseiflexus_sp._RS-1 | YES | 0
      Moorella_thermoacetica_ATCC_39073 | YES | 0
      Streptococcus_thermophilus_LMG_18311 | YES | 0
      Thermoplasma_volcanium_GSS1 | YES | 1
      Methanosaeta_thermophila_PT | YES | 1
      Nanoarchaeum_equitans_Kin4-M | YES | 1
      Thermoplasma_acidophilum_DSM_1728 | YES | 1
      Picrophilus_torridus_DSM_9790 | YES | 1
      Carboxydothermus_hydrogenoformans_Z-2901 | YES | 1
      Streptococcus_thermophilus_CNRZ1066 | YES | 1
      Aquifex_aeolicus_VF5 | YES | 2
      Methanopyrus_kandleri_AV19 | YES | 2
      Pelotomaculum_thermopropionicum_SI | YES | 2
      Rubrobacter_xylanophilus_DSM_9941 | YES | 3
      Geobacillus_kaustophilus_HTA426 | YES | 3
      Nitratiruptor_sp._SB155-2 | YES | 4
      Synechococcus_sp._JA-3-3Ab | YES | 6
      Geobacillus_thermodenitrificans_NG80-2 | YES | 7
      Methanocaldococcus_jannaschii_DSM_2661 | YES | 8
      Acidothermus_cellulolyticus_11B | YES | 8
      Deinococcus_geothermalis_DSM_11300 | YES | 9
      Clostridium_thermocellum_ATCC_27405 | YES | 9
      Thermosynechococcus_elongatus_BP-1 | YES | 9
      Sulfurovum_sp._NBC37-1 | YES | 10
      Thermobifida_fusca_YX | YES | 10
      Chlorobium_tepidum_TLS | YES | 10
      Symbiobacterium_thermophilum_IAM_14863 | YES | 10
      Callout on Sulfurovum_sp._NBC37-1 (YES, 10): The classifier is right; the database is wrong!!
  • 14. A complication. Organisms are not independent observations! They share common ancestry (figure with columns Gene 1, Gene 2, Gene 3, Gene 4, Gene 5)
  • 15. What to do? MUTUAL INFORMATION (MI); CONDITIONAL MUTUAL INFORMATION (CMI). Weight CMI by total MI: CONDITIONAL WEIGHTED MUTUAL INFORMATION (CWMI). Reweight CPAR rules to reflect MI or CWMI: what patterns emerge?
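The formulas for this slide did not survive in the transcript, so the sketch below uses the standard textbook definitions: MI is I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))], and CMI is I(X;Y|Z) = Σ_z p(z) I(X;Y | Z=z). It assumes X is gene presence, Y is the phenotype, and Z is a taxonomic group label; that choice of conditioning variable is an assumption made for illustration. The exact CWMI weighting is not given on the slide, so it is deliberately left out.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired observations."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) in bits: MI within each stratum of Z, weighted by p(z)."""
    n = len(xs)
    total = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += (cz / n) * mutual_information([xs[i] for i in idx], [ys[i] for i in idx])
    return total


# Hypothetical toy data: gene presence and phenotype agree perfectly.
gene = [1, 1, 0, 0]
thermophile = [1, 1, 0, 0]

mi = mutual_information(gene, thermophile)

# If the taxonomic group explains both gene and phenotype, the association vanishes.
cmi_confounded = conditional_mutual_information(gene, thermophile, ["a", "a", "b", "b"])

# If the group is unrelated to either, the association survives conditioning.
cmi_unrelated = conditional_mutual_information(gene, thermophile, ["a", "b", "a", "b"])
```

The two conditional cases show why conditioning matters here: a gene that merely marks a thermophile-rich lineage scores high on MI but low on CMI.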
  • 16. What genes are identified? Panels: Top CWMI, Top MI. Highlighted boxes: genes identified in “A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis” (Makarova et al., Nucleic Acids Research, 2002, 30 (2), 482-496)
  • 17. Wrong, but in different ways. Misclassifications (10 replicates):
      Organism | CPAR | MI | CWMI
      Streptococcus thermophilus LMG 18311 | 0 | 10 | 10
      Streptococcus thermophilus CNRZ1066 | 1 | 10 | 10
      Carboxydothermus hydrogenoformans Z-2901 | 1 | 8 | 5
      Geobacillus kaustophilus HTA426 | 3 | 10 | 9
      Synechococcus sp. JA-3-3Ab | 6 | 8 | 2
      Methanocaldococcus jannaschii DSM 2661 | 8 | 0 | 0
      Acidothermus cellulolyticus 11B | 8 | 9 | 6
      Deinococcus geothermalis DSM 11300 | 9 | 8 | 5
      Clostridium thermocellum ATCC 27405 | 9 | 10 | 4
      Chlorobium tepidum TLS | 10 | 10 | 8
  • 18. Summary
      - CPAR is FAST and fairly accurate, but the problem is challenging: there is no “magic” set of genes that automatically makes you a thermophile
      - But we can investigate what pops up in the rules to find out which genes are most likely associated with heat tolerance
      - The hardest organisms to classify are from weird groups, with few or no close relatives that are also thermophilic
      - Different weighting schemes, especially those that consider the confounding effects of taxonomy, have different strengths and can identify different candidate genes
  • 19. What’s next? (Jie (Jessie) Ning)
      - Much larger microbial datasets with much broader taxonomic coverage are now available
      - Will give us more precise models of what genes make a thermophile, pathogen, etc.
      - Consider other lines of evidence: variation WITHIN genes in addition to gene presence/absence
      - Apply to emerging pathogen data: classify outbreak isolates based on antibiotic resistance, virulence and other properties (SFU, BCCDC, National Microbiology Laboratory)
  • 20. METAGENOMICS: Because one genome at a time is too easy. MacDonald NJ, Parks DH, and Beiko, RG (2012). Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucleic Acids Research 40: e111. Parks DH, MacDonald NJ, and Beiko, RG (2011). Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics 12: 328.
  • 21. The microbial community problem
      - Microbes almost never act alone; samples will typically contain dozens or hundreds of different species
      - How can we answer the following questions:
        - What microbes are present in a given sample?
        - What functions do they carry out?
        - How do they interact with one another?
  • 22. Metagenomics: Sample, Extract DNA, Sequence DNA, Assign sequences (GATAA: ? ? ??)
  • 23. The species assignment problem (GATAAATCTGG: ? ? ??)
      - UNSUPERVISED (clustering-ish) and SUPERVISED approaches
      - For supervised classification, we need a set of known genomes
      - Two attributes provide key clues: (i) Genomic composition of k-mers (aka n-grams) (ii) Comparison with known gene sequences
  • 24. The species assignment problem. Mystery sequence: GATAAATCTGG. Where did I come from? COMPOSITION (k-mers): genome models, each a table of k-mer frequencies (e.g. AA 2/10, AC 0/10, AG 0/10, AT 1/10). SIMILARITY: sequence from metagenome (GATAAATCTGG) compared with sequences from reference genomes (GATAAGTCTGG, GACCAATCTGG, GATAAACTTAG, CAAGGATAAGC).
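A k-mer frequency table like the ones on this slide can be computed by sliding a window of length k along the sequence. The sketch below is a minimal Python illustration; the function name is an assumption, and the resulting frequencies describe the mystery sequence itself, so they need not match the illustrative genome-model values shown on the slide.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer in `seq` (empty dict if seq is shorter than k)."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


# The 11-nucleotide mystery sequence from the slide contains 10 overlapping 2-mers.
freqs = kmer_frequencies("GATAAATCTGG", 2)
n_windows = len("GATAAATCTGG") - 2 + 1
```

With short reads the number of windows is small, which is one reason composition estimates from short fragments are unstable.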
  • 25. Metagenomes - the first few years. Figure: cost of DNA sequencing (note log scale).
      Study | Author, Year | # of nucleotides | Size of each “read”
      Acid mine drainage | Tyson et al, 2004 | 7.62 x 10^7 | 737 nt
      Obese / Lean twins | Turnbaugh et al, 2009 | 1.83 x 10^9 | 341 nt
      Human gut “catalogue” | Qin et al, 2010 | 5.77 x 10^11 | 75 nt
  • 26. Summary of challenges
      - Datasets are already huge, and getting bigger and more numerous
      - DNA sequences that we need to classify are SHORT: unstable estimates of composition and similarity
      - Our predictions depend on the coverage in our reference database
      - We need to combine different lines of evidence into a coherent prediction scheme
  • 27. Two approaches
      PhymmBL: Brady and Salzberg, 2010
      - Similarity of sequences assessed through the BLAST algorithm
      - Composition assessed using interpolated context models
      - Predictions are combined using a formula
      RITA: MacDonald, Parks and Beiko, 2012
      - Similarity of sequences assessed using UBLAST and BLAST
      - Composition assessed using naïve Bayes approach
      - Look for agreement between predictors; if no agreement, decide based on best evidence
  • 28. The naïve Bayes approach (example fragment: AGGCTTGTCAA)
      - Build k-mer profiles for each reference genome
      - The probability that a given DNA sequence fragment F originated from a given genome G_i is: $P(F \mid G_i) = \prod_{j=1}^{M} P(w_j \mid G_i)$, where w_1, ..., w_M are the k-mers of F
      - (that is, the combined frequencies of all k-mers from F in genome G_i)
      - Note that naïve Bayes assumes INDEPENDENCE, which is a bit funny with overlapping k-mers (But We Did It Anyway)
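The product over k-mers described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the RITA implementation: the add-one pseudocount (so unseen k-mers do not zero out the product), the log-space summation (to avoid numeric underflow), the toy genomes, and all function names are choices made for the example.

```python
import math
from collections import Counter
from itertools import product


def kmers(seq, k):
    """All overlapping k-mers of `seq`, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_profile(genome, k, pseudocount=1.0):
    """Log-probability of every possible DNA k-mer in one reference genome (smoothed)."""
    counts = Counter(kmers(genome, k))
    total = sum(counts.values()) + pseudocount * 4 ** k
    return {
        "".join(p): math.log((counts["".join(p)] + pseudocount) / total)
        for p in product("ACGT", repeat=k)
    }


def log_likelihood(fragment, profile, k):
    """log P(F | G_i) = sum over the k-mers w_j of F of log P(w_j | G_i)."""
    return sum(profile[w] for w in kmers(fragment, k) if w in profile)


def classify(fragment, profiles, k):
    """Name of the reference genome whose profile gives the fragment the highest likelihood."""
    return max(profiles, key=lambda name: log_likelihood(fragment, profiles[name], k))


# Hypothetical toy reference genomes with very different composition.
K = 2
profiles = {
    "at_rich": train_profile("ATATTTAAATATATTA", K),
    "gc_rich": train_profile("GCGCGGCCGCGCCGGC", K),
}
call_at = classify("TATAAT", profiles, K)
call_gc = classify("GGCCGC", profiles, K)
```

Each k-mer is scored as if it were independent of its neighbours, which is the naïve assumption the slide points out; k-mers containing characters other than A, C, G, T are skipped in this sketch.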
  • 29. Naïve Bayes in action. Build fake metagenomes by chopping up real sequenced genomes into pieces of length 200. Build a reference database that excludes the chopped-up genomes and their close relatives (leave-one-out). How accurate is the classifier, for different values of k? (Plot: average proportion of sequences correctly classified versus k.)
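The evaluation setup on this slide has two mechanical steps that a short sketch can make explicit: chopping a genome into fixed-length fragments, and removing the source genome and its close relatives from the reference set. The code below is illustrative only. Non-overlapping fragments, the dictionary-based `relatives` map, and the toy sequences are assumptions; the slide does not say how fragments were sampled or how "close relatives" were defined.

```python
def chop(genome, length=200):
    """Split `genome` into consecutive non-overlapping fragments of exactly `length` characters."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, length)]


def leave_out_reference(reference, held_out, relatives):
    """Reference set without the held-out genome and its listed close relatives."""
    excluded = {held_out} | set(relatives.get(held_out, ()))
    return {name: seq for name, seq in reference.items() if name not in excluded}


# Hypothetical reference collection; names and sequences are placeholders.
reference = {
    "strain_A1": "ACGT" * 125,  # 500 nt
    "strain_A2": "ACGA" * 125,
    "strain_B1": "GGCC" * 125,
}
relatives = {"strain_A1": ["strain_A2"]}

fragments = chop(reference["strain_A1"], 200)
training_set = leave_out_reference(reference, "strain_A1", relatives)
```

The fragments would then be classified against `training_set` only, so a correct call cannot come from the fragment's own genome or a near-identical strain.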
  • 30. Composition versus Similarity. Similarity approaches (three right-hand sets) are more accurate (and slower) than the composition approaches NB and P. Panels: 1000 nt, 200 nt.
  • 31. RITA: Rapid Identification of Taxonomic Assignments. Flowchart: Query DNA sequence fragment; Run naïve Bayes classifier; UBLAST filter (fast, imprecise); BLAST comparisons (slower, better). Decision points (Yes! / No!): Is there a BLAST match? Is there a strong naïve Bayes preference? Do BLAST and naïve Bayes agree? Is there a strong BLAST preference? Outcomes: Group 1a, Group 1b, Group 2, Group 3.
  • 32. Performance on different sequence lengths
  • 34. Application to human microbiome data sets. Panels: Homology+Composition, Composition. Without HMP genomes: Clostridium, Bacteroides and Eubacterium, but lots of low-confidence calls too. With HMP reference genomes: Add Ruminococcus, Faecalibacterium, Lachnospiraceae. Legend: Good, Less Good. Data from Turnbaugh et al., 2010
  • 35. Application to bioremediation metagenome (Hug et al., 2012). Three sets of microbes, all can clean up PCEs. Are there differences in the composition of these sets?
  • 36. Summary
      - Naïve Bayes is FAST and performs as well as alternative, more complicated approaches
      - The combination of composition and similarity is superior to either approach in isolation
      - The accuracy on short reads is good, but a substantial minority of reads are misclassified so the question of “who is doing what” remains somewhat open
  • 37. What’s next?
      - Apply to emerging metagenomic data sets:
        - Bioremediation
        - Aging and frailty in mice and humans
      - Refine the approach to include both unsupervised and supervised components
  • 38. Coda #1: mammalian fertility (Alex Keddy, Katherine Rutherford). Starting colony, then 30 years of…. Random mating: CONTROL (105). Selective breeding: SELECTED (344). Examine genetic variation at >8000 positions within the genome. Are there any genetic differences at one or more sites that distinguish the populations and individuals within the populations?
  • 39. Machine-learning results. Different ML approaches with feature selection. Observed vs Predicted reproductive rate for RF regression model.
  • 40. What’s next? (Jeremy Koenig; Developer to be named later)
      - Expand the project: more data, and more types of data!
      - Integrating lines of evidence from multiple sources will be a significant challenge: each yields overlapping / different predictions
      - Map interesting results into the cow genome and test effectiveness
  • 41. Coda #2: data retrieval and GIS. 20,304 samples, 1.7 billion sequences
  • 43. Objectives
      - Automated classification of data from sources such as the EMP
      - Retrieval of data from EMP via Web services under development (some plugins already completed; come in October for the story)
  • 44. What’s next? My Dal Homecoming lecture on October 4
  • 45. Classifying DNA: Adventures in Multidisciplinarity. Genetics, Evolution, Statistics, Machine Learning. Throw in the challenges of massive data sets, data retrieval challenges, emerging technologies, and uncertain reliability of some data sets, and there is a lot of work still to be done!! Chris Whidden, Donovan Parks, Morgan Langille
  • 46. Open Science: @rob_beiko, Github, Preprint servers. This presentation: http://www.slideshare.net/beiko/beiko-dcsi2013
  • 48. Image credits. Please follow links for copyright information
      Slide 1:
        http://commons.wikimedia.org/wiki/File:DNA Overview2.png
      Slide 2:
        s3.documentcloud.org/documents/706661/francis-crick-letter.pdf
      Slide 3:
        http://www.nature.com/nature/journal/v393/n6685/full/393537a0.html
        http://www.ncbi.nlm.nih.gov/sutils/static/GP IMAGE/Mycobacterium.jpg
        http://commons.wikimedia.org/wiki/File:Shakespeare.jpg
        http://commons.wikimedia.org/wiki/File:Wolllaus.jpg
        http://phenomena.nationalgeographic.com/files/2013/06/Tremblaya Moranella.jpg (Ryuichi Koga, National Institute of Advanced Industrial Science and Technology, Japan)
        http://commons.wikimedia.org/wiki/File:Daphnia pulex.png
        http://commons.wikimedia.org/wiki/File:Paris japonica Kinugasasou in Hakusan 2003 7 27.jpg
      Slide 4:
        http://upload.wikimedia.org/wikipedia/commons/2/21/DNA human male chromosomes.gif
        http://commons.wikimedia.org/wiki/File:LKB1 complex structure 2WTK.png
      Slide 6:
        http://commons.wikimedia.org/wiki/File:Yogurt of the Bulgarija Pavilion of Expo 2005 Aichi Japan.jpg
        http://en.wikipedia.org/wiki/File:Grand prismatic spring.jpg
        http://www.nsf.gov/od/lpa/news/03/images/scsmoker2th.jpg
        http://www.nsf.gov/od/lpa/news/03/images/strain121 thin th.jpg
      Slide 21:
        http://commons.wikimedia.org/wiki/File:EPA_TECHNICIAN_COLLECTS_WATER_SAMPLE_FROM_PAHRANAGAT_LAKE_ABOUT_10_MILES_SOUTH _OF_ALAMO_-_NARA_-_549007.jpg
        http://commons.wikimedia.org/wiki/File:DNA_orbit_animated_small.gif
        http://commons.wikimedia.org/wiki/File:DNA_sequence.svg
      Slide 24:
        Stein, L. Genome Biology 2010 11:207
      Slide 36:
        http://commons.wikimedia.org/wiki/File:Mouse-19-Dec-2004.jpg