Genomics and bioinformatics in non-model organisms: where is the data tidal wave taking us?
C. Titus Brown
Assistant Professor
Microbiology; Computer Science; BEACON
Michigan State University
Feb 2014
ctb@msu.edu
Practical implications of sequencing
Molgula oculata

One graduate student;
Two transcriptomes;
Three draft genomes;
In four years.
Molgula oculata

Molgula occulta

Elijah Lowe

Ciona intestinalis
Research
Agricultural genomics & transcriptomics
Metagenomics (Environmental & host-associated)
Novel computational approaches
Computing + Biology
Education and training
Good software development
Capacity building
Evo-devo genomics & transcriptomics
Open science/source/data/access
Our research philosophy:
• Enable good biology by generating hypotheses worth testing.
• Try to maximize sensitivity of analyses, in light of fairly high specificity in sequencing-based approaches.
• Collaborate intensively on research projects.
• Typically, share graduate students with "wet" labs.
• Goal is to cross-train everyone involved.
Three mini-stories:
1. Building better gene models for chicken
2. Dealing with an endless stream of data
3. Evaluating the effect of gene model completeness on pathway prediction
1. Building a better chicken (gene model)
• Most extant computational tools focus on model organisms:
  • Assume low polymorphism (internal variation)
  • Assume quality reference genome or transcriptome
  • Assume somewhat reliable functional annotation
  • More significant compute infrastructure requirements
• How can we best use mRNAseq for chicken?
Likit Preeyanon
Interpreting RNAseq requires gene
models:

http://www.hitseq.com/images/RNA-seq_AS.jp
Marek's Disease project:
• To identify alternative splicing that contributes to disease resistance.
w/Hans Cheng, USDA ADOL

Inbred line 6

Inbred line 7
Types of Alternative Splicing
40%

25%

<5%, more in plants, fungi, protozoa

Keren H, Lev-Maor G & Ast G, Nat Rev Genet 2010
Data
• RNA-Seq from chicken line 6 (resistant) and line 7 (susceptible)
  • Pre- and post-infection
  • Single-end reads for assembly (~30 million reads x 4)
  • Paired-end reads for validation (~40 million reads x 4)
• Chicken genome: galGal3
• ESTs from UCSC genome website
• mRNA from GenBank

w/Hans Cheng, USDA ADOL; Jerry Dodgson, M
Pipeline
1. Assembly (Velvet 1.2.03, Oases 0.2.06)
   • Global assembly, k=21-31
   • Local assembly, k=21-31
2. Trimming and cleaning (Seqclean)
3. Mapping to a genome (BLAT)
4. Build all putative isoforms, with other gene models as additional input (Gimme 0.9.0)
5. Predict coding regions (ESTScan 2.1)
Local Assembly – early attempt to scale
Tophat 2.0

Velvet/Oases
Assembler
Predicting putative isoforms
w/Gimme:

Source code is publicly available at https://github.com/ged-lab/gimme.git
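Exon Graph approach ("Gimme")
[Figure: transcripts with exons (exon1, exon2, exon3) and introns (intron1, intron2) are merged into an exon graph; alternative forms of exon3 appear as separate nodes (Exon3.a, Exon3.b).]

https://github.com/ged-lab/gimme.git

Likit Preeyanon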
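The exon graph idea can be sketched in a few lines of Python. This is a minimal illustration, not Gimme's actual implementation: the function names, the (start, end) exon tuples, and the coordinates are all hypothetical, and it assumes exons from different transcripts have already been collapsed to identical coordinates where they agree. Each exon is a node, each observed splice junction is an edge, and every path from a first exon to a last exon is one putative isoform.

```python
from collections import defaultdict


def build_exon_graph(transcripts):
    """Merge transcripts (each a list of (start, end) exons) into one directed graph."""
    graph = defaultdict(set)
    for exons in transcripts:
        ordered = sorted(exons)
        # Every pair of consecutive exons in a transcript is an observed junction.
        for upstream, downstream in zip(ordered, ordered[1:]):
            graph[upstream].add(downstream)
        graph[ordered[-1]]  # register the terminal exon as a node with no outgoing edge
    return dict(graph)


def enumerate_isoforms(graph):
    """Return every path from a source exon to a sink exon (one path per putative isoform)."""
    has_incoming = {d for targets in graph.values() for d in targets}
    sources = sorted(node for node in graph if node not in has_incoming)
    isoforms = []

    def walk(node, path):
        path = path + [node]
        if not graph[node]:          # sink exon: the path is complete
            isoforms.append(path)
            return
        for nxt in sorted(graph[node]):
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return isoforms


# Hypothetical coordinates: a full-length transcript, an exon-2 skip,
# and a transcript with an alternative form of exon 3.
transcripts = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
    [(100, 200), (300, 400), (500, 650)],
]
isoforms = enumerate_isoforms(build_exon_graph(transcripts))
for path in isoforms:
    print(path)
```

Only junctions that were actually observed become edges, so the sketch does not propose an exon1-to-Exon3.b isoform here: no input transcript supports that junction.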
We recover annotated isoforms…

USP15

Both annotated isoforms are detected by our pipeline.
…and we detect unknown isoforms.

TOM1

Local assembly increases sensitivity of isoform detection.
Example of extended 3' UTR

SLC25A3
Gene Model Summary
Method                     Genes     Transcripts
Global Assembly            14,832    32,311
Local Assembly             15,297    23,028
Global + Local Assembly    15,934    46,797

*Number of genes and transcripts might be overestimated due to incomplete assemb
and spurious splice junctions.
Cross-validation with technical
replicates
Later,
Does independent sequencing data confirm? better data => confirms
Dataset              Single-end mapped      Single-end unmapped    Paired-end mapped      Paired-end unmapped
Line 6 uninfected    18,375,966 (77.93%)    5,203,586 (22.07%)     21,598,218 (64.16%)    12,065,659 (35.84%)
Line 6 infected      17,160,695 (73.18%)    6,288,286 (26.82%)     15,274,638 (63.89%)    8,633,855 (36.11%)
Line 7 uninfected    18,130,072 (75.77%)    5,795,737 (24.22%)     20,961,033 (63.67%)    11,960,299 (36.33%)
Line 7 infected      19,912,046 (78.51%)    5,450,521 (21.49%)     22,485,833 (65.22%)    11,992,002 (34.78%)
Cross-validation w/read splicing

95% of splice junctions have more than three spliced reads
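Splice junction comparison
Source                         Total splice junctions
Assembled transcripts          104,366
Genbank mRNA                   74,065
Expressed Sequence Tags        209,134

Venn region                              Splice junctions
Assembled transcripts only               21,128
Genbank mRNA only                        7,756
Expressed Sequence Tags only             110,543
Assembled + Genbank mRNA only            2,412
Assembled + ESTs only                    34,694
Genbank mRNA + ESTs only                 17,765
All three                                46,132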
95% of splice junctions supported by > 4 reads.
Gimme pipeline
• Our pipeline can detect many isoforms
• Local assembly enhances isoform detection
• Cufflinks (mapping-based gene models) is not superior to de novo transcriptome assembly in chicken… (Was Cufflinks trained on mouse/human?)
• The pipeline can be used to build gene models for other organisms
• Pipeline can do incremental combining of new data sets
How to detect differential splicing
[Figure: spliced-read counts at each junction and read coverage over each exon, shown for two samples; one region is marked "?".]
Exon Region Comparison
[Figure: the same gene divided into exon regions, with spliced-read counts and read coverage per region for two samples.]
Skipped Exon

DEXSeq
Skipped Exon
sulfatase
BRCA1 domain

Alternative 3' UTR

DNA repair, apoptosis, DNA replication, genome stability
Differential Exon Usage Summary
Number of exons called significant (True) or not (False) at each threshold:
Adjusted p-value    False     True
0.1                 18,631    66
0.01                18,656    41
0.001               18,663    34

Chromosome 1
Total 3,728 genes

Next steps: scaling analysis to entire genome.
And… interpretation (??)
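The "adjusted p-value" thresholds in the summary come from multiple-testing correction across thousands of exons. The sketch below assumes the Benjamini-Hochberg procedure, which is the adjustment DEXSeq reports by default; the function name and the input p-values are made up for illustration and are not taken from the chicken data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four exons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

# Count exons passing each threshold, as in the summary table.
for threshold in (0.1, 0.01, 0.001):
    significant = sum(p < threshold for p in adj)
    print(threshold, significant)
```

Tightening the threshold can only shrink the "True" column, which is the pattern the table shows (66, 41, 34).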
Gene model thoughts - Can build gene models that represent the data

we have fairly well;
 Robust exon-exon splice site reporting;

 Planning ahead for multiple iterations of new

data;
 …interpretation of results? See story 3.
2. Endless data!
• It is now under $1000 to generate a new mRNAseq data set.
• Collaborators routinely generate new data sets every 3-6 months… (note: each of them, x 5-10…)
• How can we make use of this data iteratively!?
Making iterative use of new data.
[Figure labels: Data!; Existing gene models; Refined gene models; Differential expression; ??]
Some data will yield new gene models, but much will be redundant (e.g. "housekeeping" genes).
Digital normalization
Digital normalization approach
A digital analog to cDNA library normalization, diginorm:
• Is single pass: looks at each read only once;
• Does not "collect" the majority of sequencing errors;
• Keeps all low-coverage reads;
• Enables analyses that are otherwise completely impossible;
Integrated into several assemblers (Trinity and
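The single-pass behavior can be illustrated with a small Python sketch. It assumes the median k-mer abundance rule from the published diginorm description: a read is kept only if the median count of its k-mers, among the reads kept so far, is below a coverage cutoff. The function names and the values of `k` and `cutoff` are illustrative, and a plain `Counter` stands in for the fixed-memory probabilistic counting structure that khmer uses.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Yield reads whose estimated coverage is still below the cutoff.

    Each read is examined exactly once, so this works on a stream.
    """
    counts = Counter()
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # simplification: reads shorter than k are skipped
        # Median k-mer count approximates the coverage seen so far.
        if median(counts[w] for w in words) < cutoff:
            counts.update(words)  # only kept reads contribute to the counts
            yield read


# Toy data: ten copies of one "highly expressed" read plus one rare read.
reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
kept = list(digital_normalization(reads, k=4, cutoff=3))
print(len(reads), "reads in,", len(kept), "reads kept")
```

The redundant copies are discarded once the coverage estimate reaches the cutoff, while the rare read passes through untouched, which is the "keeps all low-coverage reads" property.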
Evaluating on ascidians (sea squirts):
Molgula oculata

Molgula oculata

Molgula occulta

Ciona intestinalis
Diginorm applied to Molgula embryonic mRNAseq – set aside ~90% of data
Sample               No. of reads    Reads kept    Percentage kept
M. occulta F+3         42,174,510    15,642,268    ?
M. occulta F+3         50,018,302     6,012,894    ?
M. occulta F+4         44,948,983     3,499,935    ?
M. occulta F+5         53,692,296     2,993,715    ?
M. occulta F+6         45,782,981     2,774,342    ?
M. occulta Total      236,617,072    30,923,154    13%
M. oculata F+3         47,045,433    10,754,899    ?
M. oculata F+4         52,890,938     3,949,489    ?
M. oculata F+6         50,156,895     2,874,196    ?
M. oculata Total      150,093,266    17,578,584    11.70%
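The per-sample percentages are left as "?" on the slide, but they follow directly from the two count columns. A short sketch of the calculation, using the counts from the table (the variable and function names are arbitrary):

```python
# (sample, reads in, reads kept) as listed in the table above.
samples = [
    ("M. occulta F+3", 42_174_510, 15_642_268),
    ("M. occulta F+3", 50_018_302, 6_012_894),
    ("M. occulta F+4", 44_948_983, 3_499_935),
    ("M. occulta F+5", 53_692_296, 2_993_715),
    ("M. occulta F+6", 45_782_981, 2_774_342),
    ("M. oculata F+3", 47_045_433, 10_754_899),
    ("M. oculata F+4", 52_890_938, 3_949_489),
    ("M. oculata F+6", 50_156_895, 2_874_196),
]


def percent_kept(total, kept):
    """Share of reads that survive normalization, in percent."""
    return 100.0 * kept / total


for name, total, kept in samples:
    print(f"{name}: {percent_kept(total, kept):.1f}% kept")

# Species totals, recomputed from the rows.
totals = {}
for name, total, kept in samples:
    species = name.rsplit(" ", 1)[0]
    t, k = totals.get(species, (0, 0))
    totals[species] = (t + total, k + kept)

for species, (total, kept) in totals.items():
    print(f"{species} total: {percent_kept(total, kept):.1f}% kept")
```

Because diginorm carries its k-mer counts forward from one data set to the next, the fraction kept falls for later samples in the table: most of their content has already been seen.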
But: does diginorm “lose” transcript
information? No.
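[Venn diagrams: C. intestinalis reciprocal best hits found in the diginorm assembly vs. the raw assembly.
M. occulta: 13,623 shared; 37 and 17 found in only one of the two assemblies; 2,446 missing.
M. oculata: 13,646 shared; 64 and 15 found in only one of the two assemblies; 2,398 missing.]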

Reciprocal best hit vs. Ciona
BLAST e-value cutoff: 1e-6

Elijah Lowe
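The reciprocal best hit comparison can be sketched as follows. This is a generic illustration rather than the analysis code used for the slide: it assumes hits have already been parsed into (query, subject, e-value, bit score) tuples, ranks hits by bit score, and applies the 1e-6 e-value cutoff from the slide. All sequence names are made up.

```python
def best_hits(hits, max_evalue=1e-6):
    """Map each query to its best-scoring subject among hits passing the cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-6):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits: assembled Molgula transcripts (m*) vs. Ciona proteins (c*).
molgula_vs_ciona = [
    ("m1", "c1", 1e-50, 200.0),
    ("m1", "c2", 1e-10, 80.0),
    ("m2", "c2", 1e-20, 120.0),
    ("m3", "c3", 1e-3, 30.0),   # fails the e-value cutoff
]
ciona_vs_molgula = [
    ("c1", "m1", 1e-48, 195.0),
    ("c2", "m1", 1e-9, 75.0),
    ("c2", "m2", 1e-19, 118.0),
    ("c3", "m3", 1e-3, 30.0),   # fails the e-value cutoff
]
pairs = reciprocal_best_hits(molgula_vs_ciona, ciona_vs_molgula)
print(pairs)
```

Running the same comparison once on the raw assembly and once on the diginorm assembly, then intersecting the two sets of matched Ciona genes, gives the kind of overlap counts shown on the slide.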
Where are we taking diginorm?
• Streaming online algorithms only look at data ~once.
• Diginorm is streaming, online…
• Conceptually, can move many aspects of sequence analysis into streaming mode.
=> Extraordinary potential for computational efficiency.
=> Streaming, online variant calling.

Single pass, reference free, tunable, streaming online varian
Potentially quite clinically useful.

See NIH BIG DATA grant, http://ged.msu.edu/
Prospective: sequencing tumor cells
• Goal: phylogenetically reconstruct causal "driver mutations" in face of passenger mutations.
• 1000 cells x 3 Gbp x 20 coverage: 60 Tbp of sequence.
• Most of this data will be redundant and not useful.
• Developing diginorm-based algorithms to eliminate data while retaining variant information.

See NIH BIG DATA grant, http://ged.msu.edu/
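The 60 Tbp figure is a straightforward product of the three quantities on the slide; as a quick check (variable names are arbitrary):

```python
cells = 1000                      # single cells to sequence
genome_size_bp = 3 * 10**9        # ~3 Gbp human genome
coverage = 20                     # sequencing depth per cell

total_bp = cells * genome_size_bp * coverage
total_tbp = total_bp / 10**12     # 1 Tbp = 10^12 bp

print(f"{total_tbp:.0f} Tbp of sequence")
```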
3. Evaluating effects of gene models on pathway prediction

Vertically integrated comparison.

Likit Preeyanon
KEGG Pathway
Ensembl Enriched KEGG Pathway
Term                                            Count    Benjamini
Cytokine-cytokine receptor interaction          36       6.2E-02
Lysosome                                        25       1.2E-01
Apoptosis                                       19       3.5E-01
Arginine and proline metabolism                 12       3.1E-01
Starch and sucrose metabolism                   9        3.4E-01
Toll-like receptor signaling pathway            19       3.7E-01
Natural killer cell mediated cytotoxicity       17       3.4E-01
Cytosolic DNA-sensing pathway                   9        4.2E-01
Valine, leucine and isoleucine degradation      11       4.1E-01
Glutathione metabolism                          10       4.3E-01
NOD-like receptor signaling pathway             11       4.6E-01
Intestinal immune network for IgA production    9        5.6E-01
VEGF signaling pathway                          14       5.6E-01
PPAR signaling pathway                          13       6E-01
Gimme Enriched KEGG Pathway
Term                                            Count    Benjamini
Cytokine-cytokine receptor interaction          34       3.7E-02
Toll-like receptor signaling pathway            22       2.7E-02
Jak-STAT signaling pathway                      28       3.4E-02
Arginine and proline metabolism                 13       4.5E-02
Lysosome                                        22       1.3E-01
Natural killer cell mediated cytotoxicity       17       1.6E-01
Alanine, aspartate and glutamate metabolism     9        1.8E-01
Amino sugar and nucleotide sugar metabolism     10       3.6E-01
Cysteine and methionine metabolism              9        4E-01
ECM-receptor interaction                        16       3.7E-01
Apoptosis                                       16       3.7E-01
Glycolysis / Gluconeogenesis                    11       4E-01
DNA replication                                 8        3.8E-01
Cell adhesion molecules (CAMs)                  19       4.6E-01
PPAR signaling pathway                          12       6E-01
Intestinal immune network for IgA production    8        6.1E-01
Compared Enriched KEGG Pathway
Common:
  Cytokine-cytokine receptor interaction
  Toll-like receptor signaling pathway
  Lysosome
  Apoptosis
  Arginine and proline metabolism
  Natural killer cells
  Intestinal immune network for IgA production
  PPAR signaling pathway

Ensembl:
  Starch and sucrose
  Valine, leucine and isoleucine degradation
  Glutathione metabolism
  NOD-like receptor signaling pathway
  VEGF signaling pathway

Gimme:
  Jak-STAT signaling pathway
  Alanine, aspartate and glutamate metabolism
  Amino sugar and nucleotide sugar metabolism
  ECM-receptor interaction
  Cell adhesion molecules (CAMs)
  DNA replication
Ensembl

Common

Gimme
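The comparison amounts to set operations on the two lists of enriched terms. The sketch below uses the term names from the Ensembl and Gimme enrichment tables (spelling normalized) and is only an illustration of the bookkeeping. Because it takes every term in the two tables as input, it can report a few terms that do not appear on the comparison slide.

```python
ensembl_terms = {
    "Cytokine-cytokine receptor interaction",
    "Lysosome",
    "Apoptosis",
    "Arginine and proline metabolism",
    "Starch and sucrose metabolism",
    "Toll-like receptor signaling pathway",
    "Natural killer cell mediated cytotoxicity",
    "Cytosolic DNA-sensing pathway",
    "Valine, leucine and isoleucine degradation",
    "Glutathione metabolism",
    "NOD-like receptor signaling pathway",
    "Intestinal immune network for IgA production",
    "VEGF signaling pathway",
    "PPAR signaling pathway",
}

gimme_terms = {
    "Cytokine-cytokine receptor interaction",
    "Toll-like receptor signaling pathway",
    "Jak-STAT signaling pathway",
    "Arginine and proline metabolism",
    "Lysosome",
    "Natural killer cell mediated cytotoxicity",
    "Alanine, aspartate and glutamate metabolism",
    "Amino sugar and nucleotide sugar metabolism",
    "Cysteine and methionine metabolism",
    "ECM-receptor interaction",
    "Apoptosis",
    "Glycolysis / Gluconeogenesis",
    "DNA replication",
    "Cell adhesion molecules (CAMs)",
    "PPAR signaling pathway",
    "Intestinal immune network for IgA production",
}

common = ensembl_terms & gimme_terms        # enriched with either gene model set
ensembl_only = ensembl_terms - gimme_terms  # lost when switching to Gimme models
gimme_only = gimme_terms - ensembl_terms    # gained when switching to Gimme models

for label, terms in (("Common", common), ("Ensembl", ensembl_only), ("Gimme", gimme_only)):
    print(f"{label} ({len(terms)}):")
    for term in sorted(terms):
        print("  " + term)
```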
INFB – we annotate UTR not present in other gene models.
INFB – 3' bias + missing UTR => insensitive
Ensembl

Common

Gimme
So, where does this leave us?
• Our methods for generating hypotheses from mRNAseq data are sensitive to references & technical details of the approaches. (This is expected but Bad.)
• We can build (and have built!) approaches that we believe to be more accurate for non- or semi-model organisms. (They're also open; try 'em out.)
=> Standards for execution, evaluation, comparison, and education.
khmer-protocols:
• Effort to provide standard "cheap" assembly protocols for the cloud.
• Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. ~$150 per data set (on Amazon rental computers).
• Open, versioned, forkable, citable.
(Announced at Davis in December '13!)

Protocol steps: Read cleaning; Diginorm; Assembly; Annotation; RSEM differential expression

CC0; BSD; on github; in reStructuredText.
Summer NGS workshop (2010-2017)
A few thoughts on our approach…
• Explicitly a "protocol" – explicit steps, copy-paste, customizable.
• No requirement for computational expertise or significant computational hardware.
• ~1-5 days to teach a bench biologist to use.
• $100-150 of rental compute ("cloud computing")…
• …for $1000 data set.
• Adding in quality control and internal validation steps.
Can we crowdsource bioinformatics?
We already are! Bioinformatics is already a tremendously open and collaborative endeavor. (Let's take advantage of it!)
"It's as if somewhere, out there, is a collection of totally free software that can do a far better job than ours can, with open, published methods, great support networks and fantastic tutorials. But that's madness – who on Earth would create such an amazing resource?"

http://thescienceweb.wordpress.com/2014/02/21/bioinformatics-software-companies-have-no-clue-why-no-one-buys-their-products/
Where is the data tidal wave taking biology!?
• A world with a lot more data, and, eventually, a lot more information.
• A more integrative world: genomics, molecular function, evolution, population genetics, monitoring, ??, and models that feed back into experimental design.
"Data-Intensive Biology"
Data intensive biology & hypothesis generation
• My interest in biological data is to enable better hypothesis generation.
Additional projects - Bacterial symbionts of bone eating worms – w/Shana Goffredi.

(ISME, 2013)
 Genome of Haemonchus contortus, a parasitic nematode (with

Erich Schwarz and Robin Gasser). (Genome Biology, 2013)
 Soil metagenome analysis (with Jim Tiedje, Susannah Tringe,

and Janet Jansson). (In review, PNAS.)
 Lamprey transcriptome (with Weiming Li). (in preparation).
 Ascidian genomes and transcriptomes (with Billie Swalla). (in

preparation)
 Loligo pealeii (the giant axon squid) – 5 transcriptomes and skim

genome posted publicly (Feb 2014).
In progress
• Cattle paratuberculosis analysis (w/Paul Coussens).
• Improving the chick genome using nth-generation sequencing technology (PacBio, Moleculo).
…and building software and protocols to make it easy for the next 1000 genomes.
[Figure: Moleculo data vs. chick genome; % of reads aligning plotted against read length. Luiz Irber]
What are the challenges ahead?
• Obviously: Genotype/phenotype mapping.
• But also: Conserved unknown/unannotated genes.
• Data sharing, and more generally open access/data/source/science.
• Data integration!
The problem of lopsided gene characterization is
pervasive: e.g., the brain "ignorome"

"...ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression
networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains.
The major distinguishing characteristic between these sets of genes is date of discovery, early
discovery being associated with greater research momentum—a genomic bandwagon effect."

Slide courtesy Erich Schwarz

Ref.: Pandey et al. (2014), PLoS One 9, e88889.
Thanks!
• References and grants at http://ged.msu.edu/research.html
• Software at http://github.com/ged-lab/
• Blog at http://ivory.idyll.org/blog/
• Twitter: @ctitusbrown

E-mail me: ctb@msu.edu

More Related Content

What's hot

American Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk UniversityAmerican Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk Universitymcdonadt
 
RNA-Seq To Identify Novel Markers For Research on Neural Tissue Differentiation
RNA-Seq To Identify Novel Markers For Research on Neural Tissue DifferentiationRNA-Seq To Identify Novel Markers For Research on Neural Tissue Differentiation
RNA-Seq To Identify Novel Markers For Research on Neural Tissue DifferentiationThermo Fisher Scientific
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
F Giordano ScanPAV Analysis Pipeline
F Giordano ScanPAV Analysis PipelineF Giordano ScanPAV Analysis Pipeline
F Giordano ScanPAV Analysis PipelineFrancesca Giordano
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowHorizonDiscovery
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotesc.titus.brown
 
'Novel technologies to study the resistome'
'Novel technologies to study the resistome''Novel technologies to study the resistome'
'Novel technologies to study the resistome'Willem van Schaik
 
NCER Position on Crispr-Cas9
NCER Position on Crispr-Cas9NCER Position on Crispr-Cas9
NCER Position on Crispr-Cas9Joe Szczepaniak
 
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA Fingerprinting
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA FingerprintingB.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA Fingerprinting
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA FingerprintingRai University
 
Techniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodTechniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodrcolatru
 
When viruses are beneficial
When viruses are beneficialWhen viruses are beneficial
When viruses are beneficialDanielDuvalle
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsJonathan Eisen
 
Applications of biotechnology in forensic sciences
Applications of biotechnology in forensic sciencesApplications of biotechnology in forensic sciences
Applications of biotechnology in forensic sciencesZahra Naz
 
SURCA 2016 poster
SURCA 2016 posterSURCA 2016 poster
SURCA 2016 posterMitchell Go
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
 
Studying the microbiome
Studying the microbiomeStudying the microbiome
Studying the microbiomeMick Watson
 
Crispr-cas9 food editing (genetic)
Crispr-cas9 food editing (genetic)Crispr-cas9 food editing (genetic)
Crispr-cas9 food editing (genetic)GhaidaAlrumaizan
 

What's hot (20)

2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
 
American Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk UniversityAmerican Gut Project presentation at Masaryk University
American Gut Project presentation at Masaryk University
 
RNA-Seq To Identify Novel Markers For Research on Neural Tissue Differentiation
RNA-Seq To Identify Novel Markers For Research on Neural Tissue DifferentiationRNA-Seq To Identify Novel Markers For Research on Neural Tissue Differentiation
RNA-Seq To Identify Novel Markers For Research on Neural Tissue Differentiation
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
F Giordano ScanPAV Analysis Pipeline
F Giordano ScanPAV Analysis PipelineF Giordano ScanPAV Analysis Pipeline
F Giordano ScanPAV Analysis Pipeline
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and How
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes
 
Testing for Food Authenticity
Testing for Food AuthenticityTesting for Food Authenticity
Testing for Food Authenticity
 
'Novel technologies to study the resistome'
'Novel technologies to study the resistome''Novel technologies to study the resistome'
'Novel technologies to study the resistome'
 
NCER Position on Crispr-Cas9
NCER Position on Crispr-Cas9NCER Position on Crispr-Cas9
NCER Position on Crispr-Cas9
 
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA Fingerprinting
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA FingerprintingB.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA Fingerprinting
B.Tech Biotechnology II Elements of Biotechnology Unit 4 DNA Fingerprinting
 
Techniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-goodTechniques of-biotechnology-mcclean-good
Techniques of-biotechnology-mcclean-good
 
When viruses are beneficial
When viruses are beneficialWhen viruses are beneficial
When viruses are beneficial
 
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation MetagenomicsBayesian Taxonomic Assignment for the Next-Generation Metagenomics
Bayesian Taxonomic Assignment for the Next-Generation Metagenomics
 
Applications of biotechnology in forensic sciences
Applications of biotechnology in forensic sciencesApplications of biotechnology in forensic sciences
Applications of biotechnology in forensic sciences
 
SURCA 2016 poster
SURCA 2016 posterSURCA 2016 poster
SURCA 2016 poster
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
Studying the microbiome
Studying the microbiomeStudying the microbiome
Studying the microbiome
 
Crispr-cas9 food editing (genetic)
Crispr-cas9 food editing (genetic)Crispr-cas9 food editing (genetic)
Crispr-cas9 food editing (genetic)
 

Viewers also liked

Shepley ross introduction_od_es_manual_4th
Shepley ross introduction_od_es_manual_4thShepley ross introduction_od_es_manual_4th
Shepley ross introduction_od_es_manual_4thgabo GAG
 
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09George Wendleton
 
Testtestest
TesttestestTesttestest
Testtestestderwick
 
Buscadores (Fodehum)
Buscadores (Fodehum)Buscadores (Fodehum)
Buscadores (Fodehum)grupo3fodehum
 
Seniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus UniversitetSeniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus UniversitetBertel Bolt-Jørgensen
 
Where to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachWhere to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachLive Union
 
Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints Judith Baines
 
About the company
About the company About the company
About the company Sponsormob
 
Long term evaluation of IL programme paper
Long term evaluation of IL programme paperLong term evaluation of IL programme paper
Long term evaluation of IL programme paperTina Hohmann
 
Transformatie door innovatie IGC Amsterdam
Transformatie door innovatie IGC AmsterdamTransformatie door innovatie IGC Amsterdam
Transformatie door innovatie IGC AmsterdamPiet van Vugt
 
Cloudxp keynote 19 sept pvu
Cloudxp keynote 19 sept pvuCloudxp keynote 19 sept pvu
Cloudxp keynote 19 sept pvuPiet van Vugt
 
Trainings Evaluation Reports WPS Phase-II Layyah
Trainings Evaluation Reports WPS Phase-II LayyahTrainings Evaluation Reports WPS Phase-II Layyah
Trainings Evaluation Reports WPS Phase-II LayyahZafar Ahmad
 
La comunicazione-del-vino-ai-tempi-di-facebook
La comunicazione-del-vino-ai-tempi-di-facebookLa comunicazione-del-vino-ai-tempi-di-facebook
La comunicazione-del-vino-ai-tempi-di-facebookSlawka G. Scarso
 
Global Brand Management Series: Internet Marketing for Start-Ups in Taiwan
Global Brand Management Series: Internet Marketing for Start-Ups in TaiwanGlobal Brand Management Series: Internet Marketing for Start-Ups in Taiwan
Global Brand Management Series: Internet Marketing for Start-Ups in Taiwankwoolf
 

Viewers also liked (20)

Shepley ross introduction_od_es_manual_4th
Shepley ross introduction_od_es_manual_4thShepley ross introduction_od_es_manual_4th
Shepley ross introduction_od_es_manual_4th
 
2015 Ohio Ballot Issues
2015 Ohio Ballot Issues2015 Ohio Ballot Issues
2015 Ohio Ballot Issues
 
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
 
Pagerank
PagerankPagerank
Pagerank
 
Testtestest
TesttestestTesttestest
Testtestest
 
Demystifying SEO
Demystifying SEODemystifying SEO
Demystifying SEO
 
Buscadores (Fodehum)
Buscadores (Fodehum)Buscadores (Fodehum)
Buscadores (Fodehum)
 
Seniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus UniversitetSeniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus Universitet
 
Where to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approachWhere to focus event innovation? - An audience led approach
Where to focus event innovation? - An audience led approach
 
Roundtable Discussions with Experts - India
Roundtable Discussions with Experts - India Roundtable Discussions with Experts - India
Roundtable Discussions with Experts - India
 
Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints
 
XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12
 
About the company
About the company About the company
About the company
 
Long term evaluation of IL programme paper
Long term evaluation of IL programme paperLong term evaluation of IL programme paper
Long term evaluation of IL programme paper
 
OW2 Nanoko
OW2 NanokoOW2 Nanoko
OW2 Nanoko
 
Transformatie door innovatie IGC Amsterdam
Transformatie door innovatie IGC AmsterdamTransformatie door innovatie IGC Amsterdam
Transformatie door innovatie IGC Amsterdam
 
Cloudxp keynote 19 sept pvu
Cloudxp keynote 19 sept pvuCloudxp keynote 19 sept pvu
Cloudxp keynote 19 sept pvu
 
Trainings Evaluation Reports WPS Phase-II Layyah
Trainings Evaluation Reports WPS Phase-II LayyahTrainings Evaluation Reports WPS Phase-II Layyah
Trainings Evaluation Reports WPS Phase-II Layyah
 
La comunicazione-del-vino-ai-tempi-di-facebook
La comunicazione-del-vino-ai-tempi-di-facebookLa comunicazione-del-vino-ai-tempi-di-facebook
La comunicazione-del-vino-ai-tempi-di-facebook
 
Global Brand Management Series: Internet Marketing for Start-Ups in Taiwan
Global Brand Management Series: Internet Marketing for Start-Ups in TaiwanGlobal Brand Management Series: Internet Marketing for Start-Ups in Taiwan
Global Brand Management Series: Internet Marketing for Start-Ups in Taiwan
 

Similar to 2014 davis-talk

Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityMonica Munoz-Torres
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityMonica Munoz-Torres
 
Inference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldInference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldJoe Parker
 
Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityMonica Munoz-Torres
 
Minimal and Compact
Minimal and CompactMinimal and Compact
Minimal and CompactJoshua Gefen
 
Apollo : A workshop for the Manakin Research Coordination Network
Apollo: A workshop for the Manakin Research Coordination NetworkApollo: A workshop for the Manakin Research Coordination Network
Apollo : A workshop for the Manakin Research Coordination NetworkMonica Munoz-Torres
 
ppgardner-lecture03-genomesize-complexity.pdf
ppgardner-lecture03-genomesize-complexity.pdfppgardner-lecture03-genomesize-complexity.pdf
ppgardner-lecture03-genomesize-complexity.pdfPaul Gardner
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grcc.titus.brown
 
Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian Aurisano
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKMonica Munoz-Torres
 
01 Slide_Oscar
01 Slide_Oscar01 Slide_Oscar
01 Slide_OscarOscar Chan
 

Similar to 2014 davis-talk (20)

2014 naples
2014 naples2014 naples
2014 naples
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
Introduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research CommunityIntroduction to Apollo: A webinar for the i5K Research Community
Introduction to Apollo: A webinar for the i5K Research Community
 
Apollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research CommunityApollo Introduction for the Chestnut Research Community
Apollo Introduction for the Chestnut Research Community
 
Inference and informatics in a 'sequenced' world
2014 davis-talk

  • 1. Genomics and bioinformatics in non-model organisms: where is the data tidal wave taking us? C. Titus Brown Assistant Professor Microbiology; Computer Science; BEACON Michigan State University Feb 2014 ctb@msu.edu
  • 2. Practical implications of sequencing: one graduate student; two transcriptomes; three draft genomes; in four years. (Elijah Lowe; Molgula oculata, Molgula occulta, Ciona intestinalis)
  • 3. Research Agricultural genomics & transcriptomics Metagenomics (Environmental & host-associated) Novel computational approaches Computing + Biology Education and training Good software development Capacity building Evo-devo genomics & transcriptomics Open science/ source/data/ access
  • 4. Research Agricultural genomics & transcriptomics Metagenomics (Environmental & host-associated) Novel computational approaches Computing + Biology Education and training Good software development Capacity building Evo-devo genomics & transcriptomics Open science/ source/data/ access
  • 5. Our research philosophy: Enable good biology by generating hypotheses worth testing. Try to maximize sensitivity of analyses, in light of fairly high specificity in sequencing-based approaches. Collaborate intensively on research projects. Typically, share graduate students with "wet" labs. Goal is to cross-train everyone involved.
  • 6. Three mini-stories: 1. Building better gene models for chicken 2. Dealing with an endless stream of data 3. Evaluating the effect of gene model completeness on pathway prediction.
  • 7. 1. Building a better chicken (gene model). Most extant computational tools focus on model organisms: Assume low polymorphism (internal variation); Assume quality reference genome or transcriptome; Assume somewhat reliable functional annotation; More significant compute infrastructure requirements. How can we best use mRNAseq for chicken? (Likit Preeyanon)
  • 8. Interpreting RNAseq requires gene models: http://www.hitseq.com/images/RNA-seq_AS.jp
  • 9. Marek's Disease project: To identify alternative splicing that contributes to disease resistance. Inbred line 6; inbred line 7. w/Hans Cheng, USDA ADOL
  • 10. Types of Alternative Splicing: 40%; 25%; <5%, more in plants, fungi, protozoa. Keren H, Lev-Maor G & Ast G, Nat Rev Genet 2010
  • 11. Data: RNA-Seq from chicken line 6 (resistant) and 7 (susceptible); Pre and post infection; Single-end reads for assembly (~30 million reads x 4); Paired-end reads for validation (~40 million reads x 4); Chicken genome: galGal3; ESTs from UCSC genome website; mRNA from Genbank. w/Hans Cheng, USDA ADOL; Jerry Dodgson, M
  • 12. Pipeline: Global assembly (k=21-31; Velvet 1.2.03, Oases 0.2.06) and local assembly (k=21-31) -> Trimming and cleaning (Seqclean) -> Mapping to a genome (BLAT) -> Build all putative isoforms, with other gene models as input (Gimme 0.9.0) -> Predict coding regions (ESTScan 2.1)
  • 13. Local Assembly – early attempt to scale: Tophat 2.0, then Velvet/Oases assembler
  • 14. Predicting putative isoforms w/Gimme: Source code is publicly available at https://github.com/ged-lab/gimme.git
  • 15. Exon Graph approach ("Gimme"). [Figure: exons 1-3 and introns 1-2 combined into an exon graph, with alternative forms exon3.a and exon3.b.] https://github.com/ged-lab/gimme.git (Likit Preeyanon)
  • 16. Predicting putative isoforms w/Gimme: Source code is publicly available at https://github.com/ged-lab/gimme.git
  • 17. We recover annotated isoforms… USP15 Both annotated isoforms are detected by our pipeline.
  • 18. …and we detect unknown isoforms. TOM1. Local assembly increases sensitivity of isoform detection.
  • 19. Example of extended 3' UTR: SLC25A3
  • 20. Gene Model Summary
      Method | Genes | Transcripts
      Global Assembly | 14,832 | 32,311
      Local Assembly | 15,297 | 23,028
      Global + Local Assembly | 15,934 | 46,797
      *Number of genes and transcripts might be overestimated due to incomplete assembly and spurious splice junctions.
  • 21. Cross-validation with technical replicates: does independent sequencing data confirm? Later, better data => confirms.
      Dataset | Single-end mapped | Single-end unmapped | Paired-end mapped | Paired-end unmapped
      Line 6 uninfected | 18,375,966 (77.93%) | 5,203,586 (22.07%) | 21,598,218 (64.16%) | 12,065,659 (35.84%)
      Line 6 infected | 17,160,695 (73.18%) | 6,288,286 (26.82%) | 15,274,638 (63.89%) | 8,633,855 (36.11%)
      Line 7 uninfected | 18,130,072 (75.77%) | 5,795,737 (24.22%) | 20,961,033 (63.67%) | 11,960,299 (36.33%)
      Line 7 infected | 19,912,046 (78.51%) | 5,450,521 (21.49%) | 22,485,833 (65.22%) | 11,992,002 (34.78%)
  • 22. Cross-validation w/read splicing 95% of splice junctions have more than three spliced reads
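  • 23. Splice junction comparison. Totals: Assembled transcripts 104,366; Genbank mRNA 74,065; Expressed Sequence Tags 209,134.
      Venn region | Splice junctions
      Assembled transcripts only | 21,128
      Genbank mRNA only | 7,756
      Expressed Sequence Tags only | 110,543
      Assembled + mRNA | 2,412
      Assembled + ESTs | 34,694
      mRNA + ESTs | 17,765
      All three | 46,132
      95% of splice junctions supported by > 4 reads.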
  • 24. Gimme pipeline: Our pipeline can detect many isoforms. Local assembly enhances isoform detection. Cufflinks (mapping-based gene models) is not superior to de novo transcriptome assembly in chicken… (Was Cufflinks trained on mouse/human?) The pipeline can be used to build gene models for other organisms. Pipeline can do incremental combining of new data sets.
  • 25. How to detect differential splicing? [Figure: spliced-read counts and read coverage per exon, compared between samples.]
  • 26. Exon Region Comparison. [Figure: read coverage per exon region, compared between samples.]
  • 29. BRCA1 domain; alternative 3' UTR. DNA repair, apoptosis, DNA replication, genome stability
  • 30. Differential Exon Usage Summary (Chromosome 1; total 3,728 genes)
      Adjusted p-value | Number of exons: False | Number of exons: True
      0.1 | 18,631 | 66
      0.01 | 18,656 | 41
      0.001 | 18,663 | 34
      Next steps: scaling analysis to entire genome. And… interpretation (??)
  • 31. Gene model thoughts: Can build gene models that represent the data we have fairly well; Robust exon-exon splice site reporting; Planning ahead for multiple iterations of new data; …interpretation of results? See story 3.
  • 32. 2. Endless data! It is now under $1000 to generate a new mRNAseq data set. Collaborators routinely generate new data sets every 3-6 months… (note: each of them, x 5-10…) How can we make use of this data iteratively!?
  • 33. Making iterative use of new data. [Diagram labels: Data!; Existing gene models; Refined gene models; Differential expression ??] Some data will yield new gene models, but much will be redundant (e.g. "housekeeping" genes)
  • 40. Digital normalization approach. A digital analog to cDNA library normalization, diginorm: Is single pass: looks at each read only once; Does not "collect" the majority of sequencing errors; Keeps all low-coverage reads; Enables analyses that are otherwise completely impossible; Integrated into several assemblers (Trinity and
  • 41. Evaluating on ascidians (sea squirts): Molgula oculata, Molgula occulta, Ciona intestinalis
  • 42. Diginorm applied to Molgula embryonic mRNAseq – set aside ~90% of data
      Sample | No. of reads | Reads kept | Percentage kept
      M. occulta F+3 | 42,174,510 | 15,642,268 | ?
      M. occulta F+3 | 50,018,302 | 6,012,894 | ?
      M. occulta F+4 | 44,948,983 | 3,499,935 | ?
      M. occulta F+5 | 53,692,296 | 2,993,715 | ?
      M. occulta F+6 | 45,782,981 | 2,774,342 | ?
      M. occulta Total | 236,617,072 | 30,923,154 | 13%
      M. oculata F+3 | 47,045,433 | 10,754,899 | ?
      M. oculata F+4 | 52,890,938 | 3,949,489 | ?
      M. oculata F+6 | 50,156,895 | 2,874,196 | ?
      M. oculata Total | 150,093,266 | 17,578,584 | 11.70%
  • 43. But: does diginorm "lose" transcript information? No. [Figure: Venn diagrams of reciprocal best hits vs. C. intestinalis for diginorm and raw assemblies. M. occulta: 37, 13623, 17; missing 2446. M. oculata: 64, 13646, 15; missing 2398.] Reciprocal best hit vs. Ciona; BLAST e-value cutoff: 1e-6. (Elijah Lowe)
  • 44. Where are we taking diginorm? Streaming online algorithms only look at data ~once. Diginorm is streaming, online… Conceptually, can move many aspects of sequence analysis into streaming mode. => Extraordinary potential for computational efficiency.
  • 45. => Streaming, online variant calling. Single pass, reference free, tunable, streaming online variant calling. Potentially quite clinically useful. See NIH BIG DATA grant, http://ged.msu.edu/
  • 46. Prospective: sequencing tumor cells. Goal: phylogenetically reconstruct causal "driver mutations" in face of passenger mutations. 1000 cells x 3 Gbp x 20 coverage: 60 Tbp of sequence. Most of this data will be redundant and not useful. Developing diginorm-based algorithms to eliminate data while retaining variant information. See NIH BIG DATA grant, http://ged.msu.edu/
  • 47. 3. Evaluating effects of gene models on pathway prediction Vertically integrated comparison. Likit Preeyanon
  • 49. Ensembl Enriched KEGG Pathway
      Term | Count | Benjamini
      Cytokine-cytokine receptor interaction | 36 | 6.2E-02
      Lysosome | 25 | 1.2E-01
      Apoptosis | 19 | 3.5E-01
      Arginine and proline metabolism | 12 | 3.1E-01
      Starch and sucrose metabolism | 9 | 3.4E-01
      Toll-like receptor signaling pathway | 19 | 3.7E-01
      Natural killer cell mediated cytotoxicity | 17 | 3.4E-01
      Cytosolic DNA-sensing pathway | 9 | 4.2E-01
      Valine, leucine and isoleucine degradation | 11 | 4.1E-01
      Glutathione metabolism | 10 | 4.3E-01
      NOD-like receptor signaling pathway | 11 | 4.6E-01
      Intestinal immune network for IgA production | 9 | 5.6E-01
      VEGF signaling pathway | 14 | 5.6E-01
      PPAR signaling pathway | 13 | 6E-01
  • 50. Gimme Enriched KEGG Pathway
      Term | Count | Benjamini
      Cytokine-cytokine receptor interaction | 34 | 3.7E-02
      Toll-like receptor signaling pathway | 22 | 2.7E-02
      Jak-STAT signaling pathway | 28 | 3.4E-02
      Arginine and proline metabolism | 13 | 4.5E-02
      Lysosome | 22 | 1.3E-01
      Natural killer cell mediated cytotoxicity | 17 | 1.6E-01
      Alanine, aspartate and glutamate metabolism | 9 | 1.8E-01
      Amino sugar and nucleotide sugar metabolism | 10 | 3.6E-01
      Cysteine and methionine metabolism | 9 | 4E-01
      ECM-receptor interaction | 16 | 3.7E-01
      Apoptosis | 16 | 3.7E-01
      Glycolysis / Gluconeogenesis | 11 | 4E-01
      DNA replication | 8 | 3.8E-01
      Cell adhesion molecules (CAMs) | 19 | 4.6E-01
      PPAR signaling pathway | 12 | 6E-01
      Intestinal immune network for IgA production | 8 | 6.1E-01
  • 51. Compared Enriched KEGG Pathway Term
      Common: Cytokine-cytokine receptor interaction; Toll-like receptor signaling pathway; Lysosome; Apoptosis; Arginine and proline metabolism; Natural killer cells; Intestinal immune network for IgA production; PPAR signaling pathway
      Ensembl: Starch and sucrose; Valine, leucine and isoleucine degradation; Glutathione metabolism; NOD-like receptor signaling pathway; VEGF signaling pathway
      Gimme: Jak-STAT signaling pathway; Alanine, aspartate and glutamate metabolism; Amino sugar and nucleotide sugar metabolism; ECM-receptor interaction; Cell adhesion molecules (CAMs); DNA replication
  • 53. INFB – we annotate UTR not present in other gene models.
  • 54. INFB – 3' bias + missing UTR => insensitive
  • 56. So, where does this leave us? Our methods for generating hypotheses from mRNAseq data are sensitive to references & technical details of the approaches. (This is expected but Bad.) We can build (and have built!) approaches that we believe to be more accurate for non- or semi-model organisms. (They're also open; try 'em out.) => Standards for execution, evaluation, comparison, and education.
  • 57. khmer-protocols: Effort to provide standard "cheap" assembly protocols for the cloud. Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. ~$150 per data set (on Amazon rental computers). Open, versioned, forkable, citable. (Announced at Davis in December '13!) Workflow: Read cleaning -> Diginorm -> Assembly -> Annotation -> RSEM differential expression
  • 58. CC0; BSD; on github; in reStructuredText.
  • 59. Summer NGS workshop (2010-2017)
  • 60. A few thoughts on our approach… Explicitly a "protocol": explicit steps, copy-paste, customizable. No requirement for computational expertise or significant computational hardware. ~1-5 days to teach a bench biologist to use. $100-150 of rental compute ("cloud computing")… …for $1000 data set. Adding in quality control and internal validation steps.
  • 61. Can we crowdsource bioinformatics? We already are! Bioinformatics is already a tremendously open and collaborative endeavor. (Let's take advantage of it!) "It's as if somewhere, out there, is a collection of totally free software that can do a far better job than ours can, with open, published methods, great support networks and fantastic tutorials. But that's madness – who on Earth would create such an amazing resource?" http://thescienceweb.wordpress.com/2014/02/21/bioinformatics-software-companies-have-no-clue-why-no-one-buys-their-products/
  • 62. Where is the data tidal wave taking biology!? A world with a lot more data, and, eventually, a lot more information. A more integrative world: genomics, molecular function, evolution, population genetics, monitoring, ??, and models that feed back into experimental design. "Data-Intensive Biology"
  • 63. Data intensive biology & hypothesis generation: My interest in biological data is to enable better hypothesis generation.
  • 64. Additional projects: Bacterial symbionts of bone-eating worms, w/Shana Goffredi (ISME, 2013). Genome of Haemonchus contortus, a parasitic nematode, with Erich Schwarz and Robin Gasser (Genome Biology, 2013). Soil metagenome analysis, with Jim Tiedje, Susannah Tringe, and Janet Jansson (in review, PNAS). Lamprey transcriptome, with Weiming Li (in preparation). Ascidian genomes and transcriptomes, with Billie Swalla (in preparation). Loligo pealeii (the giant axon squid): 5 transcriptomes and skim genome posted publicly (Feb 2014).
  • 65. In progress: Cattle paratuberculosis analysis (w/Paul Coussens). Improving the chick genome using nth-generation sequencing technology (PacBio, Moleculo), and building software and protocols to make it easy for the next 1000 genomes.
  • 66. Moleculo data vs chick genome. [Figure: % of reads aligning vs. read length.] (Luiz Irber)
  • 67. What are the challenges ahead? Obviously: Genotype/phenotype mapping. But also: Conserved unknown/unannotated genes. Data sharing, and more generally open access/data/source/science. Data integration!
  • 68. The problem of lopsided gene characterization is pervasive: e.g., the brain "ignorome" "...ignorome genes do not differ from well-studied genes in terms of connectivity in coexpression networks. Nor do they differ with respect to numbers of orthologs, paralogs, or protein domains. The major distinguishing characteristic between these sets of genes is date of discovery, early discovery being associated with greater research momentum – a genomic bandwagon effect." Slide courtesy Erich Schwarz. Ref.: Pandey et al. (2014), PLoS One 11, e88889.
  • 70. Thanks! References and grants at http://ged.msu.edu/research.html; Software at http://github.com/ged-lab/; Blog at http://ivory.idyll.org/blog/; Twitter: @ctitusbrown; E-mail me: ctb@msu.edu
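
The digital normalization idea from slide 40 (single pass, keep every low-coverage read, set aside redundant ones) can be sketched roughly as follows. This is a toy illustration, not the khmer implementation. It assumes an exact in-memory `Counter` rather than a memory-efficient counting structure, and it assumes a read's coverage is estimated from the median count of its k-mers. The function names and the small `k` and `cutoff` values are hypothetical choices for the demo.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize(reads, k=5, cutoff=3):
    """Single-pass filter: keep a read only if the median count of its
    k-mers (among reads kept so far) is still below cutoff."""
    counts = Counter()
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            yield read


# Toy data: one sequence seen 10 times, one seen once.
abundant = "ACGTTGCAAT"
rare = "GGATCCTAGA"
reads = [abundant] * 10 + [rare]
kept = list(normalize(reads, k=5, cutoff=3))
print(len(reads), "reads in,", len(kept), "kept")  # 11 reads in, 4 kept
```

In this toy run the abundant sequence is capped at three copies while the rare one is retained, which mirrors the "keeps all low-coverage reads" property. Each read is inspected once and never revisited, so the same loop works on a stream.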

Editor's Notes

  1. For the first project, we are interested in finding alternative isoforms that are differentially expressed in chicken lines 6 and 7, which are resistant and susceptible to Marek's disease, respectively. Both line 6 and line 7 can be infected by Marek's disease virus, but only line 7 develops T-cell lymphoma. Studies have shown that alternative splicing can increase susceptibility to some diseases in humans, so we hypothesize that it might play the same role in Marek's disease.
  2. In this study we used single-end reads from line 6 and line 7, before and after infection, to build gene models, and used paired-end reads from the same samples for validation. We also used ESTs and mRNA from GenBank to validate the gene models.
  3. Then we assemble short reads to obtain longer contigs. We used two assembly methods, called global and local assembly, to increase the sensitivity of isoform detection. We also assembled with multiple k-mer (hash) lengths to obtain transcripts with different expression levels. We then removed low-complexity sequences and trimmed off poly-A tails. Then we mapped all contigs to the genome using BLAT. The alignments from BLAT were then used to predict all putative isoforms, which is done by a program called Gimme that I developed. Then the coding region of each isoform is predicted by ESTScan.
  4. In the pipeline we used two assembly methods, called global and local assembly. In local assembly, only reads mapped to a genome are assembled; in global assembly, all reads are assembled. Basically, we used a program that can map both spliced and unspliced reads to the genome, for example Tophat. Then we extracted the reads mapped to each chromosome and assembled them separately using Velvet and Oases.
  5. This figure shows alignments of sequences from assembly that are aligned to the chicken genome. Oftentimes we do not get a complete transcript from assembly, so I developed Gimme, a program that assembles transcripts based on sequence alignment. It basically merges all incomplete transcripts from assembly together and predicts the structure of the gene model with all possible isoforms. The program works with all kinds of sequences, including expressed sequence tags and mRNAs. Therefore, we can also incorporate data from other sources to build gene models.
  6. This figure shows alignments of sequences from assembly that are aligned to the chicken genome. Oftentimes we do not get a complete transcript from assembly, so I developed Gimme, a program that assembles transcripts based on sequence alignment. It basically merges all incomplete transcripts from assembly together and predicts the structure of the gene model with all possible isoforms. The program works with all kinds of sequences, including expressed sequence tags and mRNAs. Therefore, we can also incorporate data from other sources to build gene models.
  7. This is an example of complete annotated gene models compared with gene models from our pipeline. Our gene models include both isoforms as well as the correct coding region.
  8. And this figure shows extra isoforms that are only detected by local assembly. The highlighted exon is not found in global assembly but is annotated in the reference sequence. This means that global assembly is missing a real exon, which can only be found by local assembly.
  9. Gene models from RNA-Seq can be used to improve existing gene models; for example, we can extend untranslated regions, which are not well annotated and are difficult to predict from a genome sequence.
  10. From our gene models, the total number of genes is about 15,000, with 47,000 transcripts; however, this number is overestimated due to incomplete assembly.
  11. The easiest way to validate gene models is to map the same set of reads back to the gene models. We found that up to 78% of single-end reads are mapped to the gene models. This number is high for RNA-Seq data and really indicates that the gene models are high-quality. Also, up to 65% of paired-end reads from the same samples are mapped to the gene models. The paired-end mapping is more stringent, so the number helps confirm the good quality of the gene models.
  12. To validate splice junctions, we compared splice junctions found in our models to ESTs and mRNA. ~80% of splice junctions are supported by Genbank mRNA or ESTs or both, which indicates that these splice junctions are real. The 21,000 splice junctions that are not supported by mRNA and ESTs may include some novel splice junctions.
  13. To summarize, our method can detect many known and unknown isoforms from RNA-Seq data, and the local assembly technique increases sensitivity of isoform detection. Cufflinks is not better than de novo assembly in chicken. And the pipeline should work with RNA-Seq data from other organisms.
  14. The green model is from single-end reads. The skipped exon is not included in the gene models but is detected by DEXSeq.
  15. 6x more. What do we do?
  16. Since I work with multiple people, I really notice.
  17. Note general problem with bioinfo.
  18. Translation initiation factor
  19. Lure them in with bioinformatics and then show them that Michigan, in the summertime, is quite nice!
  20. Think lab protocol.
  21. More generally….
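
The "~80%" figure in note 12 can be checked against the counts on slide 23 with a few lines of arithmetic. The sketch below assumes an assignment of the slide's numbers to regions of the three-way comparison. That assignment is an inference, not something the slide states: it is the one under which the regions add up to the reported total of 104,366 assembled splice junctions. The function and argument names are hypothetical.

```python
def junction_support(only, with_mrna, with_est, with_both):
    """Return (total, supported, fraction) for assembled splice junctions.

    only      -- junctions seen in assembled transcripts alone
    with_mrna -- also in Genbank mRNA, but not in ESTs
    with_est  -- also in ESTs, but not in Genbank mRNA
    with_both -- also in both Genbank mRNA and ESTs
    """
    total = only + with_mrna + with_est + with_both
    supported = total - only
    return total, supported, supported / total


# Region counts as inferred from slide 23 (an assumption, see above).
total, supported, fraction = junction_support(
    only=21_128, with_mrna=2_412, with_est=34_694, with_both=46_132
)
print(total, supported, f"{fraction:.1%}")  # 104366 83238 79.8%
```

Under that reading, 83,238 of the 104,366 assembled junctions have independent support, and the remaining 21,128 are the roughly 21,000 candidates for novel junctions mentioned in the note.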