poster

Genomic Prediction & comparative analysis of Pathogenicity of the new “super bug”: Clostridium difficile

Debjit Ray*, Kelly Williams*, Hudson Corey*, Christopher Polage†, Joseph S. Schoeniger*

*Sandia National Laboratories, Livermore, CA;
†University of California Davis Medical Center, Sacramento, CA

Introduction

Experimental Design and Methods

Conclusions and future directions

We have demonstrated that it is possible to rapidly sequence and produce de novo genome assemblies for reagent costs of around $200 per genome.
• Assembly errors mainly occur at repeat regions, especially rRNA.
• The resulting genomes appear suitable for comparative phylogenetic analysis.
• Better bioinformatics tools could significantly improve the assemblies.
Preliminary data indicate that it is feasible to sequence, assemble and obtain nearly complete coverage of genomes from samples composed of mixed gDNA from disparate genera. This deliberate strategy of limited metagenomic assembly may allow library prep costs to be halved. In the near future we will test whether long-read data (e.g., Oxford Nanopore MinION) can improve our ability to scaffold over repeats and close genomes.
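One simple way to separate contigs assembled from pooled gDNA of two disparate genera is to bin them by GC content, since the Cdiff assemblies reported here average roughly 28 to 29% GC. The sketch below assumes the partner genus differs strongly in GC; the 0.40 threshold, contig names and sequences are hypothetical, and this is a generic binning heuristic rather than the method used in this work.

```python
"""Minimal sketch: bin contigs from a two-genus pooled assembly by GC content."""


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def bin_by_gc(contigs: dict, threshold: float = 0.40) -> dict:
    """Split contigs into low-GC and high-GC bins.

    The threshold is a hypothetical value; it must fall between the GC
    contents of the two pooled genera for the split to be meaningful.
    """
    bins = {"low_gc": [], "high_gc": []}
    for name, seq in contigs.items():
        key = "low_gc" if gc_fraction(seq) < threshold else "high_gc"
        bins[key].append(name)
    return bins


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    toy_contigs = {
        "contig_1": "ATATTATAAGCAT" * 10,  # AT-rich, Cdiff-like
        "contig_2": "GCGGCCGATGCC" * 10,   # GC-rich partner genome
    }
    print(bin_by_gc(toy_contigs))  # {'low_gc': ['contig_1'], 'high_gc': ['contig_2']}
```

In practice a single GC cutoff only works when the genera are well separated in composition; overlapping distributions would call for read-depth or k-mer based binning as well.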
Results

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

debray@sandia.gov

Horizontal gene transfer (HGT) and recombination lead to the emergence of bacterial antibiotic resistance and pathogenic traits. Genetic changes range from acquisition of a large plasmid to insertion of a transposon into a regulatory gene. In-depth comparative phylogenomics can identify subtle structural changes or mutations in the genome or plasmids that are associated with phenotypic changes. Comparative phylogenomics requires accurately sequenced, complete and properly annotated genomes of the organism. Assembling closed genomes requires mate-pair reads or “long read” sequencing data in addition to short-read paired-end data. Our goal is to improve understanding of how pathogenesis emerges, using sequencing, comparative genomics, and machine learning analysis of ~1000 pathogen genomes.

Machine learning algorithms will be used to integrate the diverse features (changes in virulence genes, recombination, horizontal gene transfer, patient diagnostics). Temporal data and evolutionary models can then help determine whether a particular isolate is likely to have originated in the environment, and can be used to compare differences in virulence along or across the tree.
Culturing of Microorganisms and Sequencing Library Prep

Peptoclostridium difficile (Cdiff) hypervirulent strains (027 ribotype) were obtained from collections of clinical isolates at UC Davis Medical Center and grown on plates with permissive media at 37 °C for 72 hours under anaerobic conditions. Total genomic DNA (gDNA) was extracted using the QIAgen Blood & Tissue Total DNA Isolation kit.

Libraries were prepared for the Illumina NextSeq sequencer following Illumina protocols for kits using transposon-mediated fragmentation, as shown below. Sequencing was performed using a 300-cycle kit to create 150 bp paired-end reads.

Funding was provided by the Laboratory Directed Research and Development program at Sandia National Laboratories.

Paired-Ends (90 min / $19)
Sequencing of 10M Reads (2 days / $100)
Mate-Pairs (2 days / $80)
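The per-step costs in the workflow above add up to the ~$200 per genome quoted in the conclusions. The sketch assumes each genome receives one paired-end library, one mate-pair library and one 10M-read share of sequencing; the genomes_per_library parameter is a hypothetical illustration of how pooling two genomes into one library (as proposed for limited metagenomic assembly) would split only the library prep costs.

```python
"""Per-genome reagent cost from the workflow step costs (USD)."""

# Library prep cost per library, taken from the workflow figure.
LIBRARY_COSTS_USD = {"paired_end": 19.0, "mate_pair": 80.0}
# Sequencing cost for 10M reads, assumed to be allotted to each genome.
SEQUENCING_COST_USD = 100.0


def cost_per_genome(genomes_per_library: int = 1) -> float:
    """Library costs are shared by the genomes pooled in a library; sequencing is not."""
    if genomes_per_library < 1:
        raise ValueError("genomes_per_library must be at least 1")
    library_total = sum(LIBRARY_COSTS_USD.values())
    return library_total / genomes_per_library + SEQUENCING_COST_USD


if __name__ == "__main__":
    print(cost_per_genome())   # 199.0
    print(cost_per_genome(2))  # 149.5
```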

Sequencing and Sequence Assembly

Both mate-pair and paired-end libraries were prepared for seventeen Cdiff isolates (S1 through S17). In total, 17 mate-pair libraries and 17 paired-end libraries were barcoded and sequenced together in a single NextSeq run with a kit that produced ~150M reads.

Standard Illumina mate-pair kits support only up to 12 single-end barcodes per sequencer run, and these barcodes cannot easily be demultiplexed using standard software such as bcl2fastq (Illumina).
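When a standard tool cannot separate pooled libraries, reads can be assigned to samples by matching a barcode with a small mismatch tolerance. The sketch below is hypothetical: it assumes the barcode occupies the first bases of each read and uses made-up 6-nt barcodes; the actual position and length of barcodes in a given mate-pair kit may differ.

```python
"""Hypothetical sketch: demultiplex reads by an inline barcode (<= 1 mismatch)."""
from typing import Dict, Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, barcodes: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the read prefix.

    Returns None if no barcode is within max_mismatch or if the best match is tied.
    """
    hits = []
    for sample, barcode in barcodes.items():
        if len(read) < len(barcode):
            continue  # read too short to contain the barcode
        distance = hamming(read[: len(barcode)], barcode)
        if distance <= max_mismatch:
            hits.append((distance, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two barcodes are equally close
    return hits[0][1]


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by sample; unmatched or ambiguous reads go to 'undetermined'."""
    groups: Dict[str, List[str]] = {sample: [] for sample in barcodes}
    groups["undetermined"] = []
    for read in reads:
        sample = assign_sample(read, barcodes)
        groups[sample if sample is not None else "undetermined"].append(read)
    return groups


if __name__ == "__main__":
    toy_barcodes = {"S1": "ACGTAC", "S2": "TGCATG"}  # hypothetical barcodes
    toy_reads = ["ACGTACGGGG", "ACGTTCGGGG", "TGCATGAAAA", "GGGGGGAAAA"]
    for sample, reads in demultiplex(toy_reads, toy_barcodes).items():
        print(sample, reads)
```

Allowing one mismatch is only safe when every pair of barcodes differs at three or more positions, so that a single sequencing error cannot make one barcode look like another.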
SPAdes 3.6.0 can convert mixes of reads from different library preps into high-quality assemblies with only a few gaps, within a few hours.
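SPAdes accepts several libraries in one run through numbered options such as --pe1-1/--pe1-2 for paired-end reads and --mp1-1/--mp1-2 for mate pairs. The sketch below only builds such a command; the file names, thread count and output directory are hypothetical, and mate-pair orientation and other settings should be checked against the manual for the installed SPAdes version.

```python
"""Sketch: build a SPAdes command for one paired-end plus one mate-pair library."""
import shlex
from typing import List


def spades_command(pe_r1: str, pe_r2: str, mp_r1: str, mp_r2: str,
                   outdir: str, threads: int = 8) -> List[str]:
    """Return the spades.py argument list; file paths are supplied by the caller."""
    return [
        "spades.py",
        "--pe1-1", pe_r1, "--pe1-2", pe_r2,  # paired-end library 1
        "--mp1-1", mp_r1, "--mp1-2", mp_r2,  # mate-pair library 1
        "-t", str(threads),
        "-o", outdir,
    ]


if __name__ == "__main__":
    # Hypothetical file names for isolate S1.
    cmd = spades_command("S1_PE_R1.fastq.gz", "S1_PE_R2.fastq.gz",
                         "S1_MP_R1.fastq.gz", "S1_MP_R2.fastq.gz", "S1_spades")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the command is later passed to subprocess.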

Remaining breaks in scaffolds are generally due to repeats (e.g., rRNA genes), and we are using gap-closure techniques that avoid custom PCR or targeted sequencing.

Improvements toward completing the whole genome could be made by developing our own software tools for mate-pair guided bridging (Bridger).
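The core idea of mate-pair guided bridging can be illustrated by counting mate pairs whose two reads land on different contigs: well-supported pairs suggest which contigs are adjacent. This is a hypothetical sketch of that general scaffolding technique, not the Bridger tool; the input format (one pair of contig names per mapped mate pair) and the min_support cutoff are assumptions.

```python
"""Hypothetical sketch of mate-pair guided contig linking (not the Bridger tool)."""
from collections import Counter
from typing import Dict, Iterable, Tuple


def contig_links(mate_hits: Iterable[Tuple[str, str]],
                 min_support: int = 3) -> Dict[Tuple[str, str], int]:
    """Count mate pairs whose two reads map to different contigs.

    mate_hits holds (contig of read 1, contig of read 2) for each mapped pair.
    A link is kept only when at least min_support pairs agree; orientation and
    gap size, which a real scaffolder would also check, are ignored here.
    """
    counts: Counter = Counter()
    for contig_a, contig_b in mate_hits:
        if contig_a != contig_b:
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return {link: n for link, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    toy_hits = [("c1", "c2")] * 4 + [("c2", "c1")] + [("c1", "c3")] * 2 + [("c3", "c3")] * 5
    print(contig_links(toy_hits))  # {('c1', 'c2'): 5}
```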
Sample    Paired-end reads  Mate-pair reads  SPAdes scaffolds  Final contigs  Genome (bp)  Mean GC%
Cdiff 1   7,696,793         5,178,578        17                2              3,957,333    28.54
Cdiff 2   8,049,303         2,566,745        19                5              4,182,280    28.71
Cdiff 3   9,598,027         4,713,959        13                3              4,154,044    28.65
Cdiff 4   8,884,058         3,555,923        20                2              4,145,236    28.61
Cdiff 5   7,305,180         4,604,059        20                3              4,169,542    28.69
Cdiff 6   7,265,736         4,959,974        23                3              4,120,797    28.51
Cdiff 7   7,160,304         3,344,022        18                4              4,201,537    28.75
Cdiff 8   6,988,513         6,429,131        13                4              4,169,879    28.33
Cdiff 9   6,431,108         6,493,984        11                5              4,178,334    28.14
Cdiff 10  8,757,850         9,326,335        17                3              4,227,574    28.66
Cdiff 11  6,820,879         6,598,639        21                3              4,175,884    28.88
Cdiff 12  5,660,381         6,605,606        19                2              4,175,038    28.21
Cdiff 13  6,656,614         6,314,774        33                3              4,271,639    28.28
Cdiff 14  5,847,659         9,675,039        13                3              4,151,289    28.50
Cdiff 16  6,495,214         6,436,182        12                3              4,172,824    28.11
Cdiff 17  4,973,061         6,786,947        11                2              4,171,486    28.25
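Sequencing depth can be estimated from the table as reads × read length ÷ genome size. The sketch assumes each count above is a number of individual 150 bp reads (not read pairs) and that no trimming shortened them; if the counts are read pairs, the depths double.

```python
"""Approximate sequencing depth for one isolate from the assembly table."""

READ_LENGTH_BP = 150  # from the 300-cycle, 2 x 150 bp run


def depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage = total sequenced bases / genome length."""
    return n_reads * read_length / genome_size


if __name__ == "__main__":
    # Cdiff 1 values from the table above.
    genome_bp = 3_957_333
    pe_depth = depth(7_696_793, READ_LENGTH_BP, genome_bp)
    mp_depth = depth(5_178_578, READ_LENGTH_BP, genome_bp)
    print(f"paired-end ~{pe_depth:.0f}x, mate-pair ~{mp_depth:.0f}x")  # paired-end ~292x, mate-pair ~196x
```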

Genome  Size  Function
Cd2     170   hypothetical protein
        1919  Tetracycline resistance protein TetM
Cd16    1466  Prophage LambdaBa04, site-specific recombinase, phage integrase
        200   hypothetical protein
Cd17    2147  Excisionase from transposon Tn916
        395   Transposase from transposon Tn916
        221   Conjugative transposon protein TcpC

Increase Mate Pair Size to Span rDNA Repeats Reliably
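For a mate pair to bridge a repeat, both of its reads must fall in unique sequence on either side, so the insert must be at least the repeat length plus two read lengths. The sketch uses an rRNA operon of about 5 kb as an assumed round figure and 150 bp reads; the optional anchor_bp margin for unambiguous placement is a hypothetical parameter.

```python
"""Minimum mate-pair insert needed to span a repeat with both reads in unique sequence."""


def min_insert_to_span(repeat_bp: int, read_bp: int, anchor_bp: int = 0) -> int:
    """Smallest insert (outer distance between read ends) that spans the repeat.

    Each read must lie entirely outside the repeat, plus an optional extra
    margin of unique sequence (anchor_bp) on each side.
    """
    return repeat_bp + 2 * (read_bp + anchor_bp)


def spans_repeat(insert_bp: int, repeat_bp: int, read_bp: int, anchor_bp: int = 0) -> bool:
    """True if a mate pair with this insert size can bridge the repeat."""
    return insert_bp >= min_insert_to_span(repeat_bp, read_bp, anchor_bp)


if __name__ == "__main__":
    rrna_operon_bp = 5_000  # assumed approximate length of one rRNA operon
    print(min_insert_to_span(rrna_operon_bp, 150))        # 5300
    print(spans_repeat(3_000, rrna_operon_bp, 150))       # False
    print(spans_repeat(8_000, rrna_operon_bp, 150, 500))  # True
```

Because a mate-pair library has a spread of insert sizes rather than a single value, its typical insert needs to sit comfortably above this minimum for most pairs to span the repeat.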

Comparative Analysis of Genomes

Until recently, sequencing, assembling and annotating a bacterial genome was a major effort, generally undertaken to establish phylogeny and a basic inventory of genes and metabolic pathways. However, a large number of well-annotated reference genomes now exist for most pathogens, and there are good tools for standard annotation. It is now feasible to sequence and assemble large numbers of closely related strains in order to understand changes to the genome that occur over short time scales.

We are constructing pipelines for assembly, annotation and comparative analysis of genomes that primarily focus on identifying mobile elements, genes and genome features closely associated with virulence and antibiotic resistance.
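One simple way to flag mobile elements and resistance genes in such a pipeline is to match keywords in annotated product names. The keyword lists below are illustrative assumptions, exercised on product names listed in the Genome/Size/Function table above; a production pipeline would more likely rely on curated databases or HMM profiles.

```python
"""Hypothetical sketch: flag annotated genes by keywords in their product names."""
from typing import Dict, List

# Illustrative keyword lists (assumptions), matched case-insensitively.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "transposon": ["transposase", "transposon", "excisionase"],
    "phage": ["prophage", "phage", "integrase", "recombinase"],
    "antibiotic_resistance": ["resistance", "tetm"],
}


def classify_product(product: str) -> List[str]:
    """Return every category whose keywords occur in the product name."""
    text = product.lower()
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


if __name__ == "__main__":
    products = [
        "Tetracycline resistance protein TetM",
        "Prophage LambdaBa04, site-specific recombinase, phage integrase",
        "Excisionase from transposon Tn916",
        "Conjugative transposon protein TcpC",
        "hypothetical protein",
    ]
    for product in products:
        print(classify_product(product), product)
```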

Island    Genome     % Identity  Length  tRNA
Island_1  Cd1-Cd16   100         18,965  Cas, Phage_integrase, SmpB
Island_2  Cd2, Cd17  89          82,810  Phage_integrase
Island_3  Cd7, Cd10  98          21,817  Phage_integrase

Phylogenetic Tree (figure): Cdiff isolates Cd1 to Cd14, Cd16 and Cd17 shown with C. difficile genomes CD196, R20291, BI1, 2007855, CIP_107932, QCD_32g58, QCD_37x79, QCD_66c26, QCD_76w55 and QCD_97b34; node support values are shown at the branches (up to 100); scale 5.0E-6.

Feature annotation and machine learning

Tools such as Mugsy (mugsy.sourceforge.net/) enable multiple whole-genome alignment to form a Pan-Genome.

Features that are unique to subsets of the genomes can be identified, and genome annotation can be collected for these regions.

A preliminary application of this strategy to the Cdiff clinical isolates reveals several putative recent horizontal gene transfer events that may be associated with changes in antibiotic resistance or virulence.
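Once a whole-genome alignment has been reduced to a table of which genomes contain each aligned block, core, accessory and genome-specific blocks can be separated as below. Parsing Mugsy's output into this presence table is not shown; the block IDs and genome sets in the example are hypothetical.

```python
"""Hypothetical sketch: classify pan-genome blocks by which genomes contain them."""
from typing import Dict, List, Set, Tuple


def classify_blocks(presence: Dict[str, Set[str]],
                    all_genomes: Set[str]) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Split blocks into core (in every genome), accessory (in two or more
    but not all genomes) and unique (in exactly one genome, mapped to it)."""
    core: List[str] = []
    accessory: List[str] = []
    unique: Dict[str, str] = {}
    for block, genomes in sorted(presence.items()):
        if not genomes:
            continue  # block not observed in any genome
        if genomes >= all_genomes:
            core.append(block)
        elif len(genomes) == 1:
            unique[block] = next(iter(genomes))
        else:
            accessory.append(block)
    return core, accessory, unique


if __name__ == "__main__":
    genomes = {"Cd1", "Cd2", "Cd16", "Cd17"}
    presence = {
        "block_1": {"Cd1", "Cd2", "Cd16", "Cd17"},
        "block_2": {"Cd2", "Cd17"},
        "block_3": {"Cd16"},
    }
    print(classify_blocks(presence, genomes))
```

Accessory and unique blocks are the candidates whose annotation would then be inspected for transposons, phage genes or resistance determinants.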

Other tools such as Islander (bioinformatics.sandia.gov) enable discovery of new genomic islands. (Phage integration may lead to acquisition of new virulence genes.)
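Islander's own algorithm is not reproduced here. As a simpler, generic illustration of island screening, the sketch below flags windows whose GC content departs from the genome average, a common sign of horizontally acquired DNA; the window size, step and deviation cutoff are hypothetical defaults.

```python
"""Generic compositional screen: flag windows with atypical GC content.
This is not the Islander algorithm; parameters are illustrative."""
from typing import Iterator, List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, float]]:
    """Yield (start, GC fraction) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def atypical_windows(seq: str, window: int = 5000, step: int = 1000,
                     max_deviation: float = 0.05) -> List[int]:
    """Start positions of windows whose GC differs from the genome mean by more than max_deviation."""
    genome_gc = gc_fraction(seq)
    return [start for start, gc in gc_windows(seq, window, step)
            if abs(gc - genome_gc) > max_deviation]


if __name__ == "__main__":
    # Toy genome: AT-rich background with a 200 bp GC-rich insertion at position 1000.
    toy = "AATT" * 250 + "GGCC" * 50 + "AATT" * 250
    print(atypical_windows(toy, window=200, step=200, max_deviation=0.2))  # [1000]
```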

Create Pan-Genome → Conserved and unique blocks → Unique genomic features → Unique HGT / Transposons / Ab resistance

The unique genomic features across the different clinical samples, together with the corresponding patient phenotypic features (age, sex, onset time, etc.), would be used to develop a machine learning algorithm that can predict patient outcomes, the chance of recurrence, and gradual changes in antibiotic resistance. The resulting software tool would be suitable for routine clinical detection of pathogenicity and for guiding drug administration.
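As an illustration of combining genomic and patient features in one model, the sketch below trains a plain logistic regression by batch gradient descent. The feature names, toy records and recurrence labels are hypothetical and not data from this study, and the poster does not specify which algorithm the project will use.

```python
"""Hypothetical sketch: logistic regression on combined genomic + patient features."""
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability of the positive class (e.g. recurrence)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical features: [carries tetM, carries Island_2, age / 100, sex (0/1)]
    X = [[1, 0, 0.7, 1], [1, 1, 0.8, 0], [0, 0, 0.3, 1],
         [0, 1, 0.4, 0], [1, 0, 0.5, 0], [0, 0, 0.6, 1]]
    y = [1, 1, 0, 0, 1, 0]  # hypothetical recurrence labels
    w, b = train_logistic(X, y)
    for xi, yi in zip(X, y):
        print(yi, round(predict_proba(w, b, xi), 2))
```

A linear model keeps the learned weights interpretable, which is useful when asking which genomic feature drives a prediction; with realistic data, held-out evaluation would be needed rather than scoring the training records.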

Assembled Genomes → Annotation: “RAST” or “PROKKA”
• Gene Finder: “Prodigal”
• RNA Genes: “rfind”
• Islands: “Islander”
• Gene Families: “HMMER”
• Virulence DB
• Abx Res DB
• Transposases
• Integrases
• CAS/CRISPR
• Custom (Cdiff)
• Integrons: “Integral”
• Whole Genome Alignment: “Mugsy”
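The first steps of such an annotation pipeline can be scripted per assembled genome. The sketch below only builds command lines for Prodigal (gene finding) and Prokka (whole-genome annotation) using their basic options; the sample name, file paths and the choice of these two tools from those listed are assumptions, and options should be checked against each tool's documentation.

```python
"""Sketch: build per-genome annotation commands for Prodigal and Prokka."""
import shlex
from pathlib import Path
from typing import Dict, List


def annotation_commands(contigs: str, sample: str, outdir: str) -> Dict[str, List[str]]:
    """Return command argument lists keyed by tool name (nothing is executed)."""
    out = Path(outdir)
    return {
        # Prodigal: -i input FASTA, -a protein translations, -o gene predictions.
        "prodigal": ["prodigal", "-i", contigs,
                     "-a", str(out / f"{sample}.faa"),
                     "-o", str(out / f"{sample}.gbk")],
        # Prokka: bacterial annotation written to its own output directory.
        "prokka": ["prokka", "--kingdom", "Bacteria",
                   "--outdir", str(out / "prokka"),
                   "--prefix", sample, contigs],
    }


if __name__ == "__main__":
    cmds = annotation_commands("S1_final_contigs.fasta", "S1", "annotation")
    for tool, cmd in cmds.items():
        print(tool, ":", shlex.join(cmd))
```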
