Genomic Prediction & comparative analysis of Pathogenicity of the new “super bug”: Clostridium difficile

Debjit Ray*, Kelly Williams*, Hudson Corey*, Christopher Polage†, Joseph S. Schoeniger*

*Sandia National Laboratories, Livermore, CA; †University of California Davis Medical Center, Sacramento, CA

Introduction
Experimental Design and Methods
Conclusions and future directions

We have demonstrated that it is possible to rapidly sequence and produce de novo genome assemblies for reagent costs
of around $200 per genome.
• Assembly errors mainly occur at repeat regions, especially rRNA genes.
• The resulting genomes appear suitable for comparative phylogenetic analysis.
• Improved bioinformatics tools may be able to significantly improve assemblies.
Preliminary data indicate that it is feasible to sequence, assemble, and obtain nearly complete coverage of genomes
from samples composed of mixed gDNA from disparate genera. This intentional strategy of limited metagenomic assembly
may enable library prep costs to be halved. In the near future we will test whether long-read data (e.g., Oxford Nanopore
MinION) can improve our ability to scaffold over repeats and close genomes.
Results	
  
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

debray@sandia.gov

Horizontal gene transfer (HGT) and recombination lead to the emergence of
bacterial antibiotic resistance and pathogenic traits. Genetic changes range from
acquisition of a large plasmid to insertion of a transposon into a regulatory gene.
In-depth comparative phylogenomics can identify subtle genome or plasmid
structural changes or mutations associated with phenotypic changes. Comparative
phylogenomics requires accurately sequenced, complete, and properly
annotated genomes of the organism. Assembling closed genomes requires
additional mate-pair reads or “long read” sequencing data to accompany
short-read paired-end data. Our goal is to improve the understanding of the emergence of
pathogenesis using sequencing, comparative genomics, and machine learning
analysis of ~1000 pathogen genomes.

Machine learning algorithms will be used to digest the diverse features
(changes in virulence genes, recombination, horizontal gene transfer,
patient diagnostics). Temporal data and evolutionary models can thus
indicate whether a particular isolate is likely to have originated in
the environment. They can also be useful for comparing differences in
virulence along or across the tree.
Culturing of Microorganisms and Sequencing Library Prep

Peptoclostridium difficile (Cdiff) hypervirulent strains (027 ribotype) were obtained from collections of clinical isolates at UC Davis Medical Center and grown on plates with permissive media at 37 °C for 72 hours under anaerobic conditions. Total genomic DNA (gDNA) was extracted using the QIAgen Blood & Tissue Total DNA Isolation kit.

Libraries were prepared for the Illumina NextSeq sequencer following Illumina protocols for kits using transposon-mediated fragmentation, as shown below. Sequencing was performed using a 300-cycle kit to create 150 bp paired-end reads.

Funding was provided by the Laboratory Directed Research and Development program at Sandia National Laboratories 	
  
Paired-Ends (90 min / $19)
Sequencing of 10M Reads (2 day / $100)
Mate-Pairs (2 day / $80)
Sequencing and Sequence Assembly

Both mate-pair and paired-end libraries were prepared for seventeen Cdiff isolates (S1 through S17). In total, 17 mate-pair libraries and 17 paired-end libraries were barcoded and sequenced together in a single NextSeq run with a kit that produced ~150M reads. Standard Illumina mate-pair kits support only up to 12 single-end barcodes per sequencer run, but these cannot be easily demultiplexed using standard software such as bcl2fastq (Illumina). SPAdes 3.6.0 is capable, in a few hours, of converting mixes of reads from different library preps into high-quality assemblies with only a few gaps. Remaining breaks in scaffolds are generally due to repeats (e.g., rRNA genes), and we are using gap closure techniques that avoid custom PCR or targeted sequencing. Improvements could be made toward completing the whole genome by developing our own software tools for mate-pair-guided bridging (Bridger).
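
A combined paired-end plus mate-pair assembly of this kind could be launched roughly as follows. This is a minimal sketch, not the authors' actual command: the file-naming convention and directory names are hypothetical, and the flags (`--careful`, `--pe1-1`, `--pe1-2`, `--mp1-1`, `--mp1-2`, `-o`) are the ones documented for SPAdes 3.x, so they should be checked against the installed version. The function only builds the command line and does not run it.

```python
import shlex


def spades_command(sample, reads_dir="reads", out_root="assemblies"):
    """Build (but do not run) a spades.py command line for one isolate.

    File naming is a hypothetical convention: <sample>_PE_R1.fastq.gz, etc.
    """
    return [
        "spades.py",
        "--careful",
        # Paired-end library 1: forward and reverse reads.
        "--pe1-1", f"{reads_dir}/{sample}_PE_R1.fastq.gz",
        "--pe1-2", f"{reads_dir}/{sample}_PE_R2.fastq.gz",
        # Mate-pair library 1: forward and reverse reads.
        "--mp1-1", f"{reads_dir}/{sample}_MP_R1.fastq.gz",
        "--mp1-2", f"{reads_dir}/{sample}_MP_R2.fastq.gz",
        "-o", f"{out_root}/{sample}",
    ]


cmd = spades_command("S1")
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` once the paths have been verified.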
Sample     Paired-end reads   Mate-pair reads   SPAdes scaffolds   Final contigs   Genome size (bp)   Mean GC%
Cdiff 1    7,696,793          5,178,578         17                 2               3957333            28.54
Cdiff 2    8,049,303          2,566,745         19                 5               4182280            28.71
Cdiff 3    9,598,027          4,713,959         13                 3               4154044            28.65
Cdiff 4    8,884,058          3,555,923         20                 2               4145236            28.61
Cdiff 5    7,305,180          4,604,059         20                 3               4169542            28.69
Cdiff 6    7,265,736          4,959,974         23                 3               4120797            28.51
Cdiff 7    7,160,304          3,344,022         18                 4               4201537            28.75
Cdiff 8    6,988,513          6,429,131         13                 4               4169879            28.33
Cdiff 9    6,431,108          6,493,984         11                 5               4178334            28.14
Cdiff 10   8,757,850          9,326,335         17                 3               4227574            28.66
Cdiff 11   6,820,879          6,598,639         21                 3               4175884            28.88
Cdiff 12   5,660,381          6,605,606         19                 2               4175038            28.21
Cdiff 13   6,656,614          6,314,774         33                 3               4271639            28.28
Cdiff 14   5,847,659          9,675,039         13                 3               4151289            28.50
Cdiff 16   6,495,214          6,436,182         12                 3               4172824            28.11
Cdiff 17   4,973,061          6,786,947         11                 2               4171486            28.25
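
The read counts and genome sizes in this table allow a rough estimate of sequencing depth. The sketch below assumes every counted read is 150 bp long (the read length given for the paired-end libraries) and ignores trimming and quality filtering, so the result is an upper bound. Whether the table counts single reads or read pairs is not stated; the function treats each count as single reads.

```python
def mean_depth(n_reads, read_len_bp, genome_size_bp):
    """Approximate mean coverage depth: total sequenced bases / genome length."""
    return n_reads * read_len_bp / genome_size_bp


# Values for sample "Cdiff 1", taken from the table.
paired_end_reads = 7_696_793
genome_size_bp = 3_957_333
READ_LEN_BP = 150  # assumed for every read; trimming would lower this

depth = mean_depth(paired_end_reads, READ_LEN_BP, genome_size_bp)
print(f"Approximate paired-end depth for Cdiff 1: {depth:.0f}x")
```

Even as a rough figure, a depth in the hundreds suggests that the remaining scaffold breaks are driven by repeats rather than by insufficient coverage, which matches the poster's own conclusion.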
  
Genome   Size   Function
Cd2      170    hypothetical protein
         1919   Tetracycline resistance protein TetM
Cd16     1466   Prophage LambdaBa04, site-specific recombinase, phage integrase
         200    hypothetical protein
Cd17     2147   Excisionase from transposon Tn916
         395    Transposase from transposon Tn916
         221    Conjugative transposon protein TcpC

Increase Mate Pair Size to Span rDNA Repeats Reliably
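
The reasoning behind this recommendation is that a mate pair can bridge a repeat only if its insert size exceeds the repeat length plus enough unique flanking sequence on each side to anchor one read. A minimal sketch of that check follows; the function name, the 150 bp anchor, and the example lengths are illustrative assumptions, not measurements from the poster.

```python
def can_bridge(insert_size_bp, repeat_len_bp, anchor_bp=150):
    """True if a mate pair with this insert size can place one read in
    unique sequence on each side of a repeat of the given length."""
    return insert_size_bp >= repeat_len_bp + 2 * anchor_bp


# Illustrative numbers only (not measured values from the poster).
repeat_len_bp = 5_000
for insert_size_bp in (3_000, 5_000, 8_000):
    verdict = "spans" if can_bridge(insert_size_bp, repeat_len_bp) else "does not span"
    print(f"{insert_size_bp} bp insert {verdict} a {repeat_len_bp} bp repeat")
```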
  
Comparative Analysis of Genomes

Until recently, sequencing, assembling, and annotating a bacterial genome was a major effort, generally undertaken in order to establish phylogeny and a basic inventory of genes and metabolic pathways. A large number of well-annotated reference genomes now exist, however, for most pathogens, and there are good tools for standard annotation. It is now feasible to sequence and assemble large numbers of closely related strains in order to understand changes to the genome that occur over short time scales.

We are constructing pipelines for assembly, annotation, and comparative analysis of genomes that primarily focus on the identification of mobile elements, genes, and genome features closely associated with virulence and antibiotic resistance.

Island     Genome       % tRNA identity   Island length   Annotation
Island_1   Cd1-Cd16     100               18,965          Cas, Phage_integrase, SmpB
Island_2   Cd2, Cd17    89                82,810          Phage_integrase
Island_3   Cd7, Cd10    98                21,817          Phage_integrase

Phylogenetic Tree
[Figure: phylogenetic tree; node support values are not recoverable from the extracted text. Scale bar: 5.0E-6. Tip labels: Cd2, Cd8, Cd11, CD196, Cd17, Cd7, Cd13, CIP_107932, 2007855, Cd9, Cd16, Cd6, R20291, Cd14, Cd5, QCD_76w55, QCD_97b34, QCD_32g58, Cd12, Cd4, Cd1, QCD_66c26, BI1, Cd10, Cd3, QCD_37x79]

Feature annotation and machine learning

Tools such as Mugsy (mugsy.sourceforge.net/) enable multiple whole-genome alignment to form a pan-genome. Features that are unique to subsets of the genomes can be identified and genome annotation collected for these regions. A preliminary exercise of this strategy on the clinical isolates of Cdiff reveals several putative recent horizontal gene transfer events that may be associated with changes in antibiotic resistance or virulence. Other tools such as Islander (bioinformatics.sandia.gov) enable discovery of new genomic islands. (Phage integration may lead to acquisition of new virulence genes.)
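
Identifying features that are unique to subsets of the genomes can be illustrated with plain set operations on a presence/absence map. This is a toy sketch, not the Mugsy-based pipeline: the gene assignments are hypothetical placeholders (only the genome labels and the TetM and Tn916 names echo the poster), and real input would come from whole-genome alignment blocks or annotation tables.

```python
from collections import defaultdict

# Hypothetical presence/absence data: genome -> set of annotated features.
features_by_genome = {
    "Cd1": {"core_gene_A", "core_gene_B"},
    "Cd2": {"core_gene_A", "core_gene_B", "TetM"},
    "Cd17": {"core_gene_A", "core_gene_B", "Tn916_excisionase"},
}


def partition_pan_genome(features_by_genome):
    """Split the pan-genome into core features and features found in a subset."""
    genomes_with = defaultdict(set)
    for genome, features in features_by_genome.items():
        for feature in features:
            genomes_with[feature].add(genome)
    n = len(features_by_genome)
    core = {f for f, g in genomes_with.items() if len(g) == n}
    accessory = {f: sorted(g) for f, g in genomes_with.items() if len(g) < n}
    return core, accessory


core, accessory = partition_pan_genome(features_by_genome)
print("Core:", sorted(core))
for feature, genomes in sorted(accessory.items()):
    print(f"{feature}: only in {', '.join(genomes)}")
```

The accessory features are the candidates one would then annotate and inspect for signs of recent horizontal transfer.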
  
Create Pan-Genome
Conserved and unique blocks
Unique genomic features
Unique HGT / Transposons
Ab resistance

The unique genomic features across the different clinical samples and their corresponding patient phenotypic features (age, sex, onset time, etc.) would be used to develop a machine learning algorithm that can predict patient outcomes, chances of recurrence, and gradual changes in antibiotic resistance. The software tool developed would be suitable for routine clinical pathogenicity detection and drug administration.
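
Before any learning algorithm can be applied, the genomic and patient features have to be combined into one numeric table. The sketch below shows one possible encoding, assuming presence/absence columns for genomic features and simple numeric columns for patient data. All records, identifiers, and column names are hypothetical, and the poster does not specify which model or encoding would be used.

```python
# Hypothetical records: genomic features found per isolate plus patient metadata.
isolates = [
    {"id": "isolate_A", "features": {"TetM"}, "age": 71, "sex": "F"},
    {"id": "isolate_B", "features": {"Tn916_transposase", "TetM"}, "age": 54, "sex": "M"},
    {"id": "isolate_C", "features": set(), "age": 63, "sex": "F"},
]


def build_feature_matrix(records):
    """Return (column_names, rows): one numeric row per isolate."""
    # One 0/1 column per genomic feature seen in any isolate.
    genomic_cols = sorted(set().union(*(r["features"] for r in records)))
    columns = genomic_cols + ["age", "sex_is_female"]
    rows = []
    for r in records:
        row = [1 if g in r["features"] else 0 for g in genomic_cols]
        row += [r["age"], 1 if r["sex"] == "F" else 0]
        rows.append(row)
    return columns, rows


columns, rows = build_feature_matrix(isolates)
print(columns)
for record, row in zip(isolates, rows):
    print(record["id"], row)
```

A matrix in this form can be handed to most standard classifiers, with the outcome of interest (for example, recurrence) supplied as a separate label vector.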
  	
  
Assembled Genomes
Annotation: “RAST” or “PROKKA”
Gene Finder: “Prodigal”
RNA Genes: “rfind”
Islands: “Islander”
Gene Families: “HMMR”
    Virulence DB
    Abx Res DB
    Transposases
    Integrases
    CAS/CRISPR
    Custom (Cdiff)
Integrons: “Integral”
Whole Genome Alignment: “Mugsy”
