1. Genome analysis of the novel cluster A2 phage Serenity, and an introduction to Single Molecule Real Time (SMRT) Sequencing
Presented by Mitchell Go
Washington State University
Pullman, WA
7. Serenity genome
• Sequenced at Virginia Commonwealth University
• 454 Pyrosequencing
• 52,088 bp long
• 62.6% GC content
• Cluster A2 mycobacteriophage
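The genome length and GC content on this slide are simple counts over the assembled sequence. Here is a minimal Python sketch of that arithmetic. It assumes the genome has already been read into a string. The helper name genome_stats is hypothetical, and excluding ambiguous bases such as N from the denominator is a choice made for this sketch.

```python
def genome_stats(seq: str) -> tuple[int, float]:
    """Hypothetical helper: return (length in bp, GC content in percent)."""
    seq = seq.upper()
    # Only unambiguous bases count toward the denominator, so N's do not skew the result
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return len(seq), (100.0 * gc / acgt if acgt else 0.0)

# Toy sequence, not Serenity's genome: 10 bases, 6 of them G or C
length, gc_pct = genome_stats("ATGCGCGCAT")
print(length, round(gc_pct, 1))  # 10 60.0
```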
8. Comparison to other A2 phages
• Jsquared was discovered in Corpus Christi, TX
• Trixie was discovered in Chester, VA
9. Comparison to other A2 phages
[Figure: genome comparison of Serenity, Jsquared and Trixie]
• More diversity is typically seen in the second half of the genome
• Serenity and Jsquared are very different from most A2 phages
10. Additional project
• Sequenced Sillygoose, Baxter, Russell, Bear06
and AtticusBane
• Using Single Molecule Real Time (SMRT) Sequencing
11. Introduction to SMRT sequencing
• SMRT sequencing observes DNA polymerase reactions in real time
• Records each nucleotide the polymerase incorporates as it copies the template
• Advantages over 454 Sequencing:
• Longer read lengths
• Lower cost
• Phospholinked labels
• Zero-mode waveguides (ZMWs)
• DNA methylation detection
12. SMRT vs. 454
Feature | SMRT | 454
Read length | 3,000 bp | 500 bp
Cost (rough estimate) | $100 | $500-$1,000
Fluorescent tags | Phospholinked | Base-linked
Reaction chamber | SMRT Cell / ZMW | DNA library beads / PicoTiterPlate device
Base modification detection | 5-methylcytosine, N6-methyladenine, N4-methylcytosine, DNA oxidative damage, etc. | 5-methylcytosine
13. Phospholinked labels
• Fluorophores are phospholinked rather than base-linked
• Each nucleotide has its own color
• The fluorescent tag is cleaved off by the polymerase
• Less background light
[Figure: base-linked labels (454) vs. phospholinked labels (SMRT)]
15. Zero-mode waveguide (ZMW)
• The hole's diameter is smaller than the wavelength of the light
• This creates a very small illumination volume
• Only nucleotides that diffuse to the bottom of the ZMW are illuminated
• Fluorescent tags cannot emit vertically
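Why a hole narrower than the light's wavelength gives such a small illumination volume can be estimated with textbook waveguide physics. The sketch below treats the ZMW as an ideal circular waveguide with perfectly conducting walls, which real aluminum ZMWs only approximate. The 70 nm diameter, the ~30 nm illuminated depth, and the 100 nM labeled-nucleotide concentration are hypothetical round numbers, not instrument specifications.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole

def te11_cutoff_nm(diameter_nm: float) -> float:
    """Cutoff wavelength of the lowest (TE11) mode of an ideal circular metal waveguide.

    Light with a longer wavelength than this cannot propagate up the hole;
    it decays evanescently close to the bottom instead.
    """
    # 1.8412 is the first zero of the derivative of the Bessel function J1
    return math.pi * diameter_nm / 1.8412

def molecules_in_volume(conc_molar: float, diameter_nm: float, depth_nm: float) -> float:
    """Average number of labeled nucleotides in a cylindrical illuminated volume."""
    radius_m = diameter_nm * 1e-9 / 2
    volume_liters = math.pi * radius_m**2 * (depth_nm * 1e-9) * 1000  # m^3 -> L
    return conc_molar * volume_liters * AVOGADRO

# Hypothetical values: 70 nm hole, ~30 nm illuminated depth, 100 nM labeled nucleotide
print(round(te11_cutoff_nm(70)))             # 119 (nm), far below visible wavelengths
print(molecules_in_volume(100e-9, 70, 30))   # about 0.007 molecules on average
```

Under these assumptions the cutoff is far shorter than visible light, so the light does not travel up the hole, and on average far fewer than one labeled nucleotide sits in the illuminated volume at any moment. A nucleotide held at the bottom by the polymerase therefore stands out against very little background.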
17. Real-time detection
• A light pulse is produced at the bottom of each ZMW
• The attached fluorophore is released and colored light is emitted
• Video cameras record the light pulses and send them on for analysis
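The step from recorded pulses to bases can be sketched as follows. This is a toy illustration, not Pacific Biosciences' primary analysis. The color-to-base assignment, the Pulse record, and the minimum pulse width are all hypothetical. The sketch orders pulses in time, maps each one to a base by its color, and discards very brief pulses. It treats those brief pulses as nucleotides that diffused through the illuminated volume without being incorporated.

```python
from dataclasses import dataclass

# Hypothetical dye-to-base assignment; the real dye set depends on the chemistry
COLOR_TO_BASE = {"green": "A", "red": "C", "yellow": "G", "blue": "T"}

@dataclass
class Pulse:
    color: str        # detected color channel
    start_frame: int  # video frame where the pulse begins
    width: int        # pulse duration in frames

def call_bases(pulses: list[Pulse], min_width: int = 3) -> str:
    """Turn the light pulses recorded from one ZMW into a base sequence.

    Pulses shorter than min_width frames (a hypothetical cutoff) are skipped
    as likely non-incorporating visits.
    """
    ordered = sorted(pulses, key=lambda p: p.start_frame)
    return "".join(COLOR_TO_BASE[p.color] for p in ordered if p.width >= min_width)

pulses = [
    Pulse("green", 10, 5),
    Pulse("blue", 30, 1),    # too brief: likely a nucleotide just diffusing past
    Pulse("yellow", 42, 6),
    Pulse("red", 70, 4),
]
print(call_bases(pulses))  # AGC
```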
22. Bear06 SMRT results
• Cluster A3 mycobacteriophage
52,412 bp, 64% GC content
• Annotation
89 putative genes, 3 tRNA genes
Function predicted for 24/89 genes
• Possible guanine methylation
At GGCNA motifs (N = any nucleotide)
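Finding candidate GGCNA sites is a pattern search on both strands. Here is a minimal sketch that assumes the genome is a plain string. The function name find_ggcna is hypothetical. Sites on the reverse strand are found by searching the given strand for the reverse complement, TNGCC. The sketch does not address which guanine within the motif might be methylated.

```python
import re

# IUPAC N = any nucleotide; lookaheads let overlapping sites all be reported
FWD = re.compile(r"(?=GGC[ACGT]A)")  # GGCNA on the given strand
REV = re.compile(r"(?=T[ACGT]GCC)")  # TNGCC = GGCNA on the opposite strand

def find_ggcna(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start on the given strand, strand) for every GGCNA site."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in REV.finditer(seq)]
    return sorted(hits)

print(find_ggcna("AAGGCTATTCGCCAA"))  # [(2, '+'), (8, '-')]
```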
23. Conclusions
• Serenity’s genome is 52,088 bp long and has a GC content of 62.6%
• Of its 95 genes, only about 21% have a potential function
24. Conclusions/ future plans
• SMRT sequencing was successfully used to sequence mycobacteriophages
• Compared with 454, SMRT delivers:
• Longer read lengths, lower cost, and detection of a wider range of nucleotide modifications
• Investigate possible guanine methylation in Bear06, other mycobacteriophages and Mycobacterium smegmatis
25. Acknowledgements
• Howard Hughes Medical Institute SEA-PHAGES
• School of Biological Sciences and School of
Molecular Biosciences at Washington State
University
• Dr. Patrick Carter, Dr. William Davis, Dr. McKenna
Kyriss, Stacy Hathcox, Steven Micheletti, and Dr.
Julie Stanton
• Nick Sisneros & Pacific BioSciences®
27. In Situ: TEM results
[Figure: TEM images of Migo94, Sillygoose, Bear06, AtticusBane, Baxter and Russell]
29. How SMRT sequencing works
[Figure: SMRT instrument schematic; labels: light source, dichroic, objective lens, SMRT Cell (multiplexed ZMWs), color separation, primary analysis, base calling]
Any freshman enrolling in the one-year introductory biology sequence was invited to participate, regardless of GPA or SAT score.
I think the plaques were actually turbid.
Point out nick
More diversity is typically seen in the second half of the genome
Especially true here
Serenity is very different from most A2 phages
Are they A2 phages?
Point out SMRT machine
Holding SMRT cell
Single genome on a single SMRT cell
Costs less
Quicker
Novel application to SEA-PHAGES
Look at methylation
Compare and contrast with 454
Costs are rough estimates; they change over time and depend on which sequencing center you use
Show the traditional linkage too.
Base-linked labels are used in 454
Phospholinked fluorophores are used in SMRT
Base-linked labels alter the polymerase
75,000 wells
Add more
Explain pictures
Just explain picture
Point out different color of each base
Explain picture
Example of 6mA (N6-methyladenine)
Coverage before filtering was 2834x
Filtering: removed reads shorter than 100 bp and reads with a quality score below 88%
We can filter this aggressively and still have 540x coverage because SMRT sequencing produces so many reads
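The filtering and coverage arithmetic above can be sketched in Python. Mean coverage is the total number of sequenced bases divided by the genome length. The Read record and its accuracy field are hypothetical stand-ins for whatever per-read quality score the SMRT analysis software reports. The toy numbers below are not real data.

```python
from typing import NamedTuple

class Read(NamedTuple):
    length: int      # read length in bp
    accuracy: float  # hypothetical per-read quality score between 0 and 1

def filter_reads(reads, min_length=100, min_accuracy=0.88):
    """Drop reads shorter than min_length bp or with quality below min_accuracy."""
    return [r for r in reads if r.length >= min_length and r.accuracy >= min_accuracy]

def mean_coverage(reads, genome_length):
    """Average depth: total sequenced bases divided by genome length."""
    return sum(r.length for r in reads) / genome_length

# Toy example: a 1,000 bp "genome" and four reads
reads = [Read(50, 0.95), Read(400, 0.90), Read(600, 0.85), Read(800, 0.92)]
kept = filter_reads(reads)
print(mean_coverage(reads, 1000))  # 1.85
print(mean_coverage(kept, 1000))   # 1.2
```

With SMRT data the starting depth is so high (2834x here) that even a strict filter leaves far more coverage than assembly needs.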
454: 8-10x coverage, read lengths of 100-200 bp; quality drops off beyond a certain read length
SMRT advantage: more sequence coverage than is needed
Table?
Data to get near-perfect results?
Histogram of reads after filtering: x-axis, read length (bp); y-axis, number of reads
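A read-length histogram like the one described above counts reads per length bin. Here is a minimal sketch. The 500 bp bin width and the text-bar display are arbitrary choices made for illustration.

```python
from collections import Counter

def length_histogram(lengths, bin_size=500):
    """Count reads per length bin; each key is the lower edge of a bin in bp."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))

def print_histogram(hist):
    """Crude text plot: one row per read-length bin, bar length = number of reads."""
    for low, count in hist.items():
        print(f"{low:>6} bp | {'#' * count}")

hist = length_histogram([120, 480, 510, 990, 1500, 3200])
print(hist)  # {0: 2, 500: 2, 1500: 1, 3000: 1}
print_histogram(hist)
```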
The filter could have been more stringent
Checking the assembly
The tape measure gene is the longest gene in the genome and can be confirmed with BLASTP. This helps indicate whether the start site is correct.
Be careful with any tool you use
Here is one of them figured out
Learning curve
Start site sequences are highly conserved in cluster A3 phages. This helped us identify the correct start and end sites for Bear06.
The 3' overhang helps identify the start site
Possible guanine methylation? The data point to it, but there is no explanation yet; more work is needed
Kinetic changes at certain guanines in the DNA sequence
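SMRT sequencing detects modifications through polymerase kinetics. A modified base tends to lengthen the interpulse duration (IPD), which is the time between successive pulses. One common summary is the IPD ratio: the mean IPD at a position in the native DNA divided by the mean IPD expected for unmodified DNA. The sketch below computes that ratio from toy per-position IPD lists and flags positions above a threshold. The threshold of 2.0 is a hypothetical cutoff. Real analysis tools also apply statistical tests and coverage requirements.

```python
from statistics import mean

def ipd_ratios(native: dict[int, list[float]], control: dict[int, list[float]]) -> dict[int, float]:
    """Mean native IPD divided by mean control (unmodified) IPD at each shared position."""
    return {pos: mean(native[pos]) / mean(control[pos])
            for pos in native.keys() & control.keys()}

def flag_positions(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio meets a hypothetical threshold (slower polymerase)."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)

# Toy IPDs (arbitrary units) at three genome positions
native = {101: [0.9, 1.1, 1.0], 102: [3.0, 2.6, 3.4], 103: [1.2, 0.8, 1.0]}
control = {101: [1.0, 1.0, 1.0], 102: [1.0, 1.1, 0.9], 103: [1.0, 1.0, 1.0]}
print(flag_positions(ipd_ratios(native, control)))  # [102]
```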
Can detect things that haven't been detected before
Interesting result because prior work shows two modifications are common in phages, 6mA and 5mC (also in humans)
Preliminary finding; more work is needed to understand what it means biologically
Could have been a host-derived factor
Look at M. smegmatis using SMRT sequencing
Different project
Shows we get meaningful useful data from smrt seq
N= any nucleotide
Second part:
Successfully applied SMRT sequencing to mycobacteriophage genomics
Data from Bear06
Interesting suggestion of guanine methylation
Generous donation from PacBio and Nick
Donated the reagents
Another way to identify genome start sites is by using a graphical user interface like Tablet.
The .BAM file from SMRT sequencing was loaded into Tablet (Milne et al. 2013), which displays the reads.
A buildup of reads that start at the exact same base indicates a potential start site.
(In this case the genome needed to be reverse complemented.)
Not reverse complemented
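Two steps from this slide can be sketched in Python: finding where read starts pile up, and reverse complementing a sequence. The sketch assumes the read start positions have already been extracted from the .BAM alignments; that step is not shown. The function names are hypothetical.

```python
from collections import Counter

# Maps each base to its complement; N (unknown base) stays N
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]

def pileup_peak(read_starts: list[int]) -> tuple[int, int]:
    """Return (position, read count) for the position where the most reads start."""
    position, count = Counter(read_starts).most_common(1)[0]
    return position, count

starts = [5, 17, 1200, 1200, 1200, 1200, 3400, 1200]
print(pileup_peak(starts))           # (1200, 5)
print(reverse_complement("ATGCCN"))  # NGGCAT
```

A sharp peak of identical start positions, like the one at 1200 in this toy list, is the kind of buildup Tablet makes visible. Random fragmentation would instead spread read starts evenly along the genome.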