Exome Sequencing

Genome & Exome Sequencing
Read Mapping
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520

Whole Genome Sequencing
• Usually need 30-50X coverage (~ 3 lanes of
100bp PE HiSeq2000 sequencing)
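As a rough check on the coverage figure above, coverage is total sequenced bases divided by genome size. The sketch below assumes a ~3 Gb human genome; the function name is illustrative only.

```python
def read_pairs_needed(coverage, genome_size_bp, read_length_bp, paired=True):
    """Read pairs (or single reads) needed so that total bases = coverage * genome size."""
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return coverage * genome_size_bp / bases_per_fragment


GENOME_SIZE = 3_000_000_000  # assumption: ~3 Gb human genome

pairs_30x = read_pairs_needed(30, GENOME_SIZE, 100)
print(f"{pairs_30x:,.0f} read pairs for 30X")          # 450,000,000
print(f"{pairs_30x / 3:,.0f} read pairs per lane over 3 lanes")
```

Under these assumptions, 30X with 100bp paired-end reads works out to about 150 million read pairs per lane across 3 lanes.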

Exome Sequencing
• Solution Hybrid
Selection: Probes in
solution can capture
all exons (exome) for
high throughput
sequencing
• 1-2% of whole
genome seq
• Easily multiplex 20
samples in one lane

Comparative Sequencing
• Somatic mutation
detection between
normal / cancer pairs
• WGS or WES
• More mutation yield
and better causal
gene identification
than Mendelian
disorders
Meyerson et al, Nat Rev Genet 2010

Hallmark of Mendelian Disease
Gene Discovery
Gilissen, Genome Biol 2011

Mutation Targets vs Disorder
Frequency
Rarer disorders are focused on fewer mutated genes

Whole Genome or Exome Seq?
• Enabling technologies: NGS machines, open-source
algorithms, capture reagents, lowering cost, big sample
collections
• Exomes more cost effective: sequence patient DNA
and filter common SNPs; compare parent-child trios;
compare paired normal/cancer samples
• Challenges:
– Still can’t interpret many Mendelian disorders
– Rare variants need large sample sizes
– Exome might miss regions (e.g. novel non-coding
genes)
– Still unsuccessful at using exome-seq to interpret
clinical data
Shendure, Genome Biol 2011

Read Mapping
• Mapping hundreds of millions of reads back
to the reference genome is CPU and RAM
intensive, and slow
• Read quality decreases with length (small
single nucleotide mismatches or indels)
• Very few mappers deal with indels, and most
allow ~2 mismatches within the first 30bp
(4^28 combinations can still uniquely identify
most 30bp sequences in a 3Gb genome)
• Mapping output: SAM (BAM) or BED
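The 4^28 argument above can be checked with a little arithmetic. This sketch assumes a uniformly random 3 Gb genome, which real, repeat-rich genomes are not, so the numbers are only a rough guide; the function name is hypothetical.

```python
def expected_chance_hits(k, genome_size_bp):
    """Expected number of chance matches of a random k-mer, counting both strands."""
    return 2 * genome_size_bp / 4 ** k


GENOME_SIZE = 3_000_000_000  # assumption: ~3 Gb genome

print(4 ** 28)                                # number of distinct 28-mers
print(expected_chance_hits(28, GENOME_SIZE))  # far below 1: 28 matched bases are almost always unique
print(expected_chance_hits(14, GENOME_SIZE))  # much shorter seeds match many places by chance
```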

Spaced seed
alignment
• Tags and tag-sized pieces of
reference are cut into small
“seeds.”
• Pairs of spaced seeds are
stored in an index.
• Look up spaced seeds for
each tag.
• For each “hit,” confirm the
remaining positions.
• Report results to the user.
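The spaced-seed steps above can be sketched as follows. This is a simplified illustration, not any particular mapper's implementation: each tag is cut into 4 seeds and every pair of seeds is indexed, so a tag with up to 2 mismatches still has at least one intact pair. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def seeds(s, n_seeds=4):
    """Cut a tag-sized string into n_seeds equal pieces."""
    step = len(s) // n_seeds
    return [s[i * step:(i + 1) * step] for i in range(n_seeds)]


def build_index(ref, tag_len, n_seeds=4):
    """Store every pair of seeds from every tag-sized window of the reference."""
    index = defaultdict(list)
    for pos in range(len(ref) - tag_len + 1):
        pieces = seeds(ref[pos:pos + tag_len], n_seeds)
        for a, b in combinations(range(n_seeds), 2):
            index[(a, b, pieces[a], pieces[b])].append(pos)
    return index


def map_tag(tag, ref, index, max_mismatches=2, n_seeds=4):
    """Look up each seed pair of the tag, then confirm the remaining positions."""
    pieces = seeds(tag, n_seeds)
    hits = set()
    for a, b in combinations(range(n_seeds), 2):
        for pos in index.get((a, b, pieces[a], pieces[b]), []):
            window = ref[pos:pos + len(tag)]
            mismatches = sum(x != y for x, y in zip(tag, window))
            if mismatches <= max_mismatches:
                hits.add(pos)
    return sorted(hits)


ref = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACAGGCT"
index = build_index(ref, tag_len=16)
print(map_tag("ATGACGAGTAGGCATC", ref, index))  # ref[8:24] with 2 substitutions
```

The index grows with the number of seed pairs per window, which is one reason this style of index is memory hungry.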

Burrows-Wheeler
• Store entire reference
genome.
• Align tag base by base from
the end.
• When tag is traversed, all
active locations are
reported.
• If no match is found, then
back up and try a
substitution.
Trapnell & Salzberg, Nat Biotech 2009

Burrows-Wheeler Transform
• Reversible permutation used originally in compression
• Once BWT(T) is built, all else shown here is discarded
– Matrix will be shown for illustration only
[Figure: T, its Burrows-Wheeler Matrix, and the last column of the matrix, BWT(T)]
Burrows M, Wheeler DJ: A block sorting lossless data compression algorithm. Digital Equipment
Corporation, Palo Alto, CA 1994, Technical Report 124; 1994
Slides from Ben Langmead
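A minimal sketch of the transform, building the full Burrows-Wheeler Matrix only for illustration (practical tools avoid materializing it). The "$" end marker, assumed to sort before every base, and the function name are choices made for this example.

```python
def bwt(text):
    """Return BWT(T): the last column of the sorted rotation matrix of T + '$'."""
    t = text + "$"  # '$' marks the end of T and sorts before a/c/g/t
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


print(bwt("acaacg"))  # gc$aaac
```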

• Property that makes BWT(T) reversible is “LF Mapping”
– The i-th occurrence of a character in the Last column is the
same text occurrence as the i-th occurrence in the First column
[Figure: T, BWT(T), and the Burrows-Wheeler Matrix, with "Rank: 2" marked twice]

• To recreate T from BWT(T), repeatedly apply rule:
T = BWT[ LF(i) ] + T; i = LF(i)
– Where LF(i) maps row i to row whose first character
corresponds to i’s last per LF Mapping
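The reconstruction rule can be sketched as below. This version starts at row 0 (the row beginning with "$") and prepends BWT[i] before stepping to LF(i), which is the same walk as the rule above with a slightly different starting convention. The "$" terminator and function names are assumptions of this example.

```python
def lf_table(bwt_str):
    """LF(i) for every row i, computed from BWT(T) alone."""
    first_row = {}  # first row of the matrix that starts with each character
    for row, ch in enumerate(sorted(bwt_str)):
        first_row.setdefault(ch, row)
    seen = {}       # how many times each character has occurred so far in BWT(T)
    lf = []
    for ch in bwt_str:
        lf.append(first_row[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    return lf


def inverse_bwt(bwt_str):
    """Recreate T from BWT(T) by repeatedly following the LF mapping."""
    lf = lf_table(bwt_str)
    i = 0      # row 0 starts with '$', so its last character is the last character of T
    text = ""
    while bwt_str[i] != "$":
        text = bwt_str[i] + text
        i = lf[i]
    return text


print(inverse_bwt("gc$aaac"))  # acaacg
```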
[Figure: reconstruction steps ending in the final T]

Exact Matching with FM Index
• To match Q in T using BWT(T), repeatedly apply rule:
top = LF(top, qc); bot = LF(bot, qc)
– Where qc is the next character in Q (right-to-left) and
LF(i, qc) maps row i to the row whose first character
corresponds to i’s last character as if it were qc

• In progressive rounds, top & bot delimit the range of
rows beginning with progressively longer suffixes of Q

• If range becomes empty (top = bot) the query suffix
(and therefore the query) does not occur in the text
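A minimal sketch of the exact-matching rule above. It uses the half-open row range [top, bot), so top = bot means an empty range. Ranks are counted by scanning BWT(T) directly, which is slow; a real FM index stores rank checkpoints instead. Function names are illustrative.

```python
def bwt(text):
    """BWT(T) via sorted rotations of T + '$' (for illustration only)."""
    t = text + "$"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def count_occurrences(query, text):
    """Number of times query occurs in text, found by backward search over BWT(T)."""
    b = bwt(text)
    first_row = {}  # first matrix row starting with each character
    for row, ch in enumerate(sorted(b)):
        first_row.setdefault(ch, row)

    def lf(i, qc):
        # Row reached from row i as if its last character were qc.
        return first_row[qc] + b[:i].count(qc)

    top, bot = 0, len(b)         # start with every row
    for qc in reversed(query):   # right-to-left through Q
        if qc not in first_row:
            return 0
        top, bot = lf(top, qc), lf(bot, qc)
        if top == bot:           # empty range: this suffix of Q is absent
            return 0
    return bot - top


print(count_occurrences("ac", "acaacg"))   # 2
print(count_occurrences("agc", "acaacg"))  # 0
```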

Backtracking
• Consider an attempt to find Q = “agc” in T = “acaacg”:
matching “c” and then “g” right-to-left fails because
“gc” does not occur in the text
• Instead of giving up, try to “backtrack” to a previous
position and try a different base (much slower)
• For 50bp reads, need to have ~25bp perfect match
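The failed search for “agc” can be rescued with a small backtracking sketch: a depth-first backward search that tries the query's own base first and spends a mismatch budget on substitutions when a range becomes empty. This is a simplified illustration under assumed names, not the search strategy of any specific aligner.

```python
def bwt(text):
    """BWT(T) via sorted rotations of T + '$' (for illustration only)."""
    t = text + "$"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def approx_matches(query, text, max_mismatches=1):
    """Map each matched variant of query (within the mismatch budget) to its occurrence count."""
    b = bwt(text)
    first_row = {}
    for row, ch in enumerate(sorted(b)):
        first_row.setdefault(ch, row)
    alphabet = [ch for ch in first_row if ch != "$"]

    def lf(i, qc):
        return first_row[qc] + b[:i].count(qc)

    found = {}

    def extend(pos, top, bot, budget, matched):
        if top >= bot:
            return                      # dead end: back up to the caller
        if pos < 0:
            found[matched] = bot - top  # whole query consumed
            return
        # Try the query's own base first, then each substitution.
        for ch in sorted(alphabet, key=lambda c: c != query[pos]):
            cost = 0 if ch == query[pos] else 1
            if cost <= budget:
                extend(pos - 1, lf(top, ch), lf(bot, ch), budget - cost, ch + matched)

    extend(len(query) - 1, 0, len(b), max_mismatches, "")
    return found


print(approx_matches("agc", "acaacg"))  # {'aac': 1}
```

Each allowed mismatch multiplies the number of branches explored, which is why backtracking is much slower than exact matching.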

Seq Files
• Raw FASTQ
– Sequence ID, sequence
– Quality ID, quality score
• Mapped SAM
– FLAG: 0 mapped forward strand, 4 unmapped,
16 mapped reverse strand
– XA (mapper-specific)
– MD: mismatch info
– NM: edit distance (mismatches plus indel bases)
• Mapped BED
– Chr, start, end, strand
FASTQ example:
@HWI-EAS305:1:1:1:991#0/1
GCTGGAGGTTCAGGCTGGCCGGATTTAAACGTAT
+HWI-EAS305:1:1:1:991#0/1
MVXUWVRKTWWULRQQMMWWBBBBBBBBBBBBBB
@HWI-EAS305:1:1:1:201#0/1
AAGACAAAGATGTGCTTTCTAAATCTGCACTAAT
+HWI-EAS305:1:1:1:201#0/1
PXX[[[[XTXYXTTWYYY[XXWWW[TMTVXWBBB
SAM example:
HWUSI-EAS366_0112:6:1:1298:18828#0/1 16 chr9 98116600 255 38M * 0 0 TACAATATGTCTTTATTTGAGATATGGATTTTAGGCCG Y]bc^dab[_UU`^`LbTUTccLbbYaY`cWLYW^ XA:i:1 MD:Z:3C30T3 NM:i:2
HWUSI-EAS366_0112:6:1:1257:18819#0/1 4 * 0 0 * * 0 0 AGACCACATGAAGCTCAAGAAGAAGGAAGACAAAAGTG ece^dddTcT^c`a`ccdKc^^__]Yb_cKS^_W XM:i:1
HWUSI-EAS366_0112:6:1:1315:19529#0/1 16 chr9 102610263 255 38M * 0 0 GCACTCAAGGGTACAGGAAAAGGGTCAGAAGTGTGGCC ^c_YcLcb`bbYdTadd`dda`cddYddd^cT` XA:i:0 MD:Z:38 NM:i:0
BED example:
chr1 123450 123500 +
chr5 28374615 28374615 -
http://samtools.sourceforge.net/SAM1.pdf
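To make the SAM fields above concrete, here is a minimal parsing sketch for one tab-separated SAM line. It decodes only the FLAG bits 4 and 16 and the optional tags, and counts mismatches from the MD tag. Function names are hypothetical, and the quality string is replaced by "*" for brevity.

```python
import re


def parse_sam_line(line):
    """Split one SAM alignment line into a few named fields."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    tags = {}
    for item in fields[11:]:                 # optional TAG:TYPE:VALUE fields
        name, _type, value = item.split(":", 2)
        tags[name] = value
    return {
        "qname": fields[0],
        "unmapped": bool(flag & 4),          # FLAG bit 4: read unmapped
        "reverse": bool(flag & 16),          # FLAG bit 16: mapped to reverse strand
        "rname": fields[2],
        "pos": int(fields[3]),
        "cigar": fields[5],
        "tags": tags,
    }


def md_mismatches(md):
    """Count mismatched bases in an MD string such as '3C30T3' (deleted bases after '^' are skipped)."""
    without_deletions = re.sub(r"\^[A-Z]+", "", md)
    return sum(ch.isalpha() for ch in without_deletions)


line = "\t".join([
    "HWUSI-EAS366_0112:6:1:1298:18828#0/1", "16", "chr9", "98116600", "255",
    "38M", "*", "0", "0", "TACAATATGTCTTTATTTGAGATATGGATTTTAGGCCG", "*",
    "XA:i:1", "MD:Z:3C30T3", "NM:i:2",
])
record = parse_sam_line(line)
print(record["reverse"], record["pos"], md_mismatches(record["tags"]["MD"]))
```

For this record the two mismatches counted from MD agree with NM:i:2, as expected for an alignment without indels.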

Data Analysis
• Heuristic filtering
to identify novel
genes for
Mendelian
disorders
Stitziel et al, Genome Biol 2011
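One way to picture heuristic filtering is as a chain of simple filters over called variants. The sketch below is an assumption-laden illustration, not the procedure from the cited paper: the record fields, effect labels, frequency cutoff, and gene names are all hypothetical.

```python
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}  # assumed effect labels


def filter_candidates(variants, affected, unaffected, max_pop_freq=0.01):
    """Keep rare, protein-altering variants carried by all affected and no unaffected samples."""
    kept = []
    for v in variants:
        carriers = set(v["carriers"])
        if v["pop_freq"] > max_pop_freq:
            continue                      # common SNP: unlikely to cause a rare disorder
        if v["effect"] not in DAMAGING:
            continue                      # e.g. synonymous change
        if not set(affected) <= carriers:
            continue                      # must be present in every affected sample
        if carriers & set(unaffected):
            continue                      # must be absent from unaffected samples
        kept.append(v)
    return kept


variants = [  # hypothetical calls
    {"gene": "GENE_A", "effect": "missense", "pop_freq": 0.0, "carriers": ["kid1", "kid2"]},
    {"gene": "GENE_B", "effect": "missense", "pop_freq": 0.20, "carriers": ["kid1", "kid2"]},
    {"gene": "GENE_C", "effect": "synonymous", "pop_freq": 0.0, "carriers": ["kid1", "kid2"]},
    {"gene": "GENE_D", "effect": "nonsense", "pop_freq": 0.0, "carriers": ["kid1", "mom"]},
]
candidates = filter_candidates(variants, affected=["kid1", "kid2"], unaffected=["mom", "dad"])
print(sorted({v["gene"] for v in candidates}))
```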

Genomic Structural Variation
Baker et al, Nat Meth 2012
Figure legend (beginning truncated): … altered genome found in a sample is shown at the bottom.
B) Inversion (INV) has reciprocal join in opposite orientations.
C) Intra-chromosome translocation (ITX) has unilateral join in opposite orientation.
D) Deletion (DEL) has two breakpoints joined in ascending order of genomic coordinates in the same orientation.
E) Insertion (INS) has two breakpoints joined in descending order of genomic coordinates in the same orientation.

Structural Variation Detection
• BreakDancer (Chen et al, Nat Meth 2009)
– Only look at anomalous read pairs
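What makes a read pair "anomalous" can be sketched with a simple classifier on mate orientation and distance. This is a hedged illustration only, not BreakDancer's algorithm: it assumes forward/reverse paired-end reads, mates already ordered by position, and uses the distance between mate start positions as a stand-in for insert size. Labels and names are hypothetical.

```python
def classify_pair(left, right, mean_insert, sd_insert, n_sd=3):
    """left/right: dicts with 'chrom', 'pos', 'strand' for the two mates, leftmost mate first."""
    if left["chrom"] != right["chrom"]:
        return "inter-chromosomal"
    if left["strand"] == right["strand"]:
        return "inversion candidate"
    if left["strand"] == "-" and right["strand"] == "+":
        return "everted pair"                 # mates point away from each other
    span = right["pos"] - left["pos"]         # rough proxy for insert size
    if span > mean_insert + n_sd * sd_insert:
        return "deletion candidate"           # mates map too far apart
    if span < mean_insert - n_sd * sd_insert:
        return "insertion candidate"          # mates map too close together
    return "normal"


mate1 = {"chrom": "chr1", "pos": 1000, "strand": "+"}
mate2 = {"chrom": "chr1", "pos": 5000, "strand": "-"}
print(classify_pair(mate1, mate2, mean_insert=300, sd_insert=30))
```

A real caller would also require several anomalous pairs to support the same event before reporting it.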

Structural Variation Detection
• CREST (Wang et al, Nat Meth 2011)
– Use soft-clipped reads, kind of like bidirectional BLAST
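Soft-clipped reads are marked by "S" operations at the ends of the SAM CIGAR string, and the clip boundary suggests a breakpoint position. The sketch below only locates those boundaries from POS and CIGAR; it is not CREST's implementation, and the function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar):
    """Return (side, reference position, clipped length) for each soft-clipped end.

    pos is the 1-based position of the first aligned base, as in the SAM POS field.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]   # hard clips carry no sequence
    breakpoints = []
    if ops and ops[0][1] == "S":
        breakpoints.append(("left", pos, ops[0][0]))
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in "MDN=X")  # reference bases consumed
        breakpoints.append(("right", pos + ref_len - 1, ops[-1][0]))
    return breakpoints


print(soft_clip_breakpoints(100, "20S18M"))  # [('left', 100, 20)]
print(soft_clip_breakpoints(100, "30M8S"))   # [('right', 129, 8)]
```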

Copy Number Variation Detection
• Change in read coverage
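A change in read coverage is commonly summarized as a log2 ratio of binned read counts between a sample and a control, after scaling each by its total. The sketch below assumes counts per genomic bin are already available; the function name and the example counts are made up for illustration.

```python
import math


def log2_ratios(sample_counts, control_counts):
    """Per-bin log2 ratio of library-size-normalized read counts (None where a count is 0)."""
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        if s == 0 or c == 0:
            ratios.append(None)   # ratio undefined; left for the caller to handle
        else:
            ratios.append(math.log2((s / sample_total) / (c / control_total)))
    return ratios


tumor = [100, 200, 50, 50]      # hypothetical reads per bin
normal = [100, 100, 100, 100]
print(log2_ratios(tumor, normal))
```

In this toy example the second bin looks like a gain (ratio near +1) and the last two like losses (near -1).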

Representation: VCF Format
• http://www.1000genomes.org/node/101
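A VCF data line is tab-separated with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sketch below parses just those columns and is not a full VCF parser: it ignores header lines and genotype columns. The function name is hypothetical.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")        # several alternate alleles are comma-separated
    info = {}
    if record["INFO"] != ".":
        for item in record["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True      # keys without '=' are flags
    record["INFO"] = info
    return record


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
rec = parse_vcf_line(line)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"]["DP"])
```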

Summary
• Whole genome and whole exome
sequencing
– Solution hybrid selection
– Specific locus for rare diseases
• Bioinformatics issues:
– Read mapping
– SNP, indel detection
– Heuristic filtering
– Structural variation detection
