Alzheimer’s disease (AD) is a devastating neurodegenerative disease that is genetically complex. Although great progress has been made in identifying fully penetrant mutations in genes that cause early-onset AD, these still represent a very small percentage of AD cases. Large-scale, genome-wide association studies (GWAS) have identified at least 20 additional genetic risk loci for the more common form: late-onset AD. However, the identified SNPs are typically not the actual risk variants, but are in linkage disequilibrium with the presumed causative variants [1].
To help identify causative genetic variants, we have combined highly accurate, long-read sequencing with hybrid-capture technology. In this collaborative webinar*, we present this method and show how combining IDT xGen® Lockdown® Probes with PacBio SMRT® Sequencing allows targeting and sequencing of candidate genes from genomic DNA and corresponding transcripts from cDNA. Using a panel of target capture probes for 35 AD candidate genes, we demonstrate the power of this approach by looking at data for two individuals with AD. Some additional benefits of this method include the ability to leverage long reads, phase heterozygous variants, and link corresponding transcript isoforms to their respective alleles.
Reference: 1. Van Cauwenberghe C, Van Broeckhoven C, Sleegers K. (2016) The genetic landscape of Alzheimer disease: clinical implications and perspectives. Genet Med, 18(5):421–430.
* This presentation represents a collaboration between Pacific Biosciences and Integrated DNA Technologies. The individual opinions expressed may not reflect shared opinions of Pacific Biosciences and Integrated DNA Technologies.
4. ALZHEIMER’S DISEASE (AD)
Alzheimer’s disease is the most common form of neurodegenerative dementia.
https://www.alz.co.uk/research/WorldAlzheimerReport2015.pdf
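Clinical characterization:
Progressive loss of memory and deficits in thinking, problem solving, and language
[Figure: people living with dementia worldwide, 46.8M in 2015 rising to a projected 131.5M in 2050 (World Alzheimer Report 2015)]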
Neuropathological characterization:
Progressive cortical atrophy due to neuronal loss and characteristic intracellular and extracellular deposits of insoluble tau and amyloid β proteins
http://www.reverseagingcentre.com/media/links/signs-of-alzheimers/
5. ALZHEIMER’S DISEASE (AD)
-Genetically divided into two different groups: early-onset and late-onset
-Relative risk for first degree relatives is 3.5 – 7.5
-30 – 48% of AD patients have an affected first-degree relative
Late-onset AD:
- Manifests after 65 years
- Multifactorial with strong genetic predisposition
- GWAS have identified 20+ genetic risk loci with small odds ratios (1.1 – 2.0 per risk allele), including both common functional variants and rare and structural variants
Early-onset AD:
- For 2 – 10% of patients first symptoms occur in their 20s or 30s.
- Four genes account for 5 – 10% of early-onset AD: APP, PSEN1, PSEN2, and APOE
The complex genetic makeup of AD
6. CANDIDATE DISEASE GENES IN ALZHEIMER’S DISEASE (AD)
Many associated genetic loci contain several genes
Which candidates are involved in disease risk remains unclear (20+ genes)
Strategies for assessing GWAS candidate genes:
-DNA sequencing
-Transcriptome sequencing
-Proteome studies
-Methylome studies
Cuyvers E. et al. (2016) Genetic variations underlying Alzheimer's disease: evidence from genome-wide association studies and beyond. Lancet Neurol. 15(8),857-68.
Decades-long search for risk genes in Alzheimer’s disease
8. TYPICAL DATA
Read lengths >20 kb
Data per SMRT Cell: 5 – 8 Gb
Half of data in reads >20 kb
Top 5% of reads >35 kb
Maximum read lengths >60 kb
Read length data shown from 30 kb size-selected human library on the Sequel System (10-hour movie, 2.0 chemistry) with a total output of 7.6 Gb. Each Sequel System SMRT Cell 1M generates ~365,000 reads.
[Figure: read-length histogram; x-axis: read length (bp), y-axis: number of reads]
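"Half of data in reads >20 kb" is a base-weighted statistic of the kind usually reported as a read-length N50. The sketch below shows how such a figure can be computed from a list of read lengths. The function names and the toy read lengths are hypothetical and are not taken from the Sequel run described above.

```python
def read_length_n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    total_bases = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total_bases:
            return length


def fraction_of_bases_above(lengths, threshold):
    """Fraction of all bases that sit in reads longer than `threshold`."""
    total_bases = sum(lengths)
    return sum(length for length in lengths if length > threshold) / total_bases


# Hypothetical read lengths in bp (illustration only).
toy_lengths = [5_000, 12_000, 18_000, 22_000, 30_000, 41_000]

print("N50 (bp):", read_length_n50(toy_lengths))
print("Fraction of bases in reads >20 kb:",
      round(fraction_of_bases_above(toy_lengths, 20_000), 3))
```

An N50 above 20 kb and "half of data in reads >20 kb" express the same property: the longest reads that make up half of the yield are all longer than 20 kb.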
9. BENEFITS OF LONG-READ SEQUENCING FOR CHARACTERIZING GENOMIC STRUCTURAL VARIATION
Carvalho CM et al. (2016) Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet.
Structural variation (SV) is an important contributor to human diversity and disease
SV is also difficult to characterize
[Figure: Example SV Types and Mechanisms]
Targeted SMRT Sequencing allows scientists to directly characterize, at high coverage for specific genes or regions of interest across multiple samples:
• Complete Genes (introns & exons)
• Phased Variants (allelic haplotypes)
• Repetitive Regions
• Regulatory Regions (upstream/downstream)
• Insertions & Deletions
• Copy Number Variations
10. GENETIC VARIATION SEQUENCING WITH SMRT SEQUENCING
[Figure: variant types plotted against size of variant (1 bp to 100 Mb). Variant types: SNPs, small indels, STRs & VNTRs, large insertions and deletions, mobile elements (L1, Alu, SVA), copy number variation, inversions / translocations, complex variants, large structural rearrangements. One PacBio read spans most variants: indels, repeat expansions, structural variants, phasing of SNVs and SVs. Assembled PacBio reads span euchromatic genome variation: medium to large SVs, phased alleles, haplotype reconstruction.]
11. ADDITIONALLY CHARACTERIZE TRANSCRIPTOME SPLICE VARIATION WITH LONG-READ SEQUENCING
National Human Genome Research Institute. Bioinformatics: Finding genes. (2013) http://www.genome.gov/25020001
- Proteins and their functions are affected not only by variants in exonic regions
- Variants in regulatory regions (enhancers/promoters, including methylation) and intronic regions can also play an important role
- High transcript isoform diversity from alternative splicing
- Obtain full-length transcript sequences with Iso-Seq analysis
13. CASE STUDY: VARIANT SCREENING IN ALZHEIMER’S DISEASE WITH LONG-READ SEQUENCING
-Genomic and transcriptomic (cDNA) capture experiment
-Combined data provide better insight into how variants affect gene expression
-Gene panel applied to two AD patients (35 candidate genes):
• Average gDNA fragment size: ~6 kb
• Full-length transcripts ranging from <1 kb to ~10 kb
14. PACBIO TARGETED PROBE-BASED CAPTURE WORKFLOW (GENOMIC DNA CAPTURE)
[Workflow diagram, steps 1-13]
EXPERIMENTAL PIPELINE
Genomic DNA → shear to 7 kb (6 kb for multiplex) → size selection (5-9 kb) → ligate barcoded adapters → amplification → probe hybridization, bead capture, wash → amplification and SMRTbell prep. + size selection (5-9 kb) → sequencing
INFORMATICS PIPELINE
Map reads of insert to reference → phasing with SAMtools → bin reads by haplotype → phased allelic consensus sequence → tertiary analysis
15. BEST PRACTICE SUMMARY: GENOMIC CAPTURE
-Save on project costs by multiplexing and spacing probes up to 1 kb apart.
-Multiplex up to 12 samples.
-Use PacBio linear barcoded adapters.
-High molecular weight DNA required.
-Size selection is highly recommended to maximize long-read recovery.
-Aim for 100-fold coverage of targeted panel size (full-length gene coverage).
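The coverage and multiplexing guidance above can be turned into a rough yield estimate. The sketch below is a back-of-the-envelope calculation only. The 2 Mb panel size and the 50% on-target fraction are hypothetical placeholders, and the 5 Gb per SMRT Cell figure is the lower bound quoted on the typical-data slide for a 30 kb library, so a ~6 kb capture library may well yield a different amount.

```python
import math


def required_yield_bp(panel_size_bp, fold_coverage=100, n_samples=1,
                      on_target_fraction=1.0):
    """Total sequenced bases needed to reach `fold_coverage` on the panel
    for every sample, given the fraction of bases that land on target."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return panel_size_bp * fold_coverage * n_samples / on_target_fraction


def smrt_cells_needed(yield_bp, yield_per_cell_bp=5e9):
    """Number of SMRT Cells, rounded up, for the requested yield."""
    return math.ceil(yield_bp / yield_per_cell_bp)


# Hypothetical inputs: 2 Mb panel, 12-plex, half of the bases on target.
panel_size_bp = 2_000_000
needed = required_yield_bp(panel_size_bp, fold_coverage=100,
                           n_samples=12, on_target_fraction=0.5)
print(f"Required yield: {needed / 1e9:.1f} Gb")
print("SMRT Cells needed:", smrt_cells_needed(needed))
```

Measured on-target rates and per-cell yields from a pilot run should replace the placeholder values before planning a real project.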
16. AD SAMPLES: SHEARED GDNA QC
[Figure: sheared gDNA QC; label: 10 kb shear]
Recommend starting with HMW gDNA (2 µg)
20. BEST PRACTICE SUMMARY: CDNA CAPTURE
-Recover high-quality RNA transcripts
-Size selection is optional, but helpful for specific fractions.
-Targeted capture Iso-Seq analysis is recommended to characterize splice isoforms
-Not recommended for characterizing gene expression levels
-Aim for a minimum of 30-fold coverage per anticipated splice isoform in the samples
-Probes can be designed against exons only or to include introns
21. AD SAMPLES: MRNA QC
Temporal lobe 1 RNA: RIN = 8.0
Temporal lobe 2 RNA: RIN = 8.1
Recommend RIN > 6 (RIN: RNA Integrity Number)
23. DESIGNING CUSTOM IDT XGEN® LOCKDOWN® CAPTURE PANEL
-Key benefit of xGen® Lockdown® Probes is flexibility in design
-Do not need to redesign existing probe panels
-However, recommend full-gene design by including introns and exons, plus extra upstream and downstream sequences
-Probes can be spaced up to 1000 bp apart
-Use the same probes for genomic and cDNA capture
[Figure: FULL-GENE DESIGN, showing Gene A and Gene B]
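The spacing rule above can be illustrated with a small tiling routine. This is a sketch only and not IDT's design algorithm. It assumes 120-nt probes and reads "spaced up to 1000 bp apart" as a maximum gap between the end of one probe and the start of the next; the function name and the 10 kb example region are hypothetical.

```python
import math


def tile_probes(start, end, probe_len=120, max_gap=1000):
    """Place evenly spaced probes over the half-open region [start, end).

    The first probe starts at `start`, the last probe ends at `end`, and
    the gap between consecutive probes never exceeds `max_gap`.
    Returns a list of (probe_start, probe_end) half-open intervals.
    """
    region_len = end - start
    if region_len <= 0:
        raise ValueError("end must be greater than start")
    if region_len <= probe_len:
        # Region is shorter than one probe: a single probe covers it.
        return [(start, start + probe_len)]
    span = region_len - probe_len      # range available for probe starts
    step = probe_len + max_gap         # largest allowed start-to-start distance
    n_probes = math.ceil(span / step) + 1
    starts = [start + (i * span) // (n_probes - 1) for i in range(n_probes)]
    return [(s, s + probe_len) for s in starts]


# Hypothetical 10 kb full-gene region (introns, exons, and flanks).
probes = tile_probes(0, 10_000)
print(len(probes), "probes")
for probe_start, probe_end in probes[:3]:
    print(probe_start, probe_end)
```

Wider spacing lowers probe count and cost, while long captured fragments still bridge the gaps between probes.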
24. SNPs AND LARGER SVs DISCOVERED IN AD SAMPLES
[Figure: Venn diagram of isoforms in Patient 1, Patient 2, and Gencode v25; region counts: 67, 2, 3, 39, 319, 154, 312]
STUDY RESULTS:
Detected a broad range of genomic variants (SNPs and SVs):
-31 unique SVs ranging from 65 bp to several kb in size
500+ isoforms found in each patient
-Patient 1: 515 isoforms
-Patient 2: 507 isoforms
88% novel splice isoforms identified
-Only 39 isoforms shared between both patients and Gencode v25
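The isoform counts on this slide can be cross-checked with simple set arithmetic. The sketch below assumes that the seven numbers in the figure (67, 2, 3, 39, 319, 154, 312) are the regions of a three-way Venn diagram of Patient 1, Patient 2, and Gencode v25. The assignment of numbers to regions is an inference, chosen because it reproduces the reported per-patient totals; the dictionary keys are hypothetical labels.

```python
# Assumed region assignment for the three-way Venn diagram.
venn = {
    "patient1_only": 319,
    "patient2_only": 312,
    "patient1_and_patient2": 154,   # shared by both patients, not in Gencode
    "patient1_and_gencode": 3,
    "patient2_and_gencode": 2,
    "all_three": 39,
    "gencode_only": 67,
}

patient1_total = (venn["patient1_only"] + venn["patient1_and_patient2"]
                  + venn["patient1_and_gencode"] + venn["all_three"])
patient2_total = (venn["patient2_only"] + venn["patient1_and_patient2"]
                  + venn["patient2_and_gencode"] + venn["all_three"])

# Isoforms seen in at least one patient but absent from Gencode v25.
novel = (venn["patient1_only"] + venn["patient2_only"]
         + venn["patient1_and_patient2"])
all_isoforms = sum(venn.values())
novel_percent = 100 * novel / all_isoforms

print("Patient 1 isoforms:", patient1_total)
print("Patient 2 isoforms:", patient2_total)
print(f"Novel isoforms: {novel} of {all_isoforms} ({novel_percent:.0f}%)")
```

Under this reading, the 88% figure corresponds to novel isoforms as a share of every isoform in the diagram, including the Gencode-only region.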
26. ZCWPW1 GENE: ~750 bp DELETION DETECTED IN BOTH PATIENTS
[Figure: ZCWPW1 deletion shown for Patient 1 and Patient 2]
27. BACE1 GENE: PHASED ALLELES (34 KB)
Heterozygous SNPs can be used to phase alleles across multi-kilobase regions
[Figure: BACE1 locus with tracks for gene, probes, target, and phased SNPs; reads grouped into Phase 0 and Phase 1]
28. BIN1 GENE: PHASED ALLELES (63 KB)
Heterozygous SNPs can be used to phase alleles across multi-kilobase regions
[Figure: BIN1 locus with tracks for gene, probes, target, and phased SNPs; reads grouped into Phase 0 and Phase 1]
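Once heterozygous SNPs have been phased, each long read can be binned by the alleles it carries. The sketch below is a toy majority-vote illustration and not the algorithm used by SAMtools. It assumes the two phased allele sets are already known, and all positions, bases, and read names are hypothetical.

```python
from collections import defaultdict

# Hypothetical phased alleles at three heterozygous SNP positions.
PHASE0 = {1050: "A", 4200: "G", 9800: "T"}
PHASE1 = {1050: "C", 4200: "A", 9800: "C"}


def assign_phase(read_alleles, phase0=PHASE0, phase1=PHASE1):
    """Return 0 or 1 for the phase whose alleles the read matches most
    often, or None if the read is uninformative or evenly split."""
    votes0 = sum(1 for pos, base in read_alleles.items()
                 if phase0.get(pos) == base)
    votes1 = sum(1 for pos, base in read_alleles.items()
                 if phase1.get(pos) == base)
    if votes0 > votes1:
        return 0
    if votes1 > votes0:
        return 1
    return None


def bin_reads(reads):
    """Group read names by assigned phase (0, 1, or None)."""
    bins = defaultdict(list)
    for name, alleles in reads.items():
        bins[assign_phase(alleles)].append(name)
    return dict(bins)


# Each toy read maps SNP position -> observed base.
toy_reads = {
    "read_a": {1050: "A", 4200: "G"},
    "read_b": {4200: "A", 9800: "C"},
    "read_c": {1050: "A", 4200: "A"},  # conflicting alleles
    "read_d": {},                      # covers no heterozygous SNP
}
bins = bin_reads(toy_reads)
print(bins)
```

Reads that span two or more heterozygous SNPs are what connect neighboring sites, which is why multi-kilobase reads allow alleles to be phased across regions such as the 34 kb and 63 kb loci shown here.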
29. MAPT GENE RESULTS FOR PATIENT 1
MAPT gene results:
-Detected a heterozygous deletion
-One allele is transcribed into 21 isoforms and the other into only 5
-Detected a novel exon and transcript
[Figure: allele-specific transcripts; 21 isoforms from one allele, 5 isoforms from the other]
Heterozygous genomic variants can be linked to corresponding expressed transcripts
30. ZCWPW1 GENE: RETAINED INTRONS AND NEW EXONS
[Figure: ZCWPW1 transcripts for Patient 1 and Patient 2, with retained introns and novel exons marked]
31. CONCLUSION
-AD has a large economic impact on global society (2010: $604B)
-To date, 20+ putative genetic risk variants have been mapped
-Associated SNPs are usually not the true causative variant
-Combining gDNA and cDNA data is more informative
-Custom IDT xGen® Lockdown® Panels allow flexibility to scale projects
-SMRT Sequencing provides multi-kilobase phased alleles and full-length transcripts
http://www.mvcenters.com/2015/02/11/dementia-takes-toll-claims-another-american-great-dean-smith/
“Structural variants can be more informative for disease diagnostics, prognostics and translation than current SNP mapping and exon sequencing.”
Roses A.D. et al. (2016) Structural variants can be more informative for disease diagnostics, prognostics and translation than current SNP mapping and exon sequencing. Expert Opin Drug Metab Toxicol. 12(2),135-47.
32. ACKNOWLEDGEMENT
Kevin Eng
Ting Hon
Elizabeth Tseng
Aaron Wenger
William Rowell
Jenny Ekholm
Steve Kujawa
Kristina Giorda
Jiashi Wang
Mirna Jarosz
Visit PacBio Blog for new announcements and updates on Targeted Sequencing!
http://www.pacb.com/blog
http://www.pacb.com/applications/targeted-sequencing/
Feel free to contact Jenny Gu (jgu@pacb.com)
36. SMRT LINK PROVIDES BASIC PROCESSING OF RAW DATA FOR TARGETED CAPTURE ENRICHMENT STUDIES
SMRT Analysis produces:
-Filtered subreads
-Circular consensus sequences
-Alignment to reference (BAM files)
-Iso-Seq full-length transcripts
37. BIOINFORMATICS WORKFLOW FOR PHASING ALLELES
Github: Targeted phasing consensus (genomic capture)
[Workflow diagram, steps 1-12; legend: data, SMRTLink, command line tools, third party software]
Raw data → subreads (SMRTLink) → CCS reads (SMRTLink) → aligned BAM file (visualize in IGV 3.0)
Subset and phase: aligned BAM file + probe *.bed → capture2target.py → defined phase blocks → samtools → phased alleles/region
Polish: PacBio arrow (cmdline) → phased consensus sequences (*.fasta), >99.9% accuracy (dependent on coverage)
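Consensus accuracy is often quoted as a Phred-scaled quality value (QV), so it can help to see what ">99.9% accuracy" means on that scale. The sketch below applies the standard Phred definition, QV = -10 · log10(error rate); the function names are hypothetical and the code is not part of the workflow shown on this slide.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred QV."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


def phred_to_accuracy(qv):
    """Convert a Phred QV back to per-base accuracy."""
    return 1 - 10 ** (-qv / 10)


for accuracy in (0.99, 0.999, 0.9999):
    print(f"{accuracy:.2%} accuracy is about QV {accuracy_to_phred(accuracy):.0f}")
```

On this scale, 99.9% accuracy corresponds to roughly QV30, or about one error per 1,000 consensus bases.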