This presentation covers a number of case studies on the application of high-throughput sequencing (HTS), also known as next generation sequencing (NGS), to biological problems including human genome sequencing, identification of disease mutations, metagenomics, virus discovery, epidemics, transmission chains and viral populations. Presented at the University of Glasgow on Friday 26th June 2015.
1. Case studies of HTS applications
Richard Orton
CVR, University of Glasgow
2. Talk Contents
• NGS
• Very brief history
• Human genome
• Human genome project, 1000 genomes, WGS – disease SNPs, Bacteria
• Meta-genomics
• Sample characterisation & Pathogen discovery
• Epidemics
• Transmission – genome sequence to answer who infected who
• Viral populations
• Rapid evolution, mutation tracking
3. NGS – HTS – 1st – 2nd – 3rd Gen
• Next Generation Sequencing is now the
Current Generation Sequencing
• NGS: High-Throughput Sequencing (HTS)
• 1st Generation: The automated Sanger
sequencing method
• 2nd Generation: NGS (Illumina, Roche 454, Ion
Torrent etc)
• 3rd Generation: PacBio & Oxford Nanopore: The
Next Next Generation Sequencing. Single
molecule, sequencing without read steps.
4. Human Genome Project
• 1984: Plan – Sanger sequencing
• 1990: Start
• 1998: Craig Venter & Celera
• 2001: Draft(s) published
• 2004: Final published
• Size: ~3 billion base pairs
• Cost: ~$3 billion
5. Human Genome XPRIZE
• XPRIZES: intended to encourage technological development that
could benefit mankind
• 1996: Ansari XPRIZE for suborbital spaceflight. Claimed by
SpaceShipOne in 2004 ($10 million)
• 2006: Archon Genomics XPRIZE: $10 million to rapidly and
accurately sequence 100 whole human genomes to a standard
never before achieved at a cost of $10,000 or less per genome.
• 2007: Google Lunar XPRIZE: $20 million to land a rover on the
moon, move more than 500m, and transmit HD images and video
back to Earth.
• 2011: Tricorder XPRIZE: $10 million for a mobile device that can
diagnose patients as well as a panel of board-certified physicians
(Archon Genomics XPRIZE: rebranded in 2011, cancelled in 2013 –
human genome for $1000 in days)
6. 1000 Genomes Project
• 2008: Launched
• Establish a detailed catalogue of human genetic variation
• Sequence 1000 anonymous participants from a number
of different ethnic groups within 3 years
• 2012: 1,092 Genomes announced
• Each person carries 250 to 300 loss of function variants in
annotated genes
• 50 to 100 variants previously implicated in inherited
disorders
• Mutation rate of 10⁻⁸ per base pair per generation (based
on mother-father-child trios)
• 1000 nematode genomes, 1000 plant genomes, Genome
10K project…
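As a rough worked example of what that mutation rate implies, the sketch below multiplies it by the ~3 billion base pair genome size quoted on the Human Genome Project slide. The variable and function names are illustrative, and treating the genome as one uniform-rate copy is a simplifying assumption rather than anything stated in the talk.

```python
# Back-of-envelope estimate: new mutations per generation.
# Both inputs are the approximate figures quoted in the slides.
MUTATION_RATE = 1e-8        # per base pair per generation
HAPLOID_GENOME_SIZE = 3e9   # base pairs (~3 billion)


def expected_new_mutations(rate, genome_size, copies=1):
    """Expected number of new mutations across `copies` genome copies."""
    return rate * genome_size * copies


per_haploid = expected_new_mutations(MUTATION_RATE, HAPLOID_GENOME_SIZE)
per_diploid = expected_new_mutations(MUTATION_RATE, HAPLOID_GENOME_SIZE, copies=2)

print(f"~{per_haploid:.0f} new mutations per haploid genome copy")
print(f"~{per_diploid:.0f} new mutations per child (two copies)")
```

Under these assumptions a child carries on the order of tens of new mutations, which is why parent-child trios are enough to estimate the rate.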
7. Human Genome Sequencing
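[Figure: workflow from human DNA through DNA extraction, sequencing and quality control to HTS reads, which are mapped to the human genome (chromosome view). Labelled disease loci: cystic fibrosis, sickle cell anemia. Annotation: sickle cell SNP observed in sample.]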
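The last step of this workflow, checking whether a known disease SNP is observed in the mapped reads, can be sketched as follows. This is a minimal illustration only: the reference string, reads and position are toy data invented for the example (not real gene coordinates), the reads are assumed to be already mapped without gaps, and the function names and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter


def bases_at_position(mapped_reads, position):
    """Count the bases that mapped reads show at one reference position.

    mapped_reads: iterable of (start, sequence) pairs, where `start` is the
    0-based reference coordinate of the read's first base (gapless mapping).
    """
    counts = Counter()
    for start, seq in mapped_reads:
        offset = position - start
        if 0 <= offset < len(seq):  # read covers the position
            counts[seq[offset]] += 1
    return counts


def observed_snp(reference, mapped_reads, position, min_fraction=0.2):
    """Non-reference bases seen in at least `min_fraction` of covering reads."""
    counts = bases_at_position(mapped_reads, position)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    ref_base = reference[position]
    return {
        base: n / depth
        for base, n in counts.items()
        if base != ref_base and n / depth >= min_fraction
    }


# Toy data: two of the four reads carry a T where the reference has an A.
reference = "ACTCCTGAGGAGAAG"
reads = [
    (0, "ACTCCTGTGG"),
    (3, "CCTGAGGAGA"),
    (5, "TGTGGAGAAG"),
    (4, "CTGAGGAGAA"),
]
snp = observed_snp(reference, reads, position=7)
print(snp)
```

In practice the mapping and variant calling would be done with dedicated tools; the sketch only shows the idea of comparing read bases against the reference at a position of interest.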
8. Rapid disease diagnosis
• Saunders et al (2012): Rapid Whole-Genome
Sequencing for Genetic Disease Diagnosis in
Neonatal Intensive Care Units
• SSAGA: Symptom and sign assisted genome
analysis: cross references mutations and
symptoms with known disease signatures
• WGS of newborn babies
• Results – analysed data – within 50 hours
• Identified a wide range of genetic disorders
• Combine with WGS of parents and siblings – rule
out SNPs of no significance
• Rapid diagnosis – Rapid treatment
9. Non-Human Genomes
• Farhat et al (2013). Genomic analysis
identifies targets of convergent positive
selection in drug-resistant
Mycobacterium tuberculosis
• 116 TB isolates – 47 were drug resistant
• Searching across isolates for mutations at
the same nucleotide position or in the same
gene identified all known resistance markers.
• Found an additional 39 genomic regions
of interest in resistant isolates.
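The idea of looking for the same site or gene being mutated repeatedly in resistant isolates can be sketched as a simple recurrence count. This is a simplified stand-in, not the method used by Farhat et al: it ignores phylogeny, and the isolate names, gene labels and the `min_isolates` threshold are all hypothetical.

```python
from collections import Counter


def recurrent_in_resistant(mutations_by_isolate, resistant, min_isolates=2):
    """Sites mutated in >= min_isolates resistant isolates and no sensitive one.

    mutations_by_isolate: dict mapping isolate name -> iterable of mutated
    sites (a site can be a gene name or a nucleotide position).
    resistant: set of isolate names labelled drug resistant.
    """
    resistant_counts = Counter()
    sensitive_sites = set()
    for isolate, sites in mutations_by_isolate.items():
        if isolate in resistant:
            resistant_counts.update(set(sites))  # count each site once per isolate
        else:
            sensitive_sites.update(sites)
    return {
        site: n
        for site, n in resistant_counts.items()
        if n >= min_isolates and site not in sensitive_sites
    }


# Toy data with hypothetical isolates and gene labels.
mutations = {
    "iso1": ["geneA", "geneC"],
    "iso2": ["geneA", "geneB"],
    "iso3": ["geneC"],
    "iso4": ["geneA", "geneD"],
}
candidates = recurrent_in_resistant(mutations, resistant={"iso1", "iso2", "iso4"})
print(candidates)
```

Here only `geneA` is reported, because it is mutated in all three resistant isolates and in no sensitive one.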
11. Metagenomics
• Metagenomics can be defined as the sequence-based analysis of the
whole collection of genomes directly isolated from a sample.
• Does not need isolation & culturing – just extraction & sequencing
(although a bit more to it than that!)
• 16S small subunit ribosomal RNA (rRNA) gene: relatively short, often
conserved within a species, and generally different between species.
Restricted to bacteria and archaea. PCR amplicons.
14. Metagenomics
• Title: Afshinnekoo et al (2015):
Geospatial Resolution of Human and Bacterial Diversity with
City-Scale Metagenomics
• Headline: Scientists Basically Just Discovered Alien Life — In
The NYC Subway. Ninja Turtles?
15. Metagenomics
• Title: Granberg et al (2013): Metagenomic Detection of Viral
Pathogens in Spanish Honeybees: Co-Infection by Aphid Lethal
Paralysis, Israel Acute Paralysis and Lake Sinai Viruses
• Also identified bees as a vector of turnip ringspot virus
16. Metagenomics – Virus Discovery
• Title: Hoffmann et al (2012): Novel Orthobunyavirus in Cattle, Europe,
2011
• Illness reported in cattle with severe symptoms, all known diseases
excluded as cause.
• Blood samples from 3 infected cows pooled, metagenomics and de
novo assembly used
• Schmallenberg virus discovered in the sequences
18. Epidemics – Ebola
• The 2013-2015 West Africa
Ebola Epidemic, as of May
2015: 26,648 cases and 11,017
deaths
• HTS has been
used throughout the
epidemic to sequence Ebola
genomes from patient
samples
• Used to monitor the
evolution of the virus: how
fast is it mutating, where is it
mutating, selection
pressures
19. Epidemics – Who infected who
[Figure: genome sequences for Samples A to M, with SNPs/mutations marked on each]
20. Epidemics – Who infected who
[Figure: diagram linking Samples A to M]
Can then combine with epidemiological information – date, location etc
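One simple way to start on "who infected who" is to count SNP differences between every pair of sample genomes and look for the closest relatives. The sketch below assumes the genomes are already aligned to the same length; the sequences and function names are toy values invented for illustration. Genetic distance alone does not give the direction of transmission, which is where the epidemiological information comes in.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_distances(genomes):
    """SNP distance for every pair of samples, keyed by (name, name)."""
    return {
        (a, b): snp_distance(genomes[a], genomes[b])
        for a, b in combinations(sorted(genomes), 2)
    }


def closest_relative(genomes, sample):
    """Name of the other sample with the fewest SNP differences to `sample`."""
    others = [name for name in sorted(genomes) if name != sample]
    return min(others, key=lambda name: snp_distance(genomes[sample], genomes[name]))


# Toy aligned genomes for four hypothetical samples.
genomes = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGTAT",
    "D": "TCGTACGTAC",
}
distances = pairwise_distances(genomes)
print(distances)
print(closest_relative(genomes, "C"))
```

In this toy set, C is one SNP from B and two from A, which is consistent with a chain running through B, but sampling dates and locations would be needed to support that.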
21. Epidemics – Who infected who
• Identify source of infection
• Identify long transmission events
• Identify Super Spreaders – individual or hub level
• Identify new incursions/spillovers
22. Epidemics – FMDV
• 2001 FMDV outbreak cost £4–£6 billion
• Smaller outbreak in 2007 – 8 farms – in
two phases
• Sequencing showed an excessive number
of mutations between the two phases,
suggesting an undetected farm
• Targeted surveillance found the
long-infected sheep farm, and the epidemic
was contained
[Figure labels: Phase 1, Phase 2]
23. Epidemics – bTB
• WGS of bovine tuberculosis samples from
badgers and cattle in Northern Ireland
• Used to investigate the transmission
direction of bTB
• Small pilot study, could not infer
transmission between badgers and cows.
• But showed promise – could detect between-cow
transmission and herd maintenance
bTB is SLOW
25. Viral populations
• Viruses mutate rapidly
• A single virus can enter a cell, and output tens of
thousands of virions within hours
• Every time the genome is copied mutations are
introduced
• Enables viruses to adapt to change rapidly
• New environments
• New hosts
• Drug and vaccine treatment
• Viruses exist as a large, constantly and rapidly
evolving swarm – the quasispecies
28. Viral mutation tracking
• Ability to detect mutations at low levels in a sample
• Can then examine samples for the presence of important mutations: e.g. drug resistance
• Monne et al (2014) – HTS to detect signatures of Highly Pathogenic Avian Influenza in earlier
low pathogenic samples
• Herfst et al (2012) Airborne Transmission of Influenza A/H5N1 Virus Between Ferrets: now routine
to monitor for the 5 mutations that could make it spread easily between humans
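Detecting a mutation at low levels comes down to counting the bases seen at a genome position across many reads and reporting any non-reference base above a chosen frequency. The sketch below is illustrative only: the counts are made up, and the `min_frequency` and `min_depth` thresholds are assumptions, since a real cut-off depends on the sequencing platform's error rate.

```python
def minor_variants(base_counts, reference_base, min_frequency=0.01, min_depth=100):
    """Non-reference bases at one position with frequency >= min_frequency.

    base_counts: dict mapping base -> number of reads showing that base.
    Positions covered by fewer than `min_depth` reads are skipped, because
    a frequency estimated from very few reads is unreliable.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth >= min_frequency
    }


# Toy pileup at one position: 10,000 reads, reference base A.
counts = {"A": 9700, "G": 250, "T": 50, "C": 0}
variants = minor_variants(counts, reference_base="A")
print(variants)
```

Here G is reported at 2.5% while T, at 0.5%, falls below the assumed 1% threshold and would be treated as indistinguishable from sequencing error.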
29. Other HTS seqs
• RNA-Seq: whole transcriptome shotgun
sequencing
• ChIP-Seq: combines chromatin
immunoprecipitation with HTS: method to
analyse protein interactions with DNA
• RAD-Seq: Restriction site associated
DNA markers: good for population genetics
• …
• 3rd Gen Sequencers
Editor's Notes
Thermus aquaticus
Kary Banks Mullis
These regions encode components in cell wall biosynthesis, transcriptional regulation and DNA repair pathways. Mutations in these regions could directly confer resistance or compensate for fitness costs associated with resistance. Functional genetic analysis of mutations in one gene, ponA1, demonstrated an in vitro growth advantage in the presence of the drug rifamp