How to interpret your
own genome.
C. Titus Brown
ctbrown@ucdavis.edu
@ctitusbrown
http://ivory.idyll.org/blog/
Second in my ongoing attempt to explain what I actually do to Terry Peppers.
Some basic facts about
DNA
The primary DNA sequence consists of strings of A, C, G, and T.
Most human cells contain approximately 6 billion of these.
They are divided into 23 chromosome pairs.
These chromosomes are the primary unit of heredity.
http://classes.biology.ucsd.edu/bimm110.SP07/lectures_WEB/L08.05_Cytogenetics.htm
How DNA is interpreted –
“It’s complicated.”
http://www.exploringnature.org/db/detail.php?dbID=106&detID=2454
How inheritance & generation
of variation works
http://genetics.thetech.org/ask/ask435
+ approximately 300-600 mutations per generation
If we knew a person’s genome
sequence perfectly…
We still wouldn’t know all that much!
We could correlate variation between genomes with
diseases.
We could identify parentage and genetic inheritance.
We could probably identify ethnic origin.
We could find known “mistakes” or problems.
But… why wouldn’t we know
that much?? Isn’t the genome
the person?
Let’s ignore environmental factors, first of all…
Imagine…
…you’re locked in a room, with feral lawyers roaming
around outside;
You have a bunch of source code on a stack of CDs
to understand;
And you’ve been given a Windows 98 machine with
Python installed.
(see David Beazley, “Discovering Python”, PyCon
2014)
This talk came partly from listening to his talk…
This “locked room” problem is a
pretty good analogy to genomics!
“Here are 3 billion characters of DNA! Go
figure out what it all means!”
It’s like the previous locked room problem, and:
The code is all written in Perl 8, for which neither a
specification nor a software interpreter exists.
But you have access to the Internet and a world-wide
collection of other scientists, and (some of) their data and
papers.
Oh, and: the answers hold the keys to life and death.
Genomes are still useful! How
do we find sequence?
Primary approach for human genomes is: spend a lot of money
sequencing one, or a few; use that as reference.
Initial cost: $2.7 bn (in 1991 dollars)
Current human genome reference is from 13 anonymous
volunteers in Buffalo, NY (Wikipedia ;)
Older technology: identify points of variation, then target for
further investigation.
Current technology: sequence. (The rest of this talk.)
Next technology: longer reads. (Sequence more, better.)
Working with short read
sequencing - overview
Sequence => Map => Call variants => Interpret
Working with short read
sequencing - sequencing
Need about 250 ng of DNA at 2 ng/ul.
“Under $1,000 dollars”
http://biome.biomedcentral.com/welcome-to-the-1000-genome/
…some up front investment required :)
Working with short read
sequencing - sequencing
@D00360:18:H8VC6ADXX:1:1103:1434:46766/1
AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA
+
@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD
Raw data looks something like this (x 2 bn)
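Each raw read is a four-line FASTQ record: a header, the bases, a "+" separator, and one quality character per base. A minimal sketch of reading one, assuming the common Phred+33 quality encoding (the encoding is an assumption here; the slide does not state it). The names `Read` and `parse_fastq` are made up for illustration.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    bases: str
    quals: List[int]  # Phred scores, assuming Phred+33 encoding


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream, four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        bases = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data here
        qual_line = handle.readline().rstrip("\n")
        # Each quality character encodes one score as ord(char) - 33.
        yield Read(header[1:], bases, [ord(c) - 33 for c in qual_line])


# The record shown on the slide.
record = """@D00360:18:H8VC6ADXX:1:1103:1434:46766/1
AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA
+
@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD
"""

reads = list(parse_fastq(io.StringIO(record)))
first = reads[0]
print(first.name, len(first.bases), first.quals[:5])
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 30 means roughly a 1 in 1,000 chance that the base is wrong.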
Mapping: locate sequences in reference
http://en.wikipedia.org/wiki/File:Mapping_Reads.png
FASTQ => BAM
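To make "locate sequences in reference" concrete, here is a toy sketch of the idea: index every k-mer of the reference, then look up where a read could start. This is only an illustration with exact matching on a made-up reference; real mappers such as BWA use a Burrows-Wheeler index and tolerate mismatches and gaps. The function names and the toy sequences are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Return 0-based reference positions where the whole read matches exactly."""
    hits = []
    # Use the first k bases of the read as a seed, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGCCGATAGCTAGGATCC"  # toy reference, not real genome sequence
index = build_index(reference, k=4)
positions = map_read("TAGCCGAT", reference, index, k=4)
print(positions)
```

The seed "TAGC" occurs twice in the toy reference, but only one of those positions survives the full comparison, which is the same seed-then-verify shape that real mappers use at much larger scale.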
Variant detection after mapping
http://www.kenkraaijeveld.nl/genomics/bioinformatics/
BAM => VCF
Working with short-read
sequencing – annotate variants
Is it a variant known to have an effect?
Is it in a gene?
Is it in a gene and does it have some “obvious” effect (e.g.
breaking the gene)?
Has it been associated with some effect?
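The "is it in a gene?" question comes down to comparing a variant's position against gene and exon coordinates. A minimal sketch, assuming a made-up gene model (`GENE_A` and its coordinates are hypothetical); real annotation tools such as VEP do far more, including predicting the effect on the protein.

```python
from typing import List, Tuple

# (name, start, end, exons); coordinates are 1-based and inclusive.
Gene = Tuple[str, int, int, List[Tuple[int, int]]]

# Hypothetical gene model, for illustration only.
TOY_GENES: List[Gene] = [
    ("GENE_A", 1000, 5000, [(1000, 1200), (3000, 3300), (4800, 5000)]),
]


def classify(pos: int, genes: List[Gene]) -> str:
    """Say whether a position is between genes, between exons, or inside an exon."""
    for name, start, end, exons in genes:
        if start <= pos <= end:
            if any(s <= pos <= e for s, e in exons):
                return f"exonic:{name}"
            return f"intronic:{name}"
    return "intergenic"


for position in (500, 1100, 2000):
    print(position, classify(position, TOY_GENES))
```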
Pipeline, approaches, formats,
technologies.
Sequence: Illumina
Map: BWA, Samtools
Call variants: FreeBayes
Interpret: VEP, SNPedia, Gemini
Whole pipeline: bcbio
See http://ivory.idyll.org/blog/2015-pycon-talk.html for details.
~1500 hours, ~12 hours, ~100 hours
An example data set
Sequences from a “trio” (son, father, mother) of Ashkenazi
Jews are available, together with medical records (see links
in blog post).
The Ashkenazim branched off from other Jews ~2500 years
ago, flourished during the Roman Empire, then “went through a
'severe bottleneck' as they dispersed, reducing a population
of several million to just 400 families who left Northern Italy
around the year 1000.”
http://en.wikipedia.org/wiki/Ashkenazi_Jews#Genetics
“Raw” human data:
BAM file: 108 GB
(contains sequences + quality scores)
+ human genome (~3 GB or so)
+ lots of databases of varying size.
Full instructions at:
http://ivory.idyll.org/blog/2015-pycon-talk.html
Working with short-read
sequencing – mapping.
Software such as BWA takes in a reference genome and a
set of reads and yields tab-delimited output:
D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22
16050001 15 87S8M1I10M1D41M1S =
16050476 621 CCA…. 3((…
This contains information about where each read maps, how
well it maps, etc.
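The leading columns of that line can be decoded by hand. A minimal sketch using the standard SAM column order (read name, flag, chromosome, position, mapping quality, CIGAR, mate chromosome, mate position, template length); the sequence and quality columns are elided on the slide, so they are left out here. The helper names are made up for illustration.

```python
import re
from typing import List, Tuple

# Bit meanings for the SAM FLAG column.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag: int) -> List[str]:
    """List the names of the bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def cigar_lengths(cigar: str) -> Tuple[int, int]:
    """Return (bases of the read described, bases of reference covered)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    return read_len, ref_len


# First nine columns of the record on the slide (real SAM files are tab-delimited).
line = ("D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22 16050001 15 "
        "87S8M1I10M1D41M1S = 16050476 621")
name, flag, chrom, pos, mapq, cigar, mate_chrom, mate_pos, tlen = line.split()

flags = decode_flag(int(flag))
read_len, ref_len = cigar_lengths(cigar)
print(chrom, pos, "MAPQ", mapq, flags, read_len, ref_len)
```

For this record the CIGAR string says that 87 bases at the start of the read were soft-clipped (not aligned), which, together with the low mapping quality of 15, suggests the mapper was not very sure about this placement.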
Most parts of the genome are
sampled many times (~50,
here)
HG002 data set
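"Sampled many times" means that many mapped reads overlap the same reference position. A toy sketch of counting that depth, assuming each alignment is reduced to a (start, length) pair on one chromosome; the three alignments below are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def depth_per_position(alignments: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many alignments cover each 0-based reference position."""
    depth: Counter = Counter()
    for start, length in alignments:
        for pos in range(start, start + length):
            depth[pos] += 1
    return depth


# Three made-up overlapping reads of length 5.
toy_alignments = [(0, 5), (2, 5), (4, 5)]
depth = depth_per_position(toy_alignments)

mean_depth = sum(depth.values()) / len(depth)
print(dict(depth), round(mean_depth, 2))
```

Real tools compute this from the BAM file without looping over every base in Python, but the quantity is the same: position 4 in the toy data is covered by all three reads, the ends by only one.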
Calling variants w/FreeBayes
https://github.com/ekg/freebayes
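To give a feel for what "calling" means, here is a deliberately naive sketch that looks at the bases the mapped reads show at one position and picks a genotype by simple counting. This is not how FreeBayes works (it uses a Bayesian, haplotype-based model); the function name and the 20% cutoff are assumptions chosen only for illustration.

```python
from collections import Counter


def naive_genotype(ref_base: str, pileup: str, min_fraction: float = 0.2) -> str:
    """Classify one position from the bases observed in the reads covering it."""
    depth = len(pileup)
    if depth == 0:
        return "no_call"
    counts = Counter(pileup)
    # Keep alleles seen in at least min_fraction of reads (hypothetical cutoff).
    alleles = sorted(base for base, n in counts.items() if n / depth >= min_fraction)
    if not alleles:
        return "no_call"
    if alleles == [ref_base]:
        return "hom_ref"  # everything looks like the reference
    if len(alleles) == 1:
        return "hom_alt"  # both copies differ from the reference
    return "het"          # two alleles are well supported


print(naive_genotype("T", "T" * 26 + "A" * 24))  # about half the reads differ
print(naive_genotype("T", "T" * 49 + "A"))       # one stray base, likely an error
print(naive_genotype("T", "A" * 50))             # every read differs
```

The middle case is why depth matters: with ~50 reads per position, a single disagreeing base can be dismissed as sequencing error rather than reported as a variant.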
Working with short-read
sequencing – annotate variants
HG002 data set
Variants annotated with VEP using Gemini.
Most differences are
~uninterpretable!
Total variants: 5,562,545
Between genes: 3,032,670
Between exons, inside genes
(introns): 2,014,962
Remaining: 514,913
(Only ~2% of the human genome
codes for proteins; maybe ~5% of the
genome is thought to be functional)
HG002 data set
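The arithmetic behind those counts, as a quick check. The three input numbers are the ones on the slide; the variable names are mine.

```python
# Counts from the slide (HG002 data set).
total_variants = 5_562_545
between_genes = 3_032_670
between_exons = 2_014_962

# Whatever is neither between genes nor between exons is what is left to interpret.
remaining = total_variants - between_genes - between_exons

for label, count in [
    ("between genes", between_genes),
    ("between exons", between_exons),
    ("remaining", remaining),
]:
    print(f"{label:>14}: {count:>9,} ({count / total_variants:.1%})")
```

So fewer than one variant in ten falls somewhere that current annotation can say much about, and most of those still have no known effect.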
OK, you’ve got your variants –
now what??
HT to Slate Star Codex,
http://slatestarcodex.com/2014/11/12/how-to-use-23andme-irresponsibly/
Chasing down a disease-
related variant: Canavan
disease.
http://www.snpedia.com/index.php/Rs12948217
chr17:3397702 (hg19) in HG002 sample (son)
The son and both parents
are heterozygous (1/2) for
this – they are carriers,
but not afflicted with
disease.
¼ of the parents’ children would
be homozygous for the allele
and probably be affected
by Canavan disease:
“Children who inherit two
copies of the gene
appear normal at birth,
but between three and
nine months of age they
begin to show symptoms
... These children cannot
sit, crawl, or talk, and few
live past age 10.”
http://www.snpedia.com/index.php/Can
ease
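Where the ¼ comes from: each parent passes on one of their two copies at random, so two carriers give four equally likely combinations. A small sketch that enumerates them; the allele labels "N" (typical) and "d" (disease-associated) are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring_genotypes(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Enumerate child genotypes when each parent passes on one allele at random."""
    outcomes = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(outcomes.values())
    return {genotype: Fraction(n, total) for genotype, n in outcomes.items()}


# Both parents are carriers: one typical copy and one disease-associated copy.
result = offspring_genotypes("Nd", "Nd")
for genotype, probability in sorted(result.items()):
    print(genotype, probability)
```

The "dd" outcome is the homozygous case from the slide; the "Nd" outcome, at one half, is a carrier like the son and both parents.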
Challenges in actually
interpreting – “version hell”.
Variant is actually a T.
SNPedia says A is the problematic variant, but that’s on
hg38.
On hg19, which is what the variants were called on, the relevant
gene is on the reverse strand, so T => A.
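The T versus A confusion is a strand issue: the same base pair reads as T on one strand and A on the other. A minimal sketch of the conversion; the helper names are made up, and whether two sources report on opposite strands still has to be looked up for each variant and genome build.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the bases found on the opposite strand, position by position."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand as it would be read in its own direction."""
    return complement(seq)[::-1]


def same_allele(called: str, reported: str, opposite_strands: bool) -> bool:
    """Compare a called allele with a database allele, flipping strand if needed."""
    return (complement(called) if opposite_strands else called) == reported


print(complement("T"))
print(same_allele("T", "A", opposite_strands=True))
print(same_allele("T", "A", opposite_strands=False))
```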
Human migrations into Europe (~40kya – fall of Roman Empire)
Veeramah and Novembre, doi:10.1101/cshperspect.a008516
Veeramah and Novembre, doi:10.1101/cshperspect.a008516
Human genetic comparisons overlaid on map of Europe.
Predicting new disease
variants:
Can we find associations between variants and diseases?
“Genome Wide Association Study (GWAS)”
Wellcome Trust Case Control Consortium, 2007,
doi:10.1038/nature05911
…cautions of GWAS:
Need to account for relatedness in samples;
Large sample sizes needed;
Complex statistics needed & “multiple testing” issues;
Different identifier/database mixtures;
Correlation is not causation;
Large effects are rare – typically many small signals
combined.
The data science problem from hell!
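The "multiple testing" issue in numbers: test enough variants and some will look significant by chance alone. A sketch using the Bonferroni correction, which is one common and conservative fix (the slide does not say which correction any given study uses). The count of 500,000 tested variants is a hypothetical figure for illustration.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of chance 'hits' if no variant has any real effect."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the overall error rate near alpha."""
    return alpha / n_tests


alpha = 0.05
n_variants = 500_000  # hypothetical number of variants tested

chance_hits = expected_false_positives(alpha, n_variants)
cutoff = bonferroni_threshold(alpha, n_variants)
print(f"chance hits at p < {alpha}: about {chance_hits:,.0f}")
print(f"Bonferroni cutoff: p < {cutoff:.1e}")
```

A cutoff that strict is also part of why large sample sizes are needed: small effects only reach it with a lot of data.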
Where next?
Short-term: next 2-5 years
Medium-term: 10 years
Long-term: 20 years+
Short term
Lots more data! “Millions to billions of human
genomes” coming.
Individual data – est 300,000 human genomes
sequenced in 2014.
Tumor and somatic data.
Time course data (“narcissome”) - Mike Snyder
Newer sequencing data types – e.g. longer reads.
see: http://www.nature.com/news/the-rise-of-the-narciss-ome-1.10240
Short-term software
problems
Increasingly many open source Python projects
(bcbio, Gemini);
Help with integration between tools (dependency
hell, versioning hell);
Optimization of specific approaches not so
important.
Lack of concordance => technical problem.
General speed ~meh
Flexible and robust libraries still maturing.
Medium term
We’ll be sequencing everything all the time (but still
won’t really know what it means); => data integration
and data mining.
Large scale sequencing is rapidly being extended to
agriculture, ecology, and veterinary medicine.
We will soon be able to “edit” whatever genomes we
want (check out CRISPR), but will not have a good
idea of what to actually edit (cf. the Perl 8 analogy,
above).
Read up on “gene drive” if you want the bejeezus scared out of you:
http://news.sciencemag.org/biology/2015/03/chain-reaction-spreads-gene-through-insects
Longer term
No one knows.
We’ve only had large scale sequencing & the human
genome for ~15 years!!
Free associate the following:
cheap sequencing; quantified self; Internet of Things.
How to get involved?
A lot of the software is open source!
(bwa, samtools, etc. etc.)
…but:
Warning: genomics is large, and deep, and largely invisible, and
has its own culture.
Sadly, your best bet is probably to come do a PhD with someone like me, for
free.
(just kidding! …)
bcbio and Gemini
Help with:
Gemini: SQLite to PostgreSQL conversion;
Gemini: “bigwig” parsing performance;
bcbio: improving use & cleanliness of Cloud port
bcbio: moving to Common Workflow Language (note,
reference implementation in Python)
See talk blog post at http://ivory.idyll.org/blog/2015-pycon-talk.html
for more info.
How can you sequence your
own genome?
Most genetic testing services (23andMe, etc.) don’t
actually sequence your 6 billion bases of DNA; they
instead use a more targeted approach and look at
common variants or known disease variants.
If it costs < $1000, they’re not actually sequencing you :)
DNA extraction, etc, is fairly straightforward if you have
access to a lab and the necessary expertise.
Main suggestion: see http://www.personalgenomes.org/
Thanks for coming!
Please see links to data, instructions, and more reading at
http://ivory.idyll.org/blog/2015-pycon-talk.html

mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 

Recently uploaded (20)

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 

2015 pycon-talk

  • 1. How to interpret your own genome. C. Titus Brown ctbrown@ucdavis.edu @ctitusbrown http://ivory.idyll.org/blog/ Second in my ongoing attempt to explain what I actually do to Terry Peppers.
  • 2. Some basic facts about DNA The primary DNA sequence consists of strings of A, C, G, and T. Most human cells contain approximately 6 billion of these. They are divided into 23 chromosome pairs. These chromosomes are the primary unit of heredity. http://classes.biology.ucsd.edu/bimm110.SP07/lectures_WEB/L08.05_Cytogenetics.htm
  • 3. How DNA is interpreted – “It’s complicated.” http://www.exploringnature.org/db/detail.php?dbID=106&detID=2454
  • 4. How inheritance & generation of variation works http://genetics.thetech.org/ask/ask435 + approximately 300-600 mutations per generation
  • 5. If we knew a person’s genome sequence perfectly… We still wouldn’t know all that much! We could correlate variation between genomes with diseases. We could identify parentage and genetic inheritance. We could probably identify ethnic origin. We could find known “mistakes” or problems.
  • 6. But… why wouldn’t we know that much?? Isn’t the genome the person? Let’s ignore environmental factors, first of all…
  • 7. Imagine… …you’re locked in a room, with feral lawyers roaming around outside; You have a bunch of source code on a stack of CDs to understand; And you’ve been given a Windows 98 machine with Python installed. (see David Beazley, “Discovering Python”, PyCon 2014) This talk came partly from listening to his talk…
  • 8. This “locked room” problem is a pretty good analogy to genomics! “Here are 3 billion characters of DNA! Go figure out what it all means!” It’s like the previous locked room problem, and: The code is all written in Perl 8, for which neither a specification nor a software interpreter exists. But you have access to the Internet and a world-wide collection of other scientists, and (some of) their data and papers. Oh, and: the answers hold the keys to life and death.
  • 9. Genomes are still useful! How do we find sequence? Primary approach for human genomes is: spend a lot of money sequencing one, or a few; use that as reference. Initial cost: $2.7 bn (in 1991) Current human genome reference is from 13 anonymous volunteers in Buffalo, NY (Wikipedia ;) Older technology: identify points of variation, then target for further investigation. Current technology: sequence. (The rest of this talk.) Next technology: longer reads. (Sequence more, better.)
  • 10. Working with short read sequencing - overview Sequence Map Call variants Interpret
  • 11. Working with short read sequencing - sequencing Need about 250 ng of DNA at 2 ng/ul. “Under $1,000 dollars” http://biome.biomedcentral.com/welcome-to-the-1000-genome/ …some up front investment required :) Sequence Map Call variants Interpret
  • 12. Working with short read sequencing - sequencing Sequence Map Call variants Interpret Raw data looks something like this (x 2 bn): @D00360:18:H8VC6ADXX:1:1103:1434:46766/1 AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA + @@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD
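The raw record on this slide is in FASTQ format: a name line starting with `@`, the bases, a `+` separator, and one quality character per base. A minimal sketch of decoding it, assuming the common Phred+33 quality encoding (the slide does not state the encoding; the function name `parse_fastq_record` is made up for illustration):

```python
# The four-line FASTQ record from the slide, with its line breaks restored.
record = """@D00360:18:H8VC6ADXX:1:1103:1434:46766/1
AACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCACACA
+
@@@DDDDDFHHFHHIIIBHGIIDGIA;EDGD@CG@FDDEFFB@DCGHGGIG8CHGD"""


def parse_fastq_record(text):
    """Split one FASTQ record into (name, sequence, quality scores)."""
    name, seq, plus, qual = text.splitlines()
    if not name.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: Phred+33, i.e. score = ASCII code of the character minus 33.
    scores = [ord(ch) - 33 for ch in qual]
    return name[1:], seq, scores


name, seq, scores = parse_fastq_record(record)
print(name)
print(len(seq), "bases; lowest quality score:", min(scores))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 30 means roughly a 1-in-1000 chance that the base call is wrong.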
  • 13. Mapping: locate sequences in reference http://en.wikipedia.org/wiki/File:Mapping_Reads.png Sequence Map Call variants Interpret FASTQ => => BAM
  • 15. Variant detection after mapping http://www.kenkraaijeveld.nl/genomics/bioinformatics/ Sequence Map Call variants Interpret BAM => => VCF
  • 17. Working with short-read sequencing – annotate variants Is it a variant known to have an effect? Is it in a gene? Is it in a gene and does it have some “obvious” effect (e.g. breaking the gene)? Has it been associated with some effect? Sequence Map Call variants Interpret
  • 18. Pipeline, approaches, formats, technologies. Sequence Map Call variants Interpret Illumina BWA Samtools FreeBayes VEP SNPedia Gemini bcbio See http://ivory.idyll.org/blog/2015-pycon-talk.html for details. ~1500 hours ~12 hours ~100 hours
  • 19. An example data set Sequences from a “trio” (son, father, mother) of Ashkenazi Jews are available, together with medical records (see links in blog post). The Ashkenazim branched off from other Jews ~2500 years ago, flourished during Roman Empire, then “went through a 'severe bottleneck' as they dispersed, reducing a population of several million to just 400 families who left Northern Italy around the year 1000.” http://en.wikipedia.org/wiki/Ashkenazi_Jews#Genetics
  • 20. “Raw” human data: BAM file: 108 GB (contains sequences + quality scores) + human genome (~3 GB or so) + lots of databases of varying size. Full instructions at: http://ivory.idyll.org/blog/2015-pycon-talk.html
  • 21. Working with short-read sequencing – mapping. Software such as BWA takes in a reference genome and a set of reads and yields tab-delimited output: D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22 16050001 15 87S8M1I10M1D41M1S = 16050476 621 CCA…. 3((… This contains information about where each read maps, how well it maps, etc. Sequence Map Call variants Interpret
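The mapper output on this slide is a SAM record. A rough sketch of reading its leading fields, using only the part of the line that survives in the transcript (the sequence and quality columns are elided there, so they are left out). The field order and CIGAR operations follow the SAM format; the variable names are illustrative:

```python
import re

# First nine columns of the record from the slide. Real SAM is tab-delimited;
# the transcript lost the tabs, so this splits on whitespace instead.
line = ("D00360:37:HA3HMADXX:1:2104:14000:62852 163 chr22 16050001 15 "
        "87S8M1I10M1D41M1S = 16050476 621")
qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen = line.split()
flag, pos, mapq = int(flag), int(pos), int(mapq)

# A CIGAR string is a run of (length, operation) pairs.
ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

# M, I, S, = and X consume read bases; M, D, N, = and X consume reference bases.
read_len = sum(n for n, op in ops if op in "MIS=X")
ref_span = sum(n for n, op in ops if op in "MDN=X")

# A subset of the bitwise FLAG meanings.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}
flags = [label for bit, label in FLAG_BITS.items() if flag & bit]

print(f"{qname} maps to {rname}:{pos} with mapping quality {mapq}")
print(f"read length {read_len}, reference span {ref_span}, flags: {flags}")
```

Here the CIGAR string says that the first 87 bases of the read are soft-clipped (not aligned), which fits the low mapping quality of 15.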
  • 22. Most parts of the genome are sampled many times (~50, here) HG002 data set Sequence Map Call variants Interpret
  • 24. Working with short-read sequencing – annotate variants HG002 data set. Variants annotated with VEP using Gemini. Sequence Map Call variants Interpret
  • 25. Most differences are ~uninterpretable! Total variants: 5,562,545 Between genes: 3,032,670 Between parts of genes (exons): 2,014,962 Remaining: 514,913 (Only 2% of human genome makes genes; maybe ~5% of genome thought to be functional) HG002 data set
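The counts on this slide can be checked with a little arithmetic. This sketch only uses the numbers given on the slide; the variable names are illustrative:

```python
# Variant counts for the HG002 data set, as given on the slide.
total = 5_562_545
between_genes = 3_032_670
between_exons = 2_014_962

# Whatever is neither between genes nor between exons is the "remaining" set.
remaining = total - between_genes - between_exons
remaining_share = remaining / total

for label, count in [
    ("between genes", between_genes),
    ("between exons", between_exons),
    ("remaining", remaining),
]:
    print(f"{label:>14}: {count:>9,} ({count / total:.1%})")
```

The remainder matches the 514,913 on the slide, a bit over 9% of all variants, so more than 90% of the differences fall outside the parts of genes that are easiest to interpret.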
  • 26. OK, you’ve got your variants – now what?? HT to Slate Star Codex, http://slatestarcodex.com/2014/11/12/how-to-use-23andme-irresponsibly/
  • 27. Chasing down a disease- related variant: Canavan disease. http://www.snpedia.com/index.php/Rs12948217
  • 28. chr17:3397702 (hg19) in HG002 sample (son) The son and both parents are heterozygous (1/2) for this – they are carriers, but not afflicted with disease. ¼ of their children would have homozygous allele and probably be affected by Canavan’s Disease: “Children who inherit two copies of the gene appear normal at birth, but between three and nine months of age they begin to show symptoms ... These children cannot sit, crawl, or talk, and few live past age 10.” http://www.snpedia.com/index.php/Can ease
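The "¼ of their children" figure is plain Mendelian inheritance for two carrier parents. A small sketch that enumerates the possibilities, assuming each parent passes on one of their two alleles with equal probability (the allele labels `N` and `d` and the function name are made up for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """Return genotype probabilities for a child of two parents.

    Each parent is a pair of alleles; the child gets one allele from each,
    chosen with equal probability.
    """
    counts = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hypothetical labels: "N" is the normal allele, "d" the disease allele.
carrier = ("N", "d")
probs = offspring_genotypes(carrier, carrier)
for genotype, p in probs.items():
    print(genotype, p)
```

Two carriers give a 1/4 chance of a child with two disease alleles, a 1/2 chance of another carrier, and a 1/4 chance of a child with no disease allele.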
  • 29. Challenges in actually interpreting – “version hell”. Variant is actually a T. SNPedia says A is the problematic variant, but that’s on hg38. On hg19, which is what the variants were called on, the relevant gene is on the reverse strand, so T => A.
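The T => A step is a strand flip: a base reported on one strand corresponds to its complement on the other. A minimal sketch of that conversion (the helper names are illustrative, not taken from any of the tools in the talk):

```python
# Watson-Crick pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(bases):
    """Complement each base, e.g. to compare alleles across strands."""
    return bases.translate(COMPLEMENT)


def reverse_complement(seq):
    """Read a sequence off the opposite strand (complement, then reverse)."""
    return complement(seq)[::-1]


print(complement("T"))             # the single-base case from the slide
print(reverse_complement("AACG"))  # longer sequences also reverse direction
```

For a single-base variant only the complement matters; for anything longer, the sequence on the other strand also reads in the opposite direction.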
  • 30. Human migrations into Europe (~40kya – fall of Roman Empire) Veeramah and Novembre, doi:10.1101/cshperspect.a008516
  • 31. Veeramah and Novembre, doi:10.1101/cshperspect.a008516 Human genetic comparisons overlaid on map of Europe.
  • 32. Predicting new disease variants: Can we find associations between variants and diseases? “Genome Wide Association Study (GWAS)” Wellcome Trust CCT, 2007, doi:10.1038/nature05911
  • 33. …cautions of GWAS: Need to account for relatedness in samples; Large sample sizes needed; Complex statistics needed & “multiple testing” issues; Different identifier/database mixtures; Correlation is not causation; Large effects are rare – typically many small signals combined. The data science problem from hell!
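To make the “multiple testing” issue concrete: if every variant is tested separately, a fixed significance cutoff produces many false positives by chance alone. The sketch below uses a Bonferroni correction, which is one simple way to handle this and not necessarily what any given study does. The number of tests is a hypothetical round figure, not a value from the talk:

```python
def expected_false_positives(alpha, n_tests):
    """Expected chance hits if every test is a true null, with no correction."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


alpha = 0.05
n_tests = 1_000_000  # hypothetical number of independently tested variants

print("uncorrected false positives:", expected_false_positives(alpha, n_tests))
print("corrected per-test cutoff:", bonferroni_threshold(alpha, n_tests))
```

With a million tests, a 0.05 cutoff would be expected to flag tens of thousands of variants by chance, while the corrected cutoff is so strict that only large samples can reach it, which ties back to the sample-size caution above.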
  • 34. Where next? Short-term: next 2-5 years Medium-term: 10 years Long-term: 20 years+
  • 35. Short term Lots more data! “Millions to billions of human genomes” coming. Individual data – est 300,000 human genomes sequenced in 2014. Tumor and somatic data. Time course data (“narcissome”) - Mike Snyder Newer sequencing data types – e.g. longer reads. see: http://www.nature.com/news/the-rise-of-the-narciss-ome-1.10240
  • 36. Short-term software problems Increasingly many open source Python projects (bcbio, Gemini); Help with integration between tools (dependency hell, versioning hell); Optimization of specific approaches not so important. Lack of concordance => technical problem. General speed ~meh Flexible and robust libraries still maturing.
  • 37. Medium term We’ll be sequencing everything all the time (but still won’t really know what it means); => data integration and data mining. Large scale sequencing is rapidly being extended to agriculture, ecology, and veterinary medicine. We will soon be able to “edit” whatever genomes we want (check out CRISPR), but will not have a good idea of what to actually edit (cf. Perl 8 analogy, above). Read up on “gene drive” if you want the bejeezus scared out of you: http://news.sciencemag.org/biology/2015/03/chain-reaction-spreads-gene-through-insects
  • 38. Longer term No one knows. We’ve only had large scale sequencing & the human genome for ~15 years!! Free associate the following: cheap sequencing; quantified self; Internet of Things.
  • 39. How to get involved? A lot of the software is open source! (bwa, samtools, etc. etc.) …but: Warning: genomics is large, and deep, and largely invisible, and has its own culture. Sadly, your best bet is probably to come do a PhD with someone like me, for free. (just kidding! …)
  • 40. bcbio and Gemini Help with: Gemini: SQLite to PostgreSQL conversion; Gemini: “bigwig” parsing performance; bcbio: improving use & cleanliness of Cloud port; bcbio: moving to Common Workflow Language (note, reference implementation in Python). See talk blog post at http://ivory.idyll.org/2015-pycon-talk.html for more info.
  • 41. How can you sequence your own genome? Most genetic testing services (23andme, etc.) don’t actually sequence your 6 billion bases of DNA; they instead use a more targeted approach and look at common variants or known disease variants. If it costs < $1000, they’re not actually sequencing you :) DNA extraction, etc, is fairly straightforward if you have access to a lab and the necessary expertise. Main suggestion: see http://www.personalgenomes.org/
  • 42. Thanks for coming! Please see links to data, instructions, and more reading at http://ivory.idyll.org/blog/2015-pycon-talk.html