Lecture 3,4

Genome Sequencing Projects, Genome
Size, Application of sequence information for
identification of disease genes

Complete Genome Sequencing
 Whole genome shotgun sequencing
 BAC end sequencing
 Chromosome walking
 End sealing

Reference: http://en.wikipedia.org/wiki/File:Genome_Sizes.png

Nextgen sequencing methods
 454 sequencing methods(2006)
 Principles of pyrophosphate detection(1985, 1988)

 Illumina(Solexa) Genome sequencing methods(2007)
 Applied Biosystems ABI SOLiD System(2007)
 Helicos single molecule sequencing(Helioscope, 2007)
 Pacific Biosciences single-molecule real-time(SMRT)
technology, 2010
 Sequenom for Nanotechnology based sequencing.
 BioNanomatrixnanofluidiscs
 RNAP technology
http://www.ncbi.nlm.nih.gov/books/NBK20261/

Sequencing methods

http://www.wellcome.ac.uk/Education-resources/Teaching-and-
education/Animations/DNA/WTDV026689.htm

Ref: http://www.wellcome.ac.uk/Education-resources/Teaching-and-
education/Animations/DNA/WTX056046.htm

http://www.wellcome.ac.uk/Education-resources/Teaching-and-
education/Animations/DNA/WTX056051.htm

http://www.genomesonline.org/cgi-bin/GOLD/index.cgi

http://www.insdc.org/ http://www.ebi.ac.uk/embl
/Contact/collaboration.ht
ml

Microbial Genome Sequencing
• JGI – IMG [http://img.jgi.doe.gov/]
• Broad [http://www.broadinstitute.org/]
• TIGR [http://www.jcvi.org/]
• WashU [http://genome.wustl.edu/]
• VBI at Virginia Tech [www.vbi.vt.edu]

Human Genome Project
NHGRI
Solicited RFAs were
First
pilot sought for
Publicati
proposal for full
on in
ENCODE ENCODE
2000

In October GWAS -
Finished 90% lies First Report
1990 Human ENCODE
paper in outside on Encode
Genome coding published
2003 Published in
project started 2005 2012
2007

What happens next?
 You have 10 million characters – what to do with them?
 Locate genes
 Determine the function of the gene
 By similarity search
 By domain search
 By Predicting signal peptide
 By locating transmembrane region

Ref: http://www.nature.com/nature/journal/v406/n6797/pdf/406799a0.pdf

Genome Annotation

Run 6 frame Run Blastp
ATGAAGATAGACAG translation with nr
CATACTAGCAGCAT
AGAATAGATAAGAG
ATAGAAATAGAATA Matc
h
AATATAAGAGAGA found
N
o

Repeat
Finding, miRN Product found
A
Make an
finding, tRNAs
hmmsearch
can etc. N
O
Pathway analysis
Matc
Other analysis
h
found

Unknown
Genes Hypothesis

Genome Sizes
 Gametic Nuclear DNA content
 Represented as mass in pg(pico grams) or length in
mega bases

1 pg = 10^-12 gms
1mb = 10^6 bases
1 pg = 978 Mb

Ref: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1669731/

Genome Sizes
 Database of Genome Sizes
 http://www.cbs.dtu.dk/databases/DOGS/
 Plant Genome database
 http://www.kew.org/genomesize/homepage.html
 Mamalian genome size database
 http://www.unipv.it/webbio/dbagsdb.htm
 Animal Genome size database
 www.genomesize.com
 Fungal Genome size database.
 www.zbi.ee/fungal-genomesize

Ref: http://www.kew.org/genomesize/homepage.html

Ref: http://www.genomesize.com/

Ref: http://www-3.unipv.it/webbio/dbagsh.htm

Ref: http://www.zbi.ee/fungal-genomesize/

Identifying Human Disease genes
ref: http://www.ncbi.nlm.nih.gov/books/NBK7561/

 Before 1980, very few genes were recognized
 Reverse Genetics: Know gene product and go back to
gene and do a positional cloning
 Genetic Redundancy: Multiple genes have the same
function

Identification of genes through
protein product

1000 genomes project
 1092 genomes of different individuals sequenced.
 14 populations
 Low coverage exome sequencing

38 million SNPs
1.4 million short insertions
14,000 large deletions

Ref: http://www.nature.com/nature/journal/v491/n7422/full/nature11632.html

Lecture 3,4

Recommended

Recommended

More Related Content

What's hot

What's hot (14)

Similar to Lecture 3,4

Similar to Lecture 3,4 (20)

More from Sucheta Tripathy

More from Sucheta Tripathy (20)

Recently uploaded

Recently uploaded (20)

Lecture 3,4