2. Human Genome Project
• Historical context.
• Goals of the HGP.
• Strategy.
• Results.
• Impact on the biomedical domain.
3. « Finished » sequence
April 1953-April 2003
February 2001
5. Brief history of HGP
1984 to 1986 – first proposed at US DOE meetings
1988 – endorsed by US National Research Council
(Funded by NIH and US DOE; $3 billion set aside)
1990 – Human Genome Project started (NHGRI)
Later – UK, France, Japan, Germany, China
1998 – Celera announces a 3-year plan to complete
the project years early
2001 – First draft published in Science and Nature
(February)
2003 – Finished human genome sequence published in
Nature
6. Challenges
• Genome Attributes
– Size
– Polymorphism
– Repeats (Smaller repeats are technically difficult to sequence,
some sequences are repeated all over the genome: How can these
be placed?).
• Available Technology
– 600 bp per “read” (sequencing works by extension from a primer
followed by gel electrophoresis; read length is limited by the
resolution of the gel).
– Error (~1 error per 600 bases. Sequencing multiple times decreases
error, since the same error is unlikely in multiple reads. 10x
coverage = error rate ~1/10,000).
– Relies on cloning (some regions, such as heterochromatin, are
difficult to clone; some sequences rearrange or are deleted when
cloned).
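To make the read-length and coverage figures above concrete, here is a minimal sketch in Python. It assumes the ~3 billion bp genome size quoted elsewhere in these slides and 600 bp reads; the function names `reads_needed` and `consensus` are illustrative only, and the majority vote is a simplification of real base calling, not a model of the ~1/10,000 figure.

```python
from collections import Counter

GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs (figure quoted in the slides)
READ_LENGTH_BP = 600            # bases per read (figure quoted in the slides)


def reads_needed(coverage, genome_size=GENOME_SIZE_BP, read_length=READ_LENGTH_BP):
    """Number of reads whose combined length equals coverage x genome size."""
    return coverage * genome_size // read_length


def consensus(aligned_reads):
    """Majority vote per column over equal-length, already-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Five reads of the same region; two of them carry one independent error each.
reads = ["ACGTTGCA", "ACGTTGCA", "ACCTTGCA", "ACGTTGCA", "ACGTAGCA"]

print(reads_needed(10))   # reads required for 10x coverage
print(consensus(reads))   # the isolated errors are out-voted
```

Because independent errors rarely hit the same position in several reads, the vote recovers the true base at each column, which is the intuition behind sequencing every region multiple times.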
7. Goals of HGP
• Create a genetic and physical map of the 24
human chromosomes (22 autosomes, X & Y)
• Identify the entire set of genes & map them all to
their chromosomes
• Determine the nucleotide sequence of the
estimated 3 billion base pairs
• Analyze genetic variation among humans
• Map and sequence the genomes of model
organisms
8. Model organisms
• Bacteria (Escherichia coli, Haemophilus influenzae, several others)
• Yeast (Saccharomyces cerevisiae)
• Plant (Arabidopsis thaliana)
• Roundworm (Caenorhabditis elegans)
• Fruit fly (Drosophila melanogaster)
• Mouse (Mus musculus)
12. Whole-genome shotgun sequencing
Used by the private company Celera to sequence the whole human genome
• Whole genome randomly
sheared three times
– Plasmid library constructed
with ~ 2kb inserts
– Plasmid library with ~10 kb
inserts
– BAC library with ~ 200 kb
inserts
• Computer program assembles
sequences into chromosomes
• No physical map construction
• Only one BAC library
• Reduces problems of repeat
sequences
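The idea of a program assembling random fragments can be illustrated with a toy greedy overlap merge. This is only a sketch under strong assumptions (error-free fragments, one strand, no repeats); it is not Celera's assembler, which also relied on paired reads from the different insert sizes. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping fragments cut from the toy sequence ATGGCGTACGTTAGC.
fragments = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(fragments)
print(contigs)
```

With repeated sequences, several merges can look equally good, which is why real assemblers need the extra distance information from paired reads of known insert size.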
15. Human genome content
The Human Genome
Total length 3000 Mb
~ 40,000 genes (coding seq)
Gene sequences < 5%
Exons ~ 1.5% (coding)
Introns ~ 3.5% (noncoding)
Intergenic regions (junk) > 95%
Repeats > 50%
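The percentages above can be turned into absolute sizes with simple arithmetic. The sketch below uses only the figures on this slide (3000 Mb total, exons ~1.5%, introns ~3.5%, repeats > 50%); the helper name `fraction_to_mb` is illustrative.

```python
TOTAL_LENGTH_MB = 3000  # total genome length from the slide


def fraction_to_mb(percent, total_mb=TOTAL_LENGTH_MB):
    """Convert a percentage of the genome into megabases."""
    return total_mb * percent / 100


exons_mb = fraction_to_mb(1.5)    # coding sequence
introns_mb = fraction_to_mb(3.5)  # noncoding sequence within genes
repeats_mb = fraction_to_mb(50)   # lower bound, since repeats are > 50%

print(exons_mb, introns_mb, repeats_mb)
```

So, on these figures, the coding portion is a few tens of megabases, while repeats account for more than half of the 3000 Mb.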
16. Global properties
• Pericentromeric and subtelomeric regions of
chromosomes filled with large recent transposable
elements
• Marked decline in the overall activity of
transposable elements or transposons
• Male mutation rate about twice female
– most mutation occurs in males
• Recombination rates much higher in distal regions
of chromosomes and on shorter chromosome arms
– at least one crossover per chromosome arm in each
meiosis
17. Important features of Human proteome
• 30,000–40,000 protein-coding genes
• Proteome (full set of proteins) more complex than
those of invertebrates.
– pre-existing components arranged into richer
architectures.
• Hundreds of genes seem to come from horizontal
transfer from bacteria (questionable)
• Dozens of genes seem to come from transposable
elements.
18. Noncoding RNA genes
• Transfer RNAs (tRNAs) – adaptors that translate
triplet code of RNA into amino acid sequence of
proteins
• Ribosomal RNAs (rRNAs) – components of
ribosome
• Small nucleolar RNAs (snoRNAs) – RNA
processing and base modification in nucleolus
• Small nuclear RNAs (snRNAs) – components of spliceosomes
19. Human races have similar genes
• Genome sequence centers have sequenced
significant portions of at least three races
• Range of polymorphisms within a race can
be much greater than the range of
differences between any two individuals of
different race
• Very few genes are race specific
20. Complexity of proteome increases from
yeast to humans
– More genes
– Shuffling, increase, or decrease of functional
modules
– Alternative RNA splicing – humans exhibit
significantly more
– Chemical modification of proteins is higher in
humans
21. Yeast
• 70 human genes are known to repair mutations in yeast
• Nearly all we know about cell cycle and cancer comes from
studies of yeast
• Advantages:
– fewer genes (6000)
– few introns
• 31% of yeast genes give same products as human
homologues
22. Drosophila
• Nearly all we know of how mutations affect gene function comes
from Drosophila studies
• We share 50% of their genes
• 61% of genes mutated in 289 human diseases are found in
fruit flies
• 68% of genes associated with cancers are found in fruit flies
• Knockout mutants
• Homeobox genes
23. C. elegans
• 959 somatic cells in the adult hermaphrodite, 302 of them neurons
• 131 of the 1,090 cells generated during development are
programmed for apoptosis
• Apoptosis involved in several human neurological
disorders
– Alzheimer's
– Huntington's
– Parkinson's
24. Mouse
• Known as “mini” humans
• Very similar physiological systems
• Share 90% of their genes
25. Questions Remain about the
Human Genome
– Difficult to precisely estimate number of genes
at this time
• Small genes are hard to identify
• Some genes are rarely expressed and do not have
normal codon usage patterns – thus hard to detect
27. Applications to medicine and
biology
• Disease genes
– human genomic sequence in public databases
allows rapid identification of disease genes in
silico
• Drug targets
– pharmaceutical industry has depended upon a
limited set of drug targets to develop new
therapies
– now can find new target in silico
• Basic biology
– basic physiology, cell biology…
28. The potential benefits of identifying genes/variations
involved in disease
• Improve the understanding of disease etiology and mechanism
• Early disease risk assessment
• Discover new drug targets
• Disease prevention
• Population or ethnic group variability
• Predisposition
• Targeted screening
• Prevention
• Diagnosis
• Therapy
• Predictive medicine
You have all heard the announcement that the human genome has been sequenced: a first draft was published in February 2001, followed by a « finished » sequence in April 2003.
The achievement of this project was compared to man's first step on the moon.
Context: great disappointment with the space program and a lot of money to spend somewhere else; a huge fear of radiation and the atomic bomb, whose main effect is cancer. Hence the aim: sequence the whole genome, read the book of life.
Before going to the Gene map, genetic maps have been constructed, vectors