The Human Genome Project
The Human Genome Project Began
in 1990
• The Mission of the HGP: The quest to
understand the human genome and the
role it plays in both health and disease.
“The true payoff from the HGP will be the ability to
better diagnose, treat, and prevent disease.”
--- Francis Collins, Director of the HGP and the National Human
Genome Research Institute (NHGRI)
The genome is our Genetic Blueprint
• Nearly every human
cell contains 23 pairs
of chromosomes
– 1 - 22 and XY or XX
•XY = Male
•XX = Female
• Length of chr 1-22, X,
Y together is ~3.2
billion bases (about 2
meters diploid)
The Genome is
Who We Are on the inside!
• Chromosomes consist
of DNA
– molecular strings of
A, C, G, & T
– base pairs, A-T, C-G
• Genes
– DNA sequences that
encode proteins
– less than 3% of human
genome
Information coded
in DNA
CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATG
AGAAAAATACAGTGATTCC
AAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAAT
C
AGAGCATCTGC TACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAA
CTGATGTACCTCTGAAAAGAGA AGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCC
CTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAA
T
GAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATC ATTTTCAGGTT
CTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGT
CATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAA
ATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGA
TTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTAC
C
AGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCT
AAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCT
CAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAA
G
AACACAGTGAA ACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATAT
TTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAA
AACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAAT
TAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAA
A
TTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAGCATTGTCATCCAAGTAGATGTG
TCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTAC
ATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGC
T
ATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGC
C
ATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTG
G
TACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATC
A
TTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCT
C
ATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGA
A
CATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAA
G
AACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCAT
G
CTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAAC
T
GGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGT
T
CATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATT
5000 bases per page
The Beginning of the Project
• Most the first 10 years of the project were
spent improving the technology to
sequence and analyze DNA.
• Scientists all around the world worked to
make detailed maps of our chromosomes
and sequence model organisms, like
worm, fruit fly, and mouse.
The Challenges were
Overwhelming
• First there was the
Assembly
The DNA sequence is so long that
no technology can read it all at
once, so it was broken into pieces.
There were millions of clones
(small sequence fragments).
The assembly process included
finding where the pieces
overlapped in order to put the
draft together.
3,200,000 piece puzzle anyone?
Introduction
• Until the early 1970’s, DNA was the most
difficult cellular molecule for biochemists
to analyze.
• DNA is now the easiest molecule to
analyze – we can now isolate a specific
region of the genome, produce a virtually
unlimited number of copies of it, and
determine its nucleotide sequence
overnight.
Introduction
• At the height of the Human Genome
Project, sequencing factories were
generating DNA sequences at a rate of
1000 nucleotides per second 24/7.
• Technical breakthroughs that allowed the
Human Genome Project to be completed
have had an enormous impact on all of
biology…..
Human Genome Project
Goals:
■ identify all the approximate 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human
DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Milestones:
■ 1990: Project initiated as joint effort of U.S. Department of Energy and the National
Institutes of Health
■ June 2000: Completion of a working draft of the entire human genome (covers >90%
of the genome to a depth of 3-4x redundant sequence)
■ February 2001: Analyses of the working draft are published
■ April 2003: HGP sequencing is completed and Project is declared finished two years
ahead of schedule
What does the draft human
genome sequence tell us?
By the Numbers
• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest
known human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at around 30,000--much lower than
previous estimates of 80,000 to 140,000.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
What does the draft human
genome sequence tell us?
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of
the DNA building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T.
GC- and AT-rich regions usually can be seen through a microscope as light and
dark bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast
expanses of noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur
adjacent to gene-rich areas, forming a barrier between the genes and the "junk
DNA." These CpG islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest
(231).
What does the draft human
genome sequence tell us?
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least
50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light
on chromosome structure and dynamics. Over time, these repeats reshape the
genome by rearranging it, creating entirely new genes, and modifying and
reshuffling existing genes.
• The human genome has a much greater portion (50%) of repeat sequences than
the mustard weed (11%), the worm (7%), and the fly (3%).
What does the draft human
genome sequence tell us?
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other
organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm
because of mRNA transcript "alternative splicing" and chemical modifications to the
proteins. This process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants; but the
number of gene family members has expanded in humans, especially in proteins
involved in development and immunity.
• Although humans appear to have stopped accumulating repeated DNA over 50 million
years ago, there seems to be no such decline in rodents. This may account for some of
the fundamental differences between hominids and rodents, although gene estimates
are similar in these species. Scientists have proposed many theories to explain
evolutionary contrasts between humans and other organisms, including those of life
span, litter sizes, inbreeding, and genetic drift.
What does the draft human
genome sequence tell us?
Variations and Mutations
• Scientists have identified about 3 million locations where single-base DNA differences
(SNPs) occur in humans. This information promises to revolutionize the processes of
finding chromosomal locations for disease-associated sequences and tracing human
history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females.
Researchers point to several reasons for the higher mutation rate in the male germline,
including the greater number of cell divisions required for sperm formation than for eggs.
What does the draft human
genome sequence tell us?
• Led to the discovery of whole new classes of
proteins and genes, while revealing that many
proteins have been much more highly conserved
in evolution than had been suspected.
• Provided new tools for determining the
functions of proteins and of individual domains
within proteins, revealing a host of unexpected
relationships between them.
What does the draft human
genome sequence tell us?
• By making large amounts of protein
available, it has yielded an efficient way to
mass produce protein hormones and
vaccines
• Dissection of regulatory genes has
provided an important tool for unraveling
the complex regulatory networks by which
eukaryotic gene expression is controlled.
Molecular Biology Of The Cell. Alberts et al. 491-495
How does the human genome stack
up?
Organism Genome Size (Bases) Estimated Genes
Human (Homo sapiens) 3 billion 30,000
Laboratory mouse (M. musculus) 2.6 billion 30,000
Mustard weed (A. thaliana) 100 million 25,000
Roundworm (C. elegans) 97 million 19,000
Fruit fly (D. melanogaster) 137 million 13,000
Yeast (S. cerevisiae) 12.1 million 6,000
Bacterium (E. coli) 4.6 million 3,200
Human immunodeficiency virus (HIV) 9700 9
http://doegenomes.org
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and
disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental
restoration
• Developmental genetics, genomics
Future Challenges:
What We Still Don’t Know
Anticipated Benefits of
Genome Research
Molecular Medicine
• improve diagnosis of disease
• detect genetic predispositions to disease
• create drugs based on molecular information
• use gene therapy and control systems as drugs
• design “custom drugs” (pharmacogenomics) based on individual genetic profiles
Microbial Genomics
• rapidly detect and treat pathogens (disease-causing microbes) in clinical practice
• develop new energy sources (biofuels)
• monitor environments to detect pollutants
• protect citizenry from biological and chemical warfare
• clean up toxic waste safely and efficiently
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
Risk Assessment
• evaluate the health risks faced by individuals who may be exposed to radiation
(including low levels in industrial areas) and to cancer-causing chemicals and toxins
Bioarchaeology, Anthropology, Evolution, and Human Migration
• study evolution through germline mutations in lineages
• study migration of different population groups based on maternal inheritance
• study mutations on the Y chromosome to trace lineage and migration of males
• compare breakpoints in the evolution of mutations with ages of populations and
historical events
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
Anticipated Benefits of
Genome Research-cont.
DNA Identification (Forensics)
• identify potential suspects whose DNA may match evidence left at crime scenes
• exonerate persons wrongly accused of crimes
• identify crime and catastrophe victims
• establish paternity and other family relationships
• identify endangered and protected species as an aid to wildlife officials (could be
used for prosecuting poachers)
• detect bacteria and other organisms that may pollute air, water, soil, and food
• match organ donors with recipients in transplant programs
• determine pedigree for seed or livestock breeds
• authenticate consumables such as caviar and wine
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
Anticipated Benefits of
Genome Research-cont.
http://doegenomes.org
The Completion of the Human
Genome Sequence
• June 2000 White House
announcement that the majority
of the human genome (80%)
had been sequenced (working
draft).
• Working draft made available
on the web July 2000 at
genome.ucsc.edu.
• Publication of 90 percent of the
sequence in the February 2001
issue of the journal Nature.
• Completion of 99.99% of the
genome as finished sequence on
July 2003.
The Project is not Done…
•Next there is the Annotation:
The sequence is like a topographical map,
the annotation would include cities, towns,
schools, libraries and coffee shops!
So, where are the genes?
How do genes work?
And, how do scientists use
this information for
scientific understanding
and to benefit us?
What do genes do anyway?
• We only have ~27,000 genes, so that means that
each gene has to do a lot.
• Genes make proteins that make up nearly all we are
(muscles, hair, eyes).
• Almost everything that happens in our bodies
happens because of proteins (walking, digestion,
fighting disease).
Eye Color and Hair Color
are determined by genes
OR
OR
Of Mice and Men:
It’s all in the genes
Humans and Mice have about the same
number of genes. But we are so different
from each other, how is this possible?
One human gene can make many different
proteins while a mouse gene can only
make a few!
Did you say
cheese?
Mmm,
Cheese!
Genes are important
• By selecting different pieces of a gene, your
body can make many kinds of proteins. (This
process is called alternative splicing.)
• If a gene is “expressed” that means it is turned
on and it will make proteins.
What we’ve learned from our
genome so far…
• There are a relatively small number of human
genes, less than 30,000, but they have a complex
architecture that we are only beginning to
understand and appreciate.
-We know where 85% of genes are in the
sequence.
-We don’t know where the other 15% are
because we haven’t seen them “on” (they may only
be expressed during fetal development).
-We only know what about 20% of our genes
do so far.
• So it is relatively easy to locate genes in the
genome, but it is hard to figure out what they do.
How do scientists find genes?
• The genome is so large that useful
information is hard to find.
• Researchers at UCSC decided to make a
computational microscope to help
scientists search the genome.
• Just as you would use “google” to find
something on the internet, researchers
can use the “UCSC Genome Browser” to
find information in the human genome.
Explore it at http://genome.ucsc.edu
The UCSC Genome Browser
The browser takes you from
early maps of the genome . . .
. . . to a multi-resolution view . . .
. . . at the gene cluster level . . .
. . . the single gene level . . .
. . . the single exon level . . .
. . . and at the single base level
caggcggactcagtggatctggccagctgtgacttgacaag
caggcggactcagtggatctagccagctgtgacttgacaag
The Continuing Project
• Finding the complete set of genes and annotating
the entire sequence. Annotation is like detailing;
scientists annotate sequence by listing what has
been learn experimentally and computationally
about its function.
• Proteomics is studying the structure and function
of groups of proteins. Proteins are really
important, but we don’t really understand how
they work.
• Comparative Genomics is the process of
comparing different genomes in order to better
understand what they do and how they work.
Like comparing humans, chimpanzees, and mice
that are all mammals but all very different.

human genome project - genetics and gene mapping

  • 1.
  • 2.
    The Human GenomeProject Began in 1990 • The Mission of the HGP: The quest to understand the human genome and the role it plays in both health and disease. “The true payoff from the HGP will be the ability to better diagnose, treat, and prevent disease.” --- Francis Collins, Director of the HGP and the National Human Genome Research Institute (NHGRI)
  • 3.
    The genome isour Genetic Blueprint • Nearly every human cell contains 23 pairs of chromosomes – 1 - 22 and XY or XX •XY = Male •XX = Female • Length of chr 1-22, X, Y together is ~3.2 billion bases (about 2 meters diploid)
  • 4.
    The Genome is WhoWe Are on the inside! • Chromosomes consist of DNA – molecular strings of A, C, G, & T – base pairs, A-T, C-G • Genes – DNA sequences that encode proteins – less than 3% of human genome Information coded in DNA
  • 5.
    CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATG AGAAAAATACAGTGATTCC AAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAAT C AGAGCATCTGC TACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAA CTGATGTACCTCTGAAAAGAGA AGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCC CTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAA T GAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATCATTTTCAGGTT CTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGT CATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAA ATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGA TTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTAC C AGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCT AAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCT CAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAA G AACACAGTGAA ACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATAT TTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAA AACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAAT TAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAA A TTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAGCATTGTCATCCAAGTAGATGTG TCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTAC ATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGC T ATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGC C ATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTG G TACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATC A TTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCT C ATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGA A CATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAA G AACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCAT G CTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAAC T GGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGT T CATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATT 5000 bases per page
  • 6.
    The Beginning ofthe Project • Most the first 10 years of the project were spent improving the technology to sequence and analyze DNA. • Scientists all around the world worked to make detailed maps of our chromosomes and sequence model organisms, like worm, fruit fly, and mouse.
  • 7.
    The Challenges were Overwhelming •First there was the Assembly The DNA sequence is so long that no technology can read it all at once, so it was broken into pieces. There were millions of clones (small sequence fragments). The assembly process included finding where the pieces overlapped in order to put the draft together. 3,200,000 piece puzzle anyone?
  • 8.
    Introduction • Until theearly 1970’s, DNA was the most difficult cellular molecule for biochemists to analyze. • DNA is now the easiest molecule to analyze – we can now isolate a specific region of the genome, produce a virtually unlimited number of copies of it, and determine its nucleotide sequence overnight.
  • 9.
    Introduction • At theheight of the Human Genome Project, sequencing factories were generating DNA sequences at a rate of 1000 nucleotides per second 24/7. • Technical breakthroughs that allowed the Human Genome Project to be completed have had an enormous impact on all of biology…..
  • 10.
    Human Genome Project Goals: ■identify all the approximate 30,000 genes in human DNA, ■ determine the sequences of the 3 billion chemical base pairs that make up human DNA, ■ store this information in databases, ■ improve tools for data analysis, ■ transfer related technologies to the private sector, and ■ address the ethical, legal, and social issues (ELSI) that may arise from the project. Milestones: ■ 1990: Project initiated as joint effort of U.S. Department of Energy and the National Institutes of Health ■ June 2000: Completion of a working draft of the entire human genome (covers >90% of the genome to a depth of 3-4x redundant sequence) ■ February 2001: Analyses of the working draft are published ■ April 2003: HGP sequencing is completed and Project is declared finished two years ahead of schedule
  • 11.
    What does thedraft human genome sequence tell us? By the Numbers • The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G). • The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases. • The total number of genes is estimated at around 30,000--much lower than previous estimates of 80,000 to 140,000. • Almost all (99.9%) nucleotide bases are exactly the same in all people. • The functions are unknown for over 50% of discovered genes.
  • 12.
    What does thedraft human genome sequence tell us? How It's Arranged • The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C. • In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on chromosomes. • Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between. • Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity. • Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
  • 13.
    What does thedraft human genome sequence tell us? The Wheat from the Chaff • Less than 2% of the genome codes for proteins. • Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome. • Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes. • The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
  • 14.
    What does thedraft human genome sequence tell us? How the Human Compares with Other Organisms • Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout. • Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene. • Humans share most of the same protein families with worms, flies, and plants; but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity. • Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.
  • 15.
    What does thedraft human genome sequence tell us? Variations and Mutations • Scientists have identified about 3 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history. • The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.
  • 16.
    What does thedraft human genome sequence tell us? • Led to the discovery of whole new classes of proteins and genes, while revealing that many proteins have been much more highly conserved in evolution than had been suspected. • Provided new tools for determining the functions of proteins and of individual domains within proteins, revealing a host of unexpected relationships between them.
  • 17.
    What does thedraft human genome sequence tell us? • By making large amounts of protein available, it has yielded an efficient way to mass produce protein hormones and vaccines • Dissection of regulatory genes has provided an important tool for unraveling the complex regulatory networks by which eukaryotic gene expression is controlled. Molecular Biology Of The Cell. Alberts et al. 491-495
  • 18.
    How does thehuman genome stack up? Organism Genome Size (Bases) Estimated Genes Human (Homo sapiens) 3 billion 30,000 Laboratory mouse (M. musculus) 2.6 billion 30,000 Mustard weed (A. thaliana) 100 million 25,000 Roundworm (C. elegans) 97 million 19,000 Fruit fly (D. melanogaster) 137 million 13,000 Yeast (S. cerevisiae) 12.1 million 6,000 Bacterium (E. coli) 4.6 million 3,200 Human immunodeficiency virus (HIV) 9700 9 http://doegenomes.org
  • 19.
    • Gene number,exact locations, and functions • Gene regulation • DNA sequence organization • Chromosomal structure and organization • Noncoding DNA types, amount, distribution, information content, and functions • Coordination of gene expression, protein synthesis, and post-translational events • Interaction of proteins in complex molecular machines • Predicted vs experimentally determined gene function • Evolutionary conservation among organisms • Protein conservation (structure and function) • Proteomes (total protein content and function) in organisms • Correlation of SNPs (single-base DNA variations among individuals) with health and disease • Disease-susceptibility prediction based on gene sequence variation • Genes involved in complex traits and multigene diseases • Complex systems biology including microbial consortia useful for environmental restoration • Developmental genetics, genomics Future Challenges: What We Still Don’t Know
  • 20.
    Anticipated Benefits of GenomeResearch Molecular Medicine • improve diagnosis of disease • detect genetic predispositions to disease • create drugs based on molecular information • use gene therapy and control systems as drugs • design “custom drugs” (pharmacogenomics) based on individual genetic profiles Microbial Genomics • rapidly detect and treat pathogens (disease-causing microbes) in clinical practice • develop new energy sources (biofuels) • monitor environments to detect pollutants • protect citizenry from biological and chemical warfare • clean up toxic waste safely and efficiently U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
  • 21.
    Risk Assessment • evaluatethe health risks faced by individuals who may be exposed to radiation (including low levels in industrial areas) and to cancer-causing chemicals and toxins Bioarchaeology, Anthropology, Evolution, and Human Migration • study evolution through germline mutations in lineages • study migration of different population groups based on maternal inheritance • study mutations on the Y chromosome to trace lineage and migration of males • compare breakpoints in the evolution of mutations with ages of populations and historical events U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003 Anticipated Benefits of Genome Research-cont.
  • 22.
    DNA Identification (Forensics) •identify potential suspects whose DNA may match evidence left at crime scenes • exonerate persons wrongly accused of crimes • identify crime and catastrophe victims • establish paternity and other family relationships • identify endangered and protected species as an aid to wildlife officials (could be used for prosecuting poachers) • detect bacteria and other organisms that may pollute air, water, soil, and food • match organ donors with recipients in transplant programs • determine pedigree for seed or livestock breeds • authenticate consumables such as caviar and wine U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003 Anticipated Benefits of Genome Research-cont. http://doegenomes.org
  • 23.
    The Completion ofthe Human Genome Sequence • June 2000 White House announcement that the majority of the human genome (80%) had been sequenced (working draft). • Working draft made available on the web July 2000 at genome.ucsc.edu. • Publication of 90 percent of the sequence in the February 2001 issue of the journal Nature. • Completion of 99.99% of the genome as finished sequence on July 2003.
  • 24.
    The Project isnot Done… •Next there is the Annotation: The sequence is like a topographical map, the annotation would include cities, towns, schools, libraries and coffee shops! So, where are the genes? How do genes work? And, how do scientists use this information for scientific understanding and to benefit us?
  • 25.
    What do genesdo anyway? • We only have ~27,000 genes, so that means that each gene has to do a lot. • Genes make proteins that make up nearly all we are (muscles, hair, eyes). • Almost everything that happens in our bodies happens because of proteins (walking, digestion, fighting disease). Eye Color and Hair Color are determined by genes OR OR
  • 26.
    Of Mice andMen: It’s all in the genes Humans and Mice have about the same number of genes. But we are so different from each other, how is this possible? One human gene can make many different proteins while a mouse gene can only make a few! Did you say cheese? Mmm, Cheese!
  • 27.
    Genes are important •By selecting different pieces of a gene, your body can make many kinds of proteins. (This process is called alternative splicing.) • If a gene is “expressed” that means it is turned on and it will make proteins.
  • 28.
    What we’ve learnedfrom our genome so far… • There are a relatively small number of human genes, less than 30,000, but they have a complex architecture that we are only beginning to understand and appreciate. -We know where 85% of genes are in the sequence. -We don’t know where the other 15% are because we haven’t seen them “on” (they may only be expressed during fetal development). -We only know what about 20% of our genes do so far. • So it is relatively easy to locate genes in the genome, but it is hard to figure out what they do.
  • 29.
    How do scientistsfind genes? • The genome is so large that useful information is hard to find. • Researchers at UCSC decided to make a computational microscope to help scientists search the genome. • Just as you would use “google” to find something on the internet, researchers can use the “UCSC Genome Browser” to find information in the human genome. Explore it at http://genome.ucsc.edu
  • 30.
  • 31.
    The browser takesyou from early maps of the genome . . .
  • 32.
    . . .to a multi-resolution view . . .
  • 33.
    . . .at the gene cluster level . . .
  • 34.
    . . .the single gene level . . .
  • 35.
    . . .the single exon level . . .
  • 36.
    . . .and at the single base level caggcggactcagtggatctggccagctgtgacttgacaag caggcggactcagtggatctagccagctgtgacttgacaag
  • 37.
    The Continuing Project •Finding the complete set of genes and annotating the entire sequence. Annotation is like detailing; scientists annotate sequence by listing what has been learn experimentally and computationally about its function. • Proteomics is studying the structure and function of groups of proteins. Proteins are really important, but we don’t really understand how they work. • Comparative Genomics is the process of comparing different genomes in order to better understand what they do and how they work. Like comparing humans, chimpanzees, and mice that are all mammals but all very different.

Editor's Notes

  • #3 The female genome is larger than the male genome.
  • #4 The reference human genome sequence is a computer file of about 3 billon bases
  • #7 Small sequence fragments reproduced so there was enough material to sequence.
  • #31 First looked thru microscope. Geisma staining produced chromosome bands. Later Fluorescence In Situ Hybridization developed. Genetic map grew to 5000 markers (places where distinctive variation occurs, like those used in DNA fingerprinting).(picture shows chr 18, about 85 million bases. Order of the markers determined by studying family inheritance of variations. Studies also led to the identification of genes associated with some diseases: famous hunts for Huntington’s disease, Duchenne muscular distrophy, retinoblastoma, Cystic Fibrosis and other genes in the 80’s.
  • #32 Ochre is mouse chr 5, yellow is mouse chr7
  • #33 HD = Huntington’s disease gene. First success of RFLP (restriction fragment length polymorphism) mapping. Found linkage in 1983 before the first RFLP map was even constructed. Disease claimed Woody Guthrie. Nancy Wexler, whose mother had died of the disease, became director of the Huntington’s commission (congressional) and of NIH project. Collected family data in Venezuela. “Lucky Jim” Gusella found a link between HD and one of the first RFLP markers he tested. Took another 10 years to actually locate the gene. Takes seconds on the browser.
  • #37 Annotation: includes, for example, not only the genes, but the proteins the genes makes and what diseases they are associated with. Proteins are really important for drug development and discovery.