Jennifer Lyon, MS, MLIS Eskind Biomedical Library Vanderbilt University Medical Center [email_address]
Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. Landmark papers detailing the sequence and analysis of the human genome were published in the February 2001 and April 2003 issues of Nature and Science. Work has continued since then to check, complete, describe, and understand the sequence. The full sequence is freely available online to anyone with internet access.
What is the Human Genome?
A genome is all the DNA contained in an organism or a cell, which includes the chromosomes plus the DNA in mitochondria (and DNA in the chloroplasts of plant cells).
The Human Genome contains about 3.1 billion base-pairs of DNA, divided into 22 autosomal (non-sex) chromosomes and 2 sex-determining chromosomes (X and Y), as well as mitochondrial DNA.
See Video: http://www.genome.gov/25520211
The initial assembly of the base-pair sequence is done at the National Center for Biotechnology Information (NCBI) and then is openly shared.
Three major sites (NCBI, ENSEMBL, and UCSC) annotate (describe) the sequence and present it for public use.
The process used to assemble the contigs and annotate the sequence is complex and continuously being refined. The full NCBI process is described at http://www.ncbi.nlm.nih.gov/genome/guide/build.shtml
Summary of Genome Assembly
Input data includes both finished and draft genomic sequence data from GenBank
Data is screened for contamination by bacterial and viral sequences
Sequence is compared to other genomes, such as the mouse
Repeats are masked
Clone layout stage
Sequence building stage
Melds of overlapping sequences are formed and ordered
Contigs are placed on a chromosome using sequence overlaps with mapped STS markers and paired BAC-end sequences.
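The "melds of overlapping sequences" step above can be illustrated with a toy sketch. This is not the NCBI pipeline, which is far more involved. It only shows the basic idea of joining two fragments where the end of one matches the start of the other. The function names and the minimum-overlap threshold are assumptions made for this illustration.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def meld(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments into one sequence if they overlap, else None.

    Hypothetical helper for illustration only: exact matches, one strand,
    no sequencing errors, no repeats.
    """
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep the shared bases once, then append the rest of `right`.
    return left + right[size:]


if __name__ == "__main__":
    print(meld("ATGCGTAC", "GTACCTGA"))
```

Real assemblers must also cope with sequencing errors, both DNA strands, and repeats, which is one reason repeats are masked before this stage.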
Each of the three major sites has its own process of annotating the sequence – identifying the location of biologically significant elements of the sequence including
Genes and mRNA transcripts, plus associated protein function information
Clones/Contigs (smaller sections created during the sequencing process)
Maps and Tracks
Looking at 3.1 billion ATGCs doesn’t mean much to the human eye.
Mapping the biologically important features is what makes the sequence useful.
Each map of a specific type of biological feature is laid across the sequence like a road map. Maps can also be called ‘tracks’.
Maps vary in their unit of measurement, scale and resolution.
Multiple maps can be simultaneously viewed.
Why All These Maps?
Evolution of mapping methods over time
Older maps created before the Human Genome Project (HGP)
Some maps were created for the HGP to help in the process of reassembling the sequence
Sequence-based maps could only be created after the HGP was completed
Maps produced by different groups or using different methods often show
different types of map objects
different subsets of map objects
Types of Maps
cytogenetic maps (using chromosome band numbers)
genetic linkage maps (also called "genetic maps")
radiation hybrid maps
clone based maps (e.g., YAC map)
sequence maps (based on the completed sequence)
Cytogenetic maps are the oldest type of map and use the light and dark bands that result from staining chromosomes with a dye. Dark bands have a higher density of DNA and therefore absorb more stain. These bands can be viewed under a microscope.
Usually, a probe is hybridized (attached) to the chromosomes and labeled with a fluorescent or radioactive tag. Its location is then identified microscopically based on the unique banding pattern of each chromosome.
The pattern of bands on each chromosome is unique. The detail of the banding pattern has increased as microscope power has increased over time. Each human chromosome has a short arm ("p" for "petit") and a long arm ("q" for "queue"), separated by a centromere. The ends of the chromosome are called telomeres and are labeled ptel and qtel. For example, the notation 7qtel refers to the end of the long arm of chromosome 7.
The cytogenetic bands are labeled p1, p2, p3, q1, q2, q3, etc., counting from the centromere out toward the telomeres. At higher resolutions, sub-bands can be seen within the bands. The sub-bands are also numbered from the centromere out toward the telomere.
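As a small illustration of the notation described above, the sketch below splits a location label such as 11q13.1 or 7qtel into its parts. The function name, the returned dictionary keys, and the regular expression are assumptions made for this example. The pattern covers only the simple forms mentioned in this handout, not the full cytogenetic nomenclature.

```python
import re

# Chromosome, then arm (p or q), then either "tel" or band digits
# with an optional ".sub-band" suffix. Illustrative pattern only.
_LOCATION = re.compile(r"^(\d{1,2}|X|Y)([pq])(?:(tel)|(\d+)(?:\.(\d+))?)$")


def parse_cytogenetic_location(label: str) -> dict:
    """Split a label like '11q13.1' into chromosome, arm, band and sub-band."""
    match = _LOCATION.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised cytogenetic location: {label!r}")
    chromosome, arm, tel, band, sub_band = match.groups()
    # Human autosomes are numbered 1 to 22; X and Y pass through unchanged.
    if chromosome.isdigit() and not 1 <= int(chromosome) <= 22:
        raise ValueError(f"no such human chromosome: {chromosome}")
    return {
        "chromosome": chromosome,
        "arm": arm,                  # "p" = short arm, "q" = long arm
        "telomere": tel is not None,
        "band": band,                # numbered from the centromere outward
        "sub_band": sub_band,        # None when no sub-band is given
    }


if __name__ == "__main__":
    print(parse_cytogenetic_location("11q13.1"))
    print(parse_cytogenetic_location("7qtel"))
```

Here 11q13.1 reads as chromosome 11, long arm, band 13, sub-band 1.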
Pedigree: a simplified diagram of a family's genealogy that shows family members' relationships to each other and how a particular trait or disease has been inherited.
Scale for Genetic Maps
Scale: centiMorgans (cM)
A centiMorgan is a unit of genetic distance that represents a 1% probability of recombination during meiosis.
If two genes are 1 cM apart, there is a 1% chance they will be separated by recombination during meiosis. If two genes are 20 cM apart, there is a 20% chance they will be separated during meiosis.
One cM is equivalent, on average, to a physical distance of approximately 1 megabase in the human genome. This is just an average because genetic recombination rates vary along different parts of the chromosomes.
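The rule of thumb above can be written as simple arithmetic. The sketch below is only an approximation. The function names are made up for this example, and the default of 1 megabase per cM is the genome-wide average quoted in the text, not a value for any particular region. The linear reading of cM as a percentage is best reserved for short distances.

```python
BP_PER_MEGABASE = 1_000_000


def recombination_chance_percent(distance_cm: float) -> float:
    """Rule of thumb from the text: 1 cM is about a 1% chance of recombination.

    Assumption: a simple linear reading, suitable only for short distances.
    """
    return float(distance_cm)


def approximate_physical_distance_bp(distance_cm: float,
                                     megabases_per_cm: float = 1.0) -> int:
    """Rough physical distance in base-pairs for a genetic distance in cM.

    `megabases_per_cm` defaults to the genome-wide average of about 1 Mb
    per cM. Local recombination rates vary, so treat the result as an
    estimate.
    """
    return round(distance_cm * megabases_per_cm * BP_PER_MEGABASE)


if __name__ == "__main__":
    print(recombination_chance_percent(20))
    print(approximate_physical_distance_bp(20))
```

Passing a different `megabases_per_cm` is one way to model a region where recombination is more or less frequent than average.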
These have the highest resolution
The sequence-based maps are the best because they identify locations by exact base-pair coordinates
An older type of physical map is a ‘radiation hybrid (RH) map’, which is based on breaking chromosomes with radiation and measuring where markers lie relative to the random breakpoints. RH maps are static.
YAC, BAC or other clone maps identify the position of large cloned chunks of human genomic DNA relative to the complete chromosomes. These were mostly used during the reassembly of the complete sequence.
These are the result of the Human Genome Project – the ability to identify where biological features are in exact base-pair locations
More and more sequence-based maps are being developed
These are the definitive maps and are replacing the older map types
Scale is always in base-pairs (bp), though you may see kbp (kilobase-pairs = thousands of bp) and Mbp (megabase-pairs = millions of bp).
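Because sequence maps use base-pair coordinates, sizes and unit conversions reduce to integer arithmetic. The sketch below uses illustrative function names. It also assumes that coordinates are 1-based and inclusive at both ends. Browsers and file formats differ on this, so check the convention of the tool you are using.

```python
# Multipliers for the units named in the text.
BASES_PER_UNIT = {"bp": 1, "kbp": 1_000, "Mbp": 1_000_000}


def to_base_pairs(value: float, unit: str) -> int:
    """Convert a length given in bp, kbp or Mbp to whole base-pairs."""
    if unit not in BASES_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return round(value * BASES_PER_UNIT[unit])


def feature_length_bp(start: int, end: int) -> int:
    """Length of a feature from its start and end coordinates.

    Assumption: coordinates are 1-based and inclusive, so a feature
    spanning positions 101 to 200 is 100 bp long.
    """
    if end < start:
        raise ValueError("end coordinate must not be smaller than start")
    return end - start + 1


if __name__ == "__main__":
    print(to_base_pairs(1.5, "Mbp"))
    print(feature_length_bp(101, 200))
```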
What is known about the function of the protein this gene produces?
How many exons are in this gene?
Find a human EST cluster. Is it conserved in other organisms?
Are there any known repeat sequences within this gene?
Locate some SNPs (single-nucleotide polymorphisms) in the gene. If there is a coding SNP, identify the nucleotide change for it.
Identify a sequence tagged site (STS) within this gene. What size PCR product is made by the set of primers?
What is the size of this gene in bps?
Practice Questions (2)
Name four genes found at chromosomal location 11q13.1
Locate the gene ACTN3. Go to the sequence view level. Locate the mRNA start. Locate the first exon. What are the first five amino acids? How long is the first exon?
Find the human gene for Huntington’s Disease (HD). Display both the human and mouse genes simultaneously. In which chromosomal region for each organism are the homologous HD genes found? How similar/different are the mouse and human genes? Are any of the other genes nearby also possibly homologous between the human and the mouse?
Practice Questions (3)
Display non-synonymous SNPs in the BRCA1 gene using the UCSC browser. Color them blue. Link to external data on one of these SNPs.
Can you use ENSEMBL or the NCBI MapViewer to see only non-synonymous SNPs in the BRCA1 gene like you just did with UCSC?