Structural genomics

• Structural genomics is concerned with sequencing and understanding
the content of genomes.
• The first step in characterizing a genome is to prepare its maps:
– Genetic Maps
– Physical Maps
These maps provide information about the relative locations of:
– genes,
– molecular markers, and
– chromosome segments.

Genetic Maps
• Genetic maps (also called linkage maps) provide a rough
approximation of the locations of genes relative to the locations of
other known genes.
• These maps are based on the genetic process of recombination.
• Individuals heterozygous at two or more genetic loci are crossed,
and the frequency of recombination between loci is determined by
examining the progeny.
• If the recombination frequency between two loci is 50%, the loci are
located on different chromosomes or are far apart on the same
chromosome.

• If the recombination frequency is less than 50%, the loci are linked.
• For linked genes, the rate of recombination is roughly proportional to the
physical distance between the loci.
• Distances on genetic maps are measured in map units, or centimorgans
(cM); one map unit equals 1% recombination.
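Recombination frequency is simple arithmetic on progeny counts. The sketch below assumes a testcross in which each offspring has already been scored as parental or recombinant. The counts are hypothetical and were chosen only for illustration.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Percent recombination between two loci; 1% recombination = 1 cM (map unit)."""
    if total <= 0:
        raise ValueError("total progeny must be positive")
    return 100 * recombinant / total


def describe_linkage(rf_percent: float) -> str:
    """Interpret a recombination frequency as in the text: 50% means unlinked."""
    if rf_percent >= 50:
        return "unlinked (different chromosomes or far apart)"
    return f"linked, about {rf_percent:.1f} cM apart"


# Hypothetical testcross: 870 parental and 130 recombinant offspring
rf = recombination_frequency(recombinant=130, total=1000)
print(rf, describe_linkage(rf))  # 13.0 linked, about 13.0 cM apart
```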

Limitations in Genetic Maps
• The first limitation is resolution, or detail.
• The human genome, for example, contains 3.4 billion base pairs of DNA and has a total genetic distance of
about 4000 cM, an average of 850,000 bp/cM.
• Even if a marker occurred every centimorgan (which is unrealistic),
the resolution in regard to the physical structure of the DNA would
still be quite low.
• The second limitation is that genetic distances do not always accurately correspond to physical distances
between genes.
• Because genetic maps are based on rates of crossing over, which vary, the distances on a
genetic map are only approximations of real physical distances
along a chromosome.
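The resolution figure quoted above comes from dividing physical length by genetic length. The short sketch below reproduces that average. It then uses the average to convert a genetic distance into a rough physical distance. Because crossing-over rates vary, such a conversion gives only an order-of-magnitude estimate. The helper name is hypothetical.

```python
GENOME_BP = 3_400_000_000  # human genome size used in the text
GENOME_CM = 4000           # total genetic distance used in the text

BP_PER_CM = GENOME_BP / GENOME_CM  # average physical length of 1 cM


def approx_physical_distance(cm: float) -> float:
    """Rough bp estimate for a genetic distance, assuming a uniform crossover rate."""
    return cm * BP_PER_CM


print(BP_PER_CM)                    # 850000.0
print(approx_physical_distance(2))  # 1700000.0 (two markers 2 cM apart)
```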

Genetic and physical maps may differ in relative distances and even in the positions of genes on a chromosome.
• A comparison of the genetic map of yeast chromosome III with a physical map determined by DNA sequencing reveals some discrepancies in the distances between genes and even in the positions of some genes.
• In spite of these limitations, genetic maps have been critical to the development of physical maps and the sequencing of whole genomes.

Physical Maps
• Physical maps are based on the direct analysis of DNA, and they place genes in relation to
distances measured in numbers of base pairs, kilobases, or megabases.
• A common type of physical map is built from pieces of genomic DNA that have been
cloned in bacteria or yeast.
• Physical maps generally have higher resolution and are more accurate than
genetic maps.
• A number of techniques exist for creating physical maps, including:
– restriction mapping, which determines the positions of restriction sites on DNA;
– sequence-tagged site (STS) mapping, which locates the positions of short unique sequences of DNA on a chromosome;
– fluorescent in situ hybridization (FISH), by which markers can be visually mapped to locations on chromosomes;
– DNA sequencing.

Restriction Mapping
• Restriction mapping determines the relative positions of restriction
sites on a piece of DNA.
• A piece of DNA is cut with a restriction enzyme, and the resulting fragments are separated by gel electrophoresis.
• The number of restriction sites in the DNA and the distances
between them can be determined from the number and positions of
the bands on the gel.

• Example:
• Suppose we have three samples of a linear 13,000-bp (13-kb) DNA fragment.
• The first sample is cut with the restriction enzyme EcoRI;
• the second sample of the same DNA is cut with BamHI;
• the third sample is cut with both EcoRI and BamHI (a double digest).
• The resulting fragments are separated and sized by gel electrophoresis.
• What are the positions of the EcoRI and BamHI restriction sites on the
original 13-kb fragment?
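The band sizes from this example are not reproduced here, so the sketch below uses hypothetical fragment sizes for a 13-kb molecule. It solves the double-digest problem by brute force. It tries every left-to-right order of the single-digest fragments and keeps the site arrangements that reproduce the double-digest fragment sizes. This approach only works for a handful of fragments, which is why larger maps rely on dedicated software.

```python
from itertools import permutations


def cut_positions(fragment_order):
    """Convert an ordered list of fragment lengths into internal cut positions."""
    positions, total = [], 0
    for length in fragment_order[:-1]:  # the last fragment ends at the molecule's end
        total += length
        positions.append(total)
    return positions


def fragments_from_sites(sites, length):
    """Fragment lengths produced by cutting a linear molecule at the given sites."""
    bounds = [0] + sorted(set(sites)) + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def map_double_digest(length, frags_a, frags_b, frags_ab):
    """Every (sites_a, sites_b) arrangement consistent with all three digests."""
    target = sorted(frags_ab)
    solutions = set()
    for order_a in set(permutations(frags_a)):
        sites_a = cut_positions(order_a)
        for order_b in set(permutations(frags_b)):
            sites_b = cut_positions(order_b)
            if sorted(fragments_from_sites(sites_a + sites_b, length)) == target:
                solutions.add((tuple(sites_a), tuple(sites_b)))
    return sorted(solutions)


# Hypothetical band sizes in bp (not the data from the original example)
solutions = map_double_digest(13_000, [3000, 6000, 4000], [5000, 8000],
                              [3000, 2000, 4000, 4000])
for ecori_sites, bamhi_sites in solutions:
    print("EcoRI:", ecori_sites, "BamHI:", bamhi_sites)
```

The sketch returns two solutions because a map and its mirror image give identical fragment sizes. Labeling one end of the molecule, or identifying it with a probe, distinguishes the two.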

• Most restriction mapping is done with several restriction enzymes,
used alone and in various combinations, producing many restriction
fragments.
• With long pieces of DNA (greater than 30 kb), computer programs
are used to determine the restriction maps.
• Restriction mapping may be facilitated by tagging one end of a large
DNA fragment with radioactivity or by identifying the end with the
use of a probe.

DNA-Sequencing Methods
• The most detailed physical maps are based on direct DNA sequence
information.
• Between 1975 and 1977, Frederick Sanger and his colleagues developed the
dideoxy sequencing method, which is based on the elongation of DNA.
• Allan Maxam and Walter Gilbert developed a second method, based
on the chemical degradation of DNA.
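The principle behind the dideoxy (chain-termination) method can be shown with a toy simulation. In each of four reactions, synthesis stops wherever the matching dideoxynucleotide is incorporated. Sorting all the terminated fragments by length then spells out the newly synthesized strand. The template is hypothetical and primer length is ignored. The sketch models only an idealized ladder, not a real sequencing instrument.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dideoxy_lanes(template_3_to_5: str) -> dict[str, list[int]]:
    """Lengths of chain-terminated fragments in each ddNTP reaction.

    The template is written 3'->5', so the new strand (synthesized 5'->3')
    is its base-by-base complement in the same left-to-right order.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # Synthesis can stop here if the ddNTP matching this base is incorporated
        lanes[base].append(position)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest, which gives the new strand 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_lanes("TACGGAT")  # hypothetical template
print(lanes)
print(read_ladder(lanes))  # ATGCCTA
```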

How are primers constructed when the sequence of the DNA is not known?
• The target DNA is cloned into a vector that contains sequences
recognized by a common primer (called universal sequencing primer
sites) on either side of the site where the target DNA is inserted.
• The target DNA is then isolated together with these flanking vector sequences, so it
carries universal sequencing primer sites at each end.

Sequencing an Entire Genome
• The ultimate goal of structural genomics is to determine the ordered
nucleotide sequences of entire genomes of organisms.
• The main obstacle to this task is the immense size of most
genomes.
– Bacterial genomes are usually at least several million base pairs long.
– Many eukaryotic genomes are billions of base pairs long and are distributed
among dozens of chromosomes.
– For technical reasons, it is not possible to begin sequencing at one end
of a chromosome and continue straight through to the other end;
– Only small fragments of DNA—usually from 500 to 700 nucleotides—
can be sequenced at one time.

• The DNA must therefore be broken into thousands or millions of smaller fragments
that can then be sequenced.
• This creates another problem:
– putting these short sequences back together in the correct order.
• Two approaches are used to solve this problem:
– map-based sequencing;
– whole-genome shotgun sequencing.

Map-based approach
• The map-based approach requires the initial creation of detailed genetic
and physical maps of the genome, which provide known locations of
genetic markers at regularly spaced intervals along each chromosome.
• These markers can later be used to help align the short, sequenced
fragments in their correct order.
• After the genetic and physical maps are available, chromosomes or
large pieces of chromosomes are separated by:
– pulsed-field gel electrophoresis (PFGE), in which large molecules of DNA or whole chromosomes are separated
in a gel by periodically alternating the orientation of an electrical current;
– flow cytometry, in which chromosomes are sorted optically by size.

• Each chromosome (or sometimes the entire genome) is then cut up
by partial digestion with restriction enzymes.
• Because the digestion is partial, not every restriction site is cut in every copy of the DNA, so
partial digestion produces a set of large, overlapping DNA fragments.
• These fragments are then cloned by using cosmids, yeast artificial
chromosomes (YACs), or bacterial artificial chromosomes (BACs).
• The clones are screened with specific probes.
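A small simulation shows why partial digestion yields overlapping fragments. Each copy of the DNA is cut at only a random subset of its restriction sites. As a result, fragments from different copies span the same region with different endpoints. The molecule length, site positions, and cut probability are hypothetical.

```python
import random


def partial_digest(length, sites, cut_prob, rng):
    """Cut one copy at each site with probability cut_prob; return (start, end) fragments."""
    cuts = sorted(s for s in sites if rng.random() < cut_prob)
    bounds = [0] + cuts + [length]
    return list(zip(bounds, bounds[1:]))


def digest_copies(length, sites, cut_prob, copies, seed=1):
    """Partially digest several identical copies with a reproducible random generator."""
    rng = random.Random(seed)
    return [partial_digest(length, sites, cut_prob, rng) for _ in range(copies)]


SITES = [2_000, 5_000, 8_000, 11_000]  # hypothetical restriction sites in a 14-kb molecule
for copy_number, fragments in enumerate(digest_copies(14_000, SITES, 0.4, copies=3), 1):
    print(f"copy {copy_number}: {fragments}")
```

Setting `cut_prob` to 1.0 reproduces a complete digest, in which the fragments from different copies are identical and no longer overlap.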
A set of two or more overlapping DNA fragments that form a contiguous stretch of DNA is called a contig.
This approach was used in 1993 to create a contig of the human Y chromosome consisting of 196 overlapping YAC clones.

Each clone can be cut with a series of restriction enzymes, and the resulting fragments are then separated by gel electrophoresis.
A computer program is then used to examine the restriction patterns of all the clones and look for areas of overlap.
The overlap is then used to arrange the clones in order.
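The fingerprint-overlap idea can be sketched as follows. Two clones are treated as overlapping if their restriction fragments share enough band sizes. A single linear chain of overlaps is then walked from one end. The clone names and band sizes are hypothetical, and the sketch assumes the overlaps form one unbranched contig. Real fingerprinting software must also tolerate sizing errors and coincidental shared bands.

```python
from collections import Counter


def shared_bands(fp1: list[int], fp2: list[int]) -> int:
    """Number of restriction-fragment sizes two clones have in common."""
    return sum((Counter(fp1) & Counter(fp2)).values())


def order_clones(fingerprints: dict[str, list[int]], min_shared: int = 1) -> list[str]:
    """Order clones into one contig, assuming overlaps form a single unbranched chain."""
    names = list(fingerprints)
    neighbors = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shared_bands(fingerprints[a], fingerprints[b]) >= min_shared:
                neighbors[a].add(b)
                neighbors[b].add(a)
    # A clone that overlaps only one other clone lies at an end of the contig
    start = next(name for name in names if len(neighbors[name]) == 1)
    order, previous = [start], None
    while True:
        following = [n for n in neighbors[order[-1]] if n != previous]
        if not following:
            break
        previous = order[-1]
        order.append(following[0])
    return order


# Hypothetical fingerprints (fragment sizes in bp), listed out of order
FINGERPRINTS = {
    "C": [800, 2700, 4100, 1900],
    "A": [1200, 3400, 2100],
    "D": [4100, 1900, 3300],
    "B": [2100, 5000, 800],
}
print(order_clones(FINGERPRINTS))  # ['A', 'B', 'C', 'D']
```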

Whole-genome shotgun sequencing
• Small-insert clones are prepared directly from genomic DNA and
sequenced.
• Powerful computer programs then assemble the entire genome by
examining overlap among the small-insert clones.

Whole-genome shotgun sequencing utilizes sequence
overlap to align sequenced fragments.
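A greedy overlap assembler is one simple way to illustrate the shotgun idea. It repeatedly merges the two reads with the longest suffix-prefix overlap until no overlaps remain. The reads are hypothetical and error-free. Real whole-genome assemblers use more robust methods to cope with sequencing errors and repeated sequences, both of which can mislead a greedy merge.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if under min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of reads until no overlaps of min_len remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical reads sampled from one short region, in shuffled order
print(greedy_assemble(["GTTAGCCA", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGCCA']
```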

The Human Genome Project
• The Human Genome Project was an effort to sequence the entire
human genome.
• The project began in 1990, and the sequencing was carried out by two
competing teams:
– an international consortium of publicly supported investigators;
– the private company Celera Genomics.
• Both teams finished a rough draft of the genome sequence in 2000.

Data Supporting Structural Genomics
• In addition to the DNA sequence of an entire genome, several other
types of data are useful for genomic projects and have been the
focus of sequencing efforts.
• These include:
– single-nucleotide polymorphisms (SNPs);
– expressed-sequence tags (ESTs).

Single-Nucleotide Polymorphisms
• Single-nucleotide polymorphisms (SNPs) are single-base-pair differences in DNA sequence between
individual members of a species.
• They arise through mutation.
• Single-nucleotide polymorphisms are numerous and are present
throughout genomes.
• In a comparison of the same chromosome from two different people,
a SNP can be found approximately every 1000 bp.
• Because of their variability and widespread occurrence throughout
the genome, SNPs are valuable as markers in linkage studies.
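Finding SNPs between two copies of the same region amounts to listing the positions where aligned sequences differ. The sketch below assumes the two sequences are already aligned and contain no insertions or deletions. The sequences and helper names are hypothetical.

```python
def find_snps(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """(position, base in seq1, base in seq2) for every mismatch in two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


def snps_per_kb(snps: list, length: int) -> float:
    """SNP density; the text cites roughly one SNP per 1000 bp between two people."""
    return 1000 * len(snps) / length


print(find_snps("ACGTACGT", "ACGAACGC"))  # [(3, 'T', 'A'), (7, 'T', 'C')]
```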

Expressed-Sequence Tags
• Another type of data identified by sequencing projects consists of
databases of expressed-sequence tags (ESTs).
• In most eukaryotic organisms, only a small percentage of the DNA
actually encodes proteins; in humans, less than 2% of the DNA
encodes the amino acids of proteins.
• If only protein-encoding genes are of interest, it is often more
efficient to examine RNA than the entire DNA genomic sequence.
• RNA can be examined by using ESTs—markers associated with
DNA sequences that are expressed as RNA.
• RNA is copied into complementary DNA (cDNA) by reverse transcriptase. Short stretches of the
cDNA fragments are then sequenced, and the sequence obtained
(called a tag) provides a marker that identifies the DNA fragment.
• Expressed-sequence tags can be used to find active genes in a
particular tissue or at a particular point in development.
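The EST idea can be sketched in a few lines. Reverse transcriptase copies mRNA into a complementary DNA strand, so the sketch first builds that first-strand cDNA. It takes a short stretch of the cDNA as the tag and then looks for the tag, or its reverse complement, in a genomic sequence to identify the expressed region. The sequences are hypothetical. Exact string matching stands in for the alignment tools used in practice.

```python
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA base -> complementary DNA base
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA (written 5'->3') complementary to an mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


def reverse_complement(dna: str) -> str:
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def locate_tag(tag: str, genome: str):
    """('+', pos) or ('-', pos) for an exact match on either strand, else None.

    Positions are 0-based coordinates on the given (plus) strand.
    """
    pos = genome.find(tag)
    if pos != -1:
        return ("+", pos)
    pos = genome.find(reverse_complement(tag))
    if pos != -1:
        return ("-", pos)
    return None


GENOME = "GGGATGGCCTTAGCAAAA"       # hypothetical genomic DNA (plus strand)
MRNA = "AUGGCCUUAGCA"               # hypothetical transcript from that region
est = reverse_transcribe(MRNA)[:8]  # a short stretch of cDNA serves as the tag
print(est, locate_tag(est, GENOME))  # TGCTAAGG ('-', 7)
```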

Functional Genomics
Functional genomics attempts to understand the function of the information in genomes.

Goals of functional genomics
• The goals of functional genomics include identifying:
– all the RNA molecules transcribed from a genome (the transcriptome);
– all the proteins encoded by the genome (the proteome).
• Functional genomics exploits both bioinformatics and laboratory-based
experimental approaches in its search to define the function
of DNA sequences.

• Several methods for identifying genes and assessing
their functions are available, including:
– in situ hybridization,
– DNA footprinting,
– experimental mutagenesis,
– use of transgenic animals and knockouts.

Predicting Function from Sequence
• The nucleotide sequence of a gene can be used to predict the
amino acid sequence of the protein that it encodes.
• The protein can then be synthesized or isolated and its properties
studied to determine its function.
• This biochemical approach to understanding gene function is both
time consuming and expensive.
• A major goal of functional genomics has been to develop
computational methods that allow gene function to be identified
from DNA sequence alone.
• This would bypass the laborious process of isolating and
characterizing individual proteins.
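Predicting a protein's amino acid sequence from a gene's nucleotide sequence is a table lookup using the genetic code. The sketch below uses the standard genetic code. It assumes the input is the coding strand of an intron-free open reading frame that starts at its first base. The example sequence is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code: amino acids for codons ordered T, C, A, G at each position; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for start in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTTAGCATAA"))  # MALA
```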

Homology searches
A homology search relies on comparing DNA and protein sequences from the same
and different organisms.
Genes that are evolutionarily related are said to be homologous.
Homologous genes found in different species that evolved from
the same gene in a common ancestor are called orthologs.
For example, both mouse and human genomes contain a gene
that encodes the alpha subunit of hemoglobin.
The mouse and human alpha-hemoglobin genes are said to be
orthologs, because both genes evolved from an alpha-hemoglobin
gene in a mammalian ancestor common to mice and humans.

Homologous genes in the same organism are called paralogs; they arise
by duplication of a single gene in the evolutionary past.
Within the human genome is a gene that encodes the alpha subunit
of hemoglobin and another homologous gene that encodes the beta
subunit of hemoglobin.
These two genes arose because an ancestral gene underwent
duplication and the resulting two genes diverged through evolutionary
time, giving rise to the alpha- and beta-subunit genes; these two genes
are paralogs.
Homologous genes (both orthologs and paralogs) often have the
same or related functions; so, after a function has been assigned to a
particular gene, it can provide a clue to the function of a homologous
gene.

Database Search for Orthologs
Databases containing genes and proteins found in a wide array of
organisms are available for homology searches.
Powerful computer programs have been developed for scanning
these databases to look for particular sequences.
A commonly used homology search program is BLAST (Basic
Local Alignment Search Tool).
Suppose a geneticist sequences a genome and locates a gene
that encodes a protein of unknown function.
A homology search conducted on databases containing the DNA
or protein sequences of other organisms may identify one or more
orthologous sequences.
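BLAST itself uses fast heuristics: it finds short word matches and then extends them. The quantity it approximates, however, is a local alignment score. The sketch below computes an exact Smith-Waterman local alignment score with a simple match/mismatch/gap scheme and ranks a tiny hypothetical database by that score. The scoring values are an assumption; protein searches normally use substitution matrices such as BLOSUM62.

```python
def local_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Database entries ranked by local alignment score to the query."""
    scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


DATABASE = {  # hypothetical protein sequences
    "hypothetical_ortholog": "MKVLSAGIV",
    "hypothetical_unrelated": "WWPHEDRQ",
}
print(best_hits("MKVLAAGIV", DATABASE))
```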

Database Search for Paralogs
• Computer programs can also search a single genome for paralogs.
• Eukaryotic organisms often contain families of genes that have
arisen by duplication of a single gene.
• If a paralog is found and its function has been previously assigned,
this function can provide information about a possible function of the
unknown gene.

Other sequence comparisons
• Complex proteins often have specific domains.
• Each domain has a characteristic arrangement of amino acids.
• For example, certain DNA-binding proteins attach to DNA in the same
way because they all have a common DNA-binding domain.
• Many protein domains have been characterized, and their molecular
functions have been determined.
• A newly identified gene can be scanned against a database of known
domains.
• If it encodes one or more domains of known function, the function of
the domain can provide important information about a possible
function of the new gene.
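Scanning a protein for a known domain can be sketched with a regular expression that encodes the domain's conserved residues. The pattern below is modeled on the C2H2 zinc-finger consensus, in which two cysteines and two histidines occur at characteristic spacings. Treat its exact spacing as an illustrative assumption. Domain databases such as Pfam use statistical profiles rather than a single pattern. The protein sequence is hypothetical.

```python
import re

# Hypothetical domain "database": name -> regular expression for its conserved residues.
# Modeled on the C2H2 zinc-finger consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
DOMAIN_PATTERNS = {
    "C2H2_zinc_finger_like": re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H"),
}


def scan_domains(protein: str) -> list[tuple[str, int, int]]:
    """(domain name, start, end) for every pattern match in the protein sequence."""
    hits = []
    for name, pattern in DOMAIN_PATTERNS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.end()))
    return hits


PROTEIN = "MACKKCGKSFSRSDELTRHIRTHTGE"  # hypothetical sequence
print(scan_domains(PROTEIN))  # [('C2H2_zinc_finger_like', 2, 23)]
```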

Phylogenetic Profile
• The phylogenetic profile is another computational method for predicting protein function.
• In this method, the presence-and-absence pattern of a particular
protein is examined across a set of organisms whose genomes have
been sequenced.
• If two proteins are either both present or both absent in all genomes
surveyed, the two proteins may be functionally related.
• Consider the following proteins in four bacterial species:
• E. coli: protein 1, protein 2, protein 3, protein 4, protein 5, protein 6
• Species A: protein 1, protein 2, protein 3, protein 6
• Species B: protein 1, protein 3, protein 4, protein 6
• Species C: protein 2, protein 4, protein 5

The phylogenetic profile reveals that proteins 1, 3, and 6 are either all present or all absent in every species; so these proteins might be functionally related.
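This comparison can be automated. The sketch encodes each protein's presence or absence across the surveyed genomes as a tuple (its phylogenetic profile) and groups proteins that have identical profiles. The data are the four species from the example above. Requiring an exact match is a simplification; larger surveys may instead score how similar two profiles are.

```python
from collections import defaultdict

PRESENCE = {  # proteins found in each genome, from the example above
    "E. coli": {1, 2, 3, 4, 5, 6},
    "Species A": {1, 2, 3, 6},
    "Species B": {1, 3, 4, 6},
    "Species C": {2, 4, 5},
}


def phylogenetic_profiles(presence: dict[str, set[int]]) -> dict[int, tuple[bool, ...]]:
    """Presence/absence tuple for each protein, with species in a fixed (sorted) order."""
    species = sorted(presence)
    proteins = sorted(set().union(*presence.values()))
    return {p: tuple(p in presence[s] for s in species) for p in proteins}


def group_by_profile(profiles: dict[int, tuple[bool, ...]]) -> list[list[int]]:
    """Groups of two or more proteins that share exactly the same profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [members for members in groups.values() if len(members) > 1]


print(group_by_profile(phylogenetic_profiles(PRESENCE)))  # [[1, 3, 6]]
```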

Gene Neighbor Analysis
• Genes that encode functionally related proteins are often closely
linked in bacteria.
• Thus, if two genes are consistently linked in the genomes of several
bacterial species, they might be functionally related.
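Gene neighbor analysis can be sketched as finding gene pairs that are adjacent in every genome surveyed. The gene names and orders below are hypothetical, and the sketch assumes that orthologous genes already share a name. It also treats each chromosome as linear, although bacterial chromosomes are usually circular.

```python
def adjacent_pairs(gene_order: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of genes that lie next to each other in one genome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_neighbors(genomes: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Gene pairs that are adjacent in every genome surveyed."""
    common = None
    for gene_order in genomes.values():
        pairs = adjacent_pairs(gene_order)
        common = pairs if common is None else common & pairs
    return sorted(tuple(sorted(pair)) for pair in (common or set()))


GENOMES = {  # hypothetical gene orders
    "species_1": ["geneA", "geneB", "geneC", "geneD"],
    "species_2": ["geneC", "geneX", "geneA", "geneB"],
    "species_3": ["geneB", "geneA", "geneD", "geneC"],
}
print(conserved_neighbors(GENOMES))  # [('geneA', 'geneB')]
```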

Gene Expression and Microarrays
• The development of microarrays has allowed the expression of
thousands of genes to be monitored simultaneously.
• Microarrays rely on nucleic acid hybridization, in which a known
DNA fragment is used as a probe to find complementary sequences.
• The probe is usually fixed to some type of solid support, such as a
nylon filter or a glass slide.
• A solution containing a mixture of DNA or RNA is applied to the solid
support; any nucleic acid that is complementary to the probe will
bind to it.
• Nucleic acids in the mixture are labeled with a radioactive or
fluorescent tag so that molecules bound to the probe can be easily
detected.

How can we examine changes in gene expression?
• Suppose we have two types of cells:
– experimental cells, whose mRNA is converted into cDNA and labeled with red fluorescent nucleotides;
– control cells, whose mRNA is converted into cDNA and labeled with green fluorescent nucleotides.
• The labeled cDNAs are mixed and hybridized to a DNA chip, which contains DNA probes from different genes.
• Hybridization of the red (experimental) and green (control) cDNAs is
proportional to the relative amounts of mRNA in the samples.

• Red indicates overexpression of a gene in the experimental
cells (more red-labeled cDNA hybridizes).
• Green indicates underexpression of a gene in the experimental
cells (more green-labeled cDNA hybridizes).
• Yellow indicates equal expression (equal hybridization of red- and
green-labeled cDNAs).
• No color (black) indicates no expression.
• Because microarrays allow the expression of thousands of genes to be
monitored simultaneously, they can be used:
– to study which genes are active in particular tissues;
– to investigate how gene expression changes in the course of
biological processes such as development or disease progression.
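The color rules above can be expressed as a small classifier on the red and green intensities of each spot. The background cutoff and the twofold threshold are illustrative assumptions, not standard values, and real analyses also normalize the two channels before comparing them. The intensities are hypothetical.

```python
import math


def spot_color(red: float, green: float, background: float = 50.0, fold: float = 2.0) -> str:
    """Classify a spot from experimental (red) and control (green) signal intensities."""
    if red <= background and green <= background:
        return "black"   # no detectable expression in either sample
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    if log_ratio >= math.log2(fold):
        return "red"     # more experimental cDNA bound: overexpressed
    if log_ratio <= -math.log2(fold):
        return "green"   # more control cDNA bound: underexpressed
    return "yellow"      # similar amounts bound: equal expression


SPOTS = {"gene1": (4000, 1000), "gene2": (500, 2000), "gene3": (1500, 1400), "gene4": (10, 20)}
for gene, (red, green) in SPOTS.items():
    print(gene, spot_color(red, green))
```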

Genome-Wide Mutagenesis
• One of the best methods for determining the function of a gene is to
examine the phenotypes of individual organisms that possess a
mutation in the gene.
• To conduct a mutagenesis screen, random mutations are induced in a
population of organisms, creating new phenotypes.
• Two processes, the random induction of mutations on a genome-wide basis and
the mapping of those mutations with molecular markers, are coupled and automated in a
mutagenesis screen.
• Mutagenesis screens can be used to search for specific genes
encoding a particular function or trait.
