Genomics_final.pptx
1.
2. The genome is all the DNA in a cell.
› All the DNA on all the chromosomes
› Includes genes, intergenic sequences, repeats
More precisely, each DNA-containing compartment (the nucleus or an organelle) has its own genome.
Eukaryotes can have 2-3 genomes
› Nuclear genome
› Mitochondrial genome
› Plastid genome
If not specified, “genome” usually refers to the
nuclear genome.
3. Genomics is the study of genomes, including
large chromosomal segments containing many
genes.
The initial phase of genomics aims to map and
sequence an initial set of entire genomes.
Functional genomics aims to deduce
information about the function of DNA
sequences.
› Should continue long after the initial genome
sequences have been completed.
4. Genomics: what is it?
Development and application of genetic mapping, sequencing,
and computation (bioinformatics) to analyze the genomes of
organisms.
Sub-fields of genomics:
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and non-genes).
3. Comparative genomics: comparison of genomes across species.
Includes structural and functional genomics.
Evolutionary genomics.
5. 22 autosome pairs + 2
sex chromosomes
3 billion base pairs in
the haploid genome
Where and what are
the 30,000 to 40,000
genes? (An early estimate; current annotations
list roughly 20,000 protein-coding genes.)
Is there anything else
interesting/important?
From NCBI web site, photo from T. Ried,
Natl Human Genome Research Institute, NIH
6. Human genome has 3.2 billion base pairs
of DNA
About 3% codes for proteins
About 40-50% is repetitive, made by
(retro)transposition
What is the function of the remaining
50%?
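The proportions on this slide can be turned into rough base-pair counts. The sketch below uses only the slide's round figures; the function and variable names are illustrative, not from the original deck.

```python
# Rough breakdown of the human genome from the slide's round figures.
GENOME_BP = 3_200_000_000        # 3.2 billion base pairs
CODING_FRACTION = 0.03           # about 3% codes for proteins
REPEAT_FRACTIONS = (0.40, 0.50)  # about 40-50% is repetitive


def genome_breakdown(genome_bp, coding_fraction, repeat_fraction):
    """Return base-pair counts for coding, repetitive, and remaining DNA."""
    coding_bp = round(genome_bp * coding_fraction)
    repeat_bp = round(genome_bp * repeat_fraction)
    remaining_bp = genome_bp - coding_bp - repeat_bp
    return {
        "coding": coding_bp,
        "repetitive": repeat_bp,
        "remaining": remaining_bp,
    }


if __name__ == "__main__":
    # Show the breakdown at both ends of the 40-50% repeat estimate.
    for repeat_fraction in REPEAT_FRACTIONS:
        parts = genome_breakdown(GENOME_BP, CODING_FRACTION, repeat_fraction)
        for label, bp in parts.items():
            print(f"repeats={repeat_fraction:.0%} {label}: {bp:,} bp "
                  f"({bp / GENOME_BP:.0%})")
```

Depending on which end of the repeat estimate is used, the "remaining" share comes out a little under or a little over half of the genome, which is the slide's "remaining 50%".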
7. Know (close to) all the genes in a genome, and
the sequence of the proteins they encode.
BIOLOGY HAS BECOME A FINITE SCIENCE
› Hypotheses have to conform to what is present,
not what you could imagine could happen.
No longer look at just individual genes
› Examine whole genomes or systems of genes
8. Genetics: study of inherited phenotypes
Genomics: study of genomes
Biochemistry: study of the chemistry of
living organisms and/or cells
Revolution launched by full genome
sequencing
› Many biological problems now have finite (albeit
complex) solutions.
› New era will see an even greater interaction
among these three disciplines
9. Distinct components of genomes
Abundance and complexity of mRNA
Normalized cDNA libraries and ESTs
Genome sequences: gene numbers
Comparative genomics
10. Complex genomes have roughly 10x to 30x
more DNA than is required to encode all the
RNAs or proteins in the organism.
Contributors to the non-coding DNA include:
› Introns in genes
› Regulatory elements of genes
› Multiple copies of genes, including pseudogenes
› Intergenic sequences
› Interspersed repeats
11. Highly repeated DNA
› R (repetition frequency) >100,000
› Almost no information, low complexity
Moderately repeated DNA
› 10<R<10,000
› Little information, moderate complexity
“Single copy” DNA
› R=1 or 2
› Much information, high complexity
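The three classes above can be written as a small lookup on the repetition frequency R. This is a minimal sketch using the slide's thresholds as given; the slide does not assign a class to R between 2 and 10 or between 10,000 and 100,000, so the code labels those values "unclassified" rather than guessing. The function name is hypothetical.

```python
def classify_by_repetition(r):
    """Map a repetition frequency R to the sequence classes on the slide."""
    if r < 1:
        raise ValueError("repetition frequency must be at least 1")
    if r <= 2:
        return "single copy"          # R = 1 or 2: much information
    if 10 < r < 10_000:
        return "moderately repeated"  # little information
    if r > 100_000:
        return "highly repeated"      # almost no information
    # Ranges the slide leaves undefined (assumption: no class is implied).
    return "unclassified"


if __name__ == "__main__":
    for r in (1, 500, 50_000, 1_000_000):
        print(r, "->", classify_by_repetition(r))
```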
12. • Genes were originally defined in terms of
phenotypes of mutants
• Now we have sequences of lots of DNA from
a variety of organisms, so ...
• Which portions of DNA actually do something?
• What do they do?
• code for protein or some other product?
• regulate expression?
• used in replication, etc?
13.
14. Determining a 3D structure
› X-ray crystallography
Structural elements
Modeling a 3D structure
15. Primary: amino acid sequence.
Secondary: alpha helices, beta sheets, and loops.
Tertiary: arrangement of secondary elements in 3D space.
Quaternary: packing of several polypeptide chains.
Given an amino acid sequence, we are interested in its secondary
structures, and how they are arranged in higher structures.
Protein Structures
17. Cα (CA) trace, ball-and-stick, CPK (space-filling)
• It’s often as important to decide what to omit as it is to decide what to
include
• What you omit depends on what you want to emphasize
19. EBI (PDBe)
› Lots of hyperlinks out
› Educational info (proteins of the month)
RCSB (PDB)
› Lots of hyperlinks out
› Educational info (proteins of the month)
21. The last 15 years have
witnessed an explosion in
the number of known
protein structures. How
do we make sense of all
this information?
Figure: growth in the number of known protein structures (blue bars: yearly total; red bars: cumulative total). N = 87,153; non-redundant ≈ 49,158.
22.
23. Classification of Protein Structures
The explosion of protein structures has led to the development of
hierarchical systems for comparing and classifying them.
Effective protein classification systems allow us to address several
fundamental and important questions:
If two proteins have similar structures, are they related by
common ancestry, or did they converge on a common theme from
two different starting points?
How likely is it that two proteins with similar structures have the
same function?
Put another way, if I have experimental knowledge of, or can
somehow predict, a protein’s structure, I can fit it into known
classification systems. How much do I then know about that
protein? Do I know what other proteins it is homologous to? Do I
know what its function is?
24. “A polypeptide or part of a polypeptide chain that
can independently fold into a stable tertiary
structure...”
from Introduction to Protein Structure, by Branden &
Tooze
“Compact units within the folding pattern of a
single chain that look as if they should have
independent stability.”
from Introduction to Protein Architecture, by Lesk
Thus, domains:
• can be built from structural motifs;
• are independently folding elements;
• are functional units;
• are separable by proteases.
Two domains of a
bifunctional enzyme
25. Proteins often have a modular organization
Single polypeptide chain may be divisible into smaller independent
units of tertiary structure called domains
Domains are the fundamental units of structure classification
Different domains in a protein are also often associated with different
functions carried out by the protein, though some functions occur at
the interface between domains
Residues 1-60: activation domain
Residues 100-300: sequence-specific DNA binding domain
Residues 324-355: tetramerization domain
Residues 363-393: non-specific DNA-binding domain
domain organization of P53 tumor suppressor
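The domain map above amounts to a lookup from residue number to domain. The sketch below assumes the boundary numbers on the slide pair up as (start, end) ranges in the order shown, which is an inference from the diagram layout; residues outside those ranges are treated as linker regions. All names are illustrative.

```python
# (start, end, name) ranges read off the slide's p53 domain diagram.
# Assumption: consecutive boundary numbers form inclusive ranges.
P53_DOMAINS = [
    (1, 60, "activation domain"),
    (100, 300, "sequence-specific DNA binding domain"),
    (324, 355, "tetramerization domain"),
    (363, 393, "non-specific DNA-binding domain"),
]
P53_LENGTH = 393  # last residue number on the diagram


def domain_at(residue):
    """Return the domain containing a residue, or None for a linker region."""
    if not 1 <= residue <= P53_LENGTH:
        raise ValueError(f"residue must be between 1 and {P53_LENGTH}")
    for start, end, name in P53_DOMAINS:
        if start <= residue <= end:
            return name
    return None


if __name__ == "__main__":
    for residue in (30, 80, 175, 340, 380):
        print(residue, "->", domain_at(residue) or "linker (no domain listed)")
```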
26. Not all proteins change at
the same rate;
Why?
Functional pressures
› Surface residues are
observed to change most
frequently;
› Interior less frequently;
29. Proteins reflect millions of years of evolution.
Most proteins belong to large evolutionary families.
3D structure is better conserved than sequence during
evolution.
Similarities between sequences or between structures may
reveal information about shared biological functions of a protein
family.
30. How is a 3D structure determined?
1. Experimental methods (Best approach):
• X-ray crystallography - stable fold, good quality crystals.
• NMR - stable fold, not suitable for large molecules.
2. In-silico methods (partial solutions -
based on similarity):
• Sequence or profile alignment - uses similar sequences,
limited use of 3D information.
• Threading - needs 3D structure, combinatorial complexity.
• Ab-initio structure prediction - not always successful.
41. A structure is a
“MODEL”!!
What does that
mean?
› It is someone’s
interpretation of the
primary data!!!
42.
43. A comparison of gene numbers,
gene locations, and biological functions
of genes in the genomes of different
organisms, one objective being to
identify groups of genes that play a
unique biological role in a particular
organism.
44. Homology: the relationship of any two
characters (such as two proteins that have
similar sequences) that have descended,
usually through divergence, from a common
ancestral character.
Homologues are thus components or
characters (such as genes/proteins with
similar sequences) that can be attributed to a
common ancestor of the two organisms
during evolution.
45. Orthologues are homologues that have evolved
from a common ancestral gene by speciation. They
usually have similar functions.
Paralogues are homologues that are related or
produced by duplication within a genome followed
by subsequent divergence. They often have
different functions.
Xenologues are homologues that are related by an
interspecies (horizontal) transfer of the genetic
material for one of the homologues. The functions
of the xenologues are quite often similar.
46. Analogues are non-homologous
genes/proteins that have descended
convergently from unrelated
ancestors. They have similar functions
although they are unrelated in either
sequence or structure.
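The definitions on slides 45 and 46 reduce to two questions: do the two genes share a common ancestor, and if so, which event separated them? The sketch below encodes that decision. It assumes the evolutionary history is already known and, for the analogue case, that the two genes have similar functions; the function name and event labels are hypothetical.

```python
# Event that separated two homologous genes -> name of the relationship.
HOMOLOGUE_TYPES = {
    "speciation": "orthologues",           # usually similar functions
    "duplication": "paralogues",           # often different functions
    "horizontal transfer": "xenologues",   # functions quite often similar
}


def classify_pair(common_ancestor, event=None):
    """Name the relationship between two genes with a known history.

    common_ancestor: True if both genes descend from one ancestral gene.
    event: how the two copies arose; ignored when there is no common ancestor.
    """
    if not common_ancestor:
        # Assumes similar function reached by convergence.
        return "analogues"
    # Fall back to the general term when the event is unknown.
    return HOMOLOGUE_TYPES.get(event, "homologues")


if __name__ == "__main__":
    print(classify_pair(True, "speciation"))
    print(classify_pair(True, "duplication"))
    print(classify_pair(True, "horizontal transfer"))
    print(classify_pair(False))
```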
47. Comparative Genomics
Two very large problems are immediately apparent in
undertaking the sequencing of entire genomes.
First, the vast numbers of species and the much larger size of
some genomes make the entire sequencing of all genomes a
non-optimal approach for understanding genome structure.
Second, within a given species most individuals are genetically
distinct in a number of ways. What does it actually mean, for
example, to "sequence a human genome"? The genomes of two
individuals who are genetically distinct differ with respect to DNA
sequence by definition.
These two problems, and the potential for other novel
applications, have given rise to new approaches which, taken
together, constitute the field of comparative genomics.
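The point that two individuals' genomes differ in DNA sequence can be made concrete by counting mismatches between two aligned sequences. This is a toy sketch: the two short sequences are made up for illustration, and it assumes the sequences are already aligned and of equal length, which real genome comparisons cannot take for granted.

```python
def count_differences(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical fragments from two individuals (not real data).
    individual_1 = "ACGTTGCAAC"
    individual_2 = "ACGTCGCAAT"
    print("differences:", count_differences(individual_1, individual_2))
    print("identity (%):", percent_identity(individual_1, individual_2))
```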
48. Because all modern genomes have arisen from common ancestral
genomes, the relationships between genomes can be studied
with this fact in mind. This commonality means that information
gained in one organism can have application in other, even
distantly related, organisms.
Comparative genomics enables the application of information
gained from facile model systems to agricultural and medical
problems. The nature and significance of differences between
genomes also provide a powerful tool for determining the
relationship between genotype and phenotype through
comparative genomics and morphological and physiological
studies.