2. Genomics
Genomics is the study of the
genomes (i.e. the entire
hereditary information) of
organisms and includes:
• Determining the entire DNA
sequence
• Fine-scale genetic mapping
• Studies of intragenomic
phenomena
Used to determine an ideal
genotype instead of just a
few genes.
The study of whole genomes
of populations of individuals
can reveal the genetic basis
of different responses to both
biotic and abiotic stresses
Requires a large amount of
information per individual:
• Expensive in agriculture
where many individuals
need to be analyzed
3. Benefits of Genomics to Crop
Improvement
Unlimited possibilities for crop improvement,
especially in combination with genetic
engineering:
• Improved crop productivity
• Increased nutritional quality and quantity
• Tolerance to abiotic stresses – drought, low
quality soils (acidity, low nutrient content)
• Tolerance to biotic stresses – pests and diseases
• Etc.
4. Proteomics
The study of the proteins
present in a cell (specific
time and conditions)
Proteomics includes:
• Identification of all proteins
in a cell
• Posttranslational
modifications of proteins
• Protein-protein interactions
• Subcellular location of
proteins
5. Bioinformatics
A term describing the tools to
handle the enormous
amounts of data coming from
the genomics and proteomics
programs
Makes possible the
investigation of correlations
which would not be possible
manually
The algorithms to analyze
the data are still at an
experimental stage and there
are still questions over doing
experiments in silico which
might not be relevant in the
biological world
6. Subfields of Genomics
Structural genomics:
• Construction of genomic sequence data
• Gene discovery and localization
• Construction of gene maps
Functional genomics:
• Biological function of genes
• Regulation
• Products
• Plant development studies
Comparative genomics:
• Compares gene sequences to elucidate
functional or evolutionary relationships
7. Structural Genomics
Uses DNA sequencing technology and
software programs to generate, store, and
analyze genomic sequence information
Two approaches to genome sequencing:
• Map-based sequencing
• Shotgun sequencing
10. Shotgun Sequencing
• Multiple copies of the genome are randomly
shredded into pieces that are 2,000 bp long by
squeezing the DNA through a pressurized
syringe. This is done a second time to generate
pieces that are 10,000 bp long
• Each 2,000 and 10,000 bp fragment is
inserted into a plasmid
– The two collections of plasmids containing
2,000 and 10,000 bp chunks of DNA are
plasmid libraries
• Both the 2,000 and the 10,000 bp plasmid
libraries are sequenced. 500 bp from each
end of each fragment are decoded,
generating millions of sequences
Sequencing both ends of each insert is
critical for assembling the entire
chromosome
• Computer algorithms assemble the millions
of sequenced fragments into a continuous
stretch resembling each chromosome
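The assembly step can be illustrated with a toy sketch. This is not the algorithm used by real assemblers: it is a simple greedy overlap merge, it ignores the paired-end information described above, and the reads, function names, and `min_len` threshold are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 10 bp reads that overlap by 5 bp each.
reads = ["ATGGCGTGCA", "GTGCAATTCG", "ATTCGGATAG"]
print(greedy_assemble(reads))
```

With real data the reads number in the millions and contain errors and repeats, so production assemblers use far more elaborate methods; the sketch only shows why overlapping fragments are enough to rebuild a continuous stretch.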
11. Finding the Genes
After sequencing, need to find the
genes, using computer algorithms –
this step is called ‘annotation’
Annotation identifies:
• Protein-coding genes
• Initiation sequences
• Regulatory sequences
• Termination sequences
• Nonprotein-coding sequences
12. Finding the Genes, Cont’d
The identifying features of protein-coding
genes are open reading frames (ORFs):
• Continuous sets of DNA nucleotide triplets
that can be translated into the amino acid
sequence of a protein
• ORFs begin with an initiation sequence,
usually ATG
• ORFs end with a termination sequence,
usually TAA, TAG or TGA
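A minimal sketch of an ORF scan, assuming a plain forward-strand search with ATG as the only start codon and TAA, TAG or TGA as stop codons. The function name `find_orfs`, the `min_codons` threshold, and the example sequence are hypothetical; real annotation software also scans the reverse strand and handles introns.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is one past the
    last base of the stop codon. `min_codons` counts the codons before
    the stop codon (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close it and keep scanning this frame
    return orfs


# Hypothetical sequence containing two short ORFs in the same frame.
for start, end, seq in find_orfs("CCATGGCTTGATAAATGAAATAGC"):
    print(start, end, seq)
```

Raising `min_codons` is the usual way to suppress short chance ORFs, since random sequence contains a stop codon in a given frame fairly often.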
14. Analysis of DNA Sequence Information I:
Location of Genes Not Apparent
15. Analysis of DNA Sequence Information II:
Location of Regulatory Sequences and ORFs
16. Gene Function?
After the genome sequence is annotated, functions
need to be assigned to all genes in the sequence
• Some of the identified genes might have functions
assigned already via classical methods of
mutagenesis and linkage mapping
• Some may not have assigned functions – use
homology searches:
Computer-based comparisons of the sequence under
study with known sequences from other organisms
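The idea of a homology search can be sketched as follows. This is a stand-in technique, not how production tools such as BLAST work: it scores similarity as the fraction of shared k-mers (Jaccard index). The sequences, the names `gene_A` and `gene_B`, and the choice `k = 4` are all hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(query: str, known: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Name and score of the known sequence most similar to the query."""
    scores = {name: jaccard(query, seq, k) for name, seq in known.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical database of sequences with known function.
known = {
    "gene_A": "ATGGCGTGCAATTCGGATAG",
    "gene_B": "TTTTCCCCGGGGAAAATTTT",
}
# Query differs from gene_A by a single base.
query = "ATGGCGTGCTATTCGGATAG"
print(best_match(query, known))
```

A high score against a gene of known function suggests, but does not prove, a similar function for the query; real searches use alignment scores and statistical significance rather than a raw k-mer overlap.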
18. Unique Features of Eukaryotic Genomes
Gene density
• Wide range compared to prokaryotes
Introns
• Wide variation among eukaryotes
Repetitive sequences
• Along with the presence of introns, repetitive
sequences are responsible for the wide range
of genome sizes in eukaryotes
In maize, two thirds of the genome consists of
repetitive DNA
19. Plant Model Organisms
• Arabidopsis thaliana
--Model flowering plant and dicot
--Sequence finished in 2000
--First flowering plant to be sequenced
• Oryza sativa (rice)
--Model monocot
--Sequence finished in 2005
--First crop plant to be sequenced
• Medicago truncatula (barrel medic)
--Model legume
• Lycopersicon esculentum (tomato)
--Model fruit-bearing plant
Note: Hundreds of other genomes (plant, animal, bacterial and viral)
have been, or are being, sequenced
20. Arabidopsis Sequencing Facts
• Arabidopsis has a small genome (125 Mb) on 5
chromosomes
- Human has 3,000 Mb on 23 chromosomes
- Maize has 2,500 Mb on 10 chromosomes
- Medicago has 520 Mb on 8 chromosomes
- Rice has 430 Mb on 12 chromosomes
- Lily has 50,000 Mb on 12 chromosomes
• Arabidopsis has approx.
25,500 genes
- Humans have slightly fewer,
about 24,000
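The figures on this slide are enough for a rough gene-density comparison. The sketch below uses only the genome sizes and gene counts quoted above for Arabidopsis and human; the function and variable names are illustrative.

```python
# (genome size in Mb, approximate gene count), taken from the slide above
genomes = {
    "Arabidopsis": (125, 25_500),
    "Human": (3_000, 24_000),
}


def genes_per_mb(size_mb: float, gene_count: int) -> float:
    """Average number of genes per megabase of genome."""
    return gene_count / size_mb


density = {name: genes_per_mb(size, genes) for name, (size, genes) in genomes.items()}
for name, value in density.items():
    print(f"{name}: {value:.0f} genes per Mb")

# How many times denser the Arabidopsis genome is than the human genome
print(density["Arabidopsis"] / density["Human"])
```

On these numbers Arabidopsis packs roughly 200 genes into each megabase against about 8 for human, which is why a similar gene count fits into a genome more than twenty times smaller.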
22. Comparative Genomics
The study of the relationship of genome structure
and function across different biological species or
strains:
• Holds great promise to yield insights into many
aspects of the evolution of modern species
Enormous potential for crop genetics and breeding
• The vast amount of information contained in
modern genomes necessitates that the methods of
comparative genomics are automated
• Having come a long way from its initial use of
finding functional proteins, comparative genomics
is now concentrating on finding regulatory regions
and other features of the genome
Editor's Notes
Bullet 1: The genome of an organism is the complete genetic sequence of a full set of chromosomes in a gamete. Intragenomic phenomena include heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics. Research on single genes does not fall into the definition of genomics unless the aim of the study is to elucidate its effect on, place in, and response to the entire genome.
Heterosis or hybrid vigor: increased strength of different characteristics in hybrids.
Epistasis: the interaction between genes, when the effects of one gene are modified by one or several other genes, which are sometimes called modifier genes.
Pleiotropy: the genetic effect of a single gene on multiple phenotypic traits.
Bullet 3: beyond the single-gene effects which are currently being studied.
Outgrowth of genomics. Analyzing the functional aspects of genes (i.e. the proteins and their activation) provides information about how genes operate to affect traits. This information enables rational design of modifications to genes and genomes rather than simply identifying ‘good’ or ‘bad’ genes. The level of complexity is very high, and currently the protein spectrum in very few tissues (mostly human) has been analyzed in any detail. Metabolomics: study of the metabolome, the complete set of metabolites in an organism.
Bullet 1: Not a technology per se, but already a vital tool to handle data; much of the software was developed during the Human Genome Project in the 1990s. Bullet 3: An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.
A genetic map is a chromosome map of a species or experimental population that shows the position of its known genes and/or markers relative to each other, rather than as specific physical points on each chromosome. Map-based sequencing begins with the construction of a genomic library using vectors that can accommodate large fragments of an organism’s genome. A genomic library is a population of host bacteria, each of which carries a DNA molecule that was inserted into a cloning vector, such that the collection of cloned DNA molecules represents the entire genome of the source organism. Next, the clones are assembled into genetic maps. Genetic maps are based on the frequencies of recombination between markers during crossover of homologous chromosomes. The greater the frequency of recombination (segregation) between two genetic markers, the farther apart they are assumed to be.
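The link between recombination frequency and map distance can be made concrete with a small worked example. It assumes the standard convention that 1% recombination corresponds to 1 centimorgan (cM), which holds only approximately and only for closely spaced markers. The offspring counts and function names are hypothetical.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring that are recombinant for two markers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Approximate genetic distance in centimorgans (1% recombination = 1 cM)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * recombinants / total


# Hypothetical cross: 18 recombinant offspring out of 200 scored.
print(recombination_frequency(18, 200))
print(map_distance_cm(18, 200))
```

Two markers with 18 recombinants in 200 offspring would be placed about 9 cM apart, while a pair with 2 recombinants in 200 would sit about 1 cM apart, so the second pair is assumed to be physically closer.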
A physical map is based on direct analysis of DNA rather than recombination frequency, which increases map resolution. See fig.: physical map based on a set of overlapping ordered clones (contigs). A contig (from "contiguous") is a set of overlapping DNA segments derived from a single genetic source. Each clone is sequenced individually. Increasing resolution from start to finish.
The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Much faster than map-based sequencing. Combination of sequencing technology and software development. Clonal selection is random. Last step: computer facilitates identification of sequence overlaps to assemble the total sequence.
But the organization of genes in eukaryotes makes direct searching for ORFs more difficult than in prokaryotes. 1. Eukaryotic genes have introns (non-coding regions) between coding regions, so most eukaryotic genes comprise ORFs (exons) interspersed with introns. Bottom line: it is important to distinguish between introns, exons, and genes.
Shows a portion of the human genome sequence. Initially it is not clear if it contains any genes, but control regions at the beginning of genes are marked by identifiable sequences. Splice sites between exons and introns have a predictable sequence: most introns begin with GT and end with AG. End of gene has a poly-A tail. If a DNA sequence encodes a protein, after splicing the sequence contains one or more ORFs. In molecular biology, splicing is a modification of an RNA after transcription, in which introns are removed and exons are joined. This is needed for the typical eukaryotic messenger RNA before it can be used to produce a correct protein through translation. For many eukaryotic introns, splicing is done in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs), but there are also self-splicing introns.
Analysis of the sequence shows it contains a control region and three exons. Using this sequence to search genomic databases showed this is the sequence of a single gene, the human beta-globin gene. Most introns begin with GT and end with AG. End of gene has a poly-A tail.
Now consider some features of eukaryotic genomes. Basic features are similar; genome size is highly variable: 10,000-fold range between fungi and flowering plants. Number of genes varies much less.
Fruit fly of the plant world. Small. Short generation time. Small genome on 5 chromosomes. Map-based sequencing used.
Assignment of genes based on homology searches. Crop plants have much larger genomes than Arabidopsis, but some have about the same number of genes. In the large-genome plants, genes are clustered in stretches of DNA separated by long stretches of spacer DNA. Sequencing of the rice genome completed in 2005. 90% of Arabidopsis genes are found in rice, but only 70% of rice genes are found in Arabidopsis. Indicates that cereal crops may have unique gene sets. Identification of these unique genes is critical to realizing the full potential of genomics (and genetic engineering) in helping to feed the Earth’s growing population.