Genomics and Bioinformatics          Peter Gregory and Senthil Natesan
  • Bullet 1: The genome of an organism is the complete genetic sequence of a full set of chromosomes in a gamete. Intragenomic phenomena include heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, investigating the roles and functions of single genes is a primary focus of molecular biology or genetics; research on a single gene falls under genomics only if the aim is to elucidate its effect on, place in, and response to the entire genome. Heterosis (hybrid vigor): increased strength of different characteristics in hybrids. Epistasis: interaction between genes, in which the effects of one gene are modified by one or several other genes (sometimes called modifier genes). Pleiotropy: the effect of a single gene on multiple phenotypic traits. Bullet 3: Goes beyond the single-gene effects that are currently being studied.
  • Proteomics is an outgrowth of genomics. Analyzing the functional aspects of genes (i.e. the proteins and their activation) provides information about how genes operate to affect traits. This information enables rational design of modifications to genes and genomes rather than simply identifying ‘good’ or ‘bad’ genes. The level of complexity is very high, and so far the protein spectrum of very few tissues (mostly human) has been analyzed in any detail. Metabolomics: the study of the metabolome, the complete set of metabolites in an organism.
  • Bullet 1: Not a technology per se, but already a vital tool for handling data; much of the software was developed during the Human Genome Project in the 1990s. Bullet 3: An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.
  • A genetic map is a chromosome map of a species or experimental population that shows the position of its known genes and/or markers relative to each other, rather than as specific physical points on each chromosome. Map-based sequencing begins with the construction of a genomic library using vectors that can accommodate large fragments of an organism’s genome. A genomic library is a population of host bacteria, each of which carries a DNA molecule that was inserted into a cloning vector, such that the collection of cloned DNA molecules represents the entire genome of the source organism. Next, the clones are assembled into genetic maps. Genetic maps are based on the frequencies of recombination between markers during crossover of homologous chromosomes. The greater the frequency of recombination (segregation) between two genetic markers, the farther apart they are assumed to be.
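The link between recombination frequency and map distance described in this note can be illustrated with a small calculation. The sketch below is only an illustration: the marker names and offspring counts are invented, and the conversion of 1% recombination to 1 centimorgan (cM) is a common simplifying convention that is not stated in the notes and is reasonable only for closely linked markers.

```python
def recombination_frequency(recombinants: int, total_offspring: int) -> float:
    """Fraction of offspring that show a recombinant marker combination."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinants <= total_offspring:
        raise ValueError("recombinants must be between 0 and total_offspring")
    return recombinants / total_offspring


def map_distance_cm(frequency: float) -> float:
    """Assumed convention: 1% recombination corresponds to 1 cM."""
    return frequency * 100


# Hypothetical counts: (recombinant offspring, total offspring) per marker pair.
crosses = {("A", "B"): (12, 200), ("B", "C"): (30, 200)}

for (m1, m2), (rec, total) in crosses.items():
    rf = recombination_frequency(rec, total)
    print(f"{m1}-{m2}: RF = {rf:.2f}, about {map_distance_cm(rf):.0f} cM")
```

With these invented counts, markers B and C recombine more often than A and B, so they would be placed farther apart on the genetic map.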
  • A physical map is based on direct analysis of DNA rather than recombination frequency, which increases map resolution. See figure: the physical map is based on a set of overlapping ordered clones (contigs). A contig (from "contiguous") is a set of overlapping DNA segments derived from a single genetic source. Each clone is sequenced individually, with resolution increasing from start to finish.
  • The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. It is much faster than map-based sequencing and relies on a combination of sequencing technology and software development. Clone selection is random. Last step: a computer identifies sequence overlaps to assemble the total sequence.
  • But the organization of genes in eukaryotes makes direct searching for ORFs more difficult than in prokaryotes. Eukaryotic genes have introns (non-coding regions) between coding regions, so most eukaryotic genes comprise coding segments (exons) interspersed with introns. Bottom line: it is important to distinguish between introns, exons, and genes.
  • Shows a portion of the human genome sequence. Initially it is not clear whether it contains any genes, but control regions at the beginning of genes are marked by identifiable sequences. Splice sites between exons and introns have a predictable sequence: most introns begin with GT and end with AG. The end of a gene is marked by a polyadenylation signal (the transcript receives a poly-A tail). If a DNA sequence encodes a protein, the sequence contains one or more ORFs after splicing. In molecular biology, splicing is a modification of an RNA after transcription in which introns are removed and exons are joined. This is needed for the typical eukaryotic messenger RNA before it can be used to produce a correct protein through translation. For many eukaryotic introns, splicing is done in a series of reactions catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs), but there are also self-splicing introns.
  • Analysis of the sequence shows that it contains a control region and three exons. Using this sequence to search genomic databases showed that it is the sequence of a single gene: the human beta-globin gene. Most introns begin with GT and end with AG. The end of the gene is marked by a polyadenylation signal (the transcript receives a poly-A tail).
  • Now consider some features of eukaryotic genomes. Basic features are similar, but genome size is highly variable: there is a 10,000-fold range between fungi and flowering plants. The number of genes varies much less.
  • Arabidopsis is the fruit fly of the plant world: small, with a short generation time and a small genome on 5 chromosomes. Map-based sequencing was used.
  • Assignment of genes is based on homology searches. Crop plants have much larger genomes than Arabidopsis, but some have about the same number of genes. In the large-genome plants, genes are clustered in stretches of DNA separated by long stretches of spacer DNA. Sequencing of the rice genome was completed in 2005. 90% of Arabidopsis genes are found in rice, but only 70% of rice genes are found in Arabidopsis, which indicates that cereal crops may have unique gene sets. Identifying these unique genes is critical to realizing the full potential of genomics (and genetic engineering) in helping to feed the Earth’s growing population.

    1. Genomics and Bioinformatics
       Peter Gregory and Senthil Natesan
    2. Genomics
       Genomics is the study of the genomes (i.e. the entire hereditary information) of organisms and includes:
       • Determining the entire DNA sequence
       • Fine-scale genetic mapping
       • Studies of intragenomic phenomena
       Requires a large amount of information per individual:
       • Expensive in agriculture, where many individuals need to be analyzed
       Used to determine an ideal genotype instead of just a few genes. The study of whole genomes of populations of individuals can reveal the genetic basis of different responses to both biotic and abiotic stresses.
    3. Benefits of Genomics to Crop Improvement
       Unlimited possibilities for crop improvement, especially in combination with genetic engineering:
       • Improved crop productivity
       • Increased nutritional quality and quantity
       • Tolerance to abiotic stresses: drought, low-quality soils (acidity, low nutrient content)
       • Tolerance to biotic stresses: pests and diseases
       • Etc.
    4. Proteomics
       The study of the proteins present in a cell (at a specific time and under specific conditions)
       Proteomics includes:
       • Identification of all proteins in a cell
       • Posttranslational modifications of proteins
       • Protein-protein interaction
       • Subcellular location of proteins
    5. Bioinformatics
       A term describing the tools to handle the enormous amounts of data coming from the genomics and proteomics programs
       Makes possible the investigation of correlations which would not be possible manually
       The algorithms to analyze the data are still at an experimental stage, and there are still questions over doing experiments in silico which might not be relevant in the biological world
    6. Subfields of Genomics
       Structural genomics:
       • Construction of genomic sequence data
       • Gene discovery and localization
       • Construction of gene maps
       Functional genomics:
       • Biological function of genes
       • Regulation
       • Products
       • Plant development studies
       Comparative genomics:
       • Compares gene sequences to elucidate functional or evolutionary relationships
    7. Structural Genomics
       Uses DNA sequencing technology and software programs to generate, store, and analyze genomic sequence information
       Two approaches to genome sequencing:
       • Map-based sequencing
       • Shotgun sequencing
    8. Map-Based Sequencing
    9. Map-Based Sequencing
    10. Shotgun Sequencing
        • Multiple copies of the genome are randomly shredded into pieces about 2,000 bp long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces about 10,000 bp long
        • Each 2,000 and 10,000 bp fragment is inserted into a plasmid
          - The two collections of plasmids containing 2,000 and 10,000 bp chunks of DNA are plasmid libraries
        • Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. 500 bp from each end of each fragment are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome
        • Computer algorithms assemble the millions of sequenced fragments into a continuous stretch representing each chromosome
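The final assembly step on this slide can be sketched with a toy greedy overlap merger. This is a minimal illustration and not the method used by any real assembler: the function names, the `min_len` parameter and the reads are invented, and the sketch ignores sequencing errors, repeats, reverse-complement reads and the paired-end information the slide calls critical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # nothing left to merge; several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


# Invented reads taken from a made-up 22-base "genome".
reads = ["TTAGCCTAGGAT", "ATGGCGTACG", "GTACGTTAGC"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can misassemble repetitive sequence, which is one reason the slide stresses sequencing both ends of each insert.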
    11. Finding the Genes
        After sequencing, the genes need to be found using computer algorithms; this step is called ‘annotation’
        Annotation identifies:
        • Protein-coding genes
        • Initiation sequences
        • Regulatory sequences
        • Termination sequences
        • Nonprotein-coding sequences
    12. Finding the Genes, Cont’d
        The identifying features of protein-coding genes are open reading frames (ORFs):
        • Continuous sets of DNA nucleotide triplets that can be translated into the amino acid sequence of a protein
        • ORFs begin with an initiation sequence, usually ATG
        • ORFs end with a termination sequence, usually TAA, TAG or TGA
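The ORF definition on this slide translates directly into a simple scan. The sketch below is a simplified illustration under stated assumptions: it checks only the three forward reading frames, ignores introns and the reverse strand, and the `min_codons` threshold and the example sequence are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ATG...stop run on the forward strand.

    start and end are 0-based, end is exclusive, and the stop codon is included.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGGCTTGGTAACC"))
# [(2, 14, 'ATGGCTTGGTAA')]
```

Real annotation pipelines combine ORF scanning with splice-site, promoter and homology evidence, since a scan like this cannot see across introns.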
    13. Introns and Exons
    14. Analysis of DNA Sequence Information I: Location of Genes Not Apparent
    15. Analysis of DNA Sequence Information II: Location of Regulatory Sequence and ORFs
    16. Gene Function?
        After the genome sequence is annotated, functions need to be assigned to all genes in the sequence
        • Some of the identified genes might have functions assigned already via classical methods of mutagenesis and linkage mapping
        • Some may not have assigned functions; for these, use homology searches:
          - Computer-based comparisons of the sequence under study with known sequences from other organisms
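The idea of a homology search, comparing a sequence of unknown function with known sequences, can be sketched with a crude similarity score. This is only a stand-in for illustration: real tools rely on sequence alignment with statistical scoring, whereas the sketch below ranks database entries by the fraction of shared 4-base words (Jaccard similarity). The gene names, sequences and the value of `k` are invented.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """All overlapping words of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def best_hit(query: str, database: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Name and score of the database sequence most similar to the query."""
    name = max(database, key=lambda n: similarity(query, database[n], k))
    return name, similarity(query, database[name], k)


# Hypothetical database of sequences with known function.
known = {
    "gene_x": "ATGGCTTGGTAACCGT",
    "gene_y": "TTTTCCCCAAAAGGGG",
}
name, score = best_hit("ATGGCTTGGTAACCGA", known)
print(f"best hit: {name} (similarity {score:.2f})")
```

A high-scoring hit suggests, but does not prove, a shared function; the assignment still needs biological confirmation.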
    17. Genome Size and Gene Number in Selected Eukaryotes
    18. Unique Features of Eukaryotic Genomes
        Gene density
        • Wide range compared to prokaryotes
        Introns
        • Wide variation among eukaryotes
        Repetitive sequences
        • Along with the presence of introns, repetitive sequences are responsible for the wide range of genome sizes in eukaryotes
          - In maize, two thirds of the genome comprises repetitive DNA
    19. Plant Model Organisms
        • Arabidopsis thaliana
          - Model flowering plant and dicot
          - Sequence finished in 2000
          - First flowering plant to be sequenced
        • Oryza sativa (rice)
          - Model monocot
          - Sequence finished in 2005
          - First crop plant to be sequenced
        • Medicago truncatula (barrel medic)
          - Model legume
        • Lycopersicon esculentum (tomato)
          - Model fruit-bearing plant
        Note: Hundreds of other genomes (plant, animal, bacterial and viral) have been, or are being, sequenced
    20. Arabidopsis Sequencing Facts
        • Arabidopsis has a small genome (125 Mb) on 5 chromosomes
          - Human has 3,000 Mb on 23 chromosomes
          - Maize has 2,500 Mb on 10 chromosomes
          - Medicago has 520 Mb on 8 chromosomes
          - Rice has 430 Mb on 12 chromosomes
          - Lily has 50,000 Mb on 12 chromosomes
        • Arabidopsis has approx. 25,500 genes
          - Humans have slightly fewer, about 24,000
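The figures on this slide imply very different gene densities, which a short calculation makes concrete. The sketch below uses only the genome sizes and gene counts quoted above for Arabidopsis and human; the function name is invented, and the result is a plain average that says nothing about how genes are actually distributed along the chromosomes.

```python
# Genome size in Mb and approximate gene count, as quoted on the slide.
GENOMES = {
    "Arabidopsis": (125, 25_500),
    "Human": (3_000, 24_000),
}


def kb_per_gene(genome_mb: float, genes: int) -> float:
    """Average amount of DNA per gene, in kilobases."""
    if genes <= 0:
        raise ValueError("genes must be positive")
    return genome_mb * 1000 / genes


for name, (size_mb, gene_count) in GENOMES.items():
    print(f"{name}: {kb_per_gene(size_mb, gene_count):.1f} kb of DNA per gene")
# Arabidopsis: 4.9 kb of DNA per gene
# Human: 125.0 kb of DNA per gene
```

On these numbers the human genome carries roughly 25 times more DNA per gene than Arabidopsis, even though the gene counts are similar.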
    21. Arabidopsis Genome
    22. Comparative Genomics
        The study of the relationship of genome structure and function across different biological species or strains:
        • Holds great promise to yield insights into many aspects of the evolution of modern species
          - Enormous potential for crop genetics and breeding
        • The vast amount of information contained in modern genomes necessitates that the methods of comparative genomics are automated
        • Having come a long way from its initial use of finding functional proteins, comparative genomics is now concentrating on finding regulatory regions and other features of the genome
