Genomics and bioinformatics

Peter Gregory and Senthil Natesan

Genomics
 Genomics is the study of the
genomes (i.e. the entire
hereditary information) of
organisms and includes:
• Determining the entire DNA
sequence
• Fine-scale genetic mapping
• Studies of intragenomic
phenomena
 Used to determine an ideal
genotype instead of just a
few genes.
 The study of whole genomes
of populations of individuals
can reveal the genetic basis
of different responses to both
biotic and abiotic stresses
 Requires a large amount of
information per individual:
• Expensive in agriculture
where many individuals
need to be analyzed

Benefits of Genomics to Crop
Improvement
 Unlimited possibilities for crop improvement,
especially in combination with genetic
engineering:
• Improved crop productivity
• Increased nutritional quality and quantity
• Tolerance to abiotic stresses – drought, low
quality soils (acidity, low nutrient content)
• Tolerance to biotic stresses – pests and diseases
• Other agronomic traits

Proteomics
 The study of the proteins
present in a cell (specific
time and conditions)
 Proteomics includes:
• Identification of all proteins
in a cell
• Posttranslational
modifications of proteins
• Protein-protein interaction
• Subcellular location of
proteins

Bioinformatics
 A term describing the tools to
handle the enormous
amounts of data coming from
the genomics and proteomics
programs
 Makes possible the
investigation of correlations
which would not be possible
manually
 The algorithms to analyze
the data are still at an
experimental stage and there
are still questions over doing
experiments in silico which
might not be relevant in the
biological world

Subfields of Genomics
 Structural genomics:
• Construction of genomic sequence data
• Gene discovery and localization
• Construction of gene maps
 Functional genomics:
• Biological function of genes
• Regulation
• Products
• Plant development studies
 Comparative genomics:
• Compares gene sequences to elucidate
functional or evolutionary relationships

Structural Genomics

 Uses DNA sequencing technology and
software programs to generate, store, and
analyze genomic sequence information
 Two approaches to genome sequencing:
• Map-based sequencing
• Shotgun sequencing

Shotgun Sequencing
• Multiple copies of the genome are randomly
shredded into pieces about 2,000 bp long by
squeezing the DNA through a pressurized
syringe. This is done a second time to
generate pieces that are 10,000 bp long
• Each 2,000 and 10,000 bp fragment is
inserted into a plasmid
– The two collections of plasmids containing
2,000 and 10,000 bp chunks of DNA are
plasmid libraries
• Both the 2,000 and the 10,000 bp plasmid
libraries are sequenced. 500 bp from each
end of each fragment are decoded,
generating millions of sequences
– Sequencing both ends of each insert is
critical for assembling the entire
chromosome
• Computer algorithms assemble the millions
of sequenced fragments into a continuous
stretch resembling each chromosome
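
The assembly step can be illustrated with a toy example. The sketch below is a minimal greedy overlap merger, not the algorithm used by any real assembler: it assumes error-free reads from one strand, ignores the paired-end information described above, and the function names (overlap, greedy_assemble) are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 21 bp "chromosome" cut into three overlapping reads.
genome = "ATGGCGTACGTTAGCCTAGGA"
reads = [genome[0:10], genome[5:15], genome[10:21]]
print(greedy_assemble(reads))
```

Repetitive DNA breaks this simple approach, because identical repeats give misleading overlaps; that is one reason the two ends of each insert, a known distance apart, matter for real assemblies.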

Finding the Genes

 After sequencing, need to find the
genes, using computer algorithms –
this step is called ‘annotation’
 Annotation identifies:
• Protein-coding genes
• Initiation sequences
• Regulatory sequences
• Termination sequences
• Nonprotein-coding sequences

Finding the Genes, Cont’d
 The identifying features of protein-coding
genes are open reading frames (ORFs):
• Continuous sets of DNA nucleotide triplets
that can be translated into the amino acid
sequence of a protein
• ORFs begin with an initiation sequence,
usually ATG
• ORFs end with a termination sequence,
usually TAA, TAG or TGA
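
The ORF definition above translates directly into a short scan. This is a minimal sketch that only reads the three frames of the forward strand and assumes an uninterrupted coding sequence; real annotation software also scans the reverse strand and must handle introns. The function name find_orfs and the min_codons parameter are illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    `start` and `end` are 0-based, end-exclusive positions; the stop codon is
    included in the returned sequence. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiation codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGGCTTGGTAAGG"))  # [(2, 14, 'ATGGCTTGGTAA')]
```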

Analysis of DNA Sequence Information I:
Location of Genes Not Apparent

Analysis of DNA Sequence Information II:
Location of Regulatory Sequences and ORFs

Gene Function?
 After the genome sequence is annotated, functions
need to be assigned to all genes in the sequence
• Some of the identified genes might have functions
assigned already via classical methods of
mutagenesis and linkage mapping
• Some may not have assigned functions – use
homology searches:
 Computer-based comparisons of the sequence under
study with known sequences from other organisms
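
As a rough illustration of a homology search, the sketch below ranks a small set of known sequences by how many short subsequences (k-mers) they share with a query. This is a simplified stand-in, not how production search tools score alignments; the gene names and sequences are invented for the example.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets (0.0 = nothing shared, 1.0 = identical sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_hit(query: str, known: dict[str, str], k: int = 3) -> tuple[str, float]:
    """Name and score of the known sequence most similar to the query."""
    scored = ((name, kmer_similarity(query, seq, k)) for name, seq in known.items())
    return max(scored, key=lambda hit: hit[1])


# Hypothetical database of already-characterized sequences.
known_genes = {
    "geneA": "ATGGCTTGGTAA",
    "geneB": "CCCCGGGGTTTT",
}
print(best_hit("ATGGCTTGGTAG", known_genes))
```

A high score only suggests a shared function; the assignment still needs experimental confirmation.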

Genome Size and Gene Number in Selected
Eukaryotes

Unique Features of Eukaryotic Genomes

 Gene density
• Wide range compared to prokaryotes
 Introns
• Wide variation among eukaryotes
 Repetitive sequences
• Along with the presence of introns, repetitive
sequences are responsible for the wide range
of genome sizes in eukaryotes
 In maize two thirds of the genome comprises
repetitive DNA

Plant Model Organisms
• Arabidopsis thaliana
--Model flowering plant and dicot
--Sequence finished in 2000
--First flowering plant to be sequenced

• Oryza sativa (rice)
--Model monocot
--Sequence finished in 2005
--First crop plant to be sequenced
• Medicago truncatula (barrel medic)
--Model legume
• Lycopersicon esculentum (tomato)
--Model fruit-bearing plant

Note: Hundreds of other genomes (plant, animal, bacterial and viral)
have been, or are being, sequenced

Arabidopsis Sequencing Facts
•Arabidopsis has a small genome (125 Mb) on 5
chromosomes
-Human has 3,000 Mb on 23 chromosomes
-Maize has 2,500 Mb on 10 chromosomes
-Medicago has 520 Mb on 8 chromosomes
-Rice has 430 Mb on 12 chromosomes
-Lily has 50,000 Mb on 12 chromosomes

•Arabidopsis has approx.
25,500 genes
-humans have slightly fewer,
about 24,000
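
The figures on this slide can be turned into two simple comparisons: genome size relative to Arabidopsis, and gene density in genes per Mb. The sketch below uses only the numbers quoted above; the dictionary and function names are illustrative.

```python
# Genome sizes (Mb) and gene counts exactly as quoted on the slide.
GENOME_MB = {
    "Arabidopsis": 125,
    "Human": 3000,
    "Maize": 2500,
    "Medicago": 520,
    "Rice": 430,
    "Lily": 50000,
}
GENE_COUNT = {"Arabidopsis": 25500, "Human": 24000}


def relative_size(species: str, reference: str = "Arabidopsis") -> float:
    """How many times larger the species' genome is than the reference genome."""
    return GENOME_MB[species] / GENOME_MB[reference]


def genes_per_mb(species: str) -> float:
    """Average gene density in genes per megabase."""
    return GENE_COUNT[species] / GENOME_MB[species]


print(relative_size("Lily"))        # 400.0
print(genes_per_mb("Arabidopsis"))  # 204.0
print(genes_per_mb("Human"))        # 8.0
```

With these numbers, the two genomes hold a similar number of genes, but the Arabidopsis genome packs them about 25 times more densely than the human genome.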

Comparative Genomics
 The study of the relationship of genome structure
and function across different biological species or
strains:
• Holds great promise to yield insights into many
aspects of the evolution of modern species
 Enormous potential for crop genetics and breeding
• The vast amount of information contained in
modern genomes requires the methods of
comparative genomics to be automated
• Having come a long way from its initial use of
finding functional proteins, comparative genomics
is now concentrating on finding regulatory regions
and other features of the genome
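
One simple way to automate the search for conserved features is to slide a window along two aligned sequences and flag windows where most positions match. The sketch below assumes the two sequences are already aligned and of equal length, which real comparative pipelines must first establish; the function name, window size, and threshold are illustrative.

```python
def conserved_windows(
    seq_a: str, seq_b: str, window: int = 5, min_identity: float = 0.8
) -> list[int]:
    """Start positions of windows where the fraction of matching bases
    between two pre-aligned, equal-length sequences is >= min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(x == y for x, y in pairs)
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Two hypothetical aligned sequences from different species.
species_1 = "ACGTACGTAA"
species_2 = "ACGTATTTAA"
print(conserved_windows(species_1, species_2))  # [0, 1]
```

Windows that stay conserved outside protein-coding genes are candidates for regulatory regions.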
