Bioinformatics seminar

University of Agricultural Sciences, Dharwad
College of Agriculture, Vijayapur
Bioinformatics: An overview and its applications
Department of Biotechnology
Master’s seminar I

Flow of seminar
Introduction
Importance
History
Biological databases
Tools
Applications
Case study
Conclusion

Evolution of Crop Improvement
Mahendar Thudi, Senior Scientist (Chickpea Genomics), ICRISAT

Oliver Smithies, Frederick Sanger

Herbert Boyer and Stanley Cohen

Importance of Bioinformatics…
Quick Sequencing Capabilities. Ever Advancing Technology.

Massive Storage.
Agriculture.

Transcriptomics
mRNA: U A C U G C C U A G U C G

Proteomics
mRNA → Ribosome → Protein
1°, 2° and 3° structural info

Metabolomics
mRNA → Ribosome → Protein (Enzyme) → Metabolites
Metabolic pathway: biochemical reactions and pathway

Biological Databases
Primary Databases (Archival): direct experimental results.
GenBank, EMBL, DDBJ, PDB, PubMed
Secondary Databases (Curated): result of analysis on primary databases.
RefSeq, Taxon, SGD, UniProt

Sequence - GenBank

Protein Structure - Protein Data Bank

Genome - Ensembl

Metabolic pathway - Pathway Interaction Database

Lipids - LIPID MAPS Structure Database

http://www.phytozome.jgi.doe.gov/pz/

http://www.greenphyl.org/cgi-bin/index.cgi/

http://www.genomevolution.org/CoGe/

http://www.bioinformatics.psb.ugent.be/plaza/

Crop databases
Crop plant      Hyperlink
Rice            http://rgp.dna.affrc.go.jp/IRGSP/
Maize           http://www.maizegdb.org/
Sorghum         http://www.phytozome.net/sorghum.php
Wheat           http://www.wheatgenome.org/
Tomato          http://solgenomics.net/organism/Solanum_lycopersicum/genome
Potato          http://www.potatogenome.net/index.php/Main_Page
Rapeseed        http://www.brassica.info/info/reference/genome-sizes.php
Soybean         http://soybase.org/
Castor Bean     http://www.phytozome.net/ricinus.php
Flax            http://www.phytozome.net/soybean.php
Common Bean     http://www.phytozome.net/commonbean.php
Foxtail millet  http://foxtailmillet.genomics.org.cn/page/species/index.jsp
Cotton          http://www.cottondb.org/wwwroot/cdbhome.php
Chickpea        http://www.icrisat.org/gt-bt/ICGGC/homepage.htm
Pigeonpea       http://gigadb.org/dataset/100028
Sunflower       https://www.sunflowergenome.org/

NCBI
(National Center for Biotechnology Information)
• The establishment of the National Center for Biotechnology Information (NCBI) in
November of 1988 occurred primarily through the convergence of three independent
but related actions. They were:
➢1984-86
➢1986
➢1987

1980: EMBL established its data library.
1981: The DNA Data Bank was established by Japan.
1986: The SWISS-PROT database was created.
1988: NCBI was created at NIH/NLM.
1990: BLAST, a fast sequence similarity search tool, was introduced.
1990: The Human Genome Project was started.
1991: By 1991, a total of 1879 human genes had been mapped.
1991: ENTREZ, a search and retrieval tool for NCBI’s linked databases, was introduced in CD form.
1995: GENOMES provides information on genomes, including sequences, maps, chromosomes, assemblies and annotations.
1997: PubMed, a freely accessible bibliographic retrieval system, gave access to the entire MEDLINE database.
2001: Bookshelf, a new ENTREZ database, was introduced to provide free access to books and documents in the life sciences and health care fields.

Formulated functions of NCBI were:
• Design, develop, implement, and manage automated systems for the collection, storage, retrieval,
analysis, and dissemination of knowledge concerning human molecular biology, biochemistry,
and genetics;
• Perform research into advanced methods of computer-based information processing capable of
representing and analyzing the vast number of biologically important molecules and compounds;
• Enable persons engaged in biotechnology research and medical care to use the systems and
methods described in the two functions above; and
• Coordinate, as much as is practicable, efforts to gather biotechnology information on an
international basis.

Sequence Formats
The GenBank sequence format is a rich format for storing sequences and
associated annotations.
The FASTA format begins with a single-line description,
followed by lines of sequence data.
✓ The description line is distinguished from the sequence data by the “>”
symbol.
✓ All lines of text should be shorter than 80 characters in length.
✓ Blank lines are not allowed in the middle of FASTA input.
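The FASTA rules listed on this slide can be sketched as a small reader. This is a minimal illustration, not a reference parser: the function name parse_fasta and the two example records are hypothetical, and blank lines are skipped here rather than rejected, which is a leniency assumption.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {description: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: skip blank lines instead of raising an error
        if line.startswith(">"):
            # The ">" symbol marks the single-line description
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            # Sequence data may span several lines; collect them all
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input, for illustration only
example = """>seq1 hypothetical example record
ATGCGT
ACGT
>seq2 another hypothetical record
TTGACA
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```

Joining the sequence lines only at the end keeps the reader simple and lets a record wrap over any number of lines.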

Tools used in Bioinformatics
Major categories of Bioinformatics Tools
✤ Homology and Similarity Tools
✤ Protein Function Analysis
✤ Structural Analysis
✤ Sequence Analysis
BLAST
EMBOSS
RasMol
PROSPECT

Application of Bioinformatics

Phylogenetic trees
Phylogenetic trees are genealogical trees built from information gained by
comparing the amino acid sequences of a protein, such as cytochrome c,
sampled from different species.
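The comparison step behind such a tree can be sketched as a pairwise distance calculation. The example below computes the p-distance (proportion of differing positions) between aligned sequences. It is only a sketch: the species names and the short aligned fragments are hypothetical and are not real cytochrome c data, and real tree building would pass such a matrix to a clustering method that is not shown here.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has an alignment gap
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    return {
        (s, t): p_distance(aligned[s], aligned[t])
        for s, t in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned protein fragments, for illustration only
aligned = {
    "species_A": "GDVEKGKKIF",
    "species_B": "GDVEKGKKIF",
    "species_C": "GDAEKGKKLF",
}
dist = distance_matrix(aligned)
for pair, d in dist.items():
    print(pair, d)
```

Species with smaller distances would be grouped closer together in the resulting tree.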

How do we identify a gene in a genome?
A gene is characterized by several features (promoter, ORF…);
some are easier and some harder to detect…

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA
TAT GGA CAA TTG GTT TCT TCT CTG AAT ......
.............. TGAAAAACGTA
Annotation
Transcription Factor Binding Site
Promoter
Initiator codon
Open Reading Frame
Terminator Codon
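Of the features above, the open reading frame is the simplest to detect computationally: scan each reading frame for an initiator codon (ATG) and read codon by codon until a terminator codon. The sketch below does this on the forward strand only. The function name find_orfs, the min_codons threshold and the toy sequence are hypothetical, and real gene finders use far more evidence than this.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here runs from the first ATG in a frame up to and including
    the next in-frame stop codon. Coordinates are 0-based, end-exclusive.
    """
    dna = re.sub(r"\s+", "", dna).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # initiator codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # terminator codon closes the frame
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None
    return sorted(orfs)


# Hypothetical toy sequence, for illustration only
toy = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(toy)
print(orfs)
```

Promoters and transcription factor binding sites are harder to find because they lack such a strict codon-level signal.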

PMDBase (Plant Microsatellite DNA Database) was developed to integrate a large
collection of microsatellite DNAs with a web service for their identification.
In PMDBase, 26 230 099 microsatellite DNAs were identified spanning 110 plant
species.
The developers also built MISAweb and embedded Primer3web to help users identify
microsatellite DNAs and design corresponding primers.
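Microsatellite identification can be illustrated with a short regular-expression search for tandemly repeated motifs of 1 to 6 bases. This is a simplified sketch and not the MISA or PMDBase pipeline: the function name find_ssrs, the minimum repeat thresholds and the test sequence are assumptions chosen for illustration.

```python
import re


def find_ssrs(dna, min_repeats=None):
    """Find simple sequence repeats; returns (start, motif, repeat_count)."""
    if min_repeats is None:
        # Assumed thresholds: motif length -> minimum number of repeats
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    dna = dna.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least n - 1 more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(dna):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT")
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence with an (AT)7 and an (AGC)5 repeat
seq = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
ssrs = find_ssrs(seq)
print(ssrs)
```

The flanking sequence around each hit is what a primer design step would then use.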

Workflow of PMDBase development
Analysis pipeline for generating microsatellite DNAs
Structure of PMDBase
Implementation of PMDBase
Yu et al., 2016

Above is a schematic outlining how scientists can use bioinformatics to aid rational drug
discovery. Given the nucleotide sequence, the probable amino acid sequence of the
encoded protein can be determined using translation software. Sequence search techniques
can be used to find homologues in model organisms, and based on sequence similarity, it is
possible to model the structure of the human protein on experimentally characterised
structures.
Luscombe et al., 2000
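The first step of this workflow, deriving the probable amino acid sequence from a nucleotide sequence, can be sketched with the standard genetic code. This is a minimal illustration rather than any particular translation software: the function name translate and its to_stop parameter are hypothetical, and only a single reading frame starting at the first base is translated.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, to_stop=True):
    """Translate DNA or mRNA (frame 1) into one-letter amino acid codes."""
    seq = "".join(seq.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*" and to_stop:
            break  # stop at the first terminator codon
        protein.append(aa)
    return "".join(protein)


# A short coding fragment written codon by codon
fragment = "ATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT"
protein = translate(fragment)
print(protein)
```

The resulting protein string is what a sequence search tool would take as a query when looking for homologues.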

What do we have under ICAR?
‘ASHOKA’
(Advanced Supercomputing Hub for OMICS Knowledge in Agriculture)

Stumbling blocks in Bioinformatics:
• Very expensive to use.
• In rare instances, an algorithm can make mistakes that alter
the final result.
• Loss of privacy (Genetic Screening).
• Discrimination by health insurance companies when genetic sequencing
reveals a certain genetic disorder.

Conclusion… 
Life itself is an information technology…!!
