Bioinformatics in biotechnology by kk sahu

BIOINFORMATICS IN BIOTECHNOLOGY
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon ( C. G. )

SYNOPSIS
 Introduction
 Bioinformatics – definition
 History
 Required skills
 Core areas of bioinformatics
 Components of bioinformatics
 Nomenclature system in bioinformatics
 Biological databases
 Types of database
 Bioinformatics tools
 Applications of bioinformatics
 Conclusion
 References

INTRODUCTION
 What is bioinformatics?
 Biotechnology is now applied in many fields, including
medicine, agriculture, industry and the environment.
Research and development in these areas generates a
huge amount of biological data. Maintaining such a vast
amount of data manually is cumbersome, so large
computer databases are required to store biological data
for future use. The development of biological databases
and the analysis of biological data form a new branch of
biotechnology called bioinformatics.

DEFINITION
 Bioinformatics is the application of
information technology to store, organize and
analyze the vast amount of biological data
available in the form of sequences and
structures of proteins (the building blocks of
organisms) and nucleic acids (the information
carriers). For nucleic acids the data are
available as sequences, while for proteins
they are available as both sequences and
structures.

REQUIRED SKILLS
 Biology
 Computer science
 Mathematics/Statistics
 Information technology
 All are needed to some degree, but some
applications need certain skills more than others
 Often, not all the required skills are found in one
person, so teams are needed: programmer, biologist,
IT specialist, statistician, even graphic designer

HISTORY
 The term bioinformatics was coined by Paulien Hogeweg (with Ben Hesper)
in the 1970s, with 1978 often cited, for the study of informatic processes
in biotic systems.
 The first comprehensive collection of amino acid sequences was compiled
in the ‘Atlas of Protein Sequence and Structure’ by the National Biomedical
Research Foundation (NBRF). This collection was edited by M. O. Dayhoff
from 1965 to 1978.
 The European Molecular Biology Laboratory (EMBL) established its
data library in 1980 to collect, organize and distribute nucleotide sequence
data and related information. This function is now performed by the European
Bioinformatics Institute (EBI), Hinxton, U.K.
 In 1984, the National Biomedical Research Foundation (NBRF)
established the Protein Information Resource (PIR).
 In 1988, the National Institutes of Health (NIH), USA, set up the National
Center for Biotechnology Information (NCBI) to develop information
systems in molecular biology.
 The DNA Data Bank of Japan (DDBJ) at Mishima joined the data-collecting
collaboration a few years later.

CORE AREAS OF BIOINFORMATICS
 Bioinformatics consists of three core areas:
 Molecular biology databases
 Sequence comparison and sequence
analysis
 The emerging technology of microarrays

COMPONENTS OF BIOINFORMATICS
 CREATION OF DATABASES: This involves the organization, storage
and management of large biological data sets. The databases are
accessible to researchers, who can consult the existing information
and submit new entries, e.g. protein sequence data banks and data
banks for molecular structures. Databases are of no use until they
are analyzed.
 DEVELOPMENT OF ALGORITHMS AND STATISTICS: This involves the
development of tools and resources to determine the relationships
among the members of large data sets, e.g. comparison of protein
sequence data with already existing sequences.
 ANALYSIS OF DATA AND INTERPRETATION: The appropriate use of
the two components above is to analyze the data and interpret the
results in a biologically meaningful manner.

NOMENCLATURE SYSTEM IN BIOINFORMATICS
 The nomenclature system in bioinformatics is based on certain
recommendations made by IUPAC.
 Nomenclature of DNA sequences
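The IUPAC single-letter nucleotide codes can be written down as a small lookup table. The following is a minimal Python sketch; the names `IUPAC_DNA` and `expand` are illustrative and not part of any library.

```python
# IUPAC nucleotide codes: each symbol maps to the bases it may stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # puRine
    "Y": "CT",    # pYrimidine
    "S": "CG",    # Strong pairing
    "W": "AT",    # Weak pairing
    "K": "GT",    # Keto
    "M": "AC",    # aMino
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
    "N": "ACGT",  # aNy base
}


def expand(seq):
    """Return every unambiguous sequence matched by an IUPAC-coded sequence."""
    results = [""]
    for symbol in seq.upper():
        results = [prefix + base for prefix in results for base in IUPAC_DNA[symbol]]
    return results


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

Expanding ambiguity codes this way is only practical for short sequences, since the number of results grows multiplicatively with every ambiguous position.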


NOMENCLATURE OF PROTEIN SEQUENCES
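Protein sequences are normally written with the standard one-letter amino acid codes. A minimal Python sketch of the one-letter to three-letter mapping for the 20 common amino acids follows; the names `AMINO_ACIDS` and `to_three_letter` are illustrative only.

```python
# Standard one-letter and three-letter codes for the 20 common amino acids.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(protein):
    """Spell out a one-letter protein sequence using three-letter codes."""
    return "-".join(AMINO_ACIDS[residue] for residue in protein.upper())


print(to_three_letter("MKT"))  # Met-Lys-Thr
```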





TYPES OF SEQUENCES USED IN
BIOINFORMATICS
 There are different types of sequences which are
known to have genetic information.
 Such sequences are used in bioinformatics.
 The databases on DNA sequences contain a variety
of sequence types:
 Genomic DNA sequences
 cDNA sequences
 Organellar DNA sequences
 Expressed sequence tags
 Gene sequencing tags
 Sequences of RNA

DATABASE
WHAT IS A DATABASE?
 A database is a repository of DNA or amino acid
sequences that provides a homogeneous and centralized
view of its contents.
 The repository is created and modified through a DBMS
(Data Base Management System).
 The contents of database can be accessed through a
graphical user interface (GUI) that allows browsing
through the contents of the repository.
 The biological information can be stored in different
databases. Each database has its own website with
unique navigation tools.
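The idea of a sequence repository created and queried through a DBMS can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table layout, the accession numbers (`DEMO0001`, `DEMO0002`) and the sequences are made up, and real biological databases use far richer schemas.

```python
import sqlite3

# In-memory database; SQLite plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " molecule TEXT CHECK (molecule IN ('DNA', 'protein')),"
    " residues TEXT NOT NULL)"
)

# Hypothetical entries: accessions and sequences are invented for illustration.
entries = [
    ("DEMO0001", "DNA", "ATGGCCATTGTAATGGGCCGC"),
    ("DEMO0002", "protein", "MAIVMGR"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", entries)


def fetch(accession):
    """Look up one entry by accession; returns None if it is absent."""
    return conn.execute(
        "SELECT molecule, residues FROM sequences WHERE accession = ?",
        (accession,),
    ).fetchone()


print(fetch("DEMO0001"))  # ('DNA', 'ATGGCCATTGTAATGGGCCGC')
```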

CLASSIFICATION OF DATABASE
 Databases are broadly classified into two main
categories:
 Sequence databases: contain the sequences of
both proteins and nucleic acids
 Structural databases: contain three-dimensional
structures, mainly of proteins
 In addition, databases are classified into three main
categories:
 Primary database
 Secondary database
 Composite database

PRIMARY DATABASE
 Contain information on the sequence or
structure alone of either proteins or nucleic
acids.
 A primary sequence database stores
biomolecular sequences and the
associated annotation information.
 Primary database tools are effective for
identifying sequence similarities.

NUCLEIC ACID SEQUENCE DATABASE
 Contain information on the sequence or structure
of nucleic acids
 Major nucleic acid sequence databases are:
 EMBL (European Molecular Biology Laboratory
nucleotide sequence database at EBI, Hinxton, UK)
 GenBank (at the National Center for Biotechnology
Information, NCBI, Bethesda, MD, USA)
 DDBJ (DNA Data Bank of Japan at CIB, Mishima,
Japan)

THE INTERNATIONAL NUCLEOTIDE SEQUENCE
DATABASE COLLABORATION (INSDC)
 The INSDC consists of DDBJ (Japan),
GenBank (USA) and the EMBL Nucleotide
Sequence Database (Europe). The three databases
exchange new and updated data daily to stay
synchronised and to ensure comprehensive
coverage at each site.
 DDBJ/EMBL/GenBank adhere to
documented guidelines:
 The DDBJ/EMBL/GenBank Feature Table
Definition, which regulates the content and
syntax of database entries.
 A set of database policies issued and
published by the International Advisors to
DDBJ/EMBL/GenBank.
 This strong and successful collaboration is
based on daily interactions between
database staff as well as working meetings
among the databases.

GENBANK (GENETIC SEQUENCE DATABANK)
 GenBank is the main nucleotide sequence database held by the
National Center for Biotechnology Information (NCBI).
 NCBI is a division of the National Library of Medicine, located at the
National Institutes of Health (NIH) in Maryland, USA. NCBI maintains
sequences from every organism and every source, and every type of
sequence, from mRNA and cDNA clones to expressed sequence tags,
high-throughput genome data and information about sequence
polymorphisms.
 GenBank is one of the fastest-growing repositories of known genetic
sequences.
 A new GenBank release appears once every two months.
 It is part of the International Nucleotide Sequence Database
Collaboration.
 In addition to sequence data, GenBank files contain information such as
accession numbers and gene names, phylogenetic classifications and
references to the published literature.
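To make the idea of annotation stored alongside sequence data concrete, here is a minimal Python sketch that reads a record written in the general style of a GenBank flat file. The record is invented and heavily shortened, and `parse_record` is a hypothetical helper that handles only single-line top-level fields. Real files also carry FEATURES and REFERENCE blocks with continuation lines, so in practice an established parser such as Biopython's `SeqIO` would be used.

```python
# A made-up, heavily shortened record in the general style of a GenBank flat file.
RECORD = """\
LOCUS       DEMO0001     21 bp    DNA     linear   SYN
DEFINITION  Hypothetical demo gene, partial sequence.
ACCESSION   DEMO0001
SOURCE      synthetic construct
ORIGIN
        1 atggccattg taatgggccg c
//
"""


def parse_record(text):
    """Collect single-line top-level fields and the sequence from one record."""
    fields, sequence, in_origin = {}, [], False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines: drop the leading position number and the spaces.
            sequence.append("".join(line.split()[1:]))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line and not line[0].isspace():
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    fields["SEQUENCE"] = "".join(sequence).upper()
    return fields


record = parse_record(RECORD)
print(record["ACCESSION"], len(record["SEQUENCE"]))  # DEMO0001 21
```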

EMBL ( EUROPEAN MOLECULAR BIOLOGY
LABORATORY)
 EMBL is one of the world's leading research institutions and the flagship
of European molecular biology, ranked as the highest non-US institute
for its research programme.
 Established in 1978 at Heidelberg, Germany.
 The EMBL Nucleotide Sequence Database is a comprehensive database
of DNA and RNA sequences collected from scientific research and patent
applications.
 EMBL is supported by 20 European countries, with Australia as an associate
member.
 The main laboratory has five research units:
 Cell biology and biophysics
 Developmental biology
 Gene expression
 Structural and computational biology
 Biochemical instrumentation

DDBJ (DNA DATA BANK OF JAPAN)
 It is the DNA Data Bank of Japan, a nucleotide sequence database.
 Started in 1986 at the National Institute of
Genetics (NIG) in Mishima, Japan.
 DDBJ functions as one of the three International
Nucleotide Sequence Databases. The other two
members are EBI (European Bioinformatics
Institute, responsible for the EMBL database) in
Europe and NCBI (National Center for
Biotechnology Information, responsible for the
GenBank database) in the USA.

PROTEIN SEQUENCE DATABASE
 Contain information on the sequence or structure of proteins
 Major protein sequence databases include:
 Protein Information Resource (PIR)
 PIR was developed at the National Biomedical Research Foundation in
the early 1960s by Margaret Dayhoff as a collection of sequences for
investigating evolutionary relationships among proteins.
 Since 1988, the protein sequence database has been maintained
collaboratively by PIR-International, an association of macromolecular
data collection centres.
 In its current form, the database is split into four sections, PIR1 to
PIR4.
 PIR has collaborated with EBI and SIB (Swiss Institute of Bioinformatics)
to establish UniProt (the Universal Protein Resource).

SECONDARY DATABASE
 Secondary databases contain information derived from primary
databases, for example:
 Information on conserved sequences and active-site residues of
protein families.
 For recognizing protein families and functional sites, a secondary
database is often more useful than a primary database.
 Some of the major secondary databases are as follows:
 PROSITE
 PRINTS
 BLOCKS
 Profiles
 Pfam
 IDENTIFY
 The type of information stored in each secondary database
is different.

SECONDARY DATABASE
 PROSITE
 PROSITE was the first secondary database to be developed.
 It is maintained at the Swiss Institute of Bioinformatics.
 It uses a single consensus pattern to characterize each family of
sequences.
 BLOCKS
 The BLOCKS database is an automatically
generated database of ungapped multiple
sequence alignments that correspond to the
most conserved regions of proteins.
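A PROSITE-style consensus pattern can be turned into a regular expression and matched against a protein sequence. The sketch below covers only the common syntax elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for the sequence ends. The function name `prosite_to_regex` is illustrative, and the example pattern and test sequence are for demonstration only.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x -> any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


# A zinc-finger-like pattern, used here purely as an illustration.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

# An invented sequence that satisfies the pattern.
sequence = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
print(bool(re.search(regex, sequence)))  # True
```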

COMPOSITE DATABASE
 A composite database combines several primary
databases so that they can be searched together.
 Specialized databases are those that cater
to a particular research interest. For
example, FlyBase, the HIV Sequence Database
and the Ribosomal Database Project are
databases that specialize in a particular
organism or a particular type of data.

BIOINFORMATICS TOOLS
 These are software programs designed to extract meaningful
information from the mass of data in molecular biology and other
biological databases and to carry out sequence and structural
analysis.
 Once databases have been built, tools are needed to search the
sequence databases.
 Bioinformatics tools can be grouped into the following
categories:
 Biological databases
 Homology and similarity tools (sequence alignment tool)
 Protein function analysis tools
 Structural analysis tools
 Sequence manipulation tools
 Sequence analysis tools

HOMOLOGY AND SIMILARITY TOOLS
 BLAST (Basic Local Alignment Search Tool)
 It is a program for sequence similarity searching developed at the NCBI.
 It helps identify genes and genetic features.
 It executes sequence searches against the entire DNA database in less than 15 seconds.
 A BLAST search enables a researcher to compare a query sequence with a database of
sequences and identify database sequences that resemble the query sequence.
 There are several variants of BLAST:
 blastp compares an amino acid query sequence against a protein sequence database.
 blastn compares a nucleotide query sequence against a nucleotide sequence
database.
 blastx compares a nucleotide query sequence translated in all reading frames against a
protein sequence database.
 tblastn compares a protein query sequence against a nucleotide sequence
database dynamically translated in all reading frames.
 tblastx compares the six-frame translation of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
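The translated BLAST variants depend on translating a nucleotide sequence in all six reading frames: three on the given strand and three on its reverse complement. A minimal Python sketch of that step, using the standard genetic code, is given below. It is not BLAST's own implementation, and the function names and the example sequence are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to their translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCATTGTAATGGGCCGC")
print(frames["+1"])  # MAIVMGR
```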

HOMOLOGY AND SIMILARITY TOOLS
 FASTA (FAST-All)
 It is another sequence analysis tool, very similar
to BLAST.
 It was originally developed by D. J. Lipman and
W. R. Pearson.
 FASTA can be accessed from the European Bioinformatics
Institute (EBI) site.
 FASTA gives better results for nucleotide sequences
than for protein sequences.
 The FASTA programs search the database files to find
sequences related to the query sequence and
display a pairwise alignment between the query
and each of them.
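What a pairwise alignment is can be illustrated with the classic Smith-Waterman dynamic-programming algorithm for local alignment. This is not the heuristic that FASTA or BLAST use to search quickly. It is a small, exact sketch with assumed scoring parameters (match +2, mismatch -1, gap -2) and made-up example sequences.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_alignment("TTACGTGG", "CCACGTAA"))  # (8, 'ACGT', 'ACGT')
```

Database search tools avoid filling this full matrix for every database sequence, which is why they are much faster than an exhaustive comparison.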

APPLICATIONS OF BIOINFORMATICS
 Drug design and gene therapy: one of the earliest medical applications of bioinformatics
has been in helping rational drug design.
 Gene expression analysis: This usually involves compiling expression data for cells
affected by different diseases like cancer and comparing the measurements against
normal expression levels. Identification of genes that are expressed differently in affected
cells provides a basis for explaining the causes of illness and highlights potential drug
targets.
 Functional genomics uses the emerging knowledge about genomes to understand the
functions of genes and their products.
 Microbial genomics offers new fermentation-based products and technologies to combat
microbial infection.
 Animal genomics opens up areas in veterinary science and transgenic models.
 Plant genomics will open new vistas in agriculture and horticulture.
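The gene expression comparison described above, affected cells against normal expression levels, is often summarised as a log2 fold change per gene. The Python sketch below uses invented expression values and placeholder gene names. The pseudocount and the threshold of 1.0 (a two-fold change) are assumptions, and real analyses rely on replicates and statistical testing rather than a bare threshold.

```python
import math


def log2_fold_change(disease, normal, pseudocount=1.0):
    """log2 ratio of disease to normal expression; the pseudocount avoids division by zero."""
    return math.log2((disease + pseudocount) / (normal + pseudocount))


def differentially_expressed(disease, normal, threshold=1.0):
    """Return genes whose absolute log2 fold change meets the threshold."""
    hits = {}
    for gene in sorted(disease.keys() & normal.keys()):
        lfc = log2_fold_change(disease[gene], normal[gene])
        if abs(lfc) >= threshold:
            hits[gene] = round(lfc, 2)
    return hits


# Hypothetical expression values (arbitrary units); gene names are placeholders.
normal = {"geneA": 99.0, "geneB": 49.0, "geneC": 7.0}
disease = {"geneA": 399.0, "geneB": 51.0, "geneC": 1.0}
print(differentially_expressed(disease, normal))  # {'geneA': 2.0, 'geneC': -2.0}
```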

CONCLUSION
 Translating the billions of characters of DNA
sequence that make up a genome into
biologically meaningful information has given
rise to a new field of science called
bioinformatics. The whole of biology benefits
from the bioinformatics approach.
Bioinformatics tools that make research more
efficient will have significant implications for the
life sciences and for the betterment of human lives.

REFERENCES
 The Cell: A Molecular Approach, Cooper G. M. (2000); 2/e. Sinauer
Associates Inc., Sunderland.
 Bioinformatics, Baxevanis A. D. and Ouellette B. F. F. (Eds.),
(2001); 2/e. John Wiley and Sons.
 Human Molecular Genetics, Strachan T., Read A. P. (1999); 2/e.
BIOS Scientific Publishers Ltd., Oxford.

 www.ncbi.nim.gov
 www.ag.auburn.edu
 www.bioinformaticscentre.org
 www.accessexcellence.com
 www.fao.org
 www.ifse.tamu.edu
