Bioinformatics in biotechnology by kk sahu
1. BIOINFORMATICS IN BIOTECHNOLOGY
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon (C. G.)
2. SYNOPSIS
Introduction
Bioinformatics – definition
History
Required skills
Core areas of bioinformatics
Components of bioinformatics
Nomenclature system in bioinformatics
Biological databases
Types of database
Bioinformatics tools
Applications of bioinformatics
Conclusion
References
3. INTRODUCTION
What is bioinformatics?
Today, biotechnology is applied in various
fields such as medicine, agriculture, industry and
environment. Research and development in these
areas of biotechnology generate a huge amount of
biological data. Manual maintenance of such a vast
amount of data is a cumbersome task. We require
large computer databases to maintain biological data
for future use. The development of biological
databases and the analysis of biological data form a
new branch of biotechnology called bioinformatics.
4. DEFINITION
Bioinformatics is the application of
information technology to store, organize and
analyze the vast amount of biological data
available in the form of sequences
and structures of proteins (the building
blocks of organisms) and nucleic acids (the
information carriers). The biological
information of nucleic acids is available as
sequences, while the data on proteins are
available as both sequences and structures.
5. REQUIRED SKILLS
Biology
Computer science
Mathematics/Statistics
Information technology
All are needed to some degree, but some applications
need more of one skill than of the others.
Often, not all required skills are present in one
person, so teams need a programmer, a biologist,
an IT person, a statistician, even a graphic designer.
6. HISTORY
The term bioinformatics was coined by Paulien Hogeweg in 1978 for the study
of informatic processes in biotic systems.
The first comprehensive collection of amino acid sequences was compiled
in the ‘Atlas of Protein Sequence and Structure’ by the National Biomedical
Research Foundation (NBRF). This collection was edited by M. O. Dayhoff
from 1965 to 1978.
The European Molecular Biology Laboratory (EMBL) established its
data library in 1980 to collect, organize and distribute nucleotide sequence
data and related information. This function is now performed by the European
Bioinformatics Institute (EBI), Hinxton, U.K.
In 1984, the National Biomedical Research Foundation (NBRF)
established the Protein Information Resource (PIR).
In 1988, the National Institutes of Health (NIH), USA, developed the National
Center for Biotechnology Information (NCBI) to develop information
systems in molecular biology.
The DNA Data Bank of Japan (DDBJ) at Mishima joined the data-collecting
collaboration a few years later.
7. CORE AREAS OF BIOINFORMATICS
Bioinformatics consists of three core areas:
Molecular biology database
Sequence comparison and sequence
analysis
The emerging technology of microarrays
8. COMPONENTS OF BIOINFORMATICS
CREATION OF DATABASES: This involves the organization, storage
and management of large biological data sets. The databases are
accessible to researchers, who can look up existing information and
submit new entries, e.g. protein sequence data banks and data banks
of molecular structures. Databases are of no use until they are analyzed.
DEVELOPMENT OF ALGORITHMS AND STATISTICS: This involves the
development of tools and resources to determine the relationships
among the members of large data sets, e.g. comparison of protein
sequence data with already existing sequences.
ANALYSIS OF DATA AND INTERPRETATION: The appropriate use of the
above two components is to analyze the data and interpret the
results in a biologically meaningful manner.
9. NOMENCLATURE SYSTEM IN BIOINFORMATICS
The nomenclature system in bioinformatics is based on certain
recommendations made by IUPAC.
Nomenclature of DNA sequences
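As an illustration of the IUPAC nomenclature for DNA sequences, the sketch below lists the standard single-letter nucleotide codes, including the ambiguity codes (for example R for a purine, A or G). The function name `expand` is a hypothetical helper chosen for this example, not part of any library.

```python
# Standard IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(seq):
    """Return every unambiguous sequence an IUPAC-coded DNA string can represent."""
    results = [""]
    for symbol in seq.upper():
        # Extend each partial sequence with every base the symbol allows.
        results = [prefix + base for prefix in results for base in IUPAC_DNA[symbol]]
    return results


print(expand("ARY"))
```

The number of expansions grows quickly with the number of ambiguous symbols, so this is only practical for short sequences.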
11. TYPES OF SEQUENCES USED IN
BIOINFORMATICS
There are different types of sequences which are
known to have genetic information.
Such sequences are used in bioinformatics.
The databases on DNA sequences contain a variety
of sequence types:
Genomic DNA sequences
cDNA sequences
Organellar DNA sequences
Expressed sequence tags
Gene sequencing tags
Sequences of RNA
12. DATABASE
WHAT IS A DATABASE?
A database is a repository of DNA or amino acid
sequences which provides a homogeneous and centralized
view of its contents.
The repository is created and modified through a DBMS
(Database Management System).
The contents of the database can be accessed through a
graphical user interface (GUI) that allows browsing
through the contents of the repository.
The biological information can be stored in different
databases. Each database has its own website with
unique navigation tools.
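To make the idea of a repository managed through a DBMS concrete, here is a minimal sketch using SQLite from the Python standard library. The table layout, the accession numbers (TOY0001, TOY0002) and the sequences are all made up for illustration; real biological databases use far richer schemas.

```python
import sqlite3

# In-memory database for the sketch; a real repository would be a persistent server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    " accession TEXT PRIMARY KEY,"
    " organism TEXT NOT NULL,"
    " molecule TEXT NOT NULL,"
    " residues TEXT NOT NULL)"
)

# Hypothetical entries, not real database records.
toy_entries = [
    ("TOY0001", "Example organism A", "DNA", "ATGGCCATTGTAATGGGCCGC"),
    ("TOY0002", "Example organism B", "protein", "MAIVMGR"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?)", toy_entries)


def fetch(accession):
    """Look up one entry by accession number; returns None if it is absent."""
    query = "SELECT organism, molecule, residues FROM sequences WHERE accession = ?"
    return conn.execute(query, (accession,)).fetchone()


print(fetch("TOY0002"))
```

The accession number acts as the primary key, which is how a single entry is retrieved from the centralized repository.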
13. CLASSIFICATION OF DATABASE
Databases are broadly classified into two main
categories:
Sequence databases: contain the sequences of
both proteins and nucleic acids
Structural databases: contain only protein
structures
In addition, databases are also classified into three main
categories:
Primary database
Secondary database
Composite database
14. PRIMARY DATABASE
Contains information on the sequence or
structure alone of either proteins or nucleic
acids.
A primary sequence database
stores biomolecular sequences and
associated annotation information.
Primary database tools are effective for
identifying sequence similarities.
15. NUCLEIC ACID SEQUENCE DATABASE
Contains information on the sequence or structure
of nucleic acids.
Major nucleic acid sequence databases are:
EMBL (European Molecular Biology Laboratory
nucleotide sequence database at EBI, Hinxton, UK)
GenBank (at the National Center for Biotechnology
Information, NCBI, Bethesda, MD, USA)
DDBJ (DNA Data Bank of Japan at CIB, Mishima,
Japan)
16. THE INTERNATIONAL NUCLEOTIDE SEQUENCE
DATABASE COLLABORATION (INSDC)
The INSDC consists of DDBJ (Japan),
GenBank (USA) and the EMBL Nucleotide
Sequence Database. The three databases
exchange new and updated data on a daily
basis to achieve optimal synchronisation.
DDBJ/EMBL/GenBank adhere to
documented guidelines:
The DDBJ/EMBL/GenBank Feature Table
Definition regulating the content and syntax
of the database entries.
A set of database policies issued and
published by the International Advisors to
DDBJ/EMBL/GenBank.
This strong and successful collaboration is
based on daily interactions between
database staff as well as working meetings
amongst the databases.
They exchange data on a daily basis to ensure
comprehensive coverage at each of the
sites.
17. GENBANK (GENETIC SEQUENCE DATABANK)
GenBank is the main nucleotide sequence database held by the
National Center for Biotechnology Information (NCBI).
NCBI is a division of the National Library of Medicine, located at the National
Institutes of Health (NIH) in Maryland (USA). NCBI maintains sequences
from every organism, every source and every type of sequence, from mRNA to
cDNA clones to expressed sequence tags to high-throughput genome
data and information about sequence polymorphism.
GenBank is one of the fastest-growing repositories of known genetic
sequences.
New GenBank releases come out once every two months.
It is a part of the International Nucleotide Sequence Database
Collaboration.
In addition to sequence data, GenBank files contain information such as
accession numbers and gene names, phylogenetic classifications and
references to published literature.
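The following sketch shows how the annotation in a GenBank-style flat file sits alongside the sequence itself. The record below is entirely made up for illustration (TOY000001 is not a real accession), and `parse_genbank` is a hypothetical helper that handles only a few keywords; a production parser has to cover many more fields and multi-line values.

```python
# A made-up record laid out like a GenBank flat file (keywords in the first 12 columns).
TOY_RECORD = "\n".join([
    "LOCUS       TOY000001                 24 bp    DNA     linear   UNA 01-JAN-2000",
    "DEFINITION  Made-up example record for illustration only.",
    "ACCESSION   TOY000001",
    "SOURCE      Example organism",
    "  ORGANISM  Example organism",
    "            Unclassified.",
    "ORIGIN",
    "        1 atggccattg taatgggccg ctga",
    "//",
])


def parse_genbank(text):
    """Extract the definition, accession, organism and sequence from one record."""
    record = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if in_sequence:
            # Sequence lines mix position numbers, spaces and bases; keep the letters.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
            continue
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword == "ORIGIN":
            in_sequence = True
        elif keyword in ("DEFINITION", "ACCESSION", "ORGANISM"):
            record[keyword.lower()] = value
    return record


print(parse_genbank(TOY_RECORD))
```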
18. EMBL ( EUROPEAN MOLECULAR BIOLOGY
LABORATORY)
EMBL is one of the top research institutions in the world; it is the flagship
of European molecular biology, ranking as the highest non-US institute
for its research program.
Established in 1978 at Heidelberg, Germany.
Its nucleotide sequence database is a comprehensive collection of DNA and RNA
sequences gathered from scientific research and patent applications.
EMBL is supported by 20 European countries, with Australia as an associate
member.
The main laboratory has five research units:
Cell biology and biophysics
Developmental biology
Gene expression
Structural and computational biology
Biochemical instrumentation
19. DDBJ (DNA DATA BANK OF JAPAN)
DDBJ is the DNA Data Bank of Japan.
It started in 1984 at the National Institute of
Genetics (NIG) in Mishima, Japan.
DDBJ functions as one of the International
Nucleotide Sequence Databases; the two other
members are EBI (European Bioinformatics Institute,
responsible for the EMBL database) in Europe and NCBI
(National Center for Biotechnology
Information, responsible for the GenBank database)
in the USA.
20. PROTEIN SEQUENCE DATABASE
Contains information on the sequence or structure of proteins.
Major protein sequence databases include:
Protein Information Resource (PIR)
PIR was developed at the National Biomedical Research Foundation in
the early 1960s by Margaret Dayhoff as a collection of sequences for
investigating evolutionary relationships among proteins.
Since 1988, the protein sequence database has been maintained
collaboratively by PIR-International, an association of macromolecular data
collection centres.
In its current form, the database is split into four sections, PIR1 to
PIR4.
PIR has collaborated with EBI and SIB (Swiss Institute of Bioinformatics) to
establish UniProt (Universal Protein Resource).
21. SECONDARY DATABASE
Secondary databases contain information derived from primary
databases, for example
information on conserved sequences and active-site residues of
protein families.
They are more useful than primary databases.
Some of the major secondary databases are as follows:
PROSITE
PRINTS
BLOCKS
Profile
Pfam
IDENTIFY
The type of information stored in each secondary database
is different.
22. SECONDARY DATABASE
PROSITE
PROSITE was the first secondary database to be developed.
It is maintained collaboratively at the Swiss Institute of Bioinformatics.
It uses a single consensus pattern to characterize each family of
sequences.
BLOCKS
The BLOCKS database is an automatically
generated database of ungapped multiple
sequence alignments that correspond to the
most conserved regions of the proteins.
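To show what a PROSITE consensus pattern looks like in practice, the sketch below converts PROSITE pattern syntax into a Python regular expression and scans a peptide with it. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the peptide is made up, and `prosite_to_regex` is a hypothetical helper that covers only the basic syntax (it does not handle anchors written inside brackets).

```python
import re

# Character-by-character translation of basic PROSITE syntax to regex syntax.
_PROSITE_TO_REGEX = str.maketrans({
    "-": None,    # element separator, not needed in a regex
    "x": ".",     # any amino acid
    "{": "[^",    # {P} means "anything except P"
    "}": "]",
    "(": "{",     # x(2,4) means "repeat 2 to 4 times"
    ")": "}",
    "<": "^",     # pattern anchored at the N-terminus
    ">": "$",     # pattern anchored at the C-terminus
})


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    return pattern.rstrip(".").translate(_PROSITE_TO_REGEX)


def find_sites(pattern, protein):
    """Return (start index, matched residues) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", "MKNVTAANPSL"))
```

In the made-up peptide, the second asparagine is followed by a proline, so only the first site matches.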
23. COMPOSITE DATABASE
Specialized databases are those that cater
to a particular research interest. For
example, FlyBase, HIV sequence database,
and Ribosomal Database Project are
databases that specialize in a particular
organism or a particular type of data.
24. BIOINFORMATICS TOOLS
These are software programs designed for extracting
meaningful information from the mass of molecular
biology/biological databases and for carrying out sequence and
structural analysis.
Once the databases had been formed, tools became available to
search sequence databases.
Bioinformatics tools can be grouped into the following
categories:
Biological databases
Homology and similarity tools (sequence alignment tool)
Protein function analysis tools
Structural analysis tools
Sequence manipulation tools
Sequence analysis tools
25. HOMOLOGY AND SIMILARITY TOOLS
BLAST (Basic Local Alignment Search Tool)
It is a program for sequence similarity searching developed at the NCBI.
It helps identify genes and genetic features.
It executes sequence searches against the entire DNA database in less than 15 seconds.
A BLAST search enables a researcher to compare a query sequence with a database of
sequences and identify database sequences that resemble the query sequence.
There are several variants of BLAST:
BLASTP compares an amino acid query sequence against a protein sequence database.
BLASTN compares a nucleotide query sequence against a nucleotide sequence
database.
BLASTX compares a nucleotide query sequence translated in all reading frames against a
protein sequence database.
TBLASTN compares a protein query sequence against a nucleotide sequence
database dynamically translated in all reading frames.
TBLASTX compares the six-frame translation of a nucleotide query sequence against the
six-frame translations of a nucleotide sequence database.
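The "six-frame translation" used by the translated BLAST variants can be sketched as follows: a nucleotide sequence is read in three frames on the forward strand and three on the reverse-complement strand. This is only an illustration of the translation step using the standard genetic code, not BLAST itself; the function names and the frame labels (+1 to -3) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of the string; unknown codons give 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate a DNA sequence in all six reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translation("ATGGCC"))
```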
26. HOMOLOGY AND SIMILARITY TOOLS
FAST-A (FAST-ALL)
It is another sequence analysis tool, very similar
to BLAST.
It was originally developed by W. R. Pearson and
D. J. Lipman.
FastA can be accessed from the European Bioinformatics
Institute (EBI) site.
FastA gives better results for nucleotide sequences
than for protein sequences.
The FastA programs search the database files to find
sequences related to the query
sequence and display pairwise alignments between
them.
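FASTA-style searches begin by looking for short words (k-tuples) shared between the query and a database sequence. The sketch below illustrates only that first step, counting shared words per diagonal; it is a simplified illustration, not the FastA program, and the function name, the default word length and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query, subject, ktup=2):
    """Count shared words of length ktup on each diagonal.

    A diagonal is the subject position minus the query position of a shared word;
    many hits on one diagonal suggest an ungapped region of similarity.
    """
    # Index every word of the query by its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Look up each word of the subject and record the diagonal of every hit.
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


hits = ktuple_diagonals("ACGTAC", "TTACGTAC")
print(hits.most_common(1))
```

Here the best diagonal is offset 2, reflecting that the query occurs in the subject starting two positions in.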
27. APPLICATIONS OF BIOINFORMATICS
Drug design and gene therapy: One of the earliest medical applications of bioinformatics
has been in aiding rational drug design.
Gene expression analysis: This usually involves compiling expression data for cells
affected by diseases such as cancer and comparing the measurements against
normal expression levels. Identifying genes that are expressed differently in affected
cells provides a basis for explaining the causes of illness and highlights potential drug
targets.
Functional genomics: Uses the emerging knowledge about genomes to understand the
functions of genes and their products.
Microbial genomics: Offers new fermentation-based products and technologies to combat
microbial infection.
Animal genomics: Opens up areas in veterinary science and transgenic models.
Plant genomics: Will open new vistas in agriculture and horticulture.
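The comparison described under gene expression analysis can be sketched with a log2 fold change between affected and normal cells. The gene names and expression values below are made up, and the pseudocount and threshold are arbitrary assumptions; real analyses rely on replicates and statistical testing rather than a single ratio.

```python
import math


def log2_fold_changes(normal, affected, pseudocount=1.0):
    """Log2 ratio of affected to normal expression for genes present in both sets."""
    return {
        gene: math.log2((affected[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in normal
        if gene in affected
    }


def differentially_expressed(normal, affected, threshold=1.0):
    """Keep genes whose expression changed at least 2**threshold-fold in either direction."""
    changes = log2_fold_changes(normal, affected)
    return {gene: round(lfc, 2) for gene, lfc in changes.items() if abs(lfc) >= threshold}


# Hypothetical expression measurements (arbitrary units).
normal_cells = {"geneA": 99, "geneB": 49, "geneC": 199}
affected_cells = {"geneA": 399, "geneB": 51, "geneC": 24}

print(differentially_expressed(normal_cells, affected_cells))
```

A positive value marks a gene that is more active in the affected cells and a negative value one that is less active.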
28. CONCLUSION
Translating the billions of characters in the DNA
sequences that make up the genome into
biologically meaningful information has given
birth to a new field of science called
bioinformatics. The whole of biology has
benefited from the bioinformatics approach.
Bioinformatics tools for efficient research will
have significant implications for the life sciences and
the betterment of human lives.
29. REFERENCES
The Cell: A Molecular Approach, Cooper G. M. (2000); 2/e. Sinauer
Associates Inc., Sunderland.
Bioinformatics, Baxevanis A. D. and Ouellette B. F. F. (Eds.)
(2001); 2/e. John Wiley and Sons.
Human Molecular Genetics, Strachan T., Read A. P. (1999); 2/e.
BIOS Scientific Publishers Ltd., Oxford.
www.ncbi.nlm.nih.gov
www.ag.auburn.edu
www.bioinformaticscentre.org
www.accessexcellence.com
www.fao.org
www.ifse.tamu.edu