B) BIOINFORMATICS
1. Introduction: Importance and Scope
IMPORTANCE
 Bioinformatics is an interdisciplinary subject in which three fields, biology,
computer science and information technology, combine to form a new discipline.
OR
 Bioinformatics is a branch of biology that deals with the fast, accurate and
logical analysis of biological data and information, for interpretation and
prediction, using computational techniques. (Margaret Dayhoff)
DEFINITION
 Bioinformatics, n. The science of information and information flow
in biological systems, esp. of the use of computational methods in
genetics and genomics. (Oxford English Dictionary)
 "The mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related
information." -- Fredj Tekaia
SCOPE
1) Better documentation: large quantities of data can be stored, and data can be
added, documented and deleted.
a) Design and discovery of drugs, taking into account the genomic structure
of pathogens and the chemical structure of drugs.
b) Studies based on the two key classes of biomolecule, proteins and nucleic
acids.
PROTEIN: the structural and functional unit.
NUCLEIC ACID: the carrier of hereditary information.
c) Comparison of new sequences with the protein and nucleic acid details that
are already available.
2) Information is easy to search and access.
3) Fast, accurate and logical analysis.
4) Interpretation and prediction.
Applications
1) Comparison
 Comparison of nucleic acid and protein sequences.
 It reveals the similarities and differences between protein sequences and between
nucleic acid sequences.
 Two types of analysis are possible:
1) Structural analysis: gives structural details
2) Functional analysis: gives functional details
 Molecular-level classification of organisms is possible using bioinformatics tools.
 Organisms are classified by comparing the similarities and differences of their protein
and nucleic acid sequences, and thereby the relationships between them.
 Classical taxonomy relies only on morphological and enzymatic analysis and comparison;
accurate classification also requires molecular-level analysis.
 Comparison of proteins and nucleic acids helps in:
Classification of proteins
Classification of nucleic acids
Classification of individuals
Establishing evolutionary relationships between organisms
2) Gene finding
 Bioinformatics makes gene finding easy.
 Genes are stretches of nucleic acid; their expression products are RNA and protein.
 Analysing nucleic acid sequences helps to identify the gene
responsible for a particular character. Eg: a gene responsible for yield
improvement.
 Gene finding has applications in crop improvement, such as
resistance to insects, disease, drought and salinity, and higher yield.
 In agriculture and medicine it is useful for comparing a normal
individual with a diseased one.
 In medicine it is used to find the gene responsible for a genetic
disorder, by comparing normal with diseased individuals, so that the
defect can be corrected at the embryo or patient level.
 Embryo therapy: correction at the embryo level, or in the sperm/egg.
Patient therapy: correction in particular cells or in the nucleic acid.
3) Protein structure prediction
 Comparison of a protein with the entries of a protein structure database.
 Knowing the structure of a protein helps to work out its activity, its
influence on the physiological and metabolic pathways of an organism, and
its relation to the organism's growth.
 Disease pathways can be traced by identifying the defective protein and
the defective gene.
 Identifying the protein-coding gene helps in curing genetic disorders.
 NMR and X-ray diffraction are the experimental techniques used to determine
protein structure, but they are very expensive and time consuming.
 Bioinformatics can be used instead of these two methods; it is easier, less
expensive and faster.
 Structure prediction takes very little time.
 Novel proteins can be discovered using bioinformatics instead of NMR
and X-ray diffraction; this is useful in several fields, such as drug
discovery and the pharmaceutical industry.
 Knowing the protein structure makes it possible to synthesise biologically
valuable synthetic enzymes.
4) Evolutionary relationship study
 By structural genomics, functional genomics and comparative
genomics.
5) Construction of biological databases
 Construction of databases is part of better documentation.
 Different types of database exist, depending on the type and kind of
information they hold.
DATABASE: an area or space where information is stored in
electronic format. Databases differ in the information they contain
(information about proteins or nucleic acids). Eg: EMBL, GenBank
6) Total genomic structural study of an organism
 Helps in species identification.
7) Use in environmental clean-up programmes
By gene finding: scope for bioremediation. Eg: in oil spills,
Pseudomonas putida is used to reduce the effect of the
hydrocarbons in the oil.
Plasmid – degrades hydrocarbons – degrades the oil completely
Individuals useful for bioremediation can be improved and modified.
8) Creation of bioweapons
Gene finding could be misused in the near future to create bioweapons.
Eg: disease-causing microorganisms could be identified and
used as weapons.
2. Biological Databases
a) Nucleic Acid Databases
EMBL, GenBank – Structure of GenBank entries. Specialized
genomic resources. UniGene
EMBL
 Nucleotide sequence database
 Maintained by the EBI (European Bioinformatics Institute, in the UK)
 EMBL stands for European Molecular Biology Laboratory
 It collects information from different sources, such as
* Genome sequencing projects
* Scientific literature
* Direct author submission
 It exchanges information with GenBank and DDBJ, so each holds a
comprehensive collection of information.
 Its growth rate is very fast: the amount of information doubles in 9-10 months.
 It is divided into many subdivisions.
 The Laboratory operates from six sites: the main laboratory in Heidelberg, and outstations
in Hinxton (the European Bioinformatics Institute (EBI), in
England), Grenoble (France), Hamburg (Germany), Rome (Italy) and Barcelona (Spain).
 EMBL groups and laboratories perform basic research in molecular biology and molecular medicine
as well as training for scientists, students and visitors.
 Information is accessed through the SRS system. SRS: SEQUENCE RETRIEVAL SYSTEM
 The first systematic genetic analysis of embryonic development in the fruit fly was conducted
at EMBL by Christiane Nüsslein-Volhard and Eric Wieschaus, for which they were awarded
the Nobel Prize in Physiology or Medicine in 1995.
 In the early 1980s, Jacques Dubochet and his team at EMBL developed cryogenic electron
microscopy for biological structures. This work was recognised with the 2017 Nobel Prize in Chemistry.
 URL address (Uniform Resource Locator): http://www.ebi.ac.uk/embl/
GENBANK
 It is a primary nucleotide sequence database.
 Name: GenBank
 Maintained by the NCBI (National Center for Biotechnology Information)
 Few restrictions
 AIM: to support the scientific and research community by providing sequence
information without restrictions, except for copyrighted and patented
sequences.
 Growth rate: very fast; NCBI has reported a doubling time of roughly 18 months.
 The information is split into 17 divisions so that it can be retrieved
conveniently and efficiently.
 Two retrieval systems:
1) Entrez integrated retrieval system: links the nucleotide sequence
database with the protein sequence database.
2) MEDLINE facility: gives access to the abstracts of the originally
published papers related to the nucleotide sequences.
 http://www.ncbi.nlm.nih.gov/genbank
GenBank incorporates information from
# publicly available sources
# direct author submissions (the primary source)
# large-scale sequencing projects
To help ensure comprehensive coverage, the resource
exchanges data with both the EMBL data library and DDBJ.
Structural Entities
The Structure of GenBank Entries
A GenBank release includes the sequence files, indices created
on various database fields, and information derived from the
database.
GenBank was originally made available on CD-ROM, a convenient and
relatively inexpensive mechanism for widespread distribution.
As the database grew, a large number of CDs was required, which was
difficult to handle for both producers and users.
Today GenBank is available via FTP.
The most commonly used file is the sequence entry file, which contains the
sequence itself and descriptive information relating to it.
Each entry consists of a number of keywords, relevant associated
sub-keywords and an optional feature table.
The structure of a GenBank entry consists of 13 structural
components:
1) LOCUS
2) DEFINITION
3) ACCESSION NUMBER
4) VERSION
5) KEYWORDS
6) SOURCE
7) ORGANISM
8) REFERENCE
9) AUTHORS
10) TITLE
11) JOURNAL
12) PUBMED NO.
13) REMARK/COMMENT
1) LOCUS: gives the entry number that identifies the nucleotide sequence, together
with the type of sequence and the date.
[ NM_000555 - mRNA - 21.7.2018 ]
(entry no.) (type of sequence) (day.month.year)
2) DEFINITION: a brief description of the sequence, including the source organism
and the expression product.
Eg for the Bt gene: Bacillus thuringiensis, mRNA, δ-endotoxin.
3) ACCESSION NUMBER: normally the same as the entry number.
[NM_000555]
4) VERSION: when an entry is updated, the entry number is followed by a version number;
a gene information (GI) ID number may be given along with it.
[NM_000555.5 GI: 12345]
5) KEYWORDS: the keywords of the work; if there are no keywords, a dot is entered.
Eg: Insect resistance.
6) SOURCE: the common name of the source organism.
Source organism: bacteria
7) ORGANISM: the scientific name of the source organism.
Scientific name of source organism: Bacillus thuringiensis
8) REFERENCE: the reference to the published paper related to
the nucleotide sequence being entered.
9) AUTHORS: the names of the authors, in the same
order as in the published paper.
10) TITLE: the title of the paper.
11) JOURNAL: the name of the journal in which the paper was
published.
12) PUBMED NO.: the number that gives access to the
archived published paper within PubMed (an archive of the
scientific literature).
13) REMARK/COMMENT: the biological
importance, expression, changes or source organism can be entered as a
comment.
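The keyword layout described above can be illustrated with a short sketch. The record below is hypothetical: it reuses the example entry number from these notes, its other values are made up, and the layout is simplified (one line per keyword, no continuation lines, no feature table). The function name parse_header is an assumption for illustration, not part of any NCBI tool.

```python
# Hypothetical, simplified GenBank-style header (not a real database record).
RECORD = """\
LOCUS       NM_000555   1200 bp   mRNA   linear   21-JUL-2018
DEFINITION  Example Bt toxin gene, mRNA.
ACCESSION   NM_000555
VERSION     NM_000555.5
KEYWORDS    .
SOURCE      example bacterium
  ORGANISM  Bacillus thuringiensis
"""


def parse_header(text):
    """Map each keyword (LOCUS, DEFINITION, ...) to the text that follows it.

    Assumes one line per keyword; multi-line fields are not handled.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The keyword is the first word; everything after it is the value.
        keyword, _, value = line.partition(" ")
        fields[keyword] = value.strip()
    return fields


fields = parse_header(RECORD)
for keyword, value in fields.items():
    print(f"{keyword:<12}{value}")
```

A dot in the KEYWORDS field is kept as-is, matching the convention that a dot means no keywords were supplied.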
Specialized genomic resources
 Specialized resources focus on species-specific
genomics and on particular sequencing techniques.
The particular aim of such a database is to give an integrated view
of a particular biological system.
a) UniGene
* The collection represents genes from many organisms;
each cluster relates to a unique gene and includes related
information corresponding to that gene.
* A valuable role of UniGene is in gene discovery.
* UniGene is also used for gene mapping projects and large-scale
gene expression analysis.
b) TDB (The TIGR Database)
* These databases contain DNA and protein sequences, gene
expression data, protein family information etc.
* Data such as taxonomy (covering plants and humans) and cellular
roles are also present.
c) SGD (Saccharomyces Genome Database)
* SGD is an online resource containing information on the
molecular biology and genetics of S. cerevisiae (budding yeast).
* The database provides internet access to the genome, its genes
and their products etc.
* SGD supports research by bringing together functions such as
sequence similarity search tools.
* Genetic maps illustrated with dynamically created
graphical displays make the database user friendly.
UniGene
UniGene is a specialized genomic resource.
Specialized genomic resources are databases that tend to be linked, to some
extent, with the primary DNA databases, from which they
may derive their data and into which their results are usually
fed.
Purpose of specialized genomic resources
1) to focus on species-specific genomics
2) to focus on particular sequencing techniques
The primary goal of the Human Genome Project is to determine the
complete sequence of the human genome (3 billion base pairs).
Only about 3% of the genome encodes protein.
The biological significance of the remainder is largely unknown.
A transcript map is a vital resource for flagging those parts of
the genome that are actually expressed.
UniGene attempts to provide a transcript map by utilising
sets of non-redundant, gene-oriented clusters derived from
GenBank sequences.
The collection represents genes from many organisms, each
cluster relating to a unique gene and including related
information, such as the tissue type in which the gene is expressed,
map location etc.
b) Protein Sequence Databases
PIR
SWISS-PROT
TrEMBL
Composite Protein Databases
NRDB
OWL
Secondary Databases
PROSITE
PRINTS
BLOCKS
IDENTIFY
SWISS-PROT
• Protein sequence database
• Switzerland based database.
• SWISS-PROT is an annotated protein sequence database established in
1986 and maintained collaboratively, since 1987, by the Department of
Medical Biochemistry of the University of Geneva and the EMBL Data
Library.
• It is a curated protein sequence database, which strives to provide a high
level of annotation (such as the description of the function of a protein,
its domain structure, posttranslational modifications, variants, source and
organisms), a minimal level of redundancy, and a high level of integration
with other databases.
• SWISS-PROT contains the information about the name and origin of the
protein, protein attributes, general information, ontologies, sequence
annotation, amino acid sequence, bibliographic references, cross-
references with sequence, structure and interaction databases, and entry
information.
 It is maintained collaboratively by the Swiss Institute of
Bioinformatics (SIB) and the European Bioinformatics
Institute (EBI).
 The SWISS-PROT group is headed by Rolf Apweiler.
 It contains non-redundant sequence entries, and the
information is thoroughly reviewed and annotated.
 It provides protein sequences to students, researchers and
related industries such as the pharmaceutical industry.
 SWISS-PROT aims to be minimally redundant and is
interlinked with many other resources, including EMBL and TrEMBL.
TrEMBL
 It is a primary protein sequence database.
 TrEMBL: Translated EMBL
 A protein sequence database of translated nucleotide sequences.
 Created in 1996 as a computer-annotated supplement to SWISS-PROT.
 Its entries are annotated by computer rather than manually.
 The database is constructed by computationally translating each coding
nucleotide sequence available in EMBL into a protein sequence.
 The TrEMBL sequence database contains the translations of all
coding sequences (CDS) present in the DDBJ/EMBL/GenBank
Nucleotide Sequence Database and also protein sequences
extracted from the literature or submitted to SWISS-PROT, which
are not yet integrated into SWISS-PROT.
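The translation step described above can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and a coding sequence that starts in frame; it is not the pipeline used to build TrEMBL, and the function name translate is chosen here for illustration only.

```python
from itertools import product

# Standard genetic code, written in the usual compact form (base order T, C, A, G).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")      # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")   # X marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Any codon containing a character outside T, C, A and G (for example N) is reported as X rather than guessed.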
TrEMBL consists of two divisions:
SP-TrEMBL
 A temporary storage area for sequences that have
not yet been manually annotated.
 Contains entries that will eventually be incorporated
into SWISS-PROT, once they have been fully described
and annotated.
REM-TrEMBL
 Contains sequences that are not
destined to be included in SWISS-PROT.
 Eg:
# immunoglobulins and T-cell receptors
# fragments of fewer than eight amino acids
# synthetic sequences
# patented sequences
 TrEMBL is developed by the EBI.
PIR
 Primary protein sequence database.
 PIR: Protein Information Resource
 Developed by Margaret Dayhoff in the early 1960s as a collection of
sequences for investigating evolutionary relationships among
proteins.
 Developed at the National Biomedical Research Foundation
(NBRF).
 The database is split into 4 distinct sections, based on the
level of information:
 PIR-1, PIR-2, PIR-3, PIR-4
 They differ in terms of
# quality of data
# level of annotation provided
1) PIR-1
 Contains fully classified and annotated entries.
2) PIR-2
 Includes preliminary entries, which have not been
thoroughly reviewed and may contain redundancy.
3) PIR-3
 Contains unverified entries, which have not been reviewed.
4) PIR-4
 Contains protein sequences that are not genetically
encoded and not produced on ribosomes, ie
synthetic protein sequences.
Composite Protein Databases
 These are amalgamations (compilations) of different
primary databases.
 They render sequence searching much more efficient, because
they obviate the need to interrogate multiple resources.
1) NRDB
2) OWL
NRDB (Non-Redundant Database)
 It is built locally at the NCBI.
 Combination of 6 primary databases:
1. SWISS-PROT
2. PDB
3. PIR
4. GenPept
5. GenPept update
6. SWISS-PROT update
 Intended to be non-redundant and error free.
 Strictly speaking, however, some redundancy and errors remain.
 Redundant, erroneous or incorrect sequences present in any component
database, especially SWISS-PROT, are carried over into NRDB as they are.
 It makes searching more efficient, because related information can be found
without searching many separate databases.
OWL
 A composite protein sequence database; the name is not related to the Web
Ontology Language, which shares the acronym.
 Compilation of 4 primary databases:
1. GenBank (translation)
2. SWISS-PROT
3. NRL-3D
4. PIR
 It makes searching more efficient by obviating the need to search many separate
databases for related information.
 Any redundancy in GenBank is carried over into OWL during
amalgamation.
 Developed at the University of Leeds (UK) in association with the Daresbury
Laboratory in Warrington, 1994.
 The sources are assigned a priority on the basis of their level of annotation
and sequence validation.
 SWISS-PROT has the highest priority.
 OWL is released only every 6-8 weeks.
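The idea of amalgamating sources in priority order can be sketched as follows. This is a simplified illustration, not the procedure used to build OWL or NRDB: it removes only exactly identical sequences, and the entry identifiers and sequences are hypothetical toy data.

```python
def build_composite(sources):
    """Merge sources given in priority order, keeping one copy of each sequence.

    sources: list of (database_name, {entry_id: sequence}) pairs,
             highest priority first.
    Returns {sequence: (database_name, entry_id)} for the retained entries.
    """
    retained = {}
    for name, entries in sources:
        for entry_id, seq in entries.items():
            seq = seq.upper()
            # An identical sequence from a higher-priority source wins.
            if seq not in retained:
                retained[seq] = (name, entry_id)
    return retained


# Hypothetical toy entries; SWISS-PROT is listed first, so it has priority.
toy_sources = [
    ("SWISS-PROT", {"SP_1": "MKTAYIAK"}),
    ("PIR",        {"PIR_1": "mktayiak", "PIR_2": "MLLSVPLL"}),
]
composite = build_composite(toy_sources)
for seq, origin in composite.items():
    print(seq, origin)
```

Because only exact duplicates are removed, any redundancy or error that differs by even one residue is carried into the composite, which is the limitation noted for these databases.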
Secondary Databases
PROSITE
PRINTS
BLOCKS
IDENTIFY
 Secondary databases contain the fruits of analyses of the sequences in the
primary sources.
 Put simply, secondary data are derived from primary data.
 Because there are several different primary databases
and a variety of ways of analysing protein sequences, the
secondary databases differ in what they store.
PROSITE
 PROSITE was the first secondary database to be developed.
 It derives its information from the primary database SWISS-PROT.
 Produced and maintained by the SIB.
 Released in 1988 by Amos Bairoch.
 URL address: http://www.prosite.expasy.org
 It groups protein sequences into families, based on the single most
conserved motif.
 Motif: a short stretch of amino acids (10-20 residues) that is
responsible for the protein's function and preserves its 3D structure.
 Such motifs usually encode key biological functions.
 Eg: enzyme active sites, ligand- or metal-binding sites
 A motif represents the characteristic feature or site of each family.
 These regions act as signatures of a particular protein family and help to
identify new members of the family.
 PROSITE is developed by a largely manual process of seeking the patterns
that best fit particular families and functions.
 PROSITE entries are deposited in two different files:
1) Data file: holds the pattern and lists all matches in the current release of SWISS-PROT
2) Documentation file, which provides:
# details of the characterized family
# a description of the biological role of the chosen motif
# a supporting bibliography
SIGNIFICANCE
 Families are found on the basis of motifs, ie sequences that share the same motif in
the same region are considered a single family.
 Fast functional characterization and annotation of protein sequences.
 Identifies possible functions of newly discovered proteins and allows analysis of proteins
for previously undetermined activity.
 Offers tools for protein sequence analysis and motif detection.
 It is part of the ExPASy proteomics analysis server.
APPLICATION
 Proteins can be classified on the basis of their most highly conserved motif.
 A particular motif identifies the characteristic features of the protein
that carries it.
 Eg: the structural and functional details of that protein
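Pattern matching of this kind can be sketched by converting a pattern written in PROSITE syntax into a regular expression. The sketch covers only the common syntax elements (x for any residue, square brackets for allowed residues, curly brackets for excluded residues, parentheses for repeat counts, and the anchors < and >); the function name and the test sequences are assumptions made for illustration, and this is not the ScanProsite implementation.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")         # drop element separators
    regex = regex.replace("x", ".")                       # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")    # {ED} = any residue except E, D
    regex = regex.replace("(", "{").replace(")", "}")     # x(2,4) = 2 to 4 residues
    regex = regex.replace("<", "^").replace(">", "$")     # N- and C-terminal anchors
    return regex


# A zinc-finger-style pattern written in PROSITE syntax.
zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(zinc_finger)

# Made-up sequence that contains one occurrence of the motif.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
match = re.search(zinc_finger, sequence)
print(match.start(), match.group())
```

The curly and round brackets are replaced in that order on purpose: once the exclusion sets have become `[^...]`, the only braces left in the result are the repeat counts.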
PRINTS
 It currently collects its information from OWL; in future it will collect
it from SP-TrEMBL and SWISS-PROT.
 The process of deriving information from OWL is called iterative database
scanning.
 Contributed by SIB
 In 1999 it was maintained in the Department of Biochemistry and
Molecular Biology at University College London (UCL).
 http://www.bioinf.man.ac.uk/db browser/ bioactivity/ protein 2
frm. html.
 Here multiple motifs are considered, instead of a single common
motif.
 This helps to find the most similar sequences, so clearer information
is available.
 More accurate analysis is possible, because it is based on multiple motifs
shared by the sequences.
BLOCKS
 Database based on multiple motifs.
 Ungapped multiple alignments of motifs.
 The database contains information on blocks: highly conserved multiple
motifs aligned without any gaps.
 Developed by Henikoff, 1998.
 Automatically derived database, constructed using the automated
PROTOMAT system.
 It is based on the protein families contained in PROSITE, and is maintained
at the Fred Hutchinson Cancer Research Center (FHCRC).
 The motifs, or blocks, are created by automatically detecting the
most highly conserved regions of each protein family.
 The blocks, ultimately encoded as ungapped local alignments, are calibrated
against SWISS-PROT to obtain a measure of the likelihood of a chance match.
 Two scores are noted for each block:
 the first denotes the level at which 99.5% of matches are true
negatives;
 the second is the median value of the true positive scores.
 The median standardized score for known true positive matches is
termed the strength.
 Because the database is derived by fully automatic methods, the blocks
are not annotated, but links are made to the corresponding PROSITE
family documentation file.
 Because its information is derived from the secondary
databases PRINTS and PROSITE, it can also be called a tertiary
database.
 Structure of BLOCKS entries:
 Each block is identified by a general code (ID) line and an
accession number (AC) line.
 The ID line indicates the type of discriminator to expect in the file.
 The AC line also indicates the minimum and maximum distance of the
block from its preceding neighbour.
 The DE line contains the description, or title, of the family.
 The BL line indicates the diagnostic power of the block (conserved amino
acid triplet, number of sequences it contains).
IDENTIFY
 Another automatically derived tertiary resource.
 Derived from BLOCKS and PRINTS.
 Developed in the Department of Biochemistry at Stanford
University by Nevill-Manning et al., 1998.
 Constructed on the basis of e-Motif.
 e-Motif: a method based on generalised expressions of the
similarities between highly conserved motif sequences.
 It is designed to be more flexible than exact regular expression
matching.
 It is accessible via the protein function web server of
the Biochemistry Department at Stanford.
 e-Motif works with sets of amino acids grouped by their properties.
Structure Classification DataBases
 Many proteins share structural similarities, reflecting
common evolutionary origins.
1) SCOP
2) CATH
SCOP
 Structural Classification Of Proteins
 It is maintained at the MRC Laboratory of Molecular Biology
and Centre for Protein Engineering.
 It describes the structural and evolutionary relationships
between proteins of known structure (1995).
 It is useful at both the multi-domain level and the individual
domain level.
 It is constructed using a combination of manual inspection
and automated methods.
 Because the structural information is checked by both
automatic and manual methods, the results are
more accurate.
SCOP Classification
 Proteins are classified in a hierarchical fashion to reflect their structural and
evolutionary relationships.
 Protein structures are assigned in a hierarchical order at three levels:
1) Family
2) Superfamily
3) Fold
 Family
Proteins are clustered into families with a clear evolutionary relationship if they
have a sequence identity of 30% or more.
 Superfamily
Proteins are placed in superfamilies when, in spite of low sequence identity,
their structural and functional characteristics suggest a common
evolutionary origin.
 Fold
Proteins are classified as having a common fold if they have the same major secondary
structures in the same arrangement and with the same topology.
 SCOP is accessible for keyword interrogation via the MRC Laboratory web server.
 http://www.bioinf.man.ac.uk/db browser/ bioactivity/ structure frm. html
CATH
 Class, Architecture, Topology, Homology
 It is a hierarchical classification of protein structures maintained at University
College London (UCL), 1997.
 The resource is largely derived using automatic methods, but manual inspection
is necessary where automatic methods fail.
 Developed by UCL's Biomolecular Structure and Modelling Unit and used
for the classification of protein structures. There are five levels within the hierarchy.
A) CLASS
Derived from the gross secondary structure content and packing of the protein.
Four classes of domain are recognised:
Class 1: mainly alpha helix
Class 2: mainly beta sheet
Class 3: alpha-beta, which includes both alternating alpha/beta and
alpha + beta structures
Class 4: domains with very low secondary structure content
B) ARCHITECTURE
 Describes the gross arrangement of secondary structures, ignoring their
connectivities.
C) TOPOLOGY
 Describes both the overall shape and the connectivity of the secondary
structures of the protein.
D) HOMOLOGY
 Groups domains that share more than 35% sequence identity and are thought to
share a common ancestor (ie are homologous). Similarities are first identified by
sequence comparison and then by a structure comparison algorithm.
E) SEQUENCE
# Final level in the hierarchy.
# Structures within homology groups are further clustered on the basis of
sequence identity.
# Domains have sequence identities of more than 35%, indicating highly
similar structures and functions.
CATH is accessible for keyword interrogation via the web server of UCL's
Biomolecular Structure and Modelling Unit.
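Both classifications rely on percentage sequence identity thresholds (30% for a SCOP family, 35% in CATH). A minimal sketch of that calculation is given below. It assumes the two sequences have already been aligned to equal length with "-" for gaps, and it counts identity over the gap-free columns only; other definitions of the denominator exist, so the choice here is an assumption. The sequences are made up.

```python
SCOP_FAMILY_THRESHOLD = 30.0   # percent, as quoted for SCOP families
CATH_THRESHOLD = 35.0          # percent, as quoted for CATH


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Made-up aligned sequences that differ at one position out of ten.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, identity >= SCOP_FAMILY_THRESHOLD, identity > CATH_THRESHOLD)
```

Sequence identity alone only suggests a grouping; as the notes say, both resources combine it with structure comparison and manual inspection.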
3. Database Searching
A) Sequence Database Searching
EST searches
Different approaches to EST analysis
Merck/IMAGE
Incyte
TIGR
EGAD
EST analytical tools
Sequence similarity
Sequence assembly and Sequence clustering
EST searches
 EST: Expressed Sequence Tag.
 EST data are held in the EST database (dbEST), which maintains its own
format and identification number system.
 ESTs are also called gene transcripts.
 An EST is a short nucleotide sequence produced from cDNA.
 mRNA - reverse transcriptase enzyme - single-stranded DNA (cDNA).
 A typical EST is between 200 and 500 bases in length; modern technical
advances have increased the theoretical length obtainable from a single
run to 1000 bases or more.
 ESTs are partial sequences of gene transcripts. They are noisy sequences
that, as a result of sequencing errors, may not only contain ambiguous
bases but also be missing bases.
 In analysing ESTs, the following points should be kept in mind:
 The EST alphabet has five characters: A, C, G, T and N.
 An EST will probably be a subsequence of another sequence in the database.
 An EST may not represent part of the coding sequence (CDS) of any gene.
 EST production is highly automated, and the results are often
contaminated with ambiguous or missing bases. This causes
difficulties in sequence interpretation.
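The five-character alphabet and the problem of ambiguous bases can be checked with a short sketch. The function name est_summary and the example reads are assumptions made for illustration; N is treated as the only ambiguity code, as in the alphabet listed above.

```python
EST_ALPHABET = set("ACGTN")


def est_summary(seq):
    """Report length, characters outside ACGTN, and the fraction of ambiguous (N) bases."""
    seq = seq.upper()
    invalid = sorted(set(seq) - EST_ALPHABET)
    ambiguous = seq.count("N") / len(seq) if seq else 0.0
    return {"length": len(seq), "invalid": invalid, "ambiguous_fraction": ambiguous}


# Made-up reads: the first has two ambiguous bases, the second a non-EST character.
print(est_summary("ACGTNNAC"))
print(est_summary("ACGU"))
```

A high ambiguous fraction is a simple warning sign that a read may be too noisy for reliable similarity searching.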
Uses
 Identification of a particular gene
 Mapping of genes within a genome using a small stretch of
sequence
 Identification of species
 Academic analysis and commercial exploitation
Different approaches to EST analysis
 These are the sources that provide EST information.
 Various approaches to establishing libraries of ESTs for
academic or commercial exploitation have been developed.
 Much of the publicly available data is collected together
in the EST sections of the EMBL data library and GenBank
(dbEST).
 Merck/ IMAGE
Incyte
TIGR
EGAD
Merck/ IMAGE
 A research project run at Washington University (St. Louis) and
funded by Merck and Co.
 In 1994, Merck and Co. funded a research project based at
Washington University to sequence 300,000 ESTs from a variety
of normalised libraries.
AIM:
 To produce 300,000 ESTs from cDNA libraries.
 For many years Merck has sponsored the production of a drug
index.
Approaches of the sources
 To support academic analysis
 Commercialization of EST information for drug production
 The EST collection resulting from the project is known as the Merck Gene
Index. As of May 1997, 484,421 ESTs had been submitted by the project to dbEST.
Incyte
 Incyte Pharmaceuticals Inc. is a pharmaceutical company.
 It produces a database, LifeSeq, that emphasises the quantitative
information derived by sequencing standard cDNA libraries.
AIM
 To collect information on the relative copy numbers of genes
in healthy and diseased tissue.
 To facilitate the elucidation of potential therapeutic targets.
APPROACH
 Commercialization of genomic information on the ESTs of
healthy and diseased cells, which is then used to identify therapeutic targets.
 Commercial production of drugs.
 In April 1998, the size of LifeSeq was 2.5 million ESTs,
representing 8000 to 12000 different genes.
TIGR
 The Institute for Genomic Research.
 It is a not-for-profit research institute.
 It operates purely for academic purposes.
 It is a research organisation with interests in the structural, functional and
comparative analysis of genomes and gene products.
 The range of organisms covered includes viruses, eubacteria (including pathogenic
bacteria), archaebacteria and eukaryotes (plants and animals).
AIM
 Preparation of the Human Gene Index (HGI).
 This index integrates results from human genome research projects
around the world, including data from dbEST and GenBank.
 To create a non-redundant view of all human genes, with information on
their expression patterns, cellular roles, functions and evolutionary
relationships.
 Data in the HGI are freely available.
 TIGR combines more than 100,000 ESTs that it has sequenced from over 300 cDNA
libraries with data from dbEST and non-redundant human transcript information,
using the technique of sequence assembly, to generate Tentative Human
Consensus (THC) sequences.
EGAD
 Expressed Gene Anatomy Database
 It is a database providing information on ESTs.
EST Analytical Tools
There are many tools available for the analysis of ESTs:
 Commercially available tool = Incyte LifeTools
 Publicly available tools = 3 types
1) Sequence Similarity Search Tools
2) Sequence Assembly Tools
3) Sequence Clustering Tools
1) Sequence Similarity Search Tools
 These tools are considered here as they relate to ESTs.
 Given an EST of interest, the tool compares it with all the
sequences in the database and identifies those that show
sequence similarity to it.
 Eg: BLAST tools
BLASTP
BLASTN
BLASTX
TBLASTN
2) Sequence Assembly Tools
 When a database search reveals several ESTs matching
the probe sequence, the ESTs must normally be aligned
with each other to reveal the consensus sequence.
 In this situation, an assembly tool aligns and merges the
different sequence fragments to reconstruct the original
mRNA.
 Examples: Phrap, Staden assembler, TIGR Assembler
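The merging step can be sketched for the simplest case of two fragments that overlap exactly at their ends. This is only an illustration of the idea; real assemblers such as those named above tolerate sequencing errors, handle both strands and work on many fragments at once. The function name, the min_overlap default and the fragments are assumptions made for illustration.

```python
def merge_overlap(left, right, min_overlap=3):
    """Merge two fragments using the longest exact overlap between the
    end of `left` and the start of `right`; return None if there is none."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


# Made-up fragments that share the four bases GCGT.
print(merge_overlap("ATGGCGT", "GCGTACC"))
print(merge_overlap("AAAA", "CCCC"))
```

Requiring a minimum overlap guards against joining fragments that share only one or two bases by chance.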
3) Sequence Clustering Tools
 These are programs that take a large set of sequences and
divide them into subsets, or clusters, based on the extent of shared
sequence identity in a minimum overlap region.
 These tools can analyse a large set of sequences and group
them into clusters based on the regions of maximum similarity
that they share.
 A reliable and effective mechanism for clustering ESTs reduces
redundancy in the database and saves database search time and
analysis effort.
 Examples:
Web-based EST clustering tools
USEARCH
CD-HIT
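The clustering idea described above can be sketched in a few lines. This is only an illustration, assuming a greedy scheme in which the longest sequence seeds the first cluster and each later sequence joins the first cluster whose representative it matches closely enough. The function names, the 0.9 threshold and the ungapped identity measure are assumptions made for the example and are not taken from any particular tool.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter sequence (no gaps)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(sequences, threshold=0.9):
    """Group sequences greedily; the first member of each cluster is its representative."""
    clusters = []
    # Longest sequences are processed first so that they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


# Hypothetical EST fragments, for illustration only.
ests = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT"]
for members in greedy_cluster(ests):
    print(members)
```

Real clustering tools compare sequences with proper alignments and word filters; the ungapped comparison here only keeps the sketch short.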
Sequence similarity searching tools
 These are software tools used for searching, assessing, analysing, interpreting and
predicting information contained in databases.
 There are two types:
1) Pairwise sequence alignment and similarity searching tools
# A pair of sequences is involved.
# One is the query sequence and the other is the template.
# Query: the sequence to be studied
# Template: found from the database
Eg: BLAST, FASTA
2) Multiple sequence alignment and similarity searching tools, or
homology searching tools
# More than two sequences are involved.
# A set of sequences can be compared and aligned together.
Eg: CLUSTAL, MODELLER
PSI-BLAST
# Position-Specific Iterated BLAST
# It is a hybrid of pairwise sequence alignment and multiple sequence similarity search
tools
 Sequences are aligned to find regions of high density or
strong similarity.
 According to the extent of sequence covered, sequence alignments are
of two types:
1) Local sequence alignment: an alignment that
selects only the regions which exhibit strong
similarity
Eg: BLAST, FASTA, PSI-BLAST
2) Global sequence alignment:
An alignment that considers the entire length of the sequences is known
as global sequence alignment
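The difference between the two alignment types can be seen in a small dynamic-programming sketch. It assumes a simple scoring scheme (match +1, mismatch -1, gap -2) chosen only for illustration; the global score follows the Needleman-Wunsch recurrence and the local score the Smith-Waterman recurrence, and the function name is hypothetical.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b (global by default, local if requested)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("TTACGTTT", "GGACGGG"))              # whole sequences
print(alignment_score("TTACGTTT", "GGACGGG", local=True))  # best shared region only
```

The two calls use the same pair of sequences: the global score is pulled down by the dissimilar ends, while the local score reflects only the shared region.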
Functional Analysis Tool
• Applies to protein as well as nucleotide sequences.
• Used for functional analysis.
• To study the similarities of sequences based on their
function
• GOFFA :
# Gene Ontology for Functional Analysis
# Used for identifying functional elements in the
genome and for related
functional analysis of genes and genomes
• ErmineJ :
# Used for genome analysis
# and also for functional analysis related to gene
expression
• InterProScan :
# It is used for the functional analysis of proteins
Structural Analysis Tool
 Structural analysis of nucleotides and proteins.
Eg:
 SWISS-PROT
 PDB viewer
 RasMol
Statistical Analysis Tool
 Statistical analysis of the values of similarity and
differences
Eg:
 Statistica
 MATLAB
 Perl
B) Pair-Wise Sequence Alignment Technique
 Comparison of sequences and sub-sequences
 Identity and similarity
 Substitution matrices
 PAM
 BLOSUM
 DOTPLOT
 BLAST
 FASTA
Substitution matrices
( BLOSUM & PAM)
 When two sequences are compared and one sequence has leucine where
the other also has leucine at the compared position,
 the residue-to-residue (leucine-leucine) identity between
the two sequences is given an alignment score of 1.
 But due to mutation or evolutionary change an amino acid can be
replaced by another and cause a mismatch; substitution matrices
take this into account.
 Such a mismatch can still be accepted as a match if it does not
change the basic structure or function.
 Matches of this kind are identified by deeper analysis.
 Used in the study of evolutionary relationships.
 When an amino acid changes, its chemical nature is considered. If the
nature remains the same on deeper analysis, the researcher should
treat the pair as a match and score it in a matrix; matrices
scored in this way are called substitution matrices.
BLOSUM Model
 It is a family of substitution matrices.
 BLOcks amino acid SUbstitution Matrices.
 It was proposed to overcome the problem of aligning and comparing
distantly related sequences with substitution matrices.
 It was proposed by Steven Henikoff and Jorja G. Henikoff in
1992. The information is derived from the conserved regions
(blocks) and amino acid patterns of distantly related
protein sequences available in the BLOCKS database, hence the name
BLOCKS SUBSTITUTION MATRIX.
 BLOSUM matrices are based on a much larger data set.
 They represent distant relationships more explicitly. Closely related
sequences are clustered together and treated as a
single sequence.
A cluster contains sequences whose sequence
identity is higher than a cutoff called the clustering percentage.
Changing the clustering percentage leads to a family of
matrices.
Three commonly used versions:
BLOSUM 30 - sequences more than 30 percent identical are clustered; suited to distantly related sequences
BLOSUM 62 - sequences more than 62 percent identical are clustered; suited to moderately related sequences
BLOSUM 90 - sequences more than 90 percent identical are clustered; suited to
closely related sequences
It helps to detect diverse
types of relationships (close and distant).
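The entries of a BLOSUM matrix are log-odds scores. As a sketch, in notation introduced here rather than taken from the slides: let p_ij be the observed frequency of the residue pair (i, j) in the clustered blocks and e_ij the frequency expected if residues were paired at random according to their background frequencies. For BLOSUM 62, which is usually given in half-bit units, the score is then approximately:

```latex
s_{ij} = \operatorname{round}\!\left( 2 \log_{2} \frac{p_{ij}}{e_{ij}} \right)
```

A positive score means the pair occurs in conserved blocks more often than expected by chance; a negative score means it occurs less often.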
PAM
 (Point Accepted Mutation, or Dayhoff PAM model)
 Also known as the Dayhoff amino acid substitution matrix.
 It was derived by M. O. Dayhoff in 1978.
 Here substitutions of amino acids are observed in homologous protein
sequences during evolution; these amino acid substitutions
do not significantly change the function of the protein.
 These substitutions are accepted by natural selection.
 They are therefore known as accepted point mutations, or point
accepted mutations (PAM).
 To prepare PAM matrices, the substitutions that occur in
alignments between closely similar sequences are counted and then used to
generate a 20×20 mutation probability matrix P representing all
amino acid changes.
 Each element Pij of the matrix represents the probability of
replacement of amino acid j by amino acid i over a fixed evolutionary
period.
 PAM 1 is the unit of evolutionary divergence in which
one percent of amino acids have been changed.
 The model has limited value for distant relationships.
 Applied to the alignment and comparison of highly similar
sequences.
 Best suited to the comparison of closely related sequences.
 It does not represent distantly related sequences well;
BLOSUM was later proposed to overcome this.
 Used in evolutionary studies
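Matrices for larger evolutionary distances are obtained in the Dayhoff model by multiplying the PAM 1 probability matrix by itself. The sketch below shows only that step, using a made-up two-letter alphabet instead of the real 20×20 matrix; the numbers in `toy_pam1` are hypothetical and the helper names are assumptions for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM 1 for a two-letter alphabet: each column sums to 1 and
# 1 percent of residues change per unit of divergence.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam250 = mat_power(toy_pam1, 250)
print(toy_pam250)
```

After many steps the probability of still observing the original residue approaches the background frequency, which is why high-numbered PAM matrices carry little signal about close relationships.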
DOT PLOT Analysis
 It is a pairwise sequence alignment technique.
 It is a very simple and basic pairwise sequence analysis technique.
 It is a manual, graphical method of sequence analysis.
 Within a plot, two identical sequences are characterised by an unbroken main diagonal.
 It is the most basic method of comparing two sequences: a visual
approach known as the dot plot.
 It was first described by A. J. Gibbs and G. A. McIntyre in 1970.
 It is a graphical method for comparing two sequences to identify the
region of similarity or dissimilarity, depicted by the presence or absence
of a dot on the plot, hence the name Dot Plot.
 To construct a dot plot of sequence A and sequence B, the first
sequence is placed along the top of the plot (x-axis) and the second
sequence along the left side (y-axis) of the plot.
 A dot is placed on the plot wherever a character Ai of sequence A
is identical to a character Bj of sequence B.
 A region of consecutive identical characters between the two
sequences forms a diagonal line in the plot space.
 When long, similar sequences are compared, such plots
become crowded or noisy. To overcome this, the sliding
window concept is used.
 From the dot plot, the alignment score is calculated.
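The construction described above, including the sliding window used to suppress noise, can be sketched as follows. The function names and the `window` and `min_matches` parameters are assumptions made for this example; a dot is kept only when enough characters match inside the window that starts at that position.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return grid[j][i], True where the windows starting at seq_a[i] and seq_b[j] match."""
    rows = len(seq_b) - window + 1
    cols = len(seq_a) - window + 1
    grid = []
    for j in range(rows):
        row = []
        for i in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Turn the grid into text, one line per position of the second sequence."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in grid)


print(render(dot_plot("GATTACA", "GATTACA")))                           # noisy plot
print()
print(render(dot_plot("GATTACA", "GATTACA", window=3, min_matches=3)))  # filtered plot
```

With a window of 1 every chance identity produces a dot; with a window of 3 only the diagonal of the self-comparison survives.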
Uses
 Used for pairwise biological sequence analysis.
 Useful for comparison of protein sequences.
 The plot is characterised by some apparently random dots
(noise), while diagonal runs of dots indicate regions of greater similarity between the two
sequences
BLAST
 Basic Local Alignment Search Tool
 Pairwise sequence alignment tool.
 Developed and maintained by NCBI.
 It is a tool specialised in local sequence alignment instead of
whole-sequence (global) alignment.
 The tool is based on an explicit statistical theory
(Altschul et al., 1990).
 The original version produces ungapped alignments of local regions.
 Can be used to align both protein and nucleotide sequences, but it
gives better alignments for protein sequences.
 Very fast searching tool.
 This tool can search a database containing millions of sequences
within seconds, in a pairwise manner.
Use
 Constructs pairwise sequence alignments by comparison between two
sequences.
 Best tool for finding the single best-matching sequence in the corresponding
database.
 To find sequences structurally similar to the query sequence, including those with a known 3D
structure.
 Used in the interpretation and prediction of structural information.
 Interpretation and prediction of functional information.
Steps
 Selection of the local regions that show the best similarity.
 Extension of the search on both sides of the selected region to get
maximum similarity.
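The two steps can be illustrated with a much simplified seed-and-extend sketch. It is not the BLAST implementation: it assumes exact word matches as seeds, ungapped extension, a +1/-1 scoring scheme and a fixed drop-off value, and all names and parameter values are chosen for the example only.

```python
def _extend(query, subject, q, s, step, match, mismatch, drop_off):
    """Walk from (q, s) in direction `step`; return (best gain, residues kept)."""
    score = best = kept = steps = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        steps += 1
        if score > best:
            best, kept = score, steps
        elif best - score > drop_off:
            break  # the score has fallen too far below the best seen so far
        q += step
        s += step
    return best, kept


def seed_and_extend(query, subject, word_size=3, match=1, mismatch=-1, drop_off=2):
    """Return (score, query start, subject start, length) of the best ungapped hit, or None."""
    # Step 1: index every word of the subject so that seeds are found quickly.
    index = {}
    for s in range(len(subject) - word_size + 1):
        index.setdefault(subject[s:s + word_size], []).append(s)
    best_hit = None
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            # Step 2: extend the seed to the right and to the left.
            right, r_len = _extend(query, subject, q + word_size, s + word_size,
                                   1, match, mismatch, drop_off)
            left, l_len = _extend(query, subject, q - 1, s - 1,
                                  -1, match, mismatch, drop_off)
            hit = (word_size * match + left + right, q - l_len, s - l_len,
                   word_size + l_len + r_len)
            if best_hit is None or hit[0] > best_hit[0]:
                best_hit = hit
    return best_hit


print(seed_and_extend("TTGACCAGT", "AAGACCAGG"))
```

Requiring an exact word match before any extension is what makes this kind of search fast, and it is also why a match without a shared word can be missed.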
Demerits
 At a time, a query sequence is compared with only a single sequence (pairwise).
 Reduced sensitivity in selecting sequences:
it may sometimes miss the best matches
in a database, because in maintaining its speed
it may miss certain matches that are better than the
selected one.
1) BLASTP
Used to search the protein sequence database (P.S.D.B) for the
protein sequences that best match a protein query sequence.
2) BLASTN
Searches the nucleotide sequence database (N.S.D.B) for the nucleotide
sequences (N.S) that best match a nucleotide query sequence.
3) TBLASTN
Query sequence = protein sequence.
The given N.S.D.B is translated into protein sequences (in all six reading frames) and
the query is then compared with the translated nucleotide sequences.
4) BLASTX
Query sequence = nucleotide sequence.
We search within the P.S.D.B: the nucleotide query is
translated into protein sequences (in all six reading frames) and the
translated query is compared with the protein sequences.
5) TBLASTX
Both the nucleotide query and the nucleotide sequence database are translated
into protein sequences, and the search is then carried out.
FASTA
 FAST-All
 It is a sequence alignment tool.
 Developed by Lipman and Pearson in 1985.
 The FASTA format is a text-based format for representing either nucleotide
sequences or amino acid (protein) sequences, in which nucleotides or amino
acids are represented using single-letter codes.
 The format also allows for sequence names and comments to precede the
sequences.
 The format originates from the FASTA software package, but has now become a
near universal standard in the field of bioinformatics.
 The simplicity of FASTA format makes it easy to manipulate and parse
sequences using text-processing tools and scripting languages like the R
programming language, Python, Ruby, and Perl.
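Because the format is plain text, a few lines of code are enough to read it. The following is a minimal sketch, assuming that each record starts with a `>` header line and that the first word of the header serves as the sequence name; the function name and the sample records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the first word is taken as the name, the rest is comment.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so the pieces are collected and joined.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Made-up records, for illustration only.
sample = """>seq1 example protein
MKVLAAGIVG
LLLAQ
>seq2 example nucleotide
ACGTACGT
"""
for key, sequence in parse_fasta(sample).items():
    print(key, len(sequence), sequence)
```

For real work an established parser, such as the one in Biopython, is usually the safer choice, since it also handles less common variations of the format.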
Comparison with BLAST:
 It gives better results for nucleotides, but can be used for both protein (P) and nucleotide (N) sequences.
 It can provide better results than BLASTN, but not better than BLASTP.
 More sensitive than BLAST in selecting the best matches: fewer sequences
are missed while searching than with BLAST.
Different forms of FASTA:
1) FASTA3
The standard program, used to search with either a nucleotide or a
protein query against a database of the same type.
2) FASTS3
Used to compare linked peptides against a protein sequence
database.
3) FASTF3
Used to compare mixed peptides against a protein sequence
database.
4) FASTX3/FASTY3
Used to search a protein sequence database with a
translated nucleotide query sequence.
5) TFASTX3/TFASTY3
Used to search a translated nucleotide sequence database
with a protein query sequence.
C) Multiple Alignment Technique
 Objective, manual, simultaneous and progressive
methods
 Databases of multiple alignments
 PSI-BLAST
 CLUSTAL-W
Multiple Sequence Alignment
 More than two sequences are involved.
 A set of sequences can be compared at a time and aligned together.
 There are 2 types of alignment:
 Simultaneous Multiple Sequence Alignment and Progressive Multiple
Sequence Alignment.
1) Simultaneous Multiple Sequence Alignment
 Alignment of all sequences occurs at one time, that is, simultaneously.
 There is no hierarchical or orderly arrangement of the sequences.
 But the sequences share similarity.
Advantage
Very fast, very quick alignment
Disadvantage
 We can't expect orderly arrangement of sequences based on similarity.
 Evolutionary relationship study is not possible
2) Progressive multiple sequence alignment
 A hierarchical, clear-cut, orderly
arrangement of sequences can be seen.
 Sequence alignment occurs progressively, step by step,
so it is a somewhat time-consuming process.
 In this alignment the best, most similar sequence is arranged next
after the query sequence.
Advantage
 Arranged in a hierarchical fashion.
 Evolutionary relationship study is possible.
Disadvantage
 Comparatively slow and somewhat time-consuming process
PSI-BLAST
 PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search
Tool) derives a position-specific scoring matrix (PSSM) or profile from
the multiple sequence alignment of sequences detected above a given
score threshold using protein–protein BLAST.
 This PSSM is used to further search the database for new matches, and is
updated for subsequent iterations with these newly detected sequences.
 Thus, PSI-BLAST provides a means of detecting distant relationships
between proteins.
 PSI-BLAST is most conveniently used on the internet with the help of
the graphical user interface provided by the PSI-BLAST search page on
National Center for Biotechnology Information (NCBI) website
(http://www.ncbi.nlm.nih.gov/BLAST/).
 The PSI-BLAST page may be customized by the user in terms of
automated or semiautomated or “two-page formatting” and other
parameters modified as desired.
 This page can then be saved as permanent internet bookmark for
repeated use on future occasions.
 It is a hybrid tool.
 It is a relatively recent approach.
 It combines elements of both pairwise and multiple sequence alignment
methods.
 It was proposed by Altschul et al. in 1997.
 Hybrid of pairwise sequence alignment and multiple sequence
alignment and similarity searching tools.
 It can align sequences via progressive sequence alignment.
 Residue-to-residue similarity is searched for by comparing the sequences
position by position.
 Where similarity is present, a dot is placed as a graphical
representation.
 The similarity is then calculated.
 For example, 5 positions out of 7 are similar.
 Used for protein sequence comparison.
 Here, sequences are aligned pairwise, but with repeated BLAST searches in order to
get more and more related sequences.
 So it acts as a pairwise tool while also resembling a multiple sequence alignment.
 So the results contain sequences of maximum, medium and least similarity.
Advantages
 Increases the search power of BLAST
 Fast to run
 Provides sequences with a diverse range of sequence similarity, like a multiple sequence alignment
 Searches are more sensitive and selective, able to detect weak but meaningful
similarities.
 Running the program iteratively increases search sensitivity.
Disadvantages
 Deriving diagnostic family motifs can be very time-consuming and demands a
level of understanding beyond that needed for general use.
 Automated iterative searching may degenerate and lead to profile dilution
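The position-specific scoring matrix at the centre of PSI-BLAST can be illustrated with a small sketch. It assumes an ungapped alignment of equal-length sequences, a uniform background frequency and a pseudocount of 1; these are simplifications chosen for the example, not the weighting scheme PSI-BLAST itself uses, and the function names are hypothetical.

```python
import math


def build_pssm(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column."""
    background = 1.0 / len(alphabet)  # assumption: all residues equally likely
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment if seq[col] in alphabet]
        total = len(column) + pseudocount * len(alphabet)
        scores = {}
        for residue in alphabet:
            frequency = (column.count(residue) + pseudocount) / total
            scores[residue] = math.log2(frequency / background)
        pssm.append(scores)
    return pssm


def score_with_pssm(pssm, segment):
    """Sum the column scores for a segment of the same length as the profile."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


# Made-up aligned hits from a first search round.
hits = ["MKV", "MKL", "MRV"]
profile = build_pssm(hits)
print(score_with_pssm(profile, "MKV"), score_with_pssm(profile, "AAA"))
```

In each iteration the newly found sequences are added to the alignment and the matrix is rebuilt, which is how the search reaches more distant relatives and also how unrelated hits can dilute the profile.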
CLUSTAL
 3 forms:
1) CLUSTAL X
2) CLUSTAL W
3) CLUSTAL ω (Clustal Omega)
 CLUSTAL X & W:
Protein as well as nucleotide sequence alignment is possible.
 CLUSTAL ω:
Originally released for protein sequences only; later versions also accept DNA/RNA.
 CLUSTAL X:
 In CLUSTAL X the controlling interface is a graphical user interface.
 Menu-based operations and graphical representations
are used for handling it.
 CLUSTAL W and CLUSTAL ω:
 Command line interface.
 The interface is controlled using text commands.
Clustal W
 Clustal W like the other Clustal tools is used for aligning
multiple nucleotide or protein sequences in an efficient
manner.
 It uses progressive alignment methods, which align the most
similar sequences first and work their way down to the least
similar sequences until a global alignment is created.
 Clustal W is a matrix-based algorithm, whereas tools like T-
Coffee and Dialign are consistency-based.
 ClustalW has a fairly efficient algorithm that competes well
against other software.
 This program requires three or more sequences in order to
calculate a global alignment, for pairwise sequence alignment
(2 sequences) use tools similar to EMBOSS, LALIGN
 Multiple sequence alignment tool
 Progressive multiple sequence alignment is possible.
 Written in the C++ programming language.
 It can run on almost all platforms, such as Unix, Linux, Macintosh and Windows.
 Developed by Julie Thompson and Toby Gibson
 Developed and maintained by EBI
 The user interface is a command line, operated by writing text commands.
 Because of progressive multiple sequence alignment, comparison is very easy
owing to the orderly arrangement.
Application
 Very easy to compare sequences due to progressive sequence alignment
 Very useful for the classification of both protein and nucleotide
sequences.
 Application in predicting structural and functional features of both
nucleotide as well as protein sequences.
 This is the best tool for evolutionary relationships study .
4.Protein Structure Prediction
A)Secondary structure prediction
1) Chou-Fasman method
2) JPred prediction method
Secondary structure prediction
 Commonly two experimental methods are used for protein structure
determination:
1) X-ray diffraction technique
2) Nuclear magnetic resonance technique
 Both are very expensive, labour-intensive and time-consuming
processes.
 To overcome these issues we use bioinformatics
tools.
 Less time-consuming and very fast method.
 Skilled labour is not required.
 Cheapest method, when compared with the above 2.
Chou-Fasman Method
 The Chou-Fasman method is an empirical technique for the prediction
of secondary structures in proteins.
 Developed by Peter Y. Chou and Gerald D. Fasman.
 The method is based on analysis of the relative frequencies of each
amino acid in alpha helices, beta sheets and turns, based on known
protein structures solved by X-ray crystallography.
 From these frequencies a set of probability parameters was
derived for the appearance of each amino acid in each secondary
structure type, and these parameters are used to predict the
probability that a given sequence of amino acids would form a
helix, a beta strand, or a turn in a protein.
 Significantly less accurate than modern machine-learning-based
techniques.
 About 50 to 60 percent accurate in identifying the correct secondary
structures.
Definition
 It is a statistical procedure in which the amino acids
of a given sequence, and their frequencies, are compared with the
probabilities and corresponding propensity
values given by Chou and Fasman, in order to assign the given protein to a
particular secondary structure.
Probability table
Which amino acids occur in each
secondary structure of proteins, and in what numbers, according to known structures
Propensity value
 It is the value that expresses the tendency of a particular amino acid
towards a particular secondary structure.
 The propensity value of an amino acid generally depends on the
chemical properties of its R group:
# Alpha helix: nucleates where 4 of 6 consecutive residues are helix makers (at most 2 breakers)
# Beta sheet: nucleates where 3 of 5 consecutive residues are sheet makers (at most 2 breakers)
Steps
 Scan through the given polypeptide chain
 to find out which amino acids are
present in the given strand,
 and to count their numbers.
 Compare these with the probability and propensity
values given by Chou and Fasman.
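The scanning step can be sketched for the helix case. The propensity values below are approximate figures for a handful of residues, included only to make the example run; the published Chou-Fasman tables should be consulted for real use. The window of 6, the requirement of 4 helix makers and the cutoff used to call a residue a helix maker are treated here as assumptions, and the function name is hypothetical.

```python
# Approximate helix propensities for a few residues (illustrative values only;
# residues missing from this table are treated as neutral).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21,   # helix makers
    "G": 0.57, "P": 0.57, "N": 0.67, "S": 0.77, "Y": 0.69,  # helix breakers
}


def helix_nuclei(sequence, window=6, needed=4, cutoff=1.03):
    """Start positions of windows containing at least `needed` helix makers."""
    starts = []
    for i in range(len(sequence) - window + 1):
        makers = sum(HELIX_PROPENSITY.get(residue, 1.0) > cutoff
                     for residue in sequence[i:i + window])
        if makers >= needed:
            starts.append(i)
    return starts


# Made-up peptide: a helix-favouring stretch flanked by breaker residues.
print(helix_nuclei("GPGSEALMAEGPNS"))
```

The full method goes on to extend each nucleus in both directions and to resolve overlaps between helix and sheet predictions, which this sketch leaves out.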
JPred prediction method
 A protein secondary structure prediction server
 Fully automatic method
 It has been in operation since approximately 19
 JPred incorporates the Jnet algorithm in order to make more
accurate predictions.
 Combination of 6 independent protein structure prediction
methods:
1) ZPRED
2) MULPRED
3) DSC
4) PHD
5) NNSSP
6) PREDATOR
 All 6 methods predict independently.
 Data from 396 domains support the secondary structure
information.
 The results of the 6 different methods are evaluated against the 396-domain data
to get the final structural information.
 Using the combination of methods gives more accurate results
than using the ZPRED or MULPRED methods alone.
 A combination of 4 methods gives an accuracy of 72.9 percent.
 It is a secondary structure prediction method in which a
combination of 6 different independent methods is used.
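The idea of combining independent predictors can be illustrated with a simple majority vote. This is a sketch of the general principle only and is not the procedure JPred uses; the three-state alphabet (H for helix, E for strand, C for coil), the tie rule and the example predictions are all assumptions made for illustration.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Majority vote per residue over equal-length strings of H/E/C states."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for states in zip(*predictions):
        counts = Counter(states).most_common()
        top_state, top_count = counts[0]
        # Assumption: when the vote is tied, fall back to coil.
        tied = len(counts) > 1 and counts[1][1] == top_count
        consensus.append("C" if tied else top_state)
    return "".join(consensus)


# Made-up outputs of four hypothetical predictors for a six-residue peptide.
outputs = ["HHHEEC", "HHCEEC", "HHHECC", "CHHEEC"]
print(consensus_prediction(outputs))
```

A vote of this kind tends to cancel errors made by individual methods, which is consistent with the higher accuracy reported for the combined prediction.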
Tertiary Structure Prediction
Comparative modelling
 MODELLER
 RasMol
Comparative modelling
 Comparative modelling / homology modelling
 It predicts the 3D structure of proteins.
 It uses experimentally determined protein structures as
models (templates).
 The method predicts the structure of another protein that
exhibits amino acid sequence similarity to the template protein.
 Evolutionarily related proteins have similar sequences and
structures.
 These similarities are very high in core regions. The
sequence similarity should be greater than 35 percent.
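Whether a candidate template clears the 35 percent mark can be checked directly from a pairwise alignment. The sketch below assumes that the two sequences are already aligned to equal length with `-` for gaps, and that identity is counted over the aligned, ungapped columns; other definitions of percent identity exist, and the function names are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are not counted
        compared += 1
        identical += q == t
    return 100.0 * identical / compared if compared else 0.0


def usable_template(aligned_query, aligned_template, cutoff=35.0):
    """True when the identity exceeds the cutoff quoted for comparative modelling."""
    return percent_identity(aligned_query, aligned_template) > cutoff


# Made-up aligned pair, for illustration only.
print(percent_identity("MKV-LA", "MKVQLG"), usable_template("MKV-LA", "MKVQLG"))
```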
Steps
1) Selection of template sequences
 Select a template from the protein sequence database.
 The template should show maximum sequence similarity
or homology with the query.
2) Preparation of sequence alignment
 Alignment of the two sequences to determine homology
3) Construction of the 3D model
 It is built on the coordinates of the template.
 Length, height and width are considered when comparing the template
with the query sequence on the basis of the template coordinates.
4) Evaluation of the model constructed
 It is evaluated against known 3D models.
 The method is relatively accurate.
 The accuracy depends on the sequence alignment.
 Homologous models are identified and the extent of their
sequence similarity with one another and with the unknown is
determined.
 The sequence database search tools BLAST and FASTA are
used to search for related structures.
 Sequences are aligned together with the help of an MSA tool
called Clustal W.
 Structurally conserved and variable regions are identified.
Coordinates of the core residues of the unknown structure are generated
from those of the known structures.
 The side chains and their conformations are built.
 Unknown structures are refined and evaluated.
Various software packages are used: WHAT IF, RASMOL,
MODELLER.
 It exploits evolutionarily related proteins.
MODELLER
 Used for 3D structure prediction.
 It is written in the Fortran 90 language.
 It is software used in homology, or knowledge-based, modelling.
 It was developed by Andrej Sali at the University of California, San
Francisco.
 The ModWeb comparative protein structure modelling web server is
based on MODELLER.
 It has limited support for ab initio prediction.
 It is a computer program used for producing homology models of protein
tertiary as well as quaternary structures.
 It is freely available for academic use.
 Graphical user interfaces and commercial versions are distributed separately.
 Computer program.
 Used for sequence database searching
 For protein structural comparison.
 Used for sequence clustering
4 important steps
1) Selection of template sequence
 Select a template sequence from the protein sequence database; the template
sequence should exhibit maximum homology with the sequence which is
being studied.
2) Preparation of sequence alignment
 Preparation of a sequence alignment between the sequence which is to be
analysed and the template sequence
3) Construction of 3D model
 Construction of the 3D model based on the coordinates of the template using
a technique called satisfaction of spatial restraints
 Here, using certain geometrical criteria (length, breadth, height),
the template is compared with the query sequence, especially on the basis of
the template coordinates, covering loops, folding, side chains etc.
4) Evaluation of model constructed
 About 90 % accuracy can be expected, provided the sequence alignment is highly
accurate.
RASMOL
 Molecular visualisation software.
 Molecular structural analysis of proteins, nucleic acids and
other similar molecules is possible.
 Used for visualising molecular structure.
 Used mainly for structural analysis.
 Example: pollen grains, detailed molecular structure study.
 Zooming of the molecular structure up to the full size of the
monitor.
 Rotation in any 3D direction (x, y, z), by 180 degrees, 120 degrees
etc.
 Peripheral analysis is possible.
 Different colouring schemes are available for highlighting particular parts.
 The entire structure can be viewed and studied in detail
using RASMOL.
Advantage
 Detailed study of structure is possible using RASMOL.
 Molecular visualisation software.
 Very good for detailed molecular analysis of
molecules like nucleotides or proteins etc.
Colouring schemes:
1) Group colouring scheme
2) Shapely colouring scheme
3) Amino colouring scheme
5.Emerging Areas of Bioinformatics
1) DNA microarrays
2) Functional genomics
3) Comparative genomics
4) Pharmacogenomics
5) Chemoinformatics
6) Medical informatics
DNA Microarrays
 It is a genetic analysis technique.
 Used for the analysis of nucleic acids.
 In this technique, hundreds to thousands of microscopic dots of
DNA are spotted on a small glass plate in an orderly fashion.
 The location of each DNA dot, its structural details, functional details and
expression product information are available
and stored in a computer program.
 All information on the spotted DNA is available from the computer;
genetic analysis is carried out using this information.
 Started in the 1990s.
 Also called DNA chips, gene chips, DNA arrays, gene arrays and
biochips.
 The principle is hybridization between nucleotides.
Procedure
 For this, mRNA is taken from normally expressing cells and
applied to the microarray to obtain the rate of gene expression.
 Collect the mRNA and prepare the DNA microarray.
 Radiolabel the cDNA (100 nos.), which is used
as the probe.
 Introduce it into the DNA microarray.
 The radiolabelled cDNA hybridizes with the DNA microarray
dots; the signal indicates the amount of hybridization.
Application
 Gene expression study
1) for comparison of gene expression in similar cell types (diseased cells and normal
cells)
2) for comparison of gene expression in different cell types (different cells of
different individuals)
 Identification of tissue-specific genes
 Discovery of drugs
 Diagnostics and genetic mapping
 Study of protein-protein interactions
 Functional genomics
 DNA sequencing
 Agricultural biotechnology
 Study of gene expression in plants
 DNA polymorphism
 Detection of pathogens
 Gene finding
 Analysis of 100-1000 genes at a time
 Gene mapping
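The comparison of gene expression between diseased and normal cells is commonly summarised as a log2 ratio of spot intensities. The sketch below assumes one intensity value per gene and per sample and a small floor to avoid dividing by zero; the gene names, the numbers and the function name are all hypothetical.

```python
import math


def log2_ratios(sample, reference, floor=1.0):
    """Log2 of sample/reference intensity for every gene present in both."""
    ratios = {}
    for gene, intensity in sample.items():
        if gene in reference:
            s = max(intensity, floor)
            r = max(reference[gene], floor)
            ratios[gene] = math.log2(s / r)
    return ratios


# Made-up spot intensities for a diseased and a normal cell sample.
diseased = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
normal = {"geneA": 200.0, "geneB": 100.0, "geneC": 200.0}

for gene, ratio in log2_ratios(diseased, normal).items():
    print(gene, ratio)
```

A positive value indicates higher expression in the diseased sample, a negative value lower expression, and zero no change.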
Functional genomics
 Study of the functions of genes.
 Example: the role of genes in growth under given physiological and biochemical
environments.
 Inactivity of genes and its reasons.
 Genes can be inactivated by the action of other genes; the loss of expression of a gene may be due to
suppression by another gene, which is then the cause.
 Development and application of genomic analysis techniques.
 Identification of the genes involved in disease.
1) Positional cloning technique
2) Genome sequencing technique
 Examples:
# Shotgun method
# Enzymatic method
# Chemical method
 These were developed on the basis of functional genomics
 and give information about the structure and function of genes.
3) Gene expression profiling technique: comparison of similar cell types that differ in
gene expression due to mutation
 So it is used to find out the expression.
4) Knockout technique
Comparative genomics
 Compares structural and functional details and, based on the similarities and
differences, finds out relationships.
 Gene finding
 Classification of nucleotide sequences
 Finding evolutionary relationships; comparison of gene expression
 Analysis of protein sets from completely sequenced genomes
 For better understanding of the genomes and biology of the respective
organisms
 Example: Methanococcus, Mycoplasma, E. coli and Bacillus subtilis are fully
sequenced.
 Example: genes involved in the ripening of green mangoes to yellow mangoes.
 Here the genome of mango is compared with the annotated genome of a similar
species to identify the genes and the functions that they perform.
 Databases used for comparative genomics:
A. PEDANT: gives information about proteins and enzymes
B. KEGG: a comprehensive set of metabolic pathways of genomes
C. MBGD: Microbial Genome Database; search for microbial genomes
D. WIT: metabolic reconstruction of completely sequenced genomes
Pharmacogenomics
 It is the study of the role of the genome in drug response.
 Its name reflects its combining of pharmacology and genomics.
 Pharmacogenomics analyses how the genetic makeup of an
individual affects his or her response to drugs.
 It deals with the influence of acquired and inherited genetic
variation on drug response in patients by correlating gene
expression or single nucleotide polymorphisms with
pharmacokinetics and pharmacodynamics.
 Pharmacogenomics aims to develop rational means to optimise
drug therapy
 with respect to the patient's genotype, to ensure maximum efficacy
with minimal adverse effects.
 Genomic research will allow drug makers to tailor a therapy to the
individual's specific needs.
 It is described as a marriage between functional genomics and
molecular pharmacology.
 A new journal, Pharmacogenomics, was started by the Nature group
of journals.
 It covers the entire spectrum of genes that determine response and
sensitivity to individual drugs.
 Example: Human Genome Project
 Pharmacogenetics is the narrower spectrum of inherited differences
in drug metabolism and disposition.
 The terms pharmacogenomics and pharmacogenetics are often used interchangeably.
 It provides tools to classify the heterogeneity of disease and individual
response to medicine.
 It is a fascinating area of biotechnology research.
 Example: diagnosis, mechanism of disease and response of
patients to medicine
 2 approaches to pharmacogenomics:
1) Candidate gene approach
2) Linkage disequilibrium approach
 At the industrial level, it is used to understand variability in
clinical trials:
 Differential side effects
 Inconsistency in disease models
Chemoinformatics
 Also known as cheminformatics, chemioinformatics
and chemical informatics
 It is the use of computer and informational techniques
applied to a range of problems in the field of chemistry.
Application
 Used in pharmaceutical companies and academic settings in the
process of drug discovery
 These methods can also be used in chemical and allied
industries in various other forms
Medical informatics
 Also called health informatics
 Clinical informatics
 It is information engineering applied to the field of healthcare,
essentially the management and use of patient healthcare information
 It is a multidisciplinary field that uses health information technology to
improve health care via any combination of higher quality, higher
efficiency and new opportunities
 Used in gene therapy
 Neurological and metabolic disorders
 Cystic fibrosis
 Infectious diseases
 More efficient patient care
 Cardiovascular diseases, cancer gene therapy, human gene therapy
History and devolopment of bioinfomatics.ppt (1)
 

Similar to Bioinformatics

Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu KAUSHAL SAHU
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsAsad Afridi
 
BIOINFO unit 1.pptx
BIOINFO unit 1.pptxBIOINFO unit 1.pptx
BIOINFO unit 1.pptxrnath286
 
Bioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolBioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolJesminBinti
 
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in PharmacyVedika Narvekar
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSMSCW Mysore
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...SBituila
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...BibiQuinah
 
Primary Databases.pptx
Primary Databases.pptxPrimary Databases.pptx
Primary Databases.pptxSwarup Malakar
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformaticsVinaKhan1
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptxAshuAsh15
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introductionDrGopaSarma
 

Similar to Bioinformatics (20)

Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu Bioinformatics in biotechnology by kk sahu
Bioinformatics in biotechnology by kk sahu
 
Biological database
Biological databaseBiological database
Biological database
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
BIOINFO unit 1.pptx
BIOINFO unit 1.pptxBIOINFO unit 1.pptx
BIOINFO unit 1.pptx
 
Bioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolBioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST Tool
 
Biological databases.pptx
Biological databases.pptxBiological databases.pptx
Biological databases.pptx
 
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
5. BIOINFORMATICS.pptx B.Pharm sem 2 Computer Applications in Pharmacy
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Primary Databases.pptx
Primary Databases.pptxPrimary Databases.pptx
Primary Databases.pptx
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Protocols for genomics and proteomics
Protocols for genomics and proteomics Protocols for genomics and proteomics
Protocols for genomics and proteomics
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Intro bioinfo
Intro bioinfoIntro bioinfo
Intro bioinfo
 

More from Kottakkal farook arts and science college

Environmental and health implications of chemical fertilizers and pesticides
Environmental and health implications of chemical fertilizers and pesticidesEnvironmental and health implications of chemical fertilizers and pesticides
Environmental and health implications of chemical fertilizers and pesticidesKottakkal farook arts and science college
 
Climate conditions and crop rotation for optimal nutritionally valuable food ...
Climate conditions and crop rotation for optimal nutritionally valuable food ...Climate conditions and crop rotation for optimal nutritionally valuable food ...
Climate conditions and crop rotation for optimal nutritionally valuable food ...Kottakkal farook arts and science college
 

More from Kottakkal farook arts and science college (20)

Environmental and health implications of chemical fertilizers and pesticides
Environmental and health implications of chemical fertilizers and pesticidesEnvironmental and health implications of chemical fertilizers and pesticides
Environmental and health implications of chemical fertilizers and pesticides
 
Soil Fertility management, Causes And Consequences
Soil Fertility management, Causes And ConsequencesSoil Fertility management, Causes And Consequences
Soil Fertility management, Causes And Consequences
 
solid waste and e-waste causes and management
solid waste and e-waste  causes and managementsolid waste and e-waste  causes and management
solid waste and e-waste causes and management
 
Soil Fertility.pptx
Soil Fertility.pptxSoil Fertility.pptx
Soil Fertility.pptx
 
organic farming.pptx
organic farming.pptxorganic farming.pptx
organic farming.pptx
 
Ecological balance in the agro-ecosystem.pptx
Ecological balance in the agro-ecosystem.pptxEcological balance in the agro-ecosystem.pptx
Ecological balance in the agro-ecosystem.pptx
 
Climate conditions and crop rotation for optimal nutritionally valuable food ...
Climate conditions and crop rotation for optimal nutritionally valuable food ...Climate conditions and crop rotation for optimal nutritionally valuable food ...
Climate conditions and crop rotation for optimal nutritionally valuable food ...
 
plastids.pptx
plastids.pptxplastids.pptx
plastids.pptx
 
LICHEN.pptx
LICHEN.pptxLICHEN.pptx
LICHEN.pptx
 
ER.pptx
ER.pptxER.pptx
ER.pptx
 
golgi bodies.pptx
golgi bodies.pptxgolgi bodies.pptx
golgi bodies.pptx
 
electron transport chain.pptx
electron transport chain.pptxelectron transport chain.pptx
electron transport chain.pptx
 
mitochondria.pptx
mitochondria.pptxmitochondria.pptx
mitochondria.pptx
 
mitochondria biogenesis and functions.pptx
mitochondria biogenesis and functions.pptxmitochondria biogenesis and functions.pptx
mitochondria biogenesis and functions.pptx
 
ANACARDACEAE.pptx
ANACARDACEAE.pptxANACARDACEAE.pptx
ANACARDACEAE.pptx
 
COFFEE.docx
COFFEE.docxCOFFEE.docx
COFFEE.docx
 
CHILLI.pptx
CHILLI.pptxCHILLI.pptx
CHILLI.pptx
 
SYNTHETIC SEED.pptx
SYNTHETIC SEED.pptxSYNTHETIC SEED.pptx
SYNTHETIC SEED.pptx
 
SHOOT TIP CULTURE.pptx
SHOOT TIP CULTURE.pptxSHOOT TIP CULTURE.pptx
SHOOT TIP CULTURE.pptx
 
somatic embryogenesis.pptx
somatic embryogenesis.pptxsomatic embryogenesis.pptx
somatic embryogenesis.pptx
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 

Bioinformatics

  • 4. IMPORTANCE
    - Bioinformatics is an interdisciplinary subject in which three fields, biology, computer science and information technology, combine to form a new discipline.
    OR
    - Bioinformatics is a branch of biology that deals with the fast, accurate and logical analysis of biological data and information, for interpretation and prediction, using computational techniques. (Margaret Dayhoff)
    DEFINITION
    - Bioinformatics, n. The science of information and information flow in biological systems, esp. of the use of computational methods in genetics and genomics. (Oxford English Dictionary)
    - "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." -- Fredj Tekaia
  • 5. SCOPE
    1) Better documentation: large quantities of data can be stored, and data can be added, documented and deleted.
       a) Design and discovery of drugs, considering the genomic structure of pathogens and the chemical structure of drugs.
       b) Studies based on the two important classes of biomolecules, proteins and nucleic acids.
          PROTEIN: structural and functional unit.
          NUCLEIC ACID: carrier of hereditary information.
       c) Comparison based on the details already available for proteins and nucleic acids.
    2) Information is very easy to search and access.
    3) Fast, accurate, logical analysis.
    4) Interpretation and prediction.
  • 6. Applications
    1) Comparison
    - Comparison of nucleic acid and protein sequences.
    - It reveals the similarities and differences between protein sequences and between nucleic acid sequences.
    - There are two types of analysis:
      1) Structural analysis: gives structural details.
      2) Functional analysis: gives functional details.
    - Bioinformatic tools make molecular-level classification of organisms possible.
    - Organisms are classified by comparing the similarities and differences of their protein and nucleic acid sequences, which also shows the relationship between nucleic acid and protein.
    - Traditional taxonomy relies only on morphological and enzymatic analysis and comparison; accurate classification also requires molecular-level analysis.
    - Comparison of proteins and nucleic acids helps in:
      - classification of proteins
      - classification of nucleic acids
      - classification of individuals
      - working out evolutionary relationships between organisms
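The sequence comparison described above can be illustrated with a very small sketch. It assumes the two sequences have already been aligned to the same length (with `-` marking gaps) and simply reports the percentage of identical positions. The function name `percent_identity` and the example sequences are made up for illustration; real tools use proper alignment algorithms and scoring matrices.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues between two pre-aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only the columns where both sequences have a residue.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Two illustrative 8-base sequences differing at one position.
    print(percent_identity("ATGCATGC", "ATGAATGC"))
```

A high percentage suggests closely related sequences, a low one suggests distant or unrelated sequences; that is the basic signal molecular classification builds on.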
  • 7. 2) Gene finding
    - Bioinformatics makes gene finding easy.
    - Genes are nucleic acid sequences; RNA and protein are their expression products.
    - Analysing nucleic acid sequences helps to identify the gene responsible for a particular character. Eg: the gene responsible for yield improvement.
    - Gene finding has applications in crop improvement, such as resistance to insects, disease, drought and salinity, and higher yield.
    - In agriculture and medicine it is useful for comparing a normal individual with a diseased one.
    - In medicine it is used to find the gene responsible for a genetic disorder, by comparing normal with diseased, and to correct it at the embryo or patient level:
      - Embryo therapy: correction at the embryo level, or in the sperm or egg.
      - Patient therapy: correction in particular cells or in the nucleic acid.
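As a rough illustration of computational gene finding, the sketch below scans the forward strand of a DNA string for open reading frames (a start codon `ATG` followed in the same frame by a stop codon). This is a deliberate simplification: real gene finders also examine the reverse strand, introns and statistical signals. The function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for ORFs on the forward strand.

    start/end are 0-based, end-exclusive; the stop codon is included.
    min_codons counts every codon in the ORF, including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):
        print(orf)
```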
  • 8. 3) Protein structure prediction
    - A protein is compared with the entries of a protein structure database.
    - Knowing a protein's structure helps to work out its activity, its influence on the physiological and metabolic pathways of an organism, and its relation to the organism's growth.
    - Disease pathways can be traced by identifying the defective protein and the defective gene.
    - Identifying the gene that codes for a protein helps in curing genetic disorders.
    - NMR and X-ray diffraction are used to determine protein structure experimentally, but both are very expensive and time-consuming.
    - Bioinformatics can be used instead of these two methods; it is easy, less expensive and saves time.
    - Structure prediction takes very little time.
    - New proteins can be discovered with bioinformatics instead of NMR and X-ray diffraction, which is useful in several fields such as drug discovery and pharmaceuticals.
    - Knowing protein structure makes it possible to synthesize biologically valuable synthetic enzymes.
  • 9. 4) Study of evolutionary relationships
    - Through structural genomics, functional genomics and comparative genomics.
    5) Construction of biological databases
    - Constructing databases is part of better documentation.
    - Different types of database exist, depending on the type and kind of information stored.
    DATABASE: an area or space where information is stored in electronic format. Databases are classified by the information they contain (about proteins or about nucleic acids). Eg: EMBL, GenBank.
    6) Study of the total genomic structure of an organism
    - Helps in species identification.
  • 10. 7) Use in environmental clean-up programmes
    - Gene finding opens up scope for bioremediation.
    - Eg: in oil spills, Pseudomonas putida is used to reduce the effect of the hydrocarbons in the oil: plasmid-borne genes -> hydrocarbon degradation -> breakdown of the oil.
    - Organisms can be improved and modified to make them useful for bioremediation.
    8) Creation of bioweapons
    - Gene finding could in the near future be misused for bioweapons. Eg: disease-causing microorganisms could be identified and used as weapons.
  • 12. a) Nucleic Acid Databases
    - EMBL
    - GenBank: structure of GenBank entries
    - Specialized genomic resources: UniGene
  • 13. EMBL
    - Nucleotide sequence database.
    - Maintained by the EBI (European Bioinformatics Institute, in the UK), an outstation of the European Molecular Biology Laboratory (EMBL).
    - It collects information from different sources:
      * genome sequencing projects
      * scientific literature
      * direct author submission
    - It exchanges information with GenBank and DDBJ, so each holds a comprehensive collection.
    - It grows very fast, doubling in size every 9-10 months.
    - It is divided into many subdivisions.
    - The Laboratory operates from six sites: the main laboratory in Heidelberg, and outstations in Hinxton (the European Bioinformatics Institute (EBI), in England), Grenoble (France), Hamburg (Germany), Rome (Italy) and Barcelona (Spain).
    - EMBL groups and laboratories perform basic research in molecular biology and molecular medicine, and train scientists, students and visitors.
    - Information is accessed through SRS, the Sequence Retrieval System.
    - The first systematic genetic analysis of embryonic development in the fruit fly was conducted at EMBL by Christiane Nüsslein-Volhard and Eric Wieschaus, for which they were awarded the Nobel Prize in Physiology or Medicine in 1995.
    - In the early 1980s, Jacques Dubochet and his team at EMBL developed cryogenic electron microscopy for biological structures, work recognized with the 2017 Nobel Prize in Chemistry.
    - URL (Uniform Resource Locator): http://www.ebi.ac.uk/embl/
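The doubling time quoted for EMBL implies exponential growth. As a rough sketch, assuming the doubling time stays constant, the size after a given number of months can be projected as below. The function name `projected_size` and the starting size of 1000 entries are purely illustrative and not taken from the slides.

```python
def projected_size(initial_size: float, months: float, doubling_months: float) -> float:
    """Project database size assuming a constant doubling time.

    size(t) = initial_size * 2 ** (t / doubling_time)
    """
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return initial_size * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # With a 9.5-month doubling time, 19 months is two doublings.
    print(projected_size(1000, 19, 9.5))
```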
  • 15. GENBANK
    - A primary nucleotide sequence database.
    - Maintained by the NCBI (National Center for Biotechnology Information).
    - Few restrictions on use.
    - AIM: to support the research activity of the scientific community by providing information without restriction, except for copyrighted and patented sequences.
    - Growth rate: very fast; the amount of information is said to double within about one month.
    - The data are split into 17 divisions so that information can be found conveniently and efficiently.
    - Two retrieval systems:
      1) Entrez integrated retrieval system: links the nucleotide sequence database with the protein sequence database.
      2) MEDLINE facility: gives the abstracts of originally published papers related to the nucleotide sequences.
    - http://www.ncbi.nlm.nih.gov/genbank
  • 16. GenBank incorporates information from publicly available sources, primarily:
    # direct author submissions
    # large-scale sequencing projects
    To help ensure comprehensive coverage, the resource exchanges data with both the EMBL Data Library and DDBJ.
  • 17. The Structure of GenBank Entries
    - A GenBank release includes the sequence files, indices created on various database fields, and information derived from the database.
    - GenBank used to be distributed on CD-ROM, a convenient and relatively inexpensive mechanism for widespread distribution.
    - As the database grew, a large number of CDs was needed, which became difficult to handle for both the producers and the users.
    - Today GenBank is available by FTP.
    - The most commonly used file is the sequence entry file, which contains the sequence itself and descriptive information relating to it.
    - Each entry consists of a number of keywords, relevant associated sub-keywords and an optional feature table.
  • 18. A GenBank entry consists of 13 structural components:
    1) LOCUS
    2) DEFINITION
    3) ACCESSION NUMBER
    4) VERSION
    5) KEYWORDS
    6) SOURCE
    7) ORGANISM
    8) REFERENCE
    9) AUTHOR
    10) TITLE
    11) JOURNAL
    12) PUBMED NO.
    13) REMARK/COMMENT
  • 19. 1) LOCUS: the entry number that identifies the nucleotide sequence, followed by the type of sequence and the date.
       [NM-000555 - mRNA - Tuesday, 21.7.2018]
       (entry no.) (type of sequence) (day) (day.month.year)
    2) DEFINITION: a short description giving the scientific name of the source organism, the sequence entered and its expression product. Eg for the Bt gene: Bacillus thuringiensis, mRNA, δ-endotoxin.
    3) ACCESSION NUMBER: normally the same as the entry number. [NM: 000555]
    4) VERSION: to update the information, give the entry number followed by the version number, along with the GI (GenInfo Identifier) number. [NM: 000555.5, GI: 12345]
    5) KEYWORDS: keywords describing the work; if there are none, put a dot. Eg: insect resistance.
    6) SOURCE: the common name of the source organism. Eg: bacteria.
    7) ORGANISM: the scientific name of the source organism. Eg: Bacillus thuringiensis.
  • 20. 8) REFERENCE: the published paper related to the nucleotide sequence being entered.
    9) AUTHOR: the names of the authors, in the same order as in the published paper.
    10) TITLE: the title of the paper.
    11) JOURNAL: the name of the journal in which the paper was published.
    12) PUBMED NO.: the number that gives access to the archived paper in PubMed (an archive of scientific journal literature).
    13) REMARK/COMMENT: free-text comments, such as biological importance, expression, changes or notes on the source organism.
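The keyword structure described above can be read with a few lines of code. The sketch below is a simplified parser for the header of a GenBank-style flat-file record: top-level keywords start in the first column, sub-keywords are indented, and other indented lines continue the previous field. The record is invented for illustration (`XX000000` is a placeholder, not a real accession), the `SUBKEYS` set and the function name `parse_header` are assumptions of this example, and a real record has more fields plus a feature table. For real work a maintained parser such as Biopython's would be the usual choice.

```python
# Indented sub-keywords recognised by this simplified sketch (an assumption).
SUBKEYS = {"ORGANISM", "AUTHORS", "TITLE", "JOURNAL", "PUBMED"}

# Invented example record; all values are placeholders.
EXAMPLE_RECORD = """\
LOCUS       XX000000     120 bp    mRNA    linear   BCT 21-JUL-2018
DEFINITION  Hypothetical Bacillus thuringiensis toxin gene, partial
            mRNA.
ACCESSION   XX000000
VERSION     XX000000.1
KEYWORDS    .
SOURCE      Bacillus thuringiensis
  ORGANISM  Bacillus thuringiensis
REFERENCE   1
  AUTHORS   Doe,J.
  TITLE     A made-up title for illustration
  JOURNAL   Unpublished
ORIGIN
        1 atgc
//
"""


def parse_header(text: str) -> dict:
    """Collect header fields of one record into a {keyword: value} dict."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # The sequence section (ORIGIN) and the terminator (//) end the header.
        if line.startswith("ORIGIN") or line.startswith("//"):
            break
        first, _, rest = line.strip().partition(" ")
        if not line[0].isspace() or first in SUBKEYS:
            current = first  # a new keyword or sub-keyword begins
            fields[current] = rest.strip()
        elif current is not None:
            # Indented continuation of the previous field.
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_header(EXAMPLE_RECORD).items():
        print(f"{key}: {value}")
```

Note that this sketch keeps only the last REFERENCE block if a record has several; a fuller parser would collect them in a list.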
  • 21. Specialized genomic resources
    - Specialized resources focus on species-specific genomics and on particular sequencing techniques. The aim of such a database is to give an integrated view of a particular biological system.
    a) UniGene
    * The collection represents genes from many organisms; each cluster relates to a unique gene and includes related information about that gene.
    * UniGene plays a valuable role in gene discovery.
    * UniGene is also used for gene mapping projects and large-scale gene expression analysis.
  • 22. b) TDB: The TIGR Database
    * These databases contain DNA and protein sequences, gene expression data, protein family information, etc.
    * They also hold data such as the taxonomic range of plants and humans and the roles of cellular components.
    c) SGD (Saccharomyces Genome Database)
    * SGD is an online data resource containing information on the molecular biology and genetics of S. cerevisiae (budding yeast).
    * The database provides internet access to the genome, its genes and their products.
    * SGD supports research by bringing together functions such as sequence similarity search tools.
    * Genetic maps are shown with dynamically created graphical displays, which makes the database user friendly.
  • 23. UniGene
    - UniGene is a specialized genomic resource. Such databases tend to be linked, to some extent, with the primary DNA databases from which they derive their data and into which their results are usually fed.
    - Purpose of specialized genomic resources:
      1) species-specific genomics
      2) particular sequencing techniques
    - The primary goal of the Human Genome Project is to determine the complete sequence of the human genome (about 3 billion base pairs).
    - About 3% of the genome encodes protein. The biological significance of the remainder is unknown.
  • 24. A transcript map is a vital resource in flagging those parts of the genome that are actually expressed. UniGene attempts to provide a transcript map by utilising sets of non-redundant, gene-oriented clusters derived from GenBank sequences. The collection represents genes from many organisms, each cluster relating to a unique gene and including related information, such as the tissue types in which the gene is expressed, map location etc.
  • 25. b) Protein Sequence Databases: PIR, SWISS-PROT, TrEMBL. Composite Protein Databases: NRDB, OWL. Secondary Databases: PROSITE, PRINTS, BLOCKS, IDENTIFY.
  • 26. SWISS-PROT • Protein sequence database • Switzerland-based database. • SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. • It is a curated protein sequence database, which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants and source organism), a minimal level of redundancy, and a high level of integration with other databases. • SWISS-PROT contains information about the name and origin of the protein, protein attributes, general information, ontologies, sequence annotation, amino acid sequence, bibliographic references, cross-references to sequence, structure and interaction databases, and entry information.
  • 27.
  • 28.  It is maintained collaboratively by the Swiss Institute for Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).  The SWISS-PROT group is headed by Rolf Apweiler.  It contains non-redundant sequence entries, and the information is thoroughly reviewed and annotated.  It provides protein sequences to students, researchers and related industries such as the pharmaceutical industry.  SWISS-PROT aims to be minimally redundant and is interlinked with many other resources.  Linked with the other databases EMBL and TrEMBL.
  • 29. TrEMBL  It is a primary protein sequence database  Translated EMBL  A protein sequence database of translated nucleotide sequences.  Created in 1996 as a computer-annotated supplement to SWISS-PROT  Its entries are annotated by computer rather than manually.  The database is constructed by translating each nucleotide coding sequence available in EMBL into a protein sequence using computational techniques.  The TrEMBL sequence database contains the translations of all coding sequences (CDS) present in the DDBJ/EMBL/GenBank Nucleotide Sequence Database and also protein sequences extracted from the literature or submitted to SWISS-PROT, which are not yet integrated into SWISS-PROT.
  • 30.
  • 31. TrEMBL consists of two divisions: SP-TrEMBL and REM-TrEMBL. SP-TrEMBL  A temporary storage area for sequences that have not yet been manually annotated.  It contains entries that will eventually be incorporated into SWISS-PROT once they are fully described and annotated. REM-TrEMBL  Contains sequences that are not destined to be included in SWISS-PROT  Eg: # immunoglobulins and T-cell receptors # fragments of fewer than eight amino acids # synthetic sequences # patented sequences  TrEMBL was developed by EBI
  • 32. PIR  Primary protein sequence database.  Protein Information Resource  Developed by Margaret Dayhoff in the 1960s as a collection of sequences for investigating evolutionary relationships among proteins.  Developed at the National Biomedical Research Foundation (NBRF)  The database is split into 4 distinct sections, based on the level of information they provide.  PIR-1, PIR-2, PIR-3, PIR-4  They differ in terms of # quality of data # level of annotation provided.
  • 33. 1) PIR-1  Contains fully classified and annotated entries. 2) PIR-2  Includes preliminary entries, which have not been thoroughly reviewed and may contain redundancy. 3) PIR-3  Contains unverified entries, which have not been reviewed. 4) PIR-4  Contains protein sequences that are not genetically encoded and not produced on ribosomes, ie; synthetic protein sequences.
  • 34. Composite Protein Databases  These are amalgamations or compilations of different primary databases.  They make searching easy and efficient for the searcher.  They render sequence searching much more efficient, because they obviate the need to interrogate multiple resources. 1) NRDB 2) OWL
  • 35. NRDB: Non-Redundant Data Base  It is built locally at NCBI  Combination of 6 primary DB 1. SWISS-PROT 2. PDB 3. PIR 4. GenPept 5. GenPept update 6. SWISS-PROT update  Intended to be non-redundant and error free  Strictly speaking, however, there is still a chance of redundancy and error:  when redundant, erroneous or incorrect sequences are present in any component DB, they are incorporated into NRDB as they are.  Makes searching more efficient by avoiding the need to search many DBs for related information.
  • 36. OWL  A composite protein sequence database  Compilation of 4 primary DB 1. GenBank 2. SWISS-PROT 3. NRL-3D 4. PIR  Makes searching more efficient by obviating the need to search many DBs for related information  Developed at the University of Leeds, UK, in association with the Daresbury Laboratory in Warrington, 1994  Any redundancy present in GenBank is carried into OWL during amalgamation.  The sources are assigned a priority on the basis of their level of annotation and sequence validation  SWISS-PROT has the highest priority  OWL is released only on a 6-8 weekly basis.
  • 37. Secondary Databases: PROSITE, PRINTS, BLOCKS, IDENTIFY  They contain the fruits of analyses of the sequences in the primary sources.  Put simply, secondary data are derived from primary data.  These databases analyse the primary databases to form secondary data. There are several different primary databases and a variety of ways of analysing protein sequences.
  • 38. PROSITE  The first secondary DB to be developed was PROSITE  It derives its information from the primary database SWISS-PROT  Produced and maintained by SIB  Release date: 1988, by Amos Bairoch  URL Address: http://www.prosite.expasy.org.  It categorises protein sequences into families.  Proteins are grouped into different families based on the single most conserved motif.  Motif: a string of amino acids (10-20 residues) that is responsible for protein function and preserves the 3D structure.  Such motifs usually encode key biological functions.  Eg: enzyme active sites, ligand or metal binding sites  A motif represents the characteristic feature or site of each family.  These regions act as signatures of a particular protein family and help to identify new members of the family.  PROSITE is developed by a largely manual process of seeking the patterns that best fit particular families and functions.
  • 39.  PROSITE entries are held in two different files 1) The first holds the pattern and lists all its matches in the current version of SWISS-PROT 2) The documentation file provides: # details of the characterised family # a description of the biological role of the chosen motif # a supporting bibliography SIGNIFICANCE  Finds families based on motifs, ie; sequences that carry the same motif in the same region are considered a single family.  Fast functional characterisation and annotation of protein sequences.  Identifies possible functions of newly discovered proteins and analyses proteins for previously undetermined activity  Offers tools for protein sequence analysis and motif detection  It is part of the ExPASy proteomics analysis server APPLICATION  Classification of proteins is possible based on the most highly conserved motif  From a particular motif, the characteristic features it represents can be identified.  Eg: the structural and functional details of those proteins
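As an illustration of how a single conserved motif can act as a family signature, the sketch below converts a PROSITE-style pattern into a regular expression and scans a sequence with it. It is a minimal sketch, not PROSITE's own software: the function names and the test sequence are made up for the example, and only the basic pattern elements are handled (x, [..], {..} and repeat counts; the anchors < and > are ignored).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: supports x, [ABC], {ABC} and (n) / (n,m) repeats only.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split the element into its core and an optional repeat such as (2) or (2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} = any residue except P
        # [ST] is already a valid regex character class and passes through
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)

def find_motif(pattern, sequence):
    """Return the start positions (0-based) of all non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]

# N-glycosylation site pattern: N, not P, S or T, not P
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_motif("N-{P}-[ST]-{P}.", "MKNGSALNPTQ"))
```

In the made-up sequence, the second N is followed by P and is therefore rejected by the {P} element.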
  • 40. PRINTS  Derives its information from OWL; in future it will derive information from SWISS-PROT and SP-TrEMBL.  The process of deriving information from OWL is called iterative database scanning.  Contributed by SIB  In 1999 it was maintained in the department of biochemistry and molecular biology at University College London (UCL).  http://www.bioinf.man.ac.uk/db browser/ bioactivity/ protein 2 frm. html.  Here multiple motifs are considered, instead of a single common motif.  This helps to find the more similar sequences, so clearer information is available.  More accurate analysis is possible, based on the multiple similar motifs shared by the sequences.
  • 41. BLOCKS  Database based on multiple motifs  Ungapped multiple alignment of motifs  The database contains information on blocks  Highly conserved multiple motifs are aligned without any gaps  Developed by Henikoff, 1998  Automatically derived database  The database is constructed using the automated PROTOMAT system.  The blocks, ultimately encoded as ungapped local alignments, are calibrated against SWISS-PROT to obtain a measure of the likelihood of a chance match  Two scores are noted for each block:  The first denotes the level below which 99.5% of true-negative matches score.  The second is the median value of the true-positive scores.  The median standardised score for known true-positive matches is termed the strength.  Because the database is derived by fully automatic methods, the blocks are not annotated, but links are made to the corresponding PROSITE family documentation file.
  • 42.  Because this information is derived from the secondary databases PRINTS and PROSITE, BLOCKS can also be called a tertiary database.  It is based on the protein families contained in PROSITE, and is maintained at the Fred Hutchinson Cancer Research Centre (FHCRC).  The motifs, or BLOCKS, are created by automatically detecting the most highly conserved regions of each protein family.  The blocks are ultimately encoded as ungapped local multiple alignments.  Structure of BLOCKS entries:  Each block is identified by a general code (ID) line and an accession number.  The ID line indicates the type of discriminator to expect in the file.  The AC line gives the accession number and the minimum and maximum distance of the block from its preceding neighbour.  The DE line contains the description or title of the family.  The BL line indicates the diagnostic power (amino acid triplet, number of sequences the block contains)
  • 43. IDENTIFY  Another automatically derived tertiary source  Derived from BLOCKS and PRINTS  Developed in the department of biochemistry at Stanford University by Nevill-Manning et al., 1998  Constructed on the basis of eMOTIF  eMOTIF: a program based on generalised expressions of the similarities between highly conserved motif sequences.  It is designed to be more flexible than exact regular expression matching.  The resource is accessible via the protein function web server of the biochemistry department at Stanford; sets of amino acids and their properties are used in eMOTIF.
  • 44. Structure Classification Databases  Many proteins share structural similarities, reflecting common evolutionary origins 1) SCOP 2) CATH
  • 45. SCOP  Structural Classification Of Proteins  It is maintained at the MRC Laboratory of Molecular Biology and Centre for Protein Engineering.  It describes structural and evolutionary relationships between proteins of known structure (1995).  It is useful at both the multi-domain level and the individual domain level.  It is constructed using a combination of manual inspection and automated methods.  Because the structural information is checked by both automatic and manual methods, the result is more accurate.
  • 46. SCOP Classification  Proteins are classified in a hierarchical fashion to reflect their structural and evolutionary relationships.  Protein structures are assigned in a hierarchical order at three levels: 1) Family 2) Superfamily 3) Fold  Family: proteins are clustered into families with a clear evolutionary relationship if they have more than 30% sequence identity.  Superfamily: proteins are placed in superfamilies when, in spite of low sequence identity, their structural and functional characteristics suggest a common evolutionary origin.  Fold: proteins are classified as having a common fold if they have the same major secondary structures in the same arrangement and with the same topology.  SCOP is accessible for keyword searching via the MRC Laboratory web server  http://www.bioinf.man.ac.uk/db browser/ bioactivity/ structure frm. html
  • 47. CATH  Class, Architecture, Topology, Homology  It is a hierarchical classification of protein structures maintained at University College London (UCL), 1997.  The resource is largely derived using automatic methods, but manual inspection is necessary where automatic methods fail.  Developed by UCL's biomolecular structure and protein modelling unit. Used for classification of protein structures. There are five levels within the hierarchy. A) CLASS Derived from the gross secondary structure content and packing of the protein. Four classes of domain are recognised: 1. SUBCLASS 1 2. SUBCLASS 2 3. SUBCLASS 3 4. SUBCLASS 4 Subclass 1: mainly alpha helix Subclass 2: mainly beta sheet Subclass 3: alpha-beta, which includes both alternating alpha/beta and alpha + beta structures Subclass 4: domains with very low secondary structure content
  • 48. B) ARCHITECTURE  Describes the gross arrangement of secondary structures, ignoring their connectivities. C) TOPOLOGY  Describes both the overall shape and the connectivity of the secondary structures of the protein. D) HOMOLOGY  Groups domains that share more than 35% sequence identity and are thought to share a common ancestor (homologous); similarities are first identified by sequence comparison and then by a structure comparison algorithm. E) SEQUENCE # Final level in the hierarchy. # Structures within homology groups are further clustered on the basis of sequence identity. # Domains have sequence identities of more than 35%, indicating highly similar structures and functions. CATH is accessible for keyword searching via the web server of UCL's biomolecular structure and modelling unit.
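The identity thresholds quoted for SCOP families (30%) and CATH homology groups (35%) refer to the percentage of identical residues in an alignment. The sketch below shows one common way to compute it, assuming the two sequences are already aligned with "-" marking gaps and that gapped columns are excluded from the count; the function name and the example sequences are illustrative only, and other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue              # skip columns containing a gap
        compared += 1
        identical += (x == y)
    return 100.0 * identical / compared if compared else 0.0

identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, identity > 35.0)  # above the 35% threshold quoted for CATH
```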
  • 50. A) Sequence Data Base Searching EST searches Different approaches to EST analysis Merck/IMAGE Incyte TIGR EGAD EST analytical tools Sequence similarity Sequence assembly and Sequence clustering
  • 51. EST searches  EST: Expressed Sequence Tag.  EST data are held in the EST database,  which maintains its own format and identification number system.  An expressed sequence tag is a short nucleotide sequence produced from cDNA.  mRNA → reverse transcriptase enzyme → single-stranded cDNA.  A typical EST is between 200 and 500 bases in length, although modern technical advances have increased the theoretical length obtainable from a single run to 1000 bases or more.  ESTs represent partial gene transcripts and are noisy sequences that, as a result of sequencing errors, may not only contain ambiguous bases but also be missing bases.
  • 52.  In analysing ESTs, the following points should be considered:  The EST alphabet has five characters: A, C, G, T, N.  An EST will be a sub-sequence of some other sequence in the database.  An EST may not represent part of the CDS of any gene.  EST production is highly automated and the results are often contaminated with ambiguous or missing bases. This causes difficulties in sequence interpretation. Uses  Identification of a particular gene  Mapping of genes within a genome using a small stretch of sequence  Identification of species  EST libraries have been developed for academic analysis or commercial exploitation
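Because ESTs use the five-letter alphabet ACGTN and often contain ambiguous bases, a simple quality check before any analysis can be useful. The following is a minimal sketch; the function name and the idea of reporting the fraction of N as a noise measure are assumptions made for illustration, not part of any named EST tool.

```python
EST_ALPHABET = set("ACGTN")

def est_quality(sequence):
    """Return (characters outside ACGTN, fraction of ambiguous N bases)."""
    sequence = sequence.upper()
    invalid = sorted(set(sequence) - EST_ALPHABET)
    n_fraction = sequence.count("N") / len(sequence) if sequence else 0.0
    return invalid, n_fraction

invalid, n_fraction = est_quality("ACGTNNACGTAC")
print(invalid, round(n_fraction, 3))
```

A read with a high fraction of N, or with characters outside the alphabet, could then be trimmed or discarded before similarity searching.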
  • 53. Different approaches to EST analysis  These are the sources that provide EST information.  Various approaches to establishing libraries of ESTs for academic or commercial exploitation have been developed.  Much of the publicly available data is collected together in the EST sections of the EMBL data library and GenBank (dbEST)  Merck/IMAGE, Incyte, TIGR, EGAD
  • 54. Merck/IMAGE  It is a research project run at Washington University and funded by Merck and Co.  In 1994, Merck and Co. funded a research project based at Washington University to sequence 300,000 ESTs from a variety of normalised libraries. AIM:  To produce 300,000 ESTs from cDNA libraries.  For many years Merck has sponsored the production of a drug index. Approaches of the source  To support academic analysis  Commercialisation of EST information for drug production  The EST collection is known as the Merck Gene Index; as of May 1997, 484,421 ESTs had been submitted by the project to dbEST
  • 55. Incyte  It is a pharmaceutical company  Incyte Pharmaceuticals Inc.  It produces a database, LifeSeq, that emphasises the quantitative information derived by sequencing standard cDNA libraries. AIM  To collect and provide information on the relative copy numbers of genes in healthy and diseased tissue.  To facilitate the elucidation of potential therapeutic targets. APPROACH  Commercialisation of genomic information on the ESTs of healthy and diseased cells, which is then used to identify therapeutic targets.  Commercial production of drugs  In April 1998, the size of LifeSeq was 2.5 million ESTs, representing 8000 to 12000 different genes.
  • 56. TIGR  The Institute for Genomic Research.  It is a not-for-profit research organisation.  It stands purely for academic purposes.  It is a research organisation with interests in the structural, functional and comparative analysis of genomes and gene products.  The range of organisms covered includes viruses, eubacteria (including pathogenic bacteria), archaebacteria and eukaryotes (plant and animal) AIM  Preparation of the Human Gene Index (HGI).  This index integrates results from human genome research projects around the world, including those from dbEST and GenBank.  To create a non-redundant view of all human genes and information on their expression patterns, cellular roles, functions and evolutionary relationships.  Data in HGI are freely available.  TIGR combined more than 100,000 ESTs sequenced from over 300 cDNA libraries with data from dbEST and non-redundant human transcript information, using the technique of sequence assembly to generate Tentative Human Consensus (THC) sequences.
  • 57. EGAD  Expressed Gene Anatomy Database  It is a database providing information on ESTs
  • 58. EST Analytical Tools There are many tools available for the analysis of ESTs:  Commercially available tool = Incyte LifeTools  Publicly available tools = 3 types 1) Sequence Similarity Search Tools 2) Sequence Assembly Tools 3) Sequence Clustering Tools
  • 59. 1) Sequence Similarity Search Tools  We consider these tools as they relate to ESTs.  Given a query EST, the tool identifies the database sequences that show similarity to it by comparing the query with all the sequences.  Eg: BLAST tools: BLASTP, BLASTN, BLASTX, TBLASTN
  • 60. 2) Sequence Assembly Tools  When a search of the databases reveals several ESTs matching a probe sequence, the ESTs must normally be aligned with each other to reveal the consensus sequence.  This tool is used when several EST sequences show similarity to a probe sequence.  In this situation, the tool aligns and merges the different sequence fragments to reconstruct the original mRNA.  Examples: Phrap, Staden assembler, TIGR assembler
  • 61. 3) Sequence Clustering Tools  These are programs that take a large set of sequences and divide them into subsets, or clusters, based on the extent of shared sequence identity in a minimum overlap region.  These tools can analyse a large set of sequences and group them into clusters on the basis of their most similar shared regions.  A reliable and effective mechanism for clustering ESTs reduces redundancy in the database and saves database search time and analysis effort.  Examples: Web EST clustering tools, USEARCH, CD-HIT
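The idea of dividing a large set of sequences into clusters of similar members can be sketched with a greedy scheme: each sequence joins the first cluster whose representative it resembles closely enough, otherwise it starts a new cluster. This is only an illustration under stated assumptions. The similarity measure is Python's difflib.SequenceMatcher ratio, used here as a stand-in for a real alignment-based identity, and the function name and threshold are hypothetical; it is not how USEARCH or CD-HIT are implemented.

```python
from difflib import SequenceMatcher

def greedy_cluster(sequences, threshold=0.8):
    """Group sequences greedily around cluster representatives.

    Longer sequences are processed first, so each representative is the
    longest member of its cluster.
    """
    clusters = []  # each cluster is a list whose first item is the representative
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            similarity = SequenceMatcher(None, cluster[0], seq).ratio()
            if similarity >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no representative was similar enough
    return clusters

ests = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
for cluster in greedy_cluster(ests):
    print(cluster)
```

Searching only the cluster representatives, rather than every member, is what reduces redundancy and search time.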
  • 62. Sequence similarity searching tools  These are software tools used for searching, assessing, analysing, interpreting and predicting the information contained in databases.  There are two types 1) Pairwise sequence alignment and similarity searching tools # A pair of sequences is involved # One is the query sequence and the other the template # Query: the sequence to be studied # Template: the sequence found in the DB Eg; BLAST, FASTA 2) Multiple sequence alignment and similarity search tools, or homology searching tools # More than two sequences are involved # A set of sequences can be compared and aligned together Eg; CLUSTAL, MODELLER PSI-BLAST # Position-Specific Iterated BLAST # It is a hybrid of pairwise sequence alignment and multiple sequence similarity search tools
  • 63.  Sequences are aligned to find regions of higher density or strong similarity.  According to the length of sequence covered, sequence alignment is of two types: 1) Local sequence alignment: alignment that selects only the regional areas that exhibit strong similarity Eg: BLAST, FASTA, PSI-BLAST 2) Global sequence alignment: alignment that considers the entire sequence
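The difference between local and global alignment can be made concrete with a small dynamic-programming sketch. With local=False the function below scores an alignment over the entire length of both sequences (in the style of Needleman-Wunsch); with local=True it scores only the best-matching region (in the style of Smith-Waterman). The scoring values (+1 match, -1 mismatch, -1 per gap position) and the function name are assumptions chosen for the example; BLAST and FASTA use faster heuristics rather than this full table.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b under a simple linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

print(alignment_score("TTACGTTT", "CCACGTCC", local=True))   # only the shared ACGT region counts
print(alignment_score("TTACGTTT", "CCACGTCC", local=False))  # mismatched flanks lower the score
```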
  • 64. Functional Analysis Tools • For proteins as well as nucleotides. • Used for functional analysis. • To study the similarities of sequences based on their function • GOFFA: # Gene Ontology For Functional Analysis # Used for identification of functional elements in the genome and related functional analysis of genes and genomes • ErmineJ: # Used for genome analysis # and also for functional analysis related to gene expression • InterProScan: # Used for the functional analysis of proteins
  • 65. Structural Analysis Tools  Structural analysis of nucleotides and proteins. Eg:  Swiss-PdbViewer  RasMol
  • 66. Statistical Analysis Tools  Statistical analysis of the values of similarity and difference Eg:  Statistica  MATLAB  Perl
  • 67. B) Pair-Wise Sequence Alignment Technique  Comparison of sequences and sub-sequences  Identity and similarity  Substitution matrices  PAM  BLOSUM  DOTPLOT  BLAST  FASTA
  • 68. Substitution matrices (BLOSUM & PAM)  When two sequences are compared and both have leucine at the position being compared, the residue-to-residue match (leucine-leucine) is given an alignment score of 1.  Because of mutation or evolutionary change, however, an amino acid can be replaced by another, causing a mismatch.  A substitution matrix allows such a mismatch to be accepted as a match when the change does not alter the basic structure or function of the protein.  Whether a substitution counts as a match is decided by deeper analysis of the nature of the amino acids involved: if their nature remains the same, the pair is treated as a match and scored in a matrix. Matrices built in this way are called substitution matrices.  Used in the study of evolutionary relationships.
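To show how a substitution matrix changes alignment scoring, the sketch below scores two ungapped alignments using a tiny hand-made matrix over three amino acids (L, I, D). The numbers are toy values chosen for illustration, not taken from a published PAM or BLOSUM matrix: the conservative leucine/isoleucine substitution receives a positive score, while leucine/aspartate is penalised. The names TOY_SCORES and score_ungapped are hypothetical.

```python
# Toy substitution scores (illustrative values only)
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,    # similar hydrophobic residues: accepted substitution
    ("L", "D"): -4,   # hydrophobic vs acidic: penalised
    ("I", "D"): -3,
}

def pair_score(x, y):
    """Look up the score of a residue pair, in either order."""
    if (x, y) in TOY_SCORES:
        return TOY_SCORES[(x, y)]
    if (y, x) in TOY_SCORES:
        return TOY_SCORES[(y, x)]
    raise KeyError("no score for pair %s/%s" % (x, y))

def score_ungapped(seq_a, seq_b):
    """Sum the substitution scores of an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(seq_a, seq_b))

print(score_ungapped("LID", "IID"))  # conservative L/I change still scores well
print(score_ungapped("LLD", "DLD"))  # L/D change lowers the score
```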
  • 69. BLOSUM Model  It is a family of substitution matrices.  BLOCKS amino acid substitution matrices.  It was proposed to overcome the problem of aligning and comparing distantly related sequences with substitution matrices.  It was proposed by Steven Henikoff and Jorja G. Henikoff in 1992. The information is derived from the conserved regions (blocks) and amino acid patterns of distantly related protein sequences available in the BLOCKS database, hence the name BLOcks SUbstitution Matrix.  BLOSUM matrices are based on a much larger data set.  They represent distant relationships more explicitly. Closely related sequences are clustered together and treated as a single sequence.
  • 70.
  • 71. Each cluster contains sequences whose sequence identity is higher than a cutoff called the clustering percentage. Changing the clustering percentage leads to a family of matrices. Three commonly used versions: BLOSUM 30: sequences at least 30% identical are clustered; suited to distantly related sequences. BLOSUM 62: sequences at least 62% identical are clustered; suited to moderately related sequences. BLOSUM 90: sequences at least 90% identical are clustered; suited to closely related sequences. The family helps to detect diverse types of relationship (close and distant).
  • 72. PAM  (Point Accepted Mutation, or Dayhoff PAM model)  Also known as the Dayhoff amino acid substitution matrix.  It was derived by M. O. Dayhoff in 1978.  Substitutions of amino acids are observed in homologous protein sequences during evolution; these substitutions do not significantly change the function of the protein.  These substitutions are accepted by natural selection.  They are therefore known as accepted point mutations, or point accepted mutations (PAM).  To prepare PAM matrices, the substitutions that occur in alignments between similar sequences are counted and then used to generate a 20×20 mutation probability matrix P representing all amino acid changes.
  • 73.
  • 74.  Each element Pij of the matrix represents the probability of replacement of amino acid j by amino acid i over a fixed evolutionary period.  PAM 1 is the unit of evolutionary divergence in which 1% of the amino acids have been changed.  The model has limited value.  Applied to the alignment and comparison of highly similar sequences.  Used only for the comparison of closely related sequences.  It does not represent distantly related sequences well; BLOSUM was later proposed to overcome this.  Used in evolutionary studies
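A standard way to extend the PAM 1 unit to larger evolutionary distances is to multiply the mutation probability matrix by itself: PAM n is taken as the n-th power of PAM 1. The sketch below demonstrates this with a made-up two-letter alphabet in which each residue has a 1% chance of being replaced per PAM unit; the real matrix is 20×20 and its values come from observed substitutions, so the numbers here are purely illustrative and the function names are hypothetical.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def pam_n(pam1, n):
    """Raise the PAM 1 probability matrix to the n-th power."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result

# Toy PAM 1: element [i][j] is the probability that residue j is replaced by i
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

pam2 = pam_n(toy_pam1, 2)
print(pam2[0][1])  # chance of an observed change after 2 PAM units
```

The probability of an observed change after 2 units is slightly below 2%, because a residue can mutate and then mutate back.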
  • 75. DOT PLOT Analysis  It is a pairwise sequence alignment technique  It is a very simple and basic pairwise sequence analysis technique  It is a manual, graphical method of sequence analysis  Within a dot plot, two identical sequences are characterised by a single unbroken diagonal line  It is the most basic method of comparing two sequences: a visual approach known as the dot plot.  It was first described by A. J. Gibbs and G. A. McIntyre in 1970  It is a graphical method for comparing two sequences to identify regions of similarity or dissimilarity, depicted by the presence or absence of a dot on the plot, hence the name dot plot.  To construct a dot plot of sequence A and sequence B, the first sequence is placed along the top of the plot (x-axis) and the second sequence along the left side (y-axis).  A dot is placed on the plot wherever a character Ai of sequence A is identical to a character Bj of sequence B.
  • 76.
  • 77.  A region of consecutive identical characters between the two sequences forms a diagonal line in the plot space.  When large similar sequences are compared, such plots become crowded or noisy. To overcome this, the sliding window concept is used.  From the dot plot, the alignment score is calculated. Uses  Used in biological sequence analysis.  Useful for comparison of protein sequences.  The plot is characterised by some apparently random dots (noise), while diagonal runs of dots indicate regions of greater similarity between the two sequences
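The construction described for the dot plot, including the sliding window used to suppress noise, can be sketched in a few lines. This is a minimal illustration rather than any particular published program: the function names, the text rendering with "*" and ".", and the min_matches parameter are assumptions made for the example. Sequence A runs along the x-axis (columns) and sequence B along the y-axis (rows).

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=None):
    """Return the set of (row, column) positions that receive a dot.

    A dot is placed at (i, j) when at least min_matches of the window
    characters starting at seq_b[i] and seq_a[j] are identical.
    """
    if min_matches is None:
        min_matches = window
    dots = set()
    for i in range(len(seq_b) - window + 1):        # rows: sequence B (y-axis)
        for j in range(len(seq_a) - window + 1):    # columns: sequence A (x-axis)
            matches = sum(seq_a[j + k] == seq_b[i + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots

def render(dots, seq_a, seq_b):
    """Draw the plot as text, one string per row."""
    return ["".join("*" if (i, j) in dots else "." for j in range(len(seq_a)))
            for i in range(len(seq_b))]

for row in render(dot_plot("GATTACA", "GATTACA"), "GATTACA", "GATTACA"):
    print(row)
```

With window=1 the identical sequences give an unbroken main diagonal plus scattered off-diagonal dots (noise) wherever a letter recurs; increasing the window, for example dot_plot("GATTACA", "GATTACA", window=3), removes those isolated dots.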
  • 78. BLAST  Basic Local Alignment Search Tool  Pairwise sequence alignment tool.  Developed and maintained by NCBI  It is a tool specialised in local sequence alignment instead of whole-sequence alignment.  Based on an explicit statistical theory (Altschul et al., 1990)  Ungapped alignment of regional sequences  Can be used to align both protein and nucleotide sequences, but it gives better alignments for protein sequences  Very fast searching tool  The tool can search a database containing millions of sequences within seconds, in a pairwise manner.
  • 79.
  • 80. Use  Constructs a pairwise sequence alignment by comparing two sequences.  Best tool for finding the single best-matching sequence in the corresponding database.  To find the structural similarity of a query sequence, including 3D structure.  Used in the interpretation and prediction of structural information.  Interpretation and prediction of functional information. Steps  Selection of the regional areas that show the best similarity.  Extension of the search towards both sides of the selected region to obtain maximum similarity. Demerits  Only one query sequence can be compared with one database sequence at a time.  Sensitivity: BLAST may sometimes lose sensitivity in selecting the best matches from the database, because in maintaining its speed it may miss certain matches that are better than the selected one.
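The two steps listed above (select a short region of strong similarity, then extend it in both directions) can be illustrated with a simplified seed-and-extend sketch. It is not NCBI's implementation: the word size, the +1/-1 scoring, the x_drop cutoff and the function name are all assumptions for the example, and real BLAST adds neighbourhood words, statistics and many optimisations.

```python
def seed_and_extend(query, subject, word=3, match=1, mismatch=-1, x_drop=2):
    """Return (score, query_start, subject_start, length) of the best ungapped hit."""
    # Step 1: index every word of the query so exact word hits are found quickly
    index = {}
    for q in range(len(query) - word + 1):
        index.setdefault(query[q:q + word], []).append(q)

    best = (0, 0, 0, 0)
    for s in range(len(subject) - word + 1):
        for q in index.get(subject[s:s + word], []):
            # Step 2a: extend to the right until the score drops too far below its peak
            score = top = word * match
            end, k = word, word
            while q + k < len(query) and s + k < len(subject) and top - score <= x_drop:
                score += match if query[q + k] == subject[s + k] else mismatch
                k += 1
                if score > top:
                    top, end = score, k
            # Step 2b: extend to the left in the same way
            score = left_top = top
            start, k = 0, 1
            while q - k >= 0 and s - k >= 0 and left_top - score <= x_drop:
                score += match if query[q - k] == subject[s - k] else mismatch
                if score > left_top:
                    left_top, start = score, k
                k += 1
            best = max(best, (left_top, q - start, s - start, start + end))
    return best

print(seed_and_extend("TTACGTACGTT", "CCACGTACGCC"))
```

Because only exact word hits are extended, a similar region that contains no exact word match is never examined, which is one way the speed of the method can cost sensitivity.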
  • 81. 1) BLASTP: searches a protein sequence database (P.S.D.B) for the best match to a protein query sequence. 2) BLASTN: searches a nucleotide sequence database (N.S.D.B) for the best match to a nucleotide query sequence. 3) TBLASTN: the query is a protein sequence; the nucleotide sequence database is translated into protein sequences, and the query is compared with the translated sequences. 4) BLASTX: the query is a nucleotide sequence searched against a protein sequence database; the query is translated into protein sequences, which are compared with the database. 5) TBLASTX: both the nucleotide query and the nucleotide sequence database are translated, and the search is carried out on the translations.
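The translated searches (BLASTX, TBLASTN, TBLASTX) rely on converting a nucleotide sequence into protein sequences in all six reading frames: three on the given strand and three on its reverse complement. The sketch below shows that translation step only, using the standard genetic code; the function names are hypothetical, incomplete trailing codons are dropped, and codons containing N are written as X.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in T, C, A, G order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna):
    """Translate one reading frame; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return translations of frames +1, +2, +3, -1, -2, -3."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]   # reverse complement
    return [translate(strand[offset:])
            for strand in (forward, reverse)
            for offset in range(3)]

print(six_frames("ATGGCC"))
```

Each of the six protein sequences would then be compared with the protein query or database in the usual way.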
  • 82. FASTA  FAST-All  It is a sequence alignment tool  Developed by Lipman and Pearson, 1985  The FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.  The format also allows for sequence names and comments to precede the sequences.  The format originates from the FASTA software package, but has now become a near-universal standard in the field of bioinformatics.  The simplicity of the FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages such as R, Python, Ruby and Perl. Comparison with BLAST:  It gives better results for nucleotides, but can be used for both protein and nucleotide sequences.  It can provide better results than BLASTN, but not better than BLASTP.  It is more sensitive than BLAST in selecting the best matches; fewer sequences are missed while searching than with BLAST.
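Since the FASTA format is plain text (a header line beginning with ">" followed by one or more lines of sequence), it can be parsed with a few lines of a scripting language. The sketch below is a minimal parser written for illustration; the function name and the choice to key records by the first word of the header are assumptions, and the two records are made-up examples.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                              # ignore blank lines
        if line.startswith(">"):
            # The first word after '>' is the name; the rest is a free-text comment
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)            # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}

example = """>seq1 example nucleotide record
ACGTACGT
ACGT
>seq2 example protein record
MKVLA
"""
print(parse_fasta(example))
```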
  • 83. Different forms of FASTA: 1) FASTA3: the standard program, used to search protein or nucleotide sequence databases with a protein or nucleotide query, respectively. 2) FASTS3: compares linked peptides against a protein sequence database. 3) FASTF3: compares mixed peptides against a protein sequence database. 4) FASTX3/FASTY3: search a protein sequence database with a translated nucleotide query sequence. 5) TFASTX3/TFASTY3: search a translated nucleotide sequence database with a protein query sequence.
  • 84. C) Multiple Alignment Technique  Objective, manual, simultaneous and progressive methods  Databases of multiple alignments  PSI-BLAST  CLUSTAL-W
  • 85. Multiple Sequence Alignment  More than two sequences are involved.  A set of sequences can be compared and aligned at the same time.  There are 2 types of alignment:  Simultaneous Multiple Sequence Alignment and Progressive Multiple Sequence Alignment. 1) Simultaneous Multiple Sequence Alignment  Alignment of all the sequences occurs at one time, that is, simultaneously.  There is no hierarchical or orderly arrangement.  The sequences nevertheless show similarity. Advantage  Very fast alignment Disadvantage  An orderly arrangement of sequences based on similarity cannot be expected.  Study of evolutionary relationships is not possible
  • 86. 2) Progressive multiple sequence alignment  A hierarchical, clear-cut orderly arrangement of sequences can be seen.  Sequence alignment occurs progressively, step by step; it is a somewhat time-consuming process.  In this alignment the best and most similar sequence is arranged next to the query sequence. Advantage  Sequences are arranged in a hierarchical fashion.  Study of evolutionary relationships is possible Disadvantage  Comparatively slow and somewhat time-consuming process
  • 87. PSI-BLAST  PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool) derives a position-specific scoring matrix (PSSM), or profile, from the multiple sequence alignment of sequences detected above a given score threshold using protein–protein BLAST.  This PSSM is used to search the database again for new matches, and is updated in subsequent iterations with the newly detected sequences.  Thus, PSI-BLAST provides a means of detecting distant relationships between proteins.  PSI-BLAST is most conveniently used on the internet through the graphical user interface provided by the PSI-BLAST search page on the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/BLAST/).  The PSI-BLAST page may be customized by the user in terms of automated, semi-automated or "two-page" formatting, and other parameters may be modified as desired.  The page can then be saved as a permanent bookmark for repeated use.
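How a position-specific scoring matrix is derived from an alignment can be sketched in a few lines. The version below is a simplified illustration, not PSI-BLAST's actual procedure: it assumes a uniform background frequency, a simple pseudocount and no sequence weighting, and the three aligned peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    n_seqs = len(alignment)
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        scores = {}
        for residue in alphabet:
            # Observed frequency, smoothed so unseen residues are not -infinity
            freq = (counts[residue] + pseudocount) / (
                n_seqs + pseudocount * len(alphabet)
            )
            scores[residue] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores of a segment as long as the profile."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


# Hypothetical ungapped alignment of three short peptides
alignment = ["MKV", "MRV", "MKI"]
pssm = build_pssm(alignment)
```

In each iteration PSI-BLAST would rebuild such a profile from the newly detected sequences and search again; the sketch covers only a single profile-building step.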
  • 88.
  • 89.  It is a hybrid tool.  It is a recent approach.  It combines elements of both pairwise and multiple sequence alignment methods.  It was proposed by Altschul in 1997.  It is a hybrid of pairwise sequence alignment, multiple sequence alignment and a similarity searching tool.  It can align sequences via progressive sequence alignment.  To search for residue-to-residue similarity, only the sequences are compared and the similarities are plotted as dots.  Wherever two residues are similar, a dot is placed as a graphical representation.  The similarity is then calculated.  For example, if 5 out of 7 positions are similar, the similarity is 5/7 (about 71%).  Used mainly for nucleotide sequence comparison.
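The dot-plot comparison and the "5 out of 7" similarity calculation above can be reproduced with a short sketch. The two 7-letter sequences are hypothetical and were chosen so that exactly 5 positions match; the function names are illustrative only.

```python
def dot_matrix(seq_a, seq_b):
    """matrix[i][j] is True where residue i of seq_a equals residue j of seq_b."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: '*' for a dot, '.' for no match."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


def percent_identity(seq_a, seq_b):
    """Percentage of positions that match in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical sequences: 5 of the 7 positions are identical
seq_a = "GATTACA"
seq_b = "GACTATA"
matrix = dot_matrix(seq_a, seq_b)
identity = percent_identity(seq_a, seq_b)  # 5/7, about 71.4 %
```

Calling `print(render(matrix))` shows the diagonal run of dots that marks the similar region.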
  • 90.  Here, sequences are aligned pairwise, but BLAST is repeated in order to retrieve more and more related sequences.  So the method works pairwise, but its result looks like a multiple sequence alignment.  The result therefore contains sequences of maximum, medium and least similarity. Advantages  Extends the reach of a BLAST search  Fast to run  Provides sequences with a diverse range of similarity, like a multiple sequence alignment  Searches are more sensitive and selective, and are able to detect weak but meaningful similarities.  Running the program iteratively increases search sensitivity. Disadvantages  Deriving diagnostic family motifs can be very time-consuming and demands a level of understanding beyond general use.  Automated iterative searches may degenerate and lead to profile dilution.
  • 91. CLUSTAL  3 forms: 1) CLUSTAL X 2) CLUSTAL W 3) CLUSTAL ω (Omega)  CLUSTAL X & W: Both protein and nucleotide sequences can be aligned.  CLUSTAL ω: Originally aligned protein sequences only (later versions also accept DNA/RNA).  CLUSTAL X:  The controlling interface is a graphical user interface.  Menu-based operations and graphical representations are used.  CLUSTAL W, CLUSTAL ω:  Command-line interface.  Controlled using text commands.
  • 92. Clustal W  Clustal W, like the other Clustal tools, is used for aligning multiple nucleotide or protein sequences efficiently.  It uses progressive alignment methods, which align the most similar sequences first and work down to the least similar sequences until a global alignment is created.  Clustal W is a matrix-based algorithm, whereas tools like T-Coffee and Dialign are consistency-based.  Clustal W has a fairly efficient algorithm that competes well against other software.  The program requires three or more sequences in order to calculate a global alignment; for pairwise sequence alignment (2 sequences) use tools such as EMBOSS or LALIGN.
  • 93.
  • 94.  Multiple sequence alignment tool  Performs progressive multiple sequence alignment  Written in the C++ programming language  Runs on almost all platforms, such as Unix, Linux, Macintosh and Windows  Developed by Julie Thompson and Toby Gibson  Maintained by the EBI  The user interface is a command line, driven by text commands.  Because the alignment is progressive, the sequences are arranged in order and comparison is very easy. Application  Very easy to compare sequences, owing to progressive sequence alignment  Very useful for the classification of both protein and nucleotide sequences  Used in predicting structural and functional features of both nucleotide and protein sequences  One of the best tools for studying evolutionary relationships
  • 95. 4. Protein Structure Prediction A) Secondary structure prediction 1) Chou-Fasman method 2) JPred prediction method
  • 96. Secondary structure prediction  Two experimental methods are commonly used to determine protein structure: 1) X-ray diffraction technique 2) Nuclear magnetic resonance technique  Both are very expensive, labour-intensive and time-consuming.  To overcome these issues, bioinformatics tools are used.  They are fast and less time-consuming.  Skilled labour is not required.  They are the cheapest option when compared with the 2 methods above.
  • 97. Chou-Fasman Method  The Chou-Fasman method is an empirical technique for the prediction of secondary structures in proteins.  Developed by Peter Y. Chou and Gerald D. Fasman.  The method is based on analysis of the relative frequencies of each amino acid in alpha helices, beta sheets and turns, based on known protein structures solved with X-ray crystallography.  From these frequencies a set of probability parameters was derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand or a turn in a protein.  Significantly less accurate than modern machine-learning-based techniques.  About 50 to 60 percent accurate in identifying correct secondary structures.
  • 98.
  • 99. Definition  It is a statistical procedure in which the amino acids of a given sequence and their frequencies are compared with the probabilities and corresponding propensity values given by Chou and Fasman, in order to assign the protein to a particular secondary structure. Probability table  Lists which amino acids occur, and how often, in each secondary structure type of known proteins. Propensity value  The value that expresses the tendency of a particular amino acid towards a particular secondary structure.  The propensity value of an amino acid generally depends on its chemical properties and its R group: # Alpha helix: 4 helix makers + 2 helix breakers (a window of 6 residues) # Beta sheet: 3 sheet makers + 2 sheet breakers (a window of 5 residues)
  • 100. Steps  Scan through the given polypeptide chain  to find out which amino acids are present in the given chain  and to count how many of each are present.  Compare these with the probability and propensity values given by Chou and Fasman.
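The scanning step can be sketched as a sliding window over the chain. The example below is a simplified illustration of helix nucleation only (a window of 6 residues with at least 4 helix makers); the propensity values are the helix values as commonly quoted from Chou and Fasman's table and should be checked against the original table before any real use, and the cut-off of 1.0 for a "helix maker" is an assumption made here.

```python
# Helix propensity P(alpha) per amino acid, as commonly quoted;
# treat these numbers as illustrative and verify them before real use.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_nuclei(sequence, window=6, min_makers=4, threshold=1.0):
    """Return start positions of windows that could nucleate an alpha helix."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Count residues whose propensity marks them as helix makers
        makers = sum(
            1 for residue in segment
            if HELIX_PROPENSITY.get(residue, 0.0) > threshold
        )
        if makers >= min_makers:
            hits.append(start)
    return hits


def mean_propensity(segment):
    """Average helix propensity of a segment (unknown residues count as 0)."""
    return sum(HELIX_PROPENSITY.get(r, 0.0) for r in segment) / len(segment)


# Hypothetical peptide: a glycine-rich start followed by helix makers
nuclei = helix_nuclei("GGGGEAMLKE")
```

The full method also extends each nucleus in both directions and applies the analogous rule for beta sheets (3 makers in a window of 5), which the sketch leaves out.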
  • 101. JPred prediction method  A protein secondary structure prediction server  Fully automatic method  It has been in operation since approximately 19  JPred incorporates the Jnet algorithm in order to make more accurate predictions.  Combination of 6 independent protein structure prediction methods: 1) ZPRED 2) MULPRED 3) DSC 4) PHD 5) NNSSP 6) PREDATOR
  • 102.  All 6 methods make their predictions independently.  A set of 396 domains provides the reference secondary structure information.  The results of the 6 methods are evaluated against the 396-domain data to obtain the final structural information.  Instead of all 6 methods, combining 4 gives more accurate results than also including the ZPRED and MULPRED methods.  The combination of 4 methods gives an accuracy of 72.9 percent.  It is a secondary structure prediction method in which a combination of 6 different independent methods is used.
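The idea of combining several independent predictions into one result can be illustrated with a plain majority vote per residue. This is a sketch of the general consensus idea only, not the Jnet algorithm itself; the state letters (H = helix, E = strand, C = coil), the "?" used for ties and the four example predictions are assumptions made for illustration.

```python
from collections import Counter


def consensus(predictions, tie_symbol="?"):
    """Majority vote per position over equal-length prediction strings."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for i in range(length):
        votes = Counter(p[i] for p in predictions)
        best_count = max(votes.values())
        winners = [state for state, n in votes.items() if n == best_count]
        # A unique winner is accepted; a tie is marked as unresolved
        result.append(winners[0] if len(winners) == 1 else tie_symbol)
    return "".join(result)


# Hypothetical output of four methods for a 6-residue peptide
predictions = ["HHHEEC", "HHHECC", "HHCEEC", "HHHEEC"]
combined = consensus(predictions)
```

Here three of the four methods agree at every position where there is any disagreement, so the vote resolves each residue to a single state.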
  • 103.
  • 104. Tertiary Structure Prediction Comparative modelling-  MODELLER  RasMol
  • 105. Comparative modelling  Comparative modelling is also called homology modelling.  It predicts the 3D structure of proteins.  It uses experimentally determined protein structures as models (templates).  The method predicts the structure of another protein whose amino acid sequence is similar to that of the template protein.  Evolutionarily related proteins have similar sequences and structures.  These similarities are very high in the core regions.  The sequence similarity should be greater than 35 percent.
  • 106. Steps 1) Selection of template sequences  Select a template from a protein sequence database.  The template should show maximum sequence similarity, or homology, with the query. 2) Preparation of the sequence alignment  Align the two sequences to determine their homology. 3) Construction of the 3D model  The model is built from the coordinates of the template.  Length, height and width are considered when comparing the query sequence with the template on the basis of the template's coordinates. 4) Evaluation of the constructed model  The model is evaluated against known 3D models.  The accuracy of the method depends on the quality of the sequence alignment.
  • 107.  Homologous models are identified, and the extent of their sequence similarity with one another and with the unknown is determined.  The sequence database search tools BLAST and FASTA are used to search for related structures.  Sequences are aligned with the help of an MSA tool called Clustal W.  Structurally conserved and variable regions are identified.  Coordinates of the core residues of the unknown structure are generated from those of the known structures.  The side chains and conformations are built.  The unknown structure is refined and evaluated; various software packages are used, such as WHAT IF, RASMOL and MODELLER.  It exploits evolutionarily related proteins.
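The first two steps (choosing the template with the highest similarity to the query, given an alignment) can be sketched as below. It assumes that each candidate template has already been aligned to the query, with "-" marking gaps, and applies the cut-off of 35 percent mentioned earlier; the template identifiers and sequences are hypothetical.

```python
def aligned_identity(aligned_query, aligned_template):
    """Percent identity over aligned positions where neither side has a gap."""
    pairs = [
        (q, t) for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def select_template(alignments, min_identity=35.0):
    """Pick the best template above the cut-off.

    alignments maps template id -> (aligned query, aligned template).
    Returns (template id, identity) or None if no template qualifies.
    """
    if not alignments:
        return None
    scored = {
        tid: aligned_identity(query, template)
        for tid, (query, template) in alignments.items()
    }
    best = max(scored, key=scored.get)
    return (best, scored[best]) if scored[best] > min_identity else None


# Hypothetical pre-computed alignments of one query against two templates
alignments = {
    "tmplA": ("MKTAYIAK", "MKTAYLAK"),
    "tmplB": ("MKTAYIAK", "MRSG-LPK"),
}
choice = select_template(alignments)
```

Model building and evaluation (steps 3 and 4) need the template's 3D coordinates and are the job of dedicated software; they are outside the scope of this sketch.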
  • 108.
  • 109. MODELLER  Used for 3D structure prediction.  It is written in Fortran 90.  It is software used in homology or knowledge-based modelling.  It was developed by Andrej Sali at the University of California, San Francisco.  The ModWeb comparative protein structure modelling web server is based on MODELLER.  It has limited support for ab initio prediction.  It is a computer program used to produce homology models of protein tertiary as well as quaternary structures.  It is freely available for academic use.  Graphical user interfaces and commercial versions are provided separately.  Also used for sequence database searching,  for protein structure comparison  and for sequence clustering.
  • 110. 4 important steps 1) Selection of the template sequence  Select a template sequence from a protein sequence database; the template should exhibit maximum homology with the sequence under study. 2) Preparation of the sequence alignment  Prepare a sequence alignment between the sequence to be analysed and the template sequence. 3) Construction of the 3D model  Construct the 3D model from the coordinates of the template using a technique called satisfaction of spatial restraints.  Using geometrical criteria (length, breadth, height), the template is compared with the query sequence on the basis of the template's coordinates, covering loops, folding, side chains etc. 4) Evaluation of the constructed model  About 90% accuracy can be expected when the sequence alignment provided is highly accurate.
  • 111. RASMOL  Molecular visualisation software.  Allows structural analysis of proteins, nucleic acids and other similar molecules.  Used for visualising molecular structure.  Used mainly for structural analysis.  Example: pollen grains; detailed study of molecular structure.  The structure can be zoomed to fill the full screen.  The structure can be rotated in 3D about the x, y and z axes (by 180 degrees, 120 degrees etc.).  Peripheral analysis is possible.  Different colouring schemes are available for highlighting particular parts.  The entire structure can be viewed, and a detailed study is possible using RASMOL.
  • 112.
  • 113. Advantage  Detailed study of structure is possible using RASMOL.  Molecular visualisation software.  Very good for detailed molecular analysis of molecules such as nucleotides or proteins. Colouring schemes: 1) Group colouring scheme 2) Shapely colouring scheme 3) Amino colouring scheme
  • 114. 5.Emerging Areas of Bioinformatics 1) DNA microarrays 2) Functional genomics 3) Comparative genomics 4) Pharmacogenomics 5) Chemoinformatics 6) Medical informatics
  • 115. DNA Microarrays  It is a genetic analysis technique.  Used for the analysis of nucleic acids.  In this technique, hundreds to thousands of microscopic dots of DNA are spotted on a small glass plate in an orderly fashion.  The location of each DNA dot, its structural details, functional details and expression product information are stored in a computer program.  All information about the spotted DNA is available from the computer, and the genetic analysis is carried out using this information.  Developed in the 1990s.  Also called DNA chips, gene chips, DNA arrays, gene arrays and biochips.  The principle is hybridization between nucleotides.
  • 116. Procedure  mRNA is taken from a normally expressing cell and applied to the microarray to obtain the level of gene expression.  Collect the mRNA and prepare the DNA microarray.  Radiolabel the cDNA (about 100 in number), which serves as the probe.  Introduce the probe into the DNA microarray.  The radiolabelled cDNA hybridizes with the DNA dots on the microarray, and the labelled dots indicate the number of hybridizations.
  • 117.
  • 118. Application  Gene expression study 1) Comparison of gene expression in similar cell types (diseased cell and normal cell) 2) Comparison of gene expression in different cell types (different cells of different individuals)  Identification of tissue-specific genes  Discovery of drugs  Diagnostics and genetic mapping  Study of protein-protein interactions  Functional genomics  DNA sequencing  Agricultural biotechnology  Study of gene expression in plants  DNA polymorphism  Detection of pathogens  Gene finding  Analysis of hundreds to thousands of genes at a time  Gene mapping
  • 119. Functional genomics  The study of the functions of genes.  Example: the physiological and biochemical environment of a gene and its role in growth.  The inactivity of genes and its causes.  Genes may be inactivated by the action of other genes, and a change in gene expression may be due to suppression by another gene; functional genomics finds the cause.  Development and application of genomic analysis techniques.  Identification of the genes involved in a disease. 1) Positional cloning technique 2) Genome sequencing technique  Example: # Shotgun method # Enzymatic method # Chemical method  These were developed on the basis of functional genomics  and give information about the structure and function of genes. 3) Gene expression profiling technique  Compares cells of a similar type that differ in gene expression due to mutation,  so it is used to find out the level of expression. 4) Knockout technique
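The comparison of gene expression between a diseased and a normal cell is often summarised as a log2 ratio of spot intensities per gene. The following sketch assumes that background-corrected intensities are already available for both samples; the gene names, the intensity values and the pseudocount of 1 are hypothetical.

```python
import math


def log2_fold_change(diseased, normal, pseudocount=1.0):
    """Return {gene: log2(diseased / normal)} for genes present in both samples."""
    return {
        gene: math.log2(
            (diseased[gene] + pseudocount) / (normal[gene] + pseudocount)
        )
        for gene in diseased
        if gene in normal
    }


# Hypothetical spot intensities for three genes
normal = {"geneA": 99, "geneB": 199, "geneC": 49}
diseased = {"geneA": 399, "geneB": 199, "geneC": 24}
changes = log2_fold_change(diseased, normal)
# Positive values: higher expression in the diseased cell; negative: lower
```

With these made-up numbers geneA is up-regulated, geneB is unchanged and geneC is down-regulated in the diseased sample.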
  • 120.
  • 121. Comparative genomics  Compares structural and functional details and, based on the similarities and differences, finds the relationship.  Gene finding  Classification of nucleotide sequences  Finding evolutionary relationships and comparing gene expression  Analysis of protein sets from completely sequenced genomes  For a better understanding of the genomes and biology of the respective organisms  Example: Methanococcus, Mycoplasma, E. coli and Bacillus subtilis are fully sequenced.  Example: genes involved in the ripening of green mangoes to yellow mangoes.  Here the genome of mango is compared with the annotated genome of a similar species to identify the genes and their functions.  Databases used for comparative genomics: A. PEDANT Gives information about proteins and enzymes B. KEGG A comprehensive set of the metabolic pathways of genomes C. MBGD Microbial genome database; used to search microbial genomes D. WIT Metabolic reconstruction of completely sequenced genomes
  • 122.
  • 123. Pharmacogenomics  It is the study of the role of the genome in drug response.  Its name reflects its combination of pharmacology and genomics.  Pharmacogenomics analyses how the genetic makeup of an individual affects his or her response to drugs.  It deals with the influence of acquired and inherited genetic variation on drug response in patients by correlating gene expression or single-nucleotide polymorphisms with pharmacokinetics and pharmacodynamics.  Pharmacogenomics aims to develop rational means to optimise drug therapy  with respect to the patient's genotype, to ensure maximum efficacy with minimal adverse effects.  Genomic research will allow drug makers to tailor a therapy to the individual's specific needs.
  • 124.
  • 125.  It is described as a marriage between functional genomics and molecular pharmacology.  A new journal on pharmacogenomics was started by the Nature group of journals.  It covers the entire spectrum of genes that determine response and sensitivity to individual drugs.  Example: Human Genome Project.  Pharmacogenetics is the narrower spectrum of inherited differences in drug metabolism and disposition.  The terms pharmacogenomics and pharmacogenetics are often used interchangeably.  It provides tools to classify the heterogeneity of disease and individual responses to medicine.  It is a fascinating area of biotechnology research.  Example: diagnosis, mechanism of disease and response of patients to medicine.
  • 126.  2 approaches to pharmacogenomics: 1) Candidate gene approach 2) Linkage disequilibrium approach  At the industrial level, it is used to understand  variability in clinical trials,  differential side effects  and inconsistency in disease models.
  • 127.
  • 128. Chemoinformatics  Also known as cheminformatics, chemioinformatics and chemical informatics.  It is the use of computer and information techniques applied to a range of problems in the field of chemistry. Application  Used in pharmaceutical companies and academic settings in the process of drug discovery.  These methods can also be used in chemical and allied industries in various other forms.
  • 129. Medical informatics  Also called health informatics or clinical informatics.  It is information engineering applied to the field of healthcare, essentially the management and use of patient healthcare information.  It is a multidisciplinary field that uses health information technology to improve health care via any combination of higher quality, higher efficiency and new opportunities.  Used in gene therapy  Neurological and metabolic disorders  Cystic fibrosis  Infectious diseases  More efficient patient care  Cardiovascular diseases, cancer gene therapy, human gene therapy