As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.
4. IMPORTANCE
It is an interdisciplinary subject in which three subjects, Biology,
Computer science and Information technology, combine or merge
to form a new discipline: Bioinformatics.
OR
Bioinformatics is a branch of biology which deals with very fast,
accurate and logical analysis of biological data and information for
interpretation and prediction by making use of computational
techniques. (Margaret Dayhoff)
DEFINITION
Bioinformatics, n. The science of information and information flow
in biological systems, esp. of the use of computational methods in
genetics and genomics. (Oxford English Dictionary)
"The mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related
information." -- Fredj Tekaia
5. SCOPE
1) Better documentation: large quantities of data can be stored, and
addition, documentation and deletion of data are also possible.
a) Design and discovery of drugs, considering the genomic structure
of pathogens and the chemical structure of drugs.
b) Study based on the important biomolecules, protein and nucleic
acid.
PROTEIN: Structural and functional unit.
NUCLEIC ACID: Unit that determines heredity.
c) Bioinformatics is comparison based on the already available
details of proteins and nucleic acids.
2) Very easy to search and access information.
3) Fast, accurate, logical analysis.
4) Interpretation and prediction.
6. Applications
1) Comparison
Comparison of nucleic acid and protein sequences.
It reveals the similarities and differences between protein sequences and between nucleic
acid sequences.
Two types of analysis are possible:
1) Structural analysis: gives structural details
2) Functional analysis: gives functional details
Molecular-level classification of organisms is possible using bioinformatics tools.
Organisms are classified by comparing protein and nucleic acid sequences for similarities
and differences, and thereby inferring the relationships between them.
Traditional taxonomy relies only on morphological and enzymatic analysis and comparison,
but accurate analysis requires molecular-level analysis.
Comparison of proteins and nucleic acids helps in:
Classification of proteins
Classification of nucleic acids
Classification of individuals
Determining evolutionary relationships between organisms
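The idea of comparing two sequences for similarities and differences can be sketched in a few lines of Python. This is only an illustration, assuming the two sequences have already been aligned to the same length (with "-" for gaps); the function names percent_identity and differences are made up for this example and do not come from any particular tool.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of compared alignment columns that hold the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # a column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def differences(seq_a: str, seq_b: str) -> list:
    """List (position, residue_a, residue_b) for every column that differs (1-based)."""
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
            if a != b]


print(percent_identity("ACGT", "ACGA"))  # 75.0
print(differences("ACGT", "ACGA"))       # [(4, 'T', 'A')]
```

The same functions work for protein sequences, since they only compare characters.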
7. 2) Gene finding
Bioinformatics makes gene finding easy.
Nucleic acid (mRNA) is the expression product of genes.
Analysing nucleic acid sequences helps to identify the gene
responsible for a particular character. Eg: the gene responsible for yield
improvement.
Gene finding has applications in crop improvement, such as
resistance to insects, disease, drought and salinity, and higher yield.
In the agricultural and medical fields it is useful for comparing a normal
individual with a diseased one.
In the medical field it is used to find the gene responsible for a genetic
disorder, by comparing normal with diseased, and to rectify it at the
embryo or patient level.
Embryo therapy: rectify at the embryo level, or in the sperm/egg.
Patient therapy: rectify in particular cells or nucleic acid.
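One very simple way to look for candidate genes in a nucleotide sequence is to scan for open reading frames (ORFs): a start codon followed, in the same frame, by a stop codon. The sketch below is only an illustration of that idea, assuming the forward strand only and the standard start and stop codons; real gene finders use statistical models and much more evidence. The function name find_orfs and the min_codons parameter are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    Positions are 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 2)]
```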
8. 3) Protein structure prediction
Comparison of a protein structure with a protein structure database.
Knowing a protein's structure helps to find out its final activity, its
influence on the physiological and metabolic pathways of an organism, and its
relation to the growth of the organism.
Disease pathways can be found by identifying the defective protein and
the defective gene.
Identifying the protein-coding gene helps to cure genetic disorders.
NMR and X-ray diffraction techniques are used for determining
protein structure, but they are very expensive and time-consuming methods.
Instead of these two methods, bioinformatics can be applied; it is very easy, less
expensive and time saving.
Very little time is required for structure prediction.
Novel proteins can be discovered using bioinformatics instead of NMR
and X-ray diffraction techniques; this is used in several fields, such as drug
discovery and pharmaceuticals.
By knowing protein structure we can synthesise biologically valuable
synthetic enzymes.
9. 4) Evolutionary relationship study
By structural genomics, functional genomics and comparative
genomics.
5) Construction of biological databases
Construction of databases comes under better
documentation.
Depending on the type and kind of information, different types of
databases exist.
DATABASE: an area or space where information is stored in
electronic format. Different types of databases exist, based on
the information they contain (information about proteins or nucleic
acids). Eg: EMBL, GenBank
6) Total genomic structural study of an organism
Helps in species identification.
10. 7) Used in environmental clean-up programmes
By gene finding: scope for bioremediation. Eg: In oil spills we
use Pseudomonas putida to decrease the effect of
hydrocarbons in oil.
Plasmid – degrades hydrocarbons – total oil degradation
Improve and modify individuals useful for bioremediation
8) Creation of bioweapons
By gene finding, bioweapons may be created in the near future.
Eg: different disease-causing microorganisms could be identified and
used as weapons.
12. a) Nucleic Acid Databases
EMBL, GenBank – Structure of
GenBank entries. Specialized
genomic resources. UniGene
13. EMBL
Nucleotide sequence database
It is maintained by the EBI (European Bioinformatics Institute, in the UK), an outstation of the
European Molecular Biology Laboratory (EMBL)
It collects information from different sources such as
* Genome sequencing projects
* Scientific literature
* Direct author submission
It is associated with GenBank and DDBJ for exchanging information with each other, so we can see
a comprehensive collection of information.
Its growth rate is very fast: the information doubles in 9-10 months.
It is divided into many subdivisions.
The Laboratory operates from six sites: the main laboratory in Heidelberg, and outstations
in Hinxton (the European Bioinformatics Institute (EBI), in
England), Grenoble (France), Hamburg (Germany), Rome (Italy) and Barcelona (Spain).
EMBL groups and laboratories perform basic research in molecular biology and molecular medicine
as well as training for scientists, students and visitors.
Information is accessed through the SRS system. SRS: SEQUENCE RETRIEVAL SYSTEM
The first systematic genetic analysis of embryonic development in the fruit fly was conducted
at EMBL by Christiane Nüsslein-Volhard and Eric Wieschaus, for which they were awarded
the Nobel Prize in Physiology or Medicine in 1995.
In the early 1980s, Jacques Dubochet and his team at EMBL developed cryogenic electron
microscopy for biological structures. It was rewarded with the 2017 Nobel Prize in Chemistry.
URL Address (Uniform Resource Locator): http://www.ebi.ac.uk/embl/
15. GENBANK
It is a primary nucleotide sequence database.
Name: GenBank
Maintained by NCBI (National Center for Biotechnology Information)
Few restrictions
AIM: To support the scientific and research community in their
research by providing information without restrictions, except for copyrighted
and patented sequences.
Growth rate: the amount of information doubles roughly every 18 months.
The information is divided into 17 divisions so that it can be retrieved
conveniently and efficiently.
Two retrieval systems:
1) Entrez integrated retrieval system: it can link
nucleotide sequence databases with protein sequence databases.
2) MEDLINE facility: useful for getting the abstracts of originally
published papers related to nucleotide sequences.
http://www.ncbi.nlm.nih.gov/genbank
16. GenBank incorporates information from
# publicly available sources
# primarily from direct author submissions
# large-scale sequencing projects
To help ensure comprehensive coverage, the resource
exchanges data with both the EMBL data library and DDBJ.
17. Structural Entities
The Structure of GenBank Entries
A GenBank release includes the sequence files, indices created
on various database fields and information derived from the
database.
GenBank was formerly made available on CD-ROM,
a convenient mechanism for widespread,
relatively inexpensive distribution.
As the size of the database grew, a large number of CDs was required, which was difficult to
handle for both the producers and the users.
Today GenBank is available via FTP.
The most commonly used file is the sequence entry file, which contains the
sequence itself and descriptive information relating to it.
Each entry consists of a number of keywords, relevant associated
sub-keywords and an optional feature table.
18. The structure of GenBank entries consists of 13 structural
components:
1) LOCUS
2) DEFINITION
3) ACCESSION NUMBER
4) VERSION
5) KEYWORDS
6) SOURCE
7) ORGANISM
8) REFERENCE
9) AUTHOR
10) TITLE
11) JOURNAL
12) PUBMED NO
13) REMARK/COMMENT
19. 1) LOCUS: we need to provide an entry number (identification for the nucleotide sequence).
[ NM- 000555- mRNA- Tuesday, 21.7.2018]
(entry no.) (Type of sequence) (day ) (day.M.year)
2) DEFINITION: a brief description of the entered sequence: the scientific name of the
source organism, the type of sequence and its expression product.
Eg for the Bt gene: Bacillus thuringiensis, mRNA, δ-endotoxin.
3) ACCESSION NUMBER: normally similar to the entry number.
[NM: 000555]
4) VERSION: to update information we write the entry number followed by the version number,
together with the gene information (GI) Id number.
[NM: 000555.5.G: Id No 12345]
5) KEYWORDS: we must provide the keywords of our work; if there are no keywords, put a dot.
Eg: Insect resistance.
6) SOURCE: name of the source organism; we must write the common name.
source organism: Bacteria
7) ORGANISM: name of the source organism; we must write the scientific name.
scientific name of source organism: Bacillus thuringiensis
20. 8) REFERENCE: reference to the published paper related to
the entered nucleotide sequence.
9) AUTHOR: we need to enter the names of the authors in the same
order as in the published paper.
10) TITLE: title of the paper
11) JOURNAL: name of the journal where the paper was
published.
12) PUBMED NO: the number which helps to access the
archived published paper within PubMed (an archive of scientific
journal literature).
13) REMARK/COMMENT: we can enter biological
importance, expression, changes or source organism as a
comment.
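The keyword layout of an entry can be read with a short script. The sketch below is a minimal illustration, assuming the usual flat-file convention that keywords and sub-keywords start a line and that continuation lines are indented by 12 spaces. The record shown is made up for the example (it reuses the NM_000555 number from the slide), and the function name parse_header is hypothetical. For real work an established parser would be the safer choice.

```python
# A made-up record for illustration only; it is not a real database entry.
RECORD = "\n".join([
    "LOCUS       NM_000555   1200 bp   mRNA",
    "DEFINITION  Example sequence, mRNA.",
    "ACCESSION   NM_000555",
    "VERSION     NM_000555.5",
    "KEYWORDS    .",
    "SOURCE      example organism",
    "  ORGANISM  Bacillus thuringiensis",
    "REFERENCE   1",
    "  AUTHORS   Author,A. and Author,B.",
    "  TITLE     An illustrative title",
    "            spanning two lines",
    "  JOURNAL   Unpublished",
    "ORIGIN",
])


def parse_header(record: str) -> dict:
    """Map each keyword or sub-keyword in the header to its text value."""
    fields = {}
    current = None
    for line in record.splitlines():
        if not line.strip():
            continue
        if line.startswith(("FEATURES", "ORIGIN")):
            break  # the header ends where the feature table or sequence begins
        if line.startswith(" " * 12):
            # Continuation line: append to the keyword seen most recently.
            if current is not None:
                fields[current] += " " + line.strip()
            continue
        keyword, _, value = line.strip().partition(" ")
        current = keyword
        fields[current] = value.strip()
    return fields


fields = parse_header(RECORD)
print(fields["ORGANISM"])  # Bacillus thuringiensis
print(fields["TITLE"])     # An illustrative title spanning two lines
```

A record with several REFERENCE blocks would need a list per keyword; this sketch keeps only the last value seen.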
21. Specialized genomic resources
The purpose of specialized resources is to focus on
species-specific genomics and on particular sequencing techniques.
The particular aim of such a database is to provide an integrated view
of a particular biological system.
a) UniGene
* The collection represents genes from many organisms,
each cluster relating to a unique gene and including related
information corresponding to the gene.
* A valuable role of UniGene is in gene discovery.
* UniGene is also used for gene mapping projects and large
scale gene expression analysis.
22. b) TDB: The TIGR Database
* These databases contain DNA and protein sequences, gene
expression data, protein family information etc.
* Data such as the taxonomic range of plants and humans and the roles
of cellular components are also present.
c) SGD (Saccharomyces Genome Database)
* SGD is an online data resource which contains information on the
molecular biology and genetics of S. cerevisiae (budding yeast).
* This database provides internet access to the genome, its genes
and their products etc.
* SGD helps researchers by bringing together tools
for performing sequence similarity searches.
* The illustration of genetic maps using dynamically created
graphical displays makes the database user friendly.
23. UniGene
It is a specialized genomic resource.
These are databases which tend to be linked, to some
extent, with the primary DNA databases from which they
may derive their data and into which their results are usually
fed.
Purpose of specialized genomic resources: to focus on
1) species-specific genomics
2) particular sequencing techniques
The primary goal of the Human Genome Project is to determine the
complete sequence of the human genome (3 billion base pairs).
About 3% of the genome encodes protein.
The biological significance of the remainder is unknown.
24. A transcript map is a vital resource in flagging those parts of
the genome that are actually expressed.
UniGene attempts to provide a transcript map by utilising
sets of non-redundant gene-oriented clusters derived from
GenBank sequences.
The collection represents genes from many organisms, each
cluster relating to a unique gene and including related
information, such as the tissue type in which the gene is expressed,
map location etc.
25. b) Protein Sequence Databases
PIR
SWISS-PROT
TrEMBL
Composite Protein Databases
NRDB
OWL
Secondary Databases
PROSITE
PRINTS
BLOCKS
IDENTIFY
26. SWISS-PROT
• Protein sequence database
• Switzerland based database.
• SWISS-PROT is an annotated protein sequence database established in
1986 and maintained collaboratively, since 1987, by the Department of
Medical Biochemistry of the University of Geneva and the EMBL Data
Library.
• It is a curated protein sequence database, which strives to provide a high
level of annotation (such as the description of the function of a protein, its domain
structure, post-translational modifications, variants, source and
organisms), a minimal level of redundancy, and a high level of integration with other
databases.
• SWISS-PROT contains the information about the name and origin of the
protein, protein attributes, general information, ontologies, sequence
annotation, amino acid sequence, bibliographic references, cross-
references with sequence, structure and interaction databases, and entry
information.
28. It is maintained collaboratively by the Swiss Institute of
Bioinformatics (SIB) and the European Bioinformatics
Institute (EBI).
The SWISS-PROT group is headed by Rolf Apweiler.
It contains non-redundant sequence entries, and the
information is thoroughly reviewed and annotated.
It provides protein sequences to students, researchers and
related industries such as the pharmaceutical industry.
SWISS-PROT aims to be minimally redundant and is
interlinked with many other resources.
It is linked with other databases such as EMBL and TrEMBL.
29. TrEMBL
It is a primary protein sequence database.
Translated EMBL
A protein sequence database of translated nucleotide sequences.
Created in 1996 as a computer-annotated supplement to SWISS-PROT.
It is a computer-annotated protein sequence database.
The database is constructed by translating each nucleotide
sequence available in EMBL into a protein sequence
using computational techniques.
The TrEMBL sequence database contains the translations of all
coding sequences (CDS) present in the DDBJ/EMBL/GenBank
Nucleotide Sequence Database and also protein sequences
extracted from the literature or submitted to SWISS-PROT, which
are not yet integrated into SWISS-PROT.
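The translation step, turning a coding sequence (CDS) into a protein sequence, can be sketched as follows. This is only an illustration of the principle, assuming the standard genetic code and a CDS that starts in frame; it is not the pipeline actually used to build TrEMBL. The function name translate is made up for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))  # MKFG
```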
31. TrEMBL consists of two divisions:
SP-TrEMBL
It is a temporary storage area for
sequences that have not yet been manually
annotated. It contains entries that
will eventually be incorporated
into SWISS-PROT once they are fully described.
REM-TrEMBL
Contains sequences that are not
destined to be included in SWISS-PROT.
Eg:
# immunoglobulins & T-cell
receptors
# fragments of fewer than eight
amino acids
# synthetic sequences
# patented sequences
TrEMBL is developed by EBI.
32. PIR
Primary protein sequence database.
Protein Information Resource [1960]
Developed by Margaret Dayhoff in 1960 as a collection of
sequences for investigating evolutionary relationships among
proteins.
Developed at the National Biomedical Research Foundation
(NBRF)
The database is split into 4 distinct sections, based on the
level of information:
PIR-1, PIR-2, PIR-3, PIR-4
They differ in terms of
# quality of data
# level of annotation provided.
33. 1) PIR-1
Contains fully classified and annotated entries.
2) PIR-2
Includes preliminary entries, which have not been
thoroughly reviewed and may contain redundancy.
3) PIR-3
Contains unverified entries, which have not been reviewed.
4) PIR-4
Contains protein sequences that are not genetically
encoded and not produced on ribosomes, i.e.
synthetic protein sequences.
34. Composite Protein Databases
These are amalgamations or compilations of the contents
of different primary databases.
They make searching easy and efficient for the searcher.
They render sequence searching much more efficient, because
they obviate the need to interrogate multiple resources.
1) NRDB
2) OWL
35. NRDB - Non-Redundant DataBase
It is built locally at NCBI.
Combination of 6 primary DBs
1. SWISS-PROT
2. PDB
3. PIR
4. GenPept
5. GenPept update
6. SWISS-PROT update
In principle non-redundant & error free.
Strictly speaking, however, redundancy and errors remain possible:
when redundant, erroneous or incorrect sequences are present in any
component DB, especially SWISS-PROT, they are incorporated into NRDB
as such.
It makes searching more efficient by avoiding the need to search too many DBs to get related
information.
36. OWL
OWL is a composite protein sequence database; the name is not an abbreviation of
Web Ontology Language.
Compilation of 4 primary DBs
1. GenBank
2. SWISS-PROT
3. NRL-3D
4. PIR-4
It makes searching more efficient by obviating the need to search too many DBs to get
related information.
If there is any redundancy in GenBank, it is carried over into OWL during
amalgamation.
Developed at the University of Leeds (UK) in association with the Daresbury Laboratory in
Warrington, 1994.
The sources are assigned a priority on the basis of their level of annotation and sequence
validation.
SWISS-PROT has the highest priority.
OWL is released only every 6-8 weeks.
37. Secondary Databases
PROSITE
PRINTS
BLOCKS
IDENTIFY
They contain the fruits of analyses of the sequences in the primary
sources.
Simply put, secondary data are derived from primary data.
These are databases built by analysing primary databases, which
yields the secondary data. There are several different primary databases
and a variety of ways of analysing protein sequences.
38. PROSITE
PROSITE was the first secondary DB to be developed.
It derives its information from the primary database SWISS-PROT.
Produced and maintained by SIB
Released: 1988, by Amos Bairoch
URL Address: http://www.prosite.expasy.org.
It categorises protein sequences into families.
Proteins are grouped into different families based on the single most
conserved motif.
Motif: a string of amino acids (10-20 residues) that is
responsible for protein function and preserves its 3D structure.
Such motifs usually encode key biological functions.
Eg: enzyme active sites, ligand or metal binding sites
A motif represents the characteristic feature or site of each family.
The region acts as a signature of the particular protein family and helps to
identify new members of the family.
PROSITE is developed by a largely manual process of seeking the patterns
that best fit particular families and functions.
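A single-motif pattern of this kind behaves much like a regular expression. The sketch below converts the common elements of PROSITE pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for the sequence ends) into a Python regular expression. It is a simplified illustration that ignores the rarer corners of the syntax; the function name prosite_to_regex and the test sequences are made up for this example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                       # element separators
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    regex = regex.replace("x", ".")                      # any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(regex)                                # [AC].V.{4}[^ED]
print(bool(re.search(regex, "AGVTTTTK")))   # True
print(bool(re.search(regex, "AGVTTTTE")))   # False
```

The order of the replacements matters: curly braces are rewritten before parentheses so that the repeat counts are not mistaken for forbidden-residue groups.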
39. In PROSITE, entries are deposited in two different files:
1) The first file gives the pattern and lists all matches in the current version of SWISS-PROT.
2) The documentation file provides:
# details of the characterized family
# a description of the biological role of the chosen motif
# a supporting bibliography
SIGNIFICANCE
To find families based on motifs, i.e. sequences that share the same motif in the same region
are considered a single family.
Fast functional characterization and annotation of protein sequences.
Identifies possible functions of newly discovered proteins and analyses proteins
for previously undetermined activity.
Offers tools for protein sequence analysis and motif detection.
It is part of the ExPASy proteomics analysis server.
APPLICATION
Classification of proteins is possible based on the most highly conserved motif.
Based on a particular motif, the characteristic features it
represents can be identified.
Eg: the structural and functional details of the protein
40. PRINTS
Collects information from OWL; in future it will collect
information from SP-TrEMBL and SWISS-PROT.
The process of deriving information from OWL is called iterative
database scanning.
Contributed by SIB
In 1999 it was maintained in the Department of Biochemistry and
Molecular Biology at University College London (UCL).
http://www.bioinf.man.ac.uk/dbbrowser/bioactivity/protein2frm.html
Here multiple motifs are considered instead of a single common
motif.
This helps to find the more similar sequences, so clearer information
is available.
More accurate analysis is possible based on the multiple motifs
shared by the sequences.
41. BLOCKS
Multiple-motif-based database
Ungapped multiple alignments of motifs
The database contains information on blocks
Highly conserved multiple motifs are aligned without any gaps
Developed by Henikoff, 1998
Automatically derived database
The database is constructed using the automated PROTOMAT system.
The blocks, ultimately encoded as ungapped local alignments, are calibrated against
SWISS-PROT to obtain a measure of the likelihood of a chance match.
Two scores are noted for each block:
the first denotes the score level at which 99.5 percent of true negative
matches are found;
the second is the median value of the true positive scores.
The median standardized score for known true positive matches is
termed strength.
Because the database is derived by fully automatic methods, the blocks
are not annotated, but links are made to the corresponding PROSITE
family documentation file.
42. This information is derived from the secondary
databases PRINTS & PROSITE, so BLOCKS can also be called a tertiary
database.
It is based on the protein families contained in PROSITE, and is maintained at the Fred
Hutchinson Cancer Research Center (FHCRC).
The motifs, or BLOCKS, are created by automatically detecting the
most highly conserved regions of each protein family.
The blocks are ultimately encoded as ungapped local
multiple alignments.
Structure of BLOCKS entries:
Each block is identified by a general code (ID) line and an
accession number.
The ID line indicates the type of discriminator to expect in the file.
The AC line gives the accession number and the minimum and maximum distance of the
block from its preceding neighbour.
The DE line contains the description or title of the family.
The BL line indicates the diagnostic power (amino acid triplet, number
of sequences it contains).
43. IDENTIFY
Another automatically derived tertiary source
Derived from BLOCKS and PRINTS
Developed in the Department of Biochemistry at Stanford
University by Nevill-Manning et al., 1998
Constructed on the basis of eMOTIF
eMOTIF: it is based on the similarities between highly conserved
motif sequences.
This database is constructed on the basis of generalised
expressions of similarities between highly conserved motif
sequences.
It is designed to be more flexible than exact regular expression
matching.
IDENTIFY and eMOTIF are accessible through the protein function web server of
the Biochemistry Department at Stanford. Amino acid sets and their properties
are used in eMOTIF.
45. SCOP
Structural Classification Of Proteins
It is maintained at the MRC Laboratory of Molecular Biology
and Centre for Protein Engineering.
It describes structural and evolutionary relationships
between proteins of known structure (1995).
It is useful at both the multi-domain level and the individual
domain level.
It is constructed using a combination of manual inspection
and automated methods.
Because the structural information is checked
by both automatic and manual methods, the
result is more accurate.
46. SCOP Classification
Proteins are classified in a hierarchical fashion to reflect their structural and
evolutionary relationships.
Protein structures are assigned in a hierarchical order at three levels:
1) Family
2) Super family
3) Fold
Family
Proteins are clustered into families with a clear evolutionary relationship if they
have a sequence identity of more than 30 percent.
Super family
Proteins are placed in super families when, in spite of low sequence identity,
their structural and functional characteristics suggest a common
evolutionary origin.
Fold
Proteins are classified as having a common fold if they have the same major secondary
structures in the same arrangement and with the same topology.
SCOP is accessible for keyword searching via the MRC Laboratory web server
http://www.bioinf.man.ac.uk/dbbrowser/bioactivity/structurefrm.html
47. CATH
Class Architecture Topology Homology
It is a hierarchical classification of protein structures maintained at University
College London (UCL), 1997.
The resource is largely derived using automatic methods, but manual inspection
is necessary where automatic methods fail.
Developed by UCL's Biomolecular Structure and Modelling Unit. Used
for classification of protein structures. There are five levels within the hierarchy.
A) CLASS
Is derived from the gross secondary structure content and packing of the protein.
Four classes of domain are recognised:
1. CLASS 1
2. CLASS 2
3. CLASS 3
4. CLASS 4
Class 1: mainly alpha helix
Class 2: mainly beta sheet
Class 3: alpha-beta, which includes both alternating alpha/beta and
alpha + beta structures
Class 4: domains in which the secondary structure
content is very low
48. B) ARCHITECTURE
Describes the gross arrangement of secondary structures, ignoring their
connectivities.
C) TOPOLOGY
Describes both the overall shape and the connectivity of the secondary structures
of the protein.
D) HOMOLOGY
Groups domains that share more than 35 percent sequence identity and are thought to share a
common ancestor (i.e. are homologous). Similarities are first identified by sequence
comparison and then by a structure comparison algorithm.
E) SEQUENCE
# Final level in the hierarchy.
# Structures within homology groups are further clustered on the basis of
sequence identity.
# Domains have sequence identities of more than 35%, indicating highly
similar structures and functions.
CATH is accessible by keyword via the web server of UCL's Biomolecular Structure and
Modelling Unit.
50. A) Sequence Data Base Searching
EST searches
Different approaches to EST analysis
Merck/IMAGE
Incyte
TIGR
EGAD
EST analytical tools
Sequence similarity
Sequence assembly and Sequence clustering
51. EST searches
Expressed Sequence Tags (ESTs).
EST data are held in the EST database (dbEST),
which maintains its own format and identification number
system.
ESTs are also called gene transcripts.
An expressed sequence tag is a short sequence:
a short nucleotide sequence produced from cDNA.
mRNA - reverse transcriptase enzyme - single-stranded DNA (cDNA).
A typical EST will be between 200 and 500 bases in length, with
modern technical advances increasing the theoretical length
resulting from a single run to 1000 bases or more.
ESTs are partial sequences of gene transcripts, and they are
noisy sequences that, as a result of sequencing errors, may not only
contain ambiguous bases but also be missing bases.
52. In analysing ESTs, the following points should be kept in mind:
The EST alphabet is five characters: ACGTN.
An EST will be a subsequence of some other sequence in the database.
An EST may not represent part of the CDS of any gene.
EST production is highly automated, and the results are often
contaminated with ambiguous or missing bases. This causes
difficulties in sequence interpretation.
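The five-character alphabet and the problem of ambiguous bases can be checked with a few lines of Python. This is a minimal sketch, assuming N is the only ambiguity code in use; the function name check_est and the keys of the returned dictionary are made up for this example.

```python
EST_ALPHABET = set("ACGTN")


def check_est(seq: str) -> dict:
    """Report length, characters outside ACGTN, and the fraction of ambiguous (N) bases."""
    seq = seq.upper()
    invalid = sorted(set(seq) - EST_ALPHABET)
    n_fraction = seq.count("N") / len(seq) if seq else 0.0
    return {"length": len(seq), "invalid": invalid, "n_fraction": n_fraction}


print(check_est("ACGTNNAC"))  # {'length': 8, 'invalid': [], 'n_fraction': 0.25}
```

A report like this could be used to discard reads whose proportion of N is too high before any database search; the cut-off itself is a choice left to the analyst.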
Uses
Identification of particular gene
Mapping of genes within a genome by using a small stretch of
sequence
Identification of species
Academic analysis or commercial exploitation
53. Different approaches to EST analysis
These are the sources that provide EST information.
Various approaches to establishing libraries of ESTs for
academic or commercial exploitation have been developed.
Much of the publicly available data is collected together
into the EST sections of the EMBL data library and GenBank
(dbEST).
Merck/ IMAGE
Incyte
TIGR
EGAD
54. Merck/IMAGE
It is a research project that was run by Washington University (St. Louis) and
funded by Merck and Co.
In 1994, Merck and Co. funded a research project based at
Washington University to sequence 300,000 ESTs from a variety
of normalised libraries.
AIM:
To produce 300,000 ESTs from cDNA libraries.
For many years Merck has sponsored the production of a drug
index.
Approaches of the source
To support academic analysis
Commercialization of EST information for drug production
By analogy with the drug index, the project is known as the Merck Gene Index. As of May 1997,
484,421 ESTs had been submitted by the project to dbEST.
55. Incyte
It is a pharmaceutical company:
Incyte Pharmaceuticals Inc.
It produces a database, LifeSeq, that emphasises the quantitative
information derived by sequencing standard cDNA libraries.
AIM
To provide information on the relative copy numbers of genes
in healthy and diseased tissue.
To facilitate the elucidation of potential therapeutic targets.
APPROACH
Commercialization of genomic information on the ESTs of
healthy and diseased cells, which is then used to identify therapeutic targets.
Commercial production of drugs.
In April 1998, the size of LifeSeq was 2.5 million ESTs,
representing 8000 to 12000 different genes.
56. TIGR
The Institute for Genomic Research.
It is a not-for-profit research institute.
It works purely for academic purposes.
It is a research organisation with interests in structural, functional and
comparative analysis of genomes and gene products.
The range of organisms covered includes viruses, eubacteria (both pathogenic and
non-pathogenic), archaebacteria and eukaryotes (plant and animal).
AIM
Preparation of the Human Gene Index (HGI).
This index integrates results from human genome research projects
around the world, including those from dbEST and GenBank.
To create a non-redundant view of all human genes and information on
their expression patterns, cellular roles, functions and evolutionary
relationships.
Data in HGI are freely available.
TIGR sequenced more than 100,000 ESTs from over 300 cDNA libraries.
These, together with data from dbEST and non-redundant human transcript information,
are combined using the technique of sequence assembly to generate Tentative Human
Consensus (THC) sequences.
58. EST Analytical Tools
There are many tools available for the analysis of ESTs:
Commercially available tool = Incyte LifeTools
Publicly available tools = 3 types
1) Sequence Similarity Search Tools
2) Sequence Assembly Tools
3) Sequence Clustering Tools
59. 1) Sequence Similarity Search Tools
We consider these tools as they relate to ESTs.
Given a query EST, the tool compares it with all the sequences in the
database and identifies those that show sequence similarity to
the EST.
Eg: BLAST tools
BLASTP
BLASTN
BLASTX
TBLASTN
60. 2) Sequence Assembly Tools
When a database search reveals several ESTs matching
a probe sequence, the ESTs must normally be aligned
with each other to reveal the consensus sequence.
This kind of tool is used when there are several EST sequences
showing similarity to a probe sequence.
In this situation, the tool aligns and merges the
different sequence fragments to reconstruct the original
mRNA.
Examples: Phrap, Staden assembler, TIGR Assembler
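The aligning-and-merging idea can be illustrated with a toy greedy assembler that repeatedly joins the two fragments with the longest exact suffix-prefix overlap. This is only a sketch under strong assumptions (error-free fragments, same strand, exact overlaps); it is not how Phrap or the other assemblers named above actually work. The function names overlap and greedy_assemble and the min_len parameter are made up for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: list, min_len: int = 3) -> list:
    """Merge fragments pairwise, best overlap first, until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))  # ['ATGGCGTACGTTA']
```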
61. 3) Sequence Clustering Tools
These are programs that take a large set of sequences and
divide them into subsets, or clusters, based on the extent of shared
sequence identity in a minimum overlap region.
These tools can analyse a large set of sequences
and group, or cluster, the sequences based on
the regions of maximum similarity that they share.
A reliable and effective mechanism for clustering ESTs reduces
redundancy in the database and saves database search time and
analysis effort.
Examples:
Web-based EST clustering tools
USEARCH
CD-HIT
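A greedy, longest-first clustering scheme gives a feel for how such tools reduce redundancy. The sketch below is an illustration only: it assumes a very crude similarity measure (ungapped identity over the shorter sequence, compared from the first position), whereas real tools use fast heuristics and proper alignment. The function names identity and greedy_cluster and the threshold value are made up for this example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (ungapped, left-anchored)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs: list, threshold: float = 0.9) -> list:
    """Assign each sequence to the first cluster whose representative it resembles."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sorted(seqs, key=len, reverse=True):  # longest sequences first
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no match: the sequence founds a new cluster
    return clusters


ests = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT"]
print(greedy_cluster(ests))
# [['ACGTACGTAC', 'ACGTACGTAA', 'ACGTACGT'], ['TTTTGGGGCC']]
```

Keeping only the representative of each cluster is what reduces redundancy and shortens later database searches.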
62. Sequence similarity searching tools
These are software tools used for searching, assessing, analysing, interpreting and making
predictions from the information contained in databases.
There are two types:
1) Pairwise sequence alignment and similarity searching tools
# A pair of sequences is involved.
# One is the query sequence and the other is the template.
# query: the sequence to be studied
# template: found in the DB
Eg: BLAST, FASTA
2) Multiple sequence alignment and similarity search tools, or
homology searching tools
# More than two sequences are involved.
# A set of sequences can be compared and aligned.
Eg: CLUSTAL, MODELLER
PSI-BLAST
# Position-Specific Iterated BLAST
# It is a hybrid of pairwise sequence alignment and multiple sequence similarity search
tools
63. Sequences are aligned to find regions of high density or
strong similarity.
According to the length of sequence covered, sequence alignments are
of two types:
1) Local sequence alignment: alignment that selects only the
regions which exhibit strong similarity.
Eg: BLAST, FASTA, PSI-BLAST
2) Global sequence alignment: alignment that considers the
entire length of the sequences.
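The difference between the two types can be seen in a small dynamic-programming sketch in Python. The global mode follows the Needleman-Wunsch scheme and the local mode the Smith-Waterman scheme; the function name and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for illustration, and only the score, not the alignment itself, is returned.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b by dynamic programming.

    local=False: global score (whole sequences must be aligned).
    local=True:  local score (best-scoring pair of regions only).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # In a global alignment, leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
print(alignment_score("TTTACGTTT", "GGACGGG", local=False))
```

For these two toy sequences the local score picks out the shared "ACG" region, while the global score is pulled down by the unrelated flanks.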
64. Functional Analysis Tool
• Applies to protein as well as nucleotide sequences.
• Used for functional analysis.
• Studies the similarities of sequences based on their
function.
• GOFFA:
# Gene Ontology For Functional Analysis
# used to identify functional elements in the genome and for
related functional analysis of genes and genomes
• ErmineJ:
# used for genome analysis
# and also for functional analysis related to gene
expression
• InterProScan:
# used for the functional analysis of proteins
66. Statistical Analysis Tool
Statistical analysis of the values of similarity and
difference.
Eg:
Statistica
MATLAB
Perl
67. B) Pair-Wise Sequence Alignment
Technique
Comparison of sequences and sub sequences
Identity and similarity
Substitution matrices
PAM
BLOSUM
DOTPLOT
BLAST
FASTA
68. Substitution matrices
( BLOSUM & PAM)
When two sequences are compared and both have leucine at the
position being compared, the residue-to-residue match
(leucine-leucine) is plotted with an alignment score of 1.
Because of mutation or evolutionary change, however, an amino acid
can change and cause a mismatch.
A substitution matrix allows such a mismatch to be accepted as a
match when it does not change the basic structure or function.
Such matches are identified by deeper analysis.
Used in the study of evolutionary relationships.
If an amino acid changes, the nature of the new residue is considered. If
its nature remains the same, the researcher should treat the
pair as a match and plot it in a matrix; matrices plotted in
this way are called substitution matrices.
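How a substitution matrix turns residue pairs into an alignment score can be sketched in Python. The scores below are a tiny hypothetical table for three amino acids, written in the style of a BLOSUM matrix; they are illustrative only and should not be read as values from a published matrix.

```python
# Hypothetical scores for leucine (L), isoleucine (I) and aspartate (D).
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,    # similar residues: the change is tolerated
    ("L", "D"): -4,   # dissimilar residues: the change is penalised
    ("I", "D"): -3,
}


def pair_score(x, y):
    """Look up a residue pair in either order (the table is symmetric)."""
    return TOY_SCORES.get((x, y), TOY_SCORES.get((y, x)))


def ungapped_score(seq1, seq2):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(seq1, seq2))


print(ungapped_score("LLD", "LID"))  # conservative L to I change
print(ungapped_score("LLD", "LDD"))  # non-conservative L to D change
```

The conservative change still contributes a positive score, which is how a mismatch of similar residues comes to be treated as an acceptable match.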
69. BLOSUM Model
It is a substitution matrix.
BLOcks SUbstitution Matrix (amino acid substitution matrices).
It was proposed to overcome the problem of aligning and comparing
distantly related sequences with earlier substitution matrices.
It was proposed by Steven Henikoff and Jorja G. Henikoff in
1992. The information is derived from the conserved regions
(blocks) and amino acid patterns of distantly related
protein sequences available in the BLOCKS database, hence the name
BLOCKS SUBSTITUTION MATRIX.
BLOSUM matrices are based on a much larger data set.
They represent distant relationships more explicitly. Closely related
sequences are clustered together and treated as a
single sequence.
71. A cluster contains sequences whose sequence identity is
higher than a cutoff called the clustering percentage.
Changing the clustering percentage leads to a family of
matrices.
Three commonly used versions:
BLOSUM 30 - sequences at least 30 percent identical are clustered;
suited to distantly related sequences
BLOSUM 62 - sequences at least 62 percent identical are clustered;
suited to moderately related sequences
BLOSUM 90 - sequences at least 90 percent identical are clustered;
suited to closely related sequences
The family helps to detect diverse types of relationships (close
and distant).
72. PAM
(Point Accepted Mutation, or Dayhoff PAM model)
Also known as the Dayhoff amino acid substitution matrix.
It was derived by M. O. Dayhoff in 1978.
Here, substitutions of amino acids are observed in homologous protein
sequences during evolution; these amino acid substitutions
do not significantly change the function of the protein.
These substitutions are accepted by natural selection.
These matrices are therefore known as accepted point mutation, or point
accepted mutation (PAM), matrices.
To prepare PAM matrices, the substitutions observed in
alignments between similar sequences are counted and then used to
generate a 20×20 mutation probability matrix P representing all
amino acid changes.
74. Each element Pij of the matrix represents the probability of
replacement of amino acid j by amino acid i over a fixed evolutionary
period.
PAM 1 is the unit of evolutionary divergence in which
one percent of the amino acids have been changed.
The model has limited value.
It is applied to the alignment and comparison of highly similar
sequences.
It is used only for comparing closely related sequences.
It does not represent distantly related sequences well; BLOSUM was
later proposed to overcome this.
Used in evolutionary studies.
75. DOT PLOT Analysis
It is a pairwise sequence alignment technique.
It is a very simple and basic pairwise sequence analysis technique.
It is a manual, graphical method of sequence analysis.
Within a plot, two identical sequences give a characteristic diagonal line.
It is the most basic method of comparing two sequences: a visual
approach known as the dot plot.
It was first described by A. J. Gibbs and G. A. McIntyre in 1970.
It is a graphical method for comparing two sequences to identify the
regions of similarity or dissimilarity, depicted by the presence or absence
of a dot on the plot, hence the name dot plot.
To construct a dot plot of sequence A and sequence B, the first
sequence is placed along the top of the plot (x-axis) and the second
sequence along the left side (y-axis) of the plot.
A dot is placed on the plot wherever a character Ai of sequence A
is identical to a character Bj of sequence B.
77. A region of consecutive identical characters between both
sequences forms a diagonal line in the plot space.
When large, similar sequences are compared, such plots
become crowded or noisy. To overcome this, the sliding
window concept is used.
From the dot plot, the alignment score is calculated.
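The construction and the sliding-window idea can be sketched in a few lines of Python. The function name and the parameters `window` and `min_matches` are hypothetical; the sketch places sequence A along the columns and sequence B along the rows, and marks a position only when enough characters inside the window match.

```python
def dot_plot(a, b, window=1, min_matches=None):
    """Return the plot as a list of strings (rows = b, columns = a).

    A '*' is placed where the window starting at that position matches
    in at least `min_matches` positions; window=1 gives the plain plot.
    """
    if min_matches is None:
        min_matches = window  # by default every position in the window must match
    rows = []
    for j in range(len(b) - window + 1):
        row = []
        for i in range(len(a) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


for line in dot_plot("GATTACA", "GATTACA"):
    print(line)
print()
for line in dot_plot("GATTACA", "GATTACA", window=3):
    print(line)
```

With a window of 1, the repeated A and T characters scatter dots off the diagonal; with a window of 3, only the main diagonal survives.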
Uses
Used for simple, logical sequence analysis.
Useful for comparison of protein sequences.
The plot contains some apparently random dots (noise), while
diagonal runs of dots indicate regions of greater similarity
between the two sequences.
78. BLAST
Basic Local Alignment Search Tool
Pairwise sequence alignment tool.
Developed and maintained by NCBI.
It is a tool specialised in local sequence alignment instead of
whole-sequence alignment.
The tool is based on an explicit statistical theory
by Altschul et al., 1990.
Ungapped alignment of regional sequences.
It can be used to align both protein and nucleotide sequences, but it
gives better alignments for protein sequences.
Very fast searching tool.
This tool can search a database containing millions of sequences
within seconds in a pairwise manner.
80. Use
Constructs pairwise sequence alignments by comparison between two
sequences.
Best tool for finding the single best-matching sequence in the corresponding
database.
To find the structural similarity of the query sequence, including 3D
structure.
Used in the interpretation and prediction of structural information.
Interpretation and prediction of functional information.
Steps
Selection of the regions that show the best similarity.
Extension of the search towards both sides of the selected region to get
maximum similarity.
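The two steps can be illustrated with a toy seed-and-extend search in Python. This is a simplified sketch, not NCBI's implementation: the function names, the word size of 3, the match and mismatch scores and the drop-off value are all assumptions, and the extension is ungapped.

```python
def extend(query, subject, i, j, step, match=1, mismatch=-1, x_drop=2):
    """Walk from (i, j) in direction `step`; return (best score gain, residues kept)."""
    score = best = kept = steps = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        steps += 1
        if score > best:
            best, kept = score, steps
        elif best - score > x_drop:
            break  # score fell too far below the best seen: stop extending
        i += step
        j += step
    return best, kept


def seed_and_extend(query, subject, word=3):
    """Return (score, query_start, subject_start, length) of the best ungapped hit."""
    # Step 1: index every word of the subject so exact seeds are found quickly.
    index = {}
    for j in range(len(subject) - word + 1):
        index.setdefault(subject[j:j + word], []).append(j)
    best_hit = (0, 0, 0, 0)
    for i in range(len(query) - word + 1):
        for j in index.get(query[i:i + word], []):
            # Step 2: extend the seed to the right and to the left.
            right, r_len = extend(query, subject, i + word, j + word, +1)
            left, l_len = extend(query, subject, i - 1, j - 1, -1)
            score = word + left + right  # the seed is `word` exact matches
            if score > best_hit[0]:
                best_hit = (score, i - l_len, j - l_len, l_len + word + r_len)
    return best_hit


hit = seed_and_extend("TTGACCATGG", "CCGACCATAA")
print(hit)
```

Because only word hits are extended, most of the comparison space is never examined, which is where both the speed and the occasional missed match come from.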
Demerits
At a time, we can compare the query sequence with only a single sequence.
Limited sensitivity in selecting sequences:
it may sometimes lose sensitivity in selecting the best matches
from databases, because in maintaining its speed while
selecting the best it may miss certain matches that are better than the
selected one.
81.
1) BLASTP
Used to search for the best-matching protein sequences in
the protein sequence database (P.S.D.B) for a protein query sequence.
2) BLASTN
Searches for the best nucleotide sequence (N.S) in the nucleotide sequence
database (N.S.D.B) for a nucleotide query sequence.
3) TBLASTN
query sequence = protein sequence
The given N.S.D.B is translated into protein sequences, and the
query is compared with the translated nucleotide sequences.
4) BLASTX
query sequence = nucleotide sequence
We search within the P.S.D.B: the nucleotide query is
translated into protein sequences, and the translated query is
compared with the protein sequences.
5) TBLASTX
This translates both the nucleotide query and the N.S.D.B into
protein sequences, and then the search occurs.
82. FASTA
FAST-All
It is a sequence alignment tool.
Developed by Lipman and Pearson, 1985.
The FASTA format is a text-based format for representing either nucleotide
sequences or amino acid (protein) sequences, in which nucleotides or amino
acids are represented using single-letter codes.
The format also allows for sequence names and comments to precede the
sequences.
The format originates from the FASTA software package, but has now become a
near universal standard in the field of bioinformatics.
The simplicity of FASTA format makes it easy to manipulate and parse
sequences using text-processing tools and scripting languages like the R
programming language, Python, Ruby, and Perl.
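As an illustration of how easily the format can be parsed, here is a minimal Python sketch. It assumes each record starts with a '>' header line whose first word is the sequence name; the function name and the two example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word of the header is the name
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1 toy nucleotide record
ATGGCC
TTAGGA
>seq2 toy protein record
MKVLA
"""
print(parse_fasta(example))
```

Sequence lines belonging to one record are simply joined, so line wrapping in the file does not affect the result.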
Comparison with BLAST:
It gives better results for nucleotides, but can be used for both protein and nucleotide sequences.
It can provide better results than BLASTN, but not better than BLASTP.
It is more sensitive than BLAST in selecting the best matches; it misses fewer sequences
while searching than BLAST.
83. Different forms of FASTA:
1) FASTA3
The standard program, used for both nucleotide and protein sequences:
searches with a protein or nucleotide query sequence.
2) FASTS3
Used to compare linked peptides against a protein sequence
database.
3) FASTF3
Used to compare mixed peptides against a protein sequence
database.
4) FASTX3/FASTY3
Used to search within a protein sequence database with a
translated nucleotide query sequence.
5) TFASTX3/TFASTY3
Used to search within a translated nucleotide sequence database
with a protein query sequence.
84. C) Multiple Alignment Technique
Objective, manual, simultaneous and progressive
methods
Databases of multiple alignments
PSI-BLAST
CLUSTAL-W
85. Multiple Sequence Alignment
More than two sequences are involved.
A set of sequences can be compared and aligned at the same time.
Two types of alignment:
Simultaneous multiple sequence alignment and progressive multiple
sequence alignment.
1) Simultaneous multiple sequence alignment
All the sequences are aligned at one time, that is, simultaneously.
There is no hierarchical or orderly arrangement,
but the sequences do show similarity.
Advantage
Very fast, very quick alignment.
Disadvantage
We cannot expect an orderly arrangement of sequences based on similarity.
Study of evolutionary relationships is not possible.
86. 2) Progressive multiple sequence alignment
A hierarchical and clear-cut orderly arrangement of sequences
can be seen.
Sequence alignment occurs progressively, step by step;
it is a somewhat time-consuming process.
In this alignment the best, most similar sequence is arranged
next to the query sequence.
Advantage
Arranged in a hierarchical fashion.
Study of evolutionary relationships is possible.
Disadvantage
Comparatively slow and somewhat time-consuming process.
87. PSI-BLAST
PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search
Tool) derives a position-specific scoring matrix (PSSM) or profile from
the multiple sequence alignment of sequences detected above a given
score threshold using protein–protein BLAST.
This PSSM is used to further search the database for new matches, and is
updated for subsequent iterations with these newly detected sequences.
Thus, PSI-BLAST provides a means of detecting distant relationships
between proteins.
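The idea of a position-specific scoring matrix can be sketched in Python. This is a simplified illustration, not PSI-BLAST's actual procedure: it assumes a gap-free toy alignment, a uniform background frequency and a fixed pseudocount, whereas PSI-BLAST uses sequence weighting and more elaborate pseudocounts. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score for every amino acid at every alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Score a segment of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


toy_alignment = ["MKV", "MKI", "MRV"]  # hypothetical aligned hits
pssm = build_pssm(toy_alignment)
print(score_segment(pssm, "MKV") > score_segment(pssm, "WWW"))
```

Residues seen often at a position get positive scores there and unseen residues get negative ones, so the profile rewards family-like sequences more than a single query sequence could.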
PSI-BLAST is most conveniently used on the internet with the help of
the graphical user interface provided by the PSI-BLAST search page on
National Center for Biotechnology Information (NCBI) website
(http://www.ncbi.nlm.nih.gov/BLAST/).
The PSI-BLAST page may be customized by the user in terms of
automated or semiautomated or “two-page formatting” and other
parameters modified as desired.
This page can then be saved as permanent internet bookmark for
repeated use on future occasions.
89. It is a hybrid tool.
It is a recent approach.
It combines elements of both pairwise and multiple sequence alignment
methods.
It was proposed by Altschul in 1997.
It is a hybrid of pairwise sequence alignment and multiple sequence
alignment and similarity searching tools.
It can align sequences via progressive sequence alignment.
Residue-to-residue similarity is searched by comparing the sequences
only, as in a dot plot:
if similarity is present, a dot is placed as a graphical
representation.
The similarity is then calculated,
for example, 5 similar out of 7.
Used mainly for protein sequence comparison.
90. Here, sequences are aligned pairwise, but BLAST is repeated in order to
get more and more related sequences.
So it acts as a pairwise tool but also looks like a multiple sequence alignment.
The results therefore contain sequences of maximum, medium and least similarity.
Advantages
Extends the search of BLAST.
Fast to run.
Provides sequences with a diverse range of sequence similarity, like a multiple sequence alignment.
Searches are more sensitive and selective, able to detect weak but meaningful
similarities.
Running the program iteratively increases search sensitivity.
Disadvantages
Deriving diagnostic family motifs can be very time-consuming and demands a
level of understanding beyond general use.
Automated iterative searches may degenerate and lead to profile dilution.
91. CLUSTAL
3 forms:
1) CLUSTAL X
2) CLUSTAL W
3) CLUSTAL ω (Clustal Omega)
CLUSTAL X and W:
Protein as well as nucleotide sequence alignment is possible.
CLUSTAL ω:
Could align only protein sequences in its early versions.
CLUSTAL X:
In CLUSTAL X the controlling interface is a graphical user interface.
Menu-based operations and graphical representations
are used.
CLUSTAL W and CLUSTAL ω:
Command-line interface.
The interface is controlled using text commands.
92. Clustal W
Clustal W like the other Clustal tools is used for aligning
multiple nucleotide or protein sequences in an efficient
manner.
It uses progressive alignment methods, which align the most
similar sequences first and work their way down to the least
similar sequences until a global alignment is created.
Clustal W is a matrix-based algorithm, whereas tools like T-
Coffee and Dialign are consistency-based.
ClustalW has a fairly efficient algorithm that competes well
against other software.
This program requires three or more sequences in order to
calculate a global alignment, for pairwise sequence alignment
(2 sequences) use tools similar to EMBOSS, LALIGN
94. Multiple sequence alignment tool.
Progressive multiple sequence alignment is possible.
Written in the C++ programming language.
It can run on almost all platforms, such as Unix, Linux, Macintosh and Windows.
Developed by Julie Thompson and Toby Gibson.
Developed and maintained by EBI.
The user interface is a command-line interface driven by text commands.
Because of progressive multiple sequence alignment, comparison is very easy
owing to the orderly arrangement.
Application
Very easy to compare sequences due to progressive sequence alignment.
Very useful for the classification of both protein and nucleotide
sequences.
Application in predicting structural and functional features of both
nucleotide and protein sequences.
This is the best tool for the study of evolutionary relationships.
96. Secondary structure prediction
Commonly two methods are used for protein structure
determination:
1) X-ray diffraction technique
2) Nuclear magnetic resonance technique
Both are very expensive, labour-intensive and time-consuming
processes.
To overcome these issues, bioinformatics tools are used
for prediction:
Less time-consuming and very fast method.
Skilled labour is not required.
Cheapest method when compared with the above two.
97. Chou-fasman Method
The Chou-Fasman method is an empirical technique for the prediction
of secondary structures in proteins.
Developed by Peter Y. Chou and Gerald D. Fasman.
The method is based on analysis of the relative frequencies of each
amino acid in alpha helices, beta sheets and turns, based on known
protein structures solved with X-ray crystallography.
From these frequencies a set of probability parameters was
derived for the appearance of each amino acid in each secondary
structure type, and these parameters are used to predict the
probability that a given sequence of amino acids would form a
helix, a beta strand, or a turn in a protein.
Significantly less accurate than modern machine-learning-based
techniques.
About 50 to 60 percent accurate in identifying correct secondary
structures.
99. Definition
It is a statistical procedure in which the amino acids of a given
sequence and their frequencies are compared with the
probabilities of amino acids and their corresponding propensity
values given by Chou and Fasman, in order to fit the given protein to a
particular secondary structure.
Probability table
Lists which amino acids, and how many of them, are present in each
secondary structure of proteins according to known sequences.
Propensity value
It is the value that expresses the tendency of a particular amino acid
towards a particular secondary structure.
The propensity value of an amino acid generally depends on the
chemical properties of its R group:
# Alpha helix: 4 helix makers + 2 helix breakers
# Beta sheet: 3 sheet makers + 2 sheet breakers
100. Steps
Scan through the given polypeptide chain
to find out which amino acids are
present in the given strand,
and also to find out their numbers.
Compare these with the probability and propensity
values given by Chou and Fasman.
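The scanning step can be sketched in Python as a sliding window over the chain. The propensity table below is a small, partial set of helix values in the range commonly quoted for the method; treat the numbers, the window of 6 residues and the threshold of 1.0 as illustrative assumptions to be checked against the published tables. The full method also uses sheet and turn propensities and extension rules that are omitted here.

```python
# Partial, illustrative helix propensities (assumed values; check the
# published Chou-Fasman tables before any real use).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21,  # helix makers
    "G": 0.57, "P": 0.57,                        # helix breakers
}


def helix_windows(seq, window=6, threshold=1.0):
    """Return (start, segment, mean propensity) for windows that favour a helix."""
    hits = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        mean = sum(HELIX_PROPENSITY[aa] for aa in segment) / window
        if mean > threshold:
            hits.append((i, segment, round(mean, 2)))
    return hits


for hit in helix_windows("EAMLEAGPGPGP"):
    print(hit)
```

In this toy chain the windows rich in helix makers pass the threshold, while windows dominated by glycine and proline do not.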
101. J Pred prediction method
A protein secondary structure prediction server.
Fully automatic method.
It has been in operation since approximately 19
JPred incorporates the Jnet algorithm in order to make more
accurate predictions.
Combination of 6 independent protein structure prediction
methods:
1) ZPRED
2) MULPRED
3) DSC
4) PHD
5) NNSSP
6) PREDATOR
102. All 6 methods predict independently.
A set of 396 domains supports the secondary structure
information.
The results of the 6 methods are evaluated against the 396-domain data
to obtain the final structural information.
Instead of all 6 methods, using 4 of them gives more accurate results
than also using the ZPRED and MULPRED methods.
The combination of 4 methods gives an accuracy of 72.9 percent.
It is a secondary structure prediction method; here a
combination of 6 different independent methods is used.
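Combining independent predictions can be illustrated with a plain majority vote in Python. This is only a sketch of the general idea: the function name and the toy predictions are hypothetical, and the vote shown here is not claimed to be the combination rule that JPred itself uses. H, E and C stand for helix, strand and coil.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Per-residue majority vote over equal-length prediction strings.

    Positions without a strict majority are marked '-'.
    """
    result = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if votes > len(column) / 2 else "-")
    return "".join(result)


# Hypothetical outputs of four independent methods for a 6-residue chain
toy_predictions = [
    "HHHEEC",
    "HHHEEC",
    "HHCEEC",
    "HHHCCC",
]
print(consensus_prediction(toy_predictions))
```

A single method's error at one position is outvoted by the others, which is why a combination can be more accurate than its individual members.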
105. Comparative modelling
Comparative modelling/Homology modelling
It predicts the 3D structure of proteins.
It uses experimentally determined protein structures as
models (templates).
The method predicts the structure of another protein that
exhibits amino acid sequence similarity to the template protein.
Evolutionarily related proteins have similar sequences and
structures.
These similarities are very high in core regions. The
sequence similarity should be greater than 35 percent.
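Checking a candidate template against such a cutoff can be sketched in Python. The sketch measures percent identity over the aligned, gap-free columns of a pairwise alignment, which is one common convention among several; the function names and the toy alignment are hypothetical, and the 35 percent default simply mirrors the figure given above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def is_usable_template(aligned_query, aligned_template, cutoff=35.0):
    """True when the identity to the template exceeds the cutoff."""
    return percent_identity(aligned_query, aligned_template) > cutoff


print(percent_identity("MKV-LA", "MRVQLA"))
print(is_usable_template("MKV-LA", "MRVQLA"))
```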
106. Steps
1) Selection of template sequences
Select a template from a protein sequence database.
The template should show maximum sequence similarity
or homology.
2) Preparation of sequence alignment
Alignment of the two sequences for homology determination.
3) Construction of the 3D model
It is built on the coordinates of the template.
Length, height and width are considered when comparing the template
with the query sequence on the basis of the template coordinates.
4) Evaluation of the model constructed
It is evaluated against known 3D models.
The method is comparatively accurate;
the accuracy depends on the sequence alignment.
107. Homologous models are identified, and the extent of their
sequence similarity with one another and with the unknown is
determined.
The sequence database search tools BLAST and FASTA are
used to search for related structures.
Sequences are aligned together with the help of an MSA tool
called Clustal W.
Structurally conserved and variable regions are identified.
Coordinates of the core residues of the unknown structure are
generated from those of the known structures.
The side chains and their conformations are built.
The unknown structures are refined and evaluated.
Various software packages are used: WHAT IF, RASMOL,
MODELLER.
It exploits evolutionarily related proteins.
109. MODELLER
Used for 3D structure prediction.
It is written in Fortran 90.
It is software used in homology, or knowledge-based, modelling.
It was developed by Andrej Sali at the University of California, San
Francisco.
ModWeb, a comparative protein structure modelling web server, is
based on MODELLER.
It has limited incorporation of ab initio methods.
It is a computer program used for producing homology models of protein
tertiary as well as quaternary structures.
It is freely available for academic use.
Graphical user interfaces and commercial versions are distributed separately.
Also used for sequence database searching,
for protein structure comparison
and for sequence clustering.
110. 4 important steps
1) Selection of template sequence
Select a template sequence from protein sequence databases; the template
sequence should exhibit maximum homology with the sequence under
study.
2) Preparation of sequence alignment
Prepare a sequence alignment between the sequence to be
analysed and the template sequence.
3) Construction of 3D model
Construct the 3D model based on the coordinates of the template using a
technique called satisfaction of spatial restraints.
Here, certain geometrical criteria (length, breadth, height) are used to
compare the template with the query sequence, especially on the basis of
the template coordinates, covering loops, folding, side chains etc.
4) Evaluation of model constructed
We can expect about 90 % accuracy when the sequence alignment provided is highly
accurate.
111. RASMOL
Molecular visualisation software.
Molecular structural analysis of proteins, nucleic acids and
other similar molecules is possible.
Used for visualising molecular structure.
Used mainly for structural analysis.
Example: pollen grains, detailed molecular structure study.
Zooming of the molecular structure up to the full size of the
monitor.
Rotation in any 3D direction (x, y, z), e.g. 180 degrees, 120 degrees
etc.
Peripheral analysis is possible.
Different colouring schemes are available for highlighting particular parts.
The entire structure can be viewed, so detailed study is possible
by using RASMOL.
113. Advantage
Detailed study of structure is possible by using RASMOL.
Molecular visualisation software.
Very good for detailed molecular analysis of
molecules such as nucleotides or proteins.
Colouring schemes:
1) Group colouring scheme
2) Shapely colouring scheme
3) Amino colouring scheme
114. 5. Emerging Areas of Bioinformatics
1) DNA microarrays
2) Functional genomics
3) Comparative genomics
4) Pharmacogenomics
5) Chemoinformatics
6) Medical informatics
115. DNA Microarrays
It is a genetic analysis technique
used for the analysis of nucleic acids.
In this technique, hundreds to thousands of microscopic dots of
DNA are spotted on a small glass plate in an orderly fashion.
The location of each DNA dot, its structural details, functional details and
expression product information are available
and stored in a computer program.
All information about the spotted DNA is available from the computer;
genetic analysis is carried out using this information.
Started in the 1990s.
Also called DNA chips, gene chips, DNA arrays, gene arrays and
biochips.
The principle is hybridization between nucleotides.
116. Procedure
For this, mRNA is collected from a normally expressing cell and
applied to the microarray to obtain the rate of gene expression.
Collect mRNA and prepare the DNA microarray.
Radiolabel the cDNA (100 nos.), which is considered
as the probe.
Introduce it into the DNA microarray.
The radiolabelled cDNA hybridizes with the DNA microarray
dots, which indicate the number of hybridizations.
118. Application
Gene expression study
1) for comparison of gene expression in similar cell types (diseased cell and normal
cell)
2) for comparison of gene expression in different cell types (different cells of
different individuals)
Identification of tissue-specific genes
Discovery of drugs
Diagnostics and genetic mapping
Study of protein-protein interactions
Functional genomics
DNA sequencing
Agricultural biotechnology
Study of gene expression in plants
DNA polymorphism
Detection of pathogens
Gene finding
Analysis of 100-1000 genes at a time
Gene mapping
119. Functional genomics
Study of the functions of genes,
for example their role in growth and their physiological and biochemical environment.
Inactivity of genes and its reasons:
genes are inactivated by the action of other genes; the lack of expression of a gene may be due to
suppression by another gene, which is the causative reason.
Development and application of genomic analysis techniques.
Identification of the genes involved in disease.
1) Positional cloning technique
2) Genome sequencing technique
Example:
# Messing shotgun method
# Enzymatic method
# Chemical method
are developed on the basis of functional genomics
and give information about the structure and functions of genes.
3) Gene expression profiling technique: comparison of similar cell types that differ in
gene expression due to mutation,
so it is used to find out the expression.
4) Knockout technique
121. Comparative genomics
Compares structural and functional details and, based on the similarities and
differences, finds out the relationship.
Gene finding
Classification of nucleotide sequences
Finding evolutionary relationships; comparison of gene expression
Analysis of protein sets from completely sequenced genomes
For better understanding of the genomes and biology of the respective
organisms
Example: Methanococcus, Mycoplasma, E. coli and Bacillus subtilis are fully
sequenced
Example: genes involved in ripening green mangoes to yellow mangoes.
Here the genome of mango is compared to the annotated genome of a similar
species to identify the genes and the functions they perform.
Databases used for comparative genomics:
A. PEDANT: gives information about proteins and enzymes
B. KEGG: a comprehensive set of metabolic pathways of genomes
C. MBGD: Microbial Genome Database; search for microbial genomes
D. WIT: metabolic reconstruction of completely sequenced genomes
123. Pharmacogenomics
It is the study of the role of the genome in drug response.
Its name reflects its combination of pharmacology and genomics.
Pharmacogenomics analyses how the genetic makeup of an
individual affects his or her response to drugs.
It deals with the influence of acquired and inherited genetic
variation on drug response in patients by correlating gene
expression or single-nucleotide polymorphisms with
pharmacokinetics and pharmacodynamics.
Pharmacogenomics aims to develop rational means to optimise
drug therapy
with respect to the patient's genotype, to ensure maximum efficacy
with minimal adverse effects.
Genomic research will allow drugmakers to tailor a therapy to the
individual's specific needs.
125. It is described as a marriage between functional genomics and
molecular pharmacology.
A new journal, Pharmacogenomics, was started by the Nature group
of journals.
Pharmacogenomics covers the entire spectrum of genes that determine response and
sensitivity to individual drugs.
Example: Human Genome Project.
Pharmacogenetics covers the narrower spectrum of inherited differences
in drug metabolism and disposition.
The terms pharmacogenomics and pharmacogenetics are often used interchangeably.
It provides tools to classify the heterogeneity of disease and individual
response to medicine.
It is a fascinating area of biotechnology research.
Example: diagnosis, mechanism of disease and response of
patients to medicine.
126. 2 approaches to pharmacogenomics
1) candidate gene approach
2) linkage disequilibrium approach
At the industrial level, it is used to understand variability in
clinical trials,
differential side effects
and inconsistency in disease models.
128. Chemoinformatics
Also known as cheminformatics, chemioinformatics
and chemical informatics.
It is the use of computer and informational techniques
applied to a range of problems in the field of chemistry.
Application
Used in pharmaceutical companies and academic settings in the
process of drug discovery.
These methods can also be used in chemical and allied
industries in various other forms.
129. Medical informatics
Also called health informatics
Clinical informatics
It is information engineering applied to the field of healthcare,
essentially the management and use of patient healthcare information
It is a multidisciplinary field that uses health information technology to
improve health care via any combination of higher quality, higher
efficiency and new opportunities
Used in gene therapy
Neurological and metabolic disorders
Cystic fibrosis
Infectious diseases
More efficient patient care
Cardiovascular diseases, cancer gene therapy, human gene therapy