3. Bioinformatics is an interdisciplinary field
that develops methods and software tools
for understanding biological data
4. History
• The first protein was sequenced (1955)
• Margaret O. Dayhoff
• Atlas of Protein Sequence and Structure (1965)
• Paulien Hogeweg and Ben Hesper coined “bioinformatics” in 1970
• The Protein Data Bank (1973)
• DNA sequencing (1977)
• NCBI (1988)
• Human Genome Project launched (1990)
• The human genome is published (2001)
https://www.smithsonianmag.com/science-nature/how-margaret-dayhoff-helped-bring-computing-scientific-research-180971904/
7. Units of Information in Biological Macromolecules
DNA
Sequence analysis
Mutation and polymorphism studies
Identification of regulatory regions
Gene finding
Genome annotation
Comparative genomics
RNA
RNA sequencing
Splice variants
Tissue expression level
Microarray
Single gene analysis
Sequence contigs
Protein
Homology modeling
Structure function prediction
Ligand docking
Protein-protein interaction
Protein expression
Phylogenetic analysis
9. Develop templates to design potent drug molecules
• Structural analysis of secretory phospholipase A2 from Clonorchis sinensis
https://link.springer.com/article/10.1007/s00894-011-1333-8
19. Tumor antigens
• The ideal antigen for a cancer vaccine should be highly immunogenic,
exclusively expressed in all cancer cells (not in normal cells), and
necessary for the survival of cancer cells
• Tumor-associated antigens (TAAs)
• Tumor-specific antigens (TSAs)
22. Bioinformatics journals
• Briefings in Bioinformatics
• Bioinformatics
• Genomics, Proteomics & Bioinformatics
• Current Bioinformatics
• BMC Bioinformatics
• Computers in Biology and Medicine
• Journal of Computer-Aided Molecular Design
• In Silico Biology
• Journal of Molecular Modeling
26. Gene
• A searchable database of genes, focusing on genomes that have been
completely sequenced and that have an active research community to
contribute gene-specific data. Information includes nomenclature,
chromosomal localization, gene products and their attributes (e.g.,
protein interactions), associated markers, phenotypes, interactions,
and links to citations, sequences, variation details, maps, expression
reports, homologs, protein domain content, and external databases
28. Genome
• Contains sequence and map data from the whole genomes of over
1000 organisms. The genomes represent both completely sequenced
organisms and those for which sequencing is in progress. All three
main domains of life (bacteria, archaea, and eukaryota) are
represented, as well as many viruses, phages, viroids, plasmids, and
organelles.
30. Nucleotide
• A collection of nucleotide sequences from several sources, including
GenBank, RefSeq, the Third Party Annotation (TPA) database, and
PDB. Searching the Nucleotide Database will yield available results
from each of its component databases.
32. RefSeq: NCBI Reference Sequence Database
• A comprehensive, integrated, non-redundant, well-annotated set of
reference sequences including genomic, transcript, and protein.
33. What is the difference between RefSeq and GenBank?
• GenBank sequence records are owned by the original submitter and
cannot be altered by a third party. RefSeq sequences are not part of
the INSDC but are derived from INSDC sequences to provide non-
redundant curated data representing our current knowledge of
known genes
34. BLAST
• The Basic Local Alignment Search Tool (BLAST) finds regions of local
similarity between sequences. The program compares nucleotide or
protein sequences to sequence databases and calculates the statistical
significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify
members of gene families.
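As a rough illustration of what "regions of local similarity" means, the sketch below scores the best local alignment between two short sequences with the Smith-Waterman dynamic-programming recurrence. This is not BLAST's own algorithm: BLAST uses a faster seed-and-extend heuristic and also estimates statistical significance. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core "ACGT" is found even though the flanking bases differ.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Resetting negative cells to zero is what makes the alignment local: a high-scoring region is reported without being penalized for unrelated flanking sequence.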
36. Protein
• The Protein database is a collection of sequences from several sources,
including translations from annotated coding regions in GenBank,
RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
Protein sequences are the fundamental determinants of biological
structure and function.
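To make "translations from annotated coding regions" concrete, here is a minimal sketch that translates a coding sequence into protein using the standard genetic code. The function and variable names are hypothetical, and the sketch assumes the sequence starts in frame; real database records are translated with the genetic code declared in the annotation, which may differ from the standard one.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAATGA"))  # prints "MAK"
```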
37. UniProt
• Universal Protein Resource (UniProt) is a comprehensive resource for
protein sequence and annotation data. The UniProt databases are the
UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters
(UniRef), and the UniProt Archive (UniParc). UniProt is a collaboration
between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss
Institute of Bioinformatics and the Protein Information Resource (PIR).
41. Protein Data Bank (PDB)
• The Protein Data Bank (PDB) is a database for the three-dimensional
structural data of large biological molecules, such as proteins and
nucleic acids.
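The three-dimensional data are stored as atomic coordinates. As a sketch only, the code below reads the fixed-width ATOM records of the legacy PDB text format and measures the distance between two atoms in ångströms. The helper names and the two sample lines are made up for illustration; production work would normally use an established parser rather than hand-written column slicing.

```python
import math


def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record of the legacy PDB format."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "x": float(line[30:38]),          # coordinates in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def distance(a, b):
    """Euclidean distance between two parsed atoms, in angstroms."""
    return math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))


# Illustrative records (not taken from a specific PDB entry).
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
]

atoms = [parse_atom_line(line) for line in SAMPLE]
print(f"{atoms[0]['name']}-{atoms[1]['name']}: {distance(atoms[0], atoms[1]):.2f} A")
```

The same distance calculation underlies comparisons such as the positional deviation between superimposed ligands.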
The active site of the enzyme shows
the classical features of PLA2 with the participation of the
three residues: histidine-aspartic acid-tyrosine in hydrogen
bond formation. This is an interesting variation from the
human housekeeping group III PLA2 enzyme, which has
a histidine-aspartic acid-phenylalanine arrangement at
the active site. This difference is therefore an important
structural parameter that can be exploited to design specific
inhibitor molecules against the pathogen PLA2.
In this study, a detailed structural
and ligand binding analysis of the isoforms has been done
by modeling. The overall three dimensional structures of
the isoforms are well conserved with three helices and a β-wing
stabilized by four disulfide bonds. There are characteristic
differences at the calcium binding loop, hydrophobic
channel and the C-terminal domain that can
potentially be exploited for drug binding. But the most
significant feature pertains to the catalytic site where the
isoforms exhibit three variations: histidine-aspartate-tyrosine,
histidine-glutamate-tyrosine, or
histidine-aspartate-phenylalanine. Molecular docking studies
show that isoform specific residues and their conformations
in the substrate binding hydrophobic channel make unique
interactions with certain inhibitor molecules resulting in a
perfect tight fit.
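Fluorescence-based differential in-gel expression coupled with mass spectrometric analysis was used
for the discovery phase of experiments, and real-time polymerase chain reaction, Western blotting, and pathway analysis were performed for expression and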
functional validation of differentially expressed proteins. While aldehyde reductase, hnRNP, cyclophilin A, heat shock protein-27, and actin are upregulated
in responders, prohibitin, enoyl-CoA hydratase, peroxiredoxin, and fibrin-β are upregulated in the nonresponders. The expressions of some of these
proteins correlated with increased apoptotic activity in responders and decreased apoptotic activity in nonresponders. Therefore, the proteins qualify as
potential biomarkers to predict chemotherapy response.
These differences include: (1) loop-L3 between H2 and H3, which bears residue Gly80 in the wild type, is in a closed conformation with respect to the channel opening, while in the mutant enzyme it adopts a relatively open conformation; (2) the mutant enzyme is less compact and has higher solvent accessible surface area; and (3) interfacial binding contact surface area is greater, and the quality of interactions with the receptor is better in the mutant enzyme as compared to the wild type. Therefore, the structural differences delineated in this study are potential biophysical factors that could determine the increased potency of the mutant enzyme with macrophage receptor for cytokine secreting function, resulting in exacerbation of cachexia in COPD.
The gentamicin molecule binds to the lectin site of calreticulin and lies in the concave channel formed by the long beta sheets. It interacts with residues Tyr109, Asp125, Asp135, Asp317 and Trp319, which are crucial for the chaperone function of calreticulin. Superimposing the modeled complex on the only available crystal structure of calreticulin in complex with a tetrasaccharide (Glc1Man3) shows interesting features. First, the rings of the gentamicin occupy the positions of the glucose and the first two mannose sugars of the tetrasaccharide molecule. Second, the oxygen atoms of the glycosidic linkages of these two ligands have a positional deviation of 1.3 Å.
TAAs include “self-antigens” such as differentiation antigens, overexpressed antigens, and cancer-testis antigens, as well as “non-self” antigens of viral origin
International Nucleotide Sequence Database Collaboration (INSDC)