This document outlines the course content for a bioinformatics course covering 4 units:
Unit 1 introduces basic concepts of bioinformatics, including proteins, amino acids, DNA, and RNA, and the relationship between sequence, structure, and function.
Unit 2 covers major bioinformatics databases including those for nucleotide sequences, protein sequences, sequence motifs, protein structures, and other relevant databases.
Unit 3 discusses topics like single and pairwise sequence alignment, scoring matrices, and multiple sequence alignments.
Unit 4 covers the human genome project, gene and genomic databases, genomic data mining, and microarray techniques.
2. Biores-111: Bio-informatics
Unit I
1. What is Bioinformatics? Overview of Bioinformatics.
2. Basic concepts and applications of bioinformatics.
3. Bio-informatics related to proteins, amino acids, DNA and RNA
5. Sequence, structure and function.
Unit II
•Bioinformatics databases: introduction and types of databases. Nucleotide sequence databases; primary
nucleotide sequence databases viz., EMBL; GenBank; DDBJ
•Protein sequence databases viz., SwissProt/TrEMBL; PIR
•Sequence motif databases: Pfam and PROSITE
•Protein structure databases: Protein Data Bank; SCOP; CATH
•Other relevant databases e.g., KEGG.
Unit III
•Single sequence and pair-wise alignments
•Scoring matrix: PAM and BLOSUM and Dot Plots
•Heuristic methods: FASTA and BLAST
•Statistics of sequence alignment score: E-Value; P-Value.
•Multiple sequence alignments: ClustalW, PSI-BLAST.
Unit IV
• Human Genome Project
• GenBank, SNPs, GOG, STSs, and ESTs databases
• Genomic Data Mining
• Understanding of the principles of the microarray technique, limitations of microarrays and
problems associated with the technique
•Brief introduction to PERL language.
5. Protein Database
UniProt - protein knowledge database
SWISS-2DPAGE - 2D PAGE (two-dimensional polyacrylamide gel electrophoresis)
Pfam - protein family and domain
PROSITE - protein family and domain
SMART - protein module
BLOCKS - protein conserved regions
17. The Pfam database is one of the most important collections
of information in the world for classifying proteins. The
database categorises 75 per cent of known proteins to form
a library of protein families - a 'periodic table' of biology.
The open access resource was established at the Wellcome
Trust Sanger Institute in 1998. Its vision is to provide a
tool which allows experimental, computational and
evolutionary biologists to classify protein sequences and
answer questions about what they do and how they have
evolved. The Pfam project is led by Dr Alex Bateman at the
Sanger Institute.
Each entry in the Pfam database includes a protein sequence
alignment as well as an accompanying statistical model, called a
hidden Markov model (HMM).
18. Pfam :: Home
Pfam is a large collection of multiple sequence
alignments and hidden Markov models
(HMMs) covering many common protein families.
Pfam version 26.0 (November, 2011) contains
alignments and models for 13672 protein families,
based on the Swiss-Prot and SP-TrEMBL
protein sequence databases.
19. HMM: A Hidden Markov Model
A hidden Markov model,
or HMM, is a statistical model for
any system that can be represented
as a succession of transitions
between discrete states.
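The definition above can be made concrete with a small sketch. The Python code below scores an observed sequence against a toy HMM using the forward algorithm, which sums the probability of the observations over every possible path through the hidden states. The two states ("helix", "loop"), the two-letter alphabet (H for hydrophobic, P for polar) and all probabilities are invented for illustration; they are not taken from Pfam, whose profile HMMs are considerably more elaborate.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return the probability of the observations under the HMM (forward algorithm)."""
    # alpha[s]: probability of the observations seen so far, with the path ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    # Sum over the possible final states
    return sum(alpha.values())


# Hypothetical toy model (all numbers are assumptions made for this example)
states = ("helix", "loop")
start_p = {"helix": 0.5, "loop": 0.5}
trans_p = {
    "helix": {"helix": 0.8, "loop": 0.2},
    "loop": {"helix": 0.3, "loop": 0.7},
}
emit_p = {
    "helix": {"H": 0.7, "P": 0.3},
    "loop": {"H": 0.2, "P": 0.8},
}

likelihood = forward("HHP", states, start_p, trans_p, emit_p)
print(f"P(HHP | model) = {likelihood:.4f}")
```

Each row of trans_p and emit_p sums to 1, so the probabilities of all sequences of a given length also sum to 1, which is a quick sanity check on the implementation.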
20. •Proteins are generally composed of one or more functional
regions, commonly termed domains. Different combinations of
domains give rise to the diverse range of proteins found in nature.
The identification of domains that occur within proteins can
therefore provide insights into their function.
•There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A
entries are high quality, manually curated families. Although
these Pfam-A entries cover a large proportion of the sequences in
the underlying sequence database, in order to give a more
comprehensive coverage of known proteins we also generate a
supplement using the ADDA (Automatic Domain Decomposition
Algorithm ) database. These automatically generated entries are
called Pfam-B. Although of lower quality, Pfam-B families can be
useful for identifying functionally conserved regions when no
Pfam-A entries are found.
•Pfam also generates higher-level groupings of related
families, known as clans. A clan is a collection of Pfam-A
entries which are related by similarity of sequence, structure or
25. Biotin Synthase
Biotin synthase (BioB) converts dethiobiotin into biotin by
inserting a sulfur atom between C6 and C9 of dethiobiotin
in an S-adenosylmethionine (SAM)-dependent reaction.
30. PROSITE is a protein database. It consists of entries describing the
protein families, domains and functional sites as well as amino acid patterns,
signatures, and profiles in them, which are manually curated by a team at the
Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein
annotation.
PROSITE was created in 1988 by Amos Bairoch, who directed the group for
more than 20 years. Since July 2009 the director of the PROSITE, Swiss-Prot
and Vital-IT groups is Ioannis Xenarios.
PROSITE's uses include identifying possible functions of newly discovered
proteins and analysis of known proteins for previously undetermined activity.
Properties from well-studied genes can be propagated to biologically related
organisms, and for different or poorly known genes biochemical functions can be
predicted from similarities.
PROSITE offers tools for protein sequence analysis and motif detection. It is
part of the ExPASy proteomics analysis servers.
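PROSITE patterns use a compact syntax: elements are separated by hyphens, x stands for any residue, square brackets list the allowed residues, curly braces list excluded residues, and parentheses give repeat counts. As a rough illustration of motif detection, the Python sketch below translates such a pattern into a regular expression and scans a sequence with it. The function names prosite_to_regex and find_motifs and the test sequence are made up for this example and are not part of any PROSITE tool; the pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern. Terminus anchors placed inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                      # x -> any residue
        element = element.replace("<", "^").replace(">", "$")   # N-/C-terminal anchors
        parts.append(element)
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical sequence, chosen only to exercise the pattern
hits = find_motifs("N-{P}-[ST]-{P}.", "MKNGSAANPTL")
print(hits)
```

The lookahead wrapper in find_motifs is a design choice that reports overlapping matches, which a plain left-to-right scan would skip.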
78. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a
collection of online databases dealing with genomes, enzymatic
pathways, and biological chemicals. The PATHWAY database records
networks of molecular interactions in the cells, and variants of them
specific to particular organisms. As of July 2011, KEGG has switched
to a subscription model and access via FTP is no longer free.
KEGG was initiated by the Japanese human genome programme in 1995.
Its developers consider KEGG to be a "computer
representation" of the biological system. The KEGG database can be
utilized for modeling and simulation, browsing and retrieval of data. It
is a part of the systems biology approach.
KEGG maintains five main databases:
KEGG Atlas
KEGG Pathway
KEGG Genes
KEGG Ligand
KEGG BRITE
79. Purpose
Developed at the Kanehisa Laboratory
Integrates:
current knowledge of molecular interaction networks
information about genes and proteins
information about chemical compounds and reactions