2. Whatis bioinformatics?
ā¢ In biology, bioinformatics is defined as, ā
t
h
e use of
computer to store, retrieve, analyse or predict the
composition or structure of bio-moleculesā . Bioinformatics
is the application of computational techniques and
informationtechnology totheorganisationandmanagement
of biological data. Classical bioinformatics deals primarily
withsequenceanalysis.
4. Applications of bioinformatics
Applications of
bioinformatis
Gene
therapy
Drug
designing
Antibiotic
resistance
Crop
development
Medicine
biotechnology
Drought
resistance
Evolutionary
studies
Forensic
analysis
Veterinary
science
Wheather
analysisy
Waste
ceanup
5. Biological databases
ā¢ Biological data are complex, exception-ridden, vast and
incomplete. Therefore several databases has been created
and interpreted to ensure unambiguous results. A
collection of biological data arranged in computer
readable form that enhances the speed of search and
retrieval and convenient to use is called biological
database. A good database must have updated
information.
6. Importanceofbiological database
ā¢ A range of information like biological sequences,
structures, binding sites, metabolic interactions, molecular
action, functional relationships, protein families, motifs
and homologous can be retrieved by using biological
databases.The mainpurposeof abiological databaseis to
store and manage biological data and information in
computerreadableforms.
10. GeneBank
ļ¼ Oneof thefastest growing repositories of knownnucleotide sequences,GeneBank
(Genetic Sequence Databank), has a flat file structure. It is an ASCII text file,
readable by both humans and computers. Besides sequence data, GeneBank files
contain information such as accession numbers and gene names, phylogenetic
classificationandreferencestopublishedliterature.
ļ¼ This database has been developed and maintained at the NCBI, Bethesda, MD,
USA, asapartofInternationalSequenceDatabaseCollaboration(INSDC).
ļ¼ Itis anopenaccesssequencedatabase.
ļ¼ It coordinateswithindividual laboratories
EMBL and DDBJ.
and other sequence databases like
Continueā¦ā¦ā¦ā¦ā¦ā¦..
11. ļ¼It is anannotatedcollection of all nucleotide sequencesthatareavailable to
thepublic.
ļ¼The nucleotide database was divided into three databases at NCBI:
CoreNucleotide database, Expressed Sequence Tag (EST) and Genome
SurveySequence(GSS).
ļ¼CoreNucleotide databasehasmostof thenucleotide sequencesused. It also
enclosesall nucleotiderecordsthatarenotintheEST andGSS databases.
ļ¼Submission of sequences to GeneBank can be done using BankIt, Sequin
andtbl2asntools.
12. EMBL(EuropeanMolecular
Biology Laboratory)
ā¢ A comprehensive database of DNA and RNA sequences, EMBL
nucleotide sequencedatabaseis collected from scientific literature,
patientoffices andis directly submittedby researchers. EMBL has
been prepared in collaboration with GeneBank (USA) and the
DNA DatabaseofJapan(DDBJ).
ā¢ Itis establishedin1980.
ā¢ Itis maintainedbyEBI (EuropeanBioinformatics Institute)
13. Swiss-Port
ļ¼ This is a curatedprotein sequencedatabasethatoffers ahigh level
of integrationwithotherdatabasesandalso hasavery lowlevel of
redundancy. Swiss-Port strives toprovide protein sequenceswith a
high level of annotation (for instance, the description of protein
function, domain structure and post translational modifications,
etc.).
ļ¼ It is established in 1986 and maintained collaboratively , since
1987, by the department of Medical Biochemistry of the
UniversityofGenevaandtheEMBL dataLibrary.
ļ¼ TrEMBL is a computerāannotated supplement of Swiss-Port that
contains all translations of EMBL nucleotide sequence entries,
whichis notyetintegratedinSwiss-Port.
ļ¼ Currently Swiss-Port have 0.5 and TrEMBL have 7.6 milliom
sequences.
14. ProteinInformationResource(PIR)
ā¢ PIR is an integrated public bioinformatics resource to
support genomic and proteomic research and scientific
studies. Nowadays, PIR offers a wide variety of resources
mainly oriented to assisting the propagation and
consistency of protein annotations like PIRSF, ProClass
andProLINK.
16. MotifDatabases
ā¢ Protein sequence motif is a set of conserved amino acid
residuesthatareimportantfor proteinfunctionandarelocated
within a certain distance from one another
. These motifs
usually provide clues to the functions of otherwise
uncharacterisedproteins.
ā¢ The PROSITE database consists of documentation entries
describing protein domains, families and functional sites as
wellasassociatedpatternsandprofilestoidentifythem.
ā¢ PRINT is adatabasefor proteinfingerprints. A fingerprint is a
group of conserved motifs used to characterise a protein
family.
17. DomainDatabase
ā¢ A protein domain is an independently folded, structurally
compact unit that forms a steady three- dimensional structure
and shows a certain level of evolutionary conservation.
Typically ,aconserveddomaincontainsoneormoremotifs.
ā¢ ProDomisaproteindomaindatabaseautomaticallygenerated
fromtheSwiss-PortandTrEMBL sequencedatabase.
ā¢ SMART is a highly reliable and sensitive tool for
domain identification.
ā¢ COG is adatabaseandaconvenienttoolformotifanddomain
identification.
18. 3DStructuredatabases
ā¢ PDB (Protein Data bank) is themainprimary databasefor 3D
structures of biological macromolecules determined by X-ray,
crystallography and NMR. It also accepts experimental data
usedtodeterminethestructuresandhomologymodels.
ā¢ SCOP (Structural Classification of Protein database) classifies
protein 3D structures in a hierarchical scheme of structure
classes. All the protein structures in PDB are classified her,
andtheupdatednewstructuresaredepositedinPDB.
ā¢ The CATH database (Class, Architecture, Topology,
Homologous) contains a hierarchical classification of protein
domainstructure.
19. Protein data bank
ā¢ PDB (Protein data bank) is a repository for 3D structural
data obtained by x-ray crystallography or NMR
spectroscopyofproteinsandnucleic acids.
ā¢ Research Collaboratory for Structural Bioinformatics
(RCSB) PDB provides avariety of tools andresourcesfor
studying the structures of biological macromolecules and
their relationship with other sequences, its function and
diseasescausedif any.
20. Geneexpressiondatabases
ā¢ GEO or Gene Expression Omnibus is a curated online
resource and a gene expression molecular abundance
repository for gene expression data browsing, query and
retrieval.
ā¢ GXD or GeneExpressionDatabase is acommunity resource
forgeneexpressioninformation.
ā¢ MGED or Microarray Gene Expression data contains
microarray data, generated by functional genomics and
proteomicsexperiments.
ā¢ ArrayExpress from European Bioinformatics Institute is a
repositoryfortranscriptomicsdata.
21. Metabolicpathwaydatabases
ā¢ KEGG PATHWAY Databasecontains graphical pathwaymapsfor all
knownmetabolicpathwaysfromvariousorganisms.
ā¢ EcoCyc is an E. coli database , stores information regarding the
genomeandbiochemical machineryofE.coli.
ā¢ LIGAND is achemical databasefor enzymereactions attheInstitute
for Chemical Research, Kyoto. It is composite database currently
consisting of the COMPOUND, DRUG, GLYCAN, REACTION,
RPAIR andENZYMEdatabases.
ā¢ MetaCyc is a non-redundant, experimentally elucidated metabolic
pathwaydatabase.
ā¢ BRENDA is an enzyme database tat contains information on all
aspectsofenzymesandenzymaticreactions.
22. Genomedatabases
ā¢ Genome databases give absolute information on the heritable
properties of an organism. These databases help to identify
genes and predict their functions. A few genome databases
havelinkswithspecificorganismdatabases.
ā¢ GOLD (Genomes Online Database at the University of
Illinois, USA) contains a list of all thecomplete and ongoing
genomeprojectsworldwide.
ā¢ Genomes at NCBI (National Centre for Biotechnology
Information,USA).
ā¢ TIGR database (TDB), at the institute for Genomic Research
atRockeville MD,USA.
23. Virological databases
ā¢ A virological database contains all the sequences and
related information of viruses of animals, plants, bacteria,
fungi and archea; for example, the HIV protease
database. A committee called The International
Committee on Taxonomy of Viruses(ICTV) authorises
and organises the taxonomic classification of viruses.
ICTVdB contains taxonomic information for over
thousandsofvirusspecies.
24. Worldbiodiversitydatabases
ā¢ Taxonomic databases are built to document all known
species and them available and
worldwide.
accessibl
e databases contain taxonomic
hierarchies,
make
These
species names, synonyms, descriptions,
illustrations and references. For example: CCINFO,
STRAIN andALGAE.
26. AnnotationofGene?????
In molecular biology, genomes make the basic genetic material and
typically consistofDNA. Whereby,genomeincludethegenes(coding)
andnoncoding regions, of interest tous, arethecoding regions asthey
actively influence basic life processes. The genes contain useful
biological information that is required in building up and maintaining
an organism. Gene annotation can be defined merely as the process of
makingnucleotidesequencemeaningful.
Gene annotation involves the process of taking the raw DNA sequence
produced by genomesequencing projects andadding layers of analysis
and interpretation necessary to extracting biologically significant
information andplacing such derived details into context. Annotation is
the process by which pertinent information about these raw DNA
sequencesisaddedtothedatabases.
27. Accessionnumber
Accession numbers are unique identifiers which
permanently identify sequences in the database.
Accession numbers are assigned and
communicated to authors within twoworking days
ofthereceiptofsubmission.