Vartika's Presentation
Data
Data is raw, unorganized facts that need to be processed.
Example: each student's test score is one piece of data.
INFORMATION
When data is processed,
organized, structured or presented in
a given context so as to make it
useful, it is called information.
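The step from data to information can be sketched in a few lines of Python. The student names and scores below are made-up illustrative values, and the `summarize` helper is a hypothetical name, not something from the slides.

```python
from statistics import mean

# Raw data: individual test scores (illustrative values only).
scores = {"student_a": 72, "student_b": 85, "student_c": 91}

def summarize(scores):
    """Process raw scores into information: class average and top scorer."""
    top = max(scores, key=scores.get)
    average = round(mean(list(scores.values())), 1)
    return {"average": average, "top": top}

summary = summarize(scores)
print(summary)
```

Each score on its own is data; the average and the top scorer are information because they are organized for a purpose.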
What is a database?
• Databases are convenient systems for properly storing, searching and
retrieving any type of data.
• A database makes it easy to handle and share large amounts of data
and supports large-scale analysis through easy access and data updating.
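As a rough illustration of store, search, retrieve and update, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the DEMO001 row are assumptions made for the example; X64011 is the accession of the EMBL entry shown later in this deck.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Store: insert a few records (DEMO001 is a made-up placeholder row).
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [("X64011", "Listeria ivanovii", 756), ("DEMO001", "Hypothetical organism", 120)],
)

# Search and retrieve: find entries by organism name prefix.
rows = conn.execute(
    "SELECT accession, length FROM sequences WHERE organism LIKE ?", ("Listeria%",)
).fetchall()

# Update: change one field of an existing record.
conn.execute("UPDATE sequences SET length = ? WHERE accession = ?", (125, "DEMO001"))
updated = conn.execute(
    "SELECT length FROM sequences WHERE accession = ?", ("DEMO001",)
).fetchone()[0]

print(rows, updated)
```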
What is a biological database?
• Biological databases are libraries of life sciences information,
collected from scientific experiments, published literature,
high-throughput experiment technology and computational analysis.
• They contain information from genomics, proteomics and microarray
gene expression.
• Information contained in biological databases includes function,
gene structure, localization (both cellular and chromosomal),
biological sequences and structures.
Major purposes of these databases:
• Availability of biological data.
• Systematization of data.
• Analysis of computed biological data.
History:
• 1956: the first sequence database began when insulin was sequenced
(51 amino acids).
• The Atlas of Protein Sequence and Structure (1965), by Margaret Dayhoff
et al., was a printed book.
• It became the basis for PIR, the Protein Information Resource.
• First nucleotide sequence: yeast tRNA (77 bases).
• During this time the 3D structures of proteins were being studied and the
renowned PDB was created.
• The first genome published for a free-living organism was that of the
bacterium Haemophilus influenzae, in 1995.
Features of Biological Data Bases:
1) Data heterogeneity
2) High volume data
3) Uncertainty
4) Data Curation
5) Large scale data integration
6) Data sharing
7) Dynamic and subject to change
Classification scheme for biological databases:
Data type
Maintenance status
Data access
Data source
Database design
Organism
Data types:
Based on data sources
Content Based:
Genome database
Sequence database
Structure database
Microarray database
Chemical database
Pathway database
Enzyme database
Disease database
Literature database
Based on maintenance status:
NCBI, EMBL, SIB
Based on data access:
1) Publicly available
2) Available with copyright
3) Browsing only: accessible but not downloadable
4) Academic but not freely available
5) Proprietary commercial
6) Restricted
Biological sequence databases
Database architecture
Information system:
• Query system (e.g. Google, Entrez, SRS): your search keywords
• Storage system: Oracle, MySQL, PC binary files, Unix text files, bookshelves
• Data: GenBank flat file, PDB file, interaction record, title of a book, book
A sequence retrieving and manipulation network
• Databases: DNA (NCBI-GenBank, DDBJ, EBI-EMBL); protein (PIR, SWISS-PROT, ExPASy, PDB)
• Software: GCG, SeqWeb, Vector NTI, GenoMax
• Retrieval systems: Entrez, SRS
• Formats: GenBank, GCG, FASTA, Staden
• Sequence converter
• Information: sequence, PDB, image
Types of biological databases
• Primary database.
• Secondary database.
Primary databases
These are the primary sources of data, used to store nucleic acid sequences, protein sequences and
structural information of biological macromolecules.
Some primary databases:
• NCBI (National Center for Biotechnology Information)
• GenBank
• DDBJ (DNA Data Bank of Japan)
• SWISS-PROT
• PIR (Protein Information Resource)
• PDB (Protein Data Bank)
The sequence collections in these databases are due to the efforts of basic research from
academic, industrial and sequencing labs.
GenBank/EMBL/DDBJ
International Nucleotide Sequence Database
DDBJ: DNA Data Bank of Japan
CIB: Center for Information Biology and DNA Data Bank of Japan
NIG: National Institute of Genetics
IAM: International Advisory Meeting
ICM: International Collaborative Meeting
EMBL: European Molecular Biology Laboratory
EBI: European Bioinformatics Institute
NCBI: National Center for Biotechnology Information
NLM: National Library of Medicine
Secondary database
• A secondary database contains additional information derived from the analysis
of data available in primary sources.
• Secondary databases are analysed in a variety of ways and contain different
information in different formats.
• Some secondary databases:
• TrEMBL
• Pfam
• PROSITE
• Profiles
• SCOP
• CATH
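To make "information derived from the analysis of primary data" concrete, the sketch below converts a PROSITE-style sequence pattern into a Python regular expression and scans a sequence with it. The `prosite_to_regex` helper is a hypothetical name, the test sequence is made up, and only a subset of the pattern syntax is handled (no `<` or `>` anchors), so treat it as an illustration rather than a full PROSITE implementation.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (subset of the syntax) to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate an optional repeat count such as "(2)" or "(2,4)".
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} means any residue except P
        # "[ST]" and single letters are already valid regex fragments.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

# N-glycosylation site pattern written in PROSITE notation.
regex = prosite_to_regex("N-{P}-[ST]-{P}")

# Scan a made-up protein fragment for the motif.
hits = [m.start() for m in re.finditer(regex, "AANGSAA")]
print(regex, hits)
```

A secondary database stores the outcome of this kind of analysis (which entries carry which motifs) rather than the raw sequences themselves.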
Flat file storage data formats
• When GenBank, EMBL and DDBJ formed a collaboration (1986), the sequence
databases had moved to a defined flat file format with a shared feature
table format and annotation standards.
• The flat file formats from the sequence databases are still used to access
and display sequence and annotation. They are also convenient for storing
local copies.
The National Center for Biotechnology Information
Bethesda, MD
Created in 1988 as a part of the
National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
NCBI Databases and Services
• GenBank: primary sequence database
• Free public access to biomedical literature
• PubMed: free Medline (3 million searches per day)
• PubMed Central: full-text online access
• Entrez: integrated molecular and literature databases
• BLAST: highest-volume sequence search service
(100–200 K searches per day)
• VAST: structure similarity searches
• Software and databases
GenBank (Genetic Sequence Databank)
• GenBank® is the genetic sequence database at the National
Center for Biotechnology Information (NCBI).
• It was established in 1982 and is now maintained by the
National Center for Biotechnology Information (NCBI).
• DNA sequences can be submitted to GenBank using several
different methods.
• It contains publicly available nucleotide sequences for more than
240 000 named organisms, obtained primarily through
submissions from individual laboratories and batch submissions
from large-scale sequencing projects.
• It has a flat file structure: an ASCII text file that is
readable and downloadable by both humans and
computers.
• There are two main ways of making batch sequence
submissions to GenBank: NCBI's Barcode
Submission Tool (BarSTool) and Sequin.
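Because the flat file is plain ASCII text, a record can be requested programmatically. The sketch below only builds a request URL for NCBI's E-utilities `efetch` service, which the slides do not mention, so its use here is an assumption. `genbank_flatfile_url` is a hypothetical helper name, and no network call is made.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def genbank_flatfile_url(accession):
    """Build an efetch URL asking for one nucleotide record as a GenBank flat file."""
    params = {
        "db": "nuccore",    # nucleotide database
        "id": accession,    # accession number of the record
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain ASCII text
    }
    return EUTILS_EFETCH + "?" + urlencode(params)

url = genbank_flatfile_url("X64011")
print(url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the flat-file text, subject to NCBI's usage guidelines.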
EMBL
• The European Molecular Biology Laboratory (EMBL) is a molecular biology research
institution supported by 22 member states, four prospect and two associate member
states.
• EMBL was created in 1974 and is an intergovernmental organisation funded by public
research money from its member states.
• The Laboratory operates from five sites: the main laboratory in Heidelberg, and
outstations in Hinxton (the European Bioinformatics Institute (EBI), in England),
Grenoble (France), Hamburg (Germany), and Monterotondo (near Rome).
• EMBL groups and laboratories perform basic research in molecular biology and
molecular medicine as well as training for scientists, students and visitors.
• Israel is the only Asian state that has full membership.
• The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained
at the European Bioinformatics Institute (EBI), incorporates and distributes
nucleotide sequences from public sources.
• The database is part of an international collaboration with DDBJ
(Japan) and GenBank (USA).
• Data are exchanged between the collaborating databases on a
daily basis.
• The web-based tool, Webin, is the preferred system for individual
submission of nucleotide sequences, including Third Party
Annotation (TPA) and alignment data.
• Automatic submission procedures are used for submission of data
from large-scale genome sequencing.
• The latest data collection can be accessed via FTP, email and
WWW interfaces.
• The EBI's Sequence Retrieval System (SRS) integrates and links
the main nucleotide and protein databases as well as many other
specialist molecular biology databases.
• For sequence similarity searching, a variety of tools (e.g. FASTA
and BLAST) are available that allow external users to compare
their own sequences against the data in the EMBL Nucleotide
Sequence Database and other databases.
• All available resources can be accessed via the EBI home page.
EMBL format
ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
SV   X64011.1
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
KW   sod gene; superoxide dismutase.
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
RN   [1]
RX   MEDLINE; 92140371.
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and characterization of the
RT   gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
RN   [2]
RP   1-756
RA   Kreft J.;
RT   ;
RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL   Hubland, 8700 Wuerzburg, FRG
XX
DR   SWISS-PROT; P28763; SODM_LISIV.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..756
FT                   /db_xref="taxon:1638"
FT                   /organism="Listeria ivanovii"
FT                   /strain="ATCC 19119"
FT   RBS             95..100
FT                   /gene="sod"
FT   terminator      723..746
FT                   /gene="sod"
FT   CDS             109..717
FT                   /db_xref="SWISS-PROT:P28763"
FT                   /transl_table=11
FT                   /gene="sod"
FT                   /EC_number="1.15.1.1"
FT                   /product="superoxide dismutase"
FT                   /protein_id="CAA45406.1"
FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat        60
     gtaatttctt ttcacataaa taataaacaa tccgaggagg aatttttaat gacttacgaa       120
     ttaccaaaat taccttatac ttatgatgct ttggagccga attttgataa agaaacaatg       180
     gaaattcact atacaaagca ccacaatatt tatgtaacaa aactaaatga agcagtctca       240
     ggacacgcag aacttgcaag taaacctggg gaagaattag ttgctaatct agatagcgtt       300
     cctgaagaaa ttcgtggcgc agtacgtaac cacggtggtg gacatgctaa ccatacttta       360
     ttctggtcta gtcttagccc aaatggtggt ggtgctccaa ctggtaactt aaaagcagca       420
     atcgaaagcg aattcggcac atttgatgaa ttcaaagaaa aattcaatgc ggcagctgcg       480
     gctcgttttg gttcaggatg ggcatggcta gtagtgaaca atggtaaact agaaattgtt       540
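The CDS feature (109..717) and its /translation qualifier can be cross-checked against the sequence data. The sketch below translates bases 109 to 180 of the entry with the standard codon assignments, which translation table 11 shares for sense codons. The `translate` helper and the compact codon-table construction are illustrative choices, not part of the EMBL format.

```python
from itertools import product

# Codon table built from the conventional t, c, a, g ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Bases 109..180 of the entry: the last 12 bases of the second
# sequence line followed by the whole third line.
cds_start = (
    "atgacttacgaa"
    "ttaccaaaat" "taccttatac" "ttatgatgct"
    "ttggagccga" "attttgataa" "agaaacaatg"
)
peptide = translate(cds_start)
print(peptide)
```

The result should match the first 24 residues of the /translation qualifier in the feature table.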
ID - Identification.
AC - Accession number(s).
DT - Date.
DE - Description.
GN - Gene name(s).
OS - Organism species.
OG - Organelle.
OC - Organism classification.
RN - Reference number.
RP - Reference position.
RC - Reference comments.
RX - Reference cross-references.
RA - Reference authors.
RL - Reference location.
CC - Comments or notes.
DR - Database cross-references.
KW - Keywords.
FT - Feature table data.
SQ - Sequence header.
   - (blanks) sequence data.
// - Termination line.
Some entries do not contain all of the line types, and some line types occur many times in a single
entry. Each entry must begin with an identification line (ID) and end with a terminator line (//).
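The two-letter line codes make the flat file easy to read by program. As a minimal sketch, the function below groups the lines of one entry by line code. `parse_embl_entry` is a hypothetical helper name, the sample is an abbreviated excerpt of the entry above, and a real parser would handle the feature table and sequence data more carefully.

```python
from collections import defaultdict

def parse_embl_entry(text):
    """Group the lines of one EMBL flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code = line[:2]
        if code == "//":                       # termination line ends the entry
            break
        if code == "XX" or not line.strip():   # XX lines are spacers
            continue
        if code == "  ":                       # blank code: sequence data
            fields["sequence"].append(line.strip())
        else:                                  # content starts in column 6
            fields[code].append(line[5:].strip())
    return dict(fields)

sample = """\
ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
OS   Listeria ivanovii
//
"""

entry = parse_embl_entry(sample)
print(entry["AC"], len(entry["DT"]))
```

Line types that occur several times, such as DT, end up as lists with one item per line, which matches the note that some line types occur many times in a single entry.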
PubMed
• PubMed is a free search engine accessing primarily
the MEDLINE database of references and abstracts on life
sciences and biomedical topics.
• The PubMed system was offered free to the public in
1997.
• The United States National Library of Medicine (NLM) at
the National Institutes of Health maintains the database as
part of the Entrez system of information retrieval.
• PMID is the unique identifier number used in PubMed.
• PMIDs are assigned to each article record when it enters the
PubMed system.
• The PMID is always found at the end of a PubMed
citation.
• PubMed Central (PMC) is a free digital system that
archives publicly accessible full-text scholarly articles that
have been published within the biomedical and life
sciences journal literature.
• A "PubMed Mobile" option, providing access to a mobile
Entrez
• WWW-based data retrieval system.
• Developed by NCBI (National Center for Biotechnology
Information).
• Integrates information held in different databases.
Databases covered by Entrez:
• Nucleic acid: GenBank, RefSeq, PDB
• Protein sequences: SWISS-PROT, PIR
• 3D structures: MMDB
• Genomes: many sources
• PopSet: from GenBank
• OMIM: OMIM
• Taxonomy: NCBI taxonomy database
• Books: Bookshelf
• ProbeSet: GEO (Gene Expression Omnibus)
• Literature: PubMed
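Entrez searches can also be issued programmatically. The sketch below builds a search URL for NCBI's E-utilities `esearch` service. The mapping from the slide's categories to E-utilities database names is an assumption made for this example, as are the helper name `entrez_search_url` and the query term. No request is sent.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Slide category -> E-utilities database name (assumed mapping, subset only).
ENTREZ_DB = {
    "Nucleic acid": "nuccore",
    "Protein": "protein",
    "3D structures": "structure",
    "Taxonomy": "taxonomy",
    "Literature": "pubmed",
}

def entrez_search_url(category, term, retmax=5):
    """Build an esearch URL for one Entrez database and a free-text query."""
    params = {"db": ENTREZ_DB[category], "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

url = entrez_search_url("Literature", "superoxide dismutase Listeria")
print(url)
```

The same term can be sent to a different database by changing only the category, which reflects how Entrez integrates information held in different databases.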
SRS
• SRS is the Sequence Retrieval System
• Data retrieval tool developed by EBI
• Integrates 80 molecular biology databases
• Open-source software (can be installed locally)
• SRS has an associated scripting language called Icarus
• Central resource for molecular biology data
• More than 250 databanks have been indexed; more than 35 SRS
servers are available over the WWW (worldwide)
• Information retrieval
• Easy way to retrieve information from sequence and sequence-related
databases
• Possibility to search for multiple words/other criteria
• Linkage between different databases
• E.g. find all primary structures with known three-dimensional structures
• Different types of database in SRS
• Sequence & structure
• DNA, protein, three-dimensional structures
• Sequence-related
• Gene-related
• Genome, mapping, mutations, transcription factors
• SNP
• Bibliographic
• SRS main toolbar tabs:
• Top Page: displays databases in different database groups
• Query: displays either the standard or extended query form
• Results or “the query manager”: maintains a history of all the
results obtained during a session
• Projects or “the project manager”: maintains a history of all
queries and views used during a session
• Views: allows a user to define a user specific view for one or
more databases
• Databanks: contains a list and some facts about the databases
available in the system
• Search terms in SRS
• SRS indexed fields can be searched using any of the following:
• Single-word search
• Multiple-word phrases
• Numbers and dates
• Regular expressions
• Wildcards
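The difference between single-word, wildcard and regular-expression searches can be illustrated outside SRS. The sketch below uses Python's `fnmatch` and `re` modules as stand-ins. It does not reproduce SRS query syntax, and the entry IDs and field values are made up.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical indexed "description" field values, keyed by made-up entry IDs.
descriptions = {
    "E1": "superoxide dismutase",
    "E2": "superoxide reductase",
    "E3": "catalase",
}

def search(field_values, *, word=None, wildcard=None, regex=None):
    """Return the IDs of entries whose field matches every criterion given."""
    hits = []
    for entry_id, text in field_values.items():
        words = text.split()
        if word is not None and word not in words:
            continue   # single-word search: exact word match
        if wildcard is not None and not any(fnmatchcase(w, wildcard) for w in words):
            continue   # wildcard search: shell-style pattern on each word
        if regex is not None and not re.search(regex, text):
            continue   # regular expression on the whole field
        hits.append(entry_id)
    return hits

print(search(descriptions, word="superoxide"))
print(search(descriptions, wildcard="dismut*"))
print(search(descriptions, regex=r"(dismut|catal)ase$"))
```

Passing several criteria at once combines them, which is similar in spirit to searching for multiple words or other criteria in one query.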
LocusLink
• LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink) is a National
Center for Biotechnology Information (NCBI) online resource.
• It is principally intended for use by graduate students and
professional researchers in the biomedical sciences.
• It is designed to bring together related information on genetic loci
and gene products from several sources.
• LocusLink provides a central point of access for basic biomedical
information and molecular data for genes, transcripts, and proteins
from model organisms, currently including human, rat, mouse,
fruit fly, and zebrafish.
• It is no longer available at NCBI.

LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 

Data base in detail

  • 2. Data: Data are raw, unorganized facts that need to be processed. Example: each student's test score is one piece of data. Information: When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information. Vartika's Presentation
  • 3. What is a database? • Databases are convenient systems for properly storing, searching and retrieving any type of data. • A database makes it easy to handle and share large amounts of data and supports large-scale analysis through easy access and data updating.
  • 4. What is a biological database? • Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology and computational analysis. • They contain information from genomics, proteomics and microarray gene expression. • Information contained in biological databases includes function, gene structure, localization (both cellular and chromosomal), biological sequences and structures.
  • 5. The major purposes of these databases are: • Availability of biological data. • Systematization of data. • Analysis of computed biological data.
  • 6. History: • 1956: insulin became the first protein to be sequenced (51 amino acids). • The Atlas of Protein Sequence and Structure, published in 1965 by Margaret Dayhoff et al., was a printed book. • It became the basis for the Protein Information Resource (PIR). • First nucleotide sequence: yeast tRNA (77 bases). • During this time the 3D structure of proteins was being studied and the renowned Protein Data Bank (PDB) was established. • The first complete genome of a free-living organism, the bacterium Haemophilus influenzae, was published in 1995.
  • 7. Features of biological databases: 1) Data heterogeneity 2) High data volume 3) Uncertainty 4) Data curation 5) Large-scale data integration 6) Data sharing 7) Dynamic and subject to change
  • 8. Classification scheme for biological databases: data type, maintenance status, data access, data source, database design, organism.
  • 9. Data types:
  • 10. Based on data sources
  • 11. Content based: genome database, sequence database, structure database, microarray database, chemical database, pathway database, enzyme database, disease database, literature database.
  • 12. Based on maintenance status: NCBI, EMBL, SIB.
  • 13. Based on data access: 1) Publicly available 2) Available with copyright 3) Browsing only, accessible but not downloadable 4) Academic but not freely available 5) Proprietary commercial 6) Restricted
  • 16. Database architecture: an information system consists of a query system, a storage system and data. Query system: Google, Entrez, SRS (takes your search keywords). Storage system: Oracle, MySQL, PC binary files, Unix text files, bookshelves. Data: GenBank flat file, PDB file, interaction record, title of a book, book.
  • 17. A sequence retrieval and manipulation network. Databases: DNA (NCBI-GenBank, DDBJ, EBI-EMBL); protein (PIR, SWISS-PROT, ExPASy, PDB). Software: GCG, SeqWeb, Vector NTI, GenoMax. Retrieval systems: Entrez, SRS. Formats: GenBank, GCG, FASTA, Staden, image. Sequence converter. Information: sequence, PDB, image.
  • 18. Types of biological databases: • Primary databases. • Secondary databases.
  • 19. Primary databases: These are the primary sources of data, used to store nucleic acid and protein sequences and structural information of biological macromolecules. Some primary databases: • NCBI (National Center for Biotechnology Information) • GenBank • DDBJ (DNA Data Bank of Japan) • SWISS-PROT • PIR (Protein Information Resource) • PDB (Protein Data Bank). The sequence collections in these databases result from the efforts of basic research in academic, industrial and sequencing labs.
  • 20. GenBank/EMBL/DDBJ: International Nucleotide Sequence Database. DDBJ: DNA Data Bank of Japan. CIB: Center for Information Biology and DNA Data Bank of Japan. NIG: National Institute of Genetics. IAM: International Advisory Meeting. ICM: International Collaborative Meeting. EMBL: European Molecular Biology Laboratory. EBI: European Bioinformatics Institute. NCBI: National Center for Biotechnology Information. NLM: National Library of Medicine.
  • 21. Secondary databases • A secondary database contains additional information derived from the analysis of data available in primary sources. • Secondary databases are analysed in a variety of ways and contain different information in different formats. • Some secondary databases: • TrEMBL • Pfam • PROSITE • Profiles • SCOP • CATH
  • 22. Flat file storage data formats • When GenBank, EMBL and DDBJ formed a collaboration (1986), sequence databases had moved to a defined flat file format with a shared feature table format and annotation standards. • The flat file formats from the sequence databases are still used to access and display sequence and annotation. They are also convenient for storage of local copies.
  • 27. The National Center for Biotechnology Information (Bethesda, MD). Created in 1988 as part of the National Library of Medicine at NIH to: – establish public databases – conduct research in computational biology – develop software tools for sequence analysis – disseminate biomedical information
  • 28. NCBI databases and services • GenBank: primary sequence database • Free public access to biomedical literature • PubMed: free MEDLINE (3 million searches per day) • PubMed Central: full-text online access • Entrez: integrated molecular and literature databases • BLAST: highest-volume sequence search service (100–200 K searches per day) • VAST: structure similarity searches • Software and databases
  • 29. GenBank (Genetic Sequence Databank) • GenBank® is the genetic sequence database at the National Center for Biotechnology Information (NCBI). • It was established in 1982 and is now maintained by NCBI. • DNA sequences can be submitted to GenBank using several different methods. • It contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects.
  • 30. • It has a flat file structure: an ASCII text file that is readable and downloadable by both humans and computers. • There are two main ways of making batch sequence submissions to GenBank: NCBI's Barcode Submission Tool (BarSTool) and Sequin.
  • 33. EMBL • The European Molecular Biology Laboratory (EMBL) is a molecular biology research institution supported by 22 member states, four prospect and two associate member states. • EMBL was created in 1974 and is an intergovernmental organisation funded by public research money from its member states. • The Laboratory operates from five sites: the main laboratory in Heidelberg, and outstations in Hinxton (the European Bioinformatics Institute (EBI), in England), Grenoble (France), Hamburg (Germany) and Monterotondo (near Rome). • EMBL groups and laboratories perform basic research in molecular biology and molecular medicine as well as training for scientists, students and visitors. • Israel is the only Asian state that has full membership. • The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI).
  • 34. • It is used to incorporate and distribute nucleotide sequences from public sources. • The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). • Data are exchanged between the collaborating databases on a daily basis. • The web-based tool Webin is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data.
  • 35. • Automatic submission procedures are used for submission of data from large-scale genome sequencing. • The latest data collection can be accessed via FTP, email and WWW interfaces. • The EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. • For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database and other databases. • All available resources can be accessed via the EBI home page at
  • 43. EMBL format
      ID   LISOD      standard; DNA; PRO; 756 BP.
      XX
      AC   X64011; S78972;
      XX
      SV   X64011.1
      XX
      DT   28-APR-1992 (Rel. 31, Created)
      DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
      XX
      DE   L.ivanovii sod gene for superoxide dismutase
      XX
      KW   sod gene; superoxide dismutase.
      XX
      OS   Listeria ivanovii
      OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
      OC   Bacillus/Staphylococcus group; Listeria.
      XX
      RN   [1]
      RX   MEDLINE; 92140371.
      RA   Haas A., Goebel W.;
      RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
      RT   functional complementation in Escherichia coli and characterization of the
      RT   gene product.";
  • 44.
      RL   Mol. Gen. Genet. 231:313-322(1992).
      XX
      RN   [2]
      RP   1-756
      RA   Kreft J.;
      RT   ;
      RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
      RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
      RL   Hubland, 8700 Wuerzburg, FRG
      XX
      DR   SWISS-PROT; P28763; SODM_LISIV.
      XX
      FH   Key             Location/Qualifiers
      FH
      FT   source          1..756
      FT                   /db_xref="taxon:1638"
      FT                   /organism="Listeria ivanovii"
      FT                   /strain="ATCC 19119"
      FT   RBS             95..100
      FT                   /gene="sod"
      FT   terminator      723..746
      FT                   /gene="sod"
      FT   CDS             109..717
      FT                   /db_xref="SWISS-PROT:P28763"
      FT                   /transl_table=11
      FT                   /gene="sod"
      FT                   /EC_number="1.15.1.1"
      FT                   /product="superoxide dismutase"
      FT                   /protein_id="CAA45406.1"
      FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
      FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
      FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
      FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
      XX
      SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
           [sequence data lines, 60 bases per line]
  • 45. ID - Identification. AC - Accession number(s). DT - Date. DE - Description. GN - Gene name(s). OS - Organism species. OG - Organelle. OC - Organism classification. RN - Reference number. RP - Reference position. RC - Reference comments. RX - Reference cross-references. RA - Reference authors. RL - Reference location. CC - Comments or notes. DR - Database cross-references. KW - Keywords. FT - Feature table data. SQ - Sequence header. (blanks) - Sequence data. // - Termination line. Some entries do not contain all of the line types, and some line types occur many times in a single entry. Each entry must begin with an identification line (ID) and end with a terminator line (//).
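
The line-code layout described above lends itself to a very small parser. The following is only a minimal sketch in Python, not an official EMBL tool: it assumes the two-letter code sits in the first two columns and the content starts at column 6, and the entry it reads is a shortened, hand-made illustration (the 12-base sequence and its SQ counts are hypothetical). The function name `parse_embl_entry` is also an assumption of this sketch.

```python
from collections import defaultdict

# Shortened, hand-made entry for illustration only; the 12-base
# sequence and the counts on the SQ line are hypothetical.
SAMPLE_ENTRY = """\
ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
DE   L.ivanovii sod gene for superoxide dismutase
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
SQ   Sequence 12 BP; 3 A; 1 C; 3 G; 5 T; 0 other;
     cgttatttaa gg                                                      12
//
"""


def parse_embl_entry(text):
    """Group the lines of one entry by two-letter code.

    Returns (fields, sequence): fields maps each line code to the list
    of its content lines; sequence is the concatenated sequence data.
    """
    fields = defaultdict(list)
    sequence_parts = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore empty lines
        if line.startswith("//"):
            break                         # terminator line ends the entry
        code, content = line[:2], line[5:].strip()
        if code == "XX":
            continue                      # spacer line, carries no data
        if code == "  ":
            # Sequence data line: keep letters, drop spaces and the
            # running base count at the end of the line.
            sequence_parts.append("".join(c for c in content if c.isalpha()))
        else:
            fields[code].append(content)
    return dict(fields), "".join(sequence_parts)


if __name__ == "__main__":
    fields, sequence = parse_embl_entry(SAMPLE_ENTRY)
    print(fields["ID"][0])
    print(fields["OC"])
    print(sequence)
```

Codes that occur several times in one entry (here OC) simply accumulate in a list, which mirrors the remark that some line types occur many times in a single entry.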
  • 46. PubMed • PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. • The PubMed system was offered free to the public in 1997. • The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. • PMID is the unique identifier number used in
  • 47. • They are assigned to each article record when it enters the PubMed system. • The PMID is always found at the end of a PubMed citation. • PubMed Central (PMC) is a free digital system that archives publicly accessible full-text scholarly articles that have been published within the biomedical and life sciences journal literature. • A "PubMed Mobile" option, providing access to a mobile
  • 55. Entrez • WWW-based data retrieval system. • Developed by NCBI (National Center for Biotechnology Information). • Integrates information held in different databases.
  • 56. Databases covered by Entrez: • Nucleic acid: GenBank, RefSeq, PDB. • Protein sequences: SWISS-PROT, PIR. • 3D structures: MMDB • Genomes: many sources • PopSet: from GenBank • OMIM: OMIM • Taxonomy: NCBI taxonomy database • Books: Bookshelf • ProbeSet: GEO (Gene Expression Omnibus) • Literature: PubMed
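
The slides describe Entrez only as a web-based retrieval system. As an illustration of how the same databases can be queried from a script, the sketch below builds request URLs for NCBI's E-utilities (`esearch.fcgi` and `efetch.fcgi`). This is an assumption of the sketch, not something the slides cover: the helper names `esearch_url` and `efetch_url` are hypothetical, the search term is an arbitrary example, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed here; not named in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build a search URL: which records in `db` match `term`?"""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_url(db, record_id, rettype, retmode="text"):
    """Build a retrieval URL for one record in the requested format."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    # Literature search in PubMed (example term only).
    print(esearch_url("pubmed", "superoxide dismutase Listeria", retmax=5))
    # Flat-file retrieval of a nucleotide record by accession number.
    print(efetch_url("nucleotide", "X64011", rettype="gb"))
```

The URLs can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the query parameters easy to inspect.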
  • 65. SRS • SRS is a Sequence Retrieval System • Data retrieval tool developed by EBI • Integrates 80 molecular biology databases • Open-source software (can be installed locally) • SRS has an associated scripting language called Icarus • Central resource for molecular biology data • More than 250 databanks have been indexed; more than 35 SRS servers are available over the WWW (worldwide)
  • 66. • Information retrieval • Easy way to retrieve information from sequence and sequence-related databases • Possibility to search for multiple words/other criteria • Linkage between different databases • E.g. Find all primary structures with known three-dimensional • Different types of database in SRS • Sequence & structure • DNA, protein, three-dimensional structures • Sequence-related • Gene-related • Genome, mapping, mutations, transcription factors • SNP • Bibliographic
  • 67. • SRS main toolbar tabs: • Top Page: displays databases in different database groups • Query: displays either the standard or extended query form • Results or “the query manager”: maintains a history of all the results obtained during a session • Projects or “the project manager”: maintains a history of all queries and views used during a session • Views: allows a user to define a user specific view for one or more databases • Databanks: contains a list and some facts about the databases available in the system
  • 68. • Search terms in SRS • SRS indexed fields can be searched using any of the • Single word search • Multiple word phrases • Numbers and dates • Regular expressions • Wildcards
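
The kinds of search term listed above can be illustrated on a tiny in-memory index. The sketch below is plain Python and does not reproduce SRS or Icarus query syntax; the records (apart from the LISOD description taken from the EMBL example) and all function names are hypothetical and exist only to show how each kind of term behaves.

```python
import re
from fnmatch import fnmatchcase

# Toy records; HSINS and SCTRNA are invented for illustration.
RECORDS = [
    {"id": "LISOD", "description": "L.ivanovii sod gene for superoxide dismutase", "year": 1992},
    {"id": "HSINS", "description": "human insulin gene", "year": 1985},
    {"id": "SCTRNA", "description": "yeast alanine tRNA", "year": 1990},
]


def _words(text):
    return text.lower().split()


def search_word(records, word):
    """Single word search: the word must appear as a whole token."""
    return [r["id"] for r in records if word.lower() in _words(r["description"])]


def search_phrase(records, phrase):
    """Multiple word phrase: the words must appear together, in order."""
    return [r["id"] for r in records if phrase.lower() in r["description"].lower()]


def search_wildcard(records, pattern):
    """Wildcard search: '*' matches any run of characters within a word."""
    pattern = pattern.lower()
    return [
        r["id"] for r in records
        if any(fnmatchcase(w, pattern) for w in _words(r["description"]))
    ]


def search_regex(records, pattern):
    """Regular expression search over the whole description."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r["id"] for r in records if rx.search(r["description"])]


def search_year_range(records, first, last):
    """Number/date search: inclusive range on a numeric field."""
    return [r["id"] for r in records if first <= r["year"] <= last]


if __name__ == "__main__":
    print(search_word(RECORDS, "insulin"))
    print(search_phrase(RECORDS, "superoxide dismutase"))
    print(search_wildcard(RECORDS, "dismut*"))
    print(search_regex(RECORDS, r"\btrna\b"))
    print(search_year_range(RECORDS, 1990, 1992))
```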
  • 72. LocusLink • LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink) was a National Center for Biotechnology Information (NCBI) online resource. • It was principally intended for use by graduate students and professional researchers in the biomedical sciences. • It was designed to bring together related information on genetic loci and gene products from several sources. • LocusLink provided a central point of access for basic biomedical information and molecular data for genes, transcripts and proteins from model organisms, including human, rat, mouse, fruit fly and zebrafish. • It is no longer available at NCBI, where it has been superseded by Entrez Gene.