INFORMATION RETRIEVAL FROM DATABASES
SRS (Sequence Retrieval System)
SRS is a network browser for databases in molecular biology, developed
to help EMBnet users.
It allows any flat-file database to be indexed to any other, so that
users can retrieve, link and access entries from all the interconnected
resources.
The system links nucleic acid, protein sequence, structure,
pattern and bibliographic databases.
SRS is an integrated system for information retrieval from many different
sequence databases and for feeding the retrieved sequences into analytic
tools such as sequence comparison and alignment programs.
It can search a total of 141 databases of protein and nucleotide
sequences, metabolic pathways, 3D structures and functions,
genomes, diseases and phenotype information.
….SRS
SRS performs searches on the following categories:
References                          Sequence libraries - complete
Sequence libraries - subsections    InterPro&Related
SeqRelated                          TransFac
User Owned Databanks                Application Results
Protein3DStruct                     Genome
Mapping                             Mutations
Locus Specific Mutations            Metabolic Pathways
Others                              SNP
EMBOSS DOCS                         System
Searches can be carried out using:
• Quick search on all entries
• Standard form with Boolean operators
• Extended form with field names
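The three search forms can be pictured with a toy in-memory example. This is only an illustrative sketch, not SRS itself: the entries, the field names (`description`, `organism`) and the helper functions are invented for the example.

```python
# Toy illustration of quick, Boolean and field-restricted searches.
# The entries, field names and helpers are hypothetical, not part of SRS.

ENTRIES = {
    "P00001": {"description": "serine protein kinase", "organism": "human"},
    "P00002": {"description": "tyrosine protein kinase", "organism": "mouse"},
    "P00003": {"description": "DNA polymerase", "organism": "human"},
}


def field_search(field, word):
    """Extended form: match a word in one named field only."""
    return {acc for acc, rec in ENTRIES.items() if word in rec[field].split()}


def quick_search(word):
    """Quick search: match a word in any field of any entry."""
    return {
        acc
        for acc, rec in ENTRIES.items()
        if any(word in text.split() for text in rec.values())
    }


# Standard form: result sets are combined with Boolean operators.
kinases = field_search("description", "kinase")
humans = field_search("organism", "human")

human_kinases = kinases & humans       # AND
kinase_or_human = kinases | humans     # OR
non_human_kinases = kinases - humans   # AND NOT

print(sorted(human_kinases))
print(sorted(non_human_kinases))
print(sorted(quick_search("human")))
```

Treating each search result as a set of accession codes keeps the Boolean operators trivial: AND, OR and AND NOT become set intersection, union and difference.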
SRS- Sequence Retrieval System
- Facilitates the browsing of biological databases.
- Paragon of connectivity: allows the user to explore virtually all
existing molecular biology databases/databanks installed at different
locations, making full use of the inter-database links.
- Maintains indexes relating entries in one databank to entries in others.
- Maintains a large network of related databank entries.
- Provides search tools that enable users to retrieve particular entries or
to find the accession codes of entries that satisfy specific criteria, e.g.,
entries with fields that match a given pattern string, or fields with
numeric values in a given range.
- Provides link operators that enable users to follow a chain of cross
references between two databanks where there is no direct link between
the two.
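Following a chain of cross references can be sketched as a breadth-first walk over a link index. The sketch below is an assumption-laden illustration, not the SRS link operators: the databank names, accession codes, links and the `follow_links` helper are all made up.

```python
# Toy sketch of following cross references between databanks.
# Databank names, accession codes and links are invented for illustration.
from collections import deque

# Direct link index: (databank, accession) -> directly linked entries.
LINKS = {
    ("nucleotide", "X0001"): [("protein", "P0001")],
    ("protein", "P0001"): [("structure", "1ABC"), ("literature", "REF42")],
    ("structure", "1ABC"): [],
    ("literature", "REF42"): [],
}


def follow_links(start, target_databank):
    """Breadth-first walk over the link index.

    Returns the chain of entries from `start` to the first entry found in
    `target_databank`, or None if no chain of cross references exists.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1][0] == target_databank:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# The nucleotide entry has no direct link to a structure entry,
# but one is reachable through the intermediate protein entry.
chain = follow_links(("nucleotide", "X0001"), "structure")
print(chain)
```

Here the nucleotide entry reaches the structure entry only through the protein entry, which is the situation the last bullet describes.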
Although EMBnet has played a vital role in centralizing data resources
for its national user communities, a problem that emerged was that
there was no effective, efficient way of interrogating all the resources
gathered together at a particular site, since there was no common
format among the different database types.
As a result, a research project was undertaken within EMBnet to
address the problems inherent in interfacing such a complex environment,
i.e., the SEQUENCE RETRIEVAL SYSTEM (SRS).
SRS allows any flat-file database to be indexed to any other.
SRS has the advantage that the derived indices may be rapidly searched,
allowing users to retrieve, link and access entries from all the
interconnected resources.
The system has the particular strength that it can be readily customized
to use any defined set of databanks.
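The idea of deriving a searchable index from a flat file can be sketched as follows. This is a minimal illustration under stated assumptions, not the SRS indexing mechanism: the sample text imitates, in simplified form, the two-letter line codes and `//` terminator of EMBL-style flat files, and the entries and helper names are invented.

```python
# Minimal sketch of indexing a flat-file databank by field.
# The sample imitates EMBL-style two-letter line codes in simplified form;
# the entries are invented and each code appears once per entry.
from collections import defaultdict

FLAT_FILE = """\
ID   ENTRY_ONE
AC   A00001
DE   example kinase
//
ID   ENTRY_TWO
AC   A00002
DE   example polymerase
//
"""


def parse_entries(text):
    """Split the flat file into one {line code: value} dict per entry."""
    entry = {}
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            if entry:
                yield entry
            entry = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            entry[code] = value.strip()


def build_index(entries):
    """Inverted index: (line code, word) -> set of accession codes."""
    index = defaultdict(set)
    for entry in entries:
        for code, value in entry.items():
            for word in value.lower().split():
                index[(code, word)].add(entry["AC"])
    return index


index = build_index(parse_entries(FLAT_FILE))
print(sorted(index[("DE", "kinase")]))
print(sorted(index[("DE", "example")]))
```

Once the index exists, a lookup is a single dictionary access rather than a scan of the whole file, which is why derived indices can be searched rapidly.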
Typical resource links:
- Nucleic acids
- EST
- Protein sequence, pattern, structure
- Specialist/boutique and/or bibliographic databases
SRS is thus a very powerful tool, allowing users to
formulate queries across a range of different database
types via a single interface, without having to worry
about underlying data structures, query languages, and so
on.
SEQUENCE RETRIEVAL SYSTEM
Inter-relationship between various databases on Entrez
Entrez Databases and their description
Database       Description
PubMed         The biomedical literature
Nucleotide     Sequence database (GenBank)
Protein        Sequence database
Structure      3-D molecular structures
Genome         Complete genome assemblies
PopSet         Population study data sets
OMIM           Online Mendelian Inheritance in Man
Taxonomy       Organisms in GenBank
Books          BookShelf online books
3-D domains    Domains from Entrez Structure
UniSTS         Markers and mapping data
SNP            Single nucleotide polymorphisms
CDD            Conserved domains
Journals       Journals in Entrez
UniGene        Gene-oriented clusters of transcript sequences
PMC            Full-text digital archive of life sciences journal literature
NCBI website   NCBI website search
Various Data Retrieval and Submission Tools at NCBI
Text term searching
1. Entrez- Provides integrated access to nucleotide and protein
sequence data from over 100,000 organisms, along with 3-D
protein structures, genomic mapping information, and PubMed
MEDLINE.
2. LinkOut- A registry service for creating links from specific
articles, journals, or biological data in Entrez to resources on
external websites.
3. Cubby- Allows Entrez users to store and update searches and to
customize their LinkOut display to include or exclude links to
providers.
4. Citation Matcher- Allows you to find the PubMed ID or the
MEDLINE UID of any article in the PubMed database, given its
bibliographic information.
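Entrez text term searches can also be issued programmatically through NCBI's E-utilities, which these slides do not cover. The following is a minimal sketch, assuming the public ESearch endpoint and its `db`, `term` and `retmax` parameters; the helper name `esearch_url` and the example query are illustrative only. The function merely builds the request URL and does not contact NCBI.

```python
# Sketch: build an Entrez ESearch request URL without sending it.
# The endpoint and parameter names are assumed from NCBI's E-utilities;
# the helper name and the example query are hypothetical.
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Return the URL for a text term search in one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = esearch_url("pubmed", "sequence retrieval system[Title]", retmax=5)
print(url)
```

The `[Title]` suffix restricts the term to a single field, which mirrors the field-restricted searching available in the Entrez web interface.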
Sequence Similarity Searching
1. BLAST Homepage- Access to BLAST programs, overview, help
documentation, and FAQs.
2. Blink- Displays results of BLAST searches that have been done for
every protein sequence in the Entrez Protein database.
3. Network-Client BLAST- A BLAST client (blastcl3) that accesses the
NCBI BLAST search engine.
Taxonomy
1. Taxonomy Browser- A tool for searching NCBI’s taxonomy
database.
2. Taxonomy BLAST- Groups BLAST hits by source organism
according to their classification in NCBI’s taxonomy database.
3. TaxTable- Summarizes BLAST taxonomy data and displays the
relationship of the organism to others through a color-coded graph.
4. ProtTable- Provides a summary of protein coding regions in Genome.
5. TaxPlot- Provides a 3-way view of genome similarities.
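The grouping performed by Taxonomy BLAST can be pictured with a small sketch. This is not NCBI's implementation: the hit records are invented, and the example groups by organism name only, whereas the real tool uses the full classification in the taxonomy database.

```python
# Toy sketch: group similarity-search hits by source organism.
# Hit identifiers, organisms and scores are invented for illustration.
from collections import defaultdict

# Hypothetical hits: (hit identifier, source organism, score).
HITS = [
    ("hit1", "Homo sapiens", 250.0),
    ("hit2", "Mus musculus", 180.5),
    ("hit3", "Homo sapiens", 120.0),
]


def group_by_organism(hits):
    """Return {organism: [(hit identifier, score), ...]}, best score first."""
    groups = defaultdict(list)
    for hit_id, organism, score in hits:
        groups[organism].append((hit_id, score))
    for members in groups.values():
        members.sort(key=lambda item: item[1], reverse=True)
    return dict(groups)


grouped = group_by_organism(HITS)
for organism, members in grouped.items():
    print(organism, members)
```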
Sequence Submission
1. Sequin- A data submission tool that includes an ORF finder, an alignment
viewer/editor, and a link to PowerBLAST.
2. BankIt- A WWW submission tool for one or a few simple sequence
submissions; it allows access to the following databases:
(a) EMBL Nucleotide database
(b) Swiss-Prot
(c) Macromolecular Structure database
(d) Array Express
(e) ENSEMBL
3. DBGET- an integrated database retrieval system
4. Biology Workbench- an integrated tool for performing searches on
protein and nucleotide databases.
5. BioRS- a retrieval system for biological data developed by Biomax
Informatics AG to perform various retrieval tasks.
