The document discusses biological databases. It begins by defining what a database is, including that it is a collection of related data organized in a way that allows information to be retrieved easily. It then discusses different types of biological databases, including those containing nucleotide sequences, protein sequences, 3D structures, gene expression data, and metabolic pathways. The rest of the document provides details on specific biological databases like GenBank, EMBL, DDBJ, and NCBI databases including Entrez. It emphasizes that biological data is heterogeneous, large in volume, dynamic and integrated across multiple databases.
SWISS-PROT- Protein Database- The Universal Protein Resource Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins.
This presentation gives you a detailed information about the swiss prot database that comes under UniProtKB. It also covers TrEMBL: a computer annotated supplement to Swiss-Prot.
SWISS-PROT- Protein Database- The Universal Protein Resource Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins.
This presentation gives you a detailed information about the swiss prot database that comes under UniProtKB. It also covers TrEMBL: a computer annotated supplement to Swiss-Prot.
INTRODUCTION
WHAT IS DATA AND DATABASE?
WHAT IS BIOLOGICAL DATABASE?
TYPES OF BIOLOGICAL DATABASE
PRIMARY DATABASE
Nucleic acid sequence database
Protein sequence database
SECONDARY DATABASE
COMPOSITE DATABASE
TERTIARY DATABASE
WHY NEED?
CONCLUSION
REFRENCES
Sequence alig Sequence Alignment Pairwise alignment:-naveed ul mushtaq
Sequence Alignment Pairwise alignment:- Global Alignment and Local AlignmentTwo types of alignment Progressive Programs for multiple sequence alignment BLOSUM Point accepted mutation (PAM)PAM VS BLOSUM
INTRODUCTION OF BIOINFORMATICS
HISTORY
WHAT IS DATABASE
NEED FOR DATABASE
TYPES OF DATABASE
PRIMARY DATABASE
NUCLEIC ACID SEQUENCE DATABASE
GENE BANK
INTRODUCTION
GENE BANK SUBMISSION TOOL
GENE BANK SUBMISSION TYPE
HOW TO RETRIEVE DATA FROM GENEBANK
APPLICATION
CONCLUSION
REFERENCE
The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA sequences. It is located at the National Institute of Genetics (NIG) in the Shizuoka prefecture of Japan. It is also a member of the International Nucleotide Sequence Database Collaboration or INSDC.
INTRODUCTION.
NCBI.
EMBL.
DDBJ.
CONCLUSION.
REFERENSE.
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health.
The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper.
The NCBI houses a series of databases relevant to biotechnology and biomedicine. Major databases include GenBank for DNA sequences and PubMed, a bibliographic database for the biomedical literature.
All these databases are available online through the Entrez search engine.
Lecture delivered by T. Ashok Kumar, Head, Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil, Thuckalay, INDIA. UGC Sponsored National Workshop on BIOINFORMATICS AND GENOME ANALYSIS for College Teachers on August 11 & 12, 2014. Organized by Centre for Bioinformatics, Department of Zoology, NMCC.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data.
INTRODUCTION
WHAT IS DATA AND DATABASE?
WHAT IS BIOLOGICAL DATABASE?
TYPES OF BIOLOGICAL DATABASE
PRIMARY DATABASE
Nucleic acid sequence database
Protein sequence database
SECONDARY DATABASE
COMPOSITE DATABASE
TERTIARY DATABASE
WHY NEED?
CONCLUSION
REFRENCES
Sequence alig Sequence Alignment Pairwise alignment:-naveed ul mushtaq
Sequence Alignment Pairwise alignment:- Global Alignment and Local AlignmentTwo types of alignment Progressive Programs for multiple sequence alignment BLOSUM Point accepted mutation (PAM)PAM VS BLOSUM
INTRODUCTION OF BIOINFORMATICS
HISTORY
WHAT IS DATABASE
NEED FOR DATABASE
TYPES OF DATABASE
PRIMARY DATABASE
NUCLEIC ACID SEQUENCE DATABASE
GENE BANK
INTRODUCTION
GENE BANK SUBMISSION TOOL
GENE BANK SUBMISSION TYPE
HOW TO RETRIEVE DATA FROM GENEBANK
APPLICATION
CONCLUSION
REFERENCE
The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA sequences. It is located at the National Institute of Genetics (NIG) in the Shizuoka prefecture of Japan. It is also a member of the International Nucleotide Sequence Database Collaboration or INSDC.
INTRODUCTION.
NCBI.
EMBL.
DDBJ.
CONCLUSION.
REFERENSE.
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health.
The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation sponsored by Senator Claude Pepper.
The NCBI houses a series of databases relevant to biotechnology and biomedicine. Major databases include GenBank for DNA sequences and PubMed, a bibliographic database for the biomedical literature.
All these databases are available online through the Entrez search engine.
Lecture delivered by T. Ashok Kumar, Head, Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil, Thuckalay, INDIA. UGC Sponsored National Workshop on BIOINFORMATICS AND GENOME ANALYSIS for College Teachers on August 11 & 12, 2014. Organized by Centre for Bioinformatics, Department of Zoology, NMCC.
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data.
In the era of computers life sciences databases are still understated. Here is my presentation on biological databases. Complete classification of different databases.
For more presentations and work come and visit
https://www.linkedin.com/in/shradheya-r-r-gupta-54492984/
BIOLOGICAL DATABASES :
A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system.
The chief objective of the development of a database is to organize data in a set of structured records to enable easy retrieval of information.
Example. A few popular databases are GenBank from NCBI (National Center for Biotechnology Information), SwissProt from the Swiss Institute of Bioinformatics and PIR from the Protein Information Resource.
IMPORTANCE OF DATABASES :
1. Databases act as a store house of information.
2. Databases are used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
3. It allows knowledge discovery, which refers to the identification of connections between pieces of information that were not known when the information was first entered. This facilitates the discovery of new biological insights from raw data.
4. Secondary databases have become the molecular biologist’s reference library over the past decade or so, providing a wealth of information on just about any gene or gene product that has been investigated by the research community.
5. It helps to solve cases where many users want to access the same entries of data.
6. Allows the indexing of data.
7. It helps to remove redundancy of data.
TYPES OF BIOLOGICAL DATABASES:
Biological databases are classified on
1. Based on content of biological data
2. Based on the nature of data.
1. BASED ON CONTENT OF BIOLOGICAL DATA :
Based on their contents, biological databases can be roughly divided into two categories:
1. Primary databases
2. Secondary databases
Different types of relations found among organisms.
Relationships ensures food and space among organism.
These relationship is known as interaction.
On the basis of benefits and harmful effects – two types of interaction.
Positive interaction & Negative interaction.
Undesirable changes occurring in water which may harmfully affect the life activities of man and domesticated species.
An alternation in physical, chemical, biological characteristics of water making unsuitable for use.
Basic functional unit of ecology
Interacting system
Fundamental ecological Unit (ODUM)
Biotic and Abiotic factors
A.G.Tansley (1935)
Eco – environment and system – complex coordinated unit
Holocoenosis
Introduction:
Life table:
Life table is a comprehensive method of describing mortality, survival and other vital events in a population.
It is composed of several sets of values showing how a group of infants who are under unchanging conditions would gradually die.
It provides concise measures of longevity of that population.
Separate tables are prepared for males and females after each decennium census.
It is also called as the “Biometer” of the population by William Farr.
Vital statistics, the most important branch of statistics, deals with the mankind in aggregate.
It provides a description of the vital events occurring in given communities.
By vital events, we mean birth, death, sickness, marriage, divorce, fertility, etc.
It deals with people rather than with things.
Vital statistics are of much importance for the people and nation.
Vermitechnology means rearing of earthworms. earthworm is friend of farmer. earthworm is doing a great job and also produced a good organic manure is called vermicompost. vermicompost is a biofertilzer. which is enhancing soil qualities. This is explained earthworm biology, importance and preparation of vermicompost, vermiwash, panchgavya and their importance.
Setting an aquarium is an important steps to maintaining healthy ornamental fishes. It gives mind relaxation and peaceful. It is a hobby and reduces the stress also
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Water Industry Process Automation and Control Monthly - May 2024.pdf
Introduction to Biological databases
1.
2. BIOLOGICAL DATABASES
Dr. K. RameshKumar
Assistant Professor
PG & Research Dept. of Zoology
Vivekananda College
Tiruvedakam- West
MADURAI - 625234
3. WHAT IS DATABASE?
A database is any collection of related data.
A Computerized archive used to store and organize
data in such a way that information can be retrieved
easily.
A database is a collection of interrelated data store
together without harmful and unnecessary
redundancy (duplicate data) to serve multiple
applications
Retrieving is called firing a query
4. Structured collection of information.
Consists of basic units called records or entries.
Each record consists of fields, which hold pre-
defined data related to the record.
For example, a protein database would have
protein entries as records and protein properties as
fields (e.g., name of protein, length, amino-acid
sequence)
5. THE ‘PERFECT’ DATABASE
Comprehensive, but easy to search.
Annotated, but not “too annotated”.
A simple, easy to understand structure.
Cross-referenced.
Minimum redundancy.
Easy retrieval of data.
6.
7. BIOLOGICAL DATABASES
Data Heterogeneity – diversity in the data types
High Volume of data – highly heterogeneous-
voluminous to support comprehensive investigation in
various fields
Uncertainty – Biological phenomena are observed and
assumed to be true
Data Curation – inconsistencies are due to a lack of
knowledge in the desired field
Large scale data integration – data collected from lab
worldwide after years of research
Data sharing – Scientific community examination and
inspection
Dynamic and subject to continual change – Everyday
in various lab
8. BIOLOGICAL DATABASES
Need for storing and communicating large
datasets has grown
Make biological data available to scientists.
To make biological data available in computer-
readable form.
9.
10.
11. DIFFERENT CLASSIFICATIONS OF
DATABASES
Type of data
nucleotide sequences
protein sequences
proteins sequence patterns or motifs
macromolecular 3D structure
gene expression data
metabolic pathways
12. DIFFERENT CLASSIFICATIONS OF DATABASES….
Primary or derived databases
Primary databases: experimental results directly into
database
Secondary databases: results of analysis of primary
databases
Aggregate of many databases
Links to other data items
Combination of data
Consolidation of data
13. DIFFERENT CLASSIFICATIONS OF DATABASES….
Availability
Publicly available, no restrictions
Available, but with copyright
Accessible, but not downloadable
Academic, but not freely available
Proprietary, commercial; possibly free for academics
14. THE CENTRAL DOGMA & BIOLOGICAL DATA
Protein structures
-Experiments
-Models (homologues)
Literature information
Original DNA Sequences
(Genomes)
Protein Sequences
-Inferred
-Direct sequencing
Expressed DNA sequence
( = mRNA Sequences
= cDNA sequences)
Expressed Sequence Tag
(ESTs)
15.
16. NATIONAL CENTER FOR
BIOTECHNOLOGY INFORMATION
Created in 1988 as a part of the
National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
Bethesda,MD
18. 18
NCBI AND ENTREZ
One of the largest and most comprehensive
databases belonging to the NIH – national
institute of health (USA)
Entrez is the search engine of NCBI
Search for :
genes, proteins, genomes, structures, diseases,
publications and more.
http://www.ncbi.nlm.nih.gov/
19. PRIMARY DATABASES
This databases contains the raw nucleic acid
sequence data which are produced and submitted
by researchers worldwide.
• Nucleic acid
EMBL
GenBank
DDBJ (DNA Data Bank of Japan)
• Protein
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D
20. NUCLEOTIDE SEQUENCE DATABASES
EMBL, GenBank, and DDBJ are the three primary
nucleotide sequence databases
EMBL www.ebi.ac.uk/embl/
GenBank www.ncbi.nlm.nih.gov/Genbank/
DDBJ www.ddbj.nig.ac.jp
They together constitute the International
Nucleotide Sequence database callaboration.
21. GENBANK
An annotated collection of all publicly available
nucleotide and proteins
Set up in 1979 at the LANL (Los Alamos).
Maintained since 1992 NCBI (Bethesda).
http://www.ncbi.nlm.nih.gov
24. EMBL NUCLEOTIDE SEQUENCE
DATABASE
An annotated collection of all publicly available
nucleotide and protein sequences
Created in 1980 at the European Molecular
Biology Laboratory in Heidelberg.
Maintained since 1994 by EBI- Cambridge.
http://www.ebi.ac.uk/embl.html
25. DDBJ–DNA DATA BANK OF JAPAN
An annotated collection of all publicly
available nucleotide and protein sequences
Started, 1984 at the National Institute of
Genetics (NIG) in Mishima.
Still maintained in this institute a team led
by Takashi Gojobori.
http://www.ddbj.nig.ac.jp
26. NCBI DATABASES AND SERVICES
GenBank primary sequence database
Free public access to biomedical literature
PubMed free Medline (3 million searches per day)
PubMed Central full text online access
Entrez integrated molecular and literature databases
27. TYPES OF MOLECULAR DATABASES
Primary Databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, Trace, SRA, SNP, GEO
Derivative Databases
Derived from primary data
Content controlled by third party (NCBI)
Examples: NCBI Protein, Refseq, TPA, RefSNP, GEO datasets, UniGene,
Homologene, Structure, Conserved Domain
28. PRIMARY VS. DERIVATIVE SEQUENCE
DATABASES
GenBank
Sequencing
Centers
TATAGCCG TATAGCCG
TATAGCCG TATAGCCG
Labs
Algorithms
UniGene
Curators
RefSeq
Genome
Assembly
TATAGCCG
AGCTCCGATA
CCGATGACAA
Updated
continually
by NCBI
Updated ONLY
by submitters
29. SEQUENCE DATABASES AT NCBI
Primary
GenBank: NCBI’s primary sequence database
Trace Archive: reads from capillary sequencers
Sequence Read Archive: next generation data
Derivative
GenPept (GenBank translations)
Outside Protein (UniProt—Swiss-Prot, PDB)
NCBI Reference Sequences (RefSeq)
30. GENBANK - PRIMARY SEQUENCE DB
Nucleotide only sequence database
Archival in nature
Historical
Reflective of submitter point of view (subjective)
Redundant
Data
Direct submissions (traditional records)
Batch submissions
FTP accounts (genome data)
31. GENBANK - PRIMARY SEQUENCE DB (2)
Three collaborating databases
1. GenBank
2. DNA Database of Japan (DDBJ)
3. European Molecular Biology Laboratory (EMBL) Database
32. TRADITIONAL GENBANK RECORD
ACCESSION U07418
VERSION U07418.1 GI:466461
Accession
•Stable
•Reportable
•Universal
Version
Tracks changes in sequence
GI number
NCBI internal use
well annotated
the sequence is the data
34. FEATURES Location/Qualifiers
source 1..2484
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="3"
/map="3p22-p23"
gene 1..2484
/gene="MLH1"
CDS 22..2292
/gene="MLH1"
/note="homolog of S. cerevisiae PMS1 (Swiss-Prot Accession
Number P14242), S. cerevisiae MLH1 (GenBank Accession
Number U07187), E. coli MUTL (Swiss-Prot Accession Number
P23367), Salmonella typhimurium MUTL (Swiss-Prot Accession
Number P14161) and Streptococcus pneumoniae (Swiss-Prot
Accession Number P14160)"
/codon_start=1
/product="DNA mismatch repair protein homolog"
/protein_id="AAC50285.1"
/db_xref="GI:463989"
/translation="MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS
TSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGE
ALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRS
GENPEPT: GENBANK CDS TRANSLATIONS
>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
35. REFSEQ: DERIVATIVE SEQUENCE DATABASE
Curated transcripts and proteins
Model transcripts and proteins
Assembled Genomic Regions
Chromosome records
Human genome
microbial
organelle
ftp://ftp.ncbi.nih.gov/refseq/release/
38. REFSEQS: ANNOTATION REAGENTS
Genomic DNA
(NC, NT, NW)
Model mRNA (XM)
(XR)
Curated mRNA (NM)
(NR)
Model protein (XP)
Curated Protein (NP)
Scanning....
= ?
GenBank
Sequences
RefSeq
39. REFSEQ BENEFITS
Non-redundancy
Updates to reflect current sequence data and biology
Data validation
Format consistency
Distinct accession series
Stewardship by NCBI staff and collaborators
42. ENTREZ: A DISCOVERY SYSTEM
Gene
Taxonomy
PubMed
abstracts
Nucleotide
sequences
Protein
sequences
3-D
Structure
3 -D
Structure
Word weight
VAST
BLAST
BLAST
Phylogeny
Hard Link
Neighbors
Related Sequences
Neighbors
Related Sequences
BLink
Domains
Neighbors
Related Structures
Pre-computed and pre-compiled data.
•A potential “gold mine” of undiscovered
relationships.
•Used less than expected.
43. GLOBAL QUERY: ALL NCBI DATABASES
The Entrez system: 38 (and counting) integrated databases
44. TRADITIONAL METHOD: THE LINKS MENU
DNA Sequence
Nucleotide – Protein Link
Related Proteins
Protein – Structure Link
3-D Structure
45. THE PROBLEM
Rapidly growing databases with complex and changing
relationships
Rapidly changing interfaces to match the above
Result
Many people don’t know:
Where to begin
Where to click on a Web page
Why it might be useful to click there
53. ‘TAKE HOME MESSAGE’ ADVANTAGES OF
DATA INTEGRATION
More relevant inter-related information in one place
Makes it easier to find additional relevant
information related to your initial query
Potentially find information indirectly linked, but
relevant to your subject of interest
uncover non-obvious genetic features that explain
phenotype or disease
Easier to build a ‘story’ based on multiple pieces of
biological evidence