BioSQL: A Generic Relational Model for Bioinformatics
BI-691
•Generic Data Model
•Overview of BioSQL Schema
•Preface of BioSQL
•Dependency of BioSQL
•Introduction
•Installation BioSQL
•Application of BioSQL
•Advantages of BioSQL
•Limitation of BioSQL
•Conclusion
•References
The relational model is very important for database management
It conceptualizes real-world things into a logical model
First formulated and proposed in 1969 by Edgar F. Codd
The logical model is used to define relations and their relationships
Relational Model
•Table
•Tuple
•Relation Instance
•Relation Schema
•Relation Key
•Attribute Domain
•Key Constraint
•Domain Constraint
•Referential Integrity Constraint
In this model, data is represented as tuples, grouped into relations
A database organized in terms of the relational model is a relational database
The relational data model is the primary data model
It is used widely around the world for data storage and processing
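The key, domain and referential integrity constraints listed for the relational model can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and two made-up relations, organism and gene, which are not part of BioSQL; every table, column and value here is an assumption chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Relation schemas: each table names its attributes and their domains.
# Both tables are hypothetical and exist only for this illustration.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,                 -- key constraint
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,                 -- key constraint
    symbol      TEXT NOT NULL,
    strand      INTEGER CHECK (strand IN (-1, 1)),   -- domain constraint
    organism_id INTEGER NOT NULL
                REFERENCES organism(organism_id)     -- referential integrity
);
""")

# Tuples: one row per relation instance (illustrative values).
conn.execute("INSERT INTO organism VALUES (1, 'example organism')")
conn.execute("INSERT INTO gene VALUES (1, 'geneA', 1, 1)")


def rejected(sql):
    """Return True if the database refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_key = rejected("INSERT INTO gene VALUES (1, 'geneB', 1, 1)")        # key reused
bad_domain = rejected("INSERT INTO gene VALUES (2, 'geneB', 5, 1)")           # strand outside domain
dangling_reference = rejected("INSERT INTO gene VALUES (3, 'geneC', 1, 99)")  # no such organism
```

All three rejected inserts leave the gene relation with its single valid tuple.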
Generic Data Model
The generic data model is a generalization of the conventional data model
It defines standardised relation types
Consensus among different relational modelers can produce a generic model of a particular domain
Ewan Birney started BioSQL in 2001
Major redesign and refactorings in 2002-2003
PhyloDB module added in 2006
V1.0 released in March 2008
Not a query language: it is a schema / database model!
Covers sequences, features, sequence and feature annotation, a reference taxonomy, and ontologies
Relies on a highly normalized relational model
Local storage of global biological data
The BioSQL schema does not follow a strongly typed paradigm
Derived entities are always understood in the object-oriented sense
It follows a weakly typed paradigm
Generic, but can hold any number of specializations
•Annotation Bundle
•Seqfeature with Location and Annotation
•Ontology Term and Relationship
•Bioentry with Taxon and Namespaces
BioEntry & Taxon
•Biodatabase
•Bioentry
•Biosequence
•Bioentry Relationship
•Taxon
•Taxon Name
BIOENTRY is the core entity of BioSQL
It tracks any single entry or record in a biological database
The BIOENTRY contains information about the record's public name, public accession and version
A BIODATABASE is simply a collection of bioentries
One BIOENTRY may belong to only one BIODATABASE
One BIODATABASE may contain many bioentries
In BioSQL, all relations have bioentries
The BIOSEQUENCE table contains the raw sequence information associated with a BIOENTRY
It also holds alphabet information ('protein', 'dna', 'rna')
It has a one-to-one relationship with BIOENTRY
BIOENTRY records may themselves be related to one another (e.g., a PDB record may be composed of multiple subrecords for separate chains)
TAXON holds basic taxonomic information about the organism to which a given BIOENTRY refers
It reflects the structure of NCBI's taxonomy database
Each BIOENTRY can be associated with only one taxon
Many BIOENTRY records can be associated with the same taxon
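The cardinalities described above (one database holding many bioentries, at most one sequence per bioentry, many bioentries sharing one taxon) can be sketched with a heavily trimmed schema. This is not the official BioSQL DDL: the tables below are modeled loosely on it, most columns are omitted, and the database name, entry names and accessions are made-up placeholders. Python's built-in sqlite3 module is used only so the sketch runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed table definitions (not the full BioSQL schema).
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    taxon_id       INTEGER REFERENCES taxon(taxon_id),
    name           TEXT NOT NULL,
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL,
    UNIQUE (accession, biodatabase_id, version)
);
CREATE TABLE biosequence (
    -- Using bioentry_id as the primary key makes the relationship one-to-one.
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    alphabet    TEXT,
    seq         TEXT
);
""")

conn.execute("INSERT INTO biodatabase (biodatabase_id, name) VALUES (1, 'local_genbank')")
conn.execute("INSERT INTO taxon (taxon_id, ncbi_taxon_id) VALUES (1, 9606)")  # 9606: Homo sapiens
conn.executemany(
    "INSERT INTO bioentry (bioentry_id, biodatabase_id, taxon_id, name, accession, version) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, "entry_one", "ACC0001", 1),   # placeholder accessions
     (2, 1, 1, "entry_two", "ACC0002", 1)],
)
conn.execute("INSERT INTO biosequence (bioentry_id, alphabet, seq) VALUES (1, 'dna', 'ATGC')")

entries_in_database = conn.execute(
    "SELECT COUNT(*) FROM bioentry WHERE biodatabase_id = 1").fetchone()[0]
entries_for_taxon = conn.execute(
    "SELECT COUNT(*) FROM bioentry WHERE taxon_id = 1").fetchone()[0]

# A second sequence for the same bioentry violates the one-to-one rule.
try:
    conn.execute("INSERT INTO biosequence (bioentry_id, alphabet, seq) VALUES (1, 'dna', 'GGCC')")
    second_sequence_rejected = False
except sqlite3.IntegrityError:
    second_sequence_rejected = True
```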
Seqfeature, Location & Annotation
•Location
•SeqFeature
•SEQFEATURE_RELATIONSHIP
•Location Q.Value
•S.Q.Value
•S.F DBxref
SeqFeature: semantics of the sequence
Location: describes the start and stop coordinates and the strand
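How a location's start, stop and strand turn into an actual stretch of sequence can be sketched as follows. The two tables are trimmed, assumed versions of the seqfeature and location tables, the parent sequence is invented, and the sketch assumes 1-based inclusive coordinates with strand values of 1 and -1; the helper feature_sequence is hypothetical and not part of any Bio* toolkit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL
);
CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    start_pos     INTEGER,
    end_pos       INTEGER,
    strand        INTEGER NOT NULL DEFAULT 0,
    rank          INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO seqfeature VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany(
    "INSERT INTO location (seqfeature_id, start_pos, end_pos, strand, rank) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 4, 1, 1),     # feature 1, first segment, forward strand
     (1, 9, 12, 1, 2),    # feature 1, second segment (a split feature)
     (2, 9, 12, -1, 1)],  # feature 2, reverse strand
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def feature_sequence(parent_seq, seqfeature_id):
    """Join the segments of a feature, reverse-complementing minus-strand ones."""
    rows = conn.execute(
        "SELECT start_pos, end_pos, strand FROM location "
        "WHERE seqfeature_id = ? ORDER BY rank",
        (seqfeature_id,),
    ).fetchall()
    parts = []
    for start, end, strand in rows:
        piece = parent_seq[start - 1:end]  # 1-based inclusive -> Python slice
        if strand == -1:
            piece = piece.translate(COMPLEMENT)[::-1]
        parts.append(piece)
    return "".join(parts)


parent = "ATGCCCCCGGTAAA"  # invented sequence
split_feature = feature_sequence(parent, 1)
minus_feature = feature_sequence(parent, 2)
```

Keeping locations in their own table is what lets one feature carry several segments, each with its own coordinates and strand.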
Ontology Term and Relationship
•Term Relation
•Term
•Term Synonym
•Term Dbxref
•Ontology
A term is used to "label" a seqfeature's name
An ontology is essentially a dictionary of terms in a somewhat-controlled vocabulary
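The idea of labeling a seqfeature with a term, rather than giving every feature type its own table, can be sketched like this. The tables are trimmed, assumed versions of ontology, term and seqfeature; the ontology name "feature_keys" and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL,
    type_term_id  INTEGER NOT NULL REFERENCES term(term_id)
);
""")


def get_or_create_term(ontology_name, term_name):
    """Look a term up in its ontology, adding ontology and term if missing."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (ontology_name,))
    ontology_id = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (ontology_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO term (name, ontology_id) VALUES (?, ?)",
        (term_name, ontology_id),
    )
    return conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (term_name, ontology_id),
    ).fetchone()[0]


def add_feature(bioentry_id, feature_type):
    """Store a feature whose type is just a reference to a term row."""
    term_id = get_or_create_term("feature_keys", feature_type)
    conn.execute(
        "INSERT INTO seqfeature (bioentry_id, type_term_id) VALUES (?, ?)",
        (bioentry_id, term_id),
    )


for feature_type in ["gene", "CDS", "gene"]:
    add_feature(1, feature_type)

feature_counts = dict(conn.execute(
    "SELECT term.name, COUNT(*) FROM seqfeature "
    "JOIN term ON term.term_id = seqfeature.type_term_id GROUP BY term.name"
))
term_count = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
```

A new feature type only needs a new row in term, which is one way to read the weakly typed design: the schema stays generic while the vocabulary carries the specialization.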
Annotation Bundle
•References
•Bioentry References
•Comment
•Dbxref
•Bioentry Dbxref
•B&D QValue
http://www.biosql.org/wiki/Downloads
Local MySQL Database
The BioSQL project provides a well thought out relational
database schema for storing biological sequences and
annotations
Advantages of reusability
Compatible with the Bio* toolkits of several programming languages, such as Biopython, BioPerl, BioJava and BioRuby
Flexible storage of data via a key/value pair model
Extensible as the situation requires
Overall data model based on GenBank flat files
It also allows great flexibility in choosing the data used by Snapshot, since sequence data from any source, including online databases and locally generated sequence data, can be added
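The key/value pair model mentioned among the advantages can be sketched as follows. The qualifier table is a trimmed, assumed version of a bioentry qualifier table, the term table is reduced to a name, and the helpers annotate and annotations, along with the example keys and values, are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    value       TEXT,
    rank        INTEGER NOT NULL DEFAULT 0,
    UNIQUE (bioentry_id, term_id, rank)
);
""")


def annotate(bioentry_id, key, value):
    """Attach one key/value pair to an entry; repeated keys get increasing ranks."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (key,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (key,)
    ).fetchone()[0]
    next_rank = conn.execute(
        "SELECT COUNT(*) FROM bioentry_qualifier_value "
        "WHERE bioentry_id = ? AND term_id = ?",
        (bioentry_id, term_id),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        (bioentry_id, term_id, value, next_rank),
    )


def annotations(bioentry_id):
    """Return all annotations of an entry as {key: [values in rank order]}."""
    result = defaultdict(list)
    rows = conn.execute(
        "SELECT term.name, q.value FROM bioentry_qualifier_value AS q "
        "JOIN term ON term.term_id = q.term_id "
        "WHERE q.bioentry_id = ? ORDER BY term.name, q.rank",
        (bioentry_id,),
    )
    for key, value in rows:
        result[key].append(value)
    return dict(result)


annotate(1, "keyword", "hypothetical protein")
annotate(1, "keyword", "membrane")
annotate(1, "lab_notebook_page", "42")  # a locally invented kind of annotation
entry_annotations = annotations(1)
```

Storing a new kind of annotation, such as the invented "lab_notebook_page" key, takes no change to the table definitions.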
This is a single-user solution
It is the least flexible option, since the database cannot be shared
No consideration of protein secondary structure prediction
Local ‘GenBank’ with random access
‘GenBank’ in relational format
Easy loading of NCBI taxonomy data into a local DB
Integrated sequence and annotation databases
Handy tool for the bioinformatics community
•http://biojava.org/wiki/BioJava:Tutorial:Installing_and_using_BioSQL
•http://biopython.org/wiki/BioSQL
•http://biosqlweb.appspot.com/
•http://en.wikipedia.org/wiki/Generic_data_model
•http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/BIOSQL_tutorial.pdf
•http://www.bioinformatics.be/new/faq/mygenbank-howto-setup-your-local-relational-database-for-storing-sequence-data/
•http://www.bioperl.org/wiki/BioPerl_db
•http://www.biosql.org/wiki/Main_Page
•https://biorelated.wordpress.com/2009/01/07/bio-graphics-biosql-and-rails-part-1/
•https://github.com/biosql/biosql/blob/master/INSTALL