How do you solve a problem like a
biological database?
(BNF 216 - Database Modeling and Design for Bioinformatics)
Arjei Balandra
Software Developer
National Telehealth Center
University of the Philippines – Manila
http://bumblebest.net
Database
• A database is a set of data that has a regular
structure and that is organized in such a way
that a computer can easily find the
desired information.
– The Linux Information Project
(http://www.linfo.org/database.html)
Biological Database
• Biological databases are libraries of life
sciences information collected from scientific
experiments, published literature, high-
throughput experiment technology, and
computational analyses.
– Wikipedia (en.wikipedia.org/wiki/Biological_database)
NCBI - GenBank
European Nucleotide Archive –
EMBL-EBI
DDBJ – DNA Data Bank of Japan
Why Database?
• Data-intensive techniques such as high-
throughput screening and gene expression
experiments demand methods to correlate
large and diverse datasets.
• Databases integrate information from a
variety of sources allowing faster and more
powerful searches.
DO A “GOOD” DATABASE DESIGN
Tip #1:
Good Database Design
• Provides easy access to previous results.
• Supports both expert- and machine-guided
searches for novel correlations in data.
Bad Database Design
• Obscures the correlations the user is
searching for.
• Makes it difficult for biologists to fit their data
into the database or to find previously stored
data, resulting in user contempt.
• Is ‘brittle’: hard to adapt when requirements change.
LEARN FROM EXISTING LITERATURE
Tip #2:
• Generalizations
• Incorporate existing schema into the database
design
• Use existing structures for common data
Generalizations
aMAZE (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC308873/figure/gkh139f2/)
RESPECT THE UNIQUE NEEDS OF
BIOLOGISTS (AND USERS)
Tip #3:
Business rules
• constraints
– based on data derived from the real-world
entities
– specific to the needs of the organization.
What do they need?
– Free-text comments
– User-specific
categories
Dealing with Business Rules
User-Specific Categories
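The two needs above (free-text comments and user-specific categories) can be sketched as a small schema. This is only an illustration using Python's built-in sqlite3 module; the table names (sample, category, sample_category), the column names, and the example values are all assumptions, not the schema shown on the original slides.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not check foreign keys by default

# Hypothetical schema: every name here is an assumption for illustration.
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    comments  TEXT                 -- free-text notes, deliberately unconstrained
);
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    owner       TEXT NOT NULL,     -- the user who defined this category
    label       TEXT NOT NULL,
    UNIQUE (owner, label)          -- each user names a category only once
);
CREATE TABLE sample_category (
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    PRIMARY KEY (sample_id, category_id)
);
""")

conn.execute(
    "INSERT INTO sample (sample_id, name, comments) VALUES (?, ?, ?)",
    (1, "sample A", "growth slower than expected"),
)
conn.execute(
    "INSERT INTO category (category_id, owner, label) VALUES (?, ?, ?)",
    (1, "user1", "needs repeat"),
)
conn.execute("INSERT INTO sample_category VALUES (?, ?)", (1, 1))

# Each user sees samples through their own categories.
rows = conn.execute(
    """
    SELECT s.name, c.label
    FROM sample AS s
    JOIN sample_category AS sc ON sc.sample_id = s.sample_id
    JOIN category AS c ON c.category_id = sc.category_id
    WHERE c.owner = ?
    """,
    ("user1",),
).fetchall()
print(rows)
```

Keeping categories in their own table, keyed by owner, lets users add labels without any change to the schema itself.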
DESIGN THE DATABASE BEFORE
BUILDING IT
Tip #4:
USE THE DATABASE TO ENFORCE
DATA INTEGRITY
Tip #5:
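One way to read this tip: declare the rules in the schema so that bad rows are rejected by the database itself rather than by each application. The sketch below uses Python's built-in sqlite3 module; the organism and gene tables, their columns, and the inserted values are hypothetical and chosen only to show NOT NULL, CHECK, and foreign-key constraints at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

# Hypothetical tables; the constraints carry the integrity rules.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    strand      TEXT NOT NULL CHECK (strand IN ('+', '-')),
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
""")
conn.execute("INSERT INTO organism VALUES (1, 'Organism X')")


def try_insert(row):
    """Attempt to insert a gene row; report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO gene VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1, "geneA", "+", 1)),   # valid row
    try_insert((2, "geneB", "?", 1)),   # strand outside the allowed set (CHECK)
    try_insert((3, "geneC", "-", 99)),  # organism 99 does not exist (FOREIGN KEY)
    try_insert((4, None, "+", 1)),      # missing symbol (NOT NULL)
]
print(results)
```

Because the checks live in the schema, every interface that writes to the database is held to the same rules.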
Normalization
Normalization
Normalization
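As a rough illustration of normalization, the sketch below takes a flat list in which organism details repeat on every gene row and splits it into two related tables, so each fact is stored once. It uses Python's built-in sqlite3 module; the flat rows, table names, and column names are invented for the example and are not taken from the slides.

```python
import sqlite3

# Flat, denormalized input: organism details repeat on every gene row.
# All values are made up for illustration.
flat_rows = [
    ("geneA", "Organism X", "Bacteria"),
    ("geneB", "Organism X", "Bacteria"),
    ("geneC", "Organism Y", "Eukaryota"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    domain      TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    symbol      TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
""")

for symbol, org_name, domain in flat_rows:
    # Store each organism once; the UNIQUE name makes repeats a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO organism (name, domain) VALUES (?, ?)",
        (org_name, domain),
    )
    (organism_id,) = conn.execute(
        "SELECT organism_id FROM organism WHERE name = ?", (org_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO gene (symbol, organism_id) VALUES (?, ?)",
        (symbol, organism_id),
    )

organism_count = conn.execute("SELECT COUNT(*) FROM organism").fetchone()[0]

# A correction is now made in one place and is seen by every related gene.
conn.execute(
    "UPDATE organism SET name = ? WHERE name = ?",
    ("Organism X (renamed)", "Organism X"),
)
joined = conn.execute(
    """
    SELECT g.symbol, o.name
    FROM gene AS g
    JOIN organism AS o ON o.organism_id = g.organism_id
    ORDER BY g.symbol
    """
).fetchall()
print(organism_count, joined)
```

In the flat layout the same rename would have to touch every matching row, which is where inconsistent copies tend to creep in.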
KEEP THE DATABASE SCOPE
MANAGEABLE
Tip #6:
• In biology, one size does not fit all
• Focus on a subset of biology (e.g., genes,
proteins)
• For large subsets, take on one part at a time
• Inclusive
Keep the database scope manageable
LISTEN TO THE PEOPLE WHO HAVE TO
WRITE AND USE THE INTERFACE
Tip #7:
• Databases are successful only when people
use them
Users know what they want and need
+ Developers know what they can do
+ Designers know what must be done
---------------------------------------------------------
= Collaborative approach to develop a
successful database
TEST THE DESIGN WITH REALISTIC
DATA
Tip #8:
MAKE THE DATABASE STRUCTURE
UNDERSTANDABLE AND
EASY TO MAINTAIN
Tip #9:
THANK YOU!
REPLACE(quote, "pagmamahal", "data");
quote
References
• The Linux Information Project
(http://www.linfo.org/database.html)
• Nelson, M. R., Reisinger, S. J., Henry, S. (2003). Designing
databases to store biological information. BIOSILICO,
1(4).
• Wikipedia (en.wikipedia.org/wiki/Biological_database)
• Lemer, C., Antezana, E., Couche, F., Fays, F., Santolaria,
X., Janky, R., … Wodak, S. J. (2004). The aMAZE
LightBench: a web interface to a relational database
of cellular processes. Nucleic Acids
Research, 32(Database issue), D443–D448.
doi:10.1093/nar/gkh139

Designing Biological Databases