FBW
20-02-2018
Biological Databases
Wim Van Criekinge
Bioinformatics, a scientific discipline? Or the new (molecular) biology?
Diagram: Bioinformatics in relation to Math, Informatics, Theoretical Biology, Computational Biology, (Molecular) Biology and Computer Science
Lab for Bioinformatics and computational genomics: Statistics, Machine Learning, Text Mining
Diagram: Bioinformatics and Discovery Informatics, linking Informatics and (Molecular) Biology with Statistics, Machine Learning, Text Mining, Python, … and Biological Databases
The most valuable programming skills to have on a resume
New kid in the coding block …
Diagram: Statistics, Machine Learning, Text Mining, Python, …, Biological Databases and Epigenetics within Bioinformatics and Discovery Informatics
Sander & Schneider
• HSSP: Homology-derived Secondary Structure of Proteins
Usage of the databases
Annotation searches - Search for keywords, authors, features
• What is the protein sequence for human insulin?
• What does the 3D structure of calmodulin look like?
• What is the genetic location of the cystic fibrosis gene?
• List all intron sequences in rat.
Homology (similarity) searches - Search for similar sequences
• Is there any known protein sequence that is similar to x?
• Is this gene known in any other species?
• Has someone already cloned this sequence?
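Homology searches score how well a query aligns *locally* with each database entry. As a minimal sketch of that idea (tools such as BLAST add seeding heuristics on top and are not implemented like this), the Smith-Waterman recurrence below returns the best local alignment score under an assumed simple scheme (match +2, mismatch -1, linear gap -2); the toy "database" is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)          # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            # Local alignment: scores never drop below 0, so an alignment may start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every entry and return (name, score) pairs, best first."""
    scores = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"seq1": "GGACGTGG", "seq2": "TTTTTTTT", "seq3": "ACGAACGT"}  # hypothetical entries
    print(rank_hits("TTACGTTT", toy_db))
```

Real search tools also report how surprising a score is (e.g. an E-value), which a raw score alone does not tell you.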
Pattern searches - Search for occurrences of patterns
• Does my protein sequence contain any known motif (that can give me a clue about the function)?
• Which known sequences contain this motif?
• Is any part of my nucleotide sequence recognized by a transcription factor?
• List all known start, splice and stop signals in my genomic sequence.
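Motif databases such as PROSITE describe patterns in a compact syntax that maps closely onto regular expressions. A minimal sketch, assuming only the common PROSITE elements (x, [..], {..}, repeat counts and the < / > terminal anchors), that translates a pattern to a Python regex and scans a sequence; the example sequence is made up, and the pattern N-{P}-[ST]-{P} is the familiar N-glycosylation site.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a Python regex.

    Handles x, [..], {..}, repeat counts (n) / (n,m) and the < / > terminal anchors.
    Simplification: '>' inside brackets (e.g. '[G>]') is not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # anchored at the N-terminus
        suffix = "$" if element.endswith(">") else ""     # anchored at the C-terminus
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # single residue or [ABC]: same syntax in regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

Because `finditer` returns non-overlapping matches, overlapping motif occurrences would need a lookahead-based regex instead.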
Predictions - Using the databases as knowledge databases
• What might the structure of my protein be?
– Secondary structure prediction
– Modelling by homology
• What is the gene structure of my genomic sequence?
• Which parts of my protein have a high antigenicity?
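A classic knowledge-based answer to the antigenicity question is a sliding-window hydrophilicity profile: hydrophilic, surface-exposed stretches are candidate antigenic sites. A minimal sketch using the Hopp-Woods hydrophilicity scale with a 6-residue window (a common choice for this scale); it accepts only the 20 standard one-letter codes and is an illustration, not a validated epitope predictor.

```python
# Hopp-Woods hydrophilicity values; higher means more hydrophilic.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> list[float]:
    """Average Hopp-Woods value of each window (one value per window start position)."""
    values = [HOPP_WOODS[aa] for aa in sequence.upper()]  # KeyError on non-standard residues
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def most_hydrophilic_window(sequence: str, window: int = 6) -> tuple[int, str]:
    """Return (0-based start, segment) of the window with the highest average hydrophilicity."""
    profile = hydrophilicity_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]
```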
Comparisons
• Gene families
• Phylogenetic trees
Lesson 1
• Bioinformatics I revisited in 5 slides
• Why bother making databases?
• Databases
– Flat files (FF)
• *.txt
• Indexed version
– Relational (RDBMS)
• Access, MySQL, PostgreSQL, Oracle
– Object-oriented (OODBMS)
• AceDB, ObjectStore
– Hierarchical
• XML
– Frame-based systems
• e.g. DAML+OIL
– Hybrid systems
GenBank Format
LOCUS       LISOD         756 bp    DNA             BCT       30-JUN-1993
DEFINITION  L.ivanovii sod gene for superoxide dismutase.
ACCESSION   X64011.1 GI:37619753
NID         g44010
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii.
  ORGANISM  Listeria ivanovii
            Eubacteria; Firmicutes; Low G+C gram-positive bacteria;
            Bacillaceae; Listeria.
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
  TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii
            by functional complementation in Escherichia coli and
            characterization of the gene product
  JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
  MEDLINE   92140371
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
            Universitaet Wuerzburg, Biozentrum Am Hubland, 8700
            Wuerzburg, FRG
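In a GenBank flat file the keyword sits in the first 12 columns and continuation lines leave that area blank, so even a simple annotation search means writing a parser first. A minimal sketch that handles only the header shown above (feature table and sequence are ignored); the `annotation_search` helper and its choice of fields are illustrative assumptions. Note how sub-keywords are flattened: which REFERENCE an AUTHORS line belongs to is lost, one symptom of hierarchical records living in a flat file.

```python
SAMPLE = """\
LOCUS       LISOD         756 bp    DNA             BCT       30-JUN-1993
DEFINITION  L.ivanovii sod gene for superoxide dismutase.
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii.
  ORGANISM  Listeria ivanovii
            Eubacteria; Firmicutes; Low G+C gram-positive bacteria;
            Bacillaceae; Listeria.
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
"""


def parse_header(text: str) -> dict[str, list[str]]:
    """Group GenBank header lines by keyword (columns 1-12); continuation lines are joined."""
    fields: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key:                               # new keyword or indented sub-keyword
            current = key
            fields.setdefault(key, []).append(value)
        elif current is not None:             # blank keyword area: continuation line
            fields[current][-1] += " " + value
    return fields


def annotation_search(fields, keyword, in_keys=("DEFINITION", "KEYWORDS", "AUTHORS")):
    """Case-insensitive keyword search restricted to a few annotation fields."""
    needle = keyword.lower()
    return any(needle in value.lower() for key in in_keys for value in fields.get(key, []))
```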
Problems with flat files …
• Wasted storage space
• Wasted processing time
• Data control problems
• Problems caused by changes to data structures
• Access to data is difficult
• Data out of date
• Constraints are system-based
• Limited querying, e.g. all single-exon GPCRs (<1000 bp)
• What is a relational database?
– Sets of tables and links (the data)
– A language to query the database (Structured Query Language, SQL)
– A program to manage the data (RDBMS)
• Flat files are not relational
– Data type (attribute) is part of the data
– Record order matters
– Multiline records
– Massive duplication
• E.g. Organism: Homo sapiens, Eukaryota, …
– Some records are hierarchical
• Xrefs
– Records contain multiple “sub-records”
– Implicit “key”
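The "single-exon GPCRs under 1000 bp" question that is awkward on flat files becomes one SQL query once genes and exons sit in related tables. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical two-table schema (`gene`, `exon`) filled with made-up rows; a real resource would have a much richer schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory schema with made-up genes (names and values are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT, family TEXT, length_bp INTEGER);
        CREATE TABLE exon (id INTEGER PRIMARY KEY, gene_id INTEGER REFERENCES gene(id),
                           start_bp INTEGER, end_bp INTEGER);
        INSERT INTO gene VALUES (1, 'geneA', 'GPCR', 950), (2, 'geneB', 'GPCR', 1200),
                                (3, 'geneC', 'GPCR', 800), (4, 'geneD', 'kinase', 700);
        INSERT INTO exon (gene_id, start_bp, end_bp) VALUES
            (1, 1, 950), (2, 1, 1200), (3, 1, 200), (3, 300, 500), (3, 600, 800), (4, 1, 700);
    """)
    return conn


def single_exon_genes(conn: sqlite3.Connection, family: str, max_bp: int) -> list[str]:
    """Names of genes in `family` shorter than `max_bp` that have exactly one exon."""
    query = """
        SELECT g.name
        FROM gene AS g JOIN exon AS e ON e.gene_id = g.id
        WHERE g.family = ? AND g.length_bp < ?
        GROUP BY g.id, g.name
        HAVING COUNT(*) = 1          -- exactly one joined exon row per gene
        ORDER BY g.name
    """
    return [name for (name,) in conn.execute(query, (family, max_bp))]
```

The filtering, grouping and counting are declared in the query; the RDBMS decides how to execute it.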
Flat file: a linear file of homogeneous records, each record made up of the same fields (e.g. name, surname, phone, address)
• Terms and concepts:
– tuple
– domain
– attribute
– key
– integrity rules
Introduction to Database Systems
• Historic Background
– Hierarchical databases (IMS) - IBM 1968
• Hierarchical structures between file records
– Network databases - CODASYL Group 1969
• Network structures of record types
• Linked chains between 'Owner' and 'Member' records
• Included in COBOL, a procedural language - manual navigation
– Relational Data Model - E. F. Codd 1970
• Mathematical foundation of databases
• New non-procedural language SQL - automatic navigation
– Object-relational databases
– Object-oriented databases
Relational
• The relational model is not only very mature; there is also extensive know-how on making a relational back-end fast and reliable, and on exploiting technologies such as massive SMP, optical jukeboxes, clustering, etc. Object databases are nowhere near this, and I do not expect them to get there in the short or medium term.
• Relational databases rest on a well-known, proven and simple mathematical theory (set theory) that makes possible
– automatic cost-based query optimization,
– schema generation from high-level models, and
– many other features that are now vital for developing and operating mission-critical information systems.
The Benefits of Databases
• Redundancy can be reduced
• Inconsistency can be avoided
• Conflicting requirements can be
balanced
• Standards can be enforced
• Data can be shared
• Data independence
• Integrity can be maintained
• Security restrictions can be applied
Relational Terminology
CUSTOMER table (relation): each row is a tuple, each column an attribute
ID   NAME              PHONE            EMP_ID
201  Unisports         55-2066101       12
202  Simms Athletics   81-20101         14
203  Delhi Sports      91-10351         14
204  Womansport        1-206-104-0103   11
Relational Database Terminology
• Each row of data in a table is uniquely identified by a primary key (PK)
• Information in multiple tables can be logically related by foreign keys (FK)
Table name: EMP (primary key: ID)
ID   LAST_NAME   FIRST_NAME
10   Havel       Marta
11   Magee       Colin
12   Giljum      Henry
14   Nguyen      Mai
Table name: CUSTOMER (primary key: ID; foreign key: EMP_ID, referring to EMP.ID)
ID   NAME              PHONE            EMP_ID
201  Unisports         55-2066101       12
202  Simms Athletics   81-20101         14
203  Delhi Sports      91-10351         14
204  Womansport        1-206-104-0103   11
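The two tables above can be declared with their keys and queried in a few lines. A minimal sketch using Python's built-in sqlite3 module (the lowercase table and column names are our choice); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so that a customer pointing at a non-existent employee is rejected.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create the EMP and CUSTOMER tables from the slide in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript("""
        CREATE TABLE emp (
            id         INTEGER PRIMARY KEY,          -- primary key
            last_name  TEXT NOT NULL,
            first_name TEXT
        );
        CREATE TABLE customer (
            id     INTEGER PRIMARY KEY,              -- primary key
            name   TEXT NOT NULL,
            phone  TEXT,
            emp_id INTEGER REFERENCES emp(id)        -- foreign key to emp.id
        );
        INSERT INTO emp VALUES (10, 'Havel', 'Marta'), (11, 'Magee', 'Colin'),
                               (12, 'Giljum', 'Henry'), (14, 'Nguyen', 'Mai');
        INSERT INTO customer VALUES
            (201, 'Unisports', '55-2066101', 12),
            (202, 'Simms Athletics', '81-20101', 14),
            (203, 'Delhi Sports', '91-10351', 14),
            (204, 'Womansport', '1-206-104-0103', 11);
    """)
    return conn


def customers_with_rep(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Follow each customer's foreign key to the employee table."""
    return conn.execute("""
        SELECT c.name, e.last_name
        FROM customer AS c JOIN emp AS e ON c.emp_id = e.id
        ORDER BY c.id
    """).fetchall()


def add_customer(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert a customer; return False if the row violates a key constraint."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```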
Relational operators
• Relational
– select: rel WHERE boolean-xpr
– project: rel [ attr-specs ]
– join: rel JOIN rel
– divide by: rel DIVIDEBY rel
• Set-based
– union: rel UNION rel
– intersection: rel INTERSECT rel
– difference: rel MINUS rel
– Cartesian product: rel TIMES rel
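These operators can be mimicked on Python sets, which share the relational model's "no duplicate rows, no row order" semantics. This is a teaching sketch of our own: the function names mirror the operator keywords above, and the `(heading, body)` representation of a relation is an assumption, not how an RDBMS stores or executes anything.

```python
from itertools import product

# A relation is modelled as (heading, body): a tuple of attribute names and a
# set of value tuples in heading order.


def select(rel, predicate):
    """rel WHERE predicate: keep rows for which predicate(row_as_dict) is true."""
    heading, body = rel
    return heading, {row for row in body if predicate(dict(zip(heading, row)))}


def project(rel, attrs):
    """rel [attrs]: keep only the listed attributes (duplicate rows collapse)."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in body}


def join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    lh, lb = left
    rh, rb = right
    shared = [a for a in lh if a in rh]
    extra = [a for a in rh if a not in lh]
    body = set()
    for l, r in product(lb, rb):
        if all(l[lh.index(a)] == r[rh.index(a)] for a in shared):
            body.add(l + tuple(r[rh.index(a)] for a in extra))
    return lh + tuple(extra), body


def _check_same_heading(a, b):
    if a[0] != b[0]:  # this sketch requires identical attribute order
        raise ValueError("set operators need relations with the same heading")


def union(a, b):
    _check_same_heading(a, b)
    return a[0], a[1] | b[1]


def intersect(a, b):
    _check_same_heading(a, b)
    return a[0], a[1] & b[1]


def minus(a, b):
    _check_same_heading(a, b)
    return a[0], a[1] - b[1]


def times(a, b):
    """Cartesian product; the headings must not share attribute names."""
    if set(a[0]) & set(b[0]):
        raise ValueError("TIMES needs disjoint headings")
    return a[0] + b[0], {x + y for x, y in product(a[1], b[1])}


def divide(dividend, divisor):
    """dividend DIVIDEBY divisor: values that pair with every row of the divisor."""
    dh, db = dividend
    vh, vb = divisor
    keep = tuple(a for a in dh if a not in vh)
    result = set()
    for cand in project(dividend, keep)[1]:
        values = dict(zip(keep, cand))
        # Rebuild each candidate+divisor combination in dividend order and check it exists.
        if all(tuple({**values, **dict(zip(vh, v))}[a] for a in dh) in db for v in vb):
            result.add(cand)
    return keep, result
```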
Disadvantages
• size
• complexity
• cost
• Additional hardware costs
• Higher impact of failure
• Recovery more difficult
• RDBMS products
– Free
• MySQL: very fast, widely used, easy to jump into, but limited, non-standard SQL
• PostgreSQL: full SQL, limited OO, steeper learning curve than MySQL
– Commercial
• MS Access: great query builder, GUI interfaces
• MS SQL Server: full SQL, NT only
• Oracle: everything, including the kitchen sink
• IBM DB2, Sybase
Example: 3-tier model for a biological database
http://www.bioinformatics.be
Example of different interfaces to the same back-end database (MySQL)
BioSQL
Conclusions
• A database is a central component of any contemporary information system
• Operations on the database and the maintenance of database consistency are handled by a DBMS
• Query languages can be stand-alone or embedded in a host language; both deal with data definition (DDL) and data manipulation (DML)
• The structural properties, constraints and operations permitted within a DBMS are defined by a data model: hierarchical, network, relational
• Recovery and concurrency control are essential
• Linking heterogeneous data sources is a central theme in modern bioinformatics
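The DDL/DML split and the role of recovery can be seen in a few lines of SQL embedded in Python. A minimal sketch using the built-in sqlite3 module: the table is a hypothetical example, P01308 is used as human insulin's UniProt accession, and `HYPO_1` is a made-up identifier. Because the batch runs as one transaction, a constraint violation rolls back the rows already inserted in that batch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the DBMS here is SQLite, embedded in Python

# DDL (data definition): declare the structure and its constraints.
conn.execute("CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT NOT NULL)")


def load_batch(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> bool:
    """DML (data manipulation) inside one transaction: all rows are stored, or none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO protein VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
```

A second batch that repeats an existing accession fails as a whole, so its other new rows are not left behind half-loaded.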
What is to come?
Basic outline
• Set up an RDBMS
• OLTP access through the CLI, a dedicated client, PHP, Perl/Python
• OLAP access through Perl/Python, R, …
Integration
• Cytoscape
Semantic Web
• NoSQL/Hadoop
• SPARQL
Project
• Sciencecraft
• iGEM
• BioDesignChallenge
• mHealth
• Social Genetics
3/05/2016 Project Biological Databases
2015-2016
Biological Databases
Bruno Verstraeten, Arthur Zwaenepoel,
Jules Haezebrouck, Laurenz De Cock, Jonathan
Walgraeve, Cedric Bogaert, Dries Schaumont
What is Minecraft?
• Sandbox game
• Designed by Markus “Notch” Persson
• Mojang
• Bought by Microsoft in 2014
• 70 million copies sold (June 2015)
Minecraft programming from Python
Third-party mods
• Extra content made by users
• Adding items, magic and features to the original game
• The true beauty of Minecraft
And now Sciencecraft
• Visualizing proteins in Minecraft
• Minecraft Tools Python package
• Data directly from PDB flat files or from the PDB server
• Spigot Minecraft server
The basics
1. Start a server with Minecraft Tools
2. Import the PDB file using Python
3. Retrieve the atom coordinates from the file
4. Use the setBlock function to place blocks of specific colours on the Minecraft server to represent the protein
5. Fly around and take screenshots
Minecraft programming from Python
# Connect to Minecraft
from mcpi.minecraft import Minecraft
mc = Minecraft.create()
# Set x, y, and z variables to represent coordinates
x = 10.0
y = 110.0
z = 12.0
# Change the player's position
mc.player.setPos(x, y, z)
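Steps 2 to 4 of the basics (read the PDB file, extract coordinates, turn them into blocks) can be sketched without a running server. PDB is a fixed-column format (x, y, z in columns 31-38, 39-46, 47-54; atom name in 13-16; element in 77-78). The wool colour mapping, the `origin`/`scale` parameters and the `1o9k.pdb` filename are our own assumptions; the mcpi call in the comment follows the same `mcpi.minecraft` API as the snippet above.

```python
WOOL = 35  # Minecraft block id for wool; its data value selects the colour
# Assumed CPK-like colour choice (wool data values): C black, N blue, O red, S yellow.
WOOL_COLOUR = {"C": 15, "N": 11, "O": 14, "S": 4}


def read_atoms(lines):
    """Yield (atom_name, element, x, y, z) from ATOM/HETATM records of a PDB file."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            name = line[12:16].strip()
            element = line[76:78].strip() or name[0]  # fall back to the atom name
            x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
            yield name, element, x, y, z


def to_blocks(atoms, origin=(0, 0, 0), scale=1.0):
    """Map atom coordinates (in Angstrom) to integer block positions relative to `origin`."""
    ox, oy, oz = origin
    for _name, element, x, y, z in atoms:
        yield element, (ox + round(x * scale), oy + round(y * scale), oz + round(z * scale))


# Usage with a running server (not executed here; the filename is hypothetical):
# from mcpi.minecraft import Minecraft
# mc = Minecraft.create()
# with open("1o9k.pdb") as fh:
#     for element, (x, y, z) in to_blocks(read_atoms(fh)):
#         mc.setBlock(x, y, z, WOOL, WOOL_COLOUR.get(element, 0))
```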
Verotoxin
Apolipoprotein A1
Kinesin
Retrieving PDB data using SPARQL
• PDB available in RDF (wwPDB)
• Using the Python SPARQLWrapper package
Using SPARQL with Python – SPARQLWrapper
SPARQL endpoint
“Search engine”
• Naive, regex-based
• Returns a list of all PDB entries containing a certain keyword, with organism name and full description
• The PDB entry can be retrieved with the previous query
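A naive regex-based keyword search of this kind boils down to a SPARQL `FILTER regex(...)` sent through SPARQLWrapper. A minimal sketch, with two loud assumptions: the endpoint URL is not given on the slide, so it is a parameter, and the triple pattern uses the generic `rdfs:label` property, which may need to be replaced by the vocabulary the PDB RDF endpoint actually uses. The keyword is also interpreted as a regex, so special characters are not neutralised.

```python
def keyword_query(keyword: str, limit: int = 10) -> str:
    """Build a naive, case-insensitive regex search over rdfs:label values (assumed property)."""
    # Escape backslashes and double quotes so the keyword stays inside the string literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entry ?label WHERE {\n"
        "  ?entry rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{safe}", "i")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(endpoint_url: str, query: str) -> list[tuple[str, str]]:
    """Send the query with SPARQLWrapper and return (entry, label) pairs."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party: pip install SPARQLWrapper

    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [(b["entry"]["value"], b["label"]["value"]) for b in bindings]
```

Keeping query construction separate from the network call makes the query text easy to inspect and test offline.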
Retrieve the .xml.gz file:
• The actual structure information is in the XML (PDBML) file
<?xml version="1.0" encoding="UTF-8" ?>
<PDBx:datablock datablockName="1O9K"
xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pdbml.pdb.org/schema/pdbx-v40.xsd pdbx-v40.xsd">
<PDBx:atom_siteCategory>
<PDBx:atom_site id="1">
<PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv>
<PDBx:Cartn_x>13.258</PDBx:Cartn_x>
<PDBx:Cartn_y>142.706</PDBx:Cartn_y>
<PDBx:Cartn_z>30.410</PDBx:Cartn_z>
<PDBx:auth_asym_id>A</PDBx:auth_asym_id>
<PDBx:auth_atom_id>N</PDBx:auth_atom_id>
<PDBx:auth_comp_id>MET</PDBx:auth_comp_id>
<PDBx:auth_seq_id>379</PDBx:auth_seq_id>
<PDBx:group_PDB>ATOM</PDBx:group_PDB>
<PDBx:label_alt_id xsi:nil="true" />
<PDBx:label_asym_id>A</PDBx:label_asym_id>
<PDBx:label_atom_id>N</PDBx:label_atom_id>
<PDBx:label_comp_id>MET</PDBx:label_comp_id>
<PDBx:label_entity_id>1</PDBx:label_entity_id>
<PDBx:label_seq_id>8</PDBx:label_seq_id>
<PDBx:occupancy>1.00</PDBx:occupancy>
….
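Each `<PDBx:atom_site>` element above carries one atom, so the coordinates can be pulled out with the standard library's ElementTree once the `PDBx` namespace from the file header is registered. A minimal sketch; the choice of fields in the returned dict is ours, and `load_pdbml_gz` assumes a locally downloaded `.xml.gz` file.

```python
import gzip
import xml.etree.ElementTree as ET

NS = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v40.xsd"}  # namespace declared in the file


def _text(site: ET.Element, tag: str):
    """Text of one PDBx child element of an atom_site (None if absent)."""
    return site.findtext(f"PDBx:{tag}", namespaces=NS)


def atom_sites(root: ET.Element):
    """Yield one dict per <PDBx:atom_site> with a few fields and its Cartesian coordinates."""
    for site in root.iterfind("PDBx:atom_siteCategory/PDBx:atom_site", NS):
        yield {
            "id": site.get("id"),
            "chain": _text(site, "auth_asym_id"),
            "residue": _text(site, "label_comp_id"),
            "atom": _text(site, "label_atom_id"),
            "xyz": tuple(float(_text(site, f"Cartn_{axis}")) for axis in "xyz"),
        }


def load_pdbml_gz(path: str) -> ET.Element:
    """Parse a gzip-compressed PDBML file and return its root (<PDBx:datablock>)."""
    with gzip.open(path, "rb") as fh:
        return ET.parse(fh).getroot()
```

For large structures, `ET.iterparse` would avoid holding the whole tree in memory.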
ENGINEERING TOGETHER: PARTICIPATING AT IGEM
INTERNATIONAL GENETICALLY ENGINEERED MACHINE
➤ Annual synthetic biology competition
➤ Making new organisms: BioBricks
➤ Hosted by MIT: five teams in 2004, 130 teams in 2016
PAST IGEM WINNERS
2014: biosensor for olive oil quality
2015: 3D printing of biofilms
2016: system for the control of co-culture stability
UGENT 2016 TEAM
SOLVING WATER SHORTAGE
FOUR WORK PACKAGES
WP1: Shape
WP2: Filament
WP3: Biofunction
WP4: Measurement
WP1: SHAPE OPTIMISATION
Fogstand beetle
WP2: FILAMENT
WP3: BIOFUNCTION
+ lysate membrane
WP4: FUNCTIONAL ASSAY
OUR INPUT
IN BOSTON: IGEM CONFERENCE
Presenting, learning and having fun in Boston
FOLLOW UP
➤ Maker City
➤ BrainBooster session, CropDesign
➤ Biodesign competition
➤ Bachelor project on 3D printing
➤ PLOS iGEM collection
Next steps
Ideas for the next iGEM teams …
v2
• Thermodynamically compatible test setting
• Real-life testing with UCSC (make a bigger version?) and/or green “plantable” versions for field tests (self-watering plant?)
• Introduce a temperature gradient? Combine the current dewpal with a solar and/or wind energy source …
http://waterseer.org
http://fontus.at
… why go to the trouble of collecting water out of the air? Why not simply cause more rain to fall? With our INP?
http://science.howstuffworks.com/environmental/earth/geophysics/manufacture-water1.htm
An even wilder idea …
Environmental remediation projects (e.g. ISS water recovery: change the freezing point … contact Prof. Arne Verliefde)
Molecular diagnostics: Liquid Biopsy project
Capture (methylated) cancer molecules from blood and/or urine
In oncology, we can run head-to-head comparisons on samples from clinical trials in bladder and/or prostate cancer
Alternative projects
2018 02 20_biological_databases_part1_v_upload

Software Engineering Methodologies (overview)
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 

2018 02 20_biological_databases_part1_v_upload

  • 5. Math Informatics Bioinformatics: a scientific discipline? Or the new (molecular) biology? Theoretical Biology Computational Biology (Molecular) Biology Computer Science Bioinformatics
  • 6. Lab for Bioinformatics and computational genomics
  • 7. Statistics Machine Learning Text Mining Bioinformatics Discovery Informatics Informatics (Molecular) Biology
  • 8. Statistics Machine Learning Text Mining Python, … Biological Databases Bioinformatics Discovery Informatics (Molecular) Biology
  • 9. The most valuable programming skills to have on a resume
  • 10. New kid in the coding block …
  • 11. Statistics Machine Learning Text Mining Python, … Biological Databases Epigenetics Bioinformatics Discovery Informatics
  • 12. Sander-Schneider • HSSP: homology-derived secondary structure
  • 15–28. Usage of the databases
    – Annotation searches: search for keywords, authors, features
      – What is the protein sequence of human insulin?
      – What does the 3D structure of calmodulin look like?
      – What is the genetic location of the cystic fibrosis gene?
      – List all intron sequences in rat.
    – Homology (similarity) searches: search for similar sequences
      – Is there any known protein sequence that is similar to x?
      – Is this gene known in any other species?
      – Has someone already cloned this sequence?
    – Pattern searches: search for occurrences of patterns
      – Does my protein sequence contain any known motif (that could give a clue about its function)?
      – Which known sequences contain this motif?
      – Is any part of my nucleotide sequence recognized by a transcription factor?
      – List all known start, splice and stop signals in my genomic sequence.
    – Predictions: using the databases as knowledge bases
      – What may the structure of my protein be? Secondary structure prediction; homology modelling.
      – What is the gene structure of my genomic sequence?
      – Which parts of my protein have high antigenicity?
    – Comparisons
      – Gene families
      – Phylogenetic trees
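Pattern searches like "does my protein contain a known motif?" are often done with PROSITE-style patterns. The sketch below translates a simple pattern into a regular expression and scans a sequence; it handles only the common elements (single residues, `x`, `[..]`, `{..}`, repeat counts, `<`/`>` anchors) and is not a full PROSITE parser. The example pattern is PROSITE's N-glycosylation site `N-{P}-[ST]-{P}`; the sequence `MNGSAKNPTL` is a made-up toy.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Covers only common elements: single residues, x (any residue),
    [..] (allowed residues), {..} (forbidden residues), repeat counts such
    as x(2) or x(2,4), and the '<' / '>' sequence-end anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an optional repeat count off the element, e.g. 'x(2,4)' -> 'x', '2,4'
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        base, count = m.group(1), m.group(2)
        start = end = ""
        if base.startswith("<"):
            start, base = "^", base[1:]
        if base.endswith(">"):
            end, base = "$", base[:-1]
        if base == "x":
            core = "."
        elif base.startswith("{"):
            core = "[^" + base[1:-1] + "]"
        else:
            core = base  # a single residue or a [..] class is already valid regex
        if count:
            core += "{" + count + "}"
        parts.append(start + core + end)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets matches overlap.
    rx = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in rx.finditer(sequence)]
```

For instance, `find_motif("MNGSAKNPTL", "N-{P}-[ST]-{P}")` reports a site at position 2; the second `N` (position 7) is rejected because it is followed by `P`.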
  • 29. Lesson 1 • Bioinformatics I revisited in 5 slides • Why bother making databases? • Databases – Flat files (FF) • *.txt • Indexed version – Relational (RDBMS) • Access, MySQL, PostgreSQL, Oracle – Object-oriented (OODBMS) • AceDB, ObjectStore – Hierarchical • XML – Frame-based systems • e.g. DAML+OIL – Hybrid systems
  • 30. GenBank Format
        LOCUS       LISOD         756 bp    DNA             BCT       30-JUN-1993
        DEFINITION  L.ivanovii sod gene for superoxide dismutase.
        ACCESSION   X64011.1 GI:37619753 NID g44010
        KEYWORDS    sod gene; superoxide dismutase.
        SOURCE      Listeria ivanovii.
          ORGANISM  Listeria ivanovii
                    Eubacteria; Firmicutes; Low G+C gram-positive bacteria;
                    Bacillaceae; Listeria.
        REFERENCE   1  (bases 1 to 756)
          AUTHORS   Haas,A. and Goebel,W.
          TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii by
                    functional complementation in Escherichia coli and
                    characterization of the gene product
          JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
          MEDLINE   92140371
        REFERENCE   2  (bases 1 to 756)
          AUTHORS   Kreft,J.
          TITLE     Direct Submission
          JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
                    Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
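The GenBank record above is a keyword-based flat file: every program that uses it has to re-parse the text. A minimal sketch of such a parser follows, assuming the usual layout (keyword or indented sub-keyword such as `ORGANISM` in the first 12 columns, data from column 13, continuation lines with a blank keyword area, records ending with `//`). The function names and the in-memory accession index are illustrative, not part of any library.

```python
def split_records(text: str) -> list:
    """Split a multi-record GenBank flat file at its '//' terminator lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def parse_header(record: str) -> dict:
    """Map each header keyword to a list of values (one entry per occurrence)."""
    fields = {}
    last = None
    for line in record.splitlines():
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword in ("FEATURES", "ORIGIN"):
            break  # the feature table and the sequence use a different layout
        if keyword:
            fields.setdefault(keyword, []).append(value)
            last = keyword
        elif last is not None and value:
            fields[last][-1] += " " + value  # continuation of the previous field
    return fields


def index_by_accession(text: str) -> dict:
    """A tiny in-memory 'indexed version' of the flat file: accession -> header."""
    index = {}
    for record in split_records(text):
        header = parse_header(record)
        accession = header["ACCESSION"][0].split()[0]
        index[accession] = header
    return index
```

Note how sub-keywords such as `AUTHORS` repeat once per `REFERENCE`, which is why values are kept as lists: the record is hierarchical, a point the following slides raise against flat files.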
  • 31. Problems with flat files … • Wasted storage space • Wasted processing time • Data control problems • Problems caused by changes to data structures • Difficult access to data • Out-of-date data • System-based constraints • Limited querying, e.g. all single-exon GPCRs (<1000 bp)
  • 32. • What is a relational database? – Sets of tables and links (the data) – A language to query the database (Structured Query Language) – A program to manage the data (RDBMS) • Flat files are not relational – Data type (attribute) is part of the data – Record order matters – Multiline records – Massive duplication • e.g. Organism: Homo sapiens; Eukaryota, … – Some records are hierarchical • Xrefs – Records contain multiple “sub-records” – Implicit “key”
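To make the "limited querying" complaint concrete: in a relational database, "all single-exon GPCRs shorter than 1000 bp" becomes one declarative query. The sketch below uses Python's built-in `sqlite3`; the `gene`/`exon` schema and the rows (`GPCR_A`, `KIN_D`, …) are hypothetical toy data invented for illustration, not taken from any real database.

```python
import sqlite3

# Hypothetical schema and toy rows, invented for illustration only.
SCHEMA = """
CREATE TABLE gene (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    family TEXT NOT NULL
);
CREATE TABLE exon (
    id       INTEGER PRIMARY KEY,
    gene_id  INTEGER NOT NULL REFERENCES gene(id),
    start_bp INTEGER NOT NULL,
    end_bp   INTEGER NOT NULL
);
"""

QUERY = """
SELECT g.symbol, SUM(e.end_bp - e.start_bp + 1) AS length_bp
FROM gene AS g
JOIN exon AS e ON e.gene_id = g.id
WHERE g.family = 'GPCR'
GROUP BY g.id, g.symbol
HAVING COUNT(e.id) = 1 AND SUM(e.end_bp - e.start_bp + 1) < ?
ORDER BY g.symbol
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        (1, "GPCR_A", "GPCR"), (2, "GPCR_B", "GPCR"),
        (3, "GPCR_C", "GPCR"), (4, "KIN_D", "kinase"),
    ])
    conn.executemany("INSERT INTO exon (gene_id, start_bp, end_bp) VALUES (?, ?, ?)", [
        (1, 1, 900),                 # single exon, 900 bp
        (2, 1, 1500),                # single exon, but too long
        (3, 1, 300), (3, 501, 800),  # two exons
        (4, 1, 600),                 # single exon, but not a GPCR
    ])
    conn.commit()
    return conn


def single_exon_gpcrs(conn: sqlite3.Connection, max_bp: int = 1000) -> list:
    """Single-exon GPCR genes shorter than max_bp, with their length."""
    return conn.execute(QUERY, (max_bp,)).fetchall()
```

With the toy data, only `GPCR_A` (900 bp) qualifies. The same question against a flat file would need a custom parser plus hand-written counting logic.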
  • 33. • Records • Fields • Linear file of homogeneous records [figure: a stack of identical records, each with the fields name, surname, phone, address]
  • 34. • Terms and concepts: – tuple – domain – attribute – key – integrity rules
  • 35. Introduction to Database Systems • Historic Background – Hierarchical databases (IMS) - IBM 1968 • Hierarchical structures between file records – Network databases - CODASYL Group 1969 • Network structures of record types • Linked chains between 'Owner' and 'Member' records • Included in COBOL, procedural language - Manual navigation – Relational Data Model - E. F. Codd 1970 • Mathematical foundation of databases • New non-procedural language SQL - Automatic navigation – Object-relational databases – Object-oriented databases
  • 36. Relational • The relational model is not only very mature; there is also strong know-how on making a relational back end fast and reliable, and on exploiting technologies such as massive SMP, optical jukeboxes, clustering, etc. Object databases are nowhere near this, and I do not expect them to get there in the short or medium term. • Relational databases have a well-known and proven underlying mathematical theory, a simple one (set theory), that makes possible – automatic cost-based query optimization, – schema generation from high-level models and – many other features that are now vital for mission-critical information systems development and operations.
  • 37. The Benefits of Databases • Redundancy can be reduced • Inconsistency can be avoided • Conflicting requirements can be balanced • Standards can be enforced • Data can be shared • Data independence • Integrity can be maintained • Security restrictions can be applied
  • 38. Relational Terminology
        Table (relation): CUSTOMER
        ID   NAME             PHONE           EMP_ID
        201  Unisports        55-2066101      12
        202  Simms Athletics  81-20101        14
        203  Delhi Sports     91-10351        14
        204  Womansport       1-206-104-0103  11
        Each row is a tuple; each column is an attribute.
  • 39. Relational Database Terminology
        • Each row of data in a table is uniquely identified by a primary key (PK)
        • Information in multiple tables can be logically related by foreign keys (FK)
        Table name: EMP (primary key: ID)
        ID  LAST_NAME  FIRST_NAME
        10  Havel      Marta
        11  Magee      Colin
        12  Giljum     Henry
        14  Nguyen     Mai
        Table name: CUSTOMER (primary key: ID; foreign key: EMP_ID → EMP.ID)
        ID   NAME             PHONE           EMP_ID
        201  Unisports        55-2066101      12
        202  Simms Athletics  81-20101        14
        203  Delhi Sports     91-10351        14
        204  Womansport       1-206-104-0103  11
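The EMP and CUSTOMER tables above can be declared with their primary and foreign keys in a few lines of SQL. A minimal sketch using Python's built-in `sqlite3` (the function names are illustrative); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE emp (
            id         INTEGER PRIMARY KEY,
            last_name  TEXT NOT NULL,
            first_name TEXT NOT NULL
        );
        CREATE TABLE customer (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            phone  TEXT,
            emp_id INTEGER REFERENCES emp(id)  -- foreign key to EMP.ID
        );
    """)
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
        (10, "Havel", "Marta"), (11, "Magee", "Colin"),
        (12, "Giljum", "Henry"), (14, "Nguyen", "Mai"),
    ])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
        (201, "Unisports", "55-2066101", 12),
        (202, "Simms Athletics", "81-20101", 14),
        (203, "Delhi Sports", "91-10351", 14),
        (204, "Womansport", "1-206-104-0103", 11),
    ])
    conn.commit()
    return conn


def customers_with_rep(conn: sqlite3.Connection) -> list:
    """Follow the foreign key CUSTOMER.EMP_ID -> EMP.ID with a join."""
    return conn.execute("""
        SELECT c.name, e.first_name || ' ' || e.last_name
        FROM customer AS c JOIN emp AS e ON c.emp_id = e.id
        ORDER BY c.id
    """).fetchall()
```

The keys also act as integrity rules: inserting a customer whose `emp_id` does not exist in `emp`, or reusing customer ID 201, raises `sqlite3.IntegrityError` instead of silently storing inconsistent data.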
  • 40. Relational Database Terminology: relational operators • Relational – select: rel WHERE boolean-expr – project: rel [ attr-specs ] – join: rel JOIN rel – divide: rel DIVIDEBY rel • Set-based – rel UNION rel – rel INTERSECT rel – rel MINUS rel – rel TIMES rel
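The eight operators on this slide can be expressed directly on Python sets, which makes their set-theoretic nature visible. In this sketch a relation is a heading plus a frozenset of tuples; the class and function names are illustrative, `join` is a natural join on shared attribute names, and `times` assumes disjoint attribute names. Because a natural join matches on equal names, the usage below renames the two `ID` columns of the slide's tables to `CUST_ID` and `EMP_ID`.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Relation:
    """A relation: a heading (attribute names) plus a *set* of tuples."""
    attrs: Tuple[str, ...]
    rows: FrozenSet[tuple]


def select(rel: Relation, pred: Callable[[dict], bool]) -> Relation:
    """rel WHERE boolean-expr: keep the tuples satisfying the predicate."""
    return Relation(rel.attrs, frozenset(
        r for r in rel.rows if pred(dict(zip(rel.attrs, r)))))


def project(rel: Relation, attrs) -> Relation:
    """rel [attr-specs]: keep some columns; duplicates vanish because rows is a set."""
    idx = [rel.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(r[i] for i in idx) for r in rel.rows))


def times(a: Relation, b: Relation) -> Relation:
    """rel TIMES rel: Cartesian product (assumes disjoint attribute names)."""
    return Relation(a.attrs + b.attrs, frozenset(x + y for x in a.rows for y in b.rows))


def join(a: Relation, b: Relation) -> Relation:
    """rel JOIN rel: natural join on the attributes both relations share."""
    common = [n for n in a.attrs if n in b.attrs]
    extra = [n for n in b.attrs if n not in common]
    pairs = [(a.attrs.index(n), b.attrs.index(n)) for n in common]
    keep_b = [b.attrs.index(n) for n in extra]
    return Relation(a.attrs + tuple(extra), frozenset(
        x + tuple(y[j] for j in keep_b)
        for x in a.rows for y in b.rows
        if all(x[i] == y[j] for i, j in pairs)))


def _check_heading(a: Relation, b: Relation) -> None:
    if a.attrs != b.attrs:
        raise ValueError("set operators need relations with the same heading")


def union(a: Relation, b: Relation) -> Relation:
    _check_heading(a, b)
    return Relation(a.attrs, a.rows | b.rows)


def intersect(a: Relation, b: Relation) -> Relation:
    _check_heading(a, b)
    return Relation(a.attrs, a.rows & b.rows)


def minus(a: Relation, b: Relation) -> Relation:
    _check_heading(a, b)
    return Relation(a.attrs, a.rows - b.rows)


def divide(a: Relation, b: Relation) -> Relation:
    """a DIVIDEBY b: values of a's other attributes paired with *every* tuple of b."""
    keep = tuple(n for n in a.attrs if n not in b.attrs)
    ik = [a.attrs.index(n) for n in keep]
    ib = [a.attrs.index(n) for n in b.attrs]
    pairs = {(tuple(r[i] for i in ik), tuple(r[i] for i in ib)) for r in a.rows}
    candidates = {k for k, _ in pairs}
    return Relation(keep, frozenset(
        k for k in candidates if all((k, y) in pairs for y in b.rows)))
```

For example, `minus(project(emp, ["EMP_ID"]), project(cust, ["EMP_ID"]))` answers "which employees have no customers?" (only ID 10 in the slide's data), and an SQL engine evaluates such queries by composing exactly these operators.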
  • 41. Disadvantages • size • complexity • cost • Additional hardware costs • Higher impact of failure • Recovery more difficult
  • 42. • RDBMS products – Free • MySQL: very fast, widely used, easy to jump into, but limited, non-standard SQL • PostgreSQL: full SQL, limited OO, steeper learning curve than MySQL – Commercial • MS Access: great query builder, GUI interfaces • MS SQL Server: full SQL, NT only • Oracle: everything, including the kitchen sink • IBM DB2, Sybase
  • 43. Example 3-tier model in a biological database http://www.bioinformatics.be Example of different interfaces to the same back-end database (MySQL)
  • 47. Conclusions • A database is a central component of any contemporary information system • The operations on the database and the maintenance of database consistency are handled by a DBMS • There are stand-alone query languages and embedded languages, but both deal with definition (DDL) and manipulation (DML) aspects • The structural properties, constraints and operations permitted within a DBMS are defined by a data model: hierarchical, network, relational • Recovery and concurrency control are essential • Linking heterogeneous data sources is a central theme in modern bioinformatics
  • 48. What is to come? Basic outline • Set up an RDBMS • OLTP access through CLI, dedicated client, PHP, Perl/Python • OLAP access through Perl/Python, R … Integration • Cytoscape Semantic Web • NoSQL/Hadoop • SPARQL
  • 50. 3/05/2016 Project Biological Databases 2015-2016 Biological Databases Bruno Verstraeten, Arthur Zwaenepoel, Jules Haezebrouck, Laurenz De Cock, Jonathan Walgraeve, Cedric Bogaert, Dries Schaumont
  • 51. What is Minecraft? • Sandbox game • Designed by Markus “Notch” Persson • Developed by Mojang • Bought by Microsoft in 2014 • 70 million copies sold (June 2015)
  • 54. Third-party mods • Extra content made by users • Adding items, magic and features to the original game • The true beauty of Minecraft
  • 55. And now Sciencecraft • Visualizing proteins in Minecraft • Minecraft Tools Python package • Data directly from PDB flat files or from the PDB server • Spigot Minecraft server
  • 56. The basics 1. Start a server with Minecraft Tools 2. Using Python, import the PDB file 3. Retrieve the coordinates from the file 4. Using the setBlock function, place blocks of specific colours in the Minecraft server to represent the protein 5. Fly around and take screenshots
  • 57. Minecraft programming from Python
        # Connect to Minecraft
        from mcpi.minecraft import Minecraft
        mc = Minecraft.create()
        # Set x, y, and z variables to represent coordinates
        x = 10.0
        y = 110.0
        z = 12.0
        # Change the player's position
        mc.player.setPos(x, y, z)
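Steps 2 to 4 of "The basics" (read the PDB file, extract coordinates, place coloured blocks) can be sketched as follows. Assumptions, not taken from the project: coordinates are read from the fixed columns of PDB `ATOM`/`HETATM` records (x, y, z in columns 31–54, element in 77–78); each atom is rounded to one block; the blocks are wool (block id 35 in the mcpi API) with a hypothetical CPK-like colour mapping. `draw` only needs an object with an mcpi-style `setBlock(x, y, z, blockId, data)` method, so the parsing can be tested without a running server.

```python
WOOL = 35  # wool block id in the Minecraft Pi / mcpi API
# Hypothetical CPK-like colour scheme using wool data values.
ELEMENT_COLOUR = {"C": 7, "N": 11, "O": 14, "S": 4}  # grey, blue, red, yellow
DEFAULT_COLOUR = 0  # white


def parse_atoms(lines):
    """Yield (element, x, y, z) for ATOM/HETATM records of a PDB flat file."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            # Fall back to the first letter of the atom name if the element column is empty.
            element = line[76:78].strip() or line[12:16].strip()[:1]
            yield element, x, y, z


def to_blocks(atoms, origin=(0, 100, 0), scale=1.0):
    """Map atom coordinates (in Å) to integer block positions with a wool colour."""
    ox, oy, oz = origin
    blocks = {}
    for element, x, y, z in atoms:
        pos = (ox + round(x * scale), oy + round(y * scale), oz + round(z * scale))
        blocks[pos] = ELEMENT_COLOUR.get(element, DEFAULT_COLOUR)  # last atom wins per block
    return blocks


def draw(mc, blocks):
    """Place one coloured wool block per position via setBlock(x, y, z, id, data)."""
    for (x, y, z), colour in blocks.items():
        mc.setBlock(x, y, z, WOOL, colour)


# Usage against a running server (file name is hypothetical):
#   from mcpi.minecraft import Minecraft
#   mc = Minecraft.create()
#   with open("protein.pdb") as fh:
#       draw(mc, to_blocks(parse_atoms(fh), origin=(0, 100, 0), scale=1.0))
```

Raising `scale` spreads atoms over more blocks, so fewer atoms share a block, at the cost of a larger model to fly around.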
  • 61. Retrieving PDB data using SPARQL • PDB available in RDF (wwPDB) • Using the Python SPARQLWrapper package
  • 62. Using SPARQL with Python – SPARQLWrapper SPARQL endpoint
  • 63. Using SPARQL with Python – SPARQLWrapper “Search engine” • Naive, regex-based • Returns a list of all PDB entries containing a certain keyword, with organism name and full description • The PDB entry can then be retrieved with the previous query
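A naive regex-based keyword search of this kind can be sketched with SPARQLWrapper as below. The endpoint URL and the IRI of the title predicate are placeholders you must look up in the wwPDB RDF documentation; they are not given on the slides. The keyword is restricted to letters, digits and spaces so it can be pasted into the SPARQL regex without escaping. Results are read from the standard SPARQL 1.1 JSON results format.

```python
import re


def keyword_query(title_predicate: str, keyword: str, limit: int = 50) -> str:
    """Build a naive, regex-based SPARQL search over entry titles.

    title_predicate is a placeholder IRI (look it up in the wwPDB RDF schema).
    """
    if not re.fullmatch(r"[A-Za-z0-9 ]+", keyword):
        # Keeping the keyword alphanumeric avoids regex and string-literal escaping issues.
        raise ValueError("keyword must contain only letters, digits and spaces")
    return (
        "SELECT ?entry ?title WHERE {\n"
        f"  ?entry <{title_predicate}> ?title .\n"
        f'  FILTER(regex(str(?title), "{keyword}", "i"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query with SPARQLWrapper and return the decoded JSON results."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party: pip install SPARQLWrapper
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


def bindings(results: dict, *names: str) -> list:
    """Flatten SPARQL 1.1 JSON results into tuples of plain values (None if unbound)."""
    return [tuple(b[n]["value"] if n in b else None for n in names)
            for b in results["results"]["bindings"]]
```

A typical call would be `bindings(run_query(ENDPOINT, keyword_query(TITLE_IRI, "kinase")), "entry", "title")`, where `ENDPOINT` and `TITLE_IRI` are the placeholders mentioned above. `regex` forces the endpoint to scan every title, which is why the slide calls this approach naive.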
  • 65. Using SPARQL with Python – SPARQLWrapper
        Retrieve the .xml.gz file: the actual structure information is in the XML file
        <?xml version="1.0" encoding="UTF-8" ?>
        <PDBx:datablock datablockName="1O9K"
            xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://pdbml.pdb.org/schema/pdbx-v40.xsd pdbx-v40.xsd">
          <PDBx:atom_siteCategory>
            <PDBx:atom_site id="1">
              <PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv>
              <PDBx:Cartn_x>13.258</PDBx:Cartn_x>
              <PDBx:Cartn_y>142.706</PDBx:Cartn_y>
              <PDBx:Cartn_z>30.410</PDBx:Cartn_z>
              <PDBx:auth_asym_id>A</PDBx:auth_asym_id>
              <PDBx:auth_atom_id>N</PDBx:auth_atom_id>
              <PDBx:auth_comp_id>MET</PDBx:auth_comp_id>
              <PDBx:auth_seq_id>379</PDBx:auth_seq_id>
              <PDBx:group_PDB>ATOM</PDBx:group_PDB>
              <PDBx:label_alt_id xsi:nil="true" />
              <PDBx:label_asym_id>A</PDBx:label_asym_id>
              <PDBx:label_atom_id>N</PDBx:label_atom_id>
              <PDBx:label_comp_id>MET</PDBx:label_comp_id>
              <PDBx:label_entity_id>1</PDBx:label_entity_id>
              <PDBx:label_seq_id>8</PDBx:label_seq_id>
              <PDBx:occupancy>1.00</PDBx:occupancy>
              ….
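Once the `.xml.gz` file is downloaded, the atom coordinates can be pulled out with the standard library alone. A minimal sketch, assuming the PDBML layout shown above (namespace `http://pdbml.pdb.org/schema/pdbx-v40.xsd`, one `PDBx:atom_site` element per atom with `Cartn_x/y/z` and `auth_*` children); the function names and the file name in the usage comment are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace taken from the PDBML header shown on the slide.
NS = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v40.xsd"}


def _field(site: ET.Element, tag: str) -> str:
    """Text of one PDBx child element, e.g. <PDBx:Cartn_x>."""
    return site.findtext(f"PDBx:{tag}", namespaces=NS)


def atoms_from_pdbml(data: bytes) -> list:
    """Return (residue, seq_id, atom, x, y, z) for every atom_site in gzipped PDBML."""
    root = ET.fromstring(gzip.decompress(data))
    atoms = []
    for site in root.iterfind(".//PDBx:atom_site", NS):
        atoms.append((
            _field(site, "auth_comp_id"),
            int(_field(site, "auth_seq_id")),
            _field(site, "auth_atom_id"),
            float(_field(site, "Cartn_x")),
            float(_field(site, "Cartn_y")),
            float(_field(site, "Cartn_z")),
        ))
    return atoms


# Usage with a downloaded file (file name is hypothetical):
#   with open("1o9k.xml.gz", "rb") as fh:
#       atoms = atoms_from_pdbml(fh.read())
```

For the first atom on the slide this yields `("MET", 379, "N", 13.258, 142.706, 30.41)`, which is exactly the information the Minecraft step needs.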
  • 68. INTERNATIONAL GENETICALLY ENGINEERED MACHINE (iGEM) ➤ Annual synthetic biology competition ➤ Making new organisms: BioBricks ➤ Hosted by MIT: five teams in 2004, 130 teams in 2016
  • 69. PAST IGEM WINNERS ➤ 2014: biosensor for olive oil quality ➤ 2015: 3D printing of biofilms ➤ 2016: system for the control of co-culture stability
  • 72. FOUR WORK PACKAGES WP2: Filament WP3: Biofunction WP1: Shape WP4: Measurement
  • 80. Presenting, learning and having fun in Boston
  • 81. FOLLOW UP ➤ Maker City ➤ BrainBooster session CropDesign ➤ Biodesign competition ➤ Bachelor project on 3D printing ➤ PLOS iGEM collection
  • 82. Next steps: ideas for the next iGEM teams …
  • 83. v2 • Thermodynamically compatible test setting • Real-life testing with UCSC (make a bigger version?) and/or green “plantable” versions for field tests (self-watering plant?) • Introduce a temperature gradient? Blend the current dewpal with a solar and/or wind energy source …
  • 85. An even wilder idea … why go to the trouble of collecting water out of the air? Why not simply cause more rain to fall? With our INP? http://science.howstuffworks.com/environmental/earth/geophysics/manufacture-water1.htm
  • 86. Alternative projects • Environmental remediation projects (e.g. ISS water recovery: change the freezing point … contact Prof. Arne Verliefde) • Molecular diagnostics: liquid biopsy project. Capture (methylated) cancer molecules from blood and/or urine; in oncology, we can run head-to-head samples from clinical trials in bladder and/or prostate cancer