SQL, noSQL or no database at all?
Are databases still a core skill?
Neil Saunders
COMPUTATIONAL INFORMATICS
www.csiro.au
Databases: Slide 2 of 24
alternative title: should David Lovell learn databases?
Databases: Slide 3 of 24
actual recent email request
Hi Neil,
I was wondering if you could help me with something. I am trying to put
together a table but it is rather slow by hand. Do you know if you can
help me with this task with a script? If it is too much of your time,
don’t worry about it. Just thought I’d ask before I start.
The task is:
The targets listed in A tab need to be found in B tab then the entire row
copied into C tab. Then the details in column C of C tab then need to be
matched with the details in D tab so that the patients with the mutations
are listed in row AG and AH of C tab.
Again, if this isn’t an easy task for you then don’t worry about it.
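The request boils down to a filter (keep rows of B whose target is in A) followed by a join (attach patients from D by a shared column). A minimal sketch in plain Ruby, assuming invented data: the tab contents, the column names (`target`, `mutation`, `patient`) and the choice of join column are all hypothetical, since the real spreadsheet is not shown.

```ruby
# Hypothetical stand-ins for the spreadsheet tabs; every value here is invented.
targets = ["BRAF", "KRAS"]                        # tab A: targets of interest
annotations = [                                   # tab B: one row per target
  { target: "BRAF", chrom: "chr7",  mutation: "V600E" },
  { target: "TP53", chrom: "chr17", mutation: "R175H" },
  { target: "KRAS", chrom: "chr12", mutation: "G12D" }
]
patients = [                                      # tab D: patients and their mutations
  { patient: "P1", mutation: "V600E" },
  { patient: "P2", mutation: "G12D" },
  { patient: "P3", mutation: "V600E" }
]

# Step 1 (tab C): keep the rows of B whose target appears in A.
selected = annotations.select { |row| targets.include?(row[:target]) }

# Step 2: index patients by mutation, then attach the matches to each row.
by_mutation = patients.group_by { |pt| pt[:mutation] }
result = selected.map do |row|
  matched = by_mutation.fetch(row[:mutation], []).map { |pt| pt[:patient] }
  row.merge(patients: matched)
end

result.each { |row| puts row.inspect }
```

Building the `by_mutation` index once avoids rescanning tab D for every row, which is the same idea a database index serves.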
Databases: Slide 4 of 24
sounds like a database to me (c. 2004)
Databases: Slide 5 of 24
database design is a profession in itself
-- KEGG_DB schema
CREATE TABLE ec2go (
  ec_no VARCHAR(16) NOT NULL,            -- EC number (with "EC:" prefix)
  go_id CHAR(10) NOT NULL                -- GO ID
);
CREATE TABLE pathway2gene (
  pathway_id CHAR(8) NOT NULL,           -- KEGG pathway long ID
  gene_id VARCHAR(20) NOT NULL           -- Entrez Gene or ORF ID
);
CREATE TABLE pathway2name (
  path_id CHAR(5) NOT NULL UNIQUE,       -- KEGG pathway short ID
  path_name VARCHAR(80) NOT NULL UNIQUE  -- KEGG pathway name
);
-- Indexes.
CREATE INDEX Ipathway2gene ON pathway2gene (gene_id);
Databases: Slide 6 of 24
know your ORM from your MVC
(do you DSL?)
http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
Databases: Slide 7 of 24
my one tip for today: use ORM
= object relational mapping
#!/usr/bin/ruby
require 'sequel'
# connect to UCSC Genomes MySQL server
DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
:user => "genome", :database => "hg19")
# instead of "SELECT count(*) FROM knownGene"
DB.from(:knownGene).count
# => 82960
# instead of "SELECT name, chrom, txStart FROM knownGene LIMIT 1"
DB.from(:knownGene).select(:name, :chrom, :txStart).first
# => {:name=>"uc001aaa.3", :chrom=>"chr1", :txStart=>11873}
# instead of "SELECT name FROM knownGene WHERE chrom = 'chrM'"
DB.from(:knownGene).select(:name).where(:chrom => "chrM").all
# => [{:name=>"uc004coq.4"}, {:name=>"uc022bqo.2"}, {:name=>"uc004cor.1"}, {:name=>"uc004cos.5"},
# {:name=>"uc022bqp.1"}, {:name=>"uc022bqq.1"}, {:name=>"uc022bqr.1"}, {:name=>"uc031tga.1"},
# {:name=>"uc022bqs.1"}, {:name=>"uc011mfi.2"}, {:name=>"uc022bqt.1"}, {:name=>"uc022bqu.2"},
# {:name=>"uc004cov.5"}, {:name=>"uc031tgb.1"}, {:name=>"uc004cow.2"}, {:name=>"uc004cox.4"},
# {:name=>"uc022bqv.1"}, {:name=>"uc022bqw.1"}, {:name=>"uc022bqx.1"}, {:name=>"uc004coz.1"}]
Databases: Slide 8 of 24
don’t want to CREATE? you still might want to SELECT
Question: How to map a SNP to a gene around +/- 60KB ?
I am looking at a bunch of SNPs. Some of them are part of genes,
but other are not. I am interested to look up +60KB or -60KB of
those SNPs to get details about some nearby genes. Please share
your experience in dealing with such a situation or thoughts on
any methods that can do this. Thanks in advance.
http://www.biostars.org/p/413/
Databases: Slide 9 of 24
example SELECT
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
  select
    K.proteinID, K.name, S.name,
    S.avHet, S.chrom, S.chromStart,
    K.txStart, K.txEnd
  from snp130 as S
  left join knownGene as K on
    (S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or
                               S.chromEnd + 60000 < K.txStart))
  where
    S.name in ("rs25","rs100","rs75","rs9876","rs101")
'
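The join condition is an interval-overlap test with a 60 kb margin: a gene is "nearby" unless it ends more than 60,000 bases before the SNP starts, or starts more than 60,000 bases after the SNP ends. The same logic as a small Ruby sketch; the SNP, the genes and the hash keys are invented for illustration and are not UCSC data.

```ruby
# True when the SNP lies within `window` bases of the gene, mirroring the
# not(... or ...) condition in the SQL join above.
def near?(snp, gene, window = 60_000)
  return false unless snp[:chrom] == gene[:chrom]
  !(gene[:tx_end] + window < snp[:start] || snp[:end] + window < gene[:tx_start])
end

# Hypothetical coordinates, chosen only to exercise each branch.
snp = { name: "rsX", chrom: "chr1", start: 100_000, end: 100_001 }
genes = [
  { name: "geneA", chrom: "chr1", tx_start: 150_000, tx_end: 160_000 }, # inside the window
  { name: "geneB", chrom: "chr1", tx_start: 200_000, tx_end: 210_000 }, # too far downstream
  { name: "geneC", chrom: "chr2", tx_start: 100_000, tx_end: 110_000 }  # different chromosome
]

nearby = genes.select { |g| near?(snp, g) }.map { |g| g[:name] }
puts nearby.inspect
```

Negating the two "too far" cases is shorter than enumerating every way two intervals can overlap, which is why the SQL is written that way.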
Databases: Slide 10 of 24
example SELECT result
Databases: Slide 11 of 24
let’s talk about noSQL
http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
Databases: Slide 12 of 24
(potentially) a good fit for biological data
Databases: Slide 13 of 24
many data sources are “key-value ready”
(or close enough)
http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json
[
{
"2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
"2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
"25796": "PGLS; 6-phosphogluconolactonase [KO:K01057] [EC:3.1.1.31]",
...
"5213": "PFKM; phosphofructokinase, muscle [KO:K00850] [EC:2.7.1.11]",
"5214": "PFKP; phosphofructokinase, platelet [KO:K00850] [EC:2.7.1.11]",
"5211": "PFKL; phosphofructokinase, liver [KO:K00850] [EC:2.7.1.11]"
}
]
Databases: Slide 14 of 24
schema-free: save first, worry later
(= agile)
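#!/usr/bin/ruby
require "mongo"
require "json/pure"
require "open-uri"

# connect to a local MongoDB (Mongo::Connection is the 1.x Ruby driver API)
db = Mongo::Connection.new.db('kegg')
col = db.collection('genes')
# the response is an array holding one hash of gene ID => description
j = JSON.parse(open("http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json").read)
j.each do |g|
  g.each_pair do |key, val|
    # one document per gene, keyed on the gene ID
    gene = { :_id => key, :desc => val }
    col.save(gene)
  end
end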
Ruby code to save JSON from the TogoWS REST service
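The "worry later" step might look like the sketch below: the same key-value pairs reshaped into slightly richer documents, with no database involved. It uses three entries copied from the TogoWS response on the earlier slide; the field names (`symbol`, `ec`) and the assumption that each description follows the pattern "SYMBOL; name [KO:...] [EC:...]" are mine, not part of the service's documentation.

```ruby
require "json"

# Three entries copied from the TogoWS response shown earlier.
raw = <<~EOS
  [
    {
      "2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
      "2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
      "25796": "PGLS; 6-phosphogluconolactonase [KO:K01057] [EC:3.1.1.31]"
    }
  ]
EOS

# One document per gene; field names are hypothetical.
docs = JSON.parse(raw).flat_map do |genes|
  genes.map do |id, desc|
    symbol, rest = desc.split("; ", 2)           # "GPI", "glucose-6-phosphate ..."
    { _id: id, symbol: symbol, desc: rest,
      ec: desc[/\[EC:([^\]]+)\]/, 1] }           # nil when no EC tag is present
  end
end

docs.each { |d| puts d.inspect }
```

Because the store is schema-free, documents saved earlier with only `_id` and `desc` can sit alongside these without a migration.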
Databases: Slide 15 of 24
example application - PMRetract
ask later if interested
http://pmretract.heroku.com/
https://github.com/neilfws/PubMed/tree/master/retractions
Databases: Slide 16 of 24
when rows + columns != database
- sometimes a database is overkill
Databases: Slide 17 of 24
example 1 - R/IRanges
Databases: Slide 18 of 24
example 2 - bedtools
http://bedtools.readthedocs.org/en/latest/
Databases: Slide 19 of 24
example 3 - unix join (and the shell in general)
Databases: Slide 20 of 24
when are databases good?
- when data are updated frequently
- when multiple users do the updating
- when queries are complex or ever-changing
- as backends to web applications
Databases: Slide 21 of 24
when are databases not/less good?
- for basic “set operations”
- for sequence data [1]
(?)
[1] no time to discuss BioSQL, GBrowse/Bio::DB::GFF, BioDAS etc.
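For basic set operations, the language's built-in collections are usually enough. A minimal Ruby sketch with two invented gene lists (the names are placeholders, not data from the talk):

```ruby
# Two hypothetical ID lists, e.g. from two experiments.
list_a = %w[BRAF KRAS TP53 EGFR]
list_b = %w[KRAS EGFR PTEN]

both   = list_a & list_b   # intersection: in both lists
either = list_a | list_b   # union: in at least one list, duplicates removed
only_a = list_a - list_b   # difference: in A but not in B

puts both.inspect
puts either.inspect
puts only_a.inspect
```

No schema, no server, no import step: for one-off comparisons like this a database adds work without adding much.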
Databases: Slide 22 of 24
so how did I answer that email?
options(java.parameters = "-Xmx4g")
library(XLConnect)
wb <- loadWorkbook("~/Downloads/NGS Target list Tumour for Neil.xlsx")
s1 <- readWorksheet(wb, sheet = 1, startCol = 1, endCol = 1, header = FALSE)
s2 <- readWorksheet(wb, sheet = 2, startCol = 1, endCol = 32, header = TRUE)
s4 <- readWorksheet(wb, sheet = 4, startCol = 1, endCol = 3, header = TRUE)
# then use gsub, match, %in% etc. to clean and join the data
# ...
Read spreadsheet into R using the XLConnect package, then “munge”

20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 

SQL, noSQL or no database at all? Are databases still a core skill?

  • 1. SQL, noSQL or no database at all? Are databases still a core skill? Neil Saunders COMPUTATIONAL INFORMATICS www.csiro.au
  • 2. Databases: Slide 2 of 24 alternative title: should David Lovell learn databases?
  • 3. Databases: Slide 3 of 24 actual recent email request
    Hi Neil,
    I was wondering if you could help me with something. I am trying to put together a table but it is rather slow by hand. Do you know if you can help me with this task with a script? If it is too much of your time, don’t worry about it. Just thought I’d ask before I start.
    The task is:
    The targets listed in A tab need to be found in B tab then the entire row copied into C tab. Then the details in column C of C tab then need to be matched with the details in D tab so that the patients with the mutations are listed in row AG and AH of C tab.
    Again, if this isn’t an easy task for you then don’t worry about it.
  • 4. Databases: Slide 4 of 24 sounds like a database to me (c. 2004)
  • 5. Databases: Slide 5 of 24 database design is a profession in itself
    -- KEGG_DB schema
    CREATE TABLE ec2go (
      ec_no VARCHAR(16) NOT NULL, -- EC number (with "EC:" prefix)
      go_id CHAR(10) NOT NULL -- GO ID
    );
    CREATE TABLE pathway2gene (
      pathway_id CHAR(8) NOT NULL, -- KEGG pathway long ID
      gene_id VARCHAR(20) NOT NULL -- Entrez Gene or ORF ID
    );
    CREATE TABLE pathway2name (
      path_id CHAR(5) NOT NULL UNIQUE, -- KEGG pathway short ID
      path_name VARCHAR(80) NOT NULL UNIQUE -- KEGG pathway name
    );
    -- Indexes.
    CREATE INDEX Ipathway2gene ON pathway2gene (gene_id);
  • 6. Databases: Slide 6 of 24 know your ORM from your MVC (do you DSL?) http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
  • 7. Databases: Slide 7 of 24 my one tip for today: use ORM = object relational mapping
    #!/usr/bin/ruby
    require 'sequel'
    # connect to UCSC Genomes MySQL server
    DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
                        :user => "genome", :database => "hg19")
    # instead of "SELECT count(*) FROM knownGene"
    DB.from(:knownGene).count
    # => 82960
    # instead of "SELECT name, chrom, txStart FROM knownGene LIMIT 1"
    DB.from(:knownGene).select(:name, :chrom, :txStart).first
    # => {:name=>"uc001aaa.3", :chrom=>"chr1", :txStart=>11873}
    # instead of "SELECT name FROM knownGene WHERE chrom = 'chrM'"
    DB.from(:knownGene).select(:name).where(:chrom => "chrM").all
    # => [{:name=>"uc004coq.4"}, {:name=>"uc022bqo.2"}, {:name=>"uc004cor.1"}, {:name=>"uc004cos.5"},
    #     {:name=>"uc022bqp.1"}, {:name=>"uc022bqq.1"}, {:name=>"uc022bqr.1"}, {:name=>"uc031tga.1"},
    #     {:name=>"uc022bqs.1"}, {:name=>"uc011mfi.2"}, {:name=>"uc022bqt.1"}, {:name=>"uc022bqu.2"},
    #     {:name=>"uc004cov.5"}, {:name=>"uc031tgb.1"}, {:name=>"uc004cow.2"}, {:name=>"uc004cox.4"},
    #     {:name=>"uc022bqv.1"}, {:name=>"uc022bqw.1"}, {:name=>"uc022bqx.1"}, {:name=>"uc004coz.1"}]
  • 8. Databases: Slide 8 of 24 don’t want to CREATE? you still might want to SELECT Question: How to map a SNP to a gene around +/- 60KB ? I am looking at a bunch of SNPs. Some of them are part of genes, but other are not. I am interested to look up +60KB or -60KB of those SNPs to get details about some nearby genes. Please share your experience in dealing with such a situation or thoughts on any methods that can do this. Thanks in advance. http://www.biostars.org/p/413/
  • 9. Databases: Slide 9 of 24 example SELECT
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
      select K.proteinID, K.name, S.name, S.avHet, S.chrom, S.chromStart, K.txStart, K.txEnd
      from snp130 as S
      left join knownGene as K on
        (S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or S.chromEnd + 60000 < K.txStart))
      where S.name in ("rs25","rs100","rs75","rs9876","rs101")
    '
  • 10. Databases: Slide 10 of 24 example SELECT result
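The join condition on slide 9 is an interval-overlap test: a gene counts as "nearby" unless it ends more than 60 kb before the SNP or starts more than 60 kb after it. A minimal Ruby sketch of the same predicate, assuming made-up coordinates (the `Gene` and `Snp` structs, `near?`, and all values below are hypothetical, not UCSC data):

```ruby
# Hypothetical records for illustration only; not real hg19 annotations.
Gene = Struct.new(:name, :chrom, :tx_start, :tx_end)
Snp  = Struct.new(:name, :chrom, :chrom_start, :chrom_end)

WINDOW = 60_000

# Same logic as the SQL join condition:
#   S.chrom = K.chrom and
#   not (K.txEnd + window < S.chromStart or S.chromEnd + window < K.txStart)
def near?(snp, gene, window = WINDOW)
  return false unless snp.chrom == gene.chrom
  !(gene.tx_end + window < snp.chrom_start || snp.chrom_end + window < gene.tx_start)
end

genes = [
  Gene.new("geneA", "chr1", 100_000, 120_000), # ends 50 kb before the SNP
  Gene.new("geneB", "chr1", 500_000, 510_000), # starts ~330 kb after the SNP
  Gene.new("geneC", "chr2", 100_000, 120_000)  # different chromosome
]
snp = Snp.new("rsExample", "chr1", 170_000, 170_001)

nearby = genes.select { |g| near?(snp, g) }.map(&:name)
puts nearby.inspect
```

Only `geneA` passes the test here. For a handful of SNPs this loop is fine; for whole-genome tables the database (or an interval tool) does the same comparison far more efficiently.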
  • 11. Databases: Slide 11 of 24 let’s talk about noSQL http://www.infoivy.com/2013/07/nosql-database-comparison-chart-only.html
  • 12. Databases: Slide 12 of 24 (potentially) a good fit for biological data
  • 13. Databases: Slide 13 of 24 many data sources are “key-value ready” (or close enough)
    http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json
    [
      {
        "2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
        "2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]",
        "25796": "PGLS; 6-phosphogluconolactonase [KO:K01057] [EC:3.1.1.31]",
        ...
        "5213": "PFKM; phosphofructokinase, muscle [KO:K00850] [EC:2.7.1.11]",
        "5214": "PFKP; phosphofructokinase, platelet [KO:K00850] [EC:2.7.1.11]",
        "5211": "PFKL; phosphofructokinase, liver [KO:K00850] [EC:2.7.1.11]"
      }
    ]
  • 14. Databases: Slide 14 of 24 schema-free: save first, worry later (= agile)
    #!/usr/bin/ruby
    require "mongo"
    require "json/pure"
    require "open-uri"
    # Mongo::Connection is the legacy (1.x) Ruby driver API
    db = Mongo::Connection.new.db('kegg')
    col = db.collection('genes')
    # on Ruby 3.0 and later, use URI.open instead of open for URLs
    j = JSON.parse(open("http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json").read)
    j.each do |g|
      gene = Hash.new
      # one document per gene: the JSON key becomes _id, the value becomes desc
      g.each_pair do |key, val|
        gene[:_id] = key
        gene[:desc] = val
        col.save(gene)
      end
    end
    Ruby code to save JSON from the TogoWS REST service
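The "save first, worry later" idea can be tried without a running MongoDB. The sketch below uses only Ruby's standard library and two entries copied from the slide 13 response; the `docs` and `symbols` names are illustrative. It builds the same `_id`/`desc` documents as the loop on slide 14, then derives a gene symbol afterwards, which is the "worry later" step.

```ruby
require "json"

# Two entries copied from the TogoWS response shown on slide 13.
raw = <<~JSON
  [ { "2821": "GPI; glucose-6-phosphate isomerase [KO:K01810] [EC:5.3.1.9]",
      "2539": "G6PD; glucose-6-phosphate dehydrogenase [KO:K00036] [EC:1.1.1.49]" } ]
JSON

# Save first: one document per key-value pair, no schema declared up front.
docs = JSON.parse(raw).flat_map do |entry|
  entry.map { |id, desc| { _id: id, desc: desc } }
end

# Worry later: pull structure out of the free-text description when needed.
symbols = docs.map { |d| d[:desc].split(";").first }

docs.each { |d| puts d.inspect }
puts symbols.inspect
```

In a document store each hash in `docs` would be written as-is; nothing about the later parsing has to be decided at save time.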
  • 15. Databases: Slide 15 of 24 example application - PMRetract ask later if interested http://pmretract.heroku.com/ https://github.com/neilfws/PubMed/tree/master/retractions
  • 16. Databases: Slide 16 of 24 when rows + columns != database - sometimes a database is overkill
  • 17. Databases: Slide 17 of 24 example 1 - R/IRanges
  • 18. Databases: Slide 18 of 24 example 2 - bedtools http://bedtools.readthedocs.org/en/latest/
  • 19. Databases: Slide 19 of 24 example 3 - unix join (and the shell in general)
  • 20. Databases: Slide 20 of 24 when are databases good?
    - when data are updated frequently
    - when multiple users do the updating
    - when queries are complex or ever-changing
    - as backends to web applications
  • 21. Databases: Slide 21 of 24 when are databases not/less good?
    - for basic “set operations”
    - for sequence data [1] (?)
    [1] no time to discuss BioSQL, GBrowse/Bio::DB::GFF, BioDAS etc.
  • 22. Databases: Slide 22 of 24 so how did I answer that email?
    options(java.parameters = "-Xmx4g")
    library(XLConnect)
    wb <- loadWorkbook("~/Downloads/NGS Target list Tumour for Neil.xlsx")
    s1 <- readWorksheet(wb, sheet = 1, startCol = 1, endCol = 1, header = FALSE)
    s2 <- readWorksheet(wb, sheet = 2, startCol = 1, endCol = 32, header = TRUE)
    s4 <- readWorksheet(wb, sheet = 4, startCol = 1, endCol = 3, header = TRUE)
    # then use gsub, match, %in% etc. to clean and join the data
    # ...
    Read spreadsheet into R using the XLConnect package, then “munge”
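The "clean and join" step is left as a comment on the slide. As a rough illustration of what it involves, here is a Ruby sketch of the same idea, not the author's R code. Every tab layout, column name and value is a hypothetical stand-in, since the real spreadsheet is not shown: clean the keys, keep the rows of tab B whose target appears in tab A, then attach the matching patients from tab D.

```ruby
require "set"

# Hypothetical stand-ins for the spreadsheet tabs; the real layout is not shown.
targets = ["BRAF ", "kras"] # tab A: target names, not yet cleaned
tab_b = [                   # tab B: one row per target
  { target: "BRAF", chrom: "chr7" },
  { target: "KRAS", chrom: "chr12" },
  { target: "TP53", chrom: "chr17" }
]
tab_d = [                   # tab D: patient mutations
  { target: "BRAF", patient: "P01" },
  { target: "BRAF", patient: "P02" },
  { target: "TP53", patient: "P03" }
]

# Clean keys so that "kras" and "KRAS " match (the gsub step in R).
clean = ->(s) { s.strip.upcase }
wanted = targets.map(&clean).to_set

# Index tab D by target (the match step in R).
patients_by_target = tab_d
  .group_by { |r| clean.(r[:target]) }
  .transform_values { |rows| rows.map { |r| r[:patient] } }

# Tab C: rows of B whose target is wanted (%in% in R), plus their patients.
tab_c = tab_b
  .select { |row| wanted.include?(clean.(row[:target])) }
  .map { |row| row.merge(patients: patients_by_target.fetch(clean.(row[:target]), [])) }

tab_c.each { |row| puts row.inspect }
```

With these inputs tab C holds the BRAF row with two patients and the KRAS row with none. No database is involved, which is the point of slides 16 to 21: a filter plus a lookup is a basic set operation.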