FBW
12-03-2019
Biological Databases
Wim Van Criekinge
Data Warehousing and Decision Support
Views and Decision Support
• OLAP queries are typically aggregate queries.
– Precomputation is essential for interactive response
times.
– The CUBE is in fact a collection of aggregate
queries, so precomputation is especially important:
much work has gone into choosing what to precompute
given a limited amount of space to store precomputed
results.
• Warehouses can be thought of as a collection
of asynchronously replicated tables and
periodically maintained views.
– This has renewed interest in view maintenance!
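A minimal sketch of why the CUBE is "a collection of aggregate queries": for d grouping columns it corresponds to 2^d GROUP BY queries, one per subset of the columns. The helper name `cube`, the table layout and the sample rows are invented for illustration, and SQLite (from Python's standard library) stands in for a warehouse engine.

```python
import sqlite3
from itertools import combinations

def cube(conn, table, dims, measure):
    """Run one GROUP BY query per subset of dims; return {subset: rows}."""
    results = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            if subset:
                cols = ", ".join(subset)
                sql = (f"SELECT {cols}, SUM({measure}) FROM {table} "
                       f"GROUP BY {cols} ORDER BY {cols}")
            else:
                # Empty grouping set: the grand total over all rows.
                sql = f"SELECT SUM({measure}) FROM {table}"
            results[subset] = conn.execute(sql).fetchall()
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RegionalSales (category TEXT, state TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO RegionalSales VALUES (?, ?, ?)",
    [("Laptop", "Wisconsin", 10), ("Laptop", "Ohio", 5), ("Phone", "Wisconsin", 7)],
)

# Two dimensions give 2^2 = 4 aggregate queries.
cube_results = cube(conn, "RegionalSales", ["category", "state"], "sales")
for grouping, rows in cube_results.items():
    print(grouping, rows)
```

With many dimensions the number of grouping sets grows exponentially, which is why choosing what to precompute under a space budget matters.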
View Modification (Evaluate On Demand)
View:
CREATE VIEW RegionalSales(category,sales,state)
AS SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid
Query:
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R GROUP BY R.category, R.state
Modified Query:
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid) AS R
GROUP BY R.category, R.state
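The rewrite can be checked on a toy instance. The sketch below assumes SQLite as a stand-in database and uses invented sample rows; it runs the query against the view and the modified query with the view definition substituted inline, and compares the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales     (pid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Products  VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO Locations VALUES (1, 'Wisconsin'), (2, 'Ohio');
INSERT INTO Sales     VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (1, 1, 3);

CREATE VIEW RegionalSales(category, sales, state) AS
  SELECT P.category, S.sales, L.state
  FROM Products P, Sales S, Locations L
  WHERE P.pid = S.pid AND S.locid = L.locid;
""")

# Query written against the view (ORDER BY added only to make the
# comparison deterministic).
via_view = conn.execute("""
    SELECT R.category, R.state, SUM(R.sales)
    FROM RegionalSales AS R
    GROUP BY R.category, R.state
    ORDER BY R.category, R.state
""").fetchall()

# Modified query: the view name is replaced by its definition.
modified = conn.execute("""
    SELECT R.category, R.state, SUM(R.sales)
    FROM (SELECT P.category, S.sales, L.state
          FROM Products P, Sales S, Locations L
          WHERE P.pid = S.pid AND S.locid = L.locid) AS R
    GROUP BY R.category, R.state
    ORDER BY R.category, R.state
""").fetchall()

print(via_view)
print(via_view == modified)
```

Nothing is stored for the view here; the join is re-evaluated every time the query runs, which is the cost that materialization tries to avoid.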
View Materialization (Precomputation)
• Suppose we precompute RegionalSales and store
it with a clustered B+ tree index on
[category,state,sales].
– Then, the previous query can be answered by an
index-only scan.
Index on precomputed view is great! (equality on category, the leading index column):
SELECT R.state, SUM(R.sales)
FROM RegionalSales R
WHERE R.category='Laptop'
GROUP BY R.state
Index is less useful (must scan entire leaf level), since state is not the leading index column:
SELECT R.category, SUM(R.sales)
FROM RegionalSales R
WHERE R.state='Wisconsin'
GROUP BY R.category
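A rough sketch of the same idea in SQLite, under stated assumptions: the materialized view is an ordinary table named `RegionalSalesMat` (a made-up name), the sample rows are invented, and an ordinary composite index replaces the clustered B+ tree of the slide. The query plan text depends on the SQLite version, so it is printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Materialized copy of the view: a plain table holding the precomputed rows.
CREATE TABLE RegionalSalesMat (category TEXT, state TEXT, sales INTEGER);
INSERT INTO RegionalSalesMat VALUES
  ('Laptop', 'Wisconsin', 10), ('Laptop', 'Ohio', 5),
  ('Phone',  'Wisconsin', 7),  ('Phone',  'Ohio', 2);
CREATE INDEX idx_regional ON RegionalSalesMat (category, state, sales);
""")

def run(sql):
    """Return (result rows, query plan detail strings) for a SELECT."""
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return conn.execute(sql).fetchall(), plan

# Filter on the leading index column: the index can narrow the search.
by_state, plan_by_state = run(
    "SELECT state, SUM(sales) FROM RegionalSalesMat "
    "WHERE category = 'Laptop' GROUP BY state ORDER BY state"
)

# Filter on the second index column only: no prefix to search on.
by_category, plan_by_category = run(
    "SELECT category, SUM(sales) FROM RegionalSalesMat "
    "WHERE state = 'Wisconsin' GROUP BY category ORDER BY category"
)

print(by_state, plan_by_state)
print(by_category, plan_by_category)
```

Because the index contains all three columns, both queries can in principle be answered from the index alone; the difference is how much of it has to be read.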
Materialized Views
• A view whose tuples are stored in the database
is said to be materialized.
– Provides fast access, like a (very high-level) cache.
– Need to maintain the view as the underlying tables
change.
– Ideally, we want incremental view maintenance
algorithms.
• Close relationship to data warehousing, OLAP,
(asynchronously) maintaining distributed
databases, checking integrity constraints, and
evaluating rules and triggers.
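To make "incremental view maintenance" concrete, here is a small sketch for a SUM ... GROUP BY view kept as a Python dictionary. The function names and sample rows are hypothetical. Each group stores (sum, row count), because the count is needed to know when a group disappears after deletions.

```python
def recompute(rows):
    """Full refresh: rebuild the aggregate view from all base rows."""
    view = {}
    for category, state, sales in rows:
        total, count = view.get((category, state), (0, 0))
        view[(category, state)] = (total + sales, count + 1)
    return view

def apply_delta(view, inserted=(), deleted=()):
    """Incremental refresh: touch only the groups affected by the change."""
    for category, state, sales in inserted:
        total, count = view.get((category, state), (0, 0))
        view[(category, state)] = (total + sales, count + 1)
    for category, state, sales in deleted:
        total, count = view[(category, state)]
        if count == 1:
            del view[(category, state)]  # last row of the group is gone
        else:
            view[(category, state)] = (total - sales, count - 1)
    return view

base = [("Laptop", "Wisconsin", 10), ("Laptop", "Ohio", 5), ("Phone", "Wisconsin", 7)]
view = recompute(base)

# The underlying table changes: two rows arrive, one row is removed.
new_rows = [("Laptop", "Wisconsin", 3), ("Phone", "Ohio", 2)]
gone_rows = [("Laptop", "Ohio", 5)]
apply_delta(view, inserted=new_rows, deleted=gone_rows)

# The incrementally maintained view matches a full recomputation.
base_after = [r for r in base if r not in gone_rows] + new_rows
print(view == recompute(base_after))
```

The incremental path does work proportional to the size of the change, while the full refresh rereads every base row.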
Issues in View Materialization
• What views should we materialize, and
what indexes should we build on the
precomputed results?
• Given a query and a set of materialized
views, can we use the materialized
views to answer the query?
• How frequently should we refresh
materialized views to make them
consistent with the underlying tables?
(And how can we do this
incrementally?)
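For the second question, one simple case can be sketched: a coarser aggregate answered from a finer materialized aggregate. The table `SalesByCategoryState` and the sample rows are assumptions made for illustration, with SQLite standing in for the warehouse. The rewrite is valid here because SUM can be computed from partial sums; an average, for example, would also need the row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RegionalSales (category TEXT, state TEXT, sales INTEGER);
INSERT INTO RegionalSales VALUES
  ('Laptop', 'Wisconsin', 10), ('Laptop', 'Wisconsin', 3),
  ('Laptop', 'Ohio', 5),       ('Phone',  'Wisconsin', 7);

-- Hypothetical materialized view, stored as an ordinary table.
CREATE TABLE SalesByCategoryState AS
  SELECT category, state, SUM(sales) AS total
  FROM RegionalSales
  GROUP BY category, state;
""")

# Sales per state, computed from the base rows ...
from_base = conn.execute(
    "SELECT state, SUM(sales) FROM RegionalSales GROUP BY state ORDER BY state"
).fetchall()

# ... and from the (smaller) materialized view by summing partial sums.
from_view = conn.execute(
    "SELECT state, SUM(total) FROM SalesByCategoryState GROUP BY state ORDER BY state"
).fetchall()

print(from_base, from_view)
```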
Toad Edge for MySQL
Install BIOSQL locally
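• Get the latest version of MySQL (MAMP, MariaDB)
• Download biosqldb-mysql.sql
• Remove type=innodb (recent MySQL versions reject the old TYPE= table option; the current spelling is ENGINE=InnoDB)
• Launch the database server
• Connect using Toad (port 8889)
• Create the database: CREATE DATABASE biosql;
• Set it as the active database
• Use a worksheet to execute biosqldb-mysql.sql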
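The "remove type=innodb" step can also be scripted instead of edited by hand. This is only a sketch: the function name is made up, and the sample statement is a shortened, hypothetical stand-in for the real schema file.

```python
import re

def strip_table_type(schema_sql):
    """Remove 'TYPE=INNODB' table options (any case, optional spaces)."""
    return re.sub(r"\s*\bTYPE\s*=\s*INNODB\b", "", schema_sql, flags=re.IGNORECASE)

# Hypothetical fragment in the style of an old MySQL schema file.
sample = "CREATE TABLE biodatabase (\n  name VARCHAR(128) NOT NULL\n) TYPE=INNODB;\n"
cleaned = strip_table_type(sample)
print(cleaned)
```

To apply it to the downloaded file, read biosqldb-mysql.sql into a string, pass it through `strip_table_type`, and write the result to a new file before executing it in the worksheet.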
MySQL and Python DB API (pymysql)
Database drivers
pymysql Installation
pip install pymysql
MySQL Installation
brew install mysql
# Path Setting and inserting into .bash_profile
export MYSQL_PATH=/usr/local/Cellar/mysql/5.7.14
export PATH=$PATH:$MYSQL_PATH/bin
MySQL Start
Start: mysql.server start
Connect as root user: mysql -u root
Create a database: CREATE DATABASE djangogirls;
Exit: exit
Connecting MySQL using Client Tool
Tools that help to manage databases, like Toad, Sequel Pro, DataGrip etc.
But tool for today is PyCharm!
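import pymysql

print("Uploading data")
# Connect to the local server started earlier (port 8889, user root).
db = pymysql.connect(host="localhost", port=8889, user="root", passwd="root", db="db")
try:
    with db.cursor() as cursor:
        # cursor.execute("DROP TABLE IF EXISTS USER")
        # %s placeholders let the driver quote the values safely.
        sql = "INSERT INTO tb (tb_id, tb_name, tb_age, tb_sex) VALUES (%s, %s, %s, %s)"
        cursor.execute(sql, ("1", "Demo", "26", "ma"))
    db.commit()  # make the insert permanent
finally:
    db.close()
print("Done")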
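When more than one row has to be uploaded, the insert can be wrapped in small helpers. The sketch below is an assumption-laden illustration, not part of pymysql: `build_insert_sql` and `insert_rows` are hypothetical names, and the table and column names are interpolated into the statement, so they must come from trusted code rather than user input. Only the values go through `%s` placeholders.

```python
def build_insert_sql(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"

def insert_rows(connection, table, columns, rows):
    """Insert many rows in one transaction; roll back if any row fails."""
    sql = build_insert_sql(table, columns)
    try:
        with connection.cursor() as cursor:
            cursor.executemany(sql, rows)
        connection.commit()
    except Exception:
        connection.rollback()
        raise

insert_sql = build_insert_sql("tb", ["tb_id", "tb_name", "tb_age", "tb_sex"])
print(insert_sql)
```

With an open pymysql connection `db`, a call such as `insert_rows(db, "tb", ["tb_id", "tb_name", "tb_age", "tb_sex"], [("1", "Demo", "26", "ma")])` would upload the rows and commit once at the end.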
Import from BioPython to BIOSQL
# Connecting to a BioSQL database - http://biopython.org/wiki/BioSQL
from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase

server = BioSeqDatabase.open_database(driver="pymysql", host="localhost", port=8889,
                                      user="root", passwd="root", db="bio2019")
# new_database() creates the "test2" namespace; on later runs use
# db = server["test2"] instead, because the name already exists.
db = server.new_database("test2")

Entrez.email = "A.N.Other@example.com"  # replace with your own address
handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                       id="6273291,6273290,6273289")
print("Loading into BIOSQL")
count = db.load(SeqIO.parse(handle, "genbank"))
handle.close()
print("Loaded %i records" % count)
server.adaptor.commit()
print("Ended successfully")
Lab for Bioinformatics and computational genomics
The Technical Feasibility Argument
The Quality Argument
The Price Argument
The Logistics Argument
Recreational genomics
• Experimental designs are outdated by technological advances
• Genetic background (reference genome) as a concept will need to be
updated
• Traits dependent on multiple loci are “complicated”: educate and
provide tools to deal with it
• Eye color … why not the ear wax/asparagus or unibrow example
• … metabolize nutrients (newborns?)
• … metabolize drugs in case you need it urgently?
“several 23andMe users have reported taking the FDA’s
advice of reviewing their genetic results with their
physicians, only to find the doctors unprepared, unwilling,
or downright hostile to helping interpret the data”
my genome is too important (for me)
to leave it (only) to doctors
NXTGNT biohackerspace …
PGMv2: Personal Genomics Manifesto
Everyone should have the power and legitimacy to
be able to discover, develop and find new things
about their own genome data.
Intelligent exploration, experimentation and trial to
push the boundaries of knowledge are a basic
human right.
Personal genome data access should be
affordable to all irrespective of nationality, gender,
social background or any other circumstance.
Not having access to a personal genetic test is in
itself a new kind of discrimination.
Whether one wants to share genome data or keep it
private should be a matter of personal choice.
Whatever attitude a person has towards personal
genome privacy, it should be utterly respected.
Corporate interest can never compromise any human
right. Laws must fully protect individual human rights of
equality for every person, irrespective of predicted risks
from genetic data.
Stating that genetic tests merely provide
non-clinical information misses the point of what
personal genomics is all about.
Most genomic information is uninterpretable and
may well be meaningless. But those are not
reasons to deny it to people.
Genetic test results are not unrelated to
someone’s health, one’s ability to respond to
certain drugs and one’s ethnic ancestry.
Education in risks and opportunities for personal
genetic testing should be the primary aim of
policy makers.
Restricting access to interested people makes
no sense and it is virtually impossible to ensure.
Access to personal genomics data and tools for
its interpretation should become accessible to
everyone.
Overview
• Who ? Where ?
• > Genetics
• Technology: Next Gen Sequencing
• Personal …. Medicine/Genomics
• Manifesto
• The App
^[now][transl⎮comput]ational[epi]genomic$
2019 03 05_biological_databases_part4_v_upload