2019 03 05_biological_databases_part4_v_upload

FBW
12-03-2019
Biological Databases
Wim Van Criekinge

Data Warehousing and Decision Support

Views and Decision Support
• OLAP queries are typically aggregate queries.
– Precomputation is essential for interactive response times.
– The CUBE is in fact a collection of aggregate queries, and precomputation is especially important: there is lots of work on what is best to precompute given a limited amount of space to store precomputed results.
• Warehouses can be thought of as a collection of asynchronously replicated tables and periodically maintained views.
– This has renewed interest in view maintenance!
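To make "the CUBE is a collection of aggregate queries" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The toy table `sales` and its columns are hypothetical, not part of the lecture; the function simply runs one GROUP BY query per subset of the dimensions, which is what a CUBE over those dimensions amounts to.

```python
import sqlite3
from itertools import combinations

# Hypothetical toy fact table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, state TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Laptop", "Wisconsin", 10), ("Laptop", "Ohio", 20),
     ("Phone", "Wisconsin", 5), ("Phone", "Ohio", 7)],
)

def cube(conn, dims):
    """Run one aggregate query per subset of dims; return {subset: rows}."""
    results = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            if subset:
                cols = ", ".join(subset)
                sql = (f"SELECT {cols}, SUM(amount) FROM sales "
                       f"GROUP BY {cols} ORDER BY {cols}")
            else:
                # The empty subset is the grand total.
                sql = "SELECT SUM(amount) FROM sales"
            results[subset] = conn.execute(sql).fetchall()
    return results

cube_result = cube(conn, ("category", "state"))
for subset, rows in cube_result.items():
    print(subset, rows)
```

With d dimensions this issues 2^d queries, which is why choosing what to precompute under a space budget matters.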

View Modification (Evaluate On Demand)
View:
CREATE VIEW RegionalSales(category,sales,state)
AS SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid
Query:
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R GROUP BY R.category, R.state
Modified Query:
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid) AS R
GROUP BY R.category, R.state
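The equivalence between the query on the view and the modified query can be checked on a small example. The sketch below uses sqlite3 as a stand-in database; the table contents are made up for illustration, while the view and both queries follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales (pid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Products VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO Locations VALUES (1, 'Wisconsin'), (2, 'Ohio');
INSERT INTO Sales VALUES (1, 1, 10), (1, 1, 15), (1, 2, 20), (2, 1, 5);
CREATE VIEW RegionalSales (category, sales, state) AS
  SELECT P.category, S.sales, L.state
  FROM Products P, Sales S, Locations L
  WHERE P.pid = S.pid AND S.locid = L.locid;
""")

# Query written against the view.
view_query = """
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
"""

# Same query with the view definition substituted in (view modification).
modified_query = """
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
      FROM Products P, Sales S, Locations L
      WHERE P.pid = S.pid AND S.locid = L.locid) AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
"""

view_rows = conn.execute(view_query).fetchall()
modified_rows = conn.execute(modified_query).fetchall()
print(view_rows)
print(view_rows == modified_rows)
```

The ORDER BY clauses are added here only to make the two result lists directly comparable.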

View Materialization (Precomputation)
• Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category,state,sales].
– Then the previous query can be answered by an index-only scan.
Index on precomputed view is great (category is the leading index column):
SELECT R.state, SUM(R.sales)
FROM RegionalSales R
WHERE R.category='Laptop'
GROUP BY R.state
Index is less useful (must scan entire leaf level, because state is not the leading index column):
SELECT R.category, SUM(R.sales)
FROM RegionalSales R
WHERE R.state='Wisconsin'
GROUP BY R.category
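A rough way to try this out is sketched below with sqlite3. This is an analogy under stated assumptions: SQLite's ordinary index is a B-tree rather than a clustered B+ tree, and the table `RegionalSalesMat`, the index name, and the rows are hypothetical. The sketch stores the precomputed rows in a table, indexes them on (category, state, sales), runs the two kinds of query, and prints SQLite's query plans so the difference in index use can be inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical precomputed (materialized) copy of the RegionalSales view.
conn.execute(
    "CREATE TABLE RegionalSalesMat (category TEXT, state TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO RegionalSalesMat VALUES (?, ?, ?)",
    [("Laptop", "Wisconsin", 10), ("Laptop", "Wisconsin", 15),
     ("Laptop", "Ohio", 20), ("Phone", "Wisconsin", 5), ("Phone", "Ohio", 7)],
)
# Composite index with category as the leading column.
conn.execute(
    "CREATE INDEX idx_cat_state_sales "
    "ON RegionalSalesMat (category, state, sales)"
)

q_category = """
SELECT R.state, SUM(R.sales) FROM RegionalSalesMat R
WHERE R.category = 'Laptop' GROUP BY R.state ORDER BY R.state
"""
q_state = """
SELECT R.category, SUM(R.sales) FROM RegionalSalesMat R
WHERE R.state = 'Wisconsin' GROUP BY R.category ORDER BY R.category
"""

laptop_by_state = conn.execute(q_category).fetchall()
wisconsin_by_category = conn.execute(q_state).fetchall()
print(laptop_by_state)
print(wisconsin_by_category)

# Print the plan details; the exact wording depends on the SQLite version.
for label, query in (("category filter", q_category), ("state filter", q_state)):
    for row in conn.execute("EXPLAIN QUERY PLAN " + query):
        print(label, "->", row[-1])
```

Both queries return correct answers either way; the index only changes how much of the stored data has to be read.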

Materialized Views
• A view whose tuples are stored in the database
is said to be materialized.
– Provides fast access, like a (very high-level) cache.
– Need to maintain the view as the underlying tables
change.
– Ideally, we want incremental view maintenance
algorithms.
• Close relationship to data warehousing, OLAP,
(asynchronously) maintaining distributed
databases, checking integrity constraints, and
evaluating rules and triggers.
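One simple form of incremental view maintenance can be sketched with a trigger. The example below uses sqlite3, and the `SalesSummary` table and trigger name are hypothetical: each insert into the base table adjusts the stored total instead of recomputing the whole aggregate. It handles inserts only; deletes and updates would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (category TEXT, state TEXT, sales INTEGER);
-- Hypothetical materialized aggregate: one row per (category, state).
CREATE TABLE SalesSummary (
    category TEXT, state TEXT, total INTEGER,
    PRIMARY KEY (category, state)
);
-- On every insert, add the new amount to the stored total.
CREATE TRIGGER sales_insert AFTER INSERT ON Sales
BEGIN
    INSERT OR IGNORE INTO SalesSummary VALUES (NEW.category, NEW.state, 0);
    UPDATE SalesSummary SET total = total + NEW.sales
    WHERE category = NEW.category AND state = NEW.state;
END;
""")

conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("Laptop", "Wisconsin", 10), ("Laptop", "Wisconsin", 15),
     ("Phone", "Ohio", 7)],
)

# The incrementally maintained summary ...
summary = conn.execute(
    "SELECT category, state, total FROM SalesSummary "
    "ORDER BY category, state"
).fetchall()
# ... should match a full recomputation from the base table.
recomputed = conn.execute(
    "SELECT category, state, SUM(sales) FROM Sales "
    "GROUP BY category, state ORDER BY category, state"
).fetchall()
print(summary)
print(summary == recomputed)
```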

Issues in View Materialization
• What views should we materialize, and
what indexes should we build on the
precomputed results?
• Given a query and a set of materialized
views, can we use the materialized
views to answer the query?
• How frequently should we refresh
materialized views to make them
consistent with the underlying tables?
(And how can we do this
incrementally?)
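The second question, whether a materialized view can be used to answer a query, has a simple positive case: a coarser SUM can be rolled up from a finer precomputed SUM. The sketch below uses sqlite3 with a hypothetical `SalesByCatState` table and made-up rows, and compares the answer computed from the base table with the answer computed from the materialized aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (category TEXT, state TEXT, sales INTEGER);
INSERT INTO Sales VALUES
  ('Laptop', 'Wisconsin', 10), ('Laptop', 'Wisconsin', 15),
  ('Laptop', 'Ohio', 20), ('Phone', 'Ohio', 7);
-- Hypothetical materialized view: totals per (category, state).
CREATE TABLE SalesByCatState AS
  SELECT category, state, SUM(sales) AS total
  FROM Sales GROUP BY category, state;
""")

# Totals per category, computed from the base table.
from_base = conn.execute(
    "SELECT category, SUM(sales) FROM Sales "
    "GROUP BY category ORDER BY category"
).fetchall()
# The same totals, rolled up from the smaller materialized aggregate.
from_view = conn.execute(
    "SELECT category, SUM(total) FROM SalesByCatState "
    "GROUP BY category ORDER BY category"
).fetchall()
print(from_base)
print(from_base == from_view)
```

This works because SUM can be combined from partial sums; an average, for instance, would need both a sum and a count stored in the view.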

Install BioSQL locally
• Get the latest version of MySQL (MAMP, MariaDB)
• Download biosqldb-mysql.sql
• Remove type=innodb from the CREATE TABLE statements (recent MySQL versions no longer accept the TYPE= table option)
• Launch the database server
• Connect using Toad (port 8889)
• Run: CREATE DATABASE biosql;
• Set biosql as the active database
• Use a worksheet to execute biosqldb-mysql.sql
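The "remove type=innodb" step can also be done with a small script instead of by hand. This is a minimal sketch: the function name `strip_table_type` and the sample CREATE TABLE text are hypothetical, and the pattern assumes the option appears as `TYPE=INNODB` in any letter case.

```python
import re

def strip_table_type(schema_sql):
    """Remove 'TYPE=INNODB' table options, ignoring letter case."""
    return re.sub(r"\s*\bTYPE\s*=\s*INNODB\b", "", schema_sql,
                  flags=re.IGNORECASE)

# Hypothetical, simplified excerpt in the style of a schema file.
sample = "CREATE TABLE biodatabase (\n  biodatabase_id INT\n) TYPE=INNODB;\n"
cleaned = strip_table_type(sample)
print(cleaned)
```

To apply it to the downloaded schema, read the file into a string, pass it through the function, and write the result to a new file before executing it.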

MySQL and the Python DB API (pymysql)

pymysql Installation
pip install pymysql

MySQL Installation
brew install mysql
# Set the path by adding these lines to .bash_profile
export MYSQL_PATH=/usr/local/Cellar/mysql/5.7.14
export PATH=$PATH:$MYSQL_PATH/bin

MySQL Start
Start: mysql.server start
Connect as root user: mysql -u root
Create a database:
CREATE DATABASE djangogirls;
Exit:
exit

Connecting MySQL using Client Tool
Client tools such as Toad, Sequel Pro and DataGrip help to manage databases.
But the tool for today is PyCharm!

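import pymysql

print("Uploading data")
# Adjust host, port and credentials to your own server.
db = pymysql.connect(host="localhost", port=8889, user="root",
                     password="root", database="db")
try:
    cursor = db.cursor()
    # cursor.execute("DROP TABLE IF EXISTS USER")
    # Pass the values as parameters so the driver handles quoting.
    sql = ("INSERT INTO tb (tb_id, tb_name, tb_age, tb_sex) "
           "VALUES (%s, %s, %s, %s)")
    cursor.execute(sql, ("1", "Demo", "26", "ma"))
    db.commit()
finally:
    # Close the connection even if the insert fails.
    db.close()
print("Done")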

Import from BioPython to BioSQL
# Connecting to a BioSQL database - http://biopython.org/wiki/BioSQL
from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase

server = BioSeqDatabase.open_database(driver="pymysql", host="localhost",
                                      port=8889, user="root", passwd="root",
                                      db="bio2019")
# new_database creates the sub-database; on later runs use
# db = server["test2"] instead.
db = server.new_database("test2")

# Fetch three GenBank records from NCBI.
Entrez.email = "A.N.Other@example.com"
handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                       id="6273291,6273290,6273289")
print("Loading into BioSQL")
count = db.load(SeqIO.parse(handle, "genbank"))
print("Loaded %i records" % count)
# Nothing is stored permanently until the transaction is committed.
server.adaptor.commit()
print("Ended successfully")

Lab for Bioinformatics and computational genomics

The Technical Feasibility Argument
The Quality Argument
The Price Argument
The Logistics Argument

Recreational genomics

• Experimental designs are outdated by technological advances
• Genetic background (reference genome) as a concept will need to be
updated
• Traits dependent on multiple loci are “complicated”: educate and
provide tools to deal with it

• Eye color … why not the ear wax/asparagus or unibrow example
• … metabolize nutrients (newborns ?)
• … metabolize drugs in case you need it urgently ?

“several 23andMe users have reported taking the FDA’s
advice of reviewing their genetic results with their
physicians, only to find the doctors unprepared, unwilling,
or downright hostile to helping interpret the data”

my genome is too important (for me)
to leave it (only) to doctors

NXTGNT biohackerspace …

PGMv2: Personal Genomics Manifesto

Everyone should have the power and legitimacy to
be able to discover, develop and find new things
about their own genome data.
Intelligent exploration, experimentation and trial to
push the boundaries of knowledge are a basic
human right.

Personal genome data access should be
affordable to all irrespective of nationality, gender,
social background or any other circumstance.
Not having access to a personal genetic test is in
itself a new kind of discrimination.

Whether one wants to share genome data or keep it
private should be a matter of personal choice.
Whatever attitude a person has towards personal
genome privacy, it should be utterly respected.
Corporate interest can never compromise any human
right. Laws must fully protect individual human rights of
equality for every person, irrespective of predicted risks
from genetic data.

Stating that genetic tests merely provide non-
clinical information misses the point of what
personal genomics is all about.
Most genomic information is uninterpretable and
may well be meaningless. But those are not
reasons to deny it to people.
Genetic test results are not unrelated to
someone’s health, one’s ability to respond to
certain drugs and one’s ethnic ancestry.

Education in risks and opportunities for personal
genetic testing should be the primary aim of
policy makers.
Restricting access to interested people makes
no sense and it is virtually impossible to ensure.
Access to personal genomics data and tools for
its interpretation should become accessible to
everyone.

Overview
• Who ? Where ?
• > Genetics
• Technology: Next Gen Sequencing
• Personal …. Medicine/Genomics
• Manifesto
• The App
^[now][transl⎮comput]ational[epi]genomic$
