FBW
20-03-2018
Biological Databases
Wim Van Criekinge
SPARQL 1/2
SPARQL 2/2
Prof Devisscher
A Symptoms checker - Disease
B Metabolic engineering
C Cancer Neoantigens
Familienaam Voornaam E-mail Opleidingsprogramma
Davey Lucas Lucas.Davey@UGent.be C CMBIOISB
David Sven Sven.David@UGent.be B IMCELB
Engelen Yanou Yanou.Engelen@UGent.be ? IMCELB
Ezquerro Marrodán Elsa Elsa.EzquerroMarrodan@UGent.be C IXGAEX
Georis Raphaël Raphael.Georis@UGent.be B IMCELB
Gilis Jeroen Jeroen.Gilis@UGent.be B CMBIOISB
Lashkari Samira Samira.Lashakri@UGent.be A CMBIOISB
Recer Karmen Karmen.Recer@UGent.be C IXGAEX
Schindfessel Cédric Cedric.Schindfessel@UGent.be B IMCELB
Silva Marta Marta.Silva@UGent.be B CMBIOISB
Silva Meneses Rodrigo Rodrigo.Meneses@UGent.be C CMBIOISB
Strybol Pieter-Paul PieterPaul.Strybol@UGent.be B/C CMBIOIBE
Taelman Steff Steff.Taelman@UGent.be A CMBIOIBE
Tóth Máté István MateIstvan.Toth@UGent.be C EXGAEX
Toulmé Coralyne Coralyne.Toulme@UGent.be C IXGAEX
Van hoyweghen Sergej Sergej.Vanhoyweghen@UGent.be ? IMCELB
Willems Thomas Thomas.Willems@UGent.be ? IMCELB
Wojciulewitsch Coralie Coralie.Wojciulewitsch@UGent.be A IMCELB
Yekimov Illya Illya.Yekimov@UGent.be A IMCELB
For the project I suggest taking the following
steps (individually or in a group - maybe set up a
collaborative tool like Slack)
1. What is it about? What do you want to
achieve? For whom?
2. Identify information resources - think about
a basic data model
3. Draw (mock up) an interface, don't be
constrained by technical considerations - think
outside the box :)
SQL Clients
• Best SQL clients
• Free TOAD edge ???
• Alternatives
• Navicat
• Python example
• BIOSQL load genbank (BIOSQL –
biopython example)
import pymysql

print("Uploading data")
# Connect to the local MySQL server (port 8889, as in the setup used here)
db = pymysql.connect(host="localhost", port=8889, user="root", passwd="root", db="db")
cursor = db.cursor()
# cursor.execute("DROP TABLE IF EXISTS USER")
# Parameterized insert: the driver quotes the values for us
sql = "INSERT INTO tb (tb_id, tb_name, tb_age, tb_sex) VALUES (%s, %s, %s, %s)"
cursor.execute(sql, ("1", "Demo", "25", "male"))
db.commit()
db.close()
print("Done")
Install BIOSQL locally
• Get latest version of mysql (MAMP,
mariaDB)
• Download biosqldb-mysql.sql
• Remove type=innodb
• Launch database server
• Connect using toad (port 8889)
• Create database biosql;
• Set as active database
• Use worksheet to execute biosqldb-mysql.sql
# Connecting to a BioSQL database - http://biopython.org/wiki/BioSQL
import pprint

from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase

# db = pymysql.connect(host="localhost", port=8889, user="root", passwd="root", db="db")
server = BioSeqDatabase.open_database(driver="pymysql", host="localhost", port=8889,
                                      user="root", passwd="root", db="db")
# db = server.new_database("test")  # run once to create the "test" sub-database
db = server["test"]

Entrez.email = "A.N.Other@example.com"
handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
                       id="6273291,6273290,6273289")
# Read the records into a list first: the handle can be parsed only once,
# so parsing it again after db.load() would yield no records.
records = list(SeqIO.parse(handle, "genbank"))
handle.close()

print("Loading into BIOSQL")
count = db.load(records)
print("Loaded %i records" % count)
server.adaptor.commit()

for seq_record in records:
    print(seq_record.id, seq_record.description[:50] + "...")
    print("Sequence length %i," % len(seq_record))
    print("%i features," % len(seq_record.features))
    print("from: %s" % seq_record.annotations["source"])
    pprint.pprint(seq_record)
    # pprint("Loading")
    # load into BIOSQL
    # db.load_seqrecord(seq_record)
Example 3-tier model in biological database
http://www.bioinformatics.be
Example of different interface to the same back-end database (MySQL)
• Apache
• PHP examples
Slide 17Prepared 3/21/2018
What IS Apache, Anyway?
• Open-Source Web server originally based on the NCSA server
• Available on over 160 varieties of Unix -- and Windows NT
• Over 56% of Internet Web servers run Apache or an Apache derivative
Graph copyright Netcraft (<http://www.netcraft.com/survey/>)
Configuring Apache
• Choosing functionality
– Apache functionality is available through modules which are either built into or loaded into the server
• Server instructions
– Apache reads its run-time configuration instructions from text files
– No GUI available
– 182 configuration directives in base package
Configuring Apache
(continued)
• The -d option sets the server root directory in which the server looks for httpd.conf; -f names a different configuration file
• After httpd.conf, the server reads srm.conf, then access.conf (unless the latter two have been renamed by the ResourceConfig and AccessConfig directives, respectively)
• Consider combining these into a single file for simplicity
Logfiles
• Two basic logfiles
– Access log -- who’s been visiting your server and what they wanted
– Error log -- problems the server has encountered and things it has noticed
• Can be configured for each virtual host, or for the entire server
• Access log format can be customised
Simple Web form
<html>
<head><title>simple form</title></head>
<body>
<form name="simpleForm" method="post"
action="simpleHandler.cgi">
Your email address:
<input type="text" name="email">
<input type="submit" value="Submit">
</form>
</body>
</html>
Interacting with Web Forms
You typically need to generate the form (which
may be a normal static Web page), then dynamically
• validate user input
• process user input
• generate a response
These three steps may be done within the
Web browser (client-side), within the
Web server (server-side), or in some
combination of both.
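The three steps can be sketched server-side in a few lines of Python. This is only an illustration: the function name handle_form, the loose e-mail pattern and the response texts are assumptions, and only the field name "email" comes from the simple form above.

```python
import re

# Deliberately loose pattern (an assumption): something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def handle_form(fields):
    """Validate, process and answer one submission of the e-mail form."""
    email = fields.get("email", "").strip()
    # 1. validate user input
    if not EMAIL_PATTERN.match(email):
        return "Please enter a valid e-mail address."
    # 2. process user input (here: only normalise the case)
    normalised = email.lower()
    # 3. generate a response dynamically
    return "Thank you, we will contact you at " + normalised + "."

print(handle_form({"email": "A.N.Other@example.com"}))
print(handle_form({"email": "not-an-address"}))
```

The same validation could also run client-side in the browser; server-side checks remain necessary because the client cannot be trusted.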
CGI
Common Gateway Interface
mechanism for a Web browser to send data
to a Web server
allow browser to submit data to a program
running on the server
• program is often called a ‘CGI script’
• typically written in Perl, PHP or ASP
• can also be a ‘real’ program (e.g.
written in C)
CGI (2)
used primarily for form submission
can also be used to upload local files
‘CGI’ URLs often contain ‘?’ and ‘&’
characters - but don’t have to!
output from CGI usually dynamic and
therefore not cached
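The '?' and '&' characters mark the query string, in which form fields travel as name=value pairs. A minimal sketch with Python's standard library shows how such a URL is taken apart; the host www.example.org and the extra field lang are made up for the illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL as a browser would build it for a form submitted with GET
url = ("http://www.example.org/cgi-bin/simpleHandler.cgi"
       "?email=A.N.Other%40example.com&lang=en")

parts = urlsplit(url)
fields = parse_qs(parts.query)   # decodes %40 back to '@'

print(parts.path)    # the program the server should run
print(fields)        # each field name maps to a list of values
```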
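CGI (3)
[Figure: a Web form in the Web browser sends data to the Web server using CGI; the server returns results to the browser and can pass the data on to email, a file, or a database.]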
Brief History of PHP
PHP (PHP: Hypertext Preprocessor) was created by Rasmus Lerdorf in
1994. It was initially developed for HTTP usage logging and server-side
form generation in Unix.
PHP 2 (1995) transformed the language into a Server-side embedded
scripting language. Added database support, file uploads, variables,
arrays, recursive functions, conditionals, iteration, regular expressions,
etc.
PHP 3 (1998) added support for ODBC data sources, multiple platform
support, network and e-mail protocols (SNMP, IMAP), and a new parser written by Zeev
Suraski and Andi Gutmans.
PHP 4 (2000) became an independent component of the web server for
added efficiency. The parser was renamed the Zend Engine. Many
security features were added.
PHP 5 (2004) added Zend Engine II with object-oriented programming,
robust XML support using the libxml2 library, a SOAP extension for
interoperability with Web Services, and a bundled SQLite.
Why is PHP used?
1. Easy to Use
Code is embedded into HTML. The PHP code is enclosed in special start and
end tags that allow you to jump into and out of "PHP mode".
<html>
<head>
<title>Example</title>
</head>
<body>
<?php
echo "Hi, I'm a PHP script!";
?>
</body>
</html>
Getting Started
1. How to escape from HTML and enter PHP mode
• PHP parses a file by looking for one of the special tags that
tells it to start interpreting the text as PHP code. The parser then
executes all of the code it finds until it runs into a PHP closing tag.
Starting tag             Ending tag   Notes
<?php                    ?>           Preferred method as it allows the use of PHP with XHTML
<?                       ?>           Not recommended. Easier to type, but has to be enabled and may conflict with XML
<script language="php">  </script>    Best if used when FrontPage is the HTML editor (removed in PHP 7)
<%                       %>           Not recommended. ASP tags support was added in 3.0.4 (removed in PHP 7)
<?php echo "Hello World"; ?>
[Figure: the PHP code above sits between ordinary HTML before and after it.]
Getting Started
2. Simple HTML Page with PHP
• The following is a basic example to output text using
PHP.
<html><head>
<title>My First PHP Page</title>
</head>
<body>
<?php
echo "Hello World!";
?>
</body></html>
Copy the code onto your web server and save it as “test.php”.
You should see “Hello World!” displayed.
Notice that a semicolon ends each PHP statement; it marks the end of
the statement, not a line break in the output. Like HTML, PHP ignores whitespace
between lines of code. (To break a line in the output, echo the HTML tag <BR>.)
Getting Started
3. Using conditional statements
• Conditional statements are very useful for displaying specific
content to the user. The following example shows how to display
content according to the day of the week.
<?php
$today_dayofweek = date("w");
if ($today_dayofweek == 4) {
    echo "Today is Thursday!";
} else {
    echo "Today is not Thursday.";
}
?>
Getting Started
3. Using conditional statements
The if statement checks the value of $today_dayofweek
(which is the numerical day of the week, 0=Sunday… 6=Saturday)
• If it is equal to 4 (the numeric representation of Thurs.) it will display
everything within the first { } bracket after the “if()”.
• If it is not equal to 4, it will display everything in the second { } bracket
after the “else”.
Getting Started
3. Using conditional statements
If we run the script on a Thursday, we should see:
"Today is Thursday!"
On days other than Thursday, we will see:
"Today is not Thursday."
example
• HTML file greet.html has
<form action="greet.php" method="get"><p>
your last name: <input type="text"
name="lastname"/></p></form>
• PHP file greet.php has
<?php
print "Hello ";
// escape user input before echoing it back into the page
print htmlspecialchars($_GET['lastname']);
?>
in addition to the usual HTML stuff.
WHAT is PEAR
The PHP Extension and Application Repository, or PEAR, is a framework and
distribution system for PHP code components. Stig S. Bakken founded the PEAR
project in 1999 to promote the re-use of code that performs common functions.
The project has the goals of:
• providing a structured library of code
• maintaining a system for distributing code and for managing code
packages
• promoting a standard coding-style
PEAR DB
DB is a database abstraction layer providing:
* an OO-style query API
* portability features that make programs written for one DBMS work with other
DBMS's
* a DSN (data source name) format for specifying database servers
* prepare/execute (bind) emulation for databases that don't support it natively
* a result object for each query response
* portable error codes
* sequence emulation
* sequential and non-sequential row fetching as well as bulk fetching
* formats fetched rows as associative arrays, ordered arrays or objects
* row limit support
* transactions support
* table information interface
* DocBook and phpDocumentor API documentation
To access a database through PEAR DB, you have to create a data source
name (DSN) that specifies the appropriate PEAR DB backend for your
database and the parameters necessary to connect to the database.
DSN syntax:
phptype(dbsyntax)://username:password@protocol+hostspec/database?option=value
for example: (mysql)
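Because a DSN has the shape of a URL, its parts can be illustrated with a generic URL parser. The sketch below is Python, not PEAR itself; the DSN string (user root, password root, host localhost, database biosql, option debug=2) is a made-up example, and the optional (dbsyntax) and protocol+ parts of the syntax are not handled.

```python
from urllib.parse import parse_qs, urlsplit

def parse_dsn(dsn):
    """Split a simple DSN of the form phptype://username:password@hostspec/database?option=value."""
    parts = urlsplit(dsn)
    return {
        "phptype": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostspec": parts.hostname,
        "database": parts.path.lstrip("/"),
        "options": parse_qs(parts.query),
    }

# Hypothetical MySQL DSN, for illustration only
info = parse_dsn("mysql://root:root@localhost/biosql?debug=2")
for key, value in info.items():
    print(key, "=", value)
```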
Connecting to Databases through PEAR DB
How to connect and disconnect
Connecting to Databases through PEAR DB
Connecting to a database, creating a
table, and inserting a record.
Some Database Functions
• Query function
– $d->query takes an SQL command as its string argument
– Sends query to database server for execution
• $d->setErrorHandling(PEAR_ERROR_DIE)
– Terminate program and print default error messages if any subsequent errors occur
Retrieval Queries from Database Tables
• $q
– Variable that holds query result
– $q->fetchRow() retrieves the next record in the query result and controls the loop
• $allresult = $d->getAll(query)
– Holds all the records in a query result in a single variable called $allresult
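The same two retrieval styles exist in most database APIs. As a rough Python analogue (not PEAR DB), the sketch below uses the standard sqlite3 module with an in-memory database: fetchone() plays the role of fetchRow() and fetchall() that of getAll(). The table tb and its two rows are assumptions made for the example.

```python
import sqlite3

d = sqlite3.connect(":memory:")
d.execute("CREATE TABLE tb (tb_id INTEGER PRIMARY KEY, tb_name TEXT)")
d.executemany("INSERT INTO tb (tb_id, tb_name) VALUES (?, ?)",
              [(1, "Demo"), (2, "Test")])

# Row by row: fetch the next record until the result is exhausted
q = d.execute("SELECT tb_id, tb_name FROM tb ORDER BY tb_id")
rows_one_by_one = []
row = q.fetchone()
while row is not None:
    rows_one_by_one.append(row)
    row = q.fetchone()

# All at once: the whole result in a single variable
allresult = d.execute("SELECT tb_id, tb_name FROM tb ORDER BY tb_id").fetchall()

print(rows_one_by_one)
print(allresult)
```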
Summary
• PHP scripting language
– Very popular for Web database programming
• PHP basics for Web programming
– Data types
• Database commands include:
– Creating tables, inserting new records, and retrieving database records
– Looping over a query result
Structured Query Language
For everyday data manipulation, SQL comes down to four statements, sometimes
referred to as CRUD:
– Create - INSERT - to store new data
– Read - SELECT - to retrieve data
– Update - UPDATE - to change or modify data
– Delete - DELETE - to delete or remove data
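A minimal sketch of the four CRUD statements, run from Python against an in-memory SQLite database. The table gene and its rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, organism TEXT)")

# Create: INSERT stores new data
conn.execute("INSERT INTO gene VALUES (?, ?)", ("TP53", "Homo sapiens"))
conn.execute("INSERT INTO gene VALUES (?, ?)", ("BRCA1", "Homo sapiens"))

# Read: SELECT retrieves data
after_insert = conn.execute("SELECT symbol FROM gene ORDER BY symbol").fetchall()

# Update: UPDATE changes existing data
conn.execute("UPDATE gene SET organism = ? WHERE symbol = ?", ("H. sapiens", "TP53"))

# Delete: DELETE removes data
conn.execute("DELETE FROM gene WHERE symbol = ?", ("BRCA1",))

remaining = conn.execute("SELECT symbol, organism FROM gene").fetchall()
print(after_insert)
print(remaining)
```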
Relationships within Relational Database
• Relationship classifications
– 1:1
– 1:M
– M:N
• E-R Model
– ERD Maps E-R model
– Entities - or tables. They are drawn as
rectangles containing the name of the
entity type and sometimes a list of its
attributes
– Relationships - these are the associations between
entity types. They are drawn as lines
between the connected entity types
ERD Symbols
• Rectangles represent entities
• Diamonds represent the relationship(s) between
the entities
• “1” side of relationship
– Number 1 in Chen Model
– Bar crossing line in Crow’s Feet Model
• “Many” relationships
– Letter “M” and “N” in Chen Model
– Three pronged “Crow’s foot” in Crow’s Feet Model
Example 1:M Relationship
Example M:N Relationship
Converting M:N Relationship to Two 1:M Relationships
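The conversion can be sketched with a junction table: each row of the junction table points to one row on either side, so the M:N relationship becomes two 1:M relationships. The tables student, course and enrollment and their contents are assumptions for this illustration (SQLite, in memory).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
-- Junction table: one student has many enrollments (1:M),
-- one course has many enrollments (1:M)
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Biological Databases'), (20, 'Statistics');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

rows = conn.execute("""
SELECT s.name, c.title
FROM enrollment e
JOIN student s ON s.student_id = e.student_id
JOIN course  c ON c.course_id  = e.course_id
ORDER BY s.name, c.title
""").fetchall()
print(rows)
```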
Indexes
• Points to location
• Makes retrieval of data faster
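One way to see an index at work is to ask the database for its query plan. The sketch below uses SQLite from Python; the table gene, the index name idx_gene_symbol and the generated rows are assumptions, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO gene (symbol, description) VALUES (?, ?)",
    [("G%05d" % i, "gene number %d" % i) for i in range(10000)],
)

# The index stores symbol values together with pointers to the matching rows
conn.execute("CREATE INDEX idx_gene_symbol ON gene (symbol)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM gene WHERE symbol = ?", ("G04242",)
).fetchall()
# The last column of each plan row is the human-readable description
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the plan mentions idx_gene_symbol, the lookup goes through the index instead of scanning all 10000 rows.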
Applications of B+-Trees
(1) Search key of B+-Tree is primary key for data file, index
is dense (data file may or may not be sorted by pk):
– There is one key-pointer pair in a leaf for every
record of the data file
(2) Data file is sorted by its primary key:
– B+-Tree is a sparse index with one key-pointer
pair at a leaf for each block of the file
(3) Data file is sorted by an attribute that is not a key
(search key for B+-Tree):
– For each value K that appears in the data file,
there is one key-pointer pair at the leaf.
Pointer goes to the first of the records that
have K as their sort-key value
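Case (2), the sparse index, can be sketched in plain Python. A sorted list of the first key of each block stands in for the leaf level of the B+-tree (a real tree would add inner nodes on top); the blocks and their contents are made up.

```python
from bisect import bisect_right

# Data file sorted by primary key, stored in blocks of three records
blocks = [
    [(3, "a"), (7, "b"), (9, "c")],
    [(12, "d"), (15, "e"), (18, "f")],
    [(21, "g"), (30, "h"), (42, "i")],
]

# Sparse index: one key per block (the first key in that block)
sparse_index = [block[0][0] for block in blocks]

def lookup(key):
    """Find the block that may hold key, then scan only that block."""
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None              # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None

print(lookup(15), lookup(3), lookup(13))
```

A dense index (case 1) would instead hold one entry per record, which is required when the data file is not sorted by the search key.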
• Database design - Data modelling
• What information will the database have to
be able to deliver?
• What data must the database contain
to meet the identified information
need?
• What data are available?
• How are the required data related
to each other?
• Database design - Data modelling
• Top-down (start from the broad outline)
• Bottom-up (start from the details)
• Changing or adding a field is much harder
than inserting a record
Normalization
Normalization: arriving at a data model in which no fact is
redundant, that is, occurs more than once.
Zeroth normal form: starting point
Data model from the initial information analysis, typically with repeated
attributes or repeated groups of attributes
First normal form: remove repeating groups
Second normal form: examine composite keys
Third normal form: examine transitive dependencies
Normalization
• Normalization is used to design a set of relation
schemas that is optimal from the point of view of
database updating
• The normalization starts from a universal relation
schema
• There are six normal forms, of which only three are
based on functional dependencies
• Normal forms define to which extent we should
normalize
• The Synthesis algorithm and the Decomposition
algorithm represent the formal normalization
methods
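A small worked example of one decomposition step, in plain Python. In the flat relation below the programme name depends on the programme code, not on the student, so it is stored once per student (a transitive dependency). The relation and its values are made up, and this is a hand-made split rather than the Synthesis or Decomposition algorithm.

```python
# Flat relation: (student_id, student_name, programme_code, programme_name)
flat = [
    ("S1", "Ann", "P1", "Bioinformatics"),
    ("S2", "Bob", "P1", "Bioinformatics"),   # "Bioinformatics" is stored twice
    ("S3", "Cas", "P2", "Cell Biology"),
]

# Split into two relations so that every fact is stored once
student = sorted({(sid, name, pid) for sid, name, pid, _ in flat})
programme = sorted({(pid, pname) for _, _, pid, pname in flat})

# Joining the two relations on programme_code gives back the flat relation
programme_name = dict(programme)
rejoined = sorted((sid, name, pid, programme_name[pid]) for sid, name, pid in student)

print(student)
print(programme)
```

Renaming a programme is now a single update in programme, which is the sense in which the normalized design is optimal for updating.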
Finishing Database Design
• To complete a database schema design,
after the normalization is done, one also has to
define interrelation constraints (referential
integrity constraints)
• Normalization results in a set of relation
schemas
• That design is suitable for efficient database
updates
• But it can slow down the execution of queries
• Sometimes it is advisable to undertake
controlled denormalization
Privileges in SQL
• select: allows read access to a relation, or the
ability to query using the view
– Example: grant users U1, U2, and U3 select
authorization on the branch relation:
grant select on branch to U1, U2, U3
• insert: the ability to insert tuples
• update: the ability to update using the SQL
update statement
• delete: the ability to delete tuples.
• all privileges: used as a short form for all the
allowable privileges
Authorization
Forms of authorization on parts of the database:
• Read - allows reading, but not modification of data.
• Insert - allows insertion of new data, but not modification of existing
data.
• Update - allows modification, but not deletion of data.
• Delete - allows deletion of data.
Forms of authorization to modify the database schema:
• Index - allows creation and deletion of indices.
• Resources - allows creation of new relations.
• Alteration - allows addition or deletion of attributes in a relation.
• Drop - allows deletion of relations.
Authorization Specification in SQL
• The grant statement is used to confer authorization
grant <privilege list>
on <relation name or view name> to <user list>
• <user list> is:
– a user-id
– public, which allows all valid users the privilege
granted
– A role
• Granting a privilege on a view does not imply granting
any privileges on the underlying relations.
• The grantor of the privilege must already hold the
privilege on the specified item (or be the database
administrator).
Revoking Authorization in SQL
• The revoke statement is used to revoke authorization.
revoke <privilege list>
on <relation name or view name> from <user list>
• Example:
revoke select on branch from U1, U2, U3
• <privilege list> may be all to revoke all privileges the revokee
may hold.
• If <user list> includes public, all users lose the privilege
except those granted it explicitly.
• If the same privilege was granted twice to the same user by
different grantors, the user may retain the privilege after the
revocation.
• All privileges that depend on the privilege being revoked are
also revoked.
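The rule about two grantors can be illustrated with a toy model in Python. This is not how a DBMS stores authorizations: the names grant, revoke and holds and the users DBA, U1 and U4 are assumptions, and the model ignores cascading revocation and the check that the grantor holds the privilege.

```python
# Each grant is remembered together with who gave it
grants = set()   # entries: (grantor, grantee, privilege)

def grant(grantor, grantee, privilege):
    grants.add((grantor, grantee, privilege))

def revoke(grantor, grantee, privilege):
    # Only the grant given by this grantor is withdrawn
    grants.discard((grantor, grantee, privilege))

def holds(user, privilege):
    return any(g[1] == user and g[2] == privilege for g in grants)

grant("DBA", "U1", "select")
grant("U4", "U1", "select")        # same privilege from a second grantor

revoke("DBA", "U1", "select")
still_held = holds("U1", "select")       # U4's grant is still in place

revoke("U4", "U1", "select")
held_after_both = holds("U1", "select")  # no grant is left

print(still_held, held_after_both)
```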
Data Dictionary and System Catalog
• Metadata, data dictionary: a kind of
automated reference work with an overview of
all users, data and storage
• Data dictionary
– Provides detailed account of all tables found within
database
– Metadata
– Attribute names and characteristics
• System catalog
– Detailed data dictionary
– System-created database
– Stores database characteristics and contents
– Tables can be queried just like any other tables
– Automatically produces database documentation
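That catalog tables can be queried like any other table is easy to try in SQLite, whose catalog is the table sqlite_master. The sketch below runs from Python; the tables branch and account are assumptions, and other systems expose their catalog under different names (MySQL, for instance, through information_schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, branch_name TEXT, balance REAL)")

# Query the system catalog with an ordinary SELECT
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Attribute names and types of one table (row layout: cid, name, type, ...)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(tables)
print(columns)
```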
Functions and Procedures
• SQL:1999 supports functions and procedures
– Functions/procedures can be written in SQL itself, or in an
external programming language
– Functions are particularly useful with specialized data types
such as images and geometric objects
• Example: functions to check if polygons overlap, or to compare
images for similarity
– Some database systems support table-valued functions,
which can return a relation as a result
• SQL:1999 also supports a rich set of imperative
constructs, including
– Loops, if-then-else, assignment
• Many databases have proprietary procedural
extensions to SQL that differ from SQL:1999
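As an illustration of a function written in an external programming language, the sketch below registers a Python function with SQLite and calls it from SQL. This is SQLite's own mechanism, not the SQL:1999 syntax; the function gc_content and the table seqs are assumptions for the example.

```python
import sqlite3

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        return None
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name gc_content (1 argument)
conn.create_function("gc_content", 1, gc_content)

conn.execute("CREATE TABLE seqs (name TEXT, seq TEXT)")
conn.executemany("INSERT INTO seqs VALUES (?, ?)",
                 [("s1", "GGCC"), ("s2", "ATAT"), ("s3", "ATGC")])

rows = conn.execute("SELECT name, gc_content(seq) FROM seqs ORDER BY name").fetchall()
print(rows)
```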
Procedural Extensions and Stored Procedures
• SQL provides a module language
– Permits definition of procedures in SQL, with
if-then-else statements, for and while loops,
etc.
• Stored Procedures
– Can store procedures in the database
– then execute them using the call statement
– permit external applications to operate on the
database without knowing about internal
details
Transaction Processing
Transaction Processing - Basics
• A transaction is a logical unit of
database processing
• Transaction processing systems
include large databases and hundreds
of concurrent users
• Examples of these systems are:
– airline reservations,
– banking,
– credit card processing,
– supermarket checkout, and
– similar systems
Multi - User Database Systems
• One way to classify DBMSs is according to the
number of concurrent users:
– single user
– multi-user
• Majority of database systems are of a multi - user
type
• Concurrent (or simultaneous from the user point of
view) database usage is possible thanks to
computer multiprogramming
• Multiprogramming operating systems execute some
commands of one process, then suspend this
process and execute some commands of another
process
• After a while, the execution of the first process is
resumed at the point where it was interrupted
• This type of process execution is called
interleaving
The Notion of a Transaction
• A transaction is a logical unit of database
processing that includes one or more
database access operations (read and write)
• Each execution of a transaction program is a
transaction
• If a transaction finishes successfully, all
data it has changed are visible to other
transactions
• If a transaction fails for any reason, DBMS
has to undo all the changes that the
transaction made against the database
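A minimal sketch of this all-or-nothing behaviour, using SQLite from Python. The account table, the CHECK constraint that forbids negative balances, and the function transfer are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Move amount from A to B as one transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        conn.commit()        # success: both changes become visible
        return True
    except sqlite3.IntegrityError:
        conn.rollback()      # failure: undo the credit to B as well
        return False

ok = transfer(30)        # succeeds
failed = transfer(500)   # would make A negative, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer has already credited B when the debit of A fails; the rollback removes that partial change.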
Transactions (continued)
• In multi-user transaction processing
systems, users execute database transactions
concurrently
• Most often, concurrent means interleaved
• The users can attempt to modify the same
database items at the same time, and that is
a potential source of database inconsistency
• Checking database integrity constraints is
not enough to protect a database from
threats induced by its concurrent usage
Commit
• A transaction reaches its commit point when all of its
operations that access the database have been
executed successfully and the effect of all transaction
operations on the database has been recorded in the
log
• Beyond the commit point, the effect of a transaction
is assumed to be permanently recorded in the
database
• If a transaction does not reach its commit point and
there is no [commit, T] record in the log file,
this transaction has to be rolled back
• Read committed protocol:
– If a transaction T updates a database item A, other transactions can read A only after T
has committed
Sources of Database Inconsistency
• Uncontrolled execution of database
transactions in a multi-user
environment can lead to database
inconsistency
• There are a number of possible sources
of database inconsistency
• The typical ones are:
– lost update problem,
– dirty read problem, and
– unrepeatable read problem
Lost Update Problem
time  T1                 T2
 1    read_item(X)
 2    X = X - N
 3                       read_item(X)
 4                       X = X + M
 5    write_item(X)
 6                       write_item(X)
After termination of T2, X = X + M.
T1's update to X is lost because T2 wrote over X.
Generally, the lost update problem is characterized by:
• T2 reads X,
• T1 writes X, and
• T2 writes X
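The effect can be replayed step by step in plain Python, with local variables standing in for each transaction's private copy of X. The starting value 100 and the amounts N = 10 and M = 5 are assumptions, and the order of steps is one interleaving that fits the characterization above.

```python
X = 100          # database item
N, M = 10, 5

t1_x = X             # T1: read_item(X)
t1_x = t1_x - N      # T1: X = X - N (on its private copy)
t2_x = X             # T2: read_item(X), still sees the old value
t2_x = t2_x + M      # T2: X = X + M (on its private copy)
X = t1_x             # T1: write_item(X)
X = t2_x             # T2: write_item(X), overwrites T1's update

interleaved = X
serial = 100 - N + M     # what running T1, then T2, would give

print(interleaved, serial)
```

The difference between the two results is exactly N, the update of T1 that was lost.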
Dirty Read Problem
time  T1                 T2
 1    read_item(X)
 2    X = X - N
 3    write_item(X)
 4                       read_item(X)
 5                       X = X + M
 6                       write_item(X)
 7    read_item(Y)
 8    T1 fails
Generally, the dirty read problem is characterized by:
• T1 writes X,
• T2 reads X, and
• T1 fails
Since T1 failed, the DBMS is going to undo the changes it made against the
database.
T2 has already read the value X - N of item X, and that value is
going to be altered by the DBMS back to X.
Unrepeatable Read Problem
time  T1                 T2
 1    read_item(X)
 2                       read_item(X)
 3                       X = X + M
 4                       write_item(X)
 5    read_item(X)
Transaction T1 has got two different values of X in two subsequent
reads, because T2 has changed it in the meantime.
Even if T1 didn't execute the second read command, it would use a
stale X value, and that's another form of the unrepeatable read problem.
Generally, the unrepeatable read problem is characterized by:
• T1 reads X,
• T2 writes X, and
• T1 reads X
Prevention of Concurrency Anomalies
• Lost update, dirty read and unrepeatable
read are called concurrency anomalies
• The concurrency control part of a DBMS
has the task to prevent these problems
• The DBMS is responsible for ensuring that either
all operations of a transaction are
successfully executed and their effect is
permanently stored in the database, or the
database looks as if the transaction had never
started
• The effect of a partially executed
transaction has to be undone
Types of Failures
• A transaction can be partially
executed due to:
– A computer failure (hardware, software,
network,…)
– A transaction error (overflow, division by
zero,…)
– An exception condition (lack of data)
– A concurrency control enforcement (deadlock,
timeout,…)
– An abort command in the transaction program
Transaction State Transition Diagram
[Figure: transaction states Active, Partially committed, Committed, Failed and Terminated, connected by the program commands begin transaction, read, write, end transaction, commit and abort. Legend: program command, transaction state.]
Log File
• To be able to recover from failures, the
DBMS maintains a log file
• Typically, a log file contains records
with the following contents:
[start_transaction, T] (*T is the transaction id*)
[write_item, T, X, old_value, new_value]
[read_item, T, X] (*optional*)
[commit, T]
[abort, T]
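How such a log supports recovery can be sketched in plain Python: after a crash, every write_item of a transaction without a commit record is undone, newest first, by restoring its old_value. The log contents and the database state are made up, and real recovery managers do considerably more (redo, checkpoints).

```python
# Log at the moment of the crash: T2 committed, T1 did not
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 90),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 20, 25),
    ("commit", "T2"),
    ("write_item", "T1", "X", 90, 80),
]
database = {"X": 80, "Y": 25}   # state found on disk after the crash

committed = {record[1] for record in log if record[0] == "commit"}

# Undo pass: walk the log backwards and restore old values of uncommitted writes
for record in reversed(log):
    if record[0] == "write_item" and record[1] not in committed:
        _, _, item, old_value, _ = record
        database[item] = old_value

print(database)
```

T1's two writes to X are rolled back to the original 100, while T2's committed write to Y is kept.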
Summary
• Executing transactions in an interleaved way
may bring a database into an inconsistent state
• Transaction anomalies are:
– Lost update,
– Dirty read, and
– Unrepeatable read
• A DBMS is responsible for ensuring that either
all operations of a transaction are successfully
executed, or it is rolled back
• The log file records all important events (start,
read, write, commit)
• When a transaction reaches its commit point,
everything is safely stored in the database (or the
log file)
Data Warehousing and Decision Support
Views and Decision Support
• OLAP queries are typically aggregate queries.
– Precomputation is essential for interactive response
times.
– The CUBE is in fact a collection of aggregate
queries, and precomputation is especially important:
lots of work on what is best to precompute given a
limited amount of space to store precomputed
results.
• Warehouses can be thought of as a collection
of asynchronously replicated tables and
periodically maintained views.
– This has renewed interest in view maintenance!
View Modification (Evaluate On Demand)
View:
CREATE VIEW RegionalSales(category,sales,state)
AS SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid
Query:
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R GROUP BY R.category, R.state
Modified Query:
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid=S.pid AND S.locid=L.locid) AS R
GROUP BY R.category, R.state
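That the query on the view and the modified query give the same answer can be checked on a small example. The sketch runs SQLite from Python; the rows in Products, Locations and Sales are made up, and the view is written with column aliases instead of a column list to stay compatible with older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales     (pid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Products  VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO Locations VALUES (1, 'Wisconsin'), (2, 'Ohio');
INSERT INTO Sales     VALUES (1, 1, 10), (1, 1, 5), (1, 2, 7), (2, 1, 3);
CREATE VIEW RegionalSales AS
    SELECT P.category AS category, S.sales AS sales, L.state AS state
    FROM Products P, Sales S, Locations L
    WHERE P.pid = S.pid AND S.locid = L.locid;
""")

# Query written against the view
via_view = conn.execute("""
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
""").fetchall()

# Modified query: the view name is replaced by the view definition
modified = conn.execute("""
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
      FROM Products P, Sales S, Locations L
      WHERE P.pid = S.pid AND S.locid = L.locid) AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
""").fetchall()

print(via_view)
```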
View Materialization (Precomputation)
• Suppose we precompute RegionalSales and store
it with a clustered B+ tree index on
[category,state,sales].
– Then, the previous query can be answered by an index-only scan.
Index on precomputed view is great!
SELECT R.state, SUM(R.sales)
FROM RegionalSales R
WHERE R.category='Laptop'
GROUP BY R.state
Index is less useful (must scan entire leaf level):
SELECT R.category, SUM(R.sales)
FROM RegionalSales R
WHERE R.state='Wisconsin'
GROUP BY R.category
Materialized Views
• A view whose tuples are stored in the database
is said to be materialized.
– Provides fast access, like a (very high-level) cache.
– Need to maintain the view as the underlying tables
change.
– Ideally, we want incremental view maintenance
algorithms.
• Close relationship to data warehousing, OLAP,
(asynchronously) maintaining distributed
databases, checking integrity constraints, and
evaluating rules and triggers.
Issues in View Materialization
• What views should we materialize, and
what indexes should we build on the
precomputed results?
• Given a query and a set of materialized
views, can we use the materialized
views to answer the query?
• How frequently should we refresh
materialized views to make them
consistent with the underlying tables?
(And how can we do this
incrementally?)
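Incremental maintenance can be sketched in plain Python for a SUM view: each insert into the base table adjusts only the affected total, instead of recomputing the whole aggregate. The names sales, totals, insert_sale and recompute and the sample rows are assumptions; deletes and updates would need matching adjustments.

```python
from collections import defaultdict

sales = []                   # base table: (category, state, amount)
totals = defaultdict(int)    # materialized view: (category, state) -> SUM(amount)

def insert_sale(category, state, amount):
    sales.append((category, state, amount))
    # Incremental maintenance: touch only the one affected group
    totals[(category, state)] += amount

def recompute():
    """Full refresh, used here only to check the incremental result."""
    fresh = defaultdict(int)
    for category, state, amount in sales:
        fresh[(category, state)] += amount
    return fresh

insert_sale("Laptop", "Wisconsin", 10)
insert_sale("Laptop", "Wisconsin", 5)
insert_sale("Phone", "Ohio", 3)

print(dict(totals))
```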

2018 03 20_biological_databases_part3

  • 2.
  • 3.
  • 7.
    A Symptoms checker- Disease B Metabolic engineering C Cancer Neoantigens Familienaam Voornaam E-mail Opleidingsprogramma Davey Lucas Lucas.Davey@UGent.be C CMBIOISB David Sven Sven.David@UGent.be B IMCELB Engelen Yanou Yanou.Engelen@UGent.be ? IMCELB Ezquerro Marrodán Elsa Elsa.EzquerroMarrodan@UGent.be C IXGAEX Georis Raphaël Raphael.Georis@UGent.be B IMCELB Gilis Jeroen Jeroen.Gilis@UGent.be B CMBIOISB Lashkari Samira Samira.Lashakri@UGent.be A CMBIOISB Recer Karmen Karmen.Recer@UGent.be C IXGAEX Schindfessel Cédric Cedric.Schindfessel@UGent.be B IMCELB Silva Marta Marta.Silva@UGent.be B CMBIOISB Silva Meneses Rodrigo Rodrigo.Meneses@UGent.be C CMBIOISB Strybol Pieter-Paul PieterPaul.Strybol@UGent.be B/C CMBIOIBE Taelman Steff Steff.Taelman@UGent.be A CMBIOIBE Tóth Máté István MateIstvan.Toth@UGent.be C EXGAEX Toulmé Coralyne Coralyne.Toulme@UGent.be C IXGAEX Van hoyweghen Sergej Sergej.Vanhoyweghen@UGent.be ? IMCELB Willems Thomas Thomas.Willems@UGent.be ? IMCELB Wojciulewitsch Coralie Coralie.Wojciulewitsch@UGent.be A IMCELB Yekimov Illya Illya.Yekimov@UGent.be A IMCELB
  • 8.
    For the projectI suggest to take the following steps (individual or in group - maybe setup a collaborative tool like slack) 1. What it is about ? What do you want to achieve ? For who ? 2. identify information resources - think about a basic data-model 3. Draw (mockup) an interface, don't be constrained by technical consideration - think ouside the box:)
  • 9.
    SQL Clients • BestSQL clients • Free TOAD edge ??? • Alternatives • Navicat
  • 10.
    • Python example •BIOSQL load genbank (BIOSQL – biopython example)
  • 11.
    print ("Uploading data"); importpymysql db= pymysql.connect(host = "localhost",port=8889,user="root",passwd="root",db="db") cursor=db.cursor() #cursor.execute("DROP TABLE IF EXISTS USER") sql="insert into tb (tb_id,tb_name,tb_age,tb_sex) values ('1','Demo','25','male')" cursor.execute(sql) db.commit() db.close() print ("Done")
  • 12.
    Install BIOSQL locally •Get latest version of mysql (MAMP, mariaDB) • Download biosqldb-mysql.sql • Remove type=innodb • Launch database server • Connect using toad (port 8889) • Create database biosql; • Set as active database • Use worksheet to execute biosqldb- mysql.sql
  • 13.
    #Connecting to aBioSQL database -http://biopython.org/wiki/BioSQL from Bio import Entrez from Bio import SeqIO from BioSQL import BioSeqDatabase #db= pymysql.connect(host = "localhost",port=8889,user="root",passwd="root",db="db") server = BioSeqDatabase.open_database(driver = "pymysql",host = "localhost",port=8889,user="root",passwd="root",db="db") #db = server.new_database("test") db = server["test"] import pprint Entrez.email = "A.N.Other@example.com" handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="6273291,6273290,6273289") print ("Loading into BIOSQL") count = db.load(SeqIO.parse(handle, "genbank")) print ("Loaded %i records" % count) server.adaptor.commit() for seq_record in SeqIO.parse(handle, "genbank"): print (seq_record.id, seq_record.description[:50] + "...") print ("Sequence length %i," % len(seq_record)) print ("%i features," % len(seq_record.features)) print ("from: %s" % seq_record.annotations["source"]) pprint.pprint(seq_record) # pprint ("Loading") #load into BIOSQL # db.load_seqrecord(seq_record)
  • 15.
    Example 3-tier modelin biological database http://www.bioinformatics.be Example of different interface to the same back-end database (MySQL)
  • 16.
  • 17.
    Slide 17Prepared 3/21/2018 WhatIS Apache, Anyway?  Open-Source Web server originally based on NCSA server  Available on over 160 varieties of Unix -- and Windows NT  Over 56% of Internet Web servers run Apache or an Apache derivative Graph copyright Netcraft (<http://www.netcraft.com/survey/>)
  • 18.
    Slide 18Prepared 3/21/2018 ConfiguringApache  Choosing functionality – Apache functionality is available through modules which are either built into or loaded into the server  Server instructions – Apache reads its run-time configuration instructions from text files – No GUI available – 182 configuration directives in base package
  • 19.
    Slide 19Prepared 3/21/2018 ConfiguringApache (continued)  When used with -d option, server reads httpd.conf; -f allows use of a different name  After httpd.conf, server reads srm.conf, then access.conf (unless the latter two have been renamed by ResourceConfig and AccessConfig directives, respectively)  Consider combining these into a single file for simplicity
  • 20.
    Log files
    • Two basic log files
      – Access log: who has been visiting your server and what they wanted
      – Error log: problems the server has encountered and things it has noticed
    • Can be configured for each virtual host, or for the entire server
    • Access log format can be customised
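Because the access log is plain text, it is easy to analyse with a script. The sketch below parses one line, assuming the widely used Common Log Format; the slide does not say which format is configured, and the sample line and the function name `parse_access_line` are made up for illustration.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
# (assumed format; the real layout depends on the server's log configuration)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields

# Hypothetical log line, for illustration only
sample = '192.0.2.10 - - [21/Mar/2018:10:15:32 +0100] "GET /index.php HTTP/1.1" 200 5120'
entry = parse_access_line(sample)
print(entry["host"], entry["request"], entry["status"])
```

Counting entries per host or per status code then becomes a loop over the file with a dictionary or `collections.Counter`.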
  • 21.
    Simple Web form
    <html>
    <head><title>simpleform</title></head>
    <body>
    <form name="simpleForm" method="post" action="simpleHandler.cgi">
    Your email address: <input type="text" name="email">
    <input type="submit" value="Submit">
    </form>
    </body>
    </html>
  • 22.
    Interacting with Web Forms
    You typically need to generate the form (which may be a normal static Web page), then
    • validate user input
    • process user input
    • generate a response dynamically.
    These three steps may be done within the Web browser (client-side), within the Web server (server-side), or in some combination of both.
  • 23.
    CGI (Common Gateway Interface)
    A mechanism for a Web browser to send data to a Web server: it allows the browser to submit data to a program running on the server.
    • the program is often called a 'CGI script'
    • typically written in Perl, PHP or ASP
    • can also be a 'real' program (e.g. written in C)
  • 24.
    CGI (2)
    • used primarily for form submission
    • can also be used to upload local files
    • 'CGI' URLs often contain '?' and '&' characters, but don't have to
    • output from CGI is usually dynamic and therefore not cached
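The '?' and '&' characters mark the query string that carries the form fields. As a small illustration, Python's standard library can split such a URL into the script path and its parameters; the URL and parameter names below are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical CGI URL; the script name and parameters are made up
url = "http://www.example.org/cgi-bin/search.cgi?organism=human&gene=TP53"

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each field name to a list of values

print(parts.path)
print(params)
```

Everything after '?' is the query string, and '&' separates the individual name=value pairs.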
  • 25.
  • 26.
    Brief History of PHP
    • PHP (PHP: Hypertext Preprocessor) was created by Rasmus Lerdorf in 1994. It was initially developed for HTTP usage logging and server-side form generation in Unix.
    • PHP 2 (1995) transformed the language into a server-side embedded scripting language. It added database support, file uploads, variables, arrays, recursive functions, conditionals, iteration, regular expressions, etc.
    • PHP 3 (1998) added support for ODBC data sources, multiple platforms, network protocols (SNMP, IMAP), and a new parser written by Zeev Suraski and Andi Gutmans.
    • PHP 4 (2000) became an independent component of the web server for added efficiency. The parser was renamed the Zend Engine. Many security features were added.
    • PHP 5 (2004) added Zend Engine II with object-oriented programming, robust XML support using the libxml2 library, a SOAP extension for interoperability with Web Services, and bundled SQLite.
  • 27.
    Why is PHP used?
    1. Easy to use
    Code is embedded into HTML. The PHP code is enclosed in special start and end tags that allow you to jump into and out of "PHP mode".
    <html>
    <head> <title>Example</title> </head>
    <body>
    <?php echo "Hi, I'm a PHP script!"; ?>
    </body>
    </html>
  • 28.
    Getting Started
    1. How to escape from HTML and enter PHP mode
    • PHP parses a file by looking for one of the special tags that tells it to start interpreting the text as PHP code. The parser then executes all of the code it finds until it runs into a PHP closing tag.
    Starting tag               Ending tag   Notes
    <?php                      ?>           Preferred method, as it allows the use of PHP with XHTML
    <?                         ?>           Not recommended. Easier to type, but has to be enabled and may conflict with XML
    <script language="php">    </script>    Always available; best used when FrontPage is the HTML editor
    <%                         %>           Not recommended. ASP tag support was added in 3.0.4
    Example (HTML, then PHP code, then HTML again): HTML <?php echo "Hello World"; ?> HTML
  • 29.
    Getting Started
    2. Simple HTML page with PHP
    • The following is a basic example to output text using PHP.
    <html><head>
    <title>My First PHP Page</title>
    </head>
    <body>
    <?php echo "Hello World!"; ?>
    </body></html>
    Copy the code onto your web server and save it as "test.php". When you open the page in a browser, you should see "Hello World!" displayed.
    Notice that a semicolon ends each PHP statement. Like HTML, PHP ignores whitespace between lines of code.
  • 30.
    Getting Started
    3. Using conditional statements
    • Conditional statements are very useful for displaying specific content to the user. The following example shows how to display content according to the day of the week.
    <?php
    $today_dayofweek = date("w");
    if ($today_dayofweek == 4){
        echo "Today is Thursday!";
    }
    else{
        echo "Today is not Thursday.";
    }
    ?>
  • 31.
    Getting Started
    3. Using conditional statements (continued)
    The if statement checks the value of $today_dayofweek (the numerical day of the week, 0 = Sunday … 6 = Saturday).
    • If it is equal to 4 (the numeric representation of Thursday), PHP executes everything within the first { } block after the "if()".
    • If it is not equal to 4, PHP executes everything in the second { } block after the "else".
  • 32.
    Getting Started
    3. Using conditional statements (continued)
    If we run the script on a Thursday, we should see: "Today is Thursday!". On days other than Thursday, we will see: "Today is not Thursday."
  • 33.
    Example
    • HTML file greet.html has
    <form action="greet.php" method="get"><p>
    your last name: <input type="text" name="lastname"/></p></form>
    • PHP file greet.php has
    <?php
    print "Hello ";
    print $_GET['lastname'];
    ?>
    in addition to the usual HTML stuff.
  • 34.
    What is PEAR?
    The PHP Extension and Application Repository, or PEAR, is a framework and distribution system for PHP code components. Stig S. Bakken founded the PEAR project in 1999 to promote the re-use of code that performs common functions. The project has the goals of:
    • providing a structured library of code
    • maintaining a system for distributing code and for managing code packages
    • promoting a standard coding style
  • 35.
    PEAR DB
    DB is a database abstraction layer providing:
    • an OO-style query API
    • portability features that make programs written for one DBMS work with other DBMSs
    • a DSN (data source name) format for specifying database servers
    • prepare/execute (bind) emulation for databases that don't support it natively
    • a result object for each query response
    • portable error codes
    • sequence emulation
    • sequential and non-sequential row fetching as well as bulk fetching
    • fetched rows formatted as associative arrays, ordered arrays or objects
    • row limit support
    • transaction support
    • a table information interface
    • DocBook and phpDocumentor API documentation
  • 36.
    Connecting to Databases through PEAR DB
    To access a database through PEAR DB, you have to create a data source name (DSN) that specifies the appropriate PEAR DB backend for your database and the parameters necessary to connect to the database.
    DSN syntax: phptype(dbsyntax)://username:password@protocol+hostspec/database?option=value
    for example: (mysql)
  • 37.
    Connecting to Databases through PEAR DB: how to connect and disconnect
  • 38.
    Connecting to a database, creating a table, and inserting a record.
  • 39.
    Some Database Functions
    • Query function
      – $d->query takes an SQL command as its string argument
      – Sends the query to the database server for execution
    • $d->setErrorHandling(PEAR_ERROR_DIE)
      – Terminates the program and prints default error messages if any subsequent errors occur
  • 40.
    Retrieval Queries from Database Tables
    • $q
      – Variable that holds the query result
      – $q->fetchRow() retrieves the next record in the query result and controls the loop
    • $allresult = $d->getAll(query)
      – Holds all the records of a query result in a single variable called $allresult
  • 41.
    Summary
    • PHP scripting language
      – Very popular for Web database programming
    • PHP basics for Web programming
      – Data types
    • Database commands include:
      – Creating tables, inserting new records, and retrieving database records
      – Looping over a query result
  • 42.
    Structured Query Language
    Day-to-day data manipulation in SQL relies on only 4 statements, sometimes referred to as CRUD:
    – Create - INSERT - to store new data
    – Read - SELECT - to retrieve data
    – Update - UPDATE - to change or modify data
    – Delete - DELETE - to delete or remove data
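The four statements can be tried without installing a server. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the table `gene` and its columns are invented for the example and are not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, organism TEXT)")

# Create
cur.execute("INSERT INTO gene (gene_id, symbol, organism) VALUES (?, ?, ?)",
            (1, "TP53", "human"))
# Read
cur.execute("SELECT symbol, organism FROM gene WHERE gene_id = ?", (1,))
row_after_insert = cur.fetchone()
# Update
cur.execute("UPDATE gene SET organism = ? WHERE gene_id = ?", ("Homo sapiens", 1))
cur.execute("SELECT organism FROM gene WHERE gene_id = ?", (1,))
organism_after_update = cur.fetchone()[0]
# Delete
cur.execute("DELETE FROM gene WHERE gene_id = ?", (1,))
cur.execute("SELECT COUNT(*) FROM gene")
rows_left = cur.fetchone()[0]

conn.commit()
conn.close()
print(row_after_insert, organism_after_update, rows_left)
```

The `?` placeholders let the driver handle quoting, which is safer than building the SQL string by hand.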
  • 43.
    Relationships within a Relational Database
    • Relationship classifications
      – 1:1
      – 1:M
      – M:N
    • E-R Model
      – The ERD maps the E-R model
      – Entities, or tables: drawn as rectangles containing the name of the entity type and sometimes a list of its attributes
      – Relationships: the links between entity types, drawn as lines between the connected entity types
  • 44.
    ERD Symbols
    • Rectangles represent entities
    • Diamonds represent the relationship(s) between the entities
    • "1" side of a relationship
      – Number 1 in the Chen model
      – Bar crossing the line in the Crow's Feet model
    • "Many" relationships
      – Letters "M" and "N" in the Chen model
      – Three-pronged "crow's foot" in the Crow's Feet model
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
    Converting M:N Relationship to Two 1:M Relationships
  • 50.
    Converting M:N Relationship to Two 1:M Relationships (cont.)
  • 51.
    Converting M:N Relationship to Two 1:M Relationships (cont.)
  • 52.
    Converting M:N Relationship to Two 1:M Relationships (cont.)
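The slide figures are not available in this transcript, so here is a small sketch of the usual conversion, assuming a student/course example that is not taken from the slides: the M:N relationship is replaced by a bridge table (`enrollment`, a hypothetical name) that sits on the "many" side of two 1:M relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
-- Bridge table: one row per (student, course) pair
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Biological Databases'), (20, 'Statistics');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow both 1:M relationships through the bridge table
rows = conn.execute("""
SELECT s.name, c.title
FROM student s
JOIN enrollment e ON e.student_id = s.student_id
JOIN course c     ON c.course_id = e.course_id
ORDER BY s.name, c.title
""").fetchall()
conn.close()
print(rows)
```

Each student appears in many enrollment rows and each course appears in many enrollment rows, but every enrollment row points to exactly one student and one course.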
  • 53.
    Indexes
    • Point to the location of the data
    • Make retrieval of data faster
  • 54.
    Applications of B+-Trees
    (1) The search key of the B+-Tree is the primary key of the data file and the index is dense (the data file may or may not be sorted by primary key):
      – There is one key-pointer pair in a leaf for every record of the data file
    (2) The data file is sorted by its primary key:
      – The B+-Tree is a sparse index with one key-pointer pair at a leaf for each block of the file
    (3) The data file is sorted by an attribute that is not a key (the search key of the B+-Tree):
      – For each value K that appears in the data file, there is one key-pointer pair at the leaf. The pointer goes to the first of the records that have K as their sort-key value
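Case (2) can be illustrated with a few lines of Python. This is only a sketch of the sparse-index idea: a sorted list stands in for the leaf level of the B+-Tree (no tree is built), and the block contents and the function name `lookup` are made up.

```python
from bisect import bisect_right

# Data file sorted by primary key, split into blocks of (key, value) records
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g"), (15, "h")],
]

# Sparse index: one entry per block, holding the first key stored in that block
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Find the record with this key via the sparse index, or return None."""
    pos = bisect_right(index_keys, key) - 1   # last block whose first key <= key
    if pos < 0:
        return None
    for k, value in blocks[pos]:              # scan inside that one block only
        if k == key:
            return value
    return None

print(lookup(9), lookup(4))
```

A dense index (case 1) would instead hold one entry per record, which is required when the data file is not sorted on the search key.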
  • 55.
    • Database design - Data modelling
      • What information will the database have to deliver?
      • What data must the database contain to meet the identified information needs?
      • What data are available?
      • How are the required data related to each other?
  • 56.
    • Database design - Data modelling
      • Top-down (start from the big picture)
      • Bottom-up (start from the details)
      • Changing or adding a field is much harder than inserting a record
  • 57.
    Normalization
    Normalization: arriving at a data model in which no fact is redundant, that is, occurs more than once.
    • Zeroth normal form: the starting point. The data model from the initial information analysis, typically with repeated attributes or repeated groups of attributes
    • First normal form: remove repeating groups
    • Second normal form: examine composite keys
    • Third normal form: examine transitive dependencies
  • 58.
    Normalization
    • Normalization is used to design a set of relation schemas that is optimal from the point of view of database updating
    • Normalization starts from a universal relation schema
    • There are six normal forms, of which only three are based on functional dependencies
    • Normal forms define to which extent we should normalize
    • The Synthesis algorithm and the Decomposition algorithm represent the formal normalization methods
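Since the first normal forms are defined through functional dependencies, it can help to test a candidate dependency against sample data. The sketch below checks whether a dependency lhs -> rhs holds in a list of rows; the function `fd_holds` and the `orders` data are hypothetical, and a check on sample data can only refute a dependency, never prove it for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False   # same lhs values, different rhs values
    return True

# Hypothetical unnormalized relation
orders = [
    {"order_id": 1, "customer": "Ann", "city": "Ghent"},
    {"order_id": 2, "customer": "Ann", "city": "Ghent"},
    {"order_id": 3, "customer": "Bob", "city": "Bruges"},
]

customer_determines_city = fd_holds(orders, ["customer"], ["city"])
city_determines_order = fd_holds(orders, ["city"], ["order_id"])
print(customer_determines_city, city_determines_order)
```

Here order_id determines customer and customer determines city, a transitive dependency; moving customer and city into their own table removes the repeated "Ann, Ghent" fact.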
  • 59.
    Finishing Database Design
    • Normalization results in a set of relation schemas
    • To complete a database schema design after normalization, one also has to define interrelation constraints (referential integrity constraints)
    • That design is suitable for efficient database updates
    • But it can slow down the execution of queries
    • Sometimes it is advisable to undertake controlled denormalization
  • 60.
    Privileges in SQL
    • select: allows read access to a relation, or the ability to query using the view
      – Example: grant users U1, U2, and U3 select authorization on the branch relation:
        grant select on branch to U1, U2, U3
    • insert: the ability to insert tuples
    • update: the ability to update using the SQL update statement
    • delete: the ability to delete tuples
    • all privileges: used as a short form for all the allowable privileges
  • 61.
    Authorization
    Forms of authorization on parts of the database:
    • Read - allows reading, but not modification of data.
    • Insert - allows insertion of new data, but not modification of existing data.
    • Update - allows modification, but not deletion of data.
    • Delete - allows deletion of data.
    Forms of authorization to modify the database schema:
    • Index - allows creation and deletion of indices.
    • Resources - allows creation of new relations.
    • Alteration - allows addition or deletion of attributes in a relation.
    • Drop - allows deletion of relations.
  • 62.
    Authorization Specification in SQL
    • The grant statement is used to confer authorization:
      grant <privilege list> on <relation name or view name> to <user list>
    • <user list> is:
      – a user-id
      – public, which allows all valid users the privilege granted
      – a role
    • Granting a privilege on a view does not imply granting any privileges on the underlying relations.
    • The grantor of the privilege must already hold the privilege on the specified item (or be the database administrator).
  • 63.
    Revoking Authorization in SQL
    • The revoke statement is used to revoke authorization:
      revoke <privilege list> on <relation name or view name> from <user list>
    • Example: revoke select on branch from U1, U2, U3
    • <privilege list> may be all, to revoke all privileges the revokee may hold.
    • If <user list> includes public, all users lose the privilege except those granted it explicitly.
    • If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after the revocation.
    • All privileges that depend on the privilege being revoked are also revoked.
  • 64.
    Data Dictionary and System Catalog
    • Metadata, data dictionary: a kind of automated reference work with an overview of all users, data and storage
    • Data dictionary
      – Provides a detailed account of all tables found within the database
      – Metadata
      – Attribute names and characteristics
    • System catalog
      – Detailed data dictionary
      – System-created database
      – Stores database characteristics and contents
      – Tables can be queried just like any other tables
      – Automatically produces database documentation
  • 65.
    Functions and Procedures
    • SQL:1999 supports functions and procedures
      – Functions/procedures can be written in SQL itself, or in an external programming language
      – Functions are particularly useful with specialized data types such as images and geometric objects
        • Example: functions to check if polygons overlap, or to compare images for similarity
      – Some database systems support table-valued functions, which can return a relation as a result
    • SQL:1999 also supports a rich set of imperative constructs, including
      – Loops, if-then-else, assignment
    • Many databases have proprietary procedural extensions to SQL that differ from SQL:1999
  • 66.
    Procedural Extensions and Stored Procedures
    • SQL provides a module language
      – Permits definition of procedures in SQL, with if-then-else statements, for and while loops, etc.
    • Stored procedures
      – Procedures can be stored in the database
      – and then executed using the call statement
      – They permit external applications to operate on the database without knowing about internal details
  • 67.
  • 68.
    Transaction Processing - Basics
    • A transaction is a logical unit of database processing
    • Transaction processing systems include large databases and hundreds of concurrent users
    • Examples of these systems are:
      – airline reservations,
      – banking,
      – credit card processing,
      – supermarket checkout, and
      – similar systems
  • 69.
    Multi-User Database Systems
    • One way to classify DBMSs is according to the number of concurrent users:
      – single-user
      – multi-user
    • The majority of database systems are of the multi-user type
    • Concurrent (or simultaneous from the user's point of view) database usage is possible thanks to computer multiprogramming
    • Multiprogramming operating systems execute some commands of one process, then suspend this process and execute some commands of another process
    • After a while, the execution of the first process is resumed at the point where it was interrupted
    • This type of process execution is called interleaving
  • 70.
    The Notion of a Transaction
    • A transaction is a logical unit of database processing that includes one or more database access operations (read and write)
    • Each execution of a transaction program is a transaction
    • If a transaction finishes successfully, all data it has changed are visible to other transactions
    • If a transaction fails for any reason, the DBMS has to undo all the changes that the transaction made to the database
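The all-or-nothing behaviour can be observed with Python's built-in sqlite3 module. This is a sketch under assumed names: the `account` table, its CHECK constraint and the `transfer` function are invented for the example, and the failure is provoked on purpose by overdrawing an account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; undo everything on failure."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()   # the credit to dst is undone as well
        return False

ok = transfer(conn, "A", "B", 30)
failed = transfer(conn, "A", "B", 500)   # would make A's balance negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
print(ok, failed, balances)
```

In the failed transfer the first UPDATE had already credited B, yet after the rollback neither account has changed.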
  • 71.
    Transactions (continued)
    • In multi-user transaction processing systems, users execute database transactions concurrently
    • Most often, concurrent means interleaved
    • Users can attempt to modify the same database items at the same time, and that is a potential source of database inconsistency
    • Checking database integrity constraints is not enough to protect a database from threats induced by its concurrent usage
  • 72.
    Commit
    • A transaction reaches its commit point when all of its operations that access the database have been executed successfully and the effect of all transaction operations on the database has been recorded in the log
    • Beyond the commit point, the effect of a transaction is assumed to be permanently recorded in the database
    • If a transaction does not reach its commit point and there is no [commit, T] record in the log file, this transaction has to be rolled back
    • Read committed protocol:
      – If a transaction T updates a database item A, other transactions can read A only after T has committed
  • 73.
    Sources of Database Inconsistency
    • Uncontrolled execution of database transactions in a multi-user environment can lead to database inconsistency
    • There are a number of possible sources of database inconsistency
    • The typical ones are:
      – lost update problem,
      – dirty read problem, and
      – unrepeatable read problem
  • 74.
    Lost Update Problem
    T1                      T2
    read_item(X)
    X = X - N
                            read_item(X)
                            X = X + M
    write_item(X)
                            write_item(X)
    (time runs from top to bottom)
    After termination of T2, X = X + M. T1's update to X is lost because T2 wrote over X.
    Generally, the lost update problem is characterized by:
    • T2 reads X,
    • T1 writes X, and
    • T2 writes X
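The anomaly is easy to replay by hand. In this sketch a Python dictionary plays the role of the database, and the starting value 100 and the amounts N = 10 and M = 5 are arbitrary choices for illustration.

```python
# Shared database item; each transaction works on its own private copy of X
db = {"X": 100}
N, M = 10, 5

t1_x = db["X"]        # T1: read_item(X)
t1_x = t1_x - N       # T1: X = X - N
t2_x = db["X"]        # T2: read_item(X), still sees the old value 100
t2_x = t2_x + M       # T2: X = X + M
db["X"] = t1_x        # T1: write_item(X), stores 90
db["X"] = t2_x        # T2: write_item(X), stores 105 and overwrites T1's update

interleaved = db["X"]
serial = 100 - N + M  # result of running T1 and then T2 without interleaving
print(interleaved, serial)
```

The interleaved schedule leaves 105 in the database, whereas any serial execution of the two transactions gives 95.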
  • 75.
    Dirty Read Problem
    T1                      T2
    read_item(X)
    X = X - N
    write_item(X)
                            read_item(X)
                            X = X + M
                            write_item(X)
    read_item(Y)
    T1 fails
    (time runs from top to bottom)
    Generally, the dirty read problem is characterized by:
    • T1 writes X,
    • T2 reads X, and
    • T1 fails
    Since T1 failed, the DBMS is going to undo the changes T1 made to the database. T2 has already read the value X - N written by T1, and the DBMS is going to restore X to its original value.
  • 76.
    Unrepeatable Read Problem
    T1                      T2
    read_item(X)
                            read_item(X)
                            X = X + M
                            write_item(X)
    read_item(X)
    (time runs from top to bottom)
    Transaction T1 gets two different values of X in two subsequent reads, because T2 has changed X in the meantime.
    Even if T1 did not execute the second read command, it would use a stale X value, and that is another form of the unrepeatable read problem.
    Generally, the unrepeatable read problem is characterized by:
    • T1 reads X,
    • T2 writes X, and
    • T1 reads X
  • 77.
    Prevention of Concurrency Anomalies
    • Lost update, dirty read and unrepeatable read are called concurrency anomalies
    • The concurrency control part of a DBMS has the task of preventing these problems
    • The DBMS is responsible for ensuring that either all operations of a transaction are successfully executed and their effect is permanently stored in the database, or the outcome is as if the transaction had never started
    • The effect of a partially executed transaction has to be undone
  • 78.
    Types of Failures
    • A transaction can be partially executed due to:
      – A computer failure (hardware, software, network, …)
      – A transaction error (overflow, division by zero, …)
      – An exception condition (lack of data)
      – Concurrency control enforcement (deadlock, timeout, …)
      – An abort command in the transaction program
  • 79.
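    Transaction State Transition Diagram
    (Diagram; in the legend, program commands label the arrows and transaction states are the nodes.)
    • Transaction states: Active, Partially committed, Committed, Failed, Terminated
    • Program commands: begin transaction, read, write, end transaction, commit, abort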
  • 80.
    Log File
    • To be able to recover from failures, the DBMS maintains a log file
    • Typically, a log file contains records with the following contents:
      [start_transaction, T]    (* T is the transaction id *)
      [write_item, T, X, old_value, new_value]
      [read_item, T, X]    (* optional *)
      [commit, T]
      [abort, T]
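The old_value field is what makes rollback possible. The following sketch shows only the undo half of recovery, with log records written as Python tuples in the layout listed above; the function `undo_uncommitted` and the sample log are hypothetical, and a real DBMS would also handle redo, checkpoints and abort records.

```python
def undo_uncommitted(db, log):
    """Roll back writes of transactions that have no commit record (undo pass only)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                    # newest write first
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _tid, item, old_value, _new_value = rec
            db[item] = old_value                 # restore the value from before the write
    return db

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 90),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 20, 25),
    ("commit", "T2"),
    ("write_item", "T1", "X", 90, 80),
    # crash here: there is no ("commit", "T1") record
]
db = {"X": 80, "Y": 25}          # database state found after the crash
recovered = undo_uncommitted(db, log)
print(recovered)
```

Scanning the log backwards matters: T1 wrote X twice, and only the reverse order restores the value from before its first write.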
  • 81.
    Summary
    • Executing transactions in an interleaved way may bring a database into an inconsistent state
    • Transaction anomalies are:
      – Lost update,
      – Dirty read, and
      – Unrepeatable read
    • A DBMS is responsible for ensuring that either all operations of a transaction are successfully executed, or the transaction is rolled back
    • The log file records all important events (start, read, write, commit)
    • When a transaction reaches its commit point, everything is safely stored in the database (or the log file)
  • 82.
    Data Warehousing and Decision Support
  • 83.
    Views and Decision Support
    • OLAP queries are typically aggregate queries.
      – Precomputation is essential for interactive response times.
      – The CUBE is in fact a collection of aggregate queries, and precomputation is especially important: there is a lot of work on what is best to precompute given a limited amount of space to store precomputed results.
    • Warehouses can be thought of as a collection of asynchronously replicated tables and periodically maintained views.
      – This has renewed interest in view maintenance!
  • 84.
    View Modification (Evaluate On Demand)
    View:
    CREATE VIEW RegionalSales(category, sales, state)
    AS SELECT P.category, S.sales, L.state
       FROM Products P, Sales S, Locations L
       WHERE P.pid=S.pid AND S.locid=L.locid
    Query:
    SELECT R.category, R.state, SUM(R.sales)
    FROM RegionalSales AS R
    GROUP BY R.category, R.state
    Modified query:
    SELECT R.category, R.state, SUM(R.sales)
    FROM (SELECT P.category, S.sales, L.state
          FROM Products P, Sales S, Locations L
          WHERE P.pid=S.pid AND S.locid=L.locid) AS R
    GROUP BY R.category, R.state
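Query modification can be checked on a toy database: the query against the view and the rewritten query with the view definition inlined should return the same rows. The sketch uses Python's sqlite3 module with the slide's table and column names; the sample rows are invented, and an ORDER BY is added only to make the two results directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products  (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales     (pid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Products  VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO Locations VALUES (1, 'Wisconsin'), (2, 'Ohio');
INSERT INTO Sales     VALUES (1, 1, 10), (1, 1, 5), (1, 2, 7), (2, 1, 3);

CREATE VIEW RegionalSales(category, sales, state) AS
  SELECT P.category, S.sales, L.state
  FROM Products P, Sales S, Locations L
  WHERE P.pid = S.pid AND S.locid = L.locid;
""")

view_query = """
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
"""
modified_query = """
SELECT R.category, R.state, SUM(R.sales)
FROM (SELECT P.category, S.sales, L.state
      FROM Products P, Sales S, Locations L
      WHERE P.pid = S.pid AND S.locid = L.locid) AS R
GROUP BY R.category, R.state
ORDER BY R.category, R.state
"""
via_view = conn.execute(view_query).fetchall()
via_subquery = conn.execute(modified_query).fetchall()
conn.close()
print(via_view)
```

With evaluation on demand, the three-way join is recomputed every time the view is queried.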
  • 85.
    View Materialization (Precomputation)
    • Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category, state, sales].
      – Then the previous query can be answered by an index-only scan.
    Query 1 (index on the precomputed view is great!):
    SELECT R.state, SUM(R.sales)
    FROM RegionalSales R
    WHERE R.category='Laptop'
    GROUP BY R.state
    Query 2 (index is less useful: must scan the entire leaf level):
    SELECT R.category, SUM(R.sales)
    FROM RegionalSales R
    WHERE R.state='Wisconsin'
    GROUP BY R.category
  • 86.
    Materialized Views
    • A view whose tuples are stored in the database is said to be materialized.
      – Provides fast access, like a (very high-level) cache.
      – Need to maintain the view as the underlying tables change.
      – Ideally, we want incremental view maintenance algorithms.
    • Close relationship to data warehousing, OLAP, (asynchronously) maintaining distributed databases, checking integrity constraints, and evaluating rules and triggers.
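The maintenance problem shows up as soon as the base table changes. SQLite has no materialized views, so the sketch below emulates one with an ordinary table that is rebuilt on request; the table `SalesByState` and the functions `refresh` and `total` are hypothetical, and the refresh is a full recomputation rather than the incremental maintenance the slide asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (category TEXT, state TEXT, sales INTEGER);
INSERT INTO Sales VALUES ('Laptop', 'Wisconsin', 10), ('Laptop', 'Ohio', 7);
""")

def refresh(conn):
    """Recompute the stored summary from scratch (full refresh, not incremental)."""
    conn.executescript("""
    DROP TABLE IF EXISTS SalesByState;
    CREATE TABLE SalesByState AS
      SELECT category, state, SUM(sales) AS total
      FROM Sales GROUP BY category, state;
    """)

def total(conn, state):
    """Read the precomputed laptop total for one state from the stored summary."""
    row = conn.execute(
        "SELECT total FROM SalesByState WHERE category = 'Laptop' AND state = ?",
        (state,),
    ).fetchone()
    return row[0]

refresh(conn)
before = total(conn, "Ohio")
conn.execute("INSERT INTO Sales VALUES ('Laptop', 'Ohio', 3)")
stale = total(conn, "Ohio")      # the stored copy has not seen the new sale
refresh(conn)
fresh = total(conn, "Ohio")
conn.close()
print(before, stale, fresh)
```

Reads are cheap because the aggregate is stored, but between refreshes the stored result lags behind the underlying table.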
  • 87.
    Issues in View Materialization
    • What views should we materialize, and what indexes should we build on the precomputed results?
    • Given a query and a set of materialized views, can we use the materialized views to answer the query?
    • How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)