Database Management Systems  Prof. Oliver Günther, Ph.D.
O. Günther: Database Management Systems

  1. 1. Database Management Systems Prof. Oliver Günther, Ph.D.
  2. 2. Databases = Electronic Filing Cabinets? <ul><li>online access vs. applications </li></ul><ul><li>difference DB-WWW? </li></ul>
  3. 3. Databases = Electronic Filing Cabinets?
  4. 4. Requirements for a Database System <ul><li>large capacity - huge data sets: </li></ul><ul><li>- banking/insurance apps.: gigabytes of data (10<sup>9</sup> - 10<sup>11</sup> bytes) </li></ul><ul><li>- environmental apps.: terabytes of data (> 10<sup>12</sup> bytes) </li></ul><ul><li>user-friendly read/write access </li></ul><ul><li>efficient processing - short response times </li></ul><ul><li>data security </li></ul><ul><li>privacy </li></ul><ul><li>persistency, robustness towards hardware problems </li></ul><ul><li>control of redundancy </li></ul><ul><li>consistency </li></ul><ul><li>multiple users (including concurrency) </li></ul><ul><li>integrated data management </li></ul><ul><li>structured data management (logical, physical) </li></ul><ul><li>low cost </li></ul><ul><li>role of standards </li></ul><ul><li>data independence </li></ul>
  5. 5. 3-Layer Architecture <ul><li>External layers (user views) </li></ul><ul><ul><li>PASCAL: record emp of pno: string; ... salary: integer; end </li></ul></ul><ul><ul><li>COBOL: 01 Ang; 02 P-NR PIC X(6); 02 ABT PIC X(4) </li></ul></ul><ul><li>Conceptual layer (common logical view) </li></ul><ul><ul><li>EMPLOYEE: PNO CHAR(6), DEPT CHAR(4), SALARY INT </li></ul></ul><ul><li>Internal layer (common physical view) </li></ul><ul><ul><li>STORED_EMP LENGTH=20 </li></ul></ul><ul><ul><li>PREFIX TYPE=BYTE(6), OFFSET=0 </li></ul></ul><ul><ul><li>EMP# TYPE=BYTE(6), OFFSET=6 </li></ul></ul><ul><ul><li>DEPT# TYPE=BYTE(4), OFFSET=12 </li></ul></ul><ul><ul><li>WAGE TYPE=FULLWORD, OFFSET=16 </li></ul></ul>
  6. 6. 3-Layer Architecture (cont.) <ul><li>External layer </li></ul><ul><ul><li>one external layer per user view or application program </li></ul></ul><ul><ul><li>Application program: embedded database commands </li></ul></ul><ul><ul><li>User: ad hoc query languages, menus, frames </li></ul></ul><ul><li>Conceptual layer </li></ul><ul><ul><li>logical view of the complete database </li></ul></ul><ul><ul><li>often union of all external views </li></ul></ul><ul><li>Internal layer </li></ul><ul><ul><li>oriented along the physical storage structure (pages/blocks) </li></ul></ul><ul><ul><li>data independence?? </li></ul></ul>
  7. 7. Database Administration <ul><li>Database administrator (DBA) </li></ul><ul><li>- user contact </li></ul><ul><li>- definition of external views </li></ul><ul><li>- definition of conceptual view </li></ul><ul><li>- definition of internal view </li></ul><ul><li>- security mechanisms </li></ul><ul><li>- backup and recovery mechanisms </li></ul><ul><li>- monitoring of response behavior </li></ul><ul><li>Data dictionary (metadata) </li></ul><ul><li>- which data are known? </li></ul><ul><li>- how are the data structured logically? </li></ul><ul><li>- how are the data structured physically? </li></ul>
  8. 8. Abstraction Layers: Logical vs. Physical <ul><li>logical modeling </li></ul><ul><li>- entities </li></ul><ul><li>- relationships </li></ul><ul><li>data modeling </li></ul><ul><li>- hierarchical </li></ul><ul><li>- network </li></ul><ul><li>- relational </li></ul><ul><li>- object-oriented </li></ul><ul><li>physical modeling </li></ul><ul><li>- storage structures </li></ul><ul><li>- access methods </li></ul>
  9. 9. Entity Relationship Model <ul><li>ER = Entity - Relationship </li></ul><ul><li>entity: object, "thing" </li></ul><ul><li>attribute: property </li></ul><ul><li>entity set/entity class: object class </li></ul><ul><li>relationship </li></ul><ul><li>Example: </li></ul><ul><li>- entity classes: supplier, part </li></ul><ul><li>- attributes: supplier number, supplier name, address (supplier); part number, part name, color (part) </li></ul><ul><li>- entities: Miller, Smith, Shultz (suppliers); screw, nail (parts) </li></ul><ul><li>- relationships: supplies </li></ul><ul><li>- attributes (of relationships): capacity </li></ul>
  10. 10. Data Models <ul><li>Hierarchical </li></ul><ul><li>- 1:n relationships </li></ul><ul><li>- tree-like data structures </li></ul><ul><li>- Products: IMS, ... </li></ul><ul><li>Ex.: Company, Supplier, Product, Part </li></ul><ul><li>Problems </li></ul><ul><li>- n:m relationships (Ex.: Product-Supplier) </li></ul><ul><li>- redundancies </li></ul><ul><li>- tight coupling logical-physical </li></ul>
  11. 11. Hierarchical Data Model - Example
  12. 12. Hierarchical Data Model - Example
  13. 13. Hierarchical Data Model - a Concrete Database
  14. 14. Hierarchical Data Model - a Concrete Database
  15. 15. Data Models (2) <ul><ul><ul><li>Network (''CODASYL'') </li></ul></ul></ul><ul><li>n:m relationships - graph-like data structures </li></ul><ul><li>Products: IDMS, ADABAS (Software AG), ... </li></ul><ul><li>Ex.: Supplier - Part (n:m relationship) </li></ul><ul><li>database schema </li></ul><ul><li>a concrete database </li></ul>
  16. 16. Data Models (2) <ul><ul><ul><li>Network (''CODASYL'') </li></ul></ul></ul><ul><li>n:m relationships - graph-like data structures </li></ul><ul><li>Products: IDMS, ADABAS (Software AG), ... </li></ul><ul><li>Ex.: Supplier - Part (n:m relationship) </li></ul><ul><li>database schema </li></ul><ul><li>a concrete database </li></ul><ul><ul><ul><li>Problem: confusing, inefficient </li></ul></ul></ul>
  17. 17. Data Models (2) <ul><ul><ul><li>Network (''CODASYL'') </li></ul></ul></ul><ul><li>n:m relationships - graph-like data structures </li></ul><ul><li>Products: IDMS, ADABAS (Software AG), ... </li></ul><ul><li>Ex.: Supplier - Part (n:m relationship) </li></ul><ul><li>database schema </li></ul><ul><li>a concrete database </li></ul><ul><ul><ul><li>Problem: confusing, inefficient </li></ul></ul></ul>Supplier Part
  18. 18. Data Models (3) <ul><li>Relational </li></ul><ul><li>- n:m relationships </li></ul><ul><li>- table as data structure </li></ul><ul><li>- Products: Oracle 8i, Informix Universal Server, IBM DB2, </li></ul><ul><li>SYBASE, Microsoft Access, Microsoft SQL Server, ... </li></ul><ul><li>- market share still growing </li></ul><ul><li>Problem </li></ul><ul><li>- legacy problems </li></ul><ul><li>- migration strategies (Y2K)? </li></ul>
  19. 19. The Relational Data Model <ul><li>Ex.: Supplier - Part </li></ul><ul><li>Supplier (S-No, Name, Address) </li></ul><ul><li>Part (P-No,P-Name, Color) </li></ul><ul><li>SP_R (S-No, P-No, Capacity) </li></ul>
  20. 20. The Relational Data Model: Formal Calculus <ul><li>relation (table): subset of the Cartesian product of a list of domains </li></ul><ul><li>domain : set of possible values for one column </li></ul><ul><li>- Ex.: INTEGER </li></ul><ul><li>{0,1} </li></ul><ul><li>{grey, blue, red} </li></ul><ul><li>- similar to a data type in programming languages </li></ul>
  21. 21. The Relational Data Model: Formal Calculus (cont.) <ul><li>Cartesian product of a set of domains D<sub>i</sub> </li></ul><ul><li>(D<sub>1</sub> × D<sub>2</sub> × ... × D<sub>k</sub>): set of all k-tuples (v<sub>1</sub>, ..., v<sub>k</sub>), where v<sub>i</sub> ∈ D<sub>i</sub> (i = 1, ..., k) </li></ul><ul><li>- Ex.: k = 2, D<sub>1</sub> = {0, 1}, D<sub>2</sub> = {a, b, c} </li></ul><ul><li>D<sub>1</sub> × D<sub>2</sub> = {(0, a), (0, b), (0, c), (1, a), (1, b), (1, c)} </li></ul>
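The Cartesian product from the example above can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names and the sample relation are assumptions, not part of the slides.

```python
from itertools import product

# Domains from the slide example: D1 = {0, 1}, D2 = {a, b, c}.
D1 = [0, 1]
D2 = ["a", "b", "c"]

# Cartesian product D1 x D2: every pair (v1, v2) with v1 in D1 and v2 in D2.
cartesian = list(product(D1, D2))
print(cartesian)

# A relation over (D1, D2) is any finite subset of that product.
# This particular relation is a hypothetical example.
relation = {(0, "a"), (1, "c")}
is_relation = relation <= set(cartesian)
print(is_relation)
```

The product has |D1| * |D2| = 6 tuples, and any subset of them (including the empty set) is a valid relation over these two domains.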
  22. 22. The Relational Data Model: Formal Calculus (cont.) <ul><li>relation : finite subset of the Cartesian product </li></ul><ul><li>- Ex.: </li></ul><ul><li>tuple : element (line) of a relation </li></ul><ul><li>arity or degree : number of attributes (columns) of a relation </li></ul><ul><li>a tuple ( v 1 ... v k ) has k components (k-tuple) </li></ul><ul><li>schema : the collection of the relation name, the attribute names, and the domains </li></ul><ul><li>Ex.: Supplier ( S-No, Name, Address ) </li></ul><ul><li>relations are sets (in principle ...) </li></ul><ul><li>- no tuples can appear more than once </li></ul><ul><li>- not sorted </li></ul>
  23. 23. Entities vs. Relationships vs. Relations <ul><li>entity set → relation </li></ul><ul><li>attribute → column </li></ul><ul><li>ex.: Supplier (S-No, Name, Address, ...) </li></ul><ul><li>entity → line (tuple) </li></ul><ul><li>relationship between entity sets E<sub>1</sub>, ..., E<sub>k</sub> → relation whose schema consists of the key attributes of E<sub>1</sub>, ..., E<sub>k</sub> (+ possibly additional information) </li></ul><ul><li>Ex. 1: Supplier (S-No, Name, Address) </li></ul><ul><li>Part (P-No, P-Name, Color) </li></ul><ul><li>ZT_R (S-No, P-No, Capacity) </li></ul><ul><li>Ex. 2: Student (Student-No, Name, Birthdate, ...) </li></ul><ul><li>Lecture (Department, Lecture-No, ...) </li></ul><ul><li>Takes (Student-No, Lecture-No, Grade, ...) </li></ul>
  24. 24. Key of a Relation <ul><li>a set S of attributes of a relation R is a key of R if </li></ul><ul><li>(1) no instance of R may contain two different tuples that have the same values for all attributes in S (uniqueness) </li></ul><ul><li>(2) there is no proper subset of S that has property (1) (minimality) </li></ul><ul><li>often depends on the application: </li></ul><ul><li>Ex. 1: Supplier (S-No, Name, Address) </li></ul><ul><li>Part (P-No, P-Name, Color) </li></ul><ul><li>ZT_R (S-No, P-No, Capacity) </li></ul><ul><li>Ex. 2: Student (Student-No, Name, Birthdate, ...) </li></ul><ul><li>Lecture (Department, Lecture-No, ...) </li></ul><ul><li>Takes (Student-No, Lecture-No, Grade, ...) </li></ul>
  25. 25. Key of a Relation (cont.) <ul><li>which attribute sets are keys of a relation R depends on R's schema, not on the current instance (Ex.: Supplier.Name) </li></ul><ul><li>relations can have more than one key </li></ul><ul><ul><li>Ex.: Department (Name, Address, Dept_Code): Name and Dept_Code are both unique (in one company) and therefore keys </li></ul></ul><ul><ul><li>But: Employee (Name, P-No, Salary)?? </li></ul></ul><ul><li>if there is more than one key, one selects one of these candidate keys as primary key, depending on the application </li></ul><ul><li>if one has more than one relation with the same key, one may consider merging them </li></ul><ul><ul><li>Ex.: Department (Name, Address, Dept_Code) </li></ul></ul><ul><li>Manager (Emp_No, Dept_Name) </li></ul><ul><li>→ Dept (Name, Address, Dept_Code, Manager) </li></ul>
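The point that keys depend on the schema and not on the current instance can be illustrated with a small sketch. The helper names (`is_unique`, `minimal_unique_sets`) and the three supplier rows are hypothetical. The code only inspects one instance, so it can rule a key out but can never prove one.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def minimal_unique_sets(rows, attributes):
    """Attribute sets that are unique in this instance and have no unique proper subset."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # a smaller unique set exists, so attrs is not minimal
            if is_unique(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical instance of Supplier (S-No, Name, Address).
suppliers = [
    {"S_No": 1, "Name": "Miller", "Address": "Berlin"},
    {"S_No": 2, "Name": "Smith", "Address": "Berlin"},
    {"S_No": 3, "Name": "Shultz", "Address": "Hamburg"},
]

candidates = minimal_unique_sets(suppliers, ["S_No", "Name", "Address"])
print(candidates)
```

In this instance Name happens to be unique, so it shows up next to S_No. Whether Name really is a key is a schema decision: a second supplier called Miller may be inserted tomorrow.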
  26. 26. Relational Algebra <ul><li>set of mathematical operations on relations [Codd, 1970s] </li></ul><ul><li>Ex.: relations R and S </li></ul><ul><li>union R ∪ S and difference R - S </li></ul><ul><li>- R and S have to have the same arity </li></ul><ul><li>- domains have to be compatible </li></ul>
  27. 27. Relational Algebra (cont.) <ul><li>Cartesian product R × S </li></ul><ul><li>projection (subset of columns) </li></ul><ul><li>- Ex.: π<sub>A,C</sub>(R) </li></ul><ul><li>selection (subset of tuples (lines)) </li></ul><ul><li>- Ex.: σ<sub>B=b</sub>(R) </li></ul>
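The basic operators can be sketched in Python by modelling a relation as a schema (tuple of column names) plus a set of tuples. The helper functions `project` and `select` and all sample data are assumptions made for illustration only.

```python
from itertools import product

# A relation is modelled as a schema plus a set of tuples (hypothetical data).
R_schema = ("A", "B", "C")
R = {(1, "b", 10), (2, "b", 20), (3, "c", 10)}
S = {(3, "c", 10), (4, "d", 40)}  # same arity and compatible domains as R

def project(schema, rows, attrs):
    """Projection: keep only the listed columns; the set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def select(schema, rows, predicate):
    """Selection: keep only the tuples for which the predicate holds."""
    return {row for row in rows if predicate(dict(zip(schema, row)))}

union = R | S                                  # R ∪ S
difference = R - S                             # R - S
cartesian = {r + s for r, s in product(R, S)}  # R × S, tuples of arity 6
pi_AC = project(R_schema, R, ("A", "C"))       # π_{A,C}(R)
sigma_Bb = select(R_schema, R, lambda t: t["B"] == "b")  # σ_{B=b}(R)

print(pi_AC)
print(sigma_Bb)
```

Because relations are sets, projecting R onto column B alone would return only two tuples, ("b",) and ("c",), even though R has three rows.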
  28. 28. Relational Algebra (cont.) <ul><li>θ-join R ⋈<sub>i θ j</sub> S </li></ul><ul><li>- i, j: names of columns (R.i, S.j) </li></ul><ul><li>- θ: arithmetic comparison operator (=, <, ...) </li></ul><ul><li>- subset of the Cartesian product R × S for which θ is true </li></ul><ul><li>- Ex.: R ⋈<sub>B < D</sub> S </li></ul>
  29. 29. Relational Algebra (cont.) <ul><li>equijoin </li></ul><ul><li>special kind of θ-join: θ is = </li></ul><ul><li>Ex.: R ⋈<sub>B = D</sub> S </li></ul><ul><li>natural join </li></ul><ul><li>special kind of equijoin </li></ul><ul><li>applicable if the two input relations have columns with the same name </li></ul><ul><li>Ex.: T ⋈ U </li></ul>
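The three join variants can be sketched as follows. The functions `theta_join` and `natural_join` and the sample relations are hypothetical; the nested-loop evaluation is chosen for readability, not efficiency.

```python
import operator

def theta_join(r_schema, r_rows, s_schema, s_rows, i, j, theta):
    """θ-join: the pairs from R × S for which theta(R.i, S.j) is true."""
    ri, sj = r_schema.index(i), s_schema.index(j)
    return {r + s for r in r_rows for s in s_rows if theta(r[ri], s[sj])}

def natural_join(t_schema, t_rows, u_schema, u_rows):
    """Natural join: equijoin on all same-named columns, each kept only once."""
    common = [a for a in t_schema if a in u_schema]
    u_rest = [a for a in u_schema if a not in common]
    out_schema = tuple(t_schema) + tuple(u_rest)
    out = set()
    for t in t_rows:
        for u in u_rows:
            if all(t[t_schema.index(a)] == u[u_schema.index(a)] for a in common):
                out.add(t + tuple(u[u_schema.index(a)] for a in u_rest))
    return out_schema, out

# Hypothetical sample relations.
R_schema, R = ("A", "B", "C"), {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S_schema, S = ("D", "E"), {(2, "x"), (5, "y"), (6, "z")}
T_schema, T = ("A", "B"), {(1, "p"), (2, "q")}
U_schema, U = ("B", "C"), {("p", 10), ("p", 11), ("r", 12)}

less_than = theta_join(R_schema, R, S_schema, S, "B", "D", operator.lt)  # B < D
equijoin = theta_join(R_schema, R, S_schema, S, "B", "D", operator.eq)   # B = D
nat_schema, nat = natural_join(T_schema, T, U_schema, U)

print(less_than)
print(equijoin)
print(nat_schema, nat)
```

Note that the equijoin keeps both B and D in the result, while the natural join keeps the shared column B only once.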
  30. 30. SQL: Structured Query Language <ul><li>4th Generation Language (4GL) for data querying and manipulation </li></ul><ul><li>4GL: user only has to specify which data are needed, not how they can be obtained (data independence!) </li></ul><ul><li>DBMS (Database Management System) takes care of (efficient) computation </li></ul><ul><li>SQL: developed at IBM Research (San Jose, California) in the 1970s </li></ul>
  31. 31. Toy Database Customers Orders Contains Supplies
  32. 32. SQL: Projection <ul><li>Ex.: Find name and account balance of all customers </li></ul><ul><li>relational algebra: </li></ul><ul><li>in SQL: </li></ul><ul><li>projection in SQL in general: </li></ul><ul><li>SELECT R<sub>i<sub>1</sub></sub>.A<sub>1</sub>, R<sub>i<sub>2</sub></sub>.A<sub>2</sub>, ..., R<sub>i<sub>r</sub></sub>.A<sub>r</sub> </li></ul><ul><li>FROM R<sub>1</sub>, R<sub>2</sub>, ..., R<sub>k</sub> </li></ul>
  33. 33. SQL: Selection <ul><li>Ex.: Find all customers with negative balance </li></ul><ul><li>SQL: </li></ul><ul><li>relational algebra: </li></ul><ul><li>selection in SQL in general: </li></ul><ul><li>SELECT * {all attributes} </li></ul><ul><li>FROM R </li></ul><ul><li>WHERE (condition) </li></ul>
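The two example queries (name and balance of all customers; all customers with negative balance) can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column layout Customers (Name, Address, Balance) is inferred from the queries in the deck, and the three rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Address TEXT, Balance INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Peter", "Berlin", -50), ("Jane", "Hamburg", 200), ("Mary", "Munich", 0)],
)

# Projection: only the columns Name and Balance.
projection = conn.execute(
    "SELECT Name, Balance FROM Customers ORDER BY Name"
).fetchall()

# Selection: all columns, but only the rows with a negative balance.
selection = conn.execute(
    "SELECT * FROM Customers WHERE Balance < 0"
).fetchall()

print(projection)
print(selection)
```

The `ORDER BY` clause is not part of the slide query; it is added here only to make the output order deterministic.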
  34. 34. SQL: Uniqueness of Names <ul><li>if attribute names are unique, one can drop the relation name(s) </li></ul><ul><li>in the SELECT and the WHERE clause </li></ul><ul><li>Ex.: </li></ul><ul><li>SELECT Customers.Name </li></ul><ul><li>FROM Customers </li></ul><ul><li>WHERE Customers.Balance < 0 </li></ul>
  35. 35. SQL: Aliases <ul><li>alias = second name for an attribute </li></ul><ul><li>to be attached to the original name of the column </li></ul><ul><li>Ex.: </li></ul><ul><li>SELECT Name Client, Address, Balance Deficit </li></ul><ul><li>FROM Customers </li></ul><ul><li>WHERE Balance < 0 </li></ul>
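A short `sqlite3` sketch shows the effect of the aliases on the result columns. The sample rows are invented, and the query uses the explicit `AS` keyword, which is optional in many SQL dialects (the slide omits it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Address TEXT, Balance INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Peter", "Berlin", -50), ("Jane", "Hamburg", 200)],
)

cursor = conn.execute(
    "SELECT Name AS Client, Address, Balance AS Deficit "
    "FROM Customers WHERE Balance < 0"
)
# The result columns carry the alias names, not the original column names.
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```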
  36. 36. SQL: Equijoins <ul><li>Ex.: Find the products ordered by Peter </li></ul><ul><li>in relational algebra </li></ul><ul><li>in SQL </li></ul><ul><li>SELECT </li></ul><ul><li>FROM </li></ul><ul><li>WHERE </li></ul><ul><li>Ex.: Find the names of all suppliers that carry at least one of the </li></ul><ul><li>products that have been ordered by Peter </li></ul>
  37. 37. SQL: Processing a Join Query <ul><li>Ex.: </li></ul><ul><li>SELECT Product </li></ul><ul><li>FROM Orders, Contains </li></ul><ul><li>WHERE Customer = 'Peter' </li></ul><ul><li>AND Orders.O_No = Contains.O_No </li></ul>
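The join query can be executed against a small in-memory database. The table layouts Orders (O_No, Date, Customer) and Contains (O_No, Product, Amount) are assumptions inferred from the deck, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (O_No INTEGER, Date TEXT, Customer TEXT);
CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
INSERT INTO Orders VALUES
  (1, '2000-01-10', 'Peter'), (2, '2000-01-11', 'Jane'), (3, '2000-01-12', 'Peter');
INSERT INTO Contains VALUES
  (1, 'Brie', 2), (1, 'Perrier', 6), (2, 'Brie', 1), (3, 'Escargot', 4);
""")

# Equijoin on O_No combined with a selection on Customer.
products = conn.execute(
    "SELECT Product "
    "FROM Orders, Contains "
    "WHERE Customer = 'Peter' "
    "AND Orders.O_No = Contains.O_No "
    "ORDER BY Product"
).fetchall()

print(products)
```

The column O_No must be qualified with the relation name because it occurs in both tables; Customer and Product are unique and can be written without a prefix.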
  38. 38. SQL: Processing a Join Query (cont.) <ul><li>Starting with Orders </li></ul><ul><li>Selection: Customer = 'Peter' </li></ul><ul><li>Equijoin: Orders.O_No = Contains.O_No </li></ul>
  39. 39. SQL: Processing a Join Query (cont.) <ul><li>Projection: SELECT Product </li></ul><ul><li>Operations can sometimes be exchanged → efficiency? </li></ul>
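The processing steps of the join query (selection, equijoin, projection) can be traced by hand in a few lines. This is only a sketch: the rows are invented, tuples follow the layouts Orders (O_No, Date, Customer) and Contains (O_No, Product, Amount), and the two function names are hypothetical. It also shows that exchanging selection and join leaves the result unchanged, while the size of the intermediate result differs.

```python
# Hypothetical rows for Orders(O_No, Date, Customer) and Contains(O_No, Product, Amount).
orders = [
    (1, "2001-05-01", "Peter"),
    (2, "2001-05-02", "Jane"),
    (3, "2001-05-03", "Peter"),
]
contains = [
    (1, "Brie", 2),
    (1, "Perrier", 6),
    (2, "Oysters", 12),
    (3, "Brie", 1),
]


def select_then_join(orders, contains):
    # Selection first: Customer = 'Peter'
    selected = [o for o in orders if o[2] == "Peter"]
    # Equijoin: Orders.O_No = Contains.O_No
    joined = [(o, c) for o in selected for c in contains if o[0] == c[0]]
    # Projection: SELECT Product (duplicates are kept, as without DISTINCT)
    return [c[1] for _, c in joined], len(joined)


def join_then_select(orders, contains):
    # Equijoin first, on all orders
    joined = [(o, c) for o in orders for c in contains if o[0] == c[0]]
    # Selection afterwards
    selected = [(o, c) for o, c in joined if o[2] == "Peter"]
    return [c[1] for _, c in selected], len(joined)


products_a, intermediate_a = select_then_join(orders, contains)
products_b, intermediate_b = join_then_select(orders, contains)
print(products_a, intermediate_a)
print(products_b, intermediate_b)
```

Both orders of evaluation return the same products; selecting first simply produces a smaller join result on this data.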
  40. 40. SQL: Deleting Multiple Copies of a Tuple <ul><li>why do they exist? </li></ul><ul><li>keyword DISTINCT </li></ul><ul><li>Ex.: SELECT DISTINCT Customer </li></ul><ul><li>FROM Orders </li></ul><ul><li>without DISTINCT ? </li></ul>
  42. 42. SQL: Tuple Variables <ul><li>necessary if one needs to address several different tuples of the same relation in the same query </li></ul><ul><li>Ex.: Find names and addresses of all customers that have less money on their account than Jane </li></ul><ul><li>SELECT C1.Name, C1.Address </li></ul><ul><li>FROM Customers C1, Customers C2 </li></ul><ul><li>WHERE C1.Balance < C2.Balance </li></ul><ul><li>AND C2.Name = 'Jane' </li></ul><ul><li>tuple variables are relations, i.e., sets of tuples </li></ul><ul><li>they serve to represent intermediate results </li></ul>
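The tuple-variable query can be run as a self-join in SQLite. The sketch below uses invented customer rows; only the query itself follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Address TEXT, Balance REAL)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [("Peter", "Berlin", -120.0), ("Jane", "Potsdam", 300.0), ("Ruth", "Hamburg", 450.0)],
)

# C1 ranges over the candidate customers, C2 is pinned to Jane's tuple.
poorer_than_jane = conn.execute(
    """
    SELECT C1.Name, C1.Address
    FROM Customers C1, Customers C2
    WHERE C1.Balance < C2.Balance
      AND C2.Name = 'Jane'
    """
).fetchall()

print(poorer_than_jane)
```

Without the two tuple variables there would be no way to compare one row of Customers with another row of the same relation.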
  43. 43. SQL: Subqueries <ul><li>nesting of queries </li></ul><ul><li>reference to intermediate results via the keyword IN </li></ul><ul><li>Ex.: Find all suppliers that carry at least one of the products ordered by Peter </li></ul><ul><li>SELECT Name </li></ul><ul><li>FROM Supplies </li></ul><ul><li>WHERE Product IN </li></ul><ul><li>(SELECT Product </li></ul><ul><li>FROM Contains </li></ul><ul><li>WHERE O_No IN </li></ul><ul><li>(SELECT O_No </li></ul><ul><li>FROM Orders </li></ul><ul><li>WHERE Customer = 'Peter')) </li></ul><ul><li>IN corresponds to the element operator ∈ </li></ul>
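As a rough check of the nested query, the following sketch runs it next to an equivalent join formulation. The rows are invented, and `DISTINCT` and `ORDER BY` are added only so that both results can be compared directly; they are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the toy database.
conn.executescript(
    """
    CREATE TABLE Orders   (O_No INTEGER, Date TEXT, Customer TEXT);
    CREATE TABLE Contains (O_No INTEGER, Product TEXT, Amount INTEGER);
    CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL);
    INSERT INTO Orders   VALUES (1, '2001-05-01', 'Peter'), (2, '2001-05-02', 'Jane');
    INSERT INTO Contains VALUES (1, 'Brie', 2), (2, 'Oysters', 12);
    INSERT INTO Supplies VALUES ('Chris', 'Brie', 4.0), ('Anna', 'Brie', 3.8),
                                ('Jack', 'Oysters', 0.24), ('Chris', 'Perrier', 1.0);
    """
)

# Nested formulation with IN, as on the slide.
nested = conn.execute(
    """
    SELECT DISTINCT Name
    FROM Supplies
    WHERE Product IN
          (SELECT Product
           FROM Contains
           WHERE O_No IN
                 (SELECT O_No
                  FROM Orders
                  WHERE Customer = 'Peter'))
    ORDER BY Name
    """
).fetchall()

# Equivalent formulation as a three-way equijoin.
joined = conn.execute(
    """
    SELECT DISTINCT Supplies.Name
    FROM Supplies, Contains, Orders
    WHERE Supplies.Product = Contains.Product
      AND Contains.O_No = Orders.O_No
      AND Orders.Customer = 'Peter'
    ORDER BY Supplies.Name
    """
).fetchall()

print(nested, joined)
```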
  44. 44. SQL: Subqueries (cont.) <ul><li>Instead of IN : ALL </li></ul><ul><li>SELECT Product </li></ul><ul><li>FROM Supplies </li></ul><ul><li>WHERE Price >= ALL </li></ul><ul><li>(SELECT Price </li></ul><ul><li> FROM Supplies) </li></ul><ul><li>ALL corresponds to the universal quantifier ∀ </li></ul>
  45. 45. SQL: Subqueries (cont.) <ul><li>Instead of IN : ANY </li></ul><ul><li>SELECT * </li></ul><ul><li>FROM Orders </li></ul><ul><li>WHERE O_No < ANY </li></ul><ul><li>(SELECT O_No </li></ul><ul><li>FROM Orders </li></ul><ul><li>WHERE Customer = 'Peter') </li></ul><ul><li>ANY corresponds to the existential quantifier ∃ </li></ul>
  46. 46. SQL: Subqueries (cont.) <ul><li>Instead of IN : = </li></ul><ul><li>SELECT Product </li></ul><ul><li>FROM Contains </li></ul><ul><li>WHERE O_No = </li></ul><ul><li>(SELECT O_No </li></ul><ul><li> FROM Orders </li></ul><ul><li> WHERE Customer = 'Ruth') </li></ul><ul><li>If the cardinality of the subquery's result is greater than 1: ERROR </li></ul>
  47. 47. SQL: Aggregates <ul><li>Functions for the aggregation of single values </li></ul><ul><li>AVG - average </li></ul><ul><li>COUNT - number </li></ul><ul><li>SUM - sum </li></ul><ul><li>MIN - minimum </li></ul><ul><li>MAX - maximum </li></ul><ul><li>STDDEV - standard deviation </li></ul><ul><li>VARIANCE - variance </li></ul><ul><li>Ex.: SELECT AVG(Balance) </li></ul><ul><li>FROM Customers </li></ul><ul><li>Or: SELECT AVG(Balance) Average </li></ul><ul><li>FROM Customers </li></ul>
  48. 48. SQL: Aggregates (cont.) <ul><li>Ex.: </li></ul><ul><li>SELECT COUNT (DISTINCT Name) No_Suppliers </li></ul><ul><li>FROM Supplies </li></ul><ul><li>Ex.: </li></ul><ul><li>SELECT COUNT(Name) No_Brie_Suppliers </li></ul><ul><li>FROM Supplies </li></ul><ul><li>WHERE Product = 'Brie' </li></ul><ul><li>- no duplicate elimination required </li></ul>
  49. 49. SQL: Aggregation and Grouping <ul><li>GROUP BY $A_1, A_2, \ldots, A_k$ </li></ul><ul><li>two tuples are in the same group if they have the same values for the attributes $A_1, A_2, \ldots, A_k$ </li></ul><ul><li>Ex.: </li></ul><ul><li>SELECT Product, AVG(Price) Average_Price </li></ul><ul><li>FROM Supplies </li></ul><ul><li>GROUP BY Product </li></ul>
  50. 50. SQL: Aggregation and Grouping (cont.) <ul><li>Ex: </li></ul><ul><li>SELECT Customer, AVG(Amount) </li></ul><ul><li>FROM Orders, Contains </li></ul><ul><li>WHERE Orders.O_No = Contains.O_No </li></ul><ul><li>GROUP BY Customer </li></ul>
  51. 51. SQL: GROUP BY ... HAVING <ul><li>general format: </li></ul><ul><li>GROUP BY $A_1, A_2, \ldots, A_k$ </li></ul><ul><li>HAVING condition </li></ul><ul><li>the condition is a boolean expression that is applied to each group separately </li></ul><ul><li>one selects only those groups where the condition is true </li></ul><ul><li>Ex.: </li></ul><ul><li>SELECT Product, AVG(Price) Average_Price </li></ul><ul><li>FROM Supplies </li></ul><ul><li>GROUP BY Product </li></ul><ul><li>HAVING COUNT( * ) > 1 </li></ul><ul><li>Or : </li></ul><ul><li>HAVING COUNT (DISTINCT Price) > 1 </li></ul>
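The effect of HAVING on the groups can be seen on a handful of rows. The sketch below assumes invented Supplies rows and uses an underscore in the alias so that the statement parses in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL)")
# Hypothetical sample rows: Brie has two suppliers, the other products one each.
conn.executemany(
    "INSERT INTO Supplies VALUES (?, ?, ?)",
    [
        ("Chris", "Brie", 3.0),
        ("Anna", "Brie", 5.0),
        ("Jack", "Oysters", 0.24),
        ("Chris", "Perrier", 1.0),
    ],
)

# Average price per product, for all groups.
all_groups = conn.execute(
    "SELECT Product, AVG(Price) AS Average_Price "
    "FROM Supplies GROUP BY Product ORDER BY Product"
).fetchall()

# Same grouping, but only groups with more than one supplier survive.
multi_supplier = conn.execute(
    "SELECT Product, AVG(Price) AS Average_Price "
    "FROM Supplies GROUP BY Product HAVING COUNT(*) > 1"
).fetchall()

print(all_groups)
print(multi_supplier)
```

WHERE filters individual tuples before grouping, whereas HAVING filters whole groups after the aggregates have been computed.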
  52. 52. SQL: Insertion of Tuples <ul><li>in general: </li></ul><ul><li>INSERT INTO R </li></ul><ul><li>VALUES ($V_1, \ldots, V_k$) </li></ul><ul><li>ex.: </li></ul><ul><li>INSERT INTO Supplies </li></ul><ul><li>VALUES ('Jack', 'Oysters', .24) </li></ul><ul><li>null values: </li></ul><ul><li>INSERT INTO Supplies (Name, Product) </li></ul><ul><li>VALUES ('Jack', 'Oysters') </li></ul><ul><li>nested insertions: </li></ul><ul><li>INSERT INTO Sales_Chris </li></ul><ul><li>SELECT Product, Price </li></ul><ul><li>FROM Supplies </li></ul><ul><li>WHERE Name = 'Chris' </li></ul>
  53. 53. SQL: Deletion of Tuples <ul><li>in general: </li></ul><ul><li>DELETE FROM R </li></ul><ul><li>WHERE condition </li></ul><ul><li>ex.: </li></ul><ul><li>DELETE FROM Supplies </li></ul><ul><li>WHERE Name = 'Chris' </li></ul><ul><li>AND Product = 'Perrier' </li></ul><ul><li>ex.: Delete all orders containing Brie </li></ul>
  54. 54. SQL: Updating Tuples <ul><li>in general: </li></ul><ul><li>UPDATE R </li></ul><ul><li>SET $A_1 = x_1, \ldots, A_k = x_k$ </li></ul><ul><li>WHERE condition </li></ul><ul><li>ex.: </li></ul><ul><li>UPDATE Supplies </li></ul><ul><li>SET Price = 1.00 </li></ul><ul><li>WHERE Name = 'Chris' </li></ul><ul><li>AND Product = 'Perrier' </li></ul><ul><li>ex.: Chris reduces all prices by 10 percent. </li></ul>
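The three modification statements can be exercised together on a tiny table. This is a sketch with invented rows; the `UPDATE ... SET Price = Price * 0.9` statement is one possible answer to the "10 percent" exercise, not necessarily the one intended in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Supplies (Name TEXT NOT NULL, Product TEXT NOT NULL, Price REAL)"
)
# Hypothetical starting rows.
conn.executemany(
    "INSERT INTO Supplies VALUES (?, ?, ?)",
    [("Chris", "Perrier", 2.0), ("Chris", "Brie", 4.0), ("Jack", "Brie", 3.0)],
)

# Insertion without a price: the missing column is filled with NULL.
conn.execute("INSERT INTO Supplies (Name, Product) VALUES ('Jack', 'Oysters')")

# One possible reading of "Chris reduces all prices by 10 percent".
conn.execute("UPDATE Supplies SET Price = Price * 0.9 WHERE Name = 'Chris'")

# Deletion of a single tuple, as on the deletion slide.
conn.execute("DELETE FROM Supplies WHERE Name = 'Chris' AND Product = 'Perrier'")

rows = conn.execute(
    "SELECT Name, Product, Price FROM Supplies ORDER BY Name, Product"
).fetchall()
print(rows)
```

The NULL price shows up as `None` in Python; the UPDATE touches every tuple that satisfies the WHERE condition, not just one.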
  55. 55. SQL - DDL <ul><li>DDL : Data Definition Language </li></ul><ul><li>so far we only discussed the DML - Data Manipulation Language </li></ul><ul><li>typical DDL command: CREATE TABLE </li></ul><ul><li>general format: </li></ul><ul><li>CREATE TABLE R ($A_1\ T_1$ [NOT NULL], ..., $A_k\ T_k$ [NOT NULL]) </li></ul><ul><li>ex.: </li></ul><ul><li>CREATE TABLE Supplies </li></ul><ul><li>(Name CHAR(20) NOT NULL, </li></ul><ul><li> Product CHAR(10) NOT NULL, </li></ul><ul><li> Price NUMBER (6,2)) </li></ul><ul><li>to delete a table: DROP TABLE Supplies </li></ul>
  56. 56. Views <ul><li>logical relations </li></ul><ul><li>so far we only discussed physical relations (stored on disk), also called base relations </li></ul><ul><li>views serve to represent specific user views </li></ul><ul><li>view contents are not stored physically but computed on demand </li></ul><ul><li>one can query (i.e., read only) views just like base relations </li></ul><ul><li>updates (write access) are not so easy </li></ul>
  57. 57. Views (cont.) <ul><li>view definition - general form </li></ul><ul><li>CREATE VIEW V ($A_1, \ldots, A_k$) AS </li></ul><ul><li><SELECT Query> </li></ul><ul><li>Ex.: CREATE VIEW Offer_Chris (Product, Price) AS </li></ul><ul><li>SELECT Product, Price </li></ul><ul><li>FROM Supplies </li></ul><ul><li>WHERE Name = 'Chris' </li></ul>
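A view that is computed on demand can be observed directly in SQLite. In this sketch the rows are invented, the view is named `Offer_Chris` with an underscore, and the column list after the view name is omitted because the SELECT already yields the column names. SQLite is used only as a convenient stand-in: it treats views as read-only, which is one particular system-dependent answer to the view update question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplies (Name TEXT, Product TEXT, Price REAL)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Supplies VALUES (?, ?, ?)",
    [("Chris", "Perrier", 1.0), ("Jack", "Oysters", 0.24)],
)

conn.execute(
    "CREATE VIEW Offer_Chris AS "
    "SELECT Product, Price FROM Supplies WHERE Name = 'Chris'"
)

before = conn.execute("SELECT * FROM Offer_Chris ORDER BY Product").fetchall()

# A change to the base relation is visible through the view immediately,
# because the view is evaluated when it is queried.
conn.execute("INSERT INTO Supplies VALUES ('Chris', 'Brie', 4.0)")
after = conn.execute("SELECT * FROM Offer_Chris ORDER BY Product").fetchall()

# Writing through the view is rejected by SQLite.
try:
    conn.execute("INSERT INTO Offer_Chris VALUES ('Lettuce', 0.9)")
    view_is_read_only = False
except sqlite3.Error:
    view_is_read_only = True

print(before, after, view_is_read_only)
```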
  58. 58. View Update Problem <ul><li>ex.: Offer_Chris </li></ul><ul><ul><li>DELETE </li></ul></ul><ul><ul><li>INSERT </li></ul></ul><ul><ul><li>UPDATE (Price) </li></ul></ul><ul><ul><li>UPDATE (Product) </li></ul></ul><ul><li>more complex example: </li></ul><ul><li>CREATE VIEW Customer_Order (Name, Date, Product, Amount) AS </li></ul><ul><li>SELECT Customer, Date, Product, Amount </li></ul><ul><li>FROM Orders, Contains </li></ul><ul><li>WHERE Orders.O_No = Contains.O_No </li></ul><ul><li>- DELETE </li></ul><ul><li>- INSERT </li></ul><ul><li>- UPDATE (Name) </li></ul><ul><li>- UPDATE (Date) </li></ul><ul><li>- UPDATE (Product) </li></ul><ul><li>- UPDATE (Amount) </li></ul>
  59. 59. View Update Problem (cont.) <ul><li>ex.: CREATE VIEW X AS </li></ul><ul><li>SELECT Product, AVG(Price) DP </li></ul><ul><li>FROM Supplies </li></ul><ul><li>GROUP BY Product </li></ul><ul><li>- UPDATE (DP) </li></ul><ul><li>- UPDATE (Product) </li></ul><ul><li>- INSERT </li></ul><ul><li>- DELETE </li></ul>
  60. 60. View Update Problem (cont.) <ul><li>ex.: CREATE VIEW Y AS </li></ul><ul><li>SELECT C2.Name, C2.Address </li></ul><ul><li>FROM Customers C1, Customers C2 </li></ul><ul><li>WHERE C2.Balance < C1.Balance </li></ul><ul><li>AND C1.Name = 'Jane' </li></ul><ul><li>- INSERT </li></ul><ul><li>- DELETE </li></ul><ul><li>- UPDATE (Name) </li></ul><ul><li>- UPDATE (Address) </li></ul>
  61. 61. View Update Problem (cont.) <ul><li>Views can be updated if </li></ul><ul><li>(1) the corresponding base relations can be updated (i.e., no non-updatable views) </li></ul><ul><li>(2) the SELECT command is a combination of only projections ( column subsets ) and selections ( row subsets ) (i.e., no joins, subqueries, tuple variables, aggregates, etc.). In case of projections, the key has to be preserved. </li></ul>
  62. 62. View Update Problem (cont.) [Figure: set diagram with the regions "all possible views", "views that can be updated", "views according to (1) and (2)", and "views that can be updated in SQL (version-dependent)"]
  63. 63. Views - Summary <ul><li>logical relations </li></ul><ul><li>defined using physical base relations (and possibly other views) </li></ul><ul><li>(typically) not stored physically but computed on demand using the current content of the base relations </li></ul><ul><li>same data can be „viewed“ in different shapes </li></ul><ul><li>supports different user groups and privacy </li></ul><ul><li>view updates: problematic because not all updates can be mapped to base relations </li></ul>
  64. 64. Databases - Programming Languages <ul><li>collision of two different paradigms </li></ul><ul><li>- PL: one tuple at a time </li></ul><ul><li>- DB: many tuples at a time </li></ul><ul><li>interface tuple - variable: communication via „cursors“ (buffer) </li></ul><ul><li>queries are preformulated using variables </li></ul><ul><li>instantiation at run-time with real values </li></ul>
  65. 65. Ex: Embedded SQL
exec sql begin declare section;
  int O_No, Amount;
  char Date[10], Customer[20], Product[10];
exec sql end declare section;
exec sql connect;
exec sql prepare order_insert from
  insert into Orders values (:O_No, :Date, :Customer);
exec sql prepare cont_insert from
  insert into Contains values (:O_No, :Product, :Amount);
write ('Enter Order No., Date, and Customer');
read (O_No); read (Date); read (Customer);
exec sql execute order_insert using :O_No, :Date, :Customer;
write ('Enter a list of tuples "Product-Amount", terminate with "end"');
read (Product);
while (Product != 'end') {
  read (Amount);
  exec sql execute cont_insert using :O_No, :Product, :Amount;
  read (Product);
}
  66. 66. Integrity in Databases <ul><li>maintenance of a correct correspondence between database and real world </li></ul><ul><li>(possibly automatic) identification of invalid states of the database (i.e., states without correspondence in the real world) </li></ul><ul><li>three kinds of integrity </li></ul><ul><ul><li>domain-specific integrity (application-specific, ex.: date) </li></ul></ul><ul><ul><li>key integrity </li></ul></ul><ul><ul><li>schema integrity </li></ul></ul>
  67. 67. Integrity in Databases (cont.) <ul><li>key integrity </li></ul><ul><li>- rule 1 ( entity integrity ): each relation must have a key, and each tuple in the relation must have a key value that is unique and non-NULL. </li></ul><ul><li>- rule 2 ( referential integrity ): for each foreign key FK there is another relation with a primary key PK such that each non-NULL value of FK is identical to an existing value of PK . </li></ul><ul><li>- Ex.: </li></ul><ul><li>foreign key O_No in relation Contains , </li></ul><ul><li>foreign key Customer in relation Orders </li></ul><ul><li>schema integrity </li></ul>
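Both key integrity rules can be watched in action with SQLite, which enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued. The sketch below uses invented rows and declares O_No in Contains as a foreign key referencing Orders, following the example on the slide; the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Orders (O_No INTEGER PRIMARY KEY, Date TEXT, Customer TEXT)")
conn.execute(
    "CREATE TABLE Contains ("
    " O_No INTEGER REFERENCES Orders(O_No),"
    " Product TEXT,"
    " Amount INTEGER)"
)

# Hypothetical valid rows: order 1 exists, so Contains may refer to it.
conn.execute("INSERT INTO Orders VALUES (1, '2001-05-01', 'Peter')")
conn.execute("INSERT INTO Contains VALUES (1, 'Brie', 2)")

# Rule 1 (entity integrity): a second order with the same key value is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES (1, '2001-06-01', 'Jane')")
    duplicate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_key_rejected = True

# Rule 2 (referential integrity): order 99 does not exist in Orders.
try:
    conn.execute("INSERT INTO Contains VALUES (99, 'Brie', 1)")
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

print(duplicate_key_rejected, dangling_reference_rejected)
```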
  68. 68. Database Design <ul><li>ex. for bad database design: </li></ul><ul><li>Suppliers - Info </li></ul><ul><li>disadvantages </li></ul><ul><ul><li>redundancies </li></ul></ul><ul><ul><li>update anomalies </li></ul></ul><ul><ul><li>insertion anomalies (ex: supplier without products) </li></ul></ul><ul><ul><li>deletion anomalies (NULL in key) </li></ul></ul>
  69. 69. Database Design by Decomposition <ul><li>approach: </li></ul><ul><ul><li>decomposition into relations with fewer columns </li></ul></ul><ul><ul><li>Careful: no information loss </li></ul></ul><ul><li>Ex.: Suppliers (L-Name, L-Address) </li></ul><ul><li>Supplies (L-Name, Product, Price) </li></ul><ul><li>disadvantage: may require additional join operations at query time </li></ul>
  70. 70. Functional Dependencies <ul><li>logical dependencies between columns </li></ul><ul><li>cause many of the problems discussed above </li></ul><ul><li>- redundancies </li></ul><ul><li>- update anomalies </li></ul><ul><li>- ... </li></ul><ul><li>Definition : If for a relation R there is a functional dependency (FD) $X \rightarrow Y$ (where X and Y may represent one or several columns of R) then the following holds for two arbitrary tuples $t_1$ and $t_2$ in R: </li></ul><ul><li>$t_1[X] = t_2[X] \Rightarrow t_1[Y] = t_2[Y]$ </li></ul><ul><li>A functional dependency defined on relation R holds for all instances of R </li></ul>
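The definition translates almost literally into a check on a relation instance. The helper `fd_holds` and the sample rows below are hypothetical; note that such a check can only refute an FD on a given instance, since an FD is a statement about all instances of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x and returns the stored value otherwise
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance of Supplies(Name, Product, Price).
supplies = [
    {"Name": "Chris", "Product": "Brie", "Price": 4.0},
    {"Name": "Anna", "Product": "Brie", "Price": 3.8},
    {"Name": "Chris", "Product": "Perrier", "Price": 1.0},
]

key_determines_price = fd_holds(supplies, ["Name", "Product"], ["Price"])
product_determines_price = fd_holds(supplies, ["Product"], ["Price"])
print(key_determines_price, product_determines_price)
```

On this instance Product alone does not determine Price, because Brie is offered at two different prices.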
  71. 71. Functional Dependencies (cont.) <ul><li>Ex.: </li></ul><ul><ul><li>Customers: $\text{Name} \rightarrow \text{Address}$ </li></ul></ul><ul><li>$\text{Name} \rightarrow \text{Balance}$ </li></ul><ul><ul><li>Orders: $\text{O\_No} \rightarrow \text{Date}$ </li></ul></ul><ul><li>$\text{O\_No} \rightarrow \text{Customer}$ </li></ul><ul><ul><li>Customers: $\text{Address} \rightarrow \text{Address}$ </li></ul></ul><ul><ul><li>Supplies: $\{\text{Name}, \text{Product}\} \rightarrow \text{Price}$ </li></ul></ul><ul><li>for each key S of a relation R and each subset T of columns of R we have: $S \rightarrow T$ </li></ul><ul><li>Some FDs imply other FDs </li></ul><ul><li>Ex.: $F = \{A \rightarrow B, B \rightarrow C\} \models A \rightarrow C$ </li></ul>
  72. 72. Closure of FD Sets <ul><li>$F^+ := \{X \rightarrow Y : F \models X \rightarrow Y\}$ </li></ul><ul><li>the closure $F^+$ of a set F of FDs contains all functional dependencies implied by the FDs in F </li></ul><ul><li>Ex.: </li></ul><ul><li>$F = \{A \rightarrow B;\ B \rightarrow C;\ AB \rightarrow C\}$ </li></ul><ul><li>$F^+ =$ </li></ul>
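Writing out $F^+$ in full quickly becomes tedious, so membership in $F^+$ is usually tested through the closure of an attribute set. The sketch below uses that standard technique, which is not shown on the slide; the function names are hypothetical, and attributes are single letters so that a string such as "AB" stands for the set {A, B}.

```python
def attribute_closure(attrs, fds):
    """All attributes that are functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already in the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)


# The example set from the slide.
F = [("A", "B"), ("B", "C"), ("AB", "C")]

closure_of_a = attribute_closure("A", F)
print(sorted(closure_of_a))
print(implies(F, "A", "C"), implies(F, "C", "A"))
```

An FD X → Y belongs to the closure of F exactly when Y is contained in the attribute closure of X.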
  73. 73. Minimal Cover of a Set F of FDs <ul><li>given a set F of FDs, $F'$ is a minimal cover of F if and only if: </li></ul><ul><li>(1) $(F')^+ = F^+$, i.e., all FDs in F are implied by the FDs in $F'$ and vice versa: F and $F'$ are equivalent. </li></ul><ul><li>(2) the right side of each FD in $F'$ is a single attribute </li></ul><ul><li>(3) there is no $(X \rightarrow A) \in F'$ with $(F' - \{X \rightarrow A\})^+ = F^+$, i.e., there are no superfluous FDs in $F'$ </li></ul><ul><li>(4) there is no $(X \rightarrow A) \in F'$ and $Z \subset X$ with $((F' - \{X \rightarrow A\}) \cup \{Z \rightarrow A\})^+ = F^+$, i.e., no FD in $F'$ can be replaced by a simpler FD </li></ul>
  74. 74. FDs and Database Design <ul><li>potential problem: too many FDs in a relation may lead to anomalies and redundancies </li></ul><ul><li>solution: decomposition into several simple relations </li></ul><ul><li>$R_i \subseteq R \ (i = 1, \ldots, k)$ </li></ul><ul><li>$R = R_1 \bowtie R_2 \bowtie \ldots \bowtie R_k$ </li></ul><ul><li>fewer redundancies but possibly more joins </li></ul><ul><li>important for preservation of information: </li></ul><ul><ul><li>one has to be able to re-assemble R by joining the $R_i$ (lossless join) </li></ul></ul><ul><ul><li>the FDs defined in R have to be definable on the $R_i$ (preservation of dependencies) </li></ul></ul>
  75. 75. Database Design and Normal Forms <ul><li>why normal forms? </li></ul><ul><li>- format standardization (1NF) </li></ul><ul><li>- reduction/elimination of redundancies (2NF, 3NF, ...) </li></ul><ul><li>theoretical tool for improving/maintaining database design quality </li></ul><ul><li>in practice, however: redundancy vs. efficiency </li></ul><ul><li>- redundant data may lead to inconsistencies after updates </li></ul><ul><li>- but useful for efficiency reasons (shorter response times) </li></ul><ul><li>tradeoff problem: to be decided case by case </li></ul>
  76. 76. 1st Normal Form (1NF) <ul><li>all attributes have to be atomic </li></ul><ul><li>no „repeating groups“ </li></ul><ul><li>important foundation of the relational model </li></ul><ul><li>but: may lead to increased redundancy </li></ul><ul><li>Ex.: relation Supplies </li></ul>[Figure: relation Supplies (a) with repeating groups, not in 1NF, and (b) in 1NF]
  77. 77. 2nd Normal Form (2NF) <ul><li>1NF + for all attributes A and attribute sets X in relation R: </li></ul><ul><li>($X \rightarrow A$ in R AND A not in X) $\Rightarrow$ (X is not a proper subset of any key of R OR A is a key attribute, i.e., it belongs to at least one key of R) </li></ul><ul><li>note: if R has only one key, this is equivalent to: 1NF + each non-key attribute is fully functionally dependent on the key, i.e., it cannot be inferred from part of the key </li></ul><ul><li>trivially true for one-column keys </li></ul><ul><li>Ex.: relation Supplies </li></ul><ul><li>- Supplies (Name, Product, Price) is in 2NF if and only if Price depends on both Name and Product (free pricing) </li></ul><ul><li>- with fixed prices (e.g. books in Germany), Supplies is no longer in 2NF </li></ul><ul><li>- possibly decomposition into Supplies’ (Name, Product) and Costs (Product, Price) </li></ul>
  78. 78. 3rd Normal Form (3NF) <ul><li>2NF + for all attributes A and attribute sets X in relation R: </li></ul><ul><li>($X \rightarrow A$ in R AND A not in X) $\Rightarrow$ (X is a key of R OR X contains a key of R OR A is a key attribute) </li></ul><ul><li>note: if there is only one key, this is equivalent to: 2NF + non-key attributes are mutually independent </li></ul><ul><li>sufficient (but not necessary) condition: if an FD in the minimal cover contains all attributes of R then R is in 3NF </li></ul><ul><li>Ex.: relation Customers (Name, Address, Balance) </li></ul><ul><li>- all attributes atomic → 1NF </li></ul><ul><li>- keys have only one column → 2NF </li></ul><ul><li>- Address and Balance are mutually independent → 3NF </li></ul>
  84. 84. 3NF - An Example <ul><li>relation R = (C, S, Z) </li></ul><ul><li>functional dependencies: $F = \{CS \rightarrow Z, Z \rightarrow C\}$ </li></ul><ul><li>R in 3NF? </li></ul><ul><li>keys of R: CS, ZS </li></ul><ul><li>key attributes of R: C, S, Z </li></ul><ul><li>1NF: no problem </li></ul><ul><li>2NF: o.k. because Z and C are key attributes </li></ul><ul><li>3NF: o.k. for the same reason </li></ul>
  85. 85. Decomposition into 3NF <ul><li>given : relation R, set of FDs F </li></ul><ul><li>find : decomposition of R into a set of 3NF relations $R_i$ </li></ul><ul><li>algorithm: </li></ul><ul><li>IF R in 3NF </li></ul><ul><li>THEN stop </li></ul><ul><li>ELSE </li></ul><ul><li>compute minimal cover $F'$ of F; </li></ul><ul><li>create a separate relation $R_i = A$ for each attribute A that does not occur in any FD in $F'$; </li></ul><ul><li>create a relation $R_i = XA$ for each FD $X \rightarrow A$ in $F'$; </li></ul><ul><li>if the key K of R does not occur in any relation $R_i$, create one more relation $R_i = K$. </li></ul><ul><li>decomposition fulfills </li></ul><ul><li>- lossless join </li></ul><ul><li>- preservation of dependencies </li></ul>
  87. 87. Decomposition into 3NF - Example (cont.) <ul><li>Attributes: L ... Lecture, R ... Room, I ... Instructor, S ... Student, T ... Time, G ... Grade </li></ul><ul><li>Relational Schema: R = (L, I, T, R, S, G) </li></ul><ul><li>Functional Dependencies: $F = \{L \rightarrow I, TR \rightarrow L, TI \rightarrow R, LS \rightarrow G, TS \rightarrow R, TRI \rightarrow LR\}$ </li></ul>
  89. 89. Decomposition into 3NF - Example (cont.) <ul><li>Keys: ST </li></ul><ul><li>Key Attributes: S, T </li></ul>
  91. 91. Decomposition into 3NF - Example (cont.) <ul><li>$F = \{L \rightarrow I, TR \rightarrow L, TI \rightarrow R, LS \rightarrow G, TS \rightarrow R, TRI \rightarrow LR\}$ </li></ul><ul><li>Minimal Cover </li></ul><ul><li>$F' = \{L \rightarrow I, TR \rightarrow L, TI \rightarrow R, LS \rightarrow G, TS \rightarrow R\}$ </li></ul><ul><li>Decomposition into $R_i$ </li></ul>
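The minimal cover and the resulting decomposition for the lecture example can be reproduced with a short sketch. It follows the conditions (2) to (4) of the minimal-cover definition and the decomposition steps given earlier; the function names are hypothetical, attributes are single letters, and the key ST is passed in as an assumption taken from the slide rather than computed.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (frozenset, frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # (2) single attributes on the right side; trivial FDs are dropped
    cover = [(frozenset(l), frozenset(a)) for l, r in fds for a in r if a not in l]
    # (4) remove attributes from left sides where a smaller left side suffices
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
        reduced.append((frozenset(lhs), rhs))
    cover = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    # (3) remove FDs that are implied by the remaining ones
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def synthesize_3nf(attributes, fds, key):
    cover = minimal_cover(fds)
    used = set().union(*(l | r for l, r in cover))
    # one relation per attribute that occurs in no FD
    relations = [frozenset(a) for a in sorted(set(attributes) - used)]
    # one relation XA per FD X -> A of the minimal cover
    relations += [l | r for l, r in cover]
    # add the key as its own relation if no relation contains it
    if not any(set(key) <= rel for rel in relations):
        relations.append(frozenset(key))
    return relations


F = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R"), ("TRI", "LR")]
cover = minimal_cover(F)
relations = ["".join(sorted(r)) for r in synthesize_3nf("LITRSG", F, "ST")]
print(["".join(sorted(l)) + "->" + "".join(r) for l, r in cover])
print(relations)
```

On this input the sketch yields the five FDs of the minimal cover shown on the slide and one relation per FD; no extra relation is needed because the key ST is already contained in the relation built from TS → R.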
  92. 92. Indices <ul><li>data structures (often tree structures) that serve to accelerate database searches </li></ul><ul><li>frequent synonyms: index structures, access methods </li></ul><ul><li>Ex.: Supplies (Name, Product, Price) </li></ul>
  93. 93. Indices (cont.) <ul><li>Name and Product are the indexed columns </li></ul><ul><li>Index on Name is primary index </li></ul><ul><li>- indexed column is part of the primary key </li></ul><ul><li>- relation is sorted by increasing primary key </li></ul><ul><li>- well suited for processing range queries (Ex.: Find all suppliers whose name starts with B, C or D) </li></ul><ul><li>all other indices: secondary indices </li></ul><ul><li>tradeoff: queries vs. updates </li></ul><ul><li>- indices accelerate many queries ... </li></ul><ul><li>- ... but slow down updates </li></ul>
  94. 94. Dense vs. Sparse Indices <ul><li>relations are stored in blocks (pages) on the magnetic disk </li></ul><ul><li>crucial cost factor: how many blocks do I have to transfer from disk to main memory in order to answer the query? </li></ul><ul><li>non-dense (or sparse ) index: one index entry per block </li></ul><ul><li>- for a primary index it suffices to store the smallest key value per block </li></ul><ul><li>- index supports the system when looking for the relevant block(s) </li></ul><ul><li>- inside each block: local search (cf. telephone directory) </li></ul><ul><li>- useful for large relations because very compact </li></ul><ul><li>- only possible for columns according to which the relation has been sorted (cf. phone directory) </li></ul><ul><li>- therefore: at most one sparse index per relation </li></ul><ul><li>dense index: one index entry per tuple </li></ul>
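A sparse index lookup can be sketched with a sorted list of the smallest key per block. Everything here is illustrative: the block size of two tuples, the sample rows, and the function `lookup` are assumptions. The index is searched with binary search, and only the candidate blocks are then scanned locally.

```python
from bisect import bisect_left

# Hypothetical blocks of Supplies, sorted by Name, two tuples per block.
blocks = [
    [("Anna", "Brie", 3.8), ("Bob", "Lettuce", 0.9)],
    [("Chris", "Brie", 4.0), ("Chris", "Perrier", 1.0)],
    [("Jack", "Oysters", 0.24), ("Zoe", "Peanuts", 1.5)],
]

# Sparse index: one entry per block, holding the smallest Name in that block.
sparse_index = [block[0][0] for block in blocks]


def lookup(name):
    """Return the matching tuples and the number of blocks that were read."""
    # Start one block before the first block whose smallest key is >= name,
    # because tuples with this name may already begin in that earlier block.
    start = max(bisect_left(sparse_index, name) - 1, 0)
    result, blocks_read = [], 0
    for pos in range(start, len(blocks)):
        if sparse_index[pos] > name:
            break  # all later blocks hold larger names only
        blocks_read += 1
        result += [t for t in blocks[pos] if t[0] == name]  # local search
    return result, blocks_read


print(lookup("Chris"))
print(lookup("Bob"))
```

The index has three entries for six tuples; a dense index on Product would need one entry per tuple, since the relation is not sorted by Product.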
  95. 95. How Does a Disk Access Work? [Figure: Disk Drive and Main Memory, connected by "Read block" and "Write block"]
  96. 96. Dense vs. sparse indices: An Example [Figure: relation Supplies with a sparse index on Name and a dense index on Product; visible Product values: Lettuce, Oysters, Peanuts]
  97. 97. Layered Indices <ul><li>large relations → large indices </li></ul><ul><li>indexing a larger index leads to a smaller index etc. </li></ul><ul><li>tree structure </li></ul><ul><li>root fits on one page (= one block) </li></ul>[Figure: index levels, labeled "Index (often dense)", above the "File (Relation)"]
  98. 98. B+ Tree <ul><li>tree structure as described above </li></ul>
  99. 99. B+ Tree (cont.) <ul><li>B+ trees are balanced (i.e., all leaves are on the same level) </li></ul><ul><li>lowest level (leaves): dense, otherwise: sparse </li></ul><ul><li>each node fits on one page (≤ N entries) </li></ul><ul><li>N = page size / space requirements per entry (Ex. above: N = 3) </li></ul><ul><li>minimal page utilization (guaranteed): N/2 entries </li></ul>
  100. 100. B+ Tree (cont.) <ul><li>each node has between N/2 and N entries </li></ul><ul><li>problems: overflow, underflow </li></ul><ul><li>Ex.: N = 3 </li></ul>
  101. 101. B+ Tree (cont.)
  102. 102. B+ Tree (cont.)
  103. 103. Hashing - An Alternative to Indices <ul><li>hash function h: </li></ul><ul><li>data value → storage address </li></ul><ul><li>Ex.: storage address = data value MOD p </li></ul><ul><li>(p typically a prime number) </li></ul><ul><li>Ex.: p = 13 </li></ul>[Figure: example relation with its hash field]
  104. 104. Hashing (cont.): Storage Structure <ul><li>only one hash field per relation! </li></ul><ul><li>advantage: very fast access </li></ul><ul><li>disadvantage: </li></ul><ul><li>- relation dispersed across the disk </li></ul><ul><li>- collisions </li></ul>
  105. 105. Hashing (cont.): Collision Chains
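Hashing with collision chains can be sketched with one Python list per storage address. The hash function is the one from the slides (data value MOD p with p = 13); the order numbers used as hash field values and the helper names are invented for illustration.

```python
P = 13  # p = 13 as in the example on the hashing slide

# One collision chain (a list) per storage address 0..12.
buckets = [[] for _ in range(P)]


def h(value):
    """Hash function: storage address = data value MOD p."""
    return value % P


def insert(value, record):
    buckets[h(value)].append((value, record))


def find(value):
    # Only the chain at address h(value) has to be searched.
    return [record for v, record in buckets[h(value)] if v == value]


# Hypothetical order numbers as hash field values; 1024 and 1037 collide.
for o_no in (1024, 1025, 1037):
    insert(o_no, "order %d" % o_no)

print(h(1024), h(1025), h(1037))
print(find(1037))
```

Two values with the same remainder end up in the same chain, so a lookup may have to walk past other entries before it finds the right one.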
  106. 106. Query Optimization <ul><li>Ex.: </li></ul><ul><li>SELECT DISTINCT Orders.Customer </li></ul><ul><li>FROM Orders, Contains </li></ul><ul><li>WHERE Orders.O_No = Contains.O_No </li></ul><ul><li>AND Contains.Product = 'Brie' </li></ul><ul><li>Assumptions: </li></ul><ul><li>100,000 tuples in Orders, 1000 bytes each </li></ul><ul><li>1,000,000 tuples in Contains , 100 bytes each </li></ul><ul><li>1,000 tuples in Contains concern Brie </li></ul><ul><li>100 MB main memory </li></ul>
  107. 107. Query Optimization (cont.) <ul><li>Strategy 1: </li></ul><ul><li>1) Compute Cartesian product Orders × Contains </li></ul><ul><li>2) Select all tuples with Orders.O_No = Contains.O_No </li></ul><ul><li>3) Select all tuples with Contains.Product = 'Brie' </li></ul><ul><li>4) Project to Customer </li></ul><ul><li>Strategy 2: </li></ul><ul><li>1) Select all tuples from Contains with Product = 'Brie' </li></ul><ul><li>2) Compute Cartesian product with Orders </li></ul><ul><li>3) Select all tuples with Orders.O_No = Contains.O_No </li></ul><ul><li>4) Project to Customer </li></ul>
  108. 108. Query Optimization (cont.) <ul><li>Analysis Strategy 1: </li></ul><ul><li>(1)+(2): Tuple-I/Os for Orders : </li></ul><ul><li> Tuple-I/Os for Contains : </li></ul><ul><li>(3)+(4): Tuple-I/Os: </li></ul><ul><li>Tuple-I/Os in total: </li></ul><ul><li>Analysis Strategy 2: </li></ul><ul><li>(1): Tuple-I/Os for Contains : </li></ul><ul><li>(2)-(4): Tuple-I/Os: </li></ul><ul><li>Tuple-I/Os in total: </li></ul><ul><li>Strategy 2 is clearly superior (Factor?) </li></ul>
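A rough feeling for the gap between the two strategies comes from comparing the sizes of the Cartesian products they build. The sketch below uses only the tuple counts from the assumptions slide. It is not the tuple-I/O analysis asked for above, which additionally depends on how the 100 MB of main memory are used for buffering.

```python
# Figures from the assumptions slide.
orders_tuples = 100_000
contains_tuples = 1_000_000
brie_tuples = 1_000

# Strategy 1 builds the full Cartesian product before any filtering.
product_1 = orders_tuples * contains_tuples

# Strategy 2 filters Contains first and builds the product afterwards.
product_2 = orders_tuples * brie_tuples

# Ratio of the two intermediate result sizes.
factor = product_1 // product_2

print(product_1, product_2, factor)
```

Pushing the selection on Product in front of the product shrinks the intermediate result by the selectivity of the condition, here 1,000 out of 1,000,000 tuples.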
  109. 109. Query Optimization (cont.) <ul><li>Which (meta)data should be stored? (Statistics) </li></ul><ul><li>- number of tuples for each relation </li></ul><ul><li>- number of columns for each relation </li></ul><ul><li>- number of different values per column </li></ul><ul><li>- occurrence frequencies of particular values </li></ul><ul><li>More information facilitates query optimization but slows down updates </li></ul><ul><li>Automatic optimization preferable because </li></ul><ul><li>- statistics always up-to-date </li></ul><ul><li>- more cost-efficient </li></ul><ul><li>- dynamic </li></ul><ul><li>Important strength of relational systems </li></ul>
  110. 110. Transaction Processing <ul><li>Transaction (TA) </li></ul><ul><li>- logical unit of work </li></ul><ul><li>- should be executed either completely or not at all </li></ul><ul><li>- atomic, i.e., not decomposable </li></ul><ul><li>Recovery </li></ul><ul><li>Concurrency </li></ul>
  111. 111. Recovery <ul><li>Recovery: restart after system fault </li></ul><ul><li>System faults </li></ul><ul><li>- program crash </li></ul><ul><li>- arithmetic mistakes (e.g. overflow) </li></ul><ul><li>- disk crash </li></ul><ul><li>- power failure </li></ul><ul><li>Ex.: </li></ul><ul><li>DELETE </li></ul><ul><li>FROM Contains </li></ul><ul><li>WHERE O_No = 1024 </li></ul><ul><li>What happens in case of a system fault „in the middle“? </li></ul>
  112. 112. Recovery (cont.) <ul><li>COMMIT </li></ul><ul><li>- operation to terminate a TA successfully </li></ul><ul><li>- all updates are stored in the database permanently </li></ul><ul><li>- storage on „safe“ storage medium </li></ul><ul><li>- transaction is finalized </li></ul><ul><li>- bundling of several COMMIT operations in checkpoints </li></ul><ul><li>ROLLBACK </li></ul><ul><li>- operation to abort a TA in case of system fault </li></ul><ul><li>- changes in CPU registers and storage are reversed </li></ul><ul><li>Important for ROLLBACK </li></ul><ul><li>- logging each single modification </li></ul><ul><li>- storing the log on a „safe“ medium </li></ul>
  113. 113. Recovery (cont.) [Figure: timeline of transactions with three checkpoints, followed by an error and the subsequent recovery; at each checkpoint, updates are stored on some “safe” medium]
  114. 114. Recovery (cont.) <ul><li>3 types of transactions </li></ul><ul><li>- transactions that already completed and whose results have been made permanent: T1 </li></ul><ul><li>- transactions that have already completed but whose results have not yet been made permanent: T2, T4 → REDO (i.e. re-run, after recovery these transactions will have completed) </li></ul><ul><li>- transactions that started but that did not finish: T3, T5 → UNDO (i.e. reversal of all modifications, ROLLBACK of each transaction concerned; after recovery these transactions will not have completed) </li></ul>
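The three cases can be expressed as a small classification over start and commit times. The figure with the actual timeline is not available here, so the times below are invented and merely chosen so that T1 to T5 fall into the classes named on the slide; `classify` is a hypothetical helper.

```python
def classify(transactions, checkpoint, crash):
    """Map each transaction to the recovery action it needs."""
    actions = {}
    for name, (start, commit) in transactions.items():
        if commit is not None and commit <= checkpoint:
            actions[name] = "nothing"  # results already permanent
        elif commit is not None and commit < crash:
            actions[name] = "REDO"  # completed, but not yet permanent
        else:
            actions[name] = "UNDO"  # still running at the time of the error
    return actions


# Hypothetical (start, commit) times; None means "not committed before the error".
transactions = {
    "T1": (1, 3),
    "T2": (2, 7),
    "T3": (4, None),
    "T4": (6, 8),
    "T5": (8, None),
}

actions = classify(transactions, checkpoint=5, crash=10)
print(actions)
```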
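  115. 115. Concurrency: Dirty Read Problem <ul><li>transaction A: update R.X </li></ul><ul><li>transaction B: read from R.X </li></ul><ul><li>transaction B: action on basis of R.X </li></ul><ul><li>transaction B: commit B </li></ul><ul><li>transaction A: ROLLBACK A → Problem! </li></ul><ul><li>R ... relation, R.X ... attribute of R </li></ul>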
  116. 116. Concurrency: Lost Update Problem [Figure: two timelines of transactions A and B. First timeline: A reads R.X, doubles R.X, writes new value of R.X, commits; B reads R.X, adds 2 to R.X, writes new value of R.X, commits. Second timeline: A changes R.X, B changes R.X, A reads R.X, A rolls back, B commits.]
  117. 117. Concurrency: Possible Solutions <ul><li>Timestamps to coordinate transactions </li></ul><ul><li>Locks : temporary blocking of parts of the database </li></ul><ul><li>- exclusive lock (X-Lock): read/write lock, i.e. no other TA </li></ul><ul><li>is allowed to read or write the blocked data </li></ul><ul><li>- shared lock (S-Lock): write lock, i.e., others can read but not write </li></ul><ul><li>If a TA wants to read, it first has to ask for an S-lock for the required data </li></ul><ul><li>If a TA wants to write, it first has to ask for an X-lock for the required data </li></ul><ul><li>compatibility of locks </li></ul><ul><li>S+S ... OK </li></ul><ul><li>S+X ... Not OK </li></ul><ul><li>X+X ... Not OK </li></ul>
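The compatibility rules for S- and X-locks fit into a small lookup table. The sketch below is a simplified illustration for a single data item; the table `COMPATIBLE` and the function `can_grant` are hypothetical names.

```python
# Compatibility of a lock already held by another TA with a requested lock,
# following the rules S+S OK, S+X not OK, X+X not OK.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """A lock is granted only if it is compatible with every lock held by other TAs."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # several readers at once
print(can_grant("X", ["S"]))       # writer has to wait for the reader
print(can_grant("X", []))          # no other locks on the item
```

A transaction whose request cannot be granted has to wait until the conflicting locks are released, which is where deadlocks can arise.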
  118. 118. Locks: Application to Dirty Read [Table: six "Yes" and three "N" entries; row and column labels were lost in extraction]
  119. 119. Locks: Application to Dirty Read (cont.) <ul><li>Ex. 1: </li></ul>TA A obtains an X-lock for the field R.X to prepare for the planned update; TA B asks for an S-lock to prepare for the planned read operation → REJECTED; ROLLBACK A → locks are released; TA B obtains S-lock; TA B performs read operation + COMMIT; restart TA A
  120. 120. Locks: Application to Dirty Read (cont.) <ul><li>Ex. 2: </li></ul>TA A requests X-Lock for R.X; TA A obtains X-Lock, updates R.X; TA B requests S-Lock → REJECTED, TA B waits; TA A ROLLBACK; TA B obtains S-Lock, reads R.X; TA B COMMIT; restart TA A, re-obtains X-Lock
  121. 121. Locks: Application to Lost Update TA A wants to read R.X, asks for S-lock; TA A obtains S-lock, reads R.X; TA B also wants to read R.X, asks for S-Lock; TA B obtains S-Lock, reads R.X; TA A wants to update R.X, asks for X-Lock; TA A does not obtain X-Lock because TA B holds an S-Lock → A waits; TA B wants to update R.X, asks for X-Lock; TA B does NOT obtain X-Lock → B waits; DEADLOCK → break via Rollback of some TA
  122. 122. Deadlocks <ul><li>Problem: How to recognize deadlocks? </li></ul><ul><li>How to treat deadlocks involving several TAs? </li></ul><ul><li>Searching for cycles in the WAIT-FOR graph </li></ul>wait for
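Searching for cycles in the WAIT-FOR graph can be sketched with a depth-first search. The graph encoding (a dict mapping each waiting TA to the TAs it waits for) and the function name `find_cycle` are assumptions made for this sketch.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(ta):
        color[ta] = GREY
        path.append(ta)
        for nxt in wait_for.get(ta, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: cycle found
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        color[ta] = BLACK
        path.pop()
        return None

    for ta in list(wait_for):
        if color.get(ta, WHITE) == WHITE:
            cycle = visit(ta)
            if cycle:
                return cycle
    return None


# A waits for B and B waits for A, as in the lost update example.
print(find_cycle({"A": ["B"], "B": ["A"]}))
print(find_cycle({"A": ["B"], "B": ["C"]}))
```

Any TA on the returned cycle is a candidate for the rollback that breaks the deadlock.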
  123. 123. Serializability <ul><li>Given a set of TAs, which possible events should be considered correct? </li></ul><ul><li>Convention: a schedule is considered correct if it is serializable </li></ul><ul><li>Serializability means that the result of the schedule is identical to the result of some serial schedule </li></ul><ul><li>Ex.: </li></ul>(TA1) A := A + 1: read A into main memory; add 1; write A back into the DB. (TA2) A := 2 * A: read A into main memory; multiply by 2; write A back into the DB. (TA3) write A: read A into main memory; display A on the screen; set A to 1 in the DB.
  124. 124. Serializability - An Example <ul><li>Assumption: A = 1 </li></ul><ul><li>TA1, TA2, TA3: </li></ul><ul><li>TA1, TA3, TA2: </li></ul><ul><li>TA2, TA3, TA1: </li></ul><ul><li>TA2, TA1, TA3: </li></ul><ul><li>TA3, TA1, TA2: </li></ul><ul><li>TA3, TA2, TA1: </li></ul>
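The six serial orders can be enumerated mechanically. The sketch below assumes the reading of TA1, TA2 and TA3 given on the previous slide (TA3 displays A and then sets it to 1); the function names and the `screen` list are hypothetical.

```python
from itertools import permutations


def ta1(a, screen):
    return a + 1          # A := A + 1


def ta2(a, screen):
    return 2 * a          # A := 2 * A


def ta3(a, screen):
    screen.append(a)      # display A on the screen
    return 1              # set A to 1 in the DB


TAS = {"TA1": ta1, "TA2": ta2, "TA3": ta3}


def run_serial(order, a=1):
    """Run the transactions one after another; return final A and screen output."""
    screen = []
    for name in order:
        a = TAS[name](a, screen)
    return a, screen


results = {order: run_serial(order) for order in permutations(TAS)}
for order, (final_a, screen) in results.items():
    print(", ".join(order), "-> final A =", final_a, ", displayed:", screen)
```

An interleaved schedule of the three TAs counts as serializable if its outcome matches one of these serial outcomes.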
  125. 125. Concurrency: 2-Phase Locking <ul><li>2-Phase locking protocol </li></ul><ul><ul><li>for each transaction one first asks for all required locks (phase I) </li></ul></ul><ul><ul><li>processing ... </li></ul></ul><ul><ul><li>then all locks are (gradually) released (phase II) </li></ul></ul>TA2: no 2-phase-locking number of locks
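Whether a single transaction follows the two-phase rule can be checked by scanning its lock operations: once the first lock has been released, no further lock may be requested. The operation encoding and the name `is_two_phase` are assumptions of this sketch.

```python
def is_two_phase(ops):
    """ops is a sequence of ('lock' | 'unlock', item) pairs for one transaction."""
    releasing = False
    for kind, _item in ops:
        if kind == "unlock":
            releasing = True       # phase II has begun
        elif kind == "lock" and releasing:
            return False           # a lock was requested after a release
    return True


ta1_ops = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
ta2_ops = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]

print(is_two_phase(ta1_ops))  # locks grow, then shrink
print(is_two_phase(ta2_ops))  # locks again after releasing
```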
  126. 126. Concurrency and 2-Phase Locking Theorem: 2-phase locking protocol for each transaction ⇒ serializability of the schedule. [Diagram labels: 2-phase-locking, all „reasonable“ possibilities, equivalent to FIFO, serial, serializable]
  127. 127. Environmental Data Modeling: An Example <ul><li>Constraints and Properties </li></ul><ul><li>- Minimum distance between roads and biotopes </li></ul><ul><li>- River width varies widely - line vs. polygon </li></ul><ul><li>- Roads are not necessarily connected </li></ul><ul><li>- River and road shapes are independent of each other </li></ul><ul><li>- Biotope shape depends on river shape </li></ul>
  128. 128. Environmental Data Modeling: An Example (2) <ul><li>Queries </li></ul><ul><li>What is the distance between the planned road and the biotope? </li></ul><ul><li>Which roads have a distance of less than x meters from a biotope? </li></ul><ul><li>Where do we need an intersection? </li></ul><ul><li>Where do we need a bridge? </li></ul><ul><li>How much area is enclosed between roads and river? </li></ul><ul><li>Which roads go along the river? </li></ul><ul><li>Updates </li></ul><ul><li>An intersection is built. </li></ul><ul><li>The road is built. </li></ul><ul><li>A bridge is built. Generate a class bridge dynamically. </li></ul>
  129. 129. Spatial Data Types <ul><li>Points </li></ul><ul><li>Lines </li></ul><ul><li>Polygons </li></ul><ul><li>Curves </li></ul><ul><li>Polyhedra in arbitrary dimensions </li></ul><ul><li>Applications </li></ul><ul><li>Computer graphics Robotics </li></ul><ul><li>CAD/CAM Geography </li></ul><ul><li>Computer vision Environmental information systems </li></ul>
  130. 130. Spatial Operators (1): Set Operators <ul><li>Union </li></ul><ul><li>Intersection </li></ul><ul><li>Difference </li></ul>
  131. 131. Spatial Operators (2): Search Operators Point Query: find all spatial objects that contain/are near a given point Range Query: find all objects that contain/ intersect/are contained in a given spatial object, such as a polygon
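A minimal sketch of both search operators, assuming every spatial object is approximated by an axis-aligned rectangle and that "intersect" is the range predicate. The type `Rect`, the two query functions and the sample objects are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, other):
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


def point_query(objects, x, y):
    """Names of all objects whose rectangle contains the point (x, y)."""
    return [name for name, r in objects.items() if r.contains_point(x, y)]


def range_query(objects, window):
    """Names of all objects whose rectangle intersects the query window."""
    return [name for name, r in objects.items() if r.intersects(window)]


objects = {
    "lake": Rect(0, 0, 4, 3),
    "park": Rect(3, 2, 6, 5),
    "school": Rect(8, 8, 9, 9),
}
print(point_query(objects, 3.5, 2.5))
print(range_query(objects, Rect(5, 4, 10, 10)))
```

Both functions scan every object; a spatial access method would avoid that linear scan.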
  132. 132. Spatial Operators (3): Similarity Operators <ul><li>Translation </li></ul><ul><li>Rotation </li></ul><ul><li>Scaling </li></ul>
  133. 133. Spatial Operators (4): Spatial Joins <ul><li>Join between different classes of objects </li></ul><ul><li>Examples </li></ul><ul><li>Find all houses that are within 10 km of a lake </li></ul><ul><li>Find all buildings that are located within a biotope </li></ul><ul><li>Find all schools that are more than 5 km away from a fire station </li></ul><ul><li>Related: general map overlay </li></ul>
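The first example can be sketched as a nested-loop join with a distance predicate. For brevity the sketch assumes that houses and lakes are reduced to single points with coordinates in km; the function `distance_join` and the sample data are hypothetical.

```python
from math import hypot


def distance_join(left, right, max_dist):
    """Nested-loop spatial join: all pairs at distance max_dist or less."""
    return [
        (l_name, r_name)
        for l_name, (lx, ly) in left.items()
        for r_name, (rx, ry) in right.items()
        if hypot(lx - rx, ly - ry) <= max_dist
    ]


# Hypothetical coordinates in km.
houses = {"h1": (0, 0), "h2": (30, 40)}
lakes = {"north": (3, 4), "south": (30, 45)}

pairs = distance_join(houses, lakes, 10)
print(pairs)
```

The nested loop compares every house with every lake; that quadratic cost is what spatial access methods are meant to reduce.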
  134. 134. Spatial Data Structures (1): Vertex Lists <ul><li>List of polygon vertices </li></ul><ul><li>Supported operators : </li></ul><ul><li>Similarity operators </li></ul><ul><li>(Set operators) </li></ul><ul><li>Problems: </li></ul><ul><li>Not unique </li></ul><ul><li>No invariants </li></ul><ul><li>List vs. set </li></ul><ul><li>Simple polygons - invalid representations </li></ul>
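A short sketch of why vertex lists support similarity operators well but are not unique: the operators are simple per-vertex maps, while the same polygon can be listed from any starting vertex. All function names are hypothetical.

```python
from math import cos, sin, pi


def translate(poly, dx, dy):
    return [(x + dx, y + dy) for x, y in poly]


def scale(poly, factor):
    return [(x * factor, y * factor) for x, y in poly]


def rotate(poly, angle):
    """Rotate counter-clockwise around the origin by angle (radians)."""
    c, s = cos(angle), sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in poly]


def same_polygon(p, q):
    """True if q lists the vertices of p in the same cyclic order."""
    if len(p) != len(q):
        return False
    return any(q == p[i:] + p[:i] for i in range(len(p)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted_start = [(1, 0), (1, 1), (0, 1), (0, 0)]

print(translate(square, 2, 3))
print(square == shifted_start)              # the lists differ ...
print(same_polygon(square, shifted_start))  # ... but describe one polygon
```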
  135. 135. Spatial Data Structures (2): B-Rep (Boundary Representation)
  136. 136. Spatial Data Structures (3): B-Rep (Boundary Representation) <ul><li>3D: DAG of height 3 </li></ul><ul><li>Supported operators: Similarity operators </li></ul><ul><li>Problems: not unique, invalid representations, </li></ul><ul><li>search / set operators, redundancy </li></ul>
  137. 137. What's the problem with commercial GIS? <ul><li>GIS = Geographic Information Systems </li></ul><ul><li>Originally oriented towards file systems </li></ul><ul><li>Scaling problems </li></ul><ul><li>No ad hoc query facility </li></ul><ul><li>Semantic integrity problems </li></ul><ul><li>Single user environment, little or no concurrency </li></ul><ul><li>No distributed GIS </li></ul><ul><li>Little support for application-specific data types or operators </li></ul><ul><li>Possible solution: use commercial databases </li></ul>
  138. 138. And what about commercial databases? (1) <ul><li>No geometric data types: point, line, polygon, ... </li></ul><ul><li>Geometric representation may be hidden in a long field </li></ul><ul><li>... or in an external file </li></ul><ul><li>Inflexible </li></ul><ul><li>No database support for geometric operations </li></ul><ul><li>No notion of topology </li></ul><ul><li>Redundancy </li></ul>polygon polygon
  139. 139. And what about commercial databases? (2) <ul><li>Objects may be decomposed into different relations </li></ul><ul><li>No spatial clustering </li></ul><ul><li>Shared objects → less redundancy </li></ul><ul><li>Example: boundary representation </li></ul>part faces edges vertices
  140. 140. And what about commercial databases? (3) <ul><li>No spatial access methods </li></ul><ul><li>Little support for application-specific object types </li></ul><ul><li>- Cities </li></ul><ul><li>- Rivers </li></ul><ul><li>- ... </li></ul><ul><li>... or for application-specific operations </li></ul><ul><li>- Build a bridge </li></ul><ul><li>- Modify a shape </li></ul><ul><li>- ... </li></ul>
  141. 141. Database Extensions (1) Abstract Data Types <ul><li>Abstract data types (ADTs) </li></ul><ul><li>- Encapsulation of a (user-defined) data structure </li></ul><ul><li>- Collection of (user-defined) operators on this structure </li></ul><ul><li>- Implementation details hidden from the user </li></ul><ul><li>ADTs in databases: BOX - example </li></ul>create boxes (ID = i4, layer = c15, box-desc = Box) append to boxes (ID = 99, layer = &quot;polysilicon&quot;, box-desc = &quot;0,0 : 2,3&quot;) range of b is boxes replace b (box-desc = b.box-desc INT &quot;0,0 : 4,1&quot;) where b.ID = 99 retrieve (boxes.ID) where AREA(boxes.box-desc) > 100
  142. 142. Database Extensions (2): Implementation of Abstract Data Types define type Box is (Internal length = 16, Input Proc = CharToBox, Output Proc = BoxToChar, Default = '' '') define operator INT (Box,Box) returns Box is (Proc = BoxInt, Precedence = 3, Associativity = ''left'', Sort = left X) define operator AE (Box,Box) returns boolean is (Proc = BoxAE, Precedence = 3, Associativity = ''left'', Sort = BoxArea, Hashes, Restrict = AERSelect, Join = AEJSelect, Negator = BoxAreaNE) <ul><li>C-Procedures BoxArea, AERSelect, AEJSelect, etc. </li></ul>
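To illustrate what the procedures behind such a type might do, here is a sketch of a Box type in Python rather than C. It assumes that the external form is "x1,y1 : x2,y2" with the lower-left corner first and that INT means intersection. `parse`, `__str__`, `intersect` and `area` are hypothetical stand-ins for CharToBox, BoxToChar, BoxInt and the AREA function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    @classmethod
    def parse(cls, text):
        """Parse the external form 'x1,y1 : x2,y2' (role of CharToBox)."""
        lo, hi = text.split(":")
        x1, y1 = (float(v) for v in lo.split(","))
        x2, y2 = (float(v) for v in hi.split(","))
        return cls(x1, y1, x2, y2)

    def __str__(self):
        """Produce the external form again (role of BoxToChar)."""
        return f"{self.x1:g},{self.y1:g} : {self.x2:g},{self.y2:g}"

    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersect(self, other):
        """Role of the INT operator; no overlap yields a zero-area box."""
        x1, y1 = max(self.x1, other.x1), max(self.y1, other.y1)
        x2, y2 = min(self.x2, other.x2), min(self.y2, other.y2)
        if x2 < x1 or y2 < y1:
            return Box(x1, y1, x1, y1)
        return Box(x1, y1, x2, y2)


# Mirrors the replace statement of the BOX example.
b = Box.parse("0,0 : 2,3").intersect(Box.parse("0,0 : 4,1"))
print(b, b.area())
```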
  143. 143. Database Extensions (3): Implementation of Abstract Data Types <ul><li>Advantages </li></ul><ul><li>- Very flexible </li></ul><ul><li>- Data structures and operators can be very complex </li></ul><ul><li>Disadvantages </li></ul><ul><li>- Two programming paradigms: DBMS and C </li></ul><ul><li>- ADT maps into only one column: structural information gets lost </li></ul><ul><li>- Complexity hidden in the ''black box'' </li></ul><ul><li>- Problems for query optimization: what's inside? </li></ul>
  144. 144. Database Extensions (4): Spatial Access Methods <ul><li>Point query </li></ul><ul><li>Range query </li></ul>
  145. 145. Database Extensions (5): R - Trees <ul><li>Features </li></ul><ul><li>- Hierarchy of d-dimensional boxes </li></ul><ul><li>- Balanced tree </li></ul><ul><li>- One node per disk page </li></ul><ul><li>- Fully dynamic </li></ul><ul><li>Problems </li></ul><ul><li>- Overlap of sibling boxes - bad for point searches </li></ul><ul><li>- Arbitrary shapes: additional computations and disk accesses (clustering!) </li></ul>
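The effect of overlapping sibling boxes on point searches can be shown with a hand-built, two-level hierarchy of boxes. This is a static sketch only: it omits insertion, node splitting and balancing, and the class `Node`, the function `point_search` and the sample boxes are hypothetical.

```python
class Node:
    """Leaf nodes hold (name, box) entries; inner nodes hold child nodes."""

    def __init__(self, children=None, entries=None):
        self.children = children or []
        self.entries = entries or []
        boxes = [c.mbr for c in self.children] + [b for _, b in self.entries]
        # Minimum bounding rectangle (xmin, ymin, xmax, ymax) of everything below.
        self.mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes))


def contains(box, x, y):
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def point_search(node, x, y, visited):
    """Return names of entries containing (x, y); record each visited node."""
    visited.append(node)
    hits = [name for name, box in node.entries if contains(box, x, y)]
    for child in node.children:
        if contains(child.mbr, x, y):   # descend only where the point may lie
            hits += point_search(child, x, y, visited)
    return hits


leaf1 = Node(entries=[("a", (0, 0, 4, 4)), ("b", (1, 1, 6, 6))])
leaf2 = Node(entries=[("c", (5, 5, 9, 9)), ("d", (8, 0, 10, 2))])
root = Node(children=[leaf1, leaf2])

v1, v2 = [], []
print(point_search(root, 0.5, 0.5, v1), len(v1))  # only one leaf is visited
print(point_search(root, 5.5, 5.5, v2), len(v2))  # point in the overlap: both leaves
```

If each node corresponds to one disk page, the second search costs one more page access than the first, purely because the sibling boxes overlap.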
  146. 146. Object-Oriented Database Systems <ul><li>The OODBS Manifesto (Atkinson et al. 1989): OODBS = DBS + ... </li></ul><ul><ul><li>Complex objects (PART-OF) - Structural OO </li></ul></ul><ul><ul><li>User-defined data types - Behavioral OO </li></ul></ul><ul><ul><li>Object identity </li></ul></ul><ul><ul><li>Encapsulation </li></ul></ul><ul><ul><li>Types/Classes </li></ul></ul><ul><ul><li>Inheritance (IS-A) </li></ul></ul><ul><ul><li>Operators: overloading / overriding / late binding </li></ul></ul>
  147. 147. Behavioral Object-Orientation for Geometric Modeling <ul><li>Integration of complex geometric data types and operators </li></ul><ul><li>add class Point </li></ul><ul><li>type tuple (x: real </li></ul><ul><li>y: real) </li></ul><ul><li>add method DistOrigin: real </li></ul><ul><li>in class Point </li></ul><ul><li>return (sqrt(sqr(self->x)+sqr(self->y))) </li></ul>
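For readers more familiar with a general-purpose language, the Point class and its DistOrigin method correspond roughly to the following Python sketch. The name `dist_origin` follows Python naming style and is not part of the original notation.

```python
from math import sqrt


class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def dist_origin(self) -> float:
        """Distance of this point from the origin (cf. DistOrigin)."""
        return sqrt(self.x ** 2 + self.y ** 2)


print(Point(3, 4).dist_origin())
```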
  148. 148. Structural Object-Orientation for Geometric Modeling (1) <ul><li>Complex geometric objects </li></ul><ul><li>Boundary representation: 3D 2D 1D 0D </li></ul><ul><li>Shared subobjects: faces, lines, points </li></ul>
  149. 149. Structural Object-Orientation for Geometric Modeling (2) add class River type tuple (rname: string rshape: list(PolylineOrPolygon)) add class PolylineOrPolygon type list(Point) add class Polyline inherits PolylineOrPolygon ... add class Polygon inherits PolylineOrPolygon ... add class Point type tuple (x: real y: real)
  150. 150. Structural Object-Orientation for Application Modeling (1) <ul><li>Complex geo-objects </li></ul><ul><li>Example: city - districts - streets </li></ul>
  151. 151. Structural Object-Orientation for Application Modeling (2) add class City type tuple (cname: string cpopulation: integer districts: set(District) cshape: Polygon) add class District type tuple (dname: string dpopulation: integer dshape: Polygon streets: set(Street)) add class Street type tuple (sname: string sshape: Polyline)
  152. 152. Behavioral Object-Orientation for Application Modeling <ul><li>Integration of application-specific data types and operations </li></ul><ul><li>add method CompPop: integer </li></ul><ul><li>in class City </li></ul><ul><li>d: District </li></ul><ul><li>p: integer </li></ul><ul><li>for each d in self->districts { </li></ul><ul><li>p = p+d->dpopulation </li></ul><ul><li>} </li></ul><ul><li>return(p) </li></ul><ul><li>add method CompShape </li></ul><ul><li>... </li></ul><ul><li>add method CompStreets </li></ul><ul><li>... </li></ul>
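The CompPop method can be sketched in Python as follows. The sketch keeps only the attributes needed for the computation, initializes the counter explicitly, and uses invented sample data. The shape attributes and the methods CompShape and CompStreets are left out because the slides do not define them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class District:
    dname: str
    dpopulation: int


@dataclass
class City:
    cname: str
    districts: List[District] = field(default_factory=list)

    def comp_pop(self) -> int:
        """Sum the population over all districts (cf. CompPop)."""
        p = 0  # explicit start value for the running total
        for d in self.districts:
            p = p + d.dpopulation
        return p


# Hypothetical sample data.
city = City("Example City", [District("North", 1200), District("South", 800)])
print(city.comp_pop())
```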
