    Stage2Raj.ppt Presentation Transcript

    • An open source DBMS for handheld devices: Stage 2. By Rajkumar Sen, IIT Bombay, under the guidance of Prof. Krithi Ramamritham
    • Outline
      • Introduction
      • Storage Management
      • Query Processing
      • Future Work
    • Introduction
      • Stage 1
      • Survey of
        • Storage Models: Flat Storage, Domain Storage, and Ring Storage
        • Query Processing issues
        • Data Synchronization
        • Concurrency Control and Recovery
      • Goals for stage 2
        • New storage models to further reduce storage cost
        • Memory cognizant query processing
        • Data Synchronization issues
        • System Implementation issues
    • Storage Management
      • Aim at compactness in representation of data
      • Existing storage models
        • Flat Storage
        • Pointer-based Domain Storage
      • In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value.
        • Can we further reduce the storage cost?
    • Storage Management
      • ID Storage :
        • An identifier for each of the domain values
        • Identifier is the ordinal value in the domain table
        • Store the identifier instead of the pointer
        • Use the identifier as an offset into the domain table
        • Extendable IDs: the length of the identifier grows and shrinks with the number of domain values
    • Storage Management
      • D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
      • Starting with 1-byte identifiers, the length grows and shrinks.
      • ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.
      • Why not bit identifiers?
        • Storage is byte addressable.
        • Packing bit identifiers in bytes increases the storage management complexity.
    • Storage Management
      • [Figure: ID Storage. Labels: Relation R, ID Values, Positional Indexing, Domain Values (v0, v1, ..., vn)]
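The ID Storage idea above can be sketched in a few lines. This is a minimal illustration, not the implementation from the presentation: the names `id_width_bytes` and `IdColumn` are hypothetical, and the value-to-ID dictionary is an in-memory convenience that the slides do not mention.

```python
def id_width_bytes(num_domain_values: int) -> int:
    """Smallest whole number of bytes that can distinguish D domain values."""
    if num_domain_values <= 1:
        return 1
    bits = (num_domain_values - 1).bit_length()  # ceil(log2(D)) for D >= 2
    return max(1, (bits + 7) // 8)               # round up to whole bytes


class IdColumn:
    """One attribute stored as a domain table plus a positional ID column."""

    def __init__(self):
        self.domain = []   # domain table: ID (ordinal position) -> value
        self.lookup = {}   # value -> ID (hypothetical helper for inserts)
        self.ids = []      # one ID per tuple, in tuple order

    def append(self, value):
        if value not in self.lookup:
            self.lookup[value] = len(self.domain)
            self.domain.append(value)
        self.ids.append(self.lookup[value])

    def value_at(self, pos):
        # The stored ID is used directly as an offset into the domain table.
        return self.domain[self.ids[pos]]

    def packed(self) -> bytes:
        # Fixed-width encoding; the width depends on the current domain size.
        width = id_width_bytes(len(self.domain))
        return b"".join(i.to_bytes(width, "little") for i in self.ids)


col = IdColumn()
for city in ["Mumbai", "Pune", "Mumbai"]:
    col.append(city)
```

Because `packed()` recomputes the width from the current domain size, crossing a boundary such as 256 values re-encodes every stored ID in this sketch.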
    • Storage Management
      • Ping Pong Effect
        • At the boundaries, ID values are reorganized when the identifier length changes
        • Frequent insertions and deletions at the boundaries might result in a lot of reorganization
        • This phenomenon should be avoided
      • No deletion of Domain values
        • Domain structure means a future insertion might reference the deleted value
        • Do not delete a domain value even if it is not referenced
      • Setting a threshold for deletion
        • Delete only if number of deletions exceeds a threshold
        • Increase the threshold when boundaries are being crossed
    • Storage Management
      • Primary Key-Foreign Key relationship
        • Primary key: A domain in itself
        • IDs for primary key values
        • Values present in child table are the corresponding primary key IDs
        • Projected foreign key column forms a Join Index
      • [Figure: Primary Key-Foreign Key Join Index. Labels: Child Table (Relation S), S.B ID Values, Parent Table (Relation R) with values v0, v1, ..., vn]
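A minimal sketch of how the projected foreign key column can act as a join index, assuming the parent table is positionally indexed so that a primary key ID is simply a row position. The table contents and the function name `join_via_index` are made up for illustration.

```python
# Parent table R: the position in the list is the primary key ID.
parent_rows = [("D10", "Sales"), ("D20", "Research")]

# Child table S: the foreign key column S.B is projected out and holds parent IDs.
child_fk_ids = [1, 0, 1]
child_rest = [("alice",), ("bob",), ("carol",)]


def join_via_index(parent_rows, child_fk_ids, child_rest):
    """Join child and parent by direct positional lookup, with no search."""
    for pos, parent_id in enumerate(child_fk_ids):
        yield child_rest[pos] + parent_rows[parent_id]


joined = list(join_via_index(parent_rows, child_fk_ids, child_rest))
```

Each child tuple reaches its parent in one lookup, which is the sense in which the ID column doubles as a join index.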
    • Storage Management
      • ID-based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$
      • Relations in a small device do not have a very high cardinality
      • The above condition is therefore true for most of the data.
      • Advantages
      • (i) Considerable saving in storage cost.
      • (ii) Efficient join between parent table and child table
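As a worked illustration of the comparison above, assume (hypothetically) a column with N = 10,000 tuples, D = 200 distinct domain values, and pointers of p = 4 bytes. The domain table is stored under both models, so only the per-tuple cost differs:

```latex
\begin{align*}
\text{ID width} &= \lceil \log_2 200 / 8 \rceil = \lceil 7.64 / 8 \rceil = 1 \text{ byte} \\
\text{Domain Storage} &= N \cdot p = 10{,}000 \times 4 = 40{,}000 \text{ bytes} \\
\text{ID Storage} &= N \cdot 1 = 10{,}000 \text{ bytes}
\end{align*}
```

Under these assumed numbers the ID column takes a quarter of the space of the pointer column. With p = 4, the identifier stays shorter than the pointer as long as D does not exceed 2^24 values.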
    • Storage Management
      • Bitmap Storage
        • Suited when the number of domain values is very small compared to the number of tuples, e.g., True, False
        • Selection on multiple attributes
      • A Data + Index Model
        • A bitmap index is created for every bitmap attribute
        • Attribute values are not stored in the base relation
        • The index can be used to retrieve the domain value of each tuple
      • Cost of Projection becomes high, as is the case with Ring Storage
      • Join index of parent table-child table possible by storing bitmaps for every primary key value
    • Storage Management
      • Bitmap Storage is not an alternative to Ring Storage
      • Indexing capabilities of both models are different
      • Depending on attribute characteristics, choose the appropriate model
      • Memory requirement for selection
        • Number of bit vectors is equal to the number of attributes that form part of the selection
        • Bit vectors are held in memory
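A small sketch of the data + index model, using Python integers as bit vectors (bit i set means tuple i has that value). The attribute names and data are invented, and the helper names are hypothetical; a device implementation would use packed byte arrays instead.

```python
def build_bitmap_index(values):
    """Map each domain value to a bit vector over tuple positions."""
    index = {}
    for pos, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << pos)
    return index


def positions(bitmap, num_tuples):
    """Tuple positions whose bit is set."""
    return [pos for pos in range(num_tuples) if (bitmap >> pos) & 1]


def value_at(index, pos):
    """Projection: recover a tuple's value by probing each bit vector."""
    for v, bitmap in index.items():
        if (bitmap >> pos) & 1:
            return v
    raise IndexError(pos)


active = build_bitmap_index([True, False, True, True])
region = build_bitmap_index(["N", "S", "S", "N"])

# Selection on two attributes: one bit vector per attribute, combined with AND.
matches = positions(active[True] & region["S"], 4)
```

The selection touches only two bit vectors, one per attribute in the predicate, while `value_at` has to probe the vectors one by one, which reflects why projection is the expensive operation in this model.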
    • Query Processing
      • Considerations
        • Minimize writes to secondary storage
        • Efficient usage of limited main memory
        • Read buffer not required
        • Main memory as write buffer
        • If read:write ratio very high, flash memory as write buffer
      • Query Plan
        • An optimal query plan is needed
        • Reduce materialization; if it is absolutely necessary, materialize in main memory
        • Bushy trees and right-deep trees are ruled out
        • Left deep tree is most suited for pipelined evaluation
        • Right operand in a left-deep tree is always a stored relation
        • Only one input is pipelined
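The pipelined left-deep shape can be illustrated with Python generators. This is a sketch under assumptions: the relations, the join predicates, and the choice of nested loop join for every operator are illustrative only.

```python
def scan(relation):
    """Leaf operator: stream tuples of a stored relation."""
    yield from relation


def nested_loop_join(left_iter, stored_right, pred):
    """Left input is pipelined; right input is always a stored relation."""
    for left in left_iter:
        for right in stored_right:
            if pred(left, right):
                yield left + right


R = [(1, "a"), (2, "b")]
S = [(1, 10), (2, 20)]
T = [(10, "x"), (20, "y")]

# Left-deep plan ((R join S) join T): no intermediate result is materialized.
r_join_s = nested_loop_join(scan(R), S, lambda l, r: l[0] == r[0])
plan = nested_loop_join(r_join_s, T, lambda l, r: l[3] == r[0])
result = list(plan)
```

Each output tuple of the lower join flows straight into the upper join, so the only stored inputs are the base relations on the right of each operator.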
    • Query Processing
      • Memory Allocation to Operators
        • Main memory is limited; we cannot assume that the entire memory is available for every operator in the left-deep tree plan
        • Can the plan be executed with the available memory?
        • If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
        • Nested loop algorithms are inefficient
        • Should memory usage be reduced to a minimum at the cost of performance?
        • Memory is increasing with every new device
        • Different devices come with different memory sizes
        • Query plans should make extensive use of memory
        • Memory must be optimally allocated among all operators
    • Query Processing
      • Operator evaluation schemes
        • Different schemes for an operator
        • All have different memory usage and cost
        • Schemes conform to left-deep tree query plan
        • Cost of a scheme is the computation time
      • Schemes for Join
        • Nested Loop Join
        • Indexed Nested Loop Join
        • Hash Join
      • Similar schemes for other operators
    • Query Processing
      • Benefit/Size of a scheme
        • Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit memory allocation
        • The minimum scheme for an operator is the scheme that has max. cost and min. memory
        • Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$, with $\min(o) = s_{\min}$
        • $\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$
        • $s_{\min}$ is the minimum scheme for operator $o$. Then,
        • $\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$
        • $\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$
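The definitions above translate directly into code. In this sketch the `Scheme` tuple, the function name, and the cost and memory figures for the three join schemes are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
from typing import NamedTuple


class Scheme(NamedTuple):
    name: str
    cost: float    # computation time, in arbitrary units
    memory: int    # memory required, in arbitrary units


def benefit_size_points(schemes):
    """Return {scheme name: (size, benefit)} relative to the minimum scheme."""
    # Minimum scheme: least memory (and, among those, highest cost).
    s_min = min(schemes, key=lambda s: (s.memory, -s.cost))
    return {
        s.name: (s.memory - s_min.memory, s_min.cost - s.cost)
        for s in schemes
    }


join_schemes = [
    Scheme("NestedLoop", cost=100, memory=2),
    Scheme("IndexedNestedLoop", cost=40, memory=6),
    Scheme("Hash", cost=25, memory=12),
]
points = benefit_size_points(join_schemes)
```

With these assumed figures the nested loop join is the minimum scheme and sits at (0, 0); the indexed nested loop join gains 60 units of cost for 4 extra units of memory, and the hash join gains 75 for 10.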
    • Query Processing
      • An operator is defined by the benefit and size of its schemes
      • Every operator is a collection of (size, benefit) points: n points for n schemes
      • [Figure: (Size, Benefit) points for an operator. Axes: Size, Benefit; points (0,0), (s1,b1), (s2,b2)]
    • Query Processing
      • Optimal Memory Allocation
        • Determine the amount of memory allocated to each operator to get maximum benefit
      • 2-Phase Approach
        • Phase 1: Query is first optimized to get a query plan
        • Phase 2: Division of memory among the operators
        • The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory is allocated in phase 2 on the basis of the cost functions of the schemes
        • Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device
    • Query Processing
      • Depending on the available memory, we need to determine the best scheme for every operator out of all possible ones
      • Schemes in phase 1 and after phase 2 need not be the same
      • Optimal division of memory involves the decision of selecting the best scheme for every operator
    • Query Processing
      • Our Solution
        • We use a heuristic to determine which operator gains the most per unit memory allocation and allocate memory to that operator
        • Gain of every operator is determined by its best possible scheme
        • Repeat the process till memory allocation is done
        • Heuristic: select the scheme that has the maximum benefit/size and allocate its memory
    • Query Processing
      • MemAllocate(M_Total) {
      • 1. M_min = Σ_{i=1..m} Memory(min(i))
      • 2. for i = 1 to m do
      • 3.   Scheme(i) = min(i)
      • 4. end for
      • 5. M_avail = M_Total - M_min
      • 6. s_best, o_best = GetBestScheme(M_avail)
      • 7. if no best scheme then return
      • 8. else {
      • 9.   M_avail = M_avail - Memory(s_best) + Memory(Scheme(o_best))
      • 10.  Scheme(o_best) = s_best
      • 11.  RemoveSchemes(s_best, o_best)
      • 12.  RecomputeBenefits(s_best, o_best)
      • 13. }
      • 14. goto step 6
      • }
      • Complexity = O(nm^2), m = no. of operators, n = no. of schemes
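A runnable sketch of the greedy allocation, under stated assumptions. The slides do not define `GetBestScheme`, `RemoveSchemes`, or `RecomputeBenefits`; here they are read as "pick the affordable scheme with the highest benefit/size ratio relative to each operator's current scheme", which makes removal and recomputation implicit. The `(name, cost, memory)` tuple layout and all figures are hypothetical.

```python
def mem_allocate(operators, total_memory):
    """operators: {operator name: [(scheme name, cost, memory), ...]}.

    Returns ({operator name: chosen scheme}, leftover memory).
    """
    # Steps 1-5: start every operator on its minimum (least-memory) scheme.
    chosen = {
        op: min(schemes, key=lambda s: (s[2], -s[1]))
        for op, schemes in operators.items()
    }
    avail = total_memory - sum(s[2] for s in chosen.values())
    if avail < 0:
        raise ValueError("not enough memory even for the minimum schemes")

    while True:
        # Step 6: best affordable upgrade across all operators.
        best = None  # (ratio, operator, scheme)
        for op, schemes in operators.items():
            current = chosen[op]
            for s in schemes:
                size = s[2] - current[2]      # extra memory needed
                benefit = current[1] - s[1]   # cost saved
                if size <= 0 or benefit <= 0 or size > avail:
                    continue  # dominated, or does not fit
                ratio = benefit / size
                if best is None or ratio > best[0]:
                    best = (ratio, op, s)
        if best is None:  # step 7: nothing left to upgrade
            return chosen, avail
        # Steps 9-10: pay the extra memory and switch schemes.
        _, op, s = best
        avail -= s[2] - chosen[op][2]
        chosen[op] = s


plan_operators = {
    "join1": [("NLJ", 100, 2), ("INLJ", 40, 6), ("Hash", 25, 12)],
    "join2": [("NLJ", 80, 2), ("Hash", 30, 10)],
}
chosen, leftover = mem_allocate(plan_operators, total_memory=16)
```

With these assumed figures, 12 units remain after the minimum schemes. The first round upgrades `join1` to the indexed nested loop join (ratio 15), the second upgrades `join2` to the hash join (ratio 6.25), and the remaining upgrade for `join1` no longer fits.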
    • Query Processing
      • Recomputation of Benefits
          • Once the operator o_best gets memory Memory(s_best), the benefit and size of all the schemes of o_best that have higher memory than s_best change.
          • New benefit and size values will be the difference between their old values and those of s_best.
      • [Figure: Benefit and Size Recomputation. Points (0,0), (s1,b1), (s2,b2). Scheme 1 has the highest benefit/size ratio; afterwards Benefit(Scheme 2) = b2 - b1 and Size(Scheme 2) = s2 - s1]
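A worked instance of the recomputation, with hypothetical values $(s_1, b_1) = (4, 60)$ and $(s_2, b_2) = (10, 75)$ for two schemes of the same operator:

```latex
\begin{align*}
b_1 / s_1 &= 60 / 4 = 15, \qquad b_2 / s_2 = 75 / 10 = 7.5
  && \text{(Scheme 1 is selected first)} \\
\mathrm{Size}(\text{Scheme 2}) &= s_2 - s_1 = 10 - 4 = 6 \\
\mathrm{Benefit}(\text{Scheme 2}) &= b_2 - b_1 = 75 - 60 = 15 \\
\text{new ratio} &= 15 / 6 = 2.5
\end{align*}
```

After the recomputation, Scheme 2 competes with the schemes of other operators at a ratio of 2.5 rather than 7.5, since only the additional memory and additional benefit over Scheme 1 count.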
    • Query Processing
      • 1-Phase Approach
      • The 2-phase solution optimally allocates memory to all the operators in the query plan.
      • However, the plan itself might be suboptimal for the given available memory.
      • The 1-phase approach takes into account memory division among operators while choosing between plans.
      • Ideally, 1-phase optimization should be done, but the optimizer becomes complex.
    • Future Work
      • Implementation Status
        • 1. Flat Storage, Domain Storage, Ring Storage, and ID Storage
        • 2. Join algorithms
      • Future Work
        • Bitmap Storage implementation
        • Algorithms for aggregation
        • Query optimizer and the iterator
        • Test using sample relations and data from handheld apps
        • Examine the feasibility of a 1-phase optimizer
        • Database Module Toolkit
        • An operator that returns first-k results of a query
        • Application specific DBMS
    • Thank You