Stage1Raj.ppt Presentation Transcript

  • An open source DBMS for handheld devices by Rajkumar Sen
  • Outline
    • Introduction and Motivation
    • System Architecture
    • Storage Model and Index
    • Query Processing
    • Data Synchronization
    • Transaction Management
    • Summary and Proposed Future Work
  • Introduction and Motivation
    • What is a handheld device
      • A small computer with limited resources
      • e.g. Simputer, Palm devices, iPAQ etc.
    • The Simputer
      • One of the most powerful handhelds
      • Low cost and shareable
      • Developed at IISc
      • Intel StrongArm processor
      • Flash Memory for stable storage (24MB)
      • Limited main memory (32MB)
      • Recognizes Smartcards
      • Linux operating system
  • Introduction and Motivation
    • Why DBMS for a Simputer
      • Increasing number of applications
      • e.g. Microbanking, E-governance, Agricultural market pricing etc.
      • They deal with a fair amount of data
      • Complex queries involving joins and aggregates
      • Atomicity and Durability for data consistency
      • Ease of application development
    • Need for Synchronization
      • Data from remote server downloaded on the Simputer
      • Updates at both places
      • Common data needs to be synchronized
    • The DBMS should be open source since the Simputer is designed to be low-cost
  • Introduction and Motivation
    • No open source DBMS designed for handheld devices
      • Oracle Lite and DB2 Everyplace are not open source
      • MySQL and BerkeleyDB primarily designed for disk-based systems
    • Data in Flash Memory
      • Access time lower than that of disk
      • Random access as good as sequential access
    • System Constraints
      • Limited storage capacity
      • Less main memory
      • Slow writes in Flash Memory
  • Introduction and Motivation
    • Goal is to design a system that consists of
      • A lightweight relational DBMS that resides on the Simputer
      • An application that downloads data from a remote server
      • A synchronization tool
    • Database Module Toolkit
      • Developing the building blocks as modules
      • Modules can be plugged depending on the type of application
  • System Architecture
    • Key components of the system
      • Data and Metadata Manager
        • Representation of relational data and data structures for metadata
      • Main Memory Manager
        • Manages data to be kept in memory
      • Transaction Manager
        • Ensures data consistency
      • Access Right Manager
        • Security rules needed as Simputer is shareable
      • Query Engine
        • SQL compiler and query processing
      • FetchRelation Module
        • Fetches data from a remote server
      • Synchronization Module
        • Synchronizes common data in Simputer and remote server
  • System Architecture
    Figure: System Architecture
  • Storage Model and Indexing
    • Aim at compactness in representation of data as well as index
    • Existing models
      • Flat Storage
        • Tuples are stored sequentially; ensures access locality but consumes space
        • Access locality is not an issue since the data resides in flash memory
      • Pointer-based Domain Storage
        • Eliminates duplicates
        • Values partitioned into domains which are sets of unique values
        • Tuples reference the attribute value by means of pointers
        • One domain shared among multiple attributes
        • No index
      • Pointer-based Ring Storage (Bobineau et al)
        • Modification to Domain Storage
        • Uses domain structure as index
        • Tuple-to-tuple pointers connect two tuples that have same attribute value
        • Index structure in the form of a ring
  • Storage Model and Indexing
    • Ring Storage Model contd.
      • Advantages
        • Addresses data and index compactness
        • Join becomes easy
      • Drawbacks
        • To project the value of an attribute, half of the ring must be traversed on average
    • Some other Models
      • DBGraph (Pucheral et al)
        • Maintains value-to-tuple as well as tuple-to-value pointers
        • Too many pointers
      • Domain Tree (Missikov et al)
        • Maintains domain trees; even a simple projection needs to scan all the domain trees
      • Another model (Krithi et al), suggested for data warehouses, uses a projection index and a join index
  • Storage Model and Indexing
    • Our Model
      • Based on Domain Storage Model
      • Each domain element is a (length, value) tuple
      • Domains used for duplicates and variable width attributes
      • For a primary key-foreign key relationship, instead of storing the value in the child table we store a pointer to the tuple in the parent table that contains the value
    Figure: Our storage model
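The domain-storage scheme above can be sketched in Python. This is an illustrative sketch, not the system's code; the names (`Domain`, `intern`, `deref`) and the sample relations are invented for the example.

```python
# Illustrative sketch of the domain storage model: duplicate attribute
# values live once in a shared domain, and tuples hold pointers (here:
# indices) into the domain instead of the values themselves.

class Domain:
    """A set of unique values, shareable by multiple attributes."""
    def __init__(self):
        self.values = []     # each entry plays the role of a (length, value) element
        self.index_of = {}   # value -> position, to detect duplicates on insert

    def intern(self, value):
        if value not in self.index_of:
            self.index_of[value] = len(self.values)
            self.values.append(value)
        return self.index_of[value]   # the "pointer" a tuple stores

    def deref(self, ptr):
        return self.values[ptr]

# One domain shared by an attribute with duplicates:
city_domain = Domain()
customers = [("alice", city_domain.intern("Bangalore")),
             ("bob",   city_domain.intern("Bangalore")),   # duplicate stored once
             ("carol", city_domain.intern("Mumbai"))]

# Primary key-foreign key case: the child tuple stores a pointer to the
# parent tuple rather than repeating the key value.
orders = [("o1", 0),   # order o1 -> customers[0]
          ("o2", 2)]   # order o2 -> customers[2]

for oid, parent_ptr in orders:
    name, city_ptr = customers[parent_ptr]
    print(oid, name, city_domain.deref(city_ptr))
```

Dereferencing costs one pointer chase per access, which the slides argue is cheap since the data sits in memory or flash rather than on disk.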
  • Storage Model and Indexing
    • Index structure
      • Sorted array of Tuple Identifier List (pointers to tuples)
      • Each List corresponds to a value, consists of pointers to those tuples that share it
      • Ordering preserved in the index structure, hence efficient execution of range queries
    Figure: Our index model
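A minimal sketch of this index structure, with invented Python names (`SortedTidListIndex`): a sorted array of values, each carrying a list of tuple identifiers, so a range query reduces to two binary searches.

```python
# Sketch of the index described above: a sorted array of attribute values,
# each paired with the list of tuple identifiers sharing that value.
import bisect

class SortedTidListIndex:
    def __init__(self):
        self.keys = []   # sorted attribute values
        self.tids = []   # tids[i] = list of tuple ids whose value is keys[i]

    def insert(self, value, tid):
        i = bisect.bisect_left(self.keys, value)
        if i < len(self.keys) and self.keys[i] == value:
            self.tids[i].append(tid)          # value already present
        else:
            self.keys.insert(i, value)        # keep keys sorted
            self.tids.insert(i, [tid])

    def range(self, lo, hi):
        """All tuple ids with lo <= value <= hi: two binary searches, one scan."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [tid for lst in self.tids[i:j] for tid in lst]

idx = SortedTidListIndex()
for tid, age in enumerate([30, 25, 30, 40]):
    idx.insert(age, tid)
print(idx.range(25, 30))   # tuple ids with age in [25, 30]
```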
  • Storage Model and Indexing
    • Our model eliminates these problems at the cost of storing some additional pointers
      • Data storage separate from index structure, hence projection is easy
      • Index created on attribute rather than domains
      • For primary key-foreign key relation, join requires scanning of only child table
      • For domain based join attributes, join indices can be built
    Figure: Join index
  • Query Processing
    • Entire data is memory resident
      • Aim to reduce number of tuples read from memory
    • Join without indices
    • Assuming R and S are the outer and inner relations with n and m tuples
      • Nested loop join
        • Most reasonable choice when very little memory is available
        • No additional structure
        • Cost of join is mn
      • Hash Join
        • Requires the creation of hash partitions in main memory
        • If memory permits, hash join is faster
        • Instead of storing values in the partitions, we store pointers to tuples
        • Cost = (m + n) // for building the hash partitions
               + (m + n) // for reading the hash partitions
        • (m + n) writes to memory and (m + n) pointer chases
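The pointer-based hash join above can be sketched as follows (illustrative Python, invented names): the partitions hold tuple pointers (here, list indices) rather than tuple values, so building them writes (m + n) pointers and probing them back costs (m + n) pointer chases.

```python
# Sketch of a hash join whose partitions store tuple pointers, not values.
from collections import defaultdict

def hash_join(R, S, r_key, s_key):
    # Build: one pass over each relation, storing only pointers (indices).
    buckets = defaultdict(lambda: ([], []))
    for i, r in enumerate(R):
        buckets[r[r_key]][0].append(i)
    for j, s in enumerate(S):
        buckets[s[s_key]][1].append(j)
    # Probe: read the partitions back, chasing pointers into R and S.
    out = []
    for r_ptrs, s_ptrs in buckets.values():
        for i in r_ptrs:
            for j in s_ptrs:
                out.append(R[i] + S[j])
    return out

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y"), (3, "z")]
print(hash_join(R, S, r_key=0, s_key=0))
```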
  • Query Processing
    • Sort-Merge Join
      • Needs memory for sorting
      • While sorting, do not store the tuples but only pointers to the tuples
      • Cost = (m log m + n log n) // for sorting
             + (m + n) // for reading during the merge
      • (m + n) writes to memory and (m + n) pointer chases
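Likewise, the sort-merge variant above sorts arrays of tuple pointers rather than the tuples themselves; a sketch (illustrative Python, names invented):

```python
# Sketch of a sort-merge join over pointer arrays: sorting touches only
# indices (m log m + n log n), then one merge pass (m + n) produces output.
def sort_merge_join(R, S, r_key, s_key):
    pr = sorted(range(len(R)), key=lambda i: R[i][r_key])   # pointers into R
    ps = sorted(range(len(S)), key=lambda j: S[j][s_key])   # pointers into S
    out, i, j = [], 0, 0
    while i < len(pr) and j < len(ps):
        rv, sv = R[pr[i]][r_key], S[ps[j]][s_key]
        if rv < sv:
            i += 1
        elif rv > sv:
            j += 1
        else:
            j0 = j
            while j < len(ps) and S[ps[j]][s_key] == rv:
                out.append(R[pr[i]] + S[ps[j]])
                j += 1
            i += 1
            if i < len(pr) and R[pr[i]][r_key] == rv:
                j = j0          # rewind S for duplicate keys in R
    return out

R = [(2, "b"), (1, "a")]
S = [(3, "z"), (2, "x")]
print(sort_merge_join(R, S, 0, 0))
```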
    • Join with Indices
      • Indexed Nested Loop Join
        • The relation on which an index is available is treated as the inner relation
        • Cost = n // for reading the outer relation R
               + n log m // for each tuple of R, index into S
  • Query Processing
      • Sort-Merge Join
        • If indices are available on both join attributes, sorting is not needed
        • Cost = m + n // for merging
        • Some cost is involved in traversing the index and chasing pointers
      • Join Indices
        • A single traversal of the index, chasing pointers, gives the tuples that are candidates for joining
    • Other operations
      • Selection, Projection and Sorting are straightforward
      • Duplicate elimination, aggregation, union, intersection, and difference can be computed by sorting and building hash indices on relations
  • Query Processing
    • Query Plan Generation and Memory Allocation
      • An optimal query execution plan is needed
      • To reduce materialization, left-deep and bushy trees are ruled out
      • Choose between a right-deep and an extreme right-deep tree depending on the memory available for storing intermediate results
      • The query optimizer has to be memory cognizant
      • Memory must be optimally allocated among all operators, since we cannot assume the entire memory is available to each operator
      • Arvind et al. suggest allocating memory to operators based on cost functions
      • Aggregation, sorting and duplicate removal are generally performed on materialized results
      • By enforcing a particular tuple arrival order at the leaves of the plan tree, pipelining of aggregation etc. is possible
  • Synchronization
    • Main data resides on a remote server
      • Information in the form of relations in a remote DBMS
      • Relevant information is downloaded on the Simputer
      • Copies of data present on both Simputer and remote server
      • Need for synchronization
    • Two scenarios in data synchronization
    • 1. Relation downloaded on the Simputer
      • Download the entire relation R only once
      • Subsequently, transfer only the differentials
      • R_simp_new = R_simp_old ∪ ΔR_main
      • R_main_new = R_main_old ∪ ΔR_simp
    • A problem arises when the same tuple is updated at both places: a Tuple Conflict
      • Detection is harder than resolution
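A toy sketch of the differential rule (R_simp_new = R_simp_old ∪ ΔR_main), modelling relations as Python sets of tuples; the helper names and sample data are invented for the example.

```python
# Sketch: after the initial download, only the differential (tuples changed
# at the other side) crosses the link, not the whole relation.
def apply_differential(local, delta):
    """Union the remote differential into the local copy of the relation."""
    return local | delta

r_simp = {(1, "a"), (2, "b")}      # copy downloaded onto the Simputer
delta_main = {(3, "c")}            # tuples inserted at the remote server since then
r_simp = apply_differential(r_simp, delta_main)

# A tuple conflict: the same tuple updated on both sides. Detecting it
# (same key, different non-key values in both differentials) is the hard part.
def conflicting_keys(delta_a, delta_b, key=lambda t: t[0]):
    return {key(t) for t in delta_a} & {key(t) for t in delta_b}

print(conflicting_keys({(2, "B")}, {(2, "bb")}))
```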
  • Synchronization
    • 2. Resultset downloaded on the Simputer
      • Execute the query at the remote server and transfer only once
      • Subsequently transfer only the differentials
      • q = (R ⋈ S)
      • q_new = q_old ∪ (i_R ⋈ S), where i_R is the set of tuples inserted into R
      • q_new = q_old − (d_R ⋈ S), where d_R is the set of tuples deleted from R
      • For each query, find out what differential to compute.
        • Resembles the Incremental View Maintenance Problem
      • In-place updates on the resultset in the Simputer mean changes have to be propagated to the remote database
        • Resembles the View Update Problem
      • Updates to downloaded relations and resultsets are recorded in an Intention List and an Intention View List, respectively
      • Determine whether the changes can be reflected
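The incremental rules above (q_new = q_old ∪ (i_R ⋈ S) and q_new = q_old − (d_R ⋈ S)) can be illustrated with relations as Python sets; the `join` helper and sample data are invented for the example.

```python
# Sketch of incremental maintenance of a downloaded resultset q = R join S:
# instead of recomputing q, join only the inserted (i_R) or deleted (d_R)
# tuples of R with S and patch q.
def join(A, B):
    # join on the last column of A and the first column of B
    return {a + b for a in A for b in B if a[-1] == b[0]}

S = {(10, "x"), (20, "y")}
R_old = {(1, 10)}
q = join(R_old, S)          # initial download of the resultset

i_R = {(2, 20)}             # tuples inserted into R at the server
q = q | join(i_R, S)        # q_new = q_old ∪ (i_R join S)

d_R = {(1, 10)}             # tuples later deleted from R
q = q - join(d_R, S)        # q_new = q_old − (d_R join S)
print(q)
```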
  • Synchronization
    • View Update Considerations
      • Only legal view updates
      • Only legal database updates
      • Only underlying tuples affected
      • Insertion of extra tuples
      • Modification of other view tuples
      • Keller suggests translations for inserts, deletes, and modifications
        - Restricts the relations to BCNF
        - Joins should have an inclusion dependency
        - Our remote server relations need not satisfy these restrictions
      • Langerak suggests translations based on extension joins
        - No restriction on normal forms and joins
        - Can be the basis of our synchronization engine
    • Most of the rules need data and metadata about the base relations
      • - Remote database connection needed to validate updates
  • Transaction Management
    • ACID properties need to be maintained
      • Global atomicity
        - Data updates in both the Simputer and the smartcard
        - The Simputer participates in distributed transactions
      • The Simputer is a single-user system, so why Concurrency Control?
        - Transactions on behalf of data synchronization
        - Long transactions doing some aggregation
    • Concurrency Control
      • Most of the transactions are expected to complete fast
      • Lock contention is reduced, hence large locking granules can be used
      • Korth et al. propose a serial protocol; the main aim is to improve response time rather than throughput
      • Instead of hash tables, bits can be used for lock information
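A minimal sketch of the bit-based lock idea above, assuming one lock bit per (large) locking granule; the class name and API are invented for the example.

```python
# Sketch: with large locking granules (say, one per relation), the whole
# lock table fits in a small bit vector instead of a hash table.
class BitLockTable:
    def __init__(self, n_granules):
        self.n = n_granules
        self.bits = 0                      # bit i set => granule i is locked

    def try_lock(self, i):
        assert 0 <= i < self.n
        if self.bits & (1 << i):
            return False                   # contention: already locked
        self.bits |= 1 << i
        return True

    def unlock(self, i):
        self.bits &= ~(1 << i)

locks = BitLockTable(8)
assert locks.try_lock(3)       # a user transaction locks granule 3
assert not locks.try_lock(3)   # a synchronization transaction must wait
locks.unlock(3)
assert locks.try_lock(3)
```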
  • Transaction Management
    • Commit processing
      • Shadow updates
          • Data locality not an issue since data in memory
          • Not suitable for pointer-based storage models: if one domain value changes, its location also changes and all tuple pointers need to be updated
          • The size of the shadow is also a concern
      • Log-based updates
          • Better suited for pointer-based models
          • Cost of maintaining the logs
          • Pointer-based logging instead of value-based logging
    • Limited memory, hence a Steal buffer replacement policy
      • Dirty blocks are output to smartcard stable storage when connected
      • If all the dirty blocks of a transaction have been output to the smartcard, no Undo is needed
    • More detailed survey left as future work
  • Summary and Proposed Future Work
    • The discussion and solutions proposed can be generalized to any handheld device
    • However, device-specific optimizations are possible
    • Future Work
      • More detailed survey of concurrency control and recovery
      • Protocols for validation of updates on downloaded data
      • Query engine needs to be designed
      • Access Rights Management issues need to be addressed
      • Database module toolkit
      • Implement the system on the Simputer
      • Evaluate the system
  • Thank You
  • References
    • 1. A. Ammann, M. Hanrahan, and R. Krishnamurthy. Design of a Memory Resident DBMS. In IEEE COMPCON, 1985.
    • 2. C. Bobineau, L. Bouganim, P. Pucheral, and P. Valduriez. PicoDBMS: Scaling down Database Techniques for the Smartcard. In VLDB, 2000.
    • 3. Stephen Blott and Henry F. Korth. An Almost Serial Protocol for Transaction Execution in Main Memory Database Systems. In VLDB, 2002.
    • 4. DB2 Everyplace.
    • 5. Anindya Datta, Debra VanderMeer, Krithi Ramamritham, and Bongki Moon. Applying Parallel Processing Techniques in Data Warehousing and OLAP. In VLDB, 1999.
    • 6. A. Hulgeri, S. Sudarshan, and S. Seshadri. Memory Cognizant Query Optimization. In Advances In Data Management, 2000.
  • References
    • 7. Arthur M. Keller. Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins. In ACM PODS, 1985.
    • 8. Rom Langerak. View Updates in Relational Databases with an Independent Scheme. In ACM PODS, 1990.
    • 9. T. Lehman and M. Carey. A Study of Index Structures for Main Memory DBMS. In VLDB, 1986.
    • 10. M. Missikov and M. Scholl. Relational Queries in a Domain Based DBMS. In ACM SIGMOD, 1983.
    • 11. MySQL.
    • 12. P. Pucheral, P. Valduriez, and J.M. Thevenin. Efficient Main Memory Data Management using the DBGraph Storage Model. In VLDB, 1990.
    • 13. The Simputer.
  • Storage Model and Indexing
    • Aim at compactness in the representation of data as well as the index
    • Existing models
      • Flat Storage
        • Tuples are stored sequentially; ensures access locality but consumes space
        • Access locality is not an issue since the data resides in flash memory
      • Pointer-based Domain Storage
        • Eliminates duplicates
        • Values partitioned into domains, which are sets of unique values
        • Tuples reference the attribute value by means of pointers
        • One domain shared among multiple attributes
        • No index
    Figure: Domain Storage
  • Storage Model and Indexing
    • Pointer-based Ring Storage (Bobineau et al)
      • Modification to Domain Storage
      • Uses domain structure as index
      • Tuple-to-tuple pointers connect two tuples that have same attribute value
      • Index structure in the form of a ring
    Figure: Ring Storage