  • Volume is high: Aqua is about 6 MB. Simple query: get the names of pesticides for crop disease x. Complex query: the amount of money earned by selling the crops.
  • Why not bit identifiers? Storage is byte addressable. Packing bit identifiers in bytes increases the storage management complexity.
  • CPU-intensive query optimization, e.g.
  • DElite_overview.ppt

    1. 1. The DELite Project: Database Support for Embedded Lightweight Devices Prof. Krithi Ramamritham
    2. 2. Outline of the talk <ul><li>Need for small footprint DBMSs </li></ul><ul><li>New Issues in Implementation </li></ul><ul><li>Project Goals </li></ul><ul><li>Review of Existing Work </li></ul><ul><li>Current Implementation Status </li></ul>
    3. 3. Small DBMSs, e.g., for Handhelds <ul><li>Small, Convenient, Carry anywhere </li></ul><ul><li>Powerful </li></ul><ul><ul><li>E.g. Simputer: 206 MHz, 32 MB SDRAM, 24 MB Flash memory, LCD display, Smart card </li></ul></ul><ul><li>Applications </li></ul><ul><ul><li>Personal Info Management </li></ul></ul><ul><ul><ul><li>E-diary </li></ul></ul></ul><ul><ul><li>Enterprise Applications </li></ul></ul><ul><ul><ul><li>Health-care, Micro-banking </li></ul></ul></ul>
    4. 4. Need for Handheld DBMS <ul><li>Handheld applications </li></ul><ul><ul><li>Volume of data is high </li></ul></ul><ul><ul><li>Simple and Complex Queries </li></ul></ul><ul><ul><ul><li>select, project, aggregate </li></ul></ul></ul><ul><ul><li>ACID properties of transactions </li></ul></ul><ul><ul><li>Require Data Privacy </li></ul></ul><ul><ul><li>Need Synchronization </li></ul></ul><ul><li>Database management techniques are needed to meet the above requirements </li></ul>
    5. 5. New Issues in Implementation <ul><li>Small DBMS vs. Disk DBMS </li></ul><ul><ul><li>Handheld DB is Flash memory based </li></ul></ul><ul><ul><ul><li>Read time is very small compared with a disk </li></ul></ul></ul><ul><ul><li>Storage model should consider small memory and computation power </li></ul></ul><ul><ul><li>Transaction management and synchronization have to consider disconnections, mobility and communication cost </li></ul></ul><ul><ul><li>Handheld Operating System provides fewer facilities </li></ul></ul><ul><ul><ul><li>E.g. no multi-threading support in PalmOS </li></ul></ul></ul><ul><ul><li>Better security measures are required as handhelds are easily stolen, damaged or lost </li></ul></ul>
    6. 6. Project Goals <ul><li>Existing work: investigations of </li></ul><ul><ul><li>Storage models </li></ul></ul><ul><ul><li>Query processing & optimization </li></ul></ul><ul><ul><li>Executor </li></ul></ul><ul><li>Proposed work </li></ul><ul><ul><li>Compression in Storage </li></ul></ul><ul><ul><li>Transaction management </li></ul></ul><ul><ul><li>Synchronization </li></ul></ul>
    7. 7. Existing Work – Review <ul><li>Storage Management </li></ul><ul><ul><li>Aim at compactness in representation of data </li></ul></ul><ul><ul><li>Limited storage could preclude any additional index </li></ul></ul><ul><ul><ul><li>Data model should try to incorporate some index information </li></ul></ul></ul><ul><li>Query Processing </li></ul><ul><ul><li>Minimize writes to secondary storage </li></ul></ul><ul><ul><li>Efficient usage of limited main memory </li></ul></ul>
    8. 8. Storage Management <ul><li>Existing storage models </li></ul><ul><ul><li>Flat Storage </li></ul></ul><ul><ul><ul><li>Tuples are stored sequentially. Duplicates not eliminated </li></ul></ul></ul><ul><ul><li>Pointer-based Domain Storage </li></ul></ul><ul><ul><ul><li>Values partitioned into domains which are sets of unique values </li></ul></ul></ul><ul><ul><ul><li>Tuples reference the attribute value by means of pointers </li></ul></ul></ul><ul><ul><ul><li>One domain shared among multiple attributes </li></ul></ul></ul>
    9. 9. Storage Management (cont) <ul><li>In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value. Can we further reduce the storage cost? </li></ul>Figure: Flat Storage versus Domain Storage. The flat relation stores values such as CSE11 in every tuple that uses them; the domain relation stores each value once, and tuples hold 4-byte pointers to it.
    10. 10. ID Based Storage Figure: Relation R holds ID values (0, 1, 2, 1, n, 0, n) that index positionally into the domain values v0, v1, ..., vn.
    11. 11. ID Based Storage <ul><li>ID Storage </li></ul><ul><ul><li>An identifier for each of the domain values </li></ul></ul><ul><ul><li>Store the smaller identifier instead of the pointer </li></ul></ul><ul><ul><li>Identifier is the positional value in the domain table. Use it as an offset into the domain table </li></ul></ul><ul><ul><li>D domain values can be distinguished by identifiers of length ⌈(log₂ D) / 8⌉ bytes. </li></ul></ul>
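The identifier scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the DELite implementation: `id_width_bytes` and `DomainTable` are hypothetical names, and the lookup dictionary is a convenience for the example that a memory-constrained device might not keep.

```python
def id_width_bytes(num_domain_values: int) -> int:
    """Bytes needed for an identifier that can distinguish the given number of values."""
    if num_domain_values < 1:
        raise ValueError("a domain needs at least one value")
    # IDs run from 0 to num_domain_values - 1; always use at least one bit.
    bits = max(1, (num_domain_values - 1).bit_length())
    return (bits + 7) // 8  # round up to whole bytes


class DomainTable:
    """Hypothetical domain table: a value's ID is its position in the list."""

    def __init__(self):
        self.values = []
        self._ids = {}  # value -> ID, kept only to speed up inserts in this sketch

    def intern(self, value):
        """Return the ID of value, appending it to the domain if it is new."""
        if value not in self._ids:
            self._ids[value] = len(self.values)
            self.values.append(value)
        return self._ids[value]

    def lookup(self, ident):
        """Positional indexing: the ID is used directly as the offset."""
        return self.values[ident]


domain = DomainTable()
dept_ids = [domain.intern(v) for v in ["CSE11", "CSE11", "IT12", "CSE11"]]
```

With up to 256 distinct values a 1-byte identifier is enough, where a pointer would typically take 4 bytes.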
    12. 12. ID Storage (cont) <ul><ul><li>Extendable IDs are used. Length of the identifier grows and shrinks depending on the number of domain values </li></ul></ul><ul><ul><li>Identifiers start at 1 byte. </li></ul></ul><ul><ul><li>To reduce reorganization of data, ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing. </li></ul></ul>
    13. 13. ID Storage (cont) <ul><li>Ping Pong Effect </li></ul><ul><ul><li>At the boundaries, ID values are reorganized when the identifier length changes </li></ul></ul><ul><ul><li>Frequent insertions and deletions at the boundaries might result in a lot of reorganization </li></ul></ul><ul><ul><li>This phenomenon should be avoided </li></ul></ul><ul><li>No deletion of Domain values </li></ul><ul><ul><li>Domain structure means a future insertion might reference the deleted value </li></ul></ul><ul><ul><li>Do not delete a domain value even if it is not referenced </li></ul></ul><ul><li>Setting a threshold for deletion of domain values </li></ul><ul><ul><li>Delete only if the number of deletions exceeds a threshold </li></ul></ul><ul><ul><li>Increase the threshold when boundaries are being crossed to reduce the ping pong effect </li></ul></ul>
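One possible reading of the threshold rule is sketched below. The slide does not give the actual policy, so the function `should_purge`, the base threshold of 4, and the boundary factor of 4 are all assumptions chosen for illustration.

```python
def id_width_bytes(num_domain_values: int) -> int:
    """Bytes needed to distinguish the given number of domain values."""
    bits = max(1, (num_domain_values - 1).bit_length())
    return (bits + 7) // 8


def should_purge(domain_size, unreferenced, base_threshold=4, boundary_factor=4):
    """Decide whether to physically delete unreferenced domain values.

    Assumed policy: purge once the number of pending deletions exceeds a
    threshold, and use a higher threshold when the purge would shrink the
    identifier length, so churn near a boundary does not trigger repeated
    reorganization of the ID column.
    """
    if unreferenced == 0:
        return False
    size_after = max(1, domain_size - unreferenced)
    crosses_boundary = id_width_bytes(size_after) < id_width_bytes(domain_size)
    threshold = base_threshold * boundary_factor if crosses_boundary else base_threshold
    return unreferenced > threshold


far_from_boundary = should_purge(domain_size=300, unreferenced=5)
near_boundary_few = should_purge(domain_size=260, unreferenced=5)
near_boundary_many = should_purge(domain_size=260, unreferenced=17)
```

Here a domain of 260 values sits just above the 1-byte limit of 256, so a handful of deletions is held back until enough accumulate to justify rewriting the ID column.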
    14. 14. ID Storage (cont) <ul><li>Primary Key-Foreign Key relationship </li></ul><ul><ul><li>Primary key is a domain in itself </li></ul></ul><ul><ul><li>IDs for primary key values </li></ul></ul><ul><ul><li>Values present in child table are the corresponding primary key IDs </li></ul></ul><ul><ul><li>Projected foreign key column forms a Join Index </li></ul></ul>Figure: Primary Key-Foreign Key Join Index. The child table's foreign key column (relation R) holds IDs (0, 1, 2, 1, n, 0, n) that index positionally into the parent table values v0, v1, ..., vn.
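The join-index idea can be illustrated with a short sketch. The table contents and the function name `join_via_ids` are made up for the example; the point is only that a stored foreign key ID is already the parent row's position, so the join needs no search.

```python
# Parent table: the position in the list is the primary key ID.
parent_crops = ["wheat", "rice", "cotton"]

# Child table, with the foreign key column projected out and stored separately.
child_fk_ids = [0, 2, 2, 1]
child_amounts = [10, 20, 30, 40]


def join_via_ids(parent_rows, fk_ids, child_rows):
    """Join child rows to their parent rows by direct positional lookup."""
    for fk, child in zip(fk_ids, child_rows):
        yield parent_rows[fk], child


joined = list(join_via_ids(parent_crops, child_fk_ids, child_amounts))
```

Each child row costs one array access, with no comparison of key values and no separate index structure.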
    15. 15. ID Storage (cont) <ul><li>ID based Storage wins over Domain Storage when pointer size > ⌈(log₂ D) / 8⌉ bytes </li></ul><ul><li>Relations in a small device do not have a very high cardinality. </li></ul><ul><li>The above condition holds for most of the data. </li></ul><ul><li>Advantages of ID storage </li></ul><ul><ul><li>Considerable saving in storage cost. </li></ul></ul><ul><ul><li>Efficient join between parent table and child table </li></ul></ul>
    16. 16. Query Processing <ul><li>Considerations </li></ul><ul><ul><li>Minimize writes to secondary storage </li></ul></ul><ul><ul><li>Use Main memory as write buffer </li></ul></ul><ul><li>Need for Left-deep Query Plan </li></ul><ul><ul><li>Reduce materialization in flash memory. If absolutely necessary use main memory </li></ul></ul><ul><ul><li>Bushy trees use materialization </li></ul></ul><ul><ul><li>Left deep tree is most suited for pipelined evaluation </li></ul></ul><ul><ul><li>Right operand in a left-deep tree is always a stored relation </li></ul></ul>
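A left-deep pipelined plan can be sketched with Python generators. This is only an illustration of the shape of such a plan, with invented relations and a hypothetical `nested_loop_join` helper: the left input is a stream of tuples, the right input is always a stored relation, and no intermediate result is materialized.

```python
def scan(relation):
    """Stream the tuples of a stored relation one at a time."""
    yield from relation


def nested_loop_join(left_stream, stored_relation, predicate):
    """Pipelined join: the right operand is a stored relation that is rescanned."""
    for left in left_stream:
        for right in stored_relation:
            if predicate(left, right):
                yield left + right  # passed straight to the next operator


crops = [(1, "wheat"), (2, "rice")]                 # (crop_id, name)
diseases = [(1, "rust"), (2, "blast"), (1, "smut")]  # (crop_id, disease)
pesticides = [("rust", "P1"), ("blast", "P2")]      # (disease, pesticide)

# ((crops JOIN diseases) JOIN pesticides): a left-deep tree.
plan = nested_loop_join(
    nested_loop_join(scan(crops), diseases, lambda l, r: l[0] == r[0]),
    pesticides,
    lambda l, r: l[3] == r[0],
)
result = list(plan)
```

Only one tuple per operator is held in memory at a time, which is why the slide pairs left-deep trees with minimal materialization; the price is the repeated scan of each stored relation.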
    17. 17. Query Processing (cont) <ul><li>Need for optimal memory allocation </li></ul><ul><ul><li>Using nested loop algorithms for every operator ensures that the minimum amount of memory is used to execute the plan </li></ul></ul><ul><ul><li>Nested loop algorithms are inefficient </li></ul></ul><ul><ul><li>Different devices come with different memory sizes </li></ul></ul><ul><ul><li>Query plans should make efficient use of memory. Memory must be optimally allocated among all operators </li></ul></ul><ul><li>Need to generate the best query execution plan depending on the available memory </li></ul>
    18. 18. Query Processing (cont) <ul><li>Operator evaluation schemes </li></ul><ul><ul><li>Different schemes for an operator </li></ul></ul><ul><ul><li>Schemes conform to left-deep tree query plan </li></ul></ul><ul><ul><li>All have different memory usage and cost </li></ul></ul><ul><ul><li>Cost of a scheme is the computation time </li></ul></ul>
    19. 19. Query Processing (cont) <ul><li>2-Phase optimizer </li></ul><ul><ul><li>Phase 1: Query is first optimized to get a query plan </li></ul></ul><ul><ul><li>Phase 2: Division of memory among the operators </li></ul></ul><ul><ul><li>The scheme for every operator is determined in phase 1 and remains unchanged after phase 2. Memory allocation in phase 2 is based on the cost functions of the schemes </li></ul></ul><ul><ul><li>Memory is assumed to be available for all the schemes. This may not be true for a resource-constrained device </li></ul></ul><ul><li>Traditional 2-phase optimization cannot be used </li></ul>
    20. 20. Query Processing (cont) <ul><li>1-Phase optimizer </li></ul><ul><ul><li>Query optimizer is made memory cognizant </li></ul></ul><ul><ul><li>Modified optimizer takes into account division of memory among operators while choosing between plans </li></ul></ul><ul><ul><li>Ideally, 1-phase optimization should be done but the optimizer becomes complex. </li></ul></ul>
    21. 21. Query Processing (cont) <ul><li>Modified 2-phase optimizer </li></ul><ul><ul><li>Optimal division of memory involves the decision of selecting the best scheme for every operator </li></ul></ul><ul><ul><li>Phase 1: </li></ul></ul><ul><ul><ul><li>Determine the optimal left-deep join order using a dynamic programming approach </li></ul></ul></ul><ul><ul><li>Phase 2: </li></ul></ul><ul><ul><ul><li>Divide memory among the operators </li></ul></ul></ul><ul><ul><ul><li>Choose the scheme for every operator depending on the memory allocated </li></ul></ul></ul>
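Phase 2 can be pictured as choosing one scheme per operator so that the total memory fits the budget and the total cost is lowest. The slides do not spell out the allocation algorithms, so the sketch below simply enumerates every combination; the function `allocate_exact` and all memory and cost figures are hypothetical.

```python
from itertools import product


def allocate_exact(operators, memory_budget):
    """Pick one (memory, cost) scheme per operator, minimizing total cost.

    operators is a list with one entry per operator in the plan; each entry
    is a list of (memory_needed, cost) schemes. Returns (total_cost, choice),
    or None if no combination fits in the budget.
    """
    best = None
    for choice in product(*operators):
        memory = sum(mem for mem, _ in choice)
        cost = sum(cost for _, cost in choice)
        if memory <= memory_budget and (best is None or cost < best[0]):
            best = (cost, choice)
    return best


# Two operators; more memory buys a cheaper evaluation scheme for each.
operators = [
    [(1, 100), (4, 40), (8, 10)],  # schemes for operator 1
    [(1, 80), (3, 30)],            # schemes for operator 2
]
tight = allocate_exact(operators, memory_budget=2)
medium = allocate_exact(operators, memory_budget=5)
roomy = allocate_exact(operators, memory_budget=11)
```

Exhaustive enumeration grows exponentially with the number of operators, which is one reason a heuristic allocator is attractive on a slow handheld CPU.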
    22. 22. Query Processing (cont) <ul><li>Memory allocation algorithms </li></ul><ul><ul><li>Exact memory allocation </li></ul></ul><ul><ul><li>Heuristic memory allocation </li></ul></ul><ul><li>Conclusions </li></ul><ul><ul><li>Response times are highest with minimum memory and lowest with maximum memory </li></ul></ul><ul><ul><li>Computing power of the handheld affects the response time in a big way </li></ul></ul><ul><ul><li>Heuristic memory allocation differed from the exact algorithm at only a few points </li></ul></ul>
    23. 23. Compression in DB <ul><li>Advantages </li></ul><ul><ul><li>Saves space </li></ul></ul><ul><ul><li>Reduces read time and write time as less data is processed </li></ul></ul><ul><ul><li>Logging consumes less space and time </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>CPU intensive </li></ul></ul><ul><ul><li>Competes with other CPU intensive DBMS tasks. </li></ul></ul><ul><ul><li>May slow down the DBMS </li></ul></ul>
    24. 24. Compression in Disk DB <ul><li>Main assumption </li></ul><ul><ul><li>The high disk read time compensates for the extra time required for compression and decompression </li></ul></ul><ul><ul><li>E.g. let the time taken to read 10 blocks of data from the disk be 10ms, and the time taken for compression and decompression be 5ms. After compression the 10 blocks occupy only 1 block. </li></ul></ul><ul><ul><li>Processing time with compression/decompression </li></ul></ul><ul><ul><ul><li> = (1ms + 5ms) = 6ms, versus 10ms without compression </li></ul></ul></ul><ul><li>Handheld DB is Flash memory based </li></ul><ul><ul><li>Read time is very small. The above assumption is no longer valid! </li></ul></ul>
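The arithmetic on this slide can be replayed in code to show why the assumption breaks on flash. The disk figures are the slide's own; the flash read time of 1ms for the same 10 blocks is an assumed value used purely for illustration.

```python
def time_with_compression(read_ms, compression_ratio, codec_ms):
    """Read time for the compressed data plus compression/decompression time."""
    return read_ms / compression_ratio + codec_ms


# Disk, using the slide's numbers: 10 blocks in 10ms, 10:1 compression, 5ms codec.
disk_plain = 10.0
disk_compressed = time_with_compression(10.0, 10, 5.0)

# Flash: ASSUMED read time of 1ms for the same 10 blocks (not from the slides).
flash_plain = 1.0
flash_compressed = time_with_compression(1.0, 10, 5.0)

compression_pays_on_disk = disk_compressed < disk_plain
compression_pays_on_flash = flash_compressed < flash_plain
```

Under these numbers compression saves time on disk but not on flash, because the codec time dominates once reads are cheap. Compression can still be worthwhile on a handheld for the space it saves.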
    25. 25. Transaction Management <ul><li>Ensure ACID properties of local and global transactions </li></ul><ul><ul><li>Local transaction - Update address book entry in Simputer </li></ul></ul><ul><ul><li>Global transaction - Transfer money from a bank account to an epurse in a smart card attached to a Simputer </li></ul></ul><ul><li>Issues </li></ul><ul><ul><li>Frequent disconnections, resource constraints, mobility, loss or damage to handheld </li></ul></ul>
    26. 26. Synchronization <ul><li>Access data Anytime and Anywhere using the handheld </li></ul><ul><ul><li>Mobile salesperson, wireless warehouse </li></ul></ul><ul><li>Problem – Not possible to remain connected always </li></ul><ul><li>Solution - Replicate data in the handheld </li></ul><ul><ul><li>Download a copy of the data into the handheld from the remote server and process it offline. Periodically merge the changes with the server </li></ul></ul>
    27. 27. Synchronization - Issues <ul><li>Data replication can lead to conflicts </li></ul><ul><ul><li>Update-update, Update-delete, Unique key violation, Integrity constraint violation </li></ul></ul><ul><li>Maintain global consistency between replicated copies </li></ul><ul><ul><li>Strict consistency with Data partitioning </li></ul></ul><ul><ul><li>Strict consistency with Reservation protocols or Leases </li></ul></ul><ul><ul><ul><li>Efficient when data is rarely shared </li></ul></ul></ul><ul><ul><li>Weak consistency with Eventual consistency </li></ul></ul><ul><ul><ul><li>Leases are restrictive when data is shared between many copies </li></ul></ul></ul><ul><ul><ul><li>Independently access and update data </li></ul></ul></ul><ul><ul><ul><li>Only tentative commits are possible on the handheld </li></ul></ul></ul><ul><ul><ul><li>Actual commit when the transaction is executed at the server </li></ul></ul></ul>
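Two of the conflict types listed above can be detected by comparing the changes made on the handheld with those made on the server since the last merge. The sketch below assumes a hypothetical change-log format, a dictionary from row key to an `(operation, value)` pair; unique key and integrity constraint violations need schema knowledge and are not covered.

```python
def detect_conflicts(handheld_changes, server_changes):
    """Classify rows changed on both sides since the last synchronization.

    Each argument maps a row key to (operation, value), where operation is
    "update" or "delete" and value is None for deletes.
    """
    conflicts = {}
    for key in handheld_changes.keys() & server_changes.keys():
        h_op, h_value = handheld_changes[key]
        s_op, s_value = server_changes[key]
        if h_op == "update" and s_op == "update":
            if h_value != s_value:  # identical updates are not a conflict
                conflicts[key] = "update-update"
        elif {h_op, s_op} == {"update", "delete"}:
            conflicts[key] = "update-delete"
    return conflicts


handheld = {
    "acct-1": ("update", 500),
    "acct-2": ("update", 75),
    "acct-3": ("delete", None),
    "acct-4": ("update", 10),
}
server = {
    "acct-1": ("update", 450),
    "acct-2": ("delete", None),
    "acct-3": ("delete", None),
}
conflicts = detect_conflicts(handheld, server)
```

Rows changed on only one side, such as `acct-4`, merge cleanly; a row deleted on both sides is treated as already consistent.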
    28. 28. Conclusions <ul><li>Handheld DBMS techniques have to consider the resource constraints, mobility, frequent disconnections, and security aspects of the handheld </li></ul><ul><li>The techniques used for one component will influence the choice of the technique used in another component. There is a very strong interdependence between the components of the handheld DBMS </li></ul><ul><li>Techniques rejected for the disk environment may be explored in the handheld environment </li></ul>
    29. 29. Future work <ul><li>Sync tool </li></ul><ul><li>Transaction management component </li></ul><ul><li>Recovery management component </li></ul><ul><li>Concurrency control component </li></ul><ul><li>Performance analysis of existing compression techniques in handheld environment </li></ul>