Stage1Raj.ppt

  1. 1. An open source DBMS for handheld devices by Rajkumar Sen
  2. 2. Outline <ul><li>Introduction and Motivation </li></ul><ul><li>System Architecture </li></ul><ul><li>Storage Model and Index </li></ul><ul><li>Query Processing </li></ul><ul><li>Data Synchronization </li></ul><ul><li>Transaction Management </li></ul><ul><li>Summary and Proposed Future Work </li></ul>
  3. 3. Introduction and Motivation <ul><li>What is a handheld device? </li></ul><ul><ul><li>A small computer with limited resources </li></ul></ul><ul><ul><li>e.g. Simputer, Palm devices, iPAQ, etc. </li></ul></ul><ul><li>The Simputer </li></ul><ul><ul><li>One of the most powerful handhelds </li></ul></ul><ul><ul><li>Low cost and shareable </li></ul></ul><ul><ul><li>Developed at IISc </li></ul></ul><ul><ul><li>Intel StrongARM processor </li></ul></ul><ul><ul><li>Flash memory for stable storage (24 MB) </li></ul></ul><ul><ul><li>Limited main memory (32 MB) </li></ul></ul><ul><ul><li>Recognizes smartcards </li></ul></ul><ul><ul><li>Linux operating system </li></ul></ul>
  4. 4. Introduction and Motivation <ul><li>Why a DBMS for the Simputer? </li></ul><ul><ul><li>Increasing number of applications </li></ul></ul><ul><ul><li>e.g. microbanking, e-governance, agricultural market pricing, etc. </li></ul></ul><ul><ul><li>They deal with a fair amount of data </li></ul></ul><ul><ul><li>Complex queries involving joins and aggregates </li></ul></ul><ul><ul><li>Atomicity and durability for data consistency </li></ul></ul><ul><ul><li>Ease of application development </li></ul></ul><ul><li>Need for synchronization </li></ul><ul><ul><li>Data from a remote server is downloaded to the Simputer </li></ul></ul><ul><ul><li>Updates happen at both places </li></ul></ul><ul><ul><li>Common data needs to be synchronized </li></ul></ul><ul><li>Open source, since the Simputer is designed to be low-cost </li></ul>
  5. 5. Introduction and Motivation <ul><li>No open source DBMS designed for handheld devices </li></ul><ul><ul><li>Oracle Lite and DB2 Everyplace are not open source </li></ul></ul><ul><ul><li>MySQL and BerkeleyDB are primarily designed for disk-based systems </li></ul></ul><ul><li>Data in flash memory </li></ul><ul><ul><li>Access time lower than disk </li></ul></ul><ul><ul><li>Random access as fast as sequential access </li></ul></ul><ul><li>System constraints </li></ul><ul><ul><li>Limited storage capacity </li></ul></ul><ul><ul><li>Limited main memory </li></ul></ul><ul><ul><li>Slow writes to flash memory </li></ul></ul>
  6. 6. Introduction and Motivation <ul><li>The goal is to design a system that consists of </li></ul><ul><ul><li>A lightweight relational DBMS that resides on the Simputer </li></ul></ul><ul><ul><li>An application that downloads data from a remote server </li></ul></ul><ul><ul><li>A synchronization tool </li></ul></ul><ul><li>Database Module Toolkit </li></ul><ul><ul><li>Building blocks developed as modules </li></ul></ul><ul><ul><li>Modules can be plugged in depending on the type of application </li></ul></ul>
  7. 7. System Architecture <ul><li>Key components of the system </li></ul><ul><ul><li>Data and Metadata Manager </li></ul></ul><ul><ul><ul><li>Representation of relational data and data structures for metadata </li></ul></ul></ul><ul><ul><li>Main Memory Manager </li></ul></ul><ul><ul><ul><li>Manages data to be kept in memory </li></ul></ul></ul><ul><ul><li>Transaction Manager </li></ul></ul><ul><ul><ul><li>Ensures data consistency </li></ul></ul></ul><ul><ul><li>Access Right Manager </li></ul></ul><ul><ul><ul><li>Security rules are needed since the Simputer is shareable </li></ul></ul></ul><ul><ul><li>Query Engine </li></ul></ul><ul><ul><ul><li>SQL compiler and query processing </li></ul></ul></ul><ul><ul><li>FetchRelation Module </li></ul></ul><ul><ul><ul><li>Fetches data from a remote server </li></ul></ul></ul><ul><ul><li>Synchronization Module </li></ul></ul><ul><ul><ul><li>Synchronizes common data between the Simputer and the remote server </li></ul></ul></ul>
  8. 8. System Architecture Figure: System Architecture
  9. 9. Storage Model and Indexing <ul><li>Aim at compactness in the representation of both data and index </li></ul><ul><li>Existing models </li></ul><ul><ul><li>Flat Storage </li></ul></ul><ul><ul><ul><li>Tuples are stored sequentially; this ensures access locality but consumes space </li></ul></ul></ul><ul><ul><ul><li>Access locality is not an issue, since data resides in flash memory (random access is as fast as sequential access) </li></ul></ul></ul><ul><ul><li>Pointer-based Domain Storage </li></ul></ul><ul><ul><ul><li>Eliminates duplicates </li></ul></ul></ul><ul><ul><ul><li>Values partitioned into domains, which are sets of unique values </li></ul></ul></ul><ul><ul><ul><li>Tuples reference the attribute value by means of pointers </li></ul></ul></ul><ul><ul><ul><li>One domain shared among multiple attributes </li></ul></ul></ul><ul><ul><ul><li>No index </li></ul></ul></ul><ul><ul><li>Pointer-based Ring Storage (Bobineau et al.) </li></ul></ul><ul><ul><ul><li>Modification to Domain Storage </li></ul></ul></ul><ul><ul><ul><li>Uses domain structure as index </li></ul></ul></ul><ul><ul><ul><li>Tuple-to-tuple pointers connect two tuples that have the same attribute value </li></ul></ul></ul><ul><ul><ul><li>Index structure in the form of a ring </li></ul></ul></ul>
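A minimal Python sketch of pointer-based domain storage, assuming a list position stands in for a memory pointer. The class `Domain` and its methods `intern` and `deref` are hypothetical names for illustration; the point is that each distinct value is stored once and a single domain is shared by two attributes.

```python
class Domain:
    """A set of unique values; tuples refer to values by position (a stand-in for a pointer)."""

    def __init__(self):
        self.values = []   # each distinct value is stored exactly once
        self._slot = {}    # value -> position in self.values

    def intern(self, value):
        """Return the 'pointer' for value, adding it to the domain only if it is new."""
        if value not in self._slot:
            self._slot[value] = len(self.values)
            self.values.append(value)
        return self._slot[value]

    def deref(self, ptr):
        """Follow a 'pointer' back to the stored value."""
        return self.values[ptr]


# One domain shared by two attributes (city of birth and city of residence).
cities = Domain()
rows = [("Asha", "Bangalore", "Mumbai"), ("Ravi", "Mumbai", "Mumbai"), ("Meena", "Bangalore", "Pune")]
table = [(name, cities.intern(born), cities.intern(lives)) for name, born, lives in rows]

print(cities.values)  # ['Bangalore', 'Mumbai', 'Pune']
print(table)          # [('Asha', 0, 1), ('Ravi', 1, 1), ('Meena', 0, 2)]
```

Note that the domain by itself gives no way to find the tuples holding a given value, which is the "No index" point on the slide.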
  10. 10. Storage Model and Indexing <ul><li>Ring Storage Model (contd.) </li></ul><ul><ul><li>Advantages </li></ul></ul><ul><ul><ul><li>Addresses data and index compactness </li></ul></ul></ul><ul><ul><ul><li>Joins become easy </li></ul></ul></ul><ul><ul><li>Drawbacks </li></ul></ul><ul><ul><ul><li>To project the value of an attribute, half of the ring must be traversed on average </li></ul></ul></ul><ul><li>Some other models </li></ul><ul><ul><li>DBGraph (Pucheral et al.) </li></ul></ul><ul><ul><ul><li>Maintains value-to-tuple as well as tuple-to-value pointers </li></ul></ul></ul><ul><ul><ul><li>Too many pointers </li></ul></ul></ul><ul><ul><li>Domain Tree (Missikov and Scholl) </li></ul></ul><ul><ul><ul><li>Maintains domain trees; a simple projection needs to scan all the domain trees </li></ul></ul></ul><ul><ul><li>Another model (Datta et al.), suggested for data warehouses </li></ul></ul><ul><ul><ul><li>Uses a projection index and a join index </li></ul></ul></ul>
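To see why projection is the weak spot of ring storage, the sketch below builds a ring for one attribute and counts the pointers followed to recover a tuple's value. It is a simplified model under stated assumptions, not the PicoDBMS layout: list indices stand in for pointers, and `build_ring` and `project` are hypothetical helpers. Each tuple's slot points to the next tuple with the same value, and the last tuple of a ring points back to the domain value.

```python
def build_ring(column):
    """Build ring storage for one attribute.

    Returns (domain, head, slot):
      domain[j] -- the j-th distinct value
      head[j]   -- first tuple holding domain[j] (value -> tuple side of the ring, used as index)
      slot[i]   -- ('tuple', k): next tuple in the same ring, or
                   ('value', j): tuple i is last in its ring, which closes on domain[j]
    """
    domain, pos, members = [], {}, []
    for i, v in enumerate(column):
        if v not in pos:
            pos[v] = len(domain)
            domain.append(v)
            members.append([])
        members[pos[v]].append(i)
    head = [m[0] for m in members]
    slot = [None] * len(column)
    for j, m in enumerate(members):
        for a, b in zip(m, m[1:]):
            slot[a] = ("tuple", b)
        slot[m[-1]] = ("value", j)
    return domain, head, slot


def project(i, domain, slot):
    """Recover tuple i's value by walking its ring; returns (value, pointers followed)."""
    hops, (kind, target) = 1, slot[i]
    while kind == "tuple":
        kind, target = slot[target]
        hops += 1
    return domain[target], hops


domain, head, slot = build_ring(["x", "y", "x", "x", "y"])
print(project(0, domain, slot))  # ('x', 3): tuple 0 -> tuple 2 -> tuple 3 -> value
print(project(3, domain, slot))  # ('x', 1): tuple 3 is last in its ring
```

Tuples near the start of a long ring pay the most, which matches the "half of the ring on average" drawback.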
  11. 11. Storage Model and Indexing <ul><li>Our Model </li></ul><ul><ul><li>Based on the Domain Storage Model </li></ul></ul><ul><ul><li>Each domain element is a (length, value) pair </li></ul></ul><ul><ul><li>Domains used for attributes with duplicates and for variable-width attributes </li></ul></ul><ul><ul><li>For a primary key-foreign key relationship, instead of storing the value in the child table, we store a pointer to the parent-table tuple that contains the value </li></ul></ul>Figure: Our storage model
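A minimal sketch of the two ideas on this slide, under assumptions: a variable-width domain packed into one byte buffer with a one-byte length prefix per element (so values are limited to 255 bytes), and a child table whose foreign key is a pointer (here a list index) to the parent tuple. `VarDomain`, `intern`, `read`, and `emp_dept_name` are hypothetical names, not taken from the system.

```python
class VarDomain:
    """Variable-width domain packed into one buffer.

    Each element is stored as (length byte, value bytes); the 'pointer' to an
    element is its byte offset. Duplicates are stored only once.
    """

    def __init__(self):
        self.buf = bytearray()
        self.offsets = {}  # value -> byte offset of its (length, value) element

    def intern(self, s: str) -> int:
        if s not in self.offsets:
            data = s.encode("utf-8")
            if len(data) > 255:
                raise ValueError("this sketch assumes values of at most 255 bytes")
            self.offsets[s] = len(self.buf)
            self.buf.append(len(data))  # length prefix
            self.buf += data            # value bytes
        return self.offsets[s]

    def read(self, off: int) -> str:
        n = self.buf[off]
        return self.buf[off + 1 : off + 1 + n].decode("utf-8")


names = VarDomain()
# Parent table: (dept_id, pointer into the name domain)
dept = [(10, names.intern("Sales")), (20, names.intern("Research"))]
# Child table: the foreign key is a pointer (list index) to the parent tuple, not a copy of dept_id
emp = [("Asha", 0), ("Ravi", 1), ("Meena", 0)]


def emp_dept_name(e):
    """Follow the child -> parent pointer, then the parent's domain pointer."""
    _, parent = e
    _, name_off = dept[parent]
    return names.read(name_off)


print(emp_dept_name(emp[1]))  # Research
```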
  12. 12. Storage Model and Indexing <ul><li>Index structure </li></ul><ul><ul><li>Sorted array of Tuple Identifier Lists (pointers to tuples) </li></ul></ul><ul><ul><li>Each list corresponds to one value and holds pointers to the tuples that share it </li></ul></ul><ul><ul><li>Ordering is preserved in the index structure, hence range queries execute efficiently </li></ul></ul>Figure: Our index model
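The index on this slide can be sketched as a sorted array of distinct values with a parallel array of tuple identifier lists; a range query is then two binary searches plus a scan of the lists in between. This is a minimal sketch assuming tuple identifiers are list positions; `build_index` and `range_query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def build_index(column):
    """Sorted array of distinct values plus, for each value, the list of TIDs sharing it."""
    tids = {}
    for tid, v in enumerate(column):
        tids.setdefault(v, []).append(tid)
    keys = sorted(tids)
    return keys, [tids[k] for k in keys]


def range_query(index, lo, hi):
    """TIDs of tuples with lo <= value <= hi, found by binary search on the sorted keys."""
    keys, lists = index
    i, j = bisect_left(keys, lo), bisect_right(keys, hi)
    return [tid for lst in lists[i:j] for tid in lst]


ages = [34, 25, 41, 25, 30]
idx = build_index(ages)
print(range_query(idx, 26, 40))  # [4, 0]: TIDs with 26 <= age <= 40, in key order
```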
  13. 13. Storage Model and Indexing <ul><li>Our model eliminates these problems by storing some additional pointers </li></ul><ul><ul><li>Data storage is separate from the index structure, hence projection is easy </li></ul></ul><ul><ul><li>Indices are created on attributes rather than on domains </li></ul></ul><ul><ul><li>For a primary key-foreign key relationship, a join requires scanning only the child table </li></ul></ul><ul><ul><li>For domain-based join attributes, join indices can be built </li></ul></ul>Figure: Join index
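The two join shortcuts above can be sketched as follows, assuming list positions act as pointers. `fk_join` scans only the child because every child tuple already points at its parent; `build_join_index` precomputes (R TID, S TID) pairs for two attributes that draw on the same domain, where equal values imply equal domain pointers. Both function names are hypothetical.

```python
def fk_join(child, parent, fk_pos):
    """PK-FK join: scan only the child; each child tuple already points at its parent."""
    return [(c, parent[c[fk_pos]]) for c in child]


def build_join_index(r_ptrs, s_ptrs):
    """Join index for two attributes drawing on the same domain.

    r_ptrs[i] / s_ptrs[j] are the domain pointers of tuple i of R and tuple j of S.
    Because each value is stored once in the domain, equal values mean equal pointers.
    """
    by_ptr = {}
    for s_tid, p in enumerate(s_ptrs):
        by_ptr.setdefault(p, []).append(s_tid)
    return [(r_tid, s_tid) for r_tid, p in enumerate(r_ptrs) for s_tid in by_ptr.get(p, [])]


dept = [("Sales",), ("Research",)]
emp = [("Asha", 0), ("Ravi", 1), ("Meena", 0)]
print(fk_join(emp, dept, 1))
print(build_join_index([0, 2, 1], [1, 0, 0]))  # [(0, 1), (0, 2), (2, 0)]
```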
  14. 14. Query Processing <ul><li>The entire database is memory resident </li></ul><ul><ul><li>Aim to reduce the number of tuples read from memory </li></ul></ul><ul><li>Join without indices </li></ul><ul><li>Assume R and S are the outer and inner relations, with n and m tuples respectively </li></ul><ul><ul><li>Nested loop join </li></ul></ul><ul><ul><ul><li>Most reasonable choice when very little memory is available </li></ul></ul></ul><ul><ul><ul><li>No additional structure </li></ul></ul></ul><ul><ul><ul><li>Cost of join is mn </li></ul></ul></ul><ul><ul><li>Hash join </li></ul></ul><ul><ul><ul><li>Requires the creation of hash partitions in main memory </li></ul></ul></ul><ul><ul><ul><li>If memory permits, hash join is faster </li></ul></ul></ul><ul><ul><ul><li>Instead of storing values in the partitions, we store pointers to tuples </li></ul></ul></ul><ul><ul><ul><li>Cost = (m + n) [building hash partitions] + (m + n) [reading the hash partitions] </li></ul></ul></ul><ul><ul><ul><li>(m + n) writes to memory and (m + n) pointer chases </li></ul></ul></ul>
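A minimal sketch contrasting the two joins, with relations as Python lists and tuple identifiers as list positions. As a simplification, the hash join below builds one in-memory hash table on S and probes it with R instead of partitioning both inputs; like the slide's variant, the table holds TIDs (pointers), not copies of tuples. Function names are hypothetical.

```python
def nested_loop_join(R, S, r_key, s_key):
    """For each of the n outer tuples, scan all m inner tuples: m*n comparisons, no extra memory."""
    return [(i, j) for i, r in enumerate(R) for j, s in enumerate(S) if r[r_key] == s[s_key]]


def hash_join(R, S, r_key, s_key):
    """Build a hash table of TIDs (not values) on S, then probe it once per tuple of R."""
    table = {}
    for j, s in enumerate(S):
        table.setdefault(s[s_key], []).append(j)
    return [(i, j) for i, r in enumerate(R) for j in table.get(r[r_key], [])]


R = [(1, "a"), (2, "b"), (3, "a")]
S = [("a", 10), ("c", 20), ("a", 30)]
print(nested_loop_join(R, S, 1, 0))  # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(hash_join(R, S, 1, 0))         # same pairs, in the same order
```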
  15. 15. Query Processing <ul><ul><li>Sort-merge join </li></ul></ul><ul><ul><ul><li>Needs memory for sorting </li></ul></ul></ul><ul><ul><ul><li>While sorting, store only pointers to the tuples, not the tuples themselves </li></ul></ul></ul><ul><ul><ul><li>Cost = (m log m + n log n) [sorting] + (m + n) [reading to merge] </li></ul></ul></ul><ul><ul><ul><li>(m + n) writes to memory and (m + n) pointer chases </li></ul></ul></ul><ul><li>Join with indices </li></ul><ul><ul><li>Indexed nested loop join </li></ul></ul><ul><ul><ul><li>The relation on which an index is available is treated as the inner relation </li></ul></ul></ul><ul><ul><ul><li>Cost = n [reading the outer relation R] + n log m [for each tuple of R, probing the index on S] </li></ul></ul></ul>
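A sketch of both joins on this slide, assuming TIDs are list positions. `sort_merge_join` sorts arrays of TIDs (pointers) by join key rather than the tuples, then merges, emitting the cross product of each run of equal keys. `indexed_nl_join` assumes the index on S is given as two parallel sorted arrays of keys and TIDs, and does one binary search (about log m steps) per outer tuple. All names are hypothetical.

```python
from bisect import bisect_left


def sort_merge_join(R, S, r_key, s_key):
    """Sort TID arrays (not tuples) by key, then merge; runs of equal keys yield all pairs."""
    rs = sorted(range(len(R)), key=lambda i: R[i][r_key])
    ss = sorted(range(len(S)), key=lambda j: S[j][s_key])
    out, a, b = [], 0, 0
    while a < len(rs) and b < len(ss):
        rk, sk = R[rs[a]][r_key], S[ss[b]][s_key]
        if rk < sk:
            a += 1
        elif rk > sk:
            b += 1
        else:
            a_end = a
            while a_end < len(rs) and R[rs[a_end]][r_key] == rk:
                a_end += 1
            b_end = b
            while b_end < len(ss) and S[ss[b_end]][s_key] == sk:
                b_end += 1
            out += [(rs[x], ss[y]) for x in range(a, a_end) for y in range(b, b_end)]
            a, b = a_end, b_end
    return out


def indexed_nl_join(R, r_key, s_keys, s_tids):
    """Indexed nested loop join: one binary search into S's sorted index per outer tuple."""
    out = []
    for i, r in enumerate(R):
        k = bisect_left(s_keys, r[r_key])
        while k < len(s_keys) and s_keys[k] == r[r_key]:
            out.append((i, s_tids[k]))
            k += 1
    return out


R = [(1, "a"), (2, "b"), (3, "a")]
S = [("a", 10), ("c", 20), ("a", 30)]
s_index = sorted((s[0], j) for j, s in enumerate(S))  # index on S's join attribute
s_keys = [k for k, _ in s_index]
s_tids = [j for _, j in s_index]
print(sort_merge_join(R, S, 1, 0))            # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(indexed_nl_join(R, 1, s_keys, s_tids))  # same pairs
```

With indices already available on both join attributes, the two sorted TID arrays come from the indices and only the merge loop remains.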
  16. 16. Query Processing <ul><ul><li>Sort-merge join </li></ul></ul><ul><ul><ul><li>If indices are available on both join attributes, sorting is not needed </li></ul></ul></ul><ul><ul><ul><li>Cost = m + n [merging] </li></ul></ul></ul><ul><ul><ul><li>Some additional cost is involved in traversing the indices and chasing pointers </li></ul></ul></ul><ul><ul><li>Join indices </li></ul></ul><ul><ul><ul><li>A single traversal of the index, chasing pointers, gives the tuples that are candidates for joining </li></ul></ul></ul><ul><li>Other operations </li></ul><ul><ul><li>Selection, projection and sorting are straightforward </li></ul></ul><ul><ul><li>Duplicate elimination, aggregation, union, intersection, and difference can be computed either by sorting or by building hash indices on the relations </li></ul></ul>
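Duplicate elimination, one of the "other operations" above, illustrates the two routes. A minimal sketch with hypothetical function names: the sorting version needs a sorted array (of pointers, in a real engine) and returns values in sorted order; the hashing version needs a hash set sized by the number of distinct values and keeps first-seen order.

```python
def distinct_by_sorting(values):
    """Sort, then keep each value only if it differs from the previous one."""
    out = []
    for v in sorted(values):
        if not out or out[-1] != v:
            out.append(v)
    return out


def distinct_by_hashing(values):
    """Keep the first occurrence of each value, remembering seen values in a hash set."""
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


print(distinct_by_sorting([3, 1, 3, 2, 1]))  # [1, 2, 3]
print(distinct_by_hashing([3, 1, 3, 2, 1]))  # [3, 1, 2]
```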
  17. 17. Query Processing <ul><li>Query plan generation and memory allocation </li></ul><ul><ul><li>An optimal query execution plan is needed </li></ul></ul><ul><ul><li>To reduce materialization, left-deep and bushy trees are ruled out </li></ul></ul><ul><ul><li>Choose between right-deep and extreme right-deep trees depending on the memory available for storing intermediate results </li></ul></ul><ul><ul><li>The query optimizer has to be memory cognizant </li></ul></ul><ul><ul><li>Memory must be optimally allocated among all operators, since we cannot assume the entire memory is available to each operator </li></ul></ul><ul><ul><li>Hulgeri et al. suggest allocating memory to operators based on cost functions </li></ul></ul><ul><ul><li>Aggregation, sorting and duplicate removal are generally performed on materialized results </li></ul></ul><ul><ul><li>By enforcing a particular tuple arrival order at the leaves of the plan tree, aggregation and similar operations can be pipelined </li></ul></ul>
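The last point can be illustrated with a grouped sum: if the plan guarantees that tuples arrive sorted on the grouping attribute, aggregation can run as a pipeline that holds only one group at a time. A minimal sketch using Python's `itertools.groupby`; `pipelined_sum` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def pipelined_sum(rows, key=0, val=1):
    """Sum rows[val] per rows[key], assuming rows arrive sorted on key.

    Each group is emitted as soon as the next key appears, so only one running
    group is held in memory and the input never has to be materialized.
    """
    for k, group in groupby(rows, key=itemgetter(key)):
        yield k, sum(r[val] for r in group)


sorted_rows = [("a", 1), ("a", 2), ("b", 5), ("c", 1), ("c", 1)]
print(list(pipelined_sum(sorted_rows)))  # [('a', 3), ('b', 5), ('c', 2)]

# Without the enforced arrival order, the same key can surface in two groups:
print(list(pipelined_sum([("a", 1), ("b", 1), ("a", 1)])))  # [('a', 1), ('b', 1), ('a', 1)]
```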
  18. 18. Synchronization <ul><li>Main data resides on a remote server </li></ul><ul><ul><li>Information is stored as relations in a remote DBMS </li></ul></ul><ul><ul><li>Relevant information is downloaded to the Simputer </li></ul></ul><ul><ul><li>Copies of the data are present on both the Simputer and the remote server </li></ul></ul><ul><ul><li>Hence the need for synchronization </li></ul></ul><ul><li>Two scenarios in data synchronization </li></ul><ul><li>1. Relation downloaded to the Simputer </li></ul><ul><ul><li>Download the entire relation R only once </li></ul></ul><ul><ul><li>Subsequently, transfer only the differentials </li></ul></ul><ul><ul><li>$R_{simp}^{new} = R_{simp}^{old} \cup \Delta R_{main}$ </li></ul></ul><ul><ul><li>$R_{main}^{new} = R_{main}^{old} \cup \Delta R_{simp}$ </li></ul></ul><ul><li>A problem arises when the same tuple is updated at both places </li></ul><ul><li>Tuple conflict </li></ul><ul><ul><li>Detection is harder than resolution </li></ul></ul>
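The differential exchange above can be sketched as follows, assuming each relation copy and each differential is a Python dict keyed by primary key, with `None` marking a deleted tuple; applying a differential plays the role of the ∪ in the formulas. `synchronize` is a hypothetical helper, and its conflict rule (the same key changed differently on both sides) is one simple detection policy; it leaves resolution to the caller rather than picking a winner.

```python
def synchronize(r_simp, r_main, delta_simp, delta_main):
    """Exchange only the differentials between the Simputer copy and the main copy.

    r_simp / r_main already contain their own local changes; delta_simp / delta_main
    record those changes (key -> new row, or None for a delete).
    Returns (new_simp, new_main, conflicts). A conflicting key is not propagated,
    so each side keeps its local version until the conflict is resolved.
    """
    conflicts = {k for k in delta_simp.keys() & delta_main.keys()
                 if delta_simp[k] != delta_main[k]}

    def apply(rel, delta):
        out = dict(rel)
        for k, row in delta.items():
            if k in conflicts:
                continue
            if row is None:
                out.pop(k, None)
            else:
                out[k] = row
        return out

    return apply(r_simp, delta_main), apply(r_main, delta_simp), conflicts


# Both copies started as {1: "A", 2: "B", 3: "C"}.
delta_simp = {2: "B1", 3: None}  # Simputer updated tuple 2, deleted tuple 3
delta_main = {1: "A2", 3: "C2"}  # server updated tuples 1 and 3
r_simp = {1: "A", 2: "B1"}
r_main = {1: "A2", 2: "B", 3: "C2"}
new_simp, new_main, conflicts = synchronize(r_simp, r_main, delta_simp, delta_main)
print(new_simp, new_main, conflicts)
# {1: 'A2', 2: 'B1'} {1: 'A2', 2: 'B1', 3: 'C2'} {3}
```

Detection here relies on both sides recording their changes per key; resolving tuple 3 (keep the delete or keep the update) is a separate policy decision.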
  19. 19. Synchronization <ul><li>2. Result set downloaded to the Simputer </li></ul><ul><ul><li>Execute the query at the remote server and transfer the result only once </li></ul></ul><ul><ul><li>Subsequently, transfer only the differentials </li></ul></ul><ul><ul><li>$q = R \bowtie S$ </li></ul></ul><ul><ul><li>$q^{new} = q^{old} \cup (i_r \bowtie S)$ for tuples $i_r$ inserted into R </li></ul></ul><ul><ul><li>$q^{new} = q^{old} - (d_r \bowtie S)$ for tuples $d_r$ deleted from R </li></ul></ul><ul><ul><li>For each query, find out what differential to compute </li></ul></ul><ul><ul><ul><li>Resembles the incremental view maintenance problem </li></ul></ul></ul><ul><ul><li>In-place updates on the result set in the Simputer mean that changes have to be propagated to the remote database </li></ul></ul><ul><ul><ul><li>Resembles the view update problem </li></ul></ul></ul><ul><ul><li>Updates to downloaded relations and result sets are recorded in the Intention List and the Intention View List, respectively </li></ul></ul><ul><ul><li>Determine whether the changes can be reflected in the remote database </li></ul></ul>
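The two maintenance rules can be checked on a toy example: applying them to the old result must give the same answer as recomputing R ⋈ S from scratch. A minimal sketch under set semantics, assuming S does not change and that R's last attribute joins with S's first; `join` and `maintain` are hypothetical helpers.

```python
def join(R, S):
    """R ⋈ S on R's last attribute = S's first attribute (set semantics)."""
    return {r + s[1:] for r in R for s in S if r[-1] == s[0]}


def maintain(q_old, S, inserted_r=frozenset(), deleted_r=frozenset()):
    """Apply q_new = q_old - (d_r ⋈ S), then q_new ∪ (i_r ⋈ S); S is assumed unchanged."""
    return (q_old - join(deleted_r, S)) | join(inserted_r, S)


R = {(1, "x"), (2, "y")}
S = {("x", 10), ("y", 20), ("x", 30)}
q_old = join(R, S)

i_r, d_r = {(3, "y")}, {(1, "x")}
q_new = maintain(q_old, S, inserted_r=i_r, deleted_r=d_r)
assert q_new == join((R - d_r) | i_r, S)  # same as recomputing from scratch
print(sorted(q_new))  # [(2, 'y', 20), (3, 'y', 20)]
```

The deletion rule is safe here because every result row contains the whole R tuple it came from; if the query projected attributes away, a deleted row might still be derivable from another R tuple, which is part of what makes the general problem harder.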
  20. 20. Synchronization <ul><li>View update considerations </li></ul><ul><ul><li>Only legal view updates </li></ul></ul><ul><ul><li>Only legal database updates </li></ul></ul><ul><ul><li>Only underlying tuples affected </li></ul></ul><ul><ul><li>Insertion of extra tuples </li></ul></ul><ul><ul><li>Modification of other view tuples </li></ul></ul><ul><ul><li>Keller suggests translations for inserts, deletes, and modifications </li></ul></ul><ul><ul><ul><li>Restricts the relations to BCNF </li></ul></ul></ul><ul><ul><ul><li>Joins should have an inclusion dependency </li></ul></ul></ul><ul><ul><ul><li>Our remote server relations need not satisfy these restrictions </li></ul></ul></ul><ul><ul><li>Langerak suggests translations based on extension joins </li></ul></ul><ul><ul><ul><li>No restriction on normal forms and joins </li></ul></ul></ul><ul><ul><ul><li>Can be the basis of our synchronization engine </li></ul></ul></ul><ul><li>Most of the rules need data and metadata about the base relations </li></ul><ul><ul><li>A remote database connection is needed to validate updates </li></ul></ul>
  21. 21. Transaction Management <ul><li>ACID properties need to be maintained </li></ul><ul><ul><li>Global atomicity </li></ul></ul><ul><ul><ul><li>Data updates in both the Simputer and the smartcard </li></ul></ul></ul><ul><ul><ul><li>The Simputer participates in distributed transactions </li></ul></ul></ul><ul><ul><li>The Simputer is a single-user system, so why concurrency control? </li></ul></ul><ul><ul><ul><li>Transactions on behalf of data synchronization </li></ul></ul></ul><ul><ul><ul><li>Long transactions doing some aggregation </li></ul></ul></ul><ul><li>Concurrency control </li></ul><ul><ul><li>Most transactions are expected to complete quickly </li></ul></ul><ul><ul><li>Lock contention is low, hence large locking granules can be used </li></ul></ul><ul><ul><li>Blott and Korth propose an almost serial protocol; the main aim is to improve response time rather than throughput </li></ul></ul><ul><ul><li>Instead of hash tables, bits can be used to hold lock information </li></ul></ul>
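One reading of the last point, sketched under stated assumptions: with large locking granules (here one lock per relation), the whole lock table fits in the bits of a single integer instead of a lock hash table. `LockBits` is a hypothetical class; it models exclusive locks only and assumes relations are numbered with small integers.

```python
class LockBits:
    """Relation-granularity exclusive locks kept as bits of one integer, not a lock hash table."""

    def __init__(self):
        self.bits = 0  # bit k set <=> relation k is locked

    def try_lock(self, rel_id: int) -> bool:
        mask = 1 << rel_id
        if self.bits & mask:
            return False  # already held: the caller must wait or abort
        self.bits |= mask
        return True

    def unlock(self, rel_id: int) -> None:
        self.bits &= ~(1 << rel_id)


locks = LockBits()
print(locks.try_lock(3))  # True
print(locks.try_lock(3))  # False: relation 3 is already locked
locks.unlock(3)
print(locks.try_lock(3))  # True
```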
  22. 22. Transaction Management <ul><li>Commit processing </li></ul><ul><ul><li>Shadow updates </li></ul></ul><ul><ul><ul><li>Data locality is not an issue, since data is in memory </li></ul></ul></ul><ul><ul><ul><li>Not suitable for pointer-based storage models </li></ul></ul></ul><ul><ul><ul><li>If one domain value changes, its location also changes and all tuple pointers to it need to be updated </li></ul></ul></ul><ul><ul><ul><li>Size of the shadow is also a concern </li></ul></ul></ul><ul><ul><li>Log-based updates </li></ul></ul><ul><ul><ul><li>Better suited for pointer-based models </li></ul></ul></ul><ul><ul><ul><li>Cost of maintaining the logs </li></ul></ul></ul><ul><ul><ul><li>Pointer-based logging instead of value-based logging </li></ul></ul></ul><ul><li>Limited memory, hence a Steal buffer replacement policy </li></ul><ul><ul><li>Dirty blocks are output to smartcard stable storage when connected </li></ul></ul><ul><ul><li>If all dirty blocks of a transaction are output to the smartcard, no Undo </li></ul></ul><ul><li>A more detailed survey is left as future work </li></ul>
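Pointer-based logging can be sketched as an undo log whose records hold the old domain pointer rather than the old value, so every record has the same small size regardless of attribute width. `PointerLog` is a hypothetical class; the table is modeled as a list of rows of domain pointers, and durability (writing the log to smartcard storage) is left out.

```python
class PointerLog:
    """Undo log recording (tuple, attribute, old pointer) rather than old values."""

    def __init__(self, table):
        self.table = table  # list of rows, each a list of domain pointers
        self.records = []   # (tid, attr, old_ptr), in update order

    def update(self, tid, attr, new_ptr):
        self.records.append((tid, attr, self.table[tid][attr]))
        self.table[tid][attr] = new_ptr

    def rollback(self):
        # Undo in reverse order, so the oldest saved pointer is restored last.
        for tid, attr, old_ptr in reversed(self.records):
            self.table[tid][attr] = old_ptr
        self.records.clear()

    def commit(self):
        self.records.clear()


table = [[0, 1], [2, 0]]
log = PointerLog(table)
log.update(0, 1, 5)
log.update(0, 1, 7)
print(table)  # [[0, 7], [2, 0]]
log.rollback()
print(table)  # [[0, 1], [2, 0]]
```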
  23. 23. Summary and Proposed Future Work <ul><li>The discussion and proposed solutions can be generalized to any handheld device </li></ul><ul><li>However, device-specific optimizations are possible </li></ul><ul><li>Future work </li></ul><ul><ul><li>More detailed survey of concurrency control and recovery </li></ul></ul><ul><ul><li>Protocols for validation of updates on downloaded data </li></ul></ul><ul><ul><li>Query engine needs to be designed </li></ul></ul><ul><ul><li>Access rights management issues need to be addressed </li></ul></ul><ul><ul><li>Database module toolkit </li></ul></ul><ul><ul><li>Implement the system on the Simputer </li></ul></ul><ul><ul><li>Evaluate the system </li></ul></ul>
  24. 24. Thank You
  25. 25. References <ul><li>1. A. Ammann, M. Hanrahan, and R. Krishnamurthy. Design of a Memory Resident DBMS. In IEEE COMPCON, 1985. </li></ul><ul><li>2. C. Bobineau, L. Bouganim, P. Pucheral, and P. Valduriez. PicoDBMS: Scaling down Database Techniques for the Smartcard. In VLDB, 2000. </li></ul><ul><li>3. S. Blott and H. F. Korth. An Almost Serial Protocol for Transaction Execution in Main Memory Database Systems. In VLDB, 2002. </li></ul><ul><li>4. DB2 Everyplace. http://www.ibm.com/software/data/db2/everyplace. </li></ul><ul><li>5. A. Datta, D. VanderMeer, K. Ramamritham, and B. Moon. Applying Parallel Processing Techniques in Data Warehousing and OLAP. In VLDB, 1999. </li></ul><ul><li>6. A. Hulgeri, S. Sudarshan, and S. Seshadri. Memory Cognizant Query Optimization. In Advances In Data Management, 2000. </li></ul>
  26. 26. References <ul><li>7. A. M. Keller. Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins. In ACM PODS, 1985. </li></ul><ul><li>8. R. Langerak. View Updates in Relational Databases with an Independent Scheme. In ACM PODS, 1990. </li></ul><ul><li>9. T. Lehman and M. Carey. A Study of Index Structures for Main Memory DBMS. In VLDB, 1986. </li></ul><ul><li>10. M. Missikov and M. Scholl. Relational Queries in a Domain Based DBMS. In ACM SIGMOD, 1983. </li></ul><ul><li>11. MySQL. http://www.mysql.com. </li></ul><ul><li>12. P. Pucheral, P. Valduriez, and J. M. Thevenin. Efficient Main Memory Data Management using the DBGraph Storage Model. In VLDB, 1990. </li></ul><ul><li>13. The Simputer. http://www.simputer.org. </li></ul>
  27. 27. Storage Model and Indexing <ul><li>Aim at compactness in the representation of both data and index </li></ul><ul><li>Existing models </li></ul><ul><ul><li>Flat Storage </li></ul></ul><ul><ul><ul><li>Tuples are stored sequentially; this ensures access locality but consumes space </li></ul></ul></ul><ul><ul><ul><li>Access locality is not an issue, since data resides in flash memory (random access is as fast as sequential access) </li></ul></ul></ul><ul><ul><li>Pointer-based Domain Storage </li></ul></ul><ul><ul><ul><li>Eliminates duplicates </li></ul></ul></ul><ul><ul><ul><li>Values partitioned into domains, which are sets of unique values </li></ul></ul></ul><ul><ul><ul><li>Tuples reference the attribute value by means of pointers </li></ul></ul></ul><ul><ul><ul><li>One domain shared among multiple attributes </li></ul></ul></ul><ul><ul><ul><li>No index </li></ul></ul></ul>Figure: Domain Storage
  28. 28. Storage Model and Indexing <ul><li>Pointer-based Ring Storage (Bobineau et al.) </li></ul><ul><ul><li>Modification to Domain Storage </li></ul></ul><ul><ul><li>Uses domain structure as index </li></ul></ul><ul><ul><li>Tuple-to-tuple pointers connect two tuples that have the same attribute value </li></ul></ul><ul><ul><li>Index structure in the form of a ring </li></ul></ul>Figure: Ring Storage
