Your SlideShare is downloading. ×
0
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Oracle Berkeley DB - Concurrent Data Storage (CDS) Tutorial

2,402

Published on

Published in: Technology
1 Comment
2 Likes
Statistics
Notes
No Downloads
Views
Total Views
2,402
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
56
Comments
1
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Designed to follow the Use-Case Based Tutorial for Data Store
  • Ideally want three use cases: simple one, one that uses ALLDB, and one that uses cds_groupbegin WW-like site: Per-user database? Food database (secondary points to food) Recipe database (secondary for ingredients to recipes; categories to recipes) Add food: look up food (read) write per-user Add recipe: look up recipe, iterate over incredients, write into per-user New recipe:
  • CDS is the simplest product that facilitates Read/Write concurrency.
  • Assume that you already know what is in the DS tutorial. Three areas where you need to modify your programs. Environment configuration (environments are requires) Tell us which cursor might be used for writes so we can guarantee deadlock free execution Additional configuration options depending on now many databases you have and whether your concurrent threads execute in the same process or different processes.
  • Contrast environment REQUIRED (versus optional for data store)
  • Let’s take a look at how CDS performs locking.
  • This explains a bit about WHY you need to use the writecursor flag. Important points to note: Only one write cursor at a time. Attempts to open a second write cursor will block Writes can ONLY be performed using a write cursor. Write cursors are the mechanism we use to ensure deadlock free execution.
  • Cds_groupbegin
  • Let’s look at ALLDB in a bit more detail, because it can limit concurrency and therefore should only be used when necessary.
  • Workload characterization on the iPod: Read mostly Only a single writer iPod is a cache for what is on laptop, so recoverability is not required.
  • Key is unique; artist secondary is not
  • Potential problems: Names are not unique! This is going to play the first song we find with the right title. In actuality, we probably need to iterate over all the duplicates. We’ll see how to do this in the next example.
  • Imagine that songs is concatenated: 12345:65432:29292: … Looking up songs by album is easy -- just a regular data access, so let’s look at the secondary callback and how to find the song on the greatest number of albums.
  • This secondary is more complicated than many -- each secondary key (album) returns a set of primary keys. That’s why we need to iterate over the multiple returns. For more details on exactly how this works, please consult the datastore tutorial
  • So far, all we’ve done are read-only queries. If we only had to support read-only queries, then we wouldn’t need CDS, so let’s now consider our update queries
  • Note: It’s possible that you’re adding a song that might need to go into an album, but we have no way of knowing that, so let’s ignore it.
  • Why DB_CDB_ALL? Because we have multiple databases and we need to avoid deadlocks between the multiple databases.
  • Absdbp = album-by-song DBP Pkey already contains the song ID
  • Recall that we have a read lock on the albums via the secondary cursor. Now we need a write lock to do the PUT. Multiple databases (include primaries and secondaries) requires use of CDB_ALL so that we do not self deadlock.
  • Container is a single file containing multiple BDB databases. Rather than focus on the actual data representation in this use case, we’re going to focus on how different operations use the database and necessitate the use of cds_groupbegin. To that end, we’ll focus exclusively on adding indexes (as that creates an opportunity to demonstrate the use of cdsgroup_begin.
  • Default indexes indexes ALL tags. So, when you create one, you can’t map the tags into the dictionary before you start looking at documents. So, a cursor in the document and then you need to add dictionary entries for each tag; that requires a writecursor; you’re using DB_CDB_ALL So that would block unless you get the right locker.
  • Imagine that we already have a set of documents in a container and we wish to create a default index. Recall that when a program is multi-threaded, every cursor represents a new locker-ID. When we add to dictionary, we need to get a writelock, but we already have a read cursor open (on the document). If we do not use the same locker ID for the event reader and the add-to-dictionary, then we’ll self-deadlock. Default indexes indexes ALL tags. So, when you create one, you can’t map the tags into the dictionary before you start looking at documents. So, a cursor in the document and then you need to add dictionary entries for each tag; that requires a writecursor; you’re using DB_CDB_ALL So that would block unless you get the right locker.
  • Recall that when a program is multi-threaded, every cursor represents a new locker-ID. When we add to dictionary, we need to get a writelock, but we already have a read cursor open (on the document). If we do not use the same locker ID for the event reader and the add-to-dictionary, then we’ll self-deadlock. Default indexes indexes ALL tags. So, when you create one, you can’t map the tags into the dictionary before you start looking at documents. So, a cursor in the document and then you need to add dictionary entries for each tag; that requires a writecursor; you’re using DB_CDB_ALL So that would block unless you get the right locker.
  • Transcript

    • 1. Oracle Berkeley DB Concurrent Data Store A Use-Case Based Tutorial
    • 2. Part II: Concurrent Data Store <ul><li>Overview </li></ul><ul><ul><li>What is CDS? </li></ul></ul><ul><ul><li>When is CDS appropriate? </li></ul></ul><ul><li>Case Studies </li></ul><ul><ul><li>Managing music </li></ul></ul><ul><ul><li>Storing XML </li></ul></ul>
    • 3. Concurrent Data Store <ul><li>Concurrency with reads and writes. </li></ul><ul><li>Deadlock-free </li></ul><ul><li>API-level locking </li></ul><ul><li>What CDS does not do: </li></ul><ul><ul><li>Recovery </li></ul></ul><ul><ul><li>Transactions </li></ul></ul><ul><ul><li>Replication </li></ul></ul>
    • 4. When to use CDS <ul><li>Read-mostly workload. </li></ul><ul><li>Single-writer at a time is OK. </li></ul><ul><li>Transient data (e.g., caching). </li></ul><ul><li>No recovery after unclean shutdown. </li></ul>
    • 5. CDS Programmatically <ul><li>Setting up your environment </li></ul><ul><li>Write cursors </li></ul><ul><li>Locking for complex applications </li></ul>
    • 6. Setting up your Environment <ul><li>Environment required for CDS. </li></ul><ul><li>Specify CDS on DB_ENV-&gt;open call. </li></ul><ul><li>/* Create handle. */ </li></ul><ul><li>ret = db_env_create(&amp;dbenv, 0); </li></ul><ul><li>… error_handling … </li></ul><ul><li>/* Create and open the environment. */ </li></ul><ul><li>flags = DB_CREATE | DB_INIT_CDB ; </li></ul><ul><li>ret = dbenv-&gt;open(dbenv, HOME, flags, 0); </li></ul><ul><li>… error_handling … </li></ul>
    • 7. CDS Locking <ul><li>CDS acquires locks at the API level. </li></ul><ul><li>By default, CDS locking is per-database. </li></ul><ul><ul><li>Multiple readers in a single-database, OR </li></ul></ul><ul><ul><li>Single writer in a database. </li></ul></ul><ul><li>Applications can both read and write via cursors, so they need to be handled specially. </li></ul>
    • 8. Write Cursors <ul><li>CDS deadlock free. </li></ul><ul><li>Must avoid upgrading readlocks to writelocks. </li></ul><ul><li>No upgrades on DBP operations. </li></ul><ul><li>Cursors perform both reads and writes. </li></ul><ul><li>Not all cursors write. </li></ul><ul><li>Cursors that may write must be created with the DB_WRITECURSOR flag. </li></ul>
    • 9. Creating a write cursor (code) <ul><ul><li>DBC *dbc = NULL; </li></ul></ul><ul><ul><li>ret = dbp-&gt;cursor(dbp, NULL, &amp;dbc, DB_WRITECURSOR); </li></ul></ul><ul><ul><li>if (ret != 0) </li></ul></ul><ul><ul><li>… error_handling … </li></ul></ul>
    • 10. Multiple Databases <ul><li>Normally CDS locks per-database. </li></ul><ul><li>If applications maintain open cursors on one database while operating on another database, concurrent operations can deadlock. </li></ul><ul><li>DB_CDB_ALLDB locks per-environment. </li></ul>
    • 11. DB_CDB_ALLDB <ul><li>Big hammer solution. </li></ul><ul><li>Performs locking on an environment-wide basis. </li></ul><ul><ul><li>Only one write cursor in the entire environment. </li></ul></ul><ul><ul><li>No open read cursors in environment while writing. </li></ul></ul><ul><li>Simple, but … </li></ul><ul><li>Low concurrency in presence of writes. </li></ul><ul><li>Appropriate for applications whose operations touch multiple databases. </li></ul>
    • 12. Configuring DB_CDB_ALLDB <ul><li>/* Create handle. */ </li></ul><ul><li>ret = db_env_create(&amp;dbenv, 0); </li></ul><ul><li>… error_handling … </li></ul><ul><li>/* Turn on environment-wide locking. */ </li></ul><ul><li>ret = dbenv-&gt;set_flags(dbenv, DB_CDB_ALLDB , 1); </li></ul><ul><li>/* Create and open the environment. */ </li></ul><ul><li>flags = DB_CREATE | DB_INIT_CDB ; </li></ul><ul><li>ret = dbenv-&gt;open(dbenv, HOME, flags, 0); </li></ul><ul><li>… error_handling … </li></ul>
    • 13. Multi-threaded CDS <ul><li>Normally a locker-id is allocated per database. </li></ul><ul><li>In a multi-threaded process, locker IDs are allocated per cursor. </li></ul><ul><li>Multiple threads of control (in the same process) accessing multiple databases can deadlock. </li></ul><ul><li>cdsgroup_begin allocates a locker-id for a thread of control. </li></ul>
    • 14. cdsgroup_begin <ul><li>Assigns a locker to a set of accesses called a group (like a transaction). </li></ul><ul><li>Allows multiple write-cursors in a group. </li></ul><ul><li>Allows open read cursor in a group during a write in the same group. </li></ul><ul><li>Prevents deadlocks between groups. </li></ul><ul><li>Requires DB_CDB_ALLDB . </li></ul><ul><li>Appropriate for multi-threaded applications that use multiple databases simultaneously. </li></ul>
    • 15. Using cdsgroup_begin <ul><li>/* </li></ul><ul><li>* Assume open dbenv (with DB_CDB_ALLDB) and two </li></ul><ul><li>* open DBPs. </li></ul><ul><li>*/ </li></ul><ul><li>DB_TXN *gid; </li></ul><ul><li>DBC *dbc1, *dbc2; </li></ul><ul><li>ret = dbenv-&gt;cdsgroup_begin(dbenv, &amp;gid); </li></ul><ul><li>if (ret != 0) </li></ul><ul><li>… error handling … </li></ul><ul><li>dbp1-&gt;cursor(dbp1, gid, dbc1, DB_WRITECURSOR); </li></ul><ul><li>dbp2-&gt;cursor(dbp2, gid, dbc2, DB_WRITECURSOR); </li></ul><ul><li>/* Perform operations on dbp1, dbp2, dbc1 &amp; dbc2. */ </li></ul><ul><li>/* End the group of operations. */ </li></ul><ul><li>gid-&gt;commit(gid, 0); </li></ul>
    • 16. Use Cases <ul><li>Managing a music database. </li></ul><ul><li>Storing and Indexing XML. </li></ul>
    • 17. Managing a Music Database Write songs/data Get songs, play-lists, random collections
    • 18. Music Data <ul><li>Songs have: </li></ul><ul><ul><li>A unique ID </li></ul></ul><ul><ul><li>A (possibly non-unique) title </li></ul></ul><ul><ul><li>An artist </li></ul></ul><ul><ul><li>A year in which it was published </li></ul></ul><ul><ul><li>One or more albums on which it appeared </li></ul></ul><ul><ul><li>An MP3 containing the actual song </li></ul></ul>
    • 19. Song Queries <ul><li>Play song by title </li></ul><ul><li>Find songs by artist </li></ul><ul><li>Find songs by album </li></ul><ul><li>Find song appearing on the most albums </li></ul><ul><li>Updates </li></ul><ul><ul><li>Add/remove song </li></ul></ul><ul><ul><li>Add/remove album </li></ul></ul>
    • 20. Primary Data &amp; Indices Key Track Id Artist File Name Year Song Title Data Add secondary index on Title Add secondary index on Artist
    • 21. Processing Queries <ul><li>Play song by title </li></ul><ul><ul><li>Secondary access by title, extract file name, play file. </li></ul></ul><ul><li>Find songs by artist </li></ul><ul><ul><li>Secondary access by artist, iterate over all duplicates </li></ul></ul><ul><li>Find songs by album </li></ul><ul><ul><li>Create album database; iterate using secondary with multiple </li></ul></ul><ul><li>Find the song that appears on the greatest number of albums </li></ul><ul><ul><li>Create secondary on album; iterate, count, and max </li></ul></ul><ul><li>Updates </li></ul><ul><ul><li>Add songs </li></ul></ul><ul><ul><li>Remove songs </li></ul></ul><ul><ul><li>Add album </li></ul></ul><ul><ul><li>Update album/artist information </li></ul></ul>
    • 22. Play Song by Title <ul><li>/* Assume title secondary database open as sdbp. */ </li></ul><ul><li>char *filename; </li></ul><ul><li>int ret; </li></ul><ul><li>DBT datadbt, keydbt; </li></ul><ul><li>memset(&amp;datadbt, 0, sizeof(datadbt)); </li></ul><ul><li>memset(&amp;keydbt, 0, sizeof(keydbt)); </li></ul><ul><li>key-&gt;data = “Let it be”; </li></ul><ul><li>key-&gt;size = strlen((char *)key-&gt;data) + 1; </li></ul><ul><li>if ((ret = sdbp-&gt;get(sdbp, NULL, &amp;keydbt, &amp;datadbt, 0)) != 0) </li></ul><ul><li>return (ret); </li></ul><ul><li>filename = extract_file_from_record(datadbt-&gt;data); </li></ul><ul><li>play_song(filename); </li></ul>
    • 23. Retrieve by Artist <ul><li>/* Assume artist secondary database open as sdbp. */ </li></ul><ul><li>char *filename; </li></ul><ul><li>int ret; </li></ul><ul><li>DBT datadbt, keydbt; </li></ul><ul><li>DBC *dbc; </li></ul><ul><li>memset(&amp;datadbt, 0, sizeof(datadbt)); </li></ul><ul><li>memset(&amp;keydbt, 0, sizeof(keydbt)); </li></ul><ul><li>key-&gt;data = “Beatles”; </li></ul><ul><li>key-&gt;size = strlen((char *)key-&gt;data) + 1; </li></ul><ul><li>If ((ret = sdbp-&gt;cursor(sdbp, NULL, &amp;dbc, 0)) != 0) </li></ul><ul><li>goto err; </li></ul><ul><li>for (ret = dbc-&gt;get(dbc, &amp;keydbt, &amp;datadbt, DB_SET)/ </li></ul><ul><li> ret == 0; </li></ul><ul><li> ret = dbc-&gt;get(dbc, &amp;keydbt, &amp;datadbt, DB_NEXT_DUP)) </li></ul><ul><li>filename = extract_file_from_record(datadbt-&gt;data); </li></ul>
    • 24. The Album Database <ul><li>An album identifies a collection of songs. </li></ul><ul><li>Representation choices: </li></ul><ul><ul><li>Single key/data pair per album </li></ul></ul><ul><ul><ul><li>Album names are unique keys </li></ul></ul></ul><ul><ul><ul><li>Data item is a collection of track IDs </li></ul></ul></ul><ul><ul><li>Multiple key/data pairs per album </li></ul></ul><ul><ul><ul><li>Album is a non-unique key </li></ul></ul></ul><ul><ul><ul><li>Each data item represents a single track </li></ul></ul></ul><ul><ul><li>Key-only representation </li></ul></ul><ul><ul><ul><li>Key is Album/Song </li></ul></ul></ul><ul><ul><ul><li>Data is empty </li></ul></ul></ul><ul><li>How do you select the representation? </li></ul>
    • 25. Selecting Album Representation (1) <ul><li>Carefully review queries </li></ul><ul><ul><li>Find songs by album -- easy either way. </li></ul></ul><ul><ul><li>Find song that appears on the greatest number of albums -- slow unless you have a secondary index. </li></ul></ul><ul><li>A primary database with secondaries, must have a unique primary key. </li></ul><ul><ul><li>Single-item per album OR </li></ul></ul><ul><ul><li>Key-only representation </li></ul></ul>
    • 26. Selecting Album Representation (2) <ul><li>Single-item per album </li></ul><ul><ul><li>+ Compact </li></ul></ul><ul><ul><li>+ Easy to store other album data (e.g., year). </li></ul></ul><ul><ul><li>- Application must parse data </li></ul></ul><ul><li>Key-only representation </li></ul><ul><ul><li>Application must figure out end of album </li></ul></ul><ul><ul><li>Not obvious where to store album-wide data (could encode is as a “special” track ID). </li></ul></ul><ul><li>We’ll pick single-item per album. </li></ul>
    • 27. Album Data Key Album Name Songs … Year Artist Data Number Sold Secondary index on each song
    • 28. Album Secondary <ul><li>Map song to album(s). </li></ul><ul><li>Need a callback function </li></ul><ul><li>song_callback(DB* dbp, DBT *key, DBT *data, DBT *result){ </li></ul><ul><li>char *songs; </li></ul><ul><li>/* Extract songlist from data DBT. */ </li></ul><ul><li>songs = songs_from_data(data); </li></ul><ul><li>result-&gt;size = num_songs(songs); </li></ul><ul><li>result-&gt;data = malloc(result-&gt;size * sizeof(DBT)); </li></ul><ul><li>result-&gt;flags |= DB_DBT_MULTIPLE | DB_DBT_APPMALLOC; </li></ul><ul><li>foreach song in songs </li></ul><ul><li>place song in DBT in result array </li></ul><ul><li>} </li></ul>
    • 29. Updates <ul><li>Two queries: </li></ul><ul><ul><li>Add song </li></ul></ul><ul><ul><li>Remove song </li></ul></ul><ul><li>Design questions: </li></ul><ul><ul><li>What happens when you remove a song that belongs to an album? </li></ul></ul><ul><ul><li>How do you coordinate adding songs and albums? </li></ul></ul>
    • 30. Add a song <ul><li>If song doesn’t exist, no need to worry about its existence in an album. </li></ul><ul><li>So, this is a simple add (BDB updates the secondaries automatically) </li></ul><ul><li>DBT keydbt, datadbt; </li></ul><ul><li>memset(&amp;data, 0, sizeof(data)); </li></ul><ul><li>memset(&amp;key, 0, sizeof(key)); </li></ul><ul><li>key-&gt;data = “123456”; </li></ul><ul><li>key-&gt;size = strlen(key-&gt;data) + 1; </li></ul><ul><li>data-&gt;data = data_to_bytestring(song_data); </li></ul><ul><li>data-&gt;size = sizeof(data_to_bytestring(song_data)); </li></ul><ul><li>dbp-&gt;put(dbp, NULL, &amp;keydbt, &amp;datadbt, 0); </li></ul>
    • 31. Remove a Song <ul><li>If the song belongs to an album, then we’ve got multiple databases to update. </li></ul><ul><li>This is the kind of update that requires use of DB_CDB_ALL. </li></ul>
    • 32. Removing a song <ul><li>Look up song by title (get song ID). </li></ul><ul><li>Remove song from song database. </li></ul><ul><li>Create cursor on album-by-song secondary </li></ul><ul><li>Foreach album containing song </li></ul><ul><li>Remove song from album (requires writing album) </li></ul><ul><li>Close album cursor </li></ul>
    • 33. Remove song (code) <ul><ul><li>DB *sbtdbp; /* song-by-title index handle */ </li></ul></ul><ul><ul><li>DB *songs; /* song database handle */ </li></ul></ul><ul><ul><li>DBT keydbt, pkeydbt, datadbt; </li></ul></ul><ul><ul><li>/* 1. Look up primary key of song. */ </li></ul></ul><ul><ul><li>memset(&amp;datadbt, 0, sizeof(datadbt)); </li></ul></ul><ul><ul><li>memset(&amp;keydbt, 0, sizeof(keydbt)); </li></ul></ul><ul><ul><li>memset(&amp;pkeydbt, 0, sizeof(pkeydbt)); </li></ul></ul><ul><ul><li>key-&gt;data = “Time in a bottle”; </li></ul></ul><ul><ul><li>key-&gt;size = 17; </li></ul></ul><ul><ul><li>sbtdbp-&gt;pget(dbc, &amp;keydbt, &amp;pkeydbt, &amp;datadbt, DB_SET); </li></ul></ul><ul><ul><li>/* 2. Now delete song. */ </li></ul></ul><ul><ul><li>songs-&gt;del(songs, &amp;pkeydbt, 0); </li></ul></ul>
    • 34. Remove Song (code) <ul><ul><li>DBP *absdbp; /* Album-by-song secondary handle */ </li></ul></ul><ul><ul><li>DBC *adbc = NULL; </li></ul></ul><ul><ul><li>DBT akeydbt; /* Holds primary key of album. */ </li></ul></ul><ul><ul><li>/* 3. Create cursor on album-by-song secondary. */ </li></ul></ul><ul><ul><li>memset(&amp;datadbt, 0, sizeof(datadbt)); </li></ul></ul><ul><ul><li>memset(&amp;akeydbt, 0, sizeof(keydbt)); </li></ul></ul><ul><ul><li>absdbp-&gt;cursor(absdbp, NULL, &amp;adbc, 0); </li></ul></ul><ul><ul><li>/* </li></ul></ul><ul><ul><li>* Opening a cursor requires a READ lock on both the </li></ul></ul><ul><ul><li>* secondary index AND the primary database (albums). </li></ul></ul><ul><ul><li>*/ </li></ul></ul>
    • 35. Remove Song <ul><ul><li>/* 4. Foreach album containing song. */ </li></ul></ul><ul><ul><li>while (ret = </li></ul></ul><ul><ul><li>adbc-&gt;pget(adbc, &amp;pkeydbt, &amp;akeydbt, &amp;datadbt, DB_SET; </li></ul></ul><ul><ul><li>ret == 0; ret = adbc-&gt;get(adbc, </li></ul></ul><ul><ul><li>&amp;keydbt, &amp;akeydbt, &amp;datadbt, DB_NEXT)) { </li></ul></ul><ul><ul><li>/* 5. Remove the song from this album. */ </li></ul></ul><ul><ul><li>… remove song ID from song array in datadbt-&gt;data … </li></ul></ul><ul><ul><li>ret = album-&gt;put(album, &amp;akeydbt, &amp;datadbt, 0); </li></ul></ul><ul><ul><li>} </li></ul></ul><ul><ul><li>/* 6. Close cursor. */ </li></ul></ul><ul><ul><li>adbc-&gt;close(adbc, 0); </li></ul></ul>
    • 36. Why we need DB_CDB_ALL <ul><li>The cursor that iterates over the album secondary will acquire read locks on both the secondary index and the primary database. </li></ul><ul><li>When we update the album data, we need to write lock the primary database. </li></ul><ul><li>Without DB_CDB_ALL, these calls will deadlock. </li></ul><ul><li>With DB_CDB_ALL, we acquire a single lock that protects all the databases. </li></ul>
    • 37. Use Cases <ul><li>Managing a music database. </li></ul><ul><li>Storing and indexing XML. </li></ul>
    • 38. BDB XML <ul><li>Berkeley DB XML is built atop Berkeley DB. </li></ul><ul><li>Can configure DBXML to use CDS. </li></ul><ul><li>Let’s look at how XML uses BDB and the implications of CDS locking. </li></ul>
    • 39. XML Containers <ul><li>A container is a file containing multiple databases: </li></ul><ul><ul><li>A document database </li></ul></ul><ul><ul><li>Index databases </li></ul></ul><ul><ul><li>Index statistic database </li></ul></ul><ul><li>Operations on containers: </li></ul><ul><ul><li>Put/get documents </li></ul></ul><ul><ul><li>Update document </li></ul></ul><ul><ul><li>Add/remove indexes </li></ul></ul><ul><ul><li>Query using XQuery </li></ul></ul><ul><ul><li>Administration: create, remove, rename … </li></ul></ul>
    • 40. DbXML Internal Basics <ul><li>A default index provides keyed access to all tags in all documents. </li></ul><ul><li>XML references all tags by ID number. </li></ul><ul><li>Dictionary maps from tag-name to ID number. </li></ul><ul><li>Most operations touch multiple databases. </li></ul><ul><li>When using CDS, must use DB_CDB_ALL. </li></ul><ul><li>Frequently used multi-threaded. </li></ul>
    • 41. Add Default Index <ul><li>For each document in the container: </li></ul><ul><ul><li>Create an event reader (cursor) for the document </li></ul></ul><ul><ul><li>Iterate over the document&apos;s nodes </li></ul></ul><ul><ul><ul><li>If node tag, not in dictionary, add to dictionary </li></ul></ul></ul><ul><ul><ul><li>Collect index information into “keystash” </li></ul></ul></ul><ul><ul><li>Close event reader (cursor) </li></ul></ul><ul><ul><li>Take all items in keystash and add to index. </li></ul></ul>
    • 42. Add Default Index with cdsgroup_begin <ul><li>For each document in the container: </li></ul><ul><ul><li>Start cdsgroup. </li></ul></ul><ul><ul><li>Create an event reader (cursor) for the document. </li></ul></ul><ul><ul><li>Iterate over the document&apos;s nodes </li></ul></ul><ul><ul><ul><li>If node tag, not in dictionary, add to dictionary </li></ul></ul></ul><ul><ul><ul><li>Collect index information into “keystash” </li></ul></ul></ul><ul><ul><li>Close event reader (cursor) </li></ul></ul><ul><ul><li>End cdsgroup. </li></ul></ul><ul><ul><li>Take all items in keystash and add to index. </li></ul></ul>
    • 43. Coding with cdsgroup_begin <ul><ul><li>/* </li></ul></ul><ul><ul><li>* Assume that we have the document DB (doc_dbp) and </li></ul></ul><ul><ul><li>* dictionary DB (dict_dbp) open. </li></ul></ul><ul><ul><li>*/ </li></ul></ul><ul><ul><li>DB_TXN *tid; </li></ul></ul><ul><ul><li>DBC *dbc; </li></ul></ul><ul><ul><li>DBT keydbt; datadbt; </li></ul></ul><ul><ul><li>memset(&amp;keydbt, 0, sizeof(keydbt)); </li></ul></ul><ul><ul><li>memset(&amp;datadbt, 0, sizeof(datadbt)); </li></ul></ul><ul><ul><li>/* 1. Start cds group. */ </li></ul></ul><ul><ul><li>ret = dbenv-&gt;cdsgroup_begin(dbenv, &amp;tid); </li></ul></ul><ul><ul><li>/* 2. Create cursor on document database. </li></ul></ul><ul><ul><li>ret = doc_dbp-&gt;cursor(doc_dbp, tid, &amp;dbc, 0); </li></ul></ul>
    • 44. cdsgroup_begin (cont) <ul><ul><li>/* 3. Iterate over nodes in document. </li></ul></ul><ul><ul><li>while (ret = dbc-&gt;get(dbc, &amp;keydbt, &amp;datadbt, DB_FIRST); </li></ul></ul><ul><ul><li>ret == 0; </li></ul></ul><ul><ul><li>ret = dbc-&gt;get(dbc, &amp;keydbt, &amp;datadbt, DB_NEXT)) { </li></ul></ul><ul><ul><li>/* 4. Enter node in dictionary. */ </li></ul></ul><ul><ul><li>data_to_node(&amp;datadbt, nodename, nodeid); </li></ul></ul><ul><ul><li>ret = dict_dbp-&gt;put(dict_dbp, tid, nodename, nodeid, 0); </li></ul></ul><ul><ul><li>/* 5. Save index info into keystash. */ </li></ul></ul><ul><ul><li>} </li></ul></ul><ul><ul><li>/* 6. Close cursor. */ </li></ul></ul><ul><ul><li>dbc-&gt;close(adbc, 0); </li></ul></ul><ul><ul><li>/* 7.End group. */ </li></ul></ul><ul><ul><li>tid-&gt;commit(tid, 0); </li></ul></ul>
    • 45. Concurrent Data Store: What we Learned <ul><li>What is concurrent data store (CDS)? </li></ul><ul><li>When to use CDS. </li></ul><ul><li>Updating data with CDS. </li></ul><ul><li>Using DB_CDB_ALL and cdsgroup_begin. </li></ul><ul><li>(More) Use of secondary indices. </li></ul>
    • 46. End of Part II

    ×