Oracle Berkeley DB - Data Storage (DS) Tutorial

A deep dive presentation into the Oracle Berkeley DB Data Storage APIs.

Speaker Notes

  • Goal of the overview is to leave you with a sense of when Berkeley DB might be the right solution to a problem. Other things to include: DBTs and their memory management, configuring environments and databases, other access methods.
  • Not a rigorous set of numbers, but a feeling for how fast this can go.
  • Before we get into too much depth, let’s look at how easy it is to code BDB.
  • Expect command line to accept options to: set number of items in DB, remove DB or not, put items in DB.
  • Create handle/create/open database
  • Skip initialization of datadbt for space reasons only; timer code omitted as well.
  • Removed some printf and assertions
  • dbp->close() is a destructor (as is remove).
  • In this part of the tutorial we are going to concentrate on the simplest version of our product, data store. However, here is an overview of the richer feature sets and other products.
  • Goal of the overview is to leave you with a sense of when Berkeley DB might be the right solution to a problem: Things this use case addresses: Environments, databases, cursors, get, put, data organization
  • Berkeley DB is used to collect, aggregate, and forward event and performance records. A DB_QUEUE acts as a buffer to hold unsolicited events from managed devices. Periodically, the queue is drained. Archival, inventory, event, and performance records are collected in per-device Btrees, which are exported on a daily basis into CSV files. If the switch crashes, the data is no longer useful, so no recovery is needed.
  • There are other flags; these are the main ones. We will talk about recovery after we've discussed transactional applications. Open is the same as create, without the DB_CREATE flag.
  • It is possible to create databases without a dbenv, but we see people doing this less and less often. Set the record length and pad character (for short data). Describe the various combinations of filename and database name being NULL or non-NULL: if both filename and database name are specified, the file can contain multiple databases; if filename is specified but database name is NULL, it's a file with only one database in it; if both filename and database name are NULL, it's an in-memory database; if filename is NULL and database name is specified, it's a named in-memory database and may participate in a replication system.
  • DB_CONSUME will get and delete; repeated calls keep consuming the record at the head of the queue. Another form, DB_CONSUME_WAIT, will block until there is something in the queue.
  • Note that a relational database would probably require you to have N databases for N record types.
  • Emphasize that the other fields in these records could be completely different from each other -- different fields and different sizes. Note that we will need to deal with differentiating records of different types, but we'll come to this.
  • Type field will allow us to differentiate
  • Comment that there are LOTS of things you can configure -- DUPLICATES is just one -- page size, comparison function, etc. This produces a highly configurable storage engine.
  • Think of this arrangement as a kind of partitioning of the database into four different segments, one for each type.
  • NEED TO UPDATE CODE
  • Type field will allow us to differentiate
  • NEED TO UPDATE CODE
  • Picking between 3 and 4 really depends on what you want your spreadsheets to look like and what kinds of queries you want to run on the data.
  • NEED TO UPDATE CODE
  • Typically, dbremove is the better choice; use db->remove if you don’t have an environment (dbenv->dbremove can be transactional). Now you can/should recreate the database.
  • Replace dumb green boxes with icons.
  • Customer database, order database, phone log database. Secondaries for customers on phone number and zip code.
  • Catalog might need duplicates (old versions of a product or the like). Order: could be a non-duplicated thing with a variable number of “items” OR a duplicated thing with some number of items as duplicates; or non-duplicates with a counter added in the key.
  • Might want to have a reverse index from items to every order in which someone ordered a particular item.
  • Blue is the order info; red and black are the items. Fields are: customer_id; date; total; shipdate; nitems.
  • Order_item array has been removed from order_info
  • Blue is the order info; red and black are the items. Fields are: customer_id; date; total; shipdate; nitems.
  • Note: Error checking omitted.
  • Blue is the order info; red and black are the items. Fields are: customer_id; date; total; shipdate; nitems.
  • Note: Error checking omitted.
  • Secondaries only work on primaries WITHOUT duplicates

Transcript

  • 1. A Use-Case Based Tutorial Oracle Berkeley DB Data Store
  • 2. Part I: Data Store (DS)
    • Overview
      • Creating and removing databases
      • Getting and putting data
    • Case Studies
      • Telecom use case
      • Customer account management
  • 3. Berkeley DB Data Store
    • Simple indexed data storage
    • Attributes
      • Blindingly fast
      • Easy to use
      • APIs in the languages of your choice
    • Limitations
      • No concurrency
      • No recovery
  • 4. Blindingly Fast
    • Platform Description
      • 2.33 GHz Intel Core 2 Duo
      • 2 GB 667 MHz DDR2 SDRAM
      • Mac OS X 10.4.10
      • Window manager and apps all up and running
    • Insert 1,000,000 8-byte keys with 25-byte data
      • 180,000 inserts/second
    • Retrieve same keys/data
      • 750,000 gets/second
    • Bulk get of same key/data pairs
      • 2,500,000 pairs/second
  • 5. Easy to Use
    • Code for benchmark program
      • Main loop
      • Create/open database
      • Put keys
      • Get keys
      • Close/remove database
      • (Omitted) Command line processing, key generation
  • 6. Main Program
    • main(argc, argv)
    • int argc;
    • char *argv[];
    • {
    • DB *dbp;
    • int do_n, rm_db, skip_put;
    • do_n = DEFAULT_N;
    • rm_db = 0;
    • skip_put = 0;
    • do_cmdline(argc, argv, &do_n, &rm_db, &skip_put);
    • create_database(&dbp);
    • if (!skip_put)
    • do_put(dbp, do_n);
    • do_bulk_get(dbp, do_n);
    • do_get(dbp);
    • close_database(dbp, rm_db);
    • }
  • 7. Create/Open Database
    • static void
    • create_database(dbpp)
    • DB **dbpp;
    • {
    • DB *dbp;
    • /* Create database handle. */
    • assert(db_create(&dbp, NULL, 0) == 0);
    • /* Open/Create btree database. */
    • assert(dbp->open(dbp,
    • NULL, DBNAME, NULL, DB_BTREE, DB_CREATE, 0644) == 0);
    • *dbpp = dbp;
    • }
  • 8. Put Data
    • static void
    • do_put(dbp, n)
    • DB *dbp;
    • int n;
    • {
    • char key[KEYLEN];
    • DBT datadbt, keydbt;
    • int i;
    • /* Initialize key DBT (repeat with Data, omitted) */
    • strncpy(key, FIRST_KEY, KEYLEN);
    • memset(&keydbt, 0, sizeof(keydbt));
    • keydbt.data = key;
    • keydbt.size = KEYLEN;
    • for (i = 0; i < n; i++) {
    • assert(dbp->put(dbp,
    • NULL, &keydbt, &datadbt, 0) == 0);
    • next_key(key);
    • }
    • }
  • 9. Get Data
    • static void
    • do_get(dbp)
    • DB *dbp;
    • {
    • DBC *dbc;
    • DBT keydbt, datadbt;
    • int n, ret;
    • assert(dbp->cursor(dbp, NULL, &dbc, 0) == 0);
    • memset(&keydbt, 0, sizeof(keydbt));
    • memset(&datadbt, 0, sizeof(datadbt));
    • n = 0;
    • TIMER_START;
    • while ((ret = dbc->get(dbc,
    • &keydbt, &datadbt, DB_NEXT)) == 0) {
    • n++;
    • }
    • TIMER_STOP;
    • TIMER_DISPLAY(n);
    • }
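
The main program above also calls do_bulk_get(), which is not shown on these slides. A minimal sketch of what it might look like, using the cursor bulk-retrieval flag DB_MULTIPLE_KEY and the DB_MULTIPLE_INIT / DB_MULTIPLE_KEY_NEXT macros; the buffer size is an arbitrary assumption and error handling is reduced to asserts, following the style of the other examples:

    static void
    do_bulk_get(dbp, n)
        DB *dbp;
        int n;
    {
        DBC *dbc;
        DBT keydbt, datadbt;
        void *p, *retkey, *retdata;
        size_t retklen, retdlen;
        static char buf[256 * 1024];    /* Bulk buffer; must be a multiple of 1024. */
        int count, ret;

        assert(dbp->cursor(dbp, NULL, &dbc, 0) == 0);
        memset(&keydbt, 0, sizeof(keydbt));
        memset(&datadbt, 0, sizeof(datadbt));
        datadbt.data = buf;
        datadbt.ulen = sizeof(buf);
        datadbt.flags = DB_DBT_USERMEM;
        count = 0;
        while ((ret = dbc->get(dbc,
            &keydbt, &datadbt, DB_MULTIPLE_KEY | DB_NEXT)) == 0) {
            /* Unpack the key/data pairs packed into the buffer. */
            DB_MULTIPLE_INIT(p, &datadbt);
            for (;;) {
                DB_MULTIPLE_KEY_NEXT(p, &datadbt,
                    retkey, retklen, retdata, retdlen);
                if (p == NULL)
                    break;
                count++;
            }
        }
        assert(dbc->close(dbc) == 0);
    }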
  • 10. Close and Remove Database
    • static void
    • close_database(dbp, remove)
    • DB *dbp;
    • int remove;
    • {
    • assert(dbp->close(dbp, 0) == 0);
    • if (remove) {
    • assert(db_create(&dbp, NULL, 0) == 0);
    • assert(dbp->remove(dbp,
    • DBNAME, NULL, 0) == 0);
    • }
    • }
  • 11. APIs in many Languages
    • C/C++
    • Java
    • Python
    • Perl
    • Tcl
    • Ruby
    • Etc.
  • 12. Berkeley DB Product Family (diagram): Berkeley DB (C, C++, and Java APIs; DS, CDS, TDS, and HA), Berkeley DB XML (C++ and Java APIs), and Berkeley DB Java Edition (Java and Java Collections APIs; CDS, TDS). Runs on UNIX, Linux, MacOS X, Windows, VxWorks, QNX, etc. APIs for C, C++, Java, Perl, Python, PHP, Ruby, Tcl, Eiffel, etc.
  • 13. Part I: Data Store
    • Overview
      • Creating and removing databases
      • Getting and putting data
    • Case Studies
      • Telecom Use Case
        • Creating/using an environment
        • Filling/draining a queue
        • Btrees: storing different record types in a single database
        • Cursors
      • Customer account management
  • 14. Telecommunications Use Case
    • Radio access network optimization system
    • Vendor-independent multi-service multi-protocol RAN aggregation plus optimization
    • Interoperable with all major base stations
    • Proven, world-wide deployment
    • Functional over range of operating conditions
    • Low TCO
  • 15. Berkeley DB in Telecom Gateway (diagram): unsolicited events from managed devices land in a BDB Queue; the queue is drained into per-device BDB Btrees holding Archive, Inventory, Event and Performance records, which are exported to CSV.
  • 16. BDB Access Methods
    • DB_QUEUE/DB_BTREE:
      • Berkeley DB supports a variety of different indexing mechanisms.
      • Applications select the appropriate mechanism for the data management task at hand.
      • For example:
        • DB_QUEUE: FIFO order, fixed size data, numerical keys
        • DB_BTREE: Clustered lookup, variable length keys/data
    • Other index types (access methods)
      • Hash: truly large unordered data
      • Recno: data with no natural key; numerical keys assigned
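
The access method is selected per database through the DBTYPE argument to DB->open(); a minimal sketch, assuming an already-opened environment handle dbenv and a placeholder file name:

    DB *dbp;
    int ret;

    if ((ret = db_create(&dbp, dbenv, 0)) != 0)
        return (ret);
    /* Choose the index structure here: DB_BTREE, DB_HASH, DB_QUEUE or DB_RECNO. */
    ret = dbp->open(dbp, NULL, "records.db", NULL, DB_BTREE, DB_CREATE, 0644);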
  • 17. Databases & Environments
    • Each different data collection is a database.
    • Databases may be gathered together into environments.
    • An environment is a logically related collection of databases.
    • Example:
      • The NMS queue and per-device btrees all reside in a single environment for a single instance of AccessGate.
  • 18. Creating Environments
    • Create the handle using db_env_create()
    • Open using DB_ENV->open()
      • Use DB_CREATE flag to create a new environment.
      • Use DB_INIT_MPOOL for a shared memory buffer pool
      • Use DB_PRIVATE to keep environment in main memory.
      • Use DB_THREAD to allow threaded access to the DB_ENV handle.
  • 19. Environment Example
    • /* Create handle. */
    • ret = db_env_create(&dbenv, 0);
    • … error_handling …
    • ret = dbenv->set_cachesize(dbenv, 1, 0, 1);
    • … error_handling …
    • /* Other configuration here. */
    • /* Create and open the environment. */
    • flags = DB_CREATE | DB_INIT_MPOOL;
    • ret = dbenv->open(dbenv, HOME, flags, 0);
    • … error_handling …
  • 20. Filling and Draining a Queue
    • Creating a queue database.
    • Appending into a queue.
    • Draining a queue
    • Important attributes:
      • Queue supports only fixed-length records
      • Queue “keys” are integer record numbers
      • Queues do support random access
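
Since queue keys are record numbers, a specific record can also be read directly by its number rather than consumed in FIFO order; a minimal sketch, assuming dbp is an open DB_QUEUE handle:

    DBT key, data;
    db_recno_t recno;
    int ret;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    recno = 5;                  /* Read record number 5 without removing it. */
    key.data = &recno;
    key.size = sizeof(recno);
    ret = dbp->get(dbp, NULL, &key, &data, 0);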
  • 21. Creating a Queue Database
    • Create the handle using db_create()
    • Open using DB->open()
      • Use DB_CREATE flag to create a new database
    • Close using DB->close()
      • Never close a database with open cursors or transactions.
  • 22. Creating a Queue
    • /* Create a database handle. */
    • ret = db_create(&dbp, dbenv, 0);
    • … error_handling …
    • /* Set the record length to 256 bytes. */
    • ret = dbp->set_re_len(dbp, 256);
    • … error_handling …
    • ret = dbp->set_re_pad(dbp, (int)' ');
    • … error_handling …
    • /* Create and open the database. */
    • ret = dbp->open(dbp, NULL, file, database,
    • DB_QUEUE,DB_CREATE, 0666);
    • … error_handling …
  • 23. Enqueuing Data
    • DB->put() inserts data into a database.
    • Appending onto the end of a queue:
      • DBT key, data;
      • memset(&key, 0, sizeof(key));
      • memset(&data, 0, sizeof(data));
      • data.data = "fill in the record here";
      • data.size = DATASIZE;
      • ret = dbp->put(dbp, NULL, &key, &data, DB_APPEND);
      • if (ret != 0)
      • … error_handling …
      • /* key.data contains the new record number. */
  • 24. Draining the Queue
    • DB->get() retrieves records
      • memset(&key, 0, sizeof(key));
      • memset(&data, 0, sizeof(data));
      • while ((ret = dbp->get(dbp,
      • NULL, &key, &data, DB_CONSUME)) == 0) {
      • /* Do something with record. */
      • }
      • if (ret != 0)
      • … error_handling …
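
As noted earlier, DB_CONSUME returns DB_NOTFOUND once the queue is empty. A dedicated drain thread can instead use DB_CONSUME_WAIT, which blocks until a record is available; a minimal sketch, reusing the key and data DBTs from above:

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    for (;;) {
        /* Blocks until a record arrives, then removes and returns it. */
        if ((ret = dbp->get(dbp, NULL, &key, &data, DB_CONSUME_WAIT)) != 0)
            break;
        /* Do something with record. */
    }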
  • 25. Btrees: Different record types in a single database
    • One btree per device
    • Each btree contains four record types
      • Archive
      • Inventory
      • Event
      • Performance
    • Not necessary to have N databases for N record types.
  • 26. Record Types
    • Different records (with different fields/sizes):
    • struct archive {
    • int32_t field1;
    • … other fields here
    • };
    • struct event {
    • int64_t field1;
    • … other fields here
    • };
    • struct inventory {
    • double field1;
    • … other fields here
    • };
    • struct perf {
    • char field1[16];
    • … other fields here
    • };
    • Assumptions:
      • Every record has a timestamp
      • May need to encode types of different records
  • 27. Data Design 1
    • Index database by timestamp
      • Add “type” field to every record.
      • Allow duplicate data records for any timestamp
    • Advantages:
      • Groups data temporally
    • Disadvantages:
      • Difficult to get records of a particular type
  • 28. Allowing Duplicates
    • int create_database(DB_ENV *dbenv, DB **dbpp, char *name)
    • {
    • DB *dbp;
    • int ret;
    • if ((ret = db_create(&dbp, dbenv, 0)) != 0)
    • return (ret);
    • if ((ret = dbp->set_flags(dbp, DB_DUP)) != 0 ||
    • (ret = dbp->open(dbp, NULL, name, NULL, DB_BTREE, 0, 0644)) != 0) {
    • (void)dbp->close(dbp, 0);
    • return (ret);
    • }
    • *dbpp = dbp;
    • return (ret);
    • }
  • 29. Inserting a Record (design 1)
    • int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
    • {
    • DBT key, data;
    • struct archive a;
    • memset(&key, 0, sizeof(key));
    • memset(&data, 0, sizeof(data));
    • key.data = &ts;
    • key.size = sizeof(ts);
    • a.type = ARCHIVE_TYPE;
    • BUFFER_TO_ARCHIVE(buf, a);
    • data.data = &a;
    • data.size = sizeof(a);
    • return (dbp->put(dbp, NULL, &key, &data, 0));
    • }
  • 30. Data Design 2
    • Index database by type
      • Add “timestamp” field to every record.
      • Allow duplicate data records for any type.
    • Advantages:
      • Groups data by type
    • Disadvantages:
      • Difficult to get records for a given time
      • Many records will have the same type
  • 31. Inserting a Record (design 2)
    • int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
    • {
    • DBT key, data;
    • RECTYPE type;
    • struct archive a;
    • memset(&key, 0, sizeof(key));
    • memset(&data, 0, sizeof(data));
    • type = ARCHIVE_TYPE;
    • key.data = &type;
    • key.size = sizeof(type);
    • a.timestamp = ts;
    • BUFFER_TO_ARCHIVE(buf, a);
    • data.data = &a;
    • data.size = sizeof(a);
    • return (dbp->put(dbp, NULL, &key, &data, 0));
    • }
  • 32. Data Design 3
    • Index database by type and timestamp
      • May not need to allow duplicates
    • Advantages:
      • Groups data by type and by timestamp within type
      • Fast retrieval by type
      • Reasonable retrieval by timestamp
    • Disadvantages:
      • Nothing obvious
  • 33. Inserting a Record (design 3)
    • int insert_archive_record(DB *dbp, timestamp_t ts, char *buf)
    • {
    • DBT key, data;
    • struct archive a;
    • struct akey {
    • timestamp_t ts;
    • RECTYPE type;
    • } archive_key;
    • memset(&key, 0, sizeof(key));
    • memset(&data, 0, sizeof(data));
    • archive_key.ts = ts;
    • archive_key.type = ARCHIVE_TYPE;
    • key.data = &archive_key;
    • key.size = sizeof(archive_key);
    • BUFFER_TO_ARCHIVE(buf, a);
    • data.data = &a;
    • data.size = sizeof(a);
    • return (dbp->put(dbp, NULL, &key, &data, 0));
    • }
  • 34. Data Design 4
    • Create four databases in a single file
      • Key each database by timestamp
    • Advantages:
      • Groups data by type
      • Fast retrieval by type
      • Reasonable retrieval by timestamp
    • Disadvantages:
      • Nothing obvious
  • 35. Multiple Databases in a File
    • int create_database(DB_ENV *dbenv, DB **db_array, char **names)
    • {
    • DB *dbp;
    • int i, ret;
    • for (i = 0; i < 4; i++) {
    • db_array[i] = dbp = NULL;
    • if ((ret = db_create(&dbp, dbenv, 0)) != 0 ||
    • (ret = dbp->open(dbp,
    • NULL, "DBFILE", names[i], DB_BTREE, 0, 0644)) != 0) {
    • if (dbp != NULL)
    • (void)dbp->close(dbp, 0);
    • break;
    • }
    • db_array[i] = dbp;
    • }
    • if (ret != 0) {
    • /* Close all non-NULL entries in db_array. */
    • }
    • return (ret);
    • }
  • 36. Inserting a Record (design 4)
    • int insert_record(DB **db_array, timestamp_t ts, RECTYPE type, char *buf, size_t buflen)
    • {
    • DBT key, data;
    • int ret;
    • memset(&key, 0, sizeof(key));
    • memset(&data, 0, sizeof(data));
    • key.data = &ts;
    • key.size = sizeof(ts);
    • data.data = buf;
    • data.size = buflen;
    • if (type == ARCHIVE_TYPE)
    • ret = put_rec(db_array[ARCHIVE_NDX], &key, &data);
    • else if (type == EVENT_TYPE)
    • ret = put_rec(db_array[EVENT_NDX], &key, &data);
    • else if (type == INVENTORY_TYPE)
    • ret = put_rec(db_array[INVENTORY_NDX], &key, &data);
    • else if (type == PERF_TYPE)
    • ret = put_rec(db_array[PERF_NDX], &key, &data);
    • else
    • ret = EINVAL; /* Signal invalid record type. */
    • return (ret);
    • }
  • 37. Inserting a Record (cont)
    • int put_rec(DB *dbp, DBT *key, DBT *data)
    • {
    • return (dbp->put(dbp, NULL, key, data, 0));
    • }
  • 38. Data Design:Summary
    • Berkeley DB’s schemaless design provides applications flexibility.
    • Tune data organization for performance, ease of programming, or other criteria.
    • Create indexes based upon query needs.
  • 39. Exporting to CSV
    • Assume that we used design 4
      • Four databases in a single file
      • One database per record type
    • Let’s say that we want one CSV file per database.
    • Each CSV export will look very similar:
      • Use a cursor to iterate over the database.
      • For each record, output the CSV entry.
  • 40. Iteration with cursors
    • A cursor represents a position in the database.
    • The DB->cursor method creates a cursor.
    • Cursors have methods to get/put data.
      • DBC->del
      • DBC->get
      • DBC->put
    • Close cursors when done.
  • 41. Iteration (code)
    • int archive_to_csv(DB *dbp)
    • {
    • DBC *dbc;
    • DBT key, data;
    • struct archive *a;
    • int ret;
    • if ((ret = dbp->cursor(dbp, NULL, &dbc, 0)) != 0)
    • return (ret);
    • memset(&key, 0, sizeof(key));
    • memset(&data, 0, sizeof(data));
    • while ((ret = dbc->get(dbc, &key, &data, DB_NEXT)) == 0) {
    • a = (struct archive *)data.data;
    • /* Output archive structure into CSV */
    • }
    • if (ret == DB_NOTFOUND)
    • ret = 0;
    • (void)dbc->close(dbc);
    • return (ret);
    • }
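
The CSV-formatting step elided by the comment in the loop above might reduce to a formatted print per record; csv_fp and the choice of fields are assumptions for illustration:

    /* csv_fp is an already-open FILE *; print one CSV line per record,
     * starting with field1 and continuing with the remaining fields. */
    fprintf(csv_fp, "%ld", (long)a->field1);
    /* ... other fields here ... */
    fputc('\n', csv_fp);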
  • 42. Cleaning out the Database
    • After creating the CSV, we want to remove the data from the Berkeley DB databases.
    • Two options:
      • Delete and recreate the database.
      • Truncate the database (leave the database and remove all the data).
  • 43. Removing a Database
    • Removal with a DB handle (un-opened)
    • ret = db_create(&dbp, NULL, 0);
    • … error_handling …
    • ret = dbp->remove(dbp, file, database, 0);
    • … error handling …
    • Removal with a DB_ENV (no DB handle)
    • ret = dbenv->dbremove(dbenv, NULL,
    • file, database, 0);
    • … error handling …
  • 44. Truncating a Database
    • Truncate with an open DB handle
    • ret = dbp->truncate(dbp, NULL, &nrecords, 0);
    • … error handling …
    • Nrecords returns the number of records removed from the database during the truncate.
  • 45. Part I: Data Store
    • Overview
      • Creating and removing databases
      • Getting and putting data
    • Case Studies
      • Telecom Use Case
      • Customer account management
        • Secondary indexes
        • Cursor-based manipulation
        • Configuration and tuning
  • 46. Customer Account Management (diagram): a central database holding customer data, orders, and a call log, used by support, warehouse, and manager roles for operations such as log call, create order, customer history, add customer, fulfill orders, view last week's calls, and labels by zip.
  • 47. The Schema
    • Customer Data: get by name, get by ID, update, add new
    • Catalog Data: get by name, get by ID
    • Call log: append, get by time, get by customer
    • Order Data: get by ID, get by customer, update, add new
  • 48. Secondary Indexes
    • Note that most of the data objects require lookup in (at least) two different ways.
    • Berkeley DB keys are primary (clustered) indexes.
    • Berkeley DB also supports secondary (unclustered) indexes.
      • Automatic maintenance in the presence of insert, delete, update
  • 49. Selecting keys and indexes
    • Customer: primary key Customer ID; secondary key Name; duplicates: No
    • Catalog: primary key Product ID; secondary key Name; duplicates: No
    • Order: primary key Order ID; secondary key Customer; duplicates: Maybe
    • Calls: primary key Timestamp; secondary key Customer; duplicates: No
  • 50. Secondaries internally
    • Customer Primary Database (key → data):
      • 01928374 → IBM; Hawthorne, NY …
      • 19283746 → HP; Cupertino, CA …
      • 28910283 → Dell; Austin, TX …
    • Customer Secondary (key → data):
      • Dell → 28910283
      • IBM → 01928374
      • HP → 19283746
  • 51. Secondaries Programmatically
    • Create new database to contain secondary.
    • Define callback function that extracts secondary key from primary key/data pair.
    • Associate secondary with (open) primary.
  • 52. Customer Secondary (1)
    • Assume our customer data item contains a structure:
    • struct customer {
    • char name[20];
    • char address1 [20];
    • char address2 [20];
    • char state[2];
    • int zip;
    • };
  • 53. Customer Secondary (2)
    • Need callback function to extract secondary key, given a primary key/data pair:
    • int customer_callback(DB *dbp, DBT *key, DBT *data, DBT *result)
    • {
    • struct customer *rec;
    • rec = data->data;
    • result->data = rec->name;
    • result->size = sizeof(rec->name);
    • return (0);
    • }
  • 54. Customer Secondary (3)
    • Associate the (open) primary with the (open) secondary:
    • ret = pdbp->open(pdbp, NULL, "customer.db", NULL, DB_BTREE, 0, 0644);
    • if (ret != 0)
    • return (ret);
    • ret = sdbp->open(sdbp, NULL, "custname.db", NULL, DB_BTREE, 0, 0644);
    • if (ret != 0){
    • (void)pdbp->close(pdbp, 0);
    • return (ret);
    • }
    • ret = pdbp->associate(pdbp, NULL, sdbp, customer_callback, 0);
    • if (ret != 0)
    • return (ret);
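
Once associated, a lookup by name goes through the secondary; DB->pget() additionally returns the matching primary key. A minimal sketch, assuming the sdbp handle opened above and the fixed-width name field used by the callback:

    DBT skey, pkey, data;
    struct customer *rec;
    char name[20] = "IBM";      /* Sized to match the callback's key (the 20-byte name field). */
    int ret;

    memset(&skey, 0, sizeof(skey));
    memset(&pkey, 0, sizeof(pkey));
    memset(&data, 0, sizeof(data));
    skey.data = name;
    skey.size = sizeof(name);
    if ((ret = sdbp->pget(sdbp, NULL, &skey, &pkey, &data, 0)) == 0) {
        rec = data.data;        /* The full customer record. */
        /* pkey.data now holds the customer ID from the primary database. */
    }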
  • 55. Representing Orders
    • What is an order?
      • An order number
      • A customer
      • A date/time
      • A collection of items
    • How do we represent orders?
      • Key by order number with items in a single data item
      • Key by order number with duplicates for items
      • Key by order number/sequence number
  • 56. Orders: Multiple Items in Data
    • typedef int order_id; /* Primary key */
    • struct order_info {
    • int customer_id;
    • time_t date;
    • float total;
    • time_t shipdate;
    • int nitems; /* Number of items to follow. */
    • char order_items[1]; /* Must marshal all items into here */
    • };
    • struct order_item {
    • int item_id;
    • int nitems; /* per-item price stored elsewhere */
    • float total;
    • };
  • 57. Orders: Multiple Items in Data -- Secondaries
    • Secondary on customer_id
      • Just like what we did on customer
      • Create a callback that extracts the customer_id field from the data field.
      • Associate secondary with primary.
    • Secondary on items
      • Need to return multiple secondary keys for a single primary record.
  • 58. Secondaries: Return multiple keys per key/data pair
    • int item_callback(DB *dbp, DBT *key, DBT *data, DBT *result)
    • {
    • struct order_info *order;
    • DBT *dbta;
    • int i;
    • order = data->data;
    • result->flags = DB_DBT_MULTIPLE | DB_DBT_APPMALLOC;
    • result->size = order->nitems;
    • result->data = calloc(order->nitems, sizeof(DBT));
    • if (result->data == NULL)
    • return (ENOMEM);
    • dbta = result->data;
    • for (i = 0; i < order->nitems; i++) {
    • dbta[i].size = sizeof(struct order_item);
    • dbta[i].data = order->order_items + i * sizeof(struct order_item);
    • }
    • return (0);
    • }
  • 59. Secondaries: Example of multiple keys per key/data pair
    • Order Primary Database (key → data):
      • 9876543 → 19283746;3/17/08;138.50;3/18/09;3 | 0003;2;130.00 | 0019;1;8.50
      • 8769887 → 19283877;3/16/08;1024.00;3/16/08;10 | 0092;10;1024.00
      • 5430987 → 90867654;3/16/08;564.98;;5 | 0003;3;195.00 | 0092;1;102.40 | 9902;1;267.58
    • Order/Item Secondary (key → data):
      • 0003 → 9876543, 5430987
      • 0019 → 9876543
      • 0092 → 8769887, 5430987
      • 9902 → 5430987
  • 60. Orders: Duplicates for Items
    • Modify previous structures slightly:
    • typedef int order_id; /* Primary key */
    • struct order_info {
    • int customer_id;
    • time_t date;
    • float total;
    • time_t shipdate;
    • int nitems; /* Number of items to follow. */
    • };
    • struct order_item {
    • int item_id;
    • int nitems; /* per-item price stored elsewhere */
    • float total;
    • };
    • First duplicate is order; rest are items
  • 61. Orders: Items as duplicates Example
    • Order Primary Database (key → duplicate data records):
      • 9876543 → 19283746;3/17/08;138.50;3/18/09 | 0003;2;130.00 | 0019;1;8.50
      • 8769887 → 19283877;3/16/08;1024.00;3/16/08;10 | 0092;10;1024.00
      • 5430987 → 90867654;3/16/08;564.98;;5 | 0003;3;195.00 | 0092;1;102.40 | 9902;1;267.58
  • 62. Orders: Duplicates for Items Inserting an order
    • int insert_order(int order_id, struct order_info *oi, struct order_item **oip)
    • {
    • DBT keydbt, datadbt;
    • int i;
    • memset(&keydbt, 0, sizeof(keydbt));
    • memset(&datadbt, 0, sizeof(datadbt));
    • keydbt.data = &order_id;
    • keydbt.size = sizeof(order_id);
    • datadbt.data = oi;
    • datadbt.size = sizeof(struct order_info);
    • dbp->put(dbp, NULL, &keydbt, &datadbt, DB_NODUPDATA);
    • for (i = 0; i < oi->nitems; i++) {
    • datadbt.data = *oip++;
    • datadbt.size = sizeof(struct order_item);
    • dbp->put(dbp, NULL, &keydbt, &datadbt, 0);
    • }
    • }
  • 63. Orders: Duplicates for Items Trade-offs
    • Must configure database for duplicates.
    • - No automatic secondaries (must hand-code).
    • + Easy to add/remove items from order.
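
Reading an order back under this design means positioning a cursor on the order ID with DB_SET and then walking its duplicates with DB_NEXT_DUP; a minimal sketch, assuming dbp is the open order database and with error checking omitted as in the slides:

    DBC *dbc;
    DBT key, data;
    int oid, ret;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    oid = 9876543;
    key.data = &oid;
    key.size = sizeof(oid);
    dbp->cursor(dbp, NULL, &dbc, 0);
    for (ret = dbc->get(dbc, &key, &data, DB_SET);
        ret == 0;
        ret = dbc->get(dbc, &key, &data, DB_NEXT_DUP)) {
        /* First duplicate is the order_info record; the rest are order_items. */
    }
    (void)dbc->close(dbc);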
  • 64. Orders: Key by order_id/seqno
    • struct pkey { /* Primary key */
    • int order_id;
    • int seqno;
    • };
    • struct order_info {
    • int customer_id;
    • time_t date;
    • float total;
    • time_t shipdate;
    • int nitems; /* Number of items to follow. */
    • };
    • struct order_item {
    • int item_id;
    • int nitems; /* per-item price stored elsewhere */
    • float total;
    • };
  • 65. Orders: key by order/seqno Example
    • Order Primary Database (key → data):
      • 9876543/00 → 19283746;3/17/08;138.50;3/18/09
      • 9876543/01 → 0003;2;130.00
      • 9876543/02 → 0019;1;8.50
      • 8769887/00 → 19283877;3/16/08;1024.00;3/16/08;10
      • 8769887/01 → 0092;10;1024.00
      • 5430987/00 → 90867654;3/16/08;564.98;;5
      • 5430987/01 → 0003;3;195.00
      • 5430987/02 → 0092;1;102.40
      • 5430987/03 → 9902;1;267.58
    • Order/Item Secondary (key → data):
      • 0003 → 9876543/01, 5430987/01
      • 0019 → 9876543/02
      • 0092 → 8769887/01, 5430987/02
      • 9902 → 5430987/03
  • 66. Orders: Key by order/seqno Inserting an order
    • int insert_order(int order_id, struct order_info *oi, struct order_item **oip)
    • {
    • DBT keydbt, datadbt;
    • struct pkey pkey;
    • int i;
    • memset(&keydbt, 0, sizeof(keydbt));
    • memset(&datadbt, 0, sizeof(datadbt));
    • pkey.order_id = order_id;
    • pkey.seqno = 0;
    • keydbt.data = &pkey;
    • keydbt.size = sizeof(pkey);
    • datadbt.data = oi;
    • datadbt.size = sizeof(struct order_info);
    • dbp->put(dbp, NULL, &keydbt, &datadbt, DB_NODUPDATA);
    • for (i = 0; i < oi->nitems; i++) {
    • pkey.seqno = i + 1;
    • datadbt.data = *oip++;
    • datadbt.size = sizeof(struct order_item);
    • dbp->put(dbp, NULL, &keydbt, &datadbt, 0);
    • }
    • }
  • 67. Order Trade-offs
    • ID Key w/multiple items in data
      • One get/put per order
      • No duplicates means automatic secondary support
      • Add/delete/update item is somewhat messier
    • ID key w/duplicate data items
      • Easy to add/delete/update order items
      • Must implement secondaries manually
      • Where do you place per-order info (CID, data) (first dup?)
    • Key is ID/sequence number
      • Easy to add/delete/update order items
      • Can support secondaries automatically
      • Where do you place per-order info (data item 0?)
  • 68. End of Part I