
Planning for Synchronization with Browser-Local Databases



Talk by Eric Farrar of Sybase at ZendCon 2009


  1. Preparing for Synchronization with Browser-local Databases. Eric Farrar, Product Manager, Sybase iAnywhere. October 20th, 2009
  2. Web applications are always online. Aren’t they? <ul><li>Traditionally, yes, but the lines are blurring </li></ul><ul><li>Web applications are starting to act like desktop applications </li></ul><ul><li>Desktop applications are starting to act like web applications </li></ul><ul><li>Offline application caching: </li></ul><ul><ul><li>Gears </li></ul></ul><ul><ul><li>HTML5 </li></ul></ul><ul><ul><li>Adobe AIR </li></ul></ul><ul><li>This is a new space for web applications, but well-known territory for desktop and mobile applications </li></ul>
  3. It Isn’t Only About Going Offline <ul><li>It’s also about speeding up online applications </li></ul><ul><li>Data-intensive applications can save many round-trips to the server by storing reference data locally </li></ul>
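A minimal sketch of that speed-up, assuming Python's built-in `sqlite3` module as a stand-in for a browser's SQLite store; `make_cache`, `lookup`, the `ref_data` table, and the `fetch_from_server` callback are hypothetical names. Reference data is read locally first, and the server is contacted only on a miss.

```python
import sqlite3
from typing import Callable


def make_cache() -> sqlite3.Connection:
    # In-memory SQLite database standing in for the browser-local store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ref_data (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    return conn


def lookup(conn: sqlite3.Connection, key: str,
           fetch_from_server: Callable[[str], str]) -> str:
    """Return reference data for `key`, making a server round-trip only on a local miss."""
    row = conn.execute("SELECT value FROM ref_data WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # local hit: no round-trip needed
    value = fetch_from_server(key)  # hypothetical network call
    conn.execute("INSERT INTO ref_data (key, value) VALUES (?, ?)", (key, value))
    conn.commit()
    return value


if __name__ == "__main__":
    round_trips = []

    def fake_server(key: str) -> str:
        round_trips.append(key)
        return key.upper()

    db = make_cache()
    lookup(db, "springfield", fake_server)
    lookup(db, "springfield", fake_server)
    print(len(round_trips))  # 1: the second lookup was served locally
```

A real cache would also need a way to refresh stale reference data, which this sketch leaves out.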
  4. Browser-Local Databases <ul><li>SQLite </li></ul><ul><ul><li>Gears </li></ul></ul><ul><ul><ul><li>Chrome </li></ul></ul></ul><ul><ul><ul><li>Android </li></ul></ul></ul><ul><ul><li>HTML5 specification draft </li></ul></ul><ul><ul><ul><li>Firefox </li></ul></ul></ul><ul><ul><ul><li>Safari </li></ul></ul></ul><ul><ul><li>Adobe AIR </li></ul></ul><ul><li>Semi-structured storage (key-value pairs) </li></ul><ul><ul><li>Cookies </li></ul></ul><ul><ul><li>Flash storage </li></ul></ul><ul><li>Isolated Storage </li></ul><ul><ul><li>Silverlight </li></ul></ul>
  5. But... <ul><li>For most applications, the ultimate destination for local data is a central server (the “consolidated database”) </li></ul><ul><li>This means two or more copies of the same data that can be changed independently </li></ul><ul><li>This introduces synchronization! </li></ul>
  6. Synchronization <ul><li>Synchronization is a huge topic </li></ul><ul><li>This talk aims to show two things: </li></ul><ul><ul><li>Synchronization is a complex problem that requires careful planning </li></ul></ul><ul><ul><li>The more planning done up front, the easier the synchronization process will be </li></ul></ul><ul><li>Although the two are related, this talk discusses synchronization of application data, not of the application itself (which is usually handled separately) </li></ul>
  7. How often are you synchronizing? <ul><li>Occasionally-connected </li></ul><ul><ul><li>Normal application state is disconnected </li></ul></ul><ul><ul><li>Connection is treated as a special case </li></ul></ul><ul><li>Occasionally-disconnected </li></ul><ul><ul><li>Normal application state is connected </li></ul></ul><ul><ul><li>Disconnection is treated as a special case </li></ul></ul>
  8. What are you synchronizing? <ul><li>Object-based synchronization </li></ul><ul><ul><li>Serialized objects (perhaps hierarchical) are the basic unit of synchronization </li></ul></ul><ul><ul><li>Similar to document-based synchronization (CouchDB) </li></ul></ul><ul><li>Action-based synchronization </li></ul><ul><ul><li>Actions on data, rather than the data itself, are synchronized </li></ul></ul><ul><ul><li>Store a log of actions to perform once a connection is available </li></ul></ul><ul><ul><li>Difficult to implement for moderately complex systems </li></ul></ul><ul><li>Row-based synchronization </li></ul><ul><ul><li>Individual database rows are the basic unit of synchronization </li></ul></ul><ul><ul><li>The best-defined version of the problem </li></ul></ul><ul><ul><li>Often the ultimate destination of the data will be a relational database </li></ul></ul>
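To make action-based synchronization concrete, here is a minimal sketch of the action log in Python's `sqlite3`; the `action_log` table, the function names, and the `send` callback are hypothetical. Actions are replayed strictly in order, and replay stops at the first failure so later actions never overtake earlier ones.

```python
import json
import sqlite3
from typing import Callable


def open_log() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Ordered log of actions waiting for a connection; seq preserves their order.
    conn.execute("CREATE TABLE action_log ("
                 "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
                 "action TEXT NOT NULL, payload TEXT NOT NULL)")
    return conn


def record_action(conn: sqlite3.Connection, action: str, payload: dict) -> None:
    """Log an action instead of changing shared data directly."""
    conn.execute("INSERT INTO action_log (action, payload) VALUES (?, ?)",
                 (action, json.dumps(payload)))
    conn.commit()


def replay(conn: sqlite3.Connection, send: Callable[[str, dict], bool]) -> int:
    """Send logged actions in order once online; return how many were accepted."""
    sent = 0
    rows = conn.execute(
        "SELECT seq, action, payload FROM action_log ORDER BY seq").fetchall()
    for seq, action, payload in rows:
        if not send(action, json.loads(payload)):
            break  # offline or rejected: keep this action and everything after it
        conn.execute("DELETE FROM action_log WHERE seq = ?", (seq,))
        conn.commit()
        sent += 1
    return sent
```

The log itself is the easy part; the difficulty the slide points to is defining actions whose replay still makes sense after other users have changed the same data.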
  9. “A few rows up, a few rows down. What’s so hard about that?” <ul><li>Synchronization is deceptively complicated </li></ul><ul><li>Let’s build an application that synchronizes only a single table holding a contact list </li></ul><ul><li>Typical record: </li></ul><table><tr><th>Contact ID</th><th>Name</th><th>Address</th><th>City</th><th>Last Contact</th></tr><tr><td>102</td><td>Homer Simpson</td><td>742 Evergreen Terrace</td><td>Springfield</td><td>2009-10-10 10:00</td></tr></table>
  10. Data Subsetting <ul><li>Each user should get all of their data, and only their data </li></ul><ul><li>Data must be filtered at the central server, not on the end clients, because of: </li></ul><ul><ul><li>Network traffic </li></ul></ul><ul><ul><li>Data storage </li></ul></ul><ul><ul><li>Application performance </li></ul></ul><ul><ul><li>User experience </li></ul></ul><ul><ul><li>Security </li></ul></ul><ul><li>Clients should only need to provide a user name and some subscriptions; the server should be able to work out which data they need </li></ul>
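  11. Implementing Subsetting for Contacts <ul><li>Create a mapping (“subscription”) table </li></ul><table><tr><th>RowID</th><th>User ID</th><th>City</th></tr><tr><td>1</td><td>0001</td><td>Springfield</td></tr><tr><td>2</td><td>0002</td><td>Ogdenville</td></tr><tr><td>3</td><td>0003</td><td>Springfield</td></tr><tr><td>4</td><td>0003</td><td>Ogdenville</td></tr><tr><td>5</td><td>...</td><td>...</td></tr></table>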
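A minimal sketch of server-side subsetting with Python's `sqlite3`, loaded with the subscription rows from this slide. The column names, the `download_set` function, and contact 103 in Ogdenville are hypothetical; the table name `Contacts_Users_subscription` is borrowed from the schema list later in the talk. The join means the server, not the client, decides which contacts each user receives.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Contacts (
            contact_id   INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            address      TEXT,
            city         TEXT NOT NULL,
            last_contact TEXT
        );
        -- Maps each user to the cities whose contacts they should receive.
        CREATE TABLE Contacts_Users_subscription (
            row_id  INTEGER PRIMARY KEY,
            user_id TEXT NOT NULL,
            city    TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO Contacts_Users_subscription (user_id, city) VALUES (?, ?)",
        [("0001", "Springfield"), ("0002", "Ogdenville"),
         ("0003", "Springfield"), ("0003", "Ogdenville")])
    conn.executemany("INSERT INTO Contacts VALUES (?, ?, ?, ?, ?)", [
        (102, "Homer Simpson", "742 Evergreen Terrace", "Springfield", "2009-10-10 10:00"),
        (103, "Hypothetical Contact", "1 Main Street", "Ogdenville", None),
    ])
    return conn


def download_set(conn: sqlite3.Connection, user_id: str) -> list:
    """Rows the server would send to `user_id`; the filtering happens server-side."""
    return conn.execute("""
        SELECT c.contact_id, c.name, c.city
        FROM Contacts AS c
        JOIN Contacts_Users_subscription AS s ON s.city = c.city
        WHERE s.user_id = ?
        ORDER BY c.contact_id
    """, (user_id,)).fetchall()
```

The client only ever sends its user ID; adding or removing a row in the subscription table changes what it receives without any client-side logic.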
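  12. Adding Contacts <ul><li>Add the following contact: </li></ul><table><tr><th>Contact ID</th><th>Name</th><th>Address</th><th>City</th><th>Last Contact</th></tr><tr><td>????</td><td>Mr. Teeny</td><td>123 Fake Street</td><td>Springfield</td><td>2009-10-10 10:00</td></tr></table><ul><li>What do we use as a Contact ID? </li></ul><ul><ul><li>Autoincrement won’t work: each client would hand out the same IDs independently </li></ul></ul><ul><li>What can we do? </li></ul>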
  13. GUIDs to the Rescue. Maybe... <ul><li>The simplest solution is to use Globally Unique Identifiers (GUIDs) </li></ul><ul><li>GUIDs are 128-bit numbers (often written as a 32-character hexadecimal string) </li></ul><ul><li>They can be generated independently on each client and are unique in practice if you: </li></ul><ul><ul><li>know the same generation algorithm is being used everywhere </li></ul></ul><ul><ul><li>have a good pseudo-random number generator </li></ul></ul><ul><li>Keys are meaningless, large, and awkward </li></ul><ul><li>May not integrate well into an existing system </li></ul><ul><li>You can never build self-checking features (such as check digits) into your primary keys </li></ul>
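A minimal sketch of GUID keys using Python's standard `uuid` module with `sqlite3`; the schema and the `create_schema`/`add_contact` helpers are hypothetical. `uuid.uuid4()` takes its random bits from `os.urandom()`, which covers the slide's pseudo-random-generator condition, and `.hex` gives the 32-character form.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    # Contact IDs are stored as 32-character hexadecimal strings.
    conn.execute("CREATE TABLE Contacts (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL)")


def add_contact(conn: sqlite3.Connection, name: str) -> str:
    # uuid4() carries 122 random bits; a collision between two clients is
    # astronomically unlikely, but not strictly impossible.
    contact_id = uuid.uuid4().hex
    conn.execute("INSERT INTO Contacts (contact_id, name) VALUES (?, ?)", (contact_id, name))
    return contact_id


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    print(add_contact(db, "Mr. Teeny"))  # a different 32-character hex string on every run
```

The printed key also illustrates the slide's complaint: it tells a human nothing about where or when the row was created.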
  14. Composite Keys <ul><li>Assign each application (not each user) a unique ID number </li></ul><ul><li>Combine this unique ID with a regular autoincrementing number </li></ul><ul><ul><li>The two can be concatenated into a single key, or kept as a composite-column key </li></ul></ul><ul><li>Keys are much smaller than GUIDs, and they carry built-in meaning </li></ul><ul><li>It is possible to exhaust your “key range” before you can synchronize and be assigned a new ID </li></ul><ul><li>Application IDs can be assigned by the central server using a simple autoincrement </li></ul>
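A minimal sketch of the composite-column variant in Python's `sqlite3`, assuming hypothetical application IDs 7 and 8 have already been assigned by the server. Each client numbers its own rows with SQLite's `AUTOINCREMENT` (which never reuses a value, even after deletes), and the server's primary key is the pair `(app_id, local_id)`, so both clients can create a row numbered 1 without colliding. `concatenated_key` shows the single-column alternative; its zero-padding widths are arbitrary choices.

```python
import sqlite3


def create_client_schema(conn: sqlite3.Connection) -> None:
    # AUTOINCREMENT guarantees local_id values are never reused on this client.
    conn.execute("""CREATE TABLE Contacts (
        local_id INTEGER PRIMARY KEY AUTOINCREMENT,
        app_id   INTEGER NOT NULL,
        name     TEXT NOT NULL)""")


def create_server_schema(conn: sqlite3.Connection) -> None:
    # On the consolidated database the pair (app_id, local_id) is the key.
    conn.execute("""CREATE TABLE Contacts (
        app_id   INTEGER NOT NULL,
        local_id INTEGER NOT NULL,
        name     TEXT NOT NULL,
        PRIMARY KEY (app_id, local_id))""")


def add_contact(client: sqlite3.Connection, app_id: int, name: str) -> tuple:
    cur = client.execute("INSERT INTO Contacts (app_id, name) VALUES (?, ?)", (app_id, name))
    return app_id, cur.lastrowid  # lastrowid is the new local_id


def upload(client: sqlite3.Connection, server: sqlite3.Connection) -> None:
    rows = client.execute("SELECT app_id, local_id, name FROM Contacts").fetchall()
    server.executemany("INSERT INTO Contacts (app_id, local_id, name) VALUES (?, ?, ?)", rows)


def concatenated_key(app_id: int, local_id: int) -> str:
    # Single-column alternative: "0007-000001" means row 1 created by application 7.
    return f"{app_id:04d}-{local_id:06d}"
```

The built-in meaning the slide mentions is visible in the key itself: anyone reading it can tell which application created the row.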
  15. Primary Key Pools <ul><li>Composite keys use the application ID to implicitly reserve a range of keys </li></ul><ul><li>Primary key pools reserve a range of keys explicitly, by taking them! </li></ul><ul><li>At each synchronization, the application requests a range of unassigned keys from the central server </li></ul><ul><li>Every time the local application needs a key, it takes one from the primary key pool </li></ul><ul><li>Does not require a unique application identifier </li></ul><ul><li>It is possible to exhaust your pool of keys before synchronizing </li></ul><ul><li>Keys don’t carry any meaning </li></ul><ul><li>Complex to set up </li></ul>
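A minimal sketch of a primary key pool in Python's `sqlite3`. `KeyServer` is a hypothetical stand-in for the central server handing out disjoint key ranges during synchronization; the `Contacts_key_pool` table name comes from the schema list later in the talk, while its column and the helper functions are assumptions.

```python
import sqlite3


class KeyServer:
    """Stand-in for the central server: reserves disjoint ranges of unused keys."""

    def __init__(self, start: int = 1000) -> None:
        self.next_free = start

    def reserve(self, count: int) -> range:
        block = range(self.next_free, self.next_free + count)
        self.next_free += count
        return block


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE Contacts_key_pool (key_value INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE Contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def refill_pool(conn: sqlite3.Connection, keys: range) -> None:
    # Called at synchronization time with the range the server reserved for us.
    conn.executemany("INSERT INTO Contacts_key_pool (key_value) VALUES (?)",
                     [(k,) for k in keys])


def add_contact(conn: sqlite3.Connection, name: str) -> int:
    (key,) = conn.execute("SELECT MIN(key_value) FROM Contacts_key_pool").fetchone()
    if key is None:
        raise RuntimeError("key pool exhausted; synchronize to get more keys")
    with conn:  # take the key out of the pool and use it atomically
        conn.execute("DELETE FROM Contacts_key_pool WHERE key_value = ?", (key,))
        conn.execute("INSERT INTO Contacts (contact_id, name) VALUES (?, ?)", (key, name))
    return key
```

The `RuntimeError` branch is the exhaustion risk from the slide; a real client would choose a pool size large enough to last between synchronizations.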
  16. Deleting a Contact <ul><li>Once you delete something, it is gone </li></ul><ul><li>If it is gone, how do you know you deleted it? </li></ul><ul><li>You must implement some way to “remember” that something has been deleted: </li></ul><ul><ul><li>a “deleted” status column </li></ul></ul><ul><ul><li>a shadow (“tombstone”) table </li></ul></ul>
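A minimal sketch of the shadow-table approach in Python's `sqlite3`, using the `Contacts_deleted` name from the schema list later in the talk; the columns, trigger name, and helper functions are assumptions. An `AFTER DELETE` trigger writes a tombstone for every deleted row, so the next synchronization knows what to delete on the server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Contacts (
    contact_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    city       TEXT NOT NULL
);
-- Shadow ("tombstone") table: remembers which rows were deleted locally.
CREATE TABLE Contacts_deleted (
    contact_id INTEGER PRIMARY KEY,
    deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER contacts_after_delete AFTER DELETE ON Contacts
BEGIN
    INSERT OR REPLACE INTO Contacts_deleted (contact_id) VALUES (OLD.contact_id);
END;
"""


def pending_deletes(conn: sqlite3.Connection) -> list:
    # IDs to send to the consolidated database at the next synchronization.
    return [r[0] for r in conn.execute(
        "SELECT contact_id FROM Contacts_deleted ORDER BY contact_id")]


def acknowledge_deletes(conn: sqlite3.Connection, ids: list) -> None:
    # Once the server confirms the deletes, the tombstones can be purged.
    conn.executemany("DELETE FROM Contacts_deleted WHERE contact_id = ?",
                     [(i,) for i in ids])
```

A "deleted" status column would avoid the extra table, but every query on `Contacts` would then have to filter out the flagged rows.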
  17. Updating a Column <ul><li>Need a way to tell that a column has been modified: </li></ul><ul><ul><li>Status column </li></ul></ul><ul><ul><li>Last-modified timestamp </li></ul></ul><ul><ul><li>Version number </li></ul></ul><ul><li>What happens if two people update the same row? </li></ul><ul><ul><li>Detect that a change conflict has happened </li></ul></ul><ul><ul><ul><li>Row-level vs. column-level conflict detection </li></ul></ul></ul><ul><ul><li>Resolve the conflict </li></ul></ul><ul><li>Idempotent changes </li></ul><ul><li>The “delete-then-insert” problem </li></ul><ul><li>Mapping data types to the consolidated database </li></ul>
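A minimal sketch of row-level conflict detection with a version number, written against Python's `sqlite3`; the `version` column and `apply_update` are hypothetical. The server applies an uploaded change only if the row still carries the version the client started from; otherwise it reports a conflict and leaves resolution to a separate step that is not shown.

```python
import sqlite3


def create_server(conn: sqlite3.Connection) -> None:
    conn.execute("""CREATE TABLE Contacts (
        contact_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        version    INTEGER NOT NULL DEFAULT 1)""")


def apply_update(conn: sqlite3.Connection, contact_id: int,
                 base_version: int, new_name: str) -> bool:
    """Apply a client update only if nobody changed the row since the client read it.

    Returns False on a row-level conflict and leaves the server row untouched.
    """
    cur = conn.execute(
        "UPDATE Contacts SET name = ?, version = version + 1 "
        "WHERE contact_id = ? AND version = ?",
        (new_name, contact_id, base_version))
    return cur.rowcount == 1
```

Note that resending the same update after a lost acknowledgement would also be flagged as a conflict here, which is one face of the idempotency problem listed on the slide.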
  18. Non-Synchronized Deletes <ul><li>What if you want to delete something from your application, but not delete it system-wide? </li></ul><ul><li>You need some way to turn off your change-tracking mechanism </li></ul>
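One hedged way to switch change tracking off in SQLite, sketched with Python's `sqlite3`: the tombstone trigger consults a one-row `sync_settings` table (a hypothetical name) in its `WHEN` clause, and the hypothetical `local_only_delete` helper flips the switch off and back on inside a single transaction, so a failed delete cannot leave tracking disabled.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Contacts_deleted (contact_id INTEGER PRIMARY KEY);
-- One-row switch: tracking = 0 means "local-only change, do not synchronize".
CREATE TABLE sync_settings (tracking INTEGER NOT NULL);
INSERT INTO sync_settings (tracking) VALUES (1);
CREATE TRIGGER contacts_after_delete AFTER DELETE ON Contacts
WHEN (SELECT tracking FROM sync_settings) = 1
BEGIN
    INSERT OR REPLACE INTO Contacts_deleted (contact_id) VALUES (OLD.contact_id);
END;
"""


def local_only_delete(conn: sqlite3.Connection, contact_id: int) -> None:
    """Remove a row from this device without recording a tombstone for the server."""
    with conn:  # one transaction: if anything fails, tracking rolls back to 1
        conn.execute("UPDATE sync_settings SET tracking = 0")
        conn.execute("DELETE FROM Contacts WHERE contact_id = ?", (contact_id,))
        conn.execute("UPDATE sync_settings SET tracking = 1")
```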
  19. Our Simple Contact List <ul><li>Went from... </li></ul><ul><ul><li>Contacts table </li></ul></ul>
  20. Our Not-So-Simple Synced Contact List <ul><li>To... </li></ul><ul><ul><li>Contacts table </li></ul></ul><ul><ul><li>Contacts_deleted shadow table </li></ul></ul><ul><ul><li>Contacts_Users_subscription table </li></ul></ul><ul><ul><li>Contacts_key_pool table </li></ul></ul><ul><ul><li>TRIGGER AFTER UPDATE on Contacts </li></ul></ul><ul><ul><li>TRIGGER AFTER INSERT on Contacts </li></ul></ul><ul><ul><li>TRIGGER AFTER DELETE on Contacts </li></ul></ul><ul><ul><li>Disable/enable change tracking </li></ul></ul><ul><li>And that says nothing about the actual synchronization or conflict resolution logic! </li></ul>
  21. Data Reassignment <ul><li>A user is reassigned to a new city. What needs to happen? </li></ul><ul><ul><li>Complete a final upload of all changes to the “old” set of data </li></ul></ul><ul><ul><li>Download a set of operations that delete all the old contacts </li></ul></ul><ul><ul><li>Download the contacts for the new city </li></ul></ul><ul><li>Easy enough for one table. But what happens when two or more tables have a foreign key relationship? </li></ul>
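A minimal sketch of the download half of a reassignment, assuming the final upload has already completed and using Python's `sqlite3` for the server-side data; `reassignment_ops`, its operation tuples, and the Ogdenville contact are hypothetical. It compares the contact IDs the client currently holds with the new city's contacts and emits deletes for the old rows and inserts for the new ones.

```python
import sqlite3


def reassignment_ops(client_ids: set, server: sqlite3.Connection, new_city: str) -> list:
    """Operations to download after the final upload of changes to the old data set."""
    new_rows = server.execute(
        "SELECT contact_id, name, city FROM Contacts WHERE city = ? ORDER BY contact_id",
        (new_city,)).fetchall()
    new_ids = {row[0] for row in new_rows}
    # Delete everything the client holds that is not part of its new subscription...
    ops = [("delete", cid) for cid in sorted(client_ids - new_ids)]
    # ...then send the new city's contacts that the client does not already have.
    ops += [("insert", row) for row in new_rows if row[0] not in client_ids]
    return ops


if __name__ == "__main__":
    server = sqlite3.connect(":memory:")
    server.execute("CREATE TABLE Contacts (contact_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    server.executemany("INSERT INTO Contacts VALUES (?, ?, ?)", [
        (102, "Homer Simpson", "Springfield"),
        (103, "Mr. Teeny", "Springfield"),
        (201, "Hypothetical Contact", "Ogdenville"),
    ])
    for op in reassignment_ops({102, 103}, server, "Ogdenville"):
        print(op)
```

With a single table the order of these operations hardly matters; once other tables reference `Contacts` through foreign keys, the deletes have to be ordered carefully.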
  22. Referential Integrity: Friend or Foe? <ul><li>Reassignment problems can quickly become lost in a referential integrity nightmare </li></ul><ul><li>It may become tempting to disable (or at least never enforce) referential integrity checks </li></ul><ul><li>This is usually a bad idea: </li></ul><ul><ul><li>Referential integrity should be your friend; it ensures your data stays consistent </li></ul></ul><ul><ul><li>Your server-side database will likely enforce referential integrity, so it is better to check on the client before sending the data up </li></ul></ul><ul><ul><li>There are performance benefits to defining foreign key relationships </li></ul></ul>
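A minimal sketch of keeping referential integrity switched on in SQLite during a reassignment, using Python's `sqlite3`. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on each connection. The `Contact_notes` child table and the `drop_city` helper are hypothetical; the point is that deleting a parent is rejected while children still reference it, so the cleanup removes child rows first.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite ignores REFERENCES clauses unless this is enabled on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Contacts (contact_id INTEGER PRIMARY KEY, city TEXT NOT NULL);
        CREATE TABLE Contact_notes (
            note_id    INTEGER PRIMARY KEY,
            contact_id INTEGER NOT NULL REFERENCES Contacts (contact_id),
            body       TEXT NOT NULL
        );
    """)
    return conn


def drop_city(conn: sqlite3.Connection, city: str) -> None:
    """Remove one city's data during a reassignment: children first, then parents."""
    with conn:
        conn.execute("DELETE FROM Contact_notes WHERE contact_id IN "
                     "(SELECT contact_id FROM Contacts WHERE city = ?)", (city,))
        conn.execute("DELETE FROM Contacts WHERE city = ?", (city,))
```

Declaring `ON DELETE CASCADE` on the child column is a common alternative, at the cost of making deletes less explicit.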
  23. Application and Schema Upgrades <ul><li>Web applications usually avoid the traditional problems of dealing with legacy deployed software </li></ul><ul><li>Cached versions of applications mean you can no longer guarantee that everyone is running the latest version </li></ul><ul><li>Need some provision to let “older” applications synchronize against logic that is correct for their schema </li></ul><ul><li>This is typically achieved by adding an extra level of indirection between the local database and the consolidated database </li></ul>
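A minimal sketch of that indirection layer on the server side, in plain Python: uploads are routed to a handler chosen by the client's schema version. The handlers, the version numbers, and the schema change they translate (splitting a single `name` column into first and last names) are all hypothetical.

```python
from typing import Callable, Dict


def upload_v1(row: dict) -> dict:
    # Older cached clients store a single "name" column; split it for the
    # current consolidated schema.
    first, _, last = row["name"].partition(" ")
    return {"contact_id": row["contact_id"], "first_name": first, "last_name": last}


def upload_v2(row: dict) -> dict:
    # Current clients already match the consolidated schema.
    return {"contact_id": row["contact_id"],
            "first_name": row["first_name"], "last_name": row["last_name"]}


HANDLERS: Dict[int, Callable[[dict], dict]] = {1: upload_v1, 2: upload_v2}


def handle_upload(schema_version: int, row: dict) -> dict:
    """Route an uploaded row through the logic that matches the client's schema."""
    try:
        handler = HANDLERS[schema_version]
    except KeyError:
        raise ValueError(f"unsupported client schema version {schema_version}") from None
    return handler(row)
```

Because the consolidated database only ever sees the output of a handler, an old client can keep synchronizing until it picks up the new application version.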
  24. Data Integrity in the Field <ul><li>Everyone should always have a consistent view of the data at every moment </li></ul><ul><ul><li>Inconsistent data can quickly propagate through a synchronization environment and infect everyone </li></ul></ul><ul><li>This typically means synchronizations must be fully atomic at both ends </li></ul><ul><ul><li>Error reporting must happen outside this atomic transaction; otherwise it would be lost when the transaction rolls back </li></ul></ul><ul><li>Most applications cannot handle half-synchronized data </li></ul><ul><li>Need to handle broken and partial synchronizations </li></ul>
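A minimal sketch of an all-or-nothing download on the client, using Python's `sqlite3`, with the error report written after the rollback in its own transaction so that it survives. The `sync_errors` table and `apply_download` are hypothetical, and a real system would also need the server side of the exchange to be atomic.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Written outside the synchronization transaction so a rollback cannot erase it.
        CREATE TABLE sync_errors (id INTEGER PRIMARY KEY, message TEXT NOT NULL);
    """)
    return conn


def apply_download(conn: sqlite3.Connection, rows: list) -> bool:
    """Apply a downloaded batch all-or-nothing; on failure keep the old data and log why."""
    try:
        with conn:  # commits if every row applies, rolls back the whole batch otherwise
            conn.executemany(
                "INSERT OR REPLACE INTO Contacts (contact_id, name) VALUES (?, ?)", rows)
        return True
    except sqlite3.Error as exc:
        # Runs after the rollback, in a separate transaction, so the report survives.
        with conn:
            conn.execute("INSERT INTO sync_errors (message) VALUES (?)", (str(exc),))
        return False
```

A batch that fails halfway leaves the local database exactly as it was before the synchronization started, which is what the slide means by not exposing half-synchronized data.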
  25. What else might you want? <ul><li>High-priority synchronization </li></ul><ul><li>Implementing secure authentication at every point in the chain </li></ul><ul><li>Encryption </li></ul><ul><li>Server-initiated synchronization (“push” synchronization) </li></ul><ul><li>Lots more... </li></ul>
  26. Summary <ul><li>Synchronization is deceptively hard </li></ul><ul><ul><li>It is relatively easy to put together a simple, controlled synchronization between two computers in a lab </li></ul></ul><ul><ul><li>The real complications only show up in the real world </li></ul></ul><ul><li>A full synchronization strategy should be planned at the start </li></ul><ul><ul><li>All projects suffer from scope creep </li></ul></ul><ul><ul><li>It is better to decide early on what you will and won’t do, and architect for it </li></ul></ul><ul><li>Synchronization is rarely, if ever, a simple “bolt-on” solution </li></ul><ul><li>Test under realistic conditions and realistic load! </li></ul>
  27. <ul><li>A patched version of Gears that adds synchronization functionality </li></ul><ul><li>Allows the use of the Sybase UltraLite relational database </li></ul><ul><li>UltraLite is a small-footprint, fully relational database capable of totally self-contained synchronization </li></ul><ul><li>Contains all of its own change-tracking and synchronization logic, totally transparent to the end user </li></ul><ul><li>Synchronizes with a MobiLink synchronization server, providing out-of-the-box synchronization with Oracle, SQL Server, DB2, ASE, SQL Anywhere, and MySQL </li></ul><ul><ul><li>10 years old </li></ul></ul><ul><ul><li>Heavily deployed and tested </li></ul></ul><ul><li>Handles all of the mechanics and plumbing of synchronization, and lets you focus on your business logic </li></ul><ul><li>Business logic can be written in SQL, .NET, or Java </li></ul>
  28. <ul><li>Implemented as an open-source patch to the Gears project, released under the Apache 2 license </li></ul><ul><li>Beta went live last night </li></ul><ul><li>Available for Internet Explorer (Windows) and Firefox (Windows and Linux) </li></ul><ul><li>Free deployment for SQL Anywhere and MySQL-based applications </li></ul>
  29. <ul><li>Thank You </li></ul><ul><li>Eric Farrar </li></ul><ul><li>[email_address] </li></ul>