Planning for Synchronization with Browser-Local Databases


Talk by Eric Farrar of Sybase at ZendCon 2009


  1. 1. Preparing for Synchronization with Browser-local Databases. Eric Farrar, Product Manager, Sybase iAnywhere. October 20th, 2009
  2. 2. Web applications are always online. Aren’t they? <ul><li>Traditionally, but the lines are being blurred </li></ul><ul><li>Web applications are starting to act like desktop applications </li></ul><ul><li>Desktop applications are starting to act like web applications </li></ul><ul><li>Offline Application Caching </li></ul><ul><ul><li>Gears </li></ul></ul><ul><ul><li>HTML5 </li></ul></ul><ul><ul><li>Adobe AIR </li></ul></ul><ul><li>This is a new space for web applications, but a well-known area for desktop and mobile applications </li></ul>
  3. 3. It Isn’t Only About Going Offline <ul><li>It’s also about speed-ups for online applications </li></ul><ul><li>Data intensive applications can save many round-trips to the server by storing reference data locally </li></ul>
  4. 4. Browser-Local Databases <ul><li>SQLite </li></ul><ul><ul><li>Gears </li></ul></ul><ul><ul><ul><li>Chrome </li></ul></ul></ul><ul><ul><ul><li>Android </li></ul></ul></ul><ul><ul><li>HTML5 Specification Draft </li></ul></ul><ul><ul><ul><li>Firefox </li></ul></ul></ul><ul><ul><ul><li>Safari </li></ul></ul></ul><ul><ul><li>Adobe AIR </li></ul></ul><ul><li>Semi-structured Storage (key-value pairs) </li></ul><ul><ul><li>Cookies </li></ul></ul><ul><ul><li>Flash Storage </li></ul></ul><ul><li>Isolated Storage </li></ul><ul><ul><li>Silverlight </li></ul></ul>
  5. 5. But... <ul><li>For most applications, the ultimate destination for local data is a central server (“consolidated database”) </li></ul><ul><li>This means two or more copies of the same data that can be changed independently </li></ul><ul><li>This introduces synchronization! </li></ul>
  6. 6. Synchronization <ul><li>Synchronization is a huge topic </li></ul><ul><li>This talk aims to show two things: </li></ul><ul><ul><li>Synchronization is a complex problem that requires careful planning </li></ul></ul><ul><ul><li>The more planning done upfront, the easier the synchronization process will be </li></ul></ul><ul><li>This talk discusses synchronization of application data, not the related problem of synchronizing the application itself (which is usually done separately) </li></ul>
  7. 7. How often are you synchronizing? <ul><li>Occasionally-connected </li></ul><ul><ul><li>Normal application state is disconnected </li></ul></ul><ul><ul><li>Connection is treated as a special case </li></ul></ul><ul><li>Occasionally-disconnected </li></ul><ul><ul><li>Normal application state is connected </li></ul></ul><ul><ul><li>Disconnection is treated as a special case </li></ul></ul>
  8. 8. What are you synchronizing? <ul><li>Object-based synchronization </li></ul><ul><ul><li>Serialized objects (perhaps hierarchical) are the basic unit of synchronization </li></ul></ul><ul><ul><li>Similar to document-based synchronization (CouchDB) </li></ul></ul><ul><li>Action-based synchronization </li></ul><ul><ul><li>Actions on data, rather than the data itself, are synchronized </li></ul></ul><ul><ul><li>Store a log of actions to take once a connection is available </li></ul></ul><ul><ul><li>Difficult to implement for moderately complex systems </li></ul></ul><ul><li>Row-based synchronization </li></ul><ul><ul><li>Individual database rows are the basic unit of synchronization </li></ul></ul><ul><ul><li>Problem is the best defined </li></ul></ul><ul><ul><li>Often the ultimate destination of the data will be a relational database </li></ul></ul>
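As a rough illustration of the action-based approach, the sketch below queues actions while offline and replays them in order once a connection is available. The `ActionLog` class, the `rename_contact` action, and the handler table are hypothetical names invented for this example; they do not come from the talk.

```python
import json


class ActionLog:
    """Queue of serialized actions waiting to be replayed on the server."""

    def __init__(self):
        self.entries = []

    def record(self, action, **params):
        # Serialize each action so the log could be persisted locally.
        self.entries.append(json.dumps({"action": action, "params": params}))

    def replay(self, handlers):
        # Replay in the original order. An entry is removed only after its
        # handler succeeds, so a failure leaves the remainder queued.
        while self.entries:
            entry = json.loads(self.entries[0])
            handlers[entry["action"]](**entry["params"])
            self.entries.pop(0)


# Hypothetical server-side state and handler, for illustration only.
server_contacts = {"102": "Homer Simpson"}


def rename_contact(contact_id, name):
    server_contacts[contact_id] = name


log = ActionLog()
log.record("rename_contact", contact_id="102", name="Homer J. Simpson")
log.replay({"rename_contact": rename_contact})
```

Even this toy version hints at why the approach gets difficult: every action needs a server-side handler, and replay order matters.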
  9. 9. “A few rows up, a few rows down. What’s so hard about that?” <ul><li>Synchronization is deceptively complicated </li></ul><ul><li>Let’s build an application that only synchronizes a single table holding a contact list </li></ul><ul><li>Typical record </li></ul><table><tr><th>Contact ID</th><th>Name</th><th>Address</th><th>City</th><th>Last Contact</th></tr><tr><td>102</td><td>Homer Simpson</td><td>742 Evergreen Terrace</td><td>Springfield</td><td>2009-10-10 10:00</td></tr></table>
  10. 10. Data Subsetting <ul><li>Each user should get all of their data, and only their data </li></ul><ul><li>Data must be filtered at the central server, not the end clients </li></ul><ul><ul><li>Network Traffic </li></ul></ul><ul><ul><li>Data Storage </li></ul></ul><ul><ul><li>Application Performance </li></ul></ul><ul><ul><li>User Experience </li></ul></ul><ul><ul><li>Security </li></ul></ul><ul><li>Clients should only need to provide a user name and some subscriptions, and the servers should be able to figure out what they need </li></ul>
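  11. 11. Implementing Subsetting for Contacts <ul><li>Create a mapping (“subscription”) table </li></ul><table><tr><th>RowID</th><th>User ID</th><th>City</th></tr><tr><td>1</td><td>0001</td><td>Springfield</td></tr><tr><td>2</td><td>0002</td><td>Ogdenville</td></tr><tr><td>3</td><td>0003</td><td>Springfield</td></tr><tr><td>4</td><td>0003</td><td>Ogdenville</td></tr><tr><td>5</td><td>...</td><td>...</td></tr></table>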
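One way the subscription table might drive server-side filtering is a join between contacts and subscriptions. The sketch below uses an in-memory SQLite database as a stand-in for the consolidated database. The table and column names, the second contact row, and the `contacts_for_user` helper are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the consolidated database (hypothetical schema).
server = sqlite3.connect(":memory:")
server.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE subscriptions (row_id INTEGER PRIMARY KEY, user_id TEXT, city TEXT);
    INSERT INTO contacts VALUES (102, 'Homer Simpson', 'Springfield');
    INSERT INTO contacts VALUES (103, 'Example Contact', 'Ogdenville');  -- made-up row
    INSERT INTO subscriptions VALUES (1, '0001', 'Springfield');
    INSERT INTO subscriptions VALUES (2, '0002', 'Ogdenville');
    INSERT INTO subscriptions VALUES (3, '0003', 'Springfield');
    INSERT INTO subscriptions VALUES (4, '0003', 'Ogdenville');
""")


def contacts_for_user(conn, user_id):
    """Return only the contacts in cities the user is subscribed to."""
    rows = conn.execute(
        """
        SELECT c.contact_id, c.name, c.city
        FROM contacts AS c
        JOIN subscriptions AS s ON s.city = c.city
        WHERE s.user_id = ?
        ORDER BY c.contact_id
        """,
        (user_id,),
    )
    return rows.fetchall()
```

Because the filter runs on the server, a client only has to send its user ID; rows it is not subscribed to never cross the network.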
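  12. 12. Adding Contacts <ul><li>Add the following contact </li></ul><ul><li>What do we use as a Contact ID? </li></ul><ul><ul><li>Autoincrement won’t work: two disconnected clients would hand out the same ID </li></ul></ul><ul><li>What can we do? </li></ul><table><tr><th>Contact ID</th><th>Name</th><th>Address</th><th>City</th><th>Last Contact</th></tr><tr><td>????</td><td>Mr. Teeny</td><td>123 Fake Street</td><td>Springfield</td><td>2009-10-10 10:00</td></tr></table>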
  13. 13. GUIDs to the Rescue. Maybe... <ul><li>The simplest solution is to use Globally Unique Identifiers (GUIDs) </li></ul><ul><li>GUIDs are 128-bit numbers (often expressed as a 32-character hexadecimal string) </li></ul><ul><li>Can be safely generated on any client and are effectively guaranteed to be unique if you... </li></ul><ul><ul><li>know the same generation algorithm is being used </li></ul></ul><ul><ul><li>have a good pseudo-random number generator </li></ul></ul><ul><li>Keys are meaningless, large, and awkward </li></ul><ul><li>May not integrate very well into an existing system </li></ul><ul><li>Can never use self-checking features in your primary keys </li></ul>
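For a sense of what GUID keys look like in practice, here is a minimal sketch using Python's standard `uuid` module. The `new_contact_id` helper is a name chosen for this example, and random (version 4) UUIDs are only one of several generation schemes.

```python
import uuid


def new_contact_id():
    """Generate a random (version 4) UUID as a 32-character hex string."""
    return uuid.uuid4().hex


# Each disconnected client can call this without contacting the server.
contact_id = new_contact_id()
```

The resulting key is 32 hexadecimal characters long, which illustrates the "large and awkward" drawback noted on the slide.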
  14. 14. Composite Keys <ul><li>Assign each application (not each user) a unique ID number </li></ul><ul><li>Combine this unique ID with a regular autoincrementing number </li></ul><ul><ul><li>Can be concatenated to create a single key, or used as a composite-column key </li></ul></ul><ul><li>Keys are much smaller than GUIDs, and they carry built-in meaning </li></ul><ul><li>Possible that you will exhaust your “key range” before you can synchronize and be assigned a new ID </li></ul><ul><li>Application IDs can be assigned by the central server using a simple autoincrement </li></ul>
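The concatenated form of a composite key can be sketched as the application ID multiplied by a fixed range size, plus a local counter. The range size of one million, the class name, and the exception are assumptions for illustration; a real system would pick the range from its expected offline workload.

```python
KEYS_PER_APP = 1_000_000  # hypothetical size of each application's key range


class KeyRangeExhausted(Exception):
    """Raised when an application has used up its implicit key range."""


class CompositeKeyGenerator:
    def __init__(self, app_id, next_local=1):
        self.app_id = app_id          # assigned once by the central server
        self.next_local = next_local  # local autoincrementing counter

    def next_key(self):
        if self.next_local >= KEYS_PER_APP:
            raise KeyRangeExhausted("synchronize to obtain a new application ID")
        key = self.app_id * KEYS_PER_APP + self.next_local
        self.next_local += 1
        return key


def app_id_of(key):
    """Recover the application ID that is built into a key."""
    return key // KEYS_PER_APP
```

For example, application 7 would generate 7000001, 7000002, and so on, and `app_id_of` shows the "built-in meaning": any key reveals which installation created it.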
  15. 15. Primary Key Pools <ul><li>Composite keys use Application ID to implicitly reserve a range of keys </li></ul><ul><li>Primary key pools explicitly reserve a range of keys by taking them! </li></ul><ul><li>At each synchronization, the application requests a range of unassigned keys from the central server </li></ul><ul><li>Every time the local application needs a key, it takes one from the primary key pool </li></ul><ul><li>Does not require a unique application identifier </li></ul><ul><li>Possible you will exhaust your pool of keys before synchronizing </li></ul><ul><li>Keys don’t carry any meaning </li></ul><ul><li>Complex to set up </li></ul>
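A primary key pool can be sketched with two small classes: a server that hands out unassigned keys and a client-side pool that is refilled at each synchronization. All names here (`KeyServer`, `KeyPool`, `PoolExhausted`) are hypothetical, and the sketch keeps everything in memory rather than in a `Contacts_key_pool` table.

```python
from collections import deque


class PoolExhausted(Exception):
    """Raised when the local pool is empty and a sync is needed."""


class KeyServer:
    """Stand-in for the central server that owns the key space."""

    def __init__(self):
        self.next_unassigned = 1

    def reserve(self, count):
        # Hand out a block of keys that no other client will ever receive.
        start = self.next_unassigned
        self.next_unassigned += count
        return list(range(start, start + count))


class KeyPool:
    """Client-side pool of keys reserved during synchronization."""

    def __init__(self):
        self.keys = deque()

    def refill(self, server, count):
        self.keys.extend(server.reserve(count))

    def take(self):
        if not self.keys:
            raise PoolExhausted("no keys left; synchronize to refill the pool")
        return self.keys.popleft()
```

Two clients that refill from the same server never receive overlapping keys, and neither needs an application ID. The cost is that a client working offline for too long can run dry.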
  16. 16. Deleting a Contact <ul><li>Once you delete something, it is gone </li></ul><ul><li>If it is gone, how do you know you deleted it? </li></ul><ul><li>Must implement some method to “remember” that something has been deleted </li></ul><ul><ul><li>deleted status column </li></ul></ul><ul><ul><li>shadow (“tombstone”) table </li></ul></ul>
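The shadow-table option can be sketched in SQLite with an `AFTER DELETE` trigger that copies the primary key of each deleted row into a tombstone table. The schema and the `pending_deletes` helper below are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT);

    -- Shadow ("tombstone") table: remembers which rows were deleted.
    CREATE TABLE contacts_deleted (
        contact_id TEXT PRIMARY KEY,
        deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    BEGIN
        INSERT OR REPLACE INTO contacts_deleted (contact_id)
        VALUES (OLD.contact_id);
    END;
""")

db.execute("INSERT INTO contacts VALUES ('102', 'Homer Simpson')")
db.execute("DELETE FROM contacts WHERE contact_id = '102'")


def pending_deletes(conn):
    """Keys of deleted rows that still need to be uploaded."""
    query = "SELECT contact_id FROM contacts_deleted ORDER BY contact_id"
    return [row[0] for row in conn.execute(query)]
```

After a successful upload, the synchronization logic would clear the tombstones it has sent.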
  17. 17. Updating a Column <ul><li>Need a method to distinguish that a column has been modified </li></ul><ul><ul><li>Status column </li></ul></ul><ul><ul><li>Last-Modified timestamp </li></ul></ul><ul><ul><li>Version number </li></ul></ul><ul><li>What happens if two people update the same row? </li></ul><ul><ul><li>Detect that a change conflict has happened </li></ul></ul><ul><ul><ul><li>Row-level vs Column-level conflict detection </li></ul></ul></ul><ul><ul><li>Resolve the conflict </li></ul></ul><ul><li>Idempotent changes </li></ul><ul><li>The “Delete-then-Insert” problem </li></ul><ul><li>Mapping data types to consolidated database </li></ul>
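The difference between row-level and column-level conflict detection can be sketched by comparing two edited copies of a row against the last synchronized version. This assumes each side keeps a copy of the row as it stood at the last synchronization, which is an assumption of this example rather than something the talk prescribes. The function names are hypothetical.

```python
def changed_columns(base, edited):
    """Columns whose values differ from the last synchronized version."""
    return {col for col in base if base[col] != edited[col]}


def detect_conflict(base, local, remote, column_level=False):
    """Report whether local and remote edits to the same row conflict."""
    local_changes = changed_columns(base, local)
    remote_changes = changed_columns(base, remote)
    if column_level:
        # Conflict only if both sides touched the same column.
        return bool(local_changes & remote_changes)
    # Row-level: any two edits to the same row conflict.
    return bool(local_changes) and bool(remote_changes)


base = {"address": "742 Evergreen Terrace", "last_contact": "2009-10-10 10:00"}
local = dict(base, address="123 Fake Street")          # one user changes the address
remote = dict(base, last_contact="2009-10-12 09:00")   # another logs a new contact
```

With these rows, row-level detection flags a conflict while column-level detection does not, because the two users edited different columns.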
  18. 18. Non-Synchronized Deletes <ul><li>What if you want to delete something from your local application, but not delete it system-wide? </li></ul><ul><li>Need some method to turn off your change tracking algorithm </li></ul>
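One possible way to switch change tracking off is a settings flag that the delete trigger checks in its `WHEN` clause. The `sync_settings` table, the second contact row, and the `delete_locally_only` helper are hypothetical; this is a sketch of the idea, not a recommended design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE contacts_deleted (contact_id TEXT PRIMARY KEY);

    -- Single-row flag that turns change tracking on or off.
    CREATE TABLE sync_settings (tracking_enabled INTEGER NOT NULL);
    INSERT INTO sync_settings VALUES (1);

    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    WHEN (SELECT tracking_enabled FROM sync_settings) = 1
    BEGIN
        INSERT OR REPLACE INTO contacts_deleted VALUES (OLD.contact_id);
    END;

    INSERT INTO contacts VALUES ('102', 'Homer Simpson');
    INSERT INTO contacts VALUES ('103', 'Example Contact');  -- made-up row
""")


def delete_locally_only(conn, contact_id):
    """Delete a row without recording a tombstone for upload."""
    conn.execute("UPDATE sync_settings SET tracking_enabled = 0")
    try:
        conn.execute("DELETE FROM contacts WHERE contact_id = ?", (contact_id,))
    finally:
        # Always re-enable tracking, even if the delete fails.
        conn.execute("UPDATE sync_settings SET tracking_enabled = 1")


delete_locally_only(db, "102")                               # not tracked
db.execute("DELETE FROM contacts WHERE contact_id = '103'")  # tracked as usual
```

Only the second delete leaves a tombstone, so only that one would be propagated to the consolidated database.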
  19. 19. Our Simple Contact List <ul><li>Went from... </li></ul><ul><ul><li>Contacts table </li></ul></ul>
  20. 20. Our Not-So-Simple Synced Contact List <ul><li>To... </li></ul><ul><ul><li>Contacts table </li></ul></ul><ul><ul><li>Contacts_deleted shadow table </li></ul></ul><ul><ul><li>Contacts_Users_subscription table </li></ul></ul><ul><ul><li>Contacts_key_pool table </li></ul></ul><ul><ul><li>TRIGGER AFTER UPDATE on Contacts </li></ul></ul><ul><ul><li>TRIGGER AFTER INSERT on Contacts </li></ul></ul><ul><ul><li>TRIGGER AFTER DELETE on Contacts </li></ul></ul><ul><ul><li>Disable/Enable change tracking </li></ul></ul><ul><li>And that says nothing about the actual synchronization or conflict resolution logic! </li></ul>
  21. 21. Data Reassignment <ul><li>A user is reassigned to a new city. What needs to happen? </li></ul><ul><ul><li>Complete a final upload of all changes to the “old” set of data </li></ul></ul><ul><ul><li>Download a set of operations that delete all the old contacts </li></ul></ul><ul><ul><li>Download the contacts of the new city </li></ul></ul><ul><li>Easy enough for one table. But what happens when we have two or more tables with a foreign key relationship? </li></ul>
  22. 22. Referential Integrity: Friend or Foe? <ul><li>Reassignment problems can quickly become lost in a referential integrity nightmare </li></ul><ul><li>It may become tempting to disable (or at least never enforce) referential integrity checks </li></ul><ul><li>This is usually a bad idea: </li></ul><ul><ul><li>Referential integrity should be your friend. It ensures your data stays consistent </li></ul></ul><ul><ul><li>Likely your server-side database will enforce referential integrity, so it is better to do a client-side check before sending the data up </li></ul></ul><ul><ul><li>There are performance benefits to defining foreign key relations </li></ul></ul>
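A client-side referential integrity check can be sketched in SQLite, which only enforces foreign keys when `PRAGMA foreign_keys = ON` has been set on the connection. The `calls` table and the `try_insert_call` helper are hypothetical additions to the contact-list example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled per connection.
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE calls (
        call_id TEXT PRIMARY KEY,
        contact_id TEXT NOT NULL REFERENCES contacts(contact_id)
    );
""")


def try_insert_call(conn, call_id, contact_id):
    """Insert a call; return False if it references a missing contact."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO calls VALUES (?, ?)", (call_id, contact_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Rejecting the orphaned row locally means the error surfaces where the user can still fix it, rather than during an upload to the consolidated database.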
  23. 23. Application and Schema Upgrades <ul><li>The traditional problems associated with dealing with legacy deployed software are usually avoided by web applications </li></ul><ul><li>Cached versions of applications mean you can no longer guarantee that everyone is running the latest version </li></ul><ul><li>Need some provision to let “older” applications synchronize against logic that is correct for their schema </li></ul><ul><li>This is typically achieved by adding a full level of indirection between the local database and the consolidated database </li></ul>
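The level of indirection might be sketched as a table of per-version translators that map whatever an older client uploads onto the current consolidated schema. The schema change shown here (splitting `name` into `first_name` and `last_name` in version 2) is invented purely for illustration, as are all of the function names.

```python
def upload_v1(row):
    """Version 1 clients send one 'name' column; split it for the server."""
    first, _, last = row["name"].partition(" ")
    return {"contact_id": row["contact_id"], "first_name": first, "last_name": last}


def upload_v2(row):
    """Version 2 clients already match the consolidated schema."""
    return dict(row)


# One translator per deployed schema version (hypothetical versions).
UPLOAD_TRANSLATORS = {1: upload_v1, 2: upload_v2}


def translate_upload(schema_version, row):
    """Route an uploaded row through the logic written for its schema."""
    try:
        translator = UPLOAD_TRANSLATORS[schema_version]
    except KeyError:
        raise ValueError(f"unsupported schema version: {schema_version}")
    return translator(row)
```

A cached version 1 client can keep synchronizing after the server schema moves on, as long as its translator stays registered.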
  24. 24. Data Integrity in the Field <ul><li>Everyone should always have a consistent view of the data at every moment </li></ul><ul><ul><li>Inconsistent data can quickly propagate through a synchronization environment and infect everyone </li></ul></ul><ul><li>This typically means synchronizations must be fully atomic at both ends </li></ul><ul><ul><li>Error reporting must happen outside this atomic transaction, otherwise it would be lost </li></ul></ul><ul><li>Most applications cannot handle half-synchronized data </li></ul><ul><li>Need to handle broken and partial synchronizations </li></ul>
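On the client side, an atomic download can be sketched by applying the whole batch inside one SQLite transaction and recording any failure outside it. The `apply_download` function and the in-memory `error_log` list are hypothetical; a real client would persist the error somewhere durable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
db.commit()

# Errors are recorded outside the transaction so a rollback cannot erase them.
error_log = []


def apply_download(conn, rows):
    """Apply a downloaded batch of rows entirely or not at all."""
    try:
        with conn:  # commits if the block succeeds, rolls back on exception
            conn.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return True
    except sqlite3.Error as exc:
        error_log.append(str(exc))
        return False
```

If any row in the batch fails (for example a duplicate key), none of the batch is kept, so the application never sees half-synchronized data.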
  25. 25. What else might you want? <ul><li>High-priority synchronization </li></ul><ul><li>Implementing secure authentication at every point in the chain </li></ul><ul><li>Encryption </li></ul><ul><li>Server-initiated synchronization (“Push” synchronization) </li></ul><ul><li>Lots more... </li></ul>
  26. 26. Summary <ul><li>Synchronization is deceptively hard </li></ul><ul><ul><li>It is relatively easy to put together a simple, controlled synchronization in a lab between two computers </li></ul></ul><ul><ul><li>The real complications only show up in the real world </li></ul></ul><ul><li>A full synchronization strategy should be planned at the start </li></ul><ul><ul><li>All projects suffer from scope creep </li></ul></ul><ul><ul><li>It is better to decide early on what you will and won’t do, and architect for it </li></ul></ul><ul><li>Synchronization is rarely, if ever, a simple “bolt-on” solution </li></ul><ul><li>Test under realistic conditions and realistic load! </li></ul>
  27. 27. <ul><li>A patched version of Gears that adds synchronization functionality </li></ul><ul><li>Allows the use of the Sybase UltraLite relational database </li></ul><ul><li>UltraLite is a small-footprint, fully relational database capable of totally self-contained synchronization </li></ul><ul><li>Contains all its own change tracking and synchronization logic, which is totally transparent to the end user </li></ul><ul><li>Synchronizes with a MobiLink synchronization server, providing out-of-the-box synchronization with Oracle, SQL Server, DB2, ASE, SQL Anywhere, and MySQL </li></ul><ul><ul><li>10 years old </li></ul></ul><ul><ul><li>Heavily deployed and tested </li></ul></ul><ul><li>Handles all of the mechanics and plumbing of synchronization, and lets you focus on your business logic </li></ul><ul><li>Business logic can be written in SQL, .NET, or Java </li></ul>
  28. 28. <ul><li>Implemented as an open-source patch to the Gears project, released under the Apache 2 license </li></ul><ul><li>Beta went live last night </li></ul><ul><li>Available for Internet Explorer (Windows) and Firefox (Windows and Linux) </li></ul><ul><li>Free deployment for SQL Anywhere and MySQL-based applications </li></ul>
  29. 29. <ul><li>Thank You </li></ul><ul><li>Eric Farrar </li></ul><ul><li>[email_address] </li></ul>