Planning for Synchronization with Browser-Local Databases

Talk by Eric Farrar of Sybase at ZendCon 2009

Transcript

  • 1. Preparing for Synchronization with Browser-local Databases. Eric Farrar, Product Manager, Sybase iAnywhere. October 20th, 2009
  • 2. Web applications are always online. Aren’t they?
    • Traditionally, but the lines are being blurred
    • Web applications are starting to act like desktop applications
    • Desktop applications are starting to act like web applications
    • Offline Application Caching
      • Gears
      • HTML5
      • Adobe Air
    • This is a new space for web applications, but a well-known area for desktop and mobile applications
  • 3. It Isn’t Only About Going Offline
    • It’s also about speed-ups for online applications
    • Data intensive applications can save many round-trips to the server by storing reference data locally
  • 4. Browser-Local Databases
    • SQLite
      • Gears
        • Chrome
        • Android
      • HTML 5 Specification Draft
        • Firefox
        • Safari
      • Adobe Air
    • Semi-structured Storage (key-value pairs)
      • Cookies
      • Flash Storage
    • Isolated Storage
      • Silverlight
  • 5. But...
    • For most applications, the ultimate destination for local data is a central server (“consolidated database”)
    • This means two or more copies of the same data that can be changed independently
    • This introduces synchronization!
  • 6. Synchronization
    • Synchronization is a huge topic
    • This talk aims to show two things:
      • Synchronization is a complex problem that requires careful planning
      • The more planning done upfront, the easier the synchronization process will be
    • This talk discusses synchronization of application data, not of the application itself (a related problem that is usually handled separately)
  • 7. How often are you synchronizing?
    • Occasionally-connected
      • Normal application state is connected
      • Disconnection is treated as special case
    • Occasionally-disconnected
      • Normal application state is disconnected
      • Connection is treated as a special case
  • 8. What are you synchronizing?
    • Object-based synchronization
      • Serialized objects (perhaps hierarchical) are the basic object of synchronization
      • Similar to document-based synchronization (CouchDB)
    • Action-based synchronization
      • Actions on data, rather than the data itself, are synchronized
      • Store a log of actions to take once connection is available
      • Difficult to implement for moderately complex systems
    • Row-based synchronization
      • Individual database rows are the basic object of synchronization
      • Problem is the best defined
      • Often the ultimate destination of the data will be a relational database
  • 9. “A few rows up, a few rows down. What’s so hard about that?”
    • Synchronization is deceptively complicated
    • Let’s build an application that only synchronizes a single table holding a contact list
    • Typical record
    Contact ID | Name          | Address               | City        | Last Contact
    102        | Homer Simpson | 742 Evergreen Terrace | Springfield | 2009-10-10 10:00
  • 10. Data Subsetting
    • Each user should get all of their data, and only their data
    • Data must be filtered at the central server, not the end clients
      • Network Traffic
      • Data Storage
      • Application Performance
      • User Experience
      • Security
    • Clients should only need to provide a user name and some subscriptions, and the servers should be able to figure out what they need
  • 11. Implementing Subsetting for Contacts
    • Create a mapping (“subscription”) table
    RowID | User ID | City
    1     | 0001    | Springfield
    2     | 0002    | Ogdenville
    3     | 0003    | Springfield
    4     | 0003    | Ogdenville
    5     | ...     | ...
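A minimal sketch of server-side subsetting with a subscription table, using Python's built-in sqlite3 module as a stand-in for the consolidated database. The table and column names, the `contacts_for_user` helper, and the second contact row are assumptions made for illustration; only the subscription rows and the Homer Simpson record come from the slides.

```python
import sqlite3

# Hypothetical schema modeled on the contact and subscription tables in the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE subscriptions (row_id INTEGER PRIMARY KEY, user_id TEXT, city TEXT);
    INSERT INTO contacts VALUES (102, 'Homer Simpson', 'Springfield');
    INSERT INTO contacts VALUES (103, 'Sample Person', 'Ogdenville');  -- made-up row
    INSERT INTO subscriptions VALUES (1, '0001', 'Springfield');
    INSERT INTO subscriptions VALUES (2, '0002', 'Ogdenville');
    INSERT INTO subscriptions VALUES (3, '0003', 'Springfield');
    INSERT INTO subscriptions VALUES (4, '0003', 'Ogdenville');
""")

def contacts_for_user(conn, user_id):
    """Return only the contacts in cities the user subscribes to.

    The filter runs on the server, so unsubscribed rows never leave it.
    """
    cursor = conn.execute(
        """
        SELECT c.contact_id, c.name, c.city
        FROM contacts AS c
        JOIN subscriptions AS s ON s.city = c.city
        WHERE s.user_id = ?
        ORDER BY c.contact_id
        """,
        (user_id,),
    )
    return cursor.fetchall()
```

With this layout the client sends nothing but its user ID; changing what a user receives is a matter of editing rows in the subscription table.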
  • 12. Adding Contacts
    • Add the following contact
    • What do we use as a Contact ID?
      • Autoincrement won’t work
    • What can we do?
    Contact ID | Name      | Address         | City        | Last Contact
    ????       | Mr. Teeny | 123 Fake Street | Springfield | 2009-10-10 10:00
  • 13. GUIDs to the Rescue. Maybe...
    • The simplest solution is to use Globally Unique Identifiers (GUIDs)
    • GUIDs are 128-bit numbers (often expressed as a 32-character hexadecimal string)
    • Can be safely generated and be guaranteed to be unique if...
      • ...you know the same generation algorithm is being used
      • ...you have a good pseudo-random number generator
    • Keys are meaningless, large, and awkward
    • May not integrate very well into an existing system
    • Can never use self-checking features in your primary keys
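As a small illustration, Python's standard uuid module can generate a random (version 4) GUID on the client without contacting the server. The helper name `new_contact_id` is hypothetical.

```python
import uuid

def new_contact_id():
    """Return a random version-4 UUID as a 32-character hexadecimal string."""
    return uuid.uuid4().hex
```

The result is unique for practical purposes but, as noted above, it is long and carries no meaning.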
  • 14. Composite Keys
    • Assign each application (not each user) a unique ID number
    • Combine this unique ID along with a regular autoincrementing number
      • Can be concatenated to create a single key, or use a composite-column key
    • Keys are much smaller than GUIDs, and they carry built-in meaning
    • Possible that you will exhaust your “key range” before you can synchronize and be assigned a new ID
    • Application IDs can be assigned by the central server using a simple autoincrement
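One way the composite-key idea could look in code is sketched below. The class name, the `max_local` limit, and the `appid-counter` string format are all assumptions; the slides only say that an application ID is combined with an autoincrementing number.

```python
class CompositeKeyGenerator:
    """Builds keys from a server-assigned application ID plus a local counter."""

    def __init__(self, app_id, max_local=999_999):
        self.app_id = app_id        # assigned once by the central server
        self.max_local = max_local  # size of the implicitly reserved key range
        self.next_local = 1

    def next_key(self):
        """Return the next (app_id, local_number) pair as a composite-column key."""
        if self.next_local > self.max_local:
            # The key range is exhausted before a new application ID was obtained.
            raise RuntimeError("local key range exhausted; synchronize for a new application ID")
        key = (self.app_id, self.next_local)
        self.next_local += 1
        return key

    @staticmethod
    def concatenate(key, width=6):
        """Render a composite key as a single string such as '7-000001'."""
        app_id, local = key
        return f"{app_id}-{local:0{width}d}"
```

The tuple form maps to a two-column primary key, while `concatenate` gives the single-column alternative.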
  • 15. Primary Key Pools
    • Composite keys use Application ID to implicitly reserve a range of keys
    • Primary key pools explicitly reserve a range of keys by taking them!
    • At each synchronization, the application requests a range of unassigned keys from the central server
    • Every time the local application needs a key, it takes one from the primary key pool
    • Does not require a unique application identifier
    • Possible you will exhaust your pool of keys before synchronizing
    • Keys don’t carry any meaning
    • Complex to set up
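A rough sketch of a client-side primary key pool follows. The `KeyPool` class and its method names are hypothetical, and the step where the server hands out a range is reduced to a `refill` call with an inclusive range.

```python
from collections import deque

class KeyPool:
    """Holds primary keys reserved from the central server during synchronization."""

    def __init__(self):
        self._keys = deque()

    def refill(self, first, last):
        """Add the inclusive range first..last granted by the server."""
        self._keys.extend(range(first, last + 1))

    def take(self):
        """Remove and return one key for a newly inserted local row."""
        if not self._keys:
            raise RuntimeError("key pool exhausted; synchronize to request more keys")
        return self._keys.popleft()

    def remaining(self):
        """Number of keys left, useful for deciding when to ask for more."""
        return len(self._keys)
```

An application would typically check `remaining()` at each synchronization and request a fresh range when the pool runs low.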
  • 16. Deleting a Contact
    • Once you delete something, it is gone
    • If it is gone, how do you know you deleted it?
    • Must implement some method to “remember” that something has been deleted
      • deleted status column
      • shadow (“tombstone”) table
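The shadow-table approach can be sketched with an SQLite trigger, shown here through Python's sqlite3 module. The schema is an assumption; the `contacts_deleted` name follows the shadow table listed later in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT, city TEXT);

    -- Shadow ("tombstone") table: remembers keys of rows deleted since the last sync.
    CREATE TABLE contacts_deleted (
        contact_id TEXT PRIMARY KEY,
        deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Record every delete so it can be uploaded at the next synchronization.
    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    BEGIN
        INSERT OR REPLACE INTO contacts_deleted (contact_id) VALUES (OLD.contact_id);
    END;
""")

conn.execute("INSERT INTO contacts VALUES ('102', 'Homer Simpson', 'Springfield')")
conn.execute("DELETE FROM contacts WHERE contact_id = '102'")

# The row is gone from contacts, but its key survives in the shadow table.
pending_deletes = [row[0] for row in conn.execute("SELECT contact_id FROM contacts_deleted")]
```

After a successful synchronization the shadow table would be emptied, since the server then knows about the deletes.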
  • 17. Updating a Column
    • Need a method to distinguish that a column has been modified
      • Status column
      • Last-Modified timestamp
      • Version number
    • What happens if two people update the same row?
      • Detect that a change conflict has happened
        • Row-level vs Column-level conflict detection
      • Resolve the conflict
    • Idempotent changes
    • The “Delete-then-Insert” problem
    • Mapping data types to consolidated database
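To make the row-level versus column-level distinction concrete, here is a small sketch. It assumes the client keeps the last synchronized copy of each row (`base`) next to its local edits, which is only one of several possible tracking schemes; the function name and the dictionary row format are hypothetical.

```python
def detect_conflict(base, local, remote, column_level=False):
    """Return the set of conflicting columns; an empty set means no conflict.

    base   -- the row as of the last successful synchronization
    local  -- the row as edited on this client
    remote -- the row as it now exists on the central server
    """
    local_changed = {col for col in base if local[col] != base[col]}
    remote_changed = {col for col in base if remote[col] != base[col]}

    # If only one side changed the row, there is nothing to resolve.
    if not local_changed or not remote_changed:
        return set()

    if column_level:
        # Conflict only where both sides changed the same column to different values.
        return {col for col in local_changed & remote_changed if local[col] != remote[col]}

    # Row-level: any two concurrent edits to the same row conflict.
    return local_changed | remote_changed
```

For example, if one user edits the name and another edits the city, row-level detection reports a conflict while column-level detection lets both changes merge.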
  • 18. Non-Synchronized Deletes
    • What if you want to delete something from your application, but not delete it system-wide?
    • Need some method to turn off your change tracking algorithm
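One possible way to switch change tracking off is a flag that the delete trigger consults. This is a sketch under assumptions: the `sync_settings` table, its `tracking_enabled` column, and the `delete_locally_only` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE contacts_deleted (contact_id TEXT PRIMARY KEY);

    -- Single-row flag table; 1 means deletes are tracked for synchronization.
    CREATE TABLE sync_settings (tracking_enabled INTEGER NOT NULL);
    INSERT INTO sync_settings VALUES (1);

    -- The trigger only records a tombstone while tracking is enabled.
    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    WHEN (SELECT tracking_enabled FROM sync_settings) = 1
    BEGIN
        INSERT INTO contacts_deleted (contact_id) VALUES (OLD.contact_id);
    END;
""")

def delete_locally_only(conn, contact_id):
    """Delete a row from the local database without recording it for upload."""
    conn.execute("UPDATE sync_settings SET tracking_enabled = 0")
    try:
        conn.execute("DELETE FROM contacts WHERE contact_id = ?", (contact_id,))
    finally:
        # Always re-enable tracking, even if the delete fails.
        conn.execute("UPDATE sync_settings SET tracking_enabled = 1")
```

A normal `DELETE` still produces a tombstone; only deletes routed through the helper stay local.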
  • 19. Our Simple Contact List
    • Went from...
      • Contacts table
  • 20. Our Not-So-Simple Synced Contact List
    • To...
      • Contacts table
      • Contacts_deleted shadow table
      • Contacts_Users_subscription table
      • Contacts_key_pool table
      • TRIGGER AFTER UPDATE on Contacts
      • TRIGGER AFTER INSERT on Contacts
      • TRIGGER AFTER DELETE on Contacts
      • Disable/Enable change tracking
    • And that says nothing about the actual synchronization or conflict resolution logic!
  • 21. Data Reassignment
    • A user is reassigned to a new city. What needs to happen?
      • Complete a final upload of all changes on the “old” set of data
      • Download a set of operations that delete all the old contacts
      • Download the contacts of the new city
    • Easy enough for one table. But what happens when we have two or more tables with a foreign key relationship?
  • 22. Referential Integrity: Friend or Foe?
    • Reassignment problems can quickly become lost in a referential integrity nightmare
    • It may become tempting to disable (or at least never enforce) referential integrity checks
    • This is usually a bad idea:
      • Referential integrity should be your friend. It ensures your data stays consistent
      • Likely your server-side database will enforce referential integrity, so it is better to do a client-side check before sending the data up
      • There are performance benefits to defining foreign key relations
  • 23. Application and Schema Upgrades
    • The traditional problems associated with dealing with legacy deployed software are usually avoided by web applications
    • Cached versions of applications mean you will no longer be able to guarantee that everyone is running the latest version
    • Need some provision to let “older” applications synchronize against logic that is correct for their schema
    • This is typically achieved by adding a full level of indirection between the local database and the consolidated database
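The level of indirection might be sketched as a table of per-version upload handlers on the server, each translating rows from an older client schema into the current consolidated schema. The schema versions, the field names, and the name-splitting rule below are purely hypothetical.

```python
def upload_v1(row):
    """Hypothetical v1 clients store a single 'name' field; split it for the server."""
    first, _, last = row["name"].partition(" ")
    return {"contact_id": row["contact_id"], "first_name": first, "last_name": last}

def upload_v2(row):
    """Hypothetical v2 clients already match the consolidated schema."""
    return dict(row)

# The indirection layer: each deployed client schema keeps its own sync logic.
UPLOAD_HANDLERS = {1: upload_v1, 2: upload_v2}

def translate_upload(schema_version, row):
    """Convert an uploaded row to the consolidated schema for the given client version."""
    try:
        handler = UPLOAD_HANDLERS[schema_version]
    except KeyError:
        raise ValueError(f"unsupported client schema version: {schema_version}") from None
    return handler(row)
```

Old handlers stay in place for as long as cached copies of the old application may still be in use.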
  • 24. Data Integrity in the Field
    • Everyone should always have a consistent view of the data at every moment
      • Inconsistent data can quickly propagate through a synchronization environment and infect everyone
    • This typically means synchronizations must be fully atomic at both ends
      • Error reporting must happen outside this atomic transaction, otherwise it would be lost
    • Most applications cannot handle half-synchronized data
    • Need to handle broken and partial synchronizations
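A minimal sketch of an atomic download on the client side, assuming an SQLite database accessed through Python's sqlite3 module. The `apply_download` function, the table layout, and the in-memory `error_log` list are assumptions; a real system would need a more durable place to report errors.

```python
import sqlite3

def apply_download(conn, rows, error_log):
    """Apply all downloaded rows in one transaction, or none of them.

    Returns True on success. On failure every change is rolled back and the
    error is recorded outside the transaction so that it is not lost.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany(
                "INSERT INTO contacts (contact_id, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error as exc:
        # The transaction has already been rolled back at this point.
        error_log.append(str(exc))
        return False
```

If the second of two rows violates a constraint, the first row is rolled back too, so the application never sees half-synchronized data.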
  • 25. What else might you want?
    • High-priority synchronization
    • Implementing secure authentication at every point in the chain
    • Encryption
    • Server-initiated synchronization (“Push” synchronization)
    • Lots more...
  • 26. Summary
    • Synchronization is deceptively hard
      • It is relatively easy to put together a simple, controlled synchronization in a lab between two computers
      • The real complications only show up in the real world
    • A full synchronization strategy should be planned at the start
      • All projects suffer from scope-creep
      • It is better to decide early on what you will and won’t do, and architect for it
    • Synchronization is rarely, if ever, a simple “bolt-on” solution
    • Test under realistic conditions and realistic load!
  • 27.
    • A patched version of Gears that adds synchronization functionality
    • Allows the use of the Sybase UltraLite relational database
    • UltraLite is a small footprint, fully relational database capable of totally self-contained synchronization
    • Contains all its own change tracking and synchronization logic that is totally transparent to the end user
    • Synchronizes with a MobiLink synchronization server, providing out-of-the-box synchronization with Oracle, SQL Server, DB2, ASE, SQL Anywhere, and MySQL
      • 10 years old
      • Heavily deployed and tested
    • Handles all of the mechanics and plumbing of synchronization, and lets you focus on your business logic
    • Business logic can be written in SQL, .NET, or Java
  • 28.
    • Implemented as an open-source patch to the Gears project, released under the Apache 2 license
    • Beta went live last night 
    • Available for Internet Explorer (Windows) and Firefox (Windows and Linux)
    • Free deployment for SQL Anywhere and MySQL-based applications
    • www.sybase.com/ultraliteweb
  • 29.
    • Thank You
    • Eric Farrar
    • [email_address]
    • http://iablog.sybase.com/efarrar