Planning for Synchronization with Browser-Local Databases

Talk by Eric Farrar of Sybase at ZendCon 2009

    Presentation Transcript

    • Preparing for Synchronization with Browser-local Databases (Eric Farrar, Product Manager, Sybase iAnywhere; October 20th, 2009)
    • Web applications are always online. Aren’t they?
      • Traditionally, but the lines are being blurred
      • Web applications are starting to act like desktop applications
      • Desktop applications are starting to act like web applications
      • Offline Application Caching
        • Gears
        • HTML5
        • Adobe AIR
      • This is a new space for web applications, but a well-known area for desktop and mobile applications
    • It Isn’t Only About Going Offline
      • It’s also about speed-ups for online applications
      • Data intensive applications can save many round-trips to the server by storing reference data locally
    • Browser-Local Databases
      • SQLite
        • Gears
          • Chrome
          • Android
        • HTML 5 Specification Draft
          • Firefox
          • Safari
        • Adobe AIR
      • Semi-structured Storage (key-value pairs)
        • Cookies
        • Flash Storage
      • Isolated Storage
        • Silverlight
    • But...
      • For most applications, the ultimate destination for local data is a central server (“consolidated database”)
      • This means two or more copies of the same data that can be changed independently
      • This introduces synchronization!
    • Synchronization
      • Synchronization is a huge topic
      • This talk aims to show two things:
        • Synchronization is a complex problem that requires careful planning
        • The more planning done upfront, the easier the synchronization process will be
      • Although the two are related, this talk discusses synchronization of application data, not of the application itself (which is usually done separately)
    • How often are you synchronizing?
      • Occasionally-connected
        • Normal application state is connected
        • Disconnection is treated as special case
      • Occasionally-disconnected
        • Normal application state is disconnected
        • Connection is treated as a special case
    • What are you synchronizing?
      • Object-based synchronization
        • Serialized objects (perhaps hierarchical) are the basic object of synchronization
        • Similar to document-based synchronization (CouchDB)
      • Action-based synchronization
        • Actions on data, rather than the data itself, are synchronized
        • Store a log of actions to take once connection is available
        • Difficult to implement for moderately complex systems
      • Row-based synchronization
        • Individual database rows are the basic object of synchronization
        • Problem is the best defined
        • Often the ultimate destination of the data will be a relational database
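
The action-based approach above can be sketched as a queue that is replayed once a connection is available. This is a minimal illustration, not a prescribed design: `ActionLog`, the action names, and the `send` callback are all hypothetical.

```python
from collections import deque


class ActionLog:
    """Hypothetical log of actions recorded while offline."""

    def __init__(self):
        self.pending = deque()

    def record(self, action, payload):
        # Store the action itself, not the resulting data.
        self.pending.append((action, payload))

    def replay(self, send):
        """Send pending actions in order; return how many were sent.

        An action is removed only after send() returns, so an exception
        (for example a dropped connection) leaves it queued for a retry.
        """
        sent = 0
        while self.pending:
            action, payload = self.pending[0]
            send(action, payload)
            self.pending.popleft()
            sent += 1
        return sent


log = ActionLog()
log.record("add_contact", {"name": "Mr. Teeny", "city": "Springfield"})
log.record("delete_contact", {"contact_id": 102})
```

Because a failed action stays at the head of the queue, the server may see the same action twice after a retry, which is one reason the approach gets difficult for moderately complex systems.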
    • “A few rows up, a few rows down. What’s so hard about that?”
      • Synchronization is deceptively complicated
      • Let’s build an application that synchronizes only a single table holding a contact list
      • Typical record
      Contact ID  Name           Address                City         Last Contact
      102         Homer Simpson  742 Evergreen Terrace  Springfield  2009-10-10 10:00
    • Data Subsetting
      • Each user should get all of their data, and only their data
      • Data must be filtered at the central server, not the end clients
        • Network Traffic
        • Data Storage
        • Application Performance
        • User Experience
        • Security
      • Clients should only need to provide a user name and some subscriptions, and the servers should be able to figure out what they need
    • Implementing Subsetting for Contacts
      • Create a mapping (“subscription”) table
      RowID  User ID  City
      1      0001     Springfield
      2      0002     Ogdenville
      3      0003     Springfield
      4      0003     Ogdenville
      5      ...      ...
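
A minimal sketch of server-side subsetting with a subscription table, using Python's built-in `sqlite3` as a stand-in for the consolidated database. The table and column names (`contacts`, `subscriptions`, `user_id`) and the contact with ID 200 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE subscriptions (row_id INTEGER PRIMARY KEY, user_id TEXT, city TEXT);
    INSERT INTO contacts VALUES (102, 'Homer Simpson', 'Springfield');
    INSERT INTO contacts VALUES (200, 'Example Contact', 'Ogdenville');
    INSERT INTO subscriptions VALUES (1, '0001', 'Springfield');
    INSERT INTO subscriptions VALUES (2, '0002', 'Ogdenville');
    INSERT INTO subscriptions VALUES (3, '0003', 'Springfield');
    INSERT INTO subscriptions VALUES (4, '0003', 'Ogdenville');
    """
)


def contacts_for_user(conn, user_id):
    """Return only the contacts in cities the user is subscribed to.

    The filter runs on the server; the client supplies only its user ID.
    """
    return conn.execute(
        """
        SELECT c.contact_id, c.name, c.city
        FROM contacts AS c
        JOIN subscriptions AS s ON s.city = c.city
        WHERE s.user_id = ?
        ORDER BY c.contact_id
        """,
        (user_id,),
    ).fetchall()
```

Here user `0001` would receive only the Springfield contact, while user `0003`, subscribed to both cities, would receive both rows.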
    • Adding Contacts
      • Add the following contact
      Contact ID  Name       Address          City         Last Contact
      ????        Mr. Teeny  123 Fake Street  Springfield  2009-10-10 10:00
      • What do we use as a Contact ID?
        • Autoincrement won’t work
      • What can we do?
    • GUIDs to the Rescue. Maybe...
      • The simplest solution is to use Globally Unique Identifiers (GUIDs)
      • GUIDs are 128-bit numbers (often expressed as a 32-character hexadecimal string)
      • Can be safely generated and be guaranteed to be unique if...
        • ...you know the same generation algorithm is being used
        • ...you have a good pseudo-random number generator
      • Keys are meaningless, large, and awkward
      • May not integrate very well into an existing system
      • Can never use self-checking features in your primary keys
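
As a small illustration, Python's standard `uuid` module can generate such keys on the client without contacting the server. The helper name `new_contact_id` is hypothetical; a random (version 4) UUID is one possible choice of generation algorithm, not the only one.

```python
import uuid


def new_contact_id():
    """Return a random (version 4) UUID as a 32-character hex string."""
    return uuid.uuid4().hex


first = new_contact_id()
second = new_contact_id()
```

The result shows the trade-off listed above: the key can be created offline, but it is long and carries no meaning.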
    • Composite Keys
      • Assign each application (not each user) a unique ID number
      • Combine this unique ID along with a regular autoincrementing number
        • Can be concatenated to create a single key, or use a composite-column key
      • Keys are much smaller than GUIDs, and they carry built-in meaning
      • Possible that you will exhaust your “key range” before you can synchronize and be assigned a new ID
      • Application IDs can be assigned by the central server using a simple autoincrement
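
One way to sketch the concatenated form of a composite key is to give each application ID its own fixed-size block of integers. The class name, the `keys_per_app` parameter, and its default value are assumptions for illustration only.

```python
class KeyRangeExhausted(Exception):
    """Raised when an application has used up its block of keys."""


class CompositeKeyGenerator:
    """Hypothetical generator combining an application ID with a local counter."""

    def __init__(self, app_id, keys_per_app=1_000_000):
        self.app_id = app_id            # assigned by the central server
        self.keys_per_app = keys_per_app
        self.next_local = 0             # regular autoincrementing part

    def next_key(self):
        if self.next_local >= self.keys_per_app:
            # The application must synchronize and be assigned a new ID.
            raise KeyRangeExhausted(f"application {self.app_id} has no keys left")
        key = self.app_id * self.keys_per_app + self.next_local
        self.next_local += 1
        return key


app_7 = CompositeKeyGenerator(app_id=7, keys_per_app=1000)
app_8 = CompositeKeyGenerator(app_id=8, keys_per_app=1000)
```

With these settings application 7 produces 7000, 7001, and so on, while application 8 starts at 8000, so the two can never collide, and the application ID can be read back from any key.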
    • Primary Key Pools
      • Composite keys use Application ID to implicitly reserve a range of keys
      • Primary key pools explicitly reserve a range of keys by taking them!
      • At each synchronization, the application requests a range of unassigned keys from the central server
      • Every time the local application needs a key, it takes one from the primary key pool
      • Does not require a unique application identifier
      • Possible you will exhaust your pool of keys before synchronizing
      • Keys don’t carry any meaning
      • Complex to set up
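
A minimal sketch of a primary key pool, with the central server reduced to a counter that hands out unassigned keys. `KeyServer`, `KeyPool`, and the `target_size` parameter are hypothetical names; a real server would persist its counter.

```python
from collections import deque


class KeyServer:
    """Stand-in for the central server's supply of unassigned keys."""

    def __init__(self):
        self.next_unassigned = 1

    def reserve(self, count):
        start = self.next_unassigned
        self.next_unassigned += count
        return list(range(start, start + count))


class KeyPool:
    """Local pool of keys reserved during earlier synchronizations."""

    def __init__(self):
        self.keys = deque()

    def refill(self, server, target_size):
        # Called at each synchronization: top the pool back up.
        missing = target_size - len(self.keys)
        if missing > 0:
            self.keys.extend(server.reserve(missing))

    def take(self):
        if not self.keys:
            raise LookupError("key pool exhausted; synchronize to refill")
        return self.keys.popleft()


server = KeyServer()
pool_a, pool_b = KeyPool(), KeyPool()
pool_a.refill(server, target_size=3)
pool_b.refill(server, target_size=3)
```

No application identifier is involved: the pools are disjoint only because the server never hands out the same key twice.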
    • Deleting a Contact
      • Once you delete something, it is gone
      • If it is gone, how do you know you deleted it?
      • Must implement some method to “remember” that something has been deleted
        • deleted status column
        • shadow (“tombstone”) table
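
The shadow-table option can be sketched with a SQLite trigger that records each deleted key. The table names follow the `Contacts_deleted` naming used later in the talk; the columns and trigger name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contacts_deleted (contact_id INTEGER PRIMARY KEY, deleted_at TEXT);

    -- Remember every delete so it can be uploaded at the next synchronization.
    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    BEGIN
        INSERT OR REPLACE INTO contacts_deleted
        VALUES (OLD.contact_id, datetime('now'));
    END;

    INSERT INTO contacts VALUES (102, 'Homer Simpson');
    DELETE FROM contacts WHERE contact_id = 102;
    """
)

# Keys the client must report as deleted during the next upload.
pending_deletes = [
    row[0] for row in conn.execute("SELECT contact_id FROM contacts_deleted")
]
```

`INSERT OR REPLACE` is used so that a key which is deleted, re-inserted, and deleted again does not violate the shadow table's primary key.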
    • Updating a Column
      • Need a method to detect that a column has been modified
        • Status column
        • Last-Modified timestamp
        • Version number
      • What happens if two people update the same row?
        • Detect that a change conflict has happened
          • Row-level vs Column-level conflict detection
        • Resolve the conflict
      • Idempotent changes
      • The “Delete-then-Insert” problem
      • Mapping data types to consolidated database
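
The difference between row-level and column-level conflict detection can be sketched as a three-way comparison. This assumes the client keeps a copy of each row as it was at the last synchronization (`base`); the function names are hypothetical.

```python
def changed_columns(base, current):
    """Columns whose value differs from the last synchronized copy."""
    return {col for col in base if base[col] != current[col]}


def detect_conflict(base, local, remote, column_level=False):
    """Return the set of conflicting columns (empty set means no conflict).

    Row-level: any change on both sides conflicts.
    Column-level: only columns changed on both sides to different values conflict.
    """
    local_changes = changed_columns(base, local)
    remote_changes = changed_columns(base, remote)
    if column_level:
        return {
            col for col in local_changes & remote_changes
            if local[col] != remote[col]
        }
    if local_changes and remote_changes:
        return local_changes | remote_changes
    return set()


base = {"name": "Homer Simpson", "city": "Springfield"}
local = dict(base, name="Homer J. Simpson")   # one user edits the name
remote = dict(base, city="Ogdenville")        # another user edits the city
```

For the rows above, row-level detection reports a conflict while column-level detection does not, because the two users touched different columns. Detection only finds the conflict; resolving it is a separate policy decision.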
    • Non-Synchronized Deletes
      • What if you want to delete something from your local application without deleting it system-wide?
      • Need some method to turn off your change tracking algorithm
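
One possible way to switch change tracking off is a flag that the delete trigger checks before writing a tombstone. The `sync_settings` table, its `tracking_enabled` column, and the helper `delete_locally_only` are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contacts_deleted (contact_id INTEGER PRIMARY KEY);
    CREATE TABLE sync_settings (tracking_enabled INTEGER NOT NULL);
    INSERT INTO sync_settings VALUES (1);

    -- Record a tombstone only while change tracking is switched on.
    CREATE TRIGGER contacts_after_delete AFTER DELETE ON contacts
    WHEN (SELECT tracking_enabled FROM sync_settings) = 1
    BEGIN
        INSERT OR REPLACE INTO contacts_deleted VALUES (OLD.contact_id);
    END;

    INSERT INTO contacts VALUES (102, 'Homer Simpson');
    INSERT INTO contacts VALUES (103, 'Mr. Teeny');
    """
)


def delete_locally_only(conn, contact_id):
    """Delete a row without recording it for synchronization."""
    conn.execute("UPDATE sync_settings SET tracking_enabled = 0")
    try:
        conn.execute("DELETE FROM contacts WHERE contact_id = ?", (contact_id,))
    finally:
        conn.execute("UPDATE sync_settings SET tracking_enabled = 1")
    conn.commit()


delete_locally_only(conn, 102)                                # not synchronized
conn.execute("DELETE FROM contacts WHERE contact_id = 103")   # synchronized
tombstones = [r[0] for r in conn.execute("SELECT contact_id FROM contacts_deleted")]
```

Only contact 103 ends up in the shadow table, so only that delete would be uploaded.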
    • Our Simple Contact List
      • Went from...
        • Contacts table
    • Our Not-So-Simple Synced Contact List
      • To...
        • Contacts table
        • Contacts_deleted shadow table
        • Contacts_Users_subscription table
        • Contacts_key_pool table
        • TRIGGER AFTER UPDATE on Contacts
        • TRIGGER AFTER INSERT on Contacts
        • TRIGGER AFTER DELETE on Contacts
        • Disable/Enable change tracking
      • And that says nothing about the actual synchronization or conflict resolution logic!
    • Data Reassignment
      • A user is reassigned to a new city. What needs to happen?
        • Complete a final upload of all changes on the “old” set of data
        • Download a set of operations that delete all the old contacts
        • Download the contacts of the new city
      • Easy enough for one table. But what happens when we have two or more tables with a foreign key relationship?
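
For the single-table case, the download for a reassigned user might be built as follows. This sketch assumes the final upload has already succeeded; the function name and the `("delete", ...)` / `("insert", ...)` operation format are hypothetical.

```python
def reassignment_download(local_contacts, server_contacts, new_city):
    """Build the operations that move a client to new_city.

    Deletes remove rows from the old subset; inserts bring in rows from
    the new subset that the client does not already hold.
    """
    local_ids = {c["contact_id"] for c in local_contacts}
    deletes = [
        ("delete", c["contact_id"])
        for c in local_contacts
        if c["city"] != new_city
    ]
    inserts = [
        ("insert", c)
        for c in server_contacts
        if c["city"] == new_city and c["contact_id"] not in local_ids
    ]
    return deletes + inserts


local = [{"contact_id": 102, "city": "Springfield"}]
server = [
    {"contact_id": 102, "city": "Springfield"},
    {"contact_id": 200, "city": "Ogdenville"},   # hypothetical contact
]
ops = reassignment_download(local, server, "Ogdenville")
```

With foreign keys involved, the same operations would also need ordering: child rows deleted before their parents, and parent rows inserted before their children.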
    • Referential Integrity: Friend or Foe?
      • Reassignment problems can quickly become lost in a referential integrity nightmare
      • It may become tempting to disable (or at least never enforce) referential integrity checks
      • This is usually a bad idea:
        • Referential integrity should be your friend. It ensures your data stays consistent
        • Likely your server-side database will enforce referential integrity, so it is better to do a client-side check before sending the data up
        • There are performance benefits to defining foreign key relations
    • Application and Schema Upgrades
      • The traditional problems associated with dealing with legacy deployed software are usually avoided by web applications
      • Cached versions of applications mean you can no longer guarantee that everyone is running the latest version
      • Need some provision to let “older” applications synchronize against logic that is correct for their schema
      • This is typically achieved by adding a full level of indirection between the local database and the consolidated database
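
The level of indirection can be pictured as a table of per-version translation functions sitting between the client's schema and the consolidated schema. Everything here is hypothetical: the two schema versions, the name-splitting rule, and the function names are invented for illustration.

```python
def upload_v1(row):
    """Version 1 clients store a single 'name' column."""
    first, _, last = row["name"].partition(" ")
    return {"contact_id": row["contact_id"], "first_name": first, "last_name": last}


def upload_v2(row):
    """Version 2 clients already match the consolidated schema."""
    return dict(row)


# One script per deployed schema version; old versions stay registered
# for as long as cached copies of the application may still synchronize.
UPLOAD_SCRIPTS = {1: upload_v1, 2: upload_v2}


def translate_upload(schema_version, row):
    try:
        script = UPLOAD_SCRIPTS[schema_version]
    except KeyError:
        raise ValueError(f"unsupported schema version {schema_version}") from None
    return script(row)


translated = translate_upload(1, {"contact_id": 102, "name": "Homer Simpson"})
```

The client only has to announce its schema version; the server picks the logic that is correct for that schema.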
    • Data Integrity in the Field
      • Everyone should always have a consistent view of the data at every moment
        • Inconsistent data can quickly propagate through a synchronization environment and infect everyone
      • This typically means synchronizations must be fully atomic at both ends
        • Error reporting must happen outside this atomic transaction, otherwise it would be lost
      • Most applications cannot handle half-synchronized data
      • Need to handle broken and partial synchronizations
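
A minimal sketch of an atomic download on the client side, using a single SQLite transaction. The function name and the in-memory `error_log` list are assumptions; the point is that the error is recorded outside the transaction, so it survives the rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()


def apply_download(conn, rows, error_log):
    """Apply all downloaded rows or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for contact_id, name in rows:
                conn.execute(
                    "INSERT INTO contacts VALUES (?, ?)", (contact_id, name)
                )
        return True
    except sqlite3.Error as exc:
        # Recorded after the rollback, so the report is not lost with it.
        error_log.append(str(exc))
        return False


errors = []
ok = apply_download(conn, [(102, "Homer Simpson")], errors)
# The second batch fails on a duplicate key, so row 103 is rolled back too.
failed = apply_download(conn, [(103, "Mr. Teeny"), (102, "Duplicate")], errors)
```

After the failed batch the table still holds only row 102, which is the state the application saw before the broken synchronization started.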
    • What else might you want?
      • High-priority synchronization
      • Implementing secure authentication at every point in the chain
      • Encryption
      • Server-initiated synchronization (“Push” synchronization)
      • Lots more...
    • Summary
      • Synchronization is deceptively hard
        • It is relatively easy to put together a simple, controlled synchronization in a lab between two computers
        • The real complications only show up in the real world
      • A full synchronization strategy should be planned at the start
        • All projects suffer from scope-creep
        • It is better to decide early on what you will and won’t do, and architect for it
      • Synchronization is rarely, if ever, a simple “bolt-on” solution
      • Test under realistic conditions and realistic load!
      • A patched version of Gears that adds synchronization functionality
      • Allows the use of the Sybase UltraLite relational database
      • UltraLite is a small footprint, fully relational database capable of totally self-contained synchronization
      • Contains all its own change tracking and synchronization logic that is totally transparent to the end user
      • Synchronizes with a MobiLink synchronization server, providing out-of-the-box synchronization with Oracle, SQL Server, DB2, ASE, SQL Anywhere, and MySQL
        • 10 years old
        • Heavily deployed and tested
      • Handles all of the mechanics and plumbing of synchronization, and lets you focus on your business logic
      • Business logic can be written in SQL, .NET, or Java
      • Implemented as an open-source patch to the Gears project, released under the Apache 2 license
      • Beta went live last night 
      • Available for Internet Explorer (Windows) and Firefox (Windows and Linux)
      • Free deployment for SQL Anywhere and MySQL-based applications
      • www.sybase.com/ultraliteweb
      • Thank You
      • Eric Farrar
      • [email_address]
      • http://iablog.sybase.com/efarrar