Data handling for Disconnected Apps


Published in: Technology

  1. 1. Disconnected Data Handling in Mobile / Wireless Applications Kyle Cordes Oasis Digital Solutions Inc.
  2. 2. Agenda <ul><li>What do I mean by mobile / wireless? </li></ul><ul><li>Brief overview of the problem space </li></ul><ul><li>Solutions, approaches, tips </li></ul><ul><li>Examples along the way </li></ul><ul><li>We’ll cover many of these slides quite briefly, to focus on the interesting parts. </li></ul>
  3. 3. Yet Another About Me Slide <ul><li>Kyle Cordes: </li></ul><ul><ul><li>Developer, consultant, project manager, etc. </li></ul></ul><ul><ul><li>Uses Java, .NET, Delphi, lots of other languages, RDBMSs, etc. </li></ul></ul><ul><li>Oasis Digital has developed mobile / wireless applications… </li></ul><ul><ul><li>for client projects; we have no product / framework / etc. to sell. </li></ul></ul>
  4. 4. What is a Mobile / Wireless Application? <ul><li>User community that moves around </li></ul><ul><ul><li>World, Country, State, Metro, Campus </li></ul></ul><ul><li>Occasional (typically wireless) access to servers </li></ul><ul><li>This presentation is specifically concerned with data-centric “enterprise” applications </li></ul><ul><ul><li>There are other interesting application types, such as multiplayer games. </li></ul></ul>
  5. 5. Application Examples: <ul><li>Service Dispatching </li></ul><ul><ul><li>For example a delivery driver </li></ul></ul><ul><li>Mobile Workforce CRM </li></ul><ul><ul><li>On-the-road sales force </li></ul></ul><ul><li>Shared Calendars / Scheduling </li></ul><ul><li>Email (!) </li></ul><ul><ul><li>Read and compose messages on an airplane </li></ul></ul>
  6. 6. The Problem Landscape <ul><li>Connectivity: Intermittent, Unreliable, Slow </li></ul><ul><li>Development Model </li></ul><ul><ul><li>Most tools focus on clients that can always talk to middle tier or database servers, even with disconnected recordsets </li></ul></ul><ul><li>Concurrency </li></ul><ul><ul><li>“Highly Optimistic” locking </li></ul></ul>
  7. 7. Assumptions <ul><li>My assumptions are for the worst-case, nationwide deployment scenario </li></ul><ul><li>The situation is less severe if your application is limited to a geographic area where there are ample, high quality service providers </li></ul><ul><li>The general ideas still apply </li></ul>
  8. 8. Sometimes You are Not Connected <ul><li>Wireless networks are nowhere near 100% coverage </li></ul><ul><ul><li>(excluding some campus networks) </li></ul></ul><ul><li>So an application must use a “Briefcase” Metaphor: </li></ul><ul><ul><li>Pack up some work to do, in your briefcase. </li></ul></ul><ul><ul><li>Do the work, using just what’s in there. </li></ul></ul><ul><ul><li>Occasionally get more work and drop off / send in what you have done. </li></ul></ul>
  9. 9. Interlude: Buying Wireless Access <ul><li>Availability and cost of wireless data depends on which vendors have built networks in an area </li></ul><ul><li>Varies widely by area </li></ul><ul><ul><li>Even with nationwide providers </li></ul></ul><ul><ul><li>Especially when you get away from the largest cities </li></ul></ul>
  10. 10. When You are Connected, It’s Not Very Good <ul><li>High Latency </li></ul><ul><ul><li>Hundreds of milliseconds is typical; 1000 ms is not unusual </li></ul></ul><ul><li>Low Bandwidth </li></ul><ul><ul><li>There are promising developments, but… </li></ul></ul><ul><ul><li>There are also many areas where ~10 kilobits per second is typical </li></ul></ul><ul><ul><li>User may want to wait on large data items until a fast connection is available. </li></ul></ul>
  11. 11. … And The Data Changes Underneath You <ul><li>Someone else worked on the same entities you were working on </li></ul><ul><li>Something assigned to you got reassigned to someone else </li></ul><ul><li>In many problem domains, conflict avoidance is possible </li></ul><ul><li>In others, conflict resolution is necessary </li></ul>
  12. 12. A Rich / Smart Client <ul><li>These applications require: </li></ul><ul><ul><li>Some kind of local storage: data file(s), database engine, RAM </li></ul></ul><ul><ul><li>Local data validation, as much as possible </li></ul></ul><ul><ul><li>A non-HTML GUI </li></ul></ul><ul><ul><ul><li>Unless you use a client-side web server </li></ul></ul></ul><ul><li>Happily, it’s no longer painful to deploy and update rich clients, with JWS, ClickOnce, etc. </li></ul>
  13. 13. Solution Elements <ul><li>Synchronization </li></ul><ul><ul><li>Reference Data vs. Transaction Data </li></ul></ul><ul><ul><li>Figuring out what data to send </li></ul></ul><ul><ul><li>Sending it </li></ul></ul><ul><li>Storing and manipulating data in the client application </li></ul>
  14. 14. Synchronization <ul><li>The client and server occasionally connect </li></ul><ul><ul><li>Perhaps a few times a week </li></ul></ul><ul><ul><li>Perhaps a few times a minute </li></ul></ul><ul><li>Send relevant changes back and forth </li></ul><ul><li>Reference data “syncs” one direction </li></ul><ul><li>Transaction data “syncs” both directions </li></ul>
  15. 15. Interlude: Dependency Reversal <ul><li>Usual approach: </li></ul><ul><ul><li>Client code depends on server APIs </li></ul></ul><ul><ul><li>Server doesn’t know much about client internals </li></ul></ul><ul><li>Consider: </li></ul><ul><ul><li>A well-encapsulated server module which intimately understands the client </li></ul></ul><ul><ul><li>Makes it possible to know more about what to send, thus to send less – trade development effort for runtime efficiency </li></ul></ul>
  16. 16. DBMS Replication <ul><li>Some DBMSs provide replication </li></ul><ul><ul><li>Including small-footprint code for small devices </li></ul></ul><ul><ul><li>Consider SQL Server, SQL Anywhere, PointBase, others </li></ul></ul><ul><li>Big advantage: </li></ul><ul><ul><li>Can save a lot of time in development schedule </li></ul></ul>
  17. 17. Why not use DBMS Replication? <ul><li>Ties the application closely to that DBMS vendor </li></ul><ul><ul><li>Each has a different feature set / design </li></ul></ul><ul><ul><li>Not a problem if you are committed to one DBMS </li></ul></ul><ul><li>Can be quite expensive </li></ul><ul><ul><li>In one case, just the sync module for X,000 users would have cost almost as much as the whole project </li></ul></ul><ul><li>Limits the application to vendor’s conception of how synchronization should work </li></ul>
  18. 18. Rolling Your Own – A Layered Model <ul><li>Logical Layer </li></ul><ul><ul><li>Decide what data to send </li></ul></ul><ul><ul><li>Code, Queries, etc. </li></ul></ul><ul><li>Transport / Physical Layer </li></ul><ul><ul><li>Package / represent the data </li></ul></ul><ul><ul><li>XML, Binary, etc. </li></ul></ul>
  19. 19. Decide What Data to Send: <ul><li>Three approaches discussed here: </li></ul><ul><li>Date-Modified Fields </li></ul><ul><li>Message Queuing </li></ul><ul><li>Remember and Diff </li></ul>
  20. 20. Date-Modified Approach <ul><li>Each table / entity gets a date-modified field </li></ul><ul><ul><li>Kept up to date with code or trigger </li></ul></ul><ul><li>Don’t delete, rather set/clear an Active flag </li></ul><ul><ul><li>There are other ways, of course </li></ul></ul><ul><li>Database schema has to accommodate this approach </li></ul><ul><li>Potentially complex and expensive queries to gather the data </li></ul>
  21. 21. Example <ul><li>Schema </li></ul><ul><li>Application Code </li></ul><ul><li>SQL to gather changed data </li></ul><ul><ul><li>Potentially expensive JOINs or subqueries </li></ul></ul>
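The slide's own schema and SQL are not in this transcript. As a stand-in, here is a minimal sketch of the date-modified approach using Python's built-in `sqlite3`. The `work_order` table, its columns, and the integer logical timestamps are assumptions made for illustration, not the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE work_order (
           id          INTEGER PRIMARY KEY,
           description TEXT NOT NULL,
           active      INTEGER NOT NULL DEFAULT 1,  -- cleared instead of deleting the row
           modified_at INTEGER NOT NULL             -- stamped by application code on every write
       )"""
)


def save(conn, order_id, description, now, active=1):
    """Insert or overwrite a row, always stamping modified_at."""
    conn.execute(
        "INSERT OR REPLACE INTO work_order (id, description, active, modified_at) "
        "VALUES (?, ?, ?, ?)",
        (order_id, description, active, now),
    )


def changes_since(conn, last_sync):
    """Gather every row touched after the client's last successful sync."""
    cursor = conn.execute(
        "SELECT id, description, active, modified_at FROM work_order "
        "WHERE modified_at > ? ORDER BY modified_at, id",
        (last_sync,),
    )
    return cursor.fetchall()


save(conn, 1, "Deliver pallet", now=100)
save(conn, 2, "Pick up return", now=105)
save(conn, 1, "Deliver pallet", now=110, active=0)  # soft delete via the Active flag

changed = changes_since(conn, last_sync=102)
```

Because the soft-deleted row still exists, the same query reports deletions as well as inserts and updates. With a realistic schema the gathering query usually has to join parent and child tables, which is where the cost mentioned on the slide comes from.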
  22. 22. Message Queuing <ul><li>Easily added to existing systems without schema change </li></ul><ul><ul><li>Hence much easier to deal with legacy systems and EAI </li></ul></ul><ul><li>Need to generate change messages, in application code or triggers </li></ul><ul><li>Can use an off the shelf MOM </li></ul><ul><ul><li>Which comes with transports already </li></ul></ul>
  23. 23. Message Queuing Implementation <ul><li>Each client subscribes to the topics that matter to it </li></ul><ul><ul><li>“Durable” subscription, so they can pick up changes while disconnected </li></ul></ul><ul><li>Application publishes all changes </li></ul><ul><li>Can handle deletes as well as insert/updates </li></ul>
  24. 24. Message Queuing Downsides <ul><li>If something changes many times between client pickups, there could be a lot of wasted download volume </li></ul><ul><li>This can be addressed with a “queue consolidator” interposed between end client and MOM system </li></ul>
  25. 25. Example <ul><li>Queue Setup </li></ul><ul><li>Application Code </li></ul>
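The queue setup and application code from this slide are not in the transcript. The sketch below imitates the idea in memory rather than using a real MOM product: `Broker` is a hypothetical stand-in for a broker with durable topic subscriptions, and `consolidate` is one possible "queue consolidator". It assumes every message carries the full current state of its entity, so older messages for the same entity can be dropped safely.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for a MOM with durable topic subscriptions."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> client ids
        self._pending = defaultdict(list)     # client id -> queued messages

    def subscribe(self, client_id, topic):
        self._subscribers[topic].add(client_id)

    def publish(self, topic, message):
        # Queue for every subscriber whether or not it is connected right now;
        # that is the "durable" part.
        for client_id in self._subscribers[topic]:
            self._pending[client_id].append(message)

    def drain(self, client_id):
        """Called when a client connects: hand over its backlog and clear it."""
        return self._pending.pop(client_id, [])


def consolidate(messages):
    """Keep only the latest message per entity, ordered by last change."""
    latest = {}
    for message in messages:
        latest.pop(message["entity"], None)  # re-insert so order tracks the last change
        latest[message["entity"]] = message
    return list(latest.values())


broker = Broker()
broker.subscribe("truck-7", "orders/region-3")
broker.publish("orders/region-3", {"entity": "order:1", "op": "update", "status": "new"})
broker.publish("orders/region-3", {"entity": "order:2", "op": "update", "status": "new"})
broker.publish("orders/region-3", {"entity": "order:1", "op": "update", "status": "urgent"})
broker.publish("orders/region-3", {"entity": "order:2", "op": "delete"})

backlog = broker.drain("truck-7")  # everything published while the client was away
to_send = consolidate(backlog)     # what actually goes over the wireless link
```

Here four queued messages shrink to two. If messages carried partial updates instead of full state, the consolidator would need to merge them rather than discard the older ones.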
  26. 26. “Diff” Approach <ul><li>Most efficient, in terms of data sent </li></ul><ul><li>Always sends exactly what needs to be sent </li></ul><ul><li>Can handle deletes seamlessly </li></ul><ul><li>Easily scales to highly complex data </li></ul><ul><ul><li>Though it can get CPU-intensive; </li></ul></ul><ul><ul><li>CPU cycles are much cheaper than wireless bandwidth </li></ul></ul>
  27. 27. Diff Approach - How It Works <ul><li>The server knows the current state of the client data store </li></ul><ul><li>The server calculates the intended state of the client data store </li></ul><ul><li>The server calculates the “diff” between them and sends it (a generic, well-studied problem) </li></ul><ul><li>Client applies the diff to its local data store </li></ul><ul><li>Uses before/after hash/CRC to make sure it really worked (Safe!) </li></ul>
  28. 28. Example <ul><li>Application Code </li></ul><ul><li>Example data packet </li></ul>
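The original application code and data packet are not in the transcript. The steps from the previous slide can be sketched as follows, assuming the data store is a flat dictionary of JSON-serializable records keyed by entity id. The packet layout (`before`, `after`, `upserts`, `deletes`) and the use of SHA-256 are illustrative choices, not the author's format.

```python
import hashlib
import json


def state_hash(state):
    """Hash a canonical JSON rendering so both sides compute the same value."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_diff(client_state, intended_state):
    """Server side: compare what the client has with what it should have."""
    upserts = {
        key: value
        for key, value in intended_state.items()
        if key not in client_state or client_state[key] != value
    }
    deletes = sorted(key for key in client_state if key not in intended_state)
    return {
        "before": state_hash(client_state),
        "after": state_hash(intended_state),
        "upserts": upserts,
        "deletes": deletes,
    }


def apply_diff(local_state, diff):
    """Client side: refuse to apply unless the before and after hashes check out."""
    if state_hash(local_state) != diff["before"]:
        raise ValueError("local store is not in the state the server expected")
    new_state = dict(local_state)
    new_state.update(diff["upserts"])
    for key in diff["deletes"]:
        del new_state[key]
    if state_hash(new_state) != diff["after"]:
        raise ValueError("diff did not produce the intended state")
    return new_state


server_view_of_client = {"order:1": {"status": "new"}, "order:2": {"status": "new"}}
intended = {"order:1": {"status": "done"}, "order:3": {"status": "new"}}

diff = make_diff(server_view_of_client, intended)
client_store = apply_diff(dict(server_view_of_client), diff)
```

The delete of `order:2` falls out of the comparison with no special flag in the schema, which is the "handles deletes seamlessly" property. A failed hash check is the signal to fall back to a full resend.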
  29. 29. Continuous Synchronization <ul><li>Users may want to keep up to date at all times, when they have a connection </li></ul><ul><li>Trivial under the MOM approach </li></ul><ul><li>Less easy under the other approaches </li></ul><ul><ul><li>Typically necessary to add a publish / subscribe subsystem </li></ul></ul>
  30. 30. Low-Hanging Fruit: Reference Data <ul><li>Reference data does not change as part of the transaction flow of the application. </li></ul><ul><li>Install large reference data with the application – CDROMs are cheap </li></ul><ul><li>Update it with a one-way sync process </li></ul><ul><ul><li>Usually any simple sync process works fine for reference data </li></ul></ul><ul><ul><li>Disconnected approach to reference data can be very helpful for garden-variety applications also! </li></ul></ul>
  31. 31. Physical: Encoding / Packing / Transferring the Data <ul><li>Minimize Data Transfer Volume </li></ul><ul><ul><li>Send the minimum data needed </li></ul></ul><ul><ul><li>Consider data format verbosity </li></ul></ul><ul><ul><li>Use Compression </li></ul></ul><ul><li>Minimize round-trips </li></ul><ul><ul><li>“Boxcarring” </li></ul></ul>
  32. 32. Data Transfer Volume <ul><li>Choosing carefully what data to send (in the application domain) matters more than compression or encoding. </li></ul><ul><li>This often interacts with overall application design </li></ul><ul><li>Given the above, look for a sufficiently efficient encoding </li></ul><ul><ul><li>The more data, the more important the encoding. </li></ul></ul>
  33. 33. Compression <ul><li>Whatever else you do, also use compression </li></ul><ul><ul><li>Wireless networks are slow compared to compression/decompression </li></ul></ul><ul><ul><li>Even on small-CPU devices. </li></ul></ul>
  34. 34. XML <ul><li>Be wary of verbose formats like XML </li></ul><ul><li>But not too wary - measure. XML has been fine for us, much of the verbosity compresses out. </li></ul><ul><li>Some DBMSs can read and write XML, saving implementation effort </li></ul><ul><li>XML facilitates varying client/server platforms, declarative correctness checks, versioning, etc. </li></ul>
  35. 35. Example <ul><li>XML “diff” packets </li></ul>
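The XML packets shown on this slide are not in the transcript. As a rough illustration of the "measure it" advice, the sketch below builds a made-up XML diff packet with `xml.etree.ElementTree`, compresses it with `zlib`, and prints both sizes. The element and attribute names (`diff`, `upsert`, `delete`, `field`) are assumptions.

```python
import xml.etree.ElementTree as ET
import zlib


def build_packet(upserts, deletes):
    """Render a diff as XML bytes (hypothetical element names)."""
    root = ET.Element("diff")
    for entity_id, fields in upserts.items():
        node = ET.SubElement(root, "upsert", {"id": entity_id})
        for name, value in fields.items():
            ET.SubElement(node, "field", {"name": name}).text = str(value)
    for entity_id in deletes:
        ET.SubElement(root, "delete", {"id": entity_id})
    return ET.tostring(root, encoding="utf-8")


def read_packet(data):
    """Parse the XML bytes back into (upserts, deletes)."""
    root = ET.fromstring(data)
    upserts = {
        node.get("id"): {f.get("name"): f.text for f in node.findall("field")}
        for node in root.findall("upsert")
    }
    deletes = [node.get("id") for node in root.findall("delete")]
    return upserts, deletes


upserts = {
    f"order:{n}": {"status": "new", "customer": f"Customer {n}"} for n in range(200)
}
packet = build_packet(upserts, ["order:900"])
wire = zlib.compress(packet, 9)
print(len(packet), "bytes of XML ->", len(wire), "bytes compressed")
```

Repeated tag and attribute names are exactly the kind of redundancy a general-purpose compressor removes, which is why much of the verbosity "compresses out". Real data is less regular than this synthetic packet, so measure with a representative sample.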
  36. 36. SOAP <ul><li>Lots of in-the-box support in many languages and toolsets </li></ul><ul><li>XML – based; verbose; flexible </li></ul><ul><li>Many SOAP experts recommend coarse operations, which are appropriate for disconnected mode applications </li></ul><ul><li>Consider SOAP document mode (vs. RPC mode) </li></ul>
  37. 37. Making XML More Efficient <ul><li>There are XML-specific compression tools </li></ul><ul><ul><li>Which can be used with straight XML or SOAP </li></ul></ul><ul><li>There are “binary XML” encodings </li></ul><ul><ul><li>WML includes one: </li></ul></ul><ul><ul><ul><li>Format available at </li></ul></ul></ul><ul><ul><ul><li>Encode and decode with </li></ul></ul></ul><ul><ul><li>BOX – </li></ul></ul><ul><ul><li>There are others </li></ul></ul>
  38. 38. Highly Efficient Encodings <ul><li>If the previous approaches are insufficient, consider domain-specific encodings: </li></ul><ul><ul><li>tokenize your data with a dictionary of common data elements pre-loaded on the client, and pre-loaded in the compression engine </li></ul></ul><ul><ul><li>Hand-code a binary storage format </li></ul></ul><ul><ul><li>Java: Externalize rather than Serialize </li></ul></ul><ul><ul><li>Measure serialization mechanisms, sometimes they are surprisingly verbose </li></ul></ul>
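One way to pre-load a dictionary of common data elements into the compression engine is zlib's preset dictionary, available in Python through the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. The dictionary contents and message format below are invented for illustration. The dictionary must be shipped with the client so both ends use identical bytes.

```python
import zlib

# Hypothetical shared dictionary: fragments expected to recur in most messages.
SHARED_DICT = (
    b'<diff><upsert id="order:"><field name="status">new</field>'
    b'<field name="customer"></field></upsert><delete id="order:"/></diff>'
)


def pack(payload, zdict=SHARED_DICT):
    """Compress with the preset dictionary primed into the compressor."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(payload) + compressor.flush()


def unpack(data, zdict=SHARED_DICT):
    """Decompress; the receiver must supply the same dictionary bytes."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(data) + decompressor.flush()


message = b'<diff><upsert id="order:17"><field name="status">new</field></upsert></diff>'
plain = zlib.compress(message, 9)  # no shared dictionary
primed = pack(message)             # with shared dictionary
print(len(message), "raw,", len(plain), "plain zlib,", len(primed), "with dictionary")
```

The benefit is largest for small messages, where ordinary compression has too little history to work with. Changing the dictionary is a protocol change, so it needs the same versioning care as the message format.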
  39. 39. Boxcarring <ul><ul><li>Lots of little messages in one big message </li></ul></ul><ul><ul><li>Minimizes round-trips </li></ul></ul><ul><ul><li>Supported directly by a few middleware systems, such as XML-RPC </li></ul></ul><ul><ul><li>Can be implemented as a collection of Command objects </li></ul></ul>
  40. 40. Example <ul><li>Command pattern used to send many changes over a single server invocation </li></ul>
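The code from this slide is not in the transcript. A minimal sketch of boxcarring with command objects might look like this, assuming JSON as the wire format and two made-up commands (`set_status`, `add_note`). The client queues commands while working offline and ships them in a single request. The server replays them in order within one invocation.

```python
import json


def set_status(store, order_id, status):
    store[order_id]["status"] = status


def add_note(store, order_id, text):
    store[order_id].setdefault("notes", []).append(text)


# Server-side registry: command name -> function that applies it.
HANDLERS = {"set_status": set_status, "add_note": add_note}


class Boxcar:
    """Client side: collect commands offline, then ship them as one message."""

    def __init__(self):
        self._commands = []

    def add(self, name, **args):
        self._commands.append({"command": name, "args": args})

    def to_wire(self):
        return json.dumps(self._commands).encode("utf-8")


def handle_boxcar(store, wire):
    """Server side: one invocation replays every command in order."""
    results = []
    for item in json.loads(wire):
        HANDLERS[item["command"]](store, **item["args"])
        results.append("ok")
    return results


server_store = {"order:1": {"status": "new"}}

boxcar = Boxcar()
boxcar.add("set_status", order_id="order:1", status="done")
boxcar.add("add_note", order_id="order:1", text="Left at dock")

results = handle_boxcar(server_store, boxcar.to_wire())  # a single round-trip
```

On a link with a second of latency, two commands in one request cost one round-trip instead of two. A production version would also decide whether one failing command aborts the whole batch.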
  41. 41. Client-side Data Storage and Manipulation <ul><li>Approaches to data storage in a client application: </li></ul><ul><li>DBMS </li></ul><ul><li>Serialization </li></ul><ul><ul><li>XML, native, etc. </li></ul></ul><ul><li>Image/Log </li></ul><ul><ul><li>If using PC hardware, data probably fits in RAM </li></ul></ul><ul><ul><li>Smalltalk-ish approach </li></ul></ul><ul><ul><li>Prevayler </li></ul></ul>
  42. 42. DBMS on the Client <ul><li>A DBMS is a well-understood, off the shelf way to deal with business application data </li></ul><ul><li>Client schema can be a simplified version of the server schema, or totally different (!) </li></ul><ul><li>Facilitates RAD development on the client </li></ul><ul><li>I recommend the DBMS approach for complex client applications </li></ul>
  43. 43. Local Database Engines <ul><li>Java: HSQLDB, JDataStore, Mckoi SQL, others </li></ul><ul><li>Microsoft: MSDE </li></ul><ul><li>Other: In a Delphi app, we used DBISAM, which compiles completely in to the application </li></ul><ul><li>If you use a DBMS-provided replication mechanism, you’ll use their client DBMS. </li></ul>
  44. 44. Serialization <ul><li>The data size may be small enough to simply write it all out with a few lines of code </li></ul><ul><ul><li>If your toolset supports serialization </li></ul></ul><ul><ul><li>Benchmark and measure, of course </li></ul></ul><ul><ul><li>Big win in simplicity </li></ul></ul><ul><li>The even/odd technique to prevent data loss </li></ul>
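The slide does not spell out the even/odd technique. One common reading, assumed here, is to alternate saves between two files so that a crash in the middle of a write can only damage the newer copy. The file names, the sequence number, and the checksum in this sketch are illustrative.

```python
import hashlib
import json
import os
import tempfile


def _slot_path(directory, sequence):
    # Even-numbered saves go to slot 0, odd-numbered saves to slot 1.
    return os.path.join(directory, f"store.{sequence % 2}.json")


def save(directory, sequence, data):
    """Write save number `sequence` to the slot the previous save did not use."""
    body = json.dumps(data, sort_keys=True)
    record = {
        "sequence": sequence,
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        "body": body,
    }
    with open(_slot_path(directory, sequence), "w", encoding="utf-8") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())


def load(directory):
    """Return (sequence, data) from the newest slot that passes its checksum."""
    best = None
    for slot in (0, 1):
        try:
            with open(_slot_path(directory, slot), encoding="utf-8") as f:
                record = json.load(f)
            body = record["body"]
            if hashlib.sha256(body.encode("utf-8")).hexdigest() != record["sha256"]:
                continue
            sequence = record["sequence"]
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            continue  # missing, truncated, or corrupt slot
        if best is None or sequence > best[0]:
            best = (sequence, json.loads(body))
    return best


with tempfile.TemporaryDirectory() as workdir:
    save(workdir, 1, {"orders": ["order:1"]})
    save(workdir, 2, {"orders": ["order:1", "order:2"]})
    # Simulate a crash partway through save number 3, which reuses slot 1.
    with open(_slot_path(workdir, 3), "w", encoding="utf-8") as f:
        f.write('{"sequence": 3, "sha')
    recovered = load(workdir)
```

After the simulated crash, the loader falls back to save number 2, so the user loses at most the work since the last completed save.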
  45. 45. Tracking Changes on the Client <ul><li>Various approaches: </li></ul><ul><ul><li>Changed data “flag” fields </li></ul></ul><ul><ul><li>Change log (table, objects, file, etc.) </li></ul></ul><ul><ul><li>Diff current client data to stored starting point </li></ul></ul><ul><li>Choose whatever is convenient in your client toolset </li></ul>
  46. 46. Applying Changes From the Server <ul><li>Small application: </li></ul><ul><ul><li>Hand-coding to apply the few kinds of changes </li></ul></ul><ul><li>Big application </li></ul><ul><ul><li>Build a generic mechanism to apply change “packets” to the local data store </li></ul></ul><ul><li>Think of each change as a command to be replayed </li></ul>
  47. 47. Applying Changes: Example <ul><li>Example code for a simple application </li></ul><ul><li>Snippet from a complex application </li></ul>
  48. 48. The Other Direction – Client to Server <ul><li>End users do work on their local data store </li></ul><ul><li>When synchronizing, these changes need to be sent to the server </li></ul><ul><li>Approach 1: use a data-centric mechanism as described for Server → Client </li></ul><ul><li>Approach 2: send changes at the domain level (recommended) </li></ul>
  49. 49. Example <ul><li>Domain-level changes </li></ul>
  50. 50. Conflict Resolution <ul><li>Handle user operations on the client, as domain-level operations when sending to the server </li></ul><ul><li>Resolve conflict at the business level: </li></ul><ul><ul><li>A “Work Order” is dispatched to worker A </li></ul></ul><ul><ul><li>Worker A does the work </li></ul></ul><ul><ul><li>Meanwhile, the Work Order gets reassigned to worker B </li></ul></ul><ul><ul><li>Worker A syncs </li></ul></ul><ul><ul><li>Uh-oh! </li></ul></ul>
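The work-order scenario above can be sketched as a domain-level operation that the server evaluates against business rules, rather than as row-level updates. Everything here is hypothetical: the `CompleteWorkOrder` operation, the field names, and especially the rule that completed work is accepted and the dispatcher notified. The actual rule is a requirements question for the customer.

```python
from dataclasses import dataclass


@dataclass
class CompleteWorkOrder:
    """A domain-level operation recorded on the client while offline."""

    order_id: str
    worker: str
    notes: str


def apply_on_server(orders, op):
    """Apply the operation, resolving any conflict with a business rule."""
    order = orders[op.order_id]
    if order["status"] == "done":
        return "rejected: already completed"
    previous_assignee = order["assigned_to"]
    order["status"] = "done"
    order["completed_by"] = op.worker
    order["notes"] = op.notes
    if previous_assignee != op.worker:
        # Hypothetical rule: the work really happened, so accept it, credit the
        # worker who did it, and tell the people affected by the reassignment.
        order["assigned_to"] = op.worker
        return f"accepted: notify dispatcher and {previous_assignee}"
    return "accepted"


# The order was dispatched to worker A, then reassigned to B while A was offline.
orders = {"wo-42": {"assigned_to": "B", "status": "open"}}

first = apply_on_server(orders, CompleteWorkOrder("wo-42", "A", "Replaced valve"))
second = apply_on_server(orders, CompleteWorkOrder("wo-42", "B", "Nothing left to do"))
```

Because the server receives the intent ("A completed this order") instead of a changed row, it has enough context to choose between accepting, rejecting, or escalating.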
  51. 51. Pleasant Surprises <ul><li>Scalability </li></ul><ul><li>Usage Peaks </li></ul><ul><li>Bandwidth </li></ul><ul><li>Uptime </li></ul>
  52. 52. Scalability <ul><li>Mobile / Wireless apps tend to scale very well </li></ul><ul><li>At any moment, most of the users aren’t connected to the server </li></ul><ul><li>Many thousands of users per server is often OK </li></ul>
  53. 53. Locality of Reference <ul><li>When connected, a client tends to perform multiple operations at once on related data </li></ul><ul><ul><li>Hence excellent locality and caching characteristics </li></ul></ul>
  54. 54. Usage Peaks <ul><li>Even busy times of the day (8 AM) get spread out by time zones and human randomness. </li></ul><ul><li>Gracefully defer some clients a little while, if server load is high </li></ul><ul><ul><li>They can usually keep working, since they have local data </li></ul></ul>
  55. 55. Bandwidth <ul><li>Data center bandwidth is far cheaper than mobile bandwidth, of course </li></ul><ul><li>Even a T1 line can support a multitude of busy users who connect over a mobile wireless connection </li></ul><ul><ul><li>Real-world example: 2 T1s, 2000+ end users. </li></ul></ul><ul><li>Hence, data-center bandwidth usually isn’t much of an issue </li></ul>
  56. 56. Uptime <ul><li>“Five Nines” uptime tends to be less important with disconnected applications </li></ul><ul><li>Most end users would not notice a few minutes of downtime </li></ul><ul><ul><li>They have local data to work with </li></ul></ul><ul><ul><li>Enables much less costly few-minutes failover approaches, in lieu of online failover clustering </li></ul></ul>
  57. 57. Lessons From Our Experiences <ul><li>Have enough data on the client to let the user work for a long time without a connection </li></ul><ul><li>Minimize round-trips and data volume </li></ul><ul><li>Take control with code when you need to do something specific to get the performance required. </li></ul><ul><li>Conflict resolution and synchronization status end up as requirements / application design issues </li></ul><ul><ul><li>Take them to the customer </li></ul></ul>
  58. 58. THE END <ul><li>Kyle Cordes [email_address] (636) 219-9589 </li></ul><ul><li>Slides and snippets will be on my web site, </li></ul><ul><li>Slides are also on the conference CDROM </li></ul>