Data handling for Disconnected Apps

  • I generally skip this slide when giving a talk, it is here to provide background for those who aren’t there to hear it.
  • In an application for single-city deployment, you may be able to assure a high level of connectedness. This will not be the case in a nationwide deployment.
  • Also consider web-based installers, auto upgrade mechanisms, etc.
  • I’ll mention reference data again later; it is a simpler case with some easy optimizations.
  • Moreover, this talk is about the technical design of how it works, which you may be able to avoid caring about.
  • Show an example of such a query, point out why it can be expensive
  • Show an example of this, using JMS

    1. 1. Disconnected Data Handling in Mobile / Wireless Applications Kyle Cordes Oasis Digital Solutions Inc.
    2. 2. Agenda <ul><li>What do I mean by mobile / wireless? </li></ul><ul><li>Brief overview of the problem space </li></ul><ul><li>Solutions, approaches, tips </li></ul><ul><li>Examples along the way </li></ul><ul><li>We’ll cover many of these slides quite briefly, to focus on the interesting parts. </li></ul>
    3. 3. Yet Another About Me Slide <ul><li>Kyle Cordes: </li></ul><ul><ul><li>Developer, consultant, project manager, etc. </li></ul></ul><ul><ul><li>Uses Java, .NET, Delphi, lots of other languages, RDBMSs, etc. </li></ul></ul><ul><li>Oasis Digital has developed mobile / wireless applications… </li></ul><ul><ul><li>for client projects; we have no product / framework / etc. to sell. </li></ul></ul>
    4. 4. What is a Mobile / Wireless Application? <ul><li>User community that moves around </li></ul><ul><ul><li>World, Country, State, Metro, Campus </li></ul></ul><ul><li>Occasional (typically wireless) access to servers </li></ul><ul><li>This presentation is specifically concerned with data-centric “enterprise” applications </li></ul><ul><ul><li>There are other interesting application types, such as multiplayer games. </li></ul></ul>
    5. 5. Application Examples: <ul><li>Service Dispatching </li></ul><ul><ul><li>For example, a delivery driver </li></ul></ul><ul><li>Mobile Workforce CRM </li></ul><ul><ul><li>On-the-road sales force </li></ul></ul><ul><li>Shared Calendars / Scheduling </li></ul><ul><li>Email (!) </li></ul><ul><ul><li>Read and compose messages on an airplane </li></ul></ul>
    6. 6. The Problem Landscape <ul><li>Connectivity: Intermittent, Unreliable, Slow </li></ul><ul><li>Development Model </li></ul><ul><ul><li>Most tools focus on clients that can always talk to middle-tier or database servers, even with a disconnected recordset </li></ul></ul><ul><li>Concurrency </li></ul><ul><ul><li>“Highly Optimistic” locking </li></ul></ul>
    7. 7. Assumptions <ul><li>My assumptions are for the worst-case, nationwide deployment scenario </li></ul><ul><li>The situation is less severe if your application is limited to a geographic area where there are ample, high-quality service providers </li></ul><ul><li>The general ideas still apply </li></ul>
    8. 8. Sometimes You are Not Connected <ul><li>Wireless networks are nowhere near 100% coverage </li></ul><ul><ul><li>(excluding some campus networks) </li></ul></ul><ul><li>So an application must use a “Briefcase” Metaphor: </li></ul><ul><ul><li>Pack up some work to do, in your briefcase. </li></ul></ul><ul><ul><li>Do the work, using just what’s in there. </li></ul></ul><ul><ul><li>Occasionally get more work and drop off / send in what you have done. </li></ul></ul>
    9. 9. Interlude: Buying Wireless Access <ul><li>Availability and cost of wireless data depends on which vendors have built networks in an area </li></ul><ul><li>Varies widely by area </li></ul><ul><ul><li>Even with nationwide providers </li></ul></ul><ul><ul><li>Especially when you get away from the largest cities </li></ul></ul>
    10. 10. When You are Connected, It’s Not Very Good <ul><li>High Latency </li></ul><ul><ul><li>Hundreds of milliseconds is typical; 1,000 ms is not unusual </li></ul></ul><ul><li>Low Bandwidth </li></ul><ul><ul><li>There are promising developments, but… </li></ul></ul><ul><ul><li>There are also many areas where ~10 kilobits per second is typical </li></ul></ul><ul><ul><li>Users may want to defer large data items until a fast connection is available. </li></ul></ul>
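A rough worked estimate shows why round-trips and volume both matter at these speeds. The figures are illustrative assumptions, not measurements from the talk: a sync that needs $n = 40$ small request/response exchanges, a round-trip time of 0.5 s (inside the "hundreds of milliseconds" range above), and a 50 KB payload (1 KB = 1,000 bytes) over a ~10 kbit/s link.

```latex
\begin{align*}
T_{\text{chatty}}    &= n \cdot t_{\text{RTT}} = 40 \times 0.5\,\text{s} = 20\,\text{s} \\
T_{\text{boxcarred}} &= 1 \cdot t_{\text{RTT}} = 0.5\,\text{s} \\
T_{\text{transfer}}  &= \frac{50\,\text{KB} \times 8\,\text{bit/byte}}{10\,\text{kbit/s}}
                      = \frac{400\,\text{kbit}}{10\,\text{kbit/s}} = 40\,\text{s}
\end{align*}
```

Under these assumptions, collapsing 40 exchanges into one saves almost 20 seconds, and the payload itself costs 40 seconds. Cutting the data volume is therefore as important as cutting round-trips.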
    11. 11. … And The Data Changes Underneath You <ul><li>Someone else worked on the same entities you were working on </li></ul><ul><li>Something assigned to you got reassigned to someone else </li></ul><ul><li>In many problem domains, conflict avoidance is possible </li></ul><ul><li>In others, conflict resolution is necessary </li></ul>
    12. 12. A Rich / Smart Client <ul><li>These applications require: </li></ul><ul><ul><li>Some kind of local storage: data file(s), database engine, RAM </li></ul></ul><ul><ul><li>Local data validation, as much as possible </li></ul></ul><ul><ul><li>A non-HTML GUI </li></ul></ul><ul><ul><ul><li>Unless you use a client-side web server </li></ul></ul></ul><ul><li>Happily, it’s no longer painful to deploy and update rich clients, with Java Web Start (JWS), ClickOnce, etc. </li></ul>
    13. 13. Solution Elements <ul><li>Synchronization </li></ul><ul><ul><li>Reference Data vs. Transaction Data </li></ul></ul><ul><ul><li>Figuring out what data to send </li></ul></ul><ul><ul><li>Sending it </li></ul></ul><ul><li>Storing and manipulating data in the client application </li></ul>
    14. 14. Synchronization <ul><li>The client and server occasionally connect </li></ul><ul><ul><li>Perhaps a few times a week </li></ul></ul><ul><ul><li>Perhaps a few times a minute </li></ul></ul><ul><li>Send relevant changes back and forth </li></ul><ul><li>Reference data “syncs” one direction </li></ul><ul><li>Transaction data “syncs” both directions </li></ul>
    15. 15. Interlude: Dependency Reversal <ul><li>Usual approach: </li></ul><ul><ul><li>Client code depends on server APIs </li></ul></ul><ul><ul><li>Server doesn’t know much about client internals </li></ul></ul><ul><li>Consider: </li></ul><ul><ul><li>A well-encapsulated server module which intimately understands the client </li></ul></ul><ul><ul><li>Makes it possible to know more about what to send, and thus to send less: spend development effort to gain runtime efficiency </li></ul></ul>
    16. 16. DBMS Replication <ul><li>Some DBMSs provide replication </li></ul><ul><ul><li>Including small-footprint code for small devices </li></ul></ul><ul><ul><li>Consider SQL Server, SQL Anywhere, PointBase, others </li></ul></ul><ul><li>Big advantage: </li></ul><ul><ul><li>Can save a lot of time in development schedule </li></ul></ul>
    17. 17. Why not use DBMS Replication? <ul><li>Ties the application closely to that DBMS vendor </li></ul><ul><ul><li>Each has a different feature set / design </li></ul></ul><ul><ul><li>Not a problem if you are committed to one DBMS </li></ul></ul><ul><li>Can be quite expensive </li></ul><ul><ul><li>In one case, just the sync module for X,000 users would have cost almost as much as the whole project </li></ul></ul><ul><li>Limits the application to vendor’s conception of how synchronization should work </li></ul>
    18. 18. Rolling Your Own – A Layered Model <ul><li>Logical Layer </li></ul><ul><ul><li>Decide what data to send </li></ul></ul><ul><ul><li>Code, Queries, etc. </li></ul></ul><ul><li>Transport / Physical Layer </li></ul><ul><ul><li>Package / represent the data </li></ul></ul><ul><ul><li>XML, Binary, etc. </li></ul></ul>
    19. 19. Decide What Data to Send: <ul><li>Three approaches discussed here: </li></ul><ul><li>Date-Modified Fields </li></ul><ul><li>Message Queuing </li></ul><ul><li>Remember and Diff </li></ul>
    20. 20. Date-Modified Approach <ul><li>Each table / entity gets a date-modified field </li></ul><ul><ul><li>Kept up to date with code or trigger </li></ul></ul><ul><li>Don’t delete; rather, set/clear an Active flag </li></ul><ul><ul><li>There are other ways, of course </li></ul></ul><ul><li>Database schema has to accommodate this approach </li></ul><ul><li>Potentially complex and expensive queries to gather the data </li></ul>
    21. 21. Example <ul><li>Schema </li></ul><ul><li>Application Code </li></ul><ul><li>SQL to gather changed data </li></ul><ul><ul><li>Potentially expensive JOINs or subqueries </li></ul></ul>
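The date-modified approach from slides 20 and 21 might look roughly like this in SQLite. The schema, table names, and trigger names are hypothetical. A monotonically increasing counter (`sync_clock.tick`) stands in for a real timestamp so that the example is deterministic. The soft-delete `active` flag lets deletions travel through the same query.

```python
import sqlite3

# Hypothetical schema. sync_clock.tick replaces a wall-clock date so
# every change gets a strictly increasing stamp.
SCHEMA = """
CREATE TABLE sync_clock (tick INTEGER NOT NULL);
INSERT INTO sync_clock VALUES (0);

CREATE TABLE work_order (
    id            INTEGER PRIMARY KEY,
    assigned_to   TEXT NOT NULL,
    status        TEXT NOT NULL,
    active        INTEGER NOT NULL DEFAULT 1,  -- soft delete instead of DELETE
    date_modified INTEGER NOT NULL DEFAULT 0
);

-- Stamp every new row with the next tick.
CREATE TRIGGER work_order_ins AFTER INSERT ON work_order
BEGIN
    UPDATE sync_clock SET tick = tick + 1;
    UPDATE work_order SET date_modified = (SELECT tick FROM sync_clock)
     WHERE id = NEW.id;
END;

-- Re-stamp on any update that did not itself change date_modified.
CREATE TRIGGER work_order_upd AFTER UPDATE ON work_order
WHEN NEW.date_modified = OLD.date_modified
BEGIN
    UPDATE sync_clock SET tick = tick + 1;
    UPDATE work_order SET date_modified = (SELECT tick FROM sync_clock)
     WHERE id = NEW.id;
END;
"""

def changes_since(conn, worker, last_sync):
    """Rows for one worker changed after the client's last sync tick,
    including soft-deleted rows (active = 0) so the client can drop them."""
    return conn.execute(
        "SELECT id, status, active, date_modified FROM work_order "
        "WHERE assigned_to = ? AND date_modified > ? ORDER BY id",
        (worker, last_sync),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO work_order (id, assigned_to, status) VALUES (?, ?, ?)",
    [(1, "A", "open"), (2, "A", "open"), (3, "B", "open")],
)
last_sync = conn.execute("SELECT tick FROM sync_clock").fetchone()[0]
conn.execute("UPDATE work_order SET status = 'done' WHERE id = 1")
conn.execute("UPDATE work_order SET active = 0 WHERE id = 2")
print(changes_since(conn, "A", last_sync))  # → [(1, 'done', 1, 4), (2, 'open', 0, 5)]
```

The simple `assigned_to = ?` filter also shows where the cost comes from. If an order is reassigned from A to B, worker A must still learn that it is gone. Answering that usually means joining against an assignment history, which produces the "potentially expensive JOINs" mentioned on the slide.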
    22. 22. Message Queuing <ul><li>Easily added to existing systems without schema change </li></ul><ul><ul><li>Hence much easier to deal with legacy systems and EAI </li></ul></ul><ul><li>Need to generate change messages, in application code or triggers </li></ul><ul><li>Can use an off the shelf MOM </li></ul><ul><ul><li>Which comes with transports already </li></ul></ul>
    23. 23. Message Queuing Implementation <ul><li>Each client subscribes to the topics that matter to it </li></ul><ul><ul><li>“Durable” subscription, so they can pick up changes while disconnected </li></ul></ul><ul><li>Application publishes all changes </li></ul><ul><li>Can handle deletes as well as insert/updates </li></ul>
    24. 24. Message Queuing Downsides <ul><li>If something changes many times between client pickups, there could be lots of wasted download volume </li></ul><ul><li>This can be addressed with a “queue consolidator” interposed between end client and MOM system </li></ul>
    25. 25. Example <ul><li>Queue Setup </li></ul><ul><li>Application Code </li></ul>
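The speaker notes mention a JMS example for slides 23 to 25. The sketch below is not JMS or any real MOM API. It is a toy in-memory broker that models only the two ideas on those slides. Durable subscriptions hold messages while a client is offline. A queue consolidator then keeps only the latest change per entity before download. The topic names and message fields are hypothetical.

```python
from collections import OrderedDict, defaultdict

class Broker:
    """Toy stand-in for a message-oriented middleware broker (not a real API).
    Messages published while a client is offline wait in its durable queue."""

    def __init__(self):
        self.subscribers = defaultdict(set)   # topic -> client ids
        self.pending = defaultdict(list)      # client id -> queued messages

    def subscribe(self, client_id, topic):
        self.subscribers[topic].add(client_id)

    def publish(self, topic, message):
        for client_id in self.subscribers[topic]:
            self.pending[client_id].append(message)

    def drain(self, client_id):
        """Hand over everything queued for a reconnecting client."""
        messages, self.pending[client_id] = self.pending[client_id], []
        return messages

def consolidate(messages):
    """Queue consolidator: keep only the latest message per entity, so an
    entity changed many times while offline is downloaded once. A delete
    is kept as-is and supersedes earlier updates to the same entity."""
    latest = OrderedDict()
    for msg in messages:
        key = (msg["entity"], msg["id"])
        latest.pop(key, None)   # re-insert so order follows the last change
        latest[key] = msg
    return list(latest.values())

broker = Broker()
broker.subscribe("truck-7", "orders/region-east")
for status in ("open", "en-route", "done"):
    broker.publish("orders/region-east",
                   {"entity": "work_order", "id": 42, "op": "update", "status": status})
broker.publish("orders/region-east", {"entity": "work_order", "id": 43, "op": "delete"})

batch = consolidate(broker.drain("truck-7"))
print(batch)  # two messages instead of four
```

This sketch puts the consolidator on the client's download path. A real deployment might instead run it as a separate service between the MOM system and the clients, as slide 24 suggests.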
    26. 26. “Diff” Approach <ul><li>Most efficient, in terms of data sent </li></ul><ul><li>Always sends exactly what needs to be sent </li></ul><ul><li>Can handle deletes seamlessly </li></ul><ul><li>Easily scales to highly complex data </li></ul><ul><ul><li>Though it can get CPU-intensive; </li></ul></ul><ul><ul><li>CPU cycles are much cheaper than wireless bandwidth </li></ul></ul>
    27. 27. Diff Approach - How It Works <ul><li>The server knows the current state of the client data store </li></ul><ul><li>The server calculates the intended state of the client data store </li></ul><ul><li>Calculates the “diff” between them and sends it (a generic, well-studied problem) </li></ul><ul><li>Client applies the diff to its local data store </li></ul><ul><li>Uses before/after hash/CRC to make sure it really worked (Safe!) </li></ul>
    28. 28. Example <ul><li>Application Code </li></ul><ul><li>Example data packet </li></ul>
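Slide 27 describes a diff with before/after hashes. A minimal sketch follows, assuming the client store can be modeled as a dict of key to row and that SHA-256 over a canonical JSON form is an acceptable "hash/CRC". The packet layout (`before`, `after`, `upsert`, `delete`) is hypothetical. Upserts are sent as `[key, row]` pairs rather than a dict, so integer keys survive a JSON round-trip.

```python
import hashlib
import json

def state_hash(store):
    """Digest of a client data store (dict of key -> row) in canonical form."""
    canonical = json.dumps(sorted(store.items()), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def compute_diff(known, intended):
    """Server side: diff what the client has against what it should have."""
    return {
        "before": state_hash(known),
        "after": state_hash(intended),
        "upsert": [[k, v] for k, v in sorted(intended.items()) if known.get(k) != v],
        "delete": sorted(k for k in known if k not in intended),
    }

def apply_diff(store, diff):
    """Client side: apply the diff, verifying state before and after."""
    if state_hash(store) != diff["before"]:
        raise ValueError("client store is not in the state the server assumed")
    updated = dict(store)
    for key, row in diff["upsert"]:
        updated[key] = row
    for key in diff["delete"]:
        updated.pop(key, None)
    if state_hash(updated) != diff["after"]:
        raise ValueError("diff did not produce the intended state")
    return updated

known = {1: {"status": "open"}, 2: {"status": "open"}, 3: {"status": "open"}}
intended = {1: {"status": "done"}, 3: {"status": "open"}, 4: {"status": "new"}}
diff = compute_diff(known, intended)
client = apply_diff(known, json.loads(json.dumps(diff)))  # simulate the wire
```

If either hash check fails, one plausible recovery is to fall back to a full resend of the client's data. That choice is left to the application.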
    29. 29. Continuous Synchronization <ul><li>Users may want to keep up to date at all times, when they have a connection </li></ul><ul><li>Trivial under the MOM approach </li></ul><ul><li>Less easy under the other approaches </li></ul><ul><ul><li>Typically necessary to add a publish / subscribe subsystem </li></ul></ul>
    30. 30. Low-Hanging Fruit: Reference Data <ul><li>Reference data does not change as part of the transaction flow of the application. </li></ul><ul><li>Install large reference data with the application – CDROMs are cheap </li></ul><ul><li>Update it with a one-way sync process </li></ul><ul><ul><li>Usually any simple sync process works fine for reference data </li></ul></ul><ul><ul><li>Disconnected approach to reference data can be very helpful for garden-variety applications also! </li></ul></ul>
    31. 31. Physical: Encoding / Packing / Transferring the Data <ul><li>Minimize Data Transfer Volume </li></ul><ul><ul><li>Send the minimum data needed </li></ul></ul><ul><ul><li>Consider data format verbosity </li></ul></ul><ul><ul><li>Use Compression </li></ul></ul><ul><li>Minimize round-trips </li></ul><ul><ul><li>“Boxcarring” </li></ul></ul>
    32. 32. Data Transfer Volume <ul><li>Choosing carefully what data to send (in the application domain) matters more than compression or encoding. </li></ul><ul><li>This often interacts with overall application design </li></ul><ul><li>Given the above, look for a sufficiently efficient encoding </li></ul><ul><ul><li>The more data, the more important the encoding. </li></ul></ul>
    33. 33. Compression <ul><li>Whatever else you do, also use compression </li></ul><ul><ul><li>Wireless networks are slow compared to compression/decompression </li></ul></ul><ul><ul><li>Even on small-CPU devices. </li></ul></ul>
    34. 34. XML <ul><li>Be wary of verbose formats like XML </li></ul><ul><li>But not too wary - measure. XML has been fine for us, much of the verbosity compresses out. </li></ul><ul><li>Some DBMSs can read and write XML, saving implementation effort </li></ul><ul><li>XML facilitates varying client/server platforms, declarative correctness checks, versioning, etc. </li></ul>
    35. 35. Example <ul><li>XML “diff” packets </li></ul>
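Slides 33 to 35 recommend compression and advise measuring XML verbosity rather than assuming it. Here is a quick way to take that measurement with the Python standard library. The packet element and attribute names are hypothetical, not a format from the talk.

```python
import xml.etree.ElementTree as ET
import zlib

def build_packet(upserts, deletes):
    """Build a hypothetical XML "diff" packet as UTF-8 bytes."""
    root = ET.Element("diff")
    for row in upserts:
        ET.SubElement(root, "upsert", table="work_order", id=str(row["id"]),
                      status=row["status"], assigned_to=row["assigned_to"])
    for row_id in deletes:
        ET.SubElement(root, "delete", table="work_order", id=str(row_id))
    return ET.tostring(root, encoding="utf-8")

rows = [{"id": i, "status": "open", "assigned_to": "driver-17"} for i in range(200)]
packet = build_packet(rows, deletes=[900, 901])
compressed = zlib.compress(packet, 9)
print(len(packet), "bytes of XML ->", len(compressed), "bytes compressed")
```

The repeated tag and attribute names are exactly the verbosity that "compresses out". Real data with more varied values will compress less, which is why measuring your own packets matters.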
    36. 36. SOAP <ul><li>Lots of in-the-box support in many languages and toolsets </li></ul><ul><li>XML-based; verbose; flexible </li></ul><ul><li>Many SOAP experts recommend coarse operations, which are appropriate for disconnected mode applications </li></ul><ul><li>Consider SOAP document mode (vs. RPC mode) </li></ul>
    37. 37. Making XML More Efficient <ul><li>There are XML-specific compression tools </li></ul><ul><ul><li>Which can be used with straight XML or SOAP </li></ul></ul><ul><li>There are “binary XML” encodings </li></ul><ul><ul><li>WML includes one: </li></ul></ul><ul><ul><ul><li>Format available at http://www.wapforum.org/ </li></ul></ul></ul><ul><ul><ul><li>Encode and decode with http://wbxml4j.sourceforge.net/ </li></ul></ul></ul><ul><ul><li>BOX – http://box.sf.net </li></ul></ul><ul><ul><li>There are others </li></ul></ul>
    38. 38. Highly Efficient Encodings <ul><li>If the previous approaches are insufficient, consider domain-specific encodings: </li></ul><ul><ul><li>Tokenize your data with a dictionary of common data elements pre-loaded on the client, and pre-loaded in the compression engine </li></ul></ul><ul><ul><li>Hand-code a binary storage format </li></ul></ul><ul><ul><li>Java: Externalize rather than Serialize </li></ul></ul><ul><ul><li>Measure serialization mechanisms; sometimes they are surprisingly verbose </li></ul></ul>
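One way to "pre-load the compression engine" from slide 38 is a zlib preset dictionary, exposed in Python through the `zdict` parameter. This helps most with small messages, where ordinary compression has too little data to find repeats. The dictionary contents below are a hypothetical sample of common data elements. Client and server must hold byte-for-byte identical copies, for example by shipping the dictionary with the installed client.

```python
import zlib

# Hypothetical shared dictionary of frequently occurring data elements.
SHARED_DICT = (
    b'<diff><upsert table="work_order" id="" status="open" '
    b'status="en-route" status="done" assigned_to="driver-" /></diff>'
)

def pack(message: bytes) -> bytes:
    """Compress with the shared dictionary pre-loaded."""
    comp = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return comp.compress(message) + comp.flush()

def unpack(data: bytes) -> bytes:
    """Decompress; the receiver must supply the same dictionary."""
    decomp = zlib.decompressobj(zdict=SHARED_DICT)
    return decomp.decompress(data) + decomp.flush()

msg = (b'<diff><upsert table="work_order" id="42" status="done" '
       b'assigned_to="driver-17" /></diff>')
plain = zlib.compress(msg, 9)
primed = pack(msg)
print(len(msg), len(plain), len(primed))
```

Changing the dictionary breaks every peer that still holds the old one. A deployment would probably version it alongside the client application.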
    39. 39. Boxcarring <ul><ul><li>Lots of little messages in one big message </li></ul></ul><ul><ul><li>Minimizes round-trips </li></ul></ul><ul><ul><li>Supported directly by a few middleware systems, such as XML-RPC </li></ul></ul><ul><ul><li>Can be implemented as a collection of Command objects </li></ul></ul>
    40. 40. Example <ul><li>Command pattern used to send many changes over a single server invocation </li></ul>
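Slide 40 describes sending many changes over a single server invocation with the Command pattern. A small sketch of that idea follows. The command classes, the `kind` field, and JSON as the wire format are all illustrative assumptions.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical command types; each carries a "kind" tag for dispatch.
@dataclass
class UpdateStatus:
    order_id: int
    status: str
    kind: str = "update_status"

@dataclass
class AddNote:
    order_id: int
    text: str
    kind: str = "add_note"

def boxcar(commands):
    """Client side: pack many small commands into one request body."""
    return json.dumps([asdict(c) for c in commands])

def handle_batch(body, orders):
    """Server side: one invocation replays every command in order."""
    results = []
    for cmd in json.loads(body):
        order = orders[cmd["order_id"]]
        if cmd["kind"] == "update_status":
            order["status"] = cmd["status"]
        elif cmd["kind"] == "add_note":
            order.setdefault("notes", []).append(cmd["text"])
        else:
            raise ValueError(f"unknown command {cmd['kind']!r}")
        results.append("ok")
    return results

orders = {7: {"status": "open"}, 8: {"status": "open"}}
body = boxcar([UpdateStatus(7, "done"), AddNote(7, "Left at back door"),
               UpdateStatus(8, "en-route")])
print(handle_batch(body, orders))  # one round-trip instead of three
```

Returning one result per command lets the client match outcomes to the edits it queued, even though all of them travelled in a single message.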
    41. 41. Client-side Data Storage and Manipulation <ul><li>Approaches to data storage in a client application: </li></ul><ul><li>DBMS </li></ul><ul><li>Serialization </li></ul><ul><ul><li>XML, native, etc. </li></ul></ul><ul><li>Image/Log </li></ul><ul><ul><li>If using PC hardware, data probably fits in RAM </li></ul></ul><ul><ul><li>Smalltalk-ish approach </li></ul></ul><ul><ul><li>Prevayler </li></ul></ul>
    42. 42. DBMS on the Client <ul><li>A DBMS is a well-understood, off the shelf way to deal with business application data </li></ul><ul><li>Client schema can be a simplified version of the server schema, or totally different (!) </li></ul><ul><li>Facilitates RAD development on the client </li></ul><ul><li>I recommend the DBMS approach for complex client applications </li></ul>
    43. 43. Local Database Engines <ul><li>Java: HSQLDB, JDataStore, Mckoi SQL, others </li></ul><ul><li>Microsoft: MSDE </li></ul><ul><li>Other: In a Delphi app, we used DBISAM, which compiles completely into the application </li></ul><ul><li>If you use a DBMS-provided replication mechanism, you’ll use their client DBMS. </li></ul>
    44. 44. Serialization <ul><li>The data size may be small enough to simply write it all out with a few lines of code </li></ul><ul><ul><li>If your toolset supports serialization </li></ul></ul><ul><ul><li>Benchmark and measure, of course </li></ul></ul><ul><ul><li>Big win in simplicity </li></ul></ul><ul><li>The even/odd technique to prevent data loss </li></ul>
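Slide 44 names "the even/odd technique" without detail. The sketch below shows one plausible reading, which is an assumption: generation N is written to slot N % 2, so a crash during a write can damage only the newer copy. On load, the newest slot whose checksum verifies wins. The file names and the record layout are hypothetical.

```python
import hashlib
import json
import os
import tempfile

def _path(directory, slot):
    return os.path.join(directory, f"store.{slot}.json")  # hypothetical names

def save(directory, data, generation):
    """Write generation N to slot N % 2, leaving the other slot untouched."""
    payload = json.dumps({"generation": generation, "data": data}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    with open(_path(directory, generation % 2), "w", encoding="utf-8") as f:
        f.write(digest + "\n" + payload)
        f.flush()
        os.fsync(f.fileno())

def load(directory):
    """Return (generation, data) from the newest slot whose checksum verifies."""
    best = None
    for slot in (0, 1):
        try:
            with open(_path(directory, slot), encoding="utf-8") as f:
                digest, payload = f.read().split("\n", 1)
        except (OSError, ValueError):
            continue  # missing or truncated file
        if hashlib.sha256(payload.encode()).hexdigest() != digest:
            continue  # torn write: ignore this slot
        record = json.loads(payload)
        if best is None or record["generation"] > best[0]:
            best = (record["generation"], record["data"])
    return best

d = tempfile.mkdtemp()
save(d, {"orders": [1, 2]}, generation=1)
save(d, {"orders": [1, 2, 3]}, generation=2)
with open(_path(d, 0), "w", encoding="utf-8") as f:  # simulate a crash mid-write
    f.write("garbage")
print(load(d))  # → (1, {'orders': [1, 2]})
```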
    45. 45. Tracking Changes on the Client <ul><li>Various approaches: </li></ul><ul><ul><li>Changed data “flag” fields </li></ul></ul><ul><ul><li>Change log (table, objects, file, etc.) </li></ul></ul><ul><ul><li>Diff current client data to stored starting point </li></ul></ul><ul><li>Choose whatever is convenient in your client toolset </li></ul>
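Of the approaches on slide 45, the change log is perhaps the simplest to sketch. The `ClientStore` class and its entry format are hypothetical. One detail is worth showing: only the entries the server has acknowledged are trimmed, so edits made while a sync is in flight are not lost.

```python
from dataclasses import dataclass, field

@dataclass
class ClientStore:
    """Hypothetical local store that records every edit in a change log."""
    rows: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def set_field(self, row_id, name, value):
        self.rows.setdefault(row_id, {})[name] = value
        self.change_log.append({"op": "set", "id": row_id, "field": name, "value": value})

    def delete(self, row_id):
        self.rows.pop(row_id, None)
        self.change_log.append({"op": "delete", "id": row_id})

    def pending_changes(self):
        """Copy of the log to send at the next sync."""
        return list(self.change_log)

    def acknowledge(self, count):
        """Drop the first `count` entries once the server confirms receipt."""
        del self.change_log[:count]

store = ClientStore()
store.set_field(7, "status", "done")
store.set_field(7, "mileage", 112)
sent = store.pending_changes()
store.delete(9)                # edit made while the sync is in flight
store.acknowledge(len(sent))
print(store.change_log)        # → [{'op': 'delete', 'id': 9}]
```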
    46. 46. Applying Changes From the Server <ul><li>Small application: </li></ul><ul><ul><li>Hand-coding to apply the few kinds of changes </li></ul></ul><ul><li>Big application </li></ul><ul><ul><li>Build a generic mechanism to apply change “packets” to the local data store </li></ul></ul><ul><li>Think of each change as a command to be replayed </li></ul>
    47. 47. Applying Changes: Example <ul><li>Example code for a simple application </li></ul><ul><li>Snippet from a complex application </li></ul>
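For the "big application" case on slide 46, here is a generic change-packet applier over a local SQLite store. The packet format and the `ALLOWED` whitelist are assumptions. Table and column names arrive in the packet, so they are checked against the whitelist before being placed in SQL. The whole packet runs in one transaction, so a bad packet leaves the store untouched. This sketch also assumes upserts carry full rows, because `INSERT OR REPLACE` resets any column that is omitted.

```python
import sqlite3

# Hypothetical whitelist: table -> (key column, allowed columns).
ALLOWED = {"work_order": ("id", {"id", "status", "assigned_to"})}

def apply_packet(conn, packet):
    """Replay each change as a command, all inside one transaction."""
    with conn:  # commits on success, rolls back on any exception
        for change in packet:
            key_col, allowed = ALLOWED[change["table"]]  # KeyError if unknown
            if change["op"] == "upsert":
                cols = list(change["values"])
                if not set(cols) <= allowed or key_col not in cols:
                    raise ValueError(f"unexpected columns {cols}")
                conn.execute(
                    f"INSERT OR REPLACE INTO {change['table']} ({', '.join(cols)}) "
                    f"VALUES ({', '.join('?' for _ in cols)})",
                    [change["values"][c] for c in cols],
                )
            elif change["op"] == "delete":
                conn.execute(f"DELETE FROM {change['table']} WHERE {key_col} = ?",
                             (change["key"],))
            else:
                raise ValueError(f"unknown op {change['op']!r}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_order (id INTEGER PRIMARY KEY, status TEXT, assigned_to TEXT)")
conn.execute("INSERT INTO work_order VALUES (1, 'open', 'A'), (2, 'open', 'A')")
conn.commit()
apply_packet(conn, [
    {"op": "upsert", "table": "work_order", "values": {"id": 1, "status": "done", "assigned_to": "A"}},
    {"op": "upsert", "table": "work_order", "values": {"id": 3, "status": "open", "assigned_to": "A"}},
    {"op": "delete", "table": "work_order", "key": 2},
])
print(conn.execute("SELECT * FROM work_order ORDER BY id").fetchall())
# → [(1, 'done', 'A'), (3, 'open', 'A')]
```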
    48. 48. The Other Direction – Client to Server <ul><li>End users do work on their local data store </li></ul><ul><li>When synchronizing, these changes need to be sent to the server </li></ul><ul><li>Approach 1: use a data-centric mechanism as described for Server → Client </li></ul><ul><li>Approach 2: send changes at the domain level (recommended) </li></ul>
    49. 49. Example <ul><li>Domain-level changes </li></ul>
    50. 50. Conflict Resolution <ul><li>Handle user operations on the client as domain-level operations when sending to the server </li></ul><ul><li>Resolve conflict at the business level: </li></ul><ul><ul><li>A “Work Order” is dispatched to worker A </li></ul></ul><ul><ul><li>Worker A does the work </li></ul></ul><ul><ul><li>Meanwhile, the Work Order gets reassigned to worker B </li></ul></ul><ul><ul><li>Worker A syncs </li></ul></ul><ul><ul><li>Uh-oh! </li></ul></ul>
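Slides 48 to 50 recommend sending domain-level operations and resolving conflicts by business rules. Here is a sketch of the work-order scenario. The client sends "worker X completed order N" rather than a raw row update, so the server can tell what actually happened. The resolution policy in this sketch is an assumption, and other policies are equally plausible: accept the work, credit the worker who did it, flag the reassignment for a dispatcher, and reject a second completion.

```python
from dataclasses import dataclass

@dataclass
class WorkOrder:
    id: int
    assigned_to: str
    status: str = "open"
    completed_by: str = ""

def complete_work_order(orders, order_id, worker):
    """Server-side handler for a hypothetical domain-level operation."""
    order = orders[order_id]
    if order.status == "done":
        # Someone already finished it; the business has to sort this out.
        return {"accepted": False, "reason": "already completed"}
    # Assumed policy: the work was physically done, so accept it and
    # flag it for review if the order was reassigned in the meantime.
    needs_review = order.assigned_to != worker
    order.status = "done"
    order.completed_by = worker
    return {"accepted": True, "needs_review": needs_review}

orders = {42: WorkOrder(42, assigned_to="A")}
orders[42].assigned_to = "B"                  # reassigned while A was offline
print(complete_work_order(orders, 42, "A"))   # A syncs: accepted, flagged
print(complete_work_order(orders, 42, "B"))   # B later syncs: rejected
```

Deciding which outcome is right is a requirements question, not a technical one, which matches the advice on slide 57 to take it to the customer.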
    51. 51. Pleasant Surprises <ul><li>Scalability </li></ul><ul><li>Usage Peaks </li></ul><ul><li>Bandwidth </li></ul><ul><li>Uptime </li></ul>
    52. 52. Scalability <ul><li>Mobile / Wireless apps tend to scale very well </li></ul><ul><li>At any moment, most of the users aren’t connected to the server </li></ul><ul><li>Many thousands of users per server is often OK </li></ul>
    53. 53. Locality of Reference <ul><li>When connected, a client tends to perform multiple operations at once on related data </li></ul><ul><ul><li>Hence excellent locality and caching characteristics </li></ul></ul>
    54. 54. Usage Peaks <ul><li>Even busy times of the day (8 AM) get spread out by time zones and human randomness. </li></ul><ul><li>Gracefully defer some clients a little while, if server load is high </li></ul><ul><ul><li>They can usually keep working, since they have local data </li></ul></ul>
    55. 55. Bandwidth <ul><li>Data center bandwidth is far cheaper than mobile bandwidth, of course </li></ul><ul><li>Even a T1 line can support a multitude of busy users who connect over a mobile wireless connection </li></ul><ul><ul><li>Real-world example: 2 T1s, 2000+ end users. </li></ul></ul><ul><li>Hence, data-center bandwidth usually isn’t much of an issue </li></ul>
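The real-world figure on slide 55 can be sanity-checked with simple arithmetic. A T1 carries 1.544 Mbit/s. The per-client rate below is an assumption taken from the ~10 kbit/s wireless figure on slide 10, with every active client transferring at full speed.

```latex
\begin{align*}
B_{\text{data center}} &= 2 \times 1544\,\text{kbit/s} = 3088\,\text{kbit/s} \\
N_{\text{concurrent}}  &= \frac{3088\,\text{kbit/s}}{10\,\text{kbit/s per client}} \approx 308 \\
\frac{N_{\text{concurrent}}}{N_{\text{users}}} &\approx \frac{308}{2000} \approx 15\%
\end{align*}
```

Under these assumptions, the two T1s saturate only if roughly 15% of the 2,000 users transfer data at the same moment. Per the scalability and usage-peak slides, most users are disconnected at any given moment. Faster wireless links would raise the per-client rate and lower this margin.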
    56. 56. Uptime <ul><li>“Five Nines” uptime tends to be less important with disconnected applications </li></ul><ul><li>Most end users would not notice a few minutes of downtime </li></ul><ul><ul><li>They have local data to work with </li></ul></ul><ul><ul><li>This permits much less costly failover approaches that take a few minutes, in lieu of online failover clustering </li></ul></ul>
    57. 57. Lessons From Our Experiences <ul><li>Have enough data on the client to let the user work for a long time without a connection </li></ul><ul><li>Minimize round-trips and data volume </li></ul><ul><li>Where getting the needed performance requires something specific, take control and write the code yourself. </li></ul><ul><li>Conflict resolution and synchronization status end up as requirements / application design issues </li></ul><ul><ul><li>Take them to the customer </li></ul></ul>
    58. 58. THE END <ul><li>Kyle Cordes [email_address] (636) 219-9589 </li></ul><ul><li>Slides and snippets will be on my web site, http://kylecordes.com </li></ul><ul><li>Slides are also on the conference CDROM </li></ul>
