Data handling for Disconnected Apps

  • I generally skip this slide when giving the talk; it is here to provide background for those who aren’t there to hear it.
  • In an application for single-city deployment, you may be able to assure a high level of connectedness. This will not be the case in a nationwide deployment.
  • Also consider web-based installers, auto upgrade mechanisms, etc.
  • I’ll mention reference data again later; it is a simpler case with some easy optimizations.
  • Moreover, this talk is about the technical design of how it works, which you may be able to avoid caring about.
  • Show an example of such a query, point out why it can be expensive
  • Show an example of this, using JMS

Presentation Transcript

  • Disconnected Data Handling in Mobile / Wireless Applications Kyle Cordes Oasis Digital Solutions Inc.
  • Agenda
    • What do I mean by mobile / wireless?
    • Brief overview of the problem space
    • Solutions, approaches, tips
    • Examples along the way
    • We’ll cover many of these slides quite briefly, to focus on the interesting parts.
  • Yet Another About Me Slide
    • Kyle Cordes:
      • Developer, consultant, project manager, etc.
      • Uses Java, .NET, Delphi, lots of other languages, RDBMSs, etc.
    • Oasis Digital has developed mobile / wireless applications…
      • for client projects; we have no product / framework / etc. to sell.
  • What is a Mobile / Wireless Application?
    • User community that moves around
      • World, Country, State, Metro, Campus
    • Occasional (typically wireless) access to servers
    • This presentation is specifically concerned with data-centric “enterprise” applications
      • There are other interesting application types, such as multiplayer games.
  • Application Examples:
    • Service Dispatching
      • For example a delivery driver
    • Mobile Workforce CRM
      • On-the-road sales force
    • Shared Calendars / Scheduling
    • Email (!)
      • Read and compose messages on an airplane
  • The Problem Landscape
    • Connectivity: Intermittent, Unreliable, Slow
    • Development Model
      • Most tools focus on clients that can always talk to middle-tier or database servers, even when they offer disconnected recordsets
    • Concurrency
      • “Highly Optimistic” locking
  • Assumptions
    • My assumptions are for the worst-case, nationwide deployment scenario
    • The situation is less severe if your application is limited to a geographic area where there are ample, high-quality service providers
    • The general ideas still apply
  • Sometimes You are Not Connected
    • Wireless networks are nowhere near 100% coverage
      • (excluding some campus networks)
    • So an application must use a “Briefcase” Metaphor:
      • Pack up some work to do, in your briefcase.
      • Do the work, using just what’s in there.
      • Occasionally get more work and drop off / send in what you have done.
  • Interlude: Buying Wireless Access
    • Availability and cost of wireless data depends on which vendors have built networks in an area
    • Varies widely by area
      • Even with nationwide providers
      • Especially when you get away from the largest cities
  • When You are Connected, It’s Not Very Good
    • High Latency
      • Hundreds of milliseconds is typical, 1000 ms is not unusual
    • Low Bandwidth
      • There are promising developments, but…
      • There are also many areas where ~10 kilobits per second is typical
      • Users may want to hold off on large data items until a fast connection is available.
  • … And The Data Changes Underneath You
    • Someone else worked on the same entities you were working on
    • Something assigned to you got reassigned to someone else
    • In many problem domains, conflict avoidance is possible
    • In others, conflict resolution is necessary
  • A Rich / Smart Client
    • These applications require:
      • Some kind of local storage: data file(s), database engine, RAM
      • Local data validation, as much as possible
      • A non-HTML GUI
        • Unless you use a client-side web server
    • Happily, it’s no longer painful to deploy and update rich clients, with JWS, ClickOnce, etc.
  • Solution Elements
    • Synchronization
      • Reference Data vs. Transaction Data
      • Figuring out what data to send
      • Sending it
    • Storing and manipulating data in the client application
  • Synchronization
    • The client and server occasionally connect
      • Perhaps a few times a week
      • Perhaps a few times a minute
    • Send relevant changes back and forth
    • Reference data “syncs” one direction
    • Transaction data “syncs” both directions
  • Interlude: Dependency Reversal
    • Usual approach:
      • Client code depends on server APIs
      • Server doesn’t know much about client internals
    • Consider:
      • A well-encapsulated server module which intimately understands the client
      • Makes it possible to know more about what to send, and thus to send less: trade extra development effort for runtime efficiency
  • DBMS Replication
    • Some DBMSs provide replication
      • Including small-footprint code for small devices
      • Consider SQL Server, SQL Anywhere, PointBase, others
    • Big advantage:
      • Can save a lot of time in development schedule
  • Why not use DBMS Replication?
    • Ties the application closely to that DBMS vendor
      • Each has a different feature set / design
      • Not a problem if you are committed to one DBMS
    • Can be quite expensive
      • In one case, just the sync module for X,000 users would have cost almost as much as the whole project
    • Limits the application to vendor’s conception of how synchronization should work
  • Rolling Your Own – A Layered Model
    • Logical Layer
      • Decide what data to send
      • Code, Queries, etc.
    • Transport / Physical Layer
      • Package / represent the data
      • XML, Binary, etc.
  • Decide What Data to Send:
    • Three approaches discussed here:
    • Date-Modified Fields
    • Message Queuing
    • Remember and Diff
  • Date-Modified Approach
    • Each table / entity gets a date-modified field
      • Kept up to date with code or trigger
    • Don’t delete; instead set/clear an Active flag
      • There are other ways, of course
    • Database schema has to accommodate this approach
    • Potentially complex and expensive queries to gather the data
  • Example
    • Schema
    • Application Code
    • SQL to gather changed data
      • Potentially expensive JOINs or subqueries
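The original example is not in the transcript. As a stand-in, here is a minimal sketch of the date-modified approach using Python's built-in sqlite3 module. The schema, the table names (work_order, work_order_part), and the sample rows are all assumptions made for illustration. It shows why gathering can get expensive: every child table needs a JOIN back to its parent to find out whose data it is.

```python
import sqlite3

# Hypothetical schema: every table carries date_modified and an active flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_order (
        id            INTEGER PRIMARY KEY,
        assigned_to   TEXT NOT NULL,
        description   TEXT NOT NULL,
        active        INTEGER NOT NULL DEFAULT 1,  -- soft delete instead of DELETE
        date_modified TEXT NOT NULL                -- ISO-8601 timestamp
    );
    CREATE TABLE work_order_part (
        id            INTEGER PRIMARY KEY,
        work_order_id INTEGER NOT NULL REFERENCES work_order(id),
        part_name     TEXT NOT NULL,
        active        INTEGER NOT NULL DEFAULT 1,
        date_modified TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO work_order VALUES (?, ?, ?, ?, ?)", [
    (1, "alice", "Replace filter", 1, "2004-03-01T09:00:00"),
    (2, "alice", "Inspect pump", 0, "2004-03-03T10:30:00"),  # soft-deleted
    (3, "bob", "Install valve", 1, "2004-03-03T11:00:00"),
])
conn.executemany("INSERT INTO work_order_part VALUES (?, ?, ?, ?, ?)", [
    (10, 1, "Filter cartridge", 1, "2004-03-04T08:00:00"),
    (11, 3, "Valve body", 1, "2004-03-04T08:05:00"),
])

def changed_orders_for(user, last_sync):
    """Parent rows for this user that changed since the last sync."""
    return conn.execute("""
        SELECT id, description, active FROM work_order
        WHERE assigned_to = ? AND date_modified > ?
        ORDER BY id
    """, (user, last_sync)).fetchall()

def changed_parts_for(user, last_sync):
    """Child rows changed since the last sync; needs a JOIN to find the owner."""
    return conn.execute("""
        SELECT p.id, p.work_order_id, p.part_name, p.active
        FROM work_order_part AS p
        JOIN work_order AS w ON w.id = p.work_order_id
        WHERE w.assigned_to = ? AND p.date_modified > ?
        ORDER BY p.id
    """, (user, last_sync)).fetchall()

orders = changed_orders_for("alice", "2004-03-02T00:00:00")
parts = changed_parts_for("alice", "2004-03-02T00:00:00")
```

The soft-deleted order still shows up in the result with active = 0, which is how the client learns to remove it. With deeper parent/child chains, each level adds another JOIN or subquery.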
  • Message Queuing
    • Easily added to existing systems without schema change
      • Hence much easier to deal with legacy systems and EAI
    • Need to generate change messages, in application code or triggers
    • Can use an off the shelf MOM
      • Which comes with transports already
  • Message Queuing Implementation
    • Each client subscribes to the topics that matter to it
      • “Durable” subscription, so they can pick up changes while disconnected
    • Application publishes all changes
    • Can handle deletes as well as insert/updates
  • Message Queuing Downsides
    • If something changes many times between client pickups, a lot of the download volume is wasted on superseded changes
    • This can be addressed with a “queue consolidator” interposed between end client and MOM system
  • Example
    • Queue Setup
    • Application Code
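The talk showed this with JMS, and that example is not in the transcript. The sketch below is a toy in-memory stand-in, not JMS code. The Broker class, the topic names, and the message fields are all hypothetical. It illustrates a durable subscription (messages accumulate while the client is disconnected) and a simple "queue consolidator" that drops superseded changes before download.

```python
from collections import defaultdict

class Broker:
    """Toy stand-in for a MOM with durable topic subscriptions."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> client ids
        self._pending = defaultdict(list)     # client id -> queued messages

    def subscribe(self, client_id, topic):
        self._subscribers[topic].add(client_id)

    def publish(self, topic, message):
        # Durable: queue a copy for every subscriber, connected or not.
        for client_id in self._subscribers[topic]:
            self._pending[client_id].append(message)

    def drain(self, client_id):
        """Called when the client connects; returns and clears its backlog."""
        messages, self._pending[client_id] = self._pending[client_id], []
        return messages

def consolidate(messages):
    """Keep only the last message per entity, ordered by time of last change."""
    latest = {}
    for msg in messages:
        key = (msg["entity"], msg["id"])
        latest.pop(key, None)  # re-insert so dict order follows the last change
        latest[key] = msg
    return list(latest.values())

broker = Broker()
broker.subscribe("truck-17", "orders.region-5")

# The application publishes every change, including deletes.
broker.publish("orders.region-5", {"op": "upsert", "entity": "order", "id": 1, "status": "new"})
broker.publish("orders.region-5", {"op": "upsert", "entity": "order", "id": 1, "status": "assigned"})
broker.publish("orders.region-5", {"op": "delete", "entity": "order", "id": 2})
broker.publish("orders.region-9", {"op": "upsert", "entity": "order", "id": 3, "status": "new"})

backlog = broker.drain("truck-17")  # client reconnects
to_send = consolidate(backlog)      # superseded change to order 1 is dropped
```

A real MOM supplies the subscription and transport parts; the consolidator is the piece you would likely still write yourself.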
  • “Diff” Approach
    • Most efficient, in terms of data sent
    • Always sends exactly what needs to be sent
    • Can handle deletes seamlessly
    • Easily scales to highly complex data
      • Though it can get CPU-intensive;
      • CPU cycles are much cheaper than wireless bandwidth
  • Diff Approach - How It Works
    • The server knows the current state of the client data store
    • The server calculates the intended state of the client data store
    • The server calculates the “diff” between them and sends it (a generic, well-studied problem)
    • Client applies the diff to its local data store
    • Uses before/after hash/CRC to make sure it really worked (Safe!)
  • Example
    • Application Code
    • Example data packet
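The original code and packet are not in the transcript. Here is a minimal sketch of the steps described above, assuming the data store can be modeled as a dictionary of records keyed by string. The packet layout (before, after, upserts, deletes) and the choice of SHA-256 over canonical JSON are assumptions, not the author's format.

```python
import hashlib
import json

def state_hash(state):
    """Stable hash of a data store: canonical JSON, then SHA-256."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def make_diff(known_client_state, intended_state):
    """Server side: compute the packet that turns one state into the other."""
    upserts = {k: v for k, v in intended_state.items()
               if known_client_state.get(k) != v}
    deletes = sorted(k for k in known_client_state if k not in intended_state)
    return {
        "before": state_hash(known_client_state),
        "after": state_hash(intended_state),
        "upserts": upserts,
        "deletes": deletes,
    }

def apply_diff(local_state, packet):
    """Client side: verify the starting point, apply, verify the result."""
    if state_hash(local_state) != packet["before"]:
        raise ValueError("client state is not what the server assumed")
    new_state = {k: v for k, v in local_state.items()
                 if k not in packet["deletes"]}
    new_state.update(packet["upserts"])
    if state_hash(new_state) != packet["after"]:
        raise ValueError("diff did not produce the intended state")
    return new_state

client = {"order:1": {"status": "new"}, "order:2": {"status": "new"}}
intended = {"order:1": {"status": "done"}, "order:3": {"status": "new"}}

packet = make_diff(client, intended)
result = apply_diff(client, packet)
```

Unchanged records never appear in the packet, and deletes fall out of the comparison for free. The two hash checks are what make it safe to apply the packet blindly.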
  • Continuous Synchronization
    • Users may want to keep up to date at all times, when they have a connection
    • Trivial under the MOM approach
    • Less easy under the other approaches
      • Typically necessary to add a publish / subscribe subsystem
  • Low-Hanging Fruit: Reference Data
    • Reference data does not change as part of the transaction flow of the application.
    • Install large reference data with the application – CDROMs are cheap
    • Update it with a one-way sync process
      • Usually any simple sync process works fine for reference data
      • Disconnected approach to reference data can be very helpful for garden-variety applications also!
  • Physical: Encoding / Packing / Transferring the Data
    • Minimize Data Transfer Volume
      • Send the minimum data needed
      • Consider data format verbosity
      • Use Compression
    • Minimize round-trips
      • “Boxcarring”
  • Data Transfer Volume
    • Choosing carefully what data to send (in the application domain) matters more than compression or encoding.
    • This often interacts with overall application design
    • Given the above, look for a sufficiently efficient encoding
      • The more data, the more important the encoding.
  • Compression
    • Whatever else you do, also use compression
      • Wireless networks are slow compared to compression/decompression
      • Even on small-CPU devices.
  • XML
    • Be wary of verbose formats like XML
    • But not too wary - measure. XML has been fine for us, much of the verbosity compresses out.
    • Some DBMSs can read and write XML, saving implementation effort
    • XML facilitates varying client/server platforms, declarative correctness checks, versioning, etc.
  • Example
    • XML “diff” packets
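The original packets are not in the transcript. The sketch below builds a hypothetical XML diff packet with Python's standard xml.etree.ElementTree and compresses it with zlib, to illustrate the "measure" advice. The element and attribute names (diff, upsert, delete, field) are invented for this example.

```python
import xml.etree.ElementTree as ET
import zlib

def build_packet(upserts, deletes):
    """Serialize changes as a small, hypothetical XML diff document."""
    root = ET.Element("diff")
    for order_id, fields in upserts.items():
        node = ET.SubElement(root, "upsert", entity="order", id=str(order_id))
        for name, value in fields.items():
            ET.SubElement(node, "field", name=name).text = str(value)
    for order_id in deletes:
        ET.SubElement(root, "delete", entity="order", id=str(order_id))
    return ET.tostring(root, encoding="utf-8")

upserts = {i: {"status": "assigned", "customer": f"Customer {i}"}
           for i in range(200)}
raw = build_packet(upserts, deletes=[900, 901])
packed = zlib.compress(raw, 9)

# Measure rather than guess: repetitive tags compress very well.
print(len(raw), "bytes raw,", len(packed), "bytes compressed")

# The receiving side reverses the two steps.
parsed = ET.fromstring(zlib.decompress(packed))
```

Run it against packets shaped like your own data before deciding whether XML verbosity is a real problem.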
  • SOAP
    • Lots of in-the-box support in many languages and toolsets
    • XML – based; verbose; flexible
    • Many SOAP experts recommend coarse operations, which are appropriate for disconnected mode applications
    • Consider SOAP document mode (vs. RPC mode)
  • Making XML More Efficient
    • There are XML-specific compression tools
      • Which can be used with straight XML or SOAP
    • There are “binary XML” encodings
      • WML includes one:
        • Format available at http://www.wapforum.org/
        • Encode and decode with http://wbxml4j.sourceforge.net/
      • BOX – http://box.sf.net
      • There are others
  • Highly Efficient Encodings
    • If the previous approaches are insufficient, consider domain-specific encodings:
      • tokenize your data with a dictionary of common data elements pre-loaded on the client, and pre-loaded in the compression engine
      • Hand-code a binary storage format
      • Java: Externalize rather than Serialize
      • Measure serialization mechanisms, sometimes they are surprisingly verbose
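One way to sketch the "dictionary pre-loaded in the compression engine" idea is zlib's preset dictionary support, which Python exposes through the zdict argument of zlib.compressobj and zlib.decompressobj. The dictionary contents and the message below are assumptions for illustration. In practice the dictionary would be built from strings that recur in your own packets and shipped with the client.

```python
import zlib

# Hypothetical preset dictionary: fragments expected in most packets.
# Both client and server must hold exactly the same bytes.
PRESET = (
    b'<upsert entity="order" id="'
    b'<field name="status">assigned</field>'
    b'<field name="customer">'
    b'</upsert><delete entity="order" id="'
)

def pack(payload: bytes) -> bytes:
    """Compress with the shared dictionary primed into the compressor."""
    comp = zlib.compressobj(level=9, zdict=PRESET)
    return comp.compress(payload) + comp.flush()

def unpack(blob: bytes) -> bytes:
    """Decompress; fails if the dictionary does not match the sender's."""
    decomp = zlib.decompressobj(zdict=PRESET)
    return decomp.decompress(blob) + decomp.flush()

message = (b'<upsert entity="order" id="42">'
           b'<field name="status">assigned</field></upsert>')

plain = zlib.compress(message, 9)  # no dictionary
primed = pack(message)             # with dictionary

print(len(message), "raw,", len(plain), "plain zlib,", len(primed), "with dictionary")
```

The gain is largest for small messages, where ordinary compression has not yet seen enough data to find repeats.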
  • Boxcarring
    • Lots of little messages in one big message
    • Minimizes round-trips
    • Supported directly by a few middleware systems, such as XML-RPC
    • Can be implemented as a collection of Command objects
  • Example
    • Command pattern used to send many changes over a single server invocation
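The original example is not in the transcript. This is a minimal sketch of boxcarring with Command-style objects, where a single execute_batch call stands in for one network round-trip. The Server and Boxcar classes, the command names, and the JSON encoding are all hypothetical.

```python
import json

class Server:
    """Toy server: one execute_batch call represents one round-trip."""

    def __init__(self):
        self.orders = {1: {"status": "assigned", "notes": []}}
        self.round_trips = 0
        self._handlers = {
            "set_status": self._set_status,
            "add_note": self._add_note,
        }

    def _set_status(self, order_id, status):
        self.orders[order_id]["status"] = status
        return "ok"

    def _add_note(self, order_id, text):
        self.orders[order_id]["notes"].append(text)
        return "ok"

    def execute_batch(self, request_body):
        """Decode the boxcar and run each command in order."""
        self.round_trips += 1
        results = []
        for command in json.loads(request_body):
            handler = self._handlers[command["name"]]
            results.append(handler(**command["args"]))
        return json.dumps(results)

class Boxcar:
    """Client side: queue commands locally, send them all in one request."""

    def __init__(self, server):
        self._server = server
        self._queue = []

    def add(self, name, **args):
        self._queue.append({"name": name, "args": args})

    def flush(self):
        body, self._queue = json.dumps(self._queue), []
        return json.loads(self._server.execute_batch(body))

server = Server()
boxcar = Boxcar(server)
boxcar.add("add_note", order_id=1, text="Customer not home")
boxcar.add("add_note", order_id=1, text="Left card")
boxcar.add("set_status", order_id=1, status="deferred")
results = boxcar.flush()  # three commands, one round-trip
```

With latency near a second per round-trip, sending three commands in one request saves roughly two seconds of waiting, regardless of bandwidth.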
  • Client-side Data Storage and Manipulation
    • Approaches to data storage in a client application:
    • DBMS
    • Serialization
      • XML, native, etc.
    • Image/Log
      • If using PC hardware, data probably fits in RAM
      • Smalltalk-ish approach
      • Prevayler
  • DBMS on the Client
    • A DBMS is a well-understood, off the shelf way to deal with business application data
    • Client schema can be a simplified version of the server schema, or totally different (!)
    • Facilitates RAD development on the client
    • I recommend the DBMS approach for complex client applications
  • Local Database Engines
    • Java: HSQLDB, JDataStore, Mckoi SQL, others
    • Microsoft: MSDE
    • Other: In a Delphi app, we used DBISAM, which compiles completely into the application
    • If you use a DBMS-provided replication mechanism, you’ll use their client DBMS.
  • Serialization
    • The data size may be small enough to simply write it all out with a few lines of code
      • If your toolset supports serialization
      • Benchmark and measure, of course
      • Big win in simplicity
    • The even/odd technique to prevent data loss
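The slide does not spell out the even/odd technique. The sketch below is one plausible reading, stated as an assumption: alternate writes between two files, so that a crash in the middle of a write can only damage the older copy. The file names, the CRC32 header, and the generation counter are all choices made for this example.

```python
import json
import os
import tempfile
import zlib

def _path(directory, slot):
    return os.path.join(directory, f"store-{slot}.dat")

def _read_slot(directory, slot):
    """Return (generation, data) for a slot, or None if missing or corrupt."""
    try:
        with open(_path(directory, slot), "rb") as f:
            blob = f.read()
        checksum, body = int.from_bytes(blob[:4], "big"), blob[4:]
        if zlib.crc32(body) != checksum:
            return None
        record = json.loads(body)
        return record["generation"], record["data"]
    except (OSError, ValueError, KeyError, TypeError):
        return None

def load(directory):
    """Pick the valid slot with the highest generation number."""
    slots = [s for s in (_read_slot(directory, 0), _read_slot(directory, 1)) if s]
    return max(slots, key=lambda s: s[0]) if slots else (0, None)

def save(directory, data):
    """Write generation N+1 into slot (N+1) % 2; generation N stays untouched."""
    generation = load(directory)[0] + 1
    body = json.dumps({"generation": generation, "data": data}).encode("utf-8")
    with open(_path(directory, generation % 2), "wb") as f:
        f.write(zlib.crc32(body).to_bytes(4, "big") + body)
        f.flush()
        os.fsync(f.fileno())
    return generation

with tempfile.TemporaryDirectory() as d:
    save(d, {"orders": 1})  # generation 1 goes to slot 1
    save(d, {"orders": 2})  # generation 2 goes to slot 0
    # Simulate a crash during the third save, which targets slot 1.
    with open(_path(d, 1), "wb") as f:
        f.write(b"garbage")
    recovered = load(d)     # falls back to generation 2 in slot 0
```

The user loses at most the changes since the last successful save, never the whole data store.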
  • Tracking Changes on the Client
    • Various approaches:
      • Changed data “flag” fields
      • Change log (table, objects, file, etc.)
      • Diff current client data to stored starting point
    • Choose whatever is convenient in your client toolset
  • Applying Changes From the Server
    • Small application:
      • Hand-coding to apply the few kinds of changes
    • Big application
      • Build a generic mechanism to apply change “packets” to the local data store
    • Think of each change as a command to be replayed
  • Applying Changes: Example
    • Example code for a simple application
    • Snippet from a complex application
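Neither example is in the transcript. This is a minimal sketch of the generic mechanism, assuming a local SQLite store and a hypothetical packet format: a list of changes, each with op, table, id, and fields. Each change is replayed as a command, and the whole packet is applied in one transaction so a failure leaves the local data untouched.

```python
import sqlite3

# Assumption: table names in packets are checked against a fixed list,
# since they are interpolated into SQL text.
ALLOWED_TABLES = {"customer"}

def apply_packet(conn, packet):
    """Replay each change in order, inside a single transaction."""
    with conn:  # commits on success, rolls back on any exception
        for change in packet:
            table, key = change["table"], change["id"]
            if table not in ALLOWED_TABLES:
                raise ValueError(f"unexpected table: {table}")
            if change["op"] == "delete":
                conn.execute(f"DELETE FROM {table} WHERE id = ?", (key,))
            elif change["op"] == "upsert":
                columns = ["id", *change["fields"]]
                placeholders = ", ".join("?" for _ in columns)
                # INSERT OR REPLACE rewrites the whole row, so the packet
                # is assumed to carry every column.
                conn.execute(
                    f"INSERT OR REPLACE INTO {table} ({', '.join(columns)}) "
                    f"VALUES ({placeholders})",
                    (key, *change["fields"].values()),
                )
            else:
                raise ValueError(f"unknown op: {change['op']}")

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
local.execute("INSERT INTO customer VALUES (1, 'Acme', 'St. Louis')")
local.execute("INSERT INTO customer VALUES (2, 'Globex', 'Chicago')")
local.commit()

apply_packet(local, [
    {"op": "upsert", "table": "customer", "id": 1,
     "fields": {"name": "Acme Corp", "city": "St. Louis"}},
    {"op": "delete", "table": "customer", "id": 2},
    {"op": "upsert", "table": "customer", "id": 3,
     "fields": {"name": "Initech", "city": "Austin"}},
])
rows = local.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
```

A small application could skip the generic loop and hand-code one function per kind of change; the generic version pays off as the number of tables grows.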
  • The Other Direction – Client to Server
    • End users do work on their local data store
    • When synchronizing, these changes need to be sent to the server
    • Approach 1: use a data-centric mechanism as described for Server → Client
    • Approach 2: send changes at the domain level (recommended)
  • Example
    • Domain-level changes
  • Conflict Resolution
    • Handle user operations on the client, as domain-level operations when sending to the server
    • Resolve conflict at the business level:
      • A “Work Order” is dispatched to worker A
      • Worker A does the work
      • Meanwhile, the Work Order gets reassigned to worker B
      • Worker A syncs
      • Uhoh!
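The Work Order scenario can be sketched with domain-level operations. Everything here is hypothetical: the class names, the statuses, and above all the resolution rule (completed work stands, and the reassignment is overridden with an audit entry). The right rule is a business decision that has to come from the customer.

```python
from dataclasses import dataclass, field

@dataclass
class WorkOrder:
    id: int
    assigned_to: str
    status: str = "open"
    audit: list = field(default_factory=list)

@dataclass
class CompleteWorkOrder:
    """Domain-level operation recorded on the client while offline."""
    order_id: int
    worker: str

def apply_on_server(orders, op):
    """Apply one client operation, resolving conflicts with a business rule."""
    order = orders[op.order_id]
    if order.status == "done":
        return "rejected: already completed"
    if order.assigned_to != op.worker:
        # Hypothetical rule: the work really happened, so it stands.
        order.audit.append(
            f"completed by {op.worker} after reassignment to {order.assigned_to}"
        )
        order.assigned_to = op.worker
        order.status = "done"
        return "accepted: reassignment overridden"
    order.status = "done"
    return "accepted"

orders = {7: WorkOrder(id=7, assigned_to="worker-a")}

# Worker A does the work offline; the operation waits on the device.
pending = [CompleteWorkOrder(order_id=7, worker="worker-a")]

# Meanwhile, the dispatcher reassigns the order on the server.
orders[7].assigned_to = "worker-b"

# Worker A syncs.
outcomes = [apply_on_server(orders, op) for op in pending]
```

Because the client sent "worker A completed order 7" rather than a row update, the server has enough context to make a business decision instead of blindly overwriting fields.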
  • Pleasant Surprises
    • Scalability
    • Usage Peaks
    • Bandwidth
    • Uptime
  • Scalability
    • Mobile / Wireless apps tend to scale very well
    • At any moment, most of the users aren’t connected to the server
    • Many thousands of users per server is often OK
  • Locality of Reference
    • When connected, a client tends to perform multiple operations at once on related data
      • Hence excellent locality and caching characteristics
  • Usage Peaks
    • Even busy times of the day (8 AM) get spread out by time zones and human randomness.
    • Gracefully defer some clients a little while, if server load is high
      • They can usually keep working, since they have local data
  • Bandwidth
    • Data center bandwidth is far cheaper than mobile bandwidth, of course
    • Even a T1 line can support a multitude of busy users who connect over a mobile wireless connection
      • Real-world example: 2 T1s, 2000+ end users.
    • Hence, data-center bandwidth usually isn’t much of an issue
  • Uptime
    • “Five Nines” uptime tends to be less important with disconnected applications
    • Most end users would not notice a few minutes of downtime
      • They have local data to work with
      • Enables much less costly few-minutes failover approaches, in lieu of online failover clustering
  • Lessons From Our Experiences
    • Have enough data on the client to let the user work for a long time without a connection
    • Minimize round-trips and data volume
    • Take control with code when you need to do something specific to get the performance needed.
    • Conflict resolution and synchronization status end up as requirements / application design issues
      • Take them to the customer
    • Kyle Cordes [email_address] (636) 219-9589
    • Slides and snippets will be on my web site, http://kylecordes.com
    • Slides are also on the conference CDROM