
Implementing Server Side Data Synchronization for Mobile Apps

Today mobile apps are everywhere. These apps cannot count on a reliable, constant internet connection, so working in offline mode is becoming a common pattern. This is fairly easy for read-only apps, but it quickly becomes tricky for apps that create data while offline. This talk is a case study of one possible architecture for enabling data synchronization in these situations.


  1. Implementing Server Side Data Synchronization for Mobile Apps
  2. Michele Orselli, CTO @ Ideato · _orso_ · micheleorselli · mo@ideato.it
  3. Agenda: scenario; design choices; implementation; alternative approaches
  4. Sync scenario (diagram: A, B, C)
  5. Sync scenario (diagram: A, B, C repeated three times)
  6. Dealing with conflicts (diagram: conflicting versions A1 and A2 ?)
  7. Scenario: brownfield project; several mobile apps for tracking user-generated data (calendar, notes, bio data); iOS and Android; ~10K users, growing steadily at 1.2K/month
  8. Scenario: MongoDB; legacy app based on CodeIgniter; existing RPC-wannabe-REST API for data sync
  9. Scenario:
      get updates: POST /m/<app>/get/<user_id>/<res>/<updated_from>
      send updates: POST /m/<app>/update/<user_id>/<res_id>/<dev_id>/<res>
  10. API
  11. Scenario: 6 different resources, 12 calls per sync; apps sync by polling every 30 seconds; each call syncs only a little data
  12. Challenge: rebuild the sync API for the old apps plus 2 upcoming ones; allow image synchronization; be more efficient than the previous API
  13. Existing solutions:
      Algorithms: timestamps, vector clocks, CRDTs
      Protocols/API: SyncML, Syncano
      Platform: Azure Data Sync
      Storage: CouchDB, Riak
  14. Not Invented Here? "Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels" (J. Atwood)
  15. Architecture: 2 different mobile platforms; several teams with different skill levels; changing the storage wasn't an option; forcing a particular client-side technology wasn't an option
  16. Architecture (diagram: thin clients c1, c2, c3 talk to the server, which owns the sync logic and conflict resolution)
  17. Implementation: in the sync domain all resources are the same; for every app there is one endpoint for getting new data, one for pushing changes and one for uploading images
  18-19. Get changes:
      get all changes (1st sync): GET /apps/{app}/users/{user_id}/changes
      get latest changes: GET /apps/{app}/users/{user_id}/changes?from={from}
      what is "from"? A timestamp?
  20-22. The server suggests the sync time: timestamps are inaccurate (clock skew and developer errors), so the server suggests the "from" parameter to be used in the next request:
      c1 → server: GET /changes
      server → c1: {"next": 12345, "data": [...]}
      c1 → server: GET /changes?from=12345
      server → c1: {"next": 45678, "data": [...]}
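A minimal sketch of a server-suggested cursor, assuming (this is not from the talk) that every write stamps the record with a server-side, monotonically increasing sequence number "seq". The function name getChanges() and the record shape are hypothetical; "next" is simply the highest sequence number returned, which the client echoes back as ?from=.

```php
<?php
// Hypothetical changes feed: the server, not the client clock, decides "from".
// Assumption: each record carries a server-assigned, increasing 'seq'.
function getChanges(array $records, ?int $from = null): array
{
    // Keep only records written after the cursor (all of them on the 1st sync).
    $changes = array_values(array_filter(
        $records,
        fn(array $r): bool => $from === null || $r['seq'] > $from
    ));
    usort($changes, fn(array $a, array $b): int => $a['seq'] <=> $b['seq']);

    // "next" is the cursor for the following request; unchanged if nothing is new.
    $next = $changes === [] ? ($from ?? 0) : $changes[count($changes) - 1]['seq'];

    return ['next' => $next, 'data' => $changes];
}

$records = [['id' => '1', 'seq' => 1], ['id' => '2', 'seq' => 2]];
$first = getChanges($records);                   // 1st sync: everything
$records[] = ['id' => '3', 'seq' => 3];          // a later write
$second = getChanges($records, $first['next']);  // only what changed since
$idle = getChanges($records, $second['next']);   // nothing new
```

A real implementation also has to make sure a sequence number is never committed after a higher one has already been served, otherwise a slow write could be skipped by every client.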
  23. What to transfer:
      operations: {"op": "add", "id": "1", "data": [...]} {"op": "update", "id": "1", "data": [...]} {"op": "delete", "id": "1"} {"op": "add", "id": "2", "data": [...]}
      states: {"id": "1", "data": [...]} {"id": "2", "data": [...]} {"id": "3", "data": [...]}
  24. What to transfer: we chose to transfer states:
      {"id": "1", "type": "measure", "_deleted": true} {"id": "2", "type": "note"} {"id": "3", "type": "note"}
      PS: soft delete all the things!
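Transferring states rather than operations makes applying changes idempotent: receiving the same state twice leaves the client unchanged. A client-side sketch of applying a batch of states, assuming (for illustration only) that local storage is an array keyed by id; applyStates() is a hypothetical name. Soft-deleted records arrive as ordinary states flagged "_deleted", which is what lets a deletion reach every device.

```php
<?php
// Hypothetical client-side application of state-based changes.
// Each state replaces the local copy with the same id; '_deleted' removes it.
function applyStates(array $local, array $states): array
{
    foreach ($states as $state) {
        if (!empty($state['_deleted'])) {
            unset($local[$state['id']]);   // soft-deleted on the server, dropped here
        } else {
            $local[$state['id']] = $state; // insert or overwrite
        }
    }
    return $local;
}

$batch = [
    ['id' => '1', 'type' => 'measure', '_deleted' => true],
    ['id' => '2', 'type' => 'note'],
    ['id' => '3', 'type' => 'note'],
];
$local = applyStates(['1' => ['id' => '1', 'type' => 'measure']], $batch);
$again = applyStates($local, $batch); // replaying the same batch changes nothing
```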
  25. Unique identifiers: how do we generate a unique ID in a distributed system?
      UUID: several implementations (RFC 4122)
      local IDs / global IDs: the server generates GUIDs, clients use local IDs to manage their records
      c1 → server: GET /changes; server → c1: {"data": {"guid": "58f0bdd7-1481"}}
  26. Unique identifiers:
      c1 → server: POST /merge {"data": [{"lid": "1", …}, {"lid": "2", …}]}
      server → c1: {"data": [{"guid": "58f0bdd7-1400", "lid": "1", …}, {"guid": "6f9f3ec9-1400", "lid": "2", …}]}
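A sketch of the local-id/global-id scheme, assuming the server simply gives every incoming record that lacks a GUID a fresh random (version 4) UUID; uuid4() and assignGuids() are hypothetical names, and the response shape mirrors the slide. The version and variant bits are set as RFC 4122 prescribes for random UUIDs.

```php
<?php
// Random UUID (RFC 4122, version 4) built from 16 cryptographically random bytes.
function uuid4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40); // version 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}

// Hypothetical /merge step: keep the client's lid, add a GUID where missing.
function assignGuids(array $records): array
{
    $out = [];
    foreach ($records as $record) {
        $record['guid'] = $record['guid'] ?? uuid4();
        $out[] = $record;
    }
    return ['data' => $out];
}

$response = assignGuids([['lid' => '1'], ['lid' => '2']]);
```

The client then stores the lid-to-GUID mapping and uses the GUID in every later request.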
  27. Conflict resolution algorithm (plain data): the server handles conflict resolution; mobile-generated data are "temporary" until synced to the server. Conflict resolution can be domain independent (last write wins) or domain dependent (use domain knowledge to resolve).
  28-32. Conflict resolution algorithm (plain data):

        function sync(array $data)
        {
            foreach ($data as $newRecord) {
                $s = findByGuid($newRecord->getGuid());

                if (!$s) {
                    // no conflict: the server has no record with this GUID
                    add($newRecord);
                    send($newRecord);
                    continue;
                }

                if ($newRecord->updated > $s->updated) {
                    // remote wins: the client's copy is newer
                    update($s, $newRecord);
                    send($newRecord);
                    continue;
                }

                // server wins: send the server's copy back to the client
                updateRemote($newRecord, $s);
            }
        }
  33-38. Conflict resolution algorithm (plain data), example:
      c1 holds {"lid": "1", "guid": "af54d", "data": "AAA", "updated": "100"} and a new record {"lid": "2", "data": "hello!", "updated": "15"}
      the server holds {"guid": "af54d", "data": "BBB", "updated": "20"}
      c1 → server: POST /merge
      the server stores the new record as {"guid": "e324f", "data": "hello!", "updated": "15"} and, since c1's copy of af54d is newer (100 > 20), replaces BBB with {"guid": "af54d", "data": "AAA", "updated": "100"}
      server → c1: {"ok": {"guid": "af54d"}} {"update": {"lid": "2", "guid": "e324f"}}
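The algorithm in slides 28-32 relies on helpers (findByGuid(), add(), send(), updateRemote()) that the talk does not show. A self-contained sketch of the same last-write-wins logic over an in-memory store, reproducing the af54d/e324f example; the array-based record shape, the injected GUID generator and the "conflict" response key for the server-wins case are assumptions, not the talk's API.

```php
<?php
// Hypothetical in-memory version of the plain-data sync algorithm.
// Records are arrays: 'lid' (client id), optional 'guid', 'data', 'updated'.
// $store is the server side, keyed by GUID; $newGuid makes GUIDs injectable.
function merge(array &$store, array $incoming, callable $newGuid): array
{
    $response = [];
    foreach ($incoming as $record) {
        $guid = $record['guid'] ?? null;
        $existing = $guid !== null ? ($store[$guid] ?? null) : null;
        $copy = ['guid' => null, 'data' => $record['data'], 'updated' => $record['updated']];

        if ($existing === null) {
            // No conflict: store the record and tell the client its GUID.
            $copy['guid'] = $guid ?? $newGuid();
            $store[$copy['guid']] = $copy;
            $response[] = ['update' => ['lid' => $record['lid'] ?? null, 'guid' => $copy['guid']]];
        } elseif ($record['updated'] > $existing['updated']) {
            // Remote wins (last write wins): the client's copy is newer.
            $copy['guid'] = $guid;
            $store[$guid] = $copy;
            $response[] = ['ok' => ['guid' => $guid]];
        } else {
            // Server wins: send the server's copy back ('conflict' is a made-up key).
            $response[] = ['conflict' => $existing];
        }
    }
    return $response;
}

$store = ['af54d' => ['guid' => 'af54d', 'data' => 'BBB', 'updated' => 20]];
$response = merge($store, [
    ['lid' => '1', 'guid' => 'af54d', 'data' => 'AAA', 'updated' => 100],
    ['lid' => '2', 'data' => 'hello!', 'updated' => 15],
], fn(): string => 'e324f');
```

Last write wins is the domain-independent option from slide 27; a domain-dependent policy would replace the timestamp comparison.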
  39-40. Conflict resolution algorithm (hierarchical data): how do we manage hierarchical data?
      {"lid": "123456", "type": "baby", …} {"lid": "123456", "type": "temperature", "baby_id": "123456"}
      1) sync the root record 2) update the IDs 3) sync the child records
  41-46. Conflict resolution algorithm (hierarchical data):

        function syncHierarchical(array $data)
        {
            sortByHierarchy($data); // parent records first (sorts $data in place)

            foreach ($data as $newRecord) {
                $s = findByGuid($newRecord->getGuid());

                if ($newRecord->isRoot()) {
                    if (!$s) {
                        // no conflict
                        add($newRecord);
                        updateRecordIds($newRecord, $data);
                        send($newRecord);
                        continue;
                    }

                    if ($newRecord->updated > $s->updated) {
                        // remote wins
                        update($s, $newRecord);
                        updateRecordIds($newRecord, $data);
                        send($newRecord);
                        continue;
                    }

                    // server wins
                    updateRecordIds($s, $data);
                    updateRemote($newRecord, $s);
                } else {
                    // child record: its parent ids were already updated,
                    // so it goes through the plain-data algorithm on its own
                    sync([$newRecord]);
                }
            }
        }
  47-50. Conflict resolution algorithm (hierarchical data), example:
      c1 sends two new records, a parent {"lid": "1", "data": "AAA", "updated": "100"} and a child {"lid": "2", "parent": "1", "data": "hello!", "updated": "15"}
      c1 → server: POST /merge
      the server stores the parent first as {"lid": "1", "guid": "32ead", "data": "AAA", "updated": "100"}, then rewrites the child's parent reference to the new GUID: {"lid": "2", "parent": "32ead", "data": "hello!", "updated": "15"}
      server → c1: {"update": {"lid": "1", "guid": "32ead"}} {"update": {"lid": "2", "guid": "e324f"}}
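A self-contained sketch of the three steps (sync the root record, update the ids, sync the child records) for new records only, assuming one level of nesting and a child that refers to its parent's local id through a "parent" field; mergeHierarchical() and the injected GUID generator are hypothetical. The input deliberately lists the child first, and the result reproduces the 32ead/e324f example.

```php
<?php
// Hypothetical sketch: merge new parent/child records in hierarchy order.
// Records are arrays with 'lid' and, for children, 'parent' = parent's lid.
// Only one level of nesting is handled; $newGuid makes GUIDs injectable.
function mergeHierarchical(array $records, callable $newGuid): array
{
    // 1) Parent records first: records without 'parent' sort before children.
    usort($records, fn(array $a, array $b): int => isset($a['parent']) <=> isset($b['parent']));

    $guidByLid = [];
    $stored = [];
    foreach ($records as $record) {
        if (isset($record['parent'])) {
            // 2) Update ids: point the child at its parent's server GUID.
            $record['parent'] = $guidByLid[$record['parent']] ?? $record['parent'];
        }
        // 3) Sync the record itself (new records only, so just assign a GUID).
        $record['guid'] = $newGuid();
        $guidByLid[$record['lid']] = $record['guid'];
        $stored[] = $record;
    }
    return ['stored' => $stored, 'mapping' => $guidByLid];
}

$guids = ['32ead', 'e324f'];
$result = mergeHierarchical(
    [
        ['lid' => '2', 'parent' => '1', 'data' => 'hello!', 'updated' => 15],
        ['lid' => '1', 'data' => 'AAA', 'updated' => 100],
    ],
    function () use (&$guids): string { return array_shift($guids); }
);
```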
  51-53. Enforcing domain constraints: e.g. "only one temperature can be registered on a given day". How do we enforce domain constraints on data? 1) relax the constraints, or 2) integrate the constraints into the sync algorithm
  54. Enforcing domain constraints: from findByGuid to findSimilar; first look up by GUID, then by domain rules, e.g. "two measures are similar if they refer to the same date"
  55-60. Enforcing domain constraints, example:
      the server holds {"guid": "af54d", "when": "20141005"}
      c1 creates {"lid": "1", "when": "20141005"}
      c1 → server: POST /merge
      server → c1: {"guid": "af54d", "when": "20141005"}
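A sketch of findSimilar(), assuming records are arrays, the server store is keyed by GUID and holds only measures, and the domain rule is exactly the one on slide 54 (same "when" date means similar). Lookup goes by GUID first and falls back to the domain rule, so the client's new measure for 20141005 is matched to the existing af54d record instead of creating a second one for that day.

```php
<?php
// Hypothetical findSimilar(): GUID lookup first, then the domain rule
// "two measures are similar if they refer to the same date".
function findSimilar(array $store, array $record): ?array
{
    if (isset($record['guid'], $store[$record['guid']])) {
        return $store[$record['guid']];
    }
    foreach ($store as $candidate) {
        if ($candidate['when'] === $record['when']) {
            return $candidate;
        }
    }
    return null; // nothing similar: the record is genuinely new
}

$store = ['af54d' => ['guid' => 'af54d', 'when' => '20141005']];
$match = findSimilar($store, ['lid' => '1', 'when' => '20141005']);
$none = findSimilar($store, ['lid' => '2', 'when' => '20141006']);
```

Swapping findByGuid() for a function like this in the sync algorithm is what turns the constraint into part of conflict resolution rather than a validation error.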
  61. Dealing with binary data: binary data are uploaded via a custom endpoint, so sync data remain small and uploads can be resumed
  62. Dealing with binary data: two steps*: 1) data are synced to the server 2) related images are uploaded. * This means a record can exist without its file for some time.
  63. Dealing with binary data:
      c1 → server: POST /merge {"lid": 1, "type": "baby", "image": "myimage.jpg"}
      server → c1: {"lid": 1, "guid": "ac435-f8345"}
      c1 → server: POST /upload/ac435-f8345/image
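The talk says uploads can be resumed but does not show how. One common approach, sketched here as an assumption rather than the talk's protocol: the client sends each chunk with the byte offset it expects, the server appends only when that offset matches its current file size, and it always replies with how many bytes it holds, so an interrupted or repeated upload can continue from that point. appendChunk() is hypothetical, and the temporary file stands in for the image stored for GUID ac435-f8345.

```php
<?php
// Hypothetical resumable upload handler for POST /upload/{guid}/image.
// The client sends each chunk with the offset it believes the server has;
// the server appends only on a match and always answers with its size.
function appendChunk(string $path, int $offset, string $chunk): int
{
    clearstatcache(true, $path);                 // filesize() results are cached
    $size = file_exists($path) ? (int) filesize($path) : 0;
    if ($offset !== $size) {
        return $size;                            // out of step: resume from here
    }
    file_put_contents($path, $chunk, FILE_APPEND | LOCK_EX);
    return $size + strlen($chunk);
}

$path = tempnam(sys_get_temp_dir(), 'img');      // stands in for ac435-f8345's image
$afterFirst = appendChunk($path, 0, 'abc');      // first chunk arrives
$afterRetry = appendChunk($path, 0, 'abc');      // a retried chunk is not appended twice
$afterSecond = appendChunk($path, 3, 'def');     // upload resumes at byte 3
```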
  64. What we learned: implementing this stuff is tricky; explore existing solutions if you can; understanding the domain is important
  65. Vector clocks
  66. CRDT: Conflict-free Replicated Data Types (CRDTs) constrain the types of operations in order to:
      - ensure convergence of changes to shared data by uncoordinated, concurrent actors
      - eliminate network failure modes as a source of error
  67. Math!!! CRDTs are bounded join-semilattices:
      - a join operation defining a least upper bound
      - a partially ordered set
      - always increasing
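To make the semilattice properties concrete: a grow-only counter (G-Counter), one of the simplest state-based CRDTs, sketched as an illustration rather than anything shown in the talk. Its state maps each replica to a count; merge takes the element-wise maximum, which is the least upper bound of the two states. Because that join is commutative, associative and idempotent, replicas converge whatever order merges arrive in, and the state only ever grows.

```php
<?php
// Grow-only counter (G-Counter): state maps replica id => count.
function gIncrement(array $state, string $replica, int $by = 1): array
{
    if ($by < 0) {
        throw new InvalidArgumentException('a G-Counter can only grow');
    }
    $state[$replica] = ($state[$replica] ?? 0) + $by;
    return $state;
}

// Join: element-wise max, the least upper bound of the two states.
function gMerge(array $a, array $b): array
{
    foreach ($b as $replica => $count) {
        $a[$replica] = max($a[$replica] ?? 0, $count);
    }
    return $a;
}

// The counter's value is the sum over all replicas.
function gValue(array $state): int
{
    return array_sum($state);
}

$a = gIncrement([], 'A');                     // replica A counts once
$b = gIncrement(gIncrement([], 'B'), 'B');    // replica B counts twice, concurrently
$ab = gMerge($a, $b);
$ba = gMerge($b, $a);
```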
  68. Couchbase Mobile: a gateway handles sync; data flow through channels, which
      - partition the data set
      - handle authorization
      - limit the data
      Uses revision trees.
  69. Riak: distributed DB; eventual/strong consistency; data types; configurable conflict resolution:
      - db level for built-in data types
      - application level for custom data
  70. That's all folks! Questions? Please leave feedback: https://joind.in/11797
  71. Links
      Vector clocks:
      - http://basho.com/why-vector-clocks-are-easy/
      - http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
      - http://basho.com/why-vector-clocks-are-hard/
      CRDTs:
      - http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
      - http://www.infoq.com/presentations/problems-distributed-systems
      - https://www.youtube.com/watch?v=qyVNG7fnubQ
      Riak:
      - http://docs.basho.com/riak/latest/dev/using/conflict-resolution/
      Couchbase Sync Gateway:
      - http://docs.couchbase.com/sync-gateway/
      - http://www.infoq.com/presentations/sync-mobile-data
      API:
      - http://developers.amiando.com/index.php/REST_API_DataSync
      - https://login.syncano.com/docs/rest/index.html