Implementing data synchronization API for mobile apps with Silex
Michele Orselli 
CTO@Ideato 
_orso_ 
micheleorselli / ideatosrl 
mo@ideato.it
Agenda
scenario
design choices
implementation
alternative approaches
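Sync scenario
[Figure: three clients hold records A, B and C; after syncing, each client holds A, B and C]
Dealing with conflicts
[Figure: two clients hold different versions, A1 and A2, of the same record; which one should win?]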
Scenario
Brownfield project
several mobile apps for tracking user generated data (calendar, notes, bio data)
iOS & Android
~10 K users steadily growing at 1.2 K/month
Scenario
MongoDB
Legacy app based on CodeIgniter
Existing RPC-wannabe-REST API for data sync
Scenario
For every resource
get updates:
POST /m/:app/get/:user_id/:res/:updated_from
send updates:
POST /m/:app/update/:user_id/:res_id/:dev_id/:res
Scenario
~6 different resources, ~12 calls per sync
apps sync by polling every 30 sec
every call syncs little data
Challenge 
Rebuild sync API for old apps + 2 incoming 
Enable image synchronization 
More efficient than previous API
Existing Solutions
Algorithms: timestamps, vector clocks, CRDTs
Protocols/API: SyncML, Syncano
Platform: Azure Data Sync
Storage: CouchDB, Riak
Not Invented Here? 
Don't Reinvent The Wheel, 
Unless You Plan on Learning More About Wheels 
J. Atwood
Architecture 
2 different mobile platforms 
Several teams with different skill levels
Changing storage wasn’t an option 
Forcing a particular technology client side wasn’t 
an option
Architecture
[Diagram: clients c1, c2 and c3 each talk to a single server]
server: sync logic, conflict resolution
clients: thin
Implementation 
In the sync domain all resources are managed in 
the same way
Implementation 
For every app: 
one endpoint for getting new data 
one endpoint for pushing changes 
one endpoint for uploading images
The new APIs 
GET /apps/:app/users/:user_id/changes[?from=:from] 
POST /apps/:app/users/:user_id/merge 
POST /upload/:res_id/images
Silex Implementation
[Diagram, built up over several slides: Col 1, Col 2 and Col 3, fronted by a Sync Service]
Silex Implementation
use Silex\Application;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;

// GET: return the changes this user has not seen yet
$app->get('/apps/{mApp}/users/{userId}/changes',
    function ($mApp, $userId, Application $app, Request $request)
    {
        // "from" is the value the server suggested on the previous sync (null on the first one)
        $lastSync = $request->get('from', null);
        $syncService = $app['syncService'];
        $syncService->sync($lastSync, $userId);
        $response = new JsonResponse(
            $syncService->getResult()
        );
        return $response;
    }
);
Silex Implementation
// Application, Request and JsonResponse are imported with "use" at the top of the file
// POST: the client pushes its changes and the server merges them
$app->post('/apps/{mApp}/users/{userId}/merge',
    function ($mApp, $userId, Application $app, Request $request)
    {
        $lastSync = $request->get('from', null);
        // records sent by the client; false when the payload carries no data
        $data = $request->get('data', false);
        $syncService = $app['syncService'];
        $syncService->merge($data, $lastSync, $userId);
        $response = new JsonResponse(
            $syncService->getResult()
        );
        return $response;
    }
);
Silex Implementation
$app['mongodb'] = new MongoDb(…);
$app['changesRepo'] = new ChangesRepository(
    $app['mongodb']
);
$app['syncService'] = new SyncService(
    $app['changesRepo']
);
Get changes
GET /apps/:app/users/:user_id/changes?from=:from
should "from" be a timestamp?
Server suggests the sync time
timestamps are inaccurate: device clocks drift and can be changed by the user
the server suggests the "from" parameter to be used in the next request
Server suggests the sync time
c1 → server: GET /changes
server → c1: { 'next': 12345, 'data': […] }
c1 → server: GET /changes?from=12345
server → c1: { 'next': 45678, 'data': […] }
what to transfer
operations:
{'op': 'add', 'id': '1', 'data': […]}
{'op': 'update', 'id': '1', 'data': […]}
{'op': 'delete', 'id': '1'}
{'op': 'add', 'id': '2', 'data': […]}
states:
{'id': '1', 'data': […]}
{'id': '2', 'data': […]}
{'id': '3', 'data': […]}
what to transfer
we chose to transfer states
{'id': '1', 'type': 'measure', '_deleted': true}
{'id': '2', 'type': 'note'}
{'id': '3', 'type': 'note'}
ps: soft delete all the things!
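A minimal sketch of how a receiver might apply transferred states, assuming a local store kept as an array keyed by id. The function name `applyStates` is hypothetical; only the `_deleted` flag comes from the slide.

```php
<?php
// Apply a list of record states to a local store keyed by id.
// A state flagged with '_deleted' removes the local copy instead of replacing it.
function applyStates(array $local, array $states): array
{
    foreach ($states as $state) {
        if (!empty($state['_deleted'])) {
            unset($local[$state['id']]);
            continue;
        }
        // States are whole records, so the newest state simply replaces the old one.
        $local[$state['id']] = $state;
    }
    return $local;
}

$local = ['1' => ['id' => '1', 'type' => 'measure']];
$local = applyStates($local, [
    ['id' => '1', 'type' => 'measure', '_deleted' => true],
    ['id' => '2', 'type' => 'note'],
    ['id' => '3', 'type' => 'note'],
]);
```

Because deletes travel as ordinary states, the server has to keep the soft-deleted record around so that clients syncing later still learn about the deletion.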
unique identifiers
How do we generate a unique id in a distributed system?
UUID (RFC 4122): several implementations in PHP (https://github.com/ramsey/uuid)
Local/Global Id: only the server generates GUIDs, clients use local ids to manage their records
unique identifiers
c1 → server: POST /merge
{ 'data': [
  {'lid': '1', …},
  {'lid': '2', …}
] }
server → c1:
{ 'data': [
  {'guid': '58f0bdd7-1400', 'lid': '1', …},
  {'guid': '6f9f3ec9-1400', 'lid': '2', …}
] }
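The local/global id exchange above might look like this on the server side. It is a sketch under stated assumptions: `newGuid` and `assignGuids` are hypothetical names, and the generator below returns random hex rather than an RFC 4122 UUID.

```php
<?php
// Stand-in id generator: 32 random hex characters, NOT an RFC 4122 UUID.
function newGuid(): string
{
    return bin2hex(random_bytes(16));
}

// Give every record that lacks a 'guid' a server-generated one.
// The client's 'lid' is kept so the client can map its local id to the global id.
function assignGuids(array $records): array
{
    foreach ($records as $i => $record) {
        if (empty($record['guid'])) {
            $records[$i]['guid'] = newGuid();
        }
    }
    return $records;
}

$result = assignGuids([
    ['lid' => '1'],                              // new record, no guid yet
    ['lid' => '2', 'guid' => '6f9f3ec9-1400'],   // already known to the server
]);
```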
conflict resolution algorithm (plain data)
data generated on mobile is "temporary" until it is synced to the server
the server handles conflict resolution
conflict resolution:
domain independent: e.g. last write wins
domain dependent: use domain knowledge to resolve
conflict resolution algorithm (plain data)
function sync($data) {
    foreach ($data as $newRecord) {
        $s = findByGuid($newRecord->getGuid());
        // no conflict: the server has never seen this record
        if (!$s) {
            add($newRecord);
            send($newRecord);
            continue;
        }
        // remote wins: the client copy is newer than the server copy
        if ($newRecord->updated > $s->updated) {
            update($s, $newRecord);
            send($newRecord);
            continue;
        }
        // server wins: push the server copy back to the client
        updateRemote($newRecord, $s);
    }
}
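A runnable, in-memory variant of the loop above, as a sketch only. The function name `mergePlain`, the array-based store keyed by guid and the `overwrite` reply key are assumptions made for this example; the `ok` and `update` replies follow the shape shown in the talk's example.

```php
<?php
// $server is keyed by guid. Returns [updated server store, replies for the client].
function mergePlain(array $server, array $incoming): array
{
    $replies = [];
    foreach ($incoming as $record) {
        $guid = $record['guid'] ?? null;
        if ($guid === null || !isset($server[$guid])) {
            // No conflict: unknown record, store it (generating a guid if needed).
            $guid = $guid ?? bin2hex(random_bytes(8));
            $record['guid'] = $guid;
            $server[$guid] = $record;
            $replies[] = ['update' => ['lid' => $record['lid'] ?? null, 'guid' => $guid]];
            continue;
        }
        if ((int) $record['updated'] > (int) $server[$guid]['updated']) {
            // Remote wins: the client copy is newer.
            $server[$guid] = $record;
            $replies[] = ['ok' => ['guid' => $guid]];
            continue;
        }
        // Server wins: send the server copy back (hypothetical reply shape).
        $replies[] = ['overwrite' => $server[$guid]];
    }
    return [$server, $replies];
}

$server = ['af54d' => ['guid' => 'af54d', 'data' => 'BBB', 'updated' => '20']];
[$server, $replies] = mergePlain($server, [
    ['lid' => '1', 'guid' => 'af54d', 'data' => 'AAA', 'updated' => '100'],
    ['lid' => '2', 'data' => 'hello!', 'updated' => '15'],
]);
```

Last write wins is the domain-independent choice here; a domain-dependent rule would replace the `updated` comparison.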
conflict resolution algorithm (plain data)
Example: c1 holds two records (the second one has never been synced), the server holds one.
c1:
{ 'lid': '1', 'guid': 'af54d', 'data': 'AAA', 'updated': '100' }
{ 'lid': '2', 'data': 'hello!', 'updated': '15' }
server:
{ 'guid': 'af54d', 'data': 'BBB', 'updated': '20' }
1) c1 sends both records: POST /merge
2) no conflict: the record without a guid is added on the server as 'e324f'
3) remote wins: 'af54d' is newer on c1 (100 > 20), so the server copy is updated
server after the merge:
{ 'guid': 'e324f', 'data': 'hello!', 'updated': '15' }
{ 'guid': 'af54d', 'data': 'AAA', 'updated': '100' }
4) server → c1:
{'ok': { 'guid': 'af54d' }}
{'update': { 'lid': '2', 'guid': 'e324f' }}
conflict resolution algorithm (hierarchical data)
How to manage hierarchical data?
{ 'lid': '123456', 'type': 'baby', … }
{ 'lid': '123456', 'type': 'temperature', 'baby_id': '123456' }
1) sync root record
2) update ids
3) sync child records
conflict resolution algorithm (hierarchical data)
function syncHierarchical($data) {
    // parent records first
    sortByHierarchy($data);
    foreach ($data as $newRecord) {
        $s = findByGuid($newRecord->getGuid());
        if ($newRecord->isRoot()) {
            if (!$s) {
                // no conflict: add the root, then point its children at the new guid
                add($newRecord);
                updateRecordIds($newRecord, $data);
                send($newRecord);
                continue;
            }
            if ($newRecord->updated > $s->updated) {
                // remote wins
                update($s, $newRecord);
                updateRecordIds($newRecord, $data);
                send($newRecord);
                continue;
            } else {
                // server wins
                updateRecordIds($s, $data);
                updateRemote($newRecord, $s);
            }
        } else {
            // child record: plain sync of this record only
            sync([$newRecord]);
        }
    }
}
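The two helpers the hierarchical algorithm relies on could be sketched like this. Both bodies are assumptions: the talk names `sortByHierarchy` and `updateRecordIds` but does not show them, and this version uses arrays with a `parent` field and handles a single level of nesting only.

```php
<?php
// Put root records (no 'parent') before child records.
function sortByHierarchy(array $records): array
{
    usort($records, function ($a, $b) {
        return isset($a['parent']) <=> isset($b['parent']);
    });
    return $records;
}

// Rewrite child references from the root's local id to its server-assigned guid.
function updateRecordIds(array $root, array $records): array
{
    foreach ($records as $i => $record) {
        if (($record['parent'] ?? null) === $root['lid']) {
            $records[$i]['parent'] = $root['guid'];
        }
    }
    return $records;
}

$sorted = sortByHierarchy([
    ['lid' => '2', 'parent' => '1', 'data' => 'hello!'],
    ['lid' => '1', 'data' => 'AAA'],
]);

$root = ['lid' => '1', 'guid' => '32ead', 'data' => 'AAA'];
$records = updateRecordIds($root, [
    ['lid' => '2', 'parent' => '1', 'data' => 'hello!'],
    ['lid' => '3', 'data' => 'no parent'],
]);
```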
conflict resolution algorithm (hierarchical data)
Example: c1 holds a new root record and a new child that points to it by local id.
c1:
{ 'lid': '1', 'data': 'AAA', 'updated': '100' }
{ 'lid': '2', 'parent': '1', 'data': 'hello!', 'updated': '15' }
1) c1 sends both records: POST /merge
2) the server adds the root record and assigns it a guid:
{ 'lid': '1', 'guid': '32ead', 'data': 'AAA', 'updated': '100' }
3) the child's parent reference is rewritten to the new guid, then the child is added:
{ 'lid': '2', 'parent': '32ead', 'data': 'hello!', 'updated': '15' }
4) server → c1:
{'update': { 'lid': '1', 'guid': '32ead' }}
{'update': { 'lid': '2', 'guid': 'e324f' }}
enforcing domain constraints
e.g. "only one temperature can be registered on a given day"
how do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints into the sync algorithm
enforcing domain constraints
from findByGuid to findSimilar
look up by GUID first, then by domain rules
"two measures are similar if they refer to the same date"
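A minimal sketch of such a lookup, assuming an array-based server store keyed by guid and records that carry a `when` date. The talk names `findSimilar` but does not show it, so the signature and body here are illustrative.

```php
<?php
// Look up by guid first, then fall back to the domain rule:
// two measures are similar when they refer to the same 'when' date.
function findSimilar(array $server, array $record): ?array
{
    if (isset($record['guid'], $server[$record['guid']])) {
        return $server[$record['guid']];
    }
    foreach ($server as $candidate) {
        if (isset($candidate['when'], $record['when'])
            && $candidate['when'] === $record['when']) {
            return $candidate;
        }
    }
    return null;
}

$server = ['af54d' => ['guid' => 'af54d', 'when' => '20141005']];
$match  = findSimilar($server, ['lid' => '1', 'when' => '20141005']);
$none   = findSimilar($server, ['lid' => '2', 'when' => '20141006']);
```

With this lookup in place of findByGuid, a new local record for an already-registered day is treated as a conflict with the existing record instead of being added as a duplicate.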
enforcing domain constraints
server holds: { 'guid': 'af54d', 'when': '20141005' }
c1 creates: { 'lid': '1', 'when': '20141005' }
c1 → server: POST /merge
after the merge c1 holds the existing server record: { 'guid': 'af54d', 'when': '20141005' }
dealing with binary data 
Binary data uploaded via custom endpoint 
Sync data remains small 
Uploads can be resumed
dealing with binary data
Two steps*
1) data is synchronized
2) related images are uploaded
* this means a record can exist without its file for some time
dealing with binary data
c1 → server: POST /merge
{ 'lid': 1, 'type': 'baby', 'image': 'myimage.jpg' }
server → c1:
{ 'lid': 1, 'guid': 'ac435-f8345' }
c1 → server: POST /upload/ac435-f8345/images
What we learned 
Implementing this stuff is tricky 
Explore existing solutions if you can
Understanding the domain is important
vector clocks
CRDT 
Conflict-free Replicated Data Types (CRDTs) 
Constraining the types of operations in order to: 
- ensure convergence of changes to shared data by 
uncoordinated, concurrent actors 
- eliminate network failure modes as a source of 
error
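To make "constraining the types of operations" concrete, here is a sketch of one of the simplest CRDTs, a grow-only counter. It is a textbook example, not something used in the project described in this talk, and the function names are made up for the sketch.

```php
<?php
// Grow-only counter: each actor only ever increments its own slot.
function gIncrement(array $counter, string $actor, int $by = 1): array
{
    $counter[$actor] = ($counter[$actor] ?? 0) + $by;
    return $counter;
}

// Merge takes the per-actor maximum, so merging is commutative,
// associative and idempotent: replicas converge in any delivery order.
function gMerge(array $a, array $b): array
{
    foreach ($b as $actor => $count) {
        $a[$actor] = max($a[$actor] ?? 0, $count);
    }
    return $a;
}

// The counter's value is the sum of all slots.
function gValue(array $counter): int
{
    return array_sum($counter);
}

$c1 = gIncrement([], 'c1', 2);   // client 1 counted 2 while offline
$c2 = gIncrement([], 'c2', 3);   // client 2 counted 3 while offline
$merged = gMerge($c1, $c2);
```

Since the only allowed operation is "increment my own slot", there is no conflict to resolve and a message that arrives twice does no harm.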
Couchbase Mobile 
The Sync Gateway handles sync
Data flows through channels 
- partition data set 
- authorization 
- limit the data 
Use revision trees
Riak 
Distributed DB 
Eventually/Strong Consistency 
Data Types 
Configurable conflict resolution 
- db level for built-in data types 
- application level for custom data
That’s all folks! 
Questions? 
Please leave feedback! https://joind.in/12959
Links
http://www.objc.io/issue-10/sync-case-study.html
http://www.objc.io/issue-10/data-synchronization.html
https://dev.evernote.com/media/pdf/edam-sync.pdf
http://blog.helftone.com/clear-in-the-icloud/
http://strongloop.com/strongblog/node-js-replication-mobile-offline-sync-loopback/
http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en
http://inessential.com/2014/02/15/vesper_sync_diary_8_the_problem_of_un
http://culturedcode.com/things/blog/2010/12/state-of-sync-part-1.html
http://programmers.stackexchange.com/questions/206310/data-synchronization-in-mobile-apps-multiple-devices-multiple-users
http://bricklin.com/offline.htm
http://blog.couchbase.com/why-mobile-sync
Links 
Vector Clocks 
http://basho.com/why-vector-clocks-are-easy/ 
http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks 
http://basho.com/why-vector-clocks-are-hard/ 
http://blog.8thlight.com/rylan-dirksen/2013/10/04/synchronization-in-a-distributed-system.html 
CRDTs 
http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html 
http://www.infoq.com/presentations/problems-distributed-systems 
https://www.youtube.com/watch?v=qyVNG7fnubQ 
Riak 
http://docs.basho.com/riak/latest/dev/using/conflict-resolution/ 
Couchbase Sync Gateway 
http://docs.couchbase.com/sync-gateway/ 
http://www.infoq.com/presentations/sync-mobile-data 
API 
http://developers.amiando.com/index.php/REST_API_DataSync 
https://login.syncano.com/docs/rest/index.html
Credits 
phones https://www.flickr.com/photos/15216811@N06/14504964841 
wat http://uturncrossfit.com/wp-content/uploads/2014/04/wait-what.jpg 
darth http://www.listal.com/viewimage/3825918h 
blueprint: http://upload.wikimedia.org/wikipedia/commons/5/5e/Joy_Oil_gas_station_blueprints.jpg 
building: http://s0.geograph.org.uk/geophotos/02/42/74/2427436_96c4cd84.jpg 
brownfield: http://s0.geograph.org.uk/geophotos/02/04/54/2045448_03a2fb36.jpg 
no connection: https://www.flickr.com/photos/77018488@N03/9004800239 
no internet con https://www.flickr.com/photos/roland/9681237793 
vector clocks: http://en.wikipedia.org/wiki/Vector_clock 
crdts: http://www.infoq.com/presentations/problems-distributed-systems

Server side data sync for mobile apps with silex

  • 1.
    Implementing data synchronizationAPI for mobile apps with Silex
  • 2.
    Michele Orselli CTO@Ideato _orso_ micheleorselli / ideatosrl mo@ideato.it
  • 3.
    Agenda scenario designchoices implementation alternative approaches
  • 6.
  • 7.
  • 8.
  • 9.
    Brownfield project Scenario several mobile apps for tracking user generated data (calendar, notes, bio data) iOS & Android ~10 K users steadily growing at 1.2 K/month
  • 10.
    MongoDB Scenario LegacyApp based on Codeigniter Existing RPC-wannabe-REST API for data sync
  • 11.
    For every resource get updates: Scenario POST /m/:app/get/:user_id/:res/:updated_from send updates: POST /m/:app/update/:user_id/:res_id/:dev_id/:res
  • 12.
  • 13.
    Scenario ~6 differentresources, ~12 calls per sync apps sync by polling every 30 sec every call sync little data
  • 14.
    Challenge Rebuild syncAPI for old apps + 2 incoming Enable image synchronization More efficient than previous API
  • 16.
    Existing Solutions Tstamps, Vector clocks, CRDTs syncML, syncano Algorithms Protocols/API Azure Data sync Platform couchDB, riak Storage
  • 17.
    Not Invented Here? Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels J. Atwood
  • 18.
    Architecture 2 differentmobile platforms Several teams with different skill level Changing storage wasn’t an option Forcing a particular technology client side wasn’t an option
  • 19.
    Architecture c1 server c2 c3 sync logic conflicts resolution thin clients
  • 20.
    Implementation In thesync domain all resources are managed in the same way
  • 21.
    Implementation For everyapp: one endpoint for getting new data one endpoint for pushing changes one endpoint for uploading images
  • 22.
    The new APIs GET /apps/:app/users/:user_id/changes[?from=:from] POST /apps/:app/users/:user_id/merge POST /upload/:res_id/images
  • 23.
  • 24.
  • 25.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 26.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 27.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 28.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 29.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 30.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 31.
    Silex Implementation Col1 Col 2 Col 3 Sync Service
  • 32.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/changes”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $syncService = $app[‘syncService’]; $syncService->sync($lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 33.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/changes”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $syncService = $app[‘syncService’]; $syncService->sync($lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 34.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/changes”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $syncService = $app[‘syncService’]; $syncService->sync($lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 35.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/changes”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $syncService = $app[‘syncService’]; $syncService->sync($lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 36.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/merge”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $data = $request->get(‘data’, false); $syncService = $app[‘syncService’]; $syncService->merge($data, $lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 37.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/merge”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $data = $request->get(‘data’, false); $syncService = $app[‘syncService’]; $syncService->merge($data, $lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 38.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/merge”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $data = $request->get(‘data’, false); $syncService = $app[‘syncService’]; $syncService->merge($data, $lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 39.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/merge”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $data = $request->get(‘data’, false); $syncService = $app[‘syncService’]; $syncService->merge($data, $lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 40.
    Silex Implementation $app->get(“/apps/{mApp}/users/{userId}/merge”, function ($mApp, $userId, $app, $request) { $lastSync = $request->get('from', null); $data = $request->get(‘data’, false); $syncService = $app[‘syncService’]; $syncService->merge($data, $lastSync, $userId); $response = new JsonResponse( $syncService->getResult() ); return $response; }
  • 41.
    Silex Implementation $app['mongodb']= new MongoDb(…); $app[‘changesRepo’] = new ChangesRepository( $app[‘mongodb’] ); $app[‘syncService’] ? new SyncService( $app[‘changesRepo’] );
  • 42.
    Get changes GET/apps/:app/users/:user_id/changes?from=:from timestamp?
  • 43.
    Server suggest thesync time timestamp are inaccurate server suggests the “from” parameter to be used in the next request
  • 44.
    Server suggest thesync time GET /changes { ‘next’ : 12345, ‘data’: […] } c1 server
  • 45.
    Server suggest thesync time GET /changes { ‘next’ : 12345, ‘data’: […] } c1 server GET /changes?from=12345 { ‘next’ : 45678, ‘data’: […] }
  • 46.
    operations: {‘op’: ’add’,id: ‘1’, ’data’:[…]} {‘op’: ’update’, id: ‘1’, ’data’:[…]} {‘op’: ’delete’, id: ‘1’} {‘op’: ’add’, id: ‘2’, ’data’:[…]} states: {id: ‘1’, ’data’:[…]} {id: 2’, ’data’:[…]} {id: ‘3’, ’data’:[…]} what to transfer
  • 47.
    what to transfer we choose to transfer states {id: ‘1’, ’type’: ‘measure’, ‘_deleted’: true} {id: 2’, ‘type’: ‘note’} {id: ‘3’, ‘type’: ‘note’} ps: soft delete all the things!
  • 48.
    unique identifiers Howdo we generate an unique id in a distributed system?
  • 49.
    unique identifiers Howdo we generate an unique id in a distributed system? UUID (RFC 4122): several implementations in PHP (https://github.com/ramsey/uuid)
  • 50.
    unique identifiers Howdo we generate an unique id in a distributed system? Local/Global Id: only the server generates GUIDs clients use local ids to manage their records
  • 51.
    unique identifiers POST/merge { ‘data’: [ {’lid’: ‘1’, …}, {‘lid’: ‘2’, …} ] } c1 server { ‘data’: [ {‘guid’: ‘58f0bdd7-1400’, ’lid’: ‘1’, …}, {‘guid’: ‘6f9f3ec9-1400’, ‘lid’: ‘2’, …} ] }
  • 52.
    conflict resolution algorithm(plain data) mobile generated data are “temporary” until sync to server server handles conflicts resolution
  • 53.
    conflict resolution algorithm(plain data) conflict resolution: domain indipendent: e.g. last-write wins domain dipendent: use domain knowledge to resolve
  • 54.
    function sync($data) { foreach ($data as $newRecord) { $s = findByGuid($newRecord->getGuid()); if (!$s) { add($newRecord); send($newRecord); continue; } if ($newRecord->updated > $s->updated) { update($s, $newRecord); send($newRecord); continue; } updateRemote($newRecord, $s); } conflict resolution algorithm (plain data)
  • 55.
    function sync($data) { foreach ($data as $newRecord) { $s = findByGuid($newRecord->getGuid()); if (!$s) { add($newRecord); send($newRecord); continue; } if ($newRecord->updated > $s->updated) { update($s, $newRecord); send($newRecord); continue; } updateRemote($newRecord, $s); } conflict resolution algorithm (plain data)
  • 56.
    function sync($data) { foreach ($data as $newRecord) { $s = findByGuid($newRecord->getGuid()); if (!$s) { add($newRecord); send($newRecord); continue; } if ($newRecord->updated > $s->updated) { update($s, $newRecord); send($newRecord); continue; } updateRemote($newRecord, $s); } conflict resolution algorithm (plain data) no conflict
  • 57.
    function sync($data) { foreach ($data as $newRecord) { $s = findByGuid($newRecord->getGuid()); if (!$s) { add($newRecord); send($newRecord); continue; } if ($newRecord->updated > $s->updated) { update($s, $newRecord); send($newRecord); continue; } updateRemote($newRecord, $s); } conflict resolution algorithm (plain data) remote wins
  • 58.
    function sync($data) { foreach ($data as $newRecord) { $s = findByGuid($newRecord->getGuid()); if (!$s) { add($newRecord); send($newRecord); continue; } if ($newRecord->updated > $s->updated) { update($s, $newRecord); send($newRecord); continue; } updateRemote($newRecord, $s); } conflict resolution algorithm (plain data) server wins
  • 59.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } c1 { ’guid’: ‘af54d’, ‘data’: ‘BBB’, ‘updated’ : ’20’ } server
  • 60.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } POST /merge { ’guid’: ‘af54d’, ‘data’: ‘BBB’, ‘updated’ : ’20’ } c1 server
  • 61.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } POST /merge { ‘guid’: ‘e324f’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } { ’guid’: ‘af54d’, ‘data’: ‘BBB’, ‘updated’ : ’20’ } c1 server
  • 62.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } POST /merge { ‘guid’: ‘e324f’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } { ’guid’: ‘af54d’, ‘data’: ‘BBB’, ‘updated’ : ’20’ } c1 server
  • 63.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } POST /merge { ‘guid’: ‘e324f’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } { ’guid’: ‘af54d’, ‘data’: ‘AAA’, ‘updated’ : ’100’ } c1 server
  • 64.
    conflict resolution algorithm(plain data) { ‘lid’: ‘1’, ‘guid’: ‘af54d’, ‘data’ : ‘AAA’ ‘updated’: ’100’ } { ‘guid’: ‘e324f’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } { ’guid’: ‘af54d’, ‘data’: ‘AAA’, ‘updated’ : ’100’ } { ‘lid’: ‘2’, ‘data’ : ‘hello!’, ‘updated’: ’15’ } POST /merge c1 server {‘ok’ : { ’guid’: ‘af54d’ }} {‘update’ : { lid: ‘2’, ’guid’: ‘e324f’ }}
  • 65.
    conflict resolution algorithm(hierarchical data) How to manage hierarchical data? { ‘lid’ : ‘123456’, ‘type’ : ‘baby’, … } { ‘lid’ : ‘123456’, ‘type’ : ‘temperature’, ‘baby_id : ‘123456’ }
  • 66.
    conflict resolution algorithm(hierarchical data) How to manage hierarchical data? 1) sync root record 2) update ids 3) sync child records { ‘lid’ : ‘123456’, ‘type’ : ‘baby’, … } { ‘lid’ : ‘123456’, ‘type’ : ‘temperature’, ‘baby_id : ‘123456’ }
  • 67.
    conflict resolution algorithm (hierarchical data)
    function syncHierarchical($data) {
        // parent records first
        sortByHierarchy($data);
        foreach ($data as $newRecord) {
            $s = findByGuid($newRecord->getGuid());
            if ($newRecord->isRoot()) {
                if (!$s) {
                    // no conflict: the server has never seen this root record
                    add($newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                    continue;
                }
                if ($newRecord->updated > $s->updated) {
                    // remote wins: the client copy is more recent
                    update($s, $newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                    continue;
                } else {
                    // server wins: the stored copy is more recent
                    updateRecordIds($s, $data);
                    updateRemote($newRecord, $s);
                }
            } else {
                // child records go through the plain sync
                sync($data);
            }
        }
    }
  • 73.
    conflict resolution algorithm (hierarchical data)
    1) c1 sends a root record and its child with POST /merge:
         { 'lid': '1', 'data': 'AAA', 'updated': '100' }
         { 'lid': '2', 'parent': '1', 'data': 'hello!', 'updated': '15' }
    2) server stores the root record and assigns it a guid:
         { 'lid': '1', 'guid': '32ead', 'data': 'AAA', 'updated': '100' }
    3) the child's parent reference is rewritten from the local id to that guid:
         { 'lid': '2', 'parent': '32ead', 'data': 'hello!', 'updated': '15' }
    4) server stores the child record and replies:
         { 'update': { 'lid': '1', 'guid': '32ead' } }
         { 'update': { 'lid': '2', 'guid': 'e324f' } }
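    A minimal sketch of the id rewriting for brand-new records follows. It is an assumption-laden illustration: mergeHierarchical and $newGuid are hypothetical names, only the "no conflict" path is covered, and the hierarchy is assumed to be one level deep (roots are the records without a 'parent' key).

```php
<?php
// Hypothetical sketch: store roots first, then point children at the new guid.
function mergeHierarchical(array &$server, array $incoming, callable $newGuid): array
{
    // Parent records first: records without a 'parent' key sort before the others.
    usort($incoming, function ($a, $b) {
        return isset($a['parent']) <=> isset($b['parent']);
    });

    $lidToGuid = [];
    $responses = [];
    foreach ($incoming as $record) {
        // Update ids: swap the parent's local id for the guid assigned by the server.
        if (isset($record['parent'], $lidToGuid[$record['parent']])) {
            $record['parent'] = $lidToGuid[$record['parent']];
        }
        $guid = $newGuid();
        $lidToGuid[$record['lid']] = $guid;
        $server[$guid] = ['guid' => $guid] + $record;
        $responses[] = ['update' => ['lid' => $record['lid'], 'guid' => $guid]];
    }
    return $responses;
}

// Usage: the child is deliberately listed before its parent.
$guids = ['32ead', 'e324f'];
$server = [];
$responses = mergeHierarchical($server, [
    ['lid' => '2', 'parent' => '1', 'data' => 'hello!', 'updated' => 15],
    ['lid' => '1', 'data' => 'AAA', 'updated' => 100],
], function () use (&$guids) {
    return array_shift($guids);
});
```

    Deeper hierarchies would need a sort by depth instead of the simple root/child split used here.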
  • 77.
    enforcing domain constraints
    e.g. "only one temperature can be registered in a given day"
    How do we enforce domain constraints on data?
    1) relax constraints
    2) integrate constraints in sync algorithm
  • 80.
    enforcing domain constraints
    from findByGuid to findSimilar
    first look up by GUID, then by domain rules
    "two measures are similar if they refer to the same date"
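    The two-stage lookup could look roughly like this. The function body is an assumption: the slides only name findSimilar, and the 'when' field is borrowed from the example records as the date to compare.

```php
<?php
// Hypothetical sketch of findSimilar: GUID first, then the domain rule.
// $server is an array of stored records keyed by guid.
function findSimilar(array $server, array $record): ?array
{
    // First lookup by GUID.
    if (isset($record['guid'], $server[$record['guid']])) {
        return $server[$record['guid']];
    }
    // Then by domain rule: two measures are similar if they refer to the same date.
    foreach ($server as $candidate) {
        if (isset($candidate['when'], $record['when']) && $candidate['when'] === $record['when']) {
            return $candidate;
        }
    }
    return null;
}

// Usage: the client record has no guid yet, but matches on the date.
$server = ['af54d' => ['guid' => 'af54d', 'when' => '20141005']];
$match = findSimilar($server, ['lid' => '1', 'when' => '20141005']);
$noMatch = findSimilar($server, ['lid' => '2', 'when' => '20141006']);
```

    Once a similar record is found, the usual timestamp comparison can decide which copy wins.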
  • 82.
    enforcing domain constraints
    server already holds: { 'guid': 'af54d', 'when': '20141005' }
    c1 creates locally:   { 'lid': '1', 'when': '20141005' }
    c1 sends its record with POST /merge
    server replies:       { 'guid': 'af54d', 'when': '20141005' }
  • 87.
    dealing with binary data
    Binary data uploaded via custom endpoint
    Sync data remains small
    Uploads can be resumed
  • 88.
    dealing with binary data
    Two steps*
    1) data are synchronized
    2) related images are uploaded
    * this means a record can exist without its file for some time
  • 89.
    dealing with binary data
    c1 sends with POST /merge: { 'lid': 1, 'type': 'baby', 'image': 'myimage.jpg' }
    server replies:            { 'lid': 1, 'guid': 'ac435-f8345' }
    c1 then uploads the file:  POST /upload/ac435-f8345/image
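    The two steps can be mimicked with an in-memory store. This is a sketch under assumptions, not the Silex endpoints themselves: mergeRecord, attachImage, and the image_uploaded flag are hypothetical names used to show that a record exists before its file arrives.

```php
<?php
// Hypothetical in-memory sketch of the two-step flow.
function mergeRecord(array &$store, array $record, string $guid): array
{
    // Step 1: only the metadata is synchronized; the image is referenced by name.
    $store[$guid] = [
        'guid' => $guid,
        'type' => $record['type'],
        'image' => $record['image'],
        'image_uploaded' => false, // assumed flag: the file has not arrived yet
    ];
    return ['lid' => $record['lid'], 'guid' => $guid];
}

function attachImage(array &$store, string $guid, string $bytes): bool
{
    // Step 2: the upload is accepted only for a record that was already merged.
    if (!isset($store[$guid])) {
        return false;
    }
    $store[$guid]['image_bytes'] = $bytes;
    $store[$guid]['image_uploaded'] = true;
    return true;
}

// Usage with the values from the slide.
$store = [];
$reply = mergeRecord($store, ['lid' => 1, 'type' => 'baby', 'image' => 'myimage.jpg'], 'ac435-f8345');
$pendingBeforeUpload = !$store['ac435-f8345']['image_uploaded'];
$accepted = attachImage($store, $reply['guid'], 'binary image content');
$rejected = attachImage($store, 'unknown-guid', 'binary image content');
```

    Keeping the flag next to the record lets clients know which images are still pending.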
  • 90.
    What we learned
    Implementing this stuff is tricky
    Explore existing solutions if you can
    Understanding the domain is important
  • 93.
    CRDT
    Conflict-free Replicated Data Types (CRDTs)
    Constraining the types of operations in order to:
    - ensure convergence of changes to shared data by uncoordinated, concurrent actors
    - eliminate network failure modes as a source of error
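    As an illustration of what "constraining the types of operations" buys, here is a grow-only counter (G-Counter), one of the simplest CRDTs. It is not part of the project described in the talk; the function names are made up for this sketch.

```php
<?php
// G-Counter sketch: each replica only increments its own slot,
// and merging takes the per-replica maximum.
function gcIncrement(array $counter, string $replica): array
{
    $counter[$replica] = ($counter[$replica] ?? 0) + 1;
    return $counter;
}

function gcMerge(array $a, array $b): array
{
    foreach ($b as $replica => $count) {
        $a[$replica] = max($a[$replica] ?? 0, $count);
    }
    return $a;
}

function gcValue(array $counter): int
{
    return array_sum($counter);
}

// Usage: two clients count offline, then exchange state in either order.
$c1 = gcIncrement(gcIncrement([], 'c1'), 'c1');
$c2 = gcIncrement([], 'c2');
$mergedOnC1 = gcMerge($c1, $c2);
$mergedOnC2 = gcMerge($c2, $c1);
```

    Because the merge is commutative, associative, and idempotent, both replicas end up with the same value regardless of the order in which they sync.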
  • 94.
    Couchbase Mobile
    Gateways handle sync
    Data flows through channels
    - partition data set
    - authorization
    - limit the data
    Use revision trees
  • 95.
    Riak
    Distributed DB
    Eventual/Strong Consistency
    Data Types
    Configurable conflict resolution
    - db level for built-in data types
    - application level for custom data
  • 96.
    That’s all folks! Questions? Please leave feedback! https://joind.in/12959
  • 97.
    Links
    http://www.objc.io/issue-10/sync-case-study.html
    http://www.objc.io/issue-10/data-synchronization.html
    https://dev.evernote.com/media/pdf/edam-sync.pdf
    http://blog.helftone.com/clear-in-the-icloud/
    http://strongloop.com/strongblog/node-js-replication-mobile-offline-sync-loopback/
    http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en
    http://inessential.com/2014/02/15/vesper_sync_diary_8_the_problem_of_un
    http://culturedcode.com/things/blog/2010/12/state-of-sync-part-1.html
    http://programmers.stackexchange.com/questions/206310/data-synchronization-in-mobile-apps-multiple-devices-multiple-users
    http://bricklin.com/offline.htm
    http://blog.couchbase.com/why-mobile-sync
  • 98.
    Links
    Vector Clocks
      http://basho.com/why-vector-clocks-are-easy/
      http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
      http://basho.com/why-vector-clocks-are-hard/
      http://blog.8thlight.com/rylan-dirksen/2013/10/04/synchronization-in-a-distributed-system.html
    CRDTs
      http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
      http://www.infoq.com/presentations/problems-distributed-systems
      https://www.youtube.com/watch?v=qyVNG7fnubQ
    Riak
      http://docs.basho.com/riak/latest/dev/using/conflict-resolution/
    Couchbase Sync Gateway
      http://docs.couchbase.com/sync-gateway/
      http://www.infoq.com/presentations/sync-mobile-data
    API
      http://developers.amiando.com/index.php/REST_API_DataSync
      https://login.syncano.com/docs/rest/index.html
  • 99.
    Credits
    phones: https://www.flickr.com/photos/15216811@N06/14504964841
    wat: http://uturncrossfit.com/wp-content/uploads/2014/04/wait-what.jpg
    darth: http://www.listal.com/viewimage/3825918h
    blueprint: http://upload.wikimedia.org/wikipedia/commons/5/5e/Joy_Oil_gas_station_blueprints.jpg
    building: http://s0.geograph.org.uk/geophotos/02/42/74/2427436_96c4cd84.jpg
    brownfield: http://s0.geograph.org.uk/geophotos/02/04/54/2045448_03a2fb36.jpg
    no connection: https://www.flickr.com/photos/77018488@N03/9004800239
    no internet con: https://www.flickr.com/photos/roland/9681237793
    vector clocks: http://en.wikipedia.org/wiki/Vector_clock
    crdts: http://www.infoq.com/presentations/problems-distributed-systems