5. So, why is it a problem?
● The user is blocked when offline
● The connection is unreliable (blinking, delayed detection, unstable)
● Data is always changing
As a result, the user may lose data
6. What is different?
● Client UX (non-blocking, responsive, reliable)
● Client has smart caching
● Client partially duplicates server’s logic
● Controlled connectivity
● Smart synchronization
7. Offline-first is a Paradigm
is a distinct set of concepts or thought
patterns, including theories, research
methods, postulates, and standards for
what constitutes legitimate contributions
to a field
8. Who needs these super-smart apps?
● Field workers
● Store observation
● Technical audit
● Medical observation
BUSINESS EFFICIENCY
9. Who needs these super-smart apps?
● Instant messengers
● Social networks
● Document editors
● Media organizers
ALL USERS LOYALTY
16. Data slices: Static Meta-Data
Diagram: Application | Static Meta-Data | Business Data | System Runtime Data
● Configuration & Settings
● Layout Declaration
● Permissions
● Translations
17. Data slices: Business Data
● Dictionaries
● Fetched Data (Read only)
● Entered Data (User Data)
18. Data slices: System Runtime Data
● Logs (action, error, profiling)
● Session Data
● Transaction States (Sync)
24. Optimistic replication algorithm
1. Operation submission (user submission)
2. Propagation to other replicas
3. Scheduling of operations to apply by each replica
4. Conflict resolution (syntactic, semantic)
5. Commitment on a final schedule and conflict resolution result
25. Strong Eventual Consistency (SEC)
is a property of some eventually-consistent
systems: replicas that have received and
applied the same set of updates must
immediately have equivalent state
26. Conflict-free replicated data type (CRDT)
is a type of specially-designed data structure
used to achieve strong eventual consistency
(SEC) and monotonicity (absence of rollbacks)
● operation-based
● state-based
27. Operation-based CRDTs (CmRDT)
commutative replicated data types
The operations are commutative, so can
be received and applied in any order;
however, they are not idempotent, and
additional network protocol guarantees are
required to ensure unique delivery
a * b = b * a
O(a) ≠ O(O(a))
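A minimal sketch of an operation-based counter may make the two properties concrete. The `Increment` and `OpCounter` names and the `(replica_id, seq)` operation id are assumptions made for illustration, not part of the slides. Increments commute, so arrival order does not matter, but applying the same operation twice would double-count. The sketch therefore deduplicates by id, standing in for the unique-delivery guarantee a network protocol would normally provide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Increment:
    """An operation broadcast to every replica (hypothetical format)."""
    replica_id: str
    seq: int      # per-replica sequence number, makes the op id unique
    amount: int


@dataclass
class OpCounter:
    value: int = 0
    seen: set = field(default_factory=set)  # ids of ops already applied

    def apply(self, op: Increment) -> None:
        op_id = (op.replica_id, op.seq)
        if op_id in self.seen:   # duplicate delivery: ignore it
            return
        self.seen.add(op_id)
        self.value += op.amount  # addition commutes, so order is irrelevant


ops = [Increment("a", 1, 5), Increment("b", 1, 2), Increment("a", 2, 1)]

r1, r2 = OpCounter(), OpCounter()
for op in ops:
    r1.apply(op)
for op in reversed(ops + ops[:1]):  # different order plus one duplicate
    r2.apply(op)
```

Both replicas end with the same value even though the second one received the operations in a different order and saw one of them twice.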
29. State-based CRDTs (CvRDT): Interface
➔ query - reads the state of the replica, with no side
effects
➔ update - writes to the replica state in accordance with
certain restrictions (monotonically increase)
➔ merge - merges local state with the state of some
remote replica (commutative, associative, and
idempotent)
a * b = b * a
O(a, b) = O(O(a, b), b)
a*(b*c) = (a*b)*c
33. Operational transformation (OT)
is a technology for supporting a range of
collaboration functionalities in advanced
collaborative software systems. OT was
originally invented for consistency
maintenance and concurrency control in
collaborative editing of plain text documents
Examples: Apache Wave, Google Docs
34. Operational transformation (OT)
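Initial document: “abc”
➔ O1 = Insert[0, "x"] // “abc” becomes “xabc”
(to insert character "x" at position "0")
➔ O2 = Delete[2, "c"] // applied to “xabc” as is, it would delete "b" instead of "c"
(to delete the character "c" at position "2")
➔ O2* = Delete[3, "c"] // O2 transformed against O1: “xabc” becomes “xab”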
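The position shift in this example can be sketched in a few lines. This is only an illustration of one transformation case (a delete adjusted against a concurrent insert that was applied first), not a complete OT algorithm. The `Insert`, `Delete`, `apply`, and `transform_delete_after_insert` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


@dataclass(frozen=True)
class Delete:
    pos: int


def apply(text: str, op) -> str:
    """Apply a single-character insert or delete to a string."""
    if isinstance(op, Insert):
        return text[:op.pos] + op.char + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]


def transform_delete_after_insert(delete: Delete, insert: Insert) -> Delete:
    """Adjust a delete so it can run after a concurrent insert was applied."""
    if insert.pos <= delete.pos:
        return Delete(delete.pos + 1)  # the insert shifted the target right
    return delete


doc = "abc"
o1 = Insert(0, "x")
o2 = Delete(2)  # intended to remove "c" from "abc"

naive = apply(apply(doc, o1), o2)  # removes "b" by mistake
fixed = apply(apply(doc, o1), transform_delete_after_insert(o2, o1))
```

Here `naive` is "xac" and `fixed` is "xab". A real OT system needs a transform for every pair of operation types, plus rules for ties, which is where most of the implementation difficulty lies.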
35. Operational transformation (OT)
Unfortunately, implementing OT sucks. There's a million
algorithms with different tradeoffs, mostly trapped in
academic papers. The algorithms are really hard and time
consuming to implement correctly. ... Wave took 2 years to
write and if we rewrote it today, it would take almost as
long to write a second time.
Joseph Gentle, former Google Wave engineer and author of ShareJS
39. Don’t forget about AUDIT
● Why do we have this value?
● Who did this change?
● When?
● What was the previous value?
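One way to be able to answer these four questions is an append-only change history stored next to the data. The sketch below assumes a hypothetical `AuditedRecord` class and `AuditEntry` fields. A real app would persist the history and sync it along with the data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    field_name: str
    old_value: Any        # what was the previous value?
    new_value: Any
    changed_by: str       # who did this change?
    changed_at: datetime  # when?
    reason: str           # why do we have this value?


class AuditedRecord:
    def __init__(self) -> None:
        self.values: dict = {}
        self.history: list = []  # append-only, never rewritten

    def set(self, field_name, new_value, changed_by, reason, changed_at=None):
        entry = AuditEntry(
            field_name=field_name,
            old_value=self.values.get(field_name),
            new_value=new_value,
            changed_by=changed_by,
            changed_at=changed_at or datetime.now(timezone.utc),
            reason=reason,
        )
        self.history.append(entry)
        self.values[field_name] = new_value

    def last_change(self, field_name) -> Optional[AuditEntry]:
        for entry in reversed(self.history):
            if entry.field_name == field_name:
                return entry
        return None


record = AuditedRecord()
record.set("status", "open", changed_by="alice", reason="created offline")
record.set("status", "closed", changed_by="bob", reason="sync merge")
last = record.last_change("status")
```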
40. References: read theory
● Offline First manifesto http://offlinefirst.org/
● Links to articles, events, samples https://github.com/offlinefirst
● Designing Offline-First Web Apps http://alistapart.com/article/offline-first
● SAY HELLO TO OFFLINE FIRST http://hood.ie/blog/say-hello-to-offline-first.html
● Optimistic replication https://www.wikiwand.com/en/Optimistic_replication
● CRDT http://www.wikiwand.com/en/Conflict-free_replicated_data_type
● OT http://www.wikiwand.com/en/Operational_transformation
● appSync.org: open-source patterns & code for data synchronization in mobile apps http://confluence.tapcrowd.com/pages/viewpage.action;jsessionid=3F4D2C44DBFC46644A7955F82A416DC2?pageId=2262404
● * Free Book “Distributed Systems” by Mikito Takada http://book.mixu.net/distsys/index.html
● Roshi CRDT explanation https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-timestamped-events
● A comprehensive study of CRDTs by Marc Shapiro
https://hal.inria.fr/file/index/docid/555588/filename/techreport.pdf
41. References: source code
● appSync source https://bitbucket.org/nikonelissen/appsync/src
● CRDT Roshi by SoundCloud with great explanation https://github.com/soundcloud/roshi
● CRDT Riak by Basho (League of Legends) https://github.com/basho/
● ShareJS https://github.com/share/ShareJS
● Hoodie https://github.com/hoodiehq/hoodie
42. References: borrowed resources
● Sync types diagram http://confluence.tapcrowd.com/pages/viewpage.action?pageId=2262404
● Free icons found here http://findicons.com/
43. Summary: how to build offline-first apps
● Know the Paradigm
● Plan data structure and synchronization
approaches before development
● Be paranoid about the user’s data
● Develop offline-first applications
TODO:
The chat example should explain how to choose a sync structure for different fields (title, messages, participants)
Explain how important it is to understand these approaches, so as not to end up working around hacks and bugs (my experience during SCollab development)
The current approach to mobile app development (web, hybrid, native) assumes that the client has a connection to the server, or at worst that the connection occasionally breaks and the user can wait until it is restored.
We live in a disconnected & battery powered world, but our technology and best practices are a leftover from the always connected & steadily powered past.
But this approach is incomplete because:
- the connection might blink
- the device might not detect a connection status change immediately
- the connection might be unstable (a request is handled, but the response never reaches the client)
So, to deliver a super-stable experience, the client should be super-smart and the server should understand it.
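The "request handled but response lost" case is a good illustration of why the server has to cooperate. A minimal sketch, assuming a hypothetical in-memory `Server` and `Client`: the client queues each change locally with a unique request id and retries until it gets an answer, and the server remembers processed ids so that a retry is not stored twice.

```python
import uuid


class Server:
    """Stand-in for a backend that remembers processed request ids."""

    def __init__(self):
        self.processed = {}  # request_id -> stored result
        self.records = []

    def handle(self, request_id, payload):
        if request_id in self.processed:  # retry of a handled request
            return self.processed[request_id]
        self.records.append(payload)
        result = {"ok": True, "index": len(self.records) - 1}
        self.processed[request_id] = result
        return result


class Client:
    def __init__(self, server):
        self.server = server
        self.outbox = []  # pending (request_id, payload) pairs

    def submit(self, payload):
        # The user is never blocked: the change is queued locally first.
        self.outbox.append((str(uuid.uuid4()), payload))

    def flush(self, drop_response=False):
        """Try to send everything; keep items whose response was lost."""
        still_pending = []
        for request_id, payload in self.outbox:
            self.server.handle(request_id, payload)
            if drop_response:  # simulate: handled, but the reply was lost
                still_pending.append((request_id, payload))
        self.outbox = still_pending


server = Server()
client = Client(server)
client.submit({"note": "meter reading 42"})
client.flush(drop_response=True)  # server stored it, client does not know
client.flush()                    # retry with the same request id
```

After the retry the server still holds a single record and the client's outbox is empty.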
- Field workers: TV, internet, and cell providers, doctor visits, energy providers, plumbers, etc.
- Store observation
- Technical audit: car, house, and utilities asset audits
- Medical observation
TODO: Insert a picture of mobile device and cell/radio tower
TODO: insert a diagram “start” of offline-first app branches
- User Interface
- User Experience
- Offline Data
- Connectivity
- Synchronization
- Troubleshooting
## Application itself
- scripts
- styles
- media resources
## Static meta-data
- configs and settings (e.g. how to connect to services, client-server time difference)
- layout declaration
- permissions
- translations
## Business data
- dictionaries
- fetched data
- entered data (aka User data)
## System runtime data
- logs (action, error, profiling)
- session data (app state, offline authentication)
- transaction states (data sync, app updates)
Data synchronization is the process of establishing consistency between a source and a target data store, in both directions, and of continuously harmonizing the data over time. It is fundamental to a wide variety of applications, including file synchronization and mobile device synchronization.
One-way sync: data is only synced from the server to the apps (e.g. news app where content is synced from the backend CMS to the apps) or data is synced from the apps to a server (e.g. logging/analytics).
Two-way sync: data is synced in two directions, from an app to a backend and back. E.g. a user is logged in and can manage their own data on a website and in an app (assuming the user cannot be logged in on two devices at the same time).
Multi-way sync: data is synced from multiple devices to a server/backend and back. This also means that data from one device is synced to the server and from the server to other devices (e.g. collaboration apps...).
Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identical to each other, as if there was only a single copy of the data all along.
Optimistic replication does away with this in favor of eventual consistency, meaning that replicas are guaranteed to converge only when the system has been quiesced for a period of time. As a result there is no longer a need to wait for all of the copies to be synchronized when updating data, which helps concurrency and parallelism. The trade-off is that different replicas may require explicit reconciliation later on, which might then prove difficult or even insoluble.
Informally, eventual consistency means that replicas eventually reach the same value if clients stop submitting updates. Eventually consistent systems accept local updates without remote synchronization, improving performance and scalability by sacrificing strong consistency. Without remote synchronization, replicas concurrently hold different values which are expected to converge over time. Convergence is complicated by conflicts which arise when merging values between replicas. A conflict is a combination of concurrent updates which may be individually correct, but taken together violate some system invariant. Conventional conflict-resolution schemes involve state roll-back, full consensus, or even user interaction.
An optimistic replication algorithm consists of five elements:
1. Operation submission: Users submit operations at independent sites.
2. Propagation: Each site shares the operations it knows about with the rest of the system.
3. Scheduling: Each site decides on an order for the operations it knows about.
4. Conflict resolution: If there are any conflicts among the operations a site has scheduled, it must modify them in some way.
5. Commitment: The sites agree on a final schedule and conflict resolution result, and the operations are made permanent.
There are two strategies for propagation: state transfer, where sites propagate a representation of the current state, and operation transfer, where sites propagate the operations that were performed (essentially, a list of instructions on how to reach the new state).
Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on general information, such as when or where an operation was submitted. Semantic systems are able to make use of application-specific information to make smarter decisions. Note that state transfer systems generally have no information about the semantics of the data being transferred, and so they have to use syntactic scheduling and conflict resolution.
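Syntactic scheduling can be sketched as a deterministic sort on "when" and "where" only. The `Operation` fields and the "later write to the same key wins" rule below are assumptions chosen for illustration. A semantic scheduler would instead inspect what each operation means for the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    timestamp: int  # when the operation was submitted
    site_id: str    # where it was submitted; breaks timestamp ties
    key: str
    value: str


def schedule(ops):
    """Syntactic scheduling: order only by when/where, ignoring meaning."""
    return sorted(set(ops), key=lambda op: (op.timestamp, op.site_id))


def replay(ops):
    """Apply the schedule; a later write to the same key wins the conflict."""
    state = {}
    for op in schedule(ops):
        state[op.key] = op.value
    return state


site_a = [Operation(1, "a", "title", "Draft"), Operation(3, "a", "title", "Final")]
site_b = [Operation(2, "b", "title", "Review")]

# Both sites end up knowing the same operations, in different arrival order.
state_at_a = replay(site_a + site_b)
state_at_b = replay(site_b + site_a)
```

Because every site sorts the same set of operations by the same key, both arrive at the same schedule and the same state without talking to each other, at the cost of silently discarding the earlier writes.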
Strong eventual consistency is a property of some eventually-consistent systems: replicas that have received and applied the same set of updates must immediately have equivalent state.
There is no conflict arbitration process, because conflicts do not exist in systems with strong eventual consistency.
CRDTs are used to achieve strong eventual consistency in a distributed system.
State-based CRDTs are called convergent replicated data types, or CvRDTs. In contrast to CmRDTs, CvRDTs send their full local state to other replicas. CvRDTs have the following local interface:
- query: reads the state of the replica, with no side effects
- update: writes to the replica state in accordance with certain restrictions
- merge: merges local state with the state of some remote replica
The merge function must be commutative, associative, and idempotent. It provides a join for any pair of replica states, so the set of all states forms a semilattice. The update function must monotonically increase the internal state, according to the same partial order rules as the semilattice.
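The interface above can be illustrated with a grow-only counter. This is a minimal sketch with a hypothetical `GCounter` class: each replica only ever increases its own slot, and merge takes the element-wise maximum, which is commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments seen from that replica

    def query(self) -> int:
        return sum(self.counts.values())

    def update(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("state may only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.update(2)
b.update(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing (idempotent)
```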
A common strategy in CRDT development is to combine multiple primitive CRDTs into a more complex CRDT. For example, two increment-only counters can be combined to create a CvRDT supporting both increment and decrement operations (a PN-counter). Note that the CvRDT's internal state must increase monotonically, even though its external state as exposed through query can return to previous values.
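A sketch of such a combined counter, assuming a hypothetical `PNCounter` class: one grow-only map records increments, another records decrements, and the visible value is their difference.

```python
class PNCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # increments per replica (grow-only)
        self.n = {}  # decrements per replica (grow-only)

    def increment(self, amount=1):
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decrement(self, amount=1):
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def query(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Element-wise max on both maps, exactly as for a grow-only counter.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


a, b = PNCounter("a"), PNCounter("b")
a.increment(5)
b.decrement(2)
a.merge(b)
b.merge(a)
a.decrement(3)
b.merge(a)
```

The value returned by `query` on replica `a` goes 5, 3, 0, while the `p` and `n` maps only ever grow.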
The grow-only set is a CvRDT implementing a set which only allows adds. Since it is impossible for adds and removes to commute (one must take precedence over the other), any CvRDT supporting both add and remove operations must pick and choose its semantics.
Two grow-only set CvRDTs are combined to create the 2P-set CvRDT. With the addition of a "tombstone" set, elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an element e is in the tombstone set, query will never again return True for that element. The 2P-set uses "remove-wins" semantics, so remove(e) takes precedence over add(e).
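The remove-wins behaviour can be sketched with two Python sets merged by union. The `TwoPhaseSet` class name and method names are assumptions for illustration.

```python
class TwoPhaseSet:
    def __init__(self):
        self.added = set()    # grow-only set of added elements
        self.removed = set()  # grow-only "tombstone" set

    def add(self, element):
        self.added.add(element)

    def remove(self, element):
        if element in self.added:  # only observed elements can be removed
            self.removed.add(element)

    def query(self, element) -> bool:
        return element in self.added and element not in self.removed

    def merge(self, other):
        # Set union is commutative, associative, and idempotent.
        self.added |= other.added
        self.removed |= other.removed


a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("x")
b.merge(a)
b.remove("x")
a.add("x")  # concurrent re-add on replica a
a.merge(b)
b.merge(a)
```

After the merges both replicas report that "x" is absent: the tombstone takes precedence over the re-add.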