
Data Diffing Based Software Architecture Patterns



Clojure has been heralded as a pioneer in data-oriented functional programming. In this talk, Huahai will explore the use of a Clojure data diffing/patching library as a tool to simplify software architecture and solve complex engineering problems. After briefly describing Editscript, a Clojure data diffing/patching library, he will detail several usage patterns, drawing on code examples from our production system.

Huahai will discuss how diffing improves system modularization by reducing namespace dependencies; how it drastically simplifies client-server communication to drive much faster UI iterations; how it enables massive scaling by turning stateful applications into stateless ones; and how it powers collaborative editing of online documents.

This talk is for anyone interested in expanding their data-oriented functional programming toolbox.

Published in: Software


  1. Data Diffing Based Software Architecture Patterns
     Huahai Yang, Juji Inc.
  2. What is diffing?
     • Given two elements a and b, calculate the difference d between them
     • Function (diff a b) ;=> d
     • Function (patch a d) ;=> b
     • Such that (= b (patch a d))
     • Or: (= b (patch a (diff a b)))
     • These normally hold:
        • (not= (diff a b) (diff b a))
        • (= (diff a c) (concat (diff a b) (diff b c)))
        • (< (size d) (min (size a) (size b)))
        • (< (time (patch a d)) (time (diff a b)))
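The diff/patch contract above can be illustrated with a minimal Clojure sketch. This is not Editscript: `diff` and `patch` below are hypothetical toys that only recurse into maps (any other changed value is replaced wholesale), but they emit edits in the same [path op value] shape described on slide 9, so the round-trip property and the asymmetry of diff can be checked directly.

```clojure
;; Hypothetical toy diff/patch for nested maps (NOT Editscript's implementation).
;; Each edit is [path op value]; op is :+ (add), :- (remove) or :r (replace).

(defn diff
  "Returns a vector of edits that turns `a` into `b`."
  ([a b] (diff [] a b))
  ([path a b]
   (cond
     (= a b) []
     (and (map? a) (map? b))
     (vec (concat
           ;; keys only in a are removed
           (for [k (keys a) :when (not (contains? b k))]
             [(conj path k) :-])
           ;; keys only in b are added
           (for [k (keys b) :when (not (contains? a k))]
             [(conj path k) :+ (get b k)])
           ;; keys in both: recurse to find nested changes
           (mapcat #(diff (conj path %) (get a %) (get b %))
                   (filter #(contains? b %) (keys a)))))
     ;; anything else that differs is replaced wholesale
     :else [[path :r b]])))

(defn- dissoc-in
  "Removes the key at the end of `path`."
  [m path]
  (if (= 1 (count path))
    (dissoc m (first path))
    (update-in m (pop path) dissoc (peek path))))

(defn patch
  "Applies `edits` to `a` in order."
  [a edits]
  (reduce (fn [doc [path op v]]
            (cond
              (empty? path) v                 ; root replacement
              (= op :-) (dissoc-in doc path)
              :else (assoc-in doc path v)))   ; :+ and :r
          a edits))

;; Usage
(def a {:name "bot" :settings {:lang "en" :voice :calm}})
(def b {:name "bot" :settings {:lang "zh"} :owner "ann"})

(diff a b)                   ; three edits: add :owner, remove :voice, replace :lang
(= b (patch a (diff a b)))   ;=> true
(= (diff a b) (diff b a))    ;=> false
```

The reverse diff, (diff b a), removes :owner, adds :voice back and replaces :lang with "en", which is why the two directions differ.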
  3. Evolution of diffing (1)
     • The earliest diff was developed by Doug McIlroy on Unix at Bell Labs in 1974
     • Works on text files; the unit of comparison is a line of text
     • Purpose: reduce the storage needed to maintain multiple versions of a file
     • Uses: comparing content, tracking changes, verifying output, version control
  4. Evolution of diffing (2)
     • Diffing in 3D graphics programming
     • World modeled as a scene graph
     • Only changed subtrees are re-rendered
     • Purpose: performance optimization
     • Conceptually simple programming model: render everything
     • Inspired React.js
     • ClojureScript wrappers of React can be faster than React itself, because with immutable data an unchanged subtree can be detected by a cheap reference-equality check, which makes diffing faster
  5. Evolution of diffing (3)
     • Data-oriented programming
     • Data, not text
     • Data are directly meaningful to code; no parsing or decoding needed
     • Generic data literals, not specialized, opaque programming constructs
     • Diff input and output are both data
     • Diffing as a software architecture consideration, not just an implementation detail, impacting:
        • Delineation of system components
        • Data model design
        • API design
  6. Diffing enables decoupling
     • diff and patch functions are generic and blind
     • They don't have to understand their input in order to work
     • The semantic asymmetry between sender and receiver enforces separation of concerns
     • Also supports a kind of natural encapsulation, not forced as in OOP
     • d is still open to inspection if the receiver chooses
     • Graded: the receiver doesn't need to know much, but can know a lot if it chooses to
     Diagram: the Sender computes (diff a a') ;=> d and sends d to the Receiver, which computes (patch a d) ;=> a'
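The sender/receiver split can be sketched as follows, assuming the receiver gets edits as plain [path op value] vectors. The `receive` function and its optional `prefix`/`on-edit` arguments are hypothetical: with two arguments the receiver patches blindly, and with four it also inspects the edits under a path it cares about, which is the "graded" knowledge the slide describes.

```clojure
;; Hypothetical receiver: edits arrive as plain data, [path op value].
(defn apply-edit [m [path op v]]
  (case op
    :- (if (= 1 (count path))
         (dissoc m (first path))
         (update-in m (pop path) dissoc (peek path)))
    (:+ :r) (if (empty? path) v (assoc-in m path v))))

(defn receive
  "Blindly patches `m`; with `prefix` and `on-edit`, also inspects edits under `prefix`."
  ([m edits] (reduce apply-edit m edits))
  ([m edits prefix on-edit]
   (doseq [[path :as edit] edits
           :when (= prefix (take (count prefix) path))]
     (on-edit edit))
   (receive m edits)))

;; Usage: this receiver only cares about :settings
(def config-doc {:settings {:lang "en"} :title "Intro"})
(def d [[[:settings :lang] :r "zh"] [[:title] :r "Welcome"]])
(def seen (atom []))
(def result (receive config-doc d [:settings] #(swap! seen conj %)))
;; result => {:settings {:lang "zh"}, :title "Welcome"}
;; @seen  => [[[:settings :lang] :r "zh"]]
```

The receiver never needs to know what :title means to apply it; knowledge of the data is opt-in.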
  7. Diffing encourages data model reuse
     • Thanks to diffing, duplicating data between components is faithful and cheap
     • It pays to reuse the same data model throughout the system, which dramatically simplifies it
  8. Diffing tracks changes
     • Thanks to diffing, each version of the world state can be cheaply saved, and the diffs replayed to recover earlier versions
     • Application state can be externalized and managed
  9. Editscript: a Clojure data diffing library
     • Works for vectors, lists, sets and maps
     • Edits are a vector of vectors, each containing:
        • Path
        • Op: :+ (add), :- (remove), or :r (replace)
        • Value
     • Diffing algorithms:
        • Quick: fast
        • A*: optimal diff size
  10. Case study: Juji Studio UI redesign
     • Complete UI redesign
     • Re-implementation
     • One-month turnaround
     • Made possible mainly by switching from a resource-oriented API to a diffing-based API
     Screenshot: before
  11. Case study: Juji Studio UI redesign (same text as slide 10)
     Screenshot: after
  12. UI data model: config doc
     • Single Page Application (SPA) in ClojureScript
     • State kept in an EDN document: the config doc
     • SPA, server and DB all hold copies of the config doc
     Diagram: the SPA, server and DB each hold a copy of the config doc; the SPA talks to the server through a GraphQL API
  13. Traditional GraphQL API
     • Resource-oriented (RESTful)
     • The server-side config doc is the source of truth
     • The API is CRUD on server resources, i.e. paths in the config doc
     • Repetitive CRUD calls for each and every type of node
     • Thousands of lines of Lacinia schema
  14. Diffing-based GraphQL API
     • All logic is in the SPA
     • The API is CRUD on the config doc as a whole
     • Updates are sent as diffs:
        • The SPA periodically sends the server (diff doc-prev doc-now)
        • The server applies the diff, saves the doc in the DB, and replies with the config doc's SHA
        • The SPA validates the SHA; if it differs, the SPA sends the full config doc to overwrite the server's copy
     • All API calls on individual paths and nodes were removed
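The update loop on this slide can be sketched with in-memory stand-ins. Assumptions: `server-doc` is a hypothetical atom standing in for both the server and the DB, the diff only compares top-level keys, and Clojure's value-based `hash` replaces the SHA (a real system would hash a canonical serialization of the doc). If the server's copy has drifted, the hash check fails and the SPA falls back to a full overwrite.

```clojure
;; Hypothetical top-level diff: edits are [[k] op value].
(defn diff [a b]
  (vec (concat
        (for [k (keys a) :when (not (contains? b k))] [[k] :-])
        (for [[k v] b :when (not= v (get a k ::none))]
          [[k] (if (contains? a k) :r :+) v]))))

(defn patch [m edits]
  (reduce (fn [m [[k] op v]] (if (= op :-) (dissoc m k) (assoc m k v)))
          m edits))

;; Stand-in for the server and its DB.
(def server-doc (atom {}))

(defn server-update!
  "Server side: apply the diff, 'save' the doc, reply with a hash of it."
  [edits]
  (hash (swap! server-doc patch edits)))

(defn sync!
  "SPA side: send (diff doc-prev doc-now); overwrite the server copy if hashes differ."
  [doc-prev doc-now]
  (when (not= (server-update! (diff doc-prev doc-now)) (hash doc-now))
    (reset! server-doc doc-now))
  doc-now)

;; Usage
(def v1 {:title "Survey" :questions {:q1 "Name?"}})
(def v2 (assoc v1 :title "Onboarding survey"))
(sync! {} v1)
(sync! v1 v2)
(reset! server-doc {:stale true})            ; simulate a lost update on the server
(def v3 (assoc-in v2 [:questions :q2] "Email?"))
(sync! v2 v3)                                ; hash mismatch triggers a full overwrite
(= v3 @server-doc)                           ;=> true
```

Normally only the small diff crosses the wire; the full document is sent only when the hashes disagree.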
  15. Case study: externalize application states
     • How do you scale a highly stateful application?
     • E.g. Juji starts an agent (rep) for each chat session on a server node; each rep's state is stored in an atom
     • What if the server node becomes unavailable?
     Diagram: an API Gateway routing chat traffic to a Server Node
  16. Case study: externalize application states
     • Each rep sends the diff of its state to a persistent log (e.g. Kafka)
     • E.g. at each utterance, the rep sends (diff state-prev state-now)
     • When a server becomes unavailable, the API gateway forwards traffic to another server, which recovers the agent state from the persistent log by sequentially applying all diffs to a shared initial state
     Diagram: Server Nodes behind an API Gateway send diffs to a Persistent Log
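A minimal sketch of the recovery path, assuming rep state is a flat map. A map from session id to a vector of diffs, held in an atom, stands in for the persistent log (Kafka in the slide); `rep-step!`, `recover` and `initial-state` are hypothetical names. Recovery is just a `reduce` of `patch` over the logged diffs, starting from the shared initial state.

```clojure
;; Hypothetical top-level map diff/patch standing in for Editscript.
(defn diff [a b]
  (vec (concat
        (for [k (keys a) :when (not (contains? b k))] [[k] :-])
        (for [[k v] b :when (not= v (get a k ::none))]
          [[k] (if (contains? a k) :r :+) v]))))

(defn patch [m edits]
  (reduce (fn [m [[k] op v]] (if (= op :-) (dissoc m k) (assoc m k v)))
          m edits))

;; Session id -> vector of diffs; stands in for a persistent log such as Kafka.
(def persistent-log (atom {}))

(def initial-state {:turn 0 :topic nil})

(defn rep-step!
  "Updates a rep's state with `f` and appends (diff state-prev state-now) to the log."
  [session state f]
  (let [[prev now] (swap-vals! state f)]
    (swap! persistent-log update session (fnil conj []) (diff prev now))
    now))

(defn recover
  "Rebuilds a session's state on another node by replaying its diffs in order."
  [session]
  (reduce patch initial-state (get @persistent-log session)))

;; Usage
(def rep (atom initial-state))
(rep-step! "s1" rep #(assoc % :turn 1 :topic :greeting))
(rep-step! "s1" rep #(update % :turn inc))
(= @rep (recover "s1"))   ;=> true
```

Because the node itself keeps nothing that cannot be rebuilt from the log, it becomes effectively stateless from the gateway's point of view.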
  17. Case study: reduce component dependency
     • Stateful components depend on one another
     • Introducing user-invokable system functions, e.g. (juji.func.system/cleanup-chat rep), leads to circular dependencies
     Diagram: dependencies among the System, Reps, Rep, Subs and func.system components, keyed by [:rt jujiid]
  18. Instead of depending on the namespaces that contain subscriptions:
     • Watch the reps atom
     • Inspect the diff between its old and new values
     • Handle the case when a rep is removed or cleaned up, i.e. send a :user-left message to the channels and let the subscriptions clean themselves up
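This pattern maps onto Clojure's `add-watch`: the watch function receives the atom's old and new values, so it can work out which reps disappeared without the reps namespace requiring the subscription namespaces. In the sketch below, `reps`, `sent` and the [:user-left id] message shape are assumptions, and `sent` stands in for the real channels. Only removal is handled; a "cleaned" rep could be detected the same way by comparing its old and new state.

```clojure
(require '[clojure.set :as set])

;; Hypothetical: reps keyed by chat id.
(def reps (atom {}))

;; Stand-in for the subscription channels; collects the messages sent to them.
(def sent (atom []))

;; The watch sees old and new values, so it can diff them without the reps
;; namespace depending on the namespaces that hold subscriptions.
(add-watch reps ::cleanup
  (fn [_key _ref old new]
    (doseq [id (set/difference (set (keys old)) (set (keys new)))]
      (swap! sent conj [:user-left id]))))

;; Usage
(swap! reps assoc "c1" {:user "ann"} "c2" {:user "bo"})
(swap! reps dissoc "c1")
@sent   ;=> [[:user-left "c1"]]
```

The dependency now points one way only: subscriptions listen for :user-left, and nothing in the reps code needs to know they exist.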
  19. Case study: synchronize collaborative editing
     • Multiple parties send diffs
     • Copies go out of sync when diffs cross paths
     • A difficult yet common problem
     • E.g. letting multiple users edit the same chat at the same time
     • Locking has bad UX
     • Three-way merge has high latency
     Diagram: two parties both start from A; one sends (diff A A'), the other sends (diff A A'')
  20. Differential Synchronization
     • A diffing-based synchronization method
     • Scalable
     • Fault-tolerant
     • Low latency
     • Developed by Neil Fraser in 2009
     • Used by Google Docs
  21. • Client-server case
      • Use two shadows
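The client-server cycle with two shadows can be sketched as below. Assumptions: documents are flat maps rather than text, so a key-level diff/patch stands in for Fraser's text diff and fuzzy patch, and everything runs in one process with atoms in place of the network. Each side diffs its text against its shadow, copies the text into the shadow, and sends the edits; the receiver patches both its shadow and its live text.

```clojure
;; Hypothetical flat-map diff/patch standing in for text diff and fuzzy patch.
(defn diff [a b]
  (vec (concat
        (for [k (keys a) :when (not (contains? b k))] [k :-])
        (for [[k v] b :when (not= v (get a k ::none))] [k :r v]))))

(defn patch [m edits]
  (reduce (fn [m [k op v]] (if (= op :-) (dissoc m k) (assoc m k v))) m edits))

;; Each side holds its live text and a shadow of what the other side last saw.
(defn new-side [doc] (atom {:text doc :shadow doc}))

(defn send-edits!
  "Diffs text against shadow, copies text into shadow, returns the edits."
  [side]
  (let [[old _] (swap-vals! side (fn [s] (assoc s :shadow (:text s))))]
    (diff (:shadow old) (:text old))))

(defn receive-edits!
  "Patches both the shadow and the live text with incoming edits."
  [side edits]
  (swap! side (fn [s] (-> s (update :shadow patch edits) (update :text patch edits)))))

(defn sync-cycle!
  "One round trip: client -> server, then server -> client."
  [client server]
  (receive-edits! server (send-edits! client))
  (receive-edits! client (send-edits! server)))

;; Usage: concurrent edits to different keys
(def client (new-side {:title "A" :body "x"}))
(def server (new-side {:title "A" :body "x"}))
(swap! client assoc-in [:text :title] "B")
(swap! server assoc-in [:text :body] "y")
(sync-cycle! client server)
(:text @client)   ;=> {:title "B", :body "y"}
```

After one round trip, both texts and both shadows agree, even though neither side locked the document.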
  22. • Fault-tolerant case
      • Keep a backup shadow
  23. • Scaling
  24. Data modeling guideline: Don't use vectors
     • Minimize unnecessary use of ordered data structures, e.g. vectors or lists
     • Diffing algorithms are slow on ordered data, because order is a strong constraint to satisfy
        • Ordered: O(mn) vs. unordered: O(m+n)
     • The implicit order of data elements is often a source of incidental complexity
     • Meaningful order is often based on data fields
     • Sets or maps suffice in most cases
     Bad: [ {} {} {} … ]
     Good: { {} {} {} … } or #{ {} {} {} … }
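The cost of implicit order shows up even without a diffing library. The sketch below uses hypothetical rows carrying an :id field and inserts one row at the front, then compares a naive position-by-position comparison with a comparison keyed by :id. An order-aware algorithm can recover the single insertion, but only by aligning the two sequences, which is the expensive step the slide refers to; keying rows by a data field makes the change obvious at no extra cost.

```clojure
;; Hypothetical rows whose identity is carried by an :id field.
(def v1 [{:id 1 :q "Name?"} {:id 2 :q "Age?"} {:id 3 :q "City?"}])
(def v2 (into [{:id 0 :q "Email?"}] v1))   ; one row inserted at the front

(defn positional-changes
  "Naive index-by-index comparison: positions whose element differs, plus length change."
  [a b]
  (+ (count (filter true? (map not= a b)))
     (Math/abs (long (- (count a) (count b))))))

(defn by-id [rows] (into {} (map (juxt :id identity)) rows))

(defn keyed-changes
  "Ids that were added, removed or changed once rows are keyed by :id."
  [a b]
  (let [ma (by-id a) mb (by-id b)]
    (count (filter #(not= (get ma %) (get mb %))
                   (distinct (concat (keys ma) (keys mb)))))))

(positional-changes v1 v2)   ;=> 4
(keyed-changes v1 v2)        ;=> 1
```

Every existing row "moved" in the vector view, while the keyed view correctly reports a single new row.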
  25. Conclusion
     • Diffing offers a few properties that lead to:
        • Simplified software architecture
        • Enhanced system decoupling
        • Easier scaling of stateful applications
        • A better solution to the data synchronization problem
     • Diffing-based software architecture is worth considering
        • Particularly for data-oriented programming
  26. Thank you!
     • Huahai Yang @huahaiy
     • Juji Inc.