Data Diffing Based Software
Architecture Patterns
Huahai Yang
Juji Inc.
What is diffing?
• Given two elements a and b, calculate the difference d between
them
• Function (diff a b) ;=> d
• Function (patch a d)
• Such that (= b (patch a d))
• Or: (= b (patch a (diff a b)))
• These are normally true:
• (not= (diff a b) (diff b a))
• (= (diff a c) (concat (diff a b) (diff b c)))
• (< (size d) (min (size a) (size b)))
• (< (time (patch a d)) (time (diff a b)))
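The round-trip property above can be illustrated with a minimal Python sketch. It is not the Editscript API: the edit shape (path, op, value) and the restriction to nested dicts are assumptions made here for illustration only.

```python
import copy

# Hypothetical edit shape for this sketch: (path, op, value),
# with op one of "+", "-", "r" (add, remove, replace).

def diff(a, b, path=()):
    """Return a list of edits that turns nested dict a into nested dict b."""
    edits = []
    for key in a:
        if key not in b:
            edits.append((path + (key,), "-", None))
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            edits.extend(diff(a[key], b[key], path + (key,)))
        elif a[key] != b[key]:
            edits.append((path + (key,), "r", b[key]))
    for key in b:
        if key not in a:
            edits.append((path + (key,), "+", b[key]))
    return edits

def patch(a, edits):
    """Apply edits to a deep copy of a and return the result."""
    result = copy.deepcopy(a)
    for path, op, value in edits:
        parent = result
        for key in path[:-1]:
            parent = parent[key]
        if op == "-":
            del parent[path[-1]]
        else:  # "+" and "r" both set the value at the path
            parent[path[-1]] = copy.deepcopy(value)
    return result

a = {"name": "rep-1", "chat": {"turn": 1, "topic": "intro"}}
b = {"name": "rep-1", "chat": {"turn": 2}, "done": False}
d = diff(a, b)
restored = patch(a, d)  # equals b, i.e. (= b (patch a (diff a b)))
```

Here d touches only the paths that changed, so it stays smaller than either document, and patch never needs to know what the keys mean.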
Evolution of diffing (1)
• Earliest diff was developed by Doug
McIlroy on Unix at Bell Labs in 1974
• Works on text files; work units are lines
of text
• Purpose: Reduce storage necessary to
maintain multiple versions of file.
• Use: compare content, track changes,
verify output, version control
Evolution of diffing (2)
• Diffing in 3D graphics programming
• World modeled as a scene graph
• Only re-render changed subtrees
• Purpose: performance optimization
• Conceptually simple programming
model: render everything
• Inspired React.js
• A ClojureScript wrapper of React could
be faster than React due to faster
diffing with immutable data
Evolution of diffing (3)
• Data oriented programming
• Data, not text
• Data are directly meaningful for code, no need for parsing or decoding
• Generic data literals, not specialized opaque programming constructs
• Diff input and output are both data
• Diffing as a software architecture consideration, not just an
implementation detail, impacting
• Delineation of system components
• Data model design
• API design
Diffing enables decoupling
• diff & patch functions are generic and blind
• They don't have to understand their input for them to work
• Semantic asymmetry between sender and receiver enforces separation of
concerns
• Also supports a kind of natural encapsulation, not forced like in OOP
• d is still open for inspection if the receiver chooses to inspect it
• Graded: the receiver doesn't need to know a lot, but can know a lot if it chooses to
Sender: (diff a a') ;=> d
(d is sent from sender to receiver)
Receiver: (patch a d) ;=> a'
Diffing encourages data model reuse
• Thanks to diffing, data duplication between components is faithful and
cheap
• It is advantageous to reuse the same data model throughout the system,
dramatically simplifying the system
Diffing tracks changes
• Thanks to diffing, each version of the
world state can be cheaply saved
and replayed to recover originals
• Application statefulness can be
externalized and managed
Editscript: a Clojure data
diffing library
• https://github.com/juji-io/editscript
• Works for vector, list, set and map
• Edits are a vector of vectors:
• Path
• Op :+, :-, or :r
• Value
• Diffing algorithms
• Quick: fast
• A* : optimal diff size
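To make the edit format concrete, here is a small Python sketch that applies edits shaped as [path, op, value] to nested lists and dicts. The ops mirror the :+, :-, :r listed above, but the function name apply_edits, the sequential application order, and the list-insertion behaviour of :+ are assumptions of this sketch, not a description of how Editscript itself works.

```python
import copy

def apply_edits(data, edits):
    """Apply [path, op, value] edits, one after another, to a deep copy of data."""
    result = copy.deepcopy(data)
    for edit in edits:
        path, op = edit[0], edit[1]
        parent = result
        for key in path[:-1]:
            parent = parent[key]
        last = path[-1]
        if op == ":-":
            del parent[last]                  # list index or dict key
        elif op == ":r":
            parent[last] = edit[2]            # replace in place
        elif op == ":+":
            if isinstance(parent, list):
                parent.insert(last, edit[2])  # shifts later elements right
            else:
                parent[last] = edit[2]
        else:
            raise ValueError(f"unknown op: {op}")
    return result

doc = {"title": "Welcome", "questions": ["name", "age"]}
edits = [
    [["title"], ":r", "Hello"],
    [["questions", 1], ":-"],
    [["questions", 1], ":+", "email"],
    [["lang"], ":+", "en"],
]
patched = apply_edits(doc, edits)
```

Because the edits are plain data, they can be logged, sent over the wire, or inspected before being applied.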
Case study: Juji Studio UI Re-design
• Complete UI redesign
• Re-implementation
• One month
turnaround
• Mainly due to
switching from a
resource-oriented API
to a diffing based API
[Screenshots: Juji Studio UI before and after the redesign]
UI Data model: config doc
• Single Page Application (SPA) in cljs
• States in an EDN document – config doc
• The SPA, server and DB all hold copies of
the config doc
[Diagram: the SPA, server and DB each hold a copy of the config doc; the SPA and server communicate through a GraphQL API]
Traditional GraphQL API
• Resource oriented
(RESTful)
• Server side config doc is
the truth
• API is CRUD on server
resources
• i.e. paths in the config
doc
• Repetitive CRUD calls for
each and every type of
node
• Thousands of lines of Lacinia
schema
Diffing based GraphQL API
• All logic is in SPA
• API is CRUD on config doc
• Update is sending diffs
• SPA periodically sends to
server:
(diff doc-prev doc-now)
• Server applies the diff, saves
the doc in DB, replies with
config doc SHA
• SPA validates the SHA; if it
differs, the SPA sends the full config
doc to overwrite the server copy
• Removed all API calls on
paths and nodes
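The exchange described above can be sketched in Python as follows. Everything here is illustrative: the Server class stands in for the server plus DB, the flat key-level diff is a simplification, and SHA-256 over canonical JSON is an assumption (the slides only say "SHA").

```python
import copy
import hashlib
import json

def doc_sha(doc):
    """Hash a canonical JSON form so both sides hash identical bytes (assumed scheme)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def diff(old, new):
    """Simplified top-level diff: which keys were removed, which were added or changed."""
    removed = [k for k in old if k not in new]
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    return {"removed": removed, "changed": changed}

def patch(doc, d):
    result = {k: v for k, v in doc.items() if k not in d["removed"]}
    result.update(copy.deepcopy(d["changed"]))
    return result

class Server:
    """Hypothetical server: holds the config doc and answers with its SHA."""
    def __init__(self, doc):
        self.doc = copy.deepcopy(doc)

    def apply_diff(self, d):
        self.doc = patch(self.doc, d)
        return doc_sha(self.doc)

    def overwrite(self, doc):
        self.doc = copy.deepcopy(doc)
        return doc_sha(self.doc)

def sync(server, doc_prev, doc_now):
    """Send a diff; if the returned SHA differs, send the whole doc. Returns True on fallback."""
    sha = server.apply_diff(diff(doc_prev, doc_now))
    if sha != doc_sha(doc_now):
        server.overwrite(doc_now)
        return True
    return False

v1 = {"title": "A"}
v2 = {"title": "B", "lang": "en"}
server = Server(v1)
fell_back = sync(server, v1, v2)  # normal case: the diff alone is enough
```

The full-document overwrite is only a safety net; in the common case the request carries just the diff.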
Case study: externalize application states
• How to scale a highly stateful application?
• E.g. Juji initiates an agent (rep) for each chat session on a server node, the
state of each rep is stored in an atom
• What if the server node becomes unavailable?
[Diagram: API gateway in front of a server node]
Case study: externalize application states
• Each rep sends diff of its state to a persistent log (e.g. Kafka)
• E.g. at each utterance, rep sends (diff state-prev state-now)
• When a server becomes unavailable, the API gateway forwards traffic to
another server, which recovers the agent state from the persistent
log by sequentially applying all diffs to a shared initial state.
[Diagram: server nodes behind the API gateway send diffs to the persistent log]
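Recovery by replay amounts to a fold of patch over the logged diffs. The Python sketch below assumes a plain list as a stand-in for the persistent log (no Kafka client is involved), a flat state dict, and hypothetical names such as Rep, INITIAL_STATE and recover.

```python
import copy
from functools import reduce

INITIAL_STATE = {"turn": 0, "topic": None}  # shared initial state (assumed shape)

def diff(old, new):
    """Simplified top-level diff of two flat state dicts."""
    unset = [k for k in old if k not in new]
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    return {"unset": unset, "set": changed}

def patch(state, d):
    result = {k: v for k, v in state.items() if k not in d["unset"]}
    result.update(copy.deepcopy(d["set"]))
    return result

class Rep:
    """Hypothetical agent that appends a diff to the log on every state change."""
    def __init__(self, log):
        self.state = copy.deepcopy(INITIAL_STATE)
        self.log = log  # a list standing in for the persistent log

    def update(self, **changes):
        prev = self.state
        self.state = {**prev, **changes}
        self.log.append(diff(prev, self.state))

def recover(log):
    """Rebuild the state on another node by applying all diffs in order."""
    return reduce(patch, log, copy.deepcopy(INITIAL_STATE))

log = []
rep = Rep(log)
rep.update(turn=1, topic="greeting")
rep.update(turn=2)
rep.update(topic="pricing", user="u42")
recovered = recover(log)
```

A production log would also need ordering guarantees and periodic snapshots so that replay does not grow without bound; both are outside this sketch.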
Case study: reduce component dependency
• Stateful components depend on one another
• Introducing user-invokable system functions
leads to circular dependencies, e.g.
(juji.func.system/cleanup-chat rep)
[Diagram: dependencies among System, Reps, Rep, Subs and func.system; path label [:rt jujiid]]
• Instead of depending on
namespaces that contain
subscriptions
• Watch reps atom
• Inspect its diff between old
and new
• Handle the case when a rep
is removed or cleaned
• i.e. send a :user-left
message to channels, and let
the subscriptions clean
themselves up
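The watch-and-diff idea can be sketched in Python with a small stand-in for a Clojure atom. The Atom class, the channels dict, and the "user-left" string are all hypothetical; the point is only that the watcher looks at the difference between the old and new reps map and needs no reference to the subscription code.

```python
class Atom:
    """Tiny, non-thread-safe stand-in for a Clojure atom with watches."""
    def __init__(self, value):
        self._value = value
        self._watches = {}

    def deref(self):
        return self._value

    def add_watch(self, key, fn):
        self._watches[key] = fn

    def swap(self, fn, *args):
        old = self._value
        new = fn(old, *args)  # fn must return a new value, not mutate old
        self._value = new
        for key, watch in self._watches.items():
            watch(key, old, new)
        return new

def notify_removed(channels):
    """Build a watch that messages the channel of every rep that disappeared."""
    def watch(_key, old, new):
        for rep_id in old.keys() - new.keys():  # the relevant part of the diff
            channels.setdefault(rep_id, []).append("user-left")
    return watch

def remove_rep(reps, rep_id):
    return {k: v for k, v in reps.items() if k != rep_id}

channels = {}
reps = Atom({"r1": {"turn": 3}, "r2": {"turn": 1}})
reps.add_watch("cleanup", notify_removed(channels))
reps.swap(remove_rep, "r1")
```

Whoever removes a rep only touches the reps map; cleanup follows from the observed change rather than from a direct call into another component.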
Case study: synchronize collaborative editing
• Multiple parties sending diffs
• Out of sync when diffs cross paths in transit
• Difficult yet common problem
• E.g. enabling multiple users to edit the same
chat at the same time
• Locking has bad UX
• Three-way merge has high latency
[Diagram: two parties both start from A; one sends (diff A A'), the other sends (diff A A'')]
Differential Synchronization
• Diffing based synchronization
method
• Scalable
• Fault-tolerant
• Low latency
• Developed by Neil Fraser in
2009
• Used by Google Docs
• Client-server case
• Use two shadows
• Fault-tolerant case
• Keep a backup shadow
• Scaling
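A much simplified Python sketch of the client-server case with two shadows follows. It assumes flat dict documents, a best-effort patch, and a reliable connection; version numbers, the backup shadow, and fuzzy text patching from the full method are left out, and the names Endpoint and sync_round are hypothetical.

```python
def diff(old, new):
    """Simplified top-level diff of two flat dicts."""
    unset = [k for k in old if k not in new]
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    return {"unset": unset, "set": changed}

def patch(doc, d):
    """Best-effort patch: keys that are already gone are simply skipped."""
    result = {k: v for k, v in doc.items() if k not in d["unset"]}
    result.update(d["set"])
    return result

class Endpoint:
    """One side of the link: its live text plus a shadow of what the peer last saw."""
    def __init__(self, doc):
        self.text = dict(doc)
        self.shadow = dict(doc)

    def outgoing(self):
        edits = diff(self.shadow, self.text)  # local changes since the last sync
        self.shadow = dict(self.text)
        return edits

    def incoming(self, edits):
        self.shadow = patch(self.shadow, edits)  # shadow tracks the peer
        self.text = patch(self.text, edits)      # live text keeps local edits too

def sync_round(client, server):
    server.incoming(client.outgoing())
    client.incoming(server.outgoing())

start = {"title": "Hi"}
client, server = Endpoint(start), Endpoint(start)
client.text["greeting"] = "Hello"  # edit made on the client
server.text["lang"] = "en"         # concurrent edit made on the server
sync_round(client, server)
```

Neither side locks or waits for the other; each only ever diffs its own text against its own shadow. When both sides change the same key, this sketch lets the later patch win.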
Data modeling guideline: Don’t use vector
• Minimize unnecessary use of ordered data structures, e.g. vectors or
lists
• Diffing algorithms are slow for ordered data, because order is a strong
constraint to satisfy
• Ordered O(mn) vs. Unordered O(m+n)
• The implicit order of data elements is often a source of incidental complexity
• Meaningful order is often based on data fields
• Sets or maps suffice in most cases
Bad: [ {} {} {} … ]
Good: { {} {} {} … } or #{ {} {} {} … }
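The cost difference can be seen in a small Python sketch. The unordered diff uses hash lookups, while the ordered diff here uses the textbook longest-common-subsequence table, which fills (m+1) x (n+1) cells. The LCS approach is chosen for illustration only and is not claimed to be the algorithm any particular library uses.

```python
def diff_unordered(a, b):
    """Set diff via hashing: roughly O(m + n) on average."""
    a, b = set(a), set(b)
    return {"removed": a - b, "added": b - a}

def lcs_table(a, b):
    """Classic LCS dynamic-programming table: O(m * n) time and space."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table

def diff_ordered(a, b):
    """Backtrack through the LCS table to list insertions and deletions."""
    table = lcs_table(a, b)
    i, j = len(a), len(b)
    edits = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            i, j = i - 1, j - 1                     # element kept
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            edits.append(("+", j - 1, b[j - 1]))    # inserted at index j-1 of b
            j -= 1
        else:
            edits.append(("-", i - 1, a[i - 1]))    # deleted from index i-1 of a
            i -= 1
    edits.reverse()
    return edits

old = ["q1", "q2", "q3"]
new = ["q0", "q1", "q3"]
unordered = diff_unordered(old, new)
ordered = diff_ordered(old, new)
```

When order carries no meaning, keying the elements in a map or set lets the diff skip the table entirely.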
Conclusion
• Diffing offers a few properties that lead to
• Simplified software architecture
• Enhanced system decoupling
• Easier scaling of stateful applications
• Better solutions to data synchronization problems
• Worthwhile to consider diffing based software architecture
• Particularly for data-oriented programming
Thank you!
• Huahai Yang @huahaiy
• Juji Inc. https://juji.io

Data Diffing Based Software Architecture Patterns

  • 1. Data Diffing Based Software Architecture Patterns Huahai Yang Juji Inc.
  • 2. What is diffing? • Given two elements a and b, calculate the difference d between them • Function (diff a b) ;=> d • Function (patch a d) • Such that (= b (patch a d)) • Or: (= b (patch a (diff a b))) • These are normally true: • (not= (diff a b) (diff b a)) • (= (diff a c) (concat (diff a b) (diff b c))) • (< (size d) (min (size a) (size b))) • (< (time (patch a d)) (time (diff a b)))
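The diff/patch contract on this slide can be sketched with Clojure sets, where the difference is easy to read. The `diff` and `patch` below are hand-written illustrations for this one data type only; they are assumptions made for the example, not Editscript's implementation.

```clojure
(require '[clojure.set :as set])

;; Hypothetical diff for sets: record what to remove and what to add.
(defn diff [a b]
  {:- (set/difference a b)
   :+ (set/difference b a)})

;; Hypothetical patch: drop the removed elements, then add the new ones.
(defn patch [a d]
  (set/union (set/difference a (:- d)) (:+ d)))

(def a #{:intro :faq :pricing})
(def b #{:intro :faq :demo})
(def d (diff a b))            ; {:- #{:pricing}, :+ #{:demo}}

(= b (patch a d))             ; the round-trip law from the slide
(not= (diff a b) (diff b a))  ; diffing is directional
```

Here `d` holds two elements while `a` and `b` hold three each, which is the "d is smaller than its inputs" property in miniature.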
  • 3. Evolution of diffing (1) • The earliest diff was developed by Doug McIlroy on Unix at Bell Labs in 1974 • Works on text files; the units of work are lines of text • Purpose: reduce the storage needed to maintain multiple versions of a file • Use: compare content, track changes, verify output, version control
  • 4. Evolution of diffing (2) • Diffing in 3D graphics programming • World modeled as a scene graph • Only re-render changed subtrees • Purpose: performance optimization • Conceptually simple programming model: render everything • Inspired React.js • A ClojureScript wrapper of React could be faster than React itself, thanks to faster diffing with immutable data
  • 5. Evolution of diffing (3) • Data-oriented programming • Data, not text • Data are directly meaningful to code, no need for parsing or decoding • Generic data literals, not specialized opaque programming constructs • Diff input and output are both data • Diffing as a software architecture consideration, not just an implementation detail, impacting • Delineation of system components • Data model design • API design
  • 6. Diffing enables decoupling • diff & patch functions are generic and blind • They don't have to understand their input to work • Semantic asymmetry between sender and receiver enforces separation of concerns • Also supports a kind of natural encapsulation, not forced as in OOP • d is still open for inspection if the receiver chooses to inspect it • Graded: the receiver doesn’t need to know much, but can learn a lot if it chooses to • [Diagram: Sender computes (diff a a’) ;=> d and sends d; Receiver computes (patch a d) ;=> a’]
  • 7. Diffing encourages data model reuse • Thanks to diffing, data duplication between components is faithful and cheap • Advantageous to reuse the same data model throughout the system, dramatically simplifying the system
  • 8. Diffing tracks changes • Thanks to diffing, each version of the world state can be cheaply saved and replayed to recover originals • Application statefulness can be externalized and managed
  • 9. Editscript: a Clojure data diffing library • https://github.com/juji-io/editscript • Works for vectors, lists, sets and maps • Edits are a vector of vectors, each holding: • Path • Op: :+, :-, or :r • Value • Diffing algorithms • Quick: fast • A*: optimal diff size
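To make the edit shape concrete, here is a small sketch that produces and applies edits of the form `[path op value]` for nested maps only. It imitates the shape described on the slide; the `diff` and `patch` functions are hypothetical stand-ins written for this example and are not Editscript's code or algorithms.

```clojure
;; Hypothetical nested-map diff that emits [path op value] edits.
(defn diff
  ([a b] (diff [] a b))
  ([path a b]
   (cond
     (= a b) []
     (and (map? a) (map? b))
     (vec
      (concat
       ;; keys present only in a are deleted
       (for [k (keys a) :when (not (contains? b k))]
         [(conj path k) :-])
       ;; keys in b are either added or compared recursively
       (mapcat (fn [[k v]]
                 (if (contains? a k)
                   (diff (conj path k) (get a k) v)
                   [[(conj path k) :+ v]]))
               b)))
     ;; anything else that differs is replaced wholesale
     :else [[path :r b]])))

;; Apply the edits in order.
(defn patch [a edits]
  (reduce (fn [m [path op v]]
            (case op
              :- (if (= 1 (count path))
                   (dissoc m (first path))
                   (update-in m (pop path) dissoc (peek path)))
              (:+ :r) (if (empty? path) v (assoc-in m path v))))
          a
          edits))

(def a {:name "bot" :draft true :topics {:intro 1 :faq 2}})
(def b {:name "bot" :topics {:intro 1 :faq 3 :bye 4}})
(def edits (diff a b))

(= b (patch a edits))
```

For these inputs the edits are one deletion at `[:draft]`, one replacement at `[:topics :faq]` and one addition at `[:topics :bye]`. Because the edits are plain data, a receiver can inspect the paths before deciding how to apply them.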
  • 10. Case study: Juji Studio UI Re-design • Complete UI redesign • Re-implementation • One month turnaround • Mainly due to switching from a resource-oriented API to a diffing-based API • [Screenshot: the UI before the redesign]
  • 11. Case study: Juji Studio UI Re-design (continued) • [Screenshot: the UI after the redesign]
  • 12. UI Data model: config doc • Single Page Application (SPA) in cljs • State is kept in an EDN document, the config doc • SPA, server and DB all hold copies of the config doc • [Diagram: SPA, Server and DB each hold a config doc; the SPA reaches the Server through a GraphQL API]
  • 13. Traditional GraphQL API • Resource-oriented (RESTful) • The server-side config doc is the truth • API is CRUD on server resources • i.e. paths in the config doc • Repetitive CRUD calls for each and every type of node • Thousands of lines of Lacinia schema
  • 14. Diffing based GraphQL API • All logic is in SPA • API is CRUD on config doc • Update is sending diffs • SPA periodically sends to server: (diff doc-prev doc-now) • Server applies the diff, saves the doc in DB, replies with config doc SHA • SPA validates SHA, if different, sends config doc to overwrite • Removed all API calls on paths and nodes
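The update cycle on this slide might be sketched as follows. Everything here is an assumption made for illustration: the server is an in-memory atom rather than a GraphQL endpoint, `diff` and `patch` are a hypothetical top-level-keys-only pair, and Clojure's built-in `hash` stands in for the SHA the slide mentions.

```clojure
;; Hypothetical flat diff: which top-level keys to drop, which to set.
(defn diff [a b]
  {:removed (vec (remove #(contains? b %) (keys a)))
   :changed (into {} (remove (fn [[k v]] (and (contains? a k) (= v (get a k)))) b))})

(defn patch [a {:keys [removed changed]}]
  (merge (apply dissoc a removed) changed))

;; Stand-in for the server's stored copy of the config doc.
(def server-doc (atom {:title "Chat" :greeting "Hi"}))

;; The server applies a diff and replies with a fingerprint of the result.
(defn server-apply-diff! [d]
  (hash (swap! server-doc patch d)))

;; Fallback: the client sends the whole doc to overwrite the server copy.
(defn server-overwrite! [doc]
  (hash (reset! server-doc doc)))

;; Client side: send the diff, validate the fingerprint, overwrite on mismatch.
(defn sync! [doc-prev doc-now]
  (if (= (hash doc-now) (server-apply-diff! (diff doc-prev doc-now)))
    :in-sync
    (do (server-overwrite! doc-now) :overwritten)))

(def doc-0 {:title "Chat" :greeting "Hi"})
(def doc-1 (-> doc-0 (assoc :greeting "Hello") (dissoc :title)))
(def result-1 (sync! doc-0 doc-1))          ; server copy matched, diff is enough

(reset! server-doc {:greeting "stale" :extra 1}) ; simulate a drifted server copy
(def doc-2 (assoc doc-1 :lang "en"))
(def result-2 (sync! doc-1 doc-2))          ; fingerprints differ, full doc is sent
```

The server never needs to know what the keys mean; it only patches, stores and fingerprints, which is why the per-path and per-node API calls can go away.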
  • 15. Case study: externalize application states • How to scale a highly stateful application? • E.g. Juji initiates an agent (rep) for each chat session on a server node; the state of each rep is stored in an atom • What if the server node becomes unavailable? • [Diagram: API Gateway and Server Node]
  • 16. Case study: externalize application states • Each rep sends diffs of its state to a persistent log (e.g. Kafka) • E.g. at each utterance, the rep sends (diff state-prev state-now) • When a server becomes unavailable, the API gateway forwards traffic to another server, which recovers the agent state from the persistent log by sequentially applying all diffs to a shared initial state • [Diagram: API Gateway, Server Node and Persistent Log; the Server Node sends diffs to the Persistent Log]
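Recovery by replay reduces to folding `patch` over the logged diffs. The sketch below assumes a plain vector in an atom as the persistent log (a stand-in for something like Kafka), a hypothetical flat `diff`/`patch` pair, and made-up rep state keys.

```clojure
;; Hypothetical flat diff: which top-level keys to drop, which to set.
(defn diff [a b]
  {:removed (vec (remove #(contains? b %) (keys a)))
   :changed (into {} (remove (fn [[k v]] (and (contains? a k) (= v (get a k)))) b))})

(defn patch [a {:keys [removed changed]}]
  (merge (apply dissoc a removed) changed))

(def initial-state {:turn 0})   ; the shared initial state
(def log (atom []))             ; stand-in for the persistent log

;; Called by a rep after each utterance: append the diff, return the new state.
(defn record! [state-prev state-now]
  (swap! log conj (diff state-prev state-now))
  state-now)

;; Called by a replacement server: apply all diffs in order.
(defn recover [initial diffs]
  (reduce patch initial diffs))

(def s1 (record! initial-state {:turn 1 :topic :greeting}))
(def s2 (record! s1 {:turn 2 :topic :pricing :user "ann"}))
(def s3 (record! s2 {:turn 3 :user "ann"}))

(= s3 (recover initial-state @log))           ; latest state is rebuilt
(= s2 (recover initial-state (take 2 @log)))  ; so is any earlier version
```

Replaying a prefix of the log yields an earlier version of the state, which is the same mechanism as the cheap save-and-replay of world states described earlier in the talk.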
  • 17. Case study: reduce component dependency • Stateful components depend on one another • Introducing user-invokable system functions leads to circular dependencies, e.g. (juji.func.system/cleanup-chat rep) • [Diagram: System, Reps, Rep, Subs, func.system, [:rt jujiid]]
  • 18. • Instead of depending on namespaces that contain subscriptions • Watch the reps atom • Inspect the diff between its old and new values • Handle the case where a rep is removed or cleaned up • i.e. send a :user-left message to channels and let the subscriptions clean themselves up
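A rough sketch of the watch-based approach, using `add-watch` from clojure.core. The `sent` atom and `notify-user-left!` are hypothetical stand-ins for the real channels, and the "diff" here is simply the set of rep ids that disappeared between the old and new values.

```clojure
(require '[clojure.set :as set])

(def reps (atom {}))   ; rep id -> rep state
(def sent (atom []))   ; hypothetical stand-in for messages put on channels

(defn notify-user-left! [rep-id]
  (swap! sent conj [rep-id :user-left]))

;; The watcher only looks at what changed; it does not depend on the
;; namespaces that own the subscriptions.
(add-watch reps ::cleanup
           (fn [_key _ref old-reps new-reps]
             (doseq [rep-id (set/difference (set (keys old-reps))
                                            (set (keys new-reps)))]
               (notify-user-left! rep-id))))

(swap! reps assoc "rep-1" {:turn 0} "rep-2" {:turn 0})
(swap! reps dissoc "rep-1")

@sent ; [["rep-1" :user-left]]
```

The code that removes a rep stays a plain `swap!`; the cleanup follows from the data change rather than from a call into another component.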
  • 19. Case study: synchronize collaborative editing • Multiple parties sending diffs • Out of sync when lines cross paths • Difficult yet common problem • E.g. enabling multiple users to edit the same chat at the same time • Locking has bad UX • Three-way merge has high latency • [Diagram: two parties start from A; one sends (diff A A’), the other sends (diff A A’’)]
  • 20. Differential Synchronization • Diffing based synchronization method • Scalable • Fault-tolerant • Low latency • Developed by Neil Fraser in 2009 • Used by Google Docs
  • 22. • Fault-tolerant case • Keep a backup shadow
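One half-cycle of differential synchronization can be sketched as below, assuming map-shaped documents and a hypothetical flat `diff`/`patch` pair. Each side keeps a live `:doc` and a `:shadow` of what the other side last saw. This covers only the basic loop: the backup shadow of the fault-tolerant case is not modeled, and the patch here is exact rather than the fuzzy text patch used for free-form text.

```clojure
;; Hypothetical flat diff: which top-level keys to drop, which to set.
(defn diff [a b]
  {:removed (vec (remove #(contains? b %) (keys a)))
   :changed (into {} (remove (fn [[k v]] (and (contains? a k) (= v (get a k)))) b))})

(defn patch [a {:keys [removed changed]}]
  (merge (apply dissoc a removed) changed))

;; The sender diffs its doc against its shadow, then advances its shadow.
;; The receiver applies the edits to both its shadow and its live doc.
(defn sync-step [sender receiver]
  (let [edits (diff (:shadow sender) (:doc sender))]
    [(assoc sender :shadow (:doc sender))
     (-> receiver
         (update :shadow patch edits)
         (update :doc patch edits))]))

(def base {:title "Chat"})
(def client-1 (assoc-in {:doc base :shadow base} [:doc :greeting] "Hello")) ; client edit
(def server-1 (assoc-in {:doc base :shadow base} [:doc :lang] "en"))        ; concurrent server edit

(let [[client-2 server-2] (sync-step client-1 server-1)   ; client -> server
      [server-3 client-3] (sync-step server-2 client-2)]  ; server -> client
  (def client-final client-3)
  (def server-final server-3))

(= (:doc client-final) (:doc server-final))
```

Both edits survive because each side only ever sends what changed relative to its shadow, so neither overwrites the other's concurrent change.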
  • 24. Data modeling guideline: Don’t use vectors • Minimize unnecessary use of ordered data structures, e.g. vectors or lists • Diffing algorithms are slow for ordered data, because order is a strong constraint to satisfy • Ordered O(mn) vs. unordered O(m+n) • The implicit order of data elements is often a source of incidental complexity • Meaningful order is often based on data fields • Sets or maps suffice in most cases • Bad: [ {} {} {} … ] • Good: { {} {} {} … } or #{ {} {} {} … }
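The cost gap can be illustrated with a textbook comparison, which is an assumption chosen for clarity and not how Editscript's Quick or A* algorithms work. A sequence diff based on the longest common subsequence fills a table with one cell per pair of elements, while a set diff needs roughly one membership check per element.

```clojure
(require '[clojure.set :as set])

;; Ordered data: classic dynamic programming, one cell per (x, y) pair,
;; so about (count xs) * (count ys) steps.
(defn lcs-length [xs ys]
  (let [xs (vec xs)
        ys (vec ys)]
    (peek
     (reduce (fn [prev x]
               (reduce (fn [row j]
                         (conj row (if (= x (ys j))
                                     (inc (prev j))
                                     (max (prev (inc j)) (peek row)))))
                       [0]
                       (range (count ys))))
             (vec (repeat (inc (count ys)) 0))
             xs))))

;; Minimum number of insertions plus deletions to turn xs into ys.
(defn seq-edit-count [xs ys]
  (- (+ (count xs) (count ys)) (* 2 (lcs-length xs ys))))

;; Unordered data: about one membership check per element.
(defn set-diff [a b]
  {:- (set/difference a b)
   :+ (set/difference b a)})

(seq-edit-count [:a :b :c :d] [:b :c :e :d]) ; 2
(set-diff #{:a :b :c :d} #{:b :c :e :d})     ; {:- #{:a}, :+ #{:e}}
```

Both calls report the same change (drop `:a`, add `:e`), but the vector version had to consider positions to find it.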
  • 25. Conclusion • Diffing offers a few properties that lead to • Simplified software architecture • Enhanced system decoupling • Easier scaling of stateful applications • Better solutions to data synchronization problems • Worthwhile to consider diffing-based software architecture • Particularly for data-oriented programming
  • 26. Thank you! • Huahai Yang @huahaiy • Juji Inc. https://juji.io