Flowable + MongoDB + Machine Learning
Joram Barrez – Flowable Core Developer
Transactions
• Flowable relies on the transactional semantics of a relational db
• “Atomically” moving from one stable state to another
• This doesn’t mean you can forget about service failures, but understanding Flowable’s transactional model makes it much easier to write resilient processes
• MongoDB 4.0 added support for multi-document transactions (June 2018)
MongoDB
• Open-source NoSQL JSON document store
• Short history
• Started in 2007 by 10gen as a component of their PaaS
• Was known in the early days (versions 2.2 and before) as the “dev/null” db
• Acquired WiredTiger at the end of 2014
• WiredTiger became the default storage engine in 3.2
• WiredTiger enables transactional semantics (ACID) on multi-document operations in 4.0 (*)
* See the “Path to transactions” series on https://www.youtube.com/user/MongoDB/videos
Flowable – MongoDB
• All code: https://github.com/flowable/flowable-mongodb
[Architecture diagram] A service call travels down through these layers:
• Command interceptor / commands
• Agenda / operations (engine core logic)
• EntityManagers (high-level data functions)
• DataManagers (low-level data access)
(A command sketch follows below.)
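To make that call flow concrete, here is a minimal, hypothetical command sketch (the class name and execution id are invented, and exact package names differ between Flowable 6.x versions); everything inside execute() runs through the command interceptor chain, uses the EntityManagers and, via them, the DataManagers:

import org.flowable.common.engine.impl.interceptor.Command;
import org.flowable.common.engine.impl.interceptor.CommandContext;
import org.flowable.engine.impl.util.CommandContextUtil;

// Hypothetical custom command: executed through the command interceptor chain
// (e.g. via managementService.executeCommand(new TouchExecutionCmd("someExecutionId"))),
// so every entity/data manager call below shares the same transaction.
public class TouchExecutionCmd implements Command<Void> {

    private final String executionId;

    public TouchExecutionCmd(String executionId) {
        this.executionId = executionId;
    }

    @Override
    public Void execute(CommandContext commandContext) {
        // EntityManager = high-level data function ...
        CommandContextUtil.getExecutionEntityManager(commandContext)
            .findById(executionId); // ... which delegates to a DataManager for the low-level read
        return null;
    }
}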
Implementation
• Replace the lowest layer (the DataManagers)
• MongoDB’s transactions follow a familiar programming model
• Concept of a ClientSession
• Matches Flowable’s low-level session concept nicely (see the sketch below)
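A minimal sketch of that programming model with the plain MongoDB Java driver (the database and collection names are illustrative, not necessarily the ones flowable-mongodb uses; multi-document transactions additionally require MongoDB 4.0+ running as a replica set):

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ClientSessionExample {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> executions = client.getDatabase("flowable").getCollection("executions");
            MongoCollection<Document> variables = client.getDatabase("flowable").getCollection("variables");

            try (ClientSession session = client.startSession()) {
                session.startTransaction();
                try {
                    executions.insertOne(session, new Document("_id", "exec-1"));
                    variables.insertOne(session, new Document("executionId", "exec-1")
                        .append("name", "amount").append("value", 100));
                    session.commitTransaction(); // both writes become visible atomically
                } catch (RuntimeException e) {
                    session.abortTransaction();  // nothing is persisted on failure
                    throw e;
                }
            }
        }
    }
}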
Demo
• Relational vs MongoDB implementation
Implementation
• Replace all DataManager interface implementations with a MongoDB counterpart (see the sketch after this list)
• Alpha releases
• Gather interest/feedback
• Using the existing test suite to validate the implementation
• Completed -> beta / stable release
• (Almost) 1-1 translation of the relational data structure
• Optimizations along the way
• MongoDB-specific structure optimizations will surely follow
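As an illustration of what such a counterpart can look like, here is a simplified, hypothetical sketch (these are not the actual flowable-mongodb classes; the interface, collection and field names are invented for the example):

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Illustrative low-level contract, standing in for a Flowable DataManager interface
interface ExecutionDataStore {
    Document findById(String id);
    void insert(Document execution);
    void delete(String id);
}

// MongoDB-backed counterpart: every operation uses the ClientSession that the
// engine's command context opened, so it joins the surrounding transaction
class MongoExecutionDataStore implements ExecutionDataStore {

    private final MongoCollection<Document> collection;
    private final ClientSession session;

    MongoExecutionDataStore(MongoCollection<Document> collection, ClientSession session) {
        this.collection = collection;
        this.session = session;
    }

    @Override
    public Document findById(String id) {
        return collection.find(session, new Document("_id", id)).first();
    }

    @Override
    public void insert(Document execution) {
        collection.insertOne(session, execution);
    }

    @Override
    public void delete(String id) {
        collection.deleteOne(session, new Document("_id", id));
    }
}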
Challenges
• com.mongodb.MongoCommandException: Command
failed with error 112 (WriteConflict): 'WriteConflict' on server
exethanter.local:27017. The full response is { "errorLabels" :
["TransientTransactionError"], "operationTime" : {
"$timestamp" : { "t" : 1537701066, "i" : 3 } }, "ok" : 0.0, "errmsg"
: "WriteConflict", "code" : 112, "codeName" : "WriteConflict",
"$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" :
1537701066, "i" : 3 } }, "signature" : { "hash" : { "$binary" :
"AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" },
"keyId" : { "$numberLong" : "0" } } } }
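The “TransientTransactionError” label in the response above is MongoDB’s way of saying the whole transaction can safely be retried. A minimal retry sketch (simplified: it ignores the separate “UnknownTransactionCommitResult” label and does not limit the number of attempts):

import com.mongodb.MongoException;
import com.mongodb.client.ClientSession;

public final class TransactionRetry {

    // Re-runs the transaction body while the server reports a transient conflict,
    // such as the WriteConflict (code 112) shown above.
    public static void runWithRetry(ClientSession session, Runnable transactionBody) {
        while (true) {
            session.startTransaction();
            try {
                transactionBody.run();
            } catch (RuntimeException e) {
                session.abortTransaction(); // roll back the failed attempt
                if (e instanceof MongoException
                        && ((MongoException) e).hasErrorLabel("TransientTransactionError")) {
                    continue; // e.g. the WriteConflict above: retry the whole transaction
                }
                throw e;
            }
            try {
                session.commitTransaction();
                return;
            } catch (MongoException e) {
                if (e.hasErrorLabel("TransientTransactionError")) {
                    continue; // the commit hit a transient error: retry from the start
                }
                throw e;
            }
        }
    }
}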
Challenges
• Taking joins for granted
• Denormalization needed
• Way more work as a developer to guarantee data consistency
• E.g. a simple example: finding the ‘latest’ version of a process definition (see the sketch below)
• Trade extra writes/updates for faster reads
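A sketch of that ‘latest’ example, with invented collection and field names (not the actual flowable-mongodb schema): instead of computing the highest version at read time, a ‘latest’ flag is maintained on the definition documents at write time, inside the same transaction as the deployment.

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public final class LatestDefinitionExample {

    // Clear the flag on the previous latest version of this definition key and
    // insert the new version as the latest, in one transaction.
    public static void deployNewVersion(ClientSession session,
                                        MongoCollection<Document> processDefinitions,
                                        Document newDefinition) {
        String key = newDefinition.getString("key");

        processDefinitions.updateMany(session,
            Filters.and(Filters.eq("key", key), Filters.eq("latest", true)),
            Updates.set("latest", false));

        newDefinition.put("latest", true);
        processDefinitions.insertOne(session, newDefinition);
    }

    // Reading the latest version is then a single indexed lookup instead of a
    // MAX(version) style query:
    // processDefinitions.find(Filters.and(Filters.eq("key", key), Filters.eq("latest", true))).first();
}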
Luckily
• Over the past years we’ve made Flowable a lot faster by keeping in mind that a single round trip over the network is extremely expensive
• Denormalization, prefetching, entity counts
Performance
• Is the performance acceptable?
• Benchmark on AWS
• Setup (see GitHub repo)
Machines:
• Process Service: m5d.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
• Postgres: m5d.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
• MongoDB: t3.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
Postgres configuration:
max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20971kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
listen_addresses = '*'
Process Service
• Bring the process into a stable state
• One transaction
• Fixed thread pool of 8 threads (see the runner sketch below)
Data created per process instance:
- 6 executions
- 2 user tasks
- 1 hist. proc. inst.
- 11 hist. activities
- 2 hist. user tasks
- 31 variables
- 31 hist. variables
- 1 timer job
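A minimal sketch of such a load driver, assuming a standard Flowable RuntimeService (the process definition key and instance count are made up for the example):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.flowable.engine.RuntimeService;

public final class BenchmarkRunner {

    // Fires process instance starts from a fixed pool of 8 threads; each start call
    // runs in one transaction and brings the instance into its first stable (wait) state.
    public static void run(RuntimeService runtimeService) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        int instanceCount = 10_000; // illustrative
        for (int i = 0; i < instanceCount; i++) {
            pool.submit(() -> runtimeService.startProcessInstanceByKey("benchmarkProcess"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}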
Results
• The reverse of what we expected :-)
Results
• Although the graphs seem to indicate a relatively large difference, we’re talking about sub-millisecond differences!
• Relational databases have not been standing still either
• See our recent performance benchmarks
• https://blog.flowable.org/2018/03/05/flowable-6-3-0-performance-benchmark/
• https://blog.flowable.org/2018/03/13/async-history-performance-benchmark/
Conclusion
• The transactional support in MongoDB is impressive
• Data consistency perspective
• Performance perspective
• Using Flowable on MongoDB is a valid alternative
Current limitations
• Read/write to the primary only
• Adding replica nodes seemed to have a negative effect
• Even though reads/writes only go to the primary (a current limitation of MongoDB transactions)
• MongoDB transactions are still under active development
https://www.youtube.com/watch?v=dQh03YLkmyg
Future work
• MongoDB is designed for horizontal scale
• (Yes, Postgres for example has partitioning, but …)
• Sharded clusters + Flowable → interesting use cases (see the sketch after this list)
• Shard by tenant
• Shard on process definition key
• Big Data use cases … like ML!
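A sketch of what enabling such sharding could look like with the Java driver (the mongos host, database, collection and field names are assumptions, not the actual flowable-mongodb schema):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public final class ShardingSetup {

    public static void main(String[] args) {
        // Connect to a mongos router of the sharded cluster (host is an assumption)
        try (MongoClient client = MongoClients.create("mongodb://mongos-host:27017")) {
            MongoDatabase admin = client.getDatabase("admin");

            // Enable sharding for the (assumed) Flowable database
            admin.runCommand(new Document("enableSharding", "flowable"));

            // Shard the execution data by tenant id ...
            admin.runCommand(new Document("shardCollection", "flowable.executions")
                .append("key", new Document("tenantId", 1)));

            // ... or alternatively by process definition key:
            // admin.runCommand(new Document("shardCollection", "flowable.executions")
            //     .append("key", new Document("processDefinitionKey", 1)));
        }
    }
}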
Machine Learning
• Process/case engines are in a prime position to collect data for ML
• End-user data through forms
• Service invocation data
• (Semi-)structured models
Machine Learning
• MongoDB being “Big Data” (e.g. better suited for streaming, reactive use, etc.) opens up use cases for ML
• Demo
• Run many processes from start to end
• Feed the historical data into ML
• See if human work is repetitive and suggest optimizations
Machine Learning
1. Look for human decision patterns
2. Gather possible data inputs and backtrack
3. Use machine learning (Spark’s decision tree algorithm) to calculate potential patterns in the data, i.e. which data at the start leads to a certain path later on, within a certain confidence (see the sketch below)
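A minimal training sketch with Spark ML’s decision tree classifier in Java (the input path, column names and variable names are invented for the example; the demo’s actual data flow is outlined in the Architecture slide below):

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class DecisionPatternAnalysis {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("flowable-decision-analysis")
            .master("local[*]") // assumption: local run for the demo
            .getOrCreate();

        // One row per finished process instance: start variables plus the path
        // the human decision eventually took (export format is illustrative)
        Dataset<Row> history = spark.read().json("historic-process-data.json");

        // Turn the chosen path into a numeric label
        StringIndexer labelIndexer = new StringIndexer()
            .setInputCol("chosenPath")
            .setOutputCol("label");

        // Combine the (numeric) start variables into a feature vector
        VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[] {"amount", "customerRating"})
            .setOutputCol("features");

        DecisionTreeClassifier tree = new DecisionTreeClassifier()
            .setLabelCol("label")
            .setFeaturesCol("features");

        Pipeline pipeline = new Pipeline()
            .setStages(new PipelineStage[] {labelIndexer, assembler, tree});
        PipelineModel model = pipeline.fit(history);

        // The learned tree encodes rules like "amount < X at start => path A",
        // which can be turned into suggestions for the end user
        DecisionTreeClassificationModel treeModel =
            (DecisionTreeClassificationModel) model.stages()[2];
        System.out.println(treeModel.toDebugString());

        spark.stop();
    }
}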
Architecture
[Architecture diagram: multiple Process Service instances, a Decision Analysis Service (scaled to several instances), a Spark (cluster) + MLlib, and a UI; the historical process data is streamed as an RDD into Spark, and suggestions flow back to the UI]
Architecture
• vs. last year’s architecture
Demo
Processes + Mongo + Machine Learning
Thank you!
