Flowable + MongoDB + Machine Learning
Joram Barrez – Flowable Core Developer
Transactions
• Flowable relies on the transactional semantics of a relational db
• “Atomically” moving from one stable state to another
• This doesn’t mean you can forget about service failures, but understanding Flowable’s transactional model makes it much easier to write resilient processes
• MongoDB 4.0 added support for multi-document transactions (June 2018)
MongoDB
• Open-source NoSQL JSON document store
• Short history
• Started in 2007 by 10gen as a component of their PaaS
• Was known in the early days (versions 2.2 and before) as the “dev/null” db
• Acquired WiredTiger at the end of 2014
• WiredTiger became the default storage engine in 3.2
• WiredTiger enables transactional semantics (ACID) on multi-document operations in 4.0 (*)
* See the “Path to transactions” series on https://www.youtube.com/user/MongoDB/videos
Flowable – MongoDB
• All code: https://github.com/flowable/flowable-mongodb
[Architecture diagram] A service call travels down through these layers:
• Command interceptor / commands
• Agenda / operations (engine core logic)
• EntityManagers (high-level data functions)
• DataManagers (low-level data access)
(A command sketch follows below.)
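To make that call flow concrete, here is a minimal, hypothetical command sketch (the class name and execution id are invented, and exact package names differ between Flowable 6.x versions); everything inside execute() runs through the command interceptor chain, uses the EntityManagers and, via them, the DataManagers:

import org.flowable.common.engine.impl.interceptor.Command;
import org.flowable.common.engine.impl.interceptor.CommandContext;
import org.flowable.engine.impl.util.CommandContextUtil;

// Hypothetical custom command: executed through the command interceptor chain
// (e.g. via managementService.executeCommand(new TouchExecutionCmd("someExecutionId"))),
// so every entity/data manager call below shares the same transaction.
public class TouchExecutionCmd implements Command<Void> {

    private final String executionId;

    public TouchExecutionCmd(String executionId) {
        this.executionId = executionId;
    }

    @Override
    public Void execute(CommandContext commandContext) {
        // EntityManager = high-level data function ...
        CommandContextUtil.getExecutionEntityManager(commandContext)
            .findById(executionId); // ... which delegates to a DataManager for the low-level read
        return null;
    }
}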
Implementation
• Replace the lowest layer (the DataManagers)
• MongoDB’s transactions follow a familiar programming model
• Concept of a ClientSession
• Matches Flowable’s low-level session concept nicely (see the sketch below)
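A minimal sketch of that programming model with the plain MongoDB Java driver (the database and collection names are illustrative, not necessarily the ones flowable-mongodb uses; multi-document transactions additionally require MongoDB 4.0+ running as a replica set):

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ClientSessionExample {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> executions = client.getDatabase("flowable").getCollection("executions");
            MongoCollection<Document> variables = client.getDatabase("flowable").getCollection("variables");

            try (ClientSession session = client.startSession()) {
                session.startTransaction();
                try {
                    executions.insertOne(session, new Document("_id", "exec-1"));
                    variables.insertOne(session, new Document("executionId", "exec-1")
                        .append("name", "amount").append("value", 100));
                    session.commitTransaction(); // both writes become visible atomically
                } catch (RuntimeException e) {
                    session.abortTransaction();  // nothing is persisted on failure
                    throw e;
                }
            }
        }
    }
}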
Demo
• Relational vs MongoDB implementation
Implementation
• Replace all DataManager interface implementations with a MongoDB counterpart (see the sketch after this list)
• Alpha releases
• Gather interest/feedback
• Using the existing test suite to validate the implementation
• Completed -> beta / stable release
• (Almost) 1-1 translation of the relational data structure
• Optimizations along the way
• MongoDB-specific structure optimizations will surely follow
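As an illustration of what such a counterpart can look like, here is a simplified, hypothetical sketch (these are not the actual flowable-mongodb classes; the interface, collection and field names are invented for the example):

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Illustrative low-level contract, standing in for a Flowable DataManager interface
interface ExecutionDataStore {
    Document findById(String id);
    void insert(Document execution);
    void delete(String id);
}

// MongoDB-backed counterpart: every operation uses the ClientSession that the
// engine's command context opened, so it joins the surrounding transaction
class MongoExecutionDataStore implements ExecutionDataStore {

    private final MongoCollection<Document> collection;
    private final ClientSession session;

    MongoExecutionDataStore(MongoCollection<Document> collection, ClientSession session) {
        this.collection = collection;
        this.session = session;
    }

    @Override
    public Document findById(String id) {
        return collection.find(session, new Document("_id", id)).first();
    }

    @Override
    public void insert(Document execution) {
        collection.insertOne(session, execution);
    }

    @Override
    public void delete(String id) {
        collection.deleteOne(session, new Document("_id", id));
    }
}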
Challenges
• com.mongodb.MongoCommandException: Command
failed with error 112 (WriteConflict): 'WriteConflict' on server
exethanter.local:27017. The full response is { "errorLabels" :
["TransientTransactionError"], "operationTime" : {
"$timestamp" : { "t" : 1537701066, "i" : 3 } }, "ok" : 0.0, "errmsg"
: "WriteConflict", "code" : 112, "codeName" : "WriteConflict",
"$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" :
1537701066, "i" : 3 } }, "signature" : { "hash" : { "$binary" :
"AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" },
"keyId" : { "$numberLong" : "0" } } } }
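The “TransientTransactionError” label in the response above is MongoDB’s way of saying the whole transaction can safely be retried. A minimal retry sketch (simplified: it ignores the separate “UnknownTransactionCommitResult” label and does not limit the number of attempts):

import com.mongodb.MongoException;
import com.mongodb.client.ClientSession;

public final class TransactionRetry {

    // Re-runs the transaction body while the server reports a transient conflict,
    // such as the WriteConflict (code 112) shown above.
    public static void runWithRetry(ClientSession session, Runnable transactionBody) {
        while (true) {
            session.startTransaction();
            try {
                transactionBody.run();
            } catch (RuntimeException e) {
                session.abortTransaction(); // roll back the failed attempt
                if (e instanceof MongoException
                        && ((MongoException) e).hasErrorLabel("TransientTransactionError")) {
                    continue; // e.g. the WriteConflict above: retry the whole transaction
                }
                throw e;
            }
            try {
                session.commitTransaction();
                return;
            } catch (MongoException e) {
                if (e.hasErrorLabel("TransientTransactionError")) {
                    continue; // the commit hit a transient error: retry from the start
                }
                throw e;
            }
        }
    }
}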
Challenges
• Taking joins for granted
• Denormalization needed
• Way more work as a developer to guarantee data consistency
• E.g. a simple example: finding the ‘latest’ version of a process definition (see the sketch below)
• Trade extra writes/updates for faster reads
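A sketch of that ‘latest’ example, with invented collection and field names (not the actual flowable-mongodb schema): instead of computing the highest version at read time, a ‘latest’ flag is maintained on the definition documents at write time, inside the same transaction as the deployment.

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public final class LatestDefinitionExample {

    // Clear the flag on the previous latest version of this definition key and
    // insert the new version as the latest, in one transaction.
    public static void deployNewVersion(ClientSession session,
                                        MongoCollection<Document> processDefinitions,
                                        Document newDefinition) {
        String key = newDefinition.getString("key");

        processDefinitions.updateMany(session,
            Filters.and(Filters.eq("key", key), Filters.eq("latest", true)),
            Updates.set("latest", false));

        newDefinition.put("latest", true);
        processDefinitions.insertOne(session, newDefinition);
    }

    // Reading the latest version is then a single indexed lookup instead of a
    // MAX(version) style query:
    // processDefinitions.find(Filters.and(Filters.eq("key", key), Filters.eq("latest", true))).first();
}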
Luckily
• Over the past years we’ve made Flowable a lot faster by keeping in mind that a single round trip over the network is extremely expensive
• Denormalization, prefetching, entity counts
Performance
• Is the performance acceptable?
• Benchmark on AWS
• Setup (see GitHub repo)
Machines:
• Process Service: m5d.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
• Postgres: m5d.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
• MongoDB: t3.2xlarge (8 cores / 32 GB RAM), 100 GB SSD
Postgres configuration:
max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20971kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
listen_addresses = '*'
Process Service
• Bring the process into a stable state
• One transaction
• Fixed thread pool of 8 threads (see the runner sketch below)
Data created per process instance:
- 6 executions
- 2 user tasks
- 1 hist. proc. inst.
- 11 hist. activities
- 2 hist. user tasks
- 31 variables
- 31 hist. variables
- 1 timer job
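A minimal sketch of such a load driver, assuming a standard Flowable RuntimeService (the process definition key and instance count are made up for the example):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.flowable.engine.RuntimeService;

public final class BenchmarkRunner {

    // Fires process instance starts from a fixed pool of 8 threads; each start call
    // runs in one transaction and brings the instance into its first stable (wait) state.
    public static void run(RuntimeService runtimeService) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        int instanceCount = 10_000; // illustrative
        for (int i = 0; i < instanceCount; i++) {
            pool.submit(() -> runtimeService.startProcessInstanceByKey("benchmarkProcess"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}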
Results
• The reverse of what we expected :-)
Results
• Although the graphs seem to indicate a relatively large difference, we’re talking about sub-millisecond differences!
• Relational databases have not been standing still either
• See our recent performance benchmarks
• https://blog.flowable.org/2018/03/05/flowable-6-3-0-performance-benchmark/
• https://blog.flowable.org/2018/03/13/async-history-performance-benchmark/
Conclusion
• The transactional support in MongoDB is impressive
• Data consistency perspective
• Performance perspective
• Using Flowable on MongoDB is a valid alternative
Current limitations
• Read/write to the primary only
• Adding replica nodes seemed to have a negative effect
• Even though reads/writes only go to the primary (a current limitation of MongoDB transactions)
• MongoDB transactions are still under active development
https://www.youtube.com/watch?v=dQh03YLkmyg
Future work
• MongoDB is designed for horizontal scale
• (Yes, Postgres for example has partitioning, but …)
• Sharded clusters + Flowable → interesting use cases (see the sketch after this list)
• Shard by tenant
• Shard on process definition key
• Big Data use cases … like ML!
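A sketch of what enabling such sharding could look like with the Java driver (the mongos host, database, collection and field names are assumptions, not the actual flowable-mongodb schema):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public final class ShardingSetup {

    public static void main(String[] args) {
        // Connect to a mongos router of the sharded cluster (host is an assumption)
        try (MongoClient client = MongoClients.create("mongodb://mongos-host:27017")) {
            MongoDatabase admin = client.getDatabase("admin");

            // Enable sharding for the (assumed) Flowable database
            admin.runCommand(new Document("enableSharding", "flowable"));

            // Shard the execution data by tenant id ...
            admin.runCommand(new Document("shardCollection", "flowable.executions")
                .append("key", new Document("tenantId", 1)));

            // ... or alternatively by process definition key:
            // admin.runCommand(new Document("shardCollection", "flowable.executions")
            //     .append("key", new Document("processDefinitionKey", 1)));
        }
    }
}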
Machine Learning
• Process/case engines are in a prime position to collect data for ML
• End-user data through forms
• Service invocation data
• (Semi-)structured models
Machine Learning
• MongoDB being “Big Data” (e.g. better suited for streaming, reactive use, etc.) opens up use cases for ML
• Demo
• Run many processes from start to end
• Feed the historical data into ML
• See if human work is repetitive and suggest optimizations
Machine Learning
1. Look for human decision patterns
2. Gather possible data inputs and backtrack
3. Use machine learning (Spark’s decision tree algorithm) to calculate potential patterns in the data, i.e. which data at the start leads to a certain path later on, within a certain confidence (see the sketch below)
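A minimal training sketch with Spark ML’s decision tree classifier in Java (the input path, column names and variable names are invented for the example; the demo’s actual data flow is outlined in the Architecture slide below):

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class DecisionPatternAnalysis {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("flowable-decision-analysis")
            .master("local[*]") // assumption: local run for the demo
            .getOrCreate();

        // One row per finished process instance: start variables plus the path
        // the human decision eventually took (export format is illustrative)
        Dataset<Row> history = spark.read().json("historic-process-data.json");

        // Turn the chosen path into a numeric label
        StringIndexer labelIndexer = new StringIndexer()
            .setInputCol("chosenPath")
            .setOutputCol("label");

        // Combine the (numeric) start variables into a feature vector
        VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[] {"amount", "customerRating"})
            .setOutputCol("features");

        DecisionTreeClassifier tree = new DecisionTreeClassifier()
            .setLabelCol("label")
            .setFeaturesCol("features");

        Pipeline pipeline = new Pipeline()
            .setStages(new PipelineStage[] {labelIndexer, assembler, tree});
        PipelineModel model = pipeline.fit(history);

        // The learned tree encodes rules like "amount < X at start => path A",
        // which can be turned into suggestions for the end user
        DecisionTreeClassificationModel treeModel =
            (DecisionTreeClassificationModel) model.stages()[2];
        System.out.println(treeModel.toDebugString());

        spark.stop();
    }
}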
Architecture
[Architecture diagram: multiple Process Service instances, a Decision Analysis Service (scaled to several instances), a Spark (cluster) + MLlib, and a UI; the historical process data is streamed as an RDD into Spark, and suggestions flow back to the UI]
Architecture
• vs. last year’s architecture
Demo
Processes + Mongo + Machine Learning
Thank you!
