Camunda BPM 7.2: Performance and Scalability (English)

Executing hundreds or thousands of process instances per second? Yes, it's possible. This webinar is about best practices for high-load situations, and how to scale Camunda BPM horizontally.

Published in: Software

  1. 1. Hands-on Webinar Camunda BPM 7.2 Performance and Scalability
  2. 2. Daniel Meyer  Process Engine Expert  Technical Project Lead @meyerdan | daniel.meyer@camunda.com Bernd Rücker  10+ years experience with workflow and Java  Co-Founder of Camunda  Evangelist & Head of Consulting @berndruecker | bernd.ruecker@camunda.com Your speakers today
  3. 3. Performance is a difficult topic
  4. 4.  It always depends −On hardware −On software environment (OS, Java, App Server, Database, …) −On Service Tasks in the process −On network topology (e.g. remote database, web services, …) −On concurrent requests, database load, …  There is no simple answer to performance  But we always succeed – in each and every real-life situation −Handling millions of process instances / day −Handling more than 1,000 process instances / second −Handling thousands of parallel users Performance is a difficult topic
  5. 5. We are much faster than the competition (see http://camunda.com/landing/whitepaper-camunda-jbpm/). In our tests, Camunda's throughput was 10x – 30x higher than with JBoss jBPM.
  6. 6. 1. Understand basic engine architecture 2. Understand influence parameters on performance 3. Discuss performance improvement approaches 4. See example figures / measurements 5. Discuss future scenarios (e.g. sharding, NoSQL, …) What we do today
  7. 7. Basic Engine Architecture We use Optimistic Locking
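Slide 7 names optimistic locking without showing it. As a rough illustration only, the sketch below models the general technique with an in-memory map: each row carries a revision counter, and an update succeeds only if the revision is still the one the writer read. The class names `VersionedRow` and `OptimisticStore` are hypothetical and not part of the engine; a real database would express the same check as an `UPDATE ... WHERE id = ? AND rev = ?` statement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a table row with a revision column. */
final class VersionedRow {
    final String state;
    final int revision;

    VersionedRow(String state, int revision) {
        this.state = state;
        this.revision = revision;
    }
}

/** Hypothetical store illustrating optimistic locking; not Camunda code. */
final class OptimisticStore {
    private final Map<String, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(String id, String state) {
        rows.put(id, new VersionedRow(state, 1));
    }

    VersionedRow read(String id) {
        return rows.get(id);
    }

    /**
     * Applies the update only if the row still has the expected revision.
     * Returns false when another writer got there first; the caller would
     * then re-read the row and retry (or report the conflict).
     */
    boolean update(String id, int expectedRevision, String newState) {
        boolean[] applied = {false};
        rows.computeIfPresent(id, (key, current) -> {
            if (current.revision != expectedRevision) {
                return current; // stale read: leave the row untouched
            }
            applied[0] = true;
            return new VersionedRow(newState, current.revision + 1);
        });
        return applied[0];
    }
}
```

No lock is held between reading and writing, so two engine nodes can work on different rows without blocking each other; only a genuine conflict on the same row causes one of them to fail and retry.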
  8. 8. Runtime vs. History Database
  9. 9. Runtime database schema
  10. 10. Learning #1: The architecture is damn simple – and the bottleneck is not the process engine!
  11. 11. Biggest influence on Performance Database Delegation Code Call Service
  12. 12. Clustering via shared database
  13. 13. Learning #2: All state is in the database so clustering gets really easy. camunda scales! More on this later…
  14. 14. „But what can I do if performance IS a problem?“
  15. 15. 1. Tasklist 2. (History) Queries 3. Job Execution Typical Areas of performance issues
  16. 16.  Process/Task Variables −Show in list −Use in Search/Filter  Support for Pagination  Big number of users accessing the tasklist very often Implementation challenge  Provide a generic database schema  Complex data types are serialized – no SQL-JOIN possible  Variables are stored in one row per variable – multiple SQL-JOINs might be required  Some customers use 10-30 variables Tasklist Requirements
  17. 17.  Add Process Variables optimized (and only used) for Queries −Extract attributes −Combine variables to work with LIKE  Use own queries −Native – if you want to improve the WHERE −Custom – if you want to SELECT multiple information at once  Own TaskInfo or ProcessInstanceInfo entities −Persisted as MyBatis or JPA entities −Combine all attributes – allow to query tasks without (or with one) JOIN only −Synchronisation via Listener – or use ProcessInstanceInfo as single source Solution Approaches: Tasklist
  18. 18. Example Customer - customerId - company - … Your DB camunda PROCESS_VARIABLES customerId ... searchField 4711 ... 4711#camunda#Berlin#... 1 2 Native Query: 3 Custom Query: 4 Java API – results are camunda „Task“ entities Own MyBatis mapping – result can be anything. Called via custom code.
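The `searchField` value on slide 18 (`4711#camunda#Berlin#...`) suggests joining several attributes into one string so that a single `LIKE` condition can search all of them. A minimal sketch of that idea follows; the class `SearchFieldSketch` and its method names are hypothetical, and it assumes the attribute values themselves never contain the `#` separator.

```java
import java.util.List;

/** Hypothetical helper for building a combined, LIKE-searchable variable. */
final class SearchFieldSketch {

    /** Joins attribute values with '#', e.g. 4711#camunda#Berlin. */
    static String buildSearchField(List<String> values) {
        return String.join("#", values);
    }

    /** Builds a SQL LIKE pattern matching the term anywhere in the field. */
    static String likePattern(String term) {
        return "%" + term + "%";
    }
}
```

The combined string would be stored as one additional process variable used only for queries, so a tasklist filter needs one variable condition instead of one per attribute. The trade-off is that a leading-wildcard `LIKE` usually cannot use a regular index, so this is worth measuring before adopting it.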
  19. 19. Example TaskInfo - taskId - customerId - companyName - contractId - productName - … Your DB camunda PROCESS_VARIABLES customerId contractId productId 4711 0815 42 5 TaskInfo Entity (or ProcessInstanceInfo)
  20. 20.  The challenge: −Indexes cost space and performance in writing data −We provide a generic database schema without knowing what you exactly do with it −We constantly work on the right balance between too many and too few indexes  What you can do: −Check indexes and slow query log −Add index where appropriate for your situation (perfectly OK with us, you do not lose support!) −As Enterprise Customer you can always discuss/validate changes with support  Example: create index PROC_DEF_ID_END_TIME ON ACT_HI_PROCINST (PROC_DEF_ID_,END_TIME_) (History) Queries
  21. 21. You can also customize history Custom History (e.g. ElasticSearch) Different History Levels: - NONE - ACTIVITY - AUDIT - FULL - CUSTOM (own Filter written in Java, e.g. „only variable X“, „not process Y“, …) Example for custom log level: https://github.com/camunda/camunda-bpm-examples/tree/master/process-engine-plugin/custom-history-level
  22. 22. Job Execution
  23. 23.  Asynchronous Continuations involve Jobs  Jobs are stored in the database  Job Executor can be configured −Number of Worker Threads −Number of Jobs fetched with one database query −Size of in-memory Queue −Lock Time, Retry Behavior, …  Job Execution can be distributed over a Cluster  Optimizing is not a straightforward task, hard to give general advice  If you need to improve: Measure and benchmark configurations in your environment! Job Execution
  24. 24. The good news: We made big performance improvements in Camunda BPM 7.2!  Improved First Level Cache (throughput increased by up to 90% if async Service Tasks are executed in a row)  Improved locking to have fewer Optimistic Lock Exceptions and more Jobs acquired per Acquisition. This makes bigger Clusters possible. Job Execution in Camunda BPM 7.2
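To make the tuning knobs on slide 23 concrete, here is a small sketch using only the standard `java.util.concurrent` library: a fixed number of worker threads draining a bounded in-memory queue. It is not the Camunda Job Executor and does not use its configuration properties; the class `JobExecutorSketch` and the parameters `workerThreads`, `queueSize` and `jobCount` are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of worker threads plus a bounded job queue. */
final class JobExecutorSketch {

    /** Submits jobCount trivial jobs and returns how many were executed. */
    static int runJobs(int workerThreads, int queueSize, int jobCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workerThreads, workerThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        AtomicInteger executed = new AtomicInteger();
        for (int i = 0; i < jobCount; i++) {
            try {
                pool.execute(executed::incrementAndGet);
            } catch (RejectedExecutionException e) {
                // Queue is full. In this sketch the job is simply dropped;
                // a database-backed executor would leave it in the job
                // table and pick it up in a later acquisition cycle.
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed.get();
    }
}
```

Calling `runJobs` with different thread and queue sizes shows why no general setting exists: a queue that is too small rejects work, while too many threads only move the contention to the database. This is the kind of configuration to benchmark in a production-like environment.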
  25. 25. Recap:  Added log level “CUSTOM” for History  First Level Cache  Job Executor Acquisition Locking Plus:  Added flush ordering (comparable to Hibernate) to minimize risk of deadlocks Summary: Performance Improvements in 7.2
  26. 26. Learning #3: All performance challenges can be solved.
  27. 27. This is AWESOME!
  28. 28. Recommendation: Measure! No guessing. camunda engine Process Application External Load Generator e.g. JMeter, HP Load Runner, CURL, … REST „close to production“ environment
  29. 29. - Measure - JobExecutor Horizontal Scalability - Impact of 1st level cache reuse - Improvements Version 7.1.0 vs. Version 7.2.0 - Environment: Amazon AWS Cloud (EC2 & RDS) Benchmark
  30. 30. Benchmark Setup Client Process Engine Node 1 Process Engine Node 2 Process Engine Node 3 Process Engine Node 4 Start Process Instance (Rest API) Database (Postgres) https://github.com/meyerdan/ec2-benchmark EC2 m3.xlarge (Intel Xeon E5-2670 v2, 4 core, 15 GiB Memory) EC2 m3.xlarge (Intel Xeon E5-2670 v2, 4 core, 15 GiB Memory) EC2 db.m3.xlarge (Intel Xeon E5-2670 v2, 4 core, 15 GiB Memory) Provisioned using Docker
  31. 31. EC2
  32. 32. Benchmark Setup - The process - All service tasks „Async“ - 1st service task creates 5 variables - Variables are read by subsequent service tasks
  33. 33.  Throughput in terms of transactions / second  No absolute Numbers Benchmark Results
  34. 34. Benchmark Results (1)
  35. 35. Benchmark Results (1)
  36. 36. Benchmark Results (2)
  37. 37. Benchmark Results Cache Off Cache On Amazon RDS Metrics
  38. 38. Benchmark Results Cache Off Cache On Amazon RDS Metrics
  39. 39. What about true Horizontal Scalability?
  40. 40. What is Horizontal Scalability? Scale up the number of transactions executed by adding more processing nodes to the system. [*] [*] http://en.wikipedia.org/wiki/Scalability#Horizontal_and_vertical_scaling (Adapted) Horizontal Scalability transactions / sec nodes
  41. 41. The current Situation Scale number of Process Engine Nodes (JVMs) Up to a certain point Limited possibilities for scaling the shared relational Database. In a sense this can only be scaled “up”, not “out”. Shared Relational Database Process Engine Process Engine Process Engine
  42. 42. Which way to go? Distributed Datastore Process Engine Process Engine Process Engine Distributed Datastore. Use a database which is itself a distributed system and can be scaled horizontally. - Apache Cassandra, - Apache HBase, - Distributed Caches (Hazelcast, …) - ... Sharding and partitioning. Distribute the state over multiple Datastores. - Multiple instances of PostgreSQL - Each “DB” is a Mongo DB shard - No “DB” at all: use a filesystem journal? - ... Key Difference: on the right hand side, the process engine itself is “distributed” in the sense that it is aware of the distribution and sharding.
  43. 43. The problem with Distributed Datastores (In the context of process engines) 1. Consistency guarantees offered by these databases (eventual consistency, ACID vs. BASE, ...) often do not match the requirements of BPMN process execution. See: conflicting concurrent transactions: a. Racing incoming signals (E.g.: Two Messages targeting the same event instance arrive at the same time) b. Joins & Synchronization (E.g.: Gateways, Multi Instance, ...) c. Cancel Activity instance (E.g.: Interrupting Message Boundary Event) 2. Data Representation and Network Latency / Overhead: Process instance state is composite: a. Token state / active activity instances b. Variables c. Task Information, … Challenge is to find a data representation which does not lead to distribution of the state of a single process instance across the cluster while still supporting the required access patterns. 3. Significant differences between individual technologies while there are no industry standards in place yet. (Different with SQL).
  44. 44. Sharding => Distributed yet Local Scale horizontally... Each “shard / node” maintains its state locally Partitioning workflow instance state - Each process instance lives inside a single shard / partition => local data consistency easy to guarantee, => easy to access efficiently => Support range of different persistence engines (Relational Database, Non-Relational Databases, …)
  45. 45. Process Engine Flexible Architecture ... Reality @ zalando 2014 Process Engine Process Engine The simplest case A single process engine node running on top of a conventional database. A medium Scenario Horizontally scale on top of a conventional database. Massive Compute Cluster 500 Nodes ? All of this should be possible with one unified architecture!
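The rule on slide 44 that each process instance lives inside a single shard can be illustrated with a simple routing function. The sketch below is one possible approach, not the design the speakers describe: it assumes a fixed number of shards and picks one by hashing the process instance id. `ShardRouter` and `shardFor` are hypothetical names.

```java
/** Hypothetical router that pins each process instance to one shard. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /**
     * Returns a shard index in [0, shardCount). The same id always maps
     * to the same shard, so all state of one instance stays local.
     */
    int shardFor(String processInstanceId) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(processInstanceId.hashCode(), shardCount);
    }
}
```

Routing by instance id only works when the id is known. A request such as "find the process instance for order 43543242" carries no instance id, which is why search and message correlation need a separate lookup.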
  46. 46. No more Search! The catch “Find Process Instance for order with ID 43543242” ?? ???
  47. 47. Human Workflow (Build Task Lists) History: Monitoring, Reporting, … Message Correlation When is „Search“ required?
  48. 48. Message Correlation The Problem to solve Workflow Instance State for order with ID 435345 Incoming Message: “customer cancelled Order with ID 435345”
  49. 49.  Yes, but for non-workflow execution Use Cases Use Search Index? (A)sync Updates Search Index (Near Realtime) Tasklist Queries, Monitoring,...
  50. 50. Vision History Tasks Core Process Execution Signal / Cancel Activity Instance by Id Correlate Message Query for List of Tasks Monitoring, Reports Real Time, Strongly Consistent Horizontally scalable through sharding Multiple persistence technologies possible Near Real Time, Eventually Consistent Use best technology for the Job. Async Event Stream
  51. 51. But still... History Tasks Core Process Execution Signal / Cancel Activity Instance by Id Correlate Message Query for List of Tasks Monitoring, Reports In the simplest case!
  52. 52. Learning #4: You can do true horizontal clustering with the engine which exists today! There is no need for No-SQL persistence in the core engine.
  53. 53. Learning #5: Camunda is really damn smart :-)
  54. 54. Camunda BPM Performance is already awesome However: We are continuously improving performance There are strategies to solve specific performance challenges There is no limit in scalability Summary
  55. 55. Start now! Open Source Edition • Download: www.camunda.org • Docs, Tutorials etc. • Forum • Meetings Enterprise Edition • Trial: www.camunda.com • Additional Features • Support, Patches etc. • Consulting, Training http://camunda.com/bpm/consultation/ info@camunda.com | US +1.415.800.3908 | DE +49 30 664040 900
  56. 56. Thank you! Questions?