1
2
Building Performant, Reliable,
Scalable Integrations with Mule ESB
Ryan Hoegg, Integration Architect, Confluex
Rupesh Ramachandran, Solutions Architect, MuleSoft
3
Service Level Agreement
First Class Requirements
Precision Matters
4
5
Reliability SLAs
Availability
Uptime
Time to Recovery
Message Loss
Maximum Guarantee
Detection
Recovery
6
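Because precision matters (slide 4), an availability target is best stated together with the downtime budget it implies. A worked example, using an illustrative 99.95% target that is not a number from the deck:

    \text{downtime budget} = (1 - A) \times 8766\,\text{h/yr} = (1 - 0.9995) \times 8766 \approx 4.4\,\text{h/yr}

Written this way, the uptime and time-to-recovery lines above become concrete: a single outage of several hours can consume the whole yearly budget.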
Scalability SLAs
Capacity
Peak
Sustained
(Degraded)
Message Volume
Message Size
7
Performance SLAs
Response Time
Throughput
Concurrency
Sustainability
8
Performance Tuning: Big picture
Mule ESB
JVM
Operating System
File System
Network
Downstream Systems
9
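The editor's note for slide #9 singles out JVM heap size and a low-pause GC algorithm as first-order knobs. In a standalone Mule 3.5 deployment those settings live in the Tanuki wrapper configuration. A minimal sketch of conf/wrapper.conf entries, assuming the CMS collector; the property indexes and sizes are placeholders to adapt to your installation:

    # conf/wrapper.conf -- illustrative JVM tuning entries.
    # Indexes must not collide with entries already present in the file.
    wrapper.java.additional.10=-Xms2048m
    wrapper.java.additional.11=-Xmx2048m
    # Concurrent mark-sweep chosen for low GC pause, per the slide note
    wrapper.java.additional.12=-XX:+UseConcMarkSweepGC
    # Log GC so pause times can be compared against the response-time SLA
    wrapper.java.additional.13=-verbose:gc

The OS-level items from the same note (ulimits, TCP/IP stack, HugePages) are set outside the JVM and are distribution-specific.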
Performance: Best Practices
Asynchronous vs Synchronous
Real-time vs Batch
Stateful vs Stateless (web scale)
On-Premise vs iPaaS
10
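The synchronous-vs-asynchronous choice on this slide maps directly onto Mule 3's flow processing strategies. A minimal sketch in Mule 3.5 XML (namespace declarations omitted; flow names, ports, and component classes are invented for illustration):

    <!-- Request-response with fast responses: one thread per request,
         no context-switching overhead -->
    <flow name="fastSyncApi" processingStrategy="synchronous">
        <http:inbound-endpoint host="0.0.0.0" port="8081" path="quote"
                               exchange-pattern="request-response"/>
        <component class="com.example.QuoteService"/>
    </flow>

    <!-- Longer-running work: SEDA-style queued stages keep threads from
         being held up while waiting on slow operations -->
    <flow name="slowBackOffice" processingStrategy="queued-asynchronous">
        <vm:inbound-endpoint path="jobs" exchange-pattern="one-way"/>
        <component class="com.example.LongRunningJob"/>
    </flow>

The editor's note also mentions non-blocking synchronous HTTP via Jetty as a third variant for high-concurrency request-response work.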
Performance: Manage SLAs
11
Performance: Case Study
Use Case:
API Gateway
XML to JSON
transforms
Mixed Payload Sizes
12
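A gateway flow of the shape this case study describes can be sketched in a few lines of Mule 3.5 XML. The host, port, and backend address below are invented for illustration; the XML-to-JSON transformer comes from Mule's JSON module:

    <flow name="apiGatewayProxy">
        <http:inbound-endpoint host="0.0.0.0" port="8081" path="api"
                               exchange-pattern="request-response"/>
        <!-- Transform the XML payload to JSON before proxying downstream -->
        <json:xml-to-json-transformer/>
        <http:outbound-endpoint address="http://backend.internal:8080/service"
                                exchange-pattern="request-response"/>
    </flow>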
Performance: Case Study
Test Case: Mule ESB 3.5 EE as API Gateway
Infrastructure: Amazon EC2 with 10GbE network
Throughput: ~8,000 TPS
Latency: ~5 ms
Scale: Linear scale-out to 6 boxes*
13
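A quick sanity check on these figures, assuming Little's Law (L = λW) holds at steady state:

    L = \lambda W \approx 8000\,\text{req/s} \times 0.005\,\text{s} = 40\ \text{requests in flight}

Only about 40 concurrent requests in flight are needed to sustain the reported rate, which is consistent with the note for slide #13 that the real ceiling was 10GbE saturation rather than concurrency.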
Performance Characteristics:
Mule ESB 3.5 EE
14
Making SLAs a Reality
Prioritize
Model
Measure
15
Tuning: Focusing on What Matters
Observe
Identify hot spot
Generate load
Compare with SLA
16
Tuning: Improve Scalability
Scale Up, Scale Out
SEDA
Store and Forward
Message Oriented Middleware
17
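Store and forward decouples a hot inbound stage from slower downstream processing, which is what enables out-of-process SEDA; swapping the VM queue for a JMS broker is how message-oriented middleware then supports scale-out across JVMs. A sketch in Mule 3.5 XML, assuming a persistent VM queue (namespaces omitted; queue and class names are hypothetical):

    <vm:connector name="persistentVm">
        <vm:queue-profile>
            <!-- Persist queued messages so they survive a restart -->
            <default-persistent-queue-store/>
        </vm:queue-profile>
    </vm:connector>

    <!-- Store: accept work quickly and hand it to the queue -->
    <flow name="acceptOrders">
        <http:inbound-endpoint host="0.0.0.0" port="8081" path="orders"
                               exchange-pattern="request-response"/>
        <vm:outbound-endpoint path="orders.queue" connector-ref="persistentVm"
                              exchange-pattern="one-way"/>
    </flow>

    <!-- Forward: drain the queue at whatever rate downstream sustains -->
    <flow name="processOrders">
        <vm:inbound-endpoint path="orders.queue" connector-ref="persistentVm"
                             exchange-pattern="one-way"/>
        <component class="com.example.OrderProcessor"/>
    </flow>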
Case Study: Gaming Platform
Public Beta Launch
“Code Complete”
Players are unforgiving
18
Case Study: Gaming Platform
1. Catalog Services, Estimate Load
2. Prioritize
3. Isolate Mule and Instrument (mock sketch below)
4. Generate Load
5. Observe
6. Tune
19
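Step 3's isolation worked by mocking the outbound services, per the editor's note for slide #19, so load tests measured Mule itself rather than the backends. A mock can be a trivial Mule flow; a sketch with an invented path and canned payload:

    <!-- Stands in for a real backend during load testing -->
    <flow name="mockPlayerService" processingStrategy="synchronous">
        <http:inbound-endpoint host="0.0.0.0" port="9090" path="players"
                               exchange-pattern="request-response"/>
        <set-payload value='{"playerId": 1, "status": "ok"}'/>
    </flow>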
Case Study: Gaming Platform
20
Case Study: Gaming Platform
21
Tuning: Improve Reliability
Reliable Acquisition Pattern
Transactions
Retry
Delegate
22
Case Study: Retail
Business Critical Integration
“Code Complete”
Losing Purchase Orders
23
Case Study: Retail
1. Determine failure modes
2. Decide how to respond
3. Induce and observe
4. Apply reliability patterns
24
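One concrete way step 4 can play out in Mule 3.5: a rollback exception strategy redelivers a failed purchase order a bounded number of times, then parks the poison message on a dead-letter queue instead of losing it. A sketch (namespaces omitted; queue and class names invented):

    <flow name="consumePurchaseOrders" processingStrategy="synchronous">
        <jms:inbound-endpoint queue="purchase.orders" connector-ref="jmsConnector">
            <jms:transaction action="ALWAYS_BEGIN"/>
        </jms:inbound-endpoint>
        <component class="com.example.PostToErp"/>
        <rollback-exception-strategy maxRedeliveryAttempts="3">
            <!-- After three failed redeliveries, route the message to a
                 dead-letter queue for inspection instead of dropping it -->
            <on-redelivery-attempts-exceeded>
                <jms:outbound-endpoint queue="purchase.orders.dlq"
                                       connector-ref="jmsConnector"/>
            </on-redelivery-attempts-exceeded>
        </rollback-exception-strategy>
    </flow>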
Case Study: Retail
25
Questions?
Please visit the Confluex and
MuleSoft experts in the Expo Hall
26


Editor's Notes

  • #3 Integration Software is Software
  • #4 Someone’s going to define them, intentionally or not. Personal story: emergent properties discovered in production. Instead, determine these requirements along with the functional ones. Define terms: Service Level Agreement, originally from telecom, informally used in integration work to describe precise non-functional requirements at system boundaries. OLA is another term you may see used.
  • #5 Opposing forces: tuning for reliability can negatively impact performance and scalability, … Complexity is the currency.
  • #6 Risk Management
  • #7 Degraded – optional, when useful. Introduce Rupesh Ramachandran, Solutions Architect, MuleSoft.
  • #8 Primary: response time, TPS. Secondary: how many concurrent users can be handled; is performance sustainable over time?
  • #9 Peak throughput and response times require tuning of the entire stack. For instance: JVM (heap size, GC algorithm for low GC pause); OS (Linux ulimits, TCP/IP stack, HugePages, etc.); file system (avoid excess logging; SSD vs HDD vs NFS); network (1 Gigabit Ethernet cannot handle over 125 MB/s per channel and quickly becomes the bottleneck). Downstream systems: backend systems such as databases, SOAP web services, JMS servers, WebSphere MQ, etc. being integrated by Mule need to be tuned and scaled to perform as well as or better than the ESB, or they become the limiting factor.
  • #10 Synchronous: request-response with fast response times – synchronous allows use of a single thread per request and avoids context-switching overhead. If a request takes too long, the thread is held up for a long time, not always on CPU: sub-optimal core usage. Non-blocking synchronous HTTP is available (use Jetty). Asynchronous: leverages SEDA and provides better CPU utilization for longer-running processes; threads are not held up. Real-time: typically synchronous, but not necessarily; all records are read into memory and processed at once, which is problematic with occasional spikes or large payloads. Batch: records processed in stages/batches, like ETL jobs. Stateful: typically required for long-running processes, for HA via an in-memory grid, and for state-based flow controls like aggregator, scatter-gather, etc. Stateless: no state means no persistence overhead, which helps achieve web scale.
  • #11 Managing SLAs per consumer. Get SLA metrics from API Analytics; works for all HTTP endpoints.
  • #12 Business-critical API. Back end: .NET SOAP web service and REST service. Payloads 10k–500k.
  • #13 8,000 TPS = 650+ million records per day (could go higher if not for 10GbE saturation). Average latency of 5 ms is round-trip from the test client; Mule-added latency is approximately 1 ms. The scale test was done on smaller boxes; it stopped at 6 because the 10 Gigabit Ethernet was saturated.
  • #14 Single Mule ESB. 2 GB heap on a 36 GB RAM box; 50% CPU used on a 24-vCPU box. SOAP proxy use case with data transformation from SOAP to REST.
  • #15 Prioritize: establishing a baseline one SLA at a time allows you to protect the most important things first. Model: how do you generate load, and how realistic is it? Measure: instrument, record, apply statistics.
  • #16 <60 seconds, describe iterative process
  • #17 Scale up is additive; scale out is multiplicative. SEDA: focuses threading controls on hot spots. Store and forward: supports out-of-process SEDA. MOM: supports scale-out.
  • #19 Work with the project manager to determine the list of services that will go live, and what realistic and peak load might be. Rank them by importance and by volume, so we work on the right one first. Mock outbound services; instrument with JMX, a profiler, and network monitoring. Model request patterns in a JMeter test plan and execute against the QA environment. Observe response time, error rate, CPU, memory, IO, threads, and network. See next slides.