Solving Big Data Challenges for Enterprise Application Performance Management


This presentation was given at the 38th International Conference on Very Large Data Bases (VLDB 2012).

Full paper and additional information available at:
http://msrg.org/papers/vldb12-bigdata

Abstract:
As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and the high data rates predominant in this monitoring use case.
In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in online advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and as our experience with the setup and configuration complexity of these data stores in an industry setting.


Transcript:

1. MIDDLEWARE SYSTEMS RESEARCH GROUP (MSRG.ORG)
   Solving Big Data Challenges for Enterprise Application Performance Management
   Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen (University of Toronto)
   Sergio Gomez-Villamor (Universitat Politecnica de Catalunya)
   Victor Muntes-Mulero (CA Labs Europe), Serge Mankowskii (CA Labs)
2. Agenda
   - Application Performance Management
   - APM Benchmark
   - Benchmarked Systems
   - Test Results
3. Motivation
4. Enterprise System Architecture
   [Diagram: clients, an identity manager, web servers, application servers, a message queue, a message broker, web services, a mainframe, and first- and third-party databases; every tier is instrumented with a monitoring agent.]
5. Application Performance Management
   [The same architecture diagram, now showing the agents on every tier reporting measurements to the APM store.]
6. APM in Numbers
   - Nodes in an enterprise system: 100 – 10K
   - Metrics per node: up to 50K, avg 10K
   - Reporting period: 10 sec
   - Data size: 100 B / event
   - Raw data: 100 MB/sec, 355 GB/h, 2.8 PB/y
   - Event rate at storage: > 1M events/sec
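   A quick sanity check of these rates in Python; the 1,000-node count is an assumed mid-size configuration (the slide gives a 100 – 10K range), the other figures are from the slide:

      # Back-of-the-envelope check of the APM data rates quoted above.
      nodes = 1_000               # assumed; slide gives a range of 100 - 10K
      metrics_per_node = 10_000   # average metrics per node
      period_s = 10               # reporting period in seconds
      event_bytes = 100           # bytes per event

      events_per_s = nodes * metrics_per_node // period_s
      bytes_per_s = events_per_s * event_bytes

      print(f"{events_per_s:,} events/s")                           # 1,000,000
      print(f"{bytes_per_s / 10**6:,.0f} MB/s")                     # 100 MB/s
      print(f"{bytes_per_s * 3600 / 2**30:,.0f} GiB/h")             # ~335 GiB/h
      print(f"{bytes_per_s * 3600 * 24 * 365 / 2**50:,.1f} PiB/y")  # ~2.8 PiB/y

   Up to unit conventions (GB vs GiB), this reproduces the slide's ~100 MB/sec, ~355 GB/h, and 2.8 PB/y.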
7. APM Benchmark
   - Based on the Yahoo! Cloud Serving Benchmark (YCSB)
   - Single table, CRUD operations
   - 25-byte key, 5 values (10 bytes each), 75 bytes / record
   - Five workloads
   - Scan length: 50 rows
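   The slides do not show the actual workload definitions, so the following is a hypothetical sketch of how this record layout maps onto YCSB's CoreWorkload properties, using the 50/50 read/write split of workload RW from the results slides:

      # Hypothetical YCSB property file approximating the APM benchmark's
      # table layout; the paper's real workload files may differ.
      workload_rw = {
          "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
          "fieldcount": 5,           # 5 values per record
          "fieldlength": 10,         # 10 bytes each (plus the 25-byte key)
          "readproportion": 0.50,    # workload RW: 50% reads
          "updateproportion": 0.50,  # 50% writes
          "scanproportion": 0.0,
          "insertproportion": 0.0,
          "maxscanlength": 50,       # scans read at most 50 rows
      }

      with open("workload_rw.properties", "w") as f:
          f.writelines(f"{k}={v}\n" for k, v in workload_rw.items())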
8. Benchmarked Systems
   - 6 systems: Cassandra, HBase, Project Voldemort, Redis, VoltDB, MySQL
   - Categories: key-value stores, extensible record stores, scalable relational stores; sharded, main-memory, and disk-based systems
   - Chosen by previous results, popularity, maturity, and availability
9. Classification of Benchmarked Systems
   [Matrix placing the six systems along two axes: storage architecture (disk-based, distributed in-memory, sharded) and data model (key-value stores, extensible row stores, scalable relational stores).]
10. Experimental Testbed
    Two clusters:
    - Cluster M (memory-bound): 16 nodes (plus a master node); 2x quad-core CPU, 16 GB RAM, 2x 74 GB HDD (RAID 0) per node; 10 million records per node (~700 MB raw); 128 connections per node (8 per core)
    - Cluster D (disk-bound): 24 nodes; 2x dual-core CPU, 4 GB RAM, 74 GB HDD per node; 150 million records on 12 nodes (~10.5 GB raw); 8 connections per node
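    The quoted raw sizes follow directly from the benchmark's 75-byte record:

       # Raw data volume implied by 75-byte records.
       record_bytes = 75
       print(f"Cluster M: {10_000_000 * record_bytes / 2**30:.2f} GiB per node")   # ~0.70 GiB
       print(f"Cluster D: {150_000_000 * record_bytes / 2**30:.2f} GiB in total")  # ~10.48 GiB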
11. Evaluation
    - Minimum of 3 runs per workload and system
    - Fresh install for each run
    - 10 minutes runtime
    - Up to 5 clients for 12 nodes, to make sure YCSB is not the bottleneck
    - Maximum achievable throughput reported
12. Workload W - Throughput
    - Cassandra dominates
    - Higher throughput for Cassandra and HBase, lower throughput for the other systems
    - Scalability not equally good across the web data stores
    - VoltDB best single-node throughput
13. Workload W – Latency Writes
    - Same latencies for Cassandra, Voldemort, VoltDB, Redis, and MySQL
    - HBase latency increased
14. Workload W – Latency Reads
    - Same read latency as in Workload R for Cassandra, Voldemort, Redis, and VoltDB
    - HBase latency in the second range
15. Workload R - Throughput
    - 95% reads, 5% writes
    - On a single node, the main-memory systems perform best
    - Linear scalability for the web data stores
    - Sublinear scalability for the sharded systems
    - Slowdown for VoltDB
16. Workload RW - Throughput
    - 50% reads, 50% writes
    - VoltDB highest single-node throughput
    - HBase and Cassandra throughput increases
    - MySQL and Voldemort throughput decreases
17. Workload RS – Latency Scans
    - HBase scan latency equal to its read latency in Workload R
    - MySQL latency very high due to full table scans
    - Similar but increased latency for Cassandra, Redis, and VoltDB
18. Workload RWS - Throughput
    - 25% reads, 25% scans, 50% writes
    - Cassandra, HBase, Redis, and VoltDB gain performance
    - MySQL performance drops to between 4 and 20 ops/sec
19. Bounded Throughput – Write Latency
    - Workload R, normalized: the maximum throughput from the previous tests is 100%
    - Write latency steadily decreases for most systems
    - HBase not as stable (but below 0.1 ms)
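    The bounded runs cap YCSB's offered load (via its target-throughput option) at fixed fractions of the unbounded maximum. A minimal sketch of deriving such targets; both numbers here are assumptions for illustration:

       # Derive bounded-throughput targets from the unbounded maximum.
       max_ops_per_s = 50_000  # assumed maximum from the unbounded run
       step_pct = 10           # assumed step size
       targets = [max_ops_per_s * pct // 100
                  for pct in range(100, 0, -step_pct)]
       print(targets)  # 100% of the maximum down to 10%, in 10% steps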
20. Bounded Throughput – Read Latency
    - Redis and Voldemort decrease slightly
    - HBase shows two states
    - Cassandra decreases linearly
    - MySQL decreases, then stays constant
21. Disk Usage
    - Raw data: 75 bytes per record, 0.7 GB / 10M records
    - Cassandra: 2.5 GB / 10M records
    - HBase: 7.5 GB / 10M records
    - No compression enabled
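    The storage overhead relative to raw data follows directly from these numbers:

       # Storage overhead per 10M records, relative to raw data.
       raw_gb, cassandra_gb, hbase_gb = 0.7, 2.5, 7.5
       print(f"Cassandra: {cassandra_gb / raw_gb:.1f}x raw")  # ~3.6x
       print(f"HBase:     {hbase_gb / raw_gb:.1f}x raw")      # ~10.7x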
22. Cluster D Results – Throughput
    - Disk-bound, 150M records on 8 nodes
    - More writes mean more throughput: Cassandra 26x, HBase 15x, Voldemort 3x
23. Cluster D Results – Latency Writes
    - Low, relatively stable write latency for all systems
    - Lowest for HBase
    - Equal for Voldemort and Cassandra
24. Cluster D Results – Latency Reads
    - Cassandra latency decreases with more writes
    - Voldemort latency low (5 – 20 ms)
    - HBase latency high (up to 200 ms)
    - Cassandra in between
25. Lessons Learned
    - YCSB: a client-to-host ratio of 1:3 works better than 1:2
    - Cassandra: optimal tokens are necessary for even data distribution (see the sketch below)
    - HBase: difficult setup; special JVM settings necessary
    - Redis: Jedis distributes keys unevenly
    - Voldemort: a higher number of connections might lead to better results
    - MySQL: eager scan evaluation
    - VoltDB: synchronous communication slow on multiple nodes
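    On the Cassandra token point: with the RandomPartitioner that Cassandra used at the time, balanced data placement requires evenly spaced initial tokens on the 0 to 2^127 ring. A minimal sketch of the standard recipe:

       # Evenly spaced initial tokens for Cassandra's RandomPartitioner,
       # whose token ring spans 0 .. 2**127 (one token per node).
       def initial_tokens(num_nodes):
           return [i * 2**127 // num_nodes for i in range(num_nodes)]

       for node, token in enumerate(initial_tokens(4)):
           print(f"node {node}: initial_token = {token}")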
26. Conclusions I
    - Cassandra: the winner in terms of scalability and throughput
    - HBase: low write latency, high read latency, low throughput
    - Redis: read and write latency stable at a low level
    - Voldemort: scales well, latency decreases at higher scales
    - Sharded MySQL: standalone has high throughput, but the sharded version does not scale well
    - VoltDB: high single-node throughput, does not scale for synchronous access
27. Conclusions II
    - Cassandra's performance is close to the APM requirements; additional improvements are needed to reliably sustain the APM workload
    - Future work: benchmark the impact of replication and compression; monitor the benchmark runs with an APM tool
28. Thanks! Questions?
    Contact: Tilmann Rabl, tilmann.rabl@utoronto.ca
