2. CONTENTS
Covered topics:
• What is profiling? How do profilers work?
• What problems can affect performance?
• How to profile a distributed application?
• Gathering, storing, and analyzing stack traces
• Memory analysis
• Use case
• Alternative approaches to profiling
3. WHAT IS A PROFILER?
A profiler is a tool that shows which parts of your application are running slowly.
Examples: VisualVM, YourKit.
4. HOW DO PROFILERS WORK?
• Instrumenting: adding extra bytecode to your methods to record when they are called and how long they execute.
• Sampling: periodically taking dumps of all the threads in order to understand how much CPU time each method takes.
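As a source-level illustration of the instrumenting approach, the sketch below shows what a bytecode rewriter effectively turns a method into: an injected prologue/epilogue that records elapsed time per method. The `record`/`TIMINGS` collector is a hypothetical stand-in, not code from any real profiler:

```java
import java.util.HashMap;
import java.util.Map;

public class InstrumentationSketch {
    // Hypothetical collector: sums elapsed nanoseconds per method name.
    static final Map<String, Long> TIMINGS = new HashMap<>();

    static void record(String method, long elapsedNanos) {
        TIMINGS.merge(method, elapsedNanos, Long::sum);
    }

    // The original method body, wrapped in entry/exit timing the way an
    // instrumenting profiler would rewrite it at the bytecode level.
    static long sumUpTo(int n) {
        long start = System.nanoTime();            // injected prologue
        try {
            long sum = 0;
            for (int i = 1; i <= n; i++) sum += i; // original body
            return sum;
        } finally {
            record("sumUpTo", System.nanoTime() - start); // injected epilogue
        }
    }

    public static void main(String[] args) {
        System.out.println(sumUpTo(100));                   // prints 5050
        System.out.println(TIMINGS.containsKey("sumUpTo")); // prints true
    }
}
```

Real instrumenting profilers do this rewriting on class bytes at load time, so the source never changes.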
5. DIFFICULTIES WITH A CLUSTER
This is a typical MapReduce application running on a Hadoop cluster. All blue boxes are separate JVM processes running on different machines. Question: how can we profile a distributed Java app?
1. How do we attach to a process running on another host?
2. How do we track the appearance of new processes?
3. How do we gather the profiling data?
4. How do we analyze this vast amount of data?
6. WHY DO WE NEED A CLUSTER PROFILER?
Answer: we need a profiler to get more performance.
The Hadoop principle is:
"If you want more performance, add more hardware."
This is a truth. But it is not the only truth.
Another truth is: there are problems common to ALL applications (both distributed and local).
7. PROBLEM 1: SUBOPTIMAL CODE
This is a simple example of two different algorithms solving the same task:

public static void QuickSort(int[] a, int x, int y) {
    int pivot = (x + y) / 2;
    int apivot = a[pivot];
    int i = x;
    int j = y;
    while (i <= j) {
        while (a[i] < apivot) i++;
        while (a[j] > apivot) j--;
        if (i <= j) {
            int temp = a[i];
            a[i] = a[j];
            a[j] = temp;
            i++;
            j--;
        }
    }
    if (x < j)
        QuickSort(a, x, j);
    if (i < y)
        QuickSort(a, i, y);
}

public static void StupidSort(int[] a) {
    for (int i = 0; i < a.length - 1; ++i)
        for (int j = i + 1; j < a.length; ++j)
            if (a[i] > a[j]) {
                int temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
}

"Brute-force" sort: O(N^2). Quicksort: O(N*log(N)).
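To see the asymptotic gap in practice, one can time both methods on the same random input. The sketch below repeats the two sorts so it is self-contained; absolute timings depend on the machine, so none are claimed:

```java
import java.util.Arrays;
import java.util.Random;

public class SortBenchmark {
    // Same quicksort as on the slide.
    static void quickSort(int[] a, int x, int y) {
        int apivot = a[(x + y) / 2];
        int i = x, j = y;
        while (i <= j) {
            while (a[i] < apivot) i++;
            while (a[j] > apivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        if (x < j) quickSort(a, x, j);
        if (i < y) quickSort(a, i, y);
    }

    // Same O(N^2) "brute-force" sort as on the slide.
    static void stupidSort(int[] a) {
        for (int i = 0; i < a.length - 1; ++i)
            for (int j = i + 1; j < a.length; ++j)
                if (a[i] > a[j]) { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(20_000, 0, 1_000_000).toArray();
        int[] a = data.clone(), b = data.clone();

        long t0 = System.nanoTime();
        quickSort(a, 0, a.length - 1);
        long quickMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        stupidSort(b);
        long stupidMs = (System.nanoTime() - t0) / 1_000_000;

        // Both produce the same sorted array; the O(N^2) version is far slower.
        System.out.println(Arrays.equals(a, b));
        System.out.println("quickSort: " + quickMs + " ms, stupidSort: " + stupidMs + " ms");
    }
}
```

This is exactly the kind of difference a profiler surfaces: the O(N^2) method dominates the CPU samples as the input grows.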
8. PROBLEM 2: BAD CODE/DATA
• Repeatedly doing the same unnecessary actions
Example: re-reading the configuration file or a database table again and again during every operation (although we could cache it in memory).
• Wrong usage of someone else's code/libraries/binaries
Example: sqoop can import from MySQL in two modes: direct mode (using mysqldump and mysqlimport) and JDBC mode. The first one is faster.
• Usage of the wrong libraries
Example: https://powercollections.codeplex.com/workitem/16950
I found that the famous Wintellect OrderedSet works 3 times slower than Microsoft's native SortedSet.
• Absence of indexes in a database
Example: "select * from fact join dim on fact.productid = dim.productid" is slow because the developers forgot to create keys/indexes.
• Bugs in famous libraries/frameworks
Example: http://ihorbobak.com/index.php/2015/06/03/spark-sql-bad-performance/ describes a problem with joining tables A->B->C when they are enumerated in the order A, C, B. This is handled fine by all database servers, but NOT by Spark SQL.
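The first bullet (re-reading a config on every operation) is the easiest to fix: load once and keep it in memory. A minimal sketch, where `loadFromDisk` is a hypothetical stand-in for the expensive file or database read:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConfigCache {
    // Counts how many times the expensive load actually happened.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical expensive operation (re-reading a file / DB table).
    static String loadFromDisk(String name) {
        LOADS.incrementAndGet();
        return "contents-of-" + name;
    }

    // Cache: each distinct config is loaded once; later calls hit memory.
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static String getConfig(String name) {
        return CACHE.computeIfAbsent(name, ConfigCache::loadFromDisk);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) getConfig("site.xml"); // 1000 lookups
        System.out.println(LOADS.get()); // prints 1: loaded only once
    }
}
```

In a flame graph this shows up directly: the wide bar over the repeated read collapses after the cache is introduced.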
9. PROBLEM 3: HARDWARE TROUBLES
The two most important problems are:
• Disk problems (slow I/O speed)
• Network problems (slow bandwidth, packet loss)
10. CLUSTER PROFILER ARCHITECTURE
The pipeline: a Java process runs "injected" agent code that samples stack traces; the stack traces are sent every 10 seconds over HTTP; a set of Python/Perl scripts then turns them into visualizations in the form of flame graphs.
This is applicable to any Java process: mapper, reducer, etc., and not only to Hadoop: it can be Spark RDD code, Java web app code, etc.
11. HOW DOES THE JAVA AGENT WORK?
• The agent is bound to a Java process by specifying the -javaagent parameter, e.g.
java -javaagent:/path/agent.jar=parameters MainClass
or by overriding _JAVA_OPTIONS like this:
_JAVA_OPTIONS='-javaagent:/path/agent.jar=parameters'
• The agent's jar has a manifest with
Premain-Class: namespace.TheAgentClass
• "TheAgentClass" has a premain() method that executes before your main() and does the following:
– reads the parameters of the agent;
– constructs the profiler instances (based on the parameters);
– creates a ScheduledExecutorService (see java.util.concurrent) that does
scheduleAtFixedRate(worker, 0, 10, TimeUnit.SECONDS)
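A minimal sketch of such an agent class, assuming the jar's manifest names it in `Premain-Class`; the class and parameter names are illustrative, not the actual statsd-jvm-profiler code:

```java
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SamplingAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // 1. Read the agent parameters (e.g. "server=host,port=8086").
        System.out.println("agent args: " + agentArgs);

        // 2. Construct the sampling worker (real profilers collect and
        //    report stack traces here; this stub only counts threads).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Runnable worker = () ->
                System.out.println("live threads: " + threads.getThreadCount());

        // 3. Schedule it on a daemon thread so the agent never keeps the
        //    JVM alive after the application's main() finishes.
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "profiler");
                    t.setDaemon(true);
                    return t;
                });
        scheduler.scheduleAtFixedRate(worker, 0, 10, TimeUnit.SECONDS);
    }
}
```

Packaged in a jar whose manifest contains `Premain-Class: SamplingAgent`, this runs before the application's main() exactly as described above.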
12. HOW DOES THE JAVA AGENT WORK?
The profiler thread collects stack traces 100 times per second using ThreadMXBean (part of JMX, a technology for monitoring and managing the JVM):
public void profile() {
    profileCount++;
    try {
        for (ThreadInfo thread : getAllRunnableThreads()) {
            if (thread.getStackTrace().length > 0) {
                String traceKey = StackTraceFormatter.formatStackTrace(thread.getStackTrace());
                if (filter.includeStackTrace(traceKey))
                    traces.increment(traceKey, 1);
            }
        }
    } catch (OutOfMemoryError ex) {
        // ... skipping code for handling OOM (just for safety)
    }
    if (profileCount == reportingFrequency) {
        profileCount = 0;
        recordMethodCounts();
    }
}
For more information about JMX, see:
https://docs.oracle.com/javase/tutorial/jmx/index.html
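The helper `getAllRunnableThreads()` is not shown on the slide; one plausible implementation with `ThreadMXBean`, plus a simple trace formatter, might look like this (a sketch, not the profiler's actual code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StackSampler {
    static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    static final Map<String, Long> traces = new HashMap<>();

    // Dump all threads with full stacks and keep only RUNNABLE ones.
    static List<ThreadInfo> getAllRunnableThreads() {
        List<ThreadInfo> result = new ArrayList<>();
        for (ThreadInfo info : THREADS.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.RUNNABLE)
                result.add(info);
        }
        return result;
    }

    // Format a stack bottom-up as "outer;inner;innermost", the shape
    // flame-graph tooling expects.
    static String formatStackTrace(StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        for (int i = stack.length - 1; i >= 0; i--) {
            if (sb.length() > 0) sb.append(';');
            sb.append(stack[i].getClassName()).append('.')
              .append(stack[i].getMethodName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One sampling tick: count each distinct runnable stack once.
        for (ThreadInfo t : getAllRunnableThreads())
            if (t.getStackTrace().length > 0)
                traces.merge(formatStackTrace(t.getStackTrace()), 1L, Long::sum);
        System.out.println("distinct traces: " + traces.size());
    }
}
```

Running this once per 10 ms tick and periodically flushing `traces` to a backend is the whole sampling loop in miniature.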
13. STATSD + MY CHANGES
I made a modification of the well-known StatsD JVM profiler: https://github.com/etsy/statsd-jvm-profiler
List of my changes:
• Added the jvmName and host tags to each stack trace;
• Optimized performance of the stack trace collection code;
• Improved stability: added catching of OutOfMemoryError;
• Added statistics showing how many lines and characters we pass to the backend;
• Seriously modified influxdb_dump.py: it now extracts data into a set of distinct files: one for each JVM, one for each host, and a total;
• Added extraction of memory information and rendering it as charts in R;
• Added call_tree.py, a script for analyzing method call trees;
• Added some helper scripts.
14. INFLUXDB
What is InfluxDB?
It is a time series, metrics, and analytics database.
Targeted at: gathering metrics (like response times, CPU load), sensor data, events (like exceptions), and real-time analytics.
Key features:
• SQL-like query language;
• HTTP(S) API for data ingestion and queries;
• Built-in support for other data protocols such as collectd;
• Has a CLI and a web interface;
• Tagged data for fast and efficient queries.
16. INFLUXDB QUERY EXAMPLES
Schema exploration examples:
• SHOW MEASUREMENTS
shows the list of measurements
• SHOW SERIES FROM /.*cpu.*/
shows the list of series for each measurement whose name matches the pattern /.*cpu.*/
• SHOW TAG KEYS FROM /.*heap.*/
shows the distinct tag keys from measurements that match the pattern
• SHOW TAG VALUES FROM /.*cpu.*/ WITH KEY = jvmName
shows the distinct values of the jvmName tag from measurements that match the pattern
Data exploration examples:
• SELECT * FROM cpu WHERE host = 'A'
selects series from the "cpu" measurement with tag host='A'
• SELECT percentile(value, 95) FROM response_times
WHERE time > now() - 1d
GROUP BY time(1m)
shows the 95th percentile of response times over the last day in 1-minute intervals
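Such queries can also be issued programmatically through InfluxDB's HTTP API (the /query endpoint on the default port 8086). A small Java sketch that only builds the request URL; the host and database names are placeholders for your own installation:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class InfluxQuery {
    // Build a query URL for InfluxDB 0.9's HTTP API.
    static String queryUrl(String host, String db, String q) {
        try {
            return "http://" + host + ":8086/query?db="
                    + URLEncoder.encode(db, "UTF-8")
                    + "&q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String url = queryUrl("influx-host", "profiler", "SHOW MEASUREMENTS");
        System.out.println(url);
        // Fetching this URL (e.g. with java.net.HttpURLConnection) returns
        // the result set as JSON.
    }
}
```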
17. FLAME GRAPHS
Gathered stack traces (one sample every 10 ms):
A->B->C
A->B->C->D
A->B->C->D
A->B

Stacked by sample time (root frame A at the bottom, one column per sample); in the flame graph, identical adjacent frames are merged into one wider bar:

       D    D
  C    C    C
  B    B    B    B
  A    A    A    A
 0ms  10ms 20ms 30ms
THE WIDTH OF A BAR MATTERS.
Color doesn't matter and is selected just to distinguish bars.
18. FLAME GRAPHS
Flame graphs are a visualization of profiled software, allowing the most frequent code paths to be identified quickly and accurately.
Invented by Brendan Gregg: http://www.brendangregg.com
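Flame-graph tooling such as Brendan Gregg's flamegraph.pl consumes "collapsed" stacks: one line per unique stack, frames joined by ';', followed by a sample count. A sketch that folds the sampled traces from the previous slide into that format:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StackCollapse {
    // Fold a list of sampled stacks (root-first, e.g. "A->B->C") into the
    // collapsed format flamegraph.pl consumes: "A;B;C <count>".
    static Map<String, Integer> collapse(List<String> samples) {
        Map<String, Integer> folded = new LinkedHashMap<>();
        for (String s : samples)
            folded.merge(s.replace("->", ";"), 1, Integer::sum);
        return folded;
    }

    public static void main(String[] args) {
        List<String> samples =
                List.of("A->B->C", "A->B->C->D", "A->B->C->D", "A->B");
        collapse(samples).forEach((stack, count) ->
                System.out.println(stack + " " + count));
        // Output:
        // A;B;C 1
        // A;B;C;D 2
        // A;B 1
    }
}
```

Piping such output into flamegraph.pl produces the SVG; the counts become the bar widths, which is why the width of a bar matters.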
19. SEQUENCE OF ACTIONS
Steps to profile a cluster:
1. Install InfluxDB on a separate machine visible to all machines of the cluster. Create a database and a user.
2. Get the agent's jar file from my blog (or build it from sources) and put it into /var/lib on every worker node.
3. Change the configuration of the cluster: make _JAVA_OPTIONS="-javaagent…" available to all JVM processes.
4. Run your application and get the stack traces into InfluxDB. You may "switch off" _JAVA_OPTIONS after this.
5. Get the SVG files (flame graphs) from InfluxDB with the help of influxdb_dump.py and flamegraph_files.sh and do the analysis.
These steps are described in detail on my blog: http://ihorbobak.com
21. USE CASE WITH A REAL CUSTOMER
The app/inventory/environment:
• Our customer has an app that crawls data from a set of sites, parses it, and puts it into a Hadoop cluster (20 machines with 8 cores, 32 GB RAM, and 1 TB HDD each).
• The app leverages Apache Nutch, Cloudera Hadoop distribution version 5.3, HBase, MongoDB, and other technologies.
• There is a central Java web app (Java/Tomcat) that uses Nutch, which runs the MapReduce jobs.
The problem:
• The cluster crawls just 100 sites per day; the customer asked us: "how can we make it crawl 10 times more on the same hardware?"
22. FIRST FINDINGS
The first question that arose in my head: what exactly works slowly?
At the beginning I quickly found this: the slow parts are the ones that are I/O intensive.
24. FETCHER MAPREDUCE JOB
% of CPU time:
15% - HTML parsing
15% - Hadoop framework initialization code
7% - HDFS initialization code
22% - reducer code (BAD NEWS HERE)
18% - reading Hadoop XML config files
23% - real job
25. DRILL DOWN INTO THE REDUCER
Inside FetcherReducer.run(), the main time consumers are:
• org.apache.hadoop.hbase.catalog.MetaReader.fullScan()
• org.apache.avro.Schema$Parser.parse() (parsing the Avro schema), ending with ZipFile.read(), ZipFile.getEntry(), etc.
• org.apache.hadoop.hbase.client.HConnectionManager.createConnection()
• creating a record writer
26. DRILL DOWN INTO THE RECORD WRITER
This is Gora library code. The most visible function calls on top are:
java.util.zip.*
FileInputStream*
FileOutputStream*
28. INEFFECTIVE MEMORY MANAGEMENT
Most of the Java processes used significantly less memory than they were initially assigned.
Legend:
• init - the initial amount of memory that the JVM requests from the OS during startup;
• used - the amount of memory currently used;
• committed - the amount of memory that is guaranteed to be available for use by the Java virtual machine;
• max - the maximum amount of memory (in bytes) that can be used for memory management.
A memory allocation may fail if it attempts to increase the used memory such that used > committed, even if used <= max would still be true.
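The four numbers in the legend come straight from JMX; a minimal sketch reading them for the heap via MemoryMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapStats {
    public static void main(String[] args) {
        // The same four numbers the memory charts are built from.
        MemoryUsage heap =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("init:      " + heap.getInit());
        System.out.println("used:      " + heap.getUsed());
        System.out.println("committed: " + heap.getCommitted());
        System.out.println("max:       " + heap.getMax());
        // Invariants: used <= committed, and committed <= max
        // (when max is defined, i.e. not -1).
    }
}
```

Sampling these values periodically from the agent is enough to draw the init/used/committed/max charts for every JVM in the cluster.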
29. PROBLEMS AND NEXT STEPS
1) Gora + HBase
Reasons: bad code in Gora (too many metadata full table scans).
Actions:
• check Gora's configuration, dive into the code to find out why it does full scans;
• try Cassandra instead of HBase.
2) Hadoop framework parts, in particular:
• HDFS initialization in MapReduce jobs (slow communication with the Namenode);
• reading configuration files (done with the Xerces library).
Possible reasons:
• bad I/O speed and bad network speed;
• there may be some parameterization of the XML config parsing that we're not aware of.
Actions:
• fix the hardware issues;
• search for why Hadoop XML config parsing may be so slow;
• check Namenode memory usage.
30. OTHER METHODS OF GETTING STACK TRACES
Another method to get stack traces is Linux's perf_events:
perf record -F 99 -g -p PID
perf record -e L1-dcache-load-misses -c 10000 -ag -- sleep 5
Perf monitors:
• hardware events (e.g. level 2 cache misses);
• software events (e.g. CPU migrations);
• tracepoint events (e.g. filesystem I/O, TCP events).
Perf can also do:
• sampling: collecting snapshots at some frequency (by timer);
• dynamic tracing: instrumenting code to create events in any location (using the kprobes or uprobes frameworks).
For more details see: http://www.brendangregg.com/perf.html
31. PERF vs. JAVA AGENT
Advantages of perf over the Java agent:
• low overhead when getting stack traces;
• combines user (Java) calls and kernel calls in one flame graph;
• will catch 100% of Java methods (no matter that the JVM may exclude safepoint checks from hot methods); http://chriskirk.blogspot.com/2013/09/what-is-java-safepoint.html is a good explanation of safepoints.
Disadvantages of perf:
• cannot get Java stack traces out of the box (it is necessary to fix frame-pointer-based stack walking in OpenJDK, as done by Netflix and Twitter);
• doesn't see Java symbols (hex numbers instead; a special agent is needed to add symbols: https://github.com/jrudolph/perf-map-agent);
• permissions to the symbol files must be configured;
• it is necessary to develop a service that will launch perf, get the stack traces, and pass them to a server.
32. PERF vs. JAVA AGENT
And… it happens that Netflix's product is open sourced…
33. CREDITS
Andrew Johnson
Software Engineer at Etsy
Previously: Explorys, Inc.
https://www.linkedin.com/in/ajsquared
Brendan Gregg
Senior Performance Architect at Netflix
Previously: Joyent, Oracle, Sun Microsystems
http://www.brendangregg.com/index.html
34. BLOGS/ARTICLES
Blogs:
• My blog article
http://ihorbobak.com/index.php/2015/08/05/cluster-profiling/
• Etsy's blog about the StatsD JVM Profiler
https://codeascraft.com/2015/01/14/introducing-statsd-jvm-profiler-a-jvm-profiler-for-hadoop/
https://codeascraft.com/2015/05/12/four-months-of-statsd-jvm-profiler-a-retrospective/
• Brendan Gregg's blog
http://www.brendangregg.com/blog/index.html
Source code:
• My modification of the StatsD JVM Profiler
https://github.com/ibobak/statsd-jvm-profiler
• Etsy's original StatsD JVM Profiler
https://github.com/etsy/statsd-jvm-profiler
• Brendan Gregg's FlameGraph
https://github.com/brendangregg/FlameGraph
Manuals:
• InfluxDB docs
https://influxdb.com/docs/v0.9/introduction/overview.html
• Overview of the JMX technology
https://docs.oracle.com/javase/tutorial/jmx/overview/index.html
• JVM Tool Interface
http://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#starting
35. BOOKS / VIDEOS
• Systems Performance: Enterprise and the Cloud
by Brendan Gregg
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
• Blazing Performance with Flame Graphs
by Brendan Gregg
https://www.youtube.com/watch?v=nZfNehCzGdw
• Linux Profiling at Netflix
by Brendan Gregg
https://www.youtube.com/watch?v=_Ik8oiQvWgo
• Profiling Java in Production
by Kaushik Srenevasan, Twitter University
https://www.youtube.com/watch?v=Yg6_ulhwLw0