Greg Banks
Staff Site Reliablity Engineer
LinkedIn
www.linkedin.com/in/gregnbanks
Java hates Linux.
Deal with it.
3
This is a story about Java and Linux
Java and Linux are a perfectly
matched pair
Except when they’re not
Then it’s fireworks
@misslexirose
4
Java: a portable application platform
supposedly
Alternatively: Java is written for
an imaginary OS
Which is sorta like the set-top
box Java was designed for
Then shoe-horned into Linux,
Solaris & Windows with hidden
portability layers
Kevin Fream adroit.blog
5
Garbage Collection
vs
Unix Virtual Memory
6
Garbage collection vs virtual memory
Since 3BSD (1979) every Unix-
like system’s virtual memory
subsystem has been designed
around locality of reference
www.mckusick.com
7
Locality of reference
A process’ memory space is
managed in fixed size pages
A program uses a working set of
pages frequently
Other pages are hardly ever
used. They can be swapped out
to disk, saving previous RAM
Brian0918 wikipedia.org CC BY-SA 3.0
8
Java behaves just like that
…until GC happens
Every page in the Java heap - a
multi-GB chunk of virtually
contiguous address space - is
read and written as fast as
possible.
While the application is stopped.
This has to be milliseconds fast
Javaworld.com
9
Daytrip to Failtown
If enough of the Java heap is
swapped out, GC can take
hundreds of seconds
Your service’s clients time out
Healthchecks fail
Latency spikes propagate
Your SLAs are shot
10
Real World Example
gc.log
2014-10-15T18:42:44.931+0000: 4651814.348: [GC 4651814.546: [ParNew
...
: 1152488K->274696K(1572864K), 106.7471350 secs] 15021460K-
>14143829K(32944128K), 106.9300350 secs] [Times: user=53.97 sys=518.37,
real=107.11 secs]
Total time for which application threads were stopped: 107.5402200 seconds
11
Why? Swap is SLOW
“If you’re swapping out, you’ve
already lost the battle”
Swapping in happens one page
at a time
Swap has none of the tricks used
to make filesystems fast
(readahead, contiguous extents)
Nationalgeographic.com
12
Deal with it
The easy way
Disable swap entirely
Remove /etc/fstab entries
Check with swapon –s
Affects all processes
OS-wide configuration change
We couldn’t do this for $reasons
13
Deal with it
The clever way
sysctl –w 
vm.swappiness=0
Well-known kernel tunable to
limit swapping
Doesn’t eliminate swapping
Behavior depends on kernel
version
It just didn’t work
14
Deal with it
The horrible/cunning way
Lock the Java heap in RAM
A native system call using
com.sun.jna
mlockall(MCL_CURRENT)
Call early in process lifetime
No more swapping
15
Resource Limits
RLIMIT_MEMLOCK limits how
much memory a process can lock
Kernel default is 64K, needs raising
echo 
"app - memlock unlimited" 
>>/etc/security/limits.conf
You could also give the process the
CAP_IPC_LOCK capability
Stephen Codrington CC BY 2.5
16
Checking it worked
For a process with ~32G heap (-Xmx32684m)
egrep '^Vm(Lck|RSS|Size|Swap)' /proc/$pid/status
VmSize: 48767632 kB
VmLck: 39652024 kB  needs to be > heap size
VmRSS: 35127672 kB
VmSwap: 26444 kB  needs to be small
The whole heap is locked and resident
Some non-heap allocations came along for the ride
Others are eligible for swapping
17
Sometimes You Just Need a
File Descriptor
18
Java Hides the OS
So you can write portable code
Also … so you can only write
portable code
But sometime you need access
to the underlying OS objects
Coachingbysubjectexperts.com
19
File Descriptors
File descriptors are the small
integers used to name open files
and sockets to Unix system calls.
int read(int fd, void*buf, int len)
Java hides these from you
because they're different on
Windows.
Stationeryinfo.com
20
The Goal
Gateway to the Rabbit Hole
Network server performance
analysis
Need the length of a socket's in-
kernel input queue
Not a complicated example, but
outside Java's API fence
Jenningswire.com
21
Doing it in C
FILE *stream = …
int length;
int r = ioctl(fileno(stream), FIONREAD, &length);
/* error handling */
22
Doing it in Java
…because the build system cannot cope with mixed C & Java
First we have to convince Java to let us call ioctl()
import com.sun.jna.Native;
private static native int ioctl(int fd, int cmd, IntByReference valuep) throws
LastErrorException;
Native.register(…)
private static final int FIONREAD = 0x541B;
23
Doing it in Java
part 2
Now we just need the Java
equivalent of C's fileno()
Given an InputStream object
return the Unix file descriptor
So simple…so impossible
24
FileDescriptor is Wrapped Tight
Class FileInputStream has
public FileDescriptor getFD();
And FileDescriptor has
private int fd;
But NO WAY TO GET IT
http://www.iconsdb.com/soylent-red-icons/private-4-icon.html
25
How to Unwrap a FileDescriptor ?
Well known trick using Reflection
…performance concerns
Tricks using Unsafe…are unsafe
sun.misc.SharedSecrets
This is the 2nd of the big ol' rugs
under which Sun swept all the
dust bunnies
literateforlife.org
26
SharedSecrets
public static JavaIOFileDescriptorAccess
getJavaIOFileDescriptorAccess();
public interface JavaIOFileDescriptorAccess {
public int get(FileDescriptor fd);
public long getHandle(FileDescriptor obj);
}
27
Deal With It: Simple, Fast, Obscure
import sun.misc.SharedSecrets;
public static int getFileDescriptor(FileDescriptor fd) … {
return SharedSecrets.getJavaIOFileDescriptorAccess().get(fd);
}
public static int getFileDescriptor(InputStream is) … {
return getFileDescriptor(((FileInputStream)is).getFD());
}
28
NFS, Big Directories, Oh My
29
The Problem
We take database backup
snapshots and copy them to a
directory on an NFS filer.
Directory has 1000s of files
A Java process lists this directory
and reports file names and sizes
SLOOOOWLY
Offtackleempire.com
30
How NFSv3 Works
when we run "ls"
A process on the NFS client
wants to read the contents of a
directory
getdents() system call
NFS client sends READDIR rpc
to the server, caches result
Gxdmtl.com
31
The READDIR call
greatly simplified
Maps to getdents()
Arguments
nfs_fh3, count
Results
list of {
fileid3 fileid; // inode #
filename3 name; // string
}
Theodysseyonline.com
32
Message Flow for "ls /d"
Process NFS Client NFS Server
getdents
local system calls network RPCs
READDIR
f1, f2, f3, …
33
The Trouble With READDIR
The "ls -l" problem
Only returns names, not file
attributes or a file handle
If the process does a stat() or
open() system call on the files,
the NFS client needs to do a
LOOKUP rpc, maybe GETATTR
Those RPCs are one per file
Reference.com
34
Message Flow for "ls -l /d"
Process NFS Client NFS Serverlocal system calls network RPCs
getdents
READDIR
f1, f2, …
stat(/d/f1)
LOOKUP f1
fh, attrs
stat(/d/f2)
LOOKUP f2
fh, attrs
…
For N files, N+1 RPCs
serialized on a
single thread
35
The READDIRPLUS call
READDIR but returns file
handles and attributes for each
file.
Frontloads the information
needed for the process to stat()
or open() a file.
Clipartkid.com
36
Call Sequence for "ls -l /d"
with READDIRPLUS
Process NFS Client NFS Serverlocal system calls network RPCs
getdents
READDIR
f1, f2, …
stat(/d/f1)
LOOKUP f1
fh, attrs
stat(/d/f2)
LOOKUP f2
fh, attrs
…
For N files, N+1 RPCs
serialized on a
single thread
37
So we just use READDIRPLUS right?
If only it were that easy
The NFS protocol puts an upper
limit on the encoded size of the
results
A really big directory, needs more
READDIRPLUS than READDIR
Tradeoff: READDIR is faster with
large directories if you don’t want
to stat() the files.
Iconfinder.com
38
Heuristics To The Rescue
On the first getdents() send a
READDIRPLUS
If the process open()s or stat()s
set a flag
On subsequent getdents() if the
flag is set, send READDIRPLUS
else READDIR
111emergency.co.uk
"ls" = processes which do
getdents  READDIRPLUS
getdents  READDIR
getdents  READDIR
39
These Patterns Are Optimal
"ls –l" = processes which do
getdents  READDIRPLUS
stat, stat, … (cached)
getdents  READDIRPLUS
stat, stat, … (cached)
getdents  READDIRPLUS
stat, stat, … (cached)
40
"ls -l /d" in Java
This is the solution you will find on StackOverflow
File dir = File("/d");
File[] list =
dir.listFiles();
for (File f: list) {
print(f.getName(),
f.length());
}
Steven James Keathley
41
java.io.File
Wraps a string filename
File.length()  stat()
File.listFiles() reads the whole
directory using N x getdents(),
returns File[]
The API forces this behavor
Stationeryinfo.com
C processes do
getdents  READDIRPLUS
stat, stat, … (cached)
getdents  READDIRPLUS
stat, stat, … (cached)
getdents  READDIRPLUS
stat, stat, … (cached)
42
"ls -l": C vs Java
on large directories
Java processes do
getdents  READDIRPLUS
getdents  READDIR
getdents  READDIR
stat, stat, … (cached)
stat  LOOKUP
stat  LOOKUP
…
100s x slower
43
Deal With It: Rewrite Using java.nio.files
new in Java 8
Path dir = Paths.get("/d");
DirectoryStream<Path> stream =
Files.newDirectoryStream(dir);
for (Path p: stream) {
File = p.toFile();
print(f.getName(), f.length());
}
The iterator object delays the getdents()until they're needed
44
Understatement
"[newDirectoryStream] may be
more responsive when working
with remote directories"
-- Java 8 Documentation
Theflyingtortoise.blogspot.com
45
GC: Log Hard
With a Vengeance
46
Java GC is Important
GC is a limiting factor on the
availability of Java services
In production, it's important to keep
GC behaving well
"You can't manage what you can't
measure"
 GC logging in production.
HearMeSayThis.org
47
GC Logging Options We Use
-Xloggc:$filename
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintTenuringDistribution
More data is better, right
Hank Rabe hankstruckpictures.com
48
What gets logged?
Major GC event
2017-03-12T23:19:01.480+0000: 3382210.059: Total time for which application threads were stopped: 0.0004373 seconds
2017-03-12T23:22:01.186+0000: 3382389.765: Application time: 179.7055362 seconds
2017-03-12T23:22:01.186+0000: 3382389.766: [GC (Allocation Failure) 3382389.766: [ParNew
Desired survivor size 134217728 bytes, new threshold 15 (max 15)
- age 1: 998952 bytes, 998952 total
- age 2: 12312 bytes, 1011264 total
- age 3: 10672 bytes, 1021936 total
- age 4: 10480 bytes, 1032416 total
- age 5: 753312 bytes, 1785728 total
- age 6: 11208 bytes, 1796936 total
- age 7: 149688 bytes, 1946624 total
- age 8: 9904 bytes, 1956528 total
- age 9: 11000 bytes, 1967528 total
- age 10: 10136 bytes, 1977664 total
- age 11: 10184 bytes, 1987848 total
- age 12: 10304 bytes, 1998152 total
- age 13: 92360 bytes, 2090512 total
- age 14: 176648 bytes, 2267160 total
- age 15: 70328 bytes, 2337488 total
: 539503K->10753K(786432K), 0.0073635 secs] 2119127K->1590653K(3932160K), 0.0075104 secs] [Times: user=0.09 sys=0.00, real=0.01 secs]
49
Byte & Line Rate of Logging
Major GC up to 1200 B
Minor GC ~200 B
Light Load Heavy Load
Bytes/day 3.4 MB/d 33.0 MB/d
Lines/day 39 KL/d 543 KL/d
Visitmelbourne.com
50
The Problem
The GC log is written from JDK's C
code in Stop-The-World
No userspace buffering. Every line
is a write() and an opportunity to
block in the kernel
If the root disk is heavily loaded, the
long tail latency can be 1-5 sec
Client timeouts typically 1-5 sec
Eureferendum.com
51
Possible Solutions
Disable GC logging. Flying blind.
Log4j 2.x async logging. GC iog is
written from C code
Log to a named FIFO. If the reader
process dies the app is blocked.
En.oxforddictionaries.com
52
Deal With It: Log To FUSE
Log to a file mounted on a FUSE
fileystem.
The filesystem's daemon accepts
writes and queues them in
userspace, writes to disk. Provides
asynchrony.
If the daemon dies, app write()s fail
immediately with ENOTCONN, app
continues.
53
DNS Lookup
It’s Easy, Right?
54
Java Hostname Lookup
Java makes this easy
String host = "www.example.com";
int port = 80;
Socket sock = Socket();
Sock.connect(host, port); 
Kabl00ey @ wikipedia.org CC BY-SA 3.0
55
Peeling The Onion
The easy APIs are layers of
wrappers around
class InetAddress {
public static InetAddress[]
getAllByName(String host) …
By default calls libc's
getaddrinfo()
Which calls Linux's NSCD
High Mowing Seed Company
56
Linux NSCD
Name Service Cache Daemon
Standard (in the glibc repo)
Runs locally on every box
Configurable. Supports multiple
name service providers, like a
DNS client.
Understands & obeys TTL
Theodysseyonline.com
57
Java's In-Process Cache
InetAddress conveniently caches
the results in RAM
For 30 sec. Ignores the TTL
returned from DNS
Java could have gotten this right
int __gethostbyname3_r(const char *name, …,
int32_t *ttlp, …);
58
Why Does Java Do This?
Hostname Resolver Latencies
Java cache 53 ± 2 µs
NSCD 72 ± 3 ms
DNS server 192 ± 8 ms
Measured in production using a
specially written Java program.
Norfolkwildlifetrust.co.uk
59
The Problem: Long TTLs
Most stable A/AAAA records are
configured with a TTL of 1h or 1d
Talking to NSCD every 30 sec is
wasteful
Yathin S Krishnappa CC BY-SA 3.0
60
Worse Problem: Short TTLs
One way of achieving load
balancing is with Round-Robin
DNS
Some implementations rely on
an ultra short TTL << 30sec to
achieve failover
spirit-animals.com
61
Deal With It: Disable Java's Cache
easy but impactful
Add to
$JAVA_HOME/lib/security/java.securit
y
networkaddress.cache.ttl=0
or (older)
sun.net.inetaddr.ttl=0
This is global to the host
62
Deal With It: Bypass Cache using Reflection
Reflection is almost never the answer
static InetAddress[] getAllByNameUncached(String hostname) {
Field field = InetAddress.class.getDeclaredField("impl");
field.setAccessible(true);
Object impl = field.get(null);
Method method = null;
for (Method m : impl.getClass().getDeclaredMethods()) {
if (m.getName().equals("lookupAllHostAddr")) {
method = m;
break;
}
}
method.setAccessible(true);
return (InetAddress[]) method.invoke(impl, hostname);
}
63
Deal With It: DNS With JNDI
least worst option
Use com.sun.jndi.dns to make
your own DNS requests for
A/AAAA records
Quick, easy
Does not affect other lookups in
the same process
Bypasses nscd entirely; all
lookups talk to DNS server.
64
Deal With It: DNS SRV Records with JNDI
very useful and very hard
Use com.sun.jndi.dns to make
your own DNS requests for SRV
records
Need to handle stale entries
Have to parse out SRV response
Does not affect other lookups in
the same process
Bypasses nscd entirely; all
lookups talk to DNS server.
65
…and the Lessons Learned
66
The Two Hardest Things in CS
1. Naming
2. Cache Invalidation
3. Off By One Errors
67
The Three Hardest Things in CS
1. Naming
2. Cache Invalidation
3. Off By One Errors
4. Making Java Behave Rationally
68
Java Is Not Magic
Understand that Java doesn’t
always do the right thing with
your OS.
Anne Zweiner
69
Horrible Things Live In The Corners
Modern software is complicated
and has corner cases. Bad bad
horrible things live there.
Thekidsshouldseethis.com
70
Portable code is nice
Working code is better
Do what is needful
Do not be afraid to subvert
Java’s portability fascism
Know your OS
rickele @ flickr.com
©2014 LinkedIn Corporation. All Rights Reserved.

Java Hates Linux. Deal With It.

  • 2.
    Greg Banks Staff SiteReliablity Engineer LinkedIn www.linkedin.com/in/gregnbanks Java hates Linux. Deal with it.
  • 3.
    3 This is astory about Java and Linux Java and Linux are a perfectly matched pair Except when they’re not Then it’s fireworks @misslexirose
  • 4.
    4 Java: a portableapplication platform supposedly Alternatively: Java is written for an imaginary OS Which is sorta like the set-top box Java was designed for Then shoe-horned into Linux, Solaris & Windows with hidden portability layers Kevin Fream adroit.blog
  • 5.
  • 6.
    6 Garbage collection vsvirtual memory Since 3BSD (1979) every Unix- like system’s virtual memory subsystem has been designed around locality of reference www.mckusick.com
  • 7.
    7 Locality of reference Aprocess’ memory space is managed in fixed size pages A program uses a working set of pages frequently Other pages are hardly ever used. They can be swapped out to disk, saving previous RAM Brian0918 wikipedia.org CC BY-SA 3.0
  • 8.
    8 Java behaves justlike that …until GC happens Every page in the Java heap - a multi-GB chunk of virtually contiguous address space - is read and written as fast as possible. While the application is stopped. This has to be milliseconds fast Javaworld.com
  • 9.
    9 Daytrip to Failtown Ifenough of the Java heap is swapped out, GC can take hundreds of seconds Your service’s clients time out Healthchecks fail Latency spikes propagate Your SLAs are shot
  • 10.
    10 Real World Example gc.log 2014-10-15T18:42:44.931+0000:4651814.348: [GC 4651814.546: [ParNew ... : 1152488K->274696K(1572864K), 106.7471350 secs] 15021460K- >14143829K(32944128K), 106.9300350 secs] [Times: user=53.97 sys=518.37, real=107.11 secs] Total time for which application threads were stopped: 107.5402200 seconds
  • 11.
    11 Why? Swap isSLOW “If you’re swapping out, you’ve already lost the battle” Swapping in happens one page at a time Swap has none of the tricks used to make filesystems fast (readahead, contiguous extents) Nationalgeographic.com
  • 12.
    12 Deal with it Theeasy way Disable swap entirely Remove /etc/fstab entries Check with swapon –s Affects all processes OS-wide configuration change We couldn’t do this for $reasons
  • 13.
    13 Deal with it Theclever way sysctl –w vm.swappiness=0 Well-known kernel tunable to limit swapping Doesn’t eliminate swapping Behavior depends on kernel version It just didn’t work
  • 14.
    14 Deal with it Thehorrible/cunning way Lock the Java heap in RAM A native system call using com.sun.jna mlockall(MCL_CURRENT) Call early in process lifetime No more swapping
  • 15.
    15 Resource Limits RLIMIT_MEMLOCK limitshow much memory a process can lock Kernel default is 64K, needs raising echo "app - memlock unlimited" >>/etc/security/limits.conf You could also give the process the CAP_IPC_LOCK capability Stephen Codrington CC BY 2.5
  • 16.
    16 Checking it worked Fora process with ~32G heap (-Xmx32684m) egrep '^Vm(Lck|RSS|Size|Swap)' /proc/$pid/status VmSize: 48767632 kB VmLck: 39652024 kB  needs to be > heap size VmRSS: 35127672 kB VmSwap: 26444 kB  needs to be small The whole heap is locked and resident Some non-heap allocations came along for the ride Others are eligible for swapping
  • 17.
    17 Sometimes You JustNeed a File Descriptor
  • 18.
    18 Java Hides theOS So you can write portable code Also … so you can only write portable code But sometime you need access to the underlying OS objects Coachingbysubjectexperts.com
  • 19.
    19 File Descriptors File descriptorsare the small integers used to name open files and sockets to Unix system calls. int read(int fd, void*buf, int len) Java hides these from you because they're different on Windows. Stationeryinfo.com
  • 20.
    20 The Goal Gateway tothe Rabbit Hole Network server performance analysis Need the length of a socket's in- kernel input queue Not a complicated example, but outside Java's API fence Jenningswire.com
  • 21.
    21 Doing it inC FILE *stream = … int length; int r = ioctl(fileno(stream), FIONREAD, &length); /* error handling */
  • 22.
    22 Doing it inJava …because the build system cannot cope with mixed C & Java First we have to convince Java to let us call ioctl() import com.sun.jna.Native; private static native int ioctl(int fd, int cmd, IntByReference valuep) throws LastErrorException; Native.register(…) private static final int FIONREAD = 0x541B;
  • 23.
    23 Doing it inJava part 2 Now we just need the Java equivalent of C's fileno() Given an InputStream object return the Unix file descriptor So simple…so impossible
  • 24.
    24 FileDescriptor is WrappedTight Class FileInputStream has public FileDescriptor getFD(); And FileDescriptor has private int fd; But NO WAY TO GET IT http://www.iconsdb.com/soylent-red-icons/private-4-icon.html
  • 25.
    25 How to Unwrapa FileDescriptor ? Well known trick using Reflection …performance concerns Tricks using Unsafe…are unsafe sun.misc.SharedSecrets This is the 2nd of the big ol' rugs under which Sun swept all the dust bunnies literateforlife.org
  • 26.
    26 SharedSecrets public static JavaIOFileDescriptorAccess getJavaIOFileDescriptorAccess(); publicinterface JavaIOFileDescriptorAccess { public int get(FileDescriptor fd); public long getHandle(FileDescriptor obj); }
  • 27.
    27 Deal With It:Simple, Fast, Obscure import sun.misc.SharedSecrets; public static int getFileDescriptor(FileDescriptor fd) … { return SharedSecrets.getJavaIOFileDescriptorAccess().get(fd); } public static int getFileDescriptor(InputStream is) … { return getFileDescriptor(((FileInputStream)is).getFD()); }
  • 28.
  • 29.
    29 The Problem We takedatabase backup snapshots and copy them to a directory on an NFS filer. Directory has 1000s of files A Java process lists this directory and reports file names and sizes SLOOOOWLY Offtackleempire.com
  • 30.
    30 How NFSv3 Works whenwe run "ls" A process on the NFS client wants to read the contents of a directory getdents() system call NFS client sends READDIR rpc to the server, caches result Gxdmtl.com
  • 31.
    31 The READDIR call greatlysimplified Maps to getdents() Arguments nfs_fh3, count Results list of { fileid3 fileid; // inode # filename3 name; // string } Theodysseyonline.com
  • 32.
    32 Message Flow for"ls /d" Process NFS Client NFS Server getdents local system calls network RPCs READDIR f1, f2, f3, …
  • 33.
    33 The Trouble WithREADDIR The "ls -l" problem Only returns names, not file attributes or a file handle If the process does a stat() or open() system call on the files, the NFS client needs to do a LOOKUP rpc, maybe GETATTR Those RPCs are one per file Reference.com
  • 34.
    34 Message Flow for"ls -l /d" Process NFS Client NFS Serverlocal system calls network RPCs getdents READDIR f1, f2, … stat(/d/f1) LOOKUP f1 fh, attrs stat(/d/f2) LOOKUP f2 fh, attrs … For N files, N+1 RPCs serialized on a single thread
  • 35.
    35 The READDIRPLUS call READDIRbut returns file handles and attributes for each file. Frontloads the information needed for the process to stat() or open() a file. Clipartkid.com
  • 36.
    36 Call Sequence for"ls -l /d" with READDIRPLUS Process NFS Client NFS Serverlocal system calls network RPCs getdents READDIR f1, f2, … stat(/d/f1) LOOKUP f1 fh, attrs stat(/d/f2) LOOKUP f2 fh, attrs … For N files, N+1 RPCs serialized on a single thread
  • 37.
    37 So we justuse READDIRPLUS right? If only it were that easy The NFS protocol puts an upper limit on the encoded size of the results A really big directory, needs more READDIRPLUS than READDIR Tradeoff: READDIR is faster with large directories if you don’t want to stat() the files. Iconfinder.com
  • 38.
    38 Heuristics To TheRescue On the first getdents() send a READDIRPLUS If the process open()s or stat()s set a flag On subsequent getdents() if the flag is set, send READDIRPLUS else READDIR 111emergency.co.uk
  • 39.
    "ls" = processeswhich do getdents  READDIRPLUS getdents  READDIR getdents  READDIR 39 These Patterns Are Optimal "ls –l" = processes which do getdents  READDIRPLUS stat, stat, … (cached) getdents  READDIRPLUS stat, stat, … (cached) getdents  READDIRPLUS stat, stat, … (cached)
  • 40.
    40 "ls -l /d"in Java This is the solution you will find on StackOverflow File dir = File("/d"); File[] list = dir.listFiles(); for (File f: list) { print(f.getName(), f.length()); } Steven James Keathley
  • 41.
    41 java.io.File Wraps a stringfilename File.length()  stat() File.listFiles() reads the whole directory using N x getdents(), returns File[] The API forces this behavor Stationeryinfo.com
  • 42.
    C processes do getdents READDIRPLUS stat, stat, … (cached) getdents  READDIRPLUS stat, stat, … (cached) getdents  READDIRPLUS stat, stat, … (cached) 42 "ls -l": C vs Java on large directories Java processes do getdents  READDIRPLUS getdents  READDIR getdents  READDIR stat, stat, … (cached) stat  LOOKUP stat  LOOKUP … 100s x slower
  • 43.
    43 Deal With It:Rewrite Using java.nio.files new in Java 8 Path dir = Paths.get("/d"); DirectoryStream<Path> stream = Files.newDirectoryStream(dir); for (Path p: stream) { File = p.toFile(); print(f.getName(), f.length()); } The iterator object delays the getdents()until they're needed
  • 44.
    44 Understatement "[newDirectoryStream] may be moreresponsive when working with remote directories" -- Java 8 Documentation Theflyingtortoise.blogspot.com
  • 45.
  • 46.
    46 Java GC isImportant GC is a limiting factor on the availability of Java services In production, it's important to keep GC behaving well "You can't manage what you can't measure"  GC logging in production. HearMeSayThis.org
  • 47.
    47 GC Logging OptionsWe Use -Xloggc:$filename -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution More data is better, right Hank Rabe hankstruckpictures.com
  • 48.
    48 What gets logged? MajorGC event 2017-03-12T23:19:01.480+0000: 3382210.059: Total time for which application threads were stopped: 0.0004373 seconds 2017-03-12T23:22:01.186+0000: 3382389.765: Application time: 179.7055362 seconds 2017-03-12T23:22:01.186+0000: 3382389.766: [GC (Allocation Failure) 3382389.766: [ParNew Desired survivor size 134217728 bytes, new threshold 15 (max 15) - age 1: 998952 bytes, 998952 total - age 2: 12312 bytes, 1011264 total - age 3: 10672 bytes, 1021936 total - age 4: 10480 bytes, 1032416 total - age 5: 753312 bytes, 1785728 total - age 6: 11208 bytes, 1796936 total - age 7: 149688 bytes, 1946624 total - age 8: 9904 bytes, 1956528 total - age 9: 11000 bytes, 1967528 total - age 10: 10136 bytes, 1977664 total - age 11: 10184 bytes, 1987848 total - age 12: 10304 bytes, 1998152 total - age 13: 92360 bytes, 2090512 total - age 14: 176648 bytes, 2267160 total - age 15: 70328 bytes, 2337488 total : 539503K->10753K(786432K), 0.0073635 secs] 2119127K->1590653K(3932160K), 0.0075104 secs] [Times: user=0.09 sys=0.00, real=0.01 secs]
  • 49.
    49 Byte & LineRate of Logging Major GC up to 1200 B Minor GC ~200 B Light Load Heavy Load Bytes/day 3.4 MB/d 33.0 MB/d Lines/day 39 KL/d 543 KL/d Visitmelbourne.com
  • 50.
    50 The Problem The GClog is written from JDK's C code in Stop-The-World No userspace buffering. Every line is a write() and an opportunity to block in the kernel If the root disk is heavily loaded, the long tail latency can be 1-5 sec Client timeouts typically 1-5 sec Eureferendum.com
  • 51.
    51 Possible Solutions Disable GClogging. Flying blind. Log4j 2.x async logging. GC iog is written from C code Log to a named FIFO. If the reader process dies the app is blocked. En.oxforddictionaries.com
  • 52.
    52 Deal With It:Log To FUSE Log to a file mounted on a FUSE fileystem. The filesystem's daemon accepts writes and queues them in userspace, writes to disk. Provides asynchrony. If the daemon dies, app write()s fail immediately with ENOTCONN, app continues.
  • 53.
  • 54.
    54 Java Hostname Lookup Javamakes this easy String host = "www.example.com"; int port = 80; Socket sock = Socket(); Sock.connect(host, port);  Kabl00ey @ wikipedia.org CC BY-SA 3.0
  • 55.
    55 Peeling The Onion Theeasy APIs are layers of wrappers around class InetAddress { public static InetAddress[] getAllByName(String host) … By default calls libc's getaddrinfo() Which calls Linux's NSCD High Mowing Seed Company
  • 56.
    56 Linux NSCD Name ServiceCache Daemon Standard (in the glibc repo) Runs locally on every box Configurable. Supports multiple name service providers, like a DNS client. Understands & obeys TTL Theodysseyonline.com
  • 57.
    57 Java's In-Process Cache InetAddressconveniently caches the results in RAM For 30 sec. Ignores the TTL returned from DNS Java could have gotten this right int __gethostbyname3_r(const char *name, …, int32_t *ttlp, …);
  • 58.
    58 Why Does JavaDo This? Hostname Resolver Latencies Java cache 53 ± 2 µs NSCD 72 ± 3 ms DNS server 192 ± 8 ms Measured in production using a specially written Java program. Norfolkwildlifetrust.co.uk
  • 59.
    59 The Problem: LongTTLs Most stable A/AAAA records are configured with a TTL of 1h or 1d Talking to NSCD every 30 sec is wasteful Yathin S Krishnappa CC BY-SA 3.0
  • 60.
    60 Worse Problem: ShortTTLs One way of achieving load balancing is with Round-Robin DNS Some implementations rely on an ultra short TTL << 30sec to achieve failover spirit-animals.com
  • 61.
    61 Deal With It:Disable Java's Cache easy but impactful Add to $JAVA_HOME/lib/security/java.securit y networkaddress.cache.ttl=0 or (older) sun.net.inetaddr.ttl=0 This is global to the host
  • 62.
    62 Deal With It:Bypass Cache using Reflection Reflection is almost never the answer static InetAddress[] getAllByNameUncached(String hostname) { Field field = InetAddress.class.getDeclaredField("impl"); field.setAccessible(true); Object impl = field.get(null); Method method = null; for (Method m : impl.getClass().getDeclaredMethods()) { if (m.getName().equals("lookupAllHostAddr")) { method = m; break; } } method.setAccessible(true); return (InetAddress[]) method.invoke(impl, hostname); }
  • 63.
    63 Deal With It:DNS With JNDI least worst option Use com.sun.jndi.dns to make your own DNS requests for A/AAAA records Quick, easy Does not affect other lookups in the same process Bypasses nscd entirely; all lookups talk to DNS server.
  • 64.
    64 Deal With It:DNS SRV Records with JNDI very useful and very hard Use com.sun.jndi.dns to make your own DNS requests for SRV records Need to handle stale entries Have to parse out SRV response Does not affect other lookups in the same process Bypasses nscd entirely; all lookups talk to DNS server.
  • 65.
  • 66.
    66 The Two HardestThings in CS 1. Naming 2. Cache Invalidation 3. Off By One Errors
  • 67.
    67 The Three HardestThings in CS 1. Naming 2. Cache Invalidation 3. Off By One Errors 4. Making Java Behave Rationally
  • 68.
    68 Java Is NotMagic Understand that Java doesn’t always do the right thing with your OS. Anne Zweiner
  • 69.
    69 Horrible Things LiveIn The Corners Modern software is complicated and has corner cases. Bad bad horrible things live there. Thekidsshouldseethis.com
  • 70.
    70 Portable code isnice Working code is better Do what is needful Do not be afraid to subvert Java’s portability fascism Know your OS rickele @ flickr.com
  • 71.
    ©2014 LinkedIn Corporation.All Rights Reserved.

Editor's Notes

  • #15 https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/utils/JNAUtils.java#L45
  • #18 A version of this story appears in my blog post https://www.linkedin.com/pulse/getting-unix-file-descriptor-java-greg-banks
  • #23 https://github.com/gnb/voldemort/blob/roswap/src/java/voldemort/utils/UnixUtils.java#L125
  • #26 http://dev.xscheme.de/2013/05/getting-the-file-descriptor-integers-from-a-socket-java/
  • #27 http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/misc/SharedSecrets.java#l126 http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/misc/JavaIOFileDescriptorAccess.java#l35 Two implementations of the interface, one exists on Unix platforms and throws an exception for getHandle(), one vice-versa
  • #28 https://github.com/gnb/voldemort/blob/roswap/src/java/voldemort/utils/UnixUtils.java#L103
  • #29 Thanks to Gabe Nydick for bringing me in to troubleshoot NFS issues in his product
  • #32 https://tools.ietf.org/html/rfc1813#section-3.3.16
  • #36 https://tools.ietf.org/html/rfc1813#section-3.3.17
  • #39 https://github.com/torvalds/linux/commit/d69ee9b85541a69a1092f5da675bd23256dc62af
  • #41 http://stackoverflow.com/questions/15482423/how-to-list-the-files-in-current-directory
  • #42 https://docs.oracle.com/javase/7/docs/api/java/io/File.html
  • #44 https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#newDirectoryStream-java.nio.file.Path- http://stackoverflow.com/questions/33668859/files-newdirectorystream-vs-files-list is very very wrong, its not a matter of style
  • #45 https://docs.oracle.com/javase/8/docs/api/java/io/File.html#list--
  • #55 https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html
  • #56 https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getAllByName(java.lang.String)
  • #59 The comments in the JDK documentation and source cpde about security appear to be entirely bullcrap
  • #61 LinkedIn also has a very clever internal product that exposes applied cluster topology information with heartbeat via DNS. This service uses a 3sec TTL.
  • #63 Object impl = null; private static Method method = null; static InetAddress[] getAllByNameUncached(String hostname) ... { if (impl == null) { Field field = InetAddress.class.getDeclaredField("impl"); field.setAccessible(true); impl = field.get(null); for (Method m : impl.getClass().getDeclaredMethods()) { if (m.getName().equals("lookupAllHostAddr")) { method = m; break; } } method.setAccessible(true); } return (InetAddress[]) method.invoke(impl, hostname); }