Turn your Java Objects into
Binary for better performance
17 Nov 15
BEFORE WE BEGIN…
C24
3
BINARY WE'RE TALKING 1S & 0S
[Figure: digit place values shown as powers of 16 (HEX), 10 (DECIMAL) and 2 (BINARY), exponents 0 to 7; the binary example is 00010110]
BINARY WE'RE TALKING 1S & 0S
00010110
BYTE
INT 4 BYTES
LONG 8 BYTES
There are also shorts (16 bits), chars (16), floats (32), doubles (64) and booleans (undefined)
BINARY WE'RE TALKING 1S & 0S
Most of the Java primitives (for some bizarre reason)
are signed
So the “top” bit is used for the sign, 0 is positive and 1 is negative
Bitwise operators work on ints and longs
& (AND)
| (OR)
^ (XOR)
~ (NOT)
<< (left shift), >> (right shift with sign) and >>> (right shift with 0s)
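A quick sketch of the operators listed above, using the 00010110 byte from the earlier slide. The class name and the second operand are made up for illustration.

```java
// Illustrative only: names and values are not from the slides.
final class BitwiseDemo {
    public static void main(String[] args) {
        int a = 0b0001_0110;            // 22, the example byte
        int b = 0b0000_1111;            // 15, a mask for the low four bits

        System.out.println(a & b);      // AND keeps bits set in both: 6
        System.out.println(a | b);      // OR keeps bits set in either: 31
        System.out.println(a ^ b);      // XOR keeps bits that differ: 25
        System.out.println(~a);         // NOT flips every bit: -23

        System.out.println(a << 2);     // left shift: 88
        System.out.println(-16 >> 2);   // right shift, sign bit copied in: -4
        System.out.println(-16 >>> 28); // right shift, zeros shifted in: 15
    }
}
```

Note that >> keeps a negative number negative while >>> does not, which matters as soon as the top bit is part of your data.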
A QUICK TEST…
What is 0x20 in decimal?
32
What is special about 0x55 and 0xAA in hex?
They are 01010101 and 10101010 (alternate 1s and 0s)
How many bits are needed to store all of the months of
the year?
Just 4 bits, 2^4 = 16, more than enough
There are 365.25 days in a year, how many bits to store
100 years at day resolution?
16 bits (2 bytes), 2^16 = 65,536 or 179 years worth of days
What is ~-1?
NOT -1 is 0, i.e. every bit of 0xFFFFFFFF flipped (same result as -1 ^ -1)
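The answers above can be checked in a few lines of Java. This is only a sketch; the class and the toBinary8 helper are hypothetical names.

```java
// Illustrative only: QuickTestAnswers and toBinary8 are made-up names.
final class QuickTestAnswers {
    /** Formats the low 8 bits of a value as binary, keeping leading zeros. */
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return "00000000".substring(bits.length()) + bits;
    }

    public static void main(String[] args) {
        System.out.println(0x20);                 // 32
        System.out.println(toBinary8(0x55));      // 01010101
        System.out.println(toBinary8(0xAA));      // 10101010
        System.out.println(1 << 4);               // 16 values in 4 bits, enough for 12 months
        System.out.println((1 << 16) / 365.25);   // about 179.4 years of days in 16 bits
        System.out.println(~-1);                  // 0
        System.out.println(-1 ^ -1);              // 0
    }
}
```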
TO CALCULATE THE NUMBER OF BITS…
To calculate the number of bits needed we simply
divide the Log of the number by the Log of 2 (any base) and round up
E.g. 36,525 days in 100 years
Log 36,525 = 4.56259
Log 2 = 0.30103
Log 36525 / Log 2 = 15.15660
So we need 16 bits
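The same calculation as a small Java sketch. The class and method names are hypothetical; the second method is an integer-only alternative that is not on the slide.

```java
// Illustrative only: BitsNeeded and its methods are made-up names.
final class BitsNeeded {
    /** Bits needed for n distinct values: log(n) / log(2), rounded up. */
    static int viaLogs(long distinctValues) {
        return (int) Math.ceil(Math.log(distinctValues) / Math.log(2));
    }

    /** Integer-only alternative that avoids floating-point rounding; expects n >= 1. */
    static int viaLeadingZeros(long distinctValues) {
        return 64 - Long.numberOfLeadingZeros(distinctValues - 1);
    }

    public static void main(String[] args) {
        System.out.println(viaLogs(36_525));          // 16
        System.out.println(viaLeadingZeros(36_525));  // 16
        System.out.println(viaLeadingZeros(12));      // 4, the months of the year
    }
}
```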
JAVA IS VERY INEFFICIENT AT STORING DATA
Take the string “ABC”, typically it needs just 4 bytes
Three if you know the length won’t change
Even less if “ABC” is an enumeration
Java takes 48 bytes to store “ABC” as a String
And many other simple classes too
A B C 0
IT'S NOT JUST A STRING
If it was just String then we could use byte[] or char[] but Java
bloating is endemic
Double (even today’s date takes 48 bytes)
BigDecimal
Date
ArrayList
Use just one or two and we’re OK but write a class with a few
of these and we really start to see the problem
A class with 10 minimum sized Objects can be over 500 bytes
in size - for each instance
What’s worse is that each object requires 11 separate memory allocations
All of which need managing
Which is why we have the garbage collector(s)
A NEW DATE?
Could we create a new Date class?
class NewDate {
    private byte year;
    private byte month;
    private byte day;
}
Yes but this is still 16 bytes in Java, 8 times more than
we need
In C/C++ we could use bitfields, 7 for the year, 4 for the
month and 5 for the day.
Total just 16 bits or two bytes
This is our target, 2 bytes
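Java has no bitfields, but the same 7/4/5 split can be sketched with shifts and masks on a short. The base year of 2000 and all names below are assumptions, not something the slides specify.

```java
// Illustrative only: PackedDate and BASE_YEAR are assumptions.
final class PackedDate {
    private static final int BASE_YEAR = 2000; // 7 bits then cover 2000..2127

    /** Packs a date into 16 bits laid out as yyyyyyym mmmddddd. */
    static short pack(int year, int month, int day) {
        int y = year - BASE_YEAR;
        if (y < 0 || y > 127 || month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("date out of range");
        }
        return (short) ((y << 9) | (month << 5) | day);
    }

    // The masks discard the sign extension that happens when a short becomes an int.
    static int year(short packed)  { return BASE_YEAR + ((packed >>> 9) & 0x7F); }
    static int month(short packed) { return (packed >>> 5) & 0x0F; }
    static int day(short packed)   { return packed & 0x1F; }

    public static void main(String[] args) {
        short d = pack(2015, 11, 17);
        System.out.println(year(d) + "-" + month(d) + "-" + day(d)); // 2015-11-17
    }
}
```

The value still lives in a plain 2-byte short, so an array of a million dates is one allocation of about 2MB rather than a million objects.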
48 BYTES IS A LOT!
To store today’s date Java needs 48 bytes that looks
like this…
00101001 11000111 10100000 00010111
10100011 11100101 00010101 10010100
11101001 01000001 10101111 01010001
10100011 11100100 11010100 11010100
00101001 11000111 10100000 00010101
10100111 11100101 00010101 10010100
01101101 01000001 10101101 01010101
10100011 01100100 01010101 11010100
10001001 01000111 10100001 00010100
10100111 11100101 00010100 10010100
11101001 01000000 00101100 01010100
10100011 11100101 01010101 11010101
As opposed to this: 0x4174 or 01000001 01110100
The 48 bytes above are just random 0s and 1s for illustration but the 2-byte value is real (16,756)
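One reading of 16,756: it equals the number of days from 1 Jan 1970 to 17 Nov 2015, the date on the title slide. The slides don't spell out the encoding, so treat the sketch below (and its names) as an assumption.

```java
import java.time.LocalDate;

// Illustrative only: assumes the 2-byte value is a day count since 1970-01-01.
final class EpochDayDate {
    /** Stores a date as an unsigned 16-bit day count (a Java char). */
    static char encode(LocalDate date) {
        long days = date.toEpochDay();
        if (days < 0 || days > Character.MAX_VALUE) {
            throw new IllegalArgumentException("date does not fit in 16 bits");
        }
        return (char) days;
    }

    static LocalDate decode(char encoded) {
        return LocalDate.ofEpochDay(encoded);
    }

    public static void main(String[] args) {
        char encoded = encode(LocalDate.of(2015, 11, 17));
        System.out.println((int) encoded);                // 16756
        System.out.println(Integer.toHexString(encoded)); // 4174
        System.out.println(decode(encoded));              // 2015-11-17
    }
}
```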
START SIMPLE
Simple data we’re going to be referring to for the next few
slides…
ID  TradeDate   BuySell  Currency1  Amount1      Exchange Rate  Currency2  Amount2           Settlement Date
1   21/07/2014  Buy      EUR        50,000,000   1.344          USD        67,200,000.00     28/07/2014
2   21/07/2014  Sell     USD        35,000,000   0.7441         EUR        26,043,500.00     20/08/2014
3   22/07/2014  Buy      GBP        7,000,000    172.99         JPY        1,210,930,000.00  05/08/2014
4   23/07/2014  Sell     AUD        13,500,000   0.9408         USD        12,700,800.00     22/08/2014
5   24/07/2014  Buy      EUR        11,000,000   1.2148         CHF        13,362,800.00     31/07/2014
6   24/07/2014  Buy      CHF        6,000,000    0.6513         GBP        3,907,800.00      31/07/2014
7   25/07/2014  Sell     JPY        150,000,000  0.6513         EUR        97,695,000.00     08/08/2014
8   25/07/2014  Sell     CAD        17,500,000   0.9025         USD        15,793,750.00     01/08/2014
9   28/07/2014  Buy      GBP        7,000,000    1.8366         CAD        12,856,200.00     27/08/2014
10  28/07/2014  Buy      EUR        13,500,000   0.7911         GBP        10,679,850.00     11/08/2014
START WITH THE CSV
Each line is relatively efficient
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
But it’s in human-readable format not CPU readable
At least not efficiently CPU readable
We could store the lines as they are but in order to work
with the data we need it in something Java can work
with
The same goes for any other language, C, C++, PHP, Scala etc.
So typically we parse it into a Java class and give it a
self-documenting name - Row
CSV TO JAVA
This seems like a reasonably good implementation
From this…
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
We get this…
public class ObjectTrade {
    private long id;
    private Date tradeDate;
    private String buySell;
    private String currency1;
    private BigDecimal amount1;
    private double exchangeRate;
    private String currency2;
    private BigDecimal amount2;
    private Date settlementDate;
}
EVERYTHING’S FINE
With very simple getters and setters, something to
parse the CSV and a custom toString() we’re good
public BasicTrade parse( String line ) throws ParseException {
    // Split the CSV line and convert each field to its Java type
    String[] fields = line.split(",");
    setId(Long.parseLong(fields[0]));
    setTradeDate(DATE_FORMAT.get().parse(fields[1]));
    setBuySell(fields[2]);
    setCurrency1(fields[3]);
    setAmount1(new BigDecimal(fields[4]));
    setExchangeRate(Double.parseDouble(fields[5]));
    setCurrency2(fields[6]);
    setAmount2(new BigDecimal(fields[7]));
    setSettlementDate(DATE_FORMAT.get().parse(fields[8]));
    return this;
}
What could possibly go wrong?
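The DATE_FORMAT.get() calls in the parser suggest a ThreadLocal holding a SimpleDateFormat. The slides don't show the declaration, so the sketch below is an assumption, including the dd/MM/yyyy pattern (inferred from the CSV) and the class name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative only: TradeDates is a made-up helper class.
final class TradeDates {
    // One SimpleDateFormat per thread, because a shared instance is not thread safe.
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat format = new SimpleDateFormat("dd/MM/yyyy");
                format.setLenient(false); // reject dates such as 31/02/2014
                return format;
            });

    static Date parse(String text) {
        try {
            return DATE_FORMAT.get().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a dd/MM/yyyy date: " + text, e);
        }
    }

    static String format(Date date) {
        return DATE_FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(parse("21/07/2014"))); // 21/07/2014
    }
}
```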
CLASSIC JAVA
In fact everything works really well, this is how Java
was designed to work
There are a few “fixes” to add for SimpleDateFormat due to it not
being thread safe but otherwise we’re good
Performance is good, well it seems good and
everything is well behaved
As the years go on and the volumes increase, we now
have 100 million of them
Now we start to see some problems
To start with we don’t have enough memory - GC is killing
performance
When we distribute, the size of the objects is killing performance too
JAVA IS VERBOSE & SLOW
A simple CSV can grow by over 4 times…
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
From roughly 70 bytes per line as CSV to around 328 in
Java
That means you get over 4 times less data when stored in Java
Or need over 4 times more RAM, network capacity and disk
Serialized objects are even bigger!
JAVA BLOATS YOUR DATA
You probably thought XML was bad, imagine what
happens when you take XML and bind it to Java!
Anything you put into Java objects gets horribly bloated in memory
Effectively you are paying the price of memory and hardware abstraction
C AND C++ WERE RAW
Raw code, raw data, sometimes a little dangerous but
no garbage after consumption
JAVA IS PLASTIC WRAPPING
Everything is wrapped in Java and that creates
garbage, a LOT of garbage - It all needs collecting
JAVA OBJECTS - GOOD FOR HARDWARE VENDORS
These fat Java Objects are a hardware vendor’s wet
dream
Think about it, Java came from Sun, it was free but they made money
selling hardware, well they tried at least
Fat objects need more memory, more CPU, more
network capacity, more machines
More money for the hardware vendors - why would they change it?
And everything just runs slower because you’re busy
collecting all the memory you’re not using
Java for programmers was like free shots for AA members
THIS ISN’T JUST JAVA
Think this is just a Java problem?
It’s all the same, every time you create objects you’re
blasting huge holes all over your machine’s RAM
And someone’s got to clean all the garbage up too!
Great for performance-tuning consultants :-)
IN-MEMORY CACHES
If you use an in-memory cache then you’re most likely
suffering from the same problem…
Many of them provide and use compression or “clever”
memory optimisation
If they use reflection things slow down
If you have to write it yourself you have a lot of work to do, especially
for hierarchical data
CLASSIC JAVA BINDING
This is how we’d typically code this simple CSV
example…
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
public class ObjectTrade {
    private long id;
    private Date tradeDate;
    private String buySell;
    private String currency1;
    private BigDecimal amount1;
    private double exchangeRate;
    private String currency2;
    private BigDecimal amount2;
    private Date settlementDate;
}
It’s easy to write the code and fast to execute, retrieve,
search and query data BUT it needs a lot of RAM and it is
slow to manage
JUST STORE THE ORIGINAL DATA?
We could just store each row
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
public class StringTrade {
    private String row;
}
But every time we wanted a date or amount we’d need
to parse it and that would slow down analysis
If the data was XML it would be even worse
We’d need a SAX (or other) parser every time
JUST STORE THE ORIGINAL DATA?
public class StringTrade {
    private String row;
}
Allocation of new StringTrades is faster as we allocate
just 2 Objects
Serialization and De-Serialization are improved for the
same reason
BUT overall we lose out when we’re accessing the data
We need to find what we’re looking for each time
This is sort of OK with a CSV but VERY expensive for XML
public class XmlTrade {
    private String xml;
}
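To make the access cost concrete, here is a minimal sketch of getters over a stored CSV row. Every call scans the row and re-parses the field. The class name and helper are hypothetical, and it assumes fields never contain quoted commas.

```java
import java.math.BigDecimal;

// Illustrative only: a hypothetical row-backed trade, not the code from the demo.
final class StringTradeSketch {
    private final String row;

    StringTradeSketch(String row) {
        this.row = row;
    }

    /** Returns the field at the given position, scanning from the start on every call. */
    private String field(int index) {
        int start = 0;
        for (int i = 0; i < index; i++) {
            start = row.indexOf(',', start) + 1;
        }
        int end = row.indexOf(',', start);
        return end < 0 ? row.substring(start) : row.substring(start, end);
    }

    long getId()               { return Long.parseLong(field(0)); }
    String getCurrency1()      { return field(3); }
    BigDecimal getAmount1()    { return new BigDecimal(field(4)); }
    String getSettlementDate() { return field(8); }
}
```

Each getter allocates at least one new String, so a filter over millions of rows produces exactly the garbage we were trying to avoid.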
COMPRESSION OR COMPACTION?
OK, a lot of people ask, why don’t we just use compression?
Well there are many reasons, mainly that it’s slow: slow
to compress and slow to de-compress, and the better it is
the slower it is
Compression is the lazy person’s tool, a good protocol
or codec doesn’t compress well, try compressing your
MP3s or videos
It has its place but we’re looking for more, we want
compaction not compression, then we get
performance too
It’s very similar to image processing, zip the images
and you can’t see them
NOW IN BINARY…
This is what our binary version looks like…
ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
public class ObjectTrade extends SDO {
    private byte[] data;
}
Just one object again so fast to allocate
If we can encode the data in the binary then it’s fast to
query too
And serialisation is just writing out the byte[]
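A minimal sketch of the idea: getters and setters that read and write fixed offsets in a byte[]. The layout, offsets, field subset and class name are all assumptions for illustration; this is not the SDO codec from the talk and it does not reproduce its 39-byte encoding.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

// Illustrative only: a hypothetical fixed layout covering five of the nine fields.
final class BinaryTradeSketch {
    private static final int ID = 0;          // long, 8 bytes
    private static final int TRADE_DATE = 8;  // 2 bytes, days since 1970-01-01
    private static final int BUY_SELL = 10;   // 1 byte, 0 = Buy, 1 = Sell
    private static final int CURRENCY1 = 11;  // 3 ASCII bytes
    private static final int AMOUNT1 = 14;    // long, amount in hundredths
    private static final int SIZE = 22;

    private final byte[] data = new byte[SIZE];

    // A view over the array; no data is copied.
    private ByteBuffer view() { return ByteBuffer.wrap(data); }

    long getId() { return view().getLong(ID); }
    void setId(long id) { view().putLong(ID, id); }

    LocalDate getTradeDate() { return LocalDate.ofEpochDay(view().getChar(TRADE_DATE)); }
    void setTradeDate(LocalDate date) {
        long days = date.toEpochDay();
        if (days < 0 || days > Character.MAX_VALUE) {
            throw new IllegalArgumentException("date out of range");
        }
        view().putChar(TRADE_DATE, (char) days);
    }

    String getBuySell() { return data[BUY_SELL] == 0 ? "Buy" : "Sell"; }
    void setBuySell(String buySell) { data[BUY_SELL] = (byte) ("Sell".equals(buySell) ? 1 : 0); }

    String getCurrency1() { return new String(data, CURRENCY1, 3, StandardCharsets.US_ASCII); }
    void setCurrency1(String currency) {
        byte[] code = currency.getBytes(StandardCharsets.US_ASCII);
        if (code.length != 3) throw new IllegalArgumentException("expected a 3-letter code");
        System.arraycopy(code, 0, data, CURRENCY1, 3);
    }

    BigDecimal getAmount1() { return BigDecimal.valueOf(view().getLong(AMOUNT1), 2); }
    void setAmount1(BigDecimal amount) {
        // Throws ArithmeticException if the amount has more than two decimal places.
        view().putLong(AMOUNT1, amount.movePointRight(2).longValueExact());
    }

    /** "Serialisation" is just handing out the bytes. */
    byte[] toBytes() { return data.clone(); }
}
```

Callers still see getId(), getTradeDate() and so on; only the storage behind them has changed.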
SAME API, JUST BINARY
Classic getter and setter vs. binary implementation
Identical API
DID I MENTION.. THE SAME API
This is a key point, we’re changing the implementation
not the API
This means that Spring, in-memory caches and other
tools work exactly as they did before
Let’s look at some code and a little demo of this…
TIME TO SEE SOME CODE
A quick demo, I’ve created a Trade interface and two
implementations, one “classic” and the other binary
We’ll create a List of a few million trades (randomly but quite cleverly
generated)
We’ll run a quick Java 8 filter and sort on them
We’ll serialize and de-serialize them to create a new List
Finally for the binary version we’ll write out the entire list
via NIO and read it in again to a new List
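The last step could look roughly like this: fixed-size binary records written back to back through a FileChannel and read back into a new List. This is a sketch under assumptions (fixed record size, hypothetical class and method names), not the demo code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: RecordFile is a made-up helper.
final class RecordFile {
    /** Writes the records back to back; there is no per-object serialisation step. */
    static void writeAll(Path file, List<byte[]> records, int recordSize) throws IOException {
        ByteBuffer out = ByteBuffer.allocate(records.size() * recordSize);
        for (byte[] record : records) {
            if (record.length != recordSize) {
                throw new IllegalArgumentException("unexpected record size");
            }
            out.put(record);
        }
        out.flip();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (out.hasRemaining()) {
                channel.write(out);
            }
        }
    }

    /** Reads the file and slices it back into fixed-size records. */
    static List<byte[]> readAll(Path file, int recordSize) throws IOException {
        byte[] all = Files.readAllBytes(file);
        List<byte[]> records = new ArrayList<>();
        for (int offset = 0; offset + recordSize <= all.length; offset += recordSize) {
            records.add(Arrays.copyOfRange(all, offset, offset + recordSize));
        }
        return records;
    }

    /** Writes to a temporary file, reads it back and cleans up. */
    static List<byte[]> roundTrip(List<byte[]> records, int recordSize) {
        try {
            Path file = Files.createTempFile("trades", ".bin");
            try {
                writeAll(file, records, recordSize);
                return readAll(file, recordSize);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```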
HOW DOES IT PERFORM?
Compare classic Java binding to binary…
                                 Classic Java version  Binary Java version  Improvement
Bytes used                       328                   39                   840%
Serialization size               668                   85                   786%
Custom Serialization             668                   40                   1,670%
Time to Serialize/Deserialize    41.1µs                4.17µs               10x
Batched Serialize/Deserialize    12.3µs                44ns                 280x
COMPARING…
While the shapes look the same you can clearly see the
differences on both axes
The Object version is a lot slower
And the Object version uses significantly more memory
Note that the left side (objects) has a heap size of 8GB,
the right (binary) has a heap size of just 2GB
COMPARING…
These two graphs show the GC pause time during
message creation and serialisation
Left is “classic” Java
Right is the binary version
The top of the right hand graph is lower than the first
rung of the left (50ms)
BEYOND THE CSV FILE
It worked for a simple CSV, so how about more complex
models like XML or JSON?
Once we have a mapping of types to binary and code
we can extend this to any type of model
But it gets a lot more complex as we have optional
elements, repeating elements and a vast array of
different types
XML can be represented as binary and binary as XML,
you just need a model
STANDARD JAVA BINDING
This is a minute snippet from FpML, ISO 20022 is a very
similar standard
<resetFrequency>
    <periodMultiplier>6</periodMultiplier>
    <period>M</period>
</resetFrequency>
JAXB, JIBX, Castor etc. generate something like …
public class ResetFrequency {
    private BigInteger periodMultiplier; // Positive Integer
    private Object period; // Enum of D, W, M, Q, Y
    public BigInteger getPeriodMultiplier() {
        return this.periodMultiplier;
    }
    // constructors & other getters and setters
}
In memory - 3 objects - at least 144 bytes
The parent, a positive integer and an enumeration for Period
3 Java objects at 48 bytes is 144 bytes and it becomes fragmented in
memory
JAVA BINDING TO BINARY
<resetFrequency>
    <periodMultiplier>6</periodMultiplier>
    <period>M</period>
</resetFrequency>
Using our binary codec we generate …
ByteBuffer data; // From the root object
public BigInteger getPeriodMultiplier() {
    int byteOffset = 123; // Actually a lot more complex
    return BigInteger.valueOf( data.get(byteOffset) & 0x1F );
}
// constructors & other getters
In memory - 1 byte for all three fields
The root contains one ByteBuffer which is a wrapper for byte[]
The getters use bit-fields…
PeriodMultiplier is 1-30 so needs 5 bits
Period is just 3 bits for values D, W, M, Q or Y
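Both getters can be sketched against a single byte. The slide only shows the 0x1F mask for PeriodMultiplier; placing Period in the top 3 bits, the encode helper and the class name are assumptions.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Illustrative only: bit positions for Period are an assumption.
final class ResetFrequencySketch {
    private static final String PERIODS = "DWMQY"; // position in this string = stored value

    private final ByteBuffer data;  // shared with the root object
    private final int byteOffset;   // where this element's byte lives

    ResetFrequencySketch(ByteBuffer data, int byteOffset) {
        this.data = data;
        this.byteOffset = byteOffset;
    }

    /** Low 5 bits: periodMultiplier, 1-30. */
    BigInteger getPeriodMultiplier() {
        return BigInteger.valueOf(data.get(byteOffset) & 0x1F);
    }

    /** High 3 bits: which of D, W, M, Q, Y. */
    char getPeriod() {
        return PERIODS.charAt((data.get(byteOffset) >>> 5) & 0x07);
    }

    /** Packs both fields into one byte. */
    static byte encode(int periodMultiplier, char period) {
        int index = PERIODS.indexOf(period);
        if (periodMultiplier < 1 || periodMultiplier > 30 || index < 0) {
            throw new IllegalArgumentException("value out of range");
        }
        return (byte) ((index << 5) | periodMultiplier);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(1);
        buffer.put(0, encode(6, 'M'));
        ResetFrequencySketch rf = new ResetFrequencySketch(buffer, 0);
        System.out.println(rf.getPeriodMultiplier() + " " + rf.getPeriod()); // 6 M
    }
}
```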
SERIALIZATION & DESERIALIZATION
Writing a Java Object as a serialized stream is slow
If the Object contains other objects then it’s painfully SLOW
Several frameworks exist for (performance) serialisation
Avro, Kryo, Thrift, Protobuf, Cap’n Proto
Of course there’s always JSON and XML if speed and memory are
infinite resources and you like reading the data streams before bed
time
Generating the serialisation though is faster still since
we can avoid reflection and the data is already in
serialised form
The same is true for deserialisation
So we can almost double the speed of serialisation
frameworks like Kryo with even smaller packet size
JMH BENCHMARKS
JMH is an open source Java Micro-benchmark
Harness, the following are timings comparing Kryo and
our own binary serialisation…
Benchmark Mode Cnt Score Error Units
IoBenchmark.deSer sample 58102 171.596 ± 0.541 ns/op
IoBenchmark.ser sample 58794 169.579 ± 0.546 ns/op
Benchmark Mode Cnt Score Error Units
KryoBenchmark.deSer sample 26423 378.121 ± 1.096 ns/op
KryoBenchmark.ser sample 24951 400.387 ± 1.512 ns/op
http://openjdk.java.net/projects/code-tools/jmh/
https://github.com/eishay/jvm-serializers
ANALYTICS
When we have a lot of data there seems to be an
assumption that we need Hadoop
Hadoop is an implementation of Map / Reduce
If we shrink the data then we can use a lambda for the
same task…
.collect(Collectors.groupingBy(
    f -> f.getSignedCurrencyAmount().getCurrencyAmount().getCurrency(),
    Collectors.reducing(BigDecimal.ZERO,
        t -> t.getSignedCurrencyAmount().getCurrencyAmount().getAmount(),
        BigDecimal::add)));
Produces (in 105ms for 1 million complex SWIFT
messages)
TWD=16218034.70, CHF=127406235.40, HKD=1292102912.26, EUR=831021868.46,
DKK=196507853.20, CAD=137982907.68, USD=525123842.02, ZAR=19383094.78,
NOK=618033504.06, AUD=218890981.32, ILS=46628.66, SGD=74032013.42 etc.
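A self-contained version of the same groupingBy/reducing pattern, with a tiny stand-in class instead of the SWIFT message model. Payment, its fields and the sample data are hypothetical.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: Payment is a stand-in for the real message type.
final class CurrencyTotals {
    static final class Payment {
        final String currency;
        final BigDecimal amount;

        Payment(String currency, String amount) {
            this.currency = currency;
            this.amount = new BigDecimal(amount);
        }
    }

    /** Groups by currency (the "map") and sums the amounts (the "reduce"). */
    static Map<String, BigDecimal> totalsByCurrency(List<Payment> payments) {
        return payments.stream()
                .collect(Collectors.groupingBy(
                        p -> p.currency,
                        Collectors.reducing(BigDecimal.ZERO, p -> p.amount, BigDecimal::add)));
    }

    public static void main(String[] args) {
        List<Payment> payments = Arrays.asList(
                new Payment("EUR", "100.25"),
                new Payment("USD", "40.00"),
                new Payment("EUR", "50.25"));
        System.out.println(totalsByCurrency(payments).get("EUR")); // 150.50
    }
}
```

Swapping stream() for parallelStream() spreads the same work across cores without changing the collector.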
ANOTHER DEMO
Take some raw XML (an FpML derivative, about 7.4k of XML)
Parse it, mutate a few variables and then compact each one
to its binary version - We do this 1 million times
This takes about 100 seconds
Now the test begins
We take a look at the binary version (just 370 bytes)
We search through the 1 million trades for data and aggregate the results
Then we try it multi-threaded (using parallelStream())
A few more similar operations (I will discuss)
Finally we’ll write all 1 million to disk and then read them
back into another array (comparing to make sure it worked)
This will be a test of serialisation
IT’S REALLY SMALL
Fit 10-20x more messages into your existing memory
footprint
IT’S REALLY FAST
Binary messages can be queried without
deserialisation
Binary maximises the benefit of your CPU cache
IT’S TRANSPORT FRIENDLY
The in-memory format is the same as its on-disk format
We can query on-disk data almost as quickly as in-memory data
Replication & distribution don’t incur (de)serialisation costs
SO IT WORKS
Key points from the slides…
If you want performance, scalability and ease of
maintenance then you need to…
Reduce the amount of memory you’re using
Reduce the way you use the memory
Try using binary data instead of objects
Start coding intelligently like we used to in the 70s, 80s and early 90s
Or just buy much bigger, much faster machines, more
RAM, bigger networks and more DnD programmers
And you’ll probably get free lunches from your hardware vendors
Thank you
@jtdavies

Devoxx MA 2015 - Turn your Java objects into binary

 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Devoxx MA 2015 - Turn your Java objects into binary

  • 1. Turn your Java Objects into Binary for better performance 17 Nov 15
  • 3. C24 3 BINARY WE'RE TALKING 1S & 0S 16 0 16 1 16 2 16 3 16 4 16 5 16 6 16 7 46 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 001 00010110 2 0 2 1 2 2 2 3 2 4 2 5 2 6 2 7 DECIMAL HEX BINARY
  • 4. C24 4 BINARY WE'RE TALKING 1S & 0S 00010110 BYTE INT 4 BYTES LONG 8 BYTES There are also shorts (16 bits), chars (16), floats (32),
 doubles (64) and booleans (undefined)
  • 5. C24 5 BINARY WE'RE TALKING 1S & 0S Most of the Java primitives (for some bizarre reason) are signed So the “top” bit is used for the sign, 0 is positive and 1 is negative Bitwise operators work on ints and longs & (AND) | (OR) ^ (XOR) ~ (NOT) << (left shift), >> (right shift with sign) and >>> (right shift with 0s)
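A quick sketch of the operators listed on this slide, using nothing beyond the JDK. The class and method names (`BitwiseDemo`, `signedShift`, `zeroFillShift`, `invert`) are made up for illustration and are not from the talk.

```java
final class BitwiseDemo {
    // >> copies the sign bit into the vacated positions (arithmetic shift).
    static int signedShift(int value, int bits) { return value >> bits; }

    // >>> fills the vacated positions with 0s (logical shift).
    static int zeroFillShift(int value, int bits) { return value >>> bits; }

    // ~ flips every bit.
    static int invert(int value) { return ~value; }

    public static void main(String[] args) {
        System.out.println(signedShift(-16, 2));      // -4: the sign is preserved
        System.out.println(zeroFillShift(-16, 28));   // 15: only the top four 1s remain
        System.out.println(invert(-1));               // 0: -1 is all 1s
        System.out.println(0x55 ^ 0xFF);              // 170, i.e. 0xAA
        System.out.println(0x16 & 0x0F);              // 6: mask off the low nibble
    }
}
```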
  • 6. C24 6 A QUICK TEST… What is 0x20 in decimal? 32 What is special about 0x55 and 0xAA? They are 01010101 and 10101010 in binary (alternating 1s and 0s) How many bits are needed to store all of the months of the year? Just 4 bits, 2^4 = 16, more than enough There are 365.25 days in a year, how many bits to store 100 years at day resolution? 16 bits (2 bytes), 2^16 = 65,536 or 179 years’ worth of days What is ~-1? NOT -1 is 0, i.e. the inverse of 0xFFFFFFFF (same as -1^-1)
  • 7. C24 7 TO CALCULATE THE NUMBER OF BITS… To calculate the number of bits needed we simply divide the Log of the number by the Log of 2 (any base) E.g. 36,525 days in 100 years Log 36,525 = 4.56259 Log 2 = 0.30103 Log 36525 / Log 2 = 15.15660 So we need 16 bits
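The calculation on this slide can be sketched as below. `BitsNeeded` and its method names are hypothetical. The first method follows the slide's log division, rounded up; the second is an integer-only alternative that avoids floating-point rounding for exact powers of two (an addition, not something the slides mention).

```java
final class BitsNeeded {
    // The slide's approach: log(n) / log(2), rounded up to a whole bit.
    static int viaLogs(long distinctValues) {
        return (int) Math.ceil(Math.log(distinctValues) / Math.log(2));
    }

    // Integer-only variant: position of the highest set bit in (n - 1).
    static int viaLeadingZeros(long distinctValues) {
        if (distinctValues < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return 64 - Long.numberOfLeadingZeros(distinctValues - 1);
    }

    public static void main(String[] args) {
        System.out.println(viaLogs(36_525));          // 16 bits for 100 years of days
        System.out.println(viaLeadingZeros(36_525));  // 16
        System.out.println(viaLeadingZeros(12));      // 4 bits for the months
    }
}
```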
  • 8. C24 8 JAVA IS VERY INEFFICIENT AT STORING DATA Take the string “ABC”, typically it needs just 4 bytes Three if you know the length won’t change Even less if “ABC” is an enumeration Java takes 48 bytes to store “ABC” as
 a String And many other simple classes too A B C 0
  • 9. C24 9 IT'S NOT JUST A STRING If it was just String then we could use byte[] or char[] but Java bloating is endemic Double (even today’s date takes 48 bytes) BigDecimal Date ArrayList Use just one or two and we’re OK but write a class with a few of these and we really start to see the problem A class with 10 minimum sized Objects can be over 500 bytes in size - for each instance What’s worse is that each object requires 11 separate memory allocations All of which need managing Which is why we have the garbage collector(s)
  • 10. C24 10 A NEW DATE? Could we create a new Date class? class NewDate {
 private byte year;
 private byte month;
 private byte day;
 }
 Yes but this is still 16 bytes in Java, 8 times more than we need In C/C++ we could use bitfields, 7 for the year, 4 for the month and 5 for the day. Total just 16 bits or two bytes This is our target, 2 bytes
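Java has no bitfields, but the same 7/4/5 split can be done by hand with shifts and masks. This is a minimal sketch, not the talk's implementation: `PackedDate` is a hypothetical name, and the base year of 2000 is an assumption (the slide only gives the field widths, so 7 bits cover 128 years from whatever base you pick).

```java
final class PackedDate {
    static final int BASE_YEAR = 2000; // assumption: the slide does not name a base year

    // Layout (16 bits): yyyyyyy mmmm ddddd
    static short pack(int year, int month, int day) {
        int y = year - BASE_YEAR;
        if (y < 0 || y > 127 || month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("date out of range");
        }
        return (short) ((y << 9) | (month << 5) | day);
    }

    // Mask to 16 bits first so a set top bit is not treated as a sign.
    static int year(short packed)  { return BASE_YEAR + ((packed & 0xFFFF) >>> 9); }
    static int month(short packed) { return (packed >>> 5) & 0x0F; }
    static int day(short packed)   { return packed & 0x1F; }

    public static void main(String[] args) {
        short d = pack(2015, 11, 17);
        System.out.println(year(d) + "-" + month(d) + "-" + day(d));
    }
}
```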
  • 11. C24 11 48 BYTES IS A LOT! To store today’s date Java needs 48 bytes that looks like this… 00101001 11000111 10100000 00010111 10100011 11100101 00010101 10010100 11101001 01000001 10101111 01010001 10100011 11100100 11010100 11010100 00101001 11000111 10100000 00010101 10100111 11100101 00010101 10010100 01101101 01000001 10101101 01010101 10100011 01100100 01010101 11010100 10001001 01000111 10100001 00010100 10100111 11100101 00010100 10010100 11101001 01000000 00101100 01010100 10100011 11100101 01010101 11010101 As opposed to this: 0x4174 or 01000001 01110100 The red is just random 0s and 1s but the green binary is real (16,756)
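The value 16,756 (0x4174) is the number of days from 1 January 1970 to 17 November 2015, the date on the title slide, so the sketch below assumes a days-since-epoch encoding held in an unsigned 16-bit value. `EpochDayDate` is a hypothetical name; only `java.time.LocalDate` is real API.

```java
import java.time.LocalDate;

final class EpochDayDate {
    // Store the day count since 1970-01-01 in 16 bits (good until the year 2149).
    static short encode(LocalDate date) {
        long days = date.toEpochDay();
        if (days < 0 || days > 0xFFFF) {
            throw new IllegalArgumentException("date does not fit in 16 bits");
        }
        return (short) days;
    }

    // Mask to 16 bits so values above 32,767 are read as unsigned.
    static LocalDate decode(short encoded) {
        return LocalDate.ofEpochDay(encoded & 0xFFFF);
    }

    public static void main(String[] args) {
        short encoded = encode(LocalDate.of(2015, 11, 17));
        System.out.println(Integer.toHexString(encoded & 0xFFFF)); // 4174
        System.out.println(decode(encoded));                       // 2015-11-17
    }
}
```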
  • 12. C24 12 START SIMPLE Simple data we’re going to be referring to for the next few slides…
 ID | TradeDate | BuySell | Currency1 | Amount1 | Exchange Rate | Currency2 | Amount2 | Settlement Date
 1 | 21/07/2014 | Buy | EUR | 50,000,000 | 1.344 | USD | 67,200,000.00 | 28/07/2014
 2 | 21/07/2014 | Sell | USD | 35,000,000 | 0.7441 | EUR | 26,043,500.00 | 20/08/2014
 3 | 22/07/2014 | Buy | GBP | 7,000,000 | 172.99 | JPY | 1,210,930,000.00 | 05/08/2014
 4 | 23/07/2014 | Sell | AUD | 13,500,000 | 0.9408 | USD | 12,700,800.00 | 22/08/2014
 5 | 24/07/2014 | Buy | EUR | 11,000,000 | 1.2148 | CHF | 13,362,800.00 | 31/07/2014
 6 | 24/07/2014 | Buy | CHF | 6,000,000 | 0.6513 | GBP | 3,907,800.00 | 31/07/2014
 7 | 25/07/2014 | Sell | JPY | 150,000,000 | 0.6513 | EUR | 97,695,000.00 | 08/08/2014
 8 | 25/07/2014 | Sell | CAD | 17,500,000 | 0.9025 | USD | 15,793,750.00 | 01/08/2014
 9 | 28/07/2014 | Buy | GBP | 7,000,000 | 1.8366 | CAD | 12,856,200.00 | 27/08/2014
 10 | 28/07/2014 | Buy | EUR | 13,500,000 | 0.7911 | GBP | 10,679,850.00 | 11/08/2014
  • 13. C24 13 START WITH THE CSV Each line is relatively efficient
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 But it’s in human-readable format not CPU readable At least not efficiently CPU readable We could store the lines as they are but in order to work with the data we need it in something Java can work with The same goes for any other language, C, C++, PHP, Scala etc. So typically we parse it into a Java class and give it a self-documenting name - Row
  • 14. C24 14 CSV TO JAVA This seems like a reasonably good implementation From this…
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 We get this…
 public class ObjectTrade {
 private long id;
 private Date tradeDate;
 private String buySell;
 private String currency1;
 private BigDecimal amount1;
 private double exchangeRate;
 private String currency2;
 private BigDecimal amount2;
 private Date settlementDate;
 }
  • 15. C24 15 EVERYTHING’S FINE With very simple getters and setters, something to parse the CSV and a custom toString() we’re good
 public BasicTrade parse( String line ) throws ParseException {
   String[] fields = line.split(",");
   setId(Long.parseLong(fields[0]));
   setTradeDate(DATE_FORMAT.get().parse(fields[1]));
   setBuySell(fields[2]);
   setCurrency1(fields[3]);
   setAmount1(new BigDecimal(fields[4]));
   setExchangeRate(Double.parseDouble(fields[5]));
   setCurrency2(fields[6]);
   setAmount2(new BigDecimal(fields[7]));
   setSettlementDate(DATE_FORMAT.get().parse(fields[8]));
   return this;
 }
 What could possibly go wrong?
  • 16. C24 16 CLASSIC JAVA In fact everything works really well, this is how Java was designed to work There are a few “fixes” to add for SimpleDateFormat due to it not being thread safe but otherwise we’re good Performance is good, well it seems good and everything is well behaved As the years go on and the volumes increase, we now have 100 million of them Now we start to see some problems To start with we don’t have enough memory - GC is killing performance When we distribute, the size of the objects is killing performance too
  • 17. C24 17 JAVA IS VERBOSE & SLOW A simple CSV can grow by over 4 times…
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 From roughly 70 bytes per line as CSV to around 328 in Java That means you get over 4 times less data when stored in Java Or need over 4 times more RAM, network capacity and disk Serialized objects are even bigger!
  • 18. C24 18 JAVA BLOATS YOUR DATA You probably thought XML was bad, imagine what happens when you take XML and bind it to Java! Anything you put into Java objects gets horribly bloated in memory Effectively you are paying the price of memory and hardware abstraction
  • 19. C24 19 C AND C++ WERE RAW Raw code, raw data, sometimes a little dangerous but no garbage after consumption
  • 20. C24 20 JAVA IS PLASTIC WRAPPING Everything is wrapped in Java and that creates garbage, a LOT of garbage - It all needs collecting
  • 21. C24 21 JAVA OBJECTS - GOOD FOR HARDWARE VENDORS These fat Java Objects are a hardware vendor’s wet dream Think about it, Java came from Sun, it was free but they made money selling hardware, well they tried at least Fat objects need more memory, more CPU, more network capacity, more machines More money for the hardware vendors - why would they change it? And everything just runs slower because you’re busy collecting all the memory you’re not using Java for programmers was like free shots for
 AA members
  • 22. C24 22 THIS ISN’T JUST JAVA Think this is just a Java problem? It’s all the same, every time you create objects you’re blasting huge holes all over your machine’s RAM And someone’s got to clean all the garbage up too! Great for performance-tuning consultants :-)
  • 23. C24 23 IN-MEMORY CACHES If you use an in-memory cache then you’re most likely suffering from the same problem… Many of them provide and use compression or “clever” memory optimisation If they use reflection things slow down If you have to write it yourself you have a lot of work to do, especially for hierarchical data
  • 24. C24 24 CLASSIC JAVA BINDING This is how we’d typically code this simple CSV example…
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 public class ObjectTrade {
 private long id;
 private Date tradeDate;
 private String buySell;
 private String currency1;
 private BigDecimal amount1;
 private double exchangeRate;
 private String currency2;
 private BigDecimal amount2;
 private Date settlementDate;
 }
 It’s easy to write the code and fast to execute, retrieve, search and query data BUT it needs a lot of RAM and it is slow to manage
  • 25. C24 25 JUST STORE THE ORIGINAL DATA? We could just store each row
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 public class StringTrade {
 private String row;
 } But every time we wanted a date or amount we’d need to parse it and that would slow down analysis If the data was XML it would be even worse We’d need a SAX (or other) parser every time
  • 26. C24 26 JUST STORE THE ORIGINAL DATA? public class StringTrade {
 private String row;
 }
 Allocation of new StringTrades is faster as we allocate just 2 Objects Serialization and De-Serialization are improved for the same reason BUT overall we lose out when we’re accessing the data We need to find what we’re looking for each time This is sort of OK with a CSV but VERY expensive for XML
 public class XmlTrade {
 private String xml;
 }
  • 27. C24 27 COMPRESSION OR COMPACTION? OK, a lot of people ask, why don’t we just use compression? Well there are many reasons, mainly that it’s slow, slow to compress and slow to de-compress, the better it is the slower it is Compression is the lazy person’s tool, a good protocol or codec doesn’t compress well, try compressing your MP3s or videos It has its place but we’re looking for more, we want compaction not compression, then we get performance too It’s very similar to image processing, zip the images and you can’t see them
  • 28. C24 28 NOW IN BINARY… This is what our binary version looks like…
 ID,TradeDate,BuySell,Currency1,Amount1,Exchange Rate,Currency2,Amount2,Settlement Date
 1,21/07/2014,Buy,EUR,50000000.00,1.344,USD,67200000.00,28/07/2014
 2,21/07/2014,Sell,USD,35000000.00,0.7441,EUR,26043500.00,20/08/2014
 3,22/07/2014,Buy,GBP,7000000.00,172.99,JPY,1210930000,05/08/2014
 public class ObjectTrade extends SDO {
 private byte[] data;
 }
 Just one object again so fast to allocate If we can encode the data in the binary then it’s fast to query too And serialisation is just writing out the byte[]
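To make the byte[] idea concrete, here is a minimal sketch that encodes the first five CSV fields into a fixed layout and decodes them on demand in the getters. Everything about the layout is an assumption made for illustration (the offsets, the currency table, amounts held as hundredths in a long, dates as a 16-bit day count); the talk's SDO base class and its real encoding are not shown in the slides, so `BinaryTradeSketch` does not extend it.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.time.LocalDate;
import java.util.Arrays;

final class BinaryTradeSketch {
    // Hypothetical fixed offsets into the byte[].
    private static final int ID = 0;          // long, 8 bytes
    private static final int TRADE_DATE = 8;  // unsigned 16-bit days since 1970-01-01
    private static final int BUY_SELL = 10;   // 1 byte: 0 = Buy, 1 = Sell
    private static final int CURRENCY1 = 11;  // 1 byte: index into CURRENCIES
    private static final int AMOUNT1 = 12;    // long, amount in hundredths
    private static final int SIZE = 20;

    // Assumed lookup table; a real codec would be driven by a model.
    private static final String[] CURRENCIES = {"EUR", "USD", "GBP", "JPY", "AUD", "CHF", "CAD"};

    private final byte[] data = new byte[SIZE];

    static BinaryTradeSketch parse(String line) {
        String[] f = line.split(",");
        BinaryTradeSketch trade = new BinaryTradeSketch();
        ByteBuffer buffer = ByteBuffer.wrap(trade.data);
        buffer.putLong(ID, Long.parseLong(f[0]));
        String[] dmy = f[1].split("/"); // dd/MM/yyyy
        LocalDate date = LocalDate.of(Integer.parseInt(dmy[2]),
                Integer.parseInt(dmy[1]), Integer.parseInt(dmy[0]));
        buffer.putShort(TRADE_DATE, (short) date.toEpochDay());
        buffer.put(BUY_SELL, (byte) (f[2].equals("Buy") ? 0 : 1));
        int currency = Arrays.asList(CURRENCIES).indexOf(f[3]);
        if (currency < 0) {
            throw new IllegalArgumentException("unknown currency " + f[3]);
        }
        buffer.put(CURRENCY1, (byte) currency);
        buffer.putLong(AMOUNT1, new BigDecimal(f[4]).movePointRight(2).longValueExact());
        return trade;
    }

    // Same getter API as an object-based trade; values are decoded from the bytes on demand.
    long getId() { return ByteBuffer.wrap(data).getLong(ID); }

    LocalDate getTradeDate() {
        return LocalDate.ofEpochDay(ByteBuffer.wrap(data).getShort(TRADE_DATE) & 0xFFFF);
    }

    String getBuySell() { return data[BUY_SELL] == 0 ? "Buy" : "Sell"; }

    String getCurrency1() { return CURRENCIES[data[CURRENCY1]]; }

    BigDecimal getAmount1() {
        return BigDecimal.valueOf(ByteBuffer.wrap(data).getLong(AMOUNT1), 2);
    }

    // "Serialisation" is just handing out the bytes.
    byte[] toBytes() { return data.clone(); }
}
```

The design point is that callers only ever see the getters, so swapping the object-per-field implementation for this one does not change the API.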
  • 29. C24 29 SAME API, JUST BINARY Classic getter and setter vs. binary implementation Identical API
  • 31. C24 31 DID I MENTION.. THE SAME API This is a key point, we’re changing the implementation not the API This means that Spring, in-memory caches and other tools work exactly as they did before Let’s look at some code and
 a little demo of this…
  • 32. C24 32 TIME TO SEE SOME CODE A quick demo, I’ve created a Trade interface and two implementations, one “classic” and the other binary We’ll create a List of a few million trades (randomly but quite cleverly generated) We’ll run a quick Java 8 filter and sort on them We’ll serialize and de-serialize them to create a new List Finally for the binary version we’ll write out the entire list via NIO and read it in again to a new List
  • 33. C24 33 HOW DOES IT PERFORM? Compare classic Java binding to binary…
 Metric | Classic Java version | Binary Java version | Improvement
 Bytes used | 328 | 39 | 840%
 Serialization size | 668 | 85 | 786%
 Custom Serialization | 668 | 40 | 1,670%
 Time to Serialize/Deserialize | 41.1µs | 4.17µs | 10x
 Batched Serialize/Deserialize | 12.3µs | 44ns | 280x
  • 34. C24 34 COMPARING… While the shapes look the same you can clearly see the differences on both axes The Object version is a lot slower And the Object version uses significantly more memory Note that the left side (objects) has a heap size of 8GB, the right (binary) has a heap size of just 2GB
  • 36. C24 36 COMPARING… These two graphs show the GC pause time during message creation and serialisation Left is “classic” Java Right is the binary version The top of the right hand graph is lower than the first rung of the left (50ms)
  • 38. C24 38 BEYOND THE CSV FILE It worked for a simple CSV how about more complex models like XML or JSON? Once we have a mapping of types to binary and code we can extend this to any type of model But it gets a lot more complex as we have optional elements, repeating elements and a vast array of different types XML can be represented as binary and binary as XML, you just need a model
  • 39. C24 39 STANDARD JAVA BINDING This is a minute snippet from FpML, ISO 20022 is a very similar standard <resetFrequency>
 <periodMultiplier>6</periodMultiplier>
 <period>M</period>
 </resetFrequency> JAXB, JiBX, Castor etc. generate something like …
 public class ResetFrequency {
   private BigInteger periodMultiplier; // Positive Integer
   private Object period; // Enum of D, W, M, Q, Y
   public BigInteger getPeriodMultiplier() {
     return this.periodMultiplier;
   }
   // constructors & other getters and setters
 }
 In memory - 3 objects - at least 144 bytes The parent, a positive integer and an enumeration for Period 3 Java objects at 48 bytes is 144 bytes and it becomes fragmented in memory
  • 40. C24 40 JAVA BINDING TO BINARY <resetFrequency>
 <periodMultiplier>6</periodMultiplier>
 <period>M</period>
 </resetFrequency> Using our binary codec we generate …
 ByteBuffer data; // From the root object
 public BigInteger getPeriodMultiplier() {
   int byteOffset = 123; // Actually a lot more complex
   return BigInteger.valueOf( data.get(byteOffset) & 0x1F );
 }
 // constructors & other getters
 In memory - 1 byte for all three fields The root contains one ByteBuffer which is a wrapper for byte[] The getters use bit-fields… PeriodMultiplier is 1-30 so needs 5 bits Period is just 3 bits for values D, W, M, Q or Y
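The 5-bit plus 3-bit split described above can be sketched as follows. The low five bits match the slide's `& 0x1F` mask; putting the period in the top three bits, the `ResetFrequencyBits` name and the plain `int` return type (the slide returns a BigInteger) are assumptions made to keep the example short.

```java
final class ResetFrequencyBits {
    enum Period { D, W, M, Q, Y }

    // Layout (8 bits): ppp mmmmm  (period in the top 3 bits, multiplier in the low 5)
    static byte pack(int periodMultiplier, Period period) {
        if (periodMultiplier < 1 || periodMultiplier > 30) {
            throw new IllegalArgumentException("periodMultiplier must be 1-30");
        }
        return (byte) ((period.ordinal() << 5) | periodMultiplier);
    }

    static int periodMultiplier(byte packed) {
        return packed & 0x1F;
    }

    static Period period(byte packed) {
        return Period.values()[(packed >>> 5) & 0x07];
    }

    public static void main(String[] args) {
        byte packed = pack(6, Period.M); // <periodMultiplier>6</periodMultiplier><period>M</period>
        System.out.println(periodMultiplier(packed) + " " + period(packed));
    }
}
```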
  • 41. C24 41 SERIALIZATION & DESERIALIZATION Writing a Java Object as a serialized stream is slow If the Object contains other objects then it’s painfully SLOW Several frameworks exist for (performance) serialisation Avro, Kryo, Thrift, Protobuf, Cap’n Proto Of course there’s always JSON and XML if speed and memory are infinite resources and you like reading the data streams before bed time Generating the serialisation though is faster still since we can avoid reflection and the data is already in serialised form The same is true for deserialisation So we can almost double the speed of serialisation frameworks like Kryo with even smaller packet size
  • 42. C24 42 JMH BENCHMARKS JMH is an open source Java Micro-benchmark Harness, the following are timings comparing Kryo and our own binary serialisation…
 Benchmark            Mode    Cnt    Score    Error    Units
 IoBenchmark.deSer    sample  58102  171.596  ± 0.541  ns/op
 IoBenchmark.ser      sample  58794  169.579  ± 0.546  ns/op
 Benchmark            Mode    Cnt    Score    Error    Units
 KryoBenchmark.deSer  sample  26423  378.121  ± 1.096  ns/op
 KryoBenchmark.ser    sample  24951  400.387  ± 1.512  ns/op
 http://openjdk.java.net/projects/code-tools/jmh/
 https://github.com/eishay/jvm-serializers
  • 43. C24 43 ANALYTICS When we have a lot of data there seems to be an assumption that we need Hadoop Hadoop is an implementation of Map / Reduce If we shrink the data then we can use a lambda for the same task… .collect(Collectors.groupingBy(f -> f.getSignedCurrencyAmount()
 .getCurrencyAmount().getCurrency(),
 Collectors.reducing(BigDecimal.ZERO, t -> t.getSignedCurrencyAmount()
 .getCurrencyAmount().getAmount(), BigDecimal::add))); Produces (in 105ms for 1 million complex SWIFT messages) TWD=16218034.70, CHF=127406235.40, HKD=1292102912.26, EUR=831021868.46, DKK=196507853.20, CAD=137982907.68, USD=525123842.02, ZAR=19383094.78, NOK=618033504.06, AUD=218890981.32, ILS=46628.66, SGD=74032013.42 etc.
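The collector above is only a fragment, so here is a self-contained version of the same groupingBy / reducing pattern. The `Payment` class is a hypothetical stand-in for the SWIFT message type, which the slides do not show.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CurrencyTotals {
    // Hypothetical stand-in for the message type used in the talk.
    static final class Payment {
        private final String currency;
        private final BigDecimal amount;

        Payment(String currency, String amount) {
            this.currency = currency;
            this.amount = new BigDecimal(amount);
        }

        String getCurrency() { return currency; }
        BigDecimal getAmount() { return amount; }
    }

    // Group by currency, then sum the amounts within each group.
    static Map<String, BigDecimal> totalsByCurrency(List<Payment> payments) {
        return payments.stream().collect(Collectors.groupingBy(
                Payment::getCurrency,
                Collectors.reducing(BigDecimal.ZERO, Payment::getAmount, BigDecimal::add)));
    }

    public static void main(String[] args) {
        List<Payment> payments = Arrays.asList(
                new Payment("EUR", "50000000.00"),
                new Payment("USD", "35000000.00"),
                new Payment("EUR", "11000000.00"));
        System.out.println(totalsByCurrency(payments));
    }
}
```

Swapping `stream()` for `parallelStream()` gives the multi-threaded variant without changing the collector.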
  • 44. C24 44 ANOTHER DEMO Take some raw XML (an FpML derivative, about 7.4k of XML) Parse it, mutate a few variables and then compact each one to its binary version - We do this 1 million times This takes about 100 seconds Now the test begins We take a look at the binary version (just 370 bytes) We search through the 1 million trades for data and aggregate the results Then we try it multi-threaded (using parallelStream()) A few more similar operations (I will discuss) Finally we’ll write all 1 million to disk and then read them back into another array (comparing to make sure it worked) This will be a test of serialisation
  • 45. C24 45 IT’S REALLY SMALL Fit 10-20x more messages into your existing memory footprint
  • 46. C24 46 IT’S REALLY FAST Binary messages can be queried without deserialisation Binary maximises the benefit of your CPU cache
  • 47. C24 47 IT’S TRANSPORT FRIENDLY The in-memory format is the same as its on-disk format We can query on-disk data almost as quickly as in-memory data Replication & distribution doesn’t incur (de)serialisation costs
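When the in-memory form is already a byte[], writing to disk is little more than a copy. The sketch below assumes fixed-size records and uses only standard NIO classes; `RecordFile`, its methods and the temp-file round trip are illustrative and are not the demo code from the talk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class RecordFile {
    // Write all records back to back; no per-field serialisation is needed.
    static void write(Path file, List<byte[]> records, int recordSize) {
        ByteBuffer buffer = ByteBuffer.allocate(records.size() * recordSize);
        for (byte[] record : records) {
            if (record.length != recordSize) {
                throw new IllegalArgumentException("unexpected record size");
            }
            buffer.put(record);
        }
        buffer.flip();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the file back and slice it into records of the same size.
    static List<byte[]> read(Path file, int recordSize) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // keep reading until the buffer is full or the file ends
            }
            buffer.flip();
            List<byte[]> records = new ArrayList<>();
            while (buffer.remaining() >= recordSize) {
                byte[] record = new byte[recordSize];
                buffer.get(record);
                records.add(record);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for trying it out: write to a temp file, read back, clean up.
    static List<byte[]> roundTrip(List<byte[]> records, int recordSize) {
        try {
            Path file = Files.createTempFile("trades", ".bin");
            try {
                write(file, records, recordSize);
                return read(file, recordSize);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```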
  • 48. C24 48 SO IT WORKS Key points from the slides… If you want performance, scalability and ease of maintenance then you need to… Reduce the amount of memory you’re using Reduce the way you use the memory Try using binary data instead of objects Start coding intelligently like we used to in the 70s, 80s and early 90s Or just buy much bigger, much faster machines, more RAM, bigger networks and more DnD programmers And you’ll probably get free lunches from your hardware vendors