1. Juggling with Bits and Bytes
How Apache Flink operates on binary data
Fabian Hueske
fhueske@apache.org @fhueske
2. Big Data frameworks on JVMs
• Many (open source) Big Data frameworks run on JVMs
– Hadoop, Drill, Spark, Hive, Pig, and ...
– Flink as well
• Common challenge: How to organize data in-memory?
– In-memory processing (sorting, joining, aggregating)
– In-memory caching of intermediate results
• Memory management of a system influences
– Reliability
– Resource efficiency, performance & performance predictability
– Ease of configuration
3. The straightforward approach
Store and process data as objects on the heap
• Put objects in an Array and sort it
A few notable drawbacks
• Predicting memory consumption is hard
– If you fail, an OutOfMemoryError will kill you!
• High garbage collection overhead
– Easily 50% of time spent on GC
• Objects have space overhead
– At least 8 bytes for each (nested) object! (depends on architecture)
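The straightforward approach can be sketched in a few lines. `Record` and `HeapSort` below are hypothetical stand-ins for a Tuple2-style type, not Flink code: every element is a separate heap object, which is exactly where the per-object header overhead and the GC pressure come from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the objects-on-heap approach: records live as
// individual objects and are sorted with Java's collection sort.
// Each record is a separate heap object, so each one carries object-header
// overhead and contributes to GC pressure.
public class HeapSort {
    // Hypothetical record type standing in for a Tuple2<Integer, String>.
    static final class Record {
        final int key;
        final String value;
        Record(int key, String value) { this.key = key; this.value = value; }
    }

    public static List<Record> sortByKey(List<Record> input) {
        List<Record> list = new ArrayList<>(input); // pre-sized copy
        list.sort(Comparator.comparingInt(r -> r.key));
        return list;
    }

    public static void main(String[] args) {
        List<Record> data = new ArrayList<>();
        data.add(new Record(3, "c"));
        data.add(new Record(1, "a"));
        data.add(new Record(2, "b"));
        List<Record> sorted = sortByKey(data);
        System.out.println(sorted.get(0).key + "," + sorted.get(1).key + "," + sorted.get(2).key);
        // prints 1,2,3
    }
}
```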
5. Flink adopts DBMS technology
• Allocates fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation
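As a rough illustration of the idea (not Flink's actual API), a record can be serialized into a byte array "segment" with a fixed layout, so that a key field sits at a known offset and can be read without deserializing the whole record:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch (not Flink's actual API): serialize a data object into a
// byte-array segment with a known layout, then read the key field
// directly from the binary representation.
public class SegmentWriteRead {
    public static byte[] serialize(int key, String value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(key);   // 4 bytes, big-endian, at fixed offset 0
        out.writeUTF(value); // length-prefixed string starting at offset 4
        return bos.toByteArray();
    }

    public static int readKey(byte[] segment) {
        // The key is read straight from the binary layout without
        // deserializing the record into an object.
        return ((segment[0] & 0xFF) << 24) | ((segment[1] & 0xFF) << 16)
             | ((segment[2] & 0xFF) << 8)  |  (segment[3] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] seg = serialize(42, "flink");
        System.out.println(readKey(seg)); // prints 42
    }
}
```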
6. Why is that good?
• Memory-safe execution
– Used and available memory segments are easy to count
• Efficient out-of-core algorithms
– Memory segments can be efficiently written to disk
• Reduced GC pressure
– Memory segments are never deallocated
– Data objects are short-lived or reused
• Space-efficient data representation
• Efficient operations on binary data
7. What does it cost?
• Significant implementation investment
– Using java.util.HashMap
vs.
– Implementing a spillable hash table backed by byte arrays
and a custom serialization stack
• Other systems use similar techniques
– Apache Drill, Apache Ignite, Apache Geode
• Apache Spark plans to evolve in a similar direction
9. Memory segments
• Unit of memory distribution in Flink
– Fixed number allocated when worker starts
• Backed by a regular byte array (default 32KB)
• R/W access through Java’s efficient sun.misc.Unsafe methods
• Multiple memory segments can be concatenated into
a larger chunk of memory
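A simplified sketch of the segment abstraction: a fixed-size chunk of memory backed by a plain byte array, with typed access at absolute offsets. `MiniSegment` is a hypothetical name; Flink's real MemorySegment uses sun.misc.Unsafe for faster access, while this sketch uses portable array code.

```java
// Simplified sketch of a memory segment: a fixed-size chunk of memory
// backed by a plain byte array. Flink's real MemorySegment uses
// sun.misc.Unsafe for fast access; this uses portable array operations.
public class MiniSegment {
    private final byte[] memory;

    public MiniSegment(int size) { this.memory = new byte[size]; }

    public int size() { return memory.length; }

    // Big-endian int access at an absolute offset within the segment.
    public void putInt(int offset, int value) {
        memory[offset]     = (byte) (value >>> 24);
        memory[offset + 1] = (byte) (value >>> 16);
        memory[offset + 2] = (byte) (value >>> 8);
        memory[offset + 3] = (byte) value;
    }

    public int getInt(int offset) {
        return ((memory[offset] & 0xFF) << 24)
             | ((memory[offset + 1] & 0xFF) << 16)
             | ((memory[offset + 2] & 0xFF) << 8)
             |  (memory[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        MiniSegment seg = new MiniSegment(32 * 1024); // default segment size: 32 KB
        seg.putInt(0, 123456789);
        System.out.println(seg.getInt(0)); // prints 123456789
    }
}
```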
12. Custom de/serialization stack
• Many alternatives for Java object serialization
– Kryo, Apache Avro, Apache Thrift, Protobufs, …
• But Flink has its own serialization stack
– Operating on serialized data requires knowledge of layout
– Control over layout can improve efficiency of operations
– Data types are known before execution
13. Rich & extensible type system
• Serialization framework requires knowledge of types
• Flink analyzes return types of functions
– Java: Reflection based type analyzer
– Scala: Compiler information
• Rich type system
– Atomics: Primitives, Writables, Generic types, …
– Composites: Tuples, Pojos, CaseClasses
– Extensible by custom types
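A toy sketch of what reflection-based type analysis looks like: inspect a POJO's fields to learn its layout before execution, the way a serialization framework must. Flink's actual TypeExtractor is far more elaborate (generics, tuples, type hierarchies); this only lists declared field names and types, and `Person` is a made-up example class.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of reflection-based type analysis: enumerate a POJO's
// declared fields so a serializer can be chosen per field type.
public class TypeAnalyzer {
    // Hypothetical POJO used only for illustration.
    public static class Person {
        public int age;
        public String name;
    }

    public static List<String> analyze(Class<?> clazz) {
        List<String> fieldTypes = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            fieldTypes.add(f.getName() + ":" + f.getType().getSimpleName());
        }
        return fieldTypes;
    }

    public static void main(String[] args) {
        System.out.println(analyze(Person.class));
        // e.g. [age:int, name:String]
    }
}
```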
14. Serializers & comparators
• All types have dedicated de/serializers
– Primitives are natively serialized
– Writables use their own serialization functions
– Generic types use Kryo
– …
• Serialization automatically goes through Java’s Unsafe methods
• Comparators compare and hash objects
– On binary representation if possible
• Composite serializers and comparators delegate to
serializers and comparators of member types
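The delegation pattern for composite types can be sketched as follows: a Tuple2-style serializer that defers field by field to an int serializer and a String serializer. The `Serializer` interface and class names here are illustrative, not Flink's actual classes, though Flink's TupleSerializer follows the same idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of a composite serializer delegating to member-type serializers.
public class CompositeSerializer {
    interface Serializer<T> {
        void write(T value, DataOutput out) throws IOException;
        T read(DataInput in) throws IOException;
    }

    static final Serializer<Integer> INT = new Serializer<Integer>() {
        public void write(Integer v, DataOutput out) throws IOException { out.writeInt(v); }
        public Integer read(DataInput in) throws IOException { return in.readInt(); }
    };

    static final Serializer<String> STRING = new Serializer<String>() {
        public void write(String v, DataOutput out) throws IOException { out.writeUTF(v); }
        public String read(DataInput in) throws IOException { return in.readUTF(); }
    };

    // Composite: serializes the pair by delegating field by field.
    static byte[] writePair(int f0, String f1) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        INT.write(f0, out);
        STRING.write(f1, out);
        return bos.toByteArray();
    }

    static Object[] readPair(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return new Object[] { INT.read(in), STRING.read(in) };
    }

    public static void main(String[] args) throws IOException {
        Object[] pair = readPair(writePair(7, "seven"));
        System.out.println(pair[0] + "/" + pair[1]); // prints 7/seven
    }
}
```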
17. Data Processing Algorithms
• Flink’s algorithms are based on RDBMS technology
– External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
• Algorithms receive a budget of memory segments
• Operate in-memory as long as data fits into budget
– And gracefully spill to disk if data exceeds memory
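The external-merge-sort idea can be condensed into a sketch: sort bounded runs that fit the memory budget, then k-way merge the runs. In Flink the runs would be serialized records spilled to disk; to keep the example short, the runs here stay as in-memory lists and the "budget" is just a run length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Condensed sketch of external merge sort: produce sorted runs bounded
// by a memory budget, then k-way merge them with a priority queue.
public class BudgetedSort {
    public static List<Integer> sort(List<Integer> input, int budget) {
        // Phase 1: produce sorted runs no larger than the budget.
        List<List<Integer>> runs = new ArrayList<>();
        for (int i = 0; i < input.size(); i += budget) {
            List<Integer> run =
                new ArrayList<>(input.subList(i, Math.min(i + budget, input.size())));
            Collections.sort(run);
            runs.add(run); // a real implementation would spill this run to disk
        }
        // Phase 2: k-way merge via a priority queue of (run, position) cursors.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] c) -> runs.get(c[0]).get(c[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[] { r, 0 });
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            out.add(runs.get(c[0]).get(c[1]));
            if (c[1] + 1 < runs.get(c[0]).size()) heap.add(new int[] { c[0], c[1] + 1 });
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList(5, 3, 8, 1, 9, 2, 7), 3));
        // prints [1, 2, 3, 5, 7, 8, 9]
    }
}
```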
22. Sort benchmark
• Task: Sort 10 million Tuple2<Integer, String> records
– String length 12 chars
• Tuple has 16 bytes of raw data
• ~152 MB raw data
– Integers uniformly, Strings long-tail distributed
– Sort on Integer field and on String field
• Input provided as mutable object iterator
• Use JVM with 900 MB heap size
– Minimum size to reliably run the benchmark
23. Sorting methods
1. Objects-on-Heap:
– Put cloned data objects in an ArrayList and use Java’s Collections.sort().
– The ArrayList is initialized with the right size.
2. Flink-serialized:
– Using Flink’s custom serializers.
– Integer with full binary sorting key, String with 8 byte prefix key.
3. Kryo-serialized:
– Serialize fields with Kryo.
– No binary sorting keys, objects are deserialized for comparison.
• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (not included in the measured time)
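The 8-byte prefix key used for the String sort in the Flink-serialized variant can be sketched like this: the first eight bytes of each string are packed into a long, so most comparisons become a single unsigned long comparison on the binary key, falling back to a full string comparison only when the prefixes tie. This sketch assumes ASCII strings (it truncates each char to one byte); Flink's actual normalized-key machinery is more general.

```java
// Sketch of an 8-byte prefix sorting key for strings: pack the first
// eight bytes into a long so most comparisons are a single unsigned
// long comparison, with a full compare only on prefix ties.
public class PrefixKey {
    static long prefixKey(String s) {
        long key = 0L;
        for (int i = 0; i < 8; i++) {
            // Truncate each char to one byte (ASCII assumption);
            // pad strings shorter than 8 chars with zero bytes.
            int b = i < s.length() ? (s.charAt(i) & 0xFF) : 0;
            key = (key << 8) | b;
        }
        return key;
    }

    static int compare(String a, String b) {
        int c = Long.compareUnsigned(prefixKey(a), prefixKey(b));
        return c != 0 ? c : a.compareTo(b); // full compare only on prefix tie
    }

    public static void main(String[] args) {
        System.out.println(compare("apple", "banana") < 0);   // prints true
        System.out.println(compare("abcdefgh1", "abcdefgh0")); // prints 1 (prefixes tie, full compare decides)
    }
}
```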
28. We’re not done yet!
• Move memory segments to off-heap memory
– Smaller JVM, lower GC pressure, easier configuration
• Table API provides full semantics for execution
– Use code generation to operate fully on binary data
• Serialization layouts tailored towards operations
– More efficient operations on binary data
• …
29. Summary
• Active memory management avoids OutOfMemoryErrors.
• Highly efficient data serialization stack
– Facilitates operations on binary data
– Makes more data fit into memory
• DBMS-style operators operate on binary data
– High performance in-memory processing
– Graceful destaging to disk if necessary
• Read the full story:
http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html