
Fabian Hueske – Juggling with Bits and Bytes

How Apache Flink operates on binary data
Flink Forward 2015



  1. Juggling with Bits and Bytes – How Apache Flink operates on binary data
     Fabian Hueske – fhueske@apache.org – @fhueske
  2. Big Data frameworks on JVMs
     • Many (open source) Big Data frameworks run on JVMs
       – Hadoop, Drill, Spark, Hive, Pig, and ...
       – Flink as well
     • Common challenge: how to organize data in memory?
       – In-memory processing (sorting, joining, aggregating)
       – In-memory caching of intermediate results
     • A system's memory management influences
       – Reliability
       – Resource efficiency, performance & performance predictability
       – Ease of configuration
  3. The straightforward approach
     Store and process data as objects on the heap
     • Put objects in an array and sort it
     A few notable drawbacks
     • Predicting memory consumption is hard
       – If you fail, an OutOfMemoryError will kill you!
     • High garbage collection overhead
       – Easily 50% of time spent on GC
     • Objects have considerable space overhead
       – At least 8 bytes per (nested) object! (depends on architecture)
  4. FLINK'S APPROACH
  5. Flink adopts DBMS technology
     • Allocates a fixed number of memory segments upfront
     • Data objects are serialized into memory segments
     • DBMS-style algorithms work on the binary representation
  6. Why is that good?
     • Memory-safe execution
       – Used and available memory segments are easy to count
       – No parameter tuning for reliable operations!
     • Efficient out-of-core algorithms
       – Memory segments can be efficiently written to disk
     • Reduced GC pressure
       – Memory segments are off-heap or never deallocated
       – Data objects are short-lived or reused
     • Space-efficient data representation
     • Efficient operations on binary data
  7. What does it cost?
     • Significant implementation investment
       – Using java.util.HashMap
         vs.
       – Implementing a spillable hash table backed by byte arrays
         and a custom serialization stack
     • Other systems use similar techniques
       – Apache Drill, Apache AsterixDB (incubating)
     • Apache Spark is evolving in a similar direction
  8. MEMORY ALLOCATION
  9. Memory segments
     • Unit of memory distribution in Flink
       – Fixed number allocated when a worker starts
     • Backed by a regular byte array (default 32 KB)
     • On-heap or off-heap allocation
     • R/W access through Java's efficient unsafe methods
     • Multiple memory segments can be logically concatenated
       into a larger chunk of memory
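The idea of a byte-array-backed segment with typed accessors can be sketched roughly as below. This is a minimal illustration, not Flink's actual MemorySegment (which uses sun.misc.Unsafe for access and supports off-heap memory); the SimpleSegment class name is made up for this sketch.

```java
// Hypothetical sketch of a memory segment: a plain byte array with
// typed read/write access at arbitrary offsets.
public class SimpleSegment {
    private final byte[] memory;

    public SimpleSegment(int size) {
        this.memory = new byte[size];
    }

    // Write a 32-bit int in big-endian byte order at the given offset.
    public void putInt(int offset, int value) {
        memory[offset]     = (byte) (value >>> 24);
        memory[offset + 1] = (byte) (value >>> 16);
        memory[offset + 2] = (byte) (value >>> 8);
        memory[offset + 3] = (byte) value;
    }

    // Read the 32-bit int back from the given offset.
    public int getInt(int offset) {
        return ((memory[offset]     & 0xFF) << 24)
             | ((memory[offset + 1] & 0xFF) << 16)
             | ((memory[offset + 2] & 0xFF) << 8)
             |  (memory[offset + 3] & 0xFF);
    }

    public int size() {
        return memory.length;
    }

    public static void main(String[] args) {
        SimpleSegment seg = new SimpleSegment(32 * 1024); // default segment size
        seg.putInt(0, 42);
        seg.putInt(4, -7);
        System.out.println(seg.getInt(0)); // 42
        System.out.println(seg.getInt(4)); // -7
    }
}
```

Because every segment has the same fixed size, counting free segments is enough to know exactly how much memory is still available.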
  10. 10. On-­‐heap  memory  allocaOon   10  
  11. 11. Off-­‐heap  memory  allocaOon   11  
  12. 12. On-­‐heap  vs.  Off-­‐heap   •  No  significant  performance  difference  in     micro-­‐benchmarks   •  Garbage  CollecOon   –  Smaller  heap  -­‐>  faster  GC   •  Faster  start-­‐up  Ome   –  A  mulO-­‐GB  JVM  heap  takes  Ome  to  allocate   12  
  13. DATA SERIALIZATION
  14. Custom de/serialization stack
      • Many alternatives for Java object serialization
        – Dynamic: Kryo
        – Schema-dependent: Apache Avro, Apache Thrift, Protobufs
      • But Flink has its own serialization stack
        – Operating on serialized data requires knowledge of the layout
        – Control over the layout can improve the efficiency of operations
        – Data types are known before execution
  15. Rich & extensible type system
      • The serialization framework requires knowledge of types
      • Flink analyzes the return types of functions
        – Java: reflection-based type analyzer
        – Scala: compiler information + code generation via macros
      • Rich type system
        – Atomics: primitives, Writables, generic types, ...
        – Composites: Tuples, POJOs, case classes
        – Extensible with custom types
  16. Serializing a Tuple3<Integer, Double, Person>
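A tuple serializer of this kind writes the fields back to back into a flat binary layout. The sketch below assumes a simplified layout (4-byte int, 8-byte double, then a length-prefixed UTF-8 string standing in for the Person field); Flink's real TupleSerializer delegates to one nested serializer per field, and the Tuple3Layout class here is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical flat layout for a 3-field tuple:
// [ 4-byte int | 8-byte double | 4-byte length | string bytes ]
public class Tuple3Layout {

    public static byte[] serialize(int f0, double f1, String f2) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(f0);
            out.writeDouble(f1);
            byte[] s = f2.getBytes(StandardCharsets.UTF_8);
            out.writeInt(s.length); // length prefix for the variable-size field
            out.write(s);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory streams
        }
    }

    public static Object[] deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int f0 = in.readInt();
            double f1 = in.readDouble();
            byte[] s = new byte[in.readInt()];
            in.readFully(s);
            return new Object[] { f0, f1, new String(s, StandardCharsets.UTF_8) };
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the layout is fixed and known, operators can read individual fields (e.g. the leading int) directly from the bytes without deserializing the whole record.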
  17. OPERATING ON BINARY DATA
  18. Data processing algorithms
      • Flink's algorithms are based on RDBMS technology
        – External merge sort, hybrid hash join, sort-merge join, ...
      • Algorithms receive a budget of memory segments
        – Automatic decision about budget size
        – No fine-tuning of operator memory!
      • Operate in memory as long as the data fits into the budget
        – And gracefully spill to disk if the data exceeds memory
  19. 19. In-­‐memory  sort  –  Fill  the  sort  buffer   19  
  20. 20. In-­‐memory  sort  –  Sort  the  buffer   20  
  21. 21. In-­‐memory  sort  –  Read  sorted  buffer   21  
  22. SHOW ME NUMBERS!
  23. Sort benchmark
      • Task: sort 10 million Tuple2<Integer, String> records
        – String length: 12 chars
          • A tuple has 16 bytes of raw data
          • ~152 MB raw data in total
        – Integers uniformly distributed, Strings long-tail distributed
        – Sort on the Integer field and on the String field
      • Generated input provided as a mutable-object iterator
      • JVM with 900 MB heap size
        – The minimum size to reliably run the benchmark
  24. Sorting methods
      1. Objects-on-heap:
         – Put cloned data objects into an ArrayList and use Java's collection sort.
         – The ArrayList is initialized with the right size.
      2. Flink-serialized (on-heap):
         – Uses Flink's custom serializers.
         – Integer with a full binary sorting key, String with an 8-byte prefix key.
      3. Kryo-serialized (on-heap):
         – Serialize fields with Kryo.
         – No binary sorting keys; objects are deserialized for comparison.
      • All implementations use a single thread
      • Average execution time of 10 runs reported
      • GC triggered between runs (not included in the reported time)
  25. Execution time
  26. Garbage collection and heap usage (objects-on-heap vs. Flink-serialized)
  27. Memory usage
      • Breakdown: Flink-serialized – Sort Integer
        – 4 bytes Integer
        – 12 bytes String
        – 4 bytes String length
        – 4 bytes pointer
        – 4 bytes Integer sorting key
        – 28 bytes * 10M records = 267 MB

                        Objects-on-heap    Flink-serialized    Kryo-serialized
        Sort Integer    approx. 700 MB     277 MB              266 MB
        Sort String     approx. 700 MB     315 MB              266 MB
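The per-record arithmetic on this slide can be checked directly (the MemoryBreakdown class name is made up for this check):

```java
// Verify the slide's per-record breakdown: 28 bytes/record, 10M records.
public class MemoryBreakdown {
    public static void main(String[] args) {
        int perRecord = 4   // Integer
                      + 12  // String chars
                      + 4   // String length
                      + 4   // pointer
                      + 4;  // Integer sorting key
        long totalBytes = (long) perRecord * 10_000_000L;
        System.out.println(perRecord);                  // 28
        System.out.println(totalBytes / (1024 * 1024)); // 267 (MB)
    }
}
```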
  28. Going out-of-core
      • Single-threaded hash join with a 4 GB memory budget
      • Build side varies, probe side 64 GB
  29. WHAT'S NEXT?
  30. We're not done yet!
      • Serialization layouts tailored towards operations
        – More efficient operations on binary data
      • The Table API provides full semantics for execution
        – Use code generation to operate fully on binary data
      • ...
  31. Summary
      • Active memory management avoids OutOfMemoryErrors
      • Highly efficient data serialization stack
        – Facilitates operations on binary data
        – Makes more data fit into memory
      • DBMS-style operators operate on binary data
        – High-performance in-memory processing
        – Graceful destaging to disk if necessary
      • Read Flink's blog:
        – http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
        – http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
        – http://flink.apache.org/news/2015/09/16/off-heap-memory.html
  32. Apache Flink – http://flink.apache.org – @ApacheFlink
