Serial-War

Evaluate the real-time performance of different serialization technologies

Published in: Engineering

  1. Serial-war: Evaluate the performance of serialization formats. Xuechao Wu, Insight Data Engineering Fellowship, SV
  2. DEMO: www.serialwar.xyz
  3. Ideas and Motivations • What format should be used for real-time apps? • Bandwidth usage
  4. PIPELINE: Serialization, Deserialization, Dashboard, Ingestion, Processing, Cache
  5. PIPELINE: 7 × m4.x, $1.673/hr
  6. Protocol Buffers: 33 bytes, versus 82 bytes for the same record as JSON (whitespace removed)*
       JSON (82 bytes): { "userName": "Martin", "favouriteNumber": 1337, "interests": ["daydreaming", "hacking"] }
       Schema: message Person { required string user_name = 1; optional int64 favourite_number = 2; repeated string interests = 3; }
       *https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  7. Protocol Buffers: 33 bytes *https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  8. Apache Avro: 32 bytes, versus 82 bytes for the same record as JSON (whitespace removed)*
       JSON (82 bytes): { "userName": "Martin", "favouriteNumber": 1337, "interests": ["daydreaming", "hacking"] }
       Schema: { "type": "record", "name": "Person", "fields": [ {"name": "userName", "type": "string"}, {"name": "favouriteNumber", "type": ["null", "long"]}, {"name": "interests", "type": {"type": "array", "items": "string"}} ] }
       *https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  9. Apache Avro: 32 bytes *https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  10. MapReduce Jobs • Challenge: How to monitor real-time bandwidth usage? No existing protocol could be used (HTTP GET/POST and TCP/IP both require server-client acknowledgement traffic). • Solution: Map each record to a (time, data) pair. • Time resolution is 1 second; then averageByKey over each 1-second bucket.
  11. AverageByKey
        import json, math, time

        latency_stream = (
            message_DStream
            .map(lambda x: json.loads(x))  # x: JSON string -> dict
            .map(lambda x: (math.ceil(time.time()), time.time() - x["time"]))  # (key_time, latency)
            .combineByKey(
                lambda value: (value, 1),                         # (key_time, (value, 1))
                lambda acc, value: (acc[0] + value, acc[1] + 1),  # (key_time, (sum, count))
                lambda a, b: (a[0] + b[0], a[1] + b[1]))          # merge partial (sum, count) pairs
            .map(lambda kv: (kv[0], kv[1][0] / kv[1][1]))  # (key_time, averaged_latency)
        )
  12. Throughput monitoring • “Peak” pattern
  13. Throughput monitoring • “Compensation” pattern
  14. Overall Performance at 2000 events/sec: JSON ~500 KB/s, Avro ~200 KB/s, Protobuf ~220 KB/s
  15. Overall Performance at 10000 events/sec: JSON ~2400 KB/s, Avro ~930 KB/s, Protobuf ~1050 KB/s
  16. Latency: JSON 27.13 ms, Avro 38.27 ms, Protobuf 46.12 ms. Recommendation?
  17. Recommendation • JSON: ~50% more bandwidth, ~34% less latency • Avro: 930 KB/s bandwidth, 38 ms latency • Protobuf: 10% more bandwidth, 17% higher latency
  18. I would recommend… • JSON if your app is lag-critical and handles light-sized data • Avro if your app is data-heavy and real-time critical • Protobuf if your app relies heavily on Google services or needs perfect documentation
  19. About me • University of Southern California • MS Electrical Engineering • Before Insight → At Insight: Basic MapReduce → Spark, Kafka, Redis; Compression → Serialization; Linux C → AWS, Bash, tmux…; Basic front-end → Full Stack Dev; Think Alone → Communication
  20. Avro vs. Protobuf • Why is Avro serialization slightly smaller than Protobuf? The Avro schema holds both the attribute names and types, so the encoded data carries neither. Protobuf tags each field with a field number and wire type (1 byte more per record here: 33 vs. 32 bytes). • Schema evolution? Avro must keep the most recent schema version (field order and field set both matter), or it risks runtime errors. Protobuf may decode with a previous schema without runtime errors; overall it is more flexible. • Optional fields? Protobuf: decoding validates that required fields are present. Avro: a null branch in a union marks a field as optional.
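
The 82, 33 and 32 byte figures on slides 6 to 9 can be checked by hand. The following is a minimal sketch, not the deck's benchmark code: it encodes the Person record with hand-written encoders that follow the Protobuf wire format and the Avro binary encoding for this one schema only. The helper names (`varint`, `zigzag`, `pb_string`, `encode_protobuf`, `encode_avro`) are hypothetical, and a real application would use the official libraries instead.

```python
import json

person = {
    "userName": "Martin",
    "favouriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

def varint(n: int) -> bytes:
    """Base-128 varint (7 payload bits per byte); non-negative n only."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def zigzag(n: int) -> int:
    """Avro maps signed 64-bit longs to unsigned values before varint encoding."""
    return (n << 1) ^ (n >> 63)

# --- Protobuf: every field is prefixed with (field_number << 3 | wire_type) ---
def pb_tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)

def pb_string(field_number: int, s: str) -> bytes:
    data = s.encode("utf-8")
    return pb_tag(field_number, 2) + varint(len(data)) + data  # 2 = length-delimited

def encode_protobuf(p: dict) -> bytes:
    out = pb_string(1, p["userName"])
    out += pb_tag(2, 0) + varint(p["favouriteNumber"])  # 0 = varint, non-negative only
    for interest in p["interests"]:
        out += pb_string(3, interest)  # repeated field: one tagged entry per item
    return out

# --- Avro: no tags; values appear in schema order ---
def avro_long(n: int) -> bytes:
    return varint(zigzag(n))

def avro_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return avro_long(len(data)) + data

def encode_avro(p: dict) -> bytes:
    out = avro_string(p["userName"])
    out += avro_long(1) + avro_long(p["favouriteNumber"])  # union branch 1 = "long"
    if p["interests"]:
        out += avro_long(len(p["interests"]))  # a single block holding all items
        for interest in p["interests"]:
            out += avro_string(interest)
    out += avro_long(0)  # zero-count block terminates the array
    return out

json_bytes = json.dumps(person, separators=(",", ":")).encode("utf-8")
print(len(json_bytes), len(encode_protobuf(person)), len(encode_avro(person)))
# prints: 82 33 32
```

In this record Protobuf spends four bytes on field tags, while Avro spends three on the union branch index, the array block count and the array terminator, which accounts for the one-byte difference.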

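
The averageByKey step from slides 10 and 11 can be followed without a Spark cluster. The sketch below is plain Python under stated assumptions: it reuses the three functions passed to `combineByKey` on slide 11, while `combine_partition`, `average_by_key`, `to_pair` and the sample events are hypothetical stand-ins for what Spark does across partitions. Arrival times are passed in explicitly rather than read from `time.time()` so the result is reproducible.

```python
import math

def create_combiner(value):
    # First value seen for a key: start a (sum, count) accumulator.
    return (value, 1)

def merge_value(acc, value):
    # Fold another value for the same key into the accumulator.
    return (acc[0] + value, acc[1] + 1)

def merge_combiners(a, b):
    # Merge accumulators built independently on different partitions.
    return (a[0] + b[0], a[1] + b[1])

def combine_partition(pairs):
    """Build per-key (sum, count) accumulators for one partition."""
    combined = {}
    for key, value in pairs:
        if key in combined:
            combined[key] = merge_value(combined[key], value)
        else:
            combined[key] = create_combiner(value)
    return combined

def average_by_key(partitions):
    """Combine each partition, merge the results, then divide sum by count."""
    merged = {}
    for partition in partitions:
        for key, acc in combine_partition(partition).items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return {key: total / count for key, (total, count) in merged.items()}

def to_pair(arrival_time, event):
    # Key: arrival time rounded up to the next whole second (1-second buckets).
    # Value: latency, i.e. arrival time minus the timestamp carried in the event.
    return (math.ceil(arrival_time), arrival_time - event["time"])

# Hypothetical sample events split across two partitions.
partition_a = [to_pair(10.25, {"time": 10.0}), to_pair(10.75, {"time": 10.25})]
partition_b = [to_pair(10.5, {"time": 9.75}), to_pair(11.5, {"time": 11.25})]

result = average_by_key([partition_a, partition_b])
print(result)
```

Carrying a (sum, count) pair instead of a running mean is what lets partial results from different partitions be merged correctly before the final division.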