DATA SERIALIZATION WITH
GOOGLE PROTOCOL BUFFERS
By: William Kibira
What Is Data Serialization?
● The process of translating a data structure or an object's state into a
format that can be stored (in a memory buffer or a file) or transmitted
over a network.
● The end goal is that it can be reconstructed later, possibly in another
computer environment.
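Here is a minimal Python sketch of that round trip. It uses the standard json module as the storage format. The Account class and its fields are hypothetical and serve only as an illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Account:
    # Hypothetical example type; any plain data object works the same way.
    owner: str
    balance: int


def serialize(account: Account) -> bytes:
    # Object state -> JSON text -> bytes that fit in a buffer, a file or a socket.
    return json.dumps(asdict(account)).encode("utf-8")


def deserialize(data: bytes) -> Account:
    # Bytes -> JSON text -> dict -> a new object with the same state.
    return Account(**json.loads(data.decode("utf-8")))


original = Account(owner="alice", balance=12345678)
wire = serialize(original)
restored = deserialize(wire)
print(wire)                  # b'{"owner": "alice", "balance": 12345678}'
print(restored == original)  # True
```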
Reasons Why We Do This
● Persist objects [store them and retrieve them later]
● Perform Remote Procedure Calls (RPC)
● Create distributed objects [CORBA, Java RMI, ICE]
Key Words
● Computer Environment
- Programming Languages
- Operating Systems
- Architectures and processors
● Platform Independent Solutions
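Byte order is one concrete way that "Architectures and processors" differ. The Python sketch below packs the same 32-bit integer in little-endian and big-endian order. A platform-independent format has to choose one order and use it on every platform. Protocol Buffers, for example, always writes its fixed32/fixed64 fields in little-endian order.

```python
import struct

value = 12345678  # 0x00BC614E

# The same number laid out in two byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first
print(little.hex(), big.hex())     # 4e61bc00 00bc614e

# A reader that assumes the wrong byte order gets a different number.
wrong = struct.unpack(">I", little)[0]
right = struct.unpack("<I", little)[0]
print(hex(wrong), right)           # 0x4e61bc00 12345678
```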
Popular Platform-Independent Solutions
● JSON and XML
● BSON and Binary XML
● Google Protocol Buffers, Thrift, Avro
Ref
http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
JSON AND XML
● Most popular
● Human readable, to some extent
● Most web APIs use one of them by default
● Lots of libraries and code generators for both
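Both formats are text, and that is what makes them readable. The sketch below uses Python's standard library to encode the same hypothetical record both ways. XML repeats every field name in a closing tag, so for records like this one it typically comes out larger than JSON.

```python
import json
import xml.etree.ElementTree as ET

record = {"owner": "alice", "balance": 12345678}  # hypothetical record

# JSON: one key/value pair per field.
as_json = json.dumps(record).encode("utf-8")

# XML: one element per field, each with an opening and a closing tag.
root = ET.Element("account")
for name, value in record.items():
    ET.SubElement(root, name).text = str(value)
as_xml = ET.tostring(root)  # bytes; no XML declaration for the default encoding

print(as_json.decode())
print(as_xml.decode())
print(len(as_json), len(as_xml))
```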
How It Works
● You write a schema in an IDL [Interface Description Language]. It is
similar to a CORBA IDL, but much cleaner and more flexible.
● Pass it through the C++-based code generator (protoc)
● Get boilerplate code in the language you specified
GOOGLE PROTOCOL BUFFERS
● A platform-independent, language-independent data serialization
solution. Like XML, it describes structured data, but its messages are
much smaller and its schemas are easier to write.
● Used inside Google since 2001, open-sourced in 2008
JSON vs BINARY FORMATS
● JSON is darn easy to read. If you can read binary, you definitely
need to see a doctor.
● JSON gets fat even on little data, while binary is really compact.
{"deposit_money": "12345678"}
JSON (one byte per character of "money" and "12345678"):
'0x6d', '0x6f', '0x6e', '0x65', '0x79',
'0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38'
BINARY (simplified: field number 1, then 12345678 as a single integer):
'0x01', '0xBC614E'
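The BINARY column above is a simplification. Below is a minimal Python sketch of the real Protocol Buffers wire encoding for an integer field. Each field is written as a key (field number plus wire type), followed by the value as a base-128 varint. The sketch assumes deposit_money is declared as an integer field (for example int64) with field number 1. A string field, which is what the JSON holds, would instead be written as a key, a length, and the raw characters.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, lowest bits first, high bit set = more bytes follow."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: another byte follows
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Key = (field number << 3) | wire type; wire type 0 means "varint".
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


binary = encode_int_field(1, 12345678)
json_text = b'{"deposit_money": "12345678"}'
print(binary.hex(" "))              # 08 ce c2 f1 05
print(len(binary), len(json_text))  # 5 29
```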
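SPEED AT PARSING
● JSON is fairly fast, but binary is faster to parse. Numbers are already
stored as integers, and the parser can tell where each field ends without
scanning text for delimiters or converting digit strings.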
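Here is a minimal sketch of the decoding side. It reads back the five bytes that Protocol Buffers would produce for integer field 1 holding 12345678, assuming the field is declared with an integer type. The parser only masks and shifts bits, and the high bit of each byte tells it exactly where the value ends. A JSON parser, by contrast, has to scan characters, match quotes, and convert the digit string into a number.

```python
from typing import Tuple


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the varint that starts at data[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # add the next 7 bits
        if not (byte & 0x80):             # high bit clear: this was the last byte
            return result, pos
        shift += 7


message = bytes([0x08, 0xCE, 0xC2, 0xF1, 0x05])
key, pos = decode_varint(message)
field_number, wire_type = key >> 3, key & 0x07
value, pos = decode_varint(message, pos)
print(field_number, wire_type, value)  # 1 0 12345678
```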
FLOW
Schema / IDL → C++ code generator → C++, Java, Python, JavaScript code → server / client application code bases
What Does a Schema Look Like?
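As an illustration, the comment below contains a hypothetical schema for a chat message (it is not the author's schema). It is paired with a hand-written Python encoder that does roughly what protoc-generated code does for that schema. The key point is that the bytes carry field numbers and wire types, never field names. Following proto3 behaviour, fields that hold their default value (empty string, zero) are left out. This sketch does not handle negative numbers.

```python
# Hypothetical schema (illustration only):
#
#   syntax = "proto3";
#   message ChatMessage {
#     string sender  = 1;
#     string text    = 2;
#     int64  sent_at = 3;
#   }

VARINT = 0            # wire type for int32/int64/bool/enum
LENGTH_DELIMITED = 2  # wire type for string/bytes/nested messages


def encode_varint(value: int) -> bytes:
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_chat_message(sender: str, text: str, sent_at: int) -> bytes:
    out = bytearray()
    for number, value in ((1, sender), (2, text)):
        raw = value.encode("utf-8")
        if raw:  # proto3 skips empty strings
            out += encode_key(number, LENGTH_DELIMITED) + encode_varint(len(raw)) + raw
    if sent_at:  # proto3 skips zero
        out += encode_key(3, VARINT) + encode_varint(sent_at)
    return bytes(out)


print(encode_chat_message("bob", "hi", 1).hex(" "))
# 0a 03 62 6f 62 12 02 68 69 18 01
```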
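How to Generate the Code
● Run the protocol buffer compiler (protoc). Give it the schema directory,
an output option for the language you want, and the .proto file:
● protoc -I=DIR_TO_SCHEMA/ --LANGUAGE_out=OUTPUT_DIR/ DIR_TO_SCHEMA/file.proto
● LANGUAGE is the target, e.g. --java_out, --cpp_out or --python_out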
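When protoc is driven from a build script, a small Python helper can assemble the command line. This helper is hypothetical and is not part of protoc. It uses only the real -I and --<language>_out options, and the paths in the example are made up.

```python
import shlex
from typing import List

# Built-in protoc generators and their output options.
OUT_OPTIONS = {
    "cpp": "--cpp_out",
    "java": "--java_out",
    "python": "--python_out",
}


def protoc_command(language: str, schema_dir: str, out_dir: str, proto_file: str) -> List[str]:
    """Build (but do not run) the argument list for one protoc invocation."""
    option = OUT_OPTIONS[language]  # KeyError for languages not listed above
    return ["protoc", f"-I={schema_dir}", f"{option}={out_dir}", proto_file]


cmd = protoc_command("java", "schemas", "src/main/java", "schemas/chat.proto")
print(shlex.join(cmd))
# protoc -I=schemas --java_out=src/main/java schemas/chat.proto
```

To run the command, pass cmd to subprocess.run(cmd, check=True) on a machine where protoc is installed.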
A Look in my Terminal
What is Inside My XX.java
SIZE COMPARISON
[Bar chart of serialized size, units not stated: RMI 905, GPB 250, JSON 559, XML 836]
Runtime Performance
           Server CPU AVG   Client CPU AVG   Time
Protobuf   30.00%           37.75%           01:19:48
JSON       20.00%           75.00%           04:44:83
XML        12.00%           80.75%           05:27:45
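Versioning
● This is about compatibility between older and newer versions of the
same Protocol Buffers schema
● Old server with a new client, and vice versa
● You can add new fields and remove unused ones, and the data will still
be parsed, as long as existing field numbers are never reused or given
incompatible types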
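Here is a minimal sketch of why this works, using a hypothetical message. A newer client adds field 4 (a currency string, for instance), and an older server that only knows field 1 still reads the message. Every field carries its wire type, so the old reader can work out how many bytes to skip without knowing what the field means. This sketch handles only wire types 0 and 2.

```python
from typing import Dict, Set, Tuple


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def parse_known_fields(data: bytes, known: Set[int]) -> Dict[int, object]:
    """Keep the fields this reader knows about; skip every other field."""
    fields: Dict[int, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint: ends at the first byte without the high bit
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited: a varint length, then that many bytes
            length, pos = decode_varint(data, pos)
            if pos + length > len(data):
                raise ValueError("truncated field")
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in known:
            fields[number] = value
    return fields


# Field 1 = 12345678 (varint), plus a new field 4 = "EUR" (length-delimited).
new_message = bytes.fromhex("08cec2f105") + bytes.fromhex("2203") + b"EUR"
print(parse_known_fields(new_message, known={1}))     # {1: 12345678}
print(parse_known_fields(new_message, known={1, 4}))  # {1: 12345678, 4: b'EUR'}
```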
Other Binary Serialization Formats
● MessagePack [many languages, including .NET]
● Thrift [originally Facebook, now Apache]
● Avro [Apache]
Reasons To Use Protocol Buffers
● Messages are smaller to push around over networks
● Easier [if not the easiest] to structure
● Generated classes give a sense of object-oriented structuring
Reasons Not To Use Them
● Well, you will have to maintain both the server and the clients.
● They may not be easy to learn at first.
● They are not an industry standard.
● I am just trying to be fair here :)
SIMPLE DEMO CHAT APPS
● A simple chat application that runs on desktops and laptops, across
different operating systems
● Partially inspired by the Fifth Estate
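A chat app sends a stream of messages over a single connection, but serialized protobuf messages do not mark where they end. A common approach, also used by the Java library's writeDelimitedTo/parseDelimitedFrom, is to prefix each message with its length as a varint. Here is a minimal Python sketch of that framing. An in-memory stream stands in for the socket; socket.makefile("rwb") would give a file-like object that can be used the same way. The payloads are placeholder bytes, not real encoded messages.

```python
import io
from typing import BinaryIO, Optional


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None on a clean end of stream."""
    result, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        result |= (chunk[0] & 0x7F) << shift
        if not (chunk[0] & 0x80):
            return result
        shift += 7


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    # Length prefix first, then the serialized message itself.
    stream.write(encode_varint(len(payload)) + payload)


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    length = read_varint(stream)
    if length is None:
        return None
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a message")
    return payload


channel = io.BytesIO()  # stands in for a socket connection
for payload in (b"hello", b"how are you?"):
    write_delimited(channel, payload)

channel.seek(0)
while (received := read_delimited(channel)) is not None:
    print(received)
```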
THE END
● Links to Check out
Google Protocol Buffers Main Page
https://developers.google.com/protocol-buffers/
● Apache Thrift
https://thrift.apache.org/
