Protocol Buffers
Ceyhan Kasap | Software Infrastructure
Data Serialization
● The process of translating an object into a format
that can be stored in a memory buffer, file or
transported on a network.
● End goal : Reconstruction in another computer
environment.
● Reverse process: Deserialization
Binary Serialization
● Many languages provide built-in serialization
support
● Language specific (Interop issues)
● Example: Java's Serializable marker interface
(increases the likelihood of bugs and security holes)
● Effective Java, Item 74: Implement Serializable
judiciously
● Effective Java, Item 78: Consider serialization
proxies instead of serialized instances
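A minimal sketch of the mechanism being discussed: Java's built-in binary serialization through the Serializable marker interface. The `Point` class and field values here are illustrative, not from the slides.

```java
import java.io.*;

// A class opts in to Java's built-in binary serialization simply by
// implementing the empty Serializable marker interface.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SerDemo {
    // Serialize a Point to an in-memory buffer and read it back.
    static Point roundTrip(Point p) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        // Only a JVM with a compatible Point class on the classpath can
        // deserialize these bytes -- the interop problem noted above.
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point q = roundTrip(new Point(3, 4));
        System.out.println(q.x + "," + q.y);
    }
}
```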
Binary Serialization
● Advantages
● Memory efficient
● Fast to emit and parse
● Disadvantages
● Not human readable
● Platform dependent
CROSS PLATFORM SOLUTIONS - XML
(Extensible Markup Language)
● Design goals: simplicity, generality, and usability across
the Internet
● Hierarchical structure, validation via schema (DTD, XSD
etc)
● A common standard with great acceptance.
● Criticism for verbosity and complexity (especially when
namespaces are involved)
CROSS PLATFORM SOLUTIONS - JSON
(JavaScript Object Notation)
● Lightweight data-interchange format
● Uses human-readable text to transmit data objects
consisting of attribute–value pairs.
● Remember: XML is a markup language; JSON is a
data format
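To make the contrast concrete, here is the same record in both formats (an illustrative example, not from the slides; the book happens to be the one used in the XML-vs-JSON article quoted in the notes):

```xml
<book id="42">
  <title>Object Thinking</title>
  <published>2004</published>
</book>
```

```json
{ "id": 42, "title": "Object Thinking", "published": 2004 }
```

XML can distinguish metadata (the `id` attribute) from data (element content) and can be validated against a schema; JSON expresses everything uniformly as attribute-value pairs, which maps more directly onto in-memory objects.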
Google Data Encoding Solution
Options
«At Google, our mission is organizing all of the
world's information.
We use literally thousands of different data formats
and most of these formats are structured, not flat»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution
Options
«Not efficient enough for this scale.
Writing code to work with the DOM tree can
sometimes become unwieldy.»
Option 1 : Use XML
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution
Options
«When we roll out a new version of a server, it
almost always has to start out talking to older
servers.
Also, we use many languages, so we need a portable
solution.»
Option 2 : write the raw bytes of in-memory data
structures to the wire
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution
Options
«There was a format for requests and responses
that used hand marshalling/unmarshalling of
requests and responses, and that supported a
number of versions of the protocol....»
Option 3 : Use hand-coded parsing and serialization
routines for each data structure (used solution
before protocol buffers)
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
What are protocol buffers?
 A language-neutral, platform-neutral, extensible
way of serializing structured data for use in
communications protocols, data storage, and
more.
 Initially developed at Google to deal with an index
server request/response protocol.
 Designed and used at Google since 2001.
 Open-sourced since 2008.
How do they work?
 You define your structured data format in a
descriptor file (.proto file)
 You run the protocol buffer compiler for your
application's language on your .proto file to
generate data access classes.
 You can even update your data structure without
breaking deployed programs that are compiled
against the "old" format.
How do they work?
(Diagram: .proto file → protocol buffer compiler → generated Java classes)
Message Definition
 Messages defined in .proto files
 Syntax:
message MessageName { ... }
 Can be nested
 Will be converted to e.g. a Java class
Message Contents
 Each message may have
 Messages
 Enums:
enum <name> {
valuename = value;
}
 Fields
 Each field is defined as
<rule> <type> <name> = <id> [<options>];
 Rules: required, optional, repeated
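The rules above come together in the address-book example from the protobuf documentation (proto2 syntax; field names and numbers are illustrative):

```proto
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  // Nested enum: becomes Person.PhoneType in generated Java.
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  // Nested message: becomes Person.PhoneNumber in generated Java.
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}
```

Compiling it with e.g. `protoc --java_out=OUT_DIR addressbook.proto` produces the generated classes described on the next slide.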
Generated Code
MESSAGES
• Immutable (Person.java)
BUILDERS
• (Person.Builder.java)
ENUMS & NESTED CLASSES
• Person.PhoneType.MOBILE
• Person.PhoneNumber
PARSING & SERIALIZATION
• writeTo(final OutputStream output)
• parseFrom(byte[] data), parseFrom(java.io.InputStream input)
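The generated API can be used roughly like this. This is a sketch, not runnable on its own: it assumes `Person`, `Person.PhoneNumber`, and `Person.PhoneType` classes generated by protoc from an address-book-style .proto file.

```java
// Messages are immutable; they are constructed via generated builders.
Person person = Person.newBuilder()
    .setName("Ada Lovelace")
    .setId(1)
    .setEmail("ada@example.com")
    .addPhones(
        Person.PhoneNumber.newBuilder()
            .setNumber("555-1234")
            .setType(Person.PhoneType.MOBILE))
    .build();

// Serialization: to a byte array or to any OutputStream.
byte[] bytes = person.toByteArray();
person.writeTo(outputStream);  // e.g. a FileOutputStream

// Parsing: back from bytes (or an InputStream).
Person parsed = Person.parseFrom(bytes);
```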
Backward / Forward Compatibility
 DO NOT change the tag numbers of any existing
fields.
 You can delete optional or repeated fields, but you
must not add or delete any required fields.
Backward / Forward Compatibility
 When adding a new field you must use fresh tag
numbers (i.e. tag numbers that were never used
in this protocol buffer, not even by deleted fields).
 A good practice :
 Mark your deleted fields as reserved.
 Protocol buffer compiler complains if reserved
fields are used.
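A sketch of the reserved-fields practice (message and field names are illustrative):

```proto
message Person {
  // Field 3 ("email") was deleted; reserving its tag number and name
  // makes the protocol buffer compiler reject any attempt to reuse them.
  reserved 3;
  reserved "email";

  required string name = 1;
  required int32 id = 2;
}
```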
Backward / Forward Compatibility
 Changing a default value is generally OK …
 But remember that default values are never sent
over the wire.
(Diagram: Sender → Receiver. If the sender does not send the field, the
receiver reads its default value, 20.)
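In proto2 syntax, the scenario on this slide looks like the following (a hypothetical message, not from the slides):

```proto
message Request {
  // If the sender never sets timeout, nothing goes on the wire;
  // the receiver's getTimeout() simply returns 20. Changing the
  // default later only affects receivers built with the new .proto.
  optional int32 timeout = 1 [default = 20];
}
```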
Performance Comparison
http://homepages.lasige.di.fc.ul.pt/~vielmo/notes/2014_02_12_smalltalk_protocol_buffers.pdf
Possible Use Cases For Us?
 Java, C++, C#
 IBM MQ / Solace messages
 DB raw data
 Log messages to disk
 Show as XML / JSON
 An exe utility associated with protobuf files
Use Cases at Barclays Investment Bank
http://www.slideshare.net/SergeyPodolsky/google-protocol-buffers-56085699
QUESTIONS?


Editor's Notes

  • #4 Binary is not human readable, but it is memory efficient and fast to parse; it is also platform dependent (Little Endian vs. Big Endian?).
  • #6 Cross-platform solutions are text based: human readable (okay, XML...), platform independent (but: still encoding problems!), and the format can evolve (e.g. additional fields in XML). The cost is more memory and slower parsing.
  • #7 From http://www.yegor256.com/2015/11/16/json-vs-xml.html: I believe there are four features XML has that seriously set it apart from JSON or any other simple data format, like YAML for example.

    XPath. To get data like the year of publication from the document above, I just send an XPath query: /book/published/year/text(). However, there has to be an XPath processor that understands my request and returns 2004. The beauty of this is that XPath 2.0 is a very powerful query engine with its own functions, predicates, axes, etc. You can literally put any logic into your XPath request without writing any traversing logic in Java, for example. You may ask "How many books were published by David West in 2004?" and get an answer, just via XPath. JSON is not even close to this.

    Attributes and Namespaces. You can attach metadata to your data, just like it's done above with the id attribute. The data stays inside elements, just like the name of the book author, for example, while metadata (data about data) can and should be placed into attributes. This significantly helps in organizing and structuring information. On top of that, both elements and attributes can be marked as belonging to certain namespaces. This is a very useful technique during times when a few applications are working with the same XML document.

    XML Schema. When you create an XML document in one place, modify it a few times somewhere else, and then transfer it to yet another place, you want to make sure its structure is not broken by any of these actions. One of them may use <year> to store the publication date while another uses <date> with ISO-8601. To avoid that mess in structure, create a supplementary document, which is called XML Schema, and ship it together with the main document. Everyone who wants to work with the main document will first validate its correctness using the schema supplied. This is a sort of integration testing in production. RelaxNG is a similar but simpler mechanism; give it a try if you find XML Schema too complex.

    XSL. You can make modifications to your XML document without any Java/Ruby/etc. code at all. Just create an XSL transformation document and "apply" it to your original XML. As an output, you will get a new XML. The XSL language (it is purely functional, by the way) is designed for hierarchical data manipulations. It is much more suitable for this task than Java or any other OOP/procedural approach. You can transform an XML document into anything, including plain text and HTML. Some complain about XSL's complexity, but please give it a try. You won't need all of it, while its core functionality is pretty straightforward.

    From http://apigee.com/about/blog/technology/why-xml-wont-die-xml-vs-json-your-api: JSON is especially good at representing programming-language objects. If you have a JavaScript or Java object, or even a C struct, the structure of the object and all its fields can be easily and quickly converted to JSON, sent over a network, and retrieved on the other end without too much difficulty and (usually) comes out the same on both ends. But not everything in the world is a programming-language object. Sometimes to describe a complex real-world object we have to combine different descriptions and languages from different places, mash them up, and use them to describe even more complex things. The descriptions of these complex things need to be validated, they need to be commented on, they need to be shared and sometimes annotated with additional data that doesn't affect the original structure. When the world gets complicated and open-ended like that, what's needed is not a programming-language-format object, but an open-ended, extensible -- umm -- markup language. That's what we have today with XML.
  • #17 Show the code.
  • #18 Show the code.