Data exchange
Because JSON isn’t enough
Przemysław Kamiński
About me
● Python & JavaScript programmer at Mirantis - Pure Play OpenStack™
● Hobbyist Haskeller
● Soon-to-be father :)
This is a research-based presentation, not
Chatroom ver. 1
email: String
username: String
RoomType: Private | Public
name: String
type: RoomType
user: User
room: Room
timestamp: DateTime
message: String
Chatroom ver. 2
email: String
username: String
firstName: String
lastName: String
age: Integer
badges: String[]
What is data actually?
What is data?
username: MrPresident
firstName: Frank
lastName: Underwood
age: 50
badges: {caring, loving}
email: Email
username: String
firstName: String
lastName: String
age: Int
badges: String[]
Serialized data
{
  "email": "",
  "username": "MrPresident",
  "firstName": "Frank",
  "lastName": "Underwood",
  "age": 50,
  "badges": ["caring", "loving"]
}
What can change?
● Data
● Schema
● Encoding/serialization
Data exchange formats!
● Schema
● Encoding/serialization
Can be serialized in only one way
{"username": "MrPresident", "firstName":
"Frank", "lastName": "Underwood", "age": 50,
"email": "",
"badges": ["caring", "loving"]}
Schema is sent with every message
{
  "email": "",
  "username": "MrPresident",
  "firstName": "Frank",
  "lastName": "Underwood",
  "age": 50,
  "badges": ["caring", "loving"]
}
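A rough way to see the cost of shipping field names with every message is to count the bytes they take. The sketch below is only an illustration: the record is the one from the slides (with the email left empty, as it appears there), and `key_overhead` is a hypothetical helper name, not part of any library.

```python
import json

# The user record as shown on the slide (email left empty, as in the slides).
user = {
    "email": "",
    "username": "MrPresident",
    "firstName": "Frank",
    "lastName": "Underwood",
    "age": 50,
    "badges": ["caring", "loving"],
}


def key_overhead(record):
    """Return (bytes spent on field names, total bytes) for a flat JSON object."""
    # Compact separators so that whitespace does not inflate the total.
    total = len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
    # Each key costs its quoted JSON form plus the ':' that follows it.
    keys = sum(len(json.dumps(name).encode("utf-8")) + 1 for name in record)
    return keys, total


keys, total = key_overhead(user)
print(f"{keys} of {total} bytes are field names")
```

For a small record like this one, close to half of the payload is field names rather than data, and that cost is paid again for every message.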
Schema can change at any time
{
  "email": "",
  "username": "MrPresident",
  "first_name": "Frank",
  "last_name": "Underwood",
  "age": 50,
  "badges": ["caring", "loving"]
}
No built-in validation
{
  "email": "",
  "username": "MrPresident",
  "firstName": "Frank",
  "lastName": "Underwood",
  "age": "50",
  "badges": ["caring", "loving"]
}
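The message above, with the age sent as a string, parses without any complaint, so the check has to be written by hand. A minimal sketch, assuming the field types from the "Chatroom ver. 2" schema; `USER_SCHEMA` and `validate_user` are hypothetical names introduced here for illustration:

```python
import json

# Hypothetical type table, following the "Chatroom ver. 2" user schema.
USER_SCHEMA = {
    "email": str,
    "username": str,
    "firstName": str,
    "lastName": str,
    "age": int,
    "badges": list,
}


def validate_user(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, expected in USER_SCHEMA.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        # Exact type check, so that True/False is not accepted as an int.
        if type(value) is not expected:
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


raw = (
    '{"email": "", "username": "MrPresident", "firstName": "Frank", '
    '"lastName": "Underwood", "age": "50", "badges": ["caring", "loving"]}'
)
record = json.loads(raw)            # the JSON parser accepts "age": "50"
problems = validate_user(record)    # the hand-written check does not
print(problems)
```

Every consumer of the message has to carry a check like this one, and nothing forces the copies to agree with each other.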
[{"user": {"follow_request_sent": false, "profile_use_background_image": true, "default_profile_image": false, "id": 6253282, "verified": true, "entities": {"url": {"urls": [{"url": "",
"indices": [0, 22], "expanded_url": null}]}, "description": {"urls": []}}, "profile_image_url_https": "",
"profile_sidebar_fill_color": "DDEEF6", "geo_enabled": true, "profile_text_color": "333333", "followers_count": 1212864, "protected": false, "location": "San Francisco, CA", "profile_background_color":
"C0DEED", "listed_count": 10775, "utc_offset": -28800, "statuses_count": 3333, "description": "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and
our API. Don't get an answer? It's on my website.", "friends_count": 31, "profile_link_color": "0084B4", "profile_image_url": "http://a0.twimg.
com/profile_images/2284174872/7df3h38zabcvjylnyfe3_normal.png", "notifications": null, "show_all_inline_media": false, "profile_background_image_url_https": "https://si0.twimg.
com/images/themes/theme1/bg.png", "id_str": "6253282", "profile_background_image_url": "", "screen_name": "twitterapi", "lang": "en", "following":
null, "profile_background_tile": false, "favourites_count": 24, "name": "Twitter API", "url": "", "created_at": "Wed May 23 06:01:13 +0000 2007", "contributors_enabled": true,
"time_zone": "Pacific Time (US & Canada)", "profile_sidebar_border_color": "C0DEED", "default_profile": true, "is_translator": false}, "favorited": false, "contributors": null, "truncated": false, "source": "<a
href="//"" rel=""nofollow"">YoruFukurou</a>", "text": "Introducing the Twitter Certified Products Program:", "created_at": "Wed Aug 29
17:12:58 +0000 2012", "retweeted": false, "in_reply_to_status_id": null, "coordinates": null, "id": 240859602684612608, "entities": {"user_mentions": [], "hashtags": [], "urls": [{"url": "https://t.
co/MjJ8xAnT", "indices": [52, 73], "expanded_url": "", "display_url": ""}]}, "in_reply_to_status_id_str": null,
"place": null, "id_str": "240859602684612608", "in_reply_to_screen_name": null, "retweet_count": 121, "geo": null, "in_reply_to_user_id_str": null, "possibly_sensitive": false, "in_reply_to_user_id": null},
{"user": {"follow_request_sent": false, "profile_use_background_image": true, "default_profile_image": false, "id": 6253282, "verified": true, "entities": {"url": {"urls": [{"url": "",
"indices": [0, 22], "expanded_url": null}]}, "description": {"urls": []}}, "profile_image_url_https": "",
"profile_sidebar_fill_color": "DDEEF6", "geo_enabled": true, "profile_text_color": "333333", "followers_count": 1212864, "protected": false, "location": "San Francisco, CA", "profile_background_color":
"C0DEED", "listed_count": 10775, "utc_offset": -28800, "statuses_count": 3333, "description": "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and
our API. Don't get an answer? It's on my website.", "friends_count": 31, "profile_link_color": "0084B4", "profile_image_url": "http://a0.twimg.
com/profile_images/2284174872/7df3h38zabcvjylnyfe3_normal.png", "notifications": null,"show_all_inline_media": false, "profile_background_image_url_https": "https://si0.twimg.
com/images/themes/theme1/bg.png", "id_str": "6253282", "profile_background_image_url": "", "screen_name": "twitterapi", "lang": "en", "following":
null, "profile_background_tile": false, "favourites_count": 24, "name": "Twitter API", "url": "", "created_at": "Wed May 23 06:01:13 +0000 2007", "contributors_enabled": true,
"time_zone": "Pacific Time (US & Canada)", "profile_sidebar_border_color": "C0DEED", "default_profile": true, "is_translator": false}, "favorited": false, "contributors": null, "truncated": false, "source": "<a
href="//"" rel=""nofollow"">YoruFukurou</a>", "text": "We are working to resolve issues with application management & logging in to the dev portal: https://t.
co/p5bOzH0k ^TS", "created_at": "Sat Aug 25 17:26:51 +0000 2012", "retweeted": false, "in_reply_to_status_id": null, "coordinates": null, "id": 239413543487819778, "entities": {"user_mentions": [],
"hashtags": [], "urls": [{"url": "", "indices": [97, 118], "expanded_url": "", "display_url": ""}]},
"in_reply_to_status_id_str": null, "place": null, "id_str": "239413543487819778", "in_reply_to_screen_name": null,"retweet_count": 105, "geo": null, "in_reply_to_user_id_str": null, "possibly_sensitive":
false, "in_reply_to_user_id": null}]
XML: Readability
Erik Naggum
“(...) those who think it is rational to require parsing of character
data at each and every application interface are literally retarded
and willfully blind. (...) But, alas, people prefer buggy text formats
that they can approximate rather than precise binary formats that
follow general rules that are make them as easy to use as text
MessagePack is just another way of serializing dynamic
data and has the same problems as JSON.
The granddaddy
About ASN.1
● Created in 1984 by ITU (International Telecommunication Union)
● Revised and changed many times later
● There is no ASN.2 :)
● Heavily used by telecommunications industry (UMTS, LTE, SIM
● Cryptography: PKCS, PKI (X.509)
ASN.1 demo
Time for a demo...
ASN.1 - goodies
● INTEGER ranges: INTEGER (12345..12346)
● String ranges: UTF8String (FROM("A".."Z"))
● Regexes:
VisibleString (PATTERN "\d#2/\d#2/\d#4-\d#2:\d#2") --
● Many encodings supported: BER, CER, DER, PER, XER (XML), GSER
● Can be encoded into JSON too:
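To give a feel for what one of the binary encodings listed above produces, here is a hand-rolled sketch of how DER encodes an INTEGER: one tag byte (0x02), one length byte, then the value in big-endian two's complement using as few bytes as possible. It is an illustration only; `der_integer` is a hypothetical helper, it handles short-form lengths only, and a real project would use an ASN.1 library instead.

```python
def der_integer(value: int) -> bytes:
    """Encode a Python int as a DER INTEGER (short-form length only)."""
    # Number of content bytes: enough bits for the magnitude plus a sign bit.
    magnitude = value if value >= 0 else ~value
    length = magnitude.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = value.to_bytes(length, "big", signed=True)
    # Tag 0x02 is the universal tag for INTEGER.
    return bytes([0x02, length]) + content


print(der_integer(50).hex())    # tag, length, value
print(der_integer(128).hex())   # needs a leading zero byte to stay positive
```

Note that neither a field name nor a type name appears in the output: the receiver is expected to know the schema.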
ASN.1 - parametrized types
My-Type {INTEGER : dummy1, Dummy2} ::= SEQUENCE {
    first-field Dummy2,
    second-field INTEGER (1..dummy1)
}
my-field ::= My-Type{10, UTF8String}
ASN.1 - so many string types
NumericString, PrintableString, VisibleString, ISO646String,
IA5String, TeletexString, T61String, VideotexString,
GraphicString, GeneralString, UniversalString, UTF8String
ASN.1 - VideotexString - Minitel???
ASN.1 - quality of libraries
Because the language is quite rich, the quality of libraries
varies greatly.
The most featureful seems to be OSS Nokalva, but
their product is proprietary and not free.
ASN.1 - my opinion
Opinions about ASN.1 vary. IMHO it hasn’t
reached the critical mass required for the open-
source ecosystem to develop and great tools to
appear that will support all of its advanced
Bandwagon effect: once a product becomes popular, more people tend to "get
on the bandwagon" and buy it, too [Wikipedia]
ASN.1 demo
And let’s not forget about
openssl asn1parse -inform DER -in server-message-received
openssl asn1parse -in id_rsa.asn1-test
protobuf - general remarks
● Only one official serialization protocol
● 3rd party JSON encoder:
● ASN.1 with PER encoding is more compact: http:
● Tags are required (contrary to ASN.1)
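The reason tags are required is that the tag number, not the field name, is what goes on the wire. The sketch below hand-encodes a single integer field the way the protobuf wire format does for varint fields: a key built from the tag number and the wire type, followed by the value, both as base-128 varints. The helper names are hypothetical; in practice the generated code from `protoc` does this for you.

```python
def encode_varint(number: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = number & 0x7F
        number >>= 7
        if number:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_varint_field(tag: int, value: int) -> bytes:
    """Encode one varint field: key = (tag << 3) | wire type 0, then the value."""
    key = (tag << 3) | 0
    return encode_varint(key) + encode_varint(value)


print(encode_varint_field(1, 150).hex())
```

Renaming a field therefore costs nothing on the wire, while reusing or changing a tag number silently changes the meaning of old messages.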
protobuf - who uses it?
● Google
● Hadoop
● OpenStreetMap for their PBF Format
● Ubuntu One did use it for storage (before it was shut
● … know more?
protobuf - officially supported languages
● C++
● Java
● Python
● Objective-C (3.0+)
● C# (3.0+)
protobuf - unofficially supported languages
● Clojure
● Common Lisp
● Erlang
● Go
● Haskell
● JavaScript
● Lua
● Ruby
● Scala
● and many others
protobuf - RPC
Similarly to ASN.1, protobuf doesn’t have built-in RPC, but
take a look at
protobuf demo
Time for a demo...
protobuf - why not ASN.1?
Thrift - general remarks
● Started at Facebook, now part of Apache Software Foundation
● RPC is built-in
● Supports multiple protocols (Binary, Compact, JSON,
multiplexed, simple JSON, tuple, ...) and transports
(Empty, Framed, HttpClient, ...) out of the box
● Tags are required
Thrift - officially supported languages
● C++
● C#
● Erlang
● Haskell
● Java
● JavaScript
● Python
● Ruby
● and many others with varying protocol/transport support
Thrift - who uses it?
● Facebook
● Hadoop
● Cassandra DID use it but switched to CQL
● Evernote
● LastFM
Thrift - funny Haskell bug
Thrift - funny Haskell bug ctd.
Thrift demo
Time for a demo...
Cap’n Proto
● Similar to the previous ones
● Tags are obligatory
● RPC is built-in
● One encoding protocol with optional “packing” for data
● No encoding/decoding step, data is already kept
encoded in memory
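The "no encoding/decoding step" idea can be sketched as accessors that read fields straight out of the received buffer at fixed offsets, instead of first parsing the buffer into objects. The layout below is made up for illustration and is NOT the real Cap’n Proto wire format; `UserView` is a hypothetical class.

```python
import struct


class UserView:
    """Read-only view over a buffer with a hypothetical fixed layout:
    bytes 0..3 hold the age, bytes 4..7 hold the badge count,
    both as unsigned 32-bit little-endian integers."""

    def __init__(self, buffer):
        # memoryview wraps the bytes without copying them.
        self._buffer = memoryview(buffer)

    @property
    def age(self):
        # The field is decoded only when it is asked for.
        return struct.unpack_from("<I", self._buffer, 0)[0]

    @property
    def badge_count(self):
        return struct.unpack_from("<I", self._buffer, 4)[0]


message = struct.pack("<II", 50, 2)   # stands in for bytes read from a socket
user = UserView(message)
print(user.age, user.badge_count)
```

Fields that are never accessed are never touched, which is where the speed claim comes from.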
Cap’n Proto - supported languages
● C++
● Erlang
● JavaScript (Node.js only)
● Python
● Ruby
Cap’n Proto - supported languages (no RPC)
● C
● C#
● Go
● Java
● JavaScript
● Lua
● OCaml
Cap’n Proto - RPC is quite interesting
Cap’n Proto - RPC
Cap’n Proto - E Language
# synchronous
def car := carMaker("Mercedes")
# asynchronous
def carPromise := carMaker <- new("Mercedes")
def movePromise := carPromise <- moveTo(2, 3)
Cap’n Proto - E Language ctd.
when (movePromise) -> done(position) {
    println(`Moved to $position`)
} catch e {
    println(`Error moving: $e`)
}
Cap’n Proto - E Language ctd.
Major implementations:
E on Java
E on Common Lisp
Avro - general remarks
● Developed by Apache Software Foundation
● Primarily used in Hadoop
● Dynamic schema, can be specified either in JSON or a
custom IDL (Interface Definition Language)
● Serialization to binary or JSON
● No tags! Every field is optional
Avro - supported languages
● C/C++
● C#
● Java
● Python
● Ruby
● Scala
Avro - dynamic schema
Because the schema is dynamic, every message must be sent
along with its schema definition.
This resembles JSONSchema packed together with JSON
which holds the actual data.
Avro - dynamic schema ctd.
To decode the message you always need the schema it
was encoded with.
But: Avro has smart schema resolution rules, so if the
reader expects the message in a different format, Avro will
translate the decoded data to that format.
Avro - schema resolution
Avro - schema resolution for records
● the ordering of fields may be different: fields are matched by name.
● schemas for fields with the same name in both records are resolved recursively.
● if the writer's record contains a field with a name not present in the reader's
record, the writer's value for that field is ignored.
● if the reader's record schema has a field that contains a default value, and
writer's schema does not have a field with the same name, then the reader
should use the default value from its field.
● if the reader's record schema has a field with no default value, and writer's
schema does not have a field with the same name, an error is signalled.
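The record rules above can be sketched in a few lines. This is a simplification that works on already decoded dictionaries rather than on Avro schemas, and it does not recurse into nested records; `resolve_record`, `NO_DEFAULT` and `legacy_id` are hypothetical names introduced for the illustration.

```python
import copy

NO_DEFAULT = object()  # marker for reader fields that declare no default value


def resolve_record(writer_record, reader_fields):
    """Build the record the reader sees.

    writer_record: dict of field name -> value, as written.
    reader_fields: dict of field name -> default value, or NO_DEFAULT.
    """
    resolved = {}
    # Iterating over the reader's fields matches by name, ignores ordering,
    # and drops any writer field that the reader does not know about.
    for name, default in reader_fields.items():
        if name in writer_record:
            resolved[name] = writer_record[name]
        elif default is not NO_DEFAULT:
            resolved[name] = copy.deepcopy(default)
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


written = {"username": "MrPresident", "age": 50, "legacy_id": 7}
reader_fields = {"age": NO_DEFAULT, "username": NO_DEFAULT, "badges": []}
resolved = resolve_record(written, reader_fields)
print(resolved)
```

Here `legacy_id` is dropped because the reader does not declare it, and `badges` is filled in from the reader's default.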
Avro seems like a good way to start out when you’re still
used to JSON and are not convinced by other protocols.
Final remarks
● Required fields are forever (except for Avro, where the “tag” is the field name)
● Can be used not only for exchange but storage too: protobuf in ActiveMQ and Ubuntu One (just
think in terms of Thrift and take “transport” to be Database)
● For HTTP use the Content-Encoding and Content-Type headers
● Performance & size comparisons:
○ (C#)
○ (Java)
● Everyone creates their own:
○ Protocol Buffers (Google)
○ Thrift (Facebook)
○ Bond (Microsoft)

