16. Can be serialized in only one way
{"username": "MrPresident", "firstName":
"Frank", "lastName": "Underwood", "age": 50,
"email": "president@whitehouse.gov",
"badges": ["caring", "loving"]}
17. Schema is sent with every message
{
"email": "president@whitehouse.gov",
"username": "MrPresident",
"firstName": "Frank",
"lastName": "Underwood",
"age": 50,
"badges": ["caring", "loving"]
}
18. Schema can change at any time
{
"email": "president@whitehouse.gov",
"username": "MrPresident",
"first_name": "Frank",
"last_name": "Underwood",
"age": 50,
"badges": ["caring", "loving"]
}
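The rename above (firstName → first_name) is exactly the failure mode plain JSON invites: nothing checks the message against a schema, so the break only surfaces inside the consumer. A minimal Python sketch, reusing the field names from these slides:

```python
import json

# Message from a producer that renamed firstName -> first_name
# (as on this slide); the consumer still expects camelCase.
message = json.dumps({
    "email": "president@whitehouse.gov",
    "username": "MrPresident",
    "first_name": "Frank",
    "last_name": "Underwood",
    "age": 50,
    "badges": ["caring", "loving"],
})

decoded = json.loads(message)
# Parsing succeeds -- the missing key is only noticed (or silently
# defaulted) at the moment the consumer reads it.
first = decoded.get("firstName", "<missing>")
print(first)  # -> <missing>
```

A schema-checked binary format would reject such a message at decode time instead.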
23. Erik Naggum
“(...) those who think it is rational to require parsing of character
data at each and every application interface are literally retarded
and willfully blind. (...) But, alas, people prefer buggy text formats
that they can approximate rather than precise binary formats that
follow general rules that make them as easy to use as text
formats.”
http://www.xach.com/naggum/articles/3242274237190594@naggum.no.txt
26. About ASN.1
● Created in 1984 by ITU (International Telecommunication Union)
● Revised and changed many times later
● There is no ASN.2 :)
● Heavily used by telecommunications industry (UMTS, LTE, SIM
cards)
● Cryptography: PKCS, PKI (X.509)
● LDAP, RFID
33. ASN.1 - quality of libraries
Because the language is quite rich, the quality of libraries
varies greatly.
The most featureful seems to be OSS Nokalva (oss.com), but
their product is proprietary and not free.
34. ASN.1 - my opinion
Opinions about ASN.1 vary. IMHO it hasn’t reached the
critical mass required for an open-source ecosystem to
develop and for great tools to appear that support all of its
advanced features.
Bandwagon effect: once a product becomes popular, more people tend to "get
on the bandwagon" and buy it, too [Wikipedia]
35. ASN.1 demo
And let’s not forget about
openssl asn1parse -inform DER -in server-message-received
openssl asn1parse -in id_rsa.asn1-test
https://lapo.it/asn1js/
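What asn1parse walks is BER/DER’s tag-length-value structure. As a rough illustration (not a real ASN.1 library; the function name is mine), encoding a DER INTEGER by hand:

```python
def der_integer(n: int) -> bytes:
    """Encode a non-negative int as a DER INTEGER (tag 0x02).

    DER is tag-length-value: one identifier octet, a length, then the
    big-endian two's-complement content octets. The extra byte below
    keeps a leading 0x00 when the high bit is set, so the value is not
    misread as negative. Short-form lengths only (< 128 bytes).
    """
    body = n.to_bytes(n.bit_length() // 8 + 1, "big")
    assert len(body) < 128, "long-form lengths not handled in this sketch"
    return bytes([0x02, len(body)]) + body

print(der_integer(5).hex())    # -> 020105
print(der_integer(128).hex())  # -> 02020080
```

Feeding such bytes to `openssl asn1parse -inform DER` shows them back as an INTEGER.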
37. protobuf - general remarks
● Only one official serialization protocol
● 3rd party JSON encoder: https://github.com/dpp-name/protobuf-json
● ASN.1 with PER encoding is more compact: http://stackoverflow.com/a/4441622
● Tags are required (contrary to ASN.1)
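“Tags are required” means every field carries an explicit number in the schema. A minimal proto3 sketch (message and field names reuse the slides’ example, not any official schema):

```proto
syntax = "proto3";

// The tag, not the name, identifies a field on the wire: renaming a
// field is safe, renumbering it breaks old messages.
message User {
  string username        = 1;
  string first_name      = 2;
  string last_name       = 3;
  int32  age             = 4;
  string email           = 5;
  repeated string badges = 6;
}
```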
38. protobuf - who uses it?
● Google
● Hadoop
● OpenStreetMap for their PBF Format
● Ubuntu One did use it for storage (before it was shut
down)
● … know more?
39. protobuf - officially supported languages
● C++
● Java
● Python
● Objective-C (3.0+)
● C# (3.0+)
40. protobuf - unofficially supported languages
● Clojure
● Common Lisp
● Erlang
● Go
● Haskell
● JavaScript
● Lua
● Ruby
● Scala
● and many others
41. protobuf - RPC
Similarly to ASN.1, protobuf doesn’t have built-in RPC, but
take a look at http://www.grpc.io/
45. Thrift - general remarks
● Started at Facebook, now part of Apache Software Foundation
● RPC is built-in
● Supports multiple protocols (Binary, Compact, JSON,
multiplexed, simple JSON, tuple, ...) and transports
(Empty, Framed, HttpClient, ...) out of the box
● Tags are required
● http://diwakergupta.github.io/thrift-missing-guide/
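For comparison with the protobuf schema, the same record as a Thrift IDL sketch (names are illustrative); note the same explicit tags, plus a service definition since RPC is built in:

```thrift
// Tags are explicit here too; fields may be marked optional.
struct User {
  1: string username,
  2: string firstName,
  3: string lastName,
  4: i32 age,
  5: optional string email,
  6: list<string> badges,
}

// RPC is part of the IDL: the compiler generates client and server stubs.
service UserStore {
  User fetch(1: string username),
}
```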
46. Thrift - officially supported languages
● C++
● C#
● Erlang
● Haskell
● Java
● JavaScript
● Python
● Ruby
● and many others with varying protocol/transport support
47. Thrift - who uses it?
● Facebook
● Hadoop
● Cassandra DID use it but switched to CQL
● Evernote
● LastFM
52. Cap’n Proto
● Similar to the previous ones
● Tags are obligatory
● RPC is built-in
● One encoding protocol with optional "packing" for data
compression
● No encoding/decoding step; data is already kept
encoded in memory
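A Cap'n Proto schema sketch (the file ID and names are illustrative). The ordinals play the role of tags and pin down the in-memory layout, which is what makes the zero-copy “no decoding step” possible:

```capnp
@0xbf5147cbbecf40c1;

# Ordinals (@0, @1, ...) fix each field's position in the wire/memory
# layout; new fields must take the next unused ordinal.
struct User {
  username  @0 :Text;
  firstName @1 :Text;
  lastName  @2 :Text;
  age       @3 :UInt8;
  badges    @4 :List(Text);
}
```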
53. Cap’n Proto - supported languages
● C++
● Erlang
● JavaScript (Node.js only)
● Python
● Ruby
54. Cap’n Proto - supported languages (no RPC)
● C
● C#
● Go
● Java
● JavaScript
● Lua
● OCaml
61. Avro - general remarks
● Developed by Apache Software Foundation
● Primarily used in Hadoop
● Dynamic schema, can be specified either in JSON or a
custom IDL (Interface Definition Language)
● Serialization to binary or JSON
● No tags! Every field is optional
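“No tags” is visible in the schema itself. A sketch of the JSON form of an Avro record schema, reusing the slides’ field names: there are no numeric tags anywhere, because fields are matched by name:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "username",  "type": "string"},
    {"name": "firstName", "type": "string"},
    {"name": "lastName",  "type": "string"},
    {"name": "age",       "type": "int"},
    {"name": "badges",    "type": {"type": "array", "items": "string"}}
  ]
}
```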
63. Avro - dynamic schema
Because the schema is dynamic, every message must be sent
along with its schema definition.
This resembles a JSON Schema packed together with the
JSON that holds the actual data.
64. Avro - dynamic schema ctd.
To decode a message you always need the schema it was
encoded with.
But: Avro has smart schema-resolution rules, so that if the
reader expects the message in a different format, Avro will
translate the decoded data to that format.
66. Avro - schema resolution for records
● the ordering of fields may be different: fields are matched by name.
● schemas for fields with the same name in both records are resolved
recursively.
● if the writer's record contains a field with a name not present in the reader's
record, the writer's value for that field is ignored.
● if the reader's record schema has a field that contains a default value, and
writer's schema does not have a field with the same name, then the reader
should use the default value from its field.
● if the reader's record schema has a field with no default value, and writer's
schema does not have a field with the same name, an error is signalled.
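The rules above (minus the recursive one) can be sketched in a few lines of plain Python. This is an illustration of the resolution logic, not Avro’s actual implementation; all names are mine:

```python
# A "schema" here is just a dict of field name -> default value,
# with NO_DEFAULT marking fields that declare no default.
NO_DEFAULT = object()

def resolve_record(writer_value, reader_schema):
    resolved = {}
    for name, default in reader_schema.items():
        if name in writer_value:
            # Fields are matched by name, regardless of order.
            resolved[name] = writer_value[name]
        elif default is not NO_DEFAULT:
            # Reader-only field with a default: use the default.
            resolved[name] = default
        else:
            # Reader-only field without a default: error.
            raise ValueError(f"no value or default for field {name!r}")
    # Writer-only fields (in writer_value but not in the reader's
    # schema) are simply ignored.
    return resolved

reader = {"first_name": NO_DEFAULT, "age": 0}
writer_value = {"first_name": "Frank", "last_name": "Underwood"}
print(resolve_record(writer_value, reader))
# -> {'first_name': 'Frank', 'age': 0}
```

Real Avro also resolves nested record schemas recursively, field by field.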
67. Avro
Avro seems like a good way to start out when you’re still
used to JSON and are not convinced by other protocols.
69. Final remarks
● Required fields are forever (except for Avro, where the "tag" is the field name)
● Can be used not only for exchange but for storage too: protobuf in ActiveMQ and Ubuntu One (just
think in terms of Thrift and take "transport" to be a database)
● For HTTP use the Content-Encoding and Content-Type headers
● Performance & size comparisons:
○ https://github.com/sidshetye/SerializersCompare (C#)
○ https://github.com/eishay/jvm-serializers (Java)
● Everyone creates their own:
○ Protocol Buffers (Google)
○ Thrift (Facebook)
○ Bond (Microsoft)