2. 2
www.luxoft.com
Problem:
● [Alice] I created a new binary format! Here is C code to parse and construct it!
● [Bob] Do you have a Python binding?
● [Charly] What about Java?
● [Dave] Oh, I would like to use it in my PHP web application.
● [Eve] I can’t use it for iOS development; there are no Objective-C or Swift utilities.
● But, but, but … C bindings … Okay!!
3. 3
Declarative and imperative
Here is my protocol description; it's so easy to understand. Why can’t you create a binding for your own language?
Okay, here is a declarative description:
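To make the contrast concrete, here is a minimal Python sketch, not taken from the slides. The 8-byte `DEMO` header, the field names, and both helper functions are hypothetical and exist only for illustration. The imperative parser spells out how to read each byte; the declarative description is plain data that says what the layout is, and a small generic reader interprets it.

```python
import struct

# Hypothetical format, invented for this example:
# 4-byte magic "DEMO", u2 version, u2 payload length (little-endian), payload.
SAMPLE = b"DEMO" + struct.pack("<HH", 1, 3) + b"abc"


def parse_imperative(data):
    """Imperative style: the code spells out HOW to read each field."""
    if data[0:4] != b"DEMO":
        raise ValueError("bad magic")
    version = int.from_bytes(data[4:6], "little")
    length = int.from_bytes(data[6:8], "little")
    payload = data[8:8 + length]
    return {"version": version, "length": length, "payload": payload}


# Declarative style: plain data that states WHAT the layout is.
HEADER_DESCRIPTION = [
    {"id": "magic", "contents": b"DEMO"},
    {"id": "version", "type": "u2"},
    {"id": "length", "type": "u2"},
    {"id": "payload", "size": "length"},  # size refers to an earlier field
]

INT_SIZES = {"u1": 1, "u2": 2, "u4": 4}


def parse_declarative(description, data, byteorder="little"):
    """Generic reader: walks the description instead of hard-coding offsets."""
    result, pos = {}, 0
    for field in description:
        if "contents" in field:
            # Fixed bytes: verify them, but do not store them in the result.
            expected = field["contents"]
            if data[pos:pos + len(expected)] != expected:
                raise ValueError("unexpected contents in " + field["id"])
            pos += len(expected)
        elif "type" in field:
            # Unsigned integer of a known width.
            size = INT_SIZES[field["type"]]
            result[field["id"]] = int.from_bytes(data[pos:pos + size], byteorder)
            pos += size
        else:
            # Raw bytes whose length is given by a previously parsed field.
            size = result[field["size"]]
            result[field["id"]] = data[pos:pos + size]
            pos += size
    return result


print(parse_imperative(SAMPLE))
print(parse_declarative(HEADER_DESCRIPTION, SAMPLE))
```

Both calls return the same fields. The difference is that `HEADER_DESCRIPTION` contains no Python logic, so the same description could in principle be read by a generic reader or code generator written for any other language.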
5. 5
Need to support several bindings for each file format.
Data Struct: C, C++, Python, Java, Go, Rust, ...
Binary Data: PNG, ICMP, MP3, TCP, CAN, ZIP, ...
N x N bindings
9. 9
Kaitai Struct benefits
➔ Documented and supported: https://doc.kaitai.io/
➔ Bindings for a bunch of languages: C++/STL, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby
➔ Awesome tools for debugging and visualization: https://github.com/kaitai-io/awesome-kaitai
➔ Cross-platform
➔ Open-source: https://github.com/kaitai-io
➔ Modern and sexy
➔ Parsing and constructing code is already tested.
➔ Most common formats are already described (from fonts to filesystems): https://formats.kaitai.io/
10. 10
KSY file format
● meta: Contains metadata about the target binary format being parsed, such as its identifier or the default endianness.
● seq: Describes an ordered sequence of elements (attributes), giving each element's identifier, type, and size (or literal contents, e.g., magic numbers).
● enums: Maps integer constants to symbolic names for clarity; an attribute can then reference such a mapping using the enum key.
● types: Declares user-defined named types, each of which can contain any of the elements above, including further nested types.
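The way these four sections fit together can be sketched in Python. This is a toy model under stated assumptions, not the Kaitai Struct compiler or runtime: a real .ksy file is YAML that the compiler turns into source code, whereas here a dict mirrors the same sections and a hypothetical `parse` helper interprets it directly. The format `demo_packet`, its fields, and the enum values are invented for illustration.

```python
# Dict mirroring the sections of a .ksy description (hypothetical format).
SPEC = {
    "meta": {"id": "demo_packet", "endian": "be"},
    "seq": [
        {"id": "magic", "contents": b"DM"},
        {"id": "kind", "type": "u1", "enum": "packet_kind"},
        {"id": "body", "type": "point"},
    ],
    "enums": {
        "packet_kind": {1: "ping", 2: "data"},
    },
    "types": {
        "point": {
            "seq": [
                {"id": "x", "type": "u2"},
                {"id": "y", "type": "u2"},
            ],
        },
    },
}

INT_SIZES = {"u1": 1, "u2": 2, "u4": 4}


def parse(spec, data, seq=None, pos=0):
    """Toy interpreter: returns (fields, position after the last field)."""
    byteorder = "big" if spec["meta"]["endian"] == "be" else "little"
    fields = {}
    for attr in (spec["seq"] if seq is None else seq):
        if "contents" in attr:
            # Literal contents (magic number): verify and skip.
            expected = attr["contents"]
            if data[pos:pos + len(expected)] != expected:
                raise ValueError("unexpected contents for " + attr["id"])
            pos += len(expected)
            continue
        type_name = attr["type"]
        if type_name in INT_SIZES:
            # Built-in unsigned integer, using the default endianness from meta.
            size = INT_SIZES[type_name]
            value = int.from_bytes(data[pos:pos + size], byteorder)
            pos += size
            if "enum" in attr:
                # Replace the integer with its symbolic name from enums.
                value = spec["enums"][attr["enum"]][value]
        else:
            # User-defined type: parse its own seq recursively.
            value, pos = parse(spec, data, spec["types"][type_name]["seq"], pos)
        fields[attr["id"]] = value
    return fields, pos


packet = b"DM" + bytes([2]) + (10).to_bytes(2, "big") + (20).to_bytes(2, "big")
fields, consumed = parse(SPEC, packet)
print(fields, consumed)
```

Here `meta` supplies the byte order, `seq` drives the read order, `enums` turns the raw `kind` byte into a name, and `types` lets `body` reuse a nested definition.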
22. 22
Summary
Benefits:
▪ Open-source
▪ Great tooling
▪ A lot of bindings
▪ Many binary formats already implemented
Drawbacks:
▪ The compiler is implemented in Scala and doesn't always work out of the box
▪ Current version is 0.8 (not stable enough)
▪ Code generation for some languages requires workarounds