Story: writing a byte serializer in Go
@dxhuy
@huydx
Byte serializer
Problem space
• Writing buffer-control code that needs to persist struct data to disk
• The struct data is simple (and will not change in the near future)
• The program needs
  • Low memory footprint
  • Low CPU usage
A bunch of options
• encoding/gob (based on encoding/binary)
• gogoprotobuf
• capnproto (glycerine/go-capnproto)
• ugorji/go/codec
• mgo.v2/bson
• .....
Some problems
• Some are overly complex
  • Cryptic error messages
• Some are fast, but do not support every data structure (map)
  • flatbuffer (could use a vector instead, but lookup is not O(1))
• All libraries add some abstraction, which makes them hard to debug
  • A write to disk fails midway: some bytes are written, some are not
• Using a library means giving up fine-grained control
  • You want special behaviour for some special fields
  • You want special behaviour on failure
So let's write our own
Serializer anatomy
type A struct {
	Name     string
	BirthDay time.Time
	Phone    string
	Siblings int
	Spouse   bool
	Money    float64
}
Example struct
Struct layout matters: size + order of the fields
In general, there are 2 types
• Dynamic layout: just pass the struct, and the serializer does everything for you
  • encoding/gob, encoding/json
  • The library has to figure out "what type" first, then serialize
• Fixed layout: you have to tell the serializer about your struct first
  • protobuf, capnproto, messagepack...
  • The library already knows the type, and just uses codegen to serialize
Dynamic layout vs. fixed layout
Advantages
• Dynamic layout: easy to use; nested structs are easy to support; no additional step
• Fixed layout: easy to optimize; manageable protocol file (.proto or .flatbuffer)
Disadvantages
• Dynamic layout: harder to optimize; needs reflection (performance penalty)
• Fixed layout: needs code generation
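To make the trade-off concrete, here is a minimal sketch that is not taken from any of the libraries above. `Point`, `encodeDynamic` and `encodeFixed` are hypothetical names: the first function discovers the fields through reflection at run time, the second hard-codes them.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"reflect"
)

// Point is a hypothetical struct used only for this illustration.
type Point struct {
	X uint32
	Y uint32
}

// encodeDynamic inspects the value at run time, the way a
// dynamic-layout serializer has to. Only uint32 fields are handled here.
func encodeDynamic(v interface{}) []byte {
	rv := reflect.ValueOf(v)
	out := make([]byte, 0, 4*rv.NumField())
	var tmp [4]byte
	for i := 0; i < rv.NumField(); i++ {
		f := rv.Field(i)
		if f.Kind() == reflect.Uint32 {
			binary.LittleEndian.PutUint32(tmp[:], uint32(f.Uint()))
			out = append(out, tmp[:]...)
		}
	}
	return out
}

// encodeFixed knows the layout in advance, so no type inspection is needed.
func encodeFixed(p Point) []byte {
	out := make([]byte, 8)
	binary.LittleEndian.PutUint32(out[0:4], p.X)
	binary.LittleEndian.PutUint32(out[4:8], p.Y)
	return out
}

func main() {
	p := Point{X: 1, Y: 2}
	fmt.Println(encodeDynamic(p))
	fmt.Println(encodeFixed(p))
}
```

Both functions produce the same bytes; the difference is where the knowledge of the layout lives.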
What should we use
• What we have
  • A fixed protocol
  • A need for low memory footprint / low CPU usage
• So I decided on a serialization method that
  • Has a fixed layout, defined in code
  • But uses no codegen
func MarshalRaw(a *A, buf *bytes.Buffer) {
	// Fields are written one after another, in declaration order.
	encodeString(a.Name, buf)
	encodeUint64(uint64(a.BirthDay.UnixNano()), buf)
	encodeString(a.Phone, buf)
	encodeUint64(uint64(a.Siblings), buf)
	encodeBool(a.Spouse, buf)
	encodeFloat64(a.Money, buf)
}
The order of your struct fields is fixed in code
Name
Birthday
Phone
...
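Because the order lives only in code, the decoder has to read the fields back in exactly the same sequence. A minimal sketch of that round trip, assuming a hypothetical `Fixed` struct that keeps only the fixed-size fields of the example (the helper names are illustrative, not the author's):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Fixed is a hypothetical subset of the example struct with fixed-size fields only.
type Fixed struct {
	Siblings uint64
	Spouse   bool
	Money    float64
}

// fixedSize is 8 bytes (Siblings) + 1 byte (Spouse) + 8 bytes (Money).
const fixedSize = 8 + 1 + 8

// marshalFixed writes the fields in declaration order.
func marshalFixed(f Fixed) []byte {
	out := make([]byte, fixedSize)
	binary.LittleEndian.PutUint64(out[0:8], f.Siblings)
	if f.Spouse {
		out[8] = 1
	}
	binary.LittleEndian.PutUint64(out[9:17], math.Float64bits(f.Money))
	return out
}

// unmarshalFixed reads the fields back from the same offsets.
func unmarshalFixed(b []byte) (Fixed, error) {
	if len(b) < fixedSize {
		return Fixed{}, errors.New("short buffer")
	}
	return Fixed{
		Siblings: binary.LittleEndian.Uint64(b[0:8]),
		Spouse:   b[8] != 0,
		Money:    math.Float64frombits(binary.LittleEndian.Uint64(b[9:17])),
	}, nil
}

func main() {
	in := Fixed{Siblings: 2, Spouse: true, Money: 12.5}
	out, err := unmarshalFixed(marshalFixed(in))
	fmt.Println(out == in, err)
}
```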
Implementation
First try
• Use encoding/binary to convert each value to a byte array
• Write the byte array to the buffer
• For dynamically sized data (vector, map...)
  • Write the size first as an int, then write the payload
  • When decoding, read the size first, then read the payload
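The size-then-payload rule also covers maps, which some of the fast libraries do not support. The following is a rough sketch under my own assumptions: sizes are written as `uint32`, keys are sorted so the output is deterministic, and `encodeMap` / `decodeMap` are hypothetical helpers, not part of the talk.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sort"
)

func writeUint32(w *bytes.Buffer, v uint32) {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], v)
	w.Write(b[:])
}

func readUint32(r io.Reader) (uint32, error) {
	var b [4]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(b[:]), nil
}

// encodeMap writes the entry count first, then for each entry the key
// length, the key bytes and the value.
func encodeMap(m map[string]uint32) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output

	var buf bytes.Buffer
	writeUint32(&buf, uint32(len(m)))
	for _, k := range keys {
		writeUint32(&buf, uint32(len(k)))
		buf.WriteString(k)
		writeUint32(&buf, m[k])
	}
	return buf.Bytes()
}

// decodeMap reads the count first, then that many entries.
func decodeMap(b []byte) (map[string]uint32, error) {
	r := bytes.NewReader(b)
	n, err := readUint32(r)
	if err != nil {
		return nil, err
	}
	m := make(map[string]uint32)
	for i := uint32(0); i < n; i++ {
		klen, err := readUint32(r)
		if err != nil {
			return nil, err
		}
		// Do not allocate more than the input can actually contain.
		if int64(klen) > int64(r.Len()) {
			return nil, io.ErrUnexpectedEOF
		}
		key := make([]byte, klen)
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		v, err := readUint32(r)
		if err != nil {
			return nil, err
		}
		m[string(key)] = v
	}
	return m, nil
}

func main() {
	enc := encodeMap(map[string]uint32{"bb": 2, "a": 1})
	dec, err := decodeMap(enc)
	fmt.Println(len(enc), dec, err)
}
```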
uint64
func encodeUint64(v uint64, w io.Writer) error {
	b := [64 / 8]byte{}
	binary.LittleEndian.PutUint64(b[:], v)
	_, err := w.Write(b[:])
	return err
}
func decodeUint64(r io.Reader) (uint64, error) {
	var l uint64
	err := binary.Read(r, binary.LittleEndian, &l)
	if err != nil {
		return 0, err
	}
	return l, nil
}
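string
func encodeString(v string, w io.Writer) error {
	l := len(v)
	if l > math.MaxUint16 {
		// The length prefix is a uint16, so longer strings cannot be represented.
		return fmt.Errorf("string too long: %d bytes", l)
	}
	err := encodeUint16(uint16(l), w)
	if err != nil {
		return err
	}
	_, err = w.Write([]byte(v))
	return err
}
func decodeString(r io.Reader) (string, error) {
	l, err := decodeUint16(r)
	if err != nil {
		return "", err
	}
	b := make([]byte, l)
	// io.ReadFull keeps reading until b is full; a single r.Read may return fewer bytes.
	if _, err := io.ReadFull(r, b); err != nil {
		return "", err
	}
	return string(b), nil
}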
float64
func encodeFloat64(v float64, w io.Writer) error {
	b := [64 / 8]byte{}
	bs := math.Float64bits(v)
	binary.LittleEndian.PutUint64(b[:], bs)
	_, err := w.Write(b[:])
	return err
}
func decodeFloat64(r io.Reader) (float64, error) {
	var l uint64
	err := binary.Read(r, binary.LittleEndian, &l)
	if err != nil {
		return 0, err
	}
	return math.Float64frombits(l), nil
}
It's so simple
And does nothing special
It must be fast!
Let's benchmark
• Using
  • https://github.com/alecthomas/go_serialization_benchmarks
• Add our own serialization method and compare it with the others
  • Call it `raw`
• Let's see the result
BenchmarkMsgpMarshal-8 10000000 161 ns/op 128 B/op 1 allocs/op
BenchmarkMsgpUnmarshal-8 5000000 307 ns/op 112 B/op 3 allocs/op
BenchmarkVmihailencoMsgpackMarshal-8 1000000 1840 ns/op 368 B/op 6 allocs/op
BenchmarkVmihailencoMsgpackUnmarshal-8 1000000 1874 ns/op 384 B/op 13 allocs/op
BenchmarkRawMarshaller-8 2000000 826 ns/op 384 B/op 13 allocs/op
BenchmarkRawUnmarshaller-8 2000000 710 ns/op 338 B/op 17 allocs/op
BenchmarkJsonMarshal-8 500000 2804 ns/op 1232 B/op 10 allocs/op
BenchmarkJsonUnmarshal-8 500000 2999 ns/op 464 B/op 7 allocs/op
BenchmarkEasyJsonMarshal-8 1000000 1223 ns/op 784 B/op 5 allocs/op
BenchmarkEasyJsonUnmarshal-8 1000000 1351 ns/op 160 B/op 4 allocs/op
BenchmarkBsonMarshal-8 1000000 1405 ns/op 392 B/op 10 allocs/op
BenchmarkBsonUnmarshal-8 1000000 1869 ns/op 248 B/op 21 allocs/op
BenchmarkGobMarshal-8 2000000 903 ns/op 48 B/op 2 allocs/op
BenchmarkGobUnmarshal-8 2000000 913 ns/op 112 B/op 3 allocs/op
BenchmarkXdrMarshal-8 1000000 1553 ns/op 456 B/op 21 allocs/op
BenchmarkXdrUnmarshal-8 1000000 1392 ns/op 240 B/op 11 allocs/op
BenchmarkUgorjiCodecMsgpackMarshal-8 1000000 2190 ns/op 2753 B/op 8 allocs/op
BenchmarkUgorjiCodecMsgpackUnmarshal-8 500000 2207 ns/op 3008 B/op 6 allocs/op
BenchmarkUgorjiCodecBincMarshal-8 1000000 2070 ns/op 2785 B/op 8 allocs/op
BenchmarkUgorjiCodecBincUnmarshal-8 500000 2386 ns/op 3168 B/op 9 allocs/op
BenchmarkSerealMarshal-8 500000 2563 ns/op 912 B/op 21 allocs/op
BenchmarkSerealUnmarshal-8 500000 3068 ns/op 1008 B/op 34 allocs/op
BenchmarkBinaryMarshal-8 1000000 1221 ns/op 256 B/op 16 allocs/op
BenchmarkBinaryUnmarshal-8 1000000 1389 ns/op 335 B/op 22 allocs/op
BenchmarkFlatBuffersMarshal-8 5000000 345 ns/op 0 B/op 0 allocs/op
BenchmarkFlatBuffersUnmarshal-8 5000000 259 ns/op 112 B/op 3 allocs/op
BenchmarkCapNProtoMarshal-8 3000000 423 ns/op 56 B/op 2 allocs/op
BenchmarkCapNProtoUnmarshal-8 5000000 384 ns/op 200 B/op 6 allocs/op
BenchmarkCapNProto2Marshal-8 2000000 695 ns/op 244 B/op 3 allocs/op
BenchmarkCapNProto2Unmarshal-8 2000000 859 ns/op 320 B/op 6 allocs/op
BenchmarkHproseMarshal-8 1000000 1033 ns/op 479 B/op 8 allocs/op
BenchmarkHproseUnmarshal-8 2000000 1028 ns/op 319 B/op 10 allocs/op
BenchmarkProtobufMarshal-8 2000000 885 ns/op 200 B/op 7 allocs/op
BenchmarkProtobufUnmarshal-8 2000000 641 ns/op 192 B/op 10 allocs/op
BenchmarkGoprotobufMarshal-8 3000000 447 ns/op 312 B/op 4 allocs/op
BenchmarkGoprotobufUnmarshal-8 3000000 592 ns/op 432 B/op 9 allocs/op
BenchmarkGogoprotobufMarshal-8 10000000 131 ns/op 64 B/op 1 allocs/op
BenchmarkGogoprotobufUnmarshal-8 10000000 222 ns/op 96 B/op 3 allocs/op
BenchmarkColferMarshal-8 10000000 123 ns/op 64 B/op 1 allocs/op
BenchmarkColferUnmarshal-8 10000000 181 ns/op 112 B/op 3 allocs/op
BenchmarkGencodeMarshal-8 10000000 153 ns/op 80 B/op 2 allocs/op
BenchmarkGencodeUnmarshal-8 10000000 172 ns/op 112 B/op 3 allocs/op
BenchmarkGencodeUnsafeMarshal-8 20000000 98.2 ns/op 48 B/op 1 allocs/op
BenchmarkGencodeUnsafeUnmarshal-8 10000000 142 ns/op 96 B/op 3 allocs/op
BenchmarkXDR2Marshal-8 10000000 151 ns/op 64 B/op 1 allocs/op
BenchmarkXDR2Unmarshal-8 10000000 145 ns/op 32 B/op 2 allocs/op
BenchmarkGoAvroMarshal-8 500000 2291 ns/op 1032 B/op 33 allocs/op
BenchmarkGoAvroUnmarshal-8 300000 5388 ns/op 3440 B/op 89 allocs/op
Not bad
But several times slower than BenchmarkGencode (826 ns/op versus 153 ns/op, or 98.2 ns/op for the unsafe variant)
What did we do wrong?
Slow pattern
• Use GODEBUG=allocfreetrace=1 to find redundant allocation patterns
func encodeUint64(v uint64, w io.Writer) error {
	b := [64 / 8]byte{}
	binary.LittleEndian.PutUint64(b[:], v)
	// b is handed to an io.Writer, so it may escape to the heap on each call.
	_, err := w.Write(b[:])
	return err
}
func encodeString(v string, w io.Writer) error {
	l := len(v)
	err := encodeUint16(uint16(l), w)
	if err != nil {
		return err
	}
	// []byte(v) typically copies the string into a newly allocated slice
	// (see rawbyteslice from the Go runtime below).
	_, err = w.Write([]byte(v))
	return err
}
func rawbyteslice(size int) (b []byte) {
	cap := roundupsize(uintptr(size))
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}
	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}
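Besides the GODEBUG trace, allocations can also be counted directly. The sketch below is only an illustration with hypothetical helper names: it uses `testing.AllocsPerRun` from the standard library to compare a writer-based encoder with one that appends into a reused slice. The writer-based count depends on the compiler's escape analysis, so only the append path is expected to report zero.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"testing"
)

// encodeViaWriter mirrors the first try: every value goes through io.Writer.
func encodeViaWriter(v uint64, s string, w io.Writer) {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	w.Write(b[:])
	w.Write([]byte(s)) // the string-to-[]byte conversion may allocate
}

// encodeAppend writes into a slice supplied by the caller instead.
func encodeAppend(dst []byte, v uint64, s string) []byte {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	dst = append(dst, b[:]...)
	return append(dst, s...) // append accepts a string source directly
}

const payload = "a reasonably long string payload for the test"

// allocsWriter reports the average allocations per call of the writer path.
func allocsWriter() float64 {
	var w bytes.Buffer
	return testing.AllocsPerRun(1000, func() {
		w.Reset()
		encodeViaWriter(42, payload, &w)
	})
}

// allocsAppend reports the average allocations per call of the append path.
func allocsAppend() float64 {
	buf := make([]byte, 0, 256) // allocated once, reused on every run
	return testing.AllocsPerRun(1000, func() {
		buf = encodeAppend(buf[:0], 42, payload)
	})
}

func main() {
	fmt.Println("writer path allocs/op:", allocsWriter())
	fmt.Println("append path allocs/op:", allocsAppend())
}
```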
Slow pattern
• Took a look at some fast serializers
  • They just copy bytes around, with no allocation
• In our case we write to the file right after encoding, so we do not need a buffer per serialization; one global buffer is enough
Second try
• Prepare a global buffer
  • Grow it if needed
  • Clear the buffer on each run
• Just copy bytes around, no more allocation
var bufferByte = make([]byte, DEFAULT_BUFFER_CAP)
func (rs Raw2Serializer) Marshal(o interface{}) []byte {
	a := o.(*A)
	cleanBuffer()
	// Copy from a into bufferByte; idx tracks the next write position.
	idx := 0
	idx += WriteString(idx, a.Name)
	idx += WriteUint64(idx, uint64(a.BirthDay.UnixNano()))
	idx += WriteString(idx, a.Phone)
	idx += WriteUint64(idx, uint64(a.Siblings))
	idx += WriteBool(idx, a.Spouse)
	idx += WriteFloat64(idx, a.Money)
	// The result aliases the global buffer and is only valid until the next Marshal call.
	return bufferByte[0:idx]
}
A small difference: we need an index to know where to copy next, and we need to clean the buffer on each run
func WriteUint64(idx int, n uint64) int {
	if (idx + 8) > currentCap {
		growBufferIfneeded()
	}
	// Little-endian: byte i of n goes to offset idx+i.
	for i := 0; i < 8; i++ {
		bufferByte[idx+i] = byte(n >> (uint(i) * 8))
	}
	return 8
}
func WriteString(idx int, s string) int {
	l := len(s)
	// Room is needed for the 8-byte length prefix plus the payload.
	if (idx + 8 + l) > currentCap {
		growBufferIfneeded()
	}
	n := WriteUint64(idx, uint64(l))
	// NOTE: copy works without conversion
	copy(bufferByte[idx+n:idx+n+l], s)
	return l + n
}
Benchmark again
BenchmarkMsgpMarshal-8 10000000 158 ns/op 128 B/op 1 allocs/op
BenchmarkMsgpUnmarshal-8 5000000 335 ns/op 112 B/op 3 allocs/op
BenchmarkVmihailencoMsgpackMarshal-8 1000000 1764 ns/op 368 B/op 6 allocs/op
BenchmarkVmihailencoMsgpackUnmarshal-8 1000000 1779 ns/op 384 B/op 13 allocs/op
BenchmarkRawMarshaller-8 2000000 806 ns/op 384 B/op 13 allocs/op
BenchmarkRawUnmarshaller-8 2000000 706 ns/op 338 B/op 17 allocs/op
BenchmarkRaw2Marshaller-8 20000000 83.6 ns/op 0 B/op 0 allocs/op
10 times faster!
As fast as the fastest, andyleap/gencode
What I learned
• Hidden allocations reduce performance
• Pitfalls of serializing to a file
  • Need a thread-safe implementation to prevent a dirty file
  • Need versioning (write the version first, then the payload) for backward compatibility
• Checksums matter
  • You can calculate the checksum directly from the struct, no need to calculate it from the bytes
  • Use fnv to hash all fields and add the hashes together, instead of using CRC32 over the whole byte array
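The versioning and checksum points can be combined in one record frame. What follows is a sketch under my own assumptions rather than the author's on-disk format: a one-byte version, a `uint32` length prefix, and a trailing FNV-1a checksum computed per field and summed. `Record`, `formatVersion` and the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
)

// formatVersion is written first so that a reader can reject or adapt to older layouts.
const formatVersion = 1

type Record struct {
	Name     string
	Siblings uint64
}

// checksum hashes each field with FNV-1a and adds the hashes together,
// so it can be computed from the struct without the serialized bytes.
func checksum(r Record) uint64 {
	h := fnv.New64a()
	h.Write([]byte(r.Name))
	sum := h.Sum64()
	h.Reset()
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], r.Siblings)
	h.Write(b[:])
	return sum + h.Sum64()
}

func appendUint64(dst []byte, v uint64) []byte {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	return append(dst, b[:]...)
}

// encode lays the record out as: version | name length | name | siblings | checksum.
func encode(r Record) []byte {
	out := make([]byte, 0, 1+4+len(r.Name)+8+8)
	out = append(out, formatVersion)
	var l [4]byte
	binary.LittleEndian.PutUint32(l[:], uint32(len(r.Name)))
	out = append(out, l[:]...)
	out = append(out, r.Name...)
	out = appendUint64(out, r.Siblings)
	return appendUint64(out, checksum(r))
}

// decode checks the version, the total length and the checksum.
func decode(b []byte) (Record, error) {
	if len(b) < 1+4+8+8 {
		return Record{}, errors.New("record too short")
	}
	if b[0] != formatVersion {
		return Record{}, errors.New("unsupported version")
	}
	n := uint64(binary.LittleEndian.Uint32(b[1:5]))
	if uint64(len(b)) != 1+4+n+8+8 {
		return Record{}, errors.New("length mismatch")
	}
	end := 5 + int(n)
	r := Record{
		Name:     string(b[5:end]),
		Siblings: binary.LittleEndian.Uint64(b[end : end+8]),
	}
	if binary.LittleEndian.Uint64(b[end+8:]) != checksum(r) {
		return Record{}, errors.New("checksum mismatch")
	}
	return r, nil
}

func main() {
	enc := encode(Record{Name: "Ann", Siblings: 2})
	fmt.Println(decode(enc))
}
```

One trade-off of adding the per-field hashes: the sum does not depend on field order, so two fields whose byte representations are swapped would produce the same checksum.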
Interesting techniques of other serialization methods
varint (protobuf)
• Available in a lot of software (protobuf, SQLite, WebAssembly (LEB128, as in LLVM), Go's encoding/binary)
• Compresses positive integers (a negative number in two's complement takes more bits)
• Idea:
  • Most integers in our app are small ("not very big")
  • Use as few bits as possible
  • 7 bits per byte, with the MSB as the "continuation bit"
• Cons:
  • CPU cost
  • Decoding is a bit complex
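varint (protobuf)
t := uint64(l)
for t >= 0x80 {
	// Low 7 bits plus the continuation bit. The constant 8 appears to be a
	// fixed offset into buf; the snippet does not show where it comes from.
	buf[i+8] = byte(t) | 0x80
	t >>= 7
	i++
}
buf[i+8] = byte(t) // last byte, continuation bit clear
i++
Many variants exist (group varint encoding, prefix varint encoding, etc.)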
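As a self-contained illustration of the 7-bits-per-byte idea, the sketch below writes a hand-rolled varint, including the final byte that carries no continuation bit, and compares it with `binary.PutUvarint` and `binary.Uvarint` from encoding/binary. The function name `putUvarint` is mine.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putUvarint writes v into buf using 7 payload bits per byte and returns
// the number of bytes written. buf must hold at least 10 bytes.
func putUvarint(buf []byte, v uint64) int {
	i := 0
	for v >= 0x80 {
		buf[i] = byte(v) | 0x80 // more bytes follow
		v >>= 7
		i++
	}
	buf[i] = byte(v) // last byte: continuation bit is clear
	return i + 1
}

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	std := make([]byte, binary.MaxVarintLen64)
	for _, v := range []uint64{1, 127, 128, 300, 1 << 32} {
		n := putUvarint(buf, v)
		m := binary.PutUvarint(std, v)
		decoded, _ := binary.Uvarint(buf[:n])
		fmt.Printf("%d: % x (%d bytes, stdlib %d bytes, decoded %d)\n",
			v, buf[:n], n, m, decoded)
	}
}
```

Small values fit in a single byte, which is where the space saving over a fixed 8-byte encoding comes from.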
zigzag encoding (protobuf)
• varint is only compact for positive numbers
• Zigzag encoding interleaves negative and positive numbers, so that values with a small absolute value get a small encoding
original → encoded
0 → 0
-1 → 1
1 → 2
-2 → 3
2147483647 → 4294967294
-2147483648 → 4294967295
zigzag = (n << 1) ^ (n >> (BIT_WIDTH - 1))
Remember that an arithmetic shift replicates the sign bit:
(n >> (BIT_WIDTH - 1)) -> 11111...1 for negative n
(n >> (BIT_WIDTH - 1)) -> 00000...0 for positive n
So when XORed with a negative n, most of the leading 1 bits are eliminated
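The formula can be checked against the table with a few lines of Go. This is a minimal sketch for BIT_WIDTH = 32; `zigzagEncode` and `zigzagDecode` are illustrative names, not protobuf API.

```go
package main

import "fmt"

// zigzagEncode maps a signed 32-bit integer to an unsigned one so that
// values close to zero, positive or negative, become small numbers.
// n >> 31 is an arithmetic shift: all ones for negative n, all zeros otherwise.
func zigzagEncode(n int32) uint32 {
	return uint32((n << 1) ^ (n >> 31))
}

// zigzagDecode is the inverse mapping: the lowest bit carries the sign.
func zigzagDecode(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, n := range []int32{0, -1, 1, -2, 2147483647, -2147483648} {
		e := zigzagEncode(n)
		fmt.Println(n, "->", e, "->", zigzagDecode(e))
	}
}
```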
float reverse (gob)
// floatBits returns a uint64 holding the bits of a floating-point number.
// Floating-point numbers are transmitted as uint64s holding the bits
// of the underlying representation. They are sent byte-reversed, with
// the exponent end coming out first, so integer floating point numbers
// (for example) transmit more compactly. This routine does the
// swizzling.
func floatBits(f float64) uint64 {
	u := math.Float64bits(f)
	var v uint64
	for i := 0; i < 8; i++ {
		v <<= 8
		v |= u & 0xFF
		u >>= 8
	}
	return v
}
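To see why the reversal helps, here is a small sketch that uses `bits.ReverseBytes64` from math/bits as an equivalent of the loop above and counts how many bytes remain once leading zero bytes are dropped. The helper names are mine, and the byte count is only an approximation of what a variable-length integer encoding would then need.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// reversedFloatBits returns the IEEE 754 bits of f with the byte order reversed.
func reversedFloatBits(f float64) uint64 {
	return bits.ReverseBytes64(math.Float64bits(f))
}

// significantBytes reports how many bytes are left after dropping leading zero bytes.
func significantBytes(u uint64) int {
	return (bits.Len64(u) + 7) / 8
}

func main() {
	for _, f := range []float64{1, 17, 1024, 0.1} {
		raw := math.Float64bits(f)
		rev := reversedFloatBits(f)
		fmt.Printf("%v: raw %#016x (%d bytes), reversed %#016x (%d bytes)\n",
			f, raw, significantBytes(raw), rev, significantBytes(rev))
	}
}
```

Floats holding small integers have mostly zero mantissa bytes; after the reversal those zeros sit at the high end, where a variable-length encoding can skip them. A value such as 0.1 has a dense mantissa and gains nothing.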
unsafe (andyleap/gencode)
v := *(*uint64)(unsafe.Pointer(&(d.Height)))
Unmarshal a number without copying or allocating
You could use the same technique for strings too
http://qiita.com/mattn/items/176459728ff4f854b165
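A minimal sketch of both ideas, with names of my own choosing: reinterpreting a float64 as a uint64 through `unsafe.Pointer`, and viewing a byte slice as a string with `unsafe.String`, which assumes Go 1.20 or later. The string shares memory with the slice, so the slice must not be modified afterwards.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// float64BitsUnsafe reinterprets the memory of f as a uint64 without
// any conversion, the same trick as in the line above.
func float64BitsUnsafe(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

// bytesToString views b as a string without copying (Go 1.20+).
// The caller must not modify b while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

func main() {
	fmt.Println(float64BitsUnsafe(1.5) == math.Float64bits(1.5))
	fmt.Println(bytesToString([]byte("hello")))
}
```

The price of skipping the copy is that the compiler no longer protects the invariants: the string is immutable only as long as nobody writes to the underlying bytes.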
Finally
• Writing your own serializer is not hard, and it is fun
• You can learn a lot from existing methods
• There are tons of techniques that can be used to enhance performance
• When there are not many preferences, use a fixed-layout serializer
  • Version-controlled proto file
  • High performance

Story Writing Byte Serializer in Golang

  • 1. Story writing byte Serializer in go @dxhuy @huydx
  • 3. Problem space • Writting buffer control that need to persist struct data in disk • Struct data is simple (will not change in near future) • Program needs • Low memory footprint • Low CPU usage
  • 4. Bunches of options • encoding/gob (base on encoding/binary) • gogoprotobuf • capnproto (glycerine/go-capnproto) • ugorji/go/codec • mgo.v2/bson • .....
  • 5. Some problems • Some are overcomplex • Cryptic error message • Some are fast, but not support all datastructure (map) • flatbuffer (could use vector instead, but look up is not O(1)) • All libraries does some abstraction, make it hard to debug • Write to disk failed at the middle, some bytes are written, some are not • Using library lack of fine-grained control • You want some special behaviours for some special field • You want some special behaviours when it failed
  • 6. So let's write your own
  • 8. type A struct { Name string BirthDay time.Time Phone string Siblings int Spouse bool Money float64 } Example struct
  • 9. Struct layout matter type A struct { Name string BirthDay time.Time Phone string Siblings int Spouse bool Money float64 } size + order
  • 10. In general, there 2 types • Dynamic layout: just pass struct, serializer will do everything for you • encoding/gob, encoding/json • Library have to figure out "what type" first, than serialize later • Fix layout: you have to tell serializer about your struct first • protobuf, capnproto, messagepack... • Library already know type, just using code- gen to serialize
  • 11. Dynamic layout Fix layout Advantages - Easy to use - Easy support nested struct... - No additional step - Easy to optimize - Managable protocol file (.proto or .flatbuffer) Disadvantages - Harder to optimize - Need reflection (performance downgrade) - Needs code generation
  • 12. What should we use • What we have • Fix protocol • Need low memory footprint / low CPU usage • So I decided to have serialization method which is • Fix layout in code • But without codegen
  • 13. func MarshalRaw(a *A, buf *bytes.Buffer) { encodeString(a.Name, buf) encodeUint64(uint64(a.BirthDay.UnixNano()), buf) encodeString(a.Phone, buf) encodeUint64(uint64(a.Siblings), buf) encodeBool(a.Spouse, buf) encodeFloat64(a.Money, buf) } Your struct field order is fixed in code Name Birthday Phone ...
  • 15. First try • Using encoding/binary to convert type to byte array • Write byte array to buffer • For dynamic size struct (vector, map..) • Write size first as int and than write payload • When decode, read size first, and than read payload
  • 16. uint64 func encodeUint64(v uint64, w io.Writer) error { b := [64 / 8]byte{} binary.LittleEndian.PutUint64(b[:], v) _, err := w.Write(b[:]) return err } func decodeUint64(r io.Reader) (uint64, error) { var l uint64 err := binary.Read(r, binary.LittleEndian, &l) if err != nil { return 0, err } return l, nil }
  • 17. string func encodeString(v string, w io.Writer) error { l := len(v) err := encodeUint16(uint16(l), w) if err != nil { return err } _, err = w.Write([]byte(v)) return err } func decodeString(r io.Reader) (string, error) { l, err := decodeUint16(r) if err != nil { return "", err } b := make([]byte, l) _, err = r.Read(b) return string(b), err }
  • 18. float64 func encodeFloat64(v float64, w io.Writer) error { var zeroByte byte b := [64 / 8]byte{} bs := math.Float64Bits(v) binary.LittleEndian.PutUint64(b[:], bs) _, err := w.Write(b[:]) return err } func decodeFloat64(r io.Reader) (float64, error) { var l uint64 err := binary.Read(r, binary.LittleEndian, &l) if err != nil { return 0, nil } return math.Float64fromBits(l), nil }
  • 19. It's so simple And does nothing special It must be fast!
  • 20. Let's benchmark • Using • https://github.com/alecthomas/go_serialization_benchmarks • Add our own serialization method and compare with another • Call it `raw` • Let's see result
  • 21. BenchmarkMsgpMarshal-8 10000000 161 ns/op 128 B/op 1 allocs/op BenchmarkMsgpUnmarshal-8 5000000 307 ns/op 112 B/op 3 allocs/op BenchmarkVmihailencoMsgpackMarshal-8 1000000 1840 ns/op 368 B/op 6 allocs/op BenchmarkVmihailencoMsgpackUnmarshal-8 1000000 1874 ns/op 384 B/op 13 allocs/op BenchmarkRawMarshaller-8 2000000 826 ns/op 384 B/op 13 allocs/op BenchmarkRawUnmarshaller-8 2000000 710 ns/op 338 B/op 17 allocs/op BenchmarkJsonMarshal-8 500000 2804 ns/op 1232 B/op 10 allocs/op BenchmarkJsonUnmarshal-8 500000 2999 ns/op 464 B/op 7 allocs/op BenchmarkEasyJsonMarshal-8 1000000 1223 ns/op 784 B/op 5 allocs/op BenchmarkEasyJsonUnmarshal-8 1000000 1351 ns/op 160 B/op 4 allocs/op BenchmarkBsonMarshal-8 1000000 1405 ns/op 392 B/op 10 allocs/op BenchmarkBsonUnmarshal-8 1000000 1869 ns/op 248 B/op 21 allocs/op BenchmarkGobMarshal-8 2000000 903 ns/op 48 B/op 2 allocs/op BenchmarkGobUnmarshal-8 2000000 913 ns/op 112 B/op 3 allocs/op BenchmarkXdrMarshal-8 1000000 1553 ns/op 456 B/op 21 allocs/op BenchmarkXdrUnmarshal-8 1000000 1392 ns/op 240 B/op 11 allocs/op BenchmarkUgorjiCodecMsgpackMarshal-8 1000000 2190 ns/op 2753 B/op 8 allocs/op BenchmarkUgorjiCodecMsgpackUnmarshal-8 500000 2207 ns/op 3008 B/op 6 allocs/op BenchmarkUgorjiCodecBincMarshal-8 1000000 2070 ns/op 2785 B/op 8 allocs/op BenchmarkUgorjiCodecBincUnmarshal-8 500000 2386 ns/op 3168 B/op 9 allocs/op BenchmarkSerealMarshal-8 500000 2563 ns/op 912 B/op 21 allocs/op BenchmarkSerealUnmarshal-8 500000 3068 ns/op 1008 B/op 34 allocs/op BenchmarkBinaryMarshal-8 1000000 1221 ns/op 256 B/op 16 allocs/op BenchmarkBinaryUnmarshal-8 1000000 1389 ns/op 335 B/op 22 allocs/op BenchmarkFlatBuffersMarshal-8 5000000 345 ns/op 0 B/op 0 allocs/op BenchmarkFlatBuffersUnmarshal-8 5000000 259 ns/op 112 B/op 3 allocs/op BenchmarkCapNProtoMarshal-8 3000000 423 ns/op 56 B/op 2 allocs/op BenchmarkCapNProtoUnmarshal-8 5000000 384 ns/op 200 B/op 6 allocs/op BenchmarkCapNProto2Marshal-8 2000000 695 ns/op 244 B/op 3 allocs/op 
BenchmarkCapNProto2Unmarshal-8 2000000 859 ns/op 320 B/op 6 allocs/op BenchmarkHproseMarshal-8 1000000 1033 ns/op 479 B/op 8 allocs/op BenchmarkHproseUnmarshal-8 2000000 1028 ns/op 319 B/op 10 allocs/op BenchmarkProtobufMarshal-8 2000000 885 ns/op 200 B/op 7 allocs/op BenchmarkProtobufUnmarshal-8 2000000 641 ns/op 192 B/op 10 allocs/op BenchmarkGoprotobufMarshal-8 3000000 447 ns/op 312 B/op 4 allocs/op BenchmarkGoprotobufUnmarshal-8 3000000 592 ns/op 432 B/op 9 allocs/op BenchmarkGogoprotobufMarshal-8 10000000 131 ns/op 64 B/op 1 allocs/op BenchmarkGogoprotobufUnmarshal-8 10000000 222 ns/op 96 B/op 3 allocs/op BenchmarkColferMarshal-8 10000000 123 ns/op 64 B/op 1 allocs/op BenchmarkColferUnmarshal-8 10000000 181 ns/op 112 B/op 3 allocs/op BenchmarkGencodeMarshal-8 10000000 153 ns/op 80 B/op 2 allocs/op BenchmarkGencodeUnmarshal-8 10000000 172 ns/op 112 B/op 3 allocs/op BenchmarkGencodeUnsafeMarshal-8 20000000 98.2 ns/op 48 B/op 1 allocs/op BenchmarkGencodeUnsafeUnmarshal-8 10000000 142 ns/op 96 B/op 3 allocs/op BenchmarkXDR2Marshal-8 10000000 151 ns/op 64 B/op 1 allocs/op BenchmarkXDR2Unmarshal-8 10000000 145 ns/op 32 B/op 2 allocs/op BenchmarkGoAvroMarshal-8 500000 2291 ns/op 1032 B/op 33 allocs/op BenchmarkGoAvroUnmarshal-8 300000 5388 ns/op 3440 B/op 89 allocs/op Not bad But slow as 1/10 compare to
 BenchmarkGencode
  • 22. What we did wrong?
  • 23. Slow pattern • Use GODEBUG=allocfreetrace=1 to find redundant allocation pattern func encodeUint64(v uint64, w io.Writer) error { b := [64 / 8]byte{} binary.LittleEndian.PutUint64(b[:], v) _, err := w.Write(b[:]) return err } func encodeString(v string, w io.Writer) error { l := len(v) err := encodeUint16(uint16(l), w) if err != nil { return err } _, err = w.Write([]byte(v)) return err } func rawbyteslice(size int) (b []byte) { cap := roundupsize(uintptr(size)) p := mallocgc(cap, nil, false) if cap != uintptr(size) { memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size)) } *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)} return }
  • 24. Slow pattern • Took a look at some of the fast serializers • They just copy bytes around, with no allocation • In our case we write to the file right after encoding, so we do not need a buffer per serialization; one global buffer is enough
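As a complement to the GODEBUG trace, allocations can also be counted directly with `testing.AllocsPerRun` from the standard library. The sketch below is only an illustration, not the code from the talk: `encodeStringSlow`, `appendString`, and `sample` are hypothetical names, and the 2-byte length prefix is an assumption. It contrasts the `[]byte(v)` plus `io.Writer` pattern with copying into a reused buffer.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"testing"
)

// encodeStringSlow follows the slow pattern: every call converts the string
// with []byte(v) and hands the result to an io.Writer.
// The noinline directive keeps the call dynamic so the measurement is stable.
//
//go:noinline
func encodeStringSlow(v string, w io.Writer) error {
	var l [2]byte
	binary.LittleEndian.PutUint16(l[:], uint16(len(v)))
	if _, err := w.Write(l[:]); err != nil {
		return err
	}
	_, err := w.Write([]byte(v)) // copies the string into a fresh slice
	return err
}

// appendString copies a 2-byte little-endian length and the string bytes
// into dst, reusing dst's capacity when it is large enough.
func appendString(dst []byte, v string) []byte {
	dst = append(dst, byte(len(v)), byte(len(v)>>8))
	return append(dst, v...)
}

const sample = "a string long enough that it cannot hide in a small stack buffer"

// slowAllocs reports the average number of allocations per call of the slow pattern.
func slowAllocs() float64 {
	var out bytes.Buffer
	var w io.Writer = &out
	return testing.AllocsPerRun(100, func() {
		out.Reset()
		_ = encodeStringSlow(sample, w)
	})
}

// fastAllocs reports the same figure for the buffer-reuse pattern.
func fastAllocs() float64 {
	buf := make([]byte, 0, 256)
	return testing.AllocsPerRun(100, func() {
		buf = appendString(buf[:0], sample)
	})
}

func main() {
	fmt.Println("slow pattern allocs/op:", slowAllocs())
	fmt.Println("buffer reuse allocs/op:", fastAllocs())
}
```

The exact count for the slow pattern may vary between Go versions, so the useful signal is simply whether it is zero or not.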
  • 25. Second try • Prepare a global buffer • Grow it if needed • Clear the buffer on each run • Just copy bytes around, no more allocation
  • 26.
    var bufferByte = make([]byte, DEFAULT_BUFFER_CAP)

    func (rs Raw2Serializer) Marshal(o interface{}) []byte {
        a := o.(*A)
        cleanBuffer()
        idx := 0
        idx += WriteString(idx, a.Name)
        idx += WriteUint64(idx, uint64(a.BirthDay.UnixNano()))
        idx += WriteString(idx, a.Phone)
        idx += WriteUint64(idx, uint64(a.Siblings))
        idx += WriteBool(idx, a.Spouse)
        idx += WriteFloat64(idx, a.Money)
        // copy from a to bufferByte
        return bufferByte[0:idx]
    }
    Small difference: we need index control to know where to copy to, and we need to clean the buffer on each run
  • 27.
    func WriteUint64(idx int, n uint64) int {
        if (idx + 8) > currentCap {
            growBufferIfneeded()
        }
        // write the 8 bytes little-endian, starting at idx
        for i := 0; i < 8; i++ {
            bufferByte[idx+i] = byte(n >> (uint(i) * 8))
        }
        return 8
    }

    func WriteString(idx int, s string) int {
        l := len(s)
        // need room for the 8-byte length prefix plus the string bytes
        if (idx + 8 + l) > currentCap {
            growBufferIfneeded()
        }
        n := WriteUint64(idx, uint64(l))
        // NOTE: copy works without conversion
        copy(bufferByte[idx+n:idx+n+l], s)
        return l + n
    }
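The slides only show the write side. A matching read side for the same layout (an 8-byte little-endian length followed by the string bytes) might look like the sketch below. `readUint64`, `readString`, `appendString`, and `errShort` are hypothetical names, not taken from the talk, and the bounds checks are an addition so that a truncated file returns an error instead of panicking.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("buffer too short")

// readUint64 decodes 8 little-endian bytes at idx.
// It returns the value and the number of bytes consumed.
func readUint64(b []byte, idx int) (uint64, int, error) {
	if idx < 0 || idx+8 > len(b) {
		return 0, 0, errShort
	}
	return binary.LittleEndian.Uint64(b[idx:]), 8, nil
}

// readString decodes a length-prefixed string at idx.
// It returns the string and the number of bytes consumed.
func readString(b []byte, idx int) (string, int, error) {
	l, n, err := readUint64(b, idx)
	if err != nil {
		return "", 0, err
	}
	start := idx + n
	if l > uint64(len(b)-start) {
		return "", 0, errShort
	}
	end := start + int(l)
	// string(...) copies, so the result stays valid after the buffer is reused.
	return string(b[start:end]), n + int(l), nil
}

// appendString is a demo-only encoder that produces the layout read above.
func appendString(b []byte, s string) []byte {
	var l [8]byte
	binary.LittleEndian.PutUint64(l[:], uint64(len(s)))
	b = append(b, l[:]...)
	return append(b, s...)
}

func main() {
	b := appendString(nil, "huy")
	b = appendString(b, "080-0000")

	idx := 0
	name, n, err := readString(b, idx)
	if err != nil {
		panic(err)
	}
	idx += n
	phone, n, err := readString(b, idx)
	if err != nil {
		panic(err)
	}
	idx += n
	fmt.Println(name, phone, idx == len(b))
}
```

The reader uses the same index convention as the writer: each function returns how many bytes it consumed, and the caller advances `idx`.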
  • 28. Bench again
    BenchmarkMsgpMarshal-8                   10000000    158 ns/op   128 B/op    1 allocs/op
    BenchmarkMsgpUnmarshal-8                  5000000    335 ns/op   112 B/op    3 allocs/op
    BenchmarkVmihailencoMsgpackMarshal-8      1000000   1764 ns/op   368 B/op    6 allocs/op
    BenchmarkVmihailencoMsgpackUnmarshal-8    1000000   1779 ns/op   384 B/op   13 allocs/op
    BenchmarkRawMarshaller-8                  2000000    806 ns/op   384 B/op   13 allocs/op
    BenchmarkRawUnmarshaller-8                2000000    706 ns/op   338 B/op   17 allocs/op
    BenchmarkRaw2Marshaller-8                20000000   83.6 ns/op     0 B/op    0 allocs/op
    10 times faster! As fast as the fastest one (andyleap/gencode)
  • 29. What I learned • Hidden allocations reduce performance • Pitfalls of serializing to a file • Need a thread-safe implementation to prevent a dirty file • Need versioning (write the version first, then the payload) for backward compatibility • Checksums matter • You can calculate the checksum directly from the struct; no need to calculate it from the bytes • Use fnv to hash each field and add the hashes together, instead of running CRC32 over the whole byte array
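The versioning and checksum points can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: the record layout (1 version byte, payload, 8-byte checksum), the trimmed-down struct `A`, and the names `checksum`, `frame`, `unframe`, and `formatVersion` are all hypothetical. The hashing uses FNV-1a from the standard `hash/fnv` package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
	"math"
)

// A is a reduced version of the example struct, for illustration only.
type A struct {
	Name     string
	Phone    string
	Siblings int
	Money    float64
}

func hashBytes(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func hashUint64(v uint64) uint64 {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], v)
	return hashBytes(b[:])
}

// checksum hashes each field separately and adds the hashes together,
// so it is computed from the struct rather than from the encoded bytes.
func checksum(a *A) uint64 {
	return hashBytes([]byte(a.Name)) +
		hashBytes([]byte(a.Phone)) +
		hashUint64(uint64(a.Siblings)) +
		hashUint64(math.Float64bits(a.Money))
}

const formatVersion = 1 // hypothetical version number

// frame lays a record out as: version byte, payload, 8-byte checksum.
func frame(payload []byte, sum uint64) []byte {
	out := make([]byte, 0, 1+len(payload)+8)
	out = append(out, formatVersion)
	out = append(out, payload...)
	var s [8]byte
	binary.LittleEndian.PutUint64(s[:], sum)
	return append(out, s[:]...)
}

// unframe checks the version first, then splits payload and checksum.
func unframe(rec []byte) ([]byte, uint64, error) {
	if len(rec) < 9 {
		return nil, 0, errors.New("record too short")
	}
	if rec[0] != formatVersion {
		return nil, 0, fmt.Errorf("unsupported version %d", rec[0])
	}
	n := len(rec) - 8
	return rec[1:n], binary.LittleEndian.Uint64(rec[n:]), nil
}

func main() {
	a := A{Name: "huy", Phone: "080-0000", Siblings: 2, Money: 1.5}
	rec := frame([]byte("encoded payload goes here"), checksum(&a))
	payload, sum, err := unframe(rec)
	fmt.Println(string(payload), sum == checksum(&a), err)
}
```

One trade-off of adding per-field hashes: addition is commutative, so swapping the values of two fields of the same type (for example `Name` and `Phone`) yields the same checksum. Mixing a field index into each hash would be one way around that, if it matters for the data.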
  • 30. Interesting techniques from other serialization methods
  • 31. varint (protobuf) • Available in many programs (protobuf, SQLite, WebAssembly (LEB128 from LLVM), Go's encoding/binary) • Compresses positive integers (a negative number in two's complement takes more bytes) • Idea: • most integers in our app are small ("not very big") • Use as few bits as possible • 7 bits per byte, with the MSB as a "continuation bit" • Cons: • CPU cost • decoding is a bit complex
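  • 32. varint (protobuf)
    t := uint64(l)
    for t >= 0x80 {
        buf[i+8] = byte(t) | 0x80 // low 7 bits, continuation bit set
        t >>= 7
        i++
    }
    buf[i+8] = byte(t) // last byte, continuation bit clear
    i++
    Many variants exist (group varint encoding, prefix varint encoding, etc.)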
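A self-contained sketch of the 7-bits-per-byte scheme, with the decoding side included, is shown here. `putUvarint` and `uvarint` are hypothetical helper names written for this example; the standard library already provides the same encoding as `binary.PutUvarint` and `binary.Uvarint`, and `main` compares the two. Unlike the library, this simplified decoder does not check the last byte for overflow.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// putUvarint writes v into buf using 7 bits per byte, least significant
// group first, and returns the number of bytes written.
// buf must have room for up to binary.MaxVarintLen64 bytes.
func putUvarint(buf []byte, v uint64) int {
	i := 0
	for v >= 0x80 {
		buf[i] = byte(v) | 0x80 // more bytes follow
		v >>= 7
		i++
	}
	buf[i] = byte(v) // last byte, continuation bit clear
	return i + 1
}

// uvarint decodes a value written by putUvarint.
// It returns the value and the bytes consumed, or (0, 0) on malformed input.
func uvarint(buf []byte) (uint64, int) {
	var v uint64
	for i, b := range buf {
		if i == binary.MaxVarintLen64 {
			return 0, 0 // too long for a 64-bit value
		}
		v |= uint64(b&0x7f) << (7 * uint(i))
		if b < 0x80 {
			return v, i + 1
		}
	}
	return 0, 0 // ran out of input before the last byte
}

func main() {
	for _, v := range []uint64{1, 127, 128, 300, 1 << 32} {
		mine := make([]byte, binary.MaxVarintLen64)
		n := putUvarint(mine, v)

		std := make([]byte, binary.MaxVarintLen64)
		m := binary.PutUvarint(std, v)

		back, _ := uvarint(mine[:n])
		fmt.Printf("%d -> % x (same as encoding/binary: %v, decoded: %d)\n",
			v, mine[:n], bytes.Equal(mine[:n], std[:m]), back)
	}
}
```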
  • 33. zigzag encoding (protobuf) • varint works well only with positive numbers • Zigzag encoding maps each negative number to a nearby (in absolute value) positive number
    original        encoded
    0               0
    -1              1
    1               2
    -2              3
    2147483647      4294967294
    -2147483648     4294967295
    zigzag = (n << 1) ^ (n >> (BIT_WIDTH - 1))
    Remember that an arithmetic shift replicates the sign bit:
    (n >> (BIT_WIDTH - 1)) -> 11111...1 for negative n
    (n >> (BIT_WIDTH - 1)) -> 00000...0 for positive n
    So for a negative n, the XOR flips every bit of (n << 1), which eliminates the long run of leading 1s
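The mapping above can be checked with a few lines of Go. This is a minimal sketch for 32-bit values; `zigzag32` and `unzigzag32` are hypothetical names chosen for this example. For reference, `binary.PutVarint` in the standard library applies the same kind of zigzag mapping before varint-encoding signed integers.

```go
package main

import "fmt"

// zigzag32 interleaves negative and positive numbers so that values with a
// small absolute value map to small unsigned values.
func zigzag32(n int32) uint32 {
	// n >> 31 is an arithmetic shift: all ones for negative n, zero otherwise.
	return uint32((n << 1) ^ (n >> 31))
}

// unzigzag32 reverses zigzag32.
func unzigzag32(u uint32) int32 {
	// -(u & 1) is all ones when the encoded value is odd (a negative number).
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, n := range []int32{0, -1, 1, -2, 2147483647, -2147483648} {
		z := zigzag32(n)
		fmt.Printf("%11d -> %10d -> %11d\n", n, z, unzigzag32(z))
	}
}
```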
  • 34. float reverse (gob)
    // floatBits returns a uint64 holding the bits of a floating-point number.
    // Floating-point numbers are transmitted as uint64s holding the bits
    // of the underlying representation. They are sent byte-reversed, with
    // the exponent end coming out first, so integer floating point numbers
    // (for example) transmit more compactly. This routine does the
    // swizzling.
    func floatBits(f float64) uint64 {
        u := math.Float64bits(f)
        var v uint64
        for i := 0; i < 8; i++ {
            v <<= 8
            v |= u & 0xFF
            u >>= 8
        }
        return v
    }
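To see why the byte reversal helps, the sketch below compares how many bytes a variable-length integer encoding needs for 17.0 before and after reversal. It is an illustration under assumptions: the size is measured with `binary.PutUvarint` rather than gob's own integer encoding, `bits.ReverseBytes64` stands in for the loop in `floatBits`, and `reversedFloatBits` and `uvarintLen` are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

// reversedFloatBits returns the IEEE 754 bits of f with the byte order
// reversed, which moves the usually-zero low mantissa bytes to the top.
func reversedFloatBits(f float64) uint64 {
	return bits.ReverseBytes64(math.Float64bits(f))
}

// uvarintLen reports how many bytes v occupies as an unsigned varint.
func uvarintLen(v uint64) int {
	var tmp [binary.MaxVarintLen64]byte
	return binary.PutUvarint(tmp[:], v)
}

func main() {
	f := 17.0
	plain := math.Float64bits(f)
	rev := reversedFloatBits(f)
	fmt.Printf("plain    0x%016x -> %d bytes\n", plain, uvarintLen(plain))
	fmt.Printf("reversed 0x%016x -> %d bytes\n", rev, uvarintLen(rev))
}
```

Floats with many significant mantissa bits (0.1, for instance) gain little from this, since their low bytes are not zero.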
  • 35. unsafe (andyleap/gencode)
    v := *(*uint64)(unsafe.Pointer(&(d.Height)))
    Unmarshal a number without copying or allocation. You could use the same technique for strings too: http://qiita.com/mattn/items/176459728ff4f854b165
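A small sketch of both uses follows: reinterpreting a float64 as a uint64, and viewing a byte slice as a string without copying. `floatBitsUnsafe` and `bytesToString` are hypothetical names for this example. The string conversion relies on the string header being a prefix of the slice header, which is an implementation detail; Go 1.20 and later provide `unsafe.String` and `unsafe.SliceData` for the same purpose.

```go
package main

import (
	"fmt"
	"unsafe"
)

// floatBitsUnsafe reinterprets the 8 bytes of f as a uint64 without any
// conversion call. The result matches math.Float64bits.
func floatBitsUnsafe(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

// bytesToString returns a string that shares memory with b instead of
// copying it. The caller must not modify b while the string is in use.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	fmt.Printf("bits of 1.5: %#x\n", floatBitsUnsafe(1.5))

	raw := []byte("huy")
	s := bytesToString(raw)
	fmt.Println(s, len(s))
}
```

With a reused global buffer this is risky: a string produced this way changes as soon as the buffer is overwritten by the next record, so a copying conversion is the safer default when the decoded value outlives the buffer.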
  • 36. Finally • Writing your own serializer is not hard, and it is fun • You can learn a lot from existing methods • There are tons of techniques that can be used to improve performance • When you have no strong preferences, use a fixed-layout serialization • The proto file can be version controlled • High performance