Pack/Unpack
Become a Deft Manipulator of Binary Data
Presented by
Lambert Lum
What is this?
● Perl becomes a deft manipulator of bytes
● Albeit slow, relative to C.
Other languages
● Ruby/PHP (same or follows closely)
● Python (suffers from Perl hatred)
– Changed the format grammar
● Node.js (two versions: perl-like, python-like)
Useful?
● Good for small jobs, light duty
● Not something you use very often
● Will never be asked during a job interview
My story
● Worked at SOLiD Technology
● Proprietary protocol that ran over serial
● Same data frame was used in the socket
communication as well.
What we need to start
● Hex viewer
– we will use Ghex
● The binary data format specification
– We will use ID3v2
● Perl
– pack/unpack is a built-in
House rules
● We will mostly talk about unpack
– The format specifier is basically the same between
pack and unpack
● Whereas the documentation uses the term
template, I use the term format specifier
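To illustrate that pack and unpack share one format specifier, here is a minimal round-trip sketch. The values ("ID3", 3, 0) are made up to resemble the start of an ID3v2 header; they are not read from the demo file.

```perl
use strict;
use warnings;

# One format specifier drives both directions.
my $template = "A3 C C";

# pack: Perl values -> binary string (the text "ID3", then bytes 0x03 and 0x00)
my $binary = pack $template, "ID3", 3, 0;

# unpack: binary string -> Perl values
my ($tag, $major, $minor) = unpack $template, $binary;

printf "%d bytes: tag=%s version=%d.%d\n",
    length($binary), $tag, $major, $minor;
```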
Example: ID3
● MP3 with ID3v2 metadata
● Only parsing the header
● http://id3.org/id3v2.3.0#ID3v2_header
ID3v2 header
ID3v2/file identifier   "ID3"
ID3v2 version           $03 00
ID3v2 flags             %abc00000
ID3v2 size              4 * %0xxxxxxx
MP3 with ID3v2
● Author: Admiral Bob
● Song: Beautiful Mystery
● admiralbob77_-_Beautiful_Mystery_4.mp3
● http://ccmixter.org/files/admiralbob77/47427
● [Ghex demo]
Pack/Unpack
use Data::Dumper;
my $file_name = "admiralbob77_-_Beautiful_Mystery_4.mp3";
open my $fh, "<", $file_name or die "Cannot open $file_name: $!";
binmode $fh, ":raw";
read $fh, my $data, 128;
close $fh;
my @L;
@L = unpack "H H H H H H", $data;
print Dumper @L;
demo
● [run unpack_hex.pl]
What's wrong?
● The hex viewer shows one set of bytes
● unpack_hex.pl shows something different
● A bare H yields only one hex digit (the high nybble) per byte
Unpack hex
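use Data::Dumper;
my $file_name = "admiralbob77_-_Beautiful_Mystery_4.mp3";
open my $fh, "<", $file_name or die "Cannot open $file_name: $!";
binmode $fh, ":raw";
read $fh, my $data, 128;
close $fh;
my @L;
# H2 = two hex digits, i.e. one whole byte per field
@L = unpack "H2 H2 H2 H2 H2 H2", $data;
print Dumper @L;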
Format specifier
a   Arbitrary binary data
A   ASCII
H   Hexadecimal
B   Bit String (descending order)
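A small sketch of how these four specifiers differ on unpack. The input strings are invented for the illustration. The point of interest is that "A" strips trailing spaces while "a" returns the bytes untouched.

```perl
use strict;
use warnings;

my $field = "ab   ";                 # two letters plus three trailing spaces

my ($raw)  = unpack "a5", $field;    # bytes as they are, padding included
my ($text) = unpack "A5", $field;    # trailing spaces stripped

my ($hex)  = unpack "H2", "A";       # the byte 0x41 as two hex digits
my ($bits) = unpack "B8", "A";       # the same byte as eight bits, high bit first

print "a5: [$raw]\n";
print "A5: [$text]\n";
print "H2: $hex\n";
print "B8: $bits\n";
```

This should print `[ab   ]`, `[ab]`, `41` and `01000001`.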
Format specifier
H    Single hexadecimal digit (the high nybble of one byte)
H2   Two hexadecimal digits (one whole byte)
H*   Everything as hexadecimal
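The effect of the count can be checked without the MP3 file. This sketch uses a hand-made five-byte string that mimics the start of an ID3v2 header (an assumption for the demo, not data read from disk).

```perl
use strict;
use warnings;

# "I" = 0x49, "D" = 0x44, "3" = 0x33, then version bytes 0x03 0x00
my $data = "ID3\x03\x00";

# A bare H consumes a whole byte but returns only its high nybble.
my @one = unpack "H H H", $data;

# H2 returns both nybbles of each byte.
my @two = unpack "H2 H2 H2", $data;

# H* dumps every remaining byte.
my ($all) = unpack "H*", $data;

print "H : @one\n";
print "H2: @two\n";
print "H*: $all\n";
```

The three lines printed should be `4 4 3`, `49 44 33` and `4944330300`, which is why the first script disagreed with the hex viewer.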
Unpack ASCII
# ID3
my ($tag, $other) = unpack "A3 a*", $data;
print "tag: $tag\n";
● [demo unpack_ascii.pl]
Format specifier
C   Unsigned char (8-bit) value
S   Unsigned short (16-bit) value
L   Unsigned long (32-bit) value
Q   Unsigned quad (64-bit) value
B   Bit String (descending order)
Format specifier
c   Signed char (8-bit) value
s   Signed short (16-bit) value
l   Signed long (32-bit) value
q   Signed quad (64-bit) value
b   Bit String (ascending order)
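A short sketch of upper case versus lower case on single bytes. The byte values 0xFF and 0x01 are chosen only because they make the difference obvious.

```perl
use strict;
use warnings;

# Same byte, read as unsigned and as signed
my $byte = "\xFF";
my ($unsigned) = unpack "C", $byte;
my ($signed)   = unpack "c", $byte;

# Same byte, bits listed in descending and in ascending order
my $one = "\x01";
my ($descending) = unpack "B8", $one;
my ($ascending)  = unpack "b8", $one;

print "C: $unsigned  c: $signed\n";
print "B8: $descending  b8: $ascending\n";
```

Expected output: `C: 255  c: -1` and `B8: 00000001  b8: 10000000`.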
Unpack major/minor
my ($tag, $other) = unpack "A3 a*", $data;
# major / minor version
# ($other is reused, not redeclared, to avoid a "masks earlier declaration" warning)
(my $major, my $minor, $other) = unpack "C C a*", $other;
print "major: $major\n";
print "minor: $minor\n";
Unpack bit flags
# B3 reads the top three bits of the flags byte; a* keeps the remainder
(my $flags, $other) = unpack "B3 a*", $other;
print "flags: $flags\n";
# split the flags
my ($flag_unsynchronization, $flag_extended_header,
$flag_experimental_indicator)
= split //, $flags;
print " flag_unsynchronization: $flag_unsynchronization\n";
print " flag_extended_header: $flag_extended_header\n";
print " flag_experimental_indicator: $flag_experimental_indicator\n";
Unpack/Pack/Unpack
# The next 32 bits describe the tag size
my $bit_string = unpack "B32 a*", $other;
print "bit_string: $bit_string\n";
# Drop the most significant bit of each byte (it is always 0),
# then pad the remaining 28 bits back up to 32
$bit_string = "0000" . join '', $bit_string =~ m{.(.{7})}g;
print "bit_string: $bit_string\n";
# pack to binary
# unpack to long int
my $size = unpack "L", pack "B*", $bit_string;
print "size: $size\n";
● [run unpack_pack_unpack.pl]
Something is wrong
Big endian vs Little Endian
● What is this?
Big Endian vs Little Endian
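8-bit
  Byte 0
  00010100
  (bit weights run from 128 on the left to 1 on the right)
16-bit
  Byte 0    Byte 1
  00010100  00001011
32-bit
  Byte 0    Byte 1    Byte 2    Byte 3
  00010100  00001011  00010000  10000100
32-bit, Big Endian (Byte 0 is the most significant)
  Byte 0    Byte 1    Byte 2    Byte 3
  00010100  00001011  00010000  10000100
  (bit weights run from 2,147,483,648 on the left to 1 on the right)
32-bit, Little Endian (Byte 0 is the least significant)
  Byte 3    Byte 2    Byte 1    Byte 0
  10000100  00010000  00001011  00010100
  (bit weights run from 2,147,483,648 on the left to 1 on the right)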
● Byte ordering changes
● Bit ordering is the same
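The two bullets above can be checked with the four example bytes from the diagram. This is a sketch: "N" and "V" are used so the result does not depend on the machine it runs on.

```perl
use strict;
use warnings;

# Byte 0 .. Byte 3 from the diagram, in storage order
my $bytes = pack "B8 B8 B8 B8",
    "00010100", "00001011", "00010000", "10000100";

# Same four bytes, two byte orders, two different numbers
my ($big)    = unpack "N", $bytes;   # Byte 0 is most significant
my ($little) = unpack "V", $bytes;   # Byte 0 is least significant

printf "big endian:    0x%08X\n", $big;
printf "little endian: 0x%08X\n", $little;

# The bits inside each byte are untouched by the choice of byte order
my @bits = unpack "(B8)4", $bytes;
print "bytes: @bits\n";
```

The first line should show 0x140B1084 and the second 0x84100B14: the bytes swap places, the bits inside each byte do not.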
Big endian vs Little Endian
● Intel is Little Endian
● ID3v2 size is Big Endian
Format specifier
n   unsigned (16-bit) in "network" (big-endian) order.
N   unsigned (32-bit) in "network" (big-endian) order.
v   unsigned (16-bit) in "VAX" (little-endian) order.
V   unsigned (32-bit) in "VAX" (little-endian) order.
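Going the other direction, a sketch of what these four specifiers write when packing. The value 0x1234 is arbitrary, and "H*" is used only to make the resulting bytes visible.

```perl
use strict;
use warnings;

my $value = 0x1234;

# 16-bit: network order writes the high byte first, VAX order the low byte first
my $net16 = unpack "H*", pack "n", $value;
my $vax16 = unpack "H*", pack "v", $value;

# 32-bit: same idea across four bytes
my $net32 = unpack "H*", pack "N", $value;
my $vax32 = unpack "H*", pack "V", $value;

print "n: $net16  v: $vax16\n";
print "N: $net32  V: $vax32\n";
```

Expected output: `n: 1234  v: 3412` and `N: 00001234  V: 34120000`.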
Format specifier
>   force big endian
<   force little endian
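The modifiers are most useful for types that have no dedicated specifier, such as signed integers in a fixed byte order. A sketch with two made-up bytes follows. It assumes Perl 5.10 or later, where the ">" and "<" modifiers are available.

```perl
use strict;
use warnings;

my $bytes = "\xFF\xFE";

# Signed 16-bit, byte order forced
my ($be_signed) = unpack "s>", $bytes;    # reads 0xFFFE
my ($le_signed) = unpack "s<", $bytes;    # reads 0xFEFF

# Unsigned 16-bit, big endian: the same result as "n"
my ($be_unsigned) = unpack "S>", $bytes;
my ($network)     = unpack "n",  $bytes;

print "s>: $be_signed  s<: $le_signed\n";
print "S>: $be_unsigned  n: $network\n";
```

Expected output: `s>: -2  s<: -257` and `S>: 65534  n: 65534`.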
Little/Big Endian
# Little Endian on Intel
my $size = unpack "L", pack "B*", $bit_string;
print "size: $size\n";
# Big Endian
$size = unpack "N", pack "B*", $bit_string;
print "size: $size\n";
# Force L to Big Endian
$size = unpack "L>", pack "B*", $bit_string;
print "size: $size\n";
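A worked example of the size calculation on known bytes, so the result can be checked by hand. The four size bytes 00 00 02 01 are invented for this sketch. With seven useful bits per byte they stand for 2 * 128 + 1 = 257. The shift-based route is an alternative shown for comparison, not the approach used in the talk.

```perl
use strict;
use warnings;

my $size_bytes = "\x00\x00\x02\x01";

# Route 1 (as in the talk): bit string, drop each top bit, pad, repack
my $bit_string = unpack "B32", $size_bytes;
$bit_string = "0000" . join '', $bit_string =~ m{.(.{7})}g;
my $size_big    = unpack "N", pack "B32", $bit_string;   # correct: big endian
my $size_little = unpack "V", pack "B32", $bit_string;   # wrong byte order

# Route 2 (alternative): four bytes, seven bits each, combined by shifting
my @b = unpack "C4", $size_bytes;
my $size_shift = ($b[0] << 21) | ($b[1] << 14) | ($b[2] << 7) | $b[3];

print "big endian:    $size_big\n";
print "little endian: $size_little\n";
print "shifted:       $size_shift\n";
```

The big-endian and shifted results should both be 257, while the little-endian read gives 16842752 (0x01010000).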
Named Unpack
# purely Lambert invention
my $named = named_unpack ($data, [
'tag' => 'A3',
'v_major' => 'C',
'v_minor' => 'C',
'bit_flags' => 'B3',
'bit_string' => 'B32',
]);
print Dumper $named;
Named Unpack
use List::MoreUtils qw(part);
sub named_unpack {
my ($data, $format_list) = @_;
my %named;
my $i;
my ($lefts, $rights) = part { $i++ % 2 } @$format_list;
my $format = join ('', @$rights);
@named{@$lefts} = unpack ($format, $data);
return \%named;  # hash reference, so the caller's scalar $named works
}
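For trying the idea without installing List::MoreUtils, here is a core-only variant run against a hand-built header. The name named_unpack_core, the 'raw_size' key and the packed test header are all assumptions made for this sketch. It returns a hash reference so the result can be held in a scalar.

```perl
use strict;
use warnings;

# Hypothetical core-only variant: takes [ name => format, ... ]
sub named_unpack_core {
    my ($data, $format_list) = @_;
    my @pairs = @$format_list;          # copy, so the caller's list is not consumed
    my (@names, @formats);
    while (my ($name, $format) = splice @pairs, 0, 2) {
        push @names,   $name;
        push @formats, $format;
    }
    my %named;
    @named{@names} = unpack join(' ', @formats), $data;
    return \%named;
}

# Made-up 10-byte header: "ID3", version 3.0, flags 01000000, raw size 257
my $header = pack "A3 C C B8 N", "ID3", 3, 0, "01000000", 257;

my $named = named_unpack_core($header, [
    'tag'       => 'A3',
    'v_major'   => 'C',
    'v_minor'   => 'C',
    'bit_flags' => 'B3',
    'raw_size'  => 'N',
]);

print "$_ = $named->{$_}\n" for sort keys %$named;
```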
Replacing bytes
● substr
– We will demonstrate
● vec
– A bit limited, not a favorite of mine.
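For completeness, a minimal sketch of vec on a four-byte buffer. The buffer and the value 0x40 are arbitrary. With a width of 8, the offset simply counts bytes.

```perl
use strict;
use warnings;

my $buffer = "\x00" x 4;

# Treat the string as a vector of 8-bit elements and set element 2
vec($buffer, 2, 8) = 0x40;

# Read the same element back
my $third = vec($buffer, 2, 8);

my $hex = unpack "H*", $buffer;
print "element 2: $third\n";
print "buffer:    $hex\n";
```

Expected output: `element 2: 64` and `buffer:    00004000`.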
Substr
● substr ($data, $offset)
– Return all from $offset onward
● substr ($data, $offset, $length)
– Return $length bytes at $offset
● substr ($data, $offset, $length, $replacement)
– Replace $length bytes at $offset with $replacement
– A replacement of the same size avoids a realloc
– This matters when manipulating large files
Substr
substr ($data, 5, 1, $bit_data);
#substr($data, 5, 1) = $bit_data;
my $named = named_unpack ($data, [
'tag' => 'A3',
'v_major' => 'C',
'v_minor' => 'C',
'bit_flags' => 'B3',
'bit_string' => 'B32',
]);
print Dumper $named;
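A self-contained sketch of the same replacement on a hand-built header, so it can be run without the MP3 file. The header contents and the new flag value are assumptions for the demo. Offset 5 is the flags byte because it follows the 3-byte tag and the two version bytes.

```perl
use strict;
use warnings;

# Made-up 10-byte header: "ID3", version 3.0, all flags clear, raw size 257
my $data = pack "A3 C C B8 N", "ID3", 3, 0, "00000000", 257;

# One replacement byte with the second flag set
my $bit_data = pack "B8", "01000000";

# Four-argument substr replaces in place and returns what was there before
my $previous = substr($data, 5, 1, $bit_data);

# x5 skips the first five bytes, B3 reads the three flag bits
my ($flags) = unpack "x5 B3", $data;

printf "previous flags byte: 0x%02X\n", ord $previous;
print  "new flags: $flags\n";
print  "length: ", length($data), "\n";
```

Because one byte replaces one byte, the length stays at 10 and nothing after offset 5 moves.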
For reference
● http://perldoc.perl.org/perlpacktut.html
● http://id3.org/id3v2.3.0#ID3v2_header