how to hack with pack and unpack - Presentation Transcript
How to hack
with pack
and unpack
1
crash course
standard uses
synergies within perl
horrific abuses
2
crash course
3
what’s pack?
• like sprintf()
• only for bytes, not for presentation
• template rules very complex
• DWIM: packs empty strings for
missing arguments
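A minimal sketch of pack as a byte-level sprintf. The template and values are arbitrary choices for illustration. The second call shows the "missing arguments" behaviour: pack fills absent values with an empty string (which reads as 0 for numeric fields) instead of complaining.

```perl
use strict;
use warnings;

# Like sprintf, pack walks a template and consumes one argument per field,
# but it emits raw bytes instead of formatted text.
#   'n'  = 16-bit unsigned, big-endian
#   'C'  = 8-bit unsigned char
#   'a3' = 3-byte string, null-padded
my $bytes = pack 'n C a3', 258, 65, 'xy';
printf "%d bytes: %s\n", length $bytes, unpack('H*', $bytes);   # 6 bytes: 010241787900

# Missing arguments are filled in rather than rejected:
# 'N' packs as 0 and 'a2' packs an empty string padded with nulls.
my $short = pack 'N a2', 7;
printf "%d bytes: %s\n", length $short, unpack('H*', $short);   # 6 bytes: 000000070000
```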
4
what’s unpack?
• like sscanf() for bytes
• (not really: we’ll come back to it)
• (mostly) identical template rules to
pack
• dies if 'x' or 'X' skips past the ends of the input; numeric fields that run short are silently dropped
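A small sketch of unpack reading back what pack wrote, plus the two ways a template can run off the end of its input. The record layout is an arbitrary example, not from the talk.

```perl
use strict;
use warnings;

my $record = pack 'n C a3', 258, 65, 'xy';

# The same template reads the fields back out, like sscanf over bytes
my ($num, $char, $str) = unpack 'n C a3', $record;
print "$num $char\n";                      # 258 65
print length($str), "\n";                  # 3: 'a' keeps the trailing null

# A fixed-size field that runs short just yields fewer values...
my @ints = unpack 'N2', pack('N', 7);
print scalar(@ints), "\n";                 # 1

# ...but skipping past the end with 'x' is fatal
my $ok = eval { my @fields = unpack 'x5 C', 'abc'; 1 };
print $ok ? "no error\n" : "died: $@";
```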
5
• vec(): treat a scalar as an arbitrary
length bit vector
• (you’re not using numbers, are you?)
• pack and unpack ‘b’ template is
perfect for working with the vector as
a whole
• convert vectors to and from
strings “011100” or lists (0,1,1,1,0,0)
• count bits with unpack checksum
• perldoc -f vec
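A minimal sketch of the conversions listed above, using the slide's own "011100" example. It relies on the 'b' template and vec() numbering bits the same way (lowest bit of each byte first).

```perl
use strict;
use warnings;

# Build a bit vector from a string of '0'/'1' characters
# ('b' = ascending bit order within each byte, same as vec())
my $vector = pack 'b*', '011100';

# vec() and the 'b' template agree on bit numbering
print vec($vector, 0, 1), "\n";   # 0
print vec($vector, 1, 1), "\n";   # 1

# Back to a string; padded out to a whole byte
my $string = unpack 'b*', $vector;   # "01110000"

# To and from a list of 0/1 values
my @list  = split //, unpack 'b*', $vector;
my $again = pack 'b*', join '', @list;

# Count the set bits with a 32-bit checksum
my $count = unpack '%32b*', $vector;  # 3
print "$string has $count bits on\n";
```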
19
example: one million
bits!
## create a 125,001-byte vector
my $bit_vector = '';
vec($bit_vector, 1_000_000, 1) = 1;
## stringify: "000...010000000" (1,000,008 characters)
my $bits = unpack 'b*', $bit_vector;
## listify: (0, 0, ..., 0, 1, 0, 0, 0, 0, 0, 0, 0)
my @bits = split //, unpack 'b*', $bit_vector;
## how many bits are on? (1)
my $on_bits = unpack '%32b*', $bit_vector;
• bits 1,000,001 through 1,000,007
are free!
20
lvalue substr()
21
• (or 4-argument substr)
• magic: no realloc iff replacement
length == original length
• sprintf also might work, depending...
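A sketch of both substr forms doing same-length, in-place overwrites of a byte buffer. The buffer contents are made up. Whether perl actually skips a reallocation is an internal detail that this code does not observe. It only checks that the length is unchanged.

```perl
use strict;
use warnings;

my $buffer = 'AAAABBBBCCCC';

# lvalue substr: overwrite 4 bytes with 4 bytes
substr($buffer, 4, 4) = 'xxxx';

# 4-argument substr: same idea, here writing a packed 32-bit big-endian integer
substr($buffer, 8, 4, pack('N', 42));

print length($buffer), "\n";                       # 12: same-length replacements
print unpack('N', substr($buffer, 8, 4)), "\n";    # 42
```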
22
example: Sys::Mmap
mmap($shared, 4, PROT_READ|PROT_WRITE,
MAP_SHARED, $filehandle) or die $!;
$shared = meaning_of_life();
munmap($shared);
• 7.5 million years’ work down the tubes!
mmap($shared, 4, PROT_READ|PROT_WRITE,
MAP_SHARED, $filehandle) or die $!;
substr($shared, 0, 4) =
pack 'L', meaning_of_life();
munmap($shared);
23
use bytes
24
use bytes
• binary data + DWIM + unicode
• ouch!
• pragma to the rescue: “No matter
what you think might be in this PV, do
not cleverly switch to character
semantics when I’m not looking.”
• pack/unpack themselves don’t care,
it’s things like length and substr
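A minimal illustration of the length surprise, assuming a string that perl stores internally as UTF-8 because it contains a code point above 255. The string itself is an arbitrary example.

```perl
use strict;
use warnings;

# A literal with a code point above 255 makes perl store the string as UTF-8 internally
my $str = "caf\x{e9}\x{263a}";

my $chars = length $str;                      # 5 characters
my $bytes = do { use bytes; length $str };    # 8 bytes: "caf" + 2 for U+00E9 + 3 for U+263A

print "$chars characters, $bytes bytes\n";
```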
25
eat a snack
please come back
26
horrific abuses
think like a C programmer
serialization tricks
lazy perlification
27
think like a C
programmer
28
typedef struct TWO_THINGS {
char a;
char b;
} two_things;
two_things things;
two_things lots_of_things[1000];
• where is things.a? *(char *)&things (offset 0).
• where is things.b? *((char *)&things + 1).
• where is lots_of_things[2].b?
*((char *)lots_of_things +
(2 * sizeof(two_things)) + 1).
• where is the point? next slide.
29
Readonly my $FORMAT => 'cc';
my $things = pack $FORMAT;
my $lots_of_things = pack "($FORMAT)1000";
• where is $things.a? unpack 'cx', $things;
• where is $things.b? unpack 'xc', $things;
• where is $lots_of_things[2].b? unpack '(xx)2xc', $lots_of_things;
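A runnable sketch of the same struct arithmetic. It uses `use constant` in place of Readonly so no CPAN module is needed. The record contents and the helper `thing_b` are hypothetical, chosen only so there is something to find. "sizeof" is computed by packing the template with no arguments.

```perl
use strict;
use warnings;

# 'use constant' stands in for Readonly here
use constant FORMAT => 'cc';

# sizeof(two_things): pack with no arguments packs zeros, so its length is the record size
my $SIZEOF = length(pack(FORMAT));    # 2

# 1000 records; record $i holds a = $i % 100, b = -($i % 100)
my $lots_of_things = pack '(' . FORMAT . ')1000',
    map { ($_ % 100, -($_ % 100)) } 0 .. 999;

# Hypothetical helper, the equivalent of lots_of_things[$n].b:
# skip $n whole records plus field a, then read one signed char
sub thing_b {
    my ($buffer, $n) = @_;
    return unpack 'x' . ($n * $SIZEOF + 1) . ' c', $buffer;
}

my $second_b = thing_b($lots_of_things, 2);
my $by_hand  = unpack '(xx)2xc', $lots_of_things;   # the slide's hand-written template
print "$second_b $by_hand\n";                       # -2 -2
```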
30
• bytes, bytes, bytes on the brain
• byte offsets a natural way of thinking
about working with data
• “language neutral” is just a cute way
of saying “C”
31
• “strong typing” the roundabout way
• unpack() == C cast: “I, programmer,
assure you, language, that these bytes
contain precisely data of this type,
and I will live with the consequences if
I’m wrong.”
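A tiny sketch of unpack acting as a cast: the bytes of a single-precision float are reinterpreted as a 32-bit unsigned integer, much like `*(uint32_t *)&f` in C. It assumes IEEE 754 floats, which is what practically every current platform uses.

```perl
use strict;
use warnings;

# pack 'f' = native single-precision float, unpack 'L' = 32-bit unsigned int;
# both use native byte order, so the bytes are simply reinterpreted
my $as_int = unpack 'L', pack('f', 1.0);
printf "0x%08x\n", $as_int;    # 0x3f800000 on IEEE 754 platforms
```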
32
example: SEGV!
my $bar = unpack 'P', 'asdf';
• god, I miss pointers sometimes
• (but not right now)
33
No pointers in Perl
34
but...
• we are not writing C
• because down that road lies madness
• still, its siren song is hard to resist...
35
serialization tricks
36
space efficiency
• Storable: general-purpose
• what does that mean?
• if you’re thinking like a C
programmer, maybe you can do
better...
37
example: array of shorts
use Storable ();
my @shorts = map { int((rand 256) - 128) } (1 .. 10000);
## 20,000 bytes: 2 bytes per element
my $packed = pack 's*', @shorts;
## 20,016 bytes: 2 bytes per element
my $stored = Storable::freeze(\@shorts);
## harmlessly examine contents of @shorts...
print "$_\n" for @shorts;
## roughly 46,000 bytes: ???
$stored = Storable::freeze(\@shorts);
• Extra credit: deserialize just
$shorts[2113]...
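One possible answer to the extra credit, as a sketch: because every element of the 's*' packing is exactly 2 bytes wide, element $i starts at byte offset 2 * $i. Only that slice needs to be unpacked. The helper name `short_at` is hypothetical.

```perl
use strict;
use warnings;

my @shorts = map { int((rand 256) - 128) } (1 .. 10000);
my $packed = pack 's*', @shorts;

# Every element is the same width, so element $i starts at byte $i * $WIDTH;
# unpack just that slice instead of the whole array.
my $WIDTH = length(pack('s'));    # 2
sub short_at {
    my ($bytes, $i) = @_;
    return unpack 's', substr($bytes, $i * $WIDTH, $WIDTH);
}

my $value = short_at($packed, 2113);
print $value == $shorts[2113] ? "match\n" : "mismatch\n";
```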
38
fixed width
• depending on what you’re serializing
• interesting properties, e.g. element i sits at byte offset i * width
• more in a bit
39
keyless hashes
• when a hash is really a struct/record
• thinking like a C programmer again!
• serialize bags of them without bags of
redundant copies of their keys
40
idiom
## shape of the “structure” and format are
## passed or encoded separately
Readonly my $TEMPLATE => 'VVC';
Readonly my @FIELDS => qw(thing1 thing2 kite);
## get the bytes
my $bytes = get_from_somewhere();
## unpack via hash slice FTW!
my %thing;
@thing{@FIELDS} = unpack $TEMPLATE, $bytes;
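A round-trip sketch of the idiom. Plain lexicals stand in for the Readonly declarations. The sample values are made up, and `get_from_somewhere()` is replaced by packing a hash with the same slice. Only the values reach the byte string. The field names live in @FIELDS.

```perl
use strict;
use warnings;

my $TEMPLATE = 'VVC';                     # two 32-bit little-endian ints + one byte
my @FIELDS   = qw(thing1 thing2 kite);

# Hash slice in: values come out in @FIELDS order, keys never hit the bytes
my %original = (thing1 => 1, thing2 => 70000, kite => 255);
my $bytes = pack $TEMPLATE, @original{@FIELDS};
print unpack('H*', $bytes), "\n";         # 0100000070110100ff (9 bytes)

# Hash slice out
my %thing;
@thing{@FIELDS} = unpack $TEMPLATE, $bytes;
```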
41
example: keyless hash
my @records = map {
{ thing1 => int rand 4294967296,
thing2 => int rand 4294967296,
kite => int rand 255, } } (1 .. 10000);
## 90,000 bytes: 9 bytes per record
my $packed = pack "($TEMPLATE)*",
map { @{$_}{@FIELDS} } @records;
## roughly 544,000 bytes: 54 bytes per record
my $stored = Storable::freeze(\@records);
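Going the other way, a sketch of turning the packed bag back into hashes. It shows two options: decode everything, or decode a single record by offset, since each record is a fixed 9 bytes. Deterministic sample values replace the random ones so the result can be checked. `record_at` is a hypothetical helper.

```perl
use strict;
use warnings;

my $TEMPLATE = 'VVC';                       # plain lexicals stand in for Readonly
my @FIELDS   = qw(thing1 thing2 kite);
my $WIDTH    = length(pack($TEMPLATE));     # 9 bytes per record

my @records = map { +{ thing1 => $_, thing2 => 2 * $_, kite => $_ % 256 } } 0 .. 9999;
my $packed  = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;

# Rebuild every hash: each group of three values becomes one record
my @values = unpack "($TEMPLATE)*", $packed;
my @decoded;
while (my @row = splice @values, 0, scalar @FIELDS) {
    my %rec;
    @rec{@FIELDS} = @row;
    push @decoded, \%rec;
}

# Or rebuild just one record, since record $n starts at byte $n * $WIDTH
sub record_at {
    my ($n) = @_;
    my %rec;
    @rec{@FIELDS} = unpack $TEMPLATE, substr($packed, $n * $WIDTH, $WIDTH);
    return \%rec;
}
```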
42
lazy perlification
43
• for transient bytes e.g. from key-value
storage
• for sparse algorithms e.g. binary
search
• otherwise, don’t do this!
• or at least, don’t blame me
44
example: filtering
• problem scale: 100k x 20k x 100
• idea 1: regular expressions!
• idea 2: binary search, of course!
• idea 3: binary search + lazy
perlification
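The talk's actual filtering code isn't shown. As a hypothetical sketch of "binary search + lazy perlification", here is a binary search over a sorted array of fixed-width 32-bit keys kept packed in one string. It unpacks only the handful of elements each probe touches instead of turning the whole array into Perl scalars. The data and the helper `packed_contains` are invented for illustration.

```perl
use strict;
use warnings;

# Sorted keys packed as fixed-width 32-bit big-endian unsigned ints ('N')
my @sorted = map { $_ * 3 } 0 .. 99_999;
my $packed = pack 'N*', @sorted;
my $WIDTH  = 4;

# Binary search that unpacks only the elements it probes
sub packed_contains {
    my ($haystack, $needle) = @_;
    my ($lo, $hi) = (0, length($haystack) / $WIDTH - 1);
    while ($lo <= $hi) {
        my $mid   = int(($lo + $hi) / 2);
        my $value = unpack 'N', substr($haystack, $mid * $WIDTH, $WIDTH);
        if    ($value < $needle) { $lo = $mid + 1 }
        elsif ($value > $needle) { $hi = $mid - 1 }
        else                     { return 1 }
    }
    return 0;
}
```

Each lookup touches about 17 elements out of 100,000, which is the point of staying lazy.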
45