how to hack with pack and unpack
 


Usage Rights

© All Rights Reserved

  • talk about: 'A', 'A12', 'A*' and the meaningless whitespace...

how to hack with pack and unpack: Presentation Transcript

  • How to hack with pack and unpack 1
  • crash course • standard uses • synergies within perl • horrific abuses 2
  • crash course 3
  • what’s pack? • like sprintf() • only for bytes, not for presentation • template rules very complex • DWIM: packs empty strings for missing arguments 4
  • what’s unpack? • like sscanf() for bytes • (not really: we’ll come back to it) • (mostly) identical template rules to pack • dies if it runs out of input bytes 5
  • my $fourbytes = pack 'L', 12; my $twelve = unpack 'L', $fourbytes; • perldoc -f pack • perldoc -f unpack • perldoc perlpacktut 6
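A minimal round-trip sketch of the example on this slide. The 'N' template and the hex dump are illustrative additions, not from the slides; they are there only because 'L' uses the machine's native byte order, so its raw bytes differ between platforms.

```perl
use strict;
use warnings;

# 'L' packs an unsigned 32-bit integer in the machine's native byte order.
my $fourbytes = pack 'L', 12;
my $twelve    = unpack 'L', $fourbytes;

printf "%d bytes, value %d\n", length $fourbytes, $twelve;

# 'N' is always big-endian ("network" order), so its bytes are predictable.
my $network = pack 'N', 12;
print unpack('H*', $network), "\n";   # hex dump of the four bytes
```

This should print `4 bytes, value 12` followed by `0000000c`.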
  • standard uses • someone else’s bytes • fixed-width parsing 7
  • someone else’s bytes 8
  • FFI XS C 9
  • 6 SIX BYTES ON THAT LAST SLIDE! 10
  • but seriously... • struct alignment issues will ruin your day • change the XS and/or C if you can • Convert::Binary::C to the rescue! 11
  • syscall • I have never had to do this... • perldoc -f syscall 12
  • “close to the metal” • network protocols • binary file formats • bytes are language neutral 13
  • fixed-width parsing 14
  • • no sscanf() in perl • substr or regexes... • unpack is a bit nicer (not much) 15
  • example: contrived pie
        chess       8
        pecan       7
        shaker lemon4
        shoo fly    10
        $pie = substr $_, 0, 12; $deliciousness = substr $_, 12;
        ($pie, $deliciousness) = m/(.{12})(.*)/;
        ($pie, $deliciousness) = unpack 'A12 A*', $_;
    • not quite identical... 16
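One way to see why the three approaches are "not quite identical" is to run them on the same record. This is only a sketch: the record is built with sprintf as a stand-in for a line read from a file, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: a 12-column name, then a score.
my $line = sprintf '%-12s%s', 'shoo fly', 10;

my ($sub_pie, $sub_score) = (substr($line, 0, 12), substr($line, 12));
my ($re_pie,  $re_score)  = $line =~ m/(.{12})(.*)/;
my ($un_pie,  $un_score)  = unpack 'A12 A*', $line;

# substr and the regex keep the padding; 'A' strips trailing whitespace.
printf "[%s] [%s] [%s]\n", $sub_pie, $re_pie, $un_pie;
```

This should print `[shoo fly    ] [shoo fly    ] [shoo fly]`. The space inside the template `'A12 A*'` is ignored by unpack; it is only there for readability.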
  • synergies • vec() • lvalue substr() • use bytes 17
  • vec() 0000001 00000011 0000001 00000011 0000001 0000001 0000001 0000001 18
  • vec(): treat a scalar as an arbitrary length bit vector • (you’re not using numbers, are you?) • pack and unpack 'b' template is perfect for working with the vector as a whole • convert vectors to and from strings “011100” or lists (0,1,1,1,0,0) • count bits with unpack checksum • perldoc -f vec 19
  • example: one million bits!
        ## create a 125,001 byte vector
        my $bit_vector = '';
        (vec $bit_vector, 1_000_000, 1) = 1;
        ## stringify: "00000...1"
        my $bits = unpack 'b*', $bit_vector;
        ## listify: (0,0,0,...,1)
        my @bits = split //, unpack 'b*', $bit_vector;
        ## how many bits are on?
        my $on_bits = unpack '%32b*', $bit_vector;
    • the 1000001st through 1000008th bits are free! 20
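The same operations on a vector small enough to inspect by eye. This is a sketch with made-up bit positions (1, 2 and 3), plus a round trip back through pack that the slide only mentions in passing.

```perl
use strict;
use warnings;

my $bit_vector = '';
vec($bit_vector, $_, 1) = 1 for (1, 2, 3);   # turn on bits 1..3

my $bits    = unpack 'b*', $bit_vector;       # string form, bit 0 first
my @bits    = split //, $bits;                # list form
my $on_bits = unpack '%32b*', $bit_vector;    # checksum counts the set bits

print "$bits has $on_bits bits on\n";

# And back again: a string of 0s and 1s to a vector.
my $round_trip = pack 'b*', $bits;
print $round_trip eq $bit_vector ? "round trip ok\n" : "mismatch\n";
```

This should print `01110000 has 3 bits on` and `round trip ok`. The string is eight characters long because the vector is always a whole number of bytes.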
  • lvalue substr() 21
  • • (or 4-argument substr) • magic: no realloc iff replacement length == original length • sprintf also might work, depending... 22
  • example: Sys::Mmap
        mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
        $shared = meaning_of_life();
        munmap($shared);
    • 7.5 million years’ work down the tubes!
        mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
        (substr $shared, 0, 4) = pack 'L', meaning_of_life();
        munmap($shared); 23
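The in-place overwrite can be tried without Sys::Mmap. In this sketch an ordinary 8-byte scalar stands in for the mapped region (an assumption made so the example runs anywhere; it is not real shared memory), and both the lvalue and the 4-argument forms of substr are shown.

```perl
use strict;
use warnings;

# Stand-in for a mapped region: a plain 8-byte buffer.
my $buffer = "\0" x 8;

# Plain assignment would replace the whole scalar; these overwrite
# 4 bytes in place, and the replacement is the same length as the original.
substr($buffer, 0, 4) = pack 'L', 42;   # lvalue substr
substr($buffer, 4, 4, pack 'L', 7);     # 4-argument substr

my @values = unpack 'L2', $buffer;
print "@values (", length($buffer), " bytes)\n";
```

This should print `42 7 (8 bytes)`: the buffer keeps its length, and both slots hold the packed values.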
  • use bytes 24
  • use bytes • binary data + DWIM + unicode • ouch! • pragma to the rescue: “No matter what you think might be in this PV, do not cleverly switch to character semantics when I’m not looking.” • pack/unpack themselves don’t care, it’s things like length and substr 25
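A small sketch of the difference the pragma makes to length. The sample string is made up; it contains one character above 0xFF, so Perl stores it internally as UTF-8.

```perl
use strict;
use warnings;

my $string = "caf\x{e9} \x{263A}";   # 6 characters, one of them above 0xFF

my $chars = length $string;
my $bytes = do { use bytes; length $string };   # bytes of the internal UTF-8 form

print "$chars characters, $bytes bytes\n";
```

This should print `6 characters, 9 bytes`. The pragma is lexically scoped, so only the length call inside the do block switches to byte semantics.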
  • eat a snack please come back 26
  • horrific abuses • think like a C programmer • serialization tricks • lazy perlification 27
  • think like a C programmer 28
  • typedef struct TWO_THINGS { char a; char b; } two_things;
    two_things things;
    two_things lots_of_things[1000];
    • where is things.a? at &things.
    • where is things.b? one byte later: *((char *)&things + 1).
    • where is lots_of_things[2].b? (char *)lots_of_things + (2 * sizeof(two_things)) + 1.
    • where is the point? next slide. 29
  • Readonly my $FORMAT => 'cc';
    my $things = pack $FORMAT;
    my $lots_of_things = pack "($FORMAT)1000";
    • where is $things.a? unpack 'cx', $things;
    • where is $things.b? unpack 'xc', $things;
    • where is $lots_of_things[2].b? unpack '(xx)2xc', $lots_of_things 30
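The offset arithmetic from these two slides, as a runnable sketch. The record contents (a = i mod 100, b = its negation) are invented so the result can be checked, and a plain `my` variable replaces Readonly to avoid the non-core dependency.

```perl
use strict;
use warnings;

my $FORMAT = 'cc';   # one record: two signed chars (a, b)

# Hypothetical data: record $i holds a = $i % 100 and b = -($i % 100).
my $lots_of_things = pack "($FORMAT)1000",
    map { ($_ % 100, -($_ % 100)) } 0 .. 999;

my $record_size = length pack $FORMAT, 0, 0;   # 2 bytes per record

# lots_of_things[2].b: skip two whole records plus field a, then read one char.
my $index   = 2;
my $field_b = unpack 'x' . ($index * $record_size + 1) . 'c', $lots_of_things;

# The grouped template from the slide computes the same offset.
my $same = unpack '(xx)2xc', $lots_of_things;

print "lots_of_things[$index].b = $field_b (grouped template: $same)\n";
```

This should print `lots_of_things[2].b = -2 (grouped template: -2)`.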
  • • bytes, bytes, bytes on the brain • byte offsets a natural way of thinking about working with data • “language neutral” is just a cute way of saying “C” 31
  • • “strong typing” the roundabout way • unpack() == C cast: “I, programmer, assure you, language, that these bytes contain precisely data of this type, and I will live with the consequences if I’m wrong.” 32
  • example: SEGV! my $bar = unpack 'P', 'asdf'; • god, I miss pointers sometimes • (but not right now) 33
  • No pointers in Perl 34
  • but... • we are not writing C • because down that road lies madness • still, its siren song is hard to resist... 35
  • serialization tricks 36
  • space efficiency • Storable: general-purpose • what does that mean? • if you’re thinking like a C programmer, maybe you can do better... 37
  • example: array of shorts
        @shorts = map {int((rand 256)-128)} (1..10000);
        ## 20,000 bytes: 2 bytes per element
        $packed = pack 's*', @shorts;
        ## 20,016 bytes: 2 bytes per element
        $stored = Storable::freeze(\@shorts);
        ## harmlessly examine contents of @shorts...
        print "$_\n" for @shorts;
        ## roughly 46,000 bytes: ???
        $stored = Storable::freeze(\@shorts);
    • Extra credit: deserialize just $shorts[2113]... 38
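One possible answer to the extra-credit question, offered as a sketch and not as the author's solution: because every element is exactly two bytes wide, an 'x' skip count is enough to reach a single element without unpacking the rest.

```perl
use strict;
use warnings;

my @shorts = map { int(rand(256)) - 128 } 1 .. 10_000;
my $packed = pack 's*', @shorts;   # 2 bytes per element

# Skip 2113 two-byte elements, then read exactly one short.
my $index = 2113;
my $one   = unpack 'x' . (2 * $index) . ' s', $packed;

print length($packed), " bytes; element $index is $one\n";
print "matches the original array\n" if $one == $shorts[$index];
```

The fixed element width is what makes this work: the byte offset is a simple multiplication.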
  • fixed width • depending on what you’re serializing • interesting properties • more in a bit 39
  • keyless hashes • when a hash is really a struct/record • thinking like a C programmer again! • serialize bags of them without bags of redundant copies of their keys 40
  • idiom
        ## shape of the "structure" and format are
        ## passed or encoded separately
        Readonly my $TEMPLATE => 'VVC';
        Readonly my @FIELDS => qw(thing1 thing2 kite);
        ## get the bytes
        my $bytes = get_from_somewhere();
        ## unpack via hash slice FTW!
        my %thing;
        @thing{@FIELDS} = unpack $TEMPLATE, $bytes; 41
  • example: keyless hash
        my @records = map { {
            thing1 => int rand 4294967296,
            thing2 => int rand 4294967296,
            kite => int rand 255,
        } } (1 .. 10000);
        ## 90,000 bytes: 9 bytes per record
        my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;
        ## roughly 544,000 bytes: 54 bytes per record
        my $stored = Storable::freeze(\@records); 42
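A sketch of the full round trip for the keyless-hash idea, with two hand-written records in place of the random ones so the result can be checked. The deserializing loop is an assumption about how one might read the records back; the slides only show the packing side. Plain `my` variables replace Readonly to avoid the non-core dependency.

```perl
use strict;
use warnings;

my $TEMPLATE = 'VVC';                      # two 32-bit little-endian ints + one byte
my @FIELDS   = qw(thing1 thing2 kite);

my @records = (
    { thing1 => 1, thing2 => 2,          kite => 3 },
    { thing1 => 4, thing2 => 4294967295, kite => 254 },
);

# Serialize: values only, in @FIELDS order; the keys are never stored.
my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;

# Deserialize: walk the bytes one fixed-size record at a time.
my $record_size = length pack $TEMPLATE, (0) x @FIELDS;   # 9 bytes
my @restored;
for (my $offset = 0; $offset < length($packed); $offset += $record_size) {
    my %thing;
    @thing{@FIELDS} = unpack $TEMPLATE, substr($packed, $offset, $record_size);
    push @restored, \%thing;
}

print length($packed), " bytes, ", scalar(@restored), " records\n";
```

This should print `18 bytes, 2 records`. The template and the field list have to travel together, since nothing in the bytes says which value is which.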
  • lazy perlification 43
  • • for transient bytes e.g. from key-value storage • for sparse algorithms e.g. binary search • otherwise, don’t do this! • or at least, don’t blame me 44
  • example: filtering • problem scale: 100k x 20k x 100 • idea 1: regular expressions! • idea 2: binary search, of course! • idea 3: binary search + lazy perlification 45
  • serializing 46
  • deserializing 47
  • searching 48
  • lazy binary search
        pack('Ca*', $size, pack("(Z$size)*", @sorted_haystack));
        $size = unpack('C', ${$frozen_haystack_ref});
        $format = 'Z' . $size;
        ...
        $element = unpack('x' . ($size * $mid + 1) . $format, ${$frozen_haystack_ref});
        $cmp = $element cmp $needle;
        ... 49
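The fragments on this slide leave the loop elided. Below is one way the pieces might fit together; it is a sketch under assumptions, not the author's code. The sub names `freeze_haystack` and `lazy_search` are hypothetical, the field width is taken as the longest string plus one byte for the trailing NUL, and the input is assumed to be non-empty with no string longer than 254 bytes (the width must fit in the leading 'C' byte).

```perl
use strict;
use warnings;
use List::Util qw(max);

# Freeze a list into one string: a size byte, then sorted fixed-width fields.
sub freeze_haystack {
    my @sorted = sort @_;
    my $size   = 1 + max map { length } @sorted;   # +1 leaves room for the NUL
    return pack 'Ca*', $size, pack "(Z$size)*", @sorted;
}

# Binary search that unpacks only the elements it actually compares.
sub lazy_search {
    my ($frozen_haystack_ref, $needle) = @_;
    my $size   = unpack 'C', ${$frozen_haystack_ref};
    my $format = 'Z' . $size;
    my ($low, $high) = (0, (length(${$frozen_haystack_ref}) - 1) / $size - 1);

    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        # Skip the size byte and $mid whole fields, then perlify one element.
        my $element = unpack 'x' . ($size * $mid + 1) . $format,
            ${$frozen_haystack_ref};
        my $cmp = $element cmp $needle;
        return $mid if $cmp == 0;
        if ($cmp < 0) { $low = $mid + 1 } else { $high = $mid - 1 }
    }
    return;   # not found
}

my $frozen = freeze_haystack(qw(pecan chess lemon shoofly));
my $found  = lazy_search(\$frozen, 'lemon');
print defined $found ? "found at index $found\n" : "not found\n";
```

With this input the sorted order is chess, lemon, pecan, shoofly, so the script should print `found at index 1`. Each lookup touches about log2(n) fields; the rest of the frozen string is never turned into Perl scalars.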
  • summary • bytes, bytes, bytes • “Premature optimization is the root of all evil.” -- Donald Knuth 50
  • ? j.david.lowe@gmail.com twitter.com/j_david_lowe dlowe-wfh.blogspot.com slideshare 51