
how to hack with pack and unpack

Published in: Technology, Sports

  1. How to hack with pack and unpack
  2. • crash course • standard uses • synergies within perl • horrific abuses
  3. crash course
  4. what’s pack? • like sprintf() • only for bytes, not for presentation • template rules very complex • DWIM: packs empty strings for missing arguments
  5. what’s unpack? • like sscanf() for bytes • (not really: we’ll come back to it) • (mostly) identical template rules to pack • can die if it runs out of input bytes (e.g. an 'x' skip past the end of the string)
  6. my $fourbytes = pack 'L', 12; my $twelve = unpack 'L', $fourbytes; • perldoc -f pack • perldoc -f unpack • perldoc perlpacktut
  7. standard uses • someone else’s bytes • fixed-width parsing
  8. someone else’s bytes
  9. • FFI • XS • C
  10. 6 • SIX BYTES ON THAT LAST SLIDE!
  12. but seriously... • struct alignment issues will ruin your day • change the XS and/or C if you can • Convert::Binary::C to the rescue!
  13. syscall • I have never had to do this... • perldoc -f syscall
  14. “close to the metal” • network protocols • binary file formats • bytes are language neutral
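
To make the "network protocols" bullet concrete, here is a minimal sketch of packing and parsing a packet header. The 8-byte layout (16-bit type, 16-bit flags, 32-bit payload length, all big-endian) and the variable names are invented for illustration and do not come from any real protocol.

```perl
use strict;
use warnings;

# Hypothetical header layout, assumed for this example only:
# 16-bit type, 16-bit flags, 32-bit payload length, all big-endian.
my $HEADER       = 'n n N';
my $HEADER_BYTES = 8;

my $payload = 'hello';
my $packet  = pack("$HEADER a*", 1, 0x8000, length($payload), $payload);

# The receiver reads the header fields, then skips past them by byte offset
# to pull out exactly as many payload bytes as the header announced.
my ($type, $flags, $len) = unpack($HEADER, $packet);
my $body = unpack("x$HEADER_BYTES a$len", $packet);

printf "type=%d flags=0x%04x len=%d body=%s\n", $type, $flags, $len, $body;
```

The explicit 'n' and 'N' codes pin the byte order, which is what makes the bytes portable between machines; 'L' and 'S' would use whatever the local CPU prefers.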
  15. fixed-width parsing
  16. • no sscanf() in perl • substr or regexes... • unpack is a bit nicer (not much)
  17. example: contrived pie
          chess       8
          pecan       7
          shaker lemon4
          shoo fly    10
          $pie = substr $_, 0, 12; $deliciousness = substr $_, 12;
          ($pie, $deliciousness) = m/(.{12})(.*)/;
          ($pie, $deliciousness) = unpack 'A12 A*', $_;
      • not quite identical...
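
One way the three approaches above are "not quite identical" can be seen in a short sketch: substr and the regex return the field with its padding, while the 'A' template strips trailing spaces. The input line is built with sprintf so the 12-column width is explicit; the variable names are illustrative.

```perl
use strict;
use warnings;

# One fixed-width record: a 12-column name field followed by the score
my $line = sprintf '%-12s%s', 'chess', 8;

my $by_substr            = substr $line, 0, 12;
my ($by_regex)           = $line =~ m/(.{12})/;
my ($by_unpack, $score)  = unpack 'A12 A*', $line;

# substr and the regex keep the padding; unpack's 'A' strips trailing spaces
print "[$by_substr] [$by_regex] [$by_unpack] [$score]\n";
```

If the padding matters, the 'a' template keeps it, so `unpack 'a12 a*'` would match the substr result.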
  18. synergies • vec() • lvalue substr() • use bytes
  19. vec() 0000001 00000011 0000001 00000011 0000001 0000001 0000001 0000001
  20. • vec(): treat a scalar as an arbitrary length bit vector • (you’re not using numbers, are you?) • pack and unpack ‘b’ template is perfect for working with the vector as a whole • convert vectors to and from strings “011100” or lists (0,1,1,1,0,0) • count bits with unpack checksum • perldoc -f vec
  21. example: one million bits!
          ## create a 125,001 byte vector
          my $bit_vector = '';
          (vec $bit_vector, 1_000_000, 1) = 1;
          ## stringify: "00000...10000000" (padded out to a whole byte)
          my $bits = unpack 'b*', $bit_vector;
          ## listify: (0,0,0,...,1,0,0,0,0,0,0,0)
          my @bits = split //, unpack 'b*', $bit_vector;
          ## how many bits are on?
          my $on_bits = unpack '%32b*', $bit_vector;
      • the 1000002nd through 1000008th bits are free!
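
The same operations are easier to check by eye on a tiny vector. This sketch reuses the “011100” string from the vec() slide; the variable names are illustrative.

```perl
use strict;
use warnings;

my $vector = '';
vec($vector, $_, 1) = 1 for (1, 2, 3);     # turn on bits 1 through 3

my $string = unpack 'b6', $vector;         # first six bits as a string
my @list   = split //, $string;            # the same bits as a list
my $count  = unpack '%32b*', $vector;      # checksum of the bits = bits on

my $rebuilt = pack 'b*', $string;          # string back to a bit vector

print "$string (@list) $count bits on\n";
```

The 'b' template uses ascending bit order within each byte, which is the same numbering vec() uses for 1-bit elements, so the two views line up.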
  22. lvalue substr()
  23. • (or 4-argument substr) • magic: no realloc iff replacement length == original length • sprintf also might work, depending...
  24. example: Sys::Mmap
          mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
          $shared = meaning_of_life();
          munmap($shared);
      • 7.5 million years’ work down the tubes!
          mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
          (substr $shared, 0, 4) = pack 'L', meaning_of_life();
          munmap($shared);
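
The in-place write can be tried without Sys::Mmap. In this sketch an ordinary 8-byte scalar stands in for the mapped region (an assumption made so the example runs anywhere), and both the lvalue form and the 4-argument form of substr are shown.

```perl
use strict;
use warnings;

# An ordinary scalar standing in for the mapped region (assumption:
# no real mmap here, only the same overwrite-in-place pattern)
my $buffer = "\0" x 8;

(substr $buffer, 0, 4) = pack('L', 42);    # lvalue substr
substr($buffer, 4, 4, pack('L', 7));       # 4-argument substr

my @values = unpack 'L2', $buffer;

print "length=", length($buffer), " values=@values\n";
```

Both writes replace exactly four bytes with four bytes, so the scalar keeps its length; that same-length condition is the one the previous slide ties to avoiding a realloc.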
  25. use bytes
  26. use bytes • binary data + DWIM + unicode • ouch! • pragma to the rescue: “No matter what you think might be in this PV, do not cleverly switch to character semantics when I’m not looking.” • pack/unpack themselves don’t care, it’s things like length and substr
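
A small sketch of the length difference the slide is warning about. The sample string is made up; the Encode comparison is an addition for contrast and is not mentioned in the slides.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Five characters; the last one is above 0xFF, so Perl stores the string
# internally as UTF-8
my $text = "caf\x{e9}\x{263a}";

my $chars  = length $text;                     # character semantics
my $octets = do { use bytes; length $text };   # byte semantics, scoped to the block

# For comparison (not from the slides): encode explicitly, then measure
my $encoded = length encode('UTF-8', $text);

print "chars=$chars octets=$octets encoded=$encoded\n";
```

The pragma is lexically scoped, so wrapping it in a do block keeps byte semantics from leaking into the rest of the file.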
  27. eat a snack please come back
  28. horrific abuses • think like a C programmer • serialization tricks • lazy perlification
  29. think like a C programmer
  30. typedef struct TWO_THINGS { char a; char b; } two_things;
      two_things things;
      two_things lots_of_things[1000];
      • where is things.a? (char *)&things.
      • where is things.b? (char *)&things + 1.
      • where is lots_of_things[2].b? (char *)lots_of_things + (2 * sizeof(two_things)) + 1.
      • where is the point? next slide.
  31. Readonly my $FORMAT => 'cc'; my $things = pack $FORMAT; my $lots_of_things = pack "($FORMAT)1000"; • where is $things.a? unpack 'cx', $things; • where is $things.b? unpack 'xc', $things; • where is $lots_of_things[2].b? unpack '(xx)2xc', $lots_of_things
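
A runnable variant of the slide above, with real values packed in so the offsets can be checked. The fill pattern (element i holds i % 100 and its negative) and the variable names are invented for this sketch, and a plain `my` replaces Readonly to keep it dependency-free.

```perl
use strict;
use warnings;

my $FORMAT = 'cc';                         # struct { char a; char b; }

my $things         = pack $FORMAT, 1, 2;
my $lots_of_things = pack "($FORMAT)1000",
    map { ($_ % 100, -($_ % 100)) } 0 .. 999;   # made-up fill pattern

my $first  = unpack 'cx', $things;                   # things.a
my $second = unpack 'xc', $things;                   # things.b
my $third  = unpack '(xx)2 xc', $lots_of_things;     # lots_of_things[2].b

print "$first $second $third\n";
```

The '(xx)2' group skips two whole structs and the trailing 'x' skips field a, which is the same byte arithmetic as the C address computation on the previous slide.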
  32. • bytes, bytes, bytes on the brain • byte offsets a natural way of thinking about working with data • “language neutral” is just a cute way of saying “C”
  33. • “strong typing” the roundabout way • unpack() == C cast: “I, programmer, assure you, language, that these bytes contain precisely data of this type, and I will live with the consequences if I’m wrong.”
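
The cast analogy can be illustrated by reading one four-byte string through several templates. The value 0x41424344 is chosen only because its bytes happen to spell "ABCD"; big-endian codes are used so the result does not depend on the machine.

```perl
use strict;
use warnings;

my $bytes = pack 'N', 0x41424344;      # four bytes: 0x41 0x42 0x43 0x44

# Same bytes, four different "types"; unpack believes whatever it is told
my $as_text   = unpack 'a4', $bytes;
my @as_octets = unpack 'C4', $bytes;
my @as_shorts = unpack 'n2', $bytes;
my $as_long   = unpack 'N',  $bytes;

printf "%s | %s | %s | 0x%08x\n",
    $as_text, "@as_octets",
    join(' ', map { sprintf '0x%04x', $_ } @as_shorts), $as_long;
```

Nothing in the string records which interpretation is correct, which is the sense in which the programmer, not the language, carries the type information.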
  34. example: SEGV! my $bar = unpack 'P', 'asdf'; • god, I miss pointers sometimes • (but not right now)
  35. No pointers in Perl
  37. but... • we are not writing C • because down that road lies madness • still, its siren song is hard to resist...
  38. serialization tricks
  39. space efficiency • Storable: general-purpose • what does that mean? • if you’re thinking like a C programmer, maybe you can do better...
  40. example: array of shorts
          @shorts = map {int((rand 256)-128)} (1..10000);
          ## 20,000 bytes: 2 bytes per element
          $packed = pack 's*', @shorts;
          ## 20,016 bytes: 2 bytes per element
          $stored = Storable::freeze(\@shorts);
          ## harmlessly examine contents of @shorts...
          print "$_\n" for @shorts;
          ## roughly 46,000 bytes: ???
          $stored = Storable::freeze(\@shorts);
      • Extra credit: deserialize just $shorts[2113]...
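
One possible answer to the extra credit, sketched under the assumption that the data was packed with 's*' as on the slide: every element is exactly two bytes, so element 2113 starts at byte 4226 and can be read without unpacking anything else.

```perl
use strict;
use warnings;

my @shorts = map { int(rand(256)) - 128 } 1 .. 10_000;
my $packed = pack 's*', @shorts;           # 2 bytes per element

# Skip $index two-byte elements, then read exactly one short
my $index  = 2113;
my $offset = 2 * $index;
my $one    = unpack "x$offset s", $packed;

print "shorts[$index] = $one\n";
```

This only works because the encoding is fixed-width; a general-purpose format would have to be parsed from the front to find the same element.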
  41. fixed width • depending on what you’re serializing • interesting properties • more in a bit
  42. keyless hashes • when a hash is really a struct/record • thinking like a C programmer again! • serialize bags of them without bags of redundant copies of their keys
  43. idiom
          ## shape of the "structure" and format are
          ## passed or encoded separately
          Readonly my $TEMPLATE => 'VVC';
          Readonly my @FIELDS => qw(thing1 thing2 kite);
          ## get the bytes
          my $bytes = get_from_somewhere();
          ## unpack via hash slice FTW!
          my %thing;
          @thing{@FIELDS} = unpack $TEMPLATE, $bytes;
  44. example: keyless hash
          my @records = map { +{
              thing1 => int rand 4294967296,
              thing2 => int rand 4294967296,
              kite => int rand 255,
          } } (1 .. 10000);
          ## 90,000 bytes: 9 bytes per record
          my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;
          ## roughly 544,000 bytes: 54 bytes per record
          my $stored = Storable::freeze(\@records);
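
A sketch of the full round trip for keyless hashes, combining the hash-slice idiom with a byte-offset lookup. The two sample records and the `record_at` helper are hypothetical names introduced here, and plain `my` replaces Readonly.

```perl
use strict;
use warnings;

my $TEMPLATE     = 'VVC';
my @FIELDS       = qw(thing1 thing2 kite);
my $RECORD_BYTES = length(pack($TEMPLATE, (0) x @FIELDS));   # 4 + 4 + 1

# Made-up sample data
my @records = (
    { thing1 => 1, thing2 => 4_000_000_000, kite => 7   },
    { thing1 => 2, thing2 => 5,             kite => 254 },
);

# Serialize values only, in @FIELDS order; the keys are never stored
my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;

# Hypothetical helper: rebuild one record as a hash by skipping to its offset
sub record_at {
    my ($bytes, $index) = @_;
    my %record;
    @record{@FIELDS} = unpack 'x' . ($RECORD_BYTES * $index) . $TEMPLATE, $bytes;
    return \%record;
}

my $second = record_at($packed, 1);
print join(', ', map { "$_=$second->{$_}" } @FIELDS), "\n";
```

The field list and template have to travel together out of band; if they ever disagree, unpack will still return values, just the wrong ones.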
  45. lazy perlification
  46. • for transient bytes e.g. from key-value storage • for sparse algorithms e.g. binary search • otherwise, don’t do this! • or at least, don’t blame me
  47. example: filtering • problem scale: 100k x 20k x 100 • idea 1: regular expressions! • idea 2: binary search, of course! • idea 3: binary search + lazy perlification
  48. serializing
  49. deserializing
  50. searching
  51. lazy binary search
          pack('Ca*', $size, pack("(Z$size)*", @sorted_haystack));
          $size = unpack('C', ${$frozen_haystack_ref});
          $format = 'Z' . $size;
          ...
          $element = unpack('x' . ($size * $mid + 1) . $format, ${$frozen_haystack_ref});
          $cmp = $element cmp $needle;
          ...
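
The fragments above leave the surrounding loop elided. The following is one possible way to fill it in, not the author's actual code: `freeze_haystack` and `haystack_contains` are hypothetical names, and the sketch assumes every element is shorter than 255 bytes so the record size fits in the leading 'C' byte.

```perl
use strict;
use warnings;

# Freeze: one size byte, then every element NUL-padded to $size bytes.
# Assumes all elements are shorter than 255 bytes.
sub freeze_haystack {
    my @sorted = sort @_;
    my $size   = 1;
    for my $item (@sorted) {
        $size = length($item) + 1 if length($item) + 1 > $size;
    }
    return pack('Ca*', $size, pack("(Z$size)*", @sorted));
}

# Search: unpack only the elements the binary search actually visits
sub haystack_contains {
    my ($frozen_ref, $needle) = @_;
    my $size   = unpack 'C', ${$frozen_ref};
    my $format = 'Z' . $size;
    my ($low, $high) = (0, (length(${$frozen_ref}) - 1) / $size - 1);
    while ($low <= $high) {
        my $mid     = int(($low + $high) / 2);
        my $element = unpack 'x' . ($size * $mid + 1) . $format, ${$frozen_ref};
        my $cmp     = $element cmp $needle;
        return 1 if $cmp == 0;
        if ($cmp < 0) { $low = $mid + 1 } else { $high = $mid - 1 }
    }
    return 0;
}

my $frozen = freeze_haystack(qw(pear apple fig));
print haystack_contains(\$frozen, 'fig')  ? "found\n" : "missing\n";
print haystack_contains(\$frozen, 'kiwi') ? "found\n" : "missing\n";
```

Each lookup turns about log2(n) fixed-width slots into Perl scalars instead of all n, which is the "lazy perlification" being described.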
  52. summary • bytes, bytes, bytes • “Premature optimization is the root of all evil.” -- Donald Knuth
  53. ? • j.david.lowe@gmail.com • twitter.com/j_david_lowe • dlowe-wfh.blogspot.com • slideshare