How to hack with pack and unpack


crash course
standard uses
synergies within perl
horrific abuses




crash course

what’s pack?

• like sprintf()
• only for bytes, not for presentation
• template rules very complex
• DWIM: packs empty st...
what’s unpack?

• like sscanf() for bytes
• (not really: we’ll come back to it)
• (mostly) identical template rules to
  p...
  my $fourbytes = pack 'L', 12;
  my $twelve = unpack 'L', $fourbytes;




• perldoc -f pack
• perldoc -f unpack
• perldoc p...
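
A minimal round-trip sketch of the example above, assuming nothing beyond core Perl. The 'N' and 'V' lines are an addition for illustration: they show the same value with an explicit byte order, whereas 'L' uses whatever order the machine has.

```perl
use strict;
use warnings;

# 'L' packs an unsigned 32-bit integer in the machine's native byte order.
my $fourbytes = pack 'L', 12;
my $twelve    = unpack 'L', $fourbytes;

printf "%d bytes, value %d\n", length($fourbytes), $twelve;   # 4 bytes, value 12

# 'N' (big-endian) and 'V' (little-endian) fix the byte order explicitly.
my $big    = unpack 'H*', pack('N', 12);   # 0000000c
my $little = unpack 'H*', pack('V', 12);   # 0c000000
print "N: $big\nV: $little\n";
```
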
standard uses



someone else’s bytes
fixed-width parsing




someone else’s bytes

FFI
XS
C


6

6
SIX BYTES ON THAT LAST SLIDE!

but seriously...


• struct alignment issues will ruin your
  day
• change the XS and/or C if you can
• Convert::Binary::C...
syscall



• I have never had to do this...
• perldoc -f syscall



“close to the metal”


• network protocols
• binary file formats
• bytes are language neutral


fixed-width parsing

• no sscanf() in perl
• substr or regexes...
• unpack is a bit nicer (not much)




example: contrived pie
   chess       8
   pecan       7
   shaker lemon4
   shoo fly    10


   $pie           = substr $...
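
A small sketch of the three approaches on one of the pie lines, to show where they differ. The variable names $by_substr and $by_regex are made up for this illustration; the sample line is built with sprintf so the 12-column padding is explicit.

```perl
use strict;
use warnings;

# 12-column name field followed by the score, as in the pie table
my $line = sprintf '%-12s%s', 'chess', 8;

# substr and the regex both keep the padding in the name field
my $by_substr  = substr $line, 0, 12;
my ($by_regex) = $line =~ m/(.{12})(.*)/;

# 'A12' reads 12 bytes and strips trailing spaces; 'A*' takes the rest
my ($pie, $deliciousness) = unpack 'A12 A*', $line;

printf "substr: %d chars, regex: %d chars, unpack: %d chars\n",
    length($by_substr), length($by_regex), length($pie);
# substr: 12 chars, regex: 12 chars, unpack: 5 chars
```

The stripped padding is the "meaningless whitespace" difference: with 'A', unpack hands back the name without its trailing spaces, while substr and the regex return the field exactly as stored.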
synergies


vec()
lvalue substr()
use bytes




vec()
         0000001
        00000011
         0000001
        00000011
         0000001
         0000001
         00000...
• vec(): treat a scalar as an arbitrary
  length bit vector

• (you’re not using numbers, are you?)
• pack and unpack ‘b’ ...
example: one million
        bits!
  ## create a 125,001 byte vector
  my $bit_vector = '';
  (vec $bit_vector, 1_000_000,...
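
A runnable sketch of the bit-vector idea, assuming only core Perl. The second set bit (index 3) and the $head variable are additions for illustration, so that the count and the 'b*' string have something visible in them.

```perl
use strict;
use warnings;

my $bit_vector = '';
vec($bit_vector, 1_000_000, 1) = 1;   # setting bit index 1,000,000 grows the string
vec($bit_vector, 3, 1)         = 1;   # a second bit, near the start

my $bytes   = length $bit_vector;                       # 125001
my $on_bits = unpack '%32b*', $bit_vector;              # 2
my $head    = substr(unpack('b*', $bit_vector), 0, 8);  # 00010000

print "$bytes bytes, $on_bits bits on, first byte as bits: $head\n";
```

'b' lists bits in ascending order within each byte, which matches the way vec() numbers them, so bit index 3 shows up as the fourth character of the string.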
lvalue substr()




• (or 4-argument substr)
• magic: no realloc iff replacement
  length == original length

• sprintf also might work, depen...
example: Sys::Mmap
   mmap($shared, 4, PROT_READ|PROT_WRITE,
     MAP_SHARED, $filehandle) or die $!;

   $shared = meanin...
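
A sketch of the same idea without Sys::Mmap, which is not a core module. Here $buffer is only a stand-in for a fixed-size shared region, and the values 42 and 7 are arbitrary. The point is that writing through substr overwrites bytes of the existing string, whereas plain assignment would replace the whole value.

```perl
use strict;
use warnings;

my $buffer = "\0" x 8;   # stand-in for a fixed-size region (assumption)

# lvalue substr: overwrite bytes 0..3 with a packed 32-bit value
substr($buffer, 0, 4) = pack 'L', 42;

# 4-argument substr: the same kind of write at byte offset 4
substr($buffer, 4, 4, pack('L', 7));

my ($first, $second) = unpack 'L2', $buffer;
my $size = length $buffer;
print "$first $second ($size bytes)\n";   # 42 7 (8 bytes)
```
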
use bytes




use bytes
• binary data + DWIM + unicode
• ouch!
• pragma to the rescue: “No matter
  what you think might be in this PV, ...
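
A minimal sketch of what the pragma changes, using length as the example. The smiley character is an arbitrary choice; any character above U+00FF would do.

```perl
use strict;
use warnings;

my $smiley = "\x{263A}";   # one character, stored internally as three UTF-8 bytes

my $chars = length $smiley;        # 1: character semantics
my $octets;
{
    use bytes;                     # lexically scoped to this block
    $octets = length $smiley;      # 3: byte semantics
}

print "$chars character, $octets bytes\n";   # 1 character, 3 bytes
```

Note that the current perldoc for the bytes pragma discourages using it for anything other than debugging, so treat this as an illustration of the slide rather than a recommendation.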
eat a snack




please come back


horrific abuses


think like a C programmer
serialization tricks
lazy perlification




think like a C programmer

typedef struct TWO_THINGS {
     char a;
     char b;
 } two_things;

 two_things things;

 two_things lots_of_things[1000...
Readonly my $FORMAT => 'cc';

  my $things         = pack $FORMAT;
  my $lots_of_things = pack "($FORMAT)1000";




• wher...
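
A small sketch of the struct-offset idea with three records instead of 1000, so the numbers are easy to check. It uses a plain scalar for the format rather than Readonly (a CPAN module), and the field values 1..6 are arbitrary.

```perl
use strict;
use warnings;

my $FORMAT = 'cc';   # one record: two signed chars (a, b)

# three records laid out back to back: six bytes in total
my $lots_of_things = pack "($FORMAT)3", 1, 2, 3, 4, 5, 6;

# "lots_of_things[2].b": skip two whole records, skip field a, read field b
my ($b_of_third) = unpack '(xx)2 x c', $lots_of_things;   # 6

# whole record at index 1: skip one two-byte record, then read both fields
my ($a1, $b1) = unpack "x2 $FORMAT", $lots_of_things;     # 3, 4

print "third.b = $b_of_third; second = ($a1, $b1)\n";
```
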
• bytes, bytes, bytes on the brain
• byte offsets a natural way of thinking
  about working with data

• “language neutral...
• “strong typing” the roundabout way
• unpack() == C cast: “I, programmer,
  assure you, language, that these bytes
  cont...
example: SEGV!

  my $bar = unpack 'P', 'asdf';




• god, I miss pointers sometimes
• (but not right now)



No pointers in Perl



but...

• we are not writing C
• because down that road lies madness
• still, its siren song is hard to resist...



serialization tricks




space efficiency

• Storable: general-purpose
• what does that mean?
• if you’re thinking like a C
  programmer, maybe you ...
example: array of shorts
  @shorts = map {int((rand 256)-128)} (1..10000);

  ## 20,000 bytes: 2 bytes per element
  $pack...
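
One possible answer to the "deserialize just one element" question, sketched under the assumption that the data was packed with 's*' (two bytes per element). Because every element has the same width, the byte offset of element 2113 is simply 2 * 2113.

```perl
use strict;
use warnings;

my @shorts = map { int((rand 256) - 128) } (1 .. 10000);
my $packed = pack 's*', @shorts;          # 2 bytes per element

# jump straight to element 2113 without unpacking the rest
my ($one) = unpack 'x' . (2 * 2113) . ' s', $packed;

print length($packed), " bytes; element 2113 is $one\n";
```
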
fixed width


• depending on what you’re serializing
• interesting properties
• more in a bit


keyless hashes


• when a hash is really a struct/record
• thinking like a C programmer again!
• serialize bags of them wi...
idiom
## shape of the “structure” and format are
## passed or encoded separately
Readonly my $TEMPLATE => 'VVC';
Readonly ...
example: keyless hash
my @records = map {
    { thing1 => int rand 4294967296,
      thing2 => int rand 4294967296,
      ...
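
A compact sketch of the keyless-hash idiom with two hand-written records, so the result can be checked by eye. It uses plain variables instead of Readonly, and the record values are arbitrary.

```perl
use strict;
use warnings;

my $TEMPLATE = 'VVC';                      # 4 + 4 + 1 = 9 bytes per record
my @FIELDS   = qw(thing1 thing2 kite);

my @records = (
    { thing1 => 1, thing2 => 2,          kite => 3 },
    { thing1 => 7, thing2 => 4294967295, kite => 255 },
);

# serialize: values only, in @FIELDS order, no keys stored
my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;

# deserialize record 1 only: skip one 9-byte record, then unpack via hash slice
my %thing;
@thing{@FIELDS} = unpack "x9 $TEMPLATE", $packed;

print length($packed), " bytes; kite of record 1 is $thing{kite}\n";   # 18 bytes; kite of record 1 is 255
```
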
lazy perlification




• for transient bytes e.g. from key-value
  storage
• for sparse algorithms e.g. binary
  search
• otherwise, don’t do thi...
example: filtering

• problem scale: 100k x 20k x 100
• idea 1: regular expressions!
• idea 2: binary search, of course!
• ...
serializing




deserializing

searching

lazy binary search
pack('Ca*', $size,
    pack("(Z$size)*", @sorted_haystack));




$size   = unpack('C', ${$frozen_haysta...
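
A self-contained sketch of how the fragments above might fit together. The sub names freeze_haystack and frozen_contains, the loop bounds, and the sample words are assumptions made for this illustration, not the author's code. Note that 'Z' always reserves the last byte of each field for a NUL, so with a size of 8 the strings must be at most 7 bytes long.

```perl
use strict;
use warnings;

# Freeze: one size byte, then every string as a NUL-padded $size-byte field.
sub freeze_haystack {
    my ($size, @sorted) = @_;
    return pack 'Ca*', $size, pack("(Z$size)*", @sorted);
}

# Binary search over the frozen bytes, unpacking only the elements compared.
sub frozen_contains {
    my ($frozen_ref, $needle) = @_;
    my $size   = unpack 'C', ${$frozen_ref};
    my $format = 'Z' . $size;
    my ($lo, $hi) = (0, (length(${$frozen_ref}) - 1) / $size - 1);
    while ($lo <= $hi) {
        my $mid     = int(($lo + $hi) / 2);
        my $element = unpack 'x' . ($size * $mid + 1) . $format, ${$frozen_ref};
        my $cmp     = $element cmp $needle;
        return 1 if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return 0;
}

my $frozen = freeze_haystack(8, sort qw(pecan chess shoofly lemon));
print frozen_contains(\$frozen, 'lemon') ? "found\n" : "missing\n";   # found
print frozen_contains(\$frozen, 'apple') ? "found\n" : "missing\n";   # missing
```

The laziness is in the search loop: only about log2(n) elements are ever turned into Perl scalars, and the rest of the haystack stays as one string of bytes.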
summary


• bytes, bytes, bytes
• “Premature optimization is the root of
  all evil.” -- Donald Knuth




?

j.david.lowe@gmail.com
twitter.com/j_david_lowe
dlowe-wfh.blogspot.com
slideshare



  • talk about: ‘A’, ‘A12’, ‘A*’ and the meaningless whitespace...



































  • how to hack with pack and unpack

    1. How to hack with pack and unpack
    2. crash course, standard uses, synergies within perl, horrific abuses
    3. crash course
    4. what’s pack? • like sprintf() • only for bytes, not for presentation • template rules very complex • DWIM: packs empty strings for missing arguments
    5. what’s unpack? • like sscanf() for bytes • (not really: we’ll come back to it) • (mostly) identical template rules to pack • dies if it runs out of input bytes
    6. my $fourbytes = pack 'L', 12; my $twelve = unpack 'L', $fourbytes; • perldoc -f pack • perldoc -f unpack • perldoc perlpacktut
    7. standard uses: someone else’s bytes, fixed-width parsing
    8. someone else’s bytes
    9. FFI XS C
    10. 6
    11. 6 SIX BYTES ON THAT LAST SLIDE!
    12. but seriously... • struct alignment issues will ruin your day • change the XS and/or C if you can • Convert::Binary::C to the rescue!
    13. syscall • I have never had to do this... • perldoc -f syscall
    14. “close to the metal” • network protocols • binary file formats • bytes are language neutral
    15. fixed-width parsing
    16. • no sscanf() in perl • substr or regexes... • unpack is a bit nicer (not much)
    17. example: contrived pie
        chess       8
        pecan       7
        shaker lemon4
        shoo fly    10
        $pie = substr $_, 0, 12;
        $deliciousness = substr $_, 12;
        ($pie, $deliciousness) = m/(.{12})(.*)/;
        ($pie, $deliciousness) = unpack 'A12 A*', $_;
        • not quite identical...
    18. synergies: vec(), lvalue substr(), use bytes
    19. vec() 0000001 00000011 0000001 00000011 0000001 0000001 0000001 0000001
    20. • vec(): treat a scalar as an arbitrary length bit vector • (you’re not using numbers, are you?) • pack and unpack 'b' template is perfect for working with the vector as a whole • convert vectors to and from strings “011100” or lists (0,1,1,1,0,0) • count bits with unpack checksum • perldoc -f vec
    21. example: one million bits!
        ## create a 125,001 byte vector
        my $bit_vector = '';
        (vec $bit_vector, 1_000_000, 1) = 1;
        ## stringify: “00000...1”
        my $bits = unpack 'b*', $bit_vector;
        ## listify: (0,0,0,...,1)
        my @bits = split //, unpack 'b*', $bit_vector;
        ## how many bits are on?
        my $on_bits = unpack '%32b*', $bit_vector;
        • the 1000001st through 1000008th bits are free!
    22. lvalue substr()
    23. • (or 4-argument substr) • magic: no realloc iff replacement length == original length • sprintf also might work, depending...
    24. example: Sys::Mmap
        mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
        $shared = meaning_of_life();
        munmap($shared);
        • 7.5 million years’ work down the tubes!
        mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!;
        (substr $shared, 0, 4) = pack 'L', meaning_of_life();
        munmap($shared);
    25. use bytes
    26. use bytes • binary data + DWIM + unicode • ouch! • pragma to the rescue: “No matter what you think might be in this PV, do not cleverly switch to character semantics when I’m not looking.” • pack/unpack themselves don’t care, it’s things like length and substr
    27. eat a snack, please come back
    28. horrific abuses: think like a C programmer, serialization tricks, lazy perlification
    29. think like a C programmer
    30. typedef struct TWO_THINGS { char a; char b; } two_things; two_things things; two_things lots_of_things[1000];
        (addresses below are byte addresses)
        • where is things.a? (char *)&things.
        • where is things.b? (char *)&things + 1.
        • where is lots_of_things[2].b? (char *)lots_of_things + (2 * sizeof(two_things)) + 1.
        • where is the point? next slide.
    31. Readonly my $FORMAT => 'cc'; my $things = pack $FORMAT; my $lots_of_things = pack "($FORMAT)1000";
        • where is $things.a? unpack 'cx', $things;
        • where is $things.b? unpack 'xc', $things;
        • where is $lots_of_things[2].b? unpack '(xx)2xc', $lots_of_things
    32. • bytes, bytes, bytes on the brain • byte offsets a natural way of thinking about working with data • “language neutral” is just a cute way of saying “C”
    33. • “strong typing” the roundabout way • unpack() == C cast: “I, programmer, assure you, language, that these bytes contain precisely data of this type, and I will live with the consequences if I’m wrong.”
    34. example: SEGV! my $bar = unpack 'P', 'asdf'; • god, I miss pointers sometimes • (but not right now)
    35. No pointers in Perl
    36. No pointers in Perl
    37. but... • we are not writing C • because down that road lies madness • still, its siren song is hard to resist...
    38. serialization tricks
    39. space efficiency • Storable: general-purpose • what does that mean? • if you’re thinking like a C programmer, maybe you can do better...
    40. example: array of shorts
        @shorts = map {int((rand 256)-128)} (1..10000);
        ## 20,000 bytes: 2 bytes per element
        $packed = pack 's*', @shorts;
        ## 20,016 bytes: 2 bytes per element
        $stored = Storable::freeze(\@shorts);
        ## harmlessly examine contents of @shorts...
        print "$_\n" for @shorts;
        ## roughly 46,000 bytes: ???
        $stored = Storable::freeze(\@shorts);
        • Extra credit: deserialize just $shorts[2113]...
    41. fixed width • depending on what you’re serializing • interesting properties • more in a bit
    42. keyless hashes • when a hash is really a struct/record • thinking like a C programmer again! • serialize bags of them without bags of redundant copies of their keys
    43. idiom
        ## shape of the “structure” and format are
        ## passed or encoded separately
        Readonly my $TEMPLATE => 'VVC';
        Readonly my @FIELDS => qw(thing1 thing2 kite);
        ## get the bytes
        my $bytes = get_from_somewhere();
        ## unpack via hash slice FTW!
        my %thing;
        @thing{@FIELDS} = unpack $TEMPLATE, $bytes;
    44. example: keyless hash
        my @records = map { { thing1 => int rand 4294967296, thing2 => int rand 4294967296, kite => int rand 255, } } (1 .. 10000);
        ## 90,000 bytes: 9 bytes per record
        my $packed = pack "($TEMPLATE)*", map { @{$_}{@FIELDS} } @records;
        ## roughly 544,000 bytes: 54 bytes per record
        my $stored = Storable::freeze(\@records);
    45. lazy perlification
    46. • for transient bytes e.g. from key-value storage • for sparse algorithms e.g. binary search • otherwise, don’t do this! • or at least, don’t blame me
    47. example: filtering • problem scale: 100k x 20k x 100 • idea 1: regular expressions! • idea 2: binary search, of course! • idea 3: binary search + lazy perlification
    48. serializing
    49. deserializing
    50. searching
    51. lazy binary search
        pack('Ca*', $size, pack("(Z$size)*", @sorted_haystack));
        $size = unpack('C', ${$frozen_haystack_ref});
        $format = 'Z' . $size;
        ...
        $element = unpack('x' . ($size * $mid + 1) . $format, ${$frozen_haystack_ref});
        $cmp = $element cmp $needle;
        ...
    52. summary • bytes, bytes, bytes • “Premature optimization is the root of all evil.” -- Donald Knuth
    53. ? j.david.lowe@gmail.com twitter.com/j_david_lowe dlowe-wfh.blogspot.com slideshare