how to hack with pack and unpack

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Notes on slide 1
















    talk about: ‘A’, ‘A12’, ‘A*’ and the meaningless whitespace...




































    Favorites, Groups & Events

    how to hack with pack and unpack - Presentation Transcript

    1. How to hack with pack and unpack 1
    2. crash course standard uses synergies within perl horrific abuses 2
    3. crash course 3
    4. what’s pack? • like sprintf() • only for bytes, not for presentation • template rules very complex • DWIM: packs empty strings for missing arguments 4
    5. what’s unpack? • like sscanf() for bytes • (not really: we’ll come back to it) • (mostly) identical template rules to pack • dies if it runs out of input bytes 5
    6. my $fourbytes = pack ‘L’, 12; my $twelve = unpack ‘L’, $fourbytes; • perldoc -f pack • perldoc -f unpack • perldoc perlpacktut 6
    7. standard uses someone else’s bytes fixed-width parsing 7
    8. someone else’s bytes 8
    9. FFI XS C 9
    10. 6 10
    11. 6 SIX BYTES ON THAT LAST SLIDE! 10
    12. but seriously... • struct alignment issues will ruin your day • change the XS and/or C if you can • Convert::Binary::C to the rescue! 11
    13. syscall • I have never had to do this... • perldoc -f syscall 12
    14. “close to the metal” • network protocols • binary file formats • bytes are language neutral 13
    15. fixed-width parsing 14
    16. • no sscanf() in perl • substr or regexes... • unpack is a bit nicer (not much) 15
    17. example: contrived pie chess 8 pecan 7 shaker lemon4 shoo fly 10 $pie = substr $_, 0, 12; $deliciousness = substr $_, 12; ($pie, $deliciousness) = m/(.{12})(.*)/; ($pie, $deliciousness) = unpack 'A12 A*', $_; • not quite identical... 16
    18. synergies vec() lvalue substr() use bytes 17
    19. vec() 0000001 00000011 0000001 00000011 0000001 0000001 0000001 0000001 18
    20. • vec(): treat a scalar as an arbitrary length bit vector • (you’re not using numbers, are you?) • pack and unpack ‘b’ template is perfect for working with the vector as a whole • convert vectors to and from from strings “011100” or lists (0,1,1,1,0,0) • count bits with unpack checksum • perldoc -f vec 19
    21. example: one million bits! ## create a 125,001 byte vector my $bit_vector = ''; (vec $bit_vector, 1_000_000, 1) = 1; ## stringify: “00000...1” my $bits = unpack 'b*', $bit_vector; ## listify: (0,0,0,...,1) my @bits = split //, unpack 'b*', $bit_vector; ## how many bits are on? my $on_bits = unpack '%32b*', $bit_vector; • the 1000001st through 1000008th bits are free! 20
    22. lvalue substr() 21
    23. • (or 4-argument substr) • magic: no realloc iff replacement length == original length • sprintf also might work, depending... 22
    24. example: Sys::Mmap mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!; $shared = meaning_of_life(); munmap($shared); • 7.5 million years’ work down the tubes! mmap($shared, 4, PROT_READ|PROT_WRITE, MAP_SHARED, $filehandle) or die $!; (substr $shared, 0, 4) = pack ‘L’, meaning_of_life(); munmap($shared); 23
    25. use bytes 24
    26. use bytes • binary data + DWIM + unicode • ouch! • pragma to the rescue: “No matter what you think might be in this PV, do not cleverly switch to character semantics when I’m not looking.” • pack/unpack themselves don’t care, it’s things like length and substr 25
    27. eat a snack please come back 26
    28. horrific abuses think like a C programmer serialization tricks lazy perlification 27
    29. think like a C programmer 28
    30. typedef struct TWO_THINGS { char a; char b; } two_things; two_things things; two_things lots_of_things[1000]; • where is things.a? things. • where is things.b? *(&things + 1). • where is lots_of_things[2].b? lots_of_things + (2 * sizeof(two_things)) + 1. • where is the point? next slide. 29
    31. Readonly my $FORMAT => ‘cc’; my $things = pack $FORMAT; my $lots_of_things = pack “($FORMAT)1000”; • where is $things.a? unpack ‘cx’, $things; • where is $things.b? unpack ‘xc’, $things; • where is $lots_of_things[2].b? unpack ‘(xx)2xc’, $lots_of_things 30
    32. • bytes, bytes, bytes on the brain • byte offsets a natural way of thinking about working with data • “language neutral” is just a cute way of saying “C” 31
    33. • “strong typing” the roundabout way • unpack() == C cast: “I, programmer, assure you, language, that these bytes contain precisely data of this type, and I will live with the consequences if I’m wrong.” 32
    34. example: SEGV! my $bar = unpack 'P', ‘asdf’; • god, I miss pointers sometimes • (but not right now) 33
    35. No pointers in Perl 34
    36. No pointers in Perl 34
    37. but... • we are not writing C • because down that road lies madness • still, its siren song is hard to resist... 35
    38. serialization tricks 36
    39. space efficiency • Storable: general-purpose • what does that mean? • if you’re thinking like a C programmer, maybe you can do better... 37
    40. example: array of shorts @shorts = map {int((rand 256)-128)} (1..10000); ## 20,000 bytes: 2 bytes per element $packed = pack 's*', @shorts; ## 20,016 bytes: 2 bytes per element $stored = Storable::freeze(\\@shorts); ## harmlessly examine contents of @shorts... print \"$_\\n\" for @shorts; ## roughly 46,000 bytes: ??? $stored = Storable::freeze(\\@shorts); • Extra credit: deserialize just $shorts[2113]... 38
    41. fixed width • depending on what you’re serializing • interesting properties • more in a bit 39
    42. keyless hashes • when a hash is really a struct/record • thinking like a C programmer again! • serialize bags of them without bags of redundant copies of their keys 40
    43. idiom ## shape of the “structure” and format are ## passed or encoded separately Readonly my $TEMPLATE => ‘VVC'; Readonly my @FIELDS => qw(thing1 thing2 kite); ## get the bytes my $bytes = get_from_somewhere(); ## unpack via hash slice FTW! my %thing; @thing{@FIELDS} = unpack $TEMPLATE, $bytes; 41
    44. example: keyless hash my @records = map { { thing1 => int rand 4294967296, thing2 => int rand 4294967296, kite => int rand 255, } } (1 .. 10000); ## 90,000 bytes: 9 bytes per record my $packed = pack \"($TEMPLATE)*\", map { @{$_}{@FIELDS} } @records; ## roughly 544,000 bytes: 54 bytes per record my $stored = Storable::freeze(\\@records); 42
    45. lazy perlification 43
    46. • for transient bytes e.g. from key-value storage • for sparse algorithms e.g. binary search • otherwise, don’t do this! • or at least, don’t blame me 44
    47. example: filtering • problem scale: 100k x 20k x 100 • idea 1: regular expressions! • idea 2: binary search, of course! • idea 3: binary search + lazy perlification 45
    48. serializing 46
    49. deserializing 47
    50. searching 48
    51. lazy binary search pack('Ca*', $size, pack(“(Z$size)*”, @sorted_haystack)); $size = unpack('C', ${$frozen_haystack_ref}); $format = ‘Z’ . $size; ... $element = unpack('x' . ($size * $mid + 1) . $format, ${$frozen_haystack_ref}); $cmp = $element cmp $needle; ... 49
    52. summary • bytes, bytes, bytes • “Premature optimization is the root of all evil.” -- Donald Knuth 50
    53. ? j.david.lowe@gmail.com twitter.com/j_david_lowe dlowe-wfh.blogspot.com slideshare 51

    + David LoweDavid Lowe, 7 months ago

    custom

    611 views, 0 favs, 0 embeds more stats

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 611
      • 611 on SlideShare
      • 0 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 11
    Most viewed embeds

    more

    All embeds

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories