Pack/Unpack: manipulate binary data

The miracle solution to manipulating binary data. No bit operations required.

Published in: Data & Analytics

  1. Pack/Unpack: Become a Deft Manipulator of Binary Data
      Presented by Lambert Lum
  2. What is this?
      ● Perl becomes a deft manipulator of bytes
      ● Albeit slow, relative to C
  3. Other languages
      ● Ruby/PHP (same or follows closely)
      ● Python (suffers from Perl hatred)
        – Changed the format grammar
      ● Node.js (two versions: Perl-like, Python-like)
  4. Useful?
      ● Good for small jobs, light duty
      ● Not something you use very often
      ● Will never be asked during a job interview
  5. My story
      ● Worked at SOLiD Technology
      ● Proprietary protocol that ran over serial
      ● The same data frame was used in the socket communication as well
  6. What we need to start
      ● Hex viewer
        – We will use GHex
      ● The binary data format specification
        – We will use ID3v2
      ● Perl
        – pack/unpack is a built-in
  7. House rules
      ● We will mostly talk about unpack
        – The format specifier is basically the same between pack and unpack
      ● The documentation uses the term template; I use the term format specifier
  8. Example: ID3
      ● MP3 with ID3v2 metadata
      ● Only parsing the header
      ● http://id3.org/id3v2.3.0#ID3v2_header
  9. ID3v2 header
      ID3v2/file identifier   "ID3"
      ID3v2 version           $03 00
      ID3v2 flags             %abc00000
      ID3v2 size              4 * %0xxxxxxx
  10. MP3 with ID3v2
      ● Author: Admiral Bob
      ● Song: Beautiful Mystery
      ● admiralbob77_-_Beautiful_Mystery_4.mp3
      ● http://ccmixter.org/files/admiralbob77/47427
  11. [Ghex demo]
  12. Pack/Unpack
      use Data::Dumper;
      my $file_name = "admiralbob77_-_Beautiful_Mystery_4.mp3";
      open my $fh, "<", $file_name or die "Cannot open $file_name: $!";
      binmode $fh, ":raw";
      read $fh, my $data, 128;
      close $fh;
      my @L = unpack "H H H H H H", $data;
      print Dumper @L;
  13. Demo
      ● [run unpack_hex.pl]
  14. What's wrong?
      ● The hex viewer shows one set of values
      ● unpack_hex.pl shows different values: a bare H reads only the first hex digit of each byte
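  15. Unpack hex
      use Data::Dumper;
      my $file_name = "admiralbob77_-_Beautiful_Mystery_4.mp3";
      open my $fh, "<", $file_name or die "Cannot open $file_name: $!";
      binmode $fh, ":raw";
      read $fh, my $data, 128;
      close $fh;
      # H2 reads both hex digits of each byte
      my @L = unpack "H2 H2 H2 H2 H2 H2", $data;
      print Dumper @L;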
  16. Format specifier
      a    Arbitrary binary data
      A    ASCII
      H    Hexadecimal
      B    Bit string (descending order)
  17. Format specifier
      H    A single hexadecimal digit
      H2   Two hexadecimal digits (one byte)
      H*   Everything as hexadecimal
  18. Unpack ASCII
      # ID3
      my ($tag, $other) = unpack "A3 a*", $data;
      print "tag: $tag\n";
  19. [demo unpack_ascii.pl]
  20. Format specifier
      C    Unsigned char (8-bit) value
      S    Unsigned short (16-bit) value
      L    Unsigned long (32-bit) value
      Q    Unsigned quad (64-bit) value
      B    Bit string (descending order)
  21. Format specifier
      c    Signed char (8-bit) value
      s    Signed short (16-bit) value
      l    Signed long (32-bit) value
      q    Signed quad (64-bit) value
      b    Bit string (ascending order)
  22. Unpack major/minor
      my ($tag, $other) = unpack "A3 a*", $data;
      # major / minor version
      my ($major, $minor);
      ($major, $minor, $other) = unpack "C C a*", $other;
      print "major: $major\n";
      print "minor: $minor\n";
  23. Unpack bit flags
      # keep the remainder in $other so the next unpack starts after the flags byte
      my $flags;
      ($flags, $other) = unpack "B3 a*", $other;
      print "flags: $flags\n";
      # split the flags
      my ($flag_unsynchronization, $flag_extended_header, $flag_experimental_indicator) = split m{}, $flags;
      print " flag_unsynchronization: $flag_unsynchronization\n";
      print " flag_extended_header: $flag_extended_header\n";
      print " flag_experimental_indicator: $flag_experimental_indicator\n";
  24. Unpack/Pack/Unpack
      # The next 32 bits hold the ID3v2 size field
      my $bit_string = unpack "B32 a*", $other;
      print "bit_string: $bit_string\n";
      # Drop the top bit of each byte, then pad back to 32 bits
      $bit_string = "0000" . join '', $bit_string =~ m{.(.{7})}g;
      print "bit_string: $bit_string\n";
      # pack to binary
      # unpack to long int
      my $size = unpack "L", pack "B*", $bit_string;
      print "size: $size\n";
  25. [run unpack_pack_unpack.pl]
  26. Something is wrong
  27. Big Endian vs Little Endian
      ● What is this?
  28. Big Endian vs Little Endian
  29. 8-bit
      Byte 0: 00010100
      Bit weights run from 128 (leftmost bit) down to 1 (rightmost bit)
  30. 16-bit
      Byte 0: 00010100   Byte 1: 00001011
      [Figure: bit-weight labels 128 and 1]
  31. 32-bit
      Byte 0: 00010100   Byte 1: 00001011   Byte 2: 00010000   Byte 3: 10000100
      [Figure: bit-weight labels 128 and 1]
  32. 32-bit Big Endian
      Byte 0: 00010100   Byte 1: 00001011   Byte 2: 00010000   Byte 3: 10000100
      [Figure: bit-weight labels 2,147,483,648, 128 and 1]
  33. 32-bit Little Endian
      Byte 3: 10000100   Byte 2: 00010000   Byte 1: 00001011   Byte 0: 00010100
      [Figure: bit-weight labels 2,147,483,648, 128 and 1]
      ● Byte ordering changes
      ● Bit ordering is the same
  34. Big Endian vs Little Endian
      ● Intel is Little Endian
      ● ID3v2 size is Big Endian
  35. Format specifier
      n    unsigned (16-bit) in "network" (big-endian) order
      N    unsigned (32-bit) in "network" (big-endian) order
      v    unsigned (16-bit) in "VAX" (little-endian) order
      V    unsigned (32-bit) in "VAX" (little-endian) order
  36. Format specifier
      >    force big endian
      <    force little endian
  37. Little/Big Endian
      # Little Endian on Intel
      my $size = unpack "L", pack "B*", $bit_string;
      print "size: $size\n";
      # Big Endian
      $size = unpack "N", pack "B*", $bit_string;
      print "size: $size\n";
      # Force L to Big Endian
      $size = unpack "L>", pack "B*", $bit_string;
      print "size: $size\n";
  38. Named Unpack
      # purely Lambert's invention
      my $named = named_unpack ($data, [
          'tag'        => 'A3',
          'v_major'    => 'C',
          'v_minor'    => 'C',
          'bit_flags'  => 'B3',
          'bit_string' => 'B32',
      ]);
      print Dumper $named;
  39. Named Unpack
      use List::MoreUtils qw(part);
      sub named_unpack {
          my ($data, $format_list) = @_;
          my %named;
          my $i = 0;
          # even positions are names, odd positions are format specifiers
          my ($lefts, $rights) = part { $i++ % 2 } @$format_list;
          my $format = join ('', @$rights);
          @named{@$lefts} = unpack ($format, $data);
          # return a reference so the caller can store it in a scalar
          return \%named;
      }
  40. Replacing bytes
      ● substr
        – We will demonstrate
      ● vec
        – A bit limited, not a favorite of mine
  41. Substr
      ● substr ($data, $offset)
        – Return all from $offset onward
      ● substr ($data, $offset, $length)
        – Return $length bytes at $offset
      ● substr ($data, $offset, $length, $replacement)
        – Replace $length bytes at $offset with $replacement
        – To prevent realloc, replace with same size
        – Avoid realloc when manipulating large files
  42. Substr
      # offset 5 is the flags byte; replace it with one byte of packed data
      substr ($data, 5, 1, $bit_data);
      #substr($data, 5, 1) = $bit_data;
      my $named = named_unpack ($data, [
          'tag'        => 'A3',
          'v_major'    => 'C',
          'v_minor'    => 'C',
          'bit_flags'  => 'B3',
          'bit_string' => 'B32',
      ]);
      print Dumper $named;
  43. For reference
      ● http://perldoc.perl.org/perlpacktut.html
      ● http://id3.org/id3v2.3.0#ID3v2_header
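
The header-parsing steps from the slides can be pulled together into one short script. This is a minimal sketch, not the presenter's demo code: it assumes a hypothetical 10-byte ID3v2.3 header built in memory with pack (rather than read from the MP3 file), and the variable names are made up for illustration. It reads the size field as a bit string, drops the top bit of each byte, and unpacks the result with the big-endian N specifier.

```perl
use strict;
use warnings;

# Hypothetical sample header (an assumption, not taken from the MP3 in the slides):
# tag "ID3", version 3.0, flags %01000000, size bytes 00 00 02 01.
my $header = pack "A3 C C B8 C4", "ID3", 3, 0, "01000000", 0x00, 0x00, 0x02, 0x01;

# One unpack call walks the whole header.
# B3 consumes a full byte but returns only its first three bits;
# B32 returns the four size bytes as a string of 32 bit characters.
my ($tag, $major, $minor, $flags, $size_bits) = unpack "A3 C C B3 B32", $header;

# Drop the top bit of each size byte (28 bits remain), pad back to 32 bits,
# then read the result as a big-endian unsigned 32-bit integer.
my $packed_bits = "0000" . join '', $size_bits =~ m{.(.{7})}g;
my $size = unpack "N", pack "B32", $packed_bits;

print "tag: $tag\n";
print "version: $major.$minor\n";
print "flags: $flags\n";
print "size: $size\n";
```

For this sample header the script prints tag ID3, version 3.0, flags 010 and size 257 (2 * 128 + 1). Using N instead of L keeps the result the same on little-endian and big-endian machines.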
