Perl and Unicode
Mike Whitaker, BBC/EnlightenedPerl.org

Perl and Unicode talk from Italian Perl Workshop 2009 and London Perl Workshop 2010.

    3. The problem
       • Keeping track of input and output encodings
       • Not losing encoding data in the middle
       • Understanding the difference between characters and bytes
    14. Characters vs bytes
        • 2 characters: $ (U+0024) and € (U+20AC)
        • 4 bytes (UTF-8): 0x24 for $, 0xE2 0x82 0xAC for €
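A minimal sketch of the same count in Perl, assuming Perl 5.8 or later with the core Encode module. The variable names are made up for illustration; the code points are written as escapes so the encoding of the source file doesn't matter.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two characters: $ (U+0024) and € (U+20AC).
my $chars = "\x{24}\x{20AC}";

# Encoding turns the characters into UTF-8 bytes.
my $bytes = encode('UTF-8', $chars);

my $char_count = length($chars);   # length() of a character string counts characters
my $byte_count = length($bytes);   # length() of an encoded string counts bytes
my $hex = join ' ', map { sprintf('0x%02X', ord($_)) } split //, $bytes;

print "$char_count characters, $byte_count bytes: $hex\n";
# prints "2 characters, 4 bytes: 0x24 0xE2 0x82 0xAC"
```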
    24. Handling Encodings
        • input: àbçdé as bytes in some encoding or other
        • decode: use Encode; $chars = decode($enc, $bytes);
        • àbçdé in a character-based internal representation
        • encode: use Encode; $bytes = encode($enc, $chars);
        • output: àbçdé as bytes in the desired encoding
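The decode/encode round trip above could look something like this. ISO-8859-1 as the input encoding and UTF-8 as the output encoding are assumptions picked for the example; the slide leaves $enc open.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Input: "àbçdé" as ISO-8859-1 bytes (assumed input encoding).
my $input_bytes = "\x{E0}b\x{E7}d\x{E9}";

# Decode as soon as the bytes arrive: bytes become characters.
my $chars = decode('ISO-8859-1', $input_bytes);

# Encode just before output: characters become bytes again,
# this time in the desired output encoding (assumed UTF-8).
my $output_bytes = encode('UTF-8', $chars);

printf "%d bytes in, %d characters, %d bytes out\n",
    length($input_bytes), length($chars), length($output_bytes);
# prints "5 bytes in, 5 characters, 8 bytes out"
```

The three accented letters take one byte each in ISO-8859-1 and two bytes each in UTF-8, which is where the 5 and the 8 come from.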
    28. The Holy Grail
        • Can represent all encodings
        • Has multibyte character support
          • for example, length() should count characters, not bytes
    29. It doesn't work like that
    33. use Encode;
        • Only works in Perl 5.8 and above
        • Why the $£%^&*() are you using 5.6 ANYWAY?
        • There are solutions for 5.6 and even earlier. But they're HORRIBLE.
    39. Character-based internal representation: Perl has one!
        • Magic internal representation.
        • All string functions know about it.
        • It's encoding-agnostic.
        • In fact...
    40. IT'S UTF-8!
    41. IT'S almost UTF-8!
    42. Handling Encodings
        • input: àbçdé as bytes in some encoding or other
        • decode: use Encode; $chars = decode($enc, $bytes);
        • àbçdé in Perl's magic internal representation
        • encode: use Encode; $bytes = encode($enc, $chars);
        • output: àbçdé as bytes in the desired encoding
    44. I18N? What the £$%^&*('s that?
        • input: àbçdé as bytes in the machine's 8-bit encoding
        • in the program: àbçdé as bytes in the machine's 8-bit encoding
        • output: àbçdé as bytes in the machine's 8-bit encoding
    48. People are still writing Perl like it was Perl 4...
        • ...and we have to support them.
        • Even though our string functions expect chars.
    53. Perl's magic internal representation
        • if all characters are representable in the local machine's 8-bit charset, use that;
        • else use UTF-8
    59. • àbçdé (UTF-8 characters)
          • use Encode; $bytes = encode($enc, $chars);
          • output: àbçdé (bytes in desired encoding)
        • àbçdé (machine bytes)
          • use Encode; $bytes = encode($enc, $chars);
          • output: àbçdé (bytes in desired encoding)
    67. UTF-8 characters + machine bytes = ?????
        • promote: àbçdé (machine bytes) becomes àbçdé (UTF-8 characters)
        • result: àbçdé (UTF-8 bytes)
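A small sketch of the promotion step, and of one way it goes wrong that the slide doesn't show. It assumes a normal ASCII platform with no locale pragmas in play, where Perl promotes a byte string as if each byte were an ISO-8859-1 character. That is fine when the bytes really are ISO-8859-1, and mangles them when they are UTF-8 bytes that nobody decoded. All variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $chars     = "\x{20AC}";    # a decoded character string: €
my $latin1    = "\x{E9}";      # é as one ISO-8859-1 byte
my $undecoded = "\xC3\xA9";    # é as two UTF-8 bytes, never decoded

# Promotion works: the single byte 0xE9 becomes the character U+00E9.
my $ok = $chars . $latin1;

# Promotion mangles: each UTF-8 byte becomes its own character ("Ã©").
my $mixed = $chars . $undecoded;

# Decoding first gives the intended two-character string.
my $fixed = $chars . decode('UTF-8', $undecoded);

printf "ok: %d, mixed: %d, fixed: %d characters\n",
    length($ok), length($mixed), length($fixed);
# prints "ok: 2, mixed: 3, fixed: 2 characters"
```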
    79. • àbçdé (machine bytes), output with Content-Encoding: UTF-8, displays as ?b?d?
        • àbçdé (machine bytes), output with Content-Encoding: ISO-8859-1, displays as àbçdé
        • àbçdé (UTF-8 characters), output with Content-Encoding: UTF-8, displays as àbçdé
        • àbçdé (UTF-8 characters), output with Content-Encoding: ISO-8859-1, displays as àbçdé
    80. ARRRGH!!!!
    81. It gets worse.
    92. You can't tell what you've actually got
        • utf8::is_utf8() does not mean what you think it means
        • encoded bytes: utf8::is_utf8() = false, EVEN IF they're UTF-8
        • decoded UTF-8 chars: utf8::is_utf8() = true
        • decoded machine bytes: utf8::is_utf8() = false
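The three cases can be checked directly. This is a sketch, assuming Perl 5.8.1 or later (where utf8::is_utf8() is always available); the "\x{E9}" literal stands in for text that Perl happens to keep in its 8-bit form.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $encoded = "\xC3\xA9";                  # é as UTF-8 encoded bytes
my $decoded = decode('UTF-8', $encoded);   # é as a decoded character
my $native  = "\x{E9}";                    # é held in Perl's 8-bit form

my @flags = map { utf8::is_utf8($_) ? 'true' : 'false' }
    $encoded, $decoded, $native;
print "encoded=$flags[0] decoded=$flags[1] native=$flags[2]\n";
# prints "encoded=false decoded=true native=false"

# The last two hold the same text even though the flag differs.
my $same_text = ($decoded eq $native) ? 1 : 0;
print "same text: $same_text\n";
# prints "same text: 1"
```

So the flag says how Perl stores the string, not whether it has been decoded or what encoding it came from.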
    96. The science bit
        • Encode.pm
          use Encode; $bytes = encode($enc, $chars);
        • 3 argument form of open() - PerlIO layers
          open(FILEHANDLE, ">:encoding(UTF-8)", $file);
        • binmode(FILEHANDLE,
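A sketch of the 3 argument open() with a PerlIO layer, using a temporary file from the core File::Temp module so it can be run anywhere. The lexical filehandles and the temp file are choices made for this example, not something from the slides.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write one character; the layer encodes it to UTF-8 on the way out.
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot write $file: $!";
print {$out} "\x{20AC}";
close $out or die "Cannot close $file: $!";

# Read the file back raw to see what actually landed on disk.
open(my $raw, '<:raw', $file) or die "Cannot read $file: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read it back through the layer to get the character again.
open(my $in, '<:encoding(UTF-8)', $file) or die "Cannot read $file: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "%d bytes on disk, %d character read back\n",
    length($bytes), length($chars);
# prints "3 bytes on disk, 1 character read back"
```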
    103. utf8 vs UTF-8
         • Encode.pm
           • utf8 = marks it as UTF-8 and hopes...
           • UTF-8 = is actually valid UTF-8
         • PerlIO layers:
           • :utf8
           • :encoding(UTF-8)
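One way to see the Encode.pm half of this, as a sketch: ask Encode what the two names resolve to, then feed both a code point just past the end of Unicode (U+110000, chosen here only as a convenient out-of-range value). The lax "utf8" encodes it anyway; the strict "UTF-8" refuses when asked to croak on errors.

```perl
use strict;
use warnings;
no warnings 'utf8';   # keep any "not Unicode" warnings out of the demo
use Encode qw(encode find_encoding);

my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;
print "UTF-8 is '$strict_name', utf8 is '$lax_name'\n";
# prints "UTF-8 is 'utf-8-strict', utf8 is 'utf8'"

my $beyond = chr(0x110000);   # one past the last Unicode code point

# Lax: encoded without complaint.
my $lax_bytes = encode('utf8', $beyond);

# Strict: dies under FB_CROAK. A copy is passed in because a CHECK
# argument allows Encode to modify the source string.
my $strict_ok = eval {
    my $copy = $beyond;
    encode('UTF-8', $copy, Encode::FB_CROAK);
    1;
} ? 1 : 0;

printf "lax gave %d bytes, strict accepted it: %s\n",
    length($lax_bytes), $strict_ok ? 'yes' : 'no';
```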
    106. use utf8;
         • Does NOT do what you might think it does
         • All it says is "my source code is UTF-8".
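A quick way to see what the pragma changes, sketched under the loud assumption that this file itself is saved as UTF-8. The pragma is lexically scoped, so the same literal can be measured with and without it.

```perl
use strict;
use warnings;

# ASSUMPTION: this source file is saved as UTF-8, so the literal
# below is the two bytes 0xC3 0xA9 on disk.

# Without the pragma, Perl reads the literal as two separate bytes.
my $without_pragma = length "é";

# With the pragma (scoped to this block), the same two bytes in the
# source are read as one character.
my $with_pragma = do { use utf8; length "é" };

print "without use utf8: $without_pragma, with use utf8: $with_pragma\n";
```

On a UTF-8 source file this should report 2 and 1. Nothing here touches input or output handles, which is the point of the slide: the pragma only covers the source code.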
    112. Modules
         • It depends on the module:
           • CGI - $CGI::PARAM_UTF8=1;
           • LWP::UserAgent - ->decoded_content() method honours Content-Encoding:
           • DBI - mysql_enable_utf8 in DBI::connect()
           • XML::LibXML - looks at encoding,
    118. In summary
         • decode bytes as soon as you get them:
           • decode(), binmode(STDIN), 3 arg open()
         • encode characters just before you output:
           • encode(), binmode(STDOUT), 3 arg open()
         • keep track of whether your strings are
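The summary as a tiny filter, sketched with binmode() on already-open handles. In-memory filehandles stand in for STDIN and STDOUT so the bytes can be inspected, and ISO-8859-1 in / UTF-8 out are assumed encodings for the example.

```perl
use strict;
use warnings;

my $input_bytes  = "\x{E0}b\x{E7}d\x{E9}\n";   # "àbçdé\n" as ISO-8859-1 bytes
my $output_bytes = '';

# In-memory handles playing the part of STDIN and STDOUT.
open(my $in,  '<', \$input_bytes)  or die "Cannot open input: $!";
open(my $out, '>', \$output_bytes) or die "Cannot open output: $!";

# Decode as soon as bytes come in, encode just before they go out.
binmode($in,  ':encoding(ISO-8859-1)');
binmode($out, ':encoding(UTF-8)');

# In between, work on characters: uc() sees five letters, not bytes.
while (my $line = <$in>) {
    print {$out} uc $line;
}
close $in;
close $out or die "Cannot close output: $!";

printf "%d bytes in, %d bytes out\n",
    length($input_bytes), length($output_bytes);
# prints "6 bytes in, 9 bytes out"
```

The output holds "ÀBÇDÉ" plus a newline as UTF-8: three two-byte letters, two one-byte letters and the newline.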
    119. NEVER EVER EVER rely on the encoding of Perl's internal representation
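A sketch of why: the internal storage can change underneath a string without the string changing at all. utf8::upgrade() is used here only to force the switch for demonstration; Perl can do the same thing on its own, for instance during concatenation.

```perl
use strict;
use warnings;

my $original = "\x{E0}b\x{E7}d\x{E9}";   # àbçdé, stored in 8-bit form
my $upgraded = $original;
utf8::upgrade($upgraded);                # same characters, now stored as UTF-8 internally

my $flag_before = utf8::is_utf8($original) ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded) ? 1 : 0;
my $still_equal = ($original eq $upgraded) ? 1 : 0;
my $same_length = (length($original) == length($upgraded)) ? 1 : 0;

print "flag $flag_before -> $flag_after, equal=$still_equal, same length=$same_length\n";
# prints "flag 0 -> 1, equal=1, same length=1"
```

Code that branches on the internal form treats these two identical strings differently, which is exactly the bug to avoid.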
    121. and... there is NO SUCH THING as "plain text"
    123. The Holy Fail (thanks Joel!)
         • input: àbçdé as bytes in some encoding or other
         • decode: use Encode; $chars = decode($enc, $bytes);
         • àbçdé in Perl's magic internal representation
         • encode: use Encode; $bytes = encode($enc, $chars);
         • output: àbçdé as bytes in the desired encoding
    124. Questions?