3. The problem
• Keeping track of input and output encodings
• Not losing encoding data in the middle
• Understanding the difference between characters and bytes
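A minimal sketch of the characters-versus-bytes distinction, assuming the sample text "àbçdé" arrives as UTF-8. The variable names are illustrative only, and the bytes are written as escapes so the example does not depend on how the source file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "àbçdé" as UTF-8 bytes: à, ç and é take two bytes each.
my $bytes = "\xC3\xA0b\xC3\xA7d\xC3\xA9";

# After decoding, the same text is a string of characters.
my $chars = decode('UTF-8', $bytes);

my $byte_count = length $bytes;   # counts bytes: 8
my $char_count = length $chars;   # counts characters: 5

printf "%d bytes, %d characters\n", $byte_count, $char_count;
```

The same length() call gives two different answers, depending on whether the string holds undecoded bytes or decoded characters.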
25. Handling Encodings
• àbçdé: bytes in some input encoding or other
• decode: use Encode; $chars = decode($enc, $bytes);
• àbçdé: character-based internal representation
• encode: use Encode; $bytes = encode($enc, $chars);
• àbçdé: bytes in desired output encoding
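The decode and encode steps can be sketched end to end as follows. The slide uses a single $enc placeholder; the two encoding names here (ISO-8859-1 in, UTF-8 out) and the variable names are assumptions chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed for this sketch: input is Latin-1, desired output is UTF-8.
my $input_encoding  = 'ISO-8859-1';
my $output_encoding = 'UTF-8';

# "àbçdé" as Latin-1 bytes: one byte per character.
my $input_bytes = "\xE0b\xE7d\xE9";

# Bytes in, characters out.
my $chars = decode($input_encoding, $input_bytes);

# ... work on $chars here, as characters ...

# Characters in, bytes out, in the desired output encoding.
my $output_bytes = encode($output_encoding, $chars);

printf "%d bytes in, %d characters, %d bytes out\n",
    length $input_bytes, length $chars, length $output_bytes;
```

The text in the middle stays the same five characters. Only the byte form at each edge changes.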
40. character-based internal representation: Perl has one!
Magic internal representation.
All string functions know about it.
It's encoding-agnostic.
In fact....
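As a rough illustration of "all string functions know about it", the sketch below applies a few built-ins to a decoded string. It assumes the text came in as UTF-8, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode "àbçdé" from UTF-8 bytes into characters.
my $chars = decode('UTF-8', "\xC3\xA0b\xC3\xA7d\xC3\xA9");

my $length   = length $chars;            # counts characters, not bytes
my $upper    = uc $chars;                # upper-cases the accented letters too
my $reversed = scalar reverse $chars;    # reverses characters, not bytes
my ($word)   = $chars =~ /^(\w+)$/;      # \w matches the accented letters

print "length: $length\n";
print "whole string is one word\n" if defined $word;
```

None of these calls mention an encoding. Once the input is decoded, the built-ins operate on characters.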
102. The science bit
• Encode.pm
use Encode;
$bytes = encode($enc, $chars);
• 3 argument form of open() - PerlIO layers
open(FILEHANDLE, ">:encoding(UTF-8)", $file);
• binmode(FILEHANDLE,
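A small sketch of the 3 argument open() with an encoding layer, writing characters out and reading them back. It uses a lexical filehandle and a temporary file from File::Temp rather than the slide's bareword FILEHANDLE and $file; both choices are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file for the demonstration, removed when the program exits.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

my $chars = "\x{E0}b\x{E7}d\x{E9}";    # "àbçdé" as characters

# Writing: the layer encodes characters to UTF-8 bytes on the way out.
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot write $file: $!";
print {$out} $chars;
close($out) or die "Cannot close $file: $!";

# Raw read: what is actually on disk.
open(my $raw, '<:raw', $file) or die "Cannot read $file: $!";
my $bytes_on_disk = do { local $/; <$raw> };
close $raw;

# Reading: the layer decodes UTF-8 bytes back to characters on the way in.
open(my $in, '<:encoding(UTF-8)', $file) or die "Cannot read $file: $!";
my $read_back = do { local $/; <$in> };
close $in;

printf "%d bytes on disk, %d characters read back\n",
    length $bytes_on_disk, length $read_back;
```

With the layer in place, the code between open() and close() never has to call encode() or decode() itself.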
109. 'utf8' vs 'UTF-8'
• Encode.pm
• utf8 = marks it as UTF-8 and hopes...
• UTF-8 = is actually valid UTF-8
• PerlIO layers:
• :utf8
• :encoding(UTF-8)
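One way to see the Encode.pm difference is to ask Encode which encoding object each name resolves to, then hand both a code point that is not valid Unicode. This is a sketch based on the behaviour described in the Encode documentation. The test value 0x110000 (one past the Unicode maximum) and the variable names are choices made here, not taken from the slides.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding FB_CROAK LEAVE_SRC);

# The two names resolve to different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;   # strict variant
my $lax_name    = find_encoding('utf8')->name;    # Perl's lax variant

# A code point just past the Unicode maximum of 0x10FFFF.
my $not_unicode = "\x{110000}";

# FB_CROAK makes encode() die on data it cannot map;
# LEAVE_SRC stops it from modifying $not_unicode in place.
my $lax_ok    = eval { encode('utf8',  $not_unicode, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
my $strict_ok = eval { encode('UTF-8', $not_unicode, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;

print "UTF-8 resolves to $strict_name, accepted: $strict_ok\n";
print "utf8 resolves to $lax_name, accepted: $lax_ok\n";
```

The lax name lets the out-of-range value through, while the strict name refuses it. That matches the slide's "marks it as UTF-8 and hopes" versus "is actually valid UTF-8".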
124. In summary
• decode bytes as soon as you get them:
• decode(), binmode(STDIN), 3 arg open()
• encode characters just before you output:
• encode(), binmode(STDOUT), 3 arg open()
• keep track of whether your strings are
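The "decode early, encode late" pattern with binmode() might look like the sketch below. To keep it self-contained, in-memory filehandles stand in for STDIN and STDOUT. The ':encoding(UTF-8)' layer argument and all variable names are assumptions, since the slides do not spell out the binmode() arguments.

```perl
use strict;
use warnings;

# Stand-in for STDIN: "àbçdé\n" as UTF-8 bytes in an in-memory file.
my $input_bytes = "\xC3\xA0b\xC3\xA7d\xC3\xA9\n";
open(my $in, '<', \$input_bytes) or die "Cannot open input: $!";
binmode($in, ':encoding(UTF-8)');     # decode as soon as bytes arrive

# Stand-in for STDOUT: bytes collect in $output_bytes.
my $output_bytes = '';
open(my $out, '>', \$output_bytes) or die "Cannot open output: $!";
binmode($out, ':encoding(UTF-8)');    # encode just before bytes leave

# In the middle, everything is characters.
while (my $line = <$in>) {
    chomp $line;
    print {$out} uc($line), "\n";
}

close $in;
close($out) or die "Cannot close output: $!";
```

For a real program the same two binmode() calls would be applied to STDIN and STDOUT once at startup, leaving the rest of the code to deal only with characters.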
131. Handling Encodings
• àbçdé: bytes in some input encoding or other
• decode: use Encode; $chars = decode($enc, $bytes);
• àbçdé: Perl's magic internal representation
• encode: use Encode; $bytes = encode($enc, $chars);
• àbçdé: bytes in desired output encoding
132. The Holy Fail (thanks Joel!)
• àbçdé: bytes in some input encoding or other
• decode: use Encode; $chars = decode($enc, $bytes);
• àbçdé: Perl's magic internal representation
• encode: use Encode; $bytes = encode($enc, $chars);
• àbçdé: bytes in desired output encoding