"Character sets and iconv" PHP source code

Matching source code for "Character sets and iconv"

Published in: Technology

  1. First play

     <?php
     // note that this script file is UTF-8
     // set browser to ISO-8859-1
     header("Content-Type: text/html; charset=ISO-8859-1");

     $utf8_sentence = 'That will be £500 please';

     // gives [That will be Â£500 please] as UTF-8 is being displayed
     // as ISO-8859-1
     echo $utf8_sentence . '<br>';

     $iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

     // gives [That will be £500 please] as no mismatch
     // between actual character set of string
     // and browser
     echo $iso_sentence . '<br>';

     // YOU TRY IT! When viewing this in your browser,
     // set the page's encoding to UTF-8 and you will
     // see the mojibake reverse!
     ?>

     Within reason

     <?php
     // note that this script file is UTF-8
     // set browser to ISO-8859-1
     header("Content-Type: text/html; charset=ISO-8859-1");

     // some Korean (contains two space characters)
     $utf8_sentence = '연예가 뒷 이야기';

     // gives mojibake (for example, 뒷 appears as ë’·) as UTF-8 is
     // being displayed as ISO-8859-1
     echo $utf8_sentence . '<br>';

     // gives [Notice: iconv(): Detected an illegal character in input string]
     $iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

     // gives an empty string (false from PHP 5.4 onwards)
     var_dump($iso_sentence);
     ?>
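To make the byte-level cause of the mojibake visible, the following is a minimal sketch rather than part of the original scripts. `convert_or_null()` is a hypothetical helper introduced only for illustration, and the code assumes PHP 7.1 or later with the iconv extension. The pound sign is written as escaped bytes so the result does not depend on how the file itself is saved.

```php
<?php
// Hypothetical helper, not from the original scripts: returns null instead of
// false when iconv() cannot convert the input (assumes PHP 7.1+ for ?string).
function convert_or_null(string $text, string $from, string $to): ?string
{
    // @ hides the E_NOTICE that iconv() raises for an unconvertible character
    $converted = @iconv($from, $to, $text);
    return $converted === false ? null : $converted;
}

$pound_utf8 = "\xC2\xA3"; // the pound sign encoded as UTF-8: two bytes

// Correct conversion: one ISO-8859-1 byte
$pound_iso = convert_or_null($pound_utf8, 'UTF-8', 'ISO-8859-1');

// Mojibake: treat the two UTF-8 bytes as if they were two ISO-8859-1 characters
$mojibake = convert_or_null($pound_utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($pound_utf8), "\n"; // c2a3
echo bin2hex($pound_iso), "\n";  // a3
echo bin2hex($mojibake), "\n";   // c382c2a3, the UTF-8 bytes of "Â£"

// A Korean syllable has no ISO-8859-1 equivalent, so the strict conversion
// is expected to fail here rather than return text
var_dump(convert_or_null("\xEC\x97\xB0", 'UTF-8', 'ISO-8859-1'));
```

What the last call returns depends on the PHP version and the underlying iconv implementation, which is why checking the return value explicitly is safer than echoing it straight to the browser.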
  2. First transliteration

     <?php
     // note that this script file is UTF-8
     // set browser to ISO-8859-1
     header("Content-Type: text/html; charset=ISO-8859-1");

     // some Korean (contains two space characters)
     $utf8_sentence = '연예가 뒷 이야기';

     // gives mojibake (for example, 뒷 appears as ë’·) as UTF-8 is
     // being displayed as ISO-8859-1
     echo $utf8_sentence . '<br>';

     // approximate characters that aren't in target character set
     $iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);

     // gives [??? ? ???]
     echo $iso_sentence . '<br>';
     ?>

     More realistic transliteration (extended)

     <?php
     // note that this script file is UTF-8
     // set browser to UTF-8
     header("Content-Type: text/html; charset=UTF-8");

     // some German
     $utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

     // fine as UTF-8 is being displayed as UTF-8
     echo $utf8_sentence . '<br>';

     $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

     // gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
     // which is not quite what we expected (only ß has been flattened)
     echo $trans_sentence . '<br>';

     // BUT iconv interacts with the system locale setting, so let's have a play:
     $current_locale = setlocale(LC_ALL, '0');

     // gives, for me, "C" which is a kind of nondescript default
     echo $current_locale . '<br>';

     // we set the locale of the *target* character set
     setlocale(LC_ALL, 'en_GB');

     // try again...
     $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

     // gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
     // which is our original string flattened into 7-bit ASCII!
     echo $trans_sentence . '<br>';

     // out of curiosity...
     setlocale(LC_ALL, 'de_DE');
     $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

     // gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
     // which is exactly how a German would transliterate those
     // umlauted characters if forced to use 7-bit ASCII!
     // (because really ä = ae, ö = oe and ü = ue)
     echo $trans_sentence . '<br>';
     ?>

  3. Ignore example

     <?php
     // note that this script file is UTF-8
     // set browser to ISO-8859-1
     header("Content-Type: text/html; charset=ISO-8859-1");

     // some Korean (contains two space characters)
     $utf8_sentence = '연예가 뒷 이야기';

     // gives mojibake (for example, 뒷 appears as ë’·) as UTF-8 is
     // being displayed as ISO-8859-1
     echo $utf8_sentence . '<br>';

     // discard characters that aren't in target character set
     // STILL gives [Notice: iconv(): Detected an illegal character in input string]
     $iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);

     // gives "  " (two space characters)
     var_dump($iso_sentence);
     ?>
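The transliteration scripts call `setlocale()` without checking whether the requested locale is actually installed. If it is not, `setlocale()` returns false and iconv keeps using the previous locale. The sketch below shows one way to guard against that. `translit_with_locale()` is a hypothetical helper, the locale names are assumptions about what a given system provides, and the use of `LC_CTYPE` assumes a glibc-style iconv where transliteration follows that category.

```php
<?php
// Hypothetical helper, not from the original scripts: transliterates UTF-8 text
// to ASCII under the first installed locale from $locale_candidates, then
// restores the previous LC_CTYPE. Returns null if no candidate is installed
// or if iconv() fails.
function translit_with_locale(string $utf8, array $locale_candidates): ?string
{
    $previous = setlocale(LC_CTYPE, '0');                // "0" only queries the current setting
    $applied  = setlocale(LC_CTYPE, $locale_candidates); // first usable candidate, or false
    if ($applied === false) {
        return null;
    }

    // @ hides the E_NOTICE raised when a character cannot be converted
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

    setlocale(LC_CTYPE, $previous);                      // put the old locale back
    return $result === false ? null : $result;
}

// "Göbel" with ö written as escaped UTF-8 bytes; locale names are assumptions
$german = translit_with_locale("G\xC3\xB6bel", ['de_DE.UTF-8', 'de_DE.utf8', 'de_DE']);
var_dump($german);
```

The exact string returned depends on the platform's iconv implementation and installed locales, so the output is deliberately not stated here.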
  4. ob_iconv_handler

     <?php
     // note that this script file is UTF-8
     // (iconv_set_encoding() and these settings are deprecated as of PHP 5.6)

     // character set of PHP scripts etc
     iconv_set_encoding('internal_encoding', 'UTF-8');

     // character set of browser output
     // (sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1")
     iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');

     ob_start('ob_iconv_handler'); // start output buffering

     // Unicode string
     $utf8_sentence = 'The Japanese title is "指輪物語"';

     // when buffer is flushed, outputs [The Japanese title is "????"]
     echo $utf8_sentence;
     ?>

     iconv_strlen()

     <?php
     // note that this script file is UTF-8
     // set browser to UTF-8
     header("Content-Type: text/html; charset=UTF-8");

     // some Russian (13 characters)
     $utf8_sentence = 'Правительство';

     // gives 13 which is correct
     echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';

     // let's try core PHP
     // gives 26 (the *byte* count). Oops!
     echo strlen($utf8_sentence) . '<br>';
     ?>
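The same byte-versus-character distinction applies when cutting strings, not only when measuring them. As a small sketch (not from the original scripts), the example below compares core `substr()` with `iconv_substr()` on the Russian word used above. It assumes the file is saved as UTF-8 and that the iconv extension is loaded.

```php
<?php
// note that this script file is UTF-8
$word = 'Правительство';

// substr() counts bytes: 5 bytes is two whole letters plus half of a third
$byte_cut = substr($word, 0, 5);

// iconv_substr() counts characters in the given encoding
$char_cut = iconv_substr($word, 0, 5, 'UTF-8');

echo strlen($byte_cut), "\n";                // 5
echo iconv_strlen($char_cut, 'UTF-8'), "\n"; // 5
echo strlen($char_cut), "\n";                // 10

// The byte cut ends in the middle of a character, so it is no longer
// valid UTF-8 and iconv_strlen() cannot count it reliably
var_dump(@iconv_strlen($byte_cut, 'UTF-8'));
```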
  5. Inter-Japanese conversion (not on presentation)

     <?php
     // note that this script file is UTF-8
     // set browser to EUC-JP (a Japanese character set)
     header("Content-Type: text/html; charset=EUC-JP");

     // some Japanese
     $utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';

     // gives mojibake as UTF-8 is being displayed as EUC-JP
     echo $utf8_sentence . '<br>';

     $euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);

     // gives intact Japanese string
     echo $euc_sentence . '<br>';
     ?>
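One way to check that a conversion like this is lossless, without relying on what the browser shows, is to convert back and compare. The sketch below is an illustration only. It reuses the Japanese title from the ob_iconv_handler example, and it assumes the file is saved as UTF-8 and that the system's iconv supports EUC-JP.

```php
<?php
// note that this script file is UTF-8
$utf8_title = '指輪物語';

// Convert to EUC-JP and back again
$euc_title  = iconv('UTF-8', 'EUC-JP', $utf8_title);
$round_trip = iconv('EUC-JP', 'UTF-8', $euc_title);

echo strlen($utf8_title), "\n"; // 12 (three bytes per character in UTF-8)
echo strlen($euc_title), "\n";  // 8 (two bytes per character in EUC-JP)

// true means every character survived the trip through EUC-JP
var_dump($round_trip === $utf8_title);
```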
