"Character sets and iconv" PHP source code
Matching source code for "Character sets and iconv"

Published in: Technology
Transcript

  • 1. First play

    <?php
    //note that this script file is UTF-8
    //set browser to ISO-8859-1
    header("Content-Type: text/html; charset=ISO-8859-1");

    $utf8_sentence = 'That will be £500 please';
    //gives [That will be Â£500 please] as UTF-8 is being displayed
    //as ISO-8859-1
    echo $utf8_sentence . '<br>';

    $iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);
    //gives [That will be £500 please] as there is no mismatch
    //between the actual character set of the string
    //and the browser
    echo $iso_sentence . '<br>';

    //YOU TRY IT! When viewing this in your browser,
    //set the page's encoding to UTF-8 and you will
    //see the mojibake reverse!
    ?>

  Within reason

    <?php
    //note that this script file is UTF-8
    //set browser to ISO-8859-1
    header("Content-Type: text/html; charset=ISO-8859-1");

    //some Korean ("behind-the-scenes showbiz stories"; contains two space characters)
    $utf8_sentence = '연예가 뒷 이야기';
    //gives mojibake (e.g. 뒷 shows as [ë’·]) as UTF-8 is being displayed
    //as ISO-8859-1
    echo $utf8_sentence . '<br>';

    //gives [Notice: iconv(): Detected an illegal character in input string]
    $iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);
    //gives an empty string
    var_dump($iso_sentence);
    ?>
  • 2. First transliteration

    <?php
    //note that this script file is UTF-8
    //set browser to ISO-8859-1
    header("Content-Type: text/html; charset=ISO-8859-1");

    //some Korean ("behind-the-scenes showbiz stories"; contains two space characters)
    $utf8_sentence = '연예가 뒷 이야기';
    //gives mojibake (e.g. 뒷 shows as [ë’·]) as UTF-8 is being displayed
    //as ISO-8859-1
    echo $utf8_sentence . '<br>';

    //approximate characters that aren't in the target character set
    $iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);
    //gives [??? ? ???]
    echo $iso_sentence . '<br>';
    ?>

  More realistic transliteration (extended)

    <?php
    //note that this script file is UTF-8
    //set browser to UTF-8
    header("Content-Type: text/html; charset=UTF-8");

    //some German
    $utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
    //fine as UTF-8 is being displayed as UTF-8
    echo $utf8_sentence . '<br>';

    $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
    //gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
    //which is not quite what we expected (only ß has been flattened)
    echo $trans_sentence . '<br>';

    //BUT iconv interacts with the system locale setting, so let's have a play:
    $current_locale = setlocale(LC_ALL, 0);
    //gives, for me, "C" which is a kind of nondescript default
    echo $current_locale . '<br>';
  • 3. More realistic transliteration (continued)

    //we set the locale of the *target* character set
    setlocale(LC_ALL, 'en_GB');
    //try again...
    $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
    //gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
    //which is our original string flattened into 7-bit ASCII!
    echo $trans_sentence . '<br>';

    //out of curiosity...
    setlocale(LC_ALL, 'de_DE');
    $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
    //gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
    //which is exactly how a German would transliterate those
    //umlauted characters if forced to use 7-bit ASCII!
    //(because really ä = ae, ö = oe and ü = ue)
    echo $trans_sentence . '<br>';
    ?>

  Ignore example

    <?php
    //note that this script file is UTF-8
    //set browser to ISO-8859-1
    header("Content-Type: text/html; charset=ISO-8859-1");

    //some Korean ("behind-the-scenes showbiz stories"; contains two space characters)
    $utf8_sentence = '연예가 뒷 이야기';
    //gives mojibake (e.g. 뒷 shows as [ë’·]) as UTF-8 is being displayed
    //as ISO-8859-1
    echo $utf8_sentence . '<br>';

    //discard characters that aren't in the target character set
    //STILL gives [Notice: iconv(): Detected an illegal character in input string]
    $iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);
    //gives "  " (two space characters)
    var_dump($iso_sentence);
    ?>
  • 4. ob_iconv_handler

    <?php
    //note that this script file is UTF-8
    //character set of PHP scripts etc
    iconv_set_encoding('internal_encoding', 'UTF-8');
    //character set of browser output
    //(sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1;")
    iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');
    ob_start('ob_iconv_handler'); //start output buffering

    //Unicode string
    $utf8_sentence = 'The Japanese title is "指輪物語"';
    //when buffer is flushed, outputs [The Japanese title is "????"]
    echo $utf8_sentence;
    ?>

  iconv_strlen()

    <?php
    //note that this script file is UTF-8
    //set browser to UTF-8
    header("Content-Type: text/html; charset=UTF-8");

    //some Russian ("government"; 13 characters)
    $utf8_sentence = 'Правительство';
    //gives 13 which is correct
    echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';

    //let's try core PHP
    //gives 26 (the *byte* count). Oops!
    echo strlen($utf8_sentence) . '<br>';
    ?>
  • 5. Inter-Japanese conversion (not on presentation)

    <?php
    //note that this script file is UTF-8
    //set browser to EUC-JP (a Japanese character set)
    header("Content-Type: text/html; charset=EUC-JP");

    //some Japanese ("The bodies packed into an 18-litre can were identified
    //as suspect Fujimori's wife and son.")
    $utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';
    //gives mojibake as UTF-8 is being displayed as EUC-JP
    echo $utf8_sentence . '<br>';

    $euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);
    //gives intact Japanese string
    echo $euc_sentence . '<br>';
    ?>
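The "First play" slide blames the garbled pound sign on UTF-8 bytes being read as ISO-8859-1. Here is a minimal sketch that reproduces this without a browser. It takes each byte of the UTF-8 string as one ISO-8859-1 character, which is exactly the browser's mistake, and re-encodes the result as UTF-8 so it can be printed. The helper name `simulate_latin1_misread()` is hypothetical and not part of PHP.

```php
<?php
// Sketch: reproduce the mojibake from the "First play" slide without a browser.
// simulate_latin1_misread() is a hypothetical helper, not a PHP built-in.

function simulate_latin1_misread(string $utf8): string
{
    // Treat every byte of the UTF-8 string as one ISO-8859-1 character,
    // then re-encode as UTF-8 so the garbled result can be echoed here
    return iconv('ISO-8859-1', 'UTF-8', $utf8);
}

$utf8_sentence = 'That will be £500 please';

// '£' is two bytes in UTF-8 (C2 A3) but a single byte in ISO-8859-1 (A3)
echo bin2hex('£'), "\n";                               // c2a3
echo bin2hex(iconv('UTF-8', 'ISO-8859-1', '£')), "\n"; // a3

// The lead byte C2 is read as 'Â', so '£' turns into 'Â£'
echo simulate_latin1_misread($utf8_sentence), "\n";    // That will be Â£500 please
```

Every byte value is a valid ISO-8859-1 character, so this direction of conversion cannot fail. That makes it a dependable way to simulate the mismatch.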
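The Korean examples show iconv producing a notice, an empty string, `?` marks (`//TRANSLIT`) or bare spaces (`//IGNORE`) for characters that ISO-8859-1 lacks. That behaviour can vary with the PHP version and the iconv implementation underneath. As an alternative, you can deal with those characters yourself before calling iconv. ISO-8859-1 covers exactly U+0000 to U+00FF, so a PCRE character class can test for, replace or strip anything outside that range. This sketch swaps in `preg_match()`/`preg_replace()` and does not reproduce what iconv does internally. The helper names and the `NOT_LATIN1` constant are hypothetical.

```php
<?php
// Sketch: handle characters outside ISO-8859-1 *before* calling iconv,
// so the result doesn't depend on the iconv implementation.
// fits_latin1(), to_latin1_replace(), to_latin1_drop() and NOT_LATIN1
// are hypothetical names, not PHP built-ins.

// ISO-8859-1 maps exactly the code points U+0000..U+00FF
const NOT_LATIN1 = '/[^\x{0000}-\x{00FF}]/u';

function fits_latin1(string $utf8): bool
{
    // preg_match() returns 0 for "no match", 1 for a match, false on invalid UTF-8
    return preg_match(NOT_LATIN1, $utf8) === 0;
}

// Replace each unrepresentable character with '?', then convert;
// after the replacement iconv has nothing it cannot encode
function to_latin1_replace(string $utf8): string
{
    return iconv('UTF-8', 'ISO-8859-1', preg_replace(NOT_LATIN1, '?', $utf8));
}

// Drop each unrepresentable character, then convert
function to_latin1_drop(string $utf8): string
{
    return iconv('UTF-8', 'ISO-8859-1', preg_replace(NOT_LATIN1, '', $utf8));
}

$korean = '연예가 뒷 이야기';
var_dump(fits_latin1('That will be £500 please')); // bool(true)
var_dump(fits_latin1($korean));                    // bool(false)
var_dump(to_latin1_replace($korean));              // string(9) "??? ? ???"
var_dump(to_latin1_drop($korean));                 // string(2) "  "
```

The design trade-off: the regex only knows "in Latin-1 or not". It will never approximate `é` from `ę` the way a transliteration table might, but its output is the same on every server.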
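The German script gives three different results under the "C", en_GB and de_DE locales, so `ASCII//TRANSLIT` output depends on which locales the server has installed. If you always want the de_DE-style result, one option is an explicit replacement table applied with `strtr()` before checking that only ASCII remains. This is a swapped-in technique rather than iconv's own transliteration. The function `german_to_ascii()` is hypothetical, and its table deliberately covers only the German special letters.

```php
<?php
// Sketch: German-style transliteration to 7-bit ASCII that does not depend
// on setlocale(). german_to_ascii() is a hypothetical helper; its table only
// covers the German special letters.

function german_to_ascii(string $utf8): ?string
{
    static $table = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ß' => 'ss',
    ];
    // strtr() with an array replaces whole (multi-byte) keys, so it is
    // safe to use on UTF-8 input
    $flattened = strtr($utf8, $table);

    // Any byte >= 0x80 means a character the table didn't cover;
    // return null and let the caller decide what to do
    if (preg_match('/[^\x00-\x7F]/', $flattened)) {
        return null;
    }
    return $flattened;
}

$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
echo german_to_ascii($utf8_sentence), "\n";
// Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz
```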
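The `iconv_strlen()` slide shows `strlen()` counting bytes rather than characters. The same trap applies when cutting strings. `substr()` also counts bytes, so it can split a two-byte Cyrillic letter in half and leave invalid UTF-8, whereas `iconv_substr()` counts characters. The sketch below uses the Russian word from that slide. Its validity check, an empty pattern with the `u` modifier (which fails on malformed UTF-8), and the `is_valid_utf8()` name are additions for illustration.

```php
<?php
// Sketch: byte-based substr() vs character-based iconv_substr()
// on the Russian word from the iconv_strlen() slide.
// is_valid_utf8() is a hypothetical helper, not a PHP built-in.

// An empty pattern with the u modifier matches any valid UTF-8 string
// and makes preg_match() return false on malformed input
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8_sentence = 'Правительство'; // 13 characters, 26 bytes

$by_bytes = substr($utf8_sentence, 0, 5);                // first 5 *bytes*
$by_chars = iconv_substr($utf8_sentence, 0, 5, 'UTF-8'); // first 5 characters

var_dump($by_chars);                // string(10) "Прави"
var_dump(is_valid_utf8($by_chars)); // bool(true)
var_dump(is_valid_utf8($by_bytes)); // bool(false): 'П', 'р' and half of 'а'
```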
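The inter-Japanese example converts UTF-8 to EUC-JP so the browser can show it intact. For text that fits in EUC-JP, the conversion should also be lossless in both directions. This sketch checks a round trip using the title from the `ob_iconv_handler` slide. It also compares sizes: each of these kanji takes three bytes in UTF-8 but two in EUC-JP. The round-trip check itself is illustrative and not part of the original slides.

```php
<?php
// Sketch: round-trip a Japanese string through EUC-JP and compare sizes.
// The title comes from the ob_iconv_handler slide.

$utf8_title = '指輪物語';

$euc_title = iconv('UTF-8', 'EUC-JP', $utf8_title);
$back      = iconv('EUC-JP', 'UTF-8', $euc_title);

echo strlen($utf8_title), "\n";  // 12: 3 bytes per kanji in UTF-8
echo strlen($euc_title), "\n";   // 8: 2 bytes per kanji in EUC-JP
var_dump($back === $utf8_title); // bool(true): nothing was lost
```

A round trip like this is a cheap sanity check before switching a page's output encoding. Characters outside EUC-JP's repertoire would make the first `iconv()` call fail rather than come back changed.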
