"Character sets and iconv" PHP source code
First play

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

$utf8_sentence = 'That will be £500 please';

//gives [That will be Â£500 please] as UTF-8 is being displayed
//as ISO-8859-1 (the two UTF-8 bytes of '£' show as two characters)
echo $utf8_sentence . '<br>';

$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives [That will be £500 please] as no mismatch
//between actual character set of string
//and browser
echo $iso_sentence . '<br>';

//YOU TRY IT! When viewing this in your browser,
//set the page's encoding to UTF-8 and you will
//see the mojibake reverse!

?>
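
The mismatch in "First play" is easier to see at the byte level. The following is a minimal sketch, not part of the original slides; it assumes PHP 7.0 or later (for the "\u{...}" escape) and uses bin2hex() purely for illustration.

```php
<?php

//"\u{00A3}" is the pound sign; the escape keeps this sketch
//independent of the encoding the file itself is saved in
$utf8_pound = "\u{00A3}";

//UTF-8 encodes U+00A3 as two bytes
echo bin2hex($utf8_pound) . "\n";   //c2a3

//ISO-8859-1 encodes it as a single byte
$iso_pound = iconv('UTF-8', 'ISO-8859-1', $utf8_pound);
echo bin2hex($iso_pound) . "\n";    //a3

//reading the two UTF-8 bytes as if they were ISO-8859-1
//yields two characters: that is the mojibake
$mojibake = iconv('ISO-8859-1', 'UTF-8', $utf8_pound);
echo bin2hex($mojibake) . "\n";     //c382c2a3

?>
```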

Within reason

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (for example, 뒷 shows as ë’·) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

//gives [Notice: iconv(): Detected an illegal character in input string]
//because Hangul cannot be represented in ISO-8859-1
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives an empty string before PHP 5.4 (the part preceding the first
//illegal character); PHP 5.4 and later return false
var_dump($iso_sentence);

?>
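
Since a failed conversion only raises a notice, it can help to check the return value explicitly. The following is a minimal sketch, not from the original slides: convert_or_null() is a hypothetical helper name, and the code assumes PHP 7.1 or later. The Korean case is expected to fail with glibc's iconv, but other iconv implementations may behave differently, so no output is claimed for it.

```php
<?php

//hypothetical helper: returns the converted string, or null when
//iconv() reports that the conversion failed
function convert_or_null(string $text, string $from, string $to): ?string
{
    //@ suppresses the "illegal character" notice; the return value is checked instead
    $converted = @iconv($from, $to, $text);

    return $converted === false ? null : $converted;
}

//the pound sign (U+00A3) exists in ISO-8859-1, so this succeeds
$pound = convert_or_null("\u{00A3}500", 'UTF-8', 'ISO-8859-1');
echo bin2hex($pound) . "\n";   //a3353030

//Hangul has no ISO-8859-1 equivalent; the result depends on the
//iconv implementation in use
$korean = convert_or_null("\u{C5F0}\u{C608}\u{AC00}", 'UTF-8', 'ISO-8859-1');
var_dump($korean);

?>
```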

First transliteration

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (for example, 뒷 shows as ë’·) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

//approximate characters that aren't in target character set
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);

//gives [??? ? ???] as Hangul has no approximation in ISO-8859-1
echo $iso_sentence . '<br>';

?>

More realistic transliteration (extended)

<?php

//note that this script file is UTF-8

//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

//fine as UTF-8 is being displayed as UTF-8
echo $utf8_sentence . '<br>';

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
//which is not quite what we expected (only 'ß' has been flattened)
echo $trans_sentence . '<br>';

//BUT iconv interacts with the system locale setting so let's have a play:

$current_locale = setlocale(LC_ALL, '0');
//gives, for me, "C" which is a kind of nondescript default
echo $current_locale . '<br>';

//we set the locale of the *target* character set
//(the locale must be installed on the system; setlocale() returns false otherwise)
setlocale(LC_ALL, 'en_GB');

//try again...
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII!
echo $trans_sentence . '<br>';

//out of curiosity...
setlocale(LC_ALL, 'de_DE');

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . '<br>';

?>
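
Because setlocale() changes process-wide state, one option is to switch the locale only for the duration of the transliteration. The following is a minimal sketch, not from the original slides: translit_ascii() is a hypothetical helper, the locale name 'de_DE.UTF-8' is an assumption about what is installed, and the transliterated text depends on the system's iconv and locale data, so no output is claimed.

```php
<?php

//hypothetical helper: transliterate to ASCII under the given locale,
//then put the previous LC_CTYPE setting back
function translit_ascii($utf8_text, $locale)
{
    //passing '0' returns the current setting without changing it
    $previous = setlocale(LC_CTYPE, '0');

    //false means the requested locale is not available on this system
    $applied = setlocale(LC_CTYPE, $locale);

    $text = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_text);

    //restore whatever was active before
    setlocale(LC_CTYPE, $previous);

    return ['locale_applied' => $applied !== false, 'text' => $text];
}

//the result varies by system, so it is only dumped here
var_dump(translit_ascii('Göbel', 'de_DE.UTF-8'));

?>
```

Only LC_CTYPE is touched here rather than LC_ALL, on the assumption that character classification is the category the transliteration depends on.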

Ignore example

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake (for example, 뒷 shows as ë’·) as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

//discard characters that aren't in target character set
//STILL gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);

//gives "  " (two space characters)
var_dump($iso_sentence);

?>

ob_iconv_handler

<?php

//note that this script file is UTF-8

//character set of PHP scripts etc
//(iconv_set_encoding() is deprecated as of PHP 5.6 in favour of the
//default_charset ini setting)
iconv_set_encoding('internal_encoding', 'UTF-8');

//character set of browser output
//(sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1")
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');

ob_start('ob_iconv_handler');   //start output buffering

//Unicode string
$utf8_sentence = 'The Japanese title is "指輪物語"';

//when buffer is flushed, outputs [The Japanese title is "????"]
echo $utf8_sentence;

?>

iconv_strlen()

<?php

//note that this script file is UTF-8

//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some Russian (13 characters)
$utf8_sentence = 'Правительство';

//gives 13 which is correct
echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';

//let's try core PHP
//gives 26 (the *byte* count). Oops!
echo strlen($utf8_sentence) . '<br>';

?>
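
The same byte-versus-character difference applies to slicing and searching. The following is a minimal sketch, not from the original slides, using iconv_substr() and iconv_strpos() next to their byte-oriented counterparts; like the examples above, it assumes the script file is saved as UTF-8.

```php
<?php

//note that this script file is UTF-8

//some Russian (13 characters, 26 bytes)
$utf8_sentence = 'Правительство';

//character-aware: the first 5 characters
echo iconv_substr($utf8_sentence, 0, 5, 'UTF-8') . "\n";   //Прави

//byte-oriented: the first 5 bytes, which cuts the third character in half
echo bin2hex(substr($utf8_sentence, 0, 5)) . "\n";         //d09fd180d0

//character offset of the substring
echo iconv_strpos($utf8_sentence, 'тель', 0, 'UTF-8') . "\n";   //5

//byte offset of the same substring
echo strpos($utf8_sentence, 'тель') . "\n";                     //10

?>
```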

Inter-Japanese conversion (not on presentation)

<?php

//note that this script file is UTF-8

//set browser to EUC-JP (a Japanese character set)
header("Content-Type: text/html; charset=EUC-JP");

//some Japanese
$utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';

//gives mojibake as UTF-8 is being displayed as EUC-JP
echo $utf8_sentence . '<br>';

$euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);

//gives intact Japanese string
echo $euc_sentence . '<br>';

?>
