First play

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

$utf8_sentence = 'That will be £500 please';

//gives [That will be Â£500 please] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';

$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives [That will be £500 please] as no mismatch
//between actual character set of string
//and browser
echo $iso_sentence . '<br>';

//YOU TRY IT! When viewing this in your browser,
//switch the page's encoding to UTF-8 and the mojibake
//swaps lines: the first line is now correct and the
//second shows a replacement character where the £ was

?>
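
Where does "Â£" come from? In UTF-8 the pound sign (U+00A3) is the two bytes C2 A3, and a browser reading those as ISO-8859-1 shows each byte as a character of its own: C2 is "Â" and A3 is "£". In ISO-8859-1 the pound sign is just the single byte A3. A small sketch (not from the slides, and assuming the file is saved as UTF-8 like the others) that makes the bytes visible with bin2hex():

```php
<?php

//this file is assumed to be saved as UTF-8, like the examples above
$utf8_pound = '£';

//UTF-8 encodes U+00A3 as two bytes
echo bin2hex($utf8_pound) . '<br>'; // c2a3

//ISO-8859-1 encodes it as the single byte A3
$iso_pound = iconv('UTF-8', 'ISO-8859-1', $utf8_pound);
echo bin2hex($iso_pound) . '<br>'; // a3

//reading the UTF-8 bytes as if they were ISO-8859-1 (and re-encoding
//the result as UTF-8 so it can be printed) reproduces the mojibake
$mojibake = iconv('ISO-8859-1', 'UTF-8', $utf8_pound);
echo $mojibake . '<br>'; // Â£
```
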

Within reason

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean, roughly "showbiz backstories" (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake like [ì—°ì˜ˆê°€ ë’· ì´ì•¼ê¸°] as UTF-8 is
//being displayed as ISO-8859-1
echo $utf8_sentence . '<br>';

//gives [Notice: iconv(): Detected an illegal character in input string]
//(the Korean is valid UTF-8; it just has no ISO-8859-1 equivalent)
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives an empty string here; newer PHP versions return false instead
var_dump($iso_sentence);

?>
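
Since the failure shows up only as a notice plus an empty string or false (depending on the PHP version), it can be safer to check that a conversion is lossless before trusting it. A minimal sketch, assuming a round trip back to UTF-8 is an acceptable test; to_latin1_lossless() is a made-up name, not a PHP function:

```php
<?php

//Hypothetical helper (not part of PHP): returns the ISO-8859-1 version of a
//UTF-8 string only if nothing is lost in the conversion, otherwise null
function to_latin1_lossless(string $utf8): ?string
{
    //'@' hides the "Detected an illegal character" notice; the return
    //value is checked explicitly instead
    $latin1 = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($latin1 === false) {
        return null;
    }

    //converting back must reproduce the input exactly; this also catches
    //builds that return a truncated string or substitute characters
    //instead of failing outright
    if (iconv('ISO-8859-1', 'UTF-8', $latin1) !== $utf8) {
        return null;
    }

    return $latin1;
}

var_dump(to_latin1_lossless('That will be £500 please') !== null); // bool(true)
var_dump(to_latin1_lossless('연예가 뒷 이야기'));                      // NULL
```

The round trip is the design choice here: it does not care whether a given iconv build signals the problem by failing, truncating or substituting.
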
First transliteration

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake like [ì—°ì˜ˆê°€ ë’· ì´ì•¼ê¸°] as UTF-8 is
//being displayed as ISO-8859-1
echo $utf8_sentence . '<br>';

//approximate characters that aren't in target character set
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);

//gives [??? ? ???]
echo $iso_sentence . '<br>';

?>
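
//TRANSLIT has nothing sensible to offer for Hangul, so every syllable becomes "?". Because the output here is HTML, one alternative (swapped in here, not something the slides use) is to write characters outside ISO-8859-1 as numeric character references such as &#46263;, which the browser renders correctly whatever the page's character set. A sketch relying on the fact that U+0000 to U+00FF map one-to-one onto ISO-8859-1 bytes; to_latin1_with_ncr() is a hypothetical name:

```php
<?php

//Hypothetical helper: UTF-8 in, ISO-8859-1 out, with any character that
//ISO-8859-1 can't hold written as an HTML numeric character reference
function to_latin1_with_ncr(string $utf8): string
{
    $out = '';
    $length = iconv_strlen($utf8, 'UTF-8');

    for ($i = 0; $i < $length; $i++) {
        //one character at a time (iconv_substr() counts characters, not bytes)
        $char = iconv_substr($utf8, $i, 1, 'UTF-8');

        //UTF-32BE holds each code point in exactly four big-endian bytes
        $code_point = unpack('N', iconv('UTF-8', 'UTF-32BE', $char))[1];

        //U+0000..U+00FF have the same numeric value as their ISO-8859-1 byte
        $out .= $code_point <= 0xFF ? chr($code_point) : '&#' . $code_point . ';';
    }

    return $out;
}

//gives [&#50672;&#50696;&#44032; &#46263; &#51060;&#50556;&#44592;],
//which the browser displays as the original Korean
echo to_latin1_with_ncr('연예가 뒷 이야기') . '<br>';
```
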

More realistic transliteration (extended)

<?php

//note that this script file is UTF-8

//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

//fine as UTF-8 is being displayed as UTF-8
echo $utf8_sentence . '<br>';

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
//which is not quite what we expected (only 'ß' has been flattened)
echo $trans_sentence . '<br>';

//BUT iconv's transliteration depends on the system locale, so let's experiment:

$current_locale = setlocale(LC_ALL, '0');
//gives, for me, "C", the minimal default (POSIX) locale
echo $current_locale . '<br>';
//switch locale; iconv takes its transliteration rules from it
//(setlocale() returns false if the locale isn't installed)
setlocale(LC_ALL, 'en_GB');

//try again...
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII!
echo $trans_sentence . '<br>';

//out of curiosity...
setlocale(LC_ALL, 'de_DE');

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . '<br>';

?>
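
Because setlocale() changes the locale for the whole process (and for every later locale-aware call), it may be worth scoping the change to the one iconv() call. A sketch under two assumptions: that, as on glibc, iconv takes its transliteration rules from the LC_CTYPE category alone, and that the candidate locale names exist on the system. ascii_translit() is a hypothetical helper name:

```php
<?php

//Hypothetical helper: ASCII transliteration under a chosen LC_CTYPE locale.
//setlocale() affects the whole process, so the previous locale is restored.
//Assumes (as on glibc) that only LC_CTYPE drives //TRANSLIT.
function ascii_translit(string $utf8, array $locales)
{
    //passing '0' queries the current setting without changing it
    $previous = setlocale(LC_CTYPE, '0');

    //each name is tried in turn; setlocale() returns false if none is installed
    setlocale(LC_CTYPE, $locales);

    try {
        return iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);
    } finally {
        //restore the old locale even if iconv() throws or fails
        setlocale(LC_CTYPE, $previous);
    }
}

//per the results above, a glibc system with a German locale installed
//should print Goebel here; other setups may differ
echo ascii_translit('Göbel', ['de_DE.UTF-8', 'de_DE']) . '<br>';
```

If none of the candidate locales is installed, setlocale() quietly fails and the conversion runs under whatever locale was already active, so a "G?bel"-style result like the one from the "C" locale is the likely outcome.
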

Ignore example

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives mojibake like [ì—°ì˜ˆê°€ ë’· ì´ì•¼ê¸°] as UTF-8 is
//being displayed as ISO-8859-1
echo $utf8_sentence . '<br>';

//discard characters that aren't in target character set
//STILL gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);

//gives "  " (two space characters), although with glibc on
//newer PHP versions //IGNORE may make iconv() return false instead
var_dump($iso_sentence);

?>
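
If //IGNORE misbehaves on a particular build, another option (swapped in here, not from the slides) is to strip everything outside ISO-8859-1 with a regular expression first, so that iconv() only ever sees characters it can convert. A sketch, assuming the input is UTF-8; to_latin1_dropping_rest() is a hypothetical name:

```php
<?php

//Hypothetical helper: drop every character outside ISO-8859-1 *before*
//calling iconv(), so the conversion never meets an unconvertible character
//and doesn't depend on how a given iconv build treats //IGNORE
function to_latin1_dropping_rest(string $utf8)
{
    //U+0000..U+00FF is exactly the ISO-8859-1 repertoire; the /u modifier
    //makes PCRE work on UTF-8 characters rather than bytes
    $filtered = preg_replace('/[^\x{0000}-\x{00FF}]/u', '', $utf8);

    //preg_replace() returns null if the input isn't valid UTF-8
    if ($filtered === null) {
        return false;
    }

    return iconv('UTF-8', 'ISO-8859-1', $filtered);
}

//gives string(2) "  " (the two space characters)
var_dump(to_latin1_dropping_rest('연예가 뒷 이야기'));
```
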
ob_iconv_handler

<?php

//note that this script file is UTF-8

//character set of PHP scripts etc
//(the iconv.* encoding settings are deprecated as of PHP 5.6
//in favour of default_charset)
iconv_set_encoding('internal_encoding', 'UTF-8');

//character set of browser output
//(ob_iconv_handler sends the HTTP header "Content-Type: text/html; charset=ISO-8859-1")
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');

ob_start('ob_iconv_handler');   //start output buffering

//UTF-8 string ("指輪物語" is the Japanese title of The Lord of the Rings)
$utf8_sentence = 'The Japanese title is "指輪物語"';

//when buffer is flushed, outputs [The Japanese title is "????"]
echo $utf8_sentence;

?>
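
ob_iconv_handler depends on the iconv.* settings above, so here is a sketch of doing much the same job with an ordinary output buffer callback instead. latin1_output() is a hypothetical name, and the fallback on failure is just one possible choice:

```php
<?php

//Hypothetical callback doing by hand roughly what ob_iconv_handler does:
//convert the buffered UTF-8 output to ISO-8859-1, approximating where possible
function latin1_output(string $buffer): string
{
    $converted = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $buffer);

    //if conversion fails outright, send the buffer unchanged rather than
    //nothing (the page will show mojibake, but no text disappears)
    return $converted === false ? $buffer : $converted;
}

//the charset in the header has to match what the callback produces
header('Content-Type: text/html; charset=ISO-8859-1');

//with the default chunk size of 0, the buffer is only handed to the callback
//when it is flushed, not in fixed-size pieces that could cut a multibyte
//character in half
ob_start('latin1_output');

echo 'The Japanese title is "指輪物語"';
```
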

iconv_strlen()

<?php

//note that this script file is UTF-8

//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8");

//some Russian, "government" (13 characters)
$utf8_sentence = 'Правительство';

//gives 13 which is correct
echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';

//let's try core PHP
//gives 26 (the *byte* count). Oops!
echo strlen($utf8_sentence) . '<br>';

?>
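
Byte counts matter as soon as you cut strings up, not just when measuring them. A sketch along the same lines (not from the slides) comparing iconv_substr() with core PHP's substr():

```php
<?php

$utf8_sentence = 'Правительство';

//iconv_substr() counts characters: the first five are "Прави"
$first_five = iconv_substr($utf8_sentence, 0, 5, 'UTF-8');
echo $first_five . '<br>';          // Прави
echo strlen($first_five) . '<br>';  // 10 (bytes)

//substr() counts bytes: five bytes is two and a half Cyrillic letters,
//leaving an incomplete UTF-8 sequence at the end
$five_bytes = substr($utf8_sentence, 0, 5);

//preg_match() with /u returns false when the subject isn't valid UTF-8
var_dump(preg_match('//u', $five_bytes)); // bool(false)
```
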
Inter-Japanese conversion (not in the presentation)

<?php

//note that this script file is UTF-8

//set browser to EUC-JP (a Japanese character set)
header("Content-Type: text/html; charset=EUC-JP");

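//some Japanese (a news headline: the bodies packed into 18-litre cans
//have been identified as the wife and son of the suspect, Fujimori)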
$utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';

//gives mojibake as UTF-8 is being displayed as EUC-JP
echo $utf8_sentence . '<br>';

$euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);

//gives intact Japanese string
echo $euc_sentence . '<br>';

?>
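
A short sketch, not from the slides, of what that conversion does to the bytes, using the assumed sample string "日本語" ("Japanese language") rather than the headline above:

```php
<?php

//"日本語": three kanji
$utf8_text = '日本語';

$euc_text = iconv('UTF-8', 'EUC-JP', $utf8_text);

//these kanji take three bytes each in UTF-8 but two each in EUC-JP
echo strlen($utf8_text) . '<br>'; // 9
echo strlen($euc_text) . '<br>';  // 6

//iconv_strlen() needs to be told which character set the string is in
echo iconv_strlen($euc_text, 'EUC-JP') . '<br>'; // 3

//converting back gives the original UTF-8 bytes
var_dump(iconv('EUC-JP', 'UTF-8', $euc_text) === $utf8_text); // bool(true)
```
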

"Character sets and iconv" PHP source code, by Daniel_Rhodes