Character sets and iconv
This presentation is about character sets and the iconv library (with usage examples in PHP)
By Daniel Rhodes of Warp Asylum http://www.warpasylum.co.uk
What is a character set?
A mapping: character x in human language y is stored as numeric value z
Western European languages often use 8-bit ISO 8859-1
English can be written in 7-bit ASCII!
Some languages have complex / numerous characters and need 2, 3 or even 4 bytes to represent one character!
So, many different character sets exist
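The byte counts above can be checked with a minimal sketch. It assumes Python's built-in codecs as a stand-in for whatever character set support your platform provides; the sample characters and the helper name `encoded_bytes` are illustrative only.

```python
# Characters paired with a character set (Python codec name) that contains them.
SAMPLES = [
    ("A", "ascii"),        # English letter, 7-bit ASCII
    ("£", "iso-8859-1"),   # pound sign, 8-bit ISO 8859-1
    ("日", "shift_jis"),    # Japanese kanji, Shift_JIS
    ("日", "utf-8"),        # the same kanji, UTF-8
    ("😀", "utf-8"),        # emoji, UTF-8
]


def encoded_bytes(char, charset):
    """Return the bytes that represent `char` in `charset` (hypothetical helper)."""
    return char.encode(charset)


if __name__ == "__main__":
    for char, charset in SAMPLES:
        raw = encoded_bytes(char, charset)
        # Show the Unicode code point and the raw bytes in hex
        print(f"U+{ord(char):04X} in {charset}: {raw.hex()} ({len(raw)} byte(s))")
```

The same kanji takes 2 bytes in Shift_JIS and 3 in UTF-8, and the emoji takes 4 bytes in UTF-8.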
More about character sets
Even the same language may have many different character sets
Character sets tend not to be compatible
So, conversion is necessary and useful
But Unicode is emerging as a modernising, unifying character set
Unicode is one HUGE character set that aims to represent the characters of every language in use!
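A minimal sketch of both points on this slide, again assuming Python's built-in codecs. First, one byte value means different characters in different character sets. Second, a single Unicode encoding can hold text that no single legacy set can. The helper names `decode_byte` and `fits` and the sample text are illustrative only.

```python
import unicodedata


def decode_byte(raw, charset):
    """Interpret the bytes `raw` according to `charset`."""
    return raw.decode(charset)


def fits(text, charset):
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Same byte, different character in each character set
    for charset in ("iso-8859-1", "iso-8859-15", "iso-8859-5"):
        char = decode_byte(b"\xa4", charset)
        print(f"byte a4 as {charset}: U+{ord(char):04X} {unicodedata.name(char)}")

    # English, German and Japanese in one string
    mixed = "£5, schön, 日本語"
    for charset in ("ascii", "iso-8859-1", "utf-8"):
        print(f"{charset}: {'fits' if fits(mixed, charset) else 'does not fit'}")
```

Only the Unicode encoding (UTF-8 here) accepts the mixed string; the others reject at least one character.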
Character sets? Who cares!
Anglophones are very lucky, as everything seems to "just work" (even if different character sets are interacting in the background)
English is not the only language!
An app expecting character set x but receiving y (or applying an incorrect character set conversion) will produce mojibake
Mojibake? What's that?
A great Japanese word for garbled text: literally characters (moji) that have transformed (bake)
Often encountered in Japanese computing, which has two traditional character sets (Shift_JIS and EUC-JP), Unicode, and a separate character set for emails (ISO-2022-JP)!
Shouldn't really happen at all in modern computing
But it still does, mostly due to lack of implementation knowledge
Mojibake in English
A slight case of mojibake here: the pound symbols (£) have been garbled
Mojibake in German
More severe now: the umlauted vowels (ä, ö and ü) have been garbled
Mojibake in Japanese
Ouch!
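Mojibake like this can be reproduced on purpose. The sketch below assumes one common cause, text written as UTF-8 but read as ISO 8859-1 (and, for Japanese, Shift_JIS read as UTF-8). It uses Python's built-in codecs; the helper name `misread` and the sample strings are illustrative, and the cases pictured on the slides may have had other causes.

```python
def misread(text, actual, assumed):
    """Encode `text` in the `actual` character set, then decode the bytes
    as if they were in the `assumed` one, as a misconfigured app would.
    Bytes that are invalid in `assumed` become the replacement character."""
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    # English: the pound sign picks up a stray leading character
    print(misread("Price: £5", "utf-8", "iso-8859-1"))
    # German: each umlauted vowel turns into two characters
    print(misread("schön, für, später", "utf-8", "iso-8859-1"))
    # Japanese: Shift_JIS bytes read as UTF-8 are almost entirely invalid
    print(misread("文字化け", "shift_jis", "utf-8"))
```

In the first two cases no data is lost, so re-encoding the garbled text as ISO 8859-1 and decoding it as UTF-8 recovers the original. In the third case the replacement characters have destroyed the original bytes.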
What is the iconv library?
An API to convert between character sets
Works on strings
Some support for transliteration (changing / substituting characters in source character set that don't exist in target character set)
Your implementation may vary, but a HUGE number of character sets are supported
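The core idea (take a string in one character set, return it in another) can be sketched without iconv itself. The example below uses Python's built-in codecs as a stand-in, so it shows the concept rather than the iconv API; the helper name `convert` and its `errors` parameter are assumptions of this sketch.

```python
def convert(data, source, target, errors="strict"):
    """Re-encode `data` (bytes) from the `source` to the `target` character set.

    `errors` is passed to the encoding step: "strict" raises on characters
    missing from the target set, "replace" substitutes "?", "ignore" drops them.
    """
    return data.decode(source).encode(target, errors=errors)


if __name__ == "__main__":
    latin1 = b"Gr\xfc\xdfe"  # "Grüße" in ISO 8859-1
    print(convert(latin1, "iso-8859-1", "utf-8"))  # same text as UTF-8 bytes

    euro = "10 €".encode("utf-8")
    try:
        convert(euro, "utf-8", "iso-8859-1")  # ISO 8859-1 has no euro sign
    except UnicodeEncodeError as exc:
        print("cannot convert:", exc.reason)

    # Substituting a placeholder is the crudest form of transliteration
    print(convert(euro, "utf-8", "iso-8859-1", errors="replace"))
```

Deciding what to do with characters that are missing from the target set is where transliteration comes in, and where implementations tend to differ.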
Some iconv use cases
Convert legacy character set ↔ Unicode
Convert backend ↔ frontend character sets
Convert file's character set for import / export
Transliterate to remove unwanted characters
Transliterate to make a string safe for use in a URL or filename
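Two of these use cases, sketched in Python as a stand-in for iconv: converting a file's character set, and reducing text to URL-safe ASCII. The helper names `convert_file` and `slugify` are hypothetical. The slug step is only an approximation of transliteration: it strips accents via Unicode decomposition and silently drops characters with no decomposition (such as ß or kanji), whereas a real transliterator would substitute them.

```python
import os
import re
import tempfile
import unicodedata


def convert_file(src_path, src_charset, dst_path, dst_charset):
    """Copy a text file, re-encoding it from `src_charset` to `dst_charset`."""
    # newline="" keeps line endings exactly as they are in the source file
    with open(src_path, encoding=src_charset, newline="") as src, \
         open(dst_path, "w", encoding=dst_charset, newline="") as dst:
        for line in src:
            dst.write(line)


def slugify(text):
    """Reduce `text` to lowercase ASCII letters, digits and hyphens."""
    # NFKD splits accented letters into base letter + combining mark
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" then drops the combining marks
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        legacy = os.path.join(tmp, "legacy.txt")
        modern = os.path.join(tmp, "modern.txt")
        with open(legacy, "wb") as f:
            f.write("Menü: £5\r\n".encode("iso-8859-1"))
        convert_file(legacy, "iso-8859-1", modern, "utf-8")
        with open(modern, "rb") as f:
            print(f.read())  # the UTF-8 bytes of the same text

    print(slugify("Crème brûlée à la carte!"))
```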
