U+0000 to U+007F (0 to 127): 1 byte
ASCII text is already valid UTF-8
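High bit set indicates a multi-byte character
The high bits of the first byte indicate how many bytes are used to
represent a specific character, and every continuation byte starts with
the bits 10. Software can therefore easily read a UTF-8 stream, even
starting in the middle, by skipping to the next non-continuation byte.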
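A minimal Python sketch of the two properties above: how many bytes each code point range needs, and how a reader that starts at an arbitrary byte offset can find the next character boundary. The helper names `utf8_length`, `is_continuation` and `resync` are hypothetical, chosen for illustration; the byte ranges follow the UTF-8 definition in RFC 3629.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point.

    Surrogates (U+D800..U+DFFF) are not valid in UTF-8; this sketch does not check for them.
    """
    if code_point < 0x80:
        return 1          # 0xxxxxxx (ASCII)
    if code_point < 0x800:
        return 2          # 110xxxxx 10xxxxxx
    if code_point < 0x10000:
        return 3          # 1110xxxx 10xxxxxx 10xxxxxx
    if code_point <= 0x10FFFF:
        return 4          # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    raise ValueError("not a Unicode code point")


def is_continuation(byte: int) -> bool:
    # Continuation bytes always have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def resync(data: bytes, start: int) -> int:
    """Skip forward from an arbitrary offset to the start of the next character."""
    i = start
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


if __name__ == "__main__":
    for ch in "Aé€😀":
        encoded = ch.encode("utf-8")
        # The lead byte's high bits announce how long the sequence is.
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
        assert len(encoded) == utf8_length(ord(ch))

    text = "naïve café".encode("utf-8")
    # Offset 3 lands on the second byte of "ï"; resync moves to the "v".
    pos = resync(text, 3)
    print(text[pos:].decode("utf-8"))
```

Because continuation bytes can never be mistaken for lead bytes, the reader loses at most one partial character before it is back in step.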
Common on the internet and in file systems
• XML: default encoding
• Flash: only encoding
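To illustrate "XML: default encoding", here is a small sketch using Python's standard `xml.etree.ElementTree`: a document with no encoding declaration and no byte order mark is read as UTF-8, while any other encoding (ISO-8859-1 here, chosen only as an example) has to be declared.

```python
import xml.etree.ElementTree as ET

# No XML declaration and no byte order mark: the parser falls back to UTF-8,
# so the bytes C3 A9 are read as the single character "é".
utf8_doc = "<city>Montréal</city>".encode("utf-8")
root = ET.fromstring(utf8_doc)
print(root.text)

# Any other encoding has to be declared explicitly in the XML declaration.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Montréal</city>'.encode("latin-1")
print(ET.fromstring(latin1_doc).text)

# Undeclared Latin-1 bytes are not valid UTF-8, so parsing fails.
try:
    ET.fromstring("<city>Montréal</city>".encode("latin-1"))
except ET.ParseError as err:
    print("ParseError:", err)
```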
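UTF-8 encoded text may be larger than in other encodings (most CJK characters take 3 bytes in UTF-8 but 2 in UTF-16)
Possible to split a string mid-character when cutting it by byte count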
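A sketch of both drawbacks in Python: the same CJK string is 9 bytes in UTF-8 and 6 in UTF-16, and slicing the UTF-8 bytes at an arbitrary length can cut a character in half. `truncate_utf8` is a hypothetical helper, not a standard library function; it backs up to the previous character boundary before cutting.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and cut it to at most max_bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move the cut back to that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "日本語"
print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))

naive = text.encode("utf-8")[:4]      # ends in the middle of "本"
try:
    naive.decode("utf-8")
except UnicodeDecodeError as err:
    print("naive cut:", err.reason)

print(truncate_utf8(text, 4).decode("utf-8"))
```

The result may be shorter than `max_bytes`, which is the price of never emitting a broken character.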
Not all implementations are complete
For example, MySQL 5's "utf8" character set
supports only 3 bytes per character
Characters for most modern spoken languages fit in 3 bytes:
the "Basic Multilingual Plane" (U+0000 to U+FFFF)
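Given a 3-byte-only implementation, it can be useful to check text before storing it. This Python sketch (function names are hypothetical, and it does not talk to MySQL) flags characters outside the Basic Multilingual Plane, which are exactly the ones that need 4 bytes in UTF-8.

```python
from typing import List


def fits_in_three_bytes(text: str) -> bool:
    # BMP code points (U+0000..U+FFFF) never need more than 3 UTF-8 bytes.
    return all(ord(ch) <= 0xFFFF for ch in text)


def supplementary_chars(text: str) -> List[str]:
    # Characters beyond the BMP take 4 bytes and would be rejected or
    # mangled by an implementation that supports only 3 bytes.
    return [f"U+{ord(ch):X}" for ch in text if ord(ch) > 0xFFFF]


for sample in ("Grüße, 世界", "Thumbs up 👍"):
    widest = max(len(ch.encode("utf-8")) for ch in sample)
    print(sample, fits_in_three_bytes(sample), widest, supplementary_chars(sample))
```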
In May 2001, the Unicode Technical Committee rejected the Klingon proposal;
however, Michael Everson created a mapping of pIqaD into the Private Use
Area of Unicode, which is listed in the ConScript Unicode Registry
(U+F8D0 to U+F8FF).
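Private Use Area code points are valid Unicode but carry no standard meaning, which is why a registry like CSUR (plus a matching font) is needed. A quick Python check with the standard `unicodedata` module, using the pIqaD range quoted above, shows that these code points have general category "Co" (private use) and no character names.

```python
import unicodedata

PIQAD_START, PIQAD_END = 0xF8D0, 0xF8FF  # CSUR pIqaD range quoted in the text

for cp in (PIQAD_START, PIQAD_END):
    ch = chr(cp)
    # name() raises ValueError for unnamed code points unless a default is given.
    print(f"U+{cp:04X} category={unicodedata.category(ch)} "
          f"name={unicodedata.name(ch, '<none>')} utf8={ch.encode('utf-8').hex(' ')}")
```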
The Tengwar script has been proposed for the Unicode standard. The code points
are subject to change; the range U+16080 to U+160FF in the SMP (Supplementary
Multilingual Plane) is tentatively allocated for Tengwar according to the current Unicode roadmap.
You need an appropriate font installed
to display Unicode characters.
• You can specify which character set you
want back when you send a form POST
• This is informational for the server
• Just setting these won’t change how your
app behaves, unless your web app has code