zh-Hant-TW: Chinese written in Traditional Chinese script, as used in Taiwan
Obsoletes RFC 3066 & RFC 1766
Often still used in products/earlier standards
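The tag zh-Hant-TW combines a language, a script and a region subtag. As a minimal sketch of that structure, the Python below splits a BCP 47-style tag and applies the conventional casing (lowercase language, title-case script, uppercase region). It assumes a simplified grammar without extlang, extension or private-use subtags; `parse_language_tag` and `LanguageTag` are hypothetical names, not part of any library.

```python
from typing import List, NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: List[str]


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a simplified BCP 47 tag into its main subtags.

    Hypothetical helper: handles only language[-script][-region][-variant...].
    """
    parts = tag.split("-")
    language = parts[0].lower()          # language subtags are lowercase by convention
    script = region = None
    rest = parts[1:]
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest.pop(0).title()     # script subtags are title case, e.g. "Hant"
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest.pop(0).upper()     # two-letter country or three-digit area code
    return LanguageTag(language, script, region, [v.lower() for v in rest])


print(parse_language_tag("zh-Hant-TW"))
print(parse_language_tag("es-419"))
```

Because tags are case-insensitive, normalizing the casing like this lets "Zh-hant-tw" and "zh-Hant-TW" compare equal.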
Internationalization Changes
Intro to Web Internationalization POSIX Locales
Cross-platform API
Locale identifiers vary across platforms
Un*x: en_US
Windows: English_United States
Results can be platform-dependent
Basis for locale functionality in all scripting languages
Provides functionality for
Number Formatting: 1,000,000.23
Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
Sorting
String processing (e.g. upper-/lower-casing)
Some translated strings, such as weekday names and yes/no messages
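Because locale names differ between Unix and Windows and results depend on which locales are installed, portable code usually tries several candidate names. A minimal sketch using Python's standard `locale` module; the helper `format_number` and its candidate list are hypothetical, and it falls back to the always-available "C" locale.

```python
import locale


def format_number(value: float, candidates: list) -> str:
    """Format value with digit grouping using the first locale name the
    platform accepts, falling back to the "C" locale otherwise."""
    saved = locale.setlocale(locale.LC_NUMERIC)      # query the current setting
    try:
        for name in candidates + ["C"]:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
                break
            except locale.Error:
                continue                             # name not installed here
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)   # restore process-wide state


# Unix-style and Windows-style names for the same locale
print(format_number(1000000.23, ["en_US.UTF-8", "en_US", "English_United States"]))
```

On a system with an English (US) locale installed this prints 1,000,000.23; in the "C" fallback it prints 1000000.23, which illustrates how results can be platform-dependent. Note that `setlocale` changes process-wide state and is not thread-safe, so server code often prefers a library with explicit locale objects.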
International Components for Unicode
IBM Open Source project
Extensive locale data and APIs
Data vetted as part of Common Locale Data Repository (CLDR) project
Java and C++ APIs
Wrappers for scripting languages
PyICU (Python)
ICU4R (Ruby) – abandoned?
DIY – difficult because of API complexity and character encoding issues
Microsoft Internationalization APIs
Windows NLS API
Microsoft .NET Framework System.Globalization namespace
Similar set of data to ICU
Data vetted by Microsoft subsidiaries
APIs accessible from all Microsoft programming languages
Unicode 5.0
Code space by plane: 00000 Basic Multilingual Plane; 10000 dead languages & math; 20000 Han characters; E0000 language tags; F0000 and 100000 private use
Basic Multilingual Plane (0000 to FFFF): alphabets, punctuation, Asian languages, Han characters, Yi, Hangul, surrogates, private use, legacy/compatibility
99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
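The code space consists of 17 planes of 0x10000 code points each, which is where the total of 1,114,112 comes from. A small Python sketch that locates the plane of a code point; the helper `plane_of` is hypothetical, and the plane names are the standard Unicode ones rather than the labels used on the slide.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16          # each plane holds 0x10000 code points


PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
    14: "Supplementary Special-purpose Plane",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}

# 17 planes of 65,536 code points each
TOTAL_CODE_POINTS = 17 * 0x10000

# Latin letter, euro sign, a Gothic letter (dead language), a Han ideograph
for ch in ["A", "\u20ac", "\U00010330", "\U00020000"]:
    cp = ord(ch)
    print(f"U+{cp:04X}", PLANE_NAMES.get(plane_of(cp), "unnamed plane"))
```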
Unicode Encoding Forms
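Unicode defines three encoding forms, UTF-8, UTF-16 and UTF-32, which map the same code points to byte sequences of different lengths. A minimal Python sketch comparing them; the helper `encoded_lengths` is hypothetical, and the big-endian codecs are used so that no byte order mark is counted.

```python
def encoded_lengths(text: str) -> dict:
    """Return the number of bytes text occupies in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),       # 1 to 4 bytes per code point
        "UTF-16": len(text.encode("utf-16-be")),  # 2 bytes, or 4 for a surrogate pair
        "UTF-32": len(text.encode("utf-32-be")),  # always 4 bytes per code point
    }


# ASCII letter, Latin-1 letter, euro sign, supplementary-plane Han ideograph
for sample in ["A", "\u00e9", "\u20ac", "\U00020000"]:
    print(f"U+{ord(sample):04X}", encoded_lengths(sample))
```

Characters outside the Basic Multilingual Plane are the ones that need surrogate pairs in UTF-16.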
Unicode on the Web
XML processors are required to accept UTF-8 and UTF-16
Encoding declaration precedence
HTTP Content-Type header charset declaration
XML encoding declaration (XHTML)
meta charset declaration in (X)HTML
link element charset attribute
Approx. 4% of pages have encoding errors*
No real need for character references when using a Unicode encoding
Exceptions: <, >, &, "
Use styles to control font selection
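The encoding declaration precedence listed above can be sketched as a simple lookup chain. This is a simplified illustration, not a conforming parser: the regular expressions only cover common forms, and the helper `detect_charset` and its `link_charset` parameter (the charset attribute of the link that pointed to the page) are hypothetical.

```python
import re
from typing import Optional

# Simplified patterns; a real parser must handle many more edge cases.
_HTTP_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.I)
_XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([\w.:-]+)["\']')
_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.I)


def detect_charset(content_type: Optional[str], body: bytes,
                   link_charset: Optional[str] = None) -> Optional[str]:
    """Return the declared encoding: HTTP header first, then the XML
    declaration, then a meta element, then the referring link's charset."""
    if content_type:
        m = _HTTP_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    m = _XML_DECL.search(body)
    if m:
        return m.group(1).decode("ascii").lower()
    m = _META.search(body)
    if m:
        return m.group(1).decode("ascii").lower()
    return link_charset.lower() if link_charset else None


page = b'<?xml version="1.0" encoding="ISO-8859-1"?><html/>'
print(detect_charset("text/html; charset=UTF-8", page))   # header wins
print(detect_charset("application/xhtml+xml", page))      # falls back to XML declaration
```

Current HTML specifications let a byte order mark override even the HTTP header, so treat this ordering as the one stated on the slide rather than a complete sniffing algorithm.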
Demo: A Currency Converter Application – globalized but not localized
Localization Recommendations
Avoid translatable text in graphics
Make sure graphics are culturally neutral
Avoid absolute sizing
Use HTML flow layout
Write complete sentences
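Writing complete sentences matters because word order differs between languages: a translator can only reorder the variable parts if the whole sentence is one translatable unit. A minimal sketch with a hypothetical in-code catalog `MESSAGES` and helper `render`; the German entry puts the placeholders in a different order than the English one, and the values are illustrative only.

```python
# Hypothetical message catalog: each entry is a complete sentence with named
# placeholders, so translators can reorder the variable parts freely.
MESSAGES = {
    "en": "{amount} {source} is {result} {target}.",
    "de": "Sie erhalten {result} {target} für {amount} {source}.",
}


def render(lang: str, **values: str) -> str:
    """Look up the whole sentence for lang (English fallback) and fill it in."""
    return MESSAGES.get(lang, MESSAGES["en"]).format(**values)


# Illustrative values only, not real exchange rates
print(render("en", amount="100", source="CAD", result="66.87", target="EUR"))
print(render("de", amount="100", source="CAD", result="66,87", target="EUR"))
```

Building the same text by concatenating fragments such as `amount + " " + source + " is " + ...` would lock the English word order into the code.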
Localization Model and Tools
Text translation
Localization formats
HTML with template library
W3C Internationalization Tag Set (tool support?)
GNU gettext/PO
XLIFF - XML Localization Interchange File Format
Localization tools
OmegaT
Open Language Tools (Sun)
The WordForge Project: Pootle
…
Searchability – Links/Sitemap
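XLIFF is plain XML, so translated strings can be loaded with a standard XML parser. A minimal sketch for an XLIFF 1.2 file using Python's `xml.etree.ElementTree`; the file name, unit ids and the helper `load_xliff` are hypothetical, and the sample omits optional XLIFF features such as notes and states.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="converter.html" source-language="en" target-language="de"
        datatype="html">
    <body>
      <trans-unit id="title">
        <source>Currency Converter</source>
        <target>Währungsrechner</target>
      </trans-unit>
      <trans-unit id="submit">
        <source>Convert</source>
        <target>Umrechnen</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def load_xliff(xml_text: str) -> dict:
    """Map trans-unit ids to target text, falling back to the source text."""
    root = ET.fromstring(xml_text)
    catalog = {}
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        target = unit.find("x:target", XLIFF_NS)
        source = unit.find("x:source", XLIFF_NS)
        chosen = target if target is not None and target.text else source
        catalog[unit.get("id")] = chosen.text
    return catalog


print(load_xliff(SAMPLE))
```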
Demo: A Currency Converter Application – fully internationalized Web 1.0 application
ECMAScript edition 3 added a range of internationalization features (1999)
Good support for Unicode processing
Set of locale-sensitive functions
Dependent on host locale (i.e. browser)
Set of locale-insensitive functions
No number or date/time parsing
JavaScript libraries with additional internationalization functionality
dojo Toolkit (i18n contributed by IBM)
Microsoft AJAX Library
Client-side Scripting AJAX Recommendations
Late globalization
Transmit data in locale-independent form with XMLHttpRequest
Might require some creative parsing/UI
Early localization
Text localization server-side
Browsers are missing a message-catalog facility
Dynamically created page content is invisible to search engines
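For late globalization, the server sends raw values and leaves formatting to the browser. A minimal server-side sketch; JSON is an assumption here (XMLHttpRequest can equally carry XML), and the helper `rate_payload` and its field names are hypothetical.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def rate_payload(source: str, target: str, rate: Decimal, as_of: datetime) -> str:
    """Serialize an exchange rate without any locale-specific formatting.

    The browser formats the values for the user's locale; the server never
    emits strings like "1,1785" or "03/08/2007".
    """
    return json.dumps({
        "source": source,            # ISO 4217 currency codes, not display names
        "target": target,
        "rate": str(rate),           # "." decimal separator, no grouping
        "asOf": as_of.isoformat(),   # ISO 8601 timestamp with UTC offset
    })


print(rate_payload("USD", "CAD", Decimal("1.1785"),
                   datetime(2007, 3, 8, 17, 0, tzinfo=timezone.utc)))
```

Sending the rate as a string keeps the exact decimal value; a JSON number would also be locale-neutral but may be read as a binary float by the client.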
Demo: A Currency Converter Application – dynamic update of exchange amounts using Ajax
Multi-lingual Syndication RSS 2.0
Character encoding
RSS 2.0 is an XML application
XML encoding rules apply
Language
The language element exists only on the channel (feed), not on items
Create one channel per language
Specified to comply with RFC 1766 language tags
Date/Time
In standard RFC 822 format (including 4-digit years)
E.g. “Wed, 02 Oct 2002 08:00:00 EST”
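Since the language element applies only to the whole channel, a multilingual site typically generates one feed per language. A minimal sketch with Python's standard library; `build_channel` and the example.com link are hypothetical, and `email.utils.format_datetime` produces the RFC 822 style date with a four-digit year.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_channel(lang: str, title: str, published: datetime) -> str:
    """Build a minimal RSS 2.0 document for a single language."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "https://example.com/rates/" + lang  # hypothetical URL
    ET.SubElement(channel, "description").text = title
    ET.SubElement(channel, "language").text = lang          # one language per channel
    ET.SubElement(channel, "pubDate").text = format_datetime(published)  # RFC 822 style
    return ET.tostring(rss, encoding="unicode")


est = timezone(timedelta(hours=-5))
for lang, title in [("en", "Exchange rates"), ("de", "Wechselkurse")]:
    print(build_channel(lang, title, datetime(2002, 10, 2, 8, 0, tzinfo=est)))
```

`format_datetime` writes the zone as a numeric offset (-0500), which RFC 822 also permits in place of "EST".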
Atom Syndication
More granular language marking
xml:lang can be applied to any human-readable text in the format
Aggregators need to deal with this
Better date/time format: RFC 3339
E.g. “2003-12-13T18:30:02-05:00”
Acknowledgement: Tim Bray
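Because xml:lang is inherited from ancestor elements, an aggregator has to resolve the effective language of each text construct itself. A minimal sketch with Python's `xml.etree.ElementTree`; the trimmed feed omits required Atom elements such as id and updated, and the helper `titles_with_language` is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Optional

ATOM = "{http://www.w3.org/2005/Atom}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <title>Exchange rates</title>
  <entry>
    <title>Canadian dollar</title>
  </entry>
  <entry xml:lang="de">
    <title>Kanadischer Dollar</title>
  </entry>
  <entry>
    <title xml:lang="nb">norske kroner</title>
  </entry>
</feed>"""


def titles_with_language(elem: ET.Element, inherited: Optional[str] = None):
    """Yield (language, text) for every atom:title, inheriting xml:lang
    from the nearest ancestor that declares it."""
    lang = elem.get(XML_LANG, inherited)
    if elem.tag == ATOM + "title":
        yield lang, elem.text
    for child in elem:
        yield from titles_with_language(child, lang)


print(list(titles_with_language(ET.fromstring(FEED))))

# RFC 3339 timestamp, e.g. for an atom:updated element
print(datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone(timedelta(hours=-5))).isoformat())
```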
Demo: A Currency Converter Application – adding a syndication feed with exchange rate information
International Web Services Design Service Patterns
Locale Neutral: neutral data formats. Request: CAD. Returns: 1.1785
Client Influenced: service reacts to the client locale, e.g. HTTP Accept-Language. Request: CAD (Accept-Language: de). Returns: Kanadischer Dollar
Service Determined: service is locale-specific and ignores client preference. Returns: 03/08/2007 12:00pm EST
Data Driven: service adjusts formatting and language to the locale the data refers to. Requests: NOK, CHF. Returns: norske kroner, ?
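The Client Influenced pattern depends on reading the client's preferences, typically from the HTTP Accept-Language header. A minimal sketch of q-value parsing and a simple lookup (full tag first, then its primary language subtag); the helpers `parse_accept_language` and `choose_language` are hypothetical and deliberately simpler than the full BCP 47 matching rules.

```python
def parse_accept_language(header: str) -> list:
    """Return language ranges from an Accept-Language header, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:                                # q=0 means "not acceptable"
            ranges.append((q, position, tag))
    # Higher q first; ties keep the order the client sent them in.
    ranges.sort(key=lambda r: (-r[0], r[1]))
    return [tag for _, _, tag in ranges]


def choose_language(header: str, supported: list, default: str) -> str:
    """Pick the first supported language matching the client's preferences."""
    available = {s.lower(): s for s in supported}
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return available[candidate]
    return default


print(choose_language("de-CH, de;q=0.8, en;q=0.5", ["en", "de", "fr"], "en"))
```

A service following this pattern would then return "Kanadischer Dollar" rather than "Canadian dollar" for a German-speaking client.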
REST
REST naturally ties into i18n features in HTTP/HTML/XML
Locale indicated with HTTP Accept-Language
Encoding and language marking in markup
Take special care with HTTP GET parameters
Locale-independent formatting recommended
Text parameters
Encode in UTF-8 and escape in URIs
IRI (Internationalized Resource Identifier) functionality might provide this for you
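Both GET recommendations can be shown with Python's `urllib.parse`: numbers are written in a locale-independent form, and text is encoded as UTF-8 before percent-escaping. The endpoint URL, the parameter names and the helper `conversion_url` are hypothetical.

```python
from decimal import Decimal
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.com/convert"   # hypothetical endpoint


def conversion_url(amount: Decimal, source: str, target: str, label: str) -> str:
    """Build a GET URL with locale-independent numbers and UTF-8 escaped text."""
    query = urlencode({
        "amount": str(amount),   # "1234.5", never "1.234,5"
        "from": source,
        "to": target,
        "label": label,          # free text: UTF-8 bytes, then percent-escaped
    }, encoding="utf-8")
    return BASE_URL + "?" + query


url = conversion_url(Decimal("1234.5"), "NOK", "CHF", "Zürich Reise")
print(url)
print(parse_qs(urlsplit(url).query))   # round-trip back to the original text
```

`urlencode` escapes with `quote_plus`, so spaces become "+" and "ü" becomes "%C3%BC".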
SOAP
Locale can be communicated in
Transport header (e.g. HTTP)
SOAP header
SOAP message body
Beware of automatically generated SOAP interfaces
Might be locale-dependent without allowing the locale to be specified
Use of XML Schema data types promotes locale-independence
Also consider localization of error messages
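A minimal sketch of a SOAP 1.1 message that carries the locale in a SOAP header and uses XML Schema lexical forms (xs:decimal, xs:dateTime) in the body, built with Python's standard library. The SOAP envelope namespace is the standard SOAP 1.1 one; the `urn:example:i18n` and `urn:example:converter` namespaces, the element names and the helper `build_rate_response` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from decimal import Decimal

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope
I18N = "urn:example:i18n"          # hypothetical namespace for the locale header
SVC = "urn:example:converter"      # hypothetical service namespace


def build_rate_response(locale_tag: str, rate: Decimal, as_of: datetime) -> str:
    """Build a SOAP message with the locale in a header and
    locale-independent XML Schema values in the body."""
    ET.register_namespace("soap", SOAP_ENV)
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(header, f"{{{I18N}}}locale").text = locale_tag
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    result = ET.SubElement(body, f"{{{SVC}}}rateResponse")
    ET.SubElement(result, f"{{{SVC}}}rate").text = str(rate)          # xs:decimal
    ET.SubElement(result, f"{{{SVC}}}asOf").text = as_of.isoformat()  # xs:dateTime
    return ET.tostring(env, encoding="unicode")


print(build_rate_response("de-CH", Decimal("1.1785"),
                          datetime(2007, 3, 8, 17, 0, tzinfo=timezone.utc)))
```

The locale header only tells the service how to localize human-readable parts such as error messages; the numeric and date values stay in their schema-defined form regardless of the locale.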
Demo: A Currency Converter Application – exchange rates as a REST web service
Conclusions
Unification
One code base
Customization
Localization and adaptation for locales
Next step: cross-language “leakage”
Provide views in multiple languages to the same (user-generated) data
One thing hasn’t changed in Web 2.0: users can be from many different countries, speaking many different languages. This session will show how to design internationalized SOAP and REST web services, how to deal with multiple languages in syndication, and how to make all of this work with Ajax in the browser. Over the course of the session we will take a plain, screen-scraping Web 1.0 currency converter and remake it into a multilingual, personalized Web 2.0 mashup, gadget, feed, and service. Learn the principles and apply them in your environment.