  • Language negotiation: implemented in libraries or on web servers, e.g. Apache MultiViews language negotiation, Perl I18N::LangTags, ASP.NET language fallback. Key difference from other user preferences: if users can't read the description of the preference, they don't even have a chance to make a choice.
  • BCP: Best Current Practice. BCP 47: concatenation of RFCs 4646 and 4647.
  • Developed originally for the Taligent OS project in the late 80s/early 90s
  • UTF = Unicode Transformation Format. UTF-8: used on the web, C++, Perl, Ruby. UTF-16: used in Java, .NET, Python, ICU. UTF-32: used in some C++ libraries.
  • Character references: character entity references and numeric character references (ü: &uuml; or &#252;). Character references impact searchability.
  • If you have to use absolute sizing, define it in CSS styles.
  • HTML with template library: mirrors collaboration with designers. XLIFF = OASIS XML Localization Interchange File Format.
  • RSS 2.0: aggregators need to be able to parse RFC 822 date formats (incl. time zone)
  • Transcript

    • 1. Making Cents of Yens and Euros: Web 2.0 Internationalization
        • Achim Ruopp, [email_address]
        • http://www.digitalsilkroad.net/
    • 2. Demo: A Currency Converter Application – before and after Web 2.0 Internationalization
    • 3. Agenda
        • Introduction to Web Internationalization (i18n)
            • Selecting and Persisting User Preferences
            • Locales and Locale Identifiers
            • Unicode
            • Localization – Model and Tools
        • Multi-lingual Syndication
            • RSS
            • Atom
        • Client-side Scripting
            • Javascript Internationalization
            • Ajax
        • International Web Services Design
            • REST
            • SOAP
    • 4. Intro to Web Internationalization: Language and Location (en-US, fr, en;q=0.8, da-DK)
    • 5. Intro to Web Internationalization: User Preferences
        • Language
            • HTTP Accept-Language header
            • E.g.: en, fr-CA;q=0.8, fr;q=0.6
            • Language negotiation with the server
        • Locale
            • Cultural preferences for formatting, sorting etc.
            • Infer from Accept-Language header
            • Map IPv4 address to ccTLD (country code top-level domain)
                • Public information accessible through libraries
                    • E.g. Perl IP::Country CPAN module
                • Commercial services offer more precision
        • Always provide option to change defaults
        • Store preferences in cookies
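The language negotiation step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the demo application's code: the function names `parse_accept_language` and `negotiate` are hypothetical, and the fallback rule (truncating subtags from the right, so `fr-CA` falls back to `fr`) is one common strategy assumed here.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, highest preference first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a range without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip(), q, position))
    # Sort by descending q; keep header order for equal weights.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs]


def negotiate(header, supported, default="en"):
    """Pick the best supported language, falling back from fr-CA to fr."""
    by_lower = {s.lower(): s for s in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        candidate = tag.lower()
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            # Drop the last subtag: "fr-ca" -> "fr" -> "".
            candidate = candidate.rpartition("-")[0]
    return default


print(negotiate("en, fr-CA;q=0.8, fr;q=0.6", ["fr", "de"]))
```

Whatever the negotiation returns should only be a default; as the slide says, the user still needs a way to override it.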
    • 6. Intro to Web Internationalization: Internet Language Tags
        • IETF Language Tags (BCP 47)
        • Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*
        • Examples
            • en-CA: English in Canada
            • zh-Hant-TW: Chinese written in traditional Chinese script as used in Taiwan
        • Obsoletes RFC 3066 & RFC 1766
            • Often still used in products/earlier standards
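To make the tag structure concrete, here is a small Python sketch that splits a tag into language, script, and region. It deliberately covers only that subset of BCP 47 (no extended language subtags, variants, extensions, or private use), and the helper name `split_language_tag` is hypothetical.

```python
import re

# Subset of the BCP 47 grammar: language, optional script, optional region.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?$"
)


def split_language_tag(tag):
    """Split a simple language tag and apply the conventional casing."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unsupported or malformed language tag: %r" % tag)
    parts = match.groupdict()
    return {
        "language": parts["language"].lower(),  # e.g. "zh"
        "script": parts["script"].title() if parts["script"] else None,  # "Hant"
        "region": parts["region"].upper() if parts["region"] else None,  # "TW"
    }


print(split_language_tag("zh-Hant-TW"))
print(split_language_tag("en-CA"))
```

Tags are case-insensitive; the casing applied here is only the customary way of writing them.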
    • 7. Internationalization Changes
    • 8. Intro to Web Internationalization: POSIX Locales
        • Cross-platform API
            • Locale identifiers can have variations
                • Un*x: en_US
                • Windows: English_United States
            • Results can be platform-dependent
        • Basis for locale functionality in all scripting languages
        • Provides functionality for
            • Number formatting: 1,000,000.23
            • Date/time formatting: 8 Μάρτιος 2007 12:00:00 μμ
            • Sorting
            • String processing (e.g. upper-/lower-casing)
            • Some translated strings like weekdays, yes/no messages
    • 9. Intro to Web Internationalization: International Components for Unicode
        • IBM Open Source project
        • Extensive locale data and APIs
            • Data vetted as part of Common Locale Data Repository (CLDR) project
        • Java and C++ APIs
        • Wrappers for scripting languages
            • PyICU (Python)
            • ICU4R (Ruby) – abandoned?
            • DIY – difficult because of API complexity and character encoding issues
    • 10. Intro to Web Internationalization: Microsoft Internationalization APIs
        • Windows NLS API
        • Microsoft .NET Framework System.Globalization namespace
        • Similar set of data to ICU
            • Data vetted by Microsoft subsidiaries
        • APIs accessible from all Microsoft programming languages
    • 11. Intro to Web Internationalization: Unicode 5.0
        • [Diagram: the Unicode code space by plane (00000 to 100000): Basic Multilingual Plane, Dead Languages & Math, Han Characters, Language Tags, Private Use]
        • [Diagram: the Basic Multilingual Plane (0000 to F000): Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility]
        • 99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
    • 12. Intro to Web Internationalization: Unicode Encoding Forms
        • Variable length: UTF-8/UTF-16
        • Fixed length: UTF-32
        • U+2122: ™: Trade Mark Sign
            • UTF-32: 0x00002122 (0…00100001 00100010)
            • UTF-16: 0x2122 (00100001 00100010)
            • UTF-8: 0xE2 0x84 0xA2 (1110 0010 10 000100 10 100010)
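The byte values on this slide can be reproduced with Python's built-in codecs. A minimal sketch: the helper name `encoding_forms` is hypothetical, and the big-endian codec variants are used so that no byte order mark is emitted.

```python
def encoding_forms(character):
    """Return the hex bytes of one character in the three encoding forms."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}
    return {
        name: " ".join("%02X" % byte for byte in character.encode(codec))
        for name, codec in codecs.items()
    }


# U+2122 TRADE MARK SIGN, the example from the slide.
for name, hex_bytes in encoding_forms("\u2122").items():
    print(name, hex_bytes)

# A character outside the Basic Multilingual Plane needs two UTF-16 code
# units (a surrogate pair), which is why UTF-16 is a variable-length form.
print(encoding_forms("\U0001D11E")["UTF-16"])
```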
    • 13. Intro to Web Internationalization: Unicode on the Web
        • XML processors are required to process UTF-8/UTF-16
        • Encoding declaration precedence
            • HTTP Content-Type header charset declaration
            • XML encoding declaration (XHTML)
            • meta charset declaration in (X)HTML
            • link element charset attribute
        • Approx. 4% of pages have encoding errors*
        • No real need for character references
            • ü: &uuml; or &#252;
            • Exceptions: <, >, &, "
        • Use styles to control font selection
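A short Python sketch of two points from this slide: applying the declaration precedence, and handling character references. The function names are hypothetical, the link element's charset attribute is left out for brevity, and the UTF-8 default when nothing is declared is an assumption of this sketch, not something the slide states.

```python
import html
from email.message import Message


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = value
    return message.get_content_charset()  # lower-cased, or None if absent


def effective_charset(http_content_type=None, xml_declared=None,
                      meta_declared=None, default="utf-8"):
    """Apply the precedence: HTTP header, then XML declaration, then meta."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    for declared in (xml_declared, meta_declared):
        if declared:
            return declared.lower()
    return default  # assumption: fall back to UTF-8


# The HTTP header wins even when the page itself declares something else.
print(effective_charset("text/html; charset=ISO-8859-1", meta_declared="utf-8"))

# Both references decode to the same character, so a Unicode-encoded page
# can simply contain "ü" itself.
print(html.unescape("&uuml;") == html.unescape("&#252;") == "ü")

# The exceptions still need escaping because they are markup delimiters.
print(html.escape('R&D <b>"x"</b>'))
```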
    • 14. Demo: A Currency Converter Application – globalized but not localized
    • 15. Intro to Web Internationalization: Localization Recommendations
        • Avoid translatable text in graphics
        • Make sure graphics are culturally neutral
        • Avoid absolute sizing
        • Use HTML flow layout
        • Write complete sentences
    • 16. Intro to Web Internationalization: Localization Model and Tools
        • Text translation
            • Localization formats
                • HTML with template library
                    • W3C Internationalization Tag Set (tool support?)
                • GNU gettext/PO
                • XLIFF - XML Localization Interchange File Format
            • Localization tools
                • OmegaT
                • Open Language Tools (Sun)
                • The WordForge Project: Pootle
                • …
        • Searchability – Links/Sitemap
    • 17. Demo: A Currency Converter Application – fully internationalized Web 1.0 application
    • 18. Client-side Scripting: Javascript Internationalization
        • ECMAScript edition 3 added a range of internationalization features (1999)
            • Good support for Unicode processing
            • Set of locale-sensitive functions
                • Dependent on host locale (i.e. browser)
            • Set of locale-insensitive functions
            • No number or date/time parsing
        • Javascript libraries with additional internationalization functionality
            • dojo Toolkit (i18n contributed by IBM)
            • Microsoft AJAX Library
    • 19. Client-side Scripting: AJAX Recommendations
        • Late globalization
            • Transmit data in locale-independent form with XMLHttpRequest
            • Might require some creative parsing/UI
        • Early localization
            • Text localization server-side
            • Browsers are missing a message-catalog facility
            • Dynamically created page content is invisible to search engines
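As an illustration of "late globalization", the server side of an XMLHttpRequest exchange might serialize an exchange rate like this. It is a sketch under assumptions: JSON as the wire format, the field names, and the function name `rate_payload` are all hypothetical and not taken from the demo application.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def rate_payload(base, quote, rate, as_of):
    """Serialize an exchange rate in a locale-independent form."""
    return json.dumps(
        {
            "base": base,        # ISO 4217 currency code, not a localized name
            "quote": quote,
            "rate": str(rate),   # "." as decimal separator, no grouping
            # UTC timestamp in ISO 8601 form instead of a locale-specific string
            "asOf": as_of.astimezone(timezone.utc).isoformat(),
        },
        sort_keys=True,
    )


payload = rate_payload(
    "USD", "CAD", Decimal("1.1785"),
    datetime(2007, 3, 8, 17, 0, tzinfo=timezone.utc),
)
print(payload)
```

The client then formats the number and the timestamp for the user's locale at the last possible moment, which is where the "creative parsing/UI" comes in.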
    • 20. Multi-lingual Syndication: RSS 2.0
        • Character encoding
            • RSS 2.0 is an XML application
            • XML encoding rules apply
        • Language
            • Element only on channel (feed), not on item
                • Create one channel per language
            • Specified to comply with RFC 1766 language tags
        • Date/Time
            • In standard RFC 822 format (including 4-digit years)
                • E.g. “Wed, 02 Oct 2002 08:00:00 EST”
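Parsing and producing the RFC 822 date format can be sketched with Python's standard `email.utils` module. The example date is the one from the slide; the variable names are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Parse the example from the slide, including the named time zone.
published = parsedate_to_datetime("Wed, 02 Oct 2002 08:00:00 EST")
print(published.utcoffset())               # offset taken from "EST"
print(published.astimezone(timezone.utc))  # the same instant in UTC

# Produce a date for a feed item; a numeric offset or GMT avoids
# ambiguous time zone abbreviations.
as_utc = datetime(2002, 10, 2, 13, 0, tzinfo=timezone.utc)
print(format_datetime(as_utc))
print(format_datetime(as_utc, usegmt=True))
```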
    • 21. Multi-lingual Syndication: Atom Syndication
        • More granular language marking
            • xml:lang can be applied to any human readable text in the format
            • Aggregators need to deal with this
        • Better date/time format: RFC 3339
            • E.g. “2003-12-13T18:30:02-05:00”
        • Acknowledgement: Tim Bray
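For comparison, an RFC 3339 timestamp maps almost directly onto Python's `datetime`. A minimal sketch: the helper name `parse_rfc3339` is hypothetical, and it handles only the common forms (a numeric offset or a trailing "Z"), not every variation the RFC permits.

```python
from datetime import datetime, timezone


def parse_rfc3339(text):
    """Parse a common RFC 3339 timestamp into an aware datetime."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit zero offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


updated = parse_rfc3339("2003-12-13T18:30:02-05:00")
print(updated.astimezone(timezone.utc).isoformat())
```

Because the format is fixed-width, numeric, and carries its own offset, no locale data or time zone abbreviation table is needed to read it.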
    • 22. Demo: A Currency Converter Application – adding a syndication feed with exchange rate information
    • 23. International Web Services Design: Service Patterns
        • Locale Neutral: neutral data formats. Example data: 1.1785 CAD
        • Client Influenced: service reacts to the client locale, e.g. HTTP Accept-Language. Request data: CAD (Accept-Language: de). Return data: Kanadischer Dollar
        • Service Determined: service is locale-specific and ignores client preference. Example data: 03/08/2007 12:00pm EST
        • Data Driven: service adjusts formatting and language to the locale the data refers to. Request data: NOK, CHF. Return data: norske kroner, ?
    • 24. International Web Services Design: REST
        • REST naturally ties into i18n features in HTTP/HTML/XML
            • Locale indicated with HTTP Accept-Language
            • Encoding and language marking in markup
        • Special caution for HTTP GET parameters
            • Locale-independent formatting recommended
            • Text parameters
                • Encode in UTF-8 and escape in URIs
                • IRI (International Resource Identifier) functionality might provide this for you
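The advice on GET parameters can be sketched with Python's `urllib.parse`, which percent-escapes the UTF-8 bytes of non-ASCII text by default. The parameter names (`from`, `to`, `amount`, `note`) are hypothetical and only serve the illustration.

```python
from urllib.parse import parse_qs, quote, urlencode

# Text is encoded as UTF-8 and each byte is percent-escaped.
print(quote("Zürich"))

query = urlencode({
    "from": "USD",
    "to": "EUR",
    "amount": "1234.56",    # locale-independent: "." separator, no grouping
    "note": "Zürich café",
})
print(query)

# The receiving side reverses the escaping and gets the original text back.
print(parse_qs(query)["note"][0])
```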
    • 25. International Web Services Design: SOAP
        • Locale can be communicated in
            • Transport header (e.g. HTTP)
            • SOAP header
            • SOAP message body
        • Beware of automatically generated SOAP interfaces
            • Might be locale-dependent, but not allow you to specify the locale
        • Use of XML Schema data types promotes locale-independence
        • Also consider localization of error messages
    • 26. Conclusions
        • Unification
            • One code base
        • Customization
            • Localization and adaptation for locales
        • Next step: cross-language “leakage”
            • Provide views in multiple languages to the same (user-generated) data
            • Translate user-generated content
                • Volunteers
                • Machine Translation
    • 27. Call for Contributions
        • Presentation and Perl CGI demo code
            • http://www.digitalsilkroad.net/web2expo
        • Add a version in your preferred language
            • Ruby on Rails
            • PHP
            • Python
            • …
        • Similar ASP.NET application
            • http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx
    • 28. References
        • W3C Internationalization Activity
            • http://www.w3.org/International/
        • POSIX Locale
            • http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
        • International Components for Unicode
            • http://www-306.ibm.com/software/globalization/icu/
        • Unicode/Common Locale Data Repository
            • http://www.unicode.org/
        • Microsoft Internationalization APIs
            • http://msdn2.microsoft.com/en-us/library/ms776254.aspx
            • http://msdn2.microsoft.com/en-us/library/system.globalization.aspx
    • 29. References
        • OmegaT
            • http://www.omegat.org/omegat/omegat_en/omegat.html
        • Open Language Tools
            • https://open-language-tools.dev.java.net/
        • The WordForge Project
            • http://www.wordforge.org/drupal/
        • Javascript Internationalization
            • http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
        • RSS 2.0
            • http://www.rssboard.org/rss-specification
        • Atom Syndication
            • http://www.atomenabled.org/developers/syndication
        • RSS 1.0
            • http://web.resource.org/rss/1.0/spec
        • W3C Web Services Internationalization Usage Scenarios
            • http://www.w3.org/TR/ws-i18n-scenarios/
    • 30. Additional Slides
    • 31. Multi-lingual Syndication: RSS 1.0
        • Character encoding
            • RSS 1.0 is an XML application
            • XML encoding rules apply
        • Complies with the RDF (Resource Description Framework) specification
            • Definition of language and date/time formats is left to RDF metadata formats
                • Dublin Core Metadata Element Set
                • Language: RFC 1766/ISO 639-2
                • Date/Time: ISO 8601 (superset of RFC 3339)
                    • Dublin Core also allows time periods to be specified
