Making Cents of Yens and Euros: Web 2.0 Internationalization

One thing hasn’t changed in Web 2.0: users can be from many different countries, speaking many different languages. This session will show how to design internationalized SOAP and REST web services, how to deal with multiple languages in syndication, and how to make all of this work with Ajax in the browser. Over the course of the session we will take a plain, screen-scraping Web 1.0 currency converter and remake it into a multilingual, personalized Web 2.0 mashup, gadget, feed, and service. Learn the principles and apply them in your environment.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Making Cents of Yens and Euros: Web 2.0 Internationalization (Achim Ruopp, Digital Silk Road, http://www.digitalsilkroad.net/)
  • Demo: A Currency Converter Application – before and after Web 2.0 Internationalization
  • Agenda
    • Introduction to Web Internationalization (i18n)
      • Selecting and Persisting User Preferences
      • Locales and Locale Identifiers
      • Unicode
      • Localization – Model and Tools
    • Client-side Scripting
      • JavaScript Internationalization
      • Ajax
    • Multi-lingual Syndication
      • RSS
      • Atom
    • International Web Services Design
      • REST
      • SOAP
  • Intro to Web Internationalization: Language and Location (examples: en-US, fr, en;q=0.8, da-DK)
  • Intro to Web Internationalization: User Preferences
    • Language
      • HTTP Accept-Language header
      • E.g.: en, fr-CA;q=0.8, fr;q=0.6
      • Language negotiation with the server
    • Locale
      • Cultural preferences for formatting, sorting etc.
      • Infer from Accept-Language header
      • Map IPv4 address to ccTLD (country code top-level domain)
        • Public information accessible through libraries
          • E.g. Perl IP::Country CPAN module
        • Commercial services offer more precision
    • Always provide an option to change the defaults
    • Store preferences in cookies
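A minimal Python sketch of the language negotiation described above, assuming a server that supports a fixed set of languages. The function names `parse_accept_language` and `negotiate`, and the fallback from a regional tag such as `fr-CA` to its primary language `fr`, are illustrative choices rather than part of the talk; a production version would also honour the `*` wildcard and let a stored cookie preference override the header.

```python
from typing import Iterable, List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'en, fr-CA;q=0.8, fr;q=0.6' into (tag, q) pairs, best first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0].lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            ranges.append((position, tag, q))
    # Highest weight first; equal weights keep the client's original order.
    ranges.sort(key=lambda r: (-r[2], r[0]))
    return [(tag, q) for _, tag, q in ranges]


def negotiate(header: str, supported: Iterable[str], default: str) -> str:
    """Return the best supported language for the header, or the default."""
    available = {s.lower(): s for s in supported}
    for tag, _ in parse_accept_language(header):
        if tag in available:
            return available[tag]
        primary = tag.split("-")[0]  # fall back from fr-CA to fr
        if primary in available:
            return available[primary]
    return default


print(negotiate("en, fr-CA;q=0.8, fr;q=0.6", {"de", "fr"}, "en"))  # fr
```

Sorting on the q-value while keeping the original position as a tie-breaker preserves the client's stated order among equally weighted languages.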
  • Intro to Web Internationalization: Internet Language Tags
    • IETF Language Tags (BCP 47)
    • Language[-Extlang]{0,3}[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]
    • Examples
      • en-CA: English in Canada
      • zh-Hant-TW: Chinese written in Traditional Chinese script, as used in Taiwan
    • Obsoletes RFC 3066 & RFC 1766
      • Often still used in products/earlier standards
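The subtag structure above can be illustrated with a simplified splitter that classifies subtags by shape only: three letters after the language for an extended language, four letters for a script, two letters or three digits for a region. It is a sketch, not a BCP 47 validator; it does not consult the IANA subtag registry and leaves variants, extensions and private-use subtags unparsed. `to_posix` is a hypothetical helper showing how such a tag might map onto a Un*x-style POSIX locale name.

```python
from typing import List, NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    extlang: List[str]
    script: Optional[str]
    region: Optional[str]
    rest: List[str]  # variants, extensions and private use, left unparsed


def _alpha(s: str, length: int) -> bool:
    return len(s) == length and s.isascii() and s.isalpha()


def split_language_tag(tag: str) -> LanguageTag:
    """Split a language tag by subtag shape only (no registry validation)."""
    subtags = tag.split("-")
    i = 1
    extlang: List[str] = []
    while i < len(subtags) and len(extlang) < 3 and _alpha(subtags[i], 3):
        extlang.append(subtags[i].lower())
        i += 1
    script = None
    if i < len(subtags) and _alpha(subtags[i], 4):
        script = subtags[i].title()  # conventional case, e.g. Hant
        i += 1
    region = None
    if i < len(subtags) and (_alpha(subtags[i], 2) or (
            len(subtags[i]) == 3 and subtags[i].isascii() and subtags[i].isdigit())):
        region = subtags[i].upper()  # e.g. TW, or a numeric code such as 419
        i += 1
    return LanguageTag(subtags[0].lower(), extlang, script, region,
                       [s.lower() for s in subtags[i:]])


def to_posix(tag: LanguageTag) -> str:
    """Hypothetical mapping to a Un*x-style locale name such as en_CA."""
    return f"{tag.language}_{tag.region}" if tag.region else tag.language


print(split_language_tag("Zh-Hant-TW"))
print(to_posix(split_language_tag("en-CA")))  # en_CA
```

Tags are case-insensitive, so the sketch normalizes to the conventional casing (lower-case language, title-case script, upper-case region).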
  • Internationalization Changes
  • Intro to Web Internationalization: POSIX Locales
    • Cross-platform API
      • Locale identifiers vary between platforms
        • Un*x: en_US
        • Windows: English_United States
      • Results can be platform-dependent
    • Basis for locale functionality in most scripting languages
    • Provides functionality for
      • Number Formatting: 1,000,000.23
      • Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
      • Sorting
      • String processing (e.g. upper-/lower-casing)
      • Some translated strings like weekdays, yes/no messages
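Python's standard `locale` module is one example of a scripting-language wrapper over the POSIX locale functions. The sketch below assumes the named locale is installed on the host; which locales exist, and how they are named, is platform-dependent, so the usage loop catches `locale.Error`. Because `setlocale` changes process-wide state, the hypothetical `format_number` helper restores the previous setting, but it is still unsafe to call concurrently from several threads of a web server.

```python
import locale


def format_number(value: float, posix_locale: str) -> str:
    """Format value with the digit grouping and decimal point of a POSIX locale.

    Raises locale.Error if the locale is not available on this host.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, posix_locale)
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        # setlocale changes process-wide state, so always put it back.
        locale.setlocale(locale.LC_NUMERIC, saved)


for name in ("C", "en_US.UTF-8", "de_DE.UTF-8"):  # Un*x-style identifiers
    try:
        print(name, format_number(1000000.23, name))
    except locale.Error:
        print(name, "is not installed here")
```

The "C" locale defines no digit grouping, so `format_number(1000000.23, "C")` returns `1000000.23`; the grouped forms depend on the locale data installed on the machine.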
  • Intro to Web Internationalization: International Components for Unicode (ICU)
    • IBM Open Source project
    • Extensive locale data and APIs
      • Data vetted as part of Common Locale Data Repository (CLDR) project
    • Java and C++ APIs
    • Wrappers for scripting languages
      • PyICU (Python)
      • ICU4R (Ruby) – abandoned?
      • DIY – difficult because of API complexity and character encoding issues
  • Intro to Web Internationalization: Microsoft Internationalization APIs
    • Windows NLS API
    • Microsoft .NET Framework System.Globalization namespace
    • Similar set of data to ICU
      • Data vetted by Microsoft subsidiaries
    • APIs accessible from all Microsoft programming languages
  • Intro to Web Internationalization: Unicode 5.0
    • 99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
    • Plane 0 (U+0000 to U+FFFF): Basic Multilingual Plane
      • Alphabets, punctuation, Asian languages, Han characters, Yi, Hangul, surrogates, private use, legacy/compatibility characters
    • Plane 1 (from U+10000): dead languages & math
    • Plane 2 (from U+20000): Han characters
    • Plane 14 (from U+E0000): language tags
    • Planes 15 and 16 (U+F0000 to U+10FFFF): private use
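To make the plane layout concrete, this Python sketch derives the plane of a code point (each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits) and looks up the character name with the standard `unicodedata` module. The labels in the hypothetical `PLANES` table are paraphrased, and Python's `unicodedata` tracks a newer Unicode version than 5.0.

```python
import unicodedata

# Paraphrased plane labels; planes 3 to 13 were unassigned in Unicode 5.0.
PLANES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane (historic scripts, math)",
    2: "Supplementary Ideographic Plane (Han characters)",
    14: "Supplementary Special-purpose Plane (language tags)",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}


def describe(ch: str) -> str:
    """Return code point, character name and plane for one character."""
    cp = ord(ch)
    plane = cp >> 16  # each plane holds 0x10000 code points
    name = unicodedata.name(ch, "<unnamed>")  # private-use characters have no name
    return f"U+{cp:04X} {name}: plane {plane} ({PLANES.get(plane, 'unassigned')})"


for ch in ("\u2122", "\U0001D11E", "\U00020000", "\U0010FFFD"):
    print(describe(ch))

print(0x10FFFF + 1)  # 1114112 code points in total
```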
  • Intro to Web Internationalization: Unicode Encoding Forms
    • Variable length: UTF-8/UTF-16
    • Fixed length: UTF-32
    • U+2122: ™: Trade Mark Sign
      • UTF-32: 0x00002122 (00000000 00000000 00100001 00100010)
      • UTF-16: 0x2122 (00100001 00100010)
      • UTF-8: 0xE2 0x84 0xA2 (1110 0010, 10 000100, 10 100010)
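The byte patterns for U+2122 can be checked with Python's built-in codecs. The hand-written `utf8_three_byte` helper is illustrative only; it implements just the three-byte UTF-8 form (U+0800 to U+FFFF, excluding surrogates) to show where the bits go. The `-be` codec names select big-endian output without a byte order mark.

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode U+0800..U+FFFF by hand as 1110xxxx 10xxxxxx 10xxxxxx."""
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("only non-surrogate U+0800..U+FFFF use three UTF-8 bytes")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte carries the top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation byte: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation byte: low 6 bits
    ])


tm = "\u2122"  # ™ TRADE MARK SIGN
assert utf8_three_byte(ord(tm)) == tm.encode("utf-8") == b"\xe2\x84\xa2"

print(tm.encode("utf-16-be").hex())  # 2122
print(tm.encode("utf-32-be").hex())  # 00002122

# Outside the BMP, UTF-16 needs a surrogate pair (U+1D11E MUSICAL SYMBOL G CLEF).
print("\U0001D11E".encode("utf-16-be").hex())  # d834dd1e
print(len("\U0001D11E".encode("utf-8")))       # 4
```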
  • Intro to Web Internationalization: Unicode on the Web
    • XML processors are required to process UTF-8/UTF-16
    • Encoding declaration precedence (highest first)
      • HTTP Content-Type header charset declaration
      • XML encoding declaration (XHTML)
      • meta charset declaration in (X)HTML
      • link element charset attribute
    • Approx. 4% of pages have encoding errors*
    • No real need for character references
      • Exceptions: <, >, & and " (in attribute values), written as &lt;, &gt;, &amp; and &quot;
    • Use styles to control font selection
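A Python sketch of two points from this slide: choosing the effective encoding from several declarations in the precedence order listed, and escaping only the markup-significant characters. `effective_charset` is a hypothetical helper; it assumes the declarations have already been extracted from the response and the document, and its UTF-8 default mirrors the XML default when nothing else is declared. The standard `html.escape` also escapes `'` as `&#x27;` when `quote=True`.

```python
import html
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Extract the charset parameter of an HTTP Content-Type value (lower-cased)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type: Optional[str] = None,
                      xml_declaration: Optional[str] = None,
                      meta_charset: Optional[str] = None,
                      link_charset: Optional[str] = None,
                      default: str = "utf-8") -> str:
    """Return the first declared encoding, in the precedence order of the slide."""
    http_charset = charset_from_content_type(content_type) if content_type else None
    for candidate in (http_charset, xml_declaration, meta_charset, link_charset):
        if candidate:
            return candidate.lower()
    return default  # assumption: the XML default when nothing is declared


print(effective_charset("text/html; charset=ISO-8859-1", meta_charset="utf-8"))
# iso-8859-1: the HTTP header wins over the meta declaration

# Only markup-significant characters need character references; € stays as is.
print(html.escape('Fish & Chips <5 €> "Spezial"'))
# Fish &amp; Chips &lt;5 €&gt; &quot;Spezial&quot;
```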
  • Demo: A Currency Converter Application – globalized but not localized
  • Intro to Web Internationalization: Localization Recommendations
    • Avoid translatable text in graphics
    • Make sure graphics are culturally neutral
    • Avoid absolute sizing
    • Use HTML flow layout
    • Write complete sentences
  • Intro to Web Internationalization: Localization Model and Tools
    • Text translation
      • Localization formats
        • HTML with template library
          • W3C Internationalization Tag Set (tool support?)
        • GNU gettext/PO
        • XLIFF - XML Localization Interchange File Format
      • Localization tools
        • OmegaT
        • Open Language Tools (Sun)
        • The WordForge Project: Pootle
    • Searchability – Links/Sitemap
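For the GNU gettext/PO format, Python's standard `gettext` module can load compiled catalogs. In this sketch the domain `converter` and the `locale` directory are hypothetical names; it assumes `.po` files have been compiled to `.mo` files (for example with `msgfmt`) under `<localedir>/<lang>/LC_MESSAGES/`. With `fallback=True` a missing catalog yields the untranslated source strings instead of raising an error.

```python
import gettext


def load_catalog(lang: str, localedir: str = "locale", domain: str = "converter"):
    """Load <localedir>/<lang>/LC_MESSAGES/<domain>.mo, or fall back to source strings."""
    return gettext.translation(domain, localedir=localedir,
                               languages=[lang], fallback=True)


catalog = load_catalog("de")
_ = catalog.gettext

# Whole sentences as message IDs give translators the full context.
print(_("Enter the amount you want to convert."))

# ngettext lets each catalog apply its own Plural-Forms rule.
count = 3
print(catalog.ngettext("%d exchange rate was updated.",
                       "%d exchange rates were updated.", count) % count)
```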
  • Demo: A Currency Converter Application – fully internationalized Web 1.0 application
  • Client-side Scripting: JavaScript Internationalization
    • ECMAScript Edition 3 (1999) added a range of internationalization features
      • Good support for Unicode processing
      • Set of locale-sensitive functions
        • Dependent on host locale (i.e. browser)
      • Set of locale-insensitive functions
      • No number or date/time parsing
    • JavaScript libraries with additional internationalization functionality
      • Dojo Toolkit (i18n contributed by IBM)
      • Microsoft AJAX Library
  • Client-side Scripting: AJAX Recommendations
    • Late globalization
      • Transmit data in locale-independent form with XMLHttpRequest
      • Might require some creative parsing/UI
    • Early localization
      • Text localization server-side
      • Browsers are missing a message-catalog facility
      • Dynamically created page content is invisible to search engines
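The "late globalization" advice can be illustrated with a server-side Python sketch of an XMLHttpRequest response: values travel as ISO currency codes, a plain decimal string and a UTC ISO 8601 timestamp, and the browser formats them for the user's locale. JSON as the wire format, the field names and the USD base currency are assumptions made for the example; only the CAD rate and the EST timestamp come from the talk's service-pattern slide.

```python
import json
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def rate_payload(base: str, quote: str, rate: Decimal, as_of: datetime) -> str:
    """Serialize an exchange rate in a locale-independent form for XMLHttpRequest."""
    return json.dumps({
        "base": base,        # ISO 4217 codes rather than localized names
        "quote": quote,
        "rate": str(rate),   # '.' as decimal point, no digit grouping
        "asOf": as_of.astimezone(timezone.utc).isoformat(),  # ISO 8601 in UTC
    })


est = timezone(timedelta(hours=-5))
print(rate_payload("USD", "CAD", Decimal("1.1785"), datetime(2007, 3, 8, 12, 0, tzinfo=est)))
# {"base": "USD", "quote": "CAD", "rate": "1.1785", "asOf": "2007-03-08T17:00:00+00:00"}
```

Sending the rate as a string avoids any float rounding on the wire; the client decides how many digits to show and which separators to use.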
  • Demo: A Currency Converter Application – dynamic update of exchange amounts using Ajax
  • Multi-lingual Syndication: RSS 2.0
    • Character encoding
      • RSS 2.0 is an XML application
      • XML encoding rules apply
    • Language
      • Element only on channel (feed), not on item
        • Create one channel per language
      • Specified to comply with RFC 1766 language tags
    • Date/Time
      • In RFC 822 format (four-digit years preferred)
        • E.g. “Wed, 02 Oct 2002 08:00:00 EST”
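A Python sketch of a per-language RSS 2.0 channel using the standard `xml.etree.ElementTree` and `email.utils.format_datetime`. Since RSS 2.0 only marks language on the channel, the sketch builds one document per language. The feed title, link and items are hypothetical. `format_datetime` writes the time zone as a numeric offset such as `-0500`, which RFC 822 also permits in place of names like `EST`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss_channel(lang, title, link, items) -> bytes:
    """One RSS 2.0 document per language: <language> exists only on the channel."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    ET.SubElement(channel, "language").text = lang
    for item_title, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # RFC 822 date-time with a four-digit year and a numeric zone offset.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    # Declare the encoding explicitly; XML encoding rules apply to RSS.
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


est = timezone(timedelta(hours=-5))
print(format_datetime(datetime(2002, 10, 2, 8, 0, tzinfo=est)))
# Wed, 02 Oct 2002 08:00:00 -0500
feed_de = rss_channel("de", "Wechselkurse", "http://example.com/rates/de",
                      [("Kanadischer Dollar", datetime(2007, 3, 8, 12, 0, tzinfo=est))])
print(feed_de.decode("utf-8"))
```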
  • Multi-lingual Syndication: Atom Syndication
    • More granular language marking
      • xml:lang can be applied to any human readable text in the format
      • Aggregators need to deal with this
    • Better date/time format: RFC 3339
      • E.g. “2003-12-13T18:30:02-05:00”
    • Acknowledgement: Tim Bray
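By contrast, Atom lets a single feed mix languages. This Python sketch (again with `xml.etree.ElementTree`) puts `xml:lang` on each entry and uses `datetime.isoformat()` on timezone-aware values, which yields RFC 3339 timestamps such as `2003-12-13T18:30:02-05:00`. The feed and entry IDs, titles and author name are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM = "http://www.w3.org/2005/Atom"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang
ET.register_namespace("", ATOM)  # emit Atom as the default namespace


def atom(tag: str) -> str:
    return f"{{{ATOM}}}{tag}"


def atom_feed(feed_id, title, author, updated, entries) -> str:
    """Build one Atom feed whose entries each carry their own xml:lang."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(ET.SubElement(feed, atom("author")), atom("name")).text = author
    ET.SubElement(feed, atom("updated")).text = updated.isoformat()  # RFC 3339
    for entry_id, lang, entry_title, entry_updated in entries:
        entry = ET.SubElement(feed, atom("entry"), {XML_LANG: lang})
        ET.SubElement(entry, atom("id")).text = entry_id
        ET.SubElement(entry, atom("title")).text = entry_title
        ET.SubElement(entry, atom("updated")).text = entry_updated.isoformat()
    return ET.tostring(feed, encoding="unicode")


stamp = datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone(timedelta(hours=-5)))
print(atom_feed("tag:example.com,2007:rates", "Exchange rates", "Currency Converter", stamp, [
    ("tag:example.com,2007:rates/cad-en", "en", "Canadian Dollar: 1.1785", stamp),
    ("tag:example.com,2007:rates/cad-de", "de", "Kanadischer Dollar: 1,1785", stamp),
]))
```

Aggregators reading such a feed have to honour `xml:lang` per entry rather than assume one language for the whole feed.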
  • Demo: A Currency Converter Application – adding a syndication feed with exchange rate information
  • International Web Services Design: Service Patterns
    • Locale Neutral: neutral data formats
      • Request: CAD; return: 1.1785
    • Client Influenced: service reacts to the client locale, e.g. HTTP Accept-Language
      • Request: CAD (Accept-Language: de); return: Kanadischer Dollar
    • Service Determined: service is locale-specific and ignores client preference
      • Return: 03/08/2007 12:00pm EST
    • Data Driven: service adjusts formatting and language to the locale the data refers to
      • Request: NOK; return: norske kroner
      • Request: CHF; return: ?
  • International Web Services Design: REST
    • REST naturally ties into i18n features in HTTP/HTML/XML
      • Locale indicated with HTTP Accept-Language
      • Encoding and language marking in markup
    • Special caution for HTTP GET parameters
      • Locale-independent formatting recommended
      • Text parameters
        • Encode in UTF-8 and escape in URIs
        • IRI (Internationalized Resource Identifier) support might provide this for you
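A Python sketch of the GET parameter advice using the standard `urllib.parse` module. The endpoint `http://example.com/rates` and the parameter names `name` and `amount` are hypothetical. `urlencode` percent-encodes text as UTF-8 by default, and the amount is sent as a locale-independent decimal string. The last line shows the IRI-to-URI idea for a path without reserved characters: non-ASCII characters become percent-encoded UTF-8 bytes.

```python
from decimal import Decimal
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def rates_url(base_url: str, currency_name: str, amount: Decimal) -> str:
    """Build a GET URL with UTF-8 percent-encoded text and a locale-neutral amount."""
    query = urlencode({"name": currency_name, "amount": str(amount)})
    return f"{base_url}?{query}"


url = rates_url("http://example.com/rates", "złoty", Decimal("1000.50"))
print(url)  # http://example.com/rates?name=z%C5%82oty&amount=1000.50

# The server decodes the query back to Unicode text (parse_qs assumes UTF-8).
print(parse_qs(urlsplit(url).query))  # {'name': ['złoty'], 'amount': ['1000.50']}

# IRI-to-URI mapping for a path with no reserved characters:
print(quote("/währungen/€"))  # /w%C3%A4hrungen/%E2%82%AC
```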
  • International Web Services Design: SOAP
    • Locale can be communicated in
      • Transport header (e.g. HTTP)
      • SOAP header
      • SOAP message body
    • Beware of automatically generated SOAP interfaces
      • May be locale-dependent without letting the client specify a locale
    • Use of XML Schema data types promotes locale-independence
    • Also consider localization of error messages
  • Demo: A Currency Converter Application – exchange rates as a REST web service
  • Conclusions
    • Unification
      • One code base
    • Customization
      • Localization and adaptation for locales
    • Next step: cross-language “leakage”
      • Provide views in multiple languages to the same (user-generated) data
      • Translate user-generated content
        • Volunteers
        • Machine Translation
  • Call for Contributions
    • The Perl CGI demo code is available at
      • http://www.digitalsilkroad.net/twiki/CurrencyConverter
    • Add a version in your preferred language
      • Ruby on Rails
      • PHP
      • Python
    • A similar application for ASP.NET is available at
      • http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx
  • References
    • W3C Internationalization Activity
      • http://www.w3.org/International/
    • POSIX Locale
      • http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
    • International Components for Unicode
      • http://www-306.ibm.com/software/globalization/icu/
    • Unicode/Common Locale Data Repository
      • http://www.unicode.org/
    • Microsoft Internationalization APIs
      • http://msdn2.microsoft.com/en-us/library/ms776254.aspx
      • http://msdn2.microsoft.com/en-us/library/system.globalization.aspx
  • References (continued)
    • OmegaT
      • http://www.omegat.org/omegat/omegat_en/omegat.html
    • Open Language Tools
      • https://open-language-tools.dev.java.net/
    • The WordForge Project
      • http://www.wordforge.org/drupal/
    • JavaScript Internationalization
      • http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
    • RSS 2.0
      • http://www.rssboard.org/rss-specification
    • Atom Syndication
      • http://www.atomenabled.org/developers/syndication
    • RSS 1.0
      • http://web.resource.org/rss/1.0/spec
    • W3C Web Services Internationalization Usage Scenarios
      • http://www.w3.org/TR/ws-i18n-scenarios/
  • Additional Slides
  • Multi-lingual Syndication: RSS 1.0
    • Character encoding
      • RSS 1.0 is an XML application
      • XML encoding rules apply
    • Conforms to the RDF (Resource Description Framework) specification
      • Language and date/time formats are left to RDF metadata vocabularies
        • Dublin Core Metadata Element Set
        • Language: RFC 1766/ISO 639-2
        • Date/Time: ISO 8601 (superset of RFC 3339)
          • Dublin Core also allows time periods to be specified!