Software Internationalization & Localization: Basic Concepts

    Software Internationalization & Localization: Basic Concepts - Presentation Transcript

    1. Software Internationalization and Localization: Basic Concepts – Doug Kunz
    2. Outline
      • Introduction
      • Localization Examples
      • Design and development impact
    3. Why does internationalization matter?
      • Web’s global reach – potential global user base
      • Support foreign language speakers within our borders
      • Ever-increasing numbers of international business transactions
    4. Definitions
      • Internationalization (i18n): The practice of writing software which can easily be extended to support users from multiple cultural and linguistic backgrounds
      • Localization (L10n): The process of taking internationalized software and actually producing a version tailored to users from a particular culture and language background
    5. Language Tags – IETF BCP 47
      • A “language tag” or “locale” describes a common language + culture shared by a group of users, often at a national level.
      • Documented by IETF “Best Current Practice” 47
        • http://www.ietf.org/rfc/bcp/bcp47.txt
        • Refers to the underlying RFCs (these can be replaced over time, but the BCP number does not change)
      • Typically represented by an identifier describing a combination of:
        • 2-3 letter language code (ISO 639, parts 1 or 2)
        • 2 letter country code (ISO 3166)
        • Optional extensions for dialect, writing system
          • en – English
          • en-US – US English
          • en-GB – UK English
          • es-US – US Spanish
          • zh – Chinese (macrolanguage)
          • zh-cmn – Mandarin Chinese
          • zh-cmn-TW – Mandarin Chinese as spoken in Taiwan
          • zh-cmn-Hans-CN – Mandarin Chinese written with the Simplified system, as used in China
      ISO = International Organization for Standardization; IETF = Internet Engineering Task Force
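The subtag structure described on this slide can be sketched in code. The following is a simplified illustration, not a full BCP 47 parser: it recognizes subtags by length and position only, does not validate them against the ISO code lists, and the names `LanguageTag` and `parse_language_tag` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LanguageTag(NamedTuple):
    language: str
    extlangs: Tuple[str, ...]
    script: Optional[str]
    region: Optional[str]
    rest: Tuple[str, ...]


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-cmn-Hans-CN' into its main subtags.

    Simplified: private-use and grandfathered tags are not handled.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"unsupported language subtag: {subtags[0]!r}")
    i = 1
    extlangs = []
    # Up to three 3-letter extended-language subtags (e.g. 'cmn').
    while (i < len(subtags) and len(extlangs) < 3
           and re.fullmatch(r"[A-Za-z]{3}", subtags[i])):
        extlangs.append(subtags[i].lower())
        i += 1
    script = None
    # A 4-letter subtag names the writing system (e.g. 'Hans').
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        script = subtags[i].title()
        i += 1
    region = None
    # A 2-letter (or 3-digit) subtag names the country or region.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        region = subtags[i].upper()
        i += 1
    return LanguageTag(language, tuple(extlangs), script, region,
                       tuple(subtags[i:]))
```

For example, `parse_language_tag("zh-cmn-Hans-CN")` yields language `zh`, extended language `cmn`, script `Hans` and region `CN`, while `parse_language_tag("en-US")` yields only a language and a region.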
    6. What to localize (a non-exhaustive listing)
    7. Writing System
      • Direction of scan (Left-to-Right vs. tfeL-ot-thgiR)
      • Character set (various alphabets, syllabaries and logographies)
    8. Display captions
      • Regional variations within language
        • Spelling variations, e.g. US “color” vs. UK “colour”
        • Terminology variations (“lift” vs. “elevator”, “Español” vs. “Castellano”)
      • Language variations (“Login” vs. “Conéctese” vs. “Anmelden” vs. “Connessione”)
    9. Display layouts
      • US English: Caption 1 nnnnn / Caption 2 nnnnn
      • German: BigGermanTranslationOfCaption1 nnnnn / BigGermanTranslationOfCaption2 nnnnn
      • Arabic: nnnnn 2noitpaC / nnnnn 1noitpaC
    10. Print layouts
      • US Letter paper (8 ½ by 11 inches) vs. A4 paper (210×297 mm)
    11. Units of Measure
      • “British Engineering” (Imperial) System – U.S.A., Liberia and Myanmar
        • Feet/inches/miles
        • Pounds, stone or slugs
        • Fahrenheit
      • SI (Système International) – Rest of world
        • Meters/centimeters/kilometers
        • Kilograms
        • Celsius or Kelvin
    12. Formats: Numbers
      • Decimal separator – character varies
        • 1,000 (US) “one thousand”
        • 1,000 (Most of Europe) “one”
      • Readability delimiters – placement and character vary
        • 1,000,000 (US)
        • 10,00,000 (“10 lakh” India/Pakistan/Sri Lanka)
        • 1.000.000 (Germany)
        • 1 000 000 (France)
        • 100,0000 (China)
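The separator and grouping differences above can be reproduced with a small sketch. The function names and parameters (`group_digits`, `parse_number`, `first`, `rest`) are hypothetical; a production system would more likely rely on a locale data library than on hand-written rules like these.

```python
def group_digits(value: int, separator: str = ",",
                 first: int = 3, rest: int = 3) -> str:
    """Insert `separator` into the digits of a non-negative integer.

    `first` is the size of the right-most group; `rest` is the size of
    every group to its left.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(value)
    groups = [digits[-first:]]
    digits = digits[:-first]
    while digits:
        groups.append(digits[-rest:])
        digits = digits[:-rest]
    return separator.join(reversed(groups))


def parse_number(text: str, decimal_sep: str = ".", group_sep: str = ",") -> float:
    """Interpret `text` using the given decimal and grouping characters."""
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


us = group_digits(1000000)                            # "1,000,000"
india = group_digits(1000000, first=3, rest=2)        # "10,00,000"
germany = group_digits(1000000, separator=".")        # "1.000.000"
france = group_digits(1000000, separator=" ")         # "1 000 000"
china = group_digits(1000000, first=4, rest=4)        # "100,0000"

# The same text reads as one thousand or as one, depending on convention.
us_value = parse_number("1,000", decimal_sep=".", group_sep=",")        # 1000.0
european_value = parse_number("1,000", decimal_sep=",", group_sep=".")  # 1.0
```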
    13. Formats: Contact Info
      • Phone numbers
        • (415) 644-3912 within US
        • +1 415 6443912 outside US
      • Postal Codes (a few examples) –
        • US Zip Codes: 99999 or 99999-9999
        • Canadian Postal Codes: A9A 9A9
        • UK Postal Codes (generally):
          • A9 9AA
          • A99 9AA
          • A9A 9AA
          • AA9 9AA
          • AA99 9AA
          • AA9A 9AA
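The postal code shapes listed above translate fairly directly into patterns. This is a minimal sketch that checks shape only (9 = digit, A = letter), using hypothetical names; it does not confirm that a code actually exists, and real postal rules have exceptions that these patterns ignore.

```python
import re

# One pattern per country, derived from the shapes on this slide.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"[0-9]{5}(-[0-9]{4})?"),
    "CA": re.compile(r"[A-Z][0-9][A-Z] [0-9][A-Z][0-9]"),
    "GB": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}"),
}


def looks_like_postal_code(country: str, code: str) -> bool:
    """Return True if `code` has a plausible shape for `country`."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # Unknown country: accept rather than reject valid foreign codes.
        return True
    return pattern.fullmatch(code.strip().upper()) is not None
```

Accepting codes for countries without a known pattern is a deliberate choice here: rejecting them would lock out users from any country the table does not cover.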
    14. Formats: Contact Info
      • Address layout examples
        • Example 1
          • Line1
          • Line2 etc.
          • City PostCode
          • Country
        • Example 2
          • Line1
          • Line2 etc.
          • PostCode City
          • Country
        • Example 3
          • Line1
          • Line2 etc.
          • City Region PostCode
          • Country
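One way to handle the three layouts is to store each as a template and select one per address. The sketch below assumes hypothetical layout keys (`city_postcode`, `postcode_city`, `city_region_postcode`); the slide does not say which countries use which layout, so no country mapping is shown.

```python
from typing import Sequence

# Each template mirrors one of the layout examples above.
ADDRESS_LAYOUTS = {
    "city_postcode": "{lines}\n{city} {postcode}\n{country}",
    "postcode_city": "{lines}\n{postcode} {city}\n{country}",
    "city_region_postcode": "{lines}\n{city} {region} {postcode}\n{country}",
}


def format_address(layout: str, lines: Sequence[str], city: str,
                   postcode: str, country: str, region: str = "") -> str:
    """Render an address using the named layout template."""
    template = ADDRESS_LAYOUTS[layout]
    return template.format(lines="\n".join(lines), city=city, region=region,
                           postcode=postcode, country=country)


example = format_address("postcode_city", ["Line1", "Line2"],
                         "City", "PostCode", "Country")
```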
    15. Formats: Dates and Times
      • Dates –
        • Commonly, formats differ within calendar systems: does 01/06/2006 mean “January 6, 2006” or “June 1, 2006”?
        • Less commonly, across calendar systems
          • 22 May 2006 - Gregorian
          • 9 May 2006 - Julian
          • 24 Iyyar 5766 (before sunset) – Hebrew
          • 23 or 24 Rabi`-ul-Akhir 1427 (before sunset) - Islamic
      • Times – 5:00pm vs. 17.00
      • Time Zones – 22 May 2006 12:00pm (UTC+14) = 21 May 2006 10:00am (UTC-12)
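Both the date ambiguity and the time zone equivalence can be checked with the standard library. This is a small sketch; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The same text parses to two different dates depending on field order.
ambiguous = "01/06/2006"
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # 2006-01-06
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # 2006-06-01

# Two wall-clock readings in different zones can name the same instant.
noon_utc_plus_14 = datetime(2006, 5, 22, 12, 0,
                            tzinfo=timezone(timedelta(hours=14)))
ten_utc_minus_12 = datetime(2006, 5, 21, 10, 0,
                            tzinfo=timezone(timedelta(hours=-12)))
same_instant = noon_utc_plus_14 == ten_utc_minus_12  # True
```

Timezone-aware `datetime` values compare by the instant they represent, which is why the two readings are equal even though their calendar dates differ.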
    16. Design/Development Impacts and Techniques
    17. Know your user
      • Collect information in user profile, such as:
        • “Preferred language”
          • Store as a language tag containing the fewest subtags needed to localize the experience for that particular user (e.g. “en-US” is better than “en-Latn-US”)
        • Time zone
        • Preferred units-of-measure
        • Preferred currency
    18. User Interface vs. Data Locales
      • User Interface locale
        • The captioning, formats and layout needed to present data to the current user
      • Data locale
        • The locale to which a business object belongs, which may differ from the current user’s locale. Example: a purchase order has comment text written in French, although the current user is English-speaking
        • Typically the locale of the user who created the object
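The profile fields and the UI/data locale split from the last two slides could be modeled roughly as follows. All class, field and function names are hypothetical, and the sample values (including the time zone name) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    language_tag: str   # UI locale, e.g. "en-US"
    time_zone: str      # e.g. "America/Los_Angeles" (assumed naming scheme)
    units: str          # e.g. "imperial" or "si"
    currency: str       # e.g. "USD"


@dataclass
class PurchaseOrder:
    comment: str
    data_locale: str    # locale of the user who created the object


def primary_language(tag: str) -> str:
    """Return the first (language) subtag of a language tag."""
    return tag.split("-")[0].lower()


def comment_is_foreign(user: UserProfile, order: PurchaseOrder) -> bool:
    """True when the object's data locale differs in language from the UI locale."""
    return primary_language(user.language_tag) != primary_language(order.data_locale)


reader = UserProfile("en-US", "America/Los_Angeles", "imperial", "USD")
order = PurchaseOrder(comment="Livraison avant vendredi", data_locale="fr-FR")
needs_hint = comment_is_foreign(reader, order)  # True
```

Keeping the data locale on the object lets the interface render captions in the reader's language while still labeling or translating content written in another one.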
    19. Resource Extraction
      • A “resource” is a screen artifact—text, image, etc.—which contains localized information. For example, a field caption written in US English would be a resource.
      • Place text captions in a separate file for translation
      • Images
        • Where possible, implement buttons as text with a background image, to avoid producing locale-specific images
        • When text *must* be included in an image:
          • “ALT” text should be placed in a separate file, and should match image text (if any) for ease of translation
          • Image “path” should be locale-specific, e.g. medem.com/images/en_us/next_button.gif
      • Sometimes screen shots help translation services by providing context
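A minimal sketch of resource lookup with locale fallback follows. The in-memory `CAPTIONS` table stands in for the separate translation files the slide recommends, and the names `fallback_chain`, `caption` and `image_path` are hypothetical.

```python
# Stand-in for per-locale resource files; captions are taken from the
# examples earlier in this presentation.
CAPTIONS = {
    "en": {"login": "Login", "color": "Color"},
    "en-GB": {"color": "Colour"},
    "es": {"login": "Conéctese"},
    "de": {"login": "Anmelden"},
    "it": {"login": "Connessione"},
}


def fallback_chain(tag: str, default: str = "en") -> list:
    """'en-GB' -> ['en-GB', 'en']; the default locale is tried last."""
    parts = tag.split("-")
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def caption(key: str, tag: str) -> str:
    """Return the most specific caption available for the locale."""
    for candidate in fallback_chain(tag):
        table = CAPTIONS.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(key)


def image_path(name: str, tag: str) -> str:
    """Build a locale-specific image path such as images/en_us/next_button.gif."""
    return f"images/{tag.lower().replace('-', '_')}/{name}"
```

With this table, `caption("color", "en-GB")` returns the UK spelling, while `caption("login", "en-GB")` falls back to the general English entry because no UK-specific override exists.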
    20. Layouts
      • Technique 1: Produce general layout that will work for most languages
        • Where needed, make language-specific “override layouts”
      • Technique 2: “Least common denominator” layouts that will always work
        • Example: restrict print layouts to 210mm by 279mm – works on US Letter and A4
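The “least common denominator” figure can be derived by taking the smaller width and the smaller height of the two paper sizes. This is a short worked sketch with hypothetical names; the slide rounds the resulting height down to 279 mm.

```python
MM_PER_INCH = 25.4

# US Letter is 8.5 x 11 inches; A4 is 210 x 297 mm.
US_LETTER_MM = (round(8.5 * MM_PER_INCH, 1), round(11 * MM_PER_INCH, 1))
A4_MM = (210.0, 297.0)


def common_printable_area(*sizes):
    """Return the largest (width, height) that fits on every given sheet."""
    return (min(w for w, _ in sizes), min(h for _, h in sizes))


common = common_printable_area(US_LETTER_MM, A4_MM)  # (210.0, 279.4)
```

A4 is the narrower sheet and US Letter the shorter one, so the shared area takes its width from A4 and its height from US Letter.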
    21. “Store globally, display locally”
      • Pick a reasonable standard format for storage in your database (e.g. ISO 8601 “2006-05-24T18:15:00Z”)
      • Convert for display based on the user’s locale and time zone (e.g. 5/24/06 11:15am Pacific Daylight Time)
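The store/display split can be sketched as two small functions. This example assumes a fixed UTC-7 offset for Pacific Daylight Time rather than a time zone database, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_stored(timestamp: str) -> datetime:
    """Parse the ISO 8601 UTC form used for storage, e.g. 2006-05-24T18:15:00Z."""
    naive = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return naive.replace(tzinfo=timezone.utc)


def display_us_style(instant: datetime, tz: timezone) -> str:
    """Render an instant as M/D/YY h:mm am/pm in the given zone."""
    local = instant.astimezone(tz)
    hour12 = local.hour % 12 or 12
    suffix = "am" if local.hour < 12 else "pm"
    return (f"{local.month}/{local.day}/{local.year % 100:02d} "
            f"{hour12}:{local.minute:02d}{suffix}")


# Assumption: a fixed offset stands in for a real time zone rule set.
PDT = timezone(timedelta(hours=-7), "PDT")

stored = parse_stored("2006-05-24T18:15:00Z")
shown = display_us_style(stored, PDT)  # "5/24/06 11:15am"
```

Only the display function knows about the user's conventions; the stored value stays in one unambiguous form regardless of who reads it.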
    22. Flexible storage design
      • Explicit rate/unit storage
        • Bad: Column “Height”
        • Bad: Column “Height_inches”
        • Good: Column “Height” and Column “Height_Units”
        • Good: Column “Price” and Column “Currency”
      • Globally appropriate data type
        • Bad: Column “ZipCode” Integer(5)
        • Good: Column “PostCode” Varchar2 (10)
      • Globally appropriate name
        • Bad: Column “State”
        • Better: Column “Region”
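The “good” column choices above might look like this in a schema. The sketch uses an in-memory SQLite database purely for illustration; the table name `profile` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE profile (
        id INTEGER PRIMARY KEY,
        height REAL,
        height_units TEXT,  -- unit stored explicitly, e.g. 'cm' or 'in'
        post_code TEXT,     -- text keeps letters, spaces and leading zeros
        region TEXT         -- state, province, county, etc.
    )
""")
conn.execute(
    "INSERT INTO profile (height, height_units, post_code, region) "
    "VALUES (?, ?, ?, ?)",
    (180.0, "cm", "A9A 9A9", "Region"),
)
row = conn.execute(
    "SELECT height, height_units, post_code FROM profile"
).fetchone()

# Why an integer postal code column is risky: leading zeros are lost.
lossy = str(int("02134"))  # "2134"
```

A text column accepts the Canadian and UK shapes shown earlier in the deck, which an `Integer(5)` column would reject outright.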
    23. Appropriate character encoding
      • US-ASCII (American Standard Code for Information Interchange)
        • 7 bits / character
        • English only: diacritics not supported (ü, è, ç, etc.)
      • ISO-8859-1 (“Latin 1”)
        • 1 byte (8 bits) / character
        • Superset of US-ASCII
        • Western European languages
        • Historically the default encoding for “text/*” MIME types in HTTP/1.1
        • Basis of the set of characters allowed in HTML 3.2 documents
      • UTF-8
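        • 1 to 4 bytes/character (1 to 3 bytes for most characters in common use; 4 bytes for characters outside the Basic Multilingual Plane)
        • Backward compatible with US-ASCII; ISO-8859-1 text must be converted, because its characters above 127 take 2 bytes in UTF-8
        • Encodes all of Unicode (all character sets, including those of extinct languages)
        • Unicode is the basis of the set of characters allowed in HTML 4.0 documents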
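The byte counts for the three encodings can be verified directly. This is a small sketch; the helper name `encoded_length` is hypothetical, and the sample characters are written as escapes so the example does not depend on the encoding of the source file.

```python
from typing import Optional


def encoded_length(text: str, encoding: str) -> Optional[int]:
    """Return the encoded size in bytes, or None if the text is unencodable."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


U_UMLAUT = "\u00fc"        # ü, a Western European diacritic
EURO = "\u20ac"            # €, not part of ISO-8859-1
UGARITIC = "\U00010380"    # a letter from an extinct script

results = {
    "A": [encoded_length("A", e) for e in ("ascii", "latin-1", "utf-8")],
    "u_umlaut": [encoded_length(U_UMLAUT, e) for e in ("ascii", "latin-1", "utf-8")],
    "euro": [encoded_length(EURO, e) for e in ("ascii", "latin-1", "utf-8")],
    "ugaritic": [encoded_length(UGARITIC, e) for e in ("ascii", "latin-1", "utf-8")],
}
```

Each list holds the size under US-ASCII, ISO-8859-1 and UTF-8 in that order, with `None` marking a character the encoding cannot represent.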
    24. For More Information
      • International Telecommunications Union (ITU) http://www.itu.int/
      • Universal Postal Union (UPU) http://www.upu.int/
      • International Organization for Standardization http://www.iso.org/
      • UTF-8 http://en.wikipedia.org/wiki/UTF-8

    guest1f8175, 10 months ago