Localization and Internationalization 2013

Published in: Technology, Business

  1. Lecture 11: Localization. From Code to Product, gidgreen.com/course
  2. Getting it wrong
  3. Something we should know?
  4. Lecture 11
     •  Countries and languages
     •  Character sets
     •  Unicode
     •  Text localization
     •  Outsourcing translation
     •  Other localization
  5. Population (2011-2012, from Wikipedia)
     Country      Population  Share      Language     Speakers  Share
     China        1,347 M     19.3%      Mandarin     845 M     12.1%
     India        1,210 M     17.3%      Spanish      329 M     4.7%
     USA          313 M       4.5%       English      328 M     4.7%
     Indonesia    238 M       3.4%       Hindi-Urdu   240 M     3.4%
     Brazil       192 M       2.8%       Arabic       221 M     3.2%
     Pakistan     179 M       2.6%       Bengali      181 M     2.6%
     Nigeria      162 M       2.3%       Portuguese   178 M     2.5%
     Russia       143 M       2.0%       Russian      144 M     2.1%
     Bangladesh   142 M       2.0%       Japanese     122 M     1.7%
     Japan        128 M       1.8%       Punjabi      109 M     1.6%
  6. Economic weight (nominal; 2008, from globalization-group.com, IMF)
     Country      GDP         Share      Language     GDP       Share
     USA          $14.4 T     23.7%      English      $21.3 T   34.9%
     Japan        $4.9 T      8.1%       Chinese      $5.2 T    8.4%
     China        $4.3 T      7.1%       Japanese     $4.9 T    8.1%
     Germany      $3.7 T      6.0%       German       $4.4 T    7.2%
     France       $2.9 T      4.7%       Spanish      $4.2 T    6.8%
     UK           $2.7 T      4.4%       French       $4.0 T    6.5%
     Italy        $2.3 T      3.8%       Italian      $2.5 T    4.1%
     Russia       $1.7 T      2.8%       Russian      $2.2 T    3.7%
     Spain        $1.6 T      2.6%       Portuguese   $1.9 T    3.1%
     Brazil       $1.6 T      2.6%       Arabic       $1.9 T    3.1%
  7. Internet users (2011, from internetworldstats.com)
     Country      Users       Penetration    Language     Users     Penetration
     China        485 M       36%            English      565 M     43%
     USA          245 M       78%            Chinese      510 M     37%
     India        100 M       8%             Spanish      165 M     39%
     Japan        99 M        78%            Japanese     99 M      78%
     Brazil       76 M        37%            Portuguese   83 M      32%
     Germany      65 M        80%            German       75 M      80%
     Russia       60 M        43%            Arabic       65 M      19%
     UK           51 M        82%            French       60 M      17%
     France       45 M        70%            Russian      60 M      43%
     Nigeria      44 M        28%            Korean       39 M      55%
  8. Internet penetration
  9. E-commerce volumes (2009, from Everis)
     USA $135B, Japan $51B, China $37B, Germany $36B, France $28B, UK $28B, Italy $19B, Canada $16B, Spain $15B, South Korea $13B, Other $123B
  10. Multilingual countries
     •  Canada: English 21M, French 8M
     •  Switzerland: German 5.0M, French 1.6M, Italian 0.5M
  11. Language variations
     •  US vs UK English
        – color | colour
        – vacation | holiday
        – Where are you (at)?
     •  European vs Brazilian Portuguese
     •  French
     •  Spanish
  12. Language codes (ISO 639-1, optionally followed by an ISO 3166-1 country code)
     ar  Arabic      zh-CN  Chinese (simplified)
     fr  French      zh-TW  Chinese (traditional)
     nl  Dutch       en-GB  English (UK)
     de  German      en-US  English (US)
     he  Hebrew      pt-BR  Portuguese (Brazilian)
     it  Italian     pt-PT  Portuguese (Portugal)
     ja  Japanese    es-AR  Spanish (Argentina)
     pl  Polish      es-CL  Spanish (Chile)
     ru  Russian     es-MX  Spanish (Mexico)
     es  Spanish     es-ES  Spanish (Spain)
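One common use of these codes is choosing which translation to serve. The following is a minimal sketch of a fallback rule, assuming a hypothetical `resolve_locale` helper and an illustrative set of available locales; neither comes from the lecture.

```python
def resolve_locale(requested, available, default="en"):
    """Pick the best available locale for a requested language tag.

    Hypothetical helper: exact match first, then the bare language,
    then any regional variant of that language, then the default.
    """
    # Normalize to the "ll-CC" form used on the slide (e.g. "pt-BR").
    parts = requested.replace("_", "-").split("-")
    language = parts[0].lower()
    tag = language if len(parts) == 1 else f"{language}-{parts[1].upper()}"

    if tag in available:        # exact match, e.g. "pt-BR"
        return tag
    if language in available:   # bare language, e.g. "es" for "es-AR"
        return language
    for candidate in sorted(available):
        if candidate.split("-")[0] == language:
            return candidate    # another region of the same language
    return default


# Illustrative set of translations that happen to exist.
AVAILABLE = {"en", "es", "es-MX", "pt-BR", "zh-CN"}

print(resolve_locale("es-AR", AVAILABLE))  # es
print(resolve_locale("pt-PT", AVAILABLE))  # pt-BR
print(resolve_locale("fr", AVAILABLE))     # en
```

Whether Portuguese readers should silently receive Brazilian text is a product decision; the sketch only shows where that decision would live.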
  13. Lecture 11
     •  Countries and languages
     •  Character sets
     •  Unicode
     •  Text localization
     •  Outsourcing translation
     •  Other localization
  14. Computer representation
     •  Bits 0 1 0 0 0 0 0 1 = 65 (decimal, range 0 … 255) = 41 (hex, range 00 … FF) = the character A
     •  Other values map to punctuation, a … z, B … Z and 0 … 9
  15. US-ASCII (image from czyborra.com)
  16. ISO-8859-1
  17. Windows-1252
  18. ISO-8859-5
  19. ISO-8859-8
  20. Problems with character sets
     •  Extra metadata
     •  Potential for misdisplay
     •  Mutually exclusive
     •  Little space to grow, e.g. €
     •  Ideographic languages
        – 70,000+ Chinese characters
        – Multibyte encoding
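The misdisplay and "little space to grow" points can be reproduced in a few lines. This is a small sketch using Python's built-in codecs; the sample word and the `can_encode` helper are illustrative assumptions.

```python
# The same bytes read differently depending on which charset is assumed.
raw = "café".encode("iso-8859-1")  # the final byte is 0xE9

readings = {
    name: raw.decode(name)
    for name in ("iso-8859-1", "iso-8859-5", "iso-8859-8")
}
for name, text in readings.items():
    print(f"{name}: {text}")  # Latin, Cyrillic and Hebrew readings of 0xE9


def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The euro sign was squeezed into Windows-1252 but has no slot in ISO-8859-1.
print(can_encode("€", "cp1252"))      # True
print(can_encode("€", "iso-8859-1"))  # False
```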
  21. Lecture 11
     •  Countries and languages
     •  Character sets
     •  Unicode
     •  Text localization
     •  Outsourcing translation
     •  Other localization
  22. The Unicode solution
     •  One global character set
        – Over 110,000 characters
        – Over 100 alphabets
     •  1,114,112 code points
        – 0…255 compatible with ISO-8859-1
        – U+0041 = A
     •  Multiple encodings
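The code point facts on this slide can be checked with Python's standard `unicodedata` module. A small sketch; the `describe` helper and the sample characters are illustrative.

```python
import unicodedata


def describe(ch):
    """Format a character as its code point plus official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in "Aé€ш中":
    print(describe(ch), ch)

# Code points 0..255 coincide with the ISO-8859-1 byte values.
assert all(bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256))

# The code space runs from U+0000 to U+10FFFF: 17 planes of 65,536 each.
assert 0x10FFFF + 1 == 17 * 65536 == 1_114_112
```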
  23. U+0000 … U+007F
  24. U+0080 … U+00FF
  25. U+0400 … U+047F
  26. U+0590 … U+060F
  27. U+4E00 … U+4E7F
  28. U+2190 … U+220F
  29. U+2600 … U+267F
  30. UTF-16 encoding
     •  2 or 4 bytes per code point
     •  Simple for U+0000…D7FF and E000…FFFF
        – “Basic Multilingual Plane”
     •  Higher code points use 4 bytes
     •  U+FEFF = byte-order mark
        – No well-followed default
     •  Windows APIs since Windows 2000
        – Also .NET, Android, iOS, Mac OS X
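To make the 2-or-4-byte rule concrete, here is a short sketch that inspects UTF-16 code units with Python's built-in codecs and also computes a surrogate pair by hand. The helper names are illustrative, not from the lecture.

```python
import codecs
import struct


def utf16_units(ch):
    """Return the 16-bit code units UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return list(struct.unpack(f">{len(data) // 2}H", data))


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print([hex(u) for u in utf16_units("A")])           # one unit: 2 bytes
print([hex(u) for u in utf16_units("\U0001F600")])  # two units: 4 bytes
print([hex(u) for u in surrogate_pair(0x1F600)])

# The plain "utf-16" codec prepends U+FEFF so readers can detect byte order.
bom = "A".encode("utf-16")[:2]
print(bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```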
  31. UTF-8 encoding
     •  1 to 4 bytes per code point (the original design allowed up to 6)
     •  1 byte for U+0000…007F
        – Perfect compatibility with ASCII
     •  2 bytes for U+0080…07FF
        – etc…
     •  Byte order mark allowed
        – But unnecessary, causes problems
     •  Dominant on web, email
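The byte counts can be observed directly. A minimal sketch with illustrative sample characters; `utf8_length` is a hypothetical helper name.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "ש", "€", "中", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# ASCII text is byte-for-byte identical in UTF-8.
assert "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")

# Python's "utf-8-sig" codec writes the optional byte order mark (EF BB BF).
assert "A".encode("utf-8-sig") == b"\xef\xbb\xbfA"
```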
  32. UTF-8 encoding
  33. UTF-8 advantages
     •  Natural compression for English
     •  English works in old tools/APIs
        – HTML tags unaffected
     •  No shared values between byte types
        – Easy to synchronize mid-stream
        – Easy to search by byte value
     •  No zero bytes (good for C)
     •  Byte-sorting = codepoint-sorting
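Several of these properties can be demonstrated in a few lines. The sketch below uses illustrative sample text, and `is_continuation` and `next_char_start` are hypothetical helper names.

```python
def is_continuation(byte):
    """Continuation bytes always look like 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def next_char_start(data, offset):
    """Resynchronize: skip forward to the next byte that starts a character."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "naïve 中文".encode("utf-8")

# Landing in the middle of a multibyte character is recoverable.
start = next_char_start(data, 8)  # offset 8 is inside the bytes of 中
print(data[start:].decode("utf-8"))  # 文

# Non-ASCII characters use only bytes >= 0x80, so searching for an ASCII
# byte such as "<" can never match inside a multibyte character.
assert all(b >= 0x80 for b in "中文é".encode("utf-8"))

# No zero bytes appear unless the text itself contains U+0000.
assert 0 not in data

# Sorting by raw bytes gives the same order as sorting by code point.
words = ["zebra", "Ärger", "中文", "apple", "été"]
assert sorted(words, key=lambda w: w.encode("utf-8")) == sorted(words)
```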
  34. Unicode on the web (source: googleblog.blogspot.com)
  35. Lecture 11
     •  Countries and languages
     •  Character sets
     •  Unicode
     •  Text localization
     •  Outsourcing translation
     •  Other localization
  36. The original source code
        function Check_Username(username)
          …
          if Username_Taken(username)…
            error="username is taken."
          …
          return error
        end function
  37. And now in Spanish…
        function Check_Username(username)
          …
          if Username_Taken(username)…
            error="username se toma."
          …
          return error
        end function
  38. Internationalized (string IDs)
        function Check_Username(username)
          …
          if Username_Taken(username)…
            error=Get_String("un-taken")
          …
          return error
        end function
  39. Internationalized (English strings as keys)
        function Check_Username(username)
          …
          if Username_Taken(username)…
            error=Translate("username is taken")
          …
          return error
        end function
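The ID-based variant can be sketched as a plain dictionary lookup. Everything below is an illustrative assumption: the `STRINGS` table, the Spanish wording, the `TAKEN` set standing in for a real user database, and the Python spelling of the slide's function names.

```python
# Hypothetical string table: locale -> string ID -> text.
STRINGS = {
    "en": {"un-taken": "Username is taken."},
    "es": {"un-taken": "El nombre de usuario ya está en uso."},
}

# Stand-in for a real user database.
TAKEN = {"alice", "bob"}


def get_string(string_id, locale, fallback="en"):
    """Look up a string by ID, falling back to English, then to the ID."""
    table = STRINGS.get(locale, {})
    if string_id in table:
        return table[string_id]
    return STRINGS[fallback].get(string_id, string_id)


def check_username(username, locale):
    """Return a localized error message, or None if the name is free."""
    if username.lower() in TAKEN:
        return get_string("un-taken", locale)
    return None


print(check_username("alice", "es"))  # Spanish message
print(check_username("alice", "de"))  # no German table, falls back to English
print(check_username("carol", "es"))  # None
```

Falling back to English rather than failing keeps a partly translated product usable while translations catch up.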
  40. IDs vs English strings
     IDs                       English strings
     More compact code         More explicit code
     English can be changed    Enforces sync between languages
     Less error-prone          Easier for third parties
  41. Concatenation is evil
        print Translate("You will travel from ") + from_city + Translate(" to ") + to_city
     •  You will travel from London to Paris
     •  Usted viajará de London a Paris
     •  Sie werden von London nach Paris reisen
     •  German puts "reisen" after the destination, which the concatenated fragments cannot express
  42. Substitutions
        raw=Translate("You will travel from %from% to %to%")
        raw=replace(raw, "%from%", from_city)
        print replace(raw, "%to%", to_city)
     •  You will travel from %from% to %to%
     •  Usted viajará de %from% a %to%
     •  Sie werden von %from% nach %to% reisen
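A runnable version of the placeholder approach might look like the sketch below. The `TRANSLATIONS` table and both helper names are assumptions for illustration. Replacing all placeholders in a single pass also avoids a subtle bug in chained `replace` calls, where a city name containing `%to%` would itself be substituted.

```python
import re

MESSAGE = "You will travel from %from% to %to%"

# Hypothetical translation table keyed by the English string.
TRANSLATIONS = {
    "es": {MESSAGE: "Usted viajará de %from% a %to%"},
    "de": {MESSAGE: "Sie werden von %from% nach %to% reisen"},
}


def translate(text, locale):
    """Return the translated template, or the English text if none exists."""
    return TRANSLATIONS.get(locale, {}).get(text, text)


def substitute(template, values):
    """Replace every %name% placeholder in one pass; unknown names are kept."""
    return re.sub(
        r"%(\w+)%",
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )


trip = {"from": "London", "to": "Paris"}
for locale in ("en", "es", "de"):
    print(substitute(translate(MESSAGE, locale), trip))
```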
  43. Singular/plural
        if (credits is 1)
          c_string=translate("1 credit")
        else
          c_string=replace(translate("%#% credits"), "%#%", credits)
        raw=translate("You have %credits% left")
        print replace(raw, "%credits%", c_string)
     •  You have 3 credits left
     •  You have 1 credit left
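Python's standard `gettext` module offers `ngettext` for exactly this case. The sketch below uses `NullTranslations`, which applies the English rule (singular only when n is 1) because no catalog is loaded; a real application would load a compiled catalog instead, and `credits_message` is a hypothetical name.

```python
import gettext

# Assumption: no message catalog is available, so NullTranslations returns
# the English singular for n == 1 and the English plural otherwise.
catalog = gettext.NullTranslations()


def credits_message(credits):
    """Build the full sentence so translators see singular and plural whole."""
    template = catalog.ngettext(
        "You have %(n)d credit left",
        "You have %(n)d credits left",
        credits,
    )
    return template % {"n": credits}


print(credits_message(1))  # You have 1 credit left
print(credits_message(3))  # You have 3 credits left
```

Passing whole sentences matters because some languages, such as Russian and Arabic, have more than two plural forms, which a hand-written "is 1" test cannot cover.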
  44. Text in images
  45. Width in layouts: +57%!
     •  أشكركم على الدفع.
     •  感谢您的付款。
     •  Gracias por su pago.
     •  אנו מודים לך על התשלום.
     •  Спасибо за ваш платеж.
     •  Thank you for your payment.
     •  Vielen Dank für Ihre Bezahlung.
     •  Σας ευχαριστούμε για την πληρωμή σας.
     •  Nous vous remercions de votre paiement.
     •  お支払いしていただきありがとうございます。
  46. LTR / RTL
  47. Outsourcing translation quotes (Ibidem-translations.com)
     •  Add 15-50% for specialized areas
     •  Clarify how words are counted
     •  Check for extra costs
  48. Lecture 11
     •  Countries and languages
     •  Character sets
     •  Unicode
     •  Text localization
     •  Outsourcing translation
     •  Other localization
  49. Numbers
     1,234,567.89    Japan, UK, USA
     1 234 567,89    France, Central Europe
     1.234.567,89    Germany, Scandinavia
     1’234’567.89    Switzerland
     123,4567.89     China
     1’234,567.89    Mexico
     12,34,567.89    India
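Most of these conventions differ only in the separators and the group sizes, so one parameterized formatter covers them. This is a hand-rolled sketch with a hypothetical `format_number` helper; a production system would more likely rely on locale data such as CLDR. The Mexican style with two different group separators is not handled here.

```python
def format_number(value, group_sep=",", decimal_sep=".",
                  first_group=3, other_groups=3):
    """Format a number with two decimals and configurable digit grouping.

    first_group is the size of the group next to the decimal point;
    other_groups is the size of every group to its left.
    """
    sign = "-" if value < 0 else ""
    whole, fraction = f"{abs(value):.2f}".split(".")
    groups = []
    size = first_group
    while len(whole) > size:
        groups.insert(0, whole[-size:])  # peel digits off the right
        whole = whole[:-size]
        size = other_groups
    groups.insert(0, whole)
    return sign + group_sep.join(groups) + decimal_sep + fraction


n = 1234567.89
print(format_number(n))                                 # Japan, UK, USA
print(format_number(n, group_sep=" ", decimal_sep=","))  # France
print(format_number(n, group_sep=".", decimal_sep=","))  # Germany
print(format_number(n, first_group=4, other_groups=4))   # China
print(format_number(n, other_groups=2))                  # India
```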
  50. Dates and times
     •  Dates: 7/21/2012, 21/7/2012, 21.7.2012, 2012-07-21, 7. 21. 2012, 7-12-2012
     •  Times: 15:45, 3.45 PM, 3:45 pm
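The same instant can be rendered in each of these styles from one stored value. The sketch below builds the strings by hand to avoid platform-specific `strftime` flags; the style names are illustrative labels, not standard identifiers.

```python
from datetime import datetime

# Hypothetical style names mapped to formatting functions.
DATE_STYLES = {
    "month_day_year": lambda d: f"{d.month}/{d.day}/{d.year}",
    "day_month_year": lambda d: f"{d.day}/{d.month}/{d.year}",
    "dotted": lambda d: f"{d.day}.{d.month}.{d.year}",
    "iso": lambda d: d.date().isoformat(),
}


def format_time(d, twelve_hour=False):
    """Render the time of day in 24-hour or 12-hour notation."""
    if not twelve_hour:
        return f"{d.hour}:{d.minute:02d}"
    hour = d.hour % 12 or 12  # 0 and 12 both display as 12
    suffix = "am" if d.hour < 12 else "pm"
    return f"{hour}:{d.minute:02d} {suffix}"


moment = datetime(2012, 7, 21, 15, 45)
for name, formatter in DATE_STYLES.items():
    print(f"{name}: {formatter(moment)}")
print(format_time(moment))
print(format_time(moment, twelve_hour=True))
```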
  51. Time zones (map from wikipedia.org)
  52. Displaying times online
     •  Store times independent of zone
     •  Options for display
        – Ask the user for their time zone
        – Show an explicit time zone
        – Use “ago” notation
     •  Javascript to get from browser
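Two of the display options can be sketched on the server side: store the instant in UTC, then either show it with an explicit zone or convert it to "ago" notation. The fixed +09:00 offset and the `time_ago` helper are illustrative assumptions, and the English output strings would themselves need translating.

```python
from datetime import datetime, timedelta, timezone


def time_ago(then, now):
    """Describe how long ago 'then' was, relative to 'now'."""
    seconds = int((now - then).total_seconds())
    if seconds < 60:
        return "just now"
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            plural = "" if count == 1 else "s"
            return f"{count} {unit}{plural} ago"


# Stored independent of any user's zone: always UTC.
stored = datetime(2012, 7, 21, 12, 45, tzinfo=timezone.utc)

# Explicit-zone display, assuming a user whose offset is UTC+9.
user_zone = timezone(timedelta(hours=9), "JST")
print(stored.astimezone(user_zone).strftime("%Y-%m-%d %H:%M %Z"))

# "Ago" display needs no zone at all.
print(time_ago(stored, stored + timedelta(hours=3)))
```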
  53. Currencies
     •  Biggest traded currencies: $ € ¥ £
        – But there are almost 200
     •  How to display
        – Number formatting
        – Symbols: ₪ ₩ ฿ $
        – Currency codes: USD EUR JPY GBP CAD AUD
     •  Also: currency conversion
        – Live feed, e.g. from ECB
  54. Names
     •  Surname can come first
        – China, Japan, Korea, Hungary
     •  Multiple surnames
        – José Santos Tavares Melo Silva
     •  Middle names/initials
     •  Double-barrelled names
        – Sarah-Jane Darlington-Whit
     •  No spaces in CJK
  55. Names
     •  Example form fields: “Full Name:”, “What should we call you?”, “Family name:”, “Other/given names:”
     •  Or localize based on language
     •  Do you need names at all?
        – Username or email can be enough
  56. Addresses
     •  USA:
          John Doe
          Acme, Inc
          Suite 3B-3824
          294 W Ronson
          Dallas TX 75211
          USA
     •  UK:
          John Smith
          Acme, Ltd
          Flat 384
          33 Walton Road
          Birmingham B26 3QJ
          UK
     •  Japan:
          〒100-8994
          東京都中央区八重洲一丁目5番3号
          東京中央郵便局
        and the same address in English:
          Tokyo Central Post Office
          1-5-3 Yaesu, Chuo-ku
          Tokyo 100-8994
          Japan
     •  Spain:
          C/Pescadoro, 13, 2°, 3ª
          28331 – Madrid
          Spain
  57. Addresses
     •  Single multi-line field
     •  Change in response to country
     •  Generic format
  58. Indexing, sorting, searching
     •  Capitalization and accents
        – Øyvind matches oyvind?
     •  Collation (sort order)
        – Swedish: a b c … x y z å ä ö
        – French: cote côte coté côté
     •  CJK (ideographic languages)
        – No spaces between words
        – Sort based on stroke count
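The matching and collation questions can be explored with the standard `unicodedata` module. In this sketch the `EXTRA_FOLDS` mapping and the hand-written Swedish alphabet are assumptions for illustration; a real system would more likely use a collation library backed by CLDR data. Note that Ø has no Unicode decomposition, so stripping accents alone does not make Øyvind match oyvind.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit (assumed) mapping.
EXTRA_FOLDS = str.maketrans({"ø": "o", "đ": "d", "ł": "l"})


def strip_accents(text):
    """Case-fold, decompose, and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text):
    """Search key: accent stripping plus the extra letter mapping."""
    return strip_accents(text).translate(EXTRA_FOLDS)


print(strip_accents("Côté"))    # cote
print(strip_accents("Øyvind"))  # øyvind: the stroke is not a combining mark
print(fold("Øyvind"))           # oyvind

# Swedish collation places å ä ö after z, unlike code point order.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """Sort key ranking each letter by its Swedish alphabet position."""
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


words = ["öl", "zebra", "ärm", "apa", "år"]
print(sorted(words))                   # code point order
print(sorted(words, key=swedish_key))  # Swedish order
```

Whether ø should match o at all is a per-language decision, since Norwegian treats it as a distinct letter.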
  59. Domain names
     •  Country-code top-level domains
        – .fr .de .uk .in .br .jp .cn
     •  Need separate registrar for many
     •  Some countries have restrictions
        – .com.au requires registered company
        – .ca requires nationality/residence
        – Also restricted: .fr .br .cn .ie .jp …
     •  Internationalized domain names
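Internationalized domain names travel through DNS in an ASCII form prefixed with `xn--`. A small sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper names and the sample domain are illustrative.

```python
def to_ascii(domain):
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Convert an ASCII-encoded domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


ascii_form = to_ascii("münchen.de")
print(ascii_form)              # the label now starts with xn--
print(to_unicode(ascii_form))  # münchen.de
print(to_ascii("example.com"))  # plain ASCII names pass through unchanged
```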
  60. And there’s more…
     •  Phone numbers
     •  Units of measurement
     •  Colors
     •  Images of people
     •  Calendars
     •  Border disputes
     •  Culture
     •  Law
  61. Google in China
     •  2005: Chinese language google.com
     •  2006: google.cn under censorship
     •  2009: China blocks YouTube
     •  2010: Google claims hacking attack
        – Redirects google.cn to google.com.hk
        – China blocks it for a day
     •  Today: Baidu 79%, Google 17%
        – Baidu links to MP3/movie downloads
  62. Getting real
     •  It’s time-consuming and costly
     •  Cheap wins in version 1.0
        – Parameterize + functionize
        – Use Unicode throughout
        – Flexible layouts
     •  See where there is demand
        – Identify most important locales
  63. Getting real
     •  Don’t skimp on the details
        – Needs to look native
     •  Use serious service providers
     •  Prepare for tech support
        – Machine translation an option?
     •  It will slow development
        – So wait for product maturity
