2. Getting it wrong
From Code to Product Lecture 5 - Localization - Slide 2 gidgreen.com/course
3. Something we should know?
4. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
5. Population
Country     Population  Share    Language    Speakers  Share
China       1,347 M     19.3%    Mandarin    845 M     12.1%
India       1,210 M     17.3%    Spanish     329 M     4.7%
USA         313 M       4.5%     English     328 M     4.7%
Indonesia   238 M       3.4%     Hindi-Urdu  240 M     3.4%
Brazil      192 M       2.8%     Arabic      221 M     3.2%
Pakistan    179 M       2.6%     Bengali     181 M     2.6%
Nigeria     162 M       2.3%     Portuguese  178 M     2.5%
Russia      143 M       2.0%     Russian     144 M     2.1%
Bangladesh  142 M       2.0%     Japanese    122 M     1.7%
Japan       128 M       1.8%     Punjabi     109 M     1.6%
2011-2012 from Wikipedia
6. Economic weight (nominal)
Country  GDP      Share    Language    GDP      Share
USA      $14.4 T  23.7%    English     $21.3 T  34.9%
Japan    $4.9 T   8.1%     Chinese     $5.2 T   8.4%
China    $4.3 T   7.1%     Japanese    $4.9 T   8.1%
Germany  $3.7 T   6.0%     German      $4.4 T   7.2%
France   $2.9 T   4.7%     Spanish     $4.2 T   6.8%
UK       $2.7 T   4.4%     French      $4.0 T   6.5%
Italy    $2.3 T   3.8%     Italian     $2.5 T   4.1%
Russia   $1.7 T   2.8%     Russian     $2.2 T   3.7%
Spain    $1.6 T   2.6%     Portuguese  $1.9 T   3.1%
Brazil   $1.6 T   2.6%     Arabic      $1.9 T   3.1%
2008 from globalization-group.com, IMF
7. Internet users
Country  Users  Penetration    Language    Users  Penetration
China    485 M  36%            English     565 M  43%
USA      245 M  78%            Chinese     510 M  37%
India    100 M  8%             Spanish     165 M  39%
Japan    99 M   78%            Japanese    99 M   78%
Brazil   76 M   37%            Portuguese  83 M   32%
Germany  65 M   80%            German      75 M   80%
Russia   60 M   43%            Arabic      65 M   19%
UK       51 M   82%            French      60 M   17%
France   45 M   70%            Russian     60 M   43%
Nigeria  44 M   28%            Korean      39 M   55%
2011 from internetworldstats.com
9. E-commerce volumes
[Chart: e-commerce volume by country. Labels: USA, Japan, China, Germany, France, UK, Italy, Canada, Spain, South Korea, Other. Values shown range from $13B to $135B.]
2009 from Everis
10. Multilingual countries
Canada: English 21M, French 8M
Switzerland: German 5.0M, French 1.6M, Italian 0.5M
11. Language variations
• US vs UK English
  - color | colour
  - vacation | holiday
  - Where are you (at)?
• European vs Brazilian Portuguese
• French
• Spanish
12. Language codes (ISO 639-1, optionally with an ISO 3166-1 region)
ar  Arabic      zh-CN  Chinese (simplified)
fr  French      zh-TW  Chinese (traditional)
nl  Dutch       en-GB  English (UK)
de  German      en-US  English (US)
he  Hebrew      pt-BR  Portuguese (Brazilian)
it  Italian     pt-PT  Portuguese (Portugal)
ja  Japanese    es-AR  Spanish (Argentina)
pl  Polish      es-CL  Spanish (Chile)
ru  Russian     es-MX  Spanish (Mexico)
es  Spanish     es-ES  Spanish (Spain)
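One common way to use codes like these is to try the most specific tag first and then fall back to the bare language. The sketch below assumes a hypothetical `AVAILABLE` set of translated locales and a default of `en-US`; neither comes from the lecture.

```python
# Hypothetical set of locales the product has been translated into.
AVAILABLE = {"en-US", "en-GB", "es", "pt-BR", "zh-CN"}
DEFAULT = "en-US"


def pick_locale(requested: str) -> str:
    """Return the best available match for a tag such as 'es-MX'."""
    language, _, region = requested.partition("-")
    language = language.lower()
    # Normalize case: language in lowercase, region in uppercase.
    tag = f"{language}-{region.upper()}" if region else language
    if tag in AVAILABLE:
        return tag
    # Fall back from 'es-MX' to plain 'es'.
    if language in AVAILABLE:
        return language
    return DEFAULT


print(pick_locale("es-MX"))  # es
print(pick_locale("pt-br"))  # pt-BR
print(pick_locale("fr-CA"))  # en-US
```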
13. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
14. Computer representation
Binary:    0 1 0 0 0 0 0 1
Hex:       00 … 41 … FF
Decimal:   0 … 65 … 255
Character: A
.,/?;:"!%abcdefghijklmnopqrstuvwxyz… …BCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
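The same relationship between a character and its binary, hexadecimal and decimal values can be checked in a few lines. This is only an illustrative sketch using Python built-ins.

```python
char = "A"
code = ord(char)              # numeric value of the character
print(code)                   # 65
print(format(code, "02X"))    # 41
print(format(code, "08b"))    # 01000001
print(chr(0b01000001))        # A
print(char.encode("ascii"))   # b'A'
```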
15. US-ASCII
Image from czyborra.com
20. Problems with character sets
• Extra metadata
• Potential for misdisplay
• Mutually exclusive
• Little space to grow - e.g. €
• Ideographic languages
  - 70,000+ Chinese characters
  - Multibyte encoding
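A short sketch of why single-byte character sets need extra metadata and have little room to grow: the same byte decodes to a different letter in each set, and the euro sign has no slot in ISO-8859-1. The byte value 0xE9 is an arbitrary choice for illustration.

```python
raw = bytes([0xE9])  # one byte, with no encoding label attached

# The same byte means a different letter in each legacy character set.
print(raw.decode("iso-8859-1"))  # é (Western European)
print(raw.decode("iso-8859-7"))  # ι (Greek)
print(raw.decode("cp1251"))      # й (Cyrillic)

# ISO-8859-1 has no slot for the euro sign; ISO-8859-15 replaced an
# existing character to make room for it.
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("€ cannot be encoded in ISO-8859-1")
print("€".encode("iso-8859-15"))  # b'\xa4'
```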
21. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
22. The Unicode solution
• One global character set
  - Over 110,000 characters
  - Over 100 scripts
• 1,114,112 code points
  - 0…255 compatible with ISO-8859-1
  - U+0041 = A
• Multiple encodings
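These figures can be confirmed from a Python prompt. The sketch below relies only on built-ins and checks the code point of A, the size of the code space, and the overlap with ISO-8859-1.

```python
import sys

print(hex(ord("A")))       # 0x41, i.e. U+0041
print(sys.maxunicode + 1)  # 1114112 code points (U+0000 to U+10FFFF)

# The first 256 code points line up with ISO-8859-1 byte values.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
print(all(ord(ch) == i for i, ch in enumerate(decoded)))  # True
```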
23. U+0000 … U+007F
24. U+0080 … U+00FF
25. U+0400 … U+047F
26. U+0590 … U+060F
27. U+4E00 … U+4E7F
28. U+2190 … U+220F
29. U+2600 … U+267F
30. UTF-16 encoding
• 2 or 4 bytes per code point
• Simple for U+0000…D7FF and E000…FFFF
  - "Basic Multilingual Plane"
• Higher code points use 4 bytes
• U+FEFF = byte-order mark
  - No well-followed default
• Windows APIs since Windows 2000
  - Also .NET, Android, iOS, Mac OS X
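To make the 2-or-4-byte rule concrete, here is a minimal sketch that computes UTF-16 code units by hand and compares them with Python's own encoder. The helper name `utf16_units` is made up for this example, and it does not validate its input.

```python
import codecs


def utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit code units UTF-16 uses for one code point."""
    if code_point < 0x10000:
        return [code_point]           # BMP: a single 2-byte unit
    offset = code_point - 0x10000     # 20 bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]                # surrogate pair: 4 bytes


print([hex(u) for u in utf16_units(0x20AC)])   # ['0x20ac']
print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']

# Python's encoder agrees (big-endian, no byte-order mark).
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00

# The byte-order mark U+FEFF looks different in each byte order.
print(codecs.BOM_UTF16_BE.hex(), codecs.BOM_UTF16_LE.hex())  # feff fffe
```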
31. UTF-8 encoding
• 1 to 4 bytes per code point
• 1 byte for U+0000…007F
  - Perfect compatibility with ASCII
• 2 bytes for U+0080…07FF
  - etc…
• Byte order mark allowed
  - But unnecessary, causes problems
• Dominant on web, email
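The byte counts are easy to observe directly. The sample characters below are arbitrary picks, one from each length class.

```python
samples = ["A", "é", "€", "\U0001F600"]  # U+0041, U+00E9, U+20AC, U+1F600
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# Plain ASCII text is byte-for-byte identical in UTF-8.
print("hello".encode("utf-8") == "hello".encode("ascii"))  # True

# A leading byte-order mark is legal but usually unwanted;
# the 'utf-8-sig' codec strips it when reading.
with_bom = b"\xef\xbb\xbfhello"
print(with_bom.decode("utf-8-sig"))  # hello
```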
33. UTF-8 advantages
• Natural compression for English
• English works in old tools/APIs
  - HTML tags unaffected
• No shared values between byte types
  - Easy to synchronize mid-stream
  - Easy to search by byte value
• No zero bytes except for U+0000 itself (good for C)
• Byte-sorting = codepoint-sorting
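Three of these properties can be demonstrated briefly. The sketch below uses made-up helper names (`is_continuation`, `resync`) and arbitrary sample text.

```python
text = "naïve café €5"
data = text.encode("utf-8")

# No zero bytes appear unless the text itself contains U+0000.
print(0 in data)  # False


def is_continuation(byte: int) -> bool:
    """Continuation bytes always have the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, pos: int) -> int:
    """Move forward from an arbitrary offset to the next character start."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# Start in the middle of the two-byte 'ï' and recover cleanly.
middle = data.index("ï".encode("utf-8")) + 1
print(data[resync(data, middle):].decode("utf-8"))  # ve café €5

# Sorting the encoded bytes gives the same order as sorting by code point.
words = ["zebra", "éclair", "apple", "€uro"]
print(sorted(words) == sorted(words, key=lambda w: w.encode("utf-8")))  # True
```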
34. Unicode on the web
Source: googleblog.blogspot.com
35. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
36. The original source code
function Check_Username(username)
    …
    if Username_Taken(username) …
        error="username is taken."
    …
    return error
end function
37. And now in Spanish…
function Check_Username(username)
    …
    if Username_Taken(username) …
        error="username se toma."
    …
    return error
end function
38. Internationalized
function Check_Username(username)
    …
    if Username_Taken(username) …
        error=Get_String("un-taken")
    …
    return error
end function
39. Internationalized
function Check_Username(username)
    …
    if Username_Taken(username) …
        error=Translate("username is taken")
    …
    return error
end function
40. IDs vs English strings
IDs                                English strings
More compact code                  More explicit code
Enforces sync between languages    English can be changed
Less error-prone                   Easier for third parties
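The two lookup styles compared above might look like this in practice. This is a minimal sketch: the catalogs, the Spanish wording and the function names `get_string` and `translate` are all hypothetical stand-ins for the pseudocode's `Get_String` and `Translate`.

```python
# Hypothetical catalogs; in practice these would live in resource files.
BY_ID = {
    "en": {"un-taken": "username is taken."},
    "es": {"un-taken": "el nombre de usuario ya está en uso."},
}
BY_ENGLISH = {
    "es": {"username is taken.": "el nombre de usuario ya está en uso."},
}


def get_string(string_id: str, lang: str) -> str:
    """ID approach: every language, English included, needs an entry."""
    return BY_ID[lang][string_id]  # a KeyError exposes a missing entry


def translate(english: str, lang: str) -> str:
    """English-string approach: fall back to the English text itself."""
    return BY_ENGLISH.get(lang, {}).get(english, english)


print(get_string("un-taken", "es"))
print(translate("username is taken.", "es"))
print(translate("username is taken.", "fr"))  # username is taken.
```

The ID version fails loudly when a language is missing an entry, while the English-string version degrades quietly to English.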
41. Concatenation is evil
You will travel from London to Paris
print Translate("You will travel from ") +
    from_city + Translate(" to ") + to_city
Usted viajará de London a Paris
Sie werden von London nach Paris reisen
42. Substitutions
raw=Translate("You will travel from %from% to %to%")
raw=replace(raw, "%from%", from_city)
print replace(raw, "%to%", to_city)
You will travel from %from% to %to%
Usted viajará de %from% a %to%
Sie werden von %from% nach %to% reisen
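A runnable sketch of the same substitution idea follows. The `TEMPLATES` catalog and the `translate` and `fill` helpers are hypothetical; the point is that each translation decides where its placeholders go, so the German verb can sit at the end of the sentence.

```python
# Hypothetical catalog keyed by the English template.
ENGLISH = "You will travel from %from% to %to%"
TEMPLATES = {
    "es": {ENGLISH: "Usted viajará de %from% a %to%"},
    "de": {ENGLISH: "Sie werden von %from% nach %to% reisen"},
}


def translate(english: str, lang: str) -> str:
    """Look up a template, falling back to the English text."""
    return TEMPLATES.get(lang, {}).get(english, english)


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each %name% placeholder with its value."""
    for name, value in values.items():
        template = template.replace(f"%{name}%", value)
    return template


trip = {"from": "London", "to": "Paris"}
for lang in ("en", "es", "de"):
    print(fill(translate(ENGLISH, lang), trip))
```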
43. Singular/plural
You have 3 credits left
You have 1 credit left
if (credits is 1)
    c_string=translate("1 credit")
else
    c_string=replace(translate("%#% credits"), "%#%", credits)
raw=translate("You have %credits% left")
print replace(raw, "%credits%", c_string)
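Python's standard `gettext` module offers `ngettext` for this singular/plural choice. The sketch below is an assumption-laden variant, not the lecture's method: it uses `NullTranslations`, which simply returns the English text, and it translates the whole sentence rather than the "credits" fragment.

```python
import gettext

# NullTranslations returns the English text unchanged; a real product
# would load a catalog instead, e.g. with gettext.translation(...).
catalog = gettext.NullTranslations()


def credits_message(credits: int) -> str:
    # ngettext picks the form for this count using the catalog's plural
    # rule, so languages with more than two forms need no extra branches.
    template = catalog.ngettext(
        "You have %(n)d credit left", "You have %(n)d credits left", credits
    )
    return template % {"n": credits}


print(credits_message(1))  # You have 1 credit left
print(credits_message(3))  # You have 3 credits left
```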
44. Text in images
46. LTR / RTL
47. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
48. Outsourcing translation
• Preparing code
• Collecting (English) assets
• Choosing a provider
• Costs and quotes
• Glossary
• Translation memory
• Independent review
49. Collecting assets
• Text files
  - Simple arrays or resource files
  - Standard formats, e.g. gettext, XLIFF
• HTML files
  - Risk of accidental markup changes
• Graphics files
  - Originals, not rendered
• Think about text expansion
50. Choosing a provider
• Problem: you can't assess quality
• Go by reputation and clients
  - Examples of previous work
• Ask who will actually do it
  - Native speaker of target language
  - Subject-specific experience
• Consider future language needs
51. Cost and quotes
Ibidem-translations.com
• Add 15-50% for specialized areas
• Clarify how words are counted
• Check for extra costs
52. Glossary
• Fixed translation for specific terms
  - Control over branding
  - Domain-specific terminology
  - Consistency
• Not-to-be-translated terms
• Requires thorough review of product
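A glossary can also be checked mechanically once translations come back. The sketch below is only one possible approach; the glossary contents, the brand name and the `glossary_issues` helper are invented for illustration, and simple substring matching will miss inflected forms.

```python
# Hypothetical glossary: fixed translations and terms that must stay as-is.
GLOSSARY = {"es": {"credits": "créditos"}}
DO_NOT_TRANSLATE = {"Acme"}


def glossary_issues(source: str, target: str, lang: str) -> list[str]:
    """List glossary rules that a translated string appears to break."""
    issues = []
    for term in DO_NOT_TRANSLATE:
        if term in source and term not in target:
            issues.append(f"'{term}' must not be translated")
    for term, fixed in GLOSSARY.get(lang, {}).items():
        if term in source.lower() and fixed not in target.lower():
            issues.append(f"'{term}' should be translated as '{fixed}'")
    return issues


source = "Acme credits never expire"
print(glossary_issues(source, "Los créditos de Acme nunca caducan", "es"))  # []
print(glossary_issues(source, "Los puntos de Acmé nunca caducan", "es"))
```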
53. Glossary
Image from Google Translator Toolkit Help
54. Translation memory
• Lots of translation is repetitive
  - Same text in many places
  - Small changes between versions
• Same sentence = same translation
  - Save time and money
  - Help ensure consistency
  - But manual confirmation required
• Should be owned by you
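The core of a translation memory can be sketched as an exact lookup plus a fuzzy fallback. The example below uses `difflib` from the standard library; the `MEMORY` contents, the 0.8 cutoff and the `lookup` helper are assumptions for illustration, not how any particular tool works.

```python
import difflib

# Hypothetical memory of previously approved English -> Spanish pairs.
MEMORY = {
    "You have 3 credits left": "Le quedan 3 créditos",
    "Your username is taken": "Su nombre de usuario ya está en uso",
}


def lookup(sentence: str, cutoff: float = 0.8):
    """Return (kind, translation); kind is 'exact', 'fuzzy' or 'none'."""
    if sentence in MEMORY:
        return "exact", MEMORY[sentence]
    close = difflib.get_close_matches(sentence, list(MEMORY), n=1, cutoff=cutoff)
    if close:
        # A near match is only a suggestion; a translator must confirm it.
        return "fuzzy", MEMORY[close[0]]
    return "none", None


print(lookup("You have 3 credits left"))  # exact match
print(lookup("You have 5 credits left"))  # fuzzy match, needs review
print(lookup("Welcome back"))             # no match
```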
55. Translation memory
Image from kilgray.com screenshots
57. Lecture 5
• Countries and languages
• Character sets
• Unicode
• Text localization
• Outsourcing translation
• Other localization
58. Numbers
1,234,567.89 - Japan, UK, USA
1 234 567,89 - France, Central Europe
1.234.567,89 - Germany, Scandinavia
1'234'567.89 - Switzerland
123,4567.89 - China
1'234,567.89 - Mexico
12,34,567.89 - India
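Most of these conventions differ only in the separators and the group sizes, which a small formatter can take as parameters. The function below is a hand-rolled sketch for non-negative values; a real product would more likely take these rules from a locale database.

```python
def format_number(value: float, group_sep: str, decimal_sep: str,
                  first_group: int = 3, other_groups: int = 3) -> str:
    """Format with two decimals using the given separators and group sizes."""
    whole, fraction = f"{value:.2f}".split(".")
    groups = [whole[-first_group:]]  # rightmost group
    whole = whole[:-first_group]
    while whole:
        groups.insert(0, whole[-other_groups:])
        whole = whole[:-other_groups]
    return group_sep.join(groups) + decimal_sep + fraction


n = 1234567.89
print(format_number(n, ",", "."))        # 1,234,567.89
print(format_number(n, " ", ","))        # 1 234 567,89
print(format_number(n, ".", ","))        # 1.234.567,89
print(format_number(n, "'", "."))        # 1'234'567.89
print(format_number(n, ",", ".", 4, 4))  # 123,4567.89
print(format_number(n, ",", ".", 3, 2))  # 12,34,567.89
```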
59. Dates and times
Dates          Times
7/21/2012      15:45
21/7/2012      3.45 PM
21.7.2012      3:45 pm
2012-07-12
7. 21. 2012
7-12-2012
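One moment can be rendered in several of these styles from a single stored value. In the sketch below the `PATTERNS` table and its locale keys are hypothetical; real products usually take such patterns from a locale database rather than hard-coding them.

```python
from datetime import datetime

moment = datetime(2012, 7, 21, 15, 45)

# Hypothetical per-locale patterns, for illustration only.
PATTERNS = {
    "en-US": "{m}/{d}/{y} {h12}:{min:02d} {ampm}",
    "en-GB": "{d}/{m}/{y} {h24}:{min:02d}",
    "de": "{d}.{m}.{y} {h24}:{min:02d}",
    "iso": "{y}-{m:02d}-{d:02d} {h24:02d}:{min:02d}",
}


def format_moment(dt: datetime, locale_tag: str) -> str:
    """Fill the pattern for one locale with the fields of dt."""
    return PATTERNS[locale_tag].format(
        y=dt.year, m=dt.month, d=dt.day,
        h24=dt.hour, h12=dt.hour % 12 or 12,
        min=dt.minute, ampm="AM" if dt.hour < 12 else "PM",
    )


for tag in PATTERNS:
    print(tag, format_moment(moment, tag))
```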
60. Time zones
Map from wikipedia.org
61. Displaying times online
• Store times independent of zone
• Options for display
  - Ask the user for their time zone
  - Show an explicit time zone
  - Use "ago" notation
• JavaScript to get the time zone from the browser
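The storage rule and the three display options can be sketched on the server side as follows. The fixed +09:00 offset stands in for a user's chosen zone and is an assumption, as is the English-only `ago` helper.

```python
from datetime import datetime, timedelta, timezone

# Store the moment in UTC, independent of any user's zone.
stored = datetime(2012, 7, 21, 15, 45, tzinfo=timezone.utc)

# Option 1: convert to the zone the user chose (assumed here to be +09:00).
user_zone = timezone(timedelta(hours=9), "JST")
print(stored.astimezone(user_zone).strftime("%Y-%m-%d %H:%M %Z"))  # 2012-07-22 00:45 JST

# Option 2: show the zone explicitly.
print(stored.strftime("%Y-%m-%d %H:%M UTC"))  # 2012-07-21 15:45 UTC


# Option 3: "ago" notation needs no zone at all (English wording only).
def ago(then: datetime, now: datetime) -> str:
    minutes = int((now - then).total_seconds() // 60)
    for size, name in ((24 * 60, "day"), (60, "hour"), (1, "minute")):
        if minutes >= size or name == "minute":
            count = minutes // size
            return f"{count} {name}{'' if count == 1 else 's'} ago"


print(ago(stored, stored + timedelta(hours=5)))  # 5 hours ago
```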
64. Names
Full Name:
What should we call you?
Family name:
Other/given names:
• Or localize based on language
• Do you need names at all?
  - Username or email can be enough
65. Addresses
John Doe
Acme, Inc
Suite 3B-3824
294 W Ronson
Dallas TX 75211
USA

〒100-8994
東京都中央区八重洲一丁目5番3号
東京中央郵便局

Tokyo Central Post Office
1-5-3 Yaesu, Chuo-ku
Tokyo 100-8994
Japan

John Smith
Acme, Ltd
Flat 384
33 Walton Road
Birmingham
B26 3QJ
UK

C/Pescadoro, 13, 2°, 3ª
28331 - Madrid
Spain
66. Addresses
• Single multi-line field
• Change in response to country
• Generic format
67. Phone numbers
UK: +44 (0) 123-456-7890
France: +33 1-23-45-67-89
China: +86 10-2345-6789
USA: +1 (123) 456-7890 x123
• Country selector
• Change in response to country
• Generic format
69. Paper sizes
A4         297 x 210 mm
US Letter  279 x 216 mm
US Legal   356 x 216 mm
70. Domain names
• Country-code top-level domains
  - .fr .de .uk .in .br .jp .cn
• Need separate registrar for many
• Some countries have restrictions
  - .com.au requires registered company
  - .ca requires nationality/residence
  - Also restricted: .fr .br .cn .ie .jp …
• Internationalized domain names
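Internationalized domain names travel through DNS in an ASCII form. As a small sketch, Python's built-in `idna` codec (which follows the older IDNA 2003 rules) converts between the two forms; the domain used here is just a common textbook example.

```python
# Python's built-in "idna" codec implements the older IDNA 2003 rules.
name = "bücher.de"
ascii_form = name.encode("idna")
print(ascii_form)                 # b'xn--bcher-kva.de'
print(ascii_form.decode("idna"))  # bücher.de
```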
71. And there's more…
• Units of measurement
• Colors
• Images of people
• Calendars
• Holidays
• Border disputes
• Culture
• Law
72. Google in China
• 2005: Chinese-language google.com
• 2006: google.cn under censorship
• 2009: China blocks YouTube
• 2010: Google claims hacking attack
  - Redirects google.cn to google.com.hk
  - China blocks it for a day
• Today: Baidu 79%, Google 17%
  - Baidu links to MP3/movie downloads
73. Getting real
• It's time-consuming and costly
• Cheap wins in version 1.0
  - Parameterize + functionize
  - Use Unicode throughout
  - Flexible layouts
• See where there is demand
• Identify most important locales
74. Getting real
• Don't skimp on the details
  - Needs to look native
• Use serious service providers
• Prepare for tech support
  - Machine translation an option?
• It will slow development
  - So wait for product maturity