Go Global Fearless (I18N & L10N)

Go Global Fearless
Conquer the world by Internationalizing your product!
D Venkata Rajesh
Principal QA Engineer
Progress Software

© 2014 Progress Software Corporation. All rights reserved.
Agenda
Introduction – I18N, L10N
All about I18N & L10N - Terminology!
Unicode – Deep dive into details
Localization testing tips

Why L10N?
• The top 10 global websites have 81% of their user base outside America
• 92% of the top 25 grossing iPhone apps in China use Chinese names
• 80% of the top 25 grossing Android apps in Japan use Japanese names
• 41% of total global app revenue came from Asia, while North America generated 31% and Europe 23%
• 72.4% of global consumers indicated that they prefer to use their native language when shopping online
Sources: KPCB, Common Sense Advisory, App Annie 2014

Localization with trending Mobile & Cloud era – Social media

I18N and L10N
• Internationalization (I18N) is the process of designing a software application so that it can adapt to various languages and regions without any changes to the source code.
• Localization (L10N) is the process of customizing a software application that was originally designed for a domestic market so that it can be released in foreign markets.

Internationalization process
• Source code: hard-coded contents
• Resource bundles: move contents to a properties file
• MessagesBundle_en_US.properties, MessagesBundle_fr_FR.properties
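The move from hard-coded strings to resource bundles can be sketched roughly as follows. This is a minimal Python illustration, not the Java ResourceBundle API that the file names suggest. The bundle contents, the keys, and the helper names parse_properties and get_message are all hypothetical, and the bundles are kept in memory so the sketch runs on its own.

```python
# Hypothetical bundle contents. In a real project each string would live in
# its own file, e.g. MessagesBundle_en_US.properties and
# MessagesBundle_fr_FR.properties.
BUNDLES = {
    "en_US": "greeting = Hello\nfarewell = Goodbye\n",
    "fr_FR": "# partial translation\ngreeting = Bonjour\n",
}
DEFAULT_LOCALE = "en_US"


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    messages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        messages[key.strip()] = value.strip()
    return messages


def get_message(key, locale):
    """Look up a key for a locale, falling back to the default locale."""
    for candidate in (locale, DEFAULT_LOCALE):
        messages = parse_properties(BUNDLES.get(candidate, ""))
        if key in messages:
            return messages[key]
    return key  # last resort: return the key so the gap is visible


print(get_message("greeting", "fr_FR"))  # translated string
print(get_message("farewell", "fr_FR"))  # falls back to en_US
```

The source code now refers only to keys such as "greeting", so adding a language means adding a bundle rather than editing code.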

Evolution
Character sets, Code pages, Encoding

Process of text to encoding: character À → code point U+00C0 → UTF-8 bytes 11000011 10000000
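The same chain can be reproduced in a few lines of Python. This is only a sketch of the idea on the slide; the variable names are illustrative.

```python
ch = "À"

# Step 1: character to code point (an abstract number).
code_point = ord(ch)

# Step 2: code point to bytes, here using the UTF-8 encoding.
encoded = ch.encode("utf-8")

# Show the bytes as bit patterns.
bits = " ".join(f"{b:08b}" for b in encoded)

print(f"U+{code_point:04X}")  # U+00C0
print(bits)                   # 11000011 10000000
```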

Code Pages

ISO code pages
Code page      Name
ISO 8859-1     Latin-1
ISO 8859-2     Latin-2
ISO 8859-3     Latin-3
ISO 8859-4     Latin-4
ISO 8859-5     Cyrillic
ISO 8859-6     Arabic
ISO 8859-7     Greek
ISO 8859-8     Hebrew
ISO 8859-9     Latin-5
ISO 8859-10    Latin-6
ISO 8859-11    Thai
ISO 8859-13    Latin-7
ISO 8859-14    Latin-8
ISO 8859-15    Latin-9
ISO 8859-16    Latin-10

Microsoft code pages
Code page      Name
CP 1250        Latin 2
CP 1251        Cyrillic
CP 1252        Latin 1
CP 1253        Greek
CP 1254        Latin 5
CP 1255        Hebrew
CP 1256        Arabic
CP 1257        Baltic
CP 1258        Viet Nam
CP 874         Thai

IBM code pages
Code page      Name
37             USA/Canada - CECP
256            International #1
259            Symbols, Set 7
273            Germany F.R./Austria - CECP
274            Old Belgium Code Page
275            Brazil - CECP
276            Canada (French) - 94
850            Personal Computer - Multilingual Page
278            Finland, Sweden - CECP
280            Italy - CECP
281            Japan (Latin) - CECP
282            Portugal - CECP
284            Spain/Latin America - CECP
285            United Kingdom - CECP
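The practical consequence of code pages is that one byte value means different characters depending on which table is used to read it. A small sketch, using Python's built-in codecs for a few of the code pages listed above (the byte 0xE0 is just an arbitrary example):

```python
raw = bytes([0xE0])  # a single byte with no encoding information attached

# Python codec names for ISO 8859-1, ISO 8859-5, CP 1251 and CP 1252.
codecs_to_try = ("iso8859_1", "iso8859_5", "cp1251", "cp1252")

decoded = {name: raw.decode(name) for name in codecs_to_try}

for name, ch in decoded.items():
    print(f"{name:10} -> {ch}  (U+{ord(ch):04X})")
```

The two Latin code pages agree on this byte, while the two Cyrillic ones map it to two different Cyrillic letters. Without knowing the intended code page, the byte cannot be interpreted reliably.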

Common Encoding Problems
• Tofu: hollow boxes
• Mojibake: garbage characters
• Question marks: conversion not supported
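Two of these problems are easy to reproduce. The sketch below decodes UTF-8 bytes with the wrong code page to produce mojibake, and encodes to a character set that lacks the character to produce a question mark. Tofu is a font issue (no glyph available for the character), so it cannot be shown in code.

```python
text = "À"

# Mojibake: UTF-8 bytes read back as if they were CP 1252.
mojibake = text.encode("utf-8").decode("cp1252")

# Question mark: the target encoding (ASCII) cannot represent the character,
# so the encoder substitutes "?" when asked to replace instead of fail.
question = text.encode("ascii", errors="replace").decode("ascii")

# Mojibake of this kind is reversible if no bytes were lost on the way.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(mojibake)  # Ã€
print(question)  # ?
print(repaired)  # À
```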

Unicode
Deep dive into normalization, compatibility, replacement characters

Unicode - Encodes the world’s scripts
• Code space of up to U+10FFFF (about 1.1 million code points)
• Currently encodes 120,737 characters
• Currently allocated code points: 264,256
• Code points are written in hex notation, e.g. U+0041

Plane       Allocated code points   Assigned characters
0 BMP       65,392                  55,181
1 SMP       14,000                  11,833
2 SIP       53,424                  53,386
3 TIP       16,672                  799
14 SSP      368                     337
15 PUA-A    65,536
16 PUA-B    65,536
Totals      264,256                 120,737
Note: the totals equal the sum of all rows except plane 3 (TIP).
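The hex notation and the plane numbers in the table are related: each plane holds 65,536 code points, so the plane is the code point shifted right by 16 bits. A short sketch (the helper name describe is illustrative):

```python
MAX_CODE_POINT = 0x10FFFF
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def describe(ch):
    """Return the U+XXXX notation and the plane number for one character."""
    cp = ord(ch)
    return f"U+{cp:04X}", cp >> 16


total_code_points = MAX_CODE_POINT + 1
plane_count = total_code_points // PLANE_SIZE

print(describe("A"))           # ('U+0041', 0): Basic Multilingual Plane
print(describe("\U0001F600"))  # ('U+1F600', 1): Supplementary Multilingual Plane
print(total_code_points)       # 1114112, the "about 1.1 million" above
print(plane_count)             # 17 planes, numbered 0 to 16
```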

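Four Normalization Forms
• Form D (NFD): canonical decomposition
• Form C (NFC): canonical decomposition followed by canonical composition
• Form KD (NFKD): compatibility decomposition
• Form KC (NFKC): compatibility decomposition followed by canonical composition

Ways to represent Ǻ:
• U+01FA
• U+00C5 U+0301
• U+00C1 U+030A
• U+212B U+0301
• U+0041 U+0301 U+030A
• U+0041 U+030A U+0301
Note: the two sequences that place U+0301 before U+030A may render similarly, but they are not canonically equivalent to U+01FA, because both marks have the same combining class and normalization does not reorder them.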
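How normalization treats these sequences can be checked with Python's standard unicodedata module. This is a sketch of the behaviour, not a full explanation of the algorithm; the list names are illustrative.

```python
import unicodedata

TARGET = "\u01FA"  # Ǻ, LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE

# Ring above (U+030A) comes before acute (U+0301), or is already composed.
ring_first = ["\u01FA", "\u00C5\u0301", "\u212B\u0301", "A\u030A\u0301"]

# Acute comes before ring above.
acute_first = ["\u00C1\u030A", "A\u0301\u030A"]

nfc_ring_first = [unicodedata.normalize("NFC", s) for s in ring_first]
nfd_ring_first = [unicodedata.normalize("NFD", s) for s in ring_first]
nfc_acute_first = [unicodedata.normalize("NFC", s) for s in acute_first]

# All ring-first spellings collapse to one code point under NFC ...
print(all(s == TARGET for s in nfc_ring_first))
# ... and to the same three code points under NFD.
print(all(s == "A\u030A\u0301" for s in nfd_ring_first))
# The acute-first spellings normalize to a different string.
print(all(s == "\u00C1\u030A" for s in nfc_acute_first))
```

Comparing strings only after normalizing both sides to the same form avoids false mismatches between visually identical text.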

Unicode Encoding Forms
• UTF-32
  - Uses 32-bit code units
  - All characters are the same width
• UTF-16
  - Uses 16-bit code units
  - BMP characters use one 16-bit code unit
  - Supplementary characters use two special 16-bit code units: a “surrogate pair”
• UTF-8
  - Uses 8-bit code units (bytes!)
  - It’s a multi-byte encoding!
  - Characters use between 1 and 4 bytes
  - ASCII is ASCII in UTF-8
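The differences between the three encoding forms show up when counting code units per character. A minimal sketch using Python's built-in codecs (the helper name code_units and the sample characters are illustrative):

```python
import struct


def code_units(ch):
    """Count the code units one character needs in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),             # 8-bit units
        "utf-16": len(ch.encode("utf-16-le")) // 2,   # 16-bit units
        "utf-32": len(ch.encode("utf-32-le")) // 4,   # 32-bit units
    }


samples = ["A", "\u00C0", "\u20AC", "\U0001F600"]  # A, À, €, an emoji
for ch in samples:
    print(f"U+{ord(ch):04X}", code_units(ch))

# A supplementary character becomes a surrogate pair in UTF-16.
high, low = struct.unpack("<2H", "\U0001F600".encode("utf-16-le"))
print(f"{high:04X} {low:04X}")  # D83D DE00

# ASCII text is byte-for-byte identical in UTF-8.
print("ASCII".encode("utf-8") == "ASCII".encode("ascii"))  # True
```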

Localization testing tips

Case study: A Website + 10 languages + 4 Browsers + 20 test cases
[Figure: the words LOCALIZATION, L10N and TESTING]
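The case study numbers multiply quickly. Assuming every test case has to run in every language and browser combination (an assumption, since the slide only lists the counts), the size of the test matrix can be sketched as follows. The language, browser and test case labels are placeholders.

```python
from itertools import product

# Placeholder labels; only the counts come from the case study.
languages = [f"lang_{i:02d}" for i in range(1, 11)]     # 10 languages
browsers = [f"browser_{i}" for i in range(1, 5)]        # 4 browsers
test_cases = [f"tc_{i:02d}" for i in range(1, 21)]      # 20 test cases

# Every (language, browser, test case) combination is one test run.
matrix = list(product(languages, browsers, test_cases))

print(len(matrix))  # 800
print(matrix[0])    # ('lang_01', 'browser_1', 'tc_01')
```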

Localization testing UI checks

Layout
• Text truncation
• Control truncation
• Misalignment
• Overlapping
• Tabbing order
• Oversized dialogs
• Different layout in general

Hot keys
• Duplicated hotkeys
• Missing hotkey
• Inappropriate hotkey

Text
• Un-translated text
• Mistranslated text
• Unexpected text
• Inconsistent translation
• Technical inaccuracy
• Double space after full stop
• Wrong alphabetical order
• Wrong date/time format
• Corrupt characters

Graphics
• Missing graphics
• Different graphics
• Un-translated icons
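Some of these checks lend themselves to automation. As one example, duplicated and missing hotkeys can be detected from the translated labels of a single menu. The sketch below assumes the common convention of marking the hotkey with "&" before the letter; the function name and the German labels are hypothetical, and escaped ampersands ("&&") are not handled.

```python
from collections import defaultdict


def find_hotkey_issues(labels):
    """Return (duplicated, missing) hotkeys for the labels of one menu."""
    by_key = defaultdict(list)
    missing = []
    for label in labels:
        idx = label.find("&")
        if idx == -1 or idx == len(label) - 1:
            missing.append(label)  # no marker, or marker with no letter
        else:
            by_key[label[idx + 1].casefold()].append(label)
    duplicated = {k: v for k, v in by_key.items() if len(v) > 1}
    return duplicated, missing


# Hypothetical translated menu: two labels ended up on the same key.
menu = ["&Datei", "&Drucken", "&Bearbeiten", "Hilfe"]
duplicated, missing = find_hotkey_issues(menu)

print(duplicated)  # {'d': ['&Datei', '&Drucken']}
print(missing)     # ['Hilfe']
```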

Pseudo Localization testing
A way to evaluate a website or software product’s readiness for the localization process
• Considered a part of the internationalization testing process
1. Identify hard-coded strings that should be translatable
2. Find strings in the source files that shouldn’t be translated
3. Identify design and layout issues that will affect the software or site when it is translated
Regional differences

More localization testing
Aspect: Challenge
• Limitation of screen size: Character count and font of characters differ in various languages.
• Direction: Some languages are written left to right, whereas others are written right to left.
• Spelling rules and upper and lower case conversions: Rules differ based on locale.
• Regional Standards: Applications may have to be compatible with not only national languages, but also the regional languages.
• Data Handling: Different data storages and processing mechanisms along with different encoding/code pages.
• Context and Special Characters: The translation of special characters needs to be handled carefully as different characters may have different meanings in different languages.
• Collation And Sorting: Sorting and collation rules differ in various languages.
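Two of these aspects, case conversion and sorting, can be illustrated with the Python standard library alone. The rough_key function below is a crude stand-in that strips accents before comparing. It is not a real locale-aware collation, which would normally come from a dedicated library, and the word list is just an example.

```python
import unicodedata

# Case conversion is not always one character to one character.
upper = "straße".upper()
print(upper)  # STRASSE

# Python's str.lower() is locale-independent: "I" always becomes "i",
# although Turkish, for example, expects a dotless "ı" here.
print("I".lower())  # i

words = ["Zebra", "Äpfel", "Apfel"]

# Plain sorting compares code points, so "Ä" (U+00C4) lands after "Z".
by_code_point = sorted(words)
print(by_code_point)  # ['Apfel', 'Zebra', 'Äpfel']


def rough_key(word):
    """Crude accent-insensitive key; NOT a real locale collation."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), word)


by_rough_key = sorted(words, key=rough_key)
print(by_rough_key)  # ['Apfel', 'Äpfel', 'Zebra']
```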
