1. Go Global Fearless (I18N & L10N)
Conquer the world by Internationalizing your product!
D Venkata Rajesh
Principal QA Engineer
Progress Software
© 2014 Progress Software Corporation. All rights reserved.
2. Agenda
Introduction: I18N, L10N
All about I18N & L10N - Terminology!
Unicode – Deep dive into details
Localization testing tips
3. Why L10N?
• The top 10 global internet websites have 81% of their user base outside America
• 92% of the top 25 grossing iPhone apps in China use Chinese names
• 80% of the top 25 grossing Android apps in Japan use Japanese names
• 41% of total global app revenue came from Asia, while North America generated 31% and Europe 23%
• 72.4% of global consumers indicated that they prefer to use their native language when shopping online
Sources: KPCB, Common Sense Advisory, App Annie 2014
4. Localization in the mobile and cloud era: social media
5. I18N and L10N
Internationalization (I18N) is the process of designing a software application so that it can adapt to various languages and regions without any changes to the source code.
Localization (L10N) is the process of customizing a software application that was originally designed for a domestic market so that it can be released in foreign markets.
6. Internationalization Process
7. Internationalization process
Source code: hard-coded contents
Resource bundles: move contents to a properties file
MessagesBundle_en_US.properties
MessagesBundle_fr_FR.properties
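The move from hard-coded strings to per-locale bundles can be sketched as below. This is a minimal illustration, not the Java ResourceBundle mechanism the file names suggest: the bundle contents, the keys, and the helpers `parse_properties` and `get_message` are all hypothetical, and the bundles are held in memory so the sketch is self-contained.

```python
# Hypothetical bundle contents. In a real project each string would live in
# its own file, e.g. MessagesBundle_en_US.properties and
# MessagesBundle_fr_FR.properties.
BUNDLES = {
    "en_US": "greeting = Hello\nfarewell = Goodbye\n",
    "fr_FR": "# French bundle, deliberately incomplete\ngreeting = Bonjour\n",
}


def parse_properties(text):
    """Parse simple 'key = value' lines; lines starting with '#' are comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def get_message(key, locale, default_locale="en_US"):
    """Look up a key in the requested locale, falling back to the default."""
    for candidate in (locale, default_locale):
        messages = parse_properties(BUNDLES.get(candidate, ""))
        if key in messages:
            return messages[key]
    return key  # last resort: show the key so the gap is visible in testing


print(get_message("greeting", "fr_FR"))
print(get_message("farewell", "fr_FR"))
```

The fallback to a default locale is a design choice made for this sketch: a missing translation shows the default-language string instead of failing.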
8. Evolution
Character sets, Code pages, Encoding
9. Process of text to encoding
Character À → code point U+00C0 → UTF-8 bytes 11000011 10000000
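The path from a character to its code point and then to encoded bytes can be reproduced with a short sketch. The helper name `describe` is an assumption made for illustration. The bit pattern shown on the slide matches UTF-8, which is the default here.

```python
def describe(char, encoding="utf-8"):
    """Return the code point in U+XXXX notation and the encoded bytes as bits."""
    code_point = f"U+{ord(char):04X}"
    bits = " ".join(f"{byte:08b}" for byte in char.encode(encoding))
    return code_point, bits


# "\u00c0" is the character À from the slide.
print(describe("\u00c0"))
# The same character occupies a single byte in ISO 8859-1.
print(describe("\u00c0", encoding="iso8859_1"))
```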
10. Code Pages
ISO code pages
Code page     Name
ISO 8859-1    Latin-1
ISO 8859-2    Latin-2
ISO 8859-3    Latin-3
ISO 8859-4    Latin-4
ISO 8859-5    Cyrillic
ISO 8859-6    Arabic
ISO 8859-7    Greek
ISO 8859-8    Hebrew
ISO 8859-9    Latin-5
ISO 8859-10   Latin-6
ISO 8859-11   Thai
ISO 8859-13   Latin-7
ISO 8859-14   Latin-8
ISO 8859-15   Latin-9
ISO 8859-16   Latin-10
Microsoft code pages
Code page     Name
CP 1250       Latin 2
CP 1251       Cyrillic
CP 1252       Latin 1
CP 1253       Greek
CP 1254       Latin 5
CP 1255       Hebrew
CP 1256       Arabic
CP 1257       Baltic
CP 1258       Viet Nam
CP 874        Thai
IBM code pages
Code page     Name
37            USA/Canada - CECP
256           International #1
259           Symbols, Set 7
273           Germany F.R./Austria - CECP
274           Old Belgium Code Page
275           Brazil - CECP
276           Canada (French) - 94
850           Personal Computer - Multilingual Page
278           Finland, Sweden - CECP
280           Italy - CECP
281           Japan (Latin) - CECP
282           Portugal - CECP
284           Spain/Latin America - CECP
285           United Kingdom - CECP
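Why code pages matter can be seen by decoding one and the same byte under several of them. The sketch below is illustrative only: the byte value 0xE0 and the helper `decode_all` are arbitrary choices, and the codec names are the ones Python's standard library uses for three of the code pages listed above.

```python
RAW = bytes([0xE0])  # a single byte with no encoding information attached

# Python codec names for three of the code pages in the tables above.
CODE_PAGES = {
    "ISO 8859-1": "iso8859_1",
    "ISO 8859-5": "iso8859_5",
    "CP 1251": "cp1251",
}


def decode_all(raw):
    """Decode the same bytes under each code page."""
    return {name: raw.decode(codec) for name, codec in CODE_PAGES.items()}


for name, text in decode_all(RAW).items():
    print(f"{name}: {text!r} (U+{ord(text):04X})")
```

The byte is identical in every case; only the code page chosen by the reader decides which character it becomes.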
11. Common Encoding Problems
• Tofu: hollow boxes
• Mojibake: garbage characters
• Question marks: conversion not supported
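Two of these problems can be reproduced in a few lines. Tofu is a font rendering issue and cannot be shown this way. The sample word and variable names below are assumptions chosen for illustration: mojibake appears when bytes are decoded with the wrong encoding, and question marks appear when the target encoding cannot represent a character.

```python
text = "caf\u00e9"  # the word "café"

# Mojibake: UTF-8 bytes decoded with the wrong code page (CP 1252 here).
mojibake = text.encode("utf-8").decode("cp1252")

# Question marks: ASCII cannot represent the accented letter, so the
# encoder substitutes "?" when asked to replace unsupported characters.
question_marks = text.encode("ascii", errors="replace").decode("ascii")

# Mojibake of this kind is reversible as long as no bytes were lost.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(mojibake)
print(question_marks)
print(repaired == text)
```

Unlike mojibake, the question-mark substitution is lossy: the original character cannot be recovered from the output.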
12. Unicode
Deep dive into normalization, compatibility, replacement characters ...
13. Unicode - Encodes the world’s scripts
• Code space of U+0000 to U+10FFFF (about 1.1 million code points)
• Currently encodes 120,737 characters
• Currently allocated code points: 264,256
• Code points are written in hex notation, e.g. U+0041
Plane      Allocated code points   Assigned characters
0 BMP      65,392                  55,181
1 SMP      14,000                  11,833
2 SIP      53,424                  53,386
3 TIP      16,672                  799
14 SSP     368                     337
15 PUA-A   65,536
16 PUA-B   65,536
Totals     264,256                 120,737
Note: the totals shown equal the sum of every row except TIP.
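The size of the code space and the plane a character belongs to follow directly from the code point value. The constant and function names in this sketch are assumptions for illustration.

```python
MAX_CODE_POINT = 0x10FFFF
CODE_SPACE_SIZE = MAX_CODE_POINT + 1  # 1,114,112 code points in total
PLANE_SIZE = 0x10000                  # 65,536 code points per plane


def plane_of(char):
    """Plane number: the code point divided by 65,536."""
    return ord(char) >> 16


def notation(char):
    """Hex notation with at least four digits, e.g. U+0041."""
    return f"U+{ord(char):04X}"


print(CODE_SPACE_SIZE, CODE_SPACE_SIZE // PLANE_SIZE)
print(notation("A"), plane_of("A"))
# "\U0001F600" is an emoji in the Supplementary Multilingual Plane (plane 1).
print(notation("\U0001F600"), plane_of("\U0001F600"))
```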
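14. Four Normalization Forms
• Form D (NFD): canonical decomposition
• Form C (NFC): canonical decomposition followed by canonical composition
• Form KD (NFKD): compatibility decomposition
• Form KC (NFKC): compatibility decomposition followed by canonical composition
Ways to represent Ǻ:
• U+01FA
• U+00C5 U+0301
• U+212B U+0301
• U+0041 U+030A U+0301
• U+00C1 U+030A
• U+0041 U+0301 U+030A
The first four sequences are canonically equivalent. The last two place the acute (U+0301) before the ring (U+030A). Both marks have the same combining class, so their order is significant and these two sequences do not normalize to U+01FA.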
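The sequences listed above can be checked with Python's standard `unicodedata` module. The helper `to_nfc_hex` is an assumption for illustration. The sketch normalizes each sequence to Form C and prints the resulting code points, which shows which sequences collapse to U+01FA and which do not.

```python
import unicodedata

SEQUENCES = [
    "\u01FA",
    "\u00C5\u0301",
    "\u00C1\u030A",
    "\u212B\u0301",
    "\u0041\u0301\u030A",
    "\u0041\u030A\u0301",
]


def to_nfc_hex(text):
    """Normalize to Form C and list the resulting code points."""
    normalized = unicodedata.normalize("NFC", text)
    return " ".join(f"U+{ord(c):04X}" for c in normalized)


for seq in SEQUENCES:
    source = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(f"{source} -> {to_nfc_hex(seq)}")
```

When comparing user input or database keys, normalizing both sides to the same form first avoids treating equivalent sequences as different strings.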
15. Unicode Encoding Forms
UTF-32
• Uses 32-bit code units
• All characters are the same width
UTF-16
• Uses 16-bit code units
• BMP characters use one 16-bit code unit
• Supplementary characters use two special 16-bit code units: a “surrogate pair”
UTF-8
• Uses 8-bit code units (bytes!)
• It’s a multi-byte encoding!
• Characters use between 1 and 4 bytes
• ASCII is ASCII in UTF-8
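The code unit counts described above can be measured directly. The helper `code_units` and the sample characters are assumptions for illustration. The little-endian codec variants are used so that no byte order mark is added to the output.

```python
def code_units(char):
    """Number of code units a single character needs in each encoding form."""
    return {
        "UTF-8": len(char.encode("utf-8")),            # 8-bit units
        "UTF-16": len(char.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(char.encode("utf-32-le")) // 4,  # 32-bit units
    }


samples = {
    "ASCII letter A": "A",
    "Euro sign (BMP)": "\u20ac",
    "Emoji (supplementary)": "\U0001F600",
}

for label, char in samples.items():
    print(label, code_units(char))

# The supplementary character becomes a surrogate pair in UTF-16.
print("\U0001F600".encode("utf-16-be").hex())
```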
16. Localization testing tips
17. Case study: A Website + 10 languages + 4 Browsers + 20 test cases
Localization (L10N) testing
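The size of the test effort in this case study can be estimated with a short sketch. It assumes that every test case is run for every language and browser combination, which the slide does not state explicitly. The language, browser, and test case names are placeholders.

```python
from itertools import product

# Placeholder names; only the counts come from the case study.
languages = [f"lang_{i}" for i in range(1, 11)]     # 10 languages
browsers = [f"browser_{i}" for i in range(1, 5)]    # 4 browsers
test_cases = [f"tc_{i}" for i in range(1, 21)]      # 20 test cases


def build_matrix(languages, browsers, test_cases):
    """Every (language, browser, test case) combination as a list of tuples."""
    return list(product(languages, browsers, test_cases))


matrix = build_matrix(languages, browsers, test_cases)
print(len(matrix))  # 10 * 4 * 20 executions
```

Under that assumption a single pass needs 800 test executions, which is why prioritization and automation tend to matter for localization testing.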
18. Localization testing UI checks
Layout
• Text truncation
• Control truncation
• Misalignment
• Overlapping
• Tabbing order
• Oversized dialogs
• Different layout in general
Hot keys
• Duplicated hotkeys
• Missing hotkey
• Inappropriate hotkey
Text
• Un-translated text
• Mistranslated text
• Unexpected text
• Inconsistent translation
• Technical inaccuracy
• Double space after full stop
• Wrong alphabetical order
• Wrong date/time format
• Corrupt characters
Graphics
• Missing graphics
• Different graphics
• Un-translated icons
19. Pseudo Localization testing
A way to evaluate a website or software product’s readiness for the localization process
Considered a part of the internationalization testing process
1. Identify hard-coded strings that should be translatable
2. Find strings in the source files that shouldn’t be translated
3. Identify design and layout issues that will affect the software or site when it is translated
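One common way to do this is to transform every translatable string mechanically before any real translation exists. The sketch below is one possible approach, not a standard tool: the accent mapping, the bracket markers, the 30% expansion factor, and the `{name}` placeholder syntax are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from ASCII vowels to accented look-alikes
# (written as escapes: À É Î Õ Ü à é î õ ü).
ACCENTED = str.maketrans(
    "AEIOUaeiou",
    "\u00c0\u00c9\u00ce\u00d5\u00dc\u00e0\u00e9\u00ee\u00f5\u00fc",
)
# Placeholders such as {name} must survive untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\})")


def pseudo_localize(text, expansion=0.3):
    """Accent the letters, pad for text expansion, and wrap in brackets."""
    parts = PLACEHOLDER.split(text)
    # Because the pattern has one capturing group, odd indices are placeholders.
    converted = "".join(
        part if index % 2 else part.translate(ACCENTED)
        for index, part in enumerate(parts)
    )
    padding = "~" * max(1, round(len(text) * expansion))
    return f"[{converted}{padding}]"


print(pseudo_localize("Hello {name}"))
```

Strings that still appear unaccented in the running product are likely hard-coded, a missing closing bracket points to truncation, and a damaged placeholder points to text that should not have been translated.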
20. Regional differences
21. More localization testing
Aspect: Challenge
• Limitation of screen size: Character counts and fonts differ across languages.
• Direction: Some languages are written left to right, whereas others are written right to left.
• Spelling rules and upper and lower case conversions: Rules differ based on locale.
• Regional standards: Applications may have to be compatible not only with national languages, but also with regional languages.
• Data handling: Different data storage and processing mechanisms, along with different encodings and code pages.
• Context and special characters: The translation of special characters needs to be handled carefully, as a character may have different meanings in different languages.
• Collation and sorting: Sorting and collation rules differ across languages.
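The collation and case conversion rows can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: `naive_key` is a hypothetical helper that strips accents and ignores case, which only approximates real collation. Actual rules are locale specific (Swedish, for example, sorts Ä after Z), so production code would normally rely on a locale-aware collation library.

```python
import unicodedata


def naive_key(word):
    """Crude sort key: drop combining marks, then ignore case."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


words = ["zebra", "\u00c4pfel", "apple"]  # "\u00c4pfel" is "Äpfel"

by_code_point = sorted(words)             # plain code point order
by_naive_key = sorted(words, key=naive_key)

print(by_code_point)
print(by_naive_key)

# Case conversion can change the length of a string: ß uppercases to "SS".
print("stra\u00dfe".upper())
```

Plain code point order places the accented word after "zebra", which is rarely what a user expects from an alphabetical list.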