IBM i Globalization v3.11

  1. CEC2011 - IBM i Globalization - stefano.tassi@dedanext.it
  2. My profile
  3. Keywords
     • I(nternationalizatio)n, i18n: the process of producing a product (design and code) that is independent of any language, script, culture or character set. Neutral.
     • L(ocalizatio)n, l10n: the process of adapting an internationalized product to specific languages, scripts, cultures and character sets. Customize, extend.
  4. Keywords
     • G(lobalizatio)n, g11n: proper design and execution so that one instance of software, running on a single machine, can process multilingual data and present it in a culturally correct way in a multicultural environment.
     • G11N = I18N + L10N + multilingual support
  5. Character representation
     • Some characters from Italy, Germany, France, China, Greece, Sweden, Japan…
  6. Character representation
     • CS (character set): a collection of elements used to represent textual information (e.g. 0-9, a-z, A-Z, .,;:!? …)
     • A character set generally supports more than one language
  7. Character set: a subset of characters
     • CS 695: Euro Country Extended Code Page
  8. Character set: a subset of characters
     • CS 925: Greece
  9. Character set: a subset of characters
     • CS 1172: Japanese alpha and Katakana
  10. Character set: a subset of characters
     • CS 1150: Cyrillic Russian
  11. Character set: a subset of characters
     • CS 1174: People's Republic of China
  12. Character set: a subset of characters
  13. Code page
     • Code page (CP): defines a subset of characters from a character set
     • Each character in the character set is assigned a numerical representation (hex code)
  14. CCSID
     • CCSID (Coded Character Set Identifier): a unique number (0-65535) that IBM uses to identify a character set and a code page
     • It also defines an encoding scheme
  15. Encoding scheme (ES)
      ES     Description
      1100   EBCDIC, single-byte, no code extension allowed
      1301   EBCDIC, mixed single-byte and double-byte, using the shift-in (SI) and shift-out (SO) code extension method
      4100   ISO 8, single-byte, no code extension allowed
      7200   UCS-2, no code extension allowed
      7808   UTF-8, no code extension allowed
     • EBCDIC SBCS (1 byte/char)
     • EBCDIC DBCS (2 bytes/char)
     • ASCII (1 byte/char)
     • Unicode (………)
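To make the encoding schemes tangible, the Python sketch below encodes the same text with codecs that roughly correspond to some of them. The pairing of schemes with Python codecs (`cp037` for EBCDIC SBCS CCSID 37, `latin-1` for an ISO 8 single-byte scheme, `utf-8`, `utf-16-be`) is an assumption for illustration; DBCS schemes are not covered.

```python
# Assumed pairing of encoding schemes with Python codecs (illustration only).
SCHEMES = {
    "EBCDIC SBCS (cp037)": "cp037",     # ES 1100, CCSID 37
    "ISO 8 SBCS (latin-1)": "latin-1",  # ES 4100, ISO 8859-1
    "UTF-8": "utf-8",                   # ES 7808
    "UTF-16 BE": "utf-16-be",           # big-endian, no byte order mark
}


def show_encodings(text: str) -> dict:
    """Return the hex bytes of `text` under each stand-in codec."""
    return {name: text.encode(codec).hex(" ").upper() for name, codec in SCHEMES.items()}


if __name__ == "__main__":
    for name, hexbytes in show_encodings("Aè").items():
        print(f"{name:22} {hexbytes}")
```

With "Aè", the EBCDIC and ISO 8 forms use one byte per character, UTF-8 uses one or two, and UTF-16 always uses two for these characters.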
  16. CCSID attributes
      CCSID   Character set   Code page   Encoding scheme   Description
      37      697             37          1100              USA
      273     697             273         1100              Germany
      280     697             280         1100              Italy
      1025    1150            1025        1100              Cyrillic Russian
      1388    1174            836         1301              Simplified Chinese
     • Code pages: 836 Simplified Chinese Extended; 37 USA/Canada - CECP; 273 Germany F.R./Austria - CECP; 280 Italy - CECP; 1025 Cyrillic multilingual
     • Character sets: 697 Latin 1; 1150 Cyrillic Multilingual; 1174 Simplified Chinese Ext (EBCDIC/PC Common)
     • Encoding schemes: 1100 EBCDIC, single-byte, no code extension allowed, number of states = 1; 1301 EBCDIC, mixed single-byte and double-byte, using the shift-in (SI) and shift-out (SO) code extension method
  17. CCSID 1140 (USA) vs CCSID 1144 (Italy)
     • Same character set (697 Latin-1), different code page → different CCSID → different character positions
  18. Fixed/variant code points
     • Variant code points: characters that DO change hex value (position), e.g. §, £, #, $, @, !
     • Fixed code points: characters that do NOT change hex value, e.g. A-Z, a-z, 0-9, ()/+-_*%.;:,
     • Hint: avoid characters that are not in the invariant character set for names and literals in programs.
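The fixed/variant distinction can be checked mechanically by encoding candidate characters in two code pages and comparing the bytes. Python has no codec for CCSID 280 or 1144, so this sketch assumes `cp037` (CCSID 37, USA/Canada) and `cp273` (CCSID 273, Germany/Austria) as a substitute pair; both are built on character set 697 according to the CCSID table.

```python
import string

# Two EBCDIC code pages that share character set 697 (Latin-1), per the CCSID table.
CP_USA = "cp037"      # CCSID 37, USA/Canada
CP_GERMANY = "cp273"  # CCSID 273, Germany/Austria


def variant_chars(chars: str, cp_a: str, cp_b: str) -> dict:
    """Map each char whose code point differs between cp_a and cp_b to (hex_a, hex_b)."""
    result = {}
    for ch in chars:
        a, b = ch.encode(cp_a), ch.encode(cp_b)
        if a != b:
            result[ch] = (a.hex().upper(), b.hex().upper())
    return result


if __name__ == "__main__":
    candidates = "§£#$@!" + string.ascii_letters + string.digits + "()/+-_*%.;:,"
    for ch, (usa, ger) in variant_chars(candidates, CP_USA, CP_GERMANY).items():
        print(f"{ch}: x'{usa}' in CCSID 37, x'{ger}' in CCSID 273")
```

Letters, digits and the listed punctuation never appear in the output. Only some of §£#$@! differ for this particular pair, since which characters vary depends on the two code pages being compared.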
  19. SBCS-DBCS
     • SBCS (EBCDIC): code points x'00'-x'FF', so each CCSID can store at most 256 characters
     • DBCS (EBCDIC): code points x'0000'-x'FFFF', so each CCSID can store at most 65,536 characters
     • APAC only: Chinese (Simplified and Traditional), Japanese, Korean
  20. Data integrity
     • If a character exists in both CCSIDs: OK, it matches!
     • Otherwise:
       - round trip: ITA è → USA } → ITA è
       - substitution character: in some cases (e.g. FTP) the character is replaced with the substitution character x'3F'
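The substitution case can be imitated in Python: decode with the source CCSID, encode with the target, and emit x'3F' for any character the target cannot represent. The CCSID-to-codec table is an assumption covering only EBCDIC CCSIDs Python happens to support; CCSID 1140 is the euro-enabled counterpart of CCSID 37, so the euro sign has no code point in CCSID 37. On IBM i the real conversion is done by the system (for example iconv() or the database), so this function only mimics the effect.

```python
# Assumed mapping from a few IBM CCSIDs to Python codecs (only EBCDIC targets).
CCSID_CODECS = {37: "cp037", 273: "cp273", 1140: "cp1140"}

EBCDIC_SUB = b"\x3f"  # EBCDIC substitution character


def convert(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Convert EBCDIC bytes between CCSIDs; unmappable characters become x'3F'."""
    text = data.decode(CCSID_CODECS[from_ccsid])
    target = CCSID_CODECS[to_ccsid]
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += EBCDIC_SUB  # character has no code point in the target CCSID
    return bytes(out)


if __name__ == "__main__":
    source = "10€".encode("cp1140")                 # the euro sign exists in CCSID 1140
    print(convert(source, 1140, 37).hex().upper())  # F1F03F: CCSID 37 has no euro sign
```

Once x'3F' is written, the original character is gone for good: converting back to CCSID 1140 cannot restore the euro sign.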
  21. Warning!
     • Never use CCSID 65535 in a multilingual environment
     • 65535 means NO TRANSLATE: it turns off automatic conversion and keeps the same code point across different code pages
     • 65535 is OK in a single-language environment
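The effect of CCSID 65535 can be simulated by moving bytes between two jobs with and without conversion. This sketch assumes `cp273` (German) as the writer's code page and `cp037` (USA) as the reader's, since Python has no CCSID 280 codec; the `transfer` function is hypothetical.

```python
def transfer(data: bytes, from_codec: str, to_codec: str, tagged_65535: bool) -> bytes:
    """Move bytes from a job using from_codec to a job using to_codec."""
    if tagged_65535:
        return data  # no conversion: code points are kept as they are
    return data.decode(from_codec).encode(to_codec)  # normal CCSID conversion


if __name__ == "__main__":
    stored = "Grüße @ Zürich".encode("cp273")  # written by a German (CCSID 273) job
    ok = transfer(stored, "cp273", "cp037", tagged_65535=False)
    raw = transfer(stored, "cp273", "cp037", tagged_65535=True)
    print(ok.decode("cp037"))   # Grüße @ Zürich
    print(raw.decode("cp037"))  # same code points read with USA glyphs: garbled
```

Without conversion, the German umlauts and the @ sign are displayed as whatever glyphs the reader's code page has at those positions, which is the same effect as the è → } example for Italian data.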
  22. CCSID
     • PF-SRC, PF-DTA
     • Numeric columns: NO CCSID
  23. CCSID
  24. CCSID escalation
     • If the job CCSID is set, it is used.
     • If the job CCSID is *USRPRF, the user profile is checked.
     • If the user profile CCSID is set, it is used.
     • If the user profile value is *SYSVAL, the system value (QCCSID) is checked.
     • If the system value is 65535, the language ID is checked.
     • If the language ID is set, QTQ_DEFAULT_CCSID is used; otherwise the language ID is converted to a CCSID.
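The escalation order reads naturally as a chain of fallbacks. The sketch below models it in Python with a hypothetical `CcsidSettings` record; the language-ID table is a partial assumption, and the QTQ_DEFAULT_CCSID detail of the last step is simplified into a plain lookup.

```python
from dataclasses import dataclass
from typing import Union

# Partial language-ID -> CCSID table; the values are assumptions for illustration.
LANGID_TO_CCSID = {"ENU": 37, "DEU": 273, "ITA": 280, "RUS": 1025}


@dataclass
class CcsidSettings:
    """Hypothetical snapshot of the settings consulted by the escalation."""
    job: Union[int, str]           # a CCSID or "*USRPRF"
    user_profile: Union[int, str]  # a CCSID or "*SYSVAL"
    system_value: int              # QCCSID
    language_id: str               # e.g. "ITA"


def effective_ccsid(s: CcsidSettings) -> int:
    if s.job != "*USRPRF":           # job CCSID set explicitly
        return int(s.job)
    if s.user_profile != "*SYSVAL":  # user profile CCSID set explicitly
        return int(s.user_profile)
    if s.system_value != 65535:      # system value QCCSID
        return s.system_value
    # QCCSID is 65535: derive the CCSID from the language ID (simplified step).
    return LANGID_TO_CCSID[s.language_id]


if __name__ == "__main__":
    print(effective_ccsid(CcsidSettings("*USRPRF", "*SYSVAL", 65535, "ITA")))  # 280
```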
  25. iSeries Access for Windows
     • Not Unicode compliant
     • Needs NL installation
     • Depends on the client (Windows) code page
      Client language      CCSID   Code page
      German               273     850
      Italian              280     850
      Russian              1025    866
      Simplified Chinese   1388    936
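The table can be read as a lookup from client language to a host CCSID and a PC code page, with emulator data translated from one to the other. This is a hypothetical sketch: Python's standard library has no codecs for CCSIDs 280, 1025 or 1388, so only the German row can actually be converted here.

```python
# Client language -> (host CCSID, Windows client code page), from the table above.
CLIENT_SETTINGS = {
    "German": (273, 850),
    "Italian": (280, 850),
    "Russian": (1025, 866),
    "Simplified Chinese": (1388, 936),
}

# Python codecs for the subset this sketch can convert.
CODECS = {273: "cp273", 850: "cp850", 866: "cp866", 936: "cp936"}


def host_to_client(data: bytes, language: str) -> bytes:
    """Translate host EBCDIC bytes into the client's PC code page for display."""
    host_ccsid, client_cp = CLIENT_SETTINGS[language]
    if host_ccsid not in CODECS:
        raise NotImplementedError(f"no Python codec for CCSID {host_ccsid}")
    return data.decode(CODECS[host_ccsid]).encode(CODECS[client_cp])


if __name__ == "__main__":
    screen = "Grüße".encode("cp273")  # field as stored on a CCSID 273 host
    print(host_to_client(screen, "German").hex(" ").upper())
```

Because each session is bound to one host CCSID, a German session cannot display Russian data correctly, which is the "1 CCSID per job" limit.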
  27. iSeries Access for Windows
  28. iSeries Access for Windows
     • Limit: 1 CCSID per job
  29. National language
     • Primary and secondary language
  32. About CP, CS, CCSID: http://www-01.ibm.com/software/globalization/g11n-res.html
  33. Limits
     • SBCS/DBCS
     • Limit: one CCSID (language) per work session
     • Limit: one CCSID (language) per database column
     • Limit: more code (SBCS/DBCS)
  34. Unicode
     • A single character set
     • Contains all current and past languages
     • A unique number for every character
     • Different ways to store data (not only 16 bit)
     • Has mappings to all character sets
  35. Unicode
     • Now: hundreds of CCSIDs, one for each language (SBCS/DBCS)
     • Unicode: one encoding system includes the characters of all languages
  36. Unicode
     • There is a code page for every language, each character being represented by a number
  37. Unicode - Endian
     • Little endian (Intel): UTF-16 LE
     • Big endian: UTF-16 BE
     • i5: NO endian
  38. Unicode - Encodings
     • First version of Unicode: 2 bytes/char, 65,536 code points
     • Version 2: multibyte, more than 1 million characters
     • There are three widely accepted Unicode transformation formats (UTFs): UTF-8, UTF-16 (default), UTF-32
  39. Unicode - Encodings
     • UTF-8: no endianness; used on the Web; multibyte
     • UTF-16: little or big endian (little: Intel); used by the host languages on i5 (RPG/COBOL)
     • UTF-32: not supported on i5
  40. Unicode - Encodings
      UTF-8    8-bit blocks    ABC → x'414243'
      UTF-16   16-bit blocks   ABC → x'004100420043'
      UTF-32   32-bit blocks   ABC → x'000000410000004200000043'
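The byte sequences for "ABC" can be reproduced with Python's built-in codecs. The sketch uses the explicit-byte-order codecs (`utf-16-be`, `utf-16-le`, `utf-32-be`) so that no BOM is prepended, which also shows the LE/BE difference.

```python
def utf_forms(text: str) -> dict:
    """Hex dump of `text` in each UTF, with explicit byte order and no BOM."""
    forms = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]
    return {form: text.encode(form).hex().upper() for form in forms}


if __name__ == "__main__":
    for form, hexbytes in utf_forms("ABC").items():
        print(f"{form:10} x'{hexbytes}'")
    # utf-8      x'414243'
    # utf-16-be  x'004100420043'
    # utf-16-le  x'410042004300'
    # utf-32-be  x'000000410000004200000043'
```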
  41. Unicode - Multibyte
     • UTF-8 (example): the length of a character depends on the first bits of its first byte…
  42. Unicode - Multibyte - example
     • UTF-16 BE, UTF-16 LE
     • UTF-8: 11100100-10001000-10101101
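The "depending on the first bits" rule can be written out directly: the lead byte's high bits give the sequence length (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx), and each continuation byte (10xxxxxx) contributes 6 payload bits. This is a minimal decoder sketch for a single character; it does not reject overlong encodings or surrogates, which a production decoder must.

```python
def utf8_length_from_lead(byte: int) -> int:
    """Number of bytes in a UTF-8 sequence, from the leading bits of its first byte."""
    if byte & 0b1000_0000 == 0:            # 0xxxxxxx
        return 1
    if byte & 0b1110_0000 == 0b1100_0000:  # 110xxxxx
        return 2
    if byte & 0b1111_0000 == 0b1110_0000:  # 1110xxxx
        return 3
    if byte & 0b1111_1000 == 0b1111_0000:  # 11110xxx
        return 4
    raise ValueError(f"0x{byte:02X} is a continuation byte or an invalid lead byte")


def decode_utf8_char(seq: bytes) -> int:
    """Return the code point of one UTF-8 encoded character."""
    n = utf8_length_from_lead(seq[0])
    if len(seq) < n:
        raise ValueError("truncated UTF-8 sequence")
    if n == 1:
        return seq[0]
    code = seq[0] & (0xFF >> (n + 1))  # payload bits of the lead byte
    for b in seq[1:n]:
        if b & 0b1100_0000 != 0b1000_0000:
            raise ValueError("expected a continuation byte 10xxxxxx")
        code = (code << 6) | (b & 0b0011_1111)
    return code


if __name__ == "__main__":
    slide = bytes([0b11100100, 0b10001000, 0b10101101])  # bit pattern from the slide
    print(f"U+{decode_utf8_char(slide):04X}")  # U+422D
```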
  43. Unicode - CCSID
      Encoding   CCSID   Note                       Char unit
      UTF-8      1208    from 5.3                   8 bit
      UTF-16     1200    from 5.3                   16 bit
      UTF-32     n/a                                32 bit
      UCS-2      13488   superseded by UTF-16       16 bit
      UCS-4      n/a                                32 bit
     • UTF-8 (Unicode Transformation Format) is a mapping algorithm: 1 char → 1 to 4 octets. CCSID 1208, data type CHAR. Memory usage depends on the language, e.g.: English 1 byte/char; other European languages 1.1 bytes/char; Greek/Russian/Arabic/Hebrew 1.7 bytes/char; Chinese/Japanese/Hindi/Korean 3 bytes/char.
     • UTF-16: 1 char → one or two 16-bit groups. UTF-16 is the standard for Unicode. CCSID 1200 (or 13488).
     • UCS-2 (Universal Multiple-Octet Coded Character Set): data type GRAPHIC; superseded by UTF-16.
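The storage figures can be checked by measuring encoded lengths. This Python sketch compares UTF-8 (CCSID 1208) with UTF-16 (CCSID 1200) and tests whether text fits UCS-2 (CCSID 13488), which has no surrogate pairs and therefore cannot hold characters above U+FFFF. The sample strings are arbitrary.

```python
def storage_bytes(text: str) -> dict:
    """Bytes needed to store `text` as UTF-8 (CCSID 1208) and UTF-16 (CCSID 1200)."""
    return {
        "UTF-8 (1208)": len(text.encode("utf-8")),
        "UTF-16 (1200)": len(text.encode("utf-16-be")),  # no BOM
    }


def fits_ucs2(text: str) -> bool:
    """UCS-2 has no surrogate pairs, so only BMP characters (<= U+FFFF) fit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for sample in ["Hello", "Привет", "中文", "\U00020000"]:
        print(repr(sample), storage_bytes(sample), "UCS-2 ok:", fits_ucs2(sample))
```

Latin text is half the size in UTF-8, Chinese text is smaller in UTF-16, and a character outside the BMP needs a surrogate pair in UTF-16, which is why UTF-16 superseded UCS-2.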
  44. Unicode
  45. Unicode
     • Remember: a 5250 screen supports only one character set, NO Unicode allowed
     • But…
  46. Unicode - i Access for Web
     • English, Russian, Chinese
  47. iSeries Navigator and Unicode
  48. Unicode-enabled software
     • Unicode-enabled software: WebSphere, Lotus Domino, DB2 UDB, IFS, web browsers, XML, Java
     • i5/OS components that are not Unicode enabled: QSYS library system, OS/400 message files, Personal Communications
  49. User interface
     • DDS-5250
     • JDBC-ODBC-Web: rewrite the apps
  50. RPG and Unicode
     • Default Unicode CCSID: 13488
     • If you need CCSID 1200
  51. RPG and Unicode
     • Very easy!
     • Remember: character and Unicode fields have a different weight (bytes per character)
  52. CCSID to CCSID
     • LF support
     • iconv()
  53. Something about IFS
     • Table fields have a CCSID tag
     • Stream files in the IFS have a CCSID tag
     • Stream files on other systems don't
  54. Something about IFS
     • UTF-16 BE, UTF-16 LE: how to translate correctly?
  55. Something about IFS
     • UTF-16 BE, UTF-16 LE
     • BOM (byte order mark): the first bytes of the stream file
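A reader of an untagged stream file can choose between UTF-16 BE and LE by inspecting the BOM. The Python sketch below checks the standard BOMs from the `codecs` module, testing UTF-32 first because the UTF-32 LE BOM begins with the UTF-16 LE BOM. Falling back to UTF-16 BE when no BOM is present is an assumption of this sketch; in practice the CCSID tag or an agreed convention should decide.

```python
import codecs
from typing import Optional, Tuple

# Longest BOMs first so UTF-32 LE (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec name, BOM length) if the data starts with a BOM, else (None, 0)."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    return None, 0


def decode_stream(data: bytes, default: str = "utf-16-be") -> str:
    """Decode a stream file, honouring its BOM and stripping it from the text."""
    codec, skip = detect_bom(data)
    return data[skip:].decode(codec or default)


if __name__ == "__main__":
    le_file = codecs.BOM_UTF16_LE + "ABC".encode("utf-16-le")  # e.g. written on Windows
    print(detect_bom(le_file), decode_stream(le_file))
```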
  56. Something about IFS
  57. Something about IFS
     • iconv()
  58. Something about IFS
     • Table fields have a CCSID tag
     • Stream files in the IFS have a CCSID tag
     • Stream files on other systems don't
     • Stream files can have a BOM
     • Table columns don't
  59. PHP
     • This means PHP does not FULLY support UTF-16
  60. PHP - UTF-8 setup
  61. PHP - UTF-8 setup
     • Column DESCR: CCSID 1208/13488/1200
     • Reads correctly from 1208, 1200, 13488
     • Writes correctly from PHP variables to 1208
  62. PHP
  63. Globalization guidelines
     • User interface: messages, dialog boxes, online manuals, audio output, animations, windows, help text, tutorials, diagnostics, clip art, icons, and any presentation control needed to convey information to users
     • Culture and conventions: date and time, address, numeric shapes, numeric values
     • Product structure
  64. User interface
     • Variable order
     • Icons: avoid text in icons; avoid internationally recognized symbols in icons (e.g. six-pointed star, cross/plus sign); avoid national flags in icons
     • Line break rules: you cannot use Latin script-based text formatting algorithms for Chinese/Japanese
  65. Culture and conventions: calendar
     • Allow the user to select the calendar and the calendar format
     • Be prepared to adapt to other calendar requirements
  66. Culture and conventions: date and time
      Country             Date format
      Russia              08 sen. 1994 g.
      The Netherlands     08 september 1994
      Bulgaria            1994-IX-08
      Arabic countries    08/09/94
      Germany             8.9.1994
      Iran                1373/6/17
      Islamic lunar       1415/4/2
      Israel              3 Trishrey 5755

      Country             Time format
      Canada              2.00 p.
      Canada (Québec)     14 h
      Italy               14.00
      Sweden              kl 14.00
      USA                 2.00 p.
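The Gregorian rows of the date table can be reproduced with per-country format rules. This Python sketch hard-codes hypothetical formatters for four of them; the Iranian, Islamic and Hebrew dates need real calendar conversion and are not covered, and production code would normally take such patterns from locale data (for example CLDR through ICU) rather than a hand-written table.

```python
from datetime import date

ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"]
DUTCH_MONTHS = ["januari", "februari", "maart", "april", "mei", "juni", "juli",
                "augustus", "september", "oktober", "november", "december"]

# Hypothetical per-country formatters for the Gregorian formats in the table.
DATE_FORMATS = {
    "Germany": lambda d: f"{d.day}.{d.month}.{d.year}",
    "Bulgaria": lambda d: f"{d.year}-{ROMAN_MONTHS[d.month - 1]}-{d.day:02d}",
    "The Netherlands": lambda d: f"{d.day:02d} {DUTCH_MONTHS[d.month - 1]} {d.year}",
    "Arabic countries": lambda d: f"{d.day:02d}/{d.month:02d}/{d.year % 100:02d}",
}


def format_date(d: date, country: str) -> str:
    """Format a date following the convention recorded for `country`."""
    return DATE_FORMATS[country](d)


if __name__ == "__main__":
    for country in DATE_FORMATS:
        print(f"{country:18} {format_date(date(1994, 9, 8), country)}")
```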
  67. Culture and conventions: time zones
     • Time zones and daylight saving time (DST) affect timestamps
     • There are some third-party products (e.g. TZN/400)
     • The i5 system values don't support different time zones
     • An LPAR can be a solution
     • You can write your own routine: the offset can depend on the user, the information system… (before trigger)
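The "write your own routine" option can be sketched as storing timestamps in UTC and applying a per-user offset on the way in and out, the kind of logic a before trigger could run. The user names and offsets below are hypothetical, and fixed offsets ignore daylight saving time, which a real routine would have to handle.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-user UTC offsets (fixed offsets: no DST handling).
USER_OFFSETS = {"ROSSI": timedelta(hours=1), "SMITH": timedelta(hours=-5)}


def local_to_utc(local: datetime, user: str) -> datetime:
    """Attach the user's offset to a naive local timestamp and convert it to UTC."""
    return local.replace(tzinfo=timezone(USER_OFFSETS[user])).astimezone(timezone.utc)


def utc_to_local(utc: datetime, user: str) -> datetime:
    """Present a stored UTC timestamp in the user's local time."""
    return utc.astimezone(timezone(USER_OFFSETS[user]))


if __name__ == "__main__":
    stored = local_to_utc(datetime(2011, 3, 1, 14, 0), "ROSSI")  # entered in Italy
    print(stored.isoformat())                          # 2011-03-01T13:00:00+00:00
    print(utc_to_local(stored, "SMITH").isoformat())   # 2011-03-01T08:00:00-05:00
```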
  68. Culture and conventions
     • Paper sizes: Letter, A4…
     • Cardinal number shape
     • Numeric values: negative number format, decimal and thousands separators
     • Monetary amounts:
      Country    Format
      US         $12,345.67
      US         USD 12,345.67
      Denmark    kr 12.345,67
      France     12 345,67 €
      Portugal   12.345$67 €
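The monetary rows differ only in currency placement and in the two separators, so they can be driven by a small per-country table. This Python sketch encodes the rows above as hypothetical settings, works in integer cents to avoid floating-point rounding, and deliberately rejects negative amounts because negative-number formats also vary by country.

```python
# (prefix, thousands separator, decimal separator, suffix), taken from the table above.
MONEY_FORMATS = {
    "US": ("$", ",", ".", ""),
    "Denmark": ("kr ", ".", ",", ""),
    "France": ("", " ", ",", " €"),
    "Portugal": ("", ".", "$", " €"),
}


def group_thousands(digits: str, sep: str) -> str:
    """Insert `sep` every three digits, counting from the right."""
    groups = []
    while len(digits) > 3:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    groups.insert(0, digits)
    return sep.join(groups)


def format_amount(amount_cents: int, country: str) -> str:
    """Format a non-negative amount given in cents using the country's conventions."""
    if amount_cents < 0:
        raise ValueError("negative formats vary by country and are not handled here")
    prefix, thousands, decimal, suffix = MONEY_FORMATS[country]
    units, cents = divmod(amount_cents, 100)
    return f"{prefix}{group_thousands(str(units), thousands)}{decimal}{cents:02d}{suffix}"


if __name__ == "__main__":
    for country in MONEY_FORMATS:
        print(f"{country:9} {format_amount(1234567, country)}")
```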
  69. Culture and conventions
     • Measurement system: miles, inches, km, °C, °F…
     • First day of the week
     • Address: fields, labels, presentation order
     • Telephone formats: + - . and numbers
  70. Product structure
     • Isolate culture- and language-sensitive parts so they are easy to change
     • Write one set of application source code that works correctly, without modification, in each of the required countries or regions
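One way to isolate language-sensitive parts is to let the code refer only to message IDs and keep the translated texts in a separate catalog, the same idea as keeping texts in a message file rather than in program source. The catalog, IDs and fallback language below are hypothetical; the point is that adding a language means adding data, not changing code.

```python
# Hypothetical externalized message catalog: program logic only knows message IDs.
MESSAGES = {
    "en": {"GRT0001": "Welcome, {user}!", "ORD0002": "Order {order} shipped."},
    "it": {"GRT0001": "Benvenuto, {user}!"},
}


def message(msg_id: str, lang: str, fallback: str = "en", **params) -> str:
    """Look up a message in the user's language, falling back to `fallback`."""
    template = MESSAGES.get(lang, {}).get(msg_id) or MESSAGES[fallback][msg_id]
    return template.format(**params)


if __name__ == "__main__":
    print(message("GRT0001", "it", user="Anna"))  # Benvenuto, Anna!
    print(message("ORD0002", "it", order=42))     # Order 42 shipped. (no Italian text yet)
```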
  71. Thanks
