
International Web Application Development

Sarah Allen's talk at Ruby Kaigi 2010 about how to deal with text in multiple languages when building web applications in Ruby.

Published in: Technology

  1. International Web Application Development
  2. ookina umi no youni gengo ga arimasu. There is an ocean of language. http://www.flickr.com/photos/jimbrekke/429292020/
  3. demo watashitachi wa webu apurike-shon o kaihatsu suru toki... But when we develop web applications...
  4. ido no naka no kaeru no youni. We are like the frog in the well, http://www.flickr.com/photos/clickykbd/2650909663/
  5. jibuntachi no gengo dake o kangaemasu. Only thinking about our own language. http://www.flickr.com/photos/clickykbd/2650909663/
  6. demo webu wa ookina umi no youna mono desu. But the web is a big ocean. http://www.flickr.com/photos/jimbrekke/429292020/
  7. Sarah Allen, @ultrasaurus, Mightyverse. sara aren desu. mightyverse o kaihatsu shite imasu. (I am Sarah Allen. I am developing Mightyverse.)
  8. San Francisco. san furanshisuko ni sunde imasu. (I live in San Francisco.)
  9. Mightyverse
  10. mojibake
  11. Character Encoding: UTF-8, JIS, UTF-16, Shift-JIS, UTF-32, EUC
  12. Encoding Vocabulary. Code point: the number assigned to a single character; an encoding represents each code point as one or more bytes
  13. Unicode. UTF-8: variable length (1, 2, 3, or 4 bytes). UTF-16: variable length (2 or 4 bytes). UTF-32: fixed width (4 bytes)
  14. UTF-8: U+0000 to U+007F (0 to 127) take 1 byte, so ASCII text is already valid UTF-8. A set high bit indicates a multi-byte character
  15. High bits are used to indicate how many bytes are used to represent a specific character. Software can easily read a UTF-8 stream, even starting in the middle. http://tools.ietf.org/html/rfc3629#section-3
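The byte layout described on the slides above can be inspected directly. This is a minimal sketch, assuming Ruby 1.9 or later; the sample characters and the helper names `utf8_bits` and `continuation_byte?` are illustrative and not taken from the talk.

```ruby
# Sketch: show how UTF-8 spends 1 to 4 bytes per character.
# The sample characters below are illustrative choices.

# Returns each byte of the character as an 8-digit binary string.
def utf8_bits(char)
  char.bytes.map { |b| b.to_s(2).rjust(8, "0") }
end

# Continuation bytes always start with the bits 10, so a reader that lands
# mid-stream can skip them until it reaches the start of the next character.
def continuation_byte?(byte)
  (byte & 0b1100_0000) == 0b1000_0000
end

samples = ["a", "\u00E9", "\u3042", "\u{1F600}"]
samples.each do |ch|
  puts format("U+%04X  %d byte(s)  %s", ch.ord, ch.bytesize, utf8_bits(ch).join(" "))
end
```

The leading byte of each multi-byte character starts with as many 1 bits as the character has bytes, which is how a decoder knows how far to read.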
  16. UTF-8: common for internet and file system formats. • XML: default encoding • Flash: only encoding
  17. UTF-8 Disadvantages: UTF-8 encoded text may be larger; possible to split a string mid-character; excessive unification
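The "split a string mid-character" problem can be reproduced in a few lines. This is a sketch assuming Ruby 1.9.3 or later (for `String#byteslice`); `truncate_bytes` is a hypothetical helper, not something from the talk.

```ruby
# Sketch: cutting a UTF-8 string at a byte offset can split a character.
text = "caf\u00E9"              # 4 characters, 5 bytes (e-acute takes 2 bytes)

broken = text.byteslice(0, 4)   # stops after the first byte of the e-acute
puts broken.valid_encoding?     # the result is no longer valid UTF-8

# Hypothetical helper: truncate to at most max_bytes, backing up one byte at
# a time until the result ends on a character boundary.
def truncate_bytes(str, max_bytes)
  cut = str.byteslice(0, max_bytes)
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

puts truncate_bytes(text, 4)
```

The helper assumes its input is valid UTF-8 to begin with; slicing by characters (`text[0, 3]`) avoids the problem when the limit is in characters rather than bytes.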
  18. Caution: not all implementations are complete. For example, the utf8 character set in MySQL 5 stores at most 3 bytes per character
  19. Most spoken languages fit in the "Basic Multilingual Plane", which UTF-8 encodes in at most 3 bytes http://www.siriusict.com/2010/08/06/character-encoding-unicode-utf-8-and-a-bit-of-chauvinism-explained-for-the-masses-2/
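One way to see where the 3-byte limit bites is to look for characters outside the Basic Multilingual Plane. This is a sketch assuming Ruby 1.9 or later; `outside_bmp` is a hypothetical helper and the sample strings are illustrative.

```ruby
# Sketch: find characters outside the Basic Multilingual Plane (BMP).
# BMP code points run from U+0000 to U+FFFF and need at most 3 bytes in
# UTF-8; anything above U+FFFF needs 4 bytes.

# Hypothetical helper: returns the characters that need 4 bytes.
def outside_bmp(str)
  str.each_char.select { |ch| ch.ord > 0xFFFF }
end

kanji = "\u65E5\u672C\u8A9E"   # three BMP characters, 3 bytes each
emoji = "\u{1F600}"           # one supplementary-plane character, 4 bytes

puts kanji.each_char.map { |ch| ch.bytesize }.inspect
puts outside_bmp(kanji + emoji).length
```

An application could run a check like this on input before writing to a store that is limited to 3 bytes per character, and decide how to handle the result.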
  20. http://globalmoxie.com/blog/klingon-not-spoken-here.shtml In May 2001, the Unicode Technical Committee rejected the Klingon proposal; however, Michael Everson created a mapping of pIqaD into the Private Use Area of Unicode, which are listed in the ConScript Unicode Registry (U+F8D0 to U+F8FF). http://en.wikipedia.org/wiki/Klingon_writing_systems
  21. The tengwar font has been proposed for the Unicode standard. The codepoints are subject to change; the range U+016080 to U+0160FF in the SMP is tentatively allocated for tengwar according to the current Unicode roadmap. http://en.wikipedia.org/wiki/Tengwar
  22. You need to have an appropriate font installed to display Unicode characters. http://en.wikipedia.org/wiki/Tengwar
  23. Web Application Story
  24. (1) HTML form post, (2) Ruby code, (3) Write to database, (4) Output HTML for display
  25. HTML Form Post
  26. HTTP headers • You can specify what character set you want back when you send a form post • This is informational for the server • Just setting these won’t change how your app behaves, unless your web app has code for that
  27. Ruby code
  28. Ruby code: most web applications don’t parse text. If yours does, you will need to think about Ruby 1.8 vs. Ruby 1.9
  29. Ruby 1.8
      >> name = "Yukihiro"
      => "Yukihiro"
      >> name[4]
      => 104
      >> name[4].chr
      => "h"
      >> name = " "
      => "3432012233432022233432012043432012413432 257"
      >> name[2]
      => 147
      >> name[2].chr
      => ?
  30. Ruby 1.9
      >> name = "yukihiro"
      => "yukihiro"
      >> name[4]
      => "h"
      >> name = " "
      => " "
      >> name[2]
      => " "
      >> name[0]
      => " "
  31. Ruby: use Ruby 1.9. For Ruby 1.8 (if you must): require 'jcode'
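The Ruby 1.9 behavior in the sessions above can be summarized in a short script. This is a sketch assuming Ruby 1.9 or later; the sample text is illustrative and is not the string used on the slides.

```ruby
# Sketch for Ruby 1.9 or later: a String carries its encoding, and indexing
# works on characters. The raw bytes are still available when asked for.
name = "\u307E\u3064\u3082\u3068"   # illustrative text, four hiragana

puts name.encoding         # the encoding tag attached to the string
puts name.length           # counts characters
puts name.bytesize         # counts bytes
puts name[2] == "\u3082"   # index 2 is the third character
puts name.bytes.to_a[2]    # index 2 of the bytes is still inside the first character
```

In Ruby 1.8 the same index would return a byte value, which is the difference the two sessions illustrate.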
  32. Database
  33. Database: A) Character encoding: i. client, ii. connection, iii. server. B) Collation
  34. SQL: client, connection, database
  35. Check database settings; always use the same character set
  36. Collation: different languages alphabetize differently
  37. Collation
      Swedish: Alingsås, Borgholm, Eslöv, Flen, Hässleholm, Tranås, Vetlanda, Växjö, Ängelholm, Örnsköldsvik, Östersund
      German: Ägypten, Äthiopien, Afghanistan, Bolivien, Dänemark, Deutschland, Jamaika, Marokko, Österreich, Venezuela
  38. Collation: 1. Sorting 2. Equality
  39. e é
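Both collation concerns, sorting and equality, can be sketched in plain Ruby. This assumes Ruby 2.2 or later for `String#unicode_normalize`, which postdates the talk. The expansion table is an illustrative subset, chosen because the German ordering on the slide is consistent with treating ä as "ae"; `german_key` and `strip_accents` are hypothetical helpers, and a database's own collation would normally do this work.

```ruby
# Sketch: sorting and equality both depend on collation rules, not raw bytes.

# German "phone book" style expansions (an illustrative subset, assumed here).
GERMAN_EXPANSIONS = {
  "\u00E4" => "ae", "\u00F6" => "oe", "\u00FC" => "ue", "\u00DF" => "ss",
  "\u00C4" => "Ae", "\u00D6" => "Oe", "\u00DC" => "Ue"
}

# Hypothetical helper: build a sort key by expanding umlauts, then lowercasing.
def german_key(word)
  word.each_char.map { |ch| GERMAN_EXPANSIONS.fetch(ch, ch) }.join.downcase
end

countries = ["Venezuela", "Afghanistan", "\u00C4gypten", "\u00D6sterreich", "Marokko"]

by_code_point = countries.sort                          # plain byte order
by_german     = countries.sort_by { |c| german_key(c) } # expanded keys

# Equality: precomposed e-acute versus "e" followed by a combining accent.
composed   = "\u00E9"
decomposed = "e\u0301"

same_bytes      = (composed == decomposed)
same_normalized = (composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc))

# Hypothetical accent-insensitive comparison: decompose, drop combining marks.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts by_code_point.inspect
puts by_german.inspect
puts same_bytes, same_normalized
puts strip_accents(composed) == "e"
```

Plain `sort` pushes the names starting with Ä and Ö to the end, while the expanded keys reproduce the German order on the slide. Whether "e" and "é" should compare equal is a policy choice that a collation makes for you.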
  40. 4. Output HTML for Display
  41. Content Type • Setting the content-type tells the browser how to display the text • meta tag • http header
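The two places to declare the character set, the meta tag and the HTTP header, can be shown together. This is a sketch that returns a Rack-style `[status, headers, body]` triple; `html_response` is a hypothetical helper name and no particular framework is assumed.

```ruby
require "cgi"

# Sketch: declare the charset in both the HTTP header and a meta tag.
# html_response is a hypothetical helper; no framework is assumed.
def html_response(text)
  html = "<!DOCTYPE html><html><head>" \
         "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" \
         "</head><body><p>#{CGI.escapeHTML(text)}</p></body></html>"
  headers = {
    "Content-Type"   => "text/html; charset=utf-8",
    "Content-Length" => html.bytesize.to_s  # counted in bytes, not characters
  }
  [200, headers, [html]]
end

status, headers, body = html_response("\u3053\u3093\u306B\u3061\u306F")
puts status
puts headers["Content-Type"]
```

Keeping the two declarations identical avoids the case where the browser trusts one and the page was written in the other.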
  42. Questions? http://www.flickr.com/photos/daswunderkind/2689195410/