unihandecode: An Unicode transliteration library

This presentation was given as a lightning talk at PyCon JP 2011 in Tokyo on 27 August 2011.

Project pages:
http://pypi.python.org/pypi/Unihandecode/0.31

https://launchpad.net/unihandecode

Published in: Technology
  • Very interesting stuff. I wonder if you know, do iconv or intl / icu do anything like this 'out-of-the-box'?
Transcript of "unihandecode: An Unicode transliteration library"

  1. Unihandecode: a Unicode transliteration library to make people happy in many languages worldwide. Hiroshi Miura, Representative Director, OpenStreetMap Foundation Japan (SlideShare: miurahr). "Transliteration" is 音訳 in Japanese.
  2. Who is @miurahr? 1998: Linux kernel development
  3. OSS/Linux evangelist and enthusiast
  4. OpenStreetMap Japan founder
  5. Most recent project: Sinsai.info, an East Japan disaster crisis-mapping project (founder and sub-leader)
  6. Agenda: Motivation (why I built this)
  7. Challenge (issues and direction)
  8. Result (outcomes)
  9. Future (where it goes next)
  10. Last year: e-book launched! I got an 'Amazon Kindle 2nd edition' in April 2010. Kindle: [transitive verb] to set on fire, to ignite.
  11. Calibre: e-book library manager. Now I can read websites as e-books. Great!
  12. Based on PyQt
  13. A little BIG problem: 日経新聞 becomes "Ri Jing Xi Wen"
  14. Oh, you guys, do you really like Chinese that much?!
  15. Unidecode for Python
  16. Unidecode library: originally written for Perl by Sean M. Burke (http://interglacial.com/tpj/22/)
  17. Oh, so you do understand the problem: the readings differ between Japanese, Chinese, and Korean.
  18. Solution: More people speak Chinese than Japanese.
  19. More people speak Japanese than Korean.
  20. The Korean and Japanese pronunciations are often derived from the Chinese pronunciations. It rarely if ever went the other way.
  21. A Japanese or Korean person is more likely to have studied Chinese (meaning modern spoken Mandarin) than a Chinese person is to have studied Japanese or Korean.
  22. Japanese people study Chinese?!
  23. Unihandecode: a locale-aware transliteration library for Python
  24. Semi-automatic generation of the transliteration tables from the Unicode.org standard text files. Supports CJKV character sets, bringing KAKASI features to Unidecode. Several test cases for East and South Asian languages.
  25. Unicode.org Unihan data table
  26. KAKASI: C language binding, no release since 2004
  27. Project page at Launchpad.net
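  28. Implementation (diagram): /data/ holds the tables for Japanese, Korean, Vietnamese, Chinese, and the default; the pykakasi part handles kanji -> kana, dictionary access, and kana -> romaji; the kakasi class uses dictionaries for variant characters, kanji readings, and kana-to-romaji; Unicode definitions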
  29. How to use it
      from unihandecode import Unidecoder
      d = Unidecoder()             # default: Latin + Chinese
      d.decode(u"\u5317\u4EB0")    # 北京
      'Bei Jing'
      d = Unidecoder(lang='ja')    # prefer Japanese
      d.decode(u"\u5317\u4EB0")
      'Pe King'
  30. Good romanized file name: 日経新聞 becomes "Nikkeishinbun"
  31. Result: the Calibre e-book library now understands Japanese book and newspaper titles!
  32. Unihandecode now supports Latin-script languages, Arabic, Japanese, Korean, Chinese, Cantonese, and Vietnamese, with trial support for Indian languages.
  33. It now has some of KAKASI's functionality: converting kanji to yomi (readings).
  34. Future: adoption in routines that generate URLs, such as blog/CMS permalinks
  35. Wiki URLs. Hopefully other uses will be found.
  36. Better support for Greek and Indic characters
  37. More test cases, which are difficult for non-speakers to prepare
  38. Thank you
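
The slides mention generating the transliteration tables semi-automatically from the Unicode.org Unihan text files and choosing readings by locale. The following is only a rough sketch of that idea, not Unihandecode's actual implementation. The helper names (`build_tables`, `transliterate`, `strip_marks`), the language-to-field mapping, and the sample readings are all assumptions made for illustration; only the tab-separated layout (code point, field name, value) follows the Unihan data files.

```python
import unicodedata

# Illustrative sample lines in the Unihan tab-separated layout:
# code point, field name, value. A real run would read the Unihan file instead.
SAMPLE_UNIHAN = """\
# sample data, for illustration only
U+5317\tkMandarin\tběi
U+5317\tkJapaneseOn\tHOKU
U+4EAC\tkMandarin\tjīng
U+4EAC\tkJapaneseOn\tKYOU KEI
"""

# Hypothetical mapping from a language code to the Unihan field to prefer.
FIELD_FOR_LANG = {"zh": "kMandarin", "ja": "kJapaneseOn"}


def strip_marks(text):
    """Drop combining marks (for example tone marks) to get plain ASCII letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_tables(lines):
    """Build {field: {character: reading}} from Unihan-style lines."""
    tables = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # "U+5317" -> the character itself
        # Keep only the first listed reading, without marks, capitalized.
        reading = strip_marks(value.split()[0]).capitalize()
        tables.setdefault(field, {})[char] = reading
    return tables


def transliterate(text, tables, lang="zh"):
    """Use the preferred locale's reading, then Mandarin, then the character itself."""
    preferred = tables.get(FIELD_FOR_LANG[lang], {})
    fallback = tables.get("kMandarin", {})
    return " ".join(preferred.get(ch) or fallback.get(ch) or ch for ch in text)


tables = build_tables(SAMPLE_UNIHAN.splitlines())
print(transliterate("\u5317\u4eac", tables, "zh"))
print(transliterate("\u5317\u4eac", tables, "ja"))
```

A per-character lookup like this cannot produce word-level readings such as "Nikkeishinbun"; according to the slides, that part relies on KAKASI-style dictionaries.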