Sunpinyin and Its Future
Mike (Dai) Qin
firstname.lastname@example.org
University of Toronto
July 17, 2011
Who Am I
• Just call me Mike.
• Used to be a student at Zhejiang University, Hangzhou. Now admitted to the University of Toronto.
• Was born in Tianjin. (That's why I'm here.)
• Have been using Linux since high school. A Fedora user now.
• Have been a Sunpinyin committer since winter 2009.
Pinyin Input Method
• Shows several candidates according to the Pinyin that the user inputs.
• Lots of commercial and free implementations:
  • Pinyin ABC
  • Microsoft Pinyin
  • Sogou Pinyin
  • QQ Pinyin
Approach
• Dictionary based
  • A dictionary that contains all possible words.
  • Always looks up the dictionary upon user input.
  • Adjusts the order of candidates upon user commit.
• Pros: Easy to implement.
• Cons: Not intelligent enough when words are combined.
• Implementation:
  • Pinyin ABC (Commercial)
  • Fcitx (GPL)
  • ibus-pinyin (GPL)
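The dictionary-based approach can be sketched in a few lines. This is a minimal illustration, not code from any of the implementations listed: the class name `DictionaryIME`, its methods, and the three sample entries are all hypothetical. It looks up candidates by Pinyin and moves a word up the list each time the user commits it.

```python
from collections import defaultdict


class DictionaryIME:
    """Toy dictionary-based Pinyin lookup (illustrative only)."""

    def __init__(self, entries):
        # entries: iterable of (pinyin, word) pairs; every word starts
        # with a commit count of 0.
        self.table = defaultdict(dict)
        for pinyin, word in entries:
            self.table[pinyin][word] = 0

    def candidates(self, pinyin):
        # Most-committed words first. sorted() is stable, so ties keep
        # the dictionary's insertion order.
        words = self.table.get(pinyin, {})
        return sorted(words, key=lambda w: -words[w])

    def commit(self, pinyin, word):
        # Record the user's choice so it ranks higher next time.
        self.table[pinyin][word] = self.table[pinyin].get(word, 0) + 1


# Hypothetical sample data.
ime = DictionaryIME([("shi", "是"), ("shi", "时"), ("shi", "事")])
before = ime.candidates("shi")
ime.commit("shi", "事")
after = ime.candidates("shi")
```

Each lookup here is independent of the words around it, which is the weakness the "Cons" bullet points at: the ranking cannot take neighbouring words into account.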
Approach
• N-Gram based
  • Has a database of conditional probabilities.
  • Tries to find the sentence with the highest probability.
  • Interpolates between the user commit history and the database.
• Pros: Intelligence!
• Cons: Where can I get the database?
• Implementation:
  • Sunpinyin (LGPL/CDDL)
  • Sogou Pinyin (Commercial)
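To make "the sentence with the highest probability" concrete, here is a rough sketch of a Viterbi-style dynamic program over word candidates. It is not Sunpinyin's code: it uses bi-grams instead of tri-grams to stay short, and `LEXICON`, `BIGRAM`, `FLOOR`, and all the probabilities are made-up assumptions.

```python
import math

# Hypothetical toy data; a real system would load these from a trained database.
LEXICON = {
    ("xi",): ["西", "洗"],
    ("an",): ["安", "按"],
    ("xi", "an"): ["西安"],
    ("shi",): ["是", "市"],
}
BIGRAM = {  # P(word | previous word), made-up numbers
    ("<s>", "西安"): 0.02,
    ("<s>", "西"): 0.01,
    ("<s>", "洗"): 0.01,
    ("西", "安"): 0.05,
    ("西安", "市"): 0.30,
    ("西安", "是"): 0.10,
    ("安", "市"): 0.01,
}
FLOOR = 1e-6  # probability assumed for unseen word pairs


def best_sentence(syllables):
    # best[i] maps the last word of a partial sentence covering
    # syllables[:i] to (log probability, list of words so far).
    best = [dict() for _ in range(len(syllables) + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(len(syllables)):
        for prev, (score, words) in best[i].items():
            for j in range(i + 1, len(syllables) + 1):
                for word in LEXICON.get(tuple(syllables[i:j]), []):
                    p = BIGRAM.get((prev, word), FLOOR)
                    cand = (score + math.log(p), words + [word])
                    # Keep only the best path ending in this word at position j.
                    if word not in best[j] or cand[0] > best[j][word][0]:
                        best[j][word] = cand
    final = best[len(syllables)]
    if not final:
        return None  # no segmentation covers the whole input
    return "".join(max(final.values(), key=lambda c: c[0])[1])


result = best_sentence(["xi", "an", "shi"])
```

With these toy numbers the two-word reading 西安 + 市 beats the three single-character readings, because the probability of each word depends on the word before it.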
Sunpinyin and OpenGram Project
• Sunpinyin is an input method using the N-Gram based approach.
• Free as in freedom. LGPL license.
• It uses tri-grams for the built-in model and bi-grams for user history.
• The OpenGram project aims at creating a tri-gram database for Simplified Chinese.
• Free as in freedom. CC license.
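One simple way to combine a built-in tri-gram model with a user-history bi-gram model is linear interpolation. The sketch below assumes that scheme for illustration only; the function `interpolated_prob`, the `weight` parameter, and the sample probabilities are hypothetical and do not describe Sunpinyin's actual formula.

```python
def interpolated_prob(word, history, builtin_trigram, user_bigram, weight=0.25):
    """Mix a built-in tri-gram estimate with a user-history bi-gram estimate.

    history is a tuple of the two preceding words. weight is a hypothetical
    mixing factor for the user history, not a Sunpinyin setting.
    """
    p_builtin = builtin_trigram.get((history[0], history[1], word), 0.0)
    p_user = user_bigram.get((history[1], word), 0.0)
    return (1.0 - weight) * p_builtin + weight * p_user


# Made-up probabilities: the built-in model prefers 你, but this user
# has always committed 她 after 爱.
builtin = {("我", "爱", "你"): 0.5, ("我", "爱", "她"): 0.25}
user = {("爱", "她"): 1.0}

p_ni = interpolated_prob("你", ("我", "爱"), builtin, user)
p_ta = interpolated_prob("她", ("我", "爱"), builtin, user)
```

In this example the user history is enough to lift 她 above 你, even though the built-in model alone would rank them the other way round.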
Current Progress - Sunpinyin
• Released 2.0.3, and we're working on the 2.1/2.5 release.
• Works on Linux, BSD and OS X platforms, with native interfaces.
• Ported to IBus (ibus-sunpinyin) and SCIM (scim-sunpinyin). Also has a standalone version (xsunpinyin).
• Progress of the next release:
  • Multiple Best Sentence: done
  • Partial Sentence: done
  • Plugin Support: WIP
• Needs a maintainer!
What Do I Need to Know Before Joining Sunpinyin?
• Passion.
• A little C++ knowledge. (Nobody on the team knows C++ completely. :) )
• That's enough! Maybe plus:
  • IBus/SCIM API, Xorg API or OS X API.
  • Python API. (Plugin Support)
  • Windows API. (Windows Port?)