
Short Text Language Detection with Infinity-Gram

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

  • samuemarks (Samuel Marks), 2 months ago: "Our Goal is 99%+ accurate detection for 'sentence with more than 3 words'". Hope they define "sentence" somewhere! More interesting is their use of tries and ESAs. I suspect a more efficient implementation of ESAs (both in practice and in asymptotic space and time) could be created by building them from a special kind of suffix tree rather than an array, one that uses hash maps and sibling lists (made of skew binary lists or even regular linked lists). The normalisation stuff looked fairly generic, but the tweet sampling was irritating because they're just using Twitter's algorithm, which they don't reference :(. A 98% accuracy before even reducing bias is fairly impressive! "Double array" was mentioned in reference to https://github.com/shuyo/ldig on slide 54. Not sure what to make of that (is it an ESA?!); might take a look through the code later. There is some need for improvement, especially the lack of compression and the use of suboptimal data structures. However, there were some very nice accuracy results at the end there :)

Short Text Language Detection with Infinity-Gram: Presentation Transcript