
Casual-Talk #1: On the Ecology of the Aomushi (Green Caterpillar)


These are the slides presented at (Perl) Casual-Talk #1, held on 2009/11/20.
I talked about implementing a Bayesian filter with Algorithm::NaiveBayes.

Published in: Technology

  1. (aomushi510) 2009/11/24
  2. aomushi • perl • casual-perl IRC
  3. aomushi • JPA
  4. • Algorithm::NaiveBayes OK/NG
  5. Algorithm::NaiveBayes • http://search.cpan.org/~kwilliams/Algorithm-NaiveBayes-0.04/lib/Algorithm/NaiveBayes.pm • AI::Categorizer Lingua::JA::Categorize
  7. aomushi • MeCab
  8. aomushi • MeCab
  9. aomushi • NaiveBayes
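  10.
      # Count word frequencies over the MeCab parse of $text.
      for (my $node = $self->mecab->parse($text); $node; $node = $node->next) {
          my $info = $node->feature;    # comma-separated feature string
          my $word = $node->surface;    # surface form of the token
          next unless $info;
          # The Japanese part-of-speech patterns in the two regexes below
          # did not survive in this transcript.
          if ( $info =~ /^ / ) {
              next if $info =~ / | | | | | /;
              # Skip any word on the configured skip list.
              next if List::MoreUtils::any { $word eq $_ } @{ $self->_skip_word };
              $data->{$word}++;
          }
      }
      return $data;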
  11. mecab • naist-dic wikipedia • deepneko • http://deepneko.dyndns.org/kokotech/2009/06/mecabwikipedia.html • NG
  12. • NG • OK NG • 10000
  13.
      while ( my ( $label, $ref ) = each %$categories ) {
          my $words = $self->_get_words( $ref->{display} );
          foreach (@$words) {
              my $tokenizer = MyFilter::Util::Tokenizer->new;
              my $word_set  = $tokenizer->tokenize( $_, $self->threshold );

              $brain->add_instance(
                  attributes => $word_set,
                  label      => $label,
              );
          }
      }
      # Train once, after the instances for every label have been added.
      $brain->train;
      $brain->save_state($save_file) if $save_file;
  14.
      sub categorize {
          my ($self, $word_set) = @_;

          return $self->brain->predict( attributes => $word_set );
      }
  15. • bad
      $result = {
          good => 0.092,
          bad  => 0.996,
      };
  16. • ao shi
  22. P-1 • 200 NG
  23. 3
  24. 3
  25. 2
  26. 2
  27. 1
  28. 1
  29. • Algorithm::NaiveBayes • mecab • yusukebe
  31. • Algorithm::NaiveBayes: http://search.cpan.org/~kwilliams/Algorithm-NaiveBayes-0.04/lib/Algorithm/NaiveBayes.pm
      • mecab wikipedia: http://deepneko.dyndns.org/kokotech/2009/06/mecabwikipedia.html
      • Lingua::JA::Categorize: http://search.cpan.org/~miki/Lingua-JA-Categorize-0.01001/lib/Lingua/JA/Categorize.pm
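
Slide 15 shows the kind of score hash that `predict` returns, with one score per label. The slides do not show how that hash is turned into a single OK/NG verdict, so the following is only a minimal sketch of one plausible approach: take the label with the highest score. The helper name `best_label` is hypothetical and does not appear in the slides; the scores are copied from slide 15.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): return the label with the
# highest score in a hashref shaped like the one on slide 15.
# Ties are broken alphabetically so the result is deterministic.
sub best_label {
    my ($result) = @_;
    my ($best) = sort { $result->{$b} <=> $result->{$a} or $a cmp $b }
                 keys %$result;
    return $best;    # undef when the hash is empty
}

# Scores copied from slide 15.
my $result = { good => 0.092, bad => 0.996 };
print best_label($result), "\n";
```

With the slide 15 scores this picks `bad`. A real filter might instead require the winning score to clear a threshold before rejecting a post; that choice is outside what the slides cover.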
