ComplementaryNaiveBayesClassifier
Transcript

  • 1. (Complementary) Naive Bayes 15 2011/01/23 TokyoWebmining #9-2nd (January 23, 2011)
  • 3. naoki yanai @yanaoki web Java Ruby Hadoop TokyoWebmining
  • 4. - Naive Bayes -
  • 5. Naive Bayes, Supervised Naive Bayes, ComplementNaiveBayes
  • 7. Web
  • 8. Naive Bayes F1,...,Fn C C c (source: gihyo.jp http://gihyo.jp/dev/serial/01/machine-learning/0002)
  • 9. Naive Bayes w j i i c j θcj i (source: Wiki http://ibisforest.org/index.php?complement%20naive%20Bayes)
  • 10. Complementary Naive Bayes Naive Bayes Naive Bayes θcj NaiveBayes c j θcj CNaiveBayes c j θˆij (source: Wiki http://ibisforest.org/index.php?complement%20naive%20Bayes)
  • 11. - -
  • 12. API Mahout Mahout Mahout
  • 13. Mahout Mahout CF Clustering Hadoop (Classifier) NaiveBayes ComplementaryNaiveBayes
  • 14. bayes cbayes
  • 15. 1. API API API mecab ID[TAB] c100371 1 ! : 45 45 10 SIZE - 02 : 56 B 76 31 SIZE - 03 : 57 B 82
  • 16. 2. bayes|cbayes N-gram N-gram alpha Bayes/CBayes 35MB EC2 3 15
        $ mahout trainclassifier --gramSize 1 \
            --input /classifier/rakuten/data_search \
            --output /classifier/rakuten/model_searchcbig \
            --classifierType cbayes --dataSource hdfs --alpha 1
  • 17. 3.
        $ mahout testclassifier --gramSize 1 \
            --testDir /home/yanaoki/classifier/rakuten/data_rank1 \
            --model /classifier/rakuten/model_searchcbig \
            --classifierType cbayes --dataSource hdfs --alpha 1
  • 18. 4. Classify HBase
        export CLASSPATH=...
        $ java org.apache.mahout.classifier.Classify \
            --path /home/yanaoki/classifier/rakuten/model \
            --classify /home/yanaoki/classifier/rakuten/d.doc \
            --encoding UTF-8 --gramSize 1 \
            --classifierType cbayes --dataSource hdfs
  • 21.
        ID      Name  Instances  Size
        100005  DIY   2997       4.8M
        100227        2994       3.9M
        100371        3000       4.6M
        100939        2990       4.7M
        101114        2997       3.3M
        101381        2905       4.2M
        200162        2997       2.6M
        216131        2999       3.4M
  • 22. bayes
        =======================================================
        Summary
        -------------------------------------------------------
        Correctly Classified Instances          :      22442       93.9822%
        Incorrectly Classified Instances        :       1437        6.0178%
        Total Classified Instances              :      23879
        =======================================================
        Confusion Matrix
        -------------------------------------------------------
        a       b       c       d       e       f       g       h       <--Classified as
        2802    0       127     3       8       19      4       27       |  2990        a     = c100939
        0       2982    2       0       0       0       0       13       |  2997        b     = c200162
        0       7       2948    34      0       1       0       9        |  2999        c     = c216131
        1       0       133     2846    0       1       0       19       |  3000        d     = c100371
        233     2       53      0       2434    13      0       259      |  2994        e     = c100227
        20      0       15      0       0       2935    3       24       |  2997        f     = c101114
        0       0       39      0       1       139     2753    65       |  2997        g     = c100005
        0       6       125     4       1       26      1       2742     |  2905        h     = c101381
        Default Category: unknown: 8
  • 24.
        ID      Name  Instances  Size
        100005  DIY   5997       9.0M
        100227        5969       7.1M
        100371        3000       4.6M
        100939        2990       4.7M
        101114        1500       2.4M
        101381        1500       1.5M
        200162        500        0.48M
        216131        500        0.45M
  • 25. bayes
        =======================================================
        Summary
        -------------------------------------------------------
        Correctly Classified Instances          :      14459       60.5511%
        Incorrectly Classified Instances        :       9420       39.4489%
        Total Classified Instances              :      23879
        =======================================================
        Confusion Matrix
        -------------------------------------------------------
        a       b       c       d       e       f       g       h       <--Classified as
        1945    0       0       5       799     0       241     0        |  2990        a     = c100939
        11      0       0       0       349     0       2632    5        |  2997        b     = c200162
        6       0       0       833     82      0       2078    0        |  2999        c     = c216131
        0       0       0       2925    13      0       62      0        |  3000        d     = c100371
        0       0       0       0       2993    0       1       0        |  2994        e     = c100227
        3       0       0       0       86      2479    429     0        |  2997        f     = c101114
        0       0       0       0       39      0       2958    0        |  2997        g     = c100005
        0       0       1       8       800     2       935     1159     |  2905        h     = c101381
        Default Category: unknown: 8
  • 26. complement naive bayes
        =======================================================
        Summary
        -------------------------------------------------------
        Correctly Classified Instances          :      17225       72.1345%
        Incorrectly Classified Instances        :       6654       27.8655%
        Total Classified Instances              :      23879
        =======================================================
        Confusion Matrix
        -------------------------------------------------------
        a       b       c       d       e       f       g       h       <--Classified as
        2249    0       0       5       588     0       148     0        |  2990        a     = c100939
        10      1806    0       3       370     0       808     0        |  2997        b     = c200162
        2       0       471     1084    252     0       1190    0        |  2999        c     = c216131
        0       0       0       2985    4       0       11      0        |  3000        d     = c100371
        0       0       0       0       2994    0       0       0        |  2994        e     = c100227
        27      0       0       1       69      2422    478     0        |  2997        f     = c101114
        0       0       0       0       33      0       2964    0        |  2997        g     = c100005
        1       0       0       30      910     0       630     1334     |  2905        h     = c101381
        Default Category: unknown: 8
  • 27. API complement naive bayes Mahout
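
The formulas on the Naive Bayes and Complementary Naive Bayes slides (8 to 10) did not survive in this transcript. As a reference sketch only, the two decision rules can be written in the standard formulation of Rennie et al. (2003). The notation is an assumption and may not match the slides: f_i is the count of word i in the document to classify, N_{ci} is the count of word i in the training documents of class c, N_{\tilde{c}i} is the count of word i in the training documents of all classes other than c, α_i is a smoothing constant, and α is the sum of the α_i.

```latex
\begin{aligned}
\text{Naive Bayes:}\quad
  \hat{c} &= \arg\max_{c}\Big[\log p(c) + \sum_i f_i \log \theta_{ci}\Big],
  & \theta_{ci} &= \frac{N_{ci} + \alpha_i}{N_{c} + \alpha} \\
\text{Complement NB:}\quad
  \hat{c} &= \arg\max_{c}\Big[\log p(c) - \sum_i f_i \log \theta_{\tilde{c}i}\Big],
  & \theta_{\tilde{c}i} &= \frac{N_{\tilde{c}i} + \alpha_i}{N_{\tilde{c}} + \alpha}
\end{aligned}
```

The only structural difference is where the word statistics come from: Naive Bayes estimates each class from its own documents, while the complement variant estimates each class from everything outside it, which gives small classes far more data to estimate from.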
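
To make the difference between the two classifiers concrete, here is a minimal Python sketch of both weight estimates. It is not Mahout's implementation, and every name in it (`train_counts`, `nb_log_theta`, `cnb_log_theta`, `classify`) and the toy data are hypothetical. It assumes uniform class priors (the prior term is dropped) and plain add-alpha smoothing.

```python
import math
from collections import Counter, defaultdict


def train_counts(docs):
    """docs: list of (label, tokens). Returns per-class token counts and the vocabulary."""
    counts = defaultdict(Counter)
    for label, tokens in docs:
        counts[label].update(tokens)
    vocab = {t for cnt in counts.values() for t in cnt}
    return counts, vocab


def nb_log_theta(counts, vocab, alpha=1.0):
    """Naive Bayes: log theta[c][w] estimated from the documents of class c itself."""
    out = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) + alpha * len(vocab)
        out[c] = {w: math.log((cnt[w] + alpha) / total) for w in vocab}
    return out


def cnb_log_theta(counts, vocab, alpha=1.0):
    """Complement NB: log theta[c][w] estimated from every class except c."""
    all_counts = Counter()
    for cnt in counts.values():
        all_counts.update(cnt)
    out = {}
    for c, cnt in counts.items():
        comp = {w: all_counts[w] - cnt[w] for w in vocab}
        total = sum(comp.values()) + alpha * len(vocab)
        out[c] = {w: math.log((comp[w] + alpha) / total) for w in vocab}
    return out


def classify(tokens, log_theta, complement=False):
    """Return the best-scoring class; tokens outside the vocabulary are skipped.

    Priors are assumed uniform, so only the word terms are summed. For the
    complement model the sign is flipped: a document should look unlike
    everything outside its class.
    """
    sign = -1.0 if complement else 1.0
    scores = {
        c: sign * sum(lt[w] for w in tokens if w in lt)
        for c, lt in log_theta.items()
    }
    return max(scores, key=scores.get)


# Toy, deliberately imbalanced training set (hypothetical data).
docs = [
    ("diy", ["hammer", "nail", "wood"]),
    ("diy", ["hammer", "saw", "wood"]),
    ("diy", ["drill", "nail", "wood"]),
    ("food", ["rice", "tea"]),
    ("garden", ["seed", "soil"]),
]
counts, vocab = train_counts(docs)
nb = nb_log_theta(counts, vocab)
cnb = cnb_log_theta(counts, vocab)
print(classify(["rice", "tea"], nb), classify(["rice", "tea"], cnb, complement=True))
```

With only two classes and no priors the two rules always pick the same class, so at least three classes are needed to see them diverge.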
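
The summary figures in the Mahout output can be recomputed from the confusion matrices, which also makes the per-category effect of the imbalanced training set easier to see. The sketch below copies the two matrices from slides 25 (bayes) and 26 (complement naive bayes); the helper names `accuracy` and `per_class_recall` are hypothetical. Reading slide 24 as the training-set sizes is an assumption, under which c200162 and c216131 are the smallest categories.

```python
# Rows are true categories, columns are predicted categories, both in the
# order printed by Mahout (a..h).
LABELS = ["c100939", "c200162", "c216131", "c100371",
          "c100227", "c101114", "c100005", "c101381"]

BAYES = [  # slide 25
    [1945, 0, 0, 5, 799, 0, 241, 0],
    [11, 0, 0, 0, 349, 0, 2632, 5],
    [6, 0, 0, 833, 82, 0, 2078, 0],
    [0, 0, 0, 2925, 13, 0, 62, 0],
    [0, 0, 0, 0, 2993, 0, 1, 0],
    [3, 0, 0, 0, 86, 2479, 429, 0],
    [0, 0, 0, 0, 39, 0, 2958, 0],
    [0, 0, 1, 8, 800, 2, 935, 1159],
]

CBAYES = [  # slide 26
    [2249, 0, 0, 5, 588, 0, 148, 0],
    [10, 1806, 0, 3, 370, 0, 808, 0],
    [2, 0, 471, 1084, 252, 0, 1190, 0],
    [0, 0, 0, 2985, 4, 0, 11, 0],
    [0, 0, 0, 0, 2994, 0, 0, 0],
    [27, 0, 0, 1, 69, 2422, 478, 0],
    [0, 0, 0, 0, 33, 0, 2964, 0],
    [1, 0, 0, 30, 910, 0, 630, 1334],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """For each true category, the share of its instances classified correctly."""
    return {
        label: row[i] / sum(row)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


for name, matrix in (("bayes", BAYES), ("cbayes", CBAYES)):
    print(name, f"accuracy={accuracy(matrix):.4%}")
    for label, recall in per_class_recall(matrix, LABELS).items():
        print(f"  {label}: recall={recall:.3f}")
```

In the bayes matrix no test document of c200162 or c216131 is classified correctly, while the complement naive bayes matrix recovers 1806 of 2997 and 471 of 2999 respectively.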