This document discusses Hadoop, HBase, Mahout, naive Bayes classification, and analyzing web content. It provides an example of using Mahout to train a naive Bayes classifier on web content stored in Hadoop and HBase. Evaluation results are presented, showing over 90% accuracy in classifying different types of web content. The effects of parameters like alpha values, n-grams, and feature selection are also explored.
4. HBase
• Key-value store
• Random, real-time read/write access
• Goal is the hosting of very large tables -- billions of rows, millions of columns ...
• Hadoop
• CAP theorem: HBase provides C and P
• C: Consistency, A: Availability, P: Partition tolerance
• Sharding
• Hadoop/MapReduce
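The key-value and sharding points above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HBase client API: `ToyTable`, `region_for`, and `split_keys` are hypothetical names, the table lives in memory, and each "region" is simply a contiguous range of row keys. It shows cells addressed by row key, column, and timestamp, with rows partitioned across regions by key range.

```python
import bisect


class ToyTable:
    """Toy in-memory model of a sparse, versioned key-value table.

    Hypothetical illustration only; this is not how HBase is implemented.
    """

    def __init__(self, split_keys=()):
        # split_keys cut the row-key space into contiguous ranges ("regions").
        # Region i holds rows with split_keys[i-1] <= row < split_keys[i].
        self.split_keys = sorted(split_keys)
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def region_for(self, row):
        """Return the index of the region whose key range contains `row`."""
        return bisect.bisect_right(self.split_keys, row)

    def put(self, row, column, value, ts):
        """Store one cell version: (row, column, ts) -> value."""
        cells = self.regions[self.region_for(row)].setdefault(row, {})
        cells.setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self.regions[self.region_for(row)].get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield (row, newest cells) in row-key order for start <= row < stop."""
        # Regions are ordered by key range, so visiting them in order and
        # sorting rows inside each region gives a globally sorted scan.
        for region in self.regions:
            for row in sorted(region):
                if start <= row < stop:
                    yield row, {c: v[max(v)] for c, v in region[row].items()}


if __name__ == "__main__":
    table = ToyTable(split_keys=["m"])
    table.put("com.example/a", "content:html", "<p>hello</p>", ts=1)
    print(table.get("com.example/a", "content:html"))
```

Because rows are assigned to regions by key range rather than by hash, a range scan only needs to visit the regions whose ranges overlap the requested keys.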
April 18, 2011