Bayesian Counters
by Hadoop_Summit on Jun 19, 2012
Processing large data sets requires new approaches to data mining: algorithms of low, close to linear, complexity and stream processing. In traditional data mining the practitioner is usually presented with a static dataset, perhaps with just a timestamp attached, and asked to infer a model for predicting future or held-out observations. In stream processing the problem is more often posed as extracting as much information as possible from the current data and converting it into an actionable model within a limited time window. In this talk I present an approach to mining streams of data that is based on HBase counters and allows for massively distributed processing and data mining. I will consider the overall design goals as well as the HBase schema design dilemmas that arise when speeding up the knowledge extraction process. I will also demo efficient implementations of Naive Bayes, Nearest Neighbor and Bayesian Learning on top of Bayesian Counters.
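To make the counter idea concrete, here is a minimal sketch of Naive Bayes computed purely from counts. It is not the implementation from the talk. An in-memory `collections.Counter` stands in for HBase atomic counters, and the class name `CounterNaiveBayes`, the key layout `("label", ...)` / `("pair", ...)`, and the Laplace smoothing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


class CounterNaiveBayes:
    """Naive Bayes backed only by incrementable counters (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # Laplace smoothing constant (assumed)
        self.counts = Counter()     # stand-in for a distributed counter table
        self.values = {}            # feature name -> set of values seen so far

    def update(self, features, label):
        # One streamed observation only ever increments counters,
        # so updates can be applied in any order and from many writers.
        self.counts[("label", label)] += 1
        for name, value in features.items():
            self.counts[("pair", name, value, label)] += 1
            self.values.setdefault(name, set()).add(value)

    def labels(self):
        return [key[1] for key in self.counts if key[0] == "label"]

    def predict(self, features):
        labels = self.labels()
        total = sum(self.counts[("label", label)] for label in labels)
        best_label, best_score = None, -math.inf
        for label in labels:
            n_label = self.counts[("label", label)]
            # log P(label) + sum of log P(value | label), with smoothing
            score = math.log(n_label / total)
            for name, value in features.items():
                n_pair = self.counts[("pair", name, value, label)]
                n_values = len(self.values.get(name, ())) or 1
                score += math.log(
                    (n_pair + self.alpha) / (n_label + self.alpha * n_values)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream of (features, label) observations, invented for this example.
stream = [
    ({"weather": "sunny", "wind": "low"}, "play"),
    ({"weather": "sunny", "wind": "high"}, "play"),
    ({"weather": "rainy", "wind": "high"}, "stay"),
    ({"weather": "rainy", "wind": "low"}, "stay"),
]

model = CounterNaiveBayes()
for features, label in stream:
    model.update(features, label)
```

Because training touches nothing but increments, the model is always current with the stream, and prediction is a handful of counter reads. In a real HBase deployment the row-key and column layout of those counters would be a schema design decision of the kind the abstract mentions; the tuple keys above do not reflect any particular schema.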
Upload Details
Uploaded via SlideShare as Adobe PDF
Usage Rights
© All Rights Reserved