Flume is used simply to collect logs from multiple agents into a central place (HDFS). But in the end we still have a single log file that something (a raw log importer) then needs to process. HBase is not directly involved with Flume here; there is no HBase sink in this scenario.
Making use of Flume's ability to plug in different Sinks, instead of just collecting data into a log file on HDFS, we hook the FLUME-247 Sink up to Flume and have it write directly to HBase.
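FLUME-247 targeted the older Flume OG codebase, but the same idea (a pluggable HBase sink replacing the HDFS log file) can be sketched in later Flume NG-style configuration. The agent, source, channel, table, and column-family names below are assumptions for illustration, not taken from the talk:

```properties
# Minimal sketch of wiring a source to an HBase sink (Flume NG syntax).
# All names (tail1, ch1, hbase1, search_analytics, raw) are illustrative.
agent.sources = tail1
agent.channels = ch1
agent.sinks = hbase1

# Tail the search log as the event source (assumed log path)
agent.sources.tail1.type = exec
agent.sources.tail1.command = tail -F /var/log/search/queries.log
agent.sources.tail1.channels = ch1

agent.channels.ch1.type = memory

# Write events directly to HBase instead of an HDFS log file
agent.sinks.hbase1.type = hbase
agent.sinks.hbase1.channel = ch1
agent.sinks.hbase1.table = search_analytics
agent.sinks.hbase1.columnFamily = raw
```

The key point is the last stanza: swapping the sink type changes the destination without touching the agents producing the events.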
Test run: 2h, 2K actions/min, 1 system (240K actions, 43 MB of input data)
- 1193 MB - no prune, no compress
- 624 MB - prune sort index only, no compress
- 408 MB - prune, no compress
- 196 MB - no prune, compress
- 106 MB - prune sort index only, compress
- 64 MB - prune, compress
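The sizes above are easier to compare as ratios against the unprocessed baseline. A quick sketch (sizes in MB, taken directly from the slide):

```python
# Storage sizes (MB) for each prune/compress combination, from the test run
sizes = {
    "no prune, no compress": 1193,
    "prune sort index only, no compress": 624,
    "prune, no compress": 408,
    "no prune, compress": 196,
    "prune sort index only, compress": 106,
    "prune, compress": 64,
}

baseline = sizes["no prune, no compress"]
for name, mb in sizes.items():
    # Ratio relative to storing everything raw
    print(f"{name}: {mb} MB ({baseline / mb:.1f}x smaller)")
```

Pruning and compression combined shrink the data roughly 18-19x versus storing it raw, with compression alone accounting for about a 6x reduction.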
Transcript of "Search Analytics with Flume and HBase"
Search Analytics with Flume & HBase — Otis Gospodnetić, Sematext International
HBaseLog4JAppender Cons
- Doesn't help with reliable delivery
  - e.g. when the network or HBase is down
- Non-centralized config with larger clusters
  - e.g. changing the destination table in HBase