Talk on Apache Kudu, presented by Asim Jalis at SF Data Engineering Meetup on 2/23/2016.
Big Data applications need to ingest streaming data and analyze it. HBase is great at ingesting streaming data but not good at analytics. HDFS is great at analytics but not at ingesting streaming data. Frequently applications ingest data into HBase and then move it to HDFS for analytics. What if you could use a single system for both use cases?
What if you could use a single system for both use cases? This could dramatically simplify your data pipeline architecture.
This is where Kudu comes in. Kudu is a storage system that lives between HDFS and HBase. It is good at both ingesting streaming data and good at analyzing it using Spark, MapReduce, and SQL.