Brisk - Truly peer-to-peer hadoop. …
Brisk - Truly peer-to-peer hadoop.
Brisk is an open-source Hadoop & Hive distribution that uses Apache Cassandra for its core services and storage. Brisk makes it possible to run Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users leverage MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.
With CassandraFS all nodes are peers. Data files can be loaded through any node in the cluster and any node can serve as the JobTracker for MapReduce jobs. Hive MetaStore is stored & accessed as just another column family (table) on the distributed data store. Brisk makes Hadoop truly peer-to-peer.
We demonstrate visualisation & monitoring of Brisk using OpsCenter. The operational simplicity of cassandra’s multi-datacenter & multi-region aware replication makes Brisk well-suited for a rich set of Applications and usecases. And by being able to store and isolate hdfs & online data within the same data cluster, Brisk makes analytics possible without ETL!
LA Scalability Talk, Mahalo