This talk focuses on building a system from scratch, showing how to run analytical queries in near real time while still getting the benefits of Cassandra's high-performance database engine. The key topics of the talk are:
● The splendors and miseries of NoSQL
● Apache Cassandra use-cases
● Difficulties of using MapReduce directly in Cassandra
● Amazon cloud solutions: Elastic MapReduce and S3
● “Real-enough” time analysis
In particular, the talk dives into ways of handling different kinds of semi-ad-hoc queries when using Cassandra, and into the pitfalls of designing a schema around a specific analytics use case. Some attention will also be paid to time series data, which can present a real problem when using column-family or key-value stores.
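To make the time series problem concrete, here is a minimal sketch of one common way to handle it in a column-family store: splitting each source's data into fixed-width time buckets so that no single row grows without bound. This is an illustration only, not necessarily the schema presented in the talk. The one-hour bucket width, the `source:timestamp` key format, and the names `bucket_start`, `row_key`, and `row_keys_for_range` are all assumptions made for this example, and no Cassandra client is involved.

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket width; a real system would tune this to its write rate.
BUCKET = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // BUCKET) * BUCKET


def row_key(source: str, ts: datetime) -> str:
    """Hypothetical row key: one row per source per time bucket."""
    return f"{source}:{bucket_start(ts).strftime('%Y%m%dT%H%M')}"


def row_keys_for_range(source: str, start: datetime, end: datetime) -> list[str]:
    """List every row key that a query over [start, end] has to read."""
    keys = []
    current = bucket_start(start)
    while current <= end:
        keys.append(row_key(source, current))
        current += BUCKET
    return keys
```

With this layout a write touches exactly one row, while a range query has to enumerate and read every bucket it overlaps, so the bucket width trades row size against the number of reads per query.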