OLAP stands for online analytical processing. It focuses on reporting and, more broadly, on answering schema-oriented queries quickly. Example queries are “how many distinct infections were seen for a threat in a given month” or “what is the longest duration for which a particular infection was seen in my enterprise last month”.
Contrast this with OLTP, or online transaction processing, where storing a fast stream of transactions matters more.
When we talk about OLAP, the star schema is the first thing that comes to mind; it is a central concept in relational OLAP. Modeling OLAP data as a star schema means separating the data into fact and dimension tables. The central fact table references a couple of dimensions, which together identify a fact, and carries one or more measures that we want to calculate. A measure is often a derived field and can be computed with SQL GROUP BY clauses and aggregate functions.
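To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_threat, dim_month, fact_infection), the column names, and the sample rows are all assumptions made up for illustration; they are not the schema used in the system described here. The query derives two measures, a distinct infection count and a maximum duration, with GROUP BY and aggregate functions.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_threat (threat_id INTEGER PRIMARY KEY, threat_name TEXT);
        CREATE TABLE dim_month  (month_id  INTEGER PRIMARY KEY, label TEXT);
        -- Fact table: foreign keys to the dimensions plus raw values.
        CREATE TABLE fact_infection (
            infection_id   INTEGER,
            threat_id      INTEGER REFERENCES dim_threat(threat_id),
            month_id       INTEGER REFERENCES dim_month(month_id),
            duration_hours INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_threat VALUES (?, ?)",
                     [(1, "trojan_a"), (2, "worm_b")])
    conn.executemany("INSERT INTO dim_month VALUES (?, ?)",
                     [(201601, "2016-01"), (201602, "2016-02")])
    conn.executemany("INSERT INTO fact_infection VALUES (?, ?, ?, ?)", [
        (100, 1, 201601, 5),
        (100, 1, 201601, 7),  # the same infection observed a second time
        (101, 1, 201601, 2),
        (102, 2, 201601, 9),
        (103, 1, 201602, 4),
    ])
    return conn


def infections_per_threat_month(conn):
    """Derive the measures per (threat, month) with GROUP BY and aggregates."""
    return conn.execute("""
        SELECT t.threat_name,
               m.label,
               COUNT(DISTINCT f.infection_id) AS distinct_infections,
               MAX(f.duration_hours)          AS max_duration
        FROM fact_infection f
        JOIN dim_threat t ON t.threat_id = f.threat_id
        JOIN dim_month  m ON m.month_id  = f.month_id
        GROUP BY t.threat_name, m.label
        ORDER BY t.threat_name, m.label
    """).fetchall()


if __name__ == "__main__":
    for row in infections_per_threat_month(build_star_schema()):
        print(row)
```

The two aggregates correspond to the two example questions: distinct infections per threat per month, and the longest observed duration.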
We use Spark and HBase to implement a hybrid OLAP system. We call it hybrid because we store data in both relational (ROLAP) and multi-dimensional (MOLAP) formats.
MOLAP materialization is best visualized as a lattice. Each circular point in the diagram is called a tile or cuboid. Each tile can be thought of as the equivalent of a GROUP BY clause in SQL; aggregates such as SUM or COUNT are implicit and not shown in the diagram. Reading the lattice from bottom to top, each level drops one of the three fields (Infection_type, country, monthId); the 2-D cuboids, for example, each drop one field. This is called roll-up. Conversely, starting from the top, i.e. the 0-D cuboid, and moving downwards, each level groups by one more field; this is called drill-down. There is extensive literature on how to do roll-up and drill-down efficiently and on which cuboids to materialize. I strongly recommend Han and Kamber's Data Mining book and the lattice paper by Harinarayan et al. for a deep understanding of this domain.
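The lattice can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Spark/HBase implementation: the function names (enumerate_cuboids, materialize, roll_up), the lowercase field name infection_type, and the "count" measure column are assumptions. Each cuboid is a subset of the three fields, so there are 2^3 = 8 of them, and a roll-up derives a coarser cuboid from a finer one by dropping a field and re-aggregating.

```python
from collections import defaultdict
from itertools import combinations

# Field names follow the lattice description; "count" is an assumed measure.
DIMENSIONS = ("infection_type", "country", "monthId")


def enumerate_cuboids(dimensions):
    """List every cuboid, from the 3-D base cuboid down to the 0-D apex."""
    cuboids = []
    for k in range(len(dimensions), -1, -1):
        cuboids.extend(combinations(dimensions, k))
    return cuboids


def materialize(rows, cuboid, measure="count"):
    """Equivalent of GROUP BY <cuboid fields> with SUM(measure)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in cuboid)
        totals[key] += row[measure]
    return dict(totals)


def roll_up(cuboid_data, cuboid, dropped):
    """Derive a coarser cuboid by dropping one field and summing again."""
    keep = [i for i, d in enumerate(cuboid) if d != dropped]
    totals = defaultdict(int)
    for key, value in cuboid_data.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"infection_type": "trojan", "country": "US", "monthId": 1, "count": 3},
        {"infection_type": "trojan", "country": "IN", "monthId": 1, "count": 2},
        {"infection_type": "worm", "country": "US", "monthId": 2, "count": 4},
    ]
    base = materialize(rows, DIMENSIONS)
    print(len(enumerate_cuboids(DIMENSIONS)), "cuboids")
    print(roll_up(base, DIMENSIONS, "monthId"))
```

Rolling up from an already materialized cuboid, as roll_up does, is valid for aggregates such as SUM and COUNT. A distinct count cannot be obtained by summing finer-grained distinct counts, so it would have to be recomputed from the underlying rows or tracked with a different structure.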