What is it?
• System to query very large amounts of data.
• Used by the entire Inmobi Sales and Analytics teams.
• Also used by multiple other Inmobi projects to extract data.
• Built entirely on top of the Hadoop system.
• Highly optimized for storage and efficiency.

What is it?
• Ingests close to 2.5B events per day.
• A record consists of events belonging to different time-shifted streams.
• Provides a unified (joined) view of the events.
• Queries are asynchronous, and the goal is to execute most queries in under 2 minutes.

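The unified view can be pictured as one record per join key, filled in by whichever streams have delivered an event for that key. The sketch below is only an illustration: the stream names (`request`, `click`), the `request_id` join key, and the `unify` helper are all hypothetical and do not come from the system itself.

```python
from collections import defaultdict


def unify(streams):
    """Merge events from several streams into one record per join key.

    `streams` maps a stream name to a list of event dicts. Every event is
    assumed (for this sketch) to carry a `request_id` field.
    """
    records = defaultdict(dict)
    for stream_name, events in streams.items():
        for event in events:
            # Each stream contributes its own event to the shared record.
            records[event["request_id"]][stream_name] = event
    return dict(records)


# Hypothetical time-shifted streams: the click for r1 arrives well after
# the request that produced it, and r2 has no click at all.
streams = {
    "request": [{"request_id": "r1", "ts": 100}, {"request_id": "r2", "ts": 105}],
    "click": [{"request_id": "r1", "ts": 160}],
}
unified = unify(streams)
```

Here `unified["r1"]` holds both the request and the click, while `unified["r2"]` holds only the request.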
Supported Features
• Multiple aggregates (SUM, MAX, MIN, DISTINCT, COUNTDISTINCT etc.).
• Custom formula expressions.
• Powerful expression-based filters. Integrated with the JEP expression library.
• Decode, Truncate etc.
• Top, Having.
• UDF.

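To show how these features combine in one query, here is a small sketch of a filter, grouped aggregates, a custom formula, Having, and Top. It is not the system's query syntax: the `run_query` function, the column names, and the use of Python lambdas in place of JEP expressions are assumptions made for illustration.

```python
from collections import defaultdict


def run_query(rows, group_by, where, having, top):
    """Filter rows, aggregate per group, apply HAVING, then keep the TOP groups."""
    groups = defaultdict(lambda: {"impressions": 0, "clicks": 0, "users": set()})
    for row in rows:
        if not where(row):  # expression-based filter
            continue
        g = groups[row[group_by]]
        g["impressions"] += row["impressions"]  # SUM
        g["clicks"] += row["clicks"]            # SUM
        g["users"].add(row["user"])             # feeds COUNTDISTINCT

    results = []
    for key, g in groups.items():
        out = {
            group_by: key,
            "sum_impressions": g["impressions"],
            "sum_clicks": g["clicks"],
            "distinct_users": len(g["users"]),
            # Custom formula over aggregates, guarded against division by zero.
            "ctr": g["clicks"] / g["impressions"] if g["impressions"] else 0.0,
        }
        if having(out):  # HAVING runs on aggregated values
            results.append(out)

    # TOP: keep the groups with the most clicks.
    results.sort(key=lambda r: r["sum_clicks"], reverse=True)
    return results[:top]


# Hypothetical input rows.
rows = [
    {"country": "IN", "user": "u1", "impressions": 100, "clicks": 5},
    {"country": "IN", "user": "u2", "impressions": 300, "clicks": 15},
    {"country": "US", "user": "u1", "impressions": 200, "clicks": 2},
    {"country": "US", "user": "u1", "impressions": 0, "clicks": 0},
    {"country": "JP", "user": "u3", "impressions": 50, "clicks": 1},
]
result = run_query(
    rows,
    group_by="country",
    where=lambda r: r["impressions"] > 0,
    having=lambda r: r["sum_impressions"] >= 100,
    top=2,
)
```

In this example the zero-impression row is dropped by the filter, and the JP group is dropped by Having because it has fewer than 100 impressions.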
Why not Hive, Pig etc.?
• We do not need a very generic and complex system for our requirements. We have focused more on speed, resource optimization, and what our analysts need.
• Single job: most user queries can be modeled as a single MR job. Hive and Pig launch job chains that process the data multiple times; we pack everything into one job.
• Pre-process fact-fact joins and have smart metadata joins on the map side (can join with tens of tables running up to 500 MB in size).
• Optimized query planning/execution based on our data model.
• Flexible: we can quickly identify and fix problems or add new features.
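The single-job idea with a map-side metadata join can be sketched in plain Python as follows. This simulates the map, shuffle, and reduce phases in one process and is not Hadoop code; the `SITE_METADATA` table, the field names, and the `run_single_job` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical small metadata table. In a map-side join, a table like this
# is loaded into memory by every mapper, so no separate join job is needed.
SITE_METADATA = {
    "s1": {"publisher": "pubA"},
    "s2": {"publisher": "pubB"},
    "s3": {"publisher": "pubA"},
}


def mapper(event, metadata):
    """Join one fact event with the in-memory metadata and emit (key, value)."""
    site = metadata.get(event["site_id"])
    if site is None:
        return  # no matching metadata row: drop the event (inner join)
    yield site["publisher"], event["revenue"]


def reducer(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)


def run_single_job(events, metadata):
    """Run map, shuffle, and reduce once; join and aggregation share one pass."""
    shuffled = defaultdict(list)
    for event in events:
        for key, value in mapper(event, metadata):
            shuffled[key].append(value)
    return dict(reducer(key, values) for key, values in shuffled.items())


# Hypothetical fact events; site s9 has no metadata row.
events = [
    {"site_id": "s1", "revenue": 10},
    {"site_id": "s2", "revenue": 5},
    {"site_id": "s3", "revenue": 7},
    {"site_id": "s9", "revenue": 100},
]
totals = run_single_job(events, SITE_METADATA)
```

Because the join happens inside the mapper, the fact data is read once, which is the contrast with a job chain that would first join and then aggregate in separate passes.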