Grill
Unified analytics platform
Amareshwari Sriramadasu
• Architect in Data platform team at Inmobi
• Working in Hadoop and ecosystems since 2007
• Apache Hadoop – PMC
• Apache Hive – Committer
• Worked with Yahoo! earlier
Analytics at Inmobi - Problem areas and Motivation
Introduction to Grill
OLAP Model
Query examples
Grill design
Agenda
Digital advertising at Inmobi
Courtesy: http://www.liesdamnedlies.com/
Owns & sells real estate on digital inventory
Has reach to users
Wants to target users
Brings money
Market place
Consumer
Analytics Use cases
• Understanding Trends & Inference
• Forecasting and Anomaly detection
Data scientists
• Feedback to improve Ad Relevance in Real Time
Engineering systems
• Troubleshooting of issues
Developers
• Publisher/Advertiser specific analytics (dashboards)
Advertisers and publishers
• Tracking specific accounts
Account managers
• Inventory sizing and estimation
Business/Product analysts
• Canned/Dashboard queries
• Adhoc queries
• Interactive/Batch queries
• Scheduled queries
• Infer insights through ML algorithms
Analytics use cases
Analytics Warehouses at Inmobi and Scale
• Billions of Ad Requests/Impressions per day
• 170 TB Hadoop Warehouse
• 5 TB SQL Columnar Datawarehouse
• 70 TB HBase cluster
Why both Hadoop and SQL warehouse?
[Chart: canned and adhoc queries placed by Response Times against IO (Input Records), per query engine]
• Dashboard queries are mostly canned and interactive
• Adhoc queries can be interactive or batch, depending on the data volumes and query complexity
• Disparate user experience
• Disparate data storage and execution engines
• Schema management across storages
• Not leveraging ‘SQL on Hadoop’ community
Problems
GRILL
Analytics As Service
Unify the Catalog and Query layer for Adhoc/Canned, Batch/Interactive Reports on a single interface
Grill Architecture
Data Layout – Fact data
Dim-cuts, Measures
Cost
Data Layout – Dimension data
…
Subset_m (a_m < a_(m-1))
…
Subset_2 (a_2 < a_1)
Subset_1 (a_1 < a_x)
All attributes (a_x)
Cost
Data Layout – Snowflake
Aggr Fact_k: measures (ma_k <= ma_(k-1)), dimensions (da_k < da_(k-1))
…
Aggr Fact_2: measures (ma_2 <= ma_1), dimensions (da_2 < da_1)
Aggr Fact_1: measures (ma_1 <= m_r), dimensions (da_1 < d_r)
Raw Fact: measures (m_r), dimensions (d_r)
Dim2_1
Dim3
Dim2
Dim4_1
Dim4
Dim1
Apache Hive to the rescue
What does Hive provide
• Associates structure to data
• Provides Metastore and catalog service – HCatalog
• Provides pluggable storage interface
• Accepts SQL-like queries
• HQL is a widely adopted language, used by systems like Shark and Impala
• Has a strong Apache community
What is missing in Hive
• Data warehouse features like cubes, facts, dimensions
• Logical table associated with multiple physical storages
• Pluggable execution engine
• Query lifecycle management
• Query quota management
• Scheduling queries
Data Model
Cube
Fact Table
• Physical Fact tables
Dimension
Dimension Table
• Physical Dimension tables
Storage
Data Model - Cube
Cube: Measures, Dim-attributes, Expressions
Expressions
• Any expression with reachable fields
Dim-attributes
• Simple dim-attribute
• Referenced dim-attribute
• Hierarchical dim-attribute
• Timed dim-attribute
Measure
• Column Measure
Note: Some of the concepts are borrowed from http://community.pentaho.com/projects/mondrian/
Data Model - Dimension
Dimension: Attributes, Expressions
Expressions
• Any expression with reachable fields
Attributes
• Simple dim-attribute
• Referenced dim-attribute
• Hierarchical dim-attribute
Data Model - Relationships
[Diagram: a Cube references Dimensions; a Fact table belongs to a Cube and is present on Storages; a Dimension table belongs to a Dimension and is present on Storages]
Data Model
Aggr_k Fact Table
…
Aggr_2 Fact Table
Aggr_1 Fact Table
Raw Fact Table
Dim2_table1
Dim2_table2
Dim3_table1
Dim2_table2
Dimtable1
Dimtable2
CUBE
Dimension
Dimension
Dimension
CUBE SELECT [DISTINCT] select_expr, select_expr, ...
FROM cube_table_reference
WHERE [where_condition AND] TIME_RANGE_IN(colName , from, to)
[GROUP BY col_list]
[HAVING having_expr]
[ORDER BY colList]
[LIMIT number]
Queries on OLAP cubes
Example query
cube select name, stateid from city limit 100
Rewritten for each candidate storage table:
• SELECT ( city.name ), ( city.stateid ) FROM c2_citytable city LIMIT 100
• SELECT ( city.name ), ( city.stateid ) FROM c1_citytable city WHERE (city.dt = 'latest') LIMIT 100
Example query
cube select city.name, msr2 from testcube where time_range_in(dt, '2014-03-10-03', '2014-03-12-03')
Rewritten as:
• SELECT (city.name), sum((testcube.msr2)) FROM c2_testfact testcube
INNER JOIN c1_citytable city ON ((testcube.cityid) = (city.id))
WHERE ((testcube.dt='2014-03-10-03') OR (testcube.dt='2014-03-10-04') OR (testcube.dt='2014-03-10-05') OR
(testcube.dt='2014-03-10-06') OR (testcube.dt='2014-03-10-07') OR (testcube.dt='2014-03-10-08') OR
(testcube.dt='2014-03-10-09') OR (testcube.dt='2014-03-10-10') OR (testcube.dt='2014-03-10-11') OR
(testcube.dt='2014-03-10-12') OR (testcube.dt='2014-03-10-13') OR (testcube.dt='2014-03-10-14') OR
(testcube.dt='2014-03-10-15') OR (testcube.dt='2014-03-10-16') OR (testcube.dt='2014-03-10-17') OR
(testcube.dt='2014-03-10-18') OR (testcube.dt='2014-03-10-19') OR (testcube.dt='2014-03-10-20') OR
(testcube.dt='2014-03-10-21') OR (testcube.dt='2014-03-10-22') OR (testcube.dt='2014-03-10-23') OR
(testcube.dt='2014-03-11') OR (testcube.dt='2014-03-12-00') OR (testcube.dt='2014-03-12-01') OR
(testcube.dt='2014-03-12-02')) AND (city.dt = 'latest')
GROUP BY (city.name)
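The rewritten query covers the requested range with hourly partitions at the edges and a single daily partition for the full day in between. A minimal Python sketch of how such a cover could be computed is given here. The function name `cover_time_range`, the half-open range convention, and the partition label formats are assumptions made for illustration; this is not Grill's actual implementation.

```python
from datetime import datetime, timedelta

HOUR_FMT = "%Y-%m-%d-%H"  # hourly partition label, as seen in the example query
DAY_FMT = "%Y-%m-%d"      # daily partition label


def cover_time_range(start, end):
    """Cover [start, end) with daily partitions wherever a whole calendar day
    fits, and hourly partitions elsewhere. Hypothetical helper, not Grill's API."""
    cur = datetime.strptime(start, HOUR_FMT)
    stop = datetime.strptime(end, HOUR_FMT)
    parts = []
    while cur < stop:
        next_day = cur + timedelta(days=1)
        if cur.hour == 0 and next_day <= stop:
            # A full day lies inside the range: one daily partition suffices.
            parts.append(cur.strftime(DAY_FMT))
            cur = next_day
        else:
            parts.append(cur.strftime(HOUR_FMT))
            cur += timedelta(hours=1)
    return parts


parts = cover_time_range("2014-03-10-03", "2014-03-12-03")
# Build the partition filter in the same shape as the rewritten query.
where_clause = " OR ".join("(testcube.dt='%s')" % p for p in parts)
```

For the range in the example this yields 25 partitions: 21 hourly ones for 2014-03-10, the daily partition 2014-03-11, and three hourly ones for 2014-03-12, which matches the predicate list in the rewritten query.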
Implements an interface
• explain
• execute
• executeAsynchronously
• fetchResults
• Specify all storages it can support
Pluggable execution engine
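The interface listed on this slide can be pictured roughly as follows. Grill itself is a Java project; this sketch uses Python only to stay compact, and the method signatures, return types, and the toy `EchoEngine` class are assumptions for illustration rather than Grill's real driver API.

```python
from abc import ABC, abstractmethod
import uuid


class ExecutionEngine(ABC):
    """Hypothetical rendering of the pluggable engine interface."""

    @abstractmethod
    def supported_storages(self):
        """Return the set of storage names this engine can query."""

    @abstractmethod
    def explain(self, query):
        """Return a plan description, including an estimated cost."""

    @abstractmethod
    def execute(self, query):
        """Run the query and block until rows are available."""

    @abstractmethod
    def execute_asynchronously(self, query):
        """Submit the query and return a handle immediately."""

    @abstractmethod
    def fetch_results(self, handle):
        """Return the rows for a previously submitted query."""


class EchoEngine(ExecutionEngine):
    """Toy engine that echoes the query back; stands in for a real driver."""

    def __init__(self, storages):
        self._storages = set(storages)
        self._results = {}

    def supported_storages(self):
        return set(self._storages)

    def explain(self, query):
        # Made-up cost: the query length, only so that there is a number.
        return {"plan": "echo " + query, "cost": float(len(query))}

    def execute(self, query):
        return [("echo", query)]

    def execute_asynchronously(self, query):
        handle = str(uuid.uuid4())
        self._results[handle] = self.execute(query)
        return handle

    def fetch_results(self, handle):
        return self._results.pop(handle)


engine = EchoEngine(["c1"])
handle = engine.execute_asynchronously("select 1")
rows = engine.fetch_results(handle)
```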
OLAP Cube QL query
Rewrite query for available execution engine’s supported storages
Get cost of the rewritten query from each execution engine
Pick up execution engine with least cost and fire the query
Cube query with multiple execution engines
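The selection flow on this slide can be sketched as below. The `Engine` record, the `pick_engine` function, and the cost numbers are hypothetical; the sketch also assumes that costs reported by different engines are directly comparable, which the roadmap item "Normalize query cost" suggests was still open.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Engine:
    name: str
    storages: FrozenSet[str]               # storages this engine supports
    estimate_cost: Callable[[str], float]  # cost of a rewritten query


def pick_engine(rewritten_by_storage, engines):
    """Return (engine, query) with the least estimated cost.

    rewritten_by_storage maps a storage name to the cube query already
    rewritten against that storage's tables.
    """
    best = None
    for engine in engines:
        for storage, query in rewritten_by_storage.items():
            if storage not in engine.storages:
                continue  # this engine cannot read that storage
            cost = engine.estimate_cost(query)
            if best is None or cost < best[0]:
                best = (cost, engine, query)
    if best is None:
        raise ValueError("no execution engine supports the candidate storages")
    return best[1], best[2]


# Illustrative engines with made-up constant costs.
hive = Engine("hive", frozenset({"c1"}), lambda q: 100.0)
jdbc = Engine("jdbc", frozenset({"c2"}), lambda q: 5.0)

rewritten = {
    "c1": "SELECT city.name, city.stateid FROM c1_citytable city "
          "WHERE city.dt = 'latest' LIMIT 100",
    "c2": "SELECT city.name, city.stateid FROM c2_citytable city LIMIT 100",
}
chosen_engine, chosen_query = pick_engine(rewritten, [hive, jdbc])
```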
Grill Server
Grill – current state
Server
Query Service
Metastore Service
Metrics
Query statistics (In progress)
Scheduled queries (In progress)
Query caching (In progress)
Client
Java Client
CLI
JDBC Client
Execution Engine
Hive Driver
JDBC Driver
• Normalize query cost
• Load balancing across execution engines
• Alter meta hooks in StorageHandler
• Authentication and authorization
• Machine learning through Grill
• Query quota management
Grill roadmap
Github source for grill
• https://github.com/InMobi/grill
Github source for Hive
• https://github.com/InMobi/hive
Documentation
• http://inmobi.github.io/grill
Mailing lists
• grill-users@googlegroups.com
• grill-dev@googlegroups.com
References
Thank you!
• amareshwari@apache.org
Backup
Data Model – Storage
Storage
• Name
• End point
• Properties
• Ex: ProdCluster, StagingCluster, Postgres1, HBase1, HBase2
Data Model – Fact Table
FactTable
• Columns
• Cube to which it belongs
• Storages on which it is present and the associated update periods
Data Model – Dimension table
DimensionTable
• Columns
• Dimension to which it belongs
• Storages on which it is present and associated snapshot dump period, if any
Data Model – Storage tables and partitions
Storage table
• Belongs to fact/dimension
• Associated storage descriptor
• Partitioned by columns
• Naming convention – storage name followed by fact/dimension name
• Partition can override its storage descriptor
• Fact storage table
Fact table
• Dimension storage table
Dimension table
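The naming convention can be illustrated with a short Python sketch. The underscore separator is inferred from the table names in the example queries (`c1_citytable`, `c2_testfact`); the function names are hypothetical and not part of Grill.

```python
def storage_table_name(storage, table):
    """Storage name followed by the fact/dimension table name."""
    return "%s_%s" % (storage, table)


def candidate_storage_tables(table, storages):
    """All physical tables that may hold a logical fact/dimension table."""
    return [storage_table_name(s, table) for s in storages]


city_tables = candidate_storage_tables("citytable", ["c1", "c2"])
```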
Resolve candidate tables and storages
Automatically resolve joins, aggregations
Allows SQL over Cube QL
Queries can span multiple storages
Accepts multiple time ranges in query
All Hive QL features
Query features
• Adhoc querying system
• Internal Dashboards
• Customer facing Dashboards and Reporting
Analytics systems at Inmobi

PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 

Recently uploaded (20)

Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 

Fifth elephant-grill

  • 2. Amareshwari Sriramadasu • Architect in the Data platform team at Inmobi • Working on Hadoop and its ecosystem since 2007 • Apache Hadoop – PMC • Apache Hive – Committer • Worked with Yahoo! earlier
  • 3. Agenda • Analytics at Inmobi: problem areas and motivation • Introduction to Grill • OLAP Model • Query examples • Grill design
  • 4. Digital advertising at Inmobi (diagram, courtesy: http://www.liesdamnedlies.com/). Diagram labels: owns and sells real estate on digital inventory; has reach to users; wants to target users; brings money; marketplace; consumer.
  • 5. Analytics use cases • Data scientists: understanding trends and inference; forecasting and anomaly detection • Engineering systems: feedback to improve ad relevance in real time • Developers: troubleshooting of issues • Advertisers and publishers: publisher/advertiser-specific analytics (dashboards) • Account managers: tracking specific accounts • Business/product analysts: inventory sizing and estimation
  • 6. Analytics use cases • Canned/dashboard queries • Ad hoc queries • Interactive/batch queries • Scheduled queries • Infer insights through ML algorithms
  • 7. Analytics warehouses at Inmobi and scale • Billions of ad requests/impressions per day • 170 TB Hadoop warehouse • 5 TB SQL columnar data warehouse • 70 TB HBase cluster
  • 8. Why both Hadoop and SQL warehouse? (Chart: response times against IO (input records) for canned and ad hoc queries, by query engine.) Dashboard queries are mostly canned and interactive. Ad hoc queries can be interactive or batch, depending on the data volumes and query complexity.
  • 9. Problems • Disparate user experience • Disparate data storage and execution engines • Schema management across storages • Not leveraging the ‘SQL on Hadoop’ community
  • 10. Agenda • Analytics at Inmobi: problem areas and motivation • Introduction to Grill • OLAP Model • Query examples • Grill design
  • 11. GRILL: Analytics as a Service. Unify the catalog and query layer for ad hoc/canned and batch/interactive reports on a single interface.
  • 13. Data Layout – Fact data (diagram). Labels: dim-cuts, measures; cost.
  • 14. Data Layout – Dimension data (pyramid diagram, with a cost axis). Levels as listed: …; Subset m (a_m < a_(m-1)); …; Subset 2 (a_2 < a_1); Subset 1 (a_1 < a_x); All attributes (a_x).
  • 15. Data Layout – Snowflake (diagram). Fact levels as listed: Aggr Fact k: measures (m_ak <= m_a(k-1)), dimensions (d_ak < d_a(k-1)); …; Aggr Fact 2: measures (m_a2 <= m_a1), dimensions (d_a2 < d_a1); Aggr Fact 1: measures (m_a1 <= m_r), dimensions (d_a1 < d_r); Raw Fact: measures (m_r), dimensions (d_r). Dimension tables: Dim1, Dim2, Dim2_1, Dim3, Dim4, Dim4_1.
  • 16. Apache Hive to the rescue. What does Hive provide: associates structure to data; provides a metastore and catalog service (HCatalog); provides a pluggable storage interface; accepts SQL-like queries; HQL is a widely adopted language, used by systems like Shark and Impala; has a strong Apache community. What is missing in Hive: data warehouse features like cubes, facts and dimensions; a logical table associated with multiple physical storages; a pluggable execution engine; query lifecycle management; query quota management; scheduling of queries.
  • 17. Agenda • Analytics at Inmobi: problem areas and motivation • Introduction to Grill • OLAP Model • Query examples • Grill design
  • 18. Data Model • Cube • Fact table: physical fact tables • Dimension • Dimension table: physical dimension tables • Storage
  • 19. Data Model – Cube. A cube has measures, dim-attributes and expressions. • Measures: column measure • Dim-attributes: simple dim-attribute, referenced dim-attribute, hierarchical dim-attribute, timed dim-attribute • Expressions: any expression with reachable fields. Note: some of the concepts are borrowed from http://community.pentaho.com/projects/mondrian/
  • 20. Data Model – Dimension. A dimension has attributes and expressions. • Attributes: simple dim-attribute, referenced dim-attribute, hierarchical dim-attribute • Expressions: any expression with reachable fields
  • 21. Data Model – Relationships (diagram). Entities shown: cube and dimension; fact table, linked to cube and to storage; dimension table, linked to dimension and to storage.
  • 22. Data Model (diagram). CUBE: Aggr k fact table, …, Aggr 2 fact table, Aggr 1 fact table, raw fact table. Dimensions and their tables as labelled: Dimtable1, Dimtable2; Dim2_table1, Dim2_table2; Dim3_table1, Dim2_table2.
  • 23. Agenda • Analytics at Inmobi: problem areas and motivation • Introduction to Grill • OLAP Model • Query examples • Grill design
  • 24. Queries on OLAP cubes: CUBE SELECT [DISTINCT] select_expr, select_expr, ... FROM cube_table_reference WHERE [where_condition AND] TIME_RANGE_IN(colName, from, to) [GROUP BY col_list] [HAVING having_expr] [ORDER BY colList] [LIMIT number]
  • 25. Example query: cube select name, stateid from city limit 100. Rewritten, depending on the storage, as: • SELECT (city.name), (city.stateid) FROM c2_citytable city LIMIT 100 • SELECT (city.name), (city.stateid) FROM c1_citytable city WHERE (city.dt = 'latest') LIMIT 100
  • 26. Example query: cube select city.name, msr2 from testcube where timerange_in(dt, '2014-03-10-03', '2014-03-12-03'). Rewritten as: SELECT (city.name), sum((testcube.msr2)) FROM c2_testfact testcube INNER JOIN c1_citytable city ON ((testcube.cityid) = (city.id)) WHERE ((testcube.dt='2014-03-10-03') OR (testcube.dt='2014-03-10-04') OR (testcube.dt='2014-03-10-05') OR (testcube.dt='2014-03-10-06') OR (testcube.dt='2014-03-10-07') OR (testcube.dt='2014-03-10-08') OR (testcube.dt='2014-03-10-09') OR (testcube.dt='2014-03-10-10') OR (testcube.dt='2014-03-10-11') OR (testcube.dt='2014-03-10-12') OR (testcube.dt='2014-03-10-13') OR (testcube.dt='2014-03-10-14') OR (testcube.dt='2014-03-10-15') OR (testcube.dt='2014-03-10-16') OR (testcube.dt='2014-03-10-17') OR (testcube.dt='2014-03-10-18') OR (testcube.dt='2014-03-10-19') OR (testcube.dt='2014-03-10-20') OR (testcube.dt='2014-03-10-21') OR (testcube.dt='2014-03-10-22') OR (testcube.dt='2014-03-10-23') OR (testcube.dt='2014-03-11') OR (testcube.dt='2014-03-12-00') OR (testcube.dt='2014-03-12-01') OR (testcube.dt='2014-03-12-02')) AND (city.dt = 'latest') GROUP BY (city.name)
  • 27. Agenda • Analytics at Inmobi: problem areas and motivation • Introduction to Grill • OLAP Model • Query examples • Grill design
  • 28. Pluggable execution engine • Implements an interface: explain, execute, executeAsynchronously, fetchResults • Specifies all storages it can support
  • 29. Cube query with multiple execution engines: (1) an OLAP Cube QL query arrives; (2) the query is rewritten for the storages supported by each available execution engine; (3) the cost of the rewritten query is obtained from each execution engine; (4) the execution engine with the least cost is picked and the query is fired.
  • 31. Grill – current state • Server: query service, metastore service, metrics, query statistics (in progress), scheduled queries (in progress), query caching (in progress) • Client: Java client, CLI, JDBC client • Execution engine: Hive driver, JDBC driver
  • 32. Grill roadmap • Normalize query cost • Load balancing across execution engines • Alter meta hooks in StorageHandler • Authentication and authorization • Machine learning through Grill • Query quota management
  • 33. References • GitHub source for Grill: https://github.com/InMobi/grill • GitHub source for Hive: https://github.com/InMobi/hive • Documentation: http://inmobi.github.io/grill • Mailing lists: grill-users@googlegroups.com, grill-dev@googlegroups.com
  • 36. Data Model – Storage • Name • End point • Properties • Examples: ProdCluster, StagingCluster, Postgres1, HBase1, HBase2
  • 37. Data Model – Fact table (diagram: fact table linked to cube and to storage) • Columns • Cube to which it belongs • Storages on which it is present, and the associated update periods
  • 38. Data Model – Dimension table (diagram: dimension table linked to dimension and to storage) • Columns • Dimension to which it belongs • Storages on which it is present, and the associated snapshot dump period, if any
  • 39. Data Model – Storage tables and partitions. A storage table: • Belongs to a fact or a dimension • Has an associated storage descriptor • Is partitioned by columns • Follows a naming convention: storage name followed by fact/dimension name • A partition can override its storage descriptor • A fact storage table belongs to a fact table • A dimension storage table belongs to a dimension table
  • 40. Query features • Resolves candidate tables and storages • Automatically resolves joins and aggregations • Allows SQL over Cube QL • Queries can span multiple storages • Accepts multiple time ranges in a query • All Hive QL features
  • 41. Analytics systems at Inmobi • Ad hoc querying system • Internal dashboards • Customer-facing dashboards and reporting
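
Slides 28 and 29 describe a pluggable execution engine and a cost-based choice between engines. The flow can be sketched roughly as below. This is a minimal illustration and not Grill's actual API: the names `Engine`, `pick_engine`, `cost_fn` and the `rewrite` callback are hypothetical, and the cost is assumed to be a single comparable number returned by an `explain`-style call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Engine:
    """Hypothetical stand-in for a pluggable execution engine (slide 28)."""
    name: str
    storages: frozenset                # storages this engine declares it supports
    cost_fn: Callable[[str], float]    # assumed cost model, stands in for explain()

    def explain(self, query: str) -> float:
        # Assumption: explain() yields one comparable cost number.
        return self.cost_fn(query)


def pick_engine(
    engines: Iterable[Engine],
    rewrite: Callable[[str, frozenset], Optional[str]],
    cube_query: str,
) -> Optional[Tuple[Engine, str, float]]:
    """Rewrite the cube query per engine, cost it, and return the cheapest option.

    `rewrite` is a hypothetical callback that turns a cube query into a query
    over the given storages, or returns None if those storages cannot answer it.
    """
    best = None
    for engine in engines:
        rewritten = rewrite(cube_query, engine.storages)
        if rewritten is None:
            continue  # this engine has no storage that can answer the query
        cost = engine.explain(rewritten)
        if best is None or cost < best[2]:
            best = (engine, rewritten, cost)
    return best
```

The caller would then fire the returned rewritten query on the returned engine. Returning `None` when no engine qualifies leaves the error handling to the caller, which is a design choice of this sketch rather than something the slides specify.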

Editor's Notes

  1. Inmobi provides a marketplace: it buys space on mobile from publishers and sells it to advertisers, while acquiring users.
  2. Inmobi has a 130 TB Hadoop warehouse and a 5 TB SQL warehouse. Let us look at an example of a reporting page. This is the dashboard a publisher sees.
  3. Conventional columnar database (RDBMS) systems lend themselves well to interactive SQL queries over reasonably small datasets, on the order of tens to hundreds of GB, while Hadoop-based warehouses operate well over large datasets, on the order of TBs and PBs, and scale fairly linearly. Although storage structures in Hadoop warehouses have improved recently (ORC, for example), queries over Hadoop still typically adopt a full-scan approach. Choosing between these data stores based on cost of storage, concurrency, scalability and performance is fairly complex and not easy for most users.
  4. Individually, all the systems we just saw work really well and give the best response times to user queries. The problems: disparate user experience because of multiple reporting systems; a learning curve for each system and its API; disparate data storage systems, causing inability to scale; altering a schema involves different systems; data discovery; data in other systems cannot be leveraged; the surrounding community is not leveraged; a new storage or execution engine cannot be tried out of the box.
  5. Column measure: name, type, default aggregate, format string, start date, end date. Expression measure: associated expression. Simple dimension: name, type, start date, end date. Referenced dimension: referencing table and column. Hierarchical dimension: hierarchy. Expression dimension: associated expression.
  6. Column measure: name, type, default aggregate, format string, start date, end date. Expression measure: associated expression. Simple dimension: name, type, start date, end date. Referenced dimension: referencing table and column. Hierarchical dimension: hierarchy. Expression dimension: associated expression.
  7. Column measure: name, type, default aggregate, format string, start date, end date. Expression measure: associated expression. Simple dimension: name, type, start date, end date. Referenced dimension: referencing table and column. Hierarchical dimension: hierarchy. Expression dimension: associated expression.
  8. The grammar is a subset of HQL. Resolve the candidate dimension tables and their storage tables. Resolve the candidate fact tables that can answer the query, picking the ones from the top of the pyramid. Resolve the fact storage tables for the queried time range. Automatically resolve joins using the relationships between cubes and dimensions. Automatically add aggregate functions to measures. Add an expression to the GROUP BY clause if it is projected, and project a GROUP BY expression if it is not.
  9. Beta version in prod
  10. Beta version already in production
  11. Resolve the candidate dimension tables and their storage tables. Resolve the candidate fact tables that can answer the query, picking the ones from the top of the pyramid. Resolve the fact storage tables for the queried time range. Automatically resolve joins using the relationships between cubes and dimensions. Automatically add aggregate functions to measures. Add an expression to the GROUP BY clause if it is projected, and project a GROUP BY expression if it is not.
  12. Ad hoc querying system: ad hoc and batch queries; scheduled queries; based on Hadoop MapReduce; provides a UI and a custom API; data is stored in HDFS. Dashboard system: canned reports; interactive and ad hoc queries; provides a UI and a custom API; data is stored in a columnar DWH. Customer-facing system: the face to the outside world (advertisers and publishers); interactive and ad hoc queries; provides a UI and a custom API; data is stored in a relational DB, Postgres.
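
The notes mention resolving fact storage tables for the queried time range, and the example on slide 26 shows a range from '2014-03-10-03' to '2014-03-12-03' turned into hourly partitions plus one daily partition for the full day in between. One way this expansion could work is sketched below. It is an illustration under stated assumptions, not Grill's implementation: the function name `expand_time_range` is hypothetical, the end of the range is assumed to be exclusive, and only hourly and daily update periods are assumed to exist.

```python
from datetime import datetime, timedelta
from typing import List

HOUR_FMT = "%Y-%m-%d-%H"   # hourly partition value, e.g. 2014-03-10-03
DAY_FMT = "%Y-%m-%d"       # daily partition value, e.g. 2014-03-11


def expand_time_range(start: str, end: str) -> List[str]:
    """Return partition values covering [start, end).

    Assumption: a whole calendar day inside the range is covered by one
    daily partition; everything else is covered by hourly partitions.
    """
    cur = datetime.strptime(start, HOUR_FMT)
    stop = datetime.strptime(end, HOUR_FMT)
    parts = []
    while cur < stop:
        next_day = cur + timedelta(days=1)
        if cur.hour == 0 and next_day <= stop:
            # A full day fits: use the coarser daily partition.
            parts.append(cur.strftime(DAY_FMT))
            cur = next_day
        else:
            parts.append(cur.strftime(HOUR_FMT))
            cur += timedelta(hours=1)
    return parts
```

For the range on slide 26 this yields 21 hourly values for 10 March, the single value '2014-03-11', and three hourly values for 12 March, which corresponds to the list of `testcube.dt` conditions in the rewritten query.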