SlideShare a Scribd company logo
1 of 30
Download to read offline
Sébastien  Jelsch
London,  7-­11-­2015
Big  Data  MDX  with  Mondrian  and  Apache  Kylin
1Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Agenda
▪ OLAP-­on-­Hadoop  with  Apache  Kylin
▪ Features
▪ Apache  Kylin  &  Mondrian
▪ Conclusion  &  Discussion
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Agenda
▪ OLAP-­on-­Hadoop  with  Apache  Kylin
▪ Features
▪ Apache  Kylin  &  Mondrian
▪ Conclusion  &  Discussion
1
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Big  Data
Situation
▪ More  and  more  data  becoming  available  on  Hadoop
▪ Limitations  in  existing  Business  Intelligence  Tools
○ Limited  support  for  Hadoop
○ Data  size  growing  exponentially
○ High  latency  of  interactive  queries
▪ Challenges  to  adapt  Hadoop  for  interactive  analysis
○ OLAP  capability  on  Hadoop  ecosystem  not  ready  yet
2
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
OLAP  and  Big  Data
Goals
▪ Full  OLAP  capability  and  advanced  functionality
▪ Interactive  analysis  in  subseconds
▪ ANSI  SQL  or  MDX  for  analysts  and  engineers
▪ Seamless  integration  with  BI  Tools
▪ High  concurrency  with  thousands  of  end  users
▪ Distributed  and  scale  out  architecture  for  large  data  
volume
3
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
What  is  Apache  Kylin?
Solution: Apache  Kylin
Extreme  OLAP  Engine  for  Big  Data
▪ Distributed  Analytics  Engine  from  eBay
▪ OLAP-­on-­Hadoop
▪ Provides  SQL  interface  for  multidimensional   analysis
▪ Based  on  Hadoop  ecosystem
Open  Source  on:  1.  October  2014
Accepted  into  incubation:  25.  November  2014
Current  version:  1.1  (25.  October  2015)
4
OLAP  Cube
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Short  introduction  into  OLAP
5
8 7 14
12 22 19
30 15 25Beer
Water
Wine
Berlin
Paris
London
2013
2014
2015
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Architecture
6
3rd  Party  App Web  App BI  Tools
REST  Server
Query  Engine
Routing
OLAP
Cube
(HBase)
OLAP
Cube
(HBase)Metadata
Cube  Build  Engine
Hive
HDFS
Star  Schema  Data Key  Value  Data
Mid  Latency Low  Latency
SQLSQL JDBC  /  ODBC
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Agenda
▪ OLAP-­on-­Hadoop  with  Apache  Kylin
▪ Features
▪ Apache  Kylin  &  Mondrian
▪ Conclusion  &  Discussion
7
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Designer
8
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Designer
9
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Designer
10
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Designer
11
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Designer
12
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Monitoring
13
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  SQL  Interface
14
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Agenda
▪ OLAP-­on-­Hadoop  with  Apache  Kylin
▪ Features
▪ Apache  Kylin  &  Mondrian
▪ Conclusion  &  Discussion
15
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin  and  MDX
SQL  returns  2-­dimensional  result  set
For  more  dimensions  SQL  was  not  designed
Wish:
▪ Multidimensional   result  set
▪ Consider  hierarchies  and  levels  in  the  data
16
Query  Language:  MDX
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Pentahos  Mondrian
Mondrian
▪ OLAP  Engine
▪ Transforms  MDX  queries  into  SQL
▪ Multidimensional   representation  of  data
▪ Integrated  into  Saiku  /  Pentahos  Business  Analytics  
Platform
▪ Expandable  through  SQL  dialects
e.g.  MySQL,  Postgres,  Hive,  Impala,  ...
17
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin  +  Mondrian:  Idea
18
OLAP  Client
Apache  Kylin
HBase,  Cuboids  ...
MondrianMondrian  Schema
Measures
Dimensions
Hierarchies
Levels
Attributes
XML
MDX
JDBC
Kylin  
Dialect
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin  +  Mondrian:  Implementation
Work  done:
▪ Kylin  dialect  created
▪ Optimized  Kylins JDBC  driver
▪ Bugs  fixed  to  get  Mondrian  working  with  Kylin
TBD:
▪ Integrate  Kylin  dialect  into  Mondrians official  code*
▪ Make  every  MDX  query  executable
Successful  tests**:
▪ Current  Saiku and  Mondrian  4.4
▪ Current  Saiku and  Mondrian  3.x  (not  tested  very  well)
*   Pull  Request:  https://github.com/pentaho/mondrian/pull/480
**  Github Project:  https://github.com/mustangore/kylin-­mondrian-­interaction
19
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin  +  Mondrian:  Examples
20
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin  +  Mondrian:  Examples
21
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Agenda
▪ OLAP-­on-­Hadoop  with  Apache  Kylin
▪ Features
▪ Apache  Kylin  &  Mondrian
▪ Conclusion  &  Discussion
22
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Conclusion
▪ Extremely  fast  and  scalable  OLAP  Engine
▪ OLAP-­on-­Hadoop
▪ Depends  on  Apache  Hadoop  infrastructure
▪ MOLAP  Cube
▪ Incremental  refresh  of  cubes
▪ Integration  into  existing  BI  Tools
▪ MDX  queries  with  Mondrian  possible  (ongoing  work)
23
Contact
Sébastien  Jelsch
Big  Data  Scientist
inovex  GmbH
Office  Karlsruhe
Ludwig-­Erhard-­Allee  6
76131  Karlsruhe
Tel:  +49  176  -­ 45786280
E-­Mail:  sjelsch@inovex.de
Twitter:  @inovexgmbh  |  @Mustangore
Thank  you  for  your  attention
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Introduction  into  OLAP
B1
1,1,1,0 1,1,0,1
1,1,1,1
1,0,1,1 0,1,1,1
0,1,1,0
1,0,0,0 0,1,0,0 0,0,1,0 0,0,0,1
0,0,0,0
0,0,1,10,1,0,11,0,0,11,0,1,01,1,0,0
Cube:  All  combinations
Cuboid:  One  single  combination
Number  cuboids  growing  exponentially
0-­Cuboid
N-­Cuboid
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Aggregation  Groups
Problem:  Number  of  Cuboids  grows  exponentially
Example:
Cube  with  30  dimensions
Number  of  Cuboids:  2³º  >  1  billion
Solution:  Partial  Cube
Classificate  the  OLAP  Cube  in  Aggregation  Groups
Example:
30  dimensions  splitted  into  3  groups  of  10  dimensions
Number  of  Cuboids:  2¹º  +  2¹º  +  2¹º  =  3072  <<  1  billion
B2
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Build  Process
B3
Source
Hive  
Tables
HiveQL
Dimension
Dictionaries
Intermediate  
Hive  Table
HiveQL MapReduce
HDFS
Sequence
Files
N-­Cuboid
(1)
(2) (3)
Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch
Apache  Kylin:  Cube  Build  Process
B4
MapReduce
N-­Cuboid
HDFS
Sequence
Files
N-­1-­Cuboid
HDFS
Sequence
Files
0-­Cuboid
HDFS
Sequence
Files
... MapReduce
MapReduce
HFiles
HBase
Bulk  Import

More Related Content

What's hot

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
HostedbyConfluent
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
William LaForest
 

What's hot (20)

Data in Motion을 위한 이벤트 기반 마이크로서비스 아키텍처 소개
Data in Motion을 위한 이벤트 기반 마이크로서비스 아키텍처 소개Data in Motion을 위한 이벤트 기반 마이크로서비스 아키텍처 소개
Data in Motion을 위한 이벤트 기반 마이크로서비스 아키텍처 소개
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014
 
Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 
Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium Change Data Streaming Patterns for Microservices With Debezium
Change Data Streaming Patterns for Microservices With Debezium
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQL
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache Kylin
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 

Viewers also liked

Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
Doug Armantrout
 
Getting Started with MDX 20140625a
Getting Started with MDX 20140625aGetting Started with MDX 20140625a
Getting Started with MDX 20140625a
Ron Moore
 
Multidimensional expressions mdx - reference
Multidimensional expressions   mdx - referenceMultidimensional expressions   mdx - reference
Multidimensional expressions mdx - reference
Steve Xu
 
rmoore.smartquery KScope15
rmoore.smartquery KScope15rmoore.smartquery KScope15
rmoore.smartquery KScope15
Ron Moore
 

Viewers also liked (20)

IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Cost Optimization at Scale
Cost Optimization at ScaleCost Optimization at Scale
Cost Optimization at Scale
 
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
 
2012 Acura MDX Brochure | DCH Acura of Temecula
2012 Acura MDX Brochure | DCH Acura of Temecula2012 Acura MDX Brochure | DCH Acura of Temecula
2012 Acura MDX Brochure | DCH Acura of Temecula
 
Unlocking the Mystery of MDX
Unlocking the Mystery of MDXUnlocking the Mystery of MDX
Unlocking the Mystery of MDX
 
Ssis sql ssrs_sp_ssas_mdx_hb_li
Ssis sql ssrs_sp_ssas_mdx_hb_liSsis sql ssrs_sp_ssas_mdx_hb_li
Ssis sql ssrs_sp_ssas_mdx_hb_li
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Getting Started with MDX 20140625a
Getting Started with MDX 20140625aGetting Started with MDX 20140625a
Getting Started with MDX 20140625a
 
Multidimensional expressions mdx - reference
Multidimensional expressions   mdx - referenceMultidimensional expressions   mdx - reference
Multidimensional expressions mdx - reference
 
Citrix MDX Technologies Feature Brief
Citrix MDX Technologies Feature BriefCitrix MDX Technologies Feature Brief
Citrix MDX Technologies Feature Brief
 
Mdx university dubai courses
Mdx university dubai coursesMdx university dubai courses
Mdx university dubai courses
 
Introduction to mdx query ppt
Introduction to mdx query pptIntroduction to mdx query ppt
Introduction to mdx query ppt
 
SQL Server Analysis Services and MDX
SQL Server Analysis Services and MDXSQL Server Analysis Services and MDX
SQL Server Analysis Services and MDX
 
Mdx complex-queries-130019
Mdx complex-queries-130019Mdx complex-queries-130019
Mdx complex-queries-130019
 
MDX 2015-2019 Work Program Overview presentation, October 22, 2014
MDX 2015-2019 Work Program Overview presentation, October 22, 2014MDX 2015-2019 Work Program Overview presentation, October 22, 2014
MDX 2015-2019 Work Program Overview presentation, October 22, 2014
 
rmoore.smartquery KScope15
rmoore.smartquery KScope15rmoore.smartquery KScope15
rmoore.smartquery KScope15
 
IBM Cognos Dimensional Dashboarding Techniques
IBM Cognos Dimensional Dashboarding TechniquesIBM Cognos Dimensional Dashboarding Techniques
IBM Cognos Dimensional Dashboarding Techniques
 

Similar to Big Data MDX with Mondrian and Apache Kylin

HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
Michael Stack
 
Hybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouseHybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouse
Laine Campbell
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in Hadoop
AnalyticsWeek
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
Lynn Langit
 

Similar to Big Data MDX with Mondrian and Apache Kylin (20)

Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big dataHBaseConAsia2018  Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data
 
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big dataApache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Hybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouseHybrid my sql_hadoop_datawarehouse
Hybrid my sql_hadoop_datawarehouse
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
AMIS OOW Review 2012- Deel 1 - Lucas Jellema & Paul Uijtewaal
AMIS OOW Review 2012- Deel 1 - Lucas Jellema & Paul UijtewaalAMIS OOW Review 2012- Deel 1 - Lucas Jellema & Paul Uijtewaal
AMIS OOW Review 2012- Deel 1 - Lucas Jellema & Paul Uijtewaal
 
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
Webinar: Buckle Up: The Future of the Distributed Database is Here - DataStax...
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Advanced Analytics in Hadoop
Advanced Analytics in HadoopAdvanced Analytics in Hadoop
Advanced Analytics in Hadoop
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
 
The Evolution of Open Source Databases
The Evolution of Open Source DatabasesThe Evolution of Open Source Databases
The Evolution of Open Source Databases
 
Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09Introduction of MariaDB 2017 09
Introduction of MariaDB 2017 09
 
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...How R Developers Can Build and Share Data and AI Applications that Scale with...
How R Developers Can Build and Share Data and AI Applications that Scale with...
 
SQL + Hadoop: The High Performance Advantage�
SQL + Hadoop:  The High Performance Advantage�SQL + Hadoop:  The High Performance Advantage�
SQL + Hadoop: The High Performance Advantage�
 
Using Scalding for Data Driven Product Development at LinkedIn
Using Scalding for Data Driven Product Development at LinkedInUsing Scalding for Data Driven Product Development at LinkedIn
Using Scalding for Data Driven Product Development at LinkedIn
 

More from inovex GmbH

Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
inovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
inovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
inovex GmbH
 

More from inovex GmbH (20)

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegen
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
WWDC 2019 Recap
WWDC 2019 RecapWWDC 2019 Recap
WWDC 2019 Recap
 
Network Policies
Network PoliciesNetwork Policies
Network Policies
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungen
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeten
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetes
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Azure IoT Edge
Azure IoT EdgeAzure IoT Edge
Azure IoT Edge
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenten
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?
 
Dev + Ops = Go
Dev + Ops = GoDev + Ops = Go
Dev + Ops = Go
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Project
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madness
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Recently uploaded (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

Big Data MDX with Mondrian and Apache Kylin

  • 1. Sébastien  Jelsch London,  7-­11-­2015 Big  Data  MDX  with  Mondrian  and  Apache  Kylin
  • 2. 1Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Agenda ▪ OLAP-­on-­Hadoop  with  Apache  Kylin ▪ Features ▪ Apache  Kylin  &  Mondrian ▪ Conclusion  &  Discussion
  • 3. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Agenda ▪ OLAP-­on-­Hadoop  with  Apache  Kylin ▪ Features ▪ Apache  Kylin  &  Mondrian ▪ Conclusion  &  Discussion 1
  • 4. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Big  Data Situation ▪ More  and  more  data  becoming  available  on  Hadoop ▪ Limitations  in  existing  Business  Intelligence  Tools ○ Limited  support  for  Hadoop ○ Data  size  growing  exponentially ○ High  latency  of  interactive  queries ▪ Challenges  to  adapt  Hadoop  for  interactive  analysis ○ OLAP  capability  on  Hadoop  ecosystem  not  ready  yet 2
  • 5. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch OLAP  and  Big  Data Goals ▪ Full  OLAP  capability  and  advanced  functionality ▪ Interactive  analysis  in  subseconds ▪ ANSI  SQL  or  MDX  for  analysts  and  engineers ▪ Seamless  integration  with  BI  Tools ▪ High  concurrency  with  thousands  of  end  users ▪ Distributed  and  scale  out  architecture  for  large  data   volume 3
  • 6. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch What  is  Apache  Kylin? Solution: Apache  Kylin Extreme  OLAP  Engine  for  Big  Data ▪ Distributed  Analytics  Engine  from  eBay ▪ OLAP-­on-­Hadoop ▪ Provides  SQL  interface  for  multidimensional   analysis ▪ Based  on  Hadoop  ecosystem Open  Source  on:  1.  October  2014 Accepted  into  incubation:  25.  November  2014 Current  version:  1.1  (25.  October  2015) 4
  • 7. OLAP  Cube Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Short  introduction  into  OLAP 5 8 7 14 12 22 19 30 15 25Beer Water Wine Berlin Paris London 2013 2014 2015
  • 8. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Architecture 6 3rd  Party  App Web  App BI  Tools REST  Server Query  Engine Routing OLAP Cube (HBase) OLAP Cube (HBase)Metadata Cube  Build  Engine Hive HDFS Star  Schema  Data Key  Value  Data Mid  Latency Low  Latency SQLSQL JDBC  /  ODBC
  • 9. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Agenda ▪ OLAP-­on-­Hadoop  with  Apache  Kylin ▪ Features ▪ Apache  Kylin  &  Mondrian ▪ Conclusion  &  Discussion 7
  • 10. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Designer 8
  • 11. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Designer 9
  • 12. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Designer 10
  • 13. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Designer 11
  • 14. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Designer 12
  • 15. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Monitoring 13
  • 16. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  SQL  Interface 14
  • 17. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Agenda ▪ OLAP-­on-­Hadoop  with  Apache  Kylin ▪ Features ▪ Apache  Kylin  &  Mondrian ▪ Conclusion  &  Discussion 15
  • 18. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin  and  MDX SQL  returns  2-­dimensional  result  set For  more  dimensions  SQL  was  not  designed Wish: ▪ Multidimensional   result  set ▪ Consider  hierarchies  and  levels  in  the  data 16 Query  Language:  MDX
  • 19. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Pentahos  Mondrian Mondrian ▪ OLAP  Engine ▪ Transforms  MDX  queries  into  SQL ▪ Multidimensional   representation  of  data ▪ Integrated  into  Saiku  /  Pentahos  Business  Analytics   Platform ▪ Expandable  through  SQL  dialects e.g.  MySQL,  Postgres,  Hive,  Impala,  ... 17
  • 20. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin  +  Mondrian:  Idea 18 OLAP  Client Apache  Kylin HBase,  Cuboids  ... MondrianMondrian  Schema Measures Dimensions Hierarchies Levels Attributes XML MDX JDBC Kylin   Dialect
  • 21. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin  +  Mondrian:  Implementation Work  done: ▪ Kylin  dialect  created ▪ Optimized  Kylins JDBC  driver ▪ Bugs  fixed  to  get  Mondrian  working  with  Kylin TBD: ▪ Integrate  Kylin  dialect  into  Mondrians official  code* ▪ Make  every  MDX  query  executable Successful  tests**: ▪ Current  Saiku and  Mondrian  4.4 ▪ Current  Saiku and  Mondrian  3.x  (not  tested  very  well) *   Pull  Request:  https://github.com/pentaho/mondrian/pull/480 **  Github Project:  https://github.com/mustangore/kylin-­mondrian-­interaction 19
  • 22. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin  +  Mondrian:  Examples 20
  • 23. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin  +  Mondrian:  Examples 21
  • 24. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Agenda ▪ OLAP-­on-­Hadoop  with  Apache  Kylin ▪ Features ▪ Apache  Kylin  &  Mondrian ▪ Conclusion  &  Discussion 22
  • 25. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Conclusion ▪ Extremely  fast  and  scalable  OLAP  Engine ▪ OLAP-­on-­Hadoop ▪ Depends  on  Apache  Hadoop  infrastructure ▪ MOLAP  Cube ▪ Incremental  refresh  of  cubes ▪ Integration  into  existing  BI  Tools ▪ MDX  queries  with  Mondrian  possible  (ongoing  work) 23
  • 26. Contact Sébastien  Jelsch Big  Data  Scientist inovex  GmbH Office  Karlsruhe Ludwig-­Erhard-­Allee  6 76131  Karlsruhe Tel:  +49  176  -­ 45786280 E-­Mail:  sjelsch@inovex.de Twitter:  @inovexgmbh  |  @Mustangore Thank  you  for  your  attention
  • 27. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Introduction  into  OLAP B1 1,1,1,0 1,1,0,1 1,1,1,1 1,0,1,1 0,1,1,1 0,1,1,0 1,0,0,0 0,1,0,0 0,0,1,0 0,0,0,1 0,0,0,0 0,0,1,10,1,0,11,0,0,11,0,1,01,1,0,0 Cube:  All  combinations Cuboid:  One  single  combination Number  cuboids  growing  exponentially 0-­Cuboid N-­Cuboid
  • 28. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Aggregation  Groups Problem:  Number  of  Cuboids  grows  exponentially Example: Cube  with  30  dimensions Number  of  Cuboids:  2³º  >  1  billion Solution:  Partial  Cube Classificate  the  OLAP  Cube  in  Aggregation  Groups Example: 30  dimensions  splitted  into  3  groups  of  10  dimensions Number  of  Cuboids:  2¹º  +  2¹º  +  2¹º  =  3072  <<  1  billion B2
  • 29. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Build  Process B3 Source Hive   Tables HiveQL Dimension Dictionaries Intermediate   Hive  Table HiveQL MapReduce HDFS Sequence Files N-­Cuboid (1) (2) (3)
  • 30. Big  Data  MDX  with  Mondrian  and  Apache  Kylin                                                                                                                        Sébastien  Jelsch Apache  Kylin:  Cube  Build  Process B4 MapReduce N-­Cuboid HDFS Sequence Files N-­1-­Cuboid HDFS Sequence Files 0-­Cuboid HDFS Sequence Files ... MapReduce MapReduce HFiles HBase Bulk  Import