2. 1Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
3. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
1
4. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Big Data
Situation
▪ More and more data becoming available on Hadoop
▪ Limitations in existing Business Intelligence Tools
○ Limited support for Hadoop
○ Data size growing exponentially
○ High latency of interactive queries
▪ Challenges to adapt Hadoop for interactive analysis
○ OLAP capability on Hadoop ecosystem not ready yet
2
5. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
OLAP and Big Data
Goals
▪ Full OLAP capability and advanced functionality
▪ Interactive analysis in subseconds
▪ ANSI SQL or MDX for analysts and engineers
▪ Seamless integration with BI Tools
▪ High concurrency with thousands of end users
▪ Distributed and scale out architecture for large data
volume
3
6. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
What is Apache Kylin?
Solution: Apache Kylin
Extreme OLAP Engine for Big Data
▪ Distributed Analytics Engine from eBay
▪ OLAP-on-Hadoop
▪ Provides SQL interface for multidimensional analysis
▪ Based on Hadoop ecosystem
Open Source on: 1. October 2014
Accepted into incubation: 25. November 2014
Current version: 1.1 (25. October 2015)
4
7. OLAP Cube
Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Short introduction into OLAP
5
8 7 14
12 22 19
30 15 25Beer
Water
Wine
Berlin
Paris
London
2013
2014
2015
8. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Architecture
6
3rd Party App Web App BI Tools
REST Server
Query Engine
Routing
OLAP
Cube
(HBase)
OLAP
Cube
(HBase)Metadata
Cube Build Engine
Hive
HDFS
Star Schema Data Key Value Data
Mid Latency Low Latency
SQLSQL JDBC / ODBC
9. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
7
10. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Designer
8
11. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Designer
9
12. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Designer
10
13. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Designer
11
14. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Designer
12
15. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Monitoring
13
16. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: SQL Interface
14
17. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
15
18. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin and MDX
SQL returns 2-dimensional result set
For more dimensions SQL was not designed
Wish:
▪ Multidimensional result set
▪ Consider hierarchies and levels in the data
16
Query Language: MDX
19. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Pentahos Mondrian
Mondrian
▪ OLAP Engine
▪ Transforms MDX queries into SQL
▪ Multidimensional representation of data
▪ Integrated into Saiku / Pentahos Business Analytics
Platform
▪ Expandable through SQL dialects
e.g. MySQL, Postgres, Hive, Impala, ...
17
20. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin + Mondrian: Idea
18
OLAP Client
Apache Kylin
HBase, Cuboids ...
MondrianMondrian Schema
Measures
Dimensions
Hierarchies
Levels
Attributes
XML
MDX
JDBC
Kylin
Dialect
21. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin + Mondrian: Implementation
Work done:
▪ Kylin dialect created
▪ Optimized Kylins JDBC driver
▪ Bugs fixed to get Mondrian working with Kylin
TBD:
▪ Integrate Kylin dialect into Mondrians official code*
▪ Make every MDX query executable
Successful tests**:
▪ Current Saiku and Mondrian 4.4
▪ Current Saiku and Mondrian 3.x (not tested very well)
* Pull Request: https://github.com/pentaho/mondrian/pull/480
** Github Project: https://github.com/mustangore/kylin-mondrian-interaction
19
22. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin + Mondrian: Examples
20
23. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin + Mondrian: Examples
21
24. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
22
25. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Conclusion
▪ Extremely fast and scalable OLAP Engine
▪ OLAP-on-Hadoop
▪ Depends on Apache Hadoop infrastructure
▪ MOLAP Cube
▪ Incremental refresh of cubes
▪ Integration into existing BI Tools
▪ MDX queries with Mondrian possible (ongoing work)
23
26. Contact
Sébastien Jelsch
Big Data Scientist
inovex GmbH
Office Karlsruhe
Ludwig-Erhard-Allee 6
76131 Karlsruhe
Tel: +49 176 - 45786280
E-Mail: sjelsch@inovex.de
Twitter: @inovexgmbh | @Mustangore
Thank you for your attention
27. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Introduction into OLAP
B1
1,1,1,0 1,1,0,1
1,1,1,1
1,0,1,1 0,1,1,1
0,1,1,0
1,0,0,0 0,1,0,0 0,0,1,0 0,0,0,1
0,0,0,0
0,0,1,10,1,0,11,0,0,11,0,1,01,1,0,0
Cube: All combinations
Cuboid: One single combination
Number cuboids growing exponentially
0-Cuboid
N-Cuboid
28. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Aggregation Groups
Problem: Number of Cuboids grows exponentially
Example:
Cube with 30 dimensions
Number of Cuboids: 2³º > 1 billion
Solution: Partial Cube
Classificate the OLAP Cube in Aggregation Groups
Example:
30 dimensions splitted into 3 groups of 10 dimensions
Number of Cuboids: 2¹º + 2¹º + 2¹º = 3072 << 1 billion
B2
29. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Build Process
B3
Source
Hive
Tables
HiveQL
Dimension
Dictionaries
Intermediate
Hive Table
HiveQL MapReduce
HDFS
Sequence
Files
N-Cuboid
(1)
(2) (3)
30. Big Data MDX with Mondrian and Apache Kylin Sébastien Jelsch
Apache Kylin: Cube Build Process
B4
MapReduce
N-Cuboid
HDFS
Sequence
Files
N-1-Cuboid
HDFS
Sequence
Files
0-Cuboid
HDFS
Sequence
Files
... MapReduce
MapReduce
HFiles
HBase
Bulk Import