Kylin OLAP Engine Tour

5,288 views

Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
Kylin Open Source Web Site: http://kylin.io

Published in: Technology
  1. Kylin Tour: Extreme OLAP Engine for Big Data
     Luke Han | @lukehq
     Product Manager, eBay
     Oct 2014
  2. Agenda
     • What’s Kylin
     • Architecture
     • Performance
     • Onboarding
     • Open Source
     • Q & A
  3. Big Data Era
     • More and more data becoming available on Hadoop
     • Limitations in existing Business Intelligence (BI) Tools
       – Limited support for Hadoop
       – Data size growing exponentially
       – High latency of interactive queries
     • Challenges to adopt Hadoop as interactive analysis system
       – Majority of analyst groups are SQL savvy
       – No mature SQL interface on Hadoop
       – Full OLAP capability on Hadoop ecosystem not ready yet
  4. Business Needs for Big Data Analysis
     • Sub-second query latency on billions of rows
     • High concurrency – thousands of end users
     • ANSI-SQL for both analysts and engineers
     • Full OLAP capability to offer advanced functionality
     • Support of high cardinality and high dimensions
     • Seamless integration with BI Tools
     • Distributed and scale-out architecture for large data volume
  5. Options Considered
     • Commercial Solutions
     • Open Source Options
     No single solution could match our business needs.
  6. Build an engine from scratch
  7. What’s Kylin: Extreme OLAP Engine for Big Data
     Kylin is an open source Distributed Analytics Engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
     kylin / ˈkiːˈlɪn / 麒麟 n. (in Chinese art) a mythical animal of composite form
  8. Agenda
     • What’s Kylin
     • Architecture
     • Performance
     • Onboarding
     • Open Source
     • Q & A
  9. Strategy
     Kylin is designed to accelerate analytics query performance on Hadoop.
     Analytics Query Taxonomy
     • High Level Aggregation: very high level, e.g. GMV by site by vertical by weeks
     • Analysis Query: middle level, e.g. GMV by site by vertical, by category (level x), past 12 weeks
     • Drill Down to Detail: detail level (summary table)
     • Low Level Aggregation: first-level aggregation
     • Transaction Level: transaction data
     [Diagram labels: Transaction; Operation; 80+% Analytics]
  10. What’s OLAP Cube?
      • Cuboid = one combination of dimensions
      • Cube = all combinations of dimensions (all cuboids)
      Cuboid lattice for the dimensions time, item, location, supplier:
        0-D (apex) cuboid: (no dimensions)
        1-D cuboids: time; item; location; supplier
        2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
        3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
        4-D (base) cuboid: time, item, location, supplier
      • Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
        1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
        2. (9/15, milk, Urbana, *) - <time, item, location>
        3. (*, milk, Urbana, *) - <item, location>
        4. (*, milk, Chicago, *) - <item, location>
        5. (*, milk, *, *) - <item>
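The cuboid lattice on this slide can be reproduced with a short sketch. It is illustrative only and not Kylin code: the helper names `enumerate_cuboids` and `is_ancestor` are hypothetical, and the ancestor test follows the usual data-cube definition in which `*` generalises a dimension value.

```python
from itertools import combinations

# The four dimensions used in the slide's example.
DIMENSIONS = ("time", "item", "location", "supplier")


def enumerate_cuboids(dimensions):
    """Group every cuboid (subset of dimensions) by its dimensionality."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


def is_ancestor(cell_a, cell_b):
    """True if cell_a generalises cell_b: it differs from cell_b only
    by having '*' in positions where cell_b may hold a concrete value."""
    if cell_a == cell_b:
        return False
    return all(a == "*" or a == b for a, b in zip(cell_a, cell_b))


lattice = enumerate_cuboids(DIMENSIONS)
total_cuboids = sum(len(cuboids) for cuboids in lattice.values())

# Cells 1 and 5 from the slide: cell 5 is an ancestor of cell 1.
base_cell = ("9/15", "milk", "Urbana", "Dairy_land")
item_cell = ("*", "milk", "*", "*")
```

With four dimensions the lattice holds 2^4 = 16 cuboids, which is why the number of dimensions drives the size of a full cube.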
  11. From Relational to Key-Value
  12. Kylin Architecture Overview
      [Architecture diagram; recoverable labels:]
      • Clients: SQL-Based Tool (BI Tools: Tableau…); 3rd Party App (Web App, Mobile…)
      • Access: REST API; JDBC/ODBC; SQL
      • Kylin: REST Server; Query Engine; Routing; Metadata; Cube Build Engine
      • Storage: Hadoop Hive (Star Schema Data); OLAP Cube in HBase (Key Value Data)
      • Latency: Low Latency - Seconds; Mid Latency - Minutes
      • Legend: Online Analysis Data Flow; Offline Data Flow
      Notes:
      • Clients/Users interact with Kylin via SQL
      • OLAP Cube is transparent to users
  13. Why is Kylin fast?
      • Pre-built cube
      • No runtime Hive table scan and MapReduce job
      • Leveraging distributed computing infrastructure
      • Compression and encoding
      • Put “Computing” to “Data”
      • Cached
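A minimal in-memory sketch of the "pre-built cube" idea: aggregates for every cuboid are computed once, so a grouped query becomes a lookup instead of a scan over the fact rows. The fact rows, dimension names, and the functions `build_cube` and `query_sum` are all hypothetical; Kylin itself builds cubes with offline jobs on Hadoop and stores them in HBase.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (site, category, price).
FACT_ROWS = [
    ("US", "Toys", 10.0),
    ("US", "Books", 5.0),
    ("UK", "Toys", 7.5),
    ("US", "Toys", 2.5),
]
DIMENSIONS = ("site", "category")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(price) for every combination of dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for indexes in combinations(range(len(dimensions)), k):
            cuboid = tuple(dimensions[i] for i in indexes)
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in indexes)
                totals[key] += row[-1]  # last field is the measure
            cube[cuboid] = dict(totals)
    return cube


def query_sum(cube, group_by):
    """Answer a GROUP BY from the pre-built cuboid; no fact-table scan.
    group_by must list dimensions in the same order as DIMENSIONS."""
    return cube[tuple(group_by)]


cube = build_cube(FACT_ROWS, DIMENSIONS)
gmv_by_site = query_sum(cube, ["site"])
grand_total = query_sum(cube, [])
```

The trade-off is build time and storage spent up front in exchange for query-time work that no longer depends on the number of fact rows.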
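  14. Data Modeling
      [Data modeling diagram; recoverable labels:]
      • Roles: End User; Cube Modeler; Admin
      • Source Star Schema: one Fact table joined to Dim tables
      • Cube Metadata: Cube: …; Fact Table: …; Dimensions: …; Measures: …; Storage(HBase): …
      • Target HBase Storage: Row Key (row A, row B, row C); Column Family; Column (Val 1, Val 2, Val 3)
      • Mapping: Source Star Schema, via Cube Metadata, to Target HBase Storage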
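One way to picture the mapping from star-schema cells to key-value rows is sketched below. It assumes a simplified layout (a one-byte cuboid bitmask followed by one dictionary-encoded byte per dimension); this layout, the dimension names, and the helper functions are illustrative assumptions, not Kylin's actual row key encoding.

```python
# Hypothetical dimensions of a small cube, in a fixed order.
DIMENSIONS = ("site", "category", "week")


def build_dictionaries(rows):
    """Assign each distinct dimension value a small integer ID."""
    dictionaries = {d: {} for d in DIMENSIONS}
    for row in rows:
        for d in DIMENSIONS:
            dictionaries[d].setdefault(row[d], len(dictionaries[d]))
    return dictionaries


def cuboid_id(cuboid):
    """Bitmask with one bit per dimension present in the cuboid."""
    mask = 0
    for position, d in enumerate(DIMENSIONS):
        if d in cuboid:
            mask |= 1 << (len(DIMENSIONS) - 1 - position)
    return mask


def row_key(cuboid, cell, dictionaries):
    """Row key = cuboid ID byte + one encoded byte per present dimension.
    (Single bytes limit this sketch to fewer than 256 values each.)"""
    parts = [cuboid_id(cuboid)]
    for d in DIMENSIONS:
        if d in cuboid:
            parts.append(dictionaries[d][cell[d]])
    return bytes(parts)


rows = [
    {"site": "US", "category": "Toys", "week": "2014-10-05"},
    {"site": "UK", "category": "Toys", "week": "2014-10-05"},
]
dictionaries = build_dictionaries(rows)
key = row_key(("site", "category"), {"site": "UK", "category": "Toys"}, dictionaries)
```

Here `key` would be stored as the row key, with the aggregated measures (for example a sum and a count) as column values in a column family. Prefixing the cuboid ID keeps all cells of one cuboid adjacent in a sorted key-value store.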
  15. Query Engine – Calcite (Optiq)
      • Dynamic data management framework.
      • Formerly known as Optiq, Calcite is an Apache incubator project, used by Apache Drill and Apache Hive, among others.
      • http://optiq.incubator.apache.org
  16. Query Engine – Kylin Explain Plan

      Query:

        SELECT test_cal_dt.week_beg_dt,
               test_category.category_name,
               test_category.lvl2_name,
               test_category.lvl3_name,
               test_kylin_fact.lstg_format_name,
               test_sites.site_name,
               SUM(test_kylin_fact.price) AS GMV,
               COUNT(*) AS TRANS_CNT
        FROM test_kylin_fact
        LEFT JOIN test_cal_dt
          ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
        LEFT JOIN test_category
          ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
         AND test_kylin_fact.lstg_site_id = test_category.site_id
        LEFT JOIN test_sites
          ON test_kylin_fact.lstg_site_id = test_sites.site_id
        WHERE test_kylin_fact.seller_id = 123456
           OR test_kylin_fact.lstg_format_name = 'New'
        GROUP BY test_cal_dt.week_beg_dt,
                 test_category.category_name,
                 test_category.lvl2_name,
                 test_category.lvl3_name,
                 test_kylin_fact.lstg_format_name,
                 test_sites.site_name

      Plan:

        OLAPToEnumerableConverter
          OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
            OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
              OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
                OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
                  OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
                    OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
                      OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
                        OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
                        OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
                      OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
                    OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
  17. Agenda
      • What’s Kylin
      • Architecture
      • Performance
      • Onboarding
      • Open Source
      • Q & A
  18. Kylin vs. Hive
      [Chart: Hive vs. Kylin for SQL #1, SQL #2, SQL #3; y-axis 0 to 200]

      #  Query Type              Return Dataset  Query On Kylin (s)  Query On Hive (s)  Comments
      1  High Level Aggregation  4               0.129               157.437            1,217 times
      2  Analysis Query          22,669          1.615               109.206            68 times
      3  Drill Down to Detail    325,029         12.058              113.123            9 times
      4  Drill Down to Detail    524,780         22.42               6383.21            278 times
      5  Data Dump               972,002         49.054              N/A

      [Query taxonomy diagram: High Level Aggregation; Analysis Query; Drill Down to Detail; Low Level Aggregation; Transaction Level]
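The "times" figures in the Comments column read as speedup ratios, that is, Hive time divided by Kylin time. The sketch below recomputes them from the timings listed on the slide, under that assumption. The ratios for rows 2 and 3 match the slide after rounding; rows 1 and 4 come out slightly higher (about 1,220 and 285 against the reported 1,217 and 278), which may reflect rounding in the displayed timings.

```python
# (query type, Kylin seconds, Hive seconds) copied from the slide's table.
# Row 5 (Data Dump) is omitted because the Hive time is N/A.
TIMINGS = [
    ("High Level Aggregation", 0.129, 157.437),
    ("Analysis Query", 1.615, 109.206),
    ("Drill Down to Detail", 12.058, 113.123),
    ("Drill Down to Detail", 22.42, 6383.21),
]


def speedup(kylin_seconds, hive_seconds):
    """How many times faster the Kylin query is than the Hive query."""
    return hive_seconds / kylin_seconds


speedups = [round(speedup(kylin, hive)) for _, kylin, hive in TIMINGS]

for (query_type, _, _), ratio in zip(TIMINGS, speedups):
    print(f"{query_type}: about {ratio} times")
```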
  19. Performance: Concurrency
      Linear scale-out with more nodes
  20. Performance: Query Latency
      90th-percentile query latency < 5s
  21. Agenda
      • What’s Kylin
      • Architecture
      • Performance
      • Onboarding
      • Open Source
      • Q & A
  22. How to onboard: Data
      • Landing data to Hadoop Cluster
      • Creating Hive tables with Star-Schema
      • Calculating dimensions cardinality statistics
      • Gathering query patterns
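Dimension cardinality is the number of distinct values a dimension takes. A minimal in-memory sketch of that statistic follows; the rows and the function `dimension_cardinalities` are hypothetical, and on a real cluster the same figure would come from a distinct count over the Hive tables (exact or approximate) rather than from Python sets.

```python
def dimension_cardinalities(rows, dimensions):
    """Count the distinct values seen for each dimension."""
    distinct = {d: set() for d in dimensions}
    for row in rows:
        for d in dimensions:
            distinct[d].add(row[d])
    return {d: len(values) for d, values in distinct.items()}


# Hypothetical sample of fact rows.
sample_rows = [
    {"site": "US", "category": "Toys", "seller_id": 1},
    {"site": "US", "category": "Books", "seller_id": 2},
    {"site": "UK", "category": "Toys", "seller_id": 3},
    {"site": "US", "category": "Toys", "seller_id": 4},
]
cardinalities = dimension_cardinalities(
    sample_rows, ("site", "category", "seller_id")
)
```

A dimension such as `seller_id`, whose cardinality grows with the data, is the kind of high-cardinality case the deck calls out as a design concern.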
  23. How to onboard: Build Kylin Cube
      • Sync up Hive metadata to Kylin
      • Create cube metadata via Cube Designer
      • Trigger cube build job
      • Set up incremental job
      • Consume from a front-end tool, like Tableau
  24. Agenda
      • What’s Kylin
      • Architecture
      • Performance
      • Onboarding
      • Open Source
      • Q & A
  25. Open Source
      • Kylin Site: http://kylin.io
      • Github Repo: https://github.com/KylinOLAP
      • Google Group: Kylin OLAP
  26. Kylin Ecosystem
      • Kylin Core: fundamental framework of Kylin OLAP Engine
      • Extension: plugins to support additional functions and features
      • Integration: Lifecycle Management Support to integrate with other applications
      • Interface: allows third-party users to build more features via user interface atop Kylin core
      • Driver: ODBC and JDBC Drivers
      [Ecosystem diagram, around Kylin OLAP Core:]
      • Extension: Security; SSO; Redis Storage
      • Interface: Web Console; Customized BI; Ambari/Hue Plugin
      • Integration: ODBC Driver; ETL; Scheduling
  27. What’s Next?
      • v1.1 (Current version under development)
        – InvertedIndex to support extra high cardinality and high dimensions
        – Remote JDBC Driver
        – Filter on Coprocessor
      • v1.2
        – Job Schedule and Priority
        – Capacity Management
        – Automation
      • v2.0
        – HOLAP (Hybrid OLAP to combine ROLAP and MOLAP)
        – In-Memory Analysis
        – More…
  28. Thanks
      • Kylin Web Site: http://kylin.io
      • Google Group: Kylin OLAP
      • Contact:
        – Luke Han | lukhan@ebay.com
        – Medha Samant | msamant@ebay.com
