
Adding Spark support to Kylin at Bay Area Spark Meetup


A quick talk about Kylin, its existing challenges, and potential areas for leveraging the Spark ecosystem, given at the Bay Area Spark Meetup on March 25.

Published in: Technology

  1. Adding Spark Support to Apache Kylin
     Luke Han (韩卿), Kylin co-creator & PMC Member
  2. Extreme OLAP Engine for Big Data
     Kylin is an open source distributed analytics engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
     What's Kylin: kylin /ˈkiːˈlɪn/ 麒麟, n. (in Chinese art) a mythical animal of composite form
     • Open sourced on Oct 1st, 2014
     • Accepted as an Apache Incubator project on Nov 25th, 2014
  3. Kylin Architecture Overview
     [Architecture diagram; labels recovered from the slide:]
     • Clients: 3rd Party App (Web App, Mobile…); SQL-Based Tool (BI Tools: Tableau…)
     • Access: REST API; JDBC/ODBC; SQL
     • Server: REST Server; Query Engine; Routing; Metadata
     • Cube Build Engine: Batch (MapReduce, Spark) & Streaming
     • Data: Hadoop Hive (Star Schema Data); OLAP Cube (HBase) (Key Value Data); Data Cube; SparkSQL
     • Latency: Low Latency - Seconds; Mid Latency - Minutes
     • Legend: Online Analysis Data Flow; Offline Data Flow
     • Clients/Users interact with Kylin via SQL
     • OLAP Cube is transparent to users
  4. Challenges… (http://kylin.io)
     • High latency when reading data from Hive
     • Several hours to fetch data when joining big tables
     • Route to SQL-on-Hadoop turned off due to performance issues
     • Time-to-market of data latency
     • Huge IO & network traffic with MR jobs
     • Streaming?
     • Streaming process and pre-calculate cubes
  5. Add Spark Support to Apache Kylin
     • Integrating with Spark SQL:
       • Option I: Read data from SparkSQL instead of Hive
       • Option II: Route unsupported queries to SparkSQL
       • Option III: Kylin to be OLAP source of SparkSQL
     • Spark Cube Build Engine
       • Efficient cube generation engine with Spark
     • Spark Streaming
       • Leverage SparkStreaming for StreamingOLAP
     • HBase?
  6. Kylin Evolution Roadmap
     • Initial Prototype for MOLAP: Basic end to end POC
     • MOLAP: Incremental Refresh, ANSI SQL, ODBC Driver, Web GUI, ACL, Open Source
     • HOLAP: Streaming OLAP, JDBC Driver, New UI, Excel Support, SparkSQL, … more
     • Next Gen: Automation, Capacity Management, Spark Engine, In-Memory Analysis, … more
     • Future… (TBD)
     Timeline markers (2013 to 2015): Sep, 2013; Jan, 2014; Sep, 2014; Q1, 2015
  7. Kylin Ecosystem
     • Kylin Core: Fundamental framework of Kylin OLAP Engine
     • Extension: Plugins to support additional functions and features
     • Integration: Lifecycle Management Support to integrate with other applications
     • Interface: Allows third-party users to build more features via user interfaces atop Kylin core
     • Driver: ODBC and JDBC Drivers
     [Ecosystem diagram; labels recovered from the slide:]
     • Kylin OLAP Core
     • Extension: Security, Redis Storage, Spark Engine, Docker
     • Interface: Web Console, Customized BI, Ambari/Hue Plugin
     • Integration: ODBC Driver, ETL, Drill, SparkSQL
  8. If you want to go fast, go alone. If you want to go far, go together. --African Proverb
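
The slides describe pre-calculated cubes that answer queries at low latency, with unsupported queries routed to another engine. A minimal plain-Python sketch of that idea follows. It is not Kylin's or Spark's implementation: the sample rows, the column names, and the helpers `aggregate`, `build_cube`, and `query` are all hypothetical and exist only for illustration. The "cube" here pre-aggregates SUM(sales) for every subset of the chosen dimensions, and a query that the cube cannot serve falls back to scanning the raw rows.

```python
from itertools import combinations

# Hypothetical fact table; the data and column names are illustrative only.
COLUMNS = ("year", "country", "category", "sales")
ROWS = [
    (2014, "US", "Toys", 10.0),
    (2014, "US", "Books", 5.0),
    (2014, "CN", "Toys", 7.0),
    (2015, "CN", "Books", 3.0),
]
# Assumption: the cube covers only these dimensions; "category" is left out
# on purpose so that some queries cannot be served by the cube.
CUBE_DIMENSIONS = ("year", "country")


def aggregate(rows, group_by):
    """Return SUM(sales) grouped by the given column names."""
    indexes = [COLUMNS.index(name) for name in group_by]
    totals = {}
    for row in rows:
        key = tuple(row[i] for i in indexes)
        totals[key] = totals.get(key, 0.0) + row[-1]
    return totals


def build_cube(rows, dimensions):
    """Pre-aggregate one cuboid per subset of the dimensions (the offline step)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cube[dims] = aggregate(rows, dims)
    return cube


def query(cube, rows, group_by):
    """Serve from a pre-built cuboid if one matches, else scan the raw rows."""
    key = tuple(sorted(group_by, key=COLUMNS.index))
    if key in cube:
        return "cube", cube[key]
    # Fallback path, standing in for routing to another SQL engine.
    return "raw", aggregate(rows, key)
```

For example, `query(build_cube(ROWS, CUBE_DIMENSIONS), ROWS, ["country"])` is answered from a pre-built cuboid, while grouping by `"category"` takes the fallback path because that column is not part of the cube.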