SlideShare a Scribd company logo
1 of 34
Apache Kylin
Balance between Space and Time
Debashis Saha | Luke Han
2015-06-09
http://kylin.io | @ApacheKylin
About us
§Debashis Saha (@debashis_saha )
- VP, eBay Cloud Services (Platform, Infrastructure, Data)
§Luke Han (@lukehq)
- Sr. Product Manager, Analytics Data Infrastructure
- Committer & PMC Member of Apache Kylin
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
What
kylin	
  	
  /	
  ˈkiːˈlɪn	
  /	
  麒麟
-­‐-­‐n.	
  (in	
  Chinese	
  art)	
  a	
  mythical	
  animal	
  of	
  composite	
  form	
  
Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine, contributed
by eBay Inc., provides SQL interface and multi-dimensional
analysis (OLAP) on Hadoop supporting extremely large datasets
http://kylin.io
• Open	
  Sourced	
  on	
  Oct	
  1st,	
  2014	
  
• Accepted	
  as	
  Apache	
  Incubator	
  Project	
  on	
  Nov	
  25th,	
  2014	
  
• http://kylin.io	
  (http://kylin.incubator.apache.org)
@ApacheKylin
§ External
§ 25+ contributors in community
§ Adoption:
§ On Production: Baidu Map
§ Evaluation: Huawei, Bloomberg Law, British Gas, JD.com
Microsoft, Tableau…
§ eBay Internal Cases
- 90% ile query < 5 seconds
Case Cube	
  Size Raw	
  Records
User	
  Session	
  Analysis 26	
  TB 28+	
  billion	
  rows
Traffic	
  Analysis 21	
  TB 20+	
  billion	
  rows
Behavior	
  Analysis 560	
  GB 1.2+	
  billion	
  rows
—from mailing list
Who are using Kylin?
http://kylin.io
Why
http://kylin.io
Happiness
Latency
10s
size
Balance Between Space and Time
http://kylin.io
time, item
time, item, location
time, item, location, supplier
time item location supplier
time, location
Time, supplier
item, location
item, supplier
location, supplier
time, item, supplier
time, location, supplier
item, location, supplier
0-D(apex) cuboid
1-D cuboids
2-D cuboids
3-D cuboids
4-D(base) cuboid
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
• Cuboid = one combination of dimensions
• Cube = all combination of dimensions
(all cuboids)
OLAP Cube
How
http://kylin.io
Map Reduce
Kylin
BI Tools,Web App…
ANSI SQL
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
Feature Highlights
• Extremely Fast OLAP Engine at scale
• ANSI SQL Interface on Hadoop
• Seamless Integration with BI Tools, like Tableau
• Interactive Query Capability
• MOLAP Cube
• Incremental Build of Cubes
• Approximate Query Capability for Distinct Count (HyperLogLog)
• Leverage HBase Coprocessor for query latency
• Job Management and Monitoring
• User friendly Web GUI for manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration
http://kylin.io
Define Data Model
http://kylin.io
Manage Jobs
http://kylin.io
Explore the Data
http://kylin.io
Interactive with BI Tool - Tableau
http://kylin.io
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
http://kylin.io
Cube	
  Build	
  Engine	
  
(MapReduce…)
SQL
Low	
  	
  Latency	
  -­‐	
  SecondsMid	
  Latency	
  -­‐	
  Minutes
Routing
3rd	
  Party	
  App	
  
(Web	
  App,	
  Mobile…)
Metadata
SQL-­‐Based	
  Tool	
  
(BI	
  Tools:	
  Tableau…)
Query	
  Engine
Hadoop
Hive
REST	
  API JDBC/ODBC
➢Online	
  Analysis	
  Data	
  Flow	
  
➢Offline	
  Data	
  Flow	
  
➢Clients/Users	
  interactive	
  with	
  Kylin	
  
via	
  SQL	
  
➢OLAP	
  Cube	
  is	
  transparent	
  to	
  users
Star	
  Schema	
  Data Key	
  Value	
  Data
Data	
  
Cube
OLAP	
  
Cube	
  
(HBase)
SQL
REST	
  Server
Kylin Architecture Overview
http://kylin.io
Cube:	
  …	
  
Fact	
  Table:	
  …	
  
Dimensions:	
  …	
  
Measures:	
  …	
  
Storage(HBase):	
  …
Fact
Dim Dim
Dim
Source	
  
Star	
  Schema
row	
  A
row	
  B
row	
  C
Column	
  Family
Val	
  1
Val	
  2
Val	
  3
Row	
  Key Column
Target	
  	
  
HBase	
  Storage
Mapping	
  
Cube	
  Metadata
End	
  User Cube	
  Modeler Admin
Data Modeling
http://kylin.io
Cube Build Job Flow
http://kylin.io
How to Store Cube - HBase Schema
http://kylin.io
SELECT	
  test_cal_dt.week_beg_dt,	
  
test_category.category_name,	
  test_category.lvl2_name,	
  
test_category.lvl3_name,	
  test_kylin_fact.lstg_format_name,	
  	
  
test_sites.site_name,	
  SUM(test_kylin_fact.price)	
  AS	
  GMV,	
  
COUNT(*)	
  AS	
  TRANS_CNT	
  
FROM	
  	
  test_kylin_fact	
  	
  
	
  	
  LEFT	
  JOIN	
  test_cal_dt	
  ON	
  test_kylin_fact.cal_dt	
  =	
  
test_cal_dt.cal_dt	
  
	
  	
  LEFT	
  JOIN	
  test_category	
  ON	
  test_kylin_fact.leaf_categ_id	
  =	
  
test_category.leaf_categ_id	
  	
  AND	
  test_kylin_fact.lstg_site_id	
  
=	
  test_category.site_id	
  
	
  	
  LEFT	
  JOIN	
  test_sites	
  ON	
  test_kylin_fact.lstg_site_id	
  =	
  
test_sites.site_id	
  
WHERE	
  test_kylin_fact.seller_id	
  =	
  123456OR	
  
test_kylin_fact.lstg_format_name	
  =	
  ’New'	
  
GROUP	
  BY	
  test_cal_dt.week_beg_dt,	
  
test_category.category_name,	
  test_category.lvl2_name,	
  
test_category.lvl3_name,	
  
test_kylin_fact.lstg_format_name,test_sites.site_name
OLAPToEnumerableConverter	
  
	
  	
  OLAPProjectRel(WEEK_BEG_DT=[$0],	
  category_name=[$1],	
  
CATEG_LVL2_NAME=[$2],	
  CATEG_LVL3_NAME=[$3],	
  LSTG_FORMAT_NAME=[$4],	
  
SITE_NAME=[$5],	
  GMV=[CASE(=($7,	
  0),	
  null,	
  $6)],	
  TRANS_CNT=[$8])	
  
	
  	
  	
  	
  OLAPAggregateRel(group=[{0,	
  1,	
  2,	
  3,	
  4,	
  5}],	
  agg#0=[$SUM0($6)],	
  
agg#1=[COUNT($6)],	
  TRANS_CNT=[COUNT()])	
  
	
  	
  	
  	
  	
  	
  OLAPProjectRel(WEEK_BEG_DT=[$13],	
  category_name=[$21],	
  
CATEG_LVL2_NAME=[$15],	
  CATEG_LVL3_NAME=[$14],	
  
LSTG_FORMAT_NAME=[$5],	
  SITE_NAME=[$23],	
  PRICE=[$0])	
  
	
  	
  	
  	
  	
  	
  	
  	
  OLAPFilterRel(condition=[OR(=($3,	
  123456),	
  =($5,	
  ’New'))])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[=($2,	
  $25)],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[AND(=($6,	
  $22),	
  =($2,	
  $17))],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[=($4,	
  $12)],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_KYLIN_FACT]],	
  fields=[[0,	
  1,	
  2,	
  3,	
  
4,	
  5,	
  6,	
  7,	
  8,	
  9,	
  10,	
  11]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_CAL_DT]],	
  fields=[[0,	
  1]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  test_category]],	
  fields=[[0,	
  1,	
  2,	
  3,	
  4,	
  5,	
  
6,	
  7,	
  8]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_SITES]],	
  fields=[[0,	
  1,	
  2]])
Kylin Query Engine - Explain Plan
Cube Optimization
§ Curse	
  of	
  Dimensionality	
  
§ N	
  dimension	
  cube	
  has	
  2N	
  cuboid	
  
§ Full	
  Cube	
  vs.	
  Parqal	
  Cube	
  	
  
§ Huge	
  Data	
  Volume	
  	
  
§ Dicqonary	
  Encoding	
  
§ Incremental	
  Building
http://kylin.io
§ Full	
  Cube
- Pre-aggregate all dimension combinations
- “Curse of dimensionality”: N dimension cube has 2N cuboid.
§ ParOal	
  Cube
- To avoid dimension explosion, we divide the dimensions into different aggregation
groups
- 2N+M+L à 2N + 2M + 2L
- For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid
number will reduce from 1 Billion to 3 Thousands
- 230 à 210 + 210 + 210
- Tradeoff between online aggregation and offline pre-aggregation
http://kylin.io
Full Cube vs. Partical Cube
http://kylin.io
Partical Cube
http://kylin.io
Incremental Build
What’s Next
§ Improve	
  cube	
  algorithm	
  
§ Cube	
  by	
  segments,	
  30%-­‐50%	
  faster	
  
§ Build	
  delay	
  down	
  to	
  tens	
  of	
  minutes	
  
§ Streaming	
  cubing	
  
§ Analyze	
  real-­‐qme	
  data	
  
§ Build	
  delay	
  down	
  to	
  seconds	
  
§ Spark
http://kylin.io
Cube by Layer
§ The current algorithm
- Many MRs, the number of
dimensions
- Huge shuffles, aggregation at
reduce side, 100x of total cube
size
http://kylin.io
Full	
  Data
0-­‐D	
  Cuboid
1-­‐D	
  Cuboid
2-­‐D	
  Cuboid
3-­‐D	
  Cuboid
4-­‐D	
  Cuboid
MR
MR
MR
MR
MR
Cube by Segments
§ The to-be algorithm,
30%-50% faster
- 1 round MR
- Reduced shuffles, map side
aggregation, 20x total cube
size
- Hourly incremental build done
in tens of minutes
http://kylin.io
Data	
  Split
Cube	
  Segment
Data	
  Split
Cube	
  Segment
Data	
  Split
Cube	
  Segment
……
Final	
  Cube
Merge	
  Sort	
  
(Shuffle)
mapper mapper mapper
Streaming Cubing
§ Cube is great but…
- Cube takes time to build, how about real-time analysis?
- Sometimes we want to drill down to row level information
§ Streaming cubing
- Build micro cube segments from streaming
- Use inverted index to capture last minute data
http://kylin.io
http://kylin.io
Streaming
seconds	
  delay
Last	
  Hour
Inverted	
  
Index
Before	
  Last	
  Hour
Cube
Kylin Lambda Architecture
minutes	
  delay
Query	
  Engine
ANSI	
  SQL
Hybrid Storage
Interface
Adding Spark Support
§ Cubing Efficiency
§ MR is not optimal framework
§ Spark Cubing Engine
§ Source from SparkSQL
§ Read data from SparkSQL instead of Hive
§ Route to SparkSQL
§ Unsupported queries be coved by SparkSQL
http://kylin.io
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
Future2015 2016
Kylin Evolution Roadmap
http://kylin.io
20142013
Initial
Prototype	
  for	
  
MOLAP	
  
• Basic	
  end	
  to	
  end	
  POC	
  
MOLAP	
  
• Incremental	
  Refresh	
  
• ANSI	
  SQL	
  
• ODBC	
  Driver	
  
• Web	
  GUI	
  
• Tableau	
  
• ACL	
  
• Open	
  Source
StreamingOLAP	
  
• Streaming	
  OLAP	
  
• JDBC	
  Driver	
  
• New	
  UI	
  
• Excel	
  
• SparkSQL	
  
• …	
  more	
  
TBD
Sep,	
  2013
Jan,	
  2014
Oct,	
  2014
H1,	
  2015
HybridOLAP	
  
• Lambda	
  Arch	
  
• Automation	
  
• Capacity	
  
Management	
  
• Spark	
  	
  
• …	
  more
Next	
  Gen	
  
• Adv	
  OLAP	
  Functions	
  
• In-­‐Memory	
  Analysis	
  
(TBD)	
  
• Mobile	
  (TBD)	
  
• …	
  more
Kylin Ecosystem
■ Kylin Core
■ Fundamental framework of Kylin OLAP Engine
■ Extension
■ Plugins to support for additional functions and features
■ Integration
■ Lifecycle Management Support to integrate with other
applications
■ Interface
■ Allows for third party users to build more features via user-
interface atop Kylin core
http://kylin.io
Kylin OLAP
Core
Extension
à Security
à Redis Storage
à Spark Engine
à Docker
Interface
à Web Console
à Customized BI
à Ambari/Hue Plugin
Integration
à ODBC Driver
à ETL
à Drill
à SparkSQL
http://kylin.io
If	
  you	
  want	
  to	
  go	
  fast,	
  go	
  alone.	
  
If	
  you	
  want	
  to	
  go	
  far,	
  go	
  together.
-­‐-­‐African	
  Proverb
dev@kylin.incubator.apache.org

More Related Content

What's hot

Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming hongbin ma
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseHBaseCon
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
 
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big dataApache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big dataShi Shao Feng
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupLuke Han
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @ShanghaiLuke Han
 
Kylin olap part 1- getting started
Kylin olap   part 1- getting startedKylin olap   part 1- getting started
Kylin olap part 1- getting startedShubham Shirude
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylininovex GmbH
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingLuke Han
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering PrinciplesXu Jiang
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...Luke Han
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanLuke Han
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingLuke Han
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 

What's hot (20)

Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming
 
Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big dataApache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
Kylin olap part 1- getting started
Kylin olap   part 1- getting startedKylin olap   part 1- getting started
Kylin olap part 1- getting started
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering Principles
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 

Viewers also liked

eBay's private Cloud Journey
eBay's private Cloud JourneyeBay's private Cloud Journey
eBay's private Cloud JourneySuneet Nandwani
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageStorage Switzerland
 
Intel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeIntel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeLow Hong Chuan
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit
 
Hard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveHard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveDac Khue Nguyen
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressOdinot Stanislas
 

Viewers also liked (7)

eBay's private Cloud Journey
eBay's private Cloud JourneyeBay's private Cloud Journey
eBay's private Cloud Journey
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash Storage
 
Solid state drives
Solid state drivesSolid state drives
Solid state drives
 
Intel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIeIntel ssd dc data center family for PCIe
Intel ssd dc data center family for PCIe
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu Adunuthula
 
Hard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveHard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State Drive
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM Express
 

Similar to Apache Kylin: Balance Between Space and Time for Big Data Analytics

ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015Luke Han
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamNeville Li
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeDataWorks Summit
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeDatabricks
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with SupersetDataWorks Summit
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Qiming Teng
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayGrega Kespret
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinTyler Wishnoff
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkDatabricks
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQLYu Ishikawa
 
How and why we evolved a legacy Java web application to Scala... and we are s...
How and why we evolved a legacy Java web application to Scala... and we are s...How and why we evolved a legacy Java web application to Scala... and we are s...
How and why we evolved a legacy Java web application to Scala... and we are s...Katia Aresti
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksGrega Kespret
 

Similar to Apache Kylin: Balance Between Space and Time for Big Data Analytics (20)

ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
How we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the wayHow we evolved data pipeline at Celtra and what we learned along the way
How we evolved data pipeline at Celtra and what we learned along the way
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
 
How and why we evolved a legacy Java web application to Scala... and we are s...
How and why we evolved a legacy Java web application to Scala... and we are s...How and why we evolved a legacy Java web application to Scala... and we are s...
How and why we evolved a legacy Java web application to Scala... and we are s...
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 

Recently uploaded

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Apache Kylin: Balance Between Space and Time for Big Data Analytics

  • 1. Apache Kylin Balance between Space and Time Debashis Saha | Luke Han 2015-06-09 http://kylin.io | @ApacheKylin
  • 2. About us §Debashis Saha (@debashis_saha ) - VP, eBay Cloud Services (Platform, Infrastructure, Data) §Luke Han (@lukehq) - Sr. Product Manager, Analytics Data Infrastructure - Committer & PMC Member of Apache Kylin
  • 3. Agenda §About Apache Kylin §Feature Highlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 4. What kylin    /  ˈkiːˈlɪn  /  麒麟 -­‐-­‐n.  (in  Chinese  art)  a  mythical  animal  of  composite  form   Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine, contributed by eBay Inc., provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets http://kylin.io • Open  Sourced  on  Oct  1st,  2014   • Accepted  as  Apache  Incubator  Project  on  Nov  25th,  2014   • http://kylin.io  (http://kylin.incubator.apache.org) @ApacheKylin
  • 5. § External § 25+ contributors in community § Adoption: § On Production: Baidu Map § Evaluation: Huawei, Bloomberg Law, British Gas, JD.com Microsoft, Tableau… § eBay Internal Cases - 90% ile query < 5 seconds Case Cube  Size Raw  Records User  Session  Analysis 26  TB 28+  billion  rows Traffic  Analysis 21  TB 20+  billion  rows Behavior  Analysis 560  GB 1.2+  billion  rows —from mailing list Who are using Kylin? http://kylin.io
  • 7. Balance Between Space and Time http://kylin.io time, item time, item, location time, item, location, supplier time item location supplier time, location Time, supplier item, location item, supplier location, supplier time, item, supplier time, location, supplier item, location, supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid • Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item> • Cuboid = one combination of dimensions • Cube = all combination of dimensions (all cuboids) OLAP Cube
  • 9. Agenda §About Apache Kylin §Feature Highlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 10. Feature Highlights • Extremely Fast OLAP Engine at scale • ANSI SQL Interface on Hadoop • Seamless Integration with BI Tools, like Tableau • Interactive Query Capability • MOLAP Cube • Incremental Build of Cubes • Approximate Query Capability for Distinct Count (HyperLogLog) • Leverage HBase Coprocessor for query latency • Job Management and Monitoring • User friendly Web GUI for manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration http://kylin.io
  • 14. Interactive with BI Tool - Tableau http://kylin.io
  • 15. Agenda §About Apache Kylin §Feature Highlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 16. http://kylin.io Cube  Build  Engine   (MapReduce…) SQL Low    Latency  -­‐  SecondsMid  Latency  -­‐  Minutes Routing 3rd  Party  App   (Web  App,  Mobile…) Metadata SQL-­‐Based  Tool   (BI  Tools:  Tableau…) Query  Engine Hadoop Hive REST  API JDBC/ODBC ➢Online  Analysis  Data  Flow   ➢Offline  Data  Flow   ➢Clients/Users  interactive  with  Kylin   via  SQL   ➢OLAP  Cube  is  transparent  to  users Star  Schema  Data Key  Value  Data Data   Cube OLAP   Cube   (HBase) SQL REST  Server Kylin Architecture Overview
  • 17. http://kylin.io Cube:  …   Fact  Table:  …   Dimensions:  …   Measures:  …   Storage(HBase):  … Fact Dim Dim Dim Source   Star  Schema row  A row  B row  C Column  Family Val  1 Val  2 Val  3 Row  Key Column Target     HBase  Storage Mapping   Cube  Metadata End  User Cube  Modeler Admin Data Modeling
  • 19. http://kylin.io How to Store Cube - HBase Schema
  • 20. http://kylin.io SELECT  test_cal_dt.week_beg_dt,   test_category.category_name,  test_category.lvl2_name,   test_category.lvl3_name,  test_kylin_fact.lstg_format_name,     test_sites.site_name,  SUM(test_kylin_fact.price)  AS  GMV,   COUNT(*)  AS  TRANS_CNT   FROM    test_kylin_fact        LEFT  JOIN  test_cal_dt  ON  test_kylin_fact.cal_dt  =   test_cal_dt.cal_dt      LEFT  JOIN  test_category  ON  test_kylin_fact.leaf_categ_id  =   test_category.leaf_categ_id    AND  test_kylin_fact.lstg_site_id   =  test_category.site_id      LEFT  JOIN  test_sites  ON  test_kylin_fact.lstg_site_id  =   test_sites.site_id   WHERE  test_kylin_fact.seller_id  =  123456OR   test_kylin_fact.lstg_format_name  =  ’New'   GROUP  BY  test_cal_dt.week_beg_dt,   test_category.category_name,  test_category.lvl2_name,   test_category.lvl3_name,   test_kylin_fact.lstg_format_name,test_sites.site_name OLAPToEnumerableConverter      OLAPProjectRel(WEEK_BEG_DT=[$0],  category_name=[$1],   CATEG_LVL2_NAME=[$2],  CATEG_LVL3_NAME=[$3],  LSTG_FORMAT_NAME=[$4],   SITE_NAME=[$5],  GMV=[CASE(=($7,  0),  null,  $6)],  TRANS_CNT=[$8])          OLAPAggregateRel(group=[{0,  1,  2,  3,  4,  5}],  agg#0=[$SUM0($6)],   agg#1=[COUNT($6)],  TRANS_CNT=[COUNT()])              OLAPProjectRel(WEEK_BEG_DT=[$13],  category_name=[$21],   CATEG_LVL2_NAME=[$15],  CATEG_LVL3_NAME=[$14],   LSTG_FORMAT_NAME=[$5],  SITE_NAME=[$23],  PRICE=[$0])                  OLAPFilterRel(condition=[OR(=($3,  123456),  =($5,  ’New'))])                      OLAPJoinRel(condition=[=($2,  $25)],  joinType=[left])                          OLAPJoinRel(condition=[AND(=($6,  $22),  =($2,  $17))],  joinType=[left])                              OLAPJoinRel(condition=[=($4,  $12)],  joinType=[left])                                  OLAPTableScan(table=[[DEFAULT,  TEST_KYLIN_FACT]],  fields=[[0,  1,  2,  3,   4,  5,  6,  7,  8,  9,  10,  11]])                                  OLAPTableScan(table=[[DEFAULT,  TEST_CAL_DT]],  fields=[[0,  1]])                              OLAPTableScan(table=[[DEFAULT,  test_category]],  fields=[[0,  1,  2,  3,  4,  5,   6,  7,  8]])                          OLAPTableScan(table=[[DEFAULT,  TEST_SITES]],  fields=[[0,  1,  2]]) Kylin Query Engine - Explain Plan
  • 21. Cube Optimization § Curse  of  Dimensionality   § N  dimension  cube  has  2N  cuboid   § Full  Cube  vs.  Parqal  Cube     § Huge  Data  Volume     § Dicqonary  Encoding   § Incremental  Building http://kylin.io
  • 22. § Full  Cube - Pre-aggregate all dimension combinations - “Curse of dimensionality”: N dimension cube has 2N cuboid. § ParOal  Cube - To avoid dimension explosion, we divide the dimensions into different aggregation groups - 2N+M+L à 2N + 2M + 2L - For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid number will reduce from 1 Billion to 3 Thousands - 230 à 210 + 210 + 210 - Tradeoff between online aggregation and offline pre-aggregation http://kylin.io Full Cube vs. Partical Cube
  • 25. What’s Next § Improve  cube  algorithm   § Cube  by  segments,  30%-­‐50%  faster   § Build  delay  down  to  tens  of  minutes   § Streaming  cubing   § Analyze  real-­‐qme  data   § Build  delay  down  to  seconds   § Spark http://kylin.io
  • 26. Cube by Layer § The current algorithm - Many MRs, the number of dimensions - Huge shuffles, aggregation at reduce side, 100x of total cube size http://kylin.io Full  Data 0-­‐D  Cuboid 1-­‐D  Cuboid 2-­‐D  Cuboid 3-­‐D  Cuboid 4-­‐D  Cuboid MR MR MR MR MR
  • 27. Cube by Segments § The to-be algorithm, 30%-50% faster - 1 round MR - Reduced shuffles, map side aggregation, 20x total cube size - Hourly incremental build done in tens of minutes http://kylin.io Data  Split Cube  Segment Data  Split Cube  Segment Data  Split Cube  Segment …… Final  Cube Merge  Sort   (Shuffle) mapper mapper mapper
  • 28. Streaming Cubing § Cube is great but… - Cube takes time to build, how about real-time analysis? - Sometimes we want to drill down to row level information § Streaming cubing - Build micro cube segments from streaming - Use inverted index to capture last minute data http://kylin.io
  • 29. http://kylin.io Streaming seconds  delay Last  Hour Inverted   Index Before  Last  Hour Cube Kylin Lambda Architecture minutes  delay Query  Engine ANSI  SQL Hybrid Storage Interface
  • 30. Adding Spark Support § Cubing Efficiency § MR is not optimal framework § Spark Cubing Engine § Source from SparkSQL § Read data from SparkSQL instead of Hive § Route to SparkSQL § Unsupported queries be coved by SparkSQL http://kylin.io
  • 31. Agenda §About Apache Kylin §Feature Highlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 32. Future2015 2016 Kylin Evolution Roadmap http://kylin.io 20142013 Initial Prototype  for   MOLAP   • Basic  end  to  end  POC   MOLAP   • Incremental  Refresh   • ANSI  SQL   • ODBC  Driver   • Web  GUI   • Tableau   • ACL   • Open  Source StreamingOLAP   • Streaming  OLAP   • JDBC  Driver   • New  UI   • Excel   • SparkSQL   • …  more   TBD Sep,  2013 Jan,  2014 Oct,  2014 H1,  2015 HybridOLAP   • Lambda  Arch   • Automation   • Capacity   Management   • Spark     • …  more Next  Gen   • Adv  OLAP  Functions   • In-­‐Memory  Analysis   (TBD)   • Mobile  (TBD)   • …  more
  • 33. Kylin Ecosystem ■ Kylin Core ■ Fundamental framework of Kylin OLAP Engine ■ Extension ■ Plugins to support for additional functions and features ■ Integration ■ Lifecycle Management Support to integrate with other applications ■ Interface ■ Allows for third party users to build more features via user- interface atop Kylin core http://kylin.io Kylin OLAP Core Extension à Security à Redis Storage à Spark Engine à Docker Interface à Web Console à Customized BI à Ambari/Hue Plugin Integration à ODBC Driver à ETL à Drill à SparkSQL
  • 34. http://kylin.io If  you  want  to  go  fast,  go  alone.   If  you  want  to  go  far,  go  together. -­‐-­‐African  Proverb dev@kylin.incubator.apache.org