Apache Kylin
Balance between Space and Time
Debashis Saha | Luke Han
2015-06-09
http://kylin.io | @ApacheKylin
About us
§Debashis Saha (@debashis_saha )
- VP, eBay Cloud Services (Platform, Infrastructure, Data)
§Luke Han (@lukehq)
- Sr. Product Manager, Analytics Data Infrastructure
- Committer & PMC Member of Apache Kylin
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
What
kylin	
  	
  /	
  ˈkiːˈlɪn	
  /	
  麒麟
-­‐-­‐n.	
  (in	
  Chinese	
  art)	
  a	
  mythical	
  animal	
  of	
  composite	
  form	
  
Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine, contributed
by eBay Inc., provides SQL interface and multi-dimensional
analysis (OLAP) on Hadoop supporting extremely large datasets
http://kylin.io
• Open	
  Sourced	
  on	
  Oct	
  1st,	
  2014	
  
• Accepted	
  as	
  Apache	
  Incubator	
  Project	
  on	
  Nov	
  25th,	
  2014	
  
• http://kylin.io	
  (http://kylin.incubator.apache.org)
@ApacheKylin
§ External
§ 25+ contributors in community
§ Adoption:
§ On Production: Baidu Map
§ Evaluation: Huawei, Bloomberg Law, British Gas, JD.com
Microsoft, Tableau…
§ eBay Internal Cases
- 90% ile query < 5 seconds
Case Cube	
  Size Raw	
  Records
User	
  Session	
  Analysis 26	
  TB 28+	
  billion	
  rows
Traffic	
  Analysis 21	
  TB 20+	
  billion	
  rows
Behavior	
  Analysis 560	
  GB 1.2+	
  billion	
  rows
—from mailing list
Who are using Kylin?
http://kylin.io
Why
http://kylin.io
Happiness
Latency
10s
size
Balance Between Space and Time
http://kylin.io
time, item
time, item, location
time, item, location, supplier
time item location supplier
time, location
Time, supplier
item, location
item, supplier
location, supplier
time, item, supplier
time, location, supplier
item, location, supplier
0-D(apex) cuboid
1-D cuboids
2-D cuboids
3-D cuboids
4-D(base) cuboid
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
• Cuboid = one combination of dimensions
• Cube = all combination of dimensions
(all cuboids)
OLAP Cube
How
http://kylin.io
Map Reduce
Kylin
BI Tools,Web App…
ANSI SQL
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
Feature Highlights
• Extremely Fast OLAP Engine at scale
• ANSI SQL Interface on Hadoop
• Seamless Integration with BI Tools, like Tableau
• Interactive Query Capability
• MOLAP Cube
• Incremental Build of Cubes
• Approximate Query Capability for Distinct Count (HyperLogLog)
• Leverage HBase Coprocessor for query latency
• Job Management and Monitoring
• User friendly Web GUI for manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration
http://kylin.io
Define Data Model
http://kylin.io
Manage Jobs
http://kylin.io
Explore the Data
http://kylin.io
Interactive with BI Tool - Tableau
http://kylin.io
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
http://kylin.io
Cube	
  Build	
  Engine	
  
(MapReduce…)
SQL
Low	
  	
  Latency	
  -­‐	
  SecondsMid	
  Latency	
  -­‐	
  Minutes
Routing
3rd	
  Party	
  App	
  
(Web	
  App,	
  Mobile…)
Metadata
SQL-­‐Based	
  Tool	
  
(BI	
  Tools:	
  Tableau…)
Query	
  Engine
Hadoop
Hive
REST	
  API JDBC/ODBC
➢Online	
  Analysis	
  Data	
  Flow	
  
➢Offline	
  Data	
  Flow	
  
➢Clients/Users	
  interactive	
  with	
  Kylin	
  
via	
  SQL	
  
➢OLAP	
  Cube	
  is	
  transparent	
  to	
  users
Star	
  Schema	
  Data Key	
  Value	
  Data
Data	
  
Cube
OLAP	
  
Cube	
  
(HBase)
SQL
REST	
  Server
Kylin Architecture Overview
http://kylin.io
Cube:	
  …	
  
Fact	
  Table:	
  …	
  
Dimensions:	
  …	
  
Measures:	
  …	
  
Storage(HBase):	
  …
Fact
Dim Dim
Dim
Source	
  
Star	
  Schema
row	
  A
row	
  B
row	
  C
Column	
  Family
Val	
  1
Val	
  2
Val	
  3
Row	
  Key Column
Target	
  	
  
HBase	
  Storage
Mapping	
  
Cube	
  Metadata
End	
  User Cube	
  Modeler Admin
Data Modeling
http://kylin.io
Cube Build Job Flow
http://kylin.io
How to Store Cube - HBase Schema
http://kylin.io
SELECT	
  test_cal_dt.week_beg_dt,	
  
test_category.category_name,	
  test_category.lvl2_name,	
  
test_category.lvl3_name,	
  test_kylin_fact.lstg_format_name,	
  	
  
test_sites.site_name,	
  SUM(test_kylin_fact.price)	
  AS	
  GMV,	
  
COUNT(*)	
  AS	
  TRANS_CNT	
  
FROM	
  	
  test_kylin_fact	
  	
  
	
  	
  LEFT	
  JOIN	
  test_cal_dt	
  ON	
  test_kylin_fact.cal_dt	
  =	
  
test_cal_dt.cal_dt	
  
	
  	
  LEFT	
  JOIN	
  test_category	
  ON	
  test_kylin_fact.leaf_categ_id	
  =	
  
test_category.leaf_categ_id	
  	
  AND	
  test_kylin_fact.lstg_site_id	
  
=	
  test_category.site_id	
  
	
  	
  LEFT	
  JOIN	
  test_sites	
  ON	
  test_kylin_fact.lstg_site_id	
  =	
  
test_sites.site_id	
  
WHERE	
  test_kylin_fact.seller_id	
  =	
  123456OR	
  
test_kylin_fact.lstg_format_name	
  =	
  ’New'	
  
GROUP	
  BY	
  test_cal_dt.week_beg_dt,	
  
test_category.category_name,	
  test_category.lvl2_name,	
  
test_category.lvl3_name,	
  
test_kylin_fact.lstg_format_name,test_sites.site_name
OLAPToEnumerableConverter	
  
	
  	
  OLAPProjectRel(WEEK_BEG_DT=[$0],	
  category_name=[$1],	
  
CATEG_LVL2_NAME=[$2],	
  CATEG_LVL3_NAME=[$3],	
  LSTG_FORMAT_NAME=[$4],	
  
SITE_NAME=[$5],	
  GMV=[CASE(=($7,	
  0),	
  null,	
  $6)],	
  TRANS_CNT=[$8])	
  
	
  	
  	
  	
  OLAPAggregateRel(group=[{0,	
  1,	
  2,	
  3,	
  4,	
  5}],	
  agg#0=[$SUM0($6)],	
  
agg#1=[COUNT($6)],	
  TRANS_CNT=[COUNT()])	
  
	
  	
  	
  	
  	
  	
  OLAPProjectRel(WEEK_BEG_DT=[$13],	
  category_name=[$21],	
  
CATEG_LVL2_NAME=[$15],	
  CATEG_LVL3_NAME=[$14],	
  
LSTG_FORMAT_NAME=[$5],	
  SITE_NAME=[$23],	
  PRICE=[$0])	
  
	
  	
  	
  	
  	
  	
  	
  	
  OLAPFilterRel(condition=[OR(=($3,	
  123456),	
  =($5,	
  ’New'))])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[=($2,	
  $25)],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[AND(=($6,	
  $22),	
  =($2,	
  $17))],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPJoinRel(condition=[=($4,	
  $12)],	
  joinType=[left])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_KYLIN_FACT]],	
  fields=[[0,	
  1,	
  2,	
  3,	
  
4,	
  5,	
  6,	
  7,	
  8,	
  9,	
  10,	
  11]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_CAL_DT]],	
  fields=[[0,	
  1]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  test_category]],	
  fields=[[0,	
  1,	
  2,	
  3,	
  4,	
  5,	
  
6,	
  7,	
  8]])	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OLAPTableScan(table=[[DEFAULT,	
  TEST_SITES]],	
  fields=[[0,	
  1,	
  2]])
Kylin Query Engine - Explain Plan
Cube Optimization
§ Curse	
  of	
  Dimensionality	
  
§ N	
  dimension	
  cube	
  has	
  2N	
  cuboid	
  
§ Full	
  Cube	
  vs.	
  Parqal	
  Cube	
  	
  
§ Huge	
  Data	
  Volume	
  	
  
§ Dicqonary	
  Encoding	
  
§ Incremental	
  Building
http://kylin.io
§ Full	
  Cube
- Pre-aggregate all dimension combinations
- “Curse of dimensionality”: N dimension cube has 2N cuboid.
§ ParOal	
  Cube
- To avoid dimension explosion, we divide the dimensions into different aggregation
groups
- 2N+M+L à 2N + 2M + 2L
- For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid
number will reduce from 1 Billion to 3 Thousands
- 230 à 210 + 210 + 210
- Tradeoff between online aggregation and offline pre-aggregation
http://kylin.io
Full Cube vs. Partical Cube
http://kylin.io
Partical Cube
http://kylin.io
Incremental Build
What’s Next
§ Improve	
  cube	
  algorithm	
  
§ Cube	
  by	
  segments,	
  30%-­‐50%	
  faster	
  
§ Build	
  delay	
  down	
  to	
  tens	
  of	
  minutes	
  
§ Streaming	
  cubing	
  
§ Analyze	
  real-­‐qme	
  data	
  
§ Build	
  delay	
  down	
  to	
  seconds	
  
§ Spark
http://kylin.io
Cube by Layer
§ The current algorithm
- Many MRs, the number of
dimensions
- Huge shuffles, aggregation at
reduce side, 100x of total cube
size
http://kylin.io
Full	
  Data
0-­‐D	
  Cuboid
1-­‐D	
  Cuboid
2-­‐D	
  Cuboid
3-­‐D	
  Cuboid
4-­‐D	
  Cuboid
MR
MR
MR
MR
MR
Cube by Segments
§ The to-be algorithm,
30%-50% faster
- 1 round MR
- Reduced shuffles, map side
aggregation, 20x total cube
size
- Hourly incremental build done
in tens of minutes
http://kylin.io
Data	
  Split
Cube	
  Segment
Data	
  Split
Cube	
  Segment
Data	
  Split
Cube	
  Segment
……
Final	
  Cube
Merge	
  Sort	
  
(Shuffle)
mapper mapper mapper
Streaming Cubing
§ Cube is great but…
- Cube takes time to build, how about real-time analysis?
- Sometimes we want to drill down to row level information
§ Streaming cubing
- Build micro cube segments from streaming
- Use inverted index to capture last minute data
http://kylin.io
http://kylin.io
Streaming
seconds	
  delay
Last	
  Hour
Inverted	
  
Index
Before	
  Last	
  Hour
Cube
Kylin Lambda Architecture
minutes	
  delay
Query	
  Engine
ANSI	
  SQL
Hybrid Storage
Interface
Adding Spark Support
§ Cubing Efficiency
§ MR is not optimal framework
§ Spark Cubing Engine
§ Source from SparkSQL
§ Read data from SparkSQL instead of Hive
§ Route to SparkSQL
§ Unsupported queries be coved by SparkSQL
http://kylin.io
Agenda
§About Apache Kylin
§Feature Highlights
§Tech Highlights
§Roadmap
§Q&A
http://kylin.io
Future2015 2016
Kylin Evolution Roadmap
http://kylin.io
20142013
Initial
Prototype	
  for	
  
MOLAP	
  
• Basic	
  end	
  to	
  end	
  POC	
  
MOLAP	
  
• Incremental	
  Refresh	
  
• ANSI	
  SQL	
  
• ODBC	
  Driver	
  
• Web	
  GUI	
  
• Tableau	
  
• ACL	
  
• Open	
  Source
StreamingOLAP	
  
• Streaming	
  OLAP	
  
• JDBC	
  Driver	
  
• New	
  UI	
  
• Excel	
  
• SparkSQL	
  
• …	
  more	
  
TBD
Sep,	
  2013
Jan,	
  2014
Oct,	
  2014
H1,	
  2015
HybridOLAP	
  
• Lambda	
  Arch	
  
• Automation	
  
• Capacity	
  
Management	
  
• Spark	
  	
  
• …	
  more
Next	
  Gen	
  
• Adv	
  OLAP	
  Functions	
  
• In-­‐Memory	
  Analysis	
  
(TBD)	
  
• Mobile	
  (TBD)	
  
• …	
  more
Kylin Ecosystem
■ Kylin Core
■ Fundamental framework of Kylin OLAP Engine
■ Extension
■ Plugins to support for additional functions and features
■ Integration
■ Lifecycle Management Support to integrate with other
applications
■ Interface
■ Allows for third party users to build more features via user-
interface atop Kylin core
http://kylin.io
Kylin OLAP
Core
Extension
à Security
à Redis Storage
à Spark Engine
à Docker
Interface
à Web Console
à Customized BI
à Ambari/Hue Plugin
Integration
à ODBC Driver
à ETL
à Drill
à SparkSQL
http://kylin.io
If	
  you	
  want	
  to	
  go	
  fast,	
  go	
  alone.	
  
If	
  you	
  want	
  to	
  go	
  far,	
  go	
  together.
-­‐-­‐African	
  Proverb
dev@kylin.incubator.apache.org

Apache Kylin - Balance between space and time - Hadoop Summit 2015

  • 1.
    Apache Kylin Balance betweenSpace and Time Debashis Saha | Luke Han 2015-06-09 http://kylin.io | @ApacheKylin
  • 2.
    About us §Debashis Saha(@debashis_saha ) - VP, eBay Cloud Services (Platform, Infrastructure, Data) §Luke Han (@lukehq) - Sr. Product Manager, Analytics Data Infrastructure - Committer & PMC Member of Apache Kylin
  • 3.
    Agenda §About Apache Kylin §FeatureHighlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 4.
    What kylin    /  ˈkiːˈlɪn  /  麒麟 -­‐-­‐n.  (in  Chinese  art)  a  mythical  animal  of  composite  form   Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine, contributed by eBay Inc., provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets http://kylin.io • Open  Sourced  on  Oct  1st,  2014   • Accepted  as  Apache  Incubator  Project  on  Nov  25th,  2014   • http://kylin.io  (http://kylin.incubator.apache.org) @ApacheKylin
  • 5.
    § External § 25+contributors in community § Adoption: § On Production: Baidu Map § Evaluation: Huawei, Bloomberg Law, British Gas, JD.com Microsoft, Tableau… § eBay Internal Cases - 90% ile query < 5 seconds Case Cube  Size Raw  Records User  Session  Analysis 26  TB 28+  billion  rows Traffic  Analysis 21  TB 20+  billion  rows Behavior  Analysis 560  GB 1.2+  billion  rows —from mailing list Who are using Kylin? http://kylin.io
  • 6.
  • 7.
    Balance Between Spaceand Time http://kylin.io time, item time, item, location time, item, location, supplier time item location supplier time, location Time, supplier item, location item, supplier location, supplier time, item, supplier time, location, supplier item, location, supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid • Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item> • Cuboid = one combination of dimensions • Cube = all combination of dimensions (all cuboids) OLAP Cube
  • 8.
  • 9.
    Agenda §About Apache Kylin §FeatureHighlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 10.
    Feature Highlights • ExtremelyFast OLAP Engine at scale • ANSI SQL Interface on Hadoop • Seamless Integration with BI Tools, like Tableau • Interactive Query Capability • MOLAP Cube • Incremental Build of Cubes • Approximate Query Capability for Distinct Count (HyperLogLog) • Leverage HBase Coprocessor for query latency • Job Management and Monitoring • User friendly Web GUI for manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration http://kylin.io
  • 11.
  • 12.
  • 13.
  • 14.
    Interactive with BITool - Tableau http://kylin.io
  • 15.
    Agenda §About Apache Kylin §FeatureHighlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 16.
    http://kylin.io Cube  Build  Engine   (MapReduce…) SQL Low    Latency  -­‐  SecondsMid  Latency  -­‐  Minutes Routing 3rd  Party  App   (Web  App,  Mobile…) Metadata SQL-­‐Based  Tool   (BI  Tools:  Tableau…) Query  Engine Hadoop Hive REST  API JDBC/ODBC ➢Online  Analysis  Data  Flow   ➢Offline  Data  Flow   ➢Clients/Users  interactive  with  Kylin   via  SQL   ➢OLAP  Cube  is  transparent  to  users Star  Schema  Data Key  Value  Data Data   Cube OLAP   Cube   (HBase) SQL REST  Server Kylin Architecture Overview
  • 17.
    http://kylin.io Cube:  …   Fact  Table:  …   Dimensions:  …   Measures:  …   Storage(HBase):  … Fact Dim Dim Dim Source   Star  Schema row  A row  B row  C Column  Family Val  1 Val  2 Val  3 Row  Key Column Target     HBase  Storage Mapping   Cube  Metadata End  User Cube  Modeler Admin Data Modeling
  • 18.
  • 19.
    http://kylin.io How to StoreCube - HBase Schema
  • 20.
    http://kylin.io SELECT  test_cal_dt.week_beg_dt,   test_category.category_name,  test_category.lvl2_name,   test_category.lvl3_name,  test_kylin_fact.lstg_format_name,     test_sites.site_name,  SUM(test_kylin_fact.price)  AS  GMV,   COUNT(*)  AS  TRANS_CNT   FROM    test_kylin_fact        LEFT  JOIN  test_cal_dt  ON  test_kylin_fact.cal_dt  =   test_cal_dt.cal_dt      LEFT  JOIN  test_category  ON  test_kylin_fact.leaf_categ_id  =   test_category.leaf_categ_id    AND  test_kylin_fact.lstg_site_id   =  test_category.site_id      LEFT  JOIN  test_sites  ON  test_kylin_fact.lstg_site_id  =   test_sites.site_id   WHERE  test_kylin_fact.seller_id  =  123456OR   test_kylin_fact.lstg_format_name  =  ’New'   GROUP  BY  test_cal_dt.week_beg_dt,   test_category.category_name,  test_category.lvl2_name,   test_category.lvl3_name,   test_kylin_fact.lstg_format_name,test_sites.site_name OLAPToEnumerableConverter      OLAPProjectRel(WEEK_BEG_DT=[$0],  category_name=[$1],   CATEG_LVL2_NAME=[$2],  CATEG_LVL3_NAME=[$3],  LSTG_FORMAT_NAME=[$4],   SITE_NAME=[$5],  GMV=[CASE(=($7,  0),  null,  $6)],  TRANS_CNT=[$8])          OLAPAggregateRel(group=[{0,  1,  2,  3,  4,  5}],  agg#0=[$SUM0($6)],   agg#1=[COUNT($6)],  TRANS_CNT=[COUNT()])              OLAPProjectRel(WEEK_BEG_DT=[$13],  category_name=[$21],   CATEG_LVL2_NAME=[$15],  CATEG_LVL3_NAME=[$14],   LSTG_FORMAT_NAME=[$5],  SITE_NAME=[$23],  PRICE=[$0])                  OLAPFilterRel(condition=[OR(=($3,  123456),  =($5,  ’New'))])                      OLAPJoinRel(condition=[=($2,  $25)],  joinType=[left])                          OLAPJoinRel(condition=[AND(=($6,  $22),  =($2,  $17))],  joinType=[left])                              OLAPJoinRel(condition=[=($4,  $12)],  joinType=[left])                                  OLAPTableScan(table=[[DEFAULT,  TEST_KYLIN_FACT]],  fields=[[0,  1,  2,  3,   4,  5,  6,  7,  8,  9,  10,  11]])                                  OLAPTableScan(table=[[DEFAULT,  TEST_CAL_DT]],  fields=[[0,  1]])                              OLAPTableScan(table=[[DEFAULT,  test_category]],  fields=[[0,  1,  2,  3,  4,  5,   6,  7,  8]])                          OLAPTableScan(table=[[DEFAULT,  TEST_SITES]],  fields=[[0,  1,  2]]) Kylin Query Engine - Explain Plan
  • 21.
    Cube Optimization § Curse  of  Dimensionality   § N  dimension  cube  has  2N  cuboid   § Full  Cube  vs.  Parqal  Cube     § Huge  Data  Volume     § Dicqonary  Encoding   § Incremental  Building http://kylin.io
  • 22.
    § Full  Cube -Pre-aggregate all dimension combinations - “Curse of dimensionality”: N dimension cube has 2N cuboid. § ParOal  Cube - To avoid dimension explosion, we divide the dimensions into different aggregation groups - 2N+M+L à 2N + 2M + 2L - For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid number will reduce from 1 Billion to 3 Thousands - 230 à 210 + 210 + 210 - Tradeoff between online aggregation and offline pre-aggregation http://kylin.io Full Cube vs. Partical Cube
  • 23.
  • 24.
  • 25.
    What’s Next § Improve  cube  algorithm   § Cube  by  segments,  30%-­‐50%  faster   § Build  delay  down  to  tens  of  minutes   § Streaming  cubing   § Analyze  real-­‐qme  data   § Build  delay  down  to  seconds   § Spark http://kylin.io
  • 26.
    Cube by Layer §The current algorithm - Many MRs, the number of dimensions - Huge shuffles, aggregation at reduce side, 100x of total cube size http://kylin.io Full  Data 0-­‐D  Cuboid 1-­‐D  Cuboid 2-­‐D  Cuboid 3-­‐D  Cuboid 4-­‐D  Cuboid MR MR MR MR MR
  • 27.
    Cube by Segments §The to-be algorithm, 30%-50% faster - 1 round MR - Reduced shuffles, map side aggregation, 20x total cube size - Hourly incremental build done in tens of minutes http://kylin.io Data  Split Cube  Segment Data  Split Cube  Segment Data  Split Cube  Segment …… Final  Cube Merge  Sort   (Shuffle) mapper mapper mapper
  • 28.
    Streaming Cubing § Cubeis great but… - Cube takes time to build, how about real-time analysis? - Sometimes we want to drill down to row level information § Streaming cubing - Build micro cube segments from streaming - Use inverted index to capture last minute data http://kylin.io
  • 29.
    http://kylin.io Streaming seconds  delay Last  Hour Inverted   Index Before  Last  Hour Cube Kylin Lambda Architecture minutes  delay Query  Engine ANSI  SQL Hybrid Storage Interface
  • 30.
    Adding Spark Support §Cubing Efficiency § MR is not optimal framework § Spark Cubing Engine § Source from SparkSQL § Read data from SparkSQL instead of Hive § Route to SparkSQL § Unsupported queries be coved by SparkSQL http://kylin.io
  • 31.
    Agenda §About Apache Kylin §FeatureHighlights §Tech Highlights §Roadmap §Q&A http://kylin.io
  • 32.
    Future2015 2016 Kylin EvolutionRoadmap http://kylin.io 20142013 Initial Prototype  for   MOLAP   • Basic  end  to  end  POC   MOLAP   • Incremental  Refresh   • ANSI  SQL   • ODBC  Driver   • Web  GUI   • Tableau   • ACL   • Open  Source StreamingOLAP   • Streaming  OLAP   • JDBC  Driver   • New  UI   • Excel   • SparkSQL   • …  more   TBD Sep,  2013 Jan,  2014 Oct,  2014 H1,  2015 HybridOLAP   • Lambda  Arch   • Automation   • Capacity   Management   • Spark     • …  more Next  Gen   • Adv  OLAP  Functions   • In-­‐Memory  Analysis   (TBD)   • Mobile  (TBD)   • …  more
  • 33.
    Kylin Ecosystem ■ KylinCore ■ Fundamental framework of Kylin OLAP Engine ■ Extension ■ Plugins to support for additional functions and features ■ Integration ■ Lifecycle Management Support to integrate with other applications ■ Interface ■ Allows for third party users to build more features via user- interface atop Kylin core http://kylin.io Kylin OLAP Core Extension à Security à Redis Storage à Spark Engine à Docker Interface à Web Console à Customized BI à Ambari/Hue Plugin Integration à ODBC Driver à ETL à Drill à SparkSQL
  • 34.
    http://kylin.io If  you  want  to  go  fast,  go  alone.   If  you  want  to  go  far,  go  together. -­‐-­‐African  Proverb dev@kylin.incubator.apache.org