SlideShare a Scribd company logo
1 of 42
© 2014 MapR Technologies 1© 2014 MapR Technologies
© 2014 MapR Technologies 2
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
VP Incubator
Email tdunning@apache.org
Twitter @ApacheMahout @ApacheDrill
Credit for slides to Luke Han and the Kylin dev team
© 2014 MapR Technologies 3
Kylin Committers
ankur Ankur Bansal
jiangxu Jiang Xu
liyang Li Yang
lukehan Luke Han*
mahongbin Hongbin Ma
xduo Xiaodong Duo
yisong George Song
jhyde Julian Hyde
The real deal
Calcite plenipotentiary
© 2014 MapR Technologies 4
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
© 2014 MapR Technologies 5
What is Kylin?
Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine from (originally from eBay) that
provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for
extremely large datasets
kylin / ˈkiːˈlɪn / 麒麟
--n. (in Chinese art) a mythical animal of composite form
• Open Sourced on Oct 1st, 2014
• Accepted into incubation November, 2014
• Preparing for first Apache release
© 2014 MapR Technologies 6
Big Data Obligatory Slide
• More and more data becoming available on Hadoop
• Limitations in existing Business Intelligence (BI) Tools
– Limited support for Hadoop
– Data size growing exponentially
– High latency of interactive queries
– Scale-Up architecture
• Challenges to adopt Hadoop as interactive analysis system
– Majority of analyst groups are SQL savvy
– No mature SQL interface on Hadoop
– OLAP capability on Hadoop ecosystem not ready yet
© 2014 MapR Technologies 7
Goals
• Sub-second query latency on billions of rows
• ANSI SQL for both analysts and engineers
• Full OLAP capability to offer advanced functionality
• Seamless Integration with BI Tools
• Support for high cardinality and dimensionality
• High concurrency – thousands of end users
• Distributed and scale out architecture for large data volume
© 2014 MapR Technologies 8
Possible Strategies
• Build from scratch
– A grand tradition
– Large-scale SQL support is much harder than it looks
– Huge level of distraction
• Patch Hive
– Not feasible due to design assumptions in Hive
– Weak optimizer
– Hive isn’t standard SQL anyway
– (but isn’t Hive moving to Calcite?)
© 2014 MapR Technologies 9
Kylin’s Strategy
• Use Calcite as SQL core
– Real SQL
– Real cost-based optimizer
– Already in Apache
– Provides linkage to Apache Drill and future of Hive
• Build cubes externally
– Don’t care which tools, currently Hive, soon Spark
• Use Calcite’s Rex interpreter
– Assumes final aggregations fit on one machine
• Possibly integrate with Drill at some point for parallel execution
© 2014 MapR Technologies 10
Transaction
Operation
Strategy
Analytics Query Taxonomy
High Level
Aggregation
• Very High Level, e.g GMV
by site by vertical by weeks
Analysis
Query
• Mid-level, e.g GMV by site
by vertical, by category
(level x) past 12 weeks
Drill Down
to Detail • Detail Level (Summary Table)
Low Level
Aggregation
• First Level
Aggregation
Transaction
Level
• Transaction
Data
OLAP
Kylin is designed to accelerate 80+% of analytics queries on Hadoop
OLTP
© 2014 MapR Technologies 11
Technical Challenges
• Huge volume data
– Table scan
• Big table joins
– Data shuffling
• Analysis on different granularity
– Runtime aggregation expensive
• Map Reduce job
– Batch processing
© 2014 MapR Technologies 12
How Cubes Work
• Start with a simple table
– revenue,time,item,location,supplier
• Build a table of aggregates for every combination of fields
select sum(revenue), max(revenue), supplier from tbl group by time,item,location;
select sum(revenue), max(revenue), location,supplier from tbl group by time,item;
select sum(revenue), max(revenue), location from tbl group by time,item,supplier;
…
• Then transform queries using appropriate magic
select sum(revenue), city from tbl join location_details
where state = ‘MN’ group by city
 select … from (select sum(),location from cube) join location_details
where state = ‘MN’ group by city
© 2014 MapR Technologies 13
How Cubes Don’t Work
• Total number of cubes is exponential in columns
• High cardinality can result in large cubes
• Skewed data can make cubes larger as original data
• Magic may be insufficient to recognize cubable queries
• Keeping cubes up to date can be hard
• Forget OLTP thoughts like pervasive transactions
© 2014 MapR Technologies 15
OLAP Cube – Balance between Space and Time
Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
Cuboid = one combination of dimensions
Cube = all combinations of dimensions
1111
0111 1011 1101 1110
0011 0101 0110 1001 1010 1100
0001 0010 0100 1000
0000
© 2014 MapR Technologies 16
From Relational to Key-Value
© 2014 MapR Technologies 17
Kylin Architecture Overview
17
Cube Build Engine
(MapReduce…)
SQL
Low Latency - SecondsMid Latency - Minutes
Routing
3rd Party App
(Web App, Mobile…)
Metadata
SQL Tools
(BI Tools: Tableau…)
Query Engine
Hadoop
Hive
REST API JDBC/ODBC
 Online Analysis Data Flow
 Offline Data Flow
 Clients/Users interactive with Kylin via
SQL
 OLAP Cube is transparent to users
Star Schema Data Key Value Data
Data CubeOLAP
Cube
(HBase)
SQL
REST Server
© 2014 MapR Technologies 18
Kylin Depends on Hadoop Eco-system
• Hive
– Input source, pre-join star schema during cube building
• MapReduce
– Aggregate metrics during cube building
• HDFS
– Store intermediate files during cube building
• HBase
– Store and query data cubes
• Calcite
– SQL parsing, code generation, optimization
© 2014 MapR Technologies 19
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
© 2014 MapR Technologies 20
Kylin Highlights
• Extremely Fast OLAP Engine at Scale
Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data to seconds
• ANSI SQL Interface on Hadoop
Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
• Seamless Integration with BI Tools
Kylin currently offers integration capability with BI Tools like Tableau.
• Interactive Query Capability
Users can interact with Hadoop data via Kylin at sub-second latency
• MOLAP Cube
User can define a data model and pre-build in Kylin with more than 10+ billions of raw data
records
© 2014 MapR Technologies 21
More Highlights
• Compression and Encoding Support
• Incremental Refresh of Cubes
• Approximate Query Capability for distinct Count (HyperLogLog)
• Leverage HBase Coprocessor for query latency
• Job Management and Monitoring
• Easy Web interface to manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration
© 2014 MapR Technologies 22
Cube Designer
© 2014 MapR Technologies 23
Job Management
© 2014 MapR Technologies 24
Query and Visualization
© 2014 MapR Technologies 25
Tableau Integration
© 2014 MapR Technologies 26
Data Modeling Points of View
Cube: …
Fact Table: …
Dimensions: …
Measures: …
Storage(HBase): …
Fact
Dim Dim
Dim
Source
Star Schema
row A
row B
row C
Column Family
Val 1
Val 2
Val 3
Row Key Column
Target
HBase Storage
Mapping
Cube Metadata
End User Cube Modeler Admin
© 2014 MapR Technologies 27
Process Flow
Source Joined
tables
Build
dict
Dimension
dictionaries
Hive
© 2014 MapR Technologies 28
Process Flow
Joined n cuboid
n-1 cuboids
Apex cuboid
MR MR
Dimension
dictionaries
MR
© 2014 MapR Technologies 29
Process flow
n cuboid n-1 cuboids Apex cuboid
MR
H-files
HBase
© 2014 MapR Technologies 30
How To Store Cube? – HBase Schema
© 2014 MapR Technologies 31
SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name,
test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT
FROM test_kylin_fact
LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id
LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id
WHERE test_kylin_fact.seller_id = 123456OR test_kylin_fact.lstg_format_name = ’New'
GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name,
test_kylin_fact.lstg_format_name,test_sites.site_name
OLAPToEnumerableConverter
OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4],
SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5],
SITE_NAME=[$23], PRICE=[$0])
OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ’New'))])
OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
Query Engine – Kylin Explain Plan
© 2014 MapR Technologies 32
Now Let’s Make it Really Work
• Full Cube
– Pre-aggregate all dimension combinations
– “Curse of dimensionality”: N dimension cube has 2N cuboid.
• Partial Cube
– To avoid dimension explosion, we divide the dimensions into different
aggregation groups
• 2N+M+L  2N + 2M + 2L
– For cube with 30 dimensions, if we divide these dimensions into 3
group, the cuboid count is reduced from 1 Billion to 3 thousand
• 230  210 + 210 + 210
– Tradeoff between online aggregation and offline pre-aggregation
© 2014 MapR Technologies 34
Incremental Cube Building
© 2014 MapR Technologies 36
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
© 2014 MapR Technologies 37
# Query Type Return Dataset Query
On Kylin (s)
Query
On Hive (s)
Comments
1 High Level
Aggregation
4 0.129 157.437 1,217 times
2 Analysis Query 22,669 1.615 109.206 68 times
3 Drill Down to Detail 325,029 12.058 113.123 9 times
4 Drill Down to Detail 524,780 22.42 6383.21 278 times
5 Data Dump 972,002 49.054 N/A
0
50
100
150
200
SQL #1 SQL #2 SQL #3
Hive
Kylin
High
Level
Aggregati
on
Analysis
Query
Drill Down
to Detail
Low Level
Aggregati
on
Transactio
n Level
Based on 12+B records
Kylin vs. Hive
© 2014 MapR Technologies 38
Performance Scaleout
Linear scale out with more nodes
© 2014 MapR Technologies 39
Performance - Query Latency
99 %-ile
95 %-ile
© 2014 MapR Technologies 40
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
© 2014 MapR Technologies 41
201520142013
Initial
Prototype
for MOLAP
• Basic end to end
POC
MOLAP
• Incremental
Refresh
• ANSI SQL
• ODBC Driver
• Web GUI
• ACL
• Open Source
HOLAP
• Streaming OLAP
• JDBC Driver
• New UI
• Excel Support
• … more
Next Gen
• Automation
• Capacity
Management
• In-Memory Analysis
(TBD)
• Spark (TBD)
• … more
TBD
Future…
Sep, 2013
Jan, 2014
Sep, 2014
Q1, 2015
Kylin History and Roadmap
© 2014 MapR Technologies 42
Kylin Ecosystem
• Kylin Core
– Fundamental framework of Kylin
OLAP Engine
• Extension
– Plugins to support for additional
functions and features
• Integration
– Lifecycle Management Support to
integrate with other applications
• Interface
– Allows for third party users to build
more features via user-interface atop
Kylin core
• Driver
– ODBC and JDBC Drivers
Kylin OLAP
Core
Extension
 Security
 Redis Storage
 Spark Engine
 Docker
Interface
 Web Console
 Customized BI
 Ambari/Hue Plugin
Integration
 ODBC Driver
 ETL
 Drill
 SparkSQL
© 2014 MapR Technologies 44
If you want to go fast, go alone.
If you want to go far, go together.
--African Proverb
© 2014 MapR Technologies 45
Agenda
• What is Apache Kylin?
• Features & Tech Highlights
• Performance
• Roadmap
• Q & A
© 2014 MapR Technologies 46
Q&A
@mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylininovex GmbH
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)SANG WON PARK
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeDataWorks Summit
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseDataWorks Summit
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debeziumKasun Don
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache KylinShi Shao Feng
 
Apache Kafka® and API Management
Apache Kafka® and API ManagementApache Kafka® and API Management
Apache Kafka® and API Managementconfluent
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registryconfluent
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022HostedbyConfluent
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streamsconfluent
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 

What's hot (20)

Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBase
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache Kylin
 
Apache Kafka® and API Management
Apache Kafka® and API ManagementApache Kafka® and API Management
Apache Kafka® and API Management
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Getting Started with Confluent Schema Registry
Getting Started with Confluent Schema RegistryGetting Started with Confluent Schema Registry
Getting Started with Confluent Schema Registry
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 

Similar to Apache Kylin – Cubes on Hadoop

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop BigDataEverywhere
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonMapR Technologies
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecYang Li
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus TechnologiesImpetus Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleIan Downard
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Mathieu Dumoulin
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Seshu Adunuthula
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 

Similar to Apache Kylin – Cubes on Hadoop (20)

Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 DecApache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 

Recently uploaded (20)

TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Apache Kylin – Cubes on Hadoop

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies
  • 2. © 2014 MapR Technologies 2 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning VP Incubator Email tdunning@apache.org Twitter @ApacheMahout @ApacheDrill Credit for slides to Luke Han and the Kylin dev team
  • 3. © 2014 MapR Technologies 3 Kylin Committers ankur Ankur Bansal jiangxu Jiang Xu liyang Li Yang lukehan Luke Han* mahongbin Hongbin Ma xduo Xiaodong Duo yisong George Song jhyde Julian Hyde The real deal Calcite plenipotentiary
  • 4. © 2014 MapR Technologies 4 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 5. © 2014 MapR Technologies 5 What is Kylin? Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine from (originally from eBay) that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite form • Open Sourced on Oct 1st, 2014 • Accepted into incubation November, 2014 • Preparing for first Apache release
  • 6. © 2014 MapR Technologies 6 Big Data Obligatory Slide • More and more data becoming available on Hadoop • Limitations in existing Business Intelligence (BI) Tools – Limited support for Hadoop – Data size growing exponentially – High latency of interactive queries – Scale-Up architecture • Challenges to adopt Hadoop as interactive analysis system – Majority of analyst groups are SQL savvy – No mature SQL interface on Hadoop – OLAP capability on Hadoop ecosystem not ready yet
  • 7. © 2014 MapR Technologies 7 Goals • Sub-second query latency on billions of rows • ANSI SQL for both analysts and engineers • Full OLAP capability to offer advanced functionality • Seamless Integration with BI Tools • Support for high cardinality and dimensionality • High concurrency – thousands of end users • Distributed and scale out architecture for large data volume
  • 8. © 2014 MapR Technologies 8 Possible Strategies • Build from scratch – A grand tradition – Large-scale SQL support is much harder than it looks – Huge level of distraction • Patch Hive – Not feasible due to design assumptions in Hive – Weak optimizer – Hive isn’t standard SQL anyway – (but isn’t Hive moving to Calcite?)
  • 9. © 2014 MapR Technologies 9 Kylin’s Strategy • Use Calcite as SQL core – Real SQL – Real cost-based optimizer – Already in Apache – Provides linkage to Apache Drill and future of Hive • Build cubes externally – Don’t care which tools, currently Hive, soon Spark • Use Calcite’s Rex interpreter – Assumes final aggregations fit on one machine • Possibly integrate with Drill at some point for parallel execution
  • 10. © 2014 MapR Technologies 10 Transaction Operation Strategy Analytics Query Taxonomy High Level Aggregation • Very High Level, e.g GMV by site by vertical by weeks Analysis Query • Mid-level, e.g GMV by site by vertical, by category (level x) past 12 weeks Drill Down to Detail • Detail Level (Summary Table) Low Level Aggregation • First Level Aggregation Transaction Level • Transaction Data OLAP Kylin is designed to accelerate 80+% of analytics queries on Hadoop OLTP
  • 11. © 2014 MapR Technologies 11 Technical Challenges • Huge volume data – Table scan • Big table joins – Data shuffling • Analysis on different granularity – Runtime aggregation expensive • Map Reduce job – Batch processing
  • 12. © 2014 MapR Technologies 12 How Cubes Work • Start with a simple table – revenue,time,item,location,supplier • Build a table of aggregates for every combination of fields select sum(revenue), max(revenue), supplier from tbl group by time,item,location; select sum(revenue), max(revenue), location,supplier from tbl group by time,item; select sum(revenue), max(revenue), location from tbl group by time,item,supplier; … • Then transform queries using appropriate magic select sum(revenue), city from tbl join location_details where state = ‘MN’ group by city  select … from (select sum(),location from cube) join location_details where state = ‘MN’ group by city
  • 13. © 2014 MapR Technologies 13 How Cubes Don’t Work • Total number of cubes is exponential in columns • High cardinality can result in large cubes • Skewed data can make cubes larger as original data • Magic may be insufficient to recognize cubable queries • Keeping cubes up to date can be hard • Forget OLTP thoughts like pervasive transactions
  • 14. © 2014 MapR Technologies 15 OLAP Cube – Balance between Space and Time Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item> Cuboid = one combination of dimensions Cube = all combinations of dimensions 1111 0111 1011 1101 1110 0011 0101 0110 1001 1010 1100 0001 0010 0100 1000 0000
  • 15. © 2014 MapR Technologies 16 From Relational to Key-Value
  • 16. © 2014 MapR Technologies 17 Kylin Architecture Overview 17 Cube Build Engine (MapReduce…) SQL Low Latency - SecondsMid Latency - Minutes Routing 3rd Party App (Web App, Mobile…) Metadata SQL Tools (BI Tools: Tableau…) Query Engine Hadoop Hive REST API JDBC/ODBC  Online Analysis Data Flow  Offline Data Flow  Clients/Users interactive with Kylin via SQL  OLAP Cube is transparent to users Star Schema Data Key Value Data Data CubeOLAP Cube (HBase) SQL REST Server
  • 17. © 2014 MapR Technologies 18 Kylin Depends on Hadoop Eco-system • Hive – Input source, pre-join star schema during cube building • MapReduce – Aggregate metrics during cube building • HDFS – Store intermediate files during cube building • HBase – Store and query data cubes • Calcite – SQL parsing, code generation, optimization
  • 18. © 2014 MapR Technologies 19 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 19. © 2014 MapR Technologies 20 Kylin Highlights • Extremely Fast OLAP Engine at Scale Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data to seconds • ANSI SQL Interface on Hadoop Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions • Seamless Integration with BI Tools Kylin currently offers integration capability with BI Tools like Tableau. • Interactive Query Capability Users can interact with Hadoop data via Kylin at sub-second latency • MOLAP Cube User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records
  • 20. © 2014 MapR Technologies 21 More Highlights • Compression and Encoding Support • Incremental Refresh of Cubes • Approximate Query Capability for distinct Count (HyperLogLog) • Leverage HBase Coprocessor for query latency • Job Management and Monitoring • Easy Web interface to manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration
  • 21. © 2014 MapR Technologies 22 Cube Designer
  • 22. © 2014 MapR Technologies 23 Job Management
  • 23. © 2014 MapR Technologies 24 Query and Visualization
  • 24. © 2014 MapR Technologies 25 Tableau Integration
  • 25. © 2014 MapR Technologies 26 Data Modeling Points of View Cube: … Fact Table: … Dimensions: … Measures: … Storage(HBase): … Fact Dim Dim Dim Source Star Schema row A row B row C Column Family Val 1 Val 2 Val 3 Row Key Column Target HBase Storage Mapping Cube Metadata End User Cube Modeler Admin
  • 26. © 2014 MapR Technologies 27 Process Flow Source Joined tables Build dict Dimension dictionaries Hive
  • 27. © 2014 MapR Technologies 28 Process Flow Joined n cuboid n-1 cuboids Apex cuboid MR MR Dimension dictionaries MR
  • 28. © 2014 MapR Technologies 29 Process flow n cuboid n-1 cuboids Apex cuboid MR H-files HBase
  • 29. © 2014 MapR Technologies 30 How To Store Cube? – HBase Schema
  • 30. © 2014 MapR Technologies 31 SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name, test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT FROM test_kylin_fact LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id WHERE test_kylin_fact.seller_id = 123456OR test_kylin_fact.lstg_format_name = ’New' GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name,test_sites.site_name OLAPToEnumerableConverter OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8]) OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0]) OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ’New'))]) OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]]) Query Engine – Kylin Explain Plan
  • 31. © 2014 MapR Technologies 32 Now Let’s Make it Really Work • Full Cube – Pre-aggregate all dimension combinations – “Curse of dimensionality”: N dimension cube has 2N cuboid. • Partial Cube – To avoid dimension explosion, we divide the dimensions into different aggregation groups • 2N+M+L  2N + 2M + 2L – For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid count is reduced from 1 Billion to 3 thousand • 230  210 + 210 + 210 – Tradeoff between online aggregation and offline pre-aggregation
  • 32. © 2014 MapR Technologies 34 Incremental Cube Building
  • 33. © 2014 MapR Technologies 36 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 34. © 2014 MapR Technologies 37 # Query Type Return Dataset Query On Kylin (s) Query On Hive (s) Comments 1 High Level Aggregation 4 0.129 157.437 1,217 times 2 Analysis Query 22,669 1.615 109.206 68 times 3 Drill Down to Detail 325,029 12.058 113.123 9 times 4 Drill Down to Detail 524,780 22.42 6383.21 278 times 5 Data Dump 972,002 49.054 N/A 0 50 100 150 200 SQL #1 SQL #2 SQL #3 Hive Kylin High Level Aggregati on Analysis Query Drill Down to Detail Low Level Aggregati on Transactio n Level Based on 12+B records Kylin vs. Hive
  • 35. © 2014 MapR Technologies 38 Performance Scaleout Linear scale out with more nodes
  • 36. © 2014 MapR Technologies 39 Performance - Query Latency 99 %-ile 95 %-ile
  • 37. © 2014 MapR Technologies 40 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 38. © 2014 MapR Technologies 41 201520142013 Initial Prototype for MOLAP • Basic end to end POC MOLAP • Incremental Refresh • ANSI SQL • ODBC Driver • Web GUI • ACL • Open Source HOLAP • Streaming OLAP • JDBC Driver • New UI • Excel Support • … more Next Gen • Automation • Capacity Management • In-Memory Analysis (TBD) • Spark (TBD) • … more TBD Future… Sep, 2013 Jan, 2014 Sep, 2014 Q1, 2015 Kylin History and Roadmap
  • 39. © 2014 MapR Technologies 42 Kylin Ecosystem • Kylin Core – Fundamental framework of Kylin OLAP Engine • Extension – Plugins to support for additional functions and features • Integration – Lifecycle Management Support to integrate with other applications • Interface – Allows for third party users to build more features via user-interface atop Kylin core • Driver – ODBC and JDBC Drivers Kylin OLAP Core Extension  Security  Redis Storage  Spark Engine  Docker Interface  Web Console  Customized BI  Ambari/Hue Plugin Integration  ODBC Driver  ETL  Drill  SparkSQL
  • 40. © 2014 MapR Technologies 44 If you want to go fast, go alone. If you want to go far, go together. --African Proverb
  • 41. © 2014 MapR Technologies 45 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
  • 42. © 2014 MapR Technologies 46 Q&A @mapr maprtech tdunning@mapr.com Engage with us! MapR maprtech mapr-technologies