INTRODUCTION TO OLAP
● OLAP (online analytical processing) is computer processing that enables a user
to easily and selectively extract and view data from different points of view.
● OLAP allows users to analyze information from multiple database systems
at the same time.
● OLAP data is typically stored in multidimensional databases.
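DATA WAREHOUSING ARCHITECTURE
[Figure: data warehousing architecture. External sources and operational databases are extracted, transformed, loaded, and refreshed into the data warehouse, which has a metadata repository and monitoring & administration. The warehouse serves OLAP servers, which feed analysis, query/reporting, and data mining tools.]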
THE OLAP CUBE
● A multidimensional cube can combine data from disparate data sources and
store the information in a form that is logical for business users.
● An OLAP cube is a data structure that allows fast analysis of data.
● Arranging data into cubes overcomes a limitation of relational databases,
which are not well suited to near-instantaneous analysis of large amounts of data.
● The OLAP cube consists of numeric facts, called measures, which are
categorized by dimensions.
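The measures-and-dimensions idea can be sketched in a few lines of Python. This is only an illustration: the dimension names (product, region, quarter), the sales figures, and the helpers `roll_up` and `slice_cube` are all hypothetical, and a real OLAP server would pre-aggregate rather than rescan the facts.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter) are the dimensions,
# and the last field (sales) is the measure.
FACTS = [
    ("laptop", "EU", "Q1", 120.0),
    ("laptop", "US", "Q1", 200.0),
    ("phone", "EU", "Q1", 80.0),
    ("phone", "EU", "Q2", 90.0),
    ("laptop", "US", "Q2", 210.0),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


sales_by_region = roll_up(FACTS, ["region"])
eu_by_quarter = roll_up(slice_cube(FACTS, "region", "EU"), ["quarter"])
print(sales_by_region)
print(eu_by_quarter)
```

The same facts answer different questions depending on which dimensions are kept, which is what "viewing data from different points of view" amounts to.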
TWO TYPES OF DATABASE ACTIVITY
● OLTP (Online Transaction Processing)
● OLAP (Online Analytical Processing)
OLTP vs. OLAP
● Online Transaction Processing (OLTP):
– technology used to perform updates on operational or transactional systems
(e.g., point-of-sale systems)
● Online Analytical Processing (OLAP):
– technology used to perform complex analysis of the data in a data warehouse
OLAP is a category of software technology that enables analysts,
managers, and executives to gain insight into data through fast,
consistent, interactive access to a wide variety of possible views of
information that has been transformed from raw data to reflect
the dimensionality of the enterprise as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
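The difference between the two workloads can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `sales` table and its values are made up, and a single in-memory SQLite database stands in for both sides only to contrast the query shapes. In practice the analytical query would run against a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")

# OLTP-style work: short transactions that insert or update individual rows.
with conn:
    conn.executemany(
        "INSERT INTO sales (store, amount) VALUES (?, ?)",
        [("north", 10.0), ("north", 15.0), ("south", 7.5)],
    )
with conn:
    conn.execute("UPDATE sales SET amount = 12.0 WHERE id = 1")

# OLAP-style work: a read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)
```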
TYPES OF OLAP
● Relational OLAP (ROLAP):
– Uses a relational or specialized relational DBMS to store and manage warehouse data
– OLAP middleware supplies the missing pieces: optimization for each DBMS backend,
aggregation navigation logic, and additional tools and services
– An extended RDBMS maps multidimensional data operations to standard relational operations
– Examples: MicroStrategy, MetaCube (Informix)
● Multidimensional OLAP (MOLAP):
– Array-based storage structures
– Direct access to array data structures
– Operations are implemented directly on multidimensional data
– Example: Essbase (Arbor)
● Hybrid Online Analytical Processing (HOLAP):
– Aggregated totals are stored in a multidimensional database, while the detail data is
stored in a relational database.
– This balances the data efficiency of the ROLAP model against the performance of the
MOLAP model.
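A rough sketch of MOLAP-style array storage with direct access follows. It is not how Essbase or any specific product lays out data: the class `DenseCube`, the dimension members, and the amounts are hypothetical. Each dimension member maps to a coordinate, and a cell is found by computing an offset into one flat array.

```python
# Hypothetical dimension members; the position in each list is the array coordinate.
PRODUCTS = ["laptop", "phone"]
REGIONS = ["EU", "US"]
QUARTERS = ["Q1", "Q2"]


class DenseCube:
    """A dense 3-D cube stored as one flat list in row-major order."""

    def __init__(self):
        self.cells = [0.0] * (len(PRODUCTS) * len(REGIONS) * len(QUARTERS))

    def _offset(self, product, region, quarter):
        # Direct access: coordinates are turned into a single array index.
        p = PRODUCTS.index(product)
        r = REGIONS.index(region)
        q = QUARTERS.index(quarter)
        return (p * len(REGIONS) + r) * len(QUARTERS) + q

    def add(self, product, region, quarter, amount):
        self.cells[self._offset(product, region, quarter)] += amount

    def get(self, product, region, quarter):
        return self.cells[self._offset(product, region, quarter)]


cube = DenseCube()
cube.add("laptop", "EU", "Q1", 120.0)
cube.add("phone", "US", "Q2", 95.0)
cube.add("phone", "US", "Q2", 5.0)
print(cube.get("phone", "US", "Q2"))
print(len(cube.cells))
```

The array shape is fixed by the dimension lists, so adding a dimension means rebuilding the whole array. A relational layout can absorb a new dimension as another column or table.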
ROLAP vs. MOLAP
Characteristic | ROLAP | MOLAP
Schema | Uses a star schema; additional dimensions can be added dynamically | Uses data cubes; additional dimensions require recreating the data cube
Database size | Medium to large | Small to medium
Architecture | Client/server | Client/server
Access | Supports ad hoc requests | Limited to predefined dimensions
Resources | High | Very high
Flexibility | High | Low
Scalability | High | Low
Speed | Good with small data sets; average for medium to large data sets | Faster for small to medium data sets; average for large data sets
BENEFITS OF OLAP
● One main benefit of OLAP is consistency of information and calculations.
● "What if" scenarios are among the most popular uses of OLAP software, and
multidimensional processing makes them far more practical.
● A manager can pull data from an OLAP database in broad or specific terms.
● OLAP creates a single platform for all information and business needs:
planning, budgeting, forecasting, reporting, and analysis.
● Typical application areas:
– Marketing and sales analysis
– Consumer goods industries
– Financial services industry (insurance, banks, etc.)
– Database marketing
Apache Kylin – What?
● Open source
● Distributed Analytics Engine
● Provides SQL interface
● Multi-dimensional analysis (OLAP) on Hadoop
● Faster and more user-responsive than relational online
analytical processing (ROLAP)
The Fundamental Idea
● The idea behind Kylin is not brand new.
● Earlier technologies used the same methods: store pre-calculated results
to serve analysis queries, generate each level's cuboids with all possible
combinations of dimensions, and calculate all metrics at each level.
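"All possible combinations of dimensions" can be made concrete with a short enumeration. The sketch below is not Kylin code: the function `cuboids` and the three dimension names are hypothetical. It shows that N dimensions yield 2^N cuboids, ordered from the most detailed level down to the grand total.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimensions, from the full cuboid to the empty one."""
    result = []
    for size in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, size))
    return result


dims = ("year", "country", "category")  # hypothetical dimension names
all_cuboids = cuboids(dims)
for c in all_cuboids:
    print(c)
```

Because the count doubles with every added dimension, the number of dimensions in a cube has a direct effect on build time and storage.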
From Relational to Key-Value
● Answering from pre-calculated values avoids large table scans and long
delays before a result is returned.
● It therefore makes sense to calculate those values once and store them for later use.
● The build process generates every dimension combination together with its
measure values.
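A minimal sketch of the relational-to-key-value step, assuming a made-up fact table with two dimensions and one measure. The key layout used here (a tuple of dimension names plus a tuple of their values) is an illustration only and is not Kylin's actual HBase row-key encoding.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("year", "country")  # hypothetical dimensions
ROWS = [  # hypothetical fact rows; price is the measure
    {"year": 2016, "country": "US", "price": 10.0},
    {"year": 2016, "country": "CN", "price": 20.0},
    {"year": 2017, "country": "US", "price": 5.0},
    {"year": 2017, "country": "US", "price": 7.0},
]


def build_store(rows):
    """Pre-calculate SUM(price) for every combination of dimensions."""
    store = defaultdict(float)
    for size in range(len(DIMS) + 1):
        for group in combinations(DIMS, size):
            for row in rows:
                key = (group, tuple(row[d] for d in group))
                store[key] += row["price"]
    return dict(store)


def query(store, **filters):
    """Answer SUM(price) for the given dimension values with one key lookup."""
    group = tuple(d for d in DIMS if d in filters)
    return store[(group, tuple(filters[d] for d in group))]


STORE = build_store(ROWS)
print(query(STORE))                           # grand total
print(query(STORE, country="US"))             # one dimension fixed
print(query(STORE, year=2017, country="US"))  # both dimensions fixed
```

Every query becomes a dictionary lookup instead of a scan over `ROWS`. The cost moves to build time and to the size of the store.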
How it Works?
● Reads source data from Hive (stored on HDFS)
● Runs MapReduce jobs to pre-calculate the cube
● Stores cube data in HBase
● Uses ZooKeeper for job coordination
Apache Foundation Blog, December 2015
● Apache Kylin is the best OLAP engine on Big Data so far.
● While other OLAP engines struggle with the data volume,
Kylin enables query responses in milliseconds.
● Users are starting to leverage Kylin as a near-real-time streaming
storage and analytics engine.
Advantages
● Kylin integrates well with BI tools such as Tableau and Excel.
● Kylin supports MOLAP cubes and performs very well for complex queries
on data sets with billions of rows.
Limitations
● Real-time support has not yet been built.
● Kylin supports only the star schema, so each cube is limited to a
single fact table.
Druid – Key Features
● Open source.
● Distributed architecture.
● Real-time ingestion.
● Column-oriented for speed.
● Fast filtering.
● Operational simplicity.
● Support for OLAP queries.
Druid Architecture
Types of Nodes:
Historical Nodes
➢ Backbone of a Druid cluster.
➢ Download segments and serve queries over them.
Broker Nodes
➢ Clients send queries to a broker node to get data from Druid.
➢ The broker knows where segments are located, so it scatters each query to the
nodes serving the relevant segments.
➢ It then gathers and merges the results.
Coordinator Nodes
➢ Manage segments on historical nodes.
➢ Load new segments, drop old segments, and move segments to balance load.
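The broker's scatter/gather role can be sketched as follows. This is a toy model, not Druid's implementation: the classes `HistoricalNode` and `Broker`, the segment IDs, and the (page, clicks) rows are all hypothetical, and the coordinator's segment assignment is assumed to have already happened.

```python
from collections import Counter


class HistoricalNode:
    """Holds some segments and answers queries over the ones it serves."""

    def __init__(self, segments):
        # segments: {segment_id: list of (page, clicks) rows}
        self.segments = segments

    def query(self, segment_ids):
        partial = Counter()
        for seg_id in segment_ids:
            for page, clicks in self.segments[seg_id]:
                partial[page] += clicks
        return partial


class Broker:
    """Knows which node serves each segment; scatters queries and merges results."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.location = {
            seg_id: name for name, node in nodes.items() for seg_id in node.segments
        }

    def query(self, segment_ids):
        # Scatter: group the requested segments by the node that serves them.
        plan = {}
        for seg_id in segment_ids:
            plan.setdefault(self.location[seg_id], []).append(seg_id)
        # Gather: merge the partial aggregates returned by each node.
        merged = Counter()
        for name, seg_ids in plan.items():
            merged.update(self.nodes[name].query(seg_ids))
        return dict(merged)


nodes = {
    "historical-1": HistoricalNode({"2017-01-01": [("home", 3), ("about", 1)]}),
    "historical-2": HistoricalNode({"2017-01-02": [("home", 2)]}),
}
broker = Broker(nodes)
clicks = broker.query(["2017-01-01", "2017-01-02"])
print(clicks)
```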
Ingestion Methods
● Streaming (real-time):
– Used when the dataset originates in a streaming system such as Kafka.
– Kafka lets you process streams of records as they occur.
– The Kafka cluster stores streams of records in categories called topics.
– Each record consists of a key, a value, and a timestamp.
● File-based (batch):
– Load data in batches from HDFS, local files, etc.
Segments
● Druid stores its index in segment files, which are partitioned by time (timestamp).
● Data structure of a segment file:
– Columnar: the data for each column is laid out in separate data structures.
● A segment consists of a timestamp column, dimension columns, and metric columns.
● The timestamp and metric columns are simple: each is an array of integer or
floating-point values. Values in metric columns are read out to compute aggregates.
● Dimension columns are different because they support filter and group-by
operations, so each one requires three data structures:
➢ A dictionary that encodes column values
{
"Justin Bieber": 0,
"Ke$ha": 1
}
➢ Column data, stored as dictionary-encoded values
[0,
0,
1,
1]
➢ Bitmaps, one for each unique value of the column
value="Justin Bieber": [1,1,0,0]
value="Ke$ha": [0,0,1,1]
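The three structures for a dimension column can be built and used as sketched below. Assumptions: the helper names (`encode_dimension`, `bitmap_or`, `sum_where`) and the metric values are hypothetical, and the bitmaps are plain Python lists. Real segment files store these structures in compressed form.

```python
def encode_dimension(values):
    """Build the dictionary, the encoded column data, and one bitmap per value."""
    dictionary = {}
    column = []
    for v in values:
        # A new value gets the next free integer code; a known value reuses its code.
        code = dictionary.setdefault(v, len(dictionary))
        column.append(code)
    bitmaps = {
        v: [1 if c == code else 0 for c in column] for v, code in dictionary.items()
    }
    return dictionary, column, bitmaps


def bitmap_or(a, b):
    """Rows matching either value (an OR filter)."""
    return [x | y for x, y in zip(a, b)]


def sum_where(bitmap, metric):
    """Aggregate a metric column over the rows selected by a bitmap."""
    return sum(m for bit, m in zip(bitmap, metric) if bit)


page = ["Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha"]
added = [25, 42, 17, 170]  # hypothetical metric column

dictionary, column, bitmaps = encode_dimension(page)
print(dictionary)
print(column)
print(bitmaps)
print(sum_where(bitmaps["Ke$ha"], added))
print(sum_where(bitmap_or(bitmaps["Justin Bieber"], bitmaps["Ke$ha"]), added))
```

Filters become bitwise operations on the bitmaps, and only the selected rows of the metric column are read for aggregation.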
Druid vs Apache Kylin
Characteristic | Druid | Apache Kylin
Query speed | Very fast | Fast
Type of analysis | Real-time analysis | Focuses on OLAP cases; real-time analysis under development
SQL support | Absent | Present
Fault tolerance | All nodes | Needs to be set up
BI tools integration | Under development | Present (Tableau or Excel)
Integration with Kafka | Present | Absent
Complex queries | Bad for big data sets | Good performance
Storage type | Bitmap index | OLAP cube
Underlying technology | Own computation and storage cluster | Hadoop for cube build, HBase for storage
Miscellaneous Points to Consider…
● Druid has limited support for table joins.
● Apache Kylin supports the star schema.
● Modern corporations increasingly look for near-real-time analytics and
insights to make actionable decisions.
● Work at Hortonworks aims to integrate Druid with BI tools through Apache Hive.
(https://ko.hortonworks.com/blog/apache-hive-druid-part-1-3/)
● Earlier versions of Druid were released under the GPL v2 license. The latest
version of Druid is under the Apache License v2, as is Apache Kylin.
● The Druid GitHub project has 181 contributors, whereas Apache Kylin has 60.
References - OLAP & OLTP
● http://en.wikipedia.org/wiki/Online_analytical_processing
● http://www.dmreview.com/issues/19971101/964-l.html
● http://en.wikipedia.org/wiki/Extract,_transform,_load
● http://www.olapreport.com/Applications.html
References - Apache Kylin
● https://mail-archives.apache.org/mod_mbox/kylin-dev/201503.mbox/%3CCAKmQrOY0fjZLUU0MGo5aajZ2uLb3T0qJknHQd+Wv1oxd5PKixQ@mail.gmail.com%3E
● https://dzone.com/articles/apache-kylin-for-olap-on-hadoop
● http://kylin.apache.org/docs16/
● https://github.com/apache/kylin
● https://resources.zaloni.com/blog/apache-kylin-for-olap-on-hadoop
● https://en.wikipedia.org/wiki/Apache_Kylin
● http://www.ebaytechblog.com/2014/10/20/announcing-kylin-extreme-olap-engine-for-big-data/
References - Druid
● http://druid.io/docs/latest/design/
● http://druid.io/docs/latest/tutorials/ingestion.html
● http://druid.io/docs/latest/design/segments.html
● https://en.wikipedia.org/wiki/Druid_(open-source_data_store)
References - Druid vs Apache Kylin
● https://www.slideshare.net/freepsw/olap-for-big-data-druid-vs-apache-kylin-vs-apache-lens
● http://markmail.org/message/mf6gfzdwfqwtbtv6#query:+page:1+mid:sp7ek7x5pawjlxb6+state:results
● https://ko.hortonworks.com/blog/apache-hive-druid-part-1-3/
● https://github.com/druid-io/druid
● https://github.com/apache/kylin
THANK YOU!!

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 

Kylin and Druid Presentation

  • 2. INTRODUCTION TO OLAP
      ◦ OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view.
      ◦ OLAP allows users to analyze database information from multiple database systems at one time.
      ◦ OLAP data is stored in multidimensional databases.
  • 4. A multidimensional cube can combine data from disparate data sources and store the information in a fashion that is logical for business users.
  • 5. THE OLAP CUBE
      ◦ An OLAP cube is a data structure that allows fast analysis of data.
      ◦ The arrangement of data into cubes overcomes a limitation of relational databases.
      ◦ The OLAP cube consists of numeric facts called measures, which are categorized by dimensions.
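As a rough illustration of measures categorized by dimensions, the sketch below stores a tiny cube as a Python dictionary keyed by dimension values and rolls it up along chosen dimensions. The dimension names (region, product, quarter), the sales figures, and the helper `roll_up` are invented for this example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical cube: each cell is keyed by (region, product, quarter)
# and holds one measure, the sales amount. All values are made up.
DIMENSIONS = ("region", "product", "quarter")
cube = {
    ("EU", "laptop", "Q1"): 120.0,
    ("EU", "phone", "Q1"): 80.0,
    ("US", "laptop", "Q1"): 200.0,
    ("US", "laptop", "Q2"): 150.0,
}


def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for key, value in cells.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)


by_region = roll_up(cube, ["region"])
by_region_product = roll_up(cube, ["region", "product"])
```

Each call answers a different "point of view" on the same facts: `by_region` collapses product and quarter, while `by_region_product` collapses only quarter.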
  • 7. TWO TYPES OF DATABASE ACTIVITY
      ◦ OLTP (Online Transaction Processing)
      ◦ OLAP (Online Analytical Processing)
  • 8. OLTP vs. OLAP
      ◦ On-Line Transaction Processing (OLTP): technology used to perform updates on operational or transactional systems (e.g., point-of-sale systems)
      ◦ On-Line Analytical Processing (OLAP): technology used to perform complex analysis of the data in a data warehouse
      ◦ OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user. [source: OLAP Council: www.olapcouncil.org]
  • 10. TYPES OF OLAP
      ◦ Relational OLAP (ROLAP):
          - Relational and specialized relational DBMS to store and manage warehouse data
          - OLAP middleware to support missing pieces
          - Optimize for each DBMS backend
          - Aggregation navigation logic
          - Additional tools and services
          - Example: MicroStrategy, MetaCube (Informix)
          - Extended RDBMS with multidimensional data mapping to standard relational operations
      ◦ Multidimensional OLAP (MOLAP):
          - Array-based storage structures
          - Direct access to array data structures
          - Operations implemented on multidimensional data
          - Example: Essbase (Arbor)
      ◦ Hybrid Online Analytical Processing (HOLAP): a hybrid approach in which the aggregated totals are stored in a multidimensional database while the detail data is stored in the relational database. It balances the data efficiency of the ROLAP model against the performance of the MOLAP model.
  • 11. ROLAP vs. MOLAP
      Characteristic | ROLAP | MOLAP
      Schema | Uses a star schema; additional dimensions can be added dynamically | Uses data cubes; additional dimensions require recreating the data cube
      Database size | Medium to large | Small to medium
      Architecture | Client/server | Client/server
      Access | Supports ad hoc requests | Limited to predefined dimensions
  • 12. ROLAP vs. MOLAP (continued)
      Characteristic | ROLAP | MOLAP
      Resources | High | Very high
      Flexibility | High | Low
      Scalability | High | Low
      Speed | Good with small data sets; average for medium to large data sets | Faster for small to medium data sets; average for large data sets
  • 13. BENEFITS OF OLAP
      ◦ One main benefit of OLAP is consistency of information and calculations.
      ◦ "What if" scenarios are some of the most popular uses of OLAP software, and multidimensional processing makes them far more practical.
      ◦ It allows a manager to pull down data from an OLAP database in broad or specific terms.
      ◦ OLAP creates a single platform for all information and business needs: planning, budgeting, forecasting, reporting, and analysis.
  • 14. (Contd.)
      ◦ Marketing and sales analysis
      ◦ Consumer goods industries
      ◦ Financial services industry (insurance, banks, etc.)
      ◦ Database marketing
  • 15. Apache Kylin: What Is It?
      ◦ Open source
      ◦ Distributed analytics engine
      ◦ Provides a SQL interface
      ◦ Multidimensional analysis (OLAP) on Hadoop
      ◦ Faster and more responsive than relational online analytical processing (ROLAP)
  • 16. The Fundamental Idea
      ◦ The idea behind Kylin is not brand new.
      ◦ The underlying techniques are to store pre-calculated results to serve analysis queries, to generate each level's cuboids from all possible combinations of dimensions, and to calculate all metrics at the different levels.
  • 18. From Relational to Key-Value
      ◦ Pre-calculation avoids large table scans and the long delay before an answer comes back.
      ◦ It makes sense to calculate and store those values for later use.
      ◦ This process generates all of the dimension combinations and their measure values.
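The pre-calculation idea can be sketched in a few lines of Python. This is only an illustration of generating every dimension combination (cuboid) and storing the aggregated measure under a key, not Kylin's actual implementation. The fact rows, the dimension names, and the function `build_cuboids` are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, country, category, sales). Values are made up.
DIMS = ("year", "country", "category")
rows = [
    (2016, "US", "books", 10),
    (2016, "US", "toys", 5),
    (2016, "DE", "books", 7),
    (2017, "US", "books", 3),
]


def build_cuboids(rows, dims):
    """Pre-aggregate SUM(sales) for every subset of dimensions.

    Returns a flat key-value store whose key is (cuboid, values), where
    `cuboid` is the tuple of dimension names that were kept.
    """
    store = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), size):
            cuboid = tuple(dims[i] for i in kept)
            for row in rows:
                store[(cuboid, tuple(row[i] for i in kept))] += row[-1]
    return dict(store)


store = build_cuboids(rows, DIMS)
num_cuboids = len({cuboid for cuboid, _ in store})  # 2 ** 3 combinations

# "Total sales per country" is now a key lookup, not a scan of the fact rows.
us_total = store[(("country",), ("US",))]
```

With n dimensions the number of cuboids grows as 2^n, which is why the build step is the expensive part and the query step is cheap.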
  • 22. How It Works
      ◦ Read data from Hive (which is stored on HDFS)
      ◦ Run MapReduce jobs to pre-calculate
      ◦ Store cube data in HBase
      ◦ Leverage ZooKeeper for job coordination
  • 23. Apache Foundation Blog, December 2015
      ◦ Apache Kylin is the best OLAP engine on Big Data so far.
      ◦ While other OLAP engines struggle with the data volume, Kylin enables query responses in milliseconds.
      ◦ Kylin is starting to be leveraged as a near-real-time data streaming storage and analytics engine.
  • 24. Advantages
      ◦ Kylin integrates well with BI tools such as Tableau or Excel.
      ◦ Kylin supports MOLAP cubes and performs very well on complex queries over billion-row data sets.
  • 25. Limitations
      ◦ Real-time support has not yet been built.
      ◦ Kylin supports only the star schema. You are limited to a single fact table for each cube.
  • 27. Key Features (Druid)
      ◦ Open source
      ◦ Distributed architecture
      ◦ Real-time ingestion
      ◦ Column-oriented for speed
      ◦ Fast filtering
      ◦ Operational simplicity
      ◦ Support for OLAP queries
  • 28. Druid Architecture: Types of Nodes
      ◦ Historical nodes
          - Backbone of the Druid cluster.
          - Download segments and serve queries over them.
      ◦ Broker nodes
          - Clients query the broker node to get data from Druid.
          - Scatter queries to the nodes that hold the data.
          - Gather and merge the results (brokers know the location of the segments).
      ◦ Coordinator nodes
          - Manage segments on historical nodes.
          - Load new segments, drop old segments, and move segments to balance load.
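The broker's scatter and gather roles can be pictured with a toy model. The sketch below is an assumption-laden illustration in plain Python, not Druid code: the node names, segment identifiers, page-view counts, and the functions `query_node` and `broker_query` are all hypothetical.

```python
from collections import Counter

# Hypothetical layout: each historical node serves some segments, and each
# segment holds pre-counted page views. All names and numbers are made up.
historical_nodes = {
    "historical-1": {"2024-01-01": {"home": 3, "about": 1}},
    "historical-2": {"2024-01-02": {"home": 2}, "2024-01-03": {"about": 4}},
}


def query_node(node, segment_ids):
    """Historical side: aggregate only the requested segments it serves."""
    partial = Counter()
    for seg_id in segment_ids:
        partial.update(historical_nodes[node].get(seg_id, {}))
    return partial


def broker_query(segment_ids):
    """Broker side: scatter to the nodes holding the segments, then merge."""
    merged = Counter()
    for node, segments in historical_nodes.items():
        wanted = [s for s in segment_ids if s in segments]
        if wanted:  # the broker knows where each segment lives
            merged.update(query_node(node, wanted))
    return dict(merged)


result = broker_query(["2024-01-01", "2024-01-02"])
```

Here the broker never touches raw data itself; it only routes the request and merges the partial counts returned by each historical node.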
  • 29. Ingestion Method
      ◦ Streaming (real-time):
          - Used if your dataset originates in a streaming system like Kafka.
          - Kafka lets you process streams of records as they occur.
          - The Kafka cluster stores streams of records in categories called topics.
          - Each record consists of a key, a value, and a timestamp.
      ◦ File-based (batch):
          - Load data from HDFS, local files, etc. in batches.
  • 30. Segments
      ◦ Druid stores its index in segment files, partitioned by time (timestamp).
      ◦ Data structure of a segment file:
          - Columnar: the data for each column is laid out in separate data structures.
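Partitioning by time can be illustrated by bucketing events on their timestamp. The sketch below assumes hourly buckets and epoch-millisecond timestamps. Both choices, along with the event fields and the function `segment_key`, are assumptions for illustration and not a description of Druid's on-disk format.

```python
from collections import defaultdict
from datetime import datetime, timezone


def segment_key(ts_millis):
    """Map an event timestamp (epoch milliseconds) to its hourly bucket."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:00Z")


# Hypothetical events; the page values are made up.
events = [
    {"timestamp": 1_500_000_000_000, "page": "home"},   # 02:40 UTC
    {"timestamp": 1_500_000_600_000, "page": "about"},  # 02:50 UTC
    {"timestamp": 1_500_001_800_000, "page": "home"},   # 03:10 UTC
]

segments = defaultdict(list)
for event in events:
    segments[segment_key(event["timestamp"])].append(event)

segment_sizes = {key: len(rows) for key, rows in segments.items()}
```

A query restricted to a time range then only needs to open the buckets that overlap that range.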
  • 31. Segments (continued)
      ◦ A segment consists of the timestamp column, dimension columns, and metric columns.
      ◦ The timestamp and metric columns are simple: each is an array of integer or floating-point values. Values in metric columns are pulled out to perform aggregation.
      ◦ Dimension columns are different because they support filter and group-by operations, and they require:
          - A dictionary that encodes column values: { "Justin Bieber": 0, "Ke$ha": 1 }
          - Column data: [0, 0, 1, 1]
          - Bitmaps, one for each unique value of the column:
              value="Justin Bieber": [1,1,0,0]
              value="Ke$ha": [0,0,1,1]
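The three structures on this slide can be reproduced with a short Python sketch. Plain lists stand in for the compressed bitmaps a real system would use. The function names `encode_dimension` and `sum_where` and the `clicks` metric column are assumptions added for illustration.

```python
def encode_dimension(values):
    """Dictionary-encode a dimension column and build one bitmap per value."""
    dictionary = {}
    for v in values:
        dictionary.setdefault(v, len(dictionary))  # id = order of first appearance
    column = [dictionary[v] for v in values]
    bitmaps = {v: [1 if x == v else 0 for x in values] for v in dictionary}
    return dictionary, column, bitmaps


artists = ["Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha"]
dictionary, column, bitmaps = encode_dimension(artists)

# Hypothetical metric column aligned row by row with the dimension column.
clicks = [25, 42, 17, 170]


def sum_where(bitmap, metric):
    """Aggregate a metric over the rows selected by a bitmap."""
    return sum(m for bit, m in zip(bitmap, metric) if bit)


bieber_clicks = sum_where(bitmaps["Justin Bieber"], clicks)

# OR-ing two bitmaps selects the rows that match either value.
either = [a | b for a, b in zip(bitmaps["Justin Bieber"], bitmaps["Ke$ha"])]
```

Filters become bitwise operations on the bitmaps, and the metric column is read only for the rows the resulting bitmap selects.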
  • 32. Druid vs. Apache Kylin
      Characteristic | Druid | Apache Kylin
      Query speed | Very fast | Fast
      Type of analysis | Real-time analysis | Focuses on OLAP cases; real-time analysis under development
      SQL support | Absent | Present
      Fault tolerance | All nodes | Need to set up
      BI tools integration | Under development | Present (Tableau or Excel)
      Integration with Kafka | Present | Absent
      Complex queries | Bad for big data sets | Good performance
      Storage type | Bitmap index | OLAP cube
      Underlying technology | Own computation and storage cluster | Hadoop for cube build, HBase for storage
  • 33. Miscellaneous Points to Consider
      ◦ Druid has limitations on table joins.
      ◦ Apache Kylin supports the star schema.
      ◦ Modern corporations are increasingly looking for near-real-time analytics and insights to make actionable decisions.
      ◦ Druid is working toward BI tool integration through Apache Hive at Hortonworks. (https://ko.hortonworks.com/blog/apache-hive-druid-part-1-3/)
      ◦ Earlier versions of Druid were released under the GPL v2 license. The latest version of Druid is under the Apache License v2, as is Apache Kylin.
      ◦ Druid's GitHub project has 181 contributors, whereas Apache Kylin's has 60.
  • 34. References: OLAP & OLTP
      ◦ http://en.wikipedia.org/wiki/Online_analytical_processing
      ◦ http://www.dmreview.com/issues/19971101/964-l.html
      ◦ http://en.wikipedia.org/wiki/Extract,_transform,_load
      ◦ http://www.olapreport.com/Applications.html
  • 35. References: Apache Kylin
      ◦ https://mail-archives.apache.org/mod_mbox/kylin-dev/201503.mbox/%3CCAKmQrOY0fjZLUU0MGo5aajZ2uLb3T0qJknHQd+Wv1oxd5PKixQ@mail.gmail.com%3E
      ◦ https://dzone.com/articles/apache-kylin-for-olap-on-hadoop
      ◦ http://kylin.apache.org/docs16/
      ◦ https://github.com/apache/kylin
      ◦ https://resources.zaloni.com/blog/apache-kylin-for-olap-on-hadoop
      ◦ https://en.wikipedia.org/wiki/Apache_Kylin
      ◦ http://www.ebaytechblog.com/2014/10/20/announcing-kylin-extreme-olap-engine-for-big-data/
  • 36. References: Druid
      ◦ http://druid.io/docs/latest/design/
      ◦ http://druid.io/docs/latest/tutorials/ingestion.html
      ◦ http://druid.io/docs/latest/design/segments.html
      ◦ https://en.wikipedia.org/wiki/Druid_(open-source_data_store)
  • 37. References: Druid vs. Apache Kylin
      ◦ https://www.slideshare.net/freepsw/olap-for-big-data-druid-vs-apache-kylin-vs-apache-lens
      ◦ http://markmail.org/message/mf6gfzdwfqwtbtv6#query:+page:1+mid:sp7ek7x5pawjlxb6+state:results
      ◦ https://ko.hortonworks.com/blog/apache-hive-druid-part-1-3/
      ◦ https://github.com/druid-io/druid
      ◦ https://github.com/apache/kylin