Submit Search
Upload
Apache Kylin – Cubes on Hadoop
•
22 likes
•
8,546 views
DataWorks Summit
Follow
Apache Kylin – Cubes on Hadoop Ted Dunning MapR
Read less
Read more
Technology
Report
Share
Report
Share
1 of 42
Recommended
Design cube in Apache Kylin
Design cube in Apache Kylin
Yang Li
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
Tyler Wishnoff
The Evolution of Apache Kylin
The Evolution of Apache Kylin
DataWorks Summit/Hadoop Summit
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
Yang Li
Apache Kylin 101
Apache Kylin 101
SamanthaBerlant
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
Shi Shao Feng
Apache Kylin
Apache Kylin
BYOUNG GON KIM
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Databricks
Recommended
Design cube in Apache Kylin
Design cube in Apache Kylin
Yang Li
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
Tyler Wishnoff
The Evolution of Apache Kylin
The Evolution of Apache Kylin
DataWorks Summit/Hadoop Summit
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
Yang Li
Apache Kylin 101
Apache Kylin 101
SamanthaBerlant
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
Shi Shao Feng
Apache Kylin
Apache Kylin
BYOUNG GON KIM
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Databricks
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
DataWorks Summit
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Databricks
Scaling containers with keda
Scaling containers with keda
Nilesh Gule
Apache Kylin Introduction
Apache Kylin Introduction
Luke Han
Spark tunning in Apache Kylin
Spark tunning in Apache Kylin
Shi Shao Feng
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
StreamNative
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
ScyllaDB
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
DataWorks Summit/Hadoop Summit
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
Nishith Agarwal
Deploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
Introduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK Stack
Ahmed AbouZaid
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
Woong Seok Kang
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Julian Hyde
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Databricks
Introduction to MLflow
Introduction to MLflow
Databricks
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
BigDataEverywhere
Real time-hadoop
Real time-hadoop
Ted Dunning
More Related Content
What's hot
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
DataWorks Summit
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Databricks
Scaling containers with keda
Scaling containers with keda
Nilesh Gule
Apache Kylin Introduction
Apache Kylin Introduction
Luke Han
Spark tunning in Apache Kylin
Spark tunning in Apache Kylin
Shi Shao Feng
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
StreamNative
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
ScyllaDB
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
DataWorks Summit/Hadoop Summit
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
Nishith Agarwal
Deploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Julian Hyde
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
Introduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK Stack
Ahmed AbouZaid
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
Woong Seok Kang
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Julian Hyde
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Databricks
Introduction to MLflow
Introduction to MLflow
Databricks
What's hot
(20)
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
Scaling containers with keda
Scaling containers with keda
Apache Kylin Introduction
Apache Kylin Introduction
Spark tunning in Apache Kylin
Spark tunning in Apache Kylin
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
Deploying Confluent Platform for Production
Deploying Confluent Platform for Production
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Introduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK Stack
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
AWSKRUG DS - 데이터 엔지니어가 실무에서 맞닥뜨리는 문제들
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Introduction to MLflow
Introduction to MLflow
Similar to Apache Kylin – Cubes on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
BigDataEverywhere
Real time-hadoop
Real time-hadoop
Ted Dunning
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
John Berns
Keys for Success from Streams to Queries
Keys for Success from Streams to Queries
DataWorks Summit/Hadoop Summit
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
Ted Dunning
Dunning time-series-2015
Dunning time-series-2015
Ted Dunning
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
DataWorks Summit
Stinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Hortonworks
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
MapR Technologies
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Xu Jiang
Introduction to Spark
Introduction to Spark
Carol McDonald
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
MapR Technologies
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Yang Li
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem- Impetus Technologies
Impetus Technologies
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
Ian Downard
Predictive Analytics with Hadoop
Predictive Analytics with Hadoop
DataWorks Summit
Recommendation Techn
Recommendation Techn
Ted Dunning
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Mathieu Dumoulin
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
Seshu Adunuthula
Introduction to Spark on Hadoop
Introduction to Spark on Hadoop
Carol McDonald
Similar to Apache Kylin – Cubes on Hadoop
(20)
Big Data Everywhere Chicago: SQL on Hadoop
Big Data Everywhere Chicago: SQL on Hadoop
Real time-hadoop
Real time-hadoop
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
Keys for Success from Streams to Queries
Keys for Success from Streams to Queries
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
Dunning time-series-2015
Dunning time-series-2015
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
Stinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Introduction to Spark
Introduction to Spark
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Apache Kylin: Hadoop OLAP Engine, 2014 Dec
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem- Impetus Technologies
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
Predictive Analytics with Hadoop
Predictive Analytics with Hadoop
Recommendation Techn
Recommendation Techn
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
Introduction to Spark on Hadoop
Introduction to Spark on Hadoop
More from DataWorks Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
Managing the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
Security Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
More from DataWorks Summit
(20)
Data Science Crash Course
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Recently uploaded
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Puma Security, LLC
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Drew Madelung
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Delhi Call girls
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
The Digital Insurer
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Principled Technologies
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
hans926745
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Miguel Araújo
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Martijn de Jong
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
ThousandEyes
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
debabhi2
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
apidays
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Malak Abu Hammad
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
Maria Levchenko
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
RTylerCroy
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Igalia
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Anna Loughnan Colquhoun
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Digital Insurer
Recently uploaded
(20)
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
🐬 The future of MySQL is Postgres 🐘
🐬 The future of MySQL is Postgres 🐘
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
Apache Kylin – Cubes on Hadoop
1.
© 2014 MapR
Technologies 1© 2014 MapR Technologies
2.
© 2014 MapR
Technologies 2 Who I am Ted Dunning, Chief Applications Architect, MapR Technologies Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning VP Incubator Email tdunning@apache.org Twitter @ApacheMahout @ApacheDrill Credit for slides to Luke Han and the Kylin dev team
3.
© 2014 MapR
Technologies 3 Kylin Committers ankur Ankur Bansal jiangxu Jiang Xu liyang Li Yang lukehan Luke Han* mahongbin Hongbin Ma xduo Xiaodong Duo yisong George Song jhyde Julian Hyde The real deal Calcite plenipotentiary
4.
© 2014 MapR
Technologies 4 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
5.
© 2014 MapR
Technologies 5 What is Kylin? Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine from (originally from eBay) that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite form • Open Sourced on Oct 1st, 2014 • Accepted into incubation November, 2014 • Preparing for first Apache release
6.
© 2014 MapR
Technologies 6 Big Data Obligatory Slide • More and more data becoming available on Hadoop • Limitations in existing Business Intelligence (BI) Tools – Limited support for Hadoop – Data size growing exponentially – High latency of interactive queries – Scale-Up architecture • Challenges to adopt Hadoop as interactive analysis system – Majority of analyst groups are SQL savvy – No mature SQL interface on Hadoop – OLAP capability on Hadoop ecosystem not ready yet
7.
© 2014 MapR
Technologies 7 Goals • Sub-second query latency on billions of rows • ANSI SQL for both analysts and engineers • Full OLAP capability to offer advanced functionality • Seamless Integration with BI Tools • Support for high cardinality and dimensionality • High concurrency – thousands of end users • Distributed and scale out architecture for large data volume
8.
© 2014 MapR
Technologies 8 Possible Strategies • Build from scratch – A grand tradition – Large-scale SQL support is much harder than it looks – Huge level of distraction • Patch Hive – Not feasible due to design assumptions in Hive – Weak optimizer – Hive isn’t standard SQL anyway – (but isn’t Hive moving to Calcite?)
9.
© 2014 MapR
Technologies 9 Kylin’s Strategy • Use Calcite as SQL core – Real SQL – Real cost-based optimizer – Already in Apache – Provides linkage to Apache Drill and future of Hive • Build cubes externally – Don’t care which tools, currently Hive, soon Spark • Use Calcite’s Rex interpreter – Assumes final aggregations fit on one machine • Possibly integrate with Drill at some point for parallel execution
10.
© 2014 MapR
Technologies 10 Transaction Operation Strategy Analytics Query Taxonomy High Level Aggregation • Very High Level, e.g GMV by site by vertical by weeks Analysis Query • Mid-level, e.g GMV by site by vertical, by category (level x) past 12 weeks Drill Down to Detail • Detail Level (Summary Table) Low Level Aggregation • First Level Aggregation Transaction Level • Transaction Data OLAP Kylin is designed to accelerate 80+% of analytics queries on Hadoop OLTP
11.
© 2014 MapR
Technologies 11 Technical Challenges • Huge volume data – Table scan • Big table joins – Data shuffling • Analysis on different granularity – Runtime aggregation expensive • Map Reduce job – Batch processing
12.
© 2014 MapR
Technologies 12 How Cubes Work • Start with a simple table – revenue,time,item,location,supplier • Build a table of aggregates for every combination of fields select sum(revenue), max(revenue), supplier from tbl group by time,item,location; select sum(revenue), max(revenue), location,supplier from tbl group by time,item; select sum(revenue), max(revenue), location from tbl group by time,item,supplier; … • Then transform queries using appropriate magic select sum(revenue), city from tbl join location_details where state = ‘MN’ group by city select … from (select sum(),location from cube) join location_details where state = ‘MN’ group by city
13.
© 2014 MapR
Technologies 13 How Cubes Don’t Work • Total number of cubes is exponential in columns • High cardinality can result in large cubes • Skewed data can make cubes larger as original data • Magic may be insufficient to recognize cubable queries • Keeping cubes up to date can be hard • Forget OLTP thoughts like pervasive transactions
14.
© 2014 MapR
Technologies 15 OLAP Cube – Balance between Space and Time Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item> Cuboid = one combination of dimensions Cube = all combinations of dimensions 1111 0111 1011 1101 1110 0011 0101 0110 1001 1010 1100 0001 0010 0100 1000 0000
15.
© 2014 MapR
Technologies 16 From Relational to Key-Value
16.
© 2014 MapR
Technologies 17 Kylin Architecture Overview 17 Cube Build Engine (MapReduce…) SQL Low Latency - SecondsMid Latency - Minutes Routing 3rd Party App (Web App, Mobile…) Metadata SQL Tools (BI Tools: Tableau…) Query Engine Hadoop Hive REST API JDBC/ODBC Online Analysis Data Flow Offline Data Flow Clients/Users interactive with Kylin via SQL OLAP Cube is transparent to users Star Schema Data Key Value Data Data CubeOLAP Cube (HBase) SQL REST Server
17.
© 2014 MapR
Technologies 18 Kylin Depends on Hadoop Eco-system • Hive – Input source, pre-join star schema during cube building • MapReduce – Aggregate metrics during cube building • HDFS – Store intermediate files during cube building • HBase – Store and query data cubes • Calcite – SQL parsing, code generation, optimization
18.
© 2014 MapR
Technologies 19 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
19.
© 2014 MapR
Technologies 20 Kylin Highlights • Extremely Fast OLAP Engine at Scale Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data to seconds • ANSI SQL Interface on Hadoop Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions • Seamless Integration with BI Tools Kylin currently offers integration capability with BI Tools like Tableau. • Interactive Query Capability Users can interact with Hadoop data via Kylin at sub-second latency • MOLAP Cube User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records
20.
© 2014 MapR
Technologies 21 More Highlights • Compression and Encoding Support • Incremental Refresh of Cubes • Approximate Query Capability for distinct Count (HyperLogLog) • Leverage HBase Coprocessor for query latency • Job Management and Monitoring • Easy Web interface to manage, build, monitor and query cubes • Security capability to set ACL at Cube/Project Level • Support LDAP Integration
21.
© 2014 MapR
Technologies 22 Cube Designer
22.
© 2014 MapR
Technologies 23 Job Management
23.
© 2014 MapR
Technologies 24 Query and Visualization
24.
© 2014 MapR
Technologies 25 Tableau Integration
25.
© 2014 MapR
Technologies 26 Data Modeling Points of View Cube: … Fact Table: … Dimensions: … Measures: … Storage(HBase): … Fact Dim Dim Dim Source Star Schema row A row B row C Column Family Val 1 Val 2 Val 3 Row Key Column Target HBase Storage Mapping Cube Metadata End User Cube Modeler Admin
26.
© 2014 MapR
Technologies 27 Process Flow Source Joined tables Build dict Dimension dictionaries Hive
27.
© 2014 MapR
Technologies 28 Process Flow Joined n cuboid n-1 cuboids Apex cuboid MR MR Dimension dictionaries MR
28.
© 2014 MapR
Technologies 29 Process flow n cuboid n-1 cuboids Apex cuboid MR H-files HBase
29.
© 2014 MapR
Technologies 30 How To Store Cube? – HBase Schema
30.
© 2014 MapR
Technologies 31 SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name, test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT FROM test_kylin_fact LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id WHERE test_kylin_fact.seller_id = 123456OR test_kylin_fact.lstg_format_name = ’New' GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name,test_sites.site_name OLAPToEnumerableConverter OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8]) OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0]) OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ’New'))]) OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]]) Query Engine – Kylin Explain Plan
31.
© 2014 MapR
Technologies 32 Now Let’s Make it Really Work • Full Cube – Pre-aggregate all dimension combinations – “Curse of dimensionality”: N dimension cube has 2N cuboid. • Partial Cube – To avoid dimension explosion, we divide the dimensions into different aggregation groups • 2N+M+L 2N + 2M + 2L – For cube with 30 dimensions, if we divide these dimensions into 3 group, the cuboid count is reduced from 1 Billion to 3 thousand • 230 210 + 210 + 210 – Tradeoff between online aggregation and offline pre-aggregation
32.
© 2014 MapR
Technologies 34 Incremental Cube Building
33.
© 2014 MapR
Technologies 36 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
34.
© 2014 MapR
Technologies 37 # Query Type Return Dataset Query On Kylin (s) Query On Hive (s) Comments 1 High Level Aggregation 4 0.129 157.437 1,217 times 2 Analysis Query 22,669 1.615 109.206 68 times 3 Drill Down to Detail 325,029 12.058 113.123 9 times 4 Drill Down to Detail 524,780 22.42 6383.21 278 times 5 Data Dump 972,002 49.054 N/A 0 50 100 150 200 SQL #1 SQL #2 SQL #3 Hive Kylin High Level Aggregati on Analysis Query Drill Down to Detail Low Level Aggregati on Transactio n Level Based on 12+B records Kylin vs. Hive
35.
© 2014 MapR
Technologies 38 Performance Scaleout Linear scale out with more nodes
36.
© 2014 MapR
Technologies 39 Performance - Query Latency 99 %-ile 95 %-ile
37.
© 2014 MapR
Technologies 40 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
38.
© 2014 MapR
Technologies 41 201520142013 Initial Prototype for MOLAP • Basic end to end POC MOLAP • Incremental Refresh • ANSI SQL • ODBC Driver • Web GUI • ACL • Open Source HOLAP • Streaming OLAP • JDBC Driver • New UI • Excel Support • … more Next Gen • Automation • Capacity Management • In-Memory Analysis (TBD) • Spark (TBD) • … more TBD Future… Sep, 2013 Jan, 2014 Sep, 2014 Q1, 2015 Kylin History and Roadmap
39.
© 2014 MapR
Technologies 42 Kylin Ecosystem • Kylin Core – Fundamental framework of Kylin OLAP Engine • Extension – Plugins to support for additional functions and features • Integration – Lifecycle Management Support to integrate with other applications • Interface – Allows for third party users to build more features via user-interface atop Kylin core • Driver – ODBC and JDBC Drivers Kylin OLAP Core Extension Security Redis Storage Spark Engine Docker Interface Web Console Customized BI Ambari/Hue Plugin Integration ODBC Driver ETL Drill SparkSQL
40.
© 2014 MapR
Technologies 44 If you want to go fast, go alone. If you want to go far, go together. --African Proverb
41.
© 2014 MapR
Technologies 45 Agenda • What is Apache Kylin? • Features & Tech Highlights • Performance • Roadmap • Q & A
42.
© 2014 MapR
Technologies 46 Q&A @mapr maprtech tdunning@mapr.com Engage with us! MapR maprtech mapr-technologies