Open Source Technologies in the Analytics Revolution
Zitong Wei
Rachel Beddor
Introduction
© Kyligence Inc. 2021, Confidential.
Open Source in Big Data
Why we love open source
• Free
• Customizable
• Innovation
• Choices
• Standard
• Personal growth
Data Pipeline for Analytics
Collect → Process → Store → Analyze
Data Pipeline for Analytics
• Structured Data
• Unstructured Data
• Semi-Structured Data
Typical Use Cases
• Customer Analysis
• Operational Efficiency
• Abnormal/Fraud Detection
• Recommendation
• Self Service Analysis
Open Source Projects
• Platform: Apache Hadoop, Apache Spark
• Ingestion & ETL: NiFi, Sqoop, Airflow, Gobblin
• Streaming: Kafka, Spark Streaming, Flink, Samza, Storm, Flume
• NoSQL: HBase, Cassandra, MongoDB
• SQL: Spark SQL, Hive, Impala, Presto
• Machine Learning: Python, R, TensorFlow
• Report & Visualization: Superset, Jupyter, Zeppelin
Platform
MapReduce
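The slide names MapReduce without spelling out the model, so here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using word count. It is not Hadoop code: the function names are illustrative assumptions, and a real cluster would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["open source", "Open data"]))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can run in parallel, which is what lets the model scale out.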
Ingestion & ETL
Storage
Analyze
Reporting & Visualization
SQL Engine
Apache Kylin
OLAP (Online Analytical Processing)
Good at:
• Analysis workloads: BI reporting, data discovery, etc.
• Quickly answering questions like:
  • What are our top 5 best-selling products in each state/city?
  • Which products should be put together?
  • What is our profit for beer in the US this year?
Not good at:
• Frequent updates/deletes
• Transactional data
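As a rough illustration of the first question above (top-selling products per state), the aggregation can be sketched in plain Python. The sales records, the state codes, and the function name are hypothetical and not taken from the slides; an OLAP engine would answer the same question from pre-aggregated data rather than by scanning rows.

```python
from collections import Counter, defaultdict

# Hypothetical sales records: (state, product, units_sold). Not from the slides.
SALES = [
    ("CA", "Beer", 120), ("CA", "Milk", 80), ("CA", "Juice", 60),
    ("CA", "Beer", 30), ("NY", "Milk", 130), ("NY", "Beer", 50),
    ("NY", "Juice", 90),
]


def top_products_by_state(records, n=5):
    """Sum units per (state, product), then keep the n best sellers per state."""
    totals = defaultdict(Counter)
    for state, product, units in records:
        totals[state][product] += units
    return {state: counter.most_common(n) for state, counter in totals.items()}


if __name__ == "__main__":
    for state, ranking in top_products_by_state(SALES, n=2).items():
        print(state, ranking)
```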
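OLAP Cubes
Dimensions: month (April, May, June), city (New York, Los Angeles, San Francisco), product (Beer, Milk, Juice)
Beer sales:
                 April   May   June
New York           120    80     60
Los Angeles         50   130     90
San Francisco       70    50    100
Q: How many beers were sold in Los Angeles in June?
A: 90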
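The cube lookup can be sketched as a dictionary keyed by one value per dimension. The nine beer figures come from the slide, read on the assumption that rows are cities and columns are months (the reading that yields the slide's answer of 90). The function names are illustrative, and the Milk and Juice cells are left out because the slide does not show them.

```python
# Beer cells from the slide, assuming rows = cities and columns = months.
MONTHS = ["April", "May", "June"]
BEER = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

# Flatten into a cube keyed by (product, city, month).
CUBE = {
    ("Beer", city, month): units
    for city, row in BEER.items()
    for month, units in zip(MONTHS, row)
}


def lookup(product, city, month):
    """Point query: read one pre-computed cell."""
    return CUBE[(product, city, month)]


def roll_up(product=None, city=None, month=None):
    """Sum all cells matching the given dimension values (None means 'all')."""
    return sum(
        units
        for (p, c, m), units in CUBE.items()
        if (product is None or p == product)
        and (city is None or c == city)
        and (month is None or m == month)
    )


if __name__ == "__main__":
    print(lookup("Beer", "Los Angeles", "June"))
    print(roll_up(city="Los Angeles"))
```

The point query is a single dictionary read, which is why a pre-built cube can answer it quickly; the roll-up shows how coarser questions reuse the same cells.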
Traditional OLAP Tools
Challenges in the Big Data Era
Traditional OLAP tools are great but…
• Difficult to handle massive data volumes
• Cube size limited by a single machine
• Have to maintain lots of cubes
• Hard to scale
• Takes a long time to build cubes
• Number of dimensions is limited
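One way to see why build time and dimension count become limiting: a full cube pre-aggregates every combination of dimensions, so N dimensions give 2^N aggregations (often called cuboids). The sketch below enumerates them. It assumes no hierarchies or pruning, and the dimension names are illustrative.

```python
from itertools import combinations


def cuboids(dimensions):
    """Every subset of the dimensions is one cuboid (one GROUP BY combination)."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


if __name__ == "__main__":
    dims = ["month", "city", "product"]
    for combo in cuboids(dims):
        print(combo)
    # The count doubles with each added dimension.
    for n in (3, 10, 20):
        print(n, "dimensions ->", 2 ** n, "cuboids")
```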
Journey of Apache Kylin
• Sept 2013: Project Initiated
• Oct 2014: Officially Open Source
• Nov 2014: Apache Incubator Project
• Sept 2015: InfoWorld Best Open Source Big Data Tool Award
• Nov 2015: Apache Top-Level Project
• Mar 2016: Kyligence Inc. Founded
Apache Kylin Architecture
BI Tools, Web App…
ANSI SQL
OLAP Cube
Performance Benchmark
Apache Kylin Data Flow
Kylin on Lambda
Demonstration
Demonstration – Technical Details
Demonstration – Dataset Details
• Fact Table: Movie Box Office Revenue
• Dimension Table: Dates
• Dimension Table: Movie Genres
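The dataset has a star-schema shape: one fact table joined to two dimension tables. A minimal sketch of that join and an aggregation follows. Every row, key, and column name is a hypothetical placeholder, since the slides give only the table names.

```python
from collections import defaultdict

# Hypothetical dimension tables keyed by surrogate id (placeholders, not demo data).
DATES = {1: {"year": 2019}, 2: {"year": 2020}}
GENRES = {10: {"genre": "Comedy"}, 11: {"genre": "Drama"}}

# Hypothetical fact table: each row references both dimensions and carries a measure.
BOX_OFFICE = [
    {"date_id": 1, "genre_id": 10, "revenue": 100.0},
    {"date_id": 1, "genre_id": 10, "revenue": 50.0},
    {"date_id": 2, "genre_id": 11, "revenue": 75.0},
]


def revenue_by_year_and_genre(fact_rows, dates, genres):
    """Join each fact row to its dimensions, then sum revenue per (year, genre)."""
    totals = defaultdict(float)
    for row in fact_rows:
        year = dates[row["date_id"]]["year"]
        genre = genres[row["genre_id"]]["genre"]
        totals[(year, genre)] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    print(revenue_by_year_and_genre(BOX_OFFICE, DATES, GENRES))
```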
Join the community
• GitHub: https://github.com/apache/kylin
• Slack: apache-kylin.slack.com
• Mailing list: user@kylin.apache.org
Contact Us
Kyligence Inc
• http://kyligence.io
• info@kyligence.io
• Twitter: @Kyligence
Apache Kylin
• http://kylin.apache.org
• dev@kylin.apache.org
• Twitter: @ApacheKylin

Editor's Notes

  1. MPP – massively parallel processing
  2. Mention that HBase will be removed in the next release. Kylin runs on a cluster.
  3. Mention that HBase will be removed in the next release.