Big Telco Real-Time Network Analytics
Yousun Jeong
Who am I?
§ Senior Software Engineer at SK Telecom, South Korea’s largest wireless communications provider
§ Worked on commercial products (~ ’17)
- Worked with Big Data solutions
- Worked with IaaS (OpenStack)
- Worked with PaaS (CloudFoundry)

§ Mail to : jerryjung@apache.org
Table of Contents
§ Big Data in SK Telecom
§ History of SKT's big data
§ Overall Architecture
§ Use case: Real-Time Network Analytics
Big Data in SKT in a Nutshell
§ Data Size
- Currently collecting 100 TB/day
§ Big Data Management Infrastructure
- Hadoop cluster (1400+ nodes); migrated from MPP RDBMS
§ Overall Architecture
- Spark
- Druid
§ Real-Time Network Analytics
- Real-Time Processing
- Hadoop DW
- Big Data Discovery
History of SKT’s Big Data
2013
§ Batch Processing (Daily)
§ MapReduce Programming
§ Hadoop HDFS
2014
§ Batch Processing (Hourly, Daily)
§ SQL on Hadoop
§ Hive (UDF, UDAF)
2015
§ Real-time Processing (Near real-time)
§ Hadoop DW
§ Spark (Streaming, SQL)
Now
§ Big Data OLAP cube
§ Self Data Discovery
§ Druid
Overall Architecture
§ Designed to handle both real-time and batch data processing, as well as high-level analysis, using Spark and Druid as core technologies
【 Components 】
[Figure: overall architecture]
- Layers: Interface Layer, Batch Processing Layer (Hadoop EDW), Real-Time Layer (Real-Time analysis), Analytics Layer, Data Service Layer
- Ingestion and storage: Flume, Kafka, HDFS, NoSQL, Elasticsearch
- Processing: Oozie (workflow), Spark (ETL), Spark Streaming, Spark SQL, Spark MLlib, Jupyter (R, Python)
- Serving: Druid (Mart), Metatron (BI), Legacy App
- Resource management: YARN (Unified Resource Manager), Kubernetes
- Infrastructure: H/W Accelerator (SSD, FPGA), Provisioning (PXEBoot/Chef)
Benefits of Spark
§ Spark helps us gain processing speed and implement various big data applications easily and quickly
§ Why SKT uses Spark…
- Support for Event Stream Processing
- Fast Data Queries in Real Time
- Improved Programmer Productivity
- Fast Batch Processing of Large Data Set
Benchmark - SQL on Hadoop
§ Spark vs Hive
Table 1: Time per query, Spark vs Hive (Tez)
Query ID   Spark   Hive (Tez)
Q01        47s     68s
Q02        16s     62s
Q03        47s     190s
Q04        61s     122s
Q05        62s     115s
Q06        50s     61s
Q07        72s     207s
Q08        107s    133s
Q09        133s    390s
Q10        57s     110s
Q11        191s    47s
Q12        59s     70s
Q13        25s     54s
Q14        50s     54s
Q15        56s     69s
Q16        40s     81s
Q17        143s    139s
Q18        147s    195s
Q19        60s     85s
Q20        81s     114s
Q21        228s    232s
Q22        21s     91s
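The per-query timings in Table 1 can be summarized with a few lines of arithmetic. The sketch below only re-uses the numbers from the table; the variable names are illustrative and the script is not part of the original benchmark.

```python
# Timings in seconds copied from Table 1; index 0 is Q01, index 21 is Q22.
spark = [47, 16, 47, 61, 62, 50, 72, 107, 133, 57, 191,
         59, 25, 50, 56, 40, 143, 147, 60, 81, 228, 21]
hive_tez = [68, 62, 190, 122, 115, 61, 207, 133, 390, 110, 47,
            70, 54, 54, 69, 81, 139, 195, 85, 114, 232, 91]

query_ids = [f"Q{i:02d}" for i in range(1, 23)]

spark_total = sum(spark)
hive_total = sum(hive_tez)

# Queries where Hive (Tez) finished sooner than Spark.
hive_faster = [q for q, s, h in zip(query_ids, spark, hive_tez) if h < s]

print(f"Spark total: {spark_total}s, Hive (Tez) total: {hive_total}s")
print(f"Overall ratio (Hive / Spark): {hive_total / spark_total:.2f}")
print(f"Hive (Tez) faster on: {hive_faster}")
```

Summed over all 22 queries, Spark takes 1753s and Hive (Tez) 2689s; Hive (Tez) is faster only on Q11 and Q17.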
Benefits of Druid
§ Druid is a distributed in-memory OLAP data store. It features timestamp-based sharding, columnar indexing & compression, and pre-aggregation of metrics
§ Why SKT uses Druid…
- Sub-second processing capability
- Stores aggregated summary data for time-series data
- Separate processing engines (real-time and historical) support analytics at the same time
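To make "timestamp-based sharding" and "pre-aggregation" concrete, here is a minimal plain-Python sketch of the idea: raw events are truncated to a time bucket and summed per (bucket, dimension). It is not Druid code, and the field names (`cell_id`, `nbytes`) and the hourly granularity are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def rollup(events, granularity_seconds=3600):
    """Pre-aggregate raw events into one row per (time bucket, dimension)."""
    buckets = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for ts, cell_id, nbytes in events:
        epoch = int(ts.timestamp())
        # Truncate the timestamp to the start of its bucket.
        bucket_start = epoch - epoch % granularity_seconds
        row = buckets[(bucket_start, cell_id)]
        row["count"] += 1
        row["bytes_sum"] += nbytes
    return dict(buckets)

# Hypothetical raw events: (timestamp, cell id, bytes transferred).
raw = [
    (datetime(2017, 1, 3, 10, 5, tzinfo=timezone.utc), "cell-A", 100),
    (datetime(2017, 1, 3, 10, 40, tzinfo=timezone.utc), "cell-A", 250),
    (datetime(2017, 1, 3, 10, 59, tzinfo=timezone.utc), "cell-B", 70),
    (datetime(2017, 1, 3, 11, 1, tzinfo=timezone.utc), "cell-A", 30),
]
summary = rollup(raw)
print(len(raw), "raw rows ->", len(summary), "aggregated rows")
```

Queries then scan the smaller aggregated rows instead of the raw events, which is one reason a pre-aggregating store can answer time-series queries quickly.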
[Figure: Druid architecture. Labels: Streaming Data, Batch Data, Indexing, Data segments, Realtime Nodes, Hand off Data, Deep Storage (HDFS/S3), Historical Nodes, Broker, Coordinator, MetaData, Queries]
Druid vs Spark Performance Comparison
§ Druid and Spark produce different results because the engines differ in nature.
§ Druid vs Spark
- Druid converts data into OLAP-optimized, pre-aggregated, indexed, columnar structures
- Druid has a separate ingestion overhead
- Druid is excellent in terms of memory and disk I/O compared to Spark
- Spark is able to process all TPC-H queries
https://github.com/jaehc/tpch-spark/tree/feature-run-multiple-queries

http://druid.io/blog/2014/03/17/benchmarking-druid.html
Druid vs Spark Performance Comparison
§ SUM_ALL_YEAR
- SELECT YEAR(L_SHIPDATE), SUM(L_EXTENDEDPRICE), SUM(L_DISCOUNT), SUM(L_TAX), SUM(L_QUANTITY)
  FROM LINEITEM
  GROUP BY YEAR(L_SHIPDATE)
§ TOP_100_PARTS_DETAILS
- SELECT L_PARTKEY, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE), MIN(L_DISCOUNT), MAX(L_DISCOUNT)
  FROM LINEITEM
  GROUP BY L_PARTKEY
  ORDER BY SUM(L_QUANTITY) DESC
  LIMIT 100
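For readers who want to try the two benchmark queries without a cluster, the following sketch runs equivalent statements against a tiny in-memory SQLite table. This is an assumption-laden stand-in, not the benchmark setup: the three sample rows are made up, SQLite has no YEAR() function so `strftime('%Y', ...)` replaces it, and an ORDER BY is added to the first query only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE LINEITEM (
        L_PARTKEY INTEGER, L_QUANTITY REAL, L_EXTENDEDPRICE REAL,
        L_DISCOUNT REAL, L_TAX REAL, L_SHIPDATE TEXT
    )
""")
# Hypothetical sample rows, only to exercise the queries.
conn.executemany(
    "INSERT INTO LINEITEM VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 1000.0, 0.05, 0.02, "1995-03-01"),
        (1, 5, 500.0, 0.10, 0.01, "1995-07-15"),
        (2, 7, 700.0, 0.00, 0.03, "1996-01-20"),
    ],
)

# SUM_ALL_YEAR, with strftime('%Y', ...) standing in for YEAR().
sum_all_year = conn.execute("""
    SELECT strftime('%Y', L_SHIPDATE) AS SHIP_YEAR,
           SUM(L_EXTENDEDPRICE), SUM(L_DISCOUNT), SUM(L_TAX), SUM(L_QUANTITY)
    FROM LINEITEM
    GROUP BY SHIP_YEAR
    ORDER BY SHIP_YEAR
""").fetchall()

# TOP_100_PARTS_DETAILS, unchanged apart from layout.
top_parts = conn.execute("""
    SELECT L_PARTKEY, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
           MIN(L_DISCOUNT), MAX(L_DISCOUNT)
    FROM LINEITEM
    GROUP BY L_PARTKEY
    ORDER BY SUM(L_QUANTITY) DESC
    LIMIT 100
""").fetchall()

for row in sum_all_year:
    print(row)
for row in top_parts:
    print(row)
```

Both queries are full-table aggregations over LINEITEM: the first groups by a low-cardinality time dimension, the second by a high-cardinality key followed by a sort.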
Use cases: Summary
Real-Time Network Analytics
TANGO-D
• TANGO (T Advanced Next Generation OSS)-D (Data warehouse)
• Timely end-to-end network quality assurance and fault analysis
APOLLO
• APOLLO (Analytics PlatfOrm for inteLLigent Operation)
• Real-time analysis of the radio access network to improve operation efficiency
Metatron Discovery
• Metatron: a big data discovery & analytics solution developed by SKT
• Interactive analysis for network engineers, operators, and data scientists
Use Case 1: Apollo Real-Time Analytics
§ APOLLO aims to improve mobile user experience, reduce operation cost, and improve
operation efficiency by analyzing radio access networks
[Figure: APOLLO overview]
- Data Collecting: Call Data, RF Signal, Customer/Service, Device Data, A/F/S
- Real-Time Analytics Platform
- Analytics Output: Root Cause Finding, Anomaly Detection, Optimization, Resource Monitoring, KPI Detection, Predictive Analysis, Service Analysis
- Analytics-based Control: OAM, Operator
- Real-time Monitoring & Optimization, Engineering Optimization, Network Intelligence
* APOLLO : Analytics PlatfOrm for inteLLigent Operation
Use Case 1: Apollo Real-Time Analytics
§ APOLLO collects and analyzes raw data from base stations in real time to optimize the
service performance
§ Spark Streaming
- Processes raw data to obtain statistics every 10 seconds
- Automatically detects abnormality
§ Real-Time User/Service Level Optimization
- Predicts traffic variation and base station performance
- Minimizes degradation in base station and user performance
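The "statistics every 10 seconds" plus "detect abnormality" pattern can be sketched in plain Python as below. This is not the APOLLO implementation and does not use the Spark Streaming API. The slides do not say how abnormality is detected, so the z-score rule, the threshold, and the sample values are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW_SECONDS = 10  # window length taken from the slide

def window_stats(samples):
    """Group (epoch_seconds, value) samples into 10-second windows and average them."""
    windows = defaultdict(list)
    for ts, value in samples:
        window_start = int(ts) // WINDOW_SECONDS * WINDOW_SECONDS
        windows[window_start].append(value)
    return {start: mean(values) for start, values in sorted(windows.items())}

def detect_anomalies(stats, threshold=2.0):
    """Return windows whose mean is more than `threshold` standard deviations from the overall mean."""
    values = list(stats.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [start for start, v in stats.items() if abs(v - mu) > threshold * sigma]

# Hypothetical measurements: steady values, then a spike in the window starting at t=50.
samples = [(t, 10.0) for t in range(0, 50, 5)] + [(52, 100.0), (57, 100.0)]
stats = window_stats(samples)
anomalies = detect_anomalies(stats)
print(stats)
print("anomalous windows:", anomalies)
```

In a streaming engine the same two steps would run per micro-batch or per window rather than over a finished list.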
[Figure: Real-Time Analytics. Labels: Base Station, Kafka, Spark Streaming (Data Parsing, Data Converting, Real-time Processing, RDD), Spark, Storage, Elasticsearch, Dashboard]
Use case 2: TANGO-D
§ TANGO-D is a Hadoop DW that can handle big telco data with scalability & cost efficiency
【 Legacy IT Infrastructure 】
“High-price, High-performance Proprietary IT Infrastructure System”
- Data: Structured data; scale-up structure (terabyte)
- H/W: High-performance H/W (MPP, fabric switch, etc.)
- S/W: Proprietary S/W (RDBMS, etc.)
【 SKT DW Infrastructure 】
“Hadoop S/W and Commodity H/W Based Cost-effective IT Infrastructure System”
- Data: Structured/unstructured data; scale-out structure (petabyte, exabyte)
- H/W: Commodity H/W (x86 server)
- S/W: Hadoop architecture, SQL on Hadoop
Also shown in the S/W row: Transaction/Batch Processing (SQL), Hadoop File System
※ MPP: Massively Parallel Processing, SAN: Storage Area Network, NAS: Network Attached Storage, RDBMS: Relational DB Management System
Use case 2: TANGO-D
§ Data scientists need a unified platform that collects data from all network equipment for management and analysis
§ Expected advantages
- Unification of 130+ legacy DBMSs, each of which served a separate network monitoring system, enabling thorough analysis over the entire network
- Quick and accurate identification of the root causes of network failures
[Figure: AS-WAS vs AS-IS]
[ AS-WAS ] Siloed Data & IT Management: each NMS (NMS#1 … NMS#N-1) across Access NW, Core NW, and Transport has its own DBMS
[ AS-IS ] Network Enterprise DW: legacy NMS (#1 … #N), new NMS (#1 … #N), and BI & Analytic… connect to the Hadoop DW
Use case 2: TANGO-D
§ TANGO-D is a Hadoop-based data warehouse built on Spark for various network statistics or raw data
§ User Benefits
- End-to-end quality assurance and fault analysis
- Reduces analysis lead time (days → minutes)
- Saves TCO (1/5 less than legacy DW)
§ Hadoop DW
- Spark SQL functions and query optimizer
- Bulk loading and timely processing of large data (2,500 tables per hour)
[Figure: TANGO-D data flow. Labels: Access, Core, Transport, and IP EMS; T-Pani; ETL; ODS; Hadoop DW (DW Data, Data Mart); SQL on Hadoop (Spark SQL); MQE (Meta Query Engine); Analytics SQL; BI]
Use case 3: Metatron Discovery
§ We developed the Metatron Discovery solution for quick and easy data analysis and applied it to our in-house big data system
Challenges
- Complex to analyze: separate software is needed for each step of data discovery
- Too slow for big data: no support for real-time analysis
- Lack of analytics functions and visualization charts for telecom analysis
Architecture
- Application (Visualization, Data Preparation, Workbench)
- Analysis & Analytics tools (Jupyter, Prediction, Clustering)
- Data Processing Engine (OLAP Engine)
- Big Data Storage, File system
Key Features
- Intuitive Analysis: makes it easy to analyze big data with end-to-end functionality, from data preparation to analysis charts.
- Big OLAP Cube: minimizes ETL cost, speeds up analysis, and supports schema changes by creating a single large-capacity Big Mart that combines various dimension data.
- Sub-second Processing: by moving data between in-memory, local storage, and deep storage over time, it can respond quickly to data volumes over a TB.
- Advanced Analytics: provides analysis functions in conjunction with Jupyter, plus fast time-series forecasting and clustering with embedded analytics.
Use case 3: Metatron Discovery
§ Metatron Discovery enables E2E analysis on a unified analytics platform
§ User Benefits
- Operational BI for network engineers and operators
- Works with Jupyter to perform advanced analysis
- Easy drill-down search through a drag-and-drop interface
[Figure: Metatron Discovery on TANGO-D]
- Users: Executive Officer, Network Operator, Field Engineer, Biz. Partner
- Functions: Operational BI, Advanced Analytics, Data Discovery
- TANGO-D: N/W Data Repository, Analytics Platform, E2E Inventory (Access, Transport, Core/ICT)
- Work areas: Planning and Investment Strategy, Engineering, Construction, Operation, Work & TT Management, Network Monitoring
Use case 3: Metatron Discovery
§ Metatron's core engine is Druid, which answers queries quickly by caching results per time granularity
[Figure: Druid query process between TANGO-D (Hadoop DW) and the Druid cluster]
- Hadoop Cluster (DW): HDFS, metastore, Oozie; HDFS (Deep Storage) holds the data/segments
- Druid Cluster: Broker, Historical Nodes (segments in memory and on disk), Coordinator Nodes (segment metadata), Zookeeper
- Example query: 2017-01-03 ~ 2017-01-08
- Cache (Broker Nodes): cache entries for result segments 2017-01-03/2017-01-04 and 2017-01-07/2017-01-08
- Querying (not in cache) goes to the Historical Node: segments 2017-01-04/2017-01-05 and 2017-01-07/2017-01-08 are shown
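The cache behaviour in the query-process diagram can be illustrated with a small simulation: a broker splits the queried interval into per-day segments, answers cached segments from memory, and asks a historical node only for the rest. This is a simplified sketch, not Druid code; the `Broker` class, the `scan_segment` function, and the per-segment result of 10 rows are hypothetical.

```python
from datetime import date, timedelta

def daily_segments(start, end):
    """Split the interval [start, end) into one-day segment intervals."""
    day = start
    while day < end:
        yield (day, day + timedelta(days=1))
        day += timedelta(days=1)

class Broker:
    def __init__(self, historical_lookup):
        self.cache = {}                  # segment interval -> per-segment result
        self.historical_lookup = historical_lookup
        self.historical_calls = []       # segments that had to be fetched

    def query(self, start, end):
        total = 0
        for segment in daily_segments(start, end):
            if segment not in self.cache:
                # Cache miss: ask a historical node and remember the answer.
                self.historical_calls.append(segment)
                self.cache[segment] = self.historical_lookup(segment)
            total += self.cache[segment]
        return total

def scan_segment(segment):
    """Hypothetical historical-node scan returning a row count for the segment."""
    return 10

broker = Broker(scan_segment)
# Pre-populate the cache with the two result segments shown in the figure.
broker.cache[(date(2017, 1, 3), date(2017, 1, 4))] = 10
broker.cache[(date(2017, 1, 7), date(2017, 1, 8))] = 10

result = broker.query(date(2017, 1, 3), date(2017, 1, 8))
print(result, "rows;", len(broker.historical_calls), "segments fetched from historical nodes")
```

Running the same query a second time touches no historical node, because every per-segment result is now cached at the broker.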
Use case 3: Metatron Discovery
§ Metatron Discovery consists of three parts (Workspace, Workbench, Jupyter), so each type of user gets a suitable analysis environment.
§ Workspace
- General Network Engineer & Operator
§ Workbench
- Advanced Analyst
§ Jupyter
- Statistical Analyst
[Figure: Metatron Discovery architecture]
- Metatron Discovery: Workspace (Fixed Report, Dynamic Report), Workbench (Data Analytics (SQL)), Jupyter (R/Python, ad-hoc data analytics)
- Druid Cluster: Broker Nodes, Historical Nodes, Real-Time Nodes, Coordinator Nodes, Zookeeper, Deep Storage
- TANGO-D (Hadoop DW Cluster): Spark SQL Thrift Server, SparkSQL, Yarn, Oozie, HDFS
- Data paths: Direct Query, DW/Mart Data Batch, special-region synchronization (Sqoop)
Containerized Environment of Analytics(Ongoing)
§ The analysis environment can be deployed as Docker containers configured for individual analysis environments, with container resources managed as needed using Kubernetes and GlusterFS
[Figure: containerized analytics environment. Labels: K8S Masters, K8S Nodes (#1 … #N), [Container], GlusterFS (private, shared), Nginx, [Provisioning], Docker Registry, Admin, User]
Self-Data Preparation(Ongoing)
§ Data preparation makes it easy for anyone to do the tedious and repetitive ETL tasks that preprocess data for visualization and analysis
Self-Data Analytics(Ongoing)
§ Data analysts can interact with Metatron Discovery to run analytics and create REST APIs directly from Jupyter
Metatron
§ If you have any questions, please visit here - https://metatron.sktelecom.com/
THANK YOU

More Related Content

What's hot

03-NOV-1510-Ognjen-Antonic-Telemach-stream-1
03-NOV-1510-Ognjen-Antonic-Telemach-stream-103-NOV-1510-Ognjen-Antonic-Telemach-stream-1
03-NOV-1510-Ognjen-Antonic-Telemach-stream-1Ognjen Antonic
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesSingleStore
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHParis Data Engineers !
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...Michael Stack
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Spark Summit
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemDatabricks
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Backup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesBackup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesLeandro Totino Pereira
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Tugdual Grall
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeDataWorks Summit
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 

What's hot (20)

03-NOV-1510-Ognjen-Antonic-Telemach-stream-1
03-NOV-1510-Ognjen-Antonic-Telemach-stream-103-NOV-1510-Ognjen-Antonic-Telemach-stream-1
03-NOV-1510-Ognjen-Antonic-Telemach-stream-1
 
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational DatabasesReal-Time Data Pipelines with Kafka, Spark, and Operational Databases
Real-Time Data Pipelines with Kafka, Spark, and Operational Databases
 
Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVH
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
 
Video Analysis in Hadoop
Video Analysis in HadoopVideo Analysis in Hadoop
Video Analysis in Hadoop
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Backup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipesBackup multi-cloud solution based on named pipes
Backup multi-cloud solution based on named pipes
 
Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015Proud to be Polyglot - Riviera Dev 2015
Proud to be Polyglot - Riviera Dev 2015
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 

Similar to Stsg17 speaker yousunjeong

Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsYousun Jeong
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...Facultad de Informática UCM
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드문기 박
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Lviv Startup Club
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...DataWorks Summit
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataData Con LA
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkLenovo Data Center
 

Similar to Stsg17 speaker yousunjeong (20)

Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
SQream-GPU가속 초거대 정형데이타 분석용 SQL DB-제품소개-박문기@메가존클라우드
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
 

More from Yousun Jeong

Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesYousun Jeong
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidYousun Jeong
 
Kafka for begginer
Kafka for begginerKafka for begginer
Kafka for begginerYousun Jeong
 
Data Analytics with Druid
Data Analytics with DruidData Analytics with Druid
Data Analytics with DruidYousun Jeong
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례Yousun Jeong
 
2012 07 28_cloud_reference_architecture_openplatform
2012 07 28_cloud_reference_architecture_openplatform2012 07 28_cloud_reference_architecture_openplatform
2012 07 28_cloud_reference_architecture_openplatformYousun Jeong
 

More from Yousun Jeong (8)

Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druid
 
Kubernetes on aws
Kubernetes on awsKubernetes on aws
Kubernetes on aws
 
Kafka for begginer
Kafka for begginerKafka for begginer
Kafka for begginer
 
Data Analytics with Druid
Data Analytics with DruidData Analytics with Druid
Data Analytics with Druid
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례
Enterprise 환경에서의 오픈소스 기반 아키텍처 적용 사례
 
2012 07 28_cloud_reference_architecture_openplatform
2012 07 28_cloud_reference_architecture_openplatform2012 07 28_cloud_reference_architecture_openplatform
2012 07 28_cloud_reference_architecture_openplatform
 


Stsg17 speaker yousunjeong

  • 1. Big Telco Real-Time Network Analytics
    Yousun Jeong
  • 2. Who am I?
    § Senior Software Engineer at SK Telecom, South Korea’s largest wireless communications provider
    § Work on commercial products (~ ’17)
      - Worked on Big Data solutions
      - Worked on IaaS (OpenStack)
      - Worked on PaaS (Cloud Foundry)
    § Mail to: jerryjung@apache.org
  • 3. Table of Contents
    § Big Data in SK Telecom
    § History of SKT's big data
    § Overall Architecture
    § Use case: Real-Time Network Analytics
  • 4. Big Data in SKT in a Nutshell
    § Data Size
      - Currently collecting 100 TB/day
    § Big Data Management Infrastructure
      - Hadoop cluster (1400+ nodes); migrated from MPP RDBMS
    § Overall Architecture
      - Spark
      - Druid
    § Real-Time Network Analytics
      - Real-Time Processing
      - Hadoop DW
      - Big Data Discovery
  • 5. [Slide with no extractable text]
  • 6. History of SKT’s Big Data
    § 2013
      - Batch Processing (Daily)
      - Map-Reduce Programming
      - Hadoop HDFS
    § 2014
      - Batch Processing (Hourly, Daily)
      - SQL on Hadoop
      - Hive (UDF, UDAF)
    § 2015
      - Real-time Processing (Near real-time)
      - Hadoop DW
      - Spark (Streaming, SQL)
    § Now
      - Big Data OLAP cube
      - Self Data Discovery
      - Druid
  • 7. Overall Architecture
    § Designed to handle both real-time and batch data processing, plus high-level analysis, using Spark and Druid as core technologies
    [Architecture diagram. Layers: Interface Layer, Batch Processing Layer (Hadoop EDW), Real-Time Layer (Real-Time analysis), Analytics Layer, Data Service Layer. Components: Flume, Kafka, HDFS, Oozie (workflow), Spark (ETL), Spark Streaming, Spark SQL, Spark MLlib, Jupyter (R, Python), Druid (Mart), Metatron (BI), NoSQL, Elasticsearch, Legacy App, YARN (Unified Resource Manager), Kubernetes, H/W Accelerator (SSD, FPGA), Provisioning (PXEBoot/Chef)]
  • 8. Benefits of Spark
    § Spark gives us gains in processing speed and lets us implement a variety of big data applications quickly and easily
    § Why SKT uses Spark
      - Support for Event Stream Processing
      - Fast Data Queries in Real Time
      - Improved Programmer Productivity
      - Fast Batch Processing of Large Data Sets
  • 9. Benchmark - SQL on Hadoop
    § Spark vs Hive
    Table 1. Query time by query ID

      Query ID   Spark   Hive (Tez)
      Q01        47s     68s
      Q02        16s     62s
      Q03        47s     190s
      Q04        61s     122s
      Q05        62s     115s
      Q06        50s     61s
      Q07        72s     207s
      Q08        107s    133s
      Q09        133s    390s
      Q10        57s     110s
      Q11        191s    47s
      Q12        59s     70s
      Q13        25s     54s
      Q14        50s     54s
      Q15        56s     69s
      Q16        40s     81s
      Q17        143s    139s
      Q18        147s    195s
      Q19        60s     85s
      Q20        81s     114s
      Q21        228s    232s
      Q22        21s     91s
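To make the Spark vs Hive (Tez) benchmark easier to read, the per-query speedup can be computed directly from the timings. The snippet below is a small sketch that only restates the slide's numbers; the variable names are illustrative and not from the original deck.

```python
# Timings in seconds, transcribed from the slide's benchmark table (Q01..Q22).
spark = [47, 16, 47, 61, 62, 50, 72, 107, 133, 57, 191,
         59, 25, 50, 56, 40, 143, 147, 60, 81, 228, 21]
hive_tez = [68, 62, 190, 122, 115, 61, 207, 133, 390, 110, 47,
            70, 54, 54, 69, 81, 139, 195, 85, 114, 232, 91]

queries = [f"Q{i:02d}" for i in range(1, 23)]

# Speedup > 1 means Spark finished faster than Hive (Tez) on that query.
speedup = {q: h / s for q, s, h in zip(queries, spark, hive_tez)}

# Queries where Spark was slower than Hive (Tez).
spark_slower = [q for q in queries if speedup[q] < 1.0]

total_spark = sum(spark)
total_hive = sum(hive_tez)

for q in queries:
    print(f"{q}: {speedup[q]:.2f}x")
print("Spark slower on:", spark_slower)
print("Total seconds (Spark, Hive):", total_spark, total_hive)
```

With these numbers Spark is slower only on Q11 and Q17, and the summed run time is 1753 s for Spark against 2689 s for Hive (Tez).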
  • 10. Benefits of Druid
    § Druid is a distributed in-memory OLAP data store. It features timestamp-based sharding, columnar indexing and compression, and pre-aggregation of metrics
    § Why SKT uses Druid
      - Sub-second processing capability
      - Stores aggregated summary data for time-series data
      - Separate processing engines (real-time and historical) support analytics at the same time
    [Diagram labels: Streaming Data, Batch Data, Indexing, Data segments, Realtime Nodes, Hand off Data, Deep Storage (HDFS/S3), Historical Nodes, Broker, Coordinator, MetaData, Queries]
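The pre-aggregation idea can be illustrated without Druid itself. The following is a minimal pure-Python sketch, not Druid's implementation: raw events are truncated to a time bucket and grouped by a dimension, so only one summary row per (bucket, dimension) is stored. The event layout (`epoch_s`, `cell_id`, `nbytes`) and the 60-second granularity are assumptions made for illustration.

```python
from collections import defaultdict


def rollup(events, granularity_s=60):
    """Pre-aggregate raw events into one summary row per (time bucket, dimension).

    `events` is an iterable of (epoch_s, cell_id, nbytes) tuples; this layout
    is hypothetical and chosen only for the example.
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for epoch_s, cell_id, nbytes in events:
        bucket_start = epoch_s - epoch_s % granularity_s  # truncate the timestamp
        row = summary[(bucket_start, cell_id)]
        row["count"] += 1
        row["bytes"] += nbytes
    return dict(summary)


raw_events = [
    (0, "cell-A", 100),
    (15, "cell-A", 50),
    (30, "cell-B", 70),
    (61, "cell-A", 10),
]
rolled = rollup(raw_events)
for key in sorted(rolled):
    print(key, rolled[key])
```

Four raw events collapse into three stored rows here; queries that sum `bytes` over time then scan the summary rows rather than the raw events, which is the trade-off the slide refers to as storing aggregated summary data.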
  • 11. Druid vs Spark Performance Comparison
    § Druid and Spark produce different results because the two engines differ in nature
    § Druid vs Spark
      - Druid converts data into OLAP-optimized, pre-aggregated, indexed, columnar structures
      - Druid has a separate ingestion overhead
      - Druid is excellent in terms of memory and disk I/O compared to Spark
      - Spark is able to process all TPC-H queries
    https://github.com/jaehc/tpch-spark/tree/feature-run-multiple-queries
    http://druid.io/blog/2014/03/17/benchmarking-druid.html
  • 12. Druid vs Spark Performance Comparison
    § SUM_ALL_YEAR
      - SELECT YEAR(L_SHIPDATE), SUM(L_EXTENDEDPRICE), SUM(L_DISCOUNT), SUM(L_TAX), SUM(L_QUANTITY) FROM LINEITEM GROUP BY YEAR(L_SHIPDATE)
    § TOP_100_PARTS_DETAILS
      - SELECT L_PARTKEY, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE), MIN(L_DISCOUNT), MAX(L_DISCOUNT) FROM LINEITEM GROUP BY L_PARTKEY ORDER BY SUM(L_QUANTITY) DESC LIMIT 100
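To see what the two benchmark queries compute, they can be run against a toy LINEITEM table. This sketch uses Python's built-in `sqlite3` rather than Spark or Druid, the three rows are made up for illustration, and because SQLite has no `YEAR()` function the first query substitutes `strftime('%Y', ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE LINEITEM (
           L_PARTKEY INTEGER, L_QUANTITY REAL, L_EXTENDEDPRICE REAL,
           L_DISCOUNT REAL, L_TAX REAL, L_SHIPDATE TEXT)"""
)
# Made-up rows; real TPC-H data would come from the dbgen tool.
rows = [
    (1, 10, 1000.0, 0.05, 0.02, "1995-03-01"),
    (2, 5, 400.0, 0.10, 0.04, "1995-07-15"),
    (1, 20, 2000.0, 0.00, 0.02, "1996-01-10"),
]
conn.executemany("INSERT INTO LINEITEM VALUES (?, ?, ?, ?, ?, ?)", rows)

# SUM_ALL_YEAR, with strftime('%Y', ...) standing in for YEAR(...).
sum_all_year = conn.execute(
    """SELECT strftime('%Y', L_SHIPDATE) AS ship_year,
              SUM(L_EXTENDEDPRICE), SUM(L_DISCOUNT), SUM(L_TAX), SUM(L_QUANTITY)
       FROM LINEITEM
       GROUP BY ship_year
       ORDER BY ship_year"""
).fetchall()

# TOP_100_PARTS_DETAILS, unchanged from the slide.
top_parts = conn.execute(
    """SELECT L_PARTKEY, SUM(L_QUANTITY), SUM(L_EXTENDEDPRICE),
              MIN(L_DISCOUNT), MAX(L_DISCOUNT)
       FROM LINEITEM
       GROUP BY L_PARTKEY
       ORDER BY SUM(L_QUANTITY) DESC
       LIMIT 100"""
).fetchall()

print(sum_all_year)
print(top_parts)
```

Both queries are single-table aggregations with a GROUP BY, which is the query shape that a pre-aggregated, columnar store is designed for.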
  • 13. Use cases: Summary
    § Real-Time Network Analytics
      - TANGO-D: TANGO (T Advanced Next Generation OSS)-D (Data warehouse). End-to-end network quality assurance and fault analysis in a timely manner
      - APOLLO (Analytics PlatfOrm for inteLLigent Operation). Real-time analysis of the radio access network to improve operation efficiency
      - Metatron Discovery: big data discovery & analytics solution developed by SKT. Interactive analysis for network engineers, operators, and data scientists
  • 14. Use Case 1: Apollo Real-Time Analytics
    § APOLLO aims to improve mobile user experience, reduce operation cost, and improve operation efficiency by analyzing radio access networks
    * APOLLO: Analytics PlatfOrm for inteLLigent Operation
    [Diagram labels: Call Data, RF Signal, Customer/Service, Device Data, A/F/S, Data Collecting, Real-Time Analytics Platform, Analytics Output, Root Cause Finding, Anomaly Detection, Optimization, Resource Monitoring, Predictive Analysis, Service Analysis, KPI Detection, Real-time Monitoring & Optimization, Engineering Optimization, Network Intelligence, Analytics based Control, OAM, Operator]
  • 15. Use Case 1: Apollo Real-Time Analytics
    § APOLLO collects and analyzes raw data from base stations in real time to optimize service performance
    § Spark Streaming
      - Processes raw data to obtain statistics every 10 seconds
      - Automatically detects abnormalities
    § Real-Time User/Service Level Optimization
      - Predicts traffic variation and base station performance
      - Minimizes degradation in base station and user performance
    [Diagram (Real-Time Analytics) labels: Base Station, Kafka, Spark Streaming, Data Parsing, Data Converting, Real time Processing, RDD, Spark, Storage, Elasticsearch, Dashboard]
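The 10-second statistics and automatic abnormality detection described for APOLLO can be sketched in plain Python. This is not the Spark Streaming job from the talk: the sample layout `(epoch_s, station, value)`, the function names, and the drop rule (a window mean below half the mean of that station's earlier windows) are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from statistics import mean

WINDOW_S = 10  # window length stated on the slide


def window_stats(samples, window_s=WINDOW_S):
    """Group (epoch_s, station, value) samples into tumbling windows.

    Returns {(station, window_start): mean value in that window}.
    """
    grouped = defaultdict(list)
    for epoch_s, station, value in samples:
        grouped[(station, epoch_s - epoch_s % window_s)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def detect_drops(stats, ratio=0.5):
    """Flag windows whose mean falls below `ratio` times the mean of the
    station's earlier windows. The rule is hypothetical, for illustration."""
    history = defaultdict(list)
    flagged = []
    for (station, start), value in sorted(stats.items()):
        past = history[station]
        if past and value < ratio * mean(past):
            flagged.append((station, start))
        past.append(value)
    return flagged


samples = [
    (0, "bs-1", 100.0), (4, "bs-1", 110.0),   # window starting at 0
    (12, "bs-1", 104.0),                      # window starting at 10
    (21, "bs-1", 30.0), (25, "bs-1", 40.0),   # window starting at 20
]
stats = window_stats(samples)
flagged = detect_drops(stats)
print(stats)
print("abnormal windows:", flagged)
```

In a streaming engine the same grouping would be expressed as a windowed aggregation over the incoming Kafka records, with the detection step applied to each window's output.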
  • 16. Use case 2: TANGO-D
    § TANGO-D is a Hadoop DW that can handle big telco data with scalability & cost efficiency
    § Legacy IT Infrastructure: “High-price, High-performance Proprietary IT Infrastructure System”
      - Data: Structured Data; Scale-up Structure (Terabyte)
      - H/W: High Performance H/W (MPP, Fabric Switch, etc.)
      - S/W: Proprietary S/W (RDBMS, etc.); Transaction/Batch Processing (SQL)
    § SKT DW Infrastructure: “Hadoop S/W and Commodity H/W Based Cost-effective IT Infrastructure System”
      - Data: Structured/Un-structured Data; Scale-out Structure (Petabyte, Exabyte)
      - H/W: Commodity H/W (x86 Server)
      - S/W: Hadoop Architecture; SQL on Hadoop; Hadoop File System
    ※ MPP: Massively Parallel Processing; SAN: Storage Area Network; NAS: Network Attached Storage; RDBMS: Relational DB Management System
  • 17. Use case 2: TANGO-D
    § Data scientists need a unified platform that collects data from all network equipment for management and analysis
    § Expected advantages
      - Unification of 130+ legacy DBMSs, each of which was managing a separate network monitoring system, enabling thorough analysis over the entire network
      - Quick and accurate identification of root causes of network failure
    [Diagram. AS-WAS (Siloed Data & IT Management): Access NW, Core NW, Transport; NMS #1 … NMS #N-1, each with its own DBMS. AS-IS (Network Enterprise DW): Legacy NMS #1 … #N and NEW NMS #1 … #N, Hadoop DW, BI & Analytic…]
  • 18. Use case 2: TANGO-D
    § TANGO-D is a Hadoop-based data warehouse built on Spark for various network statistics and raw data
    § User Benefits
      - End-to-end quality assurance, fault analysis
      - Reduces analysis lead time (days → minutes)
      - Saves TCO (1/5 less than legacy DW)
    § Hadoop DW
      - Spark SQL functions and query optimizer
      - Bulk loading and timely processing of large data (processing 2,500 tables per hour)
    [Diagram labels: Access, Core, Transport, IP, EMS, T-Pani, ETL, ODS, Hadoop DW, DW Data, Data Mart, SQL on Hadoop (Spark SQL), MQE (Meta Query Engine), SQL, Analytics, BI]
  • 19. Use case 3: Metatron Discovery
    § We developed the Metatron Discovery solution for quick and easy data analysis and applied it to our in-house big data system
    § Challenges
      - Complex to analyze: separate software is needed for each step of data discovery
      - Too slow for big data; no support for real-time analysis
      - Lack of analytics functions and visualization charts for telecom analysis
    § Key Features
      - Intuitive Analysis: makes it easy to analyze big data with end-to-end functionality, from data preparation to analysis charts
      - Big OLAP Cube: minimizes ETL cost, speeds up processing, and supports schema changes by combining various dimension data into a single large-capacity Big Mart
      - Sub-second Processing: by moving data across in-memory, local storage, and deep storage over time, it can respond quickly to large-capacity data over a TB
      - Advanced Analytics: provides analysis functions in conjunction with Jupyter, plus fast time-series forecasting and clustering with embedded analytics
    § Architecture
      - Application (Visualization, Data Preparation, Workbench)
      - Analysis & Analytics tools (Jupyter, Prediction, Clustering)
      - Data Processing Engine (OLAP Engine)
      - Big Data Storage, File system
  • 20. Use case 3: Metatron Discovery
    § Metatron Discovery enables E2E analysis to be performed on a unified analytics platform
    § User Benefits
      - Operational BI for network engineers and operators
      - Works with Jupyter to perform advanced analysis
      - Easy drill-down search through a drag-and-drop interface
    [Diagram labels: Executive Officer, Network Operator, Field Engineer, Biz. Partner; TANGO-D; Access, Transport, Core/ICT; Planning and Investment Strategy, Engineering, Construction, Operation, Work & TT Management, Network Monitoring; N/W Data Repository, Analytics Platform, E2E Inventory; Operational BI, Advanced Analytics, Data Discovery]
  • 21. Use case 3: Metatron Discovery
    § Metatron's core engine is Druid, which can answer queries quickly by time granularity using a cache
    [Diagram (Druid Query Process, steps 1 to 4). Druid Cluster: Broker, Historical Nodes, Coordinator Nodes, Zookeeper; segments in memory and on disk, cache entries, segment metadata. TANGO-D (Hadoop DW): Hadoop Cluster (DW), HDFS (Deep Storage), metastore, Oozie. Example: querying 2017-01-03 ~ 2017-01-08; Cache (Broker Nodes) holds result segments 2017-01-03/2017-01-04 and 2017-01-07/2017-01-08; querying (not in cache) goes to Historical Node segments 2017-01-04/2017-01-05 and 2017-01-07/2017-01-08]
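The cache-by-time-granularity behaviour can be sketched as follows. This is a simplified model rather than Druid's broker code: a query over 2017-01-03 ~ 2017-01-08 is split into daily intervals, cached per-day results are reused, and only the missing days are scanned. The function names and the `scan_historical` stand-in are hypothetical.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Split [start, end) into one-day (start, end) intervals."""
    days = (end - start).days
    return [(start + timedelta(d), start + timedelta(d + 1)) for d in range(days)]


def run_query(start, end, cache, scan_historical):
    """Sum per-day results, scanning only the days that are not cached."""
    total, misses = 0, []
    for interval in daily_intervals(start, end):
        if interval not in cache:
            cache[interval] = scan_historical(interval)  # slow path
            misses.append(interval)
        total += cache[interval]
    return total, misses


def scan_historical(interval):
    """Hypothetical stand-in for a scan on a historical node."""
    return 1


# Cached days taken from the slide's example; the values are made up.
cache = {
    (date(2017, 1, 3), date(2017, 1, 4)): 10,
    (date(2017, 1, 7), date(2017, 1, 8)): 20,
}
total, misses = run_query(date(2017, 1, 3), date(2017, 1, 8), cache, scan_historical)
print(total, misses)

# Repeating the query is now served entirely from the cache.
total_again, misses_again = run_query(
    date(2017, 1, 3), date(2017, 1, 8), cache, scan_historical
)
print(total_again, misses_again)
```

Keying the cache by time interval is what lets overlapping queries with different date ranges share work, since each only pays for the intervals nobody has asked about yet.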
  • 22. Use case 3: Metatron Discovery
    § Metatron Discovery consists of 3 parts (Workspace, Workbench, Jupyter), so each user can work in a different analysis environment
    § Workspace
      - General Network Engineer & Operator
    § Workbench
      - Advanced Analyst
    § Jupyter
      - Statistical Analyst
    [Diagram. Metatron Discovery: Workspace (Fixed Report, Dynamic Report), Workbench (Data Analytics (SQL), Ad-hoc), Jupyter (R/Python), Direct Query. TANGO-D (Hadoop DW Cluster): Oozie, Spark SQL Thrift Server, Yarn, SparkSQL, HDFS, Batch Data Analytics. Druid Cluster: Deep Storage, Historical Nodes, Real-Time Nodes, Broker Nodes, Zookeeper, Coordinator Nodes. Also shown: DW/Mart Data, Special-region synchronization (Sqoop)]
  • 23. Containerized Environment of Analytics (Ongoing)
    § The analysis environment can be deployed as Docker containers, configured per individual analysis environment, with container resources managed as needed using Kubernetes and GlusterFS
    [Diagram labels: Admin, User, Nginx, K8S Master, K8S Node #1 … K8S Node #N, GlusterFS (private, shared), Docker Registry, Container, Provisioning]
  • 24. Self-Data Preparation (Ongoing)
    § Data preparation makes it easy for anyone to do the tedious and repetitive ETL tasks that preprocess data for visualization and analysis
  • 25. Self-Data Analytics (Ongoing)
    § Data analysts can interact with Metatron Discovery to run analytics and create REST APIs directly from Jupyter
  • 26. Metatron
    § If you have any questions, please visit https://metatron.sktelecom.com/