Telco Analytics @Scale
Harikumar, Director Platform & Architecture
www.subex.com
Nov 2016
Private & Confidential
Subex Intro
Subex BSS/OSS Portfolio
Data Crunching - Use Cases & Latency
[Chart: use cases placed by Latency and Algorithmic Complexity]
Latency
• Real Time (Milliseconds)
• Near Real Time (Seconds)
• Micro Batch (Minutes)
• Batch (Hours-Days)
Use cases
• Reporting
• Aggregation
• Rule Engine
• Profiling
• Machine Learning
• Audits
• Graph/Network Analysis
• Text Search
• Natural Language Processing
Stream Processing & Complex Event Processing (CEP)
Event Processing in the Eventful World
• (Aggregated) Event Data is combined/correlated with
• Users
• Assets
• Threats
• Vulnerabilities
• Location
• Historical
Techniques
• Rule Engine
• Event filtering
• Event aggregation and transformation
• Operate on stored and streaming data
• SQL-like semantics over stream data
• Supervised/Unsupervised machine learning
• Applying known Models
• Event Pattern Detection
• Detecting Event relationships
Areas
• Real-time fraud detection
• Real-time rating
• Security Information and Event Management
• Sensor Data/IoT
• DPI Data – Metadata, Content, Flow correlation
• M2M Data
• Data Fraud – Malware
• Transaction Risk Scoring
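As an illustration of the "Event Pattern Detection" technique applied to an area such as fraud detection, here is a minimal sketch that flags a key once it produces a given number of events inside a sliding time window. The class name `ThresholdPatternDetector`, the threshold, the window length and the sample events are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class ThresholdPatternDetector:
    """Flags a key when it produces `threshold` events within `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self._recent = defaultdict(deque)  # key -> timestamps still inside the window

    def observe(self, key, ts):
        """Record one event; return True if the pattern is matched at this event."""
        q = self._recent[key]
        q.append(ts)
        # Drop timestamps that have slid out of the window ending at `ts`.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) >= self.threshold


# Hypothetical sample: (subscriber, event time in seconds).
detector = ThresholdPatternDetector(threshold=3, window_s=60)
events = [("sub-1", 0), ("sub-2", 5), ("sub-1", 20), ("sub-1", 45), ("sub-1", 200)]
alerts = [(key, ts) for key, ts in events if detector.observe(key, ts)]
print(alerts)
```

Only the third `sub-1` event falls inside a 60-second window with two earlier ones, so it is the only alert. A production CEP engine would also have to cope with out-of-order input, which this sketch does not.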
Key considerations
Stream processing
• Keep the data moving (Low Latency) – In Memory
• Distributed Message Queues
• Distributed In Memory Caches
• Distributed In Memory Stores
• Scalable, Highly Available Distributed Stream Processing (Partition Data & Scale, Data Safety & High Availability)
• Handle Stream Imperfections (Delayed, Missing, Out-Of-Order Data)
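One common way to handle delayed and out-of-order data is to buffer events briefly and release them in timestamp order behind a watermark. The sketch below shows that idea only; it is not taken from the slides, and the `ReorderBuffer` class, the `allowed_lateness` parameter and the sample events are hypothetical.

```python
import heapq


class ReorderBuffer:
    """Buffers events and releases them in timestamp order behind a watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self._heap = []               # min-heap of (timestamp, seq, payload)
        self._seq = 0                 # tie-breaker so payloads are never compared
        self._max_ts = float("-inf")  # highest timestamp seen so far
        self.dropped = []             # events that arrived behind the watermark

    @property
    def watermark(self):
        return self._max_ts - self.allowed_lateness

    def push(self, ts, payload):
        """Accept one event; return the events that are now safe to emit."""
        if ts < self.watermark:
            self.dropped.append((ts, payload))  # too late to be ordered correctly
            return []
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1
        self._max_ts = max(self._max_ts, ts)
        ready = []
        while self._heap and self._heap[0][0] <= self.watermark:
            t, _, p = heapq.heappop(self._heap)
            ready.append((t, p))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order (end of stream)."""
        ready = [(t, p) for t, _, p in sorted(self._heap)]
        self._heap.clear()
        return ready


buf = ReorderBuffer(allowed_lateness=5)
emitted = []
for ts, payload in [(1, "a"), (4, "c"), (2, "b"), (10, "d"), (3, "late"), (12, "e")]:
    emitted.extend(buf.push(ts, payload))
emitted.extend(buf.flush())
print(emitted, buf.dropped)
```

The trade-off is latency against completeness: a larger `allowed_lateness` tolerates more delay but holds events longer before emitting them.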
ETL @ Scale – In Memory Distributed Cache
• Problem Statement(s)
• Scale the ETL enrichment/lookups layer
• High throughput + Streaming Low latency
• Support Multiple Access mechanisms
• GET/PUT
• SQL
• Views
[Diagram: three ETL JVMs and three Cache JVMs alongside an RDBMS, with Read, Write and Update operations]
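The GET/PUT access path for enrichment lookups can be sketched as a partitioned, read-through cache in front of a reference database. This is a single-process illustration under stated assumptions, not the product's implementation: `PartitionedLookupCache`, the `subscriber` table and its columns are hypothetical, in-process dictionaries stand in for the cache JVMs, and SQLite stands in for the RDBMS.

```python
import sqlite3
import zlib


class PartitionedLookupCache:
    """Read-through cache split into partitions, standing in for cache JVMs."""

    def __init__(self, db, n_partitions=3):
        self.db = db
        self.partitions = [{} for _ in range(n_partitions)]
        self.db_reads = 0  # counts lookups that had to go to the database

    def _partition(self, key):
        # Stable hash so the same key always lands on the same partition.
        return self.partitions[zlib.crc32(key.encode()) % len(self.partitions)]

    def get(self, key):
        part = self._partition(key)
        if key not in part:
            self.db_reads += 1
            row = self.db.execute(
                "SELECT plan FROM subscriber WHERE msisdn = ?", (key,)
            ).fetchone()
            part[key] = row[0] if row else None
        return part[key]

    def put(self, key, value):
        # Write-through: update the reference store, then the cached copy.
        self.db.execute(
            "INSERT OR REPLACE INTO subscriber (msisdn, plan) VALUES (?, ?)",
            (key, value),
        )
        self._partition(key)[key] = value


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, plan TEXT)")
db.execute("INSERT INTO subscriber VALUES ('msisdn-0001', 'prepaid')")

cache = PartitionedLookupCache(db)
records = [{"msisdn": "msisdn-0001", "duration": 42},
           {"msisdn": "msisdn-0001", "duration": 7}]
# ETL enrichment step: attach the subscriber's plan to every record.
enriched = [dict(r, plan=cache.get(r["msisdn"])) for r in records]
print(enriched, cache.db_reads)
```

Both records are enriched with a single database read, which is the point of moving lookups out of the RDBMS and into memory.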
Rule Engine – In Memory Aggregation
[Diagram: processing stages for each rule (Rule 1, Rule 2, ... Rule n)]
• Event / I/P Data Record
• Event Filters
• Filtered Records
• Aggregation Layer
• Condition Evaluation
• Actions
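The stages named in the diagram (filter, aggregate, evaluate condition, act) can be sketched as follows. This is a minimal single-process illustration, not the engine itself: the `Rule` fields, the `process` function and the sample rule and records are hypothetical, and a plain dictionary stands in for the in-memory aggregation store.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    event_filter: Callable[[dict], bool]  # which records the rule sees
    key: Callable[[dict], str]            # aggregation key
    value: Callable[[dict], float]        # quantity to accumulate
    condition: Callable[[float], bool]    # evaluated on the running total
    totals: dict = field(default_factory=lambda: defaultdict(float))


def process(record, rules, actions):
    """Run one record through filter -> aggregate -> condition -> action."""
    for rule in rules:
        if not rule.event_filter(record):
            continue
        k = rule.key(record)
        rule.totals[k] += rule.value(record)
        if rule.condition(rule.totals[k]):
            actions.append((rule.name, k, rule.totals[k]))


# Hypothetical rule: raise an alarm once international minutes exceed 100.
rules = [
    Rule("high_intl_usage",
         event_filter=lambda r: r["type"] == "intl_call",
         key=lambda r: r["subscriber"],
         value=lambda r: r["minutes"],
         condition=lambda total: total > 100),
]
alarms = []
for rec in [
    {"type": "intl_call", "subscriber": "A", "minutes": 60},
    {"type": "local_call", "subscriber": "A", "minutes": 500},
    {"type": "intl_call", "subscriber": "A", "minutes": 50},
]:
    process(rec, rules, alarms)
print(alarms)
```

The local call is discarded by the event filter, so only the two international calls contribute to the running total that trips the condition.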
[Diagram: three identical Shared Memory blocks, each containing]
• Page Pools: 8K, 16K, 32K, ... 256K
• Key / Value Byte Stream SerDe
• IM Log
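The page-pool layout suggests size-class allocation: each serialized key/value pair goes into the smallest free page that can hold it. The sketch below illustrates that idea under several assumptions that are not in the slides: the intermediate 64K and 128K sizes are inferred by doubling, `pickle` stands in for the SerDe, in-process `bytearray` pages stand in for shared memory, and the `PagePools` class and its methods are hypothetical.

```python
import pickle

# 8K, 16K, 32K, 64K, 128K, 256K (64K and 128K are assumed, not shown on the slide).
PAGE_SIZES = [(8 * 1024) << i for i in range(6)]


class PagePools:
    """Hands out fixed-size pages from per-size free lists."""

    def __init__(self, pages_per_pool=2):
        self.free = {size: [bytearray(size) for _ in range(pages_per_pool)]
                     for size in PAGE_SIZES}

    def store(self, key, value):
        """Serialize a key/value pair into the smallest free page that fits it."""
        blob = pickle.dumps((key, value))
        for size in PAGE_SIZES:
            if len(blob) <= size and self.free[size]:
                page = self.free[size].pop()
                page[:len(blob)] = blob
                return size, page, len(blob)
        raise MemoryError("no free page large enough")

    def load(self, page, length):
        """Deserialize the key/value pair held in the first `length` bytes."""
        return pickle.loads(bytes(page[:length]))

    def release(self, size, page):
        """Return a page to its pool for reuse."""
        self.free[size].append(page)


pools = PagePools()
small_size, small_page, small_len = pools.store("sub-1", {"calls": 3})
large_size, large_page, large_len = pools.store("sub-2", "x" * 20000)
print(small_size, large_size, pools.load(small_page, small_len))
```

Fixed size classes avoid fragmentation at the cost of some unused space at the end of each page.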
Data placement Strategies
Application Data
• Application configuration data – Rule libraries, DNA Configurations, Configurations – MySQL
• Application generated data – Alarms, Discrepancies – MySQL
• Operations Data (Application generated, Infra Monitoring) – Logs, Audit, Metrics – Solr
• Application Aggregations – Summary/Pre-aggregated data – Hive Tables
• Statistical Profiles, In Memory aggregation files – HDFS
Traditional Telco Data
• Telco Entity Data – With Update Semantics – HBase/MySQL
• Telco Historic Transaction Data – Hive with ORC file format, partitioned by date, stored in HDFS
• Switch Input Raw Files – HDFS
Other Sources
• Social Media
• DPI Flow Data
• Location Data
• IoT Sensor Data
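For the historic transaction data partitioned by date, Hive lays out each partition as a `column=value` directory under the table location. The sketch below derives that directory from an event timestamp. The function name `partition_path`, the `dt` column name and the `/warehouse/cdr` table root are hypothetical, and the choice of UTC for the day boundary is an assumption.

```python
from datetime import datetime, timezone
from posixpath import join


def partition_path(table_root, event_epoch_s, column="dt"):
    """Return the Hive-style partition directory for an event timestamp."""
    day = datetime.fromtimestamp(event_epoch_s, tz=timezone.utc).strftime("%Y-%m-%d")
    return join(table_root, f"{column}={day}")


# 1478563200 is 2016-11-08 00:00:00 UTC.
path = partition_path("/warehouse/cdr", 1478563200)
print(path)
```

Partitioning by date lets queries over a date range read only the matching directories instead of scanning the whole table.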
Data Flow
[Diagram. Labeled components:]
• Landing Directory (SAN/HDFS)
• Apache Flume: Flume – Dir Source, Flume – Kafka Source, Flume – Spark Sink
• Message Queue
• Apache Kafka
• Dist Message Queue
• DB Sources
• Sqoop/CDC Tools
• ETL Adaptors
• Apache Spark Streaming
• Submit Spark Jobs
• Pre-aggregation
• In Memory Rule Engine
• Analytics Applications
• …
• Application Data
• Data Lake
• HDFS – Raw File Backup
• HDFS
• Hive Tables
• HBase Tables
• Solr – Search Indexes
• Audits
• MySQL – Ref DB
• Data Access: Hive/Presto
• Distributed Cache
• Operational Metrics (OM)
• Data Load Stage
Data Platform – Business and Domain Packaging
[Diagram. Labeled components:]
• Data Management
• Data Acquisition/Ingest
• Data Federation F/W
• Data Processing
• PreAggregation
• Distributed Stream Processing – Apache Spark
• Data Visualization & Analysis
• Mobile F/W
• ROC View
• Case Management
• Standard APIs – EAI & WS
• Analytics Engine
• Reconciliation Engine
• BPM-Workflow Engine
• Flexible ETL
• Rule Processing – In Memory
• Common Data Model
• Distributed Cache
• Control Panel
• Operations & Admin
• Resource Mgmt
• Data Security
• Audit & Logging
• Scheduler
• Network Analysis
• ROC Insights
• Real time Message based
• Distributed Message Queue
• Hadoop – HDFS, Hive, HBase
• Multi-tenancy
• Machine Learning
• Enterprise Search
• Real time Continuous Query – CEP
• Document Store
• Graph Data Store
• Authorization & Authentication
• Real time Rating
• Profiling
• Cloud Metering
• Risk Scoring
• Cloud connectors
• API Mgmt
• Infrastructure
• On premise OS/Servers/Network/Storage
• IaaS (Public/Private cloud)
• ESB
• Analytic Models
Thank you
Harikumar
hari.kumar@Subex.com
