Stream processing on mobile networks

In this presentation I describe the architecture of two of our Flink projects, both developed for customers in the telco industry.

Apache Flink in action – stream processing of mobile networks
Future of Data: Real Time Stream Processing with Apache Flink
Who we are
We are a company that deals with the processing of data, its storage, distribution and analysis. We combine advanced technology with expert services to deliver value for our customers. Our main focus is on big data technologies such as Hadoop, Kafka, NiFi and Flink.
Web: http://triviadata.com/
What we’re going to talk about
• Why mobile network operators need stream processing
• Architecture
• Business Challenges
• Operating Flink in a Hadoop environment
• Stream processing challenges in our use case
Network architecture
Credits: https://www.gl.com/images/gsm-gprs-umts-sigtran-protocol-analyzer-over-tdm-ip-ps-web.gif
[Diagram: data sources (probes, devices, ...) on 2G (BTS), 3G (NodeB) and 4G (eNodeB) produce binary data; stage labels: Conversion, Streaming, xDR]
Events - VOICE, SMS, DATA
• Date; Time; Event Type; MSISDN; VPN; IMEI; Duration; Locality; Performance; Closing Time; Relation; NULL; ...
• Date; Time; App; PortApp; IPCust; IPDest; SrcPort; DstPort; Start; Stop; Duration; ByteUp; ByteDn; nPacketUp; ...
• Date; Time; Event Type; MSISDN; VPN; Duration; Locality; Performance; Closing Time; Relation; NULL; ...
• Date; Time; Event Type; Customer APN; Network; Locality; Performance; Closing Time; Relation; Delay_Ans; ServiceProvider; CDNProvider; Domain/Host; nBigPacket; VLAN; SessionID
• Date; Time; MCC; MSISDN; Network; Locality; IMSI; IMEI; Performance; Closing Time; Relation
• Date; Time; Event Type; MSISDN; Length; Locality; Performance; Closing Time; Relation; NULL; ...
Data conversion
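The field lists above describe semicolon-delimited records. The following is a minimal sketch of turning one such line into a keyed record. It assumes the field order of the first event layout; the function name `parse_xdr` and the sample line are hypothetical and invented for illustration, not taken from the project.

```python
# Field order taken from the first event layout above; the trailing
# "NULL; ..." fields are not modelled in this sketch.
VOICE_FIELDS = [
    "Date", "Time", "Event Type", "MSISDN", "VPN", "IMEI",
    "Duration", "Locality", "Performance", "Closing Time", "Relation",
]


def parse_xdr(line, fields=VOICE_FIELDS):
    """Split a semicolon-delimited record and map values to field names.

    Empty strings and the literal NULL become None. Values beyond the
    known fields are ignored.
    """
    values = [v.strip() for v in line.split(";")]
    record = {}
    for name, value in zip(fields, values):
        record[name] = None if value in ("", "NULL") else value
    return record


# Hypothetical sample line (all values invented).
sample = (
    "2018-03-01; 12:00:05; VOICE; 421900000001; NULL; 356938035643809; "
    "42; BA-01; OK; 12:00:47; A-B"
)
record = parse_xdr(sample)
```

In a real pipeline each event type would need its own field list, and numeric fields such as Duration would be converted to typed values before further processing.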
Mobile operator’s data
Client’s transactions:
• SMS – simplest transaction (mostly a few records)
• Data – length of session = number of records
• Calls – most complex joining of records
Operator’s data:
• Network usage
• Billing events
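To illustrate why calls are the most complex case, here is a small sketch of joining partial records that belong to the same call. The record shape (`call_id`, `ts`, `type`) and the message types are assumptions made for this example; the slides do not show the actual correlation key or logic.

```python
from collections import defaultdict


def correlate_calls(records):
    """Group partial call records by call_id and merge each group.

    The record fields used here are hypothetical, not the real schema.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["call_id"]].append(rec)

    calls = {}
    for call_id, parts in groups.items():
        start = min(p["ts"] for p in parts)
        end = max(p["ts"] for p in parts)
        calls[call_id] = {
            "records": len(parts),
            "start": start,
            "end": end,
            "duration": end - start,
        }
    return calls


# Records arrive interleaved and not necessarily in order.
parts = [
    {"call_id": "c1", "ts": 100, "type": "INVITE"},
    {"call_id": "c2", "ts": 105, "type": "INVITE"},
    {"call_id": "c1", "ts": 103, "type": "ANSWER"},
    {"call_id": "c1", "ts": 163, "type": "BYE"},
]
calls = correlate_calls(parts)
```

In a streaming job the same grouping would be done with keyed state, and a call could only be emitted once its final record arrives or a timeout fires.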

Stream processing on mobile networks

  • 1. Apache Flink in action – stream processing of mobile networks Future of Data: Real Time Stream Processing with Apache Flink
  • 2. Who we are We are a company that deals with the processing of data, its storage, distribution and analysis. We combine advanced technology with expert services in order to obtain value for our customers. Main focus is on the big data technologies, like Hadoop, Kafka, NiFi, Flink. Web: http://triviadata.com/
  • 3. What we‘re going to talk about • Why mobile network operators need stream processing • Architecture • Business Challenges • Operating Flink in Hadoop environment • Stream processing challenges in our use case
  • 5. xDRStreamingConversion 2G BTS 3G NodeB 4G eNodeB 00101101001111100010101000100110111001000010 00101101000101010001001101110010000111110010 01101001101110010111000101010001001100010 10101101000101010001001101110010000111110010 0010111001001011010000111110101000100000010 0011101001101110010111000101010001001100010 101101001101110010111000101010001001100010 Events - VOICE, SMS, DATA • Date; Time; Event Type; MSISDN; VPN; IMEI; Duration; Locality; Performance; Closing Time; Relation; NULL; ... • Date; Time; App; PortApp; IPCust; IPDest; SrcPort; DstPor; Start; Stop; Duration; ByteUp; ByteDn; nPacketUp; ... • Date; Time; Event Type; MSISDN; VPN; Duration; Locality; Performance; Closing Time; Relation; NULL; ... • Date; Time; Event Type; Customer APN; Network; Locality; Performance; Closing Time; Relation; Delay_Ans; ServiceProvider; CDNProvider; Domain/Host; nBigPacket; VLAN; SessionID • Date; Time; MCC; MSISDN; Network; Locality; IMSI, IMEI Performance; Closing Time; Relation • Date; Time; Event Type; MSISDN; Lenght; Locality; Performance; Closing Time; Relation; NULL; ... Data conversion
  • 6. Mobile operator’s data Client’s transactions: • SMS – simplest transaction (mostly a few records) • Data – the length of the session determines the number of records • Calls – most complex; records have to be joined Operator’s data: • Network usage • Billing events
  • 7. Typical use cases in telco Customer oriented: • Fraud & security • Customer Experience Management (triggers alarms based on customer-related quality indicators; CEM KPI) • Fast issue diagnosis & customer support (reduce the Average Handling Time and improve the First Call Resolution rate) • Data source for analysis (community analysis, household detection, segmentation, churn prediction, behavioural analysis) Operation oriented: • Network performance overview • Service management support • Precise problem geolocation • End-to-end in-depth troubleshooting • Real-time fault detection • Automated troubleshooting (diagnosis, recovery) • QoS KPI trend analysis Constant monitoring of network, service and customer KPIs.
  • 8. Use cases in action • Network Analytics (web application) • Cell • User • Device • Getting raw data into HDFS for analysts – SQL queries via Impala
  • 9. They already do it • DWH style • Batch processing
  • 10. Challenges • Conversion from binary format (e.g. ASN.1) • Tightening the feedback loop • Have a solution ready for future use cases (anomaly detection, predictive maintenance) • Still allow people to run analytical queries on data
  • 12. Apache Kafka • De facto standard for stream processing • Fault tolerant • Highly scalable • We use it with • Avro (schema evolution) • Schema registry
  • 13. Apache Flink • Very flexible window definitions • Event time semantics • Many deployment options • Can handle large state
  • 14. Challenges • Running Flink on YARN • Secured Hadoop & Kafka cluster • Data onboarding • Side inputs/data enrichment • Storing data in Hadoop
  • 15. Flink on YARN • Big, fat, long-running YARN session • Or a Flink cluster per job:
        ${FLINK_HOME}/bin/flink run -m yarn-cluster -d \
          -ynm ${APPLICATION_NAME} -yn 2 -ys 2 -yjm 2048 -ytm 4096 \
          -c com.triviadata.streaming.job.SipVoiceStream ${JAR_PATH} \
          --kafkaServer ${KAFKA_SERVER} \
          --schemaRegistryUrl ${SCHEMA_REGISTRY_URL} \
          --sipVoiceTopic raw.SipVoice \
          --correlatedSipVoiceTopic result.SipVoiceCorrelated \
          --stateLocation ${FLINK_STATE_LOCATION} \
          --security-protocol SASL_PLAINTEXT \
          --sasl-kerberos-service-name kafka
  • 16. Kerberized Hadoop & Kafka • Easy & straightforward Flink setup • HBase/Phoenix privileges • Hassle with Kafka ACLs: ACL to read from the topic, ACL to write to the topic, ACL to join the consumer group • Flink configuration (flink-conf.yaml):
        security.kerberos.login.use-ticket-cache: false
        security.kerberos.login.keytab: /home/appuser/appuser.keytab
        security.kerberos.login.principal: appuser
        security.kerberos.login.contexts: Client,KafkaClient
  • 19. Side inputs/Data enrichment • Read code lists from HDFS • Store them in RocksDB on the local filesystem of the Data Node • Ask RocksDB to translate code -> value
  • 20. Side inputs/Data enrichment • Code list files on HDFS are updated once a day • A command topic notifies jobs about new files • Jobs refresh the code lists stored in RocksDB
  • 21. Storing data in Hadoop
  • 22. Apache Phoenix • OLTP DB on top of HBase • JDBC API • ACID transactions • Secondary indexes • Joins
  • 23. Cloudera Impala • Analytic database for Hadoop
  • 25. Correlation • Merge together related messages coming from one stream • Key stream by calling/called number • Merge messages with the same key where start time difference is less than X.
  • 26. Correlation
        override def processElement(
            value: SipVoice,
            ctx: KeyedProcessFunction[String, SipVoice, SipVoices]#Context,
            out: Collector[SipVoices]): Unit = {
          val startTime = parseTime(value.startTime)
          // Look for an open group whose start time is within waitingTime of this message
          val (key, values) = sipVoiceState
            .keys
            .asScala
            .find(s => math.abs(s - startTime) <= waitingTime)
            .map(k => (k, value :: sipVoiceState.get(k)))
            .getOrElse {
              // No matching group: open a new one and register the timer that will emit it
              val triggerTimeStamp = ctx.timerService().currentProcessingTime() + delayPeriod
              ctx
                .timerService
                .registerProcessingTimeTimer(triggerTimeStamp)
              sipVoiceTimers
                .put(triggerTimeStamp, startTime)
              (startTime, List(value))
            }
          sipVoiceState.put(key, values)
        }

        override def onTimer(
            timestamp: Long,
            ctx: KeyedProcessFunction[String, SipVoice, SipVoices]#OnTimerContext,
            out: Collector[SipVoices]): Unit = {
          if (sipVoiceTimers.contains(timestamp)) {
            val sipVoiceKey = sipVoiceTimers.get(timestamp)
            val correlationId = UUID.randomUUID().toString
            // Tag every message of the group with one correlation id and emit them in start-time order
            val correlatedSipVoices = sipVoiceState
              .get(sipVoiceKey)
              .map(_.toCorrelated(correlationId))
              .sortBy(_.startTime)
            out.collect(SipVoices(correlatedSipVoices))
            // Update metrics, then clear the state held for this group
            correlatedSipVoice.inc()
            inStateSipVoice.dec(correlatedSipVoices.size)
            sipVoiceTimers.remove(timestamp)
            sipVoiceState.remove(sipVoiceKey)
          }
        }
  • 27. Correlation • Correlate messages among multiple streams • Switching between networks during the call • Call failure and reestablishment • Event time semantics • Lateness • Out-of-order messages
  • 28. Aggregations • As an example for a cell we want to see: • Number of errors • Number of calls • Number of intercell handovers • …
  • 29. Defined window
        table.window(
          Tumble over windowLengthInMinutes.minutes on 'timestamp as 'timeWindow)
  • 30. Table API
        table
          .window(Tumble over windowLengthInMinutes.minutes on 'timestamp as 'timeWindow)
          .groupBy(
            'lastCell, 'cellName, 'cellType, 'cellBand,
            'cellBandwidthDownload4g, 'cellBandwidthUpload4g,
            'cellSiteName, 'cellSiteAddress, 'timeWindow
          )
          .select(
            'lastCell, 'cellName, 'cellType, 'cellBand,
            'cellBandwidthDownload4g, 'cellBandwidthUpload4g,
            'cellSiteName, 'cellSiteAddress,
            'voiceConnectAttempt.sum as 'voiceConnectAttempt,
            'voiceConnectSuccess.sum as 'voiceConnectSuccess,
            'interCellHandovers.sum as 'interCellHandovers,
            'srvccHandovers.sum as 'srvccHandovers,
            'timeWindow.start.cast(Types.LONG) as 'timeWindow
          )
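
What the tumbling-window aggregation on slides 28 to 30 computes can be sketched in plain Scala, without Flink. This is only an illustration of the semantics, not the authors' job: `CellEvent`, `CellWindow`, `TumblingSketch` and their fields are hypothetical names, and the whole input is assumed to be available as an in-memory collection rather than a stream.

```scala
// Hypothetical record types; the field names are assumptions, not the authors' schema.
final case class CellEvent(cell: String, timestampMs: Long, connectAttempts: Int, handovers: Int)
final case class CellWindow(cell: String, windowStartMs: Long, connectAttempts: Int, handovers: Int)

object TumblingSketch {
  // Start of the fixed-size, non-overlapping window that contains the timestamp.
  def windowStart(timestampMs: Long, windowMs: Long): Long =
    timestampMs - Math.floorMod(timestampMs, windowMs)

  // Group by (cell, window start) and sum the counters,
  // which is what window + groupBy + sum express in the Table API.
  def aggregate(events: Seq[CellEvent], windowMs: Long): Seq[CellWindow] =
    events
      .groupBy(e => (e.cell, windowStart(e.timestampMs, windowMs)))
      .map { case ((cell, start), es) =>
        CellWindow(cell, start, es.map(_.connectAttempts).sum, es.map(_.handovers).sum)
      }
      .toSeq
      .sortBy(w => (w.cell, w.windowStartMs))
}
```

Calling `TumblingSketch.aggregate(events, 5 * 60 * 1000L)` yields one row per cell and 5-minute window. Unlike this batch sketch, the streaming job also has to decide when a window is complete, which is where event time and lateness come in.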

Editor's Notes

  1. The picture shows only 2G and 3G; 4G is similar: NodeB is replaced by eNodeB, plus some new boxes. Acronyms: Base Station Controller (BSC), Radio Network Controller (RNC), Mobile Switching Center (MSC), Short Message Service Center (SMSC), Serving GPRS Support Node (SGSN).
  2. Network Analytics portal. Network Operation & Development use it to detect and troubleshoot problems in the network. Customer technical support uses it to track the quality of service of a specific customer.
  3. Based on batch jobs that transform and move data between different layers (pre-stage, stage, datamarts, ...). Cons: data is stored multiple times; correlations and aggregations are heavy to calculate; latency is about one hour.
  4. Avro allows us to generate Java/Scala classes for our projects. There are Maven/SBT plugins, DDL scripts
  5. At the time we were choosing a stream processing framework, Flink was the only one that met our needs. We considered Flink, Spark and Kafka Streams. Spark (1.6) did not handle large state well. Kafka Streams did not have such a rich API and was too new at that time.
  6. We have a different setup for different clients. Why? Separation of concerns. With NiFi there are more processors: copy from sFTP, parse, push to Kafka, copy raw data to HDFS, ... ASN.1 parsing had already been done for batch processing, generating CSV files; it has now been changed to also produce messages to Kafka.
  7. AVOID NEW DB/CACHE – there is already a whole Hadoop ensemble to maintain. PROBLEM: we don’t get updates, we get a new version of each codelist every day. It took too long for new values to be reflected in the data stream.
  8. Receive a command to refresh a codelist, broadcast the command to all parallel instances of the next component, and check the timestamp to see whether your codelists aren’t already newer. The command can be refresh all, refresh one, refresh from a different location, ... So far it works. There is a possible problem if our codelists grow too big, e.g. a whole user profile with history for streaming machine learning algorithms.
  9. Quite simple aggregations, usually SUM or COUNT. We have different jobs calculating different aggregations, each on a differently keyed stream.
  10. We use tumbling windows of length 5 minutes, which is our finest granularity. Coarser granularities (15 minutes, 1 hour, 1 day) we calculate with SQL at query time. It is also possible to define multiple windows with different lengths.
  11. A very natural way to write SQL-like syntax in Scala. The options are: the Streaming API (reduce, aggregate, fold), the Table API, and the SQL API (plain SQL, with the window defined in GROUP BY).
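
The codelist refresh described in notes 7 and 8 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `Codelist`, `RefreshCommand` and `CodelistCache` are hypothetical names, the `load` function stands in for reading a codelist file from HDFS, and an in-memory `Map` replaces RocksDB.

```scala
// Hypothetical types; all names here are assumptions for illustration only.
final case class Codelist(name: String, versionTs: Long, entries: Map[String, String])
final case class RefreshCommand(name: String, versionTs: Long)

// One cache per parallel instance; `load` stands in for reading the file from HDFS.
final class CodelistCache(load: RefreshCommand => Codelist) {
  private var lists: Map[String, Codelist] = Map.empty

  // Reload only if the announced version is newer than the one already held.
  // Returns true when a reload actually happened.
  def onCommand(cmd: RefreshCommand): Boolean = {
    val stale = lists.get(cmd.name).forall(_.versionTs < cmd.versionTs)
    if (stale) lists = lists.updated(cmd.name, load(cmd))
    stale
  }

  // Translate code -> value; None if the list or the code is unknown.
  def translate(list: String, code: String): Option[String] =
    lists.get(list).flatMap(_.entries.get(code))
}
```

Because every parallel instance receives the broadcast command and compares timestamps on its own, a command that arrives twice or out of order is harmless: the instance keeps the newer codelist it already holds.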