
Simulation and processing of data streams in Telecom - Koen Dejonghe


At the recent sold-out Spark & Machine Learning Meetup in Brussels, Koen Dejonghe of Eurocontrol delivered a lightning talk about telco data stream simulation, processing and visualization.

Specifically, Koen discussed a prototype for processing data from cell towers, built for a telco operator in the Middle East, with the added difficulty that the customer could not provide real data. In the end, Koen developed a data generator in Scala/Akka, a data processor with Spark Streaming, and a visualization front end with Node.js.

Published in: Technology


  1. Simulation and processing of data streams in Telecom
     Koen Dejonghe @botkop
  2. Problem sketch
     Provide prototype for 3 use cases:
     ● Real-time location based offerings
     ● Network congestion monitoring
     ● Celltower anomaly detection
     Expert: Cool. Can we log into your network?
     Customer: No.
     Expert: Can we get the data?
     Customer: No.
     Expert: Anonymized data?
     Customer: No. Can you help us?
  3. The trouble with real data
     - Cannot share
     - Cannot get
     - Cannot break
     - Cannot understand
  4. So, how do we solve this conundrum?
     Matching performance indicators (KPIs) is better than matching data details when generating fake data.
     Source: Ted Dunning - MapR: Sharing Big Data Safely
  5. Pipeline
     Simulator, Kafka, Spark, Cassandra, Node.js, d3.js
  6. Event processing
     2 types of streams:
     ● Attach streams:
       ○ Slow, long-living data stream: only once at connection to the network
       ○ Contains identification of subscriber (IMSI)
       ○ Contains bearer id for correlation with other streams
     ● Context streams:
       ○ Fast, short-living data stream
       ○ Events occur multiple times during the period the subscriber is connected to a celltower
       ○ No subscriber identification, only bearer id
       ○ Contains celltower identification and location
       ○ Contains metrics controlled by templates
  7. Simulator
     Actor based system
     - Simulator
     - Celltowers
     - Subscribers
  8. Use case 1
     Real-time location based offerings
  9. Use case 2
     Network congestion monitoring
  10. Use case 3
     Celltower anomaly detection
  11. Summary
     ● If you can’t get the data, fake them
       ○ (taking KPIs into account)
     ● Simulation is great for
       ○ injecting anomalies
       ○ playing with ML algos
       ○ control of the universe
     Koen Dejonghe @botkop
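
The event-processing slide says that context events carry only a bearer id, while the attach event carries both the IMSI and the bearer id. The following is a minimal Python sketch of how that correlation could work. It is not the code from the talk, which used Scala/Akka and Spark Streaming. All class names, field names, and the "load" metric are hypothetical, and the state is held in a plain in-memory dictionary rather than in a streaming engine.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AttachEvent:
    """Sent once when a subscriber connects to the network (hypothetical shape)."""
    imsi: str        # subscriber identification
    bearer_id: int   # key shared with context events


@dataclass(frozen=True)
class ContextEvent:
    """Sent many times while connected; carries no IMSI (hypothetical shape)."""
    bearer_id: int
    celltower_id: str
    location: Tuple[float, float]   # (latitude, longitude)
    metrics: Dict[str, float]       # e.g. {"load": 0.7}; metric names are assumed


class BearerCorrelator:
    """Remembers bearer id -> IMSI from attach events and uses that
    mapping to attribute later context events to a subscriber."""

    def __init__(self) -> None:
        self._imsi_by_bearer: Dict[int, str] = {}

    def on_attach(self, event: AttachEvent) -> None:
        # Store (or overwrite) the subscriber behind this bearer id.
        self._imsi_by_bearer[event.bearer_id] = event.imsi

    def on_context(self, event: ContextEvent) -> Optional[Tuple[str, ContextEvent]]:
        # Return (imsi, event) if the bearer id is known, otherwise None.
        imsi = self._imsi_by_bearer.get(event.bearer_id)
        if imsi is None:
            return None
        return (imsi, event)


if __name__ == "__main__":
    correlator = BearerCorrelator()
    correlator.on_attach(AttachEvent(imsi="imsi-001", bearer_id=1))
    hit = correlator.on_context(
        ContextEvent(1, "tower-A", (25.0, 55.0), {"load": 0.7})
    )
    miss = correlator.on_context(
        ContextEvent(9, "tower-B", (25.1, 55.1), {"load": 0.2})
    )
    print(hit)
    print(miss)
```

In this sketch a context event with an unknown bearer id is simply dropped (returns None). A real pipeline would have to decide whether to buffer such events until the matching attach event arrives, and when to expire old bearer ids.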