Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apache Flink

864 views

Published on

Many use cases in the telecommunication industry require producing counters, quality metrics, and alarms in a streaming fashion with very low latency. Most of this metrics are only valuable when they’re made available as soon as the associated events happened. In our company we are looking for a system able to produce this kind of real-time indicator, which must handle massive amounts of data (400,000 eps) with often peak loads (like New Year’s Eve) or out-of-order events like massive network disorder. Low latency and flexible window management with specific watermark emission are also a must-haves. Heterogeneous format, multiple flow correlation, and the possibility of late data arrival are other challenges. Flink being already widely used at Bouygues Telecom for real-time data integration, its features made it the evident candidate for the future System. In this talk, we'll present a real use case of streaming analytics using Flink, Kafka & HBase along with other legacy systems.

Published in: Data & Analytics
  • Be the first to comment

Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apache Flink

  1. 1. A Brief History of Time with Apache Flink Real time monitoring and analysis with Flink, Kafka and HBase Flink Forward 2016 Berlin, September, 12th, 2016 1 Thomas LAMIRAULT Mohamed Amine ABDESSEMED
  2. 2. The speakers • Software engineer & solution architect @ Bouygues Telecom since 2013 • A Flinker since the early beginnings @AminouvicTweets 2 Mohamed Amine ABDESSEMED • Bigdata Software engineer @ Bouygues Telecom since 2015 • A Flink master enthusiast @thomaslamirault Thomas LAMIRAULT
  3. 3. Outline • Who is Bouygues Telecom • Data Value and Streaming Analytics • Use case • Challenges • Streaming analytics with Flink • Results 3
  4. 4. 15M Clients 12,1M Mobile subscriber 2,9M Fixed customer WHO IS BOUYGUES TELECOM ? Mobile . Fixed . TV . Internet . Cloud 4 First Android TV BOX Leader 4G/4G+ VoLTE UHSM A very Innovative company
  5. 5. WHO IS BOUYGUES TELECOM ? 5
  6. 6. LUX: Logged User eXperience Mobile QoE • Our Big Data platform • Produce Mobile QoE indicators based on massive network equipment’s event logs (Billions event/day). • Goals: – QoE (User) instead of QoS (Machine) – Real-time Diagnostic (<60’ end-to-end latency) – Business Intelligence – Reporting – Real-time alarming 6
  7. 7. LUX In Numbers 7 ~300 users 30 Flink production apps 10 Billions raw event/day 2 Po storage ~120 users 5 Flink production apps 4 Billions raw event/day 750 To storage 2015 Today
  8. 8. Analytic Data value 8 Important event occurrence Predictive analytics Advanced analytics Time Data Value Streaming analytics T0 T1 T2 TyT-x Before important event NOW After important event
  9. 9. DATA IS MOST VALUABLE NOW ! 9
  10. 10. Analytic Data value • Data is most valuable when made available as soon as important events occur. • Get the most of Data – Collect data fast. – (Pre)Process it fast. – Analyze it and create added value to act faster! 10
  11. 11. The use case 11 GNOC Identify Define ExploreAction Look back Specific Numbers Emergency Numbers Call Center Numbers Something wrong? The BIG picture isn’t always significant Global counts have no sense !
  12. 12. The use case • A simple and valuable use case • Need to analyze the entire call traffic : –Considering multiple aggregation axes –Fine grained analysis –Detect when something is happening somewhere in real-time –Compare with historical values trends 12
  13. 13. Challenges • Low latency & streaming fashion counters • Quickly available KPIs = value • Massive amounts of data + peak loads • Reliability • Multiple flow correlation • Time management: – Out of order & late events  our worst enemies – Flexible window management – Specific watermark emission 13
  14. 14. Streaming Analytics Time Management 14 8h00 8h30 9h00 9h30 Processing Time
  15. 15. Streaming Analytics Time Management 15 8h00 8h30 9h00 9h30 Processing Time 8h09 8h14 8h03 8h00 8h12 8h43 8h45 8h48 9h10 8h00 9h12 8h13 8h47 9h32 9h35 9h14 9h30 8h02 8h03 Event Time Window FIRE
  16. 16. Streaming Analytics with Flink Built-in windowing functionalities • Custom Watermark extractor • Custom Triggers for lateness management • Custom Key extractor Stateful Streaming • Checkpointing • Fault tolerance • Savepoints • Update without data loss 16
  17. 17. Streaming analytics with Flink Performance • High throughput • Low latency • Excellent memory management Flexible window management • Tumbling • Sliding • Session 17
  18. 18. High Level Architecture 18 Queue Streaming Processing Windowing Computation Collect Storage Dataviz In-Memory Lookups Alarming Queue Queue Streaming Streaming
  19. 19. Architectural details 19 Storage Kafkacluster T’1 P1 Px .. T’n P1 Px .. T1 P1 Px .. Tn P1 Px .. Tm P1 Px .. T’m P1 Px .. Analytic APP State Backend Historical Data Monitoring Alarming
  20. 20. Streaming application Details 20 TnTnTn Multiple Input Topics filterfilterfilter Keep only records that interest us Extract timestamp Extract relevant event timestamp, with custom watermark emission keyBy Group by considered aggregation axes window Tumbling windows trigger Custom trigger, allow configurable late data and manage lagged data sink Write Data into HBase reduce Produce KPIs UnionUnion Correlate input flows
  21. 21. The results Production metrics • Low latency (<100ms) • Input : Up to ~80.000 events/sec • Output : Produce ~40.000 KPI/window 21
  22. 22. GNOC The results Dataviz 23
  23. 23. Benefits 24 • Monitor and improve customer experience • Reduced incident detection time • Help GNOC alarm prioritization based on customer experience • Reduced operating costs
  24. 24. Difficulties • Massive amounts of data in both input and output • Savepoint/Checkpoint cost • HBase analytic limitations –Read vs Write –Long Scan • Massive out of order events 25
  25. 25. What’s Next 26 • Flink + Kudu • Async/Incremental checkpoints • Flink CEP • Streaming SQL • Flink applications monitoring & industrialization.
  26. 26. 27 Questions ?

×