A Real-time Processing System Based on Spark Streaming in the Field of Telecommunications


As urban populations grow, cities must serve more and more people. A smart city uses information technology to collect, analyze, and integrate the key indicators of city operation so that it can respond intelligently to needs such as people's livelihood, environmental protection, public security, city services, and industrial and commercial activity. With the recent development of cloud computing, big data, and Internet of Things technology, the smart city is gradually evolving from a concept into a technology that can thoroughly change people's lives.
We have built a stream processing system (OCSP) based on Spark Streaming, which is used in two key fields (the location operation system and real-time marketing) at China Mobile. The system uses Spark Streaming, Kafka, Flume, Redis, and other technologies, and processes 30 million location records per minute and 40 million operation records per minute. The data comes from the real-time mobile phone use of 60 million end users. After processing, the data is output to Kafka for other applications to use. OCSP provides APIs that let developers build applications for different uses. Developers do not need to know the details of RDDs, streaming, or other Spark concepts; they can focus on implementing the business logic.
We will describe the key technologies of the OCSP system, then introduce the technology for analyzing and processing large quantities of real-time data in smart tourism, along with the data modeling methods for real-time processing. The technology offers strong real-time performance, high reliability, and high accuracy of data processing, and it is broadly applicable, so it can be extended to other large-scale real-time data processing scenarios.
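
To put the quoted volumes in per-second terms, the short calculation below converts the figures from the description (30 million location records per minute, 40 million operation records per minute, 60 million end users). It is only arithmetic on those numbers; the variable and function names are illustrative and do not come from OCSP.

```python
# Figures quoted in the description above.
LOCATION_RECORDS_PER_MIN = 30_000_000
OPERATION_RECORDS_PER_MIN = 40_000_000
END_USERS = 60_000_000


def per_second(records_per_minute):
    """Convert a per-minute record count to an average per-second rate."""
    return records_per_minute / 60


location_rate = per_second(LOCATION_RECORDS_PER_MIN)
operation_rate = per_second(OPERATION_RECORDS_PER_MIN)

# Average number of location records each end user contributes per minute.
location_records_per_user = LOCATION_RECORDS_PER_MIN / END_USERS

print(f"location records/s:  {location_rate:,.0f}")
print(f"operation records/s: {operation_rate:,.0f}")
print(f"location records per user per minute: {location_records_per_user}")
```

On average this works out to about 500,000 location records per second, or one location record per user every two minutes.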

Published in: Technology

  1. A Real-time Processing System Based on Spark Streaming in the Field of Telecommunications. For Hadoop Summit, 2017. Geng Wang, Dong Wang
  2. CONTENT
  3. CONTENT 01
  4. What we faced in Telecommunications: more data is produced. 0.166 million new users / day; 33G data / sec by mobile; 10T voice data / day; 100T signal data / day
  5. What we faced in Telecommunications: more real-time requirements. Smart tourism: tourist count and analysis; best choice of tourist resort; recommendation of route for travelling; … Intelligent marketing: recommendation of product for specific customer; based on multiple dimensions (location, age, salary …)
  6. Evolution of Requirement. 2014: real-time marketing. 2015: operation based on location. 2016: more data input (2/3/4G signal of location, content of business). 2017: hard real time … CEP, Esper
  7. CONTENT 02
  8. Framework – High level: Data input, Tagging, Data output
  9. Detailed Framework. Hadoop layer: basic components. OCSP core: data pre-processing; tagging; event output (select and filter); multiple engines (Spark Streaming and Storm); multi-tenant; checkpoint. Data transformation: parse data to Kafka; NiFi and Flume; customized processor & sink. Data source: socket; local files; HDFS
  10. Framework - Data Input. Flume agent 1: 2G/3G signal of location. Flume agents 2, 3, 4: 4G signal of location. NiFi: content data of business. Kafka partition
  11. Data Preprocessing. Diagram labels: Kafka; transform (three times); Uniform; Schema 1, Schema 2, Schema 3; Select and Filter for each schema. Select Expr 1: imsi; Filter Expr 1: imsi != 0; Select Expr 2; Filter Expr 2; Select Expr 3; Filter Expr 3
  12. Tagging and Label
  13. Tagging process. Diagram labels: Customized operation; Get by key; Codis; User info; Stay duration; Cycle of marketing; User name; imsi; Phone number; Base station
  14. Select, filter & Output. Diagram labels: Codis; Kafka; Data with labels; others. Current location update for each user. Output 1: User path in a duration. Output 2: New user marketing. Output 3: User with specific location & specific business
  15. Configurable process. Diagram labels: Data with labels; End; Check interval; Filter; Select; Output; Codis
  16. Framework - Deployment & Configuration. Diagram labels: External system; SDTP; Socket source; HDFS I/O; Codis I/O; Web; Deployment in single host; Deployment/Configuration; OCSP
  17. CONTENT 03
  18. Performance - scale out. Diagram labels: Data Input; Tagging; Data Output; Flume (x3); NiFi (x2); Kafka (x2); Spark (x3); Codis (x3)
  19. How OCSP works in Smart Tourism? Stages: Tagging, Filter, Select, Output, Codis. Data source: 4G signal data (imsi + location + timestamp). Data transformation: Flume source: socket; sink: Kafka (keyed message). Streaming processing: filter invalid data; tagging, get user's information from Codis by imsi; tagging, compute the user path in a duration. Output: write the latest location to Kafka; use Flume to update the latest location in HBase. Diagram labels: 4G data; socket; Flume; records of the form imsi | location | timestamp; Kafka; records of the form imsi|location|timestamp|name|age|longitude|latitude; HBase; Kafka; Flume
  20. Performance - time cost. Case 1: data per 30 s: 0.6 million; Spark: 20/128G/32 core; Codis: 10/128G; Kafka partitions: 200; output number: 3. Case 2: data per 30 s: 10 million; Spark: 28/512G/64 core; Codis: 10/512G; Kafka partitions: 1200; output number: 11. Stage times (Data Transformation, Tagging (Get cache), Tagging (Operation), Output): Case 1 (5 seconds): 0.5 s, 3 s, 1 s, 1.5 s; Case 2 (17 seconds): 1 s, 11 s, 3 s, 2 s
  21. CONTENT 04
  22. Next Work. Support more scenarios: join of multiple streams in a time window; more streaming frameworks (Flink, Beam, etc.); Spark upgrade, Structured Streaming. Faster: faster cache. HA: no single point of failure
  23. Open Source
  24. Thanks
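
The smart tourism steps on slides 11, 13, and 19 (filter invalid data, tag each record with user information looked up by imsi, compute the user path in a duration, keep the latest location per user) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not OCSP code: it does not use Spark, Kafka, or Codis, the dictionary `USER_CACHE` merely stands in for the Codis lookup, and every function name, sample value, and the `|`-separated input format without spaces are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the Codis cache: user info keyed by imsi.
USER_CACHE = {
    "460001": {"name": "user_a", "age": 30},
    "460002": {"name": "user_b", "age": 41},
}


def parse(line):
    """Split an 'imsi|location|timestamp' line into a record dict."""
    imsi, location, timestamp = line.strip().split("|")
    return {"imsi": imsi, "location": location, "timestamp": int(timestamp)}


def is_valid(record):
    """Filter expression shown on the preprocessing slide: imsi != 0."""
    return record["imsi"] != "0"


def tag(record, cache):
    """Tagging step: attach cached user info, looked up by imsi."""
    return {**record, **cache.get(record["imsi"], {})}


def user_paths(records, start, end):
    """Time-ordered locations per user with start <= timestamp <= end."""
    paths = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if start <= rec["timestamp"] <= end:
            paths[rec["imsi"]].append(rec["location"])
    return dict(paths)


def process_batch(lines, cache, start, end):
    """Run filter, tagging, and the two outputs over one batch of lines."""
    tagged = [tag(r, cache) for r in map(parse, lines) if is_valid(r)]
    latest = {}
    for rec in sorted(tagged, key=lambda r: r["timestamp"]):
        latest[rec["imsi"]] = rec  # later records overwrite earlier ones
    return tagged, latest, user_paths(tagged, start, end)


if __name__ == "__main__":
    batch = [
        "460001|cell_1|100",
        "0|cell_9|101",  # invalid: imsi == 0, dropped by the filter
        "460001|cell_2|130",
        "460002|cell_1|120",
    ]
    tagged, latest, paths = process_batch(batch, USER_CACHE, 100, 130)
    print(latest["460001"]["location"])
    print(paths)
```

In a deployment like the one the slides describe, each of these functions would presumably run as a step over a Spark Streaming batch, with the cache lookup going to Codis and the results written to Kafka; the sketch only shows the per-record logic.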