
Flink Forward Berlin 2018: Piotr Wawrzyniak & Jarosław Legierski - "Using Apache Flink for Smart Cities: Warsaw case study"



Mining large streams of real-time data has recently become one of the key challenges for the Big Data community in both industry and academia. At the same time, the concept of the Smart City has gained significant acclaim by providing user-oriented services in modern cities. One of the key parts of a smart city is the Intelligent Transportation System (ITS). ITS implementations usually assume a large number of varied sensors, both stationary and mobile, deployed across the city. Such sensors may include video cameras, traffic detectors, or devices attached to public transport vehicles. Studies have shown the usefulness of these data streams for incident detection, traffic monitoring and congestion detection, and optimization of public transport schedules and operations. In this talk we present the architecture of a complex system developed within the EU H2020 project VaVeL and deployed on a cluster operating in the City of Warsaw. We will discuss in detail the core components of the stream and batch processing pipelines, with particular focus on their implementation, cluster-based deployment and integration, fault detection and maintenance.

Published in: Technology


  1. 1. Using Apache Flink for Smart Cities: Warsaw case study. Piotr Wawrzyniak, Jarosław Legierski, PhD. Orange Polska, Orange Labs R&D Centre. 4th September 2018
  2. 2. Agenda: 1. VaVeL project 2. Vehicles Movement Analyser 3. Vehicle delay prediction 4. Cluster deployment and integration 5. Operational management 6. Summary
  3. 3. VaVeL: Variety, Veracity, VaLue: Handling the Multiplicity of Urban Sensors This research has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 688380 ‘VaVeL: Variety, Veracity, VaLue: Handling the Multiplicity of Urban Sensors’.
  4. 4. VaVeL project 1. The goal of the VaVeL project is to radically advance our ability to use urban data in applications that can identify and address citizen needs and improve urban life. Our motivation comes from problems in urban transportation. 2. The project will develop a general-purpose framework for managing and mining multiple heterogeneous urban data streams so that cities become more efficient, productive and resilient. The framework will be able to solve major issues that arise with urban transportation data and are currently not dealt with by existing stream management technologies. 3. VaVeL aims at making fundamental advances in addressing the most critical inefficiencies of current (big) data management and stream frameworks in coping with emerging urban sensor data, thus making European urban data more accessible and easier to use and strengthening European industries that use big data management and analytics.
  5. 5. City of Warsaw use case 1. The Warsaw use case focuses primarily on public transport data 2. Two simultaneous streams of data: • real-time locations of trams (400+ vehicles during peak hours) • real-time locations of buses (1800+ vehicles during peak hours) 3. In addition, static data (timetables, maps, etc.) are used to enrich the raw data streams 4. This is joint work of the Orange Polska and Warsaw University of Technology teams, in particular of Marcin Luckner, PhD, Karolina Kwasiborska and Tomasz Zaremba
  6. 6. City of Warsaw use case
  7. 7. VaVeL system architecture • Data sources layer • Data acquisition layer – Apache Flume or dedicated applications • Data processing layer – Apache Flink and Apache Spark • Data storage layer – HDFS, Redis, PostgreSQL • Data exposition layer – NGINX, OpenTripPlanner • Application layer – mobile application, web applications
  8. 8. Hadoop cluster
  9. 9. Apache Flink – vehicles movement analyser • Application combining the vehicle location stream with static timetable data • Computes delays, stop passages, position relative to the schedule, etc. • Input: vehicle positions (Apache Kafka stream); timetables as GTFS (General Transit Feed Specification) files • Output: vehicle positions with timetable information (Apache Kafka stream and HDFS files) • Diagram: timetables (GTFS files) and bus/tram location records (streams) feed the Flink vehicles-movement-analyser, which emits bus/tram location records with timetable data as vehicle streams and HDFS files
  10. 10. Data flow (diagram components): timetables downloader, Flume, trams-source, buses-source, Kafka, Flink vehicles-movement-analyser, HDFS
  11. 11. Input data
      ..
      127;6;2018-08-29 09:27:25;20.89547;52.208771;2018-08-29T09:27:36.458
      217;3;2018-08-29 09:27:23;21.024117;52.179916;2018-08-29T09:27:36.458
      503;9;2018-08-29 09:27:25;21.053551;52.14637;2018-08-29T09:27:36.458
      503;11;2018-08-29 09:27:18;21.043854;52.174828;2018-08-29T09:27:36.458
      503;12;2018-08-29 09:27:26;21.024748;52.220844;2018-08-29T09:27:36.458
      503;1;2018-08-29 09:27:26;21.017859;52.23724;2018-08-29T09:27:36.458
      503;3;2018-08-29 09:27:29;21.017715;52.237766;2018-08-29T09:27:36.458
      127;1;2018-08-29 09:27:28;21.019469;52.246325;2018-08-29T09:27:36.458
      503;5;2018-08-29 09:27:25;21.042835;52.194279;2018-08-29T09:27:36.458
      503;7;2018-08-29 09:27:21;21.042475;52.14782;2018-08-29T09:27:36.458
      708;1;2018-08-29 09:27:30;20.795759;52.297321;2018-08-29T09:27:36.458
      503;6;2018-08-29 09:27:24;21.034723;52.165955;2018-08-29T09:27:36.458
      217;2;2018-08-29 09:27:28;21.07188;52.160404;2018-08-29T09:27:36.458
      ..
      Fields: line STRING, brigade INT, time TIMESTAMP, lon DOUBLE, lat DOUBLE, finaltime TIMESTAMP
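A minimal sketch of how one of these semicolon-separated input records could be parsed into typed fields, following the field list on the slide. The class and function names (`VehiclePosition`, `parse_position`) are hypothetical and not taken from the VaVeL code base, and the timestamp formats are inferred from the sample rows only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VehiclePosition:
    """One input record; field names follow the slide's field list."""
    line: str
    brigade: int
    time: datetime
    lon: float
    lat: float
    finaltime: datetime


def parse_position(record: str) -> VehiclePosition:
    """Split a 'line;brigade;time;lon;lat;finaltime' record and convert types."""
    line, brigade, time, lon, lat, finaltime = record.strip().split(";")
    return VehiclePosition(
        line=line,  # kept as text, as in the slide's field list
        brigade=int(brigade),
        time=datetime.strptime(time, "%Y-%m-%d %H:%M:%S"),
        lon=float(lon),
        lat=float(lat),
        # Format inferred from the samples: ISO-like with milliseconds.
        finaltime=datetime.strptime(finaltime, "%Y-%m-%dT%H:%M:%S.%f"),
    )


sample = "127;6;2018-08-29 09:27:25;20.89547;52.208771;2018-08-29T09:27:36.458"
position = parse_position(sample)
# Gap between the two timestamps carried by the record, in seconds.
gap_seconds = (position.finaltime - position.time).total_seconds()
```

The line identifier is kept as a string because the slide declares it as one; the slides do not say what `finaltime` represents, so the sketch only computes the difference between the two timestamps without interpreting it.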
  12. 12. Output data
      ..
      20180330;217;1;2018-08-29 09:29:15;21.041969;52.175838;21.041969;52.175838;MOVING;3.0;3038-Dominikańska;2018-08-29 09:30:00;3039-Dolina Służewiecka;72.74276626168793;21.043;52.17567;3039-Dolina Służewiecka;21.043;52.17567;72.74276626168793;2018-08-29 09:29:03;2018-08-29 09:29:03;3038-Dominikańska;21.03359;52.17739;596.8414806550528;2018-08-29 09:30:00;217_1_113_0919;Metro Wilanowska;2018-08-29 09:19:00-2018-08-29 09:32:00;UNSAFE;2018-08-29T09:29:27.883;2018-08-29T09:29:28.532;false;false;;false;5.01;3.0;OPL;10;9;10;3038_02;3039_04;3038_02;3009_17
      ..
  13. 13. Output data fields: version STRING, line STRING, brigade INT, time TIMESTAMP, lon DOUBLE, lat DOUBLE, rawLon DOUBLE, rawLat DOUBLE, status STRING, delay STRING, delayAtStop STRING, plannedLeaveTime TIMESTAMP, nearestStop STRING, nearestStopDistance DOUBLE, nearestStopLon DOUBLE, nearestStopLat DOUBLE, previousStop STRING, previousStopLon DOUBLE, previousStopLat DOUBLE, previousStopDistance DOUBLE, previousStopArrivalTime TIMESTAMP, previousStopLeaveTime TIMESTAMP, nextStop STRING, nextStopLon DOUBLE, nextStopLat DOUBLE, nextStopDistance DOUBLE, nextStopTimetableVisitTime TIMESTAMP, courseIdentifier STRING, courseDirection STRING, timetableIdentifier STRING, timetableStatus STRING, receivedTime TIMESTAMP, processingFinishedTime TIMESTAMP, onWayToDepot BOOLEAN, overlapsWithNextBrigade BOOLEAN, atStop STRING, overlapsWithNextBrigadeStopLineBrigade STRING, speed DOUBLE, (serverId STRING), delayAtStopStopSequence DOUBLE, previousStopStopSequence DOUBLE, nextStopStopSequence DOUBLE, delayAtStopStopId STRING, previousStopStopId STRING, nextStopStopId STRING
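The output fields include a nearest stop and the distance to it. The slides do not say how that distance is computed; the sketch below assumes a great-circle (haversine) distance on a sphere of mean Earth radius, which is an assumption rather than the analyser's confirmed method. The function names are hypothetical; the coordinates are those of the vehicle and of two stops appearing in the sample output record.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop(lon: float, lat: float, stops: dict) -> tuple:
    """Return (stop name, distance in metres) of the closest stop."""
    distances = (
        (name, haversine_m(lon, lat, stop_lon, stop_lat))
        for name, (stop_lon, stop_lat) in stops.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Stop coordinates (lon, lat) copied from the sample output record.
stops = {
    "3039-Dolina Służewiecka": (21.043, 52.17567),
    "3038-Dominikańska": (21.03359, 52.17739),
}
name, distance = nearest_stop(21.041969, 52.175838, stops)
```

With these inputs the sketch selects `3039-Dolina Służewiecka`, and the computed distances agree with the values in the sample record to within about a metre. A linear scan is enough for two stops; a city-wide stop set would more likely call for a spatial index.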
  14. 14. Data volume (diagram). Stages: stream processing, data cleaning, data storing. Volumes shown: buses 2–4.5 GB daily; trams 1–2.5 GB daily; timetables 0.5 GB every day; 0.15 GB daily; 0.1 GB every day. Other labels: text data; PLMN statistics; clean and deduplicated data (HDFS files); locations combined with timetables (HDFS files); real-time streams (Apache Kafka); Apache Hive; HDFS files and Apache HBase; vehicle data warehouse; text data processing; input; output
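The data cleaning stage is described only as producing clean and deduplicated data. As an illustration of one way such deduplication could work on the semicolon-separated location records, the sketch below drops repeated reports that share the same line, brigade and time. The choice of key and the function name are assumptions; the slides do not specify how duplicates are detected.

```python
from typing import Iterable, Iterator


def deduplicate(records: Iterable[str]) -> Iterator[str]:
    """Yield each record once per (line, brigade, time) key, keeping the first."""
    seen = set()
    for record in records:
        fields = record.strip().split(";")
        # Assumed key: the first three fields (line, brigade, time).
        key = tuple(fields[:3])
        if key in seen:
            continue
        seen.add(key)
        yield record


raw = [
    "127;6;2018-08-29 09:27:25;20.89547;52.208771;2018-08-29T09:27:36.458",
    # Same vehicle and report time, delivered again later (made-up repeat).
    "127;6;2018-08-29 09:27:25;20.89547;52.208771;2018-08-29T09:27:46.458",
    "217;3;2018-08-29 09:27:23;21.024117;52.179916;2018-08-29T09:27:36.458",
]
clean = list(deduplicate(raw))
```

The second row of `raw` is a made-up repeat for illustration, not data from the slides. The in-memory set suits a batch over one file; a streaming job would need bounded state, for example by expiring keys after some time.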
  15. 15. Delay prediction module – motivation
  16. 16. Module Architecture
  17. 17. ML methods used for validation
  18. 18. Evaluation results: VAMR model of Apache SAMOA; MLP models of WEKA
  19. 19. Operational management 1. The cluster (incl. Apache Flink) is managed via Apache Ambari 2. Important: Ambari creates an empty Flink job to monitor the cluster, so you should see n+1 jobs, where n is the number of 'real' Flink applications.
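The n+1 rule above lends itself to a simple health check. The sketch below assumes the list of jobs is available as a dictionary with a `jobs` list whose entries carry a `state` field, which is the shape I would expect from Flink's REST `/jobs/overview` endpoint; treat that shape, the function name and the job names as assumptions, and fetch the data however your deployment allows.

```python
def job_count_deviation(jobs_overview: dict, real_applications: int) -> int:
    """Return running jobs minus the expected n + 1 (0 means as expected)."""
    running = [
        job for job in jobs_overview.get("jobs", [])
        if job.get("state") == "RUNNING"
    ]
    # Expected: n real applications plus one empty monitoring job from Ambari.
    expected = real_applications + 1
    return len(running) - expected


# Hypothetical overview with two real applications and the monitoring job.
overview = {
    "jobs": [
        {"name": "vehicles-movement-analyser", "state": "RUNNING"},
        {"name": "delay-prediction", "state": "RUNNING"},
        {"name": "ambari-monitor", "state": "RUNNING"},
        {"name": "old-run", "state": "FINISHED"},
    ]
}
deviation = job_count_deviation(overview, real_applications=2)
```

A negative result suggests that an application or the monitoring job is missing, and a positive one that something unexpected is running.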
  20. 20. Thank you  visit for info on the VaVeL project This research has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 688380 ‘VaVeL: Variety, Veracity, VaLue: Handling the Multiplicity of Urban Sensors’.