Big Data Europe Transport Pilot case, Luigi Selmi

Webinar presentation "Big Data Europe Transport Pilot case" by Luigi Selmi

Published in: Technology

  1. Pilot SC4. L. Selmi - BDE - SC4 Webinar, BDE SC4, 02.12.2016
  2. Objective of the Pilot SC4: a scalable, fault-tolerant and flexible platform, based on open source frameworks, that can process unbounded data sets and graphs.
  3. Microservice Architecture
  4. Message Broker: Apache Kafka is a high-throughput, distributed, durable messaging system.
  5. Kafka Cluster (Apache Kafka)
  6. Stream and Batch Processor: Apache Flink is an open source platform for distributed stream and batch data processing.
  7. Flink Cluster (Apache Flink)
  8. Storage and Indexing: PostGIS is a spatial database that stores the road network data. Elasticsearch is a distributed, open source document store built on top of Apache Lucene; it stores the results of the workflow.
  9. Elasticsearch Cluster
  10. Pilot Architecture
  11. BDE Components
  12. The FCD Pipeline
  13. Visualization: the pilot SC4 can map-match real-time floating car data (FCD) and classify each road segment according to its traffic level.
  14. Distributed computing: the theoretical minimum. Minimum requirements for fault tolerance and scalability: a cluster of 3 nodes (Docker Swarm); 4 CPU cores per node; 1 Flink worker per node; 1 Flink slot per CPU core. Max parallelism = 3 nodes × 4 slots = 12.
  15. Parallelization: map-match subtasks. (1) source(), (2) mapMatch(), (3) keyBy()/window()/apply(), (4) sink(). Each subtask can be distributed across slots with its own parallelism (e.g. from 1 to 12).
  16. Parallelization: Flink dataflow. A slot can process all the subtasks in a pipeline.
  17. Parallelization: input and output data. Input fields: device_id, timestamp, lat, lon, speed, orientation, transit. The mapMatch subtask preserves the time order of the records, so that the next task, keyBy(road_seg)/window(15 min)/apply(), returns the correct average speed and number of vehicles within the time window for each road segment. Output fields: road_seg_id, start_date, num_vehicles, avg_speed.
  18. Pilot Cycle 2 Targets: extend the functionalities; improve the technology; lower the boundaries.
  19. Cycle 2 - Extend the functionalities. Short-term traffic forecasts: (1) map-match 44 GB of historical Floating Car Data from CERTH (Thessaloniki); (2) train a model (using an ANN); (3) make predictions using the model and the near real-time data.
  20. Cycle 2 - Improve the technology: improve the map-matching algorithm; parallelize the processing of the historical data; finalize the "dockerization" of the components.
  21. Cycle 2 - Lower the boundaries: set up different visualizations for traffic monitoring and forecasting; visualize the traffic pattern in a road segment; visualize the location of a vehicle and the matched road segment (for tests).
  22. Thanks. BDE project website: https://www.big-data-europe.eu/ Code repository: https://github.com/big-data-europe Contact: luigi.selmi@iais.fraunhofer.de
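
The slot arithmetic on slide 14 and the windowed aggregation on slide 17 can be illustrated with a small sketch. This is plain Python, not the Flink API and not the pilot's code: it only mimics what keyBy(road_seg)/window(15 min)/apply() is described as producing. The function names and the tuple layout are hypothetical, the records are assumed to be already map-matched (so each one carries a road_seg_id), and counting distinct device_id values as num_vehicles is an assumption, since the slides do not say how vehicles are counted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)  # window length taken from slide 17
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def max_parallelism(nodes, cores_per_node, slots_per_core=1):
    """Slide 14 arithmetic: one worker per node, one slot per CPU core."""
    return nodes * cores_per_node * slots_per_core


def window_start(ts):
    """Floor a timezone-aware timestamp to the start of its 15-minute window."""
    return ts - (ts - EPOCH) % WINDOW


def aggregate(records):
    """Group map-matched records by (road segment, window) and summarize them.

    records: iterable of (device_id, timestamp, road_seg_id, speed) tuples
    (hypothetical layout). Returns one dict per road segment and window.
    """
    groups = defaultdict(lambda: {"devices": set(), "speeds": []})
    for device_id, ts, road_seg_id, speed in records:
        group = groups[(road_seg_id, window_start(ts))]
        group["devices"].add(device_id)
        group["speeds"].append(speed)
    return [
        {
            "road_seg_id": seg,
            "start_date": start,
            # Assumption: a vehicle is counted once per window.
            "num_vehicles": len(group["devices"]),
            "avg_speed": sum(group["speeds"]) / len(group["speeds"]),
        }
        for (seg, start), group in sorted(groups.items())
    ]
```

Calling max_parallelism(3, 4) gives the 12 slots quoted on slide 14. In a real deployment the grouping would be done by Flink's keyed windows, which also handle late or out-of-order events; the sketch ignores those concerns and simply buckets each record by its own timestamp.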
