Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL

(Sönke Liebau, OpenCore GmbH & Co. KG) Kafka Summit SF 2018

Airports are complex networks consisting of an immense number of systems that are necessary to keep the daily stream of passengers in constant motion. Connecting these systems in order to make the big picture transparent to the people running the show, the authorities and, last but not least, the passengers is no simple endeavor.

In this talk I will describe a fictional airport and its effort to restructure the IT infrastructure around Kafka Streams to serve the real-time data needs of a busy airport. I will start by giving a brief overview of Kafka Streams, KSQL and the opportunities they offer for real-time stream processing. Following that, we will explore the target architecture, which relies heavily on materialized views to serve up-to-date data, while also persisting to a traditional data lake for larger analytics workflows. Additionally, we will take a look at the generic data transformation framework that was created to minimize the integration effort of the data-receiving systems. To illustrate these ideas I will describe some examples of possible integrations: joining flight data with radar and weather data to predict arrival time at the gate down to the second, constantly updated processing data from the luggage conveyor belts, results from prediction models for passenger flow, and many more.

  1. 1. Stream Processing Airport Data: Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL. Sönke Liebau, Co-Founder and Partner @ OpenCore. October 17th, 2018
  2. 2. Who Am I? • Partner & Co-Founder at OpenCore • Small consulting company with a Big Data & Open Source focus • First production Kafka deployment in 2014 • Website: www.opencore.com • soenke.liebau@opencore.com • https://www.linkedin.com/in/soenkeliebau/ • @soenkeliebau
  3. 3. Kafka Streams & KSQL
  4. 4. What Is Kafka Streams? “The easiest way to write mission-critical real-time applications and microservices” “Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters.” Source: https://kafka.apache.org/20/documentation/streams/
  5. 5. What Is KSQL? Confluent KSQL is the open source, streaming SQL engine that enables real-time data processing against Apache Kafka® Source: https://www.confluent.io/product/ksql/
  6. 6. Kafka Streams In The Ecosystem (diagram labels: Sources, Kafka Connect, Kafka Streams Jobs, Kafka Connect, Destinations) © 2018 OpenCore GmbH & Co. KG
  7. 7. The Big Difference
  8. 8. Using Kafka Streams
          // Word count from the Kafka Streams quickstart.
          final StreamsBuilder builder = new StreamsBuilder();
          final Serde<String> stringSerde = Serdes.String();
          final Serde<Long> longSerde = Serdes.Long();
          KStream<String, String> textLines = builder.stream(
              "streams-plaintext-input", Consumed.with(stringSerde, stringSerde));
          KTable<String, Long> wordCounts = textLines
              // Split each line into lowercase words on non-word characters.
              .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
              // Re-key by word so that equal words are counted together.
              .groupBy((key, value) -> value)
              .count();
          wordCounts.toStream().to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
      Source: https://kafka.apache.org/20/documentation/streams/quickstart
  10. 10. Using KSQL (diagram labels: REST interface, CLI, REST client)
          SELECT * FROM security_in WHERE status='success' AND terminal='t1';
  11. 11. Running A KSQL Statement
  12. 12. The Competition
  13. 13. When To Use Which? Kafka Streams: • Offers lower level access • More data formats supported • Queryable state • Problems that cannot be expressed in SQL. KSQL: • Easier for people used to SQL • No need for additional orchestration • Data exploration
  14. 14. Our Airport
  15. 15. A Few Facts Up Front • A lot of independent data sources (airline ticketing, baggage transport system, passenger counting, retail, radar, weather, …) • Spread over multiple companies • Many legacy interfaces
  16. 16. Integrations (diagram labels: Operations Database, External System ×6)
  17. 17. Isolated Islands Of Data • A lot of isolated data stores to provide data for necessary solutions • Spiderweb of integrations • Operational DB needs to push data to a lot of systems • Many different formats
  18. 18. The Dream … (diagram labels: Weird binary source, XML Source, Destination ×3, Raw, Source, Processed, Rest, Stream Processing)
  19. 19. Ingest Transformation - Kafka Streams
          StreamsBuilder builder = new StreamsBuilder();
          Serde<ProprietaryObject> weirdFormatSerde = new ProprietaryWeirdFormatSerde();
          Serde<ProprietaryObject> avroSerde = new ProprietaryAvroSerde();
          // Read records in the proprietary format and write them back out as Avro.
          builder.stream("proprietary_input_topic", Consumed.with(Serdes.String(), weirdFormatSerde))
              .to("avro_output_topic", Produced.with(Serdes.String(), avroSerde));
  20. 20. Ingest Transformation - KSQL
          ksql> CREATE STREAM source (uid INT, name VARCHAR)
                  WITH (KAFKA_TOPIC='mysql_users', VALUE_FORMAT='JSON');
          ksql> CREATE STREAM target_avro
                  WITH (VALUE_FORMAT='AVRO', KAFKA_TOPIC='mysql_users_avro')
                  AS SELECT * FROM source;
      Source: https://gist.github.com/rmoff/165b05e4554c41719b71f1a47ee7b113
  21. 21. Stream Processing • Stream processing jobs read the converted Avro topics and create enriched topics/alerts/… by • Joining streams • Aggregating streams • Filtering or alerting on streams • …
  22. 22. DISCLAIMER
  23. 23. Gate Changes • Gate changes can be based on different information • Delays of the incoming flight • Changes on other outgoing flights • … • Join relevant streams and publish change events that are consumed by • Apps • Gate monitors • Departure boards • …
  24. 24. Passenger Count • Join the stream of tickets scanned before the security check line with the camera count of passengers leaving the security check to estimate the number of waiting passengers • Change routing of passengers (physical: signs change; digital: different routing in the app) • Also consumed by • Monitors to display predicted waiting time • App to display predicted wait time • Prediction systems to feed models for capacity planning • Models to predict whether a passenger might miss their flight -> reroute to priority lane
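
The waiting-passenger estimate on this slide boils down to a running difference between two counters per security area. A minimal plain-Java sketch without Kafka, assuming hypothetical names (`SecurityQueueEstimate`, `onTicketScanned`, `onCameraExit`) that do not appear in the talk:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not from the talk: tracks, per security area, how many
// passengers have scanned in but have not yet been counted leaving.
class SecurityQueueEstimate {
    private final Map<String, Long> entered = new HashMap<>();
    private final Map<String, Long> left = new HashMap<>();

    // Called for each ticket scanned in front of a security area.
    void onTicketScanned(String securityArea) {
        entered.merge(securityArea, 1L, Long::sum);
    }

    // Called for each camera count event behind a security area.
    void onCameraExit(String securityArea, long count) {
        left.merge(securityArea, count, Long::sum);
    }

    // Estimated number of waiting passengers, clamped at zero because the
    // camera count may overshoot the number of scans.
    long waiting(String securityArea) {
        long in = entered.getOrDefault(securityArea, 0L);
        long out = left.getOrDefault(securityArea, 0L);
        return Math.max(0L, in - out);
    }

    public static void main(String[] args) {
        SecurityQueueEstimate estimate = new SecurityQueueEstimate();
        estimate.onTicketScanned("t1_2");
        estimate.onTicketScanned("t1_2");
        estimate.onCameraExit("t1_2", 1);
        System.out.println("waiting at t1_2: " + estimate.waiting("t1_2"));
    }
}
```

In a streaming job the two counters would more likely be windowed aggregates that are joined, as in the KSQL slides later in the deck.
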
  25. 25. Wait Time • Calculate how long a passenger took to clear the security checkpoint by joining the time they scanned their boarding pass with the time they are first spotted by an iBeacon beyond security • Push offers based on wait time and flight time • Long wait, lots of time until take-off -> free coffee or sandwich • Long wait, little time until take-off -> duty free voucher • …
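
The offer rules on this slide can be written down as a small decision function. This is only a sketch: the class name `OfferRules` and both thresholds (30 minutes for a "long wait", 90 minutes for "lots of time") are assumptions made for illustration, not values from the talk.

```java
import java.time.Duration;

// Hypothetical rule set, not from the talk; both thresholds are assumptions.
class OfferRules {
    enum Offer { NONE, FREE_COFFEE_OR_SANDWICH, DUTY_FREE_VOUCHER }

    static final Duration LONG_WAIT = Duration.ofMinutes(30);      // assumed
    static final Duration PLENTY_OF_TIME = Duration.ofMinutes(90); // assumed

    // Picks an offer from the time spent in security and the time left
    // until take-off.
    static Offer choose(Duration securityWait, Duration untilTakeOff) {
        if (securityWait.compareTo(LONG_WAIT) < 0) {
            return Offer.NONE; // short wait: no compensation
        }
        return untilTakeOff.compareTo(PLENTY_OF_TIME) >= 0
                ? Offer.FREE_COFFEE_OR_SANDWICH // enough time to sit down
                : Offer.DUTY_FREE_VOUCHER;      // in a hurry: voucher instead
    }

    public static void main(String[] args) {
        System.out.println(choose(Duration.ofMinutes(45), Duration.ofMinutes(120)));
    }
}
```
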
  26. 26. Baggage Notification • Baggage containers are scanned when they are loaded/unloaded • By joining this with data from the baggage sorter, passengers could receive push notifications when their luggage is loaded onto or unloaded from the plane
  27. 27. Arrival At Gate • There are complex models running to estimate when the plane will arrive at the gate after it has landed • Based on ground radar data • Can be used to • Predict whether the following flight might be delayed • Coordinate cleaning crews • Coordinate refueling • Feed into gate change decisions
  28. 28. An Example Flow
          {"boardingpass_id":"123", "passenger":"smith", "flight_number":"LH454", "checked_bags":1}
          {"boardingpass_id":"123", "security_area":"t1_2", "status":"success"}
          {"security_area":"t1_2", "count":"1"}
          {"passenger":"smith", "beacon_id":"t1_b123"}
          {"boardingpass_id":"123", "item_group":"cigarettes"}
          {"boardingpass_id":"123", "status":"success"}
          {"flight_no":"LH454", "runway":"1north"}
          {"old_gate":"a12", "new_gate":"e50"}
  29. 29. Check-In Event
          {"boardingpass_id":"123", "passenger":"smith", "flight_number":"LH454", "terminal":"terminal1"}
      Streams (diagram labels): check_in, check_in_count
          CREATE TABLE check_in_count AS
            SELECT terminal, count(terminal)
            FROM check_in
            WINDOW TUMBLING (SIZE 24 hour)
            GROUP BY terminal;
      What is it good for? • Early warning for security capacity • "Don't dawdle" warning based on security queues
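
To make the `WINDOW TUMBLING (SIZE 24 hour)` clause more concrete: a tumbling window assigns every event to exactly one fixed-size, non-overlapping bucket, aligned to the epoch, based on its timestamp. The plain-Java sketch below mimics that bucketing for a count per terminal; the class `TumblingCount` and its methods are hypothetical and do not use the Kafka Streams API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of tumbling-window counting, not Kafka Streams code.
class TumblingCount {
    static final long WINDOW_MS = 24L * 60 * 60 * 1000; // SIZE 24 hour

    // Key is "<windowStart>|<terminal>", value is the count in that window.
    private final Map<String, Long> counts = new HashMap<>();

    // Start of the window an event timestamp falls into (aligned to the epoch).
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    // Counts one check-in and returns the updated count for its window.
    long add(String terminal, long timestampMs) {
        String key = windowStart(timestampMs) + "|" + terminal;
        return counts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        TumblingCount counter = new TumblingCount();
        counter.add("terminal1", 1_000L);
        System.out.println("count: " + counter.add("terminal1", 2_000L));
    }
}
```

Because windows never overlap, the first event of a new day starts again at a count of one, which is what a daily check-in counter per terminal needs.
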
  30. 30. Passenger Enters Security Area
          {"boardingpass_id":"123", "security_area":"t1_2", "status":"success"}
      Streams (diagram labels): security_in, security_in_count, time_to_security
          CREATE TABLE security_in_count AS
            SELECT security_area, count(security_area)
            FROM security_in
            WINDOW TUMBLING (SIZE 24 hour)
            WHERE status='success'
            GROUP BY security_area;
          -- time_to_security
          SELECT s.boardingpass_id, s.rowtime - c.rowtime AS time_to_security
            FROM security_in s
            LEFT JOIN check_in c WITHIN 1 HOUR
            ON s.boardingpass_id=c.boardingpass_id;
      What is it good for? • Monitor for failed attempts • Passenger routing to security • Unload baggage of late passengers • …
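
The `time_to_security` query pairs each security scan with the check-in for the same boarding pass, provided the two events lie within one hour of each other. The plain-Java sketch below shows that matching rule in a deliberately simplified form: it remembers only the latest check-in per boarding pass and returns an empty result where a real LEFT JOIN would emit a row with null fields. `TimeToSecurity` and its methods are hypothetical names, not part of KSQL or Kafka Streams.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical, simplified illustration of a time-bounded join on boarding pass id.
class TimeToSecurity {
    static final long WITHIN_MS = 60L * 60 * 1000; // WITHIN 1 HOUR

    // Latest check-in timestamp per boarding pass id.
    private final Map<String, Long> checkInTime = new HashMap<>();

    void onCheckIn(String boardingPassId, long timestampMs) {
        checkInTime.put(boardingPassId, timestampMs);
    }

    // Milliseconds between check-in and security scan, or empty if there is
    // no check-in for this boarding pass within the join window.
    OptionalLong onSecurityIn(String boardingPassId, long timestampMs) {
        Long checkedInAt = checkInTime.get(boardingPassId);
        if (checkedInAt == null || Math.abs(timestampMs - checkedInAt) > WITHIN_MS) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(timestampMs - checkedInAt);
    }

    public static void main(String[] args) {
        TimeToSecurity join = new TimeToSecurity();
        join.onCheckIn("123", 1_000L);
        System.out.println(join.onSecurityIn("123", 61_000L));
    }
}
```
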
  31. 31. Passenger Leaves Security Area
          {"security_area":"t1_2", "count":"1"}
      Streams (diagram labels): security_out, security_out_count, security_in_count, current_count
          CREATE TABLE security_out_count AS
            SELECT security_area, sum(count)
            FROM security_out
            WINDOW TUMBLING (SIZE 24 hour)
            GROUP BY security_area;
          -- current_count
          SELECT i.security_area AS security_area,
                 i.KSQL_COL_1 AS entry,
                 o.KSQL_COL_1 AS exit,
                 i.KSQL_COL_1 - o.KSQL_COL_1 AS waiting
            FROM security_in_count i
            INNER JOIN security_out_count o
            ON i.security_area=o.security_area;
      What is it good for? • Capacity planning • Wait time prediction • Passenger routing (apps & physical) • Alerting on late passengers checking in • …
  32. 32. Passenger Located Via iBeacon
          {"passenger":"smith", "beacon_id":"t1_b123"}
      Streams (diagram labels): dutyfree_in, security_in, dutyfree_joined, security_duration
          CREATE STREAM dutyfree_joined AS
            SELECT s.boardingpass_id, d.passenger
            FROM dutyfree_in d
            LEFT JOIN security_in s WITHIN 1 HOURS
            ON s.passenger=d.passenger;
          -- security_duration
          SELECT d.boardingpass_id, d.d_passenger, d.rowtime - s.rowtime AS time_in_security
            FROM dutyfree_in_with_bc d
            LEFT JOIN security_in s WITHIN 1 HOUR
            ON d.boardingpass_id=s.boardingpass_id;
      What is it good for? • Refining wait time prediction • Targeted questionnaire (find reasons for outliers) • Vouchers for huge delays • …
  33. 33. Purchase Event
          {"boardingpass_id":"123", "item_group":"cigarettes"}
      Streams (diagram labels): dutyfree_purchase, check_in, flight_information, dutyfree_joined
          CREATE STREAM dutyfree_joined AS
            SELECT c.boardingpass_id, c.passenger, p.item_group, f.gate
            FROM dutyfree_purchase p
            LEFT JOIN check_in c WITHIN 1 HOURS
            ON c.boardingpass_id=p.boardingpass_id
            LEFT JOIN flight_information f WITHIN 1 HOURS
            ON f.flight_number = c.flight_number;
      What is it good for? • Retail models • Route to smoking area nearest to gate • Advise of walk time if time is tight • …
  34. 34. Gate Change
      Streams (diagram labels): expected_gate_arrival, expected_gate_departure, gate_wait_time, gate_change, flight_information, notifications
          CREATE STREAM gate_wait_time AS
            SELECT a.flight, d.departure_time - a.arrival_time AS wait_time
            FROM expected_gate_arrival a
            INNER JOIN expected_gate_departure d WITHIN 1 HOURS
            ON a.gate=d.gate;
          CREATE STREAM gate_change AS
            SELECT flight FROM gate_wait_time WHERE wait_time > 600000;
          CREATE STREAM notifications AS
            SELECT f.passenger
            FROM gate_change g
            LEFT JOIN flight_information f WITHIN 1 HOURS
            ON f.gate=g.gate;
  35. 35. Passenger Boards Plane
          {"boardingpass_id":"123", "status":"success"}
      Streams (diagram labels): gate_in, check_in, baggage_loaded, bags_joined
          CREATE STREAM bag_join AS
            SELECT c.passenger, c.bags
            FROM gate_in g
            LEFT JOIN check_in c WITHIN 1 HOURS
            ON c.boardingpass_id=g.boardingpass_id
            LEFT JOIN baggage_loaded b WITHIN 1 HOURS
            ON b.bag_id = c.bag_id;
      What is it good for? • Alert on bags without matching passengers • Trigger unloading based on related events • Gate closed • Time based • …
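
The "bags without matching passengers" alert amounts to a set difference: every loaded bag whose owner has not passed the gate. A plain-Java sketch of that check, assuming a hypothetical `UnmatchedBags` helper and a bag-to-boarding-pass mapping taken from check-in (neither is shown in the talk):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not from the talk: finds loaded bags whose owner has
// not boarded, for example once the gate has closed.
class UnmatchedBags {
    // bagOwner maps a bag id to the boarding pass id recorded at check-in.
    static Set<String> bagsToUnload(Map<String, String> bagOwner,
                                    Set<String> loadedBags,
                                    Set<String> boardedPasses) {
        Set<String> result = new TreeSet<>();
        for (String bag : loadedBags) {
            String owner = bagOwner.get(bag);
            // Unknown owner or owner not on board: flag the bag.
            if (owner == null || !boardedPasses.contains(owner)) {
                result.add(bag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> owners = Map.of("bag1", "123", "bag2", "456");
        System.out.println(bagsToUnload(owners, Set.of("bag1", "bag2"), Set.of("123")));
    }
}
```
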
  36. 36. Thank You!
