Using Apache Flink with Amazon Kinesis (ANT395) - AWS re:Invent 2018

Amazon Kinesis makes it easy to get valuable, real-time insights from your streaming data quickly. Apache Flink is an open-source framework and engine for processing data streams. In this chalk talk, we provide an overview of streaming data, Amazon Kinesis, and Apache Flink. We then go deep into a specific example of when to use Apache Flink to build a streaming application. Our customer John Deere then dives deep into its Amazon Kinesis and Apache Flink use case and discusses best practices for processing streaming data in real time.

  1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Using Apache Flink with Amazon Kinesis (ANT395). Greg Finch, Senior Product Manager, John Deere. Ryan Nienhuis, Senior Technical Product Manager, AWS, Amazon Kinesis.
  2. Agenda: Streaming and Amazon Kinesis overview; New capability: Amazon Kinesis Data Analytics for Java; Streaming data at John Deere; Architectural choices in streaming data
  3. What is streaming data? Low latency; continuous; ordered and incremental; high volume
  4. Streaming with Amazon Kinesis: easily collect, process, and analyze video and data streams in real time. Amazon Kinesis Video Streams: capture, process, and store video streams. Amazon Kinesis Data Streams: capture, process, and store data streams. Amazon Kinesis Data Firehose: load data streams into AWS data stores. Amazon Kinesis Data Analytics: analyze data streams in real time.
  5. Amazon Kinesis Data Streams overview
  6. Data processing from a variety of consumers. Fully managed service for real-time processing of streaming data. Cost-effective: $0.014 per 1,000,000 PUT payload units. Millions of sources producing hundreds of terabytes per hour. A front end handles authentication and authorization. Durable, highly consistent storage replicates data across three data centers (Availability Zones). An ordered stream of events supports multiple readers. Consumers shown: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and AWS Lambda.
  7. Apache Flink: framework and distributed engine for stateful processing of data streams. Simple programming: easy-to-use, flexible APIs make building apps fast. High performance: in-memory computing provides low latency and high throughput. Stateful processing: application state is saved durably. Strong data integrity: exactly-once processing and consistent state.
  8. How do you build a Flink application? Streaming operators are applied to data streams in a pipeline. (Diagram: Source -> DataStream -> KeyedDataStream -> DataStream -> Sink, connected by operators such as filter, keyBy, window, and apply.)
  9. What does your code look like? DataStream<GameEvent> rawEvents = env.addSource(new KinesisStreamSource("input_events")); DataStream<UserPerLevel> gameStream = rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId, event.gameMetadata.levelId, event.userId)); gameStream.keyBy(event -> event.gameId).window(TumblingProcessingTimeWindows.of(Time.minutes(1))).apply(...).addSink(new KinesisStreamSink("myGameStateStream"));
  11. © 2018 Deere & Company, All rights reserved. About John Deere
  12. Sophisticated machines produce massive data streams
  13. Machine Sync: real-time multi-machine coordination
  14. Remote monitoring and adjustments
  15. Operations Center
  16. John Deere Data Platform: Processing millions of sensor measurements per second. Serving more than one billion field maps. Supports monitoring, tracking, dashboarding, and deep analysis applications.
  17. A simple solution for many applications
  18. Managing state can get complicated
  19. Keeping up: Over-sharding the stream
  20. Keeping up: Fan-out
  21. System complexity increases with use case complexity
  22. Shifting to Apache Flink
  23. An example (diagram): a Sessions stream source and a Sensors stream source, each followed by a Map that decodes it; Join Session & Sensors; a Window that aggregates totals, with a sink writing totals to DynamoDB; a Flat Map that computes tile keys; a Window that rasterizes, with a sink writing tiles to S3. Key by steps partition the data between stages, and part of the pipeline is marked "High Parallelism".
  24. Powerful solution for sophisticated applications
  25. Thank you! Ryan Nienhuis, Greg Finch
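
Slide 6 quotes a price of $0.014 per 1,000,000 PUT payload units. The sketch below turns that into a rough monthly figure. It assumes, beyond what the slide states, that a PUT payload unit is a 25 KB chunk of a record (taken here as 25 × 1024 bytes), that each record is billed in whole chunks rounded up, and a 30-day month. It ignores shard-hour charges, and current regional prices may differ. The class and method names are hypothetical.

```java
/**
 * Rough monthly estimate of the PUT payload unit charge for Amazon Kinesis Data Streams.
 * Assumptions: 25 KB chunks (25 * 1024 bytes here), the $0.014 per million price quoted
 * on the slide, and a 30-day month. Shard-hour charges are NOT included.
 */
final class PutPayloadCost {
    static final long CHUNK_BYTES = 25L * 1024;          // assumed size of one PUT payload unit
    static final double PRICE_PER_MILLION_USD = 0.014;   // as quoted on the slide
    static final long SECONDS_PER_MONTH = 30L * 24 * 3600;

    /** Each record is billed in whole chunks, rounded up (so at least one unit). */
    static long unitsPerRecord(long recordBytes) {
        if (recordBytes <= 0) {
            throw new IllegalArgumentException("recordBytes must be positive");
        }
        return (recordBytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    }

    /** PUT payload unit cost for a steady record rate over one 30-day month. */
    static double monthlyCostUsd(long recordsPerSecond, long recordBytes) {
        long units = recordsPerSecond * unitsPerRecord(recordBytes) * SECONDS_PER_MONTH;
        return units / 1_000_000.0 * PRICE_PER_MILLION_USD;
    }

    public static void main(String[] args) {
        // Example workload: 1,000 records per second, 5 KB each.
        System.out.printf("units per record: %d%n", unitsPerRecord(5 * 1024));
        System.out.printf("monthly PUT payload cost: $%.2f%n", monthlyCostUsd(1_000, 5 * 1024));
    }
}
```

For 1,000 records per second at 5 KB each, every record is one unit, so a 30-day month comes to 2.592 billion units, or about $36.29 at the quoted price.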
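
Slide 9's pipeline keys game events by gameId and groups them into one-minute tumbling processing-time windows. The framework-free Java sketch below mimics that step so the window arithmetic is visible: each event falls into the window [start, start + size) that contains its timestamp. Here the timestamp stands in for the time the event reaches the window operator, the event class is a simplified, hypothetical version of UserPerLevel, and counting events per game is an assumed placeholder for the slide's elided apply(...) function.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Framework-free sketch of keyBy(gameId) + tumbling windows, counting events per key. */
final class TumblingWindowSketch {
    /** Simplified, hypothetical event: UserPerLevel fields plus a timestamp in milliseconds. */
    static final class UserPerLevel {
        final String gameId;
        final String levelId;
        final String userId;
        final long timestampMillis;

        UserPerLevel(String gameId, String levelId, String userId, long timestampMillis) {
            this.gameId = gameId;
            this.levelId = levelId;
            this.userId = userId;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Start of the tumbling window containing ts (assumes ts >= 0 and no window offset). */
    static long windowStart(long ts, long sizeMillis) {
        return ts - (ts % sizeMillis);
    }

    /** Groups events by window start, then by gameId, counting the events in each group. */
    static Map<Long, Map<String, Long>> countPerGamePerWindow(List<UserPerLevel> events, long sizeMillis) {
        Map<Long, Map<String, Long>> result = new TreeMap<>(); // sorted for deterministic output
        for (UserPerLevel e : events) {
            result.computeIfAbsent(windowStart(e.timestampMillis, sizeMillis), k -> new TreeMap<>())
                  .merge(e.gameId, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        long oneMinute = 60_000L;
        List<UserPerLevel> events = List.of(
                new UserPerLevel("g1", "l1", "u1", 5_000L),
                new UserPerLevel("g1", "l2", "u2", 59_999L),
                new UserPerLevel("g2", "l1", "u3", 30_000L),
                new UserPerLevel("g1", "l1", "u1", 60_000L));
        System.out.println(countPerGamePerWindow(events, oneMinute));
    }
}
```

Running main should print {0={g1=2, g2=1}, 60000={g1=1}}: the event at 60,000 ms opens a new window. In Flink itself, window assignment, per-key state, and fault tolerance are handled by the runtime; this sketch only shows the assignment logic.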
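
Slides 19 and 20 name over-sharding and fan-out as ways of keeping up. One way to reason about both is a back-of-the-envelope shard count. The sketch assumes the standard per-shard limits of Kinesis Data Streams (writes up to 1 MB/s or 1,000 records/s; reads up to 2 MB/s shared by all standard consumers of the shard), that every consumer reads the whole stream, and a hypothetical headroom multiplier for deliberate over-provisioning. It is an illustration, not the sizing method John Deere described.

```java
/**
 * Back-of-the-envelope shard count for a Kinesis data stream (hypothetical helper).
 * Assumed per-shard limits for standard (shared-throughput) consumers:
 * writes up to 1 MB/s or 1,000 records/s; reads up to 2 MB/s shared by all consumers.
 */
final class ShardEstimate {
    static final double WRITE_MB_PER_SHARD = 1.0;
    static final double WRITE_RECORDS_PER_SHARD = 1_000.0;
    static final double SHARED_READ_MB_PER_SHARD = 2.0;

    /**
     * @param writeMbPerSec incoming data rate in MB/s
     * @param recordsPerSec incoming record rate
     * @param consumers     standard consumers, each assumed to read the whole stream
     * @param headroom      hypothetical over-provisioning factor (1.0 = none)
     */
    static int shardsNeeded(double writeMbPerSec, double recordsPerSec, int consumers, double headroom) {
        double forWriteBytes = writeMbPerSec / WRITE_MB_PER_SHARD;
        double forWriteRecords = recordsPerSec / WRITE_RECORDS_PER_SHARD;
        // All standard consumers share each shard's read limit, so read demand grows with their count.
        double forReads = writeMbPerSec * consumers / SHARED_READ_MB_PER_SHARD;
        double required = Math.max(forWriteBytes, Math.max(forWriteRecords, forReads));
        return Math.max(1, (int) Math.ceil(required * headroom));
    }

    public static void main(String[] args) {
        System.out.println(shardsNeeded(3.5, 2_000, 1, 1.0)); // 4: write bytes dominate
        System.out.println(shardsNeeded(3.5, 2_000, 5, 1.0)); // 9: shared reads dominate
        System.out.println(shardsNeeded(3.5, 2_000, 1, 2.0)); // 7: doubled headroom
    }
}
```

With 3.5 MB/s of writes at 2,000 records/s, one consumer needs 4 shards, but five consumers sharing read throughput need 9, which is one reason adding readers pushes shard counts up. Enhanced fan-out consumers each receive their own 2 MB/s per shard, a case this sketch does not model.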
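
Slide 23's pipeline joins the decoded session and sensor streams. The framework-free sketch below shows one common shape for such a keyed join: per key, keep the latest session, enrich each sensor reading with it, and buffer readings that arrive before their session. The key (machineId), the field names, and the buffering policy are all hypothetical; the slides do not say how John Deere's join works. In Flink, this per-key state would normally live in managed keyed state (for example in a CoProcessFunction applied to the two connected, keyed streams) so that it is checkpointed, rather than in plain HashMaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical keyed join of session and sensor events (plain Java, no Flink). */
final class SessionSensorJoin {
    /** Hypothetical decoded session event. */
    static final class Session {
        final String machineId;
        final String operation;
        Session(String machineId, String operation) { this.machineId = machineId; this.operation = operation; }
    }

    /** Hypothetical decoded sensor reading. */
    static final class Reading {
        final String machineId;
        final double value;
        Reading(String machineId, double value) { this.machineId = machineId; this.value = value; }
    }

    /** A sensor reading enriched with its session's operation. */
    static final class Enriched {
        final String machineId;
        final String operation;
        final double value;
        Enriched(String machineId, String operation, double value) {
            this.machineId = machineId;
            this.operation = operation;
            this.value = value;
        }
        @Override public String toString() { return machineId + "/" + operation + "=" + value; }
    }

    // Per-key state: latest session per machine, and readings still waiting for a session.
    private final Map<String, Session> sessions = new HashMap<>();
    private final Map<String, List<Reading>> pending = new HashMap<>();

    /** Records the session for its key and releases any readings buffered for that key. */
    List<Enriched> onSession(Session s) {
        sessions.put(s.machineId, s);
        List<Enriched> out = new ArrayList<>();
        List<Reading> waiting = pending.remove(s.machineId);
        if (waiting != null) {
            for (Reading r : waiting) {
                out.add(new Enriched(r.machineId, s.operation, r.value));
            }
        }
        return out;
    }

    /** Enriches the reading if its session is known; otherwise buffers it and emits nothing. */
    List<Enriched> onReading(Reading r) {
        Session s = sessions.get(r.machineId);
        if (s == null) {
            pending.computeIfAbsent(r.machineId, k -> new ArrayList<>()).add(r);
            return List.of();
        }
        return List.of(new Enriched(r.machineId, s.operation, r.value));
    }

    public static void main(String[] args) {
        SessionSensorJoin join = new SessionSensorJoin();
        System.out.println(join.onReading(new Reading("m1", 1.5)));        // buffered: []
        System.out.println(join.onSession(new Session("m1", "planting"))); // flushed: [m1/planting=1.5]
        System.out.println(join.onReading(new Reading("m1", 2.0)));        // direct: [m1/planting=2.0]
    }
}
```

The pending buffer here is unbounded; a real job would typically cap or expire it, for example with timers.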
