© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dr. Steffen Hausmann, Solutions Architect, AWS
September 13, 2017
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
Stream Processing Challenges
Consistency and high availability
Low latency and high throughput
Rich forms of queries
Event time and out-of-order events
Apache Flink
“Apache Flink® is an open source platform for distributed stream and batch data processing.”
https://flink.apache.org/
http://data-artisans.com/why-apache-flink/
Analyzing NYC Taxi Rides in Real-time
Simple Pattern for Streaming Data
Data Producer (here: Mobile Client)
• Continuously creates data
• Continuously writes data to a stream
• Can be almost anything
Streaming Storage (here: Amazon Kinesis)
• Durably stores data
• Provides temporary buffer
• Supports very high throughput
Data Consumer (here: Apache Flink)
• Continuously processes data
• Cleans, prepares, & aggregates
• Transforms data to information
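The producer, storage, and consumer roles above can be sketched in a few lines of plain Java. This is only an in-memory illustration, not how Kinesis or Flink work internally: a bounded BlockingQueue stands in for the stream, a thread stands in for the mobile client, and the class name, method names, fare values, and end-of-stream sentinel are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical in-memory stand-in for the producer -> stream -> consumer pattern.
final class StreamingPatternSketch {

    // Sentinel that ends the demo; a real stream is unbounded and has no such marker.
    static final int END_OF_STREAM = -1;

    // Data producer: continuously writes records (taxi fares here) to the stream.
    static void produce(BlockingQueue<Integer> stream, int[] fares) {
        try {
            for (int fare : fares) {
                stream.put(fare); // blocks while the buffer is full
            }
            stream.put(END_OF_STREAM);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Data consumer: cleans (drops non-positive fares) and aggregates (running total).
    static List<Integer> consume(BlockingQueue<Integer> stream) {
        List<Integer> runningTotals = new ArrayList<>();
        int total = 0;
        try {
            while (true) {
                int fare = stream.take(); // blocks while the buffer is empty
                if (fare == END_OF_STREAM) {
                    break;
                }
                if (fare <= 0) {
                    continue; // malformed record, skip it
                }
                total += fare;
                runningTotals.add(total);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return runningTotals;
    }

    // Wires producer and consumer together through a small temporary buffer.
    static List<Integer> run(int[] fares) {
        BlockingQueue<Integer> stream = new ArrayBlockingQueue<>(2);
        Thread producer = new Thread(() -> produce(stream, fares));
        producer.start();
        List<Integer> totals = consume(stream);
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {12, 0, 7, 30})); // prints [12, 19, 49]
    }
}
```

The buffer decouples the two sides: the producer never needs to know who reads the data, and the consumer can fall behind briefly without losing records.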
Amazon Kinesis Streams
Create streams to capture and store streaming data
Replicates your streaming data across three facilities
Elastically add and remove shards to scale throughput
Secured via AWS IAM and server-side encryption
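Because throughput scales with the number of shards, a rough shard estimate is simple arithmetic. The sketch below assumes the commonly documented per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); these numbers are not on the slide, so check the current Kinesis quotas before relying on them. The class and method names are hypothetical.

```java
// Rough shard estimate for a Kinesis stream, under assumed per-shard limits.
final class ShardEstimate {

    // Assumed per-shard limits; verify against the current Kinesis quotas.
    static final double WRITE_MB_PER_SEC_PER_SHARD = 1.0;
    static final double WRITE_RECORDS_PER_SEC_PER_SHARD = 1000.0;
    static final double READ_MB_PER_SEC_PER_SHARD = 2.0;

    // Returns the smallest shard count that satisfies all three limits.
    static int shardsNeeded(double recordsPerSec, double avgRecordKb, int consumers) {
        double ingressMbPerSec = recordsPerSec * avgRecordKb / 1024.0;
        double egressMbPerSec = ingressMbPerSec * consumers; // every consumer reads everything

        double byWriteBytes = ingressMbPerSec / WRITE_MB_PER_SEC_PER_SHARD;
        double byWriteRecords = recordsPerSec / WRITE_RECORDS_PER_SEC_PER_SHARD;
        double byReadBytes = egressMbPerSec / READ_MB_PER_SEC_PER_SHARD;

        double required = Math.max(byWriteBytes, Math.max(byWriteRecords, byReadBytes));
        return (int) Math.max(1, Math.ceil(required));
    }

    public static void main(String[] args) {
        // 2,500 records/s of 2 KB each, read by one consumer: about 4.9 MB/s in.
        System.out.println(shardsNeeded(2500, 2.0, 1)); // prints 5
    }
}
```

In this example the write bandwidth is the binding limit; with several consumers reading the same stream, the read limit can become the one that drives the shard count.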
Amazon Elastic MapReduce (EMR)
Easily provision and manage clusters for your big data needs
Hadoop, Flink, Spark, Presto, HBase, Tez, Hive, Pig, …
Dynamically scalable, persistent, or transient clusters
Tightly integrated with other AWS services, e.g., for storage, encryption, and monitoring
Amazon Elasticsearch Service
Set up an Elasticsearch cluster in minutes
Integrated with Logstash and Kibana
Scale Elasticsearch clusters seamlessly
Highly available and reliable
Tightly integrated with other AWS services
Architecture for Analyzing Taxi Rides
Amazon Kinesis Streams → Apache Flink on Amazon EMR → Amazon ES
Let’s dive right in!
Lessons Learned and Best Practices
Building the Flink Kinesis Connector
The Flink Kinesis Connector binary is not available from Maven Central
Build the Connector with Maven 3.0.x, 3.1.x, or 3.2.x …
• mvn clean install -Pinclude-kinesis -DskipTests -Dhadoop-two.version=2.7.3
… or use CodeBuild to have it built for you!
Important Parameters of the Kinesis Connector
AWS_CREDENTIALS_PROVIDER
• determines how Flink obtains IAM credentials
• set to AUTO and use appropriate roles with the EMR cluster
SHARD_GETRECORDS_INTERVAL_MILLIS
• determines how often Flink polls events from Kinesis
• set to at least 1000 to facilitate multiple consumers
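The two parameters above end up in a java.util.Properties object that is handed to the connector. The sketch below builds such an object with plain string keys so that it runs without Flink on the classpath. The key strings are what the connector's AWSConfigConstants and ConsumerConfigConstants are assumed to resolve to, and the region is a placeholder; in a real job, reference the constants themselves and pass the result to the Kinesis consumer. The sketch also shows why a 1000 ms interval helps, assuming the documented Kinesis limit of 5 GetRecords calls per second per shard.

```java
import java.util.Properties;

// Hypothetical helper; key strings are assumed values of the connector's constants.
final class KinesisConsumerConfigSketch {

    // Assumed Kinesis limit on GetRecords calls per second for each shard.
    static final int GET_RECORDS_CALLS_PER_SEC_PER_SHARD = 5;

    // Builds the consumer configuration described on the slide.
    static Properties consumerProperties(String region, long pollIntervalMillis) {
        Properties config = new Properties();
        // Assumed value of AWS_REGION.
        config.setProperty("aws.region", region);
        // Assumed value of AWS_CREDENTIALS_PROVIDER; AUTO picks up the EMR instance role.
        config.setProperty("aws.credentials.provider", "AUTO");
        // Assumed value of SHARD_GETRECORDS_INTERVAL_MILLIS.
        config.setProperty("flink.shard.getrecords.intervalmillis",
                Long.toString(pollIntervalMillis));
        return config;
    }

    // How many independent applications can poll one shard at this interval
    // before they exceed the per-shard call limit together.
    static long maxPollingConsumers(long pollIntervalMillis) {
        return GET_RECORDS_CALLS_PER_SEC_PER_SHARD * pollIntervalMillis / 1000;
    }

    public static void main(String[] args) {
        Properties config = consumerProperties("eu-west-1", 1000); // region is a placeholder
        System.out.println(config.getProperty("aws.credentials.provider")); // prints AUTO
        System.out.println(maxPollingConsumers(1000)); // prints 5
        System.out.println(maxPollingConsumers(200));  // prints 1
    }
}
```

At 200 ms a single application already uses the whole per-shard call budget, whereas at 1000 ms up to five applications can read the same shard without throttling each other.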
Connecting to the Flink Dashboard
Use dynamic port forwarding to the Master node
• ssh -D 8157 hadoop@...
Use FoxyProxy to redirect URLs to localhost
• *ec2*.amazonaws.com*
• *.compute.internal*
Connect through the EMR console
• navigate to the YARN Resource Manager
• select the Flink ApplicationMaster
Starting Flink and Submitting Jobs
Important Kinesis Streams Metrics
Checkpointing and High Availability
ZooKeeper can be bootstrapped on EMR
Overprovision the EMR cluster for fast failovers
Use externalized checkpoints and store them on Amazon S3
Build a Stream Processing Pipeline Yourself
Many examples with sample code are on the AWS Big Data Blog. Follow the blog!
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
https://github.com/awslabs/flink-stream-processing-refarch/
Thank you!
shausma@amazon.de

Flink Forward Berlin 2017: Steffen Hausmann - Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
