14. ※Source: https://beam.apache.org/blog/2017/08/28/timely-processing.html
Dataflow Model
ParDo: for generic parallel processing. Each input element to be processed (which itself may
be a finite collection) is provided to a user-defined function (called a DoFn in Dataflow), which
can yield zero or more output elements per input.
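The ParDo contract described above can be illustrated with a minimal plain-Python sketch. This is not the Beam or Dataflow API: `par_do` and `split_words` are hypothetical names, and the loop runs sequentially, whereas a real runner would distribute elements across workers.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def par_do(elements: Iterable[T], do_fn: Callable[[T], Iterable[U]]) -> Iterator[U]:
    """Apply do_fn to every element and flatten whatever it yields."""
    for element in elements:
        # Each call may produce zero, one, or many outputs.
        yield from do_fn(element)


def split_words(line: str) -> Iterator[str]:
    """Hypothetical DoFn: one output per word, none for a blank line."""
    for word in line.split():
        yield word


lines = ["timely processing", "", "state and timers"]
words = list(par_do(lines, split_words))
```

The blank line contributes no output and the other lines contribute several each, which is the "zero or more output elements per input" property.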
46. ※Source: https://ci.apache.org/projects/flink/flink-docs-release-1.7
What if there were no Flink Connector? API
• To implement a Kafka consumer..
• Initialize the state
• Create a partition discovery thread,
• Create a separate Kafka consumer thread and fetch thread
• Create an in-memory queue (handover) that passes messages between the consumer
and fetcher threads so they can communicate~~
• Write the logic that checkpoints periodically
• On close, clean up the threads and finish checkpointing properly!
• We will be monitoring too, so metrics have to be exposed!
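The handover step in the list above can be sketched roughly as follows. This is a plain-Python stand-in, not Flink's connector code: the function names, the tuple layout `(partition, offset, value)`, and the in-memory `batches` are all assumptions for illustration, and partition discovery, checkpointing, and metrics are left out.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def consumer_thread(source_batches, handover: queue.Queue) -> None:
    """Stands in for the thread that polls Kafka and hands batches over."""
    for batch in source_batches:
        handover.put(batch)  # blocks while the queue is full
    handover.put(SENTINEL)


def fetcher_loop(handover: queue.Queue, state: dict, emitted: list) -> None:
    """Stands in for the fetcher: emit records and track offsets as state."""
    while True:
        batch = handover.get()
        if batch is SENTINEL:
            break
        for partition, offset, value in batch:
            emitted.append(value)
            state[partition] = offset + 1  # next offset to read after a restore


# Hypothetical records: (partition, offset, value)
batches = [[(0, 0, "a"), (1, 0, "b")], [(0, 1, "c")]]
handover = queue.Queue(maxsize=1)
offsets = {}
out = []

producer = threading.Thread(target=consumer_thread, args=(batches, handover))
producer.start()
fetcher_loop(handover, offsets, out)
producer.join()
```

The bounded queue gives backpressure: the consumer thread cannot run ahead of the fetcher by more than one batch. The `offsets` dictionary is the kind of state a periodic checkpoint would need to snapshot.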
64. Should We Alert?
• CPU / Memory / Disk usage
• Garbage Collection ( count / time )
• Network I/O
• Job Downtime
• Latency Tracking
• Customized Metric
• Etc …
Alert
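As a rough illustration of turning the metrics above into alert candidates, the sketch below checks a metric snapshot against thresholds. The metric names, threshold values, and the `AlertRule` type are all hypothetical; a real setup would normally express such rules in the monitoring system rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric name
    threshold: float  # fire when the value exceeds this


def firing_alerts(rules, snapshot):
    """Return the metrics whose current value exceeds the rule threshold."""
    return [
        rule.metric
        for rule in rules
        if snapshot.get(rule.metric, 0.0) > rule.threshold
    ]


rules = [
    AlertRule("cpu_usage_percent", 90.0),
    AlertRule("gc_time_ms_per_min", 5000.0),
    AlertRule("job_downtime_seconds", 0.0),
]
snapshot = {
    "cpu_usage_percent": 72.5,
    "gc_time_ms_per_min": 8200.0,
    "job_downtime_seconds": 0.0,
}
alerts = firing_alerts(rules, snapshot)
```

Keeping the rule list short and explicit makes it easier to ask, for each entry, whether someone would actually act on it when it fires.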
65. Should We Alert?
“As with alerts, an information radiator that always shows red has no value. If a condition shown on the radiator isn’t important enough to fix immediately, then remove it.”
Alert
※Source: O'Reilly Media, Inc., Infrastructure as Code
77. Other Solutions
• Avoid consuming from the earliest Kafka offsets
• Be careful with the GroupByKey operator
• Apply predicate / filter-out operators first
• Repartition / rescale around the bottleneck ( sync logic )
• Use async logic
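The "filter first" advice can be sketched in plain Python as below. `group_by_key` is a hypothetical stand-in for a GroupByKey operator, and the event data and the `v >= 0` predicate are made up for illustration; in a distributed pipeline the grouping step is where data gets shuffled between workers, so every element dropped beforehand is one less to move.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Group (key, value) pairs; in a real pipeline this step shuffles data."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Hypothetical events: (user, amount); negative amounts are invalid.
events = [("user1", 5), ("user2", -1), ("user1", 7), ("user3", -4)]

# Filter first: only valid events reach the expensive grouping step.
valid = [(k, v) for k, v in events if v >= 0]
grouped = group_by_key(valid)
```

Here only two of the four events reach the grouping step, and keys that hold nothing but invalid events never create a group at all.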