
Log Event Stream Processing In Flink Way


Logs are one of the most important sources for monitoring a system and revealing significant events of interest. In this presentation, we introduce an implementation of a log stream processing architecture based on Apache Flink. With Fluentd, different kinds of emitted logs are collected and sent to Kafka. After the streams are processed by Flink, we build a dashboard with Elasticsearch and Kibana for visualization.

Published in: Data & Analytics


  1. Log Event Stream Processing In Flink Way (George T. C. Lai)
  2. Who Am I?
     ● BlueWell Technology
       ○ Big Data Architect
       ○ Focuses
         ■ Open DC/OS
         ■ CoreOS
         ■ Kubernetes
         ■ Apache Flink
         ■ Data Science
  3. What Have We Done?
  4. A Glance at Log Events
     ● Event Timestamp
     ● Compound Key
     ● Event Body
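The slide names the three parts of a log event. As a purely hypothetical example of what such an event might look like on the Kafka topic (all field names and values invented for illustration):

```json
{
  "timestamp": "2016-09-01T12:34:56Z",
  "key": "hostA.nginx.access",
  "body": "GET /index.html 200"
}
```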
  5. A Brief Introduction to Apache Flink
  6. Basic Structure
     ● Every Apache Flink DataStream program:
       ○ Obtains an execution environment.
         ■ StreamExecutionEnvironment.getExecutionEnvironment()
         ■ determines how timestamps are handled (time characteristic)
       ○ Loads/creates data sources.
         ■ read from a file
         ■ read from a socket
         ■ read from built-in sources (Kafka, RabbitMQ, etc.)
       ○ Executes transformations on them.
         ■ filter, map, reduce, etc. (task chaining)
       ○ Specifies where to save the results of the computations.
         ■ stdout (print)
         ■ write to files
         ■ write to built-in sinks (Elasticsearch, Kafka, etc.)
       ○ Triggers the program execution.
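The source → transformations → sink shape above can be sketched with plain Python generators. This is a conceptual simulation only, not the Flink API; the function names are invented for illustration.

```python
# Conceptual sketch of the source -> transform -> sink pipeline
# structure (illustrative only; not the Flink API).

def source(lines):
    """Stand-in for a data source such as a Kafka topic."""
    for line in lines:
        yield line

def transform(stream):
    """Stand-in for chained transformations (filter + map)."""
    for line in stream:
        if "ERROR" in line:          # filter
            yield line.lower()       # map

def sink(stream):
    """Stand-in for a sink; here we just collect the results."""
    return list(stream)

logs = ["INFO boot ok", "ERROR disk full", "ERROR net down"]
print(sink(transform(source(logs))))  # ['error disk full', 'error net down']
```

Because generators are lazy, nothing flows until the sink consumes the stream, loosely mirroring how a Flink program only runs once execution is triggered.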
  7. Time Handling & Time Characteristics
     ● Logs carry their own timestamps.
  8. Windows
     ● The concept of windows
       ○ cut an infinite stream into slices with a finite number of elements
       ○ based on timestamps or some other criteria
     ● Construction of windows
       ○ Keyed windows
         ■ an infinite DataStream is divided based on both window and key
         ■ elements with different keys can be processed concurrently
       ○ Non-keyed windows
     ● We focus on keyed (by host) windowing.
  9. Windows
     ● Basic structure
       ○ Key
       ○ Window assigner
       ○ Window function
         ■ reduce()
         ■ fold()
         ■ apply()

     val input: DataStream[T] = ...
     input
       .keyBy(<key selector>)
       .window(<window assigner>)
       .<windowed transformation>(<window function>)
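The keyBy + window + reduce chain above can be simulated in a few lines of plain Python (illustrative only, not the Flink API; the helper name is invented). Events are (key, timestamp, value) tuples, and the tumbling-window assigner rounds each timestamp down to its window start.

```python
from collections import defaultdict

# Minimal simulation of keyBy + tumbling window + reduce
# (illustrative; not the Flink API).
# Events are (key, timestamp_seconds, value) tuples.

def keyed_tumbling_count(events, window_size):
    """Count events per (key, tumbling-window start)."""
    counts = defaultdict(int)
    for key, ts, _value in events:
        window_start = ts - (ts % window_size)  # tumbling window assigner
        counts[(key, window_start)] += 1        # reduce: running count
    return dict(counts)

events = [
    ("hostA", 1, "login"), ("hostA", 4, "login"),
    ("hostA", 6, "error"), ("hostB", 2, "login"),
]
print(keyed_tumbling_count(events, window_size=5))
# {('hostA', 0): 2, ('hostA', 5): 1, ('hostB', 0): 1}
```

Note that the two keys never share state, which is why elements with different keys can be processed concurrently.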
  10. Window Assigner - Global Windows
      ● A single per-key global window.
      ● Only useful if a custom trigger is specified.
  11. Window Assigner - Tumbling Windows
      ● Defined by the window size.
      ● Windows are disjoint.
      ● The window assigner we adopted!!
  12. Window Assigner - Sliding Windows
      ● Defined by both the window size and the slide size.
      ● Windows may overlap.
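The overlap can be made concrete with a small sketch (illustrative only, not the Flink API): given a window size and a slide, one timestamp belongs to every window whose interval covers it.

```python
# Sketch of sliding-window assignment (illustrative; not the Flink
# API): with overlap, one timestamp belongs to several windows.

def sliding_windows(ts, size, slide):
    """Return the start times of all sliding windows containing ts."""
    last_start = ts - (ts % slide)  # latest window starting at/before ts
    starts = []
    start = last_start
    while start > ts - size:        # window [start, start + size) covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)

# size 10, slide 5: timestamp 7 falls into windows [0, 10) and [5, 15)
print(sliding_windows(ts=7, size=10, slide=5))  # [0, 5]
```

With size equal to slide this degenerates to the tumbling case: each timestamp lands in exactly one window.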
  13. Window Assigner - Session Windows
      ● Defined by a gap of inactivity.
      ● Window time
        ○ starts at individual points in time
        ○ ends once there has been a certain period of inactivity
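The gap rule can be sketched as follows (illustrative only, not the Flink API): walking through sorted timestamps, a new session opens whenever the distance to the previous event exceeds the gap.

```python
# Sketch of session windowing (illustrative; not the Flink API):
# sorted timestamps are grouped into sessions separated by a gap.

def session_windows(timestamps, gap):
    """Group sorted timestamps into sessions; a new session starts
    whenever the gap to the previous event is exceeded."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still active: extend the session
        else:
            sessions.append([ts])     # period of inactivity: new session
    return sessions

print(session_windows([1, 2, 3, 10, 11, 30], gap=5))
# [[1, 2, 3], [10, 11], [30]]
```

Unlike tumbling and sliding windows, session windows have no fixed start or length; each session's extent is determined by the data itself.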
  14. Window Functions
      ● WindowFunction
        ○ caches elements internally
        ○ provides window meta information (e.g., start time, end time)
      ● ReduceFunction
        ○ incremental aggregation
        ○ no access to window meta information
      ● FoldFunction
        ○ incremental aggregation
        ○ no access to window meta information
      ● WindowFunction combined with a ReduceFunction / FoldFunction
        ○ incremental aggregation
        ○ has access to window meta information
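The trade-off between the two styles can be shown in plain Python (illustrative only, not the Flink API): a WindowFunction-style computation buffers the whole window, while a ReduceFunction-style computation keeps a single running value.

```python
# Contrast of the two aggregation styles (illustrative; not the
# Flink API).

def window_function_style(elements):
    """WindowFunction style: all elements are buffered, so the whole
    window's contents are available at once (here: sum and count)."""
    buffered = list(elements)           # cached internally
    return sum(buffered), len(buffered)

def reduce_function_style(elements):
    """ReduceFunction style: one running value, no buffering."""
    acc = 0
    for e in elements:
        acc += e                        # incremental aggregation
    return acc

values = [3, 1, 4]
print(window_function_style(values))  # (8, 3)
print(reduce_function_style(values))  # 8
```

The combined form on the slide gets the best of both: state stays O(1) per window, yet the final call still sees the window metadata.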
  15. Dealing with Data Lateness
      ● Set an allowed lateness on windows
        ○ new in Flink 1.1.0
        ○ a window's state is kept until the watermark passes the window's end timestamp + allowedLateness
        ○ defaults to 0: an event is dropped once it is late
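The drop rule above reduces to a single comparison, sketched here in plain Python (illustrative only, not the Flink API; the function name is invented):

```python
# Sketch of the allowed-lateness rule (illustrative; not the Flink
# API): an event for a window is dropped only once the watermark has
# passed the window's end timestamp plus the allowed lateness.

def is_dropped(event_ts, window_end, watermark, allowed_lateness=0):
    """True if a late event for a window ending at window_end is dropped."""
    return event_ts < window_end and watermark >= window_end + allowed_lateness

# Window [0, 10); the watermark has already advanced to 12.
print(is_dropped(9, 10, 12))      # True: default lateness 0, event dropped
print(is_dropped(9, 10, 12, 5))   # False: still within the 5s grace period
```

With a non-zero allowed lateness, a late event re-fires the window instead of being discarded, at the cost of keeping window state alive longer.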
  16. Flink vs. Spark, Based on My Limited Experience
      ● Data stream manipulation via built-in APIs
        ○ Flink DataStream API (fine-grained)
        ○ Spark Streaming (coarse-grained)
      ● Intrinsic nature
        ○ Flink is intrinsically a streaming engine: true time windows
        ○ Spark is intrinsically a batch engine: micro-batches
  17. We’re all set. Thank you!!! Just Flink It!