Apache FLINK.pptx

This document provides an introduction to Apache Flink, a framework for distributed stream and batch data processing. It discusses the differences between batch and stream processing, with batch processing operating on static data periodically and stream processing operating immediately on event streams. The document then describes Flink's programming model including data sources, transformations, and sinks. It explains Flink's time classification of event time, ingestion time, and processing time. It also covers windows, watermarks, and compares Flink to other frameworks like Spark and Hadoop. Key features of Flink highlighted are its streaming capabilities, high speed, fault tolerance, and flexible windowing.

Live Stock Feed
(Stream processing example)

Differences between Batch and Real-Time Processing

Aspect    Batch Processing                    Real-Time Processing
Data      Static files                        Event streams
Speed     Periodic (minute, hour, day, etc.)  Immediate (low latency, typically milliseconds)
Storage   Past data on disk storage           In-memory storage
Example   Bill generation                     ATM transaction alert
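
The contrast above can be sketched in a few lines of plain Python. This is an illustration of the two processing styles only, not Flink code; the function names and the sample amounts are hypothetical.

```python
from typing import Iterable, Iterator

def batch_total(amounts: Iterable[float]) -> float:
    """Batch style: the whole data set is available before processing starts."""
    return sum(amounts)

def streaming_totals(amounts: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated result as each event arrives."""
    running = 0.0
    for amount in amounts:
        running += amount
        yield running

# Hypothetical transaction amounts, used only for illustration.
transactions = [20.0, 5.0, 12.5]
print(batch_total(transactions))             # one result, after all data is read
print(list(streaming_totals(transactions)))  # one result per incoming event
```

The batch function answers once, which suits a job like bill generation; the streaming function answers after every event, which suits an alert that must fire as soon as a transaction arrives.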

FLINK program
Data source
Reads data from external systems such as HDFS, Kafka …
Transformation
Applies data transformation operations such as reduce(), sum(), max(), min() …
Data Sink
Writes the final output.
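
The source, transformation, sink structure can be sketched with Python generators. This is a conceptual stand-in, not the Flink API: the function names, the hard-coded events, and the list-based sink are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, float]  # (key, value), e.g. (stock symbol, price)

def source() -> Iterator[Event]:
    """Stand-in for a real source such as HDFS or Kafka: yields sample events."""
    yield from [("ACME", 10.0), ("INIT", 7.0), ("ACME", 12.0), ("INIT", 6.5)]

def running_max(events: Iterable[Event]) -> Iterator[Event]:
    """Transformation: per-key running maximum, in the spirit of max()."""
    best: Dict[str, float] = {}
    for key, value in events:
        best[key] = max(value, best.get(key, value))
        yield key, best[key]

def collect_sink(events: Iterable[Event], out: List[Event]) -> None:
    """Sink: appends results to a list; a real sink would write to external storage."""
    for event in events:
        out.append(event)

results: List[Event] = []
collect_sink(running_max(source()), results)
print(results)
```

Because each stage is a generator, events flow through one at a time rather than being materialized between stages, which loosely mirrors how a streaming pipeline passes records from operator to operator.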

FLINK time & window
EVENT TIME CLASSIFICATION TYPES
Event Time: time when an event occurs.
Ingestion Time: time when an event arrives at the stream processing system.
Processing Time: time when an event is processed by the stream processing system.
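
To make the three notions of time concrete, the sketch below attaches all three timestamps to a single event and computes the gaps between them. The class, its field names, and the timestamp values are hypothetical and are not part of any Flink API.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TimedEvent:
    payload: str
    event_time: float       # when the event occurred at its origin
    ingestion_time: float   # when it arrived at the stream processing system
    processing_time: float  # when the system actually processed it

def delays(e: TimedEvent) -> Tuple[float, float]:
    """Return (delay before ingestion, delay before processing) in seconds."""
    return (e.ingestion_time - e.event_time,
            e.processing_time - e.ingestion_time)

# Hypothetical timestamps, in seconds since an arbitrary start.
e = TimedEvent("ATM withdrawal", event_time=100.0,
               ingestion_time=102.0, processing_time=102.5)
print(delays(e))
```

Only the event time is fixed by the event itself; the other two depend on network and system load, which is why results computed on processing time can differ between runs over the same data.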

FLINK time & window
DEFINITION
A window is a method for splitting an infinite data set into finite blocks for processing. Windows split the stream into buckets of finite size, over which we can apply computations.
TYPES

Time Windows based on Processing Time
TUMBLING WINDOWS
SLIDING WINDOWS
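
A minimal sketch of how a timestamp maps to tumbling and sliding windows, written in plain Python rather than with Flink's window assigners. The function names and the window sizes are assumptions; the timestamp passed in would be the processing time for the windows named above, and is assumed to be non-negative.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in milliseconds

def tumbling_window(ts: int, size: int) -> Window:
    """Tumbling windows do not overlap: each timestamp belongs to exactly one."""
    start = ts - (ts % size)
    return (start, start + size)

def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Sliding windows overlap when slide < size, so a timestamp can fall in several."""
    windows = []
    start = ts - (ts % slide)   # latest window start that still covers ts
    while start > ts - size:    # step back while the window still contains ts
        windows.append((start, start + size))
        start -= slide
    return windows

print(tumbling_window(7_500, size=5_000))
print(sliding_windows(7_500, size=10_000, slide=5_000))
```

With a 10-second size and a 5-second slide, every timestamp lands in two windows, so each event contributes to two results; with tumbling windows it contributes to exactly one.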

FLINK Watermark
OUT-OF-ORDER PROBLEM
WATERMARK SOLUTION
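
One common way to handle out-of-order events is a watermark that trails the highest event time seen so far by a fixed allowance. The sketch below illustrates that idea in plain Python; it is not Flink's implementation, and the function name, the `max_delay` parameter, and the sample arrival order are assumptions.

```python
from typing import Iterable, List, Tuple

def split_late_events(event_times: Iterable[int],
                      max_delay: int) -> Tuple[List[int], List[int]]:
    """Classify events as on time or late against a trailing watermark.

    The watermark is (highest event time seen so far - max_delay). An event
    is treated as late if its timestamp is at or below the watermark when
    it arrives.
    """
    on_time: List[int] = []
    late: List[int] = []
    max_seen = None
    for ts in event_times:
        watermark = None if max_seen is None else max_seen - max_delay
        if watermark is not None and ts <= watermark:
            late.append(ts)
        else:
            on_time.append(ts)
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return on_time, late

# Hypothetical event times, listed in arrival order (not in event-time order).
arrivals = [1, 3, 2, 7, 4, 8, 3]
on_time, late = split_late_events(arrivals, max_delay=3)
print(on_time, late)
```

A larger `max_delay` tolerates more disorder but makes window results wait longer, since a window can only be closed once the watermark has passed its end.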

Flink vs Spark vs Hadoop

Aspect                    Apache Hadoop                Apache Spark             Apache Flink
Data Processing Engine    Batch                        Batch                    Stream
Processing Speed          Slower than Spark and Flink  100x faster than Hadoop  Faster than Spark
Throughput                Medium                       High                     High
Optimization              Manual                       Manual                   Automatic
Streaming Support         NA                           Spark Streaming          Flink Streaming
Graph Support             NA                           GraphX                   Gelly
Machine Learning Support  NA                           SparkML                  FlinkML
SQL Support               Hive, Impala                 SparkSQL                 Table API and SQL
Data Transfer             Batch                        Batch                    Pipelined and Batch

Features of Apache Flink
1) Has a streaming processor, which can run both batch and stream programs.
2) Can process data at lightning-fast speed.
3) APIs available in Java, Scala and Python.
4) Processes data with low latency (typically milliseconds) and high throughput.
5) It is fault tolerant: if a node, application or hardware component fails, the failure does not affect the rest of the cluster.
6) In-memory management can be customized for better computation.
7) Windowing is very flexible in Apache Flink.

Apache FLINK.pptx

2. Agenda

7. Live Stock Feed (Stream processing example)

8. Differences between Batch and Real-Time Processing

9. Deeper into FLINK

10. Eco-system Apache FLINK

11. FLINK program

12. Architecture

13. Job Running Process

14. FLINK time & window (event time classification types)

15. Differences Between the Three Time Types

16. FLINK time & window (window definition and types)

17. Time Windows based on Processing Time

18. FLINK Watermark

19. Tips and useful resources

20. Flink vs Spark vs Hadoop

21. Features of Apache Flink

22. Thank You
