What Is Apache Flink?
● A stream processing framework
● Open source / Apache 2.0 license
● Written in Java and Scala
● For batch and stream processing
● For high volume, low latency
● Develop in Java, Scala, Python, SQL
● Automatic compilation/optimization into data flows
How Does Flink Work?
● Process Unbounded and Bounded Data
● Uses file systems to consume and persistently store data, e.g.
– local, Hadoop-compatible, Amazon S3, MapR FS, OpenStack Swift FS, Aliyun OSS and Azure Blob Storage
● Leverages In-Memory Performance
● Provides a rich function set for handling
– Streams, state and time
– When building applications
● Provides layered APIs that strike a balance between
– Conciseness and expressiveness
– See next slide
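To make "streams, state and time" concrete, here is a minimal sketch in plain Python. It is not the Flink API: the function name, the event tuples and the window size are assumptions chosen for illustration. It counts events per key in tumbling windows based on each event's own timestamp, and keeps the counts as keyed state.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (key, window start), using each event's own timestamp.

    `events` is any iterable of (key, timestamp) pairs. A bounded list is used
    below, standing in for a stream.
    """
    counts = defaultdict(int)  # keyed state: (key, window_start) -> count
    for key, timestamp in events:
        # Align the timestamp to the start of its fixed-size window.
        window_start = timestamp - (timestamp % window_size)
        counts[(key, window_start)] += 1
    return dict(counts)


# Hypothetical sample data: (sensor id, event time in seconds).
events = [
    ("sensor-a", 1),
    ("sensor-b", 3),
    ("sensor-a", 4),
    ("sensor-a", 11),
    ("sensor-b", 12),
]
result = tumbling_window_counts(events, window_size=10)
print(result)
```

A real stream processor would emit each window's result once the window closes rather than returning everything at the end. The sketch only shows how key, state and event time relate.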
How Does Flink Work?
Flink layered APIs
Flink APIs
● SQL & Table API
● DataStream API
● ProcessFunctions – event processing
● Flink also has libraries for common data processing
– Complex Event Processing (CEP)
– DataSet API
– Gelly - library for scalable graph processing/analysis
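The trade-off between the layers can be sketched with the same aggregation written twice. This is an analogy in plain Python, not Flink code: SQLite stands in for a declarative SQL layer, and a hand-written loop stands in for a per-event processing function. The table name, column names and sample rows are assumptions.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, order amount).
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)]


def totals_declarative(rows):
    """Declarative layer: state the wanted result and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


def totals_per_event(rows):
    """Low-level layer: handle one event at a time and manage state by hand."""
    state = defaultdict(float)  # customer -> running total
    for customer, amount in rows:
        state[customer] += amount
    return dict(state)


print(totals_declarative(ORDERS))
print(totals_per_event(ORDERS))
```

The SQL version is shorter, while the per-event version gives full control over state and over what happens at each event. That is the conciseness versus expressiveness balance the layered APIs aim for.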
Flink Used By
Flink Deployment
● Deploy Flink on the following cluster managers
– YARN
– Mesos
– Kubernetes
– Standalone
● All application control communication happens via REST calls
● Deploy at any scale
– multiple trillions of events per day
– multiple terabytes of state
– thousands of cores
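As a sketch of what control via REST calls can look like, the following Python builds and queries a job overview URL. The host, the port 8081 and the `/jobs/overview` path are assumptions based on a default local setup, so check them against the REST API documentation for the Flink version in use. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def jobs_overview_url(host="localhost", port=8081):
    """Build the URL for the job overview endpoint (assumed path and port)."""
    return f"http://{host}:{port}/jobs/overview"


def fetch_jobs_overview(host="localhost", port=8081, timeout=5):
    """Fetch and decode the job overview as JSON. Needs a reachable cluster."""
    with urlopen(jobs_overview_url(host, port), timeout=timeout) as response:
        return json.load(response)


print(jobs_overview_url())
```

Calling `fetch_jobs_overview()` only works when a cluster is reachable at the given address, so the example prints the URL and does not make the request.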
Flink Architecture
Flink Stateful Functions
● Simplifies building distributed stateful applications
● Provides a runtime built for serverless architectures
● Key Benefits
– Dynamic Messaging
– Consistent State
– Multi-language Support
– No Database Required
– Cloud Native
– "Stateless" Operation
Flink Stateful Functions
Flink Use Cases
● Event-driven Applications, e.g.
– Fraud detection
– Anomaly detection
● Data Analytics Applications
– Quality monitoring of Telco networks
– Analysis of product updates & experiment evaluation in mobile applications
● Data Pipeline Applications
– Real-time search index building in e-commerce
– Continuous ETL in e-commerce
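The fraud detection use case above can be sketched as an event-driven rule with per-account state. This is plain Python, not Flink code. The rule (a very small transaction immediately followed by a large one on the same account), the thresholds and the sample data are all assumptions made for illustration.

```python
def detect_suspicious(transactions, small=1.0, large=500.0):
    """Flag a large transaction that directly follows a small one per account."""
    last_was_small = {}  # keyed state: account -> was the previous amount small?
    alerts = []
    for account, amount in transactions:
        if last_was_small.get(account, False) and amount > large:
            alerts.append((account, amount))
        # Update the state for this account after evaluating the rule.
        last_was_small[account] = amount < small
    return alerts


# Hypothetical sample data: (account id, transaction amount).
transactions = [
    ("acct-1", 0.5),
    ("acct-2", 700.0),
    ("acct-1", 900.0),
    ("acct-2", 0.2),
    ("acct-2", 30.0),
    ("acct-2", 650.0),
]
alerts = detect_suspicious(transactions)
print(alerts)
```

Only `acct-1` triggers an alert here. For `acct-2`, the small transaction is followed by a normal one, which resets the state before the large transaction arrives.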
Flink Use Cases
Flink Use Cases
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology-based issues
– Big data integration

Apache Flink
