Advanced topics in
Apache Flink™
Maximilian Michels
mxm@apache.org
@stadtlegende
EIT ICT Summer School 2015
Ufuk Celebi
uce@apache.org
@iamuce
Agenda
 Batch analytics
 Iterative processing
 Fault tolerance
 Data types and keys
 More transformations
 Further API concepts
2
Batch analytics
3
Memory Management
4
Memory Management
5
Managed memory in Flink
6
Slide figure: a Hash Join with a 64 GB probe side, annotated at the point where memory runs out.
See also: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
Cost-based optimizer
7
Table API
8
val customers = env.readCsvFile(…).as('id, 'mktSegment)
.filter("mktSegment = AUTOMOBILE")
val orders = env.readCsvFile(…)
.filter( o => dateFormat.parse(o.orderDate).before(date) )
.as("orderId, custId, orderDate, shipPrio")
val items = orders
.join(customers).where("custId = id")
.join(lineitems).where("orderId = id")
.select("orderId, orderDate, shipPrio,
extdPrice * (Literal(1.0f) – discount) as revenue")
val result = items
.groupBy("orderId, orderDate, shipPrio")
.select('orderId, revenue.sum, orderDate, shipPrio")
Flink stack
9
Slide diagram: the Flink stack. Libraries such as the Table API sit on top of the DataSet (Java/Scala) and DataStream (Java/Scala) APIs, alongside Hadoop M/R compatibility; everything runs on the streaming dataflow runtime, which can be deployed locally, on a cluster, on YARN, on Tez, or embedded.
Iterative processing
10
Non-native iterations
11
Slide diagram: the client drives the loop, submitting a separate job for every step (Step → Step → … → Step):
for (int i = 0; i < maxIterations; i++) {
// Execute MapReduce job
}
Iterative processing in Flink
Flink offers built-in iterations and delta
iterations to execute ML and graph
algorithms efficiently.
12
Slide diagram: a native iterative dataflow (map, join, sum) over intermediate results ID1, ID2, ID3, executed inside the runtime.
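A minimal sketch of how such a native iteration looks in the Java DataSet API (a trivial counter as the step function; the loop runs inside the engine rather than being driven by the client):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// start a bulk iteration with at most 10 supersteps
IterativeDataSet<Integer> initial = env.fromElements(0).iterate(10);
// the step function, executed by the runtime in every iteration
DataSet<Integer> body = initial.map(i -> i + 1);
// close the loop and obtain the final result
DataSet<Integer> result = initial.closeWith(body);
result.print();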
FlinkML
 API for ML pipelines inspired by scikit-learn
 Collection of packaged algorithms
• SVM, Multiple Linear Regression, Optimization, ALS, ...
13
val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...
val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures().setDegree(3)
val mlr = MultipleLinearRegression()
val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)
pipeline.fit(trainingData)
val predictions: DataSet[LabeledVector] = pipeline.predict(testingData)
Gelly
 Graph API: various graph processing paradigms
 Packaged algorithms
• PageRank, SSSP, Label Propagation, Community
Detection, Connected Components
14
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph<Long, Long, NullValue> graph = ...
DataSet<Vertex<Long, Long>> verticesWithCommunity = graph.run(
new LabelPropagation<Long>(30)).getVertices();
verticesWithCommunity.print();
env.execute();
Example: Matrix Factorization
15
Factorizing a matrix with
28 billion ratings for
recommendations
More at: http://data-artisans.com/computing-recommendations-with-flink.html
Fault tolerance
16
Why be fault tolerant?
 Failures are rare but they occur
 The larger the cluster, the more likely failures become
 Types of failures
• Hardware
• Software
17
Recovery strategies
Batch
 Simple strategy: restart the job/tasks
 Resume from partially created intermediate
results, or re-read and re-process the entire input
Streaming
 Simple strategy: restart job/tasks
 We lose the state of the operators
 Goal: find the correct offset to resume from
18
Streaming fault tolerance
 Ensure that operators see all events
• “At least once”
• Solved by replaying a stream from a
checkpoint, e.g., from a past Kafka offset
 Ensure that operators do not perform
duplicate updates to their state
• “Exactly once”
• Several solutions
19
Exactly once approaches
 Discretized streams (Spark Streaming)
• Treat streaming as a series of small atomic computations
• “Fast track” to fault tolerance, but restricts computational
and programming model (e.g., cannot mutate state across
“mini-batches”, window functions correlated with mini-
batch size)
 MillWheel (Google Cloud Dataflow)
• State update and derived events committed as atomic
transaction to a high-throughput transactional store
• Requires a very high-throughput transactional store
 Chandy-Lamport distributed snapshots (Flink)
20
21
Slide diagram: the JobManager initiates a checkpoint; on recovery, replay will start from this point.
22
JobManager
Barriers "push" prior events (assumes in-order delivery in individual channels).
Diagram legend: operator checkpointing starting / in progress / finished.
23
JobManager
Operator checkpointing takes
snapshot of state after ack’d
data have updated the state.
Checkpoints currently one-off
and synchronous, WiP for
incremental and
asynchronous
State backup
Pluggable mechanism. Currently
either JobManager (for small state) or
file system (HDFS/Tachyon). WiP for
in-memory grids
24
JobManager
State snapshots at sinks
signal successful end of this
checkpoint
At failure,
recover last
checkpointed
state and restart
sources from
last barrier. This
guarantees at
least once
State backup
Best of all worlds for streaming
 Low latency
• Thanks to pipelined engine
 Exactly-once guarantees
• Variation of Chandy-Lamport
 High throughput
• Controllable checkpointing overhead
 Separates app logic from recovery
• Checkpointing interval is just a config parameter
25
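As a sketch of what "just a config parameter" means in practice (the interval value is illustrative), checkpointing is enabled on the streaming environment:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// draw a distributed snapshot every 5 seconds; on failure, Flink restores the
// last completed checkpoint and replays the sources from the matching offsets
env.enableCheckpointing(5000);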
Fault Tolerance Demo
26
Type System and Keys
What kind of data can Flink handle?
27
Apache Flink’s Type System
 Flink aims to support all data types
• Ease of programming
• Seamless integration with existing code
 Programs are analyzed before execution
• Used data types are identified
• Serializer & comparator are configured
28
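A small sketch from the Java API side: the type derived during program analysis is exposed as TypeInformation, which drives serializer and comparator selection.

DataSet<Tuple2<Long, String>> data = env.fromElements(Tuple2.of(1L, "a"));
// inspect the type information derived for this DataSet
TypeInformation<?> type = data.getType(); // a tuple type info for Tuple2<Long, String>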
Apache Flink’s Type System
 Data types are either
• Atomic types (like Java Primitives)
• Composite types (like Flink Tuples)
 Composite types nest other types
 Not all data types can be used as keys!
• Flink groups, joins & sorts DataSets on keys
• Key types must be comparable
29
Atomic Types
Flink Type    | Java Type                              | Can be used as key?
BasicType     | Java primitives (Integer, String, …)   | Yes
ArrayType     | Arrays of Java primitives or objects   | No
WritableType  | Implements Hadoop's Writable interface | Yes, if it implements WritableComparable
GenericType   | Any other type                         | Yes, if it implements Comparable
30
Composite Types
 Are composed of fields with other types
• Fields types can be atomic or composite
 Fields can be addressed as keys
• Field type must be a key type!
 A composite type can be a key type
• All field types must be key types!
31
PojoType
 Any Java class that
• Has an empty default constructor
• Has publicly accessible fields
(Public or getter/setter)
public class Person {
public int id;
public String name;
public Person() {};
public Person(int id, String name) {…};
}
DataSet<Person> p =
env.fromElements(new Person(1, "Bob"));
32
PojoType
 Define keys by field name
DataSet<Person> p = …
// group on "name" field
p.groupBy("name").groupReduce(…);
33
Scala CaseClasses
 Scala case classes are natively supported
case class Person(id: Int, name: String)
val d: DataSet[Person] =
env.fromElements(Person(1, "Bob"))
 Define keys by field name
// use field “name” as key
d.groupBy(“name”).groupReduce(…)
34
Composite & nested keys
DataSet<Tuple3<String, Person, Double>> d = …
 Composite keys are supported
// group on the first two fields
d.groupBy(0, 1).reduceGroup(…);
 Nested fields can be used as keys
// group on nested "name" field
d.groupBy("f1.name").reduceGroup(…);
 Full types can be used as key using "*" wildcard
// group on complete nested Pojo field
d.groupBy("f1.*").reduceGroup(…);
• "*" wildcard can also be used for atomic types
35
Join & CoGroup Keys
 Key types must match for binary operations!
DataSet<Tuple2<Long, String>> d1 = …
DataSet<Tuple2<Long, Long>> d2 = …
// works
d1.join(d2).where(0).equalTo(1).with(…);
// works
d1.join(d2).where("f0").equalTo(0).with(…);
// does not work!
d1.join(d2).where(1).equalTo(0).with(…);
36
Transformations & Functions
43
Transformations
 DataSet Basics presented:
• Map, FlatMap, GroupBy, GroupReduce, Join
 Reduce
 CoGroup
 Combine
 GroupSort
 AllReduce & AllGroupReduce
 Union
 see documentation for more transformations
44
GroupReduce (Hadoop-style)
 GroupReduceFunction gives iterator over
elements of group
• Elements are streamed (possibly from disk),
not materialized in memory
• Group size can exceed available JVM heap
 Input type and output type may be different
45
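A hedged sketch of such a function (names are illustrative): it streams over one group and emits an output type that differs from the input type.

public static class CountPerKey
    implements GroupReduceFunction<Tuple2<Long, String>, Tuple2<Long, Long>> {
  @Override
  public void reduce(Iterable<Tuple2<Long, String>> values,
                     Collector<Tuple2<Long, Long>> out) {
    long key = -1;
    long count = 0;
    // elements are streamed; the group is never fully materialized here
    for (Tuple2<Long, String> v : values) {
      key = v.f0;
      count++;
    }
    out.collect(Tuple2.of(key, count));
  }
}
// usage: data.groupBy(0).reduceGroup(new CountPerKey());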
Reduce (FP-style)
• Reduce like in functional programming
• Less generic compared to GroupReduce
• Function must be commutative and associative
• Input type == Output type
• System can apply more optimizations
• Always combinable
• May use a hash strategy for execution (future)
46
Reduce (FP-style)
DataSet<Tuple2<Long,Long>> sum = data
.groupBy(0)
.reduce(new SumReducer());
public static class SumReducer implements
ReduceFunction<Tuple2<Long,Long>> {
@Override
public Tuple2<Long,Long> reduce(
Tuple2<Long,Long> v1,
Tuple2<Long,Long> v2) {
v1.f1 += v2.f1;
return v1;
}
}
47
CoGroup
 Binary operation (two inputs)
• Groups both inputs on a key
• Processes groups with matching keys of both
inputs
 Similar to GroupReduce on two inputs
48
CoGroup
DataSet<Tuple2<Long,String>> d1 = …;
DataSet<Long> d2 = …;
DataSet<String> d3 =
d1.coGroup(d2).where(0).equalTo("*").with(new CoGrouper()); // "*" keys the whole Long value
public static class CoGrouper implements
CoGroupFunction<Tuple2<Long,String>, Long, String> {
@Override
public void coGroup(Iterable<Tuple2<Long,String>> vs1,
Iterable<Long> vs2, Collector<String> out) {
if (!vs2.iterator().hasNext()) {
for (Tuple2<Long,String> v1 : vs1) {
out.collect(v1.f1);
}
}
}
}
49
Combiner
 Local pre-aggregation of data
• Before data is sent to GroupReduce or CoGroup
• (functional) Reduce injects combiner automatically
• Similar to Hadoop Combiner
 Optional for semantics, crucial for performance!
• Reduces data before it is sent over the network
50
Combiner WordCount Example
51
Slide diagram: three Map tasks tokenize the input lines (A B B, A C A, C A C, B A B, D A A, A B C) into (word, 1) pairs; a Combine task after each mapper pre-aggregates the local counts (e.g., (A, 3), (B, 2), (C, 1) for the first mapper); two Reduce tasks then produce the final totals (A, 8), (B, 5), (C, 4), (D, 1).
Use a combiner
 Implement RichGroupReduceFunction<I,O>
• Override combine(Iterable<I> in, Collector<O> out);
• Same interface as reduce() method
• Annotate your GroupReduceFunction with @Combinable
• Combiner will be automatically injected into Flink
program
 Implement a GroupCombineFunction
• Same interface as GroupReduceFunction
• DataSet.combineGroup()
52
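A hedged sketch of the second option (class and field names are illustrative): one class implements both interfaces, so the combine step can reuse the reduce logic for local pre-aggregation.

public static class SumCounts implements
    GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>,
    GroupCombineFunction<Tuple2<String, Integer>, Tuple2<String, Integer>> {
  @Override
  public void reduce(Iterable<Tuple2<String, Integer>> values,
                     Collector<Tuple2<String, Integer>> out) {
    String word = null;
    int sum = 0;
    for (Tuple2<String, Integer> v : values) {
      word = v.f0;
      sum += v.f1;
    }
    out.collect(Tuple2.of(word, sum));
  }
  @Override
  public void combine(Iterable<Tuple2<String, Integer>> values,
                      Collector<Tuple2<String, Integer>> out) {
    reduce(values, out); // same logic, applied locally before the shuffle
  }
}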
GroupSort
 Sort groups before they are handed to
GroupReduce or CoGroup functions
• More (resource-)efficient user code
• Easier user code implementation
• Comes (almost) for free
• Aka secondary sort (Hadoop)
53
DataSet<Tuple3<Long,Long,Long>> data = …;
data.groupBy(0)
.sortGroup(1, Order.ASCENDING)
.groupReduce(new MyReducer());
AllReduce / AllGroupReduce
 Reduce / GroupReduce without GroupBy
• Operates on a single group -> Full DataSet
• Full DataSet is sent to one machine
• Will automatically run with parallelism of 1
 Careful with large DataSets!
• Make sure you have a Combiner
54
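For instance, a sketch of an all-reduce (assuming Java 8 lambdas): the final reduce runs with parallelism 1, but the combiner pre-aggregates on every parallel instance first.

DataSet<Long> counts = env.fromElements(1L, 2L, 3L, 4L);
// no groupBy: the whole DataSet forms a single group
DataSet<Long> total = counts.reduce((a, b) -> a + b);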
Union
 Union two data sets
• Binary operation, same data type required
• No duplicate elimination (SQL UNION ALL)
• Very cheap operation
55
DataSet<Tuple2<String, Long>> d1 = …;
DataSet<Tuple2<String, Long>> d2 = …;
DataSet<Tuple2<String, Long>> d3 =
d1.union(d2);
RichFunctions
 Function interfaces have only one method
• Single abstract method (SAM)
• Support for Java8 Lambda functions
 There is a “Rich” variant for each function.
• RichFlatMapFunction, …
• Additional methods
• open(Configuration c)
• close()
• getRuntimeContext()
56
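A hedged sketch of a Rich function (the class name is illustrative) that uses open() for per-instance setup and the RuntimeContext for task information:

public static class SubtaskTagger extends RichMapFunction<String, String> {
  private int subtask;

  @Override
  public void open(Configuration parameters) {
    // one-time setup per parallel instance
    subtask = getRuntimeContext().getIndexOfThisSubtask();
  }

  @Override
  public String map(String value) {
    return subtask + ": " + value;
  }
}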
RichFunctions & RuntimeContext
 RuntimeContext has useful methods:
• getIndexOfThisSubtask ()
• getNumberOfParallelSubtasks()
• getExecutionConfig()
 Gives access to:
• Accumulators
• DistributedCache
flink.apache.org 57
Further API Concepts
58
Broadcast Variables
59
Slide diagram: a small dictionary DataSet is broadcast to all parallel map instances, which use it to tag the words of a (large) text data set with IDs.
Broadcast variables
 register any DataSet as a broadcast
variable
 available on all parallel instances
60
// 1. The DataSet to be broadcast
DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);
// 2. Broadcast the DataSet when defining the operator
data.map(…).withBroadcastSet(toBroadcast, "broadcastSetName");
// 3. Access it inside the user-defined (Rich) function, see the sketch below
getRuntimeContext().getBroadcastVariable("broadcastSetName");
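A hedged sketch of step 3 inside a Rich function (the class name is illustrative): getBroadcastVariable() returns the broadcast DataSet as a List on every parallel instance.

public static class UseDictionary extends RichMapFunction<String, String> {
  private List<Integer> dictionary;

  @Override
  public void open(Configuration parameters) {
    dictionary = getRuntimeContext().getBroadcastVariable("broadcastSetName");
  }

  @Override
  public String map(String value) {
    return value + " (dictionary size: " + dictionary.size() + ")";
  }
}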
Accumulators
 Lightweight tool to compute stats on data
• Useful to verify your assumptions about your data
• Similar to Counters (Hadoop MapReduce)
 Built-in accumulators
• Int and Long counters
• Histogram
 Easily customizable
61
Accumulators
62
Slide diagram: three parallel map instances each increment a long counter while counting the words of a text data set; the local counts (4 + 18 + 22) are sent to the JobManager, which sums them to 44. Example: count the total number of words in a text corpus.
Using Accumulators
 Use accumulators to verify your
assumptions about the data
63
class Tokenizer extends
RichFlatMapFunction<String, String> {
@Override
public void flatMap(String val,
Collector<String> out) {
getRuntimeContext()
.getLongCounter("elementCount").add(1L);
// do more stuff.
}
}
Get Accumulator Results
 Accumulators are available via JobExecutionResult
• returned by env.execute()
 Accumulators are displayed
• by CLI client
• in the JobManager web frontend
64
JobExecutionResult result = env.execute("WordCount");
long ec = result.getAccumulatorResult("elementCount");
Closing
65
I ♥ Flink, do you?
66
 Get involved and start a discussion on
Flink's mailing list
 { user, dev }@flink.apache.org
 Subscribe to news@flink.apache.org
 Follow flink.apache.org/blog and
@ApacheFlink on Twitter
67
flink-forward.org
Call for papers deadline:
August 14, 2015
October 12-13, 2015
Discount code: FlinkEITSummerSchool25
Thank you for listening!
68
Flink compared to other
projects
69
Batch & Streaming projects
Batch only
Streaming only
Hybrid
70
Batch comparison
71
(The columns compare three batch engines shown as logos on the slide; the logos are not captured in this transcript.)
API: low-level | high-level | high-level
Data transfer: batch | batch | pipelined & batch
Memory management: disk-based | JVM-managed | actively managed
Iterations: file system cached | in-memory cached | streamed
Fault tolerance: task level | task level | job level
Good at: massive scale-out | data exploration | heavy backend & iterative jobs
Libraries: many external | built-in & external | evolving built-in & external
Streaming comparison
72
(The columns compare three streaming engines shown as logos on the slide.)
Streaming model: "true" streaming | mini batches | "true" streaming
API: low-level | high-level | high-level
Fault tolerance: tuple-level ACKs | RDD-based (lineage) | coarse checkpointing
State: not built-in | external | internal
Exactly once: at least once | exactly once | exactly once
Windowing: not built-in | restricted | flexible
Latency: low | medium | low
Throughput: medium | high | high
Stream platform architecture
73
Slide diagram: server logs, transaction logs, and sensor logs feed a stream infrastructure that
- gathers and backs up streams,
- offers streams for consumption,
- provides stream recovery;
a stream processor then
- analyzes and correlates streams,
- creates derived streams and state,
- provides these to downstream systems.
What is a stream processor?
1. Pipelining
2. Stream replay
3. Operator state
4. Backup and restore
5. High-level APIs
6. Integration with batch
7. High availability
8. Scale-in and scale-out
74
(The slide groups these items as: basics, state, app development, large deployments.)
See http://data-artisans.com/stream-processing-with-flink.html
Engine comparison
75
Slide table comparing engines (shown as logos; the exact column assignment is not recoverable from this transcript) along Paradigm, Optimization, Execution, and API. Values appearing in the table: MapReduce on k/v pairs; DAG transformations on k/v pair collections; iterative transformations on collections; RDD; cyclic dataflows; k/v pair Readers/Writers; optimization: none, of SQL queries, in all APIs; execution: batch sorting, batch sorting and partitioning, batch with memory pinning, stream with out-of-core algorithms.
Editor's Notes
  1. What are the technologies that enable streaming? The open-source leaders in this space are Apache Kafka (which solves the integration problem) and Apache Flink (which solves the analytics problem, removing the final barrier). Kafka and Flink combined can remove the batch barriers from the infrastructure, creating a truly real-time analytics platform.