Introducing Arc: A Common Intermediate Language for Unified Batch and Stream Analytics
Klas Segeljakt, Max Meldrum
@FlinkForward
Outline
• Project Introduction
• The Arc Intermediate Representation (IR)
• Arc Examples
• Arc + Flink Integration?
• Conclusions
The Big Picture
• Flink plays an important role in the data science landscape
• Combining Flink with other frameworks can lead to interesting applications
• However, there is a language barrier
Intuition
[Figure: Performance plotted against #Frameworks. Three frameworks f1, f2, f3 are shown separately, then each with its own IR, then combined as f1 + f2 + f3 in a single IR.]
• No cross-optimisation is possible, e.g. resource sharing
• Data movement costs
The Arc IR
High-level
• Streams
• Tables
• Linear algebra
Arc
Abstractions
• Pipelines (Operators/Sources/Sinks)
• User-defined Windows
• Out-of-Order Processing, ...
Optimisations
• Compiler: Partial evaluation, ...
• Dataflow: Operator fusion, fission, reordering, ...
Low-level
• Runners
• Hardware
Compiler Pipeline
[Figure: Frontends → Arc (High Level IR) → Logical Dataflow IR → Physical Dataflow IR → Binaries. The diagram also marks a Flink Frontend and a Flink Backend.]
What is Arc?
A restrictive language for describing batch and stream transformations
Transformations are modelled through:
• Values: Read-only data types (e.g. Vec[T], Stream[T], i8..i64)
• Builders: Write-only data types (e.g. Appender[T])
• Values are written to builders, and builders are lazily materialised back into values
➡ Dependencies between values and builders form a dataflow graph
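The value/builder split can be sketched in plain Python. This is only an illustration, not Arc itself: `Appender`, `result` and `for_` are hypothetical stand-ins named after the constructs on the slides, and evaluation here is eager for brevity, whereas the slides describe lazy materialisation.

```python
from typing import Callable, Generic, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Appender(Generic[T]):
    """Hypothetical write-only builder: values can be merged in, not read out."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def merge(self, value: T) -> "Appender[T]":
        self._items.append(value)
        return self  # returned so that the loop body can pass it along


def result(builder: Appender[T]) -> Tuple[T, ...]:
    """Materialise a builder into a read-only value (an immutable tuple)."""
    return tuple(builder._items)


def for_(values: Iterable[T], builder: Appender[U],
         body: Callable[[Appender[U], T], Appender[U]]) -> Appender[U]:
    """Thread the builder through `body` once per input value."""
    for value in values:
        builder = body(builder, value)
    return builder


source = (1, 2, 3)  # a read-only value
mapped = result(for_(source, Appender(), lambda out, v: out.merge(v + 5)))
print(mapped)  # (6, 7, 8)
```

The only way data flows from `source` to `mapped` is through a builder, which is what lets the dependencies be read off as a dataflow graph.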
Arc Example
[Figure: dataflow graph. source → map(v+5), whose output feeds both filter(v%2==0) → evenSink and filter(v%2!=0) → oddSink]
Arc (unfused):
|source: Stream[i32],
 evenSink: StreamAppender[i32],
 oddSink: StreamAppender[i32]|
let mapped = result(for(source,
                        StreamAppender[i32],
                        |out, v| merge(out, v + 5)));
for(mapped, evenSink, |out, v|
    if (v % 2 == 0, merge(out, v), out));
for(mapped, oddSink, |out, v|
    if (v % 2 != 0, merge(out, v), out))
Arc Example (Fused)
[Figure: fused dataflow graph. source → map(v+5) then if(x%2==0) → evenSink or oddSink]
Fused:
|source: Stream[i32],
 evenSink: StreamAppender[i32],
 oddSink: StreamAppender[i32]|
for(source,
    {evenSink, oddSink},
    |out, v|
      let x = v + 5;
      if (x % 2 == 0,
          {merge(out.$1, x), out.$2},
          {out.$1, merge(out.$2, x)}))
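To make the effect of the rewrite concrete, here is a rough Python analogue of the two programs. It is a sketch under the assumption that streams can be stood in for by finite lists; the function names `unfused` and `fused` are illustrative only.

```python
from typing import List, Tuple


def unfused(source: List[int]) -> Tuple[List[int], List[int]]:
    # Three passes: one map that builds an intermediate list, then two filters.
    mapped = [v + 5 for v in source]
    even_sink = [v for v in mapped if v % 2 == 0]
    odd_sink = [v for v in mapped if v % 2 != 0]
    return even_sink, odd_sink


def fused(source: List[int]) -> Tuple[List[int], List[int]]:
    # One pass: map and both filters share a single loop, no intermediate list.
    even_sink: List[int] = []
    odd_sink: List[int] = []
    for v in source:
        x = v + 5
        if x % 2 == 0:
            even_sink.append(x)
        else:
            odd_sink.append(x)
    return even_sink, odd_sink


data = list(range(10))
assert unfused(data) == fused(data)
print(fused([1, 2, 3]))  # ([6, 8], [7])
```

Both versions write the same elements to the same sinks; the fused one simply avoids materialising `mapped` and traverses the input once.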
Arc + Flink?
• Benefits:
  • Enable stronger optimisations
  • Use your other favourite libraries together with Flink
  • Make life easier for data scientists
The black box problem
UDFs are black boxes
stream.map(<UDF>)
      .filter(<UDF>)
      .reduce(<UDF>)
➡ Flink is unaware of what is being executed inside each black box
Fusion Levels
(Legend: each box = Flink Task ~ Thread)
1. No Fusion: x + 1, x + 1, x + 1, x + 1 (separate tasks)
2. Task Fusion: x + 1, x + 1, x + 1, x + 1 (one task)
3. Invocation-level Fusion: for-loop (4X) over x + 1
4. Instruction-level Fusion: x + 4
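The last two levels can be illustrated with a small Python sketch. It assumes operators are represented as data (a hypothetical `("add", c)` tuple meaning x → x + c) rather than as opaque functions, which is what makes folding possible. Levels 1 and 2 concern how operators are placed into Flink tasks and are not modelled here.

```python
from typing import List, Tuple

# Each operator is data, not an opaque function: ("add", c) means x -> x + c.
Op = Tuple[str, int]
PIPELINE: List[Op] = [("add", 1)] * 4


def apply_op(op: Op, x: int) -> int:
    name, constant = op
    if name != "add":
        raise ValueError(f"unsupported operator: {name}")
    return x + constant


def invocation_level(data: List[int]) -> List[int]:
    """One pass over the data; each element still goes through four calls."""
    out = []
    for x in data:
        for op in PIPELINE:  # the "for-loop, 4X" of the slide
            x = apply_op(op, x)
        out.append(x)
    return out


def fold(pipeline: List[Op]) -> Op:
    """Collapse consecutive additions into a single addition."""
    if any(name != "add" for name, _ in pipeline):
        raise ValueError("only additions can be folded in this sketch")
    return ("add", sum(constant for _, constant in pipeline))


def instruction_level(data: List[int]) -> List[int]:
    """One pass and one operation per element: x + 4."""
    fused_op = fold(PIPELINE)
    return [apply_op(fused_op, x) for x in data]


print(fold(PIPELINE))              # ('add', 4)
print(instruction_level([0, 1, 2]))  # [4, 5, 6]
```

If the four maps were ordinary closures, `fold` would have nothing to inspect, which is the black box problem in miniature.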
Experiment Results
[Figure: bar chart of execution time in seconds (log scale, 10^0 to 10^3) for 50 maps on 10M elements, by optimisation level: None, Task (Flink), Invocation, Instruction. Lower is better.]
Example Frontend Code (Pandas + Beam)
import arc.beam as beam
import arc.beam.transforms.window as window
import arc.beam.transforms.combiners as combiners
import arc.pandas as pandas

def normalise(elements):
    # Divide every element by the mean of the series
    series = pandas.Series(elements)
    avg = series.sum() / series.count()
    return series / avg

p = beam.Pipeline()
# Read integers, group them into fixed windows, normalise each window, write out
(p
 | beam.io.ReadFromText(path='input.txt').with_output_types(int)
 | beam.WindowInto(window.FixedWindows(size=5))
 | beam.CombineGlobally(normalise)
 | combiners.ToList()
 | beam.io.WriteToText(path='output.txt'))
p.run()
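For readers without the Arc frontends at hand, the computation this pipeline describes can be approximated in plain Python. This is a loose sketch, not the Arc or Beam API: `fixed_windows` is a hypothetical helper, and it groups by element count as a stand-in for time-based fixed windows.

```python
from typing import List


def normalise(elements: List[float]) -> List[float]:
    """Divide every element by the mean of the window."""
    avg = sum(elements) / len(elements)  # raises ZeroDivisionError if empty
    return [e / avg for e in elements]


def fixed_windows(elements: List[float], size: int) -> List[List[float]]:
    """Assumption: count-based windows stand in for time-based ones."""
    return [elements[i:i + size] for i in range(0, len(elements), size)]


values = [2, 4, 6, 8, 10, 1, 3]
windows = fixed_windows(values, size=5)
normalised = [normalise(w) for w in windows]
print(windows)  # [[2, 4, 6, 8, 10], [1, 3]]
```

A window whose mean is zero would also raise `ZeroDivisionError`; the sketch leaves that case unhandled.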
One more thing...
Arcon: Native Arc Runner
• Flink-inspired dataflow engine built in Rust
• Goals:
  • Common runtime for Arc applications
  • Support dynamic task execution
  • First-class support for hardware acceleration
Conclusions
• Arc is an IR for batch and stream programming.
• By raising the level of abstraction, Arc is able to optimise both the dataflow and the code within it.
Arc and experiments can be found at https://github.com/cda-group
Contact info: klasseg@kth.se & mmeldrum@kth.se
References
Publications:
• Kroll, L., Segeljakt, K., Carbone, P., Schulte, C. and Haridi, S., 2019, June. Arc: an IR for batch and stream programming. In Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages (pp. 53-58). ACM.
• Meldrum, M., Segeljakt, K., Kroll, L., Carbone, P., Schulte, C. and Haridi, S., 2019, August. Arcon: Continuous and Deep Data Stream Analytics. In Proceedings of Real-Time Business Intelligence and Analytics (p. 3). ACM.

More Related Content

What's hot

Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
Gyula Fóra
 
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Databricks
 
Going Reactive with Spring 5
Going Reactive with Spring 5Going Reactive with Spring 5
Going Reactive with Spring 5
Drazen Nikolic
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL Engine
Databricks
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
Márton Balassi
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
Flink Forward
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
Aljoscha Krettek
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
Tugdual Grall
 
Reactive Spring 5
Reactive Spring 5Reactive Spring 5
Reactive Spring 5
Corneil du Plessis
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
Radu Tudoran
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
DataWorks Summit/Hadoop Summit
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
DataWorks Summit/Hadoop Summit
 
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
Landoop Ltd
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Robert Metzger
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
Kostas Tzoumas
 
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Paris Carbone
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
Gyula Fóra
 

What's hot (20)

Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for ...
 
Going Reactive with Spring 5
Going Reactive with Spring 5Going Reactive with Spring 5
Going Reactive with Spring 5
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL Engine
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
Reactive Spring 5
Reactive Spring 5Reactive Spring 5
Reactive Spring 5
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
From Big to Fast Data. How #kafka and #kafka-connect can redefine you ETL and...
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 

Similar to Introducing Arc: A Common Intermediate Language for Unified Batch and Stream Analytics - Max Meldrum & Klas Segeljakt, KTH

Programming Languages: some news for the last N years
Programming Languages: some news for the last N yearsProgramming Languages: some news for the last N years
Programming Languages: some news for the last N years
Ruslan Shevchenko
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
 
ElixirでFPGAを設計する
ElixirでFPGAを設計するElixirでFPGAを設計する
ElixirでFPGAを設計する
Hideki Takase
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Flink Forward
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
Mohsen Zainalpour
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
Miklos Christine
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Spark Summit
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Databricks
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
ESUG
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
Michael Zhang
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
Databricks
 
Introduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IOIntroduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IO
Liran Zvibel
 
Preparing for Scala 3
Preparing for Scala 3Preparing for Scala 3
Preparing for Scala 3
Martin Odersky
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
Databricks
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
Spark Summit
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...
bhargavi804095
 
Writing DSL with Applicative Functors
Writing DSL with Applicative FunctorsWriting DSL with Applicative Functors
Writing DSL with Applicative Functors
David Galichet
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
PivotalOpenSourceHub
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
Flavio W. Brasil
 

Similar to Introducing Arc: A Common Intermediate Language for Unified Batch and Stream Analytics - Max Meldrum & Klas Segeljakt, KTH (20)

Programming Languages: some news for the last N years
Programming Languages: some news for the last N yearsProgramming Languages: some news for the last N years
Programming Languages: some news for the last N years
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
ElixirでFPGAを設計する
ElixirでFPGAを設計するElixirでFPGAを設計する
ElixirでFPGAを設計する
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
 
Introduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IOIntroduction to D programming language at Weka.IO
Introduction to D programming language at Weka.IO
 
Preparing for Scala 3
Preparing for Scala 3Preparing for Scala 3
Preparing for Scala 3
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...Graphs in data structures are non-linear data structures made up of a finite ...
Graphs in data structures are non-linear data structures made up of a finite ...
 
Writing DSL with Applicative Functors
Writing DSL with Applicative FunctorsWriting DSL with Applicative Functors
Writing DSL with Applicative Functors
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

Introducing Arc: A Common Intermediate Language for Unified Batch and Stream Analytics - Max Meldrum & Klas Segeljakt, KTH

  • 1. Introducing Arc: A Common Intermediate Language for Unified Batch and Stream Analytics. Klas Segeljakt, Max Meldrum. @FlinkForward
  • 2. Outline • Project Introduction • The Arc Intermediate Representation (IR) • Arc Examples • Arc + Flink Integration? • Conclusions
  • 5. The Big Picture • Flink plays an important role in the data science landscape • Combining Flink with other frameworks can lead to interesting applications • However, there is a language barrier
  • 12. Intuition • With separate frameworks f1, f2, f3, no cross-optimisation is possible, e.g. resource sharing • Data movement costs [Figure: Performance against #Frameworks; f1, f2 and f3 each expressed in IR, then combined as f1 + f2 + f3 in a single IR]
  • 16. The Arc IR • High-level: Streams, Tables, Linear algebra • Arc sits between the high-level and low-level layers • Low-level: Runners, Hardware • Abstractions: Pipelines (Operators/Sources/Sinks), User-defined Windows, Out-of-Order Processing, ... • Optimisations: Compiler (partial evaluation, ...), Dataflow (operator fusion, fission, reordering, ...)
  • 18. Compiler Pipeline • Stages shown: Frontends, Arc (High Level IR), Logical Dataflow IR, Physical Dataflow IR, Binaries • Flink appears both as a frontend (Flink Frontend) and as a backend (Flink Backend)
  • 25. What is Arc? A restrictive language for describing batch and stream transformations. Transformations are modelled through: • Values: Read-only data types (e.g. Vec[T], Stream[T], i8..i64) • Builders: Write-only data types (e.g. Appender[T]) • Values are written to builders, and builders are lazily materialised back into values ➡ Dependencies between values and builders form a dataflow graph
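
The value/builder split on this slide can be sketched in plain Python. This is not Arc syntax and not an Arc API: `Appender`, `merge`, `result` and `for_each` are hypothetical stand-ins named after the operations that appear in the Arc examples in this deck, and the builder here is materialised on request rather than lazily by a compiler.

```python
from typing import Callable, Generic, Iterable, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Appender(Generic[T]):
    """Hypothetical write-only builder: it can be merged into, not read."""

    def __init__(self, items: Tuple[T, ...] = ()) -> None:
        self._items = items

    def merge(self, value: T) -> "Appender[T]":
        # Merging returns a new builder; the old one is left untouched.
        return Appender(self._items + (value,))

    def result(self) -> Tuple[T, ...]:
        # Materialise the builder back into a read-only value.
        return self._items


def for_each(values: Iterable[T], builder: Appender[U],
             body: Callable[[Appender[U], T], Appender[U]]) -> Appender[U]:
    """Thread a builder through every element of a read-only value."""
    for v in values:
        builder = body(builder, v)
    return builder


source = (1, 2, 3)  # a read-only value
mapped = for_each(source, Appender(), lambda out, v: out.merge(v + 5)).result()
print(mapped)  # (6, 7, 8)
```

The only way data leaves `source` is through a builder, and the only way it becomes readable again is through `result`, so the chain of `for_each` and `result` calls is exactly the dependency structure the slide calls a dataflow graph.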
  • 27. Arc Example. Dataflow: source → map(v+5) → filter(v%2==0) → evenSink, and map(v+5) → filter(v%2!=0) → oddSink. Arc code:
      |source:Stream[i32], evenSink:StreamAppender[i32], oddSink:StreamAppender[i32]|
      let mapped = result(for(source, StreamAppender[i32], |out, v| merge(out, v + 5)));
      for(mapped, evenSink, |out, v| if (v % 2 == 0, merge(out, v), out));
      for(mapped, oddSink, |out, v| if (v % 2 != 0, merge(out, v), out))
  • 30. Arc Example (Fused). Dataflow: source → [map(v+5) then if(x%2==0)] → evenSink or oddSink. Unfused: the program from slide 27. Fused:
      |source:Stream[i32], evenSink:StreamAppender[i32], oddSink:StreamAppender[i32]|
      for(source, {evenSink,oddSink}, |out, v|
        let x = v + 5;
        if (x % 2 == 0, {merge(out.$1, x), out.$2}, {out.$1, merge(out.$2, x)}))
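
A plain-Python sketch of why the fused program is equivalent to the unfused one. Lists stand in for the `StreamAppender[i32]` sinks, and the function names `unfused` and `fused` are illustrative assumptions, not part of Arc.

```python
from typing import Iterable, List, Tuple


def unfused(source: Iterable[int]) -> Tuple[List[int], List[int]]:
    # Pass 1: map(v + 5), materialising an intermediate collection.
    mapped = [v + 5 for v in source]
    # Passes 2 and 3: one filter per sink, each re-reading `mapped`.
    even_sink = [v for v in mapped if v % 2 == 0]
    odd_sink = [v for v in mapped if v % 2 != 0]
    return even_sink, odd_sink


def fused(source: Iterable[int]) -> Tuple[List[int], List[int]]:
    even_sink: List[int] = []
    odd_sink: List[int] = []
    # Single pass: the map and both filters run per element, no intermediate.
    for v in source:
        x = v + 5
        if x % 2 == 0:
            even_sink.append(x)
        else:
            odd_sink.append(x)
    return even_sink, odd_sink


data = list(range(6))
assert unfused(data) == fused(data)
print(fused(data))  # ([6, 8, 10], [5, 7, 9])
```

The fused version reads the source once and never builds `mapped`, which is the saving the slide's single `for` over `{evenSink,oddSink}` expresses.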
  • 35. Arc + Flink? Benefits: • Enable stronger optimisations • Use your other favourite libraries together with Flink • Make life easier for data scientists
  • 37. The black box problem • UDFs are black boxes: in stream.map(...).filter(...).reduce(...), each UDF argument is drawn as a black box ➡ Flink is unaware of what is being executed inside of each black box
  • 42. Fusion Levels (legend: one box = one Flink Task ~ one Thread) • 1. No Fusion: four x + 1 maps, each in its own task • 2. Task Fusion: the four x + 1 maps share one task • 3. Invocation-level Fusion: x + 1 invoked in a for-loop, 4X • 4. Instruction-level Fusion: a single x + 4
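
The levels can be illustrated with a single-threaded Python sketch. It is only an analogy: the function names are assumptions, "no fusion" is modelled as one full pass per map with a materialised intermediate, and task fusion (level 2) is left out because it concerns how operators are placed into tasks and threads, which this sketch does not model.

```python
from typing import List

N_MAPS = 4


def add_one(x: int) -> int:
    return x + 1


def no_fusion(data: List[int]) -> List[int]:
    # Each map is its own pass and materialises its output
    # (a stand-in for handing records from one task to the next).
    for _ in range(N_MAPS):
        data = [add_one(x) for x in data]
    return data


def invocation_fusion(data: List[int]) -> List[int]:
    # One pass over the data; the four invocations run back to back per element.
    out = []
    for x in data:
        for _ in range(N_MAPS):
            x = add_one(x)
        out.append(x)
    return out


def instruction_fusion(data: List[int]) -> List[int]:
    # The four additions are folded into one: x + 1 + 1 + 1 + 1 == x + 4.
    return [x + N_MAPS for x in data]


sample = [0, 10, -3]
assert no_fusion(sample) == invocation_fusion(sample) == instruction_fusion(sample)
print(instruction_fusion(sample))  # [4, 14, 1]
```

The last step is only possible when the optimiser can see that each map is "add a constant", which is what an IR provides and an opaque UDF does not.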
  • 43. Experiment Results [Figure: 50 maps on 10M elements. Execution time in seconds on a log scale from 10^0 to 10^3 (lower is better), by optimisation level: None, Task (Flink), Invocation, Instruction]
  • 49. Example Frontend Code (Pandas + Beam):
      import arc.beam as beam
      import arc.beam.transforms.window as window
      import arc.beam.transforms.combiners as combiners
      import arc.pandas as pandas

      def normalise(elements):
          series = pandas.Series(elements)
          avg = series.sum() / series.count()
          return series / avg

      p = beam.Pipeline()
      (p | beam.io.ReadFromText(path='input.txt').with_output_types(int)
         | beam.WindowInto(window.FixedWindows(size=5))
         | beam.CombineGlobally(normalise)
         | combiners.ToList()
         | beam.io.WriteToText(path='output.txt'))
      p.run()
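
What the pipeline computes can be approximated without either library. The sketch below is plain Python, not the `arc.beam` or `arc.pandas` frontends. It assumes each element arrives as a hypothetical (timestamp, value) pair and that a fixed window of size 5 covers timestamps [5k, 5k + 5), which the slide does not spell out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalise(elements: List[int]) -> List[float]:
    # Same arithmetic as the slide's normalise(): divide by the mean.
    avg = sum(elements) / len(elements)
    return [e / avg for e in elements]


def fixed_windows(events: Iterable[Tuple[int, int]], size: int) -> Dict[int, List[int]]:
    # Assumption: events are (timestamp, value) pairs; window k covers
    # timestamps in [k * size, (k + 1) * size).
    windows: Dict[int, List[int]] = defaultdict(list)
    for ts, value in events:
        windows[ts // size].append(value)
    return dict(windows)


events = [(0, 2), (1, 4), (4, 6), (5, 10), (9, 30)]
result = {k: normalise(vs) for k, vs in fixed_windows(events, size=5).items()}
print(result)  # {0: [0.5, 1.0, 1.5], 1: [0.5, 1.5]}
```

A window whose values sum to zero would raise `ZeroDivisionError` here; the sketch leaves that case unhandled.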
  • 51. Arcon: Native Arc Runner • Flink-inspired dataflow engine built in Rust • Goals: Common runtime for Arc applications; Support dynamic task execution; First-class support for hardware acceleration
  • 52. Conclusions • Arc is an IR for batch and stream programming. • By raising the level of abstraction, Arc is able to optimise both the dataflow and the code within it. Arc and experiments can be found at https://github.com/cda-group Contact info: klasseg@kth.se & mmeldrum@kth.se
  • 53. References. Publications: • Kroll, L., Segeljakt, K., Carbone, P., Schulte, C. and Haridi, S., 2019, June. Arc: an IR for batch and stream programming. In Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages (pp. 53-58). ACM. • Meldrum, M., Segeljakt, K., Kroll, L., Carbone, P., Schulte, C. and Haridi, S., 2019, August. Arcon: Continuous and Deep Data Stream Analytics. In Proceedings of Real-Time Business Intelligence and Analytics (p. 3). ACM.