Introduction to Stratosphere
Aljoscha Krettek
DIMA / TU Berlin
What is this?
● Distributed data processing system
● DAG (directed acyclic graph) of sources, sinks, and operators: “data flow”
● Handles distribution, fault tolerance, network transfer

[Figure: example data flow: source, then map: “split words”, then reduce: “count words”, then sink]

Why would I use this?
Automatic parallelization / Because you are told to
[Figure: the same data flow running as three parallel instances of each of source, map: “split words”, reduce: “count words”, and sink]

So how do I use this?
(from Java)
● How is data represented in the system?
● How do I create data flows?
● Which types of operators are there?
● How do I write operators?
● How do I run the whole shebang?

How do I move my data?
● Data is stored in fields in PactRecord
● Basic data types: PactString, PactInteger, PactDouble, PactFloat, PactBoolean, …
● New data types must implement the Value interface

PactRecord
PactRecord rec = ...

// Read field 0 as a PactInteger and unwrap it
PactInteger foo = rec.getField(0, PactInteger.class);
int i = foo.getValue();

// Wrap a value and store it in field 1
PactInteger foo2 = new PactInteger(3);
rec.setField(1, foo2);

Creating Data Flows
● Create one or several sources
● Create operators:
  – Input is/are preceding operator(s)
  – Specify a class/object with the operator implementation
● Create one or several sinks:
  – Input is some operator

WordCount Example Data Flow
FileDataSource source = new FileDataSource(TextInputFormat.class, dataInput, "Input Lines");

MapContract mapper = MapContract.builder(TokenizeLine.class)
    .input(source)
    .name("Tokenize Lines")
    .build();

ReduceContract reducer = ReduceContract.builder(CountWords.class, PactString.class, 0)
    .input(mapper)
    .name("Count Words")
    .build();

FileDataSink out = new FileDataSink(RecordOutputFormat.class, output, reducer, "Word Counts");
RecordOutputFormat.configureRecordFormat(out)
    .recordDelimiter('\n')
    .fieldDelimiter(' ')
    .field(PactString.class, 0)
    .field(PactInteger.class, 1);

Plan plan = new Plan(out, "WordCount Example");

Operator Types
● We call them second order functions (SOF)
● Code inside the operator is the first order function or user defined function (UDF)
● Currently five SOFs: map, reduce, match, cogroup, cross
● SOF describes how PactRecords are handed to the UDF

Map Operator
● User code receives one record at a time (per call to user code function)
● Not really a functional map since all operators can output an arbitrary number of records

Map Operator Example
public static class TokenizeLine extends MapStub {
    private final AsciiUtils.WhitespaceTokenizer tokenizer =
        new AsciiUtils.WhitespaceTokenizer();
    private final PactRecord outputRecord = new PactRecord();
    private final PactString word = new PactString();
    private final PactInteger one = new PactInteger(1);

    @Override
    public void map(PactRecord record, Collector<PactRecord> collector) {
        PactString line = record.getField(0, PactString.class);
        this.tokenizer.setStringToTokenize(line);
        while (tokenizer.next(word)) {
            outputRecord.setField(0, word);
            outputRecord.setField(1, one);
            collector.collect(outputRecord);
        }
    }
}

Reduce Operator
● User code receives a group of records with same key
● Must specify which fields of a record are the key

Reduce Operator Example
public static class CountWords extends ReduceStub {
    private final PactInteger cnt = new PactInteger();

    @Override
    public void reduce(Iterator<PactRecord> records, Collector<PactRecord> out)
            throws Exception {
        PactRecord element = null;
        int sum = 0;
        while (records.hasNext()) {
            element = records.next();
            PactInteger i = element.getField(1, PactInteger.class);
            sum += i.getValue();
        }
        cnt.setValue(sum);
        element.setField(1, cnt);
        out.collect(element);
    }
}

Specifying the Key Fields
ReduceContract reducer =
    ReduceContract.builder(
            Foo.class,
            PactString.class, 0)
        .input(mapper)
        .keyField(PactInteger.class, 1)
        .name("Count Words")
        .build();

Cross Operator
● Two-input operator
● Cartesian product: every record from left combined with every record from right
● One record from left, one record from right per user code call
● Implement CrossStub

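The Cartesian-product behaviour of cross can be sketched in plain Java. This is only a model of the semantics on in-memory lists, not the Stratosphere API: the class `CrossSketch`, the method `cross`, and the use of `BiFunction` in place of a CrossStub are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

public class CrossSketch {
    // Hypothetical helper: calls userCode once for every (left, right) pair.
    public static <L, R, O> List<O> cross(List<L> left, List<R> right,
                                          BiFunction<L, R, O> userCode) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                out.add(userCode.apply(l, r)); // one left and one right record per call
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pairs = cross(Arrays.asList("a", "b"), Arrays.asList(1, 2, 3),
                                   (l, r) -> l + r);
        System.out.println(pairs); // prints [a1, a2, a3, b1, b2, b3]
    }
}
```

Two left records and three right records yield six user code calls, which is why cross gets expensive quickly on large inputs.
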
Match Operator
● Two-input operator with keys
● Join: record from left combined with every record from right with same key
● Implement MatchStub

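The join behaviour of match can be modelled in plain Java as follows. This is a sketch of the semantics only and does not use the Stratosphere API: `MatchSketch`, `match`, and the "key:value" string encoding of records are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class MatchSketch {
    // Hypothetical helper: calls userCode once per pair of records with equal keys.
    public static <L, R, K, O> List<O> match(List<L> left, List<R> right,
                                             Function<L, K> leftKey, Function<R, K> rightKey,
                                             BiFunction<L, R, O> userCode) {
        // Index the right input by key so each left record finds its partners.
        Map<K, List<R>> rightByKey = new HashMap<>();
        for (R r : right) {
            rightByKey.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<O> out = new ArrayList<>();
        for (L l : left) {
            List<R> partners = rightByKey.getOrDefault(leftKey.apply(l), Collections.emptyList());
            for (R r : partners) {
                out.add(userCode.apply(l, r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Records are encoded as "key:value" strings for brevity.
        Function<String, String> key = s -> s.substring(0, s.indexOf(':'));
        Function<String, String> value = s -> s.substring(s.indexOf(':') + 1);
        List<String> joined = match(
            Arrays.asList("1:apple", "2:pear", "3:plum"),
            Arrays.asList("1:red", "1:green", "3:purple"),
            key, key,
            (l, r) -> value.apply(l) + " is " + value.apply(r));
        System.out.println(joined); // prints [apple is red, apple is green, plum is purple]
    }
}
```

The record with key 2 has no partner on the right and therefore never reaches the user code.
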
CoGroup Operator
● Two-input operator with keys
● Records from left combined with all records from right with same key
● User code gets an iterator for left and right records
● Implement CoGroupStub

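The difference to match is that the user code sees whole groups rather than single pairs. A plain Java sketch of that idea follows. It is not the Stratosphere API: `CoGroupSketch` and `coGroup` are hypothetical names, lists stand in for the iterators, and the sketch assumes that a key present on only one side is still handed to the user code with an empty group for the other side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

public class CoGroupSketch {
    // Hypothetical helper: calls userCode once per key with both groups for that key.
    public static <L, R, K, O> List<O> coGroup(List<L> left, List<R> right,
                                               Function<L, K> leftKey, Function<R, K> rightKey,
                                               BiFunction<List<L>, List<R>, O> userCode) {
        Map<K, List<L>> leftGroups = new LinkedHashMap<>();
        for (L l : left) {
            leftGroups.computeIfAbsent(leftKey.apply(l), k -> new ArrayList<>()).add(l);
        }
        Map<K, List<R>> rightGroups = new LinkedHashMap<>();
        for (R r : right) {
            rightGroups.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        // Union of keys, left keys first, in first-seen order.
        Set<K> keys = new LinkedHashSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());

        List<O> out = new ArrayList<>();
        for (K key : keys) {
            out.add(userCode.apply(
                leftGroups.getOrDefault(key, Collections.emptyList()),
                rightGroups.getOrDefault(key, Collections.emptyList())));
        }
        return out;
    }

    public static void main(String[] args) {
        Function<String, String> key = s -> s.substring(0, s.indexOf(':'));
        List<String> sizes = coGroup(
            Arrays.asList("a:1", "a:2", "b:3"),
            Arrays.asList("a:x", "c:y"),
            key, key,
            (ls, rs) -> ls.size() + "/" + rs.size());
        System.out.println(sizes); // prints [2/1, 1/0, 0/1]
    }
}
```
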
How to execute a data flow plan
● Either use LocalExecutor:
  LocalExecutor.execute(plan)
● Or implement PlanAssembler.getPlan(String... args) and run on a local cluster or proper cluster
● See: http://stratosphere.eu/quickstart/ and http://stratosphere.eu/docs/gettingstarted.html

Getting Started

https://github.com/stratosphere/stratosphere
https://github.com/stratosphere/stratosphere-quickstart

And Now for Something Completely Different
val input = TextFile(textInput)

val words = input
  .flatMap { _.split(" ") map { (_, 1) } }

val counts = words
  .groupBy { case (word, _) => word }
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }

val output = counts
  .write(wordsOutput, CsvOutputFormat())

val plan = new ScalaPlan(Seq(output))

(Very) Short Introduction to Scala

Anatomy of a Scala Class
package foo.bar
import something.else

class Job(arg1: Int) {
  def map(in: Int): String = {
    val i: Int = in + 2
    var a = "Hello"
    i.toString
  }
}

Singletons
● Similar to Java singletons and/or static methods

object Job {
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }
}

Collections
val a = Seq(1, 2, 4)
List("Hallo", 2)
Array(2, 3)
Map(1 -> "1", 2 -> "2")

val b = a map { x => x + 2 }
val c = a map { _ + 2 }
val c = a.map({ _ + 2 })

Generics and Tuples
val a: Seq[Int] = Seq(1, 2, 4)
val tup = (3, "a")
val tup2: (Int, String) = (3, "a")

Stratosphere Scala Front End

Skeleton of a Stratosphere Program
● Input: a text file/JDBC source/CSV, etc.
  – loaded in internal representation: the DataSet
● Transformations on the DataSet
  – map, reduce, join, etc.
● Output: program results in a DataSink
  – Text file, JDBC, CSV, etc.

The Almighty DataSet
● Operations are methods on DataSet[A]
● Working with DataSet[A] feels like working with Scala collections
● DataSet[A] is not an actual collection but represents computation on a collection
● Stringing together operations creates a data flow graph that can be executed

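The idea of an object that represents a computation rather than holding data can be sketched in plain Java. This is a toy model, not how Stratosphere implements DataSet: the names `LazyDataSetSketch`, `Lazy`, `source`, and `execute` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class LazyDataSetSketch {
    // Hypothetical stand-in for DataSet[A]: stores a recipe, not the records.
    public static final class Lazy<A> {
        private final Supplier<List<A>> recipe;

        private Lazy(Supplier<List<A>> recipe) {
            this.recipe = recipe;
        }

        public static <A> Lazy<A> source(List<A> data) {
            return new Lazy<A>(() -> data);
        }

        // Records the map step; f is not called until execute() runs.
        public <B> Lazy<B> map(Function<A, B> f) {
            return new Lazy<B>(() -> {
                List<B> out = new ArrayList<>();
                for (A a : recipe.get()) {
                    out.add(f.apply(a));
                }
                return out;
            });
        }

        // Runs the whole chain of recorded steps.
        public List<A> execute() {
            return recipe.get();
        }
    }

    public static void main(String[] args) {
        int[] calls = {0}; // counts how often the user function actually runs
        Lazy<Integer> lengths = Lazy.source(Arrays.asList("to", "be"))
            .map(s -> { calls[0]++; return s.length(); });
        System.out.println(calls[0]);          // prints 0: nothing has run yet
        System.out.println(lengths.execute()); // prints [2, 2]
        System.out.println(calls[0]);          // prints 2
    }
}
```

Calling `map` only extends the recipe; the user function runs when `execute` is called, which mirrors the way a data flow graph is built first and executed later.
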
An Important Difference
Immediately executed (Scala collection):
  val input: List[String] = ...
  val mapped = input.map { s => (s, 1) }

Executed when data flow is executed (DataSet):
  val input: DataSet[String] = ...
  val mapped = input.map { s => (s, 1) }
  val result = mapped.write("file", ...)
  val plan = new Plan(result)
  execute(plan)

Usable Data Types
● Primitive types
● Tuples
● Case classes
● Custom data types that implement the Value interface

Creating Data Sources
val input = TextFile("file://")

val input: DataSet[(Int, String)] =
  DataSource("hdfs://",
    CsvInputFormat[(Int, String)]())

def parseInput(line: String): (Int, Int) = {…}
val input = DataSource("hdfs://",
  DelimitedInputFormat(parseInput))

Interlude: Anonymous Functions
var fun: ((Int, String)) => String = ...
fun = { t => t._2 }
fun = { _._2 }
fun = { case (i, w) => w }

Map
val input: DataSet[(Int, String)] = ...

val mapper = input
  .map { case (a, b) => (a + 2, b) }

val mapper2 = input
  .flatMap { _._2.split(" ") }

val filtered = input
  .filter { case (a, b) => a > 3 }

Reduce
val input: DataSet[(String, Int)] = ...

val reducer = input
  .groupBy { case (w, _) => w }
  .groupReduce { _.minBy {...} }

val reducer2 = input
  .groupBy { case (w, _) => w }
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }

Cross
val left: DataSet[(String, Int)] = ...
val right: DataSet[(String, Int)] = ...

val cross = (left cross right)
  .map { (l, r) => ... }

val cross = (left cross right)
  .flatMap { (l, r) => ... }

Join (Match)
val counts: DataSet[(String, Int)] = ...
val names: DataSet[(Int, String)] = ...

val join = counts
  .join(names)
  .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
  .map { (l, r) => (l._1, r._2) }

val join = counts
  .join(names)
  .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
  .flatMap { (l, r) => ... }

CoGroup
val counts: DataSet[(String, Int)] = ...
val names: DataSet[(Int, String)] = ...

val cogroup = counts
  .cogroup(names)
  .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
  .map { (l, r) => (l.minBy {...}, r.minBy {...}) }

val cogroup = counts
  .cogroup(names)
  .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
  .flatMap { (l, r) => ... }

Creating Data Sinks
val counts: DataSet[(String, Int)] = ...

val sink = counts.write("<>", CsvOutputFormat())

def formatOutput(a: (String, Int)): String = {
  "Word " + a._1 + " count " + a._2
}
val sink = counts.write("<>",
  DelimitedOutputFormat(formatOutput))

Word Count example
val input = TextFile(textInput)

val words = input
  .flatMap { _.split(" ") map { (_, 1) } }

val counts = words
  .groupBy { case (word, _) => word }
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }

val output = counts
  .write(wordsOutput, CsvOutputFormat())

val plan = new ScalaPlan(Seq(output))

Things not mentioned
● There is support for iterations (both in Java and Scala)
● Many more data source/sink formats
● Look at the examples in the Stratosphere source
● Don't be afraid to write on the mailing list and on GitHub:
  – http://stratosphere.eu/quickstart/scala.html
● Or come directly to us

End.

More Related Content

What's hot

Meet scala
Meet scalaMeet scala
Meet scala
Wojciech Pituła
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
Spark Summit
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
Flink Forward
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
Namgee Lee
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Spark Summit
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
CloudxLab
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big Data
Leonardo Gamas
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
Flink Forward
 
Pivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew Ray
Spark Summit
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
Julian Hyde
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Cloudera, Inc.
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
Jeff Patti
 
Spark schema for free with David Szakallas
Spark schema for free with David SzakallasSpark schema for free with David Szakallas
Spark schema for free with David Szakallas
Databricks
 
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
An Introduction to Higher Order Functions in Spark SQL with Herman van HovellAn Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
Databricks
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
Sigmoid
 
Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013
Samir Bessalah
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
Kelly Technologies
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
Jen Aman
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
Databricks
 

What's hot (20)

Meet scala
Meet scalaMeet scala
Meet scala
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
 
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
Spark: Taming Big Data
Spark: Taming Big DataSpark: Taming Big Data
Spark: Taming Big Data
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
Pivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew RayPivoting Data with SparkSQL by Andrew Ray
Pivoting Data with SparkSQL by Andrew Ray
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
 
Spark schema for free with David Szakallas
Spark schema for free with David SzakallasSpark schema for free with David Szakallas
Spark schema for free with David Szakallas
 
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
An Introduction to Higher Order Functions in Spark SQL with Herman van HovellAn Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
An Introduction to Higher Order Functions in Spark SQL with Herman van Hovell
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
 
Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 

Viewers also liked

Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
stratosphere_eu
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?
Edureka!
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
sfdatascience
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st century
Frank Kienle
 
Data scientist the sexiest job of the 21st century (article review presentation)
Data scientist the sexiest job of the 21st century (article review presentation)Data scientist the sexiest job of the 21st century (article review presentation)
Data scientist the sexiest job of the 21st century (article review presentation)
chaithu reddy
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!
Edureka!
 
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Greg Farrenkopf
 

Viewers also liked (7)

Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
Dr. Kostas Tzoumas: Big Data Looks Tiny From Stratosphere at Big Data Beers (...
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st century
 
Data scientist the sexiest job of the 21st century (article review presentation)
Data scientist the sexiest job of the 21st century (article review presentation)Data scientist the sexiest job of the 21st century (article review presentation)
Data scientist the sexiest job of the 21st century (article review presentation)
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!
 
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
Take Aways from "Data Scientist: The Sexiest Job of the 21st Century"
 

Similar to Stratosphere Intro (Java and Scala Interface)

User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
Databricks
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
Databricks
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
Wojciech Pituła
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
Javier Santos Paniego
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35
Bilal Ahmed
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
Khaled Al-Shamaa
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
Albert Bifet
 
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Spark Summit
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
Petr Zapletal
 
Legacy lambda code
Legacy lambda codeLegacy lambda code
Legacy lambda code
Peter Lawrey
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
Petr Zapletal
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Databricks
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Sages
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming language
Lincoln Hannah
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020
InfluxData
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJS
Kyung Yeol Kim
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2go
Moriyoshi Koizumi
 
R basics
R basicsR basics
R basics
Sagun Baijal
 
Matlab1
Matlab1Matlab1
Matlab1
guest8ba004
 

Similar to Stratosphere Intro (Java and Scala Interface) (20)

User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
 
CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35CS101- Introduction to Computing- Lecture 35
CS101- Introduction to Computing- Lecture 35
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Legacy lambda code
Legacy lambda codeLegacy lambda code
Legacy lambda code
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming language
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJS
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2go
 
R basics
R basicsR basics
R basics
 
Matlab1
Matlab1Matlab1
Matlab1
 

More from Robert Metzger

How to Contribute to Apache Flink (and Flink at the Apache Software Foundation)
How to Contribute to Apache Flink (and Flink at the Apache Software Foundation)How to Contribute to Apache Flink (and Flink at the Apache Software Foundation)
How to Contribute to Apache Flink (and Flink at the Apache Software Foundation)
Robert Metzger
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
Robert Metzger
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya MeetupApache Flink @ Tel Aviv / Herzliya Meetup
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin Meetup
Robert Metzger
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
Robert Metzger
 
Community Update May 2016 (January - May) | Berlin Apache Flink Meetup
Community Update May 2016 (January - May) | Berlin Apache Flink MeetupCommunity Update May 2016 (January - May) | Berlin Apache Flink Meetup
Community Update May 2016 (January - May) | Berlin Apache Flink Meetup
Robert Metzger
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
Robert Metzger
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
Robert Metzger
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016
Robert Metzger
 
Flink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in ReviewFlink Community Update December 2015: Year in Review
Flink Community Update December 2015: Year in Review




Stratosphere Intro (Java and Scala Interface)

  • 1. Introduction to Stratosphere Aljoscha Krettek DIMA / TU Berlin
  • 2. What is this?
       ● Distributed data processing system
       ● DAG (directed acyclic graph) of sources, sinks, and operators: “data flow”
       ● Handles distribution, fault tolerance, network transfer
       Diagram: source → map: “split words” → reduce: “count words” → sink
  • 3. Why would I use this?
       Automatic parallelization / Because you are told to
       Diagram: three parallel instances each of source, map: “split words”, reduce: “count words”, and sink
  • 4. So how do I use this? (from Java)
       ● How is data represented in the system?
       ● How do I create data flows?
       ● Which types of operators are there?
       ● How do I write operators?
       ● How do I run the whole shebang?
  • 5. How do I move my data?
       ● Data is stored in fields in PactRecord
       ● Basic data types: PactString, PactInteger, PactDouble, PactFloat, PactBoolean, …
       ● New data types must implement the Value interface
  • 6. PactRecord
       PactRecord rec = ...;
       PactInteger foo = rec.getField(0, PactInteger.class);
       int i = foo.getValue();
       PactInteger foo2 = new PactInteger(3);
       rec.setField(1, foo2);
  • 7. Creating Data Flows
       ● Create one or several sources
       ● Create operators:
         – Input is/are preceding operator(s)
         – Specify a class/object with the operator implementation
       ● Create one or several sinks:
         – Input is some operator
  • 8. WordCount Example Data Flow
       FileDataSource source = new FileDataSource(TextInputFormat.class, dataInput, "Input Lines");
       MapContract mapper = MapContract.builder(TokenizeLine.class)
           .input(source)
           .name("Tokenize Lines")
           .build();
       ReduceContract reducer = ReduceContract.builder(CountWords.class, PactString.class, 0)
           .input(mapper)
           .name("Count Words")
           .build();
       FileDataSink out = new FileDataSink(RecordOutputFormat.class, output, reducer, "Word Counts");
       RecordOutputFormat.configureRecordFormat(out)
           .recordDelimiter('\n')
           .fieldDelimiter(' ')
           .field(PactString.class, 0)
           .field(PactInteger.class, 1);
       Plan plan = new Plan(out, "WordCount Example");
  • 9. Operator Types
       ● We call them second-order functions (SOFs)
       ● Code inside the operator is the first-order function or user-defined function (UDF)
       ● Currently five SOFs: map, reduce, match, cogroup, cross
       ● The SOF describes how PactRecords are handed to the UDF
  • 10. Map Operator
       ● User code receives one record at a time (per call to the user code function)
       ● Not really a functional map, since all operators can output an arbitrary number of records
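The point that a map operator may emit any number of records per call can be sketched in plain Java. This is only a model of the contract, not Stratosphere code: `MapOperatorSketch`, `runMap`, `tokenize`, and the use of `Consumer` as a stand-in for the collector are all names assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Plain-Java model of the map contract; no Stratosphere classes are used.
class MapOperatorSketch {

    // Hypothetical stand-in for the runtime: calls the user function once per
    // input record and gathers everything the function hands to the collector.
    static <I, O> List<O> runMap(List<I> input, BiConsumer<I, Consumer<O>> userFunction) {
        List<O> output = new ArrayList<>();
        for (I record : input) {
            userFunction.accept(record, output::add);
        }
        return output;
    }

    // User function: emits one record per whitespace-separated token, so a
    // single input line can produce zero, one, or many output records.
    static void tokenize(String line, Consumer<String> collector) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                collector.accept(token);
            }
        }
    }

    public static void main(String[] args) {
        List<String> words = runMap(List.of("to be or", "", "not"), MapOperatorSketch::tokenize);
        System.out.println(words); // prints [to, be, or, not]
    }
}
```

Three input lines produce four output records here, and the empty line produces none, which is why the slide calls this "not really a functional map".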
  • 11. Map Operator Example
       public static class TokenizeLine extends MapStub {
           private final AsciiUtils.WhitespaceTokenizer tokenizer = new AsciiUtils.WhitespaceTokenizer();
           private final PactRecord outputRecord = new PactRecord();
           private final PactString word = new PactString();
           private final PactInteger one = new PactInteger(1);

           @Override
           public void map(PactRecord record, Collector<PactRecord> collector) {
               PactString line = record.getField(0, PactString.class);
               this.tokenizer.setStringToTokenize(line);
               while (tokenizer.next(word)) {
                   outputRecord.setField(0, word);
                   outputRecord.setField(1, one);
                   collector.collect(outputRecord);
               }
           }
       }
  • 12. Reduce Operator
       ● User code receives a group of records with the same key
       ● Must specify which fields of a record are the key
  • 13. Reduce Operator Example
       public static class CountWords extends ReduceStub {
           private final PactInteger cnt = new PactInteger();

           @Override
           public void reduce(Iterator<PactRecord> records, Collector<PactRecord> out) throws Exception {
               PactRecord element = null;
               int sum = 0;
               while (records.hasNext()) {
                   element = records.next();
                   PactInteger i = element.getField(1, PactInteger.class);
                   sum += i.getValue();
               }
               cnt.setValue(sum);
               element.setField(1, cnt);
               out.collect(element);
           }
       }
  • 14. Specifying the Key Fields
       ReduceContract reducer = ReduceContract.builder(Foo.class, PactString.class, 0)
           .input(mapper)
           .keyField(PactInteger.class, 1)
           .name("Count Words")
           .build();
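What the reduce contract does with the key can be modelled in a few lines of plain Java. The sketch below does not use the Stratosphere classes: `ReduceOperatorSketch`, `runReduce`, and `countWords` are assumed names, and records are represented as `Map.Entry<String, Integer>` pairs purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of the reduce contract; no Stratosphere classes are used.
class ReduceOperatorSketch {

    // Hypothetical stand-in for the runtime: groups records by the key the
    // caller specifies, then calls the user function once per group.
    static <R, K, O> List<O> runReduce(List<R> input,
                                       Function<R, K> keyOf,
                                       Function<List<R>, O> userFunction) {
        Map<K, List<R>> groups = new LinkedHashMap<>(); // keeps first-seen key order
        for (R record : input) {
            groups.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }
        List<O> output = new ArrayList<>();
        for (List<R> group : groups.values()) {
            output.add(userFunction.apply(group));
        }
        return output;
    }

    // User function in the spirit of CountWords: sums the counts of one group
    // and emits a single (word, total) record.
    static Map.Entry<String, Integer> countWords(List<Map.Entry<String, Integer>> group) {
        int sum = 0;
        for (Map.Entry<String, Integer> record : group) {
            sum += record.getValue();
        }
        return Map.entry(group.get(0).getKey(), sum);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records =
            List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        System.out.println(runReduce(records, r -> r.getKey(), ReduceOperatorSketch::countWords));
        // prints [a=2, b=1]
    }
}
```

The key extractor plays the role that the key field arguments play in `ReduceContract.builder(...)`: it decides which records end up in the same call to the user function.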
  • 15. Cross Operator
       ● Two-input operator
       ● Cartesian product: every record from the left combined with every record from the right
       ● One record from the left and one record from the right per user code call
       ● Implement CrossStub
  • 16. Match Operator
       ● Two-input operator with keys
       ● Join: each record from the left combined with every record from the right that has the same key
       ● Implement MatchStub
  • 17. CoGroup Operator
       ● Two-input operator with keys
       ● Records from the left combined with all records from the right with the same key
       ● User code gets an iterator for the left records and one for the right records
       ● Implement CoGroupStub
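The difference between the three two-input operators is mostly a question of what one call to the user code receives. The plain-Java sketch below models that; it does not use CrossStub, MatchStub, or CoGroupStub, and every name in it (`TwoInputOperatorsSketch`, `cross`, `match`, `coGroup`, `groupByKey`) is assumed for illustration. Calling the co-group function for keys that appear on only one side, with an empty group for the other side, is my understanding of the contract rather than something the slides state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java model of the two-input contracts; no Stratosphere classes are used.
class TwoInputOperatorsSketch {

    // Cross: the user function is called for every (left, right) pair.
    static <L, R, O> List<O> cross(List<L> left, List<R> right, BiFunction<L, R, O> udf) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                out.add(udf.apply(l, r));
            }
        }
        return out;
    }

    // Helper: bucket records by key, keeping first-seen key order.
    static <T, K> Map<K, List<T>> groupByKey(List<T> records, Function<T, K> keyOf) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T record : records) {
            groups.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Match: one (left, right) pair per call, only for pairs with equal keys.
    static <L, R, K, O> List<O> match(List<L> left, Function<L, K> leftKey,
                                      List<R> right, Function<R, K> rightKey,
                                      BiFunction<L, R, O> udf) {
        Map<K, List<R>> rightByKey = groupByKey(right, rightKey);
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : rightByKey.getOrDefault(leftKey.apply(l), List.of())) {
                out.add(udf.apply(l, r));
            }
        }
        return out;
    }

    // CoGroup: one call per key, receiving the whole left and right groups.
    // Assumption: a key missing on one side yields an empty group for that side.
    static <L, R, K, O> List<O> coGroup(List<L> left, Function<L, K> leftKey,
                                        List<R> right, Function<R, K> rightKey,
                                        BiFunction<List<L>, List<R>, O> udf) {
        Map<K, List<L>> leftByKey = groupByKey(left, leftKey);
        Map<K, List<R>> rightByKey = groupByKey(right, rightKey);
        Set<K> keys = new LinkedHashSet<>(leftByKey.keySet());
        keys.addAll(rightByKey.keySet());
        List<O> out = new ArrayList<>();
        for (K key : keys) {
            out.add(udf.apply(leftByKey.getOrDefault(key, List.of()),
                              rightByKey.getOrDefault(key, List.of())));
        }
        return out;
    }
}
```

For example, with left records "apple", "avocado", "banana" keyed by first letter and right records "a", "b", "c" keyed by themselves, `match` calls the user function three times (twice for "a", once for "b"), while `coGroup` calls it once per key, including once for "c" with an empty left group.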
  • 18. How to execute a data flow plan
       ● Either use LocalExecutor: LocalExecutor.execute(plan)
       ● Or implement PlanAssembler.getPlan(String... args) and run on a local cluster or a proper cluster
       ● See: http://stratosphere.eu/quickstart/ and http://stratosphere.eu/docs/gettingstarted.html
  • 20. And Now for Something Completely Different
       val input = TextFile(textInput)
       val words = input
         .flatMap { _.split(" ") map { (_, 1) } }
       val counts = words
         .groupBy { case (word, _) => word }
         .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
       val output = counts
         .write(wordsOutput, CsvOutputFormat())
       val plan = new ScalaPlan(Seq(output))
  • 22. Anatomy of a Scala Class
       package foo.bar
       import something.else
       class Job(arg1: Int) {
         def map(in: Int): String = {
           val i: Int = in + 2
           var a = "Hello"
           i.toString
         }
       }
  • 23. Singletons
       ● Similar to Java singletons and/or static methods
       object Job {
         def main(args: Array[String]) {
           println("Hello World")
         }
       }
  • 24. Collections
       val a = Seq(1, 2, 4)
       List("Hallo", 2)
       Array(2, 3)
       Map(1 -> "1", 2 -> "2")
       val b = a map { x => x + 2 }
       val c = a map { _ + 2 }
       val d = a.map({ _ + 2 })
  • 25. Generics and Tuples
       val a: Seq[Int] = Seq(1, 2, 4)
       val tup = (3, "a")
       val tup2: (Int, String) = (3, "a")
  • 27. Skeleton of a Stratosphere Program
       ● Input: a text file/JDBC source/CSV, etc.
         – Loaded into the internal representation: the DataSet
       ● Transformations on the DataSet
         – map, reduce, join, etc.
       ● Output: program results in a DataSink
         – Text file, JDBC, CSV, etc.
  • 28. The Almighty DataSet
       ● Operations are methods on DataSet[A]
       ● Working with DataSet[A] feels like working with Scala collections
       ● DataSet[A] is not an actual collection but represents computation on a collection
       ● Stringing together operations creates a data flow graph that can be executed
  • 29. An Important Difference
       Immediately executed:
         val input: List[String] = ...
         val mapped = input.map { s => (s, 1) }
       Executed when data flow is executed:
         val input: DataSet[String] = ...
         val mapped = input.map { s => (s, 1) }
         val result = mapped.write("file", ...)
         val plan = new Plan(result)
         execute(plan)
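The difference between a collection and a DataSet, where operations only describe a computation until the plan is executed, can be illustrated with a small deferred-evaluation wrapper in plain Java. This is a toy model under stated assumptions, not how Stratosphere builds or runs plans: `LazyDataSetSketch`, `fromCollection`, `map`, and `execute` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of a lazily evaluated data set; not the Stratosphere DataSet.
class LazyDataSetSketch<A> {

    // The recorded computation; nothing runs until execute() is called.
    private final Supplier<List<A>> computation;

    private LazyDataSetSketch(Supplier<List<A>> computation) {
        this.computation = computation;
    }

    static <A> LazyDataSetSketch<A> fromCollection(List<A> data) {
        return new LazyDataSetSketch<>(() -> data);
    }

    // Records the transformation and returns immediately without applying it.
    <B> LazyDataSetSketch<B> map(Function<A, B> f) {
        return new LazyDataSetSketch<>(() -> {
            List<B> out = new ArrayList<>();
            for (A a : computation.get()) {
                out.add(f.apply(a));
            }
            return out;
        });
    }

    // Runs the whole chain of recorded transformations.
    List<A> execute() {
        return computation.get();
    }

    public static void main(String[] args) {
        LazyDataSetSketch<Integer> lengths =
            LazyDataSetSketch.fromCollection(List.of("a", "bb", "ccc")).map(String::length);
        // Up to this point the mapping function has not been called.
        System.out.println(lengths.execute()); // prints [1, 2, 3]
    }
}
```

Calling `map` on a `java.util.List` stream or a Scala `List` does the work right away; here, as with the DataSet on the slide, the work happens only when the recorded flow is explicitly executed.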
  • 30. Usable Data Types
       ● Primitive types
       ● Tuples
       ● Case classes
       ● Custom data types that implement the Value interface
  • 31. Creating Data Sources
       val input = TextFile("file://")
       val input2: DataSet[(Int, String)] =
         DataSource("hdfs://", CsvInputFormat[(Int, String)]())
       def parseInput(line: String): (Int, Int) = {…}
       val input3 = DataSource("hdfs://", DelimitedInputFormat(parseInput))
  • 32. Interlude: Anonymous Functions
       var fun: ((Int, String)) => String = ...
       fun = { t => t._2 }
       fun = { _._2 }
       fun = { case (i, w) => w }
  • 33. Map
       val input: DataSet[(Int, String)] = ...
       val mapper = input
         .map { case (a, b) => (a + 2, b) }
       val mapper2 = input
         .flatMap { _._2.split(" ") }
       val filtered = input
         .filter { case (a, b) => a > 3 }
  • 34. Reduce
       val input: DataSet[(String, Int)] = ...
       val reducer = input
         .groupBy { case (w, _) => w }
         .groupReduce { _.minBy {...} }
       val reducer2 = input
         .groupBy { case (w, _) => w }
         .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
  • 35. Cross
       val left: DataSet[(String, Int)] = ...
       val right: DataSet[(String, Int)] = ...
       val cross = left cross right
         .map { (l, r) => ... }
       val cross2 = left cross right
         .flatMap { (l, r) => ... }
  • 36. Join (Match)
       val counts: DataSet[(String, Int)] = ...
       val names: DataSet[(Int, String)] = ...
       val join = counts
         .join(names)
         .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
         .map { (l, r) => (l._1, r._2) }
       val join2 = counts
         .join(names)
         .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
         .flatMap { (l, r) => ... }
  • 37. CoGroup
       val counts: DataSet[(String, Int)] = ...
       val names: DataSet[(Int, String)] = ...
       val cogroup = counts
         .cogroup(names)
         .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
         .map { (l, r) => (l.minBy {...}, r.minBy {...}) }
       val cogroup2 = counts
         .cogroup(names)
         .where { case (_, c) => c }.isEqualTo { case (n, _) => n }
         .flatMap { (l, r) => ... }
  • 38. Creating Data Sinks
       val counts: DataSet[(String, Int)] = ...
       val sink = counts.write("<>", CsvOutputFormat())
       def formatOutput(a: (String, Int)): String = {
         "Word " + a._1 + " count " + a._2
       }
       val sink2 = counts.write("<>", DelimitedOutputFormat(formatOutput))
  • 39. Word Count example
       val input = TextFile(textInput)
       val words = input
         .flatMap { _.split(" ") map { (_, 1) } }
       val counts = words
         .groupBy { case (word, _) => word }
         .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
       val output = counts
         .write(wordsOutput, CsvOutputFormat())
       val plan = new ScalaPlan(Seq(output))
  • 40. Things not mentioned
       ● There is support for iterations (both in Java and Scala)
       ● Many more data source/sink formats
       ● Look at the examples in the Stratosphere source
       ● Don't be afraid to write on the mailing list and on GitHub:
         – http://stratosphere.eu/quickstart/scala.html
       ● Or come directly to us
  • 41. End.

Editor's Notes

  1. Google: search results, spam filter. Amazon: recommendations. SoundCloud: recommendations. Spotify: recommendations. YouTube: recommendations, adverts. Netflix: recommendations, compare to Maxdome :D Twitter: just everything … :D Facebook: adverts, Graph Search, friend suggestions, filtering (for annoying friends). Instagram: they have lots of data, there's gotta be something … Bioinformatics: DNA, 1 TB per genome, 1000 genomes