In-Memory Data Streams With Hazelcast Jet
NEIL STEVENSON
neil@hazelcast.com
27th May 2017
13:25-14:10
© 2017 Hazelcast Inc. Confidential & Proprietary
Outline
• Hazelcast
• → The company, the software, and my role
• Background
• → Why stream at all?
• Java 8 streams
• → What did Java 8 add to Java 7?
• → Why isn’t this good enough?
• Hazelcast Jet, part #1
• → Introduction and outline architecture
• → Low level abstractions : directed acyclic graphs
• A sample application, available to download : not Word Count
• Hazelcast Jet, part #2
• → Higher level abstractions → distributed java.util.stream
Hazelcast : The company, the software and my role
The Company
Founded in 2008, based out of Palo Alto, California with offices worldwide
Provides commercial support and value-add subscription features for open source Hazelcast software
The Software
Apache 2 licensed, available to download from GitHub, from https://hazelcast.org or
https://hazelcast.com
My Role
Solutions Architect – help customers, give talks, drink coffee, write code, drink coffee
Part 1 – Fast Big Data
DAG = Directed Acyclic Graph
Model the flow of data from processing stage to processing stage
→ a stream of data, potentially infinite
→ process as it comes in, don’t save first, maybe never save
→ enrich, deplete, filter, split, etc. as data passes through
→ at memory speeds, no waiting for disks
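To make the DAG idea concrete, here is a minimal plain-Java sketch (not Jet's API) that models processing stages as named vertices and data flow as edges, then uses Kahn's algorithm to derive an order in which the stages can be started, rejecting any graph that contains a cycle. The class and vertex names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal DAG model for illustration only (not the Jet API).
final class SimpleDag {
    // Adjacency list: vertex name -> downstream vertex names, in insertion order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    void edge(String from, String to) {
        vertex(from);
        vertex(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: repeatedly take a vertex that has no remaining inputs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        edges.keySet().forEach(v -> inDegree.put(v, 0));
        edges.values().forEach(targets ->
                targets.forEach(t -> inDegree.merge(t, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> { if (d == 0) ready.add(v); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        // Vertices left over are part of a cycle, so this is not a DAG.
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        SimpleDag dag = new SimpleDag();
        dag.edge("source", "tokenize");
        dag.edge("tokenize", "count");
        dag.edge("count", "sink");
        System.out.println(dag.topologicalOrder()); // [source, tokenize, count, sink]
    }
}
```

The "acyclic" part is what guarantees such an order exists: data can always flow forward from sources to sinks without looping back.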
Part 1 – Fast Big Data
[Diagram: Stream and Fast In-Memory Batch Processing. Inputs: IoT, Social Networks, Enterprise Applications, Databases/Hazelcast IMDG and HDFS/Spark, ingested as streams or batches, with enrichment from Databases. Outputs: Alerts, Enterprise Applications, Interactive Analytics, Databases/Hazelcast IMDG]
Jet : Directed Acyclic Graph
VERTEX
The vertex is just the processing node in a pipeline.
→ Input comes in from somewhere, the first stage or the previous stage
→ Output goes out somewhere, the last stage or the next stage
→ Stateless or stateful
→ Split, filter, enrich, deplete, fan-out, fan-in the data, many possibilities
Jet : Directed Acyclic Graph
EDGE
The edge is just the data transmission in the pipeline.
→ Out of one processor into the next one
→ Out of one processor into the next ones
→ The next processor can be on any JVM; routing can be local or distributed
→ Back-pressure system throttles producer when consumer cannot keep up
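The back-pressure principle can be illustrated in plain Java. This is a sketch of the idea, not Jet's implementation: a bounded `ArrayBlockingQueue` plays the role of the edge, and its `put()` blocks the producer whenever the slower consumer has let the queue fill up. The capacity of 4 and the class name are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration of back-pressure between two pipeline stages (not Jet internals).
final class BackPressureDemo {
    private static final int END = -1; // sentinel marking the end of the stream

    static long run(int items) throws InterruptedException {
        // The "edge": a bounded queue with room for only 4 in-flight items.
        BlockingQueue<Integer> edge = new ArrayBlockingQueue<>(4);
        long[] sum = new long[1];

        Thread consumer = new Thread(() -> {
            try {
                for (int item = edge.take(); item != END; item = edge.take()) {
                    Thread.sleep(1); // a deliberately slow downstream stage
                    sum[0] += item;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The producer: put() blocks while the queue is full, so it can
        // never run more than 4 items ahead of the consumer.
        for (int i = 1; i <= items; i++) {
            edge.put(i);
        }
        edge.put(END);
        consumer.join(); // join() also makes sum[0] safely visible here
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100)); // 5050
    }
}
```

Without the bound, a fast producer would simply keep filling memory until the JVM ran out of heap.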
Part 1 – Jet Engine
Stream Processing
Traditional processing is based on calculations on stored data
Stream processing is about calculations prior to storage
Streams are immutable
Streams may be infinite
The “pipeline” paradigm: input → process → output
Pipeline stages are lambdas, e.g. (x, y) -> { return x * y; }
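In plain Java 8 terms, a stage such as `(x, y) -> { return x * y; }` is just an instance of a functional interface, and single-input stages can be chained into an input → process → output pipeline with `Function.andThen`. The stage names below are illustrative only.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class LambdaStages {
    // The two-argument lambda from the slide, as a java.util.function type.
    static final BinaryOperator<Integer> MULTIPLY = (x, y) -> { return x * y; };

    // Single-input stages: parse the input, process it, format the output.
    static final Function<String, Integer> PARSE = Integer::parseInt;
    static final Function<Integer, Integer> SQUARE = x -> MULTIPLY.apply(x, x);
    static final Function<Integer, String> FORMAT = x -> "result=" + x;

    // Chaining the stages gives input -> process -> output.
    static final Function<String, String> PIPELINE = PARSE.andThen(SQUARE).andThen(FORMAT);

    public static void main(String[] args) {
        System.out.println(PIPELINE.apply("7")); // result=49
    }
}
```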
Part 1 – Jet Engine
What does it do?
Stream Processing
In-memory
Distributed
Example 1 : Word Count
Word Count is the “hello world” of stream processing:
The Problem
• Count how many times each word occurs in some text
• Trivial, but shows some major concepts
Input
• Hamlet’s Soliloquy
1: To be, or not to be, that is the Question:
2: Whether ’tis Nobler in the mind to suffer
3: The Slings and Arrows of outragious Fortune,
4: Or to take Armes against a Sea of troubles,
Output
the=23
to=14
and=13
be=4
…
Example 1 : Word Count
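import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;
import static java.util.stream.Collectors.toMap;

Set<Map.Entry<Integer, String>> entrySet = sourceMap.entrySet();
Map<String, Integer> wordCounts = entrySet.stream()
        // split each line (the map value) into words
        .flatMap(entry -> Stream.of(Constants.WORDS_PATTERN.split(entry.getValue())))
        .map(String::toLowerCase)
        // keep only words of five or more characters
        .filter(word -> word.length() >= 5)
        // one entry per word; counts for duplicate words are summed
        .collect(toMap(
                word -> word,
                word -> 1,
                Integer::sum));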
In Java we would basically iterate and tally
How can the JVM optimise?
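"Iterate and tally" in pre-Java 8 style looks roughly like the sketch below. It applies the same lower-casing and five-character filter as the stream version; the regular expression `\\W+` is an assumed stand-in for `Constants.WORDS_PATTERN`, whose definition the slides don't show.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class IterateAndTally {
    // Assumption: a stand-in for Constants.WORDS_PATTERN, which the slides don't show.
    private static final Pattern WORDS_PATTERN = Pattern.compile("\\W+");

    static Map<String, Integer> count(Map<Integer, String> sourceMap) {
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String line : sourceMap.values()) {
            for (String token : WORDS_PATTERN.split(line)) {
                String word = token.toLowerCase(Locale.ROOT);
                if (word.length() < 5) {
                    continue; // same filter as the stream version
                }
                Integer previous = wordCounts.get(word); // tally, Java 7 style
                wordCounts.put(word, previous == null ? 1 : previous + 1);
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        Map<Integer, String> sourceMap = new HashMap<>();
        sourceMap.put(1, "To be, or not to be, that is the Question:");
        sourceMap.put(2, "Whether 'tis Nobler in the mind to suffer");
        System.out.println(count(sourceMap));
    }
}
```

The loop hard-wires one thread and one machine, which is exactly what leaves the JVM little room to optimise.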
Example 1 : Word Count
Input → Tokenizer → Reducer → Output, passing (text) → (word) → (word, count)
Tokenizer: split the text into words; for each word emit (word)
Reducer: collect running totals; once everything is finished, emit all pairs of (word, count)
But really this is just a pipeline, so a DAG
Example 1 : Word Count
Input → Tokenizer → Reducer → Output, passing (text) → (word) → (word, count)
Tokenizer: split the text into words; for each word emit (word)
Reducer: collect running totals; once everything is finished, emit all pairs of (word, count)
Using queues between vertices allows each to run in parallel, at its own speed
Example 1 : Word Count
[Diagram: Input → two Tokenizers in parallel → Reducer → Output (word, count)]
We can exploit multiple CPUs because lines can be processed in parallel
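Within a single JVM, Java 8 alone can already spread the tokenizing over several cores: switch to a parallel stream and collect into a concurrent map. A sketch, again assuming `\\W+` as the word pattern and the same five-character filter:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ParallelWordCount {
    // Assumption: a stand-in for Constants.WORDS_PATTERN.
    private static final Pattern WORDS_PATTERN = Pattern.compile("\\W+");

    static ConcurrentMap<String, Long> count(List<String> lines) {
        return lines.parallelStream()                        // lines split across worker threads
                .flatMap(WORDS_PATTERN::splitAsStream)       // "Tokenizer": line -> words
                .map(word -> word.toLowerCase(Locale.ROOT))
                .filter(word -> word.length() >= 5)
                .collect(Collectors.groupingByConcurrent(    // "Reducer": one shared concurrent map
                        Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "To be, or not to be, that is the Question:",
                "Whether 'tis Nobler in the mind to suffer",
                "The Slings and Arrows of outragious Fortune,");
        System.out.println(count(lines));
    }
}
```

This scales with cores, not machines, which is the gap the distributed version fills.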
Example 1 : Word Count
[Diagram: Input → two Tokenizers → (word) → two Reducers → Output]
Use routing algorithms to select the next vertex or vertices
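One common routing algorithm is hash partitioning, sketched below in plain Java: each word goes to reducer `floorMod(hashCode, n)`, so every occurrence of the same word reaches the same reducer and each reducer's totals are complete for the words it owns. This illustrates the idea only; Jet configures partitioning on the edge and its details differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashRouting {
    // Choose a reducer for a word; floorMod keeps the index in [0, reducers).
    static int route(String word, int reducers) {
        return Math.floorMod(word.hashCode(), reducers);
    }

    // Each reducer keeps running totals only for the words routed to it,
    // so no two reducers ever hold a count for the same word.
    static List<Map<String, Integer>> countWithReducers(List<String> words, int reducers) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partials.add(new HashMap<>());
        }
        for (String word : words) {
            partials.get(route(word, reducers)).merge(word, 1, Integer::sum);
        }
        return partials;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(countWithReducers(words, 2));
    }
}
```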
Example 1 : Word Count
[Diagram: two Nodes, each containing Input, Tokenizer ×2, Reducer ×2, Combiner ×2 and Output]
Distribute!!
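Once the work is distributed, the Combiner's job can be sketched as merging the partial (word, count) maps produced on different nodes into one total. In a cluster those maps would arrive over the network; here they are ordinary local maps with made-up counts, an assumption that keeps the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Combiner {
    // Merge per-node partial counts into a single total.
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            // Counts for a word seen on more than one node are summed.
            partial.forEach((word, count) -> total.merge(word, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> node1 = new HashMap<>(); // made-up partial counts
        node1.put("fortune", 2L);
        node1.put("arrows", 1L);
        Map<String, Long> node2 = new HashMap<>();
        node2.put("fortune", 3L);
        System.out.println(combine(Arrays.asList(node1, node2)));
    }
}
```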
Example 2 : Foreign Currency
The Problem :
Time-series foreign exchange prices.
We want to compute moving averages in various ways
→ over the last n measurements, e.g. the last 15 or the last 50
Why?
→ rapidly changing data
→ time-to-market benefits from fast processing
Why?
→ gives a clearer view of the trend
Why?
→ to demonstrate a different architecture pattern
→ processing a stream of data, don’t save first then analyse
→ partitioning a stream of data, for scaling
Example 2 : Foreign Currency
The Data
For convenience, we’re using end-of-day prices rather than live prices, so the frequency is one
sample per 24 × 60 × 60 × 1000 = 86,400,000 milliseconds (one day), and only for the Euro.
<gesmes:Sender>
<gesmes:name>European Central Bank</gesmes:name>
</gesmes:Sender>
<Cube>
<Cube time="2017-04-20">
<Cube currency="USD" rate="1.0745"/>
<Cube currency="JPY" rate="117.16"/>
<Cube currency="BGN" rate="1.9558"/>
<Cube currency="CZK" rate="26.907"/>
<Cube currency="DKK" rate="7.4381"/>
<Cube currency="GBP" rate="0.8392"/>
<Cube currency="HUF" rate="313.5"/>
<Cube currency="PLN" rate="4.2588"/>
<Cube currency="RON" rate="4.5405"/>
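The `Cube` elements above can be turned into the (from,to,price) triples used in the rest of this example with the JDK's built-in DOM parser. This sketch assumes the document shape shown (rates quoted per Euro, so `from` is always EUR), ignores the rest of the ECB envelope, and requires well-formed input, unlike the truncated excerpt above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class EcbRates {
    // Extract "EUR,<currency>,<rate>" triples from the Cube elements that carry a currency.
    static List<String> parse(String xml) throws Exception {
        NodeList cubes = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("Cube"); // all Cube elements, in document order
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < cubes.getLength(); i++) {
            Element cube = (Element) cubes.item(i);
            if (cube.hasAttribute("currency")) { // skip the outer and the dated Cube
                triples.add("EUR," + cube.getAttribute("currency") + "," + cube.getAttribute("rate"));
            }
        }
        return triples;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Cube><Cube time=\"2017-04-20\">"
                + "<Cube currency=\"USD\" rate=\"1.0745\"/>"
                + "<Cube currency=\"GBP\" rate=\"0.8392\"/>"
                + "</Cube></Cube>";
        System.out.println(parse(xml)); // [EUR,USD,1.0745, EUR,GBP,0.8392]
    }
}
```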
Example 2 : Foreign Currency
[Diagram: Input: FX feed (from,to,price) → Last n Window]
One Solution
Input arrives as a stream of individual prices, e.g. “EUR,GBP,0.8392”
Collate these into batches of n per currency pair
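Collating single prices into a batch of the last n per currency pair can be sketched with one bounded `ArrayDeque` per pair. This is a single-threaded, in-memory illustration; `LastNWindow` and `accept` are hypothetical names, not the `LastNProcessorSupplier` used in the demo repository.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "last n" window keyed by currency pair, e.g. "EUR,GBP".
final class LastNWindow {
    private final int n;
    private final Map<String, Deque<Double>> windows = new HashMap<>();

    LastNWindow(int n) {
        this.n = n;
    }

    // Accept one "from,to,price" quote. Returns the last n prices for that
    // pair (oldest first) once n are available, otherwise an empty list.
    List<Double> accept(String quote) {
        String[] fields = quote.split(",");
        String pair = fields[0].trim() + "," + fields[1].trim();
        double price = Double.parseDouble(fields[2].trim());

        Deque<Double> window = windows.computeIfAbsent(pair, k -> new ArrayDeque<>());
        window.addLast(price);
        if (window.size() > n) {
            window.removeFirst(); // slide: drop the oldest price
        }
        if (window.size() < n) {
            return Collections.emptyList();
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        LastNWindow lastThree = new LastNWindow(3);
        lastThree.accept("EUR,GBP,0.8389");
        lastThree.accept("EUR,USD,1.0745"); // a different pair, kept separately
        lastThree.accept("EUR,GBP,0.8390");
        System.out.println(lastThree.accept("EUR,GBP,0.8391")); // [0.8389, 0.839, 0.8391]
    }
}
```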
Example 2 : Foreign Currency
[Diagram: Input: FX feed (from,to,price) → Last n Window → n * (from,to,price) to each of Simple Average and Weighted Average]
One Solution
Send a self-contained parcel of work to each calculator:
a batch of n prices for a pair, e.g. “EUR,GBP,0.8392,0.8391,0.8390,0.8389,…”
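Given such a batch, the two calculators can be small pure functions. The simple average is the arithmetic mean of the n prices. The slides don't say which weighting the weighted average uses, so this sketch assumes a linearly weighted moving average in which the i-th oldest price (1-based) has weight i, making the newest price count most.

```java
import java.util.Arrays;
import java.util.List;

final class MovingAverages {
    // Simple moving average: arithmetic mean of the batch.
    static double simple(List<Double> prices) {
        double sum = 0;
        for (double p : prices) {
            sum += p;
        }
        return sum / prices.size();
    }

    // Linearly weighted moving average (assumed weighting): prices are
    // ordered oldest first and the i-th price (1-based) has weight i.
    static double weighted(List<Double> prices) {
        double weightedSum = 0;
        double weightTotal = 0;
        for (int i = 0; i < prices.size(); i++) {
            int weight = i + 1;
            weightedSum += weight * prices.get(i);
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        List<Double> batch = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        System.out.println(simple(batch));   // 2.5
        System.out.println(weighted(batch)); // 3.0
    }
}
```

Because each batch is self-contained, either function can run on any processor clone without shared state.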
Example 2 : Foreign Currency
[Diagram: Input: FX feed (from,to,price) → Last n Window → n * (from,to,price) to Simple Average and Weighted Average, each emitting (from,to,average) to Output: Store A and Output: Store B]
One Solution
Stream out the averages…
Your output is someone else’s input
Example 2 : Foreign Currency
[Diagram: Input: FX feed, partitioned by currency. (from,USD,price) quotes go to one clone of the Last n Window → Simple Average / Weighted Average pipeline and (from,CAD,price) quotes to another; each clone receives n * (from,USD,price) or n * (from,CAD,price) batches and emits (from,USD,average) or (from,CAD,average) to Output: Store A and Output: Store B]
One Solution
Partitioning provides performance: send US Dollars and Canadian Dollars to different processor
clones
Example 2 : Foreign Currency
One Solution
DEMO
https://github.com/neilstevenson/jeeconf2017
Example 2 : Foreign Currency
One Solution
Jet Engine
Jet capability is easy to add to IMDG
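Two steps and you’re ready to submit jobs!
<dependency>
    <groupId>com.hazelcast.jet</groupId>
    <artifactId>hazelcast-jet</artifactId>
    <version>0.3.1</version>
</dependency>

import com.hazelcast.config.Config;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.config.JetConfig;
import org.springframework.context.annotation.Bean;

// Spring bean: start a Jet member that reuses the existing Hazelcast IMDG Config
@Bean
public JetInstance jetInstance(Config config) {
    JetConfig jetConfig = new JetConfig();
    jetConfig.setHazelcastConfig(config);
    return Jet.newJetInstance(jetConfig);
}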
Jet Engine
Jet capability is the processing, but what about the start and end of the pipelines?
A source creates output without input.
A sink consumes input without output.
Where data comes from and goes to is just a matter of plumbing:
→ Hazelcast IMDG, IMap and IList
→ Kafka
→ HDFS
→ flat files
→ sockets
→ easy to write your own, they’re just vertices
implement process() to consume input
implement complete() to generate output
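The contract above (a vertex consumes items in `process()` and may generate output in `complete()`) can be modelled in plain Java. The `Stage` interface, `ListSource` and `SumSink` are hypothetical stand-ins written for this illustration; Jet's real `Processor` interface has different signatures, so check its Javadoc before writing a custom vertex.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical minimal vertex contract, loosely modelled on the slide (not Jet's API).
interface Stage<I, O> {
    void process(I item, Consumer<O> out); // consume one input item
    void complete(Consumer<O> out);        // called once the input is exhausted
}

// A source: has no input and generates all of its output in complete().
final class ListSource implements Stage<Void, Integer> {
    private final List<Integer> data;

    ListSource(List<Integer> data) {
        this.data = data;
    }

    public void process(Void item, Consumer<Integer> out) { }

    public void complete(Consumer<Integer> out) {
        data.forEach(out);
    }
}

// A sink: consumes input in process() and emits nothing downstream.
final class SumSink implements Stage<Integer, Void> {
    private long sum;

    public void process(Integer item, Consumer<Void> out) {
        sum += item;
    }

    public void complete(Consumer<Void> out) { }

    long total() {
        return sum;
    }
}

final class SourceSinkDemo {
    public static void main(String[] args) {
        ListSource source = new ListSource(Arrays.asList(1, 2, 3));
        SumSink sink = new SumSink();
        // The "plumbing": whatever the source emits is fed into the sink.
        source.complete(item -> sink.process(item, ignored -> { }));
        sink.complete(ignored -> { });
        System.out.println(sink.total()); // 6
    }
}
```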
Jet Engine
DAG construction is easy(?)
Create vertices, and edges to link them
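public class MaDAG extends DAG {
    public MaDAG(final int last) {
        // Source: read historic prices from an IMap
        Vertex mapSource = this.newVertex("mapSource",
                Processors.readMap(Constants.MAP_HISTORIC_CURRENCY));
        // Window: collect the last n prices per key
        Vertex lastN = this.newVertex("lastN", new LastNProcessorSupplier(last));
        // Partitioned edge: items with the same key (from MaKeyExtractor) reach the same lastN instance
        this.edge(Edge.between(mapSource, lastN).partitioned(new MaKeyExtractor()));
        // Calculator: simple moving average over each batch
        Vertex sma = this.newVertex("sma", SmaProcessor::new);
        this.edge(Edge.from(lastN, 0).to(sma));
        // Sink: write the averages to another IMap
        Vertex smaMapSink = this.newVertex("smaMapSink",
                Processors.writeMap(Constants.MAP_SMA));
        this.edge(Edge.between(sma, smaMapSink));
    }
}
But is there an easier way?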
Jet Engine
java.util.stream
An easier(?) way to construct a pipeline
Change from Java 8
Set<Map.Entry<Integer, String>> entrySet = sourceMap.entrySet();
Map<String, Integer> wordCounts = entrySet.stream()
        .flatMap(entry -> Stream.of(Constants.WORDS_PATTERN.split(entry.getValue())))
        .map(String::toLowerCase)
        .filter(word -> word.length() >= 5)
        .collect(toMap(
                word -> word,
                word -> 1,
                Integer::sum));
Jet Engine
com.hazelcast.jet.stream
An easier(?) way to construct a pipeline
Change to Jet
IStreamMap<Integer, String> streamMap = IStreamMap.streamMap(sourceMap);
IMap<String, Integer> wordCounts = streamMap.stream()
        .flatMap(entry -> Stream.of(Constants.WORDS_PATTERN.split(entry.getValue())))
        .map(String::toLowerCase)
        .filter(word -> word.length() >= 5)
        .collect(toIMap(
                word -> word,
                word -> 1,
                Integer::sum));
More thinking than typing
Jet Engine
DAG v java.util.stream
Jet provides the java.util.stream interface – high-level constructs
like Java 8’s collect(), distinct(), filter(), reduce(), sorted() etc.
but run distributed
Or use the DAG approach, for low-level, fine-grained control
Or mix & match
// Tokenizer vertex: split each line into lower-case words, dropping empty strings
Vertex tokenize = dag.newVertex("tokenize",
        flatMap((String line) ->
                traverseArray(delimiter.split(line.toLowerCase()))
                        .filter(word -> !word.isEmpty())));
Jet Engine
DAG v java.util.stream
Vertex tokenize = dag.newVertex("tokenize",
        flatMap((String line) ->
                traverseArray(delimiter.split(line.toLowerCase()))
                        .filter(word -> !word.isEmpty())));
In java.util.stream, filter is declared as
java.util.stream.Stream<T> java.util.stream.Stream.filter(java.util.function.Predicate<? super T> predicate)
But the Jet version is
com.hazelcast.jet.stream.DistributedStream<T> com.hazelcast.jet.stream.DistributedStream.filter(
        com.hazelcast.jet.Distributed.Predicate<? super T> predicate)
Distributed.Predicate is also Serializable, so copies can be sent to the grid to execute, remotely and in parallel
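The parameter type matters because of serialization: a lambda whose target type is `Serializable` can be turned into bytes, shipped, and rebuilt on another JVM, while a lambda typed as a plain `java.util.function.Predicate` cannot and fails with `NotSerializableException`. The sketch below shows the mechanism with JDK types only, using a hypothetical `SerializablePredicate` as a stand-in for `Distributed.Predicate`, and round-trips the lambda through a byte array in one JVM rather than over a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

final class ShipLambda {
    // Hypothetical stand-in for Distributed.Predicate: a Predicate that is also Serializable.
    interface SerializablePredicate<T> extends Predicate<T>, Serializable { }

    // The filter from the tokenizer example, typed so that it can be serialized.
    static SerializablePredicate<String> notEmpty() {
        return word -> !word.isEmpty();
    }

    // Serialize an object to bytes (what "sending it to the grid" needs).
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from bytes (what a remote member would do).
    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Predicate<String> copy = fromBytes(toBytes(notEmpty()));
        System.out.println(copy.test("fortune") + " " + copy.test("")); // true false
    }
}
```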
Jet Engine
Architecture
Jet Roadmap
• Robust Stream Processing – processing guarantees for stream processing; streaming-specific features (windowing, triggering)
• High Performance
• Hazelcast Integrations – JCache; Map and Cache events using partition ring buffer; CQ Cache; Projection and Predicate for Map source
• Management Center – management and monitoring features for Jet
• More Connectors – JMS, JDBC
• Cloud Deployment – Pivotal Cloud Foundry, OpenShift
Part 1 – Jet Engine
Performance
Fastest in town!
Part 1 – Jet Engine
Performance
Run the graph on as many machines as necessary or available
→ Fan-out the input
→ Send from node to node, local or distributed
→ Fan-in the output
Conclusions
Stream Processing
• Suitable when data arrives too fast to process after storing, or when you don’t need to store it
• Needs a much more functional programming style than traditional Java
• → lambdas feature heavily
• Java streams are OK, and might be all you need
• → makes good use of a single machine
• Jet streams are better for bigger volumes
• → makes use of multiple machines
• Jet is from Hazelcast
• → easy to get going, deploy to bare metal or any cloud
• Alternatives exist, such as Spark and Flink
• → Jet is open source, Java, faster, and needs no ZooKeeper
The End
https://github.com/neilstevenson/jeeconf2017
neil@hazelcast.com
https://jet.hazelcast.org/
https://github.com/hazelcast/hazelcast-jet
Stack Overflow “hazelcast-jet” or Google Group
https://gitter.im/hazelcast/home

JEEConf 2017 - In-Memory Data Streams With Hazelcast Jet
