IN-MEMORY STREAM PROCESSING WITH HAZELCAST JET
Nazarii Cherkas | Hazelcast
nazarii@hazelcast.com
https://twitter.com/n_cherkas
Brief Agenda
• Why Stream Processing?
• What's special about Streaming Data
• Challenges when processing the Infinite Stream
• Hazelcast Jet: The modern Stream Processing Engine
• Overview and Key Concepts
• Infinite Stream Processing
• Fault Tolerance
• Jet Performance
• Summary
2© 2018 Hazelcast Inc.
About me
• 7+ years of experience in different positions,
from Java Engineer to Team Lead
• Solutions Architect at Hazelcast: I solve
problems for our users and interact with the
community
Why Stream Processing?
Streaming Data is everywhere
What's special about Streaming Data
• Infinite data sets
• Small size of data record
• Near real-time insights
• Variance in throughput and variance in disorder
Definitions of Stream Processing
“...a type of data processing that is designed with infinite data sets in
mind...”
“...processing of data in motion, or in other words, computing on data
directly as it is produced or received…”
“...a technique to process the data on-the-fly, prior to its storage...”
https://jet.hazelcast.org/use-cases/real-time-stream-processing/
https://data-artisans.com/what-is-stream-processing
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Stream vs Batch Processing
https://aws.amazon.com/streaming-data/

| | Batch processing | Stream processing |
| Data scope | Queries or processing over all or most of the data in the dataset | Queries or processing over data within a rolling time window, or on just the most recent data record |
| Data size | Large batches of data | Individual records or micro batches consisting of a few records |
| Responsiveness | Latencies in minutes to hours | Requires latency in the order of seconds or milliseconds |
| Analyses | Complex analytics | Aggregates, simple response functions and rolling metrics |
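
The difference in data scope can be sketched in a few lines of plain Java. This is only an illustration under simple assumptions (the class and method names are hypothetical and it does not use Jet): the batch version needs the whole data set before it can answer, while the stream version keeps a small running state and produces a result after every record.

```java
import java.util.List;

public class BatchVsStreamSketch {

    /** Batch style: needs the whole data set before it can answer. */
    public static double batchAverage(List<Double> all) {
        double sum = 0;
        for (double v : all) {
            sum += v;
        }
        return all.isEmpty() ? 0 : sum / all.size();
    }

    /** Stream style: keeps a small running state and updates it per record. */
    public static final class RunningAverage {
        private long count;
        private double sum;

        /** Folds one record into the state and returns the current average. */
        public double accept(double value) {
            count++;
            sum += value;
            return sum / count;
        }
    }

    public static void main(String[] args) {
        List<Double> readings = List.of(10.0, 20.0, 30.0, 40.0);
        System.out.println("batch: " + batchAverage(readings));

        RunningAverage running = new RunningAverage();
        for (double r : readings) {
            System.out.println("stream after " + r + ": " + running.accept(r));
        }
    }
}
```

Once all records have arrived both approaches agree on the result; the streaming one simply did not have to wait for the end of the data.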
Layers of Stream Processing
Challenges of Stream Processing
• Distributed system coordination
• Notion of time
• Memory management
• Fault-tolerance
Hazelcast Jet: In-Memory Streaming and
Fast Batch Processing
What is Hazelcast Jet
https://github.com/hazelcast/hazelcast-jet/
Apache License 2.0
Hazelcast Jet use cases
• Low-latency Stream processing and analytics
• Fast Batch processing and ETL
• Distributed java.util.stream
• Implementing event sourcing and CQRS
• Data processing microservice architectures
Hazelcast Jet: Architecture Overview
[Diagram: layered architecture]
• High-Level APIs: Pipeline API, java.util.stream
• Core API
• Connectors: Batch Readers and Writers, Streaming Readers and Writers
• Processing: Batch Processing, Stream Processing
• Core: Execution Engine, Fault-Tolerance, Java Client, Networking, Deployment, Data Structures and Partition Management, Cluster Management with Cloud Discovery SPI
Key concepts
• Directed Acyclic Graph (DAG)
• Jet Cluster
• Job Execution
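
To make the DAG concept concrete, here is a minimal sketch in plain Java. It is an illustration only: the class `DagSketch` and the vertex names are hypothetical and this is not Jet's Core API. It models a computation as vertices connected by directed edges and uses Kahn's algorithm to find an order in which data can flow from sources to sinks, rejecting graphs that contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DagSketch {
    // vertex name -> names of its downstream vertices
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    public DagSketch edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        return this;
    }

    /** Kahn's algorithm: returns the vertices in an order where every edge points forward. */
    public List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            inDegree.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch()
                .edge("source", "filter")
                .edge("filter", "aggregate")
                .edge("aggregate", "sink");
        System.out.println(dag.topologicalOrder());
    }
}
```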
Infinite Stream Processing with Jet
Jet Streaming Demo
Flight Telemetry
Processing a near real-time flight telemetry stream from ADS-B Exchange
- https://www.adsbexchange.com/
• Filters out planes outside of defined airports
• Slides over the last 1 minute to detect whether the plane is ascending, descending or
staying at the same level
• Based on the plane type and phase of the flight, provides information about maximum
noise levels near the airport and estimated CO2 emissions for a region
https://github.com/hazelcast/hazelcast-jet-demos/tree/master/flight-telemetry
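
The one-minute sliding window step can be sketched without Jet as follows. This is not the demo's actual code: the class name, the 100 ft tolerance and the rule "compare the newest altitude with the oldest one still inside the window" are assumptions made for illustration, and readings are assumed to arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AltitudeTrendSketch {
    public enum Trend { ASCENDING, DESCENDING, LEVEL }

    private static final long WINDOW_MILLIS = 60_000;    // "last 1 minute" from the slide
    private static final double TOLERANCE_FEET = 100.0;  // hypothetical threshold

    /** One telemetry reading of a single plane. */
    private static final class Reading {
        final long timestampMillis;
        final double altitudeFeet;

        Reading(long timestampMillis, double altitudeFeet) {
            this.timestampMillis = timestampMillis;
            this.altitudeFeet = altitudeFeet;
        }
    }

    private final Deque<Reading> window = new ArrayDeque<>();

    /** Adds a reading and returns the trend over the last minute. */
    public Trend accept(long timestampMillis, double altitudeFeet) {
        window.addLast(new Reading(timestampMillis, altitudeFeet));
        // Evict readings that fell out of the window; the newest reading always stays.
        while (window.peekFirst().timestampMillis <= timestampMillis - WINDOW_MILLIS) {
            window.removeFirst();
        }
        double delta = altitudeFeet - window.peekFirst().altitudeFeet;
        if (delta > TOLERANCE_FEET) {
            return Trend.ASCENDING;
        }
        if (delta < -TOLERANCE_FEET) {
            return Trend.DESCENDING;
        }
        return Trend.LEVEL;
    }

    public static void main(String[] args) {
        AltitudeTrendSketch plane = new AltitudeTrendSketch();
        System.out.println(plane.accept(0, 1000));
        System.out.println(plane.accept(30_000, 1500));
        System.out.println(plane.accept(70_000, 1500));
        System.out.println(plane.accept(80_000, 1000));
    }
}
```

In a real pipeline one such window would be kept per plane, keyed by the aircraft identifier.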
Pipeline transformations
• Time-agnostic transformations
  • Filter
  • Map
  • FlatMap
• Aggregation and Grouping
  • Built-in count, different kinds of averages, min/max, linear trends and many more
• Co-Aggregation
• Hash-Join
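
The time-agnostic transformations and grouped aggregation listed above have direct counterparts in the standard java.util.stream API, which the deck names as one of Jet's use cases. The sketch below runs on a single JVM with plain java.util.stream, not on a Jet cluster; the class name and the word-count task are chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TransformationsSketch {

    /** Counts word occurrences using filter, map, flatMap and a grouped count. */
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())              // Filter: drop blank lines
                .map(line -> line.trim().toLowerCase())              // Map: normalize each line
                .flatMap(line -> Arrays.stream(line.split("\\s+")))  // FlatMap: one line -> many words
                .collect(Collectors.groupingBy(                      // Grouping + aggregation
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or", "", "not to be")));
    }
}
```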
Windowing
Watermarks to handle Late Events
A watermark is an educated guess that “from this point on there will be no more
items with a timestamp less than this”
Watermarks in Jet
Predefined Watermark Policies
• With Fixed Lag
• Limiting Lag and Delay
• Limiting Lag and Lull
• Limiting Timestamp and Wall-Clock Lag
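
As a rough illustration of the simplest policy in the list, the sketch below keeps the watermark a fixed lag behind the highest timestamp seen so far and flags any event that arrives behind the watermark as late. The class is hypothetical and only approximates the idea; Jet's own policy implementations may differ in detail.

```java
public class FixedLagWatermarkSketch {
    private final long lagMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public FixedLagWatermarkSketch(long lagMillis) {
        this.lagMillis = lagMillis;
    }

    /** Observes an event timestamp; returns true if the event is behind the current watermark. */
    public boolean observe(long eventTimestamp) {
        boolean late = eventTimestamp < currentWatermark();
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
        return late;
    }

    /** The watermark trails the top observed timestamp by the fixed lag. */
    public long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestampSeen - lagMillis;
    }

    public static void main(String[] args) {
        FixedLagWatermarkSketch wm = new FixedLagWatermarkSketch(5_000);
        System.out.println(wm.observe(10_000) + " watermark=" + wm.currentWatermark());
        System.out.println(wm.observe(7_000) + " watermark=" + wm.currentWatermark());
        System.out.println(wm.observe(20_000) + " watermark=" + wm.currentWatermark());
        System.out.println(wm.observe(12_000) + " watermark=" + wm.currentWatermark());
    }
}
```

A larger lag tolerates more disorder at the cost of higher latency, because windows can only be closed once the watermark has passed their end.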
Fault Tolerance
Jet Processing Fault Tolerance
• The cluster elects a Coordinator Member that takes care of job coordination
among the cluster members
• Jet achieves fault tolerance in streaming jobs by taking snapshots of the
internal processing state
• The Coordinator Member detects the failure of another member and restarts the job
using the new topology
• When the Coordinator Member crashes, the cluster elects a new one
Distributed Snapshots
Technique first described in a paper by Chandy and Lamport in 1985
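
The effect of snapshot-based recovery can be shown with a single-process simulation. This is deliberately not the distributed protocol itself (there are no barriers travelling between processors) and not Jet code; all names are hypothetical. It only shows the core idea: a snapshot stores the source position together with the processing state at that position, so after a failure the job rolls back to the snapshot and replays the source from the saved offset.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotRestoreSketch {

    /** A consistent snapshot: source offset plus the state computed up to that offset. */
    private static final class Snapshot {
        final int offset;
        final Map<String, Integer> counts;

        Snapshot(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts); // defensive copy
        }
    }

    /**
     * Counts keys from a replayable source, taking a snapshot every snapshotInterval items.
     * If crashAt is non-negative, one failure is simulated just before the item at that offset.
     */
    public static Map<String, Integer> run(List<String> source, int snapshotInterval, int crashAt) {
        Snapshot last = new Snapshot(0, new HashMap<>());
        Map<String, Integer> counts = new HashMap<>();
        boolean crashed = false;
        int offset = 0;
        while (offset < source.size()) {
            if (!crashed && offset == crashAt) {
                crashed = true;
                // Failure: state since the last snapshot is lost. Roll back and replay.
                counts = new HashMap<>(last.counts);
                offset = last.offset;
                continue;
            }
            counts.merge(source.get(offset), 1, Integer::sum);
            offset++;
            if (offset % snapshotInterval == 0) {
                last = new Snapshot(offset, counts);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "a", "c", "b", "a", "a");
        System.out.println("no failure:   " + run(source, 3, -1));
        System.out.println("with failure: " + run(source, 3, 5));
    }
}
```

Because state and offset are restored together, every item affects the result exactly once even though some items are processed twice.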
Jet Processing Guarantees
• At-Least Once
• Exactly Once
• At-Most Once (meaning that fault tolerance is turned off)
Performance
Hazelcast Jet Performance
Key Design Decisions
• DAG to Model Computations
• In-Memory Data Locality
• Partition Mapping Affinity
• Single-Producer/Single-Consumer (SP/SC) Queues
• Cooperative Multithreading (Green Threads)
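
The last two design decisions fit together and can be sketched in plain Java. The sketch is an assumption-laden illustration, not Jet's implementation: `SpscQueue`, `Tasklet` and `runCooperatively` are hypothetical names. Each tasklet does a small, non-blocking piece of work and then returns so that other tasklets can share the same thread, and a bounded single-producer/single-consumer ring buffer passes items between them without locks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CooperativeSketch {

    /** Bounded ring buffer that is safe for exactly one producer and one consumer. */
    static final class SpscQueue<T> {
        private final Object[] buffer;
        private final AtomicLong head = new AtomicLong(); // next slot to read, advanced only by the consumer
        private final AtomicLong tail = new AtomicLong(); // next slot to write, advanced only by the producer

        SpscQueue(int capacity) {
            buffer = new Object[capacity];
        }

        boolean offer(T item) {
            long t = tail.get();
            if (t - head.get() == buffer.length) {
                return false; // full: the caller yields instead of blocking
            }
            buffer[(int) (t % buffer.length)] = item;
            tail.lazySet(t + 1); // publish only after the slot is written
            return true;
        }

        @SuppressWarnings("unchecked")
        T poll() {
            long h = head.get();
            if (h == tail.get()) {
                return null; // empty
            }
            int slot = (int) (h % buffer.length);
            T item = (T) buffer[slot];
            buffer[slot] = null;
            head.lazySet(h + 1);
            return item;
        }
    }

    /** Does a small unit of work per call and returns true once it has finished. */
    interface Tasklet {
        boolean call();
    }

    /** Runs all tasklets round-robin on the calling thread until each one is done. */
    static void runCooperatively(List<Tasklet> tasklets) {
        List<Tasklet> active = new ArrayList<>(tasklets);
        while (!active.isEmpty()) {
            for (Iterator<Tasklet> it = active.iterator(); it.hasNext(); ) {
                if (it.next().call()) {
                    it.remove();
                }
            }
        }
    }

    /** Sends 1..n through a tiny queue from a producer tasklet to a summing consumer tasklet. */
    public static long sumThroughQueue(int n) {
        SpscQueue<Integer> queue = new SpscQueue<>(2);
        long[] sum = {0};
        int[] next = {1};
        int[] received = {0};

        Tasklet producer = () -> {
            // Emit until the queue is full, then give the thread back.
            while (next[0] <= n && queue.offer(next[0])) {
                next[0]++;
            }
            return next[0] > n;
        };
        Tasklet consumer = () -> {
            Integer item;
            while ((item = queue.poll()) != null) {
                sum[0] += item;
                received[0]++;
            }
            return received[0] == n;
        };
        runCooperatively(List.of(producer, consumer));
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumThroughQueue(100));
    }
}
```

A full queue makes the producer return early rather than wait, which is how backpressure shows up in a cooperative design: no thread is ever parked, and context switches are replaced by ordinary method returns.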
Jet Streaming Performance
https://jet.hazelcast.org/performance/
Jet Throughput
https://jet.hazelcast.org/performance/
Running Jet in Production
• Docker images - https://github.com/hazelcast/hazelcast-jet-docker
• Cluster Management: Mesos, YARN
• Cluster Discovery
  • Cloud Providers: AWS, Microsoft Azure, GCP, PCF, Heroku
  • Kubernetes
  • Consul, Eureka, ZooKeeper
Summary
Why you should consider using Hazelcast Jet
• High Performance | Industry Leading
• Out-of-the-box integration with Hazelcast IMDG | Source, Sink, Enrichment
• Easy to start with and integrate | Zero dependencies, developer friendly
• Simple to deploy | Embedded 10 MB jar or Client-Server
• Works in every Cloud | Same as Hazelcast IMDG
• For Developers by Developers | Code it
Questions?
Version 0.6 is the current release, with 0.7 coming in Q3 2018;
aiming for 1.0 this year
http://jet.hazelcast.org
https://groups.google.com/forum/#!forum/hazelcast-jet
https://gitter.im/hazelcast/hazelcast

In-Memory Stream Processing with Hazelcast Jet @JEEConf

Editor's Notes

  • #7
    - The answer is that streaming data [definition of term] is everywhere, and it's usually about …
    - All these examples of data are generated all the time and usually carry important real-time insights that require processing here and now
  • #8-#11
    - Fraud detection
    - Alert generation
    - Variance in throughput -> auto-scaling
    - Disorder -> e.g., a plane full of people taking their phones out of airplane mode after having used them offline for the entire flight
    - Disorder -> producer parallelism and retries: specific to the tools that are used, due to their internals, especially when using batching
  • #12-#14
    - Let's try to understand what Stream Processing is
    - The key things: on the fly, prior to its storage; infinite data sets in mind; data in motion
  • #15 How is it different from classical Batch Processing, where we run periodic jobs to handle our data? TODO: review and maybe come up with own points TODO: combine 1 & 2
  • #16-#18 TODO: review and maybe come up with own points
  • #19 1. Architecturally, a stream processing system usually consists of the following 2 layers 2. Now let’s see how a typical Stream Processing system looks in practice TODO: icons for the tech stack of each layer? TODO: make horizontal?
  • #20-#22 - None of this comes for free: there are multiple challenges to solve when processing the Infinite Stream - problems: how to form the cluster, how to coordinate it, and how to control the required level of consistency
  • #23  - how to solve these problems? - next slide -
  • #24 - Hazelcast Jet is one of the products that aim to solve such problems
  • #31 Architecturally, Jet consists of the following layers
  • #32-#33 TODO: where is DAG API here?
      - A Jet Member is also a fully functional Hazelcast IMDG Member, and a Jet Cluster is also a Hazelcast IMDG Cluster
      - Hazelcast IMDG provides:
          - Layer of cluster management, deployment, data partitioning and networking
          - In-memory store for Jet processing state
          - Shared state to connect multiple Jet jobs
          - Remote data caching
          - Enrichment data source
  • #34 TODO: unify orange color among slides! TODO: animation
  • #35-#36
      - Uses Hazelcast IMDG clustering under the hood
      - Peer-to-peer communication
      - Members can be either set statically or discovered automatically
      - Elastically scales up or down
      - Topologies: Embedded, Client-Server
  • #37
      - Unit of work described by a DAG, which is submitted to the cluster for execution
      - Asynchronous, distributed
      - Submitted to each running member
      - *Scales up/down when members are added or removed
      - Embeds a JAR with the source code, if needed
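To make "a unit of work described by a DAG" concrete, here is a minimal sketch in plain Java. It is not the Jet DAG API: `MiniDag`, `vertex`, `edge`, `topologicalOrder` and the vertex names are hypothetical. It only shows the shape of such a description: named processing steps, directed edges for data flow, and a check that the graph really is acyclic before it would be submitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MiniDag {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    // Registers a processing step; calling it twice for the same name is harmless.
    MiniDag vertex(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    // Declares that data flows from one step to another.
    MiniDag edge(String from, String to) {
        vertex(from).vertex(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: lists the vertices in data-flow order and rejects cyclic graphs.
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String current = ready.poll();
            order.add(current);
            for (String next : downstream.get(current)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("not a DAG: cycle detected");
        }
        return order;
    }

    public static void main(String[] args) {
        MiniDag dag = new MiniDag()
                .edge("source", "parse")
                .edge("parse", "aggregate")
                .edge("aggregate", "sink");
        System.out.println(dag.topologicalOrder()); // [source, parse, aggregate, sink]
    }
}
```

In a streaming job all vertices run concurrently, so the order computed here is a validation and wiring aid, not an execution schedule.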
  • #39 Automatic Dependent Surveillance-Broadcast (ADS-B) is a surveillance technology in which an aircraft determines its position via satellite navigation and periodically broadcasts it, enabling it to be tracked. The information can be received by air traffic control ground stations as a replacement for secondary surveillance radar, as no interrogation signal is needed from the ground. It can also be received by other aircraft to provide situational awareness and allow self-separation. ADS-B is "automatic" in that it requires no pilot or external input. It is "dependent" in that it depends on data from the aircraft's navigation system.
  • #40-#41 TODO: more info plus diagram
  • #42-#45 Context propagation for map, flatMap and filter
      2) Aggregation and Grouping
          - Transformation of a set of input values sharing the same key into a single output value
          - Built-in aggregate operations for count, different kinds of averages, min/max, linear trends and many more
          - Easy to implement your own aggregations
      3) Co-Aggregation
          - groupBy over the items from more than one contributing stream
          - Like JOIN with GROUP BY in SQL
          - Typical use case: collecting stats over user activity coming from several streams
          - #45 example: join page visits, user data and payments
      4) Hash-Join
          - Join of one finite stream with another, possibly infinite stream
          - Optimized for data enrichment: each item of the primary stream gets enriched with data resolved by a hashtable lookup
          - To optimize performance, the entire enriching stream is replicated on each Jet member
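The hash-join and co-aggregation ideas in these notes can be sketched in a few lines of plain Java. This is not the Jet Pipeline API; `JoinSketch`, `hashJoin`, `coAggregate` and the sample data are hypothetical. The hash-join enriches each item of the primary stream with a lookup in a small in-memory table (the part that, per the notes, is replicated on every member), and the co-aggregation groups two streams by the same key and keeps one counter per contributing stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class JoinSketch {
    // Hash-join: each primary item {userId, page} is enriched by a hashtable lookup.
    // The enriching side must be finite, because it is fully loaded into the map.
    static List<String> hashJoin(List<String[]> pageVisits, Map<String, String> userNames) {
        List<String> enriched = new ArrayList<>();
        for (String[] visit : pageVisits) {
            String name = userNames.getOrDefault(visit[0], "unknown");
            enriched.add(name + " visited " + visit[1]);
        }
        return enriched;
    }

    // Co-aggregation: group two streams by user id and aggregate them side by side.
    // Result per user: [number of page visits, number of payments].
    static Map<String, int[]> coAggregate(List<String> pageVisitUserIds, List<String> paymentUserIds) {
        Map<String, int[]> stats = new TreeMap<>();
        for (String userId : pageVisitUserIds) {
            stats.computeIfAbsent(userId, k -> new int[2])[0]++;
        }
        for (String userId : paymentUserIds) {
            stats.computeIfAbsent(userId, k -> new int[2])[1]++;
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, String> userNames = Map.of("u1", "Ann");
        List<String[]> visits = List.of(new String[]{"u1", "/home"}, new String[]{"u2", "/cart"});
        System.out.println(hashJoin(visits, userNames)); // [Ann visited /home, unknown visited /cart]

        Map<String, int[]> stats = coAggregate(List.of("u1", "u2", "u1"), List.of("u1"));
        stats.forEach((user, s) -> System.out.println(user + ": visits=" + s[0] + ", payments=" + s[1]));
    }
}
```

The sketch runs over finite lists on one machine; in a streaming engine the same grouping would typically be applied per window and per partition.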
  • #49 TODO: must be “Event time” on axis
  • #52 TODO: Add a client App and make animations.
  • #56-#59 TODO: animations TODO: add a final step for when the snapshot has completed - due to parallelism, in most cases a processor receives data from more than one upstream processor
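Because a processor usually has several upstream inputs, a snapshot has to decide which items belong "before" it on every input. The notes do not spell out the mechanism, so the following is a sketch under an assumption: it uses barrier alignment, a common technique in which an input that has delivered its snapshot barrier is paused until the barrier has arrived on all other inputs. It is plain Java, not Jet code; `BarrierAlignment`, `BARRIER`, `input` and `run` are hypothetical names, and the processor's state is just a running sum.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class BarrierAlignment {
    // Sentinel standing in for a snapshot barrier travelling through the stream (assumption).
    static final int BARRIER = Integer.MIN_VALUE;

    // Helper to build one upstream input queue.
    static Deque<Integer> input(Integer... items) {
        return new ArrayDeque<>(Arrays.asList(items));
    }

    // Polls the inputs round-robin and sums the items.
    // Returns {state captured in the snapshot, final state}; the snapshot is -1 if none completed.
    static long[] run(List<Deque<Integer>> inputs) {
        boolean[] blocked = new boolean[inputs.size()];
        int barriersSeen = 0;
        long sum = 0;
        long snapshot = -1;
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int i = 0; i < inputs.size(); i++) {
                if (blocked[i] || inputs.get(i).isEmpty()) {
                    continue; // a blocked input waits until the other barriers arrive
                }
                int item = inputs.get(i).poll();
                progressed = true;
                if (item == BARRIER) {
                    blocked[i] = true;
                    if (++barriersSeen == inputs.size()) {
                        snapshot = sum;              // all barriers aligned: save the state
                        Arrays.fill(blocked, false); // resume every input
                        barriersSeen = 0;
                    }
                } else {
                    sum += item;
                }
            }
        }
        return new long[]{snapshot, sum};
    }

    public static void main(String[] args) {
        long[] result = run(List.of(
                input(1, 2, BARRIER, 100),
                input(10, 20, 30, BARRIER, 200)));
        System.out.println("snapshot=" + result[0] + ", final=" + result[1]); // snapshot=63, final=363
    }
}
```

The first input delivers its barrier early and is paused, so the item 100 behind it is not counted until the second input's barrier arrives; the snapshot therefore contains exactly the pre-barrier items from both inputs (1 + 2 + 10 + 20 + 30).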
  • #60-#62 TODO: animations
  • #74 Why it’s worth considering Jet for your next stream processing task
  • #75-#78 TODO: Key Competitive Differentiators?
  • #79 TODO: Key Competitive Differentiators? Mention that this is an open product, e.g. it’s easy to implement a connector
  • #80 TODO: add resources