IN-MEMORY STREAM PROCESSING WITH HAZELCAST JET
Nazarii Cherkas | Hazelcast
nazarii@hazelcast.com
https://twitter.com/n_cherkas
Brief Agenda
• Why Stream Processing?
• What's special about Streaming Data
• Challenges when processing the Infinite Stream
• Hazelcast Jet: The modern Stream Processing Engine
• Overview and Key Concepts
• Infinite Stream Processing
• Fault Tolerance
• Jet Performance
• Summary
© 2018 Hazelcast Inc.
About me
• 7+ years of experience in various positions, from Java Engineer to Team Lead
• Solutions Architect at Hazelcast: I solve our users' problems and interact with the community
Why Stream Processing?
Streaming Data is everywhere
What's special about Streaming Data
• Infinite data sets
• Small size of data record
• Near real-time insights
• Variance in throughput and variance in disorder
Definitions of Stream Processing
“...a type of data processing that is designed with infinite data sets in
mind...”
“...processing of data in motion, or in other words, computing on data
directly as it is produced or received…”
“...a technique to process the data on-the-fly, prior to its storage...”
https://jet.hazelcast.org/use-cases/real-time-stream-processing/
https://data-artisans.com/what-is-stream-processing
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Stream vs Batch Processing
https://aws.amazon.com/streaming-data/

|                | Batch processing | Stream processing |
|----------------|------------------|-------------------|
| Data scope     | Queries or processing over all or most of the data in the dataset | Queries or processing over data within a rolling time window, or on just the most recent data record |
| Data size      | Large batches of data | Individual records or micro batches consisting of a few records |
| Responsiveness | Latencies in minutes to hours | Requires latency in the order of seconds or milliseconds |
| Analyses       | Complex analytics | Aggregates, simple response functions and rolling metrics |
Layers of Stream Processing
Challenges of Stream Processing
• Distributed system coordination
• Notion of time
• Memory management
• Fault-tolerance
Hazelcast Jet: In-Memory Streaming and
Fast Batch Processing
What is Hazelcast Jet
https://github.com/hazelcast/hazelcast-jet/
Apache License 2.0
Hazelcast Jet use cases
• Low-latency Stream processing and analytics
• Fast Batch processing and ETL
• Distributed java.util.stream
• Implementing event sourcing and CQRS
• Data processing microservice architectures
Hazelcast Jet: Architecture Overview
[Architecture diagram. Layers: Connectors, High-Level APIs, Processing, Core. Components shown: Core API, java.util.stream, Pipeline API, Batch Readers and Writers, Streaming Readers and Writers, Batch Processing, Stream Processing, Networking, Deployment, Data Structures and Partition Management, Execution Engine, Cluster Management with Cloud Discovery SPI, Java Client, Fault-Tolerance]
Key concepts
Directed Acyclic Graph (DAG)
Key concepts
Jet Cluster
Key concepts
Job Execution
Infinite Stream Processing with Jet
Jet Streaming Demo
Flight Telemetry
Processing a near real-time Flight Telemetry Stream from ADS-B Exchange
- https://www.adsbexchange.com/
• Filter out planes outside of defined airports
• Slide a window over the last 1 minute to detect whether the plane is ascending, descending, or staying at the same level
• Based on the plane type and phase of the flight, provide information about maximum noise levels near the airport and estimated CO2 emissions for a region
https://github.com/hazelcast/hazelcast-jet-demos/tree/master/flight-telemetry
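
The sliding-window step of the demo can be illustrated with a minimal plain-Java sketch. This is not the demo's code and does not use the Jet API. The class name `AltitudeTrendSketch`, the 100 ft tolerance, and the choice to compare the oldest and newest altitude in the window (rather than fitting a trend line) are all assumptions made for illustration. Samples are assumed to arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: classifies a plane's vertical trend over a sliding one-minute window. */
final class AltitudeTrendSketch {
    enum Trend { ASCENDING, DESCENDING, LEVEL }

    /** One telemetry sample: event time in milliseconds and altitude in feet. */
    private static final class Sample {
        final long timestampMs;
        final double altitudeFt;

        Sample(long timestampMs, double altitudeFt) {
            this.timestampMs = timestampMs;
            this.altitudeFt = altitudeFt;
        }
    }

    private static final long WINDOW_MS = 60_000;
    // Assumed tolerance: smaller altitude changes count as level flight.
    private static final double TOLERANCE_FT = 100.0;

    private final Deque<Sample> window = new ArrayDeque<>();

    /** Adds a sample (in timestamp order) and returns the trend over the last minute. */
    Trend add(long timestampMs, double altitudeFt) {
        window.addLast(new Sample(timestampMs, altitudeFt));
        // Evict samples that are a full minute or more older than the newest one.
        // The newest sample is never evicted, so the deque cannot become empty here.
        while (window.peekFirst().timestampMs <= timestampMs - WINDOW_MS) {
            window.removeFirst();
        }
        double delta = window.peekLast().altitudeFt - window.peekFirst().altitudeFt;
        if (delta > TOLERANCE_FT) {
            return Trend.ASCENDING;
        }
        if (delta < -TOLERANCE_FT) {
            return Trend.DESCENDING;
        }
        return Trend.LEVEL;
    }
}
```

A caller would keep one `AltitudeTrendSketch` per aircraft and feed it every position report for that aircraft.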
Pipeline transformations
• Time-agnostic transformations
  • Filter
  • Map
  • FlatMap
• Aggregation and Grouping
  • Built-in count, various kinds of averages, min/max, linear trends and many more
• Co-Aggregation
• Hash-Join
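
As a local analogue of the time-agnostic transformations and of grouping with a built-in aggregate, the sketch below uses the standard `java.util.stream` API, which runs on a single JVM. It is not Jet's Pipeline API. The class and method names are hypothetical, and the word-count task is chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch: flatMap, map, filter and group-and-count on a local stream. */
final class TransformationsSketch {
    /** Counts words longer than two characters across all lines, ignoring case. */
    static Map<String, Long> wordCounts(List<String> lines) {
        return lines.stream()
                // flatMap: one line becomes many words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // map: normalize each word
                .map(String::toLowerCase)
                // filter: drop short words (and empty strings left by leading whitespace)
                .filter(word -> word.length() > 2)
                // grouping with a built-in aggregate: count per word, sorted by key
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }
}
```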
Windowing
Watermarks to handle Late Events
A watermark is an educated guess that “from this point on there will be no more
items with a timestamp less than this”
Watermarks in Jet
Predefined Watermark Policies
• With Fixed Lag
• Limiting Lag and Delay
• Limiting Lag and Lull
• Limiting Timestamp and Wall-Clock Lag
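
The idea behind a fixed-lag policy can be sketched in plain Java: the watermark trails the highest timestamp seen so far by a constant lag, and an item whose timestamp falls behind the watermark is considered late. This is a conceptual sketch, not Jet's implementation. The class `FixedLagWatermarkSketch` and its methods are hypothetical names.

```java
/** Hypothetical sketch of a fixed-lag watermark: watermark = highest timestamp seen - lag. */
final class FixedLagWatermarkSketch {
    private final long lagMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    FixedLagWatermarkSketch(long lagMs) {
        if (lagMs < 0) {
            throw new IllegalArgumentException("lag must not be negative");
        }
        this.lagMs = lagMs;
    }

    /** Observes an event timestamp and returns the current watermark, which never moves backwards. */
    long observe(long eventTimestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
        return maxTimestampSeen - lagMs;
    }

    /** An event is late if its timestamp is behind the current watermark. */
    boolean isLate(long eventTimestampMs) {
        // Before the first event there is no watermark, so nothing can be late.
        return maxTimestampSeen != Long.MIN_VALUE
                && eventTimestampMs < maxTimestampSeen - lagMs;
    }
}
```

A larger lag tolerates more disorder in the stream but delays the moment a window's result can be emitted.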
Fault Tolerance
Jet Processing Fault Tolerance
• The Cluster elects a Coordinator Member, which takes care of Job coordination among the Cluster Members
• Jet achieves fault tolerance in streaming jobs by taking snapshots of the internal processing state
• The Coordinator Member detects the failure of another Member and restarts the Job using the new topology
• When the Coordinator Member crashes, the Cluster elects a new one
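
Why a snapshot of processing state lets a restarted job continue without losing or double-counting items can be shown with a single-process sketch. It assumes a replayable source (modeled here as a `List`) and captures the source offset together with the running state. This is not how Jet implements distributed snapshots. All names are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: a running sum that can be snapshotted and restored after a failure. */
final class SnapshotRestoreSketch {
    /** Immutable snapshot: the source offset and the processing state, captured together. */
    static final class Snapshot {
        final int nextOffset;
        final long sum;

        Snapshot(int nextOffset, long sum) {
            this.nextOffset = nextOffset;
            this.sum = sum;
        }
    }

    private int nextOffset;
    private long sum;

    /** Processes up to maxItems items from the replayable source, starting at the current offset. */
    void process(List<Integer> source, int maxItems) {
        int end = (int) Math.min((long) source.size(), (long) nextOffset + maxItems);
        while (nextOffset < end) {
            sum += source.get(nextOffset);
            nextOffset++;
        }
    }

    Snapshot snapshot() {
        return new Snapshot(nextOffset, sum);
    }

    /** Creates a new job instance that resumes exactly where the snapshot was taken. */
    static SnapshotRestoreSketch restore(Snapshot snapshot) {
        SnapshotRestoreSketch job = new SnapshotRestoreSketch();
        job.nextOffset = snapshot.nextOffset;
        job.sum = snapshot.sum;
        return job;
    }

    long sum() {
        return sum;
    }
}
```

Work done after the snapshot and before the failure is discarded. The restored instance replays those items from the saved offset, so each item contributes to the sum once.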
Distributed Snapshots
Technique first described in a paper by Chandy and Lamport in 1985
Jet Processing Guarantees
• At-Least Once
• Exactly Once
• At-Most Once (meaning that the Fault Tolerance is turned off)
Performance
Hazelcast Jet Performance
Key Design Decisions
• DAG to Model Computations
• In-Memory Data Locality
• Partition Mapping Affinity
• SP/SC Queues
• Cooperative Multithreading (Green Threads)
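
To make the SP/SC (single-producer/single-consumer) queue idea concrete, here is a generic bounded ring-buffer sketch in plain Java. The slides do not show Jet's actual queue, so this is only an illustration of the technique. The class name and the power-of-two capacity requirement are assumptions. With exactly one producer thread and one consumer thread, each index has a single writer, so no locks or compare-and-swap loops are needed.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a bounded single-producer/single-consumer ring buffer. */
final class SpscQueueSketch<T> {
    private final Object[] buffer;
    private final int mask;
    // Next slot to read; written only by the consumer thread.
    private final AtomicLong head = new AtomicLong();
    // Next slot to write; written only by the producer thread.
    private final AtomicLong tail = new AtomicLong();

    SpscQueueSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        buffer = new Object[capacity];
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false instead of blocking when the queue is full. */
    boolean offer(T item) {
        Objects.requireNonNull(item);
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;
        }
        buffer[(int) (t & mask)] = item;
        // The volatile write publishes the slot contents to the consumer.
        tail.set(t + 1);
        return true;
    }

    /** Consumer thread only. Returns null instead of blocking when the queue is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        T item = (T) buffer[index];
        buffer[index] = null;
        // The volatile write hands the freed slot back to the producer.
        head.set(h + 1);
        return item;
    }
}
```

Because `offer` and `poll` return immediately instead of blocking, they fit a cooperative threading model in which a task that cannot make progress yields to the next one.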
Jet Streaming Performance
https://jet.hazelcast.org/performance/
Jet Throughput
https://jet.hazelcast.org/performance/
Running Jet in Production
• Docker images - https://github.com/hazelcast/hazelcast-jet-docker
• Cluster Management: Mesos, Yarn
• Cluster Discovery
  • Cloud Providers: AWS, Windows Azure, GCP, PCF, Heroku
  • Kubernetes
  • Consul, Eureka, ZooKeeper
Summary
Why you should consider using Hazelcast Jet
• High Performance | Industry Leading
• Out-of-box integration with Hazelcast IMDG | Source, Sink, Enrichment
• Easy to start with and integrate | Zero dependencies, developer friendly
• Simple to deploy | Embedded 10MB jar or Client-Server
• Works in every Cloud | Same as Hazelcast IMDG
• For Developers by Developers | Code it
Questions?
Version 0.6 is the current release, with 0.7 coming in Q3 2018
and 1.0 targeted for this year
http://jet.hazelcast.org
https://groups.google.com/forum/#!forum/hazelcast-jet
https://gitter.im/hazelcast/hazelcast
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 

In-Memory Stream Processing with Hazelcast Jet @JEEConf

  • 1. IN-MEMORY STREAM PROCESSING WITH HAZELCAST JET Nazarii Cherkas | Hazelcast nazarii@hazelcast.com https://twitter.com/n_cherkas
  • 2. Brief Agenda • Why Stream Processing? • What's special about Streaming Data • Challenges when processing the Infinite Stream • Hazelcast Jet: The modern Stream Processing Engine • Overview and Key Concepts • Infinite Stream Processing • Fault Tolerance • Jet Performance • Summary 2© 2018 Hazelcast Inc.
  • 4. About me • 7+ years of experience in different positions, from Java Engineer to Team Lead • Solutions Architect at Hazelcast: I solve our users' problems and interact with the community
  • 5. Why Stream Processing?
  • 6. Streaming Data is everywhere
  • 10. What's special about Streaming Data • Infinite data sets • Small size of data record • Near real-time insights • Variance in throughput and variance in disorder
  • 13. Definitions of Stream Processing “...a type of data processing that is designed with infinite data sets in mind...” “...processing of data in motion, or in other words, computing on data directly as it is produced or received…” “...a technique to process the data on-the-fly, prior to it’s storage...” https://jet.hazelcast.org/use-cases/real-time-stream-processing/ https://data-artisans.com/what-is-stream-processing https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  • 17. Stream vs Batch Processing https://aws.amazon.com/streaming-data/
      Data scope
        Batch processing: Queries or processing over all or most of the data in the dataset
        Stream processing: Queries or processing over data within a rolling time window, or on just the most recent data record
      Data size
        Batch processing: Large batches of data
        Stream processing: Individual records or micro batches consisting of a few records
      Responsiveness
        Batch processing: Latencies in minutes to hours
        Stream processing: Requires latency in the order of seconds or milliseconds
      Analyses
        Batch processing: Complex analytics
        Stream processing: Aggregates, simple response functions and rolling metrics
  • 18. Layers of Stream Processing
  • 22. Challenges of Stream Processing • Distributed system coordination • Notion of time • Memory management • Fault-tolerance
  • 23. Hazelcast Jet: In-Memory Streaming and Fast Batch Processing
  • 24. What is Hazelcast Jet (diagram labels: Source, Sink) https://github.com/hazelcast/hazelcast-jet/ Apache License 2.0
  • 29. Hazelcast Jet use cases • Low-latency Stream processing and analytics • Fast Batch processing and ETL • Distributed java.util.stream • Implementing event sourcing and CQRS • Data processing microservice architectures
  • 30. Hazelcast Jet: Architecture Overview (diagram labels: Core API; java.util.stream; Batch Readers and Writers; Batch Processing; Pipeline API; Streaming Readers and Writers; Stream Processing; Networking; Deployment; Data Structures and Partition Management; Execution Engine; Cluster Management with Cloud Discovery SPI; Java Client; Fault-Tolerance. Layer names: Connectors; High-Level APIs; Processing; Core)
  • 33. Key concepts: Directed Acyclic Graph (DAG)
  • 34. Key concepts: Jet Cluster
  • 36. Key concepts: Job Execution
  • 37. Infinite Stream Processing with Jet
  • 40. Jet Streaming Demo: Flight Telemetry. Processing a near real-time Flight Telemetry Stream from ADS-B Exchange - https://www.adsbexchange.com/ • Filter out planes outside of defined airports • Slide a window over the last 1 minute to detect whether the plane is ascending, descending or staying at the same level • Based on the plane type and phase of the flight, provide information about maximum noise levels near the airport and estimated CO2 emissions for a region https://github.com/hazelcast/hazelcast-jet-demos/tree/master/flight-telemetry
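The sliding-window step of the demo can be illustrated without any Jet dependency. The following is a minimal plain-Java sketch, not the demo's actual code: the class `AltitudeTrend`, its tolerance parameter and the first-versus-last comparison are assumptions made for illustration (the slides mention linear trends as a built-in aggregation, which a real job would more likely use). It assumes samples for one aircraft arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: keeps the altitude samples of one aircraft for a sliding time window. */
final class AltitudeTrend {
    enum Phase { ASCENDING, DESCENDING, LEVEL }

    private static final class Sample {
        final long timestampMs;
        final double altitudeFt;

        Sample(long timestampMs, double altitudeFt) {
            this.timestampMs = timestampMs;
            this.altitudeFt = altitudeFt;
        }
    }

    private final long windowMs;       // window length, must be > 0
    private final double toleranceFt;  // altitude change treated as "same level"
    private final Deque<Sample> window = new ArrayDeque<>();

    AltitudeTrend(long windowMs, double toleranceFt) {
        this.windowMs = windowMs;
        this.toleranceFt = toleranceFt;
    }

    /** Adds a sample, evicts samples that slid out of the window, and classifies the trend. */
    Phase add(long timestampMs, double altitudeFt) {
        window.addLast(new Sample(timestampMs, altitudeFt));
        // The newest sample is never evicted because windowMs > 0.
        while (window.peekFirst().timestampMs <= timestampMs - windowMs) {
            window.removeFirst();
        }
        // Simplification: compare the oldest and newest sample still in the window.
        double delta = window.peekLast().altitudeFt - window.peekFirst().altitudeFt;
        if (delta > toleranceFt) {
            return Phase.ASCENDING;
        }
        if (delta < -toleranceFt) {
            return Phase.DESCENDING;
        }
        return Phase.LEVEL;
    }
}
```

With a 60-second window, a sample at 30 s that is 1000 ft above the sample at 0 s yields `ASCENDING`; once the older sample slides out of the window, only the remaining samples are compared.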
  • 44. Pipeline transformations • Time-agnostic transformations • Filter • Map • Flatmap • Aggregation and Grouping • Built-in count, different kinds of averages, min/max, linear trends and many more • Co-Aggregation • Hash-Join
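To make the transformation names above concrete, here is a small sketch that uses only the JDK's `java.util.stream`, which the deck lists as one of Jet's high-level APIs. It is not Jet Pipeline API code: `TransformSketch`, `wordCount` and `enrich` are hypothetical names, and the enrichment method only mimics the idea of a hash-join (each item of the primary stream is enriched through a hashtable lookup).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TransformSketch {

    /** Filter, flatMap and map, followed by grouping with a built-in count aggregation. */
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())                     // filter: drop blank lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))  // flatMap: one line -> many words
                .map(String::toLowerCase)                                   // map: normalize each word
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    /** Hash-join style enrichment: look up each code in a (finite) lookup table. */
    static List<String> enrich(List<String> typeCodes, Map<String, String> typeNames) {
        return typeCodes.stream()
                .map(code -> code + "=" + typeNames.getOrDefault(code, "unknown"))
                .collect(Collectors.toList());
    }
}
```

In a distributed engine the grouping step additionally requires routing all items with the same key to the same processor, which a single-JVM stream does not show.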
  • 48. Watermarks to handle Late Events: a watermark makes an educated guess that “from this point on there will be no more items with timestamp less than this”
  • 49. Watermarks in Jet: Predefined Watermark Policies • With Fixed Lag • Limiting Lag and Delay • Limiting Lag and Lull • Limiting Timestamp and Wall-Clock Lag
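The simplest of the policies listed above, a fixed lag, can be sketched in a few lines. This is an illustrative plain-Java model rather than Jet's implementation: the class `FixedLagWatermark` and its methods are hypothetical, and the rule assumed here is that the watermark trails the highest timestamp seen so far by a constant lag, with anything older than the watermark treated as late.

```java
/** Hypothetical model of a fixed-lag watermark policy. */
final class FixedLagWatermark {
    private final long lagMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    FixedLagWatermark(long lagMs) {
        this.lagMs = lagMs;
    }

    /** Current guess: no more events with a timestamp below this value are expected. */
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestampSeen - lagMs;
    }

    /**
     * Observes one event timestamp.
     * Returns true if the event is on time, false if it is behind the watermark (late).
     */
    boolean observe(long eventTimestampMs) {
        if (eventTimestampMs < currentWatermark()) {
            return false; // late event: the watermark has already passed it
        }
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
        return true;
    }
}
```

With a lag of 1000 ms, after an event stamped 5000 the watermark is 4000: an event stamped 4500 is still accepted, while one stamped 3999 is late. A larger lag tolerates more disorder at the cost of higher latency before window results can be emitted.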
  • 50. Fault Tolerance
  • 51. Jet Processing Fault Tolerance: the Cluster elects a Coordinator Member, which takes care of Job coordination among the Cluster Members
  • 52. Jet Processing Fault Tolerance: Jet achieves fault tolerance in streaming jobs by making a snapshot of the internal processing state
  • 53. Jet Processing Fault Tolerance: the Coordinator Member detects the failure of another Member and restarts the Job using the new topology
  • 54. Jet Processing Fault Tolerance: when the Coordinator Member crashes, the Cluster elects a new one
  • 55. Distributed Snapshots: technique first described in a paper by Chandy and Lamport in 1985
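Stream processors commonly adapt the distributed snapshot idea by injecting barrier markers into the data streams. The sketch below models one processor with several inputs that aligns barriers before saving its state. It is a simplified illustration, not Jet's code: the class `AligningSummer`, the running-sum state and the buffering strategy are all assumptions, and the original Chandy-Lamport algorithm records channel state instead of blocking inputs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical processor that sums its inputs and snapshots the sum on aligned barriers. */
final class AligningSummer {
    private final boolean[] barrierSeen;                       // inputs that already delivered the barrier
    private final Queue<Long> buffered = new ArrayDeque<>();   // items that belong to the next epoch
    private final List<Long> snapshots = new ArrayList<>();
    private int barriersPending;
    private long sum;

    AligningSummer(int inputCount) {
        this.barrierSeen = new boolean[inputCount];
        this.barriersPending = inputCount;
    }

    /** Processes an item, or holds it back if its input is already past the barrier. */
    void onItem(int input, long value) {
        if (barrierSeen[input]) {
            buffered.add(value);
        } else {
            sum += value;
        }
    }

    /** Handles a barrier; once all inputs are aligned, saves the state and resumes. */
    void onBarrier(int input) {
        if (barrierSeen[input]) {
            throw new IllegalStateException("duplicate barrier on input " + input);
        }
        barrierSeen[input] = true;
        if (--barriersPending > 0) {
            return; // still waiting for barriers from other inputs
        }
        snapshots.add(sum); // state reflects exactly the items that preceded the barrier
        Arrays.fill(barrierSeen, false);
        barriersPending = barrierSeen.length;
        while (!buffered.isEmpty()) {
            sum += buffered.remove(); // replay items held back during alignment
        }
    }

    long sum() {
        return sum;
    }

    List<Long> snapshots() {
        return new ArrayList<>(snapshots);
    }
}
```

Holding back items from inputs that are already past the barrier is what keeps the saved state consistent across inputs; skipping that alignment is cheaper but can make items count twice after a restart, which corresponds to an at-least-once guarantee.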
  • 61. Jet Processing Guarantees • At-Least Once • Exactly Once • At-Most Once (meaning that the Fault Tolerance is turned off)
  • 67. Hazelcast Jet Performance: Key Design Decisions • DAG to Model Computations • In-Memory Data Locality • Partition Mapping Affinity • SP/SC (single-producer, single-consumer) Queues • Cooperative Multithreading (Green Threads)
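Cooperative multithreading means that many small tasks share one worker thread and voluntarily return control after a short, non-blocking unit of work, so no OS-level context switch is needed between them. The following sketch shows the idea in plain Java. The `Tasklet` interface and `CooperativeWorker` class here are hypothetical and much simpler than anything in Jet; they are not its API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class CooperativeWorker {

    /** Hypothetical tasklet: performs a small, non-blocking unit of work per call. */
    interface Tasklet {
        /** Returns true if more work remains, false once the tasklet is done. */
        boolean call();
    }

    /** Runs all tasklets on the calling thread, round-robin, until each one is done. */
    static void runAll(List<Tasklet> tasklets) {
        List<Tasklet> active = new ArrayList<>(tasklets);
        while (!active.isEmpty()) {
            for (Iterator<Tasklet> it = active.iterator(); it.hasNext(); ) {
                if (!it.next().call()) {
                    it.remove(); // finished tasklets leave the rotation
                }
            }
        }
    }

    /** Example tasklet: appends its label to a log once per call, for a fixed number of calls. */
    static Tasklet labelling(String label, int steps, List<String> log) {
        int[] remaining = {steps};
        return () -> {
            if (remaining[0] <= 0) {
                return false;
            }
            log.add(label);
            return --remaining[0] > 0;
        };
    }
}
```

Running `labelling("a", 2, log)` and `labelling("b", 1, log)` together produces the log `[a, b, a]`, showing the interleaving on a single thread. The design only works if every tasklet keeps its unit of work short and never blocks, since one blocking call would stall all tasklets on that thread.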
  • 68. Jet Streaming Performance https://jet.hazelcast.org/performance/
  • 69. Jet Throughput https://jet.hazelcast.org/performance/
  • 72. Running Jet in Production • Docker images - https://github.com/hazelcast/hazelcast-jet-docker • Cluster Management: Mesos, YARN • Cluster Discovery • Cloud Providers: AWS, Windows Azure, GCP, PCF, Heroku • Kubernetes • Consul, Eureka, Zookeeper
  • 78. Summary: Why you should consider using Hazelcast Jet • High Performance | Industry Leading • Out-of-box integration with Hazelcast IMDG | Source, Sink, Enrichment • Easy to start with and integrate | Zero dependencies, developer friendly • Simple to deploy | Embedded 10MB jar or Client-Server • Works in every Cloud | Same as Hazelcast IMDG • For Developers by Developers | Code it
  • 79. Questions? Version 0.6 is the current release, with 0.7 coming in Q3 2018 and 1.0 targeted for this year http://jet.hazelcast.org https://groups.google.com/forum/#!forum/hazelcast-jet https://gitter.im/hazelcast/hazelcast

Editor's Notes

  1. TODO: review and move comments from Google Shit! presentation
  2. TODO: add contacts !!! TODO: what’s written? :)
  4. - the answer is that streaming data [definition of term] is everywhere and it’s usually about … - all these examples of data are generated all the time and usually come with important real-time insights that require processing here and now TODO: too much, remove gaming activities
  5. - fraud detection - alerts generation - variance in throughput -> auto-scaling - disorder -> e.g., a plane full of people taking their phones out of airplane mode after having used them offline for the entire flight - disorder -> producer parallelism and retries: specific to the tools that are used, due to their internals, especially when using batching
  9. - let’s try to understand what Stream Processing is - the key things: on the fly prior to its storage, infinite data sets in mind, data in motion
  12. How is it different from classical Batch Processing, where we run periodic jobs to handle our data? TODO: review and maybe come up with own points TODO: combine 1 & 2
  16. 1. Architecturally, a stream processing system usually consists of the following 2 layers 2. Now let’s see how a typical Stream Processing system looks in practice TODO: icons for tech stack of each layer ?!!!! TODO: make horizontal?!!!
  17. - hence, all this doesn’t come for free; there are multiple challenges to solve when you are Processing the Infinite Stream - problems: how to form the cluster, how to coordinate, and how to control the required level of consistency
  20. - how to solve these problems? - next slide -
  21. - Hazelcast Jet is one of the products that aim to solve such problems
  22. Architecturally, Jet consists of the following layers
  23. TODO: where is DAG API here? A Jet Member is also a fully functional Hazelcast IMDG Member, and a Jet Cluster is also a Hazelcast IMDG Cluster. Hazelcast IMDG provides: - Layer of cluster management, deployment, data partitioning and networking - In-Memory store for Jet Processing state - Shared state to connect multiple Jet Jobs - Remote data caching - Enrichment data source
  25. TODO: unify orange color among slides! TODO: animation
  26. - Uses Hazelcast IMDG Clustering under the hood - Peer-To-Peer communication - Members can be either set statically or automatically discovered - Elastically scales up or down - Topologies: Embedded, Client-Server
  28. - Unit of work described by a DAG, which is submitted to the cluster for execution - Asynchronous, Distributed - Submitted to each running member - *Scales up/down when adding or removing members - Embeds JAR with the source code, if needed
  29. Automatic Dependent Surveillance-Broadcast (ADS-B) is a surveillance technology in which an aircraft determines its position via satellite navigation and periodically broadcasts it, enabling it to be tracked. The information can be received by air traffic control ground stations as a replacement for secondary surveillance radar, as no interrogation signal is needed from the ground. It can also be received by other aircraft to provide situational awareness and allow self-separation. ADS-B is "automatic" in that it requires no pilot or external input. It is "dependent" in that it depends on data from the aircraft's navigation system.
  30. TODO: more info plus diagram
  32. Context propagation for map, flatMap and filter 2) Aggregation and Grouping Transformation of a set of input values sharing the same distinct key into a single output value Build-in Aggregate Operations for count, different kind avagares, min/max, linear trends and many more Easy to implement own aggregations 3) Co-Aggregation groupBy over the items from more than one contributing stream Like JOIN with the Group By in SQL Typical use case - collecting stats over the user activity coming from the several streams 4) Hash-Join Join of one finite stream with another, possibly infinite stream Optimized for data enrichment - when each item of the primary stream gets enriched with the data resolved by a hashtable lookup To optimize the performance, the entire enriching stream is replicated on each Jet member
  33. Context propagation for map, flatMap and filter 2) Aggregation and Grouping Transformation of a set of input values sharing the same distinct key into a single output value Build-in Aggregate Operations for count, different kind avagares, min/max, linear trends and many more Easy to implement own aggregations 3) Co-Aggregation groupBy over the items from more than one contributing stream Like JOIN with the Group By in SQL Typical use case - collecting stats over the user activity coming from the several streams 4) Hash-Join Join of one finite stream with another, possibly infinite stream Optimized for data enrichment - when each item of the primary stream gets enriched with the data resolved by a hashtable lookup To optimize the performance, the entire enriching stream is replicated on each Jet member
  34. Context propagation for map, flatMap and filter 2) Aggregation and Grouping Transformation of a set of input values sharing the same distinct key into a single output value Build-in Aggregate Operations for count, different kind avagares, min/max, linear trends and many more Easy to implement own aggregations 3) Co-Aggregation groupBy over the items from more than one contributing stream Like JOIN with the Group By in SQL Typical use case - collecting stats over the user activity coming from the several streams 4) Hash-Join Join of one finite stream with another, possibly infinite stream Optimized for data enrichment - when each item of the primary stream gets enriched with the data resolved by a hashtable lookup To optimize the performance, the entire enriching stream is replicated on each Jet member
  35. Co-Aggregation: join page visits, user data and payments
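Co-aggregation, as described in these notes, groups items from more than one contributing stream by a shared key and produces one result per key. The sketch below shows that shape in plain Java for two of the streams mentioned (page visits and payments). It does not use Jet's API: `CoAggregationSketch`, `coGroupCounts`, and the user ids are hypothetical, and finite lists stand in for the streams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java collections, not the Hazelcast Jet API.
public class CoAggregationSketch {

    // One accumulator per user id: index 0 counts page visits, index 1 counts payments.
    static Map<String, long[]> coGroupCounts(List<String> visitUserIds,
                                             List<String> paymentUserIds) {
        Map<String, long[]> statsByUser = new TreeMap<>();
        for (String userId : visitUserIds) {
            statsByUser.computeIfAbsent(userId, k -> new long[2])[0]++;
        }
        for (String userId : paymentUserIds) {
            statsByUser.computeIfAbsent(userId, k -> new long[2])[1]++;
        }
        return statsByUser;
    }

    public static void main(String[] args) {
        // Hypothetical input: each element is the user id attached to one event.
        List<String> visits = Arrays.asList("alice", "bob", "alice");
        List<String> payments = Arrays.asList("alice");

        coGroupCounts(visits, payments).forEach((user, counts) ->
                System.out.println(user + ": visits=" + counts[0] + ", payments=" + counts[1]));
    }
}
```

Both inputs feed the same per-key accumulator, which is the sense in which this resembles a JOIN combined with GROUP BY in SQL.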
  36. TODO: must be “Event time” on axis
  37. TODO: Add a client App and make animations.
  38. TODO: animations. TODO: add a final step for when the snapshot has completed. Due to parallelism, in most cases a processor receives data from more than one upstream processor -
  42. TODO: animations
  45. Why it’s worth considering Jet for your next stream processing task
  46. TODO: Key Competitive Differentiators?
  50. TODO: Key Competitive Differentiators? Mention that this is an open product, e.g. it’s easy to implement a connector
  51. TODO: add resources