TODO: review and move comments from the Google Slides presentation
TODO: add contacts !!!
TODO: what’s written? :)
- the answer is that streaming data [definition of term] is everywhere, and it’s usually about …
- all these kinds of data are generated continuously and usually carry important real-time insights that have to be processed here and now
TODO: too much, remove gaming activities
- fraud detection
- alerts generation
- variance in throughput -> auto-scaling
- disorder -> e.g., a plane full of people taking their phones out of airplane mode after having used them offline for the entire flight
- disorder -> producer parallelism and retries; how much reordering happens depends on the internals of the tools used, especially when they batch
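The "disorder" points above can be illustrated with a small reordering buffer. This is a minimal sketch in plain Java, not the API of Jet or any other tool: the class `ReorderBuffer` and its `maxLag` parameter are hypothetical, and it assumes events arrive at most `maxLag` time units out of order. Anything older than the current watermark is counted as late and dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper (not a library class): restores event-time order
// for events that arrive at most `maxLag` time units out of order.
class ReorderBuffer {
    private final long maxLag;
    private final PriorityQueue<Long> pending = new PriorityQueue<>();
    private long watermark = Long.MIN_VALUE;
    private int lateCount = 0;

    ReorderBuffer(long maxLag) {
        this.maxLag = maxLag;
    }

    // Accepts one event timestamp and returns the timestamps that are
    // now safe to emit, in ascending order.
    List<Long> accept(long timestamp) {
        List<Long> ready = new ArrayList<>();
        if (timestamp < watermark) {
            lateCount++; // arrived after the watermark passed it: dropped
            return ready;
        }
        pending.add(timestamp);
        // The watermark trails the highest timestamp seen so far by maxLag.
        watermark = Math.max(watermark, timestamp - maxLag);
        while (!pending.isEmpty() && pending.peek() <= watermark) {
            ready.add(pending.poll());
        }
        return ready;
    }

    int lateCount() {
        return lateCount;
    }

    public static void main(String[] args) {
        ReorderBuffer buffer = new ReorderBuffer(5);
        for (long ts : new long[] {10, 7, 12, 3, 20}) {
            System.out.println("in=" + ts + " out=" + buffer.accept(ts));
        }
        System.out.println("late events: " + buffer.lateCount());
    }
}
```

The trade-off is latency versus completeness: a larger `maxLag` tolerates more disorder but delays every result by that much.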
- let’s try to understand what Stream Processing is
- the key things: processing on the fly, prior to storage; having an infinite data set in mind; data in motion
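To make "on the fly, prior to storage" concrete, here is a minimal sketch in plain Java (not a Jet API; the class name `RunningAverage` is made up for illustration). It keeps constant-size state and updates the result per item, so it works the same whether the source is finite or infinite.

```java
import java.util.stream.DoubleStream;

// Hypothetical illustration: O(1) state, no item is ever stored.
class RunningAverage {
    private long count;
    private double mean;

    // Folds one item into the state and returns the up-to-date average.
    double add(double value) {
        count++;
        mean += (value - mean) / count; // incremental mean update
        return mean;
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        // The source is unbounded in principle; it is limited here only
        // so that the demo terminates.
        DoubleStream.iterate(1.0, x -> x + 1.0)
                .limit(5)
                .forEach(x -> System.out.println("item=" + x + " average=" + avg.add(x)));
    }
}
```

A batch job would instead collect all items first and compute the average once at the end of the period.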
How is it different from classical Batch Processing, where we run periodic jobs to handle our data?
TODO: review and maybe come up with own points
TODO: combine 1 & 2
1. Architecturally, a stream processing system usually consists of the following two layers
2. Now let’s see how a typical Stream Processing system looks in practice
TODO: icons for tech stack of each layer ?!!!!
TODO: make horizontal?!!!
- all this doesn’t come for free, though: there are multiple challenges to solve when you are processing an infinite stream
- problems: how to form the cluster, how to coordinate it, and how to control the required level of consistency
- how to solve these problems?
- next slide
- Hazelcast Jet is one of the products that aim to solve such problems
Architecturally, Jet consists of the following layers
TODO: where is DAG API here?
A Jet Member is also a fully functional Hazelcast IMDG Member, and a Jet Cluster is also a Hazelcast IMDG Cluster
Hazelcast IMDG provides:
- Layer of cluster management, deployment, data partitioning and networking
- In-Memory store for Jet Processing state
- Shared state to connect multiple Jet Jobs
- Remote data caching
- Enrichment data source
TODO: unify orange color among slides!
TODO: animation
Uses Hazelcast IMDG Clustering under the hood
Peer-To-Peer communication
Members can be either set statically or automatically discovered
Elastically scales up or down
Topologies:
- Embedded
- Client-Server
Unit of work described by a DAG, which is submitted to the cluster for execution
Asynchronous, distributed
Submitted to each running member
Scales up/down when members are added or removed
Embeds a JAR with the job’s code, if needed
Automatic dependent surveillance — broadcast (ADS–B) is a surveillance technology in which an aircraft determines its position via satellite navigation and periodically broadcasts it, enabling it to be tracked. The information can be received by air traffic control ground stations as a replacement for secondary surveillance radar, as no interrogation signal is needed from the ground. It can also be received by other aircraft to provide situational awareness and allow self-separation.
ADS–B is "automatic" in that it requires no pilot or external input. It is "dependent" in that it depends on data from the aircraft's navigation system.[1]
TODO: more info plus diagram
Context propagation for map, flatMap and filter
2) Aggregation and Grouping
Transformation of a set of input values sharing the same distinct key into a single output value
Built-in Aggregate Operations for count, different kinds of averages, min/max, linear trends and many more
Easy to implement your own aggregations
3) Co-Aggregation
groupBy over the items from more than one contributing stream
Like a JOIN combined with GROUP BY in SQL
Typical use case - collecting stats on user activity coming from several streams
4) Hash-Join
Join of one finite stream with another, possibly infinite stream
Optimized for data enrichment - when each item of the primary stream gets enriched with the data resolved by a hashtable lookup
To optimize the performance, the entire enriching stream is replicated on each Jet member
Co-Aggregation – join page visits, user data and payments
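The co-aggregation idea can be sketched in plain Java. This is only an illustration of the concept and does not use the Jet API: `CoAggregationSketch`, `UserStats` and `coAggregate` are hypothetical names, and only two of the streams mentioned above (page visits and payments) are modeled. Both streams feed the same per-user accumulator, which is what distinguishes co-aggregation from two separate groupings.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration in plain Java, not the Jet API.
class CoAggregationSketch {
    // One accumulator per key, fed by both contributing streams.
    static final class UserStats {
        long pageVisits;
        long paymentTotal;

        @Override
        public String toString() {
            return "visits=" + pageVisits + ", payments=" + paymentTotal;
        }
    }

    // visitsByUser: one user id per page visit.
    // paymentsByUser: (user id, amount) pairs.
    static Map<String, UserStats> coAggregate(List<String> visitsByUser,
                                              List<Map.Entry<String, Long>> paymentsByUser) {
        Map<String, UserStats> result = new TreeMap<>();
        for (String userId : visitsByUser) {
            result.computeIfAbsent(userId, k -> new UserStats()).pageVisits++;
        }
        for (Map.Entry<String, Long> payment : paymentsByUser) {
            result.computeIfAbsent(payment.getKey(), k -> new UserStats())
                    .paymentTotal += payment.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, UserStats> stats = coAggregate(
                List.of("alice", "bob", "alice"),
                List.of(Map.entry("alice", 500L), Map.entry("carol", 250L)));
        stats.forEach((user, s) -> System.out.println(user + ": " + s));
    }
}
```

A key that appears in only one stream still gets a result, with the other side left at its initial value.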
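The hash-join described above boils down to two steps, sketched here in plain Java rather than with the Jet API. The names `HashJoinSketch`, `buildLookup` and `enrich`, and the order/product data, are assumptions made for illustration. The finite enriching stream is drained into a hashtable first; the primary stream, which may be infinite, is then enriched lazily, one lookup per item.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration in plain Java, not the Jet API.
class HashJoinSketch {
    // Step 1: drain the finite enriching stream into a hashtable.
    static Map<Integer, String> buildLookup(Stream<Map.Entry<Integer, String>> enriching) {
        Map<Integer, String> table = new HashMap<>();
        enriching.forEach(e -> table.put(e.getKey(), e.getValue()));
        return table;
    }

    // Step 2: enrich each primary item (productId, quantity) with a lookup.
    // The mapping is lazy, so the primary stream does not have to be finite.
    static Stream<String> enrich(Stream<Map.Entry<Integer, Integer>> orders,
                                 Map<Integer, String> productNames) {
        return orders.map(o ->
                productNames.getOrDefault(o.getKey(), "unknown") + " x" + o.getValue());
    }

    public static void main(String[] args) {
        Map<Integer, String> names = buildLookup(
                Stream.of(Map.entry(1, "apple"), Map.entry(2, "pear")));
        List<String> enriched = enrich(
                Stream.of(Map.entry(1, 3), Map.entry(2, 1), Map.entry(9, 5)), names)
                .collect(Collectors.toList());
        System.out.println(enriched);
    }
}
```

In a cluster, each member would hold its own full copy of the lookup table, so no network hop is needed per item; that is the replication mentioned above.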
TODO: must be “Event time” on axis
TODO: Add a client App and make animations.
TODO: ANIMATIONS!!!
TODO: add a final step – when the snapshot completed
- due to parallelism, in most cases a processor receives data from more than one upstream processor
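Why multiple upstream processors matter for a snapshot can be shown with a barrier-alignment sketch. The slides themselves are not part of these notes, so this is an assumption about what they illustrate, written in plain Java and not taken from Jet's implementation. `BarrierAlignmentSketch`, `BARRIER` and `run` are hypothetical names. The processor sums its inputs; once it sees a barrier on one input, it stops reading that input until the barrier has arrived on all of them, and only then records its state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of barrier alignment, not Jet's implementation.
class BarrierAlignmentSketch {
    // Sentinel value standing in for a snapshot barrier (an assumption of this sketch).
    static final int BARRIER = Integer.MIN_VALUE;

    // Sums items from all inputs and returns the state recorded at each snapshot.
    static List<Long> run(List<Deque<Integer>> inputs) {
        long sum = 0;
        boolean[] blocked = new boolean[inputs.size()];
        int blockedCount = 0;
        List<Long> snapshots = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int i = 0; i < inputs.size(); i++) {
                // An input that already delivered its barrier is not read
                // until the barrier has arrived on every other input too.
                if (blocked[i] || inputs.get(i).isEmpty()) {
                    continue;
                }
                int item = inputs.get(i).poll();
                progressed = true;
                if (item == BARRIER) {
                    blocked[i] = true;
                    blockedCount++;
                    if (blockedCount == inputs.size()) {
                        snapshots.add(sum); // state reflects pre-barrier items only
                        Arrays.fill(blocked, false);
                        blockedCount = 0;
                    }
                } else {
                    sum += item;
                }
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        Deque<Integer> fast = new ArrayDeque<>(List.of(BARRIER, 100));
        Deque<Integer> slow = new ArrayDeque<>(List.of(1, 2, 3, BARRIER));
        System.out.println("snapshots: " + run(List.of(fast, slow)));
    }
}
```

Without the blocking step, the item that follows the barrier on the faster input would leak into the saved state, and it would be counted twice after a restore and replay.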
TODO: animations
Why it’s worth considering Jet for your next stream processing task
TODO: Key Competitive Differentiators?
Mention that this is an open product, e.g. it’s easy to implement a connector