distributed stream processing
@humbertostreb
Samza overview
An open-source distributed stream processing framework created by LinkedIn
- sub-second latency
- handles large amounts of state
- fault tolerance
- no messages are ever lost
- partitioned and distributed at every level
- processor isolation
- pluggable
Architecture
Streaming: Kafka
Execution: YARN
Processing: Samza
Kafka
- The stream may be sharded into one or more partitions.
- Each partition is independent from the others, and is replicated
across multiple machines.
- Each partition consists of a sequence of messages in a fixed order.
- Each message has an offset, which indicates its position in that
sequence.
- A Samza job can start consuming the sequence of messages from
any starting offset.
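The partition and offset model above can be sketched with a toy in-memory stream. This is an illustrative model only, not the Kafka client API: the class name `PartitionedStream`, its methods, and the byte-sum key hashing are all assumptions made for the example.

```python
class PartitionedStream:
    """Toy in-memory model of a partitioned, append-only stream (not the Kafka API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, message):
        # Assumption: route by a simple hash of the key, so messages with
        # the same key always land in the same partition.
        partition = sum(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(message)
        # The offset is the message's position within its partition.
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read_from(self, partition, offset):
        # A consumer may start at any offset and reads in the partition's fixed order.
        return self.partitions[partition][offset:]


stream = PartitionedStream(num_partitions=2)
placements = [
    stream.append(key, f"{key}-event-{i}")
    for i, key in enumerate(["a", "b", "a", "b", "a"])
]
# Start consuming partition 1 from offset 1, skipping its first message.
replay = stream.read_from(1, 1)
```

Ordering is only guaranteed inside a partition; the two partitions here advance independently of each other.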
YARN
- ResourceManager
- NodeManager
- ApplicationMaster
Streams
A stream is composed of immutable messages of a similar type or
category
- when a job consumes more than one stream, the next message is chosen
round-robin by default, but this can be overridden
- streams can be prioritised by configuration
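A minimal sketch of the two choosing policies, assuming per-stream buffers that are already filled. The function `choose_messages` and its `priorities` argument are hypothetical names for this example and do not reproduce Samza's own chooser interface; real prioritisation is applied as messages arrive, while this version works on static buffers.

```python
from collections import deque


def choose_messages(buffers, priorities=None):
    """Yield (stream, message) pairs from per-stream buffers.

    Streams with a higher priority are drained first; streams sharing a
    priority are visited round-robin. Illustrative model only.
    """
    priorities = priorities or {}
    queues = {name: deque(messages) for name, messages in buffers.items()}
    # Distinct priority levels, highest first (default priority is 0).
    levels = sorted({priorities.get(name, 0) for name in queues}, reverse=True)
    for level in levels:
        ring = deque(name for name in queues if priorities.get(name, 0) == level)
        while ring:
            name = ring.popleft()
            if queues[name]:
                yield name, queues[name].popleft()
                ring.append(name)  # go to the back of the ring
            # A stream with no messages left simply drops out of the ring.


buffers = {"clicks": ["c1", "c2", "c3"], "views": ["v1"]}
round_robin = list(choose_messages(buffers))
prioritised = list(choose_messages(buffers, priorities={"views": 1}))
```

With no priorities the two streams alternate until one runs dry; giving `views` a higher priority makes its message come out before any `clicks` message.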
Job
A job is code that performs a logical
transformation on a set of input
streams to append output messages to
a set of output streams.
Partitions
Each stream is broken into one or
more partitions. Each partition in
the stream is a totally ordered
sequence of messages.
Task
A job is scaled by breaking it into
multiple tasks. The task is the unit of
parallelism of the job, just as the
partition is to the stream. Each task
consumes data from one partition for
each of the job’s input streams.
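The task-to-partition relationship can be sketched as a small mapping function. The name `assign_partitions_to_tasks` is hypothetical, and the sketch assumes the number of tasks equals the largest partition count among the input streams.

```python
def assign_partitions_to_tasks(input_streams):
    """Map task index -> list of (stream, partition) pairs it consumes.

    input_streams: dict of stream name -> partition count.
    Task i gets partition i of every input stream that has such a partition.
    """
    # Assumption: one task per partition index, up to the largest stream.
    num_tasks = max(input_streams.values())
    return {
        task: [
            (stream, task)
            for stream, count in input_streams.items()
            if task < count
        ]
        for task in range(num_tasks)
    }


assignment = assign_partitions_to_tasks({"page-views": 4, "ad-clicks": 4})
```

Because task 2 sees partition 2 of both streams, messages that were partitioned by the same key in both streams meet in the same task.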
Containers
Containers are the unit of physical
parallelism, and a container is
essentially a Unix process (or Linux
cgroup). Each container runs one or
more tasks.
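A short sketch of how tasks might be spread over containers. The round-robin placement and the name `group_tasks_into_containers` are assumptions for illustration; the slide only states that each container runs one or more tasks.

```python
def group_tasks_into_containers(num_tasks, num_containers):
    """Distribute task indices over containers in round-robin order."""
    containers = [[] for _ in range(num_containers)]
    for task in range(num_tasks):
        containers[task % num_containers].append(task)
    return containers


even_split = group_tasks_into_containers(num_tasks=4, num_containers=2)
uneven_split = group_tasks_into_containers(num_tasks=5, num_containers=2)
```

The number of tasks fixes the logical parallelism, while the number of containers decides how many processes that work is packed into.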
SamzaContainer startup steps
1 - Get last checkpointed offset for each input stream partition
2 - Create a “reader” thread for every input stream partition
3 - Start metrics reporters to report metrics
4 - Start a checkpoint timer to save your task’s input stream offsets
every so often
5 - Start a window timer to trigger your task’s window method, if it is
defined
6 - Instantiate and initialize your StreamTask once for each input
stream partition
7 - Start an event loop that takes messages from the input stream reader
threads, and gives them to your StreamTasks
8 - Notify lifecycle listeners during each one of these steps
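A simplified, single-threaded sketch of steps 1, 5, 6 and 7. It is a stand-in rather than Samza's implementation: `CountingTask` and `run_container` are hypothetical names, reader threads are replaced by polling in-memory lists, and a message counter stands in for the window timer.

```python
class CountingTask:
    """Hypothetical task: counts messages and emits the count on each window."""

    def __init__(self, partition):
        self.partition = partition
        self.count = 0
        self.emitted = []

    def process(self, message):
        self.count += 1

    def window(self):
        self.emitted.append(self.count)
        self.count = 0


def run_container(partitions, checkpoints, window_every):
    """Run the startup sequence and event loop over in-memory partitions."""
    # Step 1: resume each partition from its last checkpointed offset.
    offsets = {p: checkpoints.get(p, 0) for p in partitions}
    # Step 6: one task instance per input stream partition.
    tasks = {p: CountingTask(p) for p in partitions}
    processed = 0
    # Step 7: event loop; it polls partitions in turn instead of reader threads.
    while any(offsets[p] < len(msgs) for p, msgs in partitions.items()):
        for p, msgs in partitions.items():
            if offsets[p] < len(msgs):
                tasks[p].process(msgs[offsets[p]])
                offsets[p] += 1
                processed += 1
                # Step 5: every `window_every` messages, call window() on all tasks.
                if processed % window_every == 0:
                    for task in tasks.values():
                        task.window()
    return tasks, offsets


partitions = {0: ["a", "b", "c"], 1: ["d", "e"]}
tasks, offsets = run_container(partitions, checkpoints={0: 1}, window_every=2)
```

Partition 0 starts at offset 1 because of its checkpoint, so message `a` is not handed to the task again.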
Checkpointing
Samza writes checkpoints to a separate Kafka topic called
__samza_checkpoint_<job-name>_<job-id>
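What restarting from a checkpoint implies can be shown with a small simulation. The function `run_until` and the fixed checkpoint interval are assumptions for the example; a plain variable stands in for the checkpoint topic.

```python
def run_until(messages, start_offset, stop_before, checkpoint_every, handled):
    """Process messages[start_offset:stop_before]; return the last checkpointed offset."""
    checkpoint = start_offset
    for offset in range(start_offset, stop_before):
        handled.append(messages[offset])
        # Checkpoint the offset of the *next* message to read.
        if (offset + 1) % checkpoint_every == 0:
            checkpoint = offset + 1
    return checkpoint


messages = ["m0", "m1", "m2", "m3", "m4", "m5"]
handled = []
# First run crashes after handling m0..m4; the last checkpoint was taken after m3.
checkpoint = run_until(messages, 0, 5, checkpoint_every=2, handled=handled)
# The restart resumes from the checkpoint, so m4 is handled a second time.
run_until(messages, checkpoint, len(messages), checkpoint_every=2, handled=handled)
```

No message is skipped, but anything handled after the last checkpoint is replayed, so task logic should tolerate duplicates.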
State Management
- fast access to state through a local database
- fault tolerance by sending the local store's
writes to a replicated changelog, combined
with checkpointing
- out-of-the-box support for RocksDB
(key-value)
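The changelog idea can be sketched with a dictionary as the local store and a list as the replicated changelog. `ChangelogBackedStore` is a hypothetical class for illustration; it does not use RocksDB or any Samza API.

```python
class ChangelogBackedStore:
    """Dict-backed stand-in for a local key-value store with a changelog."""

    def __init__(self, changelog):
        # The shared list stands in for a durable, replicated changelog stream.
        self.changelog = changelog
        self.local = {}

    def put(self, key, value):
        # Every local write is also appended to the changelog.
        self.local[key] = value
        self.changelog.append((key, value))

    def get(self, key):
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog):
        # Rebuild local state on a new machine by replaying the changelog in order.
        store = cls(changelog)
        for key, value in list(changelog):
            store.local[key] = value
        return store


changelog = []
store = ChangelogBackedStore(changelog)
store.put("user-1", 3)
store.put("user-2", 1)
store.put("user-1", 4)
# Simulate losing the machine: only the changelog survives.
recovered = ChangelogBackedStore.restore(changelog)
```

Reads stay local and fast, while the changelog is only needed when a task has to be rebuilt elsewhere.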
Event Loop
- synchronous tasks run on a single thread by default, but this is
configurable
- asynchronous tasks are always invoked on a single thread, while
callbacks can be triggered from a different thread.
Samza will make sure that checkpointing is automatically performed
only after the async calls have completed.
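One way to picture checkpointing only after async calls complete is to track in-flight offsets and never checkpoint past the oldest unfinished one. `AsyncOffsetTracker` is a hypothetical helper for this sketch; how Samza tracks callbacks internally is not described in the slides.

```python
class AsyncOffsetTracker:
    """Tracks which offset is safe to checkpoint when callbacks finish out of order."""

    def __init__(self):
        self.in_flight = set()
        self.next_offset = 0

    def dispatch(self):
        # Hand the next message to an async task and remember it as pending.
        offset = self.next_offset
        self.in_flight.add(offset)
        self.next_offset += 1
        return offset

    def complete(self, offset):
        # Called from the callback, possibly on another thread in a real system.
        self.in_flight.discard(offset)

    def safe_checkpoint(self):
        # Everything below the oldest in-flight offset has completed.
        return min(self.in_flight) if self.in_flight else self.next_offset


tracker = AsyncOffsetTracker()
first, second, third = tracker.dispatch(), tracker.dispatch(), tracker.dispatch()
tracker.complete(third)
after_out_of_order = tracker.safe_checkpoint()
tracker.complete(first)
after_first = tracker.safe_checkpoint()
tracker.complete(second)
after_all = tracker.safe_checkpoint()
```

Even though the third call finished first, the checkpoint cannot move until the earlier calls are done.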
Metrics
Samza has its own library to expose metrics, with counters, gauges and
timers.
Metrics can be exposed through JMX, a Kafka topic, and so on.
Security
Samza provides no security.
All security is implemented in the stream system, or in the environment
in which Samza containers run.
Links
https://www.infoq.com/presentations/samza-linkedin
http://es.slideshare.net/martinkleppmann/samza-at-linkedin-taking-stream-processing-to-the-next-level
thanks

Apache samza
