Apache Samza

This presentation gives an overview of the Apache Samza project. It explains Samza's stream processing capabilities as well as its architecture, users, and use cases. Links for further information and for connecting with the author: http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ https://nz.linkedin.com/pub/mike-frampton/20/630/385 https://open-source-systems.blogspot.com/

What Is Apache Samza?
● An asynchronous computational framework
● For distributed, sub-second stream processing
● Fault tolerance, isolation and stateful processing
● Open source / Apache 2.0 license
● Developed in Java and Scala
● Runs stand-alone or on YARN

Samza Use Cases
● Applications that require millisecond-to-second response times
– Streaming analytics
– DDoS attack detection
– Fraud detection
– Metric anomaly detection
– System notifications
– Performance monitoring

Samza Users

Samza Partitioned Stream
● Samza uses streams to process data
● A stream is a collection of ordered, immutable objects (messages)
● Each object is a key-value pair
● Each stream is sharded into partitions
● Partitioning allows the architecture to scale
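
The partitioning idea above can be sketched in a few lines of plain Python. This is only an illustration, not Samza code: the partition count, the CRC32 hash and the helper names `partition_for` and `append` are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for the example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(stream, key, value):
    """Append a (key, value) message to the partition chosen by its key.

    Returns the partition and the message's offset within that partition.
    """
    partition = partition_for(key)
    stream[partition].append((key, value))
    return partition, len(stream[partition]) - 1


# A stream modelled as partition id -> ordered list of messages.
stream = defaultdict(list)
for user, page in [("alice", "/home"), ("bob", "/cart"), ("alice", "/pay")]:
    append(stream, user, page)

# All messages with the same key land in the same partition, in arrival order.
alice_pages = [v for k, v in stream[partition_for("alice")] if k == "alice"]
```

Because ordering is kept per partition, messages that share a key stay ordered relative to each other, while different partitions can be consumed in parallel.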

Samza APIs
● High Level Streams API (Java)
– Stream-based processing API
● Low Level Task API (Java)
– Message-based processing API
● Table API
– Random access by key to data sources
● Testing Samza
– Samza's integration testing framework
● Samza SQL
– Stream processing via SQL and UDFs
● Apache Beam
– Samza provides a Beam runner for application execution

Samza Architecture
● Applications are broken down into tasks
● Each task consumes data from a stream partition
● Tasks are executed within containers
● A coordinator assigns tasks to containers
● Tasks checkpoint the offset of their last processed message
● Each task has its own state store for state management
● Samza replicates changes to the local store in a separate stream (a changelog)
● This allows later recovery of local stores
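
The checkpointing bullet above can be illustrated with a small plain-Python sketch. It does not use the Samza API: the `CountingTask` class, the `checkpoints` dictionary standing in for durable checkpoint storage, and the `checkpoint_every` and `stop_after` parameters are hypothetical names introduced for the example.

```python
class CountingTask:
    """Hypothetical task that reads one partition and checkpoints its offset."""

    def __init__(self, partition, checkpoints, checkpoint_every=2):
        self.partition = partition
        self.checkpoints = checkpoints  # stand-in for durable storage: partition -> next offset
        self.checkpoint_every = checkpoint_every
        self.processed = []

    def run(self, messages, stop_after=None):
        # Resume from the last checkpointed offset, or from the start if none exists.
        start = self.checkpoints.get(self.partition, 0)
        for offset in range(start, len(messages)):
            if stop_after is not None and len(self.processed) >= stop_after:
                return  # simulate a crash before the next checkpoint
            self.processed.append(messages[offset])
            if (offset + 1) % self.checkpoint_every == 0:
                self.checkpoints[self.partition] = offset + 1


messages = ["m0", "m1", "m2", "m3", "m4"]
checkpoints = {}

first = CountingTask(0, checkpoints)
first.run(messages, stop_after=3)       # processes m0..m2, then "crashes"
checkpoint_after_crash = checkpoints[0]  # only offsets 0 and 1 were checkpointed

second = CountingTask(0, checkpoints)    # replacement task for the same partition
second.run(messages)                     # resumes at the checkpoint and re-reads m2
```

In this sketch the message `m2` is processed twice, once before the simulated crash and once after the restart, which is the usual trade-off of periodic checkpointing: nothing is skipped, but some messages may be seen again.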

Samza Architecture
● Task container coordination
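
A minimal sketch of how a coordinator might spread tasks over containers, written in plain Python. The round-robin policy, the function `assign_tasks` and the `task-N` naming are assumptions made for illustration; the slides do not say which grouping strategy Samza applies.

```python
def assign_tasks(num_partitions, num_containers):
    """Assign one task per partition to containers in round-robin order (assumed policy)."""
    assignment = {container: [] for container in range(num_containers)}
    for partition in range(num_partitions):
        assignment[partition % num_containers].append(f"task-{partition}")
    return assignment


# Eight partitions spread across three containers.
three_containers = assign_tasks(8, 3)

# If one container is lost, the same tasks are redistributed over the remaining two.
two_containers = assign_tasks(8, 2)
```

The number of tasks stays fixed by the number of partitions; only the grouping of tasks into containers changes when capacity is added or removed.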

Samza Architecture
● Fault tolerance of state
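
The idea of recovering a local store from a separate stream can be sketched as follows. This is plain Python, not Samza's state store implementation: the `LocalStore` class, the list used as a changelog and the use of `None` as a deletion marker are assumptions for the example.

```python
class LocalStore:
    """Hypothetical per-task key-value store that logs every change."""

    def __init__(self, changelog):
        self.data = {}
        self.changelog = changelog  # append-only list standing in for the changelog stream

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.changelog.append((key, None))  # None marks a deletion in this sketch

    @classmethod
    def restore(cls, changelog):
        """Rebuild a store by replaying the changelog from the beginning."""
        store = cls(changelog)
        for key, value in list(changelog):
            if value is None:
                store.data.pop(key, None)
            else:
                store.data[key] = value
        return store


changelog = []
store = LocalStore(changelog)
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)
store.delete("b")

# After a failure, a new task instance replays the changelog to rebuild its state.
restored = LocalStore.restore(changelog)
```

Replaying the changes in order yields the same contents as the store that was lost, so the task can continue from where its predecessor stopped.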

Samza Architecture
● Incremental checkpointing

Samza Architecture
● State management

Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020

Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology-based issues
– Big data integration
