Stream Processing using Samza SQL

Samza SQL
Srinivasulu Punuru
Agenda
1 What is Samza SQL?
2 Why SQL on Samza?
3 How does it work?
4 Demo
5 Q&A
What is Samza SQL?
Samza SQL by Example
Count the page views of each member in five-minute windows.
Send the result to the Kafka topic PageViewCount.
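Samza low-level task API
Repartitioner Job
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

// Re-keys page views by member ID so that all views of one member land in the same partition.
public class PageViewRepartitioner implements StreamTask {
  private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "pvMemberId");

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    String memberId = pageViewEvent.getMemberId();
    // With this constructor the key also serves as the partition key, so output is partitioned by member ID.
    collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, memberId, pageViewEvent));
  }
}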
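The repartitioner works because every message with the same key is routed to the same partition, so a single downstream task sees all page views of a given member. The sketch below illustrates that routing with a hypothetical `KeyPartitioner` helper; it is not Samza or Kafka code. Kafka's default partitioner hashes the key bytes with murmur2, and `Arrays.hashCode` is used here only as a simpler stand-in, so actual partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not Samza or Kafka code: maps a message key to a partition.
// Kafka's default partitioner uses murmur2; Arrays.hashCode is a simpler stand-in.
final class KeyPartitioner {
    private KeyPartitioner() {}

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the result is non-negative, then take the modulus.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Whatever hash is used, the property that matters is determinism: the same member ID always maps to the same partition for a fixed partition count.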
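Samza low-level task API (contd.)
Page view counter job
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;
// Counts page views per member in memory and flushes the counts once five minutes have elapsed.
public class PageViewCounter implements StreamTask {
  private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "pageviewCount");
  private static final Duration WINDOW = Duration.ofMinutes(5);
  private Instant lastTriggerTime = Instant.now();
  private final Map<String, Integer> counter = new HashMap<>();
  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    counter.merge(pageViewEvent.getMemberId(), 1, Integer::sum);
    Instant now = Instant.now();
    if (Duration.between(lastTriggerTime, now).compareTo(WINDOW) >= 0) {
      counter.forEach((memberId, count) -> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, memberId, count)));
      counter.clear();
      lastTriggerTime = now; // start the next window; without this, every later message would trigger a flush
    }
  }
}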
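The windowing logic in the counter job can be separated from Samza so it is easy to test. The sketch below is a hypothetical, framework-free `TumblingCounter` that takes the current time as a parameter instead of calling `Instant.now()`. As in the task above, a window is only closed when a new event arrives; one design choice here is that the event that closes a window is counted in the next one.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, framework-free sketch of the per-member window count.
// Time is passed in explicitly so the window logic can be tested without a clock.
final class TumblingCounter {
    private final Duration windowSize;
    private final Map<String, Integer> counts = new HashMap<>();
    private Instant windowStart;

    TumblingCounter(Duration windowSize, Instant start) {
        this.windowSize = windowSize;
        this.windowStart = start;
    }

    /** Counts one event; returns the closed window's counts if it elapsed, else an empty map. */
    Map<String, Integer> onEvent(String memberId, Instant now) {
        Map<String, Integer> emitted = Map.of();
        if (Duration.between(windowStart, now).compareTo(windowSize) >= 0) {
            emitted = new HashMap<>(counts); // snapshot of the finished window
            counts.clear();
            windowStart = now;               // the next window starts at this event
        }
        counts.merge(memberId, 1, Integer::sum);
        return emitted;
    }
}
```

Injecting time this way also makes it straightforward to switch from processing time to event time later by passing the event's own timestamp.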
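Samza high-level API
import java.time.Duration;
import org.apache.samza.application.StreamApplication;
import org.apache.samza.config.Config;
import org.apache.samza.operators.MessageStream;
import org.apache.samza.operators.OutputStream;
import org.apache.samza.operators.StreamGraph;
import org.apache.samza.operators.windows.Windows;
// Follows the slide; exact high-level API signatures changed between Samza releases.
public class PageViewCountApplication implements StreamApplication {
  @Override
  public void init(StreamGraph graph, Config config) {
    MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageView");
    OutputStream<MyStreamOutput> pageViewCount = graph.getOutputStream("pageViewCount");
    pageViewEvents
        .partitionBy(m -> m.getMemberId())
        .window(Windows.keyedTumblingWindow(m -> m.getMemberId(), Duration.ofMinutes(5),
            () -> 0, (m, c) -> c + 1)) // each member's count starts at 0 and grows by 1 per page view
        .map(MyStreamOutput::new)
        .sendTo(pageViewCount);
  }
}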
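The keyed window above is a fold: each key starts from an initial value and every message updates that key's accumulator, which is what `(m, c) -> c + 1` expresses. The sketch below shows those semantics with a hypothetical `KeyedFold` helper over a finite collection; it illustrates the idea only and is not how Samza implements windows (no time boundaries, no durable state).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration (not Samza's implementation) of a keyed fold:
// each key starts from initialValue.get() and every message updates its
// accumulator via foldFn, as (m, c) -> c + 1 does on the slide.
final class KeyedFold {
    private KeyedFold() {}

    static <M, K, V> Map<K, V> fold(Iterable<M> messages,
                                    Function<M, K> keyFn,
                                    Supplier<V> initialValue,
                                    BiFunction<M, V, V> foldFn) {
        Map<K, V> state = new HashMap<>();
        for (M m : messages) {
            K key = keyFn.apply(m);
            V current = state.containsKey(key) ? state.get(key) : initialValue.get();
            state.put(key, foldFn.apply(m, current));
        }
        return state;
    }
}
```

For example, folding the keys `a, b, a` with `() -> 0` and `(m, c) -> c + 1` yields a count of 2 for `a` and 1 for `b`.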
Samza SQL
INSERT INTO kafka.pageViewCount
SELECT memberId, COUNT(*) FROM kafka.pageViewStream
GROUP BY memberId, TUMBLE(current_timestamp, INTERVAL '5' MINUTE)
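`TUMBLE` assigns each row to one fixed-size, non-overlapping window, and `GROUP BY` then aggregates per member within each window. Assuming windows are aligned to the epoch, the window a timestamp belongs to can be computed as below. `TumbleWindows.windowStart` is a hypothetical helper for illustration, not part of Samza or Calcite.

```java
// Hypothetical helper: the start of the tumbling window containing a timestamp.
// Windows are assumed to be aligned to the epoch, so [start, start + size) never overlap.
final class TumbleWindows {
    private TumbleWindows() {}

    static long windowStart(long timestampMillis, long windowSizeMillis) {
        if (windowSizeMillis <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        // floorMod keeps the result correct for timestamps before the epoch.
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }
}
```

With a five-minute window (300,000 ms), timestamps 0 through 299,999 all map to window start 0, and 300,000 opens the next window.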
Samza API stack
Users can choose which API to use
to write a Samza job.
Why SQL on Samza?
• Expand the target audience of stream processing.
• Obtain quick real-time insights.
• Create stream processing applications quickly.
How does it work?
How do we execute the following SQL on Samza?
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
High-level architecture
Samza SQL to Calcite relational algebra
INSERT INTO kafka.NewLinkedInEmployees
SELECT firstName, lastName FROM kafka.profileChange
WHERE profile.newCompany = 'LinkedIn'
LogicalTableModify
  LogicalProject
    LogicalFilter
      LogicalTableScan
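Each node in the plan has a single input, so the tree reads top-down from the sink to the source. One natural way to turn it into a pipeline is to visit it bottom-up, input first, which yields the order in which operators are chained: scan, then filter, then project, then the table modify (the sink). The hypothetical `PlanNode` record below models this; it is an illustration, not Samza SQL's actual translator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the logical plan above: each node has at most one input.
// Visiting the input before the node itself gives the operator chaining order.
record PlanNode(String op, PlanNode input) {
    List<String> bottomUp() {
        List<String> order = new ArrayList<>();
        if (input != null) {
            order.addAll(input.bottomUp()); // translate the upstream operator first
        }
        order.add(op);
        return order;
    }
}
```

Building the plan as `LogicalTableModify(LogicalProject(LogicalFilter(LogicalTableScan)))` and calling `bottomUp()` returns the scan first and the table modify last.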
Samza operator graph conversion
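LogicalTableModify
  LogicalProject
    LogicalFilter
      LogicalTableScan
profileChange
    .filter(p -> "LinkedIn".equals(p.getNewCompany()))  // LogicalFilter (WHERE); null-safe comparison
    .map(this::getFirstAndLastName)                      // LogicalProject (SELECT)
    .sendTo(newLinkedInEmployees);                       // LogicalTableModify (INSERT INTO)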
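The same filter, map, and sink chain can be tried on an in-memory list with `java.util.stream`, which makes the mapping from SQL clauses to operators easy to check. In this sketch the `ProfileChange` record and `NewEmployeesPipeline` class are hypothetical, and collecting into a list stands in for sending to the output stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record standing in for a profile-change event.
record ProfileChange(String firstName, String lastName, String newCompany) {}

final class NewEmployeesPipeline {
    private NewEmployeesPipeline() {}

    // Mirrors the operator chain: filter (WHERE) -> map (SELECT) -> collect (stand-in for INSERT INTO).
    static List<String> run(List<ProfileChange> changes) {
        return changes.stream()
                .filter(p -> "LinkedIn".equals(p.newCompany()))
                .map(p -> p.firstName() + " " + p.lastName())
                .collect(Collectors.toList());
    }
}
```

Unlike this bounded example, the Samza pipeline runs continuously and emits each matching event as it arrives.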
Samza SQL message flow
Samza SQL rel message format
import java.util.ArrayList;
import java.util.List;

// A row: field names and field values kept in two parallel, ordered lists.
public class SamzaSqlRelMessage {
  private final List<Object> relFieldValues = new ArrayList<>();
  private final List<String> relFieldNames = new ArrayList<>();

  public List<String> getRelFieldNames() {
    return relFieldNames;
  }

  public List<Object> getRelFieldValues() {
    return relFieldValues;
  }
}
• Simple relational format that represents a row in a table
• Ordered list of named values
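Because names and values are kept in parallel lists, the value of a field is found at the same index as its name. The hypothetical `RelRow` class below sketches that layout with an illustrative `getField` lookup; it is modeled on the slide's class and is not the actual Samza SQL API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type modeled on SamzaSqlRelMessage: parallel, ordered
// lists of field names and values. getField is an illustrative helper.
final class RelRow {
    private final List<String> names = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    RelRow add(String name, Object value) {
        names.add(name);
        values.add(value);
        return this;
    }

    Object getField(String name) {
        int i = names.indexOf(name); // the value sits at the same index as its name
        if (i < 0) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return values.get(i);
    }

    List<String> fieldNames() {
        return List.copyOf(names);
    }
}
```

A linear `indexOf` lookup is fine for rows with a handful of columns; keeping the order fixed is what lets each row line up with the table's column layout.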
Pluggable input/output resolvers
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
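In the query above, the sources and sinks are named as `<system>.<stream>`, for example `kafka.profileUpdateStream`. A resolver's first job is to map such a name to the system and stream it refers to. The hypothetical `SourceInfo` record below shows only that parsing step; real Samza SQL resolvers are pluggable and also supply schemas and converters.

```java
// Hypothetical resolver: splits a qualified SQL source name such as
// "kafka.profileUpdateStream" into a system name and a stream name.
record SourceInfo(String system, String stream) {
    static SourceInfo resolve(String qualifiedName) {
        int dot = qualifiedName.indexOf('.');
        if (dot <= 0 || dot == qualifiedName.length() - 1) {
            throw new IllegalArgumentException("Expected <system>.<stream>: " + qualifiedName);
        }
        // Split at the first dot only, so stream names may themselves contain dots.
        return new SourceInfo(qualifiedName.substring(0, dot), qualifiedName.substring(dot + 1));
    }
}
```

Splitting at the first dot is a design choice for this sketch; a resolver could just as well consult a catalog of known systems.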
Samza SQL architecture
Demo
Demo setup
How do you use it?
• Samza SQL is available in the Samza 0.14 release.
• Tutorial – http://bit.ly/samzasql
Samza 0.14
• Samza SQL
  • Projection, Filtering, UDFs, Flatten, Union, Avro
• Apache Beam runner for Samza
• Azure EventHub support
• Amazon Kinesis support
• Multi-stage batch support
• High level API improvements
• Durable state
• Programmable SerDe
Samza SQL: Future
• Joins (Stream-Stream & Stream-Table)
• Aggregates & aggregate UDF
• Full Subquery support
• Samza SQL as a service
Questions?
Thank you
Pluggable schema and message converters
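A message converter turns records from a source format into the relational row format. Assuming a decoded record can be viewed as a map from field name to value (as with an Avro record), the hypothetical `MapToRelConverter` below produces values in the schema's field order, so every row has the same column layout. It illustrates the idea only and is not Samza's converter interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical converter: turns a keyed record into ordered names and values,
// using the schema's field order so every row has the same column layout.
final class MapToRelConverter {
    private final List<String> schemaFields;

    MapToRelConverter(List<String> schemaFields) {
        this.schemaFields = List.copyOf(schemaFields);
    }

    List<Object> toRelValues(Map<String, ?> record) {
        List<Object> values = new ArrayList<>();
        for (String field : schemaFields) {
            values.add(record.get(field)); // fields missing from the record become null
        }
        return values;
    }

    List<String> relFieldNames() {
        return schemaFields;
    }
}
```

Driving the conversion from the schema rather than from the record keeps column positions stable even when individual records omit optional fields.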