Stream Processing using Samza SQL


  1. 1. Samza SQL Srinivasulu Punuru
  2. 2. Agenda: (1) What is Samza SQL? (2) Why SQL on Samza? (3) How does it work? (4) Demo (5) Q&A
  3. 3. What is Samza SQL?
  4. 4. Samza SQL by example: count the page views of each member in a five-minute window and send the result to the Kafka topic PageViewCount.
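  5. 5. Samza low level task API: repartitioner job

        import org.apache.samza.system.IncomingMessageEnvelope;
        import org.apache.samza.system.OutgoingMessageEnvelope;
        import org.apache.samza.system.SystemStream;
        import org.apache.samza.task.MessageCollector;
        import org.apache.samza.task.StreamTask;
        import org.apache.samza.task.TaskCoordinator;

        // Re-keys page view events by member id so that all events for one
        // member land in the same partition of the "pvMemberId" stream.
        public class PageViewRepartitioner implements StreamTask {
          private final SystemStream outputStream = new SystemStream("kafka", "pvMemberId");

          @Override
          public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
              TaskCoordinator coordinator) {
            PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
            String key = pageViewEvent.getMemberId();
            // Arguments: stream, partition key, message key, message.
            OutgoingMessageEnvelope outMessage =
                new OutgoingMessageEnvelope(outputStream, key, key, pageViewEvent);
            collector.send(outMessage);
          }
        }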
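  6. 6. Samza low level task API (contd.): page view counter job

        import java.time.Duration;
        import java.time.Instant;
        import java.util.HashMap;
        import java.util.Map;
        import org.apache.samza.system.IncomingMessageEnvelope;
        import org.apache.samza.system.OutgoingMessageEnvelope;
        import org.apache.samza.system.SystemStream;
        import org.apache.samza.task.MessageCollector;
        import org.apache.samza.task.StreamTask;
        import org.apache.samza.task.TaskCoordinator;

        public class PageViewCounter implements StreamTask {
          private final SystemStream outputStream = new SystemStream("kafka", "pageviewCount");
          private final Map<String, Integer> counter = new HashMap<>();
          private Instant lastTriggerTime = Instant.now();

          @Override
          public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
              TaskCoordinator coordinator) {
            PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
            String memberId = pageViewEvent.getMemberId();
            counter.put(memberId, counter.getOrDefault(memberId, 0) + 1);

            // Emit and reset the counts once five minutes have elapsed.
            // The check runs only when a message arrives.
            if (Duration.between(lastTriggerTime, Instant.now()).toMinutes() >= 5) {
              counter.forEach((key, value) ->
                  collector.send(new OutgoingMessageEnvelope(outputStream, key, value)));
              counter.clear();
              lastTriggerTime = Instant.now();  // start the next window
            }
          }
        }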
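  7. 7. Samza high level API

        import java.time.Duration;
        import org.apache.samza.application.StreamApplication;
        import org.apache.samza.config.Config;
        import org.apache.samza.operators.MessageStream;
        import org.apache.samza.operators.OutputStream;
        import org.apache.samza.operators.StreamGraph;
        import org.apache.samza.operators.windows.Windows;

        public class PageViewCountApplication implements StreamApplication {
          @Override
          public void init(StreamGraph graph, Config config) {
            MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageView");
            OutputStream<MyStreamOutput> pageViewCount = graph.getOutputStream("pageViewCount");

            // initialValue (the starting count per window) is not defined on the slide.
            pageViewEvents
                .partitionBy(m -> m.memberId)
                .window(Windows.keyedTumblingWindow(m -> m.memberId, Duration.ofMinutes(5),
                    initialValue, (m, c) -> c + 1))
                .map(MyStreamOutput::new)
                .sendTo(pageViewCount);
          }
        }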
  8. 8. Samza SQL

        INSERT INTO kafka.pageViewCount
        SELECT memberId, count(*)
        FROM kafka.pageViewStream
        GROUP BY memberId, TUMBLE(current_timestamp, INTERVAL '5' MINUTE)
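To make the semantics of the query concrete, here is a minimal plain-Java sketch (no Samza dependency) of what a five-minute tumbling-window count per member computes. The `PageView` class, the class and method names, and the use of explicit event timestamps are assumptions for illustration; the query on the slide windows on `current_timestamp`, that is, on processing time.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowCountSketch {
    /** Hypothetical event type: a member id plus a timestamp in epoch milliseconds. */
    public static class PageView {
        public final String memberId;
        public final long timestampMs;

        public PageView(String memberId, long timestampMs) {
            this.memberId = memberId;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Assigns each event to the tumbling window that contains its timestamp
     * and counts events per (window start, member id).
     */
    public static Map<Long, Map<String, Integer>> countPerWindow(
            List<PageView> events, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (PageView e : events) {
            // Windows do not overlap: round the timestamp down to a window boundary.
            long windowStart = (e.timestampMs / sizeMs) * sizeMs;
            counts.computeIfAbsent(windowStart, k -> new TreeMap<>())
                  .merge(e.memberId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PageView> events = Arrays.asList(
            new PageView("alice", 0L),
            new PageView("bob", 60_000L),
            new PageView("alice", 120_000L),
            new PageView("alice", 300_000L));  // falls into the second window
        System.out.println(countPerWindow(events, Duration.ofMinutes(5)));
    }
}
```

Each entry of the result corresponds to one row the SQL query would emit: a member id and its count for one window.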
  9. 9. Samza API stack: users can choose which API to write a Samza job with.
  10. 10. Why SQL on Samza
        • Expand the target audience of stream processing.
        • Obtain quick real-time insights.
        • Create stream processing applications quickly.
  11. 11. How does it work?
  12. 12. How do we execute the SQL below on Samza?

        INSERT INTO kafka.NewEmployees
        SELECT firstName, lastName
        FROM kafka.profileUpdateStream
        WHERE profile.newCompany = 'LinkedIn'
  13. 13. High level architecture
  14. 14. Samza SQL to Calcite relational algebra

        INSERT INTO kafka.NewLinkedInEmployees
        SELECT firstName, lastName
        FROM kafka.profileChange
        WHERE profile.newCompany = 'LinkedIn'

        LogicalTableModify
          LogicalProject
            LogicalFilter
              LogicalTableScan
  15. 15. Samza operator graph conversion

        LogicalTableModify
          LogicalProject
            LogicalFilter
              LogicalTableScan

        is converted to:

        profileChange
            .filter(p -> p.getNewCompany().equals("LinkedIn"))
            .map(this::getFirstAndLastName)
            .sendTo(newLinkedInEmployees);
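The mapping from plan nodes to operators can be tried out without a Samza cluster. The sketch below uses `java.util.stream` as a stand-in for Samza's `MessageStream`; the `ProfileChange` class, its fields, and the method name are hypothetical and only mirror the names used in the query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OperatorGraphSketch {
    /** Hypothetical input row; field names follow the query on the slide. */
    public static class ProfileChange {
        public final String firstName;
        public final String lastName;
        public final String newCompany;

        public ProfileChange(String firstName, String lastName, String newCompany) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.newCompany = newCompany;
        }
    }

    /**
     * Plays the role of the operator graph: the input list stands in for
     * LogicalTableScan, filter() for LogicalFilter, map() for LogicalProject,
     * and the returned list for the target of LogicalTableModify.
     */
    public static List<String> newLinkedInEmployees(List<ProfileChange> scan) {
        return scan.stream()
            .filter(p -> "LinkedIn".equals(p.newCompany))   // WHERE clause
            .map(p -> p.firstName + " " + p.lastName)       // SELECT list
            .collect(Collectors.toList());                  // INSERT INTO target
    }

    public static void main(String[] args) {
        List<ProfileChange> changes = Arrays.asList(
            new ProfileChange("Ada", "Lovelace", "LinkedIn"),
            new ProfileChange("Alan", "Turing", "Elsewhere"));
        System.out.println(newLinkedInEmployees(changes));
    }
}
```

Reading the plan from the leaf upwards gives the order of the operators in the chain: scan, then filter, then project, then write.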
  16. 16. Samza SQL message flow
  17. 17. Samza SQL message flow
  18. 18. Samza SQL rel message format

        import java.util.ArrayList;
        import java.util.List;

        public class SamzaSqlRelMessage {
          // Values and names are parallel lists: index i of one matches index i of the other.
          private final List<Object> relFieldValues = new ArrayList<>();
          private final List<String> relFieldNames = new ArrayList<>();

          public List<String> getRelFieldNames() {
            return relFieldNames;
          }

          public List<Object> getRelFieldValues() {
            return this.relFieldValues;
          }
        }

        • Simple relational format that represents a row in a table
        • Ordered list of named values
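As an illustration of "an ordered list of named values", the following self-contained sketch builds one row and reads a field back by name. `RelRowSketch`, its constructor, and `getField` are hypothetical names chosen for this example; they are not taken from the slide or from the Samza code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RelRowSketch {
    private final List<String> fieldNames;
    private final List<Object> fieldValues;

    /** Names and values are parallel lists and must have the same length. */
    public RelRowSketch(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values differ in length");
        }
        this.fieldNames = Collections.unmodifiableList(new ArrayList<>(names));
        this.fieldValues = Collections.unmodifiableList(new ArrayList<>(values));
    }

    public List<String> getFieldNames() {
        return fieldNames;
    }

    public List<Object> getFieldValues() {
        return fieldValues;
    }

    /** Returns the value stored under the given name, or null if there is no such field. */
    public Object getField(String name) {
        int i = fieldNames.indexOf(name);
        return i < 0 ? null : fieldValues.get(i);
    }

    public static void main(String[] args) {
        RelRowSketch row = new RelRowSketch(
            Arrays.asList("firstName", "lastName"),
            Arrays.<Object>asList("Ada", "Lovelace"));
        System.out.println(row.getField("lastName"));
    }
}
```

Keeping names and values in order lets a row be addressed by position, as a relational plan does, or by name, as user code usually prefers.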
  19. 19. Pluggable input/output resolvers

        INSERT INTO kafka.NewEmployees
        SELECT firstName, lastName
        FROM kafka.profileUpdateStream
        WHERE profile.newCompany = 'LinkedIn'
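The slide does not show the resolver interface, so the following is only a sketch of the idea under stated assumptions: a pluggable component maps a name used in the query, such as `kafka.profileUpdateStream`, to a system and a stream on that system. The `IoResolver` interface, `ResolvedSource`, and `DotSeparatedResolver` are all hypothetical names, and splitting on the first dot is an assumed convention based on the `kafka.` prefix in the queries.

```java
public class IoResolverSketch {
    /** Hypothetical result of resolution: which system to use and which stream on it. */
    public static class ResolvedSource {
        public final String system;
        public final String stream;

        public ResolvedSource(String system, String stream) {
            this.system = system;
            this.stream = stream;
        }
    }

    /** Hypothetical plug-in point: alternative implementations could consult a catalog. */
    public interface IoResolver {
        ResolvedSource resolve(String sqlName);
    }

    /** Assumed default convention: "<system>.<stream>", split on the first dot. */
    public static class DotSeparatedResolver implements IoResolver {
        @Override
        public ResolvedSource resolve(String sqlName) {
            int dot = sqlName.indexOf('.');
            if (dot <= 0 || dot == sqlName.length() - 1) {
                throw new IllegalArgumentException("Expected <system>.<stream>, got: " + sqlName);
            }
            return new ResolvedSource(sqlName.substring(0, dot), sqlName.substring(dot + 1));
        }
    }

    public static void main(String[] args) {
        IoResolver resolver = new DotSeparatedResolver();
        ResolvedSource source = resolver.resolve("kafka.profileUpdateStream");
        System.out.println(source.system + " / " + source.stream);
    }
}
```

Because the query only sees the resolver's answer, swapping the implementation changes where data is read from or written to without touching the SQL.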
  20. 20. Samza SQL architecture
  21. 21. Demo
  22. 22. Demo setup
  23. 23. How do you use it? • Samza SQL is available in the Samza 0.14 release. • Tutorial: http://bit.ly/samzasql
  24. 24. Samza 0.14
        • Samza SQL: projection, filtering, UDFs, flatten, union, Avro
        • Apache Beam runner for Samza
        • Azure EventHub support
        • Amazon Kinesis support
        • Multi-stage batch support
        • High level API improvements
        • Durable state
        • Programmable SerDe
  25. 25. Samza SQL: future
        • Joins (stream-stream and stream-table)
        • Aggregates and aggregate UDFs
        • Full subquery support
        • Samza SQL as a service
  27. 27. Questions?
  28. 28. Thank you
  29. 29. Samza operator graph conversion

        LogicalTableModify
          LogicalProject
            LogicalFilter
              LogicalTableScan
  30. 30. Pluggable schema and message converters