8. Samza high-level API
public class PageViewCountApplication implements StreamApplication {
  @Override
  public void init(StreamGraph graph, Config config) {
    MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageView");
    OutputStream<MyStreamOutput> pageViewCount = graph.getOutputStream("pageViewCount");

    pageViewEvents
        // Repartition so that all events for a member reach the same task.
        .partitionBy(m -> m.memberId)
        // Count events per member in 5-minute tumbling windows, starting each count at 0.
        .window(Windows.keyedTumblingWindow(m -> m.memberId, Duration.ofMinutes(5),
            () -> 0, (m, c) -> c + 1))
        .map(MyStreamOutput::new)
        .sendTo(pageViewCount);
  }
}
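To make the windowing step concrete, here is a minimal plain-Java sketch of what a keyed 5-minute tumbling-window count computes. It does not use the Samza API: the class TumblingCountSketch, the PageView type, and the "memberId@windowStart" key format are all hypothetical, and the sketch buckets by an event timestamp only so that the result is deterministic.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TumblingCountSketch {

    /** Hypothetical event: a member viewed a page at an epoch-millisecond timestamp. */
    static final class PageView {
        final String memberId;
        final long timestampMs;

        PageView(String memberId, long timestampMs) {
            this.memberId = memberId;
            this.timestampMs = timestampMs;
        }
    }

    /** Counts events per member and window; keys look like "memberId@windowStartMs". */
    static Map<String, Integer> countPerWindow(List<PageView> events, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (PageView e : events) {
            // Tumbling windows do not overlap: each timestamp falls in exactly one window.
            long windowStart = (e.timestampMs / sizeMs) * sizeMs;
            // Same fold as in the slide: start at 0 and add 1 per event.
            counts.merge(e.memberId + "@" + windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PageView> events = Arrays.asList(
            new PageView("alice", 0L),
            new PageView("alice", 60_000L),
            new PageView("bob", 120_000L),
            new PageView("alice", 300_000L));
        // prints {alice@0=2, bob@0=1, alice@300000=1}
        System.out.println(countPerWindow(events, Duration.ofMinutes(5)));
    }
}
```

The third "alice" event lands exactly on the 5-minute boundary, so it opens a new window rather than extending the first one.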
9. Samza SQL
INSERT INTO kafka.pageViewCount
SELECT memberId, count(*) FROM kafka.pageViewStream
GROUP BY memberId, TUMBLE(current_timestamp, INTERVAL '5' MINUTES)
11. Why SQL on Samza
• Expand the target audience of stream processing.
• Obtain quick real-time insights.
• Create stream processing applications quickly.
13. How do we execute the SQL below on Samza?
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
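One way to picture the query above is as a filter (the WHERE clause) followed by a projection (the SELECT list), applied to each incoming message. The sketch below illustrates that idea with plain Java streams. It is not how Samza SQL is implemented: the class FilterProjectSketch, the map-based row shape, and the flat newCompany field are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FilterProjectSketch {

    /** Builds a hypothetical profile-update row as a field-name to value map. */
    static Map<String, Object> row(String firstName, String lastName, String newCompany) {
        Map<String, Object> r = new HashMap<>();
        r.put("firstName", firstName);
        r.put("lastName", lastName);
        r.put("newCompany", newCompany);
        return r;
    }

    /** Applies the query's WHERE and SELECT steps to a batch of rows. */
    static List<List<Object>> newEmployees(List<Map<String, Object>> profileUpdates) {
        return profileUpdates.stream()
            // WHERE newCompany = 'LinkedIn'
            .filter(r -> "LinkedIn".equals(r.get("newCompany")))
            // SELECT firstName, lastName (order of the output values matters)
            .map(r -> Arrays.asList(r.get("firstName"), r.get("lastName")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> updates = Arrays.asList(
            row("Ada", "Lovelace", "LinkedIn"),
            row("Alan", "Turing", "Bletchley"));
        // prints [[Ada, Lovelace]]
        System.out.println(newEmployees(updates));
    }
}
```

In a streaming job the same two steps would run once per message instead of over a finished list.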
19. Samza SQL rel message format
import java.util.ArrayList;
import java.util.List;

public class SamzaSqlRelMessage {
  // Field values and field names are parallel lists: value i belongs to name i.
  private final List<Object> relFieldValues = new ArrayList<>();
  private final List<String> relFieldNames = new ArrayList<>();

  public List<String> getRelFieldNames() {
    return relFieldNames;
  }

  public List<Object> getRelFieldValues() {
    return this.relFieldValues;
  }
}
• Simple relational format that represents a row in a table
• Ordered list of named values
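As a minimal sketch of what "ordered list of named values" means in practice, the class below keeps names and values in two parallel lists and looks a value up by its field name. RelRowSketch and its getField method are hypothetical and are not part of SamzaSqlRelMessage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RelRowSketch {
    private final List<String> names;
    private final List<Object> values;

    RelRowSketch(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        // Defensive copies keep the row stable after construction.
        this.names = new ArrayList<>(names);
        this.values = new ArrayList<>(values);
    }

    /** Returns the value stored under the given field name, if the field exists and is non-null. */
    Optional<Object> getField(String name) {
        int i = names.indexOf(name);
        return i < 0 ? Optional.empty() : Optional.ofNullable(values.get(i));
    }

    public static void main(String[] args) {
        RelRowSketch row = new RelRowSketch(
            Arrays.asList("firstName", "lastName"),
            Arrays.<Object>asList("Ada", "Lovelace"));
        // prints Optional[Lovelace]
        System.out.println(row.getField("lastName"));
        // prints Optional.empty
        System.out.println(row.getField("company"));
    }
}
```

Because position carries the meaning, a projection only has to reorder or drop entries in both lists in the same way.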
20. Pluggable input/output resolvers
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
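A resolver has to turn a name such as kafka.profileUpdateStream into something the runtime can read from or write to. The sketch below shows one plausible shape for such a plug-in point, assuming the part before the first dot names the system and the remainder names the stream. The Resolver interface, ResolvedSource, and DOTTED are hypothetical names and not the Samza SQL API.

```java
final class ResolverSketch {

    /** Hypothetical result of resolving a SQL source name. */
    static final class ResolvedSource {
        final String system;
        final String stream;

        ResolvedSource(String system, String stream) {
            this.system = system;
            this.stream = stream;
        }

        @Override
        public String toString() {
            return "system=" + system + ", stream=" + stream;
        }
    }

    /** Hypothetical plug-in point: any naming scheme can be supplied by implementing this. */
    interface Resolver {
        ResolvedSource resolve(String sqlName);
    }

    /** Assumed convention: "<system>.<stream>", split on the first dot. */
    static final Resolver DOTTED = sqlName -> {
        int dot = sqlName.indexOf('.');
        if (dot <= 0 || dot == sqlName.length() - 1) {
            throw new IllegalArgumentException("expected <system>.<stream>, got: " + sqlName);
        }
        return new ResolvedSource(sqlName.substring(0, dot), sqlName.substring(dot + 1));
    };

    public static void main(String[] args) {
        // prints system=kafka, stream=profileUpdateStream
        System.out.println(DOTTED.resolve("kafka.profileUpdateStream"));
        // prints system=kafka, stream=NewEmployees
        System.out.println(DOTTED.resolve("kafka.NewEmployees"));
    }
}
```

Keeping the resolver behind an interface means the same query text could be pointed at a different system by swapping the implementation, without touching the SQL.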