1
Lightweight Computing
with Pulsar Functions
Sanjeev Kulkarni, Sijie Guo
2
Event Driven Architectures
The rise of Real-Time
Big Data began with Batch
HDFS/MapReduce/Hive
Reaction times became important
Reduce time between data arrival and data analysis/action
Emergence of Real-Time Streaming Systems
3
What do we really mean by Real-Time?
Aims
Aim is to react to events as they happen in real time
Where do Events happen/arrive?
Message Bus
What's a reaction?
An action/transformation/function
4
Compute Representation
Abstract View
f(x)
Incoming Messages Output Messages
5
Traditional Compute representation
DAG
[Figure: DAG with Source 1 and Source 2, several Action nodes, Sink 1 and Sink 2]
6
Traditional Compute API
Stitching all of this together is left to programmers
public static class SplitSentence extends BaseBasicBolt {
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        // Split each incoming sentence on spaces and emit one tuple per word
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String w : words) {
            basicOutputCollector.emit(new Values(w));
        }
    }
}
7
Traditional Compute API
Stitching all of this together is left to programmers
public static class WordCount extends BaseBasicBolt {
    // Running count per word, kept in memory inside the bolt
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) {
            count = 0;
        }
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
8
Compute API 2.0
Functional
Builder.newBuilder()
    .newSource(() -> StreamletUtils.randomFromList(SENTENCES))
    .flatMap(sentence -> Arrays.asList(sentence.toLowerCase().split("\\s+")))
    .reduceByKeyAndWindow(word -> word, word -> 1,
        WindowConfig.TumblingCountWindow(50),
        (x, y) -> x + y);
9
Compute API 2.0
Characteristics
Compact
Complicated
Map vs FlatMap
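The "Map vs FlatMap" distinction can be sketched with plain java.util.stream. This is only an illustration on in-memory lists, not the streaming API from the previous slide, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; uses only java.util.stream, not a streaming engine.
class MapVsFlatMapDemo {

    // map: exactly one output element per input (here, one word array per sentence).
    static List<String[]> mapToWordArrays(List<String> sentences) {
        return sentences.stream()
                .map(s -> s.toLowerCase().split("\\s+"))
                .collect(Collectors.toList());
    }

    // flatMap: each input may yield several outputs, flattened into one list of words.
    static List<String> flatMapToWords(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.toLowerCase().split("\\s+")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("Hello Pulsar", "Functions are simple");
        System.out.println(mapToWordArrays(sentences).size());
        System.out.println(flatMapToWords(sentences));
    }
}
```

With map the result keeps one entry per sentence; with flatMap the sentence boundaries disappear, which is what a word count needs.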
10
Traditional Real-Time Systems
Separate
Messaging Compute
11
Traditional Real-Time Systems
Developer Experience
Powerful API but complicated
Does everyone really need to learn functional programming?
Configurable/Scalable but management overhead
Edge systems have resource/manageability constraints
12
Traditional Real-Time Systems
Operational Experience
Another system to operate is one too many
IoT deployments routinely have thousands of edge systems
Semantic differences
Mismatch/Duplication between Systems
Creates Developer and Operator Friction
13
Lessons learnt
Use Cases
A significant percentage of transformations are simple
ETL
Reactive Services
Classification
Real-time Aggregation
Event Routing
Microservices
14
Meanwhile
The world of Cloud
The emergence of Serverless
Simple Function API
Functions are submitted to the system
Run per event
Composition APIs to do complex things
Wildly popular
15
Serverless vs Streaming
What's really the difference?
Both are event-driven architectures
Both can be used for analytics/serving
Both have composition APIs
Config based for Serverless vs DSL based for Streaming
Serverless typically doesn't care about ordering
Really a function of the underlying source
Pay per action
Really a product billing interface
16
What's needed: Stream-Native Compute
Insight gained from serverless
Simplest possible API
Method/Procedure/Function
Multi-language API
Scale developers
Message bus native concepts
Input/Output/Log as topics
Flexible runtime
Simple standalone applications vs system-managed applications
17
Introducing Pulsar Functions
18
Apache Pulsar
19
What is Apache Pulsar?
Ordering
Guaranteed ordering
Multi-tenancy
A single cluster can support many tenants and use cases
High throughput
Can reach 1.8 M messages/s in a single partition
Durability
Data replicated and synced to disk
Geo-replication
Out-of-the-box support for geographically distributed applications
Unified messaging model
Supports both Streaming and Queuing in a single model
Delivery Guarantees
At least once, at most once and effectively once
Low Latency
Low publish latency of 5 ms at the 99th percentile
Highly scalable
Can support millions of topics
20
Pulsar Architecture
[Figure: Producer and Consumer clients; Apache Pulsar brokers; Apache BookKeeper with Bookie 1 to Bookie 5]
Stateless Serving
BROKER
Clients interact only with brokers
No state is stored in brokers
BOOKIES
Apache BookKeeper as the storage
Storage is append only
Provides high performance, low latency
Durability
No data loss. fsync before acknowledgement
21
Pulsar Architecture
[Figure: Producer and Consumer clients; Apache Pulsar brokers; Apache BookKeeper with Bookie 1 to Bookie 5]
Separation of Storage and Serving
SERVING
Brokers can be added independently
Traffic can be shifted quickly across brokers
STORAGE
Bookies can be added independently
New bookies will ramp up traffic quickly
22
Segment Centric Storage
23
Flexible Messaging Model
24
Multi Tenancy
25
Multi Cluster Replication
[Figure: Topic (T1) with Subscription (S1) replicated across Data Center A, Data Center B and Data Center C; Producers P1, P2, P3; Consumers C1, C2]
26
Back to Pulsar Functions
27
Pulsar Functions
API
SDK-less API
import java.util.function.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}
28
Pulsar Functions
API
SDK API
import org.apache.pulsar.functions.api.PulsarFunction;
import org.apache.pulsar.functions.api.Context;

public class ExclamationFunction implements PulsarFunction<String, String> {
    @Override
    public String process(String input, Context context) {
        return input + "!";
    }
}
29
Pulsar Functions
Input and Output
Function executed for every message of the input topic
Supports multiple topics as inputs
Function output goes to the output topic
Function output can be void/null
SerDe takes care of serialization/deserialization of messages
Custom SerDe can be provided by the users
Integrates with Schema Registry
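A custom SerDe can be pictured as a pair of methods that convert between bytes and a typed object. The sketch below defines its own local SerDe interface as a stand-in, so the interface shape, the Reading type and the "sensorId,value" text encoding are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Local stand-in for a SerDe contract (assumption: bytes <-> object, two methods).
interface SerDe<T> {
    T deserialize(byte[] input);
    byte[] serialize(T input);
}

// Hypothetical message type used only for this illustration.
final class Reading {
    final String sensorId;
    final double value;

    Reading(String sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Encodes a Reading as UTF-8 text of the form "sensorId,value".
class ReadingSerDe implements SerDe<Reading> {
    @Override
    public Reading deserialize(byte[] input) {
        String[] parts = new String(input, StandardCharsets.UTF_8).split(",", 2);
        return new Reading(parts[0], Double.parseDouble(parts[1]));
    }

    @Override
    public byte[] serialize(Reading input) {
        return (input.sensorId + "," + input.value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReadingSerDe serde = new ReadingSerDe();
        Reading copy = serde.deserialize(serde.serialize(new Reading("s1", 21.5)));
        System.out.println(copy.sensorId + " " + copy.value);
    }
}
```

The function body then only sees Reading objects; the byte-level format stays in one place.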
30
Pulsar Functions
Processing Guarantees
ATMOST_ONCE
Message is acked to Pulsar as soon as we receive it
ATLEAST_ONCE
Message is acked to Pulsar after the function completes
Default behaviour: not many people want to lose data
EFFECTIVELY_ONCE
Uses Pulsar's built-in effectively-once semantics
Controlled at runtime by the user
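The difference between the first two guarantees comes down to when the ack happens relative to the function call. The following is a minimal in-memory simulation of that ordering; it is not Pulsar client code, and AckOrderingDemo and its run method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory simulation of ack ordering; no broker is involved.
class AckOrderingDemo {
    enum Guarantee { ATMOST_ONCE, ATLEAST_ONCE }

    // Returns the messages that were acknowledged.
    // A message whose processing fails is still acked under ATMOST_ONCE (so it is lost),
    // but stays unacked under ATLEAST_ONCE (so it remains eligible for redelivery).
    static List<String> run(List<String> messages, Function<String, String> fn, Guarantee g) {
        List<String> acked = new ArrayList<>();
        for (String msg : messages) {
            if (g == Guarantee.ATMOST_ONCE) {
                acked.add(msg); // ack on receipt, before the function runs
            }
            try {
                fn.apply(msg);
                if (g == Guarantee.ATLEAST_ONCE) {
                    acked.add(msg); // ack only after the function completes
                }
            } catch (RuntimeException e) {
                // Processing failed; whether the message is lost depends on the ack state above.
            }
        }
        return acked;
    }

    public static void main(String[] args) {
        Function<String, String> flaky = s -> {
            if (s.equals("bad")) {
                throw new IllegalStateException("processing failed");
            }
            return s + "!";
        };
        List<String> input = Arrays.asList("a", "bad", "c");
        System.out.println(run(input, flaky, Guarantee.ATMOST_ONCE));
        System.out.println(run(input, flaky, Guarantee.ATLEAST_ONCE));
    }
}
```

Effectively-once is not modelled here, since it depends on deduplication in the messaging layer rather than on ack ordering alone.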
31
Pulsar Functions
Built in State
Functions can store state in StreamStore
Framework provides a simple library around this
Supports server-side operations like counters
Simplified application development
No need to stand up an extra system
32
Pulsar Functions
WordCount Topology
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;

public class CounterFunction implements PulsarFunction<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // String.split takes a regex, so the dot must be escaped to split on literal "."
        for (String word : input.split("\\.")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}
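To try the counting logic without a cluster, the state API can be replaced by a small in-memory stand-in. In this sketch InMemoryCounters and WordCountSketch are hypothetical classes; only the incrCounter name is borrowed from the slide. Note that String.split takes a regular expression, so splitting on a literal dot requires escaping it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a counter-style state API.
class InMemoryCounters {
    private final Map<String, Long> counts = new HashMap<>();

    void incrCounter(String key, long amount) {
        counts.merge(key, amount, Long::sum);
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }
}

class WordCountSketch {
    // Splits the input on literal dots and increments one counter per token.
    static void process(String input, InMemoryCounters counters) {
        // An unescaped "." would match every character and yield no tokens.
        for (String word : input.split("\\.")) {
            counters.incrCounter(word, 1);
        }
    }

    public static void main(String[] args) {
        InMemoryCounters counters = new InMemoryCounters();
        process("a.b.a", counters);
        System.out.println("a=" + counters.get("a") + " b=" + counters.get("b"));
    }
}
```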
33
Built-in State Management
Pulsar uses BookKeeper as its stream storage
Functions can store State in BookKeeper
Framework provides the Context object for users to access State
Supports server-side operations like Counters
Simplified application development
No need to stand up an extra system to develop/test/integrate/operate
34
State Storage w/ BookKeeper
The built-in state management is powered by Table Service in BookKeeper
BP-30: Table Service
Originated as built-in metadata management within BookKeeper
Exposed for general usage, e.g. State management for Pulsar Functions
Developer Preview
Pulsar Functions at Pulsar 2.0
Direct usage at BookKeeper 4.7
35
State Storage w/ BookKeeper
Updates are written to log streams in BookKeeper
Materialized into a key/value table view
The key/value table is indexed with RocksDB for fast lookup
The source of truth is the log streams in BookKeeper
RocksDB instances are transient key/value indexes
RocksDB instances are incrementally checkpointed and stored in BookKeeper for fast recovery
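The log, table view and checkpoint relationship can be sketched in memory. Here a list stands in for the log stream and a HashMap for the key/value index; LogBackedTable is a hypothetical class, and the checkpoint is a full snapshot rather than an incremental one, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: the log is the source of truth, the map is a rebuildable view.
class LogBackedTable {
    private final List<String[]> log = new ArrayList<>();     // ordered [key, value] updates
    private Map<String, String> index = new HashMap<>();      // transient key/value view
    private Map<String, String> checkpoint = new HashMap<>(); // last snapshot of the view
    private int checkpointOffset = 0;                         // log position covered by the snapshot

    void put(String key, String value) {
        log.add(new String[] {key, value}); // append to the log first
        index.put(key, value);              // then materialize into the view
    }

    String get(String key) {
        return index.get(key);
    }

    // Snapshot the view together with the log offset it reflects.
    void takeCheckpoint() {
        checkpoint = new HashMap<>(index);
        checkpointOffset = log.size();
    }

    // Simulates losing the view: restore the snapshot, then replay only the log tail.
    // Returns the number of log entries replayed.
    int recover() {
        index = new HashMap<>(checkpoint);
        int replayed = 0;
        for (int i = checkpointOffset; i < log.size(); i++) {
            index.put(log.get(i)[0], log.get(i)[1]);
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        LogBackedTable table = new LogBackedTable();
        table.put("a", "1");
        table.takeCheckpoint();
        table.put("a", "2");
        System.out.println("replayed entries: " + table.recover());
        System.out.println("a = " + table.get("a"));
    }
}
```

Recovery cost is bounded by the log written since the last checkpoint, which is why checkpointing speeds it up.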
36
Pulsar Functions
Running as a standalone application
bin/pulsar-admin functions localrun \
  --input persistent://sample/standalone/ns1/test_input \
  --output persistent://sample/standalone/ns1/test_result \
  --className org.mycompany.ExclamationFunction \
  --jar myjar.jar
Runs as a standalone process
Run as many instances as you want. Framework automatically balances data
Run and manage via Mesos/Kubernetes/Nomad/your favorite tool
37
Pulsar Functions
Running inside Pulsar cluster
'Create' and 'Delete' Functions in a Pulsar cluster
Pulsar brokers run functions as threads, processes, or Docker containers
Unifies the Messaging and Compute clusters into one, significantly improving manageability
Ideal match for Edge or small startup environments
Serverless in a jar
38
Pulsar Functions
Stepping back: where Pulsar Functions belong
Powerful/Complicated systems have their place
Data Centers/Cloud
Complex analysis
A significant percentage of analytics/actions are mundane
ETL/Counting/Routing
Use simple tools for simple things
39
Pulsar Functions: Use Cases
Edge Computing
Sensor devices generate tons of data
We need local actions
Simple filtering, threshold detection, regex matching, etc.
Manageability is a big concern
The fewer moving parts, the better
Resource Constrained
Limited scope for full-blown schedulers/Job Managers
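Threshold detection fits the plain java.util.function.Function shape. In this sketch the class name, the 75.0 limit and the "ALERT" output format are assumptions for illustration; returning null is used to mean that no output is produced for that input.

```java
import java.util.function.Function;

// Hypothetical edge function: forwards only readings above a threshold.
class ThresholdFunction implements Function<String, String> {
    private static final double THRESHOLD = 75.0; // assumed limit, for illustration only

    @Override
    public String apply(String input) {
        double reading;
        try {
            reading = Double.parseDouble(input.trim());
        } catch (NumberFormatException e) {
            return null; // malformed input: emit nothing
        }
        // null signals that there is no output for this input.
        return reading > THRESHOLD ? "ALERT " + reading : null;
    }

    public static void main(String[] args) {
        ThresholdFunction fn = new ThresholdFunction();
        System.out.println(fn.apply("80"));
        System.out.println(fn.apply("10"));
    }
}
```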
40
Pulsar Functions: Use Cases
Model Serving
Models computed via offline analysis
Incoming requests should be classified using the model
Function is a natural representation for the classification action
Model itself can be stored in BookKeeper
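Classification as a function can be sketched as follows. A fixed linear model stands in for one computed offline; the class name, the comma-separated feature format and the labels are assumptions, and loading the model from storage is out of scope here.

```java
import java.util.function.Function;

// Hypothetical classifier: fixed weights stand in for a model computed offline.
class LinearClassifierFunction implements Function<String, String> {
    private final double[] weights;
    private final double bias;

    LinearClassifierFunction(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    // Input: comma-separated numeric features. Output: the predicted label.
    @Override
    public String apply(String input) {
        String[] parts = input.split(",");
        double score = bias;
        for (int i = 0; i < weights.length && i < parts.length; i++) {
            score += weights[i] * Double.parseDouble(parts[i].trim());
        }
        return score > 0 ? "positive" : "negative";
    }

    public static void main(String[] args) {
        LinearClassifierFunction fn = new LinearClassifierFunction(new double[] {1.0, -2.0}, 0.5);
        System.out.println(fn.apply("2, 0.5"));
        System.out.println(fn.apply("1, 1"));
    }
}
```

Each incoming request is one message, and the label is the function's output message.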
41
Roadmap
More language support: Go, JavaScript, C++
Cross Functions: Function Composition API
More State operations exposed to Functions
42
Conclusion
Stream-Native Compute (aka Functions) is the new paradigm in Messaging Systems
Stream-Native Storage (aka State) is the new paradigm in Storage Systems
Pulsar Functions brings lightweight computing capability into the messaging and storage system, which is what streaming applications need
https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/
43
Questions and Thank You!

Stream-Native Processing with Pulsar Functions

  • 1. 1 Lightweight Computing with Pulsar Functions Sanjeev Kulkarni, Sijie Guo
  • 2. 2 Event Driven Architectures The rise of RealTime BigData began with Batch HDFS/MapReduce/Hive ReacBon Times became important Reduce Bme between data arrival and data analysis/acBon Emergence of Real-Time Streaming ystems
  • 3. 3 What do we really mean by Real-Time? Aims Aim is to react to events as they happen in real-Bme Where do Events happen/arrive? Message Bus Whats a reacBon An acBon/transformaBon/funcBon
  • 6. 6 Traditional Compute API SBtching all of this by programmers public static class SplitSentence extends BaseBasicBolt { @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } @Override public Map<String, Object> getComponentConfiguration() { return null; } public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) { String sentence = tuple.getStringByField("sentence"); String words[] = sentence.split(" "); for (String w : words) { basicOutputCollector.emit(new Values(w)); } } }
  • 7. 7 Traditional Compute API SBtching all of this by programmers public static class WordCount extends BaseBasicBolt { Map<String, Integer> counts = new HashMap<String, Integer>(); @Override public void execute(Tuple tuple, BasicOutputCollector collector) { String word = tuple.getString(0); Integer count = counts.get(word); if (count == null) count = 0; count++; counts.put(word, count); collector.emit(new Values(word, count)); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word", "count")); } }
  • 8. 8 Compute API 2.0 FuncBonal Builder.newBuilder() .newSource(() -> StreamletUtils.randomFromList(SENTENCES)) .flatMap(sentence -> Arrays.asList(sentence.toLowerCase().split("s+"))) .reduceByKeyAndWindow(word -> word, word -> 1, WindowConfig.TumblingCountWindow(50), (x, y) -> x + y);
  • 11. 11 Traditional Real-Time Systems Developer Experience Powerful API but complicated Does everyone really need to learn funcBonal programming? Configurable/Scaleable but management overhead Edge systems have resource/manageability constraints
  • 12. 12 Traditional Real-Time Systems OperaBonal Experience Another system to operate is one too many IOT deployment rouBnely have thousands of edge systems SemanBc difference Mismatch/DuplicaBon between Systems Creates Developer and Operator FricBon
  • 13. 13 Lessons learnt UseCases A significant percentage of transformaBons are simple ETL ReacBve Services ClassificaBon Real-Bme AggregaBon Event RouBng Microservices
  • 14. 14 Meanwhile The world of Cloud The emergence of Serverless Simple FuncBon API FuncBons are submi^ed to the system Run per event ComposiBon APIs to do complex things Wildly popular
  • 15. 15 Serverless vs Streaming Whats really the difference Both are event driven architectures Both can be used for analyBcs/serving Both have composiBon APIs Conf based for Serverless vs DSL based for Streaming Serverless typically don’t care for ordering Really the funcBon of the underlying source Pay per acBon Really a product billing interfaces
  • 16. 16 Whats needed:- Stream-Native Compute Insight gained from serverless Simplest possible API Method/Procedure/FuncBon MulB Language API Scale developers Message bus naBve concepts Input/Output/Log as topics Flexible runBme Simple standalone applicaBons vs system managed applicaBons
  • 19. 19 Ordering Guaranteed ordering Multi-tenancy A single cluster can support many tenants and use cases High throughput Can reach 1.8 M messages/s in a single partition Durability Data replicated and synced to disk Geo-replication Out of box support for geographically distributed applications Unified messaging model Support both Streaming and Queuing in a single model Delivery Guarantees At least once, at most once and effectively once Low Latency Low publish latency of 5ms at 99pct Highly scalable Can support millions of topics What is Apache Pulsar?
  • 20. 20 Pulsar Architecture: Stateless Serving. [Diagram: Producer and Consumer; Pulsar brokers (Apache Pulsar); Bookies 1-5 (Apache BookKeeper)] BROKER: Clients interact only with brokers. No state is stored in brokers. BOOKIES: Apache BookKeeper as the storage. Storage is append-only. Provides high performance and low latency. Durability: no data loss; fsync before acknowledgement.
  • 21. 21 Pulsar Architecture: Separation of Storage and Serving. [Diagram: Producer and Consumer; Pulsar brokers (Apache Pulsar); Bookies 1-5 (Apache BookKeeper)] SERVING: Brokers can be added independently. Traffic can be shifted quickly across brokers. STORAGE: Bookies can be added independently. New bookies will ramp up traffic quickly.
  • 25. 25 Multi Cluster Replication. [Diagram: Topic (T1) replicated across Data Centers A, B, and C; Producers P1, P2, P3; Consumers C1, C2; Subscription (S1)]
  • 26. 26 Back to Pulsar Functions
  • 27. 27 Pulsar Functions API SDK less API import java.util.function.Function; public class ExclamationFunction implements Function<String, String> { @Override public String apply(String input) { return input + "!"; } }
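Because the SDK-less form is just a `java.util.function.Function`, it can be exercised without any Pulsar dependency. A minimal sketch, assuming a hypothetical `LocalHarness` helper (not part of Pulsar) in which a list stands in for the input topic:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Same shape as the SDK-less function on the slide: a plain java.util.function.Function.
class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}

// Hypothetical local harness (an assumption for this sketch, not a Pulsar API):
// applies the function to each message of an in-memory list, in order.
class LocalHarness {
    static <I, O> List<O> runOver(Function<I, O> fn, List<I> messages) {
        return messages.stream().map(fn).collect(Collectors.toList());
    }
}
```

Calling `LocalHarness.runOver(new ExclamationFunction(), messages)` returns one output per input message, which is enough for unit-testing the function logic before deploying it.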
  • 28. 28 Pulsar Functions API SDK API import org.apache.pulsar.functions.api.PulsarFunction; import org.apache.pulsar.functions.api.Context; public class ExclamationFunction implements PulsarFunction<String, String> { @Override public String process(String input, Context context) { return input + "!"; } }
  • 29. 29 Pulsar Functions: Input and Output. The function is executed for every message of the input topic. Supports multiple topics as inputs. Function output goes to the output topic. Function output can be void/null. SerDe takes care of serialization/deserialization of messages. Custom SerDe can be provided by users. Integrates with the Schema Registry.
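A custom SerDe is essentially a pair of byte-array conversions. The sketch below is illustrative only: the `SerDe` interface is declared locally and is modeled on the two-method shape of Pulsar's SerDe interface (an assumption here, so check the Pulsar API before relying on it), and `SensorReading` with its `id|value` wire format is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Local stand-in declared for this sketch; assumed to mirror the shape of Pulsar's SerDe.
interface SerDe<T> {
    T deserialize(byte[] input);
    byte[] serialize(T input);
}

// Hypothetical message type used only for illustration.
final class SensorReading {
    final String sensorId;
    final double value;

    SensorReading(String sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Encodes a reading as the UTF-8 text "sensorId|value".
class SensorReadingSerDe implements SerDe<SensorReading> {
    @Override
    public SensorReading deserialize(byte[] input) {
        String text = new String(input, StandardCharsets.UTF_8);
        int sep = text.lastIndexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("malformed reading: " + text);
        }
        return new SensorReading(text.substring(0, sep),
                Double.parseDouble(text.substring(sep + 1)));
    }

    @Override
    public byte[] serialize(SensorReading input) {
        return (input.sensorId + "|" + input.value).getBytes(StandardCharsets.UTF_8);
    }
}
```

Splitting on the last `|` keeps sensor ids that themselves contain `|` intact; a real deployment would more likely use a schema-based format.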
  • 30. 30 Pulsar Functions: Processing Guarantees. ATMOST_ONCE: the message is acked to Pulsar as soon as it is received. ATLEAST_ONCE: the message is acked to Pulsar after the function completes; this is the default behaviour, since few people want to lose data. EFFECTIVELY_ONCE: uses Pulsar's built-in effectively-once semantics. Controlled at runtime by the user.
  • 31. 31 Pulsar Functions: Built-in State. Functions can store state in StreamStore. The framework provides a simple library around this. Supports server-side operations like counters. Simplified application development: no need to stand up an extra system.
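The difference between the first two guarantees comes down to when the ack is sent relative to running the function. The following is a toy simulation of that ordering only; `AckSimulator` and its fields are hypothetical names, and nothing here reflects how the Pulsar runtime is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

enum Guarantee { ATMOST_ONCE, ATLEAST_ONCE }

// Hypothetical simulator: records which messages were acked and which produced output.
class AckSimulator {
    final List<String> acked = new ArrayList<>();
    final List<String> outputs = new ArrayList<>();

    void handle(String message, Function<String, String> fn, Guarantee guarantee) {
        if (guarantee == Guarantee.ATMOST_ONCE) {
            acked.add(message); // ack on receipt: a later failure loses the message
        }
        try {
            outputs.add(fn.apply(message));
        } catch (RuntimeException e) {
            // Under ATLEAST_ONCE the message stays unacked here,
            // so a broker would be free to redeliver it.
            return;
        }
        if (guarantee == Guarantee.ATLEAST_ONCE) {
            acked.add(message); // ack only after the function completed
        }
    }
}
```

With a function that throws on some message, the at-most-once run shows that message as acked but with no output (lost), while the at-least-once run leaves it unacked (eligible for redelivery, hence possible duplicates).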
  • 32. 32 Pulsar Functions WordCount Topology import org.apache.pulsar.functions.api.Context; import org.apache.pulsar.functions.api.PulsarFunction; public class CounterFunction implements PulsarFunction<String, Void> { @Override public Void process(String input, Context context) throws Exception { for (String word : input.split("\\.")) { context.incrCounter(word, 1); } return null; } }
  • 33. 33 Built-in State Management. Pulsar uses BookKeeper as its stream storage. Functions can store state in BookKeeper. The framework provides the Context object for users to access state. Supports server-side operations like counters. Simplified application development: no need to stand up an extra system to develop/test/integrate/operate.
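To see what counter-style state looks like from the function's side, here is a sketch with an in-memory stand-in for the Context object. `CounterContext` and `InMemoryCounterContext` are hypothetical; only `incrCounter(key, amount)` comes from the slides, and `getCounter` is added purely so the sketch can be inspected. Words are split on whitespace here, which is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal stand-in for the counter part of the Context object.
interface CounterContext {
    void incrCounter(String key, long amount);
    long getCounter(String key); // assumption: read-back added for this sketch only
}

// Keeps counters in a local map; the real state would live in BookKeeper.
class InMemoryCounterContext implements CounterContext {
    private final Map<String, Long> counters = new HashMap<>();

    @Override
    public void incrCounter(String key, long amount) {
        counters.merge(key, amount, Long::sum);
    }

    @Override
    public long getCounter(String key) {
        return counters.getOrDefault(key, 0L);
    }
}

class WordCounter {
    // Increments one counter per whitespace-separated word in the input.
    static void process(String input, CounterContext context) {
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.incrCounter(word, 1);
            }
        }
    }
}
```

The function body never touches a store directly; swapping the in-memory map for a durable backend would not change `WordCounter`.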
  • 34. 34 State Storage w/ BookKeeper. The built-in state management is powered by the Table Service in BookKeeper (BP-30: Table Service). It originated as built-in metadata management within BookKeeper and is exposed for general usage, e.g. state management for Pulsar Functions. Developer preview: Pulsar Functions in Pulsar 2.0; direct usage in BookKeeper 4.7.
  • 35. 35 State Storage w/ BookKeeper. Updates are written to log streams in BookKeeper and materialized into a key/value table view. The key/value table is indexed with RocksDB for fast lookup. The source of truth is the log streams in BookKeeper; the RocksDB instances are transient key/value indexes. RocksDB instances are incrementally checkpointed and stored in BookKeeper for fast recovery.
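The log-plus-table-view pattern can be sketched in a few lines. This is a simplified illustration under stated assumptions: an in-memory list stands in for the BookKeeper log stream, a `HashMap` stands in for the RocksDB index, and the checkpoint is a full snapshot rather than an incremental one. All class names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One entry in the append-only update log (stand-in for a BookKeeper log stream).
final class Update {
    final String key;
    final long delta;
    Update(String key, long delta) { this.key = key; this.delta = delta; }
}

// A snapshot of the table view plus the log position it reflects.
final class Checkpoint {
    final Map<String, Long> snapshot;
    final int position;
    Checkpoint(Map<String, Long> snapshot, int position) {
        this.snapshot = new HashMap<>(snapshot);
        this.position = position;
    }
}

class LoggedCounterTable {
    private final List<Update> log;                         // source of truth
    private final Map<String, Long> view = new HashMap<>(); // transient index (RocksDB's role)
    private int applied = 0;                                // log entries already in the view

    LoggedCounterTable(List<Update> log) { this.log = log; }

    // Write path: append to the log first, then materialize into the view.
    void increment(String key, long delta) {
        log.add(new Update(key, delta));
        catchUp();
    }

    long get(String key) { return view.getOrDefault(key, 0L); }

    Checkpoint checkpoint() { return new Checkpoint(view, applied); }

    // Recovery: start from the checkpointed view, then replay only the log tail.
    static LoggedCounterTable recover(List<Update> log, Checkpoint checkpoint) {
        LoggedCounterTable table = new LoggedCounterTable(log);
        table.view.putAll(checkpoint.snapshot);
        table.applied = checkpoint.position;
        table.catchUp();
        return table;
    }

    private void catchUp() {
        while (applied < log.size()) {
            Update u = log.get(applied++);
            view.merge(u.key, u.delta, Long::sum);
        }
    }
}
```

Recovering from an empty checkpoint replays the whole log and yields the same view, which is why the index can be treated as disposable; the checkpoint only shortens the replay.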
  • 36. 36 Pulsar Functions: Running as a standalone application. bin/pulsar-admin functions localrun --input persistent://sample/standalone/ns1/test_input --output persistent://sample/standalone/ns1/test_result --className org.mycompany.ExclamationFunction --jar myjar.jar Runs as a standalone process. Run as many instances as you want; the framework automatically balances data. Run and manage via Mesos/K8s/Nomad/your favorite tool.
  • 37. 37 Pulsar Functions: Running inside a Pulsar cluster. ‘Create’ and ‘Delete’ Functions in a Pulsar cluster. Pulsar brokers run functions as either threads, processes, or Docker containers. Unifies the messaging and compute clusters into one, significantly improving manageability. Ideal match for edge or small startup environments. Serverless in a jar.
  • 38. 38 Pulsar Functions: Stepping back: where Pulsar Functions belong. Powerful/complicated systems have their place: data centers/cloud, complex analysis. A significant percentage of analytics/actions are mundane: ETL/counting/routing. Use simple tools for simple things.
  • 39. 39 Pulsar Functions: Use Cases: Edge Computing. Sensor devices generate tons of data. We need local actions: simple filtering, threshold detection, regex matching, etc. Manageability is a big concern: the fewer moving parts, the better. Resource constrained: limited scope for full-blown schedulers/job managers.
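Threshold detection at the edge fits the SDK-less function shape directly. A minimal sketch, assuming each message is a numeric reading encoded as text; the `THRESHOLD` value and the `ALERT` output format are hypothetical, and returning null relies on the earlier slide's statement that function output can be void/null.

```java
import java.util.function.Function;

// Emits an alert string for readings above the threshold; returns null otherwise.
class ThresholdFilterFunction implements Function<String, String> {
    private static final double THRESHOLD = 75.0; // hypothetical limit for this sketch

    @Override
    public String apply(String input) {
        double reading;
        try {
            reading = Double.parseDouble(input.trim());
        } catch (NumberFormatException e) {
            return null; // malformed reading: produce no output
        }
        return reading > THRESHOLD ? "ALERT " + reading : null;
    }
}
```

Keeping the function stateless and dependency-free suits the resource constraints described for edge systems.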
  • 40. 40 Pulsar Functions: Use Cases: Model Serving. Models are computed via offline analysis. Incoming requests should be classified using the model. A function is a natural representation for the classification action. The model itself can be stored in BookKeeper.
  • 41. 41 Roadmap. More language support: Go, JavaScript, C++. Cross-function work: a function composition API. More state operations exposed to functions.
  • 42. 42 Conclusion. Stream-native compute (aka Functions) is the new paradigm in messaging systems. Stream-native storage (aka State) is the new paradigm in storage systems. Pulsar Functions brings lightweight computing capability into the messaging and storage system, which is the trend that streaming applications need. https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/