Pulsar Functions Deep Dive

Published on

Data Con LA 2020
Description
Pulsar Functions provide a simple yet powerful way of interacting with Pulsar topics: transforming, enriching, and analyzing the data contained in the streams. With pluggable runtime environments, one can run Pulsar Functions as threads or processes managed by Pulsar, or as containers/pods managed by external schedulers like Kubernetes. This talk goes deep into the weeds of the concepts underlying the implementation. In particular, we will talk about the Runtime and Scheduler concepts that govern Pulsar-managed functions. We will also delve into current pitfalls and areas for improvement.
Speaker
David Kjerrumgaard, Splunk, Principal Software Engineer

Published in: Data & Analytics
  1. © 2020 SPLUNK INC. Pulsar Functions: A Deep Dive | Pulsar Summit 2020. David Kjerrumgaard, Principal Software Engineer@splunk.com
  2. Pulsar Functions: A Deep Dive (Agenda)
      • Brief introduction to Pulsar Functions
      • Deep dive into internals
        – Submission workflow
        – Scheduling workflow
        – Execution workflow
        – Java Instance concepts
      • Current/Future work
  4. Pulsar Functions: A Brief Introduction (Core Concept)
      • Bringing serverless concepts to the streaming world
      • Execute processing logic per message on the input topic
      • Function output goes to an output topic (optional)
      Figure: Abstract View
  5. Pulsar Functions: A Brief Introduction (Simple API)
      • Emphasis on simplicity
      • Great for 90% of use cases on streams: filtering, routing, enrichment
      • Not meant to replace Spark/Flink
      SDK-less API:
          import java.util.function.Function;

          public class ExclamationFunction implements Function<String, String> {
              @Override
              public String apply(String input) {
                  return input + "!";
              }
          }
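The filtering use case from the slide above can be written in the same SDK-less style. This is a minimal sketch, not code from the talk: the class name `ErrorFilterFunction` and the `"ERROR"` prefix are hypothetical, and it assumes that a `null` return value causes nothing to be published to the output topic.

```java
import java.util.function.Function;

// Hypothetical filter: forwards only lines that start with "ERROR".
class ErrorFilterFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        // Assumption: returning null means "publish nothing" for this message.
        return (input != null && input.startsWith("ERROR")) ? input : null;
    }
}
```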
  6. Pulsar Functions: A Brief Introduction (Function lifecycle)
      • Flexible execution environments
        – Pulsar managed: Thread, Process
        – Externally managed: Kubernetes
      • CRUD-based REST API
  8. Pulsar Functions: Submission Workflow (Function Representation)
      • Submit to any worker
      • JSON representation of FunctionConfig
        – tenant/namespace/name
        – Input/Output
        – configs
        – lot more knobs ...
      • Function code: jars/.py/zip/etc.
      FunctionConfig:
          public class FunctionConfig {
              private String tenant;
              private String namespace;
              private String name;
              private String className;
              private Collection<String> inputs;
              private String output;
              private ProcessingGuarantees processingGuarantees;
              private Map<String, Object> userConfig;
              private Map<String, Object> secrets;
              private Integer parallelism;
              private Resources resources;
              ...
          }
  9. Pulsar Functions: Submission Workflow (Submission Checks)
      • AuthN/AuthZ checks
      • FunctionConfig validation
        – missing parameters
        – incorrect parameters
        – local configs
      • Function code validation: class presence, etc.
      • Copy code to BookKeeper
      FunctionMetaData:
          message FunctionMetaData {
              FunctionDetails functionDetails;
              PackageLocationMetaData packageLocation;
              uint64 version;
              uint64 createTime;
              map<int32, FunctionState> instanceStates;
              FunctionAuthenticationSpec functionAuthSpec;
          }
  10. Pulsar Functions: Submission Workflow (Function MetaData Manager)
      • System of record
      • Stores all Functions: map from <FQFN, FunctionMetaData>
      • FQFN: Fully Qualified Function Name
      • Backed by a Pulsar topic (the Function MetaData Topic)
      • Contains a MetaData Topic Tailer
      Figure: Function MetaData Manager, MetaData Topic Tailer, MetaData Topic
      State: foo -> {functionDetails : {...}, version: 2, …}
  11. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      Just before Function creation/update/delete.
      Figure: MetaData Topic Tailer, MetaData Topic
      State: foo -> {functionDetails : {...}, version: 2, …}
  12. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      State: foo -> {functionDetails : {...}, version: 2, …}
      Copy: foo -> {functionDetails : {...}, version: 2, …}
  13. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      • Merge the updates
      State: foo -> {functionDetails : {...}, version: 2, …}
      Copy: foo -> {functionDetails : {......}, version: 2, …}
  14. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      • Merge the updates
      • Increment the version
      State: foo -> {functionDetails : {...}, version: 2, …}
      Copy: foo -> {functionDetails : {......}, version: 3, …}
  15. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      • Merge the updates
      • Increment the version
      • Write to MetaData Topic
      State: foo -> {functionDetails : {...}, version: 2, …}
  16. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      • Merge the updates
      • Increment the version
      • Write to MetaData Topic
      • Tailer reads and verifies
      State: foo -> {functionDetails : {...}, version: 2, …}
  17. Pulsar Functions: Submission Workflow (Function MetaData Manager: Update State Machine)
      • Make a copy of the current state
      • Merge the updates
      • Increment the version
      • Write to MetaData Topic
      • Tailer reads and verifies
      • Upon no conflict, tailer updates
      State: foo -> {functionDetails : {.....}, version: 3, …}
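The first three steps of the update state machine (copy, merge, increment) can be sketched in a few lines. This is an illustration under stated assumptions rather than the worker's actual code: `MetaRecord` is a hypothetical stand-in for `FunctionMetaData` that keeps only a details map and a version, and `UpdateProposer` is an invented helper name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for FunctionMetaData.
final class MetaRecord {
    final Map<String, String> details;
    final long version;

    MetaRecord(Map<String, String> details, long version) {
        this.details = Collections.unmodifiableMap(new HashMap<>(details));
        this.version = version;
    }
}

final class UpdateProposer {
    // Builds the record that would be written to the MetaData Topic.
    // The current state is left untouched until the tailer accepts the write.
    static MetaRecord propose(MetaRecord current, Map<String, String> updates) {
        Map<String, String> merged = new HashMap<>(current.details); // copy
        merged.putAll(updates);                                       // merge
        return new MetaRecord(merged, current.version + 1);           // increment
    }
}
```

Working on a copy means a rejected write leaves the in-memory state exactly as the tailer last set it.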
  18. Pulsar Functions: Submission Workflow (Function MetaData Manager: When do conflicts occur?)
      • Multiple Workers
      Figure: MetaData Topic; one MetaData Topic Tailer per worker
      Worker 1: foo -> {functionDetails : {...}, version: 2, …}
      Worker 2: foo -> {functionDetails : {...}, version: 2, …}
  19. Pulsar Functions: Submission Workflow (Function MetaData Manager: When do conflicts occur?)
      • Multiple Workers
      • Concurrent updates to same function
      Worker 1: foo -> {functionDetails : {...}, version: 2, …}
      Worker 2: foo -> {functionDetails : {...}, version: 2, …}
  20. Pulsar Functions: Submission Workflow (Function MetaData Manager: When do conflicts occur?)
      • Multiple Workers
      • Concurrent updates to same function
      • First Writer Wins
      Worker 1: foo -> {functionDetails : {...}, version: 3, …}
      Worker 2: foo -> {functionDetails : {...}, version: 3, …}
  21. Pulsar Functions: Submission Workflow (Function MetaData Manager: When do conflicts occur?)
      • Multiple Workers
      • Concurrent updates to same function
      • First Writer Wins
      • Others are rejected
      Worker 1: foo -> {functionDetails : {...}, version: 3, …}
      Worker 2: foo -> {functionDetails : {...}, version: 3, …}
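"First writer wins" falls out of every tailer applying the same rule to the same totally ordered topic. The sketch below shows one plausible acceptance rule; the class `MetaDataTailer` and the exact comparison are assumptions for illustration, not the worker's verified logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tailer-side view of the MetaData Topic on one worker.
final class MetaDataTailer {
    private final Map<String, Long> versions = new HashMap<>();

    // Applies one message read from the topic; returns false if it is rejected.
    // Assumption: a write is accepted only if its version is newer than the
    // version this worker already holds for that function.
    boolean onMessage(String fqfn, long proposedVersion) {
        long current = versions.getOrDefault(fqfn, 0L);
        if (proposedVersion <= current) {
            return false; // a concurrent writer got there first
        }
        versions.put(fqfn, proposedVersion);
        return true;
    }

    long versionOf(String fqfn) {
        return versions.getOrDefault(fqfn, 0L);
    }
}
```

If two workers both start from version 2 and both write version 3, the topic orders the two messages; every tailer accepts the first and rejects the second, so all workers converge on the same state without any extra coordination.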
  22. Pulsar Functions: Submission Workflow (Advantages)
      • Submit to any worker
      • Validation load scales linearly
      • Deterministic State Machine
      • MetaData Topic is audit log
      Worker 1: foo -> {functionDetails : {...}, version: 3, …}
      Worker 2: foo -> {functionDetails : {...}, version: 3, …}
  23. Pulsar Functions: Submission Workflow (Pitfalls)
      • MetaData Topic growth
      • MetaData Topic compaction non-trivial
      • Worker start time
      • All Workers know everything
      Worker 1: foo -> {functionDetails : {...}, version: 3, …}
      Worker 2: foo -> {functionDetails : {...}, version: 3, …}
  25. Pulsar Functions: Scheduling Workflow (Pluggable Scheduler)
      • Abstracts out Scheduler
      • Executed only on a Leader
      • Invoked when
        – Function CRUD operations: create/update, delete
        – Worker changes: unresponsive/dead workers, new workers, periodic, leadership changes
      IScheduler Interface:
          public interface IScheduler {
              List<Assignment> schedule(List<Instance> unassigned,
                                        List<Instance> current,
                                        Set<String> workers);
          }
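To make the scheduler contract concrete, here is a small least-loaded placement policy. It is a sketch under stated assumptions: `SimpleAssignment` and `LeastLoadedScheduler` are hypothetical names, instances are reduced to plain strings, and the policy is one possible choice rather than the scheduler Pulsar ships.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the Assignment message.
final class SimpleAssignment {
    final String instance;
    final String workerId;

    SimpleAssignment(String instance, String workerId) {
        this.instance = instance;
        this.workerId = workerId;
    }
}

final class LeastLoadedScheduler {
    // Places each unassigned instance on the worker that currently has the fewest.
    List<SimpleAssignment> schedule(List<String> unassigned,
                                    List<SimpleAssignment> current,
                                    Set<String> workers) {
        Map<String, Integer> load = new TreeMap<>(); // sorted keys keep ties deterministic
        for (String w : workers) {
            load.put(w, 0);
        }
        for (SimpleAssignment a : current) {
            load.computeIfPresent(a.workerId, (w, n) -> n + 1); // ignore dead workers
        }
        List<SimpleAssignment> result = new ArrayList<>();
        for (String instance : unassigned) {
            String target = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (target == null || e.getValue() < load.get(target)) {
                    target = e.getKey();
                }
            }
            if (target == null) {
                break; // no workers available
            }
            result.add(new SimpleAssignment(instance, target));
            load.merge(target, 1, Integer::sum);
        }
        return result;
    }
}
```

Because the function is pure (same inputs, same output), it can be re-run on every trigger listed on the slide without side effects.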
  26. Pulsar Functions: Scheduling Workflow (Leader Election)
      • Empty Coordination Topic
      • Failover Subscription
      • Active Consumer is the Leader
      Figure: Leader Election; Coordination Topic; Worker 1, Worker 2, Worker 3
  27. Pulsar Functions: Scheduling Workflow (Function Assignments)
      • Assignment Topic
      • Written by the Leader
      • Compacted based on key (FQFN + Instance Id)
      • All workers know about all assignments
      Figure: Assignment Topic; Worker 1, Worker 2, Worker 3, each with an Assignment Tailer holding {foo, 1} : worker-1, ...
  28. Pulsar Functions: Scheduling Workflow (Assignment Topic)
      • Stores Assignment
      • Compacted
      • Key -> (FQFN + InstanceId)
      Assignment:
          message Instance {
              FunctionMetaData functionMetaData = 1;
              int32 instanceId = 2;
          }

          message Assignment {
              Instance instance = 1;
              string workerId = 2;
          }
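Compaction by key means that only the latest assignment for each (FQFN, instance id) pair matters to a tailer. A minimal sketch of that view follows; `AssignmentTable` and the `":"` separator in the key are hypothetical choices made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory view an Assignment Tailer could build from the topic.
final class AssignmentTable {
    private final Map<String, String> latest = new LinkedHashMap<>();

    // Compaction key as described on the slide: FQFN + instance id.
    static String key(String fqfn, int instanceId) {
        return fqfn + ":" + instanceId;
    }

    // Replays one message from the Assignment Topic; later messages win.
    void apply(String fqfn, int instanceId, String workerId) {
        latest.put(key(fqfn, instanceId), workerId);
    }

    String workerFor(String fqfn, int instanceId) {
        return latest.get(key(fqfn, instanceId));
    }

    int size() {
        return latest.size();
    }
}
```

Replaying the full topic and replaying its compacted form yield the same table, which is why a newly started worker only needs the compacted view.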
  30. Pulsar Functions: Execution Workflow (Function RunTime Manager)
      • Triggered by changes to the Assignment Table
      • Takes care of the worker’s specific assignments
      • Function lifecycle management via Spawner
      Figure: Assignment Topic; Worker with Assignment Tailer ({foo, 1} : worker-1, ...), RunTime Manager, and two Spawners
  31. Pulsar Functions: Execution Workflow (Spawner)
      • Abstracts out execution environments using Runtime Factory
      • Manages Function lifecycle
      • Maintains gRPC connection with Function instance
      Figure: Spawner, GRPC Channel, Function
  32. Pulsar Functions: Execution Workflow (Local Runner)
      • Short-circuit MetaData Manager and Runtime Manager
      • Directly use Spawner
      Figure: Local Runner, Spawner, GRPC Channel, Function
  33. Pulsar Functions: Execution Workflow (Runtime Factory)
      • Simple interface for creating execution environments
      • Creates Runtimes
      Runtime Factory:
          public interface RuntimeFactory {
              void initialize(WorkerConfig workerConfig);
              Runtime createContainer(InstanceConfig instanceConfig, String codeFile);
              void close();
          }
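A thread-based factory is the simplest way to picture the factory/runtime split. The sketch below is illustrative only: `SimpleRuntime`, `SimpleRuntimeFactory`, and `ThreadRuntimeFactory` are hypothetical, trimmed-down stand-ins, and the real `Runtime` and `InstanceConfig` types carry far more than a name and a `Runnable`.

```java
// Hypothetical, simplified stand-ins for the Runtime and RuntimeFactory types.
interface SimpleRuntime {
    void start();
    void stop();
}

interface SimpleRuntimeFactory {
    SimpleRuntime createContainer(String instanceName, Runnable instanceBody);
}

// Thread-based factory: each function instance runs on its own thread.
final class ThreadRuntimeFactory implements SimpleRuntimeFactory {
    @Override
    public SimpleRuntime createContainer(String instanceName, Runnable instanceBody) {
        final Thread thread = new Thread(instanceBody, instanceName);
        return new SimpleRuntime() {
            @Override
            public void start() {
                thread.start();
            }

            @Override
            public void stop() {
                thread.interrupt(); // ask the instance to exit
                try {
                    thread.join();  // wait until it has
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the caller's interrupt
                }
            }
        };
    }
}
```

A process or Kubernetes factory would implement the same two interfaces, which is what lets the Spawner stay unaware of where the instance actually runs.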
  35. Pulsar Functions: Java Instance (Source -> Process -> Sink)
      • A Java Instance is a (source, function, sink) ensemble
      • Source abstracts reading from input topics
      • Sink abstracts writing to the output topic
      Figure: Java Instance: Source -> f -> Sink
  36. Pulsar Functions: Java Instance (Regular Pulsar Functions)
      • Pulsar Source implements the Source interface to read from Pulsar topics
      • Pulsar Sink implements the Sink interface to write to a Pulsar topic
      Figure: Java Instance: Pulsar Source -> f -> Pulsar Sink
  37. Pulsar Functions: Java Instance
      • What if we have a non-Pulsar Source?
      Figure: Java Instance: Non Pulsar Source -> f -> Pulsar Sink
  38. Pulsar Functions: Java Instance (Pulsar IO)
      Figure: Java Instance: Non Pulsar Source -> f -> Pulsar Sink
  39. Pulsar Functions: Java Instance (Pulsar IO Source)
      • Non Pulsar Source reads from an external system
      • Identity Function lets the data pass through
      • Pulsar Sink writes to Pulsar
      Figure: Java Instance: Non Pulsar Source -> Identity -> Sink
  40. Pulsar Functions: Java Instance (Pulsar IO Sink)
      • Pulsar Source reads from Pulsar topics
      • Identity Function lets the data pass through
      • Non Pulsar Sink writes to an external system
      Figure: Java Instance: Pulsar Source -> Identity -> Non Pulsar Sink
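The (source, function, sink) ensemble, and the way an identity function turns it into a connector, can be shown in miniature. This is a sketch with hypothetical names: `MiniInstance` models the source as an `Iterator` and the sink as a `Consumer`, which is much simpler than the real Source and Sink interfaces.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical miniature of the (source, function, sink) ensemble.
final class MiniInstance<I, O> {
    private final Iterator<I> source;
    private final Function<I, O> function;
    private final Consumer<O> sink;

    MiniInstance(Iterator<I> source, Function<I, O> function, Consumer<O> sink) {
        this.source = source;
        this.function = function;
        this.sink = sink;
    }

    // Drains the source, applies the function, and writes non-null results to the sink.
    void run() {
        while (source.hasNext()) {
            O out = function.apply(source.next());
            if (out != null) {
                sink.accept(out);
            }
        }
    }
}
```

Passing `Function.identity()` with an external-system source gives the Pulsar IO Source shape; passing it with an external-system sink gives the Pulsar IO Sink shape. The loop itself does not change.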
  42. Pulsar Functions: Future Work (Dynamic Runtime Selection)
      • Each setup only supports a static Runtime (Process/Thread/Pods)
      • Change it to be dynamically specified during submission
      • Function RunTime Manager changes
      Figure: Assignment Topic; Worker with Assignment Tailer ({foo, 1} : worker-1, ...), RunTime Manager, Spawner for Function-1 (Thread), Spawner for Function-2 (Process)
  43. Pulsar Functions: Future Work (Function MetaData Topic Compaction)
      • MetaData Topic is not compacted
      • Stores all function change requests
      • Worker needs to read from the beginning upon startup
      Figure: MetaData Topic Tailer, MetaData Topic
      State: foo -> {functionDetails : {...}, version: 2, …}
  44. Pulsar Functions: Future Work (Function Mesh)
      • Chaining Functions
      • Output of one going as input of others
      • A simple workflow API
      Figure: f1, f2, f3, f4
  45. Pulsar Functions: Future Work (Batch Sources)
      • Discover/Collect cycle
      • Repeating cycle
      • Don’t drop discovered tasks on failures
      BatchSource:
          public interface BatchSource<T> {
              void open(final Map<String, Object> config, SourceContext context);
              void discover(Consumer<byte[]> taskEater);
              void prepare(byte[] task);
              Record<T> readNext();
          }
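One discover/collect cycle can be sketched against a trimmed-down version of the interface. Everything here is illustrative: `MiniBatchSource`, `LinesBatchSource`, and `BatchDriver` are hypothetical names, `Record` and `SourceContext` are omitted, the task names `"a"` and `"b"` are made up, and the sketch assumes that `readNext()` returning `null` marks the end of the current task.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for BatchSource (no Record or SourceContext).
interface MiniBatchSource<T> {
    void discover(Consumer<byte[]> taskEater);
    void prepare(byte[] task);
    T readNext(); // assumption: null signals that the current task is exhausted
}

// Hypothetical source: each task names a batch that holds two records.
final class LinesBatchSource implements MiniBatchSource<String> {
    private final Deque<String> pending = new ArrayDeque<>();

    @Override
    public void discover(Consumer<byte[]> taskEater) {
        for (String name : new String[] {"a", "b"}) {
            taskEater.accept(name.getBytes(StandardCharsets.UTF_8));
        }
    }

    @Override
    public void prepare(byte[] task) {
        String name = new String(task, StandardCharsets.UTF_8);
        pending.add(name + "-1");
        pending.add(name + "-2");
    }

    @Override
    public String readNext() {
        return pending.poll(); // null once the task's records are drained
    }
}

final class BatchDriver {
    // One discover/collect cycle: discover tasks, then drain each task in turn.
    static List<String> runOnce(MiniBatchSource<String> source) {
        List<byte[]> tasks = new ArrayList<>();
        source.discover(tasks::add);
        List<String> records = new ArrayList<>();
        for (byte[] task : tasks) {
            source.prepare(task);
            for (String r = source.readNext(); r != null; r = source.readNext()) {
                records.add(r);
            }
        }
        return records;
    }
}
```

Here the discovered tasks sit in an in-memory list; keeping them somewhere durable instead is what would satisfy the "don't drop discovered tasks on failures" goal from the slide.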
  46. Thank You

