Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022

Pulsar Functions is a lightweight framework provided by Apache Pulsar for real-time data processing. Its use cases include ETL pipelines, event-driven applications, and simple data analytics. While Pulsar Functions already provides a very simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages of the technology world and is well accepted by the vast majority of data engineers, we decided to add a SQL expression layer on top of the Pulsar Functions runtime. In this talk, we discuss the architecture and implementation of this new service. We show how SQL syntax, Pulsar Functions, and Function Mesh work together to deliver a unique development experience for real-time data jobs in the cloud. We also walk through use cases such as filtering, routing, and projecting messages, as well as integration with the Pulsar IO Connectors framework.

  1. 1. Simplify Pulsar Functions Development with SQL. Neng Lu, Platform Engineering Lead, StreamNative. Ecosystem. Pulsar Summit San Francisco, Hotel Nikko, August 18, 2022.
  2. 2. Neng Lu, Platform Engineering Lead, StreamNative: Neng Lu is the platform engineering lead of compute at StreamNative. He drives the development of Pulsar Functions, serverless computing, and ecosystem integration. He is also a committer of Apache Pulsar. Rui Fu, Senior Software Engineer, StreamNative: Rui Fu is a senior software engineer at StreamNative and a committer of Apache Pulsar. He actively contributes to Pulsar Functions, Function Mesh, and serverless computing.
  3. 3. Pulsar Functions
  4. 4. Pulsar Functions – Recap
  5. 5. Pulsar Functions – Recap
  6. 6. Pulsar Functions – Use Cases ● ETL(Extract-Transform-Load) Jobs ● Microservices ● Event Routing ● Real-time Aggregation
  7. 7. Pulsar Functions – Benefits ● Easy Operation ○ Fully Integrated with Pulsar ○ No Extra Setup Needed ● Easy Development ○ Intuitive APIs: ■ Java: public O process(I input, Context context) ■ Python: def process(self, input, context) ■ Golang: func HandleRequest(ctx context.Context, in []byte) error
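
To make the Python signature on this slide concrete, here is a minimal sketch of a function with the `process(self, input, context)` shape. It is written without the Pulsar client so it runs on its own; in a real deployment the class would extend `pulsar.Function` from the Pulsar Python client. `ExclamationFunction` and `FakeContext` are hypothetical names used only for illustration.

```python
class FakeContext:
    """Hypothetical stand-in for the runtime-provided context (local testing only)."""


class ExclamationFunction:
    """Appends '!' to every input message.

    Assumption: in a real Pulsar deployment this class would subclass
    pulsar.Function; the base class is omitted so the sketch is self-contained.
    """

    def process(self, input, context):
        # The return value is what the runtime would publish to the output topic.
        return input + "!"


if __name__ == "__main__":
    print(ExclamationFunction().process("hello", FakeContext()))  # prints "hello!"
```

The point of the slide is that this single method is the whole programming surface: the runtime handles consuming, acknowledging, and publishing.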
  8. 8. Easier Operation?
  9. 9. Function Worker Recap ● Function Worker interleaves with Pulsar Broker ● Need to set up separate Function Worker cluster ● Function Worker relies on Pulsar Topics for scheduling ● Function Worker’s k8s runtime not truly cloud native
  10. 10. Function Mesh
  11. 11. Function Mesh – Recap ● Serverless framework to run Pulsar Functions in a cloud native way ● Consists of: ○ Set of CRDs for defining Pulsar Functions and Connectors ■ Function ■ Source ■ Sink ○ Operator that constantly reconciles the submitted CR ■ create sts, service, configmap, etc. ■ update according to user change ■ auto-scale if configured
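
The reconcile behaviour described on this slide (create child resources, update them on user change) can be sketched as a comparison between desired and actual state. This is a generic illustration of the operator pattern, not Function Mesh's own code; the resource names and the `reconcile` helper are hypothetical.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` child resources toward `desired`.

    Both arguments map a resource name to its spec. This only models the
    create/update/delete decision, not the Kubernetes API calls.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))   # resource is missing
        elif actual[name] != spec:
            actions.append(("update", name))   # user changed the CR
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))   # resource is no longer wanted
    return actions


if __name__ == "__main__":
    desired = {
        "statefulset": {"replicas": 2},
        "service": {"port": 9093},
        "configmap": {"logLevel": "info"},
    }
    actual = {
        "statefulset": {"replicas": 1},
        "service": {"port": 9093},
    }
    print(reconcile(desired, actual))
```

An operator runs a loop like this whenever a custom resource or one of its child resources changes, so the cluster converges on what the user declared.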
  12. 12. Function Mesh – Architecture
  13. 13. Function Mesh – Summary ● Scheduling by Kubernetes, not Function Worker ○ Simplicity ○ Reliability ○ Stability (both for functions & brokers) ○ Extensibility (HPA, VPA, Scale-To-Zero, etc.) ● Compatible with Pulsar Admin REST API ○ Seamless user experience
  14. 14. Easier Development?
  15. 15. Use Case 1 – Filtering/Routing ● Commonly used for different business purposes → duplicated code development ● Go through the whole Pulsar Functions dev life cycle ○ (Learn) ○ Develop ○ Package ○ Debug ○ Deploy
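
As an illustration of the kind of hand-written code this slide refers to, the following sketch filters and routes messages inside a `process` method. The order fields, the threshold, and the topic names are invented for the example, and `RecordingContext` is a hypothetical stand-in that records publishes instead of sending them to Pulsar.

```python
import json


class RecordingContext:
    """Hypothetical stand-in for the function context; records publishes."""

    def __init__(self):
        self.published = []

    def publish(self, topic, message):
        self.published.append((topic, message))


class OrderRouter:
    """Hand-written filter + router: drops small orders, routes the rest by region."""

    def process(self, input, context):
        order = json.loads(input)
        if order.get("amount", 0) < 100:
            return None  # filter: ignore small orders
        topic = "orders-" + order.get("region", "unknown")
        context.publish(topic, input)  # route: one topic per region
        return None  # nothing goes to the default output topic


if __name__ == "__main__":
    ctx = RecordingContext()
    router = OrderRouter()
    router.process('{"amount": 150, "region": "eu"}', ctx)
    router.process('{"amount": 20, "region": "us"}', ctx)
    print(ctx.published)
```

Every team that needs a slightly different predicate or target topic ends up rewriting, packaging, and deploying a variation of this class, which is the duplication the slide points at.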
  16. 16. Use Case 2 – Connector with Transformations ● Long pipeline: ○ Connector ○ Transformation Function (Often duplicated with minor diffs) ○ Intermediate topic ● Go through the Pulsar Functions life cycle TWICE: ○ Connector ■ Develop(optional) ■ … ○ Transformation Function ■ Develop ■ Package ■ …
  17. 17. Any Solution?
  18. 18. SELECT * FROM StreamNative
  19. 19. SQL Abstraction – Why? ● Easiest to learn and apply ● Wide audience ● Safe & Controlled Operations ● Easy job life-cycle management ● Stream Processing Trend
  20. 20. SQL Abstraction – What? ● IS ○ a simplified way to develop Pulsar Functions pipelines ● IS NOT ○ an interactive tool to run ad-hoc queries
  21. 21. SQL Abstraction – Components ● Gateway ● Runner ● CLI
  22. 22. SQL Abstraction – Gateway ● Parser <-> Runner ● REST API Server <-> CLI
  23. 23. SQL Gateway – Parser ● Antlr4 grammar ● AST processor ● JSON representation ● Pipeline: SQL Statement → Abstract Syntax Tree → JSON Representation
  24. 24. Parser – Grammar
  25. 25. SQL Abstraction – Syntax ● Value Expression ○ Literal: Primitive value, like string, number, or boolean ○ Field: message payload field ○ KEY: message key ○ PROPERTIES[P_KEY]: message property ● WITH Item Definition ○ WITH MERGE KEYVALUE: Merge the fields of KeyValue schema ○ WITH UNWRAP KEY|VALUE: Extract Key or Value fields from KeyValue schema
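
The four value-expression kinds listed above can be sketched as a small evaluator over a message. The `Message` class and the dictionary encoding of expressions are assumptions made for this example; the slides do not show how the real implementation represents them.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message model: key, payload fields, and properties."""
    key: str
    value: dict
    properties: dict = field(default_factory=dict)


def evaluate(expr, msg):
    """Evaluate one value expression against a message.

    The {"type": ...} encoding is illustrative, not the project's format.
    """
    kind = expr["type"]
    if kind == "literal":    # primitive value: string, number, or boolean
        return expr["value"]
    if kind == "field":      # message payload field
        return msg.value.get(expr["name"])
    if kind == "key":        # KEY
        return msg.key
    if kind == "property":   # PROPERTIES[P_KEY]
        return msg.properties.get(expr["name"])
    raise ValueError("unknown expression type: " + repr(kind))


if __name__ == "__main__":
    msg = Message(key="user-1", value={"amount": 42}, properties={"source": "web"})
    print(evaluate({"type": "field", "name": "amount"}, msg))
    print(evaluate({"type": "key"}, msg))
    print(evaluate({"type": "property", "name": "source"}, msg))
```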
  26. 26. Parser – Examples
  27. 27. Parser – AST
  28. 28. Parser – JSON Representation ● Intermediate Representation ○ Filter ○ Router ○ Projection ○ WITH Conditions
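
To show what a statement-to-JSON step might look like, here is a toy parser that emits an intermediate representation with filter, router, and projection parts. Two things are assumptions: the `INSERT INTO ... SELECT ... FROM ... WHERE ...` statement shape and the JSON field names are invented for this sketch (the grammar and examples were on image-only slides), and a regular expression is used in place of the Antlr4 grammar and AST processor the talk describes.

```python
import json
import re

# Assumed statement shape (not the project's actual grammar):
#   INSERT INTO <target> SELECT <cols> FROM <source> [WHERE <field> <op> <literal>]
_STATEMENT = re.compile(
    r"^\s*INSERT\s+INTO\s+(?P<target>\S+)\s+"
    r"SELECT\s+(?P<columns>.+?)\s+"
    r"FROM\s+(?P<source>[^\s;]+)"
    r"(?:\s+WHERE\s+(?P<field>\w+)\s*(?P<op>>=|<=|!=|=|>|<)\s*(?P<literal>[^\s;]+))?"
    r"\s*;?\s*$",
    re.IGNORECASE,
)


def _literal(token):
    """Convert a literal token to int, bool, or string (no spaces in strings)."""
    if re.fullmatch(r"-?\d+", token):
        return int(token)
    if token.lower() in ("true", "false"):
        return token.lower() == "true"
    return token.strip("'\"")


def parse(sql):
    """Parse one statement into a JSON-serialisable intermediate representation."""
    match = _STATEMENT.match(sql)
    if match is None:
        raise ValueError("unsupported statement: " + sql)
    columns = [c.strip() for c in match.group("columns").split(",")]
    ir = {
        "source": match.group("source"),
        "router": {"target": match.group("target")},
        "projection": None if columns == ["*"] else columns,
        "filter": None,
    }
    if match.group("field"):
        ir["filter"] = {
            "field": match.group("field"),
            "op": match.group("op"),
            "value": _literal(match.group("literal")),
        }
    return ir


if __name__ == "__main__":
    sql = "INSERT INTO big_orders SELECT id, amount FROM orders WHERE amount >= 100;"
    print(json.dumps(parse(sql), indent=2))
```

Keeping the output as plain JSON means the Runner does not need the parser on its classpath; it only has to understand the intermediate representation.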
  29. 29. SQL Abstraction – Runner ● An implementation of Pulsar Functions API ● Accept the JSON representation ● Generate Filtering/Routing processor during initialization ● Utilize `GenericObject` to handle different schemas ● Directly push result into target topic
  30. 30. SQL Abstraction – Runner ● Processor ○ An interface for classes that implement data transformations ○ schema projections ○ data manipulations ○ data type conversions ● Chain Compiler ○ List<Processor> ○ Compiled from the SQL Context
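
The Processor interface and the chain compiler can be sketched as follows. The class names, the shape of the `ir` dictionary, and the convention that a processor returns `None` to drop a record are assumptions for illustration; the talk only states that the chain is a `List<Processor>` compiled from the SQL context.

```python
import operator
from abc import ABC, abstractmethod
from typing import List, Optional

_OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


class Processor(ABC):
    """One transformation step; returning None drops the record."""

    @abstractmethod
    def process(self, record: dict) -> Optional[dict]:
        ...


class FilterProcessor(Processor):
    def __init__(self, field, op, value):
        self.field, self.compare, self.value = field, _OPS[op], value

    def process(self, record):
        if self.field not in record:
            return None  # records without the field never match
        return record if self.compare(record[self.field], self.value) else None


class ProjectionProcessor(Processor):
    def __init__(self, columns):
        self.columns = columns

    def process(self, record):
        return {c: record[c] for c in self.columns if c in record}


def compile_chain(ir: dict) -> List[Processor]:
    """Build the processor list from a (hypothetical) JSON representation."""
    chain: List[Processor] = []
    if ir.get("filter"):
        f = ir["filter"]
        chain.append(FilterProcessor(f["field"], f["op"], f["value"]))
    if ir.get("projection"):
        chain.append(ProjectionProcessor(ir["projection"]))
    return chain


def run_chain(chain: List[Processor], record: dict) -> Optional[dict]:
    """Apply each processor in order, stopping as soon as one drops the record."""
    for processor in chain:
        record = processor.process(record)
        if record is None:
            return None
    return record


if __name__ == "__main__":
    ir = {"filter": {"field": "amount", "op": ">=", "value": 100}, "projection": ["id"]}
    chain = compile_chain(ir)
    print(run_chain(chain, {"id": 1, "amount": 150}))
    print(run_chain(chain, {"id": 2, "amount": 5}))
```

Compiling the chain once during initialization keeps the per-message path to a simple loop over prebuilt processors.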
  31. 31. SQL Gateway – REST APIs ● Query Management ○ POST /snsql/query ○ GET /snsql/query/pause/$NAME ○ GET /snsql/query/resume/$NAME ○ GET /snsql/query/delete/$NAME ○ GET /snsql/query/status/$NAME ○ GET /snsql/query/stats/$NAME ● Gateway Information ○ GET /snsql/info ○ GET /snsql/healthcheck
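
A client for these endpoints could start from a route table like the one below. The paths and HTTP methods come from the slide; the base URL, the action names, and the `build_request` helper are hypothetical, and the request body for `POST /snsql/query` is left to the caller because its format is not shown in the talk. The sketch only builds requests and never sends them.

```python
from urllib.parse import quote
from urllib.request import Request

# Paths and methods as listed on the slide; action names are made up here.
ROUTES = {
    "submit": ("POST", "/snsql/query"),
    "pause": ("GET", "/snsql/query/pause/{name}"),
    "resume": ("GET", "/snsql/query/resume/{name}"),
    "delete": ("GET", "/snsql/query/delete/{name}"),
    "status": ("GET", "/snsql/query/status/{name}"),
    "stats": ("GET", "/snsql/query/stats/{name}"),
    "info": ("GET", "/snsql/info"),
    "healthcheck": ("GET", "/snsql/healthcheck"),
}


def build_request(base_url, action, name=None, body=None):
    """Build (but do not send) the HTTP request for a gateway action."""
    method, template = ROUTES[action]
    if "{name}" in template:
        if not name:
            raise ValueError("action %r needs a query name" % action)
        path = template.format(name=quote(name, safe=""))
    else:
        path = template
    return Request(base_url.rstrip("/") + path, data=body, method=method)


if __name__ == "__main__":
    # http://localhost:8080 is a placeholder address, not a documented default.
    req = build_request("http://localhost:8080", "pause", name="my-query")
    print(req.get_method(), req.full_url)
```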
  32. 32. SQL Gateway – REST Server ● Quarkus Framework ○ easy to implement ○ cloud-native support ● Metadata Management ○ write into Pulsar topic ○ read with TableView API
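
The metadata pattern on this slide (append updates to a topic, read the latest value per key through a table view) can be modelled in memory. This sketch does not call the Pulsar client; the list stands in for the topic, `QueryMetadataStore` is a hypothetical name, and treating a `None` value as a delete marker is an assumption about how removals would be expressed.

```python
class QueryMetadataStore:
    """In-memory model of 'write to a topic, read the latest value per key'."""

    def __init__(self):
        self._log = []  # stands in for the Pulsar topic (append-only)

    def put(self, name, metadata):
        self._log.append((name, metadata))

    def delete(self, name):
        # Assumption: a None value marks the key as removed.
        self._log.append((name, None))

    def view(self):
        """Replay the log and keep only the latest value for each key."""
        table = {}
        for key, value in self._log:
            if value is None:
                table.pop(key, None)
            else:
                table[key] = value
        return table


if __name__ == "__main__":
    store = QueryMetadataStore()
    store.put("q1", {"state": "running"})
    store.put("q2", {"state": "running"})
    store.put("q1", {"state": "paused"})
    store.delete("q2")
    print(store.view())
```

Because the topic is the source of truth, any gateway instance can rebuild the same view by replaying it, which avoids a separate metadata database.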
  33. 33. SQL Abstraction – CLI ● Terminal based tool ● Interact with the SQL gateway APIs ● Query management
  34. 34. SQL Abstraction – Summary
  35. 35. Demo
  36. 36. Future Work ● Syntax support for Source/Sink ● Builtin system function support ● Aggregation Operation ● Join Operation
  37. 37. Resources ● Pulsar Functions: https://pulsar.apache.org/docs/functions-overview/ ● Function Mesh: https://functionmesh.io/ ● Slack & Mailing List: ○ Apache Pulsar Slack: https://apache-pulsar.slack.com/ ○ StreamNative Community Slack: https://streamnativecommunity.slack.com/ ○ Apache Pulsar Mailing List: ■ users@pulsar.apache.org ■ dev@pulsar.apache.org
  38. 38. Thank you! ● Neng Lu: nlu@streamnative.io, @nlu90, nlu ● Rui Fu: rfu@streamnative.io, @freeznet, rfu
