Operational Analytics on Event Streams in Kafka

Speaker: Anirudh Ramanathan, Product Manager, Rockset

Tracking key events and analyzing these event streams are critical to many enterprises. We highlight how organizations are using Apache Kafka® as a fast, reliable event streaming platform alongside Rockset, a serverless search and analytics engine, to build stateful microservices that analyze their event streams.

In this talk, we will discuss a stateful microservices architecture in which events from multiple channels are collected, streamed into Kafka, and continuously ingested into Rockset with no explicit schema or metadata specification required. Developers then use serverless compute frameworks, like AWS Lambda, in conjunction with serverless data management from Rockset to build microservices that derive insights from the data in Kafka. Organizations can use this pattern to support low-latency queries on event streams, gaining immediate insight into their business.
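
The serverless half of this pattern can be sketched as a Lambda-style handler that holds no state of its own and delegates every request to a SQL query against the ingested events. This is a minimal sketch under stated assumptions, not the code from the talk: the `events` collection, the `event_type` field, and the `run_query` callable are hypothetical, and the stand-in `fake_run_query` replaces whatever client call a real deployment would make to the analytics engine.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical query: the collection name and field names are illustrative only.
TOP_EVENTS_SQL = (
    "SELECT event_type, COUNT(*) AS n "
    "FROM events "
    "GROUP BY event_type ORDER BY n DESC LIMIT 5"
)

# Assumption: the query backend is injected as a function that takes a SQL
# string and returns a list of row dictionaries.
QueryRunner = Callable[[str], List[Dict[str, Any]]]


def make_handler(run_query: QueryRunner):
    """Build a Lambda-style handler around an injected query function."""

    def handler(event, context):
        # The function keeps no state between invocations; all state lives
        # in the datastore that continuously ingests from Kafka.
        rows = run_query(TOP_EVENTS_SQL)
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"top_events": rows}),
        }

    return handler


def fake_run_query(sql: str) -> List[Dict[str, Any]]:
    """Stand-in for a real client call, so the sketch runs offline."""
    return [{"event_type": "click", "n": 3}, {"event_type": "view", "n": 1}]


handler = make_handler(fake_run_query)
response = handler({}, None)
print(response["body"])
```

Injecting the query function keeps the handler testable without network access; in a real service it would be swapped for a call to the engine's client SDK or REST API.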

  1. Operational Analytics on Event Streams in Kafka
     Version 1.0. Anirudh Ramanathan, Rockset. Twitter: foxish_
  2. About
     Me
     ● Product Team @ Rockset
     ● Ex Google Cloud
     ● K8s Contributor & Release Team (with a focus on Controllers, Operators)
     ● Apache Spark committer
     Rockset
     ● Search and analytics engine
     ● Enables real-time applications and operational analytics
  3. Overview
     ● Streaming Data
     ● Kafka & Analytics
     ● Operational Analytics with Rockset
     ● Under the Hood
     ● Live Demo: building a microservice to analyze streaming event data
     ● Q&A
  4. Why streaming data?
     Event data
     ● Advertising
     ● Financial Transactions
     ● Web clicks
     ● Online Gaming Interactions
     ● IoT - sensor data
     ● Travel Bookings
     Between systems
     ● Data enrichment & transformation
     ● Moving data between sources and sinks
     ● Event driven architectures
  5. Kafka and streaming data
     ● Apache Kafka is a distributed streaming platform.
     ● Widely used for…
       ○ building systems that transform or react to streams of data
       ○ building pipelines that reliably get data between systems
     Source: https://kafka.apache.org/intro
  6. Why move streaming data?
     When operating on streaming data, some common patterns emerge:
     ● Trigger downstream events, alerts, etc., based on certain conditions
     ● Write into data lakes & warehouses for archival
     ● Write into datastores for analytics
  7. Analytics on Streaming Data
  8. Operational Analytics
     Considerations for an operational analytics engine
     ● Data Latency - How up-to-date is my database?
     ● Query - Can I express complex queries, JOINs, etc.?
     ● Query Latency - How fast are my individual queries?
     ● Query Throughput - Can I serve many users and requests?
     ● Retention - Lifecycle management for incoming events
     ● Operations - Maintenance, capacity planning and scaling
  9. Rockset for Operational Analytics
     ● Low latency indexing and querying
     ● Full featured SQL
     ● Continuous Ingest
     ● Time-based data retention
     ● REST API and Client SDKs
     ● Low operational overhead - availability, replicas, sharding handled under the hood
  10. Kafka Connect
      ● Component of Apache Kafka
      ● Framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.
      ● Solves schema management, fault tolerance, parallelism, scaling, delivery semantics and monitoring in a consistent way for sources and sinks.
      Source: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  11. [No text on this slide]
  12. Under the Hood
  13. Converged Indexes
      ● Multiple indexes are built and maintained behind the scenes
      ● Fast analytical queries + fast search queries
      ● SQL Optimizer picks between columnar store or search index
      ● All indexes stored on write-optimized RocksDB layer
      Search index:
        SELECT * FROM search_logs WHERE keyword = 'kafka' AND locale = 'en'
      Columnar store:
        SELECT keyword, count(*) FROM search_logs GROUP BY keyword ORDER BY count(*) DESC
  14. Smart Schemas
      ● Type information stored with values, not “columns”
      ● Strongly typed queries on dynamically typed fields
      ● No explicit schema definition needed
      ● Designed for nested semi-structured data
  15. Disaggregated, Cloud-Native Architecture
      ● Ingest, query, and storage tiers grow and shrink automatically
      ● Hierarchical storage (RAM, SSDs, Object Storage)
      ● Uses hardware elasticity in the cloud with node autoscaling
      ● K8s custom controllers scale up the data and compute tiers with pod autoscaling
  16. Disaggregated, Cloud-Native Architecture
  17. Live Demo: building a microservice to analyze streaming event data
  18. Building a Stateful Microservice
  19. Ingest
      ● No schema definition
      ● Continuous sync of new data from data sources
      Query
      ● Full SQL across both structured and semi-structured data
      Build
      ● Fast query serving at scale to power your applications and dashboards
  20. Start Building Now: rockset.com
  21. Thank You
      Anirudh Ramanathan
      anirudh@rockset.com
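
The two queries on the Converged Indexes slide can be tried against a tiny in-memory table to see the difference in shape between a selective search query and an aggregation. This is only an illustrative sketch: it uses Python's built-in `sqlite3` as a stand-in (not Rockset), the sample rows are made up, and it says nothing about which index an optimizer would pick.

```python
import sqlite3

# In-memory stand-in database; the sample rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_logs (keyword TEXT, locale TEXT)")
conn.executemany(
    "INSERT INTO search_logs VALUES (?, ?)",
    [
        ("kafka", "en"),
        ("kafka", "en"),
        ("kafka", "de"),
        ("sql", "en"),
        ("sql", "fr"),
        ("rocksdb", "en"),
    ],
)

# Selective lookup: few rows match, all fields returned.
# The slide routes this shape of query to the search index.
search_rows = conn.execute(
    "SELECT * FROM search_logs WHERE keyword = 'kafka' AND locale = 'en'"
).fetchall()

# Aggregation: every row is scanned, but only one field is read.
# The slide routes this shape of query to the columnar store.
agg_rows = conn.execute(
    "SELECT keyword, count(*) FROM search_logs "
    "GROUP BY keyword ORDER BY count(*) DESC"
).fetchall()

print(search_rows)
print(agg_rows)
conn.close()
```

The first query touches a handful of matching rows, while the second reads a single field from every row, which is the distinction the slide draws between the two index types.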
