Successfully reported this slideshow.
Your SlideShare is downloading. ×

Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 33 Ad

Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection

Download to read offline

As the demand for real-time data processing continues to grow, so too do the challenges associated with building production-ready applications that can handle large volumes of data and handle it quickly. In this talk, we will explore common problems faced when building real-time applications at scale, with a focus on a specific use case: detecting and responding to cyclist crashes. Using telemetry data collected from a fitness app, we’ll demonstrate how we used a combination of Apache Kafka and Python-based microservices running on Kubernetes to build a pipeline for processing and analyzing this data in real-time. We'll also discuss how we used machine learning techniques to build a model for detecting collisions and how we implemented notifications to alert family members of a crash. Our ultimate goal is to help you navigate the challenges that come with building data-intensive, real-time applications that use ML models. By showcasing a real-world example, we aim to provide practical solutions and insights that you can apply to your own projects.

Key takeaways:
An understanding of the common challenges faced when building real-time applications at scale

Strategies for using Apache Kafka and Python-based microservices to process and analyze data in real-time

Tips for implementing machine learning models in a real-time application

Best practices for responding to and handling critical events in a real-time application

As the demand for real-time data processing continues to grow, so too do the challenges associated with building production-ready applications that can handle large volumes of data and handle it quickly. In this talk, we will explore common problems faced when building real-time applications at scale, with a focus on a specific use case: detecting and responding to cyclist crashes. Using telemetry data collected from a fitness app, we’ll demonstrate how we used a combination of Apache Kafka and Python-based microservices running on Kubernetes to build a pipeline for processing and analyzing this data in real-time. We'll also discuss how we used machine learning techniques to build a model for detecting collisions and how we implemented notifications to alert family members of a crash. Our ultimate goal is to help you navigate the challenges that come with building data-intensive, real-time applications that use ML models. By showcasing a real-world example, we aim to provide practical solutions and insights that you can apply to your own projects.

Key takeaways:
An understanding of the common challenges faced when building real-time applications at scale

Strategies for using Apache Kafka and Python-based microservices to process and analyze data in real-time

Tips for implementing machine learning models in a real-time application

Best practices for responding to and handling critical events in a real-time application

Advertisement
Advertisement

More Related Content

Similar to Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection (20)

More from Anant Corporation (20)

Advertisement

Recently uploaded (20)

Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case Study in Cyclist Crash Detection

  1. 1. Quix / Data Natives 2022 How to build real-time ML apps How to work with Time-series data in Kafka using Python 1
  2. 2. Quix / Data Natives 2022 Hello, nice to meet you! CTO and co-founder at Quix Previously McLaren technical lead 2 Tomas Neubauer
  3. 3. Quix / Data Natives 2022 Racing background Experience in solving real-time data processing in the most extreme, time-critical environment ● 50,000 channels per car ● 1.5 kHz per channel ● 1,000s realtime models and simulations 3
  4. 4. Quix / Data Natives 2022 Now, raise your hand if you already knew about … 4
  5. 5. Quix / Data Natives 2022 Streaming Now, raise your hand if you already knew about … 5
  6. 6. Quix / Data Natives 2022 Kafka Now, raise your hand if you already knew about … 6
  7. 7. Quix / Data Natives 2022 Streaming VS Batch 1.1 7 An overview of data processing approaches
  8. 8. Quix / Data Natives 2022 Batch With batch processing, a set of data is collected over time, then fed into an analytics system. In other words, you collect a batch of information, then send it in for processing. 8 Processing
  9. 9. Quix / Data Natives 2022 Streaming With stream processing, data is fed into analytics tools piece-by-piece. The processing is usually done in real time. 9 Processing
  10. 10. Quix / Data Natives 2022 Streaming vs. batch Batch processing works well in situations where you don’t need real time analytics results. Stream processing is key if you want analytics results that are always current. By building data streams, you can feed data into analytics tools as soon as it is generated and get near-instant analytics results. 10
  11. 11. Quix / Data Natives 2022 Stream processing applications 1.1 11 An overview of stream processing approaches
  12. 12. Quix / Data Natives 2022 12 Stream processing applications When you building stream processing applications with Kafka, there are two options: 1. Just build an application that uses the Kafka producer and consumer APIs directly 2. Adopt a full-fledged stream processing framework (Apache Flink, Apache Beam, etc)
  13. 13. Quix / Data Natives 2022 13 Kafka producer and consumer APIs ● Works for simple stuff like one-message-at-a-time processing ● No external dependencies like JVM ● Gets very complicated when stateful processing is needed like calculation aggregations or joining multiple streams
  14. 14. Quix / Data Natives 2022 14 Stream processing frameworks ● Fully fledged stream processing frameworks solves stateful, more complex operations ● But it is for a cost of increased complexity in many dimensions: ○ Java dependency ○ Deployment gets difficult because code is not running on its own but in server side cluster (Flink cluster or Spark cluster) ○ Debugging is difficult ○ Performance optimization is difficult ○ Gets even worse when we combine synchronous architecture with asynchronous in one application
  15. 15. Quix / Data Natives 2022 15 15 Is there a third way? ● Combining Kafka API approach with stream processing library ● Abstraction from key-value messages of Kafka API to virtual tables ● Standalone library that runs: ○ Locally for development and debugging ○ In docker or in Kubernetes for production deployments at scale
  16. 16. Quix / Data Natives 2022 Our approach 2.2 Using Kafka + Kubernetes + Python 16
  17. 17. Quix / Data Natives 2022 NO JAVA We are committed to zero target by 2050 17
  18. 18. Quix / Data Natives 2022 NO SQL 18
  19. 19. Quix / Data Natives 2022 19
  20. 20. Quix / Data Natives 2022 20 20 IP UDFs are nasty ● Poor development experience ○ Logs only accessible from server, no debugging possible ● Performance hit caused by interface between JVM and Python
  21. 21. Quix / Data Natives 2022 Our approach to stream processing Microservices Microservices running in Kubernetes in docker containers scaling hand to hand with Kafka for compute scalability. Kafka Handle your data reliably and efficiently in memory with Kafka. Using Kafka partitions, replica system and persistence to deliver scalability and robustness. Python Python gives you flexibility. It lets you transform data, not just query it. From simple filtering to ML usecases like video processing. 21
  22. 22. Quix / Data Natives 2022 APP Processing with streaming SUB 22 Var 1 Var 2 Var 3 Var 4 Score Pred. INPUT TOPIC OUTPUT TOPIC PUB
  23. 23. Quix / Data Natives 2022 Var 1 Var 2 Var 3 Var 4 INPUT TOPIC Score Pred. Scale SUB 23 PUB OUTPUT TOPIC
  24. 24. Quix / Data Natives 2022 Var 1 Var 2 Var 3 Var 4 INPUT TOPIC Score Pred. Fault tolerant SUB 24 PUB OUTPUT TOPIC
  25. 25. Quix / Data Natives 2022 ML Deployment 01 REST API vs Streaming 25
  26. 26. Quix / Data Natives 2022 ML Deployment with API API REQUEST 26 Var 1 Var 2 Var 3 Var 4 REST API API RESPONSE Score Pred. APP
  27. 27. Quix / Data Natives 2022 Issues with REST APIs 02 REST API vs Streaming 27
  28. 28. Quix / Data Natives 2022 Problems with REST API 28 Var 1 Var 2 Var 3 Var 4 API REQUEST 28 REST API APP
  29. 29. Quix / Data Natives 2022 Problems with REST API 29 Var 1 Var 2 Var 3 Var 4 API REQUEST 29 API REQUEST REST API APP
  30. 30. Quix / Data Natives 2022 Problems with REST API 30 Var 1 Var 2 Var 3 Var 4 API REQUEST 30 API APP API APP API REQUEST
  31. 31. Quix / Data Natives 2022 Demo Lets build it. 31
  32. 32. Quix / Data Natives 2022 32 phone-data Goal Crash detection ANALYSIS & TRAINING
  33. 33. Quix / Data Natives 2022 GitHub

×