SlideShare a Scribd company logo
1 of 16
Download to read offline
Processing and analysing streaming
data with Python
using Apache Kafka, Apache Flink, QuestDB and Grafana
Javier Ramirez
(@supercoco9)
Developer Advocate
A simple problem (until you know the details)
● I want to calculate the total and average of several numbers
A simple big data problem (until you know the details)
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a
single hard drive
A simple streaming problem
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a
single hard drive
● The dataset is not static, new numbers are coming all the time
A simplish streaming problem
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a
single hard drive
● The dataset is not static, new numbers are coming all the time
● From different sensors, which are geo distributed and moving. We will be
adding and removing sensors all the time
A quite standard streaming problem
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a
single hard drive
● The dataset is not static, new numbers are coming all the time
● From different sensors, which are geo distributed and moving. We will be
adding and removing sensors all the time
● And since they use 3G and batteries, some might go quiet for a while and
then send a bunch of stale data
An elastic and scalable streaming problem
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a
single hard drive
● The dataset is not static, new numbers are coming all the time
● From different sensors, which are geo distributed and moving. We will be
adding and removing sensors all the time
● And since they use 3G and batteries, some might go quiet for a while and
then send a bunch of stale data
● Flow will not be constant (from few events per second to thousands)
An almost real-life streaming analytics scenario
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a single
hard drive
● The dataset is not static, new numbers are coming all the time
● From different sensors, which are geo distributed and moving. We will be adding
and removing sensors all the time
● And since they use 3G and batteries, some might go quiet for a while and then send
a bunch of stale data
● Flow will not be constant (from few events per second to thousands)
● And I don’t want just the total average, but total per month, per week, per day, per
hour, per minute…
A real business problem you can solve with streaming
● I want to calculate the total and average of several numbers
● They might be MANY numbers, more than you can store in memory, or in a single hard drive
● The dataset is not static, new numbers are coming all the time
● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time
● And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data
● Flow will not be constant (from few events per second to thousands)
● And I don’t want just the total average, but total per month, per week, per day, per hour, per minute…
● We need pretty dashboards with current status, comparison with the past, trends, and
anomaly detection
● To run this reliably, we need advanced monitoring, alerts, and autoscaling
● No, I am not hiring a whole new operations team to manage the system
Let’s play a game
YOU are hired to build an end-to-end solution ingesting sensor data from thousands of devices for an e-health application or an
industrial customer.
Or YOU are hired to build tracking of vehicle fleets, like Uber, Lyft, or an airline.
Or YOU are going to ingest data from a game like candy crush or fortnite.
Or YOU are in charge of monitoring and securing thousands of servers.
Or YOU are a video platform, or an telecom provider, and want to detect quality issues.
Or YOU have an ecommerce application where you need to display ads in real-time, provide recommendations, do fraud detection,
and detect cart abandonment.
Streaming data pipeline overview
Key requirements
Durable
Stateful
Continuous
Fast
Correct
Reactive
Reliable
Transform Analyze React Persist Analyze/Reuse
Ingest
Apache Kafka: A distributed streaming platform
Apache Flink: Stateful computations over data streams
QuestDB: The fastest open source time series database
Grafana: Query, visualize, alert on, and understand your
data no matter where it’s stored.
Apache Zeppelin: Web-based notebook that enables
data-driven, interactive data analytics and collaborative
documents with SQL, Scala, Python, R and more.
Demo time
https://github.com/questdb/questdb
Javier Ramirez
(@supercoco9)
Developer Advocate
Grazie

More Related Content

Similar to Processing and analysing streaming data with Python. Pycon Italy 2022

Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 
How Gousto is moving to just-in-time personalization with Snowplow
How Gousto is moving to just-in-time personalization with SnowplowHow Gousto is moving to just-in-time personalization with Snowplow
How Gousto is moving to just-in-time personalization with SnowplowGiuseppe Gaviani
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...NETWAYS
 
Streamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache PulsarStreamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache PulsarStreamlio
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
 
Analysing streaming data in real time (AWS)
Analysing streaming data in real time (AWS)Analysing streaming data in real time (AWS)
Analysing streaming data in real time (AWS)javier ramirez
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data PlatformDani Solà Lagares
 
Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)javier ramirez
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analyticsjavier ramirez
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Brian Brazil
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
OSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatOSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatNETWAYS
 
OSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatOSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatNETWAYS
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014gdusbabek
 
Wiring the IoT for modern manufacturing
Wiring the IoT for modern manufacturingWiring the IoT for modern manufacturing
Wiring the IoT for modern manufacturingFlorent Solt
 
Turning the web stack upside down rethinking how data flows through systems
Turning the web stack upside down  rethinking how data flows through systemsTurning the web stack upside down  rethinking how data flows through systems
Turning the web stack upside down rethinking how data flows through systemsPaolo Negri
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systemsNenad Bozic
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekrantav
 

Similar to Processing and analysing streaming data with Python. Pycon Italy 2022 (20)

Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
How Gousto is moving to just-in-time personalization with Snowplow
How Gousto is moving to just-in-time personalization with SnowplowHow Gousto is moving to just-in-time personalization with Snowplow
How Gousto is moving to just-in-time personalization with Snowplow
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Streamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache PulsarStreamlio and IoT analytics with Apache Pulsar
Streamlio and IoT analytics with Apache Pulsar
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Analysing streaming data in real time (AWS)
Analysing streaming data in real time (AWS)Analysing streaming data in real time (AWS)
Analysing streaming data in real time (AWS)
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
 
Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analytics
 
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
Provisioning and Capacity Planning Workshop (Dogpatch Labs, September 2015)
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
OSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas BhagatOSMC 2015 | Testing in Production by Devdas Bhagat
OSMC 2015 | Testing in Production by Devdas Bhagat
 
OSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas BhagatOSMC 2015: Testing in Production by Devdas Bhagat
OSMC 2015: Testing in Production by Devdas Bhagat
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014
 
Wiring the IoT for modern manufacturing
Wiring the IoT for modern manufacturingWiring the IoT for modern manufacturing
Wiring the IoT for modern manufacturing
 
Turning the web stack upside down rethinking how data flows through systems
Turning the web stack upside down  rethinking how data flows through systemsTurning the web stack upside down  rethinking how data flows through systems
Turning the web stack upside down rethinking how data flows through systems
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk week
 

More from javier ramirez

¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfestjavier ramirez
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databasejavier ramirez
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBjavier ramirez
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)javier ramirez
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728javier ramirez
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...javier ramirez
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragónjavier ramirez
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessjavier ramirez
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloudjavier ramirez
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
 
Getting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipelineGetting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipelinejavier ramirez
 
Getting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep DiveGetting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep Divejavier ramirez
 
Monitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSMonitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSjavier ramirez
 
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...javier ramirez
 
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...javier ramirez
 
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...javier ramirez
 

More from javier ramirez (20)

¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest¿Se puede vivir del open source? T3chfest
¿Se puede vivir del open source? T3chfest
 
QuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series databaseQuestDB: The building blocks of a fast open-source time-series database
QuestDB: The building blocks of a fast open-source time-series database
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Deduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDBDeduplicating and analysing time-series data with Apache Beam and QuestDB
Deduplicating and analysing time-series data with Apache Beam and QuestDB
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728QuestDB-Community-Call-20220728
QuestDB-Community-Call-20220728
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
Servicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en AragónServicios e infraestructura de AWS y la próxima región en Aragón
Servicios e infraestructura de AWS y la próxima región en Aragón
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverless
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloud
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
 
Getting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipelineGetting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipeline
 
Getting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep DiveGetting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep Dive
 
Monitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSMonitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWS
 
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
Consulta cualquier fuente de datos usando SQL con Amazon Athena y sus consult...
 
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...Recomendaciones, predicciones y detección de fraude usando servicios de intel...
Recomendaciones, predicciones y detección de fraude usando servicios de intel...
 
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
Open Distro for ElasticSearch and how Grimoire is using it. Madrid DevOps Oct...
 

Recently uploaded

DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1sinhaabhiyanshu
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Voces Mineras
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444saurabvyas476
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...varanasisatyanvesh
 

Recently uploaded (20)

Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
 
DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get CytotecAbortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
Abortion pills in Doha {{ QATAR }} +966572737505) Get Cytotec
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 

Processing and analysing streaming data with Python. Pycon Italy 2022

  • 1. Processing and analysing streaming data with Python using Apache Kafka, Apache Flink, QuestDB and Grafana Javier Ramirez (@supercoco9) Developer Advocate
  • 2. A simple problem (until you know the details) ● I want to calculate the total and average of several numbers
  • 3. A simple big data problem (until you know the details) ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive
  • 4. A simple streaming problem ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time
  • 5. A simplish streaming problem ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time ● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time
  • 6. A quite standard streaming problem ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time ● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time ● And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data
  • 7. An elastic and scalable streaming problem ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time ● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time ● And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data ● Flow will not be constant (from few events per second to thousands)
  • 8. An almost real-life streaming analytics scenario ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time ● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time ● And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data ● Flow will not be constant (from few events per second to thousands) ● And I don’t want just the total average, but total per month, per week, per day, per hour, per minute…
  • 9. A real business problem you can solve with streaming ● I want to calculate the total and average of several numbers ● They might be MANY numbers, more than you can store in memory, or in a single hard drive ● The dataset is not static, new numbers are coming all the time ● From different sensors, which are geo distributed and moving. We will be adding and removing sensors all the time ● And since they use 3G and batteries, some might go quiet for a while and then send a bunch of stale data ● Flow will not be constant (from few events per second to thousands) ● And I don’t want just the total average, but total per month, per week, per day, per hour, per minute… ● We need pretty dashboards with current status, comparison with the past, trends, and anomaly detection ● To run this reliably, we need advanced monitoring, alerts, and autoscaling ● No, I am not hiring a whole new operations team to manage the system
  • 10. Let’s play a game YOU are hired to build an end-to-end solution ingesting sensor data from thousands of devices for an e-health application or an industrial customer. Or YOU are hired to build tracking of vehicle fleets, like Uber, Lyft, or an airline. Or YOU are going to ingest data from a game like candy crush or fortnite. Or YOU are in charge of monitoring and securing thousands of servers. Or YOU are a video platform, or an telecom provider, and want to detect quality issues. Or YOU have an ecommerce application where you need to display ads in real-time, provide recommendations, do fraud detection, and detect cart abandonment.
  • 11.
  • 12.
  • 13. Streaming data pipeline overview Key requirements Durable Stateful Continuous Fast Correct Reactive Reliable Transform Analyze React Persist Analyze/Reuse Ingest
  • 14. Apache Kafka: A distributed streaming platform Apache Flink: Stateful computations over data streams QuestDB: The fastest open source time series database Grafana: Query, visualize, alert on, and understand your data no matter where it’s stored. Apache Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.