SlideShare a Scribd company logo
REAL-TIME IOT ANALYTICS
WITH APACHE PULSAR
DAVID KJERRUMGAARD
JUNE 18TH, 2020
DEFINING IOT ANALYTICS
• It is NOT JUST loading sensor data into a data lake to create predictive
analytic models. While this is crucial piece of the puzzle, it is not the only
one.
• IoT Analytics requires the ability to ingest, aggregate, and process an endless
stream of real-time data coming off a wide variety of sensor devices “at the
edge”
• IoT Analytics renders real-time decisions at the edge of the network to either
optimize operational performance or detect anomalies for immediate
remediation.
WHAT MAKES IOT ANALYTICS DIFFERENT?
IOT ANALYTICS CHALLENGES
• IoT deals with machine generated data consisting of discrete
observations such as temperature, vibration, pressure, etc. that is
produced at very high rates.
• We need an architecture that:
• Allows us to quickly identify and react to anomalous events
• Reduces the volume of data transmitted back to the data lake.
• In this talk, we will present a solution based on Apache Pulsar
Functions that distributes the analytics processing across all tiers of
the IoT data ingestion pipeline.
IOT DATA INGESTION PIPELINE
APACHE PULSAR FUNCTIONS
PULSAR FUNCTIONS: STREAM PROCESSING
The Apache Pulsar platform provides a flexible,
serverless computing framework that allows you
execute user-defined functions to process and
transform data.
• Implemented as simple methods but allows you to
leverage existing libraries and code within Java or
Python code.
• Functions execute against every single event that is
published to a specified topic and write their
results to another topic. Forming a logical directed-
acyclic graph.
• Enable dynamic filtering, transformation, routing
and analytics.
• Can run anywhere a JVM can, including edge
devices.
BUILDING BLOCKS FOR IOT ANALYTICS
DISTRIBUTED PROBABILISTIC
ANALYTICS WITH APACHE
PULSAR FUNCTIONS
PROBABILISTIC ANALYSIS
• Often times, it is sufficient to provide an approximate value when it is
impossible and/or impractical to provide a precise value. In many
cases having an approximate answer within a given time frame is
better than waiting for an exact answer.
• Probabilistic algorithms can provide approximate values when the
event stream is either too large to store in memory, or the data is
moving too fast to process.
• Instead of requiring to keep such enormous data on-hand, we
leverage algorithms that require only a few kilobytes of data.
DATA SKETCHES
• A central theme throughout most of these probabilistic data
structures is the concept of data sketches, which are designed to
require only enough of the data necessary to make an accurate
estimation of the correct answer.
• Typically, sketches are implemented a bit arrays or maps thereby
requiring memory on the order of Kilobytes, making them ideal for
resource-constrained environments, e.g. on the edge.
• Sketching algorithms only need to see each incoming item only once,
and are therefore ideal for processing infinite streams of data.
SKETCH EXAMPLE
• Let’s walk through an
demonstration to show exactly
what I mean by sketches and show
you that we do not need 100% of
the data in order to make an
accurate prediction of what the
picture contains
• How much of the data did you
require to identify the main item
in the picture?
DATA SKETCH PROPERTIES
• Configurable Accuracy
• Sketches sized correctly can be 100% accurate
• Error rate is inversely proportional to size of a Sketch
• Fixed Memory Utilization
• Maximum Sketch size is configured in advance
• Memory cost of a query is thus known in advance
• Allows Non-additive Operations to be Additive
• Sketches can be merged into a single Sketch without over counting
• Allows tasks to be parallelized and combined later
• Allows results to be combined across windows of execution
OPERATIONS SUPPORTED BY SKETCHES
SOME SKETCHY FUNCTIONS
EVENT FREQUENCY
• Another common statistic computed is the frequency at which a
specific element occurs within an endless data stream with repeated
elements, which enables us to answer questions such as; “How
many times has element X occurred in the data stream?”.
• Consider trying to analyze and sample the IoT sensor data for just a
single industrial plant that can produce millions of readings per
second. There isn’t enough time to perform the calculations or store
the data.
• In such a scenario you can chose to forego an exact answer, which
will we never be able to compute in time, for an approximate
answer that is within an acceptable range of accuracy.
COUNT-MIN SKETCH
• The Count-Min Sketch algorithm uses two elements:
• An M-by-K matrix of counters, each initialized to 0, where
each row corresponds to a hash function
• A collection of K independent hash functions h(x).
• When an element is added to the sketch, each of the hash
functions are applied to the element. These hash values are
treated as indexes into the bit array, and the corresponding
array element is set incremented by 1.
• Now that we have an approximate count for each element we
have seen stored in the M-by-K matrix, we are able to quickly
determine how many times an element X has occurred
previously in the stream by simply applying each of the hash
functions to the element, and retrieving all of the
corresponding array elements and using the SMALLEST value
in the list are the approximate event count.
PULSAR FUNCTION: EVENT FREQUENCY
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import com.clearspring.analytics.stream.frequency.CountMinSketch;
public class CountMinFunction implements Function<String, Long> {
CountMinSketch sketch = new CountMinSketch(20,20,128);
Long process(String input, Context context) throws Exception {
sketch.add(input, 1); // Calculates bit indexes and performs +1
long count = sketch.estimateCount(input);
// React to the updated count
return new Long(count);
}
}
K-FREQUENCY-ESTIMATION
• Another common use of the Count-Min algorithm is maintaining lists
of frequent items which is commonly referred to as the “Heavy
Hitters”. This design pattern retains a list of items that occur more
frequently than some predefined value, e.g. the top-K list
• The K-Frequency-Estimation problem can also be solved by using the
Count-Min Sketch algorithm. The logic for updating the counts is
exactly the same as in the Event Frequency use case.
• However, there is an additional list of length K used to keep the top-K
elements seen that is updated.
PULSAR FUNCTION: TOP K
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import com.clearspring.analytics.stream.StreamSummary;
public class CountMinFunction implements Function<String, List<Counter<String>>> {
StreamSummary<String> summary = new StreamSummary<String> (256);
List<Counter<String>> process(String input, Context context) throws Exception {
// Add the element to the sketch
summary.offer(input, 1)
// Grab the updated top 10
List<Counter<String>> topK = summary.topK(10);
return topK;
}
}
ANOMALY DETECTION
• The most anomaly detectors use a manually configured threshold value that is not adaptive
to even simple patterns or variances.
• Instead of using a single static value for our thresholds, we should consider using quantiles
instead.
• In statistics and probably, quantiles are used to represent probability distributions. The most
common of which are known as percentiles.
ANOMALY DETECTION WITH QUANTILES
• The data structure known as t-digest was developed by Ted
Dunning, as a way to accurately estimate extreme quantiles for
very large data sets with limited memory use.
• This capability makes t-digest particularly useful for calculating
quantiles that can be used to select a good threshold for
anomaly detection.
• The advantage of this approach is that the threshold
automatically adapts to the dataset as it collects more data.
PULSAR FUNCTION: T-DIGEST
import com.clearspring.analytics.stream.quantile.TDigest;
public AnomalyDetectorFunction implements Function<Sensor s, Void> {
private static TDigest digest;
Void process(Sensor s, Context context) throws Exception {
Double threshold = context.getUserConfigValue(“threshold”).toString();
String alertTopic = context.getUserConfigValue(“alert-topic”).toString();
getDigest().add(s.getMetric()); // Add metric to observations
Double quantile = getDigest().compute(s.getMetric());
if (quantile > threshold) {
context.publish(alertTopic, sensor.getId());
}
return null;
}
private static TDigest getDigest() {
if (digest == null) {
digest = new TDigest(100); // Accuracy = 3/100 = 3%
}
return digest;
}
}
IOT ANALYTICS PIPELINE USING
APACHE PULSAR FUNCTIONS
IDENTIFYING REAL-TIME ENERGY CONSUMPTION
PATTERNS
• A network of smart meters enables utilities companies to gain
greater visibility into their customers energy consumption.
• Increase/decrease energy generation to meet the demand
• Implement dynamic notifications to encourage consumers
to use less energy during peak demand periods.
• Provide real-time revenue forecasts to senior business
leaders.
• Identify fault meters and schedule maintenance calls to
repair them.
SMART METER ANALYTICS FLOW LOGIC
SMART METER ANALYTICS FLOW – EVENT FREQUENCY
SMART METER ANALYTICS FLOW - VALIDATION
SMART METER ANALYTICS FLOW – DATA ENRICHMENT
SMART METER ANALYTICS FLOW – CATEGORIZATION
SMART METER ANALYTICS FLOW – ANOMALY DETECTION
SMART METER ANALYTICS FLOW – TOP K
SUMMARY & REVIEW
SUMMARY & REVIEW
• IoT Analytics is an extremely complex
problem, and modern streaming
platforms are not well suited to solving
this problem.
• Apache Pulsar provides a platform for
implementing distributed analytics on
the edge to decrease the data capture
time.
• Apache Pulsar Functions allows you to
leverage existing probabilistic analysis
techniques to provide approximate
values, within an acceptable degree of
accuracy.
• Both techniques allow you to act upon
your data while the business value is
still high.
Questions
THANK YOU

More Related Content

What's hot

Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Adrianos Dadis
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
StreamNative
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
Felicia Haggarty
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
Zach Cox
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluDataWorks Summit
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
DataWorks Summit/Hadoop Summit
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
confluent
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
DataStax Academy
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talk
Danny Yuan
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
eBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platformeBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platform
KyoungMo Yang
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
Guido Schmutz
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 
101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)
Henning Spjelkavik
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
StampedeCon
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
Monal Daxini
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Patrick Di Loreto
 
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
confluent
 
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
confluent
 

What's hot (20)

Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
How Pulsar Enables Netdata to Offer Unlimited Infrastructure Monitoring for F...
 
IoT Austin CUG talk
IoT Austin CUG talkIoT Austin CUG talk
IoT Austin CUG talk
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at Hulu
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talk
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
eBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platformeBay Pulsar: Real-time analytics platform
eBay Pulsar: Real-time analytics platform
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)101 ways to configure kafka - badly (Kafka Summit)
101 ways to configure kafka - badly (Kafka Summit)
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time PersonalizationUsing Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
 
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
Cost Effectively and Reliably Aggregating Billions of Messages Per Day Using ...
 
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...
 

Similar to Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge_David

Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
DataWorks Summit
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
Ed Hunter
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
Databricks
 
1. introduction
1. introduction1. introduction
1. introduction
Mathenge Kenneth
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
Lionel Mommeja
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptx
skilljiolms
 
House price prediction
House price predictionHouse price prediction
House price prediction
SabahBegum
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
thelabdude
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
Nenad Bozic
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
Anirudh
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
C4Media
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
Peter Lawrey
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Apache Apex
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Sriram Krishnan
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
Debasish Ghosh
 
Axibase Time Series Database
Axibase Time Series DatabaseAxibase Time Series Database
Axibase Time Series Database
heinrichvk
 
Cs 331 Data Structures
Cs 331 Data StructuresCs 331 Data Structures
The Incremental Path to Observability
The Incremental Path to ObservabilityThe Incremental Path to Observability
The Incremental Path to Observability
Emily Nakashima
 
Afanasov14flynet slides
Afanasov14flynet slidesAfanasov14flynet slides
Afanasov14flynet slides
Mikhail Afanasov
 

Similar to Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge_David (20)

Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the EdgeUsing Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
1. introduction
1. introduction1. introduction
1. introduction
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 
VCE Unit 01 (1).pptx
VCE Unit 01 (1).pptxVCE Unit 01 (1).pptx
VCE Unit 01 (1).pptx
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Challenges of monitoring distributed systems
Challenges of monitoring distributed systemsChallenges of monitoring distributed systems
Challenges of monitoring distributed systems
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Axibase Time Series Database
Axibase Time Series DatabaseAxibase Time Series Database
Axibase Time Series Database
 
Cs 331 Data Structures
Cs 331 Data StructuresCs 331 Data Structures
Cs 331 Data Structures
 
The Incremental Path to Observability
The Incremental Path to ObservabilityThe Incremental Path to Observability
The Incremental Path to Observability
 
Afanasov14flynet slides
Afanasov14flynet slidesAfanasov14flynet slides
Afanasov14flynet slides
 

More from StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
StreamNative
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
StreamNative
 

More from StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 

Recently uploaded

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 

Recently uploaded (20)

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 

Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge_David

  • 1. REAL-TIME IOT ANALYTICS WITH APACHE PULSAR DAVID KJERRUMGAARD JUNE 18TH, 2020
  • 2. DEFINING IOT ANALYTICS • It is NOT JUST loading sensor data into a data lake to create predictive analytic models. While this is crucial piece of the puzzle, it is not the only one. • IoT Analytics requires the ability to ingest, aggregate, and process an endless stream of real-time data coming off a wide variety of sensor devices “at the edge” • IoT Analytics renders real-time decisions at the edge of the network to either optimize operational performance or detect anomalies for immediate remediation.
  • 3. WHAT MAKES IOT ANALYTICS DIFFERENT?
  • 4. IOT ANALYTICS CHALLENGES • IoT deals with machine generated data consisting of discrete observations such as temperature, vibration, pressure, etc. that is produced at very high rates. • We need an architecture that: • Allows us to quickly identify and react to anomalous events • Reduces the volume of data transmitted back to the data lake. • In this talk, we will present a solution based on Apache Pulsar Functions that distributes the analytics processing across all tiers of the IoT data ingestion pipeline.
  • 7. PULSAR FUNCTIONS: STREAM PROCESSING The Apache Pulsar platform provides a flexible, serverless computing framework that allows you execute user-defined functions to process and transform data. • Implemented as simple methods but allows you to leverage existing libraries and code within Java or Python code. • Functions execute against every single event that is published to a specified topic and write their results to another topic. Forming a logical directed- acyclic graph. • Enable dynamic filtering, transformation, routing and analytics. • Can run anywhere a JVM can, including edge devices.
  • 8. BUILDING BLOCKS FOR IOT ANALYTICS
  • 10. PROBABILISTIC ANALYSIS • Often times, it is sufficient to provide an approximate value when it is impossible and/or impractical to provide a precise value. In many cases having an approximate answer within a given time frame is better than waiting for an exact answer. • Probabilistic algorithms can provide approximate values when the event stream is either too large to store in memory, or the data is moving too fast to process. • Instead of requiring to keep such enormous data on-hand, we leverage algorithms that require only a few kilobytes of data.
  • 11. DATA SKETCHES • A central theme throughout most of these probabilistic data structures is the concept of data sketches, which are designed to require only enough of the data necessary to make an accurate estimation of the correct answer. • Typically, sketches are implemented a bit arrays or maps thereby requiring memory on the order of Kilobytes, making them ideal for resource-constrained environments, e.g. on the edge. • Sketching algorithms only need to see each incoming item only once, and are therefore ideal for processing infinite streams of data.
  • 12. SKETCH EXAMPLE • Let’s walk through an demonstration to show exactly what I mean by sketches and show you that we do not need 100% of the data in order to make an accurate prediction of what the picture contains • How much of the data did you require to identify the main item in the picture?
  • 13. DATA SKETCH PROPERTIES • Configurable Accuracy • Sketches sized correctly can be 100% accurate • Error rate is inversely proportional to size of a Sketch • Fixed Memory Utilization • Maximum Sketch size is configured in advance • Memory cost of a query is thus known in advance • Allows Non-additive Operations to be Additive • Sketches can be merged into a single Sketch without over counting • Allows tasks to be parallelized and combined later • Allows results to be combined across windows of execution
  • 16. EVENT FREQUENCY • Another common statistic computed is the frequency at which a specific element occurs within an endless data stream with repeated elements, which enables us to answer questions such as; “How many times has element X occurred in the data stream?”. • Consider trying to analyze and sample the IoT sensor data for just a single industrial plant that can produce millions of readings per second. There isn’t enough time to perform the calculations or store the data. • In such a scenario you can chose to forego an exact answer, which will we never be able to compute in time, for an approximate answer that is within an acceptable range of accuracy.
  • 17. COUNT-MIN SKETCH • The Count-Min Sketch algorithm uses two elements: • An M-by-K matrix of counters, each initialized to 0, where each row corresponds to a hash function • A collection of K independent hash functions h(x). • When an element is added to the sketch, each of the hash functions are applied to the element. These hash values are treated as indexes into the bit array, and the corresponding array element is set incremented by 1. • Now that we have an approximate count for each element we have seen stored in the M-by-K matrix, we are able to quickly determine how many times an element X has occurred previously in the stream by simply applying each of the hash functions to the element, and retrieving all of the corresponding array elements and using the SMALLEST value in the list are the approximate event count.
  • 18. PULSAR FUNCTION: EVENT FREQUENCY import org.apache.pulsar.functions.api.Context; import org.apache.pulsar.functions.api.Function; import com.clearspring.analytics.stream.frequency.CountMinSketch; public class CountMinFunction implements Function<String, Long> { CountMinSketch sketch = new CountMinSketch(20,20,128); Long process(String input, Context context) throws Exception { sketch.add(input, 1); // Calculates bit indexes and performs +1 long count = sketch.estimateCount(input); // React to the updated count return new Long(count); } }
  • 19. K-FREQUENCY-ESTIMATION • Another common use of the Count-Min algorithm is maintaining lists of frequent items which is commonly referred to as the “Heavy Hitters”. This design pattern retains a list of items that occur more frequently than some predefined value, e.g. the top-K list • The K-Frequency-Estimation problem can also be solved by using the Count-Min Sketch algorithm. The logic for updating the counts is exactly the same as in the Event Frequency use case. • However, there is an additional list of length K used to keep the top-K elements seen that is updated.
  • 20. PULSAR FUNCTION: TOP K import org.apache.pulsar.functions.api.Context; import org.apache.pulsar.functions.api.Function; import com.clearspring.analytics.stream.StreamSummary; public class CountMinFunction implements Function<String, List<Counter<String>>> { StreamSummary<String> summary = new StreamSummary<String> (256); List<Counter<String>> process(String input, Context context) throws Exception { // Add the element to the sketch summary.offer(input, 1) // Grab the updated top 10 List<Counter<String>> topK = summary.topK(10); return topK; } }
  • 21. ANOMALY DETECTION • The most anomaly detectors use a manually configured threshold value that is not adaptive to even simple patterns or variances. • Instead of using a single static value for our thresholds, we should consider using quantiles instead. • In statistics and probably, quantiles are used to represent probability distributions. The most common of which are known as percentiles.
  • 22. ANOMALY DETECTION WITH QUANTILES • The data structure known as t-digest was developed by Ted Dunning, as a way to accurately estimate extreme quantiles for very large data sets with limited memory use. • This capability makes t-digest particularly useful for calculating quantiles that can be used to select a good threshold for anomaly detection. • The advantage of this approach is that the threshold automatically adapts to the dataset as it collects more data.
  • 23. PULSAR FUNCTION: T-DIGEST import com.clearspring.analytics.stream.quantile.TDigest; public AnomalyDetectorFunction implements Function<Sensor s, Void> { private static TDigest digest; Void process(Sensor s, Context context) throws Exception { Double threshold = context.getUserConfigValue(“threshold”).toString(); String alertTopic = context.getUserConfigValue(“alert-topic”).toString(); getDigest().add(s.getMetric()); // Add metric to observations Double quantile = getDigest().compute(s.getMetric()); if (quantile > threshold) { context.publish(alertTopic, sensor.getId()); } return null; } private static TDigest getDigest() { if (digest == null) { digest = new TDigest(100); // Accuracy = 3/100 = 3% } return digest; } }
  • 24. IOT ANALYTICS PIPELINE USING APACHE PULSAR FUNCTIONS
  • 25. IDENTIFYING REAL-TIME ENERGY CONSUMPTION PATTERNS • A network of smart meters enables utilities companies to gain greater visibility into their customers energy consumption. • Increase/decrease energy generation to meet the demand • Implement dynamic notifications to encourage consumers to use less energy during peak demand periods. • Provide real-time revenue forecasts to senior business leaders. • Identify fault meters and schedule maintenance calls to repair them.
  • 26. SMART METER ANALYTICS FLOW LOGIC
  • 27. SMART METER ANALYTICS FLOW – EVENT FREQUENCY
  • 28. SMART METER ANALYTICS FLOW - VALIDATION
  • 29. SMART METER ANALYTICS FLOW – DATA ENRICHMENT
  • 30. SMART METER ANALYTICS FLOW – CATEGORIZATION
  • 31. SMART METER ANALYTICS FLOW – ANOMALY DETECTION
  • 32. SMART METER ANALYTICS FLOW – TOP K
  • 34. SUMMARY & REVIEW • IoT Analytics is an extremely complex problem, and modern streaming platforms are not well suited to solving this problem. • Apache Pulsar provides a platform for implementing distributed analytics on the edge to decrease the data capture time. • Apache Pulsar Functions allows you to leverage existing probabilistic analysis techniques to provide approximate values, within an acceptable degree of accuracy. • Both techniques allow you to act upon your data while the business value is still high.