Serverless Streaming Architectures and Algorithms for the Enterprise

Anurag Khandelwal, Arun Kejariwal, Karthik Ramasamy
@anuragk_ @arun_kejariwal @karthikz

WHY BOTHER?
CODE: ON A HIGH LEVEL LANGUAGE
e.g., Python, JavaScript, …
COMPUTATION: DEMAND DRIVEN EXECUTION
Runs whenever new requests arrive
BILLING: PAY BASED ON RUNTIME
~ millisecond granularity
3

SERVERLESS COMPUTING SIMPLIFIES
CLOUD PROGRAMMING
4
Upload
Code
Pay for what
you use
Run at  
any scale

EVENT-DRIVEN APPLICATION EXAMPLE: IMAGE RESIZING[1]
5
Cloud Storage Serverless
Save 
thumbnail
Save path
Cloud Storage
Cloud Database
λ
[1] Slide adapted from talk by Eric Jonas and Johann Schleier-Smith, “A Berkeley View on Cloud Computing”

BATCH ANALYTICS EXAMPLE: VIDEO ANALYTICS
λ
No car: filter locally Car detected: analyze in cloud
Analyze video
using DNNs
Law enforcement
Traffic video analytics
λ
Video encoding/decoding
Encoder/Decoder
6

STREAMING EXAMPLE: FIGHTING SPAM ON TWITTER
7
Spammy Tweet Regular Tweet
λ
Similarity
Clustering
Message Queue
Key-Value Store
✦ Fight spammy content, engagements, and behaviors on Twitter
✦ Spam campaigns come in large batches
✦ Despite randomized tweaks, enough similarity among spammy entities is preserved
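The similarity-clustering idea above can be sketched in a few lines. This is only an illustration, assuming word-bigram shingles, Jaccard similarity and a fixed threshold; the names `shingles`, `jaccard` and `cluster_similar` are hypothetical and do not describe Twitter's actual pipeline.

```python
from itertools import combinations

def shingles(text, k=2):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_similar(tweets, threshold=0.5):
    """Group tweets linked by pairwise similarity >= threshold (union-find)."""
    parent = list(range(len(tweets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [shingles(t) for t in tweets]
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(tweets)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

tweets = [
    "win a free phone today click this link now",
    "win a free phone today click that link now",  # randomized tweak of the first
    "having coffee with friends this morning",
]
print(cluster_similar(tweets))
```

The all-pairs comparison is quadratic; a streaming deployment would more likely use a sketch such as MinHash so that similar tweets can be found without comparing every pair.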

A REAL USE-CASE: HOW FINANCIAL ENGINES CUT COSTS 90% USING SERVERLESS
[1]
8
✦ Financial Engines: Independent Investment Advisor
๏ 9 million people across 743 companies, $1.8 trillion in assets
✦ Automated portfolio management using computational engines
๏ Core engine component: Integer programming optimizer (IPO)
๏ Linear Programming to compute optimization/feasibility
[1] Financial Engines Cuts Costs 90% Using AWS Lambda and Serverless Computing,
https://aws.amazon.com/solutions/case-studies/financial-engines/

IPO SERVER FARM
9
…
Solver
Library
Solver
Library
Solver
Library
Solver
Library
✦ IPO consumes > 30% of total CPU capacity
๏ Spikes of up to 1000 requests/s, 100ms per request
๏ Capacity planning during marketing campaigns that produce large traﬃc spikes is hard…
40 IPO Servers

NEED TO DO A LOT OF WORK …
10
✦ Scaling in response to load variations
✦ Request routing and load balancing
✦ Monitoring to respond to problems
✦ Provision servers based on budget, requirements
✦ System upgrades, including security patching
✦ Migration to new hardware as it becomes available
…

λ Solver
Library
λ Solver
Library
11
✦ AWS Lambda function for each IPO request
๏ Run as many copies of the IPO function as needed in parallel
✦ Serverless beneﬁts
๏ Up to 94% cost savings annually, not including operational savings
๏ 200-300 M IPO requests/month, 60,000 per minute at peak
๏ Increased reliability: just instantiate new lambda requests on crash
λ Solver
Library
[1] Financial Engines Cuts Costs 90% Using AWS Lambda and Serverless Computing,
https://aws.amazon.com/solutions/case-studies/financial-engines/
A REAL USE-CASE: HOW FINANCIAL ENGINES CUT COSTS 90% USING SERVERLESS
[1]

OF CLOUD PLATFORMS
EVOLUTION
12
On-prem
virtualization
Platform as a Service (PaaS)
Backend as a Service (BaaS)
Container Orchestration
Serverless Platforms
App Engine, Heroku
Borg, Kubernetes
✦ AWS Lambda, Google Cloud
Functions, Azure Functions
✦ Big Query, DynamoDB
✦ Cloud Dataﬂow
✦ Easy switch from legacy
infrastructure
✦ Added cloud services  
(e.g., storage, pub-sub)
VMs in the cloud

OF SHARING RESOURCES
EVOLUTION
App
Runtime
OS
Hardware
No Sharing
App
Runtime
OS
Hardware
VM
App
Runtime
OS
VM
Virtual Machines
OS
Hardware
App
Runtime
App
Runtime
Containers
Runtime
OS
Hardware
App App
FaaS
Increasing Virtualization
[1]
[1] Serverless Computation with OpenLambda, Hendrickson et al.

✦ Diﬀerent pricing models, resource allocations
✦ Security and isolation support
✦ Programming language support, OS support, etc.
[1,2]
SERVERLESS TODAY: FUNCTION-AS-A-SERVICE (FAAS)
14
[1] Peeking Behind the Curtains of Serverless Platforms, Wang et al.
[2] Evaluation of Production Serverless Computing Environments, Lee et al.
✦ Many FaaS platforms
AWS Lambda Google Cloud
Functions
IBM Cloud
Functions
Azure Functions Cloudflare Workers Alibaba Function
Compute

FAAS ORCHESTRATION
[1]
15
[1] Comparison of FaaS Orchestration Systems, Lopez et al.
✦ Many orchestration frameworks:
✦ Varying pricing models, programming models, parallel
execution support, state management, architectures, etc.
[1]
✦ Serverless trilemma:
๏ black boxes
๏ substitution principle
๏ double-billing
AWS Step Functions Azure Durable Functions
IBM
Composer

SERVERLESS IS MORE THAN FaaS …
Serverless = FaaS + BaaS
✦ Object Storage (e.g., S3)
✦ Key-Value Stores (e.g., DynamoDB)
✦ Database (e.g., Cloud Firestore)
✦ Data Processing (e.g., Cloud Dataﬂow)
✦ Complexity Hiding
✦ Consumption based billing
✦ Automatic scaling
λ
Storage
Database
FaaS
Data  
Processing
Messaging
16

… NOT EVERYTHING IS SERVERLESS!
✦ The “buzzword” eﬀect
๏ Cloud providers market services as “serverless”  
without its properties:
๏ Complexity hiding
๏ Consumption-based billing
๏ Automatic scaling
✦ “Semi”-serverless
๏ Do not provide one or more of these properties
17

PLAYERS IN SERVERLESS: EVERYONE IS A WINNER
18
Cloud  
Provider
Developer
Enterprise

DEVELOPER BENEFITS
19
Developer
✦ Simpliﬁed programming
✦ Delegate scaling, scheduling, etc., to cloud
def function(event, context):
    doComplexComputation()
Resources automatically scale with load
Close to zero configuration
No scheduling, load balancing, ….

ENTERPRISE BENEFITS
20
Enterprise
✦ Delegate DevOps to cloud
✦ Cost savings: pay for what you use
Time
Resources
Used
Paid for (server-based)
Paid for (serverless)

THE COST OF SERVERLESS
21
Function Execution Cost
✦ Charged in ~100 ms increments
✦ Charged per GB memory
Data Transfer Cost
✦ Charged per GB
✦ Function fusion: combine functions to avoid data transfer for performance and cost
๏ But fusing functions with diﬀerent memory requirements can be expensive..
✦ Function placement: place function close to source for cost savings
๏ But limited compute power at source may slow things down…
✦ How to balance cost with performance?
[1]
✦ Use fusion and placement judiciously to optimize cost and performance
[1] Costless: Optimizing Cost of Serverless Computing through Function Fusion and Placement, Elgamal et al.
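The fusion trade-off above can be made concrete with a small cost model. This is a rough sketch, not the Costless algorithm: the prices below are hypothetical placeholders rather than any provider's rates, and the model assumes a fused function runs for the sum of both durations at the larger of the two memory sizes.

```python
import math

# Hypothetical prices, for illustration only.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_GB_TRANSFER = 0.09
BILLING_QUANTUM_MS = 100

def execution_cost(duration_ms, memory_gb):
    """One invocation: duration rounded up to the billing quantum, times memory."""
    billed_ms = math.ceil(duration_ms / BILLING_QUANTUM_MS) * BILLING_QUANTUM_MS
    return (billed_ms / 1000.0) * memory_gb * PRICE_PER_GB_SECOND

def pipeline_cost(f, g, transfer_gb):
    """Two separate functions f -> g, paying for the data moved between them."""
    return execution_cost(*f) + execution_cost(*g) + transfer_gb * PRICE_PER_GB_TRANSFER

def fused_cost(f, g):
    """One fused function: durations add, memory is the max of the two."""
    return execution_cost(f[0] + g[0], max(f[1], g[1]))

# (duration in ms, memory in GB): a long, small function feeding a short, large one.
f = (900, 0.125)
g = (100, 3.0)
print("separate, no transfer:  ", pipeline_cost(f, g, 0.0))
print("separate, 1 GB transfer:", pipeline_cost(f, g, 1.0))
print("fused:                  ", fused_cost(f, g))
```

Under these assumptions fusion loses when the two functions have very different memory needs and little data moves between them, and wins once the transfer charge dominates.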

PROVIDER BENEFITS
22
Cloud  
Provider
✦ Higher utilization by multiplexing
resources across users
Time
Resources
Capacity User1 User2 User3

USE CASES
24
Streaming data
transformation
Data
distribution
Real-time
analytics
Real-time monitoring
and notifications
IoT
analytics
!
Event-driven
workflows
SERVERLESS
Interactive
applications
Log processing
and analytics

TRADING SUPPORT PLATFORM
Scenario
✦ Major bank looking to move to next-generation data
pipeline to support continuous reconciliation of trading
activity
Challenges
✦ Zero tolerance for data loss
✦ Performance at scale difﬁcult to achieve
✦ Need to support future data and usage growth
25

INDUSTRIAL IOT ANALYTICS
Data from sensors on
power generation
equipment
Combined with data from
sensors in distribution
network
Brought together and
analyzed in the cloud
For immediate insights
into capacity, failures,
alerts
!
26

STREAMING DATA TRANSFORMATIONS
27
Move best-fit
transformations and those
needed for fast data access
into streaming systems
Provide users and
applications access to data
at multiple stages of
transformation
Leverage batch systems for
specialized capabilities and
complex transformations

CONNECTED VEHICLE
28
Scenario
Continuously-arriving data generated by connected
cars needs to be quickly collected, processed and
distributed to applications and partners
Challenges
Require scalability to handle growing data
sources and volumes without complex
mix of technologies
Solution
Leverage Apache Pulsar solution to provide data
backbone that can receive, transform,
and distribute data at scale

CONNECTED VEHICLE
29
Telemetry data from
connected vehicles
transmitted and published
to Pulsar
Data cleansing, enrichment
and refinement processed
inside Pulsar
Data made available to
internal teams for analysis
and reports
Data feeds supplied to
partners and partner
applications

DATA DRIVEN WORKFLOWS
30
Scenario
Application processes
incoming events and
documents that generate
processing workﬂows
Challenges
Operational burdens and
scalability challenges of
existing technologies growing
as data grows
Solution
Process incoming events and
data and create work queues in
same system
Decrypt, extract, convert, dispatch, process, store

APPLICATION CHARACTERISTICS
31

BIG DATA ANALYTICS
Analyze volumes of data 
Wide range of applications: text analytics, machine learning, predictive analytics, data
mining, statistics, natural language processing
Why Serverless?
No server management
Transparent resource elasticity
Pay for what you use
Building Analytics on FaaS platforms
PyWren, Flint, Locus, ExCamera, …
32

BIG DATA ANALYTICS: SORT
…
Partition
Task
…
Partition
Task
…
Partition
Task
…
Merge
Task
Merge
Task
Merge
Task
…
OR
REDIS S3
Service Capacity IOPS
S3 High Low
Redis Low High
33
λ λS3 S3

BIG DATA ANALYTICS: LOCUS
… S3
λ λ
λ λ
REDIS
λ
PARTITION
λ
MERGE
λ
FINAL
MERGE
Hybrid Sort
34

BIG DATA ANALYTICS: FLINT
Input
Partition
Input
Partition
Input
Partition
Output
Partition
Output
Partition
Flint
Executor
Flint
Executor
Flint
Executor
Flint
Executor
Flint
Executor
Queue Queue
S3
Lambda
SQS
AWSClient
 
Spark Context
Flint
Scheduler
35

How is it done today?
✦ Video = Series of Chunks
๏ Chunk = KeyFrame (large) + InterFrames (small deltas from KeyFrame)
[Figure: Thread#1 to Thread#4 each encode one chunk of frames; every encoded chunk starts with a KeyFrame (KF) followed by InterFrames (I)]
VIDEO ENCODING/DECODING
36
✦ High parallelism = worse compression (more KeyFrames)

VIDEO ANALYTICS: EXCAMERA
VIDEO ENCODING/DECODING ON AWS LAMBDA
[Figure: Lambda#1 to Lambda#4 each encode one chunk of frames; every encoded chunk starts with a KeyFrame (KF) followed by InterFrames (I)]
37

VIDEO ENCODING/DECODING ON AWS LAMBDA
Serial Pass: Rebase
[Figure: Lambda#1 to Lambda#4 after the rebase pass; state is handed from each chunk to the next, so only the first chunk keeps its KeyFrame (KF) and the remaining chunks contain only InterFrames (I)]
37
✦ 60x faster and 6x cheaper than Google’s vpxenc on 128 cores

Making lambdas talk to each other
✦ Lambdas are only permitted outbound TCP/IP connections
✦ Establish outbound cxns to rendezvous server (R) at init
✦ If A wants to talk to B, it sends R an init msg connect(A, B)
๏ R forwards all of A’s subsequent msgs to B
Rendezvous
Server (R)
A
B
C
...
Lambdas
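The rendezvous protocol above can be modeled without real sockets. The sketch below is an in-memory stand-in, assuming one route per sender; `RendezvousServer` and its method names are hypothetical and are not taken from ExCamera's code.

```python
from collections import defaultdict, deque

class RendezvousServer:
    """In-memory stand-in for the rendezvous server R (no real TCP connections)."""

    def __init__(self):
        self.inboxes = defaultdict(deque)  # worker id -> messages forwarded by R
        self.routes = {}                   # sender id -> receiver id

    def register(self, worker_id):
        """Models a lambda opening its outbound connection to R at init."""
        self.inboxes[worker_id]  # touching the key creates an empty inbox

    def connect(self, src, dst):
        """Models the init message connect(src, dst)."""
        if src not in self.inboxes or dst not in self.inboxes:
            raise KeyError("both workers must have registered with R")
        self.routes[src] = dst

    def send(self, src, msg):
        """R forwards every subsequent message from src to its connected peer."""
        self.inboxes[self.routes[src]].append((src, msg))

    def receive(self, worker_id):
        """Return the next (sender, message) pair for a worker, or None."""
        inbox = self.inboxes[worker_id]
        return inbox.popleft() if inbox else None

r = RendezvousServer()
for worker in ("A", "B", "C"):
    r.register(worker)
r.connect("A", "B")
r.send("A", "frame state")
print(r.receive("B"))
```

Because every lambda only dials out to R, the scheme works even though lambdas cannot accept inbound connections; the cost is that all traffic crosses R.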
38

APACHE PULSAR
OVERVIEW
40
Cloud Native Messaging + Compute System
backed by a durable log storage

KEY CHARACTERISTICS
MULTI-TENANCY
DURABILITY
TIERED STORAGE
UNIFIED MESSAGE &
QUEUING
41
HIGHLY SCALABLE

CORE CONCEPTS
42
Apache Pulsar Cluster
Product Safety
ETL
Fraud
Detection
Topic-1
Account History
Topic-2
User Clustering
Marketing
Campaigns
ETL
Topic-1
Budgeted Spend
Topic-2
Demographic Classification
Topic-1
Location Resolution
Data
Serving
Microservice
Topic-1
Customer Authentication
Tenants
Namespaces
TENANTS, NAMESPACES & TOPICS
Topic-1
Risk Classification

TOPICS AND STREAMS
43
Topic
Producers
Consumers
Time
Consumers
Consumers
Producers

TOPIC PARTITIONS
44
Topic - P0
Time
Topic - P1
Topic - P2
Producers
Producers
Consumers
Consumers
Consumers

PARTITIONS AND SEGMENTS
45
Time
Segment 1 Segment 2 Segment 3
Segment 1 Segment 2 Segment 3 Segment 4
Segment 1 Segment 2 Segment 3
P0
P1
P2

STREAMING CONSUMPTION - EXCLUSIVE SUBSCRIPTION
46
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 1
Consumer 2
Subscription
A
M4
M3
M2
M1
M0
M4
M3
M2
M1
M0
X
Exclusive

STREAMING CONSUMPTION - FAILOVER SUBSCRIPTION
47
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 1
Consumer 2
Subscription
B
M4
M3
M2
M1
M0
M4
M3
M2
M1
M0
Failover
In case of failure in
consumer 1

MESSAGE QUEUEING - SHARED SUBSCRIPTION
48
Pulsar topic/
partition
Producer 2
Producer 1
Consumer 2
Consumer 3
Subscription
C
M4
M3
M2
M1
M0
Shared
Traffic is equally distributed
across consumers
Consumer 1
M4M3
M2M1M0
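The three subscription modes on the preceding slides differ only in which consumer receives each message. The toy dispatcher below is a simulation, not the Pulsar client API; it assumes round-robin delivery for the shared mode and "first live consumer" for failover, and the function name `dispatch` is hypothetical.

```python
def dispatch(messages, consumers, mode, failed=()):
    """Return {consumer: [messages]} for one subscription under the given mode."""
    alive = [c for c in consumers if c not in failed]
    delivered = {c: [] for c in consumers}
    if mode == "exclusive":
        # A second consumer is rejected outright.
        if len(consumers) > 1:
            raise ValueError("an exclusive subscription admits only one consumer")
        if alive:
            delivered[alive[0]] = list(messages)
    elif mode == "failover":
        # One active consumer; the others are standbys that take over on failure.
        if alive:
            delivered[alive[0]] = list(messages)
    elif mode == "shared":
        # Messages are spread across all live consumers (round-robin assumed here).
        if alive:
            for i, msg in enumerate(messages):
                delivered[alive[i % len(alive)]].append(msg)
    else:
        raise ValueError("unknown mode: %r" % mode)
    return delivered

msgs = ["M0", "M1", "M2", "M3", "M4"]
print(dispatch(msgs, ["C1", "C2"], "failover", failed={"C1"}))
print(dispatch(msgs, ["C1", "C2", "C3"], "shared"))
```

Exclusive and failover preserve per-topic ordering because a single consumer sees every message; shared trades that ordering for throughput.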

MULTI-LAYER AND SCALABLE ARCHITECTURE
49
Consumer
Producer
Producer
Producer
Consumer
Consumer
Consumer
Messaging
Broker Broker Broker
Bookie Bookie Bookie Bookie Bookie
Event storage
Function Processing
Worker Worker
✦ Independent layers for processing, serving and storage
✦ Messaging and processing built on Apache Pulsar
✦ Storage built on Apache BookKeeper

DATA FLOW
50
Bookie
Bookie
Bookie
Broker
Producer
Journal
Journal
Journal
fsync
fsync
fsync
Segment storage
Segment storage
Segment storage
background
process
Consumer

STORAGE ARCHITECTURE
51
Logical
View
Partition
Processing
& Storage
Segment 1 Segment 2 Segment 3 Segment n
Partition
Broker
Partition
(primary)
Broker
Partition
(copy)
Broker
Partition
(copy)
. . . . . . . . . . . .
Processing
(brokers)
Warm
Storage
✦ Storage co-resident with serving
✦ Partition centric
✦ Cumbersome to scale
๏ Data redistribution
๏ Performance impact
✦ Storage decoupled from processing
✦ Partition stored as segments
✦ Flexible and easy scalability

DATA ACCESS PATTERNS
DATA WORK LOAD
WRITES
TAILING
READS
CATCHUP
READS
HISTORICAL
READS
HCTW
52

DATA ACCESS PATTERNS
53
Partition
. . . . . . . . . . . .
Processing
(brokers)
Warm
Storage
Cold
Storage
Tailing reads: served from
in-memory cache
Catch-up reads: served from
persistent storage layer
Historical reads: served
from cold storage

MULTITENANCY
55
SEVERAL TEAMS SHARING THE SAME CLUSTER
✦ Authentication / Authorization / Namespaces / Admin APIs
✦ I/O isolation between writes and reads
๏ Provided by storage layer - ensure readers draining backlog won’t affect publishers
Soft isolation
✦ Storage quotas — flow-control — back-pressure — rate limiting
Hardware isolation
✦ Constrain some tenants on a subset of brokers or bookies

STORAGE TIERING
56
TAKING ADVANTAGE OF LOW COST CLOUD STORAGE
✦ Offload cold topic data to lower-cost
storage (e.g. cloud storage, HDFS)
✦ Manual or automatic (configurable
threshold)
✦ Transparent to publishers and consumers
✦ Allows near-infinite event storage at low
cost
Cold storage
Hot storage
Topic

SCHEMA REGISTRY
57
MAKING SENSE OF THE BYTES IN DATA
✦ Provides type safety to applications built on top of Pulsar
✦ Two approaches
๏ Client side enforcement: type safety enforcement up to the application
๏ Server side enforcement: system enforces type safety and ensures that producers and consumers remain synced
✦ Schema registry enables clients to upload data schemas on a topic basis.
✦ Schemas dictate which data types are recognized as valid for that topic
✦ Supports JSON, protobuf, binary schemas

SCHEMA REGISTRY
58
MAKING SENSE OF THE BYTES IN DATA
✦ Means for publishers and consumers to
communicate structure of topic data
✦ Validates schema as data is published
✦ Supports JSON, protobuf, binary schemas
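import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.impl.schema.JSONSchema;

// SensorReading is an application-defined class.
PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://localhost:6650")
    .build();
// Producer and consumer declare the same JSON schema for the topic.
Producer<SensorReading> producer =
    client.newProducer(JSONSchema.of(SensorReading.class))
        .topic("sensor-data")
        .create();
Consumer<SensorReading> consumer =
    client.newConsumer(JSONSchema.of(SensorReading.class))
        .topic("sensor-data")
        .subscriptionName("sensor-subscriber")
        .subscribe();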

ON THE FLY SCALABILITY
59
ADJUST PULSAR ON DEMAND BASED ON LOAD
Scale serving
✦ New nodes immediately available to process
requests, no data rebalancing required
Scale processing
✦ Add threads, processes or containers to increase
parallelism
Scale storage retention
✦ Add nodes to increase capacity, no data
redistribution required
Messaging
Bookie Bookie Bookie Bookie Bookie
Stream storage
Processing
Worker Worker

TOPIC COMPACTION
60
✦ Efficient way to enable consumer to catch up
to current state
✦ Process that creates version of a topic that
only has current values for each key
✦ Triggered via simple command
Complete topic
{key: “A”, value: “foo”}
{key: “B”, value: “foobar”}
{key: “B”, value: “bar”}
{key: “A”, value: “binky”}
{key: “A”, value: “bar”}
Compacted topic
{key: “B”, value: “foobar”}
{key: “A”, value: “bar”}
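Compaction as described above amounts to keeping the most recent value per key. The sketch below shows that rule on made-up data (the keys `k1` to `k3` are not from the slide); it assumes messages are listed oldest first and says nothing about how Pulsar implements compaction internally.

```python
def compact(messages):
    """Keep only the latest value for each key, ordered by time of last update."""
    latest = {}
    for key, value in messages:
        latest.pop(key, None)  # re-insert so dict order follows the latest write
        latest[key] = value
    return list(latest.items())

# Oldest message first.
topic = [("k1", "v1"), ("k2", "v1"), ("k1", "v2"), ("k3", "v1"), ("k2", "v2")]
print(compact(topic))
```

A consumer that only needs current state can read the compacted view and skip every superseded update.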

SQL QUERYING
61
Enable SQL clients to directly query
data in Streamlio
✦ Integrated with schema registry
✦ Uses Presto as query engine
✦ Query engine reads data directly from
storage layer
✦ Data visible to SQL engine as soon as
published
Processing
Messaging and queuing
Stream storage
Data Access
Msg Queue
Pub-Sub
SQL engine (Presto)
Functions
SQL Clients
Metadata

INTERACTIVE QUERYING USING SQL
62
1234…20212223…40414243…60616263…
Segment 1
Segment 3
Segment 2
Segment 2
Segment 1
Segment 3
Segment 4
Segment 3
Segment 2
Segment 1
Segment 4
Segment 4
Segment
Reader
Segment
Reader
Segment
Reader
Segment
Reader
Coordinator

PULSAR AVAILABILITY AND RESILIENCY
63

DURABILITY
64
(CONTD.)
Bookie
Bookie
Bookie
Broker
Producer
Journal
Journal
Journal
fsync
fsync
fsync
Segment storage
Segment storage
Segment storage
background
process
https://drivescale.com/2017/03/whatever-happened-durability/

RESILIENCY AND RECOVERY
65
BROKER, BOOKIE AND DATA CENTER FAILURES
Segment 1
Segment 2
Segment n
. . .
Segment 2
Segment 3
Segment n
. . .
Segment 3
Segment 1
Segment n
. . .
Segment 1
Segment 2
Segment n
. . .
Storage
Broker
Serving
Broker Broker
✦ Broker Failure
๏ Topic reassigned to available broker based on load
๏ Can construct the previous state consistently
๏ No data needs to be copied
✦ Bookie Failure
๏ Immediate switch to a new node
๏ Background process copies segments to other bookies to
maintain replication factor
✦ Datacenter Failure
๏ Built-in multi-datacenter replication
๏ Brokers in any datacenter can immediately serve replicated
topics

BROKER FAILURE RECOVERY
66
BROKER, BOOKIE AND DATA CENTER FAILURES
๏ Topic reassigned to available broker based on load
๏ Can construct the previous state consistently
๏ No data needs to be copied
๏ Failure handled transparently by client library

BOOKIE FAILURE RECOVERY
67
1234…20212223…40414243…60616263…
Segment 1
Segment 3
Segment 2
Segment 2
Segment 1
Segment 3
Segment 4
Segment 3
Segment 2
Segment 1
Segment 4
Segment 4

BOOKIE FAILURE RECOVERY
68
๏ After a write failure, BookKeeper will immediately
switch write to a new bookie, within the same
segment
๏ As long as we have any 3 bookies in the cluster, we
can continue to write
๏ In background, starts a many-to-many recovery
process to regain the conﬁgured replication factor
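The write path during a bookie failure can be illustrated with a toy model. This is a simplification, assuming a replication factor of 3 and that a failed ensemble member is swapped for any healthy spare; the `Ledger` class is hypothetical and is not the BookKeeper client API.

```python
class Ledger:
    """Toy model of a segment replicated to an ensemble of bookies."""

    def __init__(self, cluster, replication=3):
        if len(cluster) < replication:
            raise ValueError("need at least %d bookies" % replication)
        self.cluster = list(cluster)
        self.replication = replication
        self.ensemble = self.cluster[:replication]
        self.entries = []  # (entry id, bookies holding that entry)

    def write(self, entry_id, failed=frozenset()):
        """Write one entry, first swapping failed bookies for healthy spares."""
        for i, bookie in enumerate(self.ensemble):
            if bookie in failed:
                spares = [b for b in self.cluster
                          if b not in failed and b not in self.ensemble]
                if not spares:
                    raise RuntimeError("fewer than %d healthy bookies" % self.replication)
                self.ensemble[i] = spares[0]  # the same segment keeps going
        self.entries.append((entry_id, tuple(self.ensemble)))

    def under_replicated(self, failed):
        """Entries that lost a replica and need background re-replication."""
        return [eid for eid, holders in self.entries
                if any(b in failed for b in holders)]

ledger = Ledger(["b1", "b2", "b3", "b4"])
ledger.write(0)
ledger.write(1, failed={"b2"})        # b2 is replaced by b4 for new entries
print(ledger.ensemble)
print(ledger.under_replicated({"b2"}))
```

New writes proceed immediately on the changed ensemble, while entries written before the failure are the ones the background recovery process has to copy.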

SEAMLESS CLUSTER EXPANSION
1234…20212223…40414243…60616263…
Segment 1
Segment 3
Segment 2
Segment 2
Segment 1
Segment 3
Segment 4
Segment 3
Segment 2
Segment 1
Segment 4
Segment 4
Segment Y
Segment Z
Segment X
69

MULTI-DATACENTER REPLICATION
70
๏ Scalable asynchronous replication
๏ Integrated in the broker message ﬂow
๏ Simple conﬁguration to add/remove
regions
Topic (T1) Topic (T1)
Topic (T1)
Subscription (S1) Subscription (S1)
Producer

(P1)
Consumer

(C1)
Producer

(P3)
Producer

(P2)
Consumer

(C2)
Data Center A Data Center B
Data Center C
DISASTER RECOVERY

SYNCHRONOUS REPLICATION
DISASTER RECOVERY
✦ Each topic owned by one broker at a
time, i.e., in one datacenter
✦ ZooKeeper cluster spread across
multiple locations
✦ Broker commits writes to bookies in
both datacenters
✦ In event of datacenter failure, broker in
surviving datacenter assumes
ownership of topic
ZooKeeper
Producers
Datacenter 1
Consumers
Pulsar Cluster
Datacenter 2
Producers
Consumers
71

ASYNCHRONOUS REPLICATION
DISASTER RECOVERY
Producers
(active)
Datacenter 1
Consumers
(active)
Pulsar Cluster
(primary)
Datacenter 2
Producers
(standby)
Consumers
(standby)
Pulsar Cluster
(standby)
Pulsar
replication
ZooKeeper ZooKeeper
✦ Two independent clusters, primary and
standby
✦ Conﬁgured tenants and namespaces
replicate to standby
✦ Data published to primary is
asynchronously replicated to standby
✦ Producers and consumers restarted in
second datacenter upon primary failure
72

REPLICATED SUBSCRIPTIONS
DISASTER RECOVERY
Producers
Datacenter 1
Consumers
Pulsar
Cluster 1
Subscriptions
Datacenter 2
Consumers
Pulsar
Cluster 2
Subscriptions
Pulsar
Replication
Marker Marker Marker
73

GROWING ECOSYSTEM OF APACHE PULSAR
DISASTER RECOVERY
74

APACHE PULSAR COMMUNITY
✦ Twitter: @apache_pulsar
✦ Wechat Subscription: ApachePulsar
✦ Mailing Lists: dev@pulsar.apache.org, users@pulsar.apache.org
✦ Slack: https://apache-pulsar.slack.com
✦ Localization: https://crowdin.com/project/apache-pulsar
✦ Github 
https://github.com/apache/pulsar 
https://github.com/apache/bookkeeper
76

APACHE PULSAR AS A SAAS - PREVIEW
https://cloud.streamlio.com
77

COMPUTE REPRESENTATION - ABSTRACT VIEW
79
f(x)
Incoming Messages Output Messages

WHAT’S NEEDED: STREAM NATIVE COMPUTATION
80
✦ Simplest possible API
๏ Method/Procedure/Function
๏ Multi Language API
๏ Scale developers
✦ Message bus native concepts
๏ Input/Output/Log as topics
✦ Flexible runtime
๏ Simple standalone applications vs system managed applications

PULSAR FUNCTIONS
81
Execute user-deﬁned functions to process
and transform data
✦ Dynamic ﬁltering, transformation, routing and analytics
✦ Easy for developers: serverless deployment, fully managed
by cluster
✦ Multiple input topics, multiple output topics
✦ Access to windows of messages
✦ Integrated global state storage
✦ Integrated with schema registry
f(x)

PULSAR FUNCTIONS
82
SDK-LESS API
import java.util.function.Function;
public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}
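The same SDK-less style carries over to other languages. As a hedged sketch, assuming Pulsar's language-native Python interface in which a module-level `process` function is called once per input message, the exclamation function could look like this:

```python
def process(input):
    """Append an exclamation mark to each incoming message."""
    return "{}!".format(input)

print(process("hello"))
```

The function has no Pulsar imports at all, which is the point of the SDK-less API: it can be unit tested as ordinary code and only becomes a Pulsar Function at deployment time.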

PULSAR FUNCTIONS
83
SDK API
import org.apache.pulsar.functions.api.PulsarFunction;
import org.apache.pulsar.functions.api.Context;
public class ExclamationFunction implements PulsarFunction<String, String> {
    @Override
    public String process(String input, Context context) {
        return input + "!";
    }
}

PULSAR FUNCTIONS
84
INPUT AND OUTPUT
✦ Function executed for every message of input topic
๏ Supports multiple topics as inputs
✦ Function Output goes to the output topic
๏ Function Output can be void/null
✦ SerDe takes care of serialization/deserialization of messages
๏ Custom SerDe can be provided by the users
๏ Integrates with Schema Registry

PULSAR FUNCTIONS
85
AT MOST ONCE
AT LEAST ONCE
EXACTLY ONCE
PROCESSING GUARANTEES
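The difference between the first two guarantees comes down to when a message is acknowledged relative to processing. The simulation below is a sketch under that assumption, with a single injected crash; `deliver` is a hypothetical helper, not a Pulsar API, and exactly-once (which additionally needs deduplication or transactional output) is not modeled.

```python
def deliver(messages, guarantee, crash_msg=None, crash_point="before"):
    """Process messages in order with one optional crash; return the outputs produced."""
    if guarantee not in ("at-most-once", "at-least-once"):
        raise ValueError("unsupported guarantee: %r" % guarantee)
    outputs = []
    pending = list(messages)  # unacknowledged messages held by the broker
    crashed = False
    while pending:
        msg = pending[0]
        if guarantee == "at-most-once":
            pending.pop(0)               # ack before processing
        will_crash = (msg == crash_msg and not crashed)
        if will_crash and crash_point == "before":
            crashed = True
            continue                     # restart; broker redelivers what is unacked
        outputs.append(msg.upper())      # the function's visible effect
        if will_crash and crash_point == "after":
            crashed = True
            continue                     # effect happened, ack did not
        if guarantee == "at-least-once":
            pending.pop(0)               # ack after processing
    return outputs

print(deliver(["a", "b", "c"], "at-most-once", crash_msg="b", crash_point="before"))
print(deliver(["a", "b", "c"], "at-least-once", crash_msg="b", crash_point="after"))
```

At-most-once can lose the message that was in flight during the crash; at-least-once never loses it but can emit its output twice.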

PULSAR FUNCTIONS
86
AS A STANDALONE APPLICATION
bin/pulsar-admin functions localrun \
  --input persistent://sample/standalone/ns1/test_input \
  --output persistent://sample/standalone/ns1/test_result \
  --className org.mycompany.ExclamationFunction \
  --jar myjar.jar
✦ Runs as a standalone process
✦ Run as many instances as you want. Framework automatically balances data
✦ Run and manage via Mesos/K8/Nomad/your favorite tool

PULSAR FUNCTIONS
87
RUNNING INSIDE PULSAR CLUSTER
✦ ‘Create’ and ‘Delete’ Functions in a Pulsar Cluster
✦ Pulsar brokers run functions as either threads/processes/docker containers
✦ Unifies Messaging and Compute cluster into one, significantly improving manageability
✦ Ideal match for Edge or small startup environment
✦ Serverless in a jar

PULSAR FUNCTIONS - DEPLOYMENT
CONTAINERS
THREADS
PROCESSES
88

89
(CONTD.)
Broker 1
Worker
Function
wordcount-1
Function
transform-2
Broker 1
Worker
Function
transform-1
Function
dataroute-1
Broker 1
Worker
Function
wordcount-2
Function
transform-3
Node 1 Node 2 Node 3

90
(CONTD.)
Worker
Function
wordcount-1
Function
transform-2
Worker
Function
transform-1
Function
dataroute-1
Worker
Function
wordcount-2
Function
transform-3
Broker 1 Broker 2 Broker 3

91
(CONTD.)
Function
wordcount-1
Function
transform-1
Function
transform-3
Pod 1 Pod 2 Pod 3
Broker 1 Broker 2 Broker 3
Pod 7 Pod 8 Pod 9
Function
dataroute-1
Function
wordcount-2
Function
transform-2
Pod 4 Pod 5 Pod 6

STATE MANAGEMENT IN PULSAR FUNCTIONS
92

PULSAR FUNCTIONS
93
BUILT-IN STATE
✦ Functions can store state in stream storage
๏ Framework provides a simple library around this
✦ Support server side operations like counters
✦ Simpliﬁed application development
๏ No need to stand up an extra system

PULSAR FUNCTIONS
94
BUILT-IN STATE MANAGEMENT
✦ Pulsar uses BookKeeper as its stream storage
✦ Functions can store State in BookKeeper
✦ Framework provides the Context object for users to access State
✦ Support server side operations like Counters
✦ Simpliﬁed application development
๏ No need to stand up an extra system to develop/test/integrate/operate

PULSAR FUNCTIONS
95
STATE EXAMPLE
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;
public class CounterFunction implements PulsarFunction<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // String.split takes a regex, so the dot must be escaped to split on literal "."
        for (String word : input.split("\\.")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}

PULSAR FUNCTIONS
96
STATE IMPLEMENTATION
✦ The built-in state management is powered by Table Service in BookKeeper
✦ BP-30: Table Service
๏ Originated for a built-in metadata management within BookKeeper
๏ Expose for general usage. e.g. State management for Pulsar Functions
✦ Available from Pulsar 2.4

PULSAR FUNCTIONS
97
STATE IMPLEMENTATION
✦ Updates are written in the log streams in BookKeeper
✦ Materialized into a key/value table view
✦ The key/value table is indexed with rocksdb for fast lookup
✦ The source-of-truth is the log streams in BookKeeper
✦ RocksDB instances are transient key/value indexes
✦ Rocksdb instances are incrementally checkpointed and stored into BookKeeper for fast recovery
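The log-plus-index design above can be sketched as follows. This is a toy model only: a Python list stands in for the BookKeeper log stream, a dict for the RocksDB index, and the checkpoint is a full snapshot rather than the incremental checkpoint the slide describes. The `StateStore` class and its method names are hypothetical.

```python
class StateStore:
    """Append-only log as source of truth, with a rebuildable key/value index."""

    def __init__(self):
        self.log = []              # stands in for the BookKeeper log stream
        self.table = {}            # stands in for the transient RocksDB index
        self.checkpoint = ({}, 0)  # (table snapshot, log offset it covers)

    def incr_counter(self, key, amount):
        """Record the update in the log first, then apply it to the index."""
        self.log.append((key, amount))
        self.table[key] = self.table.get(key, 0) + amount

    def take_checkpoint(self):
        """Persist a snapshot of the index together with the log position."""
        self.checkpoint = (dict(self.table), len(self.log))

    def recover(self):
        """Rebuild the index: load the checkpoint, then replay the log tail."""
        snapshot, offset = self.checkpoint
        self.table = dict(snapshot)
        for key, amount in self.log[offset:]:
            self.table[key] = self.table.get(key, 0) + amount

store = StateStore()
store.incr_counter("a", 1)
store.incr_counter("b", 2)
store.take_checkpoint()
store.incr_counter("a", 5)
store.table = {}   # simulate losing the transient index
store.recover()
print(store.table)
```

Recovery cost is bounded by the length of the log tail written since the last checkpoint, which is why checkpointing makes recovery fast.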

EVENT PROCESSING DESIGN PATTERNS
DYNAMIC DATA ROUTING
ETL
DATA ENRICHMENT
FILTERING
98
WINDOW AGGREGATION

STATEFUL SERVERLESS APPLICATIONS
100

Generate and exchange intermediate data or ephemeral state
[Figure: map (M) tasks exchanging intermediate data with reduce (R) tasks]
MapReduce (Spark, Hadoop)
Stateful Streaming
Video Analytics
…
Need a serverless layer for sharing and exchanging ephemeral state
Requirements
Low Latency, High IOPS
Lifetime Management
Fine-grained Elasticity

EXISTING APPROACHES
101
Requirements
Low Latency, High IOPS
Lifetime Management
Fine-grained Elasticity
Remote Persistent Storage (e.g., S3)
[Figure: stateful tasks (CPU) exchanging state through remote persistent storage]
Ad hoc
Video Encoding in ExCamera [NSDI’17]
Task#1, Task#2, …, Task#N communicating through a Rendezvous Server
Sorting data on PyWren using Locus [NSDI’19]
Map#1, Map#2, …, Map#N and Reduce#1, Reduce#2, …, Reduce#M exchanging data through Redis
General
Pocket [OSDI’18]
Anna [VLDB’19, IEEE TKDE’19]

JIFFY: MEMORY MANAGEMENT UNIT FOR SERVERLESS OS
102
[Figure: compute tasks (CPU) attached to a shared remote ephemeral storage layer]
Jiffy: Remote Ephemeral Storage
Application: Scale ephemeral storage
resources independent of other resources
Cloud Provider: Multiplex ephemeral
storage for high utilization
Challenges:
What is the right interface?
How can we share ephemeral storage across applications with isolation?
How should we manage lifetimes of application storage?
How to facilitate efficient communication across tasks?

JIFFY INTERFACE
103
Stateful Programming Models: Use data structures to exchange state between tasks
MapReduce, Dataflow, Streaming Dataflow, Piccolo, …
Distributed Data Structure Layer: Wrap “blocks” to efficiently support rich semantics
FIFO Queues: Enqueue(), Dequeue()
Files: Read(), Write()
Hash Table: Get(), Put(), …
B-Tree: Lookup(), Insert(), …
…
Virtual Memory Layer: Transparent memory scaling at “block” granularity for each namespace
CreateNamespace(), DestroyNamespace()

Isolation: Separate data structure per namespace
Multiplexing: Blocks multiplexed across data structures
JIFFY: HIGH UTILIZATION WITH ISOLATION
104
Transparent scaling by adding/removing blocks &
data-structure speciﬁc repartitioning
Server#1 Server#2 … Server#N
Jiffy Approach
…DS#1 DS#N
Shared Ephemeral Storage
App#1 App#2 App#N…
High utilization by multiplexing ephemeral storage across apps
Provide isolation guarantees across applications

JIFFY: STATE LIFETIME MANAGEMENT
105

New challenges in serverless compute platforms: independent compute/memory lifetimes
[Figure: Server-centric Architectures vs. Serverless Architectures]
Goal: Couple lifetime of storage resources to application lifetime
Existing storage systems: do not couple
Programming languages: scoping & garbage collection
Challenge: Identify data scope, lifetime when compute and storage are separated
Jiffy Approach: Hierarchical namespaces with lease management
[Figure: namespace tree rooted at “/” with App1, App2 and App3 at the first level, tasks (Task1, Task2) beneath them and subtasks (Subtask1, Subtask2) beneath a task; each node records its lease duration and last-renewed time, and application tasks send lease renewals]
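Hierarchical namespaces with leases can be sketched as a path-keyed table of renewal times. The version below is an illustration only: it assumes that renewing a node also renews its ancestors and that an expired node takes its descendants with it, and the `NamespaceTree` class is hypothetical rather than Jiffy's actual interface.

```python
class NamespaceTree:
    """Hierarchical namespaces such as "/app1/task1", each holding a lease."""

    def __init__(self, lease_duration):
        self.lease_duration = lease_duration
        self.last_renewed = {}  # path -> time of last renewal

    def create(self, path, now):
        self.last_renewed[path] = now

    def renew(self, path, now):
        """Renew a node and every existing ancestor (an assumption of this sketch)."""
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            prefix = "/" + "/".join(parts[:depth])
            if prefix in self.last_renewed:
                self.last_renewed[prefix] = now

    def expire(self, now):
        """Reclaim every namespace whose lease lapsed, plus its descendants."""
        expired = {p for p, t in self.last_renewed.items()
                   if now - t > self.lease_duration}
        doomed = {p for p in self.last_renewed
                  if any(p == e or p.startswith(e + "/") for e in expired)}
        for p in doomed:
            del self.last_renewed[p]
        return sorted(doomed)

tree = NamespaceTree(lease_duration=10)
for path in ("/app1", "/app1/task1", "/app2"):
    tree.create(path, now=0)
tree.renew("/app1/task1", now=8)   # a running task keeps its app alive
print(tree.expire(now=12))
print(tree.expire(now=20))
```

Tying reclamation to missed renewals means storage is freed when an application stops, without the application having to clean up explicitly.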

JIFFY: INTER-TASK COMMUNICATION
106
Ephemeral Remote Storage
[Figure: tasks A and B, each on its own CPU, sharing ephemeral remote storage]
How does B know it has data to consume?
Jiffy: in-built notification mechanism to
indicate availability of data
[Figure: B issues Subscribe(Put) to Jiffy; when A issues Put(K, V), Jiffy sends Notify(Put, K, V) to B]
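The Subscribe/Notify exchange can be sketched as a key-value store with per-operation callbacks. This is a minimal in-process illustration; `NotifyingStore` and its methods are hypothetical names, not Jiffy's client API.

```python
from collections import defaultdict

class NotifyingStore:
    """Key-value store that notifies subscribers when an operation occurs."""

    def __init__(self):
        self.data = {}
        self.subscribers = defaultdict(list)  # operation name -> callbacks

    def subscribe(self, op, callback):
        """Register interest in an operation, e.g. subscribe("put", ...)."""
        self.subscribers[op].append(callback)

    def put(self, key, value):
        """Store the value, then notify everyone subscribed to "put"."""
        self.data[key] = value
        for callback in self.subscribers["put"]:
            callback("put", key, value)

store = NotifyingStore()
received = []
# Task B subscribes before task A writes.
store.subscribe("put", lambda op, k, v: received.append((op, k, v)))
store.put("K", "V")  # task A produces data
print(received)
```

With notifications, B does not need to poll the store to learn that A has produced data.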

JIFFY: SYSTEM OVERVIEW
107
Directory Service
Storage Service
Hierarchical namespaces
Data Structure per Namespace
Jiffy Client
Lease Renewal
Lease Management
Notiﬁcation Framework
Block-level allocator
CONTROL
DATA

TWOFOLD
JIFFY: KEY IDEAS
SEPARATION OF CONTROL PLANE
AND DATA PLANE
HIERARCHICAL NAMESPACES  
For resource multiplexing  
and lifetime management
ELASTIC SCALING
MILLISECOND TIMESCALES
ISOLATION BETWEEN TASKS
108

EVALUATION
LATENCY
ELASTICITY
MBPS
IOPS
109
FOUR DIMENSIONS

HOW WELL DOES JIFFY PERFORM?
110
Serverless Platform AWS Lambda Service
Storage Service Amazon EC2 (m4.16xlarge instances)
Compared Storage Systems Redis, Apache Crail, Pocket, DynamoDB, Amazon S3
Latency/IOPS/MBPS comparable to state-of-the-art (Redis, Apache Crail, Pocket)
• ~100 µs/operation for 64 B requests, at ~100,000 operations per second.
Transparent fine-grained elasticity for various data structures within 2-500 ms

PERFORMANCE FOR STATEFUL APPLICATIONS
111
[Figure, three panels of task latency (s):
(1) Encode 15min 4k video on ExCamera: ExCamera vs. ExCamera + Jiffy, by TaskID
(2) Sort 50GB data on PyWren: S3 vs. Redis vs. Jiffy, for map tasks and reduce tasks
(3) TPC-DS Queries (Q1 to Q5) on 100GB data on Hive: Local HDFS vs. Jiffy]
Takeaway
Jiffy performance is comparable to state-of-the-art, even while providing
ﬁne-grained transparent elasticity, lifetime-management, etc.

BENEFITS OF MULTIPLEXING
112
50GB sort jobs arriving every 50s, on storage system with fixed 50GB capacity
[Figure: used capacity (GB) over time (s) for jobs Sort-1 to Sort-5, Redis vs. Jiffy; annotations mark the total capacity, periods with no available capacity, and the delay until capacity is available]

SERVERLESS STREAMING ANALYTICS
113

STREAMING
APPROXIMATE
REAL-TIME
115

DATA SKETCHES
CARDINALITY QUANTILES FREQUENT ELEMENTS MEMBERSHIP
116
EXAMPLE FAMILIES

IP/Device ID Blacklisting
Databases (e.g., speed up semi-join operations), Caches, Routers, Storage Systems
Reduce space requirement in probabilistic routing tables
MEMBERSHIP
APPLICATIONS
117

MEMBERSHIP
CUCKOO FILTER
[Fan et al. 2014]
BLOOM FILTER
[Bloom 1970]
NEURAL BLOOM FILTER
[Rae et al. 2019]
LEARNED BLOOM FILTER
[Mitzenmacher 2018]
118
FLAVORS

BLOOM FILTER
[1]
119
[1] Bloom (1970). “Space/Time Trade-offs in Hash Coding with Allowable Errors”.
[2] Illustration borrowed from http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf
[2]

BLOOM FILTER
120
✦ Natural generalization of hashing
✦ False positives are possible
✦ No false negatives
✦ No deletions allowed
✦ For false positive rate ε, # hash functions = log₂(1/ε)
where n = # elements,
k = # hash functions
m = # bits in the array
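The properties above can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: the class name `BloomFilter`, the SHA-256-based double hashing, and the sizing rule m = -n ln ε / (ln 2)² are assumptions of this sketch (the slide only states k = log₂(1/ε)).

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized for n expected elements and target false positive rate eps."""

    def __init__(self, n, eps):
        # Standard sizing: m = -n ln(eps) / (ln 2)^2 bits, k = log2(1/eps) hash functions.
        self.m = max(1, math.ceil(-n * math.log(eps) / (math.log(2) ** 2)))
        self.k = max(1, round(math.log2(1 / eps)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumption of this sketch): position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct (no false negatives).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


blocked = BloomFilter(n=1000, eps=0.01)
blocked.add("10.0.0.7")
print("10.0.0.7" in blocked)  # an inserted item is always reported as present
```

Because bits are shared between items, clearing them to delete one item could introduce false negatives for others, which is why plain Bloom filters do not support deletion.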

CUCKOO FILTER
[1]
121
✦ Key Highlights
๏ Add and remove items dynamically
๏ For false positive rate ε < 3%, more space efficient than Bloom filter
๏ Higher performance than Bloom filter for many real workloads
๏ Asymptotically worse performance than Bloom filter
‣ Min fingerprint size ∝ log (# entries in table)
✦ Overview
๏ Stores only a fingerprint of an item inserted
‣ Original key and value bits of each item not retrievable
๏ Set membership query for item x: search hash table for fingerprint of x
[1] Fan et al. (2014). “Cuckoo Filter: Practically Better Than Bloom”, CoNEXT.

CUCKOO FILTER
[1]
122
Cuckoo Hashing [1]
[1] R. Pagh and F. Rodler. “Cuckoo hashing,” Journal of Algorithms, 51(2):122-144, 2004.
[2] Illustration borrowed from Fan et al. (2014). “Cuckoo Filter: Practically Better Than Bloom”, CoNEXT.
[2]
Illustration of Cuckoo hashing [2]
✦ High space occupancy
✦ Practical implementations: multiple items/bucket
✦ Example uses: Software-based Ethernet switches
Cuckoo Filter [2]
✦ Uses a multi-way associative Cuckoo hash table
✦ Employs partial-key cuckoo hashing
๏ Store fingerprint of an item
๏ Relocate existing fingerprints to their alternative locations
[2]
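Partial-key cuckoo hashing can be illustrated with a small Python sketch. The class name `CuckooFilter`, the 16-bit fingerprints, the SHA-256-based hashes, and the default sizes are assumptions made for illustration. The alternate bucket is derived from the current bucket index and the fingerprint alone, which is what lets a stored fingerprint be relocated without access to the original item.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter: buckets hold fingerprints only, never the original items."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index in range.
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return (self._hash(b"fp:" + item.encode()) % 0xFFFF) + 1  # 16 bits, never zero

    def _index(self, item):
        return self._hash(item.encode()) % self.num_buckets

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: i2 = i1 XOR hash(fingerprint), and vice versa.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random resident fingerprint and move it to its alternate bucket.
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # table considered full

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Deletion is only safe for items that were actually inserted; deleting an item that was never added may remove another item's matching fingerprint.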

123
[1] Mitzenmacher et al. (2017). “Adaptive Cuckoo Filters”.
✦ Motivation
๏ Minimize false positive rate
✦ Selectively remove false positives without introducing false
negatives
✦ Maintain a replica of cuckoo hash table with raw elements
✦ Indices of buckets are determined by hash values of the
element, and not solely by the fingerprint
✦ Allow different hash functions for the fingerprints
๏ Enables removal and reinsertion of elements to remove
false positives
✦ Insertion complexity and space overhead
KEY HIGHLIGHTS
ADAPTIVE CUCKOO FILTER
[1]

124
CONCURRENT CUCKOO FILTER
[1]
[1] Li et al. (2014), “Algorithmic Improvements for Fast Concurrent Cuckoo Hashing”.
Support for multiple writers
Optimistic cuckoo hashing
Minimizes the size of the locked critical section during updates
Leverage Intel’s Hardware Transactional Memory (HTM)
Optimize TSX lock elision to reduce transactional abort rate
Algorithmic/Architectural tuning
Breadth-first Search for an Empty Slot
Lock After Discovering a Cuckoo Path
Striped fine-grain spin locks
Increase set-associativity
Prefetching

CUCKOO FILTER
125
CUCKOO++ HASH TABLES
[Scouarnec 2018]
MORTON FILTER
[Breslow et al. 2018]
SMART CUCKOO
[Sun et al. 2017]
POSITION-AWARE CUCKOO
[Kwon et al. 2018]
VARIANTS

126
[1] Illustration borrowed from Lang et al. (2019). “Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput”.
PERFORMANCE COMPARISON
[1]
Space-precision trade-off
Memory footprint
False positive rate
Rate of negative lookups
Throughput
Cache misses, # Network messages, Local disk I/O
Saved work per lookup that filtering avoids
Optimizations
Register blocking
Cache sectorization
METRICS

[1]
127
[1] Kraska et al. (2018). “The Case for Learned Index Structures”, SIGMOD.
✦ Bloom filter as a binary classifier - predict whether a key exists in a set or not (membership)
๏ Subtleties - no false negatives
‣ Learned model + auxiliary data structure
✦ Learn structure of lookup keys
๏ Minimize collisions between keys and non-keys
๏ Leverage continuous functions to capture the underlying data distribution
✦ Learn different models for read-heavy vs. write-heavy workloads
KEY HIGHLIGHTS
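The "learned model + auxiliary data structure" idea can be sketched as follows. This is a toy illustration under stated assumptions: `score` is a hypothetical stand-in for a trained classifier, and a plain Python `set` stands in for the backup Bloom filter. Keys that the model scores below the threshold are stored in the backup structure, which is what preserves the no-false-negatives guarantee.

```python
class LearnedBloomFilter:
    """Toy learned Bloom filter: a scoring model plus a backup structure for missed keys."""

    def __init__(self, keys, score, threshold, backup_factory=set):
        self.score = score          # hypothetical model: item -> score in [0, 1]
        self.threshold = threshold
        # Keys the model would reject (its false negatives) go into the backup structure.
        self.backup = backup_factory()
        for key in keys:
            if score(key) < threshold:
                self.backup.add(key)

    def __contains__(self, item):
        if self.score(item) >= self.threshold:
            return True             # accepted by the model; may be a false positive
        return item in self.backup  # model rejects; the backup catches real keys


def toy_score(url):
    """Stand-in for a trained classifier; flags one suspicious top-level domain."""
    return 1.0 if url.endswith(".xyz") else 0.0


blocklist = ["a.xyz", "b.xyz", "evil.com"]
lbf = LearnedBloomFilter(blocklist, toy_score, threshold=0.5)
print(all(url in lbf for url in blocklist))  # every real key is found
```

Only "evil.com" needs to be stored in the backup here; the better the model captures the key distribution, the smaller the backup structure can be.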

SANDWICHING
[1]
[1] Mitzenmacher (2018). “A Model for Learned Bloom Filters and Optimizing by Sandwiching”, NIPS.
✦ Challenges
๏ Deletion of keys
‣ Re-train the model
✦ Sandwich Learned Bloom Filter
๏ Increased robustness
✦ Pre-filtering
๏ Remove keys not present
๏ Minimizes the distance between the distribution of the queries and test set used to
estimate the learned Bloom filter’s false positive probability
๏ Limits the size of the Backup filter
✦ Computationally more complex than Learned Bloom Filter
[1]
128

129
[1] Rae et al. (2019). “Meta-Learning Neural Bloom Filters”.
NEURAL BLOOM FILTER
[1]
✦ Inputs arrive at high throughput, or are ephemeral
๏ Few-shot neural data structures
✦ Learning membership in one-shot via meta-learning
✦ Overview
๏ Sample tasks from a common distribution
๏ Network learns to specialize to a given task with few examples
KEY HIGHLIGHTS
[1]

FREQUENT ELEMENTS
130
TOP-K ELEMENTS, HEAVY HITTERS
✦ E-commerce
✦ Security
✦ Network measurements
✦ Sensing
✦ Databases
✦ Feature selection

FREQUENT ELEMENTS
COUNT-SKETCH
[Charikar et al. 2002]
COUNT-MIN-LOG
[Pitel & Fouquier 2015]
COUNT-MIN
[Cormode & Muthukrishnan 2005]
LEARNED COUNT-MIN
[Hsu et al. 2019]
131

✦ A two-dimensional array counts with w columns and d rows
✦ Each entry of the array is initially zero
✦ d hash functions are chosen uniformly at random from a pairwise independent family
✦ Update
๏ For a new element i, for each row j and k = h_j(i), increment the k-th column of row j by one
✦ Point query: estimate of i = min over rows j of sketch[j, h_j(i)], where sketch is the table
✦ Parameters
COUNT-MIN
[1]
132
[1] Cormode and Muthukrishnan (2005). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29–38.
$(\varepsilon, \delta)$: $w = \lceil e/\varepsilon \rceil$, $d = \lceil \ln(1/\delta) \rceil$
$h_1, \ldots, h_d : \{1, \ldots, n\} \to \{1, \ldots, w\}$
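The update and point-query rules above can be sketched in Python. The class name `CountMinSketch`, the BLAKE2b pre-hash of items, and the (a·x + b) mod p hash family are choices made for this sketch; the slide only requires d pairwise independent hash functions.

```python
import hashlib
import math
import random


class CountMinSketch:
    """Count-Min sketch with w = ceil(e/eps) columns and d = ceil(ln(1/delta)) rows."""

    PRIME = (1 << 61) - 1  # Mersenne prime for the (a*x + b) mod p hash family

    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.w for _ in range(self.d)]
        rng = random.Random(seed)
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.d)]

    def _columns(self, item):
        # Map the item to an integer, then apply one hash function per row.
        x = int.from_bytes(hashlib.blake2b(str(item).encode(), digest_size=8).digest(), "big")
        return [((a * x + b) % self.PRIME) % self.w for a, b in self.coeffs]

    def update(self, item, count=1):
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += count

    def query(self, item):
        # Collisions only ever add to a counter, so the minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))


cms = CountMinSketch(eps=0.01, delta=0.01)
for word in ["spam", "spam", "ham", "spam"]:
    cms.update(word)
print(cms.query("spam") >= 3)  # estimates never undercount
```

With non-negative updates the estimate is never below the true count, and it exceeds the true count by more than ε times the total stream count with probability at most δ.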

✦ Millions/billions of features - a routine
๏ NLP, genomics, computational biology, chemistry
✦ Accuracy vs. Performance trade-off
๏ Model vs. runtime
✦ Model Interpretability
COUNT-SKETCH
FEATURE SELECTION
✦ Feature Hashing
๏ Loss of interpretability
✦ Count-Sketch + top-k heap
๏ top-k values of the sketch used for
iterative update
[1] Illustration borrowed from Aghazadeh et al. (2018). “MISSION: Ultra Large-Scale Feature Selection using Count-Sketches”.
[1]
133

✦ Count-Min sketch with conservative update (CU sketch)
✦ Update an item with frequency c
๏ Avoid unnecessary updating of counter values => Reduce over-estimation error
๏ Prone to over-estimation error on low-frequency items
✦ Lossy Conservative Update (LCU) - SWS
๏ Divide stream into windows
๏ At window boundaries, ∀ 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i,j] if 0 < sketch[i,j] ≤
COUNT-MIN
[1]
[1] Cormode, G. 2009. Encyclopedia entry on 'Count-Min Sketch'. In Encyclopedia of Database Systems. Springer, 511–516.
VARIANTS
134
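A minimal sketch of the conservative update rule follows, assuming the usual formulation: when an item arrives with frequency c, each of its counters is raised to at least (current estimate + c) and no further. The helper names `cells`, `conservative_update`, and `query`, and the BLAKE2b-based row hashes, are hypothetical choices for this illustration.

```python
import hashlib


def cells(item, d, w):
    """Column index of `item` in each of the d rows (the hash choice is an assumption)."""
    return [
        int.from_bytes(hashlib.blake2b(f"{j}:{item}".encode(), digest_size=8).digest(), "big") % w
        for j in range(d)
    ]


def query(table, item):
    cols = cells(item, len(table), len(table[0]))
    return min(table[j][cols[j]] for j in range(len(table)))


def conservative_update(table, item, c=1):
    """Raise each of the item's counters only as far as needed to reach estimate + c."""
    cols = cells(item, len(table), len(table[0]))
    target = min(table[j][cols[j]] for j in range(len(table))) + c
    for j in range(len(table)):
        # Counters already above the target were inflated by collisions; leave them alone.
        table[j][cols[j]] = max(table[j][cols[j]], target)


table = [[0] * 64 for _ in range(4)]
for _ in range(3):
    conservative_update(table, "x")
conservative_update(table, "y", c=5)
print(query(table, "x") >= 3, query(table, "y") >= 5)
```

Compared with the plain update, counters grow more slowly, which reduces over-estimation while keeping the guarantee that estimates never fall below the true count.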

✦ Minimize error of low frequency items
✦ Overview
๏ Same structure as Count-Min Sketch with conservative update
๏ Replace the classical binary counting cells by log counting cells
COUNT-MIN-LOG
[1]
135
[1] Pitel and Fouquier (2015). "Count-Min-Log sketch: Approximately counting with approximate counters”.
UPDATE
QUERY

✦ Applications
๏ Changepoint/Global Iceberg Detection
๏ Entropy Estimation
UnivMON
[1]
136
[1] Liu et al. (2016). "One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon”.
ONLINE SKETCHING STEP
OFFLINE ESTIMATION
✦ Universal sketch
✦ Provably accurate for estimating a large class of functions
✦ Generality
๏ Delay binding to application of interest
✦ High fidelity

✦ Need for line rate processing: 10-100 Gbps
✦ Limited memory in switching hardware
๏ Memory ∝ # heavy flows
HASH-PIPE
[1]
137
[1] Sivaraman et al. (2017). “Heavy-Hitter Detection Entirely in the Data Plane”.
✦ Small time budget: 1 ns
๏ Manipulate state & process packets at each stage
๏ Process each packet only once

✦ Exploit patterns in the input
๏ For example, in text data, word frequency ∝ 1/word length
✦ Mitigate large estimation error
๏ Collisions between high-frequency elements
✦ Learn properties to identify heavy hitters
✦ Does not need to know the data distribution a priori
✦ Logarithmic improvement in error bound
✦ Key high level idea
๏ Assign each heavy hitter to its unique bucket
LEARNED COUNT-MIN
[1]
138
[1] Hsu et al. (2019). “Learning-based Frequency Estimation Algorithms”, ICLR.

LEARNED COUNT-MIN
139
✦ Frequency of an element in a unique bucket is exact
✦ Provably reduces estimation errors
[1] Illustration borrowed from Hsu et al. (2019). “Learning-based Frequency Estimation Algorithms”, ICLR.
[1]
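The key idea can be sketched as a router in front of an ordinary Count-Min table. In this illustration `is_heavy` is a hypothetical stand-in for the learned heavy-hitter predictor, and the unique buckets are modeled with a `Counter`; a real deployment would bound the number of unique buckets. The class and method names are assumptions of this sketch.

```python
import hashlib
from collections import Counter


class LearnedCountMin:
    """Predicted heavy hitters get exact unique buckets; everything else shares a Count-Min."""

    def __init__(self, is_heavy, w=256, d=4):
        self.is_heavy = is_heavy  # hypothetical learned oracle: item -> bool
        self.unique = Counter()   # one exact counter per predicted heavy hitter
        self.w, self.d = w, d
        self.table = [[0] * w for _ in range(d)]

    def _col(self, row, item):
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.w

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.unique[item] += count
        else:
            for row in range(self.d):
                self.table[row][self._col(row, item)] += count

    def query(self, item):
        if self.is_heavy(item):
            return self.unique[item]  # exact: no collisions in a unique bucket
        return min(self.table[row][self._col(row, item)] for row in range(self.d))


stop_words = {"the", "of"}
sketch = LearnedCountMin(is_heavy=lambda word: word in stop_words)
for word in "the cat sat on the mat of the house".split():
    sketch.update(word)
print(sketch.query("the"))  # → 3
```

Keeping heavy hitters out of the shared table removes the collisions that contribute most to the estimation error of low-frequency items.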

REAL-TIME FREQUENT ELEMENTS in PULSAR & HERON
140
Streamlio (Apache Pulsar and Apache Heron)
Data
Source 2
clean-fn 2
Data
Source 1
Data
Source 3
clean-fn 1
trend-
topology 3
Trending
Application
T1
T2
T3

PRIVATE COUNT-MIN
[1]
141
[1] Melis et al. (2016), “Efficient Private Statistics with Succinct Sketches”.
✦ out-of-dictionary words → auto-complete
✦ Why not employ homomorphic encryption for privacy-preserving aggregation?
✦ Perform private aggregation over the sketches, rather than the raw inputs
✦ Reduce the communication and computation complexity
๏ Linear to logarithmic in the size of their input
✦ Real-world privacy-friendly systems
๏ Recommendations for media streaming services
๏ Prediction of user locations
‣ Improve transportation services and predict future trends
✦ Federated learning

FEDERATED LEARNING
142
[1] Illustration borrowed from https://ai.googleblog.com/2017/04/federated-learning-collaborative.html.
[1]

FEDERATED & DIFFERENTIALLY
PRIVATE
Discover the heavy hitters but not their frequencies
Without additional noise
Iterative algorithm[1]
Randomly select a set of users
Each user votes on a single character extension
to an already discovered popular prefix
Server aggregates the received votes using a trie structure and prunes
nodes that have counts that fall below a chosen threshold θ
[1] Zhu et al. (2019), “Federated Heavy Hitters with Differential Privacy”.
143
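The iterative prefix-voting loop can be sketched as below. This is an illustration of the trie-growing mechanics only: the function name, the "$" end-of-word marker, and the parameters are assumptions, and the sketch does not calibrate the sample size or θ, which is where the differential privacy guarantee would come from in the cited work.

```python
import random
from collections import Counter


def trie_heavy_hitters(user_words, rounds, batch_size, theta, seed=0):
    """Grow a prefix trie one character per round from thresholded user votes."""
    rng = random.Random(seed)
    prefixes = {""}  # popular prefixes discovered in the previous round
    heavy = set()
    for depth in range(1, rounds + 1):
        # Randomly select a set of users for this round.
        batch = rng.sample(user_words, min(batch_size, len(user_words)))
        votes = Counter()
        for word in batch:
            marked = word + "$"  # "$" marks the end of a word
            # A user votes only to extend an already discovered popular prefix.
            if len(marked) >= depth and marked[:depth - 1] in prefixes:
                votes[marked[:depth]] += 1
        # The server prunes extensions whose vote count falls below theta.
        grown = {prefix for prefix, count in votes.items() if count >= theta}
        if not grown:
            break
        heavy |= {prefix[:-1] for prefix in grown if prefix.endswith("$")}
        prefixes = grown
    return heavy


words = ["cat"] * 50 + ["car"] * 40 + ["dog"] * 3 + ["x"] * 2
print(sorted(trie_heavy_hitters(words, rounds=10, batch_size=len(words), theta=10)))
# → ['car', 'cat']
```

The server only ever learns which prefixes clear the threshold, which matches the goal of discovering the heavy hitters but not their frequencies.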

Customer
CARDINALITY ESTIMATION
# DISTINCT ELEMENTS
IN A DATABASE
# UNIQUE SEARCH
QUERIES
# UNIQUE WEBSITE
VISITORS
# DISTINCT NETWORK
FLOWS
144
APPLICATIONS

145
✦ Hash values as strings
✦ Occurrence of particular patterns in the binary representation
✦ Example: HyperLogLog [Flajolet et al. 2008]
BIT-PATTERN OBSERVABLES
✦ Hash values as real numbers
✦ k-th smallest value
๏ Insensitive to distribution of repeated values
✦ Examples: MinCount [Giroire, 2000]
ORDER STATISTIC OBSERVABLES

SKETCH-BASED VS. SAMPLING BASED
UNIFORM HASHING VS. LOGARITHMIC HASHING
INTERVAL BASED VS. BUCKET BASED
146
FLAVORS
Adaptive sampling, Distinct sampling, Method-of-Moments Estimator, (Smoothed) Jackknife Estimator
LogLog, SuperLogLog, HyperLogLog, and HyperLogLog++
MinCount
Counting Bloom filter

✦ Apply hash function h to every element in a multiset
✦ Cardinality of multiset is $2^{\max(\varrho)}$, where $0^{\varrho-1}1$ is the bit pattern observed at the beginning of a hash value
✦ Above suffers from high variance
๏ Employ stochastic averaging
๏ Partition input stream into m sub-streams $S_i$ using first p bits of hash values ($m = 2^p$)
147
HYPERLOGLOG
where
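A compact Python sketch of HyperLogLog under stated assumptions: the class name, the use of the first 64 bits of SHA-256 as the hash, and the default p = 12 are choices made here. The estimator follows the standard form from Flajolet et al., E = α_m · m² / Σ_j 2^(−M[j]), where M[j] is the maximum ϱ seen in sub-stream j, with linear counting used for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with m = 2^p registers and a 64-bit hash."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128

    def add(self, item):
        x = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)               # first p bits select the sub-stream
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rho = 1-based position of the leftmost 1-bit in the remaining bits.
        rho = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rho)

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(10000):
    hll.add(f"user-{i}")
print(round(hll.estimate()))  # close to 10000; typical relative error is about 1.04 / sqrt(m)
```

Adding the same element again never changes a register, so the estimate is insensitive to duplicates.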

148
HYPERLOGLOG
OPTIMIZATIONS
✦ Use of 64-bit hash function
๏ Total memory requirement $5 \cdot 2^p \to 6 \cdot 2^p$ bits, where p is the precision
✦ Empirical bias correction
๏ Uses empirically determined data for cardinalities smaller than 5m and uses the unmodified raw estimate
otherwise
✦ Sparse representation
๏ For n≪m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w)
๏ Use variable length encoding for integers that uses variable number of bytes to represent integers
๏ Use difference encoding - store the difference between successive elements
✦ Other optimizations [1, 2]
[1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
[2] http://antirez.com/news/75

149
ANOMALY DETECTION
QuantelAI

150
ANOMALY DETECTION
[Diagram, components in apparent pipeline order: FIX logs, Market Data, and Alternate Data; Fluent-bit (Producer); partitioned Pulsar topic*; Pulsar Broker* running many Aggregate function{} instances, where windowing and aggregations are applied*; intermediate Pulsar topics*; higher-level aggregations*; Eagle AI Model Server (Consumer)]
* Indicates components that can be load balanced
QuantelAI

SKETCHING FOR MACHINE LEARNING
151

✦ Stochastic/Incremental gradient descent
๏ Slow to converge
✦ Variance reduction, Accelerated gradient descent
๏ AdaBound, AMSGrad, Nesterov, Adamax, Adam, RMSProp, AdaDelta
‣ Stragglers worsen the convergence
✦ Select a subset of training data points along with their
corresponding learning rates
๏ Greedily maximize the facility location function
‣ Minimizes the upper-bound on the estimation error of the full
gradient
FASTER TRAINING
[1] Mirzasoleiman et al. (2019). “Data Sketching for Faster Training of Machine Learning Models”.
152
KEY IDEA
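The subset-selection step can be illustrated with a toy greedy maximizer of the facility location function F(S) = Σ_i max_{j∈S} sim(i, j). In this sketch, scalar `points` stand in for per-example gradients, similarity is defined as (maximum distance − distance), and the returned weights (how many points each selected element represents) play the role of the per-element learning rates. All names and the similarity choice are assumptions for illustration.

```python
def greedy_facility_location(points, k):
    """Greedily pick k representatives maximizing the facility location function."""
    n = len(points)
    dmax = max(abs(a - b) for a in points for b in points)
    # Non-negative similarity: identical points get dmax, the farthest pair gets 0.
    sim = [[dmax - abs(a - b) for b in points] for a in points]
    selected = []
    best = [0.0] * n  # best[i] = similarity of point i to its closest selected element
    for _ in range(min(k, n)):
        # Marginal gain of adding candidate j to the current selection.
        gains = [
            (sum(max(sim[i][j] - best[i], 0.0) for i in range(n)), j)
            for j in range(n) if j not in selected
        ]
        _, j = max(gains)
        selected.append(j)
        best = [max(best[i], sim[i][j]) for i in range(n)]
    # Weight of a selected element = number of points assigned to it.
    weights = {j: 0 for j in selected}
    for i in range(n):
        weights[max(selected, key=lambda j: sim[i][j])] += 1
    return selected, weights


points = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2]
selected, weights = greedy_facility_location(points, k=2)
print(selected, weights)  # one representative per cluster, weighted by cluster size
```

Because the facility location function is monotone submodular, the greedy choice comes with the classic (1 − 1/e) approximation guarantee of Nemhauser et al.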

SERVERLESS MACHINE LEARNING
153

CATEGORIES
REGRESSION CLASSIFICATION
154

Problem Statement
f_n(x): smooth function
h(x): non-smooth function (such as ℓ1 and ℓ2 penalty)
Leverage ADMM
Worker w updates its own copy x_w and master updates
global variable z
OPTIMIZATION
156
[1] Aytekin and Johansson (2019), “Harnessing the Power of Serverless Runtimes for Large-Scale Optimization”.

OPTIMIZATION
[1]
157
[1] Illustration borrowed from Aytekin and Johansson (2019), “Harnessing the Power of Serverless Runtimes for Large-Scale Optimization”.
✦ Discussion
๏ Utilization, Cold start, Responsiveness
[1]

158
OPTIMIZATION
[1]
[1] Gupta et al. (2019). “OverSketched Newton: Fast Convex Optimization for Serverless Systems”.
✦ Large-scale optimization problems
๏ Second order methods
‣ Use gradient and Hessian
‣ Faster convergence
‣ Do not require step size tuning
‣ Computationally prohibitive when training data is large
๏ Go Serverless
‣ Invoke thousands of workers
‣ Communication costs (# iterations)
‣ Compute approximate Hessian
✦ Matrix sketching
๏ Randomized Numerical Linear Algebra (RandNLA)
๏ Inbuilt resiliency against stragglers
‣ Leverage ECC to create redundant computation

OPTIMIZATION
159
✦ Gradient computation
๏ Matrix-vector multiplication
‣ Coded Matrix Multiplication - distributed, straggler resilient
[1]
[1] Illustration borrowed from Gupta et al. (2019). “OverSketched Newton: Fast Convex Optimization for Serverless Systems”.

160
OPTIMIZATION
✦ Hessian computation
๏ Matrix-matrix multiplication (MM)
‣ Block partitioning of input matrices
‣ Sparse sketching matrix based on Count-Sketch
[1]
[1] Illustration borrowed from Gupta et al. (2019). “OverSketched Newton: Fast Convex Optimization for Serverless Systems”.
✦ Applications - Distributed, Straggler resilient
๏ Ridge Regularized Linear Regression

INFERENCE IN SERVERLESS
ENVIRONMENTS
161
[1] Illustration borrowed from Dakkak et al. (2018). “TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments”.
Key Challenge: Low Latency
Cold Start: move large amount of model data within and across servers
Persistent model store across the GPU, CPU, local
storage, and cloud storage hierarchy
[1]

162
LOW LATENCY
[1] Illustration borrowed from Crankshaw et al. (2017). “Clipper: A Low-Latency Online Prediction Serving System”.
✦ Content recommendation service
๏ Example: News
๏ Latency < 100 ms
✦ Scalability: Hundreds of Millions/Billions per sec
✦ Deployment and Maintenance
✦ Optimizations
๏ Throughput
‣ Caching, Adaptive Batching
๏ Accuracy
‣ Bandit and Ensemble Methods
๏ Model Selection
‣ On a per user/session basis
‣ Straggler mitigation
[1]

CHALLENGES
RESOURCE MANAGEMENT
163
[1] Illustration borrowed from Yadwadkar et al. (2019), “A Case for Managed and Model-less Inference Serving”.
[1]

WHAT MAKES IOT ANALYTICS DIFFERENT?
165
More Data
✦ High-volume, continuous
data in motion from
multiple sensors
✦ Store, blend and manage
time-series data
More Complexity
✦ Use of multiple analytics
techniques
✦ Distributed analytics
(edge)
More Automation
✦ Integration with operations
systems and BPS
✦ Bidirectional
communication and
control of endpoints

WHAT MAKES IOT ANALYTICS DIFFERENT?
166
Devices Gateways
Data
Collectors
Data
Transport Processing Repositories Applications

CHALLENGES
167
✦ Latency - delay resulting from data transmission from edge to cloud or datacenter may exceed
application requirements
✦ Capacity - volume of data streams would require expensive network bandwidth to collect and
transmit detailed data
✦ Processing lag - time required to process incoming data streams to make them ready for
applications may exceed requirements
✦ Complexity - complicated mix of technologies and tools creates inconsistency and operations
burdens

WHAT’S NEEDED?
168
✦ Simplified infrastructure for data movement
and processing
✦ Performance and scalability to keep up
with data
✦ Ability to process, understand and act on
data wherever it is
Resilient, scalable data movement
From edge to cloud to datacenter (and back)
Unified platform
Consistent development and processing environment
across edge, cloud, datacenter
Intelligence everywhere
Dynamically filter, process, analyze and route data as needed
at edge, cloud and datacenter

IOT DATA FABRIC
169
Apache Pulsar
Edge Cloud Datacenter
Integrated solution for
event data movement,
processing and storage
Scalable for deployment
across, edge, cloud and
datacenter
Simple framework for
filtering, transformation,
enrichment, analytics
Built on Apache Pulsar
open source technology,
proven at massive scale

IOT ARCHITECTURE WITH APACHE PULSAR
170
Devices Gateways
Data
Collectors
Data
Transport Processing Repositories Applications
Apache Pulsar

SERVERLESS: MISSING PIECES
172
SLA Guarantees
Performance guarantees,
Performance isolation
Security
Side-channels, Information
leakage via network
communications
Heterogeneous
Hardware
FPGAs, GPUs, TPUs, etc.

173
✦ Increased co-residency: side-channels
๏ Rowhammer attacks on DRAM
[1]
๏ Exploiting Micro-architectural vulnerabilities
✦ Information leakage via network communications
✦ Potential solutions
๏ Hardware-level security and isolation
๏ Light-weight and secure container isolation
๏ Task-placement strategies
Security
MISSING PIECES: SECURITY

174
✦ Increased multiplexing = less predictable performance
๏ Resource-allocation delays
๏ Scheduling delays
๏ Cold-start latencies
✦ Potential solutions
๏ Hardware-level isolation, container-level isolation
๏ Bin-packing based on performance needs (throughput, latency)
๏ Bin-packing based on complementary resource needs
MISSING PIECES: SLA GUARANTEES
SLA Guarantees

175
✦ Only CPU resources, no hardware heterogeneity
๏ GPU
๏ TPU
๏ FPGAs
✦ Not fundamental, providers eventually will offer them
✦ Leads to new opportunities:
๏ Greater degree of multiplexing for different resource types
๏ Bin-pack applications with different hardware needs
MISSING PIECES: HETEROGENEOUS HARDWARE
Heterogeneous
Hardware

176
✦ Serverless enables:
๏ Complexity hiding
๏ Consumption based billing
๏ Automatic scaling
✦ All players benefit:
๏ Developers (simpler programming)
๏ Enterprises (lower costs)
๏ Cloud providers (high resource utilization)
✦ Future Serverless infrastructures will address today’s shortcomings
๏ Security, SLA guarantees, Heterogeneous hardware.
SERVERLESS IS THE FUTURE

179
ACKNOWLEDGEMENTS
RACHIT AGARWAL, ION STOICA,
ADITYA AKELLA
ERIC JONAS, JOHANN SCHLEIER-
SMITH
VIKRAM SREEKANTI, CHIA-CHE TSAI
QIFAN PU, VAISHAAL SHANKAR,
JOAO MENEZES CARREIRA, KARL
KRAUTH, NEERAJA YADWADKAR,
JOSEPH GONZALEZ, RALUCA
ADA POPA, DAVID A. PATTERSON

SERVERLESS
Peeking Behind The Curtains Of Serverless Platforms
[Wang et al. 2018]
The Serverless Data Center : Hardware Disaggregation
Meets Serverless Computing
[Pemberton and Schleier-Smith, 2019]
A Berkeley View On Serverless Computing
[Jonas et al. 2018]
SAND: Towards High-Performance Serverless
Computing
[Akkus et al. 2018]
The Server Is Dead, Long Live The Server: Rise Of
Serverless Computing, Overview Of Current
State And Future Trends In Research
And Industry
[Castro et al. 2019]
Agile Cold Starts For Scalable Serverless
[Mohan et al. 2019]
181

[Brenner and Kapitza, 2019]
Trust More, Serverless
Clemmys: towards secure remote execution in FaaS
[Trach et al. 2019]
SERVERLESS
182
No More, No Less - A Formal Model For
Serverless Computing
[Gabbrielli et al. 2019]
Serverless Computing: One Step Forward,
Two Steps Back
[Hellerstein et al. 2019]
Formal Foundations Of Serverless Computing
[Jangda et al. 2019]

numpywren: serverless
linear algebra
183
SERVERLESS ANALYTICS/MACHINE LEARNING
Shuffling, Fast and Slow: Scalable
Analytics on Serverless
Infrastructure
A Serverless Real-Time Data
Analytics Platform for Edge
Computing
[Nastic et al. 2017]
[Ishakian et al. 2017]
Serving deep learning
models in a serverless
platform
[Carreira et al. 2018]
A Case for Serverless
Machine Learning
[Pu et al. 2019]
[Bhattacharjee et al. 2019]
BARISTA: Efficient and Scalable Serverless Serving
System for Deep Learning Prediction Services
[Kim and Lin 2018]
Serverless Data
Analytics with Flint
[Shankar et al. 2018]
[Feng et al. 2018]
Exploring Serverless
Computing for Neural
Network Training

ACCELERATED STOCHASTIC GRADIENT DESCENT
On the momentum term in gradient
descent learning algorithms
[Qian 1999]
Accelerating stochastic gradient
descent using predictive
variance reduction
[Johnson and Zhang 2013]
184
A method for unconstrained convex
minimization problem with the rate
of convergence O(1/k^2)
[Nesterov 1983]
Adaptive Subgradient Methods for
Online Learning and Stochastic
Optimization
[Duchi et al. 2011]
Incorporating Nesterov
Momentum into Adam
[Dozat 2016]
Adam: a Method for Stochastic
Optimization
[Kingma and Ba 2015]
Fast Stochastic Variance Reduced Gradient
Method with Momentum Acceleration for
Machine Learning
[Shang et al. 2017]
On the Convergence of Adam
and Beyond
[Reddi et al. 2019]

OPTIMIZATION
[Drineas and Mahoney 2016]
RandNLA: Randomized Numerical Linear
Algebra
[Gupta et al. 2019]
OverSketched Newton: Fast Convex Optimization
for Serverless Systems
[Boyd et al. 2010]
Distributed Optimization and Statistical
Learning via the Alternating Direction
Method of Multipliers
[Parikh and Boyd, 2014]
Proximal Algorithms
185
[Roosta et al. 2018]
Newton-MR: Newton’s method without
smoothness or convexity

APPROXIMATION
A stochastic approximation method
[Robbins and Monro 1951]
On a stochastic approximation method
[Chung et al. 1954]
An analysis of approximations for maximizing submodular set
functions - I
[Nemhauser et al. 1978]
An analysis of approximations for maximizing submodular set
functions - II
[Nemhauser et al. 1978]
Accelerated greedy algorithms for maximizing submodular set
functions
[Minoux 1978]
186

A general-purpose counting filter:
Making every bit count
[Pandey et al. 2017]
Multiple Set Matching and Pre-Filtering
with Bloom Multifilters
[Concas et al. 2019]
Cuckoo filter: Practically better than
Bloom
[Fan et al. 2014]
Improving retouched bloom filter for
trading off selected false positives
against false negatives
[Donnet et al. 2010]
187
MEMBERSHIP
Bloom filters in adversarial
environments
[Naor and Yogev 2015]
Bloom Filters, Adaptivity, and
the Dictionary Problem
[Bender et al. 2018]
Don’t thrash: how to cache your
hash on flash
[Bender et al. 2012]
The bloomier filter: an efficient data
structure for static support lookup
tables
[Chazelle et al. 2004]

FREQUENT ELEMENTS
[Sivaraman et al. 2017]
Heavy-Hitter Detection Entirely
in the Data Plane
[Roy et al. 2016]
Augmented Sketch: Faster and more Accurate
Stream Processing
[Aghazadeh et al. 2018]
MISSION: Ultra Large-Scale Feature Selection
using Count-Sketches
[Harrison et al. 2018]
Network-Wide Heavy Hitter Detection
with Commodity Switches
188

NEURAL NETWORK BASED APPROACHES
Cardinality estimation with local deep
learning models
[Woltmann et al. 2019]
Learned Cardinalities: Estimating Correlated Joins
with Deep Learning
[Kipf et al. 2018]
Cardinality estimation using neural
networks
[Liu et al. 2015]
An Empirical Analysis of Deep Learning for
Cardinality Estimation
[Ortiz et al. 2019]
189

✦ Federated Optimization: Distributed Machine Learning for On-Device
Intelligence [Konečný et al. 2016]
✦ Communication-Efficient Learning of Deep Networks from Decentralized
Data [McMahan et al. 2016]
✦ Federated Learning: Strategies for Improving Communication Efficiency
[Konečný et al. 2016]
✦ Towards Federated Learning at Scale: System Design [Bonawitz et al.
2019]
✦ Asynchronous Federated Optimization [Xie et al. 2019]
✦ Federated Heavy Hitters with Differential Privacy [Zhu et al. 2019]
FEDERATED LEARNING
190

ON THE WWW
192
Serverless deep/machine learning in
production—the pythonic way
https://medium.com/@waya.ai/deploy-deep-machine-learning-in-production-the-pythonic-way-a17105f1540e
Serverless Inference
https://github.com/castorini/serverless-inference
https://martinfowler.com/articles/serverless.html
An overview of gradient descent
optimization algorithms
http://ruder.io/optimizing-gradient-descent/
Amazon Elastic Inference
https://aws.amazon.com/machine-learning/elastic-inference
OpenLambda — An open source
serverless computing platform
https://github.com/open-lambda/open-lambda
Serverless Predictions at Scale
http://aws-de-media.s3.amazonaws.com/images/AWS_Summit_2018/June6/Doppler/Serverless_Predictions_At_Scale.pdf
