This document discusses transactional stream processing and operational state. It argues that integrating state management and stream processing within the same transactional system avoids the failure modes that arise when separate systems fail independently, and reduces the need for "glue code". It gives examples of how transactional stream processing enables features like correlation, deduplication, and aggregation in a reliable way. For operational workloads like counting, accounting, and statistics, the key requirements are ensuring idempotence and executing operations atomically within transactions.
Transactional Streaming: If you can compute it, you can probably stream it.
1. Transactional Streaming
If you can compute it, you can probably stream it.
John Hugg
March 30th, 2016
@johnhugg / jhugg@voltdb.com
2. Who Am I?
• First developer on the VoltDB project.
• Previously at Vertica and other data startups.
• Have made so many bad decisions over the years that now I almost know what I'm talking about.
• jhugg@voltdb.com
• @johnhugg
• http://chat.voltdb.com
4. Operations at Scale
• Ingest data from several sources into a horizontally scalable system.
• Process data on arrival (i.e., transform, correlate, filter, and aggregate data).
• Understand, act, and record.
• Push relevant data to a downstream, big data system.
11. One Size Fits All
• Analytics and operational stateful stores require different storage engines to be optimal: columns vs. rows, Vertica vs. VoltDB.
• Machine learning: multi-dimensional math, search.
• Microservices?
• Data value?
13. What’s the Difference?
• Non-integrated systems mean you write glue code, or you use someone’s glue code.
• Operational glue code is different from batch-oriented glue code.
• Batch or OLAP has huge safety nets for glue code:
• HDFS, CSV, immutable data sets
• “Blow it away and reload”
• Much less time pressure
14. Glue
[Diagram: each system in the pipeline is tested well, with thousands of users (or community supplied, with many users), while the glue code between them is something you wrote, with exactly one user.]
15. But I’m not writing “glue code”
“I’m just using the well-tested Cassandra driver in my Storm code.”
• You’re using a computer network. They are not always reliable.
• Storm might fail in the middle of processing.
• Cassandra might fail in the middle of processing.
• Both systems are tested for this, but not together, using your glue code.
18. Use the same system for state and processing.
Ensures they are tested together. No independent failures.
19. 1 Transaction = 1 Event
ACID
• Atomic: Either 100% done or 0% done. No in-between.
• (Consistent)
• Isolated: Two concurrent operations can’t interfere with each other
• Durable: If it says it’s done, then it is done.
26. Call Center Management
Events:
• “Begin Call”: Calling Number, Agent Id, Start Time, etc.
• “End Call”: Calling Number, Agent Id, End Time, etc.
27. What Kind of Problems?
• Correlation - Streaming join
• Out-of-order delivery
• At-least-once delivery - How to dedup
• Generate a new event on call completion - exactly once
• Precise Accounting
• Precise Stats - Event time vs. processing time
31. Schema for Call Center Example
CREATE TABLE opencalls
(
call_id BIGINT NOT NULL,
agent_id INTEGER NOT NULL,
phone_no VARCHAR(20 BYTES) NOT NULL,
start_ts TIMESTAMP DEFAULT NULL,
end_ts TIMESTAMP DEFAULT NULL,
PRIMARY KEY (call_id, agent_id, phone_no)
);
CREATE TABLE completedcalls
(
call_id BIGINT NOT NULL,
agent_id INTEGER NOT NULL,
phone_no VARCHAR(20 BYTES) NOT NULL,
start_ts TIMESTAMP NOT NULL,
end_ts TIMESTAMP NOT NULL,
duration INTEGER NOT NULL,
PRIMARY KEY (call_id, agent_id, phone_no)
);
Unpaired call begin/end events can arrive in any order. Any match transactionally moves to the completed calls table.
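Not part of the deck, but to make the flow concrete, here is a minimal sketch of a VoltDB Java client submitting a begin-call event against this schema. The stored procedure name HandleBeginCall and its parameter order are assumptions for this sketch (the procedure itself is sketched after slide 38); the client calls follow the standard org.voltdb.client API.

import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;

public class CallEventSender {
    public static void main(String[] args) throws Exception {
        // Connect to any node of the VoltDB cluster.
        Client client = ClientFactory.createClient();
        client.createConnection("localhost");

        // Submit one "Begin Call" event. VoltDB TIMESTAMP parameters are
        // microseconds since the epoch when passed as a long.
        ClientResponse resp = client.callProcedure(
                "HandleBeginCall",                   // hypothetical procedure name
                1001L,                               // call_id
                42,                                  // agent_id
                "+1-617-555-0199",                   // phone_no
                System.currentTimeMillis() * 1000L); // start_ts

        if (resp.getStatus() != ClientResponse.SUCCESS) {
            System.err.println("Event failed: " + resp.getStatusString());
        }
        client.drain();
        client.close();
    }
}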
33. Idempotence
Idempotence is the property of certain operations in mathematics and computer science that can be applied multiple times without changing the result beyond the initial application.
34. Idempotent vs. Not Idempotent
Idempotent:
• "set x = 5;" is the same as "set x = 5; set x = 5;"
• "if (x % 2 == 0) x++;" is the same as running it twice
• Spilling coffee on brown pants
Not idempotent:
• "x++;" is not the same as "x++; x++;"
• "if (x % 2 == 0) x *= 2;" is not the same as running it twice
• Eating a whole plate of spaghetti
36. How to make BeginCall Idempotent?
• If the call record is in completed calls, ignore.
• If the call record is in open calls and is missing an end time, ignore.
• If the call record is in open calls, check if this event completes the call. (Yes, handle swapped begin and end events.)
• Otherwise, create a new record in the open calls table.
Tables: open calls, completed calls
37. The checks above are exactly what make the operation idempotent.
38. And the whole decision procedure, taken together, is a single transaction.
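A minimal sketch of that transaction as a VoltDB stored procedure, written against the slide 31 schema. This is an illustration, not the talk's actual code; the procedure name and the duration units are assumptions.

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

public class HandleBeginCall extends VoltProcedure {
    private final SQLStmt checkCompleted = new SQLStmt(
        "SELECT call_id FROM completedcalls WHERE call_id = ? AND agent_id = ? AND phone_no = ?;");
    private final SQLStmt checkOpen = new SQLStmt(
        "SELECT end_ts FROM opencalls WHERE call_id = ? AND agent_id = ? AND phone_no = ?;");
    private final SQLStmt insertOpen = new SQLStmt(
        "INSERT INTO opencalls (call_id, agent_id, phone_no, start_ts) VALUES (?, ?, ?, ?);");
    private final SQLStmt deleteOpen = new SQLStmt(
        "DELETE FROM opencalls WHERE call_id = ? AND agent_id = ? AND phone_no = ?;");
    private final SQLStmt insertCompleted = new SQLStmt(
        "INSERT INTO completedcalls (call_id, agent_id, phone_no, start_ts, end_ts, duration) " +
        "VALUES (?, ?, ?, ?, ?, ?);");

    public long run(long callId, int agentId, String phoneNo, long startTs) {
        voltQueueSQL(checkCompleted, callId, agentId, phoneNo);
        voltQueueSQL(checkOpen, callId, agentId, phoneNo);
        VoltTable[] r = voltExecuteSQL();

        // Rule 1: call already completed, so this is a duplicate. Ignore.
        if (r[0].getRowCount() > 0) return 0;

        if (r[1].getRowCount() > 0) {
            r[1].advanceRow();
            long endTs = r[1].getTimestampAsLong(0);
            // Rule 2: a begin is already recorded and the end is still pending. Ignore.
            if (r[1].wasNull()) return 0;
            // Rule 3: the end event arrived first; this begin completes the call.
            int duration = (int) ((endTs - startTs) / 1_000_000L); // seconds, assuming microsecond timestamps
            voltQueueSQL(deleteOpen, callId, agentId, phoneNo);
            voltQueueSQL(insertCompleted, callId, agentId, phoneNo, startTs, endTs, duration);
            voltExecuteSQL(true);
            return 1;
        }

        // Rule 4: nothing seen yet, so open a new call record.
        voltQueueSQL(insertOpen, callId, agentId, phoneNo, startTs);
        voltExecuteSQL(true);
        return 0;
    }
}

Running this twice with the same event leaves the tables unchanged after the first run, which is the idempotence the slide asks for, and atomicity means a crash mid-procedure can never leave a half-moved call.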
42. Processing Code for a Single Event
[Diagram: multiple copies of the per-event processing code all reading and writing the same database/state, not isolated from one another.]
43. Counting
Systems that can get counting right:
• Systems with single-key consistency
• Systems with special features to enable counters
• ACID transactional systems
• Systems that enforce a single writer
As we say in New England: performance is wicked variable. (Not “Read Committed”.)
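In an ACID system, a correct counter is just a transactional read-modify-write, with no special counter machinery. A minimal sketch, assuming a hypothetical counters table (k VARCHAR primary key, n BIGINT):

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;

public class BumpCounter extends VoltProcedure {
    private final SQLStmt bump = new SQLStmt(
        "UPDATE counters SET n = n + ? WHERE k = ?;");
    private final SQLStmt init = new SQLStmt(
        "INSERT INTO counters (k, n) VALUES (?, ?);");

    public long run(String key, long delta) {
        voltQueueSQL(bump, delta, key);
        // UPDATE reports how many rows it touched; 0 means the counter
        // doesn't exist yet. Since the whole procedure is one transaction,
        // no other increment can sneak in between the two statements.
        if (voltExecuteSQL()[0].asScalarLong() == 0) {
            voltQueueSQL(init, key, delta);
            voltExecuteSQL(true);
        }
        return 0;
    }
}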
44. Accounting
• Accounting is just counting, but more so.
• Need to be able to increment by amount (or decrement).
• Often need to increment/decrement things in groups.
45. Accounting
• When gamer buys a Mystical Sword of Hegemony, update the following:
• Debit the gamer’s rubies or whatever.
• Update real-world region stats, like swords sold in the gamer’s geo-region, total money spent in the gamer’s geo-region, etc.
• Update game region stats for the current game location, say the “Tar Shoals of Dintymoore”, like the number of MSoHs in the region.
• Increment any offer-related stats, like recording whether the MSoH was offered because of customer engagement algorithm X15 or B12.
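All of those updates need to land together or not at all. A sketch of how they might be grouped into one transaction; every table and column name here is invented for illustration:

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;

public class BuySword extends VoltProcedure {
    private final SQLStmt debit = new SQLStmt(
        "UPDATE players SET rubies = rubies - ? WHERE player_id = ? AND rubies >= ?;");
    private final SQLStmt geoStats = new SQLStmt(
        "UPDATE geo_stats SET swords_sold = swords_sold + 1, revenue = revenue + ? WHERE geo_region = ?;");
    private final SQLStmt gameStats = new SQLStmt(
        "UPDATE game_region_stats SET msoh_count = msoh_count + 1 WHERE game_region = ?;");
    private final SQLStmt offerStats = new SQLStmt(
        "UPDATE offer_stats SET conversions = conversions + 1 WHERE algorithm = ?;");

    public long run(long playerId, String geoRegion, String gameRegion,
                    String algorithm, long price) {
        // Debit only if the gamer can afford it.
        voltQueueSQL(debit, price, playerId, price);
        if (voltExecuteSQL()[0].asScalarLong() == 0) {
            return 0; // insufficient rubies; none of the stats run
        }
        // The purchase went through: update every stat in the same
        // transaction, so the books can never disagree with the debit.
        voltQueueSQL(geoStats, price, geoRegion);
        voltQueueSQL(gameStats, gameRegion);
        voltQueueSQL(offerStats, algorithm);
        voltExecuteSQL(true);
        return 1;
    }
}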
47. Accounting
The same landscape as for counting:
• Systems with single-key consistency
• Systems with special features to enable counters
• ACID transactional systems
• Systems that enforce a single writer
As we say in New England: performance is wicked variable.
48. Last Dollar Problem
• Ad-Tech app wants to show a user an ad from a campaign.
• The price of the ad is $0.90.
• Advertiser has $1.00 campaign budget left.
• If the budget check and the display aren’t ACID, it’s possible to decide to show the ad twice.
• The Ad-Tech app is forced to choose between over-billing and under-billing.
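The transactional fix is to fold the budget check and the spend into one atomic conditional update, in the same spirit as the sketch above; the table and procedure names are again invented:

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;

public class ReserveAdSpend extends VoltProcedure {
    // The WHERE clause is the budget check. Because check and debit are one
    // atomic statement inside one transaction, the last dollar can only be
    // spent once.
    private final SQLStmt reserve = new SQLStmt(
        "UPDATE campaigns SET budget_cents = budget_cents - ? " +
        "WHERE campaign_id = ? AND budget_cents >= ?;");

    public long run(int campaignId, long priceCents) {
        voltQueueSQL(reserve, priceCents, campaignId, priceCents);
        // 1 = budget reserved, show the ad; 0 = budget exhausted, don't.
        return voltExecuteSQL(true)[0].asScalarLong();
    }
}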
49. Aggregation
• Aggregation is just counting and accounting that the system does for you.
• Often this is counting chopped up by groups.
• E.g., sword sales by region; % success by offer.
• In the call center example, it could be average call length by agent.
50. Aggregation
The same landscape once more:
• Systems with single-key consistency
• Systems with special features to enable counters
• ACID transactional systems
• Systems that enforce a single writer
As we say in New England: performance is wicked variable.
51. How to Aggregate Without Consistency?
• Use a stand-alone stream processor.
• Best fit for aggregation by time, and specifically by processing time, not event time.
• Run a query on all the data every time you want the aggregation.
• BOO!
52. Actual Math
What’s the mean and standard deviation of call length, chopped up various ways?
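One way this "actual math" stays mostly typing: transactionally maintain count, sum, and sum of squares per group, then derive mean and standard deviation on read. A sketch, assuming a hypothetical call_stats table (agent_id, n, total, total_sq):

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;

public class RecordCallLength extends VoltProcedure {
    private final SQLStmt bump = new SQLStmt(
        "UPDATE call_stats SET n = n + 1, total = total + ?, total_sq = total_sq + ? " +
        "WHERE agent_id = ?;");
    private final SQLStmt init = new SQLStmt(
        "INSERT INTO call_stats (agent_id, n, total, total_sq) VALUES (?, 1, ?, ?);");

    public long run(int agentId, long duration) {
        long sq = duration * duration;
        voltQueueSQL(bump, duration, sq, agentId);
        // First call for this agent: initialize the row instead.
        if (voltExecuteSQL()[0].asScalarLong() == 0) {
            voltQueueSQL(init, agentId, duration, sq);
            voltExecuteSQL(true);
        }
        return 0;
    }
}

On read: mean = total / n, and stddev = sqrt(total_sq / n - mean^2). Because every event updates the trio atomically, a reader can never see a sum that disagrees with its count.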
55. The Details (mostly) Don’t Matter
• Still need to think about performance and likely horizontal partitioning of work.
• Integration of State & Processing + Full ACID Transactions => I can program this math without thinking about:
• Failure
• Interference from weak isolation.
• Partial Visibility to State
58. Low Latency Can Affect the Decision
[Chart: a response-time scale with a 500ms mark. You want to be on the fast side of it; on the slow side, you lose money.]
59. Get Into the “Fast Path”
• Policy Enforcement in Telco
• Fraud Detection “Smoke Tests”
• Change what a user sees in response to action:
• Change the next webpage content based on recent website actions.
• Pick what’s behind the magic door based on how the game is going.
62. When Imperfect is Enough
• Before: No metadata. Maintenance works on stuff based on experience, schedules, and visual inspection.
• Now: A basic stream processing system is up 99% of the time and provides much richer guidance to maintenance. Robots fail less often and cost less to operate.
• Possible future: More sophisticated stream processing is up 99.99% of the time and offers even more insight. Robots fail a tiny bit less often and costs come down a tiny bit more.
63. When Imperfect Isn’t Worth It
Expected total cost = Cost of System X + (# of operations x probability of failure under system X x expected average failure cost)
Cost of System X = licenses + hardware + engineering (switching tech)
• I’ve worked on Ad-Tech use cases => high # of operations
• Complex multi-cluster/system monsters => high % failure
• Billing systems and fraud systems => high cost per failure
64. More consistent systems don’t have to be more expensive.
Easier to develop => less engineering. More efficient => less hardware.
65. Conclusion - Thank You!
• Operations => integration wins. Analytics, batch => use specialized tools.
• With transactions, complex math becomes mostly typing.
• Many of these problems can be solved without transactional streaming, but…
• It’s going to be harder
• It might be less accurate
[Diagram: this talk, placed on a spectrum running from “Stuff I Know” through “Stuff I Don’t Know” to “BS”.]
http://chat.voltdb.com
@johnhugg
jhugg@voltdb.com
all images from wikimedia w/ cc license unless otherwise noted