Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world. Fast in. Fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems. Three main areas are covered:
Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind.
Process - Analyze volumes of data you receive in near real-time or in a batch. Be ready for fast serving in your application.
Store - Reliably store data in the data models to support your application. Never accept downtime or slow response times.
Apache Cassandra and Spark: You got the lighter, let's start the fire - Patrick McFadin
An introduction to analyzing Apache Cassandra data using Apache Spark. This includes data models, operations topics, and the internals of how Spark interfaces with Cassandra.
Storing time series data with Apache Cassandra - Patrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself a solid choice, and now you can learn how to use it. We'll look at possible data models and the choices you have to be successful. Then let's open the hood and learn how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work, and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
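To make the data-model idea concrete, here is a hypothetical sketch (not code from the talk) of the most common pattern: partition time series rows by source plus a time bucket, so no single partition grows without bound. All names here are illustrative.

```python
from datetime import datetime, timezone

# Illustrative CQL for such a table:
# CREATE TABLE readings (
#   sensor_id text, day text, ts timestamp, value double,
#   PRIMARY KEY ((sensor_id, day), ts)
# ) WITH CLUSTERING ORDER BY (ts DESC);

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Bucket a reading by sensor and day so a partition stays bounded."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2015, 3, 14, 9, 26, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # ('sensor-42', '2015-03-14')
```

The bucket granularity (day, hour, etc.) is a tuning choice: it trades partition size against the number of partitions a query must touch.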
Owning time series with Team Apache - Strata San Jose 2015 - Patrick McFadin
Break out your laptops: this hands-on tutorial is geared toward understanding the basics of how Apache Cassandra stores and accesses time series data. We’ll start with an overview of how Cassandra works and how that can be a perfect fit for time series. Then we will add in Apache Spark as a perfect analytics companion. There will be coding as part of the hands-on tutorial. The goal will be to take an example application and code through the different aspects of working with this unique data pattern. The final section will cover building an end-to-end data pipeline to ingest, process, and store high-speed time series data.
An Introduction to time series with Team Apache - Patrick McFadin
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.
Patrick walks you through organizing a stream of data into an efficient queue using Apache Kafka, processing the data in flight using Apache Spark Streaming, storing the data in a highly scaling and fault-tolerant database using Apache Cassandra, and transforming and finding insights in volumes of stored data using Apache Spark.
Topics include:
- Understanding the right use case
- Considerations when deploying Apache Kafka
- Processing streams with Apache Spark Streaming
- A deep dive into how Apache Cassandra stores data
- Integration between Cassandra and Spark
- Data models for time series
- Postprocessing without ETL using Apache Spark on Cassandra
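The organize/process/store flow above can be caricatured in a few lines of plain Python. This is a toy sketch with stand-in data structures, not the Kafka, Spark Streaming, or Cassandra APIs; every name here is hypothetical.

```python
from collections import deque

queue = deque()   # stands in for a Kafka topic
store = {}        # stands in for a Cassandra table keyed by partition

def ingest(event):
    """Organize: a producer appends events to the queue."""
    queue.append(event)

def process_batch():
    """Process + store: drain the queue, key each event by
    (sensor, day) the way a time series table would partition it."""
    while queue:
        e = queue.popleft()
        key = (e["sensor"], e["ts"][:10])
        store.setdefault(key, []).append((e["ts"], e["value"]))

for v in [1.0, 2.0]:
    ingest({"sensor": "s1", "ts": "2015-06-01T12:00", "value": v})
process_batch()
print(store[("s1", "2015-06-01")])
```

The real stack replaces each stand-in with a distributed component, but the shape of the pipeline (append, drain in micro-batches, write by partition key) is the same.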
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise - Patrick McFadin
Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadin's videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart’s content. Take our hand. We will show you how.
Nike Tech Talk: Double Down on Apache Cassandra and Spark - Patrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data at high velocity and high volume. This talk will give you an overview of the many ways you can be successful by introducing Apache Cassandra concepts. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models. There will also be examples of how you can use Apache Spark along with Apache Cassandra to create a real time data analytics platform. It’s so easy, you will be shocked and ready to try it yourself.
You’ve heard all of the hype, but how can SMACK work for you? In this all-star lineup, you will learn how to create a reactive, scalable, resilient, and performant data processing powerhouse. We will go through the basics of Akka, Kafka, and Mesos and then deep dive into putting them together in an end-to-end (and back again) distributed transaction. Distributed transactions mean producers waiting for one or more consumers to respond. On the back end, you will see how Apache Cassandra and Spark can be combined to add the incredibly scalable storage and data analysis needed for fast data pipelines. With these technologies as a foundation, you have the assurance that scale is never a problem and uptime is the default.
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps, and understand the tradeoffs. Many times just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
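Of those 2.0 features, lightweight transactions are the easiest to picture: a write that is applied only if its condition holds, with the outcome reported back in an `[applied]` column. A toy single-node sketch of the semantics of `INSERT ... IF NOT EXISTS` (the real feature runs Paxos across replicas; the function and names below are illustrative):

```python
def insert_if_not_exists(table: dict, key, row):
    """Mimic Cassandra's LWT INSERT ... IF NOT EXISTS: apply the write
    only when no row exists, and report the outcome plus the existing
    row on rejection, as CQL does."""
    if key in table:
        return {"[applied]": False, **table[key]}
    table[key] = row
    return {"[applied]": True}

users = {}
print(insert_if_not_exists(users, "pmcfadin", {"name": "Patrick"}))   # applied
print(insert_if_not_exists(users, "pmcfadin", {"name": "Imposter"}))  # rejected
```

The point of the feature is that this check-then-write is atomic cluster-wide, which a normal read-then-insert from the client cannot guarantee.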
Apache Cassandra is a popular choice for a wide variety of application persistence needs. There are many design choices that can affect uptime and performance. In this talk we'll look at some of the many things to consider, from a single server to multiple data centers. A basic understanding of Cassandra features coupled with client driver features can be a very powerful combination. This talk will be an introduction but will dive deep into the technical details of how Cassandra works.
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...) - DataStax
Worried that you aren't taking full advantage of your Spark and Cassandra integration? Well worry no more! In this talk we'll take a deep dive into all of the available configuration options and see how they affect Cassandra and Spark performance. Concerned about throughput? Learn to adjust batching parameters and gain a boost in speed. Always running out of memory? We'll take a look at the various causes of OOM errors and how we can circumvent them. Want to take advantage of Cassandra's natural partitioning in Spark? Find out about the recent developments that let you perform shuffle-less joins on Cassandra-partitioned data! Come with your questions and problems and leave with answers and solutions!
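The batching idea behind those throughput knobs can be sketched independently of the connector. The connector groups statements by partition key so that each batch lands on one replica set; the function and parameter names below are illustrative, not the connector's actual configuration options.

```python
from collections import defaultdict

def group_for_batching(rows, batch_size=3):
    """Group (partition_key, row) pairs by partition key, then split
    each group into bounded batches -- roughly the shape of what the
    connector's batching parameters control."""
    by_partition = defaultdict(list)
    for pk, row in rows:
        by_partition[pk].append(row)
    batches = []
    for pk, rs in by_partition.items():
        for i in range(0, len(rs), batch_size):
            batches.append((pk, rs[i:i + batch_size]))
    return batches

rows = [("p1", 1), ("p2", 2), ("p1", 3), ("p1", 4), ("p1", 5)]
print(group_for_batching(rows))
```

Batching unrelated partitions together instead would force the coordinator to fan writes out, which is one reason tuning these parameters matters.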
About the Speaker
Russell Spitzer Software Engineer, DataStax
Russell Spitzer received a Ph.D. in Bioinformatics before finding his deep passion for distributed software. He found the perfect outlet for this passion at DataStax, where he began on the Automation and Test Engineering team. He recently moved from finding bugs to making bugs as part of the Analytics team, where he works on integration between Cassandra and Spark as well as other tools.
What is in All of Those SSTable Files, Not Just the Data One but All the Rest ... - DataStax
Have you ever wondered what is in all of those SSTable files and how they help Cassandra find and manage your data? If you go to the DataStax website they will give you a high-level explanation of what is in each file. In this talk we will go much deeper, explaining each file and walking through a dump of its contents. We will also explore the differences between Cassandra 2.1 and 3.4.
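For orientation before the deep dive: one SSTable on disk is a set of component files sharing a name prefix (in recent versions, Data.db for the rows, Index.db and Summary.db for lookups, Filter.db for the Bloom filter, plus statistics, compression metadata, a digest, and a table of contents). A small sketch that groups a directory listing by SSTable; the filenames are made-up examples following the real naming pattern.

```python
from collections import defaultdict

# Component files that make up one SSTable in recent Cassandra versions.
COMPONENTS = {"Data.db", "Index.db", "Filter.db", "Summary.db",
              "Statistics.db", "CompressionInfo.db", "Digest.crc32", "TOC.txt"}

def group_sstables(filenames):
    """Group on-disk files by SSTable prefix, e.g. 'ma-5-big-Data.db'
    splits into prefix 'ma-5-big' and component 'Data.db'."""
    tables = defaultdict(set)
    for name in filenames:
        prefix, component = name.rsplit("-", 1)
        if component in COMPONENTS:
            tables[prefix].add(component)
    return dict(tables)

files = ["ma-5-big-Data.db", "ma-5-big-Index.db", "ma-5-big-Filter.db"]
print(group_sstables(files))
```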
About the Speaker
John Schulz Principal Consultant, The Pythian Group
John has 40 years of experience working with data: data in files and in databases, from flat files through ISAM to relational databases and, most recently, NoSQL. For the last 15 years he's worked on a variety of open source technologies including MySQL, PostgreSQL, Cassandra, Riak, Hadoop, and HBase. He has been working with Cassandra since 2010. For the last eighteen months he has been working for The Pythian Group to help their customers improve their existing databases and select new ones.
Spark and Cassandra with the DataStax Spark Cassandra Connector
How it works and how to use it!
Missed Spark Summit but still want to see some slides?
This slide deck is for you!
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...) - DataStax
We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector.
In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
Matthias works as an IT consultant at codecentric AG in Germany. His focus is on big data and streaming applications with Apache Cassandra and Apache Spark, yet he does not lose track of other tools in the area of big data. Matthias shares his experiences at conferences, meetups, and user groups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He wrote a couple of journal articles and blog posts on subjects of both fields. His interests reach from legal questions to questions of architecture and design of cloud computing and big data systems to technical details of NoSQL databases.
Introduction to data modeling with Apache Cassandra - Patrick McFadin
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is a starting tour, translating from the knowledge you already have to the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
Escape From Hadoop: Spark One-Liners for C* Ops - Russell Spitzer
Apache Cassandra and Spark, when combined, can give powerful OLTP and OLAP functionality for your data. We’ll walk through the basics of both of these platforms before diving into applications combining the two. Joins, changing a partition key, or importing data can usually be difficult in Cassandra, but we’ll see how to do these and other operations in a set of simple Spark Shell one-liners!
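The spirit of those one-liners, minus the Spark shell: "changing the partition key" is just re-grouping rows by a different column. A plain-Python stand-in (in Spark this would be a `keyBy`/`groupByKey` over a Cassandra RDD; the data here is invented for illustration):

```python
from itertools import groupby

# Stand-in for rows read from Cassandra: (old_key, user, amount)
rows = [("2015-01", "alice", 10), ("2015-01", "bob", 5), ("2015-02", "alice", 7)]

# "Change the partition key" as a one-liner: re-key by user instead of month.
by_user = {k: [r[2] for r in g]
           for k, g in groupby(sorted(rows, key=lambda r: r[1]), key=lambda r: r[1])}

print(by_user)  # {'alice': [10, 7], 'bob': [5]}
```

What makes this hard inside Cassandra alone is that a table can only be read efficiently by its own partition key; Spark supplies the shuffle that re-keys the data.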
DataStax: An Introduction to DataStax Enterprise Search - DataStax Academy
1) Why We Built DSE Search
2) Basics of the Read and Write Paths
3) Fault-tolerance and Adaptive Routing
4) Analytics with Search and Spark
5) Live Indexing
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using DataStax Ent... - DataStax Academy
Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadin's videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart’s content. Take our hand. We will show you how.
DataStax Enterprise clients, such as CQLSH or Hadoop- and Spark-based applications, can be precisely configured to achieve a desired behaviour. For a basic use case, we just run a dedicated DSE command and do not care about how all of those pieces are set up to work together, leveraging the goodness of DSE. However, understanding where and what we need to modify to achieve the expected change in the configuration is essential for using DSE efficiently. In this presentation we go through the basic and advanced settings for client applications, including security features and limitations, and DSE patches introduced into integrated Spark. We show the new tools which significantly simplify the configuration of external DSE installations which are used just for accessing a DSE cluster in client mode. Finally, we conclude with hints for configuring the Spark driver from scratch in order to use it in a web application, when running the program through DSE scripts is not feasible.
About the Speaker
Jacek Lewandowski Software engineer, DataStax
Jacek Lewandowski is a software engineer with 13 years of experience. Initially a full-stack developer, he worked as a consultant and a trainer for different companies. Since 2011 he has been using Cassandra as an alternative to SQL in various applications. He is passionate about distributed algorithms, graphs, and functional programming in Scala. He is also a part-time assistant professor popularizing the Cassandra database among students and researchers, and has been working on the DataStax Analytics team for over 2 years.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features, and now with 2.1 we get even more! Let me show you some real-life data models and those new features taking developer productivity to an all-new high: User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc... - Zahid Anwar (OCM)
Common Cloud Control deployments can sometimes be exposed to single points of failure. In this presentation we will discuss these pitfalls and how deploying Cloud Control within the Maximum Availability Architecture can provide a robust system. Aimed at a technical audience, we will dive into providing High Availability and Disaster Recovery for the OMS repository and OMS Web Tier through the use of RAC, Web Tier Clustering, Data Guard, and Storage Replication. We will take our audience through the simple but effective steps required for this type of deployment, in addition to the license implications of using the Maximum Availability Architecture, including what Oracle gives you for free under a restricted-use license. This presentation is based on a recent project completed by our speaker, Zahid Anwar, in which Zahid provided Maximum Availability Architecture for a Cloud Control installation monitoring 6 critical X4-2 Eighth Exadata Machines.
Oracle Drivers configuration for High Availability - Ludovico Caldara
... is it a developer's job?
UCP, GridLink, TAF, AC, TAC, FAN… The configuration of Oracle drivers for application high availability is not an easy job. Developers often care about the minimal working configuration, while the DBAs are busy with operations. In this session I will try to demystify application server connectivity to the database and give a direction toward the highest availability, using Real Application Clusters and new Oracle features like TAC and CMAN TDM.
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016 - StampedeCon
Have you ever wanted to analyze sensor data that arrives every second from across the world? Or maybe you want to analyze intra-day trading prices of millions of financial instruments? Or take all the page views from Wikipedia and compare the hourly statistics? To do this or any other similar analysis, you will need to analyze large sequences of measurements over time. And what better way to do this than with Apache Spark? In this session we will dig into how to consume data, analyze it with Spark, and then store the results in Apache Cassandra.
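The hourly-statistics example in that abstract boils down to a group-by over time buckets. A minimal local sketch (Spark distributes the same shape of computation; the data and function name are illustrative):

```python
from collections import defaultdict
from statistics import mean

def hourly_stats(measurements):
    """Roll (iso_timestamp, value) pairs up into per-hour averages."""
    buckets = defaultdict(list)
    for ts, value in measurements:
        buckets[ts[:13]].append(value)   # 'YYYY-MM-DDTHH' is the hour bucket
    return {hour: mean(vals) for hour, vals in buckets.items()}

data = [("2016-05-01T10:05", 2.0), ("2016-05-01T10:55", 4.0),
        ("2016-05-01T11:10", 6.0)]
print(hourly_stats(data))  # {'2016-05-01T10': 3.0, '2016-05-01T11': 6.0}
```

In the Spark version the bucket becomes the grouping key of a `reduceByKey` or SQL window, and the results would be written back to a Cassandra table keyed by that bucket.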
Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &... - DataStax
More data is being collected every year. Cloud applications, including IoT, web, and mobile, send torrents of bits to our data centers that have to be processed and stored. In addition, users expect an always-on experience, with little room for error. Numerous companies are successfully doing this every day. In this webinar, you will learn about the convergence of complementary technologies: Spark, Mesos, Akka, Cassandra, and Kafka (SMACK). You will also learn how Apache Kafka can help you get your data under control, and the critical role Kafka plays in your data pipeline.
Webinar recording: https://youtu.be/uwYlwLyv-1s
Webinar Q&A will be posted shortly.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... - StreamNative
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, leveraging Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
... or why Oracle still cares about CMAN and why you should do it too
The Oracle Connection Manager (CMAN) is the Swiss-army knife for database connections. It can be used for security, routing, high availability, single-point of contact... Starting with Oracle 18c, it has been extended with the new Traffic Director Mode (CMAN TDM), that allows transparent failover for applications that do not implement it natively.
In this session I will introduce briefly what CMAN is capable of, how to configure it in a high availability environment, and how the new release achieves a higher protection level.
Are your Oracle databases highly available? You have deployed Real Application Clusters (RAC), Data Guard, or Failover Clusters and are well protected against server failures? Great – the prerequisites for a highly available environment are given. However, to assure that backend infrastructure failures also remain transparent to the client, an appropriate configuration is a prerequisite.
This lecture will discuss the Oracle technologies that can be used to achieve automatic client failover functionality. What are the advantages, but also the limitations of these technologies?
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by Xiang Zhang, Pratyush Sharma & Xiaoman Dong
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble-free multi-DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi-DC cluster. He will also look at how multiple DCs are supported through all areas of Cassandra, how they impact your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments, where he led the effort to build a Cassandra-based architecture and migrate services to it from a traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of Cassandra.
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
It is said that kernel-bypass technologies avoid the kernel because it is "slow", but in reality, a lot of the performance advantages that they bring just come from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds, while still benefiting from the kernel's battle-tested compatibility, and rich ecosystem of tools.
SignalFx engineer Rajiv Kurian's presentation on why we wrote our own Kafka consumer, the performance goals, and the performance gains achieved.
Download the slides to see animations showing hardware details. These slides were converted from Keynote to PowerPoint, so there may be some oddness with slide transitions!
JDD2015: Make your world event driven - Krzysztof DębskiPROIDEA
MAKE YOUR WORLD EVENT DRIVEN
Just after you set up your first microservice you realize that the game has just started. You need to improve latency in your application and reduce unnecessary communication.
To make your architecture fully decoupled you need to embrace asynchronous communication. A good way to achieve that is to switch to an Event-Driven Architecture.
We will see how to use Kafka in your microservices. We will also cover some pitfalls you might face when using Kafka and how to deal with them.
After the talk you will know the tools needed to improve your microservice ecosystem.
If you’re involved in open source work in or around a business, you will inevitably have the discussion, “Is this open source or proprietary?” Do not take this moment lightly. This seemingly easy question is met with strong opinions on both sides. Friendships have been lost. Companies have suffered. It’s as close to religious warfare as we can get in the tech world.
It’s time to call a truce.
There are plenty of valid arguments on both sides. Patrick McFadin outlines the pros and cons of each. Using example scenarios of projects that must decide whether or not they’ll be open source, Patrick explores objective ways to make a decision without descending into chaos and name calling. Even without a completely objective picture, understanding both sides of the argument can help keep you on track and civil. Patrick has been involved in OSS for more years than he likes to admit and would love for his past mistakes to benefit you.
Topics include:
- Key questions to ask to help guide your decision
- Reasons for choosing OSS
- Reasons for staying strictly proprietary
- Considerations for mixing OSS and proprietary models
- Transitioning from one model to the other
Help! I want to contribute to an Open Source project but my boss says no.Patrick McFadin
You love using Open Source Software. It's done right by you and now you want to contribute back. You get your patch all ready and… the boss says no! Don't feel alone. Enterprises everywhere are trying to figure this out. I'll walk you through what risks actually exist for businesses and how you can help manage them. Maybe, armed with some information, your boss will say... yes!
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
You have collected a lot of time series data, so now what? It's not going to be useful unless you can analyze what you have. Apache Spark has become the heir apparent to MapReduce, but did you know you don't need Hadoop? Apache Cassandra is a great data source for Spark jobs! Let me show you how it works, how to get useful information and, the best part, how to store analyzed data back into Cassandra. That's right. Kiss your ETL jobs goodbye and let's get to analyzing. This is going to be an action-packed hour of theory, code and examples, so caffeine up and let's go.
This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, groundbreaking features that you’ll want to use. Indexing changes that will make your applications faster and Spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer-focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made its arrival. There is more, but you’ll just have to come see for yourself. Get your front row seat and don’t miss it!
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical. Real-world use cases and associated data models.
Making money with open source and not losing your soul: A practical guidePatrick McFadin
We now live in a world where Open Source Software is as generally accepted as any commercial software. This doesn’t mean OSS lacks commercial aspects, because I’m here to tell you, Open Source is a perfectly viable business model. Don't worry! You don't have to sell your soul to the suits on Wall Street and give up on the core values of open source to make it work. I'm employed by a company that (hopefully) embodies these values with a lot of success. I’ve also interviewed many business leaders in Open Source companies. Let me share some of what I’ve learned so you too can be successful. The topics I will be covering:
- Picking the right open source license
- Business models for monetizing open source
- Engaging the community in a mutually beneficial way
- Competing with commercial alternatives
- The selling process (yes, we have to talk about that)
Time series with Apache Cassandra - Long versionPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big-impact features like Lightweight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise? Let's put the spotlight on all the things! Changes in memory management, file handling and internals. Low hype, but they pack a big punch. While we were at it, we also did a bit of house cleaning.
Building Antifragile Applications with Apache CassandraPatrick McFadin
Even with the best infrastructure, failures will occur without warning and are almost guaranteed. Building applications that can resist this fact of life can be both art and science. In this talk, I'll try to eliminate the art portion and focus more on the science. Starting at high level architecture decisions, I will take you through each layer and finally down to actual application code. Using Cassandra as the back end database, we can build layers of fault tolerance that will leave end users completely unaware of the underlying chaos that could be occurring. With a little planning, we can say goodbye to the Fail Whale and the fragility of the traditional RDBMS. Topics will include:
- Application strategies to utilize active-active, diverse datacenters
- Replicating data with the highest integrity and maximum resilience
- Utilizing Cassandra's built-in fault tolerance
- Architecture of private, cloud or hybrid based applications
- Application driver techniques when using Cassandra
A 30-minute talk I did at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides from my talk with Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, finding different ways to think about quality and testing in different parts of the DevOps infinity loop.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and lead you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I have already gotten working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
31. Kafka
[Diagram: a Collection API producer publishes messages Temp 1–5 to a Temperature topic and Precip 1–5 to a Precipitation topic on the broker. Each topic is split across Partition 0 and Partition 1, with Replication Factor = 2, and a Temperature Processor and a Precipitation Processor consume from their respective topics.]
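The partitioning shown in the slide above can be sketched in code. This is a simplified, hypothetical model, not the real Kafka client: Kafka's default partitioner applies murmur2 hashing to the key bytes, while `assign_partition` below uses a plain byte sum purely to illustrate the idea that the same key always lands in the same partition.

```python
# Toy model of keyed partitioning: hash the message key, take it modulo
# the topic's partition count. Same key -> same partition, every time,
# which is what gives per-key ordering across Partition 0 and Partition 1.

def assign_partition(key: str, num_partitions: int = 2) -> int:
    """Deterministically map a key to one of the topic's partitions."""
    return sum(key.encode()) % num_partitions

topics = {"Temperature": 2, "Precipitation": 2}  # topic -> partition count

readings = [("Temperature", "sensor-a"), ("Temperature", "sensor-b"),
            ("Precipitation", "sensor-a")]

for topic, key in readings:
    p = assign_partition(key, topics[topic])
    # Note sensor-a maps to the same partition in both topics.
    print(f"{topic}/{key} -> partition {p}")
```

Replication Factor = 2 is orthogonal to this: each of those partitions is additionally copied to a second broker for durability.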
32. Kafka
[Diagram: the same topology, now with additional Temperature Processor and Precipitation Processor consumer instances reading the partitioned Temperature and Precipitation topics (Replication Factor = 2), showing how consumers scale out across partitions.]
33. Guarantees
Order
• Within a partition, messages are ordered as they are sent by the producer
• Consumers see messages in the order they were inserted by the producer
Durability
• Messages are delivered at least once
• With a replication factor of N, up to N-1 server failures can be tolerated without losing committed messages
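The at-least-once guarantee above has a practical consequence worth seeing in code: a retried send can duplicate a message, so consumers that must process each message exactly once deduplicate on their side. This is a toy in-memory sketch, not real Kafka; the function names and message values are invented for illustration.

```python
# Simulate at-least-once delivery: if the producer's acknowledgement is
# lost, it resends, and the broker log ends up with a duplicate.

def deliver_at_least_once(messages, fail_ack_on=None):
    """Append each message to the log; a lost ack triggers a resend."""
    log = []
    for i, msg in enumerate(messages):
        log.append(msg)          # first attempt always lands
        if fail_ack_on == i:
            log.append(msg)      # ack lost -> producer retries -> duplicate
    return log

def consume_exactly_once(log):
    """Deduplicate so downstream processing sees each message once."""
    seen, out = set(), []
    for msg in log:
        if msg not in seen:
            seen.add(msg)
            out.append(msg)
    return out

log = deliver_at_least_once(["temp-1", "temp-2", "temp-3"], fail_ack_on=1)
print(log)                        # ['temp-1', 'temp-2', 'temp-2', 'temp-3']
print(consume_exactly_once(log))  # ['temp-1', 'temp-2', 'temp-3']
```

Note that ordering survives the retry: duplicates arrive adjacent to the original, so dedup by key restores the producer's order.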
35. Akka in a nutshell
• Highly concurrent
• Reactive
• Fully distributed
• Completely elastic and resilient
[Diagram: four actors, each paired with its own mailbox.]
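The actor/mailbox pairs in the slide above can be modeled in a few lines. This is a hedged toy sketch, not the Akka API (Akka actors are written in Scala or Java): it only shows the core idea that each actor owns a mailbox and processes its messages one at a time on its own thread.

```python
# Minimal actor model: tell() enqueues into the mailbox (fire-and-forget),
# a dedicated thread drains the mailbox sequentially, so the handler never
# runs concurrently with itself -- no locks needed around actor state.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Asynchronous send: just drop the message in the mailbox."""
        self.mailbox.put(message)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # poison pill stops the actor
                break
            self.results.append(self.handler(msg))

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

# One actor per concern, mirroring the Temperature/Precipitation processors.
doubler = Actor(lambda n: n * 2)
for n in (1, 2, 3):
    doubler.tell(n)
doubler.stop()
print(doubler.results)  # [2, 4, 6]
```

Senders never block on processing, which is what makes a system of such actors elastic: to scale, you add more actors and route messages among them.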