Make 2016 your year of SMACK talk
2. Who are we?
© 2015. All Rights Reserved.
Joe Stein - @allthingshadoop: CEO Elodina
Jon Haddad - @rustyrazorblade: Technical Evangelist, DataStax
Patrick McFadin - @PatrickMcFadin: Chief Evangelist, DataStax
9. The problem in a huge nutshell
• 75 data formats
• Process data in flight with a tight SLA / real-time analysis of data to determine pricing
• Scalable storage
• Deploy a lot of services reliably
• Batch analytics
• Multiple data centers (oh, and by the way, this has to work across multiple DCs on several continents)
20. Kafka decouples data pipelines
23. A high-throughput distributed messaging system rethought as a distributed commit log.
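To make the "commit log" idea concrete, here is a minimal in-memory sketch, not Kafka's actual implementation. The class names `CommitLog` and `Consumer` and the sample events are made up for illustration. It shows the property the slides point at: producers only append, and each consumer keeps its own offset, so pipelines reading the same data stay decoupled.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy append-only log; a stand-in for a single topic partition."""
    records: list = field(default_factory=list)

    def append(self, record):
        # Appending never rewrites history; the record's offset is its index.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        # Reading removes nothing, so many consumers can share one log.
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer tracks its own position, independent of the producer."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["price:10", "price:12", "price:11"]:
    log.append(event)

pricing = Consumer(log)    # hypothetical real-time consumer
analytics = Consumer(log)  # hypothetical batch consumer

first_batch = pricing.poll(max_records=2)  # reads two records, then pauses
full_batch = analytics.poll()              # reads everything at its own pace
```

The slow consumer and the fast consumer never block each other or the producer, which is the decoupling the earlier slide refers to.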
33. Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
36. Token
• Each partition key hashes to a 64-bit token
• Consistent hash between -2^63 and 2^63 - 1
• Each node owns a range of those values
• A node's token marks the boundary between its range and the next node's range
• Virtual nodes break these ranges down further
[Diagram: a server holding data for a token range, 0 …]
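A rough sketch of how a key might be mapped to a token and then to a node. Several things here are assumptions: MD5 stands in for Cassandra's Murmur3 hash (which is not in the Python standard library), the four node names and their evenly spaced tokens are invented, and a node is taken to own the tokens up to and including its own token.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Map a key to a signed 64-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner_of(token: int, ring: list) -> str:
    """Return the node owning `token`.

    `ring` is a sorted list of (token, node) pairs. Assumption: a node owns
    every token up to and including its own, and the ring wraps around.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Four hypothetical nodes with evenly spaced tokens across the full range.
step = 2**64 // 4
ring = [(MIN_TOKEN + step * (i + 1) - 1, f"node{i + 1}") for i in range(4)]

token = token_for("user:42")
owner = owner_of(token, ring)
```

Adding a server to such a ring only changes ownership of the ranges next to its tokens, which is why capacity can be added one node at a time.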
42. Replication
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
[Ring diagram: the four DC1 nodes, each labeled with its primary range and its two replica ranges]
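The table is consistent with a simple placement rule: the primary node for a range, plus the next RF - 1 nodes around the ring. The sketch below encodes that rule with the slide's addresses and ranges. The function names are made up, and the rule itself is inferred from the table rather than stated in the deck.

```python
# Primary ranges from the slide's table, in ring order.
RING = [
    ("10.0.0.1", range(0, 26)),    # 00-25
    ("10.0.0.2", range(26, 51)),   # 26-50
    ("10.0.0.3", range(51, 76)),   # 51-75
    ("10.0.0.4", range(76, 101)),  # 76-100
]


def replicas_for(partition: int, ring=RING, rf: int = 3) -> list:
    """Nodes holding `partition`: the primary, then the next rf - 1 nodes."""
    for index, (_, primary_range) in enumerate(ring):
        if partition in primary_range:
            return [ring[(index + k) % len(ring)][0] for k in range(rf)]
    raise ValueError(f"partition {partition} is outside the ring")


def ranges_on(node: str, ring=RING, rf: int = 3) -> list:
    """Ranges stored on `node` as (low, high) pairs, primary range first."""
    index = [name for name, _ in ring].index(node)
    held = [ring[(index - k) % len(ring)][1] for k in range(rf)]
    return [(r.start, r.stop - 1) for r in held]


partition_15_replicas = replicas_for(15)
first_row = ranges_on("10.0.0.1")
```

`ranges_on` reproduces one row of the table per node, which is a quick way to check the rule against the slide.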
43. Consistency
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
[Ring diagram: the four DC1 nodes with their primary and replica ranges; a client writes to partition 15]
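The slide does not name specific consistency levels, so the following is only a sketch of the general idea using three of Cassandra's levels (ONE, QUORUM, ALL). The helper names and the "one replica down" scenario are assumptions for illustration; a quorum is taken as floor(RF / 2) + 1.

```python
def required_acks(level: str, rf: int = 3) -> int:
    """Number of replica acknowledgements needed at a consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return levels[level]


def write_succeeds(replica_up: dict, level: str) -> bool:
    """A write succeeds if enough of the partition's replicas acknowledge it."""
    acks = sum(1 for up in replica_up.values() if up)
    return acks >= required_acks(level, rf=len(replica_up))


# Replicas for partition 15 (range 00-25 in the table), with one node down.
partition_15 = {"10.0.0.1": True, "10.0.0.2": True, "10.0.0.3": False}

quorum_ok = write_succeeds(partition_15, "QUORUM")  # 2 of 3 replicas respond
all_ok = write_succeeds(partition_15, "ALL")        # needs all 3 replicas
```

With RF=3, a quorum write tolerates one unavailable replica, while a write at ALL does not.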
47. DataFrames
• Abstraction over RDDs
• Modeled after Pandas & R
• Structured data
• Python passes commands only
• Commands are pushed down
• Goal: data never leaves the JVM
• You can still use the RDD if you want
• Operations are lazy
[Diagram labels: RDD, DataFrame]
48. SparkSQL
movies.registerTempTable("movie")
ratings.registerTempTable("rating")
sql.sql("""select title, avg(rating) as avg_rating
from movie join rating
on movie.movie_id = rating.movie_id
group by title
order by avg_rating DESC limit 3""")
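To see what the query computes without a Spark cluster, the same SQL can be run against SQLite from the Python standard library. This is a stand-in, not Spark: the table layout is inferred from the column names in the query, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table movie (movie_id integer primary key, title text);
    create table rating (movie_id integer, rating real);
""")
# Made-up rows, only to give the query something to aggregate.
conn.executemany("insert into movie values (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
conn.executemany("insert into rating values (?, ?)",
                 [(1, 5.0), (1, 4.0), (2, 3.0), (3, 4.0), (3, 1.0), (4, 1.0)])

# Same query text as on the slide: top three titles by average rating.
top3 = conn.execute("""select title, avg(rating) as avg_rating
                       from movie join rating
                       on movie.movie_id = rating.movie_id
                       group by title
                       order by avg_rating DESC limit 3""").fetchall()
conn.close()
```

In Spark the query text stays the same; the difference is that the join and aggregation are planned and executed across the cluster.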
67. Goals we set out with
• smart broker.id assignment
• preservation of broker placement (through constraints and/or new features)
• ability to make configuration changes
• rolling restarts (for things like configuration changes)
• scaling the cluster up and down with automatic, programmatic and manual options
• smart partition assignment via constraints vis-à-vis roles, resources and attributes
69. Scheduler & Executor
Scheduler
• Provides the operational automation for a Kafka Cluster
• Manages the changes to the broker's configuration
• Exposes a REST API for the CLI to use or any other client
• Runs on Marathon for high availability
Executor
• The executor interacts with the Kafka broker as an intermediary to the scheduler
70. CLI and REST API
• scheduler - starts the scheduler
• add - adds one or more brokers to the cluster
• update - changes resources, constraints or broker properties for one or more brokers
• remove - takes a broker out of the cluster
• start - starts a broker up
• stop - either shuts a broker down gracefully or force kills it (./kafka-mesos.sh help stop)
• rebalance - allows you to rebalance a cluster either by selecting the brokers or
topics to rebalance. Manual assignment is still possible using the Apache Kafka
project tools. Rebalance can also change the replication factor on a topic
• help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}
71. Launch 20 brokers in seconds
./kafka-mesos.sh add 1000..1019 --cpus 0.01 --heap 128 --mem 256 --options num.io.threads=1
./kafka-mesos.sh start 1000..1019
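The id expression 1000..1019 is inclusive at both ends, which is where the "20 brokers" in the slide title comes from. The helper below is hypothetical and not part of kafka-mesos; it only illustrates the arithmetic behind the two commands.

```python
def expand_broker_range(expr: str) -> list:
    """Expand an inclusive id expression such as '1000..1019'.

    Hypothetical helper for illustration; it handles only the 'a..b' form
    used on the slide and a single id.
    """
    if ".." in expr:
        low, high = expr.split("..")
        return list(range(int(low), int(high) + 1))
    return [int(expr)]


broker_ids = expand_broker_range("1000..1019")
broker_count = len(broker_ids)

# Totals requested from the cluster, using the per-broker values on the slide.
total_cpus = round(broker_count * 0.01, 2)  # --cpus 0.01 per broker
total_mem = broker_count * 256              # --mem 256 per broker, same unit as the flag
```

The small per-broker resource values are what make it possible to place all 20 brokers on a modest test cluster.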
72. Zipkin http://zipkin.io/
Apache Mesos Framework https://github.com/elodina/sawfly/blob/master/tristan.md
74. LinkedIn Simoorg
https://github.com/linkedin/simoorg
Apache Mesos Framework https://github.com/elodina/sawfly/blob/master/pisaura.md
76. Multi-datacenter
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
DC2: RF=3
Node      Primary  Replica  Replica
10.1.0.1  00-25    76-100   51-75
10.1.0.2  26-50    00-25    76-100
10.1.0.3  51-75    26-50    00-25
10.1.0.4  76-100   51-75    26-50
[Ring diagrams: DC1 (10.0.0.1 to 10.0.0.4) and DC2 (10.1.0.1 to 10.1.0.4), each node labeled with its primary and replica ranges; a client writes to partition 15]
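With a replication factor set per datacenter, a single write lands on replicas in every datacenter. The sketch below extends the single-ring placement rule to both rings on the slide. It assumes DC2's nodes are the 10.1.0.x addresses shown in the ring diagram and that both datacenters use the same range layout; the function name is made up.

```python
DATACENTERS = {
    "DC1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "DC2": ["10.1.0.1", "10.1.0.2", "10.1.0.3", "10.1.0.4"],
}
# Primary range per ring position, identical in both datacenters.
RANGES = [(0, 25), (26, 50), (51, 75), (76, 100)]
REPLICATION = {"DC1": 3, "DC2": 3}


def replicas_by_dc(partition: int) -> dict:
    """Return the replica nodes for `partition` in each datacenter."""
    for position, (low, high) in enumerate(RANGES):
        if low <= partition <= high:
            break
    else:
        raise ValueError(f"partition {partition} is outside the ring")
    placement = {}
    for dc, rf in REPLICATION.items():
        nodes = DATACENTERS[dc]
        # Primary for the range, then the next rf - 1 nodes around the ring.
        placement[dc] = [nodes[(position + k) % len(nodes)] for k in range(rf)]
    return placement


placement_15 = replicas_by_dc(15)
```

For partition 15 this yields three replicas in each datacenter, six copies in total.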
79. Data Protection
• No longer OK to ship EU data to US under “Safe Harbour”
[Diagram: three datacenters with per-keyspace replication]
• Product_Catalog RF=3, EU_Customer_Data RF=0
• Product_Catalog RF=3, EU_Customer_Data RF=3
• Product_Catalog RF=3, EU_Customer_Data RF=3
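In Cassandra, per-datacenter replication is declared on the keyspace with NetworkTopologyStrategy, and a datacenter left out of the replication map holds no replicas. The sketch below generates such statements. The datacenter names (`us_east`, `eu_west`, `eu_central`) and the helper are hypothetical; the deck does not name its datacenters.

```python
def create_keyspace_cql(keyspace: str, rf_by_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Datacenters with RF=0 are omitted from the replication map, so no
    replica of the keyspace is placed there.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in rf_by_dc.items() if rf > 0]
    body = ", ".join(parts)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{body}}};"


# Hypothetical datacenter names; RF values follow the slide.
product_catalog = create_keyspace_cql(
    "product_catalog", {"us_east": 3, "eu_west": 3, "eu_central": 3})
eu_customer_data = create_keyspace_cql(
    "eu_customer_data", {"us_east": 0, "eu_west": 3, "eu_central": 3})
```

The catalog is replicated everywhere, while the customer keyspace never leaves the two EU datacenters.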