JOINING A p2p CONVERSATION
JOAQUIN CASARES, THE LAST PICKLE
JOINING A p2p CONVERSATION
JOAQUIN CASARES
▸ The Last Pickle
▸ Consultant
▸ Previously:
▸ Umbel
▸ Software Engineer
▸ Riptano/DataStax
▸ Support Engineer
▸ Software Engineer-in-Test
▸ Demo Engineer
JOINING A p2p CONVERSATION
THE LAST PICKLE
▸ 50+ years combined experience with Apache Cassandra.
▸ We communicate ideas.
▸ We are committed to doing the right thing for both our team of experts and our clients.
▸ Our passion for sharing our knowledge is present in all that we do.
▸ Consider us a member of your team.
▸ Ultimately:
▸ We want you to be successful and have all the information to do so.
WHERE DO WE GO
FROM HERE?
OVERVIEW
Overview
▸ p2p Networks.
▸ Cassandra fundamentals.
▸ How to add capacity.
▸ How to check on the status.
▸ Things you shouldn't forget, I think.
▸ How to forget.
DEFINITION: P2P
KaZaA TO BITTORRENT
KaZaA TO BITTORRENT
KaZaA: "P2P"
PEER
SUPERNODE
KaZaA TO BITTORRENT
KaZaA: "CENTRALIZED P2P"
PEER
SUPERNODE
KAZAA.COM
KaZaA TO BITTORRENT
KaZaA: SHUTDOWN
PEER
SUPERNODE
KAZAA.COM
CAN ANYONE HEAR ME?
IS ANYONE ALIVE OUT THERE?
NO.
KaZaA TO BITTORRENT
BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK
PEER
TRACKER
SELF
KaZaA TO BITTORRENT
BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK
PEER
TRACKER
SELF
MY CONNECTION WAS SEVERED.
KaZaA TO BITTORRENT
BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK
PEER
TRACKER
SELF
I'M BANNED BY A STATE ACTOR.
I KNOW THAT HASH.
CASSANDRA TOKENS
CASSANDRA TOKENS
LEGACY OWNERSHIP
A G M T
CASSANDRA TOKENS
LEGACY OWNERSHIP
A G M T
M
G
A
T
CASSANDRA TOKENS
LEGACY OWNERSHIP
A G M T
M
G
A
T
(T, A]
(G, M]
(A, G]
(M, T]
CASSANDRA TOKENS
LEGACY OWNERSHIP
A G M T
M
G
A
T
(T, A]
(A, G]
(G, M]
(M, T]
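The range notation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the four single-letter tokens from the slide stand in for real numeric tokens; the function names `owner` and `owned_range` are hypothetical and not part of Cassandra.

```python
from bisect import bisect_left

# Tokens from the slide; each node is identified by the single token it owns.
RING = ["A", "G", "M", "T"]

def owner(key_token, ring=RING):
    """Return the token of the node that owns key_token.

    A node with token t owns the half-open range (previous_token, t].
    Keys past the last token wrap around to the first node.
    """
    tokens = sorted(ring)
    idx = bisect_left(tokens, key_token)
    return tokens[idx % len(tokens)]

def owned_range(token, ring=RING):
    """Format the range owned by the node holding `token`."""
    tokens = sorted(ring)
    i = tokens.index(token)
    # Index -1 wraps to the last token, which gives the (T, A] range.
    return f"({tokens[i - 1]}, {token}]"

if __name__ == "__main__":
    for t in RING:
        print(t, "owns", owned_range(t))
    print("key B is owned by", owner("B"))
```

With one token per node, adding a node can only split a single neighbour's range, which is the limitation vnodes address.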
CASSANDRA TOKENS
VIRTUAL NODE ("VNODES") OWNERSHIP
Token: A  D  G  J  M  Q  T  W
Node:  1  3  2  1  3  2  4  4
CASSANDRA TOKENS
VNODES - JOINING A NODE
Token: A  B  D  G  J  M  P  Q  T  W
Node:  1  5  3  2  1  3  5  2  4  4
(Joining node 5 takes tokens B and P.)
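A small Python sketch of what the joining diagram shows, assuming my reading of the slide is right: nodes 1 to 4 each hold two tokens, and the joining node 5 picks up tokens B and P. The helper names are hypothetical.

```python
from bisect import bisect_left

def owner(ring, key_token):
    """Return the node owning key_token; ring maps token -> node id."""
    tokens = sorted(ring)
    return ring[tokens[bisect_left(tokens, key_token) % len(tokens)]]

def ranges_by_node(ring):
    """Group the (previous_token, token] ranges by the node that owns them."""
    tokens = sorted(ring)
    out = {}
    for i, token in enumerate(tokens):
        out.setdefault(ring[token], []).append(f"({tokens[i - 1]}, {token}]")
    return out

# Token ownership as read from the slide (an interpretation of the diagram).
before = {"A": 1, "D": 3, "G": 2, "J": 1, "M": 3, "Q": 2, "T": 4, "W": 4}
# Node 5 joins with two vnodes of its own.
after = {**before, "B": 5, "P": 5}

if __name__ == "__main__":
    print("node 5 owns", ranges_by_node(after)[5])
    print("key N moves from node", owner(before, "N"), "to node", owner(after, "N"))
```

Because node 5 takes slices from two different neighbours, the streaming load of the join is spread over more than one existing node.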
USE 32-64 VNODES.
NOT: DEFAULT OF 256 VNODES / NODE.
The Last Pickle
CASSANDRA TOKENS
USE THE SAME NUM_TOKENS COUNT
ACROSS ALL MACHINES.
The Last Pickle
CASSANDRA TOKENS
CASSANDRA TOKENS
TIDBIT: MD5 TOKENS TO MURMUR3 TOKENS
▸ MD5 token range: 0 to 2^127
▸ Murmur3 token range: -2^63 to 2^63 - 1
▸ Murmur3:
▸ Better randomness
▸ Lower chance of collisions
▸ ~6x faster than MD5
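To make the two token ranges concrete, here is a rough Python sketch. The constants follow the slide. The `md5_style_token` helper is hypothetical and only illustrates how a 128-bit MD5 digest can be folded into the 0 to 2^127 range; it is not Cassandra's partitioner code. Murmur3 is not in the Python standard library, so only its range is shown.

```python
import hashlib

# Token ranges as stated on the slide.
MD5_MIN, MD5_MAX = 0, 2**127
MURMUR3_MIN, MURMUR3_MAX = -(2**63), 2**63 - 1

def md5_style_token(key: bytes) -> int:
    """Illustrative only: read the MD5 digest as a signed 128-bit integer
    and take its absolute value, which lands in 0..2**127."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

if __name__ == "__main__":
    print("Murmur3 token count:", MURMUR3_MAX - MURMUR3_MIN + 1)
    print("sample MD5-style token:", md5_style_token(b"some partition key"))
```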
SEED NODES
SEED NODES
CASSANDRA: P2P IMPLEMENTATION - JOINING
PEER
SEED
JOINING
CAN I JOIN THE PARTY?
HEY BOB, WE HAVE A NEW
FRIEND.
I'M BOB.
SEED NODES
CASSANDRA: P2P IMPLEMENTATION
PEER
SEED
NODE
HAVE SOME OF MY TOKENS.
I DIDN'T WANT THESE RANGES
ANYWAY.
SEED NODES
CASSANDRA: P2P IMPLEMENTATION - SEED FAILURE
PEER
SEED
NODE
IT'S OK. I KNOW WHO MY
FRIENDS ARE.
SEED NODES
CASSANDRA: P2P IMPLEMENTATION - GOSSIP
PEER
SEED
NODE
I'M BOB.
HEY BOB, I'M STILL HERE.
SEED NODES
TIDBIT: GOSSIP
▸ It works.
▸ Basically: echoes.
▸ CDN Gossip use case.
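The "echoes" idea can be sketched as two nodes exchanging what they have heard and keeping the newest version of each entry. This is a deliberately simplified model, not Cassandra's actual gossip protocol; the state layout (endpoint mapped to a heartbeat version) and the function names are assumptions made for illustration.

```python
def merge(local, remote):
    """Keep the newest heartbeat version seen for every endpoint."""
    merged = dict(local)
    for endpoint, version in remote.items():
        if version > merged.get(endpoint, -1):
            merged[endpoint] = version
    return merged

def gossip_round(a, b):
    """After one exchange, both sides hold the same merged view."""
    merged = merge(a, b)
    return merged, dict(merged)

if __name__ == "__main__":
    node_a = {"10.0.0.1": 5, "10.0.0.2": 1}
    node_b = {"10.0.0.2": 3, "10.0.0.3": 7}
    node_a, node_b = gossip_round(node_a, node_b)
    print(node_a)
```

Repeating such exchanges between random pairs is what lets state about a node reach peers that never talked to it directly.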
SEED NODES
CASSANDRA: P2P IMPLEMENTATION - FAILED JOINING
PEER
SEED
FAILED JOIN
NONE OF THESE ADDRESSES
WORK. WHERE'S THE PARTY?
SEED NODES
CASSANDRA: P2P IMPLEMENTATION - FAILED JOINING OF SEED NODE
PEER
SEED
JOINING SEED
SO... WAS IT THE CHICKEN?
OR THE EGG?
USE THE SAME SEED NODES
THROUGHOUT THE CLUSTER.
The Last Pickle
SEED NODES
3-5 SEED NODES PER DATA CENTER.
The Last Pickle
SEED NODES
BOOTSTRAP
BOOTSTRAP
ANNOUNCING
PEER
SEED
JOINING
I'M JOINING!
BOOTSTRAP
STREAMING
PEER
SEED
JOINING
THANKS FOR ALL THE DATA!
BOOTSTRAP
REPLICA OWNERSHIP
B D G J
3
2
1
4
M Q T W
5 3 2 1 3 2 4 4
5
P
5
A
1
BOOTSTRAP
REPLICA OWNERSHIP
B D G J
3
2
1
4
M Q T W
5 3 2 1 3 2 4 4
5
P
5
A
1
B
C
D
A
BOOTSTRAP
REPLICA OWNERSHIP
B D G J
3
2
1
4
M Q T W
5 3 2 1 3 2 4 4
5
P
5
A
1
B
C
D
A
Q
P
O
N
M
3 2
BOOTSTRAP
POST-BOOTSTRAP: CLEANUP
B D G J
3
2
1
4
M Q T W
5 3 2 1 3 2 4 4
5
P
5
A
1
B
C
D
A
Q
P
O
N
P
O
N
B
I REALLY DON'T NEED THIS
EXTRA PRESSURE P, O, & N.
SEE YOU, C!
M
BOOTSTRAP
POST-BOOTSTRAP: CLEANUP
B D G J
3
2
1
4
M Q T W
5 3 2 1 3 2 4 4
5
P
5
A
1
B
D
A
QP
I AM NOW RESPONSIBLE FOR BP.
M
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Prerequisite: Is everything UN (Up/Normal)?
▸ nodetool status
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Are you hitting a disk capacity issue?
▸ df -h
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Are you hitting CPU capacity limits?
▸ top/htop
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Does request latency have room for improvement?
▸ nodetool cfstats
▸ nodetool tablestats on newer versions of Cassandra.
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Do you want to split up token hot spots?
▸ nodetool status $KEYSPACE
BOOTSTRAP
WHEN TO USE THE BOOTSTRAP PROCESS
▸ Is the prerequisite met and YES to any of the other questions? Then bootstrap!
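The checklist above boils down to one prerequisite plus any one reason. A minimal sketch, assuming node states are passed in as the two-letter codes that nodetool status reports (such as UN or UJ); the function and parameter names are hypothetical.

```python
def should_bootstrap(node_states, disk_pressure=False, cpu_pressure=False,
                     latency_can_improve=False, token_hot_spots=False):
    """Return True when every node is Up/Normal and at least one
    capacity question from the checklist is answered with yes."""
    prerequisite_met = all(state == "UN" for state in node_states)
    any_reason = any([disk_pressure, cpu_pressure,
                      latency_can_improve, token_hot_spots])
    return prerequisite_met and any_reason

if __name__ == "__main__":
    print(should_bootstrap(["UN", "UN", "UN"], disk_pressure=True))
    print(should_bootstrap(["UN", "UJ", "UN"], disk_pressure=True))
```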
BOOTSTRAP
PLAY IT SAFE
▸ UJ (Up/Joining) is an ephemeral state after ~2 minutes, but wait 5 minutes.
▸ UN (Up/Normal) is a persistent state.
▸ With Cassandra 2.2+, we have nodetool bootstrap resume, or simply restarting
the node.
▸ With Cassandra pre-2.2, we must clear:
▸ data_file_directories
▸ commitlog_directory
▸ saved_caches_directory
IF "POPCORN" JOINING:
NODETOOL NETSTATS
The Last Pickle
BOOTSTRAP
BOOTSTRAP
DON'TS
▸ Seed nodes cannot be bootstrapped.
▸ No need to include auto_bootstrap parameter in the cassandra.yaml.
▸ Do not join more than one node per rack concurrently.
▸ Do not start two bootstrap processes within 2 minutes of each other.
▸ Do not bootstrap a node while any nodes are offline.
▸ The JVM option -Dcassandra.consistent.rangemovement=false can be used to override the default behavior.
▸ Will require a follow-up rolling repair.
▸ Do not bootstrap a node running a different version of Cassandra.
▸ Do not bootstrap new nodes into a mixed-version cluster.
▸ Don't forget about your racks! More on that later...
BOOTSTRAP
TIDBIT: CLEANUP
▸ Cleanup removes stale replicas off a node.
▸ Acts like a single-SSTable compaction.
▸ Does not need to be run manually, since any followup compactions will remove
stale data.
▸ Is useful when disk capacity is at its limit.
KEEP DISK USAGE UNDER 50% TO
ALLOW FOR NORMAL COMPACTIONS TO
COMPLETE SUCCESSFULLY.
The Last Pickle
BOOTSTRAP
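The 50% guideline above can be checked with the Python standard library. This sketch treats the limit as a cap on used space, which is my reading of the slide's intent; the function names and the default path are hypothetical, and in practice the path would be the data directory's mount point.

```python
import shutil

def under_limit(used, total, limit=0.5):
    """True when the used fraction of the disk is below the limit."""
    return used / total < limit

def has_compaction_headroom(path=".", limit=0.5):
    """Check the filesystem holding `path` against the usage limit."""
    usage = shutil.disk_usage(path)
    return under_limit(usage.used, usage.total, limit)

if __name__ == "__main__":
    print("headroom available:", has_compaction_headroom("."))
```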
REPLACE_ADDRESS
REPLACE_ADDRESS
REPLACING
PEER
SEED
DEAD
REPLACING
I'M REPLACING YOU JOHN!
JOHN, ARE YOU AROUND?
REPLACE_ADDRESS
REPLACING
PEER
SEED
REPLACING
HEY FELLOWS, I'M TINA,
THE NEW JOHN.
HEY TINA, HERE'S THE DATA
JOHN SHOULD HAVE HAD.
FOR IMPORTANT DATA:
USE AN ODD-NUMBERED
REPLICATION_FACTOR > 1.
The Last Pickle
REPLACE_ADDRESS
FOR IMPORTANT DATA:
WRITE AT A CONSISTENCY LEVEL OF
LOCAL_QUORUM OR HIGHER.
The Last Pickle
REPLACE_ADDRESS
TO AVOID STALE DATA:
READ AT A CONSISTENCY LEVEL OF
LOCAL_QUORUM OR HIGHER.
The Last Pickle
REPLACE_ADDRESS
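The three recommendations above follow from simple quorum arithmetic. A short sketch, with hypothetical function names: a quorum is a majority of replicas, and a read is guaranteed to overlap the latest write when the replicas written plus the replicas read exceed the replication factor.

```python
def quorum(replication_factor):
    """Majority of replicas."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replicas that can be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)

def read_overlaps_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must include a replica from the write set."""
    return write_replicas + read_replicas > replication_factor

if __name__ == "__main__":
    for rf in (2, 3, 4, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

An RF of 4 tolerates the same single failure as an RF of 3, which is why odd replication factors are the usual choice. Writing and reading at quorum with RF 3 gives 2 + 2 > 3, while ONE and ONE gives 1 + 1, which is not greater than 3.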
REPLACE_ADDRESS
TIDBITS: REPLACE_ADDRESS
▸ If using cl.ONE, you might have a bad time.
▸ But:
▸ Remember to respect max_hint_window_in_ms, which defaults to 3 hours
and starts as soon as the original node is knocked offline.
▸ If the hinted handoff window is missed, a rolling repair may be needed.
▸ Use -Dcassandra.replace_address_first_boot=<IP_ADDRESS> to
prevent possible issues if you forget to remove the flag.
REPLACE_ADDRESS
TIDBITS: REPLACE_ADDRESS - EXPANDED
▸ If not using a consistency level higher than ONE, stale data or data loss is possible and likely in the event of a node
failure.
▸ But:
▸ Hinted handoff may prevent stale data or data loss if a new node is in the UN (Up/Normal) state within the
max_hint_window_in_ms, which defaults to 3 hours and starts as soon as the original node is knocked offline.
▸ If the hinted handoff window is missed, and if running with a replication factor > 1, and other replicas
received the newer mutation, a rolling repair will make the new replica consistent.
▸ Read-repair may prevent stale data from being returned and repair stale partitions, but the
dclocal_read_repair_chance table schema parameter defaults to 10% of all requests.
▸ Use -Dcassandra.replace_address_first_boot=<IP_ADDRESS> to prevent possible issues if you forget to
remove the flag.
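The hint window reasoning above can be expressed as a small check. This is a sketch using the 3-hour default quoted on the slide; the function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Default quoted on the slide: 3 hours, expressed in milliseconds.
MAX_HINT_WINDOW_IN_MS = 3 * 60 * 60 * 1000

def hints_cover_outage(went_down, up_normal_again,
                       window_ms=MAX_HINT_WINDOW_IN_MS):
    """True when the replacement reached Up/Normal inside the hint window,
    counted from the moment the original node went offline."""
    return up_normal_again - went_down <= timedelta(milliseconds=window_ms)

if __name__ == "__main__":
    down = datetime(2017, 6, 1, 9, 0)
    print(hints_cover_outage(down, down + timedelta(hours=2)))
    print(hints_cover_outage(down, down + timedelta(hours=4)))
```

When the check fails, the slide's advice applies: plan for a rolling repair.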
REPLACE_ADDRESS
WHEN TO USE THE REPLACE_ADDRESS PROCESS
▸ Prerequisite: Is the node unrecoverable/inaccessible?
▸ Are the current snapshots too stale?
▸ Most times snapshots are used for disaster scenarios.
▸ Is there enough time to replace the old node and run a repair before
gc_grace_seconds, which defaults to 10 days?
▸ If not, use nodetool removenode.
▸ Is YES the response to all of the above questions? Then replace_address.
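One way to read the checklist above as a decision function. This is a sketch only: the 10-day gc_grace_seconds default comes from the slide, while the function name, the parameters, and the two fallback labels for the "no" answers to the first two questions are my assumptions.

```python
# Default quoted on the slide: 10 days, expressed in seconds.
GC_GRACE_SECONDS = 10 * 24 * 60 * 60

def choose_action(unrecoverable, snapshots_too_stale,
                  seconds_since_failure, replace_and_repair_seconds,
                  gc_grace_seconds=GC_GRACE_SECONDS):
    """Walk the checklist in order and name the resulting action."""
    if not unrecoverable:
        return "recover the existing node"   # assumed fallback
    if not snapshots_too_stale:
        return "restore from snapshot"       # assumed fallback
    if seconds_since_failure + replace_and_repair_seconds > gc_grace_seconds:
        return "nodetool removenode"
    return "replace_address"

if __name__ == "__main__":
    one_day = 24 * 60 * 60
    print(choose_action(True, True, one_day, 2 * one_day))
    print(choose_action(True, True, 9 * one_day, 2 * one_day))
```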
REBUILD
REBUILD: AN EASY WAY TO SWAP
TO NEW HARDWARE.
The Last Pickle
REBUILD
REBUILD
JOINING
PEER
SEED
JOINING
I'M BOB.
BUT DON'T SEND US YOUR DATA.
WE'LL ASK WHEN READY.
I'M JULIA.
REBUILD
REBUILD
PEER
SEED
REBUILDING
LET'S DO THIS!
HOW MUCH LONGER, REALLY?
WOW, THOSE ARE A FEW GIGS!
OK, READY NOW.
REBUILD
PROCESS: GETTING THINGS SORTED
▸ Prerequisite: Use the NetworkTopologyStrategy, instead of the SimpleStrategy.
▸ Prerequisite: Use a DCAwarePolicy with the Cassandra driver to restrict the contact points.
▸ Prerequisite: Use LOCAL_QUORUM and LOCAL_ONE consistency levels to restrict
requests to the specific data center.
▸ Do not define a replication strategy for the new data center at first.
▸ Bootstrap all intended nodes into the new data center.
▸ Because no replication strategy mentions the new data center yet, this should
cause almost no streaming tasks.
REBUILD
PROCESS, PART 2: EXECUTING THE REBUILD
▸ Once all new nodes are in the new data center, continue.
▸ Modify the NTS settings to include the new data center.
▸ Run the rebuild process on as many concurrent nodes as latency metrics allow.
▸ To mitigate load on existing nodes, you may be able to use multiple data
center sources concurrently by using different data center parameters for the
nodetool rebuild command.
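The two steps above, updating the NTS settings and then running the rebuild, can be sketched as helpers that only build the statement and the command. The keyspace name, data center names, and function names are hypothetical. The ALTER KEYSPACE form and the `nodetool rebuild -- <source data center>` form are what I understand to be the usual syntax, so verify them against your Cassandra version before running anything.

```python
def alter_keyspace_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {options}}};")

def rebuild_command(source_dc):
    """Argument list for running a rebuild that streams from source_dc."""
    return ["nodetool", "rebuild", "--", source_dc]

if __name__ == "__main__":
    # Hypothetical names: my_keyspace, dc_old, dc_new.
    print(alter_keyspace_cql("my_keyspace", {"dc_old": 3, "dc_new": 3}))
    print(" ".join(rebuild_command("dc_old")))
```

Passing a different source data center to different rebuilding nodes is how the load could be spread, as the last bullet suggests.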
REBUILD
PROCESS, PART 3: USING THE NEW DATA CENTER
▸ Once all nodes have completed the rebuild process, continue.
▸ If the intent was to remove a deprecated data center, update the
DCAwarePolicy for the Cassandra driver to point to the new data center and
restart the application. Then update NTS, and remove deprecated nodes.
▸ If the intent was to add a new data center, launch new application servers
within the same data center and modify the DCAwarePolicy to reference the
new data center.
REBUILD
WHEN TO USE THE REBUILD PROCESS
▸ Are you attempting to add or deprecate an entire data center?
▸ Migrate to new hardware?
▸ Moving to the cloud?
▸ Moving to a different cloud?
▸ Moving back from the cloud? :)
▸ Is YES the response to any of the above questions? Then rebuild.
THIS IS A WELL TRODDEN PATH.
DON'T WORRY.
BE HAPPY.
The Last Pickle
REBUILD
STATUS
STATUS
CHECK PROGRESS - SHOW OUTPUT
▸ nodetool compactionstats
▸ Monitor pending compactions, which are a byproduct of:
▸ Streaming data.
▸ nodetool cleanup
▸ Use nodetool setcompactionthroughput to throttle disk load.
▸ nodetool netstats
▸ Monitor active streaming tasks.
▸ nodetool status
▸ Monitor node's joining and up status.
RACKS
RACKS
PROPER BALANCE
▸ Balance is important when considering: disk load, token distribution, data center
load, and rack load.
▸ While other setups may be valid, Keep It Super Simple.
▸ Use 1 rack, or enough racks to equal the replication factor, for the data center.
▸ Ensure each rack always has an equal number of nodes.
▸ The nodes in each rack split up the token range amongst themselves.
▸ Each data center will store its copies across racks, if available.
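The rack guidance above can be turned into a quick sanity check. A minimal sketch, assuming the rule is "one rack, or as many racks as the replication factor, with the same node count in each"; the function name and the rack labels are hypothetical.

```python
from collections import Counter

def rack_layout_ok(node_racks, replication_factor):
    """node_racks lists the rack of every node in one data center."""
    counts = Counter(node_racks)
    balanced = len(set(counts.values())) == 1
    rack_count_ok = len(counts) in (1, replication_factor)
    return balanced and rack_count_ok

if __name__ == "__main__":
    print(rack_layout_ok(["r1", "r2", "r3", "r1", "r2", "r3"], 3))
    print(rack_layout_ok(["r1", "r1", "r2", "r3"], 3))
```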
INTERNODE SECURITY
INTERNODE SECURITY
SSL ENCRYPTION
▸ Cassandra supports the following types of internode encryption:
▸ None.
▸ Inter-data center.
▸ Intra-data center, or inter-node.
▸ If data centers are separated by a public network, TLP recommends using inter-data
center encryption.
▸ If running with paranoid security settings, encryption can be used between each node,
regardless of topology settings.
EXPOSED JMX PORT
EXPOSED JMX PORT
LOCKING DOWN YOUR JMX PORT
▸ Cassandra allows access to system metrics, system commands, and potentially
destructive commands via a JMX port.
▸ By default, Cassandra 2.1.4+ restricts access to the JMX port to localhost.
▸ If remote access to JMX is required, edit cassandra-env.sh to change access
or authentication settings.
▸ TLP still recommends proper firewall and security settings be used to restrict
access to Cassandra from only verified machines.
REMOVING NODES
REMOVING NODES
DECOMMISSION
▸ nodetool decommission should be used for nodes that are still operational, but will no
longer be part of the rack.
▸ When decommissioning a node, the node's replica ranges are redistributed amongst the
surviving nodes in the rack.
▸ A "reverse bootstrap" occurs in which all replicas that the node is responsible for are
streamed to the new replica owners.
▸ Once all new replica owners hold all of the deprecated node's data, the node is removed
from the ring.
▸ After 72 hours, the node is removed from the gossip state.
REMOVING NODES
REMOVENODE
▸ nodetool removenode can be used when a node has died and will not be
replaced.
▸ The removenode command does not handle any streaming tasks, so follow up
repairs are required to ensure the cluster is in a consistent state.
▸ The removenode command simply removes a node from the gossip state,
forcing surviving nodes within the rack into being responsible for the
deprecated node's token ranges.
REMOVING NODES
ASSASSINATE
▸ Note: This is NOT a hammer.
▸ Sometimes gossip state can become wonky with echoes of previously removed nodes. In
these cases, and only in these cases, should nodetool assassinate be used.
▸ Much like nodetool removenode, this command modifies the gossip state, but instead
of marking the node as being REMOVED, the entry is removed in its entirety.
▸ Sometimes a single command may not remove a stubborn gossip state. In these cases,
running nodetool assassinate across all nodes, in parallel, multiple times, may be
needed to remove any culprit gossip echoed states.
▸ Repeated note: This is NOT a hammer.
RECAP
RECAP
p2p NETWORKS
▸ KaZaA: Not really p2p.
▸ Bittorrent: Decentralized p2p.
▸ Cassandra: Stateful p2p, via Gossip.
RECAP
Fundamentals
▸ Nodes own multiple token ranges, when using Vnodes.
▸ Seed nodes allow new nodes to enter the cluster.
▸ Seed nodes also help keep a "consistent" topological state.
RECAP
ADDING CAPACITY
▸ Bootstrap
▸ The normal process to adding a node.
▸ Follow up with nodetool cleanup.
▸ replace_address
▸ Useful when a node is completely lost.
▸ nodetool rebuild
▸ Used to add an entire data center.
RECAP
STATUS
▸ Adding nodes creates collateral processes:
▸ Compaction.
▸ Streaming.
▸ Gossip entries.
RECAP
BE MINDFUL
▸ Ensure racks remain balanced upon topological changes.
▸ Ensure inter-node encryption is considered, especially when communicating
over an open network.
▸ Ensure JMX access is not accidentally exposed to public access.
RECAP
REMOVING NODES
▸ nodetool decommission
▸ Deprecate a live node.
▸ nodetool removenode
▸ Remove a downed node and reassign token ranges.
▸ nodetool assassinate
▸ Remove a gossip entry entirely.
▸ Note: This is still NOT a hammer.
BUELLER? BUELLER?
QUESTIONS?
JOAQUIN@THELASTPICKLE.COM
Joaquin Casares, The Last Pickle
I'M BOB.
BON VOYAGE!
DIGIORNO
HASTA LA VISTA, BABY.
TTYL
TTYS
WELL THAT WAS FUN.
DON'T FORGET ABOUT ME!
WE'LL ALWAYS HAVE PARIS.
HERE'S LOOKING AT YOU, KID.
I KNOW NOW WHY YOU CRY.
CYA

Joining a p2p Conversation - 2017-06 Meetup

  • 1. JOINING A p2p CONVERSATION JOAQUIN CASARES, THE LAST PICKLE
  • 2. JOINING A p2p CONVERSATION JOAQUIN CASARES ▸ The Last Pickle ▸ Consultant ▸ Previously: ▸ Umbel ▸ Software Engineer ▸ Riptano/DataStax ▸ Support Engineer ▸ Software Engineer-in-Test ▸ Demo Engineer
  • 3. JOINING A p2p CONVERSATION THE LAST PICKLE ▸ 50+ years combined experience with Apache Cassandra. ▸ We communicate ideas. ▸ We are committed to doing the right thing for both our team of experts and our clients. ▸ Our passion for sharing our knowledge is present in all that we do. ▸ Consider us a member of your team. ▸ Ultimately: ▸ We want you to be successful and have all the information to do so.
  • 4. WHERE DO WE GO FROM HERE?
  • 5. OVERVIEW Overview ▸ p2p Networks. ▸ Cassandra fundamentals. ▸ How to add capacity. ▸ How to check on the status. ▸ Things you shouldn't forget, I think. ▸ How to forget.
  • 8. KaZaA TO BITTORRENT KaZaA: "P2P" PEER SUPERNODE
  • 9. KaZaA TO BITTORRENT KaZaA: "CENTRALIZED P2P" PEER SUPERNODE KAZAA.COM
  • 10. KaZaA TO BITTORRENT KaZaA: SHUTDOWN PEER SUPERNODE KAZAA.COM CAN ANYONE HEAR ME? IS ANYONE ALIVE OUT THERE? NO.
  • 11. KaZaA TO BITTORRENT BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK PEER TRACKER SELF
  • 12. KaZaA TO BITTORRENT BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK PEER TRACKER SELF MY CONNECTION WAS SEVERED.
  • 13. KaZaA TO BITTORRENT BITTORRENT: A REAL DECENTRALIZED, DISTRIBUTED P2P NETWORK PEER TRACKER SELF I'M BANNED BY A STATE ACTOR. I KNOW THAT HASH.
  • 17. CASSANDRA TOKENS LEGACY OWNERSHIP A G M T M G A T (T, A] (A, G] (G, M] (M, T]
  • 18. CASSANDRA TOKENS LEGACY OWNERSHIP A G M T M G A T (T, A] (A, G] (G, M] (M, T]
  • 19. CASSANDRA TOKENS VIRTUAL NODE ("VNODES") OWNERSHIP A D G J 3 2 1 4 M Q T W 1 3 2 1 3 2 4 4
  • 20. CASSANDRA TOKENS VIRTUAL NODE ("VNODES") OWNERSHIP A D G J 3 2 1 4 M Q T W 1 3 2 1 3 2 4 4
  • 21. CASSANDRA TOKENS VNODES - JOINING A NODE B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1
  • 22. USE 32-64 VNODES. NOT: DEFAULT OF 256 VNODES / NODE. The Last Pickle CASSANDRA TOKENS
  • 23. USE THE SAME NUM_TOKENS COUNT ACROSS ALL MACHINES. The Last Pickle CASSANDRA TOKENS
  • 24. CASSANDRA TOKENS TIDBIT: MD5 TOKENS TO MURMUR3 TOKENS ▸ MD5 token range: 0 to 2^127 ▸ Murmur3 token range: -2^63 to 2^63 - 1 ▸ Murmur3: ▸ Better randomness ▸ Lower chance of collisions ▸ ~6x faster than MD5
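As a rough sketch of the ownership rule in the token slides: a token owns the range (previous token, token], and the ring wraps around. The Python below is illustrative only. The letter tokens come from the slides, while the token-to-node pairing, the `owner` helper, and the choice of tokens B and P for the joining node 5 are assumptions read off the flattened diagrams.

```python
from bisect import bisect_left


def owner(ring, key_token):
    """Return the node that owns key_token.

    ring maps token -> node; a token owns the range (previous token, token].
    """
    tokens = sorted(ring)
    # First token >= key_token; past the last token, wrap to the first one.
    idx = bisect_left(tokens, key_token) % len(tokens)
    return ring[tokens[idx]]


# Legacy ownership: one token per node (nodes are named after their token).
legacy = {"A": "A", "G": "G", "M": "M", "T": "T"}

# Vnodes: each node holds several tokens (pairing assumed from the slide).
vnodes = {"A": 1, "D": 3, "G": 2, "J": 1, "M": 3, "Q": 2, "T": 4, "W": 4}

# Node 5 joins with its own tokens and takes slices of existing ranges.
after_join = {**vnodes, "B": 5, "P": 5}

# Murmur3 tokens are signed 64-bit integers.
MURMUR3_MIN, MURMUR3_MAX = -2**63, 2**63 - 1
```

Under these assumptions, `owner(vnodes, "B")` is node 3 before the join and `owner(after_join, "B")` is node 5 afterwards, while a key such as "C" stays on node 3.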
  • 26. SEED NODES CASSANDRA: P2P IMPLEMENTATION - JOINING PEER SEED JOINING CAN I JOIN THE PARTY? HEY BOB, WE HAVE A NEW FRIEND. I'M BOB.
  • 27. SEED NODES CASSANDRA: P2P IMPLEMENTATION PEER SEED NODE HAVE SOME OF MY TOKENS. I DIDN'T WANT THESE RANGES ANYWAY.
  • 28. SEED NODES CASSANDRA: P2P IMPLEMENTATION - SEED FAILURE PEER SEED NODE IT'S OK. I KNOW WHO MY FRIENDS ARE.
  • 29. SEED NODES CASSANDRA: P2P IMPLEMENTATION - GOSSIP PEER SEED NODE I'M BOB. HEY BOB, I'M STILL HERE.
  • 30. SEED NODES TIDBIT: GOSSIP ▸ It works. ▸ Basically: echoes. ▸ CDN Gossip use case.
  • 31. SEED NODES CASSANDRA: P2P IMPLEMENTATION - FAILED JOINING PEER SEED FAILED JOIN NONE OF THESE ADDRESSES WORK. WHERE'S THE PARTY?
  • 32. SEED NODES CASSANDRA: P2P IMPLEMENTATION - FAILED JOINING OF SEED NODE PEER SEED JOINING SEED SO... WAS IT THE CHICKEN? OR THE EGG?
  • 33. USE THE SAME SEED NODES THROUGHOUT THE CLUSTER. The Last Pickle SEED NODES
  • 34. 3-5 SEED NODES PER DATA CENTER. The Last Pickle SEED NODES
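The two seed-node rules above (same seed list everywhere, 3-5 seeds per data center) lend themselves to a simple consistency check. This is a minimal sketch, not part of Cassandra: `check_seed_config` and both of its input dictionaries are hypothetical, and collecting the seed list from each node's configuration is left to the reader.

```python
def check_seed_config(seed_lists, seed_dc):
    """Return a list of problems with a cluster's seed configuration.

    seed_lists: node name -> list of seed addresses configured on that node.
    seed_dc:    seed address -> data center name.
    """
    problems = []

    # Rule 1: every node should list the same seeds (order is ignored).
    distinct = {tuple(sorted(seeds)) for seeds in seed_lists.values()}
    if len(distinct) > 1:
        problems.append("seed lists differ between nodes")

    # Rule 2: each data center should contribute 3-5 seeds.
    per_dc = {}
    for seeds in seed_lists.values():
        for seed in seeds:
            per_dc.setdefault(seed_dc[seed], set()).add(seed)
    for dc, seeds in sorted(per_dc.items()):
        if not 3 <= len(seeds) <= 5:
            problems.append(f"{dc} has {len(seeds)} seed(s), expected 3-5")

    return problems
```

An empty list means both rules hold for the configuration that was passed in.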
  • 38. BOOTSTRAP REPLICA OWNERSHIP B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1
  • 39. BOOTSTRAP REPLICA OWNERSHIP B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1 B C D A
  • 40. BOOTSTRAP REPLICA OWNERSHIP B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1 B C D A Q P O N M
  • 41. 3 2 BOOTSTRAP POST-BOOTSTRAP: CLEANUP B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1 B C D A Q P O N P O N B I REALLY DON'T NEED THIS EXTRA PRESSURE P, O, & N. SEE YOU, C! M
  • 42. BOOTSTRAP POST-BOOTSTRAP: CLEANUP B D G J 3 2 1 4 M Q T W 5 3 2 1 3 2 4 4 5 P 5 A 1 B D A QP I AM NOW RESPONSIBLE FOR BP. M
  • 43. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Prerequisite: Is everything UN (Up/Normal)? ▸ nodetool status
  • 44. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Are you hitting a disk capacity issue? ▸ df -h
  • 45. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Are you hitting CPU capacity limits? ▸ top/htop
  • 46. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Does request latency have room for improvement? ▸ nodetool cfstats ▸ nodetool tablestats on newer versions of Cassandra.
  • 47. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Do you want to split up token hot spots? ▸ nodetool status $KEYSPACE
  • 48. BOOTSTRAP WHEN TO USE THE BOOTSTRAP PROCESS ▸ Is the prerequisite met and YES to any of the other questions? Then bootstrap!
  • 49. BOOTSTRAP PLAY IT SAFE ▸ UJ (Up/Joining) is an ephemeral state after ~2 minutes, but wait 5 minutes. ▸ UN (Up/Normal) is a persistent state. ▸ With Cassandra 2.2+, we have nodetool bootstrap resume, or simply restarting the node. ▸ With Cassandra pre-2.2, we must clear: ▸ data_file_directories ▸ commitlog_directory ▸ saved_caches_directory
  • 50. IF "POPCORN" JOINING: NODETOOL NETSTATS The Last Pickle BOOTSTRAP
  • 51. BOOTSTRAP DON'TS ▸ Seed nodes cannot be bootstrapped. ▸ No need to include auto_bootstrap parameter in the cassandra.yaml. ▸ Do not join more than one node per rack concurrently. ▸ Do not start two bootstrap processes within 2 minutes of each other. ▸ Do not bootstrap a node when there are more than 0 nodes offline. ▸ The JVM option -Dcassandra.consistent.rangemovement=false can be used to override the default behavior. ▸ Will require a follow-up rolling repair. ▸ Do not bootstrap a node running a different version of Cassandra. ▸ Do not bootstrap new nodes into a mixed-version cluster. ▸ Don't forget about your racks! More on that later...
  • 52. BOOTSTRAP TIDBIT: CLEANUP ▸ Cleanup removes stale replicas off a node. ▸ Acts like a single-SSTable compaction. ▸ Does not need to be run manually, since any followup compactions will remove stale data. ▸ Is useful when disk capacity is at its limit.
  • 53. KEEP DISK USAGE UNDER 50% TO ALLOW FOR NORMAL COMPACTIONS TO COMPLETE SUCCESSFULLY. The Last Pickle BOOTSTRAP
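The "when to bootstrap" checklist can be condensed into one rule: every node must be UN, and at least one capacity question must be answered yes. The sketch below assumes the node states and the yes/no answers have already been gathered by hand (for example from `nodetool status`, `df -h`, and `top`); the helper names are hypothetical.

```python
import shutil


def should_bootstrap(node_states, reasons):
    """Apply the checklist: all nodes are UN and at least one reason holds.

    node_states: status codes as shown by `nodetool status`, e.g. "UN", "UJ".
    reasons:     reason name -> bool (disk, CPU, latency, hot spots).
    """
    prerequisite = all(state == "UN" for state in node_states)
    return prerequisite and any(reasons.values())


def disk_usage_ratio(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

A ratio above 0.5 from `disk_usage_ratio` on a data directory would count as a yes for the disk capacity question, in line with the 50% guideline.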
  • 56. REPLACE_ADDRESS REPLACING PEER SEED REPLACING HEY FELLOWS, I'M TINA, THE NEW JOHN. HEY TINA, HERE'S THE DATA JOHN SHOULD HAVE HAD.
  • 57. FOR IMPORTANT DATA: USE AN ODD-NUMBERED REPLICATION_FACTOR > 1. The Last Pickle REPLACE_ADDRESS
  • 58. FOR IMPORTANT DATA: WRITE AT A CONSISTENCY LEVEL OF LOCAL_QUORUM OR HIGHER. The Last Pickle REPLACE_ADDRESS
  • 59. TO AVOID STALE DATA: READ AT A CONSISTENCY LEVEL OF LOCAL_QUORUM OR HIGHER. The Last Pickle REPLACE_ADDRESS
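The arithmetic behind these three recommendations is short enough to sketch. A quorum is a majority of replicas, so an even replication factor adds a replica without tolerating any extra failures, and quorum reads overlap quorum writes. The function names are illustrative, not driver or Cassandra APIs.

```python
def quorum(replication_factor):
    """Replicas that must respond for a quorum-style consistency level."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


def overlaps(replication_factor, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor
```

For example, `tolerated_failures(3)` and `tolerated_failures(4)` are both 1, which is why an odd replication factor is the better use of hardware, and `overlaps(3, 1, 1)` is False, which is the stale-read risk of reading and writing at ONE.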
  • 60. REPLACE_ADDRESS TIDBITS: REPLACE_ADDRESS ▸ If using cl.ONE, you might have a bad time. ▸ But: ▸ Remember to respect max_hint_window_in_ms, which defaults to 3 hours and starts as soon as the original node is knocked offline. ▸ If the hinted handoff window is missed, a rolling repair may be needed. ▸ Use -Dcassandra.replace_address_first_boot=<IP_ADDRESS> to prevent possible issues if you forget to remove the flag.
  • 61. REPLACE_ADDRESS TIDBITS: REPLACE_ADDRESS - EXPANDED ▸ If not using a consistency level higher than ONE, stale data or data loss is possible and likely in the event of a node failure. ▸ But: ▸ Hinted handoff may prevent stale data or data loss if a new node is in the UN (Up/Normal) state within the max_hint_window_in_ms, which defaults to 3 hours and starts as soon as the original node is knocked offline. ▸ If the hinted handoff window is missed, and if running with a replication factor > 1, and other replicas received the newer mutation, a rolling repair will make the new replica consistent. ▸ Read-repair may prevent stale data from being returned and repair stale partitions, but the dclocal_read_repair_chance table schema parameter defaults to 10% of all requests. ▸ Use -Dcassandra.replace_address_first_boot=<IP_ADDRESS> to prevent possible issues if you forget to remove the flag.
  • 62. REPLACE_ADDRESS WHEN TO USE THE REPLACE_ADDRESS PROCESS ▸ Prerequisite: Is the node unrecoverable/inaccessible? ▸ Are the current snapshots too stale? ▸ Most times snapshots are used for disaster scenarios. ▸ Is there enough time to replace the old node and run a repair before gc_grace_seconds, which defaults to 10 days? ▸ If not, use nodetool removenode. ▸ Is YES the response to all of the above questions? Then replace_address.
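The timing rules from the replace_address slides can be sketched as a small decision helper. It assumes the default windows quoted above (3 hours for max_hint_window_in_ms, 10 days for gc_grace_seconds); `replacement_plan` and its returned strings are hypothetical labels, not commands.

```python
from datetime import timedelta

MAX_HINT_WINDOW = timedelta(hours=3)  # default max_hint_window_in_ms
GC_GRACE = timedelta(days=10)         # default gc_grace_seconds


def replacement_plan(time_until_un, time_until_repaired):
    """Pick a recovery path, measuring both durations from the node failure.

    time_until_un:       when the replacement is expected to reach UN.
    time_until_repaired: when a follow-up repair is expected to finish.
    """
    if time_until_repaired > GC_GRACE:
        # Not enough time to replace and repair before gc_grace_seconds.
        return "removenode"
    if time_until_un <= MAX_HINT_WINDOW:
        # Hinted handoff may still cover writes missed during the outage.
        return "replace_address"
    # Hint window missed: the new replica needs a repair to catch up.
    return "replace_address, then rolling repair"
```

Clusters that override either setting would need to adjust the two constants accordingly.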
  • 64. REBUILD_ADDRESS EASY WAY TO SWAP TO NEW HARDWARE. The Last Pickle REBUILD
  • 65. REBUILD JOINING PEER SEED JOINING I'M BOB. BUT DON'T SEND US YOUR DATA. WE'LL ASK WHEN READY. I'M JULIA.
  • 66. REBUILD REBUILD PEER SEED REBUILDING LET'S DO THIS! HOW MUCH LONGER, REALLY? WOW, THOSE ARE A FEW GIGS! OK, READY NOW.
  • 67. REBUILD REBUILD PEER SEED REBUILDING LET'S DO THIS! HOW MUCH LONGER, REALLY? WOW, THOSE ARE A FEW GIGS! OK, READY NOW.
  • 68. REBUILD PROCESS: GETTING THINGS SORTED ▸ Prerequisite: Use the NetworkTopologyStrategy, instead of the SimpleStrategy. ▸ Prerequisite: Use a DCAwarePolicy with the Cassandra driver to restrict the contact points. ▸ Prerequisite: Use LOCAL_QUORUM and LOCAL_ONE consistency levels to restrict requests to the specific data center. ▸ Do not define a replication strategy for the new data center at first. ▸ Bootstrap all intended nodes into the new data center. ▸ Because no replication strategies mention the new data center, this should cause almost no streaming tasks.
  • 69. REBUILD PROCESS, PART 2: EXECUTING THE REBUILD ▸ Once all new nodes are in the new data center, continue. ▸ Modify the NTS settings to include the new data center. ▸ Run the rebuild process on as many concurrent nodes as latency metrics allow. ▸ To mitigate load on existing nodes, you may be able to use multiple data center sources concurrently by using different data center parameters for the nodetool rebuild command.
  • 70. REBUILD PROCESS, PART 3: USING THE NEW DATA CENTER ▸ Once all nodes have completed the rebuild process, continue. ▸ If the intent was to remove a deprecated data center, update the DCAwarePolicy for the Cassandra driver to point to the new data center and restart the application. Then update NTS, and remove deprecated nodes. ▸ If the intent was to add a new data center, launch new application servers within the same data center and modify the DCAwarePolicy to reference the new data center.
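The "modify the NTS settings" step in the rebuild process amounts to an ALTER KEYSPACE statement that lists a replication factor per data center. The sketch below only builds the CQL text; the keyspace name, data center names, and the `alter_keyspace_cql` helper are hypothetical, and the statement still has to be reviewed and run by hand (for example in cqlsh) for each keyspace.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    dc_replication: data center name -> replication factor.
    """
    pairs = ["'class': 'NetworkTopologyStrategy'"]
    pairs += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    options = ", ".join(pairs)
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{options}}};"


# Before the rebuild: only the existing data center is replicated to.
before = alter_keyspace_cql("my_keyspace", {"dc_old": 3})

# Part 2: include the new data center, then run the rebuild on its nodes.
during = alter_keyspace_cql("my_keyspace", {"dc_old": 3, "dc_new": 3})

# Part 3 (deprecating dc_old): drop it once clients point at dc_new.
after = alter_keyspace_cql("my_keyspace", {"dc_new": 3})
```

Keeping the three states side by side makes it easier to see that the new data center receives no replicas until the second statement is applied.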
  • 71. REBUILD WHEN TO USE THE REBUILD PROCESS ▸ Are you attempting to add or deprecate an entire data center? ▸ Migrate to new hardware? ▸ Moving to the cloud? ▸ Moving to a different cloud? ▸ Moving back from the cloud? :) ▸ Is YES the response to any of the above questions? Then rebuild.
  • 72. THIS IS A WELL TRODDEN PATH. DON'T WORRY. BE HAPPY. The Last Pickle REBUILD
  • 74. STATUS CHECK PROGRESS - SHOW OUTPUT ▸ nodetool compactionstats ▸ Monitor pending compactions, which are a byproduct of: ▸ Streaming data. ▸ nodetool cleanup ▸ Use nodetool setcompactionthroughput to throttle disk load. ▸ nodetool netstats ▸ Monitor active streaming tasks. ▸ nodetool status ▸ Monitor node's joining and up status.
  • 75. RACKS
  • 76. RACKS PROPER BALANCE ▸ Balance is important when considering: disk load, token distribution, data center load, and rack load. ▸ While other setups may be valid, Keep It Super Simple. ▸ Use 1 rack, or enough racks to equal the replication factor, for the data center. ▸ Ensure each rack always has an equal number of nodes. ▸ Each rack splits up the token range amongst themselves. ▸ Each data center will store its copies across racks, if available.
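The rack guidance can be checked mechanically for one data center. This sketch takes the rack-count target to be the data center's replication factor, which is an interpretation of the slide; `rack_problems` and its inputs are hypothetical.

```python
from collections import Counter


def rack_problems(node_racks, replication_factor):
    """Return balance problems for a single data center.

    node_racks: node name -> rack name.
    """
    counts = Counter(node_racks.values())
    problems = []

    # Use 1 rack, or as many racks as the replication factor.
    if len(counts) not in (1, replication_factor):
        problems.append(
            f"{len(counts)} racks in use, expected 1 or {replication_factor}"
        )

    # Every rack should hold the same number of nodes.
    if len(set(counts.values())) > 1:
        problems.append("racks hold unequal node counts")

    return problems
```

Running it before and after each topology change gives a quick signal that a bootstrap or decommission has left the racks uneven.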
  • 78. INTERNODE SECURITY SSL ENCRYPTION ▸ Cassandra supports the following types of internode encryption: ▸ None. ▸ Inter-data center. ▸ Intra-data center, or inter-node. ▸ If data centers are separated by a public network, TLP recommends using inter-data center encryption. ▸ If running with paranoid security settings, encryption can be used between each node, regardless of topology settings.
  • 80. EXPOSED JMX PORT LOCKING DOWN YOUR JMX PORT ▸ Cassandra allows access to system metrics, system commands, and potentially destructive commands via a JMX port. ▸ By default, Cassandra 2.1.4+ restricts access to the JMX port to localhost. ▸ If remote access to JMX is required, edit cassandra-env.sh to change access or authentication settings. ▸ TLP still recommends proper firewall and security settings be used to restrict access to Cassandra to verified machines only.
  • 82. REMOVING NODES DECOMMISSION ▸ nodetool decommission should be used for nodes that are still operational, but will no longer be part of the rack. ▸ When decommissioning a node, the node's replica ranges are redistributed amongst the surviving nodes in the rack. ▸ A "reverse bootstrap" occurs in which all replicas that the node is responsible for are streamed to the new replica owners. ▸ Once all new replica owners hold all of the deprecated node's data, the node is removed from the ring. ▸ After 72 hours, the node is removed from the gossip state.
  • 83. REMOVING NODES REMOVENODE ▸ nodetool removenode can be used when a node has died and will not be replaced. ▸ The removenode command does not handle any streaming tasks, so follow up repairs are required to ensure the cluster is in a consistent state. ▸ The removenode command simply removes a node from the gossip state, forcing surviving nodes within the rack into being responsible for the deprecated node's token ranges.
  • 84. REMOVING NODES ASSASSINATE ▸ Note: This is NOT a hammer. ▸ Sometimes gossip state can become wonky with echoes of previously removed nodes. In these cases, and only in these cases, should nodetool assassinate be used. ▸ Much like nodetool removenode, this command modifies the gossip state, but instead of marking the node as being REMOVED, the entry is removed in its entirety. ▸ Sometimes a single command may not remove a stubborn gossip state. In these cases, running nodetool assassinate across all nodes, in parallel, multiple times, may be needed to remove any culprit gossip echoed states. ▸ Repeated note: This is NOT a hammer.
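The three removal slides boil down to a short decision rule. The helper below is a hypothetical sketch that returns only the nodetool subcommand name; the arguments each subcommand needs are not shown.

```python
def removal_subcommand(node_is_operational, only_gossip_echo=False):
    """Pick the nodetool subcommand suggested by the removal slides.

    node_is_operational: the node is still up and can stream its data away.
    only_gossip_echo:    the node is already gone and only a stale gossip
                         entry for it remains.
    """
    if only_gossip_echo:
        # Last resort for lingering gossip entries; not a hammer.
        return "assassinate"
    if node_is_operational:
        # The node streams its replicas to the new owners before leaving.
        return "decommission"
    # Dead node that will not be replaced; follow up with repairs.
    return "removenode"
```

The order of the checks keeps assassinate out of reach unless the caller states explicitly that only a gossip echo is left.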
  • 85. RECAP
  • 86. RECAP p2p NETWORKS ▸ KaZaA: Not really p2p. ▸ Bittorrent: Decentralized p2p. ▸ Cassandra: Stateful p2p, via Gossip.
  • 87. RECAP Fundamentals ▸ Nodes own multiple token ranges, when using Vnodes. ▸ Seed nodes allow new nodes to enter the cluster. ▸ Seed nodes also help keep a "consistent" topological state.
  • 88. RECAP ADDING CAPACITY ▸ Bootstrap ▸ The normal process for adding a node. ▸ Follow up with nodetool cleanup. ▸ replace_address ▸ Useful when a node is completely lost. ▸ nodetool rebuild ▸ Used to add an entire data center.
  • 89. RECAP STATUS ▸ Adding nodes creates collateral processes: ▸ Compaction. ▸ Streaming. ▸ Gossip entries.
  • 90. RECAP BE MINDFUL ▸ Ensure racks remain balanced upon topological changes. ▸ Ensure inter-node encryption is considered, especially when communicating over an open network. ▸ Ensure JMX access is not accidentally exposed to public access.
  • 91. RECAP REMOVING NODES ▸ nodetool decommission ▸ Deprecate a live node. ▸ nodetool removenode ▸ Remove a downed node and reassign token ranges. ▸ nodetool assassinate ▸ Remove a gossip entry entirely. ▸ Note: This is still NOT a hammer.
  • 93. JOAQUIN@THELASTPICKLE.COM Joaquin Casares, The Last Pickle I'M BOB. BON VOYAGE! DIGIORNO HASTA LA VISTA, BABY. TTYL TTYS WELL THAT WAS FUN. DON'T FORGET ABOUT ME! WE'LL ALWAYS HAVE PARIS. HERE'S LOOKING AT YOU, KID. I KNOW NOW WHY YOU CRY. CYA