©2019 DataStax.
Use only with permission.
academy.datastax.com
Cassandra and the Multi-Cloud
Amanda K Moran
Developer Advocate for DataStax
But first… A Little About Amanda
• Graduated with MS in Computer Science and
Engineering from Santa Clara University in
2012
• Worked as a Software Engineer for 6 years and
now is a Developer Advocate
• Apache Committer, PMC Member, and initial
contributor to all installation and deployment
work for Apache Trafodion
• Keywords: Disney, Cloud, Dogs, Veggies,
Linux, Databases, Big Data, Analytics, Testing,
and Running
© DataStax, All Rights Reserved. Confidential
What Are We Talking About Today
• Introduction to Apache Cassandra
• What are Multiple Data Centers?
• Why all this talk about Multi-Cloud?!
• Apache Cassandra and the Multi-Cloud
• Demo!
Introduction to Apache Cassandra
What is Apache Cassandra?
• First developed by Facebook
• Became a top-level Apache Software
Foundation project in 2010
• NoSQL database
• Distributed, decentralized database
• Elastic scalability -- add/remove nodes with no
downtime
What is Apache Cassandra?
• High performance
– Very fast -- low latency
• High availability / fault tolerant
– No single point of failure
• Solves many of the problems faced with a
traditional DB for certain workloads
What Does All This Mean?
• Let’s talk about the Big Topics:
– Distributed Systems
– Replication
– Elastically Scalable
– High Availability
– Latency
• Read path
• Write path
Note: Don’t forget this is just a brief intro!
Distributed System
• Every node in the cluster has the same role
– Really!
– Cassandra does not have a Master-Worker Architecture
• Any client can connect to any node
– All nodes are Read and Write ready
• But this is not to say that all nodes
contain all data
The cluster
Server  Token Range
0       0-25
26      26-50
51      51-75
76      76-100
[Figure: ring of four servers, each owning one token range: 0-25, 26-50, 51-75, 76-100]
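The table above can be read as a lookup: a token falls into exactly one range, and that range belongs to one server. Below is a minimal sketch of that lookup, assuming the slide's simplified 0-100 token space. The server names and the MD5-based `toy_token` hash are assumptions made for illustration; a real cluster uses a much larger token space and its own partitioner.

```python
import bisect
import hashlib

# Start token of each server's range, taken from the slide's table.
# The server names are hypothetical labels for this sketch.
RING = [(0, "server-a"), (26, "server-b"), (51, "server-c"), (76, "server-d")]
START_TOKENS = [start for start, _ in RING]


def toy_token(partition_key):
    """Hash a key into the slide's simplified 0-100 token space (toy hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


def owner_of_token(token):
    """Return the server whose token range contains the given token."""
    if not 0 <= token <= 100:
        raise ValueError("token must be between 0 and 100")
    # Find the last range whose start token is <= the token.
    index = bisect.bisect_right(START_TOKENS, token) - 1
    return RING[index][1]


def owner_of_key(partition_key):
    """Map a partition key to the server that owns its token."""
    return owner_of_token(toy_token(partition_key))
```

Calling `owner_of_token(30)` returns the server that owns range 26-50. Because every node can compute the same mapping, any node can route a request to the right owner.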
Replication
• To survive a node going down, data must be
copied to other nodes
• The Replication Factor (RF) is set by the user
– Range: 1 to the number of nodes in the cluster (not recommended)
• The data is asynchronously replicated
– Automatic
– Peer-to-peer communication
Replication
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
[Figure: DC1 ring of four nodes, each labeled with its primary range and its two replica ranges, matching the table]
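The RF=3 table follows a simple pattern: each range lives on its primary node and on the next two nodes around the ring. The sketch below reproduces the table from that rule. The rule itself is inferred from the slide's table rather than stated by it, and the function names are hypothetical; actual placement depends on the replication strategy configured for the keyspace.

```python
# Nodes and ranges in ring order, taken from the slide.
# RANGES[i] is the primary range of NODES[i].
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RANGES = ["00-25", "26-50", "51-75", "76-100"]


def replicas_for_range(range_index, rf):
    """Nodes holding a range: the primary, then the next rf-1 nodes clockwise."""
    if not 1 <= rf <= len(NODES):
        raise ValueError("rf must be between 1 and the number of nodes")
    return [NODES[(range_index + step) % len(NODES)] for step in range(rf)]


def ranges_on_node(node_index, rf):
    """Ranges stored on a node: its primary range first, then its replicas."""
    if not 1 <= rf <= len(NODES):
        raise ValueError("rf must be between 1 and the number of nodes")
    return [RANGES[(node_index - step) % len(RANGES)] for step in range(rf)]


if __name__ == "__main__":
    for index, node in enumerate(NODES):
        print(node, *ranges_on_node(index, rf=3))
```

Running the script prints one row per node in the same column order as the table (primary, replica, replica).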
Elastically Scalable
• As more nodes are added, performance increases linearly
• You can scale up or down with no downtime
– Not even a restart!
• Reads and Writes both scale
High Availability
• The lack of a Master node allows for high availability
– No single point of failure
• Replication allows nodes to fail and data to still be
available
– Cassandra expects nodes to fail and doesn’t panic
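To make "replication allows nodes to fail" concrete, here is a small sketch that counts how many node failures a four-node, RF=3 ring can absorb while every token range still has at least one live copy. It assumes each range is stored on its primary node and the next RF-1 nodes around the ring, and it treats "available" as "at least one replica is up"; both are simplifying assumptions, and the function names are hypothetical.

```python
from itertools import combinations

NODE_COUNT = 4  # nodes in the ring (matches the four-node example)
RF = 3          # replication factor


def replica_nodes(range_index, rf=RF, node_count=NODE_COUNT):
    """Indexes of the nodes that hold a copy of the given range."""
    return {(range_index + step) % node_count for step in range(rf)}


def all_ranges_available(down_nodes, rf=RF, node_count=NODE_COUNT):
    """True if every range still has at least one replica on a live node."""
    down = set(down_nodes)
    return all(replica_nodes(i, rf, node_count) - down for i in range(node_count))


def worst_case_survivable_failures(rf=RF, node_count=NODE_COUNT):
    """Largest k such that ANY k failed nodes leave all ranges available."""
    survivable = 0
    for k in range(1, node_count + 1):
        failure_sets = combinations(range(node_count), k)
        if all(all_ranges_available(s, rf, node_count) for s in failure_sets):
            survivable = k
        else:
            break
    return survivable
```

With RF=3, any two nodes can be down and every range is still held by a live node; with RF=1, a single failure already makes one range unavailable.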
Latency
• How is Apache Cassandra able to achieve such low
latency?
• It’s all about the read and write path!
– The write path is truly beautiful in its simplicity!
– High throughput and quick response times are easy to
achieve
Write Path (Client to Cluster)
Write Path (Internals)
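The write path slides are diagrams, so here is a toy sketch of the commonly described Cassandra write path on a single node: append the write to a commit log, apply it to an in-memory memtable, and flush the memtable to an immutable SSTable once it fills up. The class name, the `memtable_limit` parameter, and the in-memory lists standing in for files are all assumptions; timestamps, compaction, and commit log cleanup are left out.

```python
class ToyNode:
    """Toy model of one node's write path (not a real storage engine)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only log, stands in for the on-disk commit log
        self.memtable = {}     # in-memory table holding the most recent writes
        self.sstables = []     # flushed, never-modified tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is only an append plus an in-memory update: no read-before-write.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start a new one.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None
```

The point of the sketch is that a write never has to find and modify existing data in place, which is a large part of why writes stay fast.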
Read Path (Client to Cluster)
• Data modeling comes into play here!
– This is the one “simple trick about Cassandra/NoSQL”
• Partition data by nodes
• A query essentially goes to one node and returns the data
– Constant time READ access
select * from myTable where state = 'CA'
Looks like SQL, but it IS NOT! It’s Cassandra Query Language (CQL).
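The read path bullets can be illustrated with a toy table that is partitioned by `state`, the column used in the query above. Looking up one partition key is a single dictionary access, no matter how many other partitions exist. The class name, method names, and sample rows are assumptions made for this sketch, and a Python dict stands in for the cluster.

```python
from collections import defaultdict


class ToyTable:
    """Toy table whose rows are grouped by a partition key (here: state)."""

    def __init__(self):
        # partition key -> list of rows stored in that partition
        self._partitions = defaultdict(list)

    def insert(self, state, row):
        """Store a row in the partition for the given state."""
        self._partitions[state].append(row)

    def select_where_state(self, state):
        """Return all rows of one partition without scanning the others."""
        return list(self._partitions.get(state, []))


if __name__ == "__main__":
    table = ToyTable()
    table.insert("CA", {"city": "San Jose"})    # sample rows are made up
    table.insert("CA", {"city": "Santa Clara"})
    table.insert("WA", {"city": "Seattle"})
    print(table.select_where_state("CA"))
```

This is why the data model matters: the query is fast because the table was designed so that the `where` clause names the partition key.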
Multiple Data Centers
• Cassandra Cluster represented as a ring
• Can support multiple rings chained together
– Separated by region
– Separated by workload
Multiple Data Centers
• Multiple Data Center support is out of the box
• Replication happens between data centers automatically
– No need to sync data
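In practice, replication across data centers is configured per keyspace, with one replication factor per data center. Below is a small sketch that builds such a CQL statement using `NetworkTopologyStrategy`. The helper name, the keyspace name, and the data center names are hypothetical; in a real cluster the data center names must match the ones the cluster itself reports, and the helper does no input validation beyond rejecting an empty mapping.

```python
def create_keyspace_cql(keyspace, replication_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per data center."""
    if not replication_per_dc:
        raise ValueError("at least one data center is required")
    # Render each data center as a 'name': rf entry of the replication map.
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Hypothetical layout: one data center in each of two cloud providers.
    print(create_keyspace_cql("demo", {"aws_us_west": 3, "gcp_us_east": 3}))
```

Once a keyspace is defined this way, writes are replicated to every listed data center, which is what "no need to sync data" refers to.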
What is DataStax?
• DataStax is the enterprise version of Apache Cassandra
• 70% of the commits to the open source project
• 2x the Write performance of Apache Cassandra
• 2x the Read performance
• Adds the ability to do Search, Analytics, and Graph
• Cool tools!
WHY Multi-Cloud?
What is Multi-Cloud?
• Two or more public cloud providers at the same time
– Data moving between two or more providers
Why Multi-Cloud?
• Data Center Locality
– Not every provider offers every zone
• Provider Specific Services
• Cloud provider competition
– Can shop around for cheap compute! -- Maybe
– More likely -- fear of lock-in to a competitor
• Cost
Why can Apache Cassandra do Multi-Cloud?
• Multi Data Center support -- out of the box
• Cloud Native database
– Built for the cloud
• Multi region support
– Expanded to Hybrid cloud
– Easy expansion to Multi-Cloud
• Every node has the same job
Only Database that supports Multi-Cloud
Demo
Why NOT Multi-Cloud?
Issues with Multi-Cloud
• Complexity
• Networking!
– Latency
• Security
– Boundary protection
• Legal
• Scaling at the Application layer
Okay, this was awesome! What now?
Information and Links
• Learn more about Cassandra: https://academy.datastax.com/
• Learn more about DataStax: https://www.datastax.com/
• Follow me on Twitter: @AmandaK_Data
• Github: https://github.com/amandamoran
Join us at Accelerate!!
www.datastax.com/accelerate Discount Code: ADVOCATE20
Thank you

Apache Cassandra and The Multi-Cloud by Amanda Moran
