Cassandra FTW
                           Andrew Byde
                           Principal Scientist




Monday, 15 August 2011
Menu

                   • Introduction
                   • Data model + storage architecture
                   • Partitioning + replication
                   • Consistency
                   • De-normalisation

History + design




History

                   • 2007: Started at Facebook for inbox search
                   • July 2008: Open sourced by Facebook
                   • March 2009: Apache Incubator
                   • February 2010: Apache top-level project
                   • May 2011: Version 0.8

What it’s good for

                   • Horizontal scalability
                   • No single point of failure
                   • Multi-data centre support
                   • Very high write workloads
                   • Tuneable consistency

What it’s not so good for

                   • Transactions
                   • Read heavy workloads
                   • Low latency applications
                         •   compared to in-memory dbs




Data model




Keyspaces and Column Families
                     SQL              Cassandra

                     Database         Keyspace
                     Table            Column Family

                           Keyspaces & CFs have different
                            sets of configuration settings

Column Family

                         key: {
                            column: value,
                            column: value,
                            ...
                          }



Rows and columns
                         col1   col2   col3   col4   col5   col6   col7
                 row1            x                    x      x
                 row2     x      x      x      x      x
                 row3            x      x             x      x      x
                 row4            x      x      x             x
                 row5            x             x      x      x
                 row6            x
                 row7     x      x             x



Reads
               • get
               • get_slice          One row, some cols
                • name predicate
                • slice range
               • multiget_slice     Multiple rows
               • get_range_slices

get
                         col1   col2   col3   col4   col5   col6   col7
                 row1            x                    x      x
                 row2     x      x      x      x      x
                 row3            x      x             x      x      x
                 row4            x      x      x             x
                 row5            x             x      x      x
                 row6            x
                 row7     x      x             x



get_slice: name predicate
                         col1   col2   col3   col4   col5   col6   col7
                 row1            x                    x      x
                 row2     x      x      x      x      x
                 row3            x      x             x      x      x
                 row4            x      x      x             x
                 row5            x             x      x      x
                 row6            x
                 row7     x      x             x



get_slice: slice range
                          col1   col2   col3   col4   col5   col6   col7
                 row1             x                    x      x
                 row2      x      x      x      x      x
                 row3      x      x      x             x      x      x
                 row4             x      x      x             x
                 row5             x             x      x      x
                 row6             x
                 row7      x      x             x



multiget_slice: name predicate
                          col1   col2   col3   col4   col5   col6   col7
                 row1             x                    x      x
                 row2      x      x      x      x      x
                 row3             x      x             x      x      x
                 row4             x      x      x             x
                 row5             x             x      x      x
                 row6             x
                 row7      x      x             x


get_range_slices: slice range
                         col1   col2   col3   col4   col5   col6   col7
                 row1            x                    x      x
                 row2     x      x      x      x      x
                 row3            x      x             x      x      x
                 row4            x      x      x             x
                 row5            x             x      x      x
                 row6            x
                 row7     x      x             x
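
The four read calls above can be sketched against an in-memory stand-in for a column family. This is only an illustration of which rows and columns each call selects: the `cf` dictionary and the keyword arguments (`names`, `start`, `finish`) are assumptions made for the example and do not mirror the real Thrift signatures.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for one column family: row key -> {column: value}.
# Rows are sparse, as in the grids above.
cf = {
    "row1": {"col2": "a", "col5": "b", "col6": "c"},
    "row2": {"col1": "d", "col2": "e", "col3": "f"},
    "row3": {"col2": "g", "col7": "h"},
}

def get(key, column):
    """One row, one column."""
    return cf[key][column]

def get_slice(key, names=None, start=None, finish=None):
    """One row, some columns: a name predicate or a slice range."""
    row = cf.get(key, {})
    if names is not None:                       # name predicate
        return {c: row[c] for c in names if c in row}
    cols = sorted(row)                          # columns are kept sorted
    lo = 0 if start is None else bisect_left(cols, start)
    hi = len(cols) if finish is None else bisect_right(cols, finish)
    return {c: row[c] for c in cols[lo:hi]}     # slice range, both ends inclusive

def multiget_slice(keys, **predicate):
    """Several named rows, the same column predicate for each."""
    return {k: get_slice(k, **predicate) for k in keys}

def get_range_slices(start_key, end_key, **predicate):
    """A contiguous range of row keys, the same column predicate for each."""
    return {k: get_slice(k, **predicate)
            for k in sorted(cf) if start_key <= k <= end_key}
```

For example, `get_slice("row2", start="col2", finish="col3")` returns the two columns `col2` and `col3` of `row2`, while `multiget_slice(["row1", "row3"], names=["col2"])` returns `col2` from each of the two named rows.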



Storage architecture



Data Layout
                   writes
                   • key-value insert goes to
                         •   the on-disk, un-ordered commit log
                         •   the in-memory, (key,col)-sorted memtable
                   • flush: memtable is written out as on-disk,
                         (key,col)-sorted SSTables
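
A minimal sketch of the write path above, assuming a toy JSON-lines file format and a hypothetical `flush_threshold` counted in entries. The class and file names are invented for the illustration and are not Cassandra's own.

```python
import json
import os
import tempfile

class Store:
    """Toy write path: commit-log append, memtable update, flush to SSTable."""

    def __init__(self, directory, flush_threshold=3):
        self.dir = directory
        self.flush_threshold = flush_threshold    # hypothetical: flush after N entries
        self.memtable = {}                        # (key, col) -> value
        self.sstables = []                        # file paths, oldest first
        self.log = open(os.path.join(directory, "commit.log"), "a")

    def insert(self, key, col, value):
        # 1. Append to the un-ordered on-disk commit log.
        self.log.write(json.dumps([key, col, value]) + "\n")
        self.log.flush()
        # 2. Update the in-memory memtable.
        self.memtable[(key, col)] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out in (key, col) order; the file is never modified again.
        path = os.path.join(self.dir, f"sstable-{len(self.sstables)}.json")
        with open(path, "w") as f:
            for (key, col), value in sorted(self.memtable.items()):
                f.write(json.dumps([key, col, value]) + "\n")
        self.sstables.append(path)
        self.memtable = {}

def demo():
    """Insert three entries out of order and return the flushed SSTable contents."""
    with tempfile.TemporaryDirectory() as d:
        store = Store(d)
        for key, col in [("b", "x"), ("a", "y"), ("a", "x")]:
            store.insert(key, col, 1)
        with open(store.sstables[0]) as f:
            rows = [json.loads(line) for line in f]
        store.log.close()
        return rows
```

Calling `demo()` shows that entries inserted in arbitrary order come back from the flushed file sorted by (key, col), which is what makes the later slice reads sequential.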

Data Layout
                   Each SSTable consists of
                   • Bloom Filter
                   • Index
                   • Data




Data Layout
                   reads
                   [Figure: a read (?) is directed at three SSTables; two of them are marked X]
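
One way to read the figure is that the Bloom filter lets a read skip SSTables that cannot contain the key. The sketch below illustrates that idea under simplifying assumptions: the filter size, the hash construction and the `SSTable` class are all invented for the example, and a real read merges columns from every SSTable that holds the row rather than stopping at the first hit.

```python
import hashlib

class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array |= 1 << p

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.array >> p) & 1 for p in self._positions(key))

class SSTable:
    def __init__(self, rows):
        self.data = dict(rows)         # stands in for the index + data sections
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)

def read(key, sstables):
    """Return (value, number of SSTables opened), checking the newest table first."""
    opened = 0
    for table in reversed(sstables):
        if not table.bloom.might_contain(key):
            continue                   # skipped without touching index or data
        opened += 1
        if key in table.data:
            return table.data[key], opened
    return None, opened
```

With `tables = [SSTable({"k1": "old"}), SSTable({"k2": "v"}), SSTable({"k1": "new"})]`, a call to `read("k1", tables)` returns the value from the newest table, and a lookup for a key that was never written usually opens no table at all.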




Distribution: Partitioning + Replication


Partitioning + Replication



           (k, v)
                         ?




Partitioning + Replication
                   • Partitioning data on to nodes
                    • load balancing
                    • row-based
                   • Replication
                    • to protect against failure
                    • better availability

Partitioning
                   • Random: take hash of row key
                         •   good for load balancing

                         •   bad for range queries

                   • Ordered: subdivide key space
                         •   bad for load balancing

                         •   good for range queries

                   • Or build your own...
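
The trade-off between the two partitioners can be sketched as follows. The token space (`2**32`), the node tokens and the key names are assumptions chosen for the illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def random_token(row_key, ring_size=2**32):
    """Random partitioning: the token is a hash of the row key."""
    digest = hashlib.md5(row_key.encode()).digest()
    return int.from_bytes(digest, "big") % ring_size

def owner(token, node_tokens):
    """First node whose token is >= the key's token, wrapping around the ring."""
    i = bisect_left(node_tokens, token)
    return node_tokens[i % len(node_tokens)]

# Random: hashed keys spread over four evenly spaced (hypothetical) node tokens.
node_tokens = [2**30, 2**31, 3 * 2**30, 2**32 - 1]
counts = Counter(owner(random_token(f"user{i}"), node_tokens) for i in range(1000))

# Ordered: the key itself is the token, so neighbouring keys share a node.
ordered_tokens = ["g", "n", "t", "z"]
neighbours = [owner(k, ordered_tokens) for k in ["apple", "avocado", "banana"]]
```

`counts` should come out roughly even across the four nodes (good load balancing, but adjacent keys are scattered), whereas `neighbours` puts all three keys on the node with token `"g"` (a range query touches one node, but that node carries the whole load for the range).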

Simple Replication
                                     Primary location




           (k, v)                                   Extra copies
                                                   are successors
                                                     on the ring


                           Nodes arranged on a ‘ring’
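
As a sketch of the rule above, the replicas for a key are its primary node plus that node's successors on the ring. The node names and the replication factor are hypothetical.

```python
def replicas(primary_index, nodes, replication_factor):
    """Primary node plus its successors, clockwise around the ring.

    Assumes replication_factor <= len(nodes).
    """
    return [nodes[(primary_index + i) % len(nodes)]
            for i in range(replication_factor)]

ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
placement = replicas(4, ring, 3)   # primary is n5; copies wrap around to n1
```

Here `placement` is `["n5", "n6", "n1"]`: the walk wraps past the end of the list because the nodes form a ring.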

Topology-aware Replication
                   • Snitch: node IP → (DataCenter, rack)

                   • EC2Snitch
                         •   Region → DC; availability_zone → rack

                   • PropertyFileSnitch
                         •   Configured from a file



Topology-aware Replication
                                        DC 1     DC 2

                          (k, v)       r1  r2   r1  r2

                   • extra copies to different data center
                   • spread across racks within a data center
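
A simplified sketch of topology-aware placement, assuming the snitch is just a dictionary from node name to (data center, rack) and that the number of copies per data center is given. It walks the ring from the primary and prefers racks not yet used; it is not meant to reproduce Cassandra's actual strategy classes.

```python
def place(primary_index, ring, snitch, per_dc):
    """Choose per_dc[dc] replicas in each data center, spreading across racks."""
    order = [ring[(primary_index + i) % len(ring)] for i in range(len(ring))]
    chosen = []
    for dc, wanted in per_dc.items():
        in_dc = [n for n in order if snitch[n][0] == dc]
        picked, racks = [], set()
        for n in in_dc:                    # first pass: one node per unused rack
            if len(picked) < wanted and snitch[n][1] not in racks:
                picked.append(n)
                racks.add(snitch[n][1])
        for n in in_dc:                    # second pass: fill up if racks ran out
            if len(picked) < wanted and n not in picked:
                picked.append(n)
        chosen += picked
    return chosen

# Hypothetical cluster: two data centers with two racks each.
snitch = {
    "a": ("DC1", "r1"), "b": ("DC1", "r1"), "c": ("DC1", "r2"),
    "d": ("DC2", "r1"), "e": ("DC2", "r2"), "f": ("DC2", "r2"),
}
ring = ["a", "d", "b", "e", "c", "f"]
placement = place(0, ring, snitch, {"DC1": 2, "DC2": 2})
```

For this cluster `placement` is `["a", "c", "d", "e"]`: node `b` is passed over because it shares rack r1 with `a`, so each data center ends up with one copy per rack.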

Distribution: Consistency



Consistency Level

                   • How many replicas must respond in order to
                         declare success
                   • W of the N replicas must succeed for write to succeed
                         •   write with client-generated timestamp

                   • R of the N replicas must succeed for read to succeed
                         •   return most recent, by timestamp
                         •   return most recent, by timestamp


Consistency Level

                   • 1, 2, 3 responses
                   • Quorum (more than half)
                   • Quorum in local data center
                   • Quorum in each data center
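
The arithmetic behind these levels can be sketched briefly. A quorum is more than half of the N replicas, and a read is guaranteed to see the latest successful write whenever R + W > N, because the two sets of replicas must then share at least one node. The function names are made up for the example; the brute-force check simply confirms the counting argument for small N.

```python
from itertools import combinations

def quorum(n):
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1

def overlaps(r, w, n):
    """True if any r read replicas must share a node with any w write replicas."""
    return r + w > n

def always_intersect(r, w, n):
    """Brute-force check of the same property over all replica subsets."""
    nodes = range(n)
    return all(set(a) & set(b)
               for a in combinations(nodes, r)
               for b in combinations(nodes, w))
```

With N = 3, `quorum(3)` is 2, and quorum reads plus quorum writes overlap (2 + 2 > 3), whereas reading and writing at level 1 does not (1 + 1 ≤ 3).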

Maintaining consistency

                   • Read repair
                   • Hinted handoff
                   • Anti-entropy


Read repair
                   • If the replicas disagree on read, send most
                         recent data back

                          1. read k?        sent to n1, n2, n3
                          2. responses      n1: v, t1
                                            n2: not found!
                                            n3: v’, t2
                          3. repair         write (k, v’, t2)
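
The sequence above can be sketched with each replica modelled as a dictionary from key to (value, timestamp). This is an illustration only; it assumes t2 is later than t1 and repairs every replica that is missing the newest version.

```python
def read_with_repair(key, replicas):
    """Read from all replicas, return the newest value, and repair the rest."""
    answers = [r.get(key) for r in replicas]
    found = [a for a in answers if a is not None]
    if not found:
        return None
    newest = max(found, key=lambda a: a[1])      # most recent, by timestamp
    for replica, answer in zip(replicas, answers):
        if answer != newest:                     # missing or stale copy
            replica[key] = newest                # write (k, v', t2) back
    return newest[0]

n1 = {"k": ("v", 1)}        # older value, timestamp t1 = 1
n2 = {}                     # not found
n3 = {"k": ("v'", 2)}       # newer value, timestamp t2 = 2
result = read_with_repair("k", [n1, n2, n3])
```

After the call, `result` is `"v'"` and all three replicas hold `("v'", 2)`.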


Hinted handoff

                   • When a node is unavailable
                   • Writes can be written to any node as a hint
                   • Delivered when the node comes back
                         online
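
A minimal sketch of the idea, assuming the coordinator is the node that keeps the hint. The `Node` class and its fields are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []     # (target node, key, value) held on behalf of others

def write(key, value, replicas, coordinator):
    """Write to every live replica; keep a hint for each one that is down."""
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            coordinator.hints.append((node, key, value))

def deliver_hints(holder):
    """Replay stored hints to targets that have come back online."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining
```

If `n2` is down during `write("k", "v", [n1, n2, n3], coordinator=n1)`, the write is recorded as a hint on `n1`; once `n2.up` is set back to `True`, `deliver_hints(n1)` applies it and clears the hint.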




Anti-entropy

                   • Equivalent to ‘read repair all’
                   • Requires reading all data (woah)
                         •   (Although only hashes are sent to calculate diffs)

                   • Manual process
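
The point about hashes can be illustrated with a flat sketch: each replica reads all of its data, hashes it per key bucket, and only the hashes are compared to find what differs. Cassandra itself builds Merkle trees for this; the fixed bucket count and the function names here are simplifications assumed for the example.

```python
import hashlib

def range_hashes(data, buckets=4):
    """Read every row and fold it into one digest per key bucket."""
    digests = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(data):
        b = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        digests[b].update(f"{key}={data[key]};".encode())
    return [d.hexdigest() for d in digests]

def differing_buckets(a, b, buckets=4):
    """Compare two replicas by exchanging hashes only, not the data itself."""
    ha, hb = range_hashes(a, buckets), range_hashes(b, buckets)
    return [i for i in range(buckets) if ha[i] != hb[i]]
```

Two identical replicas give an empty list; if one key differs, only the bucket containing that key is reported and needs to be streamed.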




De-normalisation




De-normalisation

                   • Disk space is much cheaper than disk seeks
                   • Read at 100 MB/s, seek at 100 IO/s
                   • => copy data to avoid seeks
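
The figures above imply that one seek costs as much time as reading about 1 MB sequentially. A short worked example, assuming a hypothetical inbox of 100 messages of 1 kB each:

```python
SEQ_BYTES_PER_S = 100 * 10**6    # read at 100 MB/s
SEEKS_PER_S = 100                # seek at 100 IO/s

seek_time_s = 1 / SEEKS_PER_S                      # 0.01 s per seek
bytes_per_seek = SEQ_BYTES_PER_S // SEEKS_PER_S    # 1,000,000 bytes per seek-time

def fetch_time(records, record_bytes, seeks):
    """Seconds spent seeking plus seconds spent reading sequentially."""
    return seeks * seek_time_s + records * record_bytes / SEQ_BYTES_PER_S

normalised = fetch_time(100, 1000, seeks=100)     # one seek per joined message
denormalised = fetch_time(100, 1000, seeks=1)     # one contiguous row
```

Under these assumptions the normalised layout takes about 1 s, almost all of it seeking, while the de-normalised copy takes about 0.011 s.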


Inbox
                                         user2

                         user1   msg1
                                         user3
                                 msg2


                                 msg3    user4
                                  ...




Data-centric model
                         m1: {
                           sender: user1
                           content: “Mary had a little lamb”
                           recipients: user2, user3
                         }


               • but how to do ‘recipients’ for Inbox?
               • one-to-many modelled by a join table

To join
          m1: {                                        user2: {
            sender: user1                                m1: true
            subject: “A rhyme”                         }
            content: “Mary had a little lamb”          user3: {
          }                                              m1: true
          m2: {                                          m2: true
            sender: user1                              }
            subject: “colours”                         user4: {
            content: “Its fleece was white as snow”      m2: true
          }                                              m3: true
          m3: {                                        }
            sender: user1
            subject: “loyalty”
            content: “And everywhere that Mary went”
          }


.. or not to join
                 • Joins are expensive, so de-normalise to trade
                         off space for time
                 • We can have lots of columns, so think BIG:
                 • Make message id a time-typed super-column.
                 • This makes get_slice an efficient way of
                         searching for messages in a time window



Super Column Family
                         user2: {
                           m1: {
                             sender: user1
                             subject: “A rhyme”
                           }
                         }
                         user3: {
                           m1: {
                             sender: user1
                             subject: “A rhyme”
                           }
                           m2: {
                             sender: user1
                             subject: “colours”
                           }
                         }
                         ...



De-normalisation + Cassandra
                 • have to write a copy of the record for each
                         recipient ... but writes are very cheap
                 • get_slice fetches columns for a particular
                         row, so gets received messages for a user
                 • on-disk column order is optimal for this
                         query
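
The de-normalised inbox can be sketched as one row per recipient whose entries are kept sorted by a time-based message id, so that a time-window query is a single slice of that row. The integer timestamps and function names are hypothetical; the messages are the ones from the earlier slides.

```python
from bisect import bisect_left, bisect_right

inbox = {}   # recipient -> list of (timestamp, message summary), kept sorted

def send(timestamp, sender, subject, recipients):
    """Write one copy of the message summary into every recipient's row."""
    entry = (timestamp, {"sender": sender, "subject": subject})
    for user in recipients:
        row = inbox.setdefault(user, [])
        times = [t for t, _ in row]
        row.insert(bisect_left(times, timestamp), entry)

def messages_between(user, start, finish):
    """Slice one row by time window, in the spirit of get_slice."""
    row = inbox.get(user, [])
    times = [t for t, _ in row]
    lo, hi = bisect_left(times, start), bisect_right(times, finish)
    return [message["subject"] for _, message in row[lo:hi]]

send(1, "user1", "A rhyme", ["user2", "user3"])
send(2, "user1", "colours", ["user3", "user4"])
send(3, "user1", "loyalty", ["user4"])
```

`messages_between("user3", 1, 2)` then returns `["A rhyme", "colours"]` from a single row, at the cost of having written each message once per recipient.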



Conclusion




What it’s good for

                   • Horizontal scalability
                   • No single point of failure
                   • Multi-data centre support
                   • Very high write workloads
                   • Tuneable consistency

Q?




 
Company Valuation webinar series - Tuesday, 4 June 2024
Company Valuation webinar series - Tuesday, 4 June 2024Company Valuation webinar series - Tuesday, 4 June 2024
Company Valuation webinar series - Tuesday, 4 June 2024
 
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdfModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
ModelingMarketingStrategiesMKS.CollumbiaUniversitypdf
 
Business Valuation Principles for Entrepreneurs
Business Valuation Principles for EntrepreneursBusiness Valuation Principles for Entrepreneurs
Business Valuation Principles for Entrepreneurs
 
Training my puppy and implementation in this story
Training my puppy and implementation in this storyTraining my puppy and implementation in this story
Training my puppy and implementation in this story
 

Cassandra: Two data centers and great performance

  • 1. Cassandra FTW Andrew Byde Principal Scientist Monday, 15 August 2011
  • 2. Menu
      • Introduction
      • Data model + storage architecture
      • Partitioning + replication
      • Consistency
      • De-normalisation
  • 3. History + design
  • 4. History
      • 2007: Started at Facebook for inbox search
      • July 2008: Open sourced by Facebook
      • March 2009: Apache Incubator
      • February 2010: Apache top-level project
      • May 2011: Version 0.8
  • 5. What it’s good for
      • Horizontal scalability
      • No single point of failure
      • Multi-data centre support
      • Very high write workloads
      • Tuneable consistency
  • 6. What it’s not so good for
      • Transactions
      • Read heavy workloads
      • Low latency applications
          • compared to in-memory dbs
  • 7. Data model
  • 8. Keyspaces and Column Families
      • SQL Database corresponds to Cassandra Keyspace
      • SQL Table corresponds to Cassandra Column Family
      • Keyspaces & CFs have different sets of configuration settings
  • 9. Column Family
      • key: { column: value, column: value, ... }
  • 10. Rows and columns
      • (figure: grid of rows row1 to row7 against columns col1 to col7; each row has values in only some of the columns)
  • 11. Reads
      • get
      • get_slice (one row, some cols)
          • name predicate
          • slice range
      • multiget_slice (multiple rows)
      • get_range_slices
  • 12. get
      • (figure: the row/column grid from slide 10 with the selected cell highlighted; the highlighting is not recoverable)
  • 13. get_slice: name predicate
      • (figure: the same grid with the selected cells highlighted; the highlighting is not recoverable)
  • 14. get_slice: slice range
      • (figure: the same grid with the selected cells highlighted; the highlighting is not recoverable)
  • 15. multiget_slice: name predicate
      • (figure: the same grid with the selected cells highlighted; the highlighting is not recoverable)
  • 16. get_range_slices: slice range
      • (figure: the same grid with the selected cells highlighted; the highlighting is not recoverable)
  • 17. Storage architecture
  • 18. Data Layout: writes
      • a key-value insert goes to the on-disk, un-ordered commit log
      • and to the in-memory, (key,col)-sorted memtable
      • the memtable is flushed to on-disk, (key,col)-sorted SSTables
  • 19. Data Layout: SSTables
      • each SSTable consists of a Bloom Filter, an Index and Data
  • 20. Data Layout: reads
      • (figure: a read, shown as "?", against the stored data)
  • 21. Data Layout: reads
      • (figure: the same read, with two of the candidate locations marked "X")
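
The write and read paths on slides 18 to 21 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the class name `TinyStore`, the `memtable_limit` parameter and the dictionary layout are all hypothetical, the commit log is an in-memory list rather than a file, and a plain Python set stands in for the Bloom filter (a real Bloom filter can return false positives, which a set cannot).

```python
class TinyStore:
    """Toy model of the write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []   # append-only and un-ordered (stand-in for the on-disk log)
        self.memtable = {}     # (key, col) -> value, held in memory
        self.sstables = []     # immutable tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def insert(self, key, col, value):
        # Every write is appended to the log first, then applied to the memtable.
        self.commit_log.append((key, col, value))
        self.memtable[(key, col)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a (key, col)-sorted table and start afresh.
        if not self.memtable:
            return
        rows = sorted(self.memtable.items())
        key_filter = {key for (key, _col), _value in rows}  # stand-in for a Bloom filter
        self.sstables.append({"filter": key_filter, "rows": rows})
        self.memtable = {}

    def get(self, key, col):
        # Check the memtable first, then the SSTables from newest to oldest.
        if (key, col) in self.memtable:
            return self.memtable[(key, col)]
        for table in reversed(self.sstables):
            if key not in table["filter"]:
                continue  # the filter lets us skip tables that cannot hold the key
            for cell, value in table["rows"]:
                if cell == (key, col):
                    return value
        return None
```

The point of the filter check is that a read only has to scan the tables that might contain the row key; everything else is skipped without touching its data.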
  • 22. Distribution: Partitioning + Replication
  • 23. Partitioning + Replication
      • (figure: which node should (k, v) go to?)
  • 24. Partitioning + Replication
      • Partitioning data on to nodes
          • load balancing
          • row-based
      • Replication
          • to protect against failure
          • better availability
  • 25. Partitioning
      • Random: take hash of row key
          • good for load balancing
          • bad for range queries
      • Ordered: subdivide key space
          • bad for load balancing
          • good for range queries
      • Or build your own...
  • 26. Simple Replication
      • Nodes arranged on a ‘ring’
      • (figure: (k, v) and the ring of nodes)
  • 27. Simple Replication
      • Nodes arranged on a ‘ring’
      • (figure: (k, v) assigned to its primary location on the ring)
  • 28. Simple Replication
      • Nodes arranged on a ‘ring’
      • Extra copies are successors on the ring
      • (figure: (k, v) at its primary location, with copies on the following nodes)
  • 29. Topology-aware Replication
      • Snitch: node IP → (DataCenter, rack)
      • EC2Snitch
          • Region → DC; availability_zone → rack
      • PropertyFileSnitch
          • Configured from a file
  • 30. Topology-aware Replication
      • (figure: (k, v) and two data centers, DC 1 and DC 2, each with racks r1 and r2)
  • 31. Topology-aware Replication
      • (figure: as slide 30)
  • 32. Topology-aware Replication
      • extra copies to different data center
      • (figure: as slide 30)
  • 33. Topology-aware Replication
      • extra copies to different data center
      • spread across racks within a data center
      • (figure: as slide 30)
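
The simple (non-topology-aware) scheme from slides 25 to 28 can be sketched as follows: hash the row key onto the ring, take the first node at or after that position as the primary location, and put the extra copies on its successors. The helper names `token`, `build_ring` and `replicas` are hypothetical, and MD5 is used here only as a convenient hash for illustration. Assigning each node a token by hashing its name is also an assumption made to keep the sketch short.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes):
    """Return the ring as a list of (token, node) pairs sorted by token."""
    return sorted((token(name), name) for name in nodes)


def replicas(ring, row_key, n):
    """Return the primary node for row_key followed by its successors on the ring."""
    tokens = [t for t, _node in ring]
    # Primary location: first node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, token(row_key)) % len(ring)
    count = min(n, len(ring))  # cannot place more copies than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

A topology-aware strategy would replace the "next nodes on the ring" rule with one that consults the snitch, so that copies land in different data centers and racks; that part is not modelled here.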
  • 34. Distribution: Consistency
  • 35. Consistency Level
      • How many replicas must respond in order to declare success
      • W of the N replicas must succeed for a write to succeed
          • write with client-generated timestamp
      • R of the N replicas must succeed for a read to succeed
          • return most recent, by timestamp
  • 36. Consistency Level
      • 1, 2, 3 responses
      • Quorum (more than half)
      • Quorum in local data center
      • Quorum in each data center
  • 37. Maintaining consistency
      • Read repair
      • Hinted handoff
      • Anti-entropy
  • 38. Read repair
      • If the replicas disagree on read, send most recent data back
      • (figure: "read k?" sent to replicas n1, n2 and n3)
  • 39. Read repair
      • (figure: n1 answers (v, t1); n2 answers "not found!"; n3 answers (v’, t2))
  • 40. Read repair
      • (figure: the three answers compared; (v’, t2) is the most recent)
  • 41. Read repair
      • (figure: write (k, v’, t2) sent back to the replicas)
  • 42. Hinted handoff
      • When a node is unavailable
      • Writes can be written to any node as a hint
      • Delivered when the node comes back online
  • 43. Anti-entropy
      • Equivalent to ‘read repair all’
      • Requires reading all data (woah)
      • (Although only hashes are sent to calculate diffs)
      • Manual process
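
The quorum rule and the read-repair example from slides 36 to 41 can be sketched as below. This is a minimal illustration, not Cassandra's code: replicas are modelled as plain dictionaries mapping a key to a `(value, timestamp)` pair, and the function names `quorum` and `read_with_repair` are hypothetical. A useful property to keep in mind is that when R + W > N (for example quorum reads and quorum writes), every read set overlaps every successful write set in at least one replica.

```python
def quorum(n):
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def read_with_repair(replicas, key, r):
    """Read key from the responding replicas, return the newest value, repair the rest.

    replicas: list of dicts, each mapping key -> (value, timestamp).
    r: number of responses required for the read to succeed.
    """
    if len(replicas) < r:
        raise RuntimeError("not enough replicas responded")
    answers = [(replica, replica.get(key)) for replica in replicas]
    found = [answer for _replica, answer in answers if answer is not None]
    if not found:
        return None
    # Most recent answer wins, judged by the client-generated timestamp.
    newest = max(found, key=lambda value_and_ts: value_and_ts[1])
    # Read repair: send the most recent data back to any replica that disagrees.
    for replica, answer in answers:
        if answer != newest:
            replica[key] = newest
    return newest[0]
```

With the three replicas from slide 39 (one holding `v` at `t1`, one with nothing, one holding `v’` at `t2`), the read returns `v’` and leaves all three replicas holding `(v’, t2)`.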
  • 45. De-normalisation
      • Disk space is much cheaper than disk seeks
      • Read at 100 MB/s, seek at 100 IO/s
      • => copy data to avoid seeks
  • 46. Inbox
      • (figure: users user1 to user4 linked to messages msg1 to msg3)
  • 47. Data-centric model
      • m1: { sender: user1, content: “Mary had a little lamb”, recipients: user2, user3 }
      • but how to do ‘recipients’ for Inbox?
      • one-to-many modelled by a join table
  • 48. To join
      • messages
          • m1: { sender: user1, subject: “A rhyme”, content: “Mary had a little lamb” }
          • m2: { sender: user1, subject: “colours”, content: “Its fleece was white as snow” }
          • m3: { sender: user1, subject: “loyalty”, content: “And everywhere that Mary went” }
      • join table
          • user2: { m1: true }
          • user3: { m1: true, m2: true }
          • user4: { m2: true, m3: true }
  • 49. .. or not to join
      • Joins are expensive, so de-normalise to trade off space for time
      • We can have lots of columns, so think BIG:
      • Make message id a time-typed super-column.
      • This makes get_slice an efficient way of searching for messages in a time window
  • 50. Super Column Family
      • user2: { m1: { sender: user1, subject: “A rhyme” } }
      • user3: { m1: { sender: user1, subject: “A rhyme” }, m2: { sender: user1, subject: “colours” } }
      • ...
  • 51. De-normalisation + Cassandra
      • have to write a copy of the record for each recipient ... but writes are very cheap
      • get_slice fetches columns for a particular row, so gets received messages for a user
      • on-disk column order is optimal for this query
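
The de-normalised inbox from slides 49 to 51 can be mimicked with nested dictionaries: one row per recipient, one super-column per message, and a copy of the message written for every recipient. This is only a sketch of the data layout, not a client for any real Cassandra API. The functions `send` and `get_slice` below are hypothetical stand-ins, and integer message ids are assumed to play the role of the time-typed super-column names.

```python
from collections import defaultdict

# Row key: recipient. Super-column name: message id (assumed time-ordered).
inbox = defaultdict(dict)


def send(msg_id, sender, subject, recipients):
    """Write one copy of the message into each recipient's row."""
    for user in recipients:
        inbox[user][msg_id] = {"sender": sender, "subject": subject}


def get_slice(user, start, finish):
    """Return the user's messages with start <= id <= finish, in id order."""
    row = inbox.get(user, {})
    return [(msg_id, row[msg_id]) for msg_id in sorted(row) if start <= msg_id <= finish]


# The three messages from slide 48, addressed as in its join table.
send(1, "user1", "A rhyme", ["user2", "user3"])
send(2, "user1", "colours", ["user3", "user4"])
send(3, "user1", "loyalty", ["user4"])
```

Reading a user's inbox for a time window is then a single-row slice with no join, at the cost of storing each message once per recipient.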
  • 53. What it’s good for
      • Horizontal scalability
      • No single point of failure
      • Multi-data centre support
      • Very high write workloads
      • Tuneable consistency