SlideShare a Scribd company logo
Cassandra - A
Decentralized
Structured Storage
System
Avinash Lakshman, Facebook
Prashant Malik, Facebook




                                       Iván Carballo
                             icf1e11@ecs.soton.ac.uk
Content

• Introduction
• Data Model
• API
• System architecture
• Performance example
• Conclusion
                        2
Introduction
• Cassandra is a NoSQL database.

• Originally developed by Facebook.

• Cassandra was designed to:
 •   Manage large amounts of structured data.
 •   Run at tope of a system of hundreds of nodes.
 •   Handle high write throughput.
 •   Run on cheap hardware (scale out).
 •   Provide high scalability, reliability and performance.
                                                              3
Data Model




                                                                                4
Source: http://www.inmensia.com/blog/20100327/desmitificando_a_cassandra.html
Data Model (Example)




                                                                                              5

Source: http://www.divconq.com/2010/how-to-add-and-retrieve-data-from-a-cassandra-database/
API (Application Programming Interface)

• Three simple methods:

  • insert (table, key, rowMutation)

  • get (table, key, columnName)

  • delete (table, key, columnName)


                                          6
System architecture
• Partitioning
 • Data is partitioned dynamically over a set of nodes.
 • Uses consistent hashing.


• Replication
 • Each data item is replicated at N hosts.

• Failure detection
 • Every node knows if the rest of nodes in the system
   are up or down.
                                                          7
Performance example
• Facebook Inbox Search
 • More than 50 TB
 • 150 nodes
 • Different datacentres (west and east cost)


• Read performance:
    Latency       Search interactions    Term search
      Min                7.69 ms            7.78 ms
     Median             15.69 ms           18.27 ms
      Max               26.13 ms           44.41 ms
                                                       8
Conclusion

• They successfully implemented a system
  which provides:

 • Scalability

 • High performance

 • Wide applicability


                                           9
Paper

A. Lakshman and P. Malik, “Cassandra: a
decentralized structured storage system”, ACM
SIGOPS Operating Systems Review, vol. 44, n. 2,
pp. 35-40, April 2010.




                                                  10
THANK YOU

ANY QUESTIONS?


                 11

More Related Content

What's hot

Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architecture
ArangoDB Database
 
Introduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesIntroduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databases
ShilpaKrishna6
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Meshal Albeedhani
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandra
niallmilton
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
Databricks
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Lviv Startup Club
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db
hyeongchae lee
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
Fabio Fumarola
 
No sql (1)
No sql (1)No sql (1)
No sql (1)
sanskar2674
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
Pooyan Mehrparvar
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010
Membase
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Martin Traverso
 
Distcp gobblin
Distcp gobblinDistcp gobblin
Distcp gobblin
Vasanth Rajamani
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
Vikram Shinde
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
balwinders
 

What's hot (16)

Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architecture
 
Introduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databasesIntroduction to nosql | NoSQL databases
Introduction to nosql | NoSQL databases
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandra
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
 
20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db20141206 4 q14_dataconference_i_am_your_db
20141206 4 q14_dataconference_i_am_your_db
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 
No sql (1)
No sql (1)No sql (1)
No sql (1)
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Membase Meetup 2010
Membase Meetup 2010Membase Meetup 2010
Membase Meetup 2010
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
Distcp gobblin
Distcp gobblinDistcp gobblin
Distcp gobblin
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
Elastic Stack Introduction
Elastic Stack IntroductionElastic Stack Introduction
Elastic Stack Introduction
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 

Similar to Cassandra: a NoSQL storage system

Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
yashsharma863914
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
DanBarcan2
 
Cassandra
CassandraCassandra
Cassandra
ssuserbad56d
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
Nikiforos Botis
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptx
Naveen Kumar
 
Cassandra
CassandraCassandra
Cassandra
exsuns
 
2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt
ShaimaaMohamedGalal
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
Tuan Luong
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.ppt
Swapna Jk
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
Oleg Magazov
 
A schema generation approach for column oriented no sql data stores
A schema generation approach for column oriented no sql data storesA schema generation approach for column oriented no sql data stores
A schema generation approach for column oriented no sql data stores
KIRAN V
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
sameerfaizan
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
The Statistical and Applied Mathematical Sciences Institute
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Spark Summit
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
hothyfa
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
Nguyen Quang
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
Christian Johannsen
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and Elasticsearch
Abhishek Andhavarapu
 

Similar to Cassandra: a NoSQL storage system (20)

Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
 
Cassandra
CassandraCassandra
Cassandra
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
NoSQL - Cassandra & MongoDB.pptx
NoSQL -  Cassandra & MongoDB.pptxNoSQL -  Cassandra & MongoDB.pptx
NoSQL - Cassandra & MongoDB.pptx
 
Cassandra
CassandraCassandra
Cassandra
 
2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.ppt
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Apache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and BasicsApache Cassandra training. Overview and Basics
Apache Cassandra training. Overview and Basics
 
A schema generation approach for column oriented no sql data stores
A schema generation approach for column oriented no sql data storesA schema generation approach for column oriented no sql data stores
A schema generation approach for column oriented no sql data stores
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and Elasticsearch
 

Recently uploaded

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 

Recently uploaded (20)

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 

Cassandra: a NoSQL storage system

  • 1. Cassandra - A Decentralized Structured Storage System Avinash Lakshman, Facebook Prashant Malik, Facebook Iván Carballo icf1e11@ecs.soton.ac.uk
  • 2. Content • Introduction • Data Model • API • System architecture • Performance example • Conclusion 2
  • 3. Introduction • Cassandra is a NoSQL database. • Originally developed by Facebook. • Cassandra was designed to: • Manage large amounts of structured data. • Run at tope of a system of hundreds of nodes. • Handle high write throughput. • Run on cheap hardware (scale out). • Provide high scalability, reliability and performance. 3
  • 4. Data Model 4 Source: http://www.inmensia.com/blog/20100327/desmitificando_a_cassandra.html
  • 5. Data Model (Example) 5 Source: http://www.divconq.com/2010/how-to-add-and-retrieve-data-from-a-cassandra-database/
  • 6. API (Application Programming Interface) • Three simple methods: • insert (table, key, rowMutation) • get (table, key, columnName) • delete (table, key, columnName) 6
  • 7. System architecture • Partitioning • Data is partitioned dynamically over a set of nodes. • Uses consistent hashing. • Replication • Each data item is replicated at N hosts. • Failure detection • Every node knows if the rest of nodes in the system are up or down. 7
  • 8. Performance example • Facebook Inbox Search • More than 50 TB • 150 nodes • Different datacentres (west and east cost) • Read performance: Latency Search interactions Term search Min 7.69 ms 7.78 ms Median 15.69 ms 18.27 ms Max 26.13 ms 44.41 ms 8
  • 9. Conclusion • They successfully implemented a system which provides: • Scalability • High performance • Wide applicability 9
  • 10. Paper A. Lakshman and P. Malik, “Cassandra: a decentralized structured storage system”, ACM SIGOPS Operating Systems Review, vol. 44, n. 2, pp. 35-40, April 2010. 10