SlideShare a Scribd company logo
Cassandra
Structured Storage System over a P2P Network




          Avinash Lakshman, Prashant Malik
Why Cassandra?
• Lots of data
  – Copies of messages, reverse indices of
    messages, per user data.
• Many incoming requests resulting in a lot
  of random reads and random writes.
• No existing production ready solutions in
  the market meet these requirements.
Design Goals
• High availability
• Eventual consistency
  – trade-off strong consistency in favor of high
    availability
• Incremental scalability
• Optimistic Replication
• “Knobs” to tune tradeoffs between consistency,
  durability and latency
• Low total cost of ownership
• Minimal administration
Data Model                                                       Columns are
                                                                                     added and
                              ColumnFamily1 Name : MailList                           modified
                                                                         Type : Simple Sort : Name
 KEY                          Name : tid1         Name : tid2           Name : tid3 dynamically
                                                                                        Name : tid4
                              Value : <Binary>    Value : <Binary>      Value : <Binary>        Value : <Binary>
                              TimeStamp : t1      TimeStamp : t2        TimeStamp : t3          TimeStamp : t4




                        ColumnFamily2            Name : WordList            Type : Super            Sort : Time
Column Families         Name : aloha                                                     Name : dude
  are declared
                         C1             C2             C3          C4                      C2             C6
     upfront
 SuperColumns            V1             V2             V3          V4                      V2             V6

 are added and           T1             T2             T3          T4                      T2             T6

    modified
Columns are
  dynamically
 added and
  modified        ColumnFamily3 Name : System                Type : Super       Sort : Name
dynamically       Name : hint1         Name : hint2         Name : hint3       Name : hint4
                  <Column List>        <Column List>        <Column List>      <Column List>
Write Operations
• A client issues a write request to a random
  node in the Cassandra cluster.
• The “Partitioner” determines the nodes
  responsible for the data.
• Locally, write operations are logged and
  then applied to an in-memory version.
• Commit log is stored on a dedicated disk
  local to the machine.
Write cont’d
Key (CF1 , CF2 , CF3)                                                         • Data size
                                                                              • Number of Objects
                                   Memtable ( CF1)
                                                                              • Lifetime

 Commit Log                        Memtable ( CF2)
 Binary serialized
 Key ( CF1 , CF2 , CF3 )           Memtable ( CF2)

                                                                         Data file on disk
                                               <Key name><Size of key Data><Index of columns/supercolumns><
                                               Serialized column family>
                           K128 Offset         ---
                                               ---
                           K256 Offset          BLOCK Index <Key Name> Offset, <Key Name> Offset
     Dedicated Disk
                           K384 Offset         ---
                                               ---

                           Bloom Filter        <Key name><Size of key Data><Index of columns/supercolumns><
                                               Serialized column family>

                           (Index in memory)
Compactions
                                                     K2 < Serialized data >             K4 < Serialized data >
              K1 < Serialized data >
                                                     K10 < Serialized data >            K5 < Serialized data >
              K2 < Serialized data >
                                                     K30 < Serialized data >            K10 < Serialized data >
              K3 < Serialized data >



                                   DELETED
                                                     --                                 --
              --
                                        Sorted       --                        Sorted   --
Sorted        --
                                                     --                                 --
              --




                                            MERGE SORT


   Index File
                                                   K1 < Serialized data >
          Loaded in memory                         K2 < Serialized data >
                                                   K3 < Serialized data >
         K1 Offset
                                                   K4 < Serialized data >
         K5 Offset                     Sorted
                                                   K5 < Serialized data >
         K30 Offset
                                                   K10 < Serialized data >
         Bloom Filter
                                                   K30 < Serialized data >

                                                 Data File
Write Properties
•   No locks in the critical path
•   Sequential disk access
•   Behaves like a write back Cache
•   Append support without read ahead
•   Atomicity guarantee for a key
• “Always Writable”
    – accept writes during failure scenarios
Read
                         Client


                  Query       Result

                       Cassandra Cluster


          Closest replica     Result                   Read repair if
                                                       digests differ
                        Replica A


                       Digest Query
Digest Response                            Digest Response


           Replica B                   Replica C
Partitioning And Replication
                          1 0           h(key1)
                    E
                                      A           N=3

          C

h(key2)                                    F


                                       B
              D

                          1/2
                                                        10
Cluster Membership and Failure
              Detection
•   Gossip protocol is used for cluster membership.
•   Super lightweight with mathematically provable properties.
•   State disseminated in O(logN) rounds where N is the number of
    nodes in the cluster.
•   Every T seconds each member increments its heartbeat counter and
    selects one other member to send its list to.
•   A member merges the list with its own list .
Accrual Failure Detector
•   Valuable for system management, replication, load balancing etc.
•   Defined as a failure detector that outputs a value, PHI, associated
    with each process.
•   Also known as Adaptive Failure detectors - designed to adapt to
    changing network conditions.
•   The value output, PHI, represents a suspicion level.
•   Applications set an appropriate threshold, trigger suspicions and
    perform appropriate actions.
•   In Cassandra the average time taken to detect a failure is 10-15
    seconds with the PHI threshold set at 5.
Properties of the Failure Detector
•   If a process p is faulty, the suspicion level
                  Φ(t)     ∞as t     ∞.
•   If a process p is faulty, there is a time after which Φ(t) is monotonic
    increasing.
•   A process p is correct      Φ(t) has an ub over an infinite execution.
•   If process p is correct, then for any time T,
                  Φ(t) = 0 for t >= T.
Implementation
•   PHI estimation is done in three phases
     – Inter arrival times for each member are stored in a sampling
       window.
     – Estimate the distribution of the above inter arrival times.
     – Gossip follows an exponential distribution.
     – The value of PHI is now computed as follows:
         • Φ(t) = -log10( P(tnow – tlast) )
                   where P(t) is the CDF of an exponential distribution. P(t) denotes the
                   probability that a heartbeat will arrive more than t units after the previous
                   one. P(t) = ( 1 – e-tλ )
The overall mechanism is described in the figure below.
Information Flow in the
    Implementation
Performance Benchmark
• Loading of data - limited by network
  bandwidth.
• Read performance for Inbox Search in
  production:

              Search Interactions Term Search
    Min       7.69 ms            7.78 ms
    Median    15.69 ms           18.27 ms
    Average   26.13 ms           44.41 ms
MySQL Comparison
• MySQL > 50 GB Data
  Writes Average : ~300 ms
  Reads Average : ~350 ms
• Cassandra > 50 GB Data
  Writes Average : 0.12 ms
  Reads Average : 15 ms
Lessons Learnt
• Add fancy features only when absolutely
  required.
• Many types of failures are possible.
• Big systems need proper systems-level
  monitoring.
• Value simple designs
Future work
•   Atomicity guarantees across multiple keys
•   Analysis support via Map/Reduce
•   Distributed transactions
•   Compression support
•   Granular security via ACL’s
Questions?

More Related Content

What's hot

Cheatsheet of msdos
Cheatsheet of msdosCheatsheet of msdos
Cheatsheet of msdos
Shuvradeb Barman Srijon
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
 
Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)
Steve Loughran
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
robosonia Mar
 
Advanced Windows Debugging
Advanced Windows DebuggingAdvanced Windows Debugging
Advanced Windows Debugging
Bala Subra
 
Postgre sql run book
Postgre sql run bookPostgre sql run book
Postgre sql run book
Vasudeva Rao
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
Big Data Spain
 

What's hot (8)

Cheatsheet of msdos
Cheatsheet of msdosCheatsheet of msdos
Cheatsheet of msdos
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
 
Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)Availability and Integrity in hadoop (Strata EU Edition)
Availability and Integrity in hadoop (Strata EU Edition)
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
 
Advanced Windows Debugging
Advanced Windows DebuggingAdvanced Windows Debugging
Advanced Windows Debugging
 
Postgre sql run book
Postgre sql run bookPostgre sql run book
Postgre sql run book
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
 

Similar to Cassandra Nosql

Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
Schubert Zhang
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
srisatish ambati
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TERyosuke IWANAGA
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
Yan Cui
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
Sid Anand
 
Dbms &amp; oracle
Dbms &amp; oracleDbms &amp; oracle
Dbms &amp; oracle
J VijayaRaghavan
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
Morningstar Tech Talks
 
How nebula graph index works
How nebula graph index worksHow nebula graph index works
How nebula graph index works
Nebula Graph
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!
Acunu
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis Reznik
Sigma Software
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
Elasticsearch
 
2013 london advanced-replication
2013 london advanced-replication2013 london advanced-replication
2013 london advanced-replication
Marc Schwering
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database management
Leyi (Kamus) Zhang
 
DBMS Chapter-3.ppsx
DBMS Chapter-3.ppsxDBMS Chapter-3.ppsx
DBMS Chapter-3.ppsx
DharmikPatel745100
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache Cassandra
DataStax Academy
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
Duyhai Doan
 

Similar to Cassandra Nosql (20)

Cassandra NoSQL
Cassandra NoSQLCassandra NoSQL
Cassandra NoSQL
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Dbms &amp; oracle
Dbms &amp; oracleDbms &amp; oracle
Dbms &amp; oracle
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
How nebula graph index works
How nebula graph index worksHow nebula graph index works
How nebula graph index works
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!Cassandra deep-dive @ NoSQLNow!
Cassandra deep-dive @ NoSQLNow!
 
SQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis ReznikSQL Server Deep Dive, Denis Reznik
SQL Server Deep Dive, Denis Reznik
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
 
2013 london advanced-replication
2013 london advanced-replication2013 london advanced-replication
2013 london advanced-replication
 
Oracle 12.2 sharded database management
Oracle 12.2 sharded database managementOracle 12.2 sharded database management
Oracle 12.2 sharded database management
 
DBMS Chapter-3.ppsx
DBMS Chapter-3.ppsxDBMS Chapter-3.ppsx
DBMS Chapter-3.ppsx
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache Cassandra
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 

More from elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
elliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
elliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
elliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

More from elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Cassandra Nosql

  • 1. Cassandra Structured Storage System over a P2P Network Avinash Lakshman, Prashant Malik
  • 2. Why Cassandra? • Lots of data – Copies of messages, reverse indices of messages, per user data. • Many incoming requests resulting in a lot of random reads and random writes. • No existing production ready solutions in the market meet these requirements.
  • 3. Design Goals • High availability • Eventual consistency – trade-off strong consistency in favor of high availability • Incremental scalability • Optimistic Replication • “Knobs” to tune tradeoffs between consistency, durability and latency • Low total cost of ownership • Minimal administration
  • 4. Data Model Columns are added and ColumnFamily1 Name : MailList modified Type : Simple Sort : Name KEY Name : tid1 Name : tid2 Name : tid3 dynamically Name : tid4 Value : <Binary> Value : <Binary> Value : <Binary> Value : <Binary> TimeStamp : t1 TimeStamp : t2 TimeStamp : t3 TimeStamp : t4 ColumnFamily2 Name : WordList Type : Super Sort : Time Column Families Name : aloha Name : dude are declared C1 C2 C3 C4 C2 C6 upfront SuperColumns V1 V2 V3 V4 V2 V6 are added and T1 T2 T3 T4 T2 T6 modified Columns are dynamically added and modified ColumnFamily3 Name : System Type : Super Sort : Name dynamically Name : hint1 Name : hint2 Name : hint3 Name : hint4 <Column List> <Column List> <Column List> <Column List>
  • 5. Write Operations • A client issues a write request to a random node in the Cassandra cluster. • The “Partitioner” determines the nodes responsible for the data. • Locally, write operations are logged and then applied to an in-memory version. • Commit log is stored on a dedicated disk local to the machine.
  • 6. Write cont’d Key (CF1 , CF2 , CF3) • Data size • Number of Objects Memtable ( CF1) • Lifetime Commit Log Memtable ( CF2) Binary serialized Key ( CF1 , CF2 , CF3 ) Memtable ( CF2) Data file on disk <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> K128 Offset --- --- K256 Offset BLOCK Index <Key Name> Offset, <Key Name> Offset Dedicated Disk K384 Offset --- --- Bloom Filter <Key name><Size of key Data><Index of columns/supercolumns>< Serialized column family> (Index in memory)
  • 7. Compactions K2 < Serialized data > K4 < Serialized data > K1 < Serialized data > K10 < Serialized data > K5 < Serialized data > K2 < Serialized data > K30 < Serialized data > K10 < Serialized data > K3 < Serialized data > DELETED -- -- -- Sorted -- Sorted -- Sorted -- -- -- -- MERGE SORT Index File K1 < Serialized data > Loaded in memory K2 < Serialized data > K3 < Serialized data > K1 Offset K4 < Serialized data > K5 Offset Sorted K5 < Serialized data > K30 Offset K10 < Serialized data > Bloom Filter K30 < Serialized data > Data File
  • 8. Write Properties • No locks in the critical path • Sequential disk access • Behaves like a write back Cache • Append support without read ahead • Atomicity guarantee for a key • “Always Writable” – accept writes during failure scenarios
  • 9. Read Client Query Result Cassandra Cluster Closest replica Result Read repair if digests differ Replica A Digest Query Digest Response Digest Response Replica B Replica C
  • 10. Partitioning And Replication 1 0 h(key1) E A N=3 C h(key2) F B D 1/2 10
  • 11. Cluster Membership and Failure Detection • Gossip protocol is used for cluster membership. • Super lightweight with mathematically provable properties. • State disseminated in O(logN) rounds where N is the number of nodes in the cluster. • Every T seconds each member increments its heartbeat counter and selects one other member to send its list to. • A member merges the list with its own list .
  • 12.
  • 13.
  • 14.
  • 15.
  • 16. Accrual Failure Detector • Valuable for system management, replication, load balancing etc. • Defined as a failure detector that outputs a value, PHI, associated with each process. • Also known as Adaptive Failure detectors - designed to adapt to changing network conditions. • The value output, PHI, represents a suspicion level. • Applications set an appropriate threshold, trigger suspicions and perform appropriate actions. • In Cassandra the average time taken to detect a failure is 10-15 seconds with the PHI threshold set at 5.
  • 17. Properties of the Failure Detector • If a process p is faulty, the suspicion level Φ(t) ∞as t ∞. • If a process p is faulty, there is a time after which Φ(t) is monotonic increasing. • A process p is correct Φ(t) has an ub over an infinite execution. • If process p is correct, then for any time T, Φ(t) = 0 for t >= T.
  • 18. Implementation • PHI estimation is done in three phases – Inter arrival times for each member are stored in a sampling window. – Estimate the distribution of the above inter arrival times. – Gossip follows an exponential distribution. – The value of PHI is now computed as follows: • Φ(t) = -log10( P(tnow – tlast) ) where P(t) is the CDF of an exponential distribution. P(t) denotes the probability that a heartbeat will arrive more than t units after the previous one. P(t) = ( 1 – e-tλ ) The overall mechanism is described in the figure below.
  • 19. Information Flow in the Implementation
  • 20. Performance Benchmark • Loading of data - limited by network bandwidth. • Read performance for Inbox Search in production: Search Interactions Term Search Min 7.69 ms 7.78 ms Median 15.69 ms 18.27 ms Average 26.13 ms 44.41 ms
  • 21. MySQL Comparison • MySQL > 50 GB Data Writes Average : ~300 ms Reads Average : ~350 ms • Cassandra > 50 GB Data Writes Average : 0.12 ms Reads Average : 15 ms
  • 22. Lessons Learnt • Add fancy features only when absolutely required. • Many types of failures are possible. • Big systems need proper systems-level monitoring. • Value simple designs
  • 23. Future work • Atomicity guarantees across multiple keys • Analysis support via Map/Reduce • Distributed transactions • Compression support • Granular security via ACL’s