Cassandra as an Email Store

    Rustam Aliyev • 20 Feb 2012
Emails sent worldwide




4,500,000/sec
Source: Email Statistics Report 2009-2013, The Radicati Group.
Email storage problem


       MTA        LDA




   Filesystem + RDBMS ≠ Scalability + Availability
ElasticInbox 1000 ft view

   [Diagram: MTA → elasticinbox nodes (load-balancing, share-nothing).
    Each node stores Message Metadata in the Metadata Store (Cassandra Cluster)
    and the Original Message in the Blob Store (OpenStack, AWS S3, others).]
Why Cassandra?

Horizontal Scalability

High Availability, no SPOF and Automatic
Replication

Flexible schema

Counters

Email storage does more writes than reads
  spam, sent mails, notifications, mailing lists, unread
  emails, ...

Why not Cassandra for BLOBs?
Thrift does not support streaming

  Value has to fit into memory

  Default max Thrift frame size is 5MB



Possible solution: split large files into 1MB
chunks

  Less than 2% of emails >1MB (in our case)
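
A minimal sketch of the chunking idea above, assuming each 1MB chunk would be written as its own column under the message's row. The column naming scheme (`chunk:000000`) and the helper names are hypothetical; they are not taken from ElasticInbox.

```python
from typing import Iterator, List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1MB, the chunk size suggested on the slide


def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (column_name, chunk) pairs; zero-padded names sort in chunk order."""
    for index, offset in enumerate(range(0, len(blob), chunk_size)):
        yield f"chunk:{index:06d}", blob[offset:offset + chunk_size]


def join_chunks(columns: List[Tuple[str, bytes]]) -> bytes:
    """Reassemble a blob from its chunk columns, whatever order they arrive in."""
    return b"".join(chunk for _, chunk in sorted(columns))


message = b"x" * (2 * CHUNK_SIZE + 512 * 1024)  # a 2.5MB message
columns = list(split_into_chunks(message))      # two full chunks plus a 512KB tail
```

Each chunk stays well under the Thrift frame limit, but every chunk still has to pass through the JVM heap, which is the concern of the next slide.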

Why not Cassandra for BLOBs?
Wasted RAM / JVM Heap

  200 x 5MB messages R/W = 1GB RAM

Wasted disk space

  When RF=3, disk space = 6 × data

  1TB data → 6TB storage required!

Wasted CPU

  More CPU used during compactions

Leveled Compaction Strategy?
  New (1.0+), less wasted storage but more I/O.
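
The figures on this slide can be reproduced with simple arithmetic. The sketch below assumes decimal units (1GB = 1000MB) and reads the 6× figure as RF=3 multiplied by a 2× allowance for free space during compaction. That breakdown is an interpretation; the slide only states the total.

```python
def heap_in_flight_gb(concurrent_messages: int, message_mb: float) -> float:
    """Heap held by messages that are being read or written at the same time."""
    return concurrent_messages * message_mb / 1000


def raw_disk_tb(data_tb: float, replication_factor: int, compaction_headroom: float) -> float:
    """Disk needed for data_tb of BLOBs; compaction_headroom=2 is an assumed factor."""
    return data_tb * replication_factor * compaction_headroom


heap_gb = heap_in_flight_gb(concurrent_messages=200, message_mb=5)
disk_tb = raw_disk_tb(data_tb=1, replication_factor=3, compaction_headroom=2)
print(f"heap: {heap_gb} GB, disk: {disk_tb} TB")
```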

BLOB Stores for BLOBs
BLOB Stores are designed for storing BLOBs

Can store unlimited number of objects in a single
container.

AWS S3, OpenStack Object Store, and 15 other
providers supported (thanks @jclouds!).

40%-50% more space efficient than BLOBs in
Cassandra (w/RF=3; 1TB → 3.5TB, rather than
6TB).

Cons: much slower than Cassandra (no memtable).
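
A quick check of the space-efficiency claim, using only the two totals quoted above (3.5TB in a blob store versus 6TB in Cassandra for 1TB of data). The variable names are illustrative.

```python
cassandra_tb = 6.0    # storage for 1TB of BLOBs in Cassandra at RF=3 (from the slide)
blob_store_tb = 3.5   # storage for the same data in a blob store (from the slide)

saving = 1 - blob_store_tb / cassandra_tb
print(f"blob store needs {saving:.0%} less raw storage")
```

The result sits at the lower end of the quoted 40%-50% range.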

Polyglot Persistence
Martin Fowler: “any decent sized enterprise will
have a variety of different data storage
technologies for different kinds of data”




Martin Fowler, 16 Nov 2011
Don't take the example in the diagram too seriously.


Data Model
NoSQL data model is driven by data access patterns:

   Email is immutable

   Mostly, very recent messages are accessed and updated



But sometimes, access patterns are driven by the NoSQL data model:

   Synergy between programming model and data model

   Some Gmail features driven by BigTable limitations?

      Labels instead of folders

      No custom sorting, only by time

   Other examples: “More” pagination

Data Model ‒ Column Families
4 Column Families:
  MessageMetadata
  IndexLabels
  Accounts
  Counters


Account ID: String (user@domain.tld)
Message ID: TimeUUID
Label ID:       Integer
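
To illustrate why a TimeUUID works as a message ID, here is a small sketch using Python's standard `uuid` module, whose version 1 UUIDs embed a timestamp. The helper names are hypothetical. Note that Python compares UUIDs by their raw 128-bit value, which is not chronological for version 1, so the sketch sorts on the embedded timestamp, which is what Cassandra's TimeUUID comparator does.

```python
import uuid
from datetime import datetime, timezone

# 100ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def new_message_id() -> uuid.UUID:
    """Create a time-based (version 1) UUID to use as a message ID."""
    return uuid.uuid1()


def message_time(message_id: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    unix_seconds = (message_id.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


first = new_message_id()
second = new_message_id()

# Order by embedded timestamp, not by the UUID's raw integer value.
ordered = sorted([second, first], key=lambda u: u.time)
```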


Data Model ‒ Accounts
Column Family
Reserved Labels: 0 = All Mails, 1 = Inbox, 2 =
Drafts, ...

        "Accounts" {
            "user@elasticinbox.com" {
                "label:0" : "all",
                "label:1" : "inbox",
                "label:2" : "drafts",
                "label:230": "Custom Label",
                ...
            }
        }




Data Model ‒ IndexLabels
Column Family
Composite Key : Account + Label ID
Messages ordered by time
 "IndexLabels" {
     "user@elasticinbox.com:0" {   # All Mails
         "550e8400-e29b-41d4-a716-446655440000"   : null,
         "892e8300-e29b-41d4-a716-446655440000"   : null,
         "a0232400-e29b-41d4-a716-446655440000"   : null,
         ...
     }
     "user@elasticinbox.com:1" {   # Inbox
         "550e8400-e29b-41d4-a716-446655440000"   : null,
         "892e8300-e29b-41d4-a716-446655440000"   : null,
         "a0232400-e29b-41d4-a716-446655440000"   : null,
         ...
     }
 }
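
A rough in-memory sketch of how the index above might be written. A plain dictionary stands in for the IndexLabels column family, and the function names are hypothetical. It assumes every message is indexed under the reserved label 0 (All Mails) in addition to its own labels, which matches the example rows but is not stated explicitly.

```python
from collections import defaultdict
from typing import Dict, Iterable

ALL_MAILS = 0  # reserved label; assumed here to be attached to every message

# row key -> {message id (column name): None}; stands in for the IndexLabels CF
index_labels: Dict[str, Dict[str, None]] = defaultdict(dict)


def index_row_key(account: str, label_id: int) -> str:
    """Build the composite row key: account plus label ID."""
    return f"{account}:{label_id}"


def index_message(account: str, message_id: str, label_ids: Iterable[int]) -> None:
    """Add the message ID as an empty column under each of its label rows."""
    for label_id in {ALL_MAILS, *label_ids}:
        index_labels[index_row_key(account, label_id)][message_id] = None


index_message("user@elasticinbox.com", "550e8400-e29b-41d4-a716-446655440000", [1])
index_message("user@elasticinbox.com", "892e8300-e29b-41d4-a716-446655440000", [])
```

Because the column value is empty, the row is a pure index: listing a label only ever reads message IDs.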

Data Model ‒ MessageMetadata
SuperColumn Family

Stores message metadata and pre-parsed
contents

 Message headers, body and attachment info

TimeUUID as unique Message ID, ordered by
time




Data Model ‒ MessageMetadata
"MessageMetadata" {
    "user@elasticinbox.com" {
        "550e8400-e29b-41d4-a716-446655440000" {
            "from"    : "[['Test','test@elasticinbox.com']]",
            "to"      : "[['Me','user@elasticinbox.com'],[…]]",
            "subject" : "Hello world!",
            "date"    : "12 March 2011 01:12:00",
            "uri"     : "blob://aws-s3/550e8400-e29b-41d4-a716-446655440000",
            "l:1"     : null,   # Label ID
            "m:1"     : null,   # Marker ID
            "html"    : "<html><body>This is message body</body></html>",
            "parts"   : "{'2.1': {'filename': 'image.png', ...}}",
            ...
        }
        "892e8300-e29b-41d4-a716-446655440000" {
            ...
        }
        ...
    }
}

Data Model ‒ MessageMetadata
Query: List 30 newest messages with label “Inbox”
   ids[] = SliceQuery("IndexLabels", "user@dom.tld:1", 30)

   msg[] = MultigetQuery("MessageMetadata", "user@dom.tld", ids[])



 Row Key         SuperColumn                            SubColumns
 user@dom.tld    550e8400-e29b-41d4-a716-446655440000   "from"      :   "..."
                                                        "to"        :   "..."
                                                        "subject"   :   "..."
                                                        "html"      :   "..."
                 892e8300-e29b-41d4-a716-446655440000   - // -
                 a0232400-e29b-41d4-a716-446655440000   - // -
 some@dom2.tld   e5586600-f81d-11df-8cc2-080027267700   - // -
                 e5595060-f81d-11df-bc91-080027267700   - // -
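
The two-step lookup above can be sketched with in-memory dictionaries standing in for the two column families. This is not a Cassandra client: `slice_query` and `multiget_query` are hypothetical stand-ins for the pseudo-calls on the slide, and "newest first" is assumed to mean a reversed slice over TimeUUID column names.

```python
import uuid
from typing import Dict, List

# In-memory stand-ins for the two column families (hypothetical).
index_labels: Dict[str, List[uuid.UUID]] = {}            # row key -> message ids
message_metadata: Dict[str, Dict[uuid.UUID, dict]] = {}  # account -> id -> subcolumns


def slice_query(row_key: str, count: int) -> List[uuid.UUID]:
    """Return up to `count` newest message IDs, ordered by embedded timestamp."""
    ids = index_labels.get(row_key, [])
    return sorted(ids, key=lambda i: i.time, reverse=True)[:count]


def multiget_query(account: str, ids: List[uuid.UUID]) -> List[dict]:
    """Fetch the metadata supercolumns for the given message IDs, in order."""
    row = message_metadata.get(account, {})
    return [row[i] for i in ids if i in row]


def list_newest(account: str, label_id: int, count: int = 30) -> List[dict]:
    ids = slice_query(f"{account}:{label_id}", count)
    return multiget_query(account, ids)


account = "user@dom.tld"
for n in range(40):  # populate 40 inbox messages
    mid = uuid.uuid1()
    index_labels.setdefault(f"{account}:1", []).append(mid)
    message_metadata.setdefault(account, {})[mid] = {"subject": f"message {n}"}

inbox = list_newest(account, label_id=1)
```

The first step touches only the narrow index row; the second reads metadata for just the 30 IDs it returned, all from a single account row.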


Data Model ‒ Counters

SuperColumn Family
All of an account's counters are stored on the same node
Non-atomic counters: it's easy to miscount
"Counters" {
    "user@elasticinbox.com"    {
        "l:0" {
             "total_bytes" :   18239090,
             "total_msg"   :   394,
             "new_msg"     :   12
        }
        "l:1" {
             "total_msg"   :   144,
             "new_msg"     :   10
        }
        ...
    }
}
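
One way a miscount can happen, shown as a simulation rather than real client code: counter increments are not idempotent, so if an increment is applied but its acknowledgement is lost, a client retry counts the same message twice. The scenario and the `FlakyCounterStore` class are assumptions made for illustration; the slides do not say which failure mode the author had in mind.

```python
class FlakyCounterStore:
    """Stand-in for a single counter column; not a real Cassandra client."""

    def __init__(self, fail_on_calls):
        self.value = 0
        self.calls = 0
        self.fail_on_calls = set(fail_on_calls)

    def increment(self, delta: int = 1) -> None:
        self.calls += 1
        self.value += delta  # the write is applied on the server...
        if self.calls in self.fail_on_calls:
            raise TimeoutError("ack lost")  # ...but the client never hears back


def increment_with_retry(store: FlakyCounterStore, delta: int = 1, attempts: int = 3) -> None:
    """Naive retry loop: the client cannot tell whether a timed-out write landed."""
    for _ in range(attempts):
        try:
            store.increment(delta)
            return
        except TimeoutError:
            continue
    raise TimeoutError("gave up")


new_msg = FlakyCounterStore(fail_on_calls=[2])
for _ in range(3):  # three messages arrive
    increment_with_retry(new_msg)

print(new_msg.value)  # 4, although only 3 messages were delivered
```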


ElasticInbox in Production
In production since Nov 2011

~200K accounts, 30M+ messages

4 node cluster, RF=3, Cassandra 0.8.x

Each 1TB of raw mails = 70GB in Cassandra

  Metadata + LZF compressed email text/html
  body




ElasticInbox in Production
Cassandra load : 40 requests per second per
node

Cassandra latency: 10ms read average, 0.02ms
write

Write to Read ratio:
         CF Name            W:R Ratio
     MessageMetadata           3:1
        IndexLabels            2:1
         Accounts             1:50
         Counters              2:3


Future work

Performance improvements (may involve minor
schema changes)

Full-text search (preferably on top of Cassandra)

POP3 and IMAP

Built-in filtering rules

Message threads / conversations



Questions?


www.elasticinbox.com

github.com/elasticinbox

@elasticinbox   @rstml


Editor's Notes

  (Numbered by slide in the original 36-slide deck; slides without notes are omitted.)

  8-11. Compaction everywhere, Outlook example
  12. OpenStack Swift has similarities with Cassandra design.
  29. SuperColumns implementation planned to be replaced in Cassandra 1.2
  30-32. Designed for ad/page impression count at Digg/Twitter