SortaSQL
Ian Pye <ian@cloudflare.com>
Motivation
Everyone likes SQL
• Tables
• Joins
• Online Transaction Processing
• Transactions!
• Arbitrary Queries
Scaling?
• What happens to joins when your data
  doesn’t fit in memory?
• I only need get and set for my data
• Sharding is too hard/unreliable
• A “monopolistically competitive market”?
Scaling




Seamless Horizontal Scalability from 1 to N
Proposal:
    Let the Filesystem do the hard work
• RDBMS presents a full SQL interface to
  applications, automatically accessing files to get
  data as needed
• RDBMS stores metadata allowing it to find the
  right data files
• Embedded key/value store handles the record
  level storage, locking, caching, etc.
• FS (local or distributed) stores data and is
  responsible for replication, performance, locking,
  etc.
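A minimal sketch of the two-level lookup described above, not the actual SortaSQL code. Everything here is a stand-in chosen so the example runs with only the Python standard library: sqlite3 plays the RDBMS that holds metadata, dbm.dumb plays the embedded key/value store, and a temporary directory plays the filesystem. The table name data_files and the helpers put and get are hypothetical.

```python
import dbm.dumb
import os
import sqlite3
import tempfile

# Stand-ins (assumptions): sqlite3 for the metadata RDBMS, dbm.dumb for the
# embedded key/value store, a temp directory for the filesystem.
data_dir = tempfile.mkdtemp()

meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE data_files (owner TEXT, period TEXT, path TEXT)")


def put(owner, period, key, value):
    """Write one record into the key/value file for (owner, period)."""
    path = os.path.join(data_dir, f"{owner}_{period}")
    with dbm.dumb.open(path, "c") as kv:
        kv[key] = value
    # Register the file in the metadata table the first time it is used.
    known = meta.execute(
        "SELECT 1 FROM data_files WHERE owner = ? AND period = ?",
        (owner, period),
    ).fetchone()
    if known is None:
        meta.execute(
            "INSERT INTO data_files VALUES (?, ?, ?)", (owner, period, path)
        )


def get(owner, period, key):
    """Find the right file through the metadata, then read the record."""
    row = meta.execute(
        "SELECT path FROM data_files WHERE owner = ? AND period = ?",
        (owner, period),
    ).fetchone()
    if row is None:
        return None  # no data file registered for this owner and period
    with dbm.dumb.open(row[0], "r") as kv:
        return kv.get(key.encode())  # dbm keys and values are bytes


put("example.com", "2011-06-01", "hits", "42")
print(get("example.com", "2011-06-01", "hits"))  # prints b'42'
```

The application only ever sees put and get (or, in the real system, SQL); which file holds the record is a metadata question, and where that file lives is the filesystem's problem.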
Major Wins
• Scales continuously from 1 to 100 servers (FS
  permitting)
• Hot/cold storage hierarchy
• Allows ad-hoc queries via mature SQL
• Everyone already has built-in bindings
Architecture

• Application talks SQL to PostgreSQL
• PostgreSQL stores metadata
• PostgreSQL performs post-processing on rows
  retrieved from Kyoto Cabinet (KC) files
• KC files live on a POSIX filesystem
Architecture
[Diagram, top to bottom: Application → SQL → PostgreSQL with the SortaSQL Plugin → four libKC instances → four Kyoto Cabinet files → two filesystems]
Bigtable
“A Bigtable is a sparse, distributed, persistent
       multidimensional sorted map.”
Multi-Dimensional
      • Storing values as protocol
        buffers allows for arbitrarily
        complex maps
      • Logic so that when maps
        get too big, they are
        promoted to top level KC
        stores
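A rough sketch of what the promotion logic could look like. The slides do not say when a map counts as "too big" or how the promoted store is referenced, so the threshold MAX_INLINE_ENTRIES, the "$ref" marker, and the store and load helpers are all hypothetical. Plain dicts stand in for KC stores.

```python
# Hypothetical threshold: the slides do not define "too big".
MAX_INLINE_ENTRIES = 3

# Stands in for the set of top-level KC stores, keyed by store name.
top_level = {}


def store(table, key, value):
    """Store value under key; promote an oversized nested map to its own store."""
    if isinstance(value, dict) and len(value) > MAX_INLINE_ENTRIES:
        promoted_name = f"{table}/{key}"
        top_level[promoted_name] = dict(value)
        # Leave a reference behind so readers can follow it.
        top_level.setdefault(table, {})[key] = {"$ref": promoted_name}
    else:
        top_level.setdefault(table, {})[key] = value


def load(table, key):
    """Read a value, following the reference if the map was promoted."""
    value = top_level[table][key]
    if isinstance(value, dict) and "$ref" in value:
        return top_level[value["$ref"]]
    return value


store("stats", "small", {"hits": 1})
store("stats", "big", {"a": 1, "b": 2, "c": 3, "d": 4})
print(sorted(top_level))  # prints ['stats', 'stats/big']
```

Callers use load either way, so promotion stays invisible to them.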
Persistent and Sorted
• Any Key/Value store which allows for
  binary values accessed via a B+Tree of keys
  will do
• We use Kyoto Cabinet (successor to Tokyo
  Cabinet)
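To illustrate why sorted keys matter, here is a toy in-memory store that keeps its keys ordered and answers prefix scans. It is only a stand-in for a B+Tree-backed store: the class SortedStore and its methods are made up for this sketch and are not the Kyoto Cabinet API.

```python
import bisect


class SortedStore:
    """Toy sorted key/value map; a sorted list stands in for a B+Tree."""

    def __init__(self):
        self._keys = []    # kept in sorted order
        self._values = {}  # key -> binary value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._values[key]


s = SortedStore()
s.put(b"example.com:2011-06-02", b"\x05")
s.put(b"other.org:2011-06-01", b"\x03")
s.put(b"example.com:2011-06-01", b"\x0a")
print([k for k, _ in s.scan_prefix(b"example.com:")])
# prints [b'example.com:2011-06-01', b'example.com:2011-06-02']
```

With sorted keys, "everything for one owner" is one seek plus a sequential read instead of a full scan.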
Sparse
• Values can be arbitrarily different.
• NULLs are free (or cheap)
• Protocol Buffers again to the rescue.
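A small sketch of why NULLs cost nothing in this model: a field with no value is simply left out of the stored blob. JSON is used here as a stand-in for protocol buffers so the example runs without extra dependencies; the encode and decode helpers are hypothetical.

```python
import json


def encode(record):
    """Serialize a record, omitting fields that have no value."""
    present = {k: v for k, v in record.items() if v is not None}
    return json.dumps(present, sort_keys=True).encode()


def decode(blob, field, default=None):
    """Read one field; a missing field reads back as the default (NULL)."""
    return json.loads(blob).get(field, default)


dense = encode({"hits": 3, "referrer": "example.org"})
sparse = encode({"hits": 3, "referrer": None})

print(sparse)                      # prints b'{"hits": 3}'
print(decode(sparse, "referrer"))  # prints None
print(len(sparse) < len(dense))    # prints True
```

Two rows in the same store can carry completely different fields, and the reader only pays for what is actually present.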
Distributed
• All about the filesystem here
Sharding Made Easy
• Fine-grained metadata allows for an efficient
  storage hierarchy
SortaSQL: Summary

• Bigtable-like structure
• Accessed via SQL (PHP bindings come for free!)
• Offload the hard part to the Filesystem Folks
Case Study: CloudFlare
• 400 GB data/day (Medium Data?)
 • Facebook = 25 TB data/day
 • USPS = 25.6 GB text data delivered/day
• Mix of Flash and Magnetic storage
• Mirrored
• Fixed user queries
• Random BizDev queries
Data Scheme
Metadata



Partitioned by owner, period and data type
Disk Layout

• 2 × 80 GB SSDs (small
  and blazing)
• 2.5 TB RAID 5 (big and
  slow)
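One way the hot/cold split could be driven from the partition metadata (owner, period, data type) is sketched below. The slides do not give the placement rule, so the age-based cutoff HOT_DAYS, the mount points, the directory layout, and the file_path helper are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical mount points and cutoff; none of these come from the slides.
HOT_ROOT = "/mnt/ssd"    # small and fast
COLD_ROOT = "/mnt/raid"  # big and slow
HOT_DAYS = 7


def file_path(owner, period, data_type, today):
    """Pick a storage tier by the age of the period, then build the file path."""
    age = today - period
    root = HOT_ROOT if age <= timedelta(days=HOT_DAYS) else COLD_ROOT
    return f"{root}/{owner}/{period.isoformat()}/{data_type}.kct"


print(file_path("example.com", date(2011, 6, 1), "hits", today=date(2011, 6, 3)))
# prints /mnt/ssd/example.com/2011-06-01/hits.kct
print(file_path("example.com", date(2011, 6, 1), "hits", today=date(2011, 7, 1)))
# prints /mnt/raid/example.com/2011-06-01/hits.kct
```

Because each file covers one owner, period, and data type, moving a partition from hot to cold storage is a file move plus a metadata update.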
Disk Layout
First Steps
(access some records)
Silly SQL Tricks
Window Functions are
    Your Friends
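As an illustration of the kind of query a window function makes easy, the sketch below computes a running total of hits per owner. The slides do not show the actual query; the hits table and its columns are invented, and sqlite3 (SQLite 3.25 or later is needed for window functions) stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real schema is not shown in the slides.
conn.execute("CREATE TABLE hits (owner TEXT, period TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("a.com", "2011-06-01", 10),
        ("a.com", "2011-06-02", 5),
        ("a.com", "2011-06-03", 7),
        ("b.com", "2011-06-01", 3),
    ],
)

# Running total per owner, ordered by period, without a self-join.
rows = conn.execute(
    """
    SELECT owner, period, hits,
           SUM(hits) OVER (PARTITION BY owner ORDER BY period) AS running_total
    FROM hits
    ORDER BY owner, period
    """
).fetchall()

for row in rows:
    print(row)
```

The same SUM(...) OVER (PARTITION BY ... ORDER BY ...) syntax is standard SQL and is also accepted by PostgreSQL.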
NoSQL to MySQL with
    Memcached
          • Replace Language not
            Storage Engine
          • Speak Memcached not
            SQL
In Context
• https://github.com/cloudflare/SortaSQL
• http://dev.mysql.com/tech-resources/articles/
  nosql-to-mysql-with-memcached.html
• http://queue.acm.org/detail.cfm?id=1961297


Editor's Notes

  • #2 Hi, I’m Ian and I’m going to be talking about a DB project that I’ve been working on at CloudFlare
  • #3 What is CF? Web access logs: who did what
  • #6 HBase not a good match for us.
  • #7 Instead of buying more hardware, we decided to get fancy. A lot like HBase and Hadoop.
  • #9 Go fast here
  • #10 Go slow here
  • #11 I’m now going to try to convince you that this is not just a CF-specific tool, but general purpose, i.e. just like Bigtable
  • #19 Compound index leading to a set of counters (key/value pairs)
  • #20 How it is seen in the DB: given owner, period, and data type, recover rows using a kcx function