CloudFlare looked at several NoSQL and SQL solutions and ended up with a hybrid model where many Kyoto Cabinet DBs are accessed via a Postgres wrapper. This presentation describes the resulting novel architecture which combines the horizontal scalability of NoSQL solutions with the flexibility and stability of SQL.



  • Hi, I’m Ian, and I’m going to be talking about a DB project that I’ve been working on at CloudFlare.
  • What is CF? Web access logs: who did what.
  • HBase was not a good match for us.
  • Instead of buying more hardware, we decided to get fancy. A lot like HBase and Hadoop.
  • Go fast here.
  • Go slow here.
  • I’m now going to try to convince you that this is not just a CF-specific tool, but general purpose, i.e., just like Bigtable.
  • Compound index leading to a set of counters (key/value pairs).
  • How it is seen in the DB: given owner, period, and data type, recover rows using a kcx function.

SortaSQL Presentation Transcript

  • SortaSQL
      • Ian Pye
  • Motivation
  • Everyone likes SQL
      • Tables
      • Joins
      • Online Transaction Processing
      • Transactions!
      • Arbitrary Queries
  • Scaling?
      • What happens to joins when your data doesn’t fit in memory?
      • I only need get and set for my data
      • Sharding is too hard/unreliable
      • A “monopolistically competitive market”?
  • Scaling
      • Seamless Horizontal Scalability from 1 to N
  • Proposal: Let the Filesystem do the hard work
      • RDBMS presents a full SQL interface to applications, automatically accessing files to get data as needed
      • RDBMS stores metadata allowing it to find the right data files
      • Embedded key/value store handles the record-level storage, locking, caching, etc.
      • FS (local or distributed) stores data and is responsible for replication, performance, locking, etc.
  • Major Wins
      • Scales continuously from 1 to 100 servers (FS permitting)
      • Hot/cold storage hierarchy
      • Allows ad-hoc queries via mature SQL
      • Everyone already has built-in bindings
  • Architecture
      • Application talks SQL to PostgreSQL
      • PostgreSQL stores metadata
      • PostgreSQL performs post-processing on rows retrieved from KC files
      • KC files live on a POSIX filesystem
  • Architecture (diagram): Application, SQL, PostgreSQL with SortaSQL Plugin, libKC (×4), Kyoto Cabinet (×4), Filesystem (×2)
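
A minimal sketch of the layering on the Architecture slides, assuming Python's stdlib `sqlite3` as a stand-in for PostgreSQL and `dbm.dumb` as a stand-in for Kyoto Cabinet. The table name `partitions`, the file naming scheme, and the sample rows are hypothetical; the actual SortaSQL plugin runs inside PostgreSQL and is not reproduced here.

```python
import dbm.dumb
import os
import sqlite3
import tempfile


def build_demo(root):
    """Create a metadata table plus one key/value file per partition."""
    meta = sqlite3.connect(":memory:")  # stand-in for PostgreSQL metadata
    meta.execute(
        "CREATE TABLE partitions "
        "(owner TEXT, period TEXT, data_type TEXT, path TEXT)"
    )
    sample = [  # hypothetical sample data
        ("alice", "2011-06-01", "hits", {"/index.html": 12, "/about": 3}),
        ("alice", "2011-06-02", "hits", {"/index.html": 7}),
        ("bob", "2011-06-01", "hits", {"/": 40}),
    ]
    for owner, period, data_type, counters in sample:
        path = os.path.join(root, f"{owner}_{period}_{data_type}")
        # Stand-in for a Kyoto Cabinet file living on the filesystem.
        with dbm.dumb.open(path, "c") as kv:
            for key, count in counters.items():
                kv[key.encode()] = str(count).encode()
        meta.execute(
            "INSERT INTO partitions VALUES (?, ?, ?, ?)",
            (owner, period, data_type, path),
        )
    return meta


def read_counters(meta, owner, data_type):
    """Use SQL to find the right files, then read records from each one."""
    cursor = meta.execute(
        "SELECT period, path FROM partitions "
        "WHERE owner = ? AND data_type = ? ORDER BY period",
        (owner, data_type),
    )
    rows = []
    for period, path in cursor:
        with dbm.dumb.open(path, "r") as kv:
            for key in sorted(kv.keys()):
                rows.append((period, key.decode(), int(kv[key].decode())))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        meta = build_demo(root)
        for row in read_counters(meta, "alice", "hits"):
            print(row)
```

The point of the split is that the SQL layer only needs to know which files exist; record-level storage stays in the key/value files.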
  • Big Table
      • “A Bigtable is a sparse, distributed, persistent multidimensional sorted map.”
  • Multi-Dimensional
      • Storing values as protocol buffers allows for arbitrarily complex maps
      • Logic so that when maps get too big, they are promoted to top-level KC stores
  • Persistent and Sorted
      • Any key/value store which allows for binary values accessed via a B+Tree of keys will do
      • We use Kyoto Cabinet (successor to Tokyo Cabinet)
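
To illustrate why sorted keys matter, here is a small in-memory sketch, assuming a sorted list plus `bisect` as a stand-in for a B+Tree. The class `SortedStore` and its method names are hypothetical and are not Kyoto Cabinet's API. Because keys are kept in order, all keys sharing a prefix are contiguous and can be read with one seek followed by a forward scan.

```python
import bisect


class SortedStore:
    """In-memory stand-in for an ordered key/value store with binary values."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def set(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order means no later key can match
            yield key, self._values[key]


if __name__ == "__main__":
    store = SortedStore()
    store.set(b"2011-06-02/index.html", b"7")
    store.set(b"2011-06-01/index.html", b"12")
    store.set(b"2011-06-01/about", b"3")
    print(list(store.scan_prefix(b"2011-06-01/")))
```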
  • Sparse
      • Values can be arbitrarily different.
      • NULLs are free (or cheap)
      • Protocol Buffers again to the rescue.
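
A rough sketch of the "NULLs are free" idea, assuming JSON as a stand-in for Protocol Buffers so the example needs no third-party library. The helper names `encode` and `decode` and the field names are hypothetical. As with unset optional protobuf fields, an absent field is simply not written, so it takes no space in the stored value.

```python
import json


def encode(record: dict) -> bytes:
    """Serialize only the fields that are present (not None)."""
    present = {name: value for name, value in record.items() if value is not None}
    return json.dumps(present, sort_keys=True, separators=(",", ":")).encode()


def decode(blob: bytes, fields) -> dict:
    """Rebuild a full record; fields missing from the blob come back as None."""
    stored = json.loads(blob)
    return {name: stored.get(name) for name in fields}


if __name__ == "__main__":
    fields = ["hits", "bytes", "threats"]  # hypothetical counter names
    blob = encode({"hits": 3, "bytes": None, "threats": None})
    print(blob, decode(blob, fields))
```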
  • Distributed
      • All about the filesystem here
  • Sharding Made Easy
      • Fine-grained metadata allows for an efficient storage hierarchy
  • SortaSQL: Summary
      • Bigtable-like structure
      • Accessed via SQL (PHP bindings come for free!)
      • Offload the hard part to the Filesystem Folks
  • Case Study: CloudFlare
      • 400 GB data/day (Medium Data?)
          • Facebook = 25 TB data/day
          • USPS = 25.6 GB text data delivered/day
      • Mix of Flash and Magnetic storage
      • Mirrored
      • Fixed user queries
      • Random BizDev queries
  • Data Scheme
  • Metadata
      • Partitioned by owner, period and data type
  • Disk Layout
      • 2 × 80 GB SSDs (small and blazing)
      • 2.5 TB RAID 5 (big and slow)
  • Disk Layout
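
One way the hot/cold hierarchy could be expressed is to pick a storage root from the age of a partition. This is a sketch under stated assumptions: the mount points `/mnt/ssd` and `/mnt/raid`, the 7-day hot window, the directory layout, and the `.kct` file extension are all hypothetical and do not come from the slides.

```python
from datetime import date
from pathlib import PurePosixPath

HOT_ROOT = PurePosixPath("/mnt/ssd")    # hypothetical mount for the SSDs
COLD_ROOT = PurePosixPath("/mnt/raid")  # hypothetical mount for the RAID array


def partition_path(owner, period, data_type, today, hot_days=7):
    """Return the file path for a partition keyed by owner, period, data type.

    Partitions newer than hot_days (an assumed threshold) go on fast storage;
    older ones go on the large, slow array.
    """
    age_days = (today - period).days
    root = HOT_ROOT if age_days < hot_days else COLD_ROOT
    return root / owner / data_type / f"{period.isoformat()}.kct"


if __name__ == "__main__":
    today = date(2011, 6, 12)
    print(partition_path("alice", date(2011, 6, 10), "hits", today))
    print(partition_path("alice", date(2011, 5, 1), "hits", today))
```

Because the path is derived from the same owner, period, and data type that the metadata is partitioned by, moving a partition between tiers only requires updating its metadata row.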
  • First Steps (access some records)
  • Silly SQL Tricks
  • Window Functions are Your Friends
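
The slide does not show its queries, so the following is only an illustration of what a window function offers: a running total per owner without a self-join. It assumes stdlib `sqlite3` (built against SQLite 3.25 or later, which added window functions) as a stand-in for PostgreSQL; the table `daily_hits` and its columns are hypothetical. The `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` syntax is also valid in PostgreSQL.

```python
import sqlite3


def running_totals(rows):
    """Return (owner, period, hits, running_total) using a window function."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE daily_hits (owner TEXT, period TEXT, hits INTEGER)")
    db.executemany("INSERT INTO daily_hits VALUES (?, ?, ?)", rows)
    query = """
        SELECT owner, period, hits,
               SUM(hits) OVER (PARTITION BY owner ORDER BY period) AS running
        FROM daily_hits
        ORDER BY owner, period
    """
    return db.execute(query).fetchall()


if __name__ == "__main__":
    sample = [  # hypothetical sample data
        ("alice", "2011-06-01", 10),
        ("alice", "2011-06-02", 5),
        ("bob", "2011-06-01", 7),
    ]
    for row in running_totals(sample):
        print(row)
```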
  • NoSQL to MySQL with Memcached
      • Replace Language not Storage Engine
      • Speak Memcached not SQL
  • In Context
      • nosql-to-mysql-with-memcached.html