SortaSQL

CloudFlare looked at several NoSQL and SQL solutions and ended up with a hybrid model where many Kyoto Cabinet DBs are accessed via a Postgres wrapper. This presentation describes the resulting novel architecture which combines the horizontal scalability of NoSQL solutions with the flexibility and stability of SQL.

Speaker notes
  • Slide 1: Hi, I’m Ian, and I’m going to talk about a database project I’ve been working on at CloudFlare.
  • Slide 2: What is CloudFlare? Web access logs: who did what.
  • Slide 5: HBase was not a good match for us.
  • Slide 6: Instead of buying more hardware, we decided to get fancy, with a design a lot like HBase and Hadoop.
  • Slide 8: Go fast here.
  • Slide 9: Go slow here.
  • Slide 10: I’m now going to try to convince you that this is not just a CloudFlare-specific tool but a general-purpose one, i.e., just like Bigtable.
  • Slide 18: A compound index leads to a set of counters (key/value pairs).
  • Slide 19: How this looks in the database: given the owner, period, and data type, rows are recovered using a kcx function.

  • Transcript

    • 1. SortaSQL
        • Ian Pye <ian@cloudflare.com>
    • 2. Motivation
    • 3. Everyone likes SQL
        • Tables
        • Joins
        • Online Transaction Processing
        • Transactions!
        • Arbitrary Queries
    • 4. Scaling?
        • What happens to joins when your data doesn’t fit in memory?
        • I only need get and set for my data
        • Sharding is too hard/unreliable
        • A “monopolistically competitive market”?
    • 5. Scaling
        • Seamless Horizontal Scalability from 1 to N
    • 6. Proposal: Let the Filesystem do the hard work
        • RDBMS presents a full SQL interface to applications, automatically accessing files to get data as needed
        • RDBMS stores metadata that allows it to find the right data files
        • Embedded key/value store handles the record-level storage, locking, caching, etc.
        • FS (local or distributed) stores data and is responsible for replication, performance, locking, etc.
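To make the division of labour on slide 6 concrete, here is a minimal sketch that uses only Python's standard library as stand-ins: an in-memory `sqlite3` database plays the RDBMS and holds nothing but metadata, one `dbm.dumb` file per partition plays the embedded key/value store (Kyoto Cabinet in the talk), and a temporary directory plays the filesystem. The `partitions` table, its columns, and the `put`/`get`/`lookup_path` helpers are hypothetical; partitioning by owner, period, and data type follows slide 19.

```python
import dbm.dumb
import os
import sqlite3
import tempfile

# Stand-in for the filesystem: a temporary directory holding one KV file per partition.
DATA_DIR = tempfile.mkdtemp()

# Stand-in for the RDBMS: it stores only metadata about where each partition lives.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE partitions (owner TEXT, period TEXT, data_type TEXT, path TEXT, "
    "PRIMARY KEY (owner, period, data_type))"
)

def register_partition(owner, period, data_type):
    """Pick a file path for a new partition and record it in the catalog."""
    path = os.path.join(DATA_DIR, f"{owner}_{period}_{data_type}")
    catalog.execute("INSERT INTO partitions VALUES (?, ?, ?, ?)",
                    (owner, period, data_type, path))
    return path

def lookup_path(owner, period, data_type):
    """Ask the metadata store which file holds the requested partition."""
    row = catalog.execute(
        "SELECT path FROM partitions WHERE owner = ? AND period = ? AND data_type = ?",
        (owner, period, data_type)).fetchone()
    return row[0] if row else None

def put(owner, period, data_type, key, value):
    path = lookup_path(owner, period, data_type) or register_partition(owner, period, data_type)
    # The embedded KV store handles record-level storage inside the partition file.
    with dbm.dumb.open(path, "c") as db:
        db[key] = value

def get(owner, period, data_type, key):
    path = lookup_path(owner, period, data_type)
    if path is None:
        return None  # no such partition in the catalog
    with dbm.dumb.open(path, "r") as db:
        return db.get(key)
```

For example, `put("example.com", "2012-03", "hits", "GET /", b"42")` creates the partition file and records its path, after which `get` with the same owner, period, and data type finds the file through the catalog and returns `b"42"`. Unlike a B+tree store, `dbm.dumb` keeps no key order, so this sketch says nothing about range scans.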
    • 7. Major Wins
        • Scales continuously from 1 to 100 servers (FS permitting)
        • Hot/cold storage hierarchy
        • Allows ad-hoc queries via mature SQL
        • Everyone already has built-in bindings
    • 8. Architecture
        • Application talks SQL to PostgreSQL
        • PostgreSQL stores metadata
        • Performs post-processing on rows retrieved from KC files
        • KC files live on a POSIX filesystem
    • 9. Architecture
        • [Diagram: Application → (SQL) → PostgreSQL with the SortaSQL plugin → 4 × libKC → 4 × Kyoto Cabinet → 2 × Filesystem]
    • 10. Bigtable
        • “A Bigtable is a sparse, distributed, persistent multidimensional sorted map.”
    • 11. Multi-Dimensional
        • Storing values as protocol buffers allows for arbitrarily complex maps
        • Logic so that when maps get too big, they are promoted to top-level KC stores
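Slide 11 describes maps that live inside a single stored value until they grow too large, at which point they are promoted to a top-level store of their own. A minimal sketch of that promotion rule, assuming plain dicts in place of Kyoto Cabinet files and JSON in place of protocol buffers; the `NestedStore` class and the `MAX_INLINE_ENTRIES` threshold are hypothetical.

```python
import json

MAX_INLINE_ENTRIES = 3  # hypothetical threshold for promoting a map

class NestedStore:
    """Toy model of slide 11: small maps live inside one value, big maps get their own store."""

    def __init__(self):
        self.top = {}       # main store: key -> JSON-encoded inner map
        self.promoted = {}  # key -> dedicated store for maps that grew too big

    def set_field(self, key, field, value):
        if key in self.promoted:
            # Already promoted: write directly into the dedicated store.
            self.promoted[key][field] = value
            return
        inner = json.loads(self.top.get(key, "{}"))
        inner[field] = value
        if len(inner) > MAX_INLINE_ENTRIES:
            # Over the threshold: move the whole map out of the main store.
            self.promoted[key] = inner
            self.top.pop(key, None)
        else:
            self.top[key] = json.dumps(inner)

    def get_map(self, key):
        """Readers see the same map regardless of where it is stored."""
        if key in self.promoted:
            return dict(self.promoted[key])
        return json.loads(self.top.get(key, "{}"))
```

Because `get_map` hides where a map lives, callers never need to know whether a given key has been promoted.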
    • 12. Persistent and Sorted
        • Any key/value store which allows for binary values accessed via a B+Tree of keys will do
        • We use Kyoto Cabinet (successor to Tokyo Cabinet)
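The property slide 12 relies on is ordered keys over binary values: when keys are kept sorted, all records sharing a prefix sit in one contiguous run and can be read with a single range scan. A minimal in-memory sketch of that property, assuming Python's `bisect` module over a sorted key list as a stand-in for an on-disk B+tree; the `SortedKV` class and the date-prefixed keys are hypothetical.

```python
import bisect

class SortedKV:
    """In-memory stand-in for a B+tree-backed KV store: keys are kept in sorted order."""

    def __init__(self):
        self._keys = []  # sorted list of bytes keys
        self._vals = {}  # key -> bytes value

    def set(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)  # keep the key list ordered
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1
```

With keys such as `b"2012-03-01/hits"`, scanning the prefix `b"2012-03"` returns every March record in date order without touching April's.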
    • 13. Sparse
        • Values can be arbitrarily different.
        • NULLs are free (or cheap)
        • Protocol Buffers again to the rescue.
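A minimal sketch of why NULLs can be free in a sparse encoding: only fields that are set are written, and readers fill in the rest as NULL. It uses JSON as a stand-in for protocol buffers, which similarly omit unset fields on the wire; the column names and the `encode_sparse`/`decode_sparse` helpers are hypothetical.

```python
import json

def encode_sparse(record: dict) -> bytes:
    """Serialize only the fields that are actually set; None (NULL) takes no space."""
    present = {k: v for k, v in record.items() if v is not None}
    return json.dumps(present, sort_keys=True, separators=(",", ":")).encode()

def decode_sparse(blob: bytes, columns) -> dict:
    """Rebuild a full row, filling every absent column with None."""
    stored = json.loads(blob)
    return {c: stored.get(c) for c in columns}
```

Two records with entirely different fields can share one store, and a record with mostly NULL columns costs only as much as its non-NULL ones.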
    • 14. Distributed
        • All about the filesystem here
    • 15. Sharding Made Easy
        • Fine-grained metadata allows for an efficient storage hierarchy
    • 16. SortaSQL: Summary
        • Bigtable-like structure
        • Accessed via SQL (PHP bindings come for free!)
        • Offload the hard part to the Filesystem Folks
    • 17. Case Study: CloudFlare
        • 400 GB data/day (Medium Data?)
            • Facebook = 25 TB data/day
            • USPS = 25.6 GB text data delivered/day
        • Mix of Flash and Magnetic storage
        • Mirrored
        • Fixed user queries
        • Random BizDev queries
    • 18. Data Scheme
    • 19. Metadata
        • Partitioned by owner, period and data type
    • 20. Disk Layout
        • 2 × 80 GB SSDs (small and blazing)
        • 2.5 TB RAID 5 (big and slow)
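Slides 7, 15, and 20 describe a hot/cold hierarchy: small, fast SSDs alongside a large, slower RAID 5 array, with fine-grained metadata deciding where each partition lives. A minimal sketch of one possible placement rule, assuming recent periods go to flash; the mount points `/ssd` and `/raid`, the seven-day cutoff, the file layout, and the `partition_path` function are all hypothetical, not taken from the talk.

```python
from datetime import date, timedelta

HOT_ROOT = "/ssd"    # hypothetical mount point for the mirrored SSDs
COLD_ROOT = "/raid"  # hypothetical mount point for the RAID 5 array
HOT_DAYS = 7         # hypothetical: partitions younger than this stay on flash

def partition_path(owner: str, period: date, data_type: str, today: date) -> str:
    """Choose a storage tier from the partition's age, then build its file path."""
    root = HOT_ROOT if today - period < timedelta(days=HOT_DAYS) else COLD_ROOT
    # Hypothetical layout: <tier>/<owner>/<period>/<data_type>.kct
    return f"{root}/{owner}/{period.isoformat()}/{data_type}.kct"
```

In this sketch, because the metadata store records one path per partition, moving an aged partition from SSD to RAID amounts to copying its file and updating a single metadata row.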
    • 21. Disk Layout
    • 22. First Steps (access some records)
    • 23. Silly SQL Tricks
    • 24. Window Functions are Your Friends
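Slide 24 gives only its title, so the following is one illustration rather than the talk's own query: window functions compute per-row aggregates such as running totals and ranks without collapsing rows the way `GROUP BY` does, which suits per-day counters. The sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (the `OVER (PARTITION BY ... ORDER BY ...)` syntax is shared, and SQLite needs version 3.25 or later); the `daily_hits` table and its data are hypothetical.

```python
import sqlite3

# In-memory SQLite standing in for PostgreSQL; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_hits (owner TEXT, day TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO daily_hits VALUES (?, ?, ?)",
    [("a.com", "2012-03-01", 10), ("a.com", "2012-03-02", 5),
     ("b.com", "2012-03-01", 7), ("b.com", "2012-03-02", 3)],
)

# Running total per owner, plus each day's rank by hits within its owner.
rows = conn.execute("""
    SELECT owner, day, hits,
           SUM(hits) OVER (PARTITION BY owner ORDER BY day) AS running_total,
           RANK() OVER (PARTITION BY owner ORDER BY hits DESC) AS hits_rank
    FROM daily_hits
    ORDER BY owner, day
""").fetchall()

for row in rows:
    print(row)
```

Each row keeps its own `hits` value while also carrying the running total for its owner, e.g. 15 for `a.com` on 2012-03-02.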
    • 25. NoSQL to MySQL with Memcached
        • Replace Language, not Storage Engine
        • Speak Memcached, not SQL
    • 26. In Context
        • https://github.com/cloudflare/SortaSQL
        • http://dev.mysql.com/tech-resources/articles/nosql-to-mysql-with-memcached.html
        • http://queue.acm.org/detail.cfm?id=1961297
