Riak seattle-meetup-august


Speaker notes:
  • * Lots of buzz words
    * There are
  • Talk about how this was born from what we built back at Basho:
    * Sales app
    * Never wanted it to go down
    * Apple, Akamai pedigree
    * Fungible assets
  • * Just wanted to touch on this a bit more
    - Every node is the same
  • - After you download and build Riak, here's all that stands between you and a cluster of N nodes.
    - We also have an extended suite of command-line tools that make cluster management very simple.
  • * These are both very well documented on the wiki.
    * You can easily write your own client code to talk to Riak using these as guides.
  • * Webmachine: a REST toolkit, if you like
  • * Buckets are used for basic data organization (not in Dynamo). Roughly analogous to “tables”, but only insofar as they can be used for organization. Buckets are also where you set certain properties.
    * JSON is a data type we see a lot. BSON showed up recently, too. Python byte code.
  • * Dynamo differences
    - buckets
  • Ring, keyspace, vnode, partition, node.
    * The preference list (pref list) is the list of the N vnodes on which the data should be stored.
    * The ease of scaling out here should be pretty apparent. We aren't asking you to pick a shard key, etc. You select an N_val that suits your business needs (store more copies of the data that matters more), and then add nodes to suit.
    * Each color is a physical node.
    * Nodes run an equal number of partitions (optimistically).
    * Each partition then runs an Erlang process called a vnode.
    * Vnodes are responsible for handling requests.
    * The ring space is the integers from 0 to 2^160 - 1.
  • * These let developers and stakeholders tune consistency and availability in their apps at the bucket level (and we are working on making this more granular).
  • Tunable consistency based on your business needs.
    Log data can be written with a W of 1 for speed.
    Prescription information can be read with an R value of 3 for higher consistency.
  • Enable “multi-backend” in your config file; backends are then configurable at the bucket level.
    * Bitcask - current default; we wrote this. Append-only on disk. Keeps a pointer in memory to the most recent copy of each object and updates it on each write. Performs merges to clean up old values. Super fast: < 5 ms latencies on most systems at the 99.9th percentile. Memory is the only concern (~100 bytes per object, depending on key size).
    * LevelDB - just released by Google. Permissive licensing; items are ordered on disk; far less drastic memory requirements per key (less RAM, slightly longer latencies). Opens the door to more time-series/range-based queries in Riak.
    * Innostore - recommended when RAM requirements might be too high. Embedded InnoDB. We will start recommending this less as LevelDB support becomes more stable.
    * In-memory backends for testing. We are also deprecating them.
  • * Links are metadata that establish one-way relationships between objects in Riak; once links are attached, you can then perform link-walking queries to find relationships. Example: bucket “talks”, key “Seattle”, riaktag=“talk” (in the Link header).
    * MapReduce - chain any number of map and reduce phases. A map phase produces 0 or more results based on your function. A reduce phase combines the results of the map phase and returns them to the client.
    * Pre-commit hooks: e.g. JSON validation. Post-commit hooks: e.g. send to another database/service.
    * Full Text Search - a full-text search engine built around Riak Core and tightly integrated with Riak KV. Use a pre-commit hook at the bucket level to index.
  • * We are pushing this super hard, as the next slide will show.
  • * Client libs - Erlang, Java, JS, Python, Node.js, Ruby (Ripple), Haskell, Smalltalk, Go, Scala, PHP, Perl
    * Secondary indexing, deeper search integration, and a robust, better-defined MapReduce with riak_pipe
    * Working on a successor to Rekon that will be an admin tool (use the DISQUS anecdote about the Cassandra dashboard)
  • * Comcast - the requirement is basically an internal Amazon S3: a straight key/value store with an HTTP interface. This is to build a product called HOSS (high-availability object storage system). This infrastructure is used across Comcast to store DVR data.
    * A European country is using Riak.
    * MIG-CAN - SMS gateway for British
    * Wikia - multiple data centers for session storage
  • * Rusty's 2I deck is available; the code is in master.
    * Lager blog post
    * Pipe is in master; extensive README on GitHub.
    * Will be doing prereleases (packages, as opposed to building from source); get on the mailing list to be part of this.

    1. The Best Open Source Database You Will Ever Have The Pleasure Of Running In Production
       Seattle Scalability Meetup, August 24, 2011
    2. Who Am I?
       • Mark Phillips
       • Community Manager
       • Basho Technologies
       • @pharkmillups
    3. What is Riak?
       • a database
       • a key/value store
       • distributed
       • fault-tolerant
       • scalable
       • Dynamo-inspired
       • used by startups
       • used by FORTUNE 100 companies
       • written (primarily) in Erlang
       • pronounced “REE-awk”
       • not the right fit for every project and app
    4. Riak’s Design Goals
       • Simple, Elegant Scalability
       • Ease of operations
       • Resiliency in the face of failure
       • Plumbing
    5. Distributed, Scalable, Fault Tolerant
       No central coordinator; easy to set up and operate
    6. Distributed, Scalable, Fault Tolerant
       Horizontally scalable; add commodity hardware to get more [throughput | processing | storage].
    7. Distributed, Scalable, Fault Tolerant
       Always available
       No single point of failure
       Self-healing
    8. Building Clusters is Dead Simple
       $ riak start                # to get your first node running
       $ riak start                # to get your second node running
       $ riak-admin join riak@     # send a join request to an existing Riak node
       # lather, rinse, repeat until you’re web scale
    9. APIs
       • HTTP - Richly Featured RESTful API
       • Protocol Buffers - Courtesy of Google
    10. Current HTTP API (made possible by Webmachine)
        Store
          POST /riak/bucket          # Riak-defined key
          PUT /riak/bucket/key       # User-defined key
        Fetch
          GET /riak/bucket/key
        Delete
          DELETE /riak/bucket/key
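To make the four routes above concrete, here is a minimal Python sketch that builds, but does not send, the matching requests using the standard library's urllib. Several things are assumptions for illustration: the host and port (127.0.0.1:8098, Riak's usual default HTTP port), the JSON payload, and the helper names, which are hypothetical and not part of any Riak client library.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a Riak node serving HTTP on its default port; change as needed.
BASE_URL = "http://127.0.0.1:8098"


def _object_url(bucket, key=None):
    """Build /riak/<bucket> or /riak/<bucket>/<key>, percent-encoding each part."""
    url = f"{BASE_URL}/riak/{quote(bucket, safe='')}"
    if key is not None:
        url += f"/{quote(key, safe='')}"
    return url


def store_request(bucket, value, key=None):
    """POST lets Riak choose the key; PUT stores under a key we choose."""
    method = "POST" if key is None else "PUT"
    return urllib.request.Request(
        _object_url(bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def fetch_request(bucket, key):
    return urllib.request.Request(_object_url(bucket, key), method="GET")


def delete_request(bucket, key):
    return urllib.request.Request(_object_url(bucket, key), method="DELETE")


if __name__ == "__main__":
    req = store_request("talks", {"city": "Seattle"}, key="seattle")
    print(req.get_method(), req.full_url)
    # To actually send it to a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Keeping request construction separate from sending makes the route mapping easy to inspect. The commented-out `urlopen` call shows where a real round trip would go.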
    11. How Riak Organizes Data: Bucket/Key/Value
        • Bucket - top-level namespace in Riak. Used for basic data organization; also used to set properties for keys in that bucket (n_val, commit hooks, choice of backend, etc.)
        • Key - binary blob to identify value
        • Value - any data type you can imagine
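Bucket properties such as n_val are set per bucket. The sketch below builds a request that changes one over HTTP. It assumes the `PUT /riak/<bucket>` form with a JSON body shaped like `{"props": {...}}`, as used by Riak's HTTP interface of this era, and the default port 8098. The helper name is hypothetical; check the wiki for the property names your version accepts.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: Riak's HTTP interface on its default port.
BASE_URL = "http://127.0.0.1:8098"


def set_bucket_props_request(bucket, **props):
    """Build a PUT /riak/<bucket> request whose JSON body is {"props": {...}}."""
    body = json.dumps({"props": props}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/riak/{quote(bucket, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Keep five replicas of everything in a hypothetical "prescriptions" bucket.
    req = set_bucket_props_request("prescriptions", n_val=5)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```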
    12. What we borrowed from Dynamo:
        • Gossip protocol - ring membership, partition assignment
        • Consistent Hashing - division of work
        • Vector Clocks - versioning, conflict resolution
        • Read Repair - anti-entropy
        • Hinted Handoff - failure masking, data migration
        The paper was not a spec for a system but one approach; we learned from it but deviated where necessary.
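Of the Dynamo techniques listed, vector clocks are the easiest to show in a few lines. The following is a simplified Python model, not Riak's implementation. Each clock maps an actor id (for example, a client) to a counter. Comparing two clocks tells you whether one version supersedes the other, or whether they are concurrent siblings that need resolving. Riak's real clocks also carry timestamps and are pruned over time; both are omitted here.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates seen from that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Return a new clock with `actor`'s counter bumped (done on each update)."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every event `b` has (a >= b for every actor)."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def merge(a: VClock, b: VClock) -> VClock:
    """Per-actor maximum; the clock for a value that resolves siblings a and b."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent (siblings)"


if __name__ == "__main__":
    base = increment({}, "client-x")     # first write
    left = increment(base, "client-x")   # client-x updates that version
    right = increment(base, "client-y")  # client-y updates the same version
    print(compare(left, base))
    print(compare(left, right))
```

Here `left` supersedes `base`, while `left` and `right` are concurrent. A store keeps both as siblings until a client writes a resolved value under the merged clock.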
    13. Consistent Hashing
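The speaker notes describe the ring as a 2^160 keyspace divided into partitions, each served by a vnode, with a preference list of N vnodes per key. A minimal Python sketch of that idea follows. It rests on several assumptions:

* 64 partitions, a common default ring size.
* SHA-1 over the plain string "bucket/key" rather than Riak's Erlang term encoding.
* A naive round-robin assignment of partitions to hypothetical node names.
* A preference list that is simply the N consecutive partitions, starting at the one the key hashes into.

```python
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 160                  # SHA-1 produces 160-bit hashes
NUM_PARTITIONS = 64                   # assumption: a typical default ring size
PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS


def key_hash(bucket: str, key: str) -> int:
    """Place a bucket/key pair on the ring.

    Simplification: Riak hashes an Erlang encoding of {Bucket, Key};
    this sketch hashes the UTF-8 string "bucket/key" instead.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(point: int) -> int:
    """Index of the equal-sized slice of the ring that contains `point`."""
    return point // PARTITION_WIDTH


def owner(partition: int, nodes: List[str]) -> str:
    """Naive round-robin claim: partition i belongs to node i mod len(nodes)."""
    return nodes[partition % len(nodes)]


def preference_list(bucket: str, key: str, nodes: List[str],
                    n_val: int = 3) -> List[Tuple[int, str]]:
    """The N consecutive partitions (and their owners) that hold a key's replicas."""
    first = partition_for(key_hash(bucket, key))
    parts = [(first + i) % NUM_PARTITIONS for i in range(n_val)]
    return [(p, owner(p, nodes)) for p in parts]


if __name__ == "__main__":
    nodes = ["riak1", "riak2", "riak3", "riak4"]  # hypothetical node names
    for partition, node in preference_list("talks", "seattle", nodes):
        print(partition, node)
```

Because 64 is divisible by 4, the four-node example always places the three replicas on distinct nodes. With a node count that does not divide the partition count, naive round-robin can put two replicas on the same node where the ring wraps around, which is why real claim algorithms are more careful. Adding a node only changes which node owns some partitions; keys never need to be rehashed.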
    14. N, R, W Values
        • N = number of replicas to store (on distinct physical nodes)
        • R = number of replica responses needed for a successful read
        • W = number of replica responses needed for a successful write
    15. N, R, W Values
    16. N, R, W Values
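As a rough illustration of the N, R, W definitions, and of the log-data versus prescription example in the speaker notes, this Python sketch does two things. It checks whether enough replicas answered, and it brute-forces the classic quorum-overlap condition R + W > N. It is a model only. Riak uses sloppy quorums, so with hinted handoff a fallback node may stand in for a primary replica, and the overlap is not a strict guarantee. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def quorum_met(acks: Sequence[bool], required: int) -> bool:
    """A request succeeds once `required` of the N replicas have answered OK."""
    return sum(acks) >= required


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Does every W-replica write set share a replica with every R-replica read set?"""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


if __name__ == "__main__":
    N = 3
    # Log data: W=1 succeeds as soon as one replica has the write.
    print(quorum_met([True, False, False], required=1))
    # Prescription data: R=3 needs all three replicas to answer.
    print(quorum_met([True, True, False], required=3))
    # R=2, W=2 always overlap because 2 + 2 > 3; R=1, W=1 do not.
    print(quorums_always_overlap(N, r=2, w=2), quorums_always_overlap(N, r=1, w=1))
```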
    17. Writing Things to Disk (because that’s what databases do)
        Riak allows for pluggable local storage:
        • Bitcask (recommended, ships as default)
        • LevelDB (recently released by Google)
        • Innostore
        • Several other specialty backends
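The speaker notes describe Bitcask's design: append-only writes, an in-memory pointer to the newest copy of each key, and merges that reclaim old values. Here is a toy Python version of that design. It is not Bitcask's file format. It skips the CRC and timestamp fields, hint files, multiple data files, tombstones, and rebuilding the keydir on restart. The class name is hypothetical.

```python
import os
import struct
import tempfile

# Record header: key size, value size (real Bitcask also stores a CRC and a timestamp).
HEADER = struct.Struct(">II")


class TinyBitcask:
    """Toy append-only store with an in-memory keydir, loosely modelled on Bitcask."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (value offset, value size) of the most recent write
        open(path, "ab").close()  # create the data file if it does not exist

    def put(self, key: bytes, value: bytes) -> None:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(HEADER.pack(len(key), len(value)) + key + value)
        # Point the keydir at the newest copy; older copies on disk become garbage.
        self.keydir[key] = (offset + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        with open(self.path, "rb") as f:  # one seek and one read per lookup
            f.seek(offset)
            return f.read(size)

    def delete(self, key: bytes) -> None:
        # Simplification: real Bitcask appends a tombstone; we only drop the keydir entry.
        self.keydir.pop(key, None)

    def merge(self) -> None:
        """Copy only live values into a fresh file, reclaiming space from stale ones."""
        live = {k: self.get(k) for k in self.keydir}
        tmp = self.path + ".merge"
        new_keydir = {}
        with open(tmp, "wb") as f:
            for key, value in live.items():
                offset = f.tell()
                f.write(HEADER.pack(len(key), len(value)) + key + value)
                new_keydir[key] = (offset + HEADER.size + len(key), len(value))
        os.replace(tmp, self.path)
        self.keydir = new_keydir


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bitcask")
    db = TinyBitcask(path)
    db.put(b"seattle", b"v1")
    db.put(b"seattle", b"v2")  # appended; v1 is now stale on disk
    print(db.get(b"seattle"))
    before = os.path.getsize(path)
    db.merge()
    print(before, os.path.getsize(path))  # the file shrinks after the merge
```

The keydir is why memory is the main cost noted for Bitcask: every live key keeps an entry in RAM. In exchange, each read is a single seek.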
    18. Querying Riak
        • Primary key based lookups
        • MapReduce
        • Links/Link Walking
        • Pre- and Post-Commit Hooks
        • Full Text Search
        • Secondary Indexes
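Riak MapReduce chains phases, with each phase's output feeding the next. The following local Python sketch models only that chaining, over a hypothetical in-memory dictionary. It is not Riak's MapReduce API, which accepts JavaScript or Erlang phase functions submitted over HTTP or Protocol Buffers. One more simplification: in Riak, a map phase followed by another map phase must emit bucket/key pairs, whereas here phases pass plain values.

```python
from typing import Any, Callable, Dict, List, Tuple

Inputs = List[Tuple[str, str]]            # (bucket, key) pairs
MapFn = Callable[[Any], List[Any]]        # one value -> zero or more results
ReduceFn = Callable[[List[Any]], List[Any]]


def run_job(store: Dict[Tuple[str, str], Any], inputs: Inputs,
            phases: List[Tuple[str, Callable]]) -> List[Any]:
    """Run map/reduce phases in order, feeding each phase's output to the next."""
    # Missing keys are skipped here; this is a simplification of the sketch.
    results: List[Any] = [store[bk] for bk in inputs if bk in store]
    for kind, fn in phases:
        if kind == "map":
            results = [out for value in results for out in fn(value)]
        elif kind == "reduce":
            results = fn(results)
        else:
            raise ValueError(f"unknown phase type: {kind}")
    return results


if __name__ == "__main__":
    store = {
        ("talks", "seattle"): "riak scales riak",
        ("talks", "portland"): "riak is fault tolerant",
    }
    count_riak: MapFn = lambda text: [text.split().count("riak")]
    total: ReduceFn = lambda counts: [sum(counts)]
    print(run_job(store, list(store), [("map", count_riak), ("reduce", total)]))  # [3]
```

Riak may apply a reduce function to partial results and then again to its own output. Reduce functions should therefore give the same answer when re-applied, as the summing reducer here does.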
    19. Riak’s Design Goals (revisited)
        • Simple, Elegant Scalability
        • Ease of operations
        • Resiliency in the face of failure
        • Plumbing
        • Ease of Development, Developer Usability
    20. Our Current Usability Focus
        • Robust, supported client libs
        • More complex querying capabilities
        • Documentation
        • Sample apps, code, etc.
        • Cluster administration tools
        • Logging
    21. Selected Use Cases
        • Session Storage (Wikia, DISQUS, MochiMedia)
        • Timeline (Yammer, DISQUS, Formspring)
        • Generic Object Storage (Comcast)
        • Scalable Full-Text Search (Best Buy, Clipboard.com)
        • User Profile/Data Storage (Danish Government)
        • Message Storage (Voxer, MIG-CAN)
        • Gaming (MochiMedia)
    22. Community and Open Source
        • Very committed to our open source community
        • Companies like Yammer, Comcast, DISQUS, Trifork and Formspring contributing actively
        • The Riak community is a great place to work and play. Come join us!
    23. Production Deploys
        >250 clusters out there right now
    24. Riak 1.0 (dropping late Sept.)
        • Secondary Indexing
        • Lager - new logging framework
        • riak_pipe - massive overhaul of M/R framework
        • Riak Search Integration
        • Lots o’ Bug Fixes/Stability Improvements
    25. About Basho
        Offices in: San Francisco, CA; Cambridge, MA; Reston, VA
        Employees: All over the world
        Total Employees: ~35
    26. About Basho
        How do we pay the bills?
        • Enterprise Licenses of Riak and SLA’d Support
        • Professional Services Consulting
        Other Prominent Open Source Software: Webmachine, Rebar, Bitcask, Lager
    27. Get Involved
        • wiki.basho.com
        • github.com/basho/*
        • basho.com
        • twitter.com/basho
        • Riak Mailing List
        • downloads.basho.com/riak/CURRENT