Riak at Shareaholic

Slides from my talk on using Riak at Shareaholic

Presentation Transcript

  • Riak @ Shareaholic: Robby Grossman, robby@shareaholic.com, @freerobby
  • Agenda
      • Shareaholic: Product & Tech
      • Why Riak: The Search for a Big Data Store
      • Transitioning to Riak
      • Riak Use Cases
      • Deploying to EC2
  • What’s Shareaholic?
  • Browser Tools
  • Sharing Buttons
  • Recommendations
  • Social Analytics
  • Monthly @ Shareaholic
      • Thousands of developers hitting API
      • Hundreds of thousands of publishers
      • Tens of millions of shares & clicks
      • Hundreds of millions of pageviews & events
  • Tech @ Shareaholic
      • JRuby on Rails (via Torquebox)
      • MySQL (Master, Read Slave)
      • Elastic MapReduce (similar to Hadoop)
      • Redis
      • Formerly Mongo, Now Riak
  • Why Not Mongo?
      • Working set needs to fit in memory
      • Global write lock blocks all queries, despite not having transactions/joins
      • Standbys not “hot”
  • Why Riak?
  • Next @ Shareaholic
      • Options: HBase, Cassandra, Riak
      • Goals: Linear scalability, Full-text search, Flexible indexing, Easier Devops
  • HBase
      • Pros: Battle tested; High performance
      • Cons: Complex architecture; SPOFs; Requires Hive for indexing/querying; Expensive to deploy at small scale
  • Cassandra
      • Pros: Native secondary indices; Linear scalability; Tunable CAP; MapReduce
      • Cons: Known users all domain experts; Search requires Lucene; Heavy weight
  • Riak
      • Pros: Operationally simpler; Linear scalability; Integrated search; Secondary indices; Tunable CAP; Vector clocks solve time-sync problems
      • Cons: Multi-data center replication requires Enterprise product; leveldb puts high strain on CPU
  • From Mongo to Riak
  • Migration Goals
      • No time where database goes “offline”
      • Product parity throughout migration
  • Migration Process
      1. App writes to Mongo and Riak
      2. Verify data integrity
      3. Import historical data
      4. App reads from Riak
      5. Decommission Mongo
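The migration steps above can be sketched as a thin wrapper around two stores. This is only an illustration under stated assumptions: plain Python dicts stand in for Mongo and Riak, and the class and method names (`DualWriteStore`, `mismatched_keys`, `backfill`) are hypothetical, not taken from the talk.

```python
_MISSING = object()  # sentinel for "key absent in the new store"


class DualWriteStore:
    """Hypothetical wrapper: dicts stand in for the old (Mongo) and new (Riak) stores."""

    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = False  # flipped at step 4

    def put(self, key, value):
        # Step 1: every write lands in both databases.
        self.old_store[key] = value
        self.new_store[key] = value

    def get(self, key):
        # Reads stay on the old store until the switch is flipped.
        store = self.new_store if self.read_from_new else self.old_store
        return store[key]

    def mismatched_keys(self):
        # Step 2: keys that are missing from, or differ in, the new store.
        return sorted(
            key
            for key, value in self.old_store.items()
            if self.new_store.get(key, _MISSING) != value
        )

    def backfill(self):
        # Step 3: copy records that predate dual writes; returns how many were copied.
        copied = 0
        for key, value in self.old_store.items():
            if key not in self.new_store:
                self.new_store[key] = value
                copied += 1
        return copied
```

Once `mismatched_keys()` comes back empty after the backfill, setting `read_from_new = True` corresponds to step 4, and the old store can then be retired (step 5). Because reads only move after both stores agree, neither store has to go offline during the process.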
  • Use Cases
  • Share API
      • Save shared content
      • Uses MapReduce to populate user dashboard
  • Recommendations
      • Sets of related pages
      • Generated on-demand
  • Publisher Analytics
      • Generated nightly via Hadoop
      • Typical stored “document” (JSON): 80 KB to 1 MB
  • Riak Successes
  • MapReduce
      • Handy for querying
      • Runs at “web page speed”
      • Easy to re-reduce for complex queries
      • Easy to test via curl
  • Tunable CAP @ Shareaholic
      • Replication: primary/secondary authority
      • Read failure tolerance: speed/consistency
      • Write failure tolerance
  • Full Text Search
      • Built on Lucene
      • Make user content searchable
      • Make arbitrary keys queryable
      • “Just turn it on”
      • Hiccup: corrupt merge indexes
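As a rough illustration of the trade-off on the Tunable CAP slide, the sketch below relates a replica count to read and write quorum sizes. It assumes the usual N/R/W quorum model (N replicas, R replies required per read, W acknowledgements required per write); the function name and the returned keys are hypothetical, and the slide does not state which values Shareaholic used.

```python
def quorum_tradeoffs(n, r, w):
    """Summarize what a given N/R/W choice tolerates (illustrative only)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return {
        # Replicas that can be unavailable while a read still succeeds.
        "read_failures_tolerated": n - r,
        # Replicas that can be unavailable while a write still succeeds.
        "write_failures_tolerated": n - w,
        # If r + w > n, every read quorum shares at least one replica
        # with every write quorum.
        "reads_overlap_writes": r + w > n,
    }
```

For example, `quorum_tradeoffs(3, 2, 2)` tolerates one failed replica on each path while keeping read and write quorums overlapping, whereas `quorum_tradeoffs(3, 1, 1)` tolerates two failures and answers after a single reply, at the cost of that overlap.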
  • Query Example: Who’s our oldest user who’s shared something in the last minute?

        curl -XPOST http://localhost:8098/mapred \
          -H 'Content-Type: application/json' \
          -d '{
            "inputs": {
              "bucket": "links",
              "query": "timestamp:[1346350877 TO 1346350937}"
            },
            "query": [
              {"map": {"language": "javascript",
                       "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
              {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
            ]
          }'

    The timestamp range covers a 60-second period. Riak.reduceMin reduces, for example, [[2],[5],[9],[13]] to [[2]].
    Result: [[2197]]
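To make the data flow of the query example easier to follow, here is a local simulation of its phases in plain Python. It does not call Riak. The sample records are made up, the function names are hypothetical, and the half-open range and the idea that the smallest `user_id` is the oldest user are assumptions read off the slide.

```python
def map_user_id(record):
    # Mirrors the slide's map phase: emit the user_id wrapped in a list.
    return [[record["user_id"]]]


def reduce_min(values):
    # Mirrors the slide's note for Riak.reduceMin: [[2],[5],[9],[13]] => [[2]].
    return [min(values)] if values else []


def run_query(records, start, end):
    # Search input: keep records whose timestamp falls in [start, end).
    selected = [r for r in records if start <= r["timestamp"] < end]
    mapped = []
    for record in selected:
        mapped.extend(map_user_id(record))
    return reduce_min(mapped)


# Made-up sample data for illustration only.
SAMPLE_LINKS = [
    {"user_id": 5310, "timestamp": 1346350880},
    {"user_id": 2197, "timestamp": 1346350900},
    {"user_id": 42, "timestamp": 1346350937},  # falls outside the half-open range
]

result = run_query(SAMPLE_LINKS, 1346350877, 1346350937)
```

With this sample data, `result` has the same shape as the response on the slide: a single-element list holding the smallest matching `user_id`.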
  • Riak on EC2
  • In a Nutshell
      • EC2 specs poorly proportioned for leveldb
      • Multiple AZs in one location works well
      • Scale vertically for better latency & consistency
      • Scale horizontally for more throughput/$
  • Benchmarks
      • Top Graph: c1.medium (1.7G, 5 CPU)
      • Middle: m1.large (7.5G, 4 CPU)
      • Bottom: cc1.4xlarge (23G, 33.5 CPU)
  • Throughput
  • Latency (Typical)
  • Latency (Worst Case)
  • Calculations
      • c1.medium (1.7G, 5 CPU): 1758 IOPS/$-hr; worst 1% of queries: 300ms/800ms
      • m1.large (7.5G, 4 CPU): 1167 IOPS/$-hr; worst 1% of queries: 110ms/200ms
      • cc1.4xlarge (23G, 33.5 CPU): 872 IOPS/$-hr; worst 1% of queries: 47ms/139ms
  • Benchmark Takeaways
      • You can’t go “by spec”
      • IO is limiting factor
      • RAM never limiting factor for 1% of keyspace to be in memory
  • Fin. Questions?
      • Thanks: Tom Santero, Justin Sheehy, Ryan Zezeski, Reid Draper, #freenode riak crew
      • We’re Hiring! Robby Grossman, robby@shareaholic.com, @freerobby
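The throughput-per-dollar figures on the Calculations slide can be compared directly with a few lines of arithmetic. The numbers below are copied from the slide; the names `IOPS_PER_DOLLAR_HOUR` and `relative_throughput_per_dollar` are hypothetical.

```python
# IOPS per dollar-hour, as listed on the slide.
IOPS_PER_DOLLAR_HOUR = {
    "c1.medium": 1758,
    "m1.large": 1167,
    "cc1.4xlarge": 872,
}


def relative_throughput_per_dollar(baseline, figures=IOPS_PER_DOLLAR_HOUR):
    """Express each instance type's IOPS/$-hr as a multiple of the baseline's."""
    base = figures[baseline]
    return {name: round(value / base, 2) for name, value in figures.items()}


ratios = relative_throughput_per_dollar("cc1.4xlarge")
# ratios == {"c1.medium": 2.02, "m1.large": 1.34, "cc1.4xlarge": 1.0}
```

On these numbers the smallest instance delivers roughly twice the throughput per dollar of the largest, while the slide's worst-1% latencies move in the opposite direction. That matches the talk's summary: scale horizontally for throughput/$ and vertically for latency.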