Migrating to Riak at Shareaholic

Robby Grossman, Shareaholic's Tech Lead, spoke at the first Boston Riak Meetup on August 30, 2012. These are his slides.

Published in: Technology

  1. Riak @ Shareaholic
      Robby Grossman, robby@shareaholic.com, @freerobby
  2. Agenda
      - Shareaholic: Product & Tech
      - Why Riak: The Search for a Big Data Store
      - Transitioning to Riak
      - Riak Use Cases
      - Deploying to EC2
  3. What’s Shareaholic?
  4. Browser Tools
  5. Sharing Buttons
  6. Recommendations
  7. Social Analytics
  8. Monthly @ Shareaholic
      - Thousands of developers hitting API
      - Hundreds of thousands of publishers
      - Tens of millions of shares & clicks
      - Hundreds of millions of pageviews & events
  9. Tech @ Shareaholic
      - JRuby on Rails (via Torquebox)
      - MySQL (Master, Read Slave)
      - Elastic MapReduce (similar to Hadoop)
      - Redis
      - Formerly Mongo, Now Riak
  10. Why Not Mongo?
      - Working set needs to fit in memory
      - Global write lock blocks all queries, despite not having transactions/joins
      - Standbys not “hot”
  11. Why Riak?
  12. Next @ Shareaholic
      Options:
      - HBase
      - Cassandra
      - Riak
      Goals:
      - Linear scalability
      - Full-text search
      - Flexible indexing
      - Easier Devops
  13. HBase
      Pros:
      - Battle tested
      - High performance
      Cons:
      - Complex architecture
      - SPOFs
      - Requires Hive for indexing/querying
      - Expensive to deploy at small scale
  14. Cassandra
      Pros:
      - Native secondary indices
      - Linear scalability
      - Tunable CAP
      Cons:
      - Known users all domain experts
      - Search requires Lucene
      - Heavyweight MapReduce
  15. Riak
      Pros:
      - Operationally simpler
      - Linear scalability
      - Integrated search
      - Secondary indices
      - Tunable CAP
      - Vector clocks solve time-sync problems
      Cons:
      - Multi-data center replication requires Enterprise product
      - leveldb puts high strain on CPU
  16. From Mongo to Riak
  17. Migration Goals
      - No time where database goes “offline”
      - Product parity throughout migration
  18. Migration Process
      1. App writes to Mongo and Riak
      2. Verify data integrity
      3. Import historical data
      4. App reads from Riak
      5. Decommission Mongo
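
The migration steps on slide 18 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Shareaholic's implementation: `DualWriteStore`, `verify_integrity`, `import_historical`, and the dict-backed stores are hypothetical stand-ins for real Mongo and Riak clients.

```python
class DualWriteStore:
    """Hypothetical wrapper: writes go to both stores, reads come from one."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old_store = old_store          # stand-in for Mongo
        self.new_store = new_store          # stand-in for Riak
        self.read_from_new = read_from_new

    def put(self, key, value):
        # Step 1: every write lands in both stores.
        self.old_store[key] = value
        self.new_store[key] = value

    def get(self, key):
        # Step 4: flip read_from_new once the data has been verified.
        source = self.new_store if self.read_from_new else self.old_store
        return source.get(key)


def verify_integrity(old_store, new_store):
    """Step 2: return keys that are missing or different in new_store."""
    return sorted(k for k, v in old_store.items() if new_store.get(k) != v)


def import_historical(old_store, new_store):
    """Step 3: copy older records without overwriting dual-written ones."""
    for key, value in old_store.items():
        new_store.setdefault(key, value)


if __name__ == "__main__":
    mongo = {"link:1": {"url": "http://example.com/a"}}   # predates dual writes
    riak = {}
    store = DualWriteStore(mongo, riak)
    store.put("link:2", {"url": "http://example.com/b"})
    print(verify_integrity(mongo, riak))   # the historical key is still missing
    import_historical(mongo, riak)
    print(verify_integrity(mongo, riak))   # nothing left to reconcile
    store.read_from_new = True
    print(store.get("link:1"))
```

Because reads stay on the old store until the check comes back clean, the application never sees a window where data is unavailable, which is the first migration goal on slide 17.
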
  19. Use Cases
  20. Share API
      - Save shared content
      - Uses MapReduce to populate user dashboard
  21. Recommendations
      - Sets of related pages
      - Generated on-demand
  22. Publisher Analytics
      - Generated nightly via Hadoop
      - Typical stored “document” (JSON): 80 KB to 1 MB
  23. Riak Successes
  24. MapReduce
      - Handy for querying
      - Runs at “web page speed”
      - Easy to re-reduce for complex queries
      - Easy to test via curl
  25. Tunable CAP @ Shareaholic
      - Replication: primary/secondary authority
      - Read failure tolerance: speed/consistency
      - Write failure tolerance
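
The trade-offs on slide 25 correspond to Riak's replication settings: the number of replicas (`n_val`) and the read and write quorums (`r`, `w`). Below is a rough sketch of the arithmetic, assuming a simple strict-quorum model. The function name and the returned field names are illustrative and not part of any Riak client.

```python
def quorum_profile(n_val, r, w):
    """Summarize what a given (n_val, r, w) choice tolerates.

    Assumes a simple strict-quorum model; real clusters add details
    (such as fallback nodes) that this sketch ignores.
    """
    if not (1 <= r <= n_val and 1 <= w <= n_val):
        raise ValueError("r and w must be between 1 and n_val")
    return {
        # Replicas that may be unavailable while a read still succeeds.
        "read_failures_tolerated": n_val - r,
        # Replicas that may be unavailable while a write still succeeds.
        "write_failures_tolerated": n_val - w,
        # True when every read quorum shares a replica with every write quorum.
        "reads_overlap_latest_write": r + w > n_val,
    }


if __name__ == "__main__":
    for r, w in [(1, 1), (2, 2), (3, 1)]:
        print((3, r, w), quorum_profile(3, r, w))
```

Lower `r` and `w` values favor speed and failure tolerance; higher values favor consistency, which is the speed/consistency dial the slide refers to.
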
  26. Full Text Search
      - Built on Lucene
      - Make user content searchable
      - Make arbitrary keys queryable
      - “Just turn it on”
      - Hiccup: corrupt merge indexes
  27. Query Example
      Who’s our oldest user who’s shared something in the last minute?

          # Search the "links" bucket over a 60-second timestamp range,
          # map each match to its user_id, then reduce to the minimum,
          # e.g. [[2],[5],[9],[13]] => [[2]].
          curl -XPOST http://localhost:8098/mapred \
            -H 'Content-Type: application/json' \
            -d '{
              "inputs": {
                "bucket": "links",
                "query": "timestamp:[1346350877 TO 1346350937}"
              },
              "query": [
                {"map": {"language": "javascript",
                         "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
                {"reduce": {"language": "javascript",
                            "name": "Riak.reduceMin"}}
              ]
            }'

      Result: [[2197]]
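
To make the phases of the slide 27 query concrete, here is a local Python simulation of what the search input, map phase, and reduce phase compute. It does not talk to Riak; the sample records and all function names are made up for illustration. The range is treated as inclusive at the lower bound and exclusive at the upper bound, following the `[` and `}` brackets in the query, and "oldest user" is read as the lowest `user_id`.

```python
def search_by_timestamp(records, start, end):
    """Stand-in for the search input: keep start <= timestamp < end."""
    return [r for r in records if start <= r["timestamp"] < end]


def map_user_id(record):
    """Mirrors the JavaScript map function: emit [[user_id]]."""
    return [[record["user_id"]]]


def reduce_min(values):
    """Mirrors the reduce step: [[2], [5], [9], [13]] => [[2]]."""
    return [min(values)] if values else []


def oldest_recent_sharer(records, start, end):
    mapped = []
    for record in search_by_timestamp(records, start, end):
        mapped.extend(map_user_id(record))
    return reduce_min(mapped)


if __name__ == "__main__":
    # Hypothetical share records, not real data.
    links = [
        {"user_id": 9001, "timestamp": 1346350880},
        {"user_id": 2197, "timestamp": 1346350900},
        {"user_id": 4410, "timestamp": 1346350937},  # excluded: upper bound is exclusive
        {"user_id": 12, "timestamp": 1346350000},    # excluded: before the window
    ]
    print(oldest_recent_sharer(links, 1346350877, 1346350937))
```
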
  28. Riak on EC2
  29. In a Nutshell
      - EC2 specs poorly proportioned for leveldb
      - Multiple AZs in one location works well
      - Scale vertically for better latency & consistency
      - Scale horizontally for more throughput/$
  30. Benchmarks
      - Top Graph: c1.medium (1.7G, 5 CPU)
      - Middle: m1.large (7.5G, 4 CPU)
      - Bottom: cc1.4xlarge (23G, 33.5 CPU)
  31. Throughput
  32. Latency (Typical)
  33. Latency (Worst Case)
  34. Calculations
      Instance                       IOPS/$-hr   Worst 1% of queries
      c1.medium (1.7G, 5 CPU)        1758        300ms/800ms
      m1.large (7.5G, 4 CPU)         1167        110ms/200ms
      cc1.4xlarge (23G, 33.5 CPU)    872         47ms/139ms
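
The slide 34 figures can be compared directly. The sketch below uses only the numbers from the slide; the dictionary layout and function name are illustrative, and the two latency figures per instance are kept as an unlabeled pair because the slide does not say what each one measures.

```python
# Figures copied from slide 34.
BENCHMARKS = {
    "c1.medium":   {"iops_per_dollar_hour": 1758, "worst_1pct_ms": (300, 800)},
    "m1.large":    {"iops_per_dollar_hour": 1167, "worst_1pct_ms": (110, 200)},
    "cc1.4xlarge": {"iops_per_dollar_hour": 872,  "worst_1pct_ms": (47, 139)},
}


def relative_to(baseline, benchmarks=BENCHMARKS):
    """Express each instance type as a ratio of the baseline instance type."""
    base = benchmarks[baseline]
    rows = {}
    for name, b in benchmarks.items():
        rows[name] = {
            "throughput_per_dollar_ratio": round(
                b["iops_per_dollar_hour"] / base["iops_per_dollar_hour"], 2
            ),
            "worst_latency_ratio": tuple(
                round(x / y, 2)
                for x, y in zip(b["worst_1pct_ms"], base["worst_1pct_ms"])
            ),
        }
    return rows


if __name__ == "__main__":
    for name, row in relative_to("cc1.4xlarge").items():
        print(name, row)
```

Relative to cc1.4xlarge, c1.medium delivers roughly twice the throughput per dollar while its worst 1% of queries are about six times slower, which matches the advice on slide 29: scale horizontally for throughput/$, vertically for latency.
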
  35. Benchmark Takeaways
      - You can’t go “by spec”
      - IO is limiting factor
      - RAM never limiting factor for 1% of keyspace to be in memory
  36. Fin. Questions?
      Thanks:
      - Tom Santero
      - Justin Sheehy
      - Ryan Zezeski
      - Reid Draper
      - #freenode riak crew
      We’re Hiring!
      - Robby Grossman, robby@shareaholic.com, @freerobby
  37. Fin.