Agenda
Shareaholic: Product & Tech
Why Riak: The Search for a Big Data Store
Transitioning to Riak
Riak Use Cases
Deploying to EC2
What’s Shareaholic?
Browser Tools
Sharing Buttons
Recommendations
Social Analytics
Monthly @ Shareaholic
Thousands of developers hitting API
Hundreds of thousands of publishers
Tens of millions of shares & clicks
Hundreds of millions of pageviews & events
Tech @ Shareaholic
JRuby on Rails (via Torquebox)
MySQL (Master, Read Slave)
Elastic MapReduce (similar to Hadoop)
Redis
Formerly Mongo, Now Riak
Why Not Mongo?
Working set needs to fit in memory
Global write lock blocks all queries, despite not having transactions/joins
Standbys not “hot”
Why Riak?
Next @ Shareaholic
Options: HBase; Cassandra; Riak
Goals: Linear scalability; Full-text search; Flexible indexing; Easier Devops
HBase
Pros: Battle tested; High performance
Cons: Complex architecture; SPOFs; Requires Hive for indexing/querying; Expensive to deploy at small scale
Cassandra
Pros: Native secondary indices; Linear scalability; Tunable CAP; MapReduce
Cons: Known users all domain experts; Search requires Lucene; Heavy weight
Riak
Pros: Operationally simpler; Linear scalability; Integrated search; Secondary indices; Tunable CAP; Vector clocks solve time-sync problems
Cons: Multi-data center replication requires Enterprise product; leveldb puts high strain on CPU
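The "vector clocks solve time-sync problems" point can be illustrated with a minimal sketch. It assumes a vector clock is a plain mapping from an actor (client or node) ID to an update counter. The names `descends` and `compare` are illustrative only and are not Riak's API. The idea is that two versions are ordered by comparing counters rather than wall-clock timestamps, so clock skew between servers does not matter.

```python
from typing import Dict

# Assumption for illustration: a vector clock is a dict of actor ID -> counter.
VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """Return True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify the relationship between two versions of an object."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    # Neither version has seen the other's updates: a genuine conflict.
    return "concurrent"


print(compare({"x": 2, "y": 1}, {"x": 1, "y": 1}))
print(compare({"x": 2}, {"y": 1}))
```

When neither clock descends from the other, the writes were concurrent and the store keeps both versions for the application to resolve, instead of silently picking whichever server had the later system time.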
From Mongo to Riak
Migration Goals
No time where database goes “offline”
Product parity throughout migration
Migration Process
1. App writes to Mongo and Riak
2. Verify data integrity
3. Import historical data
4. App reads from Riak
5. Decommission Mongo
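The migration steps above can be sketched roughly as follows. This is not Shareaholic's actual code: plain dicts stand in for the Mongo and Riak clients, and `DualWriteStore`, `find_mismatches`, and `backfill` are hypothetical names chosen for illustration.

```python
class DualWriteStore:
    """Writes go to both stores; reads come from one, switchable at runtime."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new

    def put(self, key, value):
        # Step 1: every application write lands in both databases.
        self.old_store[key] = value
        self.new_store[key] = value

    def get(self, key):
        # Step 4 is flipping read_from_new to True.
        source = self.new_store if self.read_from_new else self.old_store
        return source[key]


def find_mismatches(old_store, new_store, keys=None):
    """Step 2: list keys whose values differ between the two stores."""
    if keys is None:
        keys = old_store.keys() | new_store.keys()
    return sorted(k for k in keys if old_store.get(k) != new_store.get(k))


def backfill(old_store, new_store):
    """Step 3: copy historical records without clobbering dual-written ones."""
    for key, value in old_store.items():
        new_store.setdefault(key, value)


mongo = {"link:1": {"url": "http://example.com/a"}}  # pre-existing data
riak = {}
store = DualWriteStore(mongo, riak)

store.put("link:2", {"url": "http://example.com/b"})   # step 1
print(find_mismatches(mongo, riak, keys=["link:2"]))   # step 2 on new writes
backfill(mongo, riak)                                  # step 3
print(find_mismatches(mongo, riak))                    # full comparison
store.read_from_new = True                             # step 4
print(store.get("link:1"))
```

Because the old store keeps receiving every write until the final step, reads can be switched back at any point, which is what keeps the database from ever going "offline" during the move.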
Use Cases
Share API
Save shared content
Uses MapReduce to populate user dashboard
Recommendations
Sets of related pages
Generated on-demand
Publisher Analytics
Generated nightly via Hadoop
Typical stored “document” (JSON): 80 KB to 1 MB
Riak Successes
MapReduce
Handy for querying
Runs at “web page speed”
Easy to re-reduce for complex queries
Easy to test via curl
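One reading of "easy to re-reduce" is that a Riak reduce function may be applied to partial results and then applied again to its own output, so it has to accept what it returns. The sketch below models that property in plain Python rather than in Riak's JavaScript. `reduce_min` is a hypothetical stand-in, and the user IDs are made-up sample values.

```python
def reduce_min(values):
    """Return the smallest value wrapped in a list, so output can be fed back in."""
    return [min(values)] if values else []


# Made-up sample of mapped values (e.g. user IDs emitted by a map phase).
mapped = [2197, 5012, 3300, 8841, 2750]

# Reducing everything at once...
one_pass = reduce_min(mapped)

# ...must give the same answer as reducing two batches and re-reducing the results.
partials = reduce_min(mapped[:2]) + reduce_min(mapped[2:])
two_pass = reduce_min(partials)

print(one_pass, two_pass)
```

A reduce step such as a plain average would not survive this treatment unless it carried sums and counts through each pass, which is the usual thing to check before chaining reduce phases.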
Full Text Search
Built on Lucene
Make user content searchable
Make arbitrary keys queryable
“Just turn it on”
Hiccup: corrupt merge indexes
Query Example
Who’s our oldest user who’s shared something in the last minute?
    # The timestamp range covers a 60-second period.
    # Riak.reduceMin turns [[2],[5],[9],[13]] into [[2]].
    curl -XPOST http://localhost:8098/mapred \
      -H "Content-Type: application/json" \
      -d '{
        "inputs": {
          "bucket": "links",
          "query": "timestamp:[1346350877 TO 1346350937}"
        },
        "query": [
          {"map": {"language": "javascript",
                   "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
          {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
        ]
      }'
Result: [[2197]]
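For a rolling window rather than hard-coded timestamps, the same request body can be assembled programmatically. This is a sketch only: `build_mapred_job` is a hypothetical helper, the bucket name, field name, and range syntax are copied from the slide as-is, and the code only builds and prints the JSON without contacting a Riak node.

```python
import json


def build_mapred_job(bucket, start_ts, end_ts):
    """Build the MapReduce job from the slide for an arbitrary time window."""
    map_source = (
        "function(riakObject) { "
        "return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"
    )
    return {
        "inputs": {
            "bucket": bucket,
            # Range syntax kept exactly as on the slide; "}}" emits a literal "}".
            "query": f"timestamp:[{start_ts} TO {end_ts}}}",
        },
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}},
        ],
    }


start = 1346350877
job = build_mapred_job("links", start, start + 60)
print(json.dumps(job, indent=2))
```

In live use, `start` would presumably come from something like `int(time.time()) - 60`, and the serialized body would be POSTed to the `/mapred` endpoint shown in the curl command.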
Riak on EC2
In a Nutshell
EC2 specs poorly proportioned for leveldb
Multiple AZs in one location works well
Scale vertically for better latency & consistency
Scale horizontally for more throughput/$