NoSQL Findings
Presentation Transcript

  • NoSQL Findings
    Christian van der Leeden
    Thursday, September 23, 2010
  • Our problem
    • Growth is not linear and not predictable
    • e.g. the History::Session table now has > 30 million entries
    • Activities: > 26 million entries
    • Postgres will be the performance bottleneck
  • Criteria
    • Allow us to scale from 100k Daily Active Users (DAU) to 1 million DAU, up to 10 million DAU
    • Scale horizontally (“just add servers”)
    • Good Ruby performance
    • Good transition path from Rails/Postgres to Rails/NoSQL
    • Actively developed
  • Goal
    • Scores (at 10 million Daily Active Users)
      • 10 million scores/day ≈ 350 inserts/second (about 115/s on average; the higher figure presumably allows for peak traffic)
      • Around the same read rate for leaderboards
    • Game with 10 million players
      • Leaderboard with 10 million entries
    • Sessions (at 10 million DAU)
      • > 10 million session handshakes/day
  • Data Patterns
    • Most data is accessed time-based (the most recent data is accessed most often)
    • Write and read rates are roughly the same
    • Eventual consistency is good enough most of the time
  • Rating criteria
    • Type (document store, key/value store, Big Table)
    • Deployment: how easy is it to scale?
    • Existing installations: how big are the known installations?
    • Heritage and activity: where does the solution come from, how actively is it developed, and by whom?
  • Products evaluated
    • MongoDB
    • Redis
    • Cassandra
    • HBase
    • Membase
  • MongoDB
    • Document store
    • A “SQL DB” without relations
    • Easy transition with MongoMapper or Mongoid (see the sketch below)
    • Supports sharding over replica sets (since August 2010)
    • We haven’t found a big sharded server installation yet
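A minimal sketch of that transition path, assuming MongoMapper; the database name and the Score fields are taken from the leaderboard example later in the deck, everything else is illustrative.

      require 'mongo_mapper'

      MongoMapper.database = 'games'   # assumed database name

      # A document-backed model that feels much like an ActiveRecord class.
      class Score
        include MongoMapper::Document

        key :result,  Integer
        key :user_id, Integer
        key :game_id, Integer
      end

      Score.create!(:result => 100, :user_id => 52345, :game_id => 57142)
      Score.where(:game_id => 57142).sort(:result.desc).limit(10).all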
  • Experience with Mongo
    • Nice and easy to program with
    • Deployment woes we’ve encountered (1.6.0):
      • Segmentation fault
      • “cannot read: invalid BSON object”
      • Performance degradation once the index is larger than RAM (queries went from 20 ms to 200 ms)
    • The global write lock makes data migrations slow
  • Cassandra
    • Big Table data store
    • Developed by Facebook and actively maintained
    • Easy to set up and to add servers (peer-to-peer design)
    • The Thrift API from Ruby was slow in our tests (around 150 write ops/second)
    • The Avro API promises to be faster (will be an option in 0.7)
    • Used by Facebook
    • We are not using it because it is too slow from Ruby
  • Redis
    • Memcache with simple persistence
    • Supports many different data types and atomic operations on them
    • Sharding is done client-side (difficult to add new servers)
    • We’re using it for indexes on SQL data (a sketch follows below)
    • Very fast (our tests: 4,000 write operations/second)
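A minimal sketch of the “index on SQL data” idea, assuming the redis gem; the key name and the “sessions by last activity” index are illustrative, not taken from the slides.

      require 'redis'

      redis = Redis.new(:host => 'localhost', :port => 6379)

      # Index a Postgres row in a sorted set: score = epoch seconds, member = primary key.
      session_id = 42
      redis.zadd('sessions:by_last_activity', Time.now.to_i, session_id)

      # Ids of the 100 most recently active sessions; the rows themselves are then
      # loaded from Postgres with a plain "WHERE id IN (...)" query.
      recent_ids = redis.zrevrange('sessions:by_last_activity', 0, 99)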
  • HBase
    • Big Table database
    • Complex to set up and to maintain
    • Very often used for analytics jobs with Hadoop/Hive, e.g. on Amazon EC2 Elastic MapReduce
    • For analytics, also look at Scribe for data collection
  • Membase
    • Key-value store
    • A distributed, persistent Memcache (see the sketch below)
    • Easy to add nodes
    • Used by Zynga
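A minimal sketch, assuming Membase is addressed through its memcached-compatible interface; the memcached gem, the key name and the port are illustrative assumptions.

      require 'memcached'

      cache = Memcached.new('localhost:11211')

      # Store and read back a score hash by key (the gem marshals Ruby objects for us).
      cache.set('score:2563', { :result => 100, :user_id => 52345, :game_id => 57142 })
      score = cache.get('score:2563')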
  • Example: Leaderboards
    • User has many scores
    • Each score has one result (integer)
    • Game has many scores
    • Queries:
      • The leaderboard for one game
      • Insert one score into the leaderboard
      • What is my rank?
      • Give me 10 scores starting at position 100,000
  • SQL vs NoSQL
    • SQL: Think about the data. NoSQL: Think about the queries.
    • SQL: Redundancy is bad. NoSQL: Redundancy is OK.
    • SQL: Indexes are managed by the DB. NoSQL: Roll your own indexes, depending on your queries.
    • SQL: Query over relations and connecting entities. NoSQL: No joins.
    • SQL: Always exact results. NoSQL: Query results don’t have to return the latest write operation.
  • SQL vs NoSQL (continued)
    • SQL: Standardized query language and DDL. NoSQL: Some solutions share standards.
    • SQL: All DBs are “the same”. NoSQL: Many different approaches: document store, Big Table, key/value.
  • Postgres
    Schema: User 1:n Score n:1 Game

    • Create a new score:
      Score.new(attributes); Score.save
      => INSERT INTO scores ...;
    • What is my rank?
      SELECT count(*) FROM scores
        INNER JOIN games ON (games.id = scores.game_id)
        WHERE result > #{my_score.result} AND games.name = #{game_name};
    • Give me 10 scores in the leaderboard from position 100,000:
      SELECT * FROM scores
        INNER JOIN games ON (games.id = scores.game_id)
        ORDER BY result DESC OFFSET 100000 LIMIT 10;
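A rough ActiveRecord equivalent of the SQL above, assuming Rails 3 style query chaining; the associations and the added game-name filter on the paging query are assumptions.

      class Score < ActiveRecord::Base
        belongs_to :user
        belongs_to :game
      end

      # Insert one score.
      Score.create!(:result => 100, :user_id => 52345, :game_id => 57142)

      # My rank: how many scores in this game beat mine, plus one for a 1-based rank.
      rank = Score.joins(:game)
                  .where('games.name = ? AND scores.result > ?', game_name, my_score.result)
                  .count + 1

      # 10 scores starting at position 100,000, best first.
      page = Score.joins(:game)
                  .where('games.name = ?', game_name)
                  .order('scores.result DESC')
                  .offset(100_000)
                  .limit(10)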
  • Redis SortedSet
    Sorted set per game (key = game_name, score = result, member = score_id):
      "Jewels":      100 → <2563>, 99 → <96877>, 96 → <6752>, ...
      "Bug Landing": ...
      "Toss It":     ...
    Key-value store (key = score_id, value = marshalled score object):
      2563:  { result: 100, user_id: 52345, game_id: 57142 }
      96877: { result: 99,  user_id: 2541,  game_id: 57142 }
      6752:  { result: 96,  user_id: 3652,  game_id: 57142 }

    • New score:                        redis.zadd("Jewels", result, score_id)
    • My rank?                          redis.zrevrank("Jewels", score_id)
    • 10 scores from position 100,000:  redis.zrevrange("Jewels", 100000, 100009)
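Putting the slide together as a runnable sketch with the redis gem; the key names follow the slide, while the Marshal-based value store and the concrete ids are illustrative.

      require 'redis'

      redis = Redis.new

      # New score: member = score_id, sort score = result; details go into a plain key.
      redis.zadd('Jewels', 100, 2563)
      redis.set('score:2563', Marshal.dump({ :result => 100, :user_id => 52345, :game_id => 57142 }))

      # My rank (0-based position, best score first).
      rank = redis.zrevrank('Jewels', 2563)

      # 10 score ids starting at position 100,000, then the marshalled score objects.
      ids    = redis.zrevrange('Jewels', 100_000, 100_009)
      scores = ids.map { |id| Marshal.load(redis.get("score:#{id}")) }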
  • Mongo
    Collection "scores":
      { _id: 2563,  result: 100, user_id: 52345, game_id: 57142 }
      { _id: 96877, result: 99,  user_id: 2541,  game_id: 57142 }
      { _id: 6752,  result: 96,  user_id: 3652,  game_id: 57142 }

    • New score:
      Score.create!(attributes)
      db.scores.insert({ result: 100, user_id: 52345, game_id: 57142 })
    • What is my rank?
      db.scores.count({ result: { $gt: #{my_score.result} } })
    • 10 scores from position 100,000:
      db.scores.find({}).sort({ result: -1 }).skip(100000).limit(10)
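The same queries through the Ruby mongo driver, as a minimal sketch; the connection details are assumptions, the field names follow the slide.

      require 'mongo'

      scores = Mongo::Connection.new('localhost', 27017).db('games').collection('scores')

      # New score.
      scores.insert(:result => 100, :user_id => 52345, :game_id => 57142)

      # My rank: count of scores strictly greater than mine.
      rank = scores.find(:result => { '$gt' => 100 }).count

      # 10 scores from position 100,000, best first.
      page = scores.find({}, :sort => [['result', -1]], :skip => 100_000, :limit => 10).to_a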
  • Cassandra
    ColumnFamily Leaderboards (row_key = game_name):
      "Jewels":      100: 2563, 99: 96877, 96: 6752
      "Bug Landing": ...
      "Toss It":     ...
    ColumnFamily Scores (row_key = score_id):
      2563:  { game_id: 57142, result: 100, user_id: 6325 }
      96877: { game_id: 57142, result: 99,  user_id: 2375 }
      6752:  { game_id: 57142, result: 96,  user_id: 2311 }
  • Cassandra (continued)
    (Leaderboards column family as on the previous slide)

    • Insert a new score:
      client.insert("ScoreList", "Jewels", result => score_id)
      client.insert("Scores", score_id, :result => result, :user_id => user_id, :game_id => game_id)
    • What is my rank?
      => not easy, needs help from other tools
    • Give me the next 10 scores starting at score X:
      client.get("ScoreList", "Jewels", :start => X.result, :count => 10)
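A minimal sketch with the cassandra Rubygem (the Thrift client referred to above); the keyspace name, server address and string-typed columns are assumptions, and a real leaderboard column family would use a numeric column comparator so the result columns sort correctly.

      require 'cassandra'

      client = Cassandra.new('Games', '127.0.0.1:9160')   # assumed keyspace and server

      # Insert a new score: one column in the per-game row, plus a row with the details.
      client.insert(:ScoreList, 'Jewels', { '100' => '2563' })
      client.insert(:Scores, '2563',
                    { 'result' => '100', 'user_id' => '52345', 'game_id' => '57142' })

      # The next 10 score columns starting at a given result.
      next_scores = client.get(:ScoreList, 'Jewels', :start => '96', :count => 10)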
  • Findings
    • Use and test the tools you want to use at the scale you are going to use them
    • There is no “best NoSQL” solution
    • Mix and match the tools you need
    • NoSQL requires a lot of rethinking and changes to your Ruby code
  • Links
    • Cassandra: http://cassandra.apache.org/
    • Cassandra API: http://wiki.apache.org/cassandra/API
    • Twitter on Cassandra: http://github.com/ericflo/twissandra
    • Redis: http://code.google.com/p/redis/
    • Redis API: http://code.google.com/p/redis/wiki/CommandReference
    • Membase: http://www.membase.org/
    • HBase: http://hbase.apache.org/
    • Scribe: http://github.com/facebook/scribe
    • Mongo: http://www.mongodb.org/