NoSQL Findings

Transcript

  • 1. NoSQL Findings
    Christian van der Leeden
    Thursday, September 23, 2010
  • 2. Our problem
    • Growth is neither linear nor predictable
      • e.g. the History::Session table now has > 30 million entries
      • Activities: > 26 million entries
    • Postgres will become the performance bottleneck
  • 3. Criteria
    • Allow us to scale from 100k Daily Active Users (DAU) to 1 million DAU, and up to 10 million DAU
    • Scale horizontally ("just add servers")
    • Good Ruby performance
    • Smooth transition from Rails/Postgres to Rails/NoSQL
    • Actively developed
  • 4. Goal
    • Scores (at 10 million Daily Active Users)
      • 10 million scores/day ≈ 116 inserts/second on average (the 350 inserts/second target corresponds to that volume arriving within roughly 8 peak hours)
      • About the same read rate for leaderboards
      • A game with 10 million players
      • A leaderboard with 10 million entries
    • Sessions (at 10 million DAU)
      • > 10 million session handshakes/day
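A quick check of the rate figure, as a minimal Ruby sketch. Spread evenly over 24 hours, 10 million scores per day is about 116 inserts/second; 350 inserts/second matches the same volume arriving within roughly 8 hours. The 8-hour peak window is an assumption used for illustration; the slides do not say which window the target is based on.

```ruby
# Back-of-envelope insert rates for 10 million scores/day.
# ASSUMPTION: the "peak" case squeezes the daily volume into 8 hours;
# the slides do not state the window behind the 350/second target.
SCORES_PER_DAY = 10_000_000

# Rate if `per_day` events arrive evenly over `hours` hours.
def inserts_per_second(per_day, hours)
  per_day.fdiv(hours * 3600)
end

average = inserts_per_second(SCORES_PER_DAY, 24) # ~115.7
peak    = inserts_per_second(SCORES_PER_DAY, 8)  # ~347.2

puts format("average over 24h: %.0f inserts/s", average) # prints "average over 24h: 116 inserts/s"
puts format("peak over 8h: %.0f inserts/s", peak)        # prints "peak over 8h: 347 inserts/s"
```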
  • 5. Data patterns
    • Most data is accessed by time (the most recent data is accessed most often)
    • Write and read rates are about the same
    • Eventual consistency is good enough most of the time
  • 6. Rating criteria
    • Type (document store, key/value store, Bigtable-style)
    • Deployment
      • How easy is it to scale?
    • Existing installations
      • How big are the known installations?
    • Heritage and activity
      • Where does the solution come from, who develops it, and how actively?
  • 7. Products evaluated
    • MongoDB
    • Redis
    • Cassandra
    • HBase
    • Membase
  • 8. MongoDB
    • Document store
    • Like an "SQL DB" without relations
    • Easy transition with MongoMapper or Mongoid
    • Supports sharding over replica sets (since August 2010)
    • We haven't found a large sharded installation
  • 9. Experience with Mongo
    • Nice and easy to program with
    • Deployment woes we've encountered (1.6.0):
      • Segmentation faults
      • Reads failing with "invalid BSON object"
      • When the index is larger than RAM, performance degrades (queries went from 20 ms to 200 ms)
    • The global write lock makes data migrations slow
  • 10. Cassandra
    • Bigtable-style data store
    • Originally developed by Facebook; actively maintained
    • Easy to set up and to add servers to (peer-to-peer design)
    • The Thrift API from Ruby was slow in our tests (around 150 write ops/second)
    • The Avro API promises to be faster (will be an option in 0.7)
    • Used by Facebook
    • We're not using it because it is too slow from Ruby
  • 11. Redis
    • Like memcached, with simple persistence
    • Supports many different data types and atomic operations on them
    • Sharding is done client-side (hard to add new servers)
    • We're using it for indexes on SQL data
    • Very fast (our tests: 4,000 write operations/second)
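Why client-side sharding makes adding servers hard: if the client picks a server with a simple hash-modulo rule, changing the number of servers changes the target server of most keys, and those keys must then be migrated. A minimal Ruby sketch, assuming a hypothetical `server_for` helper that uses CRC32 modulo the server count; real Redis client libraries may place keys differently.

```ruby
require "zlib"

# HYPOTHETICAL client-side sharding rule: hash the key with CRC32 and take
# the remainder modulo the number of servers. For illustration only.
def server_for(key, server_count)
  Zlib.crc32(key) % server_count
end

keys = (1..10_000).map { |i| "score:#{i}" }

# Keys whose target server changes when a 4th server joins a 3-server setup.
moved = keys.count { |key| server_for(key, 3) != server_for(key, 4) }
moved_fraction = moved.fdiv(keys.size)

puts format("%.1f%% of keys move from 3 to 4 servers", moved_fraction * 100)
```

With evenly distributed hashes about three quarters of the keys move, because a key stays put only when its hash leaves the same remainder modulo 3 and modulo 4. Consistent hashing reduces the share of moved keys, but some data still has to move whenever nodes are added.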
  • 12. HBase
    • Bigtable-style database
    • Complex to set up and maintain
    • Very often used for analytics jobs with Hadoop/Hive, e.g. on Amazon Elastic MapReduce (EC2)
    • For analytics, also look at Scribe for data collection
  • 13. Membase
    • Key-value store
    • Distributed, persistent memcached
    • Easy to add nodes
    • Used by Zynga
  • 14. Example: leaderboards
    • A user has many scores
    • Each score has one result (an integer)
    • A game has many scores
    • Queries on the leaderboard for one game:
      • Insert one score into the leaderboard
      • What is my rank?
      • Give me 10 scores starting at position 100,000
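Before mapping these queries onto a particular store, it helps to pin down their exact semantics. A minimal Ruby sketch of a brute-force, in-memory leaderboard; the `Leaderboard` class, the tie-breaking by id, and defining rank as 1 + the number of strictly better results in the same game are assumptions for illustration.

```ruby
# Brute-force reference model of the leaderboard queries.
# HYPOTHETICAL helper: useful as a correctness oracle when testing a real
# store, far too slow for 10 million entries.
Score = Struct.new(:id, :user_id, :game_id, :result)

class Leaderboard
  def initialize
    @scores = []
  end

  # Insert one score.
  def insert(score)
    @scores << score
  end

  # Rank = 1 + number of scores in the same game with a strictly higher
  # result, so equal results share a rank.
  def rank(score)
    1 + @scores.count { |s| s.game_id == score.game_id && s.result > score.result }
  end

  # `count` scores of one game, best first, after skipping `offset` of them.
  # Ties are ordered by id (an assumption) so paging is deterministic.
  def page(game_id, offset, count)
    @scores.select { |s| s.game_id == game_id }
           .sort_by { |s| [-s.result, s.id] }
           .drop(offset)
           .first(count)
  end
end

# Usage with the sample scores from the slides (all in game 57142).
board = Leaderboard.new
board.insert(Score.new(2563, 52345, 57142, 100))
board.insert(Score.new(96877, 2541, 57142, 99))
mine = Score.new(6752, 3652, 57142, 96)
board.insert(mine)

board.rank(mine)                    # => 3
board.page(57142, 1, 10).map(&:id)  # => [96877, 6752]
```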
  • 15. SQL vs NoSQL
      SQL                                           | NoSQL
      Think about data                              | Think about queries
      Redundancy is bad                             | Redundancy is OK
      Indexes are managed by the DB                 | Roll your own indexes, depending on queries
      Query over relations and connecting entities  | No joins
      Always exact results                          | Query results don't have to reflect the latest write
  • 16. SQL vs NoSQL
      SQL                                  | NoSQL
      Standardized query language and DDL  | Some solutions share standards
      All DBs are “the same”               | Many different approaches: document store, Bigtable, key-value
  • 17. Postgres
    • Model: User 1:n Score n:1 Game
    • Create a new score:
        score = Score.new(attributes)
        score.save   # => insert into scores ...
    • What is my rank? ($1 = my_score.result, $2 = game_name, passed as bind parameters)
        select count(*) + 1 from scores
          inner join games on (games.id = scores.game_id)
          where scores.result > $1 and games.name = $2;
    • Give me 10 scores in the leaderboard from position 100000
        select scores.* from scores
          inner join games on (games.id = scores.game_id)
          where games.name = $1
          order by scores.result desc
          offset 100000 limit 10;
  • 18. Redis SortedSet
    • One sorted set per game: key: game_name, score: result, value: score_id
        key "Jewels":       100 <2563>, 99 <96877>, 96 <6752>, ...
        key "Bug Landing":  ...
        key "Toss It":      ...
    • New score:
        redis.zadd("Jewels", result, score_id)
    • My rank? (zrevrank takes the member, i.e. the score_id, and is 0-based)
        redis.zrevrank("Jewels", score_id) + 1
    • 10 scores from position 100000 (start and stop are inclusive indexes)
        redis.zrevrange("Jewels", 100000, 100009)
    • Key-value store for the scores: key: score_id, value: marshalled score object
        2563:  { result: 100, user_id: 52345, game_id: 57142 }
        96877: { result: 99, user_id: 2541, game_id: 57142 }
        6752:  { result: 96, user_id: 3652, game_id: 57142 }
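Two of the sorted-set commands are easy to misuse: ZREVRANK looks up a member (the score id) and returns a 0-based position, and ZREVRANGE takes inclusive start and stop indexes rather than an offset and a count. A minimal in-memory Ruby sketch of those semantics; `FakeSortedSet` is a hypothetical stand-in for illustration, not a Redis client.

```ruby
# In-memory stand-in for one Redis sorted set, written only to illustrate
# the semantics of ZADD, ZREVRANK and ZREVRANGE. HYPOTHETICAL class; it is
# not the redis-rb client (whose calls also take the key, e.g. "Jewels").
class FakeSortedSet
  def initialize
    @scores = {} # member (stored as a String, like Redis) => score
  end

  # ZADD: add a member or update its score.
  def zadd(score, member)
    @scores[member.to_s] = score
  end

  # ZREVRANK: 0-based position in descending score order, nil if absent.
  # It takes the member (score_id), not the score (result).
  def zrevrank(member)
    descending.index(member.to_s)
  end

  # ZREVRANGE: start and stop are both INCLUSIVE indexes.
  def zrevrange(start, stop)
    descending[start..stop] || []
  end

  private

  # Highest score first; equal scores ordered by member, reversed.
  def descending
    @scores.sort_by { |member, score| [score, member] }.reverse.map(&:first)
  end
end

jewels = FakeSortedSet.new
jewels.zadd(100, 2563)
jewels.zadd(99, 96877)
jewels.zadd(96, 6752)

jewels.zrevrank(6752) + 1  # => 3, i.e. third place

# A page of `count` entries starting at `offset` needs stop = offset + count - 1.
offset, count = 1, 10
jewels.zrevrange(offset, offset + count - 1)  # => ["96877", "6752"]
jewels.zrevrange(2, 1)  # => [] (stop < start: a count passed as "stop" returns nothing)
```

With redis-rb, the same page rule gives `redis.zrevrange("Jewels", 100_000, 100_009)` for 10 scores starting at position 100,000.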
  • 19. Mongo
    • Collection: scores
        { _id: 2563,  result: 100, user_id: 52345, game_id: 57142 }
        { _id: 96877, result: 99,  user_id: 2541,  game_id: 57142 }
        { _id: 6752,  result: 96,  user_id: 3652,  game_id: 57142 }
    • New score
        Score.create!(attributes)
        db.scores.insert({ result: 100, user_id: 52345, game_id: 57142 })
    • What is my rank?
        db.scores.count({ game_id: 57142, result: { $gt: my_score.result } }) + 1
    • 10 scores from position 100000
        db.scores.find({ game_id: 57142 }).sort({ result: -1 }).skip(100000).limit(10)
  • 20. Cassandra
    • Column family Leaderboards (row_key: game_name)
        row_key "Jewels":       100: 2563, 99: 96877, 96: 6752, ...
        row_key "Bug Landing":  ...
        row_key "Toss It":      ...
    • Column family Scores (row_key: score_id)
        row_key 2563:   game_id: 57142, result: 100, user_id: 6325
        row_key 96877:  game_id: 57142, result: 99, user_id: 2375
        row_key 6752:   game_id: 57142, result: 96, user_id: 2311
        ...
  • 21. Cassandra
    • Column family Leaderboards, row_key: game_name
        row_key "Jewels":       100: 2563, 99: 96877, 96: 6752, ...
        row_key "Bug Landing":  ...
        row_key "Toss It":      ...
    • Insert new score:
        client.insert("Leaderboards", "Jewels", result => id)
        client.insert("Scores", id, :result => result, :user_id => user_id, :game_id => game_id)
    • What is my rank? Not easy; needs help from other tools
    • Give me the next 10 scores starting at score X:
        client.get("Leaderboards", "Jewels", :start => X.result, :count => 10)
      (columns are sorted in ascending order of result, so a best-first leaderboard has to read the row in reverse)
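One consequence of using the result as the column name: two scores with the same result in the same game map to the same column, and the later insert replaces the earlier one. A minimal Ruby sketch that models a row as a Hash; the model, the extra score id 4242, and the composite-name workaround are illustrative assumptions, not the layout the slides used. In real Cassandra such a name would have to be encoded so the column comparator sorts it by result first; that detail is left out here.

```ruby
# HYPOTHETICAL model of one Cassandra row in the Leaderboards column family:
# a Hash of column name => value. Column names within a row are unique, so a
# Hash captures the overwrite behaviour. This is not the cassandra gem's API.

# (result, score_id) pairs for one game; 4242 is a made-up score id that
# ties with 96877 on result 99.
writes = [[100, 2563], [99, 96877], [99, 4242]]

# Slide layout: column name = result, value = score_id.
by_result = {}
writes.each { |result, score_id| by_result[result] = score_id }
# by_result == { 100 => 2563, 99 => 4242 }: score 96877 was silently replaced.

# One way out: a column name that is unique per score and still sorts by
# result first, e.g. [result, score_id].
by_composite = {}
writes.each { |result, score_id| by_composite[[result, score_id]] = score_id }

# Columns come back sorted by name (ascending), so reverse for best-first.
best_first = by_composite.keys.sort.reverse.map(&:last)
# best_first == [2563, 96877, 4242]
```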
  • 22. Findings
    • Use and test the tools you want to use at the scale you are going to use them
    • There is no "best NoSQL" solution
    • Mix and match the tools you need
    • NoSQL requires a lot of rethinking and changes to your Ruby code
  • 23. Links
    • Cassandra: http://cassandra.apache.org/
    • Cassandra API: http://wiki.apache.org/cassandra/API
    • Twitter clone on Cassandra (Twissandra): http://github.com/ericflo/twissandra
    • Redis: http://code.google.com/p/redis/
    • Redis API: http://code.google.com/p/redis/wiki/CommandReference
    • Membase: http://www.membase.org/
    • HBase: http://hbase.apache.org/
    • Scribe: http://github.com/facebook/scribe
    • Mongo: http://www.mongodb.org/