NoSQL Findings
Christian van der Leeden

Thursday, September 23, 2010

Our problem
• Growth is not linear and not predictable

• e.g. History::Session table now > 30 million entries

• Activities > 26 million entries

• Postgres will be the performance bottleneck


Criteria
• Allow us to scale from 100k Daily Active Users (DAU)
to 1 million DAU and up to 10 million DAU

• Scale horizontally (“Just add servers”)

• Good ruby performance

• Good transition from Rails/Postgres -> Rails/NoSQL

• Actively developed


Goal
• Scores (@ 10 million Daily Active Users)

• 10 million scores/day, planned as 350 inserts/second

• around the same read rate for leaderboards

• Game with 10 million players

• Leaderboard with 10 million entries

• Session (@ 10 million DAU)

• > 10 million session handshakes/day
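
As a rough check on the load figures above: 10 million events spread evenly over a day come to about 116 per second, so the 350 inserts/second target presumably leaves headroom for peak hours. That reading is an assumption; the slides do not say how the figure was derived. A small Ruby sketch of the arithmetic, with constant and method names made up for illustration:

```ruby
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

# Average events per second if the daily total were spread evenly.
def average_rate(events_per_day)
  events_per_day / SECONDS_PER_DAY.to_f
end

scores_per_day = 10_000_000
target = 350 # inserts/second, the figure from the slide

avg = average_rate(scores_per_day)
puts "average: #{avg.round(1)} inserts/second"
puts "target is #{(target / avg).round(1)}x the average"
```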


Data Patterns
• Most data is accessed time based (the most recent
data is accessed the most often)

• Write and read rates are about the same

• Eventual consistency is good enough most of the
time


Rating criteria
• Type (Document Store, Key/Value Store, Big Table)

• Deployment

• How easy is it to scale?

• Existing installations

• How big are known installations?

• Heritage and activity

• Where does the solution come from and how actively is it
developed by whom?


Products evaluated
• MongoDB

• Redis

• Cassandra

• HBase

• Membase


MongoDB
• document store

• “SQL DB” without relations

• easy transition with MongoMapper, Mongoid

• supports sharding over replica sets (since August
2010)

• Haven’t found a big sharded server installation


Experience with Mongo
• nice/easy to program with

• deployment woes we’ve encountered (1.6.0)

• segmentation fault

• cannot read because: invalid BSON object

• when the index is larger than RAM, performance degrades
(queries go from 20 ms to 200 ms)

• Global write lock makes data migrations slow


Cassandra
• Big Table data store

• Was developed by Facebook and is actively maintained

• Easy to add servers and to set up (peer-to-peer concept)

• Thrift API to Ruby was slow in tests (Our tests: around 150 write
ops/second)

• Avro API promises to be faster (will be an option in 0.7)

• Used by Facebook

• Not using it because it is too slow with ruby


Redis
• Memcache with simple persistence

• Supports many different data types and atomic
operations on them

• Sharding is done client side (difficult to add new
servers)

• We’re using it for indexes on SQL data

• Very fast (Our tests: 4000 write operations/second)
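
To illustrate why client-side sharding makes adding servers difficult, here is a minimal sketch that assumes the client picks a server by hashing the key modulo the server count. The slides do not say which scheme was used, and the server names and `shard_for` helper are hypothetical. With this scheme, most keys map to a different server as soon as one node is added, so the data would have to be moved.

```ruby
require "zlib"

# Pick a server by hashing the key and taking it modulo the server count.
def shard_for(key, servers)
  servers[Zlib.crc32(key) % servers.size]
end

servers_before = %w[redis-a redis-b redis-c]
servers_after  = servers_before + %w[redis-d]

keys = (1..10_000).map { |i| "score:#{i}" }

# Count keys whose server changes once a fourth node is added.
moved = keys.count do |key|
  shard_for(key, servers_before) != shard_for(key, servers_after)
end

puts "#{moved} of #{keys.size} keys map to a different server"
```

Schemes such as consistent hashing reduce the number of keys that move, but they are outside what the slides cover.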


HBase
• Big Table Database

• Complex to set up and to maintain

• Very often used for analytics jobs with Hadoop/Hive,
e.g. on Amazon Elastic MapReduce

• For Analytics also look at Scribe for data collection


Membase
• Key-Value Store

• Distributed, persistent Memcache

• Easy to add nodes

• Used by Zynga


Example Leaderboards
• User has many scores

• Each score has one result (integer)

• Game has many scores

• Query: the leaderboard for one game

• Insert one score into the leaderboard

• What is my rank?

• Give me 10 scores starting at position 100,000


SQL vs NoSQL
SQL:
• Think about data
• Redundancy is bad
• Indexes are managed by the DB
• Query over relations
• Always exact results

NoSQL:
• Think about queries
• Redundancy is ok
• Roll your own indexes depending on queries
• No joins and connecting entities
• Query results don’t have to return latest write operation
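
A minimal sketch of what "roll your own indexes" and "redundancy is ok" can look like in application code. Plain Ruby Hashes stand in for a key-value store, and the `ScoreStore` class and its method names are hypothetical, not part of any library. The point is that each write updates the record and every index built for a known query.

```ruby
# Plain Hashes stand in for a key-value store; no real database is involved.
class ScoreStore
  def initialize
    @scores  = {}                             # score_id => attributes
    @by_game = Hash.new { |h, k| h[k] = [] }  # game_id  => [score_id, ...]
  end

  # Every write updates the record and each hand-made index built on it.
  def save(id, attrs)
    @scores[id] = attrs
    @by_game[attrs[:game_id]] << id
  end

  # The "query" is a lookup in the index followed by fetches by key.
  def scores_for_game(game_id)
    @by_game.fetch(game_id, []).map { |id| @scores[id] }
  end
end

store = ScoreStore.new
store.save(2563,  result: 100, user_id: 52345, game_id: 57142)
store.save(96877, result: 99,  user_id: 6325,  game_id: 57142)
store.save(1,     result: 5,   user_id: 7,     game_id: 1)

p store.scores_for_game(57142).map { |s| s[:result] } # => [100, 99]
```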


SQL vs NoSQL
SQL:
• Standardized query language and DDL
• All DBs are “the same”

NoSQL:
• Some solutions share standards
• Many different approaches
  • Document store
  • Big Table
  • Key Value


Postgres
User (1) --- (n) Score (n) --- (1) Game

• Create new score:
score = Score.new(attributes)
score.save   # => insert into scores ...;

• What is my rank?
select count(*) from scores inner join games on (games.id = scores.game_id)
where result > #{my_score.result} and games.name = '#{game_name}';

• Give me 10 scores in leaderboard from position 100000
select * from scores inner join games on (games.id = scores.game_id)
where games.name = '#{game_name}'
order by result desc
offset 100000 limit 10;


Redis
SortedSet
key: game_name
score: result
value: score_id

key: "Jewels"
    100       99        96
    <2563>    <96877>   <6752>
...
key: "Bug Landing"
key: "Toss It"
...

KeyValue Store
key: score_id
value: marshalled score object
2563: { result : 100, user_id : 52345, game_id: 57142 }

• New Score
redis.zadd("Jewels", result, score_id)

• My Rank?
redis.zrevrank("Jewels", score_id)

• 10 scores from position 100000
redis.zrevrange("Jewels", 100000, 100009)
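
The sorted-set semantics can be tried without a Redis server using a small in-memory stand-in. `TinySortedSet` below is a hypothetical class written for this illustration, not the redis gem. It mirrors two details of the real commands: ZREVRANK looks up a member (here the score_id) rather than a score, and ZREVRANGE takes inclusive start and stop indexes, so ten entries from position 100000 are the range 100000 to 100009.

```ruby
# Hypothetical in-memory stand-in for a Redis sorted set; not the redis gem.
class TinySortedSet
  def initialize
    @scores = {} # member => score
  end

  # Like ZADD: add a member or update its score.
  def zadd(score, member)
    @scores[member] = score
  end

  # Like ZREVRANK: 0-based position from the top, nil if the member is absent.
  def zrevrank(member)
    ranked.index(member)
  end

  # Like ZREVRANGE: start and stop are both inclusive indexes.
  def zrevrange(start, stop)
    ranked[start..stop] || []
  end

  private

  # Members from highest to lowest score. Ties are broken by member only to
  # keep this sketch deterministic; Redis applies its own tie ordering.
  def ranked
    @scores.sort_by { |member, score| [-score, member.to_s] }.map(&:first)
  end
end

jewels = TinySortedSet.new
jewels.zadd(100, 2563)
jewels.zadd(99, 96877)
jewels.zadd(96, 6752)

p jewels.zrevrank(96877)  # => 1
p jewels.zrevrange(0, 1)  # => [2563, 96877]
```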


Mongo
Collection

key: Scores

{ _id: 2563, result : 100, user_id : 52345, game_id: 57142 }

• New Score
Score.create!(attributes)
db.scores.insert( { result: 100, user_id: 52345, game_id: 57142 } )

• My Rank?
db.scores.count( { game_id: 57142, result: { $gt: #{my_score.result} } } )

• 10 scores from position 100000
db.scores.find( { game_id: 57142 } ).sort( { result: -1 } ).skip(100000).limit(10)


Cassandra
ColumnFamily: Leaderboards
row_key: game_name

row_key: "Jewels"
    100: 2563    99: 96877    96: 6752
row_key: "Bug Landing"
row_key: "Toss It"
...

ColumnFamily: Scores
row_key: score_id

row_key: 2563
    game_id: 57142    result: 100    user_id: 6325
row_key: 96877
row_key: 6752
...


Cassandra
ColumnFamily: Leaderboards
row_key: game_name

row_key: "Jewels"
    100: 2563    99: 96877    96: 6752
row_key: "Bug Landing"
row_key: "Toss It"
...

• Insert new score:
client.insert("Leaderboards", "Jewels", { result => id })
client.insert("Scores", id, { :result => result, :user_id => user_id, :game_id => game_id })

• My rank?
=> not easy, need help from other tools

• Give me the next 10 scores starting at score X
client.get("Leaderboards", "Jewels", :start => X.result, :count => 10)
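
One thing to watch in the Leaderboards layout above: column names are unique within a row, so using the bare result as the column name means two scores with the same result overwrite each other. A common workaround, not shown in the slides, is a composite column name such as result plus score_id. The sketch below models a row as a Ruby Hash to show the effect; it is not a Cassandra client, and the score id 4711 is made up.

```ruby
# A Hash stands in for one Cassandra row: column name => column value.
# This is a model for illustration, not a Cassandra client.
row_by_result = {}
row_by_result[100] = 2563
row_by_result[100] = 4711 # same result, same column name: replaces 2563

# A composite column name [result, score_id] keeps both entries.
row_by_pair = {}
row_by_pair[[100, 2563]] = ""
row_by_pair[[100, 4711]] = ""
row_by_pair[[99, 96877]] = ""

# Columns in a row are kept sorted by name; emulate a reversed slice of 2.
top_two = row_by_pair.keys.sort.reverse.first(2)

p row_by_result.size # => 1
p top_two            # => [[100, 4711], [100, 2563]]
```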


Findings
• Test the tools you want to use at the scale
at which you are going to use them

• There is no “Best NoSQL” solution

• Mix and match the tools you need

• NoSQL requires a lot of rethinking and change in
your Ruby Code.


Links
• Cassandra: http://cassandra.apache.org/

• Cassandra API: http://wiki.apache.org/cassandra/API

• Twitter on Cassandra: http://github.com/ericflo/twissandra

• Redis: http://code.google.com/p/redis/

• Redis API: http://code.google.com/p/redis/wiki/CommandReference

• Membase: http://www.membase.org/

• HBase: http://hbase.apache.org/

• Scribe: http://github.com/facebook/scribe

• Mongo: http://www.mongodb.org/

