Scaling the Web: Databases & NoSQL

Richard Schneeman Wed Nov 10
@schneems works for @Gowalla 2011

whoami
• @Schneems
• BSME with Honors from Georgia Tech
• 5+ years experience with Ruby & Rails
• Work for @Gowalla
• Rails 3.1 contributor : )
• 3+ years technical teaching

Compounding Traffic
ex. Wikipedia

Gowalla
• 50 best websites NYTimes 2010
• Founded 2009 @ SXSW
• 1 million+ Users
• Undisclosed Visitors
• Loves/highlights/comments/stories/guides
• Facebook/Foursquare/Twitter integration
• iPhone/Android/web apps
• public API

Gowalla Backend
• Ruby on Rails
• Uses the Ruby Language
• Rails is the Framework

The Web is Data
• Username => String
• Birthday => Int/ Int/ Int
• Blog Post => Text
• Image => Binary-file/blob

Data needs to be stored
to be useful

Gowalla Database
• PostgreSQL
• Relational (RDBMS)
• Open Source
• Competitor to MySQL
• ACID compliant
• Running on a Dedicated Managed Server

Need for Speed
• Throughput:
• The number of operations per minute that
can be performed

• Pure Speed:
• How long an individual operation takes.

Potential Problems
• Hardware
• Slow Network
• Slow hard-drive
• Insufficient CPU
• Insufficient RAM
• Software
• too many Reads
• too many Writes

Scaling Up versus Out
• Scale Up:
• More CPU, Bigger HD, More Ram etc.
• Scale Out:
• More machines
• More machines
• More machines
• ...

Scale Up
• Bigger faster machine
• More Ram
• More CPU
• Bigger ethernet bus
• ...
• Moore's Law
• Diminishing returns

Scale Out
• Forget Moore's Law...
• Add more nodes
• Master/ Slave Database
• Sharding

Master/Slave
• Write => Master DB
• Master DB => Copy => Slave DB, Slave DB, Slave DB, Slave DB
• Read <= Slave DBs

Master & Slave +/-
• Pro
• Increased read speed
• Takes read load off of master
• Allows us to Join across all tables
• Con
• Doesn’t buy increased write throughput
• Single Point of Failure in Master Node
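
A toy sketch of the master/slave idea, with plain Ruby Hashes standing in for databases. The class name `ReplicatedStore` and the round-robin read policy are assumptions for illustration. The copy step here is synchronous, whereas real replication is usually asynchronous.

```ruby
# Toy model: every "database" is just a Hash.
class ReplicatedStore
  def initialize(slave_count)
    @master = {}
    @slaves = Array.new(slave_count) { {} }
    @next_slave = 0
  end

  # All writes go to the single master, then get copied to every slave.
  def write(key, value)
    @master[key] = value
    @slaves.each { |slave| slave[key] = value }
  end

  # Reads rotate across the slaves (round robin), sparing the master.
  def read(key)
    slave = @slaves[@next_slave]
    @next_slave = (@next_slave + 1) % @slaves.size
    slave[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding slaves spreads reads further, but every write still passes through the one master, which matches the cons listed above.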

Sharding
• Shards: Users in USA, Users in Europe, Users in Asia, Users in Africa
• Write and Read both go to the shard that holds the user

Sharding +/-
• Pro
• Increased Write & Read throughput
• No Single Point of failure
• Individual features can fail
• Con
• Cannot Join queries between shards
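
A toy sketch of sharding by region, again with Hashes standing in for databases. The class name `ShardedUsers` and the second user are hypothetical.

```ruby
# Toy model: one Hash per shard, chosen by the user's region.
class ShardedUsers
  def initialize(regions)
    @shards = regions.to_h { |region| [region, {}] }
  end

  # Writes touch only the shard that owns the region.
  def save(region, id, name)
    shard_for(region)[id] = name
  end

  # Reads go to that same shard and never look at the others.
  def find(region, id)
    shard_for(region)[id]
  end

  private

  def shard_for(region)
    @shards.fetch(region)
  end
end

users = ShardedUsers.new(%w[USA Europe Asia Africa])
users.save("USA", 3, "schneems")
users.save("Europe", 7, "hypothetical_user")

puts users.find("USA", 3)
# The Asia shard knows nothing about USA users.
p users.find("Asia", 3)
```

Each shard takes only its own share of reads and writes, but a question that spans regions has to be answered by the application, one shard at a time.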

What is a Database?
• Relational Database Management System
(RDBMS)
• Stores Data Using Schema
• A.C.I.D. compliant
• Atomic
• Consistent
• Isolated
• Durable

RDBMS
• Relational
• Matches data on common characteristics
in data
• Enables “Join” & “Union” queries
• Makes data modular

Relational +/-
• Pros
• Data is modular
• Highly flexible data layout
• Cons
• Getting desired data can be tricky
• Over modularization leads to many join
queries
• Trade off performance for search-ability

Schema Storage
• Blueprint for data storage
• Break data into tables/columns/rows
• Give data types to your data
• Integer
• String
• Text
• Boolean
• ...
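
A minimal sketch of what a schema buys you, written in plain Ruby rather than in a real database. The column names are hypothetical. A real RDBMS enforces this for you on every insert.

```ruby
# Hypothetical schema for a users table: column name => allowed Ruby types.
user_schema = {
  id:       Integer,
  username: String,
  bio:      String,                     # a Text column is also a String in Ruby
  admin:    [TrueClass, FalseClass]     # Boolean
}

# A row is valid only if every column holds a value of an allowed type.
valid_row = lambda do |row|
  user_schema.all? do |column, types|
    Array(types).any? { |type| row[column].is_a?(type) }
  end
end

good_row = { id: 3, username: "schneems", bio: "Rubyist", admin: false }
bad_row  = { id: "3", username: "schneems", bio: "Rubyist", admin: false }

puts valid_row.call(good_row)   # true
puts valid_row.call(bad_row)    # false, id is a String
```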

Schema +/-
• Pros
• Regularize our data
• Helps keep data consistent
• Converts to programming “types” easily
• Cons
• Must separately manage schema
• Adding columns & indexes to existing
large tables can be painful & slow

ACID
• Properties that guarantee database
transactions are processed reliably

• Atomic
• Consistent
• Isolated
• Durable

ACID
• Atomic
• Any database Transaction is all or nothing.
• If one part of the transaction fails it all fails

“An Incomplete Transaction Cannot Exist”
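
A toy sketch of all or nothing, using a Hash of balances instead of a real database. The account names and amounts are made up. A real RDBMS does this with BEGIN, COMMIT and ROLLBACK.

```ruby
accounts = { "alice" => 100, "bob" => 0 }

# Work on a copy and only swap it in if every step succeeded.
transfer = lambda do |from, to, amount|
  working_copy = accounts.dup
  working_copy[from] -= amount
  raise ArgumentError, "insufficient funds" if working_copy[from] < 0
  working_copy[to] += amount
  accounts.replace(working_copy)   # "commit": both changes land together
end

transfer.call("alice", "bob", 30)

begin
  transfer.call("alice", "bob", 500)
rescue ArgumentError
  # "rollback": the working copy is thrown away, accounts is untouched
end

p accounts
```

The failed transfer leaves no half-applied state behind: alice is never debited without bob being credited.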

ACID
• Consistent
• Any transaction will take the database
from one consistent state to another

“Only Consistent data is allowed to be
written”

ACID
• Isolated
• No transaction should be able to interfere
with another transaction

“the same field cannot be updated by two
sources at the exact same time”

a = 0
a += 1 and a += 2 at the same time: a = ??
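
A small sketch of why the answer is uncertain without isolation. The first half replays one possible bad interleaving by hand. The second half uses a Mutex as a stand-in for the locking a database would do.

```ruby
# Without isolation: both writers read a == 0 before either one writes.
a = 0
read_by_first  = a
read_by_second = a
a = read_by_first + 1
a = read_by_second + 2
unisolated_result = a   # 2, the += 1 was lost

# With isolation: each read-modify-write runs alone.
b = 0
lock = Mutex.new
threads = [1, 2].map do |increment|
  Thread.new { lock.synchronize { b += increment } }
end
threads.each(&:join)
isolated_result = b     # 3, both updates survive

puts unisolated_result
puts isolated_result
```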

ACID
• Durable
• Once a transaction is committed it will stay
that way

“Save it once, read it forever”

What is a Database?
• RDBMS
• Relational
• Flexible
• Has a schema
• Most likely ACID compliant
• Typically fast under low load or when
optimized

What is SQL?
• Structured Query Language
• The language databases speak
• Based on relational algebra
• Insert
• Query
• Update
• Delete
“SELECT Company, Country FROM Customers
WHERE Country = 'USA' ”

Why people <3 SQL
• Relational algebra is powerful
• SQL is proven
• well understood
• well documented

Why people </3 SQL
• Relational algebra is hard
• Different databases support different SQL
syntax
• Yet another programming language to learn

SQL != Database
• SQL is used to talk to a RDBMS (database)
• SQL is not a RDBMS

What is NoSQL?

Not A
Relational
Database

Types of NoSQL
• Distributed Systems
• Document Store
• Graph Database
• Key-Value Store
• Eventually Consistent Systems

Mix And Match ↑

Key Value Stores
• Non Relational
• Typically No Schema
• Map one Key (a string) to a Value (some
object)

Example: Redis

Key Value Example
require "redis"

redis = Redis.new

redis.set("foo", "bar")

redis.get("foo")

>> "bar"

Key Value Example
redis = Redis.new
redis.set("foo", "bar")   # Key: "foo", Value: "bar"
redis.get("foo")          # look up by Key
>> "bar"                  # get the Value back

Key Value
• Like a database that can only ever look up by
primary key (id)

YES
select * from users where id = '3';

NO
select * from users where name = 'schneems';
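
The same YES/NO split, sketched with a plain Ruby Hash standing in for a key value store. The second user and the `name_index` helper are hypothetical.

```ruby
users = {
  "3" => { name: "schneems" },
  "4" => { name: "hypothetical_user" }
}

# YES: direct lookup by key.
by_id = users["3"]

# NO: the store cannot search by name, so the application would have
# to walk every value itself.
by_scan = users.values.find { |user| user[:name] == "schneems" }

# A common workaround: keep a second key that maps name to id.
name_index = { "schneems" => "3" }
by_index = users[name_index["schneems"]]

p by_id
p by_scan
p by_index
```

The extra index is one more thing to keep in sync by hand, which is work a relational database would do for you.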

NoSQL @ Gowalla
• Redis (key-value store)
• Store “Likes” & Analytics
• Memcache (key-value store)
• Cache Database results
• Cassandra
• (eventually consistent, with-schema, key
value store)
• Store “feeds” or “timelines”
• Solr (search index)

Memcache
• Key-Value Store
• Open Source
• Distributed
• In memory (ram) only
• fast, but volatile
• Not ACID
• Memory object caching system

Memcache Example
# Memcache.new stands in for your client library's constructor
memcache = Memcache.new

memcache.set("foo", "bar")

memcache.get("foo")

>> "bar"

Memcache
• Can store whole objects
memcache = Memcache.new
# .first turns the query into a single user record
user = User.where(:username => "schneems").first
memcache.set("user:3", user)

user_from_cache = memcache.get("user:3")
user_from_cache == user
>> true
user_from_cache.username
>> "schneems"

Memcache @ Gowalla
• Cache Common Queries
• Decreases Load on DB (postgres)
• Enables higher throughput from DB
• Faster response than DB
• Users see quicker page load time

What to Cache?
• Objects that change infrequently
• users
• spots (places)
• etc.
• Expensive(ish) sql queries
• Friend ids for users
• User ids for people visiting spots
• etc.

Memcache Distributed
• Nodes: A, B, C

Memcache Distributed
Easily add more nodes
• Nodes: A, B, C, D
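
A minimal sketch of how a client can spread keys over nodes, assuming simple modulo hashing with CRC32. The helper `node_for` is hypothetical, and real client libraries may pick nodes differently.

```ruby
require "zlib"

# Pick a node by hashing the key. Every client with the same node list
# picks the same node, so the nodes never need to talk to each other.
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

three_nodes = %w[A B C]
four_nodes  = %w[A B C D]

keys = %w[user:1 user:2 user:3 user:4 user:5 user:6]
keys.each do |key|
  before = node_for(key, three_nodes)
  after  = node_for(key, four_nodes)
  puts "#{key}: #{before} -> #{after}"
end
```

With plain modulo hashing, adding node D moves many keys to a different node, and each moved key is a cache miss until it is set again. Consistent hashing is a common way to shrink that churn.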

Memcache <3’s DB
• We use them Together
• If memcache doesn’t have a value
• Fetch from the database
• Set the key in memcache from the database result
• Hard
• Cache Invalidation : (
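
The steps above can be sketched with two Hashes, one playing memcache and one playing the database. The `fetch` lambda and the `db_reads` counter are illustrative helpers, not part of any library.

```ruby
cache    = {}
database = { "user:3" => "schneems" }
db_reads = 0

# If the cache has no value, fetch from the database and set the key.
fetch = lambda do |key|
  cache.fetch(key) do
    db_reads += 1
    cache[key] = database[key]
  end
end

fetch.call("user:3")   # miss, reads the database
fetch.call("user:3")   # hit, served from the cache
puts db_reads          # 1

# Cache invalidation: when the database changes, the stale key must go.
database["user:3"] = "Schneems"
cache.delete("user:3")
puts fetch.call("user:3")
```

Forgetting the `cache.delete` step would keep serving the old value, which is the hard part the slide points at.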

Redis
• Key Value Store
• Open Source
• Not Distributed (yet)
• Extremely Quick
• “Data structure server”

Redis Example, again
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
>> "bar"

Redis - Has Data Types
• Strings
• Hashes
• Lists
• Sets
• Sorted Sets

Redis Example, sets
redis = Redis.new
redis.sadd("foo", "bar")
redis.smembers("foo")
>> ["bar"]
redis.sadd("foo", "fly")
redis.smembers("foo")
>> ["bar", "fly"]   # sets are unordered, so the order may differ
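
For readers without a Redis server handy, Ruby's own `Set` shows the same set behavior. This is an in-memory stand-in, not Redis itself, and the `likes` Hash is a hypothetical name.

```ruby
require "set"

# Each key holds its own Set, created on first use.
likes = Hash.new { |hash, key| hash[key] = Set.new }

likes["foo"].add("bar")
likes["foo"].add("fly")
likes["foo"].add("bar")   # already a member, so nothing changes

puts likes["foo"].size              # 2
puts likes["foo"].include?("fly")   # true
```

Because duplicates are dropped automatically, a set is a natural fit for questions like "has this user already liked this?".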

Redis => Likeable
• Very Fast response
• ~ 50 queries per page view
• ~ 1 ms per query
• http://github.com/Gowalla/likeable

Cassandra
• Open Source
• Distributed
• Key Value Store
• Eventually Consistent
• Sort of not ACID
• Uses A Schema
• ColumnFamilies

Cassandra Distributed
Eventual Consistency
• Nodes: A, B, C, D
• Data In => one node
• Copied To Extra Nodes ... Eventually
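
A toy sketch of eventual consistency, with Hashes as nodes and an explicit queue of copies still in flight. The class name and the `replicate!` method are assumptions for illustration and say nothing about how Cassandra actually replicates.

```ruby
class EventuallyConsistentStore
  def initialize(node_names)
    @nodes = node_names.to_h { |name| [name, {}] }
    @pending = []   # copies that have not reached their node yet
  end

  # A write lands on one node right away; copies to the rest are queued.
  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Deliver the queued copies; afterwards every node agrees.
  def replicate!
    @pending.each { |node, key, value| @nodes[node][key] = value }
    @pending.clear
  end
end

store = EventuallyConsistentStore.new(%w[A B C D])
store.write("A", "feed:3", "checked in")

stale = store.read("C", "feed:3")   # nil, the copy has not arrived yet
store.replicate!
fresh = store.read("C", "feed:3")   # "checked in"

p stale
p fresh
```

Between the write and the replication, a reader on node C sees old data. That window is the price paid for fast, distributed writes.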

Cassandra @ Gowalla
• Activity Feeds

Cassandra @ Gowalla
• Chronologic
• http://github.com/Gowalla/chronologic

Tradeoffs
• Every Data store has them
• Know your data store
• Strengths
• Weaknesses

NoSQL vs. RDBMS
• No Magic Bullet
• Use Both!!!
• Model data in a datastore you understand
• Switch to another when/if you need to
• Understand Your Options

Questions?

Richard Schneeman
@schneems works for @Gowalla
