Scaling the Web: Databases & NoSQL





This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.

This is a recording of a guest lecture I gave at the University of Texas School of Information.

In this talk I address the technologies and tools Gowalla uses, including Memcache, Redis, and Cassandra.




Presentation Transcript

  • Scaling the Web: Databases & NoSQL • Richard Schneeman • @schneems works for @Gowalla • Wed Nov 10 2011
  • whoami • @Schneems • BSME with Honors from Georgia Tech • 5+ years experience with Ruby & Rails • Work for @Gowalla • Rails 3.1 contributor : ) • 3+ years technical teaching
  • Traffic
  • Compounding Traffic ex. Wikipedia
  • Gowalla
  • Gowalla• 50 best websites NYTimes 2010• Founded 2009 @ SXSW• 1 million+ Users • Undisclosed Visitors• Loves/highlights/comments/stories/guides• Facebook/Foursquare/Twitter integration• iphone/android/web apps• public API
  • Gowalla Backend• Ruby on Rails • Uses the Ruby Language • Rails is the Framework
  • The Web is Data • Username => String • Birthday => Int/Int/Int • Blog Post => Text • Image => Binary-file/blob • Data needs to be stored to be useful
  • Database
  • Gowalla Database• PostgreSQL • Relational (RDBMS) • Open Source • Competitor to MySQL • ACID compliant• Running on a Dedicated Managed Server
  • Need for Speed• Throughput: • The number of operations per minute that can be performed• Pure Speed: • How long an individual operation takes.
  • Potential Problems• Hardware • Slow Network • Slow hard-drive • Insufficient CPU • Insufficient Ram• Software • too many Reads • too many Writes
  • Scaling Up versus Out• Scale Up: • More CPU, Bigger HD, More Ram etc.• Scale Out: • More machines • More machines • More machines • ...
  • Scale Up • Bigger faster machine • More RAM • More CPU • Bigger ethernet bus • ... • Moore's Law • Diminishing returns
  • Scale Out • Forget Moore's Law... • Add more nodes • Master/Slave Database • Sharding
  • Master/Slave (diagram) • Write goes to the Master DB • Master DB is copied to four Slave DBs • Read comes from the Slave DBs
  • Master & Slave +/-• Pro • Increased read speed • Takes read load off of master • Allows us to Join across all tables• Con • Doesn’t buy increased write throughput • Single Point of Failure in Master Node
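The read/write split on the master/slave slides can be sketched in a few lines. This is a toy model only: `ReplicatedStore` is a hypothetical class, the "databases" are plain Ruby hashes, and the copy to the replicas happens immediately, whereas real replication usually lags behind the master.

```ruby
# Hypothetical in-memory stand-ins for a master DB and its read replicas.
class ReplicatedStore
  def initialize(replica_count)
    @master   = {}
    @replicas = Array.new(replica_count) { {} }
    @next     = 0
  end

  # All writes go to the single master, then get copied to every replica.
  # (Assumption: the copy is immediate here; real replication is asynchronous.)
  def write(key, value)
    @master[key] = value
    @replicas.each { |replica| replica[key] = value }
  end

  # Reads are spread across the replicas round-robin, keeping load off the master.
  def read(key)
    replica = @replicas[@next % @replicas.size]
    @next += 1
    replica[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding replicas raises read capacity, but every write still passes through the one master, which is why the slide lists write throughput and the single point of failure as cons.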
  • Sharding (diagram) • Write and Read are split across four shards • Users in USA • Users in Europe • Users in Asia • Users in Africa
  • Sharding +/-• Pro • Increased Write & Read throughput • No Single Point of failure • Individual features can fail• Con • Cannot Join queries between shards
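A minimal sketch of sharding by region, following the four shards on the slide. The `SHARDS` map and the helper names are illustrative assumptions, and each shard is just a Ruby hash standing in for a separate database.

```ruby
# Hypothetical shard map: one independent store per region, as on the slide.
SHARDS = {
  "USA"    => {},
  "Europe" => {},
  "Asia"   => {},
  "Africa" => {}
}

# Pick the shard from the user's region; raises KeyError for an unknown region.
def shard_for(region)
  SHARDS.fetch(region)
end

def save_user(region, id, name)
  shard_for(region)[id] = name
end

def find_user(region, id)
  shard_for(region)[id]
end

save_user("USA", 3, "schneems")
save_user("Europe", 7, "alice")

puts find_user("USA", 3)

# A query across all users has to visit every shard and merge the results
# in application code; no single database can join across them.
all_names = SHARDS.values.flat_map(&:values)
puts all_names.inspect
```

Each shard takes its own reads and writes, so throughput grows with the number of shards, at the cost of doing any cross-shard work by hand.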
  • What is a Database? • Relational Database Management System (RDBMS) • Stores Data Using Schema • A.C.I.D. compliant • Atomic • Consistent • Isolated • Durable
  • RDBMS• Relational • Matches data on common characteristics in data • Enables “Join” & “Union” queries• Makes data modular
  • Relational +/-• Pros • Data is modular • Highly flexible data layout• Cons • Getting desired data can be tricky • Over modularization leads to many join queries • Trade off performance for search-ability
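To make "matches data on common characteristics" concrete, here is a rough sketch of what a join does, written in plain Ruby rather than SQL. The `users` and `checkins` tables and their columns are made up for illustration.

```ruby
# Two hypothetical "tables", each an array of rows.
users = [
  { :id => 1, :name => "schneems" },
  { :id => 2, :name => "alice" }
]
checkins = [
  { :user_id => 1, :spot => "Austin" },
  { :user_id => 1, :spot => "Atlanta" },
  { :user_id => 2, :spot => "Dallas" }
]

# Index users by id once, then match each checkin on the shared id.
users_by_id = users.each_with_object({}) { |user, index| index[user[:id]] = user }
joined = checkins.map do |checkin|
  { :name => users_by_id[checkin[:user_id]][:name], :spot => checkin[:spot] }
end

joined.each { |row| puts "#{row[:name]} visited #{row[:spot]}" }
```

The data stays modular (a name is stored once), but every question that spans both tables pays for the matching step, which is the performance trade-off the slide mentions.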
  • Schema Storage• Blueprint for data storage• Break data into tables/columns/rows• Give data types to your data • Integer • String • Text • Boolean • ...
  • Schema +/- • Pros • Regularize our data • Helps keep data consistent • Converts to programming “types” easily • Cons • Must separately manage schema • Adding columns & indexes to existing large tables can be painful & slow
  • ACID • Properties that guarantee database transactions are processed reliably • Atomic • Consistent • Isolated • Durable
  • ACID • Atomic • Any database transaction is all or nothing. • If one part of the transaction fails, it all fails • “An Incomplete Transaction Cannot Exist”
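A toy illustration of "all or nothing", assuming a hypothetical `Accounts` class that keeps balances in a hash. It is not how a real database implements transactions; it only shows the observable behavior: a failed transfer leaves no partial changes behind.

```ruby
# A toy "transaction": apply changes to a copy, and only swap the copy in
# if every step succeeds. Names here are illustrative, not a real database API.
class Accounts
  attr_reader :balances

  def initialize(balances)
    @balances = balances
  end

  def transfer(from, to, amount)
    working = @balances.dup            # work on a shallow copy of the state
    working[from] -= amount
    raise ArgumentError, "insufficient funds" if working[from] < 0
    working[to] += amount
    @balances = working                # "commit": all changes appear at once
    true
  rescue ArgumentError
    false                              # "rollback": original state untouched
  end
end

accounts = Accounts.new("alice" => 100, "bob" => 0)
accounts.transfer("alice", "bob", 150)  # fails, nothing changes
accounts.transfer("alice", "bob", 40)   # succeeds, both balances change
p accounts.balances
```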
  • ACID • Consistent • Any transaction will take the database from one consistent state to another • “Only Consistent data is allowed to be written”
  • ACID • Isolated • No transaction should be able to interfere with another transaction • “the same field cannot be updated by two sources at the exact same time” • Example: a = 0; one source runs a += 1 while another runs a += 2; a = ??
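The `a = ??` question on the isolation slide can be worked through by hand. The first half below spells out one bad interleaving step by step; the second half uses a `Mutex` as a stand-in for the isolation a database provides. Both halves are illustrative sketches, not database code.

```ruby
# Without isolation: both writers read a = 0 before either one writes back.
a = 0
read_by_first  = a        # first writer reads 0
read_by_second = a        # second writer reads 0
a = read_by_first + 1     # first writer stores 1
a = read_by_second + 2    # second writer stores 2, overwriting the +1
lost_update = a           # 2, not 3

# With isolation: a Mutex makes each read-modify-write run as one unit.
counter = 0
lock = Mutex.new
threads = [1, 2].map do |increment|
  Thread.new do
    lock.synchronize { counter += increment }
  end
end
threads.each(&:join)

puts lost_update
puts counter
```

When the two updates are isolated from each other, the result is 3 regardless of which one runs first.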
  • ACID • Durable • Once a transaction is committed it will stay that way • “Save it once, read it forever”
  • What is a Database?• RDBMS • Relational • Flexible • Has a schema • Most likely ACID compliant • Typically fast under low load or when optimized
  • What is SQL? • Structured Query Language • The language databases speak • Based on relational algebra • Insert • Query • Update • Delete • “SELECT Company, Country FROM Customers WHERE Country = 'USA'”
  • Why people <3 SQL• Relational algebra is powerful• SQL is proven • well understood • well documented
  • Why people </3 SQL • Relational algebra is hard • Different databases support different SQL syntax • Yet another programming language to learn
  • SQL != Database • SQL is used to talk to an RDBMS (database) • SQL is not an RDBMS
  • What is NoSQL? Not A Relational Database
  • Types of NoSQL• Distributed Systems• Document Store• Graph Database• Key-Value Store• Eventually Consistent Systems Mix And Match ↑
  • Key Value Stores • Non Relational • Typically No Schema • Map one Key (a string) to a Value (some object) • Example: Redis
  • Key Value Example
      redis = Redis.new
      redis.set("foo", "bar")
      redis.get("foo")
      >> "bar"
  • Key Value Example (annotated)
      redis = Redis.new
      redis.set("foo", "bar")   # "foo" is the Key, "bar" is the Value
      redis.get("foo")          # look up by Key
      >> "bar"                  # returns the Value
  • Key Value • Like a database that can only ever use the primary key (id) • YES: select * from users where id = '3'; • NO: select * from users where name = 'schneems';
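The YES/NO contrast can be sketched with a plain Ruby hash playing the key-value store. The `users:<id>` key naming and the stored fields are assumptions made for the example.

```ruby
# A Hash plays the role of the key-value store; keys follow a "users:<id>" convention.
store = {
  "users:3" => { :id => 3, :name => "schneems" },
  "users:4" => { :id => 4, :name => "alice" }
}

# YES: lookup by key is a single direct operation.
by_id = store["users:3"]

# NO: there is no index on name, so finding by value means scanning every entry.
by_name = store.values.find { |user| user[:name] == "schneems" }

puts by_id[:name]
puts by_name[:id]
```

The scan works on two records, but it is exactly the kind of query a key-value store is not built to answer at scale.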
  • NoSQL @ Gowalla• Redis (key-value store) • Store “Likes” & Analytics• Memcache (key-value store) • Cache Database results• Cassandra • (eventually consistent, with-schema, key value store) • Store “feeds” or “timelines”• Solr (search index)
  • Memcache• Key-Value Store• Open Source• Distributed• In memory (ram) only • fast, but volatile • Not ACID• Memory object caching system
  • Memcache Example
      memcache = Memcache.new
      memcache.set("foo", "bar")
      memcache.get("foo")
      >> "bar"
  • Memcache • Can store whole objects
      memcache = Memcache.new
      user = User.where(:username => "schneems").first
      memcache.set("user:3", user)
      user_from_cache = memcache.get("user:3")
      user_from_cache == user
      >> true
      user_from_cache.username
      >> "schneems"
  • Memcache @ Gowalla• Cache Common Queries • Decreases Load on DB (postgres) • Enables higher throughput from DB • Faster response than DB • Users see quicker page load time
  • What to Cache?• Objects that change infrequently • users • spots (places) • etc.• Expensive(ish) sql queries • Friend ids for users • User ids for people visiting spots • etc.
  • Memcache Distributed (diagram: nodes A, B, C)
  • Memcache Distributed • Easily add more nodes (diagram: nodes A, B, C, D)
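One simple way a client could decide which node holds a key is to hash the key and take it modulo the number of nodes. This is a sketch under that assumption; `node_for` is a hypothetical helper, and real memcache clients may use a different scheme.

```ruby
require "zlib"

# Map a key to one of the nodes by hashing it. Zlib.crc32 is stable across
# runs, unlike String#hash, so every client picks the same node for a key.
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

three_nodes = ["A", "B", "C"]
four_nodes  = ["A", "B", "C", "D"]

# Print where each key lives before and after adding node D.
keys = (1..8).map { |id| "user:#{id}" }
keys.each do |key|
  puts "#{key}: #{node_for(key, three_nodes)} -> #{node_for(key, four_nodes)}"
end
```

With plain modulo hashing, adding node D can move a key to a different node, which shows up as a cache miss until the key is set again. Many clients use consistent hashing to limit how many keys move.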
  • Memcache <3’s DB• We use them Together• If memcache doesn’t have a value • Fetch from the database • Set the key from database• Hard • Cache Invalidation : (
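The fetch-then-set flow described on this slide can be sketched as follows. `DATABASE` and `CACHE` are plain hashes standing in for postgres and memcache, and `fetch` and `update` are hypothetical helper names, not part of any library.

```ruby
# Stand-ins for the real systems: DATABASE for postgres, CACHE for memcache.
DATABASE = { "user:3" => "schneems" }
CACHE    = {}
DB_READS = []   # records every key we had to fetch from the database

def fetch(key)
  return CACHE[key] if CACHE.key?(key)   # cache hit: skip the database
  DB_READS << key
  value = DATABASE[key]                  # cache miss: fetch from the database
  CACHE[key] = value                     # set the key for next time
  value
end

# The hard part: every write must also invalidate (or update) the cached copy.
def update(key, value)
  DATABASE[key] = value
  CACHE.delete(key)
end

fetch("user:3")            # miss, reads the database
fetch("user:3")            # hit, served from the cache
update("user:3", "richard")
puts fetch("user:3")       # miss again after invalidation
puts DB_READS.size
```

If `update` forgot the `CACHE.delete` line, readers would keep seeing the old value, which is the cache invalidation problem the slide flags as hard.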
  • Redis• Key Value Store• Open Source• Not Distributed (yet)• Extremely Quick• “Data structure server”
  • Redis Example, again
      redis = Redis.new
      redis.set("foo", "bar")
      redis.get("foo")
      >> "bar"
  • Redis - Has Data Types• Strings• Hashes• Lists• Sets• Sorted Sets
  • Redis Example, sets
      redis = Redis.new
      redis.sadd("foo", "bar")
      redis.smembers("foo")
      >> ["bar"]
      redis.sadd("foo", "fly")
      redis.smembers("foo")
      >> ["bar", "fly"]
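What makes a set different from a list is that adding the same member twice has no effect. The sketch below uses Ruby's own `Set` as a stand-in for the Redis set stored at key "foo", so it runs without a Redis server.

```ruby
require "set"

# A Ruby Set standing in for the Redis set stored at key "foo".
members = Set.new
members.add("bar")
members.add("fly")
members.add("bar")        # already present, so the set does not grow

puts members.size
puts members.include?("fly")
```

That property suits data like "likes", where the same member should only ever be counted once.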
  • Redis => Likeable • Very Fast response • ~ 50 queries per page view • ~ 1 ms per query
  • Cassandra • Open Source • Distributed • Key Value Store • Eventually Consistent • Sort of not ACID • Uses A Schema • ColumnFamilies
  • Cassandra: Distributed, Eventual Consistency (diagram: nodes A, B, C, D) • Data In • Copied To Extra Nodes ... Eventually
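Eventual consistency can be illustrated with a toy cluster in which a write lands on one node right away and reaches the others later. `EventualCluster` is a hypothetical class built on hashes. It says nothing about how Cassandra actually replicates; it only shows what a reader may observe in between.

```ruby
# Four in-memory nodes; a write lands on one and is queued for the others.
class EventualCluster
  def initialize(names)
    @nodes   = names.each_with_object({}) { |name, nodes| nodes[name] = {} }
    @pending = []   # replication work that has not happened yet
  end

  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Apply all queued copies; in a real cluster this happens in the background.
  def replicate!
    @pending.each { |node, key, value| @nodes[node][key] = value }
    @pending.clear
  end
end

cluster = EventualCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
p cluster.read("C", "feed:3")   # nil: the copy has not arrived yet
cluster.replicate!
p cluster.read("C", "feed:3")   # "checked in": all nodes now agree
```

The window between the write and `replicate!` is where a reader on another node sees stale data.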
  • Cassandra @ Gowalla • Activity Feeds
  • Cassandra @ Gowalla • Chronologic
  • Should I use NoSQL?
  • Which One?
  • Pick the right tool
  • Tradeoffs• Every Data store has them• Know your data store • Strengths • Weaknesses
  • NoSQL vs. RDBMS • No Magic Bullet • Use Both!!! • Model data in a datastore you understand • Switch when/if you need to • Understand Your Options
  • Questions? • Richard Schneeman • @schneems works for @Gowalla