Scaling the Web: Databases & NoSQL




This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.

This is a recording of a guest lecture I gave at the University of Texas School of Information.

In this talk I address the technologies and tools Gowalla (gowalla.com) uses, including Memcache, Redis, and Cassandra.

Find more on my blog:



Usage Rights

© All Rights Reserved

    Scaling the Web: Databases & NoSQL Presentation Transcript

    • Scaling the Web: Databases & NoSQL • Richard Schneeman • Wed Nov 10 • @schneems works for @Gowalla • 2011
    • whoami • @Schneems • BSME with Honors from Georgia Tech • 5+ years experience Ruby & Rails • Work for @Gowalla • Rails 3.1 contributor : ) • 3+ years technical teaching
    • Traffic
    • Compounding Traffic ex. Wikipedia
    • Gowalla
    • Gowalla • 50 best websites NYTimes 2010 • Founded 2009 @ SXSW • 1 million+ Users • Undisclosed Visitors • Loves/highlights/comments/stories/guides • Facebook/Foursquare/Twitter integration • iPhone/Android/web apps • public API
    • Gowalla Backend • Ruby on Rails • Uses the Ruby Language • Rails is the Framework
    • The Web is Data • Username => String • Birthday => Int/Int/Int • Blog Post => Text • Image => Binary-file/blob • Data needs to be stored to be useful
    • Database
    • Gowalla Database • PostgreSQL • Relational (RDBMS) • Open Source • Competitor to MySQL • ACID compliant • Running on a Dedicated Managed Server
    • Need for Speed • Throughput: the number of operations per minute that can be performed • Pure Speed: how long an individual operation takes
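The two measurements on this slide can be sketched in a few lines of Ruby. This is only an illustration: `fake_query`, `latency_of`, and `throughput_per_minute` are hypothetical names, `fake_query` stands in for a real database call, and the numbers printed depend entirely on the machine running it.

```ruby
require "benchmark"

# Hypothetical stand-in for one database operation.
def fake_query
  (1..1_000).reduce(:+)
end

# Pure speed: how long one individual operation takes, in seconds.
def latency_of
  Benchmark.realtime { yield }
end

# Throughput: how many operations complete per minute at the measured rate.
def throughput_per_minute(operations)
  elapsed = Benchmark.realtime { operations.times { yield } }
  operations / elapsed * 60
end

puts "latency:    #{latency_of { fake_query }} seconds per operation"
puts "throughput: #{throughput_per_minute(1_000) { fake_query }} operations per minute"
```

The two numbers are related but not interchangeable: a system can answer each query quickly and still fall over when too many arrive at once.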
    • Potential Problems • Hardware: slow network, slow hard drive, insufficient CPU, insufficient RAM • Software: too many reads, too many writes
    • Scaling Up versus Out • Scale Up: more CPU, bigger HD, more RAM, etc. • Scale Out: more machines, more machines, more machines, ...
    • Scale Up • Bigger, faster machine: more RAM, more CPU, bigger ethernet bus, ... • Moore's Law • Diminishing returns
    • Scale Out • Forget Moore's Law... • Add more nodes: Master/Slave Database, Sharding
    • Master/Slave (diagram) • Writes go to the Master DB • The Master DB is copied to four Slave DBs • Reads come from the Slave DBs
    • Master & Slave +/- • Pro: increased read speed; takes read load off of master; allows us to Join across all tables • Con: doesn't buy increased write throughput; Single Point of Failure in Master Node
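A minimal sketch of the master/slave split, assuming plain Ruby Hashes in place of real database servers. `ReplicatedStore` is a hypothetical class, and the copy to the slaves happens immediately here, whereas real replication usually lags behind the master.

```ruby
# Hypothetical in-memory stand-in for one master database and its read-only copies.
class ReplicatedStore
  def initialize(slave_count)
    @master = {}
    @slaves = Array.new(slave_count) { {} }
    @next_slave = 0
  end

  # Every write goes to the single master, then is copied to each slave.
  def write(key, value)
    @master[key] = value
    @slaves.each { |slave| slave[key] = value }
  end

  # Reads rotate across the slaves, which takes read load off the master.
  def read(key)
    slave = @slaves[@next_slave]
    @next_slave = (@next_slave + 1) % @slaves.size
    slave[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding slaves adds read capacity only. Every write still passes through the one master, which is why write throughput does not improve and the master remains a single point of failure.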
    • Sharding (diagram) • Writes and Reads are split across four shards: Users in USA, Users in Europe, Users in Asia, Users in Africa
    • Sharding +/- • Pro: increased Write & Read throughput; no Single Point of Failure; individual features can fail • Con: cannot Join queries between shards
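The regional split from the sharding diagram can be sketched as follows. `ShardedUsers` is a hypothetical class, each shard is a plain Hash rather than a separate database server, and the user "anna" is made up for the example.

```ruby
# Hypothetical shards, one per region as in the sharding diagram.
class ShardedUsers
  REGIONS = %w[USA Europe Asia Africa].freeze

  def initialize
    @shards = REGIONS.to_h { |region| [region, {}] }
  end

  # A write touches only the shard that owns the user's region.
  def save(region, username, attributes)
    shard_for(region)[username] = attributes
  end

  def find(region, username)
    shard_for(region)[username]
  end

  # Anything spanning all users needs one query per shard, merged in
  # application code, because the shards cannot be joined in a single query.
  def count
    @shards.values.sum(&:size)
  end

  private

  def shard_for(region)
    @shards.fetch(region) { raise ArgumentError, "unknown region: #{region}" }
  end
end

users = ShardedUsers.new
users.save("USA", "schneems", { name: "Richard" })
users.save("Europe", "anna", { name: "Anna" })
puts users.find("USA", "schneems")[:name]
puts users.count
```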
    • What is a Database? • Relational Database Management System (RDBMS) • Stores Data Using Schema • A.C.I.D. compliant: Atomic, Consistent, Isolated, Durable
    • RDBMS • Relational: matches data on common characteristics in data; enables “Join” & “Union” queries • Makes data modular
    • Relational +/- • Pros: data is modular; highly flexible data layout • Cons: getting desired data can be tricky; over-modularization leads to many join queries; trade off performance for search-ability
    • Schema Storage • Blueprint for data storage • Break data into tables/columns/rows • Give data types to your data: Integer, String, Text, Boolean, ...
    • Schema +/- • Pros: regularize our data; helps keep data consistent; converts to programming “types” easily • Cons: must separately manage schema; adding columns & indexes to existing large tables can be painful & slow
    • ACID • Properties that guarantee database transactions are processed reliably: Atomic, Consistent, Isolated, Durable
    • ACID • Atomic • Any database transaction is all or nothing • If one part of the transaction fails, it all fails • “An Incomplete Transaction Cannot Exist”
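All-or-nothing behaviour can be illustrated with a small Ruby sketch. `Accounts` and the balances are hypothetical, and the copy-then-swap approach is only a stand-in for what a real database does with its own transaction machinery.

```ruby
# Hypothetical in-memory accounts used to illustrate an all-or-nothing change.
class Accounts
  def initialize(balances)
    @balances = balances
  end

  def balance(name)
    @balances.fetch(name)
  end

  # Runs the block against a copy of the data and keeps the copy only if the
  # whole block finishes. An exception discards the copy, so a half-finished
  # change is never visible.
  def transaction
    working_copy = @balances.dup
    yield working_copy
    @balances = working_copy
    true
  rescue StandardError
    false
  end
end

accounts = Accounts.new({ "alice" => 100, "bob" => 0 })
committed = accounts.transaction do |balances|
  balances["alice"] -= 150
  raise "insufficient funds" if balances["alice"] < 0
  balances["bob"] += 150
end
puts committed                  # false
puts accounts.balance("alice")  # 100, the failed transfer left no trace
```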
    • ACID • Consistent • Any transaction will take the database from one consistent state to another • “Only Consistent data is allowed to be written”
    • ACID • Isolated • No transaction should be able to interfere with another transaction • “The same field cannot be updated by two sources at the exact same time” • Example: a = 0; one source runs a += 1 while another runs a += 2; a = ??
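The `a = ??` question on this slide can be made concrete. The sketch below is an assumption-laden illustration: `lost_update` replays one bad interleaving by hand, and `isolated_update` uses a Ruby `Mutex` as a stand-in for the isolation a database provides. Both method names are hypothetical.

```ruby
# The interleaving that isolation must prevent: both sources read before either writes.
def lost_update
  a = 0
  read_by_first  = a       # first source reads 0
  read_by_second = a       # second source also reads 0
  a = read_by_first + 1    # first source writes 1
  a = read_by_second + 2   # second source writes 2 and overwrites the first update
  a
end

# With a lock, each read-modify-write finishes before the next one starts.
def isolated_update
  a = 0
  lock = Mutex.new
  threads = [1, 2].map { |n| Thread.new { lock.synchronize { a += n } } }
  threads.each(&:join)
  a
end

puts lost_update      # 2: the += 1 was lost
puts isolated_update  # 3
```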
    • ACID • Durable • Once a transaction is committed it will stay that way • “Save it once, read it forever”
    • What is a Database? • RDBMS: Relational; Flexible; Has a schema; Most likely ACID compliant; Typically fast under low load or when optimized
    • What is SQL? • Structured Query Language • The language databases speak • Based on relational algebra • Insert • Query • Update • Delete • “SELECT Company, Country FROM Customers WHERE Country = 'USA'”
    • Why people <3 SQL • Relational algebra is powerful • SQL is proven: well understood, well documented
    • Why people </3 SQL • Relational algebra is hard • Different databases support different SQL syntax • Yet another programming language to learn
    • SQL != Database • SQL is used to talk to an RDBMS (database) • SQL is not an RDBMS
    • What is NoSQL? Not A Relational Database
    • RDBMS
    • Types of NoSQL • Distributed Systems • Document Store • Graph Database • Key-Value Store • Eventually Consistent Systems • Mix And Match ↑
    • Key Value Stores • Non Relational • Typically No Schema • Map one Key (a string) to a Value (some object) • Example: Redis
    • Key Value Example
        redis = Redis.new
        redis.set("foo", "bar")    # key, value
        redis.get("foo")           # key
        >> "bar"                   # value
    • Key Value • Like a database that can only ever use the primary key (id) • YES: select * from users where id = '3'; • NO: select * from users where name = 'schneems';
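The YES/NO contrast above can be sketched with a plain Ruby Hash standing in for a key-value store. `USERS`, `find_by_id`, and `find_by_name` are hypothetical names, and the second user is made up for the example.

```ruby
# A plain Ruby Hash standing in for a key-value store. The second user is made up.
USERS = {
  "3" => { name: "schneems" },
  "7" => { name: "example_user" }
}.freeze

# YES: one direct lookup by key.
def find_by_id(store, id)
  store[id]
end

# NO (not supported directly): finding by name means scanning every value.
def find_by_name(store, name)
  store.each_value.find { |user| user[:name] == name }
end

puts find_by_id(USERS, "3")[:name]
puts find_by_name(USERS, "schneems")[:name]
```

The scan works on two records but gets slower as the store grows, which is why a key-value store is a poor fit when data must be found by anything other than its key.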
    • NoSQL @ Gowalla • Redis (key-value store): store “Likes” & Analytics • Memcache (key-value store): cache Database results • Cassandra (eventually consistent, with-schema, key-value store): store “feeds” or “timelines” • Solr (search index)
    • Memcache • Key-Value Store • Open Source • Distributed • In memory (RAM) only: fast, but volatile; not ACID • Memory object caching system
    • Memcache Example
        memcache = Memcache.new
        memcache.set("foo", "bar")
        memcache.get("foo")
        >> "bar"
    • Memcache • Can store whole objects
        memcache = Memcache.new
        user = User.where(:username => "schneems").first
        memcache.set("user:3", user)
        user_from_cache = memcache.get("user:3")
        user_from_cache == user
        >> true
        user_from_cache.username
        >> "schneems"
    • Memcache @ Gowalla • Cache Common Queries • Decreases Load on DB (Postgres) • Enables higher throughput from DB • Faster response than DB • Users see quicker page load time
    • What to Cache? • Objects that change infrequently: users, spots (places), etc. • Expensive(ish) SQL queries: friend ids for users, user ids for people visiting spots, etc.
    • Memcache Distributed (diagram: nodes A, B, C)
    • Memcache Distributed • Easily add more nodes (diagram: nodes A, B, C, D)
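One common way a cache client decides which node holds a key is to hash the key. The sketch below assumes simple modulo hashing with CRC32; `node_for` is a hypothetical helper, and this is not necessarily what any particular memcache client does.

```ruby
require "zlib"

# Picks the cache node responsible for a key. CRC32 is used because it gives
# the same answer in every process, unlike String#hash.
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

nodes = %w[A B C]
puts node_for("user:3", nodes)
puts node_for("user:3", nodes + ["D"])  # may be a different node once D is added
```

With modulo hashing, adding a node changes where many keys live, so those keys miss the cache until they are refilled. Because the cache can always be refilled from the database, that cost is tolerable, and clients often use consistent hashing to reduce it.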
    • Memcache <3's DB • We use them Together • If memcache doesn't have a value: fetch from the database, then set the key from the database • Hard: Cache Invalidation : (
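The fetch-then-set flow on this slide can be sketched as follows. `CachedUsers` is a hypothetical class, and both the cache and the database are plain Hashes here rather than Memcache and Postgres.

```ruby
# Hypothetical stand-ins: the cache and the database are both plain Hashes.
class CachedUsers
  attr_reader :database_reads

  def initialize(database)
    @database = database
    @cache = {}
    @database_reads = 0
  end

  def find(id)
    key = "user:#{id}"
    return @cache[key] if @cache.key?(key)  # cache hit: the database is not touched

    @database_reads += 1
    user = @database[id]                    # cache miss: fetch from the database
    @cache[key] = user                      # set the key from the database
    user
  end

  # Cache invalidation: drop the key whenever the row changes,
  # so the next read fetches the fresh value.
  def update(id, user)
    @database[id] = user
    @cache.delete("user:#{id}")
  end
end

users = CachedUsers.new({ 3 => "schneems" })
users.find(3)
users.find(3)
puts users.database_reads  # 1: the second read came from the cache
```

The hard part is remembering every place a row can change. Any write path that skips `update` leaves a stale value in the cache.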
    • Redis • Key-Value Store • Open Source • Not Distributed (yet) • Extremely Quick • “Data structure server”
    • Redis Example, again
        redis = Redis.new
        redis.set("foo", "bar")
        redis.get("foo")
        >> "bar"
    • Redis - Has Data Types • Strings • Hashes • Lists • Sets • Sorted Sets
    • Redis Example, sets
        redis = Redis.new
        redis.sadd("foo", "bar")
        redis.smembers("foo")
        >> ["bar"]
        redis.sadd("foo", "fly")
        redis.smembers("foo")
        >> ["bar", "fly"]
    • Redis => Likeable • Very Fast response • ~50 queries per page view • ~1 ms per query • http://github.com/Gowalla/likeable
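Sets are a natural fit for “Likes”, because a user either is or is not in the set of people who liked something. The sketch below uses Ruby's own `Set` as a stand-in for Redis sets. The `Likes` class and its methods are hypothetical and are not the Likeable gem's API; the comments name the Redis set commands each method corresponds to.

```ruby
require "set"

# Hypothetical stand-in for Redis sets: one set of user ids per liked item.
class Likes
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Comparable to SADD: adding the same user twice has no extra effect.
  def like(user_id, item_key)
    @sets[item_key].add(user_id)
  end

  # Comparable to SISMEMBER.
  def liked?(user_id, item_key)
    @sets[item_key].include?(user_id)
  end

  # Comparable to SCARD.
  def count(item_key)
    @sets[item_key].size
  end
end

likes = Likes.new
likes.like(3, "spot:42")
likes.like(3, "spot:42")
puts likes.liked?(3, "spot:42")
puts likes.count("spot:42")
```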
    • Cassandra • Open Source • Distributed • Key-Value Store • Eventually Consistent: sort of not ACID • Uses a Schema: ColumnFamilies
    • Cassandra: Distributed, Eventual Consistency (diagram: nodes A, B, C, D) • Data In • Copied To Extra Nodes ... Eventually
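Eventual consistency can be illustrated with a toy cluster in which a write lands on one node and reaches the others later. `EventuallyConsistentCluster` is a hypothetical class, and the explicit `replicate` call stands in for background copying. This is not how Cassandra is implemented, only a picture of the behaviour a reader can observe.

```ruby
# Hypothetical cluster: a write lands on one node and reaches the others later.
class EventuallyConsistentCluster
  def initialize(node_names)
    @nodes = node_names.to_h { |name| [name, {}] }
    @pending = []
  end

  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    @pending << [key, value]  # queued for the other nodes
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Stands in for background replication catching up.
  def replicate
    @pending.each do |key, value|
      @nodes.each_value { |data| data[key] = value }
    end
    @pending.clear
  end
end

cluster = EventuallyConsistentCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
puts cluster.read("C", "feed:3").inspect  # nil: node C has not seen the write yet
cluster.replicate
puts cluster.read("C", "feed:3").inspect  # "checked in"
```

For activity feeds a short delay like this is usually acceptable, which is the tradeoff that makes an eventually consistent store a reasonable choice for them.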
    • Cassandra @ Gowalla • Activity Feeds
    • Cassandra @ Gowalla • Chronologic • http://github.com/Gowalla/chronologic
    • Should I use NoSQL?
    • Which One?
    • Pick the right tool
    • Tradeoffs • Every data store has them • Know your data store: Strengths, Weaknesses
    • NoSQL vs. RDBMS • No Magic Bullet • Use Both!!! • Model data in a datastore you understand • Switch when/if you need to • Understand Your Options
    • Questions? • Richard Schneeman • @schneems works for @Gowalla