Scaling the Web: Databases & NoSQL
This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.

This is a recording of a guest lecture I gave at the University of Texas School of Information.

In this talk I address the technologies and tools Gowalla (gowalla.com) uses including memcache, redis and cassandra.

Find more on my blog:
http://schneems.com

Published in: Technology
Scaling the Web: Databases & NoSQL

  1. Scaling the Web: Databases & NoSQL
      Richard Schneeman (@schneems, works for @Gowalla)
      Wed Nov 10, 2011
  2. whoami
      • @Schneems
      • BSME with Honors from Georgia Tech
      • 5+ years experience with Ruby & Rails
        • Work for @Gowalla
      • Rails 3.1 contributor : )
      • 3+ years technical teaching
  3. Traffic
  4. Compounding Traffic (ex. Wikipedia)
  5. Compounding Traffic (ex. Wikipedia)
  6. Gowalla
  7. Gowalla
      • 50 best websites NYTimes 2010
      • Founded 2009 @ SXSW
      • 1 million+ Users
        • Undisclosed Visitors
      • Loves/highlights/comments/stories/guides
      • Facebook/Foursquare/Twitter integration
      • iPhone/Android/web apps
      • Public API
  8. Gowalla Backend
      • Ruby on Rails
        • Uses the Ruby Language
        • Rails is the Framework
  9. The Web is Data
      • Username => String
      • Birthday => Int/Int/Int
      • Blog Post => Text
      • Image => Binary file/blob
      Data needs to be stored to be useful
  10. Database
  11. Gowalla Database
      • PostgreSQL
        • Relational (RDBMS)
        • Open Source
        • Competitor to MySQL
        • ACID compliant
      • Running on a Dedicated Managed Server
  12. Need for Speed
      • Throughput:
        • The number of operations that can be performed per minute
      • Pure Speed:
        • How long an individual operation takes
  13. Potential Problems
      • Hardware
        • Slow network
        • Slow hard drive
        • Insufficient CPU
        • Insufficient RAM
      • Software
        • Too many reads
        • Too many writes
  14. Scaling Up versus Out
      • Scale Up:
        • More CPU, bigger HD, more RAM, etc.
      • Scale Out:
        • More machines
        • More machines
        • More machines
        • ...
  15. Scale Up
      • Bigger, faster machine
        • More RAM
        • More CPU
        • Bigger ethernet bus
        • ...
      • Moore's Law
      • Diminishing returns
  16. Scale Out
      • Forget Moore's Law...
      • Add more nodes
        • Master/Slave Database
        • Sharding
  17. Master/Slave
      (Diagram: writes go to the Master DB, which copies them to four Slave DBs; reads are served by the Slave DBs.)
  18. Master & Slave +/-
      • Pro
        • Increased read speed
        • Takes read load off of master
        • Allows us to join across all tables
      • Con
        • Doesn’t buy increased write throughput
        • Single point of failure in the master node
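The read/write split on the Master/Slave slides can be sketched in plain Ruby. This is only an illustration: `Node` and `ReplicatedStore` are hypothetical names, a Hash stands in for each database, and replication is done inline where a real setup copies asynchronously.

```ruby
# Hypothetical stand-in for a database node: a Hash plays the role of the tables.
class Node
  attr_reader :name, :data

  def initialize(name)
    @name = name
    @data = {}
  end
end

# Sends every write to the master, copies it to the slaves,
# and spreads reads across the slaves in round-robin order.
class ReplicatedStore
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  def write(key, value)
    @master.data[key] = value
    # Real replication is asynchronous; copying inline keeps the sketch short.
    @slaves.each { |slave| slave.data[key] = value }
  end

  # Returns the name of the slave that served the read, plus the value.
  def read(key)
    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    [slave.name, slave.data[key]]
  end
end

store = ReplicatedStore.new(Node.new("master"), [Node.new("slave-1"), Node.new("slave-2")])
store.write("user:3", "schneems")
p store.read("user:3") # served by slave-1
p store.read("user:3") # served by slave-2
```

Adding slaves raises read capacity, but every write still passes through the single master, which is the "Con" on the slide.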
  19. Sharding
      (Diagram: writes and reads are split across separate databases for users in the USA, Europe, Asia, and Africa.)
  20. Sharding +/-
      • Pro
        • Increased write & read throughput
        • No single point of failure
          • Individual features can fail
      • Con
        • Cannot join queries between shards
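A minimal sketch of region-based sharding, assuming one independent store per region as in the diagram. `ShardedUsers` and its methods are hypothetical, and each shard is just a Hash.

```ruby
# Hypothetical router: each region gets its own independent store (a Hash here).
class ShardedUsers
  REGIONS = %w[usa europe asia africa].freeze

  def initialize
    @shards = REGIONS.to_h { |region| [region, {}] }
  end

  def write(region, id, attributes)
    shard_for(region)[id] = attributes
  end

  def read(region, id)
    shard_for(region)[id]
  end

  # A query that spans regions has to visit every shard and merge the results
  # in application code, because the shards cannot be joined in one query.
  def find_all_by_name(name)
    @shards.values.flat_map do |shard|
      shard.values.select { |user| user[:name] == name }
    end
  end

  private

  def shard_for(region)
    @shards.fetch(region) { raise ArgumentError, "unknown region: #{region}" }
  end
end

users = ShardedUsers.new
users.write("usa", 3, { name: "schneems" })
users.write("europe", 7, { name: "schneems" })
p users.read("usa", 3)
p users.find_all_by_name("schneems").size
```

Reads and writes for one region touch only that region's shard, which is where the extra throughput comes from.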
  21. What is a Database?
      • Relational Database Management System (RDBMS)
      • Stores data using a schema
      • A.C.I.D. compliant
        • Atomic
        • Consistent
        • Isolated
        • Durable
  22. RDBMS
      • Relational
        • Matches data on common characteristics in data
        • Enables “Join” & “Union” queries
      • Makes data modular
  23. Relational +/-
      • Pros
        • Data is modular
        • Highly flexible data layout
      • Cons
        • Getting desired data can be tricky
        • Over-modularization leads to many join queries
        • Trades off performance for searchability
  24. Schema Storage
      • Blueprint for data storage
      • Break data into tables/columns/rows
      • Give data types to your data
        • Integer
        • String
        • Text
        • Boolean
        • ...
  25. Schema +/-
      • Pros
        • Regularizes our data
        • Helps keep data consistent
        • Converts to programming “types” easily
      • Cons
        • Must separately manage the schema
        • Adding columns & indexes to existing large tables can be painful & slow
  26. ACID
      • Properties that guarantee database transactions are processed reliably
        • Atomic
        • Consistent
        • Isolated
        • Durable
  27. ACID
      • Atomic
      • Any database transaction is all or nothing.
      • If one part of the transaction fails, it all fails.
      “An incomplete transaction cannot exist”
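The all-or-nothing rule can be illustrated without a real database. In this sketch `Accounts` and `transfer` are hypothetical; the "transaction" works on a copy of the data and only keeps it if every step succeeds.

```ruby
# Hypothetical in-memory "database": balances keyed by account name.
class Accounts
  def initialize(balances)
    @balances = balances.dup
  end

  def balance(name)
    @balances.fetch(name)
  end

  # Runs the block against a working copy and keeps the result only if
  # the block finishes without raising. Any error discards every change.
  def transaction
    working_copy = @balances.dup
    yield working_copy
    @balances = working_copy
    true
  rescue StandardError
    false
  end
end

# Moves money between two accounts; both updates happen or neither does.
def transfer(accounts, from, to, amount)
  accounts.transaction do |balances|
    balances[from] -= amount
    raise "insufficient funds" if balances[from].negative?
    balances[to] += amount
  end
end

accounts = Accounts.new("alice" => 100, "bob" => 0)
transfer(accounts, "alice", "bob", 30)  # succeeds, both balances change
transfer(accounts, "alice", "bob", 500) # fails, neither balance changes
p accounts.balance("alice")
p accounts.balance("bob")
```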
  28. ACID
      • Consistent
      • Any transaction will take the database from one consistent state to another
      “Only consistent data is allowed to be written”
  29. ACID
      • Isolated
      • No transaction should be able to interfere with another transaction
      “The same field cannot be updated by two sources at the exact same time”
      Example: starting from a = 0, if a += 1 and a += 2 run at the same time, a = ??
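The `a = ??` question can be reproduced with two Ruby threads. This is a sketch of the idea only, using a hypothetical `Counter` class: a Mutex plays the role the database's isolation plays, making each read-modify-write finish before the next one starts.

```ruby
# Hypothetical counter; the Mutex ensures each read-modify-write finishes
# before another thread can start its own.
class Counter
  attr_reader :value

  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def add(amount)
    @lock.synchronize do
      current = @value
      sleep(0.001) # widen the window in which an unprotected update would be lost
      @value = current + amount
    end
  end
end

counter = Counter.new
[1, 2].map { |n| Thread.new { counter.add(n) } }.each(&:join)
p counter.value # 3 with the lock; without it one update could overwrite the other
```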
  30. ACID
      • Durable
      • Once a transaction is committed it will stay that way
      “Save it once, read it forever”
  31. What is a Database?
      • RDBMS
        • Relational
        • Flexible
        • Has a schema
        • Most likely ACID compliant
        • Typically fast under low load or when optimized
  32. What is SQL?
      • Structured Query Language
      • The language databases speak
      • Based on relational algebra
        • Insert
        • Query
        • Update
        • Delete
      “SELECT Company, Country FROM Customers WHERE Country = 'USA'”
  33. Why people <3 SQL
      • Relational algebra is powerful
      • SQL is proven
        • Well understood
        • Well documented
  34. Why people </3 SQL
      • Relational algebra is hard
      • Different databases support different SQL syntax
      • Yet another programming language to learn
  35. SQL != Database
      • SQL is used to talk to an RDBMS (database)
      • SQL is not an RDBMS
  36. What is NoSQL?
      Not a relational database
  37. RDBMS
  38. Types of NoSQL
      • Distributed Systems
      • Document Store
      • Graph Database
      • Key-Value Store
      • Eventually Consistent Systems
      (Mix and match)
  39. Key Value Stores
      • Non-relational
      • Typically no schema
      • Map one key (a string) to a value (some object)
      Example: Redis
  40. Key Value Example
          redis = Redis.new
          redis.set("foo", "bar")
          redis.get("foo")
          >> "bar"
  41. Key Value Example
          redis = Redis.new
          redis.set("foo", "bar")   # key: "foo", value: "bar"
          redis.get("foo")          # look up by key
          >> "bar"                  # value
  42. Key Value
      • Like a database that can only ever use the primary key (id)
      YES
          select * from users where id = '3';
      NO
          select * from users where name = 'schneems';
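One common way around the "primary key only" limit is to write a second key that acts as an index. The sketch below assumes that approach; `UserStore` and the `users:by_name:` key convention are hypothetical, and a Hash stands in for the key-value store.

```ruby
# A Hash stands in for the key-value store; every access is by exact key.
class UserStore
  def initialize
    @store = {}
  end

  def save(id, name)
    @store["users:#{id}"] = name
    # Hand-maintained secondary index so a lookup by name is possible at all.
    @store["users:by_name:#{name}"] = id
  end

  # The "YES" case: fetch by primary key.
  def find(id)
    @store["users:#{id}"]
  end

  # The "NO" case only works because save wrote the extra index key.
  def find_id_by_name(name)
    @store["users:by_name:#{name}"]
  end
end

users = UserStore.new
users.save(3, "schneems")
p users.find(3)
p users.find_id_by_name("schneems")
```

The application now owns the job of keeping the index in step with the data (renames and deletes are not handled here), which a relational database would do for you.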
  43. NoSQL @ Gowalla
      • Redis (key-value store)
        • Store “Likes” & analytics
      • Memcache (key-value store)
        • Cache database results
      • Cassandra
        • (eventually consistent, with-schema, key-value store)
        • Store “feeds” or “timelines”
      • Solr (search index)
  44. Memcache
      • Key-Value Store
      • Open Source
      • Distributed
      • In memory (RAM) only
        • Fast, but volatile
        • Not ACID
      • Memory object caching system
  45. Memcache Example
          memcache = Memcache.new
          memcache.set("foo", "bar")
          memcache.get("foo")
          >> "bar"
  46. Memcache
      • Can store whole objects
          memcache = Memcache.new
          user = User.where(:username => "schneems").first
          memcache.set("user:3", user)
          user_from_cache = memcache.get("user:3")
          user_from_cache == user
          >> true
          user_from_cache.username
          >> "schneems"
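Storing "whole objects" works because the client turns the object into bytes on the way in and rebuilds it on the way out. The sketch below assumes Ruby's built-in `Marshal` for that step; `ByteCache` and the `User` struct are hypothetical stand-ins, not the memcache client or the Rails model.

```ruby
# Hypothetical user record standing in for the Rails model.
User = Struct.new(:id, :username)

# Hypothetical cache that stores only strings of bytes, the way a cache server does.
class ByteCache
  def initialize
    @bytes = {}
  end

  # Serializes the object before storing it.
  def set(key, object)
    @bytes[key] = Marshal.dump(object)
  end

  # Rebuilds a fresh object from the stored bytes, or returns nil on a miss.
  def get(key)
    raw = @bytes[key]
    raw && Marshal.load(raw)
  end
end

cache = ByteCache.new
user = User.new(3, "schneems")
cache.set("user:3", user)
user_from_cache = cache.get("user:3")
p user_from_cache == user      # equal by value
p user_from_cache.equal?(user) # but a separate copy, not the same object
```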
  47. Memcache @ Gowalla
      • Cache common queries
        • Decreases load on DB (postgres)
          • Enables higher throughput from DB
        • Faster response than DB
          • Users see quicker page load time
  48. What to Cache?
      • Objects that change infrequently
        • users
        • spots (places)
        • etc.
      • Expensive(ish) SQL queries
        • Friend ids for users
        • User ids for people visiting spots
        • etc.
  49. Memcache Distributed
      (Diagram: nodes A, B, C)
  50. Memcache Distributed
      Easily add more nodes
      (Diagram: nodes A, B, C, D)
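One way a client can spread keys over nodes A, B, C is to hash the key and take the remainder. This is a sketch of that simple modulo scheme, not necessarily what any particular memcache client does; `NodePicker` is a hypothetical name.

```ruby
require "zlib"

# Picks a node for a key by hashing the key. Every client that knows the same
# node list picks the same node, without the nodes talking to each other.
class NodePicker
  def initialize(nodes)
    @nodes = nodes
  end

  def node_for(key)
    @nodes[Zlib.crc32(key) % @nodes.size]
  end
end

picker = NodePicker.new(%w[A B C])
puts picker.node_for("user:3")
puts picker.node_for("spot:42")
```

With plain modulo hashing, adding node D changes the remainder for most keys, so many cached values are looked up on the wrong node until they are refilled. Consistent hashing is the usual technique for reducing that reshuffling.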
  51. Memcache <3’s DB
      • We use them together
      • If memcache doesn’t have a value
        • Fetch it from the database
        • Set the key in memcache from the database value
      • Hard
        • Cache invalidation : (
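The fetch-then-fill flow above can be sketched as follows. `CacheAside` is a hypothetical class; a Hash stands in for memcache and a lambda stands in for the database query.

```ruby
# Hypothetical stand-ins: a Hash for memcache and a lambda for the database query.
class CacheAside
  attr_reader :misses

  def initialize(database_lookup)
    @cache = {}
    @database_lookup = database_lookup
    @misses = 0
  end

  # Returns the cached value, or fetches it from the database and caches it.
  def fetch(key)
    return @cache[key] if @cache.key?(key)

    @misses += 1
    @cache[key] = @database_lookup.call(key)
  end

  # Cache invalidation: call this whenever the underlying row changes,
  # otherwise readers keep seeing the stale cached copy.
  def invalidate(key)
    @cache.delete(key)
  end
end

database = { "user:3" => "schneems" }
cache = CacheAside.new(->(key) { database[key] })
cache.fetch("user:3") # miss: goes to the database
cache.fetch("user:3") # hit: served from the cache
p cache.misses
```

The hard part named on the slide is knowing every place that must call `invalidate`; a missed call means stale data until the key is removed.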
  52. Redis
      • Key-Value Store
      • Open Source
      • Not distributed (yet)
      • Extremely quick
      • “Data structure server”
  53. Redis Example, again
          redis = Redis.new
          redis.set("foo", "bar")
          redis.get("foo")
          >> "bar"
  54. Redis - Has Data Types
      • Strings
      • Hashes
      • Lists
      • Sets
      • Sorted Sets
  55. Redis Example, sets
          redis = Redis.new
          redis.sadd("foo", "bar")
          redis.smembers("foo")
          >> ["bar"]
          redis.sadd("foo", "fly")
          redis.smembers("foo")
          >> ["bar", "fly"]
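Set semantics can be tried without a Redis server. The sketch below uses Ruby's own `Set`; `TinySetStore` is a hypothetical class whose method names merely echo the Redis commands, and its return values are an assumption of this sketch rather than the client's documented behavior.

```ruby
require "set"

# Hypothetical in-process imitation of a store that keeps one set per key.
class TinySetStore
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Returns true only when the member was not already in the set.
  def sadd(key, member)
    !@sets[key].add?(member).nil?
  end

  def smembers(key)
    @sets[key].to_a
  end
end

store = TinySetStore.new
store.sadd("foo", "bar")
store.sadd("foo", "fly")
store.sadd("foo", "bar") # a duplicate add changes nothing
p store.smembers("foo").sort
```

Because a member can appear only once, a set is a natural fit for something like "users who liked this item", where adding the same user twice should count once.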
  56. Redis => Likeable
      • Very fast response
      • ~ 50 queries per page view
        • ~ 1 ms per query
      • http://github.com/Gowalla/likeable
  57. Cassandra
      • Open Source
      • Distributed
      • Key-Value Store
      • Eventually Consistent
        • Sort of not ACID
      • Uses a schema
        • ColumnFamilies
  58. Cassandra Distributed
      Eventual Consistency
      (Diagram: data comes in to one of nodes A, B, C, D and is eventually copied to the extra nodes.)
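A toy model of "eventually copied to extra nodes", assuming a write lands on one node and is replicated later. `EventuallyConsistentCluster` is hypothetical and ignores conflicts between concurrent writes; it only shows why a read on another node can briefly return old data.

```ruby
# Hypothetical cluster: each node is a Hash, and replication is a separate step.
class EventuallyConsistentCluster
  def initialize(node_names)
    @nodes = node_names.to_h { |name| [name, {}] }
    @pending = [] # writes accepted but not yet copied to every node
  end

  # Accepts the write on one node and queues it for the others.
  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    @pending << [node, key, value]
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Copies pending writes to the remaining nodes: the "eventually" step.
  def replicate!
    @pending.each do |origin, key, value|
      @nodes.each { |name, data| data[key] = value unless name == origin }
    end
    @pending.clear
  end
end

cluster = EventuallyConsistentCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
p cluster.read("C", "feed:3") # nil: node C has not received the write yet
cluster.replicate!
p cluster.read("C", "feed:3") # now matches node A
```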
  59. Cassandra @ Gowalla
      Activity Feeds
  60. Cassandra @ Gowalla
      • Chronologic
      • http://github.com/Gowalla/chronologic
  61. Should I use NoSQL?
  62. Which One?
  63. Pick the right tool
  64. Tradeoffs
      • Every data store has them
      • Know your data store
        • Strengths
        • Weaknesses
  65. NoSQL vs. RDBMS
      • No Magic Bullet
      • Use Both!!!
      • Model data in a datastore you understand
        • Switch when/if you need to
      • Understand your options
  66. Questions?
      Richard Schneeman (@schneems, works for @Gowalla)
