Front Range PHP NoSQL Databases

The presentation I did for the FrontRange PHP User Group on 3/10/2010.

Published in: Technology
  • Speaker note: Introduce myself. Disclose that I work for Basho and have been working on a Dynamo clone for the last couple of years.
  • Transcript

    • NoSQL Databases Jon Meredith
    • What isn't NoSQL?
      • NOT a standard.
      • NOT a product.
      • NOT a single technology.
    • Well, what is it?
        It's a buzzword.
      • A banner for non-relational databases to organize under.
      • Mostly created in response to scaling and reliability problems.
      • Huge differences between 'NoSQL' systems – but they have elements in common.
    • Where did it come from?
      • Non-relational databases have been around for a while
        • Local key/value stores
        • Object databases
        • Graph databases
        • XML databases
      • New problems are emerging
        • Internet search
        • E-commerce
        • Social networking
    • Where did it come from?
      • Some efforts came from scaling the web...
      • Several papers published
        • 2006 – Google BigTable
        • 2007 – Dynamo Paper
      • In 2008 – explosion of data storage projects
      • All shambling under the NoSQL banner.
    • Really, why not use an RDBMS?
      • I need to perform arbitrary queries
      • My application needs transactions
      • Data needs to be nicely normalized
      • I have replication for scalability/reliability
    • Data Mapping Woes
      • Relational databases divide data into tables made up of columns.
      • Programmers use complex nested data structures.
      • The application has to map between the two.
    • Data Mapping Woes (2)
      • Data in systems evolves over time … which means changes to the schema.
      • Upgrade/rollback scripts have to operate on the whole database – could be millions of rows.
      • Doing phased rollouts is hard … the application needs to do work
    • Alternative!
      • Let the application do it
      • Use convenient language features
        • PHP serialize/unserialize
      • … or use standards for mixed platforms
        • JSON very popular and well supported
        • Google's protocol buffers
        • … even XML
      • Design for forward compatibility
        • Preserve unknown fields
        • Version objects
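
A minimal sketch of the "version objects" and "preserve unknown fields" ideas, using JSON in Python rather than PHP. The field names (`name`, `first`, `last`, `version`) and the v1-to-v2 change are hypothetical, chosen only for illustration.

```python
import json

SCHEMA_VERSION = 2  # hypothetical version this code knows how to write


def load_user(blob: str) -> dict:
    """Decode a stored user, upgrading old versions and keeping unknown fields."""
    doc = json.loads(blob)
    version = doc.get("version", 1)  # assume unversioned objects are v1
    if version < 2:
        # Hypothetical v1 -> v2 change: a single "name" became "first"/"last".
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first"], doc["last"] = first, last
    # Never downgrade an object written by newer code.
    doc["version"] = max(version, SCHEMA_VERSION)
    return doc


def save_user(doc: dict) -> str:
    """Serialize the whole dict, including fields this code does not understand."""
    return json.dumps(doc, sort_keys=True)
```

Because the object is upgraded lazily when it is read, no script has to touch every stored row at deploy time, and old and new application versions can run side by side during a phased rollout.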
    • Scalability and Availability
      • Scalability
        • How many requests you can process
      • Availability
        • How your service degrades as things break
      • RDBMS solutions - replication and sharding
    • Scaling RDBMS - Replication
      • Master-slave replication is easiest
      • Every change on the master happens on the slave.
      • Slaves are read-only. Does not scale INSERT, UPDATE, DELETE queries.
      • The application is responsible for distributing queries to the correct server.
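
One way an application might distribute queries under master-slave replication, sketched in Python. The `QueryRouter` class and the server names are hypothetical, and the sketch assumes the first word of a non-empty SQL string identifies a write.

```python
import itertools


class QueryRouter:
    """Send writes to the master and spread reads over the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # Round-robin over slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def server_for(self, sql: str) -> str:
        # Assumes sql is non-empty and starts with its verb.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)
```

Every write still lands on the single master, which is why this arrangement scales reads but not INSERT, UPDATE, or DELETE.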
    • Scaling RDBMS - Replication
      • Multi-master ring replication
        • Can update any master
        • Updates travel around the ring
        • What happens when a master fails?
          • Reconfigure the ring
        • What happens when it returns?
          • Synchronize the master
          • Add it back into the ring
    • Replication
      • Replication is usually asynchronous for performance – you don't want to wait for the slowest slave on each update.
      • Replication takes time – there is a time lag between the first and last server to see an update.
      • You may not read your writes – you are not getting aCid properties any more.
    • Scaling RDBMS – Sharding
      • Do application-level splitting of data
        • Split a large table into N smaller tables
        • Use id modulo N to find the right table
      • Tables could be spread across multiple database servers
        • But the application needs to know where to query
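
The "id modulo N" lookup can be sketched in a few lines of Python. The `users_<n>` table naming and the way shards are assigned to servers are assumptions made for this example only.

```python
def shard_for(row_id: int, num_shards: int, servers: list[str]) -> tuple[str, str]:
    """Return (server, table) for a row, using id modulo N."""
    shard = row_id % num_shards
    table = f"users_{shard}"                # hypothetical table naming scheme
    server = servers[shard % len(servers)]  # assumed: shards spread evenly over servers
    return server, table
```

For example, `shard_for(42, 4, ["db0", "db1"])` selects table `users_2` on `db0`. Changing N later moves most rows to a different table, which is part of why this work falls on the application.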
    • Availability
      • If you want availability you need multiple servers – maybe even multiple sites.
      • In the real world you get network partitions
        • Just because you can't see your other data center doesn't mean users can't.
      • What should you do if you can't see the other data center?
    • Availability
      • Degrade one site to read-only
        • Defeats availability
      • If you allow both sites to operate
        • There's a chance two users could modify the same data.
        • The application needs to know how to resolve it
    • The bottom line...
      • Building systems that are
        • ...Scalable...
        • ...Available...
        • ...Maintainable...
        • with an RDBMS requires large efforts by application developers and operational staff
    • It's hard because...
      • Significant work for developers.
        • App needs to convert data to tables/columns
        • App needs to know data location
        • App needs to handle failover
        • App needs to handle inconsistency
      • Work for operational staff
        • Fixing replication topologies and synchronizing servers is fiddly work.
    • Last decade's bleeding edge is here
      • Organizations with big problems started experimenting with alternatives
      • Developed internal systems during the mid-2000s
        • Distributed by design
        • Different data models
      • Published details in 2006/2007
    • Amazon
      • Huge e-commerce vendor.
      • Amazon cares about customer experience
        • Availability
        • Latency at the 99th percentile
      • Built as an SOA – pages built from hundreds of services.
      • Amazon runs multiple data centers.
        • Hardware failure is their normal state
        • Network partitions are common
    • Amazon Requirements
      • Shopping cart service must always be available
      • Customers should be able to view and add to their carts (in their words)
        • If disks are failing
        • Network routes are flapping
        • Data centers are being destroyed by tornadoes
    • Amazon Observations
      • Many services just stored state.
        • Access by primary key
        • No queries
      • Examples
        • Shopping carts
        • Best seller lists
        • Customer profiles
      • Hard to scale out relational databases
    • Amazon Solution: Dynamo
      • Primary key access only
      • Fault tolerant: keeps N copies of the data
      • Designed for inconsistency
      • Totally decentralized – nodes 'gossip' state
      • Self-healing
    • Eventual Consistency 1
      • Brewer's CAP Theorem
        • Consistency
        • Availability
        • Partition tolerance
      • Pick two out of three!
      • Amazon chose A-P over C
    • Eventual Consistency 2
      • N copies of each value
      • Read operations (get) require 'R' nodes to respond
      • Write operations (put) require 'W' nodes to respond
      • If R+W > N, clients will read their own writes (if no failure)
      • NRW tunes the cluster – typically (3,2,2)
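
The R+W > N rule works because any R nodes that answer a read must then share at least one node with any W nodes that acknowledged the write. A brute-force check of that overlap, as an illustrative Python sketch (the function name is made up for this example):

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-node read set shares a node with every W-node write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )
```

With the typical (3,2,2) setting the sets always overlap. With (3,1,1) a read can land on a replica the write never reached, so a client may not see its own write.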
    • Eventual Consistency 3
      • Consequence of availability: Conflicts
      • Conflicts can come from
        • Network partitions
        • Applications themselves – no transactions or locking
      • Applications must handle conflicts
      • Dynamo minimizes conflicts with vector clocks
    • Vector Clocks
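
The slide content for vector clocks did not survive in this transcript, so the following is a generic Python sketch of the technique, not the presenter's diagram. A clock is modelled as a dict mapping a node name to a counter; the function names are chosen for this example.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter bumped."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent updates: neither version supersedes the other


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that reconciles both versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}
```

When one clock descends from the other, the store can discard the older version on its own; only a true "conflict" has to be handed back to the application.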
    • Partitioning
    • Example: Shopping Cart
      • User browses site – adds 3 widgets
    • Shopping Cart - Conflict Network Failure
    • Shopping Cart - Merge
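
The merge slide's details are not in this transcript. As an assumption, one simple application-level merge keeps every item that appears in any conflicting copy of the cart and takes the largest quantity seen. The cart representation (item name to quantity) is hypothetical.

```python
def merge_carts(siblings: list[dict]) -> dict:
    """Merge conflicting cart versions: union of items, largest quantity wins."""
    merged: dict = {}
    for cart in siblings:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged
```

A merge like this never loses an "add to cart", but an item removed on one side of a partition can reappear after the merge.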
    • Open Source Dynamo
      • Dynamo is internal to Amazon
      • Open source options
        • Riak from Basho
        • Project Voldemort
    • Google BigTable
      • Used internally at Google
      • Distributed storage system for structured data
    • Data representation
      • Data stored in tables.
      • Table indexed by {key,timestamp} and a variable number of sparse columns
      • Columns are grouped into column families. Columns in a family are stored together.
      • Each table is broken into tablets.
      • Tablets are stored on a cluster file system (GFS).
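
A toy in-memory model of that data representation, in Python. It only illustrates the indexing by row key, column, and timestamp; tablets and GFS are not modelled, and the `SparseTable` class and the `family:qualifier` column naming are assumptions of this sketch.

```python
from collections import defaultdict


class SparseTable:
    """Rows hold only the columns actually written; each cell keeps timestamped versions."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None
```

Column families are fixed when the table is created, while the columns inside a family can vary from row to row, which is what makes the table sparse.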
    • BigTable – Column Families (Copyright Google)
    • Map/Reduce
      • Processing framework that sits on top of BigTable.
      • Programmers write two functions: map() and reduce().
      • Table is mapped, then reduced.
      • Job control system monitors tasks and resubmits failed ones.
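
The two-function programming model can be illustrated with a single-process word count in Python. This is only a sketch of the idea: a real framework runs the map and reduce steps in parallel across many machines, and the `map_reduce` driver here is hypothetical.

```python
from collections import defaultdict


def map_fn(key, value):
    """Emit (word, 1) for every word in one document."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Combine all counts emitted for one word."""
    return sum(values)


def map_reduce(table, map_fn, reduce_fn):
    """Map every row, group the emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in table.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}
```

Because each map call sees one row and each reduce call sees one key, the framework is free to split the work across machines and rerun any piece that fails.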
    • Map/Reduce (Source: institutes.lanl.gov)
    • BigTable has inspired... Map/Reduce
    • Explosion of NoSQL DBs
      • Too many projects
      • Two good resources
        • http://nosql.mypopescu.com/
        • http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
    • So many projects! Dynamo, BigTable, Redis, Riak, Voldemort, CouchDB, PNUTS, Hadoop/HBase, Cassandra, Hypertable, MongoDB, Terrastore, Scalaris, BerkeleyDB, MemcacheDB, Dynomite, Neo4j, Tokyo Cabinet … and more
    • NoSQL Characteristics
      • Broad types
      • Persistence
      • Distribution
        • Replicated
        • Decentralized
    • Riak from Basho http://riak.basho.com
      • Dynamo clone written in Erlang
      • RESTful HTTP interface
      • Fully distributed
      • Clients for multiple languages
      • Multiple storage backends
      • I work there now!
    • Redis 1.2
    • Redis 1.2 (cont)
      • Values can be strings, sets, ordered sets, lists
      • Operations like increment, decrement, intersection, push, pop
      • In-memory (can be backed by disk)
      • Auto-sharding in client libraries
      • No fault tolerance (coming after 2.0)
      • Example: retwis – Twitter clone in PHP
    • Cassandra
      • http://incubator.apache.org/cassandra/
      • BigTable ColumnFamily data model
      • Dynamo data distribution
      • Written in Java
      • Thrift-based interface
      • In use at
    • CouchDB
      • Document oriented database
        • All JSON documents
      • Written in Erlang
      • Used by Ubuntu One
      • HTTP interface
      • Uses JavaScript for indexing/mapreduce
      • Incremental replication
    • BerkeleyDB
      • Sleepycat, now owned by Oracle
      • Key/Value Store
      • Alternative: Tokyo Cabinet
    • I'm out of time
      • MongoDB
      • Neo4j – Graph Database
      • PNUTS – Yahoo
    • This is all great but...
      • Relational databases provide a lot of functionality.
        • Giving up queries
        • Even range queries are hard for distributed hash systems.
        • No transactions – rules out some classes of applications.
        • Space is still evolving
    • Conclusion
      • NoSQL systems give applications the tools they need for scalability/availability
      • They force you to think about distributed design issues like consistency.
      • Play with them!
