Polyglot Persistence

Published in: Technology

Polyglot Persistence

  1. 1. Polyglot Persistence Scott Leberknight
  2. 2. Polyglot?
  3. 3. Polyglot Programming, Neal Ford, December 2006: http://memeagora.blogspot.com/2006/12/polyglot-programming.html
  4. 4. http://www.amazon.com/Paradox-Choice-Why-More-Less/dp/0060005688
  5. 5. First web frameworks...
  6. 6. http://java-source.net/open-source/web-frameworks
  7. 7. non-Java web frameworks too!
  8. 8. ...then AJAX and JavaScript
  9. 9. ...and now PERSISTENCE (code truncated on the slide)
         InitialContext ic = new InitialContext();
         DataSource ds = (DataSource) ic.lookup("java:comp/env/jdbc/cof
         Connection con = null;
         Statement stmt = null;
         ResultSet rs = null;
         try {
             con = ds.getConnection();
             stmt = con.createStatement();
             rs = stmt.executeQuery("select name, price from
             List<Coffee> coffees = new ArrayList<Coffee>();
             while (rs.next()) {
                 String name = rs.getString("name");
                 float price = rs.getFloat("price");
                 coffees.add(new Coffee(name, price));
             }
         } catch (SQLException sqlex) {
  10. 10. Why? Scalability (on massive scales); High availability; New types of apps, e.g. social networking; Fault tolerance; Distributability; Flexibility (i.e. "schemaless")
  11. 11. Why? One size does not fit all
  12. 12. A few types of databases... Relational; Document-oriented; Object; Bigtable-ish; Key-value; EAV (Entity-Attribute-Value)
  13. 13. Types of data: Structured; Semi-structured; Unstructured
  14. 14. ACID vs. BASE
  15. 15. ACID: Atomic, Consistent, Isolated, Durable
  16. 16. ACID in Action: Transfer $1000 from 1st Bank checking to savings (diagram: 1st Bank with checking, savings, customers)
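The transfer on this slide can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: `AcidBank` and its methods are hypothetical names, a `HashMap` stands in for the single 1st Bank database, and durability is not modeled at all. What it tries to show is the all-or-nothing shape of the operation: the debit and the credit either both happen or neither does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one database that holds both accounts.
class AcidBank {
    private final Map<String, Long> balances = new HashMap<>();

    AcidBank(long checking, long savings) {
        balances.put("checking", checking);
        balances.put("savings", savings);
    }

    // Debit and credit are applied together or not at all (atomicity).
    // synchronized keeps other threads from observing a half-done transfer (isolation).
    synchronized boolean transfer(String from, String to, long amount) {
        long fromBalance = balances.get(from);
        if (amount <= 0 || fromBalance < amount) {
            return false; // rejected: no balance was changed
        }
        balances.put(from, fromBalance - amount);
        balances.put(to, balances.get(to) + amount);
        return true;
    }

    synchronized long balance(String account) {
        return balances.get(account);
    }

    public static void main(String[] args) {
        AcidBank bank = new AcidBank(1500, 200);
        bank.transfer("checking", "savings", 1000);
        System.out.println(bank.balance("checking") + " / " + bank.balance("savings"));
    }
}
```

A real relational database would get the same effect from a transaction (begin, two updates, commit or rollback) rather than from a Java lock.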
  17. 17. BASE: Basically Available, Soft State, Eventually Consistent
  18. 18. BASE in Action: Transfer $1000 from 1st Bank checking to Bank of Foo savings (diagram: 1st Bank with checking, savings, customers; Bank of Foo with account, account_type, customer)
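One way to picture the cross-bank transfer is as a local debit plus a queued credit that the other bank applies later. The sketch below is an assumption-laden illustration, not how any real bank or product works: `EventualTransfer`, `debitAndEnqueue`, and `deliverPending` are hypothetical names, and an in-memory queue stands in for whatever messaging sits between the two systems.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of two independent systems linked by a message queue.
class EventualTransfer {
    long firstBankChecking;
    long bankOfFooSavings;
    // Credits that 1st Bank has sent but Bank of Foo has not applied yet.
    private final Queue<Long> pendingCredits = new ArrayDeque<>();

    EventualTransfer(long checking, long savings) {
        this.firstBankChecking = checking;
        this.bankOfFooSavings = savings;
    }

    // Step 1: commit locally at 1st Bank and queue the credit for later.
    void debitAndEnqueue(long amount) {
        if (amount <= 0 || amount > firstBankChecking) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }
        firstBankChecking -= amount;
        pendingCredits.add(amount);
    }

    // Step 2: some time later, Bank of Foo drains the queue.
    void deliverPending() {
        Long amount;
        while ((amount = pendingCredits.poll()) != null) {
            bankOfFooSavings += amount;
        }
    }

    // Money that has left one bank but not yet arrived at the other.
    long inFlight() {
        return pendingCredits.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        EventualTransfer t = new EventualTransfer(1500, 200);
        t.debitAndEnqueue(1000);
        System.out.println("before delivery: savings=" + t.bankOfFooSavings + ", in flight=" + t.inFlight());
        t.deliverPending();
        System.out.println("after delivery: savings=" + t.bankOfFooSavings + ", in flight=" + t.inFlight());
    }
}
```

Between the two steps a reader of the savings balance sees the old value; the two banks agree again only once the queue has been drained.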
  19. 19. Schedule, Cost, Quality (choose any 2)
  20. 20. Brewer's Conjecture
  21. 21. "When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three." - "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" Seth Gilbert and Nancy Lynch (MIT)
  22. 22. Consistency, Partition-tolerance, Availability (choose any 2)
  23. 23. We're living in interesting times... Explosion of alternative persistence choices; Completely new philosophies on persistence
  24. 24. Whirlwind tour... Relational; Document-Oriented; Key/Value; Bigtable
  25. 25. Ankle-deep
  26. 26. Relational Databases (example schema: blog, blog_entry, blog_entry_comment, category, daily_statistics, blog_owner, blog_user)
  27. 27. Relations (tables, joins, integrity); ACID guarantees; Query using SQL; Strict schema; Difficult to scale, partition (e.g. 2-phase commit); By far most popular persistence choice today; Mismatch with OO languages
  28. 28.
         select * from fakenames f
         where f.surname like 'Smi%' and f.city = 'Richmond' and f.state = 'VA'
         order by f.surname, f.given_name;
  29. 29. Scaling... Buy a bigger machine (vertical scaling)
  30. 30. What if there is no bigger machine? Horizontal scaling: Functional Sharding
  31. 31. Functional Shards: Users 0, Users 1; Products 0; Orders 0, Orders 1, Orders 2
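The shard layout on this slide (two Users shards, one Products shard, three Orders shards) suggests a two-step lookup: pick the functional group first, then pick a shard inside it. The following is a minimal sketch under assumptions: `ShardRouter` is a hypothetical class, and choosing the shard by `id mod shardCount` is just one simple placement rule that the slides do not prescribe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical router: functional partitioning first, then sharding by id.
class ShardRouter {
    private final Map<String, Integer> shardCounts = new LinkedHashMap<>();

    ShardRouter() {
        // Shard counts taken from the slide's diagram.
        shardCounts.put("Users", 2);
        shardCounts.put("Products", 1);
        shardCounts.put("Orders", 3);
    }

    // Returns a label such as "Orders 1" for the shard that should hold this id.
    String route(String function, long id) {
        Integer count = shardCounts.get(function);
        if (count == null) {
            throw new IllegalArgumentException("Unknown function: " + function);
        }
        long shard = Math.floorMod(id, count.longValue());
        return function + " " + shard;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter();
        System.out.println(router.route("Users", 7));
        System.out.println(router.route("Orders", 7));
    }
}
```

Modulo placement is easy to read but reshuffles most keys when a shard is added, which is one reason later slides bring up consistent hashing.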
  32. 32. Document-Oriented Databases
  33. 33. "As opposed to Relational Databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, each record is stored as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data." - Wikipedia (http://en.wikipedia.org/wiki/Document-oriented_database)
  34. 34. Examples: Lotus Notes; Apache CouchDB; Amazon SimpleDB (for our purposes anyway); MongoDB
  35. 35. CouchDB
  36. 36. Architecture
  37. 37. Concepts: Documents; Views; Schemaless; Distributed
  38. 38. RESTful...
  39. 39. Views: JavaScript as description language; Map/Reduce functions; Add structure to semi-structured data; Independent of actual documents (created in special Design Documents)
  40. 40. Simplest map function...
         function(doc) {
           emit(null, doc);
         }
  41. 41.
         // Map function to find Seattleites
         function(doc) {
           if (doc.State == "WA" && doc.City == "Seattle") {
             emit(doc.Number, { "GivenName": doc.GivenName, "Surname": doc.Surname });
           }
         }
  42. 42. Counting people by state...
         // Map function
         function(doc) {
           emit(doc.State, 1);
         }
         // Reduce function; aggregates counts
         function(key, values) {
           return sum(values);
         }
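To make the map/reduce pair on this slide concrete outside CouchDB, here is a rough Java emulation. It is not CouchDB's view engine: `CountByState` is a hypothetical class, documents are plain maps, and every document is assumed to carry a `State` field. The map step emits a `(State, 1)` pair per document, the pairs are grouped by key, and the reduce step sums each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical emulation of a "count by state" view; not CouchDB's implementation.
class CountByState {
    static Map<String, Integer> countByState(List<Map<String, String>> docs) {
        // Map step: emit (doc.State, 1) and group the emitted values by key.
        // A TreeMap keeps keys sorted, loosely mirroring a view ordered by key.
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            emitted.computeIfAbsent(doc.get("State"), k -> new ArrayList<>()).add(1);
        }
        // Reduce step: sum the values emitted for each key.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : emitted.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            reduced.put(entry.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                Collections.singletonMap("State", "VA"),
                Collections.singletonMap("State", "WA"),
                Collections.singletonMap("State", "VA"));
        System.out.println(countByState(docs)); // prints {VA=2, WA=1}
    }
}
```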
  43. 43. Caution: Views are not meant to be created dynamically like SQL queries! "To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB’s core design is largely optimized around the need for efficient, incremental creation of views and their indexes." - http://couchdb.apache.org/docs/overview.html
  44. 44. Amazon SimpleDB
  45. 45. "Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers." - SimpleDB Developer Guide (Version 2007-11-07)
  46. 46. "A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires a DBA to maintain and administer. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access. This approach eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use." - SimpleDB Developer Guide (Version 2007-11-07)
  47. 47. Organize data into domains; Domains have items; Items have attributes; Attributes have value(s)
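The domain / item / attribute / value(s) hierarchy maps naturally onto nested collections. The sketch below is a hypothetical in-memory model (`Domain`, `put`, and `get` are made-up names, not the SimpleDB client API); its `replace` flag only imitates the idea of either adding another value to a multi-valued attribute or overwriting what is there.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model: item name -> attribute name -> set of values.
class Domain {
    private final Map<String, Map<String, Set<String>>> items = new TreeMap<>();

    // replace=false adds one more value; replace=true overwrites existing values.
    void put(String item, String attribute, String value, boolean replace) {
        Set<String> values = items
                .computeIfAbsent(item, k -> new TreeMap<String, Set<String>>())
                .computeIfAbsent(attribute, k -> new TreeSet<String>());
        if (replace) {
            values.clear();
        }
        values.add(value);
    }

    // Returns the values of one attribute, or an empty set if there are none.
    Set<String> get(String item, String attribute) {
        Map<String, Set<String>> attributes = items.get(item);
        if (attributes == null) {
            return Collections.emptySet();
        }
        Set<String> values = attributes.get(attribute);
        return values == null ? Collections.<String>emptySet() : values;
    }

    public static void main(String[] args) {
        Domain fakenames = new Domain();
        fakenames.put("1", "EmailAddress", "a@example.com", false);
        fakenames.put("1", "EmailAddress", "b@example.com", false);
        System.out.println(fakenames.get("1", "EmailAddress"));
    }
}
```

The e-mail addresses are placeholders; there is no schema, so any item can carry any attribute with any number of values.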
  48. 48. Domain: Fakenames (diagram linking Items to Attributes to Values). Attributes: ID, GivenName, Surname, Birthday, EmailAddress. ID: "1", "2", "3", "4", "5". GivenName: "Gwendolyn", "Michael", "Chris", "David", "Vera". Surname: "Swinton", "Johnson", "Lewis", "Sutton", "Schuler". Birthday: "6/6/1941", "9/5/1982", "11/18/1963", "9/20/1951", "7/14/1952". EmailAddress (some items have several values): "vsutton@coldmail.com", "Vera.M.Sutton@dodgit.com", "gswinton@dodgit.com", "mjohnson@stopit.com", "michael.johnson@yaboo.com", "david.schuler@goofymail.com", "dschuler@yaboo.com", "schulerd@xyzco.com"
  49. 49. Domain: Amazon (diagram linking items to attributes to values). Items (ID): "Item01", "Item02", "Item03", "Item04", "Item05". Attributes: Category, Subcategory, Name, Author, Color, Size, Format, Length. Category: "Clothes", "Entertainment". Subcategory: "Mens", "Womens", "DVDs", "Books". Name: "Sound of Music", "Pulp Fiction", "Slaughterhouse Five", "Blouse", "Jeans". Author: "Kurt Vonnegut". Color: "White", "Yellow", "Beige", "Pink", "Blue", "Gray", "Black". Size: "Small", "Medium", "Large", "30x30", "32x30", "34x30", ... Format: "Full Screen", "Widescreen". Length: "174 min", "154 min", "168 min (special edition)"
  50. 50. "REST" API
         POST / HTTP/1.1
         Content-Type: application/x-www-form-urlencoded; charset=utf-8
         User-Agent: Amazon Simple DB Java Library
         Host: sdb.amazonaws.com
         Content-Length: 232

         Action=CreateDomain&
         DomainName=Fakenames&
         AWSAccessKeyId=[your AWS access key id]&
         SignatureVersion=2&
         SignatureMethod=HmacSHA256&
         Signature=[computed signature]&
         Timestamp=2009-03-23T23%3A58%3A55.327Z&
         Version=2007-11-07
  51. 51. Available APIs: Amazon: Java, C#, Perl, PHP, VB. 3rd party: Ruby gems (aws-simpledb, aws-sdb, simpledb); Python (polarrose-twisted-amazon)
  52. 52.
         AmazonSimpleDB service = new AmazonSimpleDBClient(accessKeyId, secretAccessKey);
         // Create a new domain
         CreateDomainRequest cdReq = new CreateDomainRequest().withDomainName("Fakenames");
         CreateDomainResponse cdResp = service.createDomain(cdReq);
         // List all our domains
         ListDomainsRequest ldReq = new ListDomainsRequest();
         ListDomainsResponse ldResp = service.listDomains(ldReq);
  53. 53. Sample response:
         <ListDomainsResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
           <ListDomainsResult>
             <DomainName>Fakenames</DomainName>
             <DomainName>Movies</DomainName>
           </ListDomainsResult>
           <ResponseMetadata>
             <RequestId>8c4d0240-49ea-5d2f-9573-437324cd144c</RequestId>
             <BoxUsage>0.0000071759</BoxUsage>
           </ResponseMetadata>
         </ListDomainsResponse>
  54. 54.
         // Add an attribute value
         ReplaceableAttribute newEmail =
             new ReplaceableAttribute("emailAddress", "bortiz@spammail.com", false);
         PutAttributesRequest request = new PutAttributesRequest()
             .withDomainName("Fakenames")
             .withItemName("1")
             .withAttribute(newEmail);
         PutAttributesResponse response = service.putAttributes(request);
  55. 55. Query API
  56. 56.
         // Query for Richmonders
         String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
         QueryRequest request = new QueryRequest()
             .withDomainName("Fakenames")
             .withQueryExpression(query);
         QueryResponse response = service.query(request);
  57. 57.
         // Query for Richmonders, with attributes
         String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
         QueryWithAttributesRequest request = new QueryWithAttributesRequest()
             .withDomainName("Fakenames")
             .withQueryExpression(query);
         QueryWithAttributesResponse response = service.queryWithAttributes(request);
  58. 58. SELECT API
  59. 59.
         // Get a count
         String query = "select count(*) from Fakenames";
         SelectRequest request = new SelectRequest().withSelectExpression(query);
         SelectResponse response = service.select(request);
  60. 60.
         // Select Richmonders
         String query = "select * from Fakenames"
             + " where city = 'Richmond' intersection state = 'VA'"
             + " intersection surname like 'Smi%'";
         SelectRequest request = new SelectRequest().withSelectExpression(query);
         SelectResponse response = service.select(request);
  61. 61. There are Limits! Query execution time <= 5 sec; Max items in query response = 250; Size limits <= 1024 bytes; Attribute limit per item <= 256; See SimpleDB Developer Guide for more...
  62. 62. NextToken (May I have another?)
         <QueryResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
           <QueryResult>
             <ItemName>131</ItemName>
             ...
             <NextToken>
               rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9r
               racXLnINNqwMACkkAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhc
               ...
             </NextToken>
           </QueryResult>
           <ResponseMetadata> ... </ResponseMetadata>
         </QueryResponse>
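Because a response is capped, a client usually keeps asking until no NextToken comes back. The loop below is a generic sketch of that pattern, not SimpleDB client code: `Paging`, `Page`, `PageSource`, and `fakeSource` are all hypothetical, and the fake source simply uses the next start index as its token so the example can run without a network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical token-based paging loop; the interfaces are stand-ins, not a real API.
class Paging {
    // One page of item names plus the token for the next page (null when finished).
    static final class Page {
        final List<String> itemNames;
        final String nextToken;

        Page(List<String> itemNames, String nextToken) {
            this.itemNames = itemNames;
            this.nextToken = nextToken;
        }
    }

    // Stand-in for a query call that accepts the token from the previous response.
    interface PageSource {
        Page fetch(String nextToken);
    }

    // Keep fetching until the source stops returning a token.
    static List<String> fetchAll(PageSource source) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = source.fetch(token);
            all.addAll(page.itemNames);
            token = page.nextToken;
        } while (token != null);
        return all;
    }

    // Fake source serving a fixed list in pages; its token is the next start index.
    static PageSource fakeSource(List<String> data, int pageSize) {
        return token -> {
            int start = token == null ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, data.size());
            String next = end < data.size() ? String.valueOf(end) : null;
            return new Page(data.subList(start, end), next);
        };
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("131", "132", "133", "134", "135");
        System.out.println(fetchAll(fakeSource(items, 2)));
    }
}
```

With a real service the token is opaque, as in the response above, and should be passed back unchanged.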
  63. 63. Eventually consistent(*) "Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated...all copies of the data are updated. However, it takes time for the data to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change. Consistency is usually reached within seconds, but a high system load or network partition might increase this time. Performing a read after a short period of time should return the updated data." - SimpleDB Developer Guide (Version 2007-11-07)
  64. 64. (*) ConsistentRead: Version 2009-04-15 added consistent read option. "If eventually consistent reads are not acceptable for your application, use ConsistentRead. Although this operation might take longer than a standard read, it always returns the last updated value." - SimpleDB Developer Guide (Version 2009-04-15)
  65. 65. Distributed Key-Value Stores
  66. 66. Basically...
         value = store.get(key)
         store.put(key, value)
         store.remove(key)
  67. 67. Data stored as key/value pairs; "A big hashtable"; Replication; Fault tolerance; Data consistency & versioning; Horizontal scaling
  68. 68. Amazon Dynamo (a real-world example)
  69. 69. Distributed key-value storage system; Used by Amazon core and web services (e.g. your Amazon shopping cart...); Massively scalable; Fault tolerant; Eventually consistent
  70. 70. The-Project-Which-Must-Not-Be-Named (Project Voldemort)
  71. 71. What is it? "a distributed key-value storage system"; automatic replication across multiple servers; transparent server failure handling; automatic data item versioning
  72. 72. "Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table." http://project-voldemort.com/
  73. 73. designed for horizontal scaling; used at LinkedIn "for certain high-scalability storage problems where simple functional partitioning is not sufficient"
  74. 74. "Consistent hashing"; No single server holds all data; Data partitioned across multiple servers; Versioning using "vector clocks"
  75. 75. Configuration: cluster.xml describes cluster (servers, data partitions); stores.xml describes data stores (persistence, routing, key/value data format, replication factor, preferred reads/writes, required reads/writes)
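Consistent hashing can be sketched with a sorted map used as a ring. This is a generic illustration, not Voldemort's partitioning code: `HashRing`, the number of virtual points per node, and the bit-mixing hash are all assumptions picked for brevity. Each node owns several points on the ring, and a key belongs to the first point at or after its own hash, wrapping around at the end.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring; a generic sketch, not Voldemort's implementation.
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int pointsPerNode;

    HashRing(int pointsPerNode) {
        this.pointsPerNode = pointsPerNode;
    }

    // Place several points for the node around the ring.
    void addNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only the points that still belong to this node.
    void removeNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // A key is owned by the first point clockwise from its hash.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Simple bit-mixing of String.hashCode(); chosen for illustration only.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h;
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(16);
        ring.addNode("server0");
        ring.addNode("server1");
        ring.addNode("server2");
        System.out.println("key 42 lives on " + ring.nodeFor("42"));
    }
}
```

The property that matters is that removing a node only moves the keys that node owned; every other key keeps its current server.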
  76. 76. sample cluster.xml
         <cluster>
           <name>mycluster</name>
           <server>
             <id>0</id>
             <host>localhost</host>
             <http-port>8081</http-port>
             <socket-port>6666</socket-port>
             <partitions>0, 1, 2, 3</partitions>
           </server>
           <server>
             <id>1</id>
             <host>localhost</host>
             <http-port>8082</http-port>
             <socket-port>6667</socket-port>
             <partitions>4, 5, 6, 7</partitions>
           </server>
         </cluster>
  77. 77. sample stores.xml
         <stores>
           <store>
             <name>people</name>
             <persistence>bdb</persistence>
             <routing>client</routing>
             <replication-factor>3</replication-factor>
             <preferred-reads>3</preferred-reads>
             <required-reads>2</required-reads>
             <preferred-writes>2</preferred-writes>
             <required-writes>1</required-writes>
             <key-serializer>
               <type>json</type>
               <schema-info>"string"</schema-info>
             </key-serializer>
             <value-serializer>
               <type>json</type>
               <schema-info>{"GivenName":"string", "Surname":"string"}</schema-info>
             </value-serializer>
           </store>
         </stores>
  78. 78. replication
         > locate "1"
         Node 0
         host: localhost
         port: 6666
         available: yes
         last checked: 96171 ms ago
         Node 1
         host: localhost
         port: 6667
         available: yes
         last checked: 96171 ms ago
         Node 2
         host: localhost
         port: 6668
         available: yes
         last checked: 96172 ms ago
  79. 79. vector clock (master node: version)
         $ ./voldemort-shell.sh people tcp://localhost:6666
         Established connection to people via tcp://localhost:6666
         > put "1" { "GivenName":"Bob", "Surname":"Smith" }
         > get "1"
         version(0:1): {"GivenName":"Bob", "Surname":"Smith", }
         > put "1" { "GivenName":"Robert", "Surname":"Smith", }
         > get "1"
         version(0:2): {"GivenName":"Robert", "Surname":"Smith", }
  80. 80. Java API example
         StoreClientFactory factory = new SocketStoreClientFactory(numThreads, numThreads,
             maxQueuedRequests, maxConnectionsPerNode, maxTotalConnections, bootstrapUrl);
         StoreClient<Integer, Map<String, Object>> client = factory.getStoreClient("fakenames");
         // Update a value
         Versioned<Map<String, Object>> versioned = client.get(1);
         Map<String, Object> person = versioned.getValue();
         person.put("EmailAddress", newEmailAddr);
         versioned.setObject(person);
         client.put(1, versioned);
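The replication and read/write counts in the sample stores.xml can be checked against the usual quorum rule from Dynamo-style systems: with N replicas, a read of R copies is guaranteed to overlap a write of W copies only when R + W > N. The snippet below just does that arithmetic for the sample values; `QuorumCheck` is a hypothetical helper, and how Voldemort itself applies these settings is not something this sketch claims to reproduce.

```java
// Hypothetical helper applying the general quorum rule R + W > N.
class QuorumCheck {
    // True when every set of `reads` replicas must share a replica with every set of `writes` replicas.
    static boolean overlaps(int replicationFactor, int reads, int writes) {
        return reads + writes > replicationFactor;
    }

    public static void main(String[] args) {
        // required-reads=2, required-writes=1, replication-factor=3 from the sample
        System.out.println(overlaps(3, 2, 1)); // prints false
        // preferred-reads=3, preferred-writes=2, replication-factor=3 from the sample
        System.out.println(overlaps(3, 3, 2)); // prints true
    }
}
```

Under this rule the sample's required values (2 + 1 = 3) do not guarantee that a read sees the latest write, while the preferred values (3 + 2 = 5) do.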
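The `version(0:1)` and `version(0:2)` output pairs a node id with a counter, which is the essence of a vector clock. Below is a small, generic sketch of the idea; `VectorClock`, `incremented`, and `compare` are hypothetical names rather than Voldemort's classes. Each write bumps the counter of the node that handled it, and two versions conflict when each one is ahead of the other on some node.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical immutable vector clock: node id -> number of writes seen from that node.
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<Integer, Long> counters;

    VectorClock() {
        this(new TreeMap<Integer, Long>());
    }

    private VectorClock(Map<Integer, Long> counters) {
        this.counters = counters;
    }

    // Returns a new clock with the given node's counter bumped by one.
    VectorClock incremented(int nodeId) {
        Map<Integer, Long> copy = new TreeMap<>(counters);
        copy.merge(nodeId, 1L, Long::sum);
        return new VectorClock(copy);
    }

    // Compares this clock to another, counter by counter (missing counters count as 0).
    Order compare(VectorClock other) {
        boolean thisAhead = false;
        boolean otherAhead = false;
        Set<Integer> nodes = new TreeSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (int node : nodes) {
            long mine = counters.getOrDefault(node, 0L);
            long theirs = other.counters.getOrDefault(node, 0L);
            if (mine > theirs) thisAhead = true;
            if (mine < theirs) otherAhead = true;
        }
        if (thisAhead && otherAhead) return Order.CONCURRENT;
        if (thisAhead) return Order.AFTER;
        if (otherAhead) return Order.BEFORE;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock v1 = new VectorClock().incremented(0); // like version(0:1)
        VectorClock v2 = v1.incremented(0);                // like version(0:2)
        VectorClock fork = v1.incremented(1);              // written via another node
        System.out.println(v1.compare(v2));   // prints BEFORE
        System.out.println(v2.compare(fork)); // prints CONCURRENT
    }
}
```

A CONCURRENT result is the case a store cannot resolve by itself; the application has to merge or pick between the two versions.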
  81. 81. Bigtable Google
  82. 82. "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance." - Bigtable: A Distributed Storage System for Structured Data http://labs.google.com/papers/bigtable.html
  83. 83. "A Bigtable is a sparse, distributed, persistent multidimensional sorted map" - Bigtable: A Distributed Storage System for Structured Data http://labs.google.com/papers/bigtable.html
  84. 84. ?
  85. 85. distributed; sparse; column-oriented; versioned
  86. 86. (row key, column key, timestamp) => value. "The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes." - Bigtable: A Distributed Storage System for Structured Data http://labs.google.com/papers/bigtable.html
  87. 87. Key Concepts: row key => 20090407152657; column family => "name:"; column key => "name:first", "name:last"; timestamp => 1239124584398
  88. 88.
         Row Key         Timestamp  Column Family "info:"                 Column Family "content:"
         20090407145045  t7         "info:summary" "An intro to..."
         20090407145045  t6         "info:author" "John Doe"
         20090407145045  t5                                               "Google's Bigtable is..."
         20090407145045  t4                                               "Google Bigtable is..."
         20090407145045  t3         "info:category" "Persistence"
         20090407145045  t2         "info:author" "John"
         20090407145045  t1         "info:title" "Intro to Bigtable"
         20090320162535  t4         "info:category" "Persistence"
         20090320162535  t3                                               "CouchDB is..."
         20090320162535  t2         "info:author" "Bob Smith"
         20090320162535  t1         "info:title" "Doc-oriented..."
  89. 89. Ask for row 20090407145045... (same table as on slide 88)
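The `(row key, column key, timestamp) => value` mapping can be mimicked with nested sorted maps. This is a toy model under assumptions: `SparseTable` is a hypothetical class, values are `String` rather than raw bytes for readability, and nothing here is distributed or persistent. It shows the two properties the slides stress: rows are sparse (only columns that were written exist) and each cell keeps several timestamped versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model: row key -> column key -> timestamp (newest first) -> value.
class SparseTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Up to `max` versions of one cell, newest first; empty if the cell was never written.
    List<String> versions(String row, String column, int max) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return result;
        }
        for (String value : columns.get(column).values()) {
            if (result.size() == max) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    // The most recent version of a cell, or null if there is none.
    String latest(String row, String column) {
        List<String> newest = versions(row, column, 1);
        return newest.isEmpty() ? null : newest.get(0);
    }

    public static void main(String[] args) {
        SparseTable blog = new SparseTable();
        blog.put("20090407145045", "info:author", 2, "John");
        blog.put("20090407145045", "info:author", 6, "John Doe");
        System.out.println(blog.latest("20090407145045", "info:author"));      // prints John Doe
        System.out.println(blog.versions("20090407145045", "info:author", 3)); // prints [John Doe, John]
    }
}
```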
  90. 90. Apache HBase (an open source Bigtable implementation)
  91. 91. HBase uses a data model very similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns. - http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
  92. 92. HBase Shell
         hbase(main):001:0> create 'blog', 'info', 'content'
         0 row(s) in 4.3640 seconds
         hbase(main):002:0> put 'blog', '20090320162535', 'info:title', 'Document-oriented storage using CouchDB'
         0 row(s) in 0.0330 seconds
         hbase(main):003:0> put 'blog', '20090320162535', 'info:author', 'Bob Smith'
         0 row(s) in 0.0030 seconds
         hbase(main):004:0> put 'blog', '20090320162535', 'content:', 'CouchDB is a document-oriented...'
         0 row(s) in 0.0030 seconds
         hbase(main):005:0> put 'blog', '20090320162535', 'info:category', 'Persistence'
         0 row(s) in 0.0030 seconds
         hbase(main):006:0> get 'blog', '20090320162535'
         COLUMN            CELL
         content:          timestamp=1239135042862, value=CouchDB is a doc...
         info:author       timestamp=1239135042755, value=Bob Smith
         info:category     timestamp=1239135042982, value=Persistence
         info:title        timestamp=1239135042623, value=Document-oriented...
         4 row(s) in 0.0140 seconds
  93. 93.
         hbase(main):015:0> get 'blog', '20090407145045', {COLUMN=>'info:author', VERSIONS=>3 }
         timestamp=1239135325074, value=John Doe
         timestamp=1239135324741, value=John
         2 row(s) in 0.0060 seconds
         hbase(main):016:0> scan 'blog', { STARTROW => '20090300', STOPROW => '20090400' }
         ROW               COLUMN+CELL
         20090320162535    column=content:, timestamp=1239135042862, value=CouchDB is...
         20090320162535    column=info:author, timestamp=1239135042755, value=Bob Smith
         20090320162535    column=info:category, timestamp=1239135042982, value=Persistence
         20090320162535    column=info:title, timestamp=1239135042623, value=Document...
         4 row(s) in 0.0230 seconds
  94. 94. Got byte[]?
  95. 95.
         // Create a new table
         HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
         HTableDescriptor descriptor = new HTableDescriptor("mytable");
         descriptor.addFamily(new HColumnDescriptor("family1:"));
         descriptor.addFamily(new HColumnDescriptor("family2:"));
         descriptor.addFamily(new HColumnDescriptor("family3:"));
         admin.createTable(descriptor);
  96. 96.
         // Add some data into 'mytable'
         HTable table = new HTable("mytable");
         BatchUpdate update = new BatchUpdate("row1");
         update.put("family1:aaa", Bytes.toBytes("some value"));
         table.commit(update);
         // Get data back
         RowResult result = table.getRow("row1");
         Cell cell = result.get("family1:aaa");
         // Overwrite earlier value and add more data
         BatchUpdate update2 = new BatchUpdate("row1");
         update2.put("family1:aaa", Bytes.toBytes("some value"));
         update2.put("family2:bbb", Bytes.toBytes("another value"));
         table.commit(update2);
  97. 97. Finding data: get (by row key); scan (by row key ranges, filtering); Secondary indexes allow scanning by different keys (a bit more flexibility, requires more storage)
  98. 98.
         // Scan for people born during January 1960
         HTable table = new HTable("fakenames");
         byte[][] columns = Bytes.toByteArrays(new String[]{ "name:", "gender:" });
         byte[] startRow = Bytes.toBytes("19600101");
         byte[] endRow = Bytes.toBytes("19600201");
         Scanner scanner = table.getScanner(columns, startRow, endRow);
         for (RowResult result : scanner) {
             ...
         }
         scanner.close();
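A row-key range scan works because row keys are kept sorted, so "January 1960" becomes the half-open key range from 19600101 up to, but not including, 19600201. The sketch below shows only that idea with a sorted set of strings; `RowScan` is a hypothetical helper and does not use the HBase client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper: a range scan over sorted row keys, start inclusive, stop exclusive.
class RowScan {
    static List<String> scan(NavigableSet<String> rowKeys, String startRow, String stopRow) {
        return new ArrayList<>(rowKeys.subSet(startRow, true, stopRow, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(
                Arrays.asList("19591231", "19600101", "19600115", "19600201"));
        System.out.println(scan(keys, "19600101", "19600201")); // prints [19600101, 19600115]
    }
}
```

This is also why row-key design matters so much here: a date-prefixed key makes date ranges cheap, while any other lookup needs a secondary index or a full scan.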
  99. 99. Conclusions?
  100. 100. one size does not fit all; lots of alternatives; think about what you really need... (not what's currently "hot")
  101. 101. What do you really need? distributed deployment? fault tolerance? query richness? schema evolution? extreme scalability? ability to enforce relationships? ACID or BASE? key/value storage?
  102. 102. Even more alternatives... XML databases; Semantic Web / RDF / Triplestores; Graph databases; Tuplespaces
  103. 103. References!
  104. 104. General
         Polyglot Persistence http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
         Database Thaw http://martinfowler.com/bliki/DatabaseThaw.html
         Application Design in the context of the shifting storage spectrum http://qconsf.com/sf2008/presentation/Application+Design+in+the+context+of+the+shifting+storage+spectrum
         BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128
         The Challenges of Latency http://www.infoq.com/articles/pritchett-latency
         One size fits all: A concept whose time has come and gone http://www.databasecolumn.com/2007/09/one-size-fits-all.html http://www.cs.brown.edu/~ugur/fits_all.pdf
         The End of an Architectural Era (It's Time for a Complete Rewrite) http://db.cs.yale.edu/vldb07hstore.pdf
         Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495
  105. 105. General
         Semi-Structured Data http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/toc.html
         Latency is Everywhere and it Costs You Sales - How to Crush it http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
         QCon London 2009: Database projects to watch closely http://gojko.net/2009/03/11/qcon-london-2009-database-projects-to-watch-closely
         Memories, Guesses, and Apologies http://blogs.msdn.com/pathelland/archive/2007/05/15/memories-guesses-and-apologies.aspx
         Column-oriented databases http://en.wikipedia.org/wiki/Column-oriented_DBMS
         Entity-Attribute-Value model http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
         Read Consistency: Dumb Databases, Smart Services http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/
         Neo4j graph database http://neo4j.org/
         NoSql web site - "Your Ultimate Guide to the Non-Relational Universe" http://nosql-database.org/
  106. 106. Document-Oriented Databases
         Document-Oriented Database http://en.wikipedia.org/wiki/Document-oriented_database
         Apache CouchDB http://couchdb.apache.org/
         Why CouchDB? http://pmuellr.blogspot.com/2008/01/why-couchdb.html
         Why CouchDB Sucks http://www.eflorenzano.com/blog/post/why-couchdb-sucks/
         Damien Katz CouchDB Interview http://www.infoq.com/news/2008/11/CouchDB-Damien-Katz
         CouchDB: Thinking beyond the RDBMS http://blog.labnotes.org/2007/09/02/couchdb-thinking-beyond-the-rdbms/
         CouchDB Implementation http://horicky.blogspot.com/2008/10/couchdb-implementation.html
         Dare Takes a Look at CouchDB http://intertwingly.net/blog/2007/09/12/Dare-Takes-a-Look-at-CouchDB
  107. 107. Document-Oriented Databases
         CouchDB - A Use Case http://kore-nordmann.de/blog/couchdb_a_use_case.html
         Amazon SimpleDB http://aws.amazon.com/simpledb/ http://en.wikipedia.org/wiki/SimpleDB
         thrudb - Document Oriented Database Services http://code.google.com/p/thrudb/
         thrudb - faster, cheaper than SimpleDB http://www.igvita.com/2007/12/28/thrudb-faster-and-cheaper-than-simpledb/
         QCon 2008 track on Document-Oriented Distributed Databases http://qconsf.com/sf2008/tracks/show_track.jsp?trackOID=170
  108. 108. Distributed K-V Stores
         Amazon's Dynamo http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
         Anti-RDBMS: A list of distributed key-value stores http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/ http://www.reddit.com/r/programming/comments/7qv19/antirdbms_a_list_of_distributed_keyvalue_stores/
         Is the Relational Database Doomed? http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849641
         Project Voldemort http://project-voldemort.com/
         Project Voldemort design (also see excellent list of references from this page) http://project-voldemort.com/design.php
         Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
  109. 109. Bigtable / HBase
         Google Architecture http://highscalability.com/google-architecture
         Bigtable: A Distributed Storage System for Structured Data http://en.wikipedia.org/wiki/BigTable http://labs.google.com/papers/bigtable.html http://labs.google.com/papers/bigtable-osdi06.pdf
         Apache HBase http://hadoop.apache.org/hbase/ http://en.wikipedia.org/wiki/HBase
         Apache Hadoop http://hadoop.apache.org/
         Understanding HBase and BigTable http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
         Matching Impedance: When to use HBase http://blog.rapleaf.com/dev/?p=26
         HBase Leads Discuss Hadoop, BigTable and Distributed Databases http://www.infoq.com/news/2008/04/hbase-interview
         Hadoop/HBase vs RDBMS http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
  110. 110. Questions?
  111. 111. scott.leberknight@nearinfinity.com www.nearinfinity.com/blogs/ twitter: sleberknight
