Polyglot Persistence
Scott Leberknight
Polyglot?
http://memeagora.blogspot.com/2006/12/polyglot-programming.html
Neal Ford
December 2006
Polyglot Programming
http://www.amazon.com/Paradox-Choice-Why-More-Less/dp/0060005688
First web frameworks...
http://java-source.net/open-source/web-frameworks
non-Java web frameworks too!
...then AJAX and JavaScript
InitialContext ic = new InitialContext();
DataSource ds = ic.lookup("java:comp/env/jdbc/cof
Connection con = null;
Statement stmt = null;
ResultSet rs = null;
try {
con = ds.getConnection();
stmt = con.createStatement();
rs = stmt.executeQuery("select name, price from
List<Coffee> coffees = new ArrayList<Coffee>();
while (rs.next()) {
String name = rs.getString("name");
float price = rs.getFloat("price");
coffees.add(new Coffee(name, price));
}
} catch (SQLException sqlex) {
...and now
PERSISTENCE
Why?
Scalability
(on massive scales)
High availability
New types of apps,
e.g. social networking
Fault tolerance
Distributability
Flexibility
(i.e. "schemaless")
Why?
One size does not fit all
Relational
Document
Oriented
Object
Bigtable-ish
A few types of Databases...
Key-value
EAV
(Entity-Attribute-Value)
Structured
Semi-Structured
Unstructured
Types of data
ACID vs. BASE
ACID
Atomic
Consistent
Isolated
Durable
ACID in Action
1st Bank
checking savings
customers
Transfer
$1000 from
1st Bank
checking to
savings
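The all-or-nothing behaviour of the transfer can be sketched in plain Java. This is an in-memory illustration only, not how a relational database implements transactions; the class `AtomicTransferSketch` and its methods are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "bank": a transfer either applies both the debit
// and the credit, or leaves every balance exactly as it was.
class AtomicTransferSketch {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String account, long balance) {
        balances.put(account, balance);
    }

    long balance(String account) {
        return balances.get(account);
    }

    synchronized void transfer(String from, String to, long amount) {
        // Remember the state so a failure can be rolled back.
        Map<String, Long> snapshot = new HashMap<>(balances);
        try {
            debit(from, amount);
            credit(to, amount);
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot); // "rollback"
            throw e;
        }
    }

    private void debit(String account, long amount) {
        Long current = balances.get(account);
        if (current == null || current < amount) {
            throw new IllegalStateException("cannot debit " + account);
        }
        balances.put(account, current - amount);
    }

    private void credit(String account, long amount) {
        Long current = balances.get(account);
        if (current == null) {
            throw new IllegalStateException("no such account: " + account);
        }
        balances.put(account, current + amount);
    }
}
```

If the credit fails after the debit has already been applied, the snapshot is restored, so the $1000 never disappears.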
BASE
Basically Available
Soft State
Eventually Consistent
BASE in Action
1st Bank
checking savings
customers
Transfer $1000 from
1st Bank checking to
Bank of Foo savings
Bank of Foo
account account_type
customer
Schedule, Cost, Quality
(choose any 2)
Brewer's Conjecture
"When designing distributed web services, there
are three properties that are commonly desired:
consistency, availability, and partition tolerance.
It is impossible to achieve all three."
- "Brewer's Conjecture and the Feasibility of Consistent,
Available, Partition-Tolerant Web Services"
Seth Gilbert and Nancy Lynch (MIT)
Consistency
Partition-tolerance
Availability
(choose any 2)
We're living in interesting times...
Explosion of alternative persistence choices
Completely new philosophies on persistence
Whirlwind tour...
Relational
Document-Oriented
Key/Value
Bigtable
Ankle-deep
Relational
Databases
blog blog_entry blog_entry_comment
category
daily_statistics
blog_owner
blog_user
Relations
(tables, joins, integrity)
ACID guarantees
Query using SQL
Strict schema
Difficult to scale,
partition
(e.g. 2-phase commit)
By far most popular persistence choice today
Mismatch with
OO languages
select *
from fakenames f
where f.surname like 'Smi%'
and f.city = 'Richmond'
and f.state = 'VA'
order by f.surname, f.given_name;
Scaling...
Buy a bigger machine
(vertical scaling)
What if there is no bigger machine?
Horizontal scaling:
Functional
Sharding
Users 0
Users 1
Products 0
Orders 0
Orders 1
Orders 2
Functional
Shards
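A minimal sketch of how the two kinds of horizontal scaling combine: the functional area picks a group of databases, and the record id picks one shard inside that group. The modulo rule and the name `ShardRouterSketch` are assumptions made for illustration; real systems often use lookup tables or hashing instead.

```java
// Hypothetical router: functional partitioning by name, sharding by id.
class ShardRouterSketch {

    // Returns a label such as "Orders 2", matching the diagram's naming.
    static String route(String function, long id, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative ids.
        long shard = Math.floorMod(id, (long) shardCount);
        return function + " " + shard;
    }
}
```

For example, with two user shards and three order shards, `route("Users", 7, 2)` selects "Users 1" and `route("Orders", 5, 3)` selects "Orders 2".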
Document-Oriented
Databases
"As opposed to Relational Databases, document-based
databases do not store data in tables with uniform sized
fields for each record. Instead, each record is stored as a
document that has certain characteristics. Any number of
fields of any length can be added to a document. Fields can
also contain multiple pieces of data."
- Wikipedia
(http://en.wikipedia.org/wiki/Document-oriented_database)
Examples:
Lotus Notes
Apache CouchDB
Amazon SimpleDB
(for our purposes anyway)
MongoDB
CouchDB
Architecture
Concepts:
Documents
Views
Schemaless
Distributed
RESTful...
Views
JavaScript as description language
Map/Reduce functions
Add structure to semi-structured data
Independent of actual documents
(created in special Design Documents)
function(doc) {
emit(null, doc);
}
Simplest map function...
// Map function to find Seattlites
function(doc) {
if (doc.State == "WA" && doc.City == "Seattle") {
emit(doc.Number,
{ "GivenName":doc.GivenName, "Surname":doc.Surname });
}
}
// Map function
function(doc) {
emit(doc.State, 1);
}
// Reduce function; aggregates counts
function (key, values) {
return sum(values);
}
Counting people by state...
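To make the two phases concrete, here is a plain-Java simulation of the same count. It is not CouchDB code: documents are modelled as `Map<String, String>`, every document is assumed to have a `State` field, and `CountByStateSketch` is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a view: map emits (key, value) pairs,
// reduce sums the values per key.
class CountByStateSketch {

    // Map step: one (state, 1) pair per document, like emit(doc.State, 1).
    static List<Map.Entry<String, Integer>> map(List<Map<String, String>> docs) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            emitted.add(new SimpleEntry<>(doc.get("State"), 1));
        }
        return emitted;
    }

    // Reduce step: add up the emitted values for each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }
}
```

The result is sorted by key, which loosely mirrors how view results come back ordered by the emitted key.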
Caution:
Views are not meant to be created
dynamically like SQL queries!
To keep view querying fast, the view engine maintains
indexes of its views, and incrementally updates them to
reflect changes in the database. CouchDB’s core
design is largely optimized around the need for
efficient, incremental creation of views and
their indexes.
- http://couchdb.apache.org/docs/overview.html
Amazon SimpleDB
"Amazon SimpleDB is a web service for running queries on
structured data in real time. This service works in close
conjunction with Amazon Simple Storage Service (Amazon S3)
and Amazon Elastic Compute Cloud (Amazon EC2), collectively
providing the ability to store, process and query data sets in
the cloud. These services are designed to make web-scale
computing easier and more cost-effective for developers."
- SimpleDB Developer Guide
(Version 2007-11-07)
"A traditional, clustered relational database requires a sizable
upfront capital outlay, is complex to design, and often requires a
DBA to maintain and administer. Amazon SimpleDB is
dramatically simpler, requiring no schema, automatically
indexing your data and providing a simple API for storage
and access. This approach eliminates the administrative
burden of data modeling, index maintenance, and performance
tuning. Developers gain access to this functionality within
Amazon’s proven computing environment, are able to scale
instantly, and pay only for what they use."
- SimpleDB Developer Guide
(Version 2007-11-07)
Organize data into domains
Domains have items
Items have attributes
Attributes have value(s)
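The domain / item / attribute / value hierarchy maps naturally onto nested collections. The sketch below is an in-memory model only, not the SimpleDB client library; `DomainSketch` and its methods are hypothetical, and the `replace` flag is meant to mirror the boolean passed to `ReplaceableAttribute` later in the deck.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one domain: item name -> attribute name -> values.
class DomainSketch {
    private final Map<String, Map<String, Set<String>>> items = new LinkedHashMap<>();

    // replace=false adds another value; replace=true overwrites existing values.
    void putAttribute(String itemName, String attribute, String value, boolean replace) {
        Set<String> values = items
                .computeIfAbsent(itemName, k -> new LinkedHashMap<>())
                .computeIfAbsent(attribute, k -> new LinkedHashSet<>());
        if (replace) {
            values.clear();
        }
        values.add(value);
    }

    // Returns an empty set when the item or attribute does not exist.
    Set<String> getAttribute(String itemName, String attribute) {
        Map<String, Set<String>> attributes = items.get(itemName);
        if (attributes == null) {
            return Collections.emptySet();
        }
        Set<String> values = attributes.get(attribute);
        return values == null ? Collections.<String>emptySet() : values;
    }
}
```

Because nothing declares which attributes an item must have, two items in the same domain can carry entirely different attribute sets.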
Domain: Fakenames
[Figure: a grid of five items (IDs "1" to "5") with the attributes ID, GivenName, Surname, Birthday, and EmailAddress. An attribute can hold more than one value; for example, the item with Surname "Schuler" has three EmailAddress values: "david.schuler@goofymail.com", "dschuler@yaboo.com", and "schulerd@xyzco.com".]
Items
Attributes
Values
Domain: Amazon
[Figure: a grid of items "Item01" to "Item05" with the attributes Category, SubcategoryID, Name, Color, Size, Length, Format, and Author. Items in one domain need not share the same attributes, and an attribute can hold several values (e.g. the sizes "Small", "Medium", "Large").]
"REST" API
POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=utf-8
User-Agent: Amazon Simple DB Java Library
Host: sdb.amazonaws.com
Content-Length: 232
Action=CreateDomain&
DomainName=Fakenames&
AWSAccessKeyId=[your AWS access key id]&
SignatureVersion=2&
SignatureMethod=HmacSHA256&
Signature=[computed signature]&
Timestamp=2009-03-23T23%3A58%3A55.327Z&
Version=2007-11-07
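The `Signature` parameter in the request above is an HMAC over the request itself. The sketch below follows AWS Signature Version 2 as I understand it: sort the parameters by name, percent-encode them per RFC 3986, sign the verb, host, path, and query string with HmacSHA256, then Base64-encode the result. Treat the details as assumptions to check against the SimpleDB Developer Guide; `SignatureV2Sketch` is a hypothetical helper, not part of Amazon's library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper that computes a Signature Version 2 style value.
class SignatureV2Sketch {

    // Percent-encode so that only A-Z a-z 0-9 - _ . ~ stay unescaped.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // params must not yet contain the Signature parameter itself.
    static String sign(String host, Map<String, String> params, String secretKey) {
        StringBuilder query = new StringBuilder();
        // TreeMap sorts parameter names; for ASCII names this is byte order.
        for (Map.Entry<String, String> p : new TreeMap<>(params).entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(p.getKey())).append('=').append(encode(p.getValue()));
        }
        String toSign = "POST\n" + host.toLowerCase(Locale.ROOT) + "\n/\n" + query;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(
                    secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(toSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that `encode` turns the timestamp `2009-03-23T23:58:55.327Z` into the `2009-03-23T23%3A58%3A55.327Z` form seen in the request body.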
Available APIs:
Amazon: Java, C#, Perl, PHP, VB
3rd party:
Ruby gems: aws-simpledb, aws-sdb, simpledb
Python: polarrose-twisted-amazon
AmazonSimpleDB service =
new AmazonSimpleDBClient(accessKeyId, secretAccessKey);
// Create a new domain
CreateDomainRequest cdReq =
new CreateDomainRequest().withDomainName("Fakenames");
CreateDomainResponse cdResp = service.createDomain(cdReq);
// List all our domains
ListDomainsRequest ldReq = new ListDomainsRequest();
ListDomainsResponse ldResp = service.listDomains(ldReq);
Sample response:
<ListDomainsResponse
xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
<ListDomainsResult>
<DomainName>
Fakenames
</DomainName>
<DomainName>
Movies
</DomainName>
</ListDomainsResult>
<ResponseMetadata>
<RequestId>
8c4d0240-49ea-5d2f-9573-437324cd144c
</RequestId>
<BoxUsage>
0.0000071759
</BoxUsage>
</ResponseMetadata>
</ListDomainsResponse>
// Add an attribute value
ReplaceableAttribute newEmail =
new ReplaceableAttribute("emailAddress",
"bortiz@spammail.com",
false);
PutAttributesRequest request =
new PutAttributesRequest()
.withDomainName("Fakenames")
.withItemName("1")
.withAttribute(newEmail);
PutAttributesResponse response = service.putAttributes(request);
Query API
// Query for Richmonders
String query =
"['city' = 'Richmond'] intersection ['state' = 'VA']";
QueryRequest request = new QueryRequest()
.withDomainName("Fakenames")
.withQueryExpression(query);
QueryResponse response = service.query(request);
// Query for Richmonders, with attributes
String query =
"['city' = 'Richmond'] intersection ['state' = 'VA']";
QueryWithAttributesRequest request =
new QueryWithAttributesRequest()
.withDomainName("Fakenames")
.withQueryExpression(query);
QueryWithAttributesResponse response =
service.queryWithAttributes(request);
SELECT API
// Get a count
String query = "select count(*) from Fakenames";
SelectRequest request =
new SelectRequest().withSelectExpression(query);
SelectResponse response = service.select(request);
// Select Richmonders
String query = "select * from Fakenames"
+ " where city = 'Richmond' intersection state = 'VA'"
+ " intersection surname like 'Smi%'";
SelectRequest request =
new SelectRequest().withSelectExpression(query);
SelectResponse response = service.select(request);
There are Limits!
Query execution time <= 5 sec
Max items in query response = 250
See SimpleDB Developer Guide for more...
Size limits <= 1024 bytes
Attribute limit per item <= 256
(May I have another?)
<QueryResponse
xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
<QueryResult>
<ItemName>
131
</ItemName>
...
<NextToken>
rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9r
racXLnINNqwMACkkAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhc
...
</NextToken>
</QueryResult>
<ResponseMetadata>
...
</ResponseMetadata>
</QueryResponse>
NextToken
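Because each response is capped, a client has to keep asking until no NextToken comes back. The loop below shows that pattern in isolation. `Page` and the `fetchPage` function are hypothetical stand-ins for a real query call; nothing here uses the actual SimpleDB client classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical paging loop: follow NextToken until the service stops returning one.
class NextTokenSketch {

    // One page of results; nextToken is null on the last page.
    static final class Page {
        final List<String> itemNames;
        final String nextToken;

        Page(List<String> itemNames, String nextToken) {
            this.itemNames = itemNames;
            this.nextToken = nextToken;
        }
    }

    // fetchPage receives the previous token (null for the first request).
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = null;
        do {
            Page page = fetchPage.apply(token);
            all.addAll(page.itemNames);
            token = page.nextToken;
        } while (token != null);
        return all;
    }
}
```

The token is opaque to the client: it is passed back unchanged and never parsed.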
Eventually consistent(*)
"Amazon SimpleDB keeps multiple copies of each
domain. When data is written or updated...all copies of
the data are updated. However, it takes time for the
data to propagate to all storage locations. The data will
eventually be consistent, but an immediate read
might not show the change. Consistency is usually
reached within seconds, but a high system load or
network partition might increase this time. Performing
a read after a short period of time should return the
updated data."
- SimpleDB Developer Guide
(Version 2007-11-07)
(*) ConsistentRead
Version 2009-04-15 added consistent read option
"If eventually consistent reads are not
acceptable for your application, use
ConsistentRead. Although this operation
might take longer than a standard read, it
always returns the last updated value."
- SimpleDB Developer Guide
(Version 2009-04-15)
Distributed Key-Value
Stores
value = store.get(key)
store.put(key, value)
store.remove(key)
Basically...
Data stored as
key/value pairs
"A big hashtable"
Replication
Fault tolerance
Data consistency &
versioning
Horizontal
scaling
Amazon Dynamo
(a real-world example)
Distributed key-value
storage system
Used by Amazon core and web services
(e.g. your Amazon shopping cart...)
Massively
scalable
Fault tolerant
Eventually
consistent
The-Project-Which-Must-
Not-Be-Named
(Project Voldemort)
What is it?
"a distributed key-value storage system"
automatic replication across multiple servers
transparent server failure handling
automatic data item versioning
"Voldemort is not a relational database, it does not
attempt to satisfy arbitrary relations while satisfying ACID
properties. Nor is it an object database that attempts to
transparently map object reference graphs. Nor does it
introduce a new abstraction such as document-
orientation. It is basically just a big, distributed,
persistent, fault-tolerant hash table."
http://project-voldemort.com/
designed for horizontal scaling
used at LinkedIn "for certain high-scalability
storage problems where simple functional
partitioning is not sufficient"
"Consistent hashing"
No single server holds all data
Data partitioned across multiple servers
Versioning using "vector clocks"
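Consistent hashing is what lets data be spread over servers without reshuffling everything when one server leaves. The sketch below is a generic ring, not Voldemort's implementation: the MD5-based hash, the number of points per server, and the name `ConsistentHashRingSketch` are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hash ring: each server owns several points on a circle,
// and a key belongs to the first server point at or after the key's hash.
class ConsistentHashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerServer;

    ConsistentHashRingSketch(int pointsPerServer) {
        this.pointsPerServer = pointsPerServer;
    }

    void addServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(hash(key));
        // Wrap around to the first point when the key hashes past the last one.
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest, used only to spread points evenly.
    static long hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long result = 0;
            for (int i = 0; i < 8; i++) {
                result = (result << 8) | (digest[i] & 0xFF);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When a server is removed, only the keys it owned move to a neighbour; keys owned by the remaining servers stay where they were.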
Configuration:
cluster.xml describes cluster
(servers, data partitions)
stores.xml describes data stores
(persistence, routing, key/value data format, replication factor,
preferred reads/writes, required reads/writes)
<cluster>
<name>mycluster</name>
<server>
<id>0</id>
<host>localhost</host>
<http-port>8081</http-port>
<socket-port>6666</socket-port>
<partitions>0, 1, 2, 3</partitions>
</server>
<server>
<id>1</id>
<host>localhost</host>
<http-port>8082</http-port>
<socket-port>6667</socket-port>
<partitions>4, 5, 6, 7</partitions>
</server>
</cluster>
sample cluster.xml
<stores>
<store>
<name>people</name>
<persistence>bdb</persistence>
<routing>client</routing>
<replication-factor>3</replication-factor>
<preferred-reads>3</preferred-reads>
<required-reads>2</required-reads>
<preferred-writes>2</preferred-writes>
<required-writes>1</required-writes>
<key-serializer>
<type>json</type>
<schema-info>"string"</schema-info>
</key-serializer>
<value-serializer>
<type>json</type>
<schema-info>{"GivenName":"string", "Surname":"string"}</schema-info>
</value-serializer>
</store>
</stores>
sample stores.xml
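The replication factor and the required reads/writes interact. A common rule of thumb from Dynamo-style systems is that a read is guaranteed to touch at least one replica holding the latest successful write only when required reads plus required writes exceed the replication factor. The helper below just evaluates that inequality; whether a particular Voldemort version enforces or relies on it is an assumption to verify, and `QuorumSketch` is a hypothetical name.

```java
// Hypothetical check of the quorum overlap rule R + W > N.
class QuorumSketch {

    static boolean readOverlapsWrite(int replicas, int requiredReads, int requiredWrites) {
        if (requiredReads < 1 || requiredWrites < 1
                || requiredReads > replicas || requiredWrites > replicas) {
            throw new IllegalArgumentException("reads and writes must be between 1 and replicas");
        }
        // If R + W > N, any R replicas and any W replicas share at least one node.
        return requiredReads + requiredWrites > replicas;
    }
}
```

With the sample values (replication factor 3, required reads 2, required writes 1) the sum is exactly 3, so the overlap is not guaranteed; raising required writes to 2 would make it so.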
> locate "1"
Node 0
host: localhost
port: 6666
available: yes
last checked: 96171 ms ago
Node 1
host: localhost
port: 6667
available: yes
last checked: 96171 ms ago
Node 2
host: localhost
port: 6668
available: yes
last checked: 96172 ms ago
replication
$ ./voldemort-shell.sh people tcp://localhost:6666
Established connection to people via tcp://localhost:6666
> put "1" { "GivenName":"Bob", "Surname":"Smith" }
> get "1"
version(0:1): {"GivenName":"Bob", "Surname":"Smith", }
> put "1" { "GivenName":"Robert", "Surname":"Smith", }
> get "1"
version(0:2): {"GivenName":"Robert", "Surname":"Smith", }
vector clock
(master node: version)
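A vector clock keeps one counter per node, which is why the shell prints pairs such as `0:1` and `0:2`. Comparing two clocks tells you whether one version supersedes the other or whether they conflict. The sketch below is a generic illustration with hypothetical names (`VectorClockSketch`, `Order`), not Voldemort's own `VectorClock` class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical vector clock: node id -> number of writes coordinated by that node.
class VectorClockSketch {

    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Returns a copy of the clock with the given node's counter bumped by one.
    static Map<Integer, Long> increment(Map<Integer, Long> clock, int nodeId) {
        Map<Integer, Long> next = new HashMap<>(clock);
        next.merge(nodeId, 1L, Long::sum);
        return next;
    }

    // Describes how clock a relates to clock b.
    static Order compare(Map<Integer, Long> a, Map<Integer, Long> b) {
        boolean aAhead = false;
        boolean bAhead = false;
        Set<Integer> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        for (Integer node : nodes) {
            long av = a.getOrDefault(node, 0L);
            long bv = b.getOrDefault(node, 0L);
            if (av > bv) aAhead = true;
            if (bv > av) bAhead = true;
        }
        if (aAhead && bAhead) return Order.CONCURRENT; // conflicting versions
        if (aAhead) return Order.AFTER;
        if (bAhead) return Order.BEFORE;
        return Order.EQUAL;
    }
}
```

Two writes through node 0 give `{0:1}` then `{0:2}`, and the second clearly supersedes the first. A write through node 1 that started from `{0:1}` yields `{0:1, 1:1}`, which is concurrent with `{0:2}` and must be reconciled by the application.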
StoreClientFactory factory =
new SocketStoreClientFactory(numThreads,
numThreads, maxQueuedRequests, maxConnectionsPerNode,
maxTotalConnections, bootstrapUrl);
StoreClient<Integer, Map<String, Object>> client =
factory.getStoreClient("fakenames");
// Update a value
Versioned<Map<String, Object>> versioned = client.get(1);
Map<String, Object> person = versioned.getValue();
person.put("EmailAddress", newEmailAddr);
versioned.setObject(person);
client.put(1, versioned);
Java API example
Bigtable
Google
- Bigtable: A Distributed Storage System
for Structured Data
http://labs.google.com/papers/bigtable.html
"Bigtable is a distributed storage
system for managing structured data
that is designed to scale to a very
large size: petabytes of data across
thousands of commodity
servers. Many projects at Google
store data in Bigtable including web
indexing, Google Earth, and Google
Finance."
"A Bigtable is a sparse, distributed, persistent
multidimensional sorted map"
- Bigtable: A Distributed Storage System
for Structured Data
http://labs.google.com/papers/bigtable.html
?
distributed
sparse
column-oriented
versioned
(row key, column key, timestamp) => value
The map is indexed by a row key,
column key, and a timestamp; each
value in the map is an uninterpreted array
of bytes.
- Bigtable: A Distributed Storage System
for Structured Data
http://labs.google.com/papers/bigtable.html
Key Concepts:
row key => 20090407152657
column family => "name:"
column key => "name:first", "name:last"
timestamp => 1239124584398
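The "(row key, column key, timestamp) => value" mapping can be modelled with nested sorted maps. This is a single-process toy to show the shape of the data model, not how Bigtable or HBase store anything; `SparseSortedMapSketch` and its methods are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: row key -> column key -> timestamp (newest first) -> bytes.
class SparseSortedMapSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(columnKey,
                    k -> new TreeMap<Long, byte[]>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Newest version of a cell, or null if the cell was never written (sparse).
    byte[] getLatest(String rowKey, String columnKey) {
        NavigableMap<String, NavigableMap<Long, byte[]>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = row.get(columnKey);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Rows are kept sorted by key, so a range scan is just a sub-map view.
    NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> scan(
            String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }
}
```

Missing cells take no space, older versions remain reachable by timestamp, and scanning a row-key range is cheap because the keys are sorted. That last point is why the choice of row key matters so much in this model.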
Row Key         Timestamp  Column Family "info:"              Column Family "content:"
20090407145045  t7         "info:summary" "An intro to..."
20090407145045  t6         "info:author" "John Doe"
20090407145045  t5                                            "Google's Bigtable is..."
20090407145045  t4                                            "Google Bigtable is..."
20090407145045  t3         "info:category" "Persistence"
20090407145045  t2         "info:author" "John"
20090407145045  t1         "info:title" "Intro to Bigtable"
20090320162535  t4         "info:category" "Persistence"
20090320162535  t3                                            "CouchDB is..."
20090320162535  t2         "info:author" "Bob Smith"
20090320162535  t1         "info:title" "Doc-oriented..."
Ask for row 20090407145045...
Apache HBase
(an open source Bigtable implementation)
HBase uses a data model very similar to that of Bigtable.
Applications store data rows in labeled tables. A data row
has a sortable row key and an arbitrary number of
columns. The table is stored sparsely, so that rows in
the same table can have widely varying numbers of
columns.
- http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20090320162535', 'info:title', 'Document-oriented
storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20090320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20090320162535', 'content:', 'CouchDB is a
document-oriented...'
0 row(s) in 0.0030 seconds
hbase(main):005:0> put 'blog', '20090320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20090320162535'
COLUMN CELL
content: timestamp=1239135042862, value=CouchDB is a doc...
info:author timestamp=1239135042755, value=Bob Smith
info:category timestamp=1239135042982, value=Persistence
info:title timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell
hbase(main):015:0> get 'blog', '20090407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20090300', STOPROW => '20090400' }
ROW COLUMN+CELL
20090320162535 column=content:, timestamp=1239135042862, value=CouchDB is...
20090320162535 column=info:author, timestamp=1239135042755, value=Bob Smith
20090320162535 column=info:category, timestamp=1239135042982, value=Persistence
20090320162535 column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
Got byte[]?
// Create a new table
HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
HTableDescriptor descriptor = new HTableDescriptor("mytable");
descriptor.addFamily(new HColumnDescriptor("family1:"));
descriptor.addFamily(new HColumnDescriptor("family2:"));
descriptor.addFamily(new HColumnDescriptor("family3:"));
admin.createTable(descriptor);
// Add some data into 'mytable'
HTable table = new HTable("mytable");
BatchUpdate update = new BatchUpdate("row1");
update.put("family1:aaa", Bytes.toBytes("some value"));
table.commit(update);
// Get data back
RowResult result = table.getRow("row1");
Cell cell = result.get("family1:aaa");
// Overwrite earlier value and add more data
BatchUpdate update2 = new BatchUpdate("row1");
update2.put("family1:aaa", Bytes.toBytes("some value"));
update2.put("family2:bbb", Bytes.toBytes("another value"));
table.commit(update2);
Finding data:
get (by row key)
scan (by row key ranges, filtering)
Secondary indexes allow scanning
by different keys
(a bit more flexibility, requires more storage)
// Scan for people born during January 1960
HTable table = new HTable("fakenames");
byte[][] columns =
Bytes.toByteArrays(new String[]{ "name:", "gender:" });
byte[] startRow = Bytes.toBytes("19600101");
byte[] endRow = Bytes.toBytes("19600201");
Scanner scanner = table.getScanner(columns, startRow, endRow);
for (RowResult result: scanner) {
...
}
scanner.close();
Conclusions?
one size does
not fit all
lots of alternatives
think about what you
really need...
(not what's currently "hot")
What do you really need?
distributed
deployment?
fault
tolerance?
query
richness?
schema
evolution?
extreme
scalability?
ability to enforce
relationships?
ACID or BASE?
key/value
storage?
Even more alternatives...
XML databases
Semantic Web / RDF / Triplestores
Graph databases
Tuplespaces
References!
General
Polyglot Persistence
http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
Database Thaw
http://martinfowler.com/bliki/DatabaseThaw.html
Application Design in the context of the shifting storage spectrum
http://qconsf.com/sf2008/presentation/Application+Design+in+the+context+of+the+shifting+storage+spectrum
BASE: An Acid Alternative
http://queue.acm.org/detail.cfm?id=1394128
The Challenges of Latency
http://www.infoq.com/articles/pritchett-latency
One size fits all: A concept whose time has come and gone
http://www.databasecolumn.com/2007/09/one-size-fits-all.html
http://www.cs.brown.edu/~ugur/fits_all.pdf
The End of an Architectural Era (It's Time for a Complete Rewrite)
http://db.cs.yale.edu/vldb07hstore.pdf
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495
Semi-Structured Data
http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/toc.html
Latency is Everywhere and it Costs You Sales - How to Crush it
http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
QCon London 2009: Database projects to watch closely
http://gojko.net/2009/03/11/qcon-london-2009-database-projects-to-watch-closely
Memories, Guesses, and Apologies
http://blogs.msdn.com/pathelland/archive/2007/05/15/memories-guesses-and-apologies.aspx
Column-oriented databases
http://en.wikipedia.org/wiki/Column-oriented_DBMS
Entity-Attribute-Value model
http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
Read Consistency: Dumb Databases, Smart Services
http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/
Neo4j graph database
http://neo4j.org/
NoSql web site - "Your Ultimate Guide to the Non-Relational Universe"
http://nosql-database.org/
Document-Oriented Databases
Document-Oriented Database
http://en.wikipedia.org/wiki/Document-oriented_database
Apache CouchDB
http://couchdb.apache.org/
Why CouchDB?
http://pmuellr.blogspot.com/2008/01/why-couchdb.html
Why CouchDB Sucks
http://www.eflorenzano.com/blog/post/why-couchdb-sucks/
Damien Katz CouchDB Interview
http://www.infoq.com/news/2008/11/CouchDB-Damien-Katz
CouchDB: Thinking beyond the RDBMS
http://blog.labnotes.org/2007/09/02/couchdb-thinking-beyond-the-rdbms/
CouchDB Implementation
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
Dare Takes a Look at CouchDB
http://intertwingly.net/blog/2007/09/12/Dare-Takes-a-Look-at-CouchDB
CouchDB - A Use Case
http://kore-nordmann.de/blog/couchdb_a_use_case.html
Amazon SimpleDB
http://aws.amazon.com/simpledb/
http://en.wikipedia.org/wiki/SimpleDB
thrudb - Document Oriented Database Services
http://code.google.com/p/thrudb/
thrudb - faster, cheaper than SimpleDB
http://www.igvita.com/2007/12/28/thrudb-faster-and-cheaper-than-simpledb/
QCon 2008 track on Document-Oriented Distributed Databases
http://qconsf.com/sf2008/tracks/show_track.jsp?trackOID=170
Distributed K-V Stores
Amazon's Dynamo
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Anti-RDBMS: A list of distributed key-value stores
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
http://www.reddit.com/r/programming/comments/7qv19/antirdbms_a_list_of_distributed_keyvalue_stores/
Is the Relational Database Doomed?
http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849641
Project Voldemort
http://project-voldemort.com/
Project Voldemort design (also see excellent list of references from this page)
http://project-voldemort.com/design.php
Consistent Hashing
http://en.wikipedia.org/wiki/Consistent_hashing
Bigtable / HBase
Google Architecture
http://highscalability.com/google-architecture
Bigtable: A Distributed Storage System for Structured Data
http://en.wikipedia.org/wiki/BigTable
http://labs.google.com/papers/bigtable.html
http://labs.google.com/papers/bigtable-osdi06.pdf
Apache HBase
http://hadoop.apache.org/hbase/
http://en.wikipedia.org/wiki/HBase
Apache Hadoop
http://hadoop.apache.org/
Understanding HBase and BigTable
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
Matching Impedance: When to use HBase
http://blog.rapleaf.com/dev/?p=26
HBase Leads Discuss Hadoop, BigTable and Distributed Databases
http://www.infoq.com/news/2008/04/hbase-interview
Hadoop/HBase vs RDBMS
http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
Questions?
scott.leberknight@nearinfinity.com
www.nearinfinity.com/blogs/
twitter: sleberknight

More Related Content

What's hot

Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSigmoid
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and DataframeNamgee Lee
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryWilliam Candillon
 
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File Server
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File ServerUKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File Server
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File ServerMarco Gralike
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesJim Hatcher
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with JuypterLi Ming Tsai
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1Marco Gralike
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaScott Hernandez
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Marco Gralike
 
Miracle Open World 2011 - XML Index Strategies
Miracle Open World 2011  -  XML Index StrategiesMiracle Open World 2011  -  XML Index Strategies
Miracle Open World 2011 - XML Index StrategiesMarco Gralike
 
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...Marco Gralike
 

What's hot (20)

Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
 
Catmandu / LibreCat Project
Catmandu / LibreCat ProjectCatmandu / LibreCat Project
Catmandu / LibreCat Project
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
 
Sql cheat sheet
Sql cheat sheetSql cheat sheet
Sql cheat sheet
 
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File Server
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File ServerUKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File Server
UKOUG 2011 - Drag, Drop and other Stuff. Using your Database as a File Server
 
Updates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI IndexesUpdates from Cassandra Summit 2016 & SASI Indexes
Updates from Cassandra Summit 2016 & SASI Indexes
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with Juypter
 
MongoDB
MongoDBMongoDB
MongoDB
 
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
OPP2010 (Brussels) - Programming with XML in PL/SQL - Part 1
 
Spring data requery
Spring data requerySpring data requery
Spring data requery
 
Requery overview
Requery overviewRequery overview
Requery overview
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
 
Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2Starting with JSON Path Expressions in Oracle 12.1.0.2
Starting with JSON Path Expressions in Oracle 12.1.0.2
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
XQuery in the Cloud
XQuery in the CloudXQuery in the Cloud
XQuery in the Cloud
 
Miracle Open World 2011 - XML Index Strategies
Miracle Open World 2011  -  XML Index StrategiesMiracle Open World 2011  -  XML Index Strategies
Miracle Open World 2011 - XML Index Strategies
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
 
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
UKOUG 2010 (Birmingham) - XML Indexing strategies - Choosing the Right Index ...
 

Viewers also liked (15)

HBase Lightning Talk
HBase Lightning TalkHBase Lightning Talk
HBase Lightning Talk
 
Rack
RackRack
Rack
 
CoffeeScript
CoffeeScriptCoffeeScript
CoffeeScript
 
jps & jvmtop
jps & jvmtopjps & jvmtop
jps & jvmtop
 
iOS
iOSiOS
iOS
 
httpie
httpiehttpie
httpie
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
RESTful Web Services with Jersey
RESTful Web Services with JerseyRESTful Web Services with Jersey
RESTful Web Services with Jersey
 
Hadoop
HadoopHadoop
Hadoop
 
Java 8 Lambda Expressions
Java 8 Lambda ExpressionsJava 8 Lambda Expressions
Java 8 Lambda Expressions
 
Dropwizard
DropwizardDropwizard
Dropwizard
 
Google Guava
Google GuavaGoogle Guava
Google Guava
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
Awesomizing your Squarespace Website
Awesomizing your Squarespace WebsiteAwesomizing your Squarespace Website
Awesomizing your Squarespace Website
 
AWS Lambda
AWS LambdaAWS Lambda
AWS Lambda
 

Polyglot Persistence

  • 8. ...then AJAX and JavaScript
  • 9. InitialContext ic = new InitialContext();
       DataSource ds = ic.lookup("java:comp/env/jdbc/cof
       Connection con = null;
       Statement stmt = null;
       ResultSet rs = null;
       try {
           con = ds.getConnection();
           stmt = con.createStatement();
           rs = stmt.executeQuery("select name, price from
           List<Coffee> coffees = new ArrayList<Coffee>();
           while (rs.next()) {
               String name = rs.getString("name");
               float price = rs.getFloat("price");
               coffees.add(new Coffee(name, price));
           }
       } catch (SQLException sqlex) {
       ...and now PERSISTENCE
  • 10. Why? Scalability (on massive scales); high availability; new types of apps, e.g. social networking; fault tolerance; distributability; flexibility (i.e. "schemaless")
  • 11. Why? One size does not fit all
  • 12. A few types of databases: Relational, Document-Oriented, Object, Bigtable-ish, Key-value, EAV (Entity-Attribute-Value)
  • 16. ACID in Action: transfer $1000 from 1st Bank checking to savings (1st Bank tables: checking, savings, customers)
  • 18. BASE in Action: transfer $1000 from 1st Bank checking to Bank of Foo savings (1st Bank tables: checking, savings, customers; Bank of Foo tables: account, account_type, customer)
  • 21. "When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three." - "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" Seth Gilbert and Nancy Lynch (MIT)
  • 23. We're living in interesting times... Explosion of alternative persistence choices; completely new philosophies on persistence
  • 27. Relations (tables, joins, integrity); ACID guarantees; query using SQL; strict schema; difficult to scale, partition (e.g. 2-phase commit); by far most popular persistence choice today; mismatch with OO languages
  • 28. select * from fakenames f
        where f.surname like 'Smi%'
          and f.city = 'Richmond'
          and f.state = 'VA'
        order by f.surname, f.given_name;
  • 29. Scaling... Buy a bigger machine (vertical scaling)
  • 30. What if there is no bigger machine? Horizontal scaling: Functional Sharding
  • 31. Functional Shards: Users 0, Users 1, Products 0, Orders 0, Orders 1, Orders 2
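A minimal sketch of how an application might route a record to one of the functional shards listed on this slide. The class name `ShardRouter`, the modulo-on-id rule, and the per-area shard counts are illustrative assumptions; the deck does not say how rows are assigned within a functional area.

```java
import java.util.Map;

// Hypothetical router: picks a shard name such as "Orders 1" for a record.
final class ShardRouter {
    // Assumed shard counts per functional area, mirroring the slide:
    // Users 0-1, Products 0, Orders 0-2.
    private static final Map<String, Integer> SHARD_COUNTS =
            Map.of("Users", 2, "Products", 1, "Orders", 3);

    // Assumption: records are spread within an area by id modulo shard count.
    static String shardFor(String area, long id) {
        Integer count = SHARD_COUNTS.get(area);
        if (count == null) {
            throw new IllegalArgumentException("Unknown functional area: " + area);
        }
        long index = Math.floorMod(id, (long) count);
        return area + " " + index;
    }

    public static void main(String[] args) {
        System.out.println(shardFor("Users", 7));
        System.out.println(shardFor("Orders", 7));
    }
}
```

The first level of routing (by functional area) is what the slide calls functional sharding; the second level (by id) is only one of several possible ways to split an area further.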
  • 33. "As opposed to Relational Databases, document-based databases do not store data in tables with uniform sized fields for each record. Instead, each record is stored as a document that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data." - Wikipedia (http://en.wikipedia.org/wiki/Document-oriented_database)
  • 34. Examples: Lotus Notes, Apache CouchDB, Amazon SimpleDB (for our purposes anyway), MongoDB
  • 41. Views: JavaScript as description language; Map/Reduce functions; add structure to semi-structured data; independent of actual documents (created in special Design Documents)
  • 43. // Map function to find Seattleites
        function(doc) {
          if (doc.State == "WA" && doc.City == "Seattle") {
            emit(doc.Number, { "GivenName": doc.GivenName, "Surname": doc.Surname });
          }
        }
  • 44. Counting people by state...
        // Map function
        function(doc) {
          emit(doc.State, 1);
        }
        // Reduce function; aggregates counts
        function (key, values) {
          return sum(values);
        }
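To make the map/reduce idea concrete outside CouchDB, here is a small in-memory sketch of the same "count people by state" computation. It is not CouchDB's view engine; the class name `CountByState` and the representation of documents as `Map<String, String>` are assumptions made for illustration, and every document is assumed to have a `State` field.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountByState {
    // "Map" step: each document contributes the pair (State, 1).
    // "Reduce" step: the 1s for each key are summed.
    static Map<String, Integer> countByState(List<Map<String, String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            counts.merge(doc.get("State"), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("State", "VA", "City", "Richmond"),
                Map.of("State", "WA", "City", "Seattle"),
                Map.of("State", "VA", "City", "Reston"));
        System.out.println(countByState(docs));
    }
}
```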
  • 45. Caution: Views are not meant to be created dynamically like SQL queries! "To keep view querying fast, the view engine maintains indexes of its views, and incrementally updates them to reflect changes in the database. CouchDB's core design is largely optimized around the need for efficient, incremental creation of views and their indexes." - http://couchdb.apache.org/docs/overview.html
  • 47. "Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers." - SimpleDB Developer Guide (Version 2007-11-07)
  • 48. "A traditional, clustered relational database requires a sizable upfront capital outlay, is complex to design, and often requires a DBA to maintain and administer. Amazon SimpleDB is dramatically simpler, requiring no schema, automatically indexing your data and providing a simple API for storage and access. This approach eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use." - SimpleDB Developer Guide (Version 2007-11-07)
  • 49. Organize data into domains; domains have items; items have attributes; attributes have value(s)
  • 51. Domain: Amazon [diagram: items "Item01" through "Item05" with attributes ID, Category, Subcategory, Name, Color, Size, Length, Format, Author; sample values include "Clothes", "Entertainment", "Books", "DVDs", "Mens", "Womens", "Jeans", "Blouse", "Pulp Fiction", "Sound of Music", "Slaughterhouse Five", "Kurt Vonnegut", "Full Screen", "Widescreen", "154 min", "168 min (special edition)", "174 min", "30x30", "32x30", "34x30", "Small", "Medium", "Large", and several colors]
  • 52. "REST" API
        POST / HTTP/1.1
        Content-Type: application/x-www-form-urlencoded; charset=utf-8
        User-Agent: Amazon Simple DB Java Library
        Host: sdb.amazonaws.com
        Content-Length: 232
        Action=CreateDomain&
        DomainName=Fakenames&
        AWSAccessKeyId=[your AWS access key id]&
        SignatureVersion=2&
        SignatureMethod=HmacSHA256&
        Signature=[computed signature]&
        Timestamp=2009-03-23T23%3A58%3A55.327Z&
        Version=2007-11-07
  • 53. Available APIs. Amazon: Java, C#, Perl, PHP, VB. 3rd party: Ruby gems (aws-simpledb, aws-sdb, simpledb); Python (polarrose-twisted-amazon)
  • 54. AmazonSimpleDB service = new AmazonSimpleDBClient(accessKeyId, secretAccessKey);
        // Create a new domain
        CreateDomainRequest cdReq = new CreateDomainRequest().withDomainName("Fakenames");
        CreateDomainResponse cdResp = service.createDomain(cdReq);
        // List all our domains
        ListDomainsRequest ldReq = new ListDomainsRequest();
        ListDomainsResponse ldResp = service.listDomains(ldReq);
  • 56. // Add an attribute value
        ReplaceableAttribute newEmail =
            new ReplaceableAttribute("emailAddress", "bortiz@spammail.com", false);
        PutAttributesRequest request = new PutAttributesRequest()
            .withDomainName("Fakenames")
            .withItemName("1")
            .withAttribute(newEmail);
        PutAttributesResponse response = service.putAttributes(request);
  • 58. // Query for Richmonders
        String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
        QueryRequest request = new QueryRequest()
            .withDomainName("Fakenames")
            .withQueryExpression(query);
        QueryResponse response = service.query(request);
  • 59. // Query for Richmonders, with attributes
        String query = "['city' = 'Richmond'] intersection ['state' = 'VA']";
        QueryWithAttributesRequest request = new QueryWithAttributesRequest()
            .withDomainName("Fakenames")
            .withQueryExpression(query);
        QueryWithAttributesResponse response = service.queryWithAttributes(request);
  • 61. // Get a count
        String query = "select count(*) from Fakenames";
        SelectRequest request = new SelectRequest().withSelectExpression(query);
        SelectResponse response = service.select(request);
  • 62. // Select Richmonders
        String query = "select * from Fakenames"
            + " where city = 'Richmond' intersection state = 'VA'"
            + " intersection surname like 'Smi%'";
        SelectRequest request = new SelectRequest().withSelectExpression(query);
        SelectResponse response = service.select(request);
  • 63. There are Limits! Query execution time <= 5 sec; max items in query response = 250; size limits <= 1024 bytes; attribute limit per item <= 256. See SimpleDB Developer Guide for more...
  • 64. NextToken (May I have another?)
        <QueryResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07/">
          <QueryResult>
            <ItemName> 131 </ItemName>
            ...
            <NextToken>
              rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9r
              racXLnINNqwMACkkAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhc
              ...
            </NextToken>
          </QueryResult>
          <ResponseMetadata> ... </ResponseMetadata>
        </QueryResponse>
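A response that carries a NextToken is only one page of the result; the client sends the token back to fetch the next page and stops when no token is returned. The loop below sketches that pattern against an in-memory stand-in. `Page`, `query`, and `fetchAll` are hypothetical names, not part of the SimpleDB library, and the token here is just a start index rather than an opaque server-issued string.

```java
import java.util.ArrayList;
import java.util.List;

final class NextTokenPaging {
    // Hypothetical page type: a batch of item names plus a token (null on the last page).
    static final class Page {
        final List<String> itemNames;
        final String nextToken;

        Page(List<String> itemNames, String nextToken) {
            this.itemNames = itemNames;
            this.nextToken = nextToken;
        }
    }

    // Stand-in for a remote query call; the token encodes the next start index.
    static Page query(List<String> allItems, int pageSize, String token) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        int start = (token == null) ? 0 : Integer.parseInt(token);
        int end = Math.min(start + pageSize, allItems.size());
        String next = (end < allItems.size()) ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(allItems.subList(start, end)), next);
    }

    // Keep requesting pages until the service stops returning a token.
    static List<String> fetchAll(List<String> allItems, int pageSize) {
        List<String> result = new ArrayList<>();
        String token = null;
        do {
            Page page = query(allItems, pageSize, token);
            result.addAll(page.itemNames);
            token = page.nextToken;
        } while (token != null);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("131", "132", "133", "134", "135"), 2));
    }
}
```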
  • 65. Eventually consistent(*) "Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated...all copies of the data are updated. However, it takes time for the data to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change. Consistency is usually reached within seconds, but a high system load or network partition might increase this time. Performing a read after a short period of time should return the updated data." - SimpleDB Developer Guide (Version 2007-11-07)
  • 66. (*) ConsistentRead: Version 2009-04-15 added consistent read option. "If eventually consistent reads are not acceptable for your application, use ConsistentRead. Although this operation might take longer than a standard read, it always returns the last updated value." - SimpleDB Developer Guide (Version 2009-04-15)
  • 68. Basically...
        value = store.get(key)
        store.put(key, value)
        store.remove(key)
  • 69. Data stored as key/value pairs; "a big hashtable"; replication; fault tolerance; data consistency & versioning; horizontal scaling
  • 71. Distributed key-value storage system; used by Amazon core and web services (e.g. your Amazon shopping cart...); massively scalable; fault tolerant; eventually consistent
  • 73. What is it? "a distributed key-value storage system"; automatic replication across multiple servers; transparent server failure handling; automatic data item versioning
  • 74. "Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table." - http://project-voldemort.com/
  • 75. Designed for horizontal scaling; used at LinkedIn "for certain high-scalability storage problems where simple functional partitioning is not sufficient"
  • 76. "Consistent hashing"; no single server holds all data; data partitioned across multiple servers; versioning using "vector clocks"
  • 77. Configuration: cluster.xml describes cluster (servers, data partitions); stores.xml describes data stores (persistence, routing, key/value data format, replication factor, preferred reads/writes, required reads/writes)
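Consistent hashing, named on slide 76, can be sketched as a ring of hash points where each key is owned by the first node point at or after the key's hash. This is a generic textbook-style illustration, not Voldemort's actual partitioning code; the class name `HashRing`, the use of CRC32, and the `pointsPerNode` parameter are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class HashRing {
    // Ring position -> node name. Each node is placed at several positions.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> nodes, int pointsPerNode) {
        for (String node : nodes) {
            for (int i = 0; i < pointsPerNode; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // CRC32 is used only because it is in the JDK; any stable hash would do.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Walk clockwise from the key's hash to the next node point, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Long point = ring.ceilingKey(hash(key));
        return ring.get(point != null ? point : ring.firstKey());
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(List.of("node0", "node1", "node2"), 100);
        System.out.println("key \"1\" lives on " + ring.nodeFor("1"));
    }
}
```

The property that makes this attractive for horizontal scaling is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes.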
  • 78. sample cluster.xml
        <cluster>
          <name>mycluster</name>
          <server>
            <id>0</id>
            <host>localhost</host>
            <http-port>8081</http-port>
            <socket-port>6666</socket-port>
            <partitions>0, 1, 2, 3</partitions>
          </server>
          <server>
            <id>1</id>
            <host>localhost</host>
            <http-port>8082</http-port>
            <socket-port>6667</socket-port>
            <partitions>4, 5, 6, 7</partitions>
          </server>
        </cluster>
  • 80. replication
        > locate "1"
        Node 0  host: localhost  port: 6666  available: yes  last checked: 96171 ms ago
        Node 1  host: localhost  port: 6667  available: yes  last checked: 96171 ms ago
        Node 2  host: localhost  port: 6668  available: yes  last checked: 96172 ms ago
  • 81. vector clock (master node: version)
        $ ./voldemort-shell.sh people tcp://localhost:6666
        Established connection to people via tcp://localhost:6666
        > put "1" { "GivenName":"Bob", "Surname":"Smith" }
        > get "1"
        version(0:1): {"GivenName":"Bob", "Surname":"Smith", }
        > put "1" { "GivenName":"Robert", "Surname":"Smith", }
        > get "1"
        version(0:2): {"GivenName":"Robert", "Surname":"Smith", }
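The `version(0:1)` and `version(0:2)` labels in the shell session are vector clocks: one counter per node that has written the value. A minimal sketch of how two such clocks can be compared follows. It is a generic illustration rather than Voldemort's `VectorClock` class; `VectorClocks`, `Order`, and the `Map<Integer, Long>` representation are assumptions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class VectorClocks {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Each clock maps node id -> counter; a missing node counts as 0.
    static Order compare(Map<Integer, Long> a, Map<Integer, Long> b) {
        Set<Integer> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        boolean aAhead = false;
        boolean bAhead = false;
        for (Integer node : nodes) {
            long va = a.getOrDefault(node, 0L);
            long vb = b.getOrDefault(node, 0L);
            if (va > vb) aAhead = true;
            if (vb > va) bAhead = true;
        }
        if (aAhead && bAhead) return Order.CONCURRENT; // conflicting writes
        if (aAhead) return Order.AFTER;
        if (bAhead) return Order.BEFORE;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        // The two versions from the shell session: (0:1) then (0:2).
        System.out.println(compare(Map.of(0, 1L), Map.of(0, 2L)));
    }
}
```

When two clocks are concurrent, neither write happened after the other, so the store cannot pick a winner on its own and the conflict has to be resolved by the application or a configured resolver.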
  • 82. Java API example
        StoreClientFactory factory = new SocketStoreClientFactory(numThreads, numThreads,
            maxQueuedRequests, maxConnectionsPerNode, maxTotalConnections, bootstrapUrl);
        StoreClient<Integer, Map<String, Object>> client = factory.getStoreClient("fakenames");
        // Update a value: read the current version, modify it, write it back
        Versioned<Map<String, Object>> versioned = client.get(1);
        Map<String, Object> person = versioned.getValue();
        person.put("EmailAddress", newEmailAddr);
        versioned.setObject(person);
        client.put(1, versioned);
  • 84. "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance." - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html
  • 85. "A Bigtable is a sparse, distributed, persistent multidimensional sorted map" - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html
  • 86. ?
  • 88. (row key, column key, timestamp) => value. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. - Bigtable: A Distributed Storage System for Structured Data, http://labs.google.com/papers/bigtable.html
  • 89. Key Concepts: row key => 20090407152657; column family => "name:"; column key => "name:first", "name:last"; timestamp => 1239124584398
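The "(row key, column key, timestamp) => value" model can be mimicked in memory with nested sorted maps, which also shows why lookups by row key and scans over row-key ranges are the natural operations. This is an illustrative sketch only, not how Bigtable or HBase store data; `SparseTable` and its method names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class SparseTable {
    // row key -> column key -> timestamp (newest first) -> uninterpreted bytes
    private final TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    byte[] getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) return null;
        TreeMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Row keys in [startRow, stopRow), in sorted order.
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SparseTable blog = new SparseTable();
        blog.put("20090407145045", "info:author", 2L, "John".getBytes(StandardCharsets.UTF_8));
        blog.put("20090407145045", "info:author", 6L, "John Doe".getBytes(StandardCharsets.UTF_8));
        blog.put("20090320162535", "info:author", 2L, "Bob Smith".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(blog.getLatest("20090407145045", "info:author"), StandardCharsets.UTF_8));
        System.out.println(blog.scanRowKeys("20090300", "20090400"));
    }
}
```

Rows only hold the columns that were actually written, which is the "sparse" part of the definition; older timestamps stay available as previous versions of a cell.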
  • 90. Row Key | Timestamp | Column Family "info:" | Column Family "content:"
        20090407145045 | t7 | "info:summary" "An intro to..." |
        20090407145045 | t6 | "info:author" "John Doe" |
        20090407145045 | t5 | | "Google's Bigtable is..."
        20090407145045 | t4 | | "Google Bigtable is..."
        20090407145045 | t3 | "info:category" "Persistence" |
        20090407145045 | t2 | "info:author" "John" |
        20090407145045 | t1 | "info:title" "Intro to Bigtable" |
        20090320162535 | t4 | "info:category" "Persistence" |
        20090320162535 | t3 | | "CouchDB is..."
        20090320162535 | t2 | "info:author" "Bob Smith" |
        20090320162535 | t1 | "info:title" "Doc-oriented..." |
  • 91. Ask for row 20090407145045... (same table as slide 90)
  • 92. Apache HBase (an open source Bigtable implementation)
  • 93. HBase uses a data model very similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns. - http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
  • 94. HBase Shell
        hbase(main):001:0> create 'blog', 'info', 'content'
        0 row(s) in 4.3640 seconds
        hbase(main):002:0> put 'blog', '20090320162535', 'info:title', 'Document-oriented storage using CouchDB'
        0 row(s) in 0.0330 seconds
        hbase(main):003:0> put 'blog', '20090320162535', 'info:author', 'Bob Smith'
        0 row(s) in 0.0030 seconds
        hbase(main):004:0> put 'blog', '20090320162535', 'content:', 'CouchDB is a document-oriented...'
        0 row(s) in 0.0030 seconds
        hbase(main):005:0> put 'blog', '20090320162535', 'info:category', 'Persistence'
        0 row(s) in 0.0030 seconds
        hbase(main):006:0> get 'blog', '20090320162535'
        COLUMN          CELL
        content:        timestamp=1239135042862, value=CouchDB is a doc...
        info:author     timestamp=1239135042755, value=Bob Smith
        info:category   timestamp=1239135042982, value=Persistence
        info:title      timestamp=1239135042623, value=Document-oriented...
        4 row(s) in 0.0140 seconds
  • 95. hbase(main):015:0> get 'blog', '20090407145045', {COLUMN=>'info:author', VERSIONS=>3 }
        timestamp=1239135325074, value=John Doe
        timestamp=1239135324741, value=John
        2 row(s) in 0.0060 seconds
        hbase(main):016:0> scan 'blog', { STARTROW => '20090300', STOPROW => '20090400' }
        ROW             COLUMN+CELL
        20090320162535  column=content:, timestamp=1239135042862, value=CouchDB is...
        20090320162535  column=info:author, timestamp=1239135042755, value=Bob Smith
        20090320162535  column=info:category, timestamp=1239135042982, value=Persistence
        20090320162535  column=info:title, timestamp=1239135042623, value=Document...
        4 row(s) in 0.0230 seconds
  • 97. // Create a new table
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        HTableDescriptor descriptor = new HTableDescriptor("mytable");
        descriptor.addFamily(new HColumnDescriptor("family1:"));
        descriptor.addFamily(new HColumnDescriptor("family2:"));
        descriptor.addFamily(new HColumnDescriptor("family3:"));
        admin.createTable(descriptor);
  • 98. // Add some data into 'mytable'
        HTable table = new HTable("mytable");
        BatchUpdate update = new BatchUpdate("row1");
        update.put("family1:aaa", Bytes.toBytes("some value"));
        table.commit(update);

        // Get data back
        RowResult result = table.getRow("row1");
        Cell cell = result.get("family1:aaa");

        // Overwrite earlier value and add more data
        BatchUpdate update2 = new BatchUpdate("row1");
        update2.put("family1:aaa", Bytes.toBytes("some value"));
        update2.put("family2:bbb", Bytes.toBytes("another value"));
        table.commit(update2);
  • 99. Finding data:
        - get (by row key)
        - scan (by row key ranges, filtering)
        - Secondary indexes allow scanning by different keys (a bit more flexibility, requires more storage)
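The trade-off in the last point can be made concrete with a small sketch: a secondary index is a second sorted map, from some column value back to row keys, that has to be written on every put. The example below is plain Java and only illustrates the idea; it does not show how HBase implements secondary indexes. The class `SecondaryIndexSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a secondary index kept next to the primary table.
class SecondaryIndexSketch {
    // Primary storage: row key -> columns.
    private final TreeMap<String, Map<String, String>> primary = new TreeMap<>();
    // Secondary index: author -> row keys. This is the extra storage the index costs.
    private final TreeMap<String, TreeSet<String>> byAuthor = new TreeMap<>();

    void put(String rowKey, String author, String title) {
        Map<String, String> previous = primary.get(rowKey);
        if (previous != null) {
            // Drop the stale index entry so the index never points at an old author.
            String oldAuthor = previous.get("info:author");
            TreeSet<String> keys = byAuthor.get(oldAuthor);
            if (keys != null) {
                keys.remove(rowKey);
                if (keys.isEmpty()) {
                    byAuthor.remove(oldAuthor);
                }
            }
        }
        Map<String, String> columns = new HashMap<>();
        columns.put("info:author", author);
        columns.put("info:title", title);
        primary.put(rowKey, columns);
        byAuthor.computeIfAbsent(author, a -> new TreeSet<String>()).add(rowKey);
    }

    // Looks rows up by author instead of by row key, in row key order.
    List<String> titlesByAuthor(String author) {
        List<String> titles = new ArrayList<>();
        for (String rowKey : byAuthor.getOrDefault(author, new TreeSet<String>())) {
            titles.add(primary.get(rowKey).get("info:title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch blog = new SecondaryIndexSketch();
        blog.put("20090320162535", "Bob Smith", "Doc-oriented...");
        blog.put("20090407145045", "John Doe", "Intro to Bigtable");
        System.out.println(blog.titlesByAuthor("John Doe"));
    }
}
```

Every write now touches two structures, which is where the extra storage and the extra bookkeeping come from.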
  • 100. // Scan for people born during January 1960
        HTable table = new HTable("fakenames");
        byte[][] columns = Bytes.toByteArrays(new String[]{ "name:", "gender:" });
        byte[] startRow = Bytes.toBytes("19600101");
        byte[] endRow = Bytes.toBytes("19600201");
        Scanner scanner = table.getScanner(columns, startRow, endRow);
        for (RowResult result: scanner) {
            ...
        }
        scanner.close();
  • 102. - one size does not fit all
        - lots of alternatives
        - think about what you really need... (not what's currently "hot")
  • 103. What do you really need?
        - distributed deployment?
        - fault tolerance?
        - query richness?
        - schema evolution?
        - extreme scalability?
        - ability to enforce relationships?
        - ACID or BASE?
        - key/value storage?
  • 104. Even more alternatives...
        - XML databases
        - Semantic Web / RDF / Triplestores
        - Graph databases
        - Tuplespaces
  • 106. General
        - Polyglot Persistence: http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
        - Database Thaw: http://martinfowler.com/bliki/DatabaseThaw.html
        - Application Design in the context of the shifting storage spectrum: http://qconsf.com/sf2008/presentation/Application+Design+in+the+context+of+the+shifting+storage+spectrum
        - BASE: An Acid Alternative: http://queue.acm.org/detail.cfm?id=1394128
        - The Challenges of Latency: http://www.infoq.com/articles/pritchett-latency
        - One size fits all: A concept whose time has come and gone: http://www.databasecolumn.com/2007/09/one-size-fits-all.html http://www.cs.brown.edu/~ugur/fits_all.pdf
        - The End of an Architectural Era (It's Time for a Complete Rewrite): http://db.cs.yale.edu/vldb07hstore.pdf
        - Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495
  • 107. General
        - Semi-Structured Data: http://www.dcs.bbk.ac.uk/~ptw/teaching/ssd/toc.html
        - Latency is Everywhere and it Costs You Sales - How to Crush it: http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
        - QCon London 2009: Database projects to watch closely: http://gojko.net/2009/03/11/qcon-london-2009-database-projects-to-watch-closely
        - Memories, Guesses, and Apologies: http://blogs.msdn.com/pathelland/archive/2007/05/15/memories-guesses-and-apologies.aspx
        - Column-oriented databases: http://en.wikipedia.org/wiki/Column-oriented_DBMS
        - Entity-Attribute-Value model: http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
        - Read Consistency: Dumb Databases, Smart Services: http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/
        - Neo4j graph database: http://neo4j.org/
        - NoSql web site - "Your Ultimate Guide to the Non-Relational Universe": http://nosql-database.org/
  • 108. Document-Oriented Databases
        - Document-Oriented Database: http://en.wikipedia.org/wiki/Document-oriented_database
        - Apache CouchDB: http://couchdb.apache.org/
        - Why CouchDB?: http://pmuellr.blogspot.com/2008/01/why-couchdb.html
        - Why CouchDB Sucks: http://www.eflorenzano.com/blog/post/why-couchdb-sucks/
        - Damien Katz CouchDB Interview: http://www.infoq.com/news/2008/11/CouchDB-Damien-Katz
        - CouchDB: Thinking beyond the RDBMS: http://blog.labnotes.org/2007/09/02/couchdb-thinking-beyond-the-rdbms/
        - CouchDB Implementation: http://horicky.blogspot.com/2008/10/couchdb-implementation.html
        - Dare Takes a Look at CouchDB: http://intertwingly.net/blog/2007/09/12/Dare-Takes-a-Look-at-CouchDB
  • 109. Document-Oriented Databases
        - CouchDB - A Use Case: http://kore-nordmann.de/blog/couchdb_a_use_case.html
        - Amazon SimpleDB: http://aws.amazon.com/simpledb/ http://en.wikipedia.org/wiki/SimpleDB
        - thrudb - Document Oriented Database Services: http://code.google.com/p/thrudb/
        - thrudb - faster, cheaper than SimpleDB: http://www.igvita.com/2007/12/28/thrudb-faster-and-cheaper-than-simpledb/
        - QCon 2008 track on Document-Oriented Distributed Databases: http://qconsf.com/sf2008/tracks/show_track.jsp?trackOID=170
  • 110. Distributed K-V Stores
        - Amazon's Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
        - Anti-RDBMS: A list of distributed key-value stores: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/ http://www.reddit.com/r/programming/comments/7qv19/antirdbms_a_list_of_distributed_keyvalue_stores/
        - Is the Relational Database Doomed?: http://developers.slashdot.org/comments.pl?sid=1127539&cid=26849641
        - Project Voldemort: http://project-voldemort.com/
        - Project Voldemort design (also see excellent list of references from this page): http://project-voldemort.com/design.php
        - Consistent Hashing: http://en.wikipedia.org/wiki/Consistent_hashing
  • 111. Bigtable / HBase
        - Google Architecture: http://highscalability.com/google-architecture
        - Bigtable: A Distributed Storage System for Structured Data: http://en.wikipedia.org/wiki/BigTable http://labs.google.com/papers/bigtable.html http://labs.google.com/papers/bigtable-osdi06.pdf
        - Apache HBase: http://hadoop.apache.org/hbase/ http://en.wikipedia.org/wiki/HBase
        - Apache Hadoop: http://hadoop.apache.org/
        - Understanding HBase and BigTable: http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
        - Matching Impedance: When to use HBase: http://blog.rapleaf.com/dev/?p=26
        - HBase Leads Discuss Hadoop, BigTable and Distributed Databases: http://www.infoq.com/news/2008/04/hbase-interview
        - Hadoop/HBase vs RDBMS: http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS