I threw a few slides together based on my notes from NoSQLEast, the conference I attended last week. There were some very interesting presentations on how people deal with developing and maintaining massive systems.
Some scalable systems<br />Google ~ BigTable<br />Amazon ~ Dynamo ~ SimpleDB<br />Microsoft ~ Powerset ~ Bing ~ Dynomite<br />Twitter ~ Hadoop ~ Pig<br />Facebook ~ Digg ~ Cassandra ~ Thrift<br />Nasdaq ~ tin ~ text & filesystem<br />Akamai ~ Riak<br />Ubuntu ~ LHC ~ BBC ~ CouchDB<br />LinkedIn ~ Gilt ~ Voldemort<br />Business Insider ~ MongoDB<br />Stuff built in Erlang by guys with physics degrees<br />
How they define scalable<br />If I add X resources, then I gain X performance.<br />If I double my nodes (servers), then I should get double the computing power.<br />If I double my processors, then the processing should take half as long.<br />If I double my network bandwidth, then I should be able to transmit twice as fast or twice as much data.<br />If we double the number of developers, then we should get twice the amount of work done.<br />
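One way to put numbers on that definition is to compare measured speedup against the factor by which resources grew. The sketch below is my own illustration, not something from the slides: the function names `speedup` and `scaling_efficiency` and the sample timings are made up for the example. An efficiency of 1.0 matches the "add X, gain X" ideal; anything lower means part of the added capacity was lost.

```python
def speedup(baseline_seconds: float, scaled_seconds: float) -> float:
    """How many times faster the job ran after adding resources."""
    return baseline_seconds / scaled_seconds


def scaling_efficiency(resource_factor: float,
                       baseline_seconds: float,
                       scaled_seconds: float) -> float:
    """Speedup divided by the resource increase.

    1.0 means linear scaling (double the nodes, half the time);
    values below 1.0 mean the extra resources were only partly used.
    """
    return speedup(baseline_seconds, scaled_seconds) / resource_factor


# Hypothetical measurements: a job that took 100 s on the original cluster.
ideal = scaling_efficiency(2, baseline_seconds=100.0, scaled_seconds=50.0)
lossy = scaling_efficiency(2, baseline_seconds=100.0, scaled_seconds=60.0)
print(f"doubled nodes, 100 s -> 50 s: efficiency {ideal:.2f}")
print(f"doubled nodes, 100 s -> 60 s: efficiency {lossy:.2f}")
```

The same ratio works for any of the resources on the slide (nodes, processors, bandwidth), as long as the baseline and scaled runs measure the same workload.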
Some chatter dump<br />No… SQL, ORMs, Schemas, Joins, Foreign Keys, Transactions, ACID, RDBMS<br />Distributed Key/Value Stores ~ Document-oriented Database ~ MapReduce<br />Functional Languages ~ Erlang ~ F# ~ No OO<br />RESTful ~ JSON ~ BSON ~ HTTP<br />Horizontal vs. Vertical Scaling<br />Google Bigtable Paper<br />Amazon Dynamo Paper<br />CAP Theorem (Consistency, Availability, Partition Tolerance) ~ Only 2 @ a time.<br />BASE ~ Eventually Consistent for High Availability ~ DNS<br />SLA ~ Number of 9s<br />Code for Failure ~ Fault-tolerance ~ Graceful Degradation<br />SN (Shared Nothing) Architecture ~ No bottlenecks<br />Sharding ~ Horizontal Partitioning<br />Distributed Map ~ Consistent Hashing (Ring of Nodes)<br />Sloppy Quorum ~ Minimum Nodes for R/W<br />Hinted Handoff ~ Always Writeable ~ Handles Temp failures<br />Merkle Tree Replication ~ Handles Permanent Failures<br />Fault-tolerance ~ Read-Repair ~ Replication<br />Vector Clocks (node, counter) ~ No Wall Clocks<br />SuperColumns ~ ColumnFamily<br />Stateless App Servers ~ P2P Bootstrapping<br />CDN (Content Delivery Network)<br />MVCC (Multiversion Concurrency Control) ~ B-tree ~ Tail Appends ~ Cluster Rebalancing<br />
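Of the terms in that dump, "Consistent Hashing (Ring of Nodes)" is probably the easiest to show in a few lines. What follows is a minimal sketch of the general idea, not the implementation used by Dynamo, Cassandra, or any other system named above. The class name `HashRing`, the use of SHA-256, and the 64 virtual points per node are all assumptions I picked for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an arbitrary choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place `vnodes` points for this node around the ring.
        for i in range(self.vnodes):
            point = ring_position(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        # Drop every point owned by `node`; other points are untouched.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        """Return the owner of the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


# Hypothetical node and key names, just to exercise the ring.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the ring is visible in the last lines: adding a node only reassigns the keys that now fall on the new node's points, while every other key stays where it was. A plain `hash(key) % node_count` scheme would reshuffle most keys whenever the node count changes.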
Some popular reads<br />(Brewer’s CAP theorem) Towards Robust Distributed Systems http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf<br />(Google) Bigtable: A Distributed Storage System for Structured Data http://labs.google.com/papers/bigtable-osdi06.pdf<br />Dynamo: Amazon’s Highly Available Key-value Store http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf<br />