Non-relational Databases
A new kind of database for handling web scale
Agenda
The problem
The solution
Benefits
Cost
Example: Cassandra
The problem
The Web introduces a new scale for applications, in terms of:
Concurrent users (millions of requests/second)
Data (petabytes generated daily)
Processing (all this data needs processing)
Exponential growth (surging, unpredictable demand)
The problem (contd.)
Websites with very high traffic have no way to handle this using existing RDBMS solutions:
Oracle
MS SQL
Sybase
MySQL
PostgreSQL
Even with their high-end clustering solutions
The problem (contd.)
Why?
Applications using a normalized database schema require joins, which don't perform well with large amounts of data and/or many nodes
Existing RDBMS clustering solutions require scale-up, which is limited & can't keep pace with exponential growth
Machines have upper limits on capacity, & sharding the data & processing across machines is very complex & app-specific
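The scale-up ceiling above can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: demand multiplies by a fixed factor every year, a single machine has a fixed maximum capacity, and scale-out splits load perfectly evenly across identical nodes (which ignores exactly the sharding and join costs this slide describes). The function names and all figures are hypothetical.

```python
import math


def years_until_ceiling(current_load: float, max_capacity: float, growth_per_year: float) -> int:
    """Years of multiplicative growth until load first exceeds one machine's ceiling (scale-up)."""
    if growth_per_year <= 1:
        raise ValueError("growth_per_year must be > 1 for the load to ever exceed the ceiling")
    years, load = 0, current_load
    while load <= max_capacity:
        load *= growth_per_year
        years += 1
    return years


def nodes_needed(load: float, per_node_capacity: float) -> int:
    """Machines required if load splits evenly across identical nodes (idealized scale-out)."""
    return math.ceil(load / per_node_capacity)


if __name__ == "__main__":
    # Hypothetical figures in requests/second, for illustration only.
    load, biggest_box, commodity_box = 10_000, 160_000, 10_000
    print("years until the biggest machine is saturated:",
          years_until_ceiling(load, biggest_box, 2.0))
    for year in range(0, 8, 2):
        demand = load * 2 ** year
        print(f"year {year}: {demand:,} req/s -> {nodes_needed(demand, commodity_box)} commodity nodes")
```

With doubling demand, the hypothetical biggest machine is saturated after 5 years no matter what, while the node count simply keeps growing (1, 4, 16, 64 nodes at years 0, 2, 4, 6). The catch is that real scale-out is never this clean, which is why the slide calls sharding complex & app-specific.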
The problem (contd.)
Why not just use sharding?
It is very problematic when adding/removing nodes
Basically, you end up denormalizing everything & losing all the benefits of relational databases
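Why adding or removing nodes hurts can be shown with the simplest sharding scheme, hashing each key and taking it modulo the number of nodes. This is a minimal sketch assuming that naive scheme; `shard_for` and `fraction_moved` are hypothetical helpers, and MD5 is used only as a convenient, well-distributed hash, not for security.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Naive modulo sharding: hash the key and take it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys: list[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose shard changes when the cluster is resized."""
    moved = sum(1 for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(20_000)]
    for old, new in [(4, 5), (10, 11), (10, 9)]:
        print(f"{old} -> {new} nodes: {fraction_moved(keys, old, new):.1%} of keys move")
```

Expect roughly 80%, 91% and 90% of the keys to move, respectively: with modulo placement, almost every record must be migrated whenever the cluster changes size. Consistent hashing, which Cassandra uses to place data on its token ring, is the usual remedy, since adding a node then moves only a small fraction (about 1/N) of the keys.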
Who faced this problem?
Web applications dealing with high traffic, massive data, a large user base & user-generated content, such as:
Google
Yahoo!
Amazon
Facebook
Twitter
LinkedIn
& many more
One difference, though
Compared to traditional large applications (telco, financial, etc.), these web applications are usually free & therefore:
They can sacrifice data integrity / consistency
No one will sue them if a user doesn't receive the most current:
Status of their friends (Facebook/Twitter)
Web search results (Google/Yahoo!)
Item added to cart (Amazon)
The solution
These companies had to come up with a new kind of DBMS, capable of handling web scale
Possibly sacrificing some level of consistency or some other feature
Must we sacrifice something?
In 2000, Eric Brewer (co-founder of Inktomi) formulated the CAP theorem, claiming that a distributed system can guarantee only 2 of these 3 properties:
Consistency
Availability
Partition-tolerance
BTW, the theorem was later proved by MIT researchers in 2002
Simple example
When you have a lot of data that needs to be highly available, you'll usually need to partition it across machines & also replicate it to be more fault-tolerant
This means that when writing a record, all replicas must be updated too
Now you need to choose between:
Lock all relevant replicas during the update => be less available
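The trade-off in this example can be illustrated with a toy, in-memory model of a replicated record. Everything here is hypothetical: `Replica`, `write_lock_all` and `write_reachable` are illustrative names, and "locking" is simplified to an all-or-nothing reachability check rather than real distributed locks. For contrast, `write_reachable` models the usual alternative of accepting the write on whichever replicas are reachable; that second path is an assumption added here, not taken from the slide.

```python
from dataclasses import dataclass, field


class UnavailableError(Exception):
    """Raised when a write is refused because not every replica can be updated."""


@dataclass
class Replica:
    name: str
    reachable: bool = True          # False simulates a network partition
    data: dict = field(default_factory=dict)


def write_lock_all(replicas: list[Replica], key: str, value: str) -> None:
    """All-or-nothing write: refuse unless every replica is reachable.

    Simplification: the reachability check stands in for acquiring a lock on
    every replica before updating it.
    """
    unreachable = [r.name for r in replicas if not r.reachable]
    if unreachable:
        raise UnavailableError(f"cannot reach replicas: {unreachable}")
    for r in replicas:
        r.data[key] = value


def write_reachable(replicas: list[Replica], key: str, value: str) -> int:
    """Best-effort write: update only reachable replicas; return how many were updated."""
    updated = 0
    for r in replicas:
        if r.reachable:
            r.data[key] = value
            updated += 1
    return updated


if __name__ == "__main__":
    replicas = [Replica("a"), Replica("b"), Replica("c")]
    write_lock_all(replicas, "status", "v1")

    replicas[2].reachable = False   # partition replica "c" away
    try:
        write_lock_all(replicas, "status", "v2")
    except UnavailableError as err:
        print("lock-all write refused:", err)

    write_reachable(replicas, "status", "v2")
    print({r.name: r.data["status"] for r in replicas})
```

Running it prints `lock-all write refused: cannot reach replicas: ['c']` followed by `{'a': 'v2', 'b': 'v2', 'c': 'v1'}`. The locking write gives up availability during the partition; the best-effort write stays available but leaves replica "c" serving a stale value, the kind of out-of-date friend status these free web applications can tolerate.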
