Nonrelational Databases

Non-relational Databases: a new kind of database for handling Web scale

The problem

The Web introduces a new scale for applications, in terms of:

Concurrent users (millions of requests/second)

Data (petabytes generated daily)

Processing (all this data needs processing)

Exponential growth (surging unpredictable demands)

The problem (contd.)

Web sites with very large traffic have no way to deal with this using existing RDBMS solutions:

Oracle

PostgreSQL

Even with their high-end clustering solutions

The problem (contd.)

Why?

Applications using a normalized database schema require joins, which don't perform well over large amounts of data and/or many nodes

Existing RDBMS clustering solutions require scale-up, which is limited & not really scalable when dealing with exponential growth

Machines have upper limits on capacity, & sharding the data & processing across machines is very complex & app-specific

The problem (contd.)

Why not just use sharding?

Very problematic when adding/removing nodes

Basically, you end up denormalizing everything & losing all the benefits of relational databases
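To make the "adding/removing nodes" point concrete, here is a minimal sketch that assumes the simplest possible scheme, hash-modulo sharding. The slides do not name a sharding scheme, so that choice, the function names `shard_for` and `moved_fraction`, and the `user:N` keys are all illustrative assumptions.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using hash-modulo sharding (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys that land on a different node after resizing."""
    moved = sum(
        1 for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)
    )
    return moved / len(keys)


# Hypothetical keys, for illustration only.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"{fraction:.0%} of keys move when going from 4 to 5 nodes")
```

Under this scheme a key stays put only if its hash gives the same remainder modulo 4 and modulo 5, so roughly four out of five keys have to be copied to another machine just to add a single node.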

Who faced this problem?

Web applications dealing with high traffic, massive data, a large user base & user-generated content, such as:

Google

1 difference though

Compared to traditional large applications (telco, financial, &c), these web applications are usually free & can therefore sacrifice data integrity / consistency

No one will sue them for not receiving the most current:

Status of their friends (Facebook/Twitter)

Web search results (Google/Yahoo!)

The solution

These companies had to come up with a new kind of DBMS, capable of handling web scale

Possibly sacrificing some level of consistency or some other feature

Must we sacrifice something?

In 2000, Eric Brewer (co-founder of Inktomi) formulated the CAP theorem, claiming that you can only optimize 2 out of these 3:

Consistency

Availability

Partition-tolerance

BTW, the theorem was later proved by MIT scientists in 2002

Simple example

When you have a lot of data which needs to be highly available, you'll usually need to partition it across machines & also replicate it to be more fault-tolerant

This means that when writing a record, all replicas must be updated too

Now you need to choose between:

Lock all relevant replicas during update => be less available
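The locking option can be sketched with a toy in-memory model. Everything here is hypothetical: the `Replica` class, the `probe` hook and the version strings are made up for illustration. The unlocked write is an assumed contrast case, not something taken from the surviving slide text.

```python
class Replica:
    """One copy of a record; `locked` marks it as being updated."""

    def __init__(self):
        self.value = "v1"
        self.locked = False

    def read(self):
        if self.locked:
            raise RuntimeError("replica unavailable: locked for update")
        return self.value


def write_with_locks(replicas, value, probe=None):
    """Lock all replicas, update each one, then unlock all of them."""
    for r in replicas:
        r.locked = True
    try:
        for r in replicas:
            r.value = value
        if probe:
            probe()  # simulate a reader arriving during the update
    finally:
        for r in replicas:
            r.locked = False


def write_without_locks(replicas, value, probe=None):
    """Assumed contrast case: update replicas one by one, never blocking reads."""
    for i, r in enumerate(replicas):
        r.value = value
        if i == 0 and probe:
            probe()  # a reader arrives after only the first replica changed


replicas = [Replica(), Replica()]
observed = []


def probe_locked():
    try:
        observed.append(replicas[1].read())
    except RuntimeError:
        observed.append("unavailable")


def probe_unlocked():
    observed.append((replicas[0].read(), replicas[1].read()))


write_with_locks(replicas, "v2", probe=probe_locked)
write_without_locks(replicas, "v3", probe=probe_unlocked)
print(observed)
```

In this toy model the reader that arrives during the locked write is turned away, while the reader that arrives during the unlocked write is served but sees two replicas disagreeing.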
