An introduction to NoSQL

Radu Potop

NoSQL

● umbrella term
● non-relational data storage
● no fixed table schemas
● a fresh take on database technology

Relational databases have issues in handling big volumes of data

Some companies and their databases:
● Digg.com - 3 TB for green badges
● Facebook - 50 TB for inbox search
● eBay - 2 PB (petabytes) in total

Issues

● horizontal scalability
● server performance
● rigid schemas
● distribution across servers

Characteristics of NoSQL

● usually no full ACID guarantees (Atomicity, Consistency, Isolation, Durability)
● highly distributed
● scalable
● better performance - they don't have to handle relations (no joins)

NoSQL database examples:

● Google Bigtable (used by many Google products)
● Amazon Dynamo (used internally by Amazon, e.g. for the shopping cart)
● Facebook Cassandra
● Apache HBase
● LinkedIn Voldemort

Some types of databases:

● Document-oriented databases
    ● JSON format, XML databases
    ● examples: CouchDB, BaseX
● Key-value databases
    ● values can be more than strings (e.g. sets of strings)
    ● examples: Redis, Cassandra

CouchDB

● an Apache Software Foundation project (originally created by Damien Katz)
● written in Erlang
● open source
● document-oriented database
● stores data as a collection of JSON documents
● queried via a REST API
● JavaScript is the default language for writing queries (views)
● also supported: PHP, Ruby, Python and Erlang
● built-in replication features
● used by Ubuntu One

JSON document
{
  "_id" : "fc5e038d38a570",
  "_rev" : "D546012",
  "to" : "email@example",
  "subject" : "helloWorld",
  "body" : "some text"
}

Operations with these documents

● HTTP requests:
    ● GET (select), POST (create), PUT (update), DELETE (delete)
● HTTP AUTH
● Applications: curl, Futon
● JavaScript
● any application that can make HTTP requests
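
The HTTP verbs listed above can be sketched in Python using only the standard library. This is a rough illustration, not an official client: the server address `http://localhost:5984`, the database name `mail` and the helper `couch_request` are assumptions made for the example. The helper only builds the request objects, so it runs without a server; passing a request to `urllib.request.urlopen` would actually send it. In CouchDB, a PUT to a document URL creates the document or updates it when the current `_rev` is supplied, and a DELETE needs the revision as a `rev` query parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed address of a local CouchDB instance (hypothetical for this sketch).
BASE_URL = "http://localhost:5984"


def couch_request(method, db, doc_id, doc=None, rev=None):
    """Build (but do not send) an HTTP request for one document."""
    url = f"{BASE_URL}/{db}/{doc_id}"
    if rev is not None:
        # DELETE (and some other calls) identify the revision in the query string.
        url += "?" + urlencode({"rev": rev})
    body = None
    headers = {"Accept": "application/json"}
    if doc is not None:
        # Documents travel as JSON in the request body.
        body = json.dumps(doc).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers, method=method)


if __name__ == "__main__":
    message = {"to": "email@example", "subject": "helloWorld", "body": "some text"}
    requests_to_show = [
        couch_request("PUT", "mail", "fc5e038d38a570", doc=message),      # create / update
        couch_request("GET", "mail", "fc5e038d38a570"),                   # select
        couch_request("DELETE", "mail", "fc5e038d38a570", rev="D546012"), # delete
    ]
    for req in requests_to_show:
        print(req.get_method(), req.full_url)
```

To talk to a real server, each request would be passed to `urllib.request.urlopen(req)` and the JSON response decoded with `json.load`.
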
Futon interface
Redis

● key-value database
● written in C
● open source
● networked
● in-memory
● persistent database
● similar to memcached
● data is non-volatile

● atomic operations
● very high performance: ~100,000 operations/second with 50 parallel clients
● all data is kept in memory - blazing fast
● periodic synchronization to the hard drive
● powerful replication
● bindings for a lot of languages: PHP, Ruby, Python, C, Java, etc.

    SET foo bar
    GET foo => bar

SET - insert
GET - select
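
The combination of "all data in memory" and "periodic synchronization to the hard drive" can be illustrated with a toy store in Python. This is only a sketch of the idea, not how Redis is implemented and not a Redis client: the class `TinyKV`, its methods and the JSON snapshot file are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile


class TinyKV:
    """Toy in-memory key-value store with snapshot persistence (not Redis)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On start-up, reload the last snapshot if one exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        # Writes only touch memory, which is what makes them fast.
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def save(self):
        # Write the snapshot to a temporary file, then swap it into place,
        # so a crash mid-write never leaves a half-written snapshot.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        snapshot = os.path.join(folder, "dump.json")
        store = TinyKV(snapshot)
        store.set("foo", "bar")
        store.save()                   # in a real server this would run periodically
        restarted = TinyKV(snapshot)   # simulate a restart
        print(restarted.get("foo"))
```

Anything written after the last `save()` would be lost on a crash, which is the trade-off that periodic synchronization accepts in exchange for speed.
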
Key-value databases have become very popular recently

Other key-value databases:
● Facebook's Cassandra (now also used by Digg)
● GT.M
● MemcacheDB (a persistence-enabled variant of memcached)
● LinkedIn Voldemort

Conclusion

● relational databases are not the holy grail of data storage
● scalability issues have pushed large companies to look for other solutions
● don't believe the FUD and give them a try

Thank you
