NoSQL
In the Context of Social Web

Summer of Web 2010

Bogdan Gaza
About me
• Student at Faculty of Computer Science -
  first year
• Ruby & Rails fan
• Building RailsAdmin for RubySOC 2010
...
Data, data Everywhere
Data growth on the web

 • Facebook Photos +25TB/week
 • Twitter +7TB/day
 • Flickr +21GB/hour
 • +150GB of tweets while I...
Data size
[Bar chart; highest value shown: 988.00] ...
How to store this huge amount of data?
Databases
More T: RDBMS (MySQL, Oracle)
Online transaction processing (OLTP)
...
RDBMS
• Relational database management system
• Based on E. F. Codd's relational model (1969)
• Dominant for transactional ...
RDBMS performance
[Chart labels: SalaryList; Majority of Web Apps; Pe...]
NoSQL

• Doesn’t mean No to SQL
• It actually means Not Only SQL
• A class of data stores that may not require
  fixed tabl...
NoSQL Categories

• Key-Value stores: Voldemort
• BigTable clones: HBase
• Document Databases: MongoDB, CouchDB
• Graph Da...
Common grounds
• They all support huge amounts of data
• The majority of them support replication
  and sharding
• Have so...
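As a rough illustration of the sharding and replication mentioned on this slide, here is a minimal sketch that assumes hash-based placement. The node names, the replica count, and the `nodesForKey` helper are hypothetical; real stores often use more elaborate schemes such as consistent hashing.

```javascript
// Hypothetical cluster: node names and replica count are illustrative assumptions.
const nodes = ['node-a', 'node-b', 'node-c', 'node-d'];
const REPLICAS = 2; // each key is stored on this many nodes

// Small deterministic string hash (32-bit FNV-1a).
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Sharding: the hash picks the primary node.
// Replication: copies go to the nodes that follow it in the list.
function nodesForKey(key) {
  const primary = hashKey(key) % nodes.length;
  const owners = [];
  for (let i = 0; i < REPLICAS; i++) {
    owners.push(nodes[(primary + i) % nodes.length]);
  }
  return owners;
}

console.log(nodesForKey('user:42'));
```

Plain modulo placement moves most keys when a node is added or removed, which is one reason production systems tend to prefer consistent hashing.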
Key-Value stores

• Scale to huge amounts of data
• Can handle massive load
• Based on Amazon’s Dynamo
• A big persistent ...
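The "big persistent associative array" idea can be sketched in a few lines. This is only a toy, assuming a single JSON file as the backing storage; the `TinyKV` class and the file name are hypothetical and say nothing about how Voldemort or Tokyo Cabinet work internally.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical class: a tiny key-value store persisted to one JSON file.
class TinyKV {
  constructor(file) {
    this.file = file;
    // Load previously saved data if the file already exists.
    this.data = fs.existsSync(file)
      ? JSON.parse(fs.readFileSync(file, 'utf8'))
      : {};
  }
  get(key) {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : undefined;
  }
  put(key, value) {
    this.data[key] = value;
    this.flush();
  }
  delete(key) {
    delete this.data[key];
    this.flush();
  }
  // Rewrite the whole file on every change (simple, but slow for large data).
  flush() {
    fs.writeFileSync(this.file, JSON.stringify(this.data));
  }
}

const file = path.join(os.tmpdir(), `tinykv-${process.pid}.json`);
const store = new TinyKV(file);
store.put('user:1', { name: 'Joe' });

// A second instance reads the same file, so the value survives a "restart".
const reopened = new TinyKV(file);
console.log(reopened.get('user:1'));
```

The store only ever looks values up by key; it never inspects what is inside a value.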
BigTable clones

• Tables similar to RDBMS but semi-structured
• Based on Google’s BigTable paper
• Column oriented
• E...
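One way to picture the semi-structured, column-oriented model is as nested maps: row key, then column family, then column name. The sketch below is an in-memory toy under that assumption; `SparseTable` is a hypothetical name, and features of real systems such as cell timestamps and on-disk layout are left out.

```javascript
// Hypothetical in-memory model: row key -> column family -> column -> value.
class SparseTable {
  constructor(families) {
    this.families = new Set(families); // column families are declared up front
    this.rows = new Map();
  }
  put(rowKey, family, column, value) {
    if (!this.families.has(family)) {
      throw new Error(`unknown column family: ${family}`);
    }
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    const row = this.rows.get(rowKey);
    if (!row.has(family)) row.set(family, new Map());
    row.get(family).set(column, value);
  }
  get(rowKey, family, column) {
    const row = this.rows.get(rowKey);
    const columns = row && row.get(family);
    return columns ? columns.get(column) : undefined;
  }
}

const users = new SparseTable(['info', 'stats']);
// Rows need not share the same columns: that is the "semi-structured" part.
users.put('joe', 'info', 'email', 'joe@example.com');
users.put('ann', 'info', 'city', 'Cluj');
users.put('ann', 'stats', 'logins', 12);

console.log(users.get('ann', 'stats', 'logins'));
console.log(users.get('joe', 'info', 'city'));
```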
Document databases

• Similar to Key-Value Stores but the DB
  knows what the Value is
• Collection of key-value collectio...
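Because the database "knows what the value is", it can query on fields inside a document rather than only on the key. A minimal in-memory sketch of that difference follows; the `Collection` class is hypothetical and supports only equality matches on top-level fields.

```javascript
// Hypothetical in-memory collection of schema-free documents.
class Collection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }
  // Return every document whose top-level fields equal those in the query.
  find(query) {
    return this.docs.filter(doc =>
      Object.keys(query).every(field => doc[field] === query[field]));
  }
}

const posts = new Collection();
posts.insert({ author: 'joe', title: 'Yet another blog post' });
posts.insert({ author: 'ann', title: 'Hello', tags: ['nosql'] }); // extra field is fine

console.log(posts.find({ author: 'joe' }));
```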
Graph databases
• Focus on the structure of data
• Scale to the complexity of data
• Data stored in nodes
• Lots of cool Grap...
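As an example of the kind of graph algorithm this slide has in mind, here is a sketch of breadth-first search for the shortest path between two people in a small social graph. The `Graph` class and the names in it are hypothetical, and an in-memory adjacency map stands in for a real graph database.

```javascript
// Hypothetical in-memory undirected graph stored as an adjacency map.
class Graph {
  constructor() {
    this.adjacent = new Map();
  }
  addEdge(a, b) {
    for (const [from, to] of [[a, b], [b, a]]) {
      if (!this.adjacent.has(from)) this.adjacent.set(from, new Set());
      this.adjacent.get(from).add(to);
    }
  }
  // Breadth-first search: returns the shortest path as an array, or null.
  shortestPath(start, goal) {
    if (!this.adjacent.has(start) || !this.adjacent.has(goal)) return null;
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const node = queue.shift();
      if (node === goal) {
        // Walk the "previous" links back to the start to rebuild the path.
        const path = [];
        for (let n = goal; n !== null; n = previous.get(n)) path.unshift(n);
        return path;
      }
      for (const next of this.adjacent.get(node)) {
        if (!previous.has(next)) {
          previous.set(next, node);
          queue.push(next);
        }
      }
    }
    return null;
  }
}

const social = new Graph();
social.addEdge('ann', 'bob');
social.addEdge('bob', 'cat');
social.addEdge('cat', 'dan');
social.addEdge('ann', 'eve');

console.log(social.shortestPath('ann', 'dan')); // [ 'ann', 'bob', 'cat', 'dan' ]
```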
But how do I query it?

• RESTful interface
• Query APIs
• SPARQL
• Gremlin - graph traversal language
• GQL - SQL-like Que...
Who uses NoSQL?
Who uses NoSQL?
BigTable, Cassandra, FlockDB, Dynamo, Voldemort
...
• Scalable, high-performance, open-source,
  document-oriented database
• Somewhere between key-value stores and
  documen...
• JSON-style documents
  { author: 'joe', created: new Date('03-28-2009'),
    title: 'Yet another blog post' }

• Schem...
• Dynamic queries
  db.people.update( { name:"Joe" }, { $inc: { n : 1 } } );
• Replication - very easy to set up
• Indexin...
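To make the `update` call on this slide concrete, here is a sketch of what a query plus `$inc` does, emulated on a plain array. The `updateFirst` helper is hypothetical and is not MongoDB's implementation; it assumes the shell default of that era, where `update` changes only the first matching document.

```javascript
// Equality match on top-level fields (a tiny subset of the query language).
function matches(doc, query) {
  return Object.keys(query).every(field => doc[field] === query[field]);
}

// Hypothetical helper: apply $inc to the first document matching the query.
// Returns the number of documents changed (0 or 1).
function updateFirst(docs, query, update) {
  const doc = docs.find(d => matches(d, query));
  if (!doc) return 0;
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts from 0
  }
  return 1;
}

const people = [{ name: 'Joe', n: 4 }, { name: 'Ann' }];
updateFirst(people, { name: 'Joe' }, { $inc: { n: 1 } });
updateFirst(people, { name: 'Ann' }, { $inc: { n: 1 } });

console.log(people);
```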
• Map Reduce search
res = db.events.mapReduce(m, r, { query : {type:'sale'} });

 • Simple querying
db.users.find({'last_na...
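The `mapReduce` call on this slide leaves `m` and `r` undefined. The sketch below shows one plausible pair of functions and emulates the flow in memory, assuming events carry `item` and `amount` fields (both hypothetical). Unlike the mongo shell, where the map function reads the document from `this` and calls a global `emit`, this emulation passes the document and `emit` as arguments.

```javascript
// Hypothetical in-memory emulation of the map-reduce flow: filter, map, group, reduce.
function mapReduce(docs, map, reduce, options = {}) {
  const query = options.query || {};
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs
    .filter(doc => Object.keys(query).every(k => doc[k] === query[k]))
    .forEach(doc => map(doc, emit));
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduce(key, values);
  }
  return results;
}

// Sample data: field names are illustrative assumptions.
const events = [
  { type: 'sale', item: 'book', amount: 12 },
  { type: 'sale', item: 'book', amount: 8 },
  { type: 'refund', item: 'book', amount: 12 },
  { type: 'sale', item: 'pen', amount: 2 },
];

const m = (doc, emit) => emit(doc.item, doc.amount);     // one (key, value) per event
const r = (key, values) => values.reduce((a, b) => a + b, 0); // total per key

const res = mapReduce(events, m, r, { query: { type: 'sale' } });
console.log(res); // { book: 20, pen: 2 }
```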
Don’t search for:




One database to rule them all!
Use the best suited storage for each kind of data
Thanks!
One more thing!
Hummingbird

• Real time Web Traffic Visualiser
• Lets you see visitors interacting with your
  site in real time
• Did I ...
Hummingbird technology

• Node.js
  Evented TCP server written in JavaScript
  powered by the V8 JS engine.
• WebSockets
• S...
Hummingbird technology
 • Tracking pixel to gather data
 • Built on

NoSQL in the context of Social Web


NoSQL presentation of basic principles.

Published in: Technology

  • Transcript of "NoSQL in the context of Social Web"

    1. NoSQL In the Context of Social Web • Summer of Web 2010 • Bogdan Gaza
    2. About me • Student at Faculty of Computer Science - first year • Ruby & Rails fan • Building RailsAdmin for RubySOC 2010 • Interested in Scalability and High-Availability • http://twitter.com/hurrycane
    3. Data, data Everywhere
    4. Data growth on the web • Facebook Photos +25TB/week • Twitter +7TB/day • Flickr +21GB/hour • +150GB of tweets while I give this talk • All these amounts multiply every year
    5. Data size • Exabytes of data stored on the web (source: IDC; 1 exabyte = 10^18 bytes) • 2006: 161 • 2007: 253 • 2008: 397 • 2009: 623 • 2010: 988
    6. How to store this huge amount of data?
    7. Databases • More T: RDBMS (MySQL, Oracle) • Online transaction processing (OLTP) • Less T: NoSQL (MongoDB, CouchDB)
    8. RDBMS • Relational database management system • Based on E. F. Codd's relational model (1969) • Dominant for transactional & analytical applications • The most popular and easy-to-use RDBMS is MySQL
    9. RDBMS performance [chart of performance against data complexity; labels: SalaryList, Majority of Web Apps, Social networks, Trend analysis]
    10. NoSQL • Doesn’t mean No to SQL • It actually means Not Only SQL • A class of data stores that may not require fixed table schemas (Wikipedia) • The majority of them are based on Google’s BigTable
    11. NoSQL Categories • Key-Value stores: Voldemort • BigTable clones: HBase • Document Databases: MongoDB, CouchDB • Graph Databases: Neo4j
    12. Common grounds • They all support huge amounts of data • The majority of them support replication and sharding • Have some sort of failure detection mechanism • Can scale to the complexity of data they store
    13. Key-Value stores • Scale to huge amounts of data • Can handle massive load • Based on Amazon’s Dynamo • A big persistent associative array • Examples: Voldemort, Tokyo Tyrant/Cabinet
    14. BigTable clones • Tables similar to RDBMS but semi-structured • Based on Google’s BigTable paper • Column oriented • Examples: HBase, Cassandra
    15. Document databases • Similar to Key-Value Stores but the DB knows what the Value is • Collection of key-value collections • Documents are often versioned • Examples: CouchDB, MongoDB, Redis
    16. Graph databases • Focus on the structure of data • Scale to the complexity of data • Data stored in nodes • Lots of cool Graph algorithms can be implemented • Examples: Neo4J, FlockDB
    17. But how do I query it? • RESTful interface • Query APIs • SPARQL • Gremlin - graph traversal language • GQL - SQL-like Query Language for Google BigTable
    18. Who uses NoSQL?
    19. Who uses NoSQL? • BigTable • Cassandra • FlockDB • Dynamo • Voldemort • MongoDB
    20. • Scalable, high-performance, open-source, document-oriented database • Somewhere between key-value stores and document databases • Big community • Tons of features
    21. • JSON-style documents { author: 'joe', created: new Date('03-28-2009'), title: 'Yet another blog post' } • Schema free • Flexibility • Data is stored in collections
    22. • Dynamic queries db.people.update( { name:"Joe" }, { $inc: { n : 1 } } ); • Replication - very easy to set up • Indexing • Sharding • GeoSpatial Index - location based queries
    23. • Map Reduce search res = db.events.mapReduce(m, r, { query : {type:'sale'} }); • Simple querying db.users.find({'last_name': 'Smith'}) • GridFS - for storing large files • Support for many programming languages
    24. Don’t search for: One database to rule them all!
    25. Use the best suited storage for each kind of data
    26. Thanks!
    27. One more thing!
    28. Hummingbird • Real time Web Traffic Visualiser • Lets you see visitors interacting with your site in real time • Did I say real time?
    29. Hummingbird technology • Node.js: evented TCP server written in JavaScript, powered by the V8 JS engine • WebSockets • SVG Graphs • Real time = 20 times per second
    30. Hummingbird technology • Tracking pixel to gather data • Built on