NoSQL Databases Introduction - UTN 2013


This was one of the workshops that we gave at the UTN University to Computer Science students.

Published in: Technology

  1. NoSQL Databases Introduction, October 2013
  2. Agenda
     • Introduction
     • SQL overview
     • Why NoSQL?
     • Characteristics of NoSQL databases
     • Use cases
     • A NoSQL database in action!
     • Summary
  3. Introduction
     • A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information.
     • Database management systems (DBMSs) are specially designed applications that interact with the user, other applications, and the database itself to capture and analyze data.
     • Formally, the term database refers to the data itself and its supporting data structures. Databases are created to operate on large quantities of information by inputting, storing, retrieving, and managing that information.
  4. SQL Databases
  5. Characteristics
     • SQL is an ANSI and ISO standard language for creating and manipulating databases.
     • SQL allows the user to create, update, delete, and retrieve data from a database.
     • SQL's core syntax is simple and easy to learn.
     • High speed: SQL queries can retrieve large numbers of records from a database quickly and efficiently.
     • Well-defined standards exist: SQL databases use a long-established standard adopted by ANSI and ISO. Non-SQL databases do not adhere to any clear standard.
     • No coding required: with standard SQL, database systems can be managed without writing a substantial amount of code.
     • Transactions: ACID properties (Atomic, Consistent, Isolated, Durable).
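
The create/update/retrieve operations and the ACID bullet above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, its rows, and the `transfer` helper are hypothetical and exist only for this example; they are not part of the original workshop.

```python
import sqlite3

# In-memory database; the "accounts" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice; rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)
```

The second transfer credits `bob` before the debit fails, yet the rollback undoes that credit as well, which is the atomicity guarantee in practice.
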
  6. What has happened?
     • Relational databases were introduced in the 1970s to allow applications to store data through a standard data modeling and query language (SQL). Since the rise of the web, the volume of data stored about users, objects, products and events has exploded. Data is also accessed more frequently and processed more intensively. For example, social networks create hundreds of millions of customized, real-time activity feeds for users based on their connections' activities.
     • In response to this demand, computing infrastructure and deployment strategies have also changed dramatically. Low-cost, commodity cloud hardware has emerged to replace vertical scaling on highly complex and expensive single-server deployments. Engineers now use agile development methods, which aim for continuous deployment and short development cycles, to respond quickly to user demand for features.
  7. NoSQL Databases
  8. But... what's NoSQL?
     • A NoSQL database provides a mechanism for storing and retrieving data that employs less constrained consistency models than traditional relational databases.
     • NoSQL systems are also referred to as "Not only SQL" to emphasize that they may in fact support SQL-like query languages.
  9. Characteristics
     • Large data volumes (such as Google's "big data")
     • Scalable replication and distribution
       • Potentially thousands of machines
       • Potentially distributed around the world
     • Queries need to return answers quickly
     • Mostly queries, few updates
     • Asynchronous inserts and updates
     • Schema-less
     • ACID transaction properties are not required; BASE (Basically Available, Soft state, Eventually consistent) applies instead
     • CAP theorem
     • Open source development
  10. CAP Theorem
     • According to the theorem, a distributed system cannot simultaneously guarantee all three of consistency, availability, and partition tolerance.
     • Eventual consistency guarantees that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
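
Eventual consistency can be sketched with two toy in-memory replicas. This is a simplified illustration, not how any particular database implements it: the `Replica` class, the `sync` function, and the caller-supplied version numbers are all assumptions made for the example (real systems typically rely on timestamps or vector clocks).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica storing key -> (version, value). Hypothetical, for illustration."""

    data: dict = field(default_factory=dict)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Last-write-wins: keep the entry with the highest version number.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Anti-entropy pass: exchange entries in both directions so replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("user:1", "Ana", version=1)
sync(r1, r2)

r1.write("user:1", "Ana Maria", version=2)  # r2 has not seen this update yet
stale = r2.read("user:1")                   # a read on r2 still returns the old value

sync(r1, r2)                                # no new updates arrive; replicas converge
converged = (r1.read("user:1"), r2.read("user:1"))
print(stale, converged)
```

Between the second write and the second `sync`, the two replicas disagree; once updates stop and synchronization runs, every read returns the last written value.
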
  11. Taxonomy
     • The basic classification that most would agree on is based on data model. A few of these models and their prototypes are:
       • Column: HBase, Accumulo
       • Document: MongoDB, Couchbase
       • Key-value: Dynamo, Riak, Redis, Cache, Project Voldemort
       • Graph: Neo4j, Allegro, Virtuoso
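
To make the four data models more concrete, the sketch below represents the same hypothetical user with plain Python structures. These are toy in-memory shapes chosen for illustration; they do not reflect the API or storage format of any product listed above.

```python
# Key-value: the store only sees an opaque value looked up by its key.
key_value = {"user:1": '{"name": "Ana", "city": "Cordoba"}'}

# Document: a nested record whose individual fields can be queried.
document = {"_id": 1, "name": "Ana", "address": {"city": "Cordoba"}, "follows": [2]}

# Column (wide-column): row key -> column family -> column -> value.
column = {
    "user:1": {
        "profile": {"name": "Ana", "city": "Cordoba"},
        "social": {"follows:2": ""},
    }
}

# Graph: nodes plus labeled edges between them.
nodes = {1: {"name": "Ana"}, 2: {"name": "Luis"}}
edges = [(1, "FOLLOWS", 2)]


def followed_by(user_id):
    """Traverse FOLLOWS edges starting at user_id and return the target names."""
    return [
        nodes[dst]["name"]
        for src, label, dst in edges
        if src == user_id and label == "FOLLOWS"
    ]


print(followed_by(1))
```
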
  12. MapReduce
     • A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
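
The student example above can be sketched in a few lines of single-process Python. The student names and the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical; a real MapReduce framework would run the map and reduce steps in parallel across many machines and perform the grouping step itself.

```python
from collections import defaultdict

students = ["Ana Lopez", "Luis Diaz", "Ana Ruiz", "Eva Sosa", "Luis Vega"]  # hypothetical


def map_phase(records):
    """Map: emit one (first_name, 1) pair per student."""
    for full_name in records:
        yield full_name.split()[0], 1


def shuffle(pairs):
    """Group emitted values by key: one queue per first name, sorted by name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return dict(sorted(queues.items()))


def reduce_phase(queues):
    """Reduce: summarize each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}


name_frequencies = reduce_phase(shuffle(map_phase(students)))
print(name_frequencies)
```
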
  13. NoSQL is not a magic solution
     • Inconsistent APIs between NoSQL providers.
     • Denormalized data requires you to maintain your own data relationships in code.
     • Not a lot of real operational power for DevOps / IT.
     • Without support for complex queries, joins, aggregations, and filters must be done in code (except where MapReduce is available).
     • Key-value stores need the whole value for a key to read or write any partial information.
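
As a rough sketch of what "joins, aggregations, and filters in code" means, the example below combines two hypothetical collections by hand. The `users` and `orders` data and the `total_spent_by_user` function are assumptions for illustration; in a relational database this would be a single query with a JOIN and a GROUP BY.

```python
# Hypothetical denormalized collections, as an application might read them from a store.
users = [{"_id": 1, "name": "Ana"}, {"_id": 2, "name": "Luis"}]
orders = [
    {"_id": 10, "user_id": 1, "total": 25.0},
    {"_id": 11, "user_id": 2, "total": 40.0},
    {"_id": 12, "user_id": 1, "total": 15.0},
]


def total_spent_by_user(users, orders, min_total=0.0):
    """Filter, join, and aggregate in application code."""
    names = {user["_id"]: user["name"] for user in users}  # index used for the join
    totals = {}
    for order in orders:
        if order["total"] < min_total:       # filter
            continue
        name = names.get(order["user_id"])   # join on user_id
        if name is None:                     # dangling reference: the store does not enforce it
            continue
        totals[name] = totals.get(name, 0.0) + order["total"]  # aggregate
    return totals


spent = total_spent_by_user(users, orders)
big_spent = total_spent_by_user(users, orders, min_total=20.0)
print(spent, big_spent)
```
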
  14. NoSQL Use Cases:
     • SAP uses MongoDB as a core component of SAP's platform-as-a-service (PaaS) offering.
     • Foursquare uses MongoDB to store venues and user "check-ins" into venues, sharding the data over more than 25 machines on Amazon EC2.
     • MongoDB is used for back-end storage on the SourceForge front pages, project pages, and download pages for all projects.
     • Codecademy, an online service for learning to code.
     • A leading UK-based news website.
     • EA Sports: MongoDB is used for the game feeds component.
  15. NoSQL Use Cases:
     • AOL: "We selected Couchbase after evaluating several open source products to power our next-generation backend ad serving platform".
     • Zynga: "FarmVille, Café World, Mafia Wars and other games have over 235 million active users per month. We rely on technology from Couchbase to make that possible."
     • In the PayPal Media Network Advertising Pipeline, Couchbase is used to build scalable cross-channel audience profiling, segmentation, identity mapping, and frequency capping.
     • LinkedIn built a durable and scalable index for its metrics visualization engine using Couchbase.
     • Skyscanner scaled one of its flight search APIs from 100,000 searches a day to over 3 million by introducing Couchbase into its tech stack.
  16. Other use cases
     • Netflix uses Amazon SimpleDB.
     • Twitter uses Cassandra, Hadoop, and HBase, among others.
     • Facebook and Instagram both use Cassandra.
     • Google uses Bigtable (the design that Hadoop's HBase is modeled on).
     • LinkedIn uses Voldemort.
     • Etc.
  17. Summary
     • This is just the tip of the iceberg. From here on, the rest is up to you!
     • SQL works great, but can't easily scale for large data.
     • NoSQL works great, but doesn't fit every case.
     • Use SQL + NoSQL.
  18. References
     • Base de Datos [Wikipedia]
     • SQL [Wikipedia]
     • NoSQL Distilled [Martin Fowler]
     • NoSQL vs. SQL - Battle of the Backends [Google IO12]
     • SQL Standard and NoSQL Databases
     • What is NoSQL? [MongoDB]
     • Why NoSQL? [Couchbase]
     • CouchDB: The Definitive Guide
     • BigTable Patent [Google]
  19. Thanks!
  20. Backup
  21. JSON
     • JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. Derived from the JavaScript scripting language, JSON is a language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, JSON is language-independent, with parsers available for many languages.
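
As a small illustration of JSON's language independence, the sketch below serializes a Python dictionary to JSON text and parses it back using the standard `json` module. The `student` record and its field names are hypothetical and are not the sample from the original slide.

```python
import json

# Hypothetical record; field names are illustrative only.
student = {
    "name": "Ana",
    "age": 21,
    "courses": ["Databases", "Algorithms"],
    "graduated": False,
    "advisor": None,
}

# Serialize to JSON text: Python's False/None become JSON's false/null.
text = json.dumps(student, indent=2, sort_keys=True)
print(text)

# Parse the text back into Python objects.
restored = json.loads(text)
```

The same text could be read by a parser in any other language, which is what makes JSON a common interchange format for document databases.
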