NO SQL Databases, Big Data and the cloud


A summary of NoSQL databases.

Published in: Data & Analytics, Technology

  • Consistency: a read sees all previously completed writes. Availability: reads and writes always succeed. Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others. The basic idea is that if a client writes to one side of a partition, reads that go to the other side of that partition cannot know about the most recent write. You are then faced with a choice: respond to those reads with potentially stale information, or wait (potentially forever) to hear from the other side of the partition and compromise availability.
  • NO SQL Databases, Big Data and the cloud

    1. Manu Cohen-Yashar: The Cloud, Big Data and NoSQL
    2. Agenda: Data boom; Problems with RDBMS; NoSQL; Big Data; What's next
    3. Understand NoSQL: Types of databases; Primary usage; Data model; Pros and cons
    4. Lots of Data: Data doubles every 18 months. Pictures, web sites, emails, sensors, geo information, financial information, science, art ... (an infinite list)
    5. No Limits: With the cloud it is now possible to mount a cluster of any size and run any computation at any scale. Whoever makes sense of all the available data will rule the world. The conclusion: use the cloud to analyze data at large scale.
    6. Let's Talk about Data: When we think of data we think of …
    7. Data Has Many Forms: Yet data comes in many forms and shapes: graphs, documents, time series, blobs, geo, sensors, unstructured, structured, web
    8. Problems with RDBMS: Does not scale very well (sharding, replication). Models data according to the relational model: is this the best model for all data types? Complex and expensive: requires a DBA; expensive to buy (Oracle SQL)
    9. No Relational: Not all types of data fit well into the relational world. Not all data use cases fit well into the ACID convention. The relational model does not scale very well: it is difficult to distribute and difficult to replicate.
    10. The CAP Theorem: RDBMS; Replicated NoSQL; Sharded NoSQL. During a network partition, a distributed system must choose either consistency or availability.
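
The choice on slide 10 can be illustrated with a toy model. This is a minimal sketch, assuming a single value held on two replicas; the class `TwoReplicaStore`, the exception `Unavailable`, and the `"CP"`/`"AP"` mode strings are hypothetical names made up for this illustration and do not come from the talk or from any real database.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale read."""


class TwoReplicaStore:
    """Toy store holding one value on two replicas, A and B (hypothetical).

    mode is "CP" (prefer consistency) or "AP" (prefer availability).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None
        self.b = None
        self.partitioned = False

    def write_to_a(self, value):
        self.a = value
        if not self.partitioned:
            # Healthy network: the write is copied to replica B.
            self.b = value

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            # B cannot confirm that it has the latest write, so it refuses.
            raise Unavailable("replica B cannot reach replica A")
        # AP mode (or a healthy network): answer with whatever B has.
        return self.b


def demo(mode):
    """Write, split the network, write again, then read from the far side."""
    store = TwoReplicaStore(mode)
    store.write_to_a("v1")       # replicated to B
    store.partitioned = True     # the network splits
    store.write_to_a("v2")       # B never sees this write
    try:
        return store.read_from_b()
    except Unavailable:
        return "unavailable"
```

In this sketch `demo("AP")` answers with the stale value, while `demo("CP")` gives up availability instead of returning it.
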
    11. NoSQL: A large family of databases. No schema. No relations enforced. Designed for high scale and distribution. Types of NoSQL DB: key-value, wide column, document, graph
    12. Motivation for NoSQL: Large scale and distribution; simplicity; low cost; good fit with the data model; volume, velocity and variety
    13. What Is No Schema: Some data is structured, and some is not. NoSQL databases do not ENFORCE a schema the way RDBMS systems do. You can still leverage the data's structure by creating indexes and smart queries.
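
Slide 13's point, that structure can still be exploited without an enforced schema, can be sketched in a few lines. This is an illustration only: the records, their field names, and the `build_index` helper are assumptions invented for the example.

```python
# Records of different shapes live side by side; nothing enforces a schema.
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Lin", "role": "admin"},         # no "city" field
    {"id": 3, "title": "Sensor 7", "city": "London"},  # no "name" field
]


def build_index(items, field):
    """Map each value of `field` to the ids of the records that carry it.

    Records without the field are simply skipped.
    """
    index = {}
    for item in items:
        if field in item:
            index.setdefault(item[field], []).append(item["id"])
    return index


city_index = build_index(records, "city")
```

A lookup such as `city_index.get("London", [])` then finds the matching records without scanning every one of them.
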
    14. Types of NoSQL Databases: Key-value; Wide column; Document; Graph
    15. Key-Value: Data is organized as key-value pairs. Query by key and values. Simple indexes (by partition key). Examples: Azure Table Storage, Amazon DynamoDB. Sample table (columns Key1, Key2, Value1 to Value5): Israel, 1234, 1 2 3; France, 2345, 4 5 8
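
The two-key table on slide 15 can be modeled as a dictionary of dictionaries. This is a minimal in-memory sketch, not the Azure Table Storage or DynamoDB API; the function names are hypothetical. The transcript does not show which value columns the numbers belong to, so the values are kept as plain lists.

```python
# partition key -> row key -> list of values
table = {}


def put(partition_key, row_key, values):
    """Store the values under the (partition key, row key) pair."""
    table.setdefault(partition_key, {})[row_key] = values


def get(partition_key, row_key):
    """Point lookup by both keys; returns None when the row is absent."""
    return table.get(partition_key, {}).get(row_key)


def query_partition(partition_key):
    """Return every (row key, values) pair in one partition, sorted by row key."""
    return sorted(table.get(partition_key, {}).items())


# The two sample rows from the slide.
put("Israel", 1234, [1, 2, 3])
put("France", 2345, [4, 5, 8])
```

Lookups here only ever go through the keys, which is the trade-off that keeps this model simple and easy to distribute.
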
    16. Demo: DynamoDB and Azure Tables
    17. Wide Column / Column Families: Data is organized as key-value groups. Data is stored by column. A column family is how the data is stored on disk. Query by key or key range only. No indexes (on some DBs). Examples: Google Bigtable, Cassandra, HBase
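
"Query by key or key range only" on slide 17 can be sketched with row keys kept in sorted order. This is a toy in-memory model and an assumption about how such access could look; it is not how Bigtable, Cassandra, or HBase are implemented, and `WideColumnTable` is a hypothetical name.

```python
import bisect


class WideColumnTable:
    """Rows are addressed by key; keys stay sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup; an unknown key yields an empty set of columns."""
        return self._rows.get(row_key, {})

    def scan(self, start, stop):
        """Return (key, columns) pairs for start <= key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]
```

Because there is no secondary index, any other kind of query would have to walk every row, which is why row-key design matters so much in this family.
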
    18. Example – Cassandra Data Model: Column: a key-value pair. Super Column: a collection of columns. Column Family: a dictionary of columns. Super Column Family: a dictionary of column families
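
The nesting described on slide 18 maps naturally onto nested dictionaries. The sketch below follows the slide's own definitions; the sample names (`profile`, `settings`, `user_data`) and the `read_column` helper are hypothetical and are not part of any Cassandra API.

```python
# Column family: a dictionary of columns (column name -> value).
profile = {"name": "Ada", "city": "Haifa"}
settings = {"language": "en"}

# Super column family: a dictionary of column families, as the slide defines it.
user_data = {"profile": profile, "settings": settings}


def read_column(super_column_family, family, column):
    """Return one column value, or None if the family or column is absent."""
    return super_column_family.get(family, {}).get(column)
```

For instance, `read_column(user_data, "profile", "city")` walks two levels of keys to reach a single value.
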
    19. Demo: Cassandra
    20. Document Database: Data is organized as key-document pairs. Query by key and by document content. Uses indexes. Examples: Mongo, Raven, CouchDB, Couchbase
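
Querying by document content, as slide 20 describes, can be sketched with plain dictionaries. This is an illustration under assumed sample data, not the query language of any of the listed products; `get_path` and `find` are hypothetical helpers.

```python
# key -> document; documents are nested and need not share the same fields.
documents = {
    "order-1": {"customer": {"name": "Ada", "country": "UK"}, "total": 40},
    "order-2": {"customer": {"name": "Lin", "country": "IL"}, "total": 15},
    "order-3": {"customer": {"name": "Sam", "country": "UK"}, "items": ["book"]},
}


def get_path(doc, path):
    """Follow a dotted path such as "customer.country"; None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value):
    """Return the sorted keys of all documents whose `path` equals `value`."""
    return sorted(key for key, doc in docs.items() if get_path(doc, path) == value)
```

This version scans every document; a real document store would typically answer such a query from an index on the queried path.
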
    21. Demo
    22. Graph Databases: Data is organized as elements and relations. Query by relations. Supports complex mathematical graph computations. Examples: Neo4j, Stardog (used for the semantic web)
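
"Query by relations" on slide 22 amounts to following labeled edges. The sketch below is a minimal breadth-first traversal over an assumed edge list; the sample nodes and the `neighbors` and `reachable` helpers are hypothetical and unrelated to the Neo4j or Stardog query languages.

```python
from collections import deque

# Each edge is (source element, relation, target element).
edges = [
    ("ann", "knows", "ben"),
    ("ben", "knows", "cara"),
    ("cara", "worksAt", "acme"),
]


def neighbors(node, relation):
    """Elements reached from `node` by one edge labeled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def reachable(start, relation):
    """All elements reachable from `start` by following `relation` edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, relation):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

A question such as "whom can ann reach through `knows`?" becomes a single traversal here, where a relational model would need a chain of self-joins.
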
    23. RDF and OWL: Triple: Subject – Predicate – Object; defines facts. RDF (Resource Description Framework): defines some extra structure on top of triples. Example: "rdf:type" is used to say that things are of certain types. Schema: defines classes that represent the concepts of subjects, objects, predicates, etc.; enables making statements about classes of things and types of relationships. OWL: adds semantics to the schema; expressed in triples. Example: "A isMarriedTo B" implies "B isMarriedTo A".
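
The isMarriedTo example on slide 23 corresponds to what OWL calls a symmetric property. The sketch below stores facts as plain tuples and applies that one rule by hand. It is a toy illustration, not a reasoner; the individuals `alice` and `bob` and the `infer_symmetric` function are made up for the example.

```python
# Facts as (subject, predicate, object) triples.
triples = {
    ("alice", "isMarriedTo", "bob"),
    ("alice", "rdf:type", "Person"),
    # The schema itself is expressed as a triple.
    ("isMarriedTo", "rdf:type", "owl:SymmetricProperty"),
}


def infer_symmetric(facts):
    """Add (o, p, s) for every (s, p, o) whose predicate is declared symmetric."""
    symmetric = {
        s for (s, p, o) in facts
        if p == "rdf:type" and o == "owl:SymmetricProperty"
    }
    inferred = {(o, p, s) for (s, p, o) in facts if p in symmetric}
    return facts | inferred
```

After inference the set also contains `("bob", "isMarriedTo", "alice")`, while the `rdf:type` facts are left untouched.
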
    24. Demo
    25. Important: There is no one NoSQL solution for all use cases. There are more than 150 possible offerings…
    26. Replication and Sharding: NoSQL databases can span a large cluster. Replication: copy the data to multiple servers; usually each data element is kept in 3 copies (one master, two slaves); result: high availability. Sharding: split the data between servers (horizontal partitioning of the data); result: horizontal scale. Replication and sharding can be done together.
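
Slide 26's combination of sharding and replication can be sketched as a placement function. This is a minimal sketch, assuming hash-based sharding and a "next servers in the list" replica policy; both are illustrative choices rather than the scheme of any particular database, and the function names are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number with a stable hash (sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replicas_for(key, servers, copies=3):
    """Pick the servers holding `key`: one master, then the next servers as copies."""
    first = shard_for(key, len(servers))
    count = min(copies, len(servers))
    return [servers[(first + i) % len(servers)] for i in range(count)]


servers = ["s0", "s1", "s2", "s3", "s4"]
placement = replicas_for("user:42", servers)  # first entry acts as the master
```

Different keys land on different masters, which spreads the load (horizontal scale), while each key still lives on three servers (high availability).
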
    27. The Cloud and NoSQL: All cloud providers have NoSQL solutions: Azure Tables, Google Bigtable, Amazon DynamoDB. NoSQL databases are deployed on a cluster. There is a large number of cloud hosting offerings for NoSQL clusters: MongoHQ (MongoDB), Cassandra on Google Compute Engine, and many more
    28. Example – Mongo in Azure
    29. Check your schema. Be open to using NoSQL data stores. Identify your use case and find the right database for you. Create a simple POC.
    30. Questions