NO SQL Databases, Big Data and the cloud

Summary on NOSQL databases.


  • Consistency: A read sees all previously completed writes. Availability: Reads and writes always succeed. Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others. (https://foundationdb.com/white-papers/the-cap-theorem/) The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?

Transcript

  • 1. Manu Cohen-Yashar The Cloud, Big Data and NoSQL
  • 2. Agenda Data boom Problems with RDBMS No SQL Big Data What’s next
  • 3. Understand NO SQL Types of databases Primary usage Data model Pros and Cons
  • 4. Lots of Data Data doubles every 18 months Pictures Web sites Emails Sensors Geo Information Financial Information Science Art . . . (Infinite list)
  • 5. No Limits With the cloud it is now possible to mount a cluster of any size and run any computation at any scale. The one who makes sense of all available data will rule the world. The conclusion: Use the cloud to analyze data at large scale.
  • 6. Lets Talk about data When we think of data we think of …
  • 7. Data has many forms Yet data comes in many forms and shapes Graphs Documents Time Series Blobs Geo Sensors Unstructured Structured Web
  • 8. Problems with RDBMS Does not scale very well Sharding Replication Models data according to the relational model Is this the best model for all data types? Complex and Expensive Requires a DBA Expensive to buy Oracle SQL
  • 9. No Relational Not all types of data fit well into the relational world. Not all data use cases fit well into the ACID convention. The relational model does not scale very well. Difficult to distribute Difficult to replicate
  • 10. The CAP Theorem RDBMS Replicated NoSQL Sharded NoSQL During a network partition, a distributed system must choose either Consistency or Availability.
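The choice described on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: `Replica`, `PartitionError`, and `read` are hypothetical names invented for the illustration, and no real database API is implied. Two replicas hold the same value, the network splits, and a client writes to one side; a read on the other side must then either return stale data (availability) or refuse to answer (consistency).

```python
class PartitionError(Exception):
    """Raised when a read refuses to answer during a network partition."""


class Replica:
    """One copy of the data on one side of the network (hypothetical model)."""

    def __init__(self, value, version):
        self.value = value
        self.version = version


def read(local, remote, partitioned, prefer_consistency):
    """Read from `local`; `remote` may hold a newer write."""
    if not partitioned:
        # Healthy network: catch up with the peer before answering.
        if remote.version > local.version:
            local.value, local.version = remote.value, remote.version
        return local.value
    if prefer_consistency:
        # Consistency over availability: refuse rather than risk stale data.
        raise PartitionError("cannot confirm the latest write")
    # Availability over consistency: answer, possibly with stale data.
    return local.value


a = Replica("v1", 1)
b = Replica("v1", 1)

# The network partitions, then a client writes "v2" to replica a only.
a.value, a.version = "v2", 2

# An availability-first read on the other side returns the old value.
stale = read(b, a, partitioned=True, prefer_consistency=False)
```

A consistency-first read in the same situation raises `PartitionError` instead of returning, which is the "wait (potentially forever)" branch of the trade-off.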
  • 11. NO SQL Large family of databases No Schema No relations enforced Designed for high scale and distribution Types of NO SQL DB Key Value Wide Columns Documents Graph
  • 12. Motivation for NO SQL Large Scale and Distribution Simplicity Low cost Good fit with the data model Volume, Velocity and Variety
  • 13. What Is No Schema Some data is structured, and some is not. No SQL databases do not ENFORCE a schema the way RDBMS systems do. You can leverage data structure by creating indexes and smart queries.
  • 14. Types of NO SQL Databases Key values Wide column Document Graph
  • 15. Key values Data is ordered as key-value pairs Query by key and values Simple indexes (by partition key) Examples Azure Table Storage Amazon DynamoDB Key1 Key2 Value1 Value2 Value3 Value4 Value5 Israel 1234 1 2 3 France 2345 4 5 8
  • 16. Demo DynamoDB and Azure Tables
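The key-value model from slide 15 can be sketched in memory as follows. `KeyValueTable` is a hypothetical class written for this illustration; it does not use the DynamoDB or Azure Table Storage SDKs. The rows reuse the two keys from the slide's example table, but which value columns each number belongs to is an assumption, since the slide's layout does not survive in the transcript.

```python
class KeyValueTable:
    """Toy table addressed by a partition key plus a row key (hypothetical)."""

    def __init__(self):
        self._partitions = {}

    def put(self, partition_key, row_key, values):
        """Store a bag of values under the two-part key."""
        self._partitions.setdefault(partition_key, {})[row_key] = dict(values)

    def get(self, partition_key, row_key):
        """Point lookup by full key; returns None when the key is absent."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        """Return every row stored under one partition key."""
        return dict(self._partitions.get(partition_key, {}))


table = KeyValueTable()
# Column names Value1..Value3 are assumed for illustration.
table.put("Israel", "1234", {"Value1": 1, "Value2": 2, "Value3": 3})
table.put("France", "2345", {"Value1": 4, "Value2": 5, "Value3": 8})

row = table.get("Israel", "1234")
```

Lookups go through the key only, which is what keeps this model simple to partition across servers.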
  • 17. Wide column / Column Families Data is ordered as key – value groups Store data by column A column family is how the data is stored on the disk Query by key / key range only No Indexes (on some dbs) Examples Google Bigtable Cassandra HBase
  • 18. Example – Cassandra Data Model Column: Key value. Super Column: Collection of columns. Column Family: Dictionary of columns. Super Column Family: Dictionary of Column Families.
  • 19. Demo Cassandra
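The column family idea from slides 17 and 18, together with "query by key or key range only", can be sketched with nested dictionaries. `ColumnFamily` is a hypothetical in-memory class written for this illustration; it is not the Cassandra driver API, and the row keys and column names are made up.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy column family: row key -> {column name: value} (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def insert(self, row_key, column, value):
        """Set one column on one row; rows may have different columns."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Lookup by exact row key."""
        return self._rows.get(row_key, {})

    def get_range(self, start, end):
        """Return rows whose keys fall in the inclusive range [start, end]."""
        keys = sorted(self._rows)
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        return {key: self._rows[key] for key in keys[lo:hi]}


users = ColumnFamily()
users.insert("user:001", "name", "Dana")
users.insert("user:002", "name", "Luc")
users.insert("user:002", "email", "luc@example.com")
users.insert("user:010", "name", "Noa")

first_users = users.get_range("user:001", "user:005")
```

There is no way here to ask "which rows have this email" without scanning, which mirrors the "No Indexes (on some dbs)" point on the slide.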
  • 20. Document Database Data is ordered as a Key – Document Query by key and document content Use indexes Examples Mongo Raven CouchDB Couchbase
  • 21. Demo
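The document model from slide 20 (look up by key, query by document content, use indexes) can be sketched as below. `DocumentStore` is a hypothetical in-memory class for illustration only and does not reflect the API of MongoDB, RavenDB, CouchDB, or Couchbase. It assumes indexed field values are hashable.

```python
class DocumentStore:
    """Toy key -> document store with optional secondary indexes (hypothetical)."""

    def __init__(self, indexed_fields=()):
        self._docs = {}
        self._indexes = {field: {} for field in indexed_fields}

    def save(self, key, doc):
        """Insert or overwrite a document and keep the indexes in sync."""
        old = self._docs.get(key)
        if old is not None:
            # Drop index entries that pointed at the previous version.
            for field, index in self._indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self._docs[key] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)

    def get(self, key):
        """Lookup by key."""
        return self._docs.get(key)

    def find(self, field, value):
        """Query by content: use an index if one exists, otherwise scan."""
        if field in self._indexes:
            keys = self._indexes[field].get(value, set())
            return [self._docs[k] for k in sorted(keys)]
        return [d for d in self._docs.values() if d.get(field) == value]


store = DocumentStore(indexed_fields=["country"])
store.save("u1", {"name": "Dana", "country": "Israel"})
store.save("u2", {"name": "Luc", "country": "France", "age": 41})
store.save("u3", {"name": "Noa", "country": "Israel"})

israelis = [doc["name"] for doc in store.find("country", "Israel")]
```

Queries on `country` use the index, while a query on an unindexed field such as `age` falls back to scanning every document.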
  • 22. Graph databases Data is ordered in elements and relations. Query by relations Supports complicated mathematical graph calculations Examples Neo4j Stardog (used for semantic web)
  • 23. RDF and OWL Triple Subject - Predicate - Object Define facts RDF (Resource Description Framework) Defines some extra structure to triples. Example: "rdf:type" is used to say that things are of certain types. Schema: Defines some classes which represent the concept of subjects, objects, predicates etc. Enables making statements about classes of things, and types of relationships. OWL Adds semantics to the schema. Expressed in triples. Example: "If A isMarriedTo B" then this implies "B isMarriedTo A".
  • 24. Demo
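The triple model from slides 22 and 23 can be sketched with plain tuples. This is a simplified illustration, not an RDF or OWL library: the names `Alice` and `Bob`, and the helpers `infer_symmetric` and `match`, are assumptions made for the example. It shows facts stored as subject, predicate, object triples, a query by relation, and the slide's symmetric `isMarriedTo` rule applied as a single inference step.

```python
# Facts as (subject, predicate, object) triples.
triples = {
    ("Alice", "rdf:type", "Person"),
    ("Bob", "rdf:type", "Person"),
    ("Alice", "isMarriedTo", "Bob"),
}

# Predicates declared symmetric: if A p B holds, then B p A holds too.
symmetric_predicates = {"isMarriedTo"}


def infer_symmetric(facts, symmetric):
    """Return the facts plus the mirrored triple for each symmetric predicate."""
    inferred = set(facts)
    for subject, predicate, obj in facts:
        if predicate in symmetric:
            inferred.add((obj, predicate, subject))
    return inferred


def match(facts, subject=None, predicate=None, obj=None):
    """Query by relation: None acts as a wildcard for that position."""
    return sorted(
        t for t in facts
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


closed = infer_symmetric(triples, symmetric_predicates)
bobs_spouse = match(closed, subject="Bob", predicate="isMarriedTo")
```

The fact that Bob is married to Alice was never stored; it follows from the rule, which is the kind of semantics the slide attributes to OWL.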
  • 25. There is no one NO SQL solution for all use cases Important There are more than 150 possible offerings…
  • 26. Replication and Sharding No SQL databases can span a large cluster Replication Copy the data to multiple servers Usually each data element is copied 3 times One master two slaves Result: High Availability Sharding Split the data between servers Horizontal partitioning of the data Result: Horizontal scale Replication and Sharding can be done together
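How sharding and replication combine can be sketched as a placement function: a hash of the key picks the shard (horizontal partitioning), and the data is then copied to the next servers in the list, giving three copies with one master and two slaves as on the slide. The scheme, the function names `shard_for` and `placement`, and the server names are assumptions for illustration; real databases use their own placement strategies.

```python
import hashlib


def shard_for(key, shard_count):
    """Sharding: map a key to a shard number with a stable hash."""
    # hashlib is used because Python's built-in hash() of strings
    # changes between interpreter runs.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def placement(key, servers, copies=3):
    """Replication: choose one master and (copies - 1) slaves for a key."""
    if copies > len(servers):
        raise ValueError("not enough servers for the requested copies")
    start = shard_for(key, len(servers))
    # Take consecutive servers, wrapping around the end of the list.
    nodes = [servers[(start + i) % len(servers)] for i in range(copies)]
    return {"master": nodes[0], "slaves": nodes[1:]}


servers = ["node0", "node1", "node2", "node3", "node4"]
israel = placement("Israel", servers)
```

Because the hash is stable, every client computes the same placement for a given key without consulting a central directory.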
  • 27. The Cloud and NO SQL All Cloud Providers have NO SQL solutions Azure Tables Google Bigtable Amazon DynamoDB NO SQL Databases are deployed on a cluster There is a large number of cloud hosting offerings for NO SQL clusters MongoHQ (MongoDB) Cassandra on Google Compute Engine Many more
  • 28. Example – Mongo in Azure
  • 29. Check your schema Be open to using NO SQL data stores Identify your use case and find the right database for you Create a simple POC
  • 30. Questions