NoSQL: where, how, and why? Cassandra and MongoDB




An overview of NoSQL databases, with technical details on the data models and architectures of the Apache Cassandra and MongoDB implementations.








Usage Rights

© All Rights Reserved

  • ...
  • The vast majority of applications need some form of data persistence.
  • Why do we need SQL? The relational model is well established. The language has decades of evolution behind it. Referential integrity and transactions (ACID). A rich set of tools. It is what we learned. It is what the market uses.
  • Where do we use SQL (i.e., ACID)? Enterprise applications, travel agencies, Internet banking, online shopping, credit cards, and transactions in general.
  • ...
  • Digital information created, captured, and replicated worldwide. Source: IDC White Paper, "The Diverse and Exploding Digital Universe", 2008.
  • Modern web applications: large volumes of data (Internet scale); high read and write rates; a need for high availability; frequent schema changes. "Social" applications do not require the same levels of consistency as "banking" applications.
  • Scaling existing relational databases is hard. Sharding is one solution, but it makes your RDBMS unusable. It is an operational nightmare.
  • Transactional models. ACID (Atomicity, Consistency, Isolation, Durability): a set of properties that guarantee database transactions are processed reliably. BASE (Basically Available, Soft state, Eventual consistency): as opposed to the database concept of ACID; see http://queue.acm.org/detail.cfm?id=1394128. Eventually Consistent: http://queue.acm.org/detail.cfm?id=1466448
  • The NoSQL movement. NoSQL = Not Only SQL (http://nosql-database.org/). Databases that differ from the classic relational model: non-relational, distributed, horizontally scalable, with flexible schemas, replicable, with simple APIs, and BASE (rather than ACID).
  • Wide Column Store: Bigtable (Google), SimpleDB (AWS), Cassandra (Apache), HBase (Apache), Hypertable. Document Store: CouchDB (Apache), MongoDB. Key-Value Store: Riak, Redis, Table Storage (Microsoft Azure).
  • ...
  • Eric Brewer (UCB) presented the CAP theorem in 2000. It states that, of three properties of shared-data systems (C: data consistency; A: system availability; P: tolerance to network partition), only two can be achieved at any given time. A more formal confirmation can be found in a 2002 paper by Seth Gilbert and Nancy Lynch.
  • CA – Corruption possible if live nodes can’t communicate (network partition)
  • CP – Completely inaccessible if any nodes are dead
  • AP – Always available, but reads may not always return the most recent write. NoSQL chooses AP, but makes consistency configurable.
  • Cassandra highlights ● High availability ● Incremental scalability ● Eventually consistent ● Super fast writes ● Tunable tradeoffs between consistency and latency ● Minimal administration ● No SPOF (single point of failure)
  • Developed at Facebook (for the inbox search problem). Follows the Bigtable (Google) data model, which is column oriented: http://labs.google.com/papers/bigtable.html. Follows the Dynamo (Amazon) eventual consistency model: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html. Open-sourced in 2008, accepted into the Apache Incubator in 2009, and promoted to a top-level Apache project in 2010. Implemented in Java.
  • Keyspace: a grouping of column families (~database). Column Family: a grouping of columns with a fixed ordering (~table). Row Key: the key that identifies a row of columns (~primary key). Column: the representation of a value, with a Name, a Value, and a Timestamp.
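The hierarchy on this slide can be pictured as nested maps. The following is a minimal sketch in plain Python, not Cassandra's API: `Column`, `write_column`, and `read_row` are hypothetical names, and the rule that the column with the newest timestamp wins is an assumption about how the timestamp is used, which the slide itself does not state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # writer-supplied clock value (assumed to be comparable)


# keyspace: column family name -> row key -> column name -> Column
Keyspace = Dict[str, Dict[str, Dict[str, Column]]]


def write_column(ks: Keyspace, family: str, row_key: str, column: Column) -> None:
    """Store a column, keeping whichever version has the newest timestamp."""
    row = ks.setdefault(family, {}).setdefault(row_key, {})
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


def read_row(ks: Keyspace, family: str, row_key: str) -> List[Column]:
    """Return the row's columns sorted by name (one possible fixed ordering)."""
    row = ks.get(family, {}).get(row_key, {})
    return [row[name] for name in sorted(row)]
```

Note that rows in the same column family need not share the same set of columns, which is where the flexible schema comes from.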
  • ...
  • ...
  • Similar to the SQL language, but with Cassandra-specific features: consistency levels, TTL, slices
  • The importance of flexible schemas
  • Every node is equal! Always at least one copy in each datacenter Alternate datacenters on the ring DHT (Distributed Hash Table) Ring
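A DHT ring of equal nodes is commonly built with consistent hashing. The sketch below is an illustration under stated assumptions, not Cassandra's partitioner: the `Ring` class, the MD5-based `token` function, and the "walk clockwise to the next nodes" replica rule are all hypothetical choices, and datacenter-aware placement is left out.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A ring of equal nodes; assumes at least one node."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, n: int = 3) -> List[str]:
        """Return the first n nodes found walking clockwise from the key's token."""
        start = bisect.bisect_right(self._tokens, token(key))
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

A useful property of this layout is that removing a node only affects keys whose replica set included that node; every other key keeps the same owners.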
  • Eventual consistency ● Synchronous to Washington, asynchronous to Hong Kong. Client API tunables ● Confirm that R replicas match at read time ● Synchronously write to W replicas out of N total replicas. Allows for almost-strong consistency ● when W + R > N
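The W + R > N rule works because any set of R replicas read must then share at least one replica with the W replicas that acknowledged the latest write. A minimal sketch of that check follows; the function names `read_write_sets_overlap` and `quorum` are hypothetical and not part of any client API.

```python
def read_write_sets_overlap(n: int, w: int, r: int) -> bool:
    """True when every read set of size r must intersect every write set of size w."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1
```

With N = 3, writing and reading at quorum (W = R = 2) satisfies the rule, while W = R = 1 trades that guarantee for lower latency.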
  • A gossip protocol (~P2P) is used for cluster membership and failure detection on nodes. It enables seamless addition of nodes, rebalancing of keys, and fast detection of nodes that go down. Every node knows about all others, with no master. State is disseminated in O(log N) rounds.
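The O(log N) figure can be made plausible with a toy simulation. This is a simplified push-style gossip, assumed for illustration only and not Cassandra's actual protocol: in each round, every node that already holds the state sends it to one peer chosen uniformly at random. The function name `gossip_rounds` is hypothetical.

```python
import random


def gossip_rounds(n_nodes: int, seed: int = 0) -> int:
    """Count rounds until a piece of state starting at node 0 reaches all nodes."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        # Each informed node pushes the state to one randomly chosen peer.
        targets = {rng.randrange(n_nodes) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds
```

Because the informed set can at most double per round, 1024 nodes need at least 10 rounds; in practice the count stays within a small multiple of log N rather than growing linearly with the cluster size.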
  • * Data from early versions of Cassandra (v0.6). Version 1.0 brought improvements of 40% in writes and 400% in reads! http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
  • MongoDB is a powerful, flexible, and scalable data store. It combines the ability to scale out with many of the most useful features of relational databases, such as secondary indexes, range queries, and sorting. MongoDB is also incredibly featureful: it has tons of useful features such as built-in support for MapReduce-style aggregation and geospatial indexes. The name comes from "humongous" (slang): extraordinarily large.
  • MongoDB basic concepts: • A document is the basic unit of data, roughly equivalent to a row in an RDBMS; it is an ordered set of keys with associated values (i.e., a map, hash, or dictionary) • Similarly, a collection can be thought of as the schema-free equivalent of a table • A single instance of MongoDB can host multiple independent databases, each of which can have its own collections and permissions
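These three concepts map naturally onto ordinary Python containers. The sketch below uses plain dicts and lists rather than a MongoDB driver; `matches` and `find` are hypothetical helpers that imitate an exact-match, query-by-example lookup, and the sample data is invented for illustration.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def matches(document: Document, query: Document) -> bool:
    """True when every key in the query is present in the document with an equal value."""
    return all(k in document and document[k] == v for k, v in query.items())


def find(collection: List[Document], query: Document) -> List[Document]:
    """Return the documents in the collection that match the query."""
    return [doc for doc in collection if matches(doc, query)]


# One instance hosting a database ("blog") with one collection ("posts").
# Documents in the same collection do not have to share the same keys.
databases = {
    "blog": {
        "posts": [
            {"_id": 1, "author": "ana", "title": "NoSQL", "tags": ["db", "nosql"]},
            {"_id": 2, "author": "bruno", "title": "CAP"},
            {"_id": 3, "author": "ana", "title": "Sharding", "draft": True},
        ]
    }
}
```

For example, `find(databases["blog"]["posts"], {"author": "ana"})` selects the first and third documents even though they carry different sets of keys.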
  • With Mongo, you do less "normalization" than you would perform designing a relational schema because there are no server-side joins . Generally, you will want one database collection for each of your top level objects.
  • JSON-style documents with dynamic schemas offer simplicity and power.
  • Rich, document-based queries.
  • Index on any attribute, just like you're used to; an index can make queries orders of magnitude faster. A geospatial index is also provided. MapReduce: a method of aggregation that can be easily parallelized across multiple servers. A capped collection is created with a fixed size and behaves like a circular queue. GridFS is a mechanism for storing large binary files in MongoDB, so there is no need for a separate file storage architecture. JavaScript can be executed on the server.
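The circular-queue behavior of a capped collection can be imitated with a bounded deque. This is a sketch, not MongoDB's implementation: the `CappedCollection` class is hypothetical, and it is bounded by a document count for simplicity, whereas a real capped collection is sized in bytes.

```python
from collections import deque
from typing import Any, Dict, List


class CappedCollection:
    """Fixed-capacity store: once full, each insert evicts the oldest document."""

    def __init__(self, max_documents: int) -> None:
        self._docs = deque(maxlen=max_documents)

    def insert(self, document: Dict[str, Any]) -> None:
        self._docs.append(document)

    def find_all(self) -> List[Dict[str, Any]]:
        """Return the surviving documents in insertion order."""
        return list(self._docs)
```

This shape suits data such as recent log entries, where only the latest records matter and old ones can be dropped without an explicit cleanup job.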
  • Sharding is the process of splitting data up and storing different portions of the data on different machines. Sharding is MongoDB’s approach to scaling out . Sharding allows you to add more machines to handle increasing load and data size without affecting your application. Scale horizontally without compromising functionality.
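Splitting data across machines needs a rule that maps each record to a shard. The following sketch assumes range-based partitioning on a single shard key; the `ShardRouter` class, the split points, and the shard names are hypothetical, and the routing and metadata services of a real deployment are left out.

```python
import bisect
from typing import List


class ShardRouter:
    """Route a shard key to the shard that owns its key range."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self._split_points = list(split_points)
        self._shards = list(shards)

    def shard_for(self, shard_key: str) -> str:
        # Ranges are [min, s0), [s0, s1), ..., [s_last, max).
        index = bisect.bisect_right(self._split_points, shard_key)
        return self._shards[index]
```

With split points `["h", "q"]` and three shards, keys below "h" go to the first shard, keys from "h" up to but not including "q" go to the second, and the rest go to the third. Adding capacity then means adding a split point and a shard rather than changing application code.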
  • From the application’s point of view, a sharded setup looks just like a nonsharded setup. There is no need to change application code when you need to scale. To set up sharding with no points of failure , you’ll need the following: • Multiple config servers • Multiple mongos servers • Replica sets for each shard • w set correctly
  • The great rupture: IMS vs. RDBMS (the invention of the relational model)
  • The second rupture: RDBMS vs. NoSQL
  • Conclusion: NoSQL can't do everything. Use the right tool for the right job.
