Transcript of "Big Data, NoSQL with MongoDB and Cassandra"

1. Big Data and NoSQL with MongoDB & Cassandra
   NOSQL Intro with MongoDB and Cassandra
2. Brian Enochson
   • Software engineer who has worked as a designer/developer on NoSQL (Mongo, Cassandra, Hadoop)
   • Specializes in software development, architecture and training
   brian.enochson@gmail.com
   Twitter @benochso
   Google Plus https://plus.google.com/+BrianEnochson
3. • Presentation Intro
   • Introduction to Big Data
   • Introduction to NoSQL
   • Relational Database to NoSQL technology: contrast & compare
   • NoSQL landscape
4. • Introduction to MongoDB
   • MongoDB components, capabilities and common use cases
   • JSON & BSON
   • Documents, collections, references and Mongo ID
   • Querying
   • Data Modeling/Schema Design
   • Replication & Sharding
5. • Cassandra
   • Architecture
   • Data Model
   • Data Modeling
   • Application Development
   • Wrap-up and final Q & A
6. http://www.cloudtweaks.com/2014/01/hand-writing-data-data-everywhere-but-lets-juststop-and-think/
7. Why are databases like Mongo or Cassandra needed?
   To understand this, one needs to:
   • look at the history of databases
   • look at how systems were built in the past
   • then examine modern applications
     • Web scale
     • Data acquisition
     • Other factors, like the cost of H/W
8. • 1960s – Hierarchical and network databases (IMS and CODASYL)
   • 1970s – Beginnings of the theory behind the relational model (Codd)
   • 1980s – Rise of the relational model. SQL. E/R model (Chen)
   • 1990s – Access/Excel and MySQL. Object databases (ODBMS) began to appear
   • 2000s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
   • 2010s – Emergence of NoSQL as an industry player and viable alternative
9. • Developers today are faced with Internet scale
     • 100,000s of users
     • Low cost of storage
     • Increased processing power
     • Ability (and need) to capture millions of events
     • Caching solves this to an extent, but brings other complexities
   • Real-time
   • Need to scale out, not up (add more low-cost machines vs. replacing a machine with a more powerful one)
   • Cost
     • For enterprise DBs, Internet scale can become expensive
     • Open source DBs may solve license costs, but don't ignore operational costs
10. Some facts from http://www.storagenewsletter.com/rubriques/market-reportsresearch/ibm-cmo-study/
    • Approximately 90 percent of all the real-time information being created today is unstructured data
    • Every day we create 2.5 quintillion bytes of data (one quintillion is 10 to the 18th, a 1 followed by 18 zeroes)
    • 90 percent of the world's data today has been created in the last two years alone
11. • Relational
      • Divide data into tables, relate them through foreign keys, enforce DB constraints, normalize the data; the interface is SQL
    • NoSQL
      • Store data in a schemaless format; redundancy is encouraged; application access (your queries) determines the storage format. The interface varies and is optimized for the implementation; there are no forced DB constraints.
12. "Luckily, due to the large number of compromises made when attempting to scale their existing relational databases, these tradeoffs were not so foreign or distasteful as they might have been."
    – Greg Burd, https://www.usenix.org/legacy/publications/login/2011-10/openpdfs/Burd.pdf
13. • Eventual consistency
    • The application has increased responsibility, such as maintaining consistency and handling transactions
    • Store redundant data
14. The driving force requiring new technology is often referred to as the "3 V's":
    • Volume – amount of data
    • Variety – range of data types and sources
    • Velocity – speed of data in and out
15. NoSQL != Big Data
    • NoSQL products were created to help solve the big data problem.
    • Big data is a much larger problem than just storage. Analysis tools like Hadoop, messaging systems like Kafka, real-time processing engines like Storm and machine learning (Mahout) all help solve the big data problem.
16. • Document DB
      • MongoDB, CouchDB
    • Wide Column / Column Family
      • Cassandra, HBase, Amazon SimpleDB
    • Key Value
      • Riak, Redis, DynamoDB, Voldemort, MemcacheDB
    • Graph
      • Neo4J, OrientDB
    • Search (search can also be a persistence store)
      • Lucene, Solr, ElasticSearch
    • Many, many more! (http://nosql-database.org/)
17. Choosing the right NoSQL type, and eventually the product, depends on…
    • Type of data
      • One key and a lot of data?
      • Schema variance?
      • High volume of data?
      • Storing media, blobs?
      • Document oriented?
      • Tracking relationships?
      • Combination?
      • Multi-datacenter?
    • Type of access
    • Volumes of data (there is big data and there is BIG DATA)
    • Need/want for support, services, training
18. • ACID
    • CAP Theorem
    • BASE
19. You have probably heard of ACID
    • Atomicity – all or nothing
    • Consistency – what is written is valid (constraints hold)
    • Isolation – concurrent operations behave as if run one at a time
    • Durability – once committed to the DB, it stays
    This is the world we have lived in for a long time…
20. Many may have heard of this one.
    CAP stands for Consistency, Availability and Partition Tolerance
    • Consistency – every node sees the same data at the same time, so a read returns the most recent write (related to, but not the same as, the C in ACID)
    • Availability – the service is available; every request gets a response
    • Partition Tolerance – no failure short of a complete network failure causes the system to stop responding correctly
    ** http://www.cs.berkeley.edu/~brewer/cs262b2004/PODC-keynote.pdf
21. In practice you can have only 2 of the 3. A system that chooses Availability and Partition Tolerance settles for Eventual Consistency.
23. • So we are talking about large amounts of data
    • High velocity of acquisition
    • A lot of variety that we need to store; we will worry later about how to handle it (or not)
    • Need to scale without breaking the bank
    • Want the database to support agile development, not hinder it
24. • Maybe consider going relational if the system is
      • Highly transactional (FoundationDB?)
      • A Business Intelligence system (Hadoop may make this not true)
    • Don't be fooled by fear of losing ACID… http://highscalability.com/blog/2013/5/1/myth-eric-brewer-onwhy-banks-are-base-not-acid-availability.html
25. And now let’s look at MongoDB
26. http://db-engines.com/en/ranking_definition
27. A few high-level points
    • Document oriented
    • Storage format is JSON (actually BSON)
    • Replication built in
    • Master/slave architecture
    • Strong querying support
    • Name comes from "humongous"
28. • Open source
    • Schemaless
    • Scalable
    • Document-level atomicity
    • Easy installation
    • Relative ease of use
    • Great (!!!!) documentation
29. • No cross-document transactions
    • No joins
    • Replication – master/slave
    • Sharding
30. * Credit – Dwight Merriman, Founder and CEO – MongoDB (was 10gen)
31. Master Slave and Secondary Reads
    ** http://docs.mongodb.org/manual/core/replication-introduction/
32. • Primary
      • Receives all write requests
      • A replica set can have only one primary
      • Mongo stores all changes in the oplog
    • Secondary
      • Replicates the primary's oplog
      • Clients can prefer to read from secondaries
      • If the primary goes down, a new primary is elected (after 10 seconds with no response)
33. http://docs.mongodb.org/manual/core/sharding-introduction/
34. • Shards
      • Store the data; in production each shard is normally a replica set
    • Routers (mongos)
      • Route client operations to shards based on the shard key; there can be more than one for availability
      • The shard key is range based or hashed
    • Config Servers
      • Contain cluster metadata
      • In production there are 3 config servers
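The two shard key styles can be compared with a toy router. This is a hypothetical illustration, not MongoDB's code. The chunk boundaries and shard names are made up, and `hashlib.md5` only stands in for MongoDB's internal hash of the shard key value.

```python
import bisect
import hashlib

# Hypothetical chunk boundaries for a range-sharded string key.
# Chunk i covers [BOUNDARIES[i-1], BOUNDARIES[i]); the first and last chunks are open-ended.
BOUNDARIES = ["g", "n", "t"]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # one shard per chunk in this toy setup


def route_by_range(key: str) -> str:
    """Range-based: find the chunk whose key range contains `key`."""
    return SHARDS[bisect.bisect_right(BOUNDARIES, key)]


def route_by_hash(key: str) -> str:
    """Hashed: hash the key first so that neighbouring keys scatter across shards."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]


for name in ["alice", "bob", "mallory", "zoe"]:
    print(name, route_by_range(name), route_by_hash(name))
```

Range-based keys keep neighbouring values on the same shard, which suits range queries. Hashing spreads them out, which evens out writes for monotonically increasing keys. Real hashed sharding still assigns ranges of hash values to chunks; the modulo here is only a shortcut to keep the sketch short.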
35. • At its simplest, Mongo is a document-oriented database
    • MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs.
    • MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, and it supports more data types than JSON does.
    ** For in-depth BSON information, see bsonspec.org.
36. {
      "_id": "52a602280f2e642811ce8478",
      "ratingCode": "PG13",
      "country": "USA",
      "entityType": "Rating"
    }
38. Documents have the following rules:
    • The maximum BSON document size is 16 megabytes.
    • The field name _id is reserved for use as a primary key; its value must be unique in the collection.
    • Field names cannot start with the $ character.
    • Field names cannot contain the . character.
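These rules can be checked client-side before an insert. The helpers below are hypothetical and are not part of any MongoDB driver. The size check is only approximate: it measures the JSON encoding, which is a rough stand-in for the real BSON byte count. Note also that newer MongoDB releases relax the `$` and `.` restrictions; the checks mirror the rules as listed here.

```python
import json

MAX_BSON_BYTES = 16 * 1024 * 1024  # 16 MB per-document limit


def field_name_errors(doc, path=""):
    """Recursively collect field names that start with '$' or contain '.'."""
    errors = []
    for name, value in doc.items():
        full = f"{path}.{name}" if path else name
        if name.startswith("$"):
            errors.append(f"{full}: starts with '$'")
        if "." in name:
            errors.append(f"{full}: contains '.'")
        # Embedded documents, and documents inside arrays, follow the same rules.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                errors.extend(field_name_errors(child, full))
    return errors


def roughly_within_size_limit(doc):
    """Approximation only: the JSON length is used as a stand-in for the real BSON size."""
    return len(json.dumps(doc).encode("utf-8")) <= MAX_BSON_BYTES


def duplicate_ids(docs):
    """_id must be unique within a collection; return every _id value seen more than once."""
    seen, dups = set(), set()
    for doc in docs:
        if "_id" not in doc:
            continue  # the driver or server would generate an _id for these
        if doc["_id"] in seen:
            dups.add(doc["_id"])
        seen.add(doc["_id"])
    return dups


rating = {"_id": "52a602280f2e642811ce8478", "ratingCode": "PG13",
          "country": "USA", "entityType": "Rating"}
print(field_name_errors(rating), roughly_within_size_limit(rating))
```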
39. • Windows: http://docs.mongodb.org/manual/tutorial/installmongodb-on-windows/
    • Mac: http://docs.mongodb.org/manual/tutorial/installmongodb-on-os-x/
    • Create the data directory. Defaults:
      • C:\data\db
      • /data/db/ (make sure you have permissions)
    • Or set it using --dbpath:
      C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
40. • Database: mongod
    • Shell: mongo
      show dbs
      show collections
      db.stats()
41. 1_simpleinsert.txt
    • Insert
    • Find
    • Find all
    • Find one
    • Find with criteria
    • Indexes
    • explain()
42. 2_arrays_sort.txt
    • Embedded documents
    • Limit, sort
    • Using regex in a query
    • Removing documents
    • Drop collection
43. 3_imp_exp.txt
    Mongo provides tools for getting data in and out of the database:
    • Data can be exported to JSON files (mongoexport)
    • JSON files can then be imported (mongoimport)
44. 4_cond_ops.txt
    • $lt
    • $gt
    • $gte
    • $lte
    • $or
    • Also $not, $exists, $type, $in
      (for $type refer to http://docs.mongodb.org/manual/reference/operator/query/type/#_S_type )
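To make the semantics of these operators concrete, here is a toy matcher for a subset of them ($lt, $gt, $gte, $lte, $in, $exists, $not, $or) over plain Python dicts. It is an illustration only and leaves out much of real MongoDB matching: array fields, cross-type ordering of BSON values, dotted paths and $type. The `movies` data is hypothetical.

```python
import operator

# Comparison operators supported by this toy matcher, mapped to Python functions.
COMPARISONS = {"$lt": operator.lt, "$gt": operator.gt, "$lte": operator.le, "$gte": operator.ge}
_MISSING = object()  # marks a field that is absent from the document


def match_condition(value, cond):
    """Check one field value against a literal or an operator document like {"$gte": 5}."""
    if not isinstance(cond, dict):
        return value == cond  # plain equality, e.g. {"country": "USA"}
    for op, arg in cond.items():  # several operators on one field are ANDed
        if op in COMPARISONS:
            if value is _MISSING or not COMPARISONS[op](value, arg):
                return False
        elif op == "$in":
            if value is _MISSING or value not in arg:
                return False
        elif op == "$exists":
            if (value is not _MISSING) != bool(arg):
                return False
        elif op == "$not":
            if match_condition(value, arg):
                return False
        else:
            raise ValueError(f"unsupported operator {op}")
    return True


def matches(doc, query):
    """True if doc satisfies every top-level clause; $or needs at least one sub-query to match."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif not match_condition(doc.get(key, _MISSING), cond):
            return False
    return True


movies = [
    {"title": "A", "year": 1999, "rating": "PG13"},
    {"title": "B", "year": 2005, "rating": "R"},
    {"title": "C", "year": 2012},
]
print([m["title"] for m in movies if matches(m, {"year": {"$gte": 2000, "$lt": 2010}})])
```

In the mongo shell the same filter would be written `db.movies.find({ year: { $gte: 2000, $lt: 2010 } })`, with `movies` as a hypothetical collection name.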
45. • Aggregation Framework
      • Uses a pipeline model to perform a series of operations on data. A common pattern is a match stage (selection) followed by a grouping stage (creates the result)
    • Map Reduce
      • Two phases:
      • Map, which emits one or more key/value pairs from each input document
      • Reduce, which combines the output of Map into a result
      • Finalize – an optional step that can apply further logic to the reduce output (e.g. computing an average from reduced totals)
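Both approaches can be sketched in pure Python. The code does not call MongoDB. The sample documents, field names and helper functions are hypothetical. The first function plays the role of a `$match` stage followed by a `$group` stage; the second runs map, then reduce per key, then an optional finalize.

```python
from collections import defaultdict

# Hypothetical input documents.
DOCS = [
    {"country": "USA", "ratingCode": "PG13", "score": 7},
    {"country": "USA", "ratingCode": "R", "score": 9},
    {"country": "UK", "ratingCode": "PG13", "score": 6},
    {"country": "USA", "ratingCode": "PG13", "score": 8},
]


def pipeline(docs):
    """Like [{$match: {country: "USA"}}, {$group: {_id: "$ratingCode", total: {$sum: "$score"}}}]."""
    matched = (d for d in docs if d["country"] == "USA")  # match stage: selection
    totals = defaultdict(int)
    for d in matched:                                       # group stage: build the result
        totals[d["ratingCode"]] += d["score"]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """Map every document, reduce the emitted values per key, then optionally finalize."""
    emitted = defaultdict(list)
    for d in docs:
        for key, value in map_fn(d):  # map may emit several key/value pairs per document
            emitted[key].append(value)
    result = {key: reduce_fn(key, values) for key, values in emitted.items()}
    if finalize_fn is not None:
        result = {key: finalize_fn(key, value) for key, value in result.items()}
    return result


def map_score(doc):
    yield doc["ratingCode"], {"sum": doc["score"], "count": 1}


def reduce_scores(key, values):
    # Returns the same shape it receives, so it could also be re-applied to partial results.
    return {"sum": sum(v["sum"] for v in values), "count": sum(v["count"] for v in values)}


def finalize_avg(key, value):
    return value["sum"] / value["count"]


print(pipeline(DOCS))
print(map_reduce(DOCS, map_score, reduce_scores, finalize_avg))
```

MongoDB may call reduce more than once on partial results for the same key. For that reason the sketch's reduce returns the same shape as the values it receives; the average is computed only in finalize.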
46. 5_admin.txt
    • show dbs
    • show collections
    • db.stats()
    • db.posts.stats()
    • db.posts.drop()
    • db.system.indexes.find()
47. • Remember, with NoSQL redundancy is not evil
    • Applications ensure consistency, not the DB
    • Applications join data; joins are not defined in the DB
    • The data model is schema-less
    • The data model is usually built to support queries
48. • What are your basic units of data (what would be a document)?
    • How are these units grouped / related?
    • How does Mongo let you query this data, and what are the options?
    • Finally, and maybe most importantly, what are your application's access patterns?
      • Reads vs. writes
      • Queries
      • Updates
      • Deletions
      • How structured is it?
49. Normalized
    • Similar to the relational model
    • One collection per entity type
    • Little or no redundancy
    • Allows clean updates, familiar to many SQL users, easier to understand
51. • From parent to child
      { name: "O'Reilly Media", books: [123456789, 234567890, ...] }
    • From child to parent
      { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" }
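Mongo does not join references for you, so the application follows them. The sketch below models two collections as in-memory dicts keyed by `_id`. The publisher `_id` "oreilly" follows the child-to-parent example. The second book title is made up, and with a real driver each lookup would be a separate query.

```python
# Two "collections" modelled as dicts keyed by _id (hypothetical data following the example).
publishers = {
    "oreilly": {"_id": "oreilly", "name": "O'Reilly Media", "books": [123456789, 234567890]},
}
books = {
    123456789: {"_id": 123456789, "title": "MongoDB: The Definitive Guide", "publisher_id": "oreilly"},
    234567890: {"_id": 234567890, "title": "Hypothetical Second Title", "publisher_id": "oreilly"},
}


def books_of(publisher_id):
    """Parent to child: follow the array of book ids stored on the publisher."""
    return [books[book_id] for book_id in publishers[publisher_id]["books"]]


def publisher_of(book_id):
    """Child to parent: follow the publisher_id stored on the book."""
    return publishers[books[book_id]["publisher_id"]]


print([b["title"] for b in books_of("oreilly")])
print(publisher_of(123456789)["name"])
```

Storing the reference in both directions makes both lookups cheap, but every new book then means updating two documents. Without cross-document transactions, the application has to keep the two sides consistent.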
52. • A frequently used pattern in Mongo is to embed related information as subdocuments.
    • Used when there is a "contains" relationship
    • Easier querying (when related data is often used together)
    • Need to keep the 16 MB document size limit in mind
54. Many or few collections
    • Many collections
      • As seen in the normalized model
      • Clean, little redundancy
      • May not provide the best performance
      • May require frequent application updates when new types are added
    • Multiple collections
      • Middle ground, partially normalized
      • Not many collections
    • One large generic collection
      • Contains many types
      • Uses a type field
55. • Document growth – a document is relocated on disk if it grows beyond its allocated size
    • Atomicity
      • Atomic at the document level
      • A consideration for inserts, removes and multi-document updates
    • Sharding – collections are distributed across mongod instances using a shard key
    • Indexes – index frequently queried fields; indexes slightly affect write performance
    • Consider using TTL (time-to-live) to automatically expire documents
56. • CMS systems
    • Log collection
      • https://code.google.com/p/log4mongo/
    • Caching
    • Queues / messaging
    • Capped collections – fixed-size collections that support high-throughput operations that insert, retrieve, and delete documents based on insertion order
    • Analytics
    • Prototyping
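A capped collection behaves like a fixed-size buffer kept in insertion order: once it is full, the oldest documents are overwritten. The analogy below uses Python's `collections.deque(maxlen=...)`. It is not how MongoDB implements capped collections, which are capped primarily by size in bytes, with an optional maximum document count. In the shell, one is created with something like `db.createCollection("log", { capped: true, size: 100000 })`, where the collection name and size are hypothetical.

```python
from collections import deque


class CappedLog:
    """Toy fixed-size log: keeps only the newest max_docs documents, in insertion order."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)  # a full deque drops its oldest item on append

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Natural (insertion) order, oldest first."""
        return list(self._docs)

    def tail(self, n):
        """The newest n documents, newest first."""
        return list(reversed(self._docs))[:n]


log = CappedLog(3)
for i in range(5):
    log.insert({"seq": i})
print(log.find())
```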
57. • Mongo Driver
      • Supplied by MongoDB itself
      • Easy to set up
      • Hosted on the Maven repository
    • Morphia
      • Works with your application model
      • Handles references well
    • Spring Mongo
      • Great if you are already using Spring
58. • Node
      • JavaScript (JSON), CoffeeScript
      • MEAN stack
    • Scala
      • Casbah
      • ReactiveMongo
59. Get MEAN
    • Mongo, Express, Angular and Node
    • http://bitnami.com/stack/mean
    • http://mean.io
    • Can be installed in a VM or even in the cloud
60. Database in the cloud
    • https://mongolab.com/
    • Can be accessed using the shell, a GUI Mongo explorer, mongoimport and mongoexport, and used from applications
    • Runs on Amazon, Rackspace, Joyent or Azure
61. • MongoDB: The Definitive Guide, 2nd Edition. Kristina Chodorow. O'Reilly Media, Inc., May 23, 2013. ISBN-13: 978-1-4493-4468-9. 432 pages.
    • MongoDB in Action. Kyle Banker. Manning Publications, December 16, 2011. ISBN-10: 1-935182-87-0; ISBN-13: 978-1-935182-87-0. 312 pages.
    • The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing. Eelco Plugge, Peter Membrey, Tim Hawkins. Apress, September 2010. ISBN: 9781430230519. 327 pages.
62. • MongoDB Applied Design Patterns. Rick Copeland. O'Reilly Media, Inc., March 18, 2013. ISBN-13: 978-1-4493-4004-9. 176 pages.
    • MongoDB for Web Development (rough cut!). Mitch Pirtle. Addison-Wesley Professional. Last updated June 14, 2013; estimated publication date March 11, 2015. ISBN-10: 0-321-70533-5; ISBN-13: 978-0-321-70533-4. 360 pages.
    • Instant MongoDB. Amol Nayak. Packt Publishing, July 26, 2013. ISBN-13: 978-1-78216-970-3. 72 pages.
63. • http://www.mongodb.org/
    • https://mongolab.com/welcome/
    • https://education.mongodb.com/
    • http://blog.mongodb.org/
    • http://stackoverflow.com/questions/tagged/mongodb
    • http://bitnami.com/stack/mean
64. Let’s look briefly at Cassandra as an alternative to Mongo
65. • Developed at Facebook, based on Google Bigtable and Amazon Dynamo **
    • Open sourced in mid 2008
    • Apache project in March 2009
    • Commercial support through DataStax (originally known as Riptano, founded 2010)
    • Used at Netflix, eBay and many more. The largest installation is reportedly 300 TB on 400 machines
    • Current version (at the time of this presentation) is 2.0.3
66. • No single point of failure – highly available
      • Peer to peer – no master
    • Data center aware – distributed architecture
    • Linear scaling – just add hardware
    • Eventual consistency, with a tunable tradeoff between latency and consistency
    • Architecture is optimized for writes
    • A row can have up to 2 billion columns (cells)!
    • Data modeling is for reads: design starts with looking at your queries (sound familiar?)
    • With CQL it became more SQL-like, but there are no joins, no subqueries and only limited (but very useful) ordering
    • Column names can be part of the data, e.g. for time series
67. ** Important Term **
    Quorum: Q = floor(N / 2) + 1
    We get consistency in a BASE world by satisfying W + R > N.
    3 obvious ways:
    1. W = 1, R = N
    2. W = N, R = 1
    3. W = Q, R = Q
    (N is the replication factor, R = number of replicas read, W = number of replicas written)
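The reasoning behind W + R > N is a pigeonhole argument. If a write reached W of the N replicas and a read consults R of them, with W + R > N, the two sets must share at least one replica, so the read sees the latest write. A minimal arithmetic check follows; the function names are illustrative only.

```python
def quorum(n: int) -> int:
    """Q = floor(N / 2) + 1: a strict majority of the N replicas."""
    return n // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every possible read set must overlap every possible write set."""
    return w + r > n


N = 3
for w, r in [(1, N), (N, 1), (quorum(N), quorum(N))]:
    print(f"N={N} W={w} R={r} consistent={read_sees_latest_write(N, w, r)}")
```

With W = Q and R = Q the condition always holds, because 2 * (floor(N / 2) + 1) is at least N + 1. This is why QUORUM for both reads and writes is the usual balanced choice.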
68. The C* data model is made of these:
    • Column – a name, a value and a timestamp. Applications can use the name as the data and leave the value unused. (Like an RDBMS column.)
    • Row – a collection of columns identified by a unique key, called the partition key. (Like an RDBMS row.)
    • Column Family – container for an ordered collection of rows. Each row is an ordered collection of columns; each column has a name and possibly a value. (Like an RDBMS table.) In current C* terminology this is also called a table.
    • Keyspace – administrative container for column families; it is a namespace. It also has a replication strategy (more on that later). (Like an RDBMS database or schema.)
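One way to picture this model is as nested maps. A keyspace maps table names to tables; a table maps partition keys to rows; a row keeps its columns sorted by name, and each column holds a (value, timestamp) pair. The Python sketch below is a mental model only, not Cassandra's storage format, and the table and sensor names are hypothetical. It uses time strings as column names, the time-series pattern in which the column name carries the data.

```python
import bisect


class Row:
    """One partition (row): cells kept sorted by column name, each a (value, timestamp) pair."""

    def __init__(self):
        self._names = []  # column names in sorted order
        self._cells = {}  # column name -> (value, write timestamp)

    def put(self, name, value, ts):
        """Write a cell; an older timestamp never overwrites a newer one (ties go to the later call)."""
        current = self._cells.get(name)
        if current is not None and current[1] > ts:
            return
        if current is None:
            bisect.insort(self._names, name)
        self._cells[name] = (value, ts)

    def slice(self, start, end):
        """(name, value) pairs with start <= name < end, in column-name order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(n, self._cells[n][0]) for n in self._names[lo:hi]]


# keyspace -> table (column family) -> partition key -> Row   (all names hypothetical)
keyspace = {"readings": {}}
row = keyspace["readings"].setdefault("sensor-42", Row())
row.put("2014-01-01T10:02", 22.0, ts=2)
row.put("2014-01-01T10:00", 21.5, ts=1)
print(row.slice("2014-01-01T10:00", "2014-01-01T11:00"))
```

Because columns stay sorted by name, a time-range query within one partition becomes a contiguous slice. This is why modeling starts from the queries.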
70. • Tokens – partitioner-dependent elements on the ring.
    • Each node has a single unique token assigned.
    • Each node owns the range of tokens from the previous node's token (exclusive) up to its own token (inclusive).
    • Use this formula:
      Initial_Token = Zero_Indexed_Node_Number * ((2^127) / Number_Of_Nodes)
    • In cassandra.yaml (node 1 of a 4-node cluster): initial_token: 42535295865117307932921825928971026432
    ** http://blog.milford.io/cassandra-token-calculator/
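The formula can be evaluated with exact integer arithmetic; floating point would lose digits at this size. The sketch assumes the 0 to 2^127 token space of RandomPartitioner, which the formula implies. The Murmur3Partitioner, the default since Cassandra 1.2, uses a different range (-2^63 to 2^63 - 1), and virtual nodes (num_tokens) make manual token assignment unnecessary.

```python
def initial_tokens(node_count: int, token_space: int = 2 ** 127) -> list:
    """Evenly spaced tokens: node i (zero-indexed) gets i * (token_space / node_count)."""
    step = token_space // node_count  # integer division keeps every digit exact
    return [i * step for i in range(node_count)]


for node, token in enumerate(initial_tokens(4)):
    print(f"node {node}: initial_token: {token}")
```

For a 4-node cluster, node 1 gets 42535295865117307932921825928971026432, which matches the cassandra.yaml value above.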
71. • Replication is how many copies of each piece of data should be stored. In C* terms it is the Replication Factor, or "RF". In C*, RF is set at the keyspace level:
      CREATE KEYSPACE drg_compare WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
    • How the data is replicated is called the Replication Strategy
      • SimpleStrategy – places replicas on the next nodes clockwise on the ring; assumes a single data center
      • NetworkTopologyStrategy – for configuring per data center; rack- and data-center-aware. In cassandra-cli:
        update keyspace UserProfile with strategy_options=[{DC1:3, DC2:3}];
73. Using the token values generated before, in a 4-node cluster:
    write a value with token 32535295865117307932921825928971026432
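The example above can be traced with a simplified model of SimpleStrategy. It assumes the RandomPartitioner tokens from the earlier formula, nodes numbered 0-3 as in that formula, no racks or data centers, and no virtual nodes. The key's owner is the first node whose token is greater than or equal to the key's token, wrapping around the ring. The remaining replicas are the next nodes clockwise.

```python
import bisect

TOKEN_SPACE = 2 ** 127  # RandomPartitioner token range
NODES = 4
RING = [i * (TOKEN_SPACE // NODES) for i in range(NODES)]  # sorted node tokens


def primary_node(token: int) -> int:
    """Each node owns (previous token, own token]; past the last token we wrap to node 0."""
    return bisect.bisect_left(RING, token) % len(RING)


def simple_strategy_replicas(token: int, rf: int) -> list:
    """The owning node plus the next rf - 1 nodes clockwise on the ring."""
    first = primary_node(token)
    return [(first + i) % len(RING) for i in range(rf)]


key_token = 32535295865117307932921825928971026432
print(primary_node(key_token), simple_strategy_replicas(key_token, 3))
```

The example token lies between node 0's token (0) and node 1's token (42535295865117307932921825928971026432), so node 1 owns it. With RF = 3, nodes 2 and 3 hold the other copies.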
75. • When writing, a coordinator node is selected at write (or read) time. It is not a single point of failure!
    • Using the Gossip protocol, nodes share information with each other: who is up, who is down, who owns which token ranges, etc. Every second, each node gossips with 1 to 3 other nodes.
    • Consistency Level (CL) – how many replicas must acknowledge before an operation counts as a success. Set per read or write operation.
      • ONE – the coordinator waits for one replica to acknowledge the write (also TWO, THREE). ONE is the default if none is provided.
      • QUORUM – as seen before, floor(N / 2) + 1. Also LOCAL_QUORUM, EACH_QUORUM
      • ANY – the write succeeds once any node has stored it, even if all replicas are down. Only for writes; it doesn't guarantee the data can be read yet.
      • ALL – blocks waiting for all replicas
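The single-data-center levels above reduce to a number of replica acknowledgements the coordinator waits for. The sketch below is a simplified model; the function names are illustrative. It leaves out LOCAL_QUORUM and EACH_QUORUM and treats ANY as needing zero replica acknowledgements, since a stored hint is enough.

```python
def quorum(rf: int) -> int:
    return rf // 2 + 1


def required_acks(level: str, rf: int) -> int:
    """Replica acknowledgements the coordinator waits for at a given consistency level."""
    if level == "ANY":
        return 0  # write-only: a hint stored by the coordinator is enough
    counts = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    needed = counts[level]
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def write_succeeds(level: str, rf: int, live_replicas: int) -> bool:
    """A write succeeds only if enough replicas are up to acknowledge it."""
    return live_replicas >= required_acks(level, rf)


for level in ["ONE", "QUORUM", "ALL", "ANY"]:
    print(level, required_acks(level, 3), write_succeeds(level, 3, live_replicas=2))
```

With RF = 3 and one replica down, ONE and QUORUM writes still succeed but ALL fails. This is the latency/availability versus consistency tradeoff the levels let you tune.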
76. 3 important concepts:
    • Read Repair – at read time, inconsistencies between replicas are noticed and out-of-date replicas are updated. This happens directly and in the background; what is repaired directly is determined by the CL.
    • Anti-Entropy Node Repair – for data that is not read frequently, or to update a node that has been down for a while, run the nodetool repair process (also called anti-entropy repair). It builds Merkle trees, compares nodes and repairs differences.
    • Hinted Handoff – writes are always sent to all replicas for the row, regardless of the consistency level specified by the client. If a replica is down at write time, the coordinator saves a hint about the missed write and hands off the affected rows once the node comes back online; it learns that the node is back via gossip. Hints are kept for a limited window (default 1 hour).
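The core of read repair is last-write-wins reconciliation by column timestamp. This is a simplified, in-memory sketch with hypothetical names. Every replica is consulted and repaired synchronously here, whereas real Cassandra compares digests from the replicas the CL requires and may repair others in the background. Hinted handoff and Merkle-tree repair are not modeled.

```python
def read_with_repair(replicas: dict, key: str):
    """replicas maps node name -> {key: (value, timestamp)}.
    Return the newest value for key and overwrite stale or missing copies."""
    versions = [store[key] for store in replicas.values() if key in store]
    if not versions:
        return None
    newest = max(versions, key=lambda vt: vt[1])  # highest write timestamp wins
    for store in replicas.values():
        if store.get(key) != newest:
            store[key] = newest  # repair the out-of-date (or missing) replica
    return newest[0]


replicas = {
    "node1": {"user:1": ("alice@new.example", 200)},
    "node2": {"user:1": ("alice@old.example", 100)},
    "node3": {},  # e.g. was down when the write happened
}
print(read_with_repair(replicas, "user:1"), replicas)
```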
77. • Interaction with Cassandra can be done using one of the supplied clients, such as the CLI or the CQL shell. Otherwise, client applications are built using a language client library. There are many clients in multiple languages, including Java, .NET, Python, Scala, Go, PHP, Node.js, Perl, Ruby, etc.
    • Java:
      • Hector wraps the underlying Thrift API. Hector is one of the most commonly used client libraries.
      • Astyanax is a client library developed by Netflix.
      • DataStax CQL driver – the newest CQL driver; it will be very familiar to JDBC developers
      • And many more… (JPA)
    • There are also DataStax OpsCenter, various GUIs and a REST API (Virgil)
78. • Many more topics and information related to C* not covered
    • Great for fast writes
    • No single point of failure
    • Data center aware
    • Also relative ease of use
79. Questions?
    Comments?
    Thank You!!!!!!
    brian.enochson@gmail.com