Slides from workshop held on 12/14 in Asbury Park, NJ
http://www.meetup.com/Jersey-Shore-Tech/events/148118762/?gj=ro2_e&a=ro2_gnl&rv=ro2_e&_af_eid=148118762&_af=event
2. REQUISITE SLIDE – WHO AM I?
Brian Enochson
- SW Engineer who has worked as designer / developer on NoSQL (Mongo, Cassandra, Hadoop)
- Consultant – HBO, ACS, CIBER
- Specialize in SW development, architecture and training
Contact Me:
brian.enochson@gmail.com
Twitter @benochso
Google Plus https://plus.google.com/+BrianEnochson
I am available for training, consulting & development.
NOSQL INTRO & MONGODB
2
3. AGENDA
Hour 1
• Installation of required software (will send out list before, but make sure all of class has what is needed)
• Introduction to Big Data
• Introduction to NoSQL
• Relational Database to NoSQL technology contrast & compare
• NoSQL landscape
• Exercise – install and use required software
4. AGENDA
Hour 2
• Introduction to MongoDB
• MongoDB components, capabilities and common use cases
• JSON & BSON
• Documents, collections, references and Mongo ID
• Querying
• Other CRUD operations
• Indexes
• Exercise – Design and populate MongoDB
5. AGENDA
Hour 3
• Data Modeling/Schema Design
• Replication & Sharding
• Exercise – Application development using MongoDB and Java
• Wrap-up and final Q & A
6. SOFTWARE
Later we will need
• MongoDB
  • http://www.mongodb.org/downloads
• Java JDK
  • 1.6
• NetBeans, Eclipse or IntelliJ (with Maven support)
  • or Maven and any editor
• Our project
  • http://bit.ly/IVnTEb (or https://www.dropbox.com/sh/mwu6lltaljqq59z/PMWiw7ZPk3)
• Robomongo or MongoExplorer
7. BIG DATA
Why are databases like Mongo needed?
• To understand we need to look at
  • The history of databases
  • How systems were built in the past
• Modern application architectures
  • Web scale
  • Data acquisition
• Other factors like cost of H/W
8. HISTORY OF THE DATABASE
• 1960’s – Hierarchical and network types (IMS and CODASYL)
• 1970’s – Beginnings of the theory behind the relational model. Codd
• 1980’s – Rise of the relational model. SQL. E/R Model (Chen)
• 1990’s – Access/Excel and MySQL. ODMS began to appear
• 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
• 2010’s – Emergence of NoSQL as an industry player and viable alternative
9. WHY WERE ALTERNATIVES NEEDED
• Developers today are faced with Internet scale
  • 100,000’s of users
  • Low cost of storage
  • Increased processing power
  • Ability (and need) to capture millions of events. Caching solves it to an extent but brings other complexities
  • Real-time
  • Need to scale out and not up (add more low-cost machines vs. replace with a more powerful machine)
• Cost
  • Let’s not forget that for enterprise DB’s Internet scale can become expensive
  • Open source DB’s may solve license cost, but don’t ignore operational costs
10. A LOT OF DATA
Some facts from http://www.storagenewsletter.com/rubriques/marketreportsresearch/ibm-cmo-study/
Approximately 90 percent of all the real-time information being created today is unstructured data
Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, a 1 followed by 18 zeroes!!)
90 percent of the world's data today has been created in the last two years alone
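To put the daily figure above into more familiar units, the arithmetic can be sketched in a few lines of Python. The decimal (SI) unit sizes and the variable names are assumptions chosen for illustration.

```python
# 2.5 quintillion bytes per day, the figure quoted on the slide,
# kept as an exact integer: 2.5 * 10^18 == 25 * 10^17.
BYTES_PER_DAY = 25 * 10**17

EXABYTE = 10**18    # decimal (SI) exabyte
TERABYTE = 10**12   # decimal (SI) terabyte

exabytes_per_day = BYTES_PER_DAY / EXABYTE      # float division
terabytes_per_day = BYTES_PER_DAY // TERABYTE   # exact integer division
digits = len(str(BYTES_PER_DAY))                # decimal digits in the number

print(exabytes_per_day)    # 2.5
print(terabytes_per_day)   # 2500000
print(digits)              # 19
```

In other words, roughly 2.5 exabytes, or 2.5 million one-terabyte drives, every day.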
11. RELATIONAL VS. NOSQL
• Relational
  • Divide into tables, relate via foreign keys, DB constraints, normalized data, the interface is SQL
• NoSQL
  • Store in schemaless format, redundancy encouraged, application access (your queries) determines the storage format. Interface varies and is optimized for the implementation, no forced DB constraints. Tradeoff is often that you get eventual consistency.
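A minimal sketch of the contrast, using only the Python standard library: `sqlite3` stands in for a relational database and a plain dictionary stands in for a document. The table and field names are hypothetical; the publisher and title are borrowed from the references slide later in the deck.

```python
import json
import sqlite3

# Relational: two tables related by a foreign key, read back with a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        publisher_id INTEGER NOT NULL REFERENCES publisher(id)
    );
    INSERT INTO publisher VALUES (1, 'O''Reilly Media');
    INSERT INTO book VALUES (10, 'MongoDB: The Definitive Guide', 1);
""")
rows = conn.execute(
    "SELECT book.title, publisher.name "
    "FROM book JOIN publisher ON book.publisher_id = publisher.id"
).fetchall()

# Document style: one self-contained record, shaped for the query
# "show a book with its publisher". The publisher name is stored
# redundantly inside every book that needs it.
book_doc = {
    "_id": 10,
    "title": "MongoDB: The Definitive Guide",
    "publisher": {"name": "O'Reilly Media"},
}

print(rows)
print(json.dumps(book_doc))
```

The relational version needs a join at read time; the document version answers the same question with a single lookup, at the price of keeping the copied publisher name up to date in the application.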
12. TRADEOFFS?
Luckily, due to the large number of compromises made when
attempting to scale their existing relational databases,
these tradeoffs were not so foreign or
distasteful as they might have been.
Greg Burd - https://www.usenix.org/legacy/publications/login/201110/openpdfs/Burd.pdf
13. 3 V’S – DESCRIBING THE BIG DATA PROBLEM
The driving force behind the need for new technology is often referred to as the “3 V Model”.
• High Volume – amount of data
• High Variety – range of data types and sources
• High Velocity – speed of data in and out
OK, maybe 4 V’s
• Veracity – is all the data applicable to the problem being analyzed?
14. NOSQL IS NOT BIG DATA
NoSQL != Big Data
NoSQL products were created to help solve the big data problem.
Big data is a much larger problem than just storage. Analysis tools like
Hadoop, messaging systems like Kafka, real time processing engines like
Storm and machine learning (Mahout) all help solve the big data problem.
15. NOSQL TYPES
Document DB
• MongoDB, CouchDB
Wide Column – Column Family
• Cassandra, HBase, Amazon SimpleDB
Key Value
• Riak, Redis, DynamoDB, Voldemort, MemcacheDB
Graph
• Neo4j, OrientDB
Search (also alternatives, normally used alongside one of the above)
• Lucene, Solr, Elasticsearch
Many, many more! (http://nosql-database.org/)
16. CHOOSING THE RIGHT ONE…
Choosing the right NoSQL type and eventual product depends on…
Type of Data
• One key and a lot of data?
• High volume of data?
• Storing media, blobs?
• Document oriented?
• Tracking relationships?
• Combination?
• Multi-datacenter?
Type of Access
Volumes of Data (there is big data and there is BIG DATA)
Need Support/Services/Training
18. ACID
YOU PROBABLY ALL HAVE HEARD OF ACID
• Atomic – All or None
• Consistency – What is written is valid
• Isolation – One operation at a time
• Durability – Once committed to the DB, it stays
This is the world we have lived in for a long time…
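The "all or none" and "what is written is valid" properties can be sketched with Python's built-in `sqlite3` module. The account table, names and amounts are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "valid data" rule: no negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer credits bob first and then fails on alice's debit, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.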
19. CAP THEOREM (BREWER’S)
Many may have heard this one
CAP stands for Consistency, Availability and Partition Tolerance
• Consistency – every read sees the most recent write (not the same as the C in ACID)
• Availability – the service is available; every request gets a response
• Partition Tolerance – no failure other than complete network failure causes the system not to respond
(Remember the visual guide to selecting a NoSQL database)
So… what does this mean?
** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
20. YOU CAN ONLY HAVE 2 OF THEM
Or, better said in C* (Cassandra) terms, you can have Availability and Partition Tolerance AND Eventual Consistency.
Eventual consistency means that, once updates stop, eventually all accesses will return the last updated value.
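A toy simulation of that behaviour, assuming timestamped writes where the newest timestamp wins. The `Replica` class and `anti_entropy` function are hypothetical names for illustration and do not reproduce how any particular product synchronizes its replicas.

```python
class Replica:
    """A toy replica that stores a (timestamp, value) pair per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Keep the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Offer every entry to every replica; the newest timestamp wins."""
    for source in replicas:
        for key, (ts, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, ts)


a, b = Replica(), Replica()
a.write("rating", "PG", ts=1)
b.write("rating", "PG", ts=1)
a.write("rating", "PG13", ts=2)   # this update has only reached replica a

stale = b.read("rating")          # replica b still answers with the old value
anti_entropy([a, b])              # replicas exchange what they know
converged = (a.read("rating"), b.read("rating"))
print(stale, converged)
```

Both replicas stay available throughout; the cost is the window in which `b` returns the stale value.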
21. VISUAL GUIDE – USING THE CAP THEOREM
http://blog.nahurst.com/visual-guide-to-nosql-systems
22. BIG DATA WRAP UP
• So we are talking about large amounts of data
• High velocity of acquisition
• A lot of variety that we need to store. We will worry later about how to handle it (or not)
• Need to scale and not break the bank
• Want the database to support agile development, not hinder it
23. STILL WRAPPING
• Maybe consider going relational if
  • High transaction volume (FoundationDB?)
  • Business Intelligence systems (Hadoop may make this not true)
• Don’t be fooled by fear of losing ACID…
  http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html
25. MONGO OVERVIEW
A few high-level points
• Document oriented
• Storage format is JSON (actually BSON)
• Replication built in
• Master / slave architecture
• Strong querying support
• Name comes from "humongous"
29. DOCUMENT
In its simplest form, Mongo is a document-oriented database
• MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs.
• MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents. BSON contains more data types than does JSON.
** For in-depth BSON information, see bsonspec.org.
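One way to see why the extra BSON types matter: plain JSON has no date type, so a document carrying a timestamp cannot be written as JSON without first converting the value to text. The sketch below uses only Python's `json` module; the `created` field and its date are hypothetical, and no BSON library is involved.

```python
import json
from datetime import datetime, timezone

doc = {
    "ratingCode": "PG13",
    # JSON offers strings, numbers, booleans, null, arrays and objects,
    # but nothing for dates. BSON defines a dedicated UTC datetime type.
    "created": datetime(2013, 12, 14, tzinfo=timezone.utc),
}

try:
    json.dumps(doc)
    plain_json_ok = True
except TypeError:
    # The standard encoder refuses values it has no JSON type for.
    plain_json_ok = False

# Workaround in plain JSON: flatten the date to a string, which loses
# the type information a reader would need to turn it back into a date.
as_text = json.dumps(doc, default=lambda value: value.isoformat(),
                     sort_keys=True)
print(plain_json_ok)
print(as_text)
```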
30. WHAT DOES A DOCUMENT LOOK LIKE
{
"_id" : "52a602280f2e642811ce8478",
"ratingCode" : "PG13",
"country" : "USA",
"entityType" : "Rating"
}
32. RULES FOR A DOCUMENT
Documents have the following rules:
The maximum BSON document size is 16 megabytes.
The field name _id is reserved for use as a primary key; its value must be
unique in the collection.
The field names cannot start with the $ character.
The field names cannot contain the . character.
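The rules above can be sketched as a small checker over plain Python dictionaries. This is an illustration only: the function names are hypothetical, only top-level field names are inspected, and the size is approximated from the JSON encoding rather than measured on real BSON.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 megabyte limit


def rule_violations(doc):
    """Return a list of rule violations for a single document."""
    problems = []
    for name in doc:
        if name.startswith("$"):
            problems.append(f"field name {name!r} starts with '$'")
        if "." in name:
            problems.append(f"field name {name!r} contains '.'")
    # Assumption: JSON-encoded size is a rough stand-in for BSON size.
    size = len(json.dumps(doc).encode("utf-8"))
    if size > MAX_DOCUMENT_BYTES:
        problems.append(f"document is {size} bytes, over the limit")
    return problems


def insert(collection, doc):
    """Insert into a dict keyed by _id, enforcing uniqueness of _id."""
    problems = rule_violations(doc)
    if doc["_id"] in collection:
        problems.append(f"duplicate _id {doc['_id']!r}")
    if problems:
        raise ValueError("; ".join(problems))
    collection[doc["_id"]] = doc


ratings = {}
insert(ratings, {"_id": "52a602280f2e642811ce8478", "ratingCode": "PG13"})

bad = rule_violations({"_id": 1, "$set": 1, "a.b": 2})
try:
    insert(ratings, {"_id": "52a602280f2e642811ce8478", "ratingCode": "R"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(bad, duplicate_rejected)
```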
36. MORE MONGO SHELL
2_arrays_sort.txt
• Embedded documents
• Limit, Sort
• Using regex in query
• Removing documents
• Drop collection
37. IMPORT / EXPORT
3_imp_exp.txt
Mongo provides tools for getting data in and out of the database
• Data can be exported to JSON files
• JSON files can then be imported
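The round trip can be sketched without a database, assuming a file layout of one JSON document per line. The helper names are hypothetical and an in-memory buffer stands in for the file; the real import and export tools have their own options that are not modeled here.

```python
import io
import json


def export_docs(docs, out):
    """Write each document as one line of JSON."""
    for doc in docs:
        out.write(json.dumps(doc, sort_keys=True) + "\n")


def import_docs(src):
    """Read documents back, one per non-empty line."""
    return [json.loads(line) for line in src if line.strip()]


ratings = [
    {"_id": "52a602280f2e642811ce8478", "ratingCode": "PG13", "country": "USA"},
    {"_id": "hypothetical-second-id", "ratingCode": "R", "country": "USA"},
]

buffer = io.StringIO()      # stands in for a file on disk
export_docs(ratings, buffer)
buffer.seek(0)
restored = import_docs(buffer)
print(restored == ratings)
```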
40. DATA MODELING
• Remember with NoSQL redundancy is not evil
• Applications ensure consistency, not the DB
• Applications join data; joins are not defined in the DB
• Data model is schema-less
• Data model is usually built to support queries
41. QUESTIONS TO ASK
• What are your basic units of data (what would be a document)?
• How are these units grouped / related?
• How does Mongo let you query this data, what are the options?
• Finally, maybe most importantly, what are your application’s access patterns?
  • Reads vs. writes
  • Queries
  • Updates
  • Deletions
  • How structured is the data?
42. DATA MODEL - NORMALIZED
Normalized
• Similar to relational model.
• One collection per entity type
• Little or no redundancy
• Allows clean updates, familiar to many SQL users, easier to understand
44. REFERENCES
• From parent to child
{
name: "O'Reilly Media",
books: [123456789, 234567890, ...]
}
• From child to parent
{
_id: 123456789,
title: "MongoDB: The Definitive Guide",
publisher_id: "oreilly"
}
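Because the database does not join for you, following a reference is a second lookup done by the application. A minimal sketch with dictionaries standing in for two collections; the `_id` of the publisher and the helper names are assumptions for illustration.

```python
# Two "collections", each a dict keyed by _id.
publishers = {
    "oreilly": {"_id": "oreilly", "name": "O'Reilly Media"},
}
books = {
    123456789: {
        "_id": 123456789,
        "title": "MongoDB: The Definitive Guide",
        "publisher_id": "oreilly",   # child-to-parent reference
    },
}


def publisher_of(book_id):
    """Follow the reference stored on the book (two lookups)."""
    book = books[book_id]
    return publishers[book["publisher_id"]]


def books_of(publisher_id):
    """Find all children that point at a given parent."""
    return [book for book in books.values()
            if book["publisher_id"] == publisher_id]


publisher_name = publisher_of(123456789)["name"]
titles = [book["title"] for book in books_of("oreilly")]
print(publisher_name, titles)
```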
45. DATA MODEL - EMBEDDED
An often-used pattern in Mongo is to embed information as subdocuments.
• Used when there is a "contains" relationship
• Easier querying (when related data is often used together)
• Need to keep the 16 MB document size limit in mind
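A sketch of the embedded shape, with a plain dictionary standing in for a stored document. The field names are hypothetical; the titles and page counts are taken from the book list at the end of the deck, and the size figure is a JSON-based approximation rather than a BSON measurement.

```python
import json

# The publisher "contains" its books, so they live inside one document.
publisher = {
    "_id": "oreilly",
    "name": "O'Reilly Media",
    "books": [
        {"title": "MongoDB: The Definitive Guide", "pages": 432},
        {"title": "MongoDB Applied Design Patterns", "pages": 176},
    ],
}

# One read returns the parent together with all of its children.
titles = [book["title"] for book in publisher["books"]]
total_pages = sum(book["pages"] for book in publisher["books"])

# Every embedded book grows the parent document, so the size limit
# applies to the whole thing (approximated here via JSON encoding).
approx_bytes = len(json.dumps(publisher).encode("utf-8"))
within_limit = approx_bytes < 16 * 1024 * 1024
print(titles, total_pages, within_limit)
```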
47. OTHER CONSIDERATIONS FOR DATA MODELING
Many or few collections
• Many collections
  • As seen in normalized
  • Clean and little redundancy
  • May not provide best performance
  • May require frequent updates to application if new types added
• Multiple collections
  • Middle ground, partially normalized
• Not many collections
  • One large generic collection
  • Contains many types
  • Use type field
48. CONSIDERATION CONTINUED
• Document growth – a document will be relocated if it exceeds its allocated size
• Atomicity
  • Atomic at document level
  • A consideration for inserts, removes and multi-document updates
• Sharding – collections distributed across mongod instances, uses a shard key
• Indexes – index fields that are often queried; indexes affect write performance slightly
• TTL – consider using TTL to automatically expire documents
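Two of these ideas, routing by shard key and expiring by age, can be sketched in plain Python. This is a deliberately simplified model: the shard names, the hash-and-modulo routing and the `createdAt` field are assumptions for illustration, and a real cluster manages key ranges and expiry itself.

```python
import hashlib
from datetime import datetime, timedelta

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key_value):
    """Pick a shard by hashing the shard key value (toy routing)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def drop_expired(docs, now, ttl_seconds, field="createdAt"):
    """Keep only documents whose timestamp is inside the TTL window."""
    cutoff = now - timedelta(seconds=ttl_seconds)
    return [doc for doc in docs if doc[field] > cutoff]


now = datetime(2013, 12, 14, 12, 0, 0)
events = [
    {"_id": 1, "country": "USA", "createdAt": now - timedelta(hours=2)},
    {"_id": 2, "country": "USA", "createdAt": now - timedelta(minutes=5)},
]

# The same key value always routes to the same shard.
same_shard = shard_for("USA") == shard_for("USA")

# With a one-hour TTL only the recent event survives.
fresh = drop_expired(events, now, ttl_seconds=3600)
print(same_shard, [doc["_id"] for doc in fresh])
```

The routing half also shows why the choice of shard key matters: every document with `country` equal to "USA" lands on one shard, so a low-variety key concentrates load.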
49. COMMON USES FOR MONGO
Log Collection
https://code.google.com/p/log4mongo/
Caching
Queues / Messaging
Capped Collections - fixed-size collections that support high-throughput
operations that insert, retrieve, and delete documents based on insertion order.
Analytics
Prototyping
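The fixed-size, insertion-ordered behaviour of a capped collection resembles a bounded queue. As a rough analogy only, Python's `collections.deque` with `maxlen` shows the effect; the event documents are hypothetical.

```python
from collections import deque

# A bounded buffer holding at most three documents.
capped = deque(maxlen=3)

for n in range(1, 6):
    # Once the buffer is full, each append silently drops the oldest entry.
    capped.append({"seq": n, "msg": f"event {n}"})

# Contents come back in insertion order, oldest surviving entry first.
seqs = [doc["seq"] for doc in capped]
print(seqs)  # [3, 4, 5]
```

This is what makes the pattern attractive for logs and queues: the newest entries are always kept and space never has to be reclaimed by hand.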
50. MONGODB DEVELOPMENT WITH JAVA
Java driver supplied by MongoDB itself
Easy to set up
Hosted in the Maven repository
51. EXAMPLE JAVA APP
Load Health Data
Query Data
Administrative Functions
53. SOME OTHER COOL STUFF
Get MEAN
Mongo, Express, Angular and Node
http://bitnami.com/stack/mean
Can be installed locally, in a VM or even in the cloud
54. THE CLOUD
Database in the cloud
https://mongolab.com/
Can access using shell, GUI Mongo explorer, mongoimport, mongoexport
and use in application
Amazon, Rackspace, Joyent or Azure
55. BOOKS
MongoDB: The Definitive Guide, 2nd Edition
By: Kristina Chodorow
Publisher: O'Reilly Media, Inc.
Pub. Date: May 23, 2013
Print ISBN-13: 978-1-4493-4468-9
Pages in Print Edition: 432
MongoDB in Action
By: Kyle Banker
Publisher: Manning Publications
Pub. Date: December 16, 2011
Print ISBN-10: 1-935182-87-0
Print ISBN-13: 978-1-935182-87-0
Pages in Print Edition: 312
The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing
By Eelco Plugge; Peter Membrey; Tim Hawkins
Apress, September 2010
ISBN: 9781430230519
327 pages
56. BOOKS CONT.
MongoDB Applied Design Patterns
By: Rick Copeland
Publisher: O'Reilly Media, Inc.
Pub. Date: March 18, 2013
Print ISBN-13: 978-1-4493-4004-9
Pages in Print Edition: 176
MongoDB for Web Development (rough cut!)
By: Mitch Pirtle
Publisher: Addison-Wesley Professional
Last Updated: 14-JUN-2013
Pub. Date: March 11, 2015 (Estimated)
Print ISBN-10: 0-321-70533-5
Print ISBN-13: 978-0-321-70533-4
Pages in Print Edition: 360
Instant MongoDB
By: Amol Nayak
Publisher: Packt Publishing
Pub. Date: July 26, 2013
Print ISBN-13: 978-1-78216-970-3
Pages in Print Edition: 72