1. NoSQL Technologies
Lecturer: Dr. Nguyen Binh Minh
Students:
- Pham Anh Doi
- Nguyen Quang Huy
- Nguyen Hai Nam
- Luong Anh Tuan
- Pham Duc Thang
- Nguyen Thi Tuyet Trinh
- Emmanuel Nana Ofori
2016-11-24 1
2. Agenda
• Introduction to NoSQL
• Three popular NoSQL systems: Cassandra, MongoDB,
ElasticSearch
• Comparison of Cassandra vs MongoDB vs
ElasticSearch
• Some NoSQL challenges
7. What is MongoDB?
• MongoDB is an open-source document
database and a leading NoSQL database.
MongoDB is written in C++.
• MongoDB is a cross-platform, document-oriented
database that provides high performance, high
availability, and easy scalability. MongoDB works
on the concepts of collection and document.
8. MongoDB - Overview
• Database:
Database is a physical container for collections. Each database gets
its own set of files on the file system. A single MongoDB server
typically has multiple databases.
• Collection
Collection is a group of MongoDB documents. It is the equivalent of
an RDBMS table. A collection exists within a single database.
Collections do not enforce a schema. Documents within a collection
can have different fields. Typically, all documents in a collection are of
similar or related purpose.
9. MongoDB - Overview
• Document
A document is a set of key-value pairs. Documents have dynamic
schema. Dynamic schema means that documents in the same
collection do not need to have the same set of fields or structure, and
common fields in a collection's documents may hold different types
of data.
10. MongoDB - Overview
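• RDBMS terminology maps to MongoDB terminology as follows:
RDBMS | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field
Primary key | Primary key (the _id field)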
11. Sample document
• A document for a blog site is simply a set of
comma-separated key-value pairs.
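As a rough illustration of such a document, the sketch below writes a blog post as a Python dictionary and serializes it to JSON. The field names and values (title, author, tags, likes, comments) are assumptions made up for this example, not the fields shown on the original slide.

```python
import json

# Hypothetical blog-post document; every field name here is illustrative.
blog_post = {
    "_id": "5836d1a0e4b0c1a2b3c4d5e6",  # 24 hex characters = 12 bytes
    "title": "NoSQL Technologies",
    "author": "example_user",
    "tags": ["mongodb", "nosql"],        # a field can hold an array
    "likes": 100,
    "comments": [                        # a field can hold embedded documents
        {"user": "user1", "message": "My first comment", "likes": 0},
    ],
}

# The JSON text form is the "comma-separated key-value pairs" view.
as_json = json.dumps(blog_post, indent=2)
print(as_json)
```

Embedding the comments inside the post is what lets a single document describe the whole object without a join.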
12. Sample document
• _id is a 12-byte value, usually written as a 24-character hexadecimal
string, that ensures the uniqueness of every document. You can provide
_id when inserting a document; if you do not, MongoDB generates a
unique id for it. Of the 12 bytes, the first 4 hold the current
timestamp, the next 3 the machine id, the next 2 the process id of the
MongoDB server, and the remaining 3 a simple incremental value.
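The 12-byte layout described above can be sketched with the Python standard library. This is only an illustration of the byte layout from the slide, not MongoDB's own generator: the function names are hypothetical, the machine id is assumed to be a hash of the host name, and newer MongoDB versions document the middle five bytes as a random value instead.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# 3-byte counter, started at a random value and incremented per id.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def make_object_id(now=None):
    """Build a 12-byte id: 4 timestamp + 3 machine + 2 pid + 3 counter."""
    timestamp = int(time.time() if now is None else now)
    # Assumption: derive the 3-byte machine id from a hash of the host name.
    machine = hashlib.md5(socket.gethostname().encode("utf-8")).digest()[:3]
    pid = os.getpid() & 0xFFFF                 # keep the low 2 bytes
    counter = next(_counter) & 0xFFFFFF        # keep the low 3 bytes
    return (
        struct.pack(">I", timestamp)           # 4 bytes, big-endian
        + machine                              # 3 bytes
        + struct.pack(">H", pid)               # 2 bytes
        + counter.to_bytes(3, "big")           # 3 bytes
    )


def timestamp_of(object_id):
    """Recover the creation time (seconds since epoch) from the first 4 bytes."""
    return struct.unpack(">I", object_id[:4])[0]


oid = make_object_id()
print(oid.hex(), timestamp_of(oid))
```

Because the timestamp comes first, ids generated this way sort roughly by creation time.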
13. Advantages of MongoDB over RDBMS
• Schema-less: MongoDB is a document database in
which one collection can hold different
documents. The number of fields, content, and size
can differ from one document to
another.
• Structure of a single object is clear
• No complex joins
• Deep query-ability. MongoDB supports dynamic
queries on documents using a document-based query
language that's nearly as powerful as SQL
• Tuning
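To make the schema-less and document-based query points concrete, here is a toy in-memory matcher in Python. It is not MongoDB's query engine or the pymongo API: `matches` and `find` are hypothetical helpers, and only equality plus the `$gt`, `$lt`, and `$in` operators are imitated.

```python
# Comparison operators imitated by this sketch.
OPERATORS = {
    "$gt": lambda value, operand: value > operand,
    "$lt": lambda value, operand: value < operand,
    "$in": lambda value, operand: value in operand,
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:          # documents may lack a field entirely
            return False
        value = document[field]
        if isinstance(condition, dict):    # e.g. {"$gt": 50}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:           # plain equality
            return False
    return True


def find(collection, query):
    """Filter a list of documents, in the spirit of a collection query."""
    return [doc for doc in collection if matches(doc, query)]


# One "collection" whose documents do not share the same set of fields.
posts = [
    {"title": "A", "by": "alice", "likes": 120, "tags": ["nosql"]},
    {"title": "B", "by": "bob", "likes": 30},
    {"title": "C", "by": "alice", "url": "http://example.com/c"},
]

print(find(posts, {"by": "alice", "likes": {"$gt": 50}}))
```

The query itself is a document, which is why no separate query language or join syntax is needed for this kind of lookup.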
14. Advantages of MongoDB over RDBMS
• Ease of scale-out: MongoDB is easy to scale
• Conversion / mapping of application objects to database objects not
needed
• Uses internal memory for storing the (windowed) working set,
enabling faster access of data
15. Why use MongoDB?
• Document Oriented Storage : Data is stored in
the form of JSON style documents
• Index on any attribute
• Replication & High Availability
• Auto-Sharding
• Rich Queries
• Fast In-Place Updates
• Professional Support By MongoDB
16. Where to use MongoDB?
• Big Data
• Content Management and Delivery
• Mobile and Social Infrastructure
• User Data Management
• Data Hub
17. When not to use MongoDB?
• Highly Transactional Applications.
• Problems requiring SQL.
Some Companies using MongoDB in Production
20. What is Cassandra
• Apache Cassandra is an open-source, distributed, decentralized
storage system (database) for managing very large amounts of
structured data spread out across the world. It provides a
highly available service with no single point of failure.
• Notable points of Apache Cassandra:
  • It is scalable, fault-tolerant, and consistent.
  • It is a column-oriented database.
  • Its distribution design is based on Amazon’s Dynamo and its data model on Google’s
  Bigtable.
  • Created at Facebook, it differs sharply from relational database management
  systems.
  • Cassandra implements a Dynamo-style replication model with no single point of
  failure, but adds a more powerful “column family” data model.
  • Cassandra is used by some of the biggest companies, such as Facebook,
  Twitter, Cisco, Rackspace, eBay, Netflix, and more.
21. Features of Cassandra
• Elastic scalability
• No single point of failure
• Scale horizontally (linear availability / scale out)
• Flexible data storage
• Easy data distribution
• Peer-to-peer architecture (no primary/secondary)
• Fast writes
22. Architecture of Cassandra
• Cassandra was built from the ground up with the
understanding that hardware and system failures can
and do occur
• Peer-to-peer, distributed system
• All nodes are the same
• Data is partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/write-anywhere design
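The partitioning and replication points above can be sketched as a simple token ring in Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the `Ring` class and node names are hypothetical, MD5 is used only because it ships with the standard library, and virtual nodes and data-center-aware replication strategies are left out.

```python
import bisect
import hashlib


def token_for(key):
    """Map a node name or partition key to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Every node is a peer; each owns the range ending at its token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key):
        """Return the nodes that store a key: the owner plus the next peers."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        return [
            self._ring[(start + i) % len(self._ring)][1]  # wrap around the ring
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

Since any node can compute the same replica list from the key alone, a client can send a read or write to whichever node it likes, which is the idea behind the read/write-anywhere design.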
24. Components of Cassandra
Node − The place where data is stored.
Data center − A collection of related nodes.
Cluster − A component that contains one or more data centers.
Commit log − The crash-recovery mechanism in Cassandra.
Every write operation is written to the commit log.
Mem-table − A memory-resident data structure. After the
commit log, the data is written to the mem-table. Sometimes, for a
single column family, there will be multiple mem-tables.
SSTable − A disk file to which the data is flushed from the mem-table
when its contents reach a threshold value.
Bloom filter − A quick, probabilistic algorithm for testing whether an
element is a member of a set; it can return false positives but never
false negatives. It acts as a special kind of cache and is consulted on
every read query.
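How these components cooperate on a write and a read can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's storage engine: the class names, the two-row flush threshold, and the SHA-256-based Bloom filter are all chosen for illustration, and compaction and deletes are ignored.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a flushed mem-table."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):  # skip the "disk" lookup
            return None
        return self.rows.get(key)


class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only, used for crash recovery
        self.memtable = {}        # in-memory writes
        self.sstables = []        # flushed files, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. commit log first
        self.memtable[key] = value             # 2. then the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush at the threshold

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            value = table.get(key)
            if value is not None:
                return value
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the threshold and triggers a flush
node.write("a", 3)   # newer value stays in the mem-table
print(node.read("a"), node.read("b"), node.read("missing"))
```

Writes only append to the log and update memory, which is one reason the write path stays fast; the Bloom filter keeps reads from touching SSTables that cannot contain the key.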
26. ElasticSearch
• Is a search engine / real-time search (1)
• Is a free and open-source distributed inverted index created by
Shay Banon.
• Built on top of Apache Lucene (2).
• Developed in Java, so inherently cross-platform.
• Users include Mozilla, Quora, SoundCloud, GitHub, StackExchange, Center for Open
Science, Reverb, Netflix, and others.
28. Why ElasticSearch?
• Easy to scale (distributed)
• Everything is one JSON call away (RESTful API)
• Excellent Query DSL
• Support for advanced search features (full text)
• Configurable and extensible
• Document oriented
• Schema free
• Conflict management
29. Easy to Scale (Distributed)
ElasticSearch is built to scale horizontally out of
the box. Whenever you need to increase
capacity, just add more nodes and let the cluster
reorganize itself to take advantage of the extra
hardware.
RESTful API
ElasticSearch is API driven. Almost any action can be
performed using a simple RESTful API with JSON
over HTTP.
Responses are always in JSON format.
30. Demo
1. Run ElasticSearch and test it at http://localhost:9200/
Response :
{
"status" : 200,
"name" : "elasticsearch",
"version" : {
"number" : "1.3.4",
"build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
"build_timestamp" : "2015-04-21T14:27:12Z",
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
31. Demo
PUT data: curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{"user" : "kimchy"}'
Searching data: curl -XGET
'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty'
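The same two calls can be prepared from Python with only the standard library. This sketch builds the HTTP requests for the index/type/id URLs used in the demo and sends them only if you call `send`; the helper names are hypothetical, and a node is assumed to be listening on localhost:9200. The Content-Type header is an addition that newer ElasticSearch versions expect.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # local node, as in the demo


def build_index_request(index, doc_type, doc_id, document):
    """PUT a JSON document at /index/type/id."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_search_request(index, doc_type, query):
    """GET /index/type/_search with a query-string search."""
    params = urllib.parse.urlencode({"q": query}, safe=":")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/{doc_type}/_search?{params}&pretty",
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


put_request = build_index_request("twitter", "tweet", 1, {"user": "kimchy"})
search_request = build_search_request("blog", "post", "user:dilbert")
print(put_request.get_method(), put_request.full_url)
print(search_request.get_method(), search_request.full_url)
```

With a node running, `send(put_request)` followed by `send(search_request)` would return the JSON responses as Python dictionaries.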
32. Compare Cassandra vs MongoDB vs ElasticSearch
Feature | Cassandra | MongoDB | ElasticSearch
Model | #1 Wide-column store | #1 Document store | #1 Search engine
Developer | Apache Software Foundation | MongoDB, Inc | Elastic
Initial release | 2008 | 2009 | 2010
Database as a Service | no | no | no
Server operating systems | BSD, Linux, OS X, Windows | Linux, OS X, Solaris, Windows | All OS with a Java VM
Data schema | free | free | free
Secondary indexes | restricted | yes | yes
SQL | no | no | no
APIs and other access methods | Proprietary protocol | Proprietary protocol using JSON | Java API, RESTful HTTP/JSON API
Server-side scripts | no | JavaScript | yes
33. Compare Cassandra vs MongoDB vs ElasticSearch
Feature | Cassandra | MongoDB | ElasticSearch
Partitioning methods | Sharding | Sharding | Sharding
Replication methods | Selectable replication factor | Master-slave replication | yes
MapReduce | yes | yes | no
Consistency concepts | Eventual Consistency, Immediate Consistency | Eventual Consistency, Immediate Consistency | Eventual Consistency
Foreign keys | no | no (typically not used, however similar functionality with DBRef possible) | no
Transaction concepts | no | no | no
Concurrency (support for concurrent manipulation of data) | yes | yes | yes