1. Storing Unstructured Data in MongoDB
Using a Consistent Hashing Algorithm
Guide: Dr. M. Saibaba, AD, RMG, IGCAR – Kalpakkam.
Presented By: Saran Raj S, M.Tech-DBS, IIIT Srirangam.
External Co-ordinator: Mr. E. Soundarajan, SIRD, RMG, IGCAR – Kalpakkam.
Internal Co-ordinator: Mrs. S. Jayanthi, Anna University, Trichy.
2. Objective:
Large amounts of unstructured data are now produced and
consumed over the network.
The main challenge is how to store this data while improving the
availability and scalability of the storage system.
Some NoSQL databases support storing large amounts of
unstructured data.
The main objective of this project is to store large amounts of
unstructured data in MongoDB and retrieve it.
3. Introduction
What is Unstructured data?
Unstructured data is a generic label for data that is not
contained in a database or some other type of data
structure.
Unstructured data can be textual or non-textual.
Textual unstructured data is generated in media like email
messages, PowerPoint presentations, Word documents,
collaboration software and instant messages.
Non-textual unstructured data is generated in media
like JPEG images, MP3 audio files and Flash video files.
4. Introduction Cont.
What is MongoDB ?
MongoDB is a NoSQL (Not only SQL) database.
MongoDB is an open source, document-oriented database.
MongoDB was created by Dwight Merriman and Eliot
Horowitz.
MongoDB maintains the most valuable features of
relational databases: strong consistency, an expressive query
language, and secondary indexes.
5. Existing System
An RDBMS is not well suited to storing unstructured data.
Failures occur when storing unstructured data such as
large numbers of video, PDF, and Word files.
DynamoDB is suitable for use cases where data access is by
one or two dimensions of data, and usually by applications
running on Amazon's EC2 service.
6. Proposed system
This project uses MongoDB to store large amounts of
unstructured data.
The unstructured data is stored in MongoDB consistently through a
framework.
The My Store framework provides CRUD (Create, Read, Update, Delete)
operations.
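The CRUD operations named above could be sketched as a thin wrapper around a MongoDB collection. This is only an illustration, not the project's actual framework: the `Store` class, its method names, and the `name`/`content` fields are assumptions. The collection calls (`insert_one`, `find_one`, `update_one`, `delete_one`) are the standard PyMongo `Collection` methods.

```python
class Store:
    """Thin CRUD wrapper around a MongoDB collection (hypothetical helper)."""

    def __init__(self, collection):
        # `collection` is expected to be a pymongo Collection, for example
        # pymongo.MongoClient()["mystore"]["documents"] (names are assumptions).
        self.collection = collection

    def create(self, name, content, **metadata):
        # Create: insert one document and return its generated _id.
        doc = {"name": name, "content": content, **metadata}
        return self.collection.insert_one(doc).inserted_id

    def read(self, name):
        # Read: return the first document with this name, or None.
        return self.collection.find_one({"name": name})

    def update(self, name, **fields):
        # Update: set the given fields; returns how many documents changed.
        result = self.collection.update_one({"name": name}, {"$set": fields})
        return result.modified_count

    def delete(self, name):
        # Delete: remove one matching document; returns how many were removed.
        return self.collection.delete_one({"name": name}).deleted_count
```

With a running server, the wrapper would be constructed as `Store(MongoClient()["mystore"]["documents"])`. A single MongoDB document is limited to 16 MB, so large video files would normally go through GridFS rather than a plain `content` field.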
7. Algorithm: Consistent Hashing Algorithm
Consistent hashing aims to shard data uniformly across
the cluster with the least data loss when nodes are added or removed.
# Abstract interface for the hash ring; concrete rings override these methods.
class Ring(object):
    def add(self, *keys):
        raise NotImplementedError

    def remove(self, *keys):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError

    def empty(self, key):
        raise NotImplementedError
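One way the ring interface above might be filled in is sketched below. It is not the project's implementation: the `HashRing` name, the use of MD5, and the default of 100 virtual points per node are assumptions made for illustration. It mirrors the `add`, `remove`, and `get` methods and leaves out `empty`, whose intended behaviour the slides do not describe.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, *nodes):
        # Place `replicas` virtual points on the ring for each node.
        for node in nodes:
            for i in range(self.replicas):
                point = self._hash(f"{node}#{i}")
                if point not in self._owner:
                    bisect.insort(self._points, point)
                self._owner[point] = node

    def remove(self, *nodes):
        # Drop every virtual point that belongs to the given nodes.
        for node in nodes:
            for i in range(self.replicas):
                point = self._hash(f"{node}#{i}")
                if self._owner.get(point) == node:
                    del self._owner[point]
                    self._points.remove(point)

    def get(self, key):
        # Return the node that owns `key`, or None if the ring has no nodes.
        if not self._points:
            return None
        # First point clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

A short usage example: after `ring = HashRing(); ring.add("node-a", "node-b", "node-c")`, a call such as `ring.get("video-001.mp4")` returns the node that should hold that file. When a fourth node is added, only the keys that now fall on the new node's points change owner; every other key stays where it was, which is the property that separates consistent hashing from plain `hash(key) % n` placement.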
9. System Requirements
Hardware Requirements
System : Quad-core processor, 2.80 GHz.
Hard Disk : 500 GB.
RAM : 4 GB.
Software Requirements
Operating system : Ubuntu
Coding Language : Python/C++
Database : MongoDB