Paper name : Big Data Analytics
Staff : Mrs M. Florence Dayana M. C. A., M.Phil., (Ph.D.)
Class : II- M.Sc.(Computer Science)
Semester : IV
Unit : III
Topic : Big Data Technologies and Databases
BIG DATA TECHNOLOGIES AND DATABASES
NoSQL:
* NoSQL stands for Not Only SQL.
* NoSQL databases are non-relational, open-source, distributed databases.
* They are widely used in big data and other real-time web applications.
* NoSQL databases are used to store log data, which can then be pulled for analysis.
* They do not adhere to the relational data model.
* They are distributed.
TYPES OF NOSQL:
* Key-Value or the big hash table: It maintains a big hash table of keys and values. It is schema-less.
* Document: It maintains data in collections of documents.
* Column: Each storage block has data from only one column.
* Graph: Graphs are also called network databases. A graph stores data in nodes, with relationships between them as edges.
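The differences between these data models can be sketched with plain Python structures. This is only a toy illustration of each model's shape, not tied to any particular NoSQL product:

```python
# Key-value store: one big hash table of keys and values (schema-less).
kv_store = {"user:1": "Alice", "user:2": "Bob"}

# Document store: collections of documents; each document is free-form.
users_collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob"},  # documents need not share a schema
]

# Column store: each storage block holds values from only one column.
columns = {
    "name": ["Alice", "Bob"],
    "email": ["alice@example.com", None],
}

# Graph store: data in nodes, relationships between them as edges.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
edges = [(1, 2, "FOLLOWS")]

print(kv_store["user:1"])  # Alice
print(columns["name"])     # ['Alice', 'Bob']
```

Note how the same two users look completely different in each model; which shape is best depends on the queries the application needs to run.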
ADVANTAGES OF NOSQL:
* Can easily scale up and down.
* Doesn't require a pre-defined schema.
* Cheap and easy to implement.
* Relaxes the data consistency requirement.
* Data can be replicated to multiple nodes and can be partitioned.
* NoSQL is being put to use in varied industries.
* They are used to support analytics for applications.
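Replication and partitioning, mentioned in the list above, can be sketched in a few lines of Python. This is a hypothetical hash-partitioning scheme with made-up node names, not any specific database's implementation:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster
REPLICATION_FACTOR = 2                   # each key lives on 2 nodes

def partition(key: str) -> int:
    """Partitioning: hash the key to pick its primary node."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(NODES)

def replicas(key: str) -> list:
    """Replication: primary node plus the next node(s) in the ring."""
    primary = partition(key)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas("user:1"))  # two node names; which ones depends on the hash
```

Because the node is derived from the key, any client can locate a record without a central lookup table, and a copy survives if one node fails.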
NEW SQL:
* The new generation of modern RDBMSs is called NewSQL.
* It supports the relational data model and uses SQL as its primary interface.
* NewSQL is based on the shared-nothing architecture, with a SQL interface for
application interaction.
HADOOP:
* Hadoop is an open-source project of the Apache Software Foundation.
* It is a framework written in Java.
* It was created to support distribution for "Nutch", the text search engine.
FEATURES OF HADOOP:
1. It is optimized to handle massive quantities of structured, semi-structured,
and unstructured data.
2. Hadoop has a shared-nothing architecture.
3. It replicates its data across multiple computers.
4. It is designed for high throughput rather than low latency.
5. It complements OLTP and OLAP.
Key Advantages Of Hadoop:
* Stores data in its native format
* Scalable
* Cost effective
* Resilient to failure
* Flexible
* Fast
Versions of Hadoop:
* Hadoop 1.0
* Hadoop 2.0
Main parts of Hadoop:
* Data storage framework:
* It is a general-purpose file system called the Hadoop Distributed File
System (HDFS). It simply stores data files, and these data files can be in just
about any format.
* Data processing framework:
* This is a simple functional programming model initially popularized by
Google as MapReduce.
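The MapReduce model described above can be illustrated with a single-process Python sketch of the classic word count. Real Hadoop jobs are written against the Hadoop API, typically in Java; this only shows the map → shuffle → reduce flow:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insights", "big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'insights': 1}
```

In a real cluster, the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves intermediate pairs across the network to the reducers; the functional shape of the code is what makes that distribution possible.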
Overview of Hadoop ecosystem:
* HDFS
* HBase
* Hive
* Pig
* Zookeeper
* Oozie
* Mahout
* Chukwa
* Sqoop
* Ambari
Hardware Failure:
* In a distributed system, several servers are networked together.
* This implies that, more often than not, there may be a possibility of
hardware failure.
Hadoop accomplishes two tasks:
* Massive data storage.
* Faster data processing.
Hadoop Concepts:
Hadoop MapReduce Framework:
MapReduce Daemons:
Job Tracker:
* It provides connectivity between Hadoop and the application.
* When you submit code to the cluster, the Job Tracker creates an execution
plan by deciding which task to assign to which node.
Task Tracker:
* This daemon is responsible for executing the individual tasks that are
assigned by the Job Tracker.
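The Job Tracker / Task Tracker split can be sketched as a toy scheduler. The names and the round-robin assignment here are illustrative only; the real daemons also handle heartbeats, data locality, and failure recovery:

```python
def job_tracker_plan(tasks, task_trackers):
    """Job Tracker: build an execution plan by deciding which task goes
    to which node (round-robin here; real Hadoop prefers nodes that
    already hold the input data)."""
    plan = {tt: [] for tt in task_trackers}
    for i, task in enumerate(tasks):
        plan[task_trackers[i % len(task_trackers)]].append(task)
    return plan

def task_tracker_run(assigned):
    """Task Tracker: execute the individual tasks assigned to this node."""
    return [f"{task}:done" for task in assigned]

plan = job_tracker_plan(["map-0", "map-1", "map-2", "reduce-0"],
                        ["tracker-a", "tracker-b"])
print(plan)  # {'tracker-a': ['map-0', 'map-2'], 'tracker-b': ['map-1', 'reduce-0']}
```

The key point is the division of labor: one central daemon plans, many worker daemons execute and report back.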