Paper name : Big Data Analytics
Staff : Mrs M. Florence Dayana M. C. A., M.Phil., (Ph.D.)
Class : II- M.Sc.(Computer Science)
Semester : IV
Unit : III
Topic : Big Data Technologies and Databases
BIG DATA TECHNOLOGIES AND DATABASES
NoSQL:
* NoSQL stands for Not Only SQL.
* NoSQL databases are non-relational, open-source, distributed databases.
* They are widely used in big data and other real-time web applications.
* NoSQL databases are used to store log data, which can then be pulled for analysis.
* They do not adhere to the relational data model.
* They are distributed.
TYPES OF NOSQL:
* Key-Value or the big hash table: It maintains a big hash table of keys and values. It is schema-less.
* Document: It maintains data in collections of documents.
* Column: Each storage block has data from only one column.
* Graph: Graphs are also called network databases. A graph stores data in nodes, with relationships between them as edges.
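The differences between these data models can be sketched with plain Python structures. This is only a toy illustration of each model's shape, not tied to any particular NoSQL product:

```python
# Key-value store: one big hash table of keys and values (schema-less).
kv_store = {"user:1": "Alice", "user:2": "Bob"}

# Document store: collections of documents; each document is free-form.
users_collection = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob"},  # documents need not share a schema
]

# Column store: each storage block holds values from only one column.
columns = {
    "name": ["Alice", "Bob"],
    "email": ["alice@example.com", None],
}

# Graph store: data in nodes, relationships between them as edges.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
edges = [(1, 2, "FOLLOWS")]

print(kv_store["user:1"])  # Alice
print(columns["name"])     # ['Alice', 'Bob']
```

Note how the same two users look completely different in each model; which shape is best depends on the queries the application needs to run.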
ADVANTAGES OF NOSQL:
* Can easily scale up and down.
* Doesn't require a pre-defined schema.
* Cheap and easy to implement.
* Relaxes the data consistency requirement.
* Data can be replicated to multiple nodes and can be partitioned.
* NoSQL is being put to use in varied industries.
* They are used to support analytics for applications.
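Replication and partitioning, mentioned in the list above, can be sketched in a few lines of Python. This is a hypothetical hash-partitioning scheme with made-up node names, not any specific database's implementation:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster
REPLICATION_FACTOR = 2                   # each key lives on 2 nodes

def partition(key: str) -> int:
    """Partitioning: hash the key to pick its primary node."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(NODES)

def replicas(key: str) -> list:
    """Replication: primary node plus the next node(s) in the ring."""
    primary = partition(key)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replicas("user:1"))  # two node names; which ones depends on the hash
```

Because the node is derived from the key, any client can locate a record without a central lookup table, and a copy survives if one node fails.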
NEW SQL:
* The new generation of modern RDBMSs is called NewSQL.
* It supports the relational data model and uses SQL as its primary interface.
* NewSQL is based on the shared-nothing architecture, with a SQL interface for
application interaction.
HADOOP:
* Hadoop is an open-source project of the Apache Software Foundation.
* It is a framework written in Java.
* It was created to support distribution for "Nutch", the text search engine.
FEATURES OF HADOOP:
1. It is optimized to handle massive quantities of structured, semi-structured,
and unstructured data.
2. Hadoop has a shared-nothing architecture.
3. It replicates its data across multiple computers.
4. It is designed for high throughput rather than low latency.
5. It complements OLTP and OLAP.
Key Advantages Of Hadoop:
* Stores data in its native format
* Scalable
* Cost effective
* Resilient to failure
* Flexible
* Fast
Versions of Hadoop:
* Hadoop 1.0
* Hadoop 2.0
Main parts of Hadoop:
* Data storage framework:
* It is a general-purpose file system called the Hadoop Distributed File
System (HDFS). It simply stores data files, and these data files can be in just
about any format.
* Data processing framework:
* This is a simple functional programming model initially popularized by
Google as MapReduce.
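The MapReduce model described above can be illustrated with a single-process Python sketch of the classic word count. Real Hadoop jobs are written against the Hadoop API, typically in Java; this only shows the map → shuffle → reduce flow:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insights", "big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'insights': 1}
```

In a real cluster, the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves intermediate pairs across the network to the reducers; the functional shape of the code is what makes that distribution possible.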
Overview of Hadoop ecosystem:
* HDFS
* HBase
* Hive
* Pig
* Zookeeper
* Oozie
* Mahout
* Chukwa
* Sqoop
* Ambari
Hardware Failure:
* In a distributed system, several servers are networked together.
* This implies that, more often than not, there may be a possibility of
hardware failure.
Hadoop accomplishes two tasks:
* Massive data storage.
* Faster data processing.
Hadoop Concepts:
Hadoop MapReduce Framework:
MapReduce Daemons:
Job Tracker:
* It provides connectivity between Hadoop and the application.
* When you submit code to the cluster, the Job Tracker creates an execution
plan by deciding which task to assign to which node.
Task Tracker:
* This daemon is responsible for executing the individual tasks that are
assigned by the Job Tracker.
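The Job Tracker / Task Tracker split can be sketched as a toy scheduler. The names and the round-robin assignment here are illustrative only; the real daemons also handle heartbeats, data locality, and failure recovery:

```python
def job_tracker_plan(tasks, task_trackers):
    """Job Tracker: build an execution plan by deciding which task goes
    to which node (round-robin here; real Hadoop prefers nodes that
    already hold the input data)."""
    plan = {tt: [] for tt in task_trackers}
    for i, task in enumerate(tasks):
        plan[task_trackers[i % len(task_trackers)]].append(task)
    return plan

def task_tracker_run(assigned):
    """Task Tracker: execute the individual tasks assigned to this node."""
    return [f"{task}:done" for task in assigned]

plan = job_tracker_plan(["map-0", "map-1", "map-2", "reduce-0"],
                        ["tracker-a", "tracker-b"])
print(plan)  # {'tracker-a': ['map-0', 'map-2'], 'tracker-b': ['map-1', 'reduce-0']}
```

The key point is the division of labor: one central daemon plans, many worker daemons execute and report back.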