These are basic slides that give a general overview of big data technologies and the tools used in the Hadoop ecosystem.
They are just a small start toward sharing what I have to share.
A presentation on big data,
from the workshop "The Era of Big Data: Why and How?" at the 22nd conference of the Computer Society of Iran, csicc2017.ir
Vahid Amiri
vahidamiry.ir
datastack.ir
What is Hadoop?
Hadoop is the popular open source implementation of MapReduce, a powerful tool designed for the deep analysis and transformation of very large data sets. Hadoop enables you to explore complex data using custom analyses tailored to your information and questions. It is the system that allows unstructured data to be distributed across hundreds or thousands of machines forming shared-nothing clusters, and the execution of Map/Reduce routines to run on the data in that cluster. Hadoop has its own filesystem, which replicates data to multiple nodes to ensure that if one node holding data goes down, there are at least two other nodes from which to retrieve that piece of information. This protects data availability from node failure, something which is critical when there are many nodes in a cluster (akin to RAID at the server level).
Consider a typical scenario: the data are stored in a relational database on your desktop computer, and this computer has no problem handling the load. Then your company starts growing very quickly, and that data grows to 10 GB, then 100 GB, and you start to reach the limits of your current desktop computer. So you scale up by investing in a larger computer, and you are then OK for a few more months. When your data grows to 10 TB, and then 100 TB, you are fast approaching the limits of that computer. Moreover, you are now asked to feed your application with unstructured data coming from sources like Facebook, Twitter, RFID readers, sensors, and so on. Your management wants to derive information from both the relational data and the unstructured data, and wants this information as soon as possible. What should you do? Hadoop may be the answer!
Hadoop is an open source project of the Apache Foundation. It is a framework written in Java, originally developed by Doug Cutting, who named it after his son's toy elephant. Hadoop uses Google's MapReduce and Google File System technologies as its foundation. It is optimized to handle massive quantities of data, which could be structured, unstructured, or semi-structured, using commodity hardware, that is, relatively inexpensive computers. This massively parallel processing is done with great performance. However, it is a batch operation handling massive quantities of data, so the response time is not immediate. As of Hadoop version 0.20.2, updates are not possible, but appends will be possible starting in version 0.21. Hadoop replicates its data across different computers, so that if one goes down, the data are processed on one of the replicated computers. Hadoop is not suitable for OnLine Transaction Processing workloads, where data are randomly accessed on structured data like a relational database, nor for OnLine Analytical Processing or Decision Support System workloads, where data are sequentially accessed on structured data like a relational database to generate reports that provide business intelligence. Hadoop is used for Big Data; it complements OnLine Transaction Processing and OnLine Analytical Processing.
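To make the Map/Reduce model above concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The input and output HDFS paths are hypothetical arguments supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, it would be launched with something like `hadoop jar wordcount.jar WordCount /input /output`.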
Introduction to Big Data & Hadoop Architecture - Module 1, by Rohit Agrawal
Learning Objectives - In this module, you will understand what Big Data is, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, the Hadoop architecture, HDFS and the MapReduce framework, and the anatomy of a file write and read.
A review of slicing techniques in software engineering, by Salam Shah
A program slice is the part of a program that may take the program off the path of the desired output at some point in its execution. Such a point is known as the slicing criterion, and it is generally identified by a location in the program coupled with a subset of the program's variables. The process by which program slices are computed is called program slicing. Weiser gave the original definition of a program slice in 1979, and since that first definition many ideas related to program slices have been formulated, along with numerous techniques to compute them. Meanwhile, a distinction between static and dynamic slices was also drawn. Program slicing is now among the most useful techniques for extracting the particular elements of a program that relate to a particular computation. A large number of variants of program slicing have been analyzed, along with algorithms to compute the slices. Model-based slicing splits large software architectures into smaller sub-models during the early stages of the SDLC. Software testing is regarded as an activity to evaluate the functionality and features of a system; it verifies whether the system meets its requirements. A common practice now is to extract sub-models from giant models based on a slicing criterion, and the process of model-based slicing is used to extract the desired portion of a slice diagram. This survey focuses on slicing techniques across numerous programming paradigms, such as web applications, object-oriented programs, and component-based systems. Owing to the efforts of various researchers, the technique has been extended to numerous other areas, including program debugging, program integration and analysis, software testing and maintenance, reengineering, and reverse engineering. The survey describes the role of model-based slicing and the various techniques used to compute slices.
The law on online cash registers and its implementation deadlines, by MoySklad
A presentation by Askar Rakhimberdiev, CEO of the MoySklad service, from a seminar on the introduction of online cash registers held in Yekaterinburg on 8 December 2016.
Law 54-FZ introduces a new type of service provider for businesses: fiscal data operators (OFDs). They will receive data from online cash registers and forward it to the Federal Tax Service. Connecting to an OFD is a mandatory requirement of the law. At the seminar, a representative of the OFD Taxcom described the service and how to connect to it.
Business owners will need either to upgrade their existing cash register or to buy a new one. A representative of ATOL, a leading manufacturer of retail equipment, explained which registers are suitable for trading under the new rules and how to upgrade old ones.
The mandatory contents of the printed receipt will also change: a purchase total alone, without item names, will no longer be sufficient. Retailers will therefore need point-of-sale software that maintains a product catalog, prints the other mandatory data on the receipt, and sends it to the customer's email or phone on request. The seminar organizers, the MoySklad service, covered the use of POS software that meets the requirements of 54-FZ.
A presentation by Svetlana Trigubchuk of ATOL, from a seminar on the introduction of online cash registers held in Saint Petersburg on 24 November 2016.
The data management industry has matured over the last three decades, primarily based on relational database management system (RDBMS) technology. Since the volume, variety, and velocity of data collected and analyzed in enterprises have increased severalfold, organisations have started struggling with the architectural limitations of traditional RDBMS designs. As a result, a new class of systems had to be designed and implemented, giving rise to the phenomenon of "Big Data". In this paper we trace the origin of one such system, Hadoop, built to handle Big Data.
This presentation provides a comprehensive introduction to the Hadoop Distributed System, a powerful and widely used framework for distributed storage and processing of large-scale data. Hadoop has revolutionized the way organizations manage and analyze data, making it a crucial tool in the field of big data and data analytics.
In this presentation, we explore the key components and features of Hadoop, shedding light on the fundamental building blocks that enable its exceptional data processing capabilities. We cover essential topics, including the Hadoop Distributed File System (HDFS), MapReduce, YARN (Yet Another Resource Negotiator), and Hadoop Ecosystem components like Hive, Pig, and Spark.
HADOOP online training by Keylabstraining is excellent and taught by faculty with real-time experience. Our Hadoop Big Data course content is designed per current IT industry requirements. Apache Hadoop is in very high demand in the market, with a huge number of job openings in the IT world. Based on this demand, Keylabstraining has started providing online classes on Hadoop through various online training tools such as GoToMeeting.
For more information, contact us: info@keylabstraining.com
Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. A lot of challenges such as capture, curation, storage, search, sharing, analysis, and visualization can be encountered while handling Big Data. On the other hand the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Big Data certification is one of the most recognized credentials of today.
For more details, click http://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA, Sybase and SAP BusinessObjects enabling a broad range of new analytic applications.
Enough talking about Big Data and Hadoop; let's see how Hadoop works in action.
We will locate a real dataset, ingest it into our cluster, connect it to a database, apply some queries and data transformations, save the result, and show it via a BI tool.
Big Data Hadoop Tutorials - MindScripts Technologies, Pune, by amrutupre
MindScripts Technologies is a leading Big Data Hadoop training institute in Pune, providing a complete Big Data Hadoop course with Cloudera certification.
2. Where does Big Data come from?
Web data
Social media
Clickstream data
Sensor data
Connected devices
3. Big Data Challenges
The sheer size of Big Data.
Unstructured or semi-structured data.
Analyzing Big Data.
5. How Hadoop solves the Big Data problem
Hadoop is built on clusters of machines.
It handles unstructured and semi-structured data.
Hadoop clusters can scale horizontally to meet storage requirements.
Hadoop clusters provide both storage and computation.
7. Retail
Challenges:
Were higher-priced items selling in certain markets?
Should inventory be re-allocated, or prices optimized, based on geography?
10. Services in Hadoop
Namenode: stores and maintains the metadata for HDFS.
Secondary namenode: performs housekeeping functions for the namenode.
Datanode: stores the actual HDFS data blocks.
Jobtracker: manages MapReduce jobs and distributes individual tasks to tasktrackers.
Tasktracker: responsible for instantiating and monitoring map and reduce tasks.
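To illustrate this division of labor, the sketch below asks the namenode, via the HDFS FileSystem API, which datanodes hold the blocks of a file; no file data is read. The cluster address and file path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical namenode address
    FileSystem fs = FileSystem.get(conf);

    // This is a pure metadata query, answered by the namenode.
    Path file = new Path("/data/sample.txt"); // hypothetical file
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    // Each block lists the datanodes that hold a replica of it.
    for (BlockLocation block : blocks) {
      System.out.println(block.getOffset() + ": " + String.join(", ", block.getHosts()));
    }
    fs.close();
  }
}
```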
13. Hadoop Fault Tolerance
The data stored in HDFS are replicated to more than one datanode, so that even if one datanode goes down, a copy of the data exists on some other node.
The replication factor is 3 by default and is configurable.
The namenode is a single point of failure in the cluster, so its logs and metadata are periodically backed up to the secondary namenode.
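The replication factor can be set cluster-wide (the dfs.replication property in hdfs-site.xml) or per file. A minimal sketch, assuming a hypothetical file path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default replication for files created through this client.
    conf.setInt("dfs.replication", 3);
    FileSystem fs = FileSystem.get(conf);

    // Raise the replication factor of an existing (hypothetical) file to 5.
    fs.setReplication(new Path("/data/critical.log"), (short) 5);
    fs.close();
  }
}
```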
14. HDFS – Hadoop Distributed File System
HDFS is the distributed file system for storing huge data sets on a cluster of commodity hardware, with a streaming data access pattern.
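A short sketch of that streaming access pattern via the FileSystem API, writing a file once and reading it back sequentially; the path is hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/hello.txt"); // hypothetical path

    // Write once (HDFS files are write-once; appends came later)...
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.writeBytes("hello hdfs\n");
    }

    // ...then stream the file back sequentially.
    try (FSDataInputStream in = fs.open(path);
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}
```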
18. Hadoop Ecosystem Introduction
Sqoop: imports data from relational databases.
Flume: collection and import of log and event data.
MapReduce: parallel computation on server clusters.
HDFS: distributed, redundant file system for Hadoop.
Pig: high-level programming language for Hadoop computations.
Hive: data warehouse with SQL-like access (see the JDBC sketch below).
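To illustrate Hive's SQL-like access, here is a hedged sketch that queries HiveServer2 over JDBC. The host, table, and credentials are hypothetical, and the hive-jdbc driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
  public static void main(String[] args) throws Exception {
    // HiveServer2 conventionally listens on port 10000; host and database are hypothetical.
    String url = "jdbc:hive2://hiveserver:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "user", "");
         Statement stmt = conn.createStatement();
         // Hive compiles this SQL into MapReduce (or Tez/Spark) jobs behind the scenes.
         ResultSet rs = stmt.executeQuery(
             "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}
```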
19. Data Processing Systems in Hadoop
Batch processing:
MapReduce
Stream processing:
Apache Spark
Apache Storm
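Since the slide files Spark under stream processing, here is a minimal Spark Streaming sketch in Java: a micro-batch word count over a hypothetical TCP text source. The host, port, and batch interval are assumptions, and the spark-streaming dependency is assumed to be available.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingWordCount {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("streaming-wordcount");
    // Process the stream in 10-second micro-batches.
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // Hypothetical source: lines of text arriving on a TCP socket.
    JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", 9999);

    JavaPairDStream<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey((a, b) -> a + b);

    counts.print(); // emit each micro-batch's counts to stdout
    ssc.start();
    ssc.awaitTermination();
  }
}
```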