Where does Big Data come from?
Click stream data
Big Data Challenges
Size of Big Data.
Unstructured or semi-structured data.
Analyzing Big Data.
How Hadoop Solves the Big Data Problem
Hadoop is built on clusters of commodity hardware.
It handles unstructured and semi-structured data.
A Hadoop cluster can scale horizontally to meet growing storage needs.
Hadoop clusters provide both storage and computation.
Solving Big Data Problems with Hadoop
ENTERPRISE USE CASES
Were higher-priced items selling in certain markets?
Should inventory be re-allocated or price optimized based on
Monitor and predict network failure
Services in Hadoop
NameNode: Stores and maintains the metadata for HDFS.
Secondary NameNode: Performs housekeeping functions for the NameNode.
DataNode: Stores the actual HDFS data blocks.
JobTracker: Manages MapReduce jobs and distributes individual tasks to TaskTrackers.
TaskTracker: Responsible for instantiating and monitoring individual Map and Reduce tasks.
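The split of responsibilities between the NameNode and the DataNodes can be illustrated with a toy model of NameNode metadata. This is a minimal Python sketch under simplifying assumptions: the class `NameNodeMetadata`, its methods, and the block IDs are hypothetical names chosen for illustration, not Hadoop's API. It shows that the NameNode keeps only the mapping from files to blocks and from blocks to DataNodes (learned from DataNode block reports), while the block contents themselves live on the DataNodes.

```python
from collections import defaultdict


class NameNodeMetadata:
    """Hypothetical toy model of NameNode metadata, not the Hadoop API."""

    def __init__(self):
        self.file_to_blocks = {}                 # path -> ordered list of block IDs
        self.block_locations = defaultdict(set)  # block ID -> DataNodes holding it

    def add_file(self, path, block_ids):
        # Record which blocks make up a file, in order
        self.file_to_blocks[path] = list(block_ids)

    def block_report(self, datanode, block_ids):
        # A DataNode reports the blocks it currently stores
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        # Return (block ID, DataNodes) pairs so a client can read each block
        return [(b, sorted(self.block_locations[b])) for b in self.file_to_blocks[path]]


meta = NameNodeMetadata()
meta.add_file("/logs/clicks.log", ["blk_1", "blk_2"])
meta.block_report("dn1", ["blk_1"])
meta.block_report("dn2", ["blk_1", "blk_2"])
print(meta.locate("/logs/clicks.log"))
# [('blk_1', ['dn1', 'dn2']), ('blk_2', ['dn2'])]
```

In this model the NameNode never holds file contents; a client only asks it where the blocks are and would then read the bytes from the listed DataNodes.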
Hadoop Fault Tolerance
Data stored in HDFS is replicated to more than one DataNode, so even if one DataNode goes down, a copy of the data is still available on another node.
The replication factor is 3 by default and is configurable (the dfs.replication property in hdfs-site.xml).
The NameNode is a single point of failure in the cluster, so its edit logs and metadata are periodically checkpointed to the Secondary NameNode.
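The replication behaviour described above can be sketched as follows. This is a simplified Python model that assumes a plain round-robin placement; real HDFS uses a rack-aware placement policy, and `place_replicas` and `readable_after_failure` are hypothetical helpers. With a replication factor of 3, every block stays readable as long as at least one of its three DataNodes is alive.

```python
def place_replicas(block_ids, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Simplified assumption: real HDFS uses rack-aware placement instead.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Consecutive offsets modulo the node count give distinct DataNodes
        placement[block_id] = [datanodes[(i + k) % len(datanodes)] for k in range(replication)]
    return placement


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live DataNode."""
    return all(any(dn not in failed for dn in nodes) for nodes in placement.values())


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes, replication=3)
print(placement["blk_1"])                                 # ['dn1', 'dn2', 'dn3']
print(readable_after_failure(placement, failed={"dn2"}))  # True
```

Losing all three DataNodes that hold the same block (here dn1, dn2 and dn3 for blk_1) would make that block unreadable, which is why replicas are spread across different nodes.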
HDFS – Hadoop Distributed File System
HDFS is a distributed file system for storing very large data sets on clusters of commodity hardware, with streaming data access.
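HDFS stores huge data sets by splitting each file into large, fixed-size blocks that are spread across DataNodes. The arithmetic can be sketched in Python as below; `hdfs_blocks` is a hypothetical helper, and the 128 MB default is an assumption matching the dfs.blocksize default in Hadoop 2.x and later (Hadoop 1.x defaulted to 64 MB).

```python
MB = 1024 * 1024


def hdfs_blocks(file_size_bytes, block_size_bytes=128 * MB):
    """Return the sizes of the blocks a file would be split into.

    Assumes a 128 MB block size by default; the last block holds only
    the remainder and does not occupy a full block on disk.
    """
    full, remainder = divmod(file_size_bytes, block_size_bytes)
    return [block_size_bytes] * full + ([remainder] if remainder else [])


sizes = hdfs_blocks(300 * MB)
print(len(sizes), [s // MB for s in sizes])  # 3 [128, 128, 44]
```

Each block is then replicated independently, so with the default replication factor of 3 this 300 MB file consumes about 900 MB of raw cluster storage.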
Introduction to the Hadoop Ecosystem
Sqoop: Imports data from relational databases.
Flume: Collection and import of log and event data.
MapReduce: Parallel computation on server clusters.
HDFS: Distributed, redundant file system for Hadoop.
Pig: High-level programming language for Hadoop computations.
Hive: Data warehouse with SQL-like access.
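The MapReduce model listed above can be illustrated with the classic word-count example. The Python sketch below runs the map, shuffle, and reduce phases in a single process purely for illustration; the function names are hypothetical, and in a real Hadoop 1.x cluster the JobTracker would distribute map and reduce tasks across TaskTrackers, with the job typically written against Hadoop's Java API.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every word in an input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: sum the counts for one word
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    # Sort keys so the output order is deterministic, like Hadoop's sorted reducer input
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))


print(word_count(["big data on hadoop", "Hadoop stores big data"]))
# {'big': 2, 'data': 2, 'hadoop': 2, 'on': 1, 'stores': 1}
```

Because each mapper only looks at its own input line and each reducer only sees one key's values, both phases can run in parallel on different nodes, which is what lets MapReduce scale across a cluster.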
Data Processing systems in Hadoop