Yalla Ramisetty (Day-1)
yalla4u@gmail.com
• What is Big Data?
• What is Hadoop?
• How does Hadoop help solve the Big Data problem?
• Big data is a broad term for data sets so large or complex that traditional data-processing applications are inadequate.
• Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy.
** credits : www.wikipedia.com**
• How to store huge data sets?
• Processing speed?
• Cost of the processing framework?
• Cost of infrastructure?
• Need for simple, abstract modules?
• Ease of development?
Need for a unified framework → Hadoop
• Hadoop:
  • An open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license.
• Goals / Requirements:
  • Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
  • Structured and unstructured data
  • Simple programming models
  • High scalability and availability
  • Use commodity (cheap!) hardware with little redundancy
  • Fault tolerance
  • Move computation rather than data
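The "simple programming models" goal can be illustrated with a toy, single-process word count written in the map / shuffle / reduce style. This is only a sketch in plain Python: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not Hadoop's API.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map step would run on the nodes that already hold each piece of the input, so mostly the small (word, count) pairs travel over the network. That is the idea behind "move computation rather than data".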
[Figure: a normal PC (OS, processing layer, data layer, one disk) next to a Hadoop cluster of three nodes. Each node has its own OS, an MR processing layer and a disk; HDFS spans the disks as a shared file system.]
[Figure: HDFS as the data layer beneath the processing layer. Components shown: Client, NameNode (NameNode daemon), and three DataNodes, each running a DataNode daemon over its own disk; together the disks form the shared file system.]
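The split between NameNode and DataNodes can be sketched as a toy model: the NameNode keeps only metadata (which nodes hold which block), while the DataNodes hold the block contents. The class `ToyNameNode`, the round-robin placement, and the block size and replication values used below are assumptions made for illustration; real HDFS uses its own placement policy and much larger blocks.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Keeps metadata only: which DataNodes hold a copy of each block."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Cannot keep more replicas than there are DataNodes.
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # (filename, block index) -> list of DataNode names

    def add_file(self, filename, data, block_size):
        """Record replica locations for every block; return the block count."""
        blocks = split_into_blocks(data, block_size)
        for index in range(len(blocks)):
            # Round-robin placement: an assumption of this sketch,
            # not the policy real HDFS applies.
            start = index % len(self.datanodes)
            replicas = [self.datanodes[(start + r) % len(self.datanodes)]
                        for r in range(self.replication)]
            self.block_map[(filename, index)] = replicas
        return len(blocks)

    def locate(self, filename, index):
        """Answer a client's question: where are the copies of this block?"""
        return self.block_map[(filename, index)]


if __name__ == "__main__":
    namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    namenode.add_file("big.txt", b"x" * 10, block_size=4)
    print(namenode.locate("big.txt", 0))
```

A client would ask the NameNode where a block lives and then read the bytes from one of the listed DataNodes, so file contents never pass through the NameNode.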
[Figure: the processing layer on top of HDFS. Components shown: Client, JobTrackerNode (JobTracker daemon), NameNode daemon, and TaskTracker nodes (TTNode), each running a TaskTracker daemon and a DataNode daemon over its own disk; the disks form the shared file system.]
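Because each TaskTracker node also runs a DataNode daemon, the JobTracker can try to hand a task to a node that already stores the block it needs. The sketch below illustrates that preference; the function `assign_tasks`, its arguments, and the least-loaded tie-break are hypothetical and do not reproduce the JobTracker's actual scheduler.

```python
def assign_tasks(block_locations, tasktrackers):
    """Assign one task per block, preferring a node that holds the block.

    block_locations: dict mapping block id -> list of node names with a replica
    tasktrackers: list of node names that can run tasks
    Returns a dict mapping block id -> chosen node name.
    """
    load = {node: 0 for node in tasktrackers}
    assignment = {}
    for block_id in sorted(block_locations):
        # Nodes that both hold a replica and can run tasks.
        local = [n for n in block_locations[block_id] if n in load]
        # Prefer data-local nodes; fall back to any node if none is local.
        candidates = local or tasktrackers
        # Among the candidates, pick the one with the fewest tasks so far.
        chosen = min(candidates, key=lambda n: load[n])
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    locations = {"b0": ["tt1", "tt2"], "b1": ["tt1"], "b2": ["tt9"]}
    print(assign_tasks(locations, ["tt1", "tt2", "tt3"]))
```

In this example the block `b2` has no replica on any available node, so its task falls back to a non-local node and the data would have to be read over the network.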
[Figure: classroom analogy for a cluster. Five benches of six students (S) are drawn beside five racks of six nodes (DN, with one NN in the first rack) connected through a switch. Labels: Class Room, Cluster, Student, Leader/Master, BigFile.]
Big Data Introduction