Rishi Arora discusses big data and Hadoop. An estimated 2,500 exabytes of new information were generated in 2012, driven primarily by the internet, up from roughly 1.2 zettabytes for the digital universe as a whole in 2010. Hadoop is a distributed, fault-tolerant, and scalable platform for big data. It consists of HDFS for storage and MapReduce for processing: HDFS splits large files into blocks replicated across multiple nodes and provides high-throughput access to application data, while MapReduce distributes the processing of large datasets across clusters of commodity machines.
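The map/shuffle/reduce pattern that MapReduce applies across a cluster can be sketched in miniature on a single machine. The following is a toy word-count illustration of the pattern only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are invented for this sketch.

```python
from collections import defaultdict

# Toy illustration of the MapReduce word-count pattern (not the Hadoop API):
# map emits (word, 1) pairs, the framework groups pairs by key (the shuffle),
# and reduce sums the counts for each word.

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group emitted pairs by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return (key, sum(values))

# Hypothetical sample input standing in for files stored on HDFS.
documents = ["big data needs hadoop", "hadoop processes big data"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # prints {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```

In a real Hadoop cluster the map and reduce calls run in parallel on the nodes that hold the data blocks, and the shuffle moves intermediate pairs over the network; the sequential structure above is only meant to show how the three phases fit together.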