Bigdata Analytics using Hadoop

• Introduction
What is Big data?
Why Big data a problem?
• Big Data Every Where…
• Characterization of Big-Data(V3)
• Big Data Domains Specific

What is Big Data?
• Big data is the amount of data that is beyond the storage
and processing capabilities of a single physical machine.
• Big data analytics refers to the process of collecting,
organizing and analyzing large sets of data to find
patterns and useful information.

Why Big Data a problem?
• The model has changed…
• Old Model: Few companies are generating data, all others
are consuming data
New Model: All of us are generating data, and all of us are
consuming data

• Lots of data is being collected
and warehoused
• Web data, e-commerce
• Bank/Credit Card
transactions
• Social Network
Big Data Every Where!

Characteristics of BigData:
• Volume
• Velocity
• Variety

Hadoop..A Solution for BigData

HDFS??
• Data is distributed across many machines at load time
– Different blocks from the same file will be stored
on different machines
• Blocks are replicated across multiple machines, known
as DataNodes
– Default replication is three-fold
• A master node called the NameNode keeps track of
which blocks make up a file, and where those blocks are
located
– Known as the metadata

HDFS 1.0 Architecture Cont’d

Map Reduce??
• MapReduce is a method for distributing a task across
multiple nodes.
• Everything is in the form of Key-Value pairs for flexibility
• Consists of two phases:
– Map
– Reduce
• For mapper and reducer the input must be in the form of (key ,
value) pair and their outputs also in the (key , value) pair only.

Characteristics of MapReduce
• Job Tracker
• Task Tracker
• Mapper
• Reducer

Bigdata Analytics using Hadoop

Bigdata Analytics using Hadoop

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Bigdata Analytics using Hadoop

Similar to Bigdata Analytics using Hadoop (20)

Recently uploaded

Recently uploaded (20)

Bigdata Analytics using Hadoop