• Introduction
What is Big Data?
Why is Big Data a problem?
• Big Data Everywhere…
• Characterization of Big Data (3 Vs)
• Domain-Specific Big Data
What is Big Data?
• Big data is data whose volume exceeds the storage
and processing capabilities of a single physical machine.
• Big data analytics refers to the process of collecting,
organizing and analyzing large sets of data to find
patterns and useful information.
Why is Big Data a problem?
• The model has changed…
• Old Model: A few companies generate data; all others
consume it
• New Model: All of us generate data, and all of us
consume data
Big Data Everywhere!
• Lots of data is being collected and warehoused
• Web data, e-commerce
• Bank/credit card transactions
• Social networks
Characteristics of Big Data:
• Volume
• Velocity
• Variety
Hadoop: A Solution for Big Data
Hadoop Architecture
What is HDFS?
• Data is distributed across many machines at load time
– Different blocks from the same file will be stored
on different machines
• Blocks are replicated across multiple machines, known
as DataNodes
– Default replication is three-fold
• A master node called the NameNode keeps track of
which blocks make up a file, and where those blocks are
located
– Known as the metadata
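
The block distribution, replication, and NameNode metadata described above can be illustrated with a small in-memory sketch. This is a toy model, not HDFS itself: the 4-byte block size, the round-robin placement, and the names `split_into_blocks`, `place_blocks`, and `read_file` are all assumptions made for illustration (real HDFS uses much larger blocks and its own placement policy).

```python
BLOCK_SIZE = 4    # bytes; toy value chosen for illustration only
REPLICATION = 3   # three-fold replication, as stated on the slide


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(filename: str, data: bytes, datanodes: list,
                 replication: int = REPLICATION):
    """Distribute blocks over DataNodes and record where each replica lives.

    Returns (metadata, storage):
      metadata -- what the NameNode would track: file -> [(block_id, [nodes])]
      storage  -- what each DataNode holds: node -> {block_id: bytes}
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    storage = {node: {} for node in datanodes}
    metadata = {filename: []}
    for index, block in enumerate(split_into_blocks(data)):
        block_id = f"{filename}#blk{index}"
        # Hypothetical round-robin placement: consecutive nodes, wrapping around.
        targets = [datanodes[(index + r) % len(datanodes)]
                   for r in range(replication)]
        for node in targets:
            storage[node][block_id] = block
        metadata[filename].append((block_id, targets))
    return metadata, storage


def read_file(filename: str, metadata: dict, storage: dict) -> bytes:
    """Reassemble a file by asking the metadata where each block is stored."""
    return b"".join(storage[targets[0]][block_id]
                    for block_id, targets in metadata[filename])


nodes = ["dn1", "dn2", "dn3", "dn4"]
meta, store = place_blocks("log.txt", b"hello hadoop!", nodes)
restored = read_file("log.txt", meta, store)
```

Here `meta` plays the role of the NameNode's metadata: it never holds file contents, only the list of blocks and the DataNodes that store each replica.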
HDFS ARCHITECTURE
HDFS 1.0 Architecture Cont’d
Hadoop 2.0
Writing files to HDFS
What is MapReduce?
• MapReduce is a method for distributing a task across
multiple nodes.
• Everything is in the form of Key-Value pairs for flexibility
• Consists of two phases:
– Map
– Reduce
• For both the mapper and the reducer, the input must be (key,
value) pairs, and the output is likewise (key, value) pairs.
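
The two phases and the (key, value) contract above can be sketched with a word count that runs on a single machine in plain Python. This is a minimal sketch of the programming model only, not the Hadoop API: `run_mapreduce`, `mapper`, and `reducer` are hypothetical names, and the grouping step stands in for the shuffle that a real cluster performs between the Map and Reduce phases.

```python
from collections import defaultdict


def mapper(_key, line):
    """Map phase: take a (line number, line text) pair, emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: take (word, [1, 1, ...]), emit a single (word, total) pair."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, group values by key, then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # group intermediate values by key
    results = []
    for key in sorted(groups):                  # process keys in sorted order
        results.extend(reducer(key, groups[key]))
    return results


lines = [(0, "big data is big"), (1, "data is everywhere")]
word_counts = run_mapreduce(lines, mapper, reducer)
print(word_counts)
# [('big', 2), ('data', 2), ('everywhere', 1), ('is', 2)]
```

Because the mapper and reducer only ever see and emit (key, value) pairs, the same two functions could in principle be run on many nodes at once, each handling a different slice of the input.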
Components of MapReduce
• JobTracker
• TaskTracker
• Mapper
• Reducer
NameNode
Hadoop 2.0
MapReduce (Mapper Flow)
MapReduce (Reducers to Output)
Conclusion
Bigdata Analytics using Hadoop
