Big Data Analytics using Hadoop
1.
2. • Introduction
What is Big Data?
Why is Big Data a problem?
• Big Data Everywhere…
• Characterization of Big Data (3 Vs)
• Big Data in Specific Domains
3. What is Big Data?
• Big data is data whose volume is beyond the storage
and processing capabilities of a single physical machine.
• Big data analytics refers to the process of collecting,
organizing and analyzing large sets of data to find
patterns and useful information.
4. Why is Big Data a problem?
• The model has changed…
• Old Model: A few companies generate data; all others
consume data
• New Model: All of us generate data, and all of us
consume data
5. Big Data Everywhere!
• Lots of data is being collected and warehoused
• Web data, e-commerce
• Bank/credit card transactions
• Social networks
9. HDFS??
• Data is distributed across many machines at load time
– Different blocks from the same file will be stored
on different machines
• Blocks are replicated across multiple machines, known
as DataNodes
– Default replication is three-fold
• A master node called the NameNode keeps track of
which blocks make up a file, and where those blocks are
located
– Known as the metadata
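The block layout described above can be sketched as a toy simulation. This is not HDFS code and does not reproduce HDFS's real placement policy: the tiny block size, the round-robin placement, and the names `split_into_blocks` and `place_blocks` are all assumptions made for illustration. Only the three-fold replication and the idea of a NameNode-style metadata map come from the slide.

```python
BLOCK_SIZE = 4    # bytes per block; unrealistically small, chosen for illustration
REPLICATION = 3   # three-fold replication, as stated on the slide


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(filename: str, data: bytes, datanodes: list[str],
                 replication: int = REPLICATION) -> dict:
    """Build toy NameNode metadata: filename -> [(block_id, [datanode, ...]), ...].

    Placement is plain round-robin (an assumption, not the real HDFS policy).
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    metadata = {filename: []}
    for index, _block in enumerate(split_into_blocks(data)):
        # Each block's replicas land on `replication` distinct machines.
        replicas = [datanodes[(index + r) % len(datanodes)]
                    for r in range(replication)]
        metadata[filename].append((f"{filename}#blk{index}", replicas))
    return metadata


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block_id, replicas in place_blocks("log.txt", b"abcdefghij", nodes)["log.txt"]:
        print(block_id, "->", replicas)
```

The returned dictionary plays the role of the metadata: it records which blocks make up the file and which machines hold each replica, without holding any of the file's data itself.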
14. Map Reduce??
• MapReduce is a method for distributing a task across
multiple nodes.
• Everything is in the form of Key-Value pairs for flexibility
• Consists of two phases:
– Map
– Reduce
• Both the mapper and the reducer take (key, value) pairs as
input and emit (key, value) pairs as output.
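A minimal, single-process sketch of the two phases, using word counting as the example task. It is not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, and the `shuffle` step stands in for the grouping by key that a MapReduce framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(key, value):
    """Map: input (line offset, line text) -> output (word, 1) pairs."""
    for word in value.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: input (word, [1, 1, ...]) -> output (word, total count)."""
    yield (key, sum(values))


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = [pair
              for offset, line in enumerate(lines)
              for pair in map_phase(offset, line)]
    grouped = shuffle(mapped)
    return dict(pair
                for key in sorted(grouped)
                for pair in reduce_phase(key, grouped[key]))


if __name__ == "__main__":
    print(run_job(["big data big", "data everywhere"]))
```

Every function boundary here passes (key, value) pairs, which is what lets a real framework run many mappers and reducers on different nodes and route the intermediate pairs by key.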