This presentation is a short slideshow describing the problem of ever-growing digital data and its technical solution, Hadoop. Embedded effects and animations make it more engaging and presentable.
2. Big data really means big: it is a collection of data sets so large and complex that they become difficult to process using traditional data-processing applications.
5. TYPES OF BIG DATA
Structured data: relational data
Semi-structured data: XML data
Unstructured data: PDF, Word, text, media logs, etc.
6. Every day, about 0.5 PB of data is added to FACEBOOK, including 40 million photos.
Every day, enough video is uploaded to YOUTUBE to be watched continuously for a year.
Big data also affects INTERNET SEARCH, FINANCE, and BUSINESS INFORMATION.
Challenges include the CAPTURE, SEARCHING, SHARING, ANALYSIS, STORAGE, and VISUALIZATION of data.
9. A software framework for distributed processing of large datasets across large clusters of computers.
Large datasets: terabytes or petabytes of data.
Large clusters: hundreds or thousands of nodes.
An open-source implementation of Google's MAPREDUCE.
Based on a simple data model: any data will fit.
10. 2005: Doug Cutting, Michael J. Cafarella, and their team developed Hadoop to support distribution for the Nutch search engine project.
Doug named it after his son's toy elephant.
The project was funded by YAHOO.
2006: Yahoo gave the project to the APACHE SOFTWARE FOUNDATION.
13. A software framework for distributing the computation of huge data.
Consists of two main phases:
◦ Map
◦ Reduce
The Map task: breaks the input down into individual elements (key-value pairs).
The Reduce task: takes the output of the map tasks as input and combines it.
14. How MapReduce Works?
Input:
We Love India
We Play Tennis
MAP:
We 1, Love 1, India 1
We 1, Play 1, Tennis 1
REDUCE:
We 2, Love 1, India 1, Play 1, Tennis 1
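The word-count flow above can be sketched in plain Python. This is a minimal simulation of the Map, shuffle, and Reduce steps, not the actual Hadoop API:

```python
from collections import defaultdict

# Input lines, as in the example above
lines = ["We Love India", "We Play Tennis"]

# MAP phase: emit a (word, 1) pair for every word in every line
mapped = []
for line in lines:
    for word in line.split():
        mapped.append((word, 1))

# SHUFFLE: group the pairs by key (the word)
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# REDUCE phase: sum the counts for each word
reduced = {word: sum(counts) for word, counts in grouped.items()}

print(reduced)  # {'We': 2, 'Love': 1, 'India': 1, 'Play': 1, 'Tennis': 1}
```

In real Hadoop, the map and reduce functions run in parallel on many nodes, and the framework performs the shuffle step between them.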
15. HDFS
The distributed file system used by Hadoop is HDFS (the Hadoop Distributed File System).
Based on the Google File System (GFS).
Designed to run on clusters of thousands of small computers.
HDFS uses a MASTER-SLAVE ARCHITECTURE.
16. The master node is called the NameNode.
The slave nodes are called DataNodes.
The master (NameNode) manages the file system metadata.
The slaves (DataNodes) store the actual data.
A file in HDFS is split into several blocks.
The blocks are stored in a set of DataNodes.
The NameNode maps the blocks to the DataNodes.
The DataNodes take care of read, write, creation, and deletion operations based on instructions from the NameNode.
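The split-and-map idea above can be sketched as a toy simulation: a file is cut into fixed-size blocks, and a NameNode-style table records which DataNodes hold each block. The block size, replication factor, and node names here are illustrative, not real HDFS defaults (HDFS defaults to 128 MB blocks and 3 replicas):

```python
BLOCK_SIZE = 4   # bytes; tiny for demonstration (real HDFS: 128 MB)
REPLICATION = 2  # replicas per block (real HDFS default: 3)
datanodes = ["datanode1", "datanode2", "datanode3"]

def split_into_blocks(data, block_size):
    """Split a file's bytes into fixed-size blocks, as an HDFS client would."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes, replication):
    """NameNode-style metadata: map each block id to the DataNodes holding it."""
    placement = {}
    for block_id in range(len(blocks)):
        # simple round-robin placement; real HDFS placement is rack-aware
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs!", BLOCK_SIZE)
metadata = place_blocks(blocks, datanodes, REPLICATION)
print(blocks)    # [b'hell', b'o hd', b'fs!']
print(metadata)  # {0: ['datanode1', 'datanode2'], 1: ['datanode2', 'datanode3'], 2: ['datanode3', 'datanode1']}
```

Note that only the metadata (the block-to-node table) lives on the NameNode; the block contents themselves live on the DataNodes.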
17. HADOOP COMMON
Provides access to HDFS.
Contains Java libraries and utilities.
Contains the necessary Java files and scripts to start HADOOP.
18. ADVANTAGES OF HADOOP
• Designed to detect and handle failures.
• Automatic distribution of data across the machines.
• Doesn't rely on hardware for fault tolerance.
• Servers can be added or removed dynamically.
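The first advantage can be sketched with a hypothetical simulation: when a DataNode dies, the NameNode drops it from every block's replica list and copies under-replicated blocks to the remaining live nodes. This illustrates the idea of replication-based fault tolerance, not the real HDFS recovery protocol:

```python
REPLICATION = 2  # target replicas per block (real HDFS default: 3)

# NameNode-style view: block id -> DataNodes holding a replica
placement = {0: ["node1", "node2"], 1: ["node2", "node3"], 2: ["node3", "node1"]}
live_nodes = {"node1", "node2", "node3"}

def handle_failure(placement, live_nodes, dead_node, target):
    """Remove the dead node and re-replicate under-replicated blocks."""
    live_nodes.discard(dead_node)
    for block_id, replicas in placement.items():
        # forget replicas that were on the dead node
        replicas[:] = [n for n in replicas if n != dead_node]
        # copy the block to live nodes until the target count is restored
        for node in sorted(live_nodes):
            if len(replicas) >= target:
                break
            if node not in replicas:
                replicas.append(node)

handle_failure(placement, live_nodes, "node2", REPLICATION)
print(placement)  # every block again has 2 replicas, none on node2
```

Because failures are handled in software this way, Hadoop can run on commodity hardware instead of relying on expensive fault-tolerant machines.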