APACHE HADOOP
All about it in a nutshell…
HISTORY
★ Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
★ Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.
★ It was originally developed to support distribution for the Nutch search engine project.
HADOOP DEFINED
★ A batch-processing framework of tools
★ These tools support running applications on Big Data
★ It is open source
★ Distributed under the Apache licence
★ Addresses the three big data problems: volume, velocity, and variety
★ The traditional approach maps big data onto a single high-powered machine, which is expensive and a single point of failure; the Hadoop approach harnesses the power of many low-powered computers as one powerful cluster while creating redundancy through its distributed design.
ARCHITECTURE
★ MapReduce
○ job and task trackers (a minimal word-count job is sketched after this list)
★ Filesystem: HDFS
○ name and data nodes
★ Cluster roles:
○ SLAVES -- computers with a data node and a task tracker
○ MASTER -- a computer with a data node, task tracker, name node and job tracker
★ Project tools:
■ Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop
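To make the MapReduce part concrete, here is a minimal sketch of the classic word-count job written against the standard org.apache.hadoop.mapreduce API, in the Hadoop 1.x (job tracker era) style; the class name and the command-line input/output paths are illustrative, not something from the slides.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: runs as map tasks on the task trackers, close to the data;
      // emits a (word, 1) pair for every token it sees.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: after the shuffle, sums the counts for each word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Driver (Hadoop 1.x style): the client submits this job; the job tracker
      // splits it into map and reduce tasks for the task trackers on the slaves.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, it would be submitted with something like hadoop jar wordcount.jar WordCount <input dir> <output dir>; the jar name here is a placeholder.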
HADOOP MAP-REDUCE ENGINE
★ The job tracker receives a job:
○ it distributes tasks to the task trackers on the slaves
○ when the tasks are done, the results are assembled back at the job tracker on the master.
★ The name node indexes which data node holds which data
○ for redundancy and fault tolerance, three copies of each data block are maintained on different data nodes (see the replication setting sketched below).
○ the name node's metadata tables are backed up, and there is also a backup master.
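The three-copies rule above is HDFS's replication factor. It is not named on the slide, but as a sketch it corresponds to the dfs.replication property, normally set in hdfs-site.xml; 3 is already the default, shown here only to make it explicit.

    <!-- hdfs-site.xml: HDFS block replication factor (3 is the default) -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>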
WHAT HADOOP DOESN'T WANT US TO BE WORRIED ABOUT
★ On which data node a file is located
★ What happens if a node fails
★ How tasks are shared among the data nodes
★ Scalability: clusters from 1 to 1,000 machines
★ Scaling is roughly linear, i.e. the bigger the cluster, the higher the processing power (x = number of PCs, y = processing speed)
HADOOP BENEFICIARIES
★ Yahoo
★ Facebook
★ Amazon
★ eBay
★ American Airlines
★ The New York Times
★ Federal Reserve Board
★ Chevron
★ IBM
★ Who’s next? DreamOval Products? The DreamOval business? It could even be you!
HADOOP APPLICATIONS
★ Adverts: mining user behaviour to generate recommendations
★ Search: grouping related documents
★ Security: searching for uncommon patterns (AML, fraud detection, etc.)
HADOOP USERS
★ Admins: install, manage, and maintain the cluster
★ Users: design applications, import and export data, work with the tools
… EVERY DOer COULD BE A USER :)
YAHOO PREDICTING HADOOP'S FUTURE
By 2015, 50% of enterprise data will be processed by Hadoop...
HADOOP INSTALLATION TYPES
1. Standalone mode: all Hadoop daemons run on one PC, inside a single Java Virtual Machine process
2. Pseudo-distributed mode: all Hadoop daemons run on one PC, each in its own Java Virtual Machine process
3. Fully distributed mode: the Hadoop daemons run on different PCs, each in its own Java Virtual Machine process (the configuration that distinguishes the three modes is sketched below)
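As a rough sketch (assuming the Hadoop 1.x generation these slides describe), the three modes are distinguished mainly by two properties: the filesystem URI in core-site.xml and the job tracker address in mapred-site.xml. The host names and ports below are the conventional tutorial values, not requirements.

    Mode                 fs.default.name (core-site.xml)   mapred.job.tracker (mapred-site.xml)
    Standalone           file:///  (the default)           local  (the default)
    Pseudo-distributed   hdfs://localhost:9000             localhost:9001
    Fully distributed    hdfs://<master-host>:9000         <master-host>:9001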
INSTALLATION OVERVIEW
★ An SSH server for master/slave communication
★ Java 6 or greater
★ Download and install Hadoop
★ Add the Hadoop bin directory to the PATH in .bashrc
★ Configure the Hadoop environment
○ Edit hadoop-env.sh: point JAVA_HOME to the right path and disable IPv6
○ Configure the XML files (core-site.xml: name node host and port; mapred-site.xml: job tracker host and port) -- example files sketched below
★ Launch the Hadoop daemons
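Below is a minimal sketch of what those files might look like for a pseudo-distributed (single-PC) setup, assuming Hadoop 1.x file names under conf/ and the conventional localhost ports; the JAVA_HOME path is a placeholder for whatever JVM is actually installed.

    # conf/hadoop-env.sh -- point Hadoop at the JVM and prefer IPv4 over IPv6
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk          # placeholder path
    export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"  # effectively disables IPv6

    <!-- conf/core-site.xml -- where the name node (HDFS) listens -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- conf/mapred-site.xml -- where the job tracker listens -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>

With that in place, the name node is formatted once with bin/hadoop namenode -format, and the daemons are launched with bin/start-all.sh (or start-dfs.sh and start-mapred.sh separately).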
