Hadoop

Dr. Panjabrao Deshmukh
Polytechnic, Amravati
Presented by :-
Kartik N. Kalpande
Seminar On
Hadoop Application

1. INTRODUCTION
2. HADOOP FRAMEWORK
3. HOW IT WORK
4. WHY USE HADOOP
5. NEED OF HADOOP IN HEALTHCARE
6. APPLICATIONS OF HADOOP
7. HADOOPAT YAHOO!
8. SCOPE
9. LIMITATION
10. CONCLUSION
11. REFERENCES

 What is Hadoop?
 It is part of the Apache project sponsored by
the Apache Software Foundation.

• HDFS(Distributed File System)
The Hadoop Distributed File System
(HDFS) is the primary storage system
used by Hadoop applications.

• Map Reduce
Hadoop MapReduce(Hadoop Map/Reduce)
is a software framework for distributed
processing of large data sets on compute
clusters of commodity hardware

Who use hadoop
 Amazon
 Yahoo
 IBM
 Facebook
 Google
 Adobe
 IBS
 …Many More

 It’s cost effective.
 It’s fault tolerant
 It’s flexible
 It’s scalable

Need of Hadoop in Healthcare
Data Solutions
 Provide storage for billions and trillions of
unstructured data sets.
 Fault tolerance along with high avaiability of the system.
 Parallel Data Processing that is unconstrained.

The project uses MapReduce
and Hadoop to create and
maintain the world’s largest
biometric database, which can
verify a person’s identity within
200 milliseconds.

Currently we have 2 major clusters:
A 1100-machine cluster with 8800 cores
and about 12 PB raw storage.
A 300-machine cluster with 2400 cores and
about 3 PB raw storage.

Hadoop at Yahoo!
Hadoop is a top level Apache
project, initiated and led by
Yahoo!. It relies on an active
community of contributors
from all over the world for its
success.

 More than 100,000 CPUs in >40,000 computers running
Hadoop
 Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB
disk & 16GB RAM)
 Used to support research for Ad Systems and Web
Search
 Also used to do scaling tests to support development of
Apache Hadoop on larger clusters
 >60% of Hadoop Jobs within Yahoo are Apache Pig
jobs.

Scope
Big Data technologies and Hadoop are
among today's world technologies and
need as well, As this data is very good
source to do your research on any
domain, any field .

Limitation
When it comes to making the most
of big data, Hadoop may not be the
only answer. Apache flume, Mill-
wheel, and Google’s own Cloud
Data-flow as possible solutions.

Conclusion
Hadoop MapReduce is a large scale,
open source software framework
dedicated to scalable, distributed, data-
intensive computing.
The framework breaks up large data
into smaller parallelizable chunks and
handles scheduling..

References
Hadoop Releases apache.org. Apache
Software Foundation.
Hadoop Releases Hadoop.apache.org.
Welcome to Apache Hadoop
hadoop.apache.org.
Yahoo! Launches World’s Largest Hadoop
Production Application Yahoo. 19 February
2008..
Hadoop and Distributed Computing at
Yahoo! Yahoo.

Hadoop

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Hadoop

Similar to Hadoop (20)

More from Kartik Kalpande Patil

More from Kartik Kalpande Patil (20)

Recently uploaded

Recently uploaded (20)

Hadoop