Dr. Panjabrao Deshmukh
Polytechnic, Amravati
Presented by:
Kartik N. Kalpande
Seminar On
Hadoop Application
1. INTRODUCTION
2. HADOOP FRAMEWORK
3. HOW IT WORKS
4. WHY USE HADOOP
5. NEED FOR HADOOP IN HEALTHCARE
6. APPLICATIONS OF HADOOP
7. HADOOP AT YAHOO!
8. SCOPE
9. LIMITATION
10. CONCLUSION
11. REFERENCES
 What is Hadoop?
 Hadoop is an open-source framework for distributed storage and processing of large data sets on clusters of commodity hardware.
 It is a top-level Apache project sponsored by the Apache Software Foundation.
• HDFS (Hadoop Distributed File System)
The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
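As an illustration of how an application uses HDFS as its storage system, here is a minimal Java sketch built on the org.apache.hadoop.fs.FileSystem API; the NameNode URI and the file paths are hypothetical values assumed for the example, not details from these slides.

    // Minimal HDFS client sketch (NameNode address and paths are assumed).
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical NameNode URI; in practice it comes from core-site.xml.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
            Path dir = new Path("/user/demo");
            fs.mkdirs(dir);                                  // create a directory
            fs.copyFromLocalFile(new Path("input.txt"),      // upload a local file
                                 new Path(dir, "input.txt"));
            System.out.println(fs.exists(new Path(dir, "input.txt"))); // verify upload
            fs.close();
        }
    }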
• MapReduce
Hadoop MapReduce is a software framework for the distributed processing of large data sets on compute clusters of commodity hardware.
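To make the MapReduce model concrete, the sketch below is the canonical word-count job written against the org.apache.hadoop.mapreduce API; the class names and the command-line input/output paths are assumptions for the example, not part of the original slides.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Map phase: emit a (word, 1) pair for every token in the input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reduce phase: sum the counts collected for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) sum += val.get();
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Each mapper emits a (word, 1) pair for every token in its input split; the framework groups the pairs by key, and each reducer sums the counts for one word. The combiner runs the same summation locally on each mapper's output, cutting the amount of data shuffled across the cluster.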
Who uses Hadoop?
 Amazon
 Yahoo
 IBM
 Facebook
 Google
 Adobe
 IBS
 …Many More
Why use Hadoop?
 It's cost-effective.
 It's fault-tolerant.
 It's flexible.
 It's scalable.
Need for Hadoop in Healthcare
Data Solutions
 Provides storage for billions to trillions of unstructured data records.
 Fault tolerance along with high availability of the system.
 Unconstrained parallel data processing.
The project uses MapReduce
and Hadoop to create and
maintain the world’s largest
biometric database, which can
verify a person’s identity within
200 milliseconds.
Facebook, for example, currently has two major clusters:
A 1,100-machine cluster with 8,800 cores and about 12 PB of raw storage.
A 300-machine cluster with 2,400 cores and about 3 PB of raw storage.
Hadoop at Yahoo!
Hadoop is a top-level Apache project, initiated and led by Yahoo!. It relies on an active community of contributors from all over the world for its success.
 More than 100,000 CPUs in >40,000 computers running Hadoop
 Biggest cluster: 4,500 nodes (2×4-CPU boxes with 4×1 TB disks and 16 GB RAM)
 Used to support research for ad systems and web search
 Also used for scaling tests to support development of Apache Hadoop on larger clusters
 >60% of Hadoop jobs within Yahoo! are Apache Pig jobs.
Scope
Big Data technologies such as Hadoop are both a defining technology of today's world and a growing need: the data they manage is a rich source for research in any domain or field.
Limitation
When it comes to making the most of big data, Hadoop may not be the only answer. Apache Flume, MillWheel, and Google's own Cloud Dataflow are possible alternatives.
Conclusion
Hadoop MapReduce is a large-scale, open-source software framework dedicated to scalable, distributed, data-intensive computing.
The framework breaks large data sets into smaller, parallelizable chunks and handles the scheduling of the resulting tasks across the cluster.
References
"Hadoop Releases," hadoop.apache.org, Apache Software Foundation.
"Welcome to Apache Hadoop," hadoop.apache.org.
"Yahoo! Launches World's Largest Hadoop Production Application," Yahoo!, 19 February 2008.
"Hadoop and Distributed Computing at Yahoo!," Yahoo!.