2. Introduction
Data that is too large for traditional systems to store and process is called Big Data; its scale is typically measured in petabytes (1 PB = 10^15 bytes).
By most estimates, almost 90% of the world's data was generated in the last few years.
In the big data world, the sheer volume, velocity, and variety of data render many conventional technologies ineffective.
These data come from many sources, such as:
Social networking sites: Facebook, Google, etc.
E-commerce sites: Amazon, Flipkart, etc.
Telecom companies: Airtel, Vodafone, etc.
And many more.
Hadoop is the solution for this big data: it lets organizations manage all the data their servers are gathering in an efficient, cost-effective way.
3. Hadoop
This huge amount of unstructured data needs to be stored, processed, and analyzed, and for storage Hadoop uses HDFS (the Hadoop Distributed File System).
Hadoop was originally created at Yahoo and is now an open-source framework from Apache.
It is used to store, process, and analyze data that is very huge in volume.
It is written in Java and is used for batch/offline processing, not online analytical processing (OLAP).
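As a small illustration, here is a minimal sketch of writing a file to HDFS through Hadoop's Java API; the path /user/demo/hello.txt is hypothetical, and the snippet assumes a reachable HDFS cluster whose configuration files are on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; HDFS transparently splits the file into
        // blocks and replicates them across the cluster
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("Hello, HDFS!");
        }
        fs.close();
    }
}
```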
4. Architecture
Hadoop framework includes the following four modules:
1. Hadoop Common: the Java libraries and utilities required by the other Hadoop modules.
2. YARN (Yet Another Resource Negotiator): used for job scheduling and cluster resource management.
3. HDFS: the Hadoop Distributed File System, which stores data in blocks across the cluster.
4. MapReduce: a YARN-based system for parallel processing of large data sets (a minimal example follows this list).
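As an illustration of the MapReduce module, here is the classic word-count job written against Hadoop's Java MapReduce API; a minimal sketch, assuming the input and output paths are passed as command-line arguments:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map stage: emit (word, 1) for every word in the input split
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce stage: sum the counts for each word (keys arrive sorted)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map stage emits (word, 1) pairs, Hadoop sorts them by key between the two stages, and the reduce stage sums the counts for each word.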
7. Working
Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs:
Data is initially divided into directories and files. Files are divided into uniformly sized blocks of 128 MB (the default in recent versions) or 64 MB (the older default); the block size is configurable, as the sketch after this list shows.
These blocks are then distributed across various cluster nodes for further processing.
HDFS, being a layer on top of each node's local file system, supervises the processing.
Blocks are replicated to handle hardware failure.
Hadoop checks that the code executed successfully.
It performs the sort that takes place between the map and reduce stages.
It sends the sorted data to the node running the corresponding reduce task.
It writes debugging logs for each job.
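Here is a minimal sketch of tuning the block size and replication factor through Hadoop's Java Configuration API; the file path is hypothetical, dfs.blocksize and dfs.replication are the standard HDFS property names, and in practice they are usually set cluster-wide in hdfs-site.xml rather than in code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", 128 * 1024 * 1024L); // 128 MB blocks
        conf.setInt("dfs.replication", 3);                 // keep 3 copies of each block

        FileSystem fs = FileSystem.get(conf);
        // Files created through this FileSystem handle use the settings above
        fs.create(new Path("/user/demo/large-file.dat")).close(); // hypothetical path
        fs.close();
    }
}
```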
8. Application
Nowadays, Hadoop technology is used in healthcare systems, for example in cancer treatment and genomics, in monitoring patient vitals, in hospital networks, and in fraud prevention and detection.
It is being used by Facebook, Yahoo, Google, Twitter, LinkedIn, and many more.
9. Hadoop Installation
Environment required for Hadoop:
Hadoop is installed on a UNIX/Linux environment, which requires:
Java installation (Hadoop itself is written in Java)
SSH installation (for communication between cluster nodes)
Hadoop installation and file configuration (a configuration sketch follows)
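The file-configuration step mainly tells Hadoop where the HDFS NameNode lives, via the fs.defaultFS property normally placed in core-site.xml. Below is a minimal sketch of the equivalent client-side setting through the Java API, assuming a single-node HDFS listening on localhost:9000:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same setting that core-site.xml carries as the fs.defaultFS property;
        // hdfs://localhost:9000 is an assumed single-node NameNode address
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```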