1. J. Sai Krishna and G. Sravya Lahari
2nd B.Tech (CSE)
K.O.R.M College of Engineering
Kadapa
The solution for Big data
HADOOP
2. Contents
1. Data – trends in storing data.
2. Big data – problems in the IT industry
3. Introduction to HADOOP
4. HDFS (Hadoop Distributed File System)
5. MapReduce
6. Prominent users of Hadoop.
7. Conclusion
3. Data – trends in storing data
What is data? Any real-world symbol (character,
number, or special character), or a group of
such symbols, is called data. It may be visual,
audio, textual, etc.
Common ways of storing data:
• File systems
• Databases
• The cloud (internet)
4. Big data
What is big data? In IT, big data is a collection of
data sets so large and complex that it becomes
difficult to process using on-hand database
management tools or traditional data processing
applications.
As of 2012, the limit on the size of data sets that
could be processed in a reasonable time was on
the order of exabytes.
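To see why exabyte-scale data strains a single machine, a rough calculation helps. The sketch below just estimates how long it takes to read a data set once. It assumes decimal units (1 EB = 10^18 bytes), a sequential read rate of 100 MB/s per disk (an illustrative figure, not one from these slides), and work split perfectly evenly across machines with no network or coordination overhead.

```java
// Back-of-envelope estimate: how long does it take just to read a data set once?
// Assumptions (illustrative only): decimal units, a fixed per-disk read rate,
// and work split perfectly evenly across machines with no overhead.
class ScanTime {
    static final double SECONDS_PER_DAY = 24 * 3600;
    static final double SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY;

    // Time in seconds to read `bytes` using `machines` disks in parallel,
    // each reading at `bytesPerSecond`.
    static double seconds(double bytes, double bytesPerSecond, int machines) {
        return bytes / (bytesPerSecond * machines);
    }

    public static void main(String[] args) {
        double exabyte = 1e18;     // 1 EB = 10^18 bytes (decimal)
        double diskRate = 100e6;   // assumed 100 MB/s per disk
        for (int machines : new int[] {1, 1_000, 10_000}) {
            double s = seconds(exabyte, diskRate, machines);
            System.out.printf("%,6d machine(s): %.2f years (%.1f days)%n",
                    machines, s / SECONDS_PER_YEAR, s / SECONDS_PER_DAY);
        }
    }
}
```

Under these assumptions one disk needs about 10^10 seconds (roughly 317 years), while 10,000 disks reading in parallel bring this down to about 11.6 days. That gap is the motivation for spreading both storage and computation across many machines.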
5. Big data and its problems
Every day, about 0.5 petabytes of updates are made
to Facebook, including 40 million photos.
Every day, YouTube receives enough new video to
be watched continuously for a whole year.
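The daily figures above are easier to grasp as rates. The small sketch below converts them, assuming decimal units (1 PB = 10^15 bytes) and a 365-day year; the input numbers are the ones quoted on this slide.

```java
// Converts the daily volumes quoted on the slide into average rates.
// Assumes decimal units (1 PB = 10^15 bytes) and a 365-day year.
class IngestRate {
    static final double SECONDS_PER_DAY = 24 * 3600;

    // Average bytes per second for a given volume arriving each day.
    static double bytesPerSecond(double bytesPerDay) {
        return bytesPerDay / SECONDS_PER_DAY;
    }

    // Hours of video uploaded per wall-clock hour, given hours uploaded per day.
    static double videoHoursPerHour(double videoHoursPerDay) {
        return videoHoursPerDay / 24;
    }

    public static void main(String[] args) {
        double facebookPerDay = 0.5e15;        // 0.5 PB of updates per day
        double youtubeHoursPerDay = 365 * 24;  // one year of continuous video = 8,760 hours
        System.out.printf("Facebook: %.2f GB/s on average%n",
                bytesPerSecond(facebookPerDay) / 1e9);
        System.out.printf("YouTube: %.0f hours of video per hour%n",
                videoHoursPerHour(youtubeHoursPerDay));
    }
}
```

So 0.5 PB per day is an average of about 5.79 GB every second, and "one year of video per day" means roughly 365 hours of new video arriving every hour.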
Large data sets cause limitations in many areas,
including meteorology, genomics, complex physics
simulations, and biological and environmental
research.
They also affect Internet search, finance, and
business informatics.
The challenges include capture, retrieval, storage,
search, sharing, analysis, and visualization.
7. What is Hadoop?
It is open-source software written in Java.
The Hadoop software library is a framework that
allows for the distributed processing of large data
sets across clusters of computers using simple
programming models.
It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
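The "simple programming model" Hadoop is best known for is MapReduce (listed in the contents). The sketch below imitates the idea in plain Java on one machine using the classic word-count example: a map step turns each input split into (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. It is only an illustration; it does not use Hadoop's real API (such as `org.apache.hadoop.mapreduce.Mapper` and `Reducer`), and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process imitation of the MapReduce model (word count).
// Names are hypothetical; real Hadoop jobs run map and reduce tasks
// on many machines and move intermediate data between them over the network.
class MiniMapReduce {
    // Map: turn one input split (here, a line of text) into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle: group all emitted values by key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: combine all values for one key (here, by summing them).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run map over every split, then shuffle, then reduce each group.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String split : splits) {
            emitted.addAll(map(split)); // on a cluster, each split could be mapped on a different node
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(emitted).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("big data needs big clusters", "Hadoop processes big data");
        System.out.println(wordCount(splits));
        // prints {big=3, clusters=1, data=2, hadoop=1, needs=1, processes=1}
    }
}
```

The programmer only writes `map` and `reduce`; splitting the input, grouping by key, and (on a real cluster) scheduling tasks near the data are left to the framework. That separation is what makes the model "simple".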
8. The project includes these modules:
• Hadoop Common
• Hadoop Distributed File System (HDFS)
• Hadoop MapReduce
9. 1. Hadoop Common
It provides access to the filesystems supported by
Hadoop.
The Hadoop Common package contains the
necessary JAR files and scripts needed to start
Hadoop.
The package also provides source code,
documentation, and a contribution section that
includes projects from the Hadoop community
(Avro, Cassandra, Chukwa, HBase, Hive, Mahout,
Pig, ZooKeeper).
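Giving access to several filesystems usually means client code talks to one abstract interface, and the concrete implementation is chosen from the path's URI scheme (Hadoop, for example, distinguishes `hdfs://` and `file://` paths). The sketch below shows that pattern in plain Java. Every name in it (`FileSystemRegistry`, `SimpleFileSystem`, the `mem` scheme) is hypothetical and is not Hadoop's actual `FileSystem` API; the in-memory store merely stands in for a remote or distributed one.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical registry: picks a filesystem implementation from the URI scheme,
// so client code stays the same whichever storage backend a path points at.
class FileSystemRegistry {
    private final Map<String, Supplier<SimpleFileSystem>> byScheme = new HashMap<>();

    void register(String scheme, Supplier<SimpleFileSystem> factory) {
        byScheme.put(scheme, factory);
    }

    SimpleFileSystem get(URI uri) {
        Supplier<SimpleFileSystem> factory = byScheme.get(uri.getScheme());
        if (factory == null) {
            throw new IllegalArgumentException("No filesystem for scheme: " + uri.getScheme());
        }
        return factory.get();
    }

    public static void main(String[] args) {
        FileSystemRegistry registry = new FileSystemRegistry();
        registry.register("file", LocalDiskFileSystem::new);
        registry.register("mem", () -> new InMemoryFileSystem(Set.of("/data/logs.txt")));

        URI uri = URI.create("mem://cluster/data/logs.txt");
        System.out.println(registry.get(uri).exists(uri)); // prints true
    }
}

// Minimal hypothetical filesystem interface (not Hadoop's API).
interface SimpleFileSystem {
    boolean exists(URI uri);
}

// Backed by the local disk through java.nio; expects absolute file:// URIs.
class LocalDiskFileSystem implements SimpleFileSystem {
    public boolean exists(URI uri) {
        return Files.exists(Paths.get(uri));
    }
}

// In-memory stand-in for a remote store; only the URI's path part is checked.
class InMemoryFileSystem implements SimpleFileSystem {
    private final Set<String> paths;

    InMemoryFileSystem(Set<String> paths) {
        this.paths = paths;
    }

    public boolean exists(URI uri) {
        return paths.contains(uri.getPath());
    }
}
```

Adding support for a new kind of storage then means writing one more implementation and registering it under its scheme, without changing the code that reads or writes files.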