Data has been piling up in organizations since a number of years but since some time, because of the prevailing fervor behind ‘Big Data’ and ‘Business Intelligence’, there is awareness and availability of valued information and accurate storage of data to organizations, which is why they are happily storing their heaps of data and extracting desired information in required format.
2. What is Apache Hadoop?
•A proficient data management framework for Big Data
•Open source software for distributed processing of
large chunks of data
•Offers distributed parallel processing across servers,
ranging from a single server to multiple machines
•Processing and analysis of thousands of terabytes of
data
•Apt framework to increase business efficiency and
maximize ROI
•Latest Release on 18 November, 2014: Release 2.6.0
What is Apache Hadoop?
3. Main Modules of Hadoop
Hadoop
Common
HDFS (Hadoop
Distributed File
System)
Hadoop YARN
Hadoop
MapReduce
Main Modules of Hadoop
4. Main Modules of Hadoop (contd.)
•Hadoop Common
Common utilities to help other Hadoop modules
and support subprojects
Includes File System, RPC and serialization libraries
•Hadoop Distributed File System (HDFS)
Distributed File System giving access to application
data
Spans across all nodes in a Hadoop cluster to link
them into one big file system
Java based, giving scalable and reliable data
storage
Main Modules of Hadoop (contd.)
5. Main Modules of Hadoop (contd.)
•Hadoop YARN
Utilized for job scheduling and resource
management of clusters
Splits up two roles of JobTracker, namely, resource
management and job scheduling into different areas
•Hadoop MapReduce
System for parallel processing of large data sets
A framework that gets into work assignment to
nodes in a particular cluster
Writes applications processing large amount of
data, on multiple nodes of hardware with utmost
reliability
Main Modules of Hadoop (contd.)
6. Other Hadoop Related Projects at Apache
• Avro
•Cassandra
•Hbase
•Hive
•Pig
•Spark
• Ambari
•Chukwa
•Mahout
•Tez
•ZooKeeper
Other Hadoop Related Projects at Apache
7. Why Hadoop?
• Next generation real time analytics
•Rich eco systems
•Scale-out storage
•Reduced cost of ownership
•Scalability, Flexibility and Reliability
•Fault tolerance
•Simplistic programming models
Why Hadoop?
8. Looking Forward To Have A Mutually Beneficial Association.
Assuring You Of Our Best Services Always.
SPEC INDIA
"SPEC House“, Parth
Complex,
Swastik Cross Road,
Navrangpura,
Ahmedabad-380
009, INDIA.
Tel.:+91-79-26404031 to
34
VoIP : + 1 - 908 - 450 -
9862
Instant Messengers
spec.bd | spec_india |
bd.spec
specindia2009 |
specindia.bd
e-mail: lead@spec-india.com
URL: http://www.spec-
india.com
THANK YOU