Hadoop white papers

A quick brief on "What is Hadoop?"
These slides do not explain Hadoop in detail, but reading them will give you insight into Hadoop and the usage of its core products. This document will be most useful for project managers, newcomers, and technical architects entering cloud computing.

  1. Msquare Systems Inc.
  2. What is Hadoop?
     Apache Hadoop is an open-source project governed by the Apache Software Foundation (ASF) that lets you gain insight from massive amounts of structured and unstructured data quickly and without significant investment. Hadoop is designed to run on commodity hardware and can scale up or down without system interruption. It provides three main functions: storage, processing, and resource management.
  3. Core services on Hadoop
     MapReduce: A framework for writing applications that process large amounts of structured and unstructured data in parallel, across a cluster of machines, in a reliable and fault-tolerant manner.
     HDFS: The Hadoop Distributed File System is a Java-based file system that provides scalable and reliable data storage across large clusters.
     Hadoop YARN: A next-generation framework for Hadoop data processing that extends MapReduce's capabilities by supporting non-MapReduce workloads associated with other programming models.
     Apache Tez: Tez generalizes the MapReduce paradigm into a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks for near-real-time big data processing.
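The map/shuffle/reduce flow that the MapReduce framework above provides can be sketched in plain Python. This is a simulation for intuition only, not the Hadoop API: a real job would be written as Mapper/Reducer classes in Java or submitted via Hadoop Streaming, and the function names here are illustrative.

```python
# Word count, the canonical MapReduce example, simulated without a cluster.
from collections import defaultdict

def map_phase(records):
    # Map: emit (key, value) pairs -- here (word, 1) for every word seen.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values -- here, sum the counts.
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores big data", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"], counts["big"])  # 2 2
```

On a real cluster the map and reduce phases run on many machines at once, and the shuffle moves data between them over the network; the logical flow, however, is exactly the one above.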
  4. Hadoop Data Services
     Apache Pig: A platform for processing and analyzing large data sets.
     Apache HBase: A column-oriented NoSQL data storage system that provides random, real-time read/write access to big data for user applications.
     Apache Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and ad-hoc queries via a SQL-like interface for large datasets stored in HDFS.
     Apache Flume: Allows efficient aggregation and movement of large amounts of log data from many different sources into Hadoop.
     Apache Mahout: Provides scalable machine-learning algorithms for Hadoop, which aid data science tasks such as clustering, classification, and batch-based collaborative filtering.
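To give a flavor of the "data summarization and ad-hoc queries via a SQL-like interface" that Hive offers: HiveQL closely resembles standard SQL. The sketch below runs an ad-hoc summarization query against SQLite purely as a stand-in (not Hive itself), and the table and column names are invented for illustration; Hive would execute a near-identical statement over files in HDFS.

```python
# A rough flavor of an ad-hoc summarization query, HiveQL-style,
# run against an in-memory SQLite database as a stand-in for Hive.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("home", 1), ("home", 2), ("about", 1)])

# Count views per page -- the kind of statement one might submit to Hive:
rows = conn.execute(
    "SELECT page, COUNT(*) AS views"
    " FROM page_views GROUP BY page ORDER BY views DESC"
).fetchall()
print(rows)  # [('home', 2), ('about', 1)]
```

The key difference is where the query runs: Hive compiles such statements into jobs over HDFS data, so the same declarative query scales to datasets far beyond a single machine.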
  5. Hadoop Data Services
     Apache Accumulo: A high-performance data storage and retrieval system with cell-level access control. It is a scalable implementation of Google's Bigtable design that works on top of Apache Hadoop and Apache ZooKeeper.
     Apache Storm: A distributed real-time computation system for processing fast, large streams of data, adding reliable real-time processing capabilities to Apache Hadoop 2.x.
     Apache HCatalog: A table and metadata management service that provides a centralized way for data-processing systems to understand the structure and location of data stored within Apache Hadoop.
     Apache Sqoop: A tool that speeds and eases movement of data in and out of Hadoop. It provides reliable parallel loading for various popular enterprise data sources.
  6. Hadoop Operational Services
     Apache ZooKeeper: A highly available system for coordinating distributed processes.
     Apache Falcon: A data management framework for simplifying data lifecycle management and processing pipelines on Apache Hadoop.
     Apache Ambari: An open-source installation, lifecycle-management, administration, and monitoring system for Apache Hadoop clusters.
     Apache Knox: The Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster.
     Apache Oozie: Oozie is a Java web application used to schedule Apache Hadoop jobs. It combines multiple jobs sequentially into one logical unit of work.
  7. What Hadoop can, and can't do
     What Hadoop can't do -- you can't use Hadoop for:
     - Structured data
     - Transactional data
     What Hadoop can do -- you can use Hadoop for:
     - Big data
  8. Support & Partner
     Getting started or support:
     Muthu Natarajan
     muthu.n@msquaresystems.com
     www.msquaresystems.com
     Phone: 212-941-6000