This document provides an overview of big data, Hadoop, and related concepts:
- Big data refers to datasets too large or fast-growing to be stored and processed efficiently by traditional single-machine systems. Sources include social media, smartphones, sensors and other machines, and log files.
- Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It implements the MapReduce programming model.
- Key Hadoop components include HDFS for distributed storage, MapReduce for distributed processing, and related projects such as Pig, Hive, HBase, Flume, Oozie, and Sqoop. Companies use Hadoop for applications involving large datasets, such as log analysis, recommendation systems, and business intelligence.
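The MapReduce model mentioned above can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce phases applied to a word count, not Hadoop's actual Java API; all function names here are made up for the example:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In real Hadoop, each phase runs in parallel across the cluster: mappers process HDFS blocks locally, the framework shuffles intermediate pairs over the network, and reducers aggregate each key's values.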