2. What is Hadoop?
Hadoop is a free, open-source, Java-based software
framework that supports the processing of
large data sets in a distributed computing
environment. It is an Apache project
sponsored by the Apache Software
Foundation.
6. Hadoop in the real world
Telecommunications
Data Warehousing
Market Research
Forecasting
Social Networking
Natural Language Processing (NLP)
Image and Video Processing
Academic Research
…
7. Hadoop’s History
Inspired by Google’s Google File System and
MapReduce
papers, circa 2003-2004.
Created by Doug Cutting.
Originally built to support distribution for the Nutch
search engine.
Named after a stuffed toy elephant.
8. What is Hadoop NOT?
It isn’t a relational database...
an online transaction processing (OLTP) system...
a structured data store of any kind!
11. Challenges of using Hadoop:
There’s a widely acknowledged talent gap (entry-level
programmers often lack the skills needed to be
productive with MapReduce).
Data Security.
Lack of full-fledged data management and
governance.
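To give a feel for the MapReduce programming model behind the talent gap mentioned above, here is a minimal sketch of a word count written in plain Java. It is an illustration only: it does not use the Hadoop API, it runs in a single process rather than on a cluster, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are hypothetical names chosen for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the MapReduce model. This is NOT the Hadoop API;
// all names here are illustrative assumptions.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: runs the three phases in order, in a single process.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(wordCount(List.of("big data big cluster", "data node")));
    }
}
```

In a real Hadoop job the map and reduce steps would run on different machines and the framework would handle the shuffle; having to restructure a problem into these phases is part of what makes the model hard for newcomers.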